What is the OOM Killer?
- The OOM (Out of Memory) Killer is a kernel mechanism invoked when the system is critically low on memory. This occurs when processes consume so much memory that the kernel can no longer satisfy further requests, including its own.
- When a process starts, it requests a block of memory from the kernel. This initial request is usually much larger than what the process actually needs. The kernel knows that processes tend to grab more memory than they will ever touch, so it overcommits: it promises processes more memory than physically exists. For example, if a system has 8GB of physical memory, the kernel may let processes allocate 9GB in total.
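You can see the overcommit policy and the current commit accounting on a running system. A quick sketch, assuming a Linux /proc filesystem:

```shell
# The kernel's overcommit policy: 0 = heuristic (default), 1 = always, 2 = strict
cat /proc/sys/vm/overcommit_memory
# Committed_AS is the memory promised to processes; under overcommit it can
# exceed CommitLimit
grep -E '^Commit' /proc/meminfo
```

On a typical desktop the policy is 0 and Committed_AS routinely exceeds physical RAM.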
When is the OOM Killer invoked?
- The OOM Killer is invoked when the system runs out of memory. When called, it reviews all running processes and kills one or more of them (based on their oom_score files) in order to free up memory and keep the system running.
How does the OOM Killer assign oom_score?
First off, let’s understand how oom_score is calculated. Each running process has a file at /proc/$PID/oom_score. When the time comes for the OOM Killer to rule, it needs to find the process whose death will free the most memory. It walks the /proc/$PID/ directories, reads each oom_score file, and the process with the largest value gets killed to free up system memory. But there’s more…
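That walk can be imitated from the shell. A minimal sketch, assuming a Linux /proc filesystem (process entries may vanish while the loop runs, hence the 2>/dev/null):

```shell
# List the five processes with the highest oom_score: score, PID, name
for p in /proc/[0-9]*; do
  score=$(cat "$p/oom_score" 2>/dev/null) || continue
  name=$(cat "$p/comm" 2>/dev/null)
  printf '%s\t%s\t%s\n' "$score" "${p#/proc/}" "$name"
done | sort -rn | head -5
```

The process at the top of this list is the OOM Killer's first candidate.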
How is the oom_score number calculated?
Now, what is this number and how is it calculated? Roughly, the oom_score is 10 × the percentage of total system memory the process uses, i.e. its memory use on a 0–1000 (per-mille) scale. For example, if a process uses 1GB out of 2GB of total system memory, its oom_score is 10 × 50% = 500. A score of 1000 means the process is using all of memory (10 × 100% = 1000), which puts it first in line to be terminated. Root-owned processes have 30 subtracted from this number, as they are more privileged.
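The per-mille arithmetic can be sketched for the current shell, using its resident set size as the "memory used" term. This is a rough estimate, not the kernel's exact badness formula (which also counts swap and page tables):

```shell
# score ≈ resident memory / total memory × 1000 (per-mille)
rss_kb=$(awk '/^VmRSS/ {print $2}' /proc/self/status)
total_kb=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
echo $(( rss_kb * 1000 / total_kb ))
```

For a small shell process this prints a number at or near 0, matching the low oom_score such a process would get.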
Can we prevent the OOM Killer from killing a process?
Yes. Apart from the oom_score file, which the kernel generates automatically from the process’s memory use, there is also a file /proc/$PID/oom_score_adj which is used to fine-tune the OOM score. By writing a large negative number to this file you can make a process very unlikely to be killed by the OOM Killer. The values in this file range from -1000 to 1000; lowering the value requires root privileges. If you assign -1000, the process can use 100% of memory and still be skipped by the OOM Killer. On the other hand, if you write 1000, the OOM Killer will target that process first, even if it uses only 1% of memory.
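For example, a process can raise its own adjustment without any privileges (only lowering it needs root). A quick sketch against the current shell:

```shell
# Make this shell a more attractive OOM victim; unprivileged, since we are
# raising the value, not lowering it
echo 500 > /proc/self/oom_score_adj
cat /proc/self/oom_score_adj   # now 500
cat /proc/self/oom_score       # the reported score reflects the adjustment
```

Service managers use the same mechanism, e.g. systemd's OOMScoreAdjust= writes this file for the service's main process.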
Furthermore, there is an older file, /proc/$PID/oom_adj, which serves the same purpose and is kept for backward compatibility; writes to it are scaled into oom_score_adj. This file accepts values from -16 to 15, plus the magic value -17 which, when set, tells the kernel the process should never be killed.
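The scaling between the two files can be observed directly (a sketch, assuming the kernel still exposes the deprecated file; the mapping is roughly value × 1000 / 17):

```shell
# Writing the legacy knob updates the modern one: 8 maps to roughly 470
echo 8 > /proc/self/oom_adj
cat /proc/self/oom_score_adj
```

New tooling should write oom_score_adj directly; the legacy file exists only so old scripts keep working.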
Troubleshooting why the OOM Killer killed a process
Step 1) Check the system log:
When invoked, the OOM Killer leaves breadcrumbs in the system logs, so check there:
# dmesg | egrep -i "killed process"
Step 2) Take a look at syslog:
# grep -i "out of memory" /var/log/kern.log
host kernel: Out of Memory: Killed process 2592 (mysql).
# grep -i "kill" /var/log/syslog
<process> invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Here, “Killed” means the process was terminated with the SIGKILL (-9) signal, which is a good sign that the OOM Killer was invoked.
Step 3) Inspect the daily atop log:
# atop -r /var/log/atop/atop_YYYYMMDD
Disable the OOM Killer
Strictly speaking, the OOM Killer cannot be switched off, but disabling memory overcommit makes it far less likely to run. With strict accounting, allocations beyond the commit limit simply fail with ENOMEM instead of succeeding and later triggering the OOM Killer:
# sysctl vm.overcommit_memory=2
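Strict accounting is usually paired with vm.overcommit_ratio, which sets what percentage of RAM counts toward the commit limit (limit = swap + ratio% of RAM). A sketch, requiring root; the ratio of 80 is illustrative, not a recommendation:

```shell
# Enable strict accounting and size the commit limit
sysctl vm.overcommit_memory=2
sysctl vm.overcommit_ratio=80
# Persist across reboots
printf 'vm.overcommit_memory = 2\nvm.overcommit_ratio = 80\n' >> /etc/sysctl.conf
```

Be aware that under strict accounting, programs that rely on large speculative allocations may start failing with out-of-memory errors even when plenty of RAM is free.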