
We have a Linux server running Debian 4.0.5 (Kernel 4.0.0-2) with 32G RAM installed and 16G swap configured. The system uses lxc containers for compartmentalisation, but that shouldn't matter here: the issue exists both inside and outside the containers.

Here's a typical free -h:

              total        used        free      shared  buff/cache   available
Mem:            28G        2.1G         25G         15M        936M         26G
Swap:           15G        1.4G         14G

/proc/meminfo has

Committed_AS: 12951172 kB 

So there is plenty of free memory, even if everything that has been committed were actually in use at once. Nevertheless, the system immediately pages out even running processes.
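For reference, these are the counters being compared above (the ones behind free -h and the Committed_AS figure); a quick way to snapshot them, as a sketch using the standard /proc/meminfo field names:

grep -E 'MemTotal|MemFree|MemAvailable|SwapTotal|SwapFree|Committed_AS|CommitLimit' /proc/meminfo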

This is most notable with Gitlab, a Rails application using Unicorn: newly forked Unicorn workers are swapped out almost immediately. When a request comes in, the worker has to be read back from disk at ~1400 kB/s (data from iotop) and runs into the timeout (currently 30 s, chosen so that stuck workers get restarted in time; no normal request should take more than 5 s) before it is fully loaded back into memory, so it gets killed right away. Note that this is just an example; I have seen this happen to redis, amavis, postgres, mysql, java (OpenJDK) and others.
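To put a number on that: at ~1400 kB/s, reading back, say, 400 MB of a swapped-out worker (a made-up but plausible size for a Rails worker) takes roughly 400000 kB / 1400 kB/s ≈ 285 s, an order of magnitude more than the 30 s timeout. A quick sketch to see how much of each worker currently sits in swap (the pgrep pattern is an assumption; adjust it to your actual process names):

for pid in $(pgrep -f 'unicorn.*worker'); do
    printf '%s\t' "$pid"
    awk '/VmSwap/ {print $2, $3}' /proc/$pid/status   # amount of this worker currently in swap
done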

The system is otherwise in a low-load situation with about 5% CPU utilization and a loadavg around 2 (on 8 cores).

What we tried (in no particular order; the commands are sketched after this list):

  1. swapoff -a: fails with about 800M still swapped
  2. Reducing swappiness (in steps) using sysctl vm.swappiness=NN. This seems to have no impact at all; we went all the way down to 0 and still see exactly the same behaviour
  3. Stopping non-essential services (Gitlab, a Jetty-based webapp, ...), freeing about 8G of committed-but-not-mapped memory and bringing Committed_AS down to about 5G. No change at all
  4. Clearing system caches using sync && echo 3 > /proc/sys/vm/drop_caches. This frees up memory, but does nothing about the swap situation
  5. Combinations of the above
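Roughly what was run for steps 1, 2 and 4 (a sketch, executed as root; the swappiness value shown is just one of the stages we tried on the way down to 0):

swapoff -a                                   # step 1: stalls with ~800M still in swap
swapon -a                                    # re-enable swap after the failed attempt
sysctl vm.swappiness=10                      # step 2: lowered in stages, eventually to 0
sync && echo 3 > /proc/sys/vm/drop_caches    # step 4: drop page cache, dentries and inodes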

Restarting the machine to completely disable swap via fstab as a test is not really an option: some services have availability requirements and need planned downtimes rather than ad-hoc poking around... and we don't really want to disable swap as a fallback anyway.

I don't see why any swapping should be occurring here. Any ideas what may be going on?


This problem has existed for a while now, but it first showed up during a period of high IO load (a long background data-processing task), so I can't pinpoint a specific event. That task has been finished for some days and the problem persists, hence this question.

  • How did you decrease the swappiness? Did you edit /etc/sysctl.conf or did you use the sysctl command? Only the latter takes effect immediately. Commented Aug 23, 2015 at 17:17
  • You used the correct steps. How did you do them? I suspect an error in your implementation of steps 1-3. Could you edit your question and clarify? Could you also post the output of for szFile in /proc/*/status ; do awk '/VmSwap|Name/{printf $2 "\t" $3}END{ print "" }' $szFile ; done | sort --key 2 --numeric --reverse | head --lines=20 and of sysctl --all | grep swap, and leave a comment? – Fabby Commented Aug 23, 2015 at 18:22
  • @Fabby I already had a script for that, here's what it shows: the sampled ruby2.1 processes are all Unicorn. Other processes further down in terms of swapped size are more php, apache, nginx, redis etc. All of these are regularly active. The only matching sysctl is vm.swappiness, currently set to 1. @the_Seppi: using sysctl; I have also edited my question. Commented Aug 23, 2015 at 20:33
  • The only thing I can think of is that you run both disk-intensive and memory-intensive apps and that the system prefers to keep memory ready to allocate to the disk cache... I'm not familiar with Rails, but can you call C library functions? There are some that let you tell the system "do not swap this memory", and I think that's the direction you need to take. Commented Aug 23, 2015 at 20:44
  • That may be a possibility, but shouldn't explicitly dropping all caches give at least some short-term improvement then? Commented Aug 25, 2015 at 11:44

1 Answer


Remember how I said:

The system uses lxc containers for compartmentalisation, but that shouldn't matter here.

Well, turns out it did matter. Or rather, the cgroups at the heart of lxc matter.

The host machine only sees reboots for kernel upgrades. So, which kernels were used last? 3.19, replaced by 4.0.5 two months ago, and yesterday by 4.1.3. And what happened yesterday? Processes getting killed by the OOM killer left, right and center. Checking /var/log/kern.log, the affected processes were in cgroups with a 512M memory limit. Wait, 512M? That can't be right when the expected requirement is around 4G! As it turns out, this is exactly what we configured in the lxc configs when setting all of this up months ago.
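For anyone chasing the same symptom, this is roughly how it shows up (a sketch; <pid> stands for one of the affected processes, and the grep pattern matches the kernel's per-cgroup OOM message):

cat /proc/<pid>/cgroup                                    # which cgroups the process belongs to
grep -i 'memory cgroup out of memory' /var/log/kern.log   # the per-cgroup OOM kills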

So, what happened is this: 3.19 completely ignored the memory limit for the cgroups; 4.0.5 paged whenever a cgroup needed more memory than it was allowed (which is the core issue of this question); and only 4.1.3 actually enforces the limit with a full OOM-killer sweep inside the cgroup.

The swappiness of the host system had no influence on this, since it never was anywhere near being out of physical memory.

The solution:

For a temporary change, you can modify the cgroup directly; for example, for an lxc container named box1 the cgroup is called lxc/box1 and you can execute (as root on the host machine):

$ echo 8G > /sys/fs/cgroup/memory/lxc/box1/memory.limit_in_bytes 
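To check that the new value actually took effect, and to see current usage against it (assuming the same cgroup v1 path as above):

$ cat /sys/fs/cgroup/memory/lxc/box1/memory.limit_in_bytes   # should now report 8589934592
$ cat /sys/fs/cgroup/memory/lxc/box1/memory.usage_in_bytes   # memory currently charged to the container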

The permanent solution is to configure the container correctly in /var/lib/lxc/...:

lxc.cgroup.memory.limit_in_bytes = 8G 
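The config file is only read when the container starts, so the new limit takes effect on the next restart of the container, for example (the exact invocation depends on how your containers are started; box1 is again just the example name):

$ lxc-stop -n box1 && lxc-start -d -n box1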

Moral of the story: always check your configuration, even if you think it can't possibly be the issue, and even if it takes a different bug or inconsistency in the kernel for it to actually bite.

  • Good one! An edit and an upvote... I wouldn't have thought about this at all, so good to know that my blabbering pushed you in the right direction! Commented Aug 26, 2015 at 21:04
