I have a long-lived process (up for a couple of months) that is handling a huge number of connections and doing real work, yet most statistics on it show it doing zero work. Here's an strace summary showing plenty of syscalls, yet 0 CPU time:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
      0.00    0.000000           0     66376           write
      0.00    0.000000           0     27579           close
      0.00    0.000000           0        77           stat
      0.00    0.000000           0        46           socket
      0.00    0.000000           0        46           connect
      0.00    0.000000           0     21360       347 accept
      0.00    0.000000           0    140039           sendto
      0.00    0.000000           0    169718     21567 recvfrom
      0.00    0.000000           0     21013           getsockname
      0.00    0.000000           0     21013           getpeername
      0.00    0.000000           0     42026           setsockopt
      0.00    0.000000           0        46           getsockopt
      0.00    0.000000           0     63177           fcntl
      0.00    0.000000           0     68858           gettimeofday
      0.00    0.000000           0      1861           clock_gettime
      0.00    0.000000           0       620           epoll_wait
      0.00    0.000000           0     96667           epoll_ctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000000                740522     21914 total

The utime and stime fields for this process are non-zero but completely static, and nowhere near any integer overflow limit that I recognize:

    $ awk '{print "utime:" $14, "ctime:" $15}' /proc/128618/stat
    utime:78891786 ctime:403631467

(Field 15 is stime; the "ctime" label in my awk command is a misnomer.)

What's bizarre is that I see this exact same symptom on several other boxes running this same process. They are all on kernel 3.13.0. The process is single-threaded and not doing anything fancy (it's pgbouncer).
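
To check that the counters really are frozen rather than just moving slowly, they can be sampled twice and compared. The following is only a minimal sketch, not a definitive tool: the function names `parse_cpu_ticks` and `sample_cpu_delta` and the 10-second default interval are my own choices for illustration. It assumes the Linux /proc/<pid>/stat layout, where field 14 is utime and field 15 is stime, both in clock ticks.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The comm field (field 2) is parenthesised and may contain spaces, so
    # split on the last ')' instead of counting whitespace-separated columns.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def sample_cpu_delta(pid, interval=10.0):
    """Read utime/stime twice, `interval` seconds apart.

    Returns (user_seconds, system_seconds) accumulated during the interval.
    """
    ticks_per_sec = os.sysconf("SC_CLK_TCK")  # commonly 100 on Linux
    with open(f"/proc/{pid}/stat") as f:
        before = parse_cpu_ticks(f.read())
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as f:
        after = parse_cpu_ticks(f.read())
    return tuple((a - b) / ticks_per_sec for a, b in zip(after, before))
```

Calling `sample_cpu_delta(128618)` while the process is visibly serving traffic and getting `(0.0, 0.0)` back would reproduce the symptom described above; any non-zero delta would mean the counters are advancing after all.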

Why might a process keep doing work without any of these statistics reflecting that work?