
I have a long-lived process (up for a couple of months) that handles a huge number of connections and does real work, yet most statistics show it doing zero work. Here is strace summary output showing plenty of syscalls but zero CPU time:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0     66376           write
  0.00    0.000000           0     27579           close
  0.00    0.000000           0        77           stat
  0.00    0.000000           0        46           socket
  0.00    0.000000           0        46           connect
  0.00    0.000000           0     21360       347 accept
  0.00    0.000000           0    140039           sendto
  0.00    0.000000           0    169718     21567 recvfrom
  0.00    0.000000           0     21013           getsockname
  0.00    0.000000           0     21013           getpeername
  0.00    0.000000           0     42026           setsockopt
  0.00    0.000000           0        46           getsockopt
  0.00    0.000000           0     63177           fcntl
  0.00    0.000000           0     68858           gettimeofday
  0.00    0.000000           0      1861           clock_gettime
  0.00    0.000000           0       620           epoll_wait
  0.00    0.000000           0     96667           epoll_ctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                740522     21914 total

The utime and stime fields for this process are non-zero but completely static, and nowhere near any integer overflow limit that I recognize:

$ awk '{print "utime:" $14, "stime:" $15}' /proc/128618/stat
utime:78891786 stime:403631467
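To tell counters that are truly frozen from ones that are merely slow-moving, the same two fields can be sampled twice and converted to seconds. This is only a minimal sketch under a few assumptions: `cpu_times` is a hypothetical helper name, the layout is the Linux `/proc/<pid>/stat` one in which utime and stime are fields 14 and 15 measured in clock ticks, and `getconf CLK_TCK` reports the ticks-per-second value (commonly 100).

```shell
#!/bin/sh
# cpu_times PID: print utime and stime of a process in seconds.
# Hypothetical helper for illustration; not part of any existing tool.
cpu_times() {
    pid=$1
    ticks=$(getconf CLK_TCK)   # clock ticks per second, commonly 100

    # Strip everything up to the last ") " so that a command name
    # containing spaces cannot shift the field positions. After the
    # strip, the original fields 14 and 15 become fields 12 and 13.
    sed 's/^.*) //' "/proc/$pid/stat" |
        awk -v hz="$ticks" \
            '{ printf "utime:%.2fs stime:%.2fs\n", $12 / hz, $13 / hz }'
}
```

Running `cpu_times 128618; sleep 10; cpu_times 128618` while the process is serving traffic would show whether either value moves at all between the two samples.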

What's bizarre is I see this exact same symptom on several other boxes, with this same process. These are all running on kernel 3.13.0. This process is single-threaded, and not doing anything fancy (pgbouncer).

Why might a process continue to do work without having any of these statistics reflect that work occurring?

1 Answer


This is caused by a bug in the Linux kernel.

  • Is it possible, that kernel version is old enough... Commented May 23, 2020 at 5:01
