Note that CFS utilizes a periodic timer interrupt, which means it can only make decisions at fixed time intervals. This interrupt goes off frequently (e.g., every 1 ms), giving CFS a chance to wake up and determine if the current job has reached the end of its run.
This confuses me for the following reasons:
- CFS gives each runnable task a time slice (say ~6–48 ms).
- The periodic timer tick is more frequent than the time slice (e.g., 1 ms tick vs. ~6–48 ms slice).
- If a program voluntarily gives up the CPU (blocks on I/O, calls sched_yield, etc.), it will trap into the kernel and the scheduler can run immediately, without waiting for the next timer tick.
Intuitively, if the time slice is 10 ms, I would expect the kernel to set a one-shot timer that fires at the end of that slice. If the program yields early, the kernel can schedule a new process and re-arm the timer to fire 10 ms later. Wouldn't this avoid the overhead of the intermediate timer interrupts (each one still traps into the kernel), even if the savings are small?
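To make the comparison concrete, here is a toy model of the two designs I have in mind. It only counts timer interrupts. The function names, the fixed 10 ms slice, and the list of CPU bursts are all made up for illustration, and it assumes a single CPU where the timer is used for nothing except ending a slice, which is probably not true of a real kernel.

```python
def periodic_interrupts(bursts_ms, tick_ms=1):
    """Timer interrupts with a fixed periodic tick.

    The tick fires every tick_ms no matter what the running task does,
    so the count depends only on total elapsed CPU time.
    """
    return sum(bursts_ms) // tick_ms


def one_shot_interrupts(bursts_ms, slice_ms=10):
    """Timer interrupts when the timer is armed once per dispatch.

    Each entry in bursts_ms is how long a task would run before giving up
    the CPU voluntarily (hypothetical workload). The timer only fires when
    a burst outlasts the slice; a task that yields at or before the end of
    its slice never triggers it (the timer is simply re-armed).
    """
    interrupts = 0
    for burst in bursts_ms:
        remaining = burst
        while remaining > slice_ms:
            interrupts += 1        # slice expired: timer fires, task is preempted
            remaining -= slice_ms  # task is dispatched again later
    return interrupts


if __name__ == "__main__":
    bursts = [3, 25, 10, 7]  # ms of CPU each task wants before blocking/yielding
    print("periodic tick:", periodic_interrupts(bursts))
    print("one-shot timer:", one_shot_interrupts(bursts))
```

For these made-up bursts (45 ms of CPU time in total), the periodic 1 ms tick fires 45 times, while the one-shot timer fires only twice, both during the 25 ms burst. That gap is the overhead I am asking about.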