\$\begingroup\$

I'm working with a simple non-preemptive task scheduler on a microcontroller. The goal is to run several tasks periodically based on their defined rates.

I'm currently studying embedded systems and have been learning about bare-metal schedulers. There is one part I don't quite understand: why do my tasks run normally when last_run is assigned before the task function is called, yet stop working correctly after I simply swap the order of those two statements?

Here's the core scheduling loop concept:

// Assuming Scheduler_Task is an array of structs like:
// typedef struct {
//     void (*task_func)(void);       // Pointer to the task function
//     unsigned long int rate_ms;     // Desired period in milliseconds
//     unsigned long int last_run;    // Timestamp of the last run start
// } Task_t;
// extern Task_t Scheduler_Task[];
// extern int task_num;
// extern unsigned long int uwTick;   // System millisecond tick timer

for (int i = 0; i < task_num; i++) {
    unsigned long int now_time = uwTick;  // Get current time

    // Check if it's time to run the task
    if (now_time >= (Scheduler_Task[i].last_run + Scheduler_Task[i].rate_ms)) {
        // ------------ VERSION 1: Working Order ------------
        Scheduler_Task[i].last_run = now_time;  // Update last run time FIRST
        Scheduler_Task[i].task_func();          // Execute task SECOND

        /*
        // ------------ VERSION 2: Problematic Order ------------
        Scheduler_Task[i].task_func();          // Execute task FIRST
        Scheduler_Task[i].last_run = now_time;  // Update last run time SECOND
        */
    }
}

Observation:

Version 1 (Update last_run Before Task): This version seems to work as expected. Tasks run approximately at their interval over the long term.

Version 2 (Update last_run After Task): When I swap the order as shown in Version 2, I observe timing drift and the MCU misbehaves: functions such as Seg and Voltage no longer display correctly. Tasks tend to run less frequently than their specified rate. The actual interval between task starts gradually becomes longer than rate_ms, and the error accumulates over time, especially if a task takes a noticeable amount of time to execute.

I understand that a task function takes a non-zero amount of time to execute. However, I'm struggling to pinpoint exactly why updating last_run after the task execution (Version 2) causes this accumulating timing drift.

Conversely, how does updating last_run before the task (Version 1) effectively prevent or compensate for this drift, ensuring the task scheduling stays closer to the intended interval between starts? It seems counter-intuitive to me that updating it before the call somehow accounts for the execution time better than updating it after.

Could someone explain the precise mechanism behind this behavior, in the context of calculating the next execution time from last_run + rate_ms?

The MCU is an 8051-core part made by WCH, model number IAP15F2K1S2. uwTick is incremented by 1 every millisecond in a timer interrupt. I have recently been learning embedded systems and am still unsure about this part of the scheduler: after swapping the two statements as shown above, the tasks no longer run normally. To me the two versions look equivalent -- last_run is always assigned the value of now_time, and now_time does not change between the if check and the assignment, so the order of the assignment and the function call should not matter. Yet the behavior differs before and after the swap. I asked an AI, but I did not understand its answer.

\$\endgroup\$
  • \$\begingroup\$ (You should also check the generated assembly code, each way. I've some concerns that would be dispelled or confirmed based upon what you see in the generated assembly code. It's just a small worry right now.) What is the update rate for the timer you have? Is it just every 1 ms? \$\endgroup\$ Commented Apr 21 at 7:08
  • \$\begingroup\$ @RussellMcMahon The current now_time is fetched before the execution in both cases. The result for last_run should be identical regardless of whether the task is executed before or after setting now_time. Maybe it gets clogged by stack overwrite or something. \$\endgroup\$ Commented Apr 21 at 7:29
  • \$\begingroup\$ The uwTick is a private static volatile variable inside the HAL library and thus it should not be possible to access it directly. But it might depend on the HAL version. Please explain which STM32 MCU you use, which HAL version, how you enable access to the said variable, and why you are not using the getter function HAL_GetTick()? \$\endgroup\$ Commented Apr 21 at 10:18
  • \$\begingroup\$ It appears from light reading today that the STM32 HAL permits the timer interrupt period to be set as desired. In this case, I'm assuming that hasn't been altered and it is ticking along at 1 ms. The fact that @qww can link to uwTick itself strongly violates principles espoused within the source code about data-hiding. They even go to the trouble of providing public structures with obfuscated entries to hide their internal structure details while still providing a way for users to find out the size of those structures. There is a GetTick function that should be used for uwTick. \$\endgroup\$ Commented Apr 22 at 18:43
  • \$\begingroup\$ Also, as @brhans has pointed out (and I was earlier thinking and writing, but edited out and instead just asked for the assembly output), uwTick is directly modified by code called by (or directly, depending on the version) the timer interrupt code. As such, if it is to be accessed directly, it must be declared volatile. That's not done in the source code I read, likely because they never intended the variable to be directly read by user's source code. Again, this is a violation of the rules they have expressed. GetTick is the function that they intended for user code. \$\endgroup\$ Commented Apr 22 at 18:47

1 Answer

\$\begingroup\$

just a comment -- you need to say and write more

You haven't been very interactive here, and there are some questions still outstanding -- in particular, a request to show the assembly output for the two cases that exhibit different behavior. That would be very helpful to see and would likely explain a lot. For now, there is just too little information from you to nail this down to a root cause.

delta list/queue

Since you are looking for a cooperative scheduler and since I mentioned a delta queue or delta list to you, I thought I'd talk about that part. I'll assume that you can somehow debug and/or work out a method that allows the following function to be written:

long int last = (long int) *(volatile unsigned long int *) &uwTick;

long int deltatime( void )
{
    long int curr = (long int) *(volatile unsigned long int *) &uwTick;
    long int diff = curr - last;
    last = curr;
    return diff;
}

If you can get something akin to that working correctly, providing values of 0 or larger (can be more than 1), then I can use that for a simple polling loop to provide what I think you are shooting for. It's more code. But it is more versatile.
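As one possible stepping stone toward that -- purely a sketch, under the assumption that uwTick is a multi-byte counter incremented from a 1 ms timer ISR and declared volatile (as the comments under the question suggest it must be) -- the read itself can be made safe on an 8-bit core by repeating it until two consecutive reads agree, since a 4-byte load can otherwise be torn mid-update by the interrupt:

/* Sketch only: assumes 'uwTick' is a volatile unsigned long incremented by
   a 1 ms timer ISR. On an 8-bit core a multi-byte read can be interrupted
   partway through, so repeat the read until two consecutive reads match.  */
static unsigned long int read_tick(void)
{
    unsigned long int a, b;
    do {
        a = uwTick;     /* first read */
        b = uwTick;     /* second read */
    } while (a != b);   /* retry if the ISR bumped the counter mid-read */
    return a;
}

deltatime() above could then call such a read_tick() in place of the direct cast through &uwTick.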

Let's start with three tasks. The first task is to be executed every 19 milliseconds. The second task is to be executed every 15 milliseconds. And the third task, every 27 milliseconds. Also, I'll assume that all three are to be scheduled to run in the first millisecond, together. After that, they diverge and rejoin each other per their individual time periods, as appropriate.

void task1(void) { insert(0, 19); printf(" T1"); }
void task2(void) { insert(1, 15); printf(" T2"); }
void task3(void) { insert(2, 27); printf(" T3"); }

Note that they are responsible for re-inserting themselves into the queue with the time delay they want, relative to the "current moment" they are executed. These routines may take much longer than 1 millisecond, if they want. That's okay. The scheduler won't miss a beat.

However, there may be some delay beyond the 1 millisecond period. If so, then that delay should impact deltatime shown above, causing it to yield a value larger than 1 because of the excess delay involved in running your tasks at the same instant of time.

Next is the list of the above tasks. I'm using a simple array that can be allocated statically, dynamically, or pretty much any way you please. The idea depends on the fact that you know, at compile time, the number of tasks. If that's not the case, if you need an extensible queue, then I'll leave the implementation of it up to you. But for my example case, I'll assume you know you have exactly three tasks.

I also need two more elements in the list, so a total of five in this example. One is for the queue HEAD and the other is for the queue TAIL. This wastes a little bit of data for simpler code. Again, you can do it differently. But this is how I'm handling it here.

typedef struct mytask_s {
    long int delta;
    int next;
    int prev;
    void (*run)(void);
} mytask_t;

mytask_t tasks[] = {
    { 0, 0, 0, &task1 },    /* T1 is tasks[0] */
    { 0, 0, 0, &task2 },    /* T2 is tasks[1] */
    { 0, 0, 0, &task3 },    /* T3 is tasks[2] */
    { 0, 0, 0, 0 },         /* queue HEAD is tasks[3] */
    { 0, 0, 0, 0 }          /* queue TAIL is tasks[4] */
};

#define QHEAD ((int) (sizeof(tasks)/sizeof(tasks[0])-2))
#define QTAIL ((int) (sizeof(tasks)/sizeof(tasks[0])-1))

With the insert code here:

void insert(int taskID, long int delta)
{
    int prev = QHEAD, next = tasks[QHEAD].next;
    for (; next != QTAIL && tasks[next].delta <= delta; prev = next, next = tasks[next].next)
        delta -= tasks[next].delta;
    tasks[taskID].next = next;
    tasks[taskID].prev = prev;
    tasks[taskID].delta = delta;
    tasks[next].prev = tasks[prev].next = taskID;
    if (next != QTAIL)
        tasks[next].delta -= delta;
    return;
}

There's not much more to all this:

int main(int argc, char* argv[])
{
    /* initialize empty queue */
    tasks[QHEAD].next = tasks[QHEAD].prev = QTAIL;
    tasks[QTAIL].next = tasks[QTAIL].prev = QHEAD;

    /* insert each of the three tasks into their initial positions */
    /* this may be done in any order -- not just the one shown */
    insert(0, 1);   /* next tick, task 0 */
    insert(1, 1);   /* next tick, task 1 */
    insert(2, 1);   /* next tick, task 2 */

    long int timestamp = 0;
    printf("time: %ld\n", timestamp);
    printqueue(timestamp);

    for (;;) {
        long int delta = deltatime();
        if (delta < 1)
            continue;
        timestamp += delta;
        int next = tasks[QHEAD].next;
        if (next != QTAIL)
            do {
                long int residual = tasks[next].delta - delta;
                if (residual > 0) {
                    tasks[next].delta = residual;
                    break;
                }
                printf("\ntime: %ld: ", timestamp);
                do {
                    /* remove the current task from the queue */
                    int nextup = tasks[next].next;
                    tasks[nextup].prev = tasks[next].prev;
                    tasks[tasks[next].prev].next = nextup;
                    /* run the current task */
                    tasks[next].run();
                    /* move to the next possible task */
                    next = nextup;
                } while (next != QTAIL && tasks[next].delta == 0);
                /* there still may be some more time to account for */
                /* in the newly recomputed queue -- so check on that */
                delta = -residual;
                printf("\n");
                printqueue(timestamp);
            } while (next != QTAIL && delta > 0);
        /* all time passed has been accounted for */
    }
    printf("time: %ld\n", timestamp);
    printqueue(timestamp);
    return 0;
}

The printqueue code is just for diagnostics (along with some of the above code that uses printf):

void printqueue(long int timestamp)
{
    for (int i = tasks[QHEAD].next; i != QTAIL; i = tasks[i].next)
        printf(" T%d: %ld\n", i+1, tasks[i].delta);
    return;
}

example run

If I set things up so that deltatime() just returns 1 all the time (the clock advances smoothly on the assumption that all of the tasks can be performed together in less than 1 millisecond) then I get the following output:

(One line per step below: the timestamp, the tasks run at that moment, then the queue as printed afterward.)

time: 0               T1: 1   T2: 0   T3: 0
time: 1:   T1 T2 T3   T2: 15  T1: 4   T3: 8
time: 16:  T2         T1: 4   T3: 8   T2: 3
time: 20:  T1         T3: 8   T2: 3   T1: 8
time: 28:  T3         T2: 3   T1: 8   T3: 16
time: 31:  T2         T1: 8   T2: 7   T3: 9
time: 39:  T1         T2: 7   T3: 9   T1: 3
time: 46:  T2         T3: 9   T1: 3   T2: 3
time: 55:  T3         T1: 3   T2: 3   T3: 21
time: 58:  T1         T2: 3   T1: 16  T3: 5
time: 61:  T2         T2: 15  T1: 1   T3: 5
time: 76:  T2         T1: 1   T3: 5   T2: 9
time: 77:  T1         T3: 5   T2: 9   T1: 5
time: 82:  T3         T2: 9   T1: 5   T3: 13
time: 91:  T2         T1: 5   T2: 10  T3: 3
time: 96:  T1         T2: 10  T3: 3   T1: 6
time: 106: T2         T3: 3   T1: 6   T2: 6
time: 109: T3         T1: 6   T2: 6   T3: 15
time: 115: T1         T2: 6   T1: 13  T3: 2
time: 121: T2         T1: 13  T3: 2   T2: 0
time: 134: T1         T3: 2   T2: 0   T1: 17
time: 136: T3 T2      T2: 15  T1: 2   T3: 10
. . .

For example, you'd expect to see that T1 executes at time 1, time 20, etc.

And that's what you see:

T1: 1, 20, 39, 58, 77, 96, 115, 134 ...
T2: 1, 16, 31, 46, 61, 76, 91, 106, 121, 136 ...
T3: 1, 28, 55, 82, 109, 136 ...

All of the above are appropriate. And they do not drift.

(Note: If you look at the timestamp shown and then look at the ordered numbers in the queue displayed in the diagnostic immediately following the timestamp and any tasks that were run, you will see that the timestamp plus the first value gives you the next timestamp. The timestamp plus the first two values gives the timestamp following that one. And the timestamp plus all three values gives the timestamp after that one. These are deltas relative to the current moment and they are ordered.)
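As a quick worked check against the trace above: right after time 1 the queue reads T2: 15, T1: 4, T3: 8, so the upcoming events land at \$1+15=16\$ (T2 runs), \$1+15+4=20\$ (T1 runs), and \$1+15+4+8=28\$ (T3 runs) -- exactly the time: 16, time: 20, and time: 28 lines shown in the example run.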

side note 1

One nice thing is that you can allow each of the tasks to perform their own rescheduling, as I do here in the first line of each task (if the next delay is known at that point), or later -- just so long as it is performed before the task exits. That frees the scheduling code itself from having to worry about that part of it.

On the other hand, you could add another value to the task structure itself and let the scheduling code use that value, instead, and keep the rescheduling code out of the tasks, themselves.

Either way is fine.
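For instance, here is a minimal sketch of that second approach; the period field and the mytask2_t name are illustrative additions and not part of the mytask_t defined earlier. The point is simply that the dispatch loop, rather than the task body, performs the re-insertion:

/* Sketch only: 'period' is a hypothetical extra field. */
typedef struct mytask2_s {
    long int delta;     /* time remaining after the previous queue entry */
    long int period;    /* fixed re-insertion interval, in ticks         */
    int next;
    int prev;
    void (*run)(void);
} mytask2_t;

/* In the dispatch loop, after unlinking the expired entry 'next' from  */
/* the queue, the scheduler would then do the rescheduling itself:      */
/*                                                                       */
/*     insert(next, tasks[next].period);   // re-insert relative to now  */
/*     tasks[next].run();                  // then run the task body     */

The tasks themselves then stay free of any scheduling calls, at the cost of a fixed period per task.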

side note 2

Also take note that the above method does NOT require looking at all the tasks. Only the first task in the queue is examined. The reason is that the insertion routine ensures that each task after the first in the queue only stores the time remaining after the task before it. So in the above example, when I first inserted all three to start at time 1, the queue would have a delta of 1 for the first entry, a delta of 0 for the second, and a delta of 0 for the third. That's because when the first task starts, its delta of 1 will have expired and the scheduler will then run all three of them at once (anything after the first queue entry that has a delta of 0 is run at the same time as the first entry.) This saves looping over all the tasks all the time. Only the first entry needs to be examined, because it is known a priori that all the rest in the queue are to be executed at the same time, or else later. The deltas keep track of by how much.

complex tasks

Finally, although I started this out by using a fixed-size array with a known set of tasks, the fact is that the above works perfectly well if the array is instead sized for the maximum number of tasks while the number actually in use is smaller and varies over time.
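As a rough sketch of what that might look like -- the MAX_TASKS, in_use[], and spawn() names here are illustrative additions, not part of the code above -- you could declare tasks[] with MAX_TASKS + 2 entries (so HEAD and TAIL still occupy the last two slots) and hand out free slots on demand:

/* Sketch only: assumes tasks[] has been declared with MAX_TASKS + 2  */
/* entries so that QHEAD and QTAIL still refer to the last two slots. */
#define MAX_TASKS 8

static int in_use[MAX_TASKS];          /* which task slots are occupied */

int spawn(void (*fn)(void), long int delay)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            tasks[i].run = fn;         /* claim the slot */
            insert(i, delay);          /* schedule it with the existing insert() */
            return i;                  /* the slot index doubles as the task ID */
        }
    }
    return -1;                         /* no free slot available */
}

A task that does not re-insert itself would clear its in_use entry before exiting, freeing the slot for some other task later.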

For example, to illustrate just how useful this scheme can be, in one measurement system where I applied the above idea, I knew that I had 100 milliseconds per measurement to be emitted precisely. The system spec was 10 measurements per second.

However, each measurement consisted of a varying number of decay curves, their \$\tau\$ or duration being predicted by the prior measurement. And each decay reading period had to be preceded by an illumination period that was also variable in duration. In the worst case, I knew I needed 10 milliseconds of dead time to allow the prior decay curves to settle out enough so that I could make a measurement offset reading, as well.

So the total cycle consisted of N [illumination + decay cycles + short-delay] cycles, followed by a dead period waiting for offset to arrive, followed by an offset measurement, followed by a measurement calculation cycle and DAC update, followed by a very short dead period, before the next cycle began again.

Each measurement cycle task would schedule two things: itself, 100 milliseconds from `now', and the first [illumination + decay cycles + short-delay] cycle, to occur almost immediately. It would also set up a remaining-time counter for the current cycle.

Each [illumination + decay cycles + short-delay] cycle would start the illumination and then schedule the decay ADC reading task, as well as scheduling the next [illumination + decay cycles + short-delay] cycle, if the remaining time counter permitted it. Otherwise, it would schedule the [offset delay + offset ADC reading series + calculation/DAC update] task at an exact period before the next measurement cycle.

Once the DAC update happened, there would only be one task left -- the next cycle to start exactly 100 milliseconds after the last one.

So things can be quite complex, if you need them to be. This is a versatile and yet also relatively simple system that can perform almost any complex task needed. It does all this without requiring all the complexity and difficulties of a pre-emptive scheduler.

summary

Of course, you still must be sure that the tasks to be completed can be completed in the time you allow. Nothing will solve the problem where your tasks require more time than is available!!!

Also, none of the above solves your problem with the timer. Until we see the assembly code for both cases and some more details, perhaps, I think the root cause remains to be seen. But if you can solve that problem, perhaps consider the above as the next step. It's easy and yet quite flexible.

Take a moment and think about your future. Not just this project, but future ones too. Having a good conceptual process is worth its weight in gold. Is this worth the effort to fully understand? I think YES it is. It's not complicated and it can solve very complex problems and do it with simplicity and ease. Does it solve every possible need? Of course not! Pre-emptive systems offer a whole new array of valuable features that you cannot get here. But they also bring along many difficulties, too.

This is a tool like any other. But I can definitely say that this is a tool worth learning about and deepening into your soul. Pay the price. It will give back many dividends in exchange.

Dr. Comer

All of the above comes from work by Dr. Comer. If you want to read more about this example, and more, see his first textbook from 1983 on XINU:

(image: cover of Dr. Comer's first XINU textbook)

You can also read more in his later editions. But to be frank about it, this first edition is really his very best on this topic. The second edition requires more "work" in reading.

Here's a figure discussing delta lists taken from page 238 of his 2nd edition (which I'd also recommend, but not to replace getting the first edition):

(image: figure discussing delta lists, from page 238 of the 2nd edition)

Dr. Douglas Comer also introduced another important concept that is only discussed in detail in his first 1983 book shown above: the idea of memory marking. I cannot recommend highly enough learning about both of these ideas -- the delta list and memory marking -- from his first edition on XINU. The 1983 book is a gold mine, and it is also very accessible and easy to read and understand.

Sadly, his later books had to remove important sections to make room for more modern ones. I can't fault him; that's where the world was (and still is) going. But it means that his later books on XINU don't directly provide the important information on these two key concepts that he introduced to the world. When I put this to him, he didn't argue, instead saying he was impressed that I'd noticed these two specific contributions in particular: delta lists and memory marking. I'm sure that's what caused him to invite me to serve with him for a short period in my life. In my opinion, these two set his work apart from the work of others in the field. And only the 1983 edition provides the fuller view.

(He also writes better for those interested in self-paced learning about operating systems than does Dr. Tanenbaum, who now operates, with others, a website called electoral-vote.)

\$\endgroup\$
  • \$\begingroup\$ @brhans It's the core of 8051. My English is not very good, so the content above was generated by AI \$\endgroup\$ Commented Apr 24 at 0:16
  • \$\begingroup\$ @qww You may want to move your comment to brhans so that is located immediately underneath their comment under your question rather than here underneath my answer to you. Think about it and see if you agree or not. \$\endgroup\$ Commented Apr 24 at 1:44
  • \$\begingroup\$ Latest version of Comer's book appears to be currently aborning. I found a full copy if edition 2 online but will not post a link as I imagine that it is bootleg. \$\endgroup\$ Commented Apr 28 at 13:11
  • \$\begingroup\$ re "Nothing will solve the problem where your tasks require more time than is available!!! " -->. Usually :-) . cf Apollo 1202 error. ~= "I do not have enough time to do everything so I've missed out the less important stuff" :-). || Cause here - quite fascinating wiki2.org/en/… || I long ago wrote a round robin multitasker that under some circumstances could run out of time occasionally. More by luck than careful design it ran "well enough for the task" when occasionally late getting to do things. \$\endgroup\$ Commented Apr 28 at 13:12
