The mighty strace has let me down. How is this possible?
time foo shows that foo takes several seconds to run ("real"), but uses negligible CPU time, both in userspace ("user") and in the kernel ("sys"). For the curious, foo is defined below.
So it spends most of its time waiting for something, not executing CPU instructions. Normally, strace shows me what it is waiting on, i.e. which system call blocks for a long time. Unfortunately, that approach did not work here.
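The real/user/sys split that time reports can be reproduced in a minimal Python sketch (Linux/Unix only, since it relies on the resource module). It measures a child process the same way: wall-clock time around the run, plus the child's CPU time from getrusage(RUSAGE_CHILDREN). The child below is a hypothetical stand-in for foo that simply sleeps, so nearly all of its real time is spent off the CPU.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion and return (real, user, sys) in seconds,
    the same three figures the shell's time reports."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # waits for the child to exit
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN only counts children that have exited and been waited
    # for, so the difference is the CPU time of this child (and any
    # descendants it waited for).
    user = after.ru_utime - before.ru_utime
    sys_time = after.ru_stime - before.ru_stime
    return real, user, sys_time


if __name__ == "__main__":
    # Hypothetical stand-in for foo: a child that spends its time sleeping.
    child = [sys.executable, "-c", "import time; time.sleep(1)"]
    real, user, sys_time = measure(child)
    print(f"real {real:.2f}s  user {user:.2f}s  sys {sys_time:.2f}s")
    # Whatever is not user or sys time was spent waiting (or runnable but
    # not yet scheduled).
    print(f"not on CPU: {real - user - sys_time:.2f}s")
```

A large "not on CPU" figure is exactly the situation described above: the process is waiting, and the question is on what.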
strace -ttt -T -C -w foo shows each system call with a timestamp and its duration, followed by a summary of the (real) time spent in system calls. But for this particular process, the summary showed negligible total (real) time spent inside system calls.
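One way to confirm where the missing time went is to split the trace into time inside system calls and the gaps between them. The sketch below assumes the trace was written with strace -ttt -T -o FILE and without -f (so lines carry no PID prefix): each completed call then starts with an epoch timestamp and ends with its duration in angle brackets. The script name and helper functions are hypothetical, for illustration only.

```python
import re
import sys

# One line of `strace -ttt -T` output, assuming no -f (no PID prefix), e.g.
#   1516541052.341398 openat(AT_FDCWD, "x", O_RDONLY) = 3 <0.000024>
# Group 1: start timestamp, group 2: the call, group 3: its duration.
LINE = re.compile(r"^(\d+\.\d+) (.*) <(\d+\.\d+)>$")


def parse(lines):
    """Return (start, duration, call) for every completed syscall line.

    Lines without a trailing <duration> (signals, "+++ exited", calls that
    never return) do not match and are skipped.
    """
    calls = []
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m:
            calls.append((float(m.group(1)), float(m.group(3)), m.group(2)))
    return calls


def summarize(calls, top=3):
    """Return (span, time inside syscalls, largest gaps between syscalls)."""
    if not calls:
        return 0.0, 0.0, []
    inside = sum(d for _, d, _ in calls)
    span = calls[-1][0] + calls[-1][1] - calls[0][0]
    # Gap = next call's start minus this call's end: time spent outside any
    # syscall (user code, strace's own overhead, or waiting in ways that do
    # not show up as a system call).
    gaps = [
        (t1 - (t0 + d0), c0)
        for (t0, d0, c0), (t1, _, _) in zip(calls, calls[1:])
    ]
    gaps.sort(reverse=True)
    return span, inside, gaps[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        span, inside, gaps = summarize(parse(f))
    print(f"span {span:.3f}s, in syscalls {inside:.3f}s, "
          f"outside syscalls {span - inside:.3f}s")
    for gap, call in gaps:
        print(f"{gap:.3f}s gap after: {call}")
```

Usage, with a hypothetical file name: strace -ttt -T -o trace.txt foo, then python3 gaps.py trace.txt. Since time already showed negligible user CPU time, a large gap here would point at waiting that strace does not attribute to any system call.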
foo is actually journalctl -b -u dev-hugepages.mount, except that the last argument must be changed to a different systemd unit each time in order to reproduce this. In other words, the delay I was investigating happened the first time I tried to get the logs for any one systemd unit. EDIT: after answering the main question, I also realized why I had trouble reproducing the delay.
The time spent by this process is a specific issue which, apparently, does not occur on all systems: https://github.com/systemd/systemd/issues/7963