
I am looking for a command similar to ionice, trickle, pv -L, and tc, but one that puts limits on I/O to files.

ionice looks at block devices on the local machine. I want this to work for file I/O in general. If it does not work for encfs mounted on NFS, it is useless to me.

trickle shapes network I/O - not file system I/O. So if my program accesses a file on encfs, it will have no clue whether this is mounted on top of NFS or a local dir.

tc is similar to trickle: it shapes network I/O - not file system I/O.

pv -L only works on pipes. I need it to work on all files that the program opens. Think Firefox: It opens files for read and write in multiple threads.

I want this:

thecommand --read 5M --write 1M firefox 

to limit firefox to read files at 5 MB/s and write at 1 MB/s.

firefox is just an example: I want it to work for any program that opens different files for both reading and writing.

I want the limitation to work no matter if firefox is accessing a local file or a file on NFS/tmpfs/encfs/SSHFS/RandomFS.

It is OK if it only works on dynamically linked programs (i.e. it is OK if it does some LD_PRELOAD magic).

Does such a beast exist?

  • Can you use rsync instead of cp? It has an overall rate-limit option, IDK if that works when running locally. Commented Oct 8, 2024 at 13:34
  • @PeterCordes I am not interested in copying the file. cp is just a simple example. My use would do stuff to the file (including opening other files) - not just copying it. Commented Oct 8, 2024 at 18:10
  • Are you okay with this just limiting the userspace write throughput (i.e. between the process and the pagecache)? Because that's very much not the same as the throughput to the backing device: writes will be buffered up in the pagecache (with the backing device sitting completely idle) and then background writeout will kick in and flush everything as fast as the backing device allows. This feels like potentially an XY problem (if what you actually want to control is activity of the backing device). Can you shed some light on what your end goal is? Commented Oct 8, 2024 at 19:14
  • 1
    @TooTea End goal: Do not overload my slow NAS. It is connected with 1 Gbit/s, but can really only deliver 80 Mbit/s total. So I want non-important IO to be limited so that important IO can get enough IO. The NAS is access through encfs on NFS on ZFS on LUKS. But I want this to work no matter the file system I access data through - including tmpfs. I do not want a solution that is limited to my current setup, but a general solution. Commented Oct 8, 2024 at 20:01
  • 1
    I was going to suggest piping through pv with a --rate-limit option, but that's basically this answer: unix.stackexchange.com/questions/225455/… Commented Oct 8, 2024 at 21:28

4 Answers


Crappy solution:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <errno.h>

#define LIMIT_MB_PER_SEC 1
#define LIMIT_BYTES_PER_SEC (LIMIT_MB_PER_SEC * 1024 * 1024)

ssize_t (*real_read)(int fd, void *buf, size_t count) = NULL;
ssize_t (*real_write)(int fd, const void *buf, size_t count) = NULL;

static size_t bytes_transferred = 0;
static struct timespec start_time;

void init() {
    real_read = dlsym(RTLD_NEXT, "read");
    real_write = dlsym(RTLD_NEXT, "write");
    clock_gettime(CLOCK_MONOTONIC, &start_time);
}

void sleep_if_needed(size_t bytes) {
    struct timespec current_time;
    clock_gettime(CLOCK_MONOTONIC, &current_time);
    /* Signed arithmetic: the tv_nsec difference can be negative. */
    long elapsed_sec = current_time.tv_sec - start_time.tv_sec;
    long elapsed_nsec = current_time.tv_nsec - start_time.tv_nsec;
    double elapsed_time = elapsed_sec + elapsed_nsec / 1e9;

    if (elapsed_time >= 1.0) {
        /* Reset for next second */
        start_time = current_time;
        bytes_transferred = 0;
    }

    bytes_transferred += bytes;
    if (bytes_transferred > LIMIT_BYTES_PER_SEC) {
        double excess_time =
            (bytes_transferred - LIMIT_BYTES_PER_SEC) / (double)LIMIT_BYTES_PER_SEC;
        usleep((useconds_t)(excess_time * 1e6)); /* Sleep to limit speed */
    }
}

ssize_t read(int fd, void *buf, size_t count) {
    if (!real_read) init();
    ssize_t result = real_read(fd, buf, count);
    if (result > 0)
        sleep_if_needed(result);
    return result;
}

ssize_t write(int fd, const void *buf, size_t count) {
    if (!real_write) init();
    ssize_t result = real_write(fd, buf, count);
    if (result > 0)
        sleep_if_needed(result);
    return result;
}

Run:

gcc -fPIC -shared -o limitrate.so limitrate.c -ldl
LD_PRELOAD=./limitrate.so time wc /tmp/test10M

Major drawback: It does not take options for the speed, but it might be a basis for another answer.

While this is a crappy solution, the core functionality pretty much does The Right Thing: It works on all files by intercepting read and write. It does not care if the files are stored locally or on a network file system. It does not require any specialized setup that would require root access.

Edit:

Now made into a small command line tool: https://git.data.coop/tange/tangetools/src/branch/master/iothrottle
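The hard-coded limit could instead be read from environment variables set by a wrapper script, as one of the comments below also suggests. A minimal sketch, where IOTHROTTLE_READ_BPS and IOTHROTTLE_WRITE_BPS are hypothetical names, not something the published tool actually uses:

```c
/* Sketch only: read separate read/write limits (bytes per second) from
 * environment variables, falling back to 1 MB/s. The variable names are
 * made up for this example. */
#include <stdlib.h>

static long read_limit  = 1024 * 1024;  /* default: 1 MB/s */
static long write_limit = 1024 * 1024;

/* Parse a positive bytes-per-second value from the environment,
 * keeping the fallback on absence or parse failure. */
static long env_limit(const char *name, long fallback) {
    const char *s = getenv(name);
    if (!s)
        return fallback;
    char *end;
    long v = strtol(s, &end, 10);
    return (end != s && *end == '\0' && v > 0) ? v : fallback;
}

/* Call once from init() in the preload library. */
static void apply_env_limits(void) {
    read_limit  = env_limit("IOTHROTTLE_READ_BPS",  read_limit);
    write_limit = env_limit("IOTHROTTLE_WRITE_BPS", write_limit);
}
```

A wrapper could then export the two variables and exec the target program with LD_PRELOAD=./limitrate.so.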

  • 1
    You could "pass" the speed limits via environment variables. Commented Oct 8, 2024 at 9:09
  • Beware of potential race conditions, though. The use of init() here is not thread-safe. Commented Oct 8, 2024 at 16:21
  • 1
    @AndrewHenle Not just that, the whole thing isn't thread-safe at all. Both bytes_transferred and start_time will be accessed concurrently from all threads that happen to call into read/write at the same time, and specifically for start_time just using atomic accesses doesn't cut it (you would need a separate timer per thread). The solution also misses all the other common ways to do I/O, and while this is very fixable for pwrite/pread, it's going to be pretty much impossible to take mmapped files into account (and for many programs, mmap is responsible for the bulk of their IO). Commented Oct 8, 2024 at 19:26
  • Actually, "pretty much impossible" is too pessimistic, I shouldn't have written that (but can't edit it any more). You could catch file-backed mmap(), mprotect() the mapping PROT_NONE, and do the rate-limiting in a SIGSEGV handler (unprotecting the page temporarily so that the access succeeds once the faulting instruction is restarted). However, it's going to be hard to make it reliable: figure out how to re-protect the pages again after access, also catch sigaction()&Co. to stop the application from replacing the handler, plus sleeping in a signal handler isn't the greatest idea, etc. Commented Oct 8, 2024 at 19:48
  • @TooTea This would also miss any calls that do direct kernel traps - I know some implementations of fread()/fscanf()/fgets()/fwrite()/fprintf()/etc directly issue kernel traps instead of using the read() or write() library calls that can be interposed on. But to be fair, it's characterized right up front as a "Crappy solution". It is a good starting point for one that might meet some needs. Commented Oct 8, 2024 at 20:44

Not in a place to write a nice answer with an example, but cgroups can do that for you; systemd-run is your friend when running something in a scope that has limited resources.

  • The only examples I have seen with cgroups are on block-devices. So are you absolutely sure this will work on any file system - including NFS/SSHFS? Commented Oct 7, 2024 at 22:23
  • haven't tried (and as said, not in a place to try), but docs.kernel.org/admin-guide/cgroup-v2.html#io doesn't mention anything about block devices – it's IOPS and bytes-per-second limiting of IO access, not to specific block devices. Commented Oct 8, 2024 at 0:05
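For concreteness, a sketch of what the systemd-run approach could look like, assuming systemd with the cgroup v2 io controller. The device path and the 5M/1M figures are placeholders, and note that the IOReadBandwidthMax/IOWriteBandwidthMax properties attach to a block device, so this likely covers the local-disk case but not NFS or FUSE mounts:

```shell
# Sketch, not verified: run firefox in a transient scope with
# per-block-device bandwidth caps. /dev/sda and the limits are
# placeholders for whatever device and rates apply.
systemd-run --user --scope \
  -p "IOReadBandwidthMax=/dev/sda 5M" \
  -p "IOWriteBandwidthMax=/dev/sda 1M" \
  firefox
```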

The documentation is somewhat dated, but setting up traffic control with tc might be an option.

https://lartc.org/howto/

  • Not useful. Program low-priority and program high-priority save to the same dir, which is an encfs on NFS. How would tc be able to tell which packets were caused by low-priority and which were caused by high-priority? Commented Oct 8, 2024 at 21:39
  • @OleTange as you've added to your question since I first saw it this morning maybe this isn't the right answer. One way I can think you might be able to make this possible on NFS is to have multiple IP addresses from the NFS server, which you mount to different directories to give the priority levels you want. Commented Oct 8, 2024 at 21:45

trickle will rate-limit the user-mode side. Mostly very effective.

  • As far as I can tell, it only limits network access of a process - not file IO. In my case low-priority will access files on encfs mounted on NFS. In other words: low-priority will have no idea that it indirectly causes network access - because it thinks it is simply accessing a file. Commented Oct 8, 2024 at 21:59
  • Yes, it deals with network I/O in the application. It decidedly does not manage NFS network throughput. Something like tc might be able to do that. Commented Oct 9, 2024 at 0:25
