
Say I have 4096 bytes of data that I want to save in a file. It should take exactly one page and could be flushed just once. However, if I write it piece by piece using multiple calls to the kernel, the kernel might interrupt my writing to flush the current content to disk, and then flush again when I finish writing. So to make sure it doesn't waste flushes, I would have to make only one call to the kernel. That call would be write. Does calling write once guarantee that the page will be marked dirty only once?

The man pages mention MAP_UNINITIALIZED, which I could use together with MAP_FIXED to map a prewritten page to the file and then msync to write it out, but they say it's only available on embedded devices.
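For reference, the ordinary mmap route does not need MAP_UNINITIALIZED at all. Below is a minimal sketch using a plain MAP_SHARED mapping followed by msync. The helper name write_page_via_mmap is made up for illustration, and note that this still gives no guarantee about how many times the page becomes dirty: as far as I know, write-back can run between two stores into the mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): replace the contents of
 * `path` with `len` bytes from `src` through a shared file mapping.
 * Returns 0 on success, -1 on error. */
int write_page_via_mmap(const char *path, const void *src, size_t len)
{
    assert(src != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* Set the file size to exactly `len`; the mapped range must be
     * backed by the file before it is touched. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* Stores into the mapping go to the page cache. */
    memcpy(map, src, len);

    /* MS_SYNC requests write-back and waits for it to complete. */
    int rc = msync(map, len, MS_SYNC);
    if (munmap(map, len) < 0)
        rc = -1;
    return rc;
}
```

Calling write_page_via_mmap("data.bin", page, 4096) with a 4096-byte buffer would cover the case described above.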

Kernel source code reveals that write translates to sys_write, which translates to my_syscall3, a macro with assembler code that only fills some registers and emits a single line "syscall\n". I have no clue how to investigate the source code further to find the answer to my question.

EDIT: Now I see the misunderstanding. When I say flush, I (wrongly) don't really mean a flush. I mean a flush to kernel's cache. That place from where the kernel picks pages to flush. I.e. I want to mark the page as dirty only once. That is precisely what I want. I don't want to mark the page as dirty then kernel updates, sets the page as not dirty then I write more and mark it as dirty again. I want to write atomically. Is write atomic with respect to dirtying pages?

  • Can you describe what "atomic in terms of dirtying pages" would mean? Commented Nov 15, 2022 at 1:11
  • And again, what is the problem you're solving. It seems like you want to optimize away updates that are free. Commented Nov 15, 2022 at 1:12
  • That the state of dirtiness doesn't change while write is operating. Specifically, that if I open, write (once), close, then the state of dirtiness doesn't change from clean to dirty twice. Commented Nov 15, 2022 at 1:13
  • How can something change from clean to dirty if it's dirty to begin with? And what would be the overhead here? Commented Nov 15, 2022 at 1:13
  • But you still don't have anything to solve. You just think that these things should be one flush, whereas in reality the desire to write the data out soon has to be balanced against the desire to cache everything for as long as possible. And you've really not shown any attempt at clarifying why that trade-off wouldn't work for you, even after being repeatedly told that you cannot influence it from userland. Commented Nov 15, 2022 at 8:38

1 Answer


From the same man page:

A successful return from write() does not make any guarantee that data has been committed to disk.

There won't necessarily be a single flush, and as a userland process you also can't know the needs of the storage system (512 B blocks on a hard drive, written consecutively to a file but, due to fragmentation, landing at different ends of the disk? A network file system with a timeout? LVM mirrored across 256 disks?) well enough to judge how many physical interactions with the storage medium will be necessary to commit the written data to storage.
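If what you actually need is to know that the data has reached storage, the write man page points to fsync as the call to make after writing. A minimal sketch, assuming a regular file and a made-up helper name write_and_sync; it says nothing about how many device operations the kernel performs underneath:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes at offset 0 of `path` with a
 * single pwrite call, then block until the kernel reports the data as
 * committed. Returns 0 on success, -1 on error or short write. */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    assert(buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* One system call for the whole buffer. */
    ssize_t n = pwrite(fd, buf, len, 0);
    if (n < 0 || (size_t)n != len) {
        close(fd);
        return -1;
    }

    /* fsync returns once the file's data and metadata have been
     * handed to the storage device, or fails with an error. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Whether the kernel wrote the page once or several times before fsync returned is not observable through this interface.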

The only thing that is guaranteed is that if your file system makes POSIX guarantees (so, any of the "UNIXy" file systems like XFS, NFS, …), then the moment write returns, the data just written must be visible to any read that follows.
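That visibility guarantee is easy to check from userland. A small sketch (the function name write_then_read_matches is invented here) that writes a buffer and immediately reads it back, with no fsync in between:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if data written with pwrite is seen by
 * an immediately following pread, 0 if not, -1 if the file can't be
 * opened. Nothing is flushed explicitly. */
int write_then_read_matches(const char *path)
{
    static const char msg[] = "visible after write";
    char back[sizeof msg];

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    int ok = pwrite(fd, msg, sizeof msg, 0) == (ssize_t)sizeof msg
          && pread(fd, back, sizeof back, 0) == (ssize_t)sizeof back
          && memcmp(msg, back, sizeof msg) == 0;

    close(fd);
    return ok ? 1 : 0;
}
```

The read is served from the kernel's cache, so it succeeds whether or not anything has reached the disk yet.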


Without knowing the reason you ask: I'm also not quite sure why you'd worry about avoiding flushes. Filesystems, unless specifically told not to, will cache data and not flush it to disk all the time, but in efficient batches. So it's fairly probable that neither your 4 KB write nor smaller writes would instantly kick off a write to disk, unless the medium is active but idling away anyway.

  • When I said "in one flush" I was restricting the scope to my file and one disk. If it's ever written to the disk then there has been at least one flush. I am asking for a guarantee that there is not more than one flush. Again, I am not going into the specifics of different filesystems and different hardware. I am concerned only with the abstract communication with the kernel. All that the kernel can tell is whether my page is dirty or not. I am essentially asking whether a page can become dirty while the write call is still operating, or only after the call has returned. Also,... Commented Nov 15, 2022 at 0:20
  • Also, the man page doesn't list regular files among the examples of when write can return fewer bytes than requested: "The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes. (See also pipe(7).)" The file is being rewritten, so there is surely enough free space, and 4096 is certainly less than RLIMIT_FSIZE. I can also avoid handling any signals on my thread. Commented Nov 15, 2022 at 0:21
  • I am not using direct IO, so how the kernel flushes is not up to me. I only want to mark the page as dirty once and let the kernel flush it however it thinks is best. That part should be up to me. Commented Nov 15, 2022 at 0:26
  • As said, there is no such guarantee about any specific number of flushes. If you want the kernel to just commit data to disk at its own discretion: just use one or multiple write calls, that's what the kernel does anyway. But you seem to be attributing special properties to your page size, which I don't seem to be able to recognize. Commented Nov 15, 2022 at 0:45
  • Let me ask again: what is the actual problem you think you could solve if you could interfere directly with the write caching from userland and stop the flushing of pages? What's the advantage? Commented Nov 15, 2022 at 0:54
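On the short-write cases quoted from the man page in the comments above: portable code usually doesn't rely on a single write call completing in full, and wraps it in a loop instead. A minimal sketch, with write_all being an illustrative name rather than a standard function:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write until all `len` bytes are
 * accepted by the kernel. Retries when interrupted by a signal before
 * any data was written (EINTR). Returns `len` on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);

    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything */
            return -1;      /* real error; errno is left set */
        }
        p += n;             /* short write: advance past what was taken */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

For a 4096-byte buffer going to a regular file the loop body will normally run once, but the loop costs nothing when it does.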
