
It's well known that redirecting standard output and error to the same file with cmd >out_err.txt 2>out_err.txt can lead to loss of data, as per the example below:

    work:/tmp$ touch file.txt
    work:/tmp$ ls another_file.txt
    ls: cannot access 'another_file.txt': No such file or directory

The above is the setup code for the example: an empty file file.txt exists, and another_file.txt does not. In the code below, I naively redirect both the output and the error of listing these files to out_err.txt.

    work:/tmp$ ls file.txt another_file.txt >out_err.txt 2>out_err.txt
    work:/tmp$ cat out_err.txt
    file.txt
    t access 'another_file.txt': No such file or directory

And we see that we lost a few characters of the error stream. However, using >> works, in the sense that replicating the example with >> keeps both the whole output and the whole error.

Why and how does cmd >>out_err.txt 2>>out_err.txt work?

  • I'm not sure I understand the question. When you use >, any new output overwrites the previous output, as the filename is the same. >> does not overwrite; new output is always added after the last line of the file, i.e. the file is appended to. Can you please clarify, maybe even with examples, what exactly it is you're after? Commented Feb 6, 2023 at 14:05
  • @Peregrino69 I added more detail. Commented Feb 6, 2023 at 14:21
  • ... and I again learned something new from @ilkkachu :-) Commented Feb 6, 2023 at 14:42
  • "Why and how does cmd >>out_err.txt 2>>out_err.txt work?" – Somewhat related: Is redirection with >> equivalent to > when target file doesn't yet exist? Commented Feb 6, 2023 at 20:32

2 Answers


Not sure it's that well known, but it happens because, done like that, the two file handles are completely separate and have independent read/write positions. Hence they can overwrite each other. (They correspond to two distinct open file descriptions, to use the technical term, which is sadly somewhat easy to confuse with the term "file descriptor".)

This only happens with foo > out.txt 2>out.txt, not with foo > out.txt 2>&1, since the latter copies the file descriptor (referring to the same open file description).

When appending, all writes go to the end of the file, as it is at the moment of the write. This is handled by the OS, atomically, so there's no way for even another process to get in the middle. Hence, the issue from independent read/write positions is defused. (Except it might not work over NFS; that's a filesystem restriction.)

In your example, the error message ls: cannot access... is written first, at the start of the file. The write position of the stderr fd is now at the end of the file. Then the regular output of file.txt<newline> is also written, but the write position of the stdout fd is still at the start, so those 9 bytes overwrite part of the error message.

With an appending fd, that second write would go to end, regardless of anything.

  • Thank you, this is helpful. Can you elaborate on how the reading and writing positions work? In my ignorant eyes, it is conceivable that while appending writes to the end of the file, both streams may be writing portions of data to the end of the file one at a time. For example, in the example I now added to the question, stdout may write fi to the end of the file, then stderr may write ls: to the end of the file (resulting in the file starting with fils:), and so on and so forth. Commented Feb 6, 2023 at 14:27
  • @SweetShellO'Mine, the output streams don't do anything themselves; it's the process making the system calls that causes the output. If ls calls write(1, "foo", 3), those three bytes get written to stdout as one unit. If there's only one process, there's no question of ordering. If there was more than one, scheduling might affect how their system calls are interleaved, but the data written in one call should still go as one atomic unit. Commented Feb 6, 2023 at 20:20
  • (Technically, write() can return after writing just some but not all of the given data, but in practice I'd be surprised if that happens when writing to a regular file. If it does happen, it's the job of the program to try again with the rest of the data, and anyway, with just one process involved, there still wouldn't be interleaving. Unless the program goes out of its way to do that on purpose.) Commented Feb 6, 2023 at 20:23
  • more info about file position can be found in the lseek(2) man page Commented Feb 7, 2023 at 2:43
  • Minor nitpick: not all writes are atomic; the kernel only guarantees atomicity up to a certain size (I forget which constant it is off the top of my head). For many writes it won't matter, but for large writes you will still see races. Commented Feb 7, 2023 at 5:16
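As the comments above note, a program is responsible for retrying when write(2) returns a short count. A hypothetical helper (write_all is my name for it, not something from this thread) might look like:

```c
/* Retry loop for short or interrupted writes: keeps calling write()
   until all `len` bytes are out, or a real error occurs. */
#include <errno.h>
#include <unistd.h>

ssize_t write_all(int fd, const char *buf, size_t len) {
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted: just retry */
            return -1;                     /* real error */
        }
        done += (size_t)n;                 /* advance past what was written */
    }
    return (ssize_t)done;
}
```

Note that even with this loop, each individual write() call is still what the kernel treats as a unit; the loop only guarantees completeness, not atomicity of the whole buffer.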

Simple redirection open(2)s the file with the O_CREAT and O_TRUNC options; this creates an empty file and positions the file offset at the first byte.

Appending to a file opens it with the O_APPEND option, which causes a seek to the current end of file before each write operation.

From man 2 open:

    O_APPEND
           The file is opened in append mode.  Before each write(2),
           the file offset is positioned at the end of the file, as if
           with lseek(2).  The modification of the file offset and the
           write operation are performed as a single atomic step.

In other words, appending writes are guaranteed by the kernel not to clash.
