You can improve on the solution suggested by @pskocik by reducing the number of calls to read. Reading a 1 GB file in BUFSIZ-sized chunks means a lot of read calls. The usual way to cut that down is to increase the buffer size:

  • just for fun, try increasing the buffer size by a factor of 10, or 100. On my Debian 7 machine, BUFSIZ is 8192, so the original program performs about 120 thousand read operations. You can probably afford a 1 MB input buffer, which cuts that by a factor of roughly 100 (see the sketch after this list).
  • for a better result, an application can allocate a buffer as large as the file, so a single read suffices. That works well enough for "small" files (though some readers will have more than 1 GB of memory on their machine).
  • finally, you could experiment with memory-mapped I/O, which avoids explicit buffer management altogether (a sketch follows further below).
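
As a rough illustration of the larger-buffer approach, here is a minimal read loop using a 1 MB buffer. The buffer size and the newline-counting workload are assumptions for the example, not the original program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>

    #define MYBUFSIZ (1024 * 1024)  /* 1 MB instead of BUFSIZ (8192 here) */

    int main(int argc, char **argv)
    {
        int fd = (argc > 1) ? open(argv[1], O_RDONLY) : STDIN_FILENO;
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        char *buffer = malloc(MYBUFSIZ);
        if (buffer == NULL) {
            perror("malloc");
            return EXIT_FAILURE;
        }

        /* each read() now fetches up to 1 MB, so a 1 GB file needs
         * roughly a thousand calls instead of ~120 thousand */
        long lines = 0;
        ssize_t got;
        while ((got = read(fd, buffer, MYBUFSIZ)) > 0) {
            /* illustrative workload: count newlines; the real program's
             * per-chunk work may differ */
            for (ssize_t i = 0; i < got; ++i) {
                if (buffer[i] == '\n')
                    ++lines;
            }
        }

        printf("%ld\n", lines);
        free(buffer);
        return EXIT_SUCCESS;
    }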

When benchmarking the various approaches, you might keep in mind that some systems (such as Linux) use most of your machine's unused memory as a disk cache. A while back (almost 20 years ago, mentioned in the vile FAQ), I was puzzled by unexpectedly good results from a (not very good) paging algorithm which I had developed to handle low-memory conditions in a text editor. It was explained to me that it ran fast because the program was working from the memory buffers used to read the file, and that only if the file were re-read or written would there be a difference in speed.

The same applies to mmap: in another case (still on my to-do list to incorporate into an FAQ), a developer reported very good results in a scenario where the disk cache was the actual reason for the improvement. Developing benchmarks takes time and care to analyze the reasons for good (or bad) performance.
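
For completeness, a minimal sketch of the memory-mapped variant; again, counting newlines stands in for whatever the real program does:

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return EXIT_FAILURE;
        }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }

        struct stat sb;
        if (fstat(fd, &sb) != 0) {
            perror("fstat");
            return EXIT_FAILURE;
        }

        /* map the whole file; the kernel pages it in as needed,
         * so there is no explicit buffer to allocate or size */
        char *data = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) {
            perror("mmap");
            return EXIT_FAILURE;
        }

        /* illustrative workload: count newlines */
        long lines = 0;
        for (off_t i = 0; i < sb.st_size; ++i) {
            if (data[i] == '\n')
                ++lines;
        }

        printf("%ld\n", lines);
        munmap(data, sb.st_size);
        close(fd);
        return EXIT_SUCCESS;
    }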

Further reading:
