I'm trying to find the "right" way to read files line-by-line.
I have been using `for line in $(cat "$FILE"); do` for a while, and I really like its clarity.
I know that `while IFS= read -r line; do ... done < "$FILE"` is supposed to be more efficient (no subshell), but I don't like that the file is specified at the end of the loop. Also, when testing it, I ran into some odd issues with variable scope.
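For reference, the classic scope pitfall with read loops is piping into `while`, which runs the loop body in a subshell; the redirection form keeps the loop in the current shell. A minimal sketch of the difference, assuming `FILE` names a readable multi-line file:

```bash
#!/bin/bash
FILE=test-small.txt             # assumed sample file name

# Piping into while runs the loop in a subshell, so the counter
# incremented inside it is lost when the pipeline ends.
count=0
cat "$FILE" | while IFS= read -r line; do
    count=$((count + 1))
done
echo "after pipe:     $count"   # prints 0

# Redirecting into the loop keeps it in the current shell,
# so the counter survives.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < "$FILE"
echo "after redirect: $count"   # prints the real line count
```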
Recently I found out about `mapfile -t LINES < $FILE`, which is supposed to be highly optimized, and it looks cleaner than `while read`, but my tests show that it is only faster on very small files.
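For completeness, this is roughly how the `mapfile` variant reads when the array is actually used afterwards (a sketch only; note that `"$FILE"` is quoted here, unlike in my timing script):

```bash
#!/bin/bash
FILE=test-small.txt                   # assumed sample file name

# mapfile (aka readarray) slurps the whole file into an array in one
# builtin call; -t strips the trailing newline from each element.
mapfile -t LINES < "$FILE"

echo "read ${#LINES[@]} lines"
for line in "${LINES[@]}"; do
    printf '%s\n' "$line"
done
```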
So, my question is: does it make any sense to use other methods rather than `for line in $(cat "$FILE"); do`? The only scenario I can imagine where it would be slower is reading thousands of small files in a loop. In other cases the difference seems negligible, while the alternatives sacrifice readability.
I took files of various sizes and used the script below to compare:
```
################ test-med.txt (140698 lines) ###################
for line in $(cat "$FILE"); do
real    0m0,924s
user    0m0,812s
sys     0m0,128s

while IFS= read -r line; do
real    0m1,328s
user    0m1,113s
sys     0m0,215s

mapfile -t LINES < $FILE
real    0m1,240s
user    0m1,129s
sys     0m0,111s

################ test-small.txt (180 lines) ###################
for line in $(cat "$FILE"); do
real    0m0,050s
user    0m0,001s
sys     0m0,049s

while IFS= read -r line; do
real    0m0,001s
user    0m0,001s
sys     0m0,000s

mapfile -t LINES < $FILE
real    0m0,011s
user    0m0,006s
sys     0m0,005s

################ test-tiny.txt (32 lines) ###################
for line in $(cat "$FILE"); do
real    0m0,050s
user    0m0,000s
sys     0m0,050s

while IFS= read -r line; do
real    0m0,000s
user    0m0,000s
sys     0m0,000s

mapfile -t LINES < $FILE
real    0m0,000s
user    0m0,000s
sys     0m0,000s
```

Comparison script used:
```bash
#!/bin/bash

_t1() {
    IFS=$'\n'
    for line in $(cat "$FILE"); do
        echo "$line"
    done
}

_t2() {
    while IFS= read -r line; do
        echo "$line"
    done < "$FILE"
}

_t3() {
    mapfile -t LINES < $FILE
    for line in "${LINES[@]}"; do
        echo $line
    done
}

for FILE in $(ls *.txt); do
    CNT=$(cat $FILE | wc -l)
    echo "################ $FILE ($CNT lines) ###################"
    echo 'for line in $(cat "$FILE"); do'
    time _t1 >/dev/null
    echo 'while IFS= read -r line; do'
    time _t2 >/dev/null
    echo 'mapfile -t LINES < $FILE'
    time _t3 >/dev/null
done
```
`for line in $(cat "$FILE")` and `while IFS= read -r line` do different things, so a performance comparison by itself doesn't make sense. In general, if you care about performance, you probably shouldn't use a shell, and definitely not Bash. Also, there's never any reason to use `$(ls *.txt)`; it can only break things.

Don't use `for line in $(cat "$FILE"); do`, as it breaks when the input contains spaces and/or globbing metachars, and would skip any blank lines. Get a robust solution first and then think about performance.

`cat` should probably do just fine; also, your argument is a textbook classic example of why you DRLWF (don't read lines with `for`).

Your test of `mapfile` is leading you to the wrong conclusion. If you want to compare how fast `mapfile` works to how fast an equivalent read loop works, you should be comparing only `mapfile -t LINES < "$FILE"` to `while IFS= read -r line; do LINES+=( "$line" ); done`, i.e. how fast you can populate an array from file contents, not how fast you can print the contents of a file, which is what your current code measures. If you don't need an array then you wouldn't use `mapfile` (aka `readarray`), as it exists to populate an array.

Putting "bash" and "performance" in the same question is barking up the wrong tree. Interpreted languages are inherently slower.
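To make the robustness point above concrete, here is a small sketch (the demo file name is made up) showing how `for line in $(cat ...)` word-splits, drops blank lines, and expands globs, while the read loop preserves each line:

```bash
#!/bin/bash
# Hypothetical demo file: a line with spaces, a blank line, and a lone '*'.
printf 'one two\n\n*\n' > demo.txt

echo '--- for over $(cat) ---'
for line in $(cat demo.txt); do
    printf '<%s>\n' "$line"        # splits "one two", skips the blank line,
done                               # and expands * to the files in the cwd

echo '--- while read ---'
while IFS= read -r line; do
    printf '<%s>\n' "$line"        # each line comes through intact
done < demo.txt
```

And a sketch of the apples-to-apples `mapfile` comparison suggested above, where both sides only populate an array (file name assumed):

```bash
#!/bin/bash
FILE=test-med.txt                  # assumed sample file name

echo 'mapfile -t LINES < "$FILE"'
time mapfile -t LINES < "$FILE"

echo 'while IFS= read -r line; do LINES2+=( "$line" ); done'
time {
    LINES2=()
    while IFS= read -r line; do
        LINES2+=( "$line" )
    done < "$FILE"
}
```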