
I tend to use pipelines in my bash scripts over process substitution in most situations, especially in cases of using multiple sets of commands as it seems more readable to do ... | ... | ... over ... < <(... < <(...)).

I'm wondering though why using process substitution is much faster in some situations than using a pipeline.

To test this, I timed two scripts using 10000 iterations of the same attached commands with one using a pipeline and another using process substitution.

Scripts:

pipeline.bash:

    for i in {1..10000}; do
        echo foo bar | while read; do
            echo $REPLY >/dev/null
        done
    done

proc-sub.bash:

    for i in {1..10000}; do
        while read; do
            echo $REPLY >/dev/null
        done < <(echo foo bar)
    done

Results:

    ~$ time ./pipeline.bash
    real    0m17.678s
    user    0m14.666s
    sys     0m14.807s

    ~$ time ./proc-sub.bash
    real    0m8.479s
    user    0m4.649s
    sys     0m6.358s
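For anyone who wants to reproduce the comparison from a single file, here is a minimal sketch. The function names and the `N` variable are my own additions for illustration, and the default iteration count is deliberately smaller than the 10000 used above, so the absolute timings will differ.

```bash
#!/usr/bin/env bash
# Hypothetical harness: both variants in one script.
# N (iteration count) is an assumption; override with e.g. N=10000 ./harness.bash
N=${N:-1000}

# Variant 1: feed the loop through a pipeline.
pipeline_variant() {
    local i
    for ((i = 0; i < N; i++)); do
        echo foo bar | while read -r; do
            echo "$REPLY" >/dev/null
        done
    done
}

# Variant 2: feed the loop through process substitution.
procsub_variant() {
    local i
    for ((i = 0; i < N; i++)); do
        while read -r; do
            echo "$REPLY" >/dev/null
        done < <(echo foo bar)
    done
}

# bash's `time` keyword reports real/user/sys on stderr.
time pipeline_variant
time procsub_variant
```

Running both variants in the same shell process at least keeps the environment identical between the two measurements.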

I know that pipelines create a subprocess whereas process substitution creates a named pipe or some file in /dev/fd, but I am unclear about how those differences impact performance.

  • fork() is pretty inexpensive (depending on the OS), but doing it 10000 times will have a performance impact. Commented May 3, 2014 at 4:51
  • If you worry about performance, the first thing to do would be to avoid bash. Your code uses zsh syntax that is now also supported by bash and ksh93. You'll notice bash is 2 to 3 times as slow as those. Commented May 3, 2014 at 7:30
  • Very true @StephaneChazelas, zsh runs these scripts with nearly equal speed in at least half the time it takes bash to run them. Commented May 4, 2014 at 1:16

1 Answer


Running both scripts under strace shows the differences:

With pipe:

    $ strace -c ./pipe.sh
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     57.89    0.103005           5     20000           clone
     40.81    0.072616           2     30000     10000 wait4
      0.58    0.001037           0    120008           rt_sigprocmask
      0.40    0.000711           0     10000           pipe

With proc-sub:

    $ strace -c ./procsub.sh
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     85.08    0.045502           5     10000           clone
      3.25    0.001736           0     90329       322 read
      2.12    0.001133           0     20009           open
      2.03    0.001086           0     50001           dup2

These statistics show that the pipeline version creates twice as many child processes (20000 clone calls versus 10000) and spends much of its time in wait4, where the parent waits for its child processes to finish before it can continue.
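One way to see which side of each construct runs in a forked child is to compare `$BASHPID` in the main shell with `$BASHPID` inside the reader. This is a minimal sketch, assuming bash 4 or later with the `lastpipe` option left at its default (off); the temporary file and the variable names are only for illustration.

```bash
#!/usr/bin/env bash
# $BASHPID is the PID of whichever process is actually running the code.
main_pid=$BASHPID
tmp=$(mktemp)

# Pipeline: the reader runs in a forked subshell, so its PID has to be
# passed back through a file; a variable set there would be lost.
echo foo bar | { read -r; echo "$BASHPID" >"$tmp"; }
pipe_reader_pid=$(<"$tmp")
rm -f "$tmp"

# Process substitution: only the producer (echo) is forked; the reader
# runs in the main shell, so the assignment survives.
{ read -r; procsub_reader_pid=$BASHPID; } < <(echo foo bar)

echo "main shell:      $main_pid"
echo "pipeline reader: $pipe_reader_pid"
echo "proc-sub reader: $procsub_reader_pid"
```

The pipeline reader should report a different PID from the main shell, while the process substitution reader should report the same one, which matches the extra clone per iteration in the pipeline numbers.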

Process substitution does not pay that cost: the shell reads directly from the child process. Process substitution is performed at the same time as parameter and variable expansion, and the command inside the process substitution runs in the background. From the bash man page:

    Process Substitution

    Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files. It takes the form of <(list) or >(list). The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list. If the <(list) form is used, the file passed as an argument should be read to obtain the output of list.

    When available, process substitution is performed simultaneously with parameter and variable expansion, command substitution, and arithmetic expansion.
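To make the "name of this file is passed as an argument" part concrete, here is a small sketch. The helper `print_arg` and the variable `subst_name` are hypothetical names of mine, and the exact file name printed depends on the system (on Linux it is typically something under /dev/fd).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first argument exactly as the command sees it.
print_arg() { printf '%s\n' "$1"; }

# <(list) expands to a file name; the exact name is system-dependent.
subst_name=$(print_arg <(true))
echo "expanded to: $subst_name"

# Because it is just a file name, any command that takes a file
# argument can read the output of list from it.
cat <(echo foo bar)    # prints "foo bar"
```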

Update

Running strace again, this time including statistics from the child processes:

With pipe:

    $ strace -fqc ./pipe.sh
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     70.76    0.215739           7     30000     10000 wait4
     28.04    0.085490           4     20000           clone
      0.78    0.002374           0    220008           rt_sigprocmask
      0.17    0.000516           0    110009     20000 close
      0.15    0.000456           0     10000           pipe

With proc-sub:

    $ strace -fqc ./procsub.sh
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     52.38    0.033977           3     10000           clone
     32.24    0.020913           0     96070      6063 read
      5.24    0.003398           0     20009           open
      2.34    0.001521           0    110003     10001 fcntl
      1.87    0.001210           0    100009           close
  • You'd need strace -fqc to have relevant statistics here as what bash does in the children matters as well. Commented May 3, 2014 at 7:36
  • @StephaneChazelas: Updated! But I think we still can see the differences without statistics from child processes. Commented May 3, 2014 at 12:43
  • I was unaware of strace's existence, thanks. Definitely helps see what causes the difference. Commented May 4, 2014 at 1:05
