The system is Lubuntu 20 LTS; parallel -V reports 20161222. Comparison of raw serial and parallel decompression performance on a dual-core i3-4130 with Hyper-Threading (4 threads):
    time ls -S /var/lib/apt/lists/*lz4 | parallel --ungroup lz4 -dc > /dev/null
    1.461s
    time lz4 -dmc /var/lib/apt/lists/*lz4 > /dev/null
    3.069s

The real use case is as follows (workaround without linebuffer):
    time lz4 -dmc /var/lib/apt/lists/*Contents* | grep -F $'/parallel\t' | sort -u
    usr/bin/parallel universe/utils/moreutils,universe/utils/parallel
    usr/bin/parallel universe/utils/parallel
    usr/lib/R/library/parallel/R/parallel universe/math/r-base-core
    usr/lib/cups/backend/parallel net/cups-filters
    usr/share/doc-base/parallel universe/utils/parallel
    real 0m5.349s
    user 0m3.970s
    sys 0m5.839s

    time ls -S /var/lib/apt/lists/*Contents* | parallel lz4 -dc '{}' \| grep -F "\$'/parallel\t'" | sort -u
    (same output as above)
    real 0m3.669s
    user 0m5.888s
    sys 0m7.676s

This parallelizes not only the decompression but also the post-processing, and it is the better solution here, where the first part of the pipe does not account for 99 % of the work.
But parallelizing the complete pipe in this way is not always possible, so the general question remains open for cases where the output of the first step is not very small and streaming is therefore wanted.
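The per-file variant of the pipe (decompress and filter inside each job, merge only at the end) can be tried in isolation with the following sketch. It is not the setup measured above: so that it runs without lz4 or GNU parallel installed, it assumes gzip as a stand-in for lz4, xargs -P as a stand-in for parallel, and two made-up sample files in a temporary directory instead of /var/lib/apt/lists.

```shell
#!/bin/sh
# Sketch only. Stand-ins (assumptions, not from the post): gzip for lz4,
# xargs -P for GNU parallel, a temp directory for /var/lib/apt/lists.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A literal tab; command substitution strips only trailing newlines.
tab=$(printf '\t')

# Two small hypothetical "Contents" files: path, tab, package.
printf 'usr/bin/parallel\tutils/parallel\nusr/bin/ls\tutils/coreutils\n' |
    gzip > "$workdir/a_Contents.gz"
printf 'usr/bin/parallel\tutils/parallel\nusr/lib/cups/backend/parallel\tnet/cups-filters\n' |
    gzip > "$workdir/b_Contents.gz"

# Every job runs its own "decompress | grep" pipe, so only the matching
# lines reach the shared sort. "|| true" keeps a file without matches from
# making xargs report failure.
# Caveat: xargs -P does not group output per job (GNU parallel does by
# default), so with large outputs lines from different jobs may interleave.
result=$(
    ls -S "$workdir"/*Contents* |
    xargs -P 4 -I{} sh -c 'gzip -dc "$1" | grep -F "/parallel$2" || true' sh {} "$tab" |
    sort -u
)
printf '%s\n' "$result"
```

Passing the file name and the tab as positional arguments to sh -c avoids the nested quoting needed in the parallel command line above; the filter stage is still the same grep -F on '/parallel' followed by a tab.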