I would like to use GNU Parallel to process a huge .gz or .bz2 file.
I know I can do:
bzcat huge.bz2 | parallel --pipe ...

But it would be nice if there were a way, similar to --pipe-part, to read multiple parts of the file in parallel. One option is to decompress the file first:
bzcat huge.bz2 > huge
parallel --pipe-part -a huge ...

but huge.bz2 is huge, and I would much rather decompress it multiple times than store it uncompressed.
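For reference, here is the decompress-once workaround spelled out as a runnable sketch. A small generated file stands in for the real huge.bz2, and wc -l stands in for the real per-part command; the --block size is illustrative:

```shell
# Small stand-in for the real huge.bz2 (assumption: 1000 numbered lines).
seq 1 1000 | bzip2 > huge.bz2

# Pay the disk cost once...
bzcat huge.bz2 > huge

# ...then let --pipe-part seek into the uncompressed file in parallel.
# wc -l is a placeholder for the real job; --block 1k is illustrative.
parallel --pipe-part --block 1k -a huge wc -l

# Reclaim the space when done.
rm huge
```

The line counts printed by the parts sum to the line count of the original file, since --pipe-part splits on record (newline) boundaries.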
Comments:

- --pipe-part (three minuses)?
- The man page says it is, but I think of the man page as a proxy for your authority on GNU parallel.
- Gzip::RandomAccess, in case you want to add that into GNU parallel. Some more modern compression formats make random access easier.
- I don't see such a parallel option, so I assume you want some sort of workaround that allows you to pipe parts of the file? How about splitting the file and then processing the pieces in parallel?
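The split-then-process suggestion from the last comment can be sketched with plain coreutils and shell job control. This is only an illustration under assumed names: a small generated file stands in for huge.bz2, the 250-line chunk size is arbitrary, and wc -l is a placeholder for the real per-chunk command:

```shell
# Small stand-in for the real huge.bz2 (assumption: 1000 numbered lines).
seq 1 1000 | bzip2 > huge.bz2

# Decompress once, cutting the stream into fixed-size chunks on the fly,
# so only the chunks (not one giant uncompressed file) hit the disk.
bzcat huge.bz2 | split -l 250 - chunk_

# Process the chunks concurrently using shell background jobs;
# wc -l is a placeholder for the real command.
for f in chunk_*; do
  wc -l < "$f" &
done
wait
```

The chunks together contain every line of the original file, so per-chunk results can be combined afterwards. The trade-off versus --pipe-part is one extra pass over the compressed data plus temporary chunk files on disk.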