I have a directory containing multiple subdirectories. Each subdirectory has a tar.gz file which, when untarred, yields two files, R1.fastq and R2.fastq. I want to untar the archive and compress R1.fastq and R2.fastq individually as they come out of the pipe, saving the output from every subdirectory into a SINGLE separate directory.

    sourcedir=/sdir
    destdir=/ddir
    for f in $sourcedir/*; do
        fbase=${f##*/}
        echo "Inside $fbase"
        tar -xvf $f/*.gz | gzip -n9 $f/*.fastq > $destdir/
    done

I wrote the above code, but it only gets as far as the echo command.

  • You cannot save the files from all the subdirectories to the same directory if the files have the same names. You should give an example of the (extracted) files in at least two subdirectories and should show the desired result. Commented Jan 27, 2015 at 2:50

2 Answers

The $f/*.fastq glob doesn't match anything at the time that line is invoked, because the shell expands it before tar has extracted any files. But the major problem is that tar, by default, doesn't extract to stdout; it creates files (there would need to be some way to tell the receiving process on the other end of the pipe where one file ends and the next starts, and there isn't, at least not with tar).

So when using tar you have to let tar create the files and then gzip the created files afterwards.
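A minimal sketch of that extract-then-compress approach. The function name extract_then_gzip and the scratch directory are illustrative assumptions, not part of the question. It also assumes one .tar.gz per subdirectory, .fastq members at the top level of each archive, and file names that are unique across archives (otherwise later files overwrite earlier ones in the destination).

```shell
#!/usr/bin/env bash
# Sketch: let tar create the files in a scratch directory, then gzip them.
# extract_then_gzip is a hypothetical helper name.

extract_then_gzip() {
    local sourcedir=$1 destdir=$2
    local archive tmpdir fastq
    mkdir -p "$destdir" || return 1
    for archive in "$sourcedir"/*/*.tar.gz; do
        [ -e "$archive" ] || continue            # glob matched nothing
        echo "Processing $archive"
        tmpdir=$(mktemp -d) || return 1          # scratch space for this archive
        tar -xzf "$archive" -C "$tmpdir" || return 1
        for fastq in "$tmpdir"/*.fastq; do
            [ -e "$fastq" ] || continue          # archive had no top-level .fastq
            # -c writes to stdout, so the compressed copy lands in destdir
            gzip -n9 -c "$fastq" > "$destdir/${fastq##*/}.gz" || return 1
        done
        rm -rf "$tmpdir"                         # drop the uncompressed copies
    done
}
```

With the paths from the question it would be called as extract_then_gzip /sdir /ddir. The uncompressed files do touch the disk briefly, which is the trade-off of this approach.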

If the files fit in memory, you can write a quick script in Python using the tarfile and gzip modules and never let the uncompressed data touch the disk. IIRC the individual files are completely extracted, so for this to work they have to fit in memory.

  • R1.fastq and R2.fastq are both in one tar.gz file; they should be decompressed on untar and then compressed into individual .gz files. The above code does not work. Commented Jan 26, 2015 at 20:48
  • @Ron That is contrary to what the question says. There you state that you decompress. Please update your question so that it correctly describes what you want to happen. Commented Jan 26, 2015 at 20:54
  • My bad. I have updated the question. It was a mistake on my part. Commented Jan 26, 2015 at 20:58
  • I have updated the code to show what I want, but it's not running. Commented Jan 26, 2015 at 21:25

The general approach is to read the file names from the archive (unless they are always the same) and then have tar extract only one file at a time. GNU tar has the option --to-stdout which prevents it from writing a file. Without that you would need a FIFO for each file name.

    > tar -tf subdir.tar.gz
    R1.fastq
    R2.fastq
    > tar -xf subdir.tar.gz --to-stdout R1.fastq | gzip -n9 >/path/to/R1.fastq.gz
    > tar -xf subdir.tar.gz --to-stdout R2.fastq | gzip -n9 >/path/to/R2.fastq.gz
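When the member names differ from archive to archive, the listing and the per-member extraction can be combined in a loop. The following is a rough sketch under some assumptions: the function name stream_recompress is made up for illustration, tar supports --to-stdout (as GNU tar does), each subdirectory holds one .tar.gz, and member names are unique across archives.

```shell
#!/usr/bin/env bash
# Sketch: stream each archive member through gzip without writing the
# uncompressed data to disk. stream_recompress is a hypothetical helper name.

stream_recompress() {
    local sourcedir=$1 destdir=$2
    local archive member
    mkdir -p "$destdir" || return 1
    for archive in "$sourcedir"/*/*.tar.gz; do
        [ -e "$archive" ] || continue                 # glob matched nothing
        # List the member names, then extract them one at a time.
        tar -tzf "$archive" | while IFS= read -r member; do
            case $member in */) continue ;; esac      # skip directory entries
            tar -xzf "$archive" --to-stdout "$member" |
                gzip -n9 > "$destdir/${member##*/}.gz"
        done
    done
}
```

With the paths from the question this would be called as stream_recompress /sdir /ddir. Note that the archive is decompressed once for the listing and once more per member, so this trades extra CPU time for not needing scratch disk space.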
  • I didn't get how the same command, tar -xf subdir.tar.gz, would send its --to-stdout output to different files. Do I need to do pattern matching while extracting? Commented Jan 27, 2015 at 16:11
  • @Ron It's not the same. Look more closely. Commented Jan 27, 2015 at 17:58
  • @Ron, look closer. The tar commands are indeed going to stdout, but that is then piped into gzip, which throws the results into separate .gz files. Commented Jan 27, 2015 at 17:58
  • Running tar -tf UNCID_2193585.70c7f658-5208-4b3f-9767-27a3b06b8ddc.120409_UNC14-SN744_0226_AC0M8NACXX_4_ACTTGA.tar.gz prints 120409_UNC14-SN744_0226_AC0M8NACXX_ACTTGA_L004_2.fastq and 120409_UNC14-SN744_0226_AC0M8NACXX_ACTTGA_L004_1.fastq. Commented Jan 27, 2015 at 18:54
  • For example, the above shows one of the archives and the names of its two files. Should I now send this to --to-stdout by piping, and then pipe it again to gzip? Commented Jan 27, 2015 at 18:56
