12

I have a directory full of .tsv files and I want to run a grep command on each of them to pull out a certain group of text lines and then save it to an associated text file with a similar file name. So for example, if I was grepping just one of the files, my grep command looks like this:

grep -h 8-K 2008-QTR1.tsv > 2008Q1.txt 

But I have a list of tsv files that look like:

2008-QTR1.tsv 2008-QTR2.tsv 2008-QTR3.tsv 2008-QTR4.tsv 2009-QTR1.tsv 2009-QTR2.tsv 2009-QTR3.tsv ... 

And after grepping they need to be stored as:

2008Q1.txt 2008Q2.txt 2008Q3.txt 2008Q4.txt 2009Q1.txt 2009Q2.txt 2009Q3.txt 

Any thoughts?

2
  • The grep pattern is always the same? Commented Nov 7, 2019 at 17:10
  • Yes, the grep pattern is always the same, thank you! Commented Nov 8, 2019 at 20:06

4 Answers

12

In ksh93/bash/zsh, with a simple for loop and parameter expansion:

for f in *-QTR*.tsv
do
  grep 8-K < "$f" > "${f:0:4}"Q"${f:8:1}".txt
done

This runs the grep on one file at a time (where that list of files is generated from a wildcard pattern that requires "-QTR" to exist in the filename as well as a ".tsv" ending to the filename), redirecting the output to a carefully-constructed filename based on:

  • the first four characters of the filename -- the year
  • the letter Q
  • the 9th character of the filename -- the quarter
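To see how the output name is assembled, here is a minimal sketch that applies the same substring expansions to a single name without running grep. The sample filename and the variable names `year`, `quarter` and `out` are only illustrative; `${f:offset:length}` needs ksh93, bash or zsh.

```shell
#!/usr/bin/env bash
# Hypothetical sample name, following the pattern shown in the question.
f=2008-QTR1.tsv

year=${f:0:4}      # 4 characters starting at offset 0
quarter=${f:8:1}   # 1 character starting at offset 8, i.e. the 9th character
out="${year}Q${quarter}.txt"

printf '%s -> %s\n' "$f" "$out"
```

Because the offsets are fixed, this only works while every filename keeps the same layout (four-digit year, then "-QTR", then a single quarter digit).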
2
  • <"$f" is unnecessary here and could be just "$f" Commented Nov 7, 2019 at 19:22
  • 2
    @D.BenKnoble Stéphane edited that in, but it's useful if you want consistent error messages (from your shell, versus from the various utilities); see also unix.stackexchange.com/a/458268/117549 Commented Nov 7, 2019 at 19:32
5

The obligatory POSIX sh variant:

#! /bin/sh -
ret=0
for file in [[:digit:]][[:digit:]][[:digit:]][[:digit:]]-QTR[1234].tsv; do
  base=${file%.tsv}
  grep 8-K < "$file" > "${base%%-*}Q${base##*-QTR}".txt || ret=$?
done
exit "$ret"
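As a rough illustration of what the prefix and suffix stripping does, the sketch below applies the same expansions to one name and prints the result. The sample filename and the extra variables `year`, `quarter` and `out` are assumptions added for readability; the script above builds the name inline.

```shell
#! /bin/sh -
file=2008-QTR1.tsv        # hypothetical sample name
base=${file%.tsv}         # remove the shortest suffix matching ".tsv": 2008-QTR1
year=${base%%-*}          # remove the longest suffix matching "-*": 2008
quarter=${base##*-QTR}    # remove the longest prefix matching "*-QTR": 1
out="${year}Q${quarter}.txt"

printf '%s -> %s\n' "$file" "$out"
```

Unlike fixed character offsets, these expansions key on the "-QTR" and ".tsv" markers, so they do not depend on where in the string the year ends.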
3
  • Why <"$f"? Redirection not necessary there. Commented Nov 7, 2019 at 19:23
  • 1
    @D.BenKnoble, using redirections has many advantages over passing the file as argument. Note that it's also generally less work, so I'd say it's more the passing the file as argument that is not necessary here (as in this case, we don't need grep to know the name of the file, just its contents). See When should I use input redirection? Commented Nov 7, 2019 at 19:43
  • Thank you so much!!! Commented Nov 8, 2019 at 20:11
2

Another option

for f in 200{8..9}-QTR{1..4}.tsv; do grep "pattern" $f > $(sed "s/[-RTtsv]*//g" <<< $f)txt; done 

Walkthrough: Set up an expansion that creates a list of your filenames

200{8..9}-QTR{1..4}.tsv 

expands to

2008-QTR1.tsv 2008-QTR2.tsv 2008-QTR3.tsv 2008-QTR4.tsv 2009-QTR1.tsv 2009-QTR2.tsv 2009-QTR3.tsv 2009-QTR4.tsv 

and to do every year and quarter to date would be

20{08..19}-QTR{1..4}.tsv 

Iterate over the list with for..do..done, extract the lines matching your pattern from each file

grep "pattern" $f 

and redirect to the new filename formed by deleting the unwanted characters with sed and adding the txt suffix

$(sed "s/[-RTtsv]*//g" <<< $f)txt 

or

$(sed "s/[-RT]*//g" <<< ${f%%.*}.txt) 
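A quick way to check the naming before touching any data is to print the input and output names side by side. This is only a sketch: it reads and writes no files, and the `names` array exists just to collect the results. It assumes bash, for brace expansion and here-strings.

```shell
#!/usr/bin/env bash
# Show which output name the sed expression produces for each input name.
# The names come purely from brace expansion; the files need not exist.
names=()
for f in 200{8..9}-QTR{1..4}.tsv; do
  # Delete every "-", "R", "T", "t", "s" and "v", then append "txt".
  out="$(sed "s/[-RTtsv]*//g" <<< "$f")txt"
  names+=("$out")
  printf '%s -> %s\n' "$f" "$out"
done
```

The dot before "tsv" survives the sed step, which is why the suffix appended is "txt" rather than ".txt".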
5
  • It should be noted that this brace-expansion idea hard-codes the expected filenames; it would not pick up newer or older files, and would complain of missing files in the range. Not a deficiency, except the OP showed a file name listing ending in "..." Commented Nov 7, 2019 at 19:36
  • Very true, but so does @stephanechazelas to an extent. What this doesn't do is assume that every tsv file is required, nor that the substrings extracted will conform to OP's pattern. Bananannanan-QTRanana.tsv won't, that's for sure. What it does do is allow OP to process a selected subset of known files. Swings and roundabouts. Commented Nov 7, 2019 at 19:46
  • Indeed, the three of us so far came up with different approaches; I just enjoy adding a little explanation about how it works so that the OP (or future readers) understand why it works so that they know if they can adapt it to their situation. Commented Nov 7, 2019 at 19:48
  • Anyone for golf? Commented Nov 7, 2019 at 19:49
  • Thank you so much!!! Commented Nov 8, 2019 at 20:11
0

If you want to avoid an explicit loop, the following solution may work. Someone may be able to improve it.

ls -1 *.tsv | xargs -n1 -I'{}' bash -c 'f="{}";grep 8-K $f > ${f//[^0-9Q]/}.txt' 
  1. ls just lists the files you want to process
  2. xargs processes each of these files, one by one (-n1)
  3. a bash shell is launched to be able to process the strings (cf point 5)
  4. Sets the filename to variable $f
  5. ${f//[^0-9Q]/} removes all the characters you don't want in the .txt filenames (so this is specific to your example)
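Step 5 can be tried on its own, outside xargs. A minimal sketch, assuming bash and a sample name taken from the question (the variables `stem` and `out` are only for illustration):

```shell
#!/usr/bin/env bash
f=2008-QTR1.tsv          # hypothetical sample name
stem=${f//[^0-9Q]/}      # delete every character that is not a digit or "Q"
out="$stem.txt"

printf '%s -> %s\n' "$f" "$out"
```

Any other digit or capital Q in a filename would be kept as well, so this relies on the names looking like the ones in the question.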

Pros: - Simple one liner

Cons: - A bash process is started for each file processed

There may be a similar solution that does not use bash, but I don't know of one (for example, eval shouldn't work in this context).

2
  • Alternative with awk: grep -H 8-K *.tsv | awk -F ':' -v OFS=':' '{fn=gensub(/[^0-9Q]/,"","g",$1) ".txt";for(i=1;i<NF;i++){$i=$(i+1)};NF--;print $0 > fn}', but with awk we are entering a different world from bash. Commented Nov 10, 2019 at 22:02
  • Pure awk, something like: awk '/8-K/{print > (gensub(/[^0-9Q]/,"","g",FILENAME) ".txt")}' *.tsv Commented Nov 10, 2019 at 22:15
