3

I have these long IDs which consist of four sections:

AKJHGFGUIKL,OIUYT,KJHBTYUI,98765434567
RTYUIKHGFGH,TYUJI,TGHYJKJKLJKL,6789876
ETRYTUUI,YTYUIL,UIOKJHGFGH,34567898766

I want to put the numbers on a new line and remove the third comma.

AKJHGFGUIKL,OIUYT,KJHBTYUI
98765434567

RTYUIKHGFGH,TYUJI,TGHYJKJKLJKL
6789876

ETRYTUUI,YTYUIL,UIOKJHGFGH
34567898766

How can I do this?

3
  • 3
    Hello Simon Q. What have you tried so far? Commented Oct 11, 2018 at 20:08
  • 1
    @roaima I tried sed and awk but failed, I spent four hours on this! Commented Oct 12, 2018 at 0:05
  • @SimonQ, 4 hours, ouch! Anyway, for next time, please remember that it helps other learners (among whom are many good people also) to see the failed code. It helps show them what to avoid. Commented Oct 12, 2018 at 11:15

6 Answers 6

12

Using GNU sed:

sed "s/,/\n/3; G" file
ETRYTUUI,YTYUIL,UIOKJHGFGH
34567898766

Explanation💡:

sed stands for stream editor. Its substitute command has the following syntax:

sed 's/pattern/replacement/position' file

In the above command, s/,/\n/3 substitutes the third , on each line with a newline (\n); the number 3 selects the third match.

G appends a newline to the pattern space, followed by the contents of the hold space. Since the hold space is empty here, this simply adds a blank line after each record.
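To watch the two commands act separately, here is a small sketch run against two sample lines from the question. It assumes GNU sed (for \n in the replacement) and feeds the input through printf instead of a file; the variable names are just for illustration.

```shell
# Sample input standing in for the file (two records from the question).
input='AKJHGFGUIKL,OIUYT,KJHBTYUI,98765434567
RTYUIKHGFGH,TYUJI,TGHYJKJKLJKL,6789876'

# Step 1: replace only the third comma on each line with a newline (GNU sed).
step1=$(printf '%s\n' "$input" | sed 's/,/\n/3')

# Step 2: also append a newline plus the (empty) hold space, which leaves
# a blank line after every record. Note that $(...) strips the trailing
# newlines, so the blank line after the last record is not kept here.
step2=$(printf '%s\n' "$input" | sed 's/,/\n/3; G')

printf '%s\n' "$step1"
printf -- '---\n'
printf '%s\n' "$step2"
```

The second result differs from the first only by the blank line between the records.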

0
6
awk -F, -vOFS=, '{print $1,$2,$3; print $4; print ""}' file 

will produce your desired output
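As a rough check of what OFS contributes, the sketch below runs the same program with and without it on one sample record from the question (piped in rather than read from a file; the variable names are made up for the example).

```shell
# One sample record from the question.
record='AKJHGFGUIKL,OIUYT,KJHBTYUI,98765434567'

# -F, splits on commas; OFS=, makes print rejoin $1,$2,$3 with commas.
with_ofs=$(printf '%s\n' "$record" |
  awk -F, -v OFS=, '{print $1,$2,$3; print $4; print ""}')

# Without OFS, print falls back to the default separator, a single space.
without_ofs=$(printf '%s\n' "$record" |
  awk -F, '{print $1,$2,$3; print $4; print ""}')

printf '%s\n' "$with_ofs"
printf -- '---\n'
printf '%s\n' "$without_ofs"
```

Only the first variant keeps the commas between the first three fields.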

4

A few Perl approaches:

$ perl -pe 's/,([^,]+)$/\n$1\n/' file
AKJHGFGUIKL,OIUYT,KJHBTYUI
98765434567

RTYUIKHGFGH,TYUJI,TGHYJKJKLJKL
6789876

ETRYTUUI,YTYUIL,UIOKJHGFGH
34567898766


The -p means "read the input file line by line and print each line after applying the script given by -e to it". The s/foo/bar/ is the substitution operator which will replace foo with bar. Here, we are matching a comma followed by one or more non-comma characters ([^,]+) until the end of the line ($). The parentheses around the ([^,]+) will "capture" whatever is matched so we can refer to it as $1 on the right hand side of the operator. Therefore, this will replace the text after the last comma with a newline, then the matched text and then another newline.

If you can't be sure the third comma is the last one, you can do:

perl -pe 's/^(?:[^,]+,){2}[^,]+\K,(.+)/\n$1\n/' file 

or

perl -pe 's/(.+?,.+?,.+?),(.+)/$1\n$2\n/' file 

And here are some more, just for fun:

perl -F, -ape '$k=pop(@F); $_=join(",", @F)."\n$k\n"' file
perl -F, -lane 'print join(",", @F[0..2]), "\n@F[3..$#F]\n"' file
1

Alternative less succinct sed code:

sed 's/,\([^,]*\)/\n\1\n/3' file 

...which can be used if the hold space is needed for some other purpose (say, because of an additional requirement). If you need portable code (the above is GNU sed syntax), use literal newlines in the RHS:

sed 's/,\([^,]*\)/\
\1\
/3' file
0
awk -F, '{print $1, $2, $3,"\n"$4}' 
4
  • 2
    You'd need to set OFS=, as well in order to preserve the other delimiters Commented Oct 11, 2018 at 17:00
  • @steeldriver but it will not remove the third comma as requested? Commented Oct 11, 2018 at 17:03
  • ... then change the last part to a simple concatenation $3 "\n" $4 Commented Oct 11, 2018 at 17:04
  • This removes all commas, replacing them with spaces and doesn't add the extra newline between records. Please make sure your output is the same as the desired output shown in the question. Commented Oct 12, 2018 at 8:08
0

A more portable sed solution:

<infile sed -E 'G;s/(.*),(.*)(\n)/\1\3\2\3/' 

Awk solution(s):

<infile awk -F, '{printf("%s,%s,%s\n%s\n\n",$1,$2,$3,$4)}'
<infile gawk '{match($0,/(.*),(.*)/,a);print(a[1]"\n"a[2]"\n")}'
