don't reinvent the wheel; much less do so badly
terdon


And here's an awk approach:

$ awk -F'[ ,]' -v RS='\n\n' '{n=split($0,a,"\n"); print $2,n-1}' file
[email protected] 1
[email protected] 2
[email protected] 0
[email protected] 2   <-- WRONG! See text

Explanation

  • -F'[ ,]': set the field separator to space or comma.
  • -v RS='\n\n': set the record separator to \n\n, so that each record is a paragraph. (A multi-character RS is treated as a regular expression by gawk and mawk; POSIX leaves the behavior unspecified.)
  • n=split($0,a,"\n");: split the current record (paragraph) on \n into the array a. The number returned (n) is the number of elements in this array, so the number of lines in this record, and therefore the number of attributes plus one.
  • print $2,n-1: print the second field (since we are using space and comma as the field separators, on your file this will be the string after the first space and before the first ,), and the value of n minus one, so the number of lines minus one.

Note that this gets the last record wrong. That's because the file doesn't end with two consecutive newline characters but only with the default trailing newline that all text files require. As a result, that trailing newline stays inside the last record, and split() counts one extra, empty element for it. You can get around this by adding an extra newline:

$ printf '%s\n\n' "$(cat file)" | awk -F'[ ,]' -v RS='\n\n' '{n=split($0,a,"\n"); print $2,n-1}'
[email protected] 1
[email protected] 2
[email protected] 0
[email protected] 1

Or, print each record only after moving on to the next one, and then handle the last one specially:

$ awk -F'[ ,]' -v RS='\n\n' '{ if(last){print last}; n=split($0,a,"\n"); last=$2" "n-1} END{print $2,n-2 }' file
[email protected] 1
[email protected] 2
[email protected] 0
[email protected] 1
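
To see where the off-by-one in the last record comes from, here is a minimal sketch that calls split() on two literal strings instead of on real input. The strings and the function name split_demo are made up for illustration; they only mimic a record of one header line plus one attribute line, without and with a trailing newline.

```shell
# Hypothetical record text, not taken from the real input file.
split_demo() {
    awk 'BEGIN {
        # An inner record: the \n\n separator has been removed entirely.
        inner = split("header\nattr", a, "\n")
        # The last record with RS="\n\n": the single trailing newline remains,
        # so split() returns one extra, empty element.
        final = split("header\nattr\n", b, "\n")
        print inner, final
    }'
}
split_demo
```

This prints 2 3: the same two lines are counted as three elements once the trailing newline is part of the record, which is why the END block above subtracts 2 rather than 1.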

And here's an awk approach:

$ awk -F'[ ,]' -v RS= '{n=split($0,a,"\n"); print $2,n-1}' file
[email protected] 1
[email protected] 2
[email protected] 0
[email protected] 1

Explanation

  • -F'[ ,]': set the field separator to space or comma.
  • -v RS=: enables paragraph mode, where records are separated by one or more consecutive blank lines.
  • n=split($0,a,"\n");: split the current record (paragraph) on \n into the array a. The number returned (n) is the number of elements in this array, so the number of lines in this record, and therefore the number of attributes plus one.
  • print $2,n-1: print the second field (since we are using space and comma as the field separators, on your file this will be the string after the first space and before the first ,), and the value of n minus one, so the number of lines minus one.
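
For a self-contained way to try this, here is a sketch that runs the same awk command on a small sample file. The file contents, the addresses and the function name count_attrs are all assumptions made up for the example, since the real input is not shown here; the only thing taken from the answer is the awk invocation itself.

```shell
# Hypothetical sample data: each paragraph is a header line followed by
# zero or more attribute lines, with blank lines between paragraphs.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
user alice@example.com,group=staff
attr1: x

user bob@example.com,group=staff
attr1: x
attr2: y

user carol@example.com,group=staff
EOF

# Same command as above: paragraph mode (empty RS), fields split on space or comma.
count_attrs() {
    awk -F'[ ,]' -v RS= '{ n = split($0, a, "\n"); print $2, n-1 }' "$1"
}
count_attrs "$sample"
```

On this sample the output should be alice@example.com 1, bob@example.com 2 and carol@example.com 0, one per line. The last paragraph is counted correctly even though the file ends with a single newline, because paragraph mode does not leave that newline in the final record.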
