Applying function to first column and inserting it as second

Question

So I have been searching like crazy, but I still have not found a satisfactory solution. I have some output that looks like the following:

```
kdeconnec 1625 1000 11u IPv6 414426 0t0 UDP *:1716
vivaldi-b 1937 1000 263u IPv4 440390 0t0 UDP 224.0.0.251:5353
electron 9522 1000 23u IPv4 414465 0t0 TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask 27084 1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
firefox 27094 1000 99u IPv4 425877 0t0 TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python 36425 1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
chromium 110937 1000 130u IPv4 439461 0t0 UDP 224.0.0.251:5353
```

I want to apply a function called exec_path_from_process_id to each value in the second column (the PID) and insert the result as the new second column, resulting in the following. The exact formatting (alignment) is not important, as long as the columns are aligned.

```
kdeconnec /usr/lib/kdeconnectd 1625 1000 11u IPv6 414426 0t0 UDP *:1716
vivaldi-b /opt/vivaldi/vivaldi-bin 1937 1000 263u IPv4 440390 0t0 UDP 224.0.0.251:5353
electron /usr/lib/electron/electron 9522 1000 23u IPv4 414465 0t0 TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask /usr/bin/python3.10 27084 1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
firefox /usr/lib/firefox/firefox 27094 1000 99u IPv4 425877 0t0 TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python /usr/bin/python3.10 36425 1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
chromium /usr/lib/chromium/chromium 110937 1000 130u IPv4 439461 0t0 UDP 224.0.0.251:5353
kioslave5 /usr/lib/kf5/kioslave5 133514 1000 6u IPv4 499063 0t0 TCP 192.168.0.17:54238->84.208.4.225:443 (ESTABLISHED)
```

My current code is a hot mess, but at least I got it working. The only constraint is that it has to work on bash 3.2+.

```bash
listeners=$( lsof -Pnl +M -i | awk -F" " '!_[$1]++' | tail -n +2 )

function exec_path_from_process_id () {
    local pid="${1}" path
    path=$(readlink -f /proc/"$pid"/exe)
    if [ -z "${path}" ]; then
        # Fall back to the last field of the ls -l listing of the link
        path=$(awk '{print $(NF)}' <<< "$(ls -alF /proc/"$pid"/exe)")
    fi
    echo "${path:-null}"
}

pids=($(awk '{ print $2 }' <<< "$listeners"))
IFS=$'\n' read -rd '' -a listeners_array <<< "$listeners"
IFS=$'\n' read -rd '' -a paths <<< "$(for i in "${pids[@]}"; do exec_path_from_process_id "$i"; done)"

for i in "${!pids[@]}"; do
    row="${listeners_array[i]}"
    row=$(awk -v r="${paths[i]}" '{ print $1 " " r " " $2 " " $3 " " $4 " " $5 " " $6 " " $7 " " $8 " " $9 " " $10}' <<< "$row")
    printf '%s\n' "${row[@]}"
done | column -t
```

@ibuprofen For instance, I get it for flatpak and sometimes snap packages. – N3buchadnezzar, Commented Aug 28, 2022 at 10:45
YMMV with piping the output to column -t to get alignment, since column -t splits any field that contains blanks, e.g. a file path, into separate columns. You can see in your question it's already splitting the final field on each line into 2 separate fields, e.g. <127.0.0.1:3000 (LISTEN)> becomes <127.0.0.1:3000> <(LISTEN)>. – Ed Morton, Commented Aug 28, 2022 at 12:49
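One way around that splitting, sketched here as an assumption rather than taken from either answer: write TAB-separated rows and tell column to split only on TABs with -s, an option both the util-linux and BSD versions of column accept. The helper name add_paths_tsv is hypothetical, readlink -f on /proc/PID/exe assumes Linux, and no field may contain a TAB.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read lsof-style lines on stdin, insert the executable
# path as column 2, and align with column -t using TAB as the only delimiter,
# so blanks inside a field (a path, "127.0.0.1:3000 (LISTEN)") stay intact.
# Assumes Linux /proc/PID/exe links and no TABs inside any field.
add_paths_tsv() {
    local cmd pid rest path
    while read -r cmd pid rest; do
        path=$(readlink -f "/proc/$pid/exe" 2>/dev/null)
        # ${path:-null} mirrors the fallback in the question's function
        printf '%s\t%s\t%s\t%s\n' "$cmd" "${path:-null}" "$pid" "$rest"
    done | column -t -s $'\t'
}

# Usage, skipping the lsof header line:
# lsof -Pnl +M -i | tail -n +2 | add_paths_tsv
```

The trade-off is that everything after the PID stays in one column, so its internal spacing is whatever lsof produced rather than being re-aligned field by field.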

ibuprofen · Accepted Answer · 2022-08-28 11:08:20Z

3

Perhaps something like:

```bash
lsof -Pnl +M -i | awk '
    # Use: NR > 1 to skip header
    NR > 1 && !x[$1]++ {
        # realpath -m
        # (no path components need exist or be a directory)
        cmd = "realpath -m /proc/"$2"/exe"
        cmd | getline path
        close(cmd)
        # We can edit field $2 and print $0
        $2 = path" "$2
        print $0
    }' | column -t
```

The line cmd | getline path runs the command stored in cmd and reads the first line of its output into the variable path. The pipe stays open until you call close() with the same expression, which is why I keep the command in a variable.
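A minimal demo of what close() changes, with echo standing in for the realpath command (getline_without_close and getline_with_close are made-up names for the illustration): without close(), awk reuses the pipe it already opened, which is at end of file after the first read; with close(), the command is run again.

```bash
#!/usr/bin/env bash
# Demo only: echo stands in for the realpath command from the answer.

# Without close(): the second read hits EOF on the pipe opened by the first.
getline_without_close() {
    awk 'BEGIN {
        cmd = "echo hello"
        for (i = 1; i <= 2; i++) {
            if ((cmd | getline line) > 0) print i ": " line
            else print i ": EOF"
        }
    }'
}

# With close(): each iteration starts a fresh run of the command.
getline_with_close() {
    awk 'BEGIN {
        cmd = "echo hello"
        for (i = 1; i <= 2; i++) {
            if ((cmd | getline line) > 0) print i ": " line
            else print i ": EOF"
            close(cmd)
        }
    }'
}

getline_without_close   # prints "1: hello" then "2: EOF"
getline_with_close      # prints "1: hello" then "2: hello"
```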

edited Aug 28, 2022 at 11:08

answered Aug 28, 2022 at 10:55

ibuprofen


This looks really promising! What does the line c | getline p do? – N3buchadnezzar, Commented Aug 28, 2022 at 11:00
@N3buchadnezzar expanded on it a little. See also, for example: stackoverflow.com/q/1960895/3342816, unix.stackexchange.com/a/139559/140633, etc. – ibuprofen, Commented Aug 28, 2022 at 11:10

cmd = "realpath -m /proc/"$2"/exe" should be cmd = "realpath -m \047/proc/"$2"/exe\047" or the contents of $2 will be exposed to the shell for globbing, word splitting, and filename expansion. cmd | getline path should be if ( (cmd | getline path) > 0 ) { ... or similar to protect against failures in cmd | getline, see awk.freeshell.org/AllAboutGetline. – Ed Morton, Commented Aug 28, 2022 at 12:08

Regarding !x[$1]++: an array used in that context is commonly, idiomatically named seen instead of x for clarity, i.e. !seen[$1]++. Changing a field as in $2 = path" "$2 will change all of the white space on the line to individual blank chars, which may be undesirable. I know the OP is outputting to column -t, but that will corrupt the output if a path contains spaces. print $0 can be written as just print, since $0 is what is printed by default. – Ed Morton, Commented Aug 28, 2022 at 12:10

@EdMorton Thanks for the pointers. – ibuprofen, Commented Aug 30, 2022 at 5:46
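Folding the suggestions from these comments back into the answer could look like the sketch below: the path is single-quoted with \047, getline's return value is checked, the array is named seen, and the path is spliced into $0 with substr() so the original spacing survives. It assumes GNU coreutils realpath (for -m) and Linux /proc; the function name add_exe_paths and the N/A fallback are illustrative choices, not part of either comment.

```bash
#!/usr/bin/env bash
# Sketch: ibuprofen's filter with the changes suggested in the comments.
# Assumes GNU coreutils realpath (for -m) and Linux /proc/PID/exe links.
add_exe_paths() {
    awk '
        # NR > 1 skips the lsof header; seen[] keeps the first row per command
        NR > 1 && !seen[$1]++ {
            # \047 is a single quote: the shell sees the path as one literal word
            cmd = "realpath -m \047/proc/" $2 "/exe\047"
            path = "N/A"
            # Only use the result if getline actually read a line
            if ( (cmd | getline line) > 0 )
                path = line
            close(cmd)
            # Splice the path in after field 1 instead of assigning to $2,
            # which would rebuild $0 and squeeze all white space to single blanks
            match($0, /^[ \t]*[^ \t]+[ \t]+/)
            print substr($0, 1, RLENGTH) path " " substr($0, RLENGTH + 1)
        }
    '
}

# Usage:
# lsof -Pnl +M -i | add_exe_paths
```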


Ed Morton · Accepted Answer · 2022-08-28 13:45:28Z

You said you don't care about the formatting as long as the fields are aligned, so just pick a width that'll be wide enough for your needs and then:

```
$ while read -r a pid b; do
    printf "%-12s%-10s%10s %s\n" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"
done < <(lsof -Pnl +M -i)
kdeconnec   <5>             1625 1000 11u IPv6 414426 0t0 UDP *:1716
vivaldi-b   <5>             1937 1000 263u IPv4 440390 0t0 UDP 224.0.0.251:5353
electron    <5>             9522 1000 23u IPv4 414465 0t0 TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask       <6>            27084 1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
firefox     <6>            27094 1000 99u IPv4 425877 0t0 TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python      <6>            36425 1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
chromium    <7>           110937 1000 130u IPv4 439461 0t0 UDP 224.0.0.251:5353
```

The above assumes your first column doesn't contain any spaces.

Obviously, just change <$(wc -c <<<"$pid")> to whatever real command you need to run, and the %-10s to the maximum width of any string that command could output. If you REALLY feel there is no maximum you could use for that width, let us know, as then it'd take a 2-pass approach: one pass to produce the output and a second to format it. If you're happy using column -t for the formatting, then it'd be the following (replace file with <(lsof -Pnl +M -i), which obviously I don't really have available):

```
$ while read -r a pid b; do
    printf "%s %s %s %s\n" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"
done < file | column -t
kdeconnec  <5>  1625    1000  11u   IPv6  414426  0t0  UDP  *:1716
vivaldi-b  <5>  1937    1000  263u  IPv4  440390  0t0  UDP  224.0.0.251:5353
electron   <5>  9522    1000  23u   IPv4  414465  0t0  TCP  192.168.0.17:58692->157.240.194.18:443  (ESTABLISHED)
flask      <6>  27084   1000  3u    IPv4  109532  0t0  TCP  127.0.0.1:3000                          (LISTEN)
firefox    <6>  27094   1000  99u   IPv4  425877  0t0  TCP  192.168.0.17:34114->54.191.222.112:443  (ESTABLISHED)
python     <6>  36425   1000  3u    IPv4  109532  0t0  TCP  127.0.0.1:3000                          (LISTEN)
chromium   <7>  110937  1000  130u  IPv4  439461  0t0  UDP  224.0.0.251:5353
```

but that would fail if any part of your line contained spaces, e.g. the output of the command you're running on the pid.
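The fixed-width printf loop at the top of this answer doesn't have that problem, because nothing re-splits its output. Here is a sketch of it with a trimmed-down copy of the question's exec_path_from_process_id in place of the wc -c placeholder; the 30-character path width and the function name format_fixed_width are assumptions, and the question's ls -alF fallback is left out.

```bash
#!/usr/bin/env bash
# Trimmed-down version of the question's lookup: Linux /proc assumed,
# "null" when the link cannot be resolved.
exec_path_from_process_id() {
    local path
    path=$(readlink -f "/proc/$1/exe" 2>/dev/null)
    printf '%s\n' "${path:-null}"
}

# Fixed-width layout as in the answer; 30 is an assumed maximum path width.
# Like the answer, this assumes the first column contains no spaces.
format_fixed_width() {
    local a pid b
    while read -r a pid b; do
        printf '%-12s%-30s%10s %s\n' "$a" "$(exec_path_from_process_id "$pid")" "$pid" "$b"
    done
}

# Usage:
# lsof -Pnl +M -i | format_fixed_width
```

A path longer than 30 characters is still printed in full; it just shifts the rest of that line to the right.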

Since you asked, here's a 2-pass approach:

Instead of outputting text that has spaces separating fields and newlines separating records as above, produce output that uses newlines to separate fields and NUL to separate records:

```
while read -r a pid b; do printf "%s\n%s\n%s\n%s\0" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"; done < file
```

Then write an awk script that reads the NUL-separated records of newline-separated fields, calculates the maximum width of each field while reading the input, and prints each field padded to that width, recombining the fields into single lines:

```
$ while read -r a pid b; do printf "%s\n%s\n%s\n%s\0" "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"; done < file |
awk -v RS='\0' -F'\n' '
    {
        recs[NR]=$0
        for (i=1; i<=NF; i++) wids[i]=(length($i)>wids[i] ? length($i) : wids[i])
    }
    END {
        for (n=1; n<=NR; n++) {
            $0=recs[n]
            for (i=1;i<=NF;i++) printf "%-*s%s", wids[i], $i, (i<NF ? OFS : ORS)
        }
    }
'
kdeconnec <5> 1625   1000 11u IPv6 414426 0t0 UDP *:1716
vivaldi-b <5> 1937   1000 263u IPv4 440390 0t0 UDP 224.0.0.251:5353
electron  <5> 9522   1000 23u IPv4 414465 0t0 TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask     <6> 27084  1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
firefox   <6> 27094  1000 99u IPv4 425877 0t0 TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python    <6> 36425  1000 3u IPv4 109532 0t0 TCP 127.0.0.1:3000 (LISTEN)
chromium  <7> 110937 1000 130u IPv4 439461 0t0 UDP 224.0.0.251:5353
```

That requires an awk that can read NUL-separated input, e.g. GNU awk. It assumes that none of your path names or other fields can contain newlines.
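If GNU awk isn't available, one possible variant (a swapped-in technique, not the one above) keeps the same 2-pass idea but uses TAB as the field separator and newline as the record separator, which any POSIX awk can read. It assumes no field contains a TAB or newline, builds each printf format as a string so it does not depend on %-*s support, and leaves the last field unpadded; align_tsv is a made-up name.

```bash
#!/usr/bin/env bash
# Pass 2 of a two-pass layout for TAB-separated input: record the widest
# value per column, then print every column except the last padded to it.
align_tsv() {
    awk '
        BEGIN { FS = "\t" }
        {
            recs[NR] = $0
            for (i = 1; i <= NF; i++)
                if (length($i) > wids[i]) wids[i] = length($i)
        }
        END {
            for (n = 1; n <= NR; n++) {
                nf = split(recs[n], f, "\t")
                for (i = 1; i < nf; i++) {
                    fmt = "%-" wids[i] "s "
                    printf fmt, f[i]
                }
                print f[nf]
            }
        }
    '
}

# Usage, pass 1 writing TAB-separated fields (wc -c is the placeholder command):
# while read -r a pid b; do
#     printf '%s\t%s\t%s\t%s\n' "$a" "<$(wc -c <<<"$pid")>" "$pid" "$b"
# done < file | align_tsv
```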

If you REALLY wanted to do all of the above in a single awk script, awk would have to spawn a subshell every time it calls your external command, which would be slow, and you'd have to make sure you get the quoting right (see http://awk.freeshell.org/AllAboutGetline). But here you go, assuming your input has no spaces within fields that you care about retaining (spaces other than newlines in the command's output, e.g. in paths, would be fine):

```
$ awk '
    {
        recs[NR] = $0
        for (i=1; i<=NF; i++) {
            lgth = length($i)
            wids[i] = ( lgth > wids[i] ? lgth : wids[i] )
        }
        cmd = "wc -c <<<\047" $2 "\047"
        paths[NR] = ( (cmd | getline line) > 0 ? line : "N/A" )
        close(cmd)
        lgth = length(paths[NR])
        pathWid = ( lgth > pathWid ? lgth : pathWid )
    }
    END {
        for (n=1; n<=NR; n++) {
            $0 = recs[n]
            for (i=1; i<=NF; i++) {
                if ( i == 2 ) {
                    printf "%-*s%s", pathWid, paths[n], OFS
                }
                printf "%-*s%s", wids[i], $i, (i<NF ? OFS : ORS)
            }
        }
    }
' < file
kdeconnec 5 1625   1000 11u  IPv6 414426 0t0 UDP *:1716
vivaldi-b 5 1937   1000 263u IPv4 440390 0t0 UDP 224.0.0.251:5353
electron  5 9522   1000 23u  IPv4 414465 0t0 TCP 192.168.0.17:58692->157.240.194.18:443 (ESTABLISHED)
flask     6 27084  1000 3u   IPv4 109532 0t0 TCP 127.0.0.1:3000                         (LISTEN)
firefox   6 27094  1000 99u  IPv4 425877 0t0 TCP 192.168.0.17:34114->54.191.222.112:443 (ESTABLISHED)
python    6 36425  1000 3u   IPv4 109532 0t0 TCP 127.0.0.1:3000                         (LISTEN)
chromium  7 110937 1000 130u IPv4 439461 0t0 UDP 224.0.0.251:5353
```
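One caveat with the placeholder in that script: awk runs cmd through /bin/sh, and <<< is a bash feature, so where /bin/sh is dash (the default on Debian and Ubuntu) the command fails and every entry becomes N/A. A sketch of the same placeholder using only POSIX shell syntax, with the lookup pulled out into a hypothetical pid_bytes function:

```bash
#!/usr/bin/env bash
# For each input line, print the byte count of field 2 plus a newline,
# i.e. what wc -c <<<"$2" gives in bash, but using only POSIX sh syntax.
pid_bytes() {
    awk '{
        # builds: printf "%s\n" <pid> | wc -c, with both args single-quoted via \047
        cmd = "printf \047%s\\n\047 \047" $2 "\047 | wc -c"
        # + 0 drops the leading blanks that BSD wc prints
        n = ( (cmd | getline line) > 0 ? line + 0 : "N/A" )
        close(cmd)
        print n
    }'
}

# Usage:
# printf 'kdeconnec 1625 1000\n' | pid_bytes    # prints 5
```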

Hmm, would normal file paths ever contain spaces? I don't mind if the end of the output isn't aligned. – N3buchadnezzar, Commented Aug 28, 2022 at 12:59
@N3buchadnezzar yes, file paths often contain spaces. Assuming they won't, and so not handling them properly when they do, is a common point of failure for shell scripts. – Ed Morton, Commented Aug 28, 2022 at 13:02
Hmm, interesting. I would be interested in the two-pass solution then. As the number of elements is quite small, the difference in speed would be negligible. – N3buchadnezzar, Commented Aug 28, 2022 at 13:06
OK, I posted a 2-pass approach. I'm using a shell loop to run your shell command (which is what shell is good for), then an awk script to format the output (which is what awk is good for). – Ed Morton, Commented Aug 28, 2022 at 13:19
