Revisions to Joining entries based off of column using awk/join

added 40 characters in body

edited Mar 7, 2018 at 11:22

With your approach you have to use join twice (or change your approach so that a single join invocation is enough):

print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)
print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)
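
As a rough end-to-end sketch of the two passes above, the snippet below runs them on made-up data. The sample rows and the temp-file layout are assumptions for illustration only (your real files will differ); the only thing taken from the commands above is that fields are `|`-separated, the key is fields 1 and 2, and the value is field 4. Temp files replace the process substitutions so it also runs in a plain POSIX shell.

```shell
# Hypothetical sample data: 4 pipe-separated fields, key = fields 1+2, value = field 4.
tmp=$(mktemp -d)
printf '%s\n' 'a|x|foo|10' 'b|y|foo|20' > "$tmp/file1"
printf '%s\n' 'b|y|bar|25' 'c|z|bar|30' > "$tmp/file2"

# Prepend a combined key ($1-$2) and sort on it, since join needs sorted input.
for f in file1 file2; do
    awk -F'|' '{print $1"-"$2"|"$0}' "$tmp/$f" | LC_ALL=C sort -t'|' -k1,1 > "$tmp/$f.keyed"
done

# Pass 1 (-a1): paired lines plus unpairable lines from file1.
# Pass 2 (-v2): only the unpairable lines from file2.
# -e0 fills the fields that are missing on one side with 0.
result=$(
    LC_ALL=C join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 "$tmp/file1.keyed" "$tmp/file2.keyed"
    LC_ALL=C join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 "$tmp/file1.keyed" "$tmp/file2.keyed"
)
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample input the three output lines should be `a|x|10|0`, `b|y|20|25` and `c|z|0|30`: one key only in file1, one in both, one only in file2.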

You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1|$2, and then in the END block iterating over the indices of each array, comparing them and printing accordingly:

awk -F'|' 'NR==FNR{z[$1"|"$2]=$4;next}{x[$1"|"$2]=$4} END{for (j in x){if (!(j in z)){print j, "0", x[j]}}; for (i in z){if (i in x){print i, z[i], x[i]} else {print i, z[i], "0"}} }' OFS="|" file1 file2
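
For readability, here is the same logic spread over several lines with comments, run on made-up data. The sample rows are an assumption, not your real input. Since the order of `for (... in ...)` in awk is unspecified, the output is piped through sort here to make it predictable; drop that if the order does not matter to you.

```shell
# Hypothetical sample data: 4 pipe-separated fields, key = fields 1+2, value = field 4.
tmp=$(mktemp -d)
printf '%s\n' 'a|x|foo|10' 'b|y|foo|20' > "$tmp/file1"
printf '%s\n' 'b|y|bar|25' 'c|z|bar|30' > "$tmp/file2"

merged=$(
    awk -F'|' -v OFS='|' '
        # First file: remember field 4 under the combined key.
        NR == FNR { z[$1 "|" $2] = $4; next }
        # Second file: same, in a separate array.
        { x[$1 "|" $2] = $4 }
        END {
            # Keys that exist only in the second file get 0 for the first value.
            for (j in x) if (!(j in z)) print j, 0, x[j]
            # Keys from the first file: use the second value if present, else 0.
            for (i in z) print i, z[i], ((i in x) ? x[i] : 0)
        }
    ' "$tmp/file1" "$tmp/file2" | LC_ALL=C sort
)
printf '%s\n' "$merged"
rm -rf "$tmp"
```

On the sample data this should give the same three lines as the join approach: `a|x|10|0`, `b|y|20|25`, `c|z|0|30`.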

added 441 characters in body

edited Mar 7, 2018 at 0:28

don_crissti

You can't do this with a single join invocation.
You'll have to use join twice, e.g. using your code here with minor adjustments:

print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)
print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)

You can do the same with a single awk invocation, storing $4 in two arrays indexed by e.g. $1|$2, and then in the END block iterating over the indices of each array, comparing them and printing accordingly:

awk -F'|' 'NR==FNR{z[$1"|"$2]=$4;next}{x[$1"|"$2]=$4} END{for (j in x){if (!(j in z)){print j, "0", x[j]}}; for (i in z){if (i in x){print i, z[i], x[i]} else {print i, z[i], "0"}} }' OFS="|" file1 file2

answered Mar 6, 2018 at 23:24

don_crissti

You can't do this with a single join invocation.
You'll have to use join twice, e.g. using your code here with minor adjustments:

print the common lines and the unpairable lines from file1 with join -t'|' -e0 -a1 -o 1.2,1.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)
print the unpairable lines from file2 with join -t'|' -e0 -v2 -o 2.2,2.3,1.5,2.5 <(<file1 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1) <(<file2 awk -F'|' '{print $1"-"$2"|"$0}' | sort -t'|' -k1,1)
