Revisions to Merging two text files using AWK

added 35 characters in body

edited Oct 29, 2021 at 20:58

awk '!second { seen[$1 FS $2 ]=$0 } second { print (($1 FS $2 in seen)?seen[$1 FS $2]: $1 FS $2 FS "0 0 0 0") FS $3, $4, $5, $6; delete seen[$1 FS $2] } END{ for(x in seen) print seen[x], "0 0 0 0" }' file1 second=1 file2

This will work as long as there is enough memory to load all of file1.

The first block, !second {...}, runs only while the first file is being read; it loads file1 into an associative array keyed on the pair of 1st and 2nd columns.

The second block, second {...}, runs only while the second file is being read. For keys that exist in both files it prints the joined line; otherwise it prints the key and zeros, followed by the remaining columns from file2. It then removes the key from the array with delete seen[$1 FS $2], so keys present in both files are no longer stored.

The last block, END, prints the remaining file1 lines whose keys had no match in file2.

awk '!second { file1vals[$1 FS $2]=$0 } second { print (($1 FS $2 in file1vals)?file1vals[$1 FS $2]: $1 FS $2 FS "0 0 0 0") FS $3, $4, $5, $6; delete file1vals[$1 FS $2] } END{ for(x in file1vals) print file1vals[x], "0 0 0 0" }' file1 second=1 file2

This will work as long as there is enough memory to load all of file1.

The first block, !second {...}, runs only while the first file is being read; it loads file1 into an associative array keyed on the pair of 1st and 2nd columns.
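How the two blocks are switched may be easier to see in isolation. The following is a minimal sketch, assuming throwaway files named first and second_file (both names and their contents are made up for illustration): an assignment such as second=1 placed between the file operands is applied only when awk reaches it, so the variable is still unset while the first file is read.

```shell
# Hypothetical sample files, created only for this demonstration.
tmpdir=$(mktemp -d)
printf '%s\n' 'a1' 'a2' > "$tmpdir/first"
printf '%s\n' 'b1' > "$tmpdir/second_file"

# second is unset (false) for the first file and becomes 1
# just before the second file is opened.
flags=$(awk '{ label = second ? "file2" : "file1"; print label ": " $0 }' \
  "$tmpdir/first" second=1 "$tmpdir/second_file")

printf '%s\n' "$flags"
rm -rf "$tmpdir"
```

The lines from the first file are labelled file1 and the line from the second file is labelled file2, which is the same test that !second and second perform in the command above.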

The second block, second {...}, runs only while the second file is being read. For keys that exist in both files it prints the joined line; otherwise it prints the key and zeros, followed by the remaining columns from file2. It then removes the key from the array with delete file1vals[$1 FS $2], so keys present in both files are no longer stored.

The last block, END, prints the remaining file1 lines whose keys had no match in file2.
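To make the three cases concrete, here is a small worked run of the same command, spread over several lines only for readability. The sample data is hypothetical: it assumes both files have two key columns followed by four value columns, which is what the "0 0 0 0" padding suggests but is not stated above.

```shell
# Hypothetical sample data: two key columns plus four value columns per line.
tmpdir=$(mktemp -d)
printf '%s\n' 'A 1 10 11 12 13' 'B 2 20 21 22 23' > "$tmpdir/file1"
printf '%s\n' 'A 1 30 31 32 33' 'C 3 40 41 42 43' > "$tmpdir/file2"

merged=$(awk '
  # First file: store each whole line under the key built from columns 1 and 2.
  !second { file1vals[$1 FS $2] = $0 }
  # Second file: print the joined line, or zeros in place of the file1 values.
  second {
    print (($1 FS $2 in file1vals) ? file1vals[$1 FS $2] : $1 FS $2 FS "0 0 0 0") FS $3, $4, $5, $6
    delete file1vals[$1 FS $2]
  }
  # Keys still stored at the end exist only in file1.
  END { for (x in file1vals) print file1vals[x], "0 0 0 0" }
' "$tmpdir/file1" second=1 "$tmpdir/file2")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

With this input, key A 1 is in both files and is printed joined, C 3 is only in file2 and gets zeros for the file1 values, and B 2 is only in file1 and gets zeros for the file2 values. The sample keeps a single file1-only key on purpose, because the order in which for (x in file1vals) visits several leftover keys is unspecified.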

added 642 characters in body

edited Oct 29, 2021 at 18:46

αғsнιη

awk '!second { seen[$1 FS $2 ]=$0; next } second { print (($1 FS $2 in seen)?seen[$1 FS $2]: $1 FS $2 FS "0 0 0 0") FS $3, $4, $5, $6; delete seen[$1 FS $2] } END{ for(x in seen) print seen[x], "0 0 0 0" }' file1 second=1 file2

This will work as long as there is enough memory to load all of file1.

awk '!second { seen[$1 FS $2 ]=$0 } second { print (($1 FS $2 in seen)?seen[$1 FS $2]: $1 FS $2 FS "0 0 0 0") FS $3, $4, $5, $6; delete seen[$1 FS $2] } END{ for(x in seen) print seen[x], "0 0 0 0" }' file1 second=1 file2

This will work as long as there is enough memory to load all of file1.

The first block, !second {...}, runs only while the first file is being read; it loads file1 into an associative array keyed on the pair of 1st and 2nd columns.

The second block, second {...}, runs only while the second file is being read. For keys that exist in both files it prints the joined line; otherwise it prints the key and zeros, followed by the remaining columns from file2. It then removes the key from the array with delete seen[$1 FS $2], so keys present in both files are no longer stored.

The last block, END, prints the remaining file1 lines whose keys had no match in file2.

added 105 characters in body

edited Oct 29, 2021 at 18:27

αғsнιη

awk '!second { seen[$1 FS $2 ]=$0; next } second { print (($1 FS $2 in seen)?seen[$1 FS $2]: $1 FS $2 FS "0 0 0 0") FS $3, $4, $5, $6; delete seen[$1 FS $2] } END{ for(x in seen) print seen[x], "0 0 0 0" }' file1 second=1 file2

awk '!second { seen[$1 FS $2 ]=$0; next } second { print (($1 FS $2 in seen)?seen[$1 FS $2]: $1 FS $2 FS "0 0 0 0") FS $3, $4, $5, $6; delete seen[$1 FS $2] } END{ for(x in seen) print seen[x], "0 0 0 0" }' file1 second=1 file2

This will work as long as there is enough memory to load all of file1.

answered Oct 29, 2021 at 18:18

αғsнιη
