added 179 characters in body
cas

BTW, if you don't want the almost-empty lines (with just [HH:MM:SS]) to be printed, change the print "$out\n"; line to print "$out\n" unless ($out =~ /^\[[^]]+\]\s*$/);
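
The pattern can be tried on its own before it goes into the script. This is only a minimal sketch: the helper name is_timestamp_only and the sample strings are made up for illustration. Note the + after the [^]] character class, which lets it match the whole timestamp rather than a single character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): true if a line
# holds only a bracketed timestamp such as "[10:04:36]", optionally
# followed by whitespace.
sub is_timestamp_only {
    my ($line) = @_;
    return $line =~ /^\[[^]]+\]\s*$/ ? 1 : 0;
}

# Illustrative lines, not taken from the real input file.
for my $out ('[10:04:36]', '[10:04:36]   ', '[10:04:36] CW2B1 F') {
    print "$out\n" unless is_timestamp_only($out);
}
```

Only the last of the three lines is printed, because it is the only one with something after the timestamp.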

added cosmetically-enhanced version

Updated version:

This version extracts the first 34 characters of each line as the "first word", uses sprintf() to format each element of @out as a 10-character-wide string, and strips trailing spaces from the output. It also uses \s* and \s+ instead of a single space to match (or split) on any white-space characters. Comments have been removed for brevity, as the comments in the original script are still applicable (changes are trivial).

    $ cat remove-dupes2.pl
    #!/usr/bin/perl

    use strict;

    my %vars;

    while(<>) {
      chomp;
      my @out = ();

      my $first_word;
      ($first_word = $_) =~ s/\b\w+\s+[TF]//g;
      $first_word =~ s/^(.{34})\s+$/$1 /;
      push @out, $first_word if $first_word;

      while (/\s*\b(\w+\s+[TF])\b/g) {
        my ($key,$val) = split /\s+/, $1;
        next if ($vars{$key} eq $val);
        push @out, sprintf("%-10s",$1);
        $vars{$key} = $val;
      };

      my $out = join(" ",@out);
      $out =~ s/\s+$//;  # remove trailing spaces
      print "$out\n";
    }
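
The padding and trailing-space steps are easier to see in isolation. The following is a rough sketch only: format_fields is a hypothetical helper name, and it assumes the fields are joined with a single space as in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: left-justify each field in a 10-character column,
# join the columns with a single space, then strip the padding that is
# left over after the last field.
sub format_fields {
    my @fields = @_;
    my $out = join(" ", map { sprintf("%-10s", $_) } @fields);
    $out =~ s/\s+$//;    # remove trailing spaces
    return $out;
}

# Fields of different lengths still start in the same columns.
print format_fields("CW1B1 T", "CW10B4 T"), "\n";
print format_fields("CW10B8 T", "CW2B1 F"), "\n";
```

Because every field is padded to the same width, a 7-character pair like "CW1B1 T" and an 8-character pair like "CW10B4 T" line up in columns, which is what brings the output closer to the hand-formatted sample.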

Sample output using your new input data (input.txt). It's still not exactly the same as your sample output (there are still a few whitespace differences), but it is a LOT closer than the output from the original script.

    $ ./remove-dupes2.pl input.txt
    [10:03:43] 10:03:35 22JUN22 ConUP CW1B1 T CW1B3 T CW1B5 T CW1B7 T
    [10:03:44] CW2B1 T CW2B4 T CW2B5 T CW2B7 T
    [10:03:44] CW3B1 T CW3B4 T CW3B5 T CW4B4 T
    [10:03:44] CW4B8 T CW5B4 T CW5B8 T CW6B4 T
    [10:03:44] CW6B8 T CW7B8 T CW8B4 T CW8B8 T
    [10:03:44] CW9B4 T CW9B8 T CW10B4 T CW10B8 T
    [10:03:44] CW11B4 T CW11B8 T CW12B4 T CW12B8 T
    [10:04:16] 10:04:28 22JUN22 ConUP
    [10:04:36] CW2B1 F CW2B3 F CW2B5 F CW2B7 F
    [10:04:36]
    [10:04:36]
    [10:04:37]
    [10:04:37]
    [10:04:37]

Using diff to confirm that the only differences are spaces (output.txt is your sample output):

    $ ./remove-dupes2.pl input.txt > out.txt
    $ diff --ignore-space-change -u output.txt out.txt
    $


The following Perl script removes duplicate variable values from each input line. Only variables that we either haven't seen before, or that have changed since the last time they were seen, are included in the output. Empty lines (i.e. those with no new or changed variables) aren't printed.

    $ cat remove-dupes.pl
    #!/usr/bin/perl

    use strict;

    # %vars is a hash (associative array) to store last-seen T/F values
    # for the "variables".  This is global in scope, i.e. defined here
    # outside of the main loop because its keys & values need to be
    # remembered across lines.
    my %vars;

    # read in and process each input line
    while(<>) {
      chomp;  # remove newline character at end of line

      # @out is an array to hold "VAR [TF]" values that we either
      # haven't seen before or have changed since last time we saw
      # them. i.e. the values we want to output.  It needs to be
      # reset to empty every time we read a line.
      my @out = ();

      # extract the first word (i.e. "ConUp") from input line
      # by deleting all "VAR [TF]" strings from the line
      my $first_word = $_;
      $first_word =~ s/ ?\w+\s+[TF]//g;

      # add it to @out if it isn't empty
      push @out, $first_word if $first_word;

      # Iterate over each "VAR [TF]" string in the line.  Split
      # it into the key (variable name) and value (T or F).
      # If it's new or changed then add it to @out and
      # remember its current value in %vars
      while (/ ?\b(\w+ [TF])\b/g) {
        my ($key,$val) = split / /, $1;
        next if ($vars{$key} eq $val);
        push @out, $1;
        $vars{$key} = $val;
      };

      # join (with spaces) and print @out if it isn't empty.
      if (@out) {print join(" ",@out), "\n"};
    }

NOTE: if there are any other strings that aren't "VAR [TF]" on any line, they will all be moved to the beginning of that line in the output. This may or may not be considered a bug. Since the only such instance in your sample data is the "ConUp" at the beginning of the first line, it works as required. The variable probably shouldn't be called $first_word, but that's a lot shorter than $anything_that_doesnt_match_a_boolean_variable_pattern.
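
To make that note concrete, here is a minimal sketch that wraps the same per-line logic in a function so it can be called on a single string. The function name dedupe_line, the hash-ref argument, and the stray word "extra" are all made up for illustration. It also adds a defined() check so that it runs cleanly under use warnings, which the script itself does not enable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the script's per-line logic.
# $vars is a hash ref holding the last-seen T/F value of each variable.
sub dedupe_line {
    my ($line, $vars) = @_;
    my @out;

    # Everything that is not a "VAR [TF]" pair ends up in $rest.
    (my $rest = $line) =~ s/ ?\w+\s+[TF]//g;
    push @out, $rest if $rest;

    # Keep only the pairs that are new or have changed.
    while ($line =~ / ?\b(\w+ [TF])\b/g) {
        my ($key, $val) = split / /, $1;
        next if defined $vars->{$key} && $vars->{$key} eq $val;
        push @out, $1;
        $vars->{$key} = $val;
    }
    return join(" ", @out);
}

my %vars;
# "extra" sits between two variables in the input ...
print dedupe_line("ConUp CW1B1 T extra CW1B2 F", \%vars), "\n";
# ... but is printed right after "ConUp":
# ConUp extra CW1B1 T CW1B2 F
```

The stray word keeps its position relative to other non-variable text but jumps ahead of every variable on the line, which is the behaviour described in the note.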

Your input file doesn't contain any duplicate variable values that should be removed, so I've created one that does (and adds a few instances of variables that change, too):

    $ cat file2.txt
    ConUp CW1B1 T CW1B2 T CW1B3 F CW1B4 F
    CW2B1 T CW2B2 F CW2B3 F CW2B4 T
    CW2B1 F CW2B2 F CW2B3 F CW2B4 T
    CW2B1 F CW2B2 F CW2B3 F CW2B4 T
    CW2B1 T CW2B2 F CW2B3 F CW2B4 T
    CW2B1 F CW2B2 F CW2B3 T CW2B4 T

Sample output:

    $ ./remove-dupes.pl file2.txt
    ConUp CW1B1 T CW1B2 T CW1B3 F CW1B4 F
    CW2B1 T CW2B2 F CW2B3 F CW2B4 T
    CW2B1 F
    CW2B1 T
    CW2B1 F CW2B3 T