
I'm trying to figure out a way of counting how many attribute values (for a multi-valued attribute in LDAP) various different users have. For example, the data looks something like this...

    dn: [email protected],ou=test,dc=acme,dc=com
    accountid: a45ff948-e154-4c48-aa74-5b64ea876735

    dn: [email protected],ou=test,dc=acme,dc=com
    accountid: f8103174-7853-4b0c-8d0e-faa820c8eff8
    accountid: 3bea64d3-98d5-4ff1-b654-d01e4e3128cd

    dn: [email protected],ou=test,dc=acme,dc=com

    dn: [email protected],ou=test,dc=acme,dc=com
    accountid: 90ad7323-20ca-4087-9b13-62d5713ae57e

I'd like to have output along the lines of...

    [email protected],ou=test,dc=acme,dc=com , 1
    [email protected],ou=test,dc=acme,dc=com , 2
    [email protected],ou=test,dc=acme,dc=com , 0
    [email protected],ou=test,dc=acme,dc=com , 1

Or at the very least I'd just like to know (and print out) which DNs have multiple values for the accountid attribute.

Any ideas please?

Thanks in advance!

  • Does your input include any other attributes or is it ONLY accountids? If there are others, add a few to your sample input so we can test with that. Commented yesterday

4 Answers

This seems very well suited to Perl in paragraph mode, where a record ("line") is defined by two consecutive newline characters. Something like this:

    $ perl -00 -ne '@n=split(/\n/); /(uid=.+?),/; print "$1, $#n\n"' file
    [email protected], 1
    [email protected], 2
    [email protected], 0
    [email protected], 1

Explanation

  • -00: enable paragraph mode
  • -ne: read the input file line by line (here, because of the previous bullet point, a line is a paragraph) and apply the script given by -e to each line.
  • @n=split(/\n/): split the input line on \n characters (newline) into the array @n. This array will now have as many elements as there are actual lines in the paragraph.
  • /(uid=.+?),/: capture the shortest string that starts with uid= and runs up to (but not including) the first comma, saving it as $1.
  • print "$1, $#n\n": print what was captured above followed by the last index of the array @n. Since arrays start from 0, the last index will be the same as the number of elements in the array minus one. Since we want the number of attribute lines which seems to be the number of lines in the paragraph excluding the first one, this gives the expected result.
4
  • Thanks so much for the quick response - it did exactly what I needed!! One final question please if you don't mind... do you know a way of printing out all/only the ones that have more than one accountid line please? I'd imagine it's something like "print the entire 'line' if the array count is greater than 1...? For example, in the above example it'd be only the whole of the user2 object that I'd be looking for please. Commented yesterday
  • @darrensunley perl -00 -ne '@n=split(/\n/); print if $#n > 1' file should do it. Or, since this is Perl and TIMTOWTDI, perl -00 -ne 'print if scalar(split(/\n/)) > 2' file. Commented yesterday
  • Thanks again - you've been super helpful!!! Commented yesterday
  • Shouldn't it be @n=grep {/^accountid:/} split(/\n/) (just to be safe)? Commented 56 mins ago

Using Miller (mlr) to read the data as "xtab" input (a format where records are separated by empty lines and fields are separated by newlines, with the field name and a tab at the start of the line), with the tabs in the format replaced by : (colon+space):

    $ mlr --xtab --ips ': ' put -q 'print $dn, NF - 1' file
    [email protected],ou=test,dc=acme,dc=com 1
    [email protected],ou=test,dc=acme,dc=com 2
    [email protected],ou=test,dc=acme,dc=com 0
    [email protected],ou=test,dc=acme,dc=com 1

This simply outputs the dn field and the count of however many other fields there are in the record.

If you need commas in the output, use print $dn . ", " . string(NF - 1) as the put expression. Wrap that in a conditional that only prints the expression if NF > 1 (if you wish), like so:

    $ mlr --xtab --ips ': ' put -q 'NF > 1 { print $dn . ", " . string(NF - 1) }' file
    [email protected],ou=test,dc=acme,dc=com, 1
    [email protected],ou=test,dc=acme,dc=com, 2
    [email protected],ou=test,dc=acme,dc=com, 1

Alternatively, add the count as a new field and then cut out the dn field and your new field (output is on a whitespace-delimited indexed fields format ("nidx")):

    $ mlr --x2n --ips ': ' put '$c = NF - 1' then cut -f dn,c file
    [email protected],ou=test,dc=acme,dc=com 1
    [email protected],ou=test,dc=acme,dc=com 2
    [email protected],ou=test,dc=acme,dc=com 0
    [email protected],ou=test,dc=acme,dc=com 1

Add --ofs ', ' to the options if you want comma+space as the output field delimiter. Here's how that may look, together with filtering out any record whose calculated c value is zero:

    $ mlr --x2n --ips ': ' --ofs ', ' put '$c = NF - 1' then filter -x '$c == 0' then cut -f dn,c file
    [email protected],ou=test,dc=acme,dc=com, 1
    [email protected],ou=test,dc=acme,dc=com, 2
    [email protected],ou=test,dc=acme,dc=com, 1
  • No need to filter after put as in mlr --x2n --ips ': ' --ofs ', ' put '$c = NF - 1; $c == 0' then cut -f dn,c' file1. And You know it. Commented yesterday
  • @PrabhjotSingh So it is, but it's not evident. Commented yesterday

Using awk:

    $ awk '/^accountid: /{c++}
    /^dn: /{
        if (dn) print dn,c;   # Use (dn && c) to print if accountid attribute is one or more.
        c=0; sub(/^dn: /,""); dn=$0
    }END{print dn,c}'

Or awk in paragraph mode:

    $ awk -v RS= -F'\n' '{sub(/^dn: /,"");print $1"," NF-1}'

And here's an awk approach:

    $ awk -F'[ ,]' -v RS= '{n=split($0,a,"\n"); print $2,n-1}' file
    [email protected] 1
    [email protected] 2
    [email protected] 0
    [email protected] 1

Explanation

  • -F'[ ,]': set the field separator to space or comma.
  • -v RS=: enables paragraph mode, where records are separated by one or more consecutive blank lines.
  • n=split($0,a,"\n");: split the current record (paragraph) on \n into the array a. The number returned (n) is the number of elements in this array, so the number of lines in this record, and therefore the number of attributes plus one.
  • print $2,n-1: print the second field (since we are using space and comma as the field separators, on your file this will be the string after the first space and before the first ,), and the value of n minus one, so the number of lines minus one.
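For the fallback request in the question (only list the DNs that have several accountid values), the same paragraph-mode idea could be adapted along these lines. This is a sketch under assumptions: the sample data is invented, multi_valued is a hypothetical helper name, and the dn line is assumed to be the first line of every record.

```shell
# Hypothetical LDIF-like sample; DNs and values are invented for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
dn: uid=user1@example.com,ou=test,dc=example,dc=com
accountid: aaa-111

dn: uid=user2@example.com,ou=test,dc=example,dc=com
accountid: bbb-222
accountid: ccc-333

dn: uid=user3@example.com,ou=test,dc=example,dc=com
EOF

# Print "DN, count" for records that have more than one accountid line.
# Assumes the dn line is the first line of every record.
multi_valued() {
    awk -v RS= -F'\n' '{
        n = 0
        for (i = 1; i <= NF; i++)          # each field is one line of the record
            if ($i ~ /^accountid: /) n++
        if (n > 1) {
            sub(/^dn: /, "", $1)           # strip the attribute name from the DN line
            print $1 ", " n
        }
    }' "$1"
}

multi_valued "$sample"
```

Counting only the lines that match ^accountid: means other attributes in a record do not inflate the result. On the sample above, only the user2 record should be printed, with a count of 2.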
1
  • My usual approach to handling this scenario is to put the block output processing in a function and then not only call it at the end of the block but also from END { } Commented yesterday
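A minimal sketch of what that comment describes, for a line-by-line awk script: the output step lives in one function that is called both when a new dn line starts the next record and from END. The names emit and count_per_dn and the sample data are made up for illustration.

```shell
# Hypothetical sample data; the DNs and attribute values are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
dn: uid=user1@example.com,ou=test,dc=example,dc=com
accountid: aaa-111

dn: uid=user2@example.com,ou=test,dc=example,dc=com
accountid: bbb-222
accountid: ccc-333

dn: uid=user3@example.com,ou=test,dc=example,dc=com
EOF

count_per_dn() {
    awk '
        # Print the record collected so far, if there is one.
        function emit() { if (dn != "") print dn ", " c }

        # A dn line closes the previous record and starts a new one.
        /^dn: /        { emit(); dn = substr($0, 5); c = 0; next }
        /^accountid: / { c++ }

        # The last record has no following dn line, so flush it here.
        END            { emit() }
    ' "$1"
}

count_per_dn "$sample"
```

Because records are delimited by the dn lines themselves, this version does not depend on blank lines between entries. On the sample it should report 1, 2 and 0 for the three DNs.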
