
I got some odd comments after posting an awk answer on SO that uses the getline function. Here is the link to that answer.

After I posted my answer, a user left the comment below (I'm not criticizing him):

Not a good solution, it will join lines regardless of content and not process more lines if needed. And you should avoid using getline.

It states that we should avoid the getline function in awk. So my questions are:

  • Is it safe to use the getline function in awk?
  • In what circumstances should we use getline, and in what cases shouldn't we?
  • If this function produces unexpected results then why don't we file a bug report?
  • I don't think there is anything wrong with awk's getline function per se. It can be a code smell though as it is easily misused, either (a) in un-idiomatic constructions when awk already supports line-by-line pattern-action operation, or (b) in attempting tasks too complicated for awk. I would liken it to the much debated goto statement in C in these respects: potentially useful, easily abused. Commented Jun 27, 2014 at 16:37
  • I mostly use getline in the BEGIN to get information and fill arrays that I will later use when processing the main file. Commented Jun 27, 2014 at 16:45

1 Answer


Most objections to getline are made on coding-style grounds.

It's alien to awk's normal processing model, in which the code handles one record at a time.

getline (when not used as getline var < "file" or "cmd" | getline) pulls in the next record (possibly from the next file) in the middle of an action. It's easy to lose track of the fact that it increments NR and FNR and may change FILENAME.
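To make that side effect concrete, here is a minimal sketch run from a POSIX shell. The four input lines and the variable names `nr_demo` and `before` are made up for illustration. It records NR before a plain getline and prints it next to NR afterwards.

```shell
# Plain getline consumes the next record, so NR advances inside the action.
nr_demo=$(printf 'one\ntwo\nthree\nfour\n' | awk '
{
    before = NR                 # record number when the action started
    if ((getline) > 0)          # replaces $0 and bumps NR and FNR
        print before, NR, $0
}')
printf '%s\n' "$nr_demo"
```

Only two output lines appear for four input lines, because each run of the action swallows the following record.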

Another thing not to forget when using it is to check its return value: it returns 0 at end of file and a negative value on error.

So it's not a bare getline or if/while (getline) ..., because a negative return value also counts as true; it's:

if/while ((getline) > 0) { .... } 

Or:

if/while ((getline < "file") > 0) {...} 
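As a rough illustration of why the `> 0` test matters, the sketch below reads from a path that is assumed not to exist. The path and the names `missing`, `status`, `rc` and `n` are hypothetical. The first getline shows the error return value, and the guarded loop then ends immediately instead of spinning.

```shell
# Hypothetical path, assumed not to exist on this system.
missing=/nonexistent/getline-demo.txt
status=$(awk -v f="$missing" 'BEGIN {
    rc = (getline line < f)     # negative: the file cannot be opened
    print rc
    # "while (getline line < f)" would loop forever here, since -1 is true.
    n = 0
    while ((getline line < f) > 0)
        n++
    print n                     # number of lines actually read
}')
printf '%s\n' "$status"
```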

Most uses of getline can be avoided by taking a state-machine-like approach.

Instead of:

/pattern/ {getline; print} 

Which is probably wrong and should be written:

/pattern/ && (getline) > 0 {print} 

You would do:

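found_pattern {print; found_pattern=0}
/pattern/{found_pattern=1}

Also note that the two behave differently if the pattern is matched on two consecutive lines: the getline version consumes the second matching line without testing it against the pattern again, whereas the state-machine version prints it and also treats it as a new match.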
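The difference on consecutive matches can be seen with a small test, sketched here with made-up input in which `foo` stands in for the pattern. The names `input`, `getline_out` and `state_out` are illustrative only.

```shell
# Two consecutive matching lines followed by a non-matching one.
input=$(printf 'foo 1\nfoo 2\nbar 3\n')

# getline version: "foo 2" is read by getline, so it is never matched itself.
getline_out=$(printf '%s\n' "$input" | awk '/foo/ && (getline) > 0 {print}')

# State-machine version: every record is still tested against the pattern.
state_out=$(printf '%s\n' "$input" | awk '
found_pattern {print; found_pattern = 0}
/foo/ {found_pattern = 1}')

printf 'getline:\n%s\nstate machine:\n%s\n' "$getline_out" "$state_out"
```

The getline version prints only `foo 2`, while the state-machine version prints `foo 2` and `bar 3`.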

Now, as long as you're aware of that, getline is fine. If you do want to process several files at the same time, then you do need getline, but remember to check the return value:

while ((getline a < "a") > 0 && (getline b < "b") > 0) { .... 
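A complete version of that kind of loop might look like the sketch below. It assumes two small temporary files created just for the demonstration; the file contents and the names `tmpdir`, `fa`, `fb` and `paired` are hypothetical.

```shell
# Create two throwaway input files of different lengths.
tmpdir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmpdir/a"
printf 'b1\nb2\n' > "$tmpdir/b"

# Read one line from each file per iteration, stopping when either runs out.
paired=$(awk -v fa="$tmpdir/a" -v fb="$tmpdir/b" 'BEGIN {
    while ((getline a < fa) > 0 && (getline b < fb) > 0)
        print a, b
    close(fa)
    close(fb)
}')

rm -rf "$tmpdir"
printf '%s\n' "$paired"
```

Because both return values are checked, the loop ends at the shorter file; here the third line of the first file is read but never printed.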
