Here's a Python script that does what you want:
```python
#!/usr/bin/env python
# -*- coding: ascii -*-
"""filter.py"""
import sys

# Get the file and the maximum line length as command-line arguments
filepath = sys.argv[1]
maxlen = int(sys.argv[2])

# Collect the lines that pass the filter
lines = []

# Read the data file line by line
with open(filepath, 'r') as jsonfile:
    for line in jsonfile:
        # Only consider non-blank lines
        if line.strip():
            # For "text" lines that are too long, remove the previous
            # line (the opening "{") and skip the next two lines
            # (the "author" line and the closing "},")
            if '"text"' in line and len(line) > maxlen:
                lines.pop()
                next(jsonfile)
                next(jsonfile)
            # Keep all other lines
            else:
                lines.append(line)

# Strip the trailing comma from the last remaining object, in case
# the final object was the one removed
lines[-2] = lines[-2].rstrip().rstrip(',') + '\n'

# Output the filtered lines
for line in lines:
    sys.stdout.write(line)
```
You could run it like this:
```sh
python filter.py data.json 34
```
Suppose you had the following data file:
```json
[
  {
    "text": "blah blah blah one",
    "author": "John Doe"
  },
  {
    "text": "blah blah blah two",
    "author": "John Doe"
  },
  {
    "text": "blah blah blah three",
    "author": "John Doe"
  }
]
```
Then running the script as described would produce the following output:
```json
[
  {
    "text": "blah blah blah one",
    "author": "John Doe"
  },
  {
    "text": "blah blah blah two",
    "author": "John Doe"
  }
]
```
Note that this line-by-line approach depends on the JSON being formatted exactly as shown above; for anything less rigidly formatted, a JSON-aware tool such as jq would be a more robust choice.
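Staying in Python, here is a minimal sketch of a JSON-aware variant using the standard `json` module. It does not depend on the file's formatting at all, but note it filters on the length of the `"text"` value itself rather than the raw line length, so the threshold means something slightly different (the script name and function name here are illustrative):

```python
#!/usr/bin/env python
"""filter_json.py -- a JSON-aware sketch of the same filtering idea."""
import json
import sys


def filter_records(records, maxlen):
    """Keep only the objects whose "text" value fits within maxlen."""
    return [rec for rec in records if len(rec["text"]) <= maxlen]


if __name__ == "__main__":
    # Usage: python filter_json.py data.json 20
    with open(sys.argv[1]) as f:
        records = json.load(f)
    json.dump(filter_records(records, int(sys.argv[2])), sys.stdout, indent=2)
    sys.stdout.write("\n")
```

Because it parses the JSON properly, it handles compact or oddly indented input, commas inside string values, and the trailing-comma cleanup for free.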