Since Linux keeps per-process IO accounting, what you want really just amounts to building a process tree in which each node carries the IO rate of its process, and then summing those rates from the leaves up to the root (for each node, sum over its children, not the node itself).
Finally, you'd then walk the tree to find the "heaviest" nodes.
Note that the heaviest nodes will always sit at the root: in the end, every userland process is a descendant of process 1, so that must have the highest cumulative sum.
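To make the summing and ranking steps concrete, here is a minimal sketch on a hand-made tree. The PIDs, the byte counts, and the helper names `descendants_io` and `heaviest` are all made up for illustration; it only shows the idea and skips the root when ranking, since the root always wins.

```python
from typing import Dict, List, Tuple


def descendants_io(children: Dict[int, List[int]], own_io: Dict[int, int]) -> Dict[int, int]:
    """For every pid, sum the IO of all its descendants (not its own IO)."""
    totals: Dict[int, int] = {}

    def visit(pid: int) -> int:
        # Everything below this node; the node's own IO is not stored here
        below = sum(visit(child) for child in children.get(pid, []))
        totals[pid] = below
        # Hand own IO plus everything below up to the parent
        return own_io.get(pid, 0) + below

    # Roots are the pids that never show up as somebody's child
    all_children = {c for kids in children.values() for c in kids}
    for root in set(own_io) - all_children:
        visit(root)
    return totals


def heaviest(totals: Dict[int, int], skip: Tuple[int, ...] = (1,), top: int = 3) -> List[Tuple[int, int]]:
    """Return the `top` (pid, descendants' IO) pairs, largest first, ignoring `skip`."""
    ranked = sorted(((io, pid) for pid, io in totals.items() if pid not in skip), reverse=True)
    return [(pid, io) for io, pid in ranked[:top]]


# Hypothetical tree: 1 -> {100, 200}, 100 -> {101, 102}, 200 -> {201}
children = {1: [100, 200], 100: [101, 102], 200: [201]}
own_io = {1: 0, 100: 5, 101: 40, 102: 10, 200: 1, 201: 7}

totals = descendants_io(children, own_io)
print(heaviest(totals))
```

In this made-up tree, process 100 comes out on top because its two children did most of the IO, even though 100 itself did very little.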
Building that tree from userland is probably good enough in most cases. All you need to do is read every /proc/*/stat, take the first field (pid) and the fourth field (ppid), and use them to build a doubly-linked tree structure (in Python, since you seem to know Python). This is written from the depths of my heart; no testing whatsoever happened. You find a bug, you keep it.
    #!/usr/bin/env python3
    #
    # This is licensed under European Public License 1.2
    # https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12

    from dataclasses import dataclass, field
    from os import listdir


    @dataclass
    class process:
        pid: int
        ppid: int
        io_rate: int  # cumulative bytes since process start, not yet a rate
        children_io: int = 0
        children: set[int] = field(default_factory=set)
        unprocessed_children: set[int] = field(default_factory=set)


    nodes: dict[int, process] = {}

    # Only the numeric top-level entries of /proc are processes
    for entry in listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat", "r", encoding="ascii", errors="replace") as stat_file:
                stat = stat_file.read()
        except OSError:
            # process exited between listing /proc and reading its stat
            continue
        # comm (field 2) is parenthesized and may itself contain spaces or
        # parentheses, so only split what follows the last ")"
        fields_after_comm = stat.rpartition(")")[2].split()
        pid = int(stat.split(" ", 1)[0])
        ppid = int(fields_after_comm[1])  # field 3 is the state, field 4 the ppid

        io_fields = {}
        try:
            with open(f"/proc/{entry}/io", "r", encoding="ascii") as io:
                for line in io:
                    key, _, val = line.partition(":")
                    val = val.strip()
                    if val.isdigit():
                        io_fields[key] = int(val)
        except OSError:
            # other users' io files are not readable without root; count as 0
            pass

        nodes[pid] = process(
            pid=pid,
            ppid=ppid,
            io_rate=io_fields.get("read_bytes", 0) + io_fields.get("write_bytes", 0),
        )

    # Fill in children
    for node in nodes.values():
        parent = nodes.get(node.ppid)
        if parent is None:
            # ppid 0 (e.g. init itself), or parent has already died
            continue
        parent.children.add(node.pid)
        parent.unprocessed_children.add(node.pid)

    # Sum from the leaves upwards
    pending = set(nodes)
    while pending:
        # Find leaf nodes: everything whose children are all accounted for
        leafs = [pid for pid in pending if not nodes[pid].unprocessed_children]
        if not leafs:
            # inconsistent snapshot; stop rather than loop forever
            break
        for pid in leafs:
            leaf = nodes[pid]
            pending.remove(pid)
            parent = nodes.get(leaf.ppid)
            if parent is None:
                continue
            parent.children_io += leaf.io_rate
            # if you also want to count the IO of children's children:
            parent.children_io += leaf.children_io
            parent.unprocessed_children.remove(leaf.pid)


    # print result
    def print_node(level: int, node: process):
        print(f"{' ' * level} {node.pid:5d}: {node.children_io}")


    def recurse(level: int, node: process):
        print_node(level, node)
        for child in sorted(node.children):
            recurse(level + 1, nodes[child])


    recurse(0, nodes[1])  # Start at init
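One caveat about the numbers: `read_bytes` and `write_bytes` in /proc/<pid>/io are cumulative counters, so the script ranks by total bytes since each process started. If you want an actual rate, one option is to take two snapshots and divide the difference by the time between them. A rough sketch, assuming each snapshot is a plain dict mapping pid to cumulative bytes; `io_rates` is a hypothetical helper, not part of the script:

```python
from typing import Dict


def io_rates(before: Dict[int, int], after: Dict[int, int], interval_s: float) -> Dict[int, float]:
    """Bytes per second for every pid present in both snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates: Dict[int, float] = {}
    for pid, now in after.items():
        if pid not in before:
            # started during the interval, so there is no baseline
            continue
        delta = now - before[pid]
        if delta < 0:
            # counter went backwards: the pid was most likely reused
            continue
        rates[pid] = delta / interval_s
    return rates


# Made-up snapshots taken 2 seconds apart
before = {1: 1000, 42: 5000, 77: 300}
after = {1: 1000, 42: 9000, 99: 10}
rates = io_rates(before, after, 2.0)
print(rates)
```

Processes that appear or vanish between the two snapshots are simply dropped here; whether that is acceptable depends on how short-lived your heavy hitters are.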