With the GNU implementation of du, you can do:
<../all-file LC_ALL=C grep '\.[hc]$' | tr '\n' '\0' | du -sh --files0-from=-
With the GNU implementation of xargs:
<../all-file LC_ALL=C grep '\.[hc]$' | xargs -rd '\n' du -sh --
Or with the few other xargs implementations that have copied GNU's -r and -0 (if not its -d), such as on most BSDs:
<../all-file LC_ALL=C grep '\.[hc]$' | tr '\n' '\0' | xargs -r0 du -sh --
But beware that since du reports the disk usage of files with several hard links only once, if you break the list into several batches like xargs does, you could end up with different results. If you wanted to use the -c option to get a cumulative total at the end, you'd also have to use the --files0-from approach so that du sees the whole list at once.
Example:
$ seq 100000 > a.c
$ ln a.c b.c
$ du -shc a.c b.c
651K    a.c
651K    total
See how b.c is not reported, as it's effectively the same file as a.c. The cumulative disk usage of both files is 651K.
$ du -shc a.c; du -shc b.c
651K    a.c
651K    total
651K    b.c
651K    total
You get two totals of 651K, which hides the fact that a.c and b.c are the same file.
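So if you do want a single, correct cumulative figure, combine -c with --files0-from so that du processes the whole list in one invocation; for instance:
<../all-file LC_ALL=C grep '\.[hc]$' | tr '\n' '\0' | du -shc --files0-from=-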
If, on the other hand, you wanted to disable that hard link handling and have the disk usage of every file reported as if in isolation, with the GNU implementation of du, you can use the -l / --count-links option. See also the --apparent-size option if you're interested in the size of the files rather than their disk usage.
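Continuing the example above, that would look something like this (the disk usage figures depend on the filesystem; the apparent size of the output of seq 100000 is 588895 bytes, which du rounds up to 576K):
$ du -shlc a.c b.c
651K    a.c
651K    b.c
1.3M    total
$ du -sh --apparent-size a.c
576K    a.c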
On modern versions of Linux, you can raise the limit on the cumulative size of arguments + environment by raising the stack size resource limit.
For instance, with ulimit -s unlimited in most shells (or limit stacksize unlimited in zsh, where it applies only to child processes), you might be able to get over the limit and avoid having to break the list down.
$ /bin/true {1..150000}
zsh: argument list too long: /bin/true
(127)$ limit stacksize unlimited
$ /bin/true {1..150000}
$ /bin/true {1..250000}
$ /bin/true {1..350000}
$ /bin/true {1..500000}
zsh: argument list too long: /bin/true
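If you want to check the resulting limit: on Linux (since 2.6.23), the budget for arguments + environment is derived from the stack limit (a quarter of it), which getconf ARG_MAX should reflect. A sketch with illustrative numbers that will vary between systems:
$ ulimit -s
8192
$ getconf ARG_MAX
2097152
$ ulimit -s 65536
$ getconf ARG_MAX
16777216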
And then, you may be able to use split+glob, though as always you'd need to tune it to split on newline only (IFS=$'\n') and to disable globbing, which you most likely don't want here (set -o noglob):
( ulimit -s unlimited
  IFS=$'\n'
  set -o noglob
  du -sh -- $(<../all-file LC_ALL=C grep '\.[hc]$')
)
With the additional caveat that in cases where ../all-file has no line matching that pattern, you'll end up running du without arguments, which defaults to reporting the disk usage of the current working directory (the -r option of xargs is there to handle those cases).
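To see the difference -r makes, compare with an empty input (the 1.2M is just whatever the current directory happens to use):
$ printf '' | xargs du -sh
1.2M    .
$ printf '' | xargs -r du -sh
$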
A few more notes:
- sed -ne '/re/p' is grep (that's where the grep name comes from, though via ed's g/re/p).
- File paths can be made of any non-NUL bytes, so to be able to represent an arbitrary list of them, an obvious choice is to separate them with NULs, hence --files0-from which expects a NUL-delimited list. That means your newline-delimited all-file cannot list arbitrary file paths. It also means that, since file paths don't have to be made of text, processing them with text utilities is not guaranteed to work except in locales such as C where all bytes are characters. Using LC_ALL=C will also likely improve performance, as it avoids the decoding of bytes into characters.
- $(...) unquoted in bash is split+glob. You can only split a file list with that if you know the file paths contain neither glob characters nor characters of $IFS.
- For xargs without -0/-d, the splitting is done on blanks or newlines, and single/double quotes and backslashes are interpreted as escaping operators, which means it can't process a newline-delimited list of arbitrary file paths either, which is what your all-file looks like it is (see the first example below).
- When passing a variable, unknown-in-advance list of arguments to a command, you want to make sure you use -- before it to mark the end of options, so that if the first (or any, with GNU implementations) argument starts with - (or + with some commands), it is not treated as an option (see the second example below).
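To illustrate those last two points, a sketch with hypothetical file names (output from GNU tools, and may vary):
$ printf '%s\n' "it's.c" | xargs du -sh
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
$ touch ./-n.c
$ du -sh -n.c
du: invalid option -- 'n'
Try 'du --help' for more information.
$ du -sh -- -n.c
0       -n.c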
xargs is your friend.