How can I get reasonable parallelisation on a multi-core node without saturating its resources? As in many similar questions, the real question is how to tweak GNU Parallel to get reasonable performance.

In the following example, I can't run processes in parallel without saturating the machine, and after trying some `-j -N` options everything seems to run on a single CPU.
From inside a Bash script running on a multi-core machine, the following loop is piped to GNU Parallel:

```bash
for BAND in $(seq 1 "$BANDS") ;do
  echo "gdalmerge_and_clean $VARIABLE $YEAR $BAND $OUTPUT_PIXEL_SIZE_X $OUTPUT_PIXEL_SIZE_Y"
done | parallel
```

This, however, saturates the machine and slows down processing.
In `man parallel` I read:

```
--jobs -N
-j -N
--max-procs -N
-P -N
    Subtract N from the number of CPU threads.

    Run this many jobs in parallel. If the evaluated number is
    less than 1 then 1 will be used.

    See also: --number-of-threads --number-of-cores --number-of-sockets
```
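If I read the man page correctly, `-j -N` evaluates to the detected thread count minus N, floored at 1. A minimal sketch of that arithmetic, using `nproc` as my stand-in for parallel's own thread detection (variable names are mine):

```shell
#!/usr/bin/env bash
# Approximate how `parallel -j -N` evaluates the job-slot count:
# subtract N from the detected number of CPU threads, floor at 1.
threads=$(nproc)           # stand-in for parallel's --number-of-threads
n=3                        # the N in `-j -3`
jobs=$(( threads - n ))
if (( jobs < 1 )); then jobs=1; fi
echo "$jobs"               # on a 40-thread node this should print 37
```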
so I've tried to use `|parallel -j -3`, but this, for some reason, uses only one CPU out of the 40. Checking with `[h]top`, only one CPU shows high use; the rest are down at 0. Shouldn't `-j -3` use "number of CPUs" minus 3, which would be 37 CPUs in this example?
I then extended the previous call to `-j -3 --use-cores-instead-of-threads`, admittedly blindly. I've read https://unix.stackexchange.com/a/114678/13011, and I know from the admins of the cluster I use that hyperthreading is disabled on these nodes. This still runs on one CPU.
I am now trying the following:

```bash
for BAND in $(seq 1 "$BANDS") ;do
  echo "gdalmerge_and_clean $VARIABLE $YEAR $BAND $OUTPUT_PIXEL_SIZE_X $OUTPUT_PIXEL_SIZE_Y"
done | parallel -j 95%
```

or the same with `parallel -j 95% --use-cores-instead-of-threads`.
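My understanding of the percentage form is that `-j 95%` takes 95% of the detected thread count as the slot count. A rough Bash equivalent, assuming (on my part, unverified) that the result is truncated to an integer and floored at 1, with `nproc` again standing in for parallel's detection:

```shell
#!/usr/bin/env bash
# Rough equivalent of `parallel -j 95%`: 95% of the thread count,
# truncated to an integer and floored at 1 (my assumption).
threads=$(nproc)
pct=95
jobs=$(( threads * pct / 100 ))   # integer division, e.g. 40 -> 38
if (( jobs < 1 )); then jobs=1; fi
echo "$jobs"
```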
Note
For the record, this is part of a batch job scheduled via HTCondor, with each job running on a separate node that has some 40 physical CPUs available.
Above, I kept only the essential part; the complete `for` loop piped to `parallel` is:

```bash
for BAND in $(seq 1 "$BANDS") ;do
  # Do not extract, unscale and merge if the scaled map exists already!
  SCALED_MAP="era5_and_land_${VARIABLE}_${YEAR}_band_${BAND}_merged_scaled.nc"
  MERGED_MAP="era5_and_land_${VARIABLE}_${YEAR}_band_${BAND}_merged.nc"
  if [ ! -f "$SCALED_MAP" ] ;then
    echo "log $LOG_FILE Action=Merge, Output=$MERGED_MAP, Pixel size=$OUTPUT_PIXEL_SIZE_X $OUTPUT_PIXEL_SIZE_Y, Timestamp=$(timestamp)"
    echo "gdalmerge_and_clean $VARIABLE $YEAR $BAND $OUTPUT_PIXEL_SIZE_X $OUTPUT_PIXEL_SIZE_Y"
  else
    echo "warning \"Scaled map $SCALED_MAP exists already! Skipping merging.\""
  fi
done | parallel -j 95%
log "$LOG_FILE" "Action=Merge, End=$(timestamp)"
```
where `log` and `warning` are custom functions.
> Does `parallel` correctly detect your CPU cores? What does `parallel --number-of-cores` print? Also, does it work as expected if you explicitly specify the no. of cores to be used, e.g. `-j 17`?

`parallel` correctly detects all CPU cores. In one job running on some node, for example, it reports 40. Not all nodes are the same, however. Anyhow, jobs will only be assigned to machines with at least 24 CPUs. The idea behind `-j 95%` was to avoid hardcoding the number of CPUs. Nonetheless, in my latest attempts I just put `-j 20`.
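Since the nodes differ, another option I'm considering is to compute the slot count myself and pass it to `-j` explicitly, rather than relying on the `-j -N` or percentage forms. A sketch (the safety margin of 3 and the cap of 20 are my own choices, the cap matching my latest `-j 20` attempts):

```shell
#!/usr/bin/env bash
# Derive an explicit job count: all threads minus a safety margin,
# clamped between 1 and a hard cap, to be passed straight to parallel.
threads=$(nproc)
jobs=$(( threads - 3 ))                # leave a few threads for the system
if (( jobs < 1 )); then jobs=1; fi
if (( jobs > 20 )); then jobs=20; fi   # cap below the smallest node class
echo "$jobs"
# ... | parallel -j "$jobs"           # explicit count, no percentage parsing
```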