Questions tagged [slurm]

SLURM is a workload manager for Linux clusters

0 votes
0 answers
18 views

I've been trying to configure slurmdbd in a small test cluster. So far it's able to start successfully. However, when I try to do anything with sacctmgr, like sacctmgr list cluster, the command hangs, ...
Lisanna Dettwyler
2 votes
1 answer
67 views

I'm working on a "cluster" that currently has only one compute node, with 8x H100 GPUs. Slurm is configured such that each GPU is available either as a whole GPU, or as 20 shards. The (from ...
Raketenolli
0 votes
1 answer
81 views

What’s the right way to pull a complete answer to an InfluxQL query over HTTP? I’m using the acct_gather plugin for a Slurm cluster. It sends resource usage data to an InfluxDB v1 database. So if I ...
wobtax
  • 1,195
0 votes
1 answer
456 views

I'd like my log files to be named after a variable. Since this isn't possible: #SBATCH --output some_software.${var}.out #SBATCH --error some_software.${var}.err I came across this workaround but ...
Caterina
  • 103
1 vote
1 answer
364 views

If I run a Slurm job with sbatch, I can specify output and error filenames with a custom format. But how can I look up these filenames, given the job ID (e.g. 123456) of a running job? For example: ...
wobtax
  • 1,195
0 votes
0 answers
125 views

I'd like to make a shell alias for customers to use like squeue --format="%.9i %10j %8u %8T %.12M %6D %20R %16P %.4C %m" The right-most %m format tag shows how much memory a job requested ...
billt
  • 101
1 vote
0 answers
96 views

When I specify in gres.conf to omit the first GPU, processes in Slurm still use the first one. If I allow Slurm to manage both, the second concurrent process properly goes onto the second GPU. Why? ...
Bartłomiej Popielarz
0 votes
1 answer
483 views

On a Slurm cluster, is there ever a time when it’s appropriate to use sbatch inside an sbatch script? Or is it always a bad pattern? I’ve seen this in use, and it looks iffy: #SBATCH -J ...
wobtax
  • 1,195
0 votes
0 answers
233 views

I'm in the process of setting up a single-node Slurm workstation, and I believe I followed the process closely and everything is working just fine. See below: sudo systemctl restart slurmdbd &...
Matteo
  • 387
0 votes
0 answers
123 views

Hi, I was working on installing Slurm and got most things sorted out, but upon launching sudo journalctl -fu slurmdbd I get the following: Jan 25 12:49:49 ... systemd[1]: Stopped slurmdbd.service - ...
Matteo
  • 387
0 votes
1 answer
104 views

I am trying to understand how the cgroups memory resource controller is enabled on Ubuntu 20.04. I have several Ubuntu machines that make up a Slurm 23.02.7 cluster. In cgroup.conf, SchedMD ...
irritable_phd_syndrome
0 votes
0 answers
431 views

I am new to Slurm. I have set it up on the cluster, and on some nodes of a partition jobs run perfectly fine, but on other nodes of the same partition jobs do not run. They get cancelled the ...
Sanji Vinsmoke
0 votes
0 answers
90 views

I wrote a Slurm job script to run a computational chemistry calculation using the CREST program (part of the xtb software package). In the script, I create a temporary directory on the local storage ...
lay lay
0 votes
1 answer
86 views

I am running experiments to see how Slurm behaves when it finds offline CPUs. In my experiments, Slurm provides configurations that make too few CPUs available. Here are a few examples from an 8-CPU node ...
eftshift0
  • 785
0 votes
1 answer
588 views

I am facing a problem where slurmctld and slurmd are not in sync: they are not using the same slurm.conf file, so we get this: error: Node node1 appears to have a different slurm.conf than the slurmctld....
eftshift0
  • 785