modes-slurm.sh does not terminate if the SLURM job is cancelled
- Start a job with modes-slurm.sh:
$ ./modes-slurm.sh ../database.modest --partition r415 --nodes 40 --cores 8 -D --unsafe -M MA -E "RED=2" -E "RED=3" -E "RED=4" -E "RED=5" -R Uniform --statistical CI --relative-width --rare Restart --ifun-op Max Max -O ../database.uni-restart.cluster-40-8.1.txt
SLURM configuration:
- partition = r415
- nodes = 40
- cores = 8
- timeout-slurm =
Using mono binary /home/hartmannsa/mono/bin/mono
Using modes binary ./modes.exe
Allocating SLURM nodes for the modes slaves
SLURM job name: EF_modeslurm
Output file (server nodes): servers_EF_modeslurm.out
salloc: Pending job allocation 157603
salloc: job 157603 queued and waiting for resources
.......
- Cancel the job (in this case, before it was allocated resources) from another terminal/screen:
scancel 157603
Expected behaviour: The job is cancelled and modes-slurm.sh terminates.
Actual behaviour: The job is cancelled, but modes-slurm.sh keeps waiting and printing dots:
salloc: Job allocation 157603 has been revoked.
salloc: Job has been cancelled
salloc: error: Failed to allocate resources: No error
...................................................................................................
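A possible direction for a fix: if the wait loop checks whether the backgrounded salloc process is still alive (e.g. with `kill -0`), it can stop printing dots and propagate salloc's exit status once the allocation is revoked. The sketch below is hypothetical and not taken from modes-slurm.sh; `salloc_stub` stands in for the real `salloc` invocation so the logic can run anywhere.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual modes-slurm.sh code: poll the
# backgrounded salloc process and leave the wait loop once it exits.

salloc_stub() {   # stand-in for the real "salloc ..." call
  sleep 1
  return 1        # salloc fails after scancel revokes the allocation
}

salloc_stub &
SALLOC_PID=$!

# Print a progress dot only while salloc is still alive.
while kill -0 "$SALLOC_PID" 2>/dev/null; do
  printf '.'
  sleep 1
done

wait "$SALLOC_PID"
STATUS=$?
printf '\nsalloc exited with status %d\n' "$STATUS"
```

With this structure the script would notice the revoked allocation within one polling interval and could terminate with salloc's non-zero status instead of waiting forever.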