modes-slurm: Invalid IP address list
Running modes-slurm.sh on korenvliet with --partition t630, I get the following output:
hartmannsa@korenvliet:~/sa$ bash ~/Release/modes-slurm.sh models/m2.modest --partition t630 --nodes 3 --cores 16 --thread-budget 16 -D -J 16 --batch-size 1000 -M STA --unsafe -Y -W 0.005 --collect-schedulers -L 10000 "hist(o)/4" -O "m2.hist(o)-4.10000.1.txt"
SLURM configuration:
- partition = t630
- nodes = 3
- cores = 16
- timeout-slurm =
Using mono binary /home/hartmannsa/mono/bin/mono
Using modes binary /home/hartmannsa/Release/modes.exe
Allocating SLURM nodes for the modes slaves
SLURM job name: 0C_modeslurm
Output file (server nodes): servers_0C_modeslurm.out
salloc: Pending job allocation 38140
salloc: job 38140 queued and waiting for resources
....salloc: job 38140 has been allocated resources
salloc: Granted job allocation 38140
.Running distributed modes in the slaves of the SLURM batch 38140
Host IP addresses: 192.168.111.81 130.89.7.54,192.168.111.82 130.89.7.55,192.168.111.83 130.89.7.56
modes invocation arguments: models/m2.modest --thread-budget 16 -D -J 16 --batch-size 1000 -M STA --unsafe -Y -W 0.005 --collect-schedulers -L 10000 hist(o)/4 -O m2.hist(o)-4.10000.1.txt
Unexpected value "130.89.7.54,192.168.111.82".
Use modes --help to show usage information.
real 0m0.336s
user 0m0.308s
sys 0m0.004s
salloc: Job allocation 38140 has been revoked.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: ctit082: task 1: Terminated
srun: error: ctit083: task 2: Terminated
srun: error: ctit081: task 0: Terminated
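Something is wrong with the IP addresses being passed to modes here. Each node contributes two addresses separated by a space (e.g. "192.168.111.81 130.89.7.54"), while the nodes themselves are separated by commas. The list apparently gets split on whitespace somewhere along the way, so modes receives the token "130.89.7.54,192.168.111.82" as a stray positional argument and rejects it as an unexpected value.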
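A minimal bash sketch of one possible workaround, assuming modes expects a comma-separated list with exactly one address per host (this is not confirmed by the output above). The function name `first_ip_per_host` is hypothetical and not part of modes-slurm.sh. It keeps the first address each node reports; whether modes should instead use the 192.168.111.x or the 130.89.7.x network is also an assumption that needs checking.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in modes-slurm.sh): turns a host list of the form
#   "ipA1 ipA2,ipB1 ipB2,..."
# into
#   "ipA1,ipB1,..."
# keeping only the first address reported for each host.
# Assumption: modes wants one address per host, comma-separated.
first_ip_per_host() {
  local raw="$1" entry
  local -a hosts out=()
  local IFS=','                  # split the host list on commas only
  read -r -a hosts <<< "$raw"
  for entry in "${hosts[@]}"; do
    # Drop everything from the first whitespace onward in each host entry.
    out+=("${entry%%[[:space:]]*}")
  done
  echo "${out[*]}"               # "${out[*]}" joins with the first char of IFS (',')
}
```

Applied to the list from the log above:

```shell
first_ip_per_host "192.168.111.81 130.89.7.54,192.168.111.82 130.89.7.55,192.168.111.83 130.89.7.56"
# prints 192.168.111.81,192.168.111.82,192.168.111.83
```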