modes-slurm leaves SLURM job running on crash of master
When running modes-slurm.sh, if the master (?) encounters an exception, the associated SLURM job keeps running and occupying resources:
token-wireless.modest: error: Could not connect to host 220.127.116.11.
token-wireless.modest: error: Could not connect to host 18.104.22.168.
[ERROR] FATAL UNHANDLED EXCEPTION: System.ObjectDisposedException: The CancellationTokenSource has been disposed.
  at System.Threading.CancellationTokenSource.Cancel (System.Boolean throwOnFirstException) [0x00000] in <cc0368638257483f94f364ec47500332>:0
[...]
squeue still lists the job (and another one that had also crashed):
JOBID  PARTITION NAME     USER     ST TIME NODES NODELIST(REASON)
157516 r415      bh_modes hartmann R  4:55 20    ctit[001-020]
157517 r415      6h_modes hartmann R  3:55 20    ctit[021-040]
The jobs then need to be cancelled with scancel. The expected behaviour is that the script cancels the jobs itself, even when a crash occurs.
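
One possible way to get this behaviour, sketched without knowledge of how modes-slurm.sh is actually structured: wrap the master invocation so that scancel is called for the job whenever the master exits, whatever its exit status, and also when the script itself is interrupted. The function names, the `CANCEL_CMD` variable, and the assumption that the script already knows the job ID are all hypothetical and not taken from the real script.

```shell
# Hypothetical sketch; none of these names come from modes-slurm.sh.
# CANCEL_CMD can be overridden (e.g. with "echo") for a dry run.
CANCEL_CMD="${CANCEL_CMD:-scancel}"

cancel_job() {
    # $1: SLURM job ID to cancel
    "$CANCEL_CMD" "$1"
}

run_master_then_cancel() {
    # $1: SLURM job ID; remaining arguments: the master command line
    job_id="$1"
    shift

    # If the script itself is interrupted, cancel the job before exiting.
    trap 'cancel_job "$job_id"; exit 130' INT TERM

    # Run the master and remember its exit status, even on failure.
    status=0
    "$@" || status=$?

    trap - INT TERM

    # Cancel unconditionally, so a crashed master does not leave the
    # allocation running.
    cancel_job "$job_id"
    return "$status"
}
```

A call would look like `run_master_then_cancel "$job_id" <master command line>`. This only helps if the master process actually terminates after the exception; if it hangs instead, a timeout around the master would be needed as well.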