Piggybacking off an earlier question (re: new HPC user)
I'm running my script on a local machine with 32 cores, using the command
python3 ./atrialBARS_reEntry_v1.1.py --np 32
The simulation takes only ~40 minutes, as seen here.
However, I cannot reproduce that run time on an HPC. I'm using the Graham cluster in Ontario, Canada. Our HPC uses SLURM as its resource manager, and I'm submitting the following job script:
#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mem-per-cpu=4G
#SBATCH --time=0-02:00:00
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK  # give OpenMP all CPUs allocated to the task
module load StdEnv/2020 gcc/9.3.0 openmpi/4.0.3 petsc/3.15.0 scipy-stack/2023b opencarp/13.0
mpiexec python3 atrialBARS_reEntry_v1.1.py --overwrite-behaviour overwrite
On the cluster, this run has an estimated completion time of 6 hours, although I have seen instances where it drops to 3 hours.
Now, when I instead run "python3 ./atrialBARS_reEntry_v1.1.py --np 32" as I do on my local machine, I get an error saying "there aren't enough slots available in the system to satisfy the 32 slots requested by the application".
I've also tried mpirun --oversubscribe to brute-force my way in, to no avail. I've already tried various combinations of MPI and Python flags, but I just can't seem to find the right one (a couple of examples are below).
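For concreteness, the variations I've tried inside the job script look roughly like this (reconstructed from memory, so the exact flags may not be word-for-word):

# running the script directly with --np 32, as on my local machine; fails with the "not enough slots" error
python3 ./atrialBARS_reEntry_v1.1.py --np 32 --overwrite-behaviour overwrite
# trying to force 32 ranks despite the single SLURM task; didn't help either
mpirun --oversubscribe -np 32 python3 ./atrialBARS_reEntry_v1.1.py --overwrite-behaviour overwrite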
I've reached out to our HPC support, but I'm not sure what to tell them about the problem beyond the fact that I cannot replicate my local machine's performance on the cluster. Moreover, is this openCARP-specific? OpenMPI-specific? Specific to the HPC setup?
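In case it helps narrow things down, I'm happy to add a few quick checks like the ones below to the job script (right after the module load line) and report back what they print. These only use standard SLURM environment variables and a bare mpiexec hostname, so I'm assuming nothing openCARP-specific here:

# what the allocation actually provides
echo "nodes allocated: $SLURM_JOB_NUM_NODES"
echo "tasks:           $SLURM_NTASKS"
echo "CPUs per task:   $SLURM_CPUS_PER_TASK"
echo "CPUs visible:    $(nproc)"
# how many ranks a bare mpiexec is willing to start
mpiexec hostname | sort | uniq -c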
Any help would be appreciated.
Karl