0 votes
ago by (120 points)
Hello,

We have been using openCARP to perform S1-S2 threshold activation studies. We are noticing some odd behaviour on our clusters and are wondering if you could point us in the right direction.

When we run our simulations on a single node, we see very good scaling up to 80 cores on our large-memory node. However, when we try to use two or three 32-core nodes together, performance is significantly worse. For example, a simulation of just S1 takes 8 minutes with 80 cores, but with 3 nodes (96 cores) it takes 30 minutes. Do you have any ideas on what could be going wrong here?

We have compiled openCARP following the instructions given in the documentation. For PETSc, the only change we have made is to use OpenMPI, as that is what is installed on our clusters. The specific commit of the code we are using is: 675501a5e0e0fac521aa3e1ef2950c2f9012457b. We have also tried different meshes for our simulations and observe the same behaviour.

Please let me know if you require any additional information.

Thanks!

1 Answer

0 votes
ago by (180 points)
Hey!

The first thing I would check is the architecture of the different nodes. Do you know what kind of hardware they use? It might be as simple as the three smaller nodes being older, with slower CPUs overall than the single large-memory node.

By using multiple nodes you introduce additional types of overhead, such as communication between nodes and synchronization between MPI processes. Depending on the size of your problem, the communication time can grow much faster than the computation time shrinks, so communication starts to dominate.
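
Just to put a number on it: from the timings you quoted (8 minutes on 80 cores versus 30 minutes on 96 cores), the parallel efficiency relative to the single-node run drops to roughly 22%. A quick back-of-the-envelope check in Python, using only the numbers from your post:

t_single_node = 8.0          # minutes on 80 cores (one node)
t_multi_node  = 30.0         # minutes on 96 cores (three nodes)
t_ideal_96 = t_single_node * 80 / 96      # ideal time if the 80-core run scaled perfectly
efficiency = t_ideal_96 / t_multi_node
print(f"relative parallel efficiency: {efficiency:.0%}")   # about 22%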

You could try profiling tools to see where the extra time is spent in your simulation.

For now that's all I could think of. Hope it helps!

Best,

Tobias
ago by (120 points)
Hi Tobias,

Thanks for the response. The three nodes we are using are identical and connected with 100 Gbit InfiniBand. When I use 32 cores of our large-memory node, the simulation takes the same time as on our smaller compute nodes.

We have had some help from our cluster support team in benchmarking the code and they have not noticed anything obvious.

We will continue to look into profiling. We just find it very strange that the slowdown happens as soon as we introduce an additional node. Do you think the mesh we are using could have any effect on this?

Thanks,
Kyle Klenk
ago by (180 points)
Can you give some details about this simulation? Mesh degrees of freedom and simulation parameters (especially IO-related settings). Maybe that can point us in the right direction.
ago by (120 points)
I hope this is helpful regarding the mesh; my colleague generated it and is more familiar with the specifics. This is what they wrote about it for me:

All simulations are performed on a 10 mm × 10 mm × 1 mm cuboidal domain. The spatial domain is discretized with openCARP's default settings, that is, with piecewise linear tetrahedral finite elements and a mesh resolution of 0.1 mm. When applied to the considered domain, this leads to a discretization with 112,211 nodes and 500,000 elements. The default settings of openCARP are also used for the temporal discretization.
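
For what it's worth, those counts are consistent with a structured grid at 0.1 mm resolution where each hexahedral cell is split into 5 tetrahedra (my own sanity check in Python, not something from my colleague):

nx, ny, nz = 100, 100, 10                   # cells along the 10 mm x 10 mm x 1 mm domain at 0.1 mm
nodes    = (nx + 1) * (ny + 1) * (nz + 1)   # 112,211
elements = nx * ny * nz * 5                 # 500,000
print(nodes, elements)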

As for the settings, here is what is in our .par file:

############### physical regions ##############
num_phys_regions       = 2
phys_region[0].name    = "Intracellular domain"
phys_region[0].ptype   = 0
phys_region[0].num_IDs = 1
phys_region[0].ID[0]   = 1
phys_region[1].name    = "Extracellular domain"
phys_region[1].ptype   = 1
phys_region[1].num_IDs = 1
phys_region[1].ID[0]   = 1

############### ionic setup ###################
num_imp_regions      = 1
imp_region[0].im     = Shannon

############## stimulus setup #################
num_stim             = 3
stimulus[0].name     = "S1"
stimulus[0].stimtype = 1
stimulus[0].duration = 2.
stimulus[0].start    = 0.
stimulus[0].npls     = 1
stimulus[0].x0       = -50.0   # in um
stimulus[0].xd       = 323.6   # thickness in um
stimulus[0].y0       = -50.0
stimulus[0].yd       = 323.6
stimulus[0].z0       = 950.0
stimulus[0].zd       = 100.0
stimulus[1].name     = "Ground"
stimulus[1].stimtype = 3
stimulus[1].x0       = -50.0
stimulus[1].xd       = 10100.0
stimulus[1].y0       = -50.0
stimulus[1].yd       = 10100.0
stimulus[1].z0       = -50.0
stimulus[1].zd       = 100.0

################# Simulation parameters #################
bidomain    = 1
tend        = 70.
spacedt     = 1.0
timedt      = 1.0
parab_solve = 1
vofile      = "vm.igb"

# Number of events to detect
num_LATs  = 1

# Event 1: activation
lats[0].ID        = ACTs
lats[0].all       = 1
lats[0].measurand = 0
lats[0].threshold = 0
lats[0].mode      = 0

# Event Monitor
sentinel_ID      = 0
t_sentinel       = 10.0
t_sentinel_start = 0.0

I hope this is what you were looking for.
ago by (180 points)
There seem to be some parameters missing for the stimulus, since I cannot get it to activate the tissue and the simulation stops early due to the sentinel. If you give me the correct parameters I can check if I get the same behavior on our HPC system.

On a side note, please change to the stim[] definitions for stimuli in future simulations, since stimulus[] is deprecated.
ago by (120 points)
My apologies, we have been passing the stimulus strength in as a command-line parameter. We run the simulation with a lower bound of 22500000 and an upper bound of 72500000 as we search for the S1 threshold. The full command we use to start the simulation is below.

mpirun -n 96 ./openCARP +F s1.par -stimulus[0].strength 72500000
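
In case it helps to reproduce the workflow, this is roughly how we drive the search. It is only a sketch: tissue_activated() is a placeholder we implement ourselves by parsing the LAT output (lats[0].ID = ACTs), not an openCARP function, and the search logic here is simplified:

import subprocess

def run_s1(strength, n_procs=96):
    # Launch one openCARP run with the given S1 strength (same command as above).
    cmd = ["mpirun", "-n", str(n_procs), "./openCARP",
           "+F", "s1.par", "-stimulus[0].strength", str(strength)]
    subprocess.run(cmd, check=True)

def tissue_activated():
    # Placeholder: inspect the activation-time output of the last run
    # and return True if the stimulus captured the tissue.
    raise NotImplementedError

lo, hi = 22_500_000, 72_500_000   # search bounds for the S1 strength
while hi - lo > 0.01 * lo:        # stop at roughly 1% relative tolerance
    mid = 0.5 * (lo + hi)
    run_s1(mid)
    if tissue_activated():
        hi = mid                  # threshold is at or below mid
    else:
        lo = mid                  # threshold is above mid
print("S1 threshold is approximately", hi)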
ago by (180 points)
Hey!

I have run the experiment on our HPC system in different configurations and I can confirm your observations. What I observed was an excessive increase in computation time of the ionic models (look out for the ODE_stats.dat file written into your output directory) as soon as you move to 2 or more nodes.

I currently suspect that something goes wrong in the partitioning of the mesh, and I will look at it more closely. Essentially, I noticed that when you use multiple nodes with X tasks each, the mesh is only partitioned into X blocks instead of X*nodes blocks. If you want to see this yourself, you can use the -gridout_p and -output_level parameters to get more output regarding the partitioning. However, I am not quite sure yet how this is connected to the increased computation times in the ODEs.
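
Independent of openCARP, a quick way to double-check that all 96 ranks really exist and are spread over the three hosts is a tiny mpi4py script (assuming mpi4py is available on your cluster); launch it with the same mpirun line you use for openCARP:

from mpi4py import MPI
import socket

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Collect (rank, hostname) pairs on rank 0 and summarise the placement.
pairs = comm.gather((rank, socket.gethostname()), root=0)
if rank == 0:
    hosts = sorted({h for _, h in pairs})
    print(f"total ranks: {size}")
    print(f"distinct hosts ({len(hosts)}): {hosts}")

If that reports 96 ranks on 3 hosts, the launcher side is fine and the suspicion stays with the mesh partitioning inside openCARP.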