Poor Internode Scaling With MPI/Petsc

Question

asked Nov 19, 2024 by Kyle Klenk (120 points)

Hello,

We have been using openCARP to perform S1-S2 threshold activation studies. We are noticing some weird behaviour on our clusters and wondering if you could point us in the right direction.

When we run our simulations on a single node, we see very good scaling up to 80 cores on our large-memory node. However, when we try to use two or three 32-core nodes together, performance is significantly worse. For example, a simulation of just S1 takes 8 minutes with 80 cores, but 30 minutes when we use 3 nodes (96 cores). Do you have any idea what could be going wrong here?
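To put these timings in perspective, the sketch below compares the two runs by wall-clock time and by core-minutes. It is only illustrative arithmetic on the numbers quoted above; the function name is made up for this example, and because the two runs use different hardware (one 80-core node versus three 32-core nodes), the ratios are a rough indicator rather than a strict parallel-efficiency measurement.

```python
def core_minutes(cores: int, wall_minutes: float) -> float:
    """Compute budget consumed by a run, in core-minutes."""
    return cores * wall_minutes


# Timings quoted in the question (S1 only).
single_node = core_minutes(80, 8)    # one large-memory node
multi_node = core_minutes(96, 30)    # three 32-core nodes

slowdown = 30 / 8                    # wall-clock ratio
cost_ratio = multi_node / single_node

print(f"single node : {single_node:.0f} core-minutes")
print(f"three nodes : {multi_node:.0f} core-minutes")
print(f"wall-clock slowdown: {slowdown:.2f}x, cost ratio: {cost_ratio:.2f}x")
```

With 20% more cores, the multi-node run takes 3.75 times as long and consumes 4.5 times the core-minutes.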

We compiled openCARP following the instructions given in the documentation. For PETSc, the only change we made is that we use Open MPI, as that is what is installed on our clusters. The specific commit of the code we are using is: 675501a5e0e0fac521aa3e1ef2950c2f9012457b. We have also tried different meshes for our simulations and observe the same behaviour.

Please let me know if you require any additional information.

Thanks!

2 Answers

Tobias Gerach · Answer 1 · 2024-11-19T11:42:47+0000

commented Nov 19, 2024 by Kyle Klenk (120 points)

commented Nov 19, 2024 by Tobias Gerach (500 points)

commented Nov 19, 2024 by Kyle Klenk (120 points)

I hope this is helpful regarding the mesh. My colleague generated it and is more familiar with the specifics, so this is what they wrote about our mesh for me:

All simulations are performed on a 10 mm × 10 mm × 1 mm cuboidal domain. The spatial domain is discretized with openCARP's default settings, that is, with piecewise linear tetrahedral finite elements and a mesh resolution of 0.1 mm. When applied to the considered domain, this leads to a discretization of 112,211 nodes and 500,000 elements. The default settings of openCARP are also used for the temporal discretization.
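The reported mesh size can be checked with a little arithmetic. The sketch below assumes a structured grid with 0.1 mm spacing in which every hexahedral cell is split into 5 tetrahedra; that split factor is inferred from the reported counts and is not stated in the description, and the function name is hypothetical.

```python
from math import prod


def structured_tet_mesh_counts(size_mm, h_mm, tets_per_hex=5):
    """Node and element counts for a box meshed on a regular grid.

    tets_per_hex=5 is an assumption inferred from the reported numbers.
    """
    # Number of grid cells along each axis (rounded to absorb float error).
    cells = [round(s / h_mm) for s in size_mm]
    nodes = prod(c + 1 for c in cells)      # grid points
    elements = prod(cells) * tets_per_hex   # tetrahedra
    return nodes, elements


nodes, elements = structured_tet_mesh_counts((10.0, 10.0, 1.0), 0.1)
print(nodes, elements)
```

Under these assumptions the result is 101 × 101 × 11 = 112,211 nodes and 100 × 100 × 10 × 5 = 500,000 elements, matching the description.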

As for the settings, here is what is in our .par file:

############### physical regions ##############
num_phys_regions     = 2
phys_region[0].name = "Intracellular domain"
phys_region[0].ptype = 0
phys_region[0].num_IDs = 1
phys_region[0].ID[0] = 1
phys_region[1].name = "Extracellular domain"
phys_region[1].ptype = 1
phys_region[1].num_IDs = 1
phys_region[1].ID[0] = 1

############### ionic setup ###################
num_imp_regions      = 1
imp_region[0].im     = Shannon

############## stimulus setup #################
num_stim             =      3
stimulus[0].name     = "S1"
stimulus[0].stimtype =      1
stimulus[0].duration =      2.
stimulus[0].start    =      0.
stimulus[0].npls     =      1
stimulus[0].x0       = -50.0 #in um
stimulus[0].xd       = 323.6 #thickness in um
stimulus[0].y0       = -50.0
stimulus[0].yd       = 323.6
stimulus[0].z0       = 950.0
stimulus[0].zd       = 100.0
stimulus[1].name     = "Ground"
stimulus[1].stimtype =      3
stimulus[1].x0       = -50.0
stimulus[1].xd       = 10100.0
stimulus[1].y0       = -50.0
stimulus[1].yd       = 10100.0
stimulus[1].z0       = -50.0
stimulus[1].zd       = 100.0

################# Simulation parameters #################
bidomain = 1
tend    = 70.
spacedt = 1.0
timedt = 1.0
parab_solve = 1
vofile = "vm.igb"

# Number of events to detect
num_LATs = 1

# Event 1: activation
lats[0].ID         = ACTs
lats[0].all        = 1
lats[0].measurand = 0
lats[0].threshold = 0
lats[0].mode       = 0

# Event Monitor
sentinel_ID = 0
t_sentinel = 10.0
t_sentinel_start = 0.0

I hope this is what you were looking for.

commented Nov 20, 2024 by Tobias Gerach (500 points)

commented Nov 20, 2024 by Kyle Klenk (120 points)

commented Nov 22, 2024 by Tobias Gerach (500 points)

Hey!

I have run the experiment on our HPC system in different configurations and I can confirm your observations. What I observed was an excessive increase in computation time of the ionic models (look out for the ODE_stats.dat file written into your output directory) as soon as you move to 2 or more nodes.

I currently suspect that something goes wrong in the partitioning of the mesh, and I will look at it more closely. Essentially, I noticed that when you use multiple nodes with X tasks each, the mesh is only partitioned into X blocks instead of X*nodes blocks. If you want to see it yourself, you can use the -gridout_p and -output_level parameters to get more output regarding the partitioning. However, I am not quite sure yet how this is connected to the increased computation times in the ODEs.

Aurel Neic · Answer 2 · 2024-11-26T08:01:40+0000

commented Nov 26, 2024 by Tobias Gerach (500 points)

commented Nov 26, 2024 by Aurel Neic (8.6k points)

commented Nov 26, 2024 by Kyle Klenk (120 points)

commented Nov 27, 2024 by Tobias Gerach (500 points)

commented Nov 28, 2024 by Aurel Neic (8.6k points)

Kyle, your parameters look good. The poor scaling must be due to the combination of a small local problem size and a sub-optimal interconnect. A 100 Gbit interconnect sounds like a lot, but throughput is less relevant than latency: solving the problem in parallel requires the exchange of many small messages with minimal latency. For this, InfiniBand is much better than Gbit Ethernet. As a result, your scaling may already stop at larger local problem sizes (e.g. 100K - 200K) than what is reported in the literature (20K - 50K).

Please report the local problem sizes you have. You can see them using the "-output_level 5" option. You should see a small table with the ranks and their local number of elements / indices. The distribution should be pretty even, so you only need to post one set of values. Also, index multiplicities are worth reporting.

Best, Aurel
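As a rough illustration of the local-problem-size argument, the sketch below divides the mesh's 112,211 nodes evenly over the rank counts mentioned in this thread and compares the result with the 20K and 100K figures quoted above. It assumes that the node count is a reasonable stand-in for the local problem size and that the partitioning is perfectly even; the real values are the ones printed with "-output_level 5". The helper names are hypothetical.

```python
TOTAL_NODES = 112_211  # mesh size reported earlier in the thread


def local_size(total: int, ranks: int) -> float:
    """Average number of mesh nodes per MPI rank for an even partition."""
    return total / ranks


def max_useful_ranks(total: int, min_local: int) -> int:
    """Largest rank count that keeps at least min_local nodes per rank."""
    return max(1, total // min_local)


for ranks in (32, 80, 96):
    print(f"{ranks:3d} ranks -> ~{local_size(TOTAL_NODES, ranks):.0f} nodes per rank")

# Thresholds quoted in the answer: 20K (literature) and 100K (slower interconnect).
for min_local in (20_000, 100_000):
    n = max_useful_ranks(TOTAL_NODES, min_local)
    print(f">= {min_local} nodes per rank allows at most {n} rank(s)")
```

By this crude estimate, every configuration in the thread sits well below the quoted local problem sizes, which is consistent with communication latency dominating once the ranks are spread over several nodes.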
