We recently acquired a new HPC cluster at our institution.
Each node is configured with:
- AMD EPYC 9755 128-Core Processor
- ~1.1 TB RAM
I pulled down a fresh clone of openCARP and built it from source on one of these nodes. Unfortunately, the simulations are taking orders of magnitude longer to finish than they did on our older clusters (e.g., a stimulation protocol that usually takes ~15 min to run on other nodes now shows an estimated compute time of > 1000 DAYS, which is roughly 100,000 times slower!)
I am working on profiling this build to see where the hangup might be, but wanted to ask:
1) What is the most likely source of this slowdown? PETSc? Open MPI?
2) Has anyone had success using profilers like perf, valgrind, or others? If so, how did you go about it?
I suspect that one of the dependencies I have built against is not ideal. I have seen openCARP get fussy about the PETSc version and similar things in the past, but I am out of my depth here and would greatly appreciate any advice or help. More details below.
Best,
Jake B
Here are some more details:
Simulation setup:
- S1S2 protocol using a left atrial model with an average edge length of 350 µm. The usual completion time for a single site is ~10 minutes for the S1 and ~20 to 40 minutes for the S2, depending on how long after the stimulus we simulate. These numbers are from a 192-core Intel-based node.
Things I have tried:
- reverting the git repo to match the working build on our Intel machines (no change)
- attempting to build PETSc from source (either with MPICH or using the system Open MPI); both attempts produced compiler errors that I could not get around
- yelling at the screen (changed nothing but was somewhat cathartic)
Here is how openCARP is linked for this build:
openCARP/_build/bin$ ldd openCARP
linux-vdso.so.1 (0x00007ffe778c9000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x0000726278ae7000)
libmpi_cxx.so.40 => /lib/x86_64-linux-gnu/libmpi_cxx.so.40 (0x0000726278ace000)
libpetsc_real.so.3.19 => /usr/lib/petscdir/petsc3.19/x86_64-linux-gnu-real/lib/libpetsc_real.so.3.19 (0x0000726277400000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000726277000000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000726278aa0000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007262789b5000)
libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007262772ce000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000726276c00000)
libopen-pal.so.40 => /lib/x86_64-linux-gnu/libopen-pal.so.40 (0x0000726278901000)
libHYPRE-2.28.0.so => /lib/x86_64-linux-gnu/libHYPRE-2.28.0.so (0x0000726276600000)
libspqr.so.4 => /lib/x86_64-linux-gnu/libspqr.so.4 (0x0000726276faa000)
libumfpack.so.6 => /lib/x86_64-linux-gnu/libumfpack.so.6 (0x0000726276ef5000)
libamd.so.3 => /lib/x86_64-linux-gnu/libamd.so.3 (0x00007262788f2000)
libcholmod.so.5 => /lib/x86_64-linux-gnu/libcholmod.so.5 (0x0000726276a41000)
libklu.so.2 => /lib/x86_64-linux-gnu/libklu.so.2 (0x00007262772a3000)
libdmumps-5.6.so => /lib/x86_64-linux-gnu/libdmumps-5.6.so (0x0000726276200000)
libscalapack-openmpi.so.2.2 => /lib/x86_64-linux-gnu/libscalapack-openmpi.so.2.2 (0x0000726275c00000)
libsuperlu.so.6 => /lib/x86_64-linux-gnu/libsuperlu.so.6 (0x0000726276e85000)
libsuperlu_dist.so.8 => /lib/x86_64-linux-gnu/libsuperlu_dist.so.8 (0x0000726275a29000)
libtrilinos_ml.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_ml.so.13.2 (0x0000726275600000)
libfftw3.so.3 => /lib/x86_64-linux-gnu/libfftw3.so.3 (0x0000726275200000)
libfftw3_mpi.so.3 => /lib/x86_64-linux-gnu/libfftw3_mpi.so.3 (0x00007262788d8000)
liblapack.so.3 => /lib/x86_64-linux-gnu/liblapack.so.3 (0x0000726274a00000)
libblas.so.3 => /lib/x86_64-linux-gnu/libblas.so.3 (0x0000726276e18000)
libptscotch-7.0.so => /lib/x86_64-linux-gnu/libptscotch-7.0.so (0x0000726276525000)
libhdf5_openmpi.so.103 => /lib/x86_64-linux-gnu/libhdf5_openmpi.so.103 (0x0000726274600000)
libOpenCL.so.1 => /lib/x86_64-linux-gnu/libOpenCL.so.1 (0x0000726277290000)
libyaml-0.so.2 => /lib/x86_64-linux-gnu/libyaml-0.so.2 (0x0000726276504000)
libX11.so.6 => /lib/x86_64-linux-gnu/libX11.so.6 (0x00007262758ec000)
libmpi_mpifh.so.40 => /lib/x86_64-linux-gnu/libmpi_mpifh.so.40 (0x000072627649f000)
libgfortran.so.5 => /lib/x86_64-linux-gnu/libgfortran.so.5 (0x0000726274200000)
/lib64/ld-linux-x86-64.so.2 (0x0000726278e4e000)
libopen-rte.so.40 => /lib/x86_64-linux-gnu/libopen-rte.so.40 (0x0000726275544000)
libhwloc.so.15 => /lib/x86_64-linux-gnu/libhwloc.so.15 (0x000072627643e000)
libevent_core-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_core-2.1.so.7 (0x00007262761cb000)
libevent_pthreads-2.1.so.7 => /lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7 (0x0000726277287000)
libsuitesparseconfig.so.7 => /lib/x86_64-linux-gnu/libsuitesparseconfig.so.7 (0x0000726277282000)
libcolamd.so.3 => /lib/x86_64-linux-gnu/libcolamd.so.3 (0x0000726276a38000)
libcamd.so.3 => /lib/x86_64-linux-gnu/libcamd.so.3 (0x0000726276433000)
libccolamd.so.3 => /lib/x86_64-linux-gnu/libccolamd.so.3 (0x00007262761bf000)
libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007262754ee000)
libbtf.so.2 => /lib/x86_64-linux-gnu/libbtf.so.2 (0x0000726276e12000)
libmumps_common-5.6.so => /lib/x86_64-linux-gnu/libmumps_common-5.6.so (0x0000726275474000)
libptscotchparmetisv3-7.0.so => /lib/x86_64-linux-gnu/libptscotchparmetisv3-7.0.so (0x00007262761b8000)
libmetis.so.5 => /lib/x86_64-linux-gnu/libmetis.so.5 (0x0000726275196000)
libCombBLAS.so.2.0.0 => /lib/x86_64-linux-gnu/libCombBLAS.so.2.0.0 (0x00007262761a0000)
libtrilinos_ifpack.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_ifpack.so.13.2 (0x000072627406c000)
libtrilinos_amesos.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_amesos.so.13.2 (0x0000726275140000)
libtrilinos_galeri-epetra.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_galeri-epetra.so.13.2 (0x000072627542b000)
libtrilinos_aztecoo.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_aztecoo.so.13.2 (0x00007262750bb000)
libtrilinos_zoltan.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_zoltan.so.13.2 (0x0000726274523000)
libtrilinos_epetraext.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_epetraext.so.13.2 (0x0000726273f00000)
libscotch-7.0.so => /lib/x86_64-linux-gnu/libscotch-7.0.so (0x0000726273e6b000)
libtrilinos_epetra.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_epetra.so.13.2 (0x0000726273d08000)
libtrilinos_teuchosparameterlist.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_teuchosparameterlist.so.13.2 (0x0000726273800000)
libtrilinos_teuchoscore.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_teuchoscore.so.13.2 (0x00007262749a3000)
libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x0000726271420000)
libptscotcherr-7.0.so => /lib/x86_64-linux-gnu/libptscotcherr-7.0.so (0x000072627619b000)
libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x0000726276187000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x0000726275089000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x0000726270e00000)
libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x0000726273c47000)
libsz.so.2 => /lib/x86_64-linux-gnu/libsz.so.2 (0x0000726276182000)
libxcb.so.1 => /lib/x86_64-linux-gnu/libxcb.so.1 (0x0000726273c1e000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x0000726273beb000)
libtrilinos_trilinosss.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_trilinosss.so.13.2 (0x0000726273bbd000)
libtrilinos_teuchosremainder.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_teuchosremainder.so.13.2 (0x00007262758e4000)
libtrilinos_teuchosnumerics.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_teuchosnumerics.so.13.2 (0x00007262758c7000)
libtrilinos_teuchoscomm.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_teuchoscomm.so.13.2 (0x0000726271342000)
libtrilinos_triutils.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_triutils.so.13.2 (0x0000726273b5e000)
libscotcherr-7.0.so => /lib/x86_64-linux-gnu/libscotcherr-7.0.so (0x0000726275426000)
libtrilinos_teuchosparser.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_teuchosparser.so.13.2 (0x0000726270d9e000)
libtrilinos_kokkoscore.so.13.2 => /lib/x86_64-linux-gnu/libtrilinos_kokkoscore.so.13.2 (0x0000726270d15000)
libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x0000726271317000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x0000726270cf3000)
librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x0000726270cd5000)
libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x0000726270c64000)
libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x0000726273b4a000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x0000726270bba000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x0000726270b66000)
libldap.so.2 => /lib/x86_64-linux-gnu/libldap.so.2 (0x0000726270b08000)
liblber.so.2 => /lib/x86_64-linux-gnu/liblber.so.2 (0x0000726270af8000)
libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x0000726270a3e000)
libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x0000726270a30000)
libaec.so.0 => /lib/x86_64-linux-gnu/libaec.so.0 (0x000072627499a000)
libXau.so.6 => /lib/x86_64-linux-gnu/libXau.so.6 (0x000072627507f000)
libXdmcp.so.6 => /lib/x86_64-linux-gnu/libXdmcp.so.6 (0x0000726270a28000)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x0000726270a1b000)
libunistring.so.5 => /lib/x86_64-linux-gnu/libunistring.so.5 (0x000072627086e000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x0000726270674000)
libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x000072627062c000)
libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x00007262705d7000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x0000726270553000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x000072627048a000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x000072627045e000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x0000726270458000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x000072627044b000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x0000726270431000)
libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x000072627040e000)
libbsd.so.0 => /lib/x86_64-linux-gnu/libbsd.so.0 (0x00007262703f8000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x0000726270254000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x000072627023e000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x0000726270237000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x0000726270224000)
libmd.so.0 => /lib/x86_64-linux-gnu/libmd.so.0 (0x0000726270215000)
libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x0000726270209000)
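
Since the list above is long, here is a minimal Python sketch that filters `ldd` output down to the libraries most likely to matter for performance (MPI, PETSc, BLAS/LAPACK, OpenMP). The function name `filter_ldd` and the keyword list are my own illustrative assumptions, not anything defined by openCARP, and the sample text is just a few lines copied from the output above.

```python
import re

# Illustrative keyword list (an assumption, not an openCARP-defined set).
KEYWORDS = ("mpi", "petsc", "blas", "lapack", "gomp", "hwloc")


def filter_ldd(ldd_text, keywords=KEYWORDS):
    """Return (library, path) pairs from `ldd` output whose name contains a keyword."""
    hits = []
    for line in ldd_text.splitlines():
        # Typical line: "libfoo.so.1 => /path/to/libfoo.so.1 (0x...)"
        match = re.match(r"\s*(\S+)\s+=>\s+(\S+)", line)
        if not match:
            continue  # lines without "=>" (linux-vdso, the dynamic loader) are skipped
        name, path = match.groups()
        if any(key in name.lower() for key in keywords):
            hits.append((name, path))
    return hits


# A few lines copied from the listing above, used as sample input.
sample = """\
libpetsc_real.so.3.19 => /usr/lib/petscdir/petsc3.19/x86_64-linux-gnu-real/lib/libpetsc_real.so.3.19 (0x0000726277400000)
libmpi.so.40 => /lib/x86_64-linux-gnu/libmpi.so.40 (0x00007262772ce000)
libblas.so.3 => /lib/x86_64-linux-gnu/libblas.so.3 (0x0000726276e18000)
libopenblas.so.0 => /lib/x86_64-linux-gnu/libopenblas.so.0 (0x0000726271420000)
libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007262754ee000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x0000726278ae7000)
"""

for name, path in filter_ldd(sample):
    print(name, "->", path)
```

Applied to the full listing, this kind of filter would surface libopenblas and libgomp alongside the system Open MPI and PETSc 3.19. I have not verified whether that combination has anything to do with the slowdown.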