Running on a Linux cluster
After implementing your model, you may want to scale up training by running it on a cluster. Below are basic instructions for installing, building, and running on a cluster. This manual assumes a Linux cluster with a command-line interface. We give specific suggestions for Snellius, the Dutch national supercomputer hosted by SURF.
0. Download and unzip LibTorch (Only on Snellius):
The easiest way to link LibTorch is to download the LibTorch version for Linux and upload it (still zipped) to your home folder. Then unzip the library:
unzip libtorch-version-name.zip
1. Build:
We provide a special build.job file in the /bash folder that automates the whole build process. You only need to change the target for your build and then run it using:
cd bash
sbatch build.job
For those who prefer a manual build, the steps are given below. Building on the login node is not allowed, so you first need to request a compute node for your job:
srun -p genoa -c 192 -n 1 -t 00:15:00 --pty /bin/bash
As soon as you have been allocated a node, load the modules using source loadmodules.sh:
cd bash
source loadmodules.sh
cd .. # back to root
Next, follow the steps below.
Configure for a specific preset from the CMake user presets (e.g., LinRel):
cmake --preset=LinRel # Other options: LinDeb / LinDB
Compile all code:
cmake --build out/LinRel -- -j120
Note: The option -- -j120 passes -j120 to the underlying build tool, which parallelizes the build over up to 120 jobs.
If you encounter the error:
CMake Error: Could not read presets from /home/willemvj/DynaPlexPrivate: Unrecognized "version" field
Your CMake version is not recent enough. On Snellius, you may have forgotten to source loadmodules.sh, which brings a recent CMake version into scope.
Compile a specific target (e.g., sometarget). This builds only your specified target, e.g., a target in the src folder:
cmake --build out/LinRel --target sometarget -j12
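The manual build steps above can be wrapped in a few shell helpers. The following is only a sketch under stated assumptions, not part of the repository: the function names (version_ge, build_jobs, check_cmake, build_preset) are hypothetical, the build directory is assumed to be out/<preset> as it is for LinRel, and the minimum CMake version you need depends on the "version" field of your presets file (presets were first supported in CMake 3.19).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the manual build; source this file, then call the functions.

# version_ge A B: succeeds when dotted version A >= B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# build_jobs: number of parallel build jobs.
# Uses SLURM_CPUS_PER_TASK when Slurm sets it, otherwise the core count from nproc.
build_jobs() {
    echo "${SLURM_CPUS_PER_TASK:-$(nproc)}"
}

# check_cmake MIN: succeeds when the cmake on PATH is at least version MIN.
check_cmake() {
    local min="$1" found
    if ! command -v cmake >/dev/null 2>&1; then
        echo "cmake not found; did you source loadmodules.sh?" >&2
        return 1
    fi
    # The first line of `cmake --version` reads "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if ! version_ge "$found" "$min"; then
        echo "CMake $found is older than $min; did you source loadmodules.sh?" >&2
        return 1
    fi
}

# build_preset PRESET [TARGET]: configure with a preset, then build everything
# or only TARGET. Run from the repository root.
# Assumption: the preset's build directory is out/PRESET.
build_preset() {
    local preset="$1" target="${2:-}"
    cmake --preset="$preset" || return 1
    if [ -n "$target" ]; then
        cmake --build "out/$preset" --target "$target" -j "$(build_jobs)"
    else
        cmake --build "out/$preset" -j "$(build_jobs)"
    fi
}
```

With these definitions sourced on a compute node, something like check_cmake 3.19 && build_preset LinRel sometarget would configure and build a single target, reporting an outdated CMake before the presets error appears.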
3. Run:
Executables can be run through a job script submitted with the sbatch command; see CPU.job in the bash/ folder for an example file.
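To illustrate what such a job script might contain, here is a minimal sketch of a shell function that prints a Slurm job script. It is not the repository's CPU.job: the function name write_run_job, the job name, and the executable path are hypothetical, and the script assumes it is submitted from the bash/ folder so that loadmodules.sh can be sourced by its relative name.

```shell
#!/usr/bin/env bash
# write_run_job NAME EXE CPUS TIME: print a Slurm job script to stdout.
# All four arguments are placeholders that you choose yourself.
write_run_job() {
    local name="$1" exe="$2" cpus="$3" time="$4"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=${name}
#SBATCH --partition=genoa
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${cpus}
#SBATCH --time=${time}
#SBATCH --output=${name}-%j.out

# Load the same modules that were used for the build.
source loadmodules.sh

# Run the executable on the allocated CPUs.
srun ${exe}
EOF
}
```

For example, write_run_job myrun ../out/LinRel/sometarget 192 00:15:00 > myrun.job followed by sbatch myrun.job would submit a run with the same resources as the interactive srun request above. The executable path here is a placeholder; use the location where your build actually places the binary.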