JUREAP
Note
If you need help, please send us an email at jureap-support@fz-juelich.de or, alternatively, open an issue in the JSC GitLab repository assigned to your application in the jureap project.
Every participant in JUREAP is required to create a continuous-benchmarking-ready version of their application using the exacb framework. This allows both system and application performance to be tracked during the build-up of JUPITER and beyond. The benchmarks also help ensure that the application runs correctly and efficiently on JUPITER.
Participants are expected to design scientific use cases for each of the benchmarks. The scientific use case should be chosen such that it is representative of the application's eventual use case on JUPITER. The use cases should be designed to run on both the JUWELS Booster and the JUPITER Booster systems. In general, a use case should be designed for each of the significant steps towards the eventual full-system run of the application on JUPITER. The section on Designing your Benchmarks provides more details on how to design the use cases.
For the purpose of continuous benchmarking, the benchmarks are expected to be wrapped in a reproducible workflow that can then be run in a continuous integration environment. Applications can choose a workflow framework of their choice; however, the use of the JUBE framework is strongly encouraged due to its extensive use at JSC and our experience with it in the context of reproducible benchmarks. Please see the Reproducible Benchmarks using JUBE section for a quick-start guide on how to use JUBE to run the benchmarks.
At JSC, we support continuous integration on JSC machines using the Jacamar CI/CD GitLab executor. In addition, the benchmarks are expected to use the exacb framework inside the continuous integration environment. The exacb framework assists in running the benchmarks, collecting the results, and generating reports in a uniform manner across all applications in JUREAP.
With that in mind, the main steps to integrate your application with the exacb framework are listed below. We will go over each of these steps in detail in the following sections.
Designing your Benchmarks
The first step in JUREAP is to design a set of use cases that can be used to benchmark your application in different scenarios. It should be possible to run the complete workflow of downloading, compiling, setting up, running, and validating the result of the use case in a reasonable amount of time. In particular, the auxiliary parts of the benchmark should not scale super-linearly with the size of the use case. Please talk to us if you have special requirements or need help with managing the runtime of the auxiliary parts of the benchmark.
For JUREAP, we initially require the following benchmark variants to be designed for each application.
Single node benchmark
The purpose of the single-node variant of the benchmark is to ensure that the application runs correctly on the target queue and platform. The benchmark can use any number of cores/GPUs on a single node. Once the benchmark is running, it will be used to measure the performance of the application on a single node.
Strong Scaling Benchmarks
The strong scaling benchmarks should be designed to test the scaling behavior of the application. In JUREAP, we expect the applications to evaluate their strong scaling behavior for different subvariants, which correspond to different base node counts and their corresponding problem sizes. The subvariants and their base and maximum node counts are described below.
For each subvariant, we expect the application to run the benchmark over at least three levels of strong scaling centered around the base node count. Since this is a strong scaling test, the problem size should remain identical across the three levels. For example, for a base node count of 1, the application should run the benchmark on 1, 2, and 4 nodes with the same problem size and evaluate the scaling behavior.
Different subvariants have different base node counts, and the problem sizes increase with the base node count. For example, if the problem size is N for a base node count of 1, then the problem size can be 64*N for a base node count of 64. The subvariants are listed below; an illustrative sketch of the resulting node counts follows the list.
Starter:
base node count: 1
max node count: 8
Tiny:
base node count: 1
max node count: 64
Small:
base node count: 64
max node count: 256
Medium:
base node count: 256
max node count: 768
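As an illustration only (this sketch is ours and not part of exacb), the following Python snippet enumerates the node counts for three strong scaling levels of each subvariant, doubling from the base node count and capping at the maximum node count, and prints the assumed problem size per subvariant. The subvariant labels follow the strong.tiny naming used in the example CSV further below, and the problem-size scaling of base node count times N mirrors the 64*N example above; both are only one possible choice.

```python
# Illustrative sketch (not part of exacb): node counts and problem sizes
# for the strong scaling subvariants described above. Doubling from the
# base node count and the base*N problem-size scaling are assumptions;
# choose whatever is appropriate for your application.

SUBVARIANTS = {
    "strong.starter": {"base_nodes": 1, "max_nodes": 8},
    "strong.tiny":    {"base_nodes": 1, "max_nodes": 64},
    "strong.small":   {"base_nodes": 64, "max_nodes": 256},
    "strong.medium":  {"base_nodes": 256, "max_nodes": 768},
}

def strong_scaling_levels(base_nodes, max_nodes, levels=3):
    """Return the node counts for the strong scaling levels: start at the
    base node count, double each level, and never exceed the maximum."""
    return [min(base_nodes * 2**i, max_nodes) for i in range(levels)]

for name, spec in SUBVARIANTS.items():
    nodes = strong_scaling_levels(spec["base_nodes"], spec["max_nodes"])
    # The problem size stays fixed within a subvariant (strong scaling).
    print(f"{name}: nodes={nodes}, problem size={spec['base_nodes']}*N")
```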
Weak Scaling Benchmarks
The weak scaling benchmarks are designed to test the ability of the application to solve a large problem on a large number of nodes. In the weak scaling benchmarks, the computational load per node should remain roughly constant as the number of nodes is scaled up to the full system.
In JUREAP, we expect the applications to evaluate their weak scaling behavior for different subvariants, which correspond to different base node counts and their corresponding problem sizes. The purpose of the different subvariants is to allow applications to test their weak scaling in a systematic manner over different scales without having to run every node count up to the full system. In particular, we expect the applications to evaluate their weak scaling behavior for the following subvariants (an illustrative sketch follows the list):
Tiny: 1-32 nodes
Small: 64-256 nodes
Medium: 512-2048 nodes
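As a minimal illustration (again our own sketch, not part of exacb), the per-node problem size stays constant while the total problem size grows with the node count. The weak.tiny style labels and the doubling of node counts within each range are assumptions, not requirements.

```python
# Illustrative sketch (not part of exacb): in weak scaling the work per
# node stays constant, so the total problem size grows with the node count.
# The subvariant labels and the doubling of node counts are assumptions.

WEAK_SUBVARIANTS = {
    "weak.tiny":   (1, 32),
    "weak.small":  (64, 256),
    "weak.medium": (512, 2048),
}

PER_NODE_PROBLEM_SIZE = "N"  # whatever unit is natural for your application

for name, (min_nodes, max_nodes) in WEAK_SUBVARIANTS.items():
    nodes = min_nodes
    while nodes <= max_nodes:
        # Total problem size scales linearly with the node count.
        print(f"{name}: nodes={nodes}, total problem size={nodes}*{PER_NODE_PROBLEM_SIZE}")
        nodes *= 2
```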
Full System Benchmark
This variant of the benchmark should be designed to run on the full system. The number of nodes corresponding to the full system is given below:
JUWELS Booster: 900 nodes
Reproducible Benchmarks using JUBE
We strongly encourage the use of the JUBE benchmark environment to run the benchmarks. The framework is agnostic to how the JUBE file is implemented, but it does require the JUBE file to accept certain command line options (in the form of JUBE tags) and to produce output in a specific format; both are described in the following sections.
Please refer to the following instructions for the specific points to consider when writing the JUBE file for JEDI.
Specifications on the Expected Output of JUBE
exacb expects the JUBE benchmark file to produce a CSV file when the following command is executed at the end of the benchmark run:
jube result --style csv ../outpath > filename.csv
The CSV result produced by JUBE should have the columns shown in the example below. There is no restriction on any additional columns that you might want to add to the CSV file. Depending on the type of benchmark, the CSV file might have a different number of rows, corresponding to different node counts.
| system | version | queue | variant | jobid | nodes | taskspernode | threadspertask | runtime | success |
|---|---|---|---|---|---|---|---|---|---|
| jurecadc | 2024.01 | dc-cpu | strong.tiny | 1273 | 1 | 4 | 1 | 4993.08 | true |
| jurecadc | 2024.01 | dc-cpu | strong.tiny | 1274 | 2 | 4 | 1 | 2633.73 | true |
The meaning of each of the columns in the CSV file is as follows:
| column | description |
|---|---|
| system | system name |
| version | system version |
| queue | Slurm queue |
| variant | benchmark variant (and subvariant) |
| jobid | Slurm job ID |
| nodes | number of nodes |
| taskspernode | number of tasks per node |
| threadspertask | number of threads per task |
| runtime | runtime in seconds as reported by the application |
| success | result of verification of the output |
system: The value of the environment variable SYSTEMNAME on the system where the benchmark was run. The value is also available in /etc/FZJ/systemname.
version: The version of the system on which the application was benchmarked. In the future, this will be the value of the environment variable SYSTEMVERSION. For now, we simply hardcode 2024.01.
queue: The value of the tags corresponding to the benchmark queues as described above.
variant: The value of the tags corresponding to the benchmark variants as described above.
jobid: The Slurm job ID of the specific job run in JUBE.
nodes: The number of nodes used in the benchmark run.
taskspernode: The number of tasks per node used in the benchmark run.
threadspertask: The number of threads per task used in the benchmark run. This corresponds to the value passed to the Slurm option --threads-per-task on JSC systems.
runtime: The runtime of the benchmark in seconds. Please note that this is the runtime that the benchmark wants to report; it is not the walltime of the job.
success: Reports whether the benchmark was successful. This can either be true or false.
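As a quick sanity check before handing the result to exacb, the following Python snippet (our own sketch, not part of exacb; the file name result.csv is only an example) verifies that a CSV file produced by the jube result command contains all of the required columns listed above.

```python
# Illustrative sketch (not part of exacb): check that the CSV produced by
# `jube result --style csv ...` contains the required columns described above.
import csv
import sys

REQUIRED_COLUMNS = {
    "system", "version", "queue", "variant", "jobid",
    "nodes", "taskspernode", "threadspertask", "runtime", "success",
}

def check_result_csv(path):
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise SystemExit(f"missing required columns: {sorted(missing)}")
        for row in reader:
            # `success` must be either "true" or "false".
            if row["success"] not in ("true", "false"):
                raise SystemExit(f"invalid success value: {row['success']!r}")
    print(f"{path}: all required columns present")

if __name__ == "__main__":
    check_result_csv(sys.argv[1] if len(sys.argv) > 1 else "result.csv")
```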
Continuous Benchmarking using exacb
Once a reproducible workflow is designed using JUBE, the next step is to integrate the workflow with the CI/CD system at JSC. For JUREAP, we use the jureap/jube CI/CD component to run the benchmarks in a continuous integration environment. You can refer to the tutorial for an introduction on how to integrate your application with the CI/CD system at JSC.