JUREAP
Note
If you need help, please send us an email at jureap-support@fz-juelich.de or, alternatively, open an issue in the JSC GitLab repository assigned to your application in the jureap project.
Every participant in JUREAP is required to create a continuous-benchmarking-ready version of their application using the exacb framework. This allows both system and application performance to be tracked during the build-up of JUPITER and beyond. The benchmarks also help ensure that the application runs correctly and efficiently on JUPITER.
Participants are expected to design scientific use cases for each of the benchmarks. The scientific use case should be chosen such that it is representative of the application's eventual use case on JUPITER. The use cases should be designed to run on both the JUWELS Booster and the JUPITER Booster systems. In general, a use case should be designed for each of the significant steps towards the eventual full-system run of the application on JUPITER. The section on Designing your Benchmarks provides more details on how to design the use cases.
For the purpose of continuous benchmarking, the benchmarks are expected to be wrapped in a reproducible workflow that can then be run in a continuous integration environment. Applications can choose a workflow framework of their choice; however, the use of the JUBE framework is strongly encouraged due to its extensive use at JSC and our experience with it in the context of reproducible benchmarks. Please see the Reproducible Benchmarks using JUBE section for a quick-start guide on how to use JUBE to run the benchmarks.
At JSC, we support continuous integration on JSC machines using the Jacamar CI/CD GitLab executor. In addition, the benchmarks are expected to use the exacb framework inside the continuous integration environment. The exacb framework assists in running the benchmarks, collecting the results, and generating reports in a uniform manner across all applications in JUREAP.
With that in mind, the main steps to integrate your application with the exacb framework are listed below. We will go over each of these steps in detail in the following sections.
Designing your Benchmarks
The first step in JUREAP is to design a set of use cases that can be used to benchmark your application in different scenarios. It should be possible to run the complete workflow of downloading, compiling, setting up, running, and validating the result of the use case in a reasonable amount of time. In particular, the auxiliary parts of the benchmark should not scale super-linearly with the size of the use case. Please talk to us if you have special requirements or need help with managing the runtime of the auxiliary parts of the benchmark.
For JUREAP, we initially require the following benchmark variants to be designed for each application.
Single node benchmark
The purpose of the single-node variant of the benchmark is to ensure that the application runs correctly on the target queue and platform. The benchmark can use any number of cores/GPUs on a single node. Once the benchmark is running, it will be used to measure the performance of the application on a single node.
Strong Scaling Benchmarks
The strong scaling benchmarks should be designed to test the scaling behavior of the application. In JUREAP, we expect the applications to evaluate their strong scaling behavior for different subvariants, which correspond to different base node counts and their corresponding problem sizes. The subvariants and their base and maximum node counts are described below.
For each subvariant, we expect the application to run the benchmark over at least three levels of strong scaling centered around the base node count. Since this is a strong scaling test, the problem size should remain identical across the three levels. For example, for a base node count of 1, the application should run the benchmark on 1, 2, and 4 nodes with the same problem size and evaluate the scaling behavior.
Different subvariants have different base node counts, and the problem sizes increase with the base node count. For example, if the problem size is N for a base node count of 1, then the problem size can be 64*N for a base node count of 64. The subvariants are listed below; an illustrative sketch of the resulting node counts follows the list.
Starter:
base node count: 1
max node count: 8
Tiny:
base node count: 1
max node count: 64
Small:
base node count: 64
max node count: 256
Medium:
base node count: 256
max node count: 768
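As an illustration only (this sketch is ours and not part of exacb), the following Python snippet enumerates the node counts for three strong scaling levels of each subvariant, doubling from the base node count and capping at the maximum node count, and prints the assumed problem size per subvariant. The subvariant labels follow the strong.tiny naming used in the example CSV further below, and the problem-size scaling of base node count times N mirrors the 64*N example above; both are only one possible choice.

```python
# Illustrative sketch (not part of exacb): node counts and problem sizes
# for the strong scaling subvariants described above. Doubling from the
# base node count and the base*N problem-size scaling are assumptions;
# choose whatever is appropriate for your application.

SUBVARIANTS = {
    "strong.starter": {"base_nodes": 1, "max_nodes": 8},
    "strong.tiny":    {"base_nodes": 1, "max_nodes": 64},
    "strong.small":   {"base_nodes": 64, "max_nodes": 256},
    "strong.medium":  {"base_nodes": 256, "max_nodes": 768},
}

def strong_scaling_levels(base_nodes, max_nodes, levels=3):
    """Return the node counts for the strong scaling levels: start at the
    base node count, double each level, and never exceed the maximum."""
    return [min(base_nodes * 2**i, max_nodes) for i in range(levels)]

for name, spec in SUBVARIANTS.items():
    nodes = strong_scaling_levels(spec["base_nodes"], spec["max_nodes"])
    # The problem size stays fixed within a subvariant (strong scaling).
    print(f"{name}: nodes={nodes}, problem size={spec['base_nodes']}*N")
```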
Weak Scaling Benchmarks
The weak scaling benchmarks are designed to test the ability of the application to solve a large problem on a large number of nodes. In the weak scaling benchmarks, the computational load per node should remain roughly constant as the number of nodes is scaled up to the full system.
In JUREAP, we expect the applications to evaluate their weak scaling behavior for different subvariants, which correspond to different base node counts and their corresponding problem sizes. The purpose of the different subvariants is to allow applications to test their weak scaling in a systematic manner over different scales without having to run every node count up to the full system. In particular, we expect the applications to evaluate their weak scaling behavior for the following subvariants (an illustrative sketch follows the list):
Tiny: 1-32 nodes
Small: 64-256 nodes
Medium: 512-2048 nodes
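As a minimal illustration (again our own sketch, not part of exacb), the per-node problem size stays constant while the total problem size grows with the node count. The weak.tiny style labels and the doubling of node counts within each range are assumptions, not requirements.

```python
# Illustrative sketch (not part of exacb): in weak scaling the work per
# node stays constant, so the total problem size grows with the node count.
# The subvariant labels and the doubling of node counts are assumptions.

WEAK_SUBVARIANTS = {
    "weak.tiny":   (1, 32),
    "weak.small":  (64, 256),
    "weak.medium": (512, 2048),
}

PER_NODE_PROBLEM_SIZE = "N"  # whatever unit is natural for your application

for name, (min_nodes, max_nodes) in WEAK_SUBVARIANTS.items():
    nodes = min_nodes
    while nodes <= max_nodes:
        # Total problem size scales linearly with the node count.
        print(f"{name}: nodes={nodes}, total problem size={nodes}*{PER_NODE_PROBLEM_SIZE}")
        nodes *= 2
```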
Full System Benchmark
This variant of the benchmark should be designed to run on the full system. The number of nodes corresponding to the full system is given below:
JUWELS Booster: 900 nodes
Reproducible Benchmarks using JUBE
We strongly encourage the use of the JUBE benchmark environment to run the benchmarks. The framework is agnostic to how the JUBE file is implemented, but it does require the JUBE file to accept certain command line options (in the form of JUBE tags) and to produce output in a specific format; both are described in the following sections.
Please refer to the following instructions for the specific points to consider when writing the JUBE file for JEDI.
Specifications on the Expected Output of JUBE
exacb expects the JUBE benchmark file to produce a CSV file when the following command is executed at the end of the benchmark run:
jube result --style csv ../outpath > filename.csv
The CSV result produced by JUBE should have the columns shown in the example below. There is no restriction on any additional columns that you might want to add to the CSV file. Depending on the type of benchmark, the CSV file might have a different number of rows, corresponding to different node counts.
| system | version | queue | variant | jobid | nodes | taskspernode | threadspertask | runtime | success |
|---|---|---|---|---|---|---|---|---|---|
| jurecadc | 2024.01 | dc-cpu | strong.tiny | 1273 | 1 | 4 | 1 | 4993.08 | true |
| jurecadc | 2024.01 | dc-cpu | strong.tiny | 1274 | 2 | 4 | 1 | 2633.73 | true |
The meaning of each of the columns in the CSV file is as follows:
| column | description |
|---|---|
| system | system name |
| version | system version |
| queue | Slurm queue |
| variant | benchmark variant (and subvariant) |
| jobid | Slurm job ID |
| nodes | number of nodes |
| taskspernode | number of tasks per node |
| threadspertask | number of threads per task |
| runtime | runtime in seconds as reported by the application |
| success | result of verification of the output |
system: The value of the environment variable SYSTEMNAME on the system where the benchmark was run. The value is also available in /etc/FZJ/systemname.
version: The version of the system on which the application was benchmarked. In the future, this will be the value of the environment variable SYSTEMVERSION. For now, we simply hardcode 2024.01.
queue: The value of the tags corresponding to the benchmark queues as described above.
variant: The value of the tags corresponding to the benchmark variants as described above.
jobid: The Slurm job ID of the specific job run in JUBE.
nodes: The number of nodes used in the benchmark run.
taskspernode: The number of tasks per node used in the benchmark run.
threadspertask: The number of threads per task used in the benchmark run. This corresponds to the value passed to the Slurm option --threads-per-task on JSC systems.
runtime: The runtime of the benchmark in seconds. Please note that this is the runtime that the benchmark wants to report; it is not the walltime of the job.
success: Reports whether the benchmark was successful. This can either be true or false.
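As a quick sanity check before handing the result to exacb, the following Python snippet (our own sketch, not part of exacb; the file name result.csv is only an example) verifies that a CSV file produced by the jube result command contains all of the required columns listed above.

```python
# Illustrative sketch (not part of exacb): check that the CSV produced by
# `jube result --style csv ...` contains the required columns described above.
import csv
import sys

REQUIRED_COLUMNS = {
    "system", "version", "queue", "variant", "jobid",
    "nodes", "taskspernode", "threadspertask", "runtime", "success",
}

def check_result_csv(path):
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise SystemExit(f"missing required columns: {sorted(missing)}")
        for row in reader:
            # `success` must be either "true" or "false".
            if row["success"] not in ("true", "false"):
                raise SystemExit(f"invalid success value: {row['success']!r}")
    print(f"{path}: all required columns present")

if __name__ == "__main__":
    check_result_csv(sys.argv[1] if len(sys.argv) > 1 else "result.csv")
```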
Continuous Benchmarking using exacb
Once a reproducible workflow is designed using JUBE, the next step is to integrate the workflow with the CI/CD system at JSC. For JUREAP, we use the jureap/jube CI/CD component to run the benchmarks in a continuous integration environment. You can refer to the tutorial for an introduction on how to integrate your application with the CI/CD system at JSC.