How to Run
The Python file run_chain.py in the root directory is the main script of the Processing Chain. It reads the user's input from the command line and from the config.yaml file of the respective case, and then starts the Processing Chain.
Starting the Chain
The chain has to be run with the following command:

$ ./run_chain.py <casename>

Here, <casename> is the name of a directory in the cases/ directory containing a config.yaml file that specifies the configuration, as well as templates for the necessary namelist files for int2lm, COSMO or ICON. It may also contain additional runscripts to be submitted via sbatch.
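For orientation, a case's config.yaml might look like the fragment below. Note that these key names are a hypothetical sketch for illustration only (apart from restart_step_hours, which the chain uses to split the simulation into chunks); consult an existing case under cases/ for the authoritative set of keys.

```yaml
# Hypothetical sketch of a case's config.yaml -- key names are illustrative,
# except restart_step_hours, which controls the chunking of the simulation.
casename: my-cosmo-ghg-case
startdate: 2015-01-01T00:00:00Z
enddate: 2015-01-01T12:00:00Z
restart_step_hours: 6
```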
Hint
Technically, you can run several cases (instead of a single case) in one command, which is useful for nested runs, for example. This can be achieved by running ./run_chain.py <case1> <case2>. With that, the full chain is executed for case1 first, and afterwards for case2.
There are several optional arguments available to change the behavior of the chain:
$ ./run_chain.py -h
-h, --help
Show this help message and exit.
-j [JOB_LIST ...], --jobs [JOB_LIST ...]
List of job names to be executed. A job is a .py file in jobs/ with a main() function, which handles one aspect of the Processing Chain, for example copying meteo input data or launching a job for int2lm. Jobs are executed in the order in which they are given here. If no jobs are given, default jobs will be executed as defined in config/models.yaml.
-f, --force
Force the Processing Chain to redo all specified jobs, even if they have been started already or were finished previously. WARNING: Only logfiles get deleted; other effects of a given job (copied files etc.) are simply overwritten. This may cause errors or unexpected behavior.
-r, --resume
Resume the Processing Chain by restarting the last unfinished job. WARNING: Only the logfile gets deleted; other effects of a given job (copied files etc.) are simply overwritten. This may cause errors or unexpected behavior.
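The documented interface can be mirrored with a minimal argparse sketch. This is an illustration of the options listed above, not the actual parser in run_chain.py, whose internals may differ:

```python
import argparse

# Sketch of the documented CLI; the real parser in run_chain.py may differ.
def build_parser():
    p = argparse.ArgumentParser(prog="run_chain.py")
    p.add_argument("casenames", nargs="+",
                   help="name(s) of case directories under cases/")
    p.add_argument("-j", "--jobs", nargs="*", dest="job_list", default=None,
                   help="jobs to execute, in the given order")
    p.add_argument("-f", "--force", action="store_true",
                   help="redo all specified jobs even if already finished")
    p.add_argument("-r", "--resume", action="store_true",
                   help="restart the last unfinished job")
    return p

args = build_parser().parse_args(["cosmo-ghg-test", "-j", "prepare_cosmo", "int2lm", "-f"])
print(args.casenames, args.job_list, args.force, args.resume)
# -> ['cosmo-ghg-test'] ['prepare_cosmo', 'int2lm'] True False
```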
What it Does
The script run_chain.py reads the command-line arguments and the config file from the specified case. It then calls the function run_chain.restart_runs(), which divides the simulation time according to the specified restart steps. Then it calls run_chain.run_chunk() for each part (chunk) of the simulation workflow. This function sets up the directory structure of the chain and then submits the specified jobs via sbatch to the Slurm workload manager, taking job dependencies into account.
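The time slicing performed by restart_runs() can be sketched as follows. chunk_intervals() is a hypothetical helper, not part of the actual code, but it reproduces the chunk naming visible in the work directories (e.g. 2015010100_2015010106):

```python
from datetime import datetime, timedelta

def chunk_intervals(startdate, enddate, restart_step_hours):
    """Slice the total runtime into restart chunks (hypothetical helper)."""
    chunks, t = [], startdate
    step = timedelta(hours=restart_step_hours)
    while t < enddate:
        chunks.append((t, min(t + step, enddate)))
        t += step
    return chunks

# A 12-hour run with 6-hour restart steps yields two chunks:
for start, end in chunk_intervals(datetime(2015, 1, 1, 0), datetime(2015, 1, 1, 12), 6):
    print(f"{start:%Y%m%d%H}_{end:%Y%m%d%H}")
# prints:
# 2015010100_2015010106
# 2015010106_2015010112
```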
Test Cases
The following test cases are available:
cosmo-ghg-spinup-test
cosmo-ghg-test
icon-test
icon-art-oem-test
icon-art-global-test
To be able to run these test cases, it is necessary to provide the input data, to set up Spack and to compile the models and tools. All this is automated via the script:
$ ./jenkins/scripts/jenkins.sh
This will run all the individual scripts in jenkins/scripts/, which can also be launched separately if desired.
These cases undergo regular testing to ensure that the Processing Chain runs correctly. A corresponding Jenkins plan is launched on a weekly basis and when triggered within a GitHub pull request.
Directory Structure
The directory structure generated by the Processing Chain for a cosmo-ghg run looks like this:
cfg.work_root/cfg.casename/
└── cfg.chain_root/
├── checkpoints/
│ ├── cfg.log_working_dir/
│ ├── cfg.log_finished_dir/
├── cfg.cosmo_base/
│ ├── cfg.cosmo_work/
│ ├── cfg.cosmo_output/
│ ├── cfg.cosmo_restart_out/
└── cfg.int2lm_base/
├── cfg.int2lm_input/
├── cfg.int2lm_work/
└── cfg.int2lm_output/
As one can see, it creates working directories for both the int2lm preprocessor and cosmo. Additionally, and this is always the case, the checkpoints directory holds all the job logfiles. Whenever a job has successfully finished, the logfile is copied from the working to the finished sub-directory.
Running the cosmo-ghg-test case therefore produces the following directories and files (showing four levels of directories deep):
work/cosmo-ghg-test
├── 2015010100_2015010106/
│ ├── checkpoints/
│ │ ├── finished/
│ │ │ ├── biofluxes
│ │ │ ├── cosmo
│ │ │ ├── emissions
│ │ │ ├── int2lm
│ │ │ ├── oem
│ │ │ ├── online_vprm
│ │ │ ├── post_cosmo
│ │ │ ├── post_int2lm
│ │ │ └── prepare_cosmo
│ │ └── working/
│ │ ├── biofluxes
│ │ ├── cosmo
│ │ ├── emissions
│ │ ├── int2lm
│ │ ├── oem
│ │ ├── online_vprm
│ │ ├── post_cosmo
│ │ ├── post_int2lm
│ │ └── prepare_cosmo
│ ├── cosmo/
│ │ ├── input/
│ │ │ ├── oem/
│ │ │ └── vprm/
│ │ ├── output/
│ │ │ └── lffd*.nc
│ │ ├── restart/
│ │ │ └── lrff00060000o.nc
│ │ └── run/
│ │ ├── cosmo-ghg
│ │ ├── INPUT_*
│ │ ├── post_cosmo.job
│ │ ├── run.job
│ │ └── YU*
│ └── int2lm/
│ ├── input/
│ │ ├── emissions
│ │ ├── extpar
│ │ ├── icbc
│ │ ├── meteo
│ │ └── vprm
│ ├── output/
│ │ ├── laf*.nc
│ │ └── lbfd*.nc
│ └── run/
│ ├── INPUT
│ ├── INPUT_ART
│ ├── int2lm
│ ├── OUTPUT
│ ├── run.job
│ └── YU*
└── 2015010106_2015010112/
├── checkpoints/
│ ├── finished/
│ │ ├── biofluxes
│ │ ├── cosmo
│ │ ├── emissions
│ │ ├── int2lm
│ │ ├── oem
│ │ ├── online_vprm
│ │ ├── post_cosmo
│ │ ├── post_int2lm
│ │ └── prepare_cosmo
│ └── working/
│ ├── biofluxes
│ ├── cosmo
│ ├── emissions
│ ├── int2lm
│ ├── oem
│ ├── online_vprm
│ ├── post_cosmo
│ ├── post_int2lm
│ └── prepare_cosmo
├── cosmo/
│ ├── input/
│ │ ├── oem
│ │ └── vprm
│ ├── output/
│ │ └── lffd*.nc
│ ├── restart/
│ │ └── lrff00060000o.nc
│ └── run/
│ ├── cosmo-ghg
│ ├── INPUT_*
│ ├── post_cosmo.job
│ ├── run.job
│ └── YU*
└── int2lm/
├── input/
│ ├── emissions
│ ├── extpar
│ ├── icbc
│ ├── meteo
│ └── vprm
├── output/
│ ├── laf*.nc
│ └── lbfd*.nc
└── run/
├── INPUT
├── INPUT_ART
├── int2lm
├── OUTPUT
├── run.job
└── YU*
- run_chain.run_chunk(cfg, force, resume)
Run a chunk of the processing chain, managing job execution and logging.
This function sets up and manages the execution of a Processing Chain, handling job execution, logging, and various configuration settings.
- Parameters:
cfg (Config) – Object holding user-defined configuration parameters as attributes.
force (bool) – If True, it will force the execution of jobs regardless of their completion status.
resume (bool) – If True, it will resume the last unfinished job.
- Raises:
RuntimeError – If an error or timeout occurs during job execution.
Notes
This function sets various configuration values based on the provided parameters.
It checks for job completion status and resumes or forces execution accordingly.
Job log files are managed, and errors or timeouts are handled with notifications.
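The interplay of force and resume described above can be summarized in a small decision sketch. This is an interpretation of the documented semantics, not the actual code in run_chain.py:

```python
def decide_action(job_finished: bool, force: bool, resume: bool) -> str:
    """Per-job decision implied by the documented --force/--resume semantics
    (an interpretation for illustration, not the actual implementation)."""
    if force:
        return "run"      # redo regardless of completion status
    if job_finished:
        return "skip"     # logfile already in checkpoints/finished/
    if resume:
        return "restart"  # delete the logfile and restart the unfinished job
    return "run"          # job not started yet
```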
- run_chain.restart_runs(cfg, force, resume)
Start subchains in specified intervals and manage restarts.
This function slices the total runtime of the processing chain according to the cfg.restart_step_hours configuration. It calls run_chunk() for each specified interval.
- Parameters:
cfg (Config) – Object holding all user-configuration parameters as attributes.
force (bool) – If True, it will force the execution of jobs regardless of their completion status.
resume (bool) – If True, it will resume the last unfinished job.
Notes
The function iterates over specified intervals, calling run_chunk() for each.
It manages restart settings and logging for each subchain.