COSMO

The COSMO model is a limited-area, non-hydrostatic atmospheric model developed by the Consortium for Small-scale Modeling, a collaboration of national weather services.

Support status

C2SM currently supports the use of COSMO on the Piz Daint computing platform for both CPU and GPU architectures. The master and c2sm-features branches are continuously tested on Piz Daint.

The following table summarises the features ported to GPU and their corresponding namelist parameters.

GPU-ported COSMO features

Parameters in INPUT_ORG

Scheme/parameterisation | Namelist parameter | GPU porting status
Physics | lphys | ported
Diagnostics | ldiagnos | ported
Digital filtering | ldfi | not ported
Use observations | luseobs | ported
Ensemble mode | leps | ported
Stochastic perturbation of physics tendencies | lsppt | ported
Synthetic satellite images | luse_rttov | not ported
Radar forward operator | luse_radarfwo | not ported
Aerosol and Reactive Tracer module (ART) | l_cosmo_art | not ported
Pollen module | l_pollen | ported (available in the MeteoSwiss fork only)
Online trajectory module | l_traj | ported
Zero vertical velocity on lower boundary | llm | not supported in the C++ dycore
Incremental analysis update | itype_iau = 0, 1, 2 | only itype_iau = 0 ported
Idealised runs | lartif_data | not ported
2D model runs | l2dim | not ported
Periodic boundary conditions in X direction | lperi_x | ported (not tested)
Periodic boundary conditions in Y direction | lperi_y | ported (not tested)
Reproducible results in parallel mode | lreproduce | ported
Reorder MPI process numbering | lreorder | not ported
Implicit MPI buffering | ldatatypes | ported
Additional MPI barriers | ltime_barrier | ported
Write ASCII files every time step | ldump_ascii | ported
All processors write debug output | lprintdeb_all | ported
Debug statements in various model sections | ldebug_dyn, ldebug_gsp, ldebug_rad, ldebug_sso, ldebug_tur, ldebug_con, ldebug_soi, ldebug_io, ldebug_mpe, ldebug_dia, ldebug_lhn, ldebug_ass, ldebug_art | partially ported, not all prints are active on GPU
Initialise local variables | linit_fields | not ported

Parameters in INPUT_PHY

Scheme/parameterisation | Namelist parameter | GPU porting status
Grid-scale precipitation scheme | lgsp | ported
Grid-scale precipitation scheme type | itype_gscp = 1, 2, 3, 4 | only itype_gscp = 3, 4 ported to GPU
Run grid-scale precipitation scheme first | lgsp_first | only lgsp_first = .TRUE. ported to GPU
Radiation | lrad | ported
Cloud representation mode | icldm_rad = 0, 1, 3, 4 | all options ported
Forest | lforest | ported
Topographic correction of radiation | lradtopo | ported
External surface emissivity | lemiss | ported
Aerosol scheme type | itype_aerosol = 1, 2, 3 | only itype_aerosol = 1, 2 ported
Albedo scheme type | itype_albedo = 1, 2, 3, 4 | all options ported
Convection scheme | lconv | ported
Convection scheme type | itype_conv = 0, 2, 3 | all options ported
Vertical turbulent diffusion | ltur | ported
Old turbulence scheme behaviour | loldtur | only loldtur = .TRUE. is ported and tested
Vertical diffusion calculation location | itype_vdif = -1, 0, 1 | itype_vdif = -1 is ported; itype_vdif = 0, 1 is ported but NOT tested
Turbulence scheme type | itype_turb = 1, 3, 5/7 | only itype_turb = 3 is tested
3D turbulence | l3dturb | not ported
TKE equation type | imode_turb = 0, 1, 2 | only imode_turb is tested
SSO wake turbulent production | ltkesso | ported
TKE convective buoyancy production | ltkecon | ported
TKE horizontal shear production | ltkeshs | ported (not tested)
Shear production type | itype_sher = 0, 1, 2 | only itype_sher = 0 is tested
Transfer scheme type | itype_tran = 1, 2 | only 0 is tested
TKE equation type in transfer scheme | imode_tran = 0, 1, 2 | only imode_tran = 1 is tested
Soil model | lsoil | ported
Sea ice scheme | lseaice | not ported
FLake lake model | llake | ported
Multi-layer snow model | lmulti_snow | ported but NOT tested
Vegetation transpiration type | itype_trvg = 1, 2 | all options ported
Bare soil evaporation type | itype_evsl = 2, 3, 4 | all options ported
Root distribution type | itype_root = 1, 2 | all options ported
Canopy parameterisation type | itype_canopy = 1, 2 | all options ported
Soil heat conductivity type | itype_heatcond = 1, 2, 3 | all options ported
Mire parameterisation type | itype_mire = 0, 1 | all options ported
Hydraulic lower boundary parameterisation | itype_hydbound = 1, 3 | all options ported
Snow-cover fraction type | idiag_snowfrac = 1, 2, 3, 4, 20, 30, 40 | all options ported
Subgrid-scale orography | lsso | ported

Parameters in INPUT_DYN

The GPU port of the COSMO dynamical core was accomplished by rewriting the dynamics with the GridTools stencil library. The GridTools dycore supports a subset of the parameters of the COSMO Fortran dynamical core; the list of currently supported features can be found in the documentation in the code repository (https://github.com/C2SM-RCM/cosmo/blob/master/dycore/doc/Dycore/supported_configuration.tex).

Warnings

  • The support status on the future Alps system is not yet known; it depends strongly on whether an older interpretation of the OpenACC standard can still be used.

  • C2SM’s support for COSMO is scheduled to stop at the end of 2024.

Access

In order to get access to the COSMO repository hosted on the C2SM-RCM GitHub organisation, please contact C2SM Support.

Once you have access, clone the repository from GitHub using the SSH protocol:

git clone git@github.com:C2SM-RCM/cosmo.git

If you do not already have an SSH key set up for GitHub but would like to do so, follow the instructions in the GitHub documentation on adding an SSH key to your account.
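For reference, a typical key setup looks like this (the ed25519 key type and the e-mail comment are only an example; any key type accepted by GitHub works):

ssh-keygen -t ed25519 -C "your_email@example.com"
cat ~/.ssh/id_ed25519.pub    # copy this public key into GitHub -> Settings -> SSH and GPG keys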

Configure and compile

For configuring and building COSMO with Spack, please refer to the official spack-c2sm documentation, which provides instructions for setting up a Spack instance and installing COSMO on Piz Daint.
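As a rough orientation, a build may look like the sketch below. The repository URL, setup script, and Spack spec are assumptions based on the current spack-c2sm layout and on the spec used later on this page; follow the official documentation wherever it differs:

git clone --recurse-submodules https://github.com/C2SM/spack-c2sm.git
. spack-c2sm/setup-env.sh                          # assumed setup script provided by spack-c2sm
spack install cosmo@master%pgi cosmo_target=gpu    # example spec for the GPU build on Piz Daint
spack load cosmo@master%pgi cosmo_target=gpu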

In the Tools section, you will find relevant tools for working with COSMO:

  • Extpar: External parameters for the COSMO-grid (preprocessing)
  • int2lm: The interpolation software for the COSMO-model (preprocessing)
  • Processing Chain: Python workflow tool for COSMO

Documentation

COSMO documentation is available at:

Asynchronous IO for NetCDF - A Guide for an optimal model setup

Node configuration

When using asynchronous IO (Input/Output), the workload of the IO processors must be carefully balanced. Finding an optimal setup is not trivial, since the number of output namelists, the number of fields, and the number of IO processors vary between setups; no robust rule of thumb has been found, so some benchmark runs may be necessary. In particular, overloading the IO processors during the model cleanup at the end of a job leads to additional runtime. Note that online compression can sometimes significantly increase the time and resources required for IO.
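As a starting point for such benchmark runs, the number of dedicated IO processors is set in the namelists. The sketch below assumes the asynchronous NetCDF IO parameters num_asynio_comm and num_iope_percomm in the IOCTL group of INPUT_IO; verify the names, group, and suitable values against the IO documentation of your COSMO version:

 &IOCTL
   ! assumed parameters for asynchronous NetCDF IO - check your COSMO version
   num_asynio_comm  = 1,   ! number of asynchronous IO communicators
   num_iope_percomm = 2,   ! IO processors per communicator
 /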

A quick overview of the actual workload on the IO processors can be obtained from the COSMO logs by increasing the verbosity settings of the log (ldebug_io=.true., idbg_level=6):

Asyn-IO: block number xx was filled. Allocating a new block

The above message is printed each time a new buffer block is allocated, meaning that the compute PEs (Processing Elements) keep storing output data until the IO processors have written it to disk. If the number of allocated blocks increases during the simulation, i.e. more and more buffer blocks must be stored on the compute PEs after each output step, the IO processors are not writing data as fast as the model produces it.
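Both switches are ordinary namelist parameters. A minimal sketch of how to enable them is shown below; their placement in the RUNCTL group of INPUT_ORG is an assumption, so adjust it to where they live in your setup:

 &RUNCTL
   ! enable verbose IO logging for benchmark runs
   ldebug_io  = .TRUE.,
   idbg_level = 6,
 /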

When the IO processors write data to disk as fast as the model produces new output, the log message changes to:

 Asyn-IO: block number xx was filled, but the oldest one was released

This indicates that buffering and writing are in balance. In addition, each time an output file has been completely written to disk, the corresponding IO processor prints a message like this:

start_ionode:  Next asynchronous output will be done in step: xx

This message gives additional insight into how long it takes to write a file to disk. In addition, the Output section of YUTIMING contains a timer for the time spent waiting on asynchronous IO (asynIO wait):

Output                         0.19         0.43         0.86        72.65
  computations O               0.08         0.11         0.16        17.71
  meta  data                   0.00         0.00         0.00         0.00
  write data                   0.00         0.19         0.55        32.70
  gather data                  0.10         0.12         0.14        20.35
  asynIO wait                  0.00         0.78         4.31       130.84

This timer shows how long the compute PEs had to wait at the end of the simulation for the IO processors to finish writing data to disk; it should be as small as possible.

Zlib replacement for NetCDF compression

Online compression, enabled with the lcompress_netcdf=.true. parameter, uses the rather slow zlib by default. A speedup of roughly a factor of two can be achieved by using zlib-ng instead. To do so, COSMO must be built with the +zlib_ng variant, and the command spack load cosmo@master%pgi cosmo_target=gpu +zlib_ng must be executed before running the model. Unfortunately, the more convenient approach of linking the library via RPATH is not possible for this feature.
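Putting this together, a possible workflow looks as follows. The install spec mirrors the load command above and is an assumption; adapt it to your spack-c2sm setup:

spack install cosmo@master%pgi cosmo_target=gpu +zlib_ng   # build COSMO with the zlib-ng variant
spack load cosmo@master%pgi cosmo_target=gpu +zlib_ng      # load before every run
srun ...                                                   # launch COSMO as usual; lcompress_netcdf=.true. now uses zlib-ng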