Skip to content

ICON Meeting 2024/2

20 June 2024, ETH Zurich (L 17.1) and via Zoom

Participants (on-site)

Michael Jähn (MJ), Matthieu Leclair (ML), Annika Lauber (AL), Jonas Jucker (JJ), Fulden Batibeniz (FB), Emmanuele Russo (ER), Guillaume Bertolli (GB), Brigitta Goger (BG), Clarissa Kroll (CK), Doris Folini (DF)

Participants (Zoom)

Sylvaine Ferrachat (SF), Fabian Gessler (FG), Lukas Jansing (LJ), Mikael Stellio (MS), Nander Wever (NW), Jacopo Canton (JC), Christian Steger (CS), Alina Yapparova (AY), David Leutwyler (DL), Sven Kotlarski (SK), Dominik Brunner (DB), Arash Hamzehloo (AH), Will Sawyer (WS)

Minutes by Annika Lauber

Reports

C2SM (Michael Jähn, Annika Lauber, Matthieu Leclair, Jonas Jucker)

MJ welcomes everyone and shares the latest news about ICON from C2SM .

ML reporting on Alps. CK asks how to do benchmarking. Nobody really knows. ML answers so far the only piece of information available is WS's benchmark (presentation to come).

Guillaume Bertolli

Over the last three years, GB has been focused on building a machine learning implementation for ECRAD. Next, instead of just predicting the contribution of radiation, the plan is to predict the sum of all physics contributing to tendencies. The dataset needed for this next phase has not yet been constructed.

Emmanuele Russo

ER shared that for ICON-CLM, they finally found the optimal calibration for the EURO-CORDEX domain. The model performance is pretty good, especially compared to COSMO-CLM. They applied several model developments, including a routine for transient aerosol.

AL asked if it will run on a CPU as the transient aerosol datasets is not yet ported to GPU.

ER confirmed that it will be on a CPU for now but asks when the port is expected to be ready.

AL answers that it is really hard to predict as it has to be done from scratch now.

Brigitta Goger

BG reported that she is using ICON below 1km. There has been a paper accepted using the limited area setup over Austria. Regarding the hecometric range, BG mentioned that there might be an issue with the Smagorinsky model, as there is a clear bias, at least.

As a side project with Nadja Omanovic, they explored the representation of clouds at 1km and 65m resolutions, using both 1-moment and 2-moment schemes. They concluded that to represent clouds resolution is more important than on the microphysics scheme.

Clarissa Kroll

CK reported that they are using the ICON-Seamless/ICON-xpp (extended predictions and projections) setup and plan to merge it with ICON-EXCLAIM. Additionally, they are working on a radiative dumping option.

Doris Folini

DF is just here to inform herself on ICON. She was part of Christoph Schär's group, which no longer exists.

Fulden Batıbeniz

FB reported that her goal is to use the ICON-CLM configuration, hopefully on GPU. However, they need to do benchmarking, which is not possible yet. FB's project focuses on seasonal predictions.

Nander Wever

NW reported that he works at SLF, with his main task being snowpack development. They aim to make it operational for the next winter season.

Sylvaine Ferrachat

SF reported that there has been no new development on ICON-HAM since the last meeting. However, they plan to soon set up ICON-HAM as a global simulation with climatology at the R2B8 resolution.

Jacopo Canton

JC reported that he is mostly working with ICON4py and using ICON primarily for validation. He is attending the meeting to stay up to date.

Sven Kotlarski

SK, from the climate department of MCH, mentioned their efforts to link ICON-CLM development to climate scenarios. They understand that this integration will take more time, but fortunately, the scenarios do not have a real dependency. They expressed the desire to have something available soon. SK clarified that he does not engage in coding for ICON and is not familiar with the technical details.

David Leutwyler

DL supports Team X's campaign in Austria and is working on writing proposals for CSCS.

Christian Steger

CS, from the numerical weather forecasting team, is currently investigating the topographic setting for surface radiation. They are exploring which parameterizations would be appropriate to port. CS asks people to let him know if they are working with complex terrain at high resolutions.

Mikael Stellio

MS is currently focused on the GPU side of ICON-HAM and is working on optimizing tracer advection.

Alina Yapparova

AY is part of the GLORI-A project and is involved in simulating 3D radar data. They are currently running ICON with Emvorado, and recently fixed a bug in Emvorado.

Lukas Jansing

LJ started working in MCH-SEN last September, focusing on preparing the model for simulations. They are still in the process of finishing product migration and are experimenting with finding a better configuration for production. LJ noted there is still a temperature bias in valleys that needs addressing. Additionally, they have begun working on GLORI-A, conducting the first set of 500m simulations, though no significant results have emerged thus far.

Fabian Gessler

FG spends most of his time getting ICON operational on GPUs and is currently focused on achieving further performance improvements. They are still using more nodes than preferred. FG will probably be contributing to the GLORI-A project in the future.

Arash Hamzehloo

AH is currently porting components of ART to GPUs and testing new code using CO2 over Zurich.

Dominik Brunner

DB mentioned that his group is working a lot on greenhouse gas modeling and has found GPU ports to be crucial. They are currently transitioning to the GPU version and also engage in air pollution modeling. ICON-ART includes chemistry capabilities, although they haven't yet implemented all required processes; a postdoc is dedicated to this task.

They are also examining the impact of net-zero scenarios on air pollution. Recently, they conducted a two-day training course for ICON-ART, including a setup exercise for air pollution modeling on Levante.

Will Saywer presenting "ICON on Alps: How is it performing?"

Discussion

ML asks how WS's benchmarking will be able to be used.

WS responds: It can be used, but not exactly the same.


ML asks about achieving higher resolution with less use of the power cabinet.

WS believes they have identified the issue and will rerun the benchmarks. R2B9 shows improvement but still doesn't match R2B7.

DL asks if WS can correlate the different steps. There are various steps in R2B7.

WS explains that initialization is a key step, and in the first timestep, reading boundary conditions takes a significant amount of time. This process is much quicker for lower resolutions.


DL remarks that it would be valuable to have these results documented somewhere on the wiki. He also speculates that WS has many tricks for conducting benchmarks and asks if WS could share them.

WS responds that he will definitely share those tricks. He mentions there's no decision yet on enabling low-noise mode but assures that there will be at least one partition available with low-noise mode.

DL then mentions he's working on a recommended configuration. He asks if not using just-fit mode instantly drops below 50%.

WS clarifies that it doesn't. Personally, WS disagrees with using just-fit mode and suggests running on the best-fit configuration instead. He points out that the machine is so powerful that it will still yield a good time-to-solution.


FB mentions she is writing a proposal using ICON-CLM, which is still in the development phase for GPU porting. They tested it on Daint because it doesn't run on ALPS. Maria Grazia mentioned that scaling on Daint will likely be rejected.

WS responds that Maria Grazia mentioned they can apply for an account to conduct early testing.

FB adds that ICON-CLM is not finalized yet and hasn't been tested on pre-ALPS systems.

MJ notes they have a similar setup and could potentially run it on ALPS. They ask if FB can use WS's benchmarking.

WS confirms FB can use their benchmarking, but it's a different configuration. WS isn't sure if this could be a reason for rejection. WS offers asking Maria Grazia if that would help.

ML suggests that benchmarking for ICON-NWP could also be useful for ICON-CLM. They ask if WS could share the namelist.

WS agrees to share the namelist and mentions that he can also provide access to weak scalability tests for certain people.


DB asks how much faster a given configuration runs on Alps compared to Daint and if one can estimate from that comparison.

WS responds that If you run GPU-to-GPU configuration, Alps is roughly a factor of 9 faster. However, this approach wouldn't fully utilize memory, so you would use fewer nodes.


JC asks about the power cap per cabinet.

WS corrects that it's actually per GPU. In terms of energy-to-solution, you can make it run faster but not in a linear way. Grace CPU is very powerful and shares memory with GPU. Some people may want to run components on CPU, but this gives priority to the CPU, potentially slowing down the GPU. WS notes they need to caution users because running large components on the CPU might not perform well. Ocean-atmosphere coupling is an example of such a component. CSCS is advocating for a GPU port to address these issues.


ML asks how reducing the cap would affect scaling.

WS responds that it would likely improve scaling, but emphasizes that setting the power cap involves political decisions.

ML queries if the power cap could vary across different clusters.

WS replies that he is unsure but can inquire about it.


DF mentions that all tests were conducted within one cabinet and asks if it's expected for individual jobs to run within a single cabinet.

WS responds by mentioning they ran other tests (R2B11) that did not fit into one cabinet, noting it had minimal impact.

ML comments that a virtual cluster cap may not work because it's virtual.

WS admits he's uncertain about how elastic it is and plans to revisit the issue.

WS states he will discuss what information can be shared on the wiki and will inquire with Maria Grazia about whether benchmarks from Daint are sufficient.