CIME V6 Planning! #3886

@jgfouca

CIME V6 Planning

Let's use this issue to discuss CIME 6 plans. Once we have solidified a feature list, I can look at creating and organizing related issues. Feel free to assign anyone or edit this document. None of the content of this document is official until the team signs off; this is just me organizing my thoughts and getting the discussion started.

Background

In the last couple major CIME versions, the system transitioned from a loose collection of Perl and csh scripts, to a somewhat more centralized Perl system, and finally to a highly centralized Python system. We have made excellent progress in the areas of system cohesion, software design/engineering, performance, robustness, and testing while expanding capabilities. CIME also went from being closely coupled to CESM to being a more-independent infrastructure package supporting several climate models.

On the less-positive side, some of this progress came at the expense of added complexity and less transparency for users. I've had numerous interactions with users over the years where it was clear they were not happy with their CIME experience. This could be selection bias since happy users are usually quieter than unhappy ones, so it's hard to say if users are unhappier overall than they were before these centralization efforts.

Current Situation

CIME is in a decent state but I think with some renewed energy and the freedom to break backwards compatibility, we can make it even better.

On the E3SM side of things, we have new developer resources in @WesCoomber (.4 FTE) and @jasonb5 (1.0 FTE, full time!) which I want to take full advantage of. I want Wes and Jason to be excited to do CIME development, enthusiastic about (and contributing to) the vision for the project, and eventually stepping up to fill my role as E3SM's main CIME resource. Between me, Rob, Wes, and Jason, I think this is the most that E3SM has ever had invested in CIME and this is one of the main reasons for me creating this document.

In general, transformative changes to CIME have slowed down a bit in the last couple years. The reasons for this are many: the system is naturally maturing, the resources for core CIME development have been limited (especially on the E3SM side, where just me and @rljacob have had a long-term commitment to CIME), limitations in testing have made us afraid to break each other (more on this later), and changing behavior / breaking backwards compatibility is very painful in a production system.

The meta goals of V6 will be to accelerate the next rounds of transformative CIME changes while getting our new developers invested in CIME and improving the user experience.

Technical goals

Solidify role of namelists in the system and how model components should interact with them

We've had some discussion of this in the past here: #1278

Namelists are input files for the model execution so, in theory, we should be able to do nml generation once at the start of the RUN phase and that's it. What happens in practice is that namelists are generated during SETUP, BUILD, SUBMIT, and RUN phases. Back when generating namelists was computationally expensive, this was a significant performance bottleneck for the case control system. Now that it's pretty fast, the problem is more of an issue of technical debt and unnecessary complexity.

During my work on the build system, it quickly became clear that most of our major components (like eam (our atm) and elm (our lnd)) were using buildnml as a sort of pre-build configuration step. See this diagram:

[Diagram: before_cmake]

The components know that CIME promises to call buildnml before the BUILD phase, so they took the opportunity to have buildnml set up the key Filepath and CCSM_cppdefs files that the build system depends on, even though these files don't have anything to do with namelists. In a sense, buildnml became a catch-all for all pre-build component-specific setup. Adding to the problem is that, for our bigger components, the buildnml and configure (the script that generates Filepath and CCSM_cppdefs) scripts have become multi-thousand-line piles of Perl, so it's difficult to see exactly what they are doing. At a glance, our configure scripts do not seem to be accessing or modifying namelists directly, so they appear to be decouple-able from buildnml.

Proposal

At the very least, configure should be decoupled from buildnml and integrated into the BUILD phase since that's what it's for. On the E3SM side, we could even integrate it into our CMake system since that's what CMake is for, configuring your build. This should allow us to remove the buildnml calls from both the SETUP and BUILD phases. If any component is using namelists to store/maintain general case data, that is a violation of their contract with CIME and those instances should be immediately changed to use the env XML system instead. If any components need custom setup actions, we should provide an extension point for that in CIME that is not buildnml; something like $component/cime_config/setup.
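To make the extension-point idea concrete, here is a minimal sketch of how CIME might discover and run per-component setup hooks. Everything in it is hypothetical: the function names `find_setup_hooks` and `run_setup_hooks`, the assumption that the hook is a Python script, and the convention of passing the case root as the only argument are all made up for illustration and are not existing CIME behavior.

```python
import subprocess
import sys
from pathlib import Path


def find_setup_hooks(component_roots):
    """Return the setup scripts that exist under each component's cime_config dir."""
    hooks = []
    for root in component_roots:
        # Hypothetical convention: $component/cime_config/setup
        candidate = Path(root) / "cime_config" / "setup"
        if candidate.is_file():
            hooks.append(candidate)
    return hooks


def run_setup_hooks(component_roots, caseroot):
    """Run each hook with the case root as its only argument; return the hooks run."""
    ran = []
    for hook in find_setup_hooks(component_roots):
        # Assumes the hook is a Python script; running it through the current
        # interpreter means it does not need an executable bit.
        subprocess.run([sys.executable, str(hook), str(caseroot)], check=True)
        ran.append(hook)
    return ran
```

Components without a `setup` file are simply skipped, so adopting the hook would be opt-in per component, and anything that currently piggybacks on buildnml (like generating Filepath) would have an explicit home.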

Additional investigation is needed into the buildnml calls in SUBMIT and RUN. I think it's possible the one in SUBMIT can be removed without too much difficulty as well.

Build system

Some of this effort will likely be specific to E3SM only since we already diverged significantly from the classic CIME build system when we went to a CMake-based system two years ago.

Related issues:
#3446
#3341
#3287

I'd like to continue to reorganize and unify the build system around CMake. The first part of this will be refactoring how we handle sharedlibs, which currently require lots of special handling in CIME. The SHAREDLIB_BUILD phase of the case-control system iterates over a list of sharedlibs that it thinks the case needs and calls cime/src/build_scripts/buildlib.$libname. What happens then varies greatly between sharedlibs. Some leverage the classic CIME scripts/Tools/Makefile, some have CMake, some have their own Makefiles. It would be nice if we could have all the sharedlib builds use CMake under the hood. That way, we could begin to unwind some of the complexity of our various systems for managing compilation settings and just put all that stuff in CMake directly. There's a significant amount of complexity in the code that generates Macro files because it needs to support multiple build system languages (Make and CMake) and the language-neutral config_compilers.xml. If the build system were fully CMake-ified, we could potentially just write all the compiler/flag stuff in hierarchical CMake cache files. Ideally, I think all the info in the Depends files, config_compilers.xml, and hardcoded stuff in CMakeLists.txt could be nicely encapsulated in a CMake cache file system. This would remove layers of CIME magic between users and their compilation settings.

Another very important topic is thinking about how to reduce the amount of building that goes on when doing test suites. The default behavior is to do a full build for every case. E3SM test suites offer the ability to mark a suite as shared build but this feature is not yet widely used and is use-at-your-own-risk. It would be interesting to see if the sharedlib system could be expanded to work for components.

Proposal

  1. A deep-dive into CMake-ifying the sharedlibs similar to the CMake deep-dive that was done for the components two years ago
  2. A consolidation of flag/compiler settings into a single system, preferably a hierarchical CMake cache system.
  3. Sharedlibs are not currently shared across cases for E3SM. That needs to change.
  4. Investigate how to further reduce amount of time spent building cases when running test suites
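As a rough illustration of what "hierarchical" could mean for item 2, the sketch below merges layers of settings from least to most specific and renders the result as lines for a CMake initial-cache script (the kind of file passed to `cmake -C`). The layer names, the flag values, and the helpers `merge_layers` and `to_cmake_cache` are all hypothetical. In a real system the layering would more likely be done by CMake files including one another; Python is used here only to show the precedence rule.

```python
def merge_layers(*layers):
    """Merge setting layers in order; later (more specific) layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def to_cmake_cache(settings):
    """Render settings as lines suitable for a CMake initial-cache script."""
    return "\n".join(
        f'set({name} "{value}" CACHE STRING "")'
        for name, value in sorted(settings.items())
    )


# Hypothetical layers, ordered from least to most specific.
common = {"CMAKE_Fortran_FLAGS": "-O2"}
gnu = {"CMAKE_Fortran_FLAGS": "-O2 -ffree-line-length-none", "CMAKE_C_FLAGS": "-O2"}
machine = {"CMAKE_C_FLAGS": "-O2 -march=native"}

cache_text = to_cmake_cache(merge_layers(common, gnu, machine))
print(cache_text)
```

The point of the sketch is that a user could read the final flags in one generated file and trace each one to exactly one layer, rather than working backwards through Macro generation.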

XML env database

Relevant issues:
#2161
#3338
#2965

We've made good progress on CIME's env XML "database", especially in the areas of robustness, performance/caching, and encapsulating python's XML ElementTree. I think more progress can be made in formalizing the guarantees/invariants of the system, further standardization of syntax across env xml files, expanding and standardizing the attribute selector concept, and all-around simplification.

Even as co-author of the XML system, I often get a bit lost in our XML code because the execution path, even for fairly simple actions like get_value, is so complex. I'd like for a developer to do a deep dive into CIME/XML/*.py and try to find sources of complexity and potential remedies.

Proposal

  1. Env XML database system guarantees, invariants, restrictions, etc. are well-documented
  2. Users should be able to use attribute selectors on any field using any previously defined field as a selector
  3. Deep-dive for complexity reduction in implementation of the system
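For item 2, one possible reading of a generalized attribute selector is "pick the most specific value whose attributes all match the current case". The sketch below shows that rule with Python's standard `xml.etree.ElementTree`. The XML snippet, the attribute names, and the scoring rule (most matching attributes wins) are assumptions made for illustration; this is not how CIME's XML classes currently resolve values.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry; element and attribute names are illustrative only.
XML = """
<entry id="MAX_TASKS_PER_NODE">
  <value>32</value>
  <value mach="mymachine">64</value>
  <value mach="mymachine" compiler="gnu">128</value>
</entry>
""".strip()


def select_value(entry, context):
    """Return the text of the most specific <value> whose attributes all match context."""
    best, best_score = None, -1
    for value in entry.findall("value"):
        attrs = value.attrib
        # A value qualifies only if every selector attribute matches the context.
        if all(context.get(key) == val for key, val in attrs.items()):
            # Assumed tie-break rule: more selector attributes means more specific.
            if len(attrs) > best_score:
                best, best_score = value.text, len(attrs)
    return best


entry = ET.fromstring(XML)
print(select_value(entry, {"mach": "mymachine", "compiler": "gnu"}))
```

Writing the rule down this explicitly is also the kind of guarantee item 1 asks for: a selector with no attributes is the default, and a value with a non-matching attribute is never chosen.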

Testing

Relevant issues:
#2521

The problem is nicely described in that issue, so I won't repeat too much here. My hunch is that the best thing to do would be to expand upon the GitHub CI system to cover more system, model, and compiler combinations. We've had great results doing this for my other project SCREAM using Jenkins and autotester.

Test scheduling

We currently have create_test/test_scheduler.py that works fine if you are on the machine where you want to run. Can things be improved by using CWL?

Code / repo layout

Relevant issues:
#3432
#3393

Drop support for Python 2 and move to requiring a newish version of Python 3. I'd like to be able to use Python's latest string formatting syntax, pathlib, and modern concurrency libraries.
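A small sketch of the three Python 3 features mentioned above (f-strings, `pathlib`, and `concurrent.futures`) working together. The function names and the example paths are made up for illustration; nothing here is existing CIME code.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def describe(path):
    """Summarize a path using an f-string and pathlib accessors."""
    p = Path(path)
    return f"{p.name} (suffix={p.suffix or 'none'}, parent={p.parent.name})"


def describe_all(paths, max_workers=4):
    """Map describe over paths concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(describe, paths))


# Hypothetical case files; the paths do not need to exist for this demo.
for line in describe_all(["case/env_run.xml", "case/Buildconf/Filepath"]):
    print(line)
```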

Complete the separation of fortran science packages (cpl/drv data models, etc) by model. I think we have an issue for this but I couldn't find it.

It would be nice to be able to do model-specific extensions to CIME without touching the CIME repo. Things like model-specific provenance should be modifiable from the host repo.

Finally, I got beat up pretty badly in an E3SM all-hands a few years ago for the non-standard, non-"pythonic" organization of CIME's python code tree under CIME/scripts. This was causing problems for developers who use python IDEs and confusion with importing, PYTHONPATH, etc. This should be pretty easy to clean up, so I think it's worth addressing even if most of us are using text editors to develop CIME. Potentially even look into integrating CIME with the Python ecosystem (pip, Anaconda, etc.).

CIME development process

As I look through our open issues, I see lots of old issues falling through the cracks, including bug reports and other items that look high priority. We occasionally go through open issues during our Wed meetings but that is time consuming and not much fun. I don't have any concrete proposal to deal with this, but it seems like we need some additional mechanisms for organizing, prioritizing, and shepherding tickets. It would be ideal if we could achieve this without additional meeting overhead.
