I have some concerns with the logic in #3917 (currently in create_clone, but planned to be introduced into get_src_root as well) for determining whether you're in a standalone cime checkout. It seems quite possible that the share directory would exist within cime because someone ran manage_externals from within cime, either by accident or on purpose, even though the user's intent is to use a srcroot one level up from cime.
I see two possible solutions:
Using safer auto-detection heuristics
If we're going to stick with some auto-detection heuristics to determine if you're in a standalone cime checkout, then I would want this logic to assume, as reliably as possible, that you're not in a standalone cime checkout in ambiguous cases, since that is the situation for most/all users (non-core-developers). The specific scenario that @gold2718 and I are concerned about is one where someone has run manage_externals/checkout_externals both at the top level (in the top-level directory of CESM, CAM, CTSM, etc.) and (deliberately or by accident) within cime. The current logic in #3917 would assume that you're in a standalone cime checkout in this ambiguous case, which I feel could cause problems.
In today's discussion, I argued against basing heuristics on what is present in the parent directory of cime, but as I think about it more, I actually think it could be a good idea to look there. But I would suggest that, if we do that, we look for a specially-named file like I_AM_A_MODEL_CONTAINING_CIME (maybe a hidden file, .I_AM_A_MODEL_CONTAINING_CIME, though someone can probably come up with a better name than that). Then models containing cime (CESM, CAM, CTSM, etc.) could have an empty file with that name at the top level. The heuristics would then assume you are not in a standalone cime checkout if they find that file in the parent directory of cime.
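To make the marker-file idea concrete, here is a minimal sketch. The function name `model_marker_present` and the constant `MARKER_FILENAME` are hypothetical and not part of cime; the file name is just the placeholder suggested above.

```python
import os

# Hypothetical marker name (the placeholder suggested above; a better name may be chosen).
MARKER_FILENAME = ".I_AM_A_MODEL_CONTAINING_CIME"


def model_marker_present(cimeroot, marker=MARKER_FILENAME):
    """Return True if the parent directory of cimeroot contains the marker file.

    This is an illustrative helper, not an existing cime function.
    """
    # abspath also strips any trailing separator, so dirname gives the real parent.
    parent = os.path.dirname(os.path.abspath(cimeroot))
    return os.path.isfile(os.path.join(parent, marker))
```

If this returns True, the heuristics would treat the checkout as not standalone, regardless of which directories happen to exist inside cime. If it returns False, they would fall through to whatever other checks are in place.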
Assuming you are not in a standalone cime checkout unless you say you are
Alternatively, we could always assume that you are not in a standalone cime checkout unless the user explicitly says so through a command-line flag to create_newcase, create_clone, scripts_regression_tests (which would pass it along as needed), and any other scripts that need it, though there may not be any. create_newcase already lets you specify --srcroot; IIRC, @fischer-ncar ran into one or two issues that would still need to be solved with that (which led him to introduce the call to get_src_root in generic_xml.py, for example), but this seemed to me like a fundamentally solvable problem.
The advantages of this approach are:
- We wouldn't run the risk of assuming a standalone cime checkout when in fact the user is intending to use cime within another model – which is the standard mode of operation for typical users.
- In the scenario where we have deliberately checked out externals both within cime and at the higher level, this would allow us to manually select which set of externals we want to use for a given test. (Though we could also achieve the same result by combining both ideas: i.e., using heuristics but letting you override those heuristics.)
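A rough sketch of how the explicit choice and the optional heuristics could combine. The function `choose_srcroot` and its parameters are hypothetical names for illustration. It assumes srcroot is cime itself in a standalone checkout and one level up from cime otherwise.

```python
import os


def choose_srcroot(cimeroot, standalone=None, heuristic=None):
    """Pick a srcroot for the given cimeroot (illustrative only).

    standalone: True/False if the user said so explicitly, None if unspecified.
    heuristic:  optional callable taking cimeroot and returning True for a
                standalone checkout; consulted only when standalone is None.
    """
    if standalone is None:
        # Without an explicit choice (and without heuristics), assume the
        # common case: cime lives inside another model.
        standalone = heuristic(cimeroot) if heuristic is not None else False
    cimeroot = os.path.abspath(cimeroot)
    return cimeroot if standalone else os.path.dirname(cimeroot)
```

With `heuristic=None` this is the pure "not standalone unless you say you are" behavior. Passing a heuristic gives the combined approach, where an explicit user choice still wins.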
Here were my thoughts from about 6 weeks ago:
Maybe you've already seen this, but in case not: I poked around for a few minutes and found this:
https://github.com/esmci/cime/blob/master/scripts/create_newcase#L101-L108
However, it looks like there isn't a corresponding --srcroot argument to create_test. I'm thinking that, if you added a --srcroot argument to create_test then that would get you part of the way.
But create_clone would also be a problem, since some tests use it:
https://github.com/esmci/cime/blob/master/scripts/lib/CIME/case/case_clone.py#L42
I'm thinking there are actually a few problems there:
(1) For user use, create_clone should have the same srcroot logic as create_newcase, with the ability to override it just like in create_newcase. (This isn't important for getting your testing to work, but would be a good thing in general.)
(2) Maybe for the purposes of system tests, the create_clone function needs an option like keep_original_srcroot; when called from the system tests, that should be set to true, in which case the clone's SRCROOT is set equal to the original case's SRCROOT.
With those things in place, I'm thinking you'd be able to get this to work if you specify '--srcroot ..' (assuming you're running this from within cime/scripts) to the create_test command.
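Points (1) and (2) could be sketched roughly as follows. This is not the actual create_clone code. The function `clone_srcroot` and all of its parameters are hypothetical, and treating the two options as mutually exclusive is my assumption.

```python
def clone_srcroot(original_srcroot, default_srcroot,
                  user_srcroot=None, keep_original_srcroot=False):
    """Decide which SRCROOT a cloned case should get (illustrative only).

    original_srcroot:      SRCROOT of the case being cloned
    default_srcroot:       what the usual srcroot logic would pick
    user_srcroot:          explicit override, e.g. from a --srcroot argument
    keep_original_srcroot: set by system tests to reuse the original SRCROOT
    """
    if keep_original_srcroot and user_srcroot is not None:
        # Assumption: asking for both is treated as a usage error.
        raise ValueError("specify either user_srcroot or keep_original_srcroot, not both")
    if keep_original_srcroot:
        return original_srcroot
    if user_srcroot is not None:
        return user_srcroot
    return default_srcroot
```

System tests would call this with `keep_original_srcroot=True`, so a clone made during a test inherits whatever srcroot was given to create_test.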
In a discussion today, @jedwards4b raised concerns about allowing any arbitrary --srcroot. We could deal with this by making this argument more specific, like --srcroot-is-cimeroot. However, I'm not sure why this would be more of a problem for create_clone than for create_newcase. It feels to me like we should support the same argument in both: either a general --srcroot or a more limited --srcroot-is-cimeroot.
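One way to keep the two scripts consistent would be a shared helper that adds whichever form of the argument we settle on. The sketch below uses plain argparse. The helper `add_srcroot_argument` and the help strings are hypothetical, not the existing create_newcase code.

```python
import argparse


def add_srcroot_argument(parser, allow_arbitrary=True):
    """Add the srcroot option to a script's parser (illustrative only).

    allow_arbitrary=True  -> general --srcroot PATH
    allow_arbitrary=False -> limited --srcroot-is-cimeroot switch
    """
    if allow_arbitrary:
        parser.add_argument("--srcroot", default=None,
                            help="Path to use as the source root")
    else:
        parser.add_argument("--srcroot-is-cimeroot", action="store_true",
                            help="Treat cime itself as the source root")


# Both scripts would call the same helper, so they cannot drift apart.
newcase_parser = argparse.ArgumentParser(prog="create_newcase")
clone_parser = argparse.ArgumentParser(prog="create_clone")
for p in (newcase_parser, clone_parser):
    add_srcroot_argument(p, allow_arbitrary=False)
```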