CLI sandboxing evolves and supersedes the Nu Sync design in #913. It's largely based on the same ideas, but shaped towards the new openfn sandboxing.
May need breaking up into an epic, or I may just chew through the work (the sandbox epic has been too hard to author upfront)
Design
Project Files
As in new sync, each in-app project is saved to a yaml file. Confusingly this is sort of a hybrid between a project.yaml and state.json, and the format is totally arbitrary (I think we'll prefer yaml).
A project file can be local or synced to an app. If it's synced to an app, it has openfn keys on various elements, which include app-only state and configuration, like UUIDs.
Project files are named like <name>@<domain>, where the name is a local name (defaults to main). Each sandbox is a named variant of the project. Most projects will only have one domain, which is fine.
The project file contains a complete record of that app project: the settings, workflows, UUIDs (if present), credentials (copied if marked as dev, else just a UUID).
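A rough sketch of what this could look like in code. Everything here is an assumption for illustration: the type and field names (`ProjectFile`, `Credential`, `parseProjectFileName`), the optional `.yaml` extension, and the exact placement of the `openfn` keys are not settled.

```typescript
// Hypothetical shapes: none of these field names are final.
type OpenfnMeta = { uuid?: string };

type Credential =
  | { name: string; dev: true; body: Record<string, unknown> } // dev creds are copied inline
  | { name: string; dev: false; openfn: { uuid: string } }; // private creds are a UUID only

type ProjectFile = {
  name: string;
  domain?: string; // absent for a purely local project
  openfn?: OpenfnMeta; // only present when synced to an app
  settings: Record<string, unknown>;
  workflows: Array<{ id: string; openfn?: OpenfnMeta; steps: unknown[] }>;
  credentials: Credential[];
};

// Split "<name>@<domain>" into its parts; the name defaults to "main".
// Stripping a .yaml/.yml extension is an assumption about how files are stored.
function parseProjectFileName(fileName: string): { name: string; domain?: string } {
  const base = fileName.replace(/\.ya?ml$/, "");
  const at = base.indexOf("@");
  if (at === -1) return { name: base || "main" };
  return { name: base.slice(0, at) || "main", domain: base.slice(at + 1) };
}
```

So `staging@app.openfn.org` would be read as the `staging` sandbox on the `app.openfn.org` domain, and a file with no `@` is treated as local.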
Checkout
A project can be "checked out" or "expanded" onto disk, which means extracting the workflows and step expressions into nicely organised files which can be run through the CLI.
When you run openfn pull <projectname>, that project will be instantly checked out to disk.
When checking out a project, the CLI will first attempt to write any on-disk diffs to the project file. In the event of a conflict, an error will be raised.
It is possible for checked out files to be inconsistent with the project.yaml state - ie, I check out a project and edit a workflow.
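To make the "on-disk diff" idea concrete, here is a minimal sketch of how the CLI might work out which steps have drifted from the project file. It assumes expressions are compared as plain strings keyed by step id; `Expressions` and `diffExpressions` are hypothetical names.

```typescript
// Hypothetical in-memory view of a workflow's step expressions, keyed by step id.
type Expressions = Record<string, string>;

// Return the ids of steps whose on-disk expression differs from the project file,
// including steps that exist on only one side.
function diffExpressions(projectFile: Expressions, onDisk: Expressions): string[] {
  const ids: string[] = [];
  Object.keys(projectFile)
    .concat(Object.keys(onDisk))
    .forEach((id) => {
      if (ids.indexOf(id) === -1 && projectFile[id] !== onDisk[id]) ids.push(id);
    });
  return ids.sort();
}
```

An empty result would mean the checked out files are consistent with the project file and a checkout can go ahead without writing anything back first.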
Merging
- merge when pushing local changes into an existing project file (ie, track UUIDs properly)
- merge any two project files and show the result on disk. Top level command
- need to resolve credentials here
Syncing & Versioning
The project file should always sync with the app.
When pulling, we first update the project file, and then expand it onto disk.
When deploying, we create a temporary new project file in memory, push that to the app, and if successful write the returned project state to disk (including any changes and new version history).
Each workflow has a version history, maintained by the CLI and the app in the same way. The version is a string hash of the workflow contents. It is unique and not reversible.
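A sketch of how a content hash could be generated so that the CLI and the app agree on it. The key point is that the workflow is serialised canonically (sorted keys) before hashing. The hash function below is a simple non-cryptographic stand-in (a djb2 variant) used only to keep the example self-contained; it does not give the uniqueness described above, and a real implementation would presumably use something like SHA-256. `canonicalize` and `versionHash` are hypothetical names.

```typescript
// Serialise a value with object keys sorted, so key order never changes the hash.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return "[" + value.map((v) => canonicalize(v)).join(",") + "]";
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Stand-in hash (djb2 variant). NOT collision resistant: illustration only.
function versionHash(workflow: unknown): string {
  const text = canonicalize(workflow);
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h << 5) + h + text.charCodeAt(i)) >>> 0; // keep within unsigned 32 bits
  }
  return h.toString(16);
}
```

Which fields count as "workflow contents" (for example, whether UUIDs are included) is left open here and would need to match on both sides.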
Each time the app makes a change* the history is updated. Consecutive changes from the CLI (or app) will squash histories together, reducing the history length.
* Well, conceptually, not literally. The only time the CLI will actually write a new version history is on checkout (stash).
The app (and maybe CLI) will have an algorithm to prune histories, removing redundant versions.
When pushing to the app, the app will decide whether the incoming workflow versions are compatible. If so it'll write the change, if not it'll error. Writes can be forced, which overwrites conflicts.
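What "compatible" means isn't defined yet. One possible rule, sketched below purely as an assumption: the incoming workflow is compatible if its history contains the app's current head version, i.e. the change was made on top of the latest app state. `History`, `isCompatible` and `push` are hypothetical names, and histories are assumed to be ordered oldest first.

```typescript
// Histories are assumed to be oldest-first lists of version hashes.
type History = string[];

// Assumed rule: compatible if the incoming history includes the app's head version.
function isCompatible(appHistory: History, incoming: History): boolean {
  if (appHistory.length === 0) return true; // nothing to conflict with
  const head = appHistory[appHistory.length - 1];
  return incoming.indexOf(head) !== -1;
}

// Accept the incoming history, or throw unless the write is forced.
function push(appHistory: History, incoming: History, force = false): History {
  if (!force && !isCompatible(appHistory, incoming)) {
    throw new Error("Incompatible workflow versions; use force to overwrite");
  }
  return incoming.slice();
}
```

Under this rule a push of `[a, b, c]` onto an app at `[a, b]` succeeds, while the same push onto an app that has moved to `[a, b, x]` errors unless forced. Squashing and pruning would need to preserve enough history for this check to keep working.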
The inverse is true when pulling: if pulling a workflow that has incompatible changes, the CLI will throw an error. Difficulty: how do we pull and merge those changes? I am increasingly thinking the CLI needs a local conflict resolution strategy. Maybe everything on disk stays the same, but any conflicting files from the checkout will create a .conflict file. The user must then resolve all the conflict files and then carry on.
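A minimal sketch of that conflict-file idea, assuming the CLI can see three views of each file: the last checked out version (base), the current contents on disk (local), and the incoming version. A file only conflicts if it was edited locally and the incoming version differs from that edit. `Files` and `planCheckout` are hypothetical names.

```typescript
type Files = Record<string, string>; // path -> contents

// Plan which files a checkout would write. Locally edited files are never
// overwritten: the incoming version goes to "<path>.conflict" instead.
function planCheckout(base: Files, local: Files, incoming: Files): Files {
  const writes: Files = {};
  Object.keys(incoming).forEach((path) => {
    const editedLocally = path in local && local[path] !== base[path];
    const conflict = editedLocally && local[path] !== incoming[path];
    writes[conflict ? path + ".conflict" : path] = incoming[path];
  });
  return writes;
}
```

Files deleted remotely, or deleted locally, aren't handled here and would need their own rule.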
Credentials
The idea is that dev credentials will be included inline in the project yaml, but private credentials will just have a UUID. Private credentials won't run locally until we work out a way to safely attach them to a local run. This is probably fine - in a sandbox we only need local creds and they can just run inline.
This is largely driven by the provisioner API, I think. I suppose the CLI must also upload credentials in much the same way.
In fact, how would one maintain a private credential in the CLI? You'd have to store it on disk, outside of git, but something in the workflow.yaml must reference it.
Perhaps they can be saved by UUID and we can have some convention to store them in a gitignored folder
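Something like the following, where a credential reference either carries its body inline (dev) or points at a file named by UUID. The folder name `.openfn/credentials`, the `.json` extension and the names `CredentialRef` and `locateCredential` are all made up for illustration.

```typescript
type CredentialRef =
  | { name: string; body: Record<string, unknown> } // dev credential, stored inline
  | { name: string; uuid: string }; // private credential, UUID only

// Hypothetical convention: private credential bodies live in a gitignored folder.
const CREDENTIALS_DIR = ".openfn/credentials";

// Describe where a local run would find the credential body.
function locateCredential(ref: CredentialRef): { inline: true } | { inline: false; path: string } {
  if ("body" in ref) return { inline: true };
  return { inline: false, path: CREDENTIALS_DIR + "/" + ref.uuid + ".json" };
}
```

The workflow.yaml then only ever references the credential by name or UUID, and the secret itself never needs to be committed.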
If a dev cred is changed locally, should we propagate this change up to the app? I kinda don't think so tbh - you'll have to go into the app and change it manually. Credentials are special, even dev ones. We can warn about this case.
Provisioner
All this needs to run against a v2 provisioner API, I think, which has different rules and returns different structures.
That assumes it's viable to maintain a v1 and v2 provisioner API on the app at the same time. If not, old CLIs will start throwing errors.
We also need a means to standardise the expected structure of the state object
Github Sync
Separate issue. The v2 github sync is designed to sync a whole repo against a whole app.
Each branch will have a checked-out project or sandbox. Branches can be merged, which merges the workflows naturally, and can then be deployed.
Commands
To support sandboxing the CLI needs the following commands:
$ openfn pull [name] [domain] --no-checkout
By default pull will checkout the project to disk, but pass --no-checkout to skip this. This is like a git fetch and just updates the local project file.
$ openfn deploy [name] [domain]
This will deploy whatever is currently checked out to the app. Name and domain are optional if they are unique (if you only have one domain, like most projects, you never need to specify it).
Note: would main@app.openfn.org be a better convention here? Now domain isn't even an option, and what you deploy better matches a file.
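The "optional if unique" rule could be sketched like this: filter the known project files by whatever was passed, and only proceed if exactly one is left. `ProjectRef` and `resolveProject` are hypothetical names.

```typescript
type ProjectRef = { name: string; domain: string };

// Pick the project file a command refers to. Name and domain may be omitted
// when they are unambiguous among the known project files.
function resolveProject(files: ProjectRef[], name?: string, domain?: string): ProjectRef {
  const matches = files.filter(
    (f) => (name === undefined || f.name === name) && (domain === undefined || f.domain === domain)
  );
  if (matches.length === 1) return matches[0];
  if (matches.length === 0) throw new Error("No matching project file");
  throw new Error("Ambiguous project: specify a name and/or domain");
}
```

With a single `main@app.openfn.org` file, `openfn deploy` needs no arguments; once a sandbox exists on the same domain, at least the name is required.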
$ openfn fork
Create a new sandbox from the currently checked out project. Should we include UUIDs or let lightning generate them?
$ openfn checkout
Checkout a project file onto the file system
$ openfn merge
Behaves like the app merge: this will push the workflows from one project into another. Essentially it's a simple replacement of the workflows.
Shows a picklist of workflows to merge, which the user can select. Only changed workflows can be picked.
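As a sketch of the replacement semantics: work out which workflows differ (new, or with a different version hash), then swap the selected ones into the target. The `Workflow` and `Project` shapes and the function names are assumptions, and UUID handling is ignored here.

```typescript
type Workflow = { id: string; version: string; steps: unknown[] };
type Project = { name: string; workflows: Workflow[] };

// Ids of workflows in source that are new to, or differ from, the target.
// These are the only candidates for the picklist.
function changedWorkflows(source: Project, target: Project): string[] {
  return source.workflows
    .filter((s) => {
      const t = target.workflows.filter((w) => w.id === s.id)[0];
      return !t || t.version !== s.version;
    })
    .map((s) => s.id);
}

// Replace the selected workflows in target with the versions from source.
function mergeWorkflows(source: Project, target: Project, selected: string[]): Project {
  const picked = source.workflows.filter((w) => selected.indexOf(w.id) !== -1);
  const kept = target.workflows.filter((w) => selected.indexOf(w.id) === -1);
  return { name: target.name, workflows: kept.concat(picked) };
}
```

Neither input is mutated, so the result can be shown on disk before anything is written to a project file.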
If a UUID exists in the project file, the CLI needs to decide whether to preserve or drop it. We should come up with a heuristic for this, as it will affect the audit trail for the workflow in the app.
Like if the id doesn't change, we preserve the UUID. Simple. But if the local id changes, we need to make a decision about whether we treat this as a new node or a changed one. We can probably say things like: if the same number of nodes are added as removed, check similarity based on adaptor and expression. If we think we can match nodes, then keep the UUID. We may also be able to infer based on structure: if the id changes but the parent is the same, we can consider it the same.
At the end of the day, if you change every property of every node, the CLI would just delete all nodes and rebuild them, and in the app all nodes would be new, with a weaker audit trail.
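Here's one way the heuristic above could be sketched. It is only a first pass under stated assumptions: similarity means an exact match on adaptor and expression, the parent check is a fallback, and each old step can donate its UUID at most once. The `Step` shape and `preserveUuids` are hypothetical.

```typescript
type Step = { id: string; adaptor: string; expression: string; parent?: string; uuid?: string };

// Carry UUIDs from the old steps over to the new ones where we think they match.
function preserveUuids(oldSteps: Step[], newSteps: Step[]): Step[] {
  const removed = oldSteps.filter((o) => !newSteps.some((n) => n.id === o.id));
  const added = newSteps.filter((n) => !oldSteps.some((o) => o.id === n.id));
  // Only try to pair up renamed steps when as many were added as removed.
  const tryMatch = added.length === removed.length;

  return newSteps.map((n) => {
    const same = oldSteps.filter((o) => o.id === n.id)[0];
    if (same) return { ...n, uuid: same.uuid }; // id unchanged: keep the UUID
    if (!tryMatch) return { ...n, uuid: undefined }; // treat as a brand new step

    // Prefer a removed step with the same adaptor and expression,
    // then fall back to one with the same (defined) parent.
    let match = removed.filter((o) => o.adaptor === n.adaptor && o.expression === n.expression)[0];
    if (!match) match = removed.filter((o) => n.parent !== undefined && o.parent === n.parent)[0];
    if (!match) return { ...n, uuid: undefined };
    removed.splice(removed.indexOf(match), 1); // each old step is matched at most once
    return { ...n, uuid: match.uuid };
  });
}
```

A fuzzier similarity score on the expression would probably catch more renames, at the cost of occasionally attaching an audit trail to the wrong step.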
$ openfn project
List the project status of the current working directory
This does something like:
- List the currently checked out project
- Say whether there's a diff from the yaml file
- Maybe list the workflows by name
- Maybe, if --verbose is passed, show the version hash of each workflow
- List all the domains associated with the project
- For each domain, list the project and its sandboxes by name
- List all the project files, by name and domain, that are available
Maybe show the last sync date
Difficulty: is it possible to have two unrelated projects in one folder? My instinct is we should just disallow this and cut out a lot of complexity. So we assume that all project files relate to the same conceptual project. Do we recognise a single source of truth?
$ openfn save/eject/stash
Maybe? This would update the local project file based on the contents of the disk, but not deploy it. It's sort of the inverse of pull-without-checkout. Why do you want this? It feels like a reassuring option, and would mean you can share the project.yaml file