
CLI: Sandboxes #963

@josephjclark


CLI sandboxing evolves and supersedes the Nu Sync design in #913. It's largely based on the same ideas, but shaped towards the new openfn sandboxing.

This may need breaking up into an epic, or I may just chew through the work (the sandbox epic has been too hard to author upfront).

Design

Project Files

As in new sync, each in-app project is saved to a yaml file. Confusingly, this is sort of a hybrid between a project.yaml and a state.json, and the exact format is arbitrary (though I think we'll prefer yaml).

A project file can be local or synced to an app. If it's synced to an app, it has openfn keys on various elements, which hold app-only state and configuration, like UUIDs.

Project files are named like <name>@<domain>, where the name is a local name (defaulting to main). Each sandbox is a named variant of the project. Most projects will only have one domain, and this is fine.
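To make the naming convention concrete, here's a minimal TypeScript sketch that parses and formats <name>@<domain> file names. The .yaml extension, the function names, and splitting on the last @ are all assumptions, not a settled spec.

```typescript
// Hypothetical parsed form of a project file name such as "main@app.openfn.org.yaml".
interface ProjectFileName {
  name: string; // local name of the project or sandbox
  domain: string; // app domain the project file is synced with
}

// Assumption: project files use a .yaml (or .yml) extension.
function parseProjectFileName(fileName: string): ProjectFileName {
  const base = fileName.replace(/\.ya?ml$/, "");
  // Domains never contain "@", so split on the last one.
  const at = base.lastIndexOf("@");
  if (at === -1 || at === base.length - 1) {
    throw new Error(`Not a <name>@<domain> project file: ${fileName}`);
  }
  return {
    name: base.slice(0, at) || "main", // the local name defaults to "main"
    domain: base.slice(at + 1),
  };
}

function formatProjectFileName({ name, domain }: ProjectFileName): string {
  return `${name}@${domain}.yaml`;
}
```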

The project file contains a complete record of that app project: the settings, workflows, UUIDs (if present), and credentials (copied if marked as dev, else just a UUID).
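A rough type sketch of what a project file might hold, assuming the field names below (they're illustrative, not a decided schema). It also shows one way to tell a local project file from a synced one: by looking for openfn keys anywhere in it.

```typescript
// Hypothetical app-only keys; present only once the project is synced to an app.
interface OpenfnKeys {
  uuid: string;
}

interface Step {
  id: string;
  adaptor: string;
  expression: string;
  openfn?: OpenfnKeys;
}

interface Workflow {
  id: string;
  name: string;
  steps: Step[];
  openfn?: OpenfnKeys;
}

// Illustrative shape only: settings, workflows, UUIDs (via openfn keys) and credentials.
interface ProjectFile {
  name: string;
  settings: Record<string, unknown>;
  workflows: Workflow[];
  credentials: { id: string; dev: boolean; body?: Record<string, unknown> }[];
  openfn?: OpenfnKeys & { endpoint: string };
}

// A project counts as synced if it, or anything inside it, carries openfn keys.
function isSyncedToApp(project: ProjectFile): boolean {
  return (
    project.openfn !== undefined ||
    project.workflows.some(
      (wf) => wf.openfn !== undefined || wf.steps.some((s) => s.openfn !== undefined)
    )
  );
}
```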

Checkout

A project can be "checked out" or "expanded" onto disk, which means extracting the workflows and step expressions into nicely organised files which can be run through the CLI.
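A sketch of what checkout could produce, written as a pure function that plans files rather than touching the disk. The layout (workflows/<id>/workflow.json plus one .js file per step) is hypothetical, and JSON stands in for yaml only to avoid a parser dependency.

```typescript
// Hypothetical minimal workflow model for this sketch.
interface Step {
  id: string;
  adaptor: string;
  expression: string;
}
interface Workflow {
  id: string;
  steps: Step[];
}

// Plan the checked-out files as a map of relative path -> file contents.
// Assumed layout: workflows/<workflow-id>/workflow.json plus one .js file per step.
function planCheckout(workflows: Workflow[]): Map<string, string> {
  const files = new Map<string, string>();
  for (const wf of workflows) {
    const dir = `workflows/${wf.id}`;
    // The workflow file points at each step's expression file by relative path.
    const spec = {
      id: wf.id,
      steps: wf.steps.map((s) => ({ id: s.id, adaptor: s.adaptor, expression: `./${s.id}.js` })),
    };
    files.set(`${dir}/workflow.json`, JSON.stringify(spec, null, 2));
    for (const step of wf.steps) {
      files.set(`${dir}/${step.id}.js`, step.expression);
    }
  }
  return files;
}
```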

When you run openfn pull <projectname>, that project will be instantly checked out to disk.

When checking out a project, the CLI will first attempt to write any on-disk diffs to the project file. In the event of a conflict, an error will be raised.

It is possible for checked-out files to be inconsistent with the project.yaml state, e.g. if I check out a project and then edit a workflow.

Merging

  • merge when pushing local changes into an existing project file (i.e., track UUIDs properly)
  • merge any two project files and show the result on disk, as a top-level command
  • we need to resolve credentials here

Syncing & Versioning

The project file should always sync with the app.

When pulling, we first update the project file, and then expand it onto disk.

When deploying, we create a temporary new project file in memory, push that to the app, and if successful write the returned project state to disk (including any changes and new version history).
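The deploy flow above, sketched with the app call injected as a function. push and writeProjectFile are hypothetical stand-ins, not real CLI or provisioner APIs; the point is the ordering, where the project file is only rewritten once the app has accepted the push.

```typescript
// Hypothetical project state, trimmed to what this sketch needs.
interface ProjectState {
  name: string;
  workflows: Record<string, string>; // workflow id -> serialized workflow
  history: Record<string, string[]>; // workflow id -> version history
}

// Stand-in for the provisioner call that uploads a project and returns the app's view of it.
// Not a real OpenFn API: it is injected so the flow can be shown without a server.
type PushProject = (project: ProjectState) => Promise<ProjectState>;

async function deploy(
  checkedOut: Record<string, string>,
  current: ProjectState,
  push: PushProject,
  writeProjectFile: (project: ProjectState) => void
): Promise<ProjectState> {
  // 1. Build a temporary project in memory; nothing on disk changes yet.
  const candidate: ProjectState = {
    ...current,
    workflows: { ...current.workflows, ...checkedOut },
  };
  // 2. Push it. If the app rejects it, this throws and the project file stays as it was.
  const returned = await push(candidate);
  // 3. Only on success, write the app's returned state (including new version history).
  writeProjectFile(returned);
  return returned;
}
```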

Each workflow has a version history, maintained by the CLI and the app in the same way. The version is a string hash of the workflow contents. It is unique and not reversible.
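One way to compute a version hash, assuming SHA-256 over a canonical JSON serialization (sorted keys, so key order doesn't change the hash), truncated to 12 hex characters. The truncation length and what exactly gets hashed are assumptions. The hash is one-way; uniqueness is probabilistic rather than guaranteed, more so when truncated.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so that key order does not change the hash.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalJson).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalJson(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// Assumption: SHA-256 of the canonical JSON, truncated to 12 hex characters.
function workflowVersion(workflow: unknown): string {
  return createHash("sha256").update(canonicalJson(workflow)).digest("hex").slice(0, 12);
}
```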

Each time the app makes a change*, the history is updated. Consecutive changes from the CLI (or app) will squash histories together, reducing the history length.

* Well, conceptually, not literally. The only time the CLI will actually write a new version history is on checkout (stash).
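A minimal sketch of squashing, assuming each history entry records which side (app or cli) produced it: a change from the same side as the previous entry replaces that entry. This sketch squashes unconditionally; a real implementation would presumably need to avoid squashing away a version the other side has already seen.

```typescript
// Hypothetical history entry: which side produced the version, and its content hash.
type Source = "app" | "cli";
interface VersionEntry {
  source: Source;
  hash: string;
}

// Append a version. Consecutive changes from the same source are squashed
// into a single entry instead of growing the history.
function recordVersion(history: VersionEntry[], next: VersionEntry): VersionEntry[] {
  const last = history[history.length - 1];
  if (last && last.hash === next.hash) {
    return history; // content unchanged: nothing to record
  }
  if (last && last.source === next.source) {
    return [...history.slice(0, -1), next];
  }
  return [...history, next];
}
```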

The app (and maybe the CLI) will have an algorithm to prune histories, removing redundant versions.

When pushing to the app, the app will decide whether the incoming workflow versions are compatible. If so it'll write the change, if not it'll error. Writes can be forced, which overwrites conflicts.
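The compatibility check, sketched under one assumed rule: an incoming workflow is compatible if its history contains the app's current head, i.e. it was built on top of what the app already has. The history shape and function name are hypothetical.

```typescript
// Hypothetical histories: ordered lists of version hashes, oldest first.
type History = string[];
type PushResult = "write" | "conflict";

// Assumed rule: incoming changes are compatible if the app's latest version
// appears somewhere in the incoming history (the incoming work descends from it).
function checkIncoming(appHistory: History, incoming: History, force = false): PushResult {
  if (force) return "write"; // forced writes overwrite conflicts
  const appHead = appHistory[appHistory.length - 1];
  if (appHead === undefined) return "write"; // workflow is new to the app
  return incoming.includes(appHead) ? "write" : "conflict";
}
```

Called with the arguments swapped (the local history first, the app's history as incoming), the same check would flag the incompatible-pull case described in the next paragraph.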

The inverse is true when pulling: if pulling a workflow that has incompatible changes, the CLI will throw an error. Difficulty: how do we pull and merge those changes? I am increasingly thinking the CLI needs a local conflict resolution strategy. Maybe everything on disk stays the same, but any conflicting files from the checkout will create a .conflict file. The user must then resolve all the conflict files before carrying on.
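A sketch of the .conflict idea as a three-way comparison per file: the contents at the last checkout (base), on disk, and incoming from the pull. It assumes non-conflicting incoming changes are simply written, and only files changed on both sides get a .conflict file. All names here are hypothetical.

```typescript
// Hypothetical plan for one checked-out file.
type Plan =
  | { kind: "keep" }
  | { kind: "write"; path: string; contents: string }
  | { kind: "conflict"; path: string; contents: string };

// base: contents at the last checkout; disk: contents now; incoming: contents from the pull.
function planFile(path: string, base: string, disk: string, incoming: string): Plan {
  if (disk === incoming || incoming === base) return { kind: "keep" }; // nothing new upstream
  if (disk === base) return { kind: "write", path, contents: incoming }; // no local edits
  // Both sides changed: leave the file alone and write the incoming version to a .conflict file.
  return { kind: "conflict", path: `${path}.conflict`, contents: incoming };
}

// The user has to resolve every one of these before carrying on.
function pendingConflicts(plans: Plan[]): string[] {
  const out: string[] = [];
  for (const p of plans) {
    if (p.kind === "conflict") out.push(p.path);
  }
  return out;
}
```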

Credentials

The idea is that dev credentials will be included inline in the project yaml, but private credentials will just have a UUID. Private credentials won't run locally until we work out a way to safely attach them to a local run. This is probably fine - in a sandbox we only need local creds and they can just run inline.
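A sketch of the credential rule, assuming a simple dev flag on each credential: dev credentials are written inline, private ones are reduced to their UUID, and only inline credentials can be used for a local run. The types and field names are illustrative.

```typescript
// Hypothetical in-memory credential as the app might describe it.
interface Credential {
  uuid: string;
  name: string;
  dev: boolean; // dev credentials are safe to store in the project file
  body: Record<string, unknown>; // the secret values
}

// What ends up in the project yaml: dev bodies inline, private ones as a UUID reference only.
type StoredCredential =
  | { uuid: string; name: string; dev: true; body: Record<string, unknown> }
  | { uuid: string; name: string; dev: false };

function toProjectFile(cred: Credential): StoredCredential {
  if (cred.dev) {
    return { uuid: cred.uuid, name: cred.name, dev: true, body: cred.body };
  }
  // Never serialize private secrets; only the reference survives.
  return { uuid: cred.uuid, name: cred.name, dev: false };
}

// A step can only run locally if its credential body is available inline.
function canRunLocally(stored: StoredCredential): boolean {
  return stored.dev;
}
```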

This is largely driven by the provisioner API, I think. I suppose the CLI must also upload credentials in much the same way.

In fact, how would one maintain a private credential in the CLI? You'd have to store it on disk, outside of git, but something in the workflow.yaml must reference it.

Perhaps they can be saved by UUID, with some convention to store them in a gitignored folder.
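A sketch of that convention, assuming a gitignored .openfn-credentials folder with one <uuid>.json file per private credential (the folder name is made up). Validating the UUID before building the path also stops a malformed reference from escaping the folder.

```typescript
import { join } from "node:path";

// Hypothetical folder name; it would need an entry in .gitignore.
const CREDENTIALS_DIR = ".openfn-credentials";

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function privateCredentialPath(projectRoot: string, uuid: string): string {
  // Reject anything that isn't a UUID, e.g. "../secrets", before touching the file system.
  if (!UUID_PATTERN.test(uuid)) {
    throw new Error(`Not a credential UUID: ${uuid}`);
  }
  return join(projectRoot, CREDENTIALS_DIR, `${uuid}.json`);
}

// The line a CLI could check for (or append) in .gitignore.
function gitignoreEntry(): string {
  return `${CREDENTIALS_DIR}/`;
}
```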

If a dev cred is changed locally, should we propagate this change up to the app? I kinda don't think so tbh - you'll have to go into the app and change it manually. Credentials are special, even dev ones. We can warn about this case.

Provisioner

All this needs to run against a v2 provisioner API, I think, which has different rules and returns different structures.

That assumes it's viable to maintain v1 and v2 provisioner APIs on the app at the same time. If not, old CLIs will start throwing errors.

We also need a means to standardise the expected structure of the state object.

Github Sync

Separate issue. The v2 github sync is designed to sync a whole repo against a whole app.

Each branch will have a checked-out project or sandbox. Branches can be merged, which merges the workflows naturally, and can then be deployed.

Commands

To support sandboxing the CLI needs the following commands:

$ openfn pull [name] [domain] --no-checkout

By default pull will checkout the project to disk, but pass --no-checkout to skip this. This is like a git fetch and just updates the local project file.

$ openfn deploy [name] [domain]

This will deploy whatever is currently checked out to the app. Name and domain are optional if they are unique (if you only have one domain, like most projects, you never need to specify it).
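A sketch of how the optional name and domain could be resolved against the available project files: filter by whatever was given, succeed on exactly one match, and otherwise ask the user to be more specific. The function name and error messages are hypothetical.

```typescript
// Hypothetical reference to a <name>@<domain> project file.
interface ProjectRef {
  name: string;
  domain: string;
}

function resolveTarget(available: ProjectRef[], name?: string, domain?: string): ProjectRef {
  const matches = available.filter(
    (p) => (name === undefined || p.name === name) && (domain === undefined || p.domain === domain)
  );
  if (matches.length === 1) return matches[0] as ProjectRef;
  if (matches.length === 0) {
    throw new Error(`No project file matches ${name ?? "*"}@${domain ?? "*"}`);
  }
  // More than one candidate: the user must disambiguate with a name and/or domain.
  const options = matches.map((p) => `${p.name}@${p.domain}`).join(", ");
  throw new Error(`Ambiguous target, specify one of: ${options}`);
}
```

Under the single-argument convention floated in the note below (main@app.openfn.org), the argument would just be split into name and domain before this lookup.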

Note: would main@app.openfn.org be a better convention here? Now domain isn't even an option, and what you deploy better matches a file.

$ openfn fork

Create a new sandbox from the currently checked-out project. Should we include UUIDs or let Lightning generate them?

$ openfn checkout

Check out a project file onto the file system.

$ openfn merge

Behaves like the app merge: this will push the workflows from one project into another. Essentially it's a simple replacement of the workflows.

Shows a picklist of workflows to merge, which the user can select from. Only changed workflows can be picked.
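A sketch of the merge as plain replacement, assuming each workflow carries a content hash: only workflows whose hash differs from (or is missing in) the target are offered, and each selected workflow overwrites the target's copy. It deliberately ignores UUIDs, which is the hard part discussed next.

```typescript
// Hypothetical workflow record: id, content hash, serialized body.
interface WorkflowRecord {
  id: string;
  hash: string;
  body: string;
}

// Only workflows that differ from (or are missing in) the target can be picked.
function pickableWorkflows(source: WorkflowRecord[], target: WorkflowRecord[]): string[] {
  const targetHashes = new Map(target.map((wf) => [wf.id, wf.hash] as const));
  return source.filter((wf) => targetHashes.get(wf.id) !== wf.hash).map((wf) => wf.id);
}

// Merge is a simple replacement: each selected source workflow overwrites the target's copy.
function mergeWorkflows(
  source: WorkflowRecord[],
  target: WorkflowRecord[],
  selected: string[]
): WorkflowRecord[] {
  const picked = new Set(selected);
  const result = new Map(target.map((wf) => [wf.id, wf] as const));
  for (const wf of source) {
    if (picked.has(wf.id)) result.set(wf.id, wf);
  }
  return Array.from(result.values());
}
```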

If a UUID exists in the project file, the CLI needs to decide whether to preserve or drop it. We should come up with a heuristic for this, as it will affect the audit trail for the workflow in the app.

For example, if the id doesn't change, we preserve the UUID. Simple. But if the local id changes, we need to decide whether to treat this as a new node or a changed one. We can probably say things like: if the same number of nodes are added as removed, check similarity based on adaptor and expression. If we think we can match nodes, then keep the UUID. We may also be able to infer based on structure: if the id changes but the parent is the same, we can consider it the same node.

At the end of the day, if you change every property of every node, the CLI would just delete all nodes and rebuild them, and in the app all nodes would be new, with a weaker audit trail.
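A sketch of the UUID heuristic above. The weights are assumptions: a matching expression or a matching parent counts as enough evidence on its own, a matching adaptor only adds to the score, and renames are only guessed when as many nodes were added as removed. Anything unmatched is treated as new.

```typescript
// Hypothetical node shape: local id, optional app UUID, and the properties the heuristic compares.
interface StepNode {
  id: string;
  uuid?: string;
  adaptor: string;
  expression: string;
  parent?: string; // id of the upstream node, if any
}

// Returns local node id -> UUID to preserve. Nodes missing from the result are treated as new.
function matchUuids(previous: StepNode[], next: StepNode[]): Map<string, string> {
  const kept = new Map<string, string>();
  const prevById = new Map(previous.map((n) => [n.id, n] as const));

  // 1. The id didn't change: preserve the UUID.
  for (const node of next) {
    const prev = prevById.get(node.id);
    if (prev?.uuid) kept.set(node.id, prev.uuid);
  }

  // 2. Only guess at renames when the same number of nodes were added as removed.
  const nextIds = new Set(next.map((n) => n.id));
  const removed = previous.filter((n) => !nextIds.has(n.id));
  const added = next.filter((n) => !prevById.has(n.id));
  if (added.length === 0 || added.length !== removed.length) return kept;

  const claimed = new Set<string>();
  for (const node of added) {
    let best: StepNode | undefined;
    let bestScore = 0;
    for (const candidate of removed) {
      if (!candidate.uuid || claimed.has(candidate.id)) continue;
      // Assumed weights: same expression or same parent is enough alone; same adaptor only helps.
      const score =
        (candidate.expression === node.expression ? 2 : 0) +
        (candidate.parent !== undefined && candidate.parent === node.parent ? 2 : 0) +
        (candidate.adaptor === node.adaptor ? 1 : 0);
      if (score > bestScore) {
        best = candidate;
        bestScore = score;
      }
    }
    if (best?.uuid && bestScore >= 2) {
      kept.set(node.id, best.uuid);
      claimed.add(best.id);
    }
  }
  return kept;
}
```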

$ openfn project

List the project status of the current working directory.

This does something like:

  • List the currently checked out project
  • Say whether there's a diff from the yaml file
  • Maybe list the workflows by name
  • Maybe if --verbose is passed, show the version hash of each workflow
  • List all the domains associated with the project
  • For each domain, list the project and its sandboxes by name
  • List all the project files, by name and domain, that are available
    Maybe show the last sync date
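A sketch of the listing part of the status output: group the <name>@<domain>.yaml files in a folder by domain and mark the checked-out one with a *, a bit like git branch. The file-name pattern and output format are assumptions.

```typescript
// Group project files in a folder by domain. Files that don't match the pattern are skipped.
function groupByDomain(fileNames: string[]): Map<string, string[]> {
  const byDomain = new Map<string, string[]>();
  for (const file of fileNames) {
    // Greedy (.+) before "@" means the split happens at the last "@".
    const match = /^(.+)@(.+)\.ya?ml$/.exec(file);
    if (!match) continue; // not a project file
    const name = match[1] ?? "";
    const domain = match[2] ?? "";
    const names = byDomain.get(domain) ?? [];
    names.push(name);
    byDomain.set(domain, names);
  }
  return byDomain;
}

// One line per domain, then each project/sandbox name, with "*" on the checked-out one.
function describeProjects(fileNames: string[], checkedOut: string): string[] {
  const lines: string[] = [];
  groupByDomain(fileNames).forEach((names, domain) => {
    lines.push(domain);
    for (const name of names.sort()) {
      const marker = `${name}@${domain}` === checkedOut ? "*" : " ";
      lines.push(`  ${marker} ${name}`);
    }
  });
  return lines;
}
```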

Difficulty: is it possible to have two unrelated projects in one folder? My instinct is we should just disallow this and cut out a lot of complexity. So we assume that all project files relate to the same conceptual project. Do we recognise a single source of truth?

$ openfn save/eject/stash

Maybe? This would update the local project file based on the contents of the disk, but not deploy it. It's sort of the inverse of pull-without-checkout. Why would you want this? It feels like a reassuring option, and it would mean you can share the project.yaml file.

Status: Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions