[Hackathon] feat: BioFlow Genesis — the AI reads your dataset, not your prompt#5122
Open
yangzhang75 wants to merge 5 commits into
Wire Genesis agent POST with ?source=genesis & wid, cap ReAct at 30 steps, strip delete tools from the model tool map, and align Bob prompts with plan-then-execute plus non-template Iris reference. Co-authored-by: Cursor <cursoragent@cursor.com>
Demo Video
https://drive.google.com/file/d/10wiyRbZVvXGEn5lw5Wvws5WyjD521ICz/view?usp=sharing
What changes were proposed in this PR?
This PR adds BioFlow Genesis, a drag-and-drop entry point that turns any CSV into a running ML workflow. The user drops a file onto the dashboard, an LLM profiles the columns and proposes four analyses grounded in the actual data, and one click materializes a wired Texera workflow with real sklearn training and a Python UDF that writes a five-section interpretation of the run.
The interaction is the point. Genesis reads the dataset, not the user. A biology PhD or social scientist doesn't need to know what task type fits their data, which target column matters, or how to wire operators — they drop the file, look at four typed recommendations, and click. Free-text input below the cards lets advanced users override the recommendation in plain English (typing "predict diabetes using random forest" swaps the trainer node). Genesis is not a workflow preview, not a schema mockup, not a code template — it is a working pipeline that trains a model on real data and returns results, on every drop.
How the workflow is generated
Workflow JSON generation is the most failure-prone surface in LLM-driven workflow tools — one wrong port name or one inverted link breaks the run. Most existing approaches handle this with a self-repair retry loop: prompt the LLM, validate, retry up to N times on failure. Genesis avoids that entirely by keeping the LLM out of the JSON path.
The LLM (Claude Haiku via the existing `LLM_ENDPOINT`) returns plain text only: profiling notes, recommended task type, target column, algorithm, and the four card titles. A deterministic Python module (`core/workflow_builder.py`) emits Texera JSON. The skeletons are tested code paths, so LLM output never touches operator IDs, port maps, or link wiring. Validation cannot fail at the LLM layer because the LLM doesn't produce structure. The wiring is correct by construction, the LLM token budget per workflow drops from thousands to a few hundred, and any future skeleton author edits Python instead of debugging a prompt.
The skeleton library
All skeletons compose existing Texera operators only. No new operator types were added, no engine changes, no protocol changes. The skeletons demonstrate that Texera's operator catalog is already sufficient to express the entire non-technical-researcher entry path.
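To make the "LLM out of the JSON path" claim concrete, here is a minimal sketch of what a deterministic skeleton builder could look like. The operator type names, property keys, and link shape below are illustrative assumptions, not the actual schema in `core/workflow_builder.py`:

```python
# Hypothetical sketch: the LLM's text decision (task type, target column,
# trainer name) is mapped onto a hard-coded operator chain, so the JSON
# structure never depends on model output.
import uuid


def _op(op_type: str, props: dict) -> dict:
    """One Texera-style operator node with a fresh ID."""
    return {
        "operatorID": f"{op_type}-{uuid.uuid4().hex[:8]}",
        "operatorType": op_type,
        "operatorProperties": props,
    }


def build_classification_skeleton(csv_path: str, target: str, trainer: str) -> dict:
    """Emit a linear source -> split -> train -> insight pipeline."""
    ops = [
        _op("CSVFileScan", {"filePath": csv_path}),
        _op("Split", {"trainRatio": 0.8}),
        _op(trainer, {"target": target}),  # e.g. "SklearnLogisticRegression"
        _op("PythonUDF", {"code": "# AI Insight UDF goes here"}),
    ]
    # Links are derived from list position, so wiring is correct by construction.
    links = [
        {"source": a["operatorID"], "target": b["operatorID"]}
        for a, b in zip(ops, ops[1:])
    ]
    return {"operators": ops, "links": links}


wf = build_classification_skeleton("diabetes.csv", "Outcome", "SklearnLogisticRegression")
```

Because the chain and its links come from tested Python rather than generated JSON, a malformed workflow is a unit-test failure, not a runtime retry loop.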
The AI Insight card
The AI Insight node at the end of every skeleton is what closes the loop for non-technical users. Most workflow tools stop at "your model scored X%." The Insight UDF reads the prediction table, computes accuracy or R² depending on the task, picks the top three features from the column metadata, and emits a five-section result table — summary, top predictors, interpretation, next steps, caveat — that renders as a structured card in the result panel. This is the artifact a researcher actually hands to their team: not a number, but a readable explanation of what the model learned.
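The five-section shape described above can be sketched as follows. This is an illustrative approximation, not the PR's actual UDF; the section names and metric formatting are assumptions, and the metrics are computed by hand here to keep the sketch dependency-free:

```python
# Hedged sketch of the AI Insight computation: pick the metric by task type,
# then emit one row per section of the five-section result table.
def build_insight(y_true, y_pred, task, top_features):
    if task == "classification":
        acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
        metric = f"accuracy {acc:.1%}"
    else:
        # R^2 = 1 - SS_res / SS_tot, computed on the holdout predictions.
        mean = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        metric = f"R² {1 - ss_res / ss_tot:.2f}"
    return [
        {"section": "summary", "text": f"Model scored {metric} on the holdout split."},
        {"section": "top_predictors", "text": ", ".join(top_features[:3])},
        {"section": "interpretation", "text": "The top predictors carry most of the signal."},
        {"section": "next_steps", "text": "Consider more data or an alternative trainer."},
        {"section": "caveat", "text": "Holdout metrics may not generalize beyond this sample."},
    ]
```

The result panel then renders these five rows as the structured card the PR describes.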
Verified end-to-end on real public datasets
The same product was run on a medical CSV and a real-estate CSV to confirm that skeleton choice comes from the data itself, not from any preset. Both ran to completion with real sklearn training, real holdout splits, and real metrics.
Pima Indians Diabetes (768 rows, originally collected by the U.S. National Institute of Diabetes — the same NIH agency, NIDDK, that funds dkNET — and the standard medical ML benchmark since 1988) was auto-detected as classification, produced a 6-node LogisticRegression pipeline, scored 72.5% accuracy on a 20% holdout split, and the AI Insight card surfaced Glucose, BMI, and Age as the top three predictors.
California Housing (20,640 rows, UC Irvine source) was auto-detected as regression, produced a 7-node LinearRegression pipeline with automatic feature preprocessing, scored R² = 0.63 and MAE around $51K, and the AI Insight card surfaced longitude, latitude, and housing_median_age as top predictors with a residuals-and-extrapolation caveat.
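The classification-vs-regression auto-detection in the two runs above can be approximated by a simple target-column heuristic. This is a hedged sketch only; the actual `core/classifier.py` combines column profiling with LLM judgment and is richer than this:

```python
# Crude task-inference heuristic: integer-coded labels with low cardinality
# suggest classification; continuous numeric targets suggest regression.
def infer_task(target_values):
    distinct = set(target_values)
    if len(distinct) <= 10 and all(
        isinstance(v, (int, str)) and not isinstance(v, bool) for v in distinct
    ):
        return "classification"
    return "regression"


assert infer_task([0, 1, 1, 0, 1]) == "classification"   # e.g. a binary Outcome column
assert infer_task([452.6, 358.5, 352.1]) == "regression" # e.g. a continuous price column
```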
Free-text input: typing `predict diabetes using random forest` produced a workflow with SklearnRandomForest as the trainer instead of the default LogisticRegression. The LLM parsed the algorithm name from the natural-language sentence; the Python builder swapped the trainer node accordingly, and the pipeline ran end-to-end to completion.
The cross-domain switch is the part worth checking. The only user action in all three cases is one drop (or one drop plus one sentence). The skeleton diverges fundamentally between domains because the recommendation is grounded in the data, not in a template.
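The trainer-swap step in the free-text path can be sketched as a fixed mapping on the Python side: the LLM only extracts an algorithm phrase, and deterministic code picks the operator. The map keys and operator names below are illustrative assumptions:

```python
# Hedged sketch of the free-text override: LLM-parsed algorithm phrase in,
# Texera trainer operator name out, with a safe default.
TRAINER_MAP = {
    "random forest": "SklearnRandomForest",
    "logistic regression": "SklearnLogisticRegression",
    "linear regression": "SklearnLinearRegression",
}


def pick_trainer(llm_algorithm, default):
    """Return the trainer operator for the parsed algorithm, or the default."""
    return TRAINER_MAP.get(llm_algorithm.lower().strip(), default)


assert pick_trainer("Random Forest", "SklearnLogisticRegression") == "SklearnRandomForest"
```

Keeping the mapping in Python means an unrecognized phrase degrades to the default trainer rather than to malformed workflow JSON.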
What is in this PR
The new directory `bioflow-genesis-service/` is a standalone FastAPI service on port 9099:

- `core/classifier.py` — data profiling, task inference, target detection, algorithm selection, four-card generation, and free-text intent parsing.
- `core/workflow_builder.py` — skeleton builders that emit Texera JSON. Includes the preprocessing UDF code, the AI Insight UDF code, and trainer-node selection.
- `core/texera_client.py` — thin wrapper around the standard `POST /api/workflow/persist` endpoint.
- `core/llm_client.py` and `core/prompts.py` — LLM client and prompt templates.
- `api/build.py` — `POST /api/genesis/build` endpoint consumed by the frontend.
- `tests/test_workflow_builder.py` — 10 unit tests covering all skeletons. All passing.

Frontend integration ships in a companion commit on the same branch: a drop zone on the dashboard, a card grid modal that animates from analysis to build, a free-text natural-language input below the cards, and the 5-section AI Insight card rendering in the result panel.
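For orientation, a frontend-side call to the build endpoint might look like the sketch below. The payload field names are assumptions inferred from this description, not the service's actual request schema:

```python
# Hypothetical request construction for POST /api/genesis/build; only the
# endpoint path and port come from the PR text, the fields are illustrative.
import json


def build_request(csv_name, card_index, free_text=None):
    """Serialize a Genesis build request body."""
    payload = {"file": csv_name, "card": card_index}
    if free_text:
        payload["intent"] = free_text  # e.g. "predict diabetes using random forest"
    return json.dumps(payload).encode()


body = build_request("diabetes.csv", 0, "predict diabetes using random forest")
# Would be POSTed to http://localhost:9099/api/genesis/build by the frontend.
```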
This PR does not touch the Amber engine, does not add any new Texera operators, and uses the standard `/api/workflow/persist` endpoint without modification. The LLM never emits workflow JSON, only text.
Any related issues, documentation, discussions?
See Discussion #5059.
How was this PR tested?
`pytest bioflow-genesis-service/tests/test_workflow_builder.py` — 10 passed.

End-to-end manual testing on three cases:

- Dropped `diabetes.csv` onto the dashboard, four recommendation cards appeared in a few seconds, clicked the first card, 6-node workflow generated, hit Run, completed in about 10 seconds, 72.5% accuracy on the holdout split, AI Insight card rendered with Glucose / BMI / Age as top predictors.
- Dropped `houses.csv`, four cards (all regression) appeared, clicked the first card, 7-node workflow with feature preprocessing generated, hit Run, completed in about 15 seconds, R² = 0.63, MAE around $51K, AI Insight rendered with longitude / latitude / housing_median_age.
- Dropped `diabetes.csv`, typed `predict diabetes using random forest`, hit Build, workflow generated with RandomForest trainer, hit Run, ran end-to-end to completion.

Cross-domain check confirmed: dropping the two CSVs in sequence produces fundamentally different running pipelines with no user instruction beyond the drop itself.
Future work
The drag-and-drop recommendation entry point is the core interaction shape this PR establishes. Several directions extend naturally from here:
More skeletons. Time-series forecasting, clustering, anomaly detection, and multi-class with imbalance handling all fit the existing recommendation pipeline — each is one new skeleton function plus a profiling rule.
Beyond CSV. The data-profiling layer is the only file-format-aware code in the system; the workflow builder is format-agnostic. Adding FASTQ / VCF / BAM (bioinformatics), Parquet, or image folders is a matter of new profilers and matching Texera operators. The recommendation logic and the AI Insight card stay unchanged. The path to genuinely biomedical, omics-grade workflows is short.
One-drop compound workflows. A research dataset often feeds multiple analyses. An obvious extension is letting the cards combine — "run classification AND find drivers" — by composing two skeletons into a parent workflow with shared upstream nodes, still produced from a single drop.
The intent of this submission is to demonstrate that the data-driven recommendation entry point — drop a file, see four typed analyses, click one, run — is the right interaction shape for non-technical researchers, and to ship it as a working, end-to-end system rather than a preview or design document.
Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude 4.7