
[Hackathon] feat: Multi-Source Data Import — URL, Local File, SQLite, REST API #5119

Open
EmilySun621 wants to merge 3 commits into apache:main from EmilySun621:hackathon/data-sources


@EmilySun621

Paste a URL. Drop a file. Open a SQLite database. Ask the AI agent. Four new import paths, zero manual download.


What's New

🔗 URL Import — Paste any CSV/JSON URL on the Datasets page, click Import. Server-side fetch, auto-format detection.

📁 Local File Drop — Drag & drop CSV, JSON, XLSX, TSV, SQLite directly onto the Datasets page.

🗄️ SQLite Import — Drop a .sqlite file → pick tables from a list → each table becomes a dataset. Uses Bun's built-in bun:sqlite, no external dependencies.

⚡ REST API Agent Tool: The fetch_api_data tool lets the AI agent fetch from any API endpoint. Auto-flattens nested JSON to tabular format.
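
For readers unfamiliar with the "auto-flatten" step, here is a minimal sketch of one way nested JSON can be turned into rows and columns. It is not taken from this PR: the names `flattenRecord` and `toTable`, the dot-separated column names, and the choice to serialize arrays as JSON strings are all assumptions made for illustration.

```typescript
// Hypothetical sketch; the real fetch_api_data implementation is not shown in this PR.
type Scalar = string | number | boolean | null;
type FlatRow = Record<string, Scalar>;

// Flatten one nested value into a single-level row with dot-separated keys.
// Assumption: arrays are kept as JSON strings so that one record stays one row.
function flattenRecord(value: unknown, prefix = "", out: FlatRow = {}): FlatRow {
  if (Array.isArray(value)) {
    out[prefix || "value"] = JSON.stringify(value);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      flattenRecord(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    // Scalars (and undefined, mapped to null) become a single cell.
    out[prefix || "value"] = (value ?? null) as Scalar;
  }
  return out;
}

// Turn an API payload (one object or an array of objects) into a table.
// Columns are the union of all flattened keys, in first-seen order.
function toTable(payload: unknown): { columns: string[]; rows: FlatRow[] } {
  const records: unknown[] = Array.isArray(payload) ? payload : [payload];
  const rows = records.map((record) => flattenRecord(record));
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  // Fill cells that a record did not provide so every row has every column.
  for (const row of rows) {
    for (const column of columns) {
      if (!(column in row)) row[column] = null;
    }
  }
  return { columns, rows };
}

// Usage: two records with different shapes end up sharing one column set.
const table = toTable([{ id: 1, user: { name: "a", tags: ["x"] } }, { id: 2 }]);
console.log(table.columns, table.rows);
```

Keeping arrays as strings avoids the row explosion that comes from expanding every array element into its own row; whether the tool in this PR makes the same trade-off is not stated in the description.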


How It Works

Frontend (Datasets page)          Agent Service (port 3001)         Texera
┌─────────────────────┐          ┌─────────────────────────┐      ┌──────────┐
│ URL input ──────────┼────→     │ POST /fetch-url         │─────→│ Dataset  │
│ File drop zone ─────┼────→     │ POST /sqlite-tables     │      │ Creation │
│ Agent chat ─────────┼────→     │ POST /sqlite-export     │      │ API      │
└─────────────────────┘          │ Tool: fetch_api_data    │      └──────────┘
                                 └─────────────────────────┘

Verified

$ curl -X POST localhost:3001/api/data-source/fetch-url \
    -H "Content-Type: application/json" \
    -d '{"url":"https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"}'

→ {"rows":150, "columns":["5.1","3.5","1.4","0.2","Iris-setosa"], "format":"csv"} ✅
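
The response above reports `"format":"csv"` for a URL whose path ends in `.data`, so the format cannot come from the file extension alone. A rough sketch of how such detection might be layered follows. The function name `detectFormat`, the order of checks, and the fallback to CSV are assumptions, not the behavior of `/fetch-url` as implemented in this PR.

```typescript
// Hypothetical sketch of format auto-detection; not the PR's actual logic.
type DataFormat = "json" | "tsv" | "csv";

// Check the Content-Type header first, then the URL path, then the body itself.
// Note: new URL() throws on an invalid URL, so callers should validate input first.
function detectFormat(url: string, contentType: string | null, body: string): DataFormat {
  const type = (contentType ?? "").toLowerCase();
  if (type.includes("json")) return "json";
  if (type.includes("tab-separated-values")) return "tsv";
  if (type.includes("csv")) return "csv";

  const path = new URL(url).pathname.toLowerCase();
  if (path.endsWith(".json")) return "json";
  if (path.endsWith(".tsv")) return "tsv";
  if (path.endsWith(".csv")) return "csv";

  // Content sniffing: JSON documents start with "{" or "[".
  const trimmed = body.trimStart();
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) return "json";

  // A tab in the first line suggests TSV; otherwise assume CSV.
  const firstLine = trimmed.split(/\r?\n/, 1)[0] ?? "";
  return firstLine.includes("\t") ? "tsv" : "csv";
}

// Usage with the URL from the curl example (header value and body are made up here).
const irisUrl = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data";
console.log(detectFormat(irisUrl, "text/plain", "5.1,3.5,1.4,0.2,Iris-setosa\n"));
```

Detection only picks a parser; it does not tell the parser whether the first line is a header, which matters for header-less files such as `iris.data`.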

Files Changed

New: agent-service/src/api/data-source-api.ts (3 endpoints), data-source-tools.ts (agent tool)

Modified: user-dataset.component.* (URL input + drop zone), dataset.service.ts (fetch methods), proxy.config.json, DatasetSearchQueryBuilder.scala (fix: new datasets now appear in list immediately)

Emily Sun and others added 3 commits May 15, 2026 21:55
This bundles the feature work that built up on this branch:

- Custom agents: dashboard CRUD page and editor dialog (48px icon tile,
  chip-style guardrails, model selector). Each custom agent now carries a
  LiteLLM model_name (Opus 4.7 / Haiku 4.5) that is passed through to the
  agent-service so different agents can use different models.

- Conversation history is scoped per (workflowId, agentId): switching
  agent or workflow yields a different conversation list. localStorage
  key: texera.workflowConversations.v1.{workflowId}.{agentId}.

- Time machine: workflow snapshot list, revert, and agent-tagged
  checkpoints. New workflow-history-tool in agent-service backs the
  "undo my last change" flow; amber gains a WorkflowSnapshotResource;
  sql/updates/23.sql adds the snapshot table.

- Operator-aware custom-agent prompts: the system prompt now injects the
  full operator catalog with a "prefer built-in operators over Python
  UDFs" rule, sourced from WorkflowSystemMetadata at request time.

- LiteLLM: added the claude-opus-4.7 entry alongside claude-haiku-4.5
  and gpt-5-mini in bin/litellm-config.yaml.

- Agent panel rewritten around the (conversation list / chat) two-view
  model with subscription-managed list reloads and per-step persistence.
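
As an illustration of the per-(workflowId, agentId) scoping described above, the sketch below builds the storage key from the documented pattern and reads and writes a conversation list under it. Only the key format comes from this commit message; `conversationKey`, `loadConversations`, `saveConversations`, the `ConversationSummary` shape, and the in-memory store are hypothetical.

```typescript
// Hypothetical sketch; only the key pattern is taken from the commit message.
const KEY_PREFIX = "texera.workflowConversations.v1";

// Builds texera.workflowConversations.v1.{workflowId}.{agentId}.
function conversationKey(workflowId: number | string, agentId: string): string {
  return `${KEY_PREFIX}.${workflowId}.${agentId}`;
}

// Subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumed shape of one stored conversation entry.
interface ConversationSummary {
  id: string;
  title: string;
}

function loadConversations(
  store: KeyValueStore,
  workflowId: number | string,
  agentId: string
): ConversationSummary[] {
  const raw = store.getItem(conversationKey(workflowId, agentId));
  if (raw === null) return [];
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    // Corrupt or legacy data is treated as an empty list.
    return [];
  }
}

function saveConversations(
  store: KeyValueStore,
  workflowId: number | string,
  agentId: string,
  conversations: ConversationSummary[]
): void {
  store.setItem(conversationKey(workflowId, agentId), JSON.stringify(conversations));
}

// In-memory stand-in for window.localStorage, used only for this example.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const store = new MemoryStore();
saveConversations(store, 42, "agentA", [{ id: "c1", title: "Clean up joins" }]);
console.log(loadConversations(store, 42, "agentA")); // list saved for agentA
console.log(loadConversations(store, 42, "agentB")); // separate key, so empty
```

Because the agent ID is part of the key, switching agents on the same workflow reads a different entry rather than filtering a shared list.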

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…set UI

Preserves work-in-progress changes before switching branches:
agent-service gains a data-source router with format utilities, and the
user-dataset frontend gains UI/styles backed by new dataset service
helpers. Saved so the snippets-quicksteps branch can be resumed cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent tool

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the engine, ddl-change, frontend, dev, common, and agent-service labels on May 17, 2026