Skip to content

DB-3: secondary indexes on colouring_count_3 / colouring_count_5 / alexander_polynomial #33

@hyperpolymath

Description

@hyperpolymath

Source: Audit doc /tmp/krl-quandle-tangle-audit-2026-06-01.md DB-3.

Claim

Queries that filter by colouring_count_3, colouring_count_5, or alexander_polynomial (and after the polynomial-extension issue lands: Jones, Conway, HOMFLY) should hit a secondary index, not table-scan the knot table.

Why valuable

Currently find where colouring_count_3 = p and crossing < 8 scans the whole semantic index table. For O(10^4)+ knots this becomes the bottleneck of every multi-filter query.

A B-tree or hash index on these fingerprint columns turns it into O(log n) lookup.

Architecture position

Storage lives in Skein.jl. QuandleDB queries the semantic index it populates. The index lives in Skein.jl too. So this issue is upstream-blocked on Skein.jl exposing index-creation primitives.

Acceptance criteria

Phase A — audit (in this repo)

  • Identify every column accessed in find where ... = ... filters across server/serve.jl route handlers and server/krl/SqlFrontend.jl.
  • For each, classify: high-cardinality numeric (colouring counts, generator_count, relation_count), low-cardinality string (degree_partition, quandle_key), polynomial string (alexander, future jones/homfly), or other.
  • Document index strategy per column.

Phase B — Skein.jl extension (upstream)

  • PR to Skein.jl exposing create_index!(db, table, column, index_kind) and drop_index!(db, table, column).
  • Default index_kind per data type (B-tree for numeric, hash for string).

Phase C — QuandleDB consumption

  • On serve.jl startup, ensure the canonical indexes exist (create-if-missing).
  • On find where query execution, prefer the indexed path.
  • Add a --no-index env knob for benchmark comparison.

Phase D — verification

  • Benchmark: find where colouring_count_3 = 9 on a 10⁴-knot table before vs after.
  • Property test: indexed and non-indexed queries return identical result sets.

Effort

  • Phase A: 4h.
  • Phase B: 1w upstream PR review cycle.
  • Phase C: 1d.
  • Phase D: 4h benchmark + property tests.

Out of scope

  • Composite indexes (multi-column). Phase E.
  • Full-text indexes on canonical_presentation or polynomial strings. Phase F.
  • Materialised views. Different concern entirely.

Cross-references

  • Audit doc DB-3, DB-7 (cost-based query rewriting depends on this).
  • Skein.jl issue tracker — open a corresponding tracker there for Phase B.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions