Skip to content

Centralize materialized view registry and refactor refresh task#292

Merged
MoralCode merged 4 commits into
chaoss:mainfrom
shlokgilda:feature/materialized-views-registry
Jun 17, 2026
Merged

Centralize materialized view registry and refactor refresh task#292
MoralCode merged 4 commits into
chaoss:mainfrom
shlokgilda:feature/materialized-views-registry

Conversation

@shlokgilda

@shlokgilda shlokgilda commented May 5, 2026

Copy link
Copy Markdown
Contributor

Description:

Introduces a single source of truth for all 15 existing PostgreSQL materialized view definitions and replaces a fragile hardcoded refresh task with a dynamic loop.

  • add collectoss/application/db/materialized_views.py — registry of all 15 existing materialized views (SQL definitions + unique index columns). get_refresh_sql() helper builds REFRESH MATERIALIZED VIEW statements.
  • refactor collectoss/tasks/db/refresh_materialized_views.py — replaces 14 hardcoded try/except: pass blocks with a dynamic loop over the registry. Uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY actually works
    (it cannot run inside a transaction block, which the old code silently violated). Raises RuntimeError at the end if any view failed both concurrent and non-concurrent refresh, so failures are visible in Celery instead of swallowed.
  • wire alembic_utils into collectoss/application/schema/alembic/env.py — registers all views from the registry so alembic revision --autogenerate detects SQL definition changes automatically (phase 2 of Add new materialized views for heatmaps on 8knot (and make a system for it) #243).
  • add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.

No schema changes in this PR. The 3 new heatmap views for 8Knot follow in a separate incremental PR on top of this one.

This PR fixes (partial — heatmap views follow separately) #243

Notes for Reviewers:

The key correctness fix worth understanding: REFRESH MATERIALIZED VIEW CONCURRENTLY which wraps everything in engine.begin(), so every concurrent refresh was silently failing and falling back to a blocking refresh on every run. This PR fixes that by using engine.connect().execution_options(isolation_level="AUTOCOMMIT") directly.

The ; COMMIT; embedded in the old SQL strings has also been removed — those were breaking SQLAlchemy's transaction management.

First post-deploy alembic revision --autogenerate may propose a no-op normalization revision for views whose SQL Postgres normalizes differently from what's in the registry (Postgres reformats SQL on storage). Safe to discard that revision.

Signed commits

  • Yes, I signed my commits.

AI Disclosure: Claude Code was used to draft this PR description and write docstrings in the new materialized_views.py module. I reviewed and verified all code changes, SQL definitions, and fixes before committing.

@shlokgilda shlokgilda requested a review from MoralCode as a code owner May 5, 2026 05:02
Comment thread collectoss/application/db/materialized_views.py Outdated
@MoralCode MoralCode added this to the v1.1 Migration Release milestone May 5, 2026
Comment thread collectoss/application/db/models/augur_data.py
Comment thread collectoss/tasks/git/facade_tasks.py
Comment thread collectoss/tasks/db/refresh_materialized_views.py
@MoralCode

Copy link
Copy Markdown
Contributor

How easily would this method handle a move of all materialized views to their own db schema?

@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from 37f7acd to 0db2f05 Compare May 18, 2026 18:08
@MoralCode

MoralCode commented May 18, 2026

Copy link
Copy Markdown
Contributor

im taking this over/picking up this issue on behalf of shlok, just rebased it and made an adjustment to more intelligently handle jumping straight to a non-concurrent refresh when there is no index (indicating we can run concurrently)

@MoralCode MoralCode moved this to In Progress in CollectOSS Feature Roadmap May 18, 2026
@MoralCode MoralCode moved this from In Progress to Paused in CollectOSS Feature Roadmap May 27, 2026
@MoralCode

Copy link
Copy Markdown
Contributor

Note to self: materialized views should be moved to a schema called stable

@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from 0db2f05 to b9b7f02 Compare June 9, 2026 22:36
@MoralCode

Copy link
Copy Markdown
Contributor

need to also test an actual refresh cycle for this

@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from b9b7f02 to 2a8f113 Compare June 10, 2026 23:19
shlokgilda and others added 4 commits June 10, 2026 19:20
- add collectoss/application/db/materialized_views.py: single source of
  truth for all 15 existing materialized view definitions (SQL + unique
  index columns). get_refresh_sql() helper constructs REFRESH statements.

- refactor collectoss/tasks/db/refresh_materialized_views.py: replace 14
  hardcoded try/except-pass blocks with a dynamic loop over the registry.
  uses an AUTOCOMMIT connection so REFRESH CONCURRENTLY works correctly
  (it cannot run inside a transaction block). raises RuntimeError if any
  view fails both concurrent and non-concurrent refresh.

- wire alembic_utils into collectoss/application/schema/alembic/env.py:
  register all views from the registry so that alembic revision
  --autogenerate detects SQL definition changes automatically.

- add alembic-utils==0.8.8 to pyproject.toml and regenerate uv.lock.

closes chaoss#243 (partial — heatmap views follow in a separate PR)

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
Replace the list-of-dicts registry with a frozen MaterializedView dataclass
exposing fqn, refresh_sql(), and to_pg_view(). Brings the registry's shape
in line with the declarative ORM style used elsewhere in the codebase and
gives callers attribute access + type checking instead of string-keyed
dict lookups.

unique_index_columns is a tuple so frozen=True actually means immutable.
__repr__ is overridden to keep the multi-hundred-line view SQL out of
debug logs. Refresh task and alembic env.py updated to use the new API;
get_refresh_sql free function removed (only two call sites).

Emitted REFRESH SQL is byte-identical to the previous version.
…e correct way based on whether indexes exist or not

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
@MoralCode MoralCode force-pushed the feature/materialized-views-registry branch from 2a8f113 to fd9a862 Compare June 10, 2026 23:20
@MoralCode

Copy link
Copy Markdown
Contributor

I wonder if DBT may be better for this usecase by allowing us to move the transformation of data from our internal messy schema to an external presentation ready schema

See #18

I think this could move a lot of our processing outside of postgres. Unsure how that would help with resource contention though.

That said, DBT seems to be a standard thing that huge data systems use....

@MoralCode

Copy link
Copy Markdown
Contributor

DBT seems to do the same thing and wouldnt even remove processing load from CollectOSS. We can switch later if needed.

This should be fairly thoroughly tested. Merging.

@MoralCode MoralCode merged commit 2521a3a into chaoss:main Jun 17, 2026
15 checks passed
@github-project-automation github-project-automation Bot moved this from Paused/Background to Done in CollectOSS Feature Roadmap Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

2 participants