Skip to content

Add optional total_count field to ListPipelineJobsResponse#167

Open
Mbeaulne wants to merge 1 commit intomasterfrom
03-13-add_total_counts_to_the_runs_endpoint
Open

Add optional total_count field to ListPipelineJobsResponse#167
Mbeaulne wants to merge 1 commit intomasterfrom
03-13-add_total_counts_to_the_runs_endpoint

Conversation

@Mbeaulne
Copy link
Collaborator

@Mbeaulne Mbeaulne commented Mar 13, 2026

TL;DR

Added total_count field to the ListPipelineJobsResponse to provide the total number of pipeline runs matching the query criteria when requested.

What changed?

  • Added total_count: int | None = None field to ListPipelineJobsResponse class
  • Added include_total_count: bool = False parameter to the list method in PipelineRunsApiService_Sql
  • When include_total_count is True, the method executes a count query using sql.func.count() with the same filter conditions as the main query
  • The total_count is populated in the response object when the parameter is enabled

How to test?

Run the existing test suite which now includes comprehensive tests for the total_count functionality:

  • test_list_total_count_not_included_by_default - verifies count is None by default
  • test_list_total_count_empty - verifies count is 0 when no runs exist
  • test_list_total_count_matches_results - confirms count matches actual results
  • test_list_total_count_with_pagination - ensures count remains consistent across paginated requests
  • test_list_total_count_with_filter - validates count respects filter conditions

Why make this change?

This enhancement allows clients to optionally retrieve the total number of pipeline runs that match their query criteria, which is essential for implementing proper pagination controls and displaying accurate result counts in user interfaces. The feature is opt-in to avoid performance overhead when the total count is not needed.

Copy link
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Mbeaulne Mbeaulne marked this pull request as ready for review March 13, 2026 18:53
@Mbeaulne Mbeaulne requested a review from Ark-kun as a code owner March 13, 2026 18:53
@Mbeaulne Mbeaulne changed the title Add total counts to the runs endpoint Add total_count field to ListPipelineJobsResponse Mar 13, 2026
@Mbeaulne Mbeaulne force-pushed the 03-13-add_total_counts_to_the_runs_endpoint branch from d45706d to 9d010d9 Compare March 13, 2026 19:13
@Mbeaulne Mbeaulne changed the title Add total_count field to ListPipelineJobsResponse Add optional total_count field to ListPipelineJobsResponse Mar 13, 2026
total_count = session.scalar(
sql.select(sql.func.count())
.select_from(bts.PipelineRun)
.where(*where_clauses)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance this looks like it will only return you the count for the current page. Can you confirm?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, your tests seem to suggest otherwise

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see what's happening. This is based on limit + offset. Which will soon change:

https://app.graphite.com/github/pr/TangleML/tangle/131?ref=gt-pasteable-stack

I guess, let's adjust this solution accordingly

).all()
)

total_count = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks problematic as this makes queries about 4000+x slower. Basically, we:

  1. Run same query twice
  2. The 2nd time we run it without any limits.
    This is really bad for query performance.
    As the DB grows, these queries become slower and slower. For some heavy queries like substring match (search by name), the slowdown will be immense.
    Normal queries search for "First 10 matches", scanning the DB until 10 matches are found. But the new "total_count" query always scans the whiole DB (40000+ runs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants