Add `--dry-run` to check how much data would be processed, but don't actually run query by hugovk · Pull Request #168 · ofek/pypinfo

hugovk · 2025-04-29T12:53:39Z

To see how much data a command's query would process without actually running it, add `dry_run=True, use_query_cache=False` to the query job config:

https://cloud.google.com/bigquery/docs/samples/bigquery-query-dry-run#bigquery_query_dry_run-python

Let's expose it with the --dry-run option.
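A minimal sketch of the configuration change described above, assuming the `google-cloud-bigquery` package is installed. The helper `dry_run_options` and this body of `create_config` are illustrative and are not copied from the pypinfo source. The third-party import is kept inside the function so the helper can be exercised without the dependency.

```python
from typing import Any


def dry_run_options(dry_run: bool) -> dict[str, Any]:
    """Return the job-config settings implied by --dry-run (hypothetical helper)."""
    if not dry_run:
        return {}
    # Disable the query cache so the estimate reflects a full scan
    # rather than a previously cached result.
    return {"dry_run": True, "use_query_cache": False}


def create_config(dry_run: bool = False):
    """Build a BigQuery job config; illustrative body, not pypinfo's actual code."""
    from google.cloud import bigquery  # third-party: google-cloud-bigquery

    return bigquery.QueryJobConfig(**dry_run_options(dry_run))
```

With `dry_run=False` the sketch passes no extra settings, so a normal run keeps the library defaults.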

Dry run

❯ pypinfo --dry-run --all --days 1 --percent pillow pyversion
Served from cache: False
Data processed: 179.27 MiB
Data billed: 0.00 B
Estimated cost: $0.00

❯ pypinfo --dry-run --all --days 1 --percent --json pillow pyversion
{"last_update":"2025-04-29 12:49:20","query":{"bytes_billed":0,"bytes_processed":187974046,"cached":false,"estimated_cost":"0.00"},"rows":[]}
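The JSON form is convenient for scripting. As a rough sketch, the payload above can be parsed with the standard library to recover the same MiB figure that the text output prints; `summarize` is a hypothetical helper name, not part of pypinfo.

```python
import json

# Dry-run output copied from the example above.
DRY_RUN_JSON = (
    '{"last_update":"2025-04-29 12:49:20","query":{"bytes_billed":0,'
    '"bytes_processed":187974046,"cached":false,"estimated_cost":"0.00"},"rows":[]}'
)

MIB = 1024 ** 2


def summarize(raw: str) -> str:
    """Summarize the query metadata of a pypinfo --json payload in one line."""
    payload = json.loads(raw)
    processed_mib = payload["query"]["bytes_processed"] / MIB
    return f"{processed_mib:.2f} MiB processed, {len(payload['rows'])} rows"


print(summarize(DRY_RUN_JSON))  # 179.27 MiB processed, 0 rows
```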

Real run

❯ pypinfo --all --days 1 --percent pillow pyversion
Served from cache: False
Data processed: 179.27 MiB
Data billed: 180.00 MiB
Estimated cost: $0.01

| python_version | percent | download_count |
| -------------- | ------- | -------------- |
| 3.11           |  21.14% |        885,918 |
| 3.12           |  17.61% |        738,188 |
| 3.10           |  17.24% |        722,728 |
| 3.9            |  13.72% |        574,983 |
| 3.7            |  10.09% |        423,114 |
| 3.8            |   6.99% |        292,813 |
| 3.13           |   6.58% |        275,766 |
| None           |   4.99% |        209,134 |
| 3.6            |   1.41% |         58,972 |
| 2.7            |   0.24% |          9,901 |
| Total          |         |      4,191,517 |

❯ pypinfo --all --days 1 --percent --json pillow pyversion
{"last_update":"2025-04-29 12:49:45","query":{"bytes_billed":188743680,"bytes_processed":187974046,"cached":false,"estimated_cost":"0.01"},"rows":[{"download_count":885960,"percent":"0.21","python_version":"3.11"},{"download_count":738283,"percent":"0.18","python_version":"3.12"},{"download_count":722784,"percent":"0.17","python_version":"3.10"},{"download_count":575054,"percent":"0.14","python_version":"3.9"},{"download_count":423088,"percent":"0.1","python_version":"3.7"},{"download_count":292835,"percent":"0.07","python_version":"3.8"},{"download_count":275799,"percent":"0.066","python_version":"3.13"},{"download_count":209172,"percent":"0.05","python_version":"None"},{"download_count":58961,"percent":"0.014","python_version":"3.6"},{"download_count":9899,"percent":"0.0024","python_version":"2.7"}]}
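The figures above are consistent with the billed size being the processed size rounded up to a whole MiB (187,974,046 bytes becomes 188,743,680 bytes, exactly 180 MiB). The sketch below reproduces that arithmetic. The per-TiB price and the round-up-to-the-next-cent rule are assumptions chosen to match the `$0.01` shown here; they are not taken from the pypinfo source or from current BigQuery pricing.

```python
from decimal import ROUND_UP, Decimal

MIB = 1024 ** 2
TIB = 1024 ** 4
PRICE_PER_TIB = Decimal("6.25")  # assumed USD rate; check current BigQuery pricing


def billed_bytes(processed: int) -> int:
    """Round processed bytes up to a whole MiB, as observed in the output above."""
    return (processed + MIB - 1) // MIB * MIB


def estimated_cost(billed: int) -> str:
    """Estimate the cost in USD, rounded up to the next cent (assumed rule)."""
    cost = PRICE_PER_TIB * Decimal(billed) / Decimal(TIB)
    return str(cost.quantize(Decimal("0.01"), rounding=ROUND_UP))


print(billed_bytes(187974046))    # 188743680
print(estimated_cost(188743680))  # 0.01
print(estimated_cost(0))          # 0.00
```

A dry run bills 0 bytes, which is why its estimated cost stays at `0.00` while the processed size still tells you what a real run would scan.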

gemini-code-assist

Hello @hugovk, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request introduces a --dry-run option to the pypinfo CLI tool. This allows users to check the amount of data that would be processed by a query without actually running the query. The changes involve adding the --dry-run option using click, modifying the create_config function to include dry_run functionality, and adjusting the output to display the data processed and estimated cost without showing the actual results when --dry-run is enabled. Additionally, tests have been added to verify the correct behavior of the create_config function with the dry_run option.

Highlights

- New Feature: --dry-run option: Adds a --dry-run option to the pypinfo CLI, enabling users to estimate data processing costs before running a query.
- Configuration Modification: Modifies the create_config function to set dry_run and use_query_cache options based on the --dry-run flag.
- Output Adjustment: Adjusts the CLI output to display data processing information without showing query results when --dry-run is active.
- Testing: Adds a new test case to verify the correct behavior of the create_config function when dry_run is enabled.

Changelog

README.md
- Added -n, --dry-run option to the documentation.
pypinfo/cli.py
- Added --dry-run option using click (line 85).
- Modified pypinfo function to accept dry_run parameter (line 108).
- Modified pypinfo function to pass dry_run to create_config (line 172).
- Modified output logic to conditionally display results based on dry_run (lines 192, 197, 210).
pypinfo/core.py
- Modified create_config function to accept dry_run parameter (line 29).
- Added logic to set config.dry_run and config.use_query_cache based on dry_run flag (lines 32-34).
tests/test_core.py
- Added test_create_config_dry_run to verify create_config with dry_run=True (lines 40-46).
- Added assertion for dry_run in test_create_config (line 36).
tox.ini
- Updated coverage report commands to include --show-missing and html (lines 21-22).
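To illustrate the shape of the new flag, here is a small sketch of a boolean `-n`/`--dry-run` option. pypinfo defines its options with click; this sketch deliberately swaps in the standard library's `argparse` to stay dependency-free, so the parser and program name are stand-ins, not pypinfo's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny stand-in CLI with a boolean dry-run flag."""
    parser = argparse.ArgumentParser(prog="pypinfo-sketch")
    parser.add_argument(
        "-n",
        "--dry-run",
        action="store_true",
        help="report how much data the query would process without running it",
    )
    parser.add_argument("project")
    return parser


# argparse exposes --dry-run as the attribute `dry_run`.
args = build_parser().parse_args(["--dry-run", "pillow"])
print(args.dry_run, args.project)  # True pillow
```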

A dry run's gentle grace,
No data spills all over the place,
Just costs we see,
For you and me,
Before the query we embrace.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a --dry-run option to the pypinfo CLI, allowing users to estimate the data processed and cost of a query without actually running it. The changes are well-structured and include necessary modifications to the CLI, core logic, and tests. The addition of this feature enhances the tool's usability and transparency regarding resource consumption.

Summary of Findings

- Missing Error Handling for Dry Run: The code does not explicitly handle potential errors that might occur during the dry run execution. While the dry run itself doesn't process data, it still interacts with the BigQuery API and could encounter issues like authentication failures or invalid query syntax. Adding error handling would improve the robustness of the feature.
- Inconsistent Output for Dry Run: The output format for the dry run is slightly different from a real run, particularly when using the --json flag. While the core information is present, ensuring a consistent output structure would enhance the user experience and simplify parsing of the results.

Merge Readiness

The pull request is well-structured and introduces a valuable feature. However, addressing the missing error handling and ensuring consistent output for the dry run would further improve the quality of the code. I recommend addressing these points before merging. I am unable to directly approve this pull request, and other reviewers should review and approve this code before merging.

codecov · 2025-10-08T20:28:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (0720138) to head (2eb947e).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #168   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          547       558   +11     
  Branches        74        43   -31     
=========================================
+ Hits           547       558   +11

| Flag           | Coverage Δ              |
| -------------- | ----------------------- |
| macOS-latest   | `100.00% <100.00%> (ø)` |
| ubuntu-latest  | `100.00% <100.00%> (ø)` |
| windows-latest | `100.00% <100.00%> (ø)` |

Flags with carried forward coverage won't be shown.

cursor

This PR is being reviewed by Cursor Bugbot

cursor · 2025-10-08T20:28:21Z

    if run:
        with create_client(get_credentials()) as client:
-            query_job = client.query(built_query, job_config=create_config())
+            query_job = client.query(built_query, job_config=create_config(dry_run))


Bug: Dry Run Metadata Parsing Error

When dry_run=True, BigQuery queries return only metadata, not data rows. The code incorrectly attempts to retrieve and parse query results via query_job.result() and parse_query_result(), which causes errors or invalid data in subsequent processing.

ofek · 2025-10-09

@hugovk Legitimate?

hugovk · 2025-10-10

I get no errors running this:

❯ pypinfo --dry-run --all --days 1 --percent pillow pyversion
Served from cache: False
Data processed: 344.62 MiB
Data billed: 0.00 B
Estimated cost: $0.00

Adding some debug prints:

    query_job = client.query(built_query, job_config=create_config(dry_run))
    print(query_job)
    query_rows = query_job.result(timeout=timeout // 1000)
    print(query_rows)
    rows = parse_query_result(query_rows)
    print(rows)

Gives:

❯ pypinfo --dry-run --all --days 1 --percent pillow pyversion
QueryJob<project=pypinfo-hugovk, location=US, id=None>
<google.cloud.bigquery.table._EmptyRowIterator object at 0x10b0e2f90>
[['python_version', 'download_count']]
Served from cache: False
Data processed: 344.62 MiB
Data billed: 0.00 B
Estimated cost: $0.00

Indeed, `rows` contains only the header row and no download data.

But this PR also adds `not dry_run` guards around the subsequent processing, so we don't get errors.
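The guard pattern can be sketched as follows. This is a simplified illustration with a hypothetical `render` function and a plain-text table, not the code from `pypinfo/cli.py`: the query metadata is always printed, while row processing is skipped when `dry_run` is set.

```python
def render(query_info: dict, rows: list[list[str]], dry_run: bool) -> str:
    """Build CLI text output, skipping the results table for a dry run."""
    lines = [
        f"Served from cache: {query_info['cached']}",
        f"Data processed: {query_info['bytes_processed']} B",
    ]
    if not dry_run:
        # Only a real run has data rows worth tabulating.
        lines.append("")
        lines.extend(" | ".join(row) for row in rows)
    return "\n".join(lines)


# A dry run yields only the header row, as in the debug output above.
header_only = [["python_version", "download_count"]]
print(render({"cached": False, "bytes_processed": 361360589}, header_only, dry_run=True))
```

Because the guard wraps everything that touches `rows`, a header-only result never reaches the tabulation step.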

ofek

Thanks a lot!

hugovk · 2025-10-15T12:24:51Z

I'll do a 23.0.0 release.

hugovk · 2025-10-15T12:59:11Z

Done! https://pypi.org/project/pypinfo/23.0.0/

Add --dry-run to check how many bytes would be billed, but don't actually run query

6c39ade

hugovk added the enhancement label Apr 29, 2025

gemini-code-assist Bot reviewed Apr 29, 2025

hugovk mentioned this pull request Oct 8, 2025

Add support for Python 3.14 #170

Merged

Merge branch 'master' into dry-run

2eb947e

cursor Bot reviewed Oct 8, 2025

ofek approved these changes Oct 10, 2025

hugovk merged commit aa28405 into ofek:master Oct 15, 2025
19 checks passed

hugovk deleted the dry-run branch October 15, 2025 12:24

hugovk merged 2 commits into ofek:master from hugovk:dry-run
