Add --dry-run to check how much data would be processed, but don't actually run query #168
Hello @hugovk, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces a --dry-run option to the pypinfo CLI tool. This allows users to check the amount of data that would be processed by a query without actually running the query. The changes involve adding the --dry-run option using click, modifying the create_config function to include dry_run functionality, and adjusting the output to display the data processed and estimated cost without showing the actual results when --dry-run is enabled. Additionally, tests have been added to verify the correct behavior of the create_config function with the dry_run option.
Highlights
- New Feature: `--dry-run` option: Adds a `--dry-run` option to the `pypinfo` CLI, enabling users to estimate data processing costs before running a query.
- Configuration Modification: Modifies the `create_config` function to set `dry_run` and `use_query_cache` options based on the `--dry-run` flag.
- Output Adjustment: Adjusts the CLI output to display data processing information without showing query results when `--dry-run` is active.
- Testing: Adds a new test case to verify the correct behavior of the `create_config` function when `dry_run` is enabled.
Changelog
- README.md
  - Added `-n, --dry-run` option to the documentation.
- pypinfo/cli.py
  - Added `--dry-run` option using `click` (line 85).
  - Modified `pypinfo` function to accept `dry_run` parameter (line 108).
  - Modified `pypinfo` function to pass `dry_run` to `create_config` (line 172).
  - Modified output logic to conditionally display results based on `dry_run` (lines 192, 197, 210).
- pypinfo/core.py
  - Modified `create_config` function to accept `dry_run` parameter (line 29).
  - Added logic to set `config.dry_run` and `config.use_query_cache` based on `dry_run` flag (lines 32-34).
- tests/test_core.py
  - Added `test_create_config_dry_run` to verify `create_config` with `dry_run=True` (lines 40-46).
  - Added assertion for `dry_run` in `test_create_config` (line 36).
- tox.ini
  - Updated coverage report commands to include `--show-missing` and `html` (lines 21-22).
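
The `create_config` change listed for pypinfo/core.py can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `FakeQueryJobConfig` is a hypothetical stand-in for `google.cloud.bigquery.QueryJobConfig`, used only so the snippet runs without the BigQuery client installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeQueryJobConfig:
    """Hypothetical stand-in for google.cloud.bigquery.QueryJobConfig."""

    dry_run: Optional[bool] = None
    use_query_cache: Optional[bool] = None


def create_config(dry_run: bool = False) -> FakeQueryJobConfig:
    """Build a job config; on a dry run, also bypass the query cache."""
    config = FakeQueryJobConfig()
    if dry_run:
        # A dry run asks BigQuery to estimate the bytes scanned
        # without executing the query.
        config.dry_run = True
        # Skip cached results so the estimate reflects an uncached run.
        config.use_query_cache = False
    return config
```

With the real library, the same two attributes would be set on a `QueryJobConfig` instance and passed as `job_config` to `client.query(...)`.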
A dry run's gentle grace,
No data spills all over the place,
Just costs we see,
For you and me,
Before the query we embrace.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces a --dry-run option to the pypinfo CLI, allowing users to estimate the data processed and cost of a query without actually running it. The changes are well-structured and include necessary modifications to the CLI, core logic, and tests. The addition of this feature enhances the tool's usability and transparency regarding resource consumption.
Summary of Findings
- Missing Error Handling for Dry Run: The code does not explicitly handle potential errors that might occur during the dry run execution. While the dry run itself doesn't process data, it still interacts with the BigQuery API and could encounter issues like authentication failures or invalid query syntax. Adding error handling would improve the robustness of the feature.
- Inconsistent Output for Dry Run: The output format for the dry run is slightly different from a real run, particularly when using the `--json` flag. While the core information is present, ensuring a consistent output structure would enhance the user experience and simplify parsing of the results.
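
One way to keep the `--json` output structurally identical in both modes is sketched below. The function name `build_json_payload` and every key in the payload are hypothetical, chosen for illustration; they are not taken from pypinfo's actual JSON format.

```python
import json
from typing import Any, Dict, List


def build_json_payload(
    bytes_processed: int,
    bytes_billed: int,
    cached: bool,
    rows: List[List[Any]],
    dry_run: bool,
) -> str:
    """Serialise query metadata and rows with the same keys in both modes."""
    payload: Dict[str, Any] = {
        "query": {
            "bytes_processed": bytes_processed,
            "bytes_billed": bytes_billed,
            "cached": cached,
            "dry_run": dry_run,
        },
        # A dry run returns no data, so emit an empty list rather than
        # dropping the key; consumers can then parse both modes alike.
        "rows": [] if dry_run else rows,
    }
    return json.dumps(payload, sort_keys=True)
```

A consumer could then read `payload["rows"]` unconditionally and check `payload["query"]["dry_run"]` to tell the two modes apart.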
Merge Readiness
The pull request is well-structured and introduces a valuable feature. However, addressing the missing error handling and ensuring consistent output for the dry run would further improve the quality of the code. I recommend addressing these points before merging. I am unable to directly approve this pull request, and other reviewers should review and approve this code before merging.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##            master     #168    +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            6        6
  Lines          547      558    +11
  Branches        74       43    -31
=========================================
+ Hits           547      558    +11
This PR is being reviewed by Cursor Bugbot
  if run:
      with create_client(get_credentials()) as client:
-         query_job = client.query(built_query, job_config=create_config())
+         query_job = client.query(built_query, job_config=create_config(dry_run))
I get no errors running this:
❯ pypinfo --dry-run --all --days 1 --percent pillow pyversion
Served from cache: False
Data processed: 344.62 MiB
Data billed: 0.00 B
Estimated cost: $0.00

Adding some debug print:
query_job = client.query(built_query, job_config=create_config(dry_run))
print(query_job)
query_rows = query_job.result(timeout=timeout // 1000)
print(query_rows)
rows = parse_query_result(query_rows)
print(rows)

Gives:
❯ pypinfo --dry-run --all --days 1 --percent pillow pyversion
QueryJob<project=pypinfo-hugovk, location=US, id=None>
<google.cloud.bigquery.table._EmptyRowIterator object at 0x10b0e2f90>
[['python_version', 'download_count']]
Served from cache: False
Data processed: 344.62 MiB
Data billed: 0.00 B
Estimated cost: $0.00

Indeed, `rows` does not contain download data.
But this PR also adds `not dry_run` guards around subsequent processing, so we don't get errors.
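
The effect of such a guard can be sketched as follows. `render_output` is a hypothetical helper written for illustration, not a function from pypinfo; it assumes `rows` always starts with a header row, as in the debug output above.

```python
from typing import List


def render_output(summary_lines: List[str], rows: List[List[str]], dry_run: bool) -> str:
    """Return the text to print: the summary always, the table only on a real run."""
    parts = list(summary_lines)
    if not dry_run:
        # Only a real run has download rows worth tabulating.
        header, *data = rows
        parts.append(" | ".join(header))
        parts.extend(" | ".join(str(cell) for cell in row) for row in data)
    return "\n".join(parts)
```

On a dry run the header-only `rows` list is never touched, so only the summary lines are printed.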
I'll do a 23.0.0 release.
Here's how to run a dry-run query to see how much data would be processed by a given command's query, by adding `dry_run=True, use_query_cache=False` to the config:
https://cloud.google.com/bigquery/docs/samples/bigquery-query-dry-run#bigquery_query_dry_run-python

Let's expose it with the `--dry-run` option.

[Figures: Dry run; Real run]
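
For reference, the byte count reported by a dry run could be turned into the kind of summary shown in this thread ("Data processed: 344.62 MiB") along these lines. This is a minimal sketch: `format_bytes` and `estimate_cost` are hypothetical helpers rather than pypinfo's own functions, and the price per TiB is deliberately a caller-supplied parameter rather than a real BigQuery rate.

```python
def format_bytes(num_bytes: float) -> str:
    """Format a byte count with binary (1024-based) units, two decimals."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} {units[-1]}"


def estimate_cost(num_bytes: int, price_per_tib: float) -> str:
    """Estimate the cost in dollars; price_per_tib is supplied by the caller."""
    tib = num_bytes / 1024**4
    return f"${tib * price_per_tib:.2f}"
```

With the real client, the input would typically be the `total_bytes_processed` attribute of the job returned by `client.query(...)` when `dry_run=True` is set.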