Skip to content

implement timestamp correction utilities and tests for git commit data#346

Merged
MoralCode merged 5 commits into
mainfrom
shlokgilda/fix/issue-3472-validate-committer-timestamps
May 27, 2026
Merged

implement timestamp correction utilities and tests for git commit data#346
MoralCode merged 5 commits into
mainfrom
shlokgilda/fix/issue-3472-validate-committer-timestamps

Conversation

@MoralCode

Copy link
Copy Markdown
Contributor

Description

Fixes #157. originally filed by @shlokgilda in augurlabs/augur#3518

The Bug:
facade_bulk_insert_commits() only validated cmt_author_timestamp when handling DB insertion failures. If the author timestamp was valid but the committer timestamp had an invalid timezone (e.g., -13068837 from git corruption), the insert would fail permanently.

The Fix:

  • Created augur/tasks/git/correction.py with timezone validation functions (simple string operations)
  • Modified lib.py to validate both author and committer timestamps when the binary search isolates a problematic commit
  • Uses author timestamp as fallback for invalid committer timestamp to minimize data loss
  • Only validates on insertion failure (keeps the existing performance optimization intact)

Changes:

  1. NEW: augur/tasks/git/correction.py - Timestamp validation utilities
  2. MODIFIED: augur/application/db/lib.py - Now validates both timestamps on failure
  3. NEW: tests/test_tasks/test_git/test_correction.py - 12 unit tests (all passing)

Notes for Reviewers

  • The fix maintains the existing binary search optimization - we only validate timestamps when PostgreSQL rejects an insert, not proactively
  • POSTGRES_VALID_TIMEZONES includes all real-world timezone offsets (-12:00 to +14:00). Some of the timezone offsets were missing. Reference: https://docs.oracle.com/cd/E19563-01/819-4437/anovd/index.html
  • Fallback chain: invalid committer → use author timestamp → UTC (minimizes data loss per issue discussion)

Signed commits

  • Yes, I signed my commits.

AI Disclosure: I used Claude Code to assist with writing docstrings, code review, writing test cases, and drafting this PR description. All test cases were manually verified and are passing locally.

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bandit found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
Comment thread tests/test_application/test_db/test_timestamp_utils.py
@MoralCode

Copy link
Copy Markdown
Contributor Author

Thorough testing is needed given that this has been rebased since it was initially written/tested

@MoralCode MoralCode added the waiting This change is waiting for some other changes to land first label May 26, 2026
shlokgilda and others added 4 commits May 27, 2026 09:07
Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
…ction tests

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Addresses PR #3518 review feedback from Ulincsys: the db module must not
import from the tasks layer, as it's intended to be pip-installable as a
standalone package. Moved correction.py to augur/application/db/timestamp_utils.py
and relocated its tests accordingly.

Signed-off-by: Shlok Gilda <gildashlok@hotmail.com>
@MoralCode MoralCode force-pushed the shlokgilda/fix/issue-3472-validate-committer-timestamps branch from 0537c1b to fd57eb8 Compare May 27, 2026 13:07
@MoralCode MoralCode self-assigned this May 27, 2026
@MoralCode

MoralCode commented May 27, 2026

Copy link
Copy Markdown
Contributor Author

replicated the issue with the frr repo during analyze commits in parallel.

now running with ~this branch (i introduced a bug). will consider it tested when i see the FRR repo hit full collection (or at least have analyze_commits_in_parallel not failing for it in flower AKA facade task success)

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
@MoralCode MoralCode force-pushed the shlokgilda/fix/issue-3472-validate-committer-timestamps branch from fd57eb8 to 05f5eb6 Compare May 27, 2026 14:26
@MoralCode MoralCode moved this from Todo to In Progress in CollectOSS Feature Roadmap May 27, 2026
@MoralCode MoralCode removed the waiting This change is waiting for some other changes to land first label May 27, 2026
@MoralCode

Copy link
Copy Markdown
Contributor Author

I observed analyze_commits_in_parallel task complete for this repo in flower. Test passed.

@MoralCode MoralCode merged commit 24805b5 into main May 27, 2026
16 checks passed
@MoralCode MoralCode deleted the shlokgilda/fix/issue-3472-validate-committer-timestamps branch May 27, 2026 18:15
@github-project-automation github-project-automation Bot moved this from In Progress to Done in CollectOSS Feature Roadmap May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

facade_bulk_insert_commits should also correct cmt_committer_timestamp

3 participants