[Refactor][Bitbucket_Server] Speed up PR collector/extractor #7457

Description

@sstojak1

What and why to refactor

What are you trying to refactor? Why should it be refactored now?
The pr_collector for Bitbucket Server re-inserts the same data into RAW_PULL_REQUEST_TABLE on every run.
As a result, extractApiPullRequests slows down, because it has to process every record in the raw table, duplicates included.
For instance, if a repository has 1,000 pull requests, then after 10 job runs the raw table contains 10,000 rows, and extractApiPullRequests has to process each of them.

Is there a reason to keep the full history of raw API imports, which would make deleting them infeasible?

Describe the solution you'd like

How to refactor?

  1. Perhaps delete all records from the raw table and re-import the PRs on each job run. This would prevent duplicates and keep the extractApiPullRequests task from slowing down.
  2. Check how the Bitbucket plugin handles this, and reuse its logic if it is better.
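
To make option 1 concrete, here is a minimal sketch of a delete-then-reimport collector, written in Python against an in-memory SQLite database rather than the project's actual code. The `params` and `data` columns, the `collect` and `row_count` helpers, and the `full_refresh` flag are assumptions made for illustration; only the table name comes from this issue.

```python
import sqlite3

# Table name taken from this issue; the column layout is an assumption.
RAW_TABLE = "_raw_bitbucket_server_api_pull_requests"


def make_db():
    """Create an in-memory database with a simplified raw table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {RAW_TABLE} ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "params TEXT NOT NULL, "  # hypothetical: identifies the repository scope
        "data TEXT NOT NULL)"     # hypothetical: raw JSON payload of one PR
    )
    return conn


def collect(conn, params, pull_requests, full_refresh):
    """Store raw PR payloads; optionally clear the scope's old rows first."""
    with conn:  # single transaction: commit on success, roll back on error
        if full_refresh:
            # Remove only this scope's rows so other repositories are untouched.
            conn.execute(f"DELETE FROM {RAW_TABLE} WHERE params = ?", (params,))
        conn.executemany(
            f"INSERT INTO {RAW_TABLE} (params, data) VALUES (?, ?)",
            [(params, pr) for pr in pull_requests],
        )


def row_count(conn):
    """Return the total number of rows in the raw table."""
    return conn.execute(f"SELECT COUNT(*) FROM {RAW_TABLE}").fetchone()[0]


if __name__ == "__main__":
    prs = ['{"id": %d}' % i for i in range(1000)]
    for refresh in (False, True):
        conn = make_db()
        for _ in range(10):
            collect(conn, "repo-a", prs, full_refresh=refresh)
        print(f"full_refresh={refresh}: {row_count(conn)} rows")
```

Running the delete and the inserts in one transaction means a failed collection leaves the previous rows in place instead of an empty table. Whether losing older raw imports is acceptable is still the open question raised above.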

Additional context

How to recreate:
Run Collect Data for Bitbucket Server more than once and observe the size of the _raw_bitbucket_server_api_pull_requests table.
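
Besides the raw row count, the duplication can be quantified by comparing total rows with distinct pull request ids. The sketch below is illustrative only: `duplicate_stats` is a hypothetical helper, and it assumes each raw row stores the pull request JSON with a top-level `id` field.

```python
import json
from collections import Counter


def duplicate_stats(raw_payloads):
    """Return (total rows, distinct PR ids, max copies of a single PR)."""
    # Assumption: every payload is a JSON object with a top-level "id".
    counts = Counter(json.loads(payload)["id"] for payload in raw_payloads)
    total = sum(counts.values())
    return total, len(counts), max(counts.values(), default=0)
```

If the third value equals the number of job runs, every run has re-inserted the full set of pull requests.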

Metadata

Labels

type/refactor: This issue is to refactor existing code
