What and why to refactor
What are you trying to refactor? Why should it be refactored now?
The pr_collector for Bitbucket Server appends the same data to the RAW_PULL_REQUEST_TABLE on every run instead of replacing what is already there.
Consequently, the extractApiPullRequests task slows down because it has to process every record in the raw table, including the duplicates.
For instance, if a repository has 1,000 pull requests, after 10 job runs the raw table will contain 10,000 rows, and extractApiPullRequests will have to process each of them.
Is there a reason to keep the full history of raw API imports, such that deleting old rows is not a feasible option?
Describe the solution you'd like
How to refactor?
- Perhaps we could delete all records from the raw table and re-import the pull requests on each job run. This would prevent duplicates and keep the extractApiPullRequests task from slowing down.
- Check how the Bitbucket plugin handles this, and reuse its logic if it works better?
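
To make the first option concrete, here is a minimal sketch of the two behaviours side by side, using Python and an in-memory SQLite database. It is not the plugin's real collector code: the simplified schema (`id`, `params`, `data`), the `params` string used to scope rows to one repository, and the function names `collect_append`, `collect_replace` and `row_count` are all assumptions made for illustration.

```python
import json
import sqlite3

# Table name taken from this issue; the columns below are a simplified,
# assumed schema, not the real raw table layout.
RAW_TABLE = "_raw_bitbucket_server_api_pull_requests"


def create_raw_table(conn):
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {RAW_TABLE} ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "params TEXT NOT NULL, "
        "data TEXT NOT NULL)"
    )


def _rows(params, pull_requests):
    # One raw row per pull request, stored as a JSON string.
    return [(params, json.dumps(pr)) for pr in pull_requests]


def collect_append(conn, params, pull_requests):
    """Behaviour described in this issue: every run appends all rows again."""
    with conn:
        conn.executemany(
            f"INSERT INTO {RAW_TABLE} (params, data) VALUES (?, ?)",
            _rows(params, pull_requests),
        )


def collect_replace(conn, params, pull_requests):
    """Proposed behaviour: clear this repository's raw rows, then re-import."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute(f"DELETE FROM {RAW_TABLE} WHERE params = ?", (params,))
        conn.executemany(
            f"INSERT INTO {RAW_TABLE} (params, data) VALUES (?, ?)",
            _rows(params, pull_requests),
        )


def row_count(conn):
    return conn.execute(f"SELECT COUNT(*) FROM {RAW_TABLE}").fetchone()[0]
```

Running the delete and the insert in one transaction means a failed import would not leave the raw table empty, and scoping the delete by `params` leaves rows collected for other repositories untouched. Whether the real raw table can be scoped this way is something to confirm against the plugin.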
Related issues
Please link any other
Additional context
How to recreate:
Run Collect Data for Bitbucket Server more than once and observe how the size of the _raw_bitbucket_server_api_pull_requests table grows after each run.