Add index on id columns by MuhammadTahaNaveed · Pull Request #2117 · apache/age

MuhammadTahaNaveed · 2024-09-23T09:21:58Z

Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created.
This change is expected to improve the performance of queries that involve joins. From some tests, it was observed that the performance of queries improved alot.
Loader was updated to insert tuples in indices as well. This has caused the loader to slow down a bit, but it was necessary.
A bug related to command ids in cypher_delete executor was also fixed.

jrgemignani · 2025-12-02T02:45:15Z

@MuhammadTahaNaveed Will you be able to get a chance to rebase this to the latest master branch? Or, would it be okay for someone else to take this on?

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed.

jrgemignani

Looks good to me

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed. Resolved conflicts: regress/expected/cypher_match.out regress/expected/expr.out

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed.

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed. (cherry picked from commit 5aed9ec) Conflicts: regress/expected/cypher_subquery.out src/backend/utils/load/ag_load_labels.c Resolved by keeping new changes.

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed. (cherry picked from commit 5aed9ec) Conflicts: regress/expected/cypher_subquery.out src/backend/utils/load/ag_load_labels.c Accepted incoming changes.

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed. (cherry picked from commit 5aed9ec)

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed. (cherry picked from commit 5aed9ec) Conflicts: regress/expected/cypher_subquery.out src/backend/utils/load/ag_load_labels.c

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed. (cherry picked from commit 5aed9ec)

- Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed.

* Add index on id columns (#2117) - Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created. - This change is expected to improve the performance of queries that involve joins. From some performance tests, it was observed that the performance of queries improved alot. - Loader was updated to insert tuples in indices as well. This has caused to slow the loader down a bit, but it was necessary. - A bug related to command ids in cypher_delete executor was also fixed. * Fix possible memory and file descriptors leaks (#2258) - Used postgres memory allocation functions instead of standard ones. - Wrapped main loop of csv loader in PG_TRY block for better error handling. * Restrict age_load commands (#2274) This PR applies restrictions to the following age_load commands - load_labels_from_file() load_edges_from_file() They are now tied to a specific root directory and are required to have a specific file extension to eliminate any attempts to force them to access any other files. Nothing else has changed with the actual command formats or parameters, only that they work out of the /tmp/age directory and only access files with an extension of .csv. Added regression tests and updated the location of the csv files for those regression tests. modified: regress/expected/age_load.out modified: regress/sql/age_load.sql modified: src/backend/utils/load/age_load.c * Fix and improve index.sql regression test coverage (#2300) NOTE: This PR was created with AI tools and a human. - Remove unused copy command (leftover from deleted agload_test_graph test) - Replace broken Section 4 that referenced non-existent graph with comprehensive WHERE clause tests covering string, int, bool, and float properties with AND/OR/NOT operators - Add EXPLAIN tests to verify index usage: - Section 3: Validate GIN indices (load_city_gin_idx, load_country_gin_idx) show Bitmap Index Scan for property matching - Section 4: Validate all expression indices (city_country_code_idx, city_id_idx, city_west_coast_idx, country_life_exp_idx) show Index Scan for WHERE clause filtering All indices now have EXPLAIN verification confirming they are used as expected. modified: regress/expected/index.out modified: regress/sql/index.sql * Fix and improve index.sql addendum (#2301) NOTE: This PR was created with the help of AI tools and a human. Added additional requested regression tests - *EXPLAIN for pattern with WHERE clause *EXPLAIN for pattern with filters on both country and city modified: regress/expected/index.out modified: regress/sql/index.sql * Replace libcsv with pg COPY for csv loading (#2310) - Commit also adds permission checks - Resolves a critical memory spike issue on loading large file - Use pg's COPY infrastructure (BeginCopyFrom, NextCopyFromRawFields) for 64KB buffered CSV parsing instead of libcsv - Add byte based flush threshold (64KB) matching COPY behavior for memory safety - Use heap_multi_insert with BulkInsertState for optimized batch inserts - Add per batch memory context to prevent memory growth during large loads - Remove libcsv dependency (libcsv.c, csv.h) - Improves loading performance by 15-20% - No previous regression tests were impacted - Added regression tests for permissions/rls Assisted-by AI --------- Co-authored-by: Aleksey Konovkin <alkon2000@mail.ru> Co-authored-by: John Gemignani <jrgemignani@gmail.com>

MuhammadTahaNaveed requested review from jrgemignani and rafsun42 September 23, 2024 09:21

github-actions bot added master override-stale To keep issues/PRs untouched from stale action labels Sep 23, 2024

uhayat mentioned this pull request Nov 14, 2024

Run MATCH with multiple edges query faster #2129

Closed

This was referenced Dec 2, 2024

How to update index after importing from files #2136

Closed

MATCH Query Always Using Sequential Scan, Ignoring Index Scan #2137

Closed

MuhammadTahaNaveed force-pushed the perf branch from 62c25fa to 53a8140 Compare December 2, 2025 18:48

jrgemignani approved these changes Dec 3, 2025

View reviewed changes

jrgemignani merged commit 5aed9ec into apache:master Dec 3, 2025
7 checks passed

jrgemignani mentioned this pull request Jan 30, 2026

Update PG17 to master prior to 1.7.0 #2325

Merged

HarshDaryani896 mentioned this pull request Apr 3, 2026

Cherry-pick of upstream PR #2117: Add index on id columns (#2117) yugabyte/age#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add index on id columns#2117

Add index on id columns#2117
jrgemignani merged 1 commit intoapache:masterfrom
MuhammadTahaNaveed:perf

MuhammadTahaNaveed commented Sep 23, 2024 •

edited

Loading

Uh oh!

jrgemignani commented Dec 2, 2025 •

edited

Loading

Uh oh!

jrgemignani left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MuhammadTahaNaveed commented Sep 23, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrgemignani commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrgemignani left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MuhammadTahaNaveed commented Sep 23, 2024 •

edited

Loading

jrgemignani commented Dec 2, 2025 •

edited

Loading