Skip to content

Add index on id columns#2117

Merged
jrgemignani merged 1 commit intoapache:masterfrom
MuhammadTahaNaveed:perf
Dec 3, 2025
Merged

Add index on id columns#2117
jrgemignani merged 1 commit intoapache:masterfrom
MuhammadTahaNaveed:perf

Conversation

@MuhammadTahaNaveed
Copy link
Copy Markdown
Member

@MuhammadTahaNaveed MuhammadTahaNaveed commented Sep 23, 2024

  • Whenever a label will be created, indices on id columns will be created by default. In case of vertex, a unique index on id column will be created, which will also serve as a unique constraint. In case of edge, a non-unique index on start_id and end_id columns will be created.

  • This change is expected to improve the performance of queries that involve joins. From some tests, it was observed that the performance of queries improved alot.

  • Loader was updated to insert tuples in indices as well. This has caused the loader to slow down a bit, but it was necessary.

  • A bug related to command ids in cypher_delete executor was also fixed.

@jrgemignani
Copy link
Copy Markdown
Contributor

jrgemignani commented Dec 2, 2025

@MuhammadTahaNaveed Will you be able to get a chance to rebase this to the latest master branch? Or, would it be okay for someone else to take this on?

- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.
Copy link
Copy Markdown
Contributor

@jrgemignani jrgemignani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@jrgemignani jrgemignani merged commit 5aed9ec into apache:master Dec 3, 2025
7 checks passed
jrgemignani pushed a commit to jrgemignani/age that referenced this pull request Dec 16, 2025
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

Resolved conflicts:
	regress/expected/cypher_match.out
	regress/expected/expr.out
MuhammadTahaNaveed added a commit that referenced this pull request Dec 16, 2025
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

Resolved conflicts:
	regress/expected/cypher_match.out
	regress/expected/expr.out
jrgemignani pushed a commit to jrgemignani/age that referenced this pull request Jan 30, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.
MuhammadTahaNaveed added a commit that referenced this pull request Feb 3, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.
HarshDaryani896 pushed a commit to yugabyte/age that referenced this pull request Mar 17, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

(cherry picked from commit 5aed9ec)

 Conflicts:
	regress/expected/cypher_subquery.out
	src/backend/utils/load/ag_load_labels.c

 Resolved by keeping new changes.
HarshDaryani896 pushed a commit to yugabyte/age that referenced this pull request Mar 31, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

(cherry picked from commit 5aed9ec)

 Conflicts:
	regress/expected/cypher_subquery.out
	src/backend/utils/load/ag_load_labels.c

Accepted incoming changes.
HarshDaryani896 pushed a commit to yugabyte/age that referenced this pull request Mar 31, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

(cherry picked from commit 5aed9ec)
HarshDaryani896 pushed a commit to yugabyte/age that referenced this pull request Apr 1, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

(cherry picked from commit 5aed9ec)
HarshDaryani896 pushed a commit to yugabyte/age that referenced this pull request Apr 3, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

(cherry picked from commit 5aed9ec)

Conflicts:
	regress/expected/cypher_subquery.out
	src/backend/utils/load/ag_load_labels.c
HarshDaryani896 pushed a commit to yugabyte/age that referenced this pull request Apr 3, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

(cherry picked from commit 5aed9ec)
MuhammadTahaNaveed added a commit to MuhammadTahaNaveed/age that referenced this pull request Apr 8, 2026
- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.
jrgemignani added a commit that referenced this pull request Apr 8, 2026
* Add index on id columns (#2117)

- Whenever a label will be created, indices on id columns will be
  created by default. In case of vertex, a unique index on id column
  will be created, which will also serve as a unique constraint.
  In case of edge, a non-unique index on start_id and end_id columns
  will be created.

- This change is expected to improve the performance of queries that
  involve joins. From some performance tests, it was observed that
  the performance of queries improved alot.

- Loader was updated to insert tuples in indices as well. This has
  caused to slow the loader down a bit, but it was necessary.

- A bug related to command ids in cypher_delete executor was also fixed.

* Fix possible memory and file descriptors leaks (#2258)

- Used postgres memory allocation functions instead of standard ones.
- Wrapped main loop of csv loader in PG_TRY block for better error handling.

* Restrict age_load commands (#2274)

This PR applies restrictions to the following age_load commands -

    load_labels_from_file()
    load_edges_from_file()

They are now tied to a specific root directory and are required to have a
specific file extension to eliminate any attempts to force them to access
any other files.

Nothing else has changed with the actual command formats or parameters,
only that they work out of the /tmp/age directory and only access files
with an extension of .csv.

Added regression tests and updated the location of the csv files for
those regression tests.

modified:   regress/expected/age_load.out
modified:   regress/sql/age_load.sql
modified:   src/backend/utils/load/age_load.c

* Fix and improve index.sql regression test coverage (#2300)

NOTE: This PR was created with AI tools and a human.

- Remove unused copy command (leftover from deleted agload_test_graph test)
- Replace broken Section 4 that referenced non-existent graph with
  comprehensive WHERE clause tests covering string, int, bool, and float
  properties with AND/OR/NOT operators
- Add EXPLAIN tests to verify index usage:
  - Section 3: Validate GIN indices (load_city_gin_idx, load_country_gin_idx)
    show Bitmap Index Scan for property matching
  - Section 4: Validate all expression indices (city_country_code_idx,
    city_id_idx, city_west_coast_idx, country_life_exp_idx) show Index Scan
    for WHERE clause filtering

All indices now have EXPLAIN verification confirming they are used as expected.

modified:   regress/expected/index.out
modified:   regress/sql/index.sql

* Fix and improve index.sql addendum (#2301)

NOTE: This PR was created with the help of AI tools and a human.

Added additional requested regression tests -

 *EXPLAIN for pattern with WHERE clause

 *EXPLAIN for pattern with filters on both country and city

modified:   regress/expected/index.out
modified:   regress/sql/index.sql

* Replace libcsv with pg COPY for csv loading (#2310)

- Commit also adds permission checks
- Resolves a critical memory spike issue on loading large file
- Use pg's COPY infrastructure (BeginCopyFrom, NextCopyFromRawFields)
  for 64KB buffered CSV parsing instead of libcsv
- Add byte based flush threshold (64KB) matching COPY behavior for memory safety
- Use heap_multi_insert with BulkInsertState for optimized batch inserts
- Add per batch memory context to prevent memory growth during large loads
- Remove libcsv dependency (libcsv.c, csv.h)
- Improves loading performance by 15-20%
- No previous regression tests were impacted
- Added regression tests for permissions/rls
Assisted-by AI

---------

Co-authored-by: Aleksey Konovkin <alkon2000@mail.ru>
Co-authored-by: John Gemignani <jrgemignani@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

master override-stale To keep issues/PRs untouched from stale action

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants