Update PG16 prior to 1.7.0 part 2 #2375
Merged
jrgemignani merged 6 commits into apache:PG16 on Apr 8, 2026
- Whenever a label is created, indices on its id columns are now created by default. For a vertex label, a unique index is created on the id column, which also serves as a unique constraint. For an edge label, a non-unique index is created on the start_id and end_id columns.
- This change is expected to improve the performance of queries that involve joins. In performance tests, such queries improved substantially.
- The loader was updated to insert tuples into the indices as well. This slows the loader down somewhat, but it was necessary.
- A bug related to command ids in the cypher_delete executor was also fixed.
- Used PostgreSQL memory allocation functions instead of the standard ones.
- Wrapped the main loop of the csv loader in a PG_TRY block for better error handling.
This PR applies restrictions to the following age_load commands -
load_labels_from_file()
load_edges_from_file()
They are now tied to a specific root directory and are required to have a
specific file extension to eliminate any attempts to force them to access
any other files.
Nothing else has changed with the actual command formats or parameters,
only that they work out of the /tmp/age directory and only access files
with an extension of .csv.
Added regression tests and updated the location of the csv files for
those regression tests.
modified: regress/expected/age_load.out
modified: regress/sql/age_load.sql
modified: src/backend/utils/load/age_load.c
NOTE: This PR was created with AI tools and a human.
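The restriction described above for load_labels_from_file() and load_edges_from_file() can be illustrated with a small standalone check. This is only a sketch under stated assumptions, not the code in age_load.c: the helper name `build_load_path`, the rejection of absolute paths and `..` components, and the case-sensitive extension match are all hypothetical choices made for illustration. Only the /tmp/age root and the .csv extension come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Root directory and extension named in the commit message. */
#define LOAD_ROOT "/tmp/age"
#define LOAD_EXT  ".csv"

/*
 * Hypothetical helper (not the function used in age_load.c).
 * Writes LOAD_ROOT "/" name into out and returns true only if name is a
 * relative path with no ".." component that ends in ".csv".
 */
bool build_load_path(const char *name, char *out, size_t outlen)
{
    size_t      len;
    size_t      extlen = strlen(LOAD_EXT);
    const char *p;
    int         n;

    assert(name != NULL && out != NULL);
    len = strlen(name);

    /* The name must be longer than the bare extension and end with it. */
    if (len <= extlen || strcmp(name + len - extlen, LOAD_EXT) != 0)
        return false;

    /* An absolute path would bypass the root directory. */
    if (name[0] == '/')
        return false;

    /* Reject any ".." path component, which could climb out of the root. */
    for (p = name; *p != '\0';)
    {
        size_t seg = strcspn(p, "/");

        if (seg == 2 && p[0] == '.' && p[1] == '.')
            return false;
        p += seg;
        if (*p == '/')
            p++;
    }

    /* snprintf reports the length it needed; treat truncation as failure. */
    n = snprintf(out, outlen, "%s/%s", LOAD_ROOT, name);
    return n > 0 && (size_t) n < outlen;
}
```

With this sketch, `build_load_path("cities.csv", buf, sizeof buf)` succeeds and fills `buf` with `/tmp/age/cities.csv`, while names such as `../etc/passwd.csv` or `data.txt` are refused. The sketch does not resolve symbolic links, so a production check would likely need additional canonicalization.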
- Remove unused copy command (leftover from deleted agload_test_graph test)
- Replace broken Section 4 that referenced non-existent graph with
comprehensive WHERE clause tests covering string, int, bool, and float
properties with AND/OR/NOT operators
- Add EXPLAIN tests to verify index usage:
- Section 3: Validate GIN indices (load_city_gin_idx, load_country_gin_idx)
show Bitmap Index Scan for property matching
- Section 4: Validate all expression indices (city_country_code_idx,
city_id_idx, city_west_coast_idx, country_life_exp_idx) show Index Scan
for WHERE clause filtering
All indices now have EXPLAIN verification confirming they are used as expected.
modified: regress/expected/index.out
modified: regress/sql/index.sql
NOTE: This PR was created with the help of AI tools and a human.
Added additional requested regression tests -
*EXPLAIN for pattern with WHERE clause
*EXPLAIN for pattern with filters on both country and city
modified: regress/expected/index.out
modified: regress/sql/index.sql
- Commit also adds permission checks
- Resolves a critical memory spike issue when loading large files
- Use PostgreSQL's COPY infrastructure (BeginCopyFrom, NextCopyFromRawFields) for 64KB buffered CSV parsing instead of libcsv
- Add byte-based flush threshold (64KB) matching COPY behavior for memory safety
- Use heap_multi_insert with BulkInsertState for optimized batch inserts
- Add per-batch memory context to prevent memory growth during large loads
- Remove libcsv dependency (libcsv.c, csv.h)
- Improves loading performance by 15-20%
- No previous regression tests were impacted
- Added regression tests for permissions/RLS
Assisted-by AI
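The byte-based flush threshold mentioned above can be sketched without any PostgreSQL headers. The following is an illustrative model only: the `Batch` struct and the `batch_add` and `batch_flush` functions are hypothetical names, and the flush body merely counts rows where the real loader would write the buffered tuples (the commit names heap_multi_insert) and, presumably, release per-batch memory.

```c
#include <assert.h>
#include <stddef.h>

/* 64KB threshold quoted in the commit message. */
#define FLUSH_BYTES ((size_t) 64 * 1024)

/* Hypothetical batch state; the loader's actual struct is not shown in the PR text. */
typedef struct Batch
{
    size_t bytes;         /* bytes buffered since the last flush */
    size_t rows;          /* rows buffered since the last flush */
    size_t flushes;       /* number of flushes performed so far */
    size_t rows_written;  /* total rows handed to the (simulated) insert */
} Batch;

/* Write out whatever is buffered and start a fresh batch. */
void batch_flush(Batch *b)
{
    assert(b != NULL);
    if (b->rows == 0)
        return;               /* nothing buffered, nothing to do */

    /* Stand-in for the real batch insert and per-batch cleanup. */
    b->rows_written += b->rows;
    b->flushes++;
    b->bytes = 0;
    b->rows = 0;
}

/* Buffer one row and flush once the buffered size reaches the threshold. */
void batch_add(Batch *b, size_t row_bytes)
{
    assert(b != NULL);
    b->bytes += row_bytes;
    b->rows++;
    if (b->bytes >= FLUSH_BYTES)
        batch_flush(b);
}
```

Bounding the batch by bytes rather than by row count keeps memory use roughly constant even when individual rows are large, which is the memory-safety property the commit message describes. A caller would invoke `batch_flush` once more after the last row to write any remainder.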
jrgemignani approved these changes on Apr 8, 2026
Pull request overview
This PR cherry-picks a set of loader/indexing/security changes onto the PG16 branch ahead of the 1.7.0 release, including migrating CSV loading to PostgreSQL’s COPY infrastructure, tightening file access, and improving index-related behavior and regression coverage.
Changes:
- Replace the libcsv-based loader with COPY-based CSV parsing and batch insertion, plus sandboxing (/tmp/age) and privilege/RLS checks.
- Create default indexes on label id columns (and edge start/end ids) to improve join performance, and expand index regression assertions.
- Misc. memory/FD safety improvements (e.g., replacing strdup/free patterns with pstrdup/pnstrdup, adding repalloc helper), plus regression output updates reflecting new plan/order behavior.
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/include/utils/load/csv.h | Removes libcsv header (dependency cleanup). |
| src/backend/utils/load/libcsv.c | Removes libcsv implementation. |
| Makefile | Drops libcsv object from build. |
| src/include/utils/load/age_load.h | Updates loader API/types for COPY-based batching and buffering. |
| src/include/utils/load/ag_load_labels.h | Updates vertex CSV loader interface/docs for COPY-based load. |
| src/include/utils/load/ag_load_edges.h | Updates edge CSV loader interface/docs for COPY-based load. |
| src/backend/utils/load/age_load.c | Adds sandboxing/permission/RLS checks; adds batch insert helpers; updates insert paths for indexes. |
| src/backend/utils/load/ag_load_labels.c | Reimplements vertex CSV loading via COPY raw-field parsing + batch inserts. |
| src/backend/utils/load/ag_load_edges.c | Reimplements edge CSV loading via COPY raw-field parsing + batch inserts. |
| src/include/utils/agtype.h | Adds repalloc_check declaration. |
| src/backend/utils/adt/agtype.c | Implements repalloc_check; replaces strdup/strndup with pstrdup/pnstrdup and removes corresponding frees. |
| src/backend/utils/adt/age_global_graph.c | Switches to pnstrdup and removes manual free. |
| src/backend/executor/cypher_delete.c | Updates command id fields after delete to keep executor state consistent. |
| src/backend/commands/label_commands.c | Adds automatic indexes on id/start_id/end_id at label creation; adjusts vertex id constraints. |
| regress/sql/index.sql | Updates index tests and adds EXPLAIN assertions for index usage. |
| regress/expected/index.out | Expected output updates for index tests. |
| regress/sql/age_load.sql | Updates load tests for /tmp/age sandbox and adds permission/RLS/constraint scenarios. |
| regress/expected/age_load.out | Expected output updates for new sandbox/security behavior. |
| regress/expected/map_projection.out | Expected output changes reflecting different row order. |
| regress/expected/graph_generation.out | Expected output ordering updates. |
| regress/expected/expr.out | Expected output ordering updates. |
| regress/expected/cypher_vle.out | Expected output ordering updates. |
| regress/expected/cypher_merge.out | Expected output ordering updates. |
| regress/expected/cypher_match.out | Expected output ordering updates. |
The following commits are cherry-picked from master. This PR is an extension of PR #2358
Please use Rebase and Merge for this PR to maintain commit history.