Skip to content

Make RowSelection's from_consecutive_ranges public#5848

Merged
alamb merged 3 commits intoapache:masterfrom
advancedxy:issue-5846
Jun 11, 2024
Merged

Make RowSelection's from_consecutive_ranges public#5848
alamb merged 3 commits intoapache:masterfrom
advancedxy:issue-5846

Conversation

@advancedxy
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #5846

Rationale for this change

This constructor method should be easier to use.

What changes are included in this PR?

  1. make the from_consecutive_ranges function public
  2. tweak a bit and makes the total rows optional
  3. update the tests.

Are there any user-facing changes?

NO.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 6, 2024
@advancedxy
Copy link
Copy Markdown
Contributor Author

@alamb @tustvold it would be great if you can take a look at this when you are free.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use-case for not knowing the total number of rows?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that making total_rows optional will make it very hard to use this API correctly. Specifically, the `total_rows1 almost always needs to be necessary, otherwise it will not be possible to represent skipping the last rows in the selection

For example, a range of 100..200 for 400 total rows needs to look ike

Skip(100)
Scan(100)
Skip(100). <--- you can't figure this out from the ranges, you need the total row count

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I thought the final skip is optional based on the trim function. I will revert the API change later. I am traveling now, so it might take some days for me to get back with access to a computer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I thought the final skip is optional based on the trim function.

It probably didn't help that several test cases in apache/datafusion#10738 didn't do this correctly either. I have fixed them in apache/datafusion#10813

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank all for your suggestions. Addressed in the latest commit. Please let me know what you think.

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @advancedxy

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that making total_rows optional will make it very hard to use this API correctly. Specifically, the `total_rows1 almost always needs to be necessary, otherwise it will not be possible to represent skipping the last rows in the selection

For example, a range of 100..200 for 400 total rows needs to look ike

Skip(100)
Scan(100)
Skip(100). <--- you can't figure this out from the ranges, you need the total row count

@advancedxy
Copy link
Copy Markdown
Contributor Author

The CI failure seems unrelated to this PR?

Run cargo msrv verify
  cargo msrv verify
  shell: sh -e {0}
  env:
    RUSTFLAGS: -C debuginfo=1
    RUST_BACKTRACE: 1
Fetching index
Verifying the Minimum Supported Rust Version (MSRV) for toolchain x86_64-unknown-linux-gnu
Using check command cargo check
   Failed check command cargo check didn't succeed
Crate source was found to be incompatible with its MSRV '1.6[2](https://github.com/apache/arrow-rs/actions/runs/9462178785/job/26064555740?pr=5848#step:11:2).1', as defined in '/__w/arrow-rs/arrow-rs/object_store/Cargo.toml'
Error: Process completed with exit code 1.

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Jun 11, 2024

The CI failure seems unrelated to this PR?

@korowa fixed this in #5866 earlier today

I merged up from main to get a clean CI run

Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @advancedxy

/// let actual: Vec<RowSelector> = selection.into();
/// assert_eq!(actual, expected);
///
/// // you can also create a selection from consecutive ranges
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit a20116e into apache:master Jun 11, 2024
@alamb
Copy link
Copy Markdown
Contributor

alamb commented Jun 11, 2024

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make RowSelection::from_consecutive_ranges public

3 participants