Skip to content
This repository was archived by the owner on May 24, 2022. It is now read-only.

Add option to interpret null byte as an empty cell#24

Open
tomplex wants to merge 2 commits intomasterfrom
tc/null-byte-as-empty
Open

Add option to interpret null byte as an empty cell#24
tomplex wants to merge 2 commits intomasterfrom
tc/null-byte-as-empty

Conversation

@tomplex
Copy link
Copy Markdown

@tomplex tomplex commented Apr 12, 2022

Addresses #17

That said, I can't quite figure out how to make the test "test_null_byte_as_empty_cell" not escape the \x00.

@tomplex tomplex requested a review from emk April 12, 2022 20:12
Comment thread tests/tests.rs
.output_with_stdin(
br#"c1,c2,c3
1,2,3
\x00,\x00,1
Copy link
Copy Markdown
Author

@tomplex tomplex Apr 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is failing because the \x00 is being escaped to \\x00, and I'm not quite sure how to fix it. the docs suggest that using br# should work.

@tomplex tomplex changed the title initial commit: option to interpret null byte as an empty cell Add option to interpret null byte as an empty cell Apr 12, 2022
Comment thread src/main.rs
let s = format!("^{}$", null_re_str);
let re = Regex::new(&s).context("can't compile regular expression")?;
Some(re)
let mut pattern = if opt.null_byte_as_empty == true {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's a better way to deal with this, but I'm not familiar enough to tell for sure.

Copy link
Copy Markdown
Contributor

@emk emk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to double-check whether this is actually a good use case for scrubcsv and not for a separate Unix tool that just deletes NULL bytes. We've encountered similar issues before and the best solution has often been to strip NULL bytes before invoking scrubcsv.

If that doesn't work in this case, I would like to quickly double-check our options.

Comment thread src/main.rs
Comment on lines +107 to +108
#[structopt(long = "null-byte-as-empty")]
null_byte_as_empty: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before adding an option, did we try --null '^\x00$'? Rust's regex` library allows bytes to be escaped.

If we need to add another null option besides --null, we should aim for a consistent naming convention, both with -null and the other existing options. Most of our existing options are either "--verb" or "--noun", and --null-byte-as-empty . It's definitely good to start with --null- like you do here, because that will ensure the right sort order on the display.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants