Add option to interpret null byte as an empty cell #24
.output_with_stdin(
br#"c1,c2,c3
1,2,3
\x00,\x00,1
This is failing because the \x00 is being escaped to \\x00, and I'm not quite sure how to fix it. The docs suggest that using br# should work.
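For context on the escaping problem: in Rust, a raw byte string (br#"..."#) does not process escape sequences, so \x00 stays as the four literal bytes `\`, `x`, `0`, `0`, whereas a plain byte string (b"...") turns it into a single NUL byte. A minimal sketch of the difference follows; the helper names are hypothetical and not part of scrubcsv.

```rust
/// Raw byte string: escapes are NOT processed, so each `\x00` is four bytes.
fn raw_row() -> &'static [u8] {
    br#"\x00,\x00,1"#
}

/// Plain byte string: each `\x00` becomes a single NUL byte.
fn escaped_row() -> &'static [u8] {
    b"\x00,\x00,1"
}
```

If the test input needs real NUL bytes, switching the literal from br#"..."# to b"..." is one option. A plain byte string can still span several lines, though any literal double quotes or backslashes in it would then need escaping.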
let s = format!("^{}$", null_re_str);
let re = Regex::new(&s).context("can't compile regular expression")?;
Some(re)
let mut pattern = if opt.null_byte_as_empty == true {
I'm sure there's a better way to deal with this, but I'm not familiar enough to tell for sure.
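One possible way to tidy the branch above is to match on both inputs at once instead of comparing the flag to `true`. This is only a sketch under assumptions: `null_pattern` is a hypothetical helper, it returns the pattern string rather than a compiled regex, and the combined form `^(?:...|\x00)$` is a suggestion, not what the PR does.

```rust
/// Hypothetical helper: builds the anchored pattern string that would be
/// handed to the regex compiler, or `None` when no null handling is requested.
fn null_pattern(null_re: Option<&str>, null_byte_as_empty: bool) -> Option<String> {
    match (null_re, null_byte_as_empty) {
        // Both set: a cell is null if it matches the user's regex or is a lone NUL.
        (Some(re), true) => Some(format!(r"^(?:{}|\x00)$", re)),
        // Only the user's regex: same anchoring as the existing code.
        (Some(re), false) => Some(format!("^{}$", re)),
        // Only the flag: match a cell consisting of exactly one NUL byte.
        (None, true) => Some(r"^\x00$".to_string()),
        (None, false) => None,
    }
}
```

The raw string literals keep `\x00` as regex syntax, so the regex engine interprets the escape rather than the Rust compiler.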
emk left a comment
I'd like to double-check whether this is actually a good use case for scrubcsv and not for a separate Unix tool that just deletes NULL bytes. We've encountered similar issues before and the best solution has often been to strip NULL bytes before invoking scrubcsv.
If that doesn't work in this case, I would like to quickly double-check our options.
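To make the "strip NULL bytes first" alternative concrete, here is a minimal sketch of such a filter. `strip_nul_bytes` is a hypothetical name, and this is an illustration of the idea rather than an existing tool or part of scrubcsv.

```rust
/// Hypothetical pre-processing step: drop every NUL byte from the input.
fn strip_nul_bytes(input: &[u8]) -> Vec<u8> {
    input.iter().copied().filter(|&b| b != 0).collect()
}
```

Applied to the test input, the row `\x00,\x00,1` becomes `,,1`, so the NUL-only cells end up empty. Note that this removes NUL bytes everywhere, including inside cells that contain other data, which is not quite the same as treating a cell that is exactly one NUL byte as empty.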
#[structopt(long = "null-byte-as-empty")]
null_byte_as_empty: bool,
Before adding an option, did we try --null '^\x00$'? Rust's regex library allows bytes to be escaped.
If we need to add another null option besides --null, we should aim for a consistent naming convention, both with --null and the other existing options. Most of our existing options are either "--verb" or "--noun", and --null-byte-as-empty . It's definitely good to start with --null- like you do here, because that will ensure the right sort order on the display.
Addresses #17
That said, I can't quite figure out how to make the test "test_null_byte_as_empty_cell" not escape the \x00.