Refactor Excel writers and readers for performance and limit handling #13

Merged
GabrielMarquezMatte merged 15 commits into master from develop
Jun 27, 2026
@GabrielMarquezMatte

Owner

This pull request introduces configurable input size limits for Excel file reading and writing, enhancing security and robustness by bounding resource usage and preventing excessive memory allocations. It adds an ExcelReaderOptions class for specifying limits, updates all reader factory methods to accept these options, and throws a new ExcelLimitExceededException when limits are exceeded. The documentation is updated to explain the new defaults and how to tune or disable them.

Configurable Input Limits and Exception Handling

API Changes

  • Updated all Excel factory methods (sync and async) to accept an optional ExcelReaderOptions parameter, which is passed through to the underlying readers. This change is source-compatible and allows consumers to tune or disable limits as needed. (src/ExcelReader.Core/Reader/Excel.cs)
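
The optional-parameter pattern described in this bullet can be sketched roughly as follows. This is only an illustration, not the library's code: `ReaderLimitsSketch`, `ReaderFactorySketch`, the property names `MaxDecompressedBytes` and `MaxSharedStrings`, and the default values are all hypothetical placeholders, since the actual members of ExcelReaderOptions are not shown in this PR description.

```csharp
#nullable enable
using System;
using System.IO;

// Hypothetical options type. The property names and default values are
// illustrative assumptions, not the real members of ExcelReaderOptions.
public sealed class ReaderLimitsSketch
{
    // In this sketch, null means "no limit".
    public long? MaxDecompressedBytes { get; init; } = 512L * 1024 * 1024; // placeholder value
    public int? MaxSharedStrings { get; init; } = 1_000_000;               // placeholder value

    public static ReaderLimitsSketch Default { get; } = new();

    public static ReaderLimitsSketch Unlimited { get; } =
        new() { MaxDecompressedBytes = null, MaxSharedStrings = null };
}

public static class ReaderFactorySketch
{
    // Call sites that pass only the stream keep compiling, because the new
    // parameter is optional and falls back to the default limits.
    public static ReaderLimitsSketch Open(Stream input, ReaderLimitsSketch? options = null)
    {
        ArgumentNullException.ThrowIfNull(input);
        // A real factory would construct a reader here; the sketch just
        // returns the limits that would be in effect.
        return options ?? ReaderLimitsSketch.Default;
    }
}
```

With this shape, `ReaderFactorySketch.Open(stream)` applies the defaults and `ReaderFactorySketch.Open(stream, ReaderLimitsSketch.Unlimited)` opts out. One design caveat: an optional parameter preserves source compatibility only. Already-compiled callers bind to the old signature and need a recompile unless the original overload is kept alongside the new one.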

Documentation and Benchmark Updates

  • Updated the README.md to document the new size limits, show how to tune or disable them, and refresh benchmark results to reflect recent performance and allocation improvements. (README.md)

These changes make the library safer by default and provide flexibility for advanced users to adjust limits based on their requirements.

Gabriel Matte and others added 14 commits June 26, 2026 17:04
…writers; enhance tests for round-trip validation
…ord buffering; implement FlushRecords method for improved performance
…ng; update FlushThreshold to SpillThreshold for better performance
… refactor XlsbSheetWriter to support writing rows with XlsbCell; enhance benchmarks and tests for round-trip validation
…st ExcelReader comparisons with MiniExcel, Sylvan, and SpreadCheetah
… method for RowWriter and implement row flushing based on a configurable threshold in SheetWriter
…ment GetSpan and Advance methods to reduce memory allocations during value formatting
…ate value writing methods into a single WriteValue method for improved efficiency and reduced memory allocations
- Introduced `ExcelLimitExceededException` to handle limit violations.
- Created `ExcelReaderOptions` to configure maximum limits for decompressed bytes and shared strings.
- Implemented `LimitChecks` to validate limits during reading operations.
- Added `DecompressedByteCounter` to track total decompressed bytes.
- Updated `LimitedReadStream` to enforce limits on read operations.
- Refactored `XlsReader`, `XlsxReader`, and `XlsbReader` to utilize new limit checks and options.
- Enhanced shared string parsing to respect configured limits.
- Added unit tests to verify limit enforcement for decompressed bytes and shared strings.
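
The general idea behind a byte counter combined with a limit-enforcing stream wrapper can be sketched as below. This is a minimal illustration under stated assumptions, not the PR's `DecompressedByteCounter` or `LimitedReadStream`: the types `ByteBudget`, `BudgetedReadStream`, `BudgetExceededException`, and `Demo` are hypothetical names invented for the example.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a limit-violation exception; not the library's type.
public sealed class BudgetExceededException : IOException
{
    public BudgetExceededException(long limit)
        : base($"Read more than the allowed {limit} bytes.") { }
}

// Shared counter, so several wrapped streams can draw from one total budget.
public sealed class ByteBudget
{
    private readonly long _limit;
    private long _consumed;

    public ByteBudget(long limit) => _limit = limit;

    public long Consumed => _consumed;

    public void Consume(int count)
    {
        _consumed += count;
        if (_consumed > _limit) throw new BudgetExceededException(_limit);
    }
}

// Read-only wrapper that charges every successful read against the budget.
public sealed class BudgetedReadStream : Stream
{
    private readonly Stream _inner;
    private readonly ByteBudget _budget;

    public BudgetedReadStream(Stream inner, ByteBudget budget)
    {
        _inner = inner;
        _budget = budget;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        _budget.Consume(read); // throws once the running total passes the limit
        return read;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class Demo
{
    // Returns true when draining payloadSize bytes through a budget of limit bytes throws.
    public static bool ExceedsBudget(int payloadSize, long limit)
    {
        using var stream = new BudgetedReadStream(
            new MemoryStream(new byte[payloadSize]), new ByteBudget(limit));
        try
        {
            stream.CopyTo(Stream.Null);
            return false;
        }
        catch (BudgetExceededException)
        {
            return true;
        }
    }
}
```

Counting bytes after each read, rather than trusting a declared size up front, is what lets a wrapper like this bound decompressed output even when the archive metadata understates it. Whether the PR's implementation takes the same approach is an assumption here.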
@codecov-commenter

codecov-commenter commented Jun 27, 2026


⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 73.85943% with 212 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.99%. Comparing base (9f077d2) to head (472f1c6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...c/ExcelReader.Core/Reader/XlsxReader.Enumerator.cs | 63.35% | 33 Missing and 15 partials ⚠️ |
| src/ExcelReader.Core/Writer/XlsbSheetWriter.cs | 66.66% | 25 Missing and 7 partials ⚠️ |
| src/ExcelReader.Core/Writer/XlsbRowWriter.cs | 62.90% | 21 Missing and 2 partials ⚠️ |
| src/ExcelReader.Core/Writer/RowWriter.cs | 68.57% | 22 Missing ⚠️ |
| src/ExcelReader.Core/Writer/XlsRowWriter.cs | 65.38% | 17 Missing and 1 partial ⚠️ |
| src/ExcelReader.Core/Reader/LimitedReadStream.cs | 76.27% | 13 Missing and 1 partial ⚠️ |
| src/ExcelReader.Core/Writer/XlsbCell.cs | 50.00% | 13 Missing ⚠️ |
| .../ExcelReader.Core/Writer/Internal/CellFormatter.cs | 86.20% | 6 Missing and 2 partials ⚠️ |
| src/ExcelReader.Core/Writer/SheetWriter.cs | 82.22% | 7 Missing and 1 partial ⚠️ |
| src/ExcelReader.Core/Reader/XlsxXml.cs | 63.15% | 5 Missing and 2 partials ⚠️ |
| ... and 5 more | | |
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #13      +/-   ##
==========================================
- Coverage   90.79%   85.99%   -4.81%     
==========================================
  Files          54       60       +6     
  Lines        3336     3862     +526     
  Branches      603      692      +89     
==========================================
+ Hits         3029     3321     +292     
- Misses        172      382     +210     
- Partials      135      159      +24     

@github-actions

github-actions Bot commented Jun 27, 2026


Benchmark Results

Measured on ubuntu-latest (GitHub Actions). Runner noise may affect absolute numbers; use these for relative comparisons within a PR.

ExcelReader.Benchmarks.ParseBenchmark


BenchmarkDotNet v0.15.8, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
AMD EPYC 7763 2.45GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.301
  [Host]     : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3
  Job-MEHJPP : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3

IterationCount=5  WarmupCount=1  

| Method | Rows | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| ExcelParserSync | 50000 | 29.60 ms | 0.152 ms | 0.039 ms | 1.00 | 0.00 | 218.7500 | 3.88 MB | 1.00 |
| ExcelParserAsync | 50000 | 32.79 ms | 0.121 ms | 0.031 ms | 1.11 | 0.00 | 187.5000 | 3.88 MB | 1.00 |
| ExcelParserXlsbSync | 50000 | 13.05 ms | 0.062 ms | 0.016 ms | 0.44 | 0.00 | 234.3750 | 3.88 MB | 1.00 |
| ExcelParserXlsbAsync | 50000 | 14.85 ms | 0.062 ms | 0.016 ms | 0.50 | 0.00 | 234.3750 | 3.88 MB | 1.00 |
| MiniExcel | 50000 | 241.08 ms | 6.987 ms | 1.814 ms | 8.14 | 0.06 | 12000.0000 | 199.31 MB | 51.42 |
| Sylvan | 50000 | 96.82 ms | 26.187 ms | 4.052 ms | 3.27 | 0.12 | - | 10.47 MB | 2.70 |
| SylvanAsync | 50000 | 96.60 ms | 7.505 ms | 1.949 ms | 3.26 | 0.06 | - | 10.48 MB | 2.70 |

ExcelReader.Benchmarks.ReadBenchmark


BenchmarkDotNet v0.15.8, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
AMD EPYC 9V74 2.86GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.301
  [Host]     : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3
  Job-MEHJPP : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3

IterationCount=5  WarmupCount=1  

| Method | Rows | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| ExcelReader | 50000 | 26.461 ms | 0.3240 ms | 0.0841 ms | 1.00 | 0.00 | - | 14.07 KB | 1.00 |
| ExcelReaderAsync | 50000 | 27.494 ms | 0.0946 ms | 0.0246 ms | 1.04 | 0.00 | - | 16.09 KB | 1.14 |
| ExcelReaderXlsb | 50000 | 5.519 ms | 0.1764 ms | 0.0458 ms | 0.21 | 0.00 | - | 13.93 KB | 0.99 |
| ExcelReaderXlsbAsync | 50000 | 6.519 ms | 0.0567 ms | 0.0088 ms | 0.25 | 0.00 | - | 16.37 KB | 1.16 |
| MiniExcel | 50000 | 219.966 ms | 23.3093 ms | 6.0534 ms | 8.31 | 0.21 | 13000.0000 | 215587.88 KB | 15,322.18 |
| Sylvan | 50000 | 65.301 ms | 107.3509 ms | 16.6127 ms | 2.47 | 0.56 | - | 1939.21 KB | 137.82 |

ExcelReader.Benchmarks.WriteBenchmark


BenchmarkDotNet v0.15.8, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
AMD EPYC 7763 2.45GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.301
  [Host]     : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3
  Job-MEHJPP : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3

IterationCount=5  WarmupCount=1  

| Method | Rows | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ExcelReaderWriter | 50000 | 23.13 ms | 0.381 ms | 0.099 ms | 1.00 | 0.01 | 375.0000 | 375.0000 | 375.0000 | 2.11 MB | 1.00 |
| ExcelReaderXlsbWriter | 50000 | 11.48 ms | 0.229 ms | 0.059 ms | 0.50 | 0.00 | 187.5000 | 187.5000 | 187.5000 | 2.04 MB | 0.97 |
| MiniExcel | 50000 | 106.06 ms | 4.174 ms | 1.084 ms | 4.59 | 0.05 | 6000.0000 | 2000.0000 | 2000.0000 | 85.14 MB | 40.41 |
| SpreadCheetah | 50000 | 19.77 ms | 0.344 ms | 0.089 ms | 0.85 | 0.00 | 1093.7500 | 468.7500 | 468.7500 | 12.32 MB | 5.85 |

ExcelReader.Benchmarks.XlsReadBenchmark


BenchmarkDotNet v0.15.8, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
AMD EPYC 7763 3.02GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.301
  [Host]     : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3
  Job-MEHJPP : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3

IterationCount=5  WarmupCount=1  

| Method | Rows | Mean | Error | StdDev | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| ExcelReader | 50000 | 6.629 ms | 0.0914 ms | 0.0237 ms | 1.00 | - | - | 61.3 KB | 1.00 |
| ExcelReaderAsync | 50000 | 6.694 ms | 0.2346 ms | 0.0609 ms | 1.01 | - | - | 61.38 KB | 1.00 |
| Sylvan | 50000 | 9.012 ms | 0.1718 ms | 0.0266 ms | 1.36 | 93.7500 | 46.8750 | 1717.73 KB | 28.02 |

ExcelReader.Benchmarks.XlsWriteBenchmark


BenchmarkDotNet v0.15.8, Linux Ubuntu 24.04.4 LTS (Noble Numbat)
AMD EPYC 7763 2.60GHz, 1 CPU, 4 logical and 2 physical cores
.NET SDK 10.0.301
  [Host]     : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3
  Job-MEHJPP : .NET 10.0.9 (10.0.9, 10.0.926.27113), X64 RyuJIT x86-64-v3

IterationCount=5  WarmupCount=1  

| Method | Rows | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| XlsWriter | 50000 | 8.867 ms | 0.4769 ms | 0.0738 ms | 1.00 | 0.01 | 593.7500 | 468.7500 | 468.7500 | 12.37 MB | 1.00 |
| XlsxWriter | 50000 | 22.726 ms | 0.4144 ms | 0.0641 ms | 2.56 | 0.02 | 375.0000 | 375.0000 | 375.0000 | 2.11 MB | 0.17 |

@GabrielMarquezMatte GabrielMarquezMatte merged commit 4e372e8 into master Jun 27, 2026
12 checks passed

3 participants