
Refactor and optimize selection logic in ResultExtractor #30

Open: ycherkes wants to merge 2 commits into master from optimize-extract-methods


ycherkes (Contributor) commented Jun 9, 2026

Implements #29

PR Summary — optimize-extract-methods

Branch: optimize-extract-methods
Head commit: 20a65f733843640636139da5eb8c5dc5286b7bd5
Commit message: "Refactor and optimize selection logic in ResultExtractor"

Overview

Refactors the selection logic in ResultExtractor to improve performance and clarity when selecting the best match or the top-N matches. The public extractor APIs are preserved; the internal selection code was rewritten and split into smaller helpers and types.

Key changes

  • Added ScoredCandidate<T> (record-like readonly struct) and ScoredCandidateComparer<T> to represent scored items and provide simple score comparisons.
    • File: FuzzySharp/FuzzySharp/Extractor/ScoredCandidate.cs
  • Introduced a BestCandidate<T> helper to encapsulate the "best so far" selection logic with a deterministic tie-breaker (the lower index wins on equal scores).
    • File: FuzzySharp/FuzzySharp/Extractor/ScoredCandidate.cs
  • Refactored ResultExtractor internals:
    • ExtractOneCore<T>: simplified to a linear scan for the best match, tracking the current best in local variables.
    • ExtractTopCore<T>: implements top-N extraction with a MinHeap<ScoredCandidate<T>> holding at most k = limit candidates, giving O(n log k) time and lower overhead than a full O(n log n) sort.
    • Added helpers AddTopCandidate and CreateTopResults to encapsulate heap maintenance and result materialization.
    • File: FuzzySharp/FuzzySharp/Extractor/ResultExtractor.cs
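The bounded-heap idea behind ExtractTopCore can be sketched as follows. This is a minimal Java illustration, not the PR's C# code. The names ScoredCandidate, better, and extractTop are hypothetical, scores are assumed to be precomputed by the scorer, and java.util.PriorityQueue stands in for the project's MinHeap. The ordering (higher score first, lower index on ties) is modeled on the tie-breaking rule described in this summary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {
    // Hypothetical stand-in for ScoredCandidate<T>: value, score, and input position.
    record ScoredCandidate<T>(T value, int score, int index) {}

    // "Better" order: higher score first; on equal scores, lower index (earlier input) first.
    static <T> Comparator<ScoredCandidate<T>> better() {
        return Comparator.comparingInt((ScoredCandidate<T> c) -> c.score())
                .reversed()
                .thenComparingInt(c -> c.index());
    }

    // Returns up to `limit` candidates with score >= cutoff, best first.
    // Assumption: scores[i] is the precomputed score of choices.get(i).
    static <T> List<ScoredCandidate<T>> extractTop(List<T> choices, int[] scores, int cutoff, int limit) {
        List<ScoredCandidate<T>> result = new ArrayList<>();
        if (limit <= 0) return result;                      // zero limit: nothing to select
        Comparator<ScoredCandidate<T>> better = better();
        // Reversing "better" turns the queue into a min-heap whose root is the
        // weakest candidate kept so far, i.e. the next one to evict.
        PriorityQueue<ScoredCandidate<T>> heap = new PriorityQueue<>(limit, better.reversed());
        for (int i = 0; i < choices.size(); i++) {
            if (scores[i] < cutoff) continue;               // below cutoff: ignored
            ScoredCandidate<T> c = new ScoredCandidate<>(choices.get(i), scores[i], i);
            if (heap.size() < limit) {
                heap.add(c);                                // heap not full yet
            } else if (better.compare(c, heap.peek()) < 0) {
                heap.poll();                                // evict the weakest...
                heap.add(c);                                // ...and keep the newcomer
            }
            // Otherwise c is no better than the root (including an equal score with
            // a later index), so it is discarded after a single comparison.
        }
        result.addAll(heap);
        result.sort(better);                                // materialize best-first
        return result;
    }
}
```

Each accepted candidate costs at most one poll and one add on a heap of size k, which is where the O(n log k) bound comes from; only the final k survivors are sorted.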

Behavior / semantics

  • Cutoff handling preserved: items with score < cutoff are ignored.
  • Tie-breaking for best candidate: when scores are equal, the candidate with the smaller index (earlier in input) is preferred.
  • Public extraction methods (ExtractOne, ExtractTop, ExtractSorted, ExtractWithoutOrder) keep the same signatures and behavior from the consumer's perspective.
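The cutoff and tie-breaking rules for the single-best path reduce to a linear scan with a strict comparison. The Java sketch below is illustrative only. The names Best and extractOne are hypothetical, scores are assumed precomputed, and returning an empty Optional when nothing passes the cutoff is an assumption about the "no match" case, not a statement of FuzzySharp's actual return value.

```java
import java.util.List;
import java.util.Optional;

public class ExtractOneSketch {
    // Hypothetical result type: the winning value, its score, and its input index.
    record Best<T>(T value, int score, int index) {}

    // Assumption: scores[i] is the precomputed score of choices.get(i).
    static <T> Optional<Best<T>> extractOne(List<T> choices, int[] scores, int cutoff) {
        Best<T> best = null;
        for (int i = 0; i < choices.size(); i++) {
            int score = scores[i];
            if (score < cutoff) continue;            // below cutoff: ignored
            // Strict '>' means a later candidate with an equal score never replaces
            // the current best, so the earliest index wins ties.
            if (best == null || score > best.score()) {
                best = new Best<>(choices.get(i), score, i);
            }
        }
        return Optional.ofNullable(best);            // empty when nothing passes the cutoff
    }
}
```

Using `>=` instead of `>` would silently flip the tie-breaker to "last index wins", so this comparison is a good spot for a focused unit test.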

Rationale / benefits

  • Performance: top-N selection moved from potentially heavier sorting to a bounded heap (O(n log k), where k is the limit), reducing work and allocations when the limit is much smaller than the number of choices (k ≪ n).
  • Code clarity: extraction selection logic split into well-named helpers and small data types, improving maintainability and reusability.
  • Determinism: tie-breaking is now explicit (the earlier index wins on equal scores).
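For a rough sense of scale, compare comparison counts up to constant factors using illustrative values of n = 100,000 choices and limit k = 5 (not measurements from this PR):

```math
n \log_2 k = 10^5 \cdot \log_2 5 \approx 2.3 \times 10^5
\qquad\text{vs.}\qquad
n \log_2 n = 10^5 \cdot \log_2 10^5 \approx 1.7 \times 10^6
```

The heap figure is a worst case: a candidate that does not beat the current heap root costs a single comparison, so inputs where few candidates ever enter the top k do even less work.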

Tests / verification

  • Recommended tests:
    • Validate ExtractTop for correct top-N results and ordering with duplicates and ties.
    • Verify ExtractOne selects earliest index on equal scores.
    • Benchmarks comparing the previous implementation (if available) with the new heap-based approach for large choice sets and small limits.
  • No test changes detected in this commit; consider adding micro-benchmarks for the top-N path.
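For the recommended ExtractTop tests, one option is a differential check against a deliberately simple reference. The Java sketch below is hypothetical (expectedTopIndices is not part of the PR). It assumes ExtractTop is meant to return results best-first and to order equal scores by input index, the same way ExtractOne breaks ties. That ordering for top-N ties is an assumption worth confirming against the intended semantics.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class TopNOracle {
    // Reference result as input indices: drop scores below the cutoff, sort by
    // descending score, keep the first `limit`. Stream.sorted() is stable for
    // ordered streams, so equal scores keep their original input order.
    static List<Integer> expectedTopIndices(int[] scores, int cutoff, int limit) {
        if (limit <= 0) return List.of();
        return IntStream.range(0, scores.length)
                .filter(i -> scores[i] >= cutoff)
                .boxed()
                .sorted(Comparator.comparingInt((Integer i) -> scores[i]).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

A randomized test could then generate score arrays with many repeated values and assert that the heap-based path returns exactly these indices in this order, including the empty-input and zero-limit cases.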

Files changed (approximate)

  • Added / modified:
    • FuzzySharp/FuzzySharp/Extractor/ScoredCandidate.cs (new helpers / types)
    • FuzzySharp/FuzzySharp/Extractor/ResultExtractor.cs (refactored selection logic)

Notes for reviewers

  • Check MinHeap usage and ScoredCandidateComparer<T> for expected ordering semantics.
  • Confirm no API regressions for edge cases (empty inputs, zero limit for ExtractTop, very large cutoffs).
  • Consider adding unit and performance tests to lock in behavior and measure the improvement.
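When reviewing ScoredCandidateComparer<T> and the MinHeap usage, a key property to check is which candidate sits at the heap root and gets evicted first. The Java check below assumes the root should be the weakest kept candidate: lowest score first and, among equal scores, the later index first, because that candidate loses the tie-break. Cand, WEAKEST_FIRST, and pollOrder are hypothetical names used only for illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class EvictionOrderCheck {
    // Hypothetical minimal candidate: label, score, input index.
    record Cand(String value, int score, int index) {}

    // Expected min-heap order for top-N selection: lowest score first; on equal
    // scores, the HIGHER index first, since the earlier index must survive.
    static final Comparator<Cand> WEAKEST_FIRST = Comparator.comparingInt(Cand::score)
            .thenComparing(Comparator.comparingInt(Cand::index).reversed());

    // Drains a heap built with WEAKEST_FIRST and returns the labels in eviction order.
    static String pollOrder(Cand... cands) {
        PriorityQueue<Cand> heap = new PriorityQueue<>(WEAKEST_FIRST);
        for (Cand c : cands) heap.add(c);
        StringBuilder order = new StringBuilder();
        while (!heap.isEmpty()) order.append(heap.poll().value());
        return order.toString();
    }
}
```

With a(90, index 0), b(95, index 1), and c(90, index 2), the expected eviction order is c, a, b. If the real comparator evicts a before c, ties would resolve in favor of the later input, which contradicts the stated tie-breaking rule.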

ycherkes added 2 commits June 9, 2026 20:34
Replaced legacy .Max()/.MaxN() selection with efficient core helpers using min-heaps and new internal types for candidate tracking and tie-breaking. Improved performance and determinism for both sequential and parallel variants. Added comprehensive benchmarks and unit tests to validate correctness, tie-breaking, and enumeration guarantees. Standardized API usage and ensured thread safety in parallel methods.