Skip to content

Fix several LINQ gaps: DateOnly/TimeOnly projection, SelectMany Distinct().Count(), and GroupJoin Distinct/scalar/aggregates#4677

Closed
haefele-octoja wants to merge 5 commits into
JasperFx:masterfrom
haefele-octoja:fix-dateonly-timeonly-scalar-projection
Closed

Fix several LINQ gaps: DateOnly/TimeOnly projection, SelectMany Distinct().Count(), and GroupJoin Distinct/scalar/aggregates#4677
haefele-octoja wants to merge 5 commits into
JasperFx:masterfrom
haefele-octoja:fix-dateonly-timeonly-scalar-projection

Conversation

@haefele-octoja

@haefele-octoja haefele-octoja commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Summary

This fixes a few related LINQ-translation gaps I ran into while using Marten, each with a focused regression test. They're independent, so I'm very happy to split this into separate PRs, rework it in whatever direction you'd prefer, or drop it entirely if any of it is the wrong approach — no attachment to the implementation, I just wanted to surface the bugs with repros.

The changes are ~220 lines across 4 source files in src/Marten/Linq, plus 32 tests in three Bug_* files.

What's fixed

1. DateOnly / TimeOnly scalar projection

session.Query<Doc>().Select(x => x.SomeDateOnly).ToListAsync();

threw System.Text.Json.JsonException: '-' is an invalid end of a number on non-empty results. SelectorVisitor.ToScalar didn't list DateOnly/TimeOnly among the scalar types, so the projection fell through to DataSelectClause, which selects the quote-stripped data ->> 'x' text and JSON-deserializes it. Fix: add them to the scalar branch (mirroring DateTime) so the value is read from the member's native mt_immutable_date()/mt_immutable_time() locator. Nullable variants covered.

2. Count()/LongCount() over a Distinct() of a SelectMany() projection

session.Query<Doc>().SelectMany(d => d.Children).Select(c => c.Value).Distinct().CountAsync();

threw The database operator 'DISTINCT' cannot be used with non-simple types even though .Distinct().ToList() returned the correct distinct set. BuildSelectManyStatement re-applied DISTINCT to the count clause ProcessSingleValueModeIfAny had already produced, and the Inner-merged IsDistinct never reached the count path. Fix: skip the tail DISTINCT for Count/LongCount and re-sync IsDistinct for those modes.

3. GroupJoin(...).SelectMany(...)Distinct, scalar projections, and aggregates

All of the following used to throw or return wrong results, and now work:

var join = session.Query<Customer>()
    .GroupJoin(session.Query<Order>(), c => c.Id, o => o.CustomerId, (c, orders) => new { c, orders });

// --- Distinct over an object projection ---
// previously: Distinct() was silently dropped (non-distinct rows / full joined row count)
await join.SelectMany(x => x.orders, (x, o) => new { x.c.Name, o.Amount }).Distinct().ToListAsync();
await join.SelectMany(x => x.orders, (x, o) => new { x.c.Name, o.Amount }).Distinct().CountAsync();

// --- Bare scalar result selector ---
// previously: threw "Sequence contains no elements" at materialization
await join.SelectMany(x => x.orders, (x, o) => o.Amount).ToListAsync();
await join.SelectMany(x => x.orders, (x, o) => o.Amount).Distinct().CountAsync();

// --- Aggregates over a scalar projection ---
// previously: threw NotSupportedException
await join.SelectMany(x => x.orders, (x, o) => o.Amount).SumAsync(z => z);
await join.SelectMany(x => x.orders, (x, o) => o.Amount).MinAsync(z => z);
await join.SelectMany(x => x.orders, (x, o) => o.Amount).MaxAsync(z => z);
await join.SelectMany(x => x.orders, (x, o) => o.Amount).AverageAsync(z => z); // -> double

// --- Left join (DefaultIfEmpty): childless outer rows included, null inner handled ---
await join.SelectMany(x => x.orders.DefaultIfEmpty(), (x, o) => x.c.Id).Distinct().CountAsync();
await join.SelectMany(x => x.orders.DefaultIfEmpty(), (x, o) => (int?)o.Amount).SumAsync(z => z);

The three underlying issues:

  • Distinct() was silently dropped. JoinSelectClause was neither IScalarSelectClause nor ICountClause, so neither distinct branch in ProcessSingleValueModeIfAny matched: Distinct().Count() returned the full joined row count, and Distinct().ToList() returned non-distinct rows. Fix: JoinSelectClause implements IScalarSelectClause (renders DISTINCT(<projection>)), and CompileGroupJoin transfers IsDistinct from the SelectMany usage. Reuses the existing distinct / count-over-CTE machinery.
  • A bare scalar result selector threw. (x, o) => o.Amount built an empty NewObjectSequence contains no elements at materialization (JoinSelectParser only handled new {...} projections). Fix: render a bare scalar as to_jsonb(<scalar>) so the existing SerializationSelector<T> deserializes it; the join selector also returns default for a null data column (a left-join unmatched inner row) instead of throwing.
  • Sum/Min/Max/Average over a scalar projection threw. The scalar is rendered as to_jsonb(...) and Postgres has no sum/min/max/avg for jsonb. Fix: for an aggregate, render the raw scalar into the join CTE and put a standard NewScalarSelectClause over it, so the existing aggregate machinery handles result types (SUM(int)->bigint, AVG->double), enums, nullable scalars, and left-join nulls.

Known limitations (left in place, fail with a clear error — not silently)

  • Aggregates over an object projection (.SelectMany(... => new {...}).Sum(z => z.Amount)) — there's no single column to aggregate. Workaround: project the scalar directly in the result selector.
  • .Select() / .Where() after the join's SelectManyCompileGroupJoin ignores a post-SelectMany SelectExpression. Supporting it would need a synthesized member collection mapping the anonymous result type's members onto the join's jsonb data column; I judged that a separate, larger feature and left it as a documented limitation (workaround: put the projection in the SelectMany result selector). Happy to take a run at it if you think it's worthwhile.

Testing

  • 32 regression tests across Bug_dateonly_timeonly_scalar_projection, Bug_distinct_count_over_selectmany, and Bug_groupjoin_distinct_and_scalar_projection. Each headline bug was verified red→green (fails without the fix, passes with it).
  • Full LinqTests suite passes on both net9.0 and net10.0 (1301 passing), including all existing group_join_operator tests — no regressions.
  • I also did an adversarial pass over the GroupJoin Distinct for silent-wrong-result shapes (multi-field dedup correctness, type round-trips under both System.Text.Json and Newtonsoft, left-join nulls) — none found.

Notes for reviewers

These are independent fixes; I bundled them only because I found them together. Please tell me if you'd rather I split them into separate PRs (the DateOnly/TimeOnly and SelectMany-Distinct().Count() ones are small and standalone; the GroupJoin change is the larger one), rework any of them, or drop them. I'll happily adapt.

🤖 Generated with Claude Code

haefele-octoja and others added 5 commits June 7, 2026 15:27
…tion

Query<T>().Select(x => x.SomeDateOnly) -- and the TimeOnly and nullable
variants -- threw System.Text.Json "'-' is an invalid end of a number":
SelectorVisitor.ToScalar omitted DateOnly/TimeOnly from the scalar whitelist,
so the projection fell to DataSelectClause and JSON-deserialized the
quote-stripped `data ->> 'x'` text. Route them through NewScalarSelectClause
like DateTime/DateTimeOffset so the native date/time TypedLocator
(mt_immutable_date()/mt_immutable_time()) is read directly. MemberType is
already the non-nullable underlying type, so DateOnly?/TimeOnly? are covered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Query<T>().SelectMany(x => x.Children).Select(c => c.Value).Distinct().Count()
threw NotSupportedException "The database operator 'DISTINCT' cannot be used with
non-simple types" even though the equivalent Distinct().ToList() worked.
BuildSelectManyStatement re-applied DISTINCT to the count clause that
ProcessSingleValueModeIfAny had already produced, and the Inner-merged IsDistinct
flag never reached the count path -- so when the throw was avoided a plain
non-distinct count was produced instead. Skip the tail DISTINCT operator for the
Count/LongCount modes (already handled as count(*) over a DISTINCT CTE) and re-sync
IsDistinct onto the statement for those modes so the distinct projection is the
thing being counted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GroupJoin(...).SelectMany(...) had two gaps:
  - Distinct() was silently dropped over the join. JoinSelectClause was neither
    IScalarSelectClause nor ICountClause, so neither distinct branch in
    ProcessSingleValueModeIfAny matched: Distinct().Count() returned the full joined
    row count, and Distinct().ToList() returned non-distinct rows.
  - A bare scalar result selector ((x, t) => x.l.Id) built an empty NewObject and threw
    "Sequence contains no elements" at materialization -- JoinSelectParser only handled
    new {...} / MemberInit object projections.

Fix:
  - JoinSelectClause implements IScalarSelectClause and renders DISTINCT(<projection>)
    when the DISTINCT operator is applied, reusing the standard distinct / count-over-CTE
    machinery for both materialized Distinct() and Distinct().Count()/LongCount().
  - CompileGroupJoin transfers IsDistinct from the SelectMany usage chain onto the join
    statement before ProcessSingleValueModeIfAny.
  - JoinSelectParser renders a bare scalar result selector as to_jsonb(<scalar>) so the
    existing SerializationSelector<T> deserializes it and DISTINCT applies unchanged.
  - The join selector returns default for a null "data" column (a bare scalar whose value
    is null, e.g. a left join projecting an inner member of an unmatched row) rather than
    throwing, matching Marten's scalar-select convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Coverage review additions (tests only):
- GroupJoin: dedup correctness (a multi-field object Distinct() must keep rows that
  differ in one field -- not over-merge); bare scalar projections across string,
  decimal, enum and DateOnly (to_jsonb round-trip); and a clear-error test for a
  non-translatable computed scalar body (BadLinqExpressionException).
- DateOnly/TimeOnly projection: a Newtonsoft serializer variant (the fix is
  serializer-agnostic) and Distinct() over the projected value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A bare scalar projection over a join -- (x, c) => c.Amount -- could already be
materialized, distinct'd and counted, but Sum/Min/Max/Average threw NotSupportedException:
the scalar is rendered as to_jsonb(...) so it can be deserialized/deduped, and Postgres
has no sum/min/max/avg aggregate for jsonb.

Render the raw (un-wrapped) scalar into the join CTE and put a standard scalar select
clause (NewScalarSelectClause / NewScalarStringSelectClause) over it, so the existing
aggregate machinery handles everything -- result types (SUM(int)->bigint, AVG->double via
CloneToDouble), enums, nullable scalars (the clause is built over the underlying type and
also implements ISelector<T?>), and null inner rows from a left join (SUM ignores them).

Object-projection aggregates remain unsupported (there is no single column to aggregate)
and throw a clear error pointing at the bare-scalar shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jeremydmiller added a commit that referenced this pull request Jun 8, 2026
…ct/Where

Addresses the two "Known limitations" that PR #4677 explicitly left open:

1. **Aggregates over an *object* projection.**
   `.SelectMany(.., (x, c) => new {..}).Sum(z => z.X)` (which QueryableExtensions
   lowers to `.Select(z => z.X).SumAsync()`) used to hit
   `JoinSelectClause.ApplyOperator("SUM")` on the object projection and throw.

2. **`.Select(...)` / `.Where(...)` *after* the join's `.SelectMany(...)`.**
   `CompileGroupJoin` ignored both: a post-SelectMany Select returned the original
   anon-typed rows and a post-SelectMany Where was silently dropped.

The fix is a single seam in `CompileGroupJoin`: a new `AnonProjectionExpander`
visitor walks the FlattenedResultSelector's `(x, c) => new { Member = source }`
bindings and rewrites a `z.Member` access (on a parameter of the projection's
anon type) back to its source expression in (x, c) terms. With the expander in
hand `CompileGroupJoin` reduces a post-SelectMany Select into a synthesized
effective result selector that `JoinSelectParser` then renders normally —
including lighting up the bare-scalar aggregate path that already supports
Sum/Min/Max/Average. Post-SelectMany Where filters get expanded the same way
and routed onto the appropriate CTE's existing Where pipeline (inner-side onto
`InnerCollectionUsage.WhereExpressions` before its `ParseWhereClause` runs,
outer-side onto `outerStatement.ParseWhereClause` after the fact with
`storage=null` so we don't double-apply the tenant / soft-delete defaults).

Cross-side Where filters (touching both x.* and c.*) are pinned as a clear
`BadLinqExpressionException` rather than silently mis-translated — that's a
join-level WHERE clause feature for another PR if anyone hits it. Computed
expressions on the projected members (`z.Amount * 2`) also stay pinned: that's
the existing "computed scalar projection" limitation the upstream PR already
locked in.

11 new tests in `Bug_4677_groupjoin_post_selectmany.cs` cover:
* object-projection Sum / Min / Max / Average over inner and outer members,
  decimal scalars, and left-join with NULL inner rows (SQL SUM ignores NULL);
* post-SelectMany Select reducing to a scalar member, and reshuffling to a new
  anonymous type (without compute);
* post-SelectMany Where on the inner side, the outer side, and chained with Sum;
* the both-sides Where clear-error pin.

Existing 53 tests across `Bug_groupjoin_distinct_and_scalar_projection`,
`Bug_distinct_count_over_selectmany`, `Bug_dateonly_timeonly_scalar_projection`,
and the `group_join_operator` family stay green. Full `LinqTests` suite passes
on both net9.0 and net10.0 (1312 / 1313 — 1 unrelated pre-existing skip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremydmiller

Copy link
Copy Markdown
Member

This was superseded by #4677, but it's in. Thank you @haefele-octoja!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants