#4677 follow-up: object-projection aggregates + post-SelectMany Select/Where #4678

Merged: jeremydmiller merged 6 commits into master from feature/4677-followup-object-aggregate-select-where on Jun 8, 2026

@jeremydmiller (Member)

Stacked on top of #4677 — must merge after #4677 (the diff includes that PR's commits because the base branch lives on the contributor's fork and isn't reachable from this remote). Addresses the two Known limitations that #4677 explicitly left open:

  1. Aggregates over an object projection.
    .SelectMany(.., (x, c) => new {..}).Sum(z => z.X) (which QueryableExtensions
    lowers to .Select(z => z.X).SumAsync()) used to hit
    JoinSelectClause.ApplyOperator("SUM") on the object projection and throw the
    clear-error pin shipped in #4677 (Fix several LINQ gaps: DateOnly/TimeOnly
    projection, SelectMany Distinct().Count(), and GroupJoin Distinct/scalar/aggregates).

  2. .Select(...) / .Where(...) after the join's .SelectMany(...).
    CompileGroupJoin ignored both: a post-SelectMany Select returned the original
    anon-typed rows and a post-SelectMany Where was silently dropped.
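
For readers who want to see the two operator chains concretely, here is a minimal sketch of their shape. It runs against LINQ to Objects rather than a Marten session, so it only illustrates what the queries look like and what they are expected to return, not how Marten translates them. The `Customer` and `Order` records and the sample data are assumptions made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical document types, not part of the PR.
public record Customer(int Id, string Name);
public record Order(int CustomerId, decimal Amount);

public static class QueryShapes
{
    // Hypothetical sample data; in-memory arrays stand in for a Marten session.
    public static readonly Customer[] Customers = { new(1, "Ann"), new(2, "Bob") };
    public static readonly Order[] Orders = { new(1, 10m), new(1, 5m), new(2, 7m) };

    // Limitation 1: an aggregate applied to the object (anonymous-type) projection.
    public static decimal SumOverObjectProjection() =>
        Customers
            .GroupJoin(Orders, x => x.Id, o => o.CustomerId, (x, orders) => new { x, orders })
            .SelectMany(t => t.orders, (t, c) => new { t.x.Name, c.Amount })
            .Sum(z => z.Amount);

    // Limitation 2: Where and Select placed after the flattening SelectMany.
    public static decimal[] AmountsFor(string name) =>
        Customers
            .GroupJoin(Orders, x => x.Id, o => o.CustomerId, (x, orders) => new { x, orders })
            .SelectMany(t => t.orders, (t, c) => new { t.x.Name, c.Amount })
            .Where(z => z.Name == name)
            .Select(z => z.Amount)
            .ToArray();
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(QueryShapes.SumOverObjectProjection());
        Console.WriteLine(string.Join(", ", QueryShapes.AmountsFor("Ann")));
    }
}
```

In memory both chains behave as written; the limitations above concern what the same chains did when compiled to SQL.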

Approach

The fix is a single seam in CompileGroupJoin — a new AnonProjectionExpander
visitor walks the FlattenedResultSelector's (x, c) => new { Member = source }
bindings and rewrites a z.Member access (on a parameter of the projection's
anon type) back to its source expression in (x, c) terms.
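
The back-substitution idea can be sketched with a plain `ExpressionVisitor`. This is not the PR's `AnonProjectionExpander`; the class name `MemberBackSubstituter`, the `Expand` helper, and the `Customer`/`Order` records are hypothetical, and the sketch only handles a `new { ... }` result selector whose members are read directly off `z`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical document types, not part of the PR.
public record Customer(int Id, string Name);
public record Order(int CustomerId, decimal Amount);

// Rewrites z.Member back to the expression that produced Member in (x, c) terms.
public sealed class MemberBackSubstituter : ExpressionVisitor
{
    private readonly ParameterExpression _projected;
    private readonly Dictionary<string, Expression> _sources = new();

    public MemberBackSubstituter(LambdaExpression resultSelector, ParameterExpression projected)
    {
        _projected = projected;

        // An anonymous-type projection is a NewExpression whose Members[i]
        // names the property initialised by Arguments[i].
        if (resultSelector.Body is NewExpression { Members: not null } created)
        {
            for (var i = 0; i < created.Members.Count; i++)
                _sources[created.Members[i].Name] = created.Arguments[i];
        }
    }

    protected override Expression VisitMember(MemberExpression node)
    {
        if (node.Expression == _projected &&
            _sources.TryGetValue(node.Member.Name, out var source))
            return source;

        return base.VisitMember(node);
    }
}

public static class Expander
{
    // Folds a post-projection lambda (z => ...) into an effective (x, c) => ... selector.
    public static Expression<Func<TOuter, TInner, TResult>> Expand<TOuter, TInner, TProjection, TResult>(
        Expression<Func<TOuter, TInner, TProjection>> resultSelector,
        Expression<Func<TProjection, TResult>> post)
    {
        var visitor = new MemberBackSubstituter(resultSelector, post.Parameters[0]);
        var body = visitor.Visit(post.Body);
        return Expression.Lambda<Func<TOuter, TInner, TResult>>(body, resultSelector.Parameters);
    }
}

public static class Program
{
    public static void Main()
    {
        var effective = Expander.Expand(
            (Customer x, Order c) => new { x.Name, c.Amount },
            z => z.Amount);

        // The effective selector no longer mentions z, so it can be evaluated on (x, c) directly.
        Console.WriteLine(effective.Compile()(new Customer(1, "Ann"), new Order(1, 12.5m)));
    }
}
```

An expression that uses `z` as a whole, rather than one of its members, is left unexpanded by this sketch and would fail when the resulting lambda is compiled.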

With the expander in hand:

  • Post-SelectMany Select(z => ...) reduces to a synthesized effective
    result selector that JoinSelectParser then renders normally — including
    lighting up the bare-scalar aggregate path that already handles
    Sum/Min/Max/Average over a scalar projection.
  • Post-SelectMany Where(z => ...) filters get expanded the same way and
    routed onto the appropriate CTE's existing Where pipeline — inner-side onto
    InnerCollectionUsage.WhereExpressions before its ParseWhereClause runs;
    outer-side onto outerStatement.ParseWhereClause after the fact with
    storage=null so we don't double-apply tenant / soft-delete defaults.
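
Once a filter is expressed in (x, c) terms, deciding which CTE it belongs to comes down to which lambda parameters it references. The sketch below shows one way to make that decision. It is an assumption about the general technique, not the PR's code: `ParameterCollector`, `WhereRouting`, and `JoinSide` are hypothetical names, and `NotSupportedException` stands in for Marten's `BadLinqExpressionException`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical document types, not part of the PR.
public record Customer(int Id, string Name);
public record Order(int CustomerId, decimal Amount);

public enum JoinSide { Neither, Outer, Inner }

// Collects every lambda parameter referenced inside an expression.
public sealed class ParameterCollector : ExpressionVisitor
{
    private readonly HashSet<ParameterExpression> _found = new();

    public static HashSet<ParameterExpression> Collect(Expression expression)
    {
        var collector = new ParameterCollector();
        collector.Visit(expression);
        return collector._found;
    }

    protected override Expression VisitParameter(ParameterExpression node)
    {
        _found.Add(node);
        return node;
    }
}

public static class WhereRouting
{
    // Classifies an already-expanded (x, c) => bool filter by the side it touches.
    public static JoinSide Classify<TOuter, TInner>(Expression<Func<TOuter, TInner, bool>> predicate)
    {
        var used = ParameterCollector.Collect(predicate.Body);
        var touchesOuter = used.Contains(predicate.Parameters[0]);
        var touchesInner = used.Contains(predicate.Parameters[1]);

        // Stand-in for the clear-error pin: a filter on both sides is rejected, not mis-routed.
        if (touchesOuter && touchesInner)
            throw new NotSupportedException("A Where touching both sides of the join is not supported.");

        if (touchesOuter) return JoinSide.Outer;
        return touchesInner ? JoinSide.Inner : JoinSide.Neither;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(WhereRouting.Classify<Customer, Order>((x, c) => c.Amount > 5m));
        Console.WriteLine(WhereRouting.Classify<Customer, Order>((x, c) => x.Name == "Ann"));
    }
}
```

Failing loudly on the both-sides case mirrors the PR's stated preference for a clear error over a silently dropped filter.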

Both Select and Where walk the Inner chain (Inner and Inner.Inner)
the same way SingleValueMode / IsDistinct already do — re-linq sometimes
pushes a post-SelectMany Select onto a new usage because the element type
changed, so the operators don't always sit on Inner directly.

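Pinned limitations (clear-error, not silent)

  • Cross-side Where filters (touching both x.* and c.*) throw a clear
    BadLinqExpressionException rather than being silently mis-translated;
    a join-level WHERE clause is left for another PR.
  • Computed expressions on the projected members (z.Amount * 2) stay pinned
    under the existing "computed scalar projection" limitation that the
    upstream PR already locked in.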

Tests

11 new tests in Bug_4677_groupjoin_post_selectmany.cs:

  • object_projection_sum_of_inner_member / _of_outer_member / _min_and_max
    / _average_returns_double / _sum_over_left_join_ignores_unmatched
  • post_select_many_select_reduces_to_scalar_member
  • post_select_many_select_reshuffles_to_new_anonymous_type
  • post_select_many_where_filters_on_inner_member / _on_outer_member
  • post_select_many_where_then_sum_chains (both fixes in one chain)
  • post_select_many_where_touching_both_sides_throws_clear_error (limitation pin)

The 53 existing tests across Bug_groupjoin_distinct_and_scalar_projection,
Bug_distinct_count_over_selectmany, Bug_dateonly_timeonly_scalar_projection,
and the group_join_operator family stay green. Full LinqTests suite passes
on both net9.0 and net10.0 (1312 / 1313 — 1 unrelated pre-existing skip).

🤖 Generated with Claude Code

haefele-octoja and others added 6 commits June 7, 2026 15:27
…tion

Query<T>().Select(x => x.SomeDateOnly) -- and the TimeOnly and nullable
variants -- threw System.Text.Json "'-' is an invalid end of a number":
SelectorVisitor.ToScalar omitted DateOnly/TimeOnly from the scalar whitelist,
so the projection fell to DataSelectClause and JSON-deserialized the
quote-stripped `data ->> 'x'` text. Route them through NewScalarSelectClause
like DateTime/DateTimeOffset so the native date/time TypedLocator
(mt_immutable_date()/mt_immutable_time()) is read directly. MemberType is
already the non-nullable underlying type, so DateOnly?/TimeOnly? are covered.
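
The failure mode described above can be reproduced without Marten. The sketch below assumes .NET 7 or later (where System.Text.Json supports `DateOnly`) and uses a made-up date value; `DateOnlyJsonDemo` is a hypothetical name. It shows only that unquoted date text is not valid JSON while the quoted form parses.

```csharp
using System;
using System.Text.Json;

public static class DateOnlyJsonDemo
{
    // Unquoted date text, comparable to what `data ->> 'x'` returns, is not valid JSON.
    public static bool UnquotedTextThrows()
    {
        try
        {
            JsonSerializer.Deserialize<DateOnly>("2026-06-07");
            return false;
        }
        catch (JsonException)
        {
            return true;
        }
    }

    // The same value as a JSON string deserializes normally.
    public static DateOnly ParseQuoted() =>
        JsonSerializer.Deserialize<DateOnly>("\"2026-06-07\"");
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(DateOnlyJsonDemo.UnquotedTextThrows());
        Console.WriteLine(DateOnlyJsonDemo.ParseQuoted().Year);
    }
}
```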

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Query<T>().SelectMany(x => x.Children).Select(c => c.Value).Distinct().Count()
threw NotSupportedException "The database operator 'DISTINCT' cannot be used with
non-simple types" even though the equivalent Distinct().ToList() worked.
BuildSelectManyStatement re-applied DISTINCT to the count clause that
ProcessSingleValueModeIfAny had already produced, and the Inner-merged IsDistinct
flag never reached the count path -- so when the throw was avoided a plain
non-distinct count was produced instead. Skip the tail DISTINCT operator for the
Count/LongCount modes (already handled as count(*) over a DISTINCT CTE) and re-sync
IsDistinct onto the statement for those modes so the distinct projection is the
thing being counted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
GroupJoin(...).SelectMany(...) had two gaps:
  - Distinct() was silently dropped over the join. JoinSelectClause was neither
    IScalarSelectClause nor ICountClause, so neither distinct branch in
    ProcessSingleValueModeIfAny matched: Distinct().Count() returned the full joined
    row count, and Distinct().ToList() returned non-distinct rows.
  - A bare scalar result selector ((x, t) => x.l.Id) built an empty NewObject and threw
    "Sequence contains no elements" at materialization -- JoinSelectParser only handled
    new {...} / MemberInit object projections.

Fix:
  - JoinSelectClause implements IScalarSelectClause and renders DISTINCT(<projection>)
    when the DISTINCT operator is applied, reusing the standard distinct / count-over-CTE
    machinery for both materialized Distinct() and Distinct().Count()/LongCount().
  - CompileGroupJoin transfers IsDistinct from the SelectMany usage chain onto the join
    statement before ProcessSingleValueModeIfAny.
  - JoinSelectParser renders a bare scalar result selector as to_jsonb(<scalar>) so the
    existing SerializationSelector<T> deserializes it and DISTINCT applies unchanged.
  - The join selector returns default for a null "data" column (a bare scalar whose value
    is null, e.g. a left join projecting an inner member of an unmatched row) rather than
    throwing, matching Marten's scalar-select convention.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Coverage review additions (tests only):
- GroupJoin: dedup correctness (a multi-field object Distinct() must keep rows that
  differ in one field -- not over-merge); bare scalar projections across string,
  decimal, enum and DateOnly (to_jsonb round-trip); and a clear-error test for a
  non-translatable computed scalar body (BadLinqExpressionException).
- DateOnly/TimeOnly projection: a Newtonsoft serializer variant (the fix is
  serializer-agnostic) and Distinct() over the projected value.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A bare scalar projection over a join -- (x, c) => c.Amount -- could already be
materialized, distinct'd and counted, but Sum/Min/Max/Average threw NotSupportedException:
the scalar is rendered as to_jsonb(...) so it can be deserialized/deduped, and Postgres
has no sum/min/max/avg aggregate for jsonb.

Render the raw (un-wrapped) scalar into the join CTE and put a standard scalar select
clause (NewScalarSelectClause / NewScalarStringSelectClause) over it, so the existing
aggregate machinery handles everything -- result types (SUM(int)->bigint, AVG->double via
CloneToDouble), enums, nullable scalars (the clause is built over the underlying type and
also implements ISelector<T?>), and null inner rows from a left join (SUM ignores them).

Object-projection aggregates remain unsupported (there is no single column to aggregate)
and throw a clear error pointing at the bare-scalar shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ct/Where

Addresses the two "Known limitations" that PR #4677 explicitly left open:

1. **Aggregates over an *object* projection.**
   `.SelectMany(.., (x, c) => new {..}).Sum(z => z.X)` (which QueryableExtensions
   lowers to `.Select(z => z.X).SumAsync()`) used to hit
   `JoinSelectClause.ApplyOperator("SUM")` on the object projection and throw.

2. **`.Select(...)` / `.Where(...)` *after* the join's `.SelectMany(...)`.**
   `CompileGroupJoin` ignored both: a post-SelectMany Select returned the original
   anon-typed rows and a post-SelectMany Where was silently dropped.

The fix is a single seam in `CompileGroupJoin`: a new `AnonProjectionExpander`
visitor walks the FlattenedResultSelector's `(x, c) => new { Member = source }`
bindings and rewrites a `z.Member` access (on a parameter of the projection's
anon type) back to its source expression in (x, c) terms. With the expander in
hand `CompileGroupJoin` reduces a post-SelectMany Select into a synthesized
effective result selector that `JoinSelectParser` then renders normally —
including lighting up the bare-scalar aggregate path that already supports
Sum/Min/Max/Average. Post-SelectMany Where filters get expanded the same way
and routed onto the appropriate CTE's existing Where pipeline (inner-side onto
`InnerCollectionUsage.WhereExpressions` before its `ParseWhereClause` runs,
outer-side onto `outerStatement.ParseWhereClause` after the fact with
`storage=null` so we don't double-apply the tenant / soft-delete defaults).

Cross-side Where filters (touching both x.* and c.*) are pinned as a clear
`BadLinqExpressionException` rather than silently mis-translated — that's a
join-level WHERE clause feature for another PR if anyone hits it. Computed
expressions on the projected members (`z.Amount * 2`) also stay pinned: that's
the existing "computed scalar projection" limitation the upstream PR already
locked in.

11 new tests in `Bug_4677_groupjoin_post_selectmany.cs` cover:
* object-projection Sum / Min / Max / Average over inner and outer members,
  decimal scalars, and left-join with NULL inner rows (SQL SUM ignores NULL);
* post-SelectMany Select reducing to a scalar member, and reshuffling to a new
  anonymous type (without compute);
* post-SelectMany Where on the inner side, the outer side, and chained with Sum;
* the both-sides Where clear-error pin.

Existing 53 tests across `Bug_groupjoin_distinct_and_scalar_projection`,
`Bug_distinct_count_over_selectmany`, `Bug_dateonly_timeonly_scalar_projection`,
and the `group_join_operator` family stay green. Full `LinqTests` suite passes
on both net9.0 and net10.0 (1312 / 1313 — 1 unrelated pre-existing skip).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jeremydmiller jeremydmiller merged commit f4166f6 into master Jun 8, 2026
8 checks passed
@jeremydmiller jeremydmiller deleted the feature/4677-followup-object-aggregate-select-where branch June 8, 2026 11:27