fix: Array.sample() fails for List and nested Array inner types#279

Draft
Copilot wants to merge 3 commits into main from copilot/fix-sampling-array-list-dtype

Copilot AI commented Feb 17, 2026

Array.sample() crashes when the inner type is List or nested Array:

import dataframely as dy

class TestSchema(dy.Schema):
    a = dy.Array(dy.List(dy.Bool()), 2)

TestSchema.sample(1, generator=dy.random.Generator(0))
# InvalidOperationError: cannot reshape array of size 49 into shape (1, 2)

Root Cause

Series.reshape() checks the target shape against the number of primitive (leaf) values in the Series, not the number of top-level elements. When sampling Array(List(Bool()), 2):

  • Samples 2 Lists (each containing variable-length Bool sequences)
  • Attempts reshape((1, 2)) on a Series containing 2 List objects
  • Polars counts all nested Bool elements (e.g., 49) instead of the 2 List containers
  • Reshape fails with size mismatch
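
The size mismatch described above can be modeled without Polars. The following is a plain-Python sketch, not the library's code: `sampled` is a made-up stand-in for the two sampled lists, and `count_leaves` is a hypothetical helper introduced only for illustration.

```python
from math import prod


def count_leaves(value):
    """Count primitive values, descending into nested lists."""
    if isinstance(value, list):
        return sum(count_leaves(v) for v in value)
    return 1


# Two variable-length lists standing in for the sampled List(Bool) elements.
sampled = [[True, False, True], [False]]
target_shape = (1, 2)

containers = len(sampled)       # number of top-level List elements
leaves = count_leaves(sampled)  # number of nested Bool values

# The shape matches when counting containers, but not when counting leaves.
fits_by_container = containers == prod(target_shape)
fits_by_leaf = leaves == prod(target_shape)
```

In this model the container count matches the requested shape while the leaf count does not, which is the same kind of mismatch the error message reports (49 values versus a shape of 1 x 2).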

Changes

  • Array._sample_unchecked: Branch on isinstance(inner_dtype, (pl.List, pl.Array))

    • Complex types: Manual chunking via slice() + to_list(), construct with explicit dtype
    • Other types: Existing reshape() path (scalars, Struct)
  • Tests: Parametrized coverage for Array(List(...)) and Array(Array(...))
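
The manual chunking idea can be sketched in plain Python. This is only an illustration of the approach under the assumption that the sampled inner elements arrive as one flat sequence; the actual change operates on Polars Series via slice() and to_list(), and `chunk_rows` is a hypothetical name.

```python
def chunk_rows(elements, width):
    """Group a flat list of sampled inner elements into rows of `width`."""
    if width <= 0:
        raise ValueError("width must be positive")
    if len(elements) % width != 0:
        raise ValueError("element count must be a multiple of width")
    # Each slice becomes one Array value; inner lists are kept intact.
    return [elements[i:i + width] for i in range(0, len(elements), width)]


# Four variable-length inner lists grouped into two rows of width 2.
flat = [[True], [False, True], [], [True, True, False]]
rows = chunk_rows(flat, 2)
```

Because the grouping counts containers rather than nested values, the variable lengths of the inner lists do not affect the result. The rows would then be passed to a Series constructor with an explicit dtype, as the change description states.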

Original prompt

This section details the original issue you should resolve

<issue_title>Sampling fails for columns of dtype Array(List(...), ...)</issue_title>
<issue_description>Consider:

def test_sample_array_list(arr_size, n_samples):
    class TestSchema(dy.Schema):
        a = dy.Array(dy.List(dy.Bool()), arr_size)
    TestSchema.sample(n_samples, generator=dy.random.Generator(0))

On dataframely==2.3.1:

  • calling test_sample_array_list(2, 1)
    fails with polars.exceptions.InvalidOperationError: cannot reshape array of size 49 into shape (1, 2)
  • and calling test_sample_array_list(1, 1)
    fails with polars.exceptions.SchemaError: type Boolean is incompatible with expected type List(Boolean)

I'd expect both to pass.</issue_description>

<agent_instructions>Fix the issue with as few changes as possible</agent_instructions>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits February 17, 2026 14:37
Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix sampling issue for columns of dtype Array(List(...))" to "fix: Array.sample() fails for List and nested Array inner types" Feb 17, 2026
Copilot AI requested a review from borchero February 17, 2026 14:49
Development

Successfully merging this pull request may close these issues.

Sampling fails for columns of dtype Array(List(...), ...)
