[Python][Parquet] timestamp[s] does not round-trip parquet serialization. #41382

@randolf-scholz

Describe the bug, including details regarding any error messages, version, and platform.

Timestamps with second resolution are upcast to millisecond resolution after being written to Parquet and read back. They should either round-trip unchanged, or pyarrow should emit a warning or error when asked to serialize them.

from datetime import datetime
import pyarrow as pa
from pyarrow import parquet

dates = [
    datetime(2021, 1, 1, 0, 0, 3),
    datetime(2021, 1, 1, 0, 0, 4),
    datetime(2021, 1, 1, 0, 0, 5),
]

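# Build a table with a second-resolution timestamp column.
table = pa.table({"time": pa.array(dates, type=pa.timestamp("s"))})
print(table.schema)  # time: timestamp[s]

# Write to Parquet and read back.
parquet.write_table(table, "timestamp_roundtrip.parquet")
table2 = parquet.read_table("timestamp_roundtrip.parquet")
print(table2.schema)  # time: timestamp[ms], expected timestamp[s]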

Tested with pyarrow 16.0.0

Component(s)

Parquet
