Using multi-timestamp causes loss of _source and other metadata fields #218

@Jacob-gr

Description

When using rdump --multi-timestamp, metadata fields (_source, _classification, _generated) are set to null or modified on all output records.

Disclosure: I used Claude to help explore and diagnose this bug.

To Replicate

Run the same query through rdump --record-classification TEST once without and once with --multi-timestamp, then compare the outputs. In the second output, _source and _classification appear as null.

Example Output (Truncated)

target-query /some/image -f userassist | rdump --record-classification TEST -w jsonfile://userassist.jsonl

{"hostname": "DESKTOP-KLOQJ0V", "ts": "2017-01-27T00:33:27.668032+00:00", "_source": "/home/user/test-images/HD1.E01", "_classification": "TEST", "_generated": "2026-03-26T19:05:01.081233+00:00", "_version": 1, "_type": "record", "_recorddescriptor": ["windows/registry/userassist", 2720671085]}

target-query /some/image -f userassist | rdump --multi-timestamp --record-classification TEST -w jsonfile://userassist.jsonl

{"ts": "2017-01-27T00:33:27.668032+00:00", "ts_description": "ts", "hostname": "DESKTOP-KLOQJ0V", "_source": null, "_classification": null, "_generated": "2026-03-26T19:08:18.288666+00:00", "_version": 1, "_type": "record", "_recorddescriptor": ["windows/registry/userassist", 1909757769]}

Bug

Looking into the problem, it appears the issue lies here:

TimestampRecord = RecordDescriptor(
    "record/timestamp",
    [
        ("datetime", "ts"),
        ("string", "ts_description"),
    ],
)


def iter_timestamped_records(record: Record) -> Iterator[Record]:
    """Yields timestamped annotated records for each ``datetime`` fieldtype in ``record``.
    If ``record`` does not have any ``datetime`` fields the original record is returned.

    Args:
        record: Record to add timestamp fields for.

    Yields:
        Record annotated with ``ts`` and ``ts_description`` fields for each ``datetime`` fieldtype.
    """
    # get all ``datetime`` fields. (excluding _generated).
    dt_fields = record._desc.getfields("datetime")
    if not dt_fields:
        yield record
        return

    # yield a new record for each ``datetime`` field assigned as ``ts``.
    record_name = record._desc.name
    for field in dt_fields:
        ts_record = TimestampRecord(getattr(record, field.name), field.name)
        # we extend ``ts_record`` with original ``record`` so TSRecord info goes first.
        record = extend_record(ts_record, [record], name=record_name)
        yield record

iter_timestamped_records() creates a new TimestampRecord and merges it with the original record using extend_record() on line 1126.

Because ts_record is a freshly created record built from the TimestampRecord descriptor, its metadata fields default to None. During the merge, extend_record() appears to keep the first value it finds for each key. Since ts_record is passed first, its None values take precedence over the original record's real metadata.
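To illustrate the "first value wins" behavior described above, here is a minimal sketch that uses plain dicts and collections.ChainMap as a stand-in. It is only a simplified model of the shadowing effect, not flow.record's actual extend_record() implementation; the dict contents are taken from the example output.

```python
from collections import ChainMap

# Hypothetical stand-ins for the two records (plain dicts, not flow.record objects).
ts_record = {
    "ts": "2017-01-27T00:33:27.668032+00:00",
    "ts_description": "ts",
    "_source": None,          # freshly created record: metadata defaults to None
    "_classification": None,
}
original = {
    "hostname": "DESKTOP-KLOQJ0V",
    "ts": "2017-01-27T00:33:27.668032+00:00",
    "_source": "/home/user/test-images/HD1.E01",
    "_classification": "TEST",
}

# ChainMap searches its maps left to right and returns the first match,
# even when that first match is None.
merged = dict(ChainMap(ts_record, original))
print(merged["_source"])          # None
print(merged["_classification"])  # None
print(merged["hostname"])         # DESKTOP-KLOQJ0V

# Copying the metadata onto the first mapping before merging avoids the shadowing.
patched = {
    **ts_record,
    "_source": original["_source"],
    "_classification": original["_classification"],
}
fixed = dict(ChainMap(patched, original))
print(fixed["_source"])           # /home/user/test-images/HD1.E01
```

Keys that exist only in the original (such as hostname) survive the merge, which matches the observed output where only the metadata fields are lost.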

This probably also produces a new _generated timestamp, which may be appropriate since it reflects when that specific record was generated. However, because these records are derived from an original record, I would personally prefer that they all keep the original _generated timestamp. I'm unsure what the desired behavior is, though.

There is also a small scoping issue: record is rebound on each iteration. record is already the function's parameter, so to avoid it being inadvertently extended on each subsequent pass through dt_fields, the extended record should be assigned to a new variable.
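The rebinding problem can be shown with a toy model in which a record is just a list of field names and "extending" is plain concatenation. This is an assumption made for illustration: the field names created and modified are hypothetical, and the real extend_record() may deduplicate fields, so the visible effect in flow.record could differ.

```python
from typing import Iterator


def buggy(record: list[str], dt_fields: list[str]) -> Iterator[list[str]]:
    for _ in dt_fields:
        # Rebinding the parameter: the next iteration starts from the
        # already-extended value instead of the original input.
        record = ["ts", "ts_description"] + record
        yield record


def fixed(record: list[str], dt_fields: list[str]) -> Iterator[list[str]]:
    for _ in dt_fields:
        # A separate name keeps ``record`` pointing at the original input.
        result = ["ts", "ts_description"] + record
        yield result


fields = ["hostname", "created", "modified"]
dt_fields = ["created", "modified"]

print([len(r) for r in buggy(fields, dt_fields)])  # [5, 7]
print([len(r) for r in fixed(fields, dt_fields)])  # [5, 5]
```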

Suggested Fix

Explicitly mapping the metadata fields after creation of the TimestampRecord seems like the most straightforward approach. It would have to be modified if any metadata fields are added in the future, but does allow more control if, for example, the _generated field should not be carried over from the original record.

Additionally, tests should be created that check if the desired metadata fields are preserved.

    record_name = record._desc.name
    for field in dt_fields:
        ts_record = TimestampRecord(getattr(record, field.name), field.name)
        # Preserve metadata from the original record so it isn't shadowed by
        # the newly-created TimestampRecord's None defaults.
        ts_record._source = record._source
        ts_record._classification = record._classification
        ts_record._generated = record._generated
        # we extend ``ts_record`` with original ``record`` so TSRecord info goes first.
        result = extend_record(ts_record, [record], name=record_name)
        yield result
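For the tests mentioned above, one possible starting point is a check on the JSON lines output that flags records whose metadata came out as null. The helper name null_metadata and the choice of fields are assumptions for this sketch, and the sample lines are shortened versions of the example output. A real test in the project would more likely call iter_timestamped_records() directly.

```python
import json

# Fields expected to survive --multi-timestamp (an assumption for this sketch).
METADATA_FIELDS = ("_source", "_classification")


def null_metadata(jsonl: str, fields=METADATA_FIELDS) -> list[tuple[int, str]]:
    """Return (line number, field) pairs where a metadata field is missing or null."""
    problems = []
    for line_no, line in enumerate(jsonl.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        for name in fields:
            if rec.get(name) is None:
                problems.append((line_no, name))
    return problems


good = '{"hostname": "DESKTOP-KLOQJ0V", "_source": "/home/user/test-images/HD1.E01", "_classification": "TEST"}'
bad = '{"hostname": "DESKTOP-KLOQJ0V", "_source": null, "_classification": null}'

print(null_metadata(good))  # []
print(null_metadata(bad))   # [(1, '_source'), (1, '_classification')]
```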
