Description
When using rdump --multi-timestamp, metadata fields (_source, _classification, _generated) are set to null or modified on all output records.
Disclosure: I used Claude to help explore and diagnose this bug.
To Replicate
Run rdump --record-classification TEST once without and once with --multi-timestamp, then compare the outputs. With --multi-timestamp, _source and _classification appear as null.
Example Output (Truncated)
target-query /some/image -f userassist | rdump --record-classification TEST -w jsonfile://userassist.jsonl
{"hostname": "DESKTOP-KLOQJ0V", "ts": "2017-01-27T00:33:27.668032+00:00", "_source": "/home/user/test-images/HD1.E01", "_classification": "TEST", "_generated": "2026-03-26T19:05:01.081233+00:00", "_version": 1, "_type": "record", "_recorddescriptor": ["windows/registry/userassist", 2720671085]}
target-query /some/image -f userassist | rdump --multi-timestamp --record-classification TEST -w jsonfile://userassist.jsonl
{"ts": "2017-01-27T00:33:27.668032+00:00", "ts_description": "ts", "hostname": "DESKTOP-KLOQJ0V", "_source": null, "_classification": null, "_generated": "2026-03-26T19:08:18.288666+00:00", "_version": 1, "_type": "record", "_recorddescriptor": ["windows/registry/userassist", 1909757769]}
Bug
Looking into the problem, it appears the issue lies here:
TimestampRecord = RecordDescriptor(
    "record/timestamp",
    [
        ("datetime", "ts"),
        ("string", "ts_description"),
    ],
)


def iter_timestamped_records(record: Record) -> Iterator[Record]:
    """Yields timestamped annotated records for each ``datetime`` fieldtype in ``record``.
    If ``record`` does not have any ``datetime`` fields the original record is returned.

    Args:
        record: Record to add timestamp fields for.

    Yields:
        Record annotated with ``ts`` and ``ts_description`` fields for each ``datetime`` fieldtype.
    """

    # get all ``datetime`` fields. (excluding _generated).
    dt_fields = record._desc.getfields("datetime")
    if not dt_fields:
        yield record
        return

    # yield a new record for each ``datetime`` field assigned as ``ts``.
    record_name = record._desc.name
    for field in dt_fields:
        ts_record = TimestampRecord(getattr(record, field.name), field.name)
        # we extend ``ts_record`` with original ``record`` so TSRecord info goes first.
        record = extend_record(ts_record, [record], name=record_name)
        yield record
iter_timestamped_records() creates a new TimestampRecord and merges it with the original record using extend_record() on line 1126.
Because the TimestampRecord instance is newly created, its metadata fields (_source, _classification) default to None. During the merge, extend_record() appears to keep the first value it finds for each field name. Because the new TimestampRecord is passed first, its None values take precedence over the original record's real metadata.
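The shadowing described above can be illustrated with a toy model. This is only a sketch that represents records as plain dicts; merge_first_wins is a hypothetical stand-in for the "first value wins" behavior and is not the actual extend_record() implementation.

```python
from __future__ import annotations

from typing import Any


def merge_first_wins(*records: dict[str, Any]) -> dict[str, Any]:
    """Toy merge: for each key, keep the value from the first record that has it."""
    merged: dict[str, Any] = {}
    for rec in records:
        for key, value in rec.items():
            # setdefault keeps an existing entry, even when that entry is None.
            merged.setdefault(key, value)
    return merged


# The original record carries real metadata.
original = {
    "hostname": "DESKTOP-KLOQJ0V",
    "_source": "/home/user/test-images/HD1.E01",
    "_classification": "TEST",
}

# The freshly created timestamp record has metadata keys that default to None.
ts_record = {
    "ts": "2017-01-27T00:33:27.668032+00:00",
    "ts_description": "ts",
    "_source": None,
    "_classification": None,
}

# ts_record is passed first, so its None metadata shadows the real values.
merged = merge_first_wins(ts_record, original)
print(merged["_source"], merged["_classification"])  # prints: None None
```

In this model, passing the original record first would preserve the metadata, but it would also change the field order, which the comment in the quoted code indicates is deliberate.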
This probably also causes a new _generated timestamp to be produced, which may be appropriate since it reflects when that specific record was generated. However, because these records are derived from an original record, I would personally prefer that they all keep the original _generated timestamp. I'm unsure what the desired behavior is.
There is also a small scoping issue: record is the function parameter, and it is reassigned on every iteration of the loop. To avoid record being inadvertently extended again on each subsequent pass over dt_fields, the extended record should be assigned to a new variable.
Suggested Fix
Explicitly mapping the metadata fields after creation of the TimestampRecord seems like the most straightforward approach. It would have to be modified if any metadata fields are added in the future, but does allow more control if, for example, the _generated field should not be carried over from the original record.
Additionally, tests should be created that check if the desired metadata fields are preserved.
record_name = record._desc.name
for field in dt_fields:
    ts_record = TimestampRecord(getattr(record, field.name), field.name)
    # Preserve metadata from the original record so it isn't shadowed by
    # the newly-created TimestampRecord's None defaults.
    ts_record._source = record._source
    ts_record._classification = record._classification
    ts_record._generated = record._generated
    # we extend ``ts_record`` with original ``record`` so TSRecord info goes first.
    result = extend_record(ts_record, [record], name=record_name)
    yield result
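A test for the preserved metadata could take roughly the following shape. This is a sketch on a stand-in model that represents records as plain dicts; it does not use the flow.record API. The names merge_first_wins, iter_timestamped, and METADATA_FIELDS are hypothetical, and the field names and timestamps in the sample record are illustrative only.

```python
from __future__ import annotations

from typing import Any, Iterator

# Assumption: these are the metadata fields that should survive the merge.
METADATA_FIELDS = ("_source", "_classification", "_generated")


def merge_first_wins(*records: dict[str, Any]) -> dict[str, Any]:
    """Toy merge: for each key, keep the value from the first record that has it."""
    merged: dict[str, Any] = {}
    for rec in records:
        for key, value in rec.items():
            merged.setdefault(key, value)
    return merged


def iter_timestamped(record: dict[str, Any], dt_fields: list[str]) -> Iterator[dict[str, Any]]:
    """Toy version of the suggested fix: copy metadata, never rebind ``record``."""
    for name in dt_fields:
        ts_record: dict[str, Any] = {"ts": record[name], "ts_description": name}
        # Copy metadata explicitly so the merge cannot shadow it with None.
        for meta in METADATA_FIELDS:
            ts_record[meta] = record[meta]
        # Assign to a new name so the input record is not extended per iteration.
        result = merge_first_wins(ts_record, record)
        yield result


original = {
    "created": "2017-01-27T00:33:27+00:00",
    "modified": "2017-02-01T10:00:00+00:00",
    "hostname": "DESKTOP-KLOQJ0V",
    "_source": "/home/user/test-images/HD1.E01",
    "_classification": "TEST",
    "_generated": "2026-03-26T19:05:01+00:00",
}

derived = list(iter_timestamped(original, ["created", "modified"]))

# Every derived record should carry the original metadata unchanged.
for rec in derived:
    for meta in METADATA_FIELDS:
        assert rec[meta] == original[meta]

# One derived record per datetime field, in field order.
assert [rec["ts_description"] for rec in derived] == ["created", "modified"]
```

A real test would build the record from a RecordDescriptor and call iter_timestamped_records() instead. Whether _generated belongs in the asserted set depends on the behavior the maintainers choose.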
Code reference for the snippet quoted under Bug: flow.record/flow/record/base.py, lines 1097 to 1129 in a594c19.