Convert py123d arrow datasets to PufferDrive .bin files.
raw dataset -> [123D] -> .arrow -> [123Drive] -> .bin
123Drive is uv-only. Use Python 3.11-3.13 from a local git checkout.
Pick the extra that matches your workflow:
- uv sync: dataset conversion (base dependencies)
- uv sync --extra viz: browser viewer
- uv sync --extra all: everything
uv sync without extras installs only the minimal base package.
Convert only:
uv sync
uv run convert --py123d_path /data/123d --output ./output

Inspect existing .bin output in the browser:
uv sync --extra viz
uv run web --dir ./output

Open http://localhost:8080.
- convert: py123d output root (logs/ + maps/) -> PufferDrive .bin
- mapforge: affine variants of static map .bin files
- build: build Docker images for the extraction/conversion pipeline
- web: browser viewer for .bin files
Basic use:
uv run convert --py123d_path /path/to/123d --output ./output

Output files are named from dataset + scenario identity (for example nuplan__<scenario>.bin).
Mental model:
py123d output root -> load scene/map -> extract PufferScenario -> transforms -> serialize -> .bin
Examples:
# Parallel conversion
uv run convert --py123d_path /path/to/123d --output ./output --workers 8
# Filter datasets / splits / logs
uv run convert --py123d_path /path/to/123d --output ./output \
--datasets nuplan --split_types val --num_scenes 100
# Route filtering knobs
uv run convert --py123d_path /path/to/123d --output ./output \
--min_route_valid_points 10 --route_check_timestep 5
# Map-only conversion
uv run convert --py123d_path /path/to/123d --output ./output --map_only

Presets bundle the right defaults to reproduce a dataset with one command, and
are the recommended way to convert any dataset. Each pins a dataset family; pick
a split on top with --split_names. Explicit CLI flags always override preset
values. Presets are defined in src/bin_factory/presets.toml.
nuPlan: always convert via --preset nuplan (or set --duration_s explicitly). Raw nuPlan logs span minutes; loading one without trimming the duration can exhaust RAM. The preset pins --duration_s 20 for you.
uv run convert --preset nuplan --py123d_path /path/to/123d --output ./output
# narrow to a split / override anything inline
uv run convert --preset nuplan --split_names nuplan-mini_val \
--py123d_path /path/to/123d --output ./output --num_scenes 100

Core flags:
| Flag | Default | Description |
|---|---|---|
| --preset | none | Apply a dataset preset (av2/carla/nuplan/nuscenes/opendrive/wod-motion) |
| --py123d_path | PY123D_DATA_ROOT or required | Path to 123D dataset with logs/ and maps/ |
| --output | ./output | Directory for .bin files |
| --workers | 0 | Parallel workers (0 = 80% of CPU cores) |
| --chunk_target_scenes | 10000 | Scenarios per worker dispatch batch |
| --validate_level | 1 | Validation strictness |
| --log_level | INFO | Root logging level (DEBUG/INFO/WARNING/ERROR/CRITICAL) |
Failures are written to failures.jsonl under --output.
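
After a large run it can help to summarize failures.jsonl. The sketch below assumes only that the file is JSON Lines (one JSON object per line). The record schema is not documented here, so the `error` key and the demo records are hypothetical; pass whichever key your records actually contain.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_failures(path, key="error"):
    """Count records in a JSON Lines file, grouped by `key` when present."""
    counts = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[str(record.get(key, "<unknown>"))] += 1
    return counts


# Demo on synthetic records (hypothetical fields, not real converter output).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "failures.jsonl"
    demo.write_text(
        '{"scene": "a", "error": "ValueError"}\n'
        '{"scene": "b", "error": "ValueError"}\n'
        '{"scene": "c"}\n',
        encoding="utf-8",
    )
    summary = summarize_failures(demo)

print(summary.most_common())
```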
Filtering flags:
| Flag | Default | Description |
|---|---|---|
| --num_scenes | all | Limit number of scenarios |
| --datasets | all | Dataset names to include |
| --split_types | all | Split types to include |
| --split_names | all | Split names to include |
| --log_names | all | Specific log names to include |
| --scene_uuids | all | Specific scene UUIDs to include (debugging) |
| --duration_s | 0 | Scenario duration in seconds, 0 = full |
| --map_only | off | Load map-only scenarios |
Geometry + route flags:
| Flag | Default | Description |
|---|---|---|
| --max_segment_length | 10.0 | Max segment length for polyline interpolation |
| --area_threshold | 0.1 | Polyline simplification threshold, 0 = off |
| --min_route_valid_points | 0.0 | Min valid trajectory percentage for route computation (0-100) |
| --route_check_timestep | 0 | Timestep that must be valid for route computation |
| --no_reindex | off | Skip reindexing element IDs to contiguous range(0, n) |
| --interpolate_tl | off | Interpolate traffic light states from vehicle trajectories |
| --invalid_agent_overlap | off | Zero out log-only agents whose bbox overlaps an active agent during replay |
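
To make --max_segment_length concrete, here is one plausible reading of "max segment length for polyline interpolation": subdivide each segment evenly so that no piece exceeds the limit. The `densify` function is a hypothetical illustration and not the converter's actual code, which may interpolate differently.

```python
import math


def densify(points, max_segment_length):
    """Insert evenly spaced points so no segment exceeds max_segment_length."""
    if max_segment_length <= 0:
        raise ValueError("max_segment_length must be positive")
    if not points:
        return []
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Number of equal pieces needed to stay within the limit.
        pieces = max(1, math.ceil(length / max_segment_length))
        for i in range(1, pieces + 1):
            t = i / pieces
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


# A 25 m segment with a 10 m limit is split into three pieces of about 8.33 m.
dense = densify([(0.0, 0.0), (25.0, 0.0)], max_segment_length=10.0)
print(dense)
```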
Validation levels:
| Level | Behavior |
|---|---|
| 0 | Skip validation |
| 1 | Schema checks: required keys, container types, array shapes, and length consistency |
| 2 | Semantic checks: schema plus topology refs, finite values, valid traffic-light states, and ego-only temporal sanity |
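
The tiers stack: level 2 runs everything level 1 does, then adds semantic checks. The sketch below shows that layering on a toy scenario. The dictionary layout, the `REQUIRED_KEYS` tuple, and the `validate` function are all hypothetical; the real checks operate on the converter's own scenario structure and cover more than is shown here.

```python
import math

REQUIRED_KEYS = ("agents", "roads")  # hypothetical key names


def validate(scenario, level=1):
    """Return a list of problems; an empty list means the scenario passed."""
    if level <= 0:
        return []  # level 0: skip validation entirely
    problems = []
    # Level 1: schema checks (required keys, container types).
    for key in REQUIRED_KEYS:
        if key not in scenario:
            problems.append(f"missing key: {key}")
        elif not isinstance(scenario[key], list):
            problems.append(f"{key} must be a list")
    if problems:
        return problems
    # Level 1: length consistency between per-agent arrays.
    for index, agent in enumerate(scenario["agents"]):
        if len(agent.get("x", [])) != len(agent.get("y", [])):
            problems.append(f"agent {index}: x/y length mismatch")
    # Level 2: semantic checks run only on schema-valid input.
    if level >= 2 and not problems:
        for index, agent in enumerate(scenario["agents"]):
            values = list(agent.get("x", [])) + list(agent.get("y", []))
            if not all(math.isfinite(v) for v in values):
                problems.append(f"agent {index}: non-finite coordinate")
    return problems


scenario = {"agents": [{"x": [0.0, float("nan")], "y": [0.0, 1.0]}], "roads": []}
print(validate(scenario, level=1))
print(validate(scenario, level=2))
```

The toy scenario passes level 1 because its shape is correct, and fails level 2 because one coordinate is NaN.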
The mapforge CLI generates affine-transformed variants of static map .bin files.
Transforms are grouped into families: scale, shear, flip (the catalog lives in
src/mapforge/affine.py). Pick families with --groups (default: all). Original maps
are always copied alongside the variants.
# All groups (scale + shear + flip)
uv run mapforge --input_dir data/static_maps --output_dir data/static_maps_aug
# Only specific groups
uv run mapforge --groups flip --input_dir data/static_maps --output_dir data/static_maps_flip
uv run mapforge --groups scale shear --input_dir data/static_maps --output_dir data/static_maps_warp

| Flag | Default | Description |
|---|---|---|
| --groups | all groups | Subset of families to run (scale/shear/flip) |
| --input_dir | required | Directory of source .bin maps |
| --output_dir | required | Directory for augmented .bin files |
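
For intuition about the three transform families, the sketch below applies a 2x3 affine matrix to 2D points. The matrices and the `apply_affine` helper are illustrative assumptions; the actual catalog and parameter values live in src/mapforge/affine.py and may differ.

```python
def apply_affine(points, matrix):
    """Apply a 2x3 affine matrix ((a, b, tx), (c, d, ty)) to (x, y) points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# One illustrative matrix per family (values are assumptions, not the catalog).
SCALE_X2 = ((2.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # stretch x by a factor of 2
SHEAR_X = ((1.0, 0.5, 0.0), (0.0, 1.0, 0.0))   # shift x in proportion to y
FLIP_Y = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0))   # mirror across the x axis

lane = [(3.0, 2.0)]
scaled = apply_affine(lane, SCALE_X2)
sheared = apply_affine(lane, SHEAR_X)
flipped = apply_affine(lane, FLIP_Y)
print(scaled, sheared, flipped)
```

A flip has a negative determinant, so it reverses orientation: the left and right sides of a lane swap in the mirrored map.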
uv sync --extra viz
uv run web --dir ./output --port 8080

- browse .bin scenarios from a directory
- inspect map, agents, route, and traffic controls
- playback, follow-ego, selection, and layer toggles
# Build py123d image
uv run build py123d --dataset nuplan-mini
Images are portable - run them however you want (docker run, Kubernetes, etc.).
- py123d-<dataset> is an opinionated BEV-oriented extractor with raw sensors disabled
- 123drive:latest is a thin uv-backed runtime image, built from the current checkout, that forwards args directly to convert
- build requires Docker
- Dockerfiles require BuildKit because they use RUN --mount=type=cache

- Binary format: docs/binary-format.md
- Route search notes: docs/route-algorithm.md