The finetuning is implemented with the BEVDet framework. We also provide models trained with its advanced setups.
| Config | mAP | NDS | Download |
|---|---|---|---|
| bevdet-swinb-4d-256x704-cbgs | 33.98 | 47.19 | Model |
| bevdet-swinb-4d-256x704-cbgs-geomim | 42.25 | 53.1 | Model |
| bevdet-swinb-4d-stereo-256x704-cbgs-geomim | 45.33 | 55.1 | Model |
| bevdet-swinb-4d-stereo-512x1408-cbgs-geomim | 52.04 | 60.92 | Model |

Note that all results are obtained by aligning the previous frame's BEV feature during view transformation.

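
Aligning the previous frame's BEV feature presumably amounts to ego-motion compensation: content observed in the previous ego frame is re-expressed in the current ego frame. The sketch below shows only that coordinate change for a single 2D point. It assumes planar poses `(x, y, yaw)` in a shared global frame, and the function name `prev_to_current` is hypothetical; it is not BEVDet's implementation, which works on full camera/ego transforms and feature maps.

```python
import math


def prev_to_current(point_xy, prev_pose, cur_pose):
    """Map a 2D point from the previous ego frame into the current ego frame.

    Poses are (x, y, yaw) of the ego vehicle in a shared global frame,
    with yaw in radians (x forward, y left). This is an illustrative
    assumption, not the pose format used by BEVDet.
    """
    px, py = point_xy
    x0, y0, yaw0 = prev_pose
    x1, y1, yaw1 = cur_pose

    # Previous ego frame -> global frame (rotate, then translate).
    gx = x0 + math.cos(yaw0) * px - math.sin(yaw0) * py
    gy = y0 + math.sin(yaw0) * px + math.cos(yaw0) * py

    # Global frame -> current ego frame (inverse rigid transform).
    dx, dy = gx - x1, gy - y1
    cx = math.cos(yaw1) * dx + math.sin(yaw1) * dy
    cy = -math.sin(yaw1) * dx + math.cos(yaw1) * dy
    return cx, cy


if __name__ == "__main__":
    # The ego moved 1 m forward, so a point 2 m ahead is now 1 m ahead.
    print(prev_to_current((2.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
```

Applying the same transform to every BEV grid cell would tell you where to sample the previous feature map so that it lines up with the current one.
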
| Config | mAP | NDS | Latency(ms) | FPS | Model | Log |
|---|---|---|---|---|---|---|
| BEVDet-R50 | 28.3 | 35.0 | 29.1/4.2/33.3 | 30.7 | baidu | baidu |
| BEVDet-R50-CBGS | 31.3 | 39.8 | 28.9/4.3/33.2 | 30.1 | baidu | baidu |
| BEVDet-R50-4D-CBGS | 31.4/35.4# | 44.7/44.9# | 29.1/4.3/33.4 | 30.0 | baidu | baidu |
| BEVDet-R50-4D-Depth-CBGS | 36.1/36.2# | 48.3/48.4# | 35.7/4.0/39.7 | 25.2 | baidu | baidu |
| BEVDet-R50-4D-Stereo-CBGS | 38.2/38.4# | 49.9/50.0# | - | - | baidu | baidu |
| BEVDet-R50-4DLongterm-CBGS | 34.8/35.4# | 48.2/48.7# | 30.8/4.2/35.0 | 28.6 | baidu | baidu |
| BEVDet-R50-4DLongterm-Depth-CBGS | 39.4/39.9# | 51.5/51.9# | 38.4/4.0/42.4 | 23.6 | baidu | baidu |
| BEVDet-R50-4DLongterm-Stereo-CBGS | 41.1/41.5# | 52.3/52.7# | - | - | baidu | baidu |
| BEVDet-STBase-4D-Stereo-512x1408-CBGS | 47.2# | 57.6# | - | - | baidu | baidu |

\# Aligns the previous frame's BEV feature during view transformation.

Depth: depth supervised by LiDAR, as in BEVDepth.

Longterm: concatenates 8 history frames in temporal modeling (1 by default).

Stereo: a private implementation that concatenates the cost volume with the image feature before executing `model.view_transformer.depth_net`.

The latency column reports Network/Post-Processing/Total. Training without CBGS is deprecated.

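
As a reading aid for the latency column, the sketch below parses one `Network/Post-Processing/Total` cell and derives a frame rate from it. The relationship `FPS = 1000 / total latency in ms` is inferred from the table (it matches most rows after rounding) rather than stated in the source, and the helper name `summarize_latency` is hypothetical.

```python
def summarize_latency(cell):
    """Parse a 'network/post-processing/total' latency cell given in milliseconds."""
    network_ms, post_ms, total_ms = (float(part) for part in cell.split("/"))
    return {
        "network_ms": network_ms,
        "post_ms": post_ms,
        "total_ms": total_ms,
        # Network + post-processing should add up to the reported total.
        "parts_sum_ms": round(network_ms + post_ms, 1),
        # Assumption: FPS is the reciprocal of the total latency.
        "fps": round(1000.0 / total_ms, 1),
    }


if __name__ == "__main__":
    # Latency cell of the BEVDet-R50-CBGS row.
    print(summarize_latency("28.9/4.3/33.2"))
```

For the BEVDet-R50-CBGS row this reproduces the listed 30.1 FPS.
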
| Config | mIoU | Download |
|---|---|---|
| bevdet-occ-swinb-4d-stereo-2x-geomim | 45.0 | Model |
| bevdet-occ-swinb-4d-stereo-2x-geomim (*) | 45.73 | Model |
| bevdet-occ-swinl-4d-stereo-2x-geomim | 46.27 | Model |

| Config | mIoU | Model | Log |
|---|---|---|---|
| BEVDet-Occ-R50-4D-Stereo-2x | 36.1 | baidu | baidu |
| BEVDet-Occ-R50-4D-Stereo-2x-384x704 | 37.3 | baidu | baidu |
| BEVDet-Occ-R50-4DLongterm-Stereo-2x-384x704 | 39.3 | baidu | baidu |
| BEVDet-Occ-STBase-4D-Stereo-2x (*) | 42.0 | baidu | baidu |

(*) Loads the 3D detection checkpoint.

step 1. Prepare the environment as in Docker.

step 2. Clone the repo and install it by running:

    pip install -v -e .

step 3. Prepare the nuScenes dataset as introduced in nuscenes_det.md and create the pkl for BEVDet by running:

    python tools/create_data_bevdet.py

step 4. For the Occupancy Prediction task, download (only) the 'gts' from CVPR2023-3D-Occupancy-Prediction and arrange the folder as:

    └── nuscenes
        ├── v1.0-trainval (existing)
        ├── sweeps (existing)
        ├── samples (existing)
        └── gts (new)

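
Before launching training, the folder layout above can be sanity-checked with a few lines of Python. This is a minimal sketch: the helper `missing_subdirs` is hypothetical, and the default root `data/nuscenes` is an assumption, so point it at wherever your dataset actually lives.

```python
from pathlib import Path

# Sub-folders named in the layout above.
EXPECTED_SUBDIRS = ("v1.0-trainval", "sweeps", "samples", "gts")


def missing_subdirs(nuscenes_root):
    """Return the expected sub-folders that are absent under nuscenes_root."""
    root = Path(nuscenes_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # "data/nuscenes" is an assumed location, not taken from the source.
    missing = missing_subdirs("data/nuscenes")
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means all four folders, including the newly added `gts`, are in place.
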
Train the model:

    # single gpu
    python tools/train.py $config
    # multiple gpu
    ./tools/dist_train.sh $config num_gpu

Test the model:

    # single gpu
    python tools/test.py $config $checkpoint --eval mAP
    # multiple gpu
    ./tools/dist_test.sh $config $checkpoint num_gpu --eval mAP

Estimate the inference speed:

    # with pre-computation acceleration
    python tools/analysis_tools/benchmark.py $config $checkpoint --fuse-conv-bn
    # 4D with pre-computation acceleration
    python tools/analysis_tools/benchmark_sequential.py $config $checkpoint --fuse-conv-bn
    # view transformer only
    python tools/analysis_tools/benchmark_view_transformer.py $config $checkpoint

Estimate the FLOPs:

    python tools/analysis_tools/get_flops.py configs/bevdet/bevdet-r50.py --shape 256 704

Visualize the predicted results:

- Private implementation. (Visualization remotely/locally)

      python tools/test.py $config $checkpoint --format-only --eval-options jsonfile_prefix=$savepath
      python tools/analysis_tools/vis.py $savepath/pts_bbox/results_nusc.json

Convert to TensorRT and test the inference speed:

1. Install mmdeploy from https://github.com/HuangJunJie2017/mmdeploy
2. Convert to TensorRT:

       python tools/convert_bevdet_to_TRT.py $config $checkpoint $work_dir --fuse-conv-bn --fp16 --int8

3. Test the inference speed:

       python tools/analysis_tools/benchmark_trt.py $config $engine

The finetuning code is based on BEVDet.