BEVFormer

A BEVFormer reimplementation in pure PyTorch for camera-only 3D object detection on nuScenes, with no MMDetection or MMCV dependencies. It is written primarily as an educational reference. Despite the "tiny" name and the educational focus, adding a custom CUDA kernel for deformable attention should make this repo good enough for real use, and probably even slightly faster than the original implementation. If you want to load the pretrained weights from the original repo, however, that is likely easier to do with the original repo.

This repo includes the core model architecture, a nuScenes data pipeline, PyTorch Lightning training, nuScenes metric integration, and unit tests for the main components. Temporal self-attention and spatial cross-attention are written with einops, favoring readability. To dig into a single component, use its unit test to exercise that module in isolation.

Implementation Notes

Key differences from the original implementation:

Reference point pre-computation — The 2D and 3D reference point calculations, which were originally computed inside the model's forward pass, are moved outside since they are constant across steps. Only the lidar2img projection is kept in the forward path.
Single decoder computation graph — The original implementation built a redundant computation graph for the regression head (once for iterative reference point updates and once for final outputs). This has been reduced to a single pass.
Yaw convention — The original saves yaw in the SECOND coordinate system. Here, yaw is stored directly in the nuScenes coordinate system, which slightly changes the ego-motion shift calculation in the temporal self-attention.
Regression head - The original implementation converts cx, cy, cz to metric units in the regression head output. Here they stay in the normalized [0, 1] space, and the ground truth is normalized before the loss calculation.
Readability — Most complex tensor operations have been rewritten using einops for clarity.
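
The reference point and regression head notes above can be sketched as follows. This is a minimal illustration, not the code in this repo: it uses NumPy to stay dependency-light, the function names are hypothetical, the 50x50 grid and 4 points per pillar are assumed values, and `PC_RANGE` is an assumed detection range. The height samples are evenly spaced bin centers, which may differ from the exact spacing used by the original.

```python
import numpy as np

# Assumed detection range [x_min, y_min, z_min, x_max, y_max, z_max] in meters.
PC_RANGE = np.array([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], dtype=np.float32)


def make_reference_points_2d(bev_h, bev_w):
    """Normalized (x, y) centers of all BEV cells, shape (bev_h * bev_w, 2)."""
    ys = (np.arange(bev_h, dtype=np.float32) + 0.5) / bev_h
    xs = (np.arange(bev_w, dtype=np.float32) + 0.5) / bev_w
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=-1)


def make_reference_points_3d(bev_h, bev_w, num_points_in_pillar):
    """Normalized (x, y, z) samples, shape (num_points_in_pillar, bev_h * bev_w, 3)."""
    zs = (np.arange(num_points_in_pillar, dtype=np.float32) + 0.5) / num_points_in_pillar
    ref_3d = np.empty((num_points_in_pillar, bev_h * bev_w, 3), dtype=np.float32)
    ref_3d[..., :2] = make_reference_points_2d(bev_h, bev_w)  # same (x, y) at every height
    ref_3d[..., 2] = zs[:, None]  # one normalized height per pillar sample
    return ref_3d


def normalize_centers(centers_m, pc_range=PC_RANGE):
    """Map metric (cx, cy, cz) into [0, 1] so targets match the head's output space."""
    lo, hi = pc_range[:3], pc_range[3:]
    return (np.asarray(centers_m, dtype=np.float32) - lo) / (hi - lo)


# Built once, outside the forward pass, then reused for every step.
ref_2d = make_reference_points_2d(bev_h=50, bev_w=50)
ref_3d = make_reference_points_3d(bev_h=50, bev_w=50, num_points_in_pillar=4)

# Ground-truth centers are normalized before the loss is computed.
gt_centers = normalize_centers([[0.0, 0.0, -1.0], [25.6, -25.6, 1.0]])
```

In a PyTorch module, arrays like `ref_2d` and `ref_3d` would typically be converted to tensors once and stored as buffers, so only the per-sample lidar2img projection remains in the forward path.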

Known limitations: The original implementation has a good image augmentation pipeline, including very useful transforms such as GridMask; none of that is available here yet. Image resizing is also not supported, because the corresponding intrinsic matrix scaling transform has not been implemented. This is a small gap: a simple scale transform would suffice.
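
As a rough idea of what the missing intrinsic scaling could look like, here is a minimal sketch. It is not part of this repo: `scale_intrinsics` is a hypothetical helper, the intrinsic values are made up for illustration, and it assumes a plain resize with no cropping or padding and ignores half-pixel offset conventions.

```python
import numpy as np


def scale_intrinsics(intrinsics, src_hw, dst_hw):
    """Return the 3x3 intrinsic matrix for an image resized from src_hw to dst_hw."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    # Resizing scales pixel coordinates, so fx and cx scale with the width
    # and fy and cy scale with the height.
    scale = np.diag([dst_w / src_w, dst_h / src_h, 1.0])
    return scale @ np.asarray(intrinsics, dtype=np.float64)


# Illustrative intrinsics for a 1600x900 image (values are assumptions).
K = np.array([
    [1266.0, 0.0, 816.0],
    [0.0, 1266.0, 491.0],
    [0.0, 0.0, 1.0],
])
K_half = scale_intrinsics(K, src_hw=(900, 1600), dst_hw=(450, 800))

# A camera-frame point projects to half the pixel coordinates after the resize.
point_cam = np.array([1.0, 2.0, 10.0])
uv_full = (K @ point_cam)[:2] / point_cam[2]
uv_half = (K_half @ point_cam)[:2] / point_cam[2]
```

The same scale matrix can be left-multiplied onto a combined lidar2img projection, provided it is applied to the rows that produce pixel coordinates.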

Quick Start

1. Install

pip install -r requirements.txt
pip install -e .

2. Prepare nuScenes data

Use the helper script:

bash scripts/prepare_data.sh

This runs:

python tools/data_converter/create_data.py nuscenes \
    --root-path data/nuscenes \
    --canbus data/can_bus \
    --version v1.0-mini \
    --out-dir data

The default tiny config expects temporal nuScenes info files under data/nuscenes/.

3. Train

Use the provided script:

bash scripts/train_tiny.sh

Or run the trainer directly:

python tools/train.py --config configs/bevformer_tiny.yaml

The training script also supports checkpoint resume and dot-notation config overrides:

python tools/train.py --config configs/bevformer_tiny.yaml \
    --resume work_dirs/nuscenes_mini/checkpoints/last.ckpt

python tools/train.py --config configs/bevformer_tiny.yaml \
    train.batch_size=2 optimizer.lr=1e-4 data.load_interval=5
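
For readers curious how dot-notation overrides like these can be applied to a nested config, here is a minimal sketch. It is an assumption about the general technique, not the parser used by tools/train.py: `apply_overrides` is a hypothetical name and the config keys are illustrative.

```python
import ast


def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            # Parses numbers, booleans, lists, and quoted strings.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Anything else (paths, bare words) is kept as a plain string.
            value = raw_value
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections
        node[leaf] = value
    return config


cfg = {"train": {"batch_size": 1, "epochs": 24}, "optimizer": {"lr": 2e-4}}
apply_overrides(cfg, ["train.batch_size=2", "optimizer.lr=1e-4", "data.load_interval=5"])
```

Keys that are not overridden keep their values from the YAML file, which is why only the changed settings need to appear on the command line.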

Outputs are written under work_dirs/nuscenes_mini/, including checkpoints, TensorBoard logs, and nuScenes eval artifacts. The location depends on the work_dir set in your config; the path above is the one used by the current demo YAML.

4. Train on the full nuScenes dataset

Step 1 — Generate PKL files (one-time):

python tools/data_converter/create_data.py nuscenes \
    --root-path /path/to/nuscenes \
    --canbus /path/to/nuscenes_full \
    --version v1.0 \
    --out-dir /path/to/nuscenes_full

Step 2 — Train on full dataset (no file edits, overrides passed on CLI):

python tools/train.py \
    "data.root=/path/to/nuscenes" \
    "data.train_ann=/path/to/nuscenes_full/nuscenes_infos_temporal_train.pkl" \
    "data.val_ann=/path/to/nuscenes_full/nuscenes_infos_temporal_val.pkl" \
    "data.version=v1.0-trainval" \
    "data.eval_set=val"

References

All thanks to the very neatly written BEVFormer paper and its implementation.
