PREreview of Gen-Drive: Enhancing Diffusion Generative Driving Policies with Reward Modeling and Reinforcement Learning Fine-tuning

Published
DOI: 10.5281/zenodo.15565288
License: CC BY 4.0

Gen-Drive is a promising, if compute-hungry, step toward fully learned planning. The framework shows compelling closed-loop gains, but real-time feasibility remains out of reach.

Summary

Gen-Drive combines a diffusion generator with reinforcement learning to train autonomous-driving policies end to end. The generator proposes diverse trajectories; a reward model, trained on pairwise preferences collected from both humans and a VLM, scores them. RL fine-tuning then steers the generation model toward high-reward, human-aligned behavior. The method outperforms imitation-only and other learning-based planners on the nuPlan closed-loop benchmark.
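
To make the reward-modeling step concrete, below is a minimal PyTorch sketch of the Bradley-Terry pairwise preference objective that this kind of reward model is typically trained with. The network architecture and the flattened trajectory encoding are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryRewardModel(nn.Module):
    """Illustrative reward model: maps a flattened trajectory to a scalar score."""
    def __init__(self, traj_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        return self.net(traj).squeeze(-1)

def preference_loss(model: TrajectoryRewardModel,
                    preferred: torch.Tensor,
                    rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: -log sigmoid(r(preferred) - r(rejected))."""
    return -F.logsigmoid(model(preferred) - model(rejected)).mean()
```

The same loss applies whether a pair was labeled by a human or by the VLM, which is what makes mixing the two label sources straightforward.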

Strengths

  • Closed-loop boost: the overall nuPlan score rises by about 16 points and the collision rate drops by roughly 50% relative to the imitation baseline.

  • Scalable supervision: VLM-assisted preference collection sharply reduces the human annotation effort needed to build the preference dataset; a sketch of the idea follows this list.
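
As a rough illustration of how VLM-assisted labeling can work, consider the sketch below. Everything in it, including the query_vlm helper, is hypothetical and not taken from the paper.

```python
def query_vlm(scene_image, prompt: str) -> str:
    """Hypothetical stand-in for a vision-language-model API call."""
    raise NotImplementedError("wire this up to a real VLM endpoint")

def label_pair(scene_image, traj_a, traj_b, human_labeler=None) -> str:
    """Return 'a' or 'b' for the preferred trajectory in one scene.

    A human labeler is consulted only when provided, so the bulk of the
    pairs can be labeled by the VLM and human effort reserved for audits.
    """
    if human_labeler is not None:
        return human_labeler(scene_image, traj_a, traj_b)
    prompt = ("Two candidate ego trajectories, 'a' and 'b', are overlaid "
              "on this driving scene. Which is safer and more natural? "
              "Answer with a single letter.")
    return query_vlm(scene_image, prompt)
```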

Limitations and open questions

  • Latency: single-sample planning averages 282 ms and multi-sample planning (32 samples) averages 484 ms on an RTX 4090 GPU. Such delays risk missing sudden incursions by vulnerable road users (VRUs) and are unrealistic for embedded automotive hardware; the back-of-the-envelope check after this list shows why.

  • Safety: the learned reward model imposes no hard constraints, so the risk from rare but catastrophic edge cases remains unbounded.
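
To put the latency numbers in perspective, here is a quick calculation of how far the ego vehicle travels during a single planning cycle, assuming constant speed and no reuse of the previous plan:

```python
# Open-loop travel during one planning cycle at the latencies quoted above.
for label, latency_s in (("1 sample", 0.282), ("32 samples", 0.484)):
    for speed_kmh in (30, 50):
        speed_ms = speed_kmh / 3.6        # km/h -> m/s
        blind_m = speed_ms * latency_s    # distance covered before the next plan
        print(f"{label:>10} @ {speed_kmh} km/h: {blind_m:.1f} m")
```

At 50 km/h the 32-sample configuration leaves roughly 6.7 m of travel per replan, which is why sudden VRU incursions are the critical risk.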

Future scope

The authors plan to integrate raw LiDAR and radar perception. How sensor noise propagates through the diffusion and RL fine-tuning loop is a natural subject for future study.

Overall, Gen-Drive advances generative planning, but its heavy compute budget and unproven reactive-safety margin call for caution before deployment in real vehicles.

Competing interests

The author declares that they have no competing interests.
