CSE5519 Advances in Computer Vision (Topic I: 2025: Embodied Computer Vision and Robotics)

Navigation World Models

Novelty in NWM

Conditional Diffusion Transformer
Uses time and action to condition the diffusion process
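The sketch below shows one way "time and action" can condition a diffusion transformer. It assumes DiT-style adaptive layer norm (adaLN): the diffusion step t, a time shift k and the action a are embedded, summed into one vector c, and c predicts a per-channel shift and scale for the normalized tokens. The 3-D action (dx, dy, dyaw), the class names and all sizes are hypothetical choices for illustration. The sketch omits the attention and MLP layers, so it is not the NWM implementation.

```python
import numpy as np


def sinusoidal_embedding(value, dim, max_period=10000.0):
    """Sinusoidal embedding of a scalar such as a diffusion step or a time shift."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = value * freqs
    return np.concatenate([np.cos(angles), np.sin(angles)])


def silu(x):
    return x / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-6):
    """Per-token layer norm without learned affine parameters."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


class ConditionEncoder:
    """Hypothetical encoder: (diffusion step t, time shift k, action a) -> vector c."""

    def __init__(self, dim, action_dim, rng):
        self.dim = dim
        self.action_proj = rng.normal(0.0, 0.02, size=(action_dim, dim))

    def __call__(self, t, k, action):
        # The three embeddings are summed into a single conditioning vector.
        return (sinusoidal_embedding(t, self.dim)
                + sinusoidal_embedding(k, self.dim)
                + action @ self.action_proj)


class AdaptiveLayerNorm:
    """adaLN: the conditioning vector predicts a per-channel shift and scale."""

    def __init__(self, dim, rng, zero_init=False):
        # Zero init makes the modulation start as the identity (shift 0, scale 1),
        # in the spirit of DiT's adaLN-Zero.
        if zero_init:
            self.weight = np.zeros((dim, 2 * dim))
        else:
            self.weight = rng.normal(0.0, 0.02, size=(dim, 2 * dim))
        self.bias = np.zeros(2 * dim)

    def __call__(self, tokens, c):
        shift, scale = np.split(silu(c) @ self.weight + self.bias, 2)
        return layer_norm(tokens) * (1.0 + scale) + shift


rng = np.random.default_rng(0)
dim, action_dim = 64, 3  # assumed action layout: (dx, dy, dyaw)
encoder = ConditionEncoder(dim, action_dim, rng)
adaln = AdaptiveLayerNorm(dim, rng)

tokens = rng.normal(size=(16, dim))  # stand-in for noisy target-frame tokens
turn_left = np.array([0.5, 0.0, 0.3])
turn_right = np.array([0.5, 0.0, -0.3])
out_left = adaln(tokens, encoder(t=500, k=1, action=turn_left))
out_right = adaln(tokens, encoder(t=500, k=1, action=turn_right))
print(out_left.shape)  # (16, 64)
```

The same noisy tokens are modulated differently for a left turn and a right turn, which is what lets the denoiser produce action-dependent futures. Summing the embeddings keeps the conditioning cost constant regardless of how many signals are added.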

Tip

This paper proposes a new way to train navigation world models. With conditional diffusion, the model can imagine trajectories in an unknown environment and use them to perform navigation tasks.
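Using imagined trajectories for navigation amounts to planning: sample candidate action sequences, roll each one out in the world model, and keep those whose imagined outcome lands closest to the goal. The sketch below does this with the cross-entropy method (CEM), one common sampling-based planner. It assumes a toy pose-based dynamics function in place of the diffusion model and a plain Euclidean distance in place of an image-similarity score; `toy_world_model`, `imagine` and `cem_plan` are hypothetical names.

```python
import numpy as np


def toy_world_model(pose, action):
    """Hypothetical stand-in for NWM: predicts the next pose (x, y, heading)
    from a (forward step, yaw change) action instead of predicting an image."""
    x, y, heading = pose
    forward, yaw_change = action
    heading = heading + yaw_change
    return np.array([x + forward * np.cos(heading),
                     y + forward * np.sin(heading),
                     heading])


def imagine(pose, actions):
    """Roll an action sequence through the world model (an imagined trajectory)."""
    for action in actions:
        pose = toy_world_model(pose, action)
    return pose


def cem_plan(start, goal, horizon=8, samples=256, elites=32, iters=10, seed=0):
    """Cross-entropy method over action sequences; cost = final distance to goal."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, 2))
    std = np.ones((horizon, 2))
    best_seq, best_cost = None, np.inf
    for _ in range(iters):
        candidates = mean + std * rng.normal(size=(samples, horizon, 2))
        costs = np.array([np.linalg.norm(imagine(start, seq)[:2] - goal)
                          for seq in candidates])
        order = np.argsort(costs)
        # Remember the best imagined sequence seen so far.
        if costs[order[0]] < best_cost:
            best_seq, best_cost = candidates[order[0]], costs[order[0]]
        # Refit the sampling distribution to the lowest-cost (elite) sequences.
        elite = candidates[order[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return best_seq, best_cost


start = np.array([0.0, 0.0, 0.0])
goal = np.array([3.0, 2.0])
plan, cost = cem_plan(start, goal)
print(plan.shape, float(cost))
```

Because the planner only ever queries the model, its quality is bounded by how faithful the imagined rollouts are, which is why a model that collapses on unfamiliar scenes also plans poorly there.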

However, the model frequently collapses on out-of-distribution data, which leads to poor navigation performance. I wonder whether we could further condition the model on the novelty of the environment and combine this with an exploration strategy, training the model online to fix the collapse. What challenges would this pose for the Conditional Diffusion Transformer?

Last updated on March 9, 2026
