CSE5519 Advances in Computer Vision (Topic I: 2022: Embodied Computer Vision and Robotics)
DayDreamer: World Models for Physical Robot Learning
DayDreamer is a framework for learning robot behaviors directly in the real world.
The novelty lies in the integration of world-model learning with reinforcement learning on physical robots.
It leverages the Dreamer algorithm for fast robot learning in the real world.
Two neural network components are trained on sequences drawn from the replay buffer:
Encoder
Fuses all sensory modalities into discrete codes. The decoder reconstructs the inputs from the codes, providing a rich learning signal and enabling human inspection of model predictions.
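A minimal PyTorch sketch of this idea. All sizes, layer choices, and names are illustrative assumptions, not the paper's architecture (the paper uses CNNs for images and MLPs for low-dimensional inputs); the straight-through trick shown is one standard way to backpropagate through discrete codes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Fuses multiple sensory modalities into discrete (categorical) codes."""
    def __init__(self, image_dim=1024, proprio_dim=16, num_latents=32, classes=32):
        super().__init__()
        self.num_latents, self.classes = num_latents, classes
        self.image_net = nn.Linear(image_dim, 256)      # stand-in for a CNN
        self.proprio_net = nn.Linear(proprio_dim, 256)  # stand-in for an MLP
        self.head = nn.Linear(512, num_latents * classes)

    def forward(self, image, proprio):
        fused = torch.cat([F.elu(self.image_net(image)),
                           F.elu(self.proprio_net(proprio))], dim=-1)
        logits = self.head(fused).view(-1, self.num_latents, self.classes)
        probs = F.softmax(logits, dim=-1)
        # Straight-through: discrete one-hot sample on the forward pass,
        # gradients flow through the continuous probabilities backward.
        idx = torch.multinomial(probs.view(-1, self.classes), 1).squeeze(-1)
        onehot = F.one_hot(idx, self.classes).float().view_as(probs)
        return onehot + probs - probs.detach()

class Decoder(nn.Module):
    """Reconstructs the sensory inputs from the codes, providing the
    reconstruction learning signal described above."""
    def __init__(self, image_dim=1024, proprio_dim=16, num_latents=32, classes=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_latents * classes, 256), nn.ELU(),
            nn.Linear(256, image_dim + proprio_dim))

    def forward(self, code):
        return self.net(code.flatten(start_dim=1))
```

Training would minimize a reconstruction loss between the decoder output and the concatenated sensory inputs.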
Recurrent state-space model (RSSM)
Trained to predict future codes given actions, without observing the intermediate sensory inputs.
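A sketch of that prediction step, with assumed sizes and a deliberately simplified state (the actual RSSM combines a deterministic recurrent state with stochastic latents and is trained with a KL objective between prior and posterior):

```python
import torch
import torch.nn as nn

class RSSM(nn.Module):
    """Simplified recurrent state-space model: given the current code and
    action, predict the next code without seeing the next sensory input."""
    def __init__(self, code_dim=1024, action_dim=12, hidden=200):
        super().__init__()
        self.cell = nn.GRUCell(code_dim + action_dim, hidden)
        self.prior = nn.Linear(hidden, code_dim)  # logits for the next code

    def step(self, code, action, h):
        h = self.cell(torch.cat([code, action], dim=-1), h)
        return self.prior(h), h

    def rollout(self, code, actions, h):
        """Open-loop prediction: feed predicted codes back in place of
        observations; this is what makes imagined rollouts possible."""
        preds = []
        for action in actions:            # list of (batch, action_dim) tensors
            logits, h = self.step(code, action, h)
            code = torch.sigmoid(logits)  # simplified; real codes are categorical samples
            preds.append(code)
        return preds, h
```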
Behavior learning
The world model enables massively parallel policy optimization from imagined rollouts in the compact latent space using large batch sizes, without having to reconstruct sensory inputs. Dreamer trains a policy network and a value network on these imagined rollouts of the learned world model.
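A sketch of that behavior-learning step under strong simplifying assumptions: dynamics and reward_fn stand for hypothetical latent-transition and reward heads of the learned world model, the policy here is deterministic (Dreamer's is stochastic), and plain discounted returns replace the paper's lambda-returns.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes: 230-dim latent state, 12-dim action.
actor = nn.Sequential(nn.Linear(230, 256), nn.ELU(), nn.Linear(256, 12), nn.Tanh())
critic = nn.Sequential(nn.Linear(230, 256), nn.ELU(), nn.Linear(256, 1))

def imagined_losses(dynamics, reward_fn, start_states, horizon=15, gamma=0.99):
    """Roll out entirely in latent space from a large batch of replayed start
    states; no sensory inputs are reconstructed anywhere in this loop."""
    s, rewards = start_states, []
    for _ in range(horizon):
        a = actor(s)                  # policy network acts on latent states
        s = dynamics(s, a)            # imagined transition in latent space
        rewards.append(reward_fn(s))  # predicted reward, stays differentiable
    ret = critic(s).squeeze(-1).detach()  # bootstrap with the value network
    for r in reversed(rewards):
        ret = r + gamma * ret
    actor_loss = -ret.mean()          # maximize imagined return
    value = critic(start_states.detach()).squeeze(-1)
    critic_loss = F.mse_loss(value, ret.detach())  # regress value onto return
    return actor_loss, critic_loss
```

Because every transition is imagined, the batch dimension of start_states can be made very large at negligible cost, which is where the massive parallelism comes from.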
The paper uses online reinforcement learning to train directly in the real environment without a simulator, with all experience stored in replay buffers.
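A sketch of that online loop, assuming a Gym-style env and an agent object with hypothetical act, update_world_model, and update_policy methods (the paper actually runs data collection and learning in parallel processes, and trains on replayed subsequences rather than single transitions):

```python
import collections
import random

def online_rl(env, agent, total_steps=100_000, train_every=5, batch_size=16):
    buffer = collections.deque(maxlen=1_000_000)  # the replay buffer
    obs = env.reset()
    for step in range(total_steps):
        action = agent.act(obs)                   # act on the physical robot
        next_obs, reward, done, info = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        if step % train_every == 0 and len(buffer) >= batch_size:
            batch = random.sample(buffer, batch_size)
            agent.update_world_model(batch)  # encoder/decoder/RSSM objectives
            agent.update_policy(batch)       # actor-critic in imagination
```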
The key limitation of this process is the long real-world training time, whereas a simulator can generate large batches of data concurrently. Would it be more efficient, in terms of both training time and hardware repair costs, to pretrain parts of the model in a simulator and then fine-tune with real-world data? The paper includes a few comparisons between simulator-trained and real-world-trained models, but I wonder about the other side of the story: how well does purely simulator-based training perform?