CSE510 Deep Reinforcement Learning (Lecture 1)

Artificial general intelligence

Multimodal perception
Persistent memory + retrieval
World modeling + planning
Tool use with verification
Interactive learning loops (RLHF/RLAIF)
Uncertainty estimation & oversight

LLMs may not be the ultimate solution for AGI, but they may be part of the solution.

Long-Horizon Agency

Decision-Making/Control and Multi-Agent collaboration

Course logistics

Announcements and discussion on Canvas

Weekly recitations

Thursday 4:00PM-5:00PM in McKelvey Hall 1030

or night office hours (11am-12pm Wed in McKelvey Hall 2010D)

or by appointment

Prerequisites

Proficiency in Python programming.
Programming experience with deep learning.
Research Experience (Not required, but highly recommended)
Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.

Textbook

Not required, but recommended:

Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online).
Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.).
OpenAI Spinning Up in Deep RL tutorial.

Final Project

Research-level project of your choice

Improving an existing approach
Tackling an unsolved task/benchmark
Creating a new task/problem that hasn’t been addressed by RL

Can be done in a team of 1-2 students

Must be harder than homework.

The core goal is to understand the pipeline of RL research; the project does not have to improve on existing methods.

Milestones

Proposal (max 2 pages)
Progress report with brief survey (max 4 pages)
Presentation/Poster session
Final report (7-10 pages, NeurIPS style)

What is RL?

Goal for course

How to build intelligent agents that learn to act and achieve specific goals in dynamic environments?

Acting to achieve goals is a key part of intelligence.

The purpose of the brain is to produce adaptable and complex movements. (Daniel Wolpert)

What RL does

A general-purpose framework for decision making/behavioral learning

RL is for an agent with the capacity to act
Each action influences the agent’s future observation
Success is measured by a scalar reward signal
Goal: find a policy that maximizes the expected total reward.
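The four points above can be sketched as a small agent-environment loop. This is only an illustration: the corridor environment, its reward values, and the two hand-written policies are hypothetical and do not come from the lecture. It shows an action changing the next observation, a scalar reward being accumulated, and two policies being compared by their average total reward.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: reach the right end of a short corridor."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation

    def step(self, action):
        # action 1 moves right, any other action moves left;
        # the chosen action determines the next observation.
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length, self.position + move))
        done = self.position == self.length
        reward = 1.0 if done else -0.1  # scalar reward signal (assumed values)
        return self.position, reward, done


def random_policy(observation, rng):
    return rng.choice([0, 1])


def always_right_policy(observation, rng):
    return 1


def run_episode(env, policy, rng, max_steps=100):
    """Run one episode and return the total (undiscounted) reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation, rng)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def average_return(policy, episodes=200, seed=0):
    """Estimate the expected total reward of a policy by averaging episodes."""
    rng = random.Random(seed)
    env = CorridorEnv()
    return sum(run_episode(env, policy, rng) for _ in range(episodes)) / episodes
```

Comparing average_return(always_right_policy) with average_return(random_policy) gives a concrete sense of what "a better policy" means here: the one with the higher estimated expected total reward.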

Exploration: Add randomness to your action selection

If the result was better than expected, do more of the same in the future.
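One way to make these two sentences concrete is a gradient-bandit update on a multi-armed bandit, in the style described by Sutton & Barto. This is a minimal sketch, not necessarily the method the course will use: the arm means, the step size, and the running-average baseline are all assumptions chosen for illustration. Sampling from a softmax supplies the randomness, and the sign of (reward - baseline) decides whether the chosen action becomes more or less likely.

```python
import math
import random


def softmax(preferences):
    """Turn action preferences into a probability distribution."""
    largest = max(preferences)
    exps = [math.exp(p - largest) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(true_means, steps=2000, step_size=0.1, seed=0):
    """Gradient-bandit sketch; true_means are hypothetical arm reward means."""
    rng = random.Random(seed)
    preferences = [0.0] * len(true_means)
    baseline = 0.0  # running average reward: what the agent "expects"
    for t in range(1, steps + 1):
        probs = softmax(preferences)
        # Exploration: sample an action instead of always taking the current best.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rng.gauss(true_means[action], 1.0)
        # Positive when the result was better than expected.
        surprise = reward - baseline
        for a in range(len(preferences)):
            indicator = 1.0 if a == action else 0.0
            # Better than expected: raise the chosen action's preference
            # and lower the others; worse than expected: the opposite.
            preferences[a] += step_size * surprise * (indicator - probs[a])
        baseline += (reward - baseline) / t
    return softmax(preferences)
```

For example, train_bandit([0.0, 0.5, 2.0]) returns the learned action probabilities for three arms; under these assumptions most of the probability mass should end up on the last arm, which has the highest mean reward.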

Deep reinforcement learning

DL is a general-purpose framework for representation learning.

Given an objective
Learn representation that is required to achieve objective
Directly from raw inputs
Using minimal domain knowledge

Deep learning enables RL algorithms to solve complex problems in an end-to-end manner.

Machine learning paradigms

Supervised learning: learning from examples

Self-supervised learning: learning structures in data

Reinforcement learning: learning from experiences

Example using LLMs:

Self-supervised: pretraining

SFT: supervised fine-tuning (post-training)

RL is also used in post-training for improving reasoning capabilities.

RLHF: reinforcement learning from human feedback (fine-tuning)

RL generates data beyond the original training data.

All of these paradigms are “supervised” by a loss function.

Differences for RL from other paradigms

Exploration: the agent does not have prior data known to be good.

Non-stationarity: the environment is dynamic and the agent’s actions influence the environment.

Credit assignment: the agent needs to learn to assign credit to its actions. (delayed reward)
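A common first step toward credit assignment under delayed reward is to score each time step by the discounted sum of the rewards that follow it. The sketch below assumes a discount factor gamma, which these notes have not introduced, and a hypothetical reward sequence in the usage example.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With rewards [0, 0, 1] and gamma=0.5, the only reward arrives at the last step, yet the earlier steps still receive returns of 0.25 and 0.5, so actions taken before the reward get partial credit for it.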

Limited samples: actions take time to execute in the real world, which may limit the amount of experience.