
CSE510 Deep Reinforcement Learning (Lecture 1)

Artificial general intelligence

  • Multimodal perception
  • Persistent memory + retrieval
  • World modeling + planning
  • Tool use with verification
  • Interactive learning loops (RLHF/RLAIF)
  • Uncertainty estimation & oversight

LLMs may not be the ultimate solution for AGI, but they may be part of the solution.

Long-Horizon Agency

Decision-making/control and multi-agent collaboration

Course logistics

Announcement and discussion on Canvas

Weekly recitations

Thursdays, 4:00-5:00 PM in McKelvey Hall 1030

or office hours (11 AM-12 PM Wednesdays in McKelvey Hall 2010D)

or by appointment

Prerequisites

  • Proficiency in Python programming.
  • Programming experience with deep learning.
  • Research experience (not required, but highly recommended)
  • Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.

Textbook

Not required, but recommended:

  • Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online).
  • Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.).
  • OpenAI Spinning Up in Deep RL tutorial.

Final Project

Research-level project of your choice

  • Improving an existing approach
  • Tackling an unsolved task/benchmark
  • Creating a new task/problem that hasn’t been addressed by RL

Can be done in a team of 1-2 students

Must be harder than homework.

The core goal is to understand the pipeline of RL research; the project need not improve over existing methods.

Milestones

  • Proposal (max 2 pages)
  • Progress report with brief survey (max 4 pages)
  • Presentation/Poster session
  • Final report (7-10 pages, NeurIPS style)

What is RL?

Goal for course

How do we build intelligent agents that learn to act and achieve specific goals in dynamic environments?

Acting to achieve goals is a key part of intelligence.

The brain exists to produce adaptable and complex movements (Daniel Wolpert).

What RL does

A general-purpose framework for decision making/behavioral learning:

  • RL is for an agent with the capacity to act
  • Each action influences the agent’s future observations
  • Success is measured by a scalar reward signal
  • Goal: find a policy that maximizes the expected total reward
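
In the standard discounted formulation (conventional RL notation; the discount factor γ and infinite horizon are common assumptions, not stated on the slide), this objective can be written as:

```latex
% Expected discounted return of a policy \pi over trajectories \tau,
% with per-step rewards r_t and discount factor \gamma \in [0, 1):
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right],
\qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```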

Exploration: Add randomness to your action selection

If the result was better than expected, do more of the same in the future.
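
As a concrete illustration, here is a minimal sketch of this loop, assuming the Gymnasium API and a tabular Q-learning update (a generic textbook method, not necessarily the one covered later in the course): ε-greedy action selection adds the randomness, and the update increases the value of actions that turned out better than expected.

```python
import gymnasium as gym
import numpy as np

# Minimal agent-environment loop with epsilon-greedy exploration and a
# tabular Q-learning update. FrozenLake-v1 and all hyperparameters are
# arbitrary illustrative choices.
env = gym.make("FrozenLake-v1", is_slippery=False)
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))        # action-value estimates

epsilon, alpha, gamma = 0.1, 0.5, 0.99     # exploration rate, step size, discount
for episode in range(1000):
    state, _ = env.reset()
    done = False
    while not done:
        # Exploration: with probability epsilon, act randomly
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # "Better than expected" -> move the estimate toward the observed return
        td_target = reward + gamma * np.max(Q[next_state]) * (not terminated)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state
```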

Deep reinforcement learning

DL is a general-purpose framework for representation learning.

  • Given an objective
  • Learn the representations required to achieve that objective
  • Directly from raw inputs
  • Using minimal domain knowledge

Deep learning enables RL algorithms to solve complex problems in an end-to-end manner.
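
For instance, a deep Q-network replaces a lookup table with a neural network that maps raw observations to action values. A minimal sketch, assuming PyTorch (layer widths and the CartPole-sized dimensions are arbitrary illustrative choices):

```python
import torch
import torch.nn as nn

# A network that maps raw observations directly to per-action value
# estimates, in place of hand-crafted features or a tabular lookup.
class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),     # one value estimate per action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

q = QNetwork(obs_dim=4, n_actions=2)
values = q(torch.randn(1, 4))              # value estimates from raw input
action = int(values.argmax(dim=-1))        # greedy action
```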

Machine learning paradigms

Supervised learning: learning from examples

Self-supervised learning: learning structures in data

Reinforcement learning: learning from experiences

Example using LLMs:

Self-supervised: pretraining

SFT: supervised fine-tuning (post-training)

RL is also used in post-training for improving reasoning capabilities.

RLHF: reinforcement learning from human feedback (fine-tuning)

RL generates data beyond the original training data.

All of these paradigms are “supervised” by a loss function.

How RL differs from other paradigms

Exploration: the agent does not have prior data known to be good.

Non-stationarity: the environment is dynamic and the agent’s actions influence the environment.

Credit assignment: the agent needs to learn to assign credit to its actions. (delayed reward)
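
A standard way to spread delayed reward back over earlier actions is the discounted reward-to-go (a conventional technique, not specific to this lecture):

```python
import numpy as np

def discounted_returns(rewards: list[float], gamma: float = 0.99) -> np.ndarray:
    """Reward-to-go: credit each step with all future (discounted) reward."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A reward that arrives only at the end is still credited, with geometric
# decay, to every earlier action: [0.99**3, 0.99**2, 0.99, 1.0]
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))
```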

Limited samples: actions take time to execute in the real world, which may limit the amount of experience available.
