CSE510 Deep Reinforcement Learning (Lecture 8)

Convolutional Neural Networks

A related note from the computer vision course can be found here: CSE559A Lecture 10

Basically, it is a stack of different layers:

Filtering: The math behind the matching.

The idea of a convolutional neural network, in some sense, is to let the network “learn” the right filters for a specific task.
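To make the filtering step concrete, here is a minimal sketch in plain Python. The function name `filter2d`, the toy image, and the hand-picked vertical-edge kernel are all illustrative assumptions, not part of the lecture. In a CNN the kernel weights would be learned from data rather than written by hand. Like most deep learning libraries, the sketch slides the kernel without flipping it (strictly speaking, cross-correlation).

```python
def filter2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy image: dark on the left, bright on the right (hypothetical data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge filter; a CNN would learn these weights.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = filter2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0]]
```

The feature map responds only where the patch matches the dark-to-bright pattern encoded by the kernel, which is the "matching" the note refers to.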

Tip

This is irrelevant to the lecture, but consider the following term:

“Bounded rationality”

Convolution is a linear operation
The non-linearity layer creates an activation map from the feature map generated by the convolutional layer
It consists of an activation function (an element-wise operation)
Rectified linear units (ReLUs) are advantageous over the traditional sigmoid or tanh activation functions
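The element-wise nature of the non-linearity can be sketched as follows. The helper names and the toy feature map are assumptions made for illustration. The notes do not say why ReLU is preferred; one commonly cited reason, shown at the end of the sketch, is that its gradient stays at 1 for positive inputs while the sigmoid gradient becomes very small for large inputs.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def apply_elementwise(fn, feature_map):
    """Apply an activation function to every entry independently."""
    return [[fn(value) for value in row] for row in feature_map]


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


# Hypothetical feature map produced by a convolutional layer.
feature_map = [[-2.0, 0.5], [3.0, -0.1]]
activation_map = apply_elementwise(relu, feature_map)
print(activation_map)  # [[0.0, 0.5], [3.0, 0.0]]

# For a large positive input the ReLU gradient is still 1,
# while the sigmoid gradient has nearly vanished.
print(relu_grad(10.0), sigmoid_grad(10.0))
```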

Shrinking the Image Stack

Motivation: the activation maps can be large
Reducing the spatial size of the activation maps
- Often after multiple stages of other layers (i.e., convolutional and non-linear layers)
Steps:
1. Pick a window size (usually 2 or 3).
2. Pick a stride (usually 2).
3. Walk your window across your filtered images.
4. From each window, take the maximum value.
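The four steps above can be sketched in plain Python. The function name `max_pool2d` and the toy activation map are illustrative assumptions. The sketch also assumes that windows which do not fit entirely inside the map are dropped; other padding conventions exist.

```python
def max_pool2d(activation_map, window=2, stride=2):
    """Max pooling over a 2D activation map given as a list of rows."""
    height, width = len(activation_map), len(activation_map[0])
    pooled = []
    # Step 3: walk the window across the map using the chosen stride.
    for i in range(0, height - window + 1, stride):
        row = []
        for j in range(0, width - window + 1, stride):
            patch = [
                activation_map[i + di][j + dj]
                for di in range(window)
                for dj in range(window)
            ]
            # Step 4: keep only the maximum value in each window.
            row.append(max(patch))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 activation map.
activation_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(activation_map, window=2, stride=2)
print(pooled)  # [[6, 2], [2, 9]]
```

With a window of 2 and a stride of 2, the 4x4 map shrinks to 2x2, so each spatial dimension is halved.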

Pros:

Cons:

Aggressive reduction can limit the depth of a network and ultimately limit the performance

Multilayer perceptron (MLP)
Mapping the activation volume from previous layers into a class probability distribution
Non-linearity is built into the neurons, instead of being a separate layer
Can be viewed as 1x1 convolution kernels
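One way to read the 1x1 convolution remark: applying the same fully connected weights to the channel vector at every spatial position gives the same result as a convolution with 1x1 kernels. The sketch below checks this on a tiny volume. The function names, weights, and data are hypothetical, and biases and activations are left out for brevity.

```python
def dense(features, weights):
    """Fully connected layer without bias: one weight row per output unit."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def conv1x1(volume, kernels):
    """1x1 convolution; volume[i][j] is the channel vector at position (i, j)."""
    out = []
    for i in range(len(volume)):
        out_row = []
        for j in range(len(volume[0])):
            pixel_out = []
            for kernel in kernels:  # one 1x1 kernel per output channel
                total = 0.0
                for c, weight in enumerate(kernel):
                    total += weight * volume[i][j][c]
                pixel_out.append(total)
            out_row.append(pixel_out)
        out.append(out_row)
    return out


# Hypothetical 1x2 spatial grid with 2 input channels and 3 output channels.
volume = [[[1.0, 2.0], [0.0, 1.0]]]
weights = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]

conv_out = conv1x1(volume, weights)
dense_out = [[dense(pixel, weights) for pixel in row] for row in volume]
print(conv_out == dense_out)  # True
```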

For classification: Output layer is a regular, fully connected layer with softmax non-linearity
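A minimal sketch of such an output layer follows: a fully connected layer produces one score (logit) per class, and softmax turns the scores into a probability distribution. The weights, biases, and feature vector are made-up values for illustration only.

```python
import math


def fully_connected(features, weights, biases):
    """One logit per class: weighted sum of the features plus a bias."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    # Subtracting the max does not change the result and avoids overflow.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened features from the previous layers, 3 classes.
features = [1.0, 2.0, 0.5]
weights = [
    [0.2, -0.1, 0.4],
    [0.0, 0.3, -0.2],
    [-0.5, 0.1, 0.1],
]
biases = [0.0, 0.1, -0.1]

logits = fully_connected(features, weights, biases)
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The predicted class is simply the index of the largest probability, which is also the index of the largest logit.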

Tip

The golden triangle of machine learning: