CSE559A Lecture 7
Computer Vision (Artificial Neural Networks for Image Understanding)
Early example of image understanding with neural networks: [Backpropagation Applied to Handwritten Zip Code Recognition] (LeCun et al., 1989)
Central idea: representation change. Each layer transforms its input into a new feature representation.
Plan for the next few weeks:
- How do we train such models?
- What are the building blocks?
- How should we combine these building blocks?
How do we train such models?
CV is finally useful…
- Image classification
- Image segmentation
- Object detection
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
- 1,000 classes
- 1.2 million training images
- 100,000 test images
Deep Learning (Just neural networks)
Bigger datasets, larger models, faster computers, and lots of incremental improvements.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # two convolutional layers followed by three fully connected layers (LeNet-style)
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # conv -> ReLU -> 2x2 max pooling, twice
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        # flatten the feature maps before the fully connected layers
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

# create a toy PyTorch dataset and dataloader
# (random 32x32 single-channel images, since the network above expects 32x32 inputs)
dataset = torch.utils.data.TensorDataset(torch.randn(1000, 1, 32, 32), torch.randint(10, (1000,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True, num_workers=2)

# training process
net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# loop over the dataset multiple times
for epoch in range(2):
    for i, data in enumerate(dataloader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
print("Finished Training")
Supervised Learning
Training: given a dataset, learn a mapping from input to output.
Testing: given a new input, predict the output.
Example: Linear classification models
Find a linear function that separates the data.
A simple representation of a linear classifier: $f(x) = w^\top x + b$, with the predicted class given by the sign of $f(x)$.
Empirical loss minimization framework
Given a training set, find a model that minimizes the average loss over the training samples.
Assume the samples are drawn i.i.d. from the data distribution.
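In symbols (a standard formulation of the same idea, with $f_\theta$ the model, $\ell$ the loss, and $(x_i, y_i)$ the training samples):

$$\hat{\theta} = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_\theta(x_i),\, y_i\big)$$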
Examples of loss functions:
L1 loss: $\ell(\hat{y}, y) = |\hat{y} - y|$
L2 loss: $\ell(\hat{y}, y) = (\hat{y} - y)^2$
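As a quick sketch in PyTorch (the prediction and target values below are made up), the two losses correspond to the built-in mean absolute error and mean squared error:

import torch
import torch.nn.functional as F

y_hat = torch.tensor([2.5, 0.0, 1.0])   # made-up predictions
y = torch.tensor([3.0, -0.5, 1.0])      # made-up targets

l1 = F.l1_loss(y_hat, y)    # L1: mean(|y_hat - y|)
l2 = F.mse_loss(y_hat, y)   # L2: mean((y_hat - y)^2)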
Linear classification models
With classification losses such as the 0-1 loss, the objective is piecewise constant and non-convex, so it is hard to find the global minimum.
Linear regression
However, if we use the L2 loss, we can find the global minimum.
The objective $\lVert Xw - y \rVert^2$ (with $X$ the matrix of training inputs and $y$ the vector of targets) is convex in the weights $w$, so the global minimum can be found in closed form.
The gradient is $\nabla_w \lVert Xw - y \rVert^2 = 2X^\top(Xw - y)$.
Setting the gradient to 0 gives the normal-equation solution $w^* = (X^\top X)^{-1} X^\top y$.
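A minimal sketch of solving the normal equations directly in PyTorch, on random placeholder data:

import torch

N, d = 100, 5
X = torch.randn(N, d)   # placeholder design matrix
y = torch.randn(N)      # placeholder targets

# w* = (X^T X)^{-1} X^T y, computed by solving the linear system (X^T X) w = X^T y
w = torch.linalg.solve(X.T @ X, X.T @ y)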
From the maximum likelihood perspective (a linear model with Gaussian noise on the outputs), we can derive the same result.
Logistic regression
Sigmoid function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$, which squashes a real-valued score into a probability in $(0, 1)$.
The logistic regression loss has no closed-form minimizer, so we cannot use normal equations; instead we minimize it iteratively with gradient descent. (The loss is in fact convex, so gradient descent can reach the global minimum.)
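A small sketch of the logistic regression model and its loss, with placeholder data and weights; minimizing this loss is the subject of the next part:

import torch
import torch.nn.functional as F

x = torch.randn(5, 3)                    # 5 placeholder samples with 3 features
y = torch.tensor([0., 1., 1., 0., 1.])   # placeholder binary labels
w = torch.randn(3)                       # placeholder weights
b = torch.zeros(1)

p = torch.sigmoid(x @ w + b)             # predicted probabilities in (0, 1)
loss = F.binary_cross_entropy(p, y)      # logistic (negative log-likelihood) loss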
Gradient Descent
Full-batch gradient descent: at each step, use the gradient of the average loss over the entire training set.
Stochastic gradient descent: at each step, use the gradient of the loss on a single randomly chosen sample.
Mini-batch gradient descent (see the sketch below):
At each step, we update the weights using the average gradient over a mini-batch: $\theta \leftarrow \theta - \eta \, \frac{1}{|B|} \sum_{i \in B} \nabla_\theta \ell_i(\theta)$.
The mini-batch $B$ is selected at random from the training set.
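A rough sketch of mini-batch gradient descent applied to the logistic regression loss above (all data and hyperparameters are made up); the full-batch and stochastic variants differ only in how many samples each step uses:

import torch
import torch.nn.functional as F

N, d, batch_size, lr = 1000, 3, 32, 0.1
X = torch.randn(N, d)                    # placeholder features
y = (torch.rand(N) > 0.5).float()        # placeholder binary labels
w = torch.zeros(d, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

for step in range(500):
    idx = torch.randint(0, N, (batch_size,))   # randomly selected mini-batch
    p = torch.sigmoid(X[idx] @ w + b)
    loss = F.binary_cross_entropy(p, y[idx])   # average loss over the mini-batch
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad                       # step in the negative gradient direction
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()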
Multi-class classification
Use the softmax function to convert the output scores into a probability distribution: $\mathrm{softmax}(z)_i = \dfrac{e^{z_i}}{\sum_j e^{z_j}}$.
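A brief sketch of softmax turning raw class scores (logits) into probabilities; the numbers below are made up:

import torch
import torch.nn.functional as F

logits = torch.tensor([2.0, 1.0, 0.1])   # made-up scores for 3 classes
probs = F.softmax(logits, dim=0)         # exp(z_i) / sum_j exp(z_j), roughly [0.66, 0.24, 0.10]

# for training, cross_entropy combines log-softmax with the negative log-likelihood
target = torch.tensor([0])               # true class index
loss = F.cross_entropy(logits.unsqueeze(0), target)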
Neural Networks
From linear to non-linear.
- Shallow approach:
- Use a feature transformation to make the data linearly separable, then apply a linear model.
- Deep approach:
- Stack multiple layers of linear models with non-linearities in between (see the sketch after this list).
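A minimal sketch of the deep approach: linear layers stacked with non-linearities in between (the layer sizes here are arbitrary):

import torch
import torch.nn as nn

# without the non-linearities, the stack would collapse into a single linear map
mlp = nn.Sequential(
    nn.Linear(2, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

x = torch.randn(8, 2)   # a batch of 8 two-dimensional inputs (placeholder)
out = mlp(x)            # shape (8, 1)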
Common non-linear functions:
- ReLU: $\mathrm{ReLU}(x) = \max(0, x)$
- Sigmoid: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
- Tanh: $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
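A tiny sketch evaluating the three activations on a few sample values:

import torch

x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.relu(x))      # max(0, x)                  -> [0.000, 0.000, 2.000]
print(torch.sigmoid(x))   # 1 / (1 + exp(-x))          -> [0.119, 0.500, 0.881]
print(torch.tanh(x))      # (e^x - e^-x)/(e^x + e^-x)  -> [-0.964, 0.000, 0.964]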