[Stanford Univ: CS231n] Spring 2025 Assignment2. Q3(Convolutional Neural Networks)
뉴하늘 · 2025. 5. 16. 16:39

This post summarizes what I studied while taking CS231n: Convolutional Neural Networks for Visual Recognition, offered by the Stanford University School of Engineering.
https://github.com/cs231n/cs231n.github.io/blob/master/assignments/2025/assignment2.md
https://github.com/KwonKiHyeok/CS231n/tree/main
Q3. Convolutional Neural Networks

def conv_forward_naive(x, w, b, conv_param):
    """A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each filter
    spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.

    During padding, 'pad' zeros should be placed symmetrically (i.e. equally on both sides)
    along the height and width axes of the input. Be careful not to modify the original
    input x directly.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the convolutional forward pass.                         #
    # Hint: you can use the function np.pad for padding.                      #
    ###########################################################################
    stride = conv_param["stride"]
    pad = conv_param["pad"]
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape

    # Zero-pad the input symmetrically along the height and width axes.
    x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)

    out_H = 1 + (H + 2 * pad - HH) // stride
    out_W = 1 + (W + 2 * pad - WW) // stride
    out = np.zeros((N, F, out_H, out_W))

    for n in range(N):
        for f in range(F):
            for i in range(out_H):
                for j in range(out_W):
                    h_start = i * stride
                    h_end = h_start + HH
                    w_start = j * stride
                    w_end = w_start + WW
                    window = x_padded[n, :, h_start:h_end, w_start:w_end]
                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, w, b, conv_param)
    return out, cache
x_shape = (2, 3, 4, 4)
w_shape = (3, 3, 4, 4)
x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)
w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)
b = np.linspace(-0.1, 0.2, num=3)
conv_param = {'stride': 2, 'pad': 1}
out, _ = conv_forward_naive(x, w, b, conv_param)
correct_out = np.array([[[[-0.08759809, -0.10987781],
[-0.18387192, -0.2109216 ]],
[[ 0.21027089, 0.21661097],
[ 0.22847626, 0.23004637]],
[[ 0.50813986, 0.54309974],
[ 0.64082444, 0.67101435]]],
[[[-0.98053589, -1.03143541],
[-1.19128892, -1.24695841]],
[[ 0.69108355, 0.66880383],
[ 0.59480972, 0.56776003]],
[[ 2.36270298, 2.36904306],
[ 2.38090835, 2.38247847]]]])
# Compare your output to ours; difference should be around e-8
print('Testing conv_forward_naive')
print('difference: ', rel_error(out, correct_out))
Testing conv_forward_naive
difference: 2.2121476417505994e-08
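For reference, rel_error is the small helper defined near the top of the assignment notebook; its definition is not reproduced in this post, but it is typically something like the sketch below (maximum elementwise relative error, with a guard against division by zero):

def rel_error(x, y):
    # max relative error; the 1e-8 guard avoids dividing by zero
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))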

from imageio import imread
from PIL import Image
kitten = imread('cs231n/notebook_images/kitten.jpg')
puppy = imread('cs231n/notebook_images/puppy.jpg')
# kitten is wide, and puppy is already square
d = kitten.shape[1] - kitten.shape[0]
kitten_cropped = kitten[:, d//2:-d//2, :]
img_size = 200 # Make this smaller if it runs too slow
resized_puppy = np.array(Image.fromarray(puppy).resize((img_size, img_size)))
resized_kitten = np.array(Image.fromarray(kitten_cropped).resize((img_size, img_size)))
x = np.zeros((2, 3, img_size, img_size))
x[0, :, :, :] = resized_puppy.transpose((2, 0, 1))
x[1, :, :, :] = resized_kitten.transpose((2, 0, 1))
# Set up convolutional weights holding 2 filters, each 3x3
w = np.zeros((2, 3, 3, 3))
# The first filter converts the image to grayscale.
# Set up the red, green, and blue channels of the filter.
w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]
w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]
w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]
# Second filter detects horizontal edges in the blue channel.
w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
# Vector of biases. We don't need any bias for the grayscale
# filter, but for the edge detection filter we want to add 128
# to each output so that nothing is negative.
b = np.array([0, 128])
# Compute the result of convolving each input in x with each filter in w,
# offsetting by b, and storing the results in out.
out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})
def imshow_no_ax(img, normalize=True):
    """Tiny helper to show images as uint8 and remove axis labels."""
    if normalize:
        img_max, img_min = np.max(img), np.min(img)
        img = 255.0 * (img - img_min) / (img_max - img_min)
    plt.imshow(img.astype('uint8'))
    plt.gca().axis('off')
# Show the original images and the results of the conv operation
plt.subplot(2, 3, 1)
imshow_no_ax(puppy, normalize=False)
plt.title('Original image')
plt.subplot(2, 3, 2)
imshow_no_ax(out[0, 0])
plt.title('Grayscale')
plt.subplot(2, 3, 3)
imshow_no_ax(out[0, 1])
plt.title('Edges')
plt.subplot(2, 3, 4)
imshow_no_ax(kitten_cropped, normalize=False)
plt.subplot(2, 3, 5)
imshow_no_ax(out[1, 0])
plt.subplot(2, 3, 6)
imshow_no_ax(out[1, 1])
plt.show()
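A quick sanity check on the grayscale filter (assuming x and out from the cell above are still in scope): since only the center tap of each 3x3 channel slice is nonzero, the first output map should exactly match a plain 0.3R + 0.6G + 0.1B weighted sum of the input, and the zero padding has no effect on it.

gray_manual = 0.3 * x[0, 0] + 0.6 * x[0, 1] + 0.1 * x[0, 2]
print(np.max(np.abs(gray_manual - out[0, 0])))  # should be ~0 up to float round-off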


def conv_backward_naive(dout, cache):
    """A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    dx, dw, db = None, None, None
    ###########################################################################
    # TODO: Implement the convolutional backward pass.                        #
    ###########################################################################
    x, w, b, conv_param = cache
    stride = conv_param["stride"]
    pad = conv_param["pad"]
    F, C, HH, WW = w.shape
    N, _, H, W = x.shape
    _, _, out_H, out_W = dout.shape

    x_padded = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant', constant_values=0)
    dx_padded = np.zeros_like(x_padded)
    db = np.zeros_like(b)
    dw = np.zeros_like(w)

    # db: sum over all n, i, j
    for f in range(F):
        db[f] = np.sum(dout[:, f, :, :])

    # dw, dx
    for n in range(N):
        for f in range(F):
            for i in range(out_H):
                for j in range(out_W):
                    h_start = i * stride
                    h_end = h_start + HH
                    w_start = j * stride
                    w_end = w_start + WW
                    # dw: gradient of the filter weights
                    dw[f] += x_padded[n, :, h_start:h_end, w_start:w_end] * dout[n, f, i, j]
                    # dx_padded: gradient of the padded input
                    dx_padded[n, :, h_start:h_end, w_start:w_end] += w[f] * dout[n, f, i, j]

    # remove padding from dx_padded to get dx
    dx = dx_padded[:, :, pad:H+pad, pad:W+pad]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dw, db
np.random.seed(231)
x = np.random.randn(4, 3, 5, 5)
w = np.random.randn(2, 3, 3, 3)
b = np.random.randn(2,)
dout = np.random.randn(4, 2, 5, 5)
conv_param = {'stride': 1, 'pad': 1}
dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)
out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)
# Your errors should be around e-8 or less.
print('Testing conv_backward_naive function')
print('dx error: ', rel_error(dx, dx_num))
print('dw error: ', rel_error(dw, dw_num))
print('db error: ', rel_error(db, db_num))
Testing conv_backward_naive function
dx error: 1.159803161159293e-08
dw error: 2.2471264748452487e-10
db error: 3.3726153958780465e-11
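For reference, the quantities the loops above accumulate can be written explicitly. With $\tilde{x}$ the zero-padded input and $s$ the stride,

$$
\frac{\partial L}{\partial b_f} = \sum_{n,i,j} \text{dout}_{n,f,i,j}, \qquad
\frac{\partial L}{\partial w_{f,c,p,q}} = \sum_{n,i,j} \text{dout}_{n,f,i,j}\, \tilde{x}_{n,c,\,is+p,\,js+q},
$$

$$
\frac{\partial L}{\partial \tilde{x}_{n,c,u,v}} = \sum_{f}\;\sum_{\substack{i,j,p,q \\ is+p=u,\; js+q=v}} \text{dout}_{n,f,i,j}\, w_{f,c,p,q},
$$

and dx is obtained by cropping the padding off this last quantity.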

def max_pool_forward_naive(x, pool_param):
    """A naive implementation of the forward pass for a max-pooling layer.

    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions

    No padding is necessary here, e.g. you can assume:
      - (H - pool_height) % stride == 0
      - (W - pool_width) % stride == 0

    Returns a tuple of:
    - out: Output data, of shape (N, C, H', W') where H' and W' are given by
      H' = 1 + (H - pool_height) / stride
      W' = 1 + (W - pool_width) / stride
    - cache: (x, pool_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the max-pooling forward pass                            #
    ###########################################################################
    pool_height = pool_param["pool_height"]
    pool_width = pool_param["pool_width"]
    stride = pool_param["stride"]
    N, C, H, W = x.shape

    out_H = 1 + (H - pool_height) // stride
    out_W = 1 + (W - pool_width) // stride
    out = np.zeros((N, C, out_H, out_W))

    for n in range(N):
        for c in range(C):
            for i in range(out_H):
                for j in range(out_W):
                    h_start = i * stride
                    h_end = h_start + pool_height
                    w_start = j * stride
                    w_end = w_start + pool_width
                    out[n, c, i, j] = np.max(x[n, c, h_start:h_end, w_start:w_end])
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, pool_param)
    return out, cache
x_shape = (2, 3, 4, 4)
x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)
pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}
out, _ = max_pool_forward_naive(x, pool_param)
correct_out = np.array([[[[-0.26315789, -0.24842105],
[-0.20421053, -0.18947368]],
[[-0.14526316, -0.13052632],
[-0.08631579, -0.07157895]],
[[-0.02736842, -0.01263158],
[ 0.03157895, 0.04631579]]],
[[[ 0.09052632, 0.10526316],
[ 0.14947368, 0.16421053]],
[[ 0.20842105, 0.22315789],
[ 0.26736842, 0.28210526]],
[[ 0.32631579, 0.34105263],
[ 0.38526316, 0.4 ]]]])
# Compare your output with ours. Difference should be on the order of e-8.
print('Testing max_pool_forward_naive function:')
print('difference: ', rel_error(out, correct_out))
Testing max_pool_forward_naive function:
difference: 4.1666665157267834e-08

def max_pool_backward_naive(dout, cache):
    """A naive implementation of the backward pass for a max-pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    dx = None
    ###########################################################################
    # TODO: Implement the max-pooling backward pass                           #
    ###########################################################################
    x, pool_param = cache
    pool_height = pool_param["pool_height"]
    pool_width = pool_param["pool_width"]
    stride = pool_param["stride"]
    N, C, H, W = x.shape
    _, _, out_H, out_W = dout.shape

    dx = np.zeros_like(x)

    for n in range(N):
        for c in range(C):
            for i in range(out_H):
                for j in range(out_W):
                    h_start = i * stride
                    h_end = h_start + pool_height
                    w_start = j * stride
                    w_end = w_start + pool_width
                    window = x[n, c, h_start:h_end, w_start:w_end]
                    window_max = np.max(window)
                    # mask is True at the position(s) holding the window maximum
                    mask = (window == window_max)
                    # route the upstream gradient only to those positions;
                    # accumulate with += so overlapping windows are also handled correctly
                    dx[n, c, h_start:h_end, w_start:w_end] += mask * dout[n, c, i, j]
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx
np.random.seed(231)
x = np.random.randn(3, 2, 8, 8)
dout = np.random.randn(3, 2, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)
out, cache = max_pool_forward_naive(x, pool_param)
dx = max_pool_backward_naive(dout, cache)
# Your error should be on the order of e-12
print('Testing max_pool_backward_naive function:')
print('dx error: ', rel_error(dx, dx_num))
Testing max_pool_backward_naive function:
dx error: 3.27562514223145e-12
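A tiny worked example of the mask trick with made-up numbers: the upstream gradient for an output cell is routed only to the position that held the maximum in its pooling window.

window = np.array([[1., 3.],
                   [2., 0.]])
mask = (window == np.max(window))  # [[False, True], [False, False]]
print(mask * 5.0)                  # [[0., 5.], [0., 0.]]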

# Remember to restart the runtime after executing this cell!
%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/
!python setup.py build_ext --inplace
%cd /content/drive/My\ Drive/$FOLDERNAME/
/content/drive/My Drive/cs231n/assignments/assignment2/cs231n
/content/drive/My Drive/cs231n/assignments/assignment2

# Rel errors should be around e-9 or less.
from cs231n.fast_layers import conv_forward_fast, conv_backward_fast
from time import time
np.random.seed(231)
x = np.random.randn(100, 3, 31, 31)
w = np.random.randn(25, 3, 3, 3)
b = np.random.randn(25,)
dout = np.random.randn(100, 25, 16, 16)
conv_param = {'stride': 2, 'pad': 1}
t0 = time()
out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)
t1 = time()
out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)
t2 = time()
print('Testing conv_forward_fast:')
print('Naive: %fs' % (t1 - t0))
print('Fast: %fs' % (t2 - t1))
print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('Difference: ', rel_error(out_naive, out_fast))
t0 = time()
dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)
t1 = time()
dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)
t2 = time()
print('\nTesting conv_backward_fast:')
print('Naive: %fs' % (t1 - t0))
print('Fast: %fs' % (t2 - t1))
print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('dx difference: ', rel_error(dx_naive, dx_fast))
print('dw difference: ', rel_error(dw_naive, dw_fast))
print('db difference: ', rel_error(db_naive, db_fast))
Testing conv_forward_fast:
Naive: 8.520559s
Fast: 0.029350s
Speedup: 290.310588x
Difference: 4.926407851494105e-11
Testing conv_backward_fast:
Naive: 9.971311s
Fast: 0.360914s
Speedup: 27.627960x
dx difference: 1.949764775345631e-11
dw difference: 3.681156828004736e-13
db difference: 3.1393858025571252e-15
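Most of the speedup comes from turning the convolution into a single large matrix multiplication (the im2col idea): every receptive field is copied into a column, and all filters are applied at once with one matmul. The sketch below is an illustrative reimplementation of that idea against the same interface, not the actual Cython code in cs231n/fast_layers.py.

def conv_forward_im2col_sketch(x, w, b, conv_param):
    # Gather every receptive field into a column, then apply all filters
    # with one matrix multiplication per batch.
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    out_H = 1 + (H + 2 * pad - HH) // stride
    out_W = 1 + (W + 2 * pad - WW) // stride
    x_p = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    cols = np.zeros((N, C * HH * WW, out_H * out_W))
    for i in range(out_H):
        for j in range(out_W):
            patch = x_p[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            cols[:, :, i * out_W + j] = patch.reshape(N, -1)
    out = w.reshape(F, -1) @ cols + b.reshape(1, F, 1)  # (N, F, out_H*out_W)
    return out.reshape(N, F, out_H, out_W)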
# Relative errors should be close to 0.0.
from cs231n.fast_layers import max_pool_forward_fast, max_pool_backward_fast
np.random.seed(231)
x = np.random.randn(100, 3, 32, 32)
dout = np.random.randn(100, 3, 16, 16)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
t0 = time()
out_naive, cache_naive = max_pool_forward_naive(x, pool_param)
t1 = time()
out_fast, cache_fast = max_pool_forward_fast(x, pool_param)
t2 = time()
print('Testing pool_forward_fast:')
print('Naive: %fs' % (t1 - t0))
print('fast: %fs' % (t2 - t1))
print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('difference: ', rel_error(out_naive, out_fast))
t0 = time()
dx_naive = max_pool_backward_naive(dout, cache_naive)
t1 = time()
dx_fast = max_pool_backward_fast(dout, cache_fast)
t2 = time()
print('\nTesting pool_backward_fast:')
print('Naive: %fs' % (t1 - t0))
print('fast: %fs' % (t2 - t1))
print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('dx difference: ', rel_error(dx_naive, dx_fast))
Testing pool_forward_fast:
Naive: 1.737666s
fast: 0.013117s
speedup: 132.478397x
difference: 0.0
Testing pool_backward_fast:
Naive: 1.954240s
fast: 0.013476s
speedup: 145.020046x
dx difference: 0.0
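Max pooling is easy to vectorize when the pooling regions exactly tile the input (stride equal to the pool size, dimensions dividing evenly), which is the 2x2/stride-2 case used throughout this assignment: a reshape puts each pooling window on its own axes, and a single max finishes the job. A sketch under those assumptions:

def max_pool_forward_reshape_sketch(x, pool_param):
    # Assumes pool_height == pool_width == stride and that H, W divide evenly.
    N, C, H, W = x.shape
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    x_reshaped = x.reshape(N, C, H // ph, ph, W // pw, pw)
    return x_reshaped.max(axis=(3, 5))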

from cs231n.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward
np.random.seed(231)
x = np.random.randn(2, 3, 16, 16)
w = np.random.randn(3, 3, 3, 3)
b = np.random.randn(3,)
dout = np.random.randn(2, 3, 8, 8)
conv_param = {'stride': 1, 'pad': 1}
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)
dx, dw, db = conv_relu_pool_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)
# Relative errors should be around e-8 or less
print('Testing conv_relu_pool')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Testing conv_relu_pool
dx error: 9.591132621921372e-09
dw error: 5.802391137330214e-09
db error: 1.0146343411762047e-09
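The sandwich helpers in cs231n/layer_utils.py simply chain the primitive layers and collect their caches; conceptually they look like the sketch below (a paraphrase of the pattern, not the verbatim file).

def conv_relu_pool_forward_sketch(x, w, b, conv_param, pool_param):
    a, conv_cache = conv_forward_fast(x, w, b, conv_param)
    s, relu_cache = relu_forward(a)
    out, pool_cache = max_pool_forward_fast(s, pool_param)
    return out, (conv_cache, relu_cache, pool_cache)

def conv_relu_pool_backward_sketch(dout, cache):
    conv_cache, relu_cache, pool_cache = cache
    ds = max_pool_backward_fast(dout, pool_cache)
    da = relu_backward(ds, relu_cache)
    dx, dw, db = conv_backward_fast(da, conv_cache)
    return dx, dw, db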
from cs231n.layer_utils import conv_relu_forward, conv_relu_backward
np.random.seed(231)
x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(3, 3, 3, 3)
b = np.random.randn(3,)
dout = np.random.randn(2, 3, 8, 8)
conv_param = {'stride': 1, 'pad': 1}
out, cache = conv_relu_forward(x, w, b, conv_param)
dx, dw, db = conv_relu_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)
# Relative errors should be around e-8 or less
print('Testing conv_relu:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Testing conv_relu:
dx error: 1.5218619980349303e-09
dw error: 2.702022646099404e-10
db error: 1.451272393591721e-10

class ThreeLayerConvNet(object):
    """
    A three-layer convolutional network with the following architecture:

    conv - relu - 2x2 max pool - affine - relu - affine - softmax

    The network operates on minibatches of data that have shape (N, C, H, W)
    consisting of N images, each with height H and width W and with C input
    channels.
    """

    def __init__(
        self,
        input_dim=(3, 32, 32),
        num_filters=32,
        filter_size=7,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
        dtype=np.float32,
    ):
        """
        Initialize a new network.

        Inputs:
        - input_dim: Tuple (C, H, W) giving size of input data
        - num_filters: Number of filters to use in the convolutional layer
        - filter_size: Width/height of filters to use in the convolutional layer
        - hidden_dim: Number of units to use in the fully-connected hidden layer
        - num_classes: Number of scores to produce from the final affine layer.
        - weight_scale: Scalar giving standard deviation for random initialization
          of weights.
        - reg: Scalar giving L2 regularization strength
        - dtype: numpy datatype to use for computation.
        """
        self.params = {}
        self.reg = reg
        self.dtype = dtype

        ############################################################################
        # TODO: Initialize weights and biases for the three-layer convolutional    #
        # network. Weights should be initialized from a Gaussian centered at 0.0   #
        # with standard deviation equal to weight_scale; biases should be          #
        # initialized to zero. All weights and biases should be stored in the      #
        # dictionary self.params. Store weights and biases for the convolutional   #
        # layer using the keys 'W1' and 'b1'; use keys 'W2' and 'b2' for the       #
        # weights and biases of the hidden affine layer, and keys 'W3' and 'b3'    #
        # for the weights and biases of the output affine layer.                   #
        #                                                                          #
        # IMPORTANT: For this assignment, you can assume that the padding          #
        # and stride of the first convolutional layer are chosen so that           #
        # **the width and height of the input are preserved**. Take a look at      #
        # the start of the loss() function to see how that happens.                #
        ############################################################################
        C, H, W = input_dim

        # conv layer
        self.params["W1"] = np.random.randn(num_filters, C, filter_size, filter_size) * weight_scale
        self.params["b1"] = np.zeros(num_filters)

        # affine layers (the 2x2 max pool halves H and W before the first affine layer)
        self.params["W2"] = np.random.randn(num_filters * (H // 2) * (W // 2), hidden_dim) * weight_scale
        self.params["b2"] = np.zeros(hidden_dim)
        self.params["W3"] = np.random.randn(hidden_dim, num_classes) * weight_scale
        self.params["b3"] = np.zeros(num_classes)
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        for k, v in self.params.items():
            self.params[k] = v.astype(dtype)

    def loss(self, X, y=None):
        """
        Evaluate loss and gradient for the three-layer convolutional network.

        Input / output: Same API as TwoLayerNet in fc_net.py.
        """
        W1, b1 = self.params["W1"], self.params["b1"]
        W2, b2 = self.params["W2"], self.params["b2"]
        W3, b3 = self.params["W3"], self.params["b3"]

        # pass conv_param to the forward pass for the convolutional layer
        # Padding and stride chosen to preserve the input spatial size
        filter_size = W1.shape[2]
        conv_param = {"stride": 1, "pad": (filter_size - 1) // 2}

        # pass pool_param to the forward pass for the max-pooling layer
        pool_param = {"pool_height": 2, "pool_width": 2, "stride": 2}

        scores = None
        ############################################################################
        # TODO: Implement the forward pass for the three-layer convolutional net,  #
        # computing the class scores for X and storing them in the scores          #
        # variable.                                                                #
        #                                                                          #
        # Remember you can use the functions defined in cs231n/fast_layers.py and  #
        # cs231n/layer_utils.py in your implementation (already imported).         #
        ############################################################################
        out, cn_cache = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
        out, af1_cache = affine_relu_forward(out, W2, b2)
        scores, af2_cache = affine_forward(out, W3, b3)
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        if y is None:
            return scores

        loss, grads = 0, {}
        ############################################################################
        # TODO: Implement the backward pass for the three-layer convolutional net, #
        # storing the loss and gradients in the loss and grads variables. Compute  #
        # data loss using softmax, and make sure that grads[k] holds the gradients #
        # for self.params[k]. Don't forget to add L2 regularization!               #
        #                                                                          #
        # NOTE: To ensure that your implementation matches ours and you pass the   #
        # automated tests, make sure that your L2 regularization includes a factor #
        # of 0.5 to simplify the expression for the gradient.                      #
        ############################################################################
        loss, dout = softmax_loss(scores, y)

        dout, dw3, db3 = affine_backward(dout, af2_cache)
        dout, dw2, db2 = affine_relu_backward(dout, af1_cache)
        dout, dw1, db1 = conv_relu_pool_backward(dout, cn_cache)

        loss += 0.5 * self.reg * (np.sum(W1**2) + np.sum(W2**2) + np.sum(W3**2))

        grads["W3"] = dw3 + self.reg * W3
        grads["b3"] = db3
        grads["W2"] = dw2 + self.reg * W2
        grads["b2"] = db2
        grads["W1"] = dw1 + self.reg * W1
        grads["b1"] = db1
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        return loss, grads
model = ThreeLayerConvNet()
N = 50
X = np.random.randn(N, 3, 32, 32)
y = np.random.randint(10, size=N)
loss, grads = model.loss(X, y)
print('Initial loss (no regularization): ', loss)
model.reg = 0.5
loss, grads = model.loss(X, y)
print('Initial loss (with regularization): ', loss)
Initial loss (no regularization): 2.302583520950895
Initial loss (with regularization): 2.508822446407896
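Both numbers are sensible: a randomly initialized softmax over 10 classes should produce a data loss near ln(10), and turning on regularization adds a small positive term on top.

print(np.log(10))  # ≈ 2.3026, matching the no-regularization loss above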

num_inputs = 2
input_dim = (3, 16, 16)
reg = 0.0
num_classes = 10
np.random.seed(231)
X = np.random.randn(num_inputs, *input_dim)
y = np.random.randint(num_classes, size=num_inputs)
model = ThreeLayerConvNet(
num_filters=3,
filter_size=3,
input_dim=input_dim,
hidden_dim=7,
dtype=np.float64
)
loss, grads = model.loss(X, y)
# Errors should be small, but correct implementations may have
# relative errors up to the order of e-2
for param_name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)
    e = rel_error(param_grad_num, grads[param_name])
    print('%s max relative error: %e' % (param_name, e))
W1 max relative error: 3.053965e-04
W2 max relative error: 1.822723e-02
W3 max relative error: 3.422399e-04
b1 max relative error: 3.397321e-06
b2 max relative error: 2.517459e-03
b3 max relative error: 9.711800e-10
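The numeric gradients come from a centered finite difference of the loss, perturbing one parameter entry at a time (h = 1e-6 in the call above):

$$
\frac{\partial L}{\partial \theta} \approx \frac{L(\theta + h) - L(\theta - h)}{2h}
$$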

np.random.seed(231)
num_train = 100
small_data = {
'X_train': data['X_train'][:num_train],
'y_train': data['y_train'][:num_train],
'X_val': data['X_val'],
'y_val': data['y_val'],
}
model = ThreeLayerConvNet(weight_scale=1e-2)
solver = Solver(
model,
small_data,
num_epochs=15,
batch_size=50,
update_rule='adam',
optim_config={'learning_rate': 1e-3,},
verbose=True,
print_every=1
)
solver.train()
(Iteration 1 / 30) loss: 2.414060
(Epoch 0 / 15) train acc: 0.200000; val_acc: 0.137000
(Iteration 2 / 30) loss: 3.102719
(Epoch 1 / 15) train acc: 0.140000; val_acc: 0.087000
(Iteration 3 / 30) loss: 2.270332
(Iteration 4 / 30) loss: 2.099074
(Epoch 2 / 15) train acc: 0.230000; val_acc: 0.093000
(Iteration 5 / 30) loss: 1.841253
(Iteration 6 / 30) loss: 1.935296
(Epoch 3 / 15) train acc: 0.490000; val_acc: 0.168000
(Iteration 7 / 30) loss: 1.828834
(Iteration 8 / 30) loss: 1.652295
(Epoch 4 / 15) train acc: 0.520000; val_acc: 0.182000
(Iteration 9 / 30) loss: 1.332530
(Iteration 10 / 30) loss: 1.772358
(Epoch 5 / 15) train acc: 0.640000; val_acc: 0.171000
(Iteration 11 / 30) loss: 1.029629
(Iteration 12 / 30) loss: 1.038692
(Epoch 6 / 15) train acc: 0.720000; val_acc: 0.226000
(Iteration 13 / 30) loss: 1.152896
(Iteration 14 / 30) loss: 0.834351
(Epoch 7 / 15) train acc: 0.810000; val_acc: 0.248000
(Iteration 15 / 30) loss: 0.584665
(Iteration 16 / 30) loss: 0.644552
(Epoch 8 / 15) train acc: 0.830000; val_acc: 0.240000
(Iteration 17 / 30) loss: 0.811508
(Iteration 18 / 30) loss: 0.430228
(Epoch 9 / 15) train acc: 0.840000; val_acc: 0.175000
(Iteration 19 / 30) loss: 0.421580
(Iteration 20 / 30) loss: 0.555581
(Epoch 10 / 15) train acc: 0.930000; val_acc: 0.197000
(Iteration 21 / 30) loss: 0.364954
(Iteration 22 / 30) loss: 0.271303
(Epoch 11 / 15) train acc: 0.860000; val_acc: 0.215000
(Iteration 23 / 30) loss: 0.405969
(Iteration 24 / 30) loss: 0.385744
(Epoch 12 / 15) train acc: 0.950000; val_acc: 0.207000
(Iteration 25 / 30) loss: 0.109979
(Iteration 26 / 30) loss: 0.113894
(Epoch 13 / 15) train acc: 0.960000; val_acc: 0.218000
(Iteration 27 / 30) loss: 0.123754
(Iteration 28 / 30) loss: 0.172242
(Epoch 14 / 15) train acc: 0.990000; val_acc: 0.222000
(Iteration 29 / 30) loss: 0.123923
(Iteration 30 / 30) loss: 0.070897
(Epoch 15 / 15) train acc: 0.990000; val_acc: 0.218000
# Print final training accuracy.
print(
"Small data training accuracy:",
solver.check_accuracy(small_data['X_train'], small_data['y_train'])
)
Small data training accuracy: 0.81
# Print final validation accuracy.
print(
"Small data validation accuracy:",
solver.check_accuracy(small_data['X_val'], small_data['y_val'])
)
Small data validation accuracy: 0.248
plt.subplot(2, 1, 1)
plt.plot(solver.loss_history, 'o')
plt.xlabel('iteration')
plt.ylabel('loss')
plt.subplot(2, 1, 2)
plt.plot(solver.train_acc_history, '-o')
plt.plot(solver.val_acc_history, '-o')
plt.legend(['train', 'val'], loc='upper left')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.show()


model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)
solver = Solver(
model,
data,
num_epochs=1,
batch_size=50,
update_rule='adam',
optim_config={'learning_rate': 1e-3,},
verbose=True,
print_every=20
)
solver.train()
(Iteration 1 / 980) loss: 2.304740
(Epoch 0 / 1) train acc: 0.103000; val_acc: 0.107000
(Iteration 21 / 980) loss: 2.129645
(Iteration 41 / 980) loss: 1.945061
(Iteration 61 / 980) loss: 1.770560
(Iteration 81 / 980) loss: 1.901148
(Iteration 101 / 980) loss: 1.919195
(Iteration 121 / 980) loss: 1.765398
(Iteration 141 / 980) loss: 1.898770
(Iteration 161 / 980) loss: 1.738502
(Iteration 181 / 980) loss: 1.843249
(Iteration 201 / 980) loss: 2.066862
(Iteration 221 / 980) loss: 1.990459
(Iteration 241 / 980) loss: 1.828815
(Iteration 261 / 980) loss: 1.604638
(Iteration 281 / 980) loss: 1.692068
(Iteration 301 / 980) loss: 1.754931
(Iteration 321 / 980) loss: 1.742302
(Iteration 341 / 980) loss: 1.699758
(Iteration 361 / 980) loss: 1.723792
(Iteration 381 / 980) loss: 1.542635
(Iteration 401 / 980) loss: 1.738328
(Iteration 421 / 980) loss: 1.427555
(Iteration 441 / 980) loss: 1.759026
(Iteration 461 / 980) loss: 1.879609
(Iteration 481 / 980) loss: 1.413088
(Iteration 501 / 980) loss: 1.367136
(Iteration 521 / 980) loss: 1.637779
(Iteration 541 / 980) loss: 1.749722
(Iteration 561 / 980) loss: 1.645201
(Iteration 581 / 980) loss: 1.373341
(Iteration 601 / 980) loss: 1.573957
(Iteration 621 / 980) loss: 1.570977
(Iteration 641 / 980) loss: 1.683598
(Iteration 661 / 980) loss: 1.755739
(Iteration 681 / 980) loss: 1.741579
(Iteration 701 / 980) loss: 1.627704
(Iteration 721 / 980) loss: 1.655453
(Iteration 741 / 980) loss: 1.634793
(Iteration 761 / 980) loss: 1.447560
(Iteration 781 / 980) loss: 1.793672
(Iteration 801 / 980) loss: 1.627485
(Iteration 821 / 980) loss: 1.656835
(Iteration 841 / 980) loss: 1.351238
(Iteration 861 / 980) loss: 1.745392
(Iteration 881 / 980) loss: 1.552356
(Iteration 901 / 980) loss: 1.539406
(Iteration 921 / 980) loss: 1.492501
(Iteration 941 / 980) loss: 1.683674
(Iteration 961 / 980) loss: 1.638435
(Epoch 1 / 1) train acc: 0.480000; val_acc: 0.485000
# Print final training accuracy.
print(
"Full data training accuracy:",
solver.check_accuracy(data['X_train'], data['y_train'])
)
Full data training accuracy: 0.46477551020408164
# Print final validation accuracy.
print(
"Full data validation accuracy:",
solver.check_accuracy(data['X_val'], data['y_val'])
)
Full data validation accuracy: 0.485

from cs231n.vis_utils import visualize_grid
grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))
plt.axis('off')
plt.gcf().set_size_inches(5, 5)
plt.show()



np.random.seed(231)
# Check the training-time forward pass by checking means and variances
# of features both before and after spatial batch normalization.
N, C, H, W = 2, 3, 4, 5
x = 4 * np.random.randn(N, C, H, W) + 10
print('Before spatial batch normalization:')
print(' shape: ', x.shape)
print(' means: ', x.mean(axis=(0, 2, 3)))
print(' stds: ', x.std(axis=(0, 2, 3)))
# Means should be close to zero and stds close to one
gamma, beta = np.ones(C), np.zeros(C)
bn_param = {'mode': 'train'}
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print('After spatial batch normalization:')
print(' shape: ', out.shape)
print(' means: ', out.mean(axis=(0, 2, 3)))
print(' stds: ', out.std(axis=(0, 2, 3)))
# Means should be close to beta and stds close to gamma
gamma, beta = np.asarray([3, 4, 5]), np.asarray([6, 7, 8])
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print('After spatial batch normalization (nontrivial gamma, beta):')
print(' shape: ', out.shape)
print(' means: ', out.mean(axis=(0, 2, 3)))
print(' stds: ', out.std(axis=(0, 2, 3)))
Before spatial batch normalization:
shape: (2, 3, 4, 5)
means: [9.33463814 8.90909116 9.11056338]
stds: [3.61447857 3.19347686 3.5168142 ]
After spatial batch normalization:
shape: (2, 3, 4, 5)
means: [-3.33066907e-16 2.22044605e-17 -1.27675648e-16]
stds: [0.99999962 0.99999951 0.9999996 ]
After spatial batch normalization (nontrivial gamma, beta):
shape: (2, 3, 4, 5)
means: [6. 7. 8.]
stds: [2.99999885 3.99999804 4.99999798]
np.random.seed(231)
# Check the test-time forward pass by running the training-time
# forward pass many times to warm up the running averages, and then
# checking the means and variances of activations after a test-time
# forward pass.
N, C, H, W = 10, 4, 11, 12
bn_param = {'mode': 'train'}
gamma = np.ones(C)
beta = np.zeros(C)
for t in range(50):
    x = 2.3 * np.random.randn(N, C, H, W) + 13
    spatial_batchnorm_forward(x, gamma, beta, bn_param)
bn_param['mode'] = 'test'
x = 2.3 * np.random.randn(N, C, H, W) + 13
a_norm, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
# Means should be close to zero and stds close to one, but will be
# noisier than training-time forward passes.
print('After spatial batch normalization (test-time):')
print(' means: ', a_norm.mean(axis=(0, 2, 3)))
print(' stds: ', a_norm.std(axis=(0, 2, 3)))
After spatial batch normalization (test-time):
means: [-0.08034406 0.07562881 0.05716371 0.04378383]
stds: [0.96718744 1.0299714 1.02887624 1.00585577]

np.random.seed(231)
N, C, H, W = 2, 3, 4, 5
x = 5 * np.random.randn(N, C, H, W) + 12
gamma = np.random.randn(C)
beta = np.random.randn(C)
dout = np.random.randn(N, C, H, W)
bn_param = {'mode': 'train'}
fx = lambda x: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]
fg = lambda a: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]
fb = lambda b: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]
dx_num = eval_numerical_gradient_array(fx, x, dout)
da_num = eval_numerical_gradient_array(fg, gamma, dout)
db_num = eval_numerical_gradient_array(fb, beta, dout)
# You should expect errors of magnitudes between 1e-12 and 1e-06.
_, cache = spatial_batchnorm_forward(x, gamma, beta, bn_param)
dx, dgamma, dbeta = spatial_batchnorm_backward(dout, cache)
print('dx error: ', rel_error(dx_num, dx))
print('dgamma error: ', rel_error(da_num, dgamma))
print('dbeta error: ', rel_error(db_num, dbeta))
dx error: 2.786648193872555e-07
dgamma error: 7.097288082068512e-12
dbeta error: 3.2755517433052766e-12
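The spatial_batchnorm_forward/backward functions themselves are not reproduced in this post; the usual implementation simply reshapes so that the vanilla batchnorm from the earlier question normalizes each channel over the batch and all spatial positions. A minimal sketch, assuming batchnorm_forward and batchnorm_backward_alt from cs231n/layers.py are available:

def spatial_batchnorm_forward_sketch(x, gamma, beta, bn_param):
    # Fold N, H, W into one "batch" axis so each of the C channels is
    # normalized over all samples and spatial positions.
    N, C, H, W = x.shape
    x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)
    out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param)
    out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return out, cache

def spatial_batchnorm_backward_sketch(dout, cache):
    N, C, H, W = dout.shape
    dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C)
    dx_flat, dgamma, dbeta = batchnorm_backward_alt(dout_flat, cache)
    dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return dx, dgamma, dbeta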



def spatial_groupnorm_forward(x, gamma, beta, G, gn_param):
    """Computes the forward pass for spatial group normalization.

    In contrast to layer normalization, group normalization splits each entry in the data into G
    contiguous pieces, which it then normalizes independently. Per-feature shifting and scaling
    are then applied to the data, in a manner identical to that of batch normalization and layer
    normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (1, C, 1, 1)
    - beta: Shift parameter, of shape (1, C, 1, 1)
    - G: Integer number of groups to split into, should be a divisor of C
    - gn_param: Dictionary with the following keys:
      - eps: Constant for numeric stability

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None
    eps = gn_param.get("eps", 1e-5)
    ###########################################################################
    # TODO: Implement the forward pass for spatial group normalization.       #
    # This will be extremely similar to the layer norm implementation.        #
    # In particular, think about how you could transform the matrix so that   #
    # the bulk of the code is similar to both train-time batch normalization  #
    # and layer normalization!                                                #
    ###########################################################################
    N, C, H, W = x.shape

    # Step 1: reshape x to group the channels
    x_group = x.reshape(N, G, C // G, H, W)

    # Step 2: compute mean and variance over each group (per sample)
    mean = np.mean(x_group, axis=(2, 3, 4), keepdims=True)  # shape (N, G, 1, 1, 1)
    var = np.var(x_group, axis=(2, 3, 4), keepdims=True)    # shape (N, G, 1, 1, 1)

    # Step 3: normalize
    x_groupnorm = (x_group - mean) / np.sqrt(var + eps)

    # Step 4: reshape back to (N, C, H, W)
    x_hat = x_groupnorm.reshape(N, C, H, W)

    # Step 5: apply scale and shift
    out = gamma * x_hat + beta

    # Cache for backward
    cache = (G, x, x_hat, mean, var, gamma, beta, eps)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return out, cache
np.random.seed(231)
# Check the training-time forward pass by checking means and variances
# of features both before and after spatial batch normalization.
N, C, H, W = 2, 6, 4, 5
G = 2
x = 4 * np.random.randn(N, C, H, W) + 10
x_g = x.reshape((N*G,-1))
print('Before spatial group normalization:')
print(' shape: ', x.shape)
print(' means: ', x_g.mean(axis=1))
print(' stds: ', x_g.std(axis=1))
# Means should be close to zero and stds close to one
gamma, beta = np.ones((1,C,1,1)), np.zeros((1,C,1,1))
bn_param = {'mode': 'train'}
out, _ = spatial_groupnorm_forward(x, gamma, beta, G, bn_param)
out_g = out.reshape((N*G,-1))
print('After spatial group normalization:')
print(' shape: ', out.shape)
print(' means: ', out_g.mean(axis=1))
print(' stds: ', out_g.std(axis=1))
Before spatial group normalization:
shape: (2, 6, 4, 5)
means: [9.72505327 8.51114185 8.9147544 9.43448077]
stds: [3.67070958 3.09892597 4.27043622 3.97521327]
After spatial group normalization:
shape: (2, 6, 4, 5)
means: [-2.14643118e-16 5.25505565e-16 2.65528340e-16 -3.38618023e-16]
stds: [0.99999963 0.99999948 0.99999973 0.99999968]
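In formula form: for each sample n and each group g of C/G channels, with the mean and variance taken over the channels in that group and all spatial positions,

$$
\hat{x}_{n,c,h,w} = \frac{x_{n,c,h,w} - \mu_{n,g}}{\sqrt{\sigma^2_{n,g} + \epsilon}}, \qquad
y_{n,c,h,w} = \gamma_c\, \hat{x}_{n,c,h,w} + \beta_c .
$$

Setting G = 1 recovers layer normalization over (C, H, W), and G = C gives per-channel (instance-style) normalization.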

def spatial_groupnorm_backward(dout, cache):
    """Computes the backward pass for spatial group normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (1, C, 1, 1)
    - dbeta: Gradient with respect to shift parameter, of shape (1, C, 1, 1)
    """
    dx, dgamma, dbeta = None, None, None
    ###########################################################################
    # TODO: Implement the backward pass for spatial group normalization.      #
    # This will be extremely similar to the layer norm implementation.        #
    ###########################################################################
    G, x, x_hat, mean, var, gamma, beta, eps = cache
    N, C, H, W = dout.shape

    # Step 1: reshape into groups
    x_group = x.reshape(N, G, C // G, H, W)
    x_hat_group = x_hat.reshape(N, G, C // G, H, W)
    dout_group = dout.reshape(N, G, C // G, H, W)

    # Step 2: compute dbeta and dgamma
    dbeta = np.sum(dout, axis=(0, 2, 3), keepdims=True)  # shape (1, C, 1, 1)
    dgamma = np.sum(dout * x_hat, axis=(0, 2, 3), keepdims=True)

    # Step 3: gradient through scale and shift
    dx_hat = dout * gamma  # shape (N, C, H, W)
    dx_hat_group = dx_hat.reshape(N, G, C // G, H, W)

    # m: number of elements per group
    m = C // G * H * W

    # Step 4: group norm backward (same derivation as layer norm)
    dvar = np.sum(dx_hat_group * (x_group - mean) * -0.5 * (var + eps) ** (-3/2), axis=(2, 3, 4), keepdims=True)
    dmean = np.sum(-dx_hat_group / np.sqrt(var + eps), axis=(2, 3, 4), keepdims=True) + \
            dvar * np.mean(-2.0 * (x_group - mean), axis=(2, 3, 4), keepdims=True)
    dx_group = dx_hat_group / np.sqrt(var + eps) + \
               dvar * 2 * (x_group - mean) / m + \
               dmean / m

    # Step 5: reshape back to (N, C, H, W)
    dx = dx_group.reshape(N, C, H, W)
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dgamma, dbeta
np.random.seed(231)
N, C, H, W = 2, 6, 4, 5
G = 2
x = 5 * np.random.randn(N, C, H, W) + 12
gamma = np.random.randn(1,C,1,1)
beta = np.random.randn(1,C,1,1)
dout = np.random.randn(N, C, H, W)
gn_param = {}
fx = lambda x: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]
fg = lambda a: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]
fb = lambda b: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0]
dx_num = eval_numerical_gradient_array(fx, x, dout)
da_num = eval_numerical_gradient_array(fg, gamma, dout)
db_num = eval_numerical_gradient_array(fb, beta, dout)
_, cache = spatial_groupnorm_forward(x, gamma, beta, G, gn_param)
dx, dgamma, dbeta = spatial_groupnorm_backward(dout, cache)
# You should expect errors of magnitudes between 1e-12 and 1e-07.
print('dx error: ', rel_error(dx_num, dx))
print('dgamma error: ', rel_error(da_num, dgamma))
print('dbeta error: ', rel_error(db_num, dbeta))
dx error: 7.413109648400194e-08
dgamma error: 9.468195772749234e-12
dbeta error: 3.354494437653335e-12