CSC421/2516 Winter 2019

Homework 1
Deadline: Thursday, Jan. 24, at 11:59pm.
Submission: You must submit your solutions as a PDF file through MarkUs (https://markus.teach.cs.toronto.edu/csc421-2019-01). You can produce the file however you like (e.g. LaTeX, Microsoft Word, scanner), as long as it is readable.
Late Submission: MarkUs will remain open until 3 days after the deadline, after which no late
submissions will be accepted.
Weekly homeworks are individual work. See the Course Information handout (http://www.cs.toronto.edu/~rgrosse/courses/csc421_2019/syllabus.pdf) for detailed policies.
1. Hard-Coding a Network. [2pts] In this problem, you need to find a set of weights and
biases for a multilayer perceptron which determines if a list of length 4 is in sorted order.
More specifically, you receive four inputs x1, ..., x4, where xi ∈ R, and the network must
output 1 if x1 < x2 < x3 < x4, and 0 otherwise. You will use the following architecture:

[Figure: a multilayer perceptron with four inputs x1, ..., x4, one hidden layer of three units, and a single output unit.]

All of the hidden units and the output unit use a hard threshold activation function:

    φ(z) = 1 if z ≥ 0,
           0 if z < 0.
Please give a set of weights and biases for the network which correctly implements this function
(including cases where some of the inputs are equal). Your answer should include:
• A 3 × 4 weight matrix W(1) for the hidden layer
• A 3-dimensional vector of biases b(1) for the hidden layer
• A 3-dimensional weight vector w(2) for the output layer
• A scalar bias b(2) for the output layer
You do not need to show your work.
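
Although the assignment only asks for the weights and biases themselves, it can be reassuring to verify a candidate answer numerically. The sketch below is an optional, illustrative check, not part of the required answer: it assumes NumPy, and the parameters W1, b1, w2, b2 are placeholders you would fill in with your own answer. It compares the network against a brute-force sortedness test, drawing small integers so that ties among the inputs are exercised as well.

import numpy as np

def hard_threshold(z):
    # phi(z) = 1 if z >= 0, else 0, applied elementwise
    return (z >= 0).astype(float)

def network(x, W1, b1, w2, b2):
    # Forward pass of the 4-input, 3-hidden-unit, 1-output architecture.
    h = hard_threshold(W1 @ x + b1)
    return float(hard_threshold(w2 @ h + b2))

def check(W1, b1, w2, b2, trials=100_000, seed=0):
    # Compare the network against the target function on random inputs,
    # using small integers so that equal inputs (ties) also get tested.
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.integers(-3, 4, size=4).astype(float)
        target = float(x[0] < x[1] < x[2] < x[3])
        if network(x, W1, b1, w2, b2) != target:
            return False, x
    return True, None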
2. Backprop. Consider a neural network with N input units, N output units, and K hidden
units. The activations are computed as follows:
z = W(1) x + b(1)
h = σ(z)
y = x + W(2) h + b(2) ,
where σ denotes the logistic function, applied elementwise. The cost will involve both h and y:

    J = R + S
    R = rᵀ h
    S = (1/2) ‖y − s‖²

for given vectors r and s.
• [1pt] Draw the computation graph relating x, z, h, y, R, S, and J.
• [3pts] Derive the backprop equations for computing x̄ = ∂J/∂x. You may use σ′ to
denote the derivative of the logistic function (so you don't need to write it out explicitly).
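
One way to sanity-check a hand-derived expression for x̄ is a finite-difference comparison. The sketch below is an optional aside, not part of the required answer: it assumes NumPy, picks arbitrary sizes N and K with random parameters, implements the forward pass exactly as defined above, and estimates ∂J/∂x by central differences. A correct backprop derivation, evaluated at the same values, should agree with this estimate to several decimal places.

import numpy as np

def J(x, W1, b1, W2, b2, r, s):
    # Forward pass as defined in the problem statement.
    z = W1 @ x + b1
    h = 1.0 / (1.0 + np.exp(-z))      # logistic function, elementwise
    y = x + W2 @ h + b2
    R = r @ h
    S = 0.5 * np.sum((y - s) ** 2)
    return R + S

# Arbitrary sizes and random values, just for the check.
N, K = 5, 3
rng = np.random.default_rng(0)
x  = rng.normal(size=N)
W1 = rng.normal(size=(K, N)); b1 = rng.normal(size=K)
W2 = rng.normal(size=(N, K)); b2 = rng.normal(size=N)
r  = rng.normal(size=K);      s  = rng.normal(size=N)

# Central finite-difference estimate of dJ/dx, to compare against the
# backprop expression derived by hand.
eps = 1e-6
x_bar_fd = np.zeros(N)
for i in range(N):
    e = np.zeros(N); e[i] = eps
    x_bar_fd[i] = (J(x + e, W1, b1, W2, b2, r, s)
                   - J(x - e, W1, b1, W2, b2, r, s)) / (2 * eps)
print(x_bar_fd)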
3. Sparsifying Activation Function. [4pts] One of the interesting features of the ReLU
activation function is that it sparsifies the activations and the derivatives, i.e. sets a large
fraction of the values to zero for any given input vector. Consider the following network:

[Figure: a small network in which w1, w2, and w3 are weights on individual connections, h1 is a ReLU hidden unit, and y is the output unit.]

Note that each wi refers to the weight on a single connection, not the whole layer. Suppose
we are trying to minimize a loss function L which depends only on the activation of the
output unit y. (For instance, L could be the squared error loss (1/2)(y − t)².) Suppose the unit
h1 receives an input of -1 on a particular training case, so the ReLU evaluates to 0. Based
only on this information, which of the weight derivatives
    ∂L/∂w1, ∂L/∂w2, ∂L/∂w3
are guaranteed to be 0 for this training case? Write YES or NO for each. Justify your
answers.
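
As an optional illustration of the sparsification claim above (not part of the graded answer), the short NumPy sketch below applies a ReLU and its elementwise derivative to a random vector; the entries with negative pre-activations are zero in both the activations and the derivatives.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_deriv(z):
    # Derivative of ReLU: 1 where z > 0, 0 where z < 0
    # (the value at exactly 0 is a matter of convention).
    return (z > 0).astype(float)

rng = np.random.default_rng(1)
z = rng.normal(size=10)              # random pre-activations
print("pre-activations:", np.round(z, 2))
print("activations:    ", np.round(relu(z), 2))
print("derivatives:    ", relu_deriv(z))
# Roughly half of the entries are zero in both rows above, which is the
# sparsification of activations and derivatives referred to in the problem.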
