Neural Network User Guide

Neural Network Toolbox™

User's Guide

Mark Hudson Beale

Martin T. Hagan

Howard B. Demuth

R2018a

How to Contact MathWorks

See Also

More About

• “Four Levels of Neural Network Design” on page 1-4

• “Neuron Model” on page 1-5

• “Neural Network Architectures” on page 1-11

• “Understanding Neural Network Toolbox Data Structures” on page 1-23

Four Levels of Neural Network Design

There are four dierent levels at which the Neural Network Toolbox software can be

used. The rst level is represented by the GUIs that are described in “Getting Started

with Neural Network Toolbox”. These provide a quick way to access the power of the

toolbox for many problems of function fitting, pattern recognition, clustering and time

series analysis.

The second level of toolbox use is through basic command-line operations. The command-

line functions use simple argument lists with intelligent default settings for function

parameters. (You can override all of the default settings, for increased functionality.) This

topic, and the ones that follow, concentrate on command-line operations.

The GUIs described in Getting Started can automatically generate MATLAB code files

with the command-line implementation of the GUI operations. This provides a nice

introduction to the use of the command-line functionality.

A third level of toolbox use is customization of the toolbox. This advanced capability

allows you to create your own custom neural networks, while still having access to the full

functionality of the toolbox.

The fourth level of toolbox usage is the ability to modify any of the code files contained in

the toolbox. Every computational component is written in MATLAB code and is fully

accessible.

The rst level of toolbox use (through the GUIs) is described in Getting Started which also

introduces command-line operations. The following topics will discuss the command-line

operations in more detail. The customization of the toolbox is described in “Define

Shallow Neural Network Architectures”.

See Also

More About

•“Workow for Neural Network Design” on page 1-2

1 Neural Network Objects, Data, and Training Styles

Neuron Model

In this section...

“Simple Neuron” on page 1-5

“Transfer Functions” on page 1-6

“Neuron with Vector Input” on page 1-7

Simple Neuron

The fundamental building block for neural networks is the single-input neuron, such as

this example.

There are three distinct functional operations that take place in this example neuron.

First, the scalar input p is multiplied by the scalar weight w to form the product wp, again

a scalar. Second, the weighted input wp is added to the scalar bias b to form the net input

n. (In this case, you can view the bias as shifting the function f to the left by an amount b.

The bias is much like a weight, except that it has a constant input of 1.) Finally, the net

input is passed through the transfer function f, which produces the scalar output a. The

names given to these three processes are: the weight function, the net input function and

the transfer function.
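
The three operations described above can be traced step by step in plain MATLAB. The following is a minimal sketch, not toolbox code: the values of p, w, and b are arbitrary, and the log-sigmoid transfer function is written by hand as an anonymous function so that the example does not depend on any toolbox function.

```matlab
% Single-input neuron computed by hand (illustrative values, assumed).
p = 2;                        % scalar input
w = 3;                        % scalar weight
b = -1.5;                     % scalar bias
f = @(n) 1 ./ (1 + exp(-n));  % hand-written log-sigmoid transfer function

wp = w * p;                   % weight function: product of weight and input
n  = wp + b;                  % net input function: weighted input plus bias
a  = f(n);                    % transfer function: produces the scalar output

fprintf('wp = %g, n = %g, a = %.4f\n', wp, n, a);
```

Changing w or b and rerunning the script shows how adjusting the parameters changes the neuron output for the same input.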

For many types of neural networks, the weight function is a product of a weight times the

input, but other weight functions (e.g., the distance between the weight and the input, |w

− p|) are sometimes used. (For a list of weight functions, type help nnweight.) The

most common net input function is the summation of the weighted inputs with the bias,

but other operations, such as multiplication, can be used. (For a list of net input functions,

type help nnnetinput.) “Introduction to Radial Basis Neural Networks” on page 7-2

discusses how distance can be used as the weight function and multiplication can be used

as the net input function. There are also many types of transfer functions. Examples of

various transfer functions are in “Transfer Functions” on page 1-6. (For a list of

transfer functions, type help nntransfer.)
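
To make the alternatives concrete, the sketch below contrasts the product and distance weight functions, and the summation and multiplication net input functions, for a single scalar input. It is written in plain MATLAB with arbitrary values; it only illustrates the arithmetic and is not how the toolbox implements these functions.

```matlab
% Alternative weight and net input functions (illustrative values, assumed).
w = 0.5;                  % scalar weight
p = 2;                    % scalar input
b = 3;                    % scalar bias

z_prod = w * p;           % weight function: product of weight and input
z_dist = abs(w - p);      % weight function: distance |w - p|

n_sum  = z_prod + b;      % net input function: summation with the bias
n_mult = z_dist * b;      % net input function: multiplication by the bias

fprintf('product: %g, distance: %g\n', z_prod, z_dist);
fprintf('summation: %g, multiplication: %g\n', n_sum, n_mult);
```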

Note that w and b are both adjustable scalar parameters of the neuron. The central idea

of neural networks is that such parameters can be adjusted so that the network exhibits

some desired or interesting behavior. Thus, you can train the network to do a particular

job by adjusting the weight or bias parameters.

All the neurons in the Neural Network Toolbox software have provision for a bias, and a

bias is used in many of the examples and is assumed in most of this toolbox. However, you

can omit a bias in a neuron if you want.

Transfer Functions

Many transfer functions are included in the Neural Network Toolbox software.

Two of the most commonly used functions are shown below.

The following gure illustrates the linear transfer function.

Neurons of this type are used in the final layer of multilayer networks that are used as

function approximators. This is shown in “Multilayer Neural Networks and

Backpropagation Training” on page 4-2.

The sigmoid transfer function shown below takes the input, which can have any value

between plus and minus infinity, and squashes the output into the range 0 to 1.

This transfer function is commonly used in the hidden layers of multilayer networks, in

part because it is differentiable.

The symbol in the square to the right of each transfer function graph shown above

represents the associated transfer function. These icons replace the general f in the

network diagram blocks to show the particular transfer function being used.

For a complete list of transfer functions, type help nntransfer. You can also specify

your own transfer functions.
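
As a rough illustration of the two transfer functions discussed in this section, the following sketch evaluates a linear function and a sigmoid function over a few net input values. Both are hand-written anonymous functions standing in for the toolbox versions, and the sample values of n are arbitrary.

```matlab
% Linear and sigmoid transfer functions evaluated on sample net inputs.
n = -5:2.5:5;                        % sample net inputs: -5 -2.5 0 2.5 5

linear  = @(n) n;                    % output equals the net input
sigmoid = @(n) 1 ./ (1 + exp(-n));   % squashes any input into (0, 1)

a_lin = linear(n);
a_sig = sigmoid(n);

disp([n; a_lin; a_sig]);             % rows: n, linear output, sigmoid output
```

The sigmoid output stays strictly between 0 and 1 and equals 0.5 at n = 0, while the linear output is unbounded.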

You can experiment with a simple neuron and various transfer functions by running the

example program nnd2n1.

Neuron with Vector Input

The simple neuron can be extended to handle inputs that are vectors. A neuron with a

single R-element input vector is shown below. Here the individual input elements

$$p_1, p_2, \ldots, p_R$$

are multiplied by weights

$$w_{1,1}, w_{1,2}, \ldots, w_{1,R}$$

and the weighted values are fed to the summing junction. Their sum is simply Wp, the dot

product of the (single row) matrix W and the vector p. (There are other weight functions,

in addition to the dot product, such as the distance between the row of the weight matrix

and the input vector, as in “Introduction to Radial Basis Neural Networks” on page 7-2.)

The neuron has a bias b, which is summed with the weighted inputs to form the net input

n. (In addition to the summation, other net input functions can be used, such as the

multiplication that is used in “Introduction to Radial Basis Neural Networks” on page 7-2.)

The net input n is the argument of the transfer function f.

$$n = w_{1,1} p_1 + w_{1,2} p_2 + \cdots + w_{1,R} p_R + b$$

This expression can, of course, be written in MATLAB code as

n = W*p + b

However, you will seldom be writing code at this level, for such code is already built into

functions to dene and simulate entire networks.

Abbreviated Notation

The gure of a single neuron shown above contains a lot of detail. When you consider

networks with many neurons, and perhaps layers of many neurons, there is so much

detail that the main thoughts tend to be lost. Thus, the authors have devised an

abbreviated notation for an individual neuron. This notation, which is used later in

circuits of multiple neurons, is shown here.

Here the input vector p is represented by the solid dark vertical bar at the left. The

dimensions of p are shown below the symbol p in the figure as R × 1. (Note that a capital

letter, such as R in the previous sentence, is used when referring to the size of a vector.)

Thus, p is a vector of R input elements. These inputs postmultiply the single-row, R-

column matrix W. As before, a constant 1 enters the neuron as an input and is multiplied

by a scalar bias b. The net input to the transfer function f is n, the sum of the bias b and

the product Wp. This sum is passed to the transfer function f to get the neuron's output a,

which in this case is a scalar. Note that if there were more than one neuron, the network

output would be a vector.

A layer of a network is defined in the previous figure. A layer includes the weights, the

multiplication and summing operations (here realized as a vector product Wp), the bias b,

and the transfer function f. The array of inputs, vector p, is not included in or called a

layer.

As with the “Simple Neuron” on page 1-5, there are three operations that take place in

the layer: the weight function (matrix multiplication, or dot product, in this case), the net

input function (summation, in this case), and the transfer function.

Each time this abbreviated network notation is used, the sizes of the matrices are shown

just below their matrix variable names. This notation will allow you to understand the

architectures and follow the matrix mathematics associated with them.

As discussed in “Transfer Functions” on page 1-6, when a specific transfer function is to

be used in a figure, the symbol for that transfer function replaces the f shown above. Here

are some examples.

You can experiment with a two-element neuron by running the example program nnd2n2.

See Also

More About

• “Neural Network Architectures” on page 1-11

•“Workow for Neural Network Design” on page 1-2

Neural Network Architectures

In this section...

“One Layer of Neurons” on page 1-11

“Multiple Layers of Neurons” on page 1-13

“Input and Output Processing Functions” on page 1-15

Two or more of the neurons shown earlier can be combined in a layer, and a particular

network could contain one or more such layers. First consider a single layer of neurons.

One Layer of Neurons

A one-layer network with R input elements and S neurons follows.

In this network, each element of the input vector p is connected to each neuron input

through the weight matrix W. The ith neuron has a summer that gathers its weighted

inputs and bias to form its own scalar output n(i). The various n(i) taken together form an

S-element net input vector n. Finally, the neuron layer outputs form a column vector a.

The expression for a is shown at the bottom of the figure.

Note that it is common for the number of inputs to a layer to be different from the number

of neurons (i.e., R is not necessarily equal to S). A layer is not constrained to have the

number of its inputs equal to the number of its neurons.

You can create a single (composite) layer of neurons having different transfer functions

simply by putting two of the networks shown earlier in parallel. Both networks would

have the same inputs, and each network would create some of the outputs.

The input vector elements enter the network through the weight matrix W.

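$$W = \begin{bmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,R} \\ w_{2,1} & w_{2,2} & \cdots & w_{2,R} \\ \vdots & \vdots & & \vdots \\ w_{S,1} & w_{S,2} & \cdots & w_{S,R} \end{bmatrix}$$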

Note that the row indices on the elements of matrix W indicate the destination neuron of

the weight, and the column indices indicate which source is the input for that weight.

Thus, the indices in $w_{1,2}$ say that the strength of the signal from the second input element

to the first neuron is $w_{1,2}$.

The S neuron R-input one-layer network also can be drawn in abbreviated notation.

Here p is an R-length input vector, W is an S × R matrix, a and b are S-length vectors. As

dened previously, the neuron layer includes the weight matrix, the multiplication

operations, the bias vector b, the summer, and the transfer function blocks.
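
The dimensions above can be checked with a small numeric sketch. Here S = 2 and R = 3; the weights, bias, and input are arbitrary, and tanh stands in for the layer transfer function. Row i of W holds the weights into neuron i, so W(1,2) is the weight from the second input element to the first neuron.

```matlab
% One layer of S = 2 neurons with R = 3 inputs (illustrative values, assumed).
W = [1.0 0.0 -1.0;       % weights into neuron 1
     0.5 0.5  0.5];      % weights into neuron 2  (W is S-by-R)
b = [0.5; -1];           % bias vector, S-by-1
p = [1; 2; 3];           % input vector, R-by-1

n = W*p + b;             % S-element net input vector
a = tanh(n);             % S-element layer output vector

w12 = W(1,2);            % weight from input element 2 to neuron 1

disp(size(W)); disp(n'); disp(a');
```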

Inputs and Layers

To describe networks having multiple layers, the notation must be extended. Specifically,

it needs to make a distinction between weight matrices that are connected to inputs and

weight matrices that are connected between layers. It also needs to identify the source

and destination for the weight matrices.

We will call weight matrices connected to inputs input weights; we will call weight

matrices connected to layer outputs layer weights. Further, superscripts are used to

identify the source (second index) and the destination (first index) for the various weights

and other elements of the network. To illustrate, the one-layer multiple input network

shown earlier is redrawn in abbreviated form here.

As you can see, the weight matrix connected to the input vector p is labeled as an input

weight matrix ($IW^{1,1}$) having a source 1 (second index) and a destination 1 (first index).

Elements of layer 1, such as its bias, net input, and output have a superscript 1 to say that

they are associated with the first layer.

“Multiple Layers of Neurons” on page 1-13 uses layer weight (LW) matrices as well as

input weight (IW) matrices.

Multiple Layers of Neurons

A network can have several layers. Each layer has a weight matrix W, a bias vector b, and

an output vector a. To distinguish between the weight matrices, output vectors, etc., for

each of these layers in the figures, the number of the layer is appended as a superscript

to the variable of interest. You can see the use of this layer notation in the three-layer

network shown next, and in the equations at the bottom of the figure.

The network shown above has $R^1$ inputs, $S^1$ neurons in the first layer, $S^2$ neurons in the

second layer, etc. It is common for different layers to have different numbers of neurons.

A constant input 1 is fed to the bias for each neuron.

Note that the outputs of each intermediate layer are the inputs to the following layer.

Thus layer 2 can be analyzed as a one-layer network with $S^1$ inputs, $S^2$ neurons, and an

$S^2 \times S^1$ weight matrix $W^2$. The input to layer 2 is $a^1$; the output is $a^2$. Now that all the

vectors and matrices of layer 2 have been identified, it can be treated as a single-layer

network on its own. This approach can be taken with any layer of the network.
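
The layer-by-layer view can be sketched as a chain of single-layer computations. The example below is a plain MATLAB illustration with assumed layer sizes, constant weights, and zero biases chosen only to keep the arithmetic easy to follow; tanh is used for the first two layers and a linear function for the last.

```matlab
% Three-layer forward pass written as three one-layer computations.
% All sizes and values are illustrative assumptions.
R = 4; S1 = 3; S2 = 2; S3 = 1;

IW11 = 0.1 * ones(S1, R);    % input weight: input 1 -> layer 1 (S1-by-R)
LW21 = 0.2 * ones(S2, S1);   % layer weight: layer 1 -> layer 2 (S2-by-S1)
LW32 = 0.3 * ones(S3, S2);   % layer weight: layer 2 -> layer 3 (S3-by-S2)
b1 = zeros(S1, 1); b2 = zeros(S2, 1); b3 = zeros(S3, 1);

p  = (1:R)';                 % input vector, R-by-1

a1 = tanh(IW11*p  + b1);     % layer 1 output, input to layer 2
a2 = tanh(LW21*a1 + b2);     % layer 2 output, input to layer 3
a3 = LW32*a2 + b3;           % layer 3 (linear) output
y  = a3;                     % network output

disp(size(a1)); disp(size(a2)); disp(y);
```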

The layers of a multilayer network play different roles. A layer that produces the network

output is called an output layer. All other layers are called hidden layers. The three-layer

network shown earlier has one output layer (layer 3) and two hidden layers (layer 1 and

layer 2). Some authors refer to the inputs as a fourth layer. This toolbox does not use that

designation.

The architecture of a multilayer network with a single input vector can be specified with

the notation $R - S^1 - S^2 - \cdots - S^M$, where the number of elements of the input vector and

the number of neurons in each layer are specified.

The same three-layer network can also be drawn using abbreviated notation.

Multiple-layer networks are quite powerful. For instance, a network of two layers, where

the rst layer is sigmoid and the second layer is linear, can be trained to approximate any

function (with a nite number of discontinuities) arbitrarily well. This kind of two-layer

network is used extensively in “Multilayer Neural Networks and Backpropagation

Training” on page 4-2.

Here it is assumed that the output of the third layer, $a^3$, is the network output of interest,

and this output is labeled as y. This notation is used to specify the output of multilayer

networks.

Input and Output Processing Functions

Network inputs might have associated processing functions. Processing functions

transform user input data to a form that is easier or more efficient for a network.

For instance, mapminmax transforms input data so that all values fall into the interval

[−1, 1]. This can speed up learning for many networks. removeconstantrows removes

the rows of the input vector that correspond to input elements that always have the same

value, because these input elements are not providing any useful information to the

network. The third common processing function is fixunknowns, which recodes

unknown data (represented in the user's data with NaN values) into a numerical form for

the network. fixunknowns preserves information about which values are known and

which are unknown.
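
To show what the first two processing steps do to the data, the following sketch removes constant rows and then maps each remaining row to [−1, 1]. It is hand-written in plain MATLAB for illustration and does not call removeconstantrows or mapminmax; the data matrix is made up, and the row-wise arithmetic relies on implicit expansion (R2016b or later).

```matlab
% Hand-written illustration of constant-row removal and min-max mapping.
x = [1 2  4;     % input element 1 over three samples
     7 7  7;     % input element 2 is constant, so it carries no information
     0 5 10];    % input element 3

keep = max(x, [], 2) ~= min(x, [], 2);   % rows whose values vary
x1   = x(keep, :);                       % constant rows removed

xmin = min(x1, [], 2);                   % per-row minimum of the sample data
xmax = max(x1, [], 2);                   % per-row maximum of the sample data
y    = 2*(x1 - xmin)./(xmax - xmin) - 1; % each row mapped to [-1, 1]

x_back = (y + 1).*(xmax - xmin)/2 + xmin; % reverse mapping recovers x1

disp(y);
```

The reverse mapping in the last step is the same idea that is applied to network outputs so that they come back in the units of the original targets.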

Similarly, network outputs can also have associated processing functions. Output

processing functions are used to transform user-provided target vectors for network use.

Then, network outputs are reverse-processed using the same functions to produce output

data with the same characteristics as the original user-provided targets.

Both mapminmax and removeconstantrows are often associated with network outputs.

However, fixunknowns is not. Unknown values in targets (represented by NaN values) do

not need to be altered for network use.

Processing functions are described in more detail in “Choose Neural Network Input-

Output Processing Functions” on page 4-9.

See Also

More About

• “Neuron Model” on page 1-5

•“Workow for Neural Network Design” on page 1-2

Create Neural Network Object

This topic is part of the design workflow described in “Workflow for Neural Network

Design” on page 1-2.

The easiest way to create a neural network is to use one of the network creation

functions. To investigate how this is done, you can create a simple, two-layer feedforward

network, using the command feedforwardnet:

net = feedforwardnet

net =

Neural Network

userdata: (your custom info)

dimensions:

numInputs: 1

numLayers: 2

numOutputs: 1

numInputDelays: 0

numLayerDelays: 0

numFeedbackDelays: 0

numWeightElements: 10

sampleTime: 1

connections:

biasConnect: [1; 1]

inputConnect: [1; 0]

layerConnect: [0 0; 1 0]

outputConnect: [0 1]

subobjects:

inputs: {1x1 cell array of 1 input}

layers: {2x1 cell array of 2 layers}

outputs: {1x2 cell array of 1 output}

biases: {2x1 cell array of 2 biases}

inputWeights: {2x1 cell array of 1 weight}

layerWeights: {2x2 cell array of 1 weight}

functions:

adaptFcn: 'adaptwb'

adaptParam: (none)

derivFcn: 'defaultderiv'

divideFcn: 'dividerand'

divideParam: .trainRatio, .valRatio, .testRatio

divideMode: 'sample'

initFcn: 'initlay'

performFcn: 'mse'

performParam: .regularization, .normalization

plotFcns: {'plotperform', plottrainstate, ploterrhist,

plotregression}

plotParams: {1x4 cell array of 4 params}

trainFcn: 'trainlm'

trainParam: .showWindow, .showCommandLine, .show, .epochs,

.time, .goal, .min_grad, .max_fail, .mu, .mu_dec,

.mu_inc, .mu_max

weight and bias values:

IW: {2x1 cell} containing 1 input weight matrix

LW: {2x2 cell} containing 1 layer weight matrix

b: {2x1 cell} containing 2 bias vectors

methods:

adapt: Learn while in continuous use

configure: Configure inputs & outputs

gensim: Generate Simulink model

init: Initialize weights & biases

perform: Calculate performance

sim: Evaluate network outputs given inputs

train: Train network with examples

view: View diagram

unconfigure: Unconfigure inputs & outputs

evaluate: outputs = net(inputs)

This display is an overview of the network object, which is used to store all of the

information that denes a neural network. There is a lot of detail here, but there are a

few key sections that can help you to see how the network object is organized.

The dimensions section stores the overall structure of the network. Here you can see that

there is one input to the network (although the one input can be a vector containing many

elements), one network output, and two layers.

The connections section stores the connections between components of the network. For

example, there is a bias connected to each layer, the input is connected to layer 1, and the

output comes from layer 2. You can also see that layer 1 is connected to layer 2. (The

rows of net.layerConnect represent the destination layer, and the columns represent

the source layer. A one in this matrix indicates a connection, and a zero indicates no

connection. For this example, there is a single one in element 2,1 of the matrix.)

The key subobjects of the network object are inputs, layers, outputs, biases,

inputWeights, and layerWeights. View the layers subobject for the first layer with

the command

net.layers{1}

Neural Network Layer

dimensions: 10

distanceFcn: (none)

distanceParam: (none)

distances: []

initFcn: 'initnw'

netInputFcn: 'netsum'

netInputParam: (none)

positions: []

range: [10x2 double]

size: 10

topologyFcn: (none)

transferFcn: 'tansig'

transferParam: (none)

userdata: (your custom info)

The number of neurons in a layer is given by its size property. In this case, the layer has

10 neurons, which is the default size for the feedforwardnet command. The net input

function is netsum (summation) and the transfer function is the tansig. If you wanted to

change the transfer function to logsig, for example, you could execute the command:

net.layers{1}.transferFcn = 'logsig';
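
The properties discussed so far can also be read programmatically rather than from the display. The sketch below assumes the toolbox is installed and uses only the properties shown in the listings above; the variable names on the left-hand sides are made up for this example.

```matlab
% Inspect and modify a network object (requires Neural Network Toolbox).
net = feedforwardnet(10);                 % two-layer network, 10 hidden neurons

numLayers    = net.numLayers;             % number of layers
hiddenSize   = net.layers{1}.size;        % neurons in layer 1
layerConnect = net.layerConnect;          % rows: destination, columns: source

oldFcn = net.layers{1}.transferFcn;       % default transfer function of layer 1
net.layers{1}.transferFcn = 'logsig';     % override the default
newFcn = net.layers{1}.transferFcn;

fprintf('layers: %d, hidden size: %d, %s -> %s\n', ...
        numLayers, hiddenSize, oldFcn, newFcn);
```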

To view the layerWeights subobject for the weight between layer 1 and layer 2, use the

command:

net.layerWeights{2,1}

Neural Network Weight

delays: 0

initFcn: (none)

initConfig: .inputSize

learn: true

learnFcn: 'learngdm'

learnParam: .lr, .mc

size: [0 10]

weightFcn: 'dotprod'

weightParam: (none)

userdata: (your custom info)

The weight function is dotprod, which represents standard matrix multiplication (dot

product). Note that the size of this layer weight is 0-by-10. There are zero

rows because the network has not yet been configured for a particular data set. The

number of output neurons is equal to the number of rows in your target vector. During the

conguration process, you will provide the network with example inputs and targets, and

then the number of output neurons can be assigned.

This gives you some idea of how the network object is organized. For many applications,

you will not need to be concerned about making changes directly to the network object,

since that is taken care of by the network creation functions. It is usually only when you

want to override the system defaults that it is necessary to access the network object

directly. Other topics will show how this is done for particular networks and training

methods.

To investigate the network object in more detail, you might find that the object listings,

such as the one shown above, contain links to help on each subobject. Click the links, and

you can selectively investigate those parts of the object that are of interest to you.

Congure Neural Network Inputs and Outputs

This topic is part of the design workow described in “Workow for Neural Network

Design” on page 1-2.

After a neural network has been created, it must be configured. The configuration step

consists of examining input and target data, setting the network's input and output sizes

to match the data, and choosing settings for processing inputs and outputs that will

enable best network performance. The configuration step is normally done automatically,

when the training function is called. However, it can be done manually, by using the

conguration function. For example, to congure the network you created previously to

approximate a sine function, issue the following commands:

p = -2:.1:2;

t = sin(pi*p/2);

net1 = configure(net,p,t);

You have provided the network with an example set of inputs and targets (desired

network outputs). With this information, the configure function can set the network

input and output sizes to match the data.

After the conguration, if you look again at the weight between layer 1 and layer 2, you

can see that the dimension of the weight is 1 by 20. This is because the target for this

network is a scalar.

net1.layerWeights{2,1}

Neural Network Weight

delays: 0

initFcn: (none)

initConfig: .inputSize

learn: true

learnFcn: 'learngdm'

learnParam: .lr, .mc

size: [1 10]

weightFcn: 'dotprod'

weightParam: (none)

userdata: (your custom info)
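
The effect of configuration on the layer weight size can be compared directly in code. This sketch assumes the toolbox is installed and repeats the sine example above; sizeBefore and sizeAfter are names introduced here only for illustration.

```matlab
% Compare the layer weight size before and after configuration
% (requires Neural Network Toolbox).
net = feedforwardnet;                        % default: 10 hidden neurons
sizeBefore = net.layerWeights{2,1}.size;     % output size not yet known

p = -2:.1:2;                                 % example inputs
t = sin(pi*p/2);                             % example targets (scalar per sample)
net1 = configure(net, p, t);

sizeAfter = net1.layerWeights{2,1}.size;     % one row per output neuron

disp(sizeBefore); disp(sizeAfter);
```

Note that configure returns a new network object; the original net is left unconfigured.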

In addition to setting the appropriate dimensions for the weights, the configuration step

also defines the settings for the processing of inputs and outputs. The input processing

can be located in the inputs subobject:

Congure Neural Network Inputs and Outputs

1-21

net1.inputs{1}

Neural Network Input

feedbackOutput: []

processFcns: {'removeconstantrows', mapminmax}

processParams: {1x2 cell array of 2 params}

processSettings: {1x2 cell array of 2 settings}

processedRange: [1x2 double]

processedSize: 1

range: [1x2 double]

size: 1

userdata: (your custom info)

Before the input is applied to the network, it will be processed by two functions:

removeconstantrows and mapminmax. These are discussed fully in “Multilayer Neural

Networks and Backpropagation Training” on page 4-2 so we won't address the

particulars here. These processing functions may have some processing parameters,

which are contained in the subobject net1.inputs{1}.processParams. These have

default values that you can override. The processing functions can also have configuration

settings that are dependent on the sample data. These are contained in

net1.inputs{1}.processSettings and are set during the configuration process. For

example, the mapminmax processing function normalizes the data so that all inputs fall in

the range [−1, 1]. Its configuration settings include the minimum and maximum values in

the sample data, which it needs to perform the correct normalization. This will be

discussed in much more depth in “Multilayer Neural Networks and Backpropagation

Training” on page 4-2.

As a general rule, we use the term “parameter,” as in process parameters, training

parameters, etc., to denote constants that have default values that are assigned by the

software when the network is created (and which you can override). We use the term

“conguration setting,” as in process conguration setting, to denote constants that are

assigned by the software from an analysis of sample data. These settings do not have

default values, and should not generally be overridden.
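
The role of configuration settings can be seen by calling mapminmax directly, outside a network. In this sketch (which assumes the toolbox is installed, with made-up data), the settings structure ps is computed from the sample data x and then reused unchanged on new data and for the reverse mapping.

```matlab
% Settings derived from sample data are reused on other data
% (requires Neural Network Toolbox).
x = [0 5 10];                         % sample data used to derive the settings
[y, ps] = mapminmax(x);               % y is x mapped to [-1, 1]; ps holds settings

x2 = [2.5 7.5];                       % new data, mapped with the same settings
y2 = mapminmax('apply', x2, ps);

xr = mapminmax('reverse', y, ps);     % reverse mapping recovers the sample data

disp(y); disp(y2); disp(xr);
```

Overriding ps by hand would make the mapping inconsistent with the data it was derived from, which is why settings of this kind are normally left alone.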

For more information, see also “Understanding Neural Network Toolbox Data Structures”

on page 1-23.

Understanding Neural Network Toolbox Data Structures

In this section...

“Simulation with Concurrent Inputs in a Static Network” on page 1-23

“Simulation with Sequential Inputs in a Dynamic Network” on page 1-24

“Simulation with Concurrent Inputs in a Dynamic Network” on page 1-26

This topic discusses how the format of input data structures affects the simulation of

networks. It starts with static networks, and then continues with dynamic networks. The

following section describes how the format of the data structures affects network

training.

There are two basic types of input vectors: those that occur concurrently (at the same

time, or in no particular time sequence), and those that occur sequentially in time. For

concurrent vectors, the order is not important, and if there were a number of networks

running in parallel, you could present one input vector to each of the networks. For

sequential vectors, the order in which the vectors appear is important.

Simulation with Concurrent Inputs in a Static Network

The simplest situation for simulating a network occurs when the network to be simulated

is static (has no feedback or delays). In this case, you need not be concerned about

whether or not the input vectors occur in a particular time sequence, so you can treat the

inputs as concurrent. In addition, the problem is made even simpler by assuming that the

network has only one input vector. Use the following network as an example.

To set up this linear feedforward network, use the following commands:

net = linearlayer;

net.inputs{1}.size = 2;

net.layers{1}.dimensions = 1;

For simplicity, assign the weight matrix and bias to be W = [1 2] and b = [0].

The commands for these assignments are

net.IW{1,1} = [1 2];

net.b{1} = 0;

Suppose that the network simulation data set consists of Q = 4 concurrent vectors:

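$$p_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\quad p_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix},\quad p_3 = \begin{bmatrix} 2 \\ 3 \end{bmatrix},\quad p_4 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$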

Concurrent vectors are presented to the network as a single matrix:

P = [1 2 2 3; 2 1 3 1];

You can now simulate the network:

A = net(P)

A =

5 4 8 5

A single matrix of concurrent vectors is presented to the network, and the network

produces a single matrix of concurrent vectors as output. The result would be the same if

there were four networks operating in parallel and each network received one of the

input vectors and produced one of the outputs. The ordering of the input vectors is not

important, because they do not interact with each other.
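
The same result can be reproduced without the toolbox, which makes the independence of the columns easy to verify. The sketch below applies W = [1 2] and b = 0 from the example to the matrix P, then repeats the computation with the columns in a different order; the permutation is an arbitrary choice for illustration.

```matlab
% Concurrent inputs in a static linear network, computed by hand.
W = [1 2];                  % weight matrix from the example
b = 0;                      % bias from the example
P = [1 2 2 3; 2 1 3 1];     % Q = 4 concurrent input vectors (one per column)

A = W*P + b;                % one output per column: 5 4 8 5

perm   = [4 3 2 1];         % present the same vectors in reverse order
A_perm = W*P(:, perm) + b;  % outputs are reordered, not changed

disp(A); disp(A_perm);
```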

Simulation with Sequential Inputs in a Dynamic Network

When a network contains delays, the input to the network would normally be a sequence

of input vectors that occur in a certain time order. To illustrate this case, the next figure

shows a simple network that contains one delay.

The following commands create this network:

net = linearlayer([0 1]);

net.inputs{1}.size = 1;

net.layers{1}.dimensions = 1;

net.biasConnect = 0;

Assign the weight matrix to be W = [1 2].

The command is:

net.IW{1,1} = [1 2];

Suppose that the input sequence is:

$$p_1 = [1],\quad p_2 = [2],\quad p_3 = [3],\quad p_4 = [4]$$

Sequential inputs are presented to the network as elements of a cell array:

P = {1 2 3 4};

You can now simulate the network:

A = net(P)

A =

[1] [4] [7] [10]

You input a cell array containing a sequence of inputs, and the network produces a cell

array containing a sequence of outputs. The order of the inputs is important when they

are presented as a sequence. In this case, the current output is obtained by multiplying

the current input by 1 and the preceding input by 2 and summing the result. If you were

to change the order of the inputs, the numbers obtained in the output would change.
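
The delay arithmetic can be written out as an explicit loop in plain MATLAB. This is a sketch of the computation only, assuming a zero initial condition on the delay; it does not use the network object. The reversed sequence at the end is an added illustration that uses the base MATLAB filter function, which computes the same weighted sum of the current and preceding input.

```matlab
% Sequential inputs through a one-delay linear network, computed by hand.
W = [1 2];                  % W(1) weights the current input, W(2) the delayed input
P = {1 2 3 4};              % input sequence as a cell array
A = cell(size(P));          % output sequence

prev = 0;                   % delay initial condition assumed to be zero
for ts = 1:numel(P)
    A{ts} = W(1)*P{ts} + W(2)*prev;   % current input plus weighted preceding input
    prev  = P{ts};                    % current input becomes the delayed input
end
disp(A);                    % 1 4 7 10

% Reversing the order of the inputs changes the output values.
A_rev = filter(W, 1, [4 3 2 1]);      % same weighted sum, applied to 4 3 2 1
disp(A_rev);                % 4 11 8 5
```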

Simulation with Concurrent Inputs in a Dynamic Network

If you were to apply the same inputs as a set of concurrent inputs instead of a sequence of

inputs, you would obtain a completely different response. (However, it is not clear why

you would want to do this with a dynamic network.) It would be as if each input were

applied concurrently to a separate parallel network. For the previous example,

“Simulation with Sequential Inputs in a Dynamic Network” on page 1-24, if you use a

concurrent set of inputs you have

$$p_1 = [1],\quad p_2 = [2],\quad p_3 = [3],\quad p_4 = [4]$$

which can be created with the following code:

P = [1 2 3 4];

When you simulate with concurrent inputs, you obtain

A = net(P)

A =

1 2 3 4

The result is the same as if you had concurrently applied each one of the inputs to a

separate network and computed one output. Note that because you did not assign any

initial conditions to the network delays, they were assumed to be 0. For this case the

output is simply 1 times the input, because the weight that multiplies the current input is 1.

In certain special cases, you might want to simulate the network response to several

dierent sequences at the same time. In this case, you would want to present the network

with a concurrent set of sequences. For example, suppose you wanted to present the

following two sequences to the network:

$$p_1(1) = [1],\quad p_1(2) = [2],\quad p_1(3) = [3],\quad p_1(4) = [4]$$

$$p_2(1) = [4],\quad p_2(2) = [3],\quad p_2(3) = [2],\quad p_2(4) = [1]$$

The input P should be a cell array, where each element of the array contains the two

elements of the two sequences that occur at the same time:

P = {[1 4] [2 3] [3 2] [4 1]};

You can now simulate the network:

A = net(P);

The resulting network output would be

A = {[1 4] [4 11] [7 8] [10 5]}

As you can see, the first column of each matrix makes up the output sequence produced

by the first input sequence, which was the one used in an earlier example. The second

column of each matrix makes up the output sequence produced by the second input

sequence. There is no interaction between the two concurrent sequences. It is as if they

were each applied to separate networks running in parallel.
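The independence of the two sequences can be checked by hand. The following is a minimal sketch that does not call the toolbox. It assumes, as in the earlier example, a single linear neuron with weight 1 on the current input, weight 2 on the input delayed by one step, no bias, and a zero initial delay state. The variable names w, prev, and A are illustrative only.

```matlab
% Hand computation of the response to two concurrent sequences.
% Assumed network: a(t) = 1*p(t) + 2*p(t-1), no bias, zero initial delay state.
w = [1 2];                          % [weight on p(t), weight on p(t-1)]
P = {[1 4] [2 3] [3 2] [4 1]};      % one 1-by-2 matrix per time step
prev = zeros(size(P{1}));           % delay state, one entry per sequence
A = cell(size(P));
for t = 1:numel(P)
    A{t} = w(1)*P{t} + w(2)*prev;   % each column is processed independently
    prev = P{t};                    % shift the delay line
end
celldisp(A)                         % A{1} = [1 4], A{2} = [4 11], A{3} = [7 8], A{4} = [10 5]
```

Because the update is elementwise across columns, the first column of every A{t} depends only on the first sequence and the second column only on the second.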

The following diagram shows the general format for the network input P when there are

Q concurrent sequences of TS time steps. It covers all cases where there is a single input

vector. Each element of the cell array is a matrix of concurrent vectors that correspond to

the same point in time for each sequence. If there are multiple input vectors, there will be

multiple rows of matrices in the cell array.
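As a rough illustration of this format, the sketch below builds the cell array for Q = 2 concurrent sequences of TS = 4 time steps from an ordinary matrix. It assumes a single input vector with one element (R = 1), so each cell holds a 1-by-Q matrix. The matrix S and its row-per-sequence layout are assumptions made for this example, not a toolbox convention.

```matlab
% Build the cell-array input for Q concurrent sequences of TS time steps.
% Assumption: one input vector with R = 1 element, so each cell is 1-by-Q.
S = [1 2 3 4;                       % row 1: sequence 1
     4 3 2 1];                      % row 2: sequence 2
P = num2cell(S, 1);                 % 1-by-TS cell array, each cell is Q-by-1
P = cellfun(@(c) c.', P, 'UniformOutput', false);   % transpose each cell to 1-by-Q
fprintf('%d time steps, each cell is 1-by-%d\n', numel(P), size(P{1}, 2));
```

The result is the same cell array used above, {[1 4] [2 3] [3 2] [4 1]}.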

In this topic, you apply sequential and concurrent inputs to dynamic networks. In

“Simulation with Concurrent Inputs in a Static Network” on page 1-23, you applied

concurrent inputs to static networks. It is also possible to apply sequential inputs to static

networks. It does not change the simulated response of the network, but it can affect the

way in which the network is trained. This will become clear in “Neural Network Training

Concepts” on page 1-28.

See also “Configure Neural Network Inputs and Outputs” on page 1-21.

Neural Network Training Concepts

In this section...

“Incremental Training with adapt” on page 1-28

“Batch Training” on page 1-31

“Training Feedback” on page 1-34

This topic is part of the design workflow described in “Workflow for Neural Network

Design” on page 1-2.

This topic describes two different styles of training. In incremental training the weights

and biases of the network are updated each time an input is presented to the network. In

batch training the weights and biases are only updated after all the inputs are presented.

The batch training methods are generally more efficient in the MATLAB environment, and

they are emphasized in the Neural Network Toolbox software, but there are some

applications where incremental training can be useful, so that paradigm is implemented

as well.

Incremental Training with adapt

Incremental training can be applied to both static and dynamic networks, although it is

more commonly used with dynamic networks, such as adaptive filters. This section

illustrates how incremental training is performed on both static and dynamic networks.

Incremental Training of Static Networks

Consider again the static network used for the first example. You want to train it

incrementally, so that the weights and biases are updated after each input is presented. In

this case you use the function adapt, and the inputs and targets are presented as

sequences.

Suppose you want to train the network to create the linear function:

$t = 2p_1 + p_2$

Then for the previous inputs,

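$p_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\quad p_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix},\quad p_3 = \begin{bmatrix} 2 \\ 3 \end{bmatrix},\quad p_4 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$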

the targets would be

$t_1 = [4],\quad t_2 = [5],\quad t_3 = [7],\quad t_4 = [7]$

For incremental training, you present the inputs and targets as sequences:

P = {[1;2] [2;1] [2;3] [3;1]};

T = {4 5 7 7};

First, set up the network with zero initial weights and biases. Also, set the initial learning

rate to zero to show the effect of incremental training.

net = linearlayer(0,0);

net = configure(net,P,T);

net.IW{1,1} = [0 0];

net.b{1} = 0;

Recall from “Simulation with Concurrent Inputs in a Static Network” on page 1-23 that,

for a static network, the simulation of the network produces the same outputs whether

the inputs are presented as a matrix of concurrent vectors or as a cell array of sequential

vectors. However, this is not true when training the network. When you use the adapt

function, if the inputs are presented as a cell array of sequential vectors, then the weights

are updated as each input is presented (incremental mode). As shown in the next section,

if the inputs are presented as a matrix of concurrent vectors, then the weights are

updated only after all inputs are presented (batch mode).

You are now ready to train the network incrementally.

[net,a,e,pf] = adapt(net,P,T);

The network outputs remain zero, because the learning rate is zero, and the weights are

not updated. The errors are equal to the targets:

a = [0] [0] [0] [0]

e = [4] [5] [7] [7]

If you now set the learning rate to 0.1 you can see how the network is adjusted as each

input is presented:

net.inputWeights{1,1}.learnParam.lr = 0.1;

net.biases{1,1}.learnParam.lr = 0.1;

[net,a,e,pf] = adapt(net,P,T);

a = [0] [2] [6] [5.8]

e = [4] [3] [1] [1.2]

The rst output is the same as it was with zero learning rate, because no update is made

until the rst input is presented. The second output is dierent, because the weights have

been updated. The weights continue to be modied as each error is computed. If the

network is capable and the learning rate is set correctly, the error is eventually driven to

zero.
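The outputs and errors above can be reproduced by hand. The following sketch does not call adapt. It assumes the Widrow-Hoff (LMS) update, in which the weight change is the learning rate times the error times the input and the bias change is the learning rate times the error, applied after each input is presented. The loop and variable names are illustrative only.

```matlab
% Hand computation of one incremental pass of the Widrow-Hoff rule.
P  = {[1;2] [2;1] [2;3] [3;1]};     % input sequence
T  = {4 5 7 7};                     % target sequence
lr = 0.1;                           % learning rate
w  = [0 0];                         % initial weights
b  = 0;                             % initial bias
a  = zeros(1, numel(P));
e  = zeros(1, numel(P));
for k = 1:numel(P)
    a(k) = w*P{k} + b;              % output computed before the update
    e(k) = T{k} - a(k);             % error for this input
    w = w + lr*e(k)*P{k}.';         % weight update after this input
    b = b + lr*e(k);                % bias update after this input
end
disp(a)                             % approximately 0  2  6  5.8
disp(e)                             % approximately 4  3  1  1.2
```

After the four updates the weights are approximately [1.32 1.44] and the bias is approximately 0.92, which shows the parameters moving toward the target function t = 2p1 + p2 within a single pass.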

Incremental Training with Dynamic Networks

You can also train dynamic networks incrementally. In fact, this would be the most

common situation.

To train the network incrementally, present the inputs and targets as elements of cell

arrays. Here are the initial input Pi and the inputs P and targets T as elements of cell

arrays.

Pi = {1};

P = {2 3 4};

T = {3 5 7};

Take the linear network with one delay at the input, as used in a previous example.

Initialize the weights to zero and set the learning rate to 0.1.

net = linearlayer([0 1],0.1);

net = configure(net,P,T);

net.IW{1,1} = [0 0];

net.biasConnect = 0;

You want to train the network to create the current output by summing the current and

the previous inputs. This is the same input sequence you used in the previous example

with the exception that you assign the first term in the sequence as the initial condition

for the delay. You can now sequentially train the network using adapt.

[net,a,e,pf] = adapt(net,P,T,Pi);

a = [0] [2.4] [7.98]

e = [3] [2.6] [-0.98]

The rst output is zero, because the weights have not yet been updated. The weights

change at each subsequent time step.
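The same hand check works for the dynamic case. This sketch assumes a single linear neuron with no bias whose input vector at each step is the current input stacked on the previous input, with the initial condition Pi supplying the first delayed value. It applies the Widrow-Hoff update after every time step. It is an illustration of the arithmetic, not the toolbox implementation.

```matlab
% Hand computation of incremental training for a linear neuron with one delay.
Pi = 1;                             % initial condition of the delay
P  = [2 3 4];                       % input sequence
T  = [3 5 7];                       % target sequence (current + previous input)
lr = 0.1;                           % learning rate
w  = [0 0];                         % [weight on p(t), weight on p(t-1)]
a  = zeros(size(P));
e  = zeros(size(P));
prev = Pi;
for k = 1:numel(P)
    x = [P(k); prev];               % current and delayed input
    a(k) = w*x;                     % output before the update
    e(k) = T(k) - a(k);             % error at this time step
    w = w + lr*e(k)*x.';            % Widrow-Hoff update
    prev = P(k);                    % shift the delay line
end
disp(a)                             % approximately 0  2.4  7.98
disp(e)                             % approximately 3  2.6  -0.98
```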

Batch Training

Batch training, in which weights and biases are only updated after all the inputs and

targets are presented, can be applied to both static and dynamic networks. Both types of

networks are discussed in this section.

Batch Training with Static Networks

Batch training can be done using either adapt or train, although train is generally the

best option, because it typically has access to more efficient training algorithms.

Incremental training is usually done with adapt; batch training is usually done with

train.

For batch training of a static network with adapt, the input vectors must be placed in one

matrix of concurrent vectors.

P = [1 2 2 3; 2 1 3 1];

T = [4 5 7 7];

Begin with the static network used in previous examples. The learning rate is set to 0.01.

net = linearlayer(0,0.01);

net = configure(net,P,T);

net.IW{1,1} = [0 0];

net.b{1} = 0;

When you call adapt, it invokes trains (the default adaption function for the linear

network) and learnwh (the default learning function for the weights and biases). trains

uses Widrow-Hoff learning.

[net,a,e,pf] = adapt(net,P,T);

a = 0 0 0 0

e = 4 5 7 7

Note that the outputs of the network are all zero, because the weights are not updated

until all the training set has been presented. If you display the weights, you find

net.IW{1,1}

ans = 0.4900 0.4100

net.b{1}

ans =

0.2300

This is different from the result after one pass of adapt with incremental updating.
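The batch result can also be verified by hand. In this sketch, which does not call the toolbox, all outputs are computed with the initial weights, and the individual Widrow-Hoff updates are summed into a single step. The variable names are illustrative only.

```matlab
% Hand computation of one batch Widrow-Hoff update for the static network.
P  = [1 2 2 3; 2 1 3 1];            % concurrent input vectors (columns)
T  = [4 5 7 7];                     % targets
lr = 0.01;                          % learning rate
w  = [0 0];                         % initial weights
b  = 0;                             % initial bias
a  = w*P + b;                       % all outputs use the initial weights
e  = T - a;                         % errors, here equal to the targets
w  = w + lr*e*P.';                  % summed weight update
b  = b + lr*sum(e);                 % summed bias update
disp(w)                             % approximately 0.49  0.41
disp(b)                             % approximately 0.23
```

Compare this with incremental updating, where each output is computed with weights that already reflect the preceding inputs.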

Now perform the same batch training using train. Because the Widrow-Hoff rule can be

used in incremental or batch mode, it can be invoked by adapt or train. (There are

several algorithms that can only be used in batch mode (e.g., Levenberg-Marquardt), so

these algorithms can only be invoked by train.)

For this case, the input vectors can be in a matrix of concurrent vectors or in a cell array

of sequential vectors. Because the network is static and because train always operates

in batch mode, train converts any cell array of sequential vectors to a matrix of

concurrent vectors. Concurrent mode operation is used whenever possible because it has

a more efficient implementation in MATLAB code:

P = [1 2 2 3; 2 1 3 1];

T = [4 5 7 7];

The network is set up in the same way.

net = linearlayer(0,0.01);

net = configure(net,P,T);

net.IW{1,1} = [0 0];

net.b{1} = 0;

Now you are ready to train the network. Train it for only one epoch, because you used

only one pass of adapt. The default training function for the linear network is trainb,

and the default learning function for the weights and biases is learnwh, so you should

get the same results obtained using adapt in the previous example, where the default

adaption function was trains.

net.trainParam.epochs = 1;

net = train(net,P,T);

If you display the weights after one epoch of training, you find

net.IW{1,1}

ans = 0.4900 0.4100

net.b{1}

ans =

0.2300

This is the same result as the batch mode training in adapt. With static networks, the

adapt function can implement incremental or batch training, depending on the format of

the input data. If the data is presented as a matrix of concurrent vectors, batch training

occurs. If the data is presented as a sequence, incremental training occurs. This is not

true for train, which always performs batch training, regardless of the format of the

input.

Batch Training with Dynamic Networks

Training static networks is relatively straightforward. If you use train the network is

trained in batch mode and the inputs are converted to concurrent vectors (columns of a

matrix), even if they are originally passed as a sequence (elements of a cell array). If you

use adapt, the format of the input determines the method of training. If the inputs are

passed as a sequence, then the network is trained in incremental mode. If the inputs are

passed as concurrent vectors, then batch mode training is used.

With dynamic networks, batch mode training is typically done with train only, especially

if only one training sequence exists. To illustrate this, consider again the linear network

with a delay. Use a learning rate of 0.02 for the training. (When using a gradient descent

algorithm, you typically use a smaller learning rate for batch mode training than

incremental training, because all the individual gradients are summed before determining

the step change to the weights.)

net = linearlayer([0 1],0.02);

net.inputs{1}.size = 1;

net.layers{1}.dimensions = 1;

net.IW{1,1} = [0 0];

net.biasConnect = 0;

net.trainParam.epochs = 1;

Pi = {1};

P = {2 3 4};

T = {3 5 6};

You want to train the network with the same input sequence used for the incremental training

earlier, but this time you want to update the weights only after all the inputs are applied

(batch mode). The network is simulated in sequential mode, because the input is a

sequence, but the weights are updated in batch mode.

net = train(net,P,T,Pi);

The weights after one epoch of training are

net.IW{1,1}

ans = 0.9000 0.6200

These are different weights than you would obtain using incremental training, where the

weights would be updated three times during one pass through the training set. For batch

training the weights are only updated once in each epoch.
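The weights reported above can be reproduced with a short hand computation. The sketch below assumes the same model as before: a linear neuron with no bias, fed the current input and the input delayed by one step, with a single summed Widrow-Hoff update per epoch. It uses the targets from the listing above, whose last element is 6. It does not call train.

```matlab
% Hand computation of one batch epoch for the linear neuron with one delay.
Pi = 1;                             % initial condition of the delay
P  = [2 3 4];                       % input sequence
T  = [3 5 6];                       % targets as given in the listing above
lr = 0.02;                          % learning rate
w  = [0 0];                         % [weight on p(t), weight on p(t-1)]
X  = [P; Pi P(1:end-1)];            % row 1: current input, row 2: delayed input
a  = w*X;                           % outputs for the whole sequence
e  = T - a;                         % errors for the whole sequence
w  = w + lr*e*X.';                  % one summed update for the epoch
disp(w)                             % approximately 0.90  0.62
```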

Training Feedback

The showWindow parameter allows you to specify whether a training window is visible

when you train. The training window appears by default. Two other parameters,

showCommandLine and show, determine whether command-line output is generated and

the number of epochs between command-line feedback during training. For instance, this

code turns off the training window and gives you training status information every 35

epochs when the network is later trained with train:

net.trainParam.showWindow = false;

net.trainParam.showCommandLine = true;

net.trainParam.show = 35;

Sometimes it is convenient to disable all training displays. To do that, turn off both the

training window and command-line feedback:

net.trainParam.showWindow = false;

net.trainParam.showCommandLine = false;

The training window appears automatically when you train. Use the nntraintool

function to manually open and close the training window.

nntraintool

nntraintool('close')

Deep Networks

• “Deep Learning in MATLAB” on page 2-2

• “Try Deep Learning in 10 Lines of MATLAB Code” on page 2-10

• “Deep Learning with Big Data on GPUs and in Parallel” on page 2-13

• “Construct Deep Network Using Autoencoders” on page 2-18

• “Pretrained Convolutional Neural Networks” on page 2-21

• “Learn About Convolutional Neural Networks” on page 2-27

• “List of Deep Learning Layers” on page 2-31

• “Specify Layers of Convolutional Neural Network” on page 2-36

• “Set Up Parameters and Train Convolutional Neural Network” on page 2-46

• “Resume Training from a Checkpoint Network” on page 2-51

•“Dene Custom Deep Learning Layers” on page 2-56

•“Dene a Custom Deep Learning Layer with Learnable Parameters” on page 2-73

•“Dene a Custom Regression Output Layer” on page 2-87

•“Dene a Custom Classication Output Layer” on page 2-97

• “Check Custom Layer Validity” on page 2-107

• “Long Short-Term Memory Networks” on page 2-116

• “Preprocess Images for Deep Learning” on page 2-127

• “Develop Custom Mini-Batch Datastore” on page 2-131

Deep Learning in MATLAB

In this section...

“What Is Deep Learning?” on page 2-2

“Try Deep Learning in 10 Lines of MATLAB Code” on page 2-5

“Start Deep Learning Faster Using Transfer Learning” on page 2-7

“Train Classifiers Using Features Extracted from Pretrained Networks” on page 2-8

“Deep Learning with Big Data on CPUs, GPUs, in Parallel, and on the Cloud” on page 2-

What Is Deep Learning?

Deep learning is a branch of machine learning that teaches computers to do what comes

naturally to humans: learn from experience. Machine learning algorithms use

computational methods to “learn” information directly from data without relying on a

predetermined equation as a model. Deep learning is especially suited for image

recognition, which is important for solving problems such as facial recognition, motion

detection, and many advanced driver assistance technologies such as autonomous

driving, lane detection, pedestrian detection, and autonomous parking.

Neural Network Toolbox provides simple MATLAB commands for creating and

interconnecting the layers of a deep neural network. Examples and pretrained networks

make it easy to use MATLAB for deep learning, even without knowledge of advanced

computer vision algorithms or neural networks.

For a free hands-on introduction to practical deep learning methods, see Deep Learning

Onramp.

What Do You Want to Do? | Learn More

Perform transfer learning to fine-tune a network with your data | “Start Deep Learning Faster Using Transfer Learning” on page 2-7. Tip: Fine-tuning a pretrained network to learn a new task is typically much faster and easier than training a new network.

Classify images with pretrained networks | “Pretrained Convolutional Neural Networks” on page 2-21

Create a new deep neural network for classification or regression | “Create Simple Deep Learning Network for Classification”; “Deep Learning Training from Scratch”

Resize, rotate, or preprocess images for training or prediction | “Preprocess Images for Deep Learning” on page 2-127

Label your image data automatically based on folder names, or interactively using an app | “Train Network for Image Classification”; Image Labeler

Create deep learning networks for sequence and time series data | “Sequence Classification Using Deep Learning”; “Time Series Forecasting Using Deep Learning”

Classify each pixel of an image (for example, road, car, pedestrian) | “Semantic Segmentation Basics” (Computer Vision System Toolbox)

Detect and recognize objects in images | “Deep Learning, Object Detection and Recognition” (Computer Vision System Toolbox)

Classify text data | “Classify Text Data Using Deep Learning” (Text Analytics Toolbox)

Classify audio data for speech recognition | “Deep Learning Speech Recognition”

Visualize what features networks have learned | “Deep Dream Images Using AlexNet”; “Visualize Activations of a Convolutional Neural Network”

Train on CPU, GPU, multiple GPUs, in parallel on your desktop or on clusters in the cloud, and work with data sets too large to fit in memory | “Deep Learning with Big Data on GPUs and in Parallel” on page 2-13

To learn more about deep learning application areas, including automated driving, see

“Deep Learning Applications”.

To choose whether to use a pretrained network or create a new deep network, consider

the scenarios in this table.

Consideration | Use a Pretrained Network for Transfer Learning | Create a New Deep Network

Training Data | Hundreds to thousands of labeled images (small) | Thousands to millions of labeled images

Computation | Moderate computation (GPU optional) | Compute intensive (requires GPU for speed)

Training Time | Seconds to minutes | Days to weeks for real problems

Model Accuracy | Good, depends on the pretrained model | High, but can overfit to small data sets

Deep learning uses neural networks to learn useful representations of features directly

from data. Neural networks combine multiple nonlinear processing layers, using simple

elements operating in parallel and inspired by biological nervous systems. Deep learning

models can achieve state-of-the-art accuracy in object classification, sometimes exceeding

human-level performance.

You train models using a large set of labeled data and neural network architectures that

contain many layers, usually including some convolutional layers. Training these models

is computationally intensive and you can usually accelerate training by using a high

performance GPU. This diagram shows how convolutional neural networks combine layers

that automatically learn features from many images to classify new images.

Many deep learning applications use image files, and sometimes millions of image files. To

access many image files for deep learning efficiently, MATLAB provides the

imageDatastore function. Use this function to:

• Automatically read batches of images for faster processing in machine learning and

computer vision applications

• Import data from image collections that are too large to fit in memory

• Label your image data automatically based on folder names
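The uses listed above can be tried on a small scale. The following sketch first writes a tiny synthetic image collection to a temporary folder so that it is self-contained. The class folder names mug and pencil, the image size, and the random pixel data are placeholders invented for this illustration. It then indexes the folder with imageDatastore, takes the labels from the folder names, and reads the images one at a time.

```matlab
% Create a tiny synthetic image collection (placeholder data for illustration).
root = tempname;                                  % temporary root folder
classes = {'mug', 'pencil'};                      % hypothetical class folders
for c = 1:numel(classes)
    mkdir(fullfile(root, classes{c}));
    for k = 1:3
        img = uint8(randi([0 255], 32, 32, 3));   % random placeholder image
        imwrite(img, fullfile(root, classes{c}, sprintf('img%d.png', k)));
    end
end

% Index the collection; labels are taken from the folder names.
imds = imageDatastore(root, ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

countEachLabel(imds)                              % table of labels and image counts

% Read images one at a time instead of loading the whole collection.
numRead = 0;
while hasdata(imds)
    img = read(imds);                             % next image in the datastore
    numRead = numRead + 1;                        % process img here as needed
end
```

With a real collection, you would replace the synthetic folder with the path to your own images. The datastore stores only file locations, so the collection itself does not need to fit in memory.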

Try Deep Learning in 10 Lines of MATLAB Code

This example shows how to use deep learning to identify objects on a live webcam using

only 10 lines of MATLAB code. Try the example to see how simple it is to get started with

deep learning in MATLAB.

1. Run these commands to get the downloads if needed, connect to the webcam, and get

a pretrained neural network.

camera = webcam; % Connect to the camera

net = alexnet; % Load the neural network

The webcam and alexnet functions provide a link to help you download the free add-

ons using Add-On Explorer. Alternatively, see Neural Network Toolbox Model for

AlexNet Network and MATLAB Support Package for USB Webcams.

You can use alexnet to classify images. AlexNet is a pretrained convolutional neural

network (CNN) that has been trained on more than a million images and can classify

images into 1000 object categories (for example, keyboard, mouse, coffee mug,

pencil, and many animals).

2. To show and classify live images, run the following code. Point the webcam at an

object and the neural network reports what class of object it thinks the webcam is

showing. It keeps classifying images until you press Ctrl+C. The code resizes the

image for the network using imresize.

while true

    im = snapshot(camera); % Take a picture

    image(im); % Show the picture

    im = imresize(im,[227 227]); % Resize the picture for alexnet

    label = classify(net,im); % Classify the picture

    title(char(label)); % Show the class label

    drawnow

end

In this example, the network correctly classifies a coffee mug. Experiment with

objects in your surroundings to see how accurate the network is.

To watch a video of this example, see Deep Learning in 11 Lines of MATLAB Code.

To get the code to extend this example to show the probability scores of classes, see

“Classify Webcam Images Using Deep Learning”.
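As a rough idea of what such an extension involves, the sketch below requests the class scores as a second output of classify and shows the highest score next to the label. It is not the code from the referenced example. It uses a single still image (peppers.png, a sample image that ships with MATLAB) in place of the webcam, and it assumes that the AlexNet support package is installed.

```matlab
% Sketch: show the predicted label together with its score.
% Assumes the AlexNet support package is installed.
net = alexnet;                               % load the pretrained network
im  = imread('peppers.png');                 % sample image shipped with MATLAB
im  = imresize(im, [227 227]);               % resize to the AlexNet input size
[label, scores] = classify(net, im);         % scores holds one value per class
topScore = max(scores);                      % score of the predicted class
image(im);                                   % show the picture
title(sprintf('%s (%.2f)', char(label), topScore));
```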

For next steps in deep learning, you can use the pretrained network for other tasks. Solve

new classification problems on your image data with transfer learning or feature

extraction. For examples, see “Start Deep Learning Faster Using Transfer Learning” on

page 2-7 and “Train Classifiers Using Features Extracted from Pretrained Networks”

on page 2-8. To try other pretrained networks, see “Pretrained Convolutional Neural

Networks” on page 2-21.

Start Deep Learning Faster Using Transfer Learning

Transfer learning is commonly used in deep learning applications. You can take a

pretrained network and use it as a starting point to learn a new task. Fine-tuning a

network with transfer learning is much faster and easier than training from scratch. You

can quickly make the network learn a new task using a smaller number of training

images. The advantage of transfer learning is that the pretrained network has already

learned a rich set of features that can be applied to a wide range of other similar tasks.

For example, if you take a network trained on thousands or millions of images, you can

retrain it for new object detection using only hundreds of images. You can effectively fine-

tune a pretrained network with much smaller data sets than the original training data. If

you have a very large dataset, then transfer learning might not be faster than training a

new network.

Transfer learning enables you to:

• Transfer the learned features of a pretrained network to a new problem

• Learn a new task faster and more easily than by training a new network

• Reduce training time and dataset size

• Perform deep learning without needing to learn how to create a whole new network

For examples, see “Get Started with Transfer Learning”, “Transfer Learning Using

AlexNet”, and “Transfer Learning Using GoogLeNet”.
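The general shape of a transfer learning workflow can be sketched as follows. This is an outline under stated assumptions rather than the code of the referenced examples: the function name fineTuneAlexNetSketch is hypothetical, the caller is assumed to supply a labeled imageDatastore, the AlexNet support package is assumed to be installed, and the split ratio, learning rate factors, and training option values are illustrative choices that you would tune for your data.

```matlab
function netTransfer = fineTuneAlexNetSketch(imds)
% Sketch of fine-tuning AlexNet on a labeled imageDatastore (imds).
% All numeric settings below are illustrative, not recommendations.
net = alexnet;                                      % pretrained network
inputSize  = net.Layers(1).InputSize;               % expected image size
numClasses = numel(categories(imds.Labels));        % classes in the new task

% Hold out part of the data for validation.
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.7, 'randomized');

% Resize images on the fly to the network input size.
augTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain);
augVal   = augmentedImageDatastore(inputSize(1:2), imdsVal);

% Keep all layers except the last three, then add new task-specific layers.
layers = [
    net.Layers(1:end-3)
    fullyConnectedLayer(numClasses, ...
        'WeightLearnRateFactor', 20, 'BiasLearnRateFactor', 20)
    softmaxLayer
    classificationLayer];

% A small learning rate keeps the transferred features largely intact.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 6, ...
    'MiniBatchSize', 10, ...
    'ValidationData', augVal, ...
    'Verbose', false);

netTransfer = trainNetwork(augTrain, layers, options);
end
```

Replacing only the final layers is what lets the new network reuse the features learned from the original training data.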

Train Classifiers Using Features Extracted from Pretrained

Networks

Feature extraction allows you to use the power of pretrained networks without investing

time and effort into training. Feature extraction can be the fastest way to use deep

learning. You extract learned features from a pretrained network, and use those features

to train a classifier, for example, a support vector machine (SVM, which requires Statistics and

Machine Learning Toolbox™). For example, if an SVM trained on features from alexnet can achieve

>90% accuracy on your training and validation set, then fine-tuning with transfer

learning might not be worth the effort to gain some extra accuracy. If you perform fine-

tuning on a small dataset, then you also risk overfitting. If the SVM cannot achieve good

enough accuracy for your application, then fine-tuning is worth the effort to seek higher

accuracy.

For an example, see “Feature Extraction Using AlexNet”.
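A feature extraction workflow of this kind might look like the sketch below. The function name featureExtractionSketch is hypothetical, the caller is assumed to supply a labeled imageDatastore, and the choice of the layer 'fc7' as the feature layer is an assumption made for illustration. The sketch requires the AlexNet support package and, for fitcecoc, Statistics and Machine Learning Toolbox.

```matlab
function classifier = featureExtractionSketch(imdsTrain)
% Sketch: train an SVM-based classifier on features from a pretrained network.
% imdsTrain is a labeled imageDatastore supplied by the caller.
net = alexnet;                                      % pretrained network
inputSize = net.Layers(1).InputSize;                % expected image size

% Resize images on the fly to the network input size.
augTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain);

% Use the activations of a late fully connected layer as feature vectors,
% one row per image. The layer name 'fc7' is an illustrative choice.
features = activations(net, augTrain, 'fc7', 'OutputAs', 'rows');

% Fit a multiclass model built from binary SVM learners.
classifier = fitcecoc(features, imdsTrain.Labels);
end
```

No network training takes place here. The only fitting step is the classifier, which is why this approach can be fast.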

Deep Learning with Big Data on CPUs, GPUs, in Parallel, and

on the Cloud

Neural networks are inherently parallel algorithms. You can take advantage of this

parallelism by using Parallel Computing Toolbox™ to distribute training across multicore

CPUs, graphical processing units (GPUs), and clusters of computers with multiple CPUs

and GPUs.

Training deep networks is extremely computationally intensive and you can usually

accelerate training by using a high performance GPU. If you do not have a suitable GPU,

you can train on one or more CPU cores instead. You can train a convolutional neural

network on a single GPU or CPU, or on multiple GPUs or CPU cores, or in parallel on a

cluster. Using GPU or parallel options requires Parallel Computing Toolbox.

You do not need multiple computers to solve problems using data sets too large to fit in

memory. You can use the imageDatastore function to work with batches of data without

needing a cluster of machines. However, if you have a cluster available, it can be helpful

to take your code to the data repository rather than moving large amounts of data around.

To learn more about deep learning hardware and memory settings, see “Deep Learning

with Big Data on GPUs and in Parallel” on page 2-13.

See Also

Related Examples

• “Classify Webcam Images Using Deep Learning”

• “Get Started with Transfer Learning”

• “Pretrained Convolutional Neural Networks” on page 2-21

• “Create Simple Deep Learning Network for Classification”

• “Deep Learning with Big Data on GPUs and in Parallel” on page 2-13

• “Deep Learning, Object Detection and Recognition” (Computer Vision System

Toolbox)

• “Classify Text Data Using Deep Learning” (Text Analytics Toolbox)

Try Deep Learning in 10 Lines of MATLAB Code

This example shows how to use deep learning to identify objects on a live webcam using

only 10 lines of MATLAB code. Try the example to see how simple it is to get started with

deep learning in MATLAB.

1. Run these commands to get the downloads if needed, connect to the webcam, and get

a pretrained neural network.

camera = webcam; % Connect to the camera

net = alexnet; % Load the neural network

If you need to install the webcam and alexnet add-ons, a message from each

function appears with a link to help you download the free add-ons using Add-On

Explorer. Alternatively, see Neural Network Toolbox Model for AlexNet Network and

MATLAB Support Package for USB Webcams.

After you install Neural Network Toolbox Model for AlexNet Network, you can use it

to classify images. AlexNet is a pretrained convolutional neural network (CNN) that

has been trained on more than a million images and can classify images into 1000

object categories (for example, keyboard, mouse, coffee mug, pencil, and many

animals).

2. Run the following code to show and classify live images. Point the webcam at an

object and the neural network reports what class of object it thinks the webcam is

showing. It will keep classifying images until you press Ctrl+C. The code resizes the

image for the network using imresize.

while true

    im = snapshot(camera); % Take a picture

    image(im); % Show the picture

    im = imresize(im,[227 227]); % Resize the picture for alexnet

    label = classify(net,im); % Classify the picture

    title(char(label)); % Show the class label

    drawnow

end

In this example, the network correctly classifies a coffee mug. Experiment with

objects in your surroundings to see how accurate the network is.

To watch a video of this example, see Deep Learning in 11 Lines of MATLAB Code.

To get the code to extend this example to show the probability scores of classes, see

“Classify Webcam Images Using Deep Learning”.

For next steps in deep learning, you can use the pretrained network for other tasks. Solve

new classification problems on your image data with transfer learning or feature

extraction. For examples, see “Start Deep Learning Faster Using Transfer Learning” on

page 2-7 and “Train Classifiers Using Features Extracted from Pretrained Networks” on

page 2-8. To try other pretrained networks, see “Pretrained Convolutional Neural

Networks” on page 2-21.

See Also

Related Examples

• “Classify Webcam Images Using Deep Learning”

• “Get Started with Transfer Learning”

Deep Learning with Big Data on GPUs and in Parallel

Neural networks are inherently parallel algorithms. You can take advantage of this

parallelism by using Parallel Computing Toolbox to distribute training across multicore

CPUs, graphical processing units (GPUs), and clusters of computers with multiple CPUs

and GPUs.

Training deep networks is extremely computationally intensive and you can usually

accelerate training by using a high performance GPU. If you do not have a suitable GPU,

you can train on one or more CPU cores instead. You can train a convolutional neural

network on a single GPU or CPU, or on multiple GPUs or CPU cores, or in parallel on a

cluster. Using GPU or parallel options requires Parallel Computing Toolbox.

Tip GPU support is automatic. By default, the trainNetwork function uses a GPU if

available.

If you have access to a machine with multiple GPUs, simply specify the training option

'ExecutionEnvironment','multi-gpu'.

You do not need multiple computers to solve problems using data sets too large to fit in

memory. You can use the imageDatastore function to work with batches of data without

needing a cluster of machines. For an example, see “Train Network for Image

Classification”. However, if you have a cluster available, it can be helpful to take your

code to the data repository rather than moving large amounts of data around.

Deep Learning Hardware and Memory Considerations | Recommendations | Required Products

Data too large to fit in memory | To import data from image collections that are too large to fit in memory, use the imageDatastore function. This function is designed to read batches of images for faster processing in machine learning and computer vision applications. | MATLAB; Neural Network Toolbox

CPU | If you do not have a suitable GPU, you can train on a CPU instead. By default, the trainNetwork function uses the CPU if no GPU is available. | MATLAB; Neural Network Toolbox

GPU | By default, the trainNetwork function uses a GPU if available. Requires a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher. Check your GPU using gpuDevice. Specify the execution environment using the trainingOptions function. | MATLAB; Neural Network Toolbox; Parallel Computing Toolbox

Parallel on your local machine using multiple GPUs or CPU cores | Take advantage of multiple workers by specifying the execution environment with the trainingOptions function. If you have more than one GPU on your machine, specify 'multi-gpu'. Otherwise, specify 'parallel'. | MATLAB; Neural Network Toolbox; Parallel Computing Toolbox

Parallel on a cluster or in the cloud | Scale up to use workers on clusters or in the cloud to accelerate your deep learning computations. Use trainingOptions and specify 'parallel' to use a compute cluster. For more information, see “Deep Learning in the Cloud” on page 2-17. | MATLAB; Neural Network Toolbox; Parallel Computing Toolbox; MATLAB Distributed Computing Server™

Tip To learn more, see “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-

All functions for deep learning training, prediction, and validation in Neural Network

Toolbox perform computations using single-precision, floating-point arithmetic. Functions

for deep learning include trainNetwork, predict, classify, and activations. The

software uses single-precision arithmetic when you train networks using both CPUs and

GPUs.

Because single-precision and double-precision performance of GPUs can differ

substantially, it is important to know in which precision computations are performed. If

you only use a GPU for deep learning, then single-precision performance is one of the

most important characteristics of a GPU. If you also use a GPU for other computations

using Parallel Computing Toolbox, then high double-precision performance is important.

This is because many functions in MATLAB use double-precision arithmetic by default.

For more information, see “Improve Performance Using Single Precision Calculations”

(Parallel Computing Toolbox).

Training with Multiple GPUs

Cutting-edge neural networks rely on increasingly large training datasets and network

structures. In turn, this requires increased training times and memory resources. To

support training such networks, MATLAB provides support for training a single network

using multiple GPUs in parallel. If you have more than one GPU on your local machine,

enable multiple GPU training by setting the 'ExecutionEnvironment' option to

'multi-gpu' with the trainingOptions function. If you have access to a cluster or

cloud, specify 'parallel'. To improve convergence and/or performance using multiple

GPUs, try increasing the MiniBatchSize and learning rate.
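The advice above can be sketched as follows. This is only an illustration: the base batch size, base learning rate, and GPU count are assumed values, and scaling both linearly with the number of GPUs is a common heuristic rather than a rule stated in this guide.

```matlab
% Assumed single-GPU settings (illustrative values only).
baseBatchSize = 128;
baseLearnRate = 0.01;
numGPUs = 2;            % assumed number of GPUs on the local machine

% Scale the mini-batch size and learning rate with the GPU count and
% request multi-GPU training on the local machine.
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu', ...
    'MiniBatchSize',baseBatchSize*numGPUs, ...
    'InitialLearnRate',baseLearnRate*numGPUs);
```

Pass options to trainNetwork as usual. Treat the scaled values as a starting point and check convergence on your own data.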

For optimum performance, you need to experiment with the MiniBatchSize option that

you specify with the trainingOptions function. Convolutional neural networks are

typically trained iteratively using batches of images. This is done because the whole

dataset is far too large to fit into GPU memory. The optimal batch size depends on your

exact network, dataset, and GPU hardware, so you need to experiment. Too large a batch

size can lead to slow convergence, while too small a batch size can lead to no

convergence at all. Often the batch size is dictated by the GPU memory available. For

larger networks, the memory requirements per image increase and the maximum batch

size is reduced.

When training with multiple GPUs, each image batch is distributed between the GPUs.

This eectively increases the total GPU memory available, allowing larger batch sizes.

Depending on your application, a larger batch size can provide better convergence or

classication accuracy.

Using multiple GPUs can provide a significant improvement in performance. To decide if

you expect multi-GPU training to deliver a performance gain, consider the following

factors:

• How long is the iteration on each GPU? If each GPU iteration is short, the added

overhead of communication between GPUs can dominate. Try increasing the

computation per iteration by using a larger batch size.

• Are all the GPUs on a single machine? Communication between GPUs on different

machines introduces a significant communication delay.

To learn more, see “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-2

and “Select Particular GPUs to Use for Training” on page 3-7.

Fetch and Preprocess Data in Background

You can fetch and preprocess data in parallel with network training. To perform data

dispatch in the background, train the network using trainNetwork with data in a mini-

batch datastore with background dispatch enabled. You can use a built-in mini-batch

datastore, such as an augmentedImageDatastore, denoisingImageDatastore, or

pixelLabelImageDatastore. To enable background dispatch, set the

DispatchInBackground property of the datastore to true. You can also use a custom

mini-batch datastore with background dispatch enabled. For more information on

creating custom mini-batch datastores, see “Develop Custom Mini-Batch Datastore” on

page 2-131. For advanced options, you can try modifying the number of workers of the

parallel pool. For more information, see “Specify Your Parallel Preferences” (Parallel

Computing Toolbox). You can also fine-tune the training computation and data dispatch

loads between workers by specifying the 'WorkerLoad' name-value pair argument of

trainingOptions.
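A minimal sketch of enabling background dispatch on a built-in mini-batch datastore. It assumes the digit image data set that ships with the toolbox is at its usual location and that Parallel Computing Toolbox is installed. The output size [28 28] is chosen to match those images.

```matlab
% Assumed location of the sample digit images shipped with the toolbox.
digitPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');

% Wrap the images in a mini-batch datastore and fetch and preprocess
% mini-batches in the background while the network trains.
augimds = augmentedImageDatastore([28 28],imds, ...
    'DispatchInBackground',true);
```

You can then pass augimds to trainNetwork in place of the original datastore.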

Deep Learning in the Cloud

Try your deep learning applications with multiple high-performance GPUs on Amazon®

Elastic Compute Cloud (Amazon EC2®). You can use MATLAB to perform deep learning in

the cloud using Amazon EC2 with P2 instances and data stored in the cloud. If you do not

have a suitable GPU available for faster training of a convolutional neural network, you

can use this feature instead. You can try different numbers of GPUs per machine to

accelerate training and use parallel computing to train multiple models at once on the

same data, or train a single network using multiple GPUs. You can compare and explore

the performance of multiple deep neural network configurations to look for the best

tradeoff between accuracy and memory use.

To help you get started, see these examples: “Deep Learning in Parallel and in the Cloud”.

See also this white paper that outlines a complete workflow: Deep Learning in the Cloud

with MATLAB White Paper.

See Also

trainNetwork | trainingOptions

Related Examples

• “Scale Up Deep Learning in Parallel and in the Cloud” on page 3-2

• “Deep Learning in Parallel and in the Cloud”

Construct Deep Network Using Autoencoders

Load the sample data.

[X,T] = wine_dataset;

Train an autoencoder with a hidden layer of size 10 and a linear transfer function for the

decoder. Set the L2 weight regularizer to 0.001, sparsity regularizer to 4 and sparsity

proportion to 0.05.

hiddenSize = 10;
autoenc1 = trainAutoencoder(X,hiddenSize,...
    'L2WeightRegularization',0.001,...
    'SparsityRegularization',4,...
    'SparsityProportion',0.05,...
    'DecoderTransferFunction','purelin');

Extract the features in the hidden layer.

features1 = encode(autoenc1,X);

Train a second autoencoder using the features from the first autoencoder. Do not scale

the data.

hiddenSize = 10;
autoenc2 = trainAutoencoder(features1,hiddenSize,...
    'L2WeightRegularization',0.001,...
    'SparsityRegularization',4,...
    'SparsityProportion',0.05,...
    'DecoderTransferFunction','purelin',...
    'ScaleData',false);

Extract the features in the hidden layer.

features2 = encode(autoenc2,features1);

Train a softmax layer for classification using the features, features2, from the second

autoencoder, autoenc2.

softnet = trainSoftmaxLayer(features2,T,'LossFunction','crossentropy');

Stack the encoders and the softmax layer to form a deep network.

deepnet = stack(autoenc1,autoenc2,softnet);

Train the deep network on the wine data.

deepnet = train(deepnet,X,T);

Estimate the wine types using the deep network, deepnet.

wine_type = deepnet(X);

Plot the confusion matrix.

plotconfusion(T,wine_type);

Pretrained Convolutional Neural Networks

In this section...

“Download Pretrained Networks” on page 2-22

“Transfer Learning” on page 2-24

“Feature Extraction” on page 2-25

Fine-tuning a pretrained network with transfer learning is typically much faster and

easier than training from scratch. You can use previously trained networks for the

following purposes.

Purpose | Description
Classification | Apply pretrained networks directly to classification problems. To classify a new image, use classify. For an example showing how to use a pretrained network for classification, see “Classify Image Using GoogLeNet”.
Transfer Learning | Take layers from a network trained on a large data set and fine-tune on a new data set. For examples showing how to use pretrained networks for transfer learning, see “Transfer Learning Using GoogLeNet” and “Transfer Learning Using AlexNet”.
Feature Extraction | Use a pretrained network as a feature extractor by using the layer activations as features. You can use these activations as features to train another classifier, such as a support vector machine (SVM). For an example showing how to use a pretrained network for feature extraction, see “Feature Extraction Using AlexNet”.
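As a minimal sketch of the classification use case, the following classifies one image with a pretrained network. It assumes that the GoogLeNet support package is installed and that Image Processing Toolbox is available for imresize. The image peppers.png is a sample file that ships with MATLAB.

```matlab
net = googlenet;                          % requires the GoogLeNet support package
inputSize = net.Layers(1).InputSize;      % image size the network expects

I = imread('peppers.png');                % sample image shipped with MATLAB
I = imresize(I,inputSize(1:2));           % match the network input height and width

[label,scores] = classify(net,I);         % predicted class and class scores
```

The same pattern applies to the other pretrained networks. Only the function that loads the network and the expected input size change.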

Download Pretrained Networks

You can download and install pretrained networks to use for your problems. Use functions

such as alexnet to get links to download pretrained networks from the Add-On Explorer.

To see a list of available downloads, see MathWorks Neural Network Toolbox Team.

The pretrained networks have learned rich feature representations for a wide range of

natural images. You can apply these learned features to a wide range of image

classication problems using transfer learning and feature extraction. The pretrained

models are trained on more than a million images and can classify images into 1000

object categories, such as keyboard, coee mug, pencil, and many animals. The training

images are a subset of the ImageNet database [1], which is used in ImageNet Large-Scale

Visual Recognition Challenge (ILSVRC) [2].

AlexNet

Use the alexnet function to get a link to download a pretrained AlexNet model.

AlexNet won ILSVRC 2012, achieving the highest classification performance. AlexNet has 8

layers with learnable weights: 5 convolutional layers, and 3 fully connected layers.

AlexNet is fast for retraining and classifying new images, but it is also large and not as

accurate on the original ILSVRC data set as other, newer pretrained models. For more

information, see alexnet.

VGG-16 and VGG-19

Use the vgg16 and vgg19 functions to get links to download pretrained VGG models.

VGG networks were among the winners of ILSVRC 2014. VGG-16 has 16 layers with

learnable weights: 13 convolutional layers and 3 fully connected layers. VGG-19 has 19

layers with learnable weights: 16 convolutional layers and 3 fully connected layers. In

both networks, all convolutional layers have filters of size 3-by-3. VGG networks are

larger and typically slower than other pretrained networks in Neural Network Toolbox.

For more information, see vgg16 and vgg19.

GoogLeNet

Use the googlenet function to get a link to download a pretrained GoogLeNet model.

GoogLeNet was among the winners of ILSVRC 2014. GoogLeNet is smaller and typically

faster than VGG networks, and smaller and more accurate than AlexNet on the original

ILSVRC data set. GoogLeNet is 22 layers deep. Like Inception-v3 and ResNets,

GoogLeNet has a directed acyclic graph structure. To extract the layers and architecture

of the network for further processing, use layerGraph. For a transfer learning example

using GoogLeNet, see “Transfer Learning Using GoogLeNet”. For more information, see

googlenet.

Inception-v3

Use the inceptionv3 function to get a link to download a pretrained Inception-v3 model.

Inception-v3 is a development of the GoogLeNet architecture. Compared to GoogLeNet,

Inception-v3 is larger, deeper, typically slower, but more accurate on the original ILSVRC

data set. Inception-v3 is 48 layers deep. To extract the layers and architecture of the

network for further processing, use layerGraph. To retrain the network on a new

classication task, follow the steps of “Transfer Learning Using GoogLeNet”. Load the

Inception-v3 model instead of GoogLeNet and change the names of the layers that you

remove and connect to match the names of the Inception-v3 layers: remove the

'predictions', 'predictions_softmax', and

'ClassificationLayer_predictions' layers, and connect to the 'avg_pool' layer.

For more information, see inceptionv3.
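The layer replacement described above can be sketched as follows. The layer names 'predictions', 'predictions_softmax', 'ClassificationLayer_predictions', and 'avg_pool' come from the text. The number of classes and the names of the new layers are assumptions for illustration, and the Inception-v3 support package must be installed.

```matlab
net = inceptionv3;                 % requires the Inception-v3 support package
lgraph = layerGraph(net);          % extract layers and connections

% Remove the final three layers, which are specific to the 1000 ImageNet classes.
lgraph = removeLayers(lgraph, ...
    {'predictions','predictions_softmax','ClassificationLayer_predictions'});

numClasses = 5;                    % assumed number of classes in the new task
newLayers = [
    fullyConnectedLayer(numClasses,'Name','fc')    % hypothetical layer names
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')];

% Add the new layers and attach them to the last remaining layer.
lgraph = addLayers(lgraph,newLayers);
lgraph = connectLayers(lgraph,'avg_pool','fc');
```

The resulting lgraph can be passed to trainNetwork together with the new training data and training options.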

ResNet-50 and ResNet-101

Use the resnet50 and resnet101 functions to get links to download pretrained

ResNet-50 and ResNet-101 networks.

The residual connections of ResNets enable training of very deep networks. As the names

suggest, ResNet-50 is 50 layers deep and ResNet-101 is 101 layers deep. To retrain a

network on a new classication task, follow the steps of “Transfer Learning Using

GoogLeNet”. Load a ResNet network instead of GoogLeNet and change the names of the

layers in the end of the network that you remove and connect to match the names of the

ResNet layers. For more information, see resnet50 and resnet101.

importCaeNetwork

You can import pretrained networks from Cae [3] using the importCaffeNetwork.

There are many pretrained networks available in Cae Model Zoo [4]. Locate and

download the desired .prototxt and .caffemodel les and use

importCaffeNetwork to import the pretrained network into MATLAB. For more

information, see importCaffeNetwork.

importCaeLayers

You can import network architectures from Cae using importCaffeLayers.

You can import the network architectures of Cae networks, without importing the

pretrained network weights. Locate and download the desired .prototxt le and use

importCaffeLayers to import the network layers into MATLAB. For more information,

see importCaffeLayers.

importKerasNetwork

You can import pretrained networks from Keras using importKerasNetwork.

You can import the network and weights either from the same HDF5 (.h5) file or separate

HDF5 and JSON (.json) files. For more information, see importKerasNetwork.

importKerasLayers

You can import network architectures from Keras using importKerasLayers.

You can import the network architecture of Keras networks, either with or without

weights. You can import the network architecture and weights either from the same

HDF5 (.h5) le or separate HDF5 and JSON (.json) les. For more information, see

importKerasLayers.
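The import functions described above might be called as in the following sketch. All file names are hypothetical placeholders, so each call is guarded and runs only if the files exist. The importers also require their respective support packages.

```matlab
% Hypothetical file names; replace them with your own downloaded files.
caffeProto   = 'my_model.prototxt';
caffeWeights = 'my_model.caffemodel';
kerasModel   = 'my_model.h5';
kerasArch    = 'my_architecture.json';
kerasWeights = 'my_weights.h5';

if exist(caffeProto,'file') && exist(caffeWeights,'file')
    caffeNet    = importCaffeNetwork(caffeProto,caffeWeights); % architecture and weights
    caffeLayers = importCaffeLayers(caffeProto);               % architecture only
end

if exist(kerasModel,'file')
    kerasNet = importKerasNetwork(kerasModel);                 % single HDF5 file
end

if exist(kerasArch,'file') && exist(kerasWeights,'file')
    % Architecture in JSON, weights in a separate HDF5 file.
    kerasNet2 = importKerasNetwork(kerasArch,'WeightFile',kerasWeights);
end
```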

Transfer Learning

Transfer learning is commonly used in deep learning applications. You can take a

pretrained network and use it as a starting point to learn a new task. Fine-tuning a

network with transfer learning is much faster and easier than constructing and training a

new network. You can quickly transfer learning to a new task using a smaller number of

training images. The advantage of transfer learning is that the pretrained network has

already learned a rich set of features. These features can be applied to a wide range of

other similar tasks. For example, you can take a network trained on millions of images

and retrain it for new object classification using only hundreds of images. If you have a

very large data set, then transfer learning might not be faster. For examples showing how

to perform transfer learning, see “Transfer Learning Using AlexNet” and “Transfer

Learning Using GoogLeNet”.

Feature Extraction

Feature extraction is an easy way to use the power of pretrained networks without

investing time and eort into training. Feature extraction can be the fastest way to use

deep learning. You extract learned features from a pretrained network, and use those

features to train a classier, such as a support vector machine using fitcsvm (Statistics

and Machine Learning Toolbox). For example, if an SVM achieves >90% accuracy on your

training and validation set, then fine-tuning might not be worth the effort to increase

accuracy. If you perform fine-tuning on a small data set, you also risk overfitting to the

training data. If the SVM cannot achieve good enough accuracy for your application, then

fine-tuning is worth the effort to seek higher accuracy. For an example showing how to

use a pretrained network for feature extraction, see “Feature Extraction Using AlexNet”.

References

[1] ImageNet. http://www.image-net.org

[2] Russakovsky, O., Deng, J., Su, H., et al. “ImageNet Large Scale Visual Recognition

Challenge.” International Journal of Computer Vision (IJCV). Vol 115, Issue 3,

2015, pp. 211–252

[3] Cae. http://cae.berkeleyvision.org/

[4] Cae Model Zoo. http://cae.berkeleyvision.org/model_zoo.html

See Also

alexnet | googlenet | importCaffeLayers | importCaffeNetwork | inceptionv3

| resnet101 | resnet50 | vgg16 | vgg19

Related Examples

• “Deep Learning in MATLAB” on page 2-2

• “Transfer Learning Using AlexNet”

• “Feature Extraction Using AlexNet”

• “Transfer Learning Using GoogLeNet”

• “Visualize Features of a Convolutional Neural Network”

• “Visualize Activations of a Convolutional Neural Network”

• “Deep Dream Images Using AlexNet”

Learn About Convolutional Neural Networks

Convolutional neural networks (ConvNets) are widely used tools for deep learning. They

are specically suitable for images as inputs, although they are also used for other

applications such as text, signals, and other continuous responses. They dier from other

types of neural networks in a few ways:

Convolutional neural networks are inspired by the biological structure of a visual

cortex, which contains arrangements of simple and complex cells [1]. These cells are

found to activate based on the subregions of a visual field. These subregions are called

receptive fields. Inspired by the findings of this study, the neurons in a convolutional

layer connect to the subregions of the layers before that layer instead of being fully-

connected as in other types of neural networks. The neurons are unresponsive to the

areas outside of these subregions in the image.

These subregions might overlap, hence the neurons of a ConvNet produce spatially-

correlated outcomes, whereas in other types of neural networks, the neurons do not share

any connections and produce independent outcomes.

In addition, in a neural network with fully-connected neurons, the number of parameters

(weights) can increase quickly as the size of the input increases. A convolutional neural

network reduces the number of parameters with the reduced number of connections,

shared weights, and downsampling.
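To make the parameter savings concrete, the following sketch compares parameter counts for one fully connected layer and one convolutional layer on the same input. The input size, number of neurons, filter size, and number of filters are assumed values chosen for illustration.

```matlab
inputSize = [28 28 1];                  % assumed grayscale image input

% Fully connected layer: every neuron has one weight per input element plus a bias.
numNeurons = 100;                       % assumed layer width
fcParams = (prod(inputSize) + 1)*numNeurons;

% Convolutional layer: each filter has h*w*c shared weights plus a bias.
filterSize = [5 5];                     % assumed filter height and width
numFilters = 20;                        % assumed number of filters
convParams = (prod(filterSize)*inputSize(3) + 1)*numFilters;

fprintf('Fully connected: %d parameters\n',fcParams);   % 78500
fprintf('Convolutional:   %d parameters\n',convParams); % 520
```

The convolutional count does not depend on the image height and width, which is why it grows much more slowly as the input gets larger.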

A ConvNet consists of multiple layers, such as convolutional layers, max-pooling or

average-pooling layers, and fully-connected layers.

The neurons in each layer of a ConvNet are arranged in a 3-D manner, transforming a 3-D

input to a 3-D output. For example, for an image input, the first layer (input layer) holds

the images as 3-D inputs, with the dimensions being height, width, and the color channels

of the image. The neurons in the first convolutional layer connect to the regions of these

images and transform them into a 3-D output. The hidden units (neurons) in each layer

learn nonlinear combinations of the original inputs, which is called feature extraction [2].

These learned features, also known as activations, from one layer become the inputs for

the next layer. Finally, the learned features become the inputs to the classifier or the

regression function at the end of the network.

The architecture of a ConvNet can vary depending on the types and numbers of layers

included. The types and number of layers included depend on the particular application

or data. For example, if you have categorical responses, you must have a classification

function and a classification layer, whereas if your response is continuous, you must have

a regression layer at the end of the network. A smaller network with only one or two

convolutional layers might be sufficient to learn from a small amount of grayscale image data.

On the other hand, for more complex data with millions of colored images, you might

need a more complicated network with multiple convolutional and fully connected layers.

You can concatenate the layers of a convolutional neural network in MATLAB in the

following way:

layers = [imageInputLayer([28 28 1])
          convolution2dLayer(5,20)
          reluLayer
          maxPooling2dLayer(2,'Stride',2)
          fullyConnectedLayer(10)
          softmaxLayer
          classificationLayer];

After dening the layers of your network, you must specify the training options using the

trainingOptions function. For example,

options = trainingOptions('sgdm');

Then, you can train the network with your training data using the trainNetwork

function. The data, layers, and training options become the inputs to the training

function. For example,

convnet = trainNetwork(data,layers,options);

For detailed discussion of layers of a ConvNet, see “Specify Layers of Convolutional

Neural Network” on page 2-36. For setting up training parameters, see “Set Up

Parameters and Train Convolutional Neural Network” on page 2-46.
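Putting the three steps together, a complete run might look like the following sketch. It assumes the sample digit data that ships with the toolbox (loaded with digitTrain4DArrayData and digitTest4DArrayData). The choice of two epochs is only to keep the run short.

```matlab
% Load the sample 28-by-28 grayscale digit images and their labels.
[XTrain,YTrain] = digitTrain4DArrayData;
[XTest,YTest]   = digitTest4DArrayData;

% Same architecture as in the text.
layers = [imageInputLayer([28 28 1])
          convolution2dLayer(5,20)
          reluLayer
          maxPooling2dLayer(2,'Stride',2)
          fullyConnectedLayer(10)
          softmaxLayer
          classificationLayer];

% Short training run (assumed settings, not tuned).
options = trainingOptions('sgdm','MaxEpochs',2,'Verbose',false);
convnet = trainNetwork(XTrain,YTrain,layers,options);

% Evaluate on the held-out test images.
YPred = classify(convnet,XTest);
accuracy = mean(YPred == YTest);
```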

References

[1] Hubel, D. H., and Wiesel, T. N. “Receptive Fields of Single Neurones in the Cat’s

Striate Cortex.” Journal of Physiology. Vol. 148, pp. 574-591, 1959.

[2] Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge,

Massachusetts: The MIT Press, 2012.

See Also

trainNetwork | trainingOptions

More About

• “Deep Learning in MATLAB” on page 2-2

• “Specify Layers of Convolutional Neural Network” on page 2-36

• “Set Up Parameters and Train Convolutional Neural Network” on page 2-46

• “Get Started with Transfer Learning”

• “Create Simple Deep Learning Network for Classification”

• “Pretrained Convolutional Neural Networks” on page 2-21

List of Deep Learning Layers

To specify the architecture of a neural network with all layers connected sequentially,

create an array of layers directly. To specify the architecture of a network where layers

can have multiple inputs or outputs, use a LayerGraph object. Use the following

functions to create dierent layer types.

Input Layers

Function | Description
imageInputLayer | An image input layer inputs images to a network and applies data normalization.
sequenceInputLayer | A sequence input layer inputs sequence data to a network.

Learnable Layers

Function | Description
convolution2dLayer | A 2-D convolutional layer applies sliding filters to the input. The layer convolves the input by moving the filters along the input vertically and horizontally and computing the dot product of the weights and the input, and then adding a bias term.
transposedConv2dLayer | A transposed 2-D convolution layer upsamples feature maps.
fullyConnectedLayer | A fully connected layer multiplies the input by a weight matrix and then adds a bias vector.
lstmLayer | An LSTM layer is a recurrent neural network (RNN) layer that enables support for time series and sequence data in a network. The layer performs additive interactions, which can help improve gradient flow over long sequences during training. LSTM layers are best suited for learning long-term dependencies (dependencies from distant time steps).
bilstmLayer | A bidirectional LSTM (BiLSTM) layer is a recurrent neural network (RNN) layer. The layer learns bidirectional long-term dependencies between time steps. These dependencies can be useful when you want the network to learn from the complete time series at each time step.

Activation Layers

Function | Description
reluLayer | A ReLU layer performs a threshold operation on each element of the input, where any value less than zero is set to zero.
leakyReluLayer | A leaky ReLU layer performs a simple threshold operation, where any input value less than zero is multiplied by a fixed scalar.
clippedReluLayer | A clipped ReLU layer performs a simple threshold operation, where any input value less than zero is set to zero and any value above the clipping ceiling is set to that clipping ceiling.
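The three threshold operations can be written out element-wise as in the sketch below. It uses plain array arithmetic rather than the layer objects themselves, and the scale and ceiling values are assumed for illustration.

```matlab
x = [-2 -0.5 0 1 7];        % example inputs
scale = 0.01;               % assumed leaky ReLU scalar
ceiling = 5;                % assumed clipping ceiling

reluOut    = max(x,0);                      % negatives set to zero
leakyOut   = max(x,0) + scale*min(x,0);     % negatives multiplied by scale
clippedOut = min(max(x,0),ceiling);         % negatives to zero, large values capped

disp(reluOut);      % 0  0  0  1  7
disp(leakyOut);     % -0.02  -0.005  0  1  7
disp(clippedOut);   % 0  0  0  1  5
```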

Normalization and Dropout Layers

Function | Description
batchNormalizationLayer | A batch normalization layer normalizes each input channel across a mini-batch. The layer first normalizes the activations of each channel by subtracting the mini-batch mean and dividing by the mini-batch standard deviation. Then, the layer shifts the input by a learnable offset β and scales it by a learnable scale factor γ. Use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers, to speed up training of convolutional neural networks and reduce the sensitivity to network initialization.
crossChannelNormalizationLayer | A channel-wise local response (cross-channel) normalization layer carries out channel-wise normalization.
dropoutLayer | A dropout layer randomly sets input elements to zero with a given probability.

Pooling Layers

Function | Description
averagePooling2dLayer | An average pooling layer performs downsampling by dividing the input into rectangular pooling regions and computing the average value of each region.
maxPooling2dLayer | A max pooling layer performs downsampling by dividing the input into rectangular pooling regions and computing the maximum of each region.
maxUnpooling2dLayer | A max unpooling layer unpools the output of a max pooling layer.

Combination Layers

Function | Description
additionLayer | An addition layer adds multiple inputs element-wise. Specify the number of inputs to the layer when you create it. The inputs have names 'in1','in2',...,'inN', where N is the number of inputs. Use the input names when connecting or disconnecting the layer to other layers using connectLayers or disconnectLayers. All inputs to an addition layer must have the same dimension.
depthConcatenationLayer | A depth concatenation layer takes multiple inputs that have the same height and width and concatenates them along the third dimension (the channel dimension). The inputs have names 'in1','in2',...,'inN', where N is the number of inputs. Use the input names when connecting or disconnecting the layer to other layers using connectLayers or disconnectLayers.
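A small sketch of how the input names are used when wiring a combination layer into a layer graph. The architecture and layer names are hypothetical. The point is only that the second input of the addition layer is addressed as 'add/in2'.

```matlab
% Sequential part of the network. In this array, 'conv_2' feeds 'add/in1'.
layers = [
    imageInputLayer([28 28 1],'Name','input')
    convolution2dLayer(3,16,'Padding',1,'Name','conv_1')
    reluLayer('Name','relu_1')
    convolution2dLayer(3,16,'Padding',1,'Name','conv_2')
    additionLayer(2,'Name','add')          % two inputs: 'in1' and 'in2'
    fullyConnectedLayer(10,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')];

lgraph = layerGraph(layers);

% Skip connection: route the output of 'relu_1' to the second input of 'add'.
% Both inputs are 28-by-28-by-16, so the element-wise addition is valid.
lgraph = connectLayers(lgraph,'relu_1','add/in2');
```

Calling plot(lgraph) shows the resulting graph with the skip connection.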

Output Layers

Function | Description
softmaxLayer | A softmax layer applies a softmax function to the input.
classificationLayer | A classification output layer holds the name of the loss function the software uses for training the network for multiclass classification, the size of the output, and the class labels.
regressionLayer | A regression output layer holds the name of the loss function the software uses for training the network for regression, and the response names.

See Also

trainNetwork | trainingOptions

More About

• “Learn About Convolutional Neural Networks” on page 2-27

• “Set Up Parameters and Train Convolutional Neural Network” on page 2-46

• “Resume Training from a Checkpoint Network” on page 2-51

• “Create Simple Deep Learning Network for Classification”

• “Pretrained Convolutional Neural Networks” on page 2-21

• “Deep Learning in MATLAB” on page 2-2

Specify Layers of Convolutional Neural Network

In this section...

“Image Input Layer” on page 2-37

“Convolutional Layer” on page 2-37

“Batch Normalization Layer” on page 2-39

“ReLU Layer” on page 2-39

“Cross Channel Normalization (Local Response Normalization) Layer” on page 2-40

“Max- and Average-Pooling Layers” on page 2-40

“Dropout Layer” on page 2-41

“Fully Connected Layer” on page 2-41

“Output Layers” on page 2-42

The rst step of creating and training a new convolutional neural network (ConvNet) is to

dene the network architecture. This topic explains the details of ConvNet layers, and the

order they appear in a ConvNet.

The architecture of a ConvNet can vary depending on the types and numbers of layers

included. The types and number of layers included depend on the particular application

or data. For example, if you have categorical responses, you must have a softmax layer

and a classification layer, whereas if your response is continuous, you must have a

regression layer at the end of the network. A smaller network with only one or two

convolutional layers might be sufficient to learn from a small amount of grayscale image

data. On the other hand, for more complex data with millions of colored images, you

might need a more complicated network with multiple convolutional and fully connected

layers.

You can dene the layers of a convolutional neural network in MATLAB in an array

format, for example,

layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,16,'Padding',1)
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    convolution2dLayer(3,32,'Padding',1)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

layers is an array of Layer objects. layers becomes an input for the training function

trainNetwork.

Image Input Layer

The image input layer denes the size of the input images of a convolutional neural

network and contains the raw pixel values of the images. You can add an input layer using

the imageInputLayer function. Specify the image size using the inputSize argument.

The size of an image corresponds to the height, width, and the number of color channels

of that image. For example, for a grayscale image, the number of channels is 1, and for a

color image it is 3.

This layer can also perform data normalization by subtracting the mean image of the

training set from every input image.

Convolutional Layer

Filters and Stride: A convolutional layer consists of neurons that connect to subregions

of the input images or the outputs of the layer before it. A convolutional layer learns the

features localized by these regions while scanning through an image. You can specify the

size of these regions using the filterSize input argument when you create the layer

using the convolution2dLayer function.

For each region, the trainNetwork function computes a dot product of the weights and

the input, and then adds a bias term. A set of weights that are applied to a region in the

image is called a lter. The lter moves along the input image vertically and horizontally,

repeating the same computation for each region, that is, convolving the input. The step

size with which it moves is called a stride. You can specify this step size with the Stride

name-value pair argument. These local regions that the neurons connect to might overlap

depending on the filterSize and 'Stride' values.

The number of weights used for a lter is h*w*c, where h is the height, and w is the width

of the lter size, and c is the number of channels in the input (for example, if the input is

a color image, the number of color channels is 3). The number of lters determines the

number of channels in the output of a convolutional layer. Specify the number of lters

using the numFilters argument of convolution2dLayer.

Feature Maps: As a lter moves along the input, it uses the same set of weights and bias

for the convolution, forming a feature map. Hence, the number of feature maps a

convolutional layer has is equal to the number of lters (number of channels). Each

feature map has a dierent set of weights and a bias. So, the total number of parameters

in a convolutional layer is ((h*w*c + 1)*Number of Filters), where 1 is for the bias.

Zero Padding: You can also apply zero padding to input image borders vertically and

horizontally using the 'Padding' name-value pair argument. Padding is basically adding

rows or columns of zeros to the borders of an image input. It helps you control the output

size of the layer.

Output Size: The output height and width of a convolutional layer is (Input Size – Filter

Size + 2*Padding)/Stride + 1. This value must be an integer for the whole image to be

fully covered. If the combination of these parameters does not lead the image to be fully

covered, the software by default ignores the remaining part of the image along the right

and bottom edge in the convolution.

Number of Neurons: The product of the output height and width gives the total number

of neurons in a feature map, say Map Size. The total number of neurons (output size) in a

convolutional layer, then, is Map Size*Number of Filters.

For example, suppose that the input image is a 32-by-32-by-3 color image. For a

convolutional layer with 8 lters and a lter size of 5-by-5, the number of weights per

lter is 5*5*3 = 75, and the total number of parameters in the layer is (75+1) * 8 = 608.

If the stride is 2 in each direction and there are two pixels of zero padding, then each

feature map is 16-by-16. This is because (32 – 5 + 2*2)/2 + 1 = 16.5, and so some of the

outer-most zero padding to the right and bottom of the image is discarded. Finally, the

total number of neurons in the layer is 16*16*8 = 2048.
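The arithmetic in this example can be reproduced with a few lines of plain MATLAB. This is only a sketch of the formulas in the text, not toolbox code. The variable names are illustrative, and floor is used here as an assumed way to model the default behavior of ignoring the part of the image that is not fully covered.

```matlab
% Sizes for a convolutional layer, following the formulas in the text.
inputSize  = [32 32 3];   % height, width, and channels of the input image
filterSize = [5 5];       % filter height and width
numFilters = 8;
stride     = 2;           % same stride in both directions
padding    = 2;           % pixels of zero padding on each side

weightsPerFilter = filterSize(1) * filterSize(2) * inputSize(3);   % h*w*c
numParameters    = (weightsPerFilter + 1) * numFilters;            % +1 for the bias

% Assumption: floor models dropping the part of the image that the
% filter cannot fully cover (16.5 becomes 16).
mapSize    = floor((inputSize(1:2) - filterSize + 2*padding) / stride) + 1;
numNeurons = prod(mapSize) * numFilters;

fprintf('%d weights per filter, %d parameters\n', weightsPerFilter, numParameters);
fprintf('feature map %d-by-%d, %d neurons\n', mapSize(1), mapSize(2), numNeurons);
```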

Learning Parameters: You can also adjust the learning rates and regularization
parameters for this layer using the related name-value pair arguments while defining the
convolutional layer. If you choose not to adjust them, trainNetwork uses the global
training parameters defined by the trainingOptions function. For details on global and
layer training options, see “Set Up Parameters and Train Convolutional Neural Network”
on page 2-46.

A convolutional neural network can consist of one or multiple convolutional layers. The

number of convolutional layers depends on the amount and complexity of the data.


Batch Normalization Layer

Use batch normalization layers between convolutional layers and nonlinearities such as
ReLU layers to speed up network training and reduce the sensitivity to network
initialization. The layer first normalizes the activations of each channel by subtracting the
mini-batch mean and dividing by the mini-batch standard deviation. Then, the layer shifts
the input by an offset β and scales it by a scale factor γ. β and γ are themselves learnable
parameters that are updated during network training. Create a batch normalization layer
using batchNormalizationLayer.
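As a rough illustration of the computation described above, the following plain-MATLAB sketch normalizes each channel of a mini-batch and then applies a scale and an offset. It is not the toolbox implementation. The array layout (height-by-width-by-channels-by-observations), the example values of gamma and beta, and the small constant epsilon added for numerical stability are all assumptions made for this sketch.

```matlab
% Batch normalization of a mini-batch X of size H-by-W-by-C-by-N.
X = randn(4, 4, 3, 8) * 5 + 2;     % example activations
gamma   = [1 2 0.5];               % scale factor per channel (learnable)
beta    = [0 -1 3];                % offset per channel (learnable)
epsilon = 1e-5;                    % assumed small constant for stability

Z = zeros(size(X));
for c = 1:size(X, 3)
    xc     = X(:, :, c, :);
    mu     = mean(xc(:));          % mini-batch mean of channel c
    sigma2 = var(xc(:), 1);        % mini-batch variance of channel c
    xhat   = (xc - mu) / sqrt(sigma2 + epsilon);   % normalized activations
    Z(:, :, c, :) = gamma(c) * xhat + beta(c);     % scale, then shift
end
```

After this step, each channel of Z has a mean close to beta(c) and a standard deviation close to gamma(c), regardless of the scale of the original activations.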

Batch normalization layers normalize the activations and gradients propagating through a

neural network, making network training an easier optimization problem. To take full

advantage of this fact, you can try increasing the learning rate. Since the optimization

problem is easier, the parameter updates can be larger and the network can learn faster.

You can also try reducing the L2 and dropout regularization. With batch normalization
layers, the activations of a specific image are not deterministic, but instead depend on
which images happen to appear in the same mini-batch. To take full advantage of this
regularizing effect, try shuffling the training data before every training epoch. To specify
how often to shuffle the data during training, use the 'Shuffle' name-value pair
argument of trainingOptions.

ReLU Layer

Convolutional and batch normalization layers are usually followed by a nonlinear
activation function such as a rectified linear unit (ReLU), specified by a ReLU layer.
Create a ReLU layer using the reluLayer function. A ReLU layer performs a threshold
operation on each element, where any input value less than zero is set to zero, that is,

$$f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0. \end{cases}$$

The ReLU layer does not change the size of its input.

There are extensions of the standard ReLU layer that perform slightly different operations
and can improve performance for some applications. A leaky ReLU layer multiplies input
values less than zero by a fixed scalar, allowing negative inputs to “leak” into the output.
Use the leakyReluLayer function to create a leaky ReLU layer. A clipped ReLU layer
sets negative inputs to zero, but also sets input values above a clipping ceiling equal to
that clipping ceiling. This clipping prevents the output from becoming too large. Use the
clippedReluLayer function to create a clipped ReLU layer.
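The three element-wise operations can be compared side by side in plain MATLAB. This sketch only mirrors the descriptions above and does not call the layer functions. The leak factor and the clipping ceiling are hypothetical values chosen for illustration.

```matlab
x = [-2 -0.5 0 1.5 12];    % example inputs

scale   = 0.01;            % hypothetical leak factor for the leaky ReLU
ceiling = 10;              % hypothetical clipping ceiling for the clipped ReLU

relu        = max(0, x);                       % negative inputs become 0
leakyRelu   = max(0, x) + scale * min(0, x);   % negative inputs are scaled
clippedRelu = min(max(0, x), ceiling);         % also capped at the ceiling
```

All three results have the same size as x, which matches the statement that a ReLU layer does not change the size of its input.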


Cross Channel Normalization (Local Response Normalization) Layer

This layer performs a channel-wise local response normalization. It usually follows the

ReLU activation layer. Create this layer using the crossChannelNormalizationLayer

function. This layer replaces each element with a normalized value it obtains using the

elements from a certain number of neighboring channels (elements in the normalization

window). That is, for each element $x$ in the input, trainNetwork computes a normalized
value $x'$ using

$$x' = \frac{x}{\left(K + \dfrac{\alpha \cdot ss}{\text{windowChannelSize}}\right)^{\beta}},$$

where K, α, and β are the hyperparameters in the normalization, and ss is the sum of

squares of the elements in the normalization window [2]. You must specify the size of the

normalization window using the windowChannelSize argument of the

crossChannelNormalizationLayer function. You can also specify the

hyperparameters using the Alpha, Beta, and K name-value pair arguments.

The previous normalization formula differs slightly from the one presented in [2]. You
can obtain the equivalent formula by multiplying the alpha value by the
windowChannelSize.
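To make the normalization concrete, the sketch below applies it to the activations of five channels at a single spatial position: each element is divided by (K + alpha*ss/windowChannelSize) raised to the power beta. It is plain MATLAB, not the toolbox implementation. The hyperparameter values are examples, and the handling of the window (an odd window size, centered on the current channel and clipped at the first and last channel) is an assumption of this sketch.

```matlab
% Cross-channel normalization at one spatial position.
x = [1 2 3 4 5];                     % activations of 5 channels at one pixel
windowChannelSize = 3;               % assumed odd so the window is centered
K = 2; alpha = 1e-4; beta = 0.75;    % example hyperparameter values

halfWidth = (windowChannelSize - 1) / 2;
xNorm = zeros(size(x));
for c = 1:numel(x)
    lo = max(1, c - halfWidth);           % clip the window at the first channel
    hi = min(numel(x), c + halfWidth);    % clip the window at the last channel
    ss = sum(x(lo:hi) .^ 2);              % sum of squares in the window
    xNorm(c) = x(c) / (K + alpha * ss / windowChannelSize) ^ beta;
end
```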

Max- and Average-Pooling Layers

Max- and average-pooling layers follow the convolutional layers for down-sampling,

hence, reducing the number of connections to the following layers (usually a fully

connected layer). They do not perform any learning themselves, but reduce the number of

parameters to be learned in the following layers. They also help reduce overfitting. Create

these layers using the maxPooling2dLayer and averagePooling2dLayer functions.

A max-pooling layer returns the maximum values of rectangular regions of its input. The

size of the rectangular regions is determined by the poolSize argument of

maxPooling2dLayer. For example, if poolSize equals [2,3], then the layer returns the

maximum value in regions of height 2 and width 3.

Similarly, the average-pooling layer outputs the average values of rectangular regions of
its input. The size of the rectangular regions is determined by the poolSize argument of
averagePooling2dLayer. For example, if poolSize is [2,3], then the layer returns the
average value of regions of height 2 and width 3. The maxPooling2dLayer and
averagePooling2dLayer functions scan through the input horizontally and vertically in
step sizes you can specify using the 'Stride' name-value pair argument of either
function. If the poolSize is smaller than or equal to the Stride, then the pooling
regions do not overlap.

For nonoverlapping regions (poolSize and Stride are equal), if the input to the pooling

layer is n-by-n, and the pooling region size is h-by-h, then the pooling layer down-samples

the regions by h [6]. That is, the output of a max- or average-pooling layer for one channel

of a convolutional layer is n/h-by-n/h. For overlapping regions, the output of a pooling

layer is (Input Size – Pool Size + 2*Padding)/Stride + 1.
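The following plain-MATLAB sketch applies non-overlapping 2-by-2 max pooling to a small single-channel input and checks the output size against the formula above. It is an illustration written with loops for readability and does not use the pooling layer functions. The input values are made up.

```matlab
% Non-overlapping 2-by-2 max pooling of a 4-by-4 single-channel input.
X = [ 1  2  5  6
      3  4  7  8
      9 10 13 14
     11 12 15 16];
poolSize = 2; stride = 2; padding = 0;

% (Input Size - Pool Size + 2*Padding)/Stride + 1, which equals n/h here
outSize = floor((size(X, 1) - poolSize + 2*padding) / stride) + 1;

Y = zeros(outSize);
for r = 1:outSize
    for c = 1:outSize
        rows  = (r - 1)*stride + (1:poolSize);
        cols  = (c - 1)*stride + (1:poolSize);
        block = X(rows, cols);
        Y(r, c) = max(block(:));     % use mean(block(:)) for average pooling
    end
end
```

The 4-by-4 input is down-sampled by a factor of 2 in each direction, giving a 2-by-2 output.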

Dropout Layer

A dropout layer randomly sets the layer’s input elements to zero with a given probability.

Create a dropout layer using the dropoutLayer function.

Although the output of a dropout layer is equal to its input, this operation corresponds to
temporarily dropping a randomly chosen unit and all of its connections from the network
during training. So, for each new input element, trainNetwork randomly selects a
subset of neurons, forming a different layer architecture. These architectures use
common weights, but because the learning does not depend on specific neurons and
connections, the dropout layer might help prevent overfitting [7], [2]. Similar to max- or
average-pooling layers, no learning takes place in this layer.
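A minimal sketch of the masking step during training, in plain MATLAB, might look as follows. It only shows elements being set to zero with a given probability. Any rescaling of the remaining elements is left out because the text does not describe it, and the variable names are illustrative.

```matlab
X = reshape(1:12, 3, 4);       % example layer input
probability = 0.5;             % probability of setting an element to zero

dropMask = rand(size(X)) < probability;   % true where an element is dropped
Z = X;
Z(dropMask) = 0;               % dropped elements become zero, others pass through
```

Because a new mask is drawn on every call, repeated passes over the same input see different subsets of active elements.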

Fully Connected Layer

The convolutional (and down-sampling) layers are followed by one or more fully

connected layers. Create a fully connected layer using the fullyConnectedLayer

function.

As the name suggests, all neurons in a fully connected layer connect to all the neurons in

the previous layer. This layer combines all of the features (local information) learned by

the previous layers across the image to identify the larger patterns. For classification

problems, the last fully connected layer combines the features to classify the images. This

is the reason that the outputSize argument of the last fully connected layer of the

network is equal to the number of classes of the data set. For regression problems, the

output size must be equal to the number of response variables.


You can also adjust the learning rate and the regularization parameters for this layer

using the related name-value pair arguments when creating the fully connected layer. If

you choose not to adjust them, then trainNetwork uses the global training parameters

dened by the trainingOptions function. For details on global and layer training

options, see “Set Up Parameters and Train Convolutional Neural Network” on page 2-46.

Output Layers

Softmax and Classification Layers

For classification problems, a softmax layer and then a classification layer must follow the
final fully connected layer. You can create these layers using the softmaxLayer and
classificationLayer functions, respectively.

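The output unit activation function is the softmax function:

$$y_r(\mathbf{x}) = \frac{\exp\left(a_r(\mathbf{x})\right)}{\sum_{j=1}^{k} \exp\left(a_j(\mathbf{x})\right)},$$

where $0 \le y_r \le 1$ and $\sum_{j=1}^{k} y_j = 1$.

The softmax function is the output unit activation function after the last fully connected
layer for multi-class classification problems:

$$P(c_r \mid \mathbf{x}, \boldsymbol{\theta}) = \frac{P(\mathbf{x}, \boldsymbol{\theta} \mid c_r)\, P(c_r)}{\sum_{j=1}^{k} P(\mathbf{x}, \boldsymbol{\theta} \mid c_j)\, P(c_j)} = \frac{\exp\left(a_r(\mathbf{x}, \boldsymbol{\theta})\right)}{\sum_{j=1}^{k} \exp\left(a_j(\mathbf{x}, \boldsymbol{\theta})\right)},$$

where $0 \le P(c_r \mid \mathbf{x}, \boldsymbol{\theta}) \le 1$ and $\sum_{j=1}^{k} P(c_j \mid \mathbf{x}, \boldsymbol{\theta}) = 1$. Moreover,
$a_r = \ln\left(P(\mathbf{x}, \boldsymbol{\theta} \mid c_r)\, P(c_r)\right)$, $P(\mathbf{x}, \boldsymbol{\theta} \mid c_r)$ is the conditional probability of the
sample given class r, and $P(c_r)$ is the class prior probability.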


The softmax function is also known as the normalized exponential and can be considered

the multi-class generalization of the logistic sigmoid function [8].
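As a small numeric illustration of the normalized exponential, the plain-MATLAB sketch below converts a vector of activations into class probabilities. The activation values are hypothetical. Subtracting the maximum before exponentiating is a common numerical safeguard added here as an assumption; it does not change the result.

```matlab
a = [2.0 1.0 0.1];          % hypothetical activations for k = 3 classes

expA = exp(a - max(a));     % shifting by max(a) avoids overflow
y    = expA / sum(expA);    % each y(r) lies in [0,1] and the y(r) sum to 1

[~, predictedClass] = max(y);   % index of the most probable class
```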

A classification output layer must follow the softmax layer. In the classification output
layer, trainNetwork takes the values from the softmax function and assigns each input
to one of the k mutually exclusive classes using the cross entropy function for a 1-of-k
coding scheme [8]:

$$E(\boldsymbol{\theta}) = -\sum_{i=1}^{n} \sum_{j=1}^{k} t_{ij} \ln y_j(\mathbf{x}_i, \boldsymbol{\theta}),$$

where $t_{ij}$ is the indicator that the ith sample belongs to the jth class, and $\boldsymbol{\theta}$ is the
parameter vector. $y_j(\mathbf{x}_i, \boldsymbol{\theta})$ is the output for sample i, which in this case is the value
from the softmax function. That is, it is the probability that the network associates the ith
input with class j, $P(t_j = 1 \mid \mathbf{x}_i)$.
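The cross entropy for a 1-of-k coding scheme can be evaluated by hand for a tiny case: it is the negative sum, over samples and classes, of each target indicator times the logarithm of the corresponding softmax output. The sketch below uses two samples and three classes with made-up softmax outputs. It evaluates that sum directly in plain MATLAB; whether the software additionally averages over the mini-batch is not stated in the text, so no averaging is applied here.

```matlab
% Rows are samples i, columns are classes j.
Y = [0.7 0.2 0.1      % softmax outputs for sample 1
     0.1 0.8 0.1];    % softmax outputs for sample 2
T = [1 0 0            % 1-of-k targets: sample 1 belongs to class 1
     0 1 0];          % sample 2 belongs to class 2

E = -sum(sum(T .* log(Y)));   % only the true-class terms contribute
```

Here only the entries 0.7 and 0.8 contribute, so E equals -(log(0.7) + log(0.8)).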

Regression Layer

You can also use ConvNets for regression problems, where the target (output) variable is
continuous. In such cases, a regression output layer must follow the final fully connected
layer. You can create a regression layer using the regressionLayer function. The
default loss function for a regression layer is the mean squared error:

$$\text{MSE} = E(\boldsymbol{\theta}) = \sum_{i=1}^{n} \frac{(t_i - y_i)^2}{n},$$

where $t_i$ is the target output, and $y_i$ is the network’s prediction for the response variable
corresponding to observation i.
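For a handful of observations, the mean squared error between targets and predictions can be checked directly. The values below are hypothetical, and the variable name mseLoss is chosen only to avoid clashing with any existing function name.

```matlab
t = [1.0 2.0 3.0 4.0];     % hypothetical target outputs
y = [1.5 2.0 2.0 4.5];     % hypothetical network predictions

mseLoss = sum((t - y) .^ 2) / numel(t);   % mean of the squared errors
```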

References

[1] Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge,
Massachusetts: The MIT Press, 2012.

[2] Krizhevsky, A., I. Sutskever, and G. E. Hinton. "ImageNet Classification with Deep
Convolutional Neural Networks." Advances in Neural Information Processing
Systems. Vol 25, 2012.

[3] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel,
L.D., et al. "Handwritten Digit Recognition with a Back-propagation Network." In
Advances of Neural Information Processing Systems, 1990.

[4] LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based Learning Applied to
Document Recognition." Proceedings of the IEEE. Vol 86, pp. 2278–2324, 1998.

[5] Nair, V. and G. E. Hinton. "Rectified linear units improve restricted Boltzmann
machines." In Proc. 27th International Conference on Machine Learning, 2010.

[6] Nagi, J., F. Ducatelle, G. A. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J.
Schmidhuber, L. M. Gambardella. "Max-Pooling Convolutional Neural Networks
for Vision-based Hand Gesture Recognition." IEEE International Conference on
Signal and Image Processing Applications (ICSIPA2011), 2011.

[7] Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov. "Dropout: A
Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine
Learning Research. Vol. 15, pp. 1929-1958, 2014.

[8] Bishop, C. M. Pattern Recognition and Machine Learning. Springer, New York, NY,
2006.

[9] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network
training by reducing internal covariate shift." preprint, arXiv:1502.03167 (2015).

See Also

averagePooling2dLayer | batchNormalizationLayer | classificationLayer |

clippedReluLayer | convolution2dLayer | crossChannelNormalizationLayer |

dropoutLayer | fullyConnectedLayer | imageInputLayer | leakyReluLayer |

maxPooling2dLayer | regressionLayer | reluLayer | softmaxLayer |

trainNetwork | trainingOptions

More About

• “Learn About Convolutional Neural Networks” on page 2-27


• “Set Up Parameters and Train Convolutional Neural Network” on page 2-46

• “Resume Training from a Checkpoint Network” on page 2-51

• “Create Simple Deep Learning Network for Classification”

• “Pretrained Convolutional Neural Networks” on page 2-21

• “Deep Learning in MATLAB” on page 2-2


Set Up Parameters and Train Convolutional Neural Network

In this section...

“Specify Solver and Maximum Number of Epochs” on page 2-46

“Specify and Modify Learning Rate” on page 2-47

“Specify Validation Data” on page 2-48

“Select Hardware Resource” on page 2-48

“Save Checkpoint Networks and Resume Training” on page 2-49

“Set Up Parameters in Convolutional and Fully Connected Layers” on page 2-49

“Train Your Network” on page 2-50

After you define the layers of your neural network as described in “Specify Layers of
Convolutional Neural Network” on page 2-36, the next step is to set up the training
options for the network. Use the trainingOptions function to define the global training
parameters. To train a network, use the object returned by trainingOptions as an
input argument to the trainNetwork function. For example:

options = trainingOptions('adam');

trainedNet = trainNetwork(data,layers,options);

Layers with learnable parameters also have options for adjusting the learning

parameters. For more information, see “Set Up Parameters in Convolutional and Fully

Connected Layers” on page 2-49.

Specify Solver and Maximum Number of Epochs

trainNetwork can use different variants of stochastic gradient descent to train the

network. Specify the optimization algorithm by using the solverName argument of

trainingOptions. To minimize the loss, these algorithms update the network

parameters by taking small steps in the direction of the negative gradient of the loss

function.

The 'adam' (derived from adaptive moment estimation) solver is often a good optimizer
to try first. You can also try the 'rmsprop' (root mean square propagation) and 'sgdm'
(stochastic gradient descent with momentum) optimizers and see if this improves
training. Different solvers work better for different problems. For more information about
the different solvers, see “Stochastic Gradient Descent”.

The solvers update the parameters using a subset of the data each step. This subset is

called a mini-batch. You can specify the size of the mini-batch by using the

'MiniBatchSize' name-value pair argument of trainingOptions. Each parameter

update is called an iteration. A full pass through the entire data set is called an epoch.

You can specify the maximum number of epochs to train for by using the 'MaxEpochs'

name-value pair argument of trainingOptions. The default value is 30, but you can

choose a smaller number of epochs for small networks or for fine-tuning and transfer

learning, where most of the learning is already done.
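The relationship between mini-batches, iterations, and epochs can be worked through with simple arithmetic. The sketch below uses the 5000-observation digit data set from the checkpoint example later in this chapter. The mini-batch size of 128 is an assumed value that is not stated in this section, and the use of floor reflects the assumption that observations which do not fill a complete mini-batch are left out of each epoch.

```matlab
numObservations = 5000;   % size of the digit training set used in this chapter
miniBatchSize   = 128;    % assumed value of 'MiniBatchSize'
maxEpochs       = 30;     % default value of 'MaxEpochs' given in the text

% Assumption: an incomplete final mini-batch is dropped in each epoch.
iterationsPerEpoch = floor(numObservations / miniBatchSize);
totalIterations    = iterationsPerEpoch * maxEpochs;

fprintf('%d iterations per epoch, %d iterations in total\n', ...
    iterationsPerEpoch, totalIterations);
```

Under these assumptions the totals are consistent with the training log in “Resume Training from a Checkpoint Network”, which ends at iteration 1170 in epoch 30.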

By default, the software shuffles the data once before training. You can change this

setting by using the 'Shuffle' name-value pair argument.

Specify and Modify Learning Rate

You can specify the global learning rate by using the 'InitialLearnRate' name-value

pair argument of trainingOptions. By default, trainNetwork uses this value

throughout the entire training process. You can choose to modify the learning rate every

certain number of epochs by multiplying the learning rate with a factor. Instead of using a

small, fixed learning rate throughout the training process, you can choose a larger

learning rate in the beginning of training and gradually reduce this value during

optimization. Doing so can shorten the training time, while enabling smaller steps

towards the minimum of the loss as training progresses.

Tip If the mini-batch loss during training ever becomes NaN, then the learning rate is

likely too high. Try reducing the learning rate, for example by a factor of 3, and restarting

network training.

To gradually reduce the learning rate, use the 'LearnRateSchedule','piecewise'

name-value pair argument. Once you choose this option, trainNetwork multiplies the

initial learning rate by a factor of 0.1 every 10 epochs. You can specify the factor by which

to reduce the initial learning rate and the number of epochs by using the

'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments,

respectively.
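The effect of the piecewise schedule can be tabulated with plain MATLAB. This is a sketch of the rule described above rather than the toolbox code. The initial learning rate is a hypothetical value, and the sketch assumes that each drop takes effect at the first epoch after a full drop period.

```matlab
initialLearnRate = 0.01;   % hypothetical 'InitialLearnRate'
dropFactor       = 0.1;    % 'LearnRateDropFactor'
dropPeriod       = 10;     % 'LearnRateDropPeriod', in epochs

epochs = 1:30;
% Assumption: the rate drops at the start of epochs 11, 21, and so on.
learnRate = initialLearnRate * dropFactor .^ floor((epochs - 1) / dropPeriod);

fprintf('epoch 1: %g, epoch 11: %g, epoch 21: %g\n', ...
    learnRate(1), learnRate(11), learnRate(21));
```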


Specify Validation Data

To perform network validation during training, specify validation data using the

'ValidationData' name-value pair argument of trainingOptions. By default,

trainNetwork validates the network every 50 iterations by predicting the response of

the validation data and calculating the validation loss and accuracy (root mean square

error for regression networks). You can change the validation frequency using the

'ValidationFrequency' name-value pair argument. If your network has layers that

behave differently during prediction than during training (for example, dropout layers),

then the validation accuracy can be higher than the training (mini-batch) accuracy.

Network training automatically stops when the validation loss stops improving. By

default, if the validation loss is larger than or equal to the previously smallest loss five

times in a row, then network training stops. To change the number of times that the

validation loss is allowed to not decrease before training stops, use the

'ValidationPatience' name-value pair argument.
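The stopping rule can be traced on a made-up sequence of validation losses. The sketch below is an illustration of the rule as described in the text, not the trainNetwork implementation, and all names and values in it are hypothetical.

```matlab
validationLoss = [0.90 0.70 0.60 0.62 0.61 0.60 0.65 0.63 0.58];  % hypothetical
patience = 5;               % plays the role of 'ValidationPatience'

bestLoss = Inf;
numNoImprovement = 0;
stopAt = 0;                 % 0 means training was never stopped early
for k = 1:numel(validationLoss)
    if validationLoss(k) < bestLoss
        bestLoss = validationLoss(k);              % new smallest loss
        numNoImprovement = 0;
    else
        numNoImprovement = numNoImprovement + 1;   % larger than or equal to best
    end
    if numNoImprovement >= patience
        stopAt = k;
        break
    end
end
```

In this sequence the smallest loss is reached at the third validation, the next five validations do not improve on it, and training stops at the eighth validation, before the final value is ever evaluated.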

Performing validation at regular intervals during training helps you to determine if your
network is overfitting the training data. A common problem is that the network simply
"memorizes" the training data, rather than learning general features that enable it to
make accurate predictions for new data. To check if your network is overfitting, compare
the training loss and accuracy to the corresponding validation metrics. If the training loss
is significantly lower than the validation loss, or the training accuracy is significantly
higher than the validation accuracy, then your network is overfitting.

To reduce overfitting, you can try adding data augmentation. Use an
augmentedImageDatastore to perform random transformations on your input images.
This helps to prevent the network from memorizing the exact position and orientation of
objects. You can also try decreasing the network size by reducing the number of layers or
convolutional filters, increasing the L2 regularization using the 'L2Regularization'
name-value pair argument, or adding dropout layers.

Select Hardware Resource

If a GPU is available, then trainNetwork uses it for training, by default. Otherwise,

trainNetwork uses a CPU. Alternatively, you can specify the execution environment you

want using the 'ExecutionEnvironment' name-value pair argument. You can specify a

single CPU ('cpu'), a single GPU ('gpu'), multiple GPUs ('multi-gpu'), or a local

parallel pool or compute cluster ('parallel'). All options other than 'cpu' require

Parallel Computing Toolbox. Training on a GPU requires a CUDA enabled GPU with

compute capability 3.0 or higher.


Save Checkpoint Networks and Resume Training

Neural Network Toolbox enables you to save networks as .mat files after each epoch

during training. This periodic saving is especially useful when you have a large network

or a large data set, and training takes a long time. If the training is interrupted for some

reason, you can resume training from the last saved checkpoint network. If you want

trainNetwork to save checkpoint networks, then you must specify the name of the path

by using the 'CheckpointPath' name-value pair argument of trainingOptions. If

the path that you specify does not exist, then trainingOptions returns an error.

trainNetwork automatically assigns unique names to checkpoint network files, for

example, convnet_checkpoint__351__2016_11_09__12_04_23.mat. In this

example, 351 is the iteration number, 2016_11_09 is the date, and 12_04_23 is the time at

which trainNetwork saves the network. You can load a checkpoint network file by

double-clicking it or using the load command at the command line. For example:

load convnet_checkpoint__351__2016_11_09__12_04_23.mat

You can then resume training by using the layers of the network as an input argument to

trainNetwork. For example:

trainNetwork(XTrain,YTrain,net.Layers,options)

You must manually specify the training options and the input data, because the

checkpoint network does not contain this information. For an example, see “Resume

Training from a Checkpoint Network” on page 2-51.
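When a checkpoint folder contains many files, it can help to read the iteration number out of each file name, for example to pick the most recent checkpoint. The sketch below parses the example name from the text with a regular expression. The pattern is inferred from that single example and is an assumption, not a documented naming specification.

```matlab
fileName = 'convnet_checkpoint__351__2016_11_09__12_04_23.mat';

% Assumed pattern: prefix, iteration, date, and time separated by double underscores.
pattern = '^convnet_checkpoint__(\d+)__(\d+_\d+_\d+)__(\d+_\d+_\d+)\.mat$';
tokens  = regexp(fileName, pattern, 'tokens', 'once');

iteration = str2double(tokens{1});         % 351
dateStr   = strrep(tokens{2}, '_', '-');   % '2016-11-09'
timeStr   = strrep(tokens{3}, '_', ':');   % '12:04:23'
```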

Set Up Parameters in Convolutional and Fully Connected Layers

You can set the learning parameters to be different from the global values specified by

trainingOptions in layers with learnable parameters, such as convolutional and fully

connected layers. For example, to adjust the learning rate for the biases or weights, you

can specify a value for the BiasLearnRateFactor or WeightLearnRateFactor

properties of the layer, respectively. The trainNetwork function multiplies the learning

rate that you specify by using trainingOptions with these factors. Similarly, you can

also specify the L2 regularization factors for the weights and biases in these layers by

specifying the BiasL2Factor and WeightL2Factor properties, respectively.

trainNetwork then multiplies the L2 regularization factors that you specify by using

trainingOptions with these factors.
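The multiplication described above can be illustrated with a short calculation. All numeric values here are hypothetical, and the variable names simply mirror the property names mentioned in the text.

```matlab
globalLearnRate = 0.01;    % hypothetical global learning rate
globalL2        = 1e-4;    % hypothetical global L2 regularization factor

biasLearnRateFactor = 2;   % hypothetical layer property values
weightL2Factor      = 0.5;

% Effective values for this layer are the global values times the layer factors.
biasLearnRate = globalLearnRate * biasLearnRateFactor;
weightL2      = globalL2 * weightL2Factor;
```

With these values the biases of the layer learn twice as fast as the global rate, and the weights receive half of the global L2 regularization.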


Initialize Weights in Convolutional and Fully Connected Layers

By default, the initial values of the weights of the convolutional and fully connected layers

are randomly generated from a Gaussian distribution with mean 0 and standard deviation

0.01. The initial biases are by default equal to 0. You can manually change the initial

weights and biases after you create the layers. For examples, see “Specify Initial Weights

and Biases in Convolutional Layer” and “Specify Initial Weights and Biases in Fully

Connected Layer”.
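The default initialization can be mimicked in plain MATLAB to see what such initial values look like. This sketch generates arrays for a convolutional layer with 20 filters of size 5-by-5 on a single-channel input. The array layout (height-by-width-by-channels-by-filters for the weights, 1-by-1-by-filters for the biases) is an assumption of this sketch, and it does not assign the values to a layer.

```matlab
filterSize  = [5 5];
numChannels = 1;
numFilters  = 20;

% Gaussian weights with mean 0 and standard deviation 0.01, zero biases.
weights = 0.01 * randn([filterSize numChannels numFilters]);
bias    = zeros(1, 1, numFilters);

fprintf('weights: %s, sample std %.4f\n', mat2str(size(weights)), std(weights(:)));
```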

Train Your Network

After you specify the layers of your network and the training parameters, you can train

the network using the training data. The data, layers, and training options are all input

arguments of the trainNetwork function, as in this example.

layers = [imageInputLayer([28 28 1])

convolution2dLayer(5,20)

reluLayer

maxPooling2dLayer(2,'Stride',2)

fullyConnectedLayer(10)

softmaxLayer

classificationLayer];

options = trainingOptions('adam');

convnet = trainNetwork(data,layers,options);

Training data can be an array, a table, or an ImageDatastore object. For more

information, see the trainNetwork function reference page.
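To connect this example with the size formulas from “Specify Layers of Convolutional Neural Network”, the sketch below follows a 28-by-28-by-1 image through the layers and counts the learnable parameters. It is plain arithmetic, not toolbox output. A stride of 1 and no padding are assumed for the convolutional layer because the example does not set them.

```matlab
inputSize = [28 28 1];

% convolution2dLayer(5,20): 5-by-5 filters, 20 filters.
% Assumption: stride 1 and no padding, since the example sets neither.
convMap    = (inputSize(1:2) - 5) / 1 + 1;       % 24-by-24 feature maps
convParams = (5*5*inputSize(3) + 1) * 20;        % weights plus one bias per filter

% maxPooling2dLayer(2,'Stride',2): non-overlapping 2-by-2 regions.
poolMap = convMap / 2;                           % 12-by-12 per channel

% fullyConnectedLayer(10): every output connects to every input element.
fcInputs = prod(poolMap) * 20;
fcParams = fcInputs * 10 + 10;                   % weights plus one bias per output

fprintf('conv: %d parameters, fully connected: %d parameters\n', ...
    convParams, fcParams);
```

Most of the parameters in this small network sit in the fully connected layer, which is why pooling before that layer reduces the number of parameters to be learned.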

See Also

Convolution2dLayer | FullyConnectedLayer | trainNetwork | trainingOptions

More About

• “Learn About Convolutional Neural Networks” on page 2-27

• “Specify Layers of Convolutional Neural Network” on page 2-36

• “Create Simple Deep Learning Network for Classification”

• “Resume Training from a Checkpoint Network” on page 2-51


Resume Training from a Checkpoint Network

This example shows how to save checkpoint networks while training a convolutional

neural network and resume training from a previously saved network.

Load Sample Data

Load the sample data as a 4-D array.

[XTrain,TTrain] = digitTrain4DArrayData;

Display the size of XTrain.

size(XTrain)

ans =

28 28 1 5000

digitTrain4DArrayData loads the digit training set as 4-D array data. XTrain is a 28-

by-28-by-1-by-5000 array, where 28 is the height and 28 is the width of the images. 1 is

the number of channels and 5000 is the number of synthetic images of handwritten digits.

TTrain is a categorical vector containing the labels for each observation.

Display some of the images in XTrain.

figure;

perm = randperm(5000,20);

for i = 1:20

subplot(4,5,i);

imshow(XTrain(:,:,:,perm(i)));

end


Dene Layers of Network

Dene the layers of the convolutional neural network.

layers = [imageInputLayer([28 28 1])

convolution2dLayer(5,20)

reluLayer()

maxPooling2dLayer(2,'Stride',2)

fullyConnectedLayer(10)

softmaxLayer()

classificationLayer()];


Specify Training Options with Checkpoint Path and Train Network

Set the options to default settings for the stochastic gradient descent with momentum

and specify the path for saving the checkpoint networks.

options = trainingOptions('sgdm','CheckpointPath','C:\Temp\cnncheckpoint');

Train the network.

convnet = trainNetwork(XTrain,TTrain,layers,options);

Training on single GPU.

Initializing image normalization.

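|=========================================================================================|
| Epoch | Iteration | Time Elapsed (seconds) | Mini-batch Loss | Mini-batch Accuracy | Base Learning Rate |
|=========================================================================================|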

| 1 | 1 | 0.02 | 2.3022 | 7.81% | 0.0100 |

| 2 | 50 | 0.63 | 2.2684 | 31.25% | 0.0100 |

| 3 | 100 | 1.22 | 1.5395 | 53.13% | 0.0100 |

| 4 | 150 | 1.81 | 1.4287 | 50.00% | 0.0100 |

| 6 | 200 | 2.43 | 1.1682 | 60.94% | 0.0100 |

| 7 | 250 | 3.04 | 0.7370 | 77.34% | 0.0100 |

| 8 | 300 | 3.65 | 0.8038 | 71.09% | 0.0100 |

| 9 | 350 | 4.26 | 0.6766 | 78.91% | 0.0100 |

| 11 | 400 | 4.88 | 0.4824 | 89.06% | 0.0100 |

| 12 | 450 | 5.50 | 0.3670 | 90.63% | 0.0100 |

| 13 | 500 | 6.11 | 0.3582 | 92.19% | 0.0100 |

| 15 | 550 | 6.75 | 0.2501 | 93.75% | 0.0100 |

| 16 | 600 | 7.37 | 0.2662 | 93.75% | 0.0100 |

| 17 | 650 | 7.97 | 0.1974 | 96.09% | 0.0100 |

| 18 | 700 | 8.59 | 0.2140 | 97.66% | 0.0100 |

| 20 | 750 | 9.21 | 0.1402 | 99.22% | 0.0100 |

| 21 | 800 | 9.82 | 0.1324 | 97.66% | 0.0100 |

| 22 | 850 | 10.43 | 0.1373 | 96.88% | 0.0100 |

| 24 | 900 | 11.07 | 0.0913 | 100.00% | 0.0100 |

| 25 | 950 | 11.70 | 0.0935 | 98.44% | 0.0100 |

| 26 | 1000 | 12.31 | 0.0647 | 100.00% | 0.0100 |

| 27 | 1050 | 12.92 | 0.0678 | 99.22% | 0.0100 |

| 29 | 1100 | 13.55 | 0.1053 | 98.44% | 0.0100 |

| 30 | 1150 | 14.17 | 0.0714 | 99.22% | 0.0100 |

| 30 | 1170 | 14.41 | 0.0497 | 100.00% | 0.0100 |

|=========================================================================================|


trainNetwork uses a GPU if one is available. If no GPU is available, then it
uses the CPU. Note: For the available hardware options, see the trainingOptions function
page.

Suppose the training was interrupted after the 8th epoch and did not complete. Rather

than restarting the training from the beginning, you can load the last checkpoint network

and resume training from that point.

Load Checkpoint Network and Resume Training

Load the checkpoint network.

load convnet_checkpoint__351__2016_11_09__12_04_23.mat

trainNetwork automatically assigns unique names to the checkpoint network files. In
this file name, 351 is the iteration number, 2016_11_09 is the date, and
12_04_23 is the time at which trainNetwork saved the network.

If you don't have the data and the training options in your working directory, you must

manually load and/or specify them before you can resume training.

Specify the training options to reduce the maximum number of epochs.

options = trainingOptions('sgdm','MaxEpochs',20, ...

'CheckpointPath','C:\Temp\cnncheckpoint');

You can also adjust other training options, such as initial learning rate.

Resume the training using the layers of the checkpoint network you loaded with the new

training options. The name of the network, by default, is net.

convnet2 = trainNetwork(XTrain,TTrain,net.Layers,options)

Training on single GPU.

Initializing image normalization.

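|=========================================================================================|
| Epoch | Iteration | Time Elapsed (seconds) | Mini-batch Loss | Mini-batch Accuracy | Base Learning Rate |
|=========================================================================================|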

| 1 | 1 | 0.02 | 0.5210 | 84.38% | 0.0100 |

| 2 | 50 | 0.62 | 0.4168 | 87.50% | 0.0100 |

| 3 | 100 | 1.23 | 0.4532 | 87.50% | 0.0100 |

| 4 | 150 | 1.83 | 0.3424 | 92.97% | 0.0100 |

| 6 | 200 | 2.45 | 0.3177 | 95.31% | 0.0100 |


| 7 | 250 | 3.06 | 0.2091 | 94.53% | 0.0100 |

| 8 | 300 | 3.67 | 0.1829 | 96.88% | 0.0100 |

| 9 | 350 | 4.27 | 0.1531 | 97.66% | 0.0100 |

| 11 | 400 | 4.91 | 0.1482 | 96.88% | 0.0100 |

| 12 | 450 | 5.58 | 0.1293 | 97.66% | 0.0100 |

| 13 | 500 | 6.31 | 0.1134 | 98.44% | 0.0100 |

| 15 | 550 | 6.94 | 0.1006 | 100.00% | 0.0100 |

| 16 | 600 | 7.55 | 0.0909 | 98.44% | 0.0100 |

| 17 | 650 | 8.16 | 0.0567 | 100.00% | 0.0100 |

| 18 | 700 | 8.76 | 0.0654 | 100.00% | 0.0100 |

| 20 | 750 | 9.41 | 0.0654 | 100.00% | 0.0100 |

| 20 | 780 | 9.77 | 0.0606 | 99.22% | 0.0100 |

|=========================================================================================|

convnet2 =

SeriesNetwork with properties:

Layers: [7×1 nnet.cnn.layer.Layer]

See Also

trainNetwork | trainingOptions

Related Examples

• “Create Simple Deep Learning Network for Classification”

More About

• “Learn About Convolutional Neural Networks” on page 2-27

• “Specify Layers of Convolutional Neural Network” on page 2-36

• “Set Up Parameters and Train Convolutional Neural Network” on page 2-46


Dene Custom Deep Learning Layers

Tip This topic explains how to dene custom deep learning layers for your problems. For

a list of built-in layers in Neural Network Toolbox, see “List of Deep Learning Layers” on

page 2-31.

This topic explains the architecture of deep learning layers and how to dene custom

layers to use for your problems.

| Type | Description |
|------|-------------|
| Layer | Define a custom deep learning layer and specify optional learnable parameters, forward functions, and a backward function. For an example showing how to define a custom layer with learnable parameters, see “Define a Custom Deep Learning Layer with Learnable Parameters” on page 2-73. |
| Classification Output Layer | Define a custom classification output layer and specify a loss function. For an example showing how to define a custom classification output layer and specify a loss function, see “Define a Custom Classification Output Layer” on page 2-97. |
| Regression Output Layer | Define a custom regression output layer and specify a loss function. For an example showing how to define a custom regression output layer and specify a loss function, see “Define a Custom Regression Output Layer” on page 2-87. |


Layer Templates

You can use the following templates to define new layers.

Layer Template

This template outlines the structure of a layer with learnable parameters. If the layer
does not have learnable parameters, then you can omit the properties (Learnable)
section. For an example showing how to define a layer with learnable parameters, see
“Define a Custom Deep Learning Layer with Learnable Parameters” on page 2-73.

classdef myLayer < nnet.layer.Layer

    properties
        % (Optional) Layer properties

        % Layer properties go here
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters

        % Layer learnable parameters go here
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer
            % This function must have the same name as the layer

            % Layer constructor function goes here
        end

        function Z = predict(layer, X)
            % Forward input data through the layer at prediction time and
            % output the result
            %
            % Inputs:
            %     layer - Layer to forward propagate through
            %     X     - Input data
            % Output:
            %     Z     - Output of layer forward function

            % Layer forward function for prediction goes here
        end

        function [Z, memory] = forward(layer, X)
            % (Optional) Forward input data through the layer at training
            % time and output the result and a memory value
            %
            % Inputs:
            %     layer  - Layer to forward propagate through
            %     X      - Input data
            % Output:
            %     Z      - Output of layer forward function
            %     memory - Memory value which can be used for
            %              backward propagation

            % Layer forward function for training goes here
        end

        function [dLdX, dLdW1, …, dLdWn] = backward(layer, X, Z, dLdZ, memory)
            % Backward propagate the derivative of the loss function through
            % the layer
            %
            % Inputs:
            %     layer  - Layer to backward propagate through
            %     X      - Input data
            %     Z      - Output of layer forward function
            %     dLdZ   - Gradient propagated from the deeper layer
            %     memory - Memory value which can be used in
            %              backward propagation
            % Output:
            %     dLdX              - Derivative of the loss with respect to the
            %                         input data
            %     dLdW1, ..., dLdWn - Derivatives of the loss with respect to each
            %                         learnable parameter

            % Layer backward function goes here
        end
    end
end

Classication Output Layer Template

This template outlines the structure of a classication output layer with a loss function.

For an example showing how to dene a classication output layer and specify a loss

function, see “Dene a Custom Classication Output Layer” on page 2-97.

classdef myClassificationLayer < nnet.layer.ClassificationLayer

    properties
        % (Optional) Layer properties

        % Layer properties go here
    end

    methods
        function layer = myClassificationLayer()
            % (Optional) Create a myClassificationLayer

            % Layer constructor function goes here
        end

        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the
            % training targets T
            %
            % Inputs:
            %     layer - Output layer
            %     Y     - Predictions made by network
            %     T     - Training targets
            % Output:
            %     loss  - Loss between Y and T

            % Layer forward loss function goes here
        end

        function dLdY = backwardLoss(layer, Y, T)
            % Backward propagate the derivative of the loss function
            %
            % Inputs:
            %     layer - Output layer
            %     Y     - Predictions made by network
            %     T     - Training targets
            % Output:
            %     dLdY  - Derivative of the loss with respect to the predictions Y

            % Layer backward loss function goes here
        end
    end
end

Regression Output Layer Template

This template outlines the structure of a regression output layer with a loss function. For an example showing how to define a regression output layer and specify a loss function, see “Define a Custom Regression Output Layer” on page 2-87.

classdef myRegressionLayer < nnet.layer.RegressionLayer

    properties
        % (Optional) Layer properties

        % Layer properties go here
    end

    methods
        function layer = myRegressionLayer()
            % (Optional) Create a myRegressionLayer

            % Layer constructor function goes here
        end

        function loss = forwardLoss(layer, Y, T)
            % Return the loss between the predictions Y and the
            % training targets T
            %
            % Inputs:
            %     layer - Output layer
            %     Y     - Predictions made by network
            %     T     - Training targets
            % Output:
            %     loss  - Loss between Y and T

            % Layer forward loss function goes here
        end

        function dLdY = backwardLoss(layer, Y, T)
            % Backward propagate the derivative of the loss function
            %
            % Inputs:
            %     layer - Output layer
            %     Y     - Predictions made by network
            %     T     - Training targets
            % Output:
            %     dLdY  - Derivative of the loss with respect to the predictions Y

            % Layer backward loss function goes here
        end
    end
end

Layer Architecture

A layer has two main components: the forward pass, and the backward pass.

During the forward pass of a network, the layer takes the output x of the previous layer,

applies some function, and then outputs (forward propagates) the result z to the next

layer.

At the end of a forward pass, the network calculates the loss L between the predictions Y

and the true target T.

During the backward pass of a network, each layer takes the derivatives of the loss L with respect to z, computes the derivatives of the loss L with respect to x, and then outputs (backward propagates) the results to the previous layer. If the layer has learnable parameters, then the layer also computes the derivatives of the loss L with respect to the layer weights (learnable parameters) W. The layer uses these derivatives to update the learnable parameters.
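The forward pass, backward pass, and parameter update described above can be sketched with plain arrays. This is only an illustration under stated assumptions: the one-parameter layer z = w*x, the squared-error loss, and the plain gradient-descent step with learnRate are hypothetical choices made for the example, not the update rule the toolbox applies.

```matlab
% Hypothetical one-parameter layer z = w*x with a squared-error loss.
x = [1; 2; 3];        % output of the previous layer
t = [2; 4; 6];        % training targets
w = 0.5;              % learnable parameter W
learnRate = 0.1;      % assumed step size, for illustration only

% Forward pass: apply the layer function, then compute the loss.
z = w .* x;
L = 0.5 * sum((z - t).^2);

% Backward pass: apply the chain rule.
dLdz = z - t;             % derivative of L with respect to z
dLdx = w .* dLdz;         % result propagated to the previous layer
dLdw = sum(x .* dLdz);    % derivative of L with respect to the parameter

% Assumed plain gradient-descent update of the learnable parameter.
wNew = w - learnRate * dLdw;
```

Here dLdx has the same size as x, and dLdw has the same size as w, which mirrors the size requirements on the outputs of a layer backward function.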

The following gure describes the ow of data through a deep neural network and

highlights the data ow through the layer.


Layer Properties

Declare the layer properties in the properties section of the class definition.

By default, user-defined layers have three properties:

• Name – Name of the layer, specified as a character vector. Use the Name property to identify and index layers in a network. If you do not set the layer name, then the software automatically assigns one at training time.

• Description – One-line description of the layer, specified as a character vector. This description appears when the layer is displayed in a Layer array. The default value is the layer class name.

• Type – Type of the layer, specified as a character vector. The value of Type appears when the layer is displayed in a Layer array. The default value is the layer class name.

If the layer has no other properties, then you can omit the properties section.

Learnable Parameters

Declare the layer learnable parameters in the properties (Learnable) section of the class definition. If the layer has no learnable parameters, then you can omit the properties (Learnable) section.

Optionally, you can specify the learn rate factor and the L2 regularization factor of the learnable parameters. By default, each learnable parameter has its learn rate factor and L2 regularization factor set to 1.

You can set and get the learn rate factors and L2 regularization factors using the following functions. You can use these functions for both built-in layers and user-defined layers.

Dene Custom Deep Learning Layers

2-61

Function              Description
setLearnRateFactor    Set the learn rate factor of a learnable parameter.
setL2Factor           Set the L2 regularization factor of a learnable parameter.
getLearnRateFactor    Get the learn rate factor of a learnable parameter.
getL2Factor           Get the L2 regularization factor of a learnable parameter.

To specify the learn rate factor and the L2 regularization factor of a learnable parameter, use the syntaxes layer = setLearnRateFactor(layer,'MyParameterName',value) and layer = setL2Factor(layer,'MyParameterName',value), respectively.

To get the value of the learn rate factor and the L2 regularization factor of a learnable parameter, use the syntaxes getLearnRateFactor(layer,'MyParameterName') and getL2Factor(layer,'MyParameterName'), respectively.

For example, to set the learn rate factor of the learnable parameter named Alpha to 0.1, use

layer = setLearnRateFactor(layer,'Alpha',0.1);
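As a further sketch, the set and get functions can be used together on a built-in layer. This assumes that Neural Network Toolbox is installed and that the learnable parameter of convolution2dLayer is named 'Weights'; the factor values 2 and 0.5 are arbitrary choices for the illustration.

```matlab
% Create a built-in layer (requires Neural Network Toolbox).
layer = convolution2dLayer(5,20);

% Set the factors of the parameter assumed to be named 'Weights'.
layer = setLearnRateFactor(layer,'Weights',2);
layer = setL2Factor(layer,'Weights',0.5);

% Read the factors back.
lrFactor = getLearnRateFactor(layer,'Weights');
l2Factor = getL2Factor(layer,'Weights');
```

Because the set functions return the modified layer, assign their output back to the layer variable, as in the syntaxes above.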

Forward Functions

A layer uses two functions for a forward pass: predict and forward. If the forward pass

is at prediction time, then the layer uses the predict function. If the forward pass is at

training time, then the layer uses the forward function. The forward function has an

additional output argument memory which you can use during backward propagation. You

must return a value for memory.

If you do not require two different functions for prediction time and training time, then you do not need to create the forward function. By default, the layer uses predict at training time.

The syntax for predict is Z = predict(layer,X), where X is the input data and Z is

the output of the layer forward function.

The syntax for forward is [Z,memory] = forward(layer,X), where X is the input data, Z is the output of the layer forward function, and memory is the memory value to use in backward propagation. memory is a required output argument, so forward must always assign it a value. If the layer does not require a memory value, then return an empty value [].
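The difference between predict and forward, and the role of memory, can be sketched with plain array operations. The dropout-style behavior, the keepProb value, and the deterministic mask below are assumptions made for the illustration; they are not the implementation of any built-in layer.

```matlab
% Hypothetical dropout-style layer, written as plain array operations.
X = reshape(1:24, [2 2 3 2]);    % h-by-w-by-c-by-N input
keepProb = 0.5;                  % assumed probability of keeping an element

% predict (prediction time): pass the input through unchanged.
Zpredict = X;

% forward (training time): zero some elements and rescale the rest.
mask = mod(X, 2) == 0;           % deterministic stand-in for a random mask
Ztrain = X .* mask ./ keepProb;
memory = mask;                   % value saved for the backward pass

% backward: reuse the mask stored in memory instead of recomputing it.
dLdZ = ones(size(Ztrain));       % assumed gradient from the next layer
dLdX = dLdZ .* memory ./ keepProb;
```

A layer whose training-time and prediction-time behavior are identical would not need the forward step at all, and a layer with nothing to save would return [] for memory.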

The dimensions of X depend on the output of the previous layer. Similarly, the output Z

must have the appropriate shape for the next layer.

Built-in layers output 4-D arrays with size h-by-w-by-c-by-N, except for LSTM layers and

sequence input layers, which output 3-D arrays of size D-by-N-by-S.

Fully connected, ReLU, dropout, and softmax layers also accept 3-D inputs. When these layers receive 3-D inputs, they output 3-D arrays of size D-by-N-by-S.

These dimensions correspond to the following:

• h – Height of the output

• w – Width of the output

• c – Number of channels in the output

• N – Number of observations (mini-batch size)

• D – Feature dimension of sequence

• S – Sequence length
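The two array layouts can be illustrated with zero-filled arrays. The sizes below are arbitrary assumptions, and the elementwise max stands in for any layer function that preserves the input size.

```matlab
% Image-style data: h-by-w-by-c-by-N (sizes are assumed for illustration).
h = 24; w = 24; c = 20; N = 16;
X4 = zeros(h, w, c, N);
Z4 = max(0, X4);            % ReLU-style elementwise operation keeps the size

% Sequence-style data: D-by-N-by-S.
D = 12; S = 30;
X3 = zeros(D, N, S);
Z3 = max(0, X3);

% The observation (mini-batch) dimension differs between the two layouts.
numObs4 = size(Z4, 4);      % fourth dimension for 4-D arrays
numObs3 = size(Z3, 2);      % second dimension for 3-D arrays
```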

Backward Function

The layer uses one function for a backward pass: backward. The backward function computes the derivatives of the loss with respect to the input data and then outputs (backward propagates) the results to the previous layer. If the layer has learnable parameters, then backward also computes the derivatives of the loss with respect to the layer weights (learnable parameters). During the backward pass, the layer automatically updates the learnable parameters using these derivatives.

To calculate the derivatives of the loss, you can use the chain rule:

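$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial x}$$

$$\frac{\partial L}{\partial W_i} = \sum_j \frac{\partial L}{\partial z_j}\,\frac{\partial z_j}{\partial W_i}$$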

Dene Custom Deep Learning Layers

2-63

The syntax for backward is [dLdX,dLdW1,…,dLdWn] =

backward(layer,X,Z,dLdZ,memory). For the inputs, X is the layer input data, Z is the

output of forward, dLdZ is the gradient backward propagated from the next layer, and

memory is the memory output of forward. For the outputs, dLdX is the derivative of the

loss with respect to the layer input data, and dLdW1,…,dLdWn are the derivatives of the

loss with respect to the learnable parameters.

The dimensions of X and Z are the same as in the forward functions. The dimensions of

dLdZ are the same as the dimensions of Z.

The dimensions and data type of dLdX are the same as the dimensions and data type of X.

The dimensions and data types of dLdW1,…,dLdWn are the same as the dimensions and data types of W1,…,Wn, respectively, where Wi is the ith learnable parameter.

During the backward pass, the layer automatically updates the learnable parameters

using the derivatives dLdW1,…,dLdWn.
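The size requirements on dLdX and on the parameter derivatives can be checked on a small numeric case. The sketch below assumes a PReLU-style function Z = max(0,X) + Alpha.*min(0,X) with one Alpha per channel; the input values, the Alpha values, and the all-ones dLdZ are made up for the illustration, and the code uses plain arrays rather than a layer class.

```matlab
% PReLU-style function with one learnable Alpha per channel (assumed form).
X = cat(3, [1 -2; -3 4], [-1 2; 3 -4]);   % h-by-w-by-c input, N = 1
Alpha = cat(3, 0.1, 0.2);                 % 1-by-1-by-c learnable parameter
dLdZ = ones(size(X));                     % assumed gradient from the next layer

% Forward function.
Z = max(0, X) + Alpha .* min(0, X);

% Derivative of the loss with respect to the input (chain rule).
dZdX = Alpha .* (X <= 0) + (X > 0);
dLdX = dZdX .* dLdZ;

% Derivative of the loss with respect to Alpha: sum over h, w, and N
% so that the result has the same size as Alpha.
dLdAlpha = min(0, X) .* dLdZ;
dLdAlpha = sum(sum(dLdAlpha, 1), 2);
dLdAlpha = sum(dLdAlpha, 4);
```

Summing over every dimension that Alpha does not span is what makes dLdAlpha match the size of the parameter.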

If you want to include a user-dened layer after a built-in layer in a network, then the

layer functions must accept inputs X which are the outputs of the previous layer, and

backward propagate dLdX with the same size as X. If you want to include a user-dened

layer before a built-in layer, then the forward functions must output arrays Z with the size

expected by the next layer. Similarly, backward must accept inputs dLdZ with the same

size as Z.

GPU Compatibility

For GPU compatibility, the layer functions must support inputs and return outputs of type gpuArray. Any other functions used by the layer must do the same. Many MATLAB built-in functions support gpuArray input arguments. If you call any of these functions with at least one gpuArray as an input, then the function executes on the GPU and returns a gpuArray. For a list of functions that execute on a GPU, see “Run Built-In Functions on a GPU” (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a CUDA-enabled NVIDIA GPU with compute capability 3.0 or higher. For more information on working with GPUs in MATLAB, see “GPU Computing in MATLAB” (Parallel Computing Toolbox).

Check Validity of Layer

If you create a custom deep learning layer, then you can use the checkLayer function to check that the layer is valid. The function checks layers for validity, GPU compatibility, and correctly defined gradients. To check that a layer is valid, run the following command:

checkLayer(layer,validInputSize,'ObservationDimension',dim)

where layer is an instance of the layer, validInputSize is a vector specifying the valid input size to the layer, and dim specifies the dimension of the observations in the layer input data.

For more information, see “Check Custom Layer Validity” on page 2-107.

Check Validity of Layer Using checkLayer

Check the layer validity of the example layer examplePreluLayer.

To use the layer examplePreluLayer, add the example folder to the path.

exampleFolder = genpath(fullfile(matlabroot,'examples','nnet'));

addpath(exampleFolder)

Create an instance of the layer and check its validity using checkLayer.

Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.

Specify the typical size of the input of an observation and set

'ObservationDimension' to 4.

layer = examplePreluLayer(20);

validInputSize = [24 24 20];

checkLayer(layer,validInputSize,'ObservationDimension',4)

Running nnet.checklayer.TestCase

..........

Done nnet.checklayer.TestCase

__________

Test Summary:

21 Passed, 0 Failed, 0 Incomplete.

Time elapsed: 6.863 seconds.

Dene Custom Deep Learning Layers

2-65

Here, the function does not detect any issues with the layer.

To remove the example folder, use rmpath.

rmpath(exampleFolder)

Include Layer in Network

You can use your layer in the same way as any other layer in Neural Network Toolbox.

To use the example layer examplePreluLayer, add the example folder to the path.

exampleFolder = genpath(fullfile(matlabroot,'examples','nnet'));

addpath(exampleFolder)

Create a layer array which includes the example layer examplePreluLayer.

layers = [ ...

imageInputLayer([28 28 1])

convolution2dLayer(5,20)

batchNormalizationLayer

examplePreluLayer(20)

fullyConnectedLayer(10)

softmaxLayer

classificationLayer];

Remove the example folder from the path using rmpath.

rmpath(exampleFolder)

Output Layer Architecture

At the end of a forward pass at training time, an output layer takes the predictions

(outputs) y of the previous layer, and calculates the loss L between these predictions and

the training targets. The output layer computes the derivatives of the loss L with respect

to the predictions y and outputs (backward propagates) results to the previous layer.

The following gure describes the ow of data through a convolutional neural network

and an output layer.


Loss Functions

The output layer uses two functions to compute the loss and the derivatives:

forwardLoss and backwardLoss. The forwardLoss function computes the loss L. The

backwardLoss function computes the derivatives of the loss with respect to the

predictions.

The following table describes the input arguments of forwardLoss and backwardLoss.

Input Argument    Description
layer             Output layer object
Y                 Predictions made by the network. These predictions are the
                  output of the previous layer.
T                 Training targets

For classication problems, the dimensions of T depend on the type of problem. The

following table describes the dimensions of T.

Task Dimensions of T

Image classication 4-D array of size 1-by-1-by-K-by-N, where K

is the number of classes, and N is the mini-

batch size.

Dene Custom Deep Learning Layers

2-67

Task Dimensions of T

Sequence-to-label classication Matrix of size K-by-N, where K is the

number of classes, and N is the mini-batch

size.

Sequence-to-sequence classication 3-D array of size K-by-N-by-S, where K is

the number of classes, N is the mini-batch

size, and S is the sequence length.

The size of Y depends on the output of the previous layer. To ensure that Y is the same

size as T, you must include a layer that outputs the correct size before the output layer.

For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can

include a fully connected layer of size K followed by a softmax layer before the output

layer.
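To make the matching sizes of Y and T concrete, the following sketch computes a loss and its derivative for image classification with plain arrays. The cross-entropy loss averaged over the mini-batch is an assumption chosen for the illustration, and the prediction scores and one-hot targets are made-up values.

```matlab
% Image classification sizes: Y and T are 1-by-1-by-K-by-N.
K = 3; N = 2;
Y = reshape([0.7 0.2 0.1, 0.1 0.8 0.1], [1 1 K N]);   % softmax-style scores
T = reshape([1 0 0, 0 1 0], [1 1 K N]);               % one-hot targets

% Assumed loss: cross entropy averaged over the mini-batch.
loss = -sum(T(:) .* log(Y(:))) / N;

% Derivative of that loss with respect to the predictions, same size as Y.
dLdY = -(T ./ Y) / N;
```

In a custom output layer, the first computation would go in forwardLoss and the second in backwardLoss.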

For regression problems, the dimensions of T depend on the type of problem. The

following table describes the dimensions of T.

Task                               Dimensions of T
Image regression                   4-D array of size 1-by-1-by-r-by-N, where r is the
                                   number of responses, and N is the mini-batch size.
Image-to-image regression          4-D array of size h-by-w-by-c-by-N, where h, w, and c
                                   denote the height, width, and number of channels of
                                   the output, respectively, and N is the mini-batch
                                   size.
Sequence-to-one regression         Matrix of size r-by-N, where r is the number of
                                   responses, and N is the mini-batch size.
Sequence-to-sequence regression    3-D array of size r-by-N-by-S, where r is the number
                                   of responses, N is the mini-batch size, and S is the
                                   sequence length.

For example, if the network denes an image regression network with one response and

has mini-batches of size 50, then T is a 4-D array of size 1-by-1-by-1-by-50.

The size of Y depends on the output of the previous layer. To ensure that Y is the same

size as T, you must include a layer that outputs the correct size before the output layer.

For example, for image regression with r responses, to ensure that Y is a 4-D array of the

correct size, you can include a fully connected layer of size r before the output layer.
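The same size bookkeeping applies to regression. The sketch below assumes a mean absolute error loss, averaged over the r responses and then over the mini-batch; the loss choice and the numeric values are assumptions for the illustration only.

```matlab
% Image regression sizes: Y and T are 1-by-1-by-r-by-N.
r = 2; N = 3;
Y = reshape([1 2, 3 4, 5 6], [1 1 r N]);   % predictions
T = reshape([1 4, 2 4, 8 6], [1 1 r N]);   % training targets

% Assumed loss: mean absolute error per observation, averaged over the batch.
meanAbsoluteError = sum(abs(Y - T), 3) / r;   % 1-by-1-by-1-by-N
loss = sum(meanAbsoluteError(:)) / N;

% Derivative of that loss with respect to the predictions, same size as Y.
dLdY = sign(Y - T) / (N * r);
```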


The following table describes the output arguments of forwardLoss and

backwardLoss.

Output Argument             Description
loss (forwardLoss only)     Calculated loss between the predictions Y and the true
                            target T.
dLdY (backwardLoss only)    Derivative of the loss with respect to the predictions Y.

If you want to include a user-dened output layer after a built-in layer, then

backwardLoss must output dLdY with the size expected by the previous layer. Built-in

layers expect dLdY to be the same size as Y.