05 Selection And Train Validation Sets Instructionsl

05_-selection-and-train-validation--sets_instructionsl

User Manual:

Open the PDF directly: View PDF PDF.
Page Count: 1

Model Selection and
Train/Validation/Test Sets
Just because a learning algorithm fits a training set well, that does not mean it is a
good hypothesis. It could over fit and as a result your predictions on the test set
would be poor. The error of your hypothesis as measured on the data set with
which you trained the parameters will be lower than the error on any other data
set.
Given many models with different polynomial degrees, we can use a systematic
approach to identify the 'best' function. In order to choose the model of your
hypothesis, you can test each degree of polynomial and look at the error result.
One way to break down our dataset into the three sets is:
Training set: 60%
Cross validation set: 20%
Test set: 20%
We can now calculate three separate error values for the three different sets using
the following method:
1. Optimize the parameters in Θ using the training set for each polynomial
degree.
2. Find the polynomial degree d with the least error using the cross validation
set.
3. Estimate the generalization error using the test set with , (d =
theta from polynomial with lower error);
This way, the degree of the polynomial d has not been trained using the test set.

Navigation menu