Project Instructions

User Manual:

Open the PDF directly: View PDF .
Page Count: 4

Download
Open PDF In Browser	View PDF

CARLETON UNIVERSITY
Department of Systems and Computer Engineering
SYSC/BIOM 5405 – Pattern Classification and Experiment Design
Fall Semester 2018
Working in pairs, each team will develop a pattern classification system for the same pattern
classification challenge using the same training data. This task will encompass elements from the
entire course, ranging from experiment design, to feature extraction, to classification techniques, to
reporting classification accuracy. This project may require teams to learn concepts outside of the
scope of the lectures and each team will deepen their expertise about one or more methods.
PROJECT DESCRIPTION
What is unique about this year’s challenge is that you will be predicting a healthy or diseased
state from relatively simple time series data collected on patients walking a short distance. There are
samples from subjects with one of three neuro-degenerative diseases contained in the sample data:
Amyotrophic lateral sclerosis (ALS), Parkinson’s disease, and Huntington’s disease. In patients
suffering from these degenerative diseases, motor control progressively fails over the course of the
disease. There are some promising treatments under investigation, but no cure is widely available.
Treatment focuses on improving symptoms.
In this study, force-sensitive resistors were installed in footwear with a recording device attached
to the ankle. Output is roughly proportional to force. Subjects were either “healthy” or suffered
from from one of the three diseases. For each subject, there are two time-series, one for each foot,
measured at 300 Hz for 5 minutes as the subject walked along a course. There is some additional
information available for each subject, though some of these additional data points were missing
for some subjects. The course was not the same for all subjects.
Your goals, based on these time-series data, are two-fold:
1. to predict “healthy” H or “diseased” D state, and
2. to predict the specific disease (ALS, Huntington’s HD, or Parkinson’s PD) or declare a subject
healthy H.
The first problem should be relatively easier, as a two-class problem with distinct changes in motor
activation under disease conditions.
There are a relatively small number of samples to work with.

Evaluation

Teams will be evaluated on
• the quality of all deliverables below,
• the accuracy obtained on the final unlabeled test dataset, and
• the correctness of your accuracy prediction over the test dataset.
1 of 4

Deliverables
1. Project Proposal detailing the pattern classification approach that you plan to use, including a source for an implementation of your chosen method. Your presentation must include
a description of 1 feature that you intend to use in your approach.
This will be a 4 minute presentation with approximately 6 slides.
You will be evaluated on the quality of your presentation and your progress to date. (In
other words, demonstrate that you’ve started working, have a software framework in place,
and understand the problem.)
2. The Pitch describing your approach, your predicted accuracy, and how you computed it.
Pitch your method as the best approach.
This will be a 4 minute presentation with approximately 6 slides.
Slides should cover
• a quick review of your method,
• any pre-processing and feature extraction or class imbalance handling,
• describe your approach to dealing with time-series data,
• describe your training/testing protocol, including your meta-learning strategy,
• provide your estimated accuracy including the standard deviation of your estimate and
describe your methodology for estimating your “true”/expected accuracy.
You must include two confusion matrices in your presentation: one for the two-class H/D
classification problem, and one for the four-class H/ALS/HD/PD performance.
You will be evaluated on the quality of your presentation and content.
At the conclusion of this class, all groups will be provided with the blind test data set.
3. Final Report describing the method that you have chosen to use, the source of the implementation of your method, details on training techniques and parameters used, any preprocessing of the data and feature extraction, a discussion of your approach for the two-class
and four-class problems and your testing procedures, an estimate of prediction accuracy with
and without meta-learning, and a discussion of the actual accuracy achieved over the blind
test dataset. This report should be approximately 10 pages, double-spaced including figures/tables.

3 Schedule
Project Launch: November 13, 2018 (competition announced)
Project Proposal: November 20, 2018 (presentations submitted via cuLearn)
Project Pitch: November 27, 2018 (presentations submitted via cuLearn, blind test data released)
Submit Predictions: November 30, 2018 3pm (classification of blind test data)
Results Announced: December 4, 2018 (winners glorified; prizes distributed)
Final Report: December 21, 2018 6pm (submitted via cuLearn)
2 of 4

The Dataset

The data sets will consist of a training set train.zip and a blind test set blind.zip.
The training set consists of tab separated files with two columns, the left column for left foot
force and the right column for right foot force during 5 minutes of walking. The files are named
as .tsv. There is also a train.txt file which contains additional information for each
subject including age, height, weight, gender, and gait speed. There are 43 subjects. Each subject
is identified by a unique number N and their disease/health: control1.tsv (healthy subject),
als2.tsv (ALS), hunt3.tsv (Huntington’s disease), park4.tsv (Parkinson’s disease).
The blind test set is structured in the same way but all 20 subjects are identified as blindN .tsv.
The additional information can be found in the file blind.txt.
For some subjects, information in the additional data is missing. These fields are denoted with
a “?”.

5 Detailed Instructions
• Phase 1: Determine Approach
– All teams will choose a unique pattern classification approach.
– First-come, first-served…ideas include
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.

Bayesian belief networks
feed-forward neural networks
recurrent neural networks
linear discriminants
support vector machines
k-nearest-neighbour
decision trees
radial basis function networks
probabilistic neural networks
genetic algorithms
k-means clustering
hidden Markov models
association mining
logistic regression
your own idea!

– Find an implementation in any language you like
– Prepare and deliver project proposal detailing your proposed pattern classification approach and chosen implementation framework. Demonstrate that you understand the
problem and have a clear plan on how to solve it.
– Discuss 1 feature that you intend to use in your method. Please discuss how to extract
the scalar feature from the time-series data.

3 of 4

• Phase 2: Develop the pattern classification system
– Structure your investigation using the following steps:
1. Data pre-processing: normalization, outlier detection, censoring of bad data, handling of missing data, records of varying length, etc.
2. Feature extraction/selection: you may wish to generate many features from the
time-series data provided to you and ultimately only use a subset in your classifier.
3. Partition data and establish experiment design: train/validation/test sets, balancing
classes (optional).
4. Train classifier: approach used, parameters required, how they were tuned
5. Testing and expected accuracy: what is your predict accuracy for each of the two
problems, how was it computed, and provide a standard error / standard deviation
on your estimate. For example, “My accuracy will be 0.73 ± 0.04.”
6. Meta-learning approaches: Implement at least one meta-learning strategy and investigate its effect on accuracy. For example: CME-voting, bagging, boosting.
• Phase 3: Pitch method to class
– Present to class
– Predict accuracy you will get on the blind test dataset
∗ discuss expected performance both with and without meta-learning, but ultimately
choose 1 approach and 1 estimate
∗ Include 2 Confusion Tables in your presentation
– The blind test data is released. Keep in mind the size of the dataset. We have held back
approximately 30% of the total data... Beware of runtime issues (you have < 48 hours
to process all data!
• Phase 4: Competition
– Provide single best set of predictions for blind test data to the course instructor.
– Course instructor will evaluate each submission.
∗ Score1 will be overall accuracy Acc = Acc,h/d · Acc,h/als/hd/pd
∗ Score2 will be probability of observing this accuracy given your estimated accuracy
and standard deviation, assuming a normal distribution.
– Results announced: Laugh, cry, acceptance speeches…; Points for how well you do
(score1), points for how close your prediction is to your actual performance on the
test data (score2).
• Phase 5: Final report
– Prepare a final report (10 pages double-spaced including figures) describing entire effort
and results. Discuss how you would change your approach now that you have seen the
other approaches and now that you know how well you did.

End of Project Description
4 of 4

Source Exif Data:

File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.5
Linearized                      : No
Page Count                      : 4
Creator                         :  XeTeX output 2018.11.12:1857
Producer                        : xdvipdfmx (20170318)
Create Date                     : 2018:11:12 18:57:49-05:00

EXIF Metadata provided by EXIF.tools

Project Instructions

Navigation menu

Versions of this User Manual:

Views

Navigation