ECE9309 9039 Exam Instructions

Western University
Faculty of Engineering
Electrical and Computer Engineering Department
ECE 9309/9039 Machine Learning
Final Exam
April 2, 2019

Exam Instructions:
• This exam is a take-home exam.
• The exam takes place from Tuesday, April 2, 2019 at 5:00 pm to Friday, April 5, 2019 at
9:00 pm.
• You are allowed to use any software, such as MATLAB or Python, to solve the exam questions.
• You may consult with your classmates or friends for help on the exam, but you may NOT share or
show your code or solutions. Showing one another the solutions is NOT a consultation.
• No clarifications or corrections will be provided. If you believe that there is an error, inconsistency,
or omission in the exam, please state your assumptions about the issue within your discussion of
that issue.
• Any external source of code and ideas must be cited to give credit to the original source.
• Your final submission to this final exam should include:
– All code and scripts used to solve the exam questions.
– Clear documentation that includes any explanations, comments, or observations on the exam
solutions, along with the input commands and output (files/graphs) from these programs. The
submitted document must be in PDF format.
• All files should be compressed into a zip file with the naming convention
studentID FirstName LastName.zip and submitted on OWL in the Final Exam field under
the Assignment section. An example of a submitted file name is 250XXXXXX John Smith.zip.
HONOR STATEMENT:
BY SUBMITTING THIS EXAM THROUGH OWL, I AFFIRM ON MY HONOR THAT I AM AWARE
OF ALL EXAM INSTRUCTIONS, AND I HAVE NOT SHARED MY CODE/SOLUTIONS WITH OTHERS.

Dataset Description
Attached to the exam instructions, you will find the datasets.zip file. After unzipping it, you
will find several .csv files, each containing real-world measurement data from one heat experiment
inside a steel furnace. Each file name has a numeric prefix giving the experiment heat ID. File names
in the dataset come in two formats: those ending with “ALARM OUT.csv” correspond to experiments
with no anomalies, while heat experiments containing anomalies have the suffix
“ ALARM OUT tag.csv”, with the anomaly tags added in the last column of each file (1 = anomaly,
0 = normal). In the datasets, the features are the vibration measurements in columns A, B, ..., H, which
correspond to the measurement signals (X1, X2, ..., X8). Each feature represents a vibration signal inside
the furnace in a different frequency band. Data should be considered only under steady-state conditions.
This information is in column I (“ Sds Armed”): a row is steady-state data only when “ Sds Armed=1”.
Column J holds the anomaly tags. Each example row is a measurement recorded at one time instant,
so the rows of each file form a time series.

Data Preparation [20 points]
• Question 1) - Filter all “Normal Experiments” by keeping only the active examples (“SDS
Armed = 1”), and then merge them into a new file named “merged exp normal.csv”. Write a script
that performs this task and report the number of examples in the merged dataset [10 points].
• Question 2) - Filter all “Experiments with Anomalies” by keeping only the active examples
(“SDS Armed = 1”), as in Question 1, and then merge them into a new file named
“merged exp contains anomalies.csv”. Write a script that performs this task and report the
number of examples in the merged dataset [10 points].
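
The filter-and-merge step in Questions 1 and 2 could be sketched as below. This is only an illustration under stated assumptions: every input file is assumed to have exactly one header row and the same column layout, the steady-state flag is read by position from column I (0-based index 8) as the dataset description states, and the names `is_armed` and `merge_steady_state` are hypothetical.

```python
import csv

ARMED_COL = 8  # column I (0-based index 8): steady-state flag per the dataset description


def is_armed(row, armed_col=ARMED_COL):
    """Return True when the row's steady-state flag parses to 1."""
    try:
        return float(row[armed_col]) == 1.0
    except (IndexError, ValueError):
        return False  # short or non-numeric rows are skipped


def merge_steady_state(paths, out_path, armed_col=ARMED_COL):
    """Write all steady-state rows from `paths` into one CSV file.

    Assumption: each input file starts with one header row and all files
    share the same columns. Returns the number of data rows written.
    """
    n_rows = 0
    header_written = False
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in sorted(paths):
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to merge
                if not header_written:
                    writer.writerow(header)  # keep a single header in the output
                    header_written = True
                for row in reader:
                    if is_armed(row, armed_col):
                        writer.writerow(row)
                        n_rows += 1
    return n_rows
```

The list of paths for each group would be built from the file-name suffix, for example with `glob.glob`; the exact pattern depends on the actual file names in datasets.zip and is not assumed here. The returned count is the number of examples each question asks for.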

Building A Statistical-Based Anomaly Detection Algorithm [40 points]
• Question 3) - Since “merged exp contains anomalies.csv” contains anomalies, apply any significance
test to rank each feature (X1, X2, ..., X8) by how distinctive it is of anomalies [5 points].
• Question 4) - Model the normal process “merged exp normal.csv” using a Gaussian distribution.
Assume that the features are independent. Characterize your model for the following cases:
– Consider all features (X1, X2, ..., X8) [5 points].
– Mark the most important two features (obtained from the significance test in Question 3) [2
points].
– The projection of the feature space onto the first two components using Principal Component
Analysis (PCA) (obtained from the significance test in Question 3) [5 points].
• Question 5) - Model the same normal process “merged exp normal.csv” using a Gaussian distribution
with all the requirements of Question 4. However, assume that the features are dependent [10 points].
Hint: Think about the covariance matrix!

• Question 6) Develop an anomaly alarm by applying an adjustable threshold to your Gaussian models
obtained in Questions 3 and 4, and generate an alarm accordingly. Use any experiment that contains
anomalies as a test case [8 points].
• Question 7) Plot the generated alarm, true anomaly flags (given from the dataset), and the feature
X1 [5 points].
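
The difference between the independent and dependent feature assumptions, and the way a threshold turns a Gaussian model into an alarm, might be sketched as follows. This is a minimal NumPy sketch and not a prescribed solution: the function names and the choice to threshold the log-density (rather than the density itself) are assumptions made for illustration. Independence corresponds to a diagonal covariance matrix, dependence to the full covariance matrix.

```python
import numpy as np


def fit_gaussian(X, independent=True):
    """Estimate mean and covariance from normal-only data X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    if independent:
        # Independent features: keep only per-feature variances (diagonal covariance).
        cov = np.diag(X.var(axis=0, ddof=1))
    else:
        # Dependent features: full sample covariance matrix.
        cov = np.atleast_2d(np.cov(X, rowvar=False))
    return mu, cov


def log_density(X, mu, cov):
    """Log of the multivariate Gaussian density for each row of X."""
    X = np.asarray(X, dtype=float)
    d = mu.shape[0]
    diff = X - mu
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each row, without forming the inverse explicitly.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def alarm(X, mu, cov, log_threshold):
    """Return 1 where the log-density falls below the (hypothetical) threshold, else 0."""
    return (log_density(X, mu, cov) < log_threshold).astype(int)
```

The model would be fitted on the merged normal data and `alarm` applied to one experiment that contains anomalies; the resulting 0/1 vector can then be plotted against the true tags and X1.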

Alternative Ways For Anomaly Detection [40 points]
• Question 8) Apply one supervised learning approach to classify the events as normal or anomalous
[10 points].
• Question 9) Apply any clustering-based algorithm you learned in class (e.g., hard or soft clustering
with K-means, EM, etc.) to separate the anomalous data from the normal data. Is there a direct
mapping to the true anomaly tags? Discuss your findings [10 points].
• Question 10) Compare the Gaussian-based anomaly detection algorithm, the supervised learning
approach you picked, and the clustering approach in terms of [20 points]:
– Detection capabilities (use the relevant metrics discussed in the class).
– Time complexity and memory requirements during the training phase.
– Time complexity and memory requirements during the execution phase.
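
For the detection-capability comparison, one possible set of metrics can be computed from the confusion counts as sketched below. Which metrics were discussed in class is not stated here, so the selection (detection rate, false alarm rate, precision, F1) and the function name `detection_metrics` are assumptions.

```python
def detection_metrics(y_true, y_pred):
    """Compute common detection metrics from 0/1 labels (1 = anomaly)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals correctly passed

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero on empty classes

    detection_rate = ratio(tp, tp + fn)  # also called recall
    precision = ratio(tp, tp + fp)
    return {
        "detection_rate": detection_rate,
        "false_alarm_rate": ratio(fp, fp + tn),
        "precision": precision,
        "f1": ratio(2 * precision * detection_rate, precision + detection_rate),
    }
```

Calling the same function on the output of each of the three approaches, using the same test experiment, keeps the comparison like for like.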

Bonus Questions [20 points]
• Question 11) Optimize the threshold parameter from Question 6 with the objective of maximizing the
detection rate and minimizing the false alarm rate. Compare the results before and after the optimization
[15 points]. In particular, consider the following objectives “jointly”:
– Reduce the number of generated false alarms.
– Increase the number of true anomalies discovered.
• Question 12) If the features in the Gaussian-based approach do not follow a Gaussian distribution,
apply a suitable transformation to make them better fit the Gaussian shape. Compare the results before
and after the transformation [5 points].
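
One way to treat the two objectives of Question 11 jointly is to sweep candidate thresholds and keep the one that maximizes detection rate minus false alarm rate. The sketch below assumes that lower scores (for example, log-densities) mean "more anomalous" and that an alarm is raised when a score is strictly below the threshold; the combined criterion and the name `best_threshold` are illustrative choices, not the required method.

```python
import numpy as np


def best_threshold(scores, y_true):
    """Sweep thresholds; return (threshold, detection_rate, false_alarm_rate).

    An alarm is raised when score < threshold. The selected threshold
    maximizes detection_rate - false_alarm_rate (one possible joint objective).
    """
    scores = np.asarray(scores, dtype=float)
    is_anomaly = np.asarray(y_true).astype(bool)
    n_pos = is_anomaly.sum()
    n_neg = (~is_anomaly).sum()

    # Candidates: below the minimum, midpoints between distinct scores, above the maximum.
    uniq = np.unique(scores)
    candidates = np.concatenate(([uniq[0] - 1.0], (uniq[:-1] + uniq[1:]) / 2.0, [uniq[-1] + 1.0]))

    best = (None, 0.0, 0.0)
    best_gain = -np.inf
    for thr in candidates:
        flagged = scores < thr
        dr = (flagged & is_anomaly).sum() / n_pos if n_pos else 0.0
        far = (flagged & ~is_anomaly).sum() / n_neg if n_neg else 0.0
        if dr - far > best_gain:  # strict '>' keeps the lowest threshold on ties
            best_gain = dr - far
            best = (float(thr), float(dr), float(far))
    return best
```

To avoid an optimistic estimate, the threshold would normally be selected on one labeled experiment and the before/after comparison reported on a different one.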
