Whiteboard Design Session Student Guide Big Data And Visualization

User Manual: Pdf

Open the PDF directly: View PDF PDF.
Page Count: 17

DownloadWhiteboard Design Session Student Guide-Big Data And Visualization
Open PDF In BrowserView PDF
Microsoft Cloud
Workshop
Big data and visualization
Whiteboard design session student guide

January 2018

Information in this document, including URL and other Internet Web site references, is subject to change without notice.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people,
places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain
name, e-mail address, logo, person, place or event is intended or should be inferred. Complying with all applicable
copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may
be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic,
mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering
subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the
furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual
property.
The names of manufacturers, products, or URLs are provided for informational purposes only and Microsoft makes no
representations and warranties, either expressed, implied, or statutory, regarding these manufacturers or the use of the
products with any Microsoft technologies. The inclusion of a manufacturer or product does not imply endorsement of
Microsoft of the manufacturer or product. Links may be provided to third party sites. Such sites are not under the control
of Microsoft and Microsoft is not responsible for the contents of any linked site or any link contained in a linked site, or
any changes or updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any linked site. Microsoft is providing these links to you only as a convenience, and the inclusion of any link
does not imply endorsement of Microsoft of the site or the products contained therein.
© 2018 Microsoft Corporation. All rights reserved.
Microsoft and the trademarks listed at https://www.microsoft.com/en-us/legal/intellectualproperty/Trademarks/Usage/General.aspx are
trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Contents
Big data and visualization whiteboard design session student guide .............................................. 1
Abstract and learning objectives......................................................................................................................................................... 1
Step 1: Review the customer case study .......................................................................................................................................... 2
Step 2: Design a proof of concept solution .................................................................................................................................... 9
Step 3: Present the solution ............................................................................................................................................................... 11
Wrap-up .................................................................................................................................................................................................... 12
Additional references ........................................................................................................................................................................... 13

Microsoft Cloud Workshop

Big data and visualization

Big data and visualization
whiteboard design session
student guide
Abstract and learning objectives
In this workshop, you will complete a web app using Machine Learning to predict travel delays given flight delay data and

weather conditions, plan the bulk data import operation, followed by preparation tasks, such as cleaning and
manipulating the data for testing, and training your Machine Learning model.
By attending this workshop, you will be better able to build a complete Azure Machine Learning (ML) model for
predicting if an upcoming flight will experience delays. In addition, you will learn to:
•
•
•

Integrate the Azure ML web service in a Web App for both one at a time and batch predictions
Analyze batch data with SQL Data Warehouse
Visualize batch predictions on a map using Power BI

This whiteboard design session is designed to provide exposure to many of Microsoft’s transformative line of business
applications built using Microsoft big data and advanced analytics. The goal is to show an end-to-end solution, leveraging
many of these technologies, but not necessarily doing work in every component possible. The architecture includes:
•
•
•
•
•
•

Azure Machine Learning (Azure ML)
Azure Data Factory (ADF)
Azure Storage
HDInsight Spark
Power BI Desktop
Azure App Service

1|Page

©2018 Microsoft Corporation

Microsoft Cloud Workshop

Big data and visualization

Step 1: Review the customer case study
Outcome
Analyze your customer’s needs.

Facilitator/subject matter expert (SME) presentation of customer case study
Timeframe: 15 minutes
Directions: With all participants in the session, the facilitator/SME presents an overview of the customer case study along
with technical tips.
1.
2.
3.

Meet your table participants and trainer.
Read all of the directions for Steps 1–3 in the Student guide.
As a table team, review the following customer case study.

©2018 Microsoft Corporation

2|Page

Microsoft Cloud Workshop

Big data and visualization

Customer situation
AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they
are always looking for ways to differentiate themselves and provide added value to their corporate customers.
AWT is investigating ways that they can capitalize on their existing data assets to provide new insights that provide them a
strategic advantage against their competition. In planning their product, they heard much fanfare about machine learning
and came up with the idea of using predictive analytics to help customers best select their travels based on the likelihood
of a delay. When reviewing their customer transaction histories, they discovered that their most premium customers often
book their travel within 7 days of departure. In speaking with customer service, they learned that these customers often
ask questions like, “I don’t have to be there until Tuesday, so is it better for me to fly out on Sunday or Monday?”
While there are many factors that customer service uses to tailor their guidance to the customer (such as cost and travel
duration), AWT believes an innovative solution might come in the form of giving the customer an assessment of the risk of
encountering flight delays. For low risk flights, the customer may choose to book with a narrower travel window, giving
them more precious time at home and less on the road spent arriving too early to a destination. AWT is interested in
applying data science to the problem to discover if the weather forecast coupled with their historical flight delay data
could be used to provide a meaningful input into the customer’s decision-making process.
AWT plans to pilot this solution internally, whereby the small population of customer support who service AWT’s premium
tier of business travelers would begin using the solution and offering it as an additional data point for travel optimization.
They would like to provide their customer support agents a web-based solution that enables them to map the predicted
delays for a particular customer’s departure airport(s) of choice.
AWT has over 30 years of historical flight data provided to them by the United States Department of Transportation
(USDOT), which among other data points includes flight delay information for every flight. The data arrives in flat, comma
separated value (CSV) files with a schema of the following:
(Year, Month, DayOfMonth, Airline, TailNum, FlightNum, OriginAirport, DestinationAirport, ScheduledDepartureTime,
ActualDepartureTime, ScheduledArrivalTime, DepartureDelay, AirTime, Distance, Cancelled, CancellationCode)
In addition, for all data since 2003, each row includes new fields describing the type of delay experienced, where the value
for each type is the number of minutes the delay was experienced for that source of delay:
(CarrierDelay, WeatherDelay, NationalAirSystemDelay, SecurityDelay, LateAircraftDelay)
They receive updates to this data monthly, where the flight data and other related files total about 1 GB. In total their
solution currently manages about 2 TB worth of data.
Additionally, they receive current and forecasted weather data from a third-party service. This service gives them the
ability to receive weather forecasts around any airport, and provides forecasts up to 10 days. They have a history of the
historical weather condition for each flight as CSV files, but acquiring the weather forecasts requires a call to a REST API
that returns a JSON (JavaScript Object Notation) structure. Each airport of interest needs to be queried individually. An
excerpt of the weather forecast for a single day at the Seattle-Tacoma International airport is as follows:
{
"date": {
"epoch": "1444701600",
"pretty": "7:00 PM PDT on October 12, 2015",
3|Page

©2018 Microsoft Corporation

Microsoft Cloud Workshop

Big data and visualization

"day": 12,
"month": 10,
"year": 2015,
"yday": 284,
"hour": 19,
"min": "00",
"sec": 0,
"ampm": "PM",
"tz_short": "PDT",
"tz_long": "America/Los_Angeles"
},
"high": {
"fahrenheit": "64",
"celsius": "18"
},
"low": {
"fahrenheit": "54",
"celsius": "12"
},
"conditions": "Overcast",
"maxwind": {
"mph": 15,
"kph": 24,
"dir": "SSW",
"degrees": 209
},
"avewind": {
"mph": 10,
"kph": 16,
"dir": "SSW",
"degrees": 209
},
"avehumidity": 70,
©2018 Microsoft Corporation

4|Page

Microsoft Cloud Workshop

Big data and visualization

"maxhumidity": 0,
"minhumidity": 0
}
Jack Tradewinds, the CIO of AWT, is looking to modernize their data story. He has heard a great deal of positive news
about Spark SQL on HDInsight and its ability to query exactly the type of files he has in a performant way, but also in a
way that is more familiar to his analysts and developers because they are all familiar with the SQL syntax that it supports.
He would love to understand if they can move this data away from their on-premises datacenter into the cloud, and
enhance their ability to load, process, and analyze it going forward. Given his long-standing relationship with Microsoft, he
would like to see if Azure can meet his needs.

5|Page

©2018 Microsoft Corporation

Microsoft Cloud Workshop

Big data and visualization

Customer needs
1.

Want to modernize their analytics platform, without sacrificing the ability to query their data using SQL.

2.

Need an approach that can store all of their data, including the unmodified source data and the cleansed data
from which they query for production purposes.

3.
4.
5.

Want to understand how they will load their large quantity of historical data into Azure.
Need to be able to query the weather forecast and use it as input to their flight delay predictions.
Desire a proof of concept machine learning model that takes as input their historical data on flight delays and
weather conditions in order to identify whether a flight is likely to be delayed or not.
Need web-based visualizations of the flight delay predictions.

6.

©2018 Microsoft Corporation

6|Page

Microsoft Cloud Workshop

Big data and visualization

Customer objections
1.

Does Azure offer a machine learning solution that does not require a PhD in statistics?

2.

We have heard that creating a machine learning model takes a month to build and another 2–3 months to
operationalize so that it is useable from our production systems. Is this true?

3.

Can we query flat files in the file system using SQL?

4.

Does Azure provide anything that would speed up querying (and exploration) of files in HDFS?

5.

Does Azure provide any tools for visualizing our data? Ideally access to these could be managed with Active
Directory.

6.

While our Proof of Concept (PoC) does not have any sensitive data, if it is successful we would like to include
customer data that contains personally identifiable information (PII) and transaction history so we could achieve
new insights combining our flight delay predictions with our customers’ profiles. Are there any additional services
in the Azure Marketplace we could use to identify data loaded that contains PII, monitor access to sensitive data,
and protect the data at rest (via encryption or masking)?

7.

Is HDInsight our only option for running SQL on Hadoop solutions in Azure?

8.

We have heard of Azure Data Lake, but we are not clear about whether this is currently a good fit for our PoC
solution, or whether we should be using it for interactive analysis of our data.

9.

We’d like our operationalized models to be flexible in the inputs they support. In some cases, we want to provide
both the flight and weather data to get a prediction. In others we just want to provide flight data and have the
weather looked up. Is this possible?

7|Page

©2018 Microsoft Corporation

Microsoft Cloud Workshop

Big data and visualization

Infographic for common scenarios

©2018 Microsoft Corporation

8|Page

Microsoft Cloud Workshop

Big data and visualization

Step 2: Design a proof of concept solution
Outcome
Prepare to present a solution to the target customer audience in a 15-minute chalk-talk format.
Timeframe: 60 minutes
Business needs
Directions: With all participants at your table, answer the following questions and list the answers on a flip chart.
1.

Who should you present this solution to? Who is your target customer audience? Who are the decision makers?

2.

What customer business needs do you need to address with your solution?

Design
Directions: With all participants at your table, respond to the following questions on a flip chart.
High-level architecture
1.

Without getting into the details (the following sections will address the details), diagram your initial vision for
handling the top-level requirements for data loading, data preparation, storage, machine learning modeling, and
reporting. You will refine this diagram as you proceed.

Data loading
1.

2.

How would you recommend that AWT get their historical data into Azure? What services would you suggest and
what are the specific steps they would need to take to prepare the data, to transfer the data, and where would the
loaded data land?
Update your diagram with the data loading process with the steps you identified.

Data preparation
1.
2.

3.
4.
5.

What service would you recommend AWT capitalize on to explore the flat files they get from the USDOT using
SQL?
If you suggested HDInsight, what specific configuration would you use? What components of the Hadoop stack
would you use to allow AWT analysts to query and prep the data? How would they author and execute these
data prep tasks?
If you suggested SQL Data Warehouse (DW), explain how you would configure the SQL DW instance.
Why did you recommend HDInsight over SQL Data Warehouse or vice versa?
How would you suggest AWT integrate weather forecast data?

Machine learning modeling
1.

What technology would you recommend that AWT use for implementing their machine learning model?

2.

How would you guide AWT to load data, so it can be processed by the machine learning model?

3.

What category of machine learning algorithm would you recommend to AWT for use in constructing their model?
For this scenario your option is clustering, regression or two-class classification. Why?

4.

Assuming you selected an algorithm that requires training, address the following model design questions:
a.

9|Page

What is the high-level flow of your machine learning model? Diagram this.
©2018 Microsoft Corporation

Microsoft Cloud Workshop

Big data and visualization

b. What attributes of the flight and weather data do you think AWT should use in predicting flight delays?
How would you recommend that AWT identify the columns that provide the most predictive value in
determining if a flight will be delayed? Be specific on the particular modules or libraries they could use
and how they would apply them against the data.
c.

Some of the data may need a little touching up: columns need to be removed, data types need to be
changed. How would these steps be applied in your model?

d. How would you recommend AWT measure the success of their model?
Operationalizing machine learning
1.

How can AWT release their model for production use and avoid their concerns about extremely long delays
operationalizing the model? Be specific on how your model is packaged, hosted, and invoked.

2.

AWT has shown interest in not only scoring a flight at a time (based on a customer’s request), but also doing
scoring in large chunks so that they could show summaries of predicted flight delays across the United States.
What changes would you need to make to your ML model to support this?

Visualization and reporting
1.
2.

Is Power BI an option for AWT to use in visualizing the flight delays?
If so, explain:
a. How would AWT load the data and plot it on a map? What specific components would you use
and how would you configure them to display the data?
b. If they need to make minor changes, such as a change to the data types of a column in the model,
how would they perform this in Power BI?
c. How could they secure access to these reports to only their internal customer service agents?

Prepare
Directions: With all participants at your table:
1.

Identify any customer needs that are not addressed with the proposed solution.

2.

Identify the benefits of your solution.

3.

Determine how you will respond to the customer’s objections.

Prepare a 15-minute chalk-talk style presentation to the customer.

©2018 Microsoft Corporation

10 | P a g e

Microsoft Cloud Workshop

Big data and visualization

Step 3: Present the solution
Outcome
Present a solution to the target customer audience in a 15-minute chalk-talk format.
Presentation
Timeframe: 30 minutes
Directions
1.
2.
3.
4.
5.
6.
7.

Pair with another table.
One table is the Microsoft team and the other table is the customer.
The Microsoft team presents their proposed solution to the customer.
The customer makes one of the objections from the list of objections.
The Microsoft team responds to the objection.
The customer team gives feedback to the Microsoft team.
Tables switch roles and repeat Steps 2–6.

11 | P a g e

©2018 Microsoft Corporation

Microsoft Cloud Workshop

Big data and visualization

Wrap-up
Timeframe: 15 minutes
•

Tables reconvene with the larger group to hear a SME share the preferred solution for the case study.

©2018 Microsoft Corporation

12 | P a g e

Microsoft Cloud Workshop

Big data and visualization

Additional references
Item
Infographic

Description
Hi-resolution version of data analytics blueprint

Machine
Learning

Azure ML algorithm cheat sheet

Azure Data
Factory

What is Azure Data Factory?

HDInsight Spark

Overview: Apache Spark on HDInsight

Power BI

Power BI overview

Travel data

Sample data source

Weather data

Bureau of Transportation Statistics, United States
Department of Transportation
Database: Airline On-Time Performance Data
Table: On-Time Performance Table
Sample REST API for weather forecasts

ARM Templates

13 | P a g e

Understand the structure and syntax of ARM
templates

Links
https://msdn.microsoft.com/dn630664 fbid=rVymR_3WSRo
https://azure.microsoft.com/enus/documentation/articles/machine-learningalgorithm-cheat-sheet/
https://docs.microsoft.com/azure/datafactory/introduction
https://azure.microsoft.com/enus/documentation/articles/hdinsight-apachespark-overview/
https://support.powerbi.com/knowledgebase
/articles/430814-get-started-with-power-bi
http://www.transtats.bts.gov/Tables.asp?DB_I
D=120

http://www.wunderground.com/weather/api/
d/docs
https://docs.microsoft.com/azure/azureresource-manager/resource-groupauthoring-templates

©2018 Microsoft Corporation



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.7
Linearized                      : No
Page Count                      : 17
Language                        : en-US
Tagged PDF                      : Yes
XMP Toolkit                     : 3.1-701
Producer                        : Microsoft® Word 2016
Creator Tool                    : Microsoft® Word 2016
Create Date                     : 2018:02:19 16:06:39-05:00
Modify Date                     : 2018:02:19 16:06:39-05:00
Document ID                     : uuid:A6FF76BE-5926-40DB-BD38-6D3460D4643F
Instance ID                     : uuid:A6FF76BE-5926-40DB-BD38-6D3460D4643F
Creator                         : Microsoft® Word 2016
EXIF Metadata provided by EXIF.tools

Navigation menu