Microsoft STP ML On AWS Technical Tech Participant Guide

Machine Learning on AWS - Technical
AWS Solutions Training for Partners

Vijay
AWS Partner Trainer

Machine Learning
Prediction is the process of filling in missing information; it uses data you have to generate data you don’t have.

The Artificial Intelligence Landscape
• Learning
• Language
• Perception
• Problem Solving
• Insight

© 2018 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon confidential.

Five Tribes / Two Breakthroughs

Tribe            Origins         Algorithm
Bayesians        Statistics      Probabilistic inference
Analogizers      Psychology      Kernel machines
Symbolists       Logic           Inverse deduction
Evolutionaries   Biology         Genetic programming
Connectionists   Neuroscience    Backpropagation

Two breakthroughs:
• Computer Vision: CNNs (static / unstructured data)
• Natural Language Processing: RNNs / LSTMs (sequential / structured data)


AWS Mission

Put machine learning in the hands of every developer, data scientist
and architect


Customers Running ML on AWS


The AWS Machine Learning Stack

Application Services
• Vision: Amazon Rekognition Image, Amazon Rekognition Video
• Speech: Amazon Polly, Amazon Transcribe
• Language: Amazon Lex, Amazon Translate, Amazon Comprehend

Platform Services
• Amazon SageMaker, Amazon Machine Learning, AWS DeepLens, Amazon EMR and Spark, Amazon Mechanical Turk

Frameworks & Infrastructure
• AWS Deep Learning AMI
• Frameworks: TensorFlow, Apache MXNet, Gluon, Cognitive Toolkit, Caffe, Keras, PyTorch, Chainer
• Compute: GPU (P3), Mobile, AWS Greengrass


Demo 1: Vision Services
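The steps of the vision demo are not reproduced in this guide. The following minimal sketch shows the kind of call it makes: it asks Amazon Rekognition for labels in an image stored in Amazon S3. It assumes the boto3 SDK and configured AWS credentials. The helper name `top_labels` and its defaults are illustrative. The client is passed in as an argument, so you can try the logic without AWS access.

```python
def top_labels(client, bucket, key, max_labels=10, min_confidence=80.0):
    """Return (name, confidence) pairs for labels Rekognition detects in an S3 image.

    `client` is expected to be a boto3 Rekognition client, e.g. boto3.client("rekognition").
    """
    response = client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,            # cap on the number of labels returned
        MinConfidence=min_confidence,    # drop labels below this confidence (0-100)
    )
    # Each label carries a name and a confidence score.
    return [(label["Name"], label["Confidence"]) for label in response["Labels"]]
```

With credentials in place, call it as `top_labels(boto3.client("rekognition"), "my-demo-bucket", "photos/street.jpg")`. The bucket and key are hypothetical placeholders.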


Demo 2: Language Services
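This minimal sketch calls two of the language services from Python. It assumes boto3 clients for Amazon Comprehend and Amazon Translate. `analyze_and_translate` is a hypothetical helper that is not part of either SDK, and German is an arbitrary example target language.

```python
def analyze_and_translate(comprehend, translate, text, target_language="de"):
    """Detect the sentiment of English text, then translate it.

    `comprehend` and `translate` are expected to be boto3 clients for
    Amazon Comprehend and Amazon Translate.
    """
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    translation = translate.translate_text(
        Text=text,
        SourceLanguageCode="en",
        TargetLanguageCode=target_language,
    )
    return {
        "sentiment": sentiment["Sentiment"],  # POSITIVE, NEGATIVE, NEUTRAL or MIXED
        "translated_text": translation["TranslatedText"],
    }
```

Usage: `analyze_and_translate(boto3.client("comprehend"), boto3.client("translate"), "I love this course")`.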


Demo 3: Deep Learning AMI


Amazon EC2 P3 Instances
The fastest, most powerful GPU instances in the cloud
• Up to 8 NVIDIA Tesla V100 GPUs
• 1 petaFLOPS of computational performance (mixed precision)
• 300 GB/s GPU-to-GPU communication (NVLink)
• 16 GB GPU memory with 900 GB/s peak bandwidth
• 14x better than P2
• 9x better than P2

Customers: Airbnb, Toyota Research Institute, OpenAI
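The 1 petaFLOPS figure is the total across all eight GPUs. The slide does not give a per-GPU number. Assuming NVIDIA's published peak of roughly 125 TFLOPS of mixed-precision Tensor Core throughput per Tesla V100, the arithmetic is:

```python
# Assumption: ~125 TFLOPS peak mixed-precision (Tensor Core) throughput per Tesla V100.
V100_TENSOR_TFLOPS = 125
GPUS_PER_INSTANCE = 8  # largest P3 size, per the slide

total_tflops = V100_TENSOR_TFLOPS * GPUS_PER_INSTANCE
total_pflops = total_tflops / 1000  # 1 PFLOPS = 1000 TFLOPS
print(f"{total_tflops} TFLOPS = {total_pflops} PFLOPS")  # 1000 TFLOPS = 1.0 PFLOPS
```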


AWS Deep Learning Amazon Machine Image (AMI)
• Get started quickly with easy-to-launch tutorials
• Hassle-free setup and configuration
• Pay only for what you use: no additional charge for the AMI
• Accelerate your model training and deployment
• Support for popular deep learning frameworks: TensorFlow, MXNet, Gluon, Keras, Caffe2, PyTorch, etc.

Customers: Zendesk, Matric Analytics, SCDM
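You can also script the launch of the AMI on a P3 instance. This minimal sketch assumes boto3 and an EC2 client. The AMI ID is a hypothetical placeholder, because Deep Learning AMI IDs differ by Region and release. `p3.2xlarge` is only one example instance type.

```python
def launch_training_instance(ec2, ami_id, instance_type="p3.2xlarge", key_name=None):
    """Launch one EC2 instance from a Deep Learning AMI and return its instance ID.

    `ec2` is expected to be a boto3 EC2 client; look up the Deep Learning AMI ID
    for your Region rather than hard-coding one.
    """
    params = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name:
        params["KeyName"] = key_name  # optional SSH key pair for connecting to the instance
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

Instance charges accrue while the instance runs. Stop or terminate it when you are finished, for example with `ec2.terminate_instances(InstanceIds=[instance_id])`.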

Amazon ML Solutions Lab
• Lots of companies doing machine learning
• Lack ML expertise
• Unable to unlock business potential

Amazon ML Solutions Lab provides the missing ML expertise:
• Brainstorming
• Modeling
• Education

Amazon ML Lab Customers
Johnson & Johnson
Toyota Research Institute
Washington Post

The Machine Learning Process
Business Problem → ML Problem Framing → Data Collection → Data Integration → Data Preparation → Data Visualization & Analysis → Feature Engineering → Model Training & Parameter Tuning → Model Evaluation → Are business goals met?
• No: Data Augmentation and Feature Augmentation, then iterate
• Yes: Model Deployment, then Predictions, with Monitoring & Debugging

The ML Process
Integration: The Data Architecture

Build the data platform:
• Amazon Simple Storage Service (Amazon S3)
• Amazon Athena
• Amazon Redshift
• AWS Glue
• Amazon EMR
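Amazon Athena runs SQL directly against data in Amazon S3. This minimal sketch submits a query and waits for it to finish. It assumes a boto3 Athena client. You supply your own database name and S3 output location. Polling in a loop is one simple approach, not the only one.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a SQL query with Amazon Athena; return (final_state, query_execution_id).

    `athena` is expected to be a boto3 Athena client; `output_location` is an
    s3:// prefix where Athena writes the result files.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state, query_id
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

Example: `run_athena_query(boto3.client("athena"), "SELECT * FROM events LIMIT 10", "analytics", "s3://my-bucket/athena-results/")`. The table, database, and bucket names are hypothetical.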

The ML Process
Model Training: Undifferentiated Heavy Lifting
• Set up and manage notebook environments
• Set up and manage training clusters
• Write data connectors
• Scale ML algorithms to large datasets
• Distribute ML training algorithms to multiple machines
• Secure model artifacts
The ML Process
DevOps: Undifferentiated Heavy Lifting
• Set up and manage inference clusters
• Manage and scale model inference APIs
• Monitor and debug model predictions
• Model versioning and performance tracking
• Automate promotion of new model versions to production (A/B testing)

Why Amazon SageMaker?
You Only Have to Write Business Logic

A Fully-Dockerized Lifecycle
From Discovery to Development and Deployment
• Data scientists: Amazon SageMaker runs the training algorithm, which is a container image stored in Amazon Elastic Container Registry (Amazon ECR), on training data from Amazon S3. It writes the resulting model artifacts back to Amazon S3. The inference engine is also a container image in Amazon ECR.
• Developers and operations: the model artifacts and the inference engine are deployed together behind an Amazon SageMaker endpoint.
• Delighted customers: requests reach the predictive model through Amazon API Gateway, which handles identification, authorization, logging, and analytics.
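The data scientist's part of this lifecycle maps to a single API call. This sketch builds a request for the SageMaker `CreateTrainingJob` API. Every value in it is an assumed placeholder that you replace: the job name, image URI, role ARN, S3 URIs, instance type, volume size, and one-hour time limit.

```python
def training_job_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri,
                         instance_type="ml.m4.xlarge", instance_count=1):
    """Build keyword arguments for SageMaker's CreateTrainingJob API.

    Training data is read from S3, the algorithm is a container image in ECR,
    and the model artifacts are written back to S3.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # algorithm container in Amazon ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,  # IAM role SageMaker assumes to read and write S3
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},  # model artifacts land here
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```

Submit the request with `boto3.client("sagemaker").create_training_job(**request)`. When the job completes, the model artifacts appear under the S3 output path.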

Launch Customers

Intuit
DigitalGlobe
ZipRecruiter
Hotels.com
Thomson Reuters

Customer Example: Intuit

INTUIT

“With Amazon SageMaker, we can
accelerate our Artificial Intelligence
initiatives at scale by building and
deploying our algorithms on the
platform. We will create novel large-scale machine learning and AI
algorithms and deploy them on this
platform to solve complex problems
that can power prosperity for our
customers.”

Ashok Srivastava, Chief Data
Officer, Intuit

Key Benefits of Amazon SageMaker at Intuit

From → To
• Ad-hoc setup and management of notebook environments → Easy data exploration in SageMaker notebooks
• Limited choices for model deployment → Building for flexibility around virtualization
• Competing for compute resources across teams → Auto-scalable model hosting environment

Fraud Detection using SageMaker (Intuit)
Data Collection (Apache Kafka & Spark Streaming: Reader, Lookup, Cleansing Processor) → Calculate Features (Amazon EMR) → Feature Store → Model Training (Amazon SageMaker) → Model Hosting (Amazon SageMaker) → Client Service

Customer Example: DigitalGlobe

DigitalGlobe

“As the world’s leading provider of high-resolution Earth
imagery, data and analysis, DigitalGlobe works with enormous
amounts of data every day. DigitalGlobe is making it easier for
people to find, access, and run compute against our entire
100PB image library, which is stored in AWS’s cloud, to apply
deep learning to satellite imagery. We plan to use Amazon
SageMaker to train models against petabytes of Earth
observation imagery datasets using hosted Jupyter notebooks,
so DigitalGlobe's Geospatial Big Data Platform (GBDX) users
can just push a button, create a model, and deploy it all within
one scalable distributed environment at scale.”

Dr. Walter Scott, CTO of Maxar Technologies and
founder of DigitalGlobe

Customer Example: ZipRecruiter

ZipRecruiter

“We’re focused on making it faster and easier than ever
to hire and get hired, training our machine learning
algorithms against hundreds of millions of historical
transactional activities in order to deliver highly relevant
job matches as quickly as possible. Amazon SageMaker
provided us with an answer to problems we had with ML
workflow management, allowing us to train, evaluate
and deploy models in a flexible way. In addition,
Amazon SageMaker's modularity provides the ability to
build and create models independently, which is a
compelling feature for ZipRecruiter.”

Avi Golan, VP of Engineering, ZipRecruiter

Amazon SageMaker’s Components
1. Notebook Instances
2. Algorithms
3. ML Training Service
4. ML Hosting Service

SageMaker Notebook Instances
Zero Setup for Exploratory Data Analysis
Just add data!
• Authoring & Notebooks
• Access to S3 Data Lake
• ETL Access to AWS Database services

Use cases:
• Recommendations/Personalization
• Fraud Detection
• Forecasting
• Image Classification
• Churn Prediction
• Marketing Email/Campaign Targeting
• Log processing and anomaly detection
• Speech to Text
• More…

Demo 4: A simple Jupyter Notebook

Pythagorean Theorem
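This guide does not reproduce the demo notebook. A hypothetical first cell in the same spirit uses the Pythagorean theorem, c = sqrt(a² + b²), to compute the hypotenuse of a right triangle:

```python
import math


def hypotenuse(a, b):
    """Length of the hypotenuse of a right triangle with legs a and b."""
    return math.sqrt(a ** 2 + b ** 2)


print(hypotenuse(3, 4))  # 5.0
```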


Demo 5: Predicting AWS Spot Pricing
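This guide does not describe the demo's forecasting model. As a deliberately naive baseline, this sketch forecasts the next price as the mean of the most recent observations. It assumes price records shaped like those returned by the EC2 `DescribeSpotPriceHistory` API. `moving_average_forecast` and its window size are illustrative choices.

```python
def moving_average_forecast(history, window=3):
    """Naive next-step forecast: the mean of the `window` most recent spot prices.

    `history` follows the shape of SpotPriceHistory records from the EC2
    DescribeSpotPriceHistory API: dicts with a "Timestamp" and a "SpotPrice" string.
    """
    if not history:
        raise ValueError("history is empty")
    # Sort oldest to newest so the slice below picks the latest records.
    ordered = sorted(history, key=lambda record: record["Timestamp"])
    recent = [float(record["SpotPrice"]) for record in ordered[-window:]]
    return sum(recent) / len(recent)
```

Records can come from `boto3.client("ec2").describe_spot_price_history(InstanceTypes=["p3.2xlarge"], ProductDescriptions=["Linux/UNIX"])["SpotPriceHistory"]`. Prices are reported per Availability Zone, so a real model would at least group records by zone.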


SageMaker Built-in Algorithms
10x Faster
• Streaming datasets, for cheaper training
• Train faster, in a single pass
• Greater reliability on extremely large datasets
• Choice of several ML algorithms

SageMaker Built-in Algorithms
[Figure: Time vs. Money. Training cost ($ to $$$$) against time (minutes to months) for a single machine and for distributed training with strong machines.]

SageMaker Built-in Algorithms
[Figure: Memory and Time/Cost as functions of Data Size, with streaming.]

SageMaker Built-in Algorithms: Streaming
[Diagram: data streams through a GPU that holds the algorithm's state.]
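Streaming means the algorithm reads each record once and keeps a state whose size does not grow with the data. As an illustration of that idea only, Welford's algorithm keeps the mean and variance of a stream in three numbers. It is a textbook technique, not SageMaker's internal code:

```python
class StreamingStats:
    """Single-pass mean and variance (Welford's algorithm) with constant memory.

    The state is three numbers no matter how much data streams through.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses both the old and the updated mean

    def variance(self):
        # Population variance; 0.0 before any data arrives.
        return self.m2 / self.count if self.count else 0.0
```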

SageMaker Built-In Algorithms: Distributed Shared State
[Diagram: several GPUs, each with its own local state, connected to a shared state.]
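With distributed shared state, each worker summarizes its shard into a local state, and the local states are merged into one shared state. This single-machine sketch shows the pattern. It assumes the state is a (count, mean, M2) summary, so the merge can use the standard parallel-variance formula. The slide does not say what state SageMaker's algorithms actually share.

```python
from functools import reduce


def local_state(values):
    """Summarize one worker's shard as (count, mean, M2); M2 is the sum of squared deviations."""
    count = len(values)
    mean = sum(values) / count if count else 0.0
    m2 = sum((v - mean) ** 2 for v in values)
    return count, mean, m2


def merge_states(a, b):
    """Combine two local states into one (Chan et al.'s parallel update)."""
    count_a, mean_a, m2_a = a
    count_b, mean_b, m2_b = b
    count = count_a + count_b
    if count == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * count_b / count
    m2 = m2_a + m2_b + delta ** 2 * count_a * count_b / count
    return count, mean, m2


def combine_shards(shards):
    """Summarize each shard locally, then fold the local states into one shared state."""
    return reduce(merge_states, (local_state(shard) for shard in shards), (0, 0.0, 0.0))
```

The merged result matches a single pass over all the data, so workers never need to exchange raw records.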

SageMaker Built-in Algorithms
[Figure: Cost vs. Time. Cost ($ to $$$$) against time (minutes to months) for Amazon SageMaker and the best alternative.]

Infinitely Scalable ML Algorithms

Linear Learner
[Figure: Classification (F1 score) and regression (mean squared error), SageMaker vs. other, on several benchmark datasets; cost in dollars vs. billable time in minutes on 30 GB datasets for web-spam and web-url classification (series: sagemaker-url, sagemaker-spam, other-url, other-spam).]
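The two quality metrics in these charts are simple to compute. This minimal sketch uses binary labels encoded as 0/1. These are the standard definitions, not code taken from SageMaker:

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```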

Factorization Machines
Click prediction, 1 TB advertising dataset, m4.4xlarge machines, perfect scaling.

Model             Log loss   F1 score   Seconds
SageMaker         0.494      0.277      820
Other (10 iter)   0.516      0.190      650
Other (20 iter)   0.507      0.254      1300
Other (50 iter)   0.481      0.313      3250

[Figure: Cost in dollars vs. billable time in hours for SageMaker on 10, 20, 30, 40, and 50 machines.]
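A factorization machine scores a feature vector x as ŷ = w0 + Σ_i w_i·x_i + Σ_{i<j} ⟨v_i, v_j⟩·x_i·x_j, where each feature i has a k-dimensional latent vector v_i. The slide does not state this formula; it is assumed here from the standard factorization machine formulation. The pairwise term can be rewritten as ½ Σ_f [(Σ_i v_{i,f}·x_i)² - Σ_i v_{i,f}²·x_i²], which costs O(k·n) instead of O(k·n²). The sketch uses that identity and reports binary log loss after a sigmoid:

```python
import math


def fm_score(x, w0, w, v):
    """Factorization machine raw score using the O(k*n) pairwise identity.

    x: feature values (length n); w0: global bias; w: per-feature weights (length n);
    v: latent factors, where v[i] is a list of k floats for feature i.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))             # sum_i v_if * x_i
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))   # sum_i (v_if * x_i)^2
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def log_loss(label, score):
    """Binary log loss for a 0/1 label, after mapping the raw score through a sigmoid."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))
```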

K-Means Clustering
[Figure: SageMaker vs. other on Text (1.2 GB), Images (9 GB), Videos (27 GB), Advertising (127 GB), and Synthetic (1100 GB) datasets at k = 10, 100, and 500; the other implementation failed on 7 of the 15 runs. Chart: Running Time vs. Number of Clusters (billable time in minutes, 10 to 500 clusters), ~10x faster!]
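These benchmarks use the usual k-means objective, which places each point in the cluster with the nearest center. To make that concrete, this small sketch implements the classic batch procedure (Lloyd's algorithm). SageMaker's built-in implementation is designed for large, streamed datasets and is not reproduced here. The caller supplies the initial centers.

```python
def kmeans(points, initial_centers, iterations=10):
    """Lloyd's algorithm: alternate assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign the point to the center with the smallest squared distance.
            nearest = min(range(len(centers)),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # leave a center in place if no points were assigned to it
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers
```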

Principal Component Analysis (PCA)
More than 10x faster at a fraction of the cost!
[Figure: Cost vs. Time (cost in dollars vs. billable time in minutes) and Throughput and Scalability (Mb/sec/machine vs. number of machines) for other, sagemaker-deterministic, and sagemaker-randomized.]
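PCA finds the directions along which the data vary most. The slide does not explain how the deterministic and randomized modes work. As a generic illustration only, power iteration on the sample covariance matrix recovers the first principal component of a small dataset. This assumes the data points are not all identical, since identical points would make the covariance zero:

```python
import math


def first_principal_component(rows, iterations=100):
    """Top principal component of a small dataset by power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)] for a in range(d)]
    vec = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance and renormalize; converges to the top eigenvector.
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec
```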

Neural Topic Modeling
Architecture: input term counts vector → encoder (feedforward net) → document posterior → sampled document representation → decoder (softmax) → output term counts vector
[Figure: Perplexity vs. Number of Topics (0 to 200) for NTM and other.]
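Perplexity, the metric in this chart, is the exponential of the negative average log-likelihood that a model assigns to held-out words. Lower means the model is less "surprised" by the text. This minimal sketch implements the standard definition and assumes per-word natural-log probabilities are already available:

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-average log-likelihood per token); lower is better.

    token_log_probs: natural-log probabilities the model assigns to each held-out word.
    """
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```

A model that spreads probability uniformly over four words has a perplexity of 4.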

DeepAR
Mean absolute percentage error (MAPE) and P90 loss, DeepAR vs. R:

Dataset                                                   MAPE (DeepAR / R)   P90 loss (DeepAR / R)
Traffic: hourly occupancy rate of 963 Bay Area freeways   0.14 / 0.27         0.13 / 0.24
Electricity: electricity use of 370 homes over time       0.07 / 0.11         0.08 / 0.09
Page views: page view hits of websites (10k)              0.32 / 0.32         0.44 / 0.31
Page views: page view hits of websites (180k)             0.32 / 0.34         0.29 / NA

One hour on p2.xlarge, $1
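This minimal sketch computes the two error measures in the table. MAPE is the average of |actual - forecast| / |actual|. The P90 loss is assumed here to be the quantile (pinball) loss at τ = 0.9, normalized by 2 / Σ|actual|. The slide does not state its exact normalization.

```python
def mape(actual, forecast):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def quantile_loss(actual, forecast, tau=0.9):
    """Normalized quantile (pinball) loss at quantile tau, e.g. tau=0.9 for a P90 forecast.

    Under-forecasts are weighted by tau and over-forecasts by (1 - tau); the sum
    is scaled by 2 / sum(|actual|).
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        if a >= f:
            total += tau * (a - f)        # forecast too low
        else:
            total += (1 - tau) * (f - a)  # forecast too high
    return 2 * total / sum(abs(a) for a in actual)
```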

More Great ML Algorithms

Spectral LDA
[Figure: Training Time vs. Number of Topics (training time in minutes vs. number of topics, 0 to 100) for lda-data-a, lda-data-b, other-data-a, and other-data-b.]

Boosted Decision Trees
XGBoost
It is now available in Amazon SageMaker! XGBoost is one of the most commonly used implementations of boosted decision trees in the world.
[Figure: Throughput vs. Number of Machines (throughput in MB/sec vs. number of C4.8xlarge machines, 0 to 70).]

Sequence to Sequence
• Supports both RNNs and CNNs as encoder/decoder
• Based on Sockeye and Apache MXNet (incubating), multi-GPU, and can be used for neural machine translation
[Figure: English-German translation, BLEU score vs. billable time in hours on P2.x, P2.8x, and P2.16x instances; best known result!]

Image Classification
• Transfer learning: begin with a model already trained on ImageNet!
• Implementation of ResNet in MXNet. Other networks such as DenseNet and Inception will be added in the future.
[Figure: Speedup with Horizontal Scaling (speedup vs. number of P2 machines, 1 to 5).]

Amazon SageMaker Built-In Algorithms 10x Better
Training code options:
• Amazon-Provided Algorithms: Matrix Factorization, Regression, Principal Component Analysis, K-Means Clustering, Gradient Boosted Trees, and more!
• Bring Your Own Script (Amazon SageMaker builds the container): MXNet, TensorFlow
• Apache Spark: Amazon SageMaker Estimators in Apache Spark
• Bring Your Own Algorithm (you build the container)

Amazon SageMaker Built-In Algorithms
Managed Distributed Training with Flexibility
Amazon SageMaker fetches the training data, runs the training code, saves the model artifacts, and saves the inference image to Amazon ECR. Training is fully managed and secured. The training code can be an Amazon-provided algorithm, your own MXNet or TensorFlow script, an Amazon SageMaker Estimator in Apache Spark, or your own algorithm container.

Demo 6: Using Amazon SageMaker Built-in Algorithms


Amazon SageMaker Hosting Service
Easy Model Deployment to Amazon SageMaker
• Auto-Scaling Inference APIs
• A/B Testing (more to come)
• Low Latency & High Throughput
• Bring Your Own Model
• Python SDK

Example endpoint configuration:
    InstanceType: c3.4xlarge
    InitialInstanceCount: 3
    ModelName: prod
    VariantName: primary
    InitialVariantWeight: 100

[Diagram: an inference endpoint created from an endpoint configuration and model versions stored in Amazon ECR, splitting traffic between models (100%, or 80% / 20%), with ground truth in Amazon S3.]
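The example endpoint configuration has one variant that receives all traffic. An A/B test adds a second production variant and splits traffic between them by weight. This minimal sketch builds the corresponding `CreateEndpointConfig` request. The model and variant names are hypothetical. SageMaker hosting instance types carry an `ml.` prefix, so the sketch uses `ml.c4.xlarge` as an example instance type.

```python
def endpoint_config_request(config_name, variants, instance_type="ml.c4.xlarge"):
    """Build CreateEndpointConfig keyword arguments with one production variant per model.

    variants: list of (variant_name, model_name, instance_count, weight) tuples.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": name,
                "ModelName": model,
                "InitialInstanceCount": count,
                "InstanceType": instance_type,
                "InitialVariantWeight": float(weight),
            }
            for name, model, count, weight in variants
        ],
    }


def traffic_shares(request):
    """Fraction of requests each variant receives: its weight divided by the sum of weights."""
    variants = request["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}
```

To deploy, call `sm.create_endpoint_config(**request)` and then `sm.create_endpoint(EndpointName="ab-test", EndpointConfigName=request["EndpointConfigName"])`, where `sm = boto3.client("sagemaker")`. All names are placeholders. With an 80/20 weighting, about 80% of requests go to the primary variant.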

Demo 7: Analyzing Breast Cancer Datasets
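Built-in algorithms that accept CSV training data expect the label in the first column and no header row. This sketch shows that preparation step for a breast cancer dataset. It assumes records with a `diagnosis` field coded `M` (malignant) or `B` (benign). The field names are illustrative; match them to the actual dataset's columns.

```python
import csv
import io


def to_sagemaker_csv(records, label_key="diagnosis", positive="M"):
    """Write records as headerless CSV with a 0/1 label in the first column.

    The label is 1 for the positive class (e.g. malignant "M") and 0 otherwise;
    the remaining fields follow in their original order.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        label = 1 if record[label_key] == positive else 0
        features = [value for key, value in record.items() if key != label_key]
        writer.writerow([label] + features)
    return buffer.getvalue()
```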


Demo 8: Using Containers with Amazon SageMaker
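A bring-your-own container has to follow SageMaker's file layout. For training, SageMaker runs the image with the argument `train`. It places hyperparameters in /opt/ml/input/config/hyperparameters.json and each data channel under /opt/ml/input/data/<channel>/. It then uploads whatever the container writes to /opt/ml/model. This minimal sketch covers the reading side. The root directory is a parameter so you can try it locally. `load_training_inputs` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_training_inputs(prefix="/opt/ml"):
    """Read what SageMaker provides to a bring-your-own training container.

    Returns (hyperparameters, channels, model_dir). Hyperparameter values arrive
    as strings; channels maps each channel name to its sorted file paths.
    """
    root = Path(prefix)
    hyperparameters = json.loads((root / "input/config/hyperparameters.json").read_text())
    data_dir = root / "input" / "data"
    channels = {}
    if data_dir.is_dir():
        for channel in sorted(data_dir.iterdir()):
            if channel.is_dir():
                channels[channel.name] = sorted(str(f) for f in channel.iterdir())
    # Anything saved here is uploaded to S3 as the model artifact when the job ends.
    return hyperparameters, channels, root / "model"
```

For hosting, SageMaker starts the same image with `serve`. The container must then answer `GET /ping` and `POST /invocations` on port 8080.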


Amazon SageMaker Reference Architecture
• Build: Amazon SageMaker Notebooks, with data in Amazon S3
• Train: Amazon SageMaker Training runs the training algorithm on the COCO dataset
• Deploy: AWS CodeCommit, AWS CodePipeline, Amazon ECR, and Amazon SageMaker Hosting
• Inference requests: a static website hosted on Amazon S3, web assets on Amazon CloudFront, Amazon API Gateway, and AWS Lambda
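In this architecture, inference requests reach the model through Amazon API Gateway and AWS Lambda. This minimal sketch of such a Lambda function assumes the API Gateway Lambda proxy integration, which delivers the request body as `event["body"]`. It also assumes CSV input and a hypothetical endpoint name.

```python
import json


def make_handler(runtime, endpoint_name):
    """Create a Lambda handler that forwards an API Gateway request body to a
    SageMaker endpoint and returns the prediction as JSON.

    `runtime` is expected to be a boto3 "sagemaker-runtime" client.
    """
    def handler(event, context):
        # With the Lambda proxy integration, the request body arrives as a string.
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="text/csv",
            Body=event["body"],
        )
        prediction = response["Body"].read().decode("utf-8")
        return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
    return handler
```

In a deployed function, create the handler once when the module loads, for example `handler = make_handler(boto3.client("sagemaker-runtime"), os.environ["ENDPOINT_NAME"])`, and set `ENDPOINT_NAME` as an environment variable.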

Technology Competency Partners
Categories: Data Services; Platform Solutions; SaaS and API Solutions

Alteryx, Bonsai, DataRobot, Anodot, SigOpt, CrowdFlower, C3 IoT, Domino Data Lab, Luminoso, Veritone, Paxata, Databricks, H2O.ai, Narrative Science, x.ai, Trifacta, Dataiku

Call To Action
• Getting started with Amazon SageMaker:
https://aws.amazon.com/sagemaker/
• Use the Amazon SageMaker SDK:
   • For Python: https://github.com/aws/sagemaker-python-sdk
   • For Spark: https://github.com/aws/sagemaker-spark
• SageMaker Examples: https://github.com/awslabs/amazon-sagemaker-examples

Thank You

© 2018 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission
from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. Corrections or feedback on the course, please email us at: aws-coursefeedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.7
Linearized                      : Yes
Author                          : vkkasibh
Create Date                     : 2018:10:03 19:39:35-04:00
Modify Date                     : 2018:11:16 07:27:33-05:00
XMP Toolkit                     : Adobe XMP Core 5.6-c015 91.163280, 2018/06/22-11:31:03
Format                          : application/pdf
Creator                         : vkkasibh
Title                           : Microsoft PowerPoint - STP-ML on AWS-Technical Final
Metadata Date                   : 2018:11:16 07:27:33-05:00
Producer                        : Microsoft: Print To PDF
Document ID                     : uuid:bb8b9233-f3a9-4310-b4be-6671243dfa08
Instance ID                     : uuid:977f635d-9c74-431f-b353-97cea25adca9
Page Count                      : 70