STP ML on AWS (Technical): Participant Guide
Page Count: 70
Machine Learning on AWS - Technical
AWS Solutions Training for Partners
Vijay, AWS Partner Trainer

© 2018 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon confidential.

Machine Learning
Prediction is the process of filling in missing information; it uses data you have to generate data you don’t have.

The Artificial Intelligence Landscape
• Learning
• Language
• Perception
• Problem Solving
• Insight

Five Tribes / Two Breakthroughs
Tribe            Origins        Algorithm
Bayesians        Statistics     Probabilistic
Analogizers      Psychology     Kernel
Symbolists       Logic          Inverse Deduction
Evolutionaries   Biology        Genetic
Connectionists   Neuroscience   Back Propagation
• Computer Vision: CNNs (Static / Unstructured)
• Natural Language Processing: RNNs / LSTM (Sequential / Structured)

AWS Mission
Put machine learning in the hands of every developer, data scientist and architect

Customers Running ML on AWS
The AWS Machine Learning Stack
Application Services
• Vision: Amazon Rekognition Image, Amazon Rekognition Video
• Speech: Amazon Polly, Transcribe
• Language: Lex, Translate, Comprehend
Platform Services
• Amazon SageMaker
• Amazon Machine Learning
• AWS DeepLens
• Amazon EMR, Spark
• Amazon Mechanical Turk
Frameworks & Infrastructure
• AWS Deep Learning AMI: TensorFlow, Apache MXNet, Gluon, Cognitive Toolkit, Caffe, Keras, PyTorch, Chainer
• Compute: GPU - P3
• AWS Greengrass
• Mobile

Demo 1: Vision Services

Demo 2: Language Services
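The deck does not record what Demos 1 and 2 actually ran, so the following is only a minimal sketch of how the Vision and Language application services are typically called from Python with boto3. The bucket, object key, and sample text are hypothetical, and `run_demo` needs AWS credentials, a configured region, and an existing S3 object; the small helper functions work without any of that.

```python
from typing import Any, Dict, List


def rekognition_request(bucket: str, key: str, max_labels: int = 5) -> Dict[str, Any]:
    """Build keyword arguments for Rekognition's DetectLabels operation."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,
    }


def comprehend_request(text: str, language: str = "en") -> Dict[str, Any]:
    """Build keyword arguments for Comprehend's DetectSentiment operation."""
    return {"Text": text, "LanguageCode": language}


def label_names(response: Dict[str, Any], min_confidence: float = 80.0) -> List[str]:
    """Keep only the label names at or above a confidence threshold."""
    return [
        label["Name"]
        for label in response.get("Labels", [])
        if label.get("Confidence", 0.0) >= min_confidence
    ]


def run_demo(bucket: str, key: str, text: str) -> None:
    """Call both services (hypothetical inputs; requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without boto3

    rekognition = boto3.client("rekognition")
    comprehend = boto3.client("comprehend")
    labels = rekognition.detect_labels(**rekognition_request(bucket, key))
    sentiment = comprehend.detect_sentiment(**comprehend_request(text))
    print(label_names(labels), sentiment["Sentiment"])
```

Separating the request builders from the network calls keeps the sketch easy to check offline; only `run_demo` touches AWS.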
Demo 3: Deep Learning AMI

Amazon EC2 P3 Instances
The fastest, most powerful GPU instances in the cloud
• Up to 8 NVIDIA Tesla V100 GPUs
• 1 PetaFLOPs of computational performance
• 300 GB/s GPU-to-GPU communication (NVLink)
• 16 GB GPU memory with 900 GB/s peak bandwidth
• 14x better than P2
• 9x better than P2
Customers: Airbnb, Toyota Research Institute, OpenAI
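The "1 PetaFLOPs" figure on the P3 slide can be sanity-checked with simple arithmetic. The sketch below assumes NVIDIA's published peak of roughly 125 TFLOPS of mixed-precision Tensor Core throughput per Tesla V100; that per-GPU number is not in the deck and is an assumption here, as is reading the 16 GB memory figure as per GPU.

```python
# Assumption (not from the deck): peak mixed-precision Tensor Core throughput
# of a single Tesla V100, as published by NVIDIA, is about 125 TFLOPS.
V100_TENSOR_TFLOPS = 125.0

GPUS_PER_INSTANCE = 8   # slide: "Up to 8 NVIDIA Tesla V100 GPUs"
GPU_MEMORY_GB = 16      # slide: "16 GB GPU memory", read here as per GPU


def aggregate(gpus: int = GPUS_PER_INSTANCE) -> dict:
    """Sum per-GPU peak figures across all GPUs in one instance."""
    return {
        "peak_pflops": gpus * V100_TENSOR_TFLOPS / 1000.0,
        "gpu_memory_gb": gpus * GPU_MEMORY_GB,
    }


if __name__ == "__main__":
    print(aggregate())
```

Under these assumptions, eight GPUs give 1.0 PFLOPS of peak throughput and 128 GB of total GPU memory. These are peak figures, not sustained training performance.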
AWS Deep Learning Amazon Machine Image (AMI)
• Get started quickly with easy-to-launch tutorials
• Hassle-free setup and configuration
• Pay only for what you use; no additional charge for the AMI
• Accelerate your model training and deployment
• Support for popular deep learning frameworks: TensorFlow, MXNet, Gluon, Keras, Caffe2, PyTorch, etc.
Customers: Zendesk, Matrix Analytics, SCDM

Amazon ML Solutions Lab
• Lots of companies doing Machine Learning
• Lack ML expertise
• Unable to unlock business potential
Amazon ML Solutions Lab provides the missing ML expertise:
• Brainstorming
• Modeling
• Education

Amazon ML Lab Customers
• Johnson & Johnson
• Toyota Research Institute
• Washington Post

The Machine Learning Process
[Diagram labels: Business Problem; ML Problem Framing; Data Collection; Data Integration; Data Preparation; Data Visualization & Analysis; Feature Engineering; Model Training & Parameter Tuning; Model Evaluation; Are business goals met? (Yes / No); Data Augmentation; Feature Augmentation; Model Deployment; Predictions; Monitoring & Debugging]

The ML Process
Integration: The Data Architecture
Build the data platform:
• Amazon Simple Storage Service (Amazon S3)
• Amazon Athena
• Amazon Redshift
• AWS Glue
• Amazon EMR

The ML Process
Model Training: Undifferentiated Heavy Lifting
• Set up and manage:
  • Notebook environments
  • Training clusters
• Write data connectors
• Scale ML algorithms to large datasets
• Distribute ML training algorithms to multiple machines
• Secure model artifacts

The ML Process
DevOps: Undifferentiated Heavy Lifting
• Set up and manage inference clusters
• Manage and scale model inference APIs
• Monitor and debug model predictions
• Model versioning and performance tracking
• Automate promotion of new model versions to production (A/B testing)

Why Amazon SageMaker?
You Only Have to Write Business Logic

Amazon SageMaker

A Fully-Dockerized Lifecycle
From discovery to development and deployment
• Data Scientists
  [Diagram labels: Amazon SageMaker; Amazon S3 (Training Data, Model Artifacts); Amazon Elastic Container Registry (Training Algorithm, Inference Engine)]
• Developers and Operations
  [Diagram labels: as above, plus EndPoint]
• Delighted Customers
  [Diagram labels: API Gateway (Identification, Authorization, Logging, Analytics); Predictive Model]

Amazon SageMaker Launch Customers
• Intuit
• DigitalGlobe
• ZipRecruiter
• Hotels.com
• Thomson Reuters

Customer Example: Intuit
“With Amazon SageMaker, we can accelerate our Artificial Intelligence initiatives at scale by building and deploying our algorithms on the platform. We will create novel large-scale machine learning and AI algorithms and deploy them on this platform to solve complex problems that can power prosperity for our customers.”
Ashok Srivastava, Chief Data Officer, Intuit
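Returning to the Fully-Dockerized Lifecycle slides: they pair training data and model artifacts in Amazon S3 with a training algorithm image in Amazon Elastic Container Registry. One way to picture how those pieces meet is the request passed to SageMaker's `CreateTrainingJob` operation. The sketch below only builds that request; every name, URI, ARN, and instance choice in it is a hypothetical placeholder, and `submit` requires AWS credentials.

```python
from typing import Any, Dict


def training_job_request(
    job_name: str,
    image_uri: str,        # training algorithm image stored in Amazon ECR
    role_arn: str,         # IAM role SageMaker assumes during training
    train_s3_uri: str,     # training data prefix in Amazon S3
    output_s3_uri: str,    # where model artifacts are written in Amazon S3
    instance_type: str = "ml.m4.xlarge",   # assumed instance type
    instance_count: int = 1,
) -> Dict[str, Any]:
    """Assemble keyword arguments for SageMaker's CreateTrainingJob."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def submit(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to SageMaker (requires AWS credentials and a region)."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("sagemaker").create_training_job(**request)
```

The request makes the division of storage explicit: the container image comes from ECR, the inputs and outputs live in S3, and SageMaker supplies the compute in between.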
Key Benefits of Amazon SageMaker at Intuit
From                                                   To
Ad-hoc setup and management of notebook environments   Easy data exploration in SageMaker notebooks
Limited choices for model deployment                   Building around virtualization for flexibility
Competing for compute resources across teams           Auto-scalable model hosting environment

Fraud Detection using SageMaker
[Diagram labels: Data Collection; Calculate Features; Feature Store; Model Training; Model Hosting; Apache Kafka & Spark Streaming; Reader; Lookup; Cleansing; Processor; Amazon EMR; Training; Amazon SageMaker; Client Service; Intuit]

Customer Example: DigitalGlobe
“As the world’s leading provider of high-resolution Earth imagery, data and analysis, DigitalGlobe works with enormous amounts of data every day. DigitalGlobe is making it easier for people to find, access, and run compute against our entire 100PB image library, which is stored in AWS’s cloud, to apply deep learning to satellite imagery. We plan to use Amazon SageMaker to train models against petabytes of Earth observation imagery datasets using hosted Jupyter notebooks, so DigitalGlobe's Geospatial Big Data Platform (GBDX) users can just push a button, create a model, and deploy it all within one scalable distributed environment at scale.”
Dr. Walter Scott, CTO of Maxar Technologies and founder of DigitalGlobe

Customer Example: ZipRecruiter
“We’re focused on making it faster and easier than ever to hire and get hired, training our machine learning algorithms against hundreds of millions of historical transactional activities in order to deliver highly relevant job matches as quickly as possible. Amazon SageMaker provided us with an answer to problems we had with ML workflow management, allowing us to train, evaluate and deploy models in a flexible way.
In addition, Amazon SageMaker's modularity provides the ability to build and create models independently, which is a compelling feature for ZipRecruiter.”
Avi Golan, VP of Engineering, ZipRecruiter

Amazon SageMaker’s Components
1. Notebook Instances
2. Algorithms
3. ML Training Service
4. ML Hosting Service

SageMaker Notebook Instances
Zero Setup for Exploratory Data Analysis
Just add data!
• Authoring & Notebooks
• Access to S3 Data Lake
• ETL Access to AWS Database services

• Recommendations/Personalization
• Fraud Detection
• Forecasting
• Image Classification
• Churn Prediction
• Marketing Email/Campaign Targeting
• Log processing and anomaly detection
• Speech to Text
• More…

Demo 4: A simple Jupyter Notebook (Pythagorean Theorem)

Demo 5: Predicting AWS Spot Pricing

SageMaker Built-in Algorithms
10x Faster
• Streaming datasets, for cheaper training
• Train faster, in a single pass
• Greater reliability on extremely large datasets
• Choice of several ML algorithms

SageMaker Built-in Algorithms: Time vs. Money
[Figure: cost ($ to $$$$) against training time (Minutes, Hours, Days, Weeks, Months) for "Single Machine" and "Distributed, with Strong Machines"]

SageMaker Built-in Algorithms: Streaming
[Figure: Memory against Data Size, and Time/Cost against Data Size]
[Diagram labels: GPU; State]

SageMaker Built-in Algorithms: Distributed
[Diagram labels: three GPUs, each with Local State; Shared State]

SageMaker Built-in Algorithms: Cost vs.
Time
[Figure: cost ($ to $$$$) against training time (Minutes, Hours, Days, Weeks, Months) for Amazon SageMaker and the Best Alternative]

Infinitely Scalable ML Algorithms

Linear Learner
30 GB datasets for web-spam and web-url classification

Classification (F1 Score)
SageMaker   Other
0.980       0.981
0.870       0.930
0.997       0.997
0.978       0.964
0.914       0.859
0.470       0.472
0.903       0.908
0.508       0.508

Regression (mean squared error)
SageMaker   Other
1.02        1.06
1.09        1.02
0.332       0.183
0.086       0.129
83.3        84.5

[Figure: Cost in Dollars against Billable time in Minutes; series sagemaker-url, sagemaker-spam, other-url, other-spam]

Factorization Machines
Click Prediction: 1 TB advertising dataset, m4.4xlarge machines, perfect scaling.

                  Log_loss   F1 Score   Seconds
SageMaker         0.494      0.277      820
Other (10 Iter)   0.516      0.190      650
Other (20 Iter)   0.507      0.254      1300
Other (50 Iter)   0.481      0.313      3250

[Figure: Cost in Dollars against Billable Time in Hours for 10, 20, 30, 40 and 50 machines]

K-Means Clustering

Dataset             k     SageMaker   Other
Text 1.2GB          10    1.18E3      1.18E3
                    100   1.00E3      9.77E2
                    500   9.18E2      9.03E2
Images 9GB          10    3.29E2      3.28E2
                    100   2.72E2      2.71E2
                    500   2.17E2      Failed
Videos 27GB         10    2.19E2      2.18E2
                    100   2.03E2      2.02E2
                    500   1.86E2      1.85E2
Advertising 127GB   10    1.72E7      Failed
                    100   1.30E7      Failed
                    500   1.03E7      Failed
Synthetic 1100GB    10    3.81E7      Failed
                    100   3.51E7      Failed
                    500   2.81E7      Failed

[Figure: Running Time vs. Number of Clusters. Billable Time in Minutes against Number of Clusters (10, 100, 500); series sagemaker, other. ~10x Faster!]

Principal Component Analysis (PCA)
More than 10x faster at a fraction of the cost!
[Figure: Cost vs. Time. Cost in Dollars against Billable time in Minutes; series other, sagemaker-deterministic, sagemaker-randomized]
[Figure: Throughput and Scalability. Mb/Sec/Machine against Number of Machines; series other, sagemaker-deterministic, sagemaker-randomized]

Neural Topic Modeling
[Figure: Perplexity vs. Number of Topics]
[Diagram label: Output term counts vector]
[Diagram labels: Sampled Document Representation; Document Posterior; Encoder: feedforward net; Decoder: Softmax; Input term counts vector]
[Figure axes: Perplexity against Number of Topics; series NTM, Other]

DeepAR
One hour on p2.xlarge, $1

                                                                   Mean absolute percentage error   P90 Loss
Input              Description                                     DeepAR   R                       DeepAR   R
Traffic            Hourly occupancy rate of 963 Bay Area freeways  0.14     0.27                    0.13     0.24
Electricity        Electricity use of 370 homes over time          0.07     0.11                    0.08     0.09
Page views (10k)   Page view hits of websites                      0.32     0.32                    0.44     0.31
Page views (180k)  Page view hits of websites                      0.32     0.34                    0.29     NA

More Great ML Algorithms

Spectral LDA
[Figure: Training Time vs. Number of Topics. Training Time in Minutes against Number of Topics; series lda-data-a, lda-data-b, other-data-a, other-data-b]

Boosted Decision Trees
XGBoost is one of the most commonly used implementations of boosted decision trees in the world. It is now available in Amazon SageMaker!
[Figure: XGBoost Throughput vs. Number of Machines. Throughput in MB/Sec against Number of Machines (C4.8xLarge)]

Sequence to Sequence
Based on Sockeye and Apache MXNet (incubating); multi-GPU; can be used for Neural Machine Translation. Supports both RNN/CNN as encoder/decoder.
[Figure: English-German Translation. BLEU Score against Billable Time in Hours; series P2.16x, P2.8x, P2.x. Best known result!]

Image Classification
Implementation of ResNet in MXNet. Other networks such as DenseNet and Inception will be added in the future.
Transfer learning: begin with a model already trained on ImageNet!
[Figure: Speedup with Horizontal Scaling. Speedup against Number of Machines (P2)]

Amazon SageMaker Built-In Algorithms
10x Better
Managed Distributed Training with Flexibility
Training code:
• Amazon-Provided Algorithms
  • Matrix Factorization
  • Regression
  • Principal Component Analysis
  • K-Means Clustering
  • Gradient Boosted Trees
  • And More!
• Bring Your Own Script (Amazon SageMaker builds the Container): MXNet, TensorFlow
• Amazon SageMaker Estimators in Apache Spark
• Bring Your Own Algorithm (You build the Container)
[Diagram labels: Fully managed; Secured; Fetch Training data; Save Model Artifacts; Save Inference Image; Amazon ECR]

Demo 6: Using Amazon SageMaker Built-in Algorithms

Amazon SageMaker Hosting Service
Easy Model Deployment to Amazon SageMaker
• InstanceType: c3.4xlarge
• InitialInstanceCount: 3
• ModelName: prod
• VariantName: primary
• InitialVariantWeight: 100
[Diagram labels: Model Versions; Endpoint Configuration; Inference EndPoint; Amazon ECR; Amazon S3; Ground Truth; traffic shares 100%, 80%, 20%]

SageMaker Hosting Service
Easy Model Deployment to Amazon SageMaker
• Auto-Scaling Inference APIs
• A/B Testing (more to come)
• Low Latency & High Throughput
• Bring Your Own Model
• Python SDK

Demo 7: Analyzing Breast Cancer Datasets

Demo 8: Using Containers with Amazon SageMaker
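The hosting slides list the fields of a production variant (InstanceType, InitialInstanceCount, ModelName, VariantName, InitialVariantWeight) and show an 80% / 20% traffic split for A/B testing. As a rough sketch, those fields map onto the `ProductionVariants` argument of SageMaker's `CreateEndpointConfig` operation. The second variant, the model name `prod-v2`, and the `ml.c4.xlarge` instance type are assumptions made for illustration; SageMaker instance type names carry an `ml.` prefix, so the slide's `c3.4xlarge` is not used verbatim.

```python
from typing import Any, Dict, List


def production_variant(
    variant_name: str,
    model_name: str,
    weight: float,
    instance_type: str = "ml.c4.xlarge",  # assumed; not the slide's value
    instance_count: int = 3,
) -> Dict[str, Any]:
    """Describe one model variant behind an endpoint."""
    return {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InitialInstanceCount": instance_count,
        "InstanceType": instance_type,
        "InitialVariantWeight": weight,
    }


def traffic_shares(variants: List[Dict[str, Any]]) -> Dict[str, float]:
    """Each variant receives its weight divided by the sum of all weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


def create_config(name: str, variants: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Register the endpoint configuration (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without boto3

    return boto3.client("sagemaker").create_endpoint_config(
        EndpointConfigName=name, ProductionVariants=variants
    )


# Hypothetical A/B setup: the current model keeps 80% of traffic,
# a candidate model version receives the remaining 20%.
VARIANTS = [
    production_variant("primary", "prod", 80.0),
    production_variant("candidate", "prod-v2", 20.0),
]
```

With a single variant, any positive weight routes all traffic to it, which matches the slide's `InitialVariantWeight: 100` for the lone `primary` variant.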
Amazon SageMaker Reference Architecture
[Diagram labels: Build; Train; Deploy; Amazon SageMaker Notebooks; Training Algorithm; Amazon SageMaker Training; Amazon SageMaker Hosting; Amazon S3; Amazon ECR; Coco dataset; static website hosted on S3; Amazon CloudFront (web assets on CloudFront); inference requests; API Gateway; AWS Lambda; CodeCommit; CodePipeline]

Amazon SageMaker Technology Competency Partners
Categories: Data Services, Platform Solutions, SaaS and API Solutions
Partners: Alteryx, Bonsai, DataRobot, Anodot, SigOpt, CrowdFlower, C3 IoT, Domino Data Lab, Luminoso, Veritone, Paxata, Databricks, H2O.ai, Narrative Science, x.ai, Trifacta, Dataiku

Call To Action
• Getting started with Amazon SageMaker: https://aws.amazon.com/sagemaker/
• Use the Amazon SageMaker SDK:
  • For Python: https://github.com/aws/sagemaker-python-sdk
  • For Spark: https://github.com/aws/sagemaker-spark
• SageMaker Examples: https://github.com/awslabs/amazon-sagemaker-examples

Thank You

© 2018 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-coursefeedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
Source Exif Data:
File Type: PDF
File Type Extension: pdf
MIME Type: application/pdf
PDF Version: 1.7
Linearized: Yes
Author: vkkasibh
Create Date: 2018:10:03 19:39:35-04:00
Modify Date: 2018:11:16 07:27:33-05:00
XMP Toolkit: Adobe XMP Core 5.6-c015 91.163280, 2018/06/22-11:31:03
Format: application/pdf
Creator: vkkasibh
Title: Microsoft PowerPoint - STP-ML on AWS-Technical Final
Metadata Date: 2018:11:16 07:27:33-05:00
Producer: Microsoft: Print To PDF
Document ID: uuid:bb8b9233-f3a9-4310-b4be-6671243dfa08
Instance ID: uuid:977f635d-9c74-431f-b353-97cea25adca9
Page Count: 70