Difference between revisions of "AWS/Machine Learning"

From Christoph's Personal Wiki

Revision as of 23:56, 14 March 2017

This article is about Machine Learning (ML) on Amazon Web Services (AWS).

Machine Learning concepts

What is Machine Learning (ML)?
  • The basic concept of ML is to have computers or machines program themselves.
  • Machines can analyze large and complex datasets and identify patterns to create models, which are then used to predict outcomes.
  • Over time, these models can take into account new datasets and improve the accuracy of the predictions.
Examples of where ML is being used
  • Recommendations when checking out on an e-commerce site (e.g., purchases on Amazon.com)
  • Spam detection in email
  • Any kind of image, speech, or text recognition
  • Weather forecasts
  • Search engines
What is Amazon ML?
  • Amazon ML is a supervised ML service: it learns from examples or historical data.
  • An Amazon ML Model requires your dataset to have both the features and the target for each observation/record.
  • A feature is an attribute of a record used to identify patterns; typically, there will be multiple features.
  • A target is the outcome that the patterns are linked to and is the value the ML algorithm is going to predict.
  • The model uses this link between features and target to predict outcomes.
  • Example: {Go to the grocery store} {on Monday} (features) => Buy milk (target)
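As a rough illustration of the feature/target terminology, one observation from the grocery example can be pictured as a record in which some columns are features and one column is the target. The field names (`activity`, `day`, `purchase`) and the helper function are made up for this sketch and are not part of Amazon ML.

```python
# One observation/record: two features and one target.
# All field names are hypothetical, chosen to mirror the grocery example.
observation = {
    "activity": "go to the grocery store",  # feature
    "day": "Monday",                        # feature
    "purchase": "milk",                     # target (the value to predict)
}

TARGET = "purchase"


def split_features_and_target(record, target_name):
    """Separate a record into its feature dict and its target value."""
    features = {k: v for k, v in record.items() if k != target_name}
    return features, record[target_name]


features, target = split_features_and_target(observation, TARGET)
print("features:", features)
print("target:", target)
```

A training dataset is simply many such records; at prediction time the target column is absent and the model fills it in.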
Why do ML on AWS?
  • Simplifies the whole process
  • No coding required for creating models
  • Identifies the best ML algorithm to run based on the input data
  • Easily integrates into other AWS services for data retrieval
  • Deploy within minutes
  • Full access via APIs
  • Scalable
Amazon ML pricing (as of March 2017)
  • Data Analysis and Model Building fees: $0.42/hour
  • Batch predictions: $0.10 per 1,000 predictions, rounded up to the next 1,000
  • Real-time predictions: $0.0001 per prediction, rounded up to the nearest penny (plus hourly capacity reservation charge only when the endpoint is active)
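The rounding rules above are easier to follow with a small worked calculation. The sketch below uses the March 2017 rates exactly as listed and makes several assumptions: compute hours are billed without rounding, the "nearest penny" rounding applies to the total real-time charge, and the hourly capacity reservation charge is left out because its rate is not given here. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_CEILING

# Rates copied from the list above (March 2017); they may have changed since.
COMPUTE_RATE_PER_HOUR = Decimal("0.42")
BATCH_RATE_PER_1000 = Decimal("0.10")
REALTIME_RATE_PER_PREDICTION = Decimal("0.0001")


def compute_cost(hours):
    """Data analysis and model building fee (assumes no rounding of hours)."""
    return COMPUTE_RATE_PER_HOUR * Decimal(str(hours))


def batch_cost(predictions):
    """Batch fee: the prediction count is rounded up to the next 1,000."""
    blocks = (predictions + 999) // 1000  # integer ceiling division
    return BATCH_RATE_PER_1000 * blocks


def realtime_cost(predictions):
    """Real-time fee, rounded up to the next cent.

    Excludes the hourly capacity reservation charge (rate not listed above).
    """
    raw = REALTIME_RATE_PER_PREDICTION * predictions
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


print(compute_cost(2))      # 2 hours of model building
print(batch_cost(1500))     # billed as 2,000 predictions
print(realtime_cost(1234))  # 0.1234 rounded up to the next cent
```

`Decimal` is used instead of `float` so that the rounding steps are not disturbed by binary floating-point error.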
AWS ML Workflow
  1. Create a data source
    • S3 (e.g., upload a CSV file to S3)
    • RDS and Redshift (e.g., run a SQL query on a Redshift cluster and get the data back directly into ML)
  2. Identify the feature and target columns
    • Select whether the file has a header row
    • Select the correct field data types (possible types: binary, categorical, numeric, text)
    • Select the target that needs to be predicted
    • Select a Row ID, if the data has one
  3. Train a model with a part of the dataset (generally 70%)
    • By default, AWS ML takes 70% of your data and uses it to train the model
    • It also automatically chooses the ML model type, based on the data type of the target in the schema
      • Binary target => binary model
      • Numeric target => regression model
      • Categorical target => multi-class model
  4. Evaluate the model by running the remaining dataset through it
  5. Fine-tune the model
  6. Use the model for predictions
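Steps 3 and 4 of the workflow can be sketched in plain Python, without calling any AWS API. This is only an illustration of the logic described above: the function names are hypothetical, and the sketch takes the first 70% of rows in order, whereas how the service actually picks the training rows is not stated here. The `text` data type has no model type listed above, so the sketch rejects it as a target.

```python
# Target data type -> model type, as listed in step 3 above.
MODEL_TYPE_BY_TARGET_TYPE = {
    "binary": "binary",
    "numeric": "regression",
    "categorical": "multi-class",
}


def split_train_eval(rows, train_percent=70):
    """Return (training rows, evaluation rows) using a 70/30 split by default."""
    cut = len(rows) * train_percent // 100  # integer math avoids float error
    return rows[:cut], rows[cut:]


def choose_model_type(target_data_type):
    """Pick the model type from the target's data type."""
    try:
        return MODEL_TYPE_BY_TARGET_TYPE[target_data_type]
    except KeyError:
        raise ValueError(f"no model type for target type {target_data_type!r}")


rows = list(range(10))  # stand-in for 10 records of a dataset
train_rows, eval_rows = split_train_eval(rows)
print(len(train_rows), "rows for training,", len(eval_rows), "for evaluation")
print(choose_model_type("numeric"))
```

The evaluation rows are the ones the model has not seen during training, which is what makes the evaluation in step 4 a fair check of its predictions.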
Types of ML models available
  • Binary
    • The target/prediction value is a 0 or 1
    • Best used when the prediction is a Boolean or one of two possible outcomes (e.g., true/false, yes/no, green apple/red apple, etc.)
    • Examples:
      • Does an email match the spam criteria?
      • Will someone respond to a marketing email?
      • Does a purchase on a credit card seem fraudulent?
  • Multi-class
    • The target/prediction is from a set of values
    • Best used for predicting categories or types
    • Examples:
      • What is the next product a user will purchase based on his/her history of purchases?
      • Film recommendations
  • Regression
    • The target/prediction is a numeric value
    • Best used for predicting scores
    • Examples:
      • How many millimetres of rain can we expect?
      • Traffic delays
      • How many goals will my soccer team score?

External links