Difference between revisions of "AWS/Machine Learning"

From Christoph's Personal Wiki

Revision as of 00:17, 15 March 2017

This article is about Amazon Web Services (AWS) Machine Learning (ML).

Machine Learning concepts

What is Machine Learning (ML)?
  • The basic concept of ML is to have computers learn rules from data instead of being explicitly programmed with them.
  • Machines can analyze large and complex datasets and identify patterns to create models, which are then used to predict outcomes.
  • Over time, these models can take into account new datasets and improve the accuracy of the predictions.
Examples of where ML is being used
  • Recommendations when checking out on an e-commerce site (e.g., purchases on Amazon.com)
  • Spam detection in email
  • Any kind of image, speech, or text recognition
  • Weather forecasts
  • Search engines
What is Amazon ML?
  • Amazon ML is supervised ML: it learns from examples or historical data.
  • An Amazon ML Model requires your dataset to have both the features and the target for each observation/record.
  • A feature is an attribute of a record used to identify patterns; typically, there will be multiple features.
  • A target is the outcome that the patterns are linked to and is the value the ML algorithm is going to predict.
  • The model learns the link between features and target from the historical records and uses it to predict the target for new records.
  • Example: {Go to the grocery store} {on Monday} (features) => Buy milk (target)
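As a rough illustration of the feature/target terminology above, the grocery example can be written as a few records. The field names (`activity`, `day`, `bought_milk`) and the data are made up for this sketch and are not part of Amazon ML.

```python
# Hypothetical observations: every record carries its features AND a known target,
# which is what supervised learning needs for training.
records = [
    {"activity": "grocery store", "day": "Monday", "bought_milk": 1},
    {"activity": "grocery store", "day": "Friday", "bought_milk": 0},
    {"activity": "gym", "day": "Monday", "bought_milk": 0},
]

TARGET = "bought_milk"  # assumed name of the target column


def split_features_target(record, target=TARGET):
    """Return (features, target_value) for one observation."""
    features = {key: value for key, value in record.items() if key != target}
    return features, record[target]


features, target_value = split_features_target(records[0])
print(features)      # {'activity': 'grocery store', 'day': 'Monday'}
print(target_value)  # 1
```

Everything except the target column is treated as a feature here; in practice you choose which columns to include.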
Why do ML on AWS?
  • Simplifies the whole process
  • No coding required for creating models
  • Identifies the best ML algorithm to run based on the input data
  • Easily integrates into other AWS services for data retrieval
  • Deploy within minutes
  • Full access via APIs
  • Scalable
Amazon ML pricing (as of March 2017)
  • Data Analysis and Model Building fees: $0.42/hour
  • Batch predictions: $0.10 per 1,000 predictions, rounded up to the next 1,000
  • Real-time predictions: $0.0001 per prediction, rounded up to the nearest penny (plus hourly capacity reservation charge only when the endpoint is active)
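The rounding rules above are easier to follow with a worked calculation. The sketch below uses the March 2017 rates quoted in the list; it assumes the rounding is applied to the total for a billing period, and it leaves out the hourly capacity reservation charge for real-time endpoints because no rate is given here. The function names are my own.

```python
from decimal import Decimal, ROUND_CEILING

# Rates as listed above (March 2017).
COMPUTE_RATE_PER_HOUR = Decimal("0.42")
BATCH_RATE_PER_1000 = Decimal("0.10")
REALTIME_RATE_PER_PREDICTION = Decimal("0.0001")


def compute_cost(hours):
    """Data analysis and model building fee for the given compute hours."""
    return COMPUTE_RATE_PER_HOUR * Decimal(str(hours))


def batch_cost(n_predictions):
    """Batch fee: the prediction count is rounded up to the next 1,000."""
    blocks = (n_predictions + 999) // 1000  # integer ceiling division
    return BATCH_RATE_PER_1000 * blocks


def realtime_cost(n_predictions):
    """Real-time fee, rounded up to the nearest cent (assumed: on the total)."""
    raw = REALTIME_RATE_PER_PREDICTION * n_predictions
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


print(compute_cost(2.5))    # 1.050
print(batch_cost(1500))     # 0.20  (billed as 2,000 predictions)
print(realtime_cost(1234))  # 0.13  (0.1234 rounded up)
```

`Decimal` is used instead of floats so that the cent rounding is exact.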
AWS ML Workflow
  1. Create a data source
    • S3 (i.e., upload a CSV file to S3)
    • RDS and Redshift (e.g., run a SQL query on a Redshift cluster and get the data back directly into ML)
  2. Identify the feature and target columns
    • Select whether the file has a header row
    • Select the correct field data types (possible types: binary, categorical, numeric, text)
    • Select the target that needs to be predicted
    • Select a Row ID, if the data has one
  3. Train a model with a part of the dataset (generally 70%)
    • By default, AWS ML takes 70% of your data and uses it to train the model
    • It also automatically decides the best ML Model algorithm to use, based on the data schema
      • Binary target => binary model
      • Numeric target => regression model
      • Categorical target => multi-class model
  4. Evaluate the model by running the remaining dataset through it
    • When you create the model in the console, AWS ML automatically evaluates it against the data source for you
    • If using the API, you have to create the evaluation as a separate step
  5. Fine-tune the model
  6. Use the model for predictions
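Steps 3 and 4 of the workflow above can be sketched in plain Python. This is not the Amazon ML API, only an illustration of the 70/30 split and of the target-type-to-model-type mapping listed in step 3. The function names are hypothetical, and the in-order split is an assumption of the sketch.

```python
# Mapping from target data type to model type, as listed in step 3.
MODEL_TYPE_BY_TARGET = {
    "binary": "binary",
    "numeric": "regression",
    "categorical": "multi-class",
}


def choose_model_type(target_data_type):
    """Pick the model type from the data type of the target column."""
    try:
        return MODEL_TYPE_BY_TARGET[target_data_type]
    except KeyError:
        raise ValueError(f"unsupported target type: {target_data_type!r}")


def split_dataset(rows, train_percent=70):
    """Split rows in order: the first part trains, the remainder evaluates."""
    cut = len(rows) * train_percent // 100  # integer math avoids float rounding
    return rows[:cut], rows[cut:]


rows = list(range(10))  # stand-in for 10 records
train, evaluation = split_dataset(rows)
print(len(train), len(evaluation))   # 7 3
print(choose_model_type("numeric"))  # regression
```

The evaluation rows are never shown to the model during training, which is what makes the evaluation in step 4 meaningful.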
Types of ML models available
  • Binary
    • The target/prediction value is a 0 or 1
    • Best used when the prediction is a Boolean or one of two possible outcomes (e.g., true/false, yes/no, green apple/red apple, etc.)
    • Examples:
      • Does an email match the spam criteria?
      • Will someone respond to a marketing email?
      • Does a purchase on a credit card seem fraudulent?
  • Multi-class
    • The target/prediction is from a set of values
    • Best used for predicting categories or types
    • Examples:
      • What is the next product a user will purchase, based on their purchase history?
      • Film recommendations
  • Regression
    • The target/prediction is a numeric value
    • Best used for predicting scores
    • Root mean square error (RMSE)
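      • RMSE is the square root of the mean squared difference between the predicted and the actual target values
      • AWS ML also reports an RMSE Baseline: the RMSE of a hypothetical model that always predicts the mean of the target
      • An RMSE lower than the RMSE Baseline is better; it means the model beats always guessing the mean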
    • Examples:
      • How many millimetres of rain can we expect?
      • Traffic delays
      • How many goals will my soccer team score?
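For the regression case, the RMSE comparison can be sketched as follows, assuming the usual definition of RMSE and a baseline model that always predicts the mean of the training targets. The numbers are invented, and the function names are not from Amazon ML.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def rmse_baseline(train_targets, eval_targets):
    """RMSE of a naive model that always predicts the mean training target."""
    mean_target = sum(train_targets) / len(train_targets)
    return rmse(eval_targets, [mean_target] * len(eval_targets))


# Made-up rainfall figures in millimetres.
train_targets = [2.0, 4.0, 6.0]  # mean is 4.0
eval_targets = [3.0, 5.0]
predictions = [3.5, 4.5]

print(rmse(eval_targets, predictions))             # 0.5
print(rmse_baseline(train_targets, eval_targets))  # 1.0
```

Here the model's RMSE (0.5) is below the baseline (1.0), so it does better than always guessing the mean.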