Difference between revisions of "AWS/Machine Learning"

Revision as of 00:17, 15 March 2017

This article covers Amazon Web Services (AWS) Machine Learning (ML).

The basic concept of ML is to have computers or machines program themselves.
Machines can analyze large and complex datasets and identify patterns to create models, which are then used to predict outcomes.
Over time, these models can take into account new datasets and improve the accuracy of the predictions.

Recommendations when checking out on an e-commerce site (e.g., purchases on Amazon.com)
Spam detection in email
Any kind of image, speech, or text recognition
Weather forecasts
Search engines

Amazon ML is supervised ML: it learns from examples or historical data.
An Amazon ML Model requires your dataset to have both the features and the target for each observation/record.
A feature is an attribute of a record used to identify patterns; typically, there will be multiple features.
A target is the outcome that the patterns are linked to and is the value the ML algorithm is going to predict.
This link between features and target is what the model uses to predict outcomes.
Example: {Go to the grocery store} {on Monday} (features) => Buy milk (target)
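
The grocery example can be written out as records to show how each observation separates into features and a target. This is a minimal sketch: the column names (row_id, activity, day, bought_milk) and the data values are invented for illustration and do not come from Amazon ML.

```python
import csv
import io

# Hypothetical CSV data: column names and values are invented for illustration.
RAW = """row_id,activity,day,bought_milk
1,grocery store,Monday,1
2,grocery store,Friday,0
3,gym,Monday,0
"""

TARGET = "bought_milk"  # the value the model should learn to predict
ROW_ID = "row_id"       # identifies a record; not used as a feature


def split_features_and_target(record):
    """Return (features, target) for one observation."""
    features = {k: v for k, v in record.items() if k not in (TARGET, ROW_ID)}
    return features, record[TARGET]


records = list(csv.DictReader(io.StringIO(RAW)))
observations = [split_features_and_target(r) for r in records]

for features, target in observations:
    print(features, "=>", target)
```

Every record carries both parts, which is what supervised learning needs: the features describe the situation and the target records what actually happened.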

Data analysis and model building fees: $0.42/hour
Batch predictions: $0.10 per 1,000 predictions, rounded up to the next 1,000
Real-time predictions: $0.0001 per prediction, rounded up to the nearest penny (plus an hourly capacity reservation charge, only while the endpoint is active)
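
The rounding rules are easier to follow with a worked calculation. The sketch below uses only the rates listed above; the function names are hypothetical, and it assumes rounding is applied once to the total for a job or billing period. The real-time capacity reservation charge is left out because no rate is given for it here.

```python
from decimal import Decimal, ROUND_CEILING

# Rates as listed in this article (March 2017).
HOURLY_COMPUTE_FEE = Decimal("0.42")
BATCH_FEE_PER_1000 = Decimal("0.10")
REALTIME_FEE_PER_PREDICTION = Decimal("0.0001")


def compute_cost(hours):
    """Data analysis and model building: billed per hour."""
    return HOURLY_COMPUTE_FEE * Decimal(str(hours))


def batch_cost(predictions):
    """Batch predictions: count is rounded up to the next 1,000."""
    blocks = -(-predictions // 1000)  # integer ceiling division
    return BATCH_FEE_PER_1000 * blocks


def realtime_cost(predictions):
    """Real-time predictions: total is rounded up to the nearest penny.

    Excludes the hourly capacity reservation charge (rate not given here).
    """
    raw = REALTIME_FEE_PER_PREDICTION * predictions
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


print(compute_cost(2))     # 2 hours of model building
print(batch_cost(1001))    # billed as 2,000 predictions
print(realtime_cost(150))  # 0.0150 rounded up to the next cent
```

Decimal is used instead of float so that the per-prediction rate and the rounding step stay exact.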

Create a data source
- S3 (i.e., upload a CSV file to S3)
- RDS and Redshift (i.e., run a SQL query on a Redshift cluster and get the data back directly into ML)
Identify the feature and target columns
- Select whether the file has a header row
- Select the correct field data types (possible types: binary, categorical, numeric, text)
- Select the target that needs to be predicted
- Select a Row ID, if the data has one
Train a model with a part of the dataset (generally 70%)
- By default, AWS ML takes 70% of your data and uses it to train the model
- It also automatically selects the type of ML model to use, based on the data type of the target in the data schema
  - Binary target => binary model
  - Numeric target => regression model
  - Categorical target => multi-class model
Evaluate the model by running the remaining dataset through it
- AWS ML automatically evaluates the model for you, using the part of the data source that was held back from training
- If using the API, you have to create the evaluation as a separate step
Fine-tune the model
Use the model for predictions
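
Two of the steps above, the 70/30 split and the choice of model type from the target's data type, can be sketched in plain Python. This illustrates the logic only and is not how the service is implemented. The helper names are hypothetical, and the sequential (unshuffled) split is an assumption. The strings BINARY, REGRESSION and MULTICLASS are the model type names used by the Amazon ML API.

```python
# Target data type (from the schema) => Amazon ML model type name.
MODEL_TYPE_FOR_TARGET = {
    "binary": "BINARY",
    "numeric": "REGRESSION",
    "categorical": "MULTICLASS",
}


def choose_model_type(target_data_type):
    """Pick the model type that matches the target column's data type."""
    key = target_data_type.lower()
    if key not in MODEL_TYPE_FOR_TARGET:
        raise ValueError(f"unsupported target data type: {target_data_type!r}")
    return MODEL_TYPE_FOR_TARGET[key]


def split_train_eval(records, train_percent=70):
    """Split records sequentially: first part for training, rest for evaluation.

    Assumption: the split keeps the original record order (no shuffling).
    """
    cut = len(records) * train_percent // 100
    return records[:cut], records[cut:]


records = list(range(10))  # stand-in for 10 observations
train, evaluation = split_train_eval(records)
print(len(train), len(evaluation))
print(choose_model_type("numeric"))
```

Text is absent from the mapping because a text column can be a feature, but none of the three listed model types takes a text target.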

- The target/prediction is a numeric value
- Best used for predicting scores
- Root mean square error (RMSE)
  - The RMSE measures how far the model's predictions are from the actual target values in the evaluation data
  - AWS ML also reports an RMSE Baseline: the RMSE of a hypothetical model that always predicts the mean of the training target data
  - An RMSE lower than the RMSE Baseline is better
- Examples:
  - How many millimetres of rain can we expect?
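
For a regression model, the RMSE is the square root of the mean squared difference between predicted and actual values, and the RMSE Baseline is the same measure for a model that always predicts the mean of the training targets. The sketch below works through that comparison. The function names and the rainfall numbers are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def baseline_rmse(train_targets, eval_targets):
    """RMSE of a model that always predicts the mean training target."""
    mean_target = sum(train_targets) / len(train_targets)
    return rmse(eval_targets, [mean_target] * len(eval_targets))


# Hypothetical rainfall figures in millimetres.
train_targets = [2.0, 4.0, 6.0]  # mean is 4.0
eval_actual = [3.0, 5.0]
eval_predicted = [3.5, 4.5]      # what the trained model predicted

model_rmse = rmse(eval_actual, eval_predicted)
baseline = baseline_rmse(train_targets, eval_actual)
print(model_rmse, baseline, model_rmse < baseline)
```

Here the model's RMSE is 0.5 against a baseline of 1.0, so the model predicts better than always guessing the mean.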