Difference between revisions of "AWS/Machine Learning"
From Christoph's Personal Wiki
Revision as of 23:56, 14 March 2017
This article is about Amazon Web Services (AWS) Machine Learning (ML).
Machine Learning concepts
- What is Machine Learning (ML)?
  - The basic concept of ML is to have computers or machines program themselves.
  - Machines can analyze large and complex datasets and identify patterns to create models, which are then used to predict outcomes.
  - Over time, these models can take into account new datasets and improve the accuracy of the predictions.
- Examples of where ML is being used
  - Recommendations when checking out on an e-commerce site (e.g., purchases on Amazon.com)
  - Spam detection in email
  - Any kind of image, speech, or text recognition
  - Weather forecasts
  - Search engines
- What is Amazon ML?
  - Amazon ML is a supervised ML service; it learns from examples or historical data.
  - An Amazon ML model requires your dataset to have both the features and the target for each observation/record.
  - A feature is an attribute of a record used to identify patterns; typically, there will be multiple features.
  - A target is the outcome that the patterns are linked to; it is the value the ML algorithm is going to predict.
  - This link between features and target is what is used to predict outcomes.
  - Example: {Go to the grocery store} {on Monday} (attributes, i.e., features) => Buy milk (target)
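To make the feature/target distinction concrete, here is a minimal sketch of the grocery-store example as data. The record layout, the column names (`activity`, `day`, `bought_milk`), and the helper function are assumptions made for illustration; they are not part of Amazon ML.

```python
# Hypothetical historical observations. Each record carries its features
# plus the known target, which is what supervised learning needs.
observations = [
    {"activity": "grocery store", "day": "Monday", "bought_milk": 1},
    {"activity": "grocery store", "day": "Friday", "bought_milk": 0},
    {"activity": "gym", "day": "Monday", "bought_milk": 0},
]

TARGET = "bought_milk"  # assumed name of the target column


def split_features_and_target(record, target=TARGET):
    """Return (features, target_value) for a single observation."""
    features = {name: value for name, value in record.items() if name != target}
    return features, record[target]


features, target_value = split_features_and_target(observations[0])
print(features)      # {'activity': 'grocery store', 'day': 'Monday'}
print(target_value)  # 1
```

Every column except the target is treated as a feature here; in practice you would choose which columns to include.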
- Why do ML on AWS?
  - Simplifies the whole process
  - No coding required for creating models
  - Identifies the best ML algorithm to run based on the input data
  - Easily integrates into other AWS services for data retrieval
  - Deploy within minutes
  - Full access via APIs
  - Scalable
- Amazon ML pricing (as of March 2017)
  - Data analysis and model building fees: $0.42/hour
  - Batch predictions: $0.10 per 1,000 predictions, rounded up to the next 1,000
  - Real-time predictions: $0.0001 per prediction, rounded up to the nearest penny (plus an hourly capacity reservation charge, billed only while the endpoint is active)
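The rounding rules above are easiest to see with a little arithmetic. The sketch below uses the March 2017 rates listed above; the function names are hypothetical. It assumes that compute hours are billed without rounding, and it leaves out the capacity reservation charge because no rate for it is given here.

```python
from decimal import Decimal, ROUND_CEILING

# Rates as listed above (March 2017).
COMPUTE_RATE_PER_HOUR = Decimal("0.42")
BATCH_RATE_PER_1000 = Decimal("0.10")
REALTIME_RATE_PER_PREDICTION = Decimal("0.0001")


def compute_cost(hours):
    """Data analysis and model building fee (assumes no rounding of hours)."""
    return COMPUTE_RATE_PER_HOUR * Decimal(str(hours))


def batch_cost(predictions):
    """Batch fee: the prediction count is rounded up to the next 1,000."""
    blocks_of_1000 = -(-predictions // 1000)  # integer ceiling division
    return BATCH_RATE_PER_1000 * blocks_of_1000


def realtime_cost(predictions):
    """Real-time fee: the total is rounded up to the nearest penny."""
    raw = REALTIME_RATE_PER_PREDICTION * predictions
    return raw.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


print(batch_cost(1500))      # 0.20  (1,500 is billed as 2,000 predictions)
print(realtime_cost(12345))  # 1.24  (1.2345 rounded up to the penny)
```

`Decimal` is used instead of floats so that the penny rounding is exact.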
- AWS ML Workflow
  - Create a data source
    - S3 (e.g., upload a CSV file to S3)
    - RDS and Redshift (e.g., run a SQL query on a Redshift cluster and get the data back directly into ML)
  - Identify the feature and target columns
    - Select whether the file has a header row
    - Select the correct field data types (possible types: binary, categorical, numeric, text)
    - Select the target that needs to be predicted
    - Select a Row ID, if the data has one
  - Train a model with a part of the dataset (generally 70%)
    - By default, AWS ML takes 70% of your data and uses it to train the model
    - It also automatically selects the type of ML model to use, based on the data type of the target in the schema
      - Binary target => binary model
      - Numeric target => regression model
      - Categorical target => multi-class model
  - Evaluate the model by running the remaining data (30% by default) through it
  - Fine-tune the model
  - Use the model for predictions
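The training and evaluation steps above can be sketched in a few lines of plain Python. This is only an illustration of the 70/30 split and of the target-type-to-model mapping, not Amazon ML's actual implementation. The function names are hypothetical, and the seeded shuffle before splitting is an assumption of this sketch.

```python
import random

# Mapping described above: data type of the target => model type.
MODEL_FOR_TARGET_TYPE = {
    "binary": "binary",
    "numeric": "regression",
    "categorical": "multi-class",
}


def choose_model_type(target_data_type):
    """Return the model type that matches the target's data type."""
    try:
        return MODEL_FOR_TARGET_TYPE[target_data_type]
    except KeyError:
        raise ValueError(f"unsupported target type: {target_data_type!r}")


def split_dataset(records, train_percent=70, seed=0):
    """Split records into (training, evaluation) parts.

    The seeded shuffle is an assumption of this sketch, used so that the
    split does not depend on the original record order.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100  # integer math avoids float error
    return shuffled[:cut], shuffled[cut:]


training, evaluation = split_dataset(range(10))
print(len(training), len(evaluation))  # 7 3
print(choose_model_type("numeric"))    # regression
```

The evaluation part is never seen during training, which is what makes the evaluation step a fair check of the model.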
- Types of ML models available
  - Binary
    - The target/prediction value is a 0 or 1
    - Best used when the prediction is a Boolean or one of two possible outcomes (e.g., true/false, yes/no, green apple/red apple)
    - Examples:
      - Does an email match the spam criteria?
      - Will someone respond to a marketing email?
      - Does a purchase on a credit card seem fraudulent?
  - Multi-class
    - The target/prediction is one value from a fixed set of values
    - Best used for predicting categories or types
    - Examples:
      - What is the next product a user will purchase, based on their purchase history?
      - Film recommendations
  - Regression
    - The target/prediction is a numeric value
    - Best used for predicting numeric quantities or scores
    - Examples:
      - How many millimetres of rain can we expect?
      - Traffic delays
      - How many goals will my soccer team score?
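The three model types differ mainly in the shape of the value they predict. The sketch below illustrates that difference under stated assumptions: the function names, the `cutoff` parameter, and the idea that a model emits raw scores to be turned into a prediction are all choices made for this example, not something confirmed above for Amazon ML.

```python
def binary_prediction(score, cutoff=0.5):
    """Map a score in [0, 1] to the label 0 or 1 (cutoff is an assumed parameter)."""
    return 1 if score >= cutoff else 0


def multiclass_prediction(class_scores):
    """Pick the category with the highest score from a {category: score} mapping."""
    return max(class_scores, key=class_scores.get)


def regression_prediction(value):
    """A regression prediction is simply a number, e.g. millimetres of rain."""
    return float(value)


print(binary_prediction(0.83))  # 1, e.g. "this email is spam"
print(multiclass_prediction({"comedy": 0.2, "drama": 0.7, "horror": 0.1}))  # drama
print(regression_prediction(12))  # 12.0
```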