Machine Learning Part 1
What is Supervised Learning?
The algorithm is provided with examples and the solution or label for each. Ex: emails flagged as spam or not spam.
How is ML Used for Spam Filtering?
Writing an algorithm by hand to catch spam emails would involve manual rules that have to be tuned constantly. A machine learning algorithm instead learns from examples which words, phrases and metadata characterize spam email.
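A minimal sketch of the idea, not a production filter: the example below counts how often each word appears in spam and non-spam training emails, then flags a new email as spam when its words were seen more often in spam. The sample emails and the names train_word_scores and predict_spam are made up for illustration.

```python
from collections import Counter

# Hypothetical labeled examples: (email text, label), where 1 = spam, 0 = not spam.
TRAINING_EMAILS = [
    ("win a free prize now", 1),
    ("free money click now", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday with the team", 0),
]


def train_word_scores(examples):
    """Count how often each word appears in spam versus non-spam emails."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in examples:
        counts = spam_counts if label == 1 else ham_counts
        counts.update(text.lower().split())
    return spam_counts, ham_counts


def predict_spam(text, spam_counts, ham_counts):
    """Return 1 (spam) if the words were seen more often in spam than in non-spam."""
    words = text.lower().split()
    spam_score = sum(spam_counts[w] for w in words)
    ham_score = sum(ham_counts[w] for w in words)
    return 1 if spam_score > ham_score else 0


spam_counts, ham_counts = train_word_scores(TRAINING_EMAILS)
print(predict_spam("claim your free prize", spam_counts, ham_counts))
print(predict_spam("agenda for the team meeting", spam_counts, ham_counts))
```

Nobody wrote a rule saying "free" or "prize" signals spam; the counts came from the labeled examples. Adding more labeled emails updates the behavior without touching the code.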
What are the Features?
the attributes of a sample that are used to make a prediction
What are the Challenges of Machine Learning?
1. Insufficient data (see "The Unreasonable Effectiveness of Data")
2. Unrepresentative training data - Including or omitting outliers can dramatically change results and accuracy
3. Feature engineering - Selecting the relevant predictive features
4. Overfitting the training data - The model can find a signal in the training data that isn't really there
5. Underfitting the training data - The model is too simple for the underlying data
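Overfitting and underfitting can be seen on a tiny made-up dataset. The sketch below assumes a true relationship of y = 2x with noisy training labels and compares three hypothetical models: a constant (too simple), a nearest-neighbor lookup that memorizes the training set, and a least-squares line. All data and names are illustrative assumptions.

```python
def mean_squared_error(predict, points):
    """Average squared difference between predictions and true values."""
    return sum((predict(x) - y) ** 2 for x, y in points) / len(points)


# Made-up data: the true relationship is y = 2x, and the training labels carry noise.
train = [(0, 0.5), (1, 1.5), (2, 4.5), (3, 5.5), (4, 8.5)]
validation = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0), (3.5, 7.0)]

# Underfit: always predict the average training label, ignoring x.
mean_y = sum(y for _, y in train) / len(train)


def constant_model(x):
    return mean_y


# Overfit: memorize the training set and return the label of the nearest x.
def memorizing_model(x):
    return min(train, key=lambda point: abs(point[0] - x))[1]


# Reasonable fit: a straight line found by least squares.
mean_x = sum(x for x, _ in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x


def linear_model(x):
    return slope * x + intercept


for name, model in [("constant", constant_model),
                    ("memorizing", memorizing_model),
                    ("linear", linear_model)]:
    print(name,
          "train error:", mean_squared_error(model, train),
          "validation error:", mean_squared_error(model, validation))
```

The memorizing model has zero error on the training data but does worse than the line on the validation data, which is the signature of overfitting. The constant model does poorly on both, which is underfitting.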
What is Unsupervised Learning?
The data is not labeled. Instead, the algorithm must find structure, such as groups or correlations, in the training data on its own. Ex: clustering or dimensionality reduction (data reduction)
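As one illustration of clustering, here is a small sketch of k-means on one-dimensional data. K-means is just one common clustering method, chosen here as an example; the data, the starting centers and the function name kmeans_1d are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Simple k-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster with the nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Made-up unlabeled data with two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

No labels were supplied at any point; the two groups come purely from how close the values are to each other.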
What is Regularization?
constraining a model to avoid overfitting
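One common way to constrain a model is an L2 (ridge) penalty, which is used here only as an example since the definition above does not name a specific technique. For a one-feature linear model with no intercept, the penalized least-squares slope is sum(x*y) / (sum(x*x) + penalty), so a larger penalty pulls the slope toward zero. The data and the name ridge_slope are assumptions.

```python
def ridge_slope(xs, ys, penalty):
    """Slope w minimizing sum((y - w*x)^2) + penalty * w^2 for a no-intercept line."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + penalty)


# Made-up data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for penalty in [0.0, 1.0, 14.0, 100.0]:
    print(penalty, ridge_slope(xs, ys, penalty))
```

With a penalty of 0 the slope is the ordinary least-squares value; as the penalty grows, the model is allowed less freedom to chase the training data.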
IoT and Machine Learning
The cloud is also at the confluence of the Internet of Things (IoT) and machine learning. As more devices produce more data, machine learning models can be used to predict more real-world events more quickly. This allows events to be predicted more accurately and action to be taken before a problem occurs. This sort of action-taking, and the service contracts associated with it, can be seen as a crucial element of the vision of industrial giants such as GE.
As devices generate more data, and that data becomes increasingly valuable by virtue of the analysis that can be performed on it, the question of who owns the data becomes more salient. Today many vendors collect and utilize data from the internet-connected devices they sell. This creates a tension with customers, especially industrial customers, who are coming to appreciate the value of this data. Ownership of device-generated data is not consistently managed or governed; it is governed by individual contracts.
Machine learning models are frequently trained on a large batch of data. The model can then be used on data streaming through the system in near real-time. Cloud services allow models to be deployed around the world so they can be close to end users or to the devices they consume data from. Geographic redundancy also allows companies to scale their models with demand and survive disasters and other outages.
What is Machine Learning?
field of study that gives computers the ability to learn without being explicitly programmed.
What is the Validation Dataset?
the portion of a dataset that is held out from training and used to evaluate or test the model
What is the Training Dataset?
the portion of a dataset that the model can learn from
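A minimal sketch of splitting a dataset into training and validation portions. The 20% validation fraction, the fixed seed and the function name train_validation_split are assumptions chosen for illustration.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples, then hold out a fraction of them for validation."""
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation


samples = list(range(10))  # stand-in for 10 real samples
training, validation = train_validation_split(samples)
print(len(training), len(validation))
```

Shuffling before splitting matters when the data is ordered, for example by date or by label, since otherwise the validation portion may not resemble the training portion.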
What is the Target?
the thing being predicted
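To make the distinction between features and target concrete, the sketch below separates the two for a couple of made-up email samples. The field names (num_links, contains_free, sender_known, is_spam) are hypothetical.

```python
# Hypothetical samples describing emails; the field names are made up.
samples = [
    {"num_links": 7, "contains_free": 1, "sender_known": 0, "is_spam": 1},
    {"num_links": 0, "contains_free": 0, "sender_known": 1, "is_spam": 0},
]

TARGET = "is_spam"  # the thing being predicted


def split_features_and_target(sample, target_name=TARGET):
    """Return (features, target): everything except the target, and the target itself."""
    features = {name: value for name, value in sample.items() if name != target_name}
    return features, sample[target_name]


for sample in samples:
    features, target = split_features_and_target(sample)
    print(features, "->", target)
```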
What is Machine Learning Routinely Applied to?
1. Customer acquisition and retention
2. Customer interaction and support
3. Digital advertisement placements
4. Speech translation
5. Autonomous vehicles
6. Identifying people
7. Identifying dangerous items or contraband at airports and customs offices
Machine Learning Process
1. Dataset
2. Train Model/Algorithm
3. Evaluate Results
4. Deploy Model
5. Update Dataset
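The five steps above can be sketched end to end with a deliberately tiny model: a single threshold placed halfway between the average of each class. The data, the 0.9 accuracy bar for deployment and all function names are assumptions made for illustration.

```python
def train(dataset):
    """Step 2: learn a threshold halfway between the two class means."""
    positives = [x for x, label in dataset if label == 1]
    negatives = [x for x, label in dataset if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict(threshold, x):
    return 1 if x > threshold else 0


def evaluate(threshold, dataset):
    """Step 3: fraction of examples the model labels correctly."""
    correct = sum(predict(threshold, x) == label for x, label in dataset)
    return correct / len(dataset)


# Step 1: dataset of (feature value, label) pairs, plus held-out examples.
dataset = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
held_out = [(3.0, 0), (7.0, 1)]

model = train(dataset)                               # Step 2
accuracy = evaluate(model, held_out)                 # Step 3
deployed_model = model if accuracy >= 0.9 else None  # Step 4: deploy only if good enough

# Step 5: new labeled examples arrive, so the dataset is updated and the cycle repeats.
dataset.extend([(4.0, 0), (6.0, 1)])
model = train(dataset)
print(accuracy, deployed_model, model)
```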
Key Terms
1. Features
2. Target
3. Training Dataset
4. Validation Dataset
5. Regularization
Machine Learning
1. Industries are being transformed by ML and Artificial Intelligence
2. Machine learning allows common tasks, like the discovery process in legal trials, to be automated, removing the need for some junior staff
3. Machine learning also allows companies to optimize operations and find patterns that humans would inevitably miss
How does the cloud help companies use machine learning?
1. Machine learning relies on large quantities of data; by and large, the more data, the more accurate the models
2. Storing all of this data was expensive or impossible for most companies in the past
3. Training machine learning algorithms requires vast quantities of processing power, which is only needed while the model is being trained
4. Running machine learning models takes only a fraction of the compute needed to train them
5. Training machine learning models is also increasingly done on Graphics Processing Units (GPUs)
6. These are used because they can be far faster than Central Processing Units (CPUs) at matrix multiplication, often by an order of magnitude or more
7. Matrix multiplication is the basic arithmetic that underpins deep learning models such as neural networks
8. GPUs for deep learning are very expensive, which makes renting them in the cloud for training ML models even more appealing
9. Clusters of GPU machines can be rented on both AWS and Azure for training models
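To show what the matrix multiplication mentioned above involves, here is a plain-Python sketch that passes one made-up input through a single layer of weights, ignoring bias and activation. Real training code would use an optimized library rather than loops like these; the function name matmul and all numbers are assumptions.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# One made-up input with 3 features, passed through a layer with 2 outputs.
inputs = [[1.0, 2.0, 3.0]]
weights = [[0.5, -1.0],
           [0.0, 2.0],
           [1.0, 0.5]]
print(matmul(inputs, weights))
```

Every output value is an independent sum of products, so the work can be spread across many cores at once, which is the kind of workload GPUs are built for.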
What is ML good for?
1. Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better. Ex: Spam filtering, document classification
2. Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution. Ex: Speech to text, translation
3. Getting insights about complex problems and large amounts of data. Ex: Genome analysis, particle collision analysis
Types of Machine Learning
1. Supervised Learning
2. Unsupervised Learning
3. Semi-supervised Learning
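What is Semi-supervised Learning?
Combining supervised and unsupervised learning, typically with a small amount of labeled data and a larger amount of unlabeled data. Ex: clustering, then labeling the examples in each cluster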
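A small sketch of the cluster-then-label idea: unlabeled values are clustered first, and the few known labels are then spread to every example in the same cluster. The clustering here is a simple 1-D k-means chosen for illustration; the data, labels and names are assumptions.

```python
def cluster_1d(values, centers, iterations=10):
    """Assign each value to a cluster index using a simple 1-D k-means."""
    centers = list(centers)
    assignments = []
    for _ in range(iterations):
        # Assign every value to its nearest center.
        assignments = [min(range(len(centers)), key=lambda i: abs(v - centers[i]))
                       for v in values]
        # Move each center to the mean of the values assigned to it.
        for i in range(len(centers)):
            members = [v for v, a in zip(values, assignments) if a == i]
            if members:
                centers[i] = sum(members) / len(members)
    return assignments


# Made-up data: six values, only two of which come with a known label.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
known_labels = {0: "not spam", 5: "spam"}  # index in values -> label

assignments = cluster_1d(values, centers=[0.0, 5.0])

# Give each cluster the label of the labeled example that falls inside it.
cluster_labels = {assignments[i]: label for i, label in known_labels.items()}

# Propagate each cluster's label to every example in that cluster.
labels = [cluster_labels.get(a) for a in assignments]
print(labels)
```

Only two labels were provided by hand, yet all six examples end up labeled. This only works well when the clusters actually line up with the classes.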