ML
value function
Looks ahead into the future to estimate how much reward the agent expects to collect given its current policy. The adjustment made to the policy based on this estimate is called a policy update.
Machine learning using Python libraries
For more classical models (linear, tree-based) as well as a set of common ML-related tools, take a look at scikit-learn. The web documentation for this library is also organized for those getting familiar with the space and can be a great place to learn some extremely useful tools and techniques. For deep learning, mxnet, tensorflow, and pytorch are the three most common libraries. For the majority of machine learning needs, these three are roughly at feature parity and can be treated as equivalent.
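As a minimal sketch of the classical-model workflow described above, here is scikit-learn fitting a tree-based model on its built-in iris dataset (the choice of `DecisionTreeClassifier` and the 25% test split are illustrative, not from the source):

```python
# Fit a tree-based classifier with scikit-learn and measure accuracy on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
```

The same fit/predict/score pattern applies across scikit-learn's estimators, which is part of why its documentation is a good entry point.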
Deterministic policy
Maps each state to a single action (IF state ... THEN action). Works when the agent has a full understanding of the environment. Does not work in situations like rock-paper-scissors, where a predictable policy can be exploited.
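A deterministic policy can be sketched as a plain lookup table (the states and actions here are hypothetical examples, not from the source):

```python
# A deterministic policy: each state maps to exactly one action (IF state THEN action).
policy = {
    "light_is_red": "stop",
    "light_is_green": "go",
}

def act(state):
    # The same state always yields the same action.
    return policy[state]
```

Because the mapping never varies, an opponent in a game like rock-paper-scissors could learn and exploit it.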
Machine Learning
In supervised learning, every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values. We will explore this in-depth in this lesson. In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data. We will explore this in-depth in this lesson. In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal. This is a completely different approach than supervised and unsupervised learning. We will dive deep into this in the next lesson.
Unsupervised Learning Example
In this use case, the silhouette coefficient is a good choice. This metric describes how well your data was clustered by the model. To find the optimal number of clusters, you plot the silhouette coefficient against the number of clusters k; in this example, the optimal value turns out to be k=19. Silhouette coefficient: a score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, and scores less than zero indicate data points assigned to incorrect clusters. A score approaching 1 indicates successful identification of discrete, non-overlapping clusters.
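The procedure above — cluster for several values of k and pick the k with the highest silhouette coefficient — can be sketched with scikit-learn (the synthetic blob data and the k range 2–6 are illustrative assumptions):

```python
# Sweep k, clustering with KMeans, and score each result with the silhouette coefficient.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # always between -1 and 1

# The k with the highest silhouette coefficient is the best candidate.
best_k = max(scores, key=scores.get)
```

In practice you would plot `scores` against k, exactly as described above, and read the optimum off the peak.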
Terminology
Machine learning, or ML, is a modern software development technique that enables computers to solve problems by using examples of real-world data. In supervised learning, every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values. In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal. In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data.
# machine learning model evaluation techniques
Machine learning:
Supervised
* Classification: Accuracy, Precision, Recall
* Regression: Mean Absolute Error
Unsupervised
* Clustering: Inertia (within-cluster sum of squares; the closer to 0 the better), Silhouette score (compares distance within a cluster to the distance to the nearest neighboring cluster; worst is -1, best is 1)
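The supervised metrics listed above can all be computed with scikit-learn; the labels and values below are made-up toy data to show the calls:

```python
# Classification metrics on toy labels, plus mean absolute error for regression.
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, mean_absolute_error
)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

acc = accuracy_score(y_true, y_pred)    # fraction of all predictions that are correct
prec = precision_score(y_true, y_pred)  # of the predicted 1s, how many were truly 1
rec = recall_score(y_true, y_pred)      # of the true 1s, how many were predicted

# Regression: mean absolute error between predictions and targets.
mae = mean_absolute_error([3.0, 5.0], [2.5, 5.5])
```

Here one true positive was missed, so recall drops to 0.75 while precision stays at 1.0 — a reminder that the two metrics capture different failure modes.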
Loss Function
A measurement of how far the model's predictions are from its goal; training adjusts the model to minimize this value.
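A common concrete choice (one of many — this example is not from the source) is mean squared error:

```python
def mse_loss(predictions, targets):
    # Mean squared error: the average of the squared differences
    # between each prediction and its target. Lower is better; 0 is perfect.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

loss = mse_loss([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])
```

Squaring the differences penalizes large misses more heavily than small ones.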
Reinforcement Learning Models
PPO - Proximal Policy Optimization SAC - Soft Actor Critic
PPO
Proximal Policy Optimization
* On-policy: learns ONLY from observations made by the current policy exploring the environment (the most recent and relevant data)
* Data hungry
* More stable short-term, less stable long-term
stochastic policy
Outputs a range of possible actions based on a probability distribution; the action actually taken is sampled from that distribution.
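A stochastic policy can be sketched as sampling from a per-state action distribution (the rock-paper-scissors actions and uniform probabilities are illustrative assumptions):

```python
import random

# Hypothetical action distribution for a single state: each action
# has a probability, and the policy samples rather than always
# returning the same action.
action_probs = {"rock": 1/3, "paper": 1/3, "scissors": 1/3}

def sample_action(probs):
    # Draw one action according to its probability weight.
    actions = list(probs.keys())
    weights = list(probs.values())
    return random.choices(actions, weights=weights, k=1)[0]
```

Because repeated calls can return different actions, this kind of policy stays unpredictable in adversarial settings like rock-paper-scissors.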
SAC
Soft Actor Critic
* Off-policy: uses observations made by previous policies' exploration of the environment, so it can also learn from old data
* Data efficient
* Less stable short-term, more stable long-term
Reinforcement learning
Learns by trial and error, like dog training: try an action, observe the outcome, and adjust.
Reward Function
Uses input parameters such as:
* track_width
* distance_from_center
* steering_angle
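In AWS DeepRacer these parameters arrive in a single `params` dictionary passed to a `reward_function`. Here is a minimal center-line sketch using the three parameters listed above; the thresholds (10% and 25% of track width, 15 degrees of steering) are illustrative choices, not values from the source:

```python
def reward_function(params):
    # DeepRacer supplies a dict of named inputs; these keys match the ones listed above.
    track_width = params['track_width']
    distance_from_center = params['distance_from_center']
    steering_angle = abs(params['steering_angle'])  # steering can be negative (left turns)

    # Reward staying close to the center line.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.25 * track_width:
        reward = 0.5
    else:
        reward = 1e-3  # far off center, likely heading off track

    # Penalize sharp steering to discourage zig-zag behavior.
    if steering_angle > 15.0:
        reward *= 0.8

    return float(reward)
```

The agent then learns whichever driving behavior maximizes the cumulative value this function returns.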
RL Reinforcement learning key concepts
Key concepts: The agent is the entity being trained; in our example, this is a dog. The environment is the "world" in which the agent interacts, such as a park. Actions are performed by the agent in the environment, such as running around, sitting, or playing ball. Rewards are issued to the agent for performing good actions. A real-world application of reinforcement learning is self-driving cars.
Deep learning models
Deep learning is based on a conceptual model of how the human brain functions. The model (also called a neural network) is composed of collections of neurons (very simple computational units) connected together by weights (mathematical representations of how much information is allowed to flow from one neuron to the next). The process of training involves finding values for each weight. Various neural network structures have been determined for modeling different kinds of problems or processing different kinds of data. A short (but not complete!) list of noteworthy examples includes:
* FFNN: The most straightforward way of structuring a neural network, the Feed Forward Neural Network (FFNN) structures neurons in a series of layers, with each neuron in a layer containing weights to all neurons in the previous layer.
* CNN: Convolutional Neural Networks (CNN) represent nested filters over grid-organized data. They are by far the most commonly used type of model when processing images.
* RNN/LSTM: Recurrent Neural Networks (RNN) and the related Long Short-Term Memory (LSTM) model types are structured to effectively represent for loops in traditional computing, collecting state while iterating over some object. They can be used for processing sequences of data.
* Transformer: A more modern replacement for RNNs/LSTMs, the transformer architecture enables training over larger datasets involving sequences of data.
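The FFNN idea — layers of neurons, each connected to the previous layer by weights — can be sketched as a bare forward pass in NumPy (the layer sizes, random weights, ReLU, and softmax here are illustrative choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feed-forward network: 4 inputs -> 8 hidden neurons -> 3 outputs.
# Each weight matrix connects every neuron in one layer to every neuron in the next.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)       # hidden layer with ReLU activation
    logits = h @ W2 + b2                 # output layer
    exp = np.exp(logits - logits.max())  # softmax turns logits into probabilities
    return exp / exp.sum()

probs = forward(rng.normal(size=4))
```

Training would then adjust `W1`, `b1`, `W2`, and `b2` to minimize a loss function over labeled examples; here they stay random purely to show the structure.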