ML


value function

Looking ahead into the future and figuring out how much reward you expect to get given your current policy. Adjusting the policy based on this estimate is called a policy update.

Machine learning using Python libraries

For more classical models (linear, tree-based) as well as a set of common ML-related tools, take a look at scikit-learn. The web documentation for this library is also organized for people getting familiar with the space, and it can be a great place to learn some extremely useful tools and techniques. For deep learning, mxnet, tensorflow, and pytorch are the three most common libraries. For the majority of machine learning needs, these are at feature parity and effectively equivalent.
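A minimal scikit-learn sketch of the classical, tree-based case (the iris dataset and the decision-tree model are illustrative choices, not something prescribed by this card):

```python
# Train a simple tree-based classifier on labeled data and check its accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(max_depth=3)   # a classical tree-based model
model.fit(X_train, y_train)                   # supervised training on labeled examples
print(accuracy_score(y_test, model.predict(X_test)))
```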

Deterministic policy

IF ... THEN. Occurs when the agent has a full understanding of the environment. Does not work in situations like rock-paper-scissors.
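A sketch of the IF ... THEN idea as code; the states and actions below are invented purely for illustration:

```python
# A deterministic policy maps each state to exactly one action.
def deterministic_policy(state):
    if state == "light_is_red":
        return "stop"
    elif state == "light_is_green":
        return "go"
    else:
        return "wait"

print(deterministic_policy("light_is_red"))  # always "stop" for this state
```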

Machine Learning

In supervised learning, every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values. We will explore this in depth in this lesson. In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data. We will explore this in depth in this lesson. In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal. This is a completely different approach than supervised and unsupervised learning. We will dive deep into this in the next lesson.

Unsupervised Learning Example

In this use case, the silhouette coefficient is a good choice. This metric describes how well your data was clustered by the model. To find the optimal number of clusters, you plot the silhouette coefficient against the number of clusters; in this example, the optimal value is at k=19. Silhouette coefficient: a score from -1 to 1 describing the clusters found during modeling. A score near zero indicates overlapping clusters, scores less than zero indicate data points assigned to incorrect clusters, and a score approaching 1 indicates successful identification of discrete, non-overlapping clusters.
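An illustrative sketch of the same idea with scikit-learn: score KMeans clusterings for several values of k and keep the k with the highest silhouette coefficient. The synthetic data and the range of k are assumptions made for the example:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=5, random_state=0)

scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # -1 (bad) to 1 (good)

best_k = max(scores, key=scores.get)          # k with the highest silhouette coefficient
print(best_k, scores[best_k])
```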

Terminology

Machine learning, or ML, is a modern software development technique that enables computers to solve problems by using examples of real-world data. In supervised learning, every training sample from the dataset has a corresponding label or output value associated with it. As a result, the algorithm learns to predict labels or output values. In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal. In unsupervised learning, there are no labels for the training data. A machine learning algorithm tries to learn the underlying patterns or distributions that govern the data.

# machine learning model evaluation techniques

Machine learning:
* Supervised
  * Classification: Accuracy, Precision, Recall
  * Regression: Mean Absolute Error
* Unsupervised
  * Clustering: Inertia (within-cluster sum of squares; the closer to 0, the better), Silhouette score (centroid to outer cluster; worst is -1, best is 1)
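A toy sketch of the supervised metrics named above, using scikit-learn; the labels and predictions are invented purely to show the function calls:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, mean_absolute_error)

# Classification metrics (supervised)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy_score(y_true, y_pred))
print(precision_score(y_true, y_pred))
print(recall_score(y_true, y_pred))

# Regression metric (supervised)
print(mean_absolute_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
```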

Loss Function

Measurement of how close the model is to its goal
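One common loss function, mean squared error, written as a plain-Python sketch (the example numbers are arbitrary): the smaller the value, the closer the model's predictions are to the targets.

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared differences between targets and predictions.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mean_squared_error([3.0, 5.0], [2.5, 5.5]))  # 0.25
```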

Reinforcement Learning Models

PPO - Proximal Policy Optimization
SAC - Soft Actor Critic

PPO

Proximal Policy Optimization
* On-policy: learns ONLY from observations made by the current policy exploring the environment (the most recent and relevant data)
* Data hungry
* More stable short-term
* Less stable long-term

stochastic policy

Selects from a range of possible actions based on a probability distribution, rather than a single fixed action per state.
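A tiny sketch of sampling from such a distribution; the actions and probabilities are illustrative (a rock-paper-scissors agent, echoing the deterministic-policy card above):

```python
import random

actions = ["rock", "paper", "scissors"]
probabilities = [0.4, 0.3, 0.3]

# A stochastic policy samples an action from a probability distribution.
print(random.choices(actions, weights=probabilities, k=1)[0])
```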

SAC

Soft Actor Critic
* Off-policy: uses observations made from previous policies' exploration of the environment (so it can also use old data)
* Data efficient
* Less stable short-term
* More stable long-term

Reinforcement learning

Learning by trial and error (test and fail); for example, dog training.

Reward Function

Uses input parameters such as:
* track_width
* distance_from_center
* steering_angle
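A sketch of a DeepRacer-style reward function built on the parameters listed above; the thresholds and reward values are arbitrary choices for the example, not a recommended reward design:

```python
def reward_function(params):
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    steering_angle = abs(params["steering_angle"])

    # Reward staying near the center of the track.
    if distance_from_center <= 0.1 * track_width:
        reward = 1.0
    elif distance_from_center <= 0.5 * track_width:
        reward = 0.5
    else:
        reward = 1e-3  # likely off track

    # Penalize sharp steering to discourage zig-zagging.
    if steering_angle > 15.0:
        reward *= 0.8

    return float(reward)
```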

RL Reinforcement learning key concepts

Key concepts:
* Agent: the entity being trained. In our example, this is a dog.
* Environment: the "world" in which the agent interacts, such as a park.
* Actions: performed by the agent in the environment, such as running around, sitting, or playing ball.
* Rewards: issued to the agent for performing good actions.
Example application: self-driving cars.
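A minimal sketch of the agent/environment/action/reward loop these concepts describe; the toy "dog in a park" environment and its rules are invented for illustration:

```python
import random

ACTIONS = ["sit", "run", "play_ball"]

def environment_step(action):
    # The environment issues a reward for good actions (here, "sit").
    return 1.0 if action == "sit" else 0.0

def agent_choose_action():
    # An untrained agent just explores by picking actions at random.
    return random.choice(ACTIONS)

total_reward = 0.0
for step in range(10):
    action = agent_choose_action()
    total_reward += environment_step(action)

print("total reward:", total_reward)
```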

Deep learning models

Deep learning is based around a conceptual model of how the human brain functions. The model (also called a neural network) is composed of collections of neurons (very simple computational units) connected together by weights (mathematical representations of how much information is allowed to flow from one neuron to the next). The process of training involves finding values for each weight. Various neural network structures have been determined for modeling different kinds of problems or processing different kinds of data. A short (but not complete!) list of noteworthy examples includes:
* FFNN: The most straightforward way of structuring a neural network, the Feed Forward Neural Network (FFNN) structures neurons in a series of layers, with each neuron in a layer containing weights to all neurons in the previous layer.
* CNN: Convolutional Neural Networks (CNN) represent nested filters over grid-organized data. They are by far the most commonly used type of model when processing images.
* RNN/LSTM: Recurrent Neural Networks (RNN) and the related Long Short-Term Memory (LSTM) model types are structured to effectively represent for loops in traditional computing, collecting state while iterating over some object. They can be used for processing sequences of data.
* Transformer: A more modern replacement for RNN/LSTMs, the transformer architecture enables training over larger datasets involving sequences of data.
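A minimal FFNN sketch in PyTorch (one of the deep learning libraries named earlier); the layer sizes and the random input are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16),   # weights connecting the input layer to a hidden layer
    nn.ReLU(),
    nn.Linear(16, 3),   # weights connecting the hidden layer to the output layer
)

x = torch.randn(1, 4)   # one example with 4 input features
print(model(x))         # training would adjust the weights to reduce a loss
```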

