My CSE 40 Machine Learning Study Guide

ML Input: Label

The correct label (desired output) associated with an input feature vector. Notation: 1) if x is the input vector, y is its output label; 2) if x_i is the ith input vector, y_i is its output label.

ML Input: Feature Vectors

A feature vector is simply a vector (represented as an array) of features describing each example, e.g., salary, education, debt, etc. Notation: 1) if x is the input vector, x_i is its ith feature; 2) if x_i is the ith input vector, x_ij is the jth feature of the ith input vector.

ML Output: Hypothesis

A hypothesis is a function that takes an input feature vector and outputs a (predicted) label. Notation: 1) h(x) is the hypothesis applied to input x; 2) H is the set of all hypothesis functions being considered, called the hypothesis space.

Empirical Risk Minimization

A principle in machine learning: choose the model that minimizes the error, or "risk", on the training data. Here's how it works. Training Data: the dataset you use to "train" or teach your machine learning model. Risk or Error: a measure of how well your model is doing; it quantifies how far off the model's predictions are from the true values. Minimization: the goal of ERM is to find the model in the hypothesis space that minimizes this error or risk on the training data.
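
The idea can be sketched in a few lines of Python. This is only an illustration: the toy data, the threshold-style hypotheses, and all function names are assumptions made for the example, not something from the course material.

```python
# Toy training set of (feature x, label y) pairs. Values are made up.
training_data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]

def make_threshold_hypothesis(t):
    """Return a hypothesis h(x) that predicts 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

def empirical_risk(h, data):
    """Average 0/1 loss of hypothesis h over the data."""
    mistakes = sum(1 for x, y in data if h(x) != y)
    return mistakes / len(data)

# Hypothesis space H: a small, hand-picked set of thresholds (an assumption).
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5]
risks = {t: empirical_risk(make_threshold_hypothesis(t), training_data)
         for t in thresholds}

# ERM: pick the hypothesis with the lowest risk on the training data.
best_threshold = min(risks, key=risks.get)
print(best_threshold, risks[best_threshold])  # 2.5 0.0
```

Here the threshold 2.5 separates the two classes perfectly, so its empirical risk is 0.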

Sample Space

A sample space in machine learning is the set of all possible inputs to a machine learning model. It is analogous to the set of all possible outcomes of a random experiment. For example, if we are building a machine learning model to classify images of cats and dogs, our sample space would be the set of all possible images of cats and dogs.

What type of reasoning is this example: There is heavy traffic so there is probably an accident ahead.

Abductive Reasoning

What type of reasoning is this example: eStudent has had coughing symptoms, breathing problems, and is generally ill. Therefore, eStudent is likely COVID-19 positive.

Abductive Reasoning

Permutation

An arrangement, or listing, of objects in which order is important. The number of ways to arrange k objects chosen from n is n!/(n-k)!
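
As a quick check of the formula n!/(n-k)!, the sketch below counts the ordered arrangements of k = 3 objects chosen from n = 5 in three ways. The values of n and k are arbitrary choices for illustration.

```python
import math
from itertools import permutations

n, k = 5, 3

by_formula = math.factorial(n) // math.factorial(n - k)  # n!/(n-k)!
by_library = math.perm(n, k)                             # built in since Python 3.8
by_enumeration = len(list(permutations(range(n), k)))    # list them all and count

print(by_formula, by_library, by_enumeration)  # 60 60 60
```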

Bayes Rule

Bayes' theorem is a mathematical formula that describes the probability of an event occurring, given the knowledge of whether or not another event has occurred. It is based on the idea that the probability of an event can be updated based on new information. P(A|B) = P(B|A) * P(A) / P(B)
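
A small worked example of P(A|B) = P(B|A) * P(A) / P(B), where A is "has the condition" and B is "tests positive". All the numbers are invented for illustration, and P(B) is obtained by summing over the two cases A and not-A.

```python
# Assumed, made-up inputs.
p_a = 0.01              # P(A): prior probability of the condition
p_b_given_a = 0.90      # P(B|A): positive test given the condition
p_b_given_not_a = 0.05  # P(B|not A): positive test without the condition

# Total probability of a positive test.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: update the prior P(A) after seeing B.
p_a_given_b = p_b_given_a * p_a / p_b

print(round(p_b, 4), round(p_a_given_b, 3))  # 0.0585 0.154
```

Even with a fairly accurate test, the posterior stays low here because the prior P(A) is small.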

Odds

Compares the number of successes to the number of failures.

Probability

Compares the number of successes to the total number of attempts made.
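
To see how odds and probability differ, the sketch below computes both from the same counts. The counts (3 successes, 1 failure) are made up for illustration.

```python
from fractions import Fraction

successes, failures = 3, 1
attempts = successes + failures

odds = Fraction(successes, failures)        # successes : failures
probability = Fraction(successes, attempts) # successes : total attempts

# The two are related by odds = p / (1 - p).
odds_from_probability = probability / (1 - probability)

print(odds, probability, odds_from_probability)  # 3 3/4 3
```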

Data Cleaning

Data cleaning is the process of identifying, deleting, and/or replacing inconsistent or incorrect information from the data.

What type of reasoning is this example: All dogs have ears, golden retrievers are dogs therefore golden retrievers have ears.

Deductive Reasoning

What type of reasoning is this example: All humans are mortal and Socrates is a human. Therefore Socrates is mortal.

Deductive Reasoning

Joint Distribution

Describes the behavior of two or more random variables simultaneously.

The Product Rule

Given a conditional probability and the corresponding marginal, we can get the joint distribution by simply multiplying: P(X,Y) = P(X|Y) * P(Y)
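
A minimal sketch of the product rule, building a joint table from P(Y) and P(X|Y). The variables (weather Y, arrival X) and every probability are assumptions chosen only to make the arithmetic visible.

```python
# P(Y): marginal distribution over the weather (made-up values).
p_y = {"rain": 0.3, "dry": 0.7}

# P(X|Y): distribution over arrival, conditioned on the weather (made-up values).
p_x_given_y = {
    "rain": {"late": 0.5, "on_time": 0.5},
    "dry": {"late": 0.1, "on_time": 0.9},
}

# Product rule: P(X, Y) = P(X|Y) * P(Y) for every pair (x, y).
joint = {
    (x, y): p_x_given_y[y][x] * p_y[y]
    for y in p_y
    for x in p_x_given_y[y]
}

for pair, p in joint.items():
    print(pair, round(p, 2))

# A valid joint distribution sums to 1 (up to floating-point rounding).
total = sum(joint.values())
```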

Python Pandas

In 2008, developer Wes McKinney started developing pandas because he needed a high-performance, flexible tool for data analysis. With this tool, we can accomplish the typical steps in processing and analyzing data, regardless of the data's origin: load, prepare, manipulate, model, and analyze.

Reinforcement Learning

In this learning style, an algorithm (often called an "agent") interacts with an environment and learns to make decisions by receiving feedback in the form of rewards or penalties. It's not provided with the correct answer but must discover it by trying out different actions and observing the outcomes. It's used in situations where an agent needs to learn how to behave in an environment by performing certain actions and receiving rewards as feedback. Example: training a computer program to play a game like chess or Go. The program makes a move (action), the game progresses (new state), and the program receives a reward or penalty depending on the outcome of the move.

Unsupervised Learning

In this learning style, the algorithm is given input data without any explicit output or labels. The goal is to discover patterns, structures, or relationships within the data. It's used when one wants to derive structure from data without having predefined labels. Example: given data about customer shopping habits, this learning algorithm could group customers with similar habits without knowing any predefined categories.

Supervised Learning

In this learning style, the algorithm is trained on a labeled dataset. This means that each example in the training dataset is paired with the correct output. Training consists of the algorithm making predictions and then being corrected by the labeled data whenever it's wrong. It's used for problems such as classification, where the output variable is a category, such as "spam" or "not spam", "fraudulent" or "valid", etc.

What type of reasoning is Machine Learning?

Inductive Reasoning

What type of reasoning is this example: All swans we've seen are white, therefore all swans are white.

Inductive Reasoning

What type of reasoning is this example: Sarah goes to a local park every day during her lunch break. Over the course of several weeks, she notices the following: -On Monday, she sees that all the birds she observes near the pond are ducks. -On Tuesday, she again notices that every bird she sees near the pond is a duck. -On Wednesday and Thursday, the pattern continues—only ducks are present near the pond. She concludes that all birds that hang out near the pond in this park are ducks.

Inductive Reasoning

Event

A subset of the outcomes that can occur in an experiment. Ex: getting an even number when rolling a die.

Combinations

A selection of objects in which order doesn't matter. The number of ways to choose k objects from n is n!/((n-k)! * k!)
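
A quick check of the count n!/((n-k)! * k!) for choosing k = 3 objects from n = 5. The values of n and k are arbitrary choices for illustration.

```python
import math
from itertools import combinations

n, k = 5, 3

by_formula = math.factorial(n) // (math.factorial(n - k) * math.factorial(k))
by_library = math.comb(n, k)                            # built in since Python 3.8
by_enumeration = len(list(combinations(range(n), k)))   # list them all and count

# Each combination can be ordered in k! ways, so permutations / k! gives the same count.
from_permutations = math.perm(n, k) // math.factorial(k)

print(by_formula, by_library, by_enumeration, from_permutations)  # 10 10 10 10
```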

Supervised Machine Learning: Ranking

Output a ranking, either ordinal (0.0,1.0), or pairwise.

Supervised Machine Learning: Regression

Output a real number. E.g., number between (-inf,+inf), (0,+inf), (0.0,1.0), etc.

Supervised Machine Learning: Classification

Output one of a set of discrete labels. E.g., yes/no, low/medium/high, etc.

Joint Probability for Two Independent Events

P(A,B) = P(A) * P(B)

Deductive Reasoning

Reasoning in which the conclusion is guaranteed. It begins with a general guideline or principle and, based on that, leads to a clear and specific conclusion. If the original assertions are true, then the conclusion must be true. With this reasoning we can make observations and expand implications, but we cannot make predictions about the future or otherwise non-observed phenomena.

What type of learning is this example: eProf gets evaluated based on the SETS at the end of the quarter. Throughout the quarter, she can take actions (give lectures, make assignments and quizzes), and gets indirect feedback such as how many students fall asleep, student comments on EdDiscussion, etc. She would like to optimize her actions so that her SETS are good (and her students learn, :>).

Reinforcement Learning

The focus of this class will be:

Supervised Learning

What type of learning is this example: eProf has examples of research papers she has written with students from her research group, including features such as how novel the theoretical results are, how exciting the experimental results are, how good the title is, how well-written the paper is, and labels, whether or not the paper was accepted or rejected. She would like to predict whether a new paper will be accepted or rejected.

Supervised Learning

What type of learning is this example: eProf has information about students in her class from previous quarters - their attendance, their grades on HO & WA, and scores on quizzes, and whether they got an A on the final. For students this quarter, given attendance, grades on HO & WA, and scores on quizzes, she would like to predict whether a student will get an A on the final

Supervised Learning

Loss Function

The loss function measures the difference between the output of the hypothesis, h(x), and the desired output y. You can think of it as measuring the error in the hypothesis. For now, we'll just consider the simplest loss function, the number of mistakes. If we're correct, then the loss is 0. If we're incorrect, then the loss is 1. This is also often called 0/1 loss (for obvious reasons, :>!)
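
A minimal sketch of 0/1 loss in Python. The function name and the example predictions and labels are hypothetical, chosen only for illustration.

```python
def zero_one_loss(predicted, actual):
    """Return 0 if the prediction h(x) matches the label y, else 1."""
    return 0 if predicted == actual else 1

# Made-up predictions h(x) and desired outputs y for four examples.
predictions = ["spam", "not spam", "spam", "not spam"]
labels = ["spam", "spam", "spam", "not spam"]

losses = [zero_one_loss(p, y) for p, y in zip(predictions, labels)]
total_mistakes = sum(losses)

print(losses, total_mistakes)  # [0, 1, 0, 0] 1
```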

True Positive (TP)

The model correctly predicts that an instance belongs to a particular class.

True Negative (TN)

The model correctly predicts that an instance does not belong to a particular class.

False Positive (FP)

The model incorrectly predicts that an instance belongs to a particular class when it does not.

False Negative (FN)

The model incorrectly predicts that an instance does not belong to a particular class when it does.

Conditional Probability

The probability of an event occurring given that another event has already occurred. It quantifies the likelihood of event X happening under the condition that event Y has taken place, written P(X|Y).

Recall

The proportion of actual positive examples that are correctly identified. Recall measures how well the model identifies all of the instances of a particular class. It is calculated as the percentage of actual positive instances that are correctly predicted: Recall = TP / (TP + FN)

Accuracy

The proportion of correct predictions. Accuracy is one of the most common metrics for evaluating machine learning models. It is calculated as the percentage of all predictions that are correct: Accuracy = (TP + TN) / (TP + FP + TN + FN)

Precision

The proportion of positive predictions that are actually correct. Precision measures how precise the model's positive predictions are. It is calculated as the percentage of positive predictions that are actually correct: Precision = TP / (TP + FP)
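
The sketch below computes TP, TN, FP, and FN from a small set of labels and then derives accuracy, precision, and recall from them. The labels and predictions are made up, with 1 standing for the positive class and 0 for the negative class.

```python
# Made-up true labels and model predictions for ten examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly predicted negative
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, actually negative
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, actually positive

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(tp, tn, fp, fn)               # 3 4 2 1
print(accuracy, precision, recall)  # 0.7 0.6 0.75
```

The three metrics differ on the same predictions: precision is hurt by the two false positives, recall by the one false negative.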

Inductive Reasoning

This reasoning begins with observations that are specific and limited in scope and proceeds to a generalized conclusion that is likely, but not certain, in light of accumulated evidence. Much scientific research is carried out by this reasoning: gathering evidence, seeking patterns, and forming a hypothesis or theory to explain what is seen. While this reasoning cannot yield an absolutely certain conclusion, it can actually increase human knowledge (it is ampliative). It can make predictions about future events or as-yet unobserved phenomena.

Abductive Reasoning

This reasoning typically begins with an incomplete set of observations and proceeds to the likeliest possible explanation for the set. "Taking your best shot"

Independence

Two events are said to be independent if the occurrence of one event does not affect the probability of the occurrence of the other event. Ex: -Tossing a coin and rolling a die -Tossing a coin today and tossing a coin tomorrow. If two events are independent they satisfy P(A|B) = P(A) and P(B|A) = P(B)
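
The coin-and-die example can be checked by enumerating the sample space. This is a small sketch, assuming a fair coin and a fair die so that all 12 outcomes are equally likely; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair, assumed equally likely.
outcomes = list(product(["H", "T"], range(1, 7)))

def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_heads(outcome):
    return outcome[0] == "H"

def is_six(outcome):
    return outcome[1] == 6

p_heads = prob(is_heads)                                # 1/2
p_six = prob(is_six)                                    # 1/6
p_both = prob(lambda o: is_heads(o) and is_six(o))      # 1/12

# Conditional probability P(heads | six) = P(heads, six) / P(six).
p_heads_given_six = p_both / p_six

print(p_both == p_heads * p_six)     # True
print(p_heads_given_six == p_heads)  # True
```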

What type of learning is this example: eProf did a survey to ask about the interests and background of students taking her class. She is trying to find patterns in their background and interests (so she can make more compelling examples!). She would like to find groups (or clusters) of interests.

Unsupervised Learning

What is Machine Learning?

When we say machines "learn," what we really mean is that they find a math formula that works well with a certain set of data (the "training data"). This formula helps the machine produce the right results. And, if we give the machine new data that's similar to the training data, the formula should still give us the right results. But remember, machines aren't really learning like humans do. They're just using math to make predictions.

DataFrames

While data can take many forms, tabular data (aka a table) is the most common. In this setting, data is structured into rows, each representing a single entry. Properties or attributes of these entries are values in the row, and titled columns describe the properties. Index: a unique identifier for each data entry.

