Introduction to Machine Learning

What is supervised learning and what approaches does it entail?

Learns from both inputs and expected outputs; the datasets are labeled. Approaches: classification, regression, similarity learning, feature learning, anomaly detection. It is a passive process: learning is performed without any actions that could influence the data.

What is unsupervised learning and what approaches does it entail?

Learns from data that contains only inputs (unlabeled) and finds hidden structures and relationships in the data to train the model. Approaches: clustering, feature learning, anomaly detection. It is a passive process: learning is performed without any actions that could influence the data.

What is reinforcement learning and what approaches does it entail?

Learns how an agent should take actions in an environment to maximize a reward function. Approach: Markov Decision Process. It is an active process: the agent's actions influence the data it observes in the future, and hence its own potential future states.

How do you prepare data for a linear regression?

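Check the linear assumption (the relationship between inputs and output is linear), remove noise (e.g., outliers), remove collinearity (highly correlated input variables), transform inputs so they fit a Gaussian distribution more closely, and rescale inputs (standardize or normalize)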

What is the Markov Decision Process?

A mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. Used in reinforcement learning.

What is the difference between irreducible error and model error?

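Model error is how different the predictions are from the actual output; it can be reduced by refining the model learning process. Irreducible error comes from the data collection process (e.g., not enough data or not enough data features) and cannot be reduced by improving the model.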

What are some applications of ML?

Natural language processing, Computer vision, analytics, decision making

How does feature extraction for text data work?

Text is converted into numerical vectors, which can be visualized as points on a graph; the distance between two vectors is used to assess their similarity in meaning or some other connection.
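
The comparison step can be sketched as follows. This is a minimal illustration assuming each word has already been turned into a vector; the three 3-dimensional vectors are hypothetical (real embeddings have hundreds of dimensions). It shows two common measures: Euclidean distance (smaller means closer) and cosine similarity (closer to 1 means more similar).

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word vectors, made up for illustration
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
apple = [0.1, 0.2, 0.9]

print(euclidean_distance(king, queen) < euclidean_distance(king, apple))  # True
```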

What is Overfitting?

When the model fits the training data very well but fails to generalize to new data: it is "memorizing" the training data rather than adapting well to new data

What is Underfitting?

When the model neither fits the training data well nor generalizes well to new data

What is AI (artificial intelligence)?

a broad term that refers to computers thinking like humans

What is standardization?

a method of scaling data to have mean = 0 and std. deviation = 1

What is anomaly detection?

algorithms used to detect abnormal data in a set of normal data

What is Variance?

amount the estimate of the target function will change if different training data is used

What is ordinal encoding?

converting categories into numerical values, first category is represented by 0, second by 1 ... etc.
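
A minimal sketch of ordinal encoding in plain Python. The function name is hypothetical. It numbers categories in order of first appearance, which is only one possible choice; a library encoder may order them differently, for example alphabetically.

```python
def ordinal_encode(values):
    """Map each category to an integer, numbered in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # next unused integer: 0, 1, 2, ...
    return [mapping[v] for v in values], mapping

codes, mapping = ordinal_encode(["red", "green", "blue", "green"])
print(codes)    # [0, 1, 2, 1]
print(mapping)  # {'red': 0, 'green': 1, 'blue': 2}
```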

What is one hot encoding?

each possible value of a category gets its own column, which receives either a 1 or a 0 depending on whether the entity belongs to that category
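
A minimal sketch of one hot encoding in plain Python. The function name is hypothetical, and sorting the column order alphabetically is an assumption made for readability. Each row has exactly one 1, in the column of its category.

```python
def one_hot_encode(values):
    categories = sorted(set(values))  # one column per distinct category
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

cols, rows = one_hot_encode(["red", "green", "blue", "green"])
print(cols)  # ['blue', 'green', 'red']
print(rows)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note that the number of columns grows with the number of distinct values, which is the drawback listed for this encoding.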

What does it mean to tokenize a string?

either split each string of text into a list of smaller parts (tokens) or split a sentence into separate keywords
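
A simple word tokenizer can be sketched with a regular expression. This is an assumption-laden minimal version: it lowercases the text and keeps runs of letters, digits, and apostrophes, dropping punctuation. Real tokenizers handle many more cases.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("The cat sat on the mat."))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```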

What are feature learning algorithms?

features are discovered or learned from the data

What distinguishes Clustering algorithms?

find inherent groups or clusters in the data and assign entities to each cluster/group

What does training a linear regression model entail?

finding the coefficients for the input variables that minimize the error between the line of best fit and each data point
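
For simple linear regression (one input), the coefficients that minimize the sum of squared errors have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below assumes ordinary least squares as the error measure. The function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    b1 = num / den
    b0 = mean_y - b1 * mean_x  # the fitted line passes through (mean_x, mean_y)
    return b0, b1

b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```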

What are stop words?

high-frequency words (such as "the" or "a") that are unwanted during text analysis because they carry little information
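
Stop word removal is usually a filter over already-tokenized text. In this sketch the stop word list is hypothetical and deliberately tiny. Real lists, such as those shipped with NLP libraries, are much longer.

```python
# Hypothetical, deliberately tiny stop word list
STOP_WORDS = {"the", "a", "an", "is", "on", "of", "and"}

def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(["the", "cat", "sat", "on", "the", "mat"]))  # ['cat', 'sat', 'mat']
```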

What is the drawback of ordinal encoding?

implicitly assumes an order and relative importance between categories (e.g., the category encoded as 1 is treated as "greater" than the category encoded as 0), even when no such order exists

What is the tradeoff between bias and variance?

an inverse relationship: very complex models usually have low bias and high variance, while low-complexity models usually have low variance and high bias. Error is lowest when bias and variance are balanced.

What is the drawback of one hot encoding?

a large number of columns is generated, one per possible value of the category

What is a learning function used for?

learn a useful transformation of the input data that gets us closer to our expected output

How do you make image data have uniform aspect ratio?

crop or resize the images so they all share the same aspect ratio, typically a square (equal height and width)
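
One way to get a uniform square aspect ratio is a center crop. This sketch assumes a single-channel image stored as a nested list indexed as image[row][col]. The function name is hypothetical. Resizing (scaling) is the other option and is not shown here.

```python
def center_crop_square(image):
    """Crop the central square of an image stored as image[row][col]."""
    h, w = len(image), len(image[0])
    side = min(h, w)           # the square's side is the shorter dimension
    top = (h - side) // 2      # rows to skip at the top
    left = (w - side) // 2     # columns to skip on the left
    return [row[left:left + side] for row in image[top:top + side]]

print(center_crop_square([[1, 2, 3, 4],
                          [5, 6, 7, 8]]))  # [[2, 3], [6, 7]]
```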

How do you normalize image data?

subtract the mean pixel value in a channel from each pixel value in that channel
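
Per-channel mean subtraction can be sketched in plain Python. This assumes the image is a nested list indexed as image[row][col][channel]. The function name and the tiny 1x2 RGB image are hypothetical.

```python
def subtract_channel_means(image):
    """Subtract each channel's mean from every pixel value in that channel."""
    height, width, depth = len(image), len(image[0]), len(image[0][0])
    n = height * width
    # Mean of each channel over all pixels
    means = [sum(image[r][c][ch] for r in range(height) for c in range(width)) / n
             for ch in range(depth)]
    centered = [[[image[r][c][ch] - means[ch] for ch in range(depth)]
                 for c in range(width)] for r in range(height)]
    return centered, means

img = [[[10, 20, 30], [30, 40, 50]]]  # hypothetical 1x2 RGB image
centered, means = subtract_channel_means(img)
print(means)     # [20.0, 30.0, 40.0]
print(centered)  # [[[-10.0, -10.0, -10.0], [10.0, 10.0, 10.0]]]
```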

What is the difference between a model and an algorithm?

models are specific representations learned from data; algorithms are the processes of learning: Model = Algorithm(data)

What is a non-parametric function?

no assumptions are made regarding the form of mapping between input and output data, free-form relationship formation between data, can have any functional form

What distinguishes Classification algorithms?

outputs are categorical

What distinguishes Regression algorithms?

outputs are numerical and continuous

What is a parametric function?

parametric functions simplify the mapping to a known functional form: the general form is known, and learning computes its coefficients and constants

What do the columns in tabular data represent?

properties of items

What is lemmatization when it comes to text data?

reduces multiple inflections to a single dictionary form

What is Bias?

simplifying assumptions made by a model that make the target function easier to learn

What do the rows in tabular data represent?

single items (entities)

What is ML (machine learning)?

subcategory of AI that involves learning from data without being explicitly programmed

What is DL (Deep Learning)?

subcategory of machine learning that uses a layered neural network architecture inspired by the human brain

What is TF-IDF and how does it work?

term frequency-inverse document frequency: it weights each word by how often it appears in a document, scaled down by how many documents contain it, so common words and words that carry less information get less importance
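
A minimal TF-IDF sketch over already-tokenized documents. It assumes one common variant: tf = count / document length and idf = log(N / df), with no smoothing. Libraries often use smoothed variants, so their exact numbers will differ. The function name and documents are hypothetical.

```python
import math

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        scores = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            idf = math.log(n_docs / df[term])  # 0 for a term found in every document
            scores[term] = tf * idf
        weights.append(scores)
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
w = tf_idf(docs)
print(w[0]["the"])  # 0.0 ("the" appears in every document)
```

The word "dog", which appears in only one document, ends up with a higher weight than "cat", which appears in two.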

What is irreducible error caused by?

the data collection process, for example:
- not enough data
- not enough data features

How do you normalize text data?

transform it into a canonical form: multiple spellings are reduced to a single spelling (colour becomes color), and different forms are reduced to a single form ('is', 'am', 'are' all become 'be')
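
A toy version of this normalization step. The two lookup tables are hypothetical stand-ins: a real pipeline would use a full spelling dictionary and a lemmatizer from an NLP library. The sketch lowercases each token, then applies the spelling and word-form mappings.

```python
# Hypothetical lookup tables, for illustration only
SPELLING = {"colour": "color"}
LEMMAS = {"is": "be", "am": "be", "are": "be"}

def normalize_tokens(tokens):
    out = []
    for t in tokens:
        t = t.lower()                  # case folding
        t = SPELLING.get(t, t)         # unify spelling variants
        out.append(LEMMAS.get(t, t))   # reduce word forms to one dictionary form
    return out

print(normalize_tokens(["The", "Colour", "is", "red"]))  # ['the', 'color', 'be', 'red']
```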

What is the statistical perspective of ML?

y = f(x): the output is a function of the input, and the goal is to find (estimate) that function

How are values modified in standardization?

(x - mean) / standard deviation for value x
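
Standardization in plain Python. This sketch assumes the population standard deviation (statistics.pstdev). Some tools use the sample standard deviation instead, which gives slightly different values. The function name is hypothetical.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(x - mean) / std for x in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, standard deviation 2
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```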

How are values modified in normalization?

(x-xmin)/(xmax-xmin) for values x
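
Min-max normalization in plain Python. The function name is hypothetical. The sketch assumes the values are not all equal; otherwise xmax - xmin is 0 and the division fails.

```python
def min_max_normalize(values):
    """Rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

print(min_max_normalize([10, 15, 20, 30]))  # [0.0, 0.25, 0.5, 1.0]
```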

What are some properties of classical ML?

- based on classical mathematical algorithms
- more suitable for small data
- easier to interpret outcomes
- cheaper to perform
- can run on low-end machines
- doesn't require large amounts of computational power
- difficult to learn from large datasets
- requires feature engineering
- difficult to learn complex features

What are some properties of deep learning?

- based on neural networks
- suitable for high-complexity problems
- often better accuracy than classical ML
- better support for big data
- complex features can be learned
- trained models are difficult to explain
- requires significant computational power

What is ML?

A data science technique that extracts patterns from data to forecast future outcomes, behaviors and trends

What is normalization?

A method of scaling data into the range [0,1]

What are the benefits and limitations of non-parametric functions?

Benefits: highly flexible, can fit a large number of functional forms, makes no assumptions about the underlying function, and can produce high-performance prediction models.
Limitations: requires more training data, slower to train, generally has many parameters, and risks overfitting the training data.

What are the benefits and limitations of parametric functions?

Benefits: simpler, easier to understand and interpret, learns faster from data, and requires less training data to learn the mapping function.
Limitations: highly constrained to a specific form of the function, limited complexity, and often a poor fit in practice because not every problem matches the assumed mapping function.

Why is high bias bad?

Bias measures how far a model's predictions are from the true output, so more bias means less accuracy. High bias means more simplifying assumptions, so the model can miss important relationships between the features and the output, which causes underfitting.

What are the steps of the data science process?

Collect data, prepare data, train model, evaluate model, deploy model, retrain model

What is depth in terms of image data?

Depth is how many channels the data has. RGB has a depth of 3 and grayscale has a depth of 1.

What are notebooks?

A documentation tool that others can use to reproduce experiments. A notebook combines runnable code, output, formatted text, and visualizations, and is made up of one or more cells that let you execute individual code snippets. The output of each cell can be saved and viewed by others.

How is image data vectorized?

An image is stored as a 3-D array of size [height]*[width]*[channel depth]; each entry is the value of one color channel at one pixel position (x, y). So a 4x4 color image has size [4][4][3].
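
The [height][width][channel depth] layout can be checked with nested lists. The all-black 4x4 RGB image is hypothetical. Flattening it into a single vector gives 4 * 4 * 3 = 48 values.

```python
height, width, depth = 4, 4, 3
# Hypothetical all-black 4x4 RGB image: image[row][col] = [r, g, b]
image = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]

# Flatten row by row, pixel by pixel, channel by channel
flat = [value for row in image for pixel in row for value in pixel]

print(len(image), len(image[0]), len(image[0][0]))  # 4 4 3
print(len(flat))  # 48
```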

What kind of bias / variance do non-parametric algorithms usually have?

High Variance, Low Bias

What kind of bias / variance do parametric algorithms usually have?

High bias, Low Variance

Why is high variance bad?

High variance suggests that the algorithm learns the random noise in the training data instead of the underlying relationship between inputs and output, which causes overfitting

How do you vectorize text data?

Identify the features of the text that are relevant to the task, then extract them in a numerical form that ML algorithms can use, for example via TF-IDF or word embeddings

How is image data represented?

Image data is represented by pixels

What are some hallmarks of tabular data?

It is arranged in a data table with rows and columns

What is the importance of vectors in ML?

Vectors are used heavily to represent many things; non-numerical data types are often converted into representative numerical vectors

What is a linear regression model?

It predicts a variable y from input variable x and assumes a simple linear relationship

How would you prevent overfitting?

K-fold cross-validation, simplifying the model, adding more data, reducing dimensionality, and stopping training early when performance stops improving

Explain what each of the following components help you perform ML runs in Azure ML: [Notebooks, automated ML, designer, datasets, experiments, models, endpoints, compute, datastores]

- Notebooks: sample notebooks and user files, loaded inside compute instances.
- Automated ML: automates intensive tasks by rapidly iterating over many combinations of algorithms and hyperparameters to find the best model based on the chosen metric. Create new runs and view previous runs in the Automated ML tab.
- Designer: drag-and-drop tool that lets you create ML models without any code. It has templates and lets you view drafts.
- Datasets: create datasets from local files, datastores, etc.
- Experiments: help organize runs. All runs must be associated with an experiment, and you can view all runs related to an experiment.
- Models: models are produced by runs in Azure ML; all models, whether created in Azure or trained outside of Azure, are accessible here.
- Endpoints: expose real-time endpoints for scoring, or pipelines for advanced automation.
- Compute: designated compute resources where you run training scripts or host service deployments. Manage compute instances, training clusters, inference clusters, and attached compute.
- Datastores: attached storage accounts in which you can store datasets.

What types of data does ML deal with?

Numerical, time series, categorical, text, image

What are the two approaches to encoding categorical data?

Ordinal encoding and one hot encoding

Identify the pipeline for text data

Preprocessing and normalizing (tokenization, stop word removal, etc.) -> feature extraction and vectorization (TF-IDF, GloVe, Word2Vec) -> feed the vectorized documents and labels into the model and train it

What is the computer science perspective of ML?

Program(input features): the data inputs (input features) are used to train a model that finds the correct outputs (which are sometimes given). In other words, the input features are used to create a program that can generate the desired output.

What is the formula to determine Root Mean Squared Error (RMSE)

RMSE = sqrt( sum((predicted - actual)^2) / (# of data points) ), where the sum runs over all data points
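
The RMSE formula as a short Python function. The function name and the sample values are hypothetical. It squares each error, averages the squared errors over all data points, and then takes the square root.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

print(rmse([3, 4, 5, 6], [1, 2, 3, 4]))  # 2.0 (every error is 2)
```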

When it comes to image preprocessing, what are some of the transformations used?

Rotation, cropping, resizing, denoising, centering, normalizing, making the aspect ratio uniform

What are the forms for simple and multiple linear regressions?

Simple: Y = B0 + B1*X; Multiple: Y = B0 + B1*X1 + B2*X2 + ... + Bn*Xn
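
Both forms reduce to the same prediction: the intercept plus a weighted sum of the inputs. In this sketch the function name and coefficient values are hypothetical. Simple regression is the case with a single coefficient.

```python
def predict(b0, coefficients, features):
    """Y = B0 + B1*X1 + B2*X2 + ... + Bn*Xn"""
    return b0 + sum(b * x for b, x in zip(coefficients, features))

print(predict(1.0, [2.0], [3.0]))            # simple: 1 + 2*3 = 7.0
print(predict(1.0, [2.0, 0.5], [3.0, 4.0]))  # multiple: 1 + 2*3 + 0.5*4 = 9.0
```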

What is K-fold cross-validation?

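Splits the initial training data into k subsets (folds) and trains the model k times, each time holding out a different fold for validation and training on the remaining folds. Used to reduce overfitting.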
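
The fold splitting can be sketched as a generator of index lists. The function name is hypothetical. This version does not shuffle, which is a simplifying assumption (shuffling is common in practice). When the data does not divide evenly, the first folds get one extra sample each. Every sample is used for validation exactly once.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

for train, val in k_fold_indices(6, 3):
    print(train, val)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```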

What are the two main approaches of scaling data?

Standardization and Normalization

What is irreducible error?

The ever-present error in a predicted value because it is predicted from a limited dataset

