AI Exam 1

what is reinforcement learning?

- Finding a strategy for taking a series of decisions in an environment that is usually changing unpredictably. - A reward and cost system is used to maximize rewards. EX: The learning system, called an agent, observes the environment, selects and performs actions, and gets rewards or penalties in return. It learns a strategy by itself, called a policy. A policy defines the action to take in a given situation.

what is the goal of unsupervised learning?

- The goal of unsupervised ML is to learn and generate distinctive groups or clusters of data points in a dataset. - The data given is only X; there is no y.

Determine the difference between regression and classification

- If the target is numerical, the model is a regressor; it draws a line through the data. - If the target is categorical, the model is a classifier; it draws a boundary between clusters. (See the sketch below.)
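
A minimal sketch of the contrast with scikit-learn, on toy data of my own (not from the course): a numerical target gets a regressor, a categorical target gets a classifier.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]                        # one feature

# Numerical target -> regressor: draws a line through the data.
y_numeric = [1.1, 1.9, 3.2, 3.9]
regressor = LinearRegression().fit(X, y_numeric)
print(regressor.predict([[5]]))                 # roughly 5.0

# Categorical target -> classifier: draws a boundary between classes.
y_categorical = ["low", "low", "high", "high"]
classifier = LogisticRegression().fit(X, y_categorical)
print(classifier.predict([[5]]))                # likely "high"
```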

what does feature engineering do?

- Improves the performance of the model by selecting the right features and preparing them in a way that is suitable for the machine learning model. - Heavily dependent on the experience and expertise of the data scientists conducting the analysis.

what is bias?

- Bias is the difference between our actual and predicted values. - It comes from the simplifying assumptions that our model makes about the data in order to predict on new data.

how do you get a good fit in ML?

- Look at the performance of the ML model over time on the training data. - If the model is trained too long, it can learn the unnecessary details and noise in the training set, which leads to overfitting. - To get a good fit, stop training at the point where the error on the test set starts to increase.

why does model size and packaging affect model deployment?

- Model size plays a role in how we plan to package the model. - A smaller model can be wrapped faster and contained in a Docker container.

what are the three common approaches for converting ordinal and nominal variables to numeric?

- ordinal encoding - one-hot encoding - dummy variable encoding (see the sketch below)
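
A minimal sketch of all three approaches with pandas, on a hypothetical toy column (not from the course materials):

```python
import pandas as pd

df = pd.DataFrame({"quality": ["poor", "good", "excellent"]})

# Ordinal encoding: map each value to a number that preserves the order.
df["quality_ordinal"] = df["quality"].map({"poor": 1, "good": 2, "excellent": 3})

# One-hot encoding: one 0/1 column per category, no implied order.
one_hot = pd.get_dummies(df["quality"], prefix="quality")

# Dummy variable encoding: one-hot with one column dropped
# (if it is not x or y, it must be z).
dummy = pd.get_dummies(df["quality"], prefix="quality", drop_first=True)

print(one_hot, dummy, sep="\n")
```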

why does data and concept drift affect model deployment?

- over a span of time real-world data keeps changing and may not be reflected in the model

what are the two most common approaches for dealing with missing values?

- Removal: simply remove any observations (rows) where one or more missing values are present. - Imputation: input or impute replacement values where they were originally missing. (See the sketch below.)
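
A minimal sketch of both approaches with pandas (toy data assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40], "income": [50_000, 60_000, np.nan]})

# Removal: drop any row containing a missing value.
removed = df.dropna()

# Imputation: fill in replacement values, here each column's mean.
imputed = df.fillna(df.mean())

print(removed, imputed, sep="\n")
```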

what are the stages of a simple ML model lifecycle?

- scoping - data collection - data engineering - model training - model validation - deployment - monitoring

what is model selection?

- The challenge of choosing a model among the many that relate to your specific problem. - The process of selecting the "best" from among a collection of machine learning models.

what is feature extraction?

- the process of extracting new features from the existing attributes - primarily concerned with reducing the number of features in the model.

what is data cleaning?

- To correct what is incorrect. - Errors may be caused by human input (spelling, formatting, missing data).

what is rolling updates deployment?

- Updating all instances of your model one by one. - Useful when you want to make a quick update of the entire model line with a new version.

what is shadow deployment?

- Used to test a new version of the model with production data. - A copy of each user request is made and sent to the updated model, but the existing system gives the response.

how does encoding contribute to data preparation?

- Variables that are not naturally numeric (unstructured/categorical variables) must be coded numerically.

Machine Learning

A subset of AI techniques that use statistical methods to enable machines to improve with experience; the study of algorithms that improve their performance (P) at a task (T) with experience (E); solves a prediction problem: given an input X, predict an appropriate output Y.

Learning Example

Spam detection. Input: incoming mail. Output: spam or not spam. This is a binary classification problem because there are only 2 possible outcomes.

Bernoulli Distribution

The probability distribution of a random variable with two possible outcomes, each with a constant probability of occurrence.

what is an outlier generally?

any data point that is very different from the majority; can be fixed by removing the row containing the outlier or simply replacing its value (see the sketch below)
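
A minimal sketch of both fixes with pandas (toy data and an arbitrary distance threshold assumed):

```python
import pandas as pd

df = pd.DataFrame({"height_m": [1.70, 1.80, 1.75, 9.90]})   # 9.90 is an outlier

# Flag values far from the median (the threshold is an illustrative choice).
is_typical = (df["height_m"] - df["height_m"].median()).abs() < 1.0

# Fix 1: remove the row containing the outlier.
removed = df[is_typical]

# Fix 2: replace the outlier's value, here with the column median.
replaced = df["height_m"].where(is_typical, df["height_m"].median())

print(removed, replaced, sep="\n")
```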

what is irrelevant data?

anything that isn't related to the problem you're looking to solve

computer scientist

applies concepts from computer science to create efficient solutions

what is the curse of dimensionality?

as the dimensionality of the feature's space increases, the number of configurations can grow exponentially and thus the number of configurations covered by an observation decreases

what does it mean to estimate accuracy?

When you are building a predictive model, you need to evaluate the capability of the model on unseen data. This is typically done by estimating accuracy using data not used to train the model

what is a model parameter?

a configuration variable that is internal to the model and whose value can be estimated from data.

what is a data dictionary?

a glossary of terminology relevant to the project; what each data entry is, its format, etc

what is continuous probability distribution?

a probability distribution showing all the possible outcomes and associated probabilities for a given event

what are AI technologies driven by?

data and analytics

what are the three major components of a machine learning system?

data, models and learning

what is a cumulative distribution function?

describes the cumulative probability that a random variable falls below, above, or between two points

data engineer

designs and builds pipelines that transform and transport data into a format so that, by the time it reaches the data scientists or other end users, it is in a usable state

what is computer vision?

enables computers and systems to derive meaningful info from digital images, videos and other visual inputs, and to take actions or make recommendations based on that info

what does the CLT establish

establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed.

computer vision

extracts and understands info from images and videos

what is bias-variance trade off?

find the perfect balance between bias and variance; this ensures that we capture the essential patterns in our model while ignoring the noise present in it.

what is meant by good models?

"Good" means the model performs well on unseen data; this requires us to define performance metrics, such as accuracy or distance from ground truth, and to figure out ways to perform well under those metrics.

rules based programming approach

handcrafted knowledge, where programmers craft sets of rules to represent knowledge in well-defined domains. Issues: - labor-intensive - cannot generalize to unanticipated input combos (prediction problematic) - doesn't naturally handle uncertainty

what is unstructured data?

data that may have some implicit structure but doesn't follow a specified format.

what is the main purpose of EDA?

helps identify obvious errors, better understand patterns within the data, detect outliers or anomalous events, and find interesting relations among variables

AI hardware

includes physical computer component requirements to achieve increased processing efficiency and/or speed

data collection from different sources

internal and/or external to make sure we have the right data in correspondence with the business requirements/ problems

what does data splitting involve?

Involves partitioning the data into: 1. An explicit training dataset used to prepare the model. Train: the algorithm learns from the data pattern to develop the model. 2. An (unseen) test dataset used to evaluate the model's performance. Test: use the developed model from the last step to predict the target variable on the test dataset, then evaluate the model's performance by comparing the predicted value with the actual value of the target variable. (See the sketch below.)
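
A minimal sketch of this split with scikit-learn (toy data and an assumed 80/20 split):

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]
y = [2 * i + 1 for i in range(10)]

# Partition into an explicit training set and an unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)  # learn the pattern from the training data
print(model.score(X_test, y_test))                # compare predicted vs. actual on the test set
```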

what can we use to determine unsupervised learning?

k-means clustering

data scientist

leads research projects to extract valuable info from big data.

what is continuous data?

numeric, but exists in fractional form; represents info that can be divided to a more granular level EX: Lebron James is 2.06m tall v. 2.064759m tall

machine learning approach

statistical learning, where programmers create statistical models for specific problem domains and train them on data. Issues: - the machine learns on its own

Stats v. ML?

statistics is data plus analytical theory and machine learning is data plus computable structures

what does data science do?

structures big data, finds the best patterns, and then advises businesspeople to make the changes that would work best for their needs. It includes the transformation, ingestion, collection, and retrieval of large quantities of data, which is referred to as Big Data.

Deep Learning

a subset of ML which makes the computation of multi-layer neural networks feasible

what is SVM?

a supervised learning model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

what does MLOps workflow involve?

supporting data collection and processing, experimentation, evaluation and deployment, and monitoring and response.

Again, what is machine learning?

teaching computers how to perform a task without having to program them to do it

what is big bang-recreate deployment?

tear down the existing deployment for the new one to be deployed

what is NLP?

the part of computer science and AI that can help in communicating between computers and humans by natural language; enables a computer to read and understand data by mimicking human natural language; EX: GPS, Siri

what is joint probability?

the probability of the intersection of two events

what is Conditional Probability

the probability that one event will occur given that some other event has occurred

what is standardization?

the process of converting the data into a uniform format.

what is variance?

the variability in the model prediction—how much the ML function can adjust depending on the given data set.

what is the objective of a data dictionary and schema?

to create a product demand function that suggests the optimal timing (when) and depth (percent) of markdowns to realize the highest product margin.

what is the goal of Feature engineering?

to create new features by combining several features that we expect to be important based on our human knowledge of the problem.

what is the goal of learning?

to find a model and its corresponding parameters such that the resulting predictor will perform well on unseen data.

Why is data cleaning necessary?

to preprocess data and to correct incorrect, improperly formatted, duplicated, irrelevant, or missing data and outliers.

what is a learner?

trained algorithm that successfully moves from individual examples to broader generalizations

what is supervised learning?

try to predict either a categorical target variable or a numerical target variable. EX: given an object with a set of known, observed measurements, predict the value of an unknown or target variable

what is structured data?

typically stored in traditional relational databases and refers to data that has a defined length and format.

natural language processing

understanding and using data encoded in written language

what is data extraction?

the process of converting unstructured or semi-structured data into structured data.

What is a Poisson distribution?

used to model the number of events occurring within a given time interval

what is a/b testing deployment?

used to understand what users prefer and which model might work better for them

what is binomial distribution?

used when there are exactly 2 mutually exclusive outcomes of a trial labeled success or failure

what is a predictor as a function?

- A predictor, when given a particular input example, produces an output. We have represented this as Y = F(X), where Y is the target outcome, F is the function (algorithm) that relates X to Y as trained, and X is the new instance. - We represented the linear function F as Y ≈ β0 + β1X, where β0 is the intercept and β1 is the slope. (See the sketch below.)
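
A minimal sketch of fitting the linear predictor F with scikit-learn; the toy data is generated from Y = 1 + 2X, my assumption for illustration:

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]
Y = [3, 5, 7, 9]                      # generated by Y = 1 + 2X

F = LinearRegression().fit(X, Y)      # train F on known (X, Y) pairs
print(F.intercept_, F.coef_[0])       # β0 ≈ 1 (intercept), β1 ≈ 2 (slope)
print(F.predict([[5]]))               # output for a new instance: ≈ 11
```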

what is low variation data?

where a column in your data set contains only one or a few unique values

what is underfitting?

where the model cannot find patterns in our training set and hence fails for both seen and unseen data

how are models deployed in practice?

with a REST (representational state transfer) API: an API that conforms to the design principles of the REST architectural style. (See the sketch below.)
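
A minimal sketch of a prediction endpoint with Flask; the route name, payload shape, and stand-in model are my assumptions, not the course's setup:

```python
from flask import Flask, jsonify, request
from sklearn.linear_model import LinearRegression

# Stand-in for a model trained earlier in the ML lifecycle.
model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    x = request.get_json()["x"]       # e.g. {"x": [[4]]}
    return jsonify(prediction=model.predict(x).tolist())

if __name__ == "__main__":
    app.run()  # in production this sits behind a gateway and load balancer
```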

How do you pick the best model?

- A model that meets the requirements and constraints of project stakeholders. - A model that is good given the time and resources available. - A model that is skillful compared to naive approaches, e.g., Excel models. - A model that performs well relative to other tested models. - A model that is skillful relative to the state-of-the-art.

develop future state requirements

•Define what the organization's data and analytics structure would ideally look like in the short term and in the long term

create an enterprise data model (EDM)

•EDM is an integrated view of the data produced and consumed across an entire organization •EDM determines the structure by which data is governed and how it relates to the various aspects of the organization

embrace continuous process improvement

•Encourage continuous process improvement (CPI) using incremental enhancements and breakthroughs •Implement a feedback mechanism and put processes in place for its rapid implementation

emphasize rapid prototyping

•Encourage rapid prototyping of solutions and an iterative approach to process improvement •Encourage incremental enhancements to existing processes in order to reach mature processes and technologies

what is continuous uniform distribution?

•Forms the basis for sampling from more complex distributions

conduct a gap analysis

•Identify people, processes and technologies that are required to move from a current state to the desired state

what are some ways to tackle underfitting?

•Increase the number of features in the dataset. •Increase model complexity. •Reduce noise in the data. •Increase the duration of training.

obtain leadership and stakeholder commitment

•Leadership commitment is essential for this momentous task •Multiple stakeholders need to be brought on board because of the interdisciplinary nature of this task

what is the prediction function?

- A prediction function takes input x and produces an output y. - Machine learning is about finding the best prediction function.

what happens when you have an underfit model?

- An underfit model has poor performance on the training data and will result in unreliable predictions. - Underfitting occurs due to high bias and low variance.

Role/Responsibility of a Chief analytics or Data officer

- Create a vision for AI in the company. - Identify business-driven use-cases. - Determine the appropriate level of ambition. - Create a target data architecture. - Manage external innovation. - Develop and maintain a network of AI champions.

how do we train a model?

- Distill training data into model parameters. - Parameters: beta coefficients for a linear model; tree structure (splits) for a decision tree. - Hyperparameters: number of trees, K clusters, learning rate. - A model = learned algorithm + parameters. (See the sketch below.)
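
A minimal sketch of the distinction with scikit-learn (toy data assumed): the hyperparameter is chosen before training, the parameters fall out of it.

```python
from sklearn.linear_model import Ridge

X = [[1], [2], [3], [4]]
y = [2.1, 3.9, 6.2, 7.8]

model = Ridge(alpha=1.0)                  # hyperparameter: set by the practitioner
model.fit(X, y)                           # training distills the data into parameters
print(model.coef_, model.intercept_)      # parameters: the learned beta coefficients
```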

what are the sources of error in assessing model accuracy?

- Model underfitting, too weak or simple, does not capture X Y relationship. - Model overfitting, model too specific to training data, does not generalize well.

Why are model parameters important for ML?

- Parameters are key to machine learning algorithms. They are the part of the model that is learned from historical training data. - Given an input feature vector X, the values of the model parameters (which are learned from the training data) allow the output variable y to be computed. - They are required by the model when making predictions. - The practitioner does not set them manually. - They are often saved as part of the learned model. - Model parameters are often estimated using an optimization algorithm, which is a type of efficient search through possible parameter values. - Think of the model as the hypothesis and the parameters as the tailoring of the hypothesis by the data.

what are the different types of machine learning?

- Supervised learning - Unsupervised learning - Reinforcement learning

data cleaning and feature engineering

- Understand the dataset and clean up the given dataset. - Understand the features and the relationships between them. - Extract essential variables and leave behind/remove non-essential variables. - Select, transform, extract, combine and manipulate raw data.

what is overfitting?

- When a model performs very well for training data but has poor performance with test data (new data); - the machine learning model learns the details and noise in the training data such that it negatively affects the performance of the model on test data. - can happen due to low bias and high variance.

what is blue/ green deployment

- A server swap; there are 2 identical systems available. - User requests are routed to the newer system, swapping out the older one. - Used mostly in application/web scenarios.

What is discrete data?

- Also numeric, but exists in whole form; represents info that is countable and cannot be divided into smaller forms. EX: Cristiano Ronaldo's total scored points for the season; they cannot be broken down any further.

what is an API?

- Application programming interface. - A set of rules that define how applications or devices can connect to and communicate with each other.

what is one-hot encoding

- Assigns a numeric vector to each value of a nominal variable. - There is no exact order to this.

what is ordinal encoding?

- Assigns a numeric value to each value of an ordinal variable. - This value preserves an order amongst the values. EX: poor, good, excellent -> 1, 2, 3

what are independent events?

- Can occur at the same time. - Can have an intersection. - "And" events. - Multiplication rule.

what are mutually exclusive events?

- Cannot occur at the same time. - No intersection. - "Or" events. - Addition rule.

robot characteristics

- consist of some sort of mechanical construction; this helps it complete tasks in the environment for which it's designed -need electrical components that control and power the machinery - contain some level of computer programming

why does traffic and requesting routing affect model deployment?

- Depending on the traffic and the type of model, you have to decide on either real-time inference or batch model deployment.

what is a data dictionary, again?

compiles all of the data about the data elements in the model

what is classical statistics?

Classical statistics is concerned with developing models that characterize, explain, and describe phenomena, whereas machine learning is primarily concerned with prediction.

planning/control

contain processes to identify, create and execute activities to achieve specified goals

ensure that data.....

1. Is intended for specific use cases and algorithms. 2. Helps make the model more intelligent. 3. Speeds up decision making.

machine learning

contains a broad class of computational models that learn from data

what is canary deployment?

- Deploy the update to the existing system and expose users partially to the new version. - A smaller % of users will use the updated model and the rest will use the old version.

what is EDA?

- exploratory data analysis: used to analyze and investigate data sets and summarize their main characteristics, often employing data visualization methods. - Used to discover patterns, spot anomalies, test a hypothesis, or check assumptions

what is ordinal data?

- Another type of categorical data; it does contain an underlying order or ranking; the values generally only show the sequence, not the scale. EX: t-shirt sizes; small < medium < large

What is Gaussian distribution?

arises naturally in many processes in our everyday life -> central limit theorem

what is nominal data?

-categorical data; the data points do not have an order EX: social media names like instagram, twitter, FB

why does model retraining and versioning affect model deployment?

- How often a model is retrained impacts the deployment strategy, because you need to compare model performance, update, and possibly maintain different versions.

computer vision tasks

-image classification -object detection -object tracking -content-based image retrieval

what are the major areas of AI?

-knowledge processing -speech -AI hardware -Evolutionary computation -natural language processing -machine learning -vision -planning/control

what are the built in data structures of python?

-list -dictionary -tuple -set
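
A minimal sketch of all four (illustrative values of my own):

```python
features = ["age", "income", "age"]       # list: ordered, mutable, allows duplicates
row = {"age": 25, "income": 50_000}       # dictionary: key -> value lookup
point = (3.0, 4.0)                        # tuple: ordered and immutable
unique = set(features)                    # set: unordered, unique -> {"age", "income"}
print(features, row, point, unique)
```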

what should you consider when choosing a model?

-performance -how long model takes to train -how easy it is to explain to project stakeholders

What are some NLP use cases?

-virtual agents and chatbots -machine translation -social media sentiment analysis -text summarization

what are the steps to end-to-end process flow?

1) collection of data from various sources 2) data cleaning and feature engineering 3) model building and selecting the correct ML algorithm 4) evaluating the model 5) model deployment

what are the Key Steps in Creating a Center of Excellence in Data Science and Driving Organizational Adoption

1) define a vision 2) obtain leadership/stakeholder commitment 3) evaluate current state 4) develop future state requirements 5) conduct a gap analysis 6) create an implementation roadmap 7) establish a data governance structure 8) create an enterprise data model 9) emphasize rapid prototyping 10) embrace continuous process improvement

what are the fundamental rules data sets must follow before their use in models?

1. All data must be numeric. 2. There can't be any missing values. 3. Nonnumeric features, such as strings, dates, and categorical variables, must be deleted or have numeric features derived from them. 4. Even with purely numeric data, there is potential cleanup work, such as deleting or replacing erroneous/missing entries, or even deleting entire records that fall outside our business rules.

what are some examples of probability distributions that are discrete?

1. Bernoulli Distribution 2. Poisson Distribution 3. Binomial Distribution

what are some examples of probability distributions that are continuous?

1. Gaussian Distribution 2. Exponential Distribution 3. Continuous Uniform Distribution (see the sketch below)
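
A minimal sketch sampling from all six distributions named above with scipy.stats (the parameter values are arbitrary illustrative choices):

```python
from scipy import stats

# Discrete
print(stats.bernoulli.rvs(p=0.3, size=5))         # two outcomes, constant probability
print(stats.poisson.rvs(mu=4, size=5))            # event counts per time interval
print(stats.binom.rvs(n=10, p=0.5, size=5))       # successes in 10 success/failure trials

# Continuous
print(stats.norm.rvs(loc=0, scale=1, size=5))     # Gaussian
print(stats.expon.rvs(scale=2, size=5))           # Exponential
print(stats.uniform.rvs(loc=0, scale=1, size=5))  # Continuous uniform
```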

You have gathered data, cleaned the data, performed EDA, explored various algorithms and created the final model. Now What?

1. The model needs to be integrated into an existing production environment so that it can be used for making predictions that aid in decision making. 2. After the model is deployed in the production environment, when it is given an input, it provides a prediction of the output variable for that input.

How does Kmeans find find similar baskets?

1. Randomly choose initial centroids. 2. Measure the distances between the data points (in our case, similar baskets) and the centroids. 3. Sum the distances. 4. Find new centroids by moving to the average of the points in each cluster, and repeat. Goal: find the centroids with the smallest total distance (Within-Cluster Sum of Squares). (See the sketch below.)
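
A minimal sketch with scikit-learn's KMeans on hypothetical basket vectors; inertia_ is the within-cluster sum of squares the algorithm minimizes:

```python
from sklearn.cluster import KMeans

baskets = [[1, 0], [1, 1], [8, 9], [9, 8]]   # toy basket feature vectors
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(baskets)

print(km.labels_)     # cluster assignment per basket, e.g. [0 0 1 1]
print(km.inertia_)    # Within-Cluster Sum of Squares (WCSS)
```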

python is a .....

an interpreted language: the interpreter translates the source code, based on its syntax, into code the machine executes

what is python?

A multi-purpose language: - Data Analysis. - AI/ML. - Automation. - Web development (server-side). - Software development.

evolutionary computation

contains a set of computational routines using aspects of nature and evolution

what is feature scaling?

Features with very different scales can affect the regularization of ML models, and can also make the learning procedure itself slow. The goal of normalization is to transform the feature values into a similar (or identical) range.
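
A minimal sketch of such a transformation with scikit-learn's StandardScaler (toy data with deliberately different scales):

```python
from sklearn.preprocessing import StandardScaler

X = [[1.0, 50_000], [2.0, 60_000], [3.0, 80_000]]   # metres vs. dollars

scaled = StandardScaler().fit_transform(X)   # each column -> mean 0, std 1
print(scaled)
```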

who operationalizes ML?

Data/ML engineers operationalize ML, i.e., deploy and maintain ML pipelines in production.

data science v. ML

Data science can be viewed as an incorporation of several different parent disciplines, including data engineering, software engineering, data analytics, machine learning, business intelligence, predictive analytics, and more.

what is dummy encoding?

one-hot encoding with one column dropped, since it is implied by the others: if not x or y, it must be z

how do you set the business objective?

START WITH A QUESTION. 1) determine business objectives: background; business objectives; business success criteria. 2) assess situation: inventory resources; requirements, assumptions, and constraints; risks and contingencies; terminology (data dictionary); costs and benefits. 3) project plan: project plan; assessment of tools and techniques.

What is the definition of AI?

activity devoted to making machines intelligent, and intelligence is that quality that enables an entity to function appropriately and with foresight in its environment; any technique which enables computers to mimic human behavior

what is feature selection?

adding or removing features from model to ensure that features are only added or removed if it results in an improvement in the model performance

what are some of the most common ML problems?

classification and regression

what do data scientists really spend most of their time doing?

cleaning and organizing data and collecting data sets

what are the two predictive approaches in models as functions?

predictor or probabilistic

what is important to remember in terms of data collection?

privacy, diversity/ neutrality, credibility, and quality of the data.

what is marginal probability?

probability of a single event; if A is an event, the marginal probability is the probability of that event occurring, P(A)

what is encoding?

process of converting categorical variables to numerical variables

what is learning?

the process of converting experience into expertise or knowledge; a learning system is enabled to use that expertise or knowledge when it is confronted with new info

what is operationalizing ML or MLOps?

process of the continual loop of (i) data collection and labeling, (ii) experimentation to improve ML performance, (iii) evaluation throughout a multi-staged deployment process, and (iv) monitoring of performance drops in production.

what is data transformation?

process of transforming the data from one layout to another; doesn't change original meaning of the data

what does good data preparation do?

produces clean and well-curated data which leads to more practical, accurate model outcomes.

what are robots?

programmable machines that assist humans or mimic human actions

what does probability do?

provides a language for quantifying uncertainty

what does the probability theory provide?

provides us with the tools and techniques to work with uncertain phenomena

what are the two problems of supervised ML?

regression and classification

knowledge processing

representing and deriving facts about the world and using this info in automated systems

what is duplicate data?

rows of data that are exactly the same across all columns; they add to storage and processing costs

what is incorrect data?

self-explanatory; can be hard to spot

what is unsupervised learning?

solves a complementary set of problems that do not require labeled data.

what is kmeans clustering?

- Specify the number of clusters (K) that we wish to cluster the data into. - Often, we wish to find groupings or patterns in our data. - Data points in the same cluster are deemed to be similar under some measure.

speech

speech recognition includes techniques to understand a sequence of words given an acoustic signal

evaluate the current state

•Assess current technical infrastructure •Review existing business functions, activities, roles of existing stakeholders and technology implementations

establish a data governance structure

•Clearly assign data-related responsibilities, oversight and ownership of tasks •Encourage adoption of new technology and processes •Standardize data processes across the organization

what are the reasons for underfitting?

•Data used for training is not cleaned and contains noise (garbage values) in it. •The model has a high bias. •The size of the training dataset used is not enough. •The model is too simple.

what are the reasons for overfitting?

•Data used for training is not cleaned and contains noise (garbage values) in it. •The model has a high variance. •The size of the training dataset used is not enough. •The model is too complex.

why do we need to construct multiple versions of the model?

•Testing •User experience research •Change in market environment •Model update

what are some ways to tackle overfitting?

•Using K-fold cross-validation. •Using regularization techniques such as Lasso and Ridge. •Training model with sufficient data. •Adopting ensembling techniques.
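
A minimal sketch combining two of these fixes, Ridge regularization scored with K-fold cross-validation (toy data generated purely for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=50)

model = Ridge(alpha=1.0)                     # regularization shrinks coefficients
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
print(scores.mean())
```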

what does the deployment of the model in production on a technical level usually involve?

•an API endpoint gateway •a load balancer •a cluster of virtual machines •a service layer •persistent data storage (Database) •the model itself.

define a vision

•for the goals in data analytics domain that your organization should achieve in the short and long term •This vision statement acts as a guideline for the next steps

skillset for a data scientist

●A deep knowledge of machine learning algorithms. ●Proficiency with statistics and probabilistic reasoning. ●Proficiency with python, R and other computer languages used for machine learning. ●Proficiency with various machine learning frameworks such as scikit-learn. ●Pros and cons of existing machine learning techniques. ●A deep knowledge of AI literature, algorithms and how existing machine learning techniques can be adapted to the problem at hand. ●Ability to work at the interface of computer science, mathematics and machine learning.

skillset for computer scientist

●A detailed understanding of computer architecture. ●In-depth knowledge of operating systems, the functionality they provide and the hardware-software interface. ●Ability to program computer systems using various programming languages and create new software products. ●Develop underlying computer concepts on which data engineers can build: for example, development of new concepts for efficient data storage. ●Development of underlying concepts and tools on which data scientists can build and deploy.

skillset for data engineer

●Extensive knowledge of database concepts. ●Extensive knowledge of various types of fault tolerant architectures used in database design. ●Detailed knowledge of data layouts on various types of data storage systems and how they store information. ●An understanding of various performance metrics to make efficient use of these layouts. ●Deep understanding of how databases are accessed on a computer network. ●In contrast to a data scientist who requires detailed knowledge of numeric programming and machine learning algorithms, a data engineer requires proficiency with database languages such as SQL.

what are the applications of probability to ML?

●Just like calculus and matrix theory, probability theory is one of the main pillars on which machine learning and AI rest. 1. Many classification algorithms, such as Naive Bayes, are based entirely on probability. 2. Many machine learning algorithms, such as logistic regression, incorporate probability ideas as part of their inference. 3. Machine learning algorithms such as decision trees use probabilistic ideas under the hood. 4. Many AI techniques for Natural Language Processing (NLP) and speech recognition are based on probability; examples include parts-of-speech (POS) tagging using Hidden Markov Models (HMMs). 5. Bayesian Networks, which are based entirely on probability ideas, are a well-known AI technique used for decision making in numerous business applications.

