IS - ML

accuracy is the

proportion of correct classifications

error rate is the

proportion of misclassifications

bias node value is always

1

output in naive bayes

assign the class with the maximal probability to the new example

bayesian learning (Naive bayes classifier) steps

- Decide on features - do pre-processing: lowercase words, remove punctuation, stem, remove stop words... - Count occurrences (can use tf-idf) - compute probabilities (apply smoothing if needed) - apply Naive Bayes to find the class with the maximal probability
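
A minimal sketch of these steps in Python, assuming scikit-learn is available; the toy sentences, labels and the new example are made up for illustration.

```python
# Hypothetical toy data: sentences with known classes ("spam" / "ham").
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["win a free prize now", "meeting moved to monday", "free prize waiting"]
train_labels = ["spam", "ham", "spam"]

# Count word occurrences (the vectorizer lowercases words by default).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Multinomial Naive Bayes; alpha=1.0 applies Laplace smoothing.
model = MultinomialNB(alpha=1.0)
model.fit(X_train, train_labels)

# Assign the class with the maximal probability to a new sentence.
X_new = vectorizer.transform(["free meeting prize"])
print(model.predict(X_new))
```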

Preprocessing in ML consists of

- Deciding on features - splitting data in training and testing parts

TN

I predicted negative and it was actually negative

classifier learns relations between

labels and features

how to evaluate ranking algorithm

precision @ k

Limitations of AI

- Limited ability to learn and adapt to new situations without instructions. - Difficulty analyzing and interpreting unstructured data, such as text or images. - If the data used to train an AI system is biased, the system will also be biased. - Limited creativity. - Lack of common sense and lack of understanding/grounding (the ability to connect concepts (red) to objects (rose) without instructions)

difference: validation set and test set

- Validation sets are used DURING TRAINING to tune the model's parameters and to see how the training process is going so far; - test sets are used AFTER training to evaluate the FINAL performance of the model on UNSEEN data.

unsupervised learning meaning

- When an AI system can look at data on its own and build rules; - NO CORRECT ANSWERS (no previous info) GIVEN!

autonomous weapons - problems

- can be used in an unethical way or against the law - thus, we need laws governing how they are developed and used - if an autonomous weapon causes harm or damage, it is not clear who would be responsible for the consequences

why is privacy a huge concern in AI?

- collects personal data - might share it with third parties (for example companies) - may keep data forever - AI systems may be vulnerable to cyber attacks and information leaks

how do you convert text into a text feature vector?

- do preprocessing (lowercase the words, remove punctuation, stem, remove stop words) - COUNT OCCURRENCES (e.g. with TF-IDF)

K-means clustering algorithm

- it is an unsupervised learning method, so it does not require labeled data. It identifies patterns and groups similar data points into k groups BASED ON THE MEAN DISTANCE BETWEEN DATA POINTS AND THE CENTROID OF THE GROUP.
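
A minimal k-means sketch with scikit-learn; the 2-D points below are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])

# k (n_clusters) is an input parameter we must choose ourselves.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # centroid (mean) of each cluster
```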

Libet experiment - 2 conclusions

- our conscious intention for an action is NOT the cause of that action - brain activity came first, before the individual WILLED anything to happen - the brain knew when you would click before you consciously did!

algorithms/methods for classification:

1. Artificial neural networks 2. linear classification 3. Bayes classifier 4. KNN (k-nearest neighbors) 5. decision trees

ML Steps

1. Choose the features 2. Choose the model class 3. Choose a search method

Asimov's Laws

1. DO NOT INJURE - A robot may not injure a human being or allow a human being to be harmed. 2. OBEY ORDERS - A robot must obey orders given to it by human beings except where such orders would conflict with the First Law. 3. PROTECT ITSELF - A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

types of supervised learning

1. classification- assigning CLASS to each example 2. regression - assigning NUMBER to each example 3. collaborative filtering - recommendations 4. ranking

silhouette score: what is it, what does it measure, what is used to calculate it

1. For a point x it compares a(x), the mean distance between x and all other points of the same cluster, with b(x), the mean distance between x and the points of the nearest other cluster: s(x) = (b - a) / max(a, b). 2. It measures how well-separated the clusters produced by a clustering algorithm are. 3. It is calculated using intra-cluster distance and inter-cluster distance.
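
A minimal sketch of computing the silhouette score for a clustering result, assuming scikit-learn; the data points are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.1], [7.9, 8.3]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Based on intra-cluster distance a(x) and nearest-cluster distance b(x):
# s(x) = (b - a) / max(a, b), averaged over all points. Range: -1 to 1.
print(silhouette_score(X, labels))
```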

A good clustering will produce high quality clusters in which:

1. the intra-cluster similarity is high (similarity of the points in one cluster) 2. the inter-cluster similarity is low (similarity of the points between clusters)

orthogonal

90 degree angle

what could a feature vector look like?

<1,1,0,1,0>

K-nearest neighbor classifier

A SUPERVISED learning technique that classifies a new observation by finding similarities ("nearness"/PROXIMITY) between this new observation and the existing data: it finds the most similar case(s) and copies the (majority) label
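
A minimal k-nearest-neighbors sketch with scikit-learn; the feature vectors and labels below are invented for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0, 0], [0, 1], [5, 5], [6, 5]]
y_train = ["A", "A", "B", "B"]

# k = 3: the 3 nearest training points vote on the label of a new point.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.predict([[5, 6]]))  # -> ['B']
```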

Silhouette Coefficient

A measure between -1 and 1 used to evaluate clustering algorithms; higher values indicate that a point is well clustered, and vice versa

Design Matrix

A tool used to compare design solutions against one another using specific criteria - for example, when we are deciding whether to use a 2D, 3D or 4D space to represent a problem

ML is subset of .. and ...

AI and Data science

Feedforward Neural Network

a type of artificial neural network. - no loops - information flows ONLY forward

Bayes' rule vs naive Bayes classifier

Bayes' rule is a general math formula, while naive Bayes is a specific ML classification algorithm that is based on Bayes' rule.

ranking and example

COMPARING ITEMS and ranking them based on keywords example: web search, image search

Searle's Chinese Room argument

The computer is just executing the program and cannot actually "understand" what it is doing, regardless of how "intelligently" it seems to act.

confusion matrix

a way to evaluate a classification model. It is a table of true positives, false positives, true negatives and false negatives
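
A minimal sketch of building a confusion matrix and deriving precision and recall from it, assuming scikit-learn; the labels are made up for illustration.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1]

# Rows: actual class, columns: predicted class -> [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
```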

recall minimizes

FN (false negatives)

precision minimizes

FP

domain in rdfs is where the line is pointing

FROM

vagueness vs uncertainty: what is the difference

For example, the concept of "tallness" is vague because it is difficult to say exactly how tall someone or something must be to be considered tall; uncertainty, by contrast, is a lack of knowledge/confidence about a situation

FN

I predicted negative and it was actually positive

TP

I predicted positive and it was actually true (positive)

FP

I predicted positive, but it was actually false

stemming

In keyword searching, word endings are automatically removed (lines -> line); you replace a word with its stem

Deep learning is a subset of

ML

bayesian learning aka

Naive bayes classifier

Ethics and law: connections

Not all unethical behaviour is illegal; not all illegal behaviour is unethical; not all legal behaviour is ethical

give me an example of a vector in a feature space:

Number of positive words; length of the sentence; etc..

Binary classification

T/F; positive/negative values

range in rdfs is where the line is pointing

TO

Tf-idf =

Tf * idf
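
A minimal tf-idf sketch in plain Python; the three "documents" are made up, and this uses one common idf variant (log of the number of documents divided by the document frequency).

```python
import math

docs = [["free", "prize", "free"], ["meeting", "monday"], ["free", "meeting"]]

def tf(term, doc):
    return doc.count(term) / len(doc)       # term frequency within one document

def idf(term, docs):
    df = sum(1 for d in docs if term in d)  # number of documents containing the term
    return math.log(len(docs) / df)         # rarer terms get a higher idf

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)  # tf-idf = tf * idf

print(tf_idf("free", docs[0], docs))
print(tf_idf("meeting", docs[0], docs))
```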

Veil of Ignorance

The Veil of Ignorance (Rawls) shows us that ignorance is not always detrimental to a society: when we design society's rules without knowing which position we ourselves will hold, the rules tend to be fairer - sometimes it is better not to know.

feature set

The group of features your machine learning model trains on. For example, postal code, property size, and property condition might comprise a feature set for a model that predicts housing prices.

what is Syllabisation in natural language processing?

dividing a word into its individual syllables

Clustering and examples

discovering structure in data and grouping items; example: when you go to "Photos" and type "sunset" and get all the pictures of sunsets

Turing Test

a test of a computer's intelligence - if a human can't tell whether they are talking to a machine or another human, the machine is considered intelligent

End-to-end evaluation means that we evaluate the performance of an algorithm on .... , from ... to ...

a complete dataset, from the initial state to the final results

maximum margin hyperplane

a hyperplane that separates two classes of data points in a multi-dimensional space with the maximum possible distance between the hyperplane and the nearest data points of each class.
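
The card doesn't name a specific algorithm; a linear support-vector machine (SVM) is one standard method that finds such a maximum-margin hyperplane. A minimal sketch with scikit-learn, using made-up 2-D points:

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [4, 5], [5, 4]]
y = [0, 0, 1, 1]

clf = SVC(kernel="linear")        # maximizes the margin between the two classes
clf.fit(X, y)
print(clf.coef_, clf.intercept_)  # parameters of the separating hyperplane
print(clf.predict([[4, 4]]))
```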

feature space and example

a space in which each dimension represents a feature that characterizes your data. For example, if your data is about people, your feature space might be Gender, Height, Weight, Age

"label" or "class" in a training set in classification is

an attribute that we want to teach ML to determine

ANN

artificial neural network

Classification and example

assign a CLASS aka LABEL to each example; for example: 1. a program which converts handwriting to digital text 2. face recognition 3. autonomous driving

(linear) regression and example

assign a NUMBER to each example; example: stock market

"features" in a training set in classification are

attributes that ML can learn from

sequential sampling aka

autoregressive sampling

why the Naive Bayes algorithm is called naive?

because it assumes conditional independence of all the features used for classification

hamming distance formula

number of differing bits / total number of bits

Hyperplane

boundary that separates different classes or categories in a multi-dimensional space.

in naive Bayes, the formula we use is

chaining
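
Written out (a standard formulation; the symbols c for the class and x_1..x_n for the features are assumed here, not taken from the card), the chained product under the naive independence assumption is:

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c)\prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} = \arg\max_{c} \; P(c)\prod_{i=1}^{n} P(x_i \mid c)
```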

unsupervised learning type

clustering: grouping items WITHOUT training data

what is Discretisation in natural language processing? + example

converting continuous data into discrete (categorical) data: for example, when you have people with different heights, you create categories: people with height 150-160, then 160-170, etc.

K-nearest neighbor classifier is represented in

a coordinate plane, also called the FEATURE space (because the points in it are feature vectors)

Supervised Learning

correct answers (previous information) are given for each example!

we don't use smoothing in

cosine similarity, Hamming distance and Euclidean distance

we can find KNN with

cosine similarity, Hamming distance or Euclidean distance
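
A minimal sketch of the three similarity/distance measures for binary feature vectors; the two example vectors are made up for illustration.

```python
import math

a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]

def hamming(a, b):
    # proportion of positions in which the vectors differ (as in the card above)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def euclidean(a, b):
    # square root of the sum of squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    # dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(hamming(a, b), euclidean(a, b), cosine(a, b))
# smaller Hamming/Euclidean distance = closer neighbour; larger cosine = closer neighbour
```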

Euclidean distance formula in KNN - in your own words, what do you need to do?

for binary feature vectors: count the number of positions in which the two sentences differ, then take the square root of that count

solution to avoid overfitting

cross validation

preprocessing steps in bayesian learning

data preparation - lowercase words, remove punctuation, do stemming, remove stop words

input in naive bayes

data which belongs to certain classes + one set of data with unknown class

Part of Speech Tagging

defining verbs, nouns, etc. in a sentence

Rule-based reasoning is just a

different way of proving entailment

regression distance (error value) is

distance from the datapoint to the regression line

the trolley problem is used to

explore ethical dilemmas and decision-making

classifier divides the

feature space

we use Naive Bayes to

find class with maximal probability

Libet experiment findings leads us to argue that

free will is just an illusion, and our decisions are more like a "report" of what is already happening in the brain

When we use Naive Bayes to detect spam, the algorithm discards the order of the words and considers only their

frequency

silhouette score interval + what do the values mean?

from -1 to 1. The higher the number, the better the clustering

sci-fi ethics

future ethics

the ... the number for precision or recall, the better

higher

Validation/development set aka

hold out set

a weird way machines fool people in the Turing test

humans associate intelligence during the Turing test with many things which are not actually signs of intelligence, for example using " ;) ", filler words and many exclamation marks

sentiment analysis

identifying an attitude based on text

Named entity recognition

identifying and labelling named entities (names of people, places, organisations, etc.) in text

Alan Turing theory

if a computer can convince/fool a person that they are interacting with another human being, the computer is intelligent and able to "think"

what is a hyperplane in 2D space and 3D space?

in 2D, it is a 1D line that separates different categories. In 3D, it is a 2D plane

what is neighborhood distance?

in KNN it is a parameter that determines how far from a given point the algorithm should look for other data points aka "neighbors" for prediction

cosine similarity vs Hamming Distance vs euclidean distance: main difference

in cosine similarity, we want to MAXIMIZE the number because the bigger the similarity, the closer the neighbor. For Euclidean and Hamming distance, we want to MINIMIZE the number because it represents a distance: when the distance is smaller, the examples are more similar

"implication is transative" means that

in relationship btw 3 statements: if A -> B and B -> C, then A -> C

layers of neural networks

input layer, hidden layer, output layer

Turing test - what is the argument based on which we can argue its quality?

it uses human intelligence as the standard for intelligence; incorrectly assumes that all intelligence will be similar to human intelligence

k meaning in k-NN algorithm

k defines how many neighbors will be checked to determine the classification of a new instance

when we have 0 in naive Bayes classifier table, we use

Laplace (also called simple or naive) smoothing
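
A minimal worked example of Laplace (add-one) smoothing for a Naive Bayes word probability; the counts and vocabulary size are made up for illustration.

```python
count_word_in_class = 0   # the word never appears in this class
total_words_in_class = 20
vocabulary_size = 10

# Without smoothing the probability would be 0, which would zero out the whole product.
p_unsmoothed = count_word_in_class / total_words_in_class
p_smoothed = (count_word_in_class + 1) / (total_words_in_class + vocabulary_size)

print(p_unsmoothed)  # 0.0
print(p_smoothed)    # 1 / 30, roughly 0.033
```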

in K nearest neighbor classifier, there is no

learning phase

hyperparameter in ML is a parameter whose value is used to control the ...

learning process

Lemmatization

like stemming, but more accurate (for example, it would reduce "better" to its base form "good")

what is Grounding in natural language processing?

linking words to their meaning in the real world; true understanding of abstract concepts requires sensory experience - we have it; remember the example of connecting the concept "red" to a red rose

gradient descent

a local search method similar to hill-climbing, used (for example on a 3D error surface) to find the point with the lowest error: we move step by step in the direction that decreases the error and check again
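
A minimal 1-D gradient descent sketch; the error function f and the learning rate are made up for illustration.

```python
def f(x):          # error function with its minimum at x = 3
    return (x - 3) ** 2

def grad(x):       # its derivative
    return 2 * (x - 3)

x = 0.0            # starting point
learning_rate = 0.1
for _ in range(50):
    x = x - learning_rate * grad(x)   # move step by step downhill

print(x, f(x))     # x ends up close to 3, the point with the lowest error
```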

sequential sampling (autoregressive sampling) is

longitudinal - data is collected and analyzed over time, and researchers make decisions about what additional data to collect based on their findings from data they've already collected

error function other name

loss function

when you have a complex classification problem and need to fit it with a linear model,

modify the FEATURES so that you could still draw a linear model

the smaller the Euclidean distance, the ...

nearer the neighbour is

collaborative filtering example

Netflix and Spotify recommendations

"K" in k-means clustering represents

number of clusters

cosine similarity formula in your own words when using it in KNN

the number of positions where both A = 1 and B = 1, divided by (the square root of the number of positions where A = 1, times the square root of the number of positions where B = 1)

reinforcement learning has

occasional rewards

overfitting

occurs when a ML model matches the training data so closely that the model fails to make correct predictions on new data

parsing

grouping words in a sentence into phrases. The way a sentence is parsed determines its meaning.

Precision @ k formula is the same as

our regular precision formula

Classification is about ... and not

predicting the correct CLASS label and not about ACCURATELY estimating probabilities.

Pros and Cons of K-Means

pros: fast, can work with many examples, can work in high-dimensional spaces; cons: 1. K is an input parameter -> estimating the right k is difficult 2. it gets stuck in local minima 3. often no optimal clusters are found 4. outliers might lead to problems

Stochastic - synonym

random

operators in knowledge graphs

RDF and RDFS

ontology-based languages are

RDFS and RDF

error rate aka

residual sum of squares

measure to evaluate clustering algorithm (for instance, k-means clustering) quality

silhouette score (intra- and inter-cluster similarity)

precision

shows the accuracy of our algorithm. It is the ratio of the cases where the algorithm predicted "positive" and was correct to all cases where the algorithm predicted "positive": TP / (TP + FP)

why maximum margin hyperplane is useful?

since the distance between the hyperplane and the nearest data points of each class is maximized, it reduces the likelihood of misclassification.

cross validation

spreading out validation sets over the whole training process

formula to calculate node's value in feedforward neural network

the weighted sum of the incoming node values: sum(incoming node value * weight); an activation function may then be applied to the result
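
A minimal sketch of computing one node's value in a feedforward network: the weighted sum of the incoming values, optionally passed through an activation function. The values, weights and the choice of sigmoid are assumptions for illustration.

```python
import math

incoming_values = [0.5, 1.0, 1.0]   # outputs of the previous layer (last entry: a bias node of 1)
weights = [0.4, -0.2, 0.1]

weighted_sum = sum(v * w for v, w in zip(incoming_values, weights))

def sigmoid(z):                      # one common activation function (an assumption here)
    return 1 / (1 + math.exp(-z))

print(weighted_sum, sigmoid(weighted_sum))
```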

a row of a probability distribution (marginal or not) ALWAYS

sums up to 1

ML categories

supervised, unsupervised, semi-supervised, reinforcement learning

when you create a classifier, you test the new instances with

test set

tf and idf meaning

tf - term frequency; idf - inverse document frequency (how rare the word is across the collection of documents: words that appear in many documents get a low idf)

Naive Bayes can still perform quite well even if

the assumptions behind Naive Bayes are not satisfied

smaller value in Hamming Distance means that

the distance is smaller, they are more similar and it is a closer neighbor

what are the FEATURES when applying naive bayes when training with sentences?

the keywords (words)

what k means in k-fold cross-validation

the number of groups that a given data sample is to be split into.
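
A minimal k-fold cross-validation sketch with scikit-learn (k = 5 here); the tiny dataset and the choice of a KNN model are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[i] for i in range(10)])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# The data is split into k = 5 groups; each group is used once as the
# validation set while the model trains on the remaining 4 groups.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores, scores.mean())
```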

hamming distance meaning

the number of pieces of data you would have to change to convert one dataset (for ex sentence) into another.

Linear classifier

the plane/line that separates the two sets of data

recall

the ratio of the cases where the algorithm PREDICTED "positive" and was correct to all cases where the outcome was actually "positive": TP / (TP + FN)

sigma means

the sum of

regression predicts .. , for ex

the value of a function, for example a house price given a large dataset about previous houses

Pragmatism in ethics rejects the idea that

there is any universal ethical principle

in ML, features are generic means that

they are general, global, designed to be used in different contexts

what we want to represent with ontologies/knowledge graphs?

to define relations; to classify

activation function of neuron is applied

to the output of a neuron in a neural network.

in nearest neighbour classification, the biggest problem is the noise and solution is ...

to use K Nearest neighbour classification

sets in ML:

training set; validation/development set (aka hold-out set); test set

It is more important to have a larger ... set than a larger ... set in order to avoid overfitting.

training, test

resolution rule is the same as

transitive implication

when the machine learning model description involves "auto-labelling" or something similar, we can immediately assume that this is an ... ML algorithm

unsupervised

semi-supervised learning

uses both labeled and unlabeled data to learn the relationships.

when better (higher) score of recall is more important than high score of precision? give an example

when, for example, you want to build a better system for detecting cancer, because you want to minimize the cases of FN (when the system predicts negative but the case is actually positive)

Occam's Razor Principle

when many solutions are available for a given problem, choose the simplest one (using prior knowledge).

Moral Absolutism

when we as a society agree on ethical standards and then apply them across all cultures

activation function of neuron determines

whether the neuron should be activated or not

How to evaluate classification models?

with Confusion matrix (Precision and Recall)

Precision and Recall

with these 2 formulas you can evaluate a classification model

if in the task they say "use boolean feature of word "artificial", it means that

you assign a value of either 0 or 1 / T or F to it in the table

"find precision at 5" means that

you have to look at the FIRST 5 results and see which ones are correct; the value is then (number of correct results) / k
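
A minimal precision-at-k sketch; the relevance labels of the returned results are made up for illustration (1 = correct/relevant, 0 = not).

```python
def precision_at_k(results, k):
    top_k = results[:k]                   # look at the FIRST k results only
    return sum(top_k) / k                 # correct results among the top k, divided by k

ranked_results = [1, 0, 1, 1, 0, 1, 0]    # hypothetical ranking output
print(precision_at_k(ranked_results, 5))  # 3 correct in the top 5 -> 0.6
```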

main problems in AI ethics today

• Autonomous weapons • Privacy • (Racial) profiling

ways to compute similarity between examples based on their features:

• Hamming distance • Euclidean distance • Cosine similarity

