Intro to Data Science and AI Maastricht Lectures 1-6b

What is Exploratory data analysis?

Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics and start discovering interesting things/structure in the data.

What are the ways of measuring distance? Look for how to compute these in lecture 3 slide 46

• Euclidean distance
• Infinity distance
• Absolute (Manhattan) distance
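A minimal sketch of the three distances, assuming the standard textbook definitions (the lecture slide itself is not reproduced here, and "infinity distance" is taken to mean the largest coordinate difference). The function names and sample points are illustrative.

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def infinity(a, b):
    # Largest absolute coordinate difference.
    return max(abs(x - y) for x, y in zip(a, b))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
print(infinity(p, q))   # 4.0
```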

What does PEAS stand for?

Performance measure, Environment, Actuators, Sensors.
• Example: the task of designing an automated taxi driver
- Performance measure: safe, fast, legal, comfortable trip; maximize profits
- Environment: roads, other traffic, pedestrians, customers
- Actuators: steering wheel, accelerator, brake, signal, horn
- Sensors: cameras, sonar, speedometer, GPS, odometer, engine sensors, keyboard

What is the minimax principle?

• Player 1 aims to maximize the minimum payoff (to player 1)
• Player 2 aims to minimize the maximum payoff to player 1
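The principle above can be sketched on a small payoff matrix. The matrix is hypothetical (payoffs to player 1, rows are player 1's strategies, columns are player 2's), and the function names are illustrative.

```python
# Hypothetical payoffs to player 1: rows = player 1, columns = player 2.
payoff = [
    [3, -1, 2],
    [1, 0, 4],
    [-2, -3, 5],
]

def maximin(matrix):
    # Player 1: for each row take the worst case, then pick the best of those.
    row_minima = [min(row) for row in matrix]
    best = max(range(len(matrix)), key=lambda i: row_minima[i])
    return best, row_minima[best]

def minimax(matrix):
    # Player 2: for each column take player 1's best case, then pick the smallest.
    col_maxima = [max(row[j] for row in matrix) for j in range(len(matrix[0]))]
    best = min(range(len(col_maxima)), key=lambda j: col_maxima[j])
    return best, col_maxima[best]

print(maximin(payoff))  # (1, 0): row index 1 guarantees player 1 at least 0
print(minimax(payoff))  # (1, 0): column index 1 holds player 1 to at most 0
```

In this particular matrix the two values coincide, so neither player gains by deviating; that does not hold for every matrix.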

Exam Checklist 1b

Turing test What? Why? Pros/Cons Thinking/Acting Humanly/Rationally Differences between these approaches Examples of what current AI can/can't do? Chapter 1 of Russell and Norvig: Artificial Intelligence: A Modern Approach, Prentice Hall, 2010, third Edition

What is the PageRank algorithm?

- Algorithm gives each webpage returned from the keyword search a weight between 0 and 1. - The higher the weight given to the page, the more likely it is that this page will be displayed first to you.

What are the 3 types of learning?

1. Supervised learning 2. Unsupervised learning 3. Reinforcement learning

What are the steps in the data science process (the process of transforming raw data into usable information and knowledge)?

1. Ask an interesting question (identify a data science problem from the real world)
2. Gather the data
3. Explore the data
4. Model the data (4.1: use the model for inference, interpolation, extrapolation, prediction)
5. Communicate and visualize results
Don't just question your findings, question your data!

What is thinking humanly?

• 1960s "cognitive revolution": information-processing psychology
• Requires scientific theories of the internal activities of the brain
• How to validate? Requires:
1. Predicting and testing the behavior of human subjects (top-down)
2. Direct identification from neurological data (bottom-up)
• Both approaches (Cognitive Science and Cognitive Neuroscience) are now distinct from AI

What is a decision tree?

A decision tree is a tree where: -Each internal node tests an attribute -Each branch corresponds to an attribute value -Each leaf node is labelled with a class (class node)

What is a Term-Document Matrix?

A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms.
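A small sketch of a document-term matrix, with rows as documents and columns as terms as described above. The three documents and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical mini-collection; tokens are split on whitespace.
docs = ["the cat sat", "the dog sat", "the cat saw the dog"]
tokenized = [d.split() for d in docs]

# Columns: the sorted vocabulary. Rows: one per document.
vocab = sorted({term for doc in tokenized for term in doc})
matrix = []
for doc in tokenized:
    counts = Counter(doc)
    matrix.append([counts[term] for term in vocab])

print(vocab)   # ['cat', 'dog', 'sat', 'saw', 'the']
for row in matrix:
    print(row)
# [1, 0, 1, 0, 1]
# [0, 1, 1, 0, 1]
# [1, 1, 0, 1, 2]
```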

Define inference

A logical interpretation based on prior knowledge and experience.(programming)

What is validity and satisfiability?

A sentence is valid if it is true in all models, e.g., True, A ∨ ¬A, A ⇒ A, (A ∧ (A ⇒ B)) ⇒ B.
A sentence is satisfiable if it is true in some model, e.g., A ∨ B, C.
A sentence is unsatisfiable if it is true in no models, e.g., A ∧ ¬A.
(Lecture 4b, slide 24)
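For propositional sentences, these definitions can be checked by enumerating every model (every truth assignment). The sketch below is one way to do that; `classify` and `implies` are illustrative names, and the function returns the strongest label that applies (a valid sentence is also satisfiable).

```python
from itertools import product

def implies(p, q):
    # Material implication: p => q.
    return (not p) or q

def classify(sentence, symbols):
    # Evaluate the sentence in every model over the given symbols.
    results = [
        sentence(**dict(zip(symbols, values)))
        for values in product([True, False], repeat=len(symbols))
    ]
    if all(results):
        return "valid"
    if any(results):
        return "satisfiable"
    return "unsatisfiable"

print(classify(lambda A: A or not A, ["A"]))                                # valid
print(classify(lambda A, B: implies(A and implies(A, B), B), ["A", "B"]))   # valid
print(classify(lambda A, B: A or B, ["A", "B"]))                            # satisfiable
print(classify(lambda A: A and not A, ["A"]))                               # unsatisfiable
```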

What is a payoff table (matrix)?

Collects possible payoffs for all players, given alternative strategies (this is where the prior knowledge about a problem goes in)

What to know/know-how-to-do for the exam: lecture 2. Look deeper into these subjects.

Complex numbers and their representation in the complex plane • Cartesian and polar form - and transformation from one into another • Operations with complex numbers• Solve exercises like those you can find on the Student Portal
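A short sketch of converting between Cartesian and polar form with Python's standard `cmath` module. The number z = 1 + i is just an example.

```python
import cmath
import math

z = complex(1, 1)                 # Cartesian form: 1 + 1i

r, phi = cmath.polar(z)           # polar form: modulus r and angle phi (radians)
print(round(r, 4))                # 1.4142  (square root of 2)
print(round(phi / math.pi, 4))    # 0.25    (angle is pi/4)

back = cmath.rect(r, phi)         # polar -> Cartesian again
print(cmath.isclose(back, z))     # True

# Multiplying by i rotates a point by 90 degrees in the complex plane.
print(z * complex(0, 1))          # (-1+1j)
```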

What is data?

Data is unprocessed information, raw facts

What is Data Science?

Data science is the discipline that describes, predicts, and makes causal inferences, based on data (not the discipline that uses machine learning algorithms or other technical tools)

What to know/know-how-to-do for the exam LECTURE 1

• Definition of data, information, and knowledge
• Link between data, information, and knowledge (data -> information -> knowledge)

What is Deterministic vs. Stochastic?

Deterministic: The next state of the environment is completely determined by the current state and the action executed by the agent.
Stochastic: The next state of the environment is not completely determined by the current state and the agent's action; some randomness or uncertainty is involved.

Discrete vs. Continuous

Discrete: A limited number of distinct, clearly defined percepts and actions Continuous: opposite

What is the entropy formula for when you have more than 2 classes?

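E(S) = -p1 log_n(p1) - p2 log_n(p2) - p3 log_n(p3)
(one term per class, where p_i is the proportion of examples in S that belong to class i)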
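A minimal sketch of the multi-class formula: one term -p_i log(p_i) per class. The logarithm base is left as a parameter because the card does not make it clear; base 2 is assumed as the default here. Classes with proportion 0 are skipped, following the usual convention that 0 log 0 counts as 0.

```python
import math

def entropy(proportions, base=2):
    # Sum of -p * log(p) over all classes with non-zero proportion.
    return -sum(p * math.log(p, base) for p in proportions if p > 0)

print(round(entropy([0.5, 0.5]), 3))         # 1.0  (two equally likely classes)
print(round(entropy([0.5, 0.25, 0.25]), 3))  # 1.5  (three classes, base 2)
print(round(entropy([1/3, 1/3, 1/3], base=3), 3))  # 1.0  (uniform, base = number of classes)
```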

What is a postings list?

Each item in the list, which records that a term appeared in a document (and, later, often the positions in the document), is conventionally called a posting. The list is then called a postings list, and all the postings lists taken together are referred to as the postings.
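A sketch of building postings lists for a tiny collection. The documents and document IDs are hypothetical, and each posting here is just a document ID (no positions).

```python
from collections import defaultdict

# Hypothetical collection: document ID -> text.
docs = {
    1: "new home sales",
    2: "home sales rise",
    3: "new rise in sales",
}

# term -> postings list (sorted document IDs containing the term)
index = defaultdict(list)
for doc_id in sorted(docs):
    for term in set(docs[doc_id].split()):
        index[term].append(doc_id)

print(index["home"])   # [1, 2]
print(index["sales"])  # [1, 2, 3]
print(index["in"])     # [3]
```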

What is Nash Equilibrium?

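A combination of strategies, one per player, in which each player does the best for themselves given what the other players do: no player can improve their own payoff by changing strategy alone. The outcome is not necessarily the best for the group (e.g., the prisoner's dilemma).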

What is entailment?

Entailment means that one thing follows from another: KB ╞ α
• Knowledge base KB entails sentence α if and only if α is true in all worlds where KB is true
- E.g., the KB containing "the Giants won" and "the Reds won" entails "the Giants won or the Reds won"
- E.g., x + y = 4 entails 4 = x + y
- E.g., hold_party(helen) ╞ happy(helen)

Episodic vs Sequential

Episodic: The agent's experience is divided into atomic "episodes" (each episode consists of the agent perceiving and then performing a single action), and the choice of action in each episode depends only on the episode itself Sequential: opposite

What is a rational agent?

For each possible percept sequence, a rational agent should select an action that is expected to maximise its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has

What is a Fully Observable environment vs partially observable?

Fully: An agent's sensors give it access to the complete state of the environment at each point in time Partial: An agent's sensors give it access to the partial state of the environment at a point in time

What is the link between angles and motions of points around a circle?

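A point on a circle of radius r at angle θ (measured from the positive x-axis) has coordinates (r cos θ, r sin θ). If the point moves around the circle at constant angular speed ω, then θ = ωt, so its x- and y-coordinates trace out cosine and sine waves over time.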

What is information retrieval?

Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). •web search •e-mail search •searching your laptop •corporate knowledge bases •legal information retrieval

Define Heuristic Search

Heuristic search refers to a search strategy that attempts to optimize a problem by iteratively improving the solution based on a given heuristic function or a cost measure.
• Truncate the game tree (limited search depth)
• Use a (static heuristic) evaluation function at the leaves
• Minimax (with pruning) on the reduced game tree
• Playing is solving a sequence of these game trees

Heuristic Evaluation Function

• Heuristic values must be correlated with the true (game-theoretic) value
• For chess, typically a linear weighted sum of features:
- Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
- e.g., w1 = 9 with f1(s) = (number of white queens) - (number of black queens), etc.
• Reading: Chapter 3 (stop at the start of 3.4.4); Chapter 5, Sections 5.1 and 5.2
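A sketch of a linear weighted-sum evaluation function based on material only. The queen weight of 9 comes from the card; the other weights (rook 5, bishop 3, knight 3, pawn 1) are conventional values assumed for illustration, and the piece-count dictionaries are a hypothetical state representation.

```python
# Queen weight is from the card; the remaining weights are assumed conventional values.
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def evaluate(white_counts, black_counts):
    # Eval(s) = sum of w_i * f_i(s), with f_i = (white pieces of type i) - (black pieces of type i).
    return sum(
        weight * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
        for piece, weight in WEIGHTS.items()
    )

white = {"queen": 1, "rook": 2, "pawn": 6}
black = {"queen": 0, "rook": 2, "pawn": 8}
print(evaluate(white, black))  # 7  (9*1 + 5*0 + 1*(6 - 8))
```

A positive value favours White, a negative value favours Black.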

What questions should you ask for exploratory data analysis?

How are the data distributed/organized? (visualize your data!)- Basic statistics (mean, median, range, standard deviation, ...) - Outliers? -Anomalies? -Missing values? - Discover interesting things about the data/discover structures (unsupervised learning; clustering, principal component analysis, etc.)

What is data mining?

Identifying patterns in data

Define Depth first search

In depth first search we go as deep as we can with each path, until we reach a leaf, then we backtrack up the tree
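A sketch of depth-first search on a small hypothetical tree (node labels A to K; the exact shape is an assumption). An explicit stack is used, so the most recently discovered node is always expanded first.

```python
# Hypothetical tree: parent -> list of children (leaves have no entry).
TREE = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": ["G", "H"],
    "F": ["I", "J", "K"],
}

def dfs(tree, root, goal=None):
    # Returns the nodes in the order they are visited; stops early if goal is found.
    stack = [root]
    visited = []
    while stack:
        node = stack.pop()
        visited.append(node)
        if node == goal:
            break
        # Push children in reverse so the leftmost child is expanded first.
        stack.extend(reversed(tree.get(node, [])))
    return visited

print(dfs(TREE, "A"))
# ['A', 'B', 'E', 'F', 'I', 'J', 'K', 'C', 'G', 'H', 'D']
print(dfs(TREE, "A", goal="J"))
# ['A', 'B', 'E', 'F', 'I', 'J']
```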

What is strategic dominance?

In game theory, strategic dominance occurs when one strategy is better than another strategy for one player, no matter how that player's opponents may play

What is information gain?

Information gain is the expected reduction in entropy caused by partitioning the instances from S according to a given attribute. (Lecture 5b, slide 20 for the formula)
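The slide formula is not reproduced on this card; the sketch below assumes the standard definition, Gain(S, A) = E(S) - sum over values v of A of (|S_v| / |S|) * E(S_v), with base-2 entropy. The tiny dataset and its attribute names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    # Base-2 entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    # Entropy before the split minus the weighted entropy of each partition.
    before = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    after = sum(len(part) / len(rows) * entropy(part) for part in partitions.values())
    return before - after

# Hypothetical training examples.
rows = [
    {"outlook": "sunny", "windy": True,  "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rain",  "windy": True,  "play": "yes"},
    {"outlook": "rain",  "windy": False, "play": "yes"},
]
print(round(information_gain(rows, "outlook", "play"), 3))  # 1.0  (perfect split)
print(round(information_gain(rows, "windy", "play"), 3))    # 0.0  (split tells us nothing)
```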

What is information?

Information is commonly thought to be data that has been processed or transformed into a form or structure suitable for use by human beings. Information is considered a property of data, which implies that information cannot exist without data.

What is Knowledge Engineering?

Knowledge engineering is the process of developing knowledge based systems in any field whether it be in the public or private sector, in commerce or in industry Data -> Information -> Knowledge

What is knowledge?

Knowledge is what someone has after understanding information.

What is linear regression?

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
• y is the output of the model, i.e., the observations you have collected
• x is the explanatory variable (or regressor), a quantity the observations depend on
• Linear regression is an approach to modeling the relationship between y and x
• For instance: a model representing the daily temperature in Maastricht (this is the y, the output of the model)
• We assume the temperature depends on some other quantities, such as wind, pressure, or day of the year (these are the x, the input variables)
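A sketch of fitting a line y = slope * x + intercept with one predictor. The card does not name a fitting method; ordinary least squares is assumed here, and the data points are made up.

```python
def fit_line(xs, ys):
    # Ordinary least squares for a single predictor.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]          # hypothetical predictor values
ys = [1, 3, 5, 7, 9]          # hypothetical observations (exactly y = 2x + 1)
slope, intercept = fit_line(xs, ys)
print(slope, intercept)       # 2.0 1.0

# Forecasting / interpolation: predict y for an x that was not observed.
print(slope * 2.5 + intercept)  # 6.0
```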

What is thinking rationally?

Thinking in a way that will help you to directly achieve a goal.
• Aristotle: what are correct arguments/thought processes?
• Greek schools developed various forms of logic: notation and rules of derivation for thoughts
• Direct line through mathematics and philosophy to modern AI
- The logicist tradition in AI hopes to create intelligent systems using logic programming
• Problem: not all intelligent behavior is mediated by logical deliberation

What are the advantages of the Turing test?

• Objective notion of intelligence
• Avoids discussion of internal processes and consciousness
• Eliminates bias in favor of living organisms

What is the goal in information retrieval?

Retrieve documents with information that is relevant to the user's information need and helps the user complete a task

What is the complete search tree terminology?

• Root: node without a parent (A)
• Descendants of a node: child, grandchild, great-grandchild, etc.
• Internal node: node with at least one child (A, B, C, F)
• Leaf node (a.k.a. leaf): node without children (E, I, J, K, G, H, D)
• Ancestors of a node: parent, grandparent, great-grandparent, etc.
• Subtree: tree consisting of a node and its descendants

Single agent vs. multiagent

Single agent: An agent operating by itself in an environment Multiagent: opposite

Static vs. Dynamic

Static: The environment is unchanged while an agent is deliberating. -The environment is semidynamic if the environment itself does not change with the passage of time but the agent's performance score does Dynamic: opposite

What is a learning agent?

Teach them instead of instructing them •i.e., expose the agent to reality rather than trying to write it down -Advantage is the robustness of the program toward initially unknown environments •i.e., when designer lacks omniscience

What is a collection?(information retrieval)

a set of documents •Assume it is a static collection

True or false: If I sum up sine and cosine waves at different frequencies, I can build waveforms of many different shapes • I can use trigonometric functions at different frequencies as basic components to build up a specific function • I can use these basic blocks to "decompose" a function in fundamental components

True True True

Why is linear regression useful?

• Interpretation of the relationship between two quantities: investigate a possible association (which does NOT mean causation!)
• Interpolation of missing data, i.e., of gaps in your measurements
• Forecasting

What is the Turing Test?

A blind test to determine whether someone can tell the difference between talking to a human and talking to a machine. From the programmer's point of view, the goal is for the person taking the test to be unable to tell the two apart.

What is a rational decision maker?

a hypothetical person that will always pick the option they predict will be the best for themselves

define uniform distribution

a type of probability distribution in which all outcomes are equally likely. ... A coin also has a uniform distribution because the probability of getting either heads or tails in a coin toss is the same.

Parking sensors are proximity sensors for road vehicles designed to alert the driver to obstacles while parking. a. Explain what is data and what is information in the example above. b.Explain how a measure of proximity between a road vehicle and an obstacle could be transformed from data into knowledge. c. Express the knowledge generated at the previous point into a rule which could be implemented into a knowledge based system.

a. Data: the distance between the vehicle and an obstacle, as detected by the sensor. Information: the vehicle is getting close to the obstacle.
b. The distance is detected by the sensor (data). If the vehicle gets too close to the obstacle (information), then trigger an alarm (knowledge).
c. If distance < 30 cm -> trigger sound alarm
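The rule from part c can be written as a one-line condition. The 30 cm threshold is the one given in the answer; the function name and the sample readings are illustrative.

```python
ALARM_THRESHOLD_CM = 30  # threshold from the rule: distance < 30 cm

def alarm_should_sound(distance_cm):
    # Rule: if distance < 30 cm -> trigger sound alarm.
    return distance_cm < ALARM_THRESHOLD_CM

for reading in (120, 45, 29):   # hypothetical sensor readings in cm
    print(reading, alarm_should_sound(reading))
# 120 False
# 45 False
# 29 True
```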

What should you do if there is no pattern in your time series data?

create a different representation of the data

What is payoff?

payoff (final outcome): the benefit for a player resulting from the actions or strategies taken by the player, with respect to the strategies of all other players. It could be a negative number, representing a net gain for other players

Define confidence interval

A range of values, computed from a sample, that is expected to contain the true value of the quantity being estimated (e.g., the population mean) with a stated level of confidence, such as 95%.

What is a learning element?

introduces improvements in performance element -Critic provides feedback on agents performance based on fixed performance standard •Design of a learning element is affected by -Which components of the performance element are to be learned -What feedback is available to learn these components -What representation is used for the components

What is a mechanical model?

look it up fool

What are stable vs nonstable solutions in game theory?

not sure, look at lecture 6

What is the performance element?

selecting actions based on percepts - Corresponds to the previous agent programs

What is strategy?

strategy: the set of actions taken by a player

What is a problem generator

suggests actions that will lead to new and informative experiences -Exploration vs. exploitation

Define knowledge base

the underlying set of facts, assumptions, and rules which a computer system has available to solve a problem.

Can all agents be turned into learning agents?

yes

What is (statistical) classification?

• A classification problem occurs when an object needs to be assigned into a predefined group or class, based on a number of observed (quantifiable) attributes/features related to that object. What features? How many features?-> The larger the better?

What is mathematical modeling?

• A mathematical model is a description of a system using mathematical concepts and language
• Mathematical models can be used to model, or represent, how the real world works
• The process of developing a mathematical model is termed mathematical modelling
• Why useful?
- A model can help to better understand the properties and the behaviour of a system (for instance: predator-prey model)
- It can be applied to the system to have some control over it (automatic pilot of airplanes; water pipelines; a pacemaker; etc.)
- It can be used to test hypotheses or make predictions about behaviour when some initial conditions are changed (weather forecast; progression of a certain disease)

How to avoid the curse of dimensionality?

• Add new features as a smart combination of existing features: the kernel trick

What is Bias and Variance Trade-off?

• Bias refers to the error that is introduced by modeling a real life problem (that is usually extremely complicated) by a much simpler problem •Variance refers to how much your estimate would change by if you had a different training data set

What is game theory?

• Game theory is the study of strategic decision-making
• Game theory is a branch of applied mathematics which studies the individual decisions of a subject in situations of strategic interaction (or conflict) with other subjects, where every subject aims for their own largest gain
• It is the study of mathematical models of conflict and cooperation between intelligent rational decision-makers

How does the programming language PROLOG work?

• In PROLOG, a logic programming language we don't need to program the computer to build and search the tree •This is built-in to the interpreter. •Instead we just need to define the problem in a way which it understands. Refer to slide 6 in lecture 4b

What is Fourier analysis?

• In mathematics, Fourier analysis is the study of the way general functions may be represented or approximated by sums of simpler trigonometric functions (sine and cosine) • Time series can be interpreted as "functions" • Then Fourier analysis can be used to "decompose" time series into simple patterns (simple building blocks) • These building blocks tell us something about the oscillatory components / repetitive patterns in a time series!
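A small sketch of "decomposing" a signal into its building blocks with a discrete Fourier transform written out directly (no external libraries). The signal is made up: a sine at 3 cycles plus a weaker cosine at 5 cycles over the sampling window. The 0.1 amplitude cutoff is an arbitrary choice for this example.

```python
import cmath
import math

N = 64  # number of samples in the window
signal = [
    math.sin(2 * math.pi * 3 * n / N) + 0.5 * math.cos(2 * math.pi * 5 * n / N)
    for n in range(N)
]

def dft(x):
    # Direct O(N^2) discrete Fourier transform.
    size = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]

spectrum = dft(signal)
# Amplitude of each frequency component (first half of the spectrum only).
amplitudes = [2 * abs(c) / N for c in spectrum[: N // 2]]
dominant = [k for k, a in enumerate(amplitudes) if a > 0.1]

print(dominant)                    # [3, 5]
print(round(amplitudes[3], 3))     # 1.0
print(round(amplitudes[5], 3))     # 0.5
```

The two recovered frequencies and amplitudes match the components the signal was built from.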

Define Breadth First Search

• Look at the root; if it isn't a solution...
• Look at all children of the root; if no solution...
• Look at all grandchildren of the root; if no solution...
• Etc.
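The level-by-level order above can be sketched with a queue. The tree is hypothetical (node labels A to K; the shape is an assumption).

```python
from collections import deque

# Hypothetical tree: parent -> list of children (leaves have no entry).
TREE = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": ["G", "H"],
    "F": ["I", "J", "K"],
}

def bfs(tree, root, goal=None):
    # Returns the nodes in the order they are visited; stops early if goal is found.
    queue = deque([root])
    visited = []
    while queue:
        node = queue.popleft()          # oldest discovered node first
        visited.append(node)
        if node == goal:
            break
        queue.extend(tree.get(node, []))
    return visited

print(bfs(TREE, "A"))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']
print(bfs(TREE, "A", goal="F"))
# ['A', 'B', 'C', 'D', 'E', 'F']
```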

What is acting rationally?

• Rational behaviour: doing the right thing • The right thing: that which is expected to maximize goal achievement, given the available information • Doesn't necessarily involve thinking - reactive behaviours (reflex action in leg), and automatic behaviours (eg. Blinking)

What is an agent?

•An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators •An agent operates autonomously •Agents include humans, robots, softbots, thermostats, etc. •An agent is an entity that perceives and acts

Data mining involves six common classes of tasks. What are they? know the definitions as well.

•Anomaly detection (outlier/change/deviation detection) • Association rule learning (dependency modelling) • Clustering • Classification •Regression •Summarization

What are the disadvantages of the Turing test?

•Bias toward purely symbolic problem solving task •Constrain machine intelligence to fit human mold-Limited memory-Error prone •It is a distraction

What is a utility based agent?

•Certain goals can be reached in different ways-Some are better, have a higher utility •Utility function maps a (sequence of) state(s) onto a real number •Improves on goals: -Selecting between conflicting goals -Select appropriately between several goals based on likelihood of success

When should you consider decision trees?

• Each instance consists of attributes with discrete values (e.g., outlook = sunny, etc.)
• The classification is over discrete values (e.g., yes/no)
• It is okay for the training data to contain errors: decision trees are robust to classification errors in the training data
• It is okay for the training data to contain missing values: decision trees can be used even if instances have missing attributes

What are the limits of AI? (What can't AI do?)

•Each program is good in its own domain, but it can't do all tasks • Machine Translation • Acting as a judge • Beating humans in soccer• General Game Playing • Converse successfully with another person for an hour

What are the 4 types of agents?

•Four basic types in order of increasing generality: 1.Simple reflex agents 2.Model-based reflex agents 3.Goal-based agents 4.Utility-based agents •All can be turned into learning agents

What is entropy?

• Let S be a sample of training examples, p+ the proportion of positive examples in S, and p- the proportion of negative examples in S.
• Then entropy measures the impurity of S:
• E(S) = -p+ log2(p+) - p- log2(p-)

What are the properties of logarithms

• Definition: log_b(a) = c if a = b^c
• log_b(a·c) = log_b(a) + log_b(c)
• log_b(a/c) = log_b(a) - log_b(c)
• log_b(a) = log_c(a) / log_c(b)
• So: log10(100) = 2; log10(1) = 0; log10(0.001) = -3
• log2(0.5) = -1; log2(8) = 3; log2(0.25) = -2
• log2(0.4) = ln(0.4)/ln(2) (or log10(0.4)/log10(2)) ≈ -1.32
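A quick numerical check of the product, quotient, and change-of-base rules using Python's `math` module. The values of a, c, and b are arbitrary examples.

```python
import math

a, c, b = 8.0, 4.0, 2.0

# Product rule: log_b(a*c) = log_b(a) + log_b(c)
print(math.isclose(math.log(a * c, b), math.log(a, b) + math.log(c, b)))  # True
# Quotient rule: log_b(a/c) = log_b(a) - log_b(c)
print(math.isclose(math.log(a / c, b), math.log(a, b) - math.log(c, b)))  # True
# Change of base: log_2(0.4) = ln(0.4) / ln(2)
print(round(math.log(0.4) / math.log(2), 2))                              # -1.32

print(math.log10(100), math.log2(8))                                      # 2.0 3.0
```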

What is a simple reflex agent?

•Select action on the basis of only the current percept •Large reduction in possible percept/action situations •Implemented through condition‐action(if-then) rules •Will only work if the environment is fully observable otherwise infinite loops may occur

What is bootstrapping with replacement

• Take a sample with replacement from the original data (the original sample), where "with replacement" means that the same data point can be selected more than once
• Compute the statistic
• Repeat the previous steps a large number of times (at least 1000), assuming each sample is drawn independently of the other samples
• Compute the confidence interval from the bootstrap distribution by means of the bootstrap percentile method
Study slides 57 and 58, lecture 5
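The steps above can be sketched as follows. The data values are made up, the statistic is the mean, and the percentile cut-offs use a simple index-rounding rule, which is one of several reasonable conventions; the slides may define the percentiles slightly differently.

```python
import random
import statistics

def bootstrap_ci(data, statistic=statistics.mean, n_resamples=1000,
                 confidence=0.95, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = sorted(
        # Resample with replacement (same size as the data), then compute the statistic.
        statistic(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lower = estimates[round(alpha * n_resamples)]
    upper = estimates[round((1 - alpha) * n_resamples) - 1]
    return lower, upper

data = [2.1, 2.5, 2.8, 3.0, 3.2, 3.6, 3.9, 4.4, 4.8, 5.3]  # hypothetical sample
low, high = bootstrap_ci(data)
print(low, high)  # bounds of the 95% percentile interval for the mean
```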

What is a goal based agent?

•The agent needs a goal to know which situations are desirable- More tricky when long sequences of actions are required to find the goal •Typically investigated in search and planning research •Major difference: future is taken into account •Is more flexible since goals are represented explicitly and can be changed

What are Monte Carlo simulations?

•They are simulations: they need an underlying model • The model is still deterministic, the uncertainty is on the input to the model • Use it to generate statistical knowledge about the output, the model, and its parameters
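A sketch of the idea that the model stays deterministic while the input is uncertain. The model (travel time = distance / speed), the 100 km distance, and the uniform speed range are all hypothetical choices for illustration.

```python
import random
import statistics

def travel_time_hours(distance_km, speed_kmh):
    # Deterministic model: the same inputs always give the same output.
    return distance_km / speed_kmh

def simulate(n_runs=10_000, seed=0):
    rng = random.Random(seed)
    # The uncertainty is on the input: speed is drawn at random for each run.
    return [travel_time_hours(100.0, rng.uniform(40.0, 60.0)) for _ in range(n_runs)]

times = simulate()
# Statistical knowledge about the output of the model.
print(round(statistics.mean(times), 2), round(statistics.stdev(times), 2))
```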

What is a model based reflex agent?

• Tackles partially observable environments by maintaining internal state
• Over time, update the state using world knowledge:
- How does the world change?
- How do actions affect the world?
⇒ Model of the world

Why do we need simulations?

• To understand: model to understand the impact of solar radiation and greenhouse gases on the earth's surface temperature
• To improve: model to reduce the waiting time of patients in a hospital emergency room
• To predict performance/outcomes (to test scenarios): simulation of a potential layout for a new factory
• To guide: simulation of student demand for computers in the laboratory room
• To learn: flight simulators

Adversary Search

• Two (or more) opponents, each trying to maximize their expectations
• Player 1 is called MAX
- Obtain the maximum result
- Minimize that of the opponent
• Player 2 is called MIN
- Obtain the minimum result
- Maximize that of the opponent

What is Logical Equivalence?

• Two sentences are logically equivalent iff they are true in the same models: α ≡ β iff α ╞ β and β ╞ α
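For propositional sentences, equivalence can be checked by comparing truth values in every model. The example sentences below (De Morgan's law, and an implication versus its converse) are illustrative and not taken from the lecture.

```python
from itertools import product

def equivalent(s1, s2, n_symbols):
    # Equivalent iff both sentences have the same truth value in every model.
    return all(
        s1(*values) == s2(*values)
        for values in product([True, False], repeat=n_symbols)
    )

def implies(p, q):
    return (not p) or q

# De Morgan: not (A and B) is equivalent to (not A) or (not B)
print(equivalent(lambda a, b: not (a and b),
                 lambda a, b: (not a) or (not b), 2))   # True
# A => B is NOT equivalent to B => A
print(equivalent(lambda a, b: implies(a, b),
                 lambda a, b: implies(b, a), 2))        # False
```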

What is unstructured data?

•Typically refers to free text •Allows: -Keyword queries including operators -More sophisticated "concept" queries e.g. ,find all web pages dealing with drug abuse •Classic model for searching text documents

What types of problems are suited to data mining?

Problems that:
• require knowledge-based decisions
• have a changing environment
• have sub-optimal current methods
• have accessible, sufficient, and relevant data
• provide a high payoff for the right decisions
Privacy and ethics considerations are important if personal data is involved (e.g., GDPR)!

