wpc 300 final

T-statistics

is used when the population SD is unknown. Good for small samples (n < 30) when the underlying population distribution is normal.
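
A minimal Python sketch of the one-sample t-statistic, t = (x̄ − μ₀) / (s / √n). Because the population SD is unknown, the sample standard deviation s stands in for it. The function name and the eight bottle-fill readings (in oz, echoing the 16 oz drink example used elsewhere in these notes) are made up for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return t = (x_bar - mu0) / (s / sqrt(n)) for a one-sample t-test."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample SD, divides by n - 1
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical fill volumes (oz) from 8 bottles labeled 16 oz.
fills = [15.8, 16.1, 15.6, 15.9, 15.7, 16.0, 15.5, 15.8]
t = one_sample_t(fills, mu0=16.0)
df = len(fills) - 1  # degrees of freedom used to look up the t table
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t is then compared against a t table with n − 1 degrees of freedom.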

Correlation

is a measure of the linear relationship between two variables, X and Y, that does not depend on the units of measurement. Correlation is measured by the correlation coefficient, also known as the Pearson product-moment correlation coefficient. The correlation coefficient is scaled between -1 and 1.
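
A short sketch of the Pearson coefficient computed from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The height/weight pairs are invented; converting heights from centimeters to inches leaves r unchanged, which illustrates the unit independence described above.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 56, 65, 72, 80]
r_cm = pearson_r(heights_cm, weights_kg)

# Same heights in inches: the coefficient does not change.
heights_in = [h / 2.54 for h in heights_cm]
r_in = pearson_r(heights_in, weights_kg)
print(round(r_cm, 4), round(r_in, 4))
```

On Python 3.10+, `statistics.correlation` computes the same coefficient.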

Find an appropriate solution framework

Break the problem down into pieces. Use an iterative process (agile vs. waterfall). Identify appropriate analytical/modeling techniques.

Routinize the procedure

Document the procedure so that similar questions can be solved quickly next time. Build a system (macros, code, programs).

Secondary Data

Firm's proprietary database; Internet data collected with crawlers [Scrapy, Beautiful Soup]; stock/capital market data [Compustat, CRSP]; accounting disclosure data [from 10-K and 10-Q filings].

Mode:

The most frequently occurring value in a data set. Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio).
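
Python's standard `statistics.multimode` (3.8+) returns every most-frequent value, which also shows that the mode works for nominal data such as color names. Both lists are hypothetical.

```python
from statistics import multimode

# Nominal data: the mode is still meaningful.
colors = ["red", "blue", "red", "green", "blue", "red"]
print(multimode(colors))  # ['red']

# Numeric data can have more than one mode (bimodal).
scores = [3, 5, 5, 7, 7, 9]
print(multimode(scores))  # [5, 7]
```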

Tell an interesting and complete story

The problem you address should be meaningful. The solution could be reused for related problems. State the assumptions and boundaries.

Predictive Analytics

- Predictive analytics is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened to providing a best assessment of what will happen in the future. Questions: (1) What will happen next? (2) Why will it happen? Methods: (1) Data mining (2) Text mining (3) Forecasting. Outcome: Accurate projections of future outcomes and events.

Prescriptive Analytics

- Prescriptive analytics answers the question of what to do by providing information on optimal decisions based on the predicted future scenarios. The key to prescriptive analytics is being able to use big data, contextual data, and lots of computing power to produce answers in real time. Questions: (1) What should be done about it? (2) Why should you do it? Methods: (1) Optimization (2) Simulation (3) Expert systems. Outcome: Best possible business decision and outcome.

Descriptive Analytics

- This is a preliminary stage of data processing that creates a summary of historical data to yield useful information and possibly prepare the data for further analysis. Questions: (1) What happened? (2) What is happening? Methods: (1) Standard reporting (2) Dashboards (3) Visual analytics. Outcome: Well-defined business problems and opportunities.

Measures of central tendency

yield information about the center, or middle part, of a group of numbers. Common measures: mean, median, mode, percentiles, and quartiles.

Diagnostic/Explanatory Analytics

- This is about looking into the past and determining why a certain thing happened. This type of analytics usually revolves around working on a dashboard. Questions: (1) Why did it happen? (2) How did it happen? Methods: Inferential statistics, visual analytics. Outcome: Discover/understand the causal relationships behind an outcome.

Estimation

- Using an experiment guarantees that you learn something about what you want to know.

Control

- Using an experiment is the only reliable way to measure the response to changing variables.

Negatives of Analytical Decision Making

Delayed action; lack of flexibility; frustrations in teams.

Null hypothesis

A statement that generally assumes nothing has changed. Example: average amount of drink = 16 oz.

Observational Study example

A study took a random sample of people and examined their social media habits. Each person was classified as either a light, moderate, or heavy social media user. The researchers looked at which groups tended to be happier

Decoy Effect Bias

According to economic theory, we make decisions based on what will have the most utility to us. However, consumers tend to show a specific change in preference between two options when presented with a third option that is asymmetrically dominated (worse than one of the options in every respect, but not worse than the other).

Experimental Study example

Another study took a group of adults and randomly divided them into two groups. One group was told to drink tea every night for a week, while the other group was told not to drink tea that week. Researchers then compared when each group fell asleep.

Mean Absolute Deviation

Average of the absolute deviations from the mean: $\mathrm{MAD} = \frac{1}{N}\sum_{i=1}^{N} \lvert x_i - \mu \rvert$
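
A minimal sketch of the mean absolute deviation, averaging over all values (population form). The function name and data are illustrative.

```python
def mean_absolute_deviation(values):
    """Average of |x - mean| over all values (divides by the number of values)."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)

data = [2, 4, 6, 8, 10]  # mean = 6
mad = mean_absolute_deviation(data)
print(mad)  # 2.4  (|deviations| 4, 2, 0, 2, 4 -> 12 / 5)
```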

Confounder

is an extraneous variable in an observational study that correlates with both the dependent and independent variables. Example: "Regular consumption of organic food keeps you in a good mood." The confounder could be money: you need money to buy organic food, and having money may also be what puts you in a good mood.

p-value

If the p-value is lower than the significance level α, reject the null hypothesis.

Experimental Study

Another study randomly assigned volunteers to one of two groups. One group was directed to use social media sites as they usually do; the other group was blocked from social media sites. The researchers looked at which group tended to be happier.

Zero risk bias

We love certainty, so when making decisions we favor options that eliminate risk entirely, even when a riskier option is better. What would you decide if you were offered the following two options? (1) Bet $10 on a $100 lottery that has a 50% chance of winning. (2) Bet nothing and get a free $10.
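
A quick expected-value comparison shows why preferring the sure $10 can be a bias. This assumes the $10 stake is paid whether or not you win and that $100 is the prize paid on a win:

```latex
\begin{aligned}
E[\text{lottery}] &= 0.5 \times \$100 - \$10 = \$40 \\
E[\text{free}]    &= 1.0 \times \$10 = \$10
\end{aligned}
```

The lottery has four times the expected value, yet zero-risk bias predicts that many people will still take the certain $10.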

quantitative data

Can be counted, measured, and expressed using numbers

Categorical: Ordinal

Categorical data can be on an ordinal scale. Numbers are used to indicate rank or order. The relative magnitude of the numbers is meaningful, but differences between numbers are not comparable. Example: the difference between "strongly agree" and "agree" is not necessarily the same as the difference between "disagree" and "strongly disagree." Another example (rank values): 1 for President, 2 for Vice President, 3 for Plant Manager. You cannot add or subtract ordinal values.

Block

is the arranging of experimental units in groups (blocks) that are similar to one another

Simulated Data

Data based on assumptions and simulation. Used a lot in scheduling, routing, and queuing.

Numerical: Interval

Distances between consecutive integers are equal. The relative magnitude of numbers is meaningful. Differences between numbers are comparable. The location of the origin, "zero," is arbitrary. Data are always numerical. Example: temperature in different rooms of a home. You cannot multiply or divide interval values.

Observational Study example

Effect of drinking tea before bedtime: A study took a random sample of adults and asked them about their bedtime habits. The data showed that people who drank a cup of tea before bedtime were more likely to go to sleep earlier than those who didn't drink tea.

Experimental Study

Establishes causality in a controlled environment (for example, testing a relationship suggested by an observational study). You design an experiment to study a certain effect through intervention, and you plan for the data before you collect it.

Why Experimental Study?

Experiments allow us to set up a direct comparison between the treatments of interest. We can design experiments to minimize any bias in the comparison. We can design experiments so that the error in the comparison is small. We are in control of experiments, and having that control allows us to make stronger inferences about the nature of differences that we see in the experiment. Specifically, we may make inferences about causation.

Data Extraction

Extract data from primary/secondary sources.

Bandwagon Effect

Groupthink: adopting a decision based on the number of people who hold a certain belief. The most famous and commonly cited example of groupthink is how the US Navy treated the threat of a Japanese attack on Pearl Harbor in Hawaii. Everyday example: following a long line to dine in a famous restaurant (think Yelp).

Observational Study

Examines how different parameters in the population behave together, that is, whether or not they move in the same direction. Draws conclusions about correlations. There is no outside intervention during the study; you use the data available to you.

Categorical: Nominal

In nominal measurement the numerical values just "name" the attribute uniquely. A player with number 24 is not more of anything than a player with number 23, and is certainly not better than number 23. Numbers are used to classify (male or female) or categorize (color), and can be stored as "word," "text," or "nominal code." Example: employment classification, with 1 for Educator, 2 for Construction Worker, 3 for Manufacturing Worker. You cannot find the mean or add and subtract nominal values; you can only check whether two values are equal.

Sunk-cost Fallacy Bias

Individuals commit the sunk cost fallacy when they continue a behavior or endeavor as a result of previously invested resources. Examples: "I might as well keep eating because I already bought the food." "I might as well continue dating someone bad for me because I've already invested so much in them."

qualitative data

Is descriptive and conceptual and cannot be measured

Mean:

Is the average of a group of numbers. Not applicable to nominal (categorical) or ordinal data. Affected by every value in the data set, including extreme values. Computed by summing all values in the data set and dividing the sum by the number of values in the data set.
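
A small sketch using `statistics.mean`; the salary figures are hypothetical and chosen to show how a single extreme value pulls the mean.

```python
import statistics

# Hypothetical annual salaries.
salaries = [40_000, 42_000, 45_000, 47_000, 50_000]
mean_before = statistics.mean(salaries)  # 224,000 / 5 = 44,800

# Add one extreme value: every value counts, so the mean jumps.
mean_after = statistics.mean(salaries + [1_000_000])  # 1,224,000 / 6 = 204,000
```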

Analytics

Learns by analyzing. Uses a step-by-step procedure. Values quantitative information and models. Builds mathematical models and algorithms. Seeks the optimal solution.

Heuristics

Learns by acting. Uses trial and error. Values experience and effort reduction. Relies on common sense. Seeks a satisficing solution. Fast and frugal. May lead to decision biases!

Data Load

Load data into the final target database, more specifically an operational data store, data mart, or data warehouse.

median

Middle value in an ordered array of numbers. Applicable to ordinal, interval (quantitative), and ratio data. Example: median housing price in a state. Not applicable to nominal data (why not?). Unaffected by extremely large and extremely small values (how?).
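
To illustrate the robustness claim, this hypothetical sketch replaces the largest housing price with an extreme one. The median depends only on the middle of the sorted list, so it stays put while the mean jumps. The prices are made up.

```python
import statistics

# Hypothetical housing prices.
prices = [180_000, 210_000, 250_000, 300_000, 320_000]
# Same list with the largest value made extreme.
extreme = [180_000, 210_000, 250_000, 300_000, 5_000_000]

median_prices = statistics.median(prices)    # 250,000
median_extreme = statistics.median(extreme)  # still 250,000
mean_prices = statistics.mean(prices)        # 252,000
mean_extreme = statistics.mean(extreme)      # 1,188,000
```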

Alternative hypothesis

The opposite of the null hypothesis; typically your claim. Example: average amount of drink < 16 oz.

Tools for A/B Testing

Optimizely, Visual Website Optimizer (VWO), Adobe Target, Google Content Experiments

Anchoring Bias

Over-reliance on the first piece of information you hear. Most buying decisions are affected by the anchoring effect. What do you think Black Friday sales are driven by? Have you ever wondered why the retail price of a product tends to be $39.99, not $40?

Availability Heuristics

Overestimating the importance of information that is readily available. Example: after you see a movie about a nuclear disaster, you might become convinced that a nuclear war or accident is highly likely. Another example: a person might argue that smoking is not unhealthy because his father, who lived to 100, was a chain smoker who smoked 3 packs a day for 70 years!

Online Survey

Polls are completed only by visitors to the site, so only those with an interest in the website's mission will participate.

Predictive Modeling Applications

Predict water leakage in a city water pipe network; predict when a person might develop depression; predict criminal activity in Los Angeles (LAPD); predict the performance of certain stock portfolios; forecast sales demand; predict whether a customer is likely to buy certain products or services.

Prescriptive Data Modeling

Prescribes the best course of action when making complex decisions involving tradeoffs between business goals and constraints, using optimization technology. It basically uses simulation and optimization to ask "What should a business do?" Prescriptive analytics is a combination of data, mathematical models, and various business rules.

responses

The outcomes that we observe after applying a treatment to an experimental unit

Numerical: Ratio

The ratio scale is very similar to the interval scale, except that it has a true zero point. This scale is commonly used for values that are measured in numbers, such as length, height, weight, or monetary values like cost and revenue. The relative magnitude of numbers is meaningful. Differences between numbers are comparable. The location of the origin, zero, is absolute (natural). Examples: height, weight, and volume; monetary variables such as profit and loss, and revenues.

The endowment effect

is the phenomenon in which most people would demand a considerably higher price for a product that they own than they would be prepared to pay for it (Weber 1993). The endowment effect is a hypothesis that people value a good more once their property right to it has been established.

Non-response

Some individuals are less likely to respond to a survey, e.g., when asked for their opinion about smoking weed.

Primary Data

Surveys and interviews (e.g., a marketing firm's telephone interviews). Used a lot in marketing research.

Principles of Problem Framing

Tell, find and routinize

Clustering illusion

Tendency to see patterns in random events. Related: the gambler's fallacy.

range

The difference between the largest and the smallest values in a set of data. Simple to compute. Ignores all data points except the two extremes.
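
A sketch showing that the range looks only at the two extremes: both made-up lists below share the same minimum and maximum, so they have the same range even though their middle values differ.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

clustered = [10, 50, 50, 50, 90]  # middle values bunched together
spread = [10, 20, 50, 80, 90]     # middle values spread out
print(data_range(clustered), data_range(spread))  # 80 80
```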

Analytics

is the process of developing actionable decisions or recommendations for actions based on insights generated from historical data

confidence level

is the proportion of samples that will yield a confidence interval that actually contains the population mean.
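
The meaning of the confidence level can be checked by simulation. This sketch assumes normally distributed data with a known σ (so it uses z-intervals); it draws many samples, builds a 95% interval from each, and counts how often the interval captures the true mean. The parameter values and seed are arbitrary choices for illustration.

```python
import random
import statistics
from statistics import NormalDist

def simulated_coverage(trials=2000, n=50, mu=100.0, sigma=15.0, level=0.95, seed=1):
    """Share of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(simulated_coverage())  # close to 0.95
```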

Experimental units

The things on which the experiment is done. Example: students.

Sample Study

This is mostly done when you want to estimate the parameters of the population. Inferential statistics enables us to estimate such parameters. Make sure the sample is representative of the population before analyzing it.

Overconfidence

Too confident about your ability, especially when you are considered an expert in your field. Example: A person who is convinced he is going to get into Harvard and who only applies to Harvard. In this case, the overconfidence of the person could result in him not getting into any schools if Harvard rejects him.

Data Transformation

Transform / clean data into proper format or structure for the purpose of querying & analysis

Data Association

Two variables have a strong statistical relationship with one another if they appear to move together. When two variables appear to be related, you might suspect a cause-and-effect relationship. Sometimes, however, statistical relationships exist even though a change in one variable is not caused by a change in the other.

Typical sampling mistake

Unrepresentative sample; biased respondents; low response rate (non-response bias) or small sample size; biased questions.

Prescriptive Data Modeling Application

Used in producing credit scores, which help financial institutions decide the probability that a customer pays credit bills on time; asset management in utility companies; optimizing operating conditions to maximize production and minimize risk; better utilizing capital, personnel, equipment, vehicles, and facilities.

Social Desirability

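Respondents tend to give answers that make them look good to others. Example: you want to study what factors lead to academic dishonesty. Who will be participating, and will they answer honestly?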

Decision Making Biases

We tend to believe or seek out information that preserves our own opinions or beliefs. This can cause a gap between how we reason and how we should reason, which leads us to make bad decisions. Remember: we make better decisions by using critical thinking and being a bit analytical.

Planning an experiment

You have to decide: what measurement to make (the response), what conditions to study (the treatments), and what experimental material to use (the units).

Process of A/B Testing

You take a webpage or app screen and modify it to create a second version of the same page. The change should be limited to a single change (for example, moving the "sign in" tab from left to right). Use a script to randomly show half of your visitors the original version of the page (known as the control) and the other half the modified version of the page (the variation).
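
One common way to implement the random 50/50 split is to hash a visitor ID, so that assignment looks random across visitors but each visitor sees the same version on every visit. This is a hedged sketch: the experiment name and visitor IDs are hypothetical, and dedicated A/B testing tools normally handle assignment for you.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str = "signin-tab-position") -> str:
    """Return 'control' or 'variation' for a visitor, split roughly 50/50."""
    # Hash experiment + visitor so the split looks random across visitors
    # but a returning visitor always gets the same version.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).digest()
    return "control" if digest[0] % 2 == 0 else "variation"

assignments = [assign_variant(f"visitor-{i}") for i in range(10_000)]
share_variation = assignments.count("variation") / len(assignments)
print(f"{share_variation:.1%} of visitors saw the variation")  # roughly 50%
```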

Sample

a portion of the whole population; a subset of the population that must be large enough to represent the whole

Measurement units

actual objects on which the response is measured

Variance and Standard Deviation

Variance: the average of the squared deviations from the arithmetic mean. Standard deviation: the square root of the variance.
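
Python's `statistics` module provides both versions. The definition above is the population variance (divide by N); the sample variance divides by n − 1 instead. The data set is illustrative.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean = 5, sum of squared deviations = 32

pop_var = statistics.pvariance(data)  # 32 / 8 = 4
pop_sd = statistics.pstdev(data)      # sqrt(4) = 2.0
samp_var = statistics.variance(data)  # 32 / 7, about 4.571
samp_sd = statistics.stdev(data)      # about 2.138
```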

The framing effect

is an example of cognitive bias, in which people react to a particular choice in different ways depending on how it is presented; e.g. as a loss or as a gain.

3 principles of describing data

center, spread, shape

factors

combine to form treatments. The individual settings for each factor are called levels of the factor.
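
A sketch of how factors and their levels combine into treatments: every combination of one level per factor is one treatment. The factor names and levels are hypothetical, loosely based on the A/B testing example in these notes.

```python
from itertools import product

# Hypothetical factors, each with its levels.
factors = {
    "button_color": ["blue", "green"],          # 2 levels
    "signin_position": ["left", "right"],       # 2 levels
    "headline": ["short", "long", "question"],  # 3 levels
}

# One treatment = one level chosen for every factor: 2 * 2 * 3 = 12 treatments.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(treatments))  # 12
```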

Measures of Shape

describe the skewness of a set of data
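
One common measure of shape is the moment coefficient of skewness, g₁ = m₃ / m₂^(3/2), where m_k is the k-th central moment. This sketch implements that one definition (other texts use Pearson's coefficient or sample-adjusted versions): positive values indicate a longer right tail, negative values a longer left tail. The data sets are made up.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = skewness([1, 2, 3, 4, 5])       # 0.0
right_skewed = skewness([1, 1, 1, 2, 10])   # positive: long right tail
left_skewed = skewness([0, 9, 9, 9, 10])    # negative: long left tail
```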

Treatments

different procedures we want to compare

Census

gathering data from the entire population

Covariance

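is a measure of the linear association between two variables, X and Y. Like the variance, different formulas are used for populations and samples. Population covariance: $\sigma_{XY} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_X)(y_i - \mu_Y)$; sample covariance: $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$.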
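
A sketch computing both covariance forms (the population form divides by N, the sample form by n − 1) on made-up data. Unlike correlation, covariance depends on the units of measurement: scaling X by 10 scales the covariance by 10.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired data: divide by n - 1 (sample) or N (population)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    total = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
cov_sample = covariance(xs, ys)                      # 6 / 4 = 1.5
cov_population = covariance(xs, ys, sample=False)    # 6 / 5 = 1.2
cov_rescaled = covariance([10 * x for x in xs], ys)  # X in different units: 15.0
```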

A/B testing

is a method of comparing two versions of a webpage or app against each other to determine which one performs better. Two or more variants of a page are shown to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal.
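
A hedged sketch of the statistical analysis step: a two-sided, two-proportion z-test on conversion counts using a pooled standard error. The visitor and conversion counts are invented, and this is one common choice of test, not necessarily the one any particular A/B testing tool uses.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # rate assuming no difference (H0)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: control converts 200 of 5,000; variation 250 of 5,000.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts the p-value is below 0.05, so the variation's higher conversion rate would be judged statistically significant at that level.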

placebo

is a null treatment that is used when the act of applying any treatment has an effect

Random sampling

is a sampling technique in which each member of the population has an equal probability of being chosen. A randomly chosen sample is meant to be an unbiased representation of the total population. An unbiased random sample is important for drawing conclusions about the population.
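
Python's `random.sample` draws a simple random sample without replacement, giving each member of the population the same chance of selection. The ID list and seed are illustrative; the seed is fixed only to make the example reproducible.

```python
import random

population = list(range(1, 1001))  # hypothetical member IDs 1..1000
rng = random.Random(2024)          # fixed seed for reproducibility only
sample = rng.sample(population, k=50)  # 50 distinct members, chosen at random
```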

confidence interval

is a range of values (based on the sample mean, the sample size, and either the sample or the population standard deviation) that is likely to contain the population mean
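
A sketch of a confidence interval for the mean using a z critical value from `statistics.NormalDist`. It assumes the standard deviation is known or the sample is large; for a small sample with unknown σ, a t critical value would replace z. The inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sd, n, level=0.95):
    """Return (low, high) = x_bar -/+ z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical: 100 bottles, sample mean 15.9 oz, SD 0.5 oz.
low, high = z_confidence_interval(x_bar=15.9, sd=0.5, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # about (15.802, 15.998)
```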

spurious correlation

is a relationship between two variables that appear to be interdependent or associated with each other but actually are not (for example, the association arises from coincidence or a hidden third variable)

Experimental Design

is the process of planning a study to meet specified objectives. Planning an experiment properly is very important in order to ensure that the right type of data and a sufficient sample size and power are available to answer the research questions of interest as clearly and efficiently as possible.

Blinding

occurs when the evaluators of the response do not know which treatment was given to which unit

experimental error

random variation present in all experimental results

Inferential statistics

(study sample data) Estimate uncertainty (using probability) and use some members of the data to infer about the population data. You don't have access to the entire population, so you randomly select a sample.

Z-statistics

is good for larger samples (n > 30), whether or not the underlying population distribution is normal.
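
A sketch of a one-sided z-test for the drink example in these notes (H₀: μ = 16 oz vs. H₁: μ < 16 oz). With a large sample (n = 40 here), the sample standard deviation is used as an estimate of σ. The readings are made up, and the p-value is compared with α to decide whether to reject H₀.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def one_sample_z_test_lower(sample, mu0, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 against H1: mu < mu0."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    p_value = NormalDist().cdf(z)        # lower-tail probability
    return z, p_value, p_value < alpha   # True means reject H0

# Hypothetical fill volumes (oz) from 40 bottles.
fills = [15.5] * 10 + [16.0] * 20 + [16.3] * 10
z, p, reject = one_sample_z_test_lower(fills, mu0=16.0)
print(f"z = {z:.2f}, p = {p:.3f}, reject H0: {reject}")
```

Here p ≈ 0.14 > 0.05, so this made-up sample does not justify rejecting the null hypothesis.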

Statistics

is the science concerned with developing and studying methods for collecting, analyzing, interpreting, and presenting empirical data to assist in making effective decisions

low kurtosis

tend to have a flat top near the mean rather than a sharp peak.

Descriptive statistics

(study data in its entirety) Three principles of describing data: center, spread, and shape.

high kurtosis

tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails.
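
Both kurtosis cards can be made concrete with excess kurtosis, m₄ / m₂² − 3, which is 0 for a normal distribution, negative for flat-topped data, and positive for peaked, heavy-tailed data. This sketch uses the population moment form; the two data sets are invented.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = excess_kurtosis(list(range(1, 11)))       # evenly spread: negative (low kurtosis)
peaked = excess_kurtosis([-10] + [0] * 8 + [10])  # sharp peak, heavy tails: 2.0
```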

Population

the whole: a collection of persons, objects, or items under study

Inferential statistics

use a random sample of data taken from a population to describe and make inferences about the population.

Randomization

use of a known, understood probabilistic mechanism for the assignment of treatments to units
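
A minimal sketch of randomization: shuffle the units with a seeded random generator, then deal out treatments in turn so group sizes stay balanced. The unit labels, treatment names (borrowed from the tea experiment), and seed are hypothetical.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly assign treatments to units in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # random order removes any systematic pattern
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

students = [f"student-{i}" for i in range(1, 21)]
assignment = randomize(students, ["tea", "no tea"], seed=7)
```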

Efficiency

designing an experiment so that you learn the most from it

Measures of variability

describe the spread or dispersion of a set of data. Common measures of variability: range, interquartile range, mean absolute deviation, variance, and standard deviation.
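
A sketch computing quartiles and the interquartile range with `statistics.quantiles` (Python 3.8+), using the module's default "exclusive" method. Software packages use different quartile conventions, so results can differ slightly from other tools. The data are illustrative.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
q1, q2, q3 = statistics.quantiles(data, n=4)  # three cut points: Q1, median, Q3
iqr = q3 - q1  # spread of the middle 50% of the data
print(q1, q2, q3, iqr)  # 3.0 6.0 9.0 6.0
```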

