MGSC 291 Whitcomb

cognitive bias

1. anchoring effect 2. availability bias 3. confirmation bias 4. overconfidence bias

solutions to anchoring effect

1. ask for multiple, independent opinions 2. reconstruct multiple answers from scratch 3. downplay initial information; work from fundamentals 4. use experts

to be regarded as a candidate for normality, a random variable should

1. be measured on a continuous scale 2. possess a clear center 3. have only one peak (unimodal) 4. exhibit tapering tails 5. be symmetric about the mean (equal tails). Most physical measurements in the real world would resemble normal distributions.

basic roadmap to designing effective graphics (3 things)

1. determine your message 2. select the most effective format to display your message 3. design the data to show the most important parts

types of statistical lies

1. incorrect data 2. ignoring the baseline 3. arbitrary comparisons 4. misleading comparisons

design the data to show the most important parts

1. make the data clear and straightforward 2. remove all components that aren't necessary 3. highlight the most important parts; mute supporting elements 4. encourage the eye to compare data

normal Probability Distribution

1. the normal, or Gaussian (bell-shaped), distribution was named for Karl Gauss 2. defined by two parameters, µ and σ 3. denoted N(µ, σ) 4. domain is -infinity < X < +infinity (continuous scale) 5. about 99.7% of the area under the normal curve lies within three standard deviations of the mean 6. the distribution is symmetric and bell-shaped
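
The three-standard-deviation claim on this card can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the values of `mu` and `sigma` are hypothetical and chosen only for illustration, since the result does not depend on them.

```python
from statistics import NormalDist

# Hypothetical parameters, for illustration only.
mu, sigma = 100.0, 15.0
dist = NormalDist(mu, sigma)

def area_within(k):
    """Area under N(mu, sigma) within k standard deviations of the mean."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
```

The value for k = 3 should come out close to the 99.7% stated on the card.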

sampling distributions for proportions

1. sample given as a percent 2. qualifies for normality. If so, the sampling distribution of the sample proportion p is approximately normal (formulas are on the formula sheet)

Tufte's rules for displaying quantitative information

1. show your data 2. use graphics 3. avoid chart junk 4. utilize data ink 5. use labels 6. use micro/macro 7. separate layers 8. use multiples 9. use color 10. understand narrative

Recognizing Poisson process

1.) an event of interest occurs randomly over time (or space) 2.) the average arrival rate (λ) remains constant 3.) the arrivals are independent of each other 4.) the random variable (X) is the number of events within an observed time interval

all discrete probability distributions must satisfy

1.) each probability is between 0 and 1, inclusive 2.) the probabilities of all values of X must sum to 1. The values of X themselves can be anything

expected value

E(X) = μ: the mean, a probability-weighted average of the values of X

special law of multiplication: if events A and B are independent then

P(A ∩ B) = P(A)P(B)

calculating conditional probabilities

P(A ∩ B) = P(A|B) P(B), so P(A|B) = P(A ∩ B) / P(B)

if A and B are mutually exclusive, then the general law of addition can be simplified to...

P(A ∪ B) = P(A) + P(B)

When events A & B are mutually exclusive,

P(A|B) = 0 and P(B|A) = 0: the two events cannot both occur ex) heads and tails on a single coin toss

When events A & B are independent

P(A|B) = P(A) and P(B|A) = P(B)

conditional probability

P(A|B): "the probability of A given B" P(A ∩ B) / P(B) for P(B)>0
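
As a rough illustration of the formula on this card, the sketch below computes a joint, a marginal, and a conditional probability from a 2x2 contingency table. The counts are hypothetical and invented for the example.

```python
# Hypothetical contingency table of counts (rows: A / not A, columns: B / not B).
counts = {
    ("A", "B"): 30, ("A", "not B"): 20,
    ("not A", "B"): 10, ("not A", "not B"): 40,
}
n = sum(counts.values())

p_a_and_b = counts[("A", "B")] / n                             # joint probability P(A ∩ B)
p_b = (counts[("A", "B")] + counts[("not A", "B")]) / n        # marginal probability P(B)
p_a = (counts[("A", "B")] + counts[("A", "not B")]) / n        # marginal probability P(A)

p_a_given_b = p_a_and_b / p_b                                  # P(A|B) = P(A ∩ B) / P(B)

# A and B are dependent when P(A|B) differs from P(A).
dependent = abs(p_a_given_b - p_a) > 1e-12
print(p_a_given_b, p_a, dependent)
```

With these made-up counts P(A|B) differs from P(A), so the two events would be classified as dependent.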

deceptive graphs

Page 92. error 1: nonzero origin; error 2: elastic graph proportions (stretching the Y or X axis); error 3: dramatic titles and distracting pictures; error 4: 3-D and novelty graphs, which distort the bar volume; error 5: chart junk; error 6: low data-to-ink ratio; error 7: area trick, where the bars of the graph misstate the true proportions

complement rule

The probability of the complement of an event, P(Ac), is equal to one minus the probability of the event.

standardized data (z-score)

a general approach to identifying unusual observations is to redefine each observation in terms of its distance from the mean in standard deviations

a probability distribution for a random variable X is

a listing of all possible values of X and their probabilities of occurring

Bernoulli experiments

a random experiment that has only two outcomes. To create a random variable, we call one outcome a success (X = 1) and the other a failure (X = 0). The probability of success is denoted 𝜋 and the probability of failure is 1 - 𝜋; the two sum to 1

sample data set

a subset of the population; its characteristics are called statistics. In most cases we cannot study all the members of a population

example of dependent event

age and phone use / arthritis: knowing a person's age would affect the probability that the individual uses text messages or has arthritis. Dependent events may be causally related, but dependence does not prove cause and effect; it only means that knowing event B has occurred will affect the probability that A will occur

Probabilities as areas-continuous

areas under curves: P(a < X < b) is the integral of the probability density function over the interval a to b. Because P(X = a) = 0, it makes no difference whether the endpoints are included. The entire area under the PDF must be 1

binomial distribution

arises when a Bernoulli experiment is repeated n times. Each Bernoulli trial is independent, so the probability of success 𝜋 remains constant on each trial
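
A minimal sketch of the binomial probabilities, assuming the standard formula P(X = x) = C(n, x) 𝜋^x (1 - 𝜋)^(n - x), which is not written out on these cards. The example uses the number of sixes in 4 dice rolls; the function name `binomial_pmf` is made up for illustration.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for n independent Bernoulli trials with success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 4, 1 / 6  # 4 dice rolls, success = rolling a six
pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]

for x, p in enumerate(pmf):
    print(f"P(X = {x}) = {p:.4f}")
print("sum of probabilities:", round(sum(pmf), 10))
```

Because 𝜋 < .5 here, the probabilities pile up at the low values of X, which matches the right-skewed shape described elsewhere in this set.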

why do we call the poisson distribution the model of arrivals (customers, defects, accidents)

arrivals can be regarded as Poisson events if each event is independent ex) X= # of customers arriving at a bank ATM in one minute

Bar and Column charts

a column chart is a vertical bar chart: numerical values on the y axis, category labels on the x axis. The height or length of each bar shows the value for its category

anchoring effect

a common human tendency to rely too heavily on the first piece of information offered when making decisions ex) "originally for x amount, but we'll give it to you for x-100!" makes you feel like you got a great deal

sampling is the basis for

confidence intervals and hypothesis testing

contingency table

cross-tabulation of frequencies into rows and columns; the intersection of each row and column is a cell that shows a frequency

when P(A) differs from P(A|B) the events are ...

dependent

exponential distribution

describes the distribution of time between two events when the count of events has a poisson distribution (λ = mean # events per time)
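
A short sketch of exponential waiting-time probabilities, assuming the standard result P(X ≤ t) = 1 - e^(-λt), which is not stated on the card. The rate λ = 1.5 events per minute is borrowed from the Poisson example in this set; the function name is hypothetical.

```python
from math import exp

def prob_wait_at_most(t, lam):
    """P(time until the next event <= t) when events arrive at mean rate lam."""
    return 1 - exp(-lam * t)

lam = 1.5                      # mean number of events per minute
mean_wait = 1 / lam            # mean time between events, in minutes

print("P(wait <= 1 minute):", round(prob_wait_at_most(1.0, lam), 4))
print("P(wait >  1 minute):", round(1 - prob_wait_at_most(1.0, lam), 4))
print("mean wait (minutes):", round(mean_wait, 4))
```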

poisson probability distribution

describes the number of occurrences within a randomly chosen unit of time (minute or hour) or space (square foot, linear mile). Events must occur randomly and independently over a continuum of time or space (the events can be represented by dots along the continuum)
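
A minimal sketch of Poisson probabilities, assuming the standard formula P(X = x) = λ^x e^(-λ) / x!, which is not written on these cards. The rate λ = 1.5 per minute comes from the "model of rare events" card; `poisson_pmf` is a made-up helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) events in one unit of time or space, given mean rate lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 1.5  # mean number of events per minute

p_zero = poisson_pmf(0, lam)
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))

print("P(X = 0):", round(p_zero, 4))
print("P(X <= 2):", round(p_at_most_two, 4))
```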

random variables can be either

discrete or continuous

how to calculate inverse probabilities for uniform distribution

ex) it is equally likely to arrive at any time within 20 minutes after he sets foot on the platform. What is the length of time x0 such that there is a 30% chance he will wait this long or longer? Area = 0.30 = length × height = (20 - x0)(0.05), so 6.0 = 20 - x0 and x0 = 14.0
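
The platform example can be reproduced in a few lines. This is only a sketch of the same area = length × height reasoning; the variable names are made up.

```python
a, b = 0.0, 20.0          # waiting time is uniform on [0, 20] minutes
height = 1 / (b - a)      # constant density of U(a, b), here 0.05
upper_tail = 0.30         # required P(wait >= x0)

# Area of the rectangle from x0 to b: (b - x0) * height = upper_tail
x0 = b - upper_tail / height

print("height:", height)
print("x0:", round(x0, 4))
```

The result agrees with the hand calculation on the card, up to floating-point rounding.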

rules of probability

expressed as decimals and should add up to 1.0

histogram

graphical representation of a frequency distribution no gaps b/t bars

discrete random variable

has a countable number of distinct values ex) number of sixes in 4 dice rolls: 0, 1, 2, 3, 4

sampling when mean is normal

the sample mean is approximately normal when the sample is large (rule of thumb: more than 30 observations); for smaller samples, the underlying distribution must be normal

why is the poisson distribution called the model of rare events

if the mean is large, we can reformulate the time units to yield a smaller mean ex) λ = 90 events per hour is the same as λ = 1.5 events per minute

collectively exhaustive

if their union is the entire sample space; at least one of the events has to occur

tables

individual values will be looked up/compared, precise values are required

joint probabilities (CT)

intersection of 2 events (each cell in a contingency table)

random variables

a function or rule that assigns a numerical value to each outcome in the sample space of a random experiment. Use uppercase X when referring to the random variable; specific values x are shown in lowercase

probability density function

is used to describe continuous random variables

probability mass function

is used to describe discrete random variables

skewed right

long tail of histogram points right (most data on left) mean>median

a cumulative distribution function

may be used to describe both discrete and continuous random variables

graphs

message is the shape of the values

skewed left

negative skewness: long tail of histogram points left (most data on the right); mean < median

in a binomial experiment we are interested in the

number of successes in n trials, so the binomial random variable X is the sum of n independent Bernoulli random variables: X = X1 + X2 + ... + Xn

confirmation bias

occurs when decision makers seek out evidence that confirms their previously held beliefs while discounting or dismissing evidence that supports differing conclusions

proportions can always be expressed as a percentage, but

percentages cannot always be expressed as proportions

random sampling allows us to draw valid conclusions about

populations (all the people, items, or objects of interest) from random samples drawn from those populations

addition rule

probability that event A or B occurs (at least one of the events happens): P(A ∪ B) = P(A) + P(B) - P(A ∩ B). Union: either or both events will occur
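
The addition rule can be verified by counting outcomes directly. The sketch below uses one roll of a fair die, with two events picked purely for illustration, and exact fractions to avoid rounding.

```python
from fractions import Fraction

sample_space = set(range(1, 7))                     # one roll of a fair die
A = {x for x in sample_space if x % 2 == 0}         # event A: roll is even
B = {x for x in sample_space if x > 3}              # event B: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

direct = prob(A | B)                                # count the union directly
by_rule = prob(A) + prob(B) - prob(A & B)           # general law of addition

print(direct, by_rule)
```

Subtracting P(A ∩ B) removes the outcomes 4 and 6, which would otherwise be counted twice.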

confidence examples: over, just right, under

provide a low and high guess for each of 10 items so that you are 90% sure the correct answer falls between the two. Just right: about 10% misses, i.e., one miss. Overconfidence: two or more of the intervals did not contain the correct answer. Underconfidence: all intervals contained the right answer

Bayes' theorem

provides a method of revising probabilities to reflect new information: the prior (marginal) probability of an event B is revised after event A has been considered, to yield a posterior (conditional) probability
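
A small numeric sketch of this revision, assuming the usual form P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|not B)P(not B)]. All three input probabilities are hypothetical and chosen only to show the mechanics.

```python
# Hypothetical inputs, for illustration only.
p_b = 0.01              # prior (marginal) probability of event B
p_a_given_b = 0.95      # probability of observing A when B is true
p_a_given_not_b = 0.05  # probability of observing A when B is false

# Total probability of A, summed over "B" and "not B".
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

# Posterior (conditional) probability of B after A has been observed.
posterior = p_a_given_b * p_b / p_a

print("prior P(B):", p_b)
print("posterior P(B|A):", round(posterior, 4))
```

With these made-up numbers the posterior is much larger than the prior but still well below one half, because B is rare to begin with.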

continuous random variable

a random variable that arises from measuring something; it has an infinite number of possible outcomes. We don't have a probability attached to each outcome; instead we have a density f(x), with f(x) ≥ 0 ex) waiting time until a customer arrives, which can take decimal values

Poisson distribution events occur...

randomly and independently over a continuum of time or space

conditional probability (CT)

restricting ourselves to a single row or column ex. salary gains are small given that MBA tuition is large

Continuous expected value and variance

same as discrete except instead of ∑ , it uses the integral sign ∫. Integrals are taken over all X-values

standard deviation

single number that helps us understand how individual values in a data set vary from the mean

shape of binomial distribution

skewed right when 𝜋 < .5, symmetric when 𝜋=.5 and skewed left when 𝜋>.5

Uniform continuous distribution

sometimes denoted U(a, b). The density f(x) is constant; since the PDF is rectangular, you can easily verify that the area under the curve is 1 by multiplying the base (b - a) by its height 1/(b - a)

mean

sum of all data values divided by the number of data items

variance of a discrete random variable

sum of the squared deviations about its expected value, weighted by the probability of each X-value; equal to the standard deviation squared
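
The weighted sums behind the expected value and variance can be sketched as follows. The probability distribution is hypothetical; the code also checks the two requirements every discrete distribution must satisfy.

```python
from math import sqrt

# Hypothetical distribution: value of X -> probability.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# Every probability lies in [0, 1] and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

mean = sum(x * p for x, p in pmf.items())                     # E(X) = μ
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())   # probability-weighted squared deviations
std_dev = sqrt(variance)

print("E(X):", round(mean, 4))
print("Var(X):", round(variance, 4))
print("SD(X):", round(std_dev, 4))
```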

sample variance

s²: replace μ with x̄ (and divide by n - 1 rather than by the population size)

symmetric

tails of histogram are balanced; mean = median

median

the 50th percentile

availability bias

the belief that if something can be recalled, it must be important. People tend to weight their judgments toward more recent information, and it is easier to recall consequences if those consequences are bigger ex) if asked whether more New York students or Tennessee students go to USC, the answer would be based on personal examples ex) if people understood the odds of winning the lottery, no one would ever buy a ticket; however, since the jackpot winners are advertised so frequently, people forget about the vast majority of people who haven't won a cent

population data set

the complete set of individuals; its characteristics are called parameters

intersection

the event consisting of all outcomes in the sample space that are contained in both event A and event B; the joint probability is denoted P(A ∩ B)

mode

the most frequently occurring data value

as λ increases

the poisson becomes less skewed on the graph

marginal probability

the relative frequency found by dividing a row or column total by the total sample size

expected value for the mean of a sample size n is

the same as the expected value for an individual observation (the long-run average)

standard error

the standard deviation of the sampling distribution. The dispersion of a sample mean will be lower than that of a single observation, because each sample could contain highs and lows that cancel out
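
A brief sketch of how the standard error of the sample mean shrinks as n grows, assuming the standard formula σ / √n (the card itself does not give the formula). The value of `sigma` and the helper name `standard_error` are hypothetical.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

sigma = 12.0  # hypothetical population standard deviation

for n in (1, 4, 36, 144):
    print(f"n = {n:3d}: standard error = {standard_error(sigma, n):.2f}")
```

For n = 1 the standard error equals σ itself, which is the single-observation case the card compares against.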

the probability of two or more independent events occurring simultaneously is the product of

their separate probabilities ex) P(A1 ∩ A2 ∩ A3) = P(A1)P(A2)P(A3)

how to counter the availability bias

try to think of instances of the event that aren't so memorable ex) all the instances when someone swam at the beach and wasn't attacked by a shark ex) all the people who weren't murdered in the downtown area

coefficient of variation

unit-free measure of dispersion: the standard deviation expressed as a percent of the mean. When the SD exceeds the mean, the CV can exceed 100%. A lower number is preferred. Useful in comparing variables measured in different units
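
To illustrate comparing variables measured in different units, the sketch below computes CV = 100 × s / x̄ for two small data sets. The data are invented for the example, and the sample standard deviation from the standard-library `statistics` module is used.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percent of the mean."""
    return 100 * stdev(values) / mean(values)

# Hypothetical measurements in different units.
heights_cm = [160, 170, 180]
weights_kg = [50, 70, 90]

print("CV of heights (%):", round(coefficient_of_variation(heights_cm), 2))
print("CV of weights (%):", round(coefficient_of_variation(weights_kg), 2))
```

Here the weights vary more, relative to their mean, than the heights do, even though the two variables are in different units.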

using the standard normal table to calculate normal probabilities

use it to look up the probability that X takes on a value in a given range or interval. We can use the standard normal probability table for any normal random variable (no matter what its mean and standard deviation) by transforming the normal random variable X into a standard normal random variable Z using the following formula: Z = (X - 𝜇) / 𝜎
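
The transformation can be sketched in code, with `statistics.NormalDist` standing in for the printed table. The values of `mu`, `sigma`, and `x` are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 70.0, 10.0   # hypothetical mean and standard deviation of X
x = 85.0                 # value of interest

z = (x - mu) / sigma                     # standardize: Z = (X - mu) / sigma
p_table = NormalDist().cdf(z)            # P(Z <= z), what the standard normal table gives
p_direct = NormalDist(mu, sigma).cdf(x)  # P(X <= x) computed without standardizing

print("z:", z)
print("P(X <= 85) via z:", round(p_table, 4))
print("P(X <= 85) directly:", round(p_direct, 4))
```

Both routes give the same probability, which is why one standard normal table is enough for any normal random variable.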

scatterplot

used to depict two potentially related variables -each point is a pairing -linear, curvilinear -positive vs negative relationships

line graph

used to display a time series, to spot trends, or to compare time periods. No vertical grid lines, only horizontal; numerical variable on the y axis and time units on the x axis; use a zero origin; numerical labels omitted; data markers

overconfidence effect

a well-established bias in which a person's subjective confidence in her or his judgments is reliably greater than the objective accuracy of those judgments, especially when confidence is relatively high

independent

when knowing that event B has occurred does not affect the probability that event A will occur ex) event A is independent of event B if the conditional probability P(A|B) is the same as the unconditional probability P(A), i.e., if the probability of event A is the same whether event B occurs or not

when do we treat discrete variables as continuous?

when the range is large ex) exam scores range from 0-100 but are often treated as continuous ex) number of people in Richland County who have purchased a flood insurance policy

problem with confirmation bias

you selectively filter what information you choose to pay attention to and value ex) a person believes left-handed people are more creative than right-handed people; now every time they meet a left-handed person who is also creative, they place even greater weight on this "evidence". The idea could be scientifically tested by experts, but the person is instead discounting examples that do not support it

poisson model has only one parameter denoted as

λ: the mean number of events per unit of time or space. The unit of time is usually chosen to be short enough that the mean arrival rate is not large (λ < 20), hence the "model of rare events"

population variance

σ²: the sum of the squared deviations from the mean divided by the population size

𝜋 is the binomial parameter for the probability of success on a single trial...what is it for sampling distributions

𝜋 is estimated by the sample proportion p̄ = X / n

only parameter needed to define a Bernoulli process

𝜋 (mean: 𝜋; variance: 𝜋(1 - 𝜋))

