Intro to Managerial Statistics

mean formula

"The sum of observations / n"

normal distribution

Bell-shaped and centered at the mean; area under the curve = probability. A function that represents the distribution of a variable as a symmetric, bell-shaped graph. Used to describe the variability of sample proportions taken from repeated samples; it also describes the variability of many other statistics.

Normal distribution assumptions

1. Independent observations 2. Large enough sample: at least 10 expected successes and 10 expected failures in the sample

mosaic plot

Displays 2 categorical variables. Uses the area of rectangles to show the relative frequency of every combination of the two variables: the bigger the area, the bigger the proportion. Use the explanatory variable for the first split.

95% confidence interval

The 68-95-99.7 rule tells us that, when the distribution is normal, about 95% of observations are within 2 standard errors of the mean. More precisely, the point estimate we observe will be within 1.96 standard errors of the true value of interest 95% of the time, so we are 95% confident that this interval captures the value.

data distribution

A listing of the values or responses associated with a particular variable in a data set.

range

max - min; can be inflated by outliers

center

mean/average: measure center of a distribution of data

best to use when data is skewed

IQR and range together

95% for finding the true population proportion/mean

If 95% of sample proportions (or means) fall within 2 standard deviations of the population proportion (or mean), then 95% of the time the sample proportion (or mean) is within 2 standard errors of p. The interval is p-hat +/- 2 * sqrt(p-hat * q-hat / n).

Do all 95% confidence intervals include the population parameter?

No; only about 95% of such intervals do.
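
The interval formula above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `proportion_ci` and the sample counts (60 successes out of 100) are made up for the example.

```python
import math

def proportion_ci(successes, n, multiplier=2.0):
    """Return (lower, upper) for p_hat +/- multiplier * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n          # sample proportion
    q_hat = 1 - p_hat              # proportion of failures
    se = math.sqrt(p_hat * q_hat / n)  # estimated standard error
    return p_hat - multiplier * se, p_hat + multiplier * se

# Hypothetical sample: 60 successes in 100 trials
low, high = proportion_ci(60, 100)
print(round(low, 3), round(high, 3))
```

Passing `multiplier=1.96` instead of 2 gives the slightly narrower interval based on the exact normal multiplier.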

Tidy Data

One observation per row; one variable per column; one value per cell

categorical nominal

no order; ex. favorite color, football team

Investigative Cycle

Problem
- Identify the problem
- Define the research question
Plan
- Prepare or examine the sampling plan and/or experimental design
- Collect the data (if not given)
Data
- Identify the explanatory and response variables
- Identify the variables as categorical or quantitative
- Ask questions about the data
- Are there missing observations?
- Where did the data come from?
- Is there anyone missing?
Analysis
- Examine the data by finding statistical and graphical summaries
- Determine the appropriate approach
Conclusion
- What is the conclusion of the analysis?
- Can you extend it to the population?
- Can you show cause and effect? Why?
- What are the next steps?

IQR

Q3 (75th percentile) - Q1 (25th percentile); the spread of the middle 50% of the data
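
As a rough sketch, the IQR can be computed with Python's standard `statistics` module. The data values are invented for illustration, and note that quartile conventions differ between textbooks and software, so other tools may give slightly different Q1 and Q3 for the same data.

```python
import statistics

# Hypothetical data set (already sorted for readability)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # width of the middle 50% of the data
print(q1, q3, iqr)
```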

empirical rule

The rule gives the approximate percentage of observations within 1 standard deviation (68%), 2 standard deviations (95%), and 3 standard deviations (99.7%) of the mean when the histogram is well approximated by a normal curve
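
The 68-95-99.7 percentages can be checked against the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the dictionary name `within` is just for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve within k standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(k, round(area, 4))
```

The exact areas are slightly different from the rounded rule (for example, 2 standard deviations covers a little more than 95%), which is why 1.96 is used for an exact 95%.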

categorical data display

contingency table, bar plot, stacked bar plot, dodged bar plot, standardized bar plot, mosaic plot

Principles of Experimental Design

control, randomize, replicate, block

margin of error

z multiplier * the standard error; describes how far the point estimate is likely to be from the true value of the parameter

median higher than mean

left skewed

histogram

A bar graph depicting a frequency distribution; shows data density. The shape can be unimodal, bimodal, or multimodal, and right skewed or left skewed. Mode = a peak in the distribution.

bar plot

a common way to display a single categorical variable

sampling distribution

The distribution of a statistic obtained by selecting all possible samples of the same size from a population; shows how sample statistics vary from one sample to another

point estimate

a summary statistic from a sample used as an estimate of the population parameter

explanatory variable

a variable that we think explains or causes changes in the response variable

prospective study

an observational study in which subjects are followed to observe future outcomes

retrospective study

an observational study in which subjects are selected and then their previous conditions or behaviors are determined

mode best used when data is

bimodal or unimodal to find peaks

sampling with replacement

Bootstrapping. Once a member of the population is selected for inclusion in a sample, that member is returned to the population before the next individual is selected. Taking repeated samples from a population is usually not feasible, so we bootstrap: resample from the sample.

multistage

Cluster + strata. Like a cluster sample, but rather than keeping all observations in each cluster, we collect a random sample within each selected cluster. More economical; helpful when there is a lot of case-to-case variability within a cluster. Ex.: someone wanted to survey parents of elementary school kids. They randomly picked 20 school districts in the US and then randomly picked 2 elementary schools within each district. For each school they took a random sample of 100 parents.

census

Collecting data from an entire population; hard to do and expensive, and it is difficult to identify the entire population of interest

ridge plot

combines density plots for various groups drawn on the same scale in a single plotting window

stacked bar plot

Displays the distributions of two categorical variables on one bar plot; useful for visualizing the relationship between them. One variable is explanatory and one is the response.

null distribution

Distribution of simulated statistics that represent what could have happened in the study assuming the null hypothesis was true; always centered at the value specified by the null hypothesis

stratified

Divide + conquer. 1. The population is divided into groups called strata; strata are chosen so that similar cases are grouped together. 2. Usually a simple random sample is taken within each stratum. Useful when cases within each stratum are very similar with respect to the outcome of interest; the population estimate is more precise if each group estimate is more precise.
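
The two steps (group into strata, then take a simple random sample within each) can be sketched as follows. The function name `stratified_sample`, the student population, and the class-year strata are all hypothetical, chosen only to make the example runnable.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Group cases into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    strata = {}
    for case in population:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for name in sorted(strata):
        # random.sample draws without replacement within the stratum
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical population: 20 students labeled by class year
population = [(f"student{i}", "freshman" if i % 2 else "senior") for i in range(20)]
picked = stratified_sample(population, stratum_of=lambda case: case[1], per_stratum=3)
print(picked)
```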

difference between a scatterplot and dotplot

A dot plot displays one variable; a scatterplot displays two

simple random sample

each case in the population has an equal chance of being included and knowing that a case is included in a sample does not provide useful information about what other cases were included

robust statistic

Extreme observations have little effect on its value; ex. IQR, median

Was the sample mean unusually high?

Find the z-score of the sample mean, then find the probability of observing a sample mean that large or larger
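
A minimal sketch of that two-step check, assuming the sampling distribution of the mean is approximately normal and the population standard deviation is known. All numbers here (population mean 100, standard deviation 15, n = 36, observed sample mean 105) are invented for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical inputs
mu, sigma, n = 100, 15, 36
x_bar = 105

se = sigma / math.sqrt(n)            # standard error of the sample mean
z = (x_bar - mu) / se                # how many standard errors above mu
p_upper = 1 - NormalDist().cdf(z)    # probability of a sample mean this high or higher
print(z, round(p_upper, 4))
```

A small upper-tail probability suggests the sample mean is unusually high relative to what the assumed population would typically produce.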

normal quantile plot

If the data follow a mostly straight line, they are approximately normal; if the points curve up or down at the ends, they are not approximately normal

What does changing the confidence level affect?

Increasing the confidence level increases the width of the interval

How does changing the sample size affect the CI?

Increasing the sample size decreases the width of the CI

assumptions needed to use the t model to make a one-sample t interval for the mean

Independence assumption: check the randomization condition and the 10% condition. Normal population assumption: check the nearly normal condition (n < 30 is fine with no outliers; n >= 30 is good)

convenience sample

individuals who are easily accessible are more likely to be included in the sample

categorical ordinal

intrinsic order; ex. agree / somewhat agree / somewhat disagree / disagree, great / bad / terrible, severity 1-10, star ratings

data frame

A convenient and common way to organize data, where each row is a unique case (observational unit), each column is a variable, and each cell is a single value

summary statistic

is a single number summarizing data from a sample

sampling distribution of the sample proportion

All the possible values of the statistic from taking samples of the same size from the population, under these conditions. Independence assumption: sampled values are independent. Randomization condition: random samples or random assignment of treatments. 10% condition: the sample should not be larger than 10% of the population. Success/failure condition: you must have at least 10 expected successes (np) and 10 expected failures (nq); note q = 1 - p.
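
The success/failure condition is easy to check in code. This is a small sketch; the function name `success_failure_ok` and the example inputs are assumptions made for the illustration.

```python
def success_failure_ok(n, p, minimum=10):
    """Check that expected successes (n*p) and failures (n*q) are both at least `minimum`."""
    q = 1 - p
    return n * p >= minimum and n * q >= minimum

# Hypothetical checks
print(success_failure_ok(100, 0.5))  # 50 expected successes, 50 expected failures
print(success_failure_ok(50, 0.1))   # only 5 expected successes
```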

zscore

Number of standard deviations an observation is away from the mean; ex. if an observation is one standard deviation above the mean, its z-score is 1. Observations below the mean have negative z-scores; observations above the mean always have positive z-scores; if the observation equals the mean, the z-score is 0. An observation x1 is more unusual than another observation x2 if the absolute value of its z-score is larger.
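
A short sketch of the z-score calculation and the "more unusual" comparison. The function name `z_score` and the numbers (mean 70, standard deviation 10) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical distribution with mean 70 and standard deviation 10
z1 = z_score(95, 70, 10)   # above the mean, so positive
z2 = z_score(60, 70, 10)   # below the mean, so negative
more_unusual = "x1" if abs(z1) > abs(z2) else "x2"
print(z1, z2, more_unusual)
```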

sample distribution

one possible sample simulation

scatterplots

One type of graph used to study the relationship between two numerical variables. If two variables show some connection, they are associated; variables that are not related are independent.

Dot Plot

One quantitative variable; a graphical device that summarizes data by the number of dots above each data value on the horizontal axis

variability is small

We expect the original sample statistic to be close to the true parameter

variability is large (bootstrap)

The original statistic may be far from the true population parameter

bias

Over-representing someone's interest. Nonresponse bias: ex. only 30% of people sampled actually respond.

mu

population mean

confidence interval for the population proportion

Question data: bar graph or pie chart. Variables: categorical with yes/no outcomes. Parameter: p = population proportion of successes; this value is unknown.
Assumptions that must be met to use a CI for p:
- Independence assumption: check the randomization condition and the 10% condition
- Sample size assumption: success/failure condition
If these are not met, we cannot use the normal model and thus cannot proceed with the interval.
Different approaches:
- Classical approach
- Bootstrap approach: percentiles (find the middle 95% of the bootstrap statistics) or the standard error approach (p-hat +/- 2 * SE_bootstrap)

bootstrap percentile confidence interval

range of values for the true proportion

measures of spread (variability) (dispersion)

range, variance, standard deviation

Bootstrapping

Repeatedly sample from the sample, with replacement. Best for modeling studies where the data were generated through random sampling from a population; it models how a statistic varies from one sample to another. Goal: to produce an interval estimate for the population parameter.

observational study

Researchers collect data in a way that does not interfere with how the data arise; ex. surveys, reviewing records
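
A bootstrap percentile interval can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not a reference implementation: the function name `bootstrap_percentile_ci`, the number of resamples, and the sample values are all made up, and the percentile cutoffs use a simple index rule.

```python
import random
import statistics

def bootstrap_percentile_ci(sample, stat=statistics.mean, reps=2000, level=0.95, seed=0):
    """Resample with replacement, then take the middle `level` of the bootstrap statistics."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n = len(sample)
    # rng.choices draws n values WITH replacement from the original sample
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    alpha = (1 - level) / 2
    lo_idx = int(alpha * reps)
    hi_idx = int((1 - alpha) * reps) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]

# Hypothetical sample of 10 measurements
sample = [12, 15, 9, 14, 11, 13, 10, 16, 12, 14]
low, high = bootstrap_percentile_ci(sample)
print(low, high)
```

Each resample is the same size as the original sample, which is what lets the spread of the bootstrap statistics stand in for the spread of the sampling distribution.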

mean higher than median

right skewed; the median is the preferred measure of center, and IQR and range are better descriptors of spread

standard deviation

s: a measure of variability that describes the typical distance of each observation from the mean; if points are closer to the mean, the standard deviation is smaller

variance

s^2: standard deviation squared (not measured in same units as data)
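
The relationship between s and s^2 can be seen in a short sketch using the standard `statistics` module, which computes the sample versions (dividing by n - 1). The data values are hypothetical.

```python
import statistics

# Hypothetical sample with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

s2 = statistics.variance(data)  # sample variance (divides by n - 1), in squared units
s = statistics.stdev(data)      # sample standard deviation, in the data's own units
print(round(s2, 3), round(s, 3))
```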

Histogram

Shape: skewed left or right. Outliers. Center: mean, median, mode. Spread: standard deviation, variance, IQR, range.

quantitative data display

side by side box plot, box plot, histogram, ridge plot, faceting, scatterplot, dot plot

4 types of sampling

simple, clustered, multistage, stratified

median best used when data is

skewed; the median is better here because the mean is pulled by extreme values

For a t distribution...

the smaller the sample (fewer degrees of freedom), the more spread there is in the tails

Faceting

split the graphical display of the data across plotting windows based on groups

sample

subset of whole

box plot

summarizes the data with five statistics (minimum, Q1, median, Q3, maximum) and identifies unusual observations

contingency table (cross tabulation)

summarizes data for two categorical variables; each value in the table represents the number of times a particular combination of variable outcomes occurred
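
Counting combinations is all a contingency table does, so it can be sketched with `collections.Counter`. The variable names and the six (group, outcome) cases below are invented for illustration.

```python
from collections import Counter

# Hypothetical tidy data: one (group, outcome) pair per case
cases = [
    ("treatment", "yes"), ("treatment", "no"), ("control", "yes"),
    ("treatment", "yes"), ("control", "no"), ("control", "no"),
]

table = Counter(cases)  # counts each combination of the two variables
rows = sorted({group for group, _ in cases})
cols = sorted({outcome for _, outcome in cases})

print("group", cols)
for group in rows:
    print(group, [table[(group, outcome)] for outcome in cols])
```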

mean best used when data is

symmetrical, bell shaped

dodged bar plot

tends to use too much horizontal space difficult to know if there is a relationship

for the t distribution: higher the df

the closer it gets to the z shape

Central Limit Theorem

the distribution of all the sample means taken from the same population, with mean equal to mu and standard deviation equal to sigma, is approximately a bell-shaped (normal) curve when the sample size is large enough

sampling distribution of the sample mean

The distribution of all the sample means taken from the same population with mean mu and standard deviation sigma, under these conditions. Large enough sample: n >= 30. Independence assumption: sampled values are independent. Randomization condition: random sample or random assignment of treatments. 10% condition: the sample should not be larger than 10% of the population.
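
A small simulation can make the sampling distribution of the mean concrete. This sketch assumes a right-skewed population (exponential with mean 1 and standard deviation 1), which is not from the original notes, and relies on the standard result that the standard error of the sample mean is sigma / sqrt(n). The function name and parameters are hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(n=30, reps=5000, seed=0):
    """Draw `reps` samples of size n from an exponential population; return their means."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

means = simulate_sample_means()
center = statistics.fmean(means)     # should be close to the population mean (1)
spread = statistics.stdev(means)     # should be close to sigma / sqrt(n)
theoretical_se = 1 / math.sqrt(30)
print(round(center, 3), round(spread, 3), round(theoretical_se, 3))
```

Even though the population is skewed, a histogram of `means` would look roughly bell-shaped at n = 30.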

bootstrap distribution

the distribution of many bootstrap statistics approximating the sampling distribution

standard error

the standard deviation of a sampling distribution

How is the 95% confidence interval derived using the classical approach

the standard error is estimated from the theoretical sampling distribution of the statistic. The point estimate is the center of the interval and the margin of error is the z multiplier *the standard error
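
The classical recipe (point estimate plus or minus z multiplier times standard error) can be sketched as below. The function names `z_multiplier` and `classical_interval`, and the point estimate of 50 with standard error 2, are assumptions made for the illustration.

```python
from statistics import NormalDist

def z_multiplier(level):
    """z value that leaves (1 - level) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def classical_interval(point_estimate, standard_error, level=0.95):
    """Point estimate +/- margin of error, where margin = z multiplier * standard error."""
    margin = z_multiplier(level) * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical point estimate and standard error
for level in (0.90, 0.95, 0.99):
    low, high = classical_interval(50, 2, level)
    print(level, round(low, 2), round(high, 2))
```

Running the loop shows the interval widening as the confidence level rises, since only the z multiplier changes.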

population

the whole we are interested in

side by side box plot

two box plots one for each group

scatterplot

two variables, quantitative

What are Confidence Interval Interpretations about

unknown population parameters NOT sample statistics or individuals

t distribution

Used for quantitative data (one mean) and paired data. Has degrees of freedom; symmetric and bell-shaped, centered at zero, with more spread in the tails than the z distribution. Smaller sample: more variability, fatter tails. For one mean, df = n - 1.

sample

used to provide estimate of population average

standardized bar plot

Useful for understanding fractions and associations; useful if the primary variable in the stacked bar plot is relatively unbalanced. Downside: you lose a sense of how many cases each bar represents.

quantitative discrete

variable that is counted; ex. # of absences, # of jumps; whole numbers

quantitative continuous

variable that can take on any value within a range (usually measuring) ex sq footage, height, weight

best to use when data is symmetric

variance and standard deviation

randomized experiment

when individuals are randomly assigned to a group

statistic

a number calculated from a sample of data

Parameter

a number calculated from an entire population; the true value. We use statistics to estimate the parameter.

we use a ____ confidence interval if we want to be more certain that we capture the parameter.

wider

cluster

May not represent the whole population. 1. Break the population up into groups (clusters). 2. Sample a fixed number of clusters, then include all observations from each of those clusters in the sample. Used because it is more economical and handles geographic limitations. Example: the mayor of Gainesville would like to take a survey of Gainesville residents. He decides to send out pollsters to randomly selected city blocks and randomly select participants from each city block.

estimate mu using

x bar

