Scope & Methods Exam II


With the mean difference and the standard error of the mean difference, we can easily calculate the Z or t statistic

(Ha mean - H0 mean) / SE of the mean difference

Type 2 error

(false negative) We conclude there is no relationship when there is one

Type 1 error

(false positive) We conclude there is a relationship when there is not

Lambda is calculated using the following:

(prediction error with no knowledge of IV - prediction error with knowledge of IV) / prediction error with no knowledge of the IV
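The lambda formula can be sketched in Python as follows. The crosstab counts and the function name `lambda_pre` are hypothetical, chosen only for illustration; the table is assumed to have DV categories as rows and IV categories as columns, and "prediction error" is taken to mean the number of cases missed when guessing the modal category.

```python
def lambda_pre(table):
    """Lambda for a crosstab with rows = DV categories, columns = IV categories."""
    n = sum(sum(row) for row in table)

    # Without the IV, we guess the overall modal DV category for every case.
    row_totals = [sum(row) for row in table]
    errors_without_iv = n - max(row_totals)

    # With the IV, we guess the modal DV category inside each IV column.
    columns = list(zip(*table))
    errors_with_iv = sum(sum(col) - max(col) for col in columns)

    return (errors_without_iv - errors_with_iv) / errors_without_iv


# Hypothetical counts: DV = support / oppose, IV = group A / group B.
table = [
    [30, 10],  # support
    [20, 40],  # oppose
]
result = lambda_pre(table)
print(round(result, 3))
```

With these made-up counts, guessing without the IV misses 40 cases and guessing with the IV misses 30, so knowing the IV removes a quarter of the prediction errors.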

Ideal Sample

A larger sample with lower dispersion leads to less random error and greater confidence in the sample statistics

What is the probability that a randomly chosen mean will be less than -1 standard errors?

About 15.9% (the 13.6% band between -2 and -1 standard errors plus the roughly 2.3% below -2)

What is the probability that a randomly chosen mean will be more than +2 standard errors?

About 2.3% (the 2.1% band between +2 and +3 standard errors plus the roughly 0.1% beyond +3)

What is the probability that a randomly chosen mean will NOT be between -1 and 1 standard errors?

About 31.7% (100% minus the roughly 68.3% that falls between -1 and +1 standard errors)
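Tail areas like these can be checked against the standard normal curve. A minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

below_minus_1 = z.cdf(-1)          # area below -1 standard errors
above_plus_2 = 1 - z.cdf(2)        # area above +2 standard errors
within_1 = z.cdf(1) - z.cdf(-1)    # area between -1 and +1 standard errors
outside_1 = 1 - within_1           # area NOT between -1 and +1

for label, p in [
    ("below -1 SE", below_minus_1),
    ("above +2 SE", above_plus_2),
    ("within +/-1 SE", within_1),
    ("outside +/-1 SE", outside_1),
]:
    print(f"{label}: {p:.1%}")
```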

Lambda

A measure of association that measures the PRE when the researcher uses the value of the IV to predict the value of the DV. Used when at least one of the variables is nominal. Lambda is an asymmetric measure of association, which means that if you flip the IV and DV you will get a different result.

The null hypothesis

A null hypothesis assumes that, in the population, there is no relationship between the independent and dependent variables. Any relationship that we do observe, we assume is the result of random sampling error. The null hypothesis always starts with the assumption that error explains the results. We are not trying to show that our hypothesis is true; rather, we are trying to show that the null hypothesis is implausible. Null hypothesis: H0. Alternative hypothesis: Ha.

Populations

A property of a population (parameter) will have a central tendency and dispersion. For interval variables, we are interested in the mean and the standard deviation.

Confidence intervals

A range around the mean estimate within which we are confident that X% of all sample estimates will fall. For 95% confidence intervals, we are looking at a range of +/- approx 2 standard errors (1.96 to be exact). For 99% confidence intervals, it is 2.576 standard errors. To compute the 95% confidence interval, we create an upper bound and a lower bound by adding and subtracting 1.96 standard errors from the sample mean.

Samples

A sample is a number of cases or observations drawn from a population. It is easier for us to collect samples and measure their statistics.

Sampling frames

A sampling frame is the method for determining your sample. Poor sampling frames lead to selection bias. Selection bias leads to compositional differences between the sample and the population. Estimators with high bias are not going to be valid.

Sample size component

As sample size goes up, random sampling error goes down

Variation component

As variation goes up, random sampling error increases

Why is this Important?

Assuming that the distribution is normal allows us to make precise inferences about the percentage of sample means that will fall within x standard errors of the mean

Normal distribution probability table

Assuming the true population mean is 66, what percentage of the time would a sample mean fall 2.8 standard units lower? Only .26 percent of the time - so it seems pretty safe to say that the sample mean is likely closer to the population mean than the hypothetical mean is.

Comparing two sample means: the confidence interval approach

Compare men v. women on opinions of gun control. Looking at men, we know that 95% of all sample means from the same population will fall within +/- 1.96 standard errors of the mean - this range is important. If we calculate the confidence interval for women and find that the two ranges do not overlap, then it is very unlikely that these differences are due to random error. Why? Because the chance that the true value is the same for both is less than 5% (note that it is not impossible).

Comparing two sample means: The P-value approach

Confidence intervals and your eyeballs will not steer you wrong, but they lack precision. The P-value approach is built on the standard error of the difference between the two sample means. P-values give you the exact probability of obtaining a difference at least as large as the one observed if the null hypothesis is correct. When the p-value is less than .05, we can reject the null hypothesis.

The Chi square test of significance

Definition: determines whether the observed dispersal of cases departs significantly from what we would expect to find if the null hypothesis were correct. X^2: pronounced "kai square". It works with cross-tabular data. The null hypothesis for an X^2 test is that the distribution of cases down each column should be the same as the total distribution for all the cases in the sample.

Inference

Drawing a conclusion from facts that are known or assumed. Ex: parachutes

Why not always a census?

Expensive. Extensive.

Sample Statistic

First of all, what is a statistic? The word statistic refers simply to a measure. A sample statistic is simply a measure of some attribute of a sample. Example: we have a sample of students, and they all have an age. We could compute the sample statistic of mean age. When a sample statistic is used to estimate a population parameter, it is called an estimator.

Calculating standard deviation

First, we calculate each value's difference (or deviation) from the mean: Deviation = (individual's value - 𝛍). Second, since some of the deviations are negative, we square each one: Deviation squared. Third, we add all the squared deviations together. This is called the total sum of squares (not important now, but it will be later for regression and correlation). Fourth, we calculate the average of the squared deviations - this is called the variance: Variance = total sum of squares / n. Finally, we compute the standard deviation by taking the square root of the variance: Standard deviation = √variance.
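The five steps can be traced in a few lines of Python. This is a sketch on made-up data (the `values` list is hypothetical), dividing by n as the card does.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical interval-level data

mean = sum(values) / len(values)

# Step 1: deviation of each value from the mean.
deviations = [x - mean for x in values]

# Step 2: square each deviation so negatives do not cancel positives.
squared_deviations = [d ** 2 for d in deviations]

# Step 3: total sum of squares.
total_sum_of_squares = sum(squared_deviations)

# Step 4: variance = average squared deviation (dividing by n).
variance = total_sum_of_squares / len(values)

# Step 5: standard deviation = square root of the variance.
standard_deviation = math.sqrt(variance)

print(mean, total_sum_of_squares, variance, standard_deviation)
```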

Assessing likelihood of hypothetical mean

First, we find the difference between the sample mean and the hypothetical mean. In this case it is 66 - 59 = 7. Second, as we are working in standardized units, we need to convert this raw score into a z-score. In other words, we don't care that there is a difference of 7 points; we want to know what the difference is in units of standard error. We calculate that by taking the difference and dividing by the standard error: 7 / 2.5 = 2.8. In other words, if the population mean were 66, our sample mean would be 2.8 standard errors LOWER.
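The same arithmetic, plus the lookup that a normal probability table would give, can be sketched with Python's standard-library `statistics.NormalDist`. The numbers (66, 59, 2.5) come from the card; the variable names are illustrative.

```python
from statistics import NormalDist

hypothetical_mean = 66
sample_mean = 59
standard_error = 2.5

# Difference expressed in units of standard error (negative = sample is lower).
z_score = (sample_mean - hypothetical_mean) / standard_error

# Share of sample means expected to fall this far below the hypothetical mean.
p_this_low = NormalDist().cdf(z_score)

print(f"z = {z_score:.1f}, probability = {p_this_low:.2%}")
```

The probability comes out at roughly a quarter of one percent, which is why the hypothetical mean of 66 looks implausible.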

Determining the standard deviation of the sample

For the population, we calculate variance as summed squared deviations / N. For the sample, we calculate variance as summed squared deviations / (n - 1). So our standard error when using the sample standard deviation will be higher.
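Python's standard library exposes both divisors, which makes the difference easy to see. A small sketch on hypothetical data: `statistics.pvariance` divides by N and `statistics.variance` divides by n - 1.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(values)

population_variance = statistics.pvariance(values)  # summed squares / N
sample_variance = statistics.variance(values)       # summed squares / (n - 1)

# Standard error of the mean under each divisor.
se_from_population_sd = math.sqrt(population_variance) / math.sqrt(n)
se_from_sample_sd = math.sqrt(sample_variance) / math.sqrt(n)

print(population_variance, sample_variance)
print(se_from_population_sd, se_from_sample_sd)
```

Because n - 1 is smaller than N, the sample variance, and therefore the standard error built from it, is always the larger of the two.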

Interpreting

How often will random processes give us a mean that is 2.8 standard units below the true mean? We can look this up precisely using a normal distribution probability table, which gives us the proportion of the normal curve above any given value of Z. Take a look (table 6-3) - how much of the curve lies above 1.96? Read down the left column, then go across to the right for the second decimal. The number in that cell is the share of the curve above that Z. How does this relate to our 95% confidence interval? Is the hypothetical mean wrong? Is 66 probably right or probably wrong? - Wrong. How do you know? - If 66 were right, a sample mean this low would occur only .26 percent of the time.

Sample size and sampling error

If a sample is not random, then the size of the sample has no relationship to how accurate the statistic will be. Random samples do not get rid of error; instead, they introduce random sampling error - differences between the population and the sample that occur by chance. We are okay purposefully introducing this error because we know how to deal with it.

The normal distribution II

If we draw a sample from the population, we can assume it is part of a normal distribution of samples. The standard error of the sample mean expresses how sure we are that the sample mean is the same as the population mean. The larger the sample size, the lower the standard error.

The normal distribution

If we repeat the dice roll activity 100 times and then plot the mean of each try, we will get an approximately normal distribution. When we take a sample from a population, there is error. Some samples will have a lot of error (deviation from the mean); some will have only a little. When we gather enough samples, we can see that the vast majority are going to be close to the mean, but there will still be some that fall at the extremes. What values did you get for the dice roll mean?

The Central Limit Theorem

If we take an infinite number of samples of size n from a population of N, the distribution of the means of the samples would be a normal distribution. The distribution of sample means would have a mean equal to the true population mean. The distribution of sample means would have a standard error equal to the population's standard deviation / √n.
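A quick simulation makes these claims concrete. The sketch below is an illustration, not a proof: it uses a finite number of samples (5,000 rather than infinitely many), a fair six-sided die as the hypothetical population, and a fixed random seed so the run is repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

die = [1, 2, 3, 4, 5, 6]                 # hypothetical population: one fair die
population_mean = statistics.mean(die)   # 3.5
population_sd = statistics.pstdev(die)   # about 1.71

n = 30               # size of each sample
num_samples = 5000   # stand-in for "an infinite number of samples"

sample_means = [
    statistics.mean(random.choices(die, k=n)) for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
predicted_se = population_sd / n ** 0.5  # population SD / square root of n

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"spread of sample means: {sd_of_means:.3f} (predicted SE {predicted_se:.3f})")
```

The mean of the sample means should land close to 3.5, and their spread should land close to the predicted standard error, even though single die rolls are uniformly rather than normally distributed.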

Using the t distribution

In a normal distribution, we don't care about sample size - but here the sample size impacts the shape of the curve, so it is extremely important.

95% confidence intervals

In general we work with 95% confidence intervals, but we can have others. Due to the central limit theorem, we know precisely what this interval will look like. If we want an area of the curve where 95% of the sample estimates will fall, how many standard errors are we looking at? For 95% confidence intervals, we are looking at a range of +/- approx 2 standard errors (1.96 to be exact). For 99% CI, it is 2.576 standard errors.

Problems with lambda

Not good when there is low variation. Not good when the within-category modes are the same as the overall mode (will return a lambda of zero, even if there is a relationship).

Estimating Population Parameters

Population parameter = sample statistic + random sampling error. Random sampling error = (variation component) / (sample size component). What does this suggest? As sample size goes up, sampling error goes down.

Estimating population parameters

Population parameters

Populations have information, or characteristics, attached to them (think of these as variables). Example: if my population is all students in this classroom, what are some population parameters? To get the actual population parameter, we would perform a census. In a census, every unit of the population is measured and statistical inference is not necessary.

Proportional reduction in error (PRE)

Prediction-based metric that varies between 0 and 1. The measure tells you precisely how much better you can predict the dependent variable when you know the value of the independent variable. If knowledge of the IV doesn't tell us anything, the value will be 0. If knowledge of the IV gives us perfect prediction, the value will be 1.

Degrees of freedom

Property of many distributions (including Student's t). Equal to the number of observations (sample size) minus the number of parameters being estimated, typically n - 1. The higher the degrees of freedom, the less random sampling error and the more confidence in the sample statistic.

The Variation Component: Standard Deviation

Random sampling error = (variation component) / √n. Nominal and ordinal variables have less precise measures of variation, but for interval variables we can compute the standard deviation.

Properties of samples

Samples are drawn from a population - here we are only interested in random samples. Samples have size (n). Samples have a standard deviation (sigma) that expresses variation. Samples have a standard error (sigma / √n) that measures the accuracy of the sample mean in relation to the population's actual mean. Taking multiple samples from the population: no matter what the original population looks like, no matter the distribution or variation, the distribution of the sample means will look the same way. What will they look like?

Random sampling

Types: simple, stratified. The only sampling method that allows for statistical inference. Ensures that each member of the population has an equal chance of being selected into the sample. Maximizes the likelihood that measurement error is random and not systematic.

Comparing two sample means: the eyeball approach

We can quickly compute a confidence interval around a mean difference by adding/subtracting 2 times the standard error. As we are working with standardized scores, we can check to see if 0 is within that interval. If so, we cannot reject the null. We can also simply ask if the mean difference is at least twice the standard error. If not, we cannot reject the null. Obviously not precise, but a useful shortcut.
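The shortcut can be written as a small Python helper. This is a sketch only: the function name `eyeball_test` and the input numbers are hypothetical.

```python
def eyeball_test(mean_difference, se_difference):
    """Rough 95% interval: mean difference +/- 2 standard errors."""
    lower = mean_difference - 2 * se_difference
    upper = mean_difference + 2 * se_difference
    zero_inside = lower <= 0 <= upper
    # If 0 sits inside the interval, we cannot reject the null.
    return (lower, upper), not zero_inside


# Hypothetical case 1: difference of 5 with a standard error of 2.
interval_a, reject_a = eyeball_test(5.0, 2.0)
print(interval_a, "reject null:", reject_a)

# Hypothetical case 2: difference of 3 with a standard error of 2.
interval_b, reject_b = eyeball_test(3.0, 2.0)
print(interval_b, "reject null:", reject_b)
```

In the first case the difference is more than twice its standard error, so 0 falls outside the interval; in the second it is not, so 0 falls inside and the null stands.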

Confidence intervals II

So far we've been working with point estimates, that is, a precise number (e.g. the mean). In inferential statistics we do not get precise numbers, but rather uncertain predictions. We express those predictions with something called a confidence interval. A confidence interval is the range in which X% of the possible sample estimates will fall by chance.

Standard error of a mean difference

Square each mean's standard error. Add the two together. Take the square root.
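These three steps, followed by the Z or t statistic that divides the mean difference by this standard error, can be sketched as below. The function name and all input numbers are hypothetical.

```python
import math


def se_of_mean_difference(se_a, se_b):
    """Square each mean's standard error, add them, take the square root."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Hypothetical standard errors for two group means.
se_diff = se_of_mean_difference(3.0, 4.0)

# Hypothetical mean difference between the two groups.
mean_difference = 12.0
test_statistic = mean_difference / se_diff

print(se_diff, test_statistic)
```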

Standard Deviation VS. Standard Error

Standard deviation: refers to the data itself. In a normal distribution, 95% of all the data fall within 1.96 standard deviations of the mean. Standard error: refers to the estimate of the mean.

Statistical inference

Still drawing conclusions from facts known or assumed to be true; however, now we have (and include) measures of uncertainty. Statistical inference is when we make estimates (inferences) about a population based on information (conclusions and facts) about a sample. It is a set of tools to help us assess how much confidence we have that what we observe in a sample reflects what exists in the population.

Which error do we want to minimize?

Type 1

Frequency tables

Take a sample, write the frequencies. Histogram.

Measures of Association

Tests of significance tell us whether a relationship is likely due to random error. Measures of association add more information: they communicate the strength of the relationship between the independent and dependent variables. Statistical significance is often the thing we are most interested in, but a measure of association will allow you to start properly interpreting results.

Interpreting X^2

The X^2 has its own distribution that, like the t-distribution, depends on degrees of freedom. Degrees of freedom are calculated as follows: D.o.f. = (number of rows - 1) × (number of columns - 1). General rule: the larger the X^2, the lower the probability that the results were obtained by chance. Precise value: can be obtained by looking at a critical values of X^2 table (table 7-8).

The law of large numbers

The average of the results of running an experiment a large number of times will be close to the expected value. For example, we would expect that 50% of the time a coin flip will be heads, but it's still possible to flip five tails in a row. However, if we flip long enough, the mean will approach the expected value.

Difference between means

The difference between means is easily calculated by subtracting the null hypothesis mean from the alternative hypothesis mean

Sample Sizes

The relationship between sample size and error is non-linear, so our sample size component is going to be a transformation of the sample size (n): Random sampling error = (variation component) / √n. Larger samples are always better, but as the sample size increases, each additional case becomes marginally less important.
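The diminishing return is easy to see numerically. In this sketch the variation component is a hypothetical standard deviation of 10, and each sample is four times larger than the one before.

```python
import math

variation = 10.0  # hypothetical standard deviation

# Random sampling error = variation / square root of n, for growing samples.
errors = {n: variation / math.sqrt(n) for n in (25, 100, 400, 1600)}

for n, error in errors.items():
    print(f"n = {n:>4}: random sampling error = {error}")
```

Each fourfold increase in sample size only halves the error, so going from 25 to 100 cases buys far more precision than going from 400 to 475.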

Central limit theorem

The sample means will have a normal distribution and standard deviation. Must be random sampling. The CLT also tells us some other things: if we were to take an infinite number of samples, the mean of our sample means would converge to the true population parameter. Larger sample sizes will be more accurate.

Compute 95% CI

To compute the 95% confidence interval, we create an upper bound and a lower bound by adding and subtracting 1.96 standard errors from the sample mean. Sample mean = 75, standard error = 2.2. Lower confidence boundary = 75 - 1.96(2.2) = 70.7. Upper confidence boundary = 75 + 1.96(2.2) = 79.3. 95% confidence interval = [70.7, 79.3]. What does this mean? LOOK AT EXAMPLE
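The same calculation in Python, using the card's numbers (sample mean 75, standard error 2.2); the variable names are illustrative.

```python
sample_mean = 75
standard_error = 2.2
z_95 = 1.96  # number of standard errors for a 95% confidence interval

lower_bound = sample_mean - z_95 * standard_error
upper_bound = sample_mean + z_95 * standard_error

print(f"95% confidence interval = [{lower_bound:.1f}, {upper_bound:.1f}]")
```

Swapping `z_95` for 2.576 would give the wider 99% interval.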

Z-Scores

Transform the sample mean into a standard score with a mean of 0. To better make sense of what we are looking at, we can standardize the scores into Z-scores. Z-scores have a mean of 0 and a standard deviation of 1. We need to convert our scores to z-scores to properly leverage the power of statistical inference.

Sample proportions

Used for nominal / ordinal data. Same logic applies, but we calculate variation differently. Example: gun control. Sample proportion of individuals who support gun control = p = .72. Sample proportion of individuals who do not support gun control = q = .28. Sample size = 100. Standard error of a sample proportion: √(pq) / √n = .045.
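The gun control example can be reproduced in Python. The inputs (p = .72, n = 100) come from the card; extending it to a 95% interval with 1.96 standard errors is an assumption based on the "same logic applies" remark.

```python
import math

p = 0.72   # sample proportion supporting gun control
q = 1 - p  # sample proportion not supporting gun control
n = 100    # sample size

standard_error = math.sqrt(p * q) / math.sqrt(n)

# Assumed extension: a 95% interval built the same way as for a mean.
lower_bound = p - 1.96 * standard_error
upper_bound = p + 1.96 * standard_error

print(f"standard error = {standard_error:.3f}")
print(f"95% interval = [{lower_bound:.3f}, {upper_bound:.3f}]")
```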

Student's T

Used when we have a small sample (<30). The larger the sample, the more closely the t distribution resembles the z distribution. We care here about degrees of freedom (sample size - 1). The higher the degrees of freedom, the lower the random sampling error.

What is the standard deviation?

Variation is expressed as a function of the deviations from the mean of a distribution

Inference from distributions

We know that across an infinite number of samples from a population, the mean of the sample statistics will be equal to the population parameter (Central limit theorem). Every time we take a sample, we treat it as if it were part of a larger population of infinite samples. We know that if we take all of these samples and plot them, they will have a normal distribution. Any single sample, then, is going to fall somewhere in that normal distribution - some values much more frequently than others. We are trying to determine how likely it is that this particular sample statistic would occur as a result of random sampling error. Remember: at any given time, we can express the population parameter as the sample statistic plus the random sampling error. Random sampling error is a function of variation and sample size.

Inference using the student's T distribution

We use the Student's t distribution when sample sizes are small. The normal distribution always has the same shape, but the Student's t distribution changes its shape depending on the size of the sample. The smaller the sample, the less confidence it permits us to have. As the sample size gets larger, the Student's t distribution gets closer and closer to being normal. After about 30 degrees of freedom, it is nearly indistinguishable from a normal curve. Instead of using the standard deviation of the population (sigma), we use the standard deviation of the sample (s).

Standard error of a sample mean

We've been discussing the random sampling error as being the following: Random sampling error = (variation component) / (sample size component). This is more commonly known as the standard error of a sample mean. Standard error of the sample mean = sigma / √n.

Lambda values:

Weak: <= 0.1. Moderate: > 0.1 but <= 0.2. Moderately strong: > 0.2 but <= 0.3. Strong: > 0.3.

Normal estimation

What we just did is called normal estimation. Assume that the hypothetical claim is correct. If so, how often would the sample mean occur by chance? If it is unlikely to occur by chance, we can infer that the hypothetical claim is incorrect.

Cramer's V

When the dependent variable has a low amount of variation, lambda isn't very helpful - we turn to Cramer's V (based on X^2). Cramer's V takes a value of 0 to 1 (no relationship to perfect relationship). This is not a PRE measure, so it does not reflect predictive accuracy (we cannot say that it improves prediction by X%). It does help us see relationships that lambda misses.

Inference using the normal distribution

Why do we do this? Why does it matter? We can say things about the population with a certain degree of certainty. Standardizing gives us the ability to perform inference.

Using the standard error

Within one standard error or negative one standard error. What is the probability that a randomly chosen mean will be between -1 and 1 standard errors? About 68.3%.

Problems with X^2

X^2 is very sensitive to sample size. Large sample sizes will often produce statistically significant results even if the relationship doesn't look substantively interesting. When expected frequencies are very small (less than 5), it does not work well at all. Remember: X^2 is only telling us the likelihood that a relationship has occurred by chance (same with p-values) - it does NOT help us interpret the data.

Calculating Z score

Z = (deviation from the mean) / (standard unit)

Z - scores

Z-scores are standardized units - expressed in units of deviation from the mean of the distribution. When a z-score is beyond +/- 1.96, the difference between the population and sample mean would happen by chance less than 5% of the time. We can look up the proportion of the normal curve above the z-score using a z table.

Populations

A population is just the entire universe of cases we are interested in

A raw score:

a value expressed in its original form (e.g. years of age, dollars earned, etc.); also called an unstandardized score

Purposive sampling

expert judgment

A standardized score

is a value expressed in units of standard deviation from the mean. Also called a z-score

𝛍

population mean

𝛔

population standard deviation

Snowball sampling

sample grows from sample itself

Multi-stage

sample multiple levels of analysis

The observed frequency

the actual number of cases falling into each cell

The expected frequency

the hypothetical number of cases that should fall into each cell if Ho is correct

Calculating the chi square test of significance

X^2 = Σ (fo - fe)^2 / fe. Note: the smaller the difference between the observed and the expected values, the lower the X^2.
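The formula can be worked through on a small crosstab. In this sketch the counts and the function name `chi_square` are hypothetical, and each expected frequency is assumed to be the usual row total × column total / n, which is what the null hypothesis of identical column distributions implies.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a crosstab."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            # Expected count in this cell if the null hypothesis is correct.
            fe = row_totals[i] * col_totals[j] / n
            statistic += (fo - fe) ** 2 / fe

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical observed frequencies (rows = DV categories, columns = IV categories).
observed = [
    [30, 10],
    [20, 40],
]
statistic, dof = chi_square(observed)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

With 1 degree of freedom the .05 critical value is 3.841, so a statistic this large would lead us to reject the null hypothesis for these made-up counts.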
