Soc 46B Final Exam Prep.


After formulating our null and alternative hypotheses, there are several steps to test them with data:

1.) Formulate the null and alternative hypotheses. 2.) Make necessary assumptions for our inferences to be valid. a. Does the sample represent the population? b. Are we analyzing the variable correctly given its level of measurement? c. Is the sampling distribution Normal, allowing us to use the Normal table? 3.) Calculate the test statistic. a. How different is the observed sample statistic from the null value for the population parameter? 4.) Find the p-value for the test statistic. a. Assuming the null hypothesis is true, what is the probability that we find the sample statistic that we do? 5.) Make a conclusion regarding the null and alternative hypotheses.

hypothesis testing is always

2 tail

mediators (correlation)

= a third variable that helps explain the relationship between an independent and dependent variable Example: could workers' type of occupation explain some of the relationship between education and earnings? - occupation is mediator b/c it is part of the process of linking education and earnings (helps to explain the relationship)

example w/ proportions (ONE group INDEPENDENT)

A pollster wants to use sample data on voters to test whether Candidate A will win an upcoming election. The candidate will win the election if their proportion of votes exceeds half (0.50). Based on prior polls, Candidate A is in the lead, so the pollster's alternative hypothesis is that the majority of the population supports the candidate. Given the importance of 0.50 as the threshold for winning, we can select 0.50 as the null value, π₀ = 0.50. In that case, our null hypothesis is that exactly half of the population supports Candidate A (i.e., that the population proportion equals the null value). Notice that we have provided both verbal and symbolic representations of our hypotheses. We write both versions because we want to make extremely clear what we're testing mathematically (symbols) and what that means substantively (words).

step 4 elaborated - find the p-value (probability) for the test statistic

After calculating the test statistic, find the associated probability (or "p-value") in the Normal table or t-table. This probability tells us the chance that we find the sample statistic that we do if the null hypothesis is true. - It does not tell us the probability that the null hypothesis is true. - If the probability of the sample statistic is quite small, then the data provide evidence contrary to the null hypothesis. To find the p-value, we have to know which tail of the sampling distribution to use. - This depends on our alternative hypothesis. - When we use only one tail (the left or the right), we call it a "one-tail test." When we use both the left and the right tails, we call it a "two-tail test." (refer to pic) In the first example, our alternative hypothesis was that the proportion supporting Candidate A is greater than half, Hₐ: π > 0.50. - This is a one-tail test. - Looking at the figures above, that means we should find the probability in the right tail. - After calculating our z-statistic, we would take the probability from the z-table and subtract it from 1.00 to calculate the p-value (the table only gives probabilities to the left, and we want the area to the right). In practice, however, almost all sociological research uses two-tail tests.
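A minimal sketch of the right-tail lookup described in this card, assuming Python's standard library stands in for the printed z-table. The function names `normal_cdf` and `right_tail_p` and the z-value are made up for illustration and are not from the course notes.

```python
import math

def normal_cdf(z):
    """Left-tail probability for a standard Normal z (what the z-table lists)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def right_tail_p(z):
    """One-tail p-value when the alternative hypothesis is 'greater than'."""
    return 1.0 - normal_cdf(z)

# Hypothetical z-statistic, chosen only to show the subtraction from 1.00.
z = 1.50
print(round(normal_cdf(z), 4))    # 0.9332 (area to the left, as in the table)
print(round(right_tail_p(z), 4))  # 0.0668 (area to the right)
```

The subtraction is needed only because the table reports areas to the left of z.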

Decision Errors

As described above, the conventional α = 0.05 threshold means that when the null hypothesis is actually true, we will still reject it about 5% of the time. However, we will also sometimes fail to reject the null hypothesis when we should reject it. These mistakes are called decision errors. We cannot really know whether the null or alternative hypothesis is true, so we cannot know when we make these decision errors. We can only know the approximate probability of making these errors. Remember that when we fail to reject the null hypothesis, it's as if the null hypothesis were true, but we can't be completely sure, so we say "fail to reject" (stick w/ this to understand decision errors here). - When we reject the null hypothesis but should have failed to reject it, that is a Type I error. - When we fail TO reject the null hypothesis but should have rejected it, that is a Type II error (memory aid: "to" = 2; the statement that starts with "reject" is Type I).
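The 5% figure can be checked with a small simulation. This is a sketch under stated assumptions, not part of the course material: the null hypothesis is made true by construction (population mean 0, known standard deviation 1), and the function name, sample size, number of trials, and seed are all hypothetical choices.

```python
import math
import random

def type_one_error_rate(trials=2000, n=50, z_crit=1.96, seed=0):
    """Share of samples that reject a TRUE null (mu = 0, sigma = 1) in a two-tail z-test."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        z = mean / (1.0 / math.sqrt(n))  # standard error uses the known sigma = 1
        if abs(z) > z_crit:              # |z| > 1.96 corresponds to p < 0.05
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(rate)  # should land near 0.05
```

Every rejection counted here is a Type I error, because the null hypothesis is true in every simulated sample.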

Step 0: Identifying the Variables (2 group independent)

Before beginning the hypothesis test, you must first identify your independent and dependent variables. - Remember, the research question guides which variable is which. Research questions are often phrased like, "Is Y different by X?", where Y is the dependent variable and X is the independent variable. Research questions might also be phrased like, "Is there a difference in Y between groups A and B?", where Y is the dependent variable and groups A and B are the categories of the independent variable, X. Another clue for identifying independent and dependent variables is the temporal ordering of the things they represent. If one variable is something that occurs early in life or tends to be stable over time (e.g., gender, race/ethnicity, childhood family background), that variable is likely the independent variable. In the example research question, we think race is generally formed early in life and is stable over time for an individual. Homeownership status is generally a characteristic of adults, determined when they form their own households. - A person's race would come prior to their homeownership status, making homeownership the dependent variable and race the independent variable. Remember that "White" and "Black" are not variables in this case; they are categories of the nominal variable "race." After identifying the dependent variable, we also need to identify its level of measurement. - If the dependent variable is interval-ratio, we will compare the means of that variable between the two groups. - If the dependent variable is ordinal or nominal, we will compare the proportions of a particular category between the groups. In this example, the dependent variable, homeownership status, is a nominal variable with two categories: homeowner and renter. - Our hypotheses will test for a difference in the proportion of White and Black families that are homeowners.

step 3 - ex. w/ means (2 group independent)

Calculate the test statistic - using the formula for the means now

Associations between Variables

Comparison of means between groups: • Does the typical value of an interval-ratio variable differ between categories of a nominal or ordinal variable? Comparison of proportions between groups: • Does the proportion of a category in one nominal or ordinal variable differ between categories of another nominal or ordinal variable? To assess associations between two interval-ratio variables, we can calculate the correlation between them.

show the data (correlation)

Correlations assume a linear (straight line) relationship between the variables. They can't measure other kinds of associations well.

ex. earnings by education (correlation)

Education has a main effect on earnings. Earnings and education are moderately positively correlated (with p < 0.001) = workers w/ more years of education tend to have higher earnings. *we have to assume the sampling distribution is Normal*

Confidence Intervals for Comparisons of Two Groups

Finally, we can also assess differences between groups using confidence intervals. The calculation is similar to confidence intervals with one group. When we calculated a confidence interval for one group, we used the following approach: sample statistic ± margin of error. With two independent groups, we use the following approach: difference in sample statistics ± margin of error for the difference. We can calculate the margin of error for the difference in sample statistics by combining the standard errors of each individual group. - Remember that the standard deviation is the square root of the variance. We can add variances together but not standard deviations. To combine standard deviations or standard errors, we square them to get variances, add the variances together, and take the square root. - The table below presents the formulas for the confidence intervals for differences between groups. (refer to pic)
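The "add the variances, then take the square root" rule can be sketched as follows. The table of formulas from the notes is not reproduced here, so this is only an illustration under assumptions: a 95% interval with multiplier 1.96, the usual standard error for a sample proportion, and made-up proportions and sample sizes. The function name `diff_ci` is hypothetical.

```python
import math

def diff_ci(stat_a, se_a, stat_b, se_b, z_star=1.96):
    """Confidence interval for (group A statistic - group B statistic)."""
    diff = stat_a - stat_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # add variances, then square root
    margin = z_star * se_diff
    return diff - margin, diff + margin

# Hypothetical sample proportions and group sizes (not from the notes).
p_a, n_a = 0.70, 400
p_b, n_b = 0.45, 300
se_a = math.sqrt(p_a * (1 - p_a) / n_a)  # standard error for group A
se_b = math.sqrt(p_b * (1 - p_b) / n_b)  # standard error for group B

low, high = diff_ci(p_a, se_a, p_b, se_b)
print(round(low, 3), round(high, 3))  # 0.178 0.322
```

Adding the two standard errors directly instead would overstate the margin of error, which is why the variances are combined first.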

Step 5: Conclusion (2 group independent)

Finally, we make our conclusion regarding the hypotheses using the p-value. Using the standard α = 0.05 threshold, we reject the null hypothesis if p ≤ 0.05 and we fail to reject the null hypothesis if p > 0.05. NOTE: If p ≤ 0.05 and the sample statistics also follow the hypothesized pattern (i.e. > or < in the way we predicted), we also accept the alternative hypothesis. In this example, p < 0.05 and we reject the null hypothesis. We also accept the alternative hypothesis. - We conclude that homeownership is statistically significantly more common for White household heads than Black household heads. - It is also possible that this conclusion is incorrect, in which case we would be making a Type I error.

step 5 elaborated - make a conclusion

Finally, we need a rule for deciding whether the evidence is strong enough to reject the null hypothesis. The p-value tells us the probability that we would find a sample statistic as different from the null value as the one we observed if the null hypothesis is true. A large p-value means that it's fairly probable that any difference between the observed sample statistic and the null value could be due to random chance. - The null hypothesis could be true. In that case we fail to reject the null hypothesis. A very small p-value means that the difference between the observed sample statistic and the null value is very unlikely to happen by random chance. - In that case, the data provide strong evidence contradicting the null hypothesis, so we would be willing to reject it.

step 4 (dependent groups)

Find and report the p-value. We find the p-value for this test statistic in the exact same way as in previous units. The sample size is greater than 30, so we use the z-table. The test statistic is negative, so we use the number directly from the table to find the probability in the left tail. We double that number to find the p-value for a two-tail test. The z-statistic is less than -3.50, so: - refer to pic (for any z-statistic of -3.50 or less, the z-table gives a left-tail probability of 0.0002 or less)

step 4 - ex. w/ means (2 group independent)

Find the p-value. n > 30, so we use the z-table. The z-statistic is positive, so we subtract the value from the z-table from one, then double it. p = 2 × (1.00 − 0.8289) = 0.3422

Hypothesis Tests Comparing Dependent Groups

For categorical dependent variables (i.e., summarized with proportions), the steps for testing hypotheses comparing two dependent groups are the same as comparing hypotheses with two independent groups - We will not cover hypothesis tests of proportions with dependent groups in this unit because there is nothing different compared to the previous unit. For interval-ratio dependent variables (i.e., summarized with means), the only differences between hypothesis tests with dependent and independent groups are step 1 (formulating hypotheses) and step 3 (calculating the test statistic). Everything else is the same.

formulating hypotheses continued w/ example (2 groups independent)

For the example research question, we predicted that the proportion of homeowners among White families is greater than the proportion of homeowners among Black families. Null hypothesis: There is no difference in homeownership between White and Black families. H₀: π_White − π_Black = 0 Alternative hypothesis: Homeownership is more common for White families than Black families. Hₐ: π_White − π_Black > 0

Step 1: Formulating Hypotheses with Two Groups (Independent)

Formulating the null and alternative hypotheses for tests with one group requires us to find a theoretically relevant null value to test the sample statistic against. - The situation is somewhat more straightforward when comparing two groups: Null hypothesis: There is no difference in the means or proportions of the two groups. Alternative hypothesis: The means or proportions of the two groups are different. (Perhaps one is greater than or less than the other.) For our example research question, the null hypothesis would simply be that the proportions of homeowners among Black and White families are the same. Our alternative hypothesis is guided by past research and theory. - There are substantial racial disparities in income and wealth. Racial/ethnic residential segregation remains high, constraining the housing options for many families of color. There is also evidence of ongoing racial discrimination by realtors and mortgage lenders. As a result, we would predict that homeownership is less attainable for Black families than for White families. = In other words, the proportion of homeowners is greater among White families than among Black families. (alternative hypothesis) When expressing the hypotheses with symbols, we need parameters for two different groups. For the example research question, we want to represent the proportion of families that own their homes. - We represent proportions with π (pi). - However, we need to distinguish between the proportion of homeowners for White and Black families. - We usually represent these proportions with something like π_Black and π_White, where the group name is a subscript. [Select a short but clear subscript to specify the parameters for each group.] - Studies will often use just the first initial, which would be π_B and π_W in this case.
Finally, we use these symbols to express the expected relations between the groups in the null and alternative hypotheses. Perhaps the most intuitive way to express the null hypothesis (the proportion of homeownership is the same among White and Black families) would be π_Black = π_White. However, we'll see in step 3 (calculating the test statistic) that we are actually assessing the difference between the groups (i.e., subtraction). - If there is no difference between the groups (i.e., the proportions are the same), then the proportion for one group minus the proportion for the other group will equal zero: π_White − π_Black = 0. (refer to pic) If our alternative hypothesis simply predicts that the groups are not the same without specifying which group will have a higher proportion, the difference in the proportions between the two groups will not equal zero: π_White − π_Black ≠ 0. If the proportion for the first group is greater than the proportion for the second group, the difference between them will be greater than zero, π_White − π_Black > 0 (b/c you'll end up with a positive number). If the proportion for the first group is smaller than the proportion for the second group, the difference between them will be less than zero, π_White − π_Black < 0 (b/c you'll end up with a negative number). It is very important to be consistent with the order of the two groups. You may put the two groups in whichever order you prefer, so long as you correctly specify whether the hypothesized difference will be greater than or less than zero in the alternative hypothesis. - In this example, our alternative hypothesis could also be Hₐ: π_Black − π_White < 0. {But it is very important that the order you choose is the same throughout the entire test. If you accidentally switch the order between formulating the hypothesis and calculating the test statistic, you will also accidentally test the opposite of your alternative hypothesis!}

When we conduct a formal hypothesis test

If we are making predictions about population means (interval-ratio variables), we use the symbol μ (mu). If we are making predictions about a population proportion (nominal or ordinal variables), we use the symbol π (pi).

Step 3: Calculating the Test Statistic (2 group independent)

In hypothesis tests with a single group, the test statistic was for the difference between the sample statistic and null value. - In hypothesis tests with two groups, we simply test the difference in sample statistics between the two groups (represented by groups A and B in the formulas below). - refer to pic Notice that the sample statistics in the bottom part of each fraction refer to the total sample instead of one group or the other. We use the sample statistics from the total sample to approximate the standard errors because hypothesis tests assume the null hypothesis is true - If the null hypothesis is true, the population standard deviation for group A is the same as for group B, and these standard deviations will be the same as for the overall population. In that case, the standard deviation for the total sample is our best approximation for the population standard deviation. Similarly, if the population proportions for groups A and B are the same, they will both also equal the overall population proportion. In that case, the proportion for the total sample is our best approximation for the overall population proportion.
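The formulas in the picture are not reproduced in these notes, so the sketch below is an assumption: it uses the common pooled form of the two-proportion z-statistic, which matches the description above (difference in sample proportions on top, total-sample proportion in the standard error on the bottom). The function name and the counts are hypothetical.

```python
import math

def two_proportion_z(x_a, n_a, x_b, n_b):
    """z-statistic for (proportion A - proportion B) using the total-sample proportion."""
    p_a = x_a / n_a
    p_b = x_b / n_b
    p_total = (x_a + x_b) / (n_a + n_b)  # proportion for the total sample
    se = math.sqrt(p_total * (1 - p_total) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: 280 of 400 in group A and 135 of 300 in group B.
z = two_proportion_z(280, 400, 135, 300)
print(round(z, 2))

# Swapping the group order flips the sign, which is why the order must stay
# the same from the hypotheses through the test statistic.
print(round(two_proportion_z(135, 300, 280, 400), 2))
```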

step 3 calculations w/ example (2 group independent)

(note: in the calculations, we use sample proportions. In writing out the hypotheses, we use population proportions w/o the hat symbol) here, we are also doing the calculations w/ the proportion formula since we are dealing with proportions and not means (would be diff. for means - formula) Remember to keep the order of the groups consistent in the hypotheses and test statistic formula. (don't attempt to flip the order of any numbers around or you will throw off your alternative hypothesis completely) - when doing the calculations - p. 6 for quick reference - unit notes

correlation coefficient

(r or ρ): a statistic between -1.0 and 1.0 that quantifies the association between two continuous variables. Ex: Midterm scores and average homework scores were positively correlated, r = 0.48 (moderate relationship). The line in the chart is the "line of best fit": the straight line that passes as close as possible to all the data points. The line of best fit helps visualize the association. - When the dots are more bunched around the line = stronger relation. - When the dots are more scattered = weaker relation.
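For reference, a short sketch of how a sample correlation r can be computed by hand, using the standard Pearson formula. The scores below are made up for illustration and are not the class data behind r = 0.48; the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical homework averages and midterm scores for five students.
homework = [60, 70, 80, 90, 100]
midterm = [65, 60, 85, 70, 90]
r = pearson_r(homework, midterm)
print(round(r, 2))  # positive: higher homework scores go with higher midterm scores
```

Points that fall exactly on a rising straight line give r = 1, and points on a falling line give r = -1.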

significance tests w/ 2 groups summarized steps (independent)

0. Identify the independent & dependent variables 1. Formulate the hypotheses 2. Make necessary assumptions 3. Calculate the test statistic (z- or t-statistic) 4. Find the p-value (z- or t-table) 5. Make a conclusion regarding the null and alternative hypotheses *remember that p-value is probability in 2 tails* - refer to pic

example w/ proportions - p. 9 notes (ONE group independent)

Let's return to the example with the pollster predicting that the majority of the population supports Candidate A. The pollster surveys 200 registered voters, and 120 say they support Candidate A. Let's walk through the five steps of a hypothesis test. Step 1: Formulate hypotheses. Null hypothesis: Exactly half the population supports Candidate A. H₀: π = 0.50 Alternative hypothesis: More than half the population supports Candidate A. Hₐ: π > 0.50 Step 2: State necessary assumptions. We assume the survey represents the voting population. We assume support for Candidate A should be measured categorically. (People either do or do not vote for a candidate, so this seems reasonable.) We assume the sampling distribution for the sample proportion is Normally distributed because the sample is sufficiently large (n > 30). Step 3: Calculate the test statistic. If 120 of 200 sample members support Candidate A, the sample proportion = 120/200 = 0.60 (plug the values into the formula) - refer to pic. Step 4: Find the p-value. Because our alternative hypothesis was π > 0.50, we are looking for a right-tail probability with z = 2.82. We subtract the probability in the Normal table from 1.00. However, standard practice is to use a two-tail test, so we multiply that probability by 2: p = 2 × (1 − 0.9976) = 0.0048 Step 5: Make a conclusion. Using the standard α = 0.05, we found that p < 0.05 (p = 0.0048), so we reject the null hypothesis and accept the alternative. We would conclude that the majority of the voting population supports Candidate A.
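Steps 3 and 4 of this example can be reproduced with a few lines of Python. The formula in the picture is not shown in these notes, so this sketch assumes the usual one-sample z-statistic for a proportion, with the standard error computed from the null value π₀; that assumption gives z ≈ 2.83, in line with the 2.82 read from the table. Variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Left-tail probability for a standard Normal z (stands in for the z-table)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n = 200           # voters surveyed
supporters = 120  # voters who support Candidate A
pi_0 = 0.50       # null value

p_hat = supporters / n                    # sample proportion, 0.60
se = math.sqrt(pi_0 * (1 - pi_0) / n)     # standard error assuming the null is true
z = (p_hat - pi_0) / se                   # test statistic
p_two_tail = 2 * (1 - normal_cdf(z))      # right-tail area, doubled

print(round(z, 2))
print(round(p_two_tail, 4))
```

The computed p-value is slightly smaller than the table-based 0.0048 because the table is read at a rounded z, but the conclusion (p < 0.05) is the same.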

ex 2 (independent or dependent)

Migration researchers plan to study how migration experiences affect families' well-being. They first survey immigrant families from Mexico living in the United States and ask them which neighborhoods they lived in before migrating to the US. Next, they survey families living in those neighborhoods in Mexico. They assess the effects of migration by comparing the migrant families to families from their neighborhoods of origin in Mexico. = dependent (connected by neighborhoods)

Data with Independent and Dependent Groups

Most surveys include observations that are independent from one another Imagine a simple random sample of UC Davis undergraduate students. The sampling frame is the roster of all currently enrolled UCD students. Then we randomly select 100 students from the roster. Because we selected students at random, there is nothing linking one student to another in terms of our sampling design. - We would say the observations in our sample are independent because randomly selecting one student doesn't affect the probability of selecting any of the other students. There are many instances in which the observations in our data are related to one another in some way, however. - The CPS is a stratified random sample of the US population conducted by the Census Bureau and Bureau of Labor Statistics. The surveyors first select a random sample of geographic areas, then select a random sample of households within those areas. Finally, the surveyors collect data on every person within the selected households. - Some observations in the sample are related to some other observations because they come from the same households. = We would say the observations in the sample are DEPENDENT because selecting one observation affects the probability of selecting a different one. Often, these relationships between observations are by design. Studies of child development often collect data on children and their parents. Children represent one group in the data and parents represent another group. The two groups are dependent because we can match the observations of one group (children) to the other group (parents). Observations of the same individuals or families at different points in time are another common form of dependent groups. - Surveys that interview the same individuals periodically over time are called longitudinal or panel surveys. - Surveys that interview a sample of people only once are called cross-sectional surveys. 
Observations from panel surveys form dependent groups—observations from each time point are a group (time 1 and time 2), and the observations from the time points are related because they are of the same people.

ex. - earnings by education (correlation)

Null Hypothesis: There is no association between annual earnings and education. Alternative Hypothesis: Annual earnings and education are positively associated. (refer to pic) no association = no diff. b/w the mean or proportion. The Greek letter ρ (rho) is the population parameter (used when writing the hypotheses); r is the sample statistic (used in the calculations).

correlation coefficient contd.

Number (r) between -1 and +1 (the -1 and +1 are at the far right and left ends) - NOT the values right close by 0 Nature of the relationship: positive or negative r Strength of the relationship: absolute value of r = How closely the scores cluster around the line of best fit through the data points weak = the data points are really scattered out

Example with means: (ONE group independent)

Our research question asks whether the average adult in the United States supports or opposes government policies that would reduce income inequality. We will use the General Social Survey (GSS) to address this question. The GSS includes a survey question that asks respondents whether the government should reduce income differences between the rich and the poor. Respondents can answer on a scale from 1 (yes government action) to 7 (no government action). We can make the reasonable assumption that this seven-point scale is an interval-ratio variable. - SO, = The population parameter of interest would be a mean, μ (b/c we are dealing w/ an interval-ratio variable). If low values represent support and high values represent opposition, the null value would be the middle of the scale. The midpoint of the one-to-seven scale is four, which we can select as our null value, μ₀ = 4. Then our null hypothesis is simply that the average value of the scale variable in the population is four. Unlike the previous example, the researcher does not express any strong prior belief about whether the average adult is supportive or opposed. Our alternative hypothesis is simply that the average value of the scale variable in the population is not four (either less than four or greater than four). Null hypothesis: The average adult in the United States neither supports nor opposes government intervention in income inequality. H₀: μ = 4 Alternative hypothesis: The average adult in the United States either supports or opposes government intervention in income inequality. Hₐ: μ ≠ 4 The wording for the null and alternative hypotheses is quite similar with a key exception: the null hypothesis uses "neither/nor" and the alternative hypothesis uses "either/or." Although this can seem like a minor difference in wording, those phrases have completely opposite meanings.
Remember that it's very important to be extremely clear and precise with your language when articulating hypotheses!

Example w. means (ONE group independent)

Our research question asks whether the average adult in the United States supports or opposes government policies that would reduce income inequality. We will use the General Social Survey (GSS) to address this question. The GSS includes a survey question that asks respondents whether the government should reduce income differences between the rich and the poor. Respondents can answer on a scale from 1 (yes government action) to 7 (no government action). The sample statistics are: ȳ = 3.57, s = 1.96, n = 539. Step 1: Formulate hypotheses. Null hypothesis: - The average adult in the United States neither supports nor opposes government intervention in income inequality. H₀: μ = 4 (middle of 1-7) Alternative hypothesis: - The average adult in the United States either supports or opposes government intervention in income inequality. Hₐ: μ ≠ 4 Step 2: State necessary assumptions. We assume the survey represents the US adult population. We assume the seven-point scale measuring support/opposition for government policy regarding income inequality is an interval-ratio variable. We assume the sampling distribution for the sample mean is Normally distributed because the sample is sufficiently large (n > 30). Step 3: Calculate the test statistic - refer to pic. Step 4: Find the p-value. The z-statistic is negative, which will give us the probability in the left tail. No need to subtract the number in the z-table from 1.00. The z-statistic above is -8.6, but the largest negative number on the z-table is only -3.49. What do we do? If you look at the bottom of the z-table, a message reads, "For Z ≤ -3.50, the probability is less than or equal to 0.0002." That means the left-tail probability is at most 0.0002. We still need to double that number for a two-tail test: p ≤ 2 × 0.0002 = 0.0004, so p ≤ 0.0004. Step 5: Make a conclusion. - Using the standard α = 0.05, we found that p < 0.05. - We reject the null hypothesis and accept the alternative.
We would conclude that the average (dealing w/ the mean) adult in the US is (slightly) supportive of government policy to limit income inequality.
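A sketch of steps 3 and 4 for this example, assuming the usual one-sample z-statistic for a mean, (ȳ − μ₀) / (s / √n), since the formula in the picture is not reproduced here. With the statistics as listed (ȳ = 3.57, s = 1.96, n = 539) this formula returns about -5.09 rather than the -8.6 quoted from the picture, so one of the listed values may have been transcribed differently. Either value is below -3.50, so the table rule and the conclusion are unchanged.

```python
import math

def normal_cdf(z):
    """Left-tail probability for a standard Normal z; erfc keeps small tails accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

y_bar = 3.57  # sample mean
s = 1.96      # sample standard deviation
n = 539       # sample size
mu_0 = 4.0    # null value (midpoint of the 1-7 scale)

se = s / math.sqrt(n)            # standard error of the sample mean
z = (y_bar - mu_0) / se          # test statistic (negative: mean is below 4)
p_two_tail = 2 * normal_cdf(z)   # left-tail area, doubled

print(round(z, 2))
print(p_two_tail <= 0.0004)  # consistent with the z-table bound for z <= -3.50
```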

almost all sociological research uses two-tail tests

Recently, the editors of the American Sociological Review (one of sociology's leading peer-reviewed journals) wrote, "one-tailed tests should only be used in rare, exceptional circumstances with proper justification." Similar to confidence intervals, we want to find the probability that the observed sample statistic is far from the null value in either direction (in the left and right tails combined). Even if our alternative hypothesis specifies a direction (e.g., the population parameter is greater than the null value), including probabilities from both tails makes our test more *conservative* (less likely to reject the null hypothesis). The entire process of null hypothesis testing starts from a skeptical disposition, so a more conservative p-value is usually better.

Regression and Causality

Regression helps limit confounding, but still requires substantive knowledge to adequately interpret. Causal inference in observational studies is limited. General conditions for causality: - The cause precedes the effect. - The cause is related to the effect. - There is no plausible alternative explanation.

logic of significance tests

Start with predictions from past research and theory Null hypothesis: skeptical assumption Alternative hypothesis: prediction of interest Assume the null hypothesis is true, and compare the sample data to that prediction. - If the data are somewhat consistent, we can't reject the null. - If the data strongly contradict the null, we can reject it.

ex. w/ means (step 0) - (2 group independent)

Step 0: Identify the variables The research question asks if mean rent has increased over time. The dependent variable is monthly rent and the independent variable is time. (2018, 2019) The two groups we're comparing are the observations from 2018 and from 2019. The test will use formulas for means because monthly rent is an interval-ratio variable.

Step 4: Find the p-value (2 group independent)

Steps 4 and 5 of hypothesis testing are the same for one group or two groups. - If n_total ≥ 30, we use the z-table to find the p-value of our test statistic. - If n_total < 30 for interval-ratio variables, we use the t-table. Standard practice is to always calculate the p-value using two tails. - If the z-statistic from step 3 is negative, we find the left-tail probability in the Normal table and multiply it by two. - If the z-statistic is positive, we find the right-tail probability by subtracting the value from the Normal table from 1.00, then multiply by two. In this example, z = 2.23. The z-statistic is positive, so we will subtract the value from the table from one, then double it to calculate the correct p-value: p = 2 × (1.0000 − 0.9901), which works out to p = 0.0198.
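The two-tail rule in this card (use the left tail for a negative z, subtract from 1.00 for a positive z, then double) can be written as one small function. This is a sketch that assumes Python's standard library in place of the printed Normal table; the function names are hypothetical, and the illustration uses z = 0.95, whose table value is 0.8289.

```python
import math

def normal_cdf(z):
    """Left-tail probability for a standard Normal z (what the Normal table lists)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_tail_p(z):
    """Two-tail p-value: take the tail on the same side as z, then double it."""
    left = normal_cdf(z)
    one_tail = left if z < 0 else 1.0 - left
    return 2.0 * one_tail

print(round(two_tail_p(0.95), 4))   # close to 2 x (1.00 - 0.8289) = 0.3422
print(round(two_tail_p(-0.95), 4))  # same p-value: only the distance from 0 matters
```

Table-based answers can differ in the last digit because the printed table rounds each probability to four decimals.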

Step 2: Necessary Assumptions (2 group Independent)

The assumptions needed for valid inferences are the same with two groups as they are with one group. However, we now need to make these assumptions about both groups. Do the data represent the populations? - For this example, we must assume the survey represents the population of Black household heads and the population of White household heads. Are we using the correct level of measurement? - Homeownership seems pretty clearly categorical, so testing the hypothesis with proportions seems fine. Is the sampling distribution for the difference Normal? - With one group, we assumed the sampling distribution for the sample statistic is Normal. With two groups, we must assume the sampling distribution of the difference in the two sample statistics is Normal. (The Central Limit Theorem tells us this will be true if the sample size is large enough, about n > 30. In this example, we can assume the sampling distribution for the proportion of homeowners is approximately Normal for White and Black household heads because n > 30.)

In general, there are three options for alternative hypotheses with one group

The population parameter can be less than the null value (𝜇 < 𝜇₀), greater than the null value (𝜇 > 𝜇₀), or simply not equal to the null value (𝜇 ≠ 𝜇₀). The same is true if we're using proportions, with 𝜋 and 𝜋₀ in place of 𝜇 and 𝜇₀.

Step 1: Formulate hypotheses

The research question asks if average monthly rents have increased over time, meaning the researchers seem to predict the mean is greater in 2019 than in 2018. Null hypothesis: There is no difference in average monthly rent in Sacramento between 2018 and 2019. 𝐻₀: 𝜇₂₀₁₉ − 𝜇₂₀₁₈ = 0. Alternative hypothesis: Average monthly rent in Sacramento is greater in 2019 than in 2018. 𝐻𝑎: 𝜇₂₀₁₉ − 𝜇₂₀₁₈ > 0.

Example with Means (2 group independent)

The steps for hypothesis testing with two groups are the same for comparisons of means. For this example, our research question asks, have average monthly rental costs for a two-bedroom apartment in Sacramento increased from 2018 to 2019? - To address this question, a researcher studied a random sample of Sacramento apartment listings on Craigslist in each year. The sample statistics are given below. (refer to pic)

step 2 elaborated

There are three key assumptions needed for making valid inferences. 1. We need to assume that the sample adequately represents the population. If it does not, then the data aren't appropriate for our hypotheses. 2. We need to assume that we're analyzing the variable appropriately given its level of measurement. - Consider the seven-point scale variable from the example above. We conducted a test using means because we assumed that the scale is an interval-ratio variable. But if the variable would more accurately be treated as an ordinal variable, it would not be appropriate to analyze it using means. 3. We must assume the sampling distribution is either Normal or t-distributed, which lets us use the Normal table or t-table in the following steps. - The Central Limit Theorem (unit 8) tells us that the sampling distribution will be Normal if the sample size is sufficiently large (about 30 or higher). - The sampling distribution of a sample mean (interval-ratio variables only) will be t-distributed if the sample is small (less than 30) and the population distribution is approximately Normal.

ex 1 (independent or dependent)

UC Davis administrators want to measure trends in the amount of student loan debt taken on by recent graduates. The administrators took a simple random sample of recent graduates from the class of 2005, then took another simple random sample of recent graduates from the class of 2015. = independent

Confidence Intervals

Unlike null hypothesis testing, a confidence interval doesn't begin with the assumption of a null value or no difference. Instead of a binary decision rule (reject the null or fail to reject), it provides a likely range of values for the population difference. It follows the same logic and format as earlier analyses (refer to pic): a point estimate and a margin of error.

Procedure for finding the p-value from a test statistic:

Using the Normal table: - If the z-statistic is negative, the p-value is two times the value from the Normal table. - If the z-statistic is positive, the p-value is two times (1 - the value from the Normal table). (b/c we would be looking to the right end of the normal distribution tail instead of the left, so we subtract from 1) Using the t-table: - Take the absolute value of the t-statistic (if t is negative, make it positive) and find where that t-statistic would fit in the t-table with the correct degrees of freedom. - The range for the p-value is given by the "two tails" row at the top (see unit 10).
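The Normal-table procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `two_tailed_p_from_z` is made up here, and Python's `statistics.NormalDist().cdf` stands in for looking up the left-tail value in a printed Normal table.

```python
from statistics import NormalDist


def two_tailed_p_from_z(z: float) -> float:
    """Two-tailed p-value for a z-statistic (hypothetical helper)."""
    # Left-tail area under the standard Normal curve, i.e. what the
    # Normal table reports for this z-statistic.
    table_value = NormalDist().cdf(z)
    if z < 0:
        # Negative z: double the left-tail value from the table.
        return 2 * table_value
    # Positive z: subtract the table value from 1 to get the right tail,
    # then double it.
    return 2 * (1 - table_value)


# Example z-statistics chosen only for illustration.
print(round(two_tailed_p_from_z(-1.96), 4))
print(round(two_tailed_p_from_z(1.96), 4))
```

Because the Normal curve is symmetric, a z-statistic and its negative give the same two-tailed p-value, which is why the t-table version of the procedure can simply take the absolute value.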

Step 2 (ex. w/ means) - necessary assumptions (2 group independent)

We assume the random samples of Craigslist posts are representative of monthly rents in Sacramento in 2018 and 2019. - This may be a strong assumption because different landlords may use different advertising strategies. We need to assume that landlords that use Craigslist charge similar amounts to landlords that don't use Craigslist, on average. We assume monthly rent is an interval-ratio variable. - That's pretty reasonable. We assume the sampling distribution for the difference in mean rent between the two years is approximately Normal. - The Central Limit Theorem tells us this should be true because the total sample size is greater than 30.

show the data! - correlation

We can also look for groups or "clumps" in the data. Associations between predictors and outcomes might be confounded by group differences. "Confounding" means that a third variable makes it look like there is a relationship between two variables when there truly isn't. In the pic: .74 = strong positive correlation; .03 = a very weak relationship. The second graph down illustrates multiple dependent variables; there can be multiple independent variables as well.

step 3 elaborated - calculate the test statistic

We next calculate the test statistic, which is the z-statistic or t-statistic that compares our sample statistic to the assumed population parameter (the null value). Because we are trying to find the probability of a sample statistic, not a value for an individual, we use the formulas for sampling distributions. When calculating the test statistic for means, we can use the sample standard deviation to approximate the population standard deviation, 𝑠 ≈ 𝜎. For proportions, we use 𝜋₀ in the standard error (bottom of the fraction) because we are still assuming that 𝜋₀ is the correct population parameter.
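A minimal sketch of the one-group test statistics described above, assuming the usual sampling-distribution forms (sample statistic minus null value, divided by the standard error). The function names and the numbers in the example calls are hypothetical, not taken from the course notes.

```python
from math import sqrt


def z_for_mean(sample_mean: float, null_value: float,
               sample_sd: float, n: int) -> float:
    """Test statistic for one mean, using s as an approximation of sigma."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - null_value) / standard_error


def z_for_proportion(sample_prop: float, null_value: float, n: int) -> float:
    """Test statistic for one proportion.

    The standard error uses the null value (pi_0), not the sample
    proportion, because we are assuming the null hypothesis is true.
    """
    standard_error = sqrt(null_value * (1 - null_value) / n)
    return (sample_prop - null_value) / standard_error


# Made-up inputs: 55% support in a sample of 400, null value 0.50.
print(z_for_proportion(0.55, 0.50, 400))
# Made-up inputs: sample mean 105, null value 100, s = 20, n = 64.
print(z_for_mean(105, 100, 20, 64))
```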

five steps of formal hypothesis testing (2 group independent)

We will consider the following example

Confidence intervals Example

What is the 99% confidence interval for the change in net worth from 2008 to 2012 among people in the US who lost at least 25% of their income? To find the z-score, subtract .99 from 1 and then divide by 2: (1 − .99) / 2 = .005. Then find the z-score in the Normal table whose probability is as close as possible to .005, which is z = -2.58 (so the interval uses ±2.58).
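The z-score lookup for a confidence level can be checked with a short Python sketch. The helper names below are hypothetical, `NormalDist().inv_cdf` stands in for reading the Normal table backwards, and the interval is assumed to take the usual form of point estimate ± z × standard error.

```python
from statistics import NormalDist


def critical_z(confidence_level: float) -> float:
    """Positive z-score that leaves (1 - level) / 2 in each tail."""
    tail_area = (1 - confidence_level) / 2
    # inv_cdf returns the negative z-score for the left tail;
    # abs() gives the value used on both sides of the interval.
    return abs(NormalDist().inv_cdf(tail_area))


def confidence_interval(point_estimate: float, standard_error: float,
                        confidence_level: float) -> tuple[float, float]:
    """Point estimate plus or minus the margin of error."""
    margin_of_error = critical_z(confidence_level) * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


print(round(critical_z(0.99), 2))
# Made-up point estimate and standard error, for illustration only.
print(confidence_interval(10.0, 2.0, 0.95))
```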

Example (dependent groups)

When people in the US experience income losses, do they spend down their savings or take on more debt to help make ends meet? We can predict that among people who experience income drops over time, average net worth (total savings and assets minus total debts) also decreases. We can evaluate this expectation with data on middle-aged adults (40s and 50s) from the National Longitudinal Survey of Youth (NLSY), a nationally representative panel survey of the Baby Boomer generation. Total income and net worth were measured for the SAME (dependent) people in 2008 and 2012 - We restrict the sample to observations from people whose income declined 25% or more between years - The table below presents sample statistics for this group of people experiencing income drops.

the null value

a theoretically skeptical value for the population parameter, derived from past research or theory. We formally represent the null value with 𝜇₀ for means and 𝜋₀ for proportions.

significance tests (steps)

Also note for step 5, when applying the decision rule: when we reject the null hypothesis (NOT when we fail to reject), we accept the alternative. Ex. problem: p. 3 of lecture 11 notes (1 group independent tests).

correlation

an association between two variables (x and y). Positive correlation: y is higher when x is higher. Negative correlation: y is lower when x is higher. The sign (+ or -) tells us whether the relationship (the line) is going up or down.
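As a rough illustration of positive and negative correlation, the sketch below computes Pearson's correlation coefficient by hand. Assuming Pearson's r is the coefficient the course has in mind is a guess on my part, and the data in the example calls are made up.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How x and y move together around their means.
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own.
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical data: y rises with x (positive), then y falls with x (negative).
print(pearson_r([10, 12, 14, 16], [30, 35, 41, 50]))
print(pearson_r([10, 12, 14, 16], [50, 41, 35, 30]))
```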

sociological hypotheses often focus on a comparison of means and proportions

between two groups (2 group independent)

Step 3 (dependent groups)

calculate the test statistic The primary difference between tests of two independent groups and two dependent groups is in the calculation of the test statistic. Instead of using sample statistics for each group, we only need sample statistics for the differences between the groups. - This makes the test statistic formula for two dependent groups similar to the formula for one group. - The table below summarizes the test statistic formulas for one group, two independent groups, and two dependent groups with interval-ratio variables.
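The summary table itself is not reproduced in these notes. As a stand-in, the formulas below are the standard textbook forms for interval-ratio variables; they are an assumption about what the table shows, and the course's own table may write them differently (for example with t instead of z for small samples).

```latex
% One group, with null value \mu_0
\[ z = \frac{\bar{y} - \mu_0}{s / \sqrt{n}} \]

% Two independent groups, with a null difference of 0
\[ z = \frac{(\bar{y}_2 - \bar{y}_1) - 0}
            {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]

% Two dependent groups, using only the differences y_d
\[ z = \frac{\bar{y}_d - 0}{s_d / \sqrt{n}} \]
```

The dependent-groups formula has the same shape as the one-group formula, with the mean and standard deviation of the differences in place of the single-group statistics.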

to verify that we accept the alternative hypothesis

Check the value of the sample statistic (the value calculated in step 3, e.g., the result of subtracting the two y-bars from each other) and whether its direction matches the alternative hypothesis. - If the alternative hypothesis says the difference is greater than 0 and you get a sample statistic lower than 0 (like -.53), you cannot accept the alternative hypothesis. - In the pic, -57,063.72 is less than 0, and in the last steps we decided to accept the alternative hypothesis. This is in accordance with the alternative hypothesis set in the beginning, where the population mean difference is less than 0 (p. 4-5 of unit 13 notes).

step 5 - ex. with mean (2 group independent)

conclusion Using the standard alpha = 0.05 threshold, p > 0.05 and we fail to reject the null hypothesis. - We cannot conclude that average monthly rents in Sacramento increased from 2018 to 2019. If this conclusion is incorrect, we would be making a Type II error.

step 5 (dependent groups)

conclusion We apply the same alpha-level, alpha=0.05, as a decision rule. We found p=0.0006, so p<0.05 and we reject the null hypothesis. We also accept the alternative hypothesis, which predicts that net worth declined on average between 2008 and 2012 among people who lost 25% or more of their income. We could be making a Type I error (rejecting the null hypothesis when we should not).

hypothesis

def - a prediction for the patterns we will observe in the data These predictions come from our research questions, past evidence, and guiding theory. We learned that any prediction includes two competing hypotheses: Null hypothesis: the skeptical assumption that our data will not reveal notable differences or associations Alternative hypothesis: the prediction of interest derived from theory and past research We can reject the null hypothesis and accept the alternative hypothesis in the face of strong data. - If the data are not sufficiently strong, we fail to reject the null hypothesis. Remember, we do not "accept" the null hypothesis, just like courts do not find defendants "innocent." We can only "fail to reject" the null, just like courts find defendants "not guilty."

when calculating the degrees of freedom for t-distributions

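Use n - 2 (with n = the total sample size, n1 + n2) when working with two independent groups, because two different means are being compared to one another. Use n - 1 (with n = the number of matched pairs) when working with two dependent groups, because the test uses only one mean, the mean of the differences.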

reading a regression table

Each column is a separate regression model, including only the variables with coefficients in that column. Each row gives the coefficient for an independent variable. A positive coefficient = higher average value of the dependent variable; a negative coefficient = lower average value of the dependent variable. Model 1: Positive, statistically significant association between years of education and annual earnings. Model 2: Positive, statistically significant association between years of education and annual earnings, controlling for sex and age. Model 3: The positive, statistically significant association between years of education and annual earnings is only partly mediated by occupation, work hours, and job tenure.

when finding a positive z-score and then finding the associated probability (dependent groups)

Flip the inequality sign after subtracting from 1 and multiplying by 2 (only do this when you have to subtract from one). Ex. z = 23.41: the Normal table only tells us the table value is greater than .9998, so p is less than 2(1 − .9998) = .0004. Final probability answer: p < .0004.

step 1 (based on ex on net worth) - dependent groups

formulating hypotheses As in previous units, the null hypothesis is that there is no difference between groups. In this example, the null hypothesis predicts that average net worth is not different between 2008 and 2012 for the same individuals. - The substantive interpretation of this hypothesis is that people's net worth does not decline when they experience income drops. - Perhaps they avoid spending their savings or taking on debt, and they reduce their expenses instead. If we had two independent groups, we would compare the average in 2008 and the average in 2012. However, we have observations for the same individuals in each year. - We can calculate the change in net worth for each individual, then estimate the average change for all individuals. - Let's represent the interval-ratio variable for net worth with 𝑦. - Net worth in 2008 is the variable 𝑦₂₀₀₈ and net worth in 2012 is 𝑦₂₀₁₂. - The difference in net worth for EACH individual is 𝑦𝑑 = 𝑦₂₀₁₂ − 𝑦₂₀₀₈ (the subscript d stands for difference). - The average change in net worth is 𝑦̅𝑑. Combining information from the two time points, we find ourselves back in the scenario with a single group of individuals and one variable measuring the change in net worth between years. Null hypothesis: There was no average change in net worth between 2008 and 2012 among people who lost 25% or more of their income. 𝐻₀: 𝜇𝑑 = 0. Alternative hypothesis: On average, net worth declined between 2008 and 2012 among people who lost 25% or more of their income. 𝐻𝑎: 𝜇𝑑 < 0 (less than, because a decline in net worth makes the change a negative value).
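The idea of turning two measurements per person into one variable of differences can be sketched as follows. The function name and the net-worth figures are hypothetical (they are not NLSY data), and the test statistic is assumed to take the one-group form, mean difference divided by its standard error.

```python
from math import sqrt
from statistics import mean, stdev


def paired_test_statistic(before: list[float],
                          after: list[float]) -> tuple[float, float]:
    """Return (mean difference, test statistic) for matched observations."""
    # One difference per individual: later value minus earlier value.
    differences = [later - earlier for earlier, later in zip(before, after)]
    n = len(differences)
    mean_difference = mean(differences)
    # Standard error of the mean difference, using the sample standard
    # deviation of the differences.
    standard_error = stdev(differences) / sqrt(n)
    # The null value for the mean difference is 0.
    return mean_difference, (mean_difference - 0) / standard_error


# Made-up net worth (in $1,000s) for the same four people in two years.
net_worth_2008 = [50.0, 40.0, 30.0, 60.0]
net_worth_2012 = [45.0, 30.0, 32.0, 50.0]
print(paired_test_statistic(net_worth_2008, net_worth_2012))
```

With a sample this small the t-table would apply; the tiny list is only there to keep the example readable.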

correlation matrix

A grid of variables reporting the correlation coefficient between each pair. - The correlation between a variable and itself is ALWAYS 1. - In the example here (in the pic), we are dealing with categorical variables. - Look at where the columns and rows intersect to find the correlation value.

bivariate tests

involve two variables: the dependent variable (the variable we're calculating the mean or proportion for) and the independent variable (the variable defining the groups we're comparing).

regression

is a statistical technique that calculates the correlations between a dependent variable and multiple independent variables simultaneously. A regression model estimates these correlations while also adjusting for the relationships between independent variables. - This helps avoid confounding. - Comparing different regression models helps assess potential mediators.

step 2 - based on ex on net worth (dependent groups)

necessary assumptions The assumptions needed for valid inferences about the population are very similar to those in previous units. Sample: We assume the sample represents middle-aged adults in the US who lost 25% or more of their income between 2008 and 2012. Measurement: We assume net worth and the change in net worth from 2008 to 2012 should be measured with an interval-ratio variable. Sampling Distribution: We assume the sampling distribution for the change in net worth is Normal because the sample size is greater than 30.

visualizing correlations

Notice that with almost no correlation, the data points are very scattered, as opposed to the strong positive correlation, where they cluster along a line. Positive correlation = y is higher when x is higher. Negative correlation = y is lower when x is higher.

p-value is

probability in TWO tails (multiply your value by 2 in step 4 of hypothesis testing)

correlation DOES NOT IMPLY CAUSALITY

spurious association: an apparent association between two variables driven by some unseen, third variable confounder: a variable affecting two other variables of interest, biasing the observed association between them

The threshold for deciding to reject the null hypothesis is called

the 𝛼 level ("alpha level"), written with the Greek symbol 𝛼. The alpha level is the same as the error probability: the amount of error we are willing to accept when we decide to reject the null hypothesis. We reject the null hypothesis when 𝑝 < 𝛼 and we fail to reject if 𝑝 ≥ 𝛼. The most conventional threshold is 𝛼 = 0.05. If we choose this threshold, we reject the null hypothesis when we find a p-value less than 0.05. We fail to reject if the p-value is 0.05 or greater. It also means that we could make the wrong conclusion 5% of the time over the long run when we reject the null hypothesis (the error probability). Just note: If we use a lower 𝛼 level, like 𝛼 = 0.01, we will reject the null hypothesis less often. However, our error rate will be lower when we make this conclusion, only 1% in this case. If we use a higher 𝛼 level, like 𝛼 = 0.10, we will reject the null hypothesis more often. However, our error rate in the long run will be 10%. Quantitative sociologists tend to avoid the high error rates that come with high 𝛼 levels. For example, the editors of the American Sociological Review discourage thresholds like 𝛼 = 0.10, which they call "suggestive" evidence.
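The decision rule is simple enough to write down directly. In this sketch the function name `decide` and the returned wording are made up for illustration; only the rule itself (reject when p < α, fail to reject when p ≥ α) comes from the notes.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the alpha-level decision rule to a p-value."""
    if p_value < alpha:
        return "reject the null hypothesis"
    # A p-value exactly equal to alpha falls on the fail-to-reject side.
    return "fail to reject the null hypothesis"


print(decide(0.0198))              # conventional alpha = 0.05
print(decide(0.0198, alpha=0.01))  # stricter threshold
```

The same p-value can lead to different conclusions under different thresholds, which is why the alpha level should be chosen before looking at the result.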

independent samples

the observations from one group do not depend on the observations from the other group Ex: • Separate samples from two countries, or two points in time. • Random samples of two groups, like men and women.

dependent samples

the observations of one group can be matched to observations from the other group. ex. • Same subjects at two points in time (panel or longitudinal study) • Samples of parents and their children.

When we're testing hypotheses about a single group

we make predictions that the population parameter is equal to (null hypothesis) or different from (alternative hypothesis) some particular value, called the "null value."

testing hypothesis w/ data

We start by assuming the null hypothesis is true. Using the skills covered in units 7 through 10, we then compare the observed sample statistics to the predictions of the null hypothesis. If the sample statistics we observe are very unlikely if the null hypothesis is true, then perhaps the null hypothesis is not true. In this case, the data provide strong evidence that contradicts the null, so we would reject the null hypothesis. This logic is presented in the figure to the right. - Assuming the null hypothesis is true, the sampling distribution should be centered around the null value, 𝜇₀. - The sampling distribution is Normal, so we can use the z-score formula and z-table to find the probability of a sample mean like the 𝑥̅ we observe or one even more different from the null value. - If that probability is very small, then we should doubt that 𝜇₀ is the true population parameter.

"empirical" means

we test with data

Example: Earnings by Education (correlation)

• Data: National Health Interview Survey (random sample of 200 adult workers in the US) • Dependent variable: Annual Earnings (measured in total $ earned from job in previous year) • Independent variable: Education (measured with years of completed education)

