Soci 220 w/ Dr. Linneman (Exam 3) TAMU


- Not being able to read a participant's handwriting - A participant not answering a question - A participant circling two answers on one question - Inapplicable data (they don't answer contingency questions)

What are some reasons that data would be labeled missing?

- Often no major effect on findings - Easy to find if they cause an invalid code - Hard to find if they do not cause an invalid code ex: a 30-year-old's age being entered as '40'

What are the characteristics of random errors in data entry?

- Can have a major effect on findings - Often easy to fix once discovered ex: Interpreting all of the crossed-out numbers as zeros and entering numbers that are 10 times too high.

What are the characteristics of systematic errors in data entry?

N/N, N/O, O/N (N = nominal, O = ordinal)

What are the levels of measurement that Lambda uses?

- Variable names - Variable labels - Value labels - Missing values - Re-code variables - Create variables

What are the six things that define data?

The mean, median, and mode are used to identify the typical, common, average, or middle number (central tendency)

What are the three measures of central tendency?

1. Range (distance from highest to lowest) 2. Variance 3. Standard deviation

What are the three measures of dispersion?

1. Random errors 2. Systematic errors

What are the two types of data entry errors?

- Describe strength of a relationship in a sample - Measure substantive significance

What do descriptive bivariate statistics do?

Summarize/describe the findings on one variable in a sample

What do descriptive statistics do?

- Distinguish different types of missing data - Aid in checking accuracy of data

What do missing value codes do?

The running total of the VALID percentages, up to and including that row ex: A table has numbers 1-10 that stand for 'years of college completed'. What percentage of people completed 4 years of college or less? Look at the cumulative percent across from '4' on the table.

What does 'cumulative percent' stand for in a table?
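A minimal Python sketch of the running total described above, assuming the valid percentages are already listed in table order (the three percentages are hypothetical):

```python
# Cumulative percent: running total of the VALID percentages, row by row.
from itertools import accumulate

def cumulative_percents(valid_percents):
    """Return the running total of valid percentages, in table order."""
    return list(accumulate(valid_percents))

# Hypothetical valid percentages for three answer categories
print(cumulative_percents([20.0, 30.0, 50.0]))  # [20.0, 50.0, 100.0]
```

The last row of a cumulative percent column should reach 100, since it includes every valid case.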

How many times a value shows up in the data for a variable ex: 703 people bubbled in '1' (male) on our gender question, so we have 703 males in our sample

What does 'frequency' stand for in a table?

If the word 'system' is listed under 'missing' in a table, it stands for the answers that were left blank.

What does 'system' stand for in a table?

Shifts focus from what was found on one variable (univariate) to the relationship between two variables.

What does bivariate analysis do?

It's likely that there is a relationship between these variables and it's not just sampling error

What does it mean if we have a high 'F'?

That is the number of stray codes that were found

What does the number mean that's listed at the intersection of 'missing' and 'frequency' in a table?

The number 8. It means that the question was inapplicable. ex: Female participants would get an 8 on a question about prostate exams.

What is a common missing value code that refers to data that is supposed to be missing?

(N + 1) / 2, where N = number of cases

What is a formula that can be used to find the middle case or cases in a set of data?

You keep the original variable but also create a simplified version of it ex: Combining 'strongly support' with 'support' and 'strongly oppose' with 'oppose' so you just have "support, neutral, oppose"

What is a re-code variable?
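A short Python sketch of that recode, assuming hypothetical codes 1-5 for the original answers and 1-3 for the simplified ones (these are not the course's codebook). The original list is left untouched and a new list is built from it:

```python
# Recode: keep the original 5-point variable and build a simplified 3-point copy.
# The codes below are hypothetical examples.
RECODE_MAP = {
    1: 1,  # strongly support -> support
    2: 1,  # support          -> support
    3: 2,  # neutral          -> neutral
    4: 3,  # oppose           -> oppose
    5: 3,  # strongly oppose  -> oppose
}

def recode(values, mapping):
    """Return a new list; codes not in the mapping (e.g. missing codes) pass through."""
    return [mapping.get(v, v) for v in values]

original = [1, 2, 3, 5, 9]  # 9 = hypothetical missing-value code
simplified = recode(original, RECODE_MAP)
print(original)    # [1, 2, 3, 5, 9]  (unchanged)
print(simplified)  # [1, 1, 2, 3, 9]
```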

They act as a key; they tell you what each code means ex: 1=yes, 2=no, etc.

What is a value label?

They describe what the variable name refers to; you can use up to 40 characters.

What is a variable label?

There are two people and one person has a higher level of education and a higher level of income than the other person (positive relationship).

What is an example of a same pair?

There are two people and one person has a higher level of education but they have a lower level of income (negative relationship).

What is an example of an opposite pair?

Analyzing two variables at a time

What is bivariate data analysis?

Finding and fixing errors in data, such as stray codes ex: Noticing a '3' on a variable where 1=male and 2=female, then going back to the interviews to fix the mistake

What is data-cleaning?
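A minimal Python sketch of a stray-code check for the example above, assuming 1 = male and 2 = female are the only valid codes and 9 is a hypothetical missing-value code. It lists which cases to go back and check:

```python
# Stray-code check: flag any value that is neither a valid code nor a declared missing code.
VALID_SEX_CODES = {1, 2}  # 1 = male, 2 = female
MISSING_CODES = {9}       # hypothetical missing-value code

def find_stray_codes(values, valid, missing):
    """Return (case_number, value) pairs for codes that are neither valid nor missing."""
    allowed = valid | missing
    return [(i, v) for i, v in enumerate(values, start=1) if v not in allowed]

sex = [1, 2, 2, 3, 1, 9]
print(find_stray_codes(sex, VALID_SEX_CODES, MISSING_CODES))  # [(4, 3)]
```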

Analyzing three or more variables at a time

What is multivariate data analysis?

explained variation / total variation

R^2 = ________ / _________

Percentage

R^2 is not the strength of a relationship; it's a ______

No, the numbers are the actual values (ex: age in years), so they are straightforward

Do ratio variables need value labels?

It tells you where the median is ex: (5 cases + 1) / 2 = 3 This means that the median is the 3rd case in the data

Does the formula (N + 1)/2 find the median, or does it tell you where the median is in a set of data?

ALL of the cases, including the invalid ones

'Percent' in a table stands for ______

No, the table would be way too big; there would be hundreds of unique numbers

Can you put R/R data in a crosstab?

Missing data

Data analysis software treats blanks as _____

The extent to which one can become a better guesser of where a case would stand on a dependent variable by knowing where the case stands on an independent variable.

Define Proportionate Reduction in Error (PRE)
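The cards don't give a formula for Lambda, so this Python sketch uses the standard Goodman-Kruskal construction as an assumed illustration of PRE: E1 is the number of errors made when always guessing the most common category of the dependent variable, and E2 is the number made when guessing the most common category within each category of the independent variable. The data are hypothetical:

```python
from collections import Counter

def pre(errors_without_iv, errors_with_iv):
    """Proportionate reduction in error: (E1 - E2) / E1."""
    return (errors_without_iv - errors_with_iv) / errors_without_iv

def lambda_pre(iv, dv):
    """Lambda for two nominal variables, predicting dv from iv (assumes E1 > 0)."""
    # E1: errors if we always guess the most common dv category overall
    e1 = len(dv) - max(Counter(dv).values())
    # E2: errors if we guess the most common dv category within each iv category
    e2 = 0
    for category in set(iv):
        dv_in_category = [d for i, d in zip(iv, dv) if i == category]
        e2 += len(dv_in_category) - max(Counter(dv_in_category).values())
    return pre(e1, e2)

# Hypothetical data: group membership (iv) and a yes/no answer (dv)
iv = ["A"] * 5 + ["B"] * 5
dv = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
print(lambda_pre(iv, dv))  # 0.6
```

Here knowing the group cuts the guessing errors from 5 to 2, a 60% reduction. Because E2 can never exceed E1, this value stays between 0 and 1, which fits the card on why Lambda can't be negative.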

R/R relationships

For ___________, statistical significance is derived from r and R^2, so there are no separate inferential statistics

high number / low number

High F = ?

They're changed to blanks, or designated as missing values

How are invalid responses handled?

Multiply each value in the valid column by its frequency, add these products together, then divide by the valid total. (see powerpoint slide 64) ex: (0 x 45) + (1 x 30) + (2 x 35) + (3 x 40) + (4 x 50) + (5 x 25) = 545, and 545 / 225 ≈ 2.42

How do you calculate mean using a frequency table?
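The same weighted-mean calculation as a small Python sketch, using the valid rows from the example above (missing rows are left out, so the divisor is the valid total of 225):

```python
def mean_from_frequency_table(values, freqs):
    """Weighted mean: sum(value * frequency) / valid total (missing rows excluded)."""
    total = sum(freqs)
    return sum(v * f for v, f in zip(values, freqs)) / total

values = [0, 1, 2, 3, 4, 5]
freqs = [45, 30, 35, 40, 50, 25]  # valid rows only, from the example above
print(round(mean_from_frequency_table(values, freqs), 2))  # 2.42
```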

Frequency / Total x 100 ex: 63 students in our sample live on campus, there were 412 respondents total (including those that gave invalid answers). 63/412 x 100 = 15.3% is the percentage of all of the cases that said they live on campus

How do you find the 'percent' in a table?

Frequency / Valid total x 100. The valid total is the total that is listed before all of the missing data on the table.

How do you find the 'valid percent' in a table?
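A Python sketch contrasting the two columns: 'percent' divides by all cases (412 in the on-campus example), while 'valid percent' divides by the valid total only. The example doesn't give its valid total, so the 400 below is a hypothetical figure:

```python
def percent(freq, total_cases):
    """'Percent' column: frequency over ALL cases, including missing ones."""
    return freq / total_cases * 100

def valid_percent(freq, valid_total):
    """'Valid percent' column: frequency over valid (non-missing) cases only."""
    return freq / valid_total * 100

print(round(percent(63, 412), 1))        # 15.3
print(round(valid_percent(63, 400), 2))  # 15.75  (400 valid cases is hypothetical)
```

Because the valid total is never larger than the total, the valid percent is never smaller than the percent for the same row.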

Suppose there are six cases in a set of data: (6 + 1) / 2 = 3.5. The middle cases are the 3rd and 4th, so you would find the 3rd and 4th cases in the data and average them together; that gives you the median.

How do you find the median if you have an even number of cases?
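A minimal Python sketch of the (N + 1) / 2 rule for both odd and even N. The formula gives the position of the median; for an even N it lands halfway between two cases, which are then averaged:

```python
def median(values):
    """Sort the cases, locate position (N + 1) / 2, and average the two middle cases if N is even."""
    cases = sorted(values)
    n = len(cases)
    position = (n + 1) / 2  # 1-based position of the median
    if n % 2 == 1:
        return cases[int(position) - 1]
    lower = int(position)   # e.g. 3.5 -> the 3rd case
    return (cases[lower - 1] + cases[lower]) / 2  # average the 3rd and 4th cases

print(median([3, 1, 2, 5, 4]))     # 3    (position 3)
print(median([6, 1, 2, 3, 4, 5]))  # 3.5  (average of 3rd and 4th cases)
```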

Add 1 to the valid total, then divide by two to find out which case you are looking for. Then add up the numbers in the frequency column, row by row, until you reach that case number. (see powerpoint slide 66) ex: (225 + 1) / 2 = 113, so we're looking for the 113th case. 45 (valid row 1) + 30 (valid row 2) + 35 (valid row 3) = 110, so the 113th case falls in the next row, and that row's value is the median.

How do you find the median in a frequency table?
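A Python sketch of walking down the frequency column. It assumes the values 0-5 from the mean example belong to the same table, in which case the 113th case falls in the row for 3. For an even valid total it averages the two middle cases, which can sit in different rows:

```python
def value_at(position, values, freqs):
    """Return the value of the case at a 1-based position by accumulating frequencies."""
    running = 0
    for value, freq in zip(values, freqs):
        running += freq
        if running >= position:
            return value
    raise ValueError("position is beyond the valid total")

def median_from_frequency_table(values, freqs):
    n = sum(freqs)  # valid total
    if n % 2 == 1:
        return value_at((n + 1) // 2, values, freqs)
    # Even N: average the two middle cases
    return (value_at(n // 2, values, freqs) + value_at(n // 2 + 1, values, freqs)) / 2

print(median_from_frequency_table([0, 1, 2, 3, 4, 5], [45, 30, 35, 40, 50, 25]))  # 3
```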

Find the highest number in the frequency column, then find the corresponding variable/number in the valid column (see powerpoint slide 65)

How do you find the mode in a frequency table?

You can have many modes. Do not average anything for mode.

How do you find the mode with an even number of cases: average the two middle numbers, or have two modes?
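A small Python sketch of reading the mode off a frequency table. Every value tied for the highest frequency is returned, and nothing is averaged. The first table is the one from the mean example; the second is hypothetical:

```python
def modes_from_frequency_table(values, freqs):
    """Return every value tied for the highest frequency; nothing is averaged."""
    top = max(freqs)
    return [v for v, f in zip(values, freqs) if f == top]

print(modes_from_frequency_table([0, 1, 2, 3, 4, 5], [45, 30, 35, 40, 50, 25]))  # [4]
print(modes_from_frequency_table([1, 2, 3, 4], [10, 25, 25, 5]))               # [2, 3]
```

For raw (unsummarized) data, Python's standard `statistics.multimode` returns the same kind of list.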

Multiply the standard error by two, then add and subtract the result from the mean ex: The mean is 21 and the standard error is 2; 2 x 2 = 4, so the confidence interval is +/- 4, and the 95% confidence interval is 17-25.

How would you find the confidence interval for a set of data if the confidence level is 95%?

Multiply the standard error by three, then add and subtract the result from the mean

How would you find the confidence interval for a set of data if the confidence level is 99%?
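A Python sketch of the rule of thumb on these cards: the mean plus or minus 2 standard errors for 95%, and plus or minus 3 for 99%. These are the cards' rounded multipliers; the exact normal-distribution values are about 1.96 and 2.58:

```python
def confidence_interval(mean, standard_error, level=95):
    """Rule of thumb from the cards: mean +/- 2 SE for 95%, mean +/- 3 SE for 99%."""
    multipliers = {95: 2, 99: 3}  # exact normal values are about 1.96 and 2.58
    margin = multipliers[level] * standard_error
    return mean - margin, mean + margin

print(confidence_interval(21, 2))      # (17, 25)
print(confidence_interval(21, 2, 99))  # (15, 27)
```

The 99% interval is wider than the 95% one. More confidence costs precision.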

"We are 95% confident that, in the population, the mean legal abortion support score is between 17 and 25- between moderate opposition and very weak support." (This is on a scale of 8 to 40, 24 is neutral. 17 and 25 are not percentages)

How would you interpret a set of data that has a 95% confidence level and a confidence interval of 17-25?

150, because there are 150 people in the sample to whom the question does not apply.

If we have a sample of 100 men and 150 women in a sample and we ask them a question about prostate exams, how many 8's should there be?

There were more opposite pairs than same pairs ex: -0.32 --> Opposite pairs outnumbered same pairs by 32% of all the same and opposite pairs

If we use the formula for Gamma and get a negative answer, what does that mean?

As education goes up, income goes up, but this is a weak relationship. It also means that same pairs outnumbered opposite pairs by 11% of all the same and opposite pairs.

If we use the formula for Gamma to test the strength of the relationship between education and income and our answer is 0.11, what does that mean?

There is a near-zero probability that the relationship in the sample occurred purely by chance and does not exist in the population. We reject the null hypothesis. Our hypothesis is supported. (Assumes that the trend you predicted appeared in your data)

Interpret the following: "Statistical significance was equal to 0.000"

There is a 3% probability that the relationship in the sample occurred purely by chance and does not exist in the population. We reject the null hypothesis. Our hypothesis is supported. (Assumes that the trend you predicted appeared in your data)

Interpret the following: "Statistical significance was equal to 0.03"

There is a 41% probability that the relationship in the sample occurred purely by chance and does not exist in the population. We fail to reject the null hypothesis. Our hypothesis is not supported.

Interpret the following: "Statistical significance was equal to 0.41"

12% of the variation in income (dependent variable) is explained by sex (independent variable).

Interpret this: Eta^2(sex/income): 0.12

No, it is not

Is statistical significance an effective measure of relationship strength?

No, the mean may not even be close to where most cases are ex: If half the class gets a 50 on the test and half get a 100, the mean is 75, even though everyone is 25 points away from that score.

Is the mean an effective measure of where most people are on a variable?

Yes, with a large enough sample even the weakest relationship will be found in the population

Is using statistical significance a good way to find relationships in a population?

low number / high number

Low F = ?

Measures of spread

Measures of dispersion is also known as _____

Statistical significance determines whether there is enough evidence to reject the null hypothesis in favor of your hypothesis.

Null Hypothesis: In the population, there is no relationship between the independent variable and dependent variable. Your hypothesis says that there is a relationship in the population. What does statistical significance do in this situation?

Square root of the variance

Standard deviation is the ______

Sampling Error

Standard error is a measure of ____

explained variation + unexplained variation

Total variation = ______ + _______
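A Python sketch that fits a least-squares regression line to hypothetical data and splits total variation into its two parts. Total variation is measured from each point to the mean line. Explained variation runs from the regression line to the mean line, and unexplained variation runs from each point to the regression line. R^2 is then explained / total:

```python
def variation_decomposition(x, y):
    """Fit a least-squares line and split total variation in y into explained + unexplained."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    total = sum((b - mean_y) ** 2 for b in y)                      # point to mean line
    explained = sum((p - mean_y) ** 2 for p in predicted)          # regression line to mean line
    unexplained = sum((b - p) ** 2 for b, p in zip(y, predicted))  # point to regression line
    return total, explained, unexplained

# Hypothetical education (x) and income (y) scores
total, explained, unexplained = variation_decomposition([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(total, 2), round(explained, 2), round(unexplained, 2))  # 6.0 3.6 2.4
print(round(explained / total, 2))  # R^2 = 0.6
```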

The number 8 is often used as a missing value code, so these men's answers could be counted as missing. The solution is to use -8 (instead of 8) as the inapplicable code for this question, so that the men who answered '8' are counted.

We are conducting research on a co-ed sample of people and we ask them how many prostate exams they have had in their life. A couple of men answer '8'. What's a potential problem with this?

Years squared (years^2). Rough interpretation: On average, the seven people are 2 years^2 away from the mean of 22.

We average the numbers in the 'squared distance from mean' column and find that the variance is 2. What unit of measurement should we use in our interpretation?

7

We have a sample of 7 college students that are 20, 21, 22, 23 and 24 years old. What does N equal?

That education explains all of the variation in income (R^2 = 1); knowing a person's education would predict their income exactly

We have a scatterplot that shows the relationship between education (x-axis) and income (y-axis). What would it mean if all of the points were on the regression line?

It's the cutoff for a hypothesis being supported. If the p-value is less than .05, there is less than a 5% chance that the relationship in the sample resulted purely from chance and does not exist in the population, so the null hypothesis is rejected. If it is greater than .05, we fail to reject the null hypothesis, and our hypothesis is not supported.

What is significant about the p-value .05?

The average distance of all of the cases from the mean

What is standard deviation?

It measures the chance that a relationship in a sample (regardless of strength) exists in the population, or resulted from sampling error.

What is statistical significance (p-value)?

How strong or important a relationship is in a sample. Measures of association are based on Proportionate Reduction in Error (PRE).

What is substantive significance?

Find out how far away the cases are from the middle on average

What is the best way to measure dispersion (spread) in data?

Data processing: - Happens before data analysis - Preparing data for computer data entry - Formatting electronic data for easy use and interpretation. Data analysis: - Looking at what you found and analyzing it - Actually seeing the results of the research

What is the difference between data processing and data analysis?

8 = refers to data that is supposed to be missing (inapplicable questions); 9 = refers to other missing data

What is the difference between using 8/-8 and 9/-9 when entering missing data?

The distance from the mean line to the regression line; this shows how much of the variation in the dependent variable is explained by the independent variable.

What is the explained portion of total variation?

(same- opposite)/(same + opposite)

What is the formula for Gamma?
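A Python sketch of Gamma computed from raw pairs. It assumes both ordinal variables are coded as numbers where higher means more, and the data are hypothetical. Every pair of people is compared. Pairs tied on either variable are left out, which are the pairs that "slip through the cracks":

```python
from itertools import combinations

def gamma(x, y):
    """Gamma from raw ordinal data: (same - opposite) / (same + opposite)."""
    same = opposite = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            same += 1      # higher on x and higher on y
        elif direction < 0:
            opposite += 1  # higher on x but lower on y
        # direction == 0: tied on x or y, so the pair is not counted
    return (same - opposite) / (same + opposite)  # assumes at least one untied pair

education = [1, 2, 3, 4]
income = [1, 3, 2, 4]
print(round(gamma(education, income), 2))  # 0.67  (5 same pairs, 1 opposite pair)
```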

Variation between groups / Variation within groups

What is the formula for finding 'F'?
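A Python sketch of F for a one-way comparison of group means, with hypothetical scores. It assumes the standard one-way ANOVA form, in which each kind of variation is a sum of squares divided by its degrees of freedom (k - 1 between groups, N - k within groups):

```python
def f_ratio(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    # Between groups: how far each group mean sits from the grand mean
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each case sits from its own group mean
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (between / (k - 1)) / (within / (n - k))

print(f_ratio([[1, 2, 3], [4, 5, 6]]))  # 13.5  (group means far apart: high F)
print(f_ratio([[1, 5, 9], [2, 5, 8]]))  # 0.0   (identical group means: low F)
```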

Ordinal & Ordinal

What is the measurement that Gamma uses?

The distance from the regression line to each data point; this part of a case's distance from the mean line cannot be explained by the relationship between the independent and dependent variables. ex: If someone has little formal education but makes a huge salary, most of their distance from the mean line would fall in the unexplained portion of an income vs. education graph

What is the unexplained portion of total variation?

Analyzing one variable at a time

What is univariate data analysis?

The average squared distance of all the cases from the mean

What is variance?
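A Python sketch of the dispersion columns, using the five ages listed in the N = 7 question with one case each. The source doesn't say which ages repeat among its seven students. Variance here follows the cards' "average squared distance", so it divides by N. Many software packages divide by N - 1 for a sample:

```python
import statistics

ages = [20, 21, 22, 23, 24]             # years, one case per listed age
mean = statistics.mean(ages)            # 22
deviations = [a - mean for a in ages]   # 'distance from the mean' column
squared = [d ** 2 for d in deviations]  # 'squared distance from mean' column
variance = sum(squared) / len(ages)     # average squared distance (divide by N)
sd = variance ** 0.5                    # standard deviation = square root of variance

print(sum(deviations))        # 0
print(variance)               # 2.0  (years squared)
print(round(sd, 2))           # 1.41 (years)
print(max(ages) - min(ages))  # 4    (range in years, a single number)
```

`statistics.pvariance(ages)` gives the same divide-by-N result.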

There are no units of measurement. Corrected statement: "On average, the six people are five years away from the mean of 22."

What is wrong with this interpretation? "On average, the six people are 5 away from the mean of 22"

Usually one small mistake doesn't affect the data very much; more importantly, the mistakes can balance out because you are just as likely to enter an age that is ten years too high as one that is ten years too low.

What keeps random errors from having a major impact on data?

1. The relationship you predicted must appear in the data 2. Statistical significance must be < 0.05

What two conditions have to be met in order for a hypothesis to be supported?
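The two conditions can be written as a single check. This is a minimal Python sketch using the .05 cutoff from these cards. The p-values are the ones from the interpretation cards:

```python
ALPHA = 0.05  # cutoff used in these cards

def hypothesis_supported(predicted_relationship_appeared, p_value, alpha=ALPHA):
    """Both conditions must hold: the predicted relationship appears AND p < alpha."""
    return predicted_relationship_appeared and p_value < alpha

print(hypothesis_supported(True, 0.03))   # True
print(hypothesis_supported(True, 0.41))   # False (fail to reject the null)
print(hypothesis_supported(False, 0.0))   # False (predicted trend did not appear)
```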

If you are guessing someone's income level based off of how much formal education they have, you will be able to reduce your errors. The more two variables are related, the more you can reduce your errors.

What's an example of PRE?

No, it's very important to exclude the missing cases

When finding the mean in a set of data, should we include the missing cases in our average?

If the two people are tied (have the same answer) on one or both variables ex: They have the same level of education and the same level of income

When looking at data for same pairs and opposite pairs, how could some pairs slip through the cracks?

Never; if no one picked a certain answer, that answer won't be listed in the table

When would a zero show up in the frequency column?

Units. ex: You can't just say "they are 7 away from the mean," you would have to say "they are 7 years away from the mean."

When writing an interpretation, you must include the ____

Zero

When you add up all of the numbers in the 'distance from the mean' column, the sum should be _____

People with brown hair would have a higher variance for IQ because their IQ scores would be all over the place. Harvard graduates would have IQs that are concentrated toward the high end of the scale.

Which group would have a higher variance for IQ? A. Graduates from Harvard B. People with brown hair

r

Which has a possible range from -1 to 1, r or R^2?

R^2

Which has a possible range from 0 to 1, r or R^2?

Median

Which measure of central tendency involves finding the number in the very middle of a set of numbers/averaging the two middle numbers together?

Lawyers would have a lower standard deviation for years of education because their data would be concentrated at one end of the scale. The data for people who were born in June would have a huge range.

Which of the two groups would have a lower standard deviation for years of education? A. Lawyers B. People who were born in June

"The range is 14" is correct, because the range is a single number

Which of these is correct: 1. The range is 14 2. The range is from 20 to 34

Gamma

Which one has a possible range of -1 to 1, Gamma or Lambda?

Inferential statistics

Which type of statistics allow you to make conclusions about one variable in a population based on a relationship in a sample?

Nominal variables don't increase or decrease, and Lambda uses nominal variables, so the relationship has no direction: it can't be positive or negative. ex: Race can't increase or decrease along with your views on alcohol, because race is not a number; it's a nominal variable.

Why can't Lambda be negative?

There is a 94% probability that the relationship in the sample occurred purely by chance and does not exist in the population. We fail to reject the null hypothesis.

Your p-value is 0.94. What does that mean?

Inferential

______ statistics can tell you about samples vs. populations

Descriptive

______ statistics never tell you about samples vs. populations

