Field SPSS 4th edition - chapter 1, Stats Test #1
What info do we want to know about our scores?
- how many total scores do we have? - what are the scores? - how many scores are in each category? - what % of scores are in each category?
Why does it matter that the standard deviation is more affected by extreme scores when the sample size is small?
- It means that samples with small n's are likely to be less accurate estimates of the population standard deviation than samples with large n's. We saw this in the earlier chart showing that the unbiased (corrected) estimate of the standard deviation gets closer to the biased (uncorrected) estimate as the sample size increases.
What z-score corresponds to a score that is below the mean by 1.5 standard deviations?
-1.5 (the z-score tells you how many standard deviations away from the mean your value is)
What are (broadly speaking) the five stages of the research process?
1. Generating a research question: through an initial observation (hopefully backed up by some data). 2. Generate a theory to explain your initial observation. 3. Generate hypotheses: break your theory down into a set of testable predictions. 4. Collect data to test the theory: decide on what variables you need to measure to test your predictions and how best to measure or manipulate those variables. 5. Analyse the data: look at the data visually and by fitting a statistical model to see if it supports your predictions (and therefore your theory). At this point you should return to your theory and revise it if necessary.
Based on what you have read in this section, what qualities do you think a scientific theory should have?
A good theory should: 1. Explain the existing data. 2. Explain a range of related observations. 3. Allow statements to be made about the state of the world. 4. Allow predictions about the future. 5. Have implications.
In 2011 I got married and we went to Disney Florida for our honeymoon. We bought some bride and groom Mickey Mouse hats and wore them around the parks. The staff at Disney are really nice and upon seeing our hats would say 'congratulations' to us. We counted how many times people said congratulations over 7 days of the honeymoon: 5, 13, 7, 14, 11, 9, 17. Calculate the mean, median, sum of squares, variance and standard deviation of these data.
First compute the MEAN: add the scores, 5 + 13 + 7 + 14 + 11 + 9 + 17 = 76, then divide by n = 7, so 76/7 = 10.86. To calculate the MEDIAN, first arrange the scores in ascending order: 5, 7, 9, 11, 13, 14, 17. The median is the (n + 1)/2th score. There are 7 scores, so this is the 8/2 = 4th score. The 4th score in our ordered list is 11. To calculate the SUM OF SQUARES, subtract the mean from each score, square each difference, and add up the squared values: SS = 104.86. The VARIANCE is the sum of squared errors divided by the degrees of freedom (N - 1): 104.86/6 = 17.48. The STANDARD DEVIATION is the square root of the variance: √17.48 = 4.18.
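A minimal Python sketch (standard library only) that reproduces the hand calculation above, so each step can be checked; it assumes an odd number of scores when picking the median, which holds for these 7 days.

```python
import math

# Daily "congratulations" counts over the 7 days of the honeymoon
scores = [5, 13, 7, 14, 11, 9, 17]
n = len(scores)

mean = sum(scores) / n                       # 76 / 7

ordered = sorted(scores)                     # [5, 7, 9, 11, 13, 14, 17]
median = ordered[(n + 1) // 2 - 1]           # the (n + 1)/2 = 4th score (n is odd here)

ss = sum((x - mean) ** 2 for x in scores)    # sum of squared errors
variance = ss / (n - 1)                      # divide by the degrees of freedom
sd = math.sqrt(variance)                     # back to the original units

print(f"mean={mean:.2f} median={median} SS={ss:.2f} "
      f"variance={variance:.2f} SD={sd:.2f}")
# prints: mean=10.86 median=11 SS=104.86 variance=17.48 SD=4.18
```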
On an exam with μ = 52, you have a score of X = 56. Which value for the standard deviation would give you the highest position in the class distribution? a. 2 b. 4 c. 8 d. Cannot determine from the info given
a. 2, because z = (56 - 52)/2 = 2, the largest z-score of the options (σ = 4 gives z = 1 and σ = 8 gives z = 0.5).
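A small Python sketch comparing the answer options: the smaller σ is, the more standard deviations the same 4-point lead over the mean represents. The helper name `z_score` is just an illustrative choice.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean mu."""
    return (x - mu) / sigma

# Exam example: mu = 52 and X = 56; try each answer option for sigma
options = (2, 4, 8)
for sigma in options:
    print(f"sigma = {sigma}: z = {z_score(56, 52, sigma)}")

best = max(options, key=lambda sigma: z_score(56, 52, sigma))
print("highest position with sigma =", best)   # 2
```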
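Interval
Quantitative; ORDER matters, and equal intervals on the variable represent equal differences in the property being measured, so subtraction makes sense. * The zero point is arbitrary: 0 does not mean the property is absent. Ex: 0° outside (temperature) still means there is some temperature.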
What does (ΣX)² stand for? Solve it for X = 2, 3, 4, 5.
Sum all the X values together, then square that total: (ΣX)² = 14² = 196
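The order of operations is what separates (ΣX)² from ΣX². A short Python sketch with the same X values makes the difference concrete:

```python
X = [2, 3, 4, 5]

sum_x = sum(X)                             # ΣX: add the scores            -> 14
sum_x_squared = sum_x ** 2                 # (ΣX)²: add first, then square -> 196
sum_of_squares = sum(x ** 2 for x in X)    # ΣX²: square first, then add   -> 54

print(sum_x, sum_x_squared, sum_of_squares)   # 14 196 54
```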
What is a test statistic and what does it tell us?
A test statistic is a statistic for which we know how frequently different values occur. The observed value of such a statistic is typically used to test hypotheses, or to establish whether a model is a reasonable representation of what's happening in the population.
What does biased mean?
A sample statistic is considered BIASED if its average value (across many samples) either underestimates or overestimates the corresponding population parameter. The sample variance computed by dividing SS by n tends to underestimate the population variance. If we want our sample variability to be our stand-in for the population variability (as in inferential statistics), it's important that we correct for this bias. Dividing the sum of squares by n - 1, rather than n, is the way we make this correction.
What is unbiased?
A sample statistic is considered UNBIASED if its average value (across many samples) is equal to the corresponding population parameter.
The difference between the highest and lowest score in the distribution is called the: a. Range b. Semi-interquartile range c. Standard deviation d. Variance
a. Range
A researcher conducts an experiment to see if moderate doses of a new drug have any effect on memory for college students. For this study, what is the independent variable? A. The amount of the new drug given to each participant B. The memory score for each participant C. The group of college students D. The method of administering the drug
A. The amount of the new drug given to each participant (this is the variable the researcher manipulates; the memory score is the dependent variable).
The sum of the squared deviation scores is SS = 20 for a sample of n = 5 scores. What is our best guess for the variance of the population from which the sample was drawn? a. 5 b. 4 c. 80 d. 100
a. 5 (when a sample is used to estimate the population variance, s² = SS/(n - 1) = 20/4 = 5)
What is ΣX? X= 2,3,4,5
ΣX means "sum all the values of X": ΣX = 2 + 3 + 4 + 5 = 14
What do the sum of squares, variance and standard deviation represent? How do they differ?
All of these measures tell us something about how well the mean fits the observed sample data. Large values (relative to the scale of measurement) suggest the mean is a poor fit of the observed scores, and small values suggest a good fit. They are also, therefore, measures of dispersion, with large values indicating a spread-out distribution of scores and small values showing a more tightly packed distribution. These measures all represent the same thing, but differ in how they express it. The sum of squared errors is a 'total' and is, therefore, affected by the number of data points. The variance is the 'average' variability but in units squared. The standard deviation is the average variation but converted back to the original units of measurement. As such, the size of the standard deviation can be compared to the mean (because they are in the same units of measurement).
When is it considered unbiased?
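Although the adjustment of subtracting 1 from the sample size might seem small, it is enough to make the sample variance an unbiased estimator of the population variance. (The sample standard deviation, as the square root of that variance, is still very slightly biased, but the n - 1 version is the standard estimate of the population standard deviation.)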
How do you determine the median if you have an even number of scores?
An even number of scores leaves you with two 'middle' values. Take the average of the two middle values: (middle value 1 + middle value 2)/2 = the median. Ex: for 2, 3, 4, 5 the median is (3 + 4)/2 = 3.5.
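A minimal Python sketch of the rule for both odd and even n (the standard library's `statistics.median` follows the same rule; this version just spells the steps out):

```python
def median(scores):
    """Middle score; for an even number of scores, average the two middle values."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two middle scores

print(median([5, 13, 7, 14, 11, 9, 17]))   # 11 (odd n)
print(median([2, 3, 4, 5]))                # 3.5 (even n)
```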
Impact of sample size on the bias correction?
As sample size increases, the difference between the biased and the unbiased estimate of the standard deviation decreases. Biased estimate of SD: square root of SS/n. Unbiased (corrected) estimate of SD: square root of SS/(n - 1).
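Because both versions share the same SS, the n - 1 estimate is always the n estimate multiplied by √(n/(n - 1)). A short Python sketch (standard library only; the sample sizes in the loop are arbitrary choices for illustration) shows that factor shrinking toward 1 as n grows, and checks it against `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by n):

```python
import math
import statistics

def correction_factor(n):
    """Ratio of the n - 1 (unbiased) SD to the n (biased) SD for a sample of size n."""
    return math.sqrt(n / (n - 1))

# The ratio shrinks toward 1 as n grows: the two estimates converge.
for n in (2, 5, 10, 30, 100, 1000):
    print(f"n = {n:>4}: unbiased SD / biased SD = {correction_factor(n):.3f}")

# Check against the standard library on the honeymoon data (n = 7)
scores = [5, 13, 7, 14, 11, 9, 17]
ratio = statistics.stdev(scores) / statistics.pstdev(scores)
print(math.isclose(ratio, correction_factor(7)))   # True
```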
Which of the following symbols identifies the sample variance? a. s b. s² c. σ d. σ²
b. s²
What is the first step to be performed when computing Σ(X + 2)²? a. Square each value b. Sum the squared values c. Add 2 points to each score d. Sum the (X + 2) values
c. Add 2 points to each score (then square each (X + 2) value, and finally sum the squared values)
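The three steps can be traced in Python; X = 2, 3, 4, 5 is borrowed from the ΣX card as an assumed example, since this question gives no data.

```python
X = [2, 3, 4, 5]                       # assumed example scores

plus_two = [x + 2 for x in X]          # step 1: add 2 to each score   -> [4, 5, 6, 7]
squared = [v ** 2 for v in plus_two]   # step 2: square each (X + 2)   -> [16, 25, 36, 49]
total = sum(squared)                   # step 3: sum the squared values -> 126

print(total)   # 126
```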
Which measure of central tendency would be most appropriate for describing the religious preferences of delegates to the United Nations? a. Mean b. Median c. Mode d. Mode or Mean
c. Mode, because religious preference is a nominal (categorical) variable: you can't do any math operations with religious categories.
Which is the only measure of central tendency that must correspond to an actual score in the distribution? a. Mean b. Median c. Mode d. Mean Squared
c. Mode (the mean and median can fall between actual scores)
Which is the only measure of central tendency that can be used with all four measurement scales? a. Mean b. Median c. Mode d. Mean Squared
c. Mode, because it's the only one that makes sense for nominal data
The GRE (Graduate Record Examination) is standardized so that it has a mean of 500 and a standard deviation of 100 each year. Assume that this year's scores had a mean of 450 and a standard deviation of 110. What value will a score of 490 on this year's exam have once it's put on the desired distribution, with a mean of 500 and a standard deviation of 100?
1) Compute the z-score for 490 in this year's distribution: z = (490 - 450)/110 = .36. 2) Find the score corresponding to a z-score of .36 in the desired distribution: z = (X - μ)/σ, so .36 = (X - 500)/100, 36 = X - 500, 536 = X. Do a 'logic check' of your answer. The original score was above the mean, so the transformed score should also be above the mean. The original score was less than one standard deviation above the mean, so the new score should be less than one standard deviation above the mean (536 < 600).
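The two steps can be wrapped in one small Python function; the name `transform` is a hypothetical helper. Carrying the unrounded z-score gives 536.36 rather than 536, and the same function reproduces the μ = 37, σ = 6 card below (39 maps to 53.33).

```python
def transform(x, old_mean, old_sd, new_mean, new_sd):
    """Map a score onto a new distribution by keeping its z-score the same."""
    z = (x - old_mean) / old_sd      # step 1: z-score in the original distribution
    return new_mean + z * new_sd     # step 2: score with that z in the new distribution

gre = transform(490, 450, 110, 500, 100)
print(round(gre, 2))   # 536.36 (536 if z is rounded to .36 first, as on the card)

# Logic check: above the new mean, but less than one SD above it
print(500 < gre < 600)   # True
```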
Discrete Vs Cont The distance a 2011 Toyota Prius can travel in the city with a full tank of gas.
Continuous, because distance can take any value within a range (including fractions of a mile), not just whole-number counts.
Negative skew
When the frequent scores are clustered at the higher end of the distribution and the tail points toward the lower, more negative scores, the value of skew is negative.
Parameter
A value, usually numerical, that describes a population: a numerical summary of a population.
What is a Statistic?
A value that describes a sample: a numerical summary of a sample. Ex: 39/50 return cash = 78% or .78
Statistical procedures used to organize, summarize, and simplify data are called what?
Descriptive stats
Discrete Vs Cont The number of cars that arrive at a drive through between noon and 1 pm.
Discrete, because it is a count of cars (whole numbers only) in a fixed period, noon to 1 pm.
Discrete Vs Cont The number of heads obtained after flipping a coin five times.
Discrete, because it is a count of heads (0 to 5) from a fixed number of flips.
What is a Population?
The entire set of individuals (people or things) of interest for a research question. Ex: all students in a school
Twenty-one heavy smokers were put on a treadmill at the fastest setting. The time in seconds was measured until they fell off from exhaustion: 18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32, 34, 34, 36, 36, 43, 42, 49, 46, 46, 57 Compute the mode, median, mean, upper and lower quartiles, range and interquartile range
First, let's arrange the scores in ascending order: 16, 18, 18, 22, 22, 23, 23, 24, 26, 29, 32, 34, 34, 36, 36, 42, 43, 46, 46, 49, 57. The mode: the scores with frequencies in brackets are: 16 (1), 18 (2), 22 (2), 23 (2), 24 (1), 26 (1), 29 (1), 32 (1), 34 (2), 36 (2), 42 (1), 43 (1), 46 (2), 49 (1), 57 (1). There are several modes because 18, 22, 23, 34, 36 and 46 seconds all have frequencies of 2, and 2 is the largest frequency. These data are multimodal (and the mode is, therefore, not particularly helpful to us). The median: the median is the (n + 1)/2th score. There are 21 scores, so this is the 22/2 = 11th. The 11th score in our ordered list is 32 seconds. The mean: the mean is 32.19 seconds: M = ΣX/n = (16 + 2×18 + 2×22 + 2×23 + 24 + 26 + 29 + 32 + 2×34 + 2×36 + 42 + 43 + 2×46 + 49 + 57)/21 = 676/21 = 32.19. The lower quartile: this is the median of the lower half of scores. If we split the data at 32 (not including this score), there are 10 scores below this value. The median of 10 scores is the 11/2 = 5.5th score, so we take the average of the 5th and 6th scores. The 5th score is 22 and the 6th is 23; the lower quartile is therefore 22.5 seconds. The upper quartile: this is the median of the upper half of scores. There are also 10 scores above 32, so again we take the average of the 5th and 6th scores above the median. The 5th score above the median is 42 and the 6th is 43; the upper quartile is therefore 42.5 seconds. The range: this is the highest score (57) minus the lowest (16), i.e. 41 seconds. The interquartile range: this is the difference between the upper and lower quartiles: 42.5 - 22.5 = 20 seconds.
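A minimal Python sketch of the same descriptive statistics. The quartiles follow the card's method (split at the median and leave it out, since n is odd); software such as Excel or NumPy often interpolates quartiles and can give slightly different values.

```python
from collections import Counter

times = [18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
         34, 34, 36, 36, 43, 42, 49, 46, 46, 57]

def median(values):
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

ordered = sorted(times)
n = len(ordered)

counts = Counter(times)
top = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == top)   # every value tied for the top frequency

med = median(ordered)
mean = sum(times) / n

# Quartiles as on the card: split at the median, leaving the median itself out (n is odd)
lower_half = ordered[: n // 2]
upper_half = ordered[n // 2 + 1 :]
q1, q3 = median(lower_half), median(upper_half)

value_range = max(times) - min(times)
iqr = q3 - q1
print(modes, med, round(mean, 2), q1, q3, value_range, iqr)
```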
Compute the mean but excluding the score of 234. 22 40 53 57 93 98 103 108 116 121
First, we add up all of the scores: 22 + 40 + 53 + 57 + 93 + 98 + 103 + 108 + 116 + 121 = 811. We then divide by the number of scores (in this case 10): M = ΣX/n = 811/10 = 81.1. The mean is 81.1 friends.
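To see why the score of 234 is worth excluding, a short Python sketch compares the mean and range with and without it. It assumes the full data set is these 10 scores plus 234; that assumption is labeled in the code.

```python
scores = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121]
with_extreme = scores + [234]   # assumption: the full data set is these 10 scores plus 234

def mean(values):
    return sum(values) / len(values)

def value_range(values):
    return max(values) - min(values)

print(mean(scores), value_range(scores))               # 81.1 99
print(mean(with_extreme), value_range(with_extreme))   # 95.0 212
```

The single extreme score pulls the mean up by about 14 and more than doubles the range.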
What are good ways to organize our data?
Frequency distributions
So what does a researcher do if he/she can't measure everybody in the population of interest?
He or she gathers data from a representative sample of that population and then uses statistics to draw conclusions about the population of interest based on the sample. A SAMPLE is a set of individuals selected from a population, usually intended to represent the population in a research study. The goal is always to draw inferences about the population of interest; the sample is the means to that end. We use the sample as a proxy for the population.
What are the properties of the median?
If you add or subtract a constant (i.e., the same value) to every score in a distribution, the median changes by that constant; if you multiply or divide every score by a constant, the median is multiplied or divided by that constant. For example, the score that was in the middle before you added a constant to all the scores will still be in the middle, only now it will be equal to "old score + constant". The median doesn't take the specific value of every score into consideration; it depends only on the 1 or 2 middle values, so it is relatively unaffected by outliers.
What are the properties of the mean?
If you add a constant (i.e., the same value) to every score in a distribution, the mean will increase by that constant. In other words, if you add 2 to every score, the mean will increase by 2 points. The same logic holds for the other operations: subtracting a constant from every score decreases the mean by that constant, and multiplying or dividing every score by a constant multiplies or divides the mean by that constant.
What are the properties of Standard deviation?
If you add or subtract a constant to all the observations in a dataset, the standard deviation doesn't change. All the individual scores increase/decrease by that constant, but so does the mean. Therefore the distance between each score and the mean doesn't change, so the standard deviation doesn't change. If you multiply or divide each observation in a dataset by a constant, the standard deviation will be multiplied or divided by that constant. We can see this by looking at two scores from a distribution of scores: X = 5 and 10 in the original dataset, a 5-point difference. If we multiply each score by two, X now equals 10 and 20, a 10-point difference. The difference between the two scores has doubled.
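The properties of the mean, median and standard deviation can be checked together in Python with the `statistics` module; the scores below are hypothetical, chosen only for illustration.

```python
import math
import statistics

X = [2, 3, 4, 5]                    # hypothetical scores
added = [x + 2 for x in X]          # add a constant of 2 to every score
multiplied = [x * 2 for x in X]     # multiply every score by 2

# Adding a constant shifts the mean and median by that constant but leaves the SD unchanged
print(statistics.mean(added) - statistics.mean(X))                  # 2.0
print(math.isclose(statistics.stdev(added), statistics.stdev(X)))   # True

# Multiplying by a constant multiplies the mean, median and SD by that constant
print(statistics.median(multiplied) / statistics.median(X))
print(statistics.stdev(multiplied) / statistics.stdev(X))
```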
What is a central tendency?
In statistics, a representative, or average, score is called a measure of central tendency: a statistical measure that attempts to determine the single value, usually located in the center of a distribution, that is most typical or most representative of the entire set of scores. One useful feature of a measure of central tendency is that it gives us a way of comparing different samples to one another.
At which levels of measurement do addition and subtraction make sense?
Interval Ratio
At which levels of measurement can the mean be used?
Interval Ratio
Determining the level of measurement: Temperature
Interval, because the differences between degrees are equal, but 0° does not mean there is no temperature (the zero point is arbitrary).
How do we know that a score that is one standard deviation from the mean will be relatively close to it, while one that is two standard deviations away will be quite far?
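It has to do with the shape of the distribution of scores. As mentioned, many variables that we study in the behavioral sciences are approximately normally distributed, and in a normal distribution about 68% of scores fall within 1 standard deviation of the mean and about 95% fall within 2 standard deviations. So a score 1 SD from the mean is fairly typical, while a score 2 SDs away is quite unusual.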
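The percentages for a normal distribution can be computed with Python's `statistics.NormalDist` (available since Python 3.8); this is a sketch that assumes scores really are normally distributed.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
# within ±3 SD: 99.7%
```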
What does the symbol Σ mean?
It is the Greek capital letter sigma, the symbol for "the sum of".
Why is randomization important?
It is important because it rules out confounding variables (factors that could influence the outcome variable other than the factor in which you're interested). For example, with groups of people, random allocation of people to groups should mean that factors such as intelligence, age and gender are roughly equal in each group and so will not systematically affect the results of the experiment.
What are levels of measurement, and what do they consist of?
Variables can be split into categorical and continuous, and within these types there are different levels of measurement. Categorical: binary, nominal, ordinal. Continuous: interval, ratio.
The formula for the variance of a population is σ² = Σ(X - μ)²/N. What does this equation mean? What does σ² mean? What does Σ(X - μ)² mean? What does /N mean?
The equation produces a measure of the average squared difference between each score and the mean. σ² (sigma squared) is the symbol for the population variance. Σ(X - μ)² means the mean is subtracted from each score (X) in the distribution, each difference is squared, and the squared differences are then summed across all scores (this is the sum of squares, SS). /N means the sum of the squared differences is divided by the number of scores in the distribution.
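Each piece of the formula maps onto one line of this Python sketch; the four-score "population" is hypothetical, and `statistics.pvariance` is the standard-library equivalent used as a cross-check.

```python
import statistics

def population_variance(scores):
    """sigma^2 = Σ(X - μ)^2 / N for a complete population of scores."""
    N = len(scores)
    mu = sum(scores) / N
    ss = sum((x - mu) ** 2 for x in scores)   # numerator: sum of squared deviations (SS)
    return ss / N                             # divide by N (no n - 1 correction for a population)

population = [2, 3, 4, 5]   # hypothetical population
print(population_variance(population))     # 1.25
print(statistics.pvariance(population))    # 1.25
```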
In the info that we want to know about scores: which part of the frequency distribution tells us how many total scores we have? Which part tells us how many scores are in each category?
The total number of scores is the frequency total, ΣF: add up the F (frequency) column. The number of scores in each category is the F value for that X, i.e., the number of times you see that score. Ex: for X = 2, 2, 4, 1, 3 the table is X = 4: F = 1; X = 3: F = 1; X = 2: F = 2; X = 1: F = 1; and ΣF = 5.
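A frequency distribution like the one above can be built with `collections.Counter`; this sketch also computes the percentage of scores in each category.

```python
from collections import Counter

scores = [2, 2, 4, 1, 3]
freq = Counter(scores)

# Frequency table, highest X first: the X column lists the scores, F counts them
for x in sorted(freq, reverse=True):
    print(x, freq[x])

total = sum(freq.values())                                 # ΣF = total number of scores
percent = {x: 100 * f / total for x, f in freq.items()}    # % of scores in each category
print(total, percent[2])                                   # 5 40.0
```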
What is the usefulness of the range?
As a measure of variability, it tells you the distance between the highest and lowest scores. Let's say the highest score on Test #1 is 78 (out of 80), while the lowest is 40. That tells me that at least one student has a good understanding of the material, and at least one student is struggling to understand it. However, it doesn't tell me much more than that: it says nothing about how the rest of the scores are distributed between those two extremes.
What is the formula for the sample mean, and which category is it in?
M = ΣX/n; central tendency
For the data below find the info that follows: 45 100 75 96 68 95 93 75 100 67 72 83 82 99 81 82 69 69 96 87 65 81 65 93 99 96 96 82 94 71 Find the Sample Mean
M = ΣX/n. Step 1: add up all the numbers: 45 + 100 + 75 + 96 + 68 + 95 + 93 + 75 + 100 + 67 + 72 + 83 + 82 + 99 + 81 + 82 + 69 + 69 + 96 + 87 + 65 + 81 + 65 + 93 + 99 + 96 + 96 + 82 + 94 + 71 = 2476. Step 2: count how many numbers are in the data set: n = 30. Step 3: plug into the equation: M = 2476/30 = 82.53
What should you do to avoid distorting data in a graph?
Make sure your scales provide a true representation of the data: if your scale has a small range, any differences between values will be magnified. Make sure the visual story your graph is telling is consistent with the story the data itself is telling. If showing multiple graphs of the same thing, keep the scales consistent. Not everything should be graphed: some data are better displayed in a table than a graph. Choose the method that gives the reader the best understanding of the data.
What is the Mean on central tendency?
Mean = average. Find the average by adding up all the scores (ΣX) and dividing by the total number of scores. In statistics, the symbol for the total number of scores is N (for populations) and n (for samples). The symbol for the mean is μ (mu) for a population and M for a sample; in books you'll also see X̄ ("X-bar") as a symbol for a sample mean. Remember that Greek letters (like μ) are used when we're talking about a population, while English letters (like M) are used when we're talking about a sample. Population mean: μ = ΣX/N. Sample mean: M = ΣX/n. The mean is by far the most commonly used measure of central tendency in behavioral science research. It's the only measure of central tendency that takes all the values in the dataset into consideration. The mean is the value that everybody would receive if the total (ΣX) were divided equally among everybody in the population or sample. Knowing this will help you solve for the total if you know the mean and the number of observations: if M is 10 and you have 6 observations, ΣX = M × n = 10 × 6 = 60. The mean is also a 'balancing point' in the distribution of scores: the total distance of the scores below the mean equals the total distance of the scores above the mean, so the deviations Σ(X - M) sum to zero.
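A Python sketch of the balancing-point property and of recovering ΣX from the mean. The five scores are hypothetical (chosen to have a mean of 5), and `total_from_mean` is an illustrative helper name.

```python
def total_from_mean(mean, n):
    """ΣX = M * n: the mean times the number of scores recovers the total."""
    return mean * n

# Hypothetical scores with a mean of 5, chosen only to illustrate the balancing point
scores = [1, 2, 6, 6, 10]
M = sum(scores) / len(scores)                  # 25 / 5 = 5.0

below = sum(M - x for x in scores if x < M)    # total distance of scores below the mean
above = sum(x - M for x in scores if x > M)    # total distance of scores above the mean
print(below, above)                            # 7.0 7.0
print(sum(x - M for x in scores))              # 0.0: deviations from the mean cancel out

print(total_from_mean(10, 6))                  # 60, as in the M = 10, n = 6 example
```

Note that the scores themselves do not balance (1 + 2 = 3 below the mean versus 22 above); only the distances from the mean do.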
Which measure of central tendency would be most appropriate for describing the heart rates for a group of women before they start their first aerobics class? a. Mean b. Median c. Mode d. Mode or Mean
a. Mean: heart rate is measured on a ratio scale and we don't expect any extreme values, so the distribution is likely roughly normal.
Say I own 857 CDs. My friend has written a computer program that uses a webcam to scan my shelves in my house where I keep my CDs and measure how many I have. His program says that I have 863 CDs. Define measurement error. What is the measurement error in my friend's CD counting device?
Measurement error is the difference between the true value of something and the numbers used to represent that value. In this trivial example, the measurement error is 6 CDs. In this example we know the true value of what we're measuring; usually we don't have this information, so we have to estimate this error rather than knowing its actual value.
Which measure of central tendency would be most appropriate for describing the types of phobias exhibited by patients attending a phobia clinic? a. Mean b. Median c. Mode d. All are equally good
c. Mode, because phobia types are nominal categories: you can't do any kind of math calculation on them, and the names of the phobias have no numerical meaning.
What is a Positively skewed distribution?
The most frequent scores and the majority of scores appear on the left side. The least frequent scores appear on the right (positive) side, in the direction of the skew. The distribution is not symmetrical. Household income in the United States is an example of a positively skewed distribution.
What is a negatively skewed distribution?
The most frequent scores and the majority of scores appear on the right side. The least frequent scores appear on the left (negative) side, in the direction of the skew. The distribution is not symmetrical. The total number of points earned by students in this statistics course is (we hope) an example of a negatively skewed distribution.
What is a symmetrical distribution?
The most frequent scores appear in the middle, and the least frequent scores appear at the tails. The graph is symmetrical around the middle value (one side is a mirror image of the other), and the shape of the distribution looks like a bell. It is the most common distribution in behavioral science research; you'll see this distribution over and over again in this class, so make sure you understand its properties.
For the data below find the info that follows: 45 100 75 96 68 95 93 75 100 67 72 83 82 99 81 82 69 69 96 87 65 81 65 93 99 96 96 82 94 71 Are there any outliers in the data?
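No outliers. Using the 1.5 × IQR rule with Q1 = 71 and Q3 = 96 (IQR = 25), the fences are 71 - 37.5 = 33.5 and 96 + 37.5 = 133.5, and every score (45 to 100) falls inside them.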
At which levels of measurement can the mode be used?
Nominal Ordinal Interval Ratio
Determining the level of measurement: Gender
Nominal, because the categories have no numerical meaning or order.
Qualitative data
Non-numerical data, or numbers that only serve as labels, so doing math with them makes no sense. Ex: zip codes, jersey numbers
Measures of variability for samples?
Note that if we simply want to know the variance of our sample (as in descriptive statistics), and we aren't trying to use the variability of our sample as our best estimate of the population variance (as in inferential statistics), we don't have to correct for this bias. In that case we use this formula for the variance of our sample: s² = Σ(X - M)²/n
Quantitative Data
Numbers where doing math makes sense.
At which levels of measurement can the median be used?
Ordinal Interval Ratio
At which levels of measurement can the range be used?
Ordinal Interval Ratio
Determining the level of measurement: Letter Grade earned in stats Class
Ordinal, because the grades have a meaningful order (A > B > C > D > F), but the distances between them are not equal or numerically meaningful.
A value that describes a population is called
Parameter (it describes a population)
Is it a Parameter or Stat? The percentage of all students on your campus that own a car is 48.2%.
Parameter- because it's talking about all students.
A researcher uses an anonymous survey to investigate the television-viewing habits of American teenagers. The entire group of American teenagers is an example of a ___________.
Population
What is statistical power?
Power is the ability of a test to detect an effect of a particular size (a value of 0.8 is a good level to aim for).
Qualitative or Quantitative Gender
Qualitative, because it's non-numerical data
Qualitative or Quantitative Zip Code
Qualitative: although zip codes are numbers, they are only labels, and doing math with them makes no sense
Ratio
Quantitative. Order matters; subtraction, multiplication and division all make sense. ** It has a true zero point: 0 means none of the property is there. Ex: heartbeats per minute (beats/min)
Qualitative or Quantitative Number of days during the week that a college student studied
Quantitative because it's numerical data such as number of days
Qualitative or Quantitative Temperature
Quantitative because it's numerical data where the math makes sense.
Compute the range but excluding the score of 234. 22 40 53 57 93 98 103 108 116 121
Range = maximum score - minimum score = 121 - 22 = 99.
Determining the level of measurement: Number of days a student studies
Ratio, because 0 days studied means no studying at all: there is a true zero point.
What is estimated population variance and what is estimated population standard deviation? How does this help with the sample?
Remember that the formulas for sample variance and standard deviation are constructed so that the sample variability will provide a good estimate of the population variability. Because of this, sample variance is often called estimated population variance, and sample standard deviation is called estimated population standard deviation.
What makes a good graph?
Remember the purpose of a graph is to provide an accurate picture of the data. You know the data that went into the graph, but remember your reader doesn't: make sure to label all your axes, provide a caption for the graph, and don't copy and paste a graph from SPSS (make your graph pretty). In a report, don't have the headline simply repeat what the graph is showing; you want the text to provide analyses or draw implications from the data, not just reiterate what the reader can already determine by looking at the graph.
What are the formulas for the population sum of squares, and which category are they in?
Definitional: SS = Σ(X - μ)². Computational: SS = ΣX² - (ΣX)²/N. Category: variability.
What are the formulas for the sample sum of squares, and which category are they in?
Definitional: SS = Σ(X - M)². Computational: SS = ΣX² - (ΣX)²/n. Category: variability.
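The definitional and computational formulas always give the same SS. A Python sketch using X = 2, 3, 4, 5 as an assumed example (the population versions are identical with μ and N in place of M and n):

```python
def ss_definitional(scores):
    """SS = Σ(X - M)^2: subtract the mean from each score, square, then sum."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def ss_computational(scores):
    """SS = ΣX^2 - (ΣX)^2 / n: avoids computing each deviation."""
    n = len(scores)
    return sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

X = [2, 3, 4, 5]
print(ss_definitional(X), ss_computational(X))   # 5.0 5.0
```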
What is a histogram?
Separate bar for each value Height of bar shows frequency of that value No space between bars Used for interval and ratio data
What is a bar graph?
Separate bar for each value Height of bar shows frequency of that value Space between bars Used for nominal and ordinal data
A value that describes a sample is called a
Statistic (it describes a sample)
Is it a Parameter or Stat? 100 students are polled and 46 % own a car.
Statistic, because it describes a sample (the 100 students polled): 46% owning a car is a numerical summary of a sample.
Is it a Parameter or Stat? A sample of houses from the United States.
Statistic, because any value computed from it describes a sample of houses, not all houses in the United States (the population).
Transforming X-scores. Ex: Assume that this year's scores had a mean of 450 and a standard deviation of 110. What value will a score of 490 on this year's exam have once it's put on the desired distribution, with a mean of 500 and a standard deviation of 100? Use the z-score as the 'common value' for both.
Step 1) What is the z-score for the score of 490 in this year's distribution? Step 2) What score in the desired distribution* corresponds to the z-score from Step 1? * In this example, the one with a mean of 500 and a standard deviation of 100
For the data below find the info that follows: 45 100 75 96 68 95 93 75 100 67 72 83 82 99 81 82 69 69 96 87 65 81 65 93 99 96 96 82 94 71 Find the IQR (Inter Quartile Range)
Step 1: List the scores in order, as you would to find the median. Step 2: Split the ordered list into a lower half and an upper half at the median. Step 3: Find the median of the lower half (the lower quartile, Q1) and the median of the upper half (the upper quartile, Q3). Step 4: IQR = Q3 - Q1. Ordered scores: 45, 65, 65, 67, 68, 69, 69, 71, 72, 75, 75, 81, 81, 82, 82, 82, 83, 87, 93, 93, 94, 95, 96, 96, 96, 96, 99, 99, 100, 100. With n = 30, the median falls between the 15th and 16th scores (82 and 82), so the median = 82. Lower half (15 scores): 45, 65, 65, 67, 68, 69, 69, 71, 72, 75, 75, 81, 81, 82, 82; its middle (8th) score is Q1 = 71. Upper half (15 scores): 82, 83, 87, 93, 93, 94, 95, 96, 96, 96, 96, 99, 99, 100, 100; its middle (8th) score is Q3 = 96. So IQR = 96 - 71 = 25.
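A Python sketch of the median-of-halves method for these 30 scores, plus the 1.5 × IQR outlier fences. The fence rule is a common convention assumed here, not something the card states, and other quartile methods (e.g., interpolation in Excel or NumPy) can give slightly different quartiles.

```python
scores = [45, 100, 75, 96, 68, 95, 93, 75, 100, 67, 72, 83, 82, 99, 81,
          82, 69, 69, 96, 87, 65, 81, 65, 93, 99, 96, 96, 82, 94, 71]

def median(values):
    ordered = sorted(values)
    n, mid = len(ordered), len(ordered) // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

ordered = sorted(scores)
half = len(ordered) // 2             # n = 30 is even, so each half has 15 scores
q1 = median(ordered[:half])          # median of the lower 15 -> 71
q3 = median(ordered[half:])          # median of the upper 15 -> 96
iqr = q3 - q1                        # 25

# 1.5 x IQR rule (assumed convention) for flagging outliers
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)   # 71 96 25 33.5 133.5 []
```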
For the data below find the info that follows: 45 100 75 96 68 95 93 75 100 67 72 83 82 99 81 82 69 69 96 87 65 81 65 93 99 96 96 82 94 71 Find the Sample Range
Step 1: Look for the Highest # in the data Step 2: Look for the Lowest # in the data Step 3: Take the Highest # - Lowest # = Range Highest - 100 Lowest - 45 100- 45 = 55 Range = 55
For the data below find the info that follows: 45 100 75 96 68 95 93 75 100 67 72 83 82 99 81 82 69 69 96 87 65 81 65 93 99 96 96 82 94 71 Find the Median
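Step 1: Re-arrange the numbers in order. Step 2: Cross off numbers from both ends to get to the "MIDDLE VALUE". 45, 65, 65, 67, 68, 69, 69, 71, 72, 75, 75, 81, 81, 82, 82, 82, 83, 87, 93, 93, 94, 95, 96, 96, 96, 96, 99, 99, 100, 100. There are 30 scores (an even number), so the median is the average of the 15th and 16th scores: (82 + 82)/2 = 82. Median = 82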
A population with μ = 37 and σ = 6 is being standardized to a new distribution with μ = 50 and σ = 10. What score in the new distribution corresponds to a score of X = 39 from the original distribution?
Step 1: convert the score from the original distribution to a z-score: z = (X - μ) / σ z = (39 - 37) / 6 z = 2/6 z = .33 Step 2: Find the score in the new distribution that corresponds to a z-score of .33 z = (X - μ) / σ .33 = (X - 50) / 10 .33 (10) = (X - 50) 3.3 = X - 50 3.3 + 50 = X 53.3 = X
For the data below find the info that follows: 45 100 75 96 68 95 93 75 100 67 72 83 82 99 81 82 69 69 96 87 65 81 65 93 99 96 96 82 94 71 Find the Sample Standard Deviation
Step 1: Compute the sample variance: s² = Σ(X - M)²/(n - 1) = 5609.47/29 = 193.43. Step 2: Take the square root of the sample variance to get the sample standard deviation: s = √193.43 = 13.91.
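A Python sketch that recomputes the summary values from the 30-score cards (mean, median, range, SS, variance and standard deviation) so the hand arithmetic can be checked; `statistics.stdev` gives the same n - 1 standard deviation.

```python
import statistics

scores = [45, 100, 75, 96, 68, 95, 93, 75, 100, 67, 72, 83, 82, 99, 81,
          82, 69, 69, 96, 87, 65, 81, 65, 93, 99, 96, 96, 82, 94, 71]
n = len(scores)

M = sum(scores) / n                        # sample mean: 2476 / 30
ss = sum((x - M) ** 2 for x in scores)     # sum of squares
s2 = ss / (n - 1)                          # sample variance (n - 1 correction)
s = s2 ** 0.5                              # sample standard deviation

print(f"M = {M:.2f}, SS = {ss:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
# M = 82.53, SS = 5609.47, s^2 = 193.43, s = 13.91
print(statistics.median(scores), max(scores) - min(scores))   # 82.0 55
```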
Parameters are represented by _______and statistics are represented by ______
Greek letters (e.g., μ, σ) and English letters (e.g., M, s)
What are the 2 main values that describe a distribution of scores?
Taken together, central tendency and variability are the two main values that describe a distribution of scores
In the info that we want to know about scores: What are the scores relates to what in the frequency distribution?
The X column, which lists the score values
Where do Stats fit in during the Behavioral Science Research Process?
In the data analysis and conclusion stage, because statistics are used to analyze the data and help us draw conclusions about the research question
How can extreme scores give a distorted picture of the population?
The distance between an extreme score and the mean is quite large, which means such scores are a poor representation of the population.
One key step in research is determining to whom a research question applies - who is the group we're interested in studying? - the entire world? - everybody in the United States? - all adults? - all the people in Illinois? - all students at Governors State? - everybody who takes this class
The entire set of individuals of interest for a particular research question is called the POPULATION
Under what circumstances would you choose the Mean as a preferred measure of central tendency?
The mean is ordinarily the preferred measure of central tendency. The mean is the arithmetic average of a distribution. The mean presented along with the variance and the standard deviation is the "best" measure of central tendency for continuous data. There are some situations in which the mean is not the "best" measure of central tendency. In certain situations, the median is the preferred measure. These situations are as follows: when you know that a distribution is skewed when you believe that a distribution might be skewed when you have a small number of subjects
What are the properties of the normal distribution?
The mean, median, and mode are equal. The normal curve is bell-shaped and symmetric about the mean. The total area under the curve is equal to 1.0 (100%). We can describe each point in terms of how many standard deviations it is from the mean, and we know what percent of scores fall above or below any of these standard deviation values.
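The last property can be checked numerically. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+); the helper names `proportion_within` and `proportion_above` are illustrative, not from the course materials.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k: float) -> float:
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def proportion_above(z: float) -> float:
    """Proportion of scores above a given z-score."""
    return 1 - standard_normal.cdf(z)

for k in (1, 2, 3):
    # prints 0.6827, 0.9545, 0.9973 for k = 1, 2, 3
    print(f"within ±{k} SD: {proportion_within(k):.4f}")
```

Because the curve is symmetric and its total area is 1, `proportion_above(0)` is exactly half of the scores.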
What is the median?
The median is the middle score: the score that falls in the middle of the distribution if you arrange all the scores from lowest to highest (or highest to lowest). With an even number of scores, it is the average of the two middle scores.
Under what circumstances would you choose the Mode as a preferred measure of central tendency?
The mode is rarely chosen as the preferred measure of central tendency. The mode is not usually used because the largest frequency of scores might not be at the center. The only situation in which the mode may be preferred over the other two measures of central tendency is when describing discrete categorical data. The mode is preferred in this situation because the greatest frequency of responses is important for describing categorical data.
The numerator of what formula is called the sum of squares?
The variance formula: in $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$, the numerator $\sum (X - \mu)^2$ is the sum of squares.
Under what circumstances would you choose the Median as a preferred measure of central tendency?
The purpose of reporting the median in these situations is to combat the effect of outliers, which are extreme scores that distort the distribution. For example, in a distribution of people's incomes, a person with an income of over a million dollars would dramatically increase the mean income, even though most of the people in the distribution do not make that kind of money. In this case, the median is the preferred measure of central tendency.
Ordinal
The same as nominal, but the categories have a logical order. Ex: whether people got a fail, a pass, a merit, or a distinction on their exam. Data are assigned to categories, and the categories are ranked or ordered. Numbers assigned to categories are arbitrary, and the distance between categories is unknown and not necessarily equal; order is the distinguishing characteristic (finishing order, rankings, etc.). We can say 1st > 2nd > 3rd, but the distance between 2nd and 3rd is not necessarily the same as the distance between 1st and 2nd. Because the distances between values aren't equal, only a limited number of statistical tests can be performed on the data. Qualitative or quantitative; order matters. EX: finishing order in a race, football standings.
What is Stats?
The science of collecting, organizing, and analyzing info to draw conclusions and answer questions. Statistics refers to the mathematical procedures for organizing, summarizing, and interpreting data.
What does a Z-score formula tell us?
The sign (+/-) of the z-score tells you whether the original score is above or below the mean. The value tells you how far above or below the mean the score is. In $z = \frac{X - \mu}{\sigma}$, the numerator $X - \mu$ is the distance of the score from the mean, and dividing by $\sigma$ states that distance as the number of standard deviations the score is from the mean.
What does the SD tell us in Z-scores?
The standard deviation tells us the average difference among scores in a distribution, or the average distance from the mean - we can use the standard deviation, along with the mean, to identify the location of a score in the distribution
Categorical variable
The university you attend is a good example of a categorical variable: students who attend the University of Sussex are not also enrolled at Harvard or UV Amsterdam; therefore, students fall into distinct categories.
When is it considered biased?
When the variability of our sample is consistently lower than the variability of the population from which it was drawn. Because the uncorrected sample variability systematically underestimates the population variability, it is considered biased.
Nominal
There are more than 2 categories. Lowest level of measurement. Mutually exclusive categories (everyone is in ONLY 1 category). No natural order to the categories, which are identified by names. If categories are assigned #'s, the #'s have no meaning; they are only labels. Very few statistical tests can be performed. Qualitative ONLY; no order. EX: political parties (Rep, Dem, Independent, Tea Party); animals (cow, duck, and dog); whether someone is an omnivore, vegetarian, vegan, or fruitarian.
What is the difference between reliability and validity?
Validity is whether an instrument measures what it was designed to measure, whereas reliability is the ability of the instrument to produce the same results under the same conditions.
What is a (frequency) polygon?
Each value is marked with a dot whose height shows the frequency of that value, and the dots are connected by a line. Used for interval and ratio data.
For the data below, find the sample variance: 45 100 75 96 68 95 93 75 100 67 72 83 82 99 81 82 69 69 96 87 65 81 65 93 99 96 96 82 94 71
$s^2 = \frac{\sum (X - M)^2}{n - 1}$ and $s = \sqrt{s^2}$. Step 1: Find the sample variance using the sample variance formula. Step 2: We already found the mean, M = 82.53 (about 82.5), so list a column of deviations (X - M) by subtracting the mean from each score. Step 3: Square each deviation. Step 4: Add up the squared deviations (the sum of squares). Step 5: We already found n = 30, so n - 1 = 29. Step 6: Plug the numbers into the formula. Step 7: Take the square root of the sample variance to get the standard deviation. Worked example: (45 - 82.5)² = (-37.5)² = 1406.25; (95 - 82.5)² = (12.5)² = 156.25; (72 - 82.5)² = (-10.5)² = 110.25; and so on, for a sum of squares of about 5609.47 (using the unrounded mean). Dividing by n - 1 = 29 gives a sample variance of about 193.43.
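The same seven steps as a minimal Python sketch. It keeps the mean unrounded (82.533...), which is why its totals can differ slightly from a hand calculation that uses 82.5.

```python
import math

scores = [45, 100, 75, 96, 68, 95, 93, 75, 100, 67,
          72, 83, 82, 99, 81, 82, 69, 69, 96, 87,
          65, 81, 65, 93, 99, 96, 96, 82, 94, 71]

n = len(scores)                           # number of scores
mean = sum(scores) / n                    # M = ΣX / n
deviations = [x - mean for x in scores]   # X - M for each score
ss = sum(d ** 2 for d in deviations)      # sum of squares, Σ(X - M)²
sample_variance = ss / (n - 1)            # s² = SS / (n - 1)
sample_sd = math.sqrt(sample_variance)    # s = √s²

print(f"n = {n}, M = {mean:.2f}, SS = {ss:.2f}")
print(f"s² = {sample_variance:.2f}, s = {sample_sd:.2f}")
# n = 30, M = 82.53, SS = 5609.47
# s² = 193.43, s = 13.91
```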
Why do we use samples?
We are usually interested in populations, but because we cannot collect data from every human being (or whatever) in the population, we collect data from a small subset of the population (known as a sample) and use these data to infer things about the population as a whole
Measures of variability
We know that we can summarize our data by one of our measures of central tendency (usually the mean). This gives us our 'most representative' score. But to get a true sense of our data, we'd also like to know how far apart the scores in our dataset are spread.
How do we calculate the variability of a distribution of scores?
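One simple measure is the range: the highest score minus the lowest score. Variance, standard deviation, and interquartile range are other measures of variability.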
Z-scores
We used the mean and the standard deviation to describe the overall distribution, and we use this same information to describe any individual score within that distribution. We can describe a score in terms of how it relates to the mean: is it above or below the mean? How far above or below the mean is it? Example: you scored a 76 and the mean is 70, so your score is above the mean by 6 points. This helps, but it isn't enough to let us compare scores across different distributions: how does this 6-point difference compare to a 6-point difference in other distributions? Does that put the score close to the mean or far from the mean compared to other distributions? What have we just learned about that will help us determine whether our score of 76 is close to or far from the mean in a distribution?
Inferential Stats
What do the numbers tell you? EX: Most of my students are moral. To draw conclusions about the population of interest from a sample of data, a researcher computes inferential statistics. Inferential statistics consist of techniques that allow a researcher to study samples and then make generalizations about the populations from which the samples were selected.
Positive skew
When the frequent scores are clustered at the lower end of the distribution and the tail points towards the higher or more positive scores, the value of skew is positive.
Continuous Variable
Whenever the values are measured rather than counted exactly. **MEASURE**
Discrete Variable
Whenever it can be counted exactly. **COUNT**
Descriptive Stats
Whenever numbers are referenced; descriptive statistics consist of organizing and summarizing data. EX: 78% or .78, as well as graphs and tables. Once a researcher has gathered data, there are two general types of statistical procedures that can be run on the data; descriptive statistics are the procedures used to summarize, organize, and simplify data.
In a frequency distribution table, are you able to determine ΣX?
Yes: multiply each X by its f to get fX, then add up the fX column. Ex: X = 2, 3, 5, 6, 8, 10 and f = 2, 3, 1, 1, 1, 2, so fX = 2×2 = 4, 3×3 = 9, 5×1 = 5, 6×1 = 6, 8×1 = 8, 10×2 = 20. ΣX = 4 + 9 + 5 + 6 + 8 + 20 = 52.
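The same ΣfX calculation as a short Python sketch, using the card's example table; the list names `X` and `f` simply mirror the table's columns.

```python
# Frequency distribution table: each score value X and its frequency f
X = [2, 3, 5, 6, 8, 10]
f = [2, 3, 1, 1, 1, 2]

fX = [x * freq for x, freq in zip(X, f)]  # the fX column: each X times its f
sum_X = sum(fX)                           # ΣX is the total of the fX column
N = sum(f)                                # number of scores is Σf

print(fX)     # [4, 9, 5, 6, 8, 20]
print(sum_X)  # 52
print(N)      # 10
```

Note that ΣX is not the sum of the X column (2 + 3 + 5 + 6 + 8 + 10 = 34); each value has to be weighted by how often it occurs.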
What is the formula for the sample z-score? What category?
$z = \frac{X - M}{s}$. Category: standardized scores.
What is the formula for the population z-score? What category?
$z = \frac{X - \mu}{\sigma}$. Category: standardized scores.
Scores that have been converted to fit the standard normal distribution are called?
Z-scores. Any value can be transformed into a z-score using the formula z = (value - mean) / standard deviation, that is, $z = \frac{X - \mu}{\sigma}$.
Binary variable
a categorical variable that has only two mutually exclusive categories (e.g., being dead or alive).
Predictive validity
a form of criterion validity where there is evidence that scores from an instrument predict external measures (recorded at a different point in time) conceptually related to the measured construct.
Histogram
a frequency distribution.
Central tendency
a generic term describing the centre of a frequency distribution of observations as measured by the mean, mode and median.
Quartiles
a generic term for the three values that cut an ordered data set into four equal parts. The three quartiles are known as the lower quartile, the second quartile (or median) and the upper quartile.
Frequency distribution
a graph plotting values of observations on the horizontal axis, and the frequency with which each value occurs in the data set on the vertical axis (a.k.a. histogram).
Skew
a measure of the symmetry of a frequency distribution. Symmetrical distributions have a skew of 0.
Hypothesis
a prediction about the state of the world
Normal distribution
a probability distribution of a random variable that is known to have certain properties. It is perfectly symmetrical (has a skew of 0), and has a kurtosis of 0.
Percentiles
a type of quantile; they are values that split the data into 100 equal parts.
Confounding variable
a variable (that we may or may not have measured) other than the predictor variables in which we're interested that potentially affects an outcome variable.
Continuous variable
a variable that can be measured to any level of precision. (Time is a continuous variable, because there is in principle no limit on how finely it could be measured.)
Discrete variable
a variable that can only take on certain values (usually whole numbers) on the scale.
Predictor variable
a variable that is used to try to predict values of another variable known as an outcome variable.
Outcome variable
a variable whose values we are trying to predict from one or more predictor variables.
What are the values for the sum of squares and variance for the following population of N = 3 scores: 1, 4, 7? a. SS = 18 and variance = 6 b. SS = 18 and variance = 9 c. SS = 66 and variance = 22 d. SS = 66 and variance = 33
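a. SS = 18 and variance = 6. Step 1: mean = (1 + 4 + 7)/3 = 12/3 = 4. Step 2: subtract the mean from each score and square: (1 - 4)² = 9, (4 - 4)² = 0, (7 - 4)² = 9. Step 3: 9 + 0 + 9 = 18 = SS. Step 4: because these scores are a population, divide SS by N to get the variance: 18/3 = 6.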
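A minimal Python check of this answer, treating 1, 4, 7 as the whole population (so SS is divided by N, not N - 1).

```python
scores = [1, 4, 7]  # the whole population, N = 3

N = len(scores)
mu = sum(scores) / N                     # μ = ΣX / N = 4
ss = sum((x - mu) ** 2 for x in scores)  # SS = Σ(X - μ)² = 9 + 0 + 9
population_variance = ss / N             # σ² = SS / N

print(mu, ss, population_variance)       # 4.0 18.0 6.0
```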
Assume that mean height for adult women is 65 inches, and that the standard deviation is 5 inches. What is the z-score for a woman who is 60 inches tall? 75 inches tall? How tall is a woman whose z-score for height is -3? +1.7?
z-score for a woman who is 60 inches tall: z = (60 - 65)/5 = -5/5 = -1.00. For 75 inches: z = (75 - 65)/5 = 10/5 = 2.00. Height for z = -3: -3 = (X - 65)/5, so -15 = X - 65 and X = 50 inches. Height for z = +1.7: 1.7 = (X - 65)/5, so 8.5 = X - 65 and X = 73.5 inches.
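Both directions of the calculation as a small Python sketch: `z_score` applies z = (X - μ)/σ and `raw_score` solves it for X, giving X = μ + zσ. The function names are illustrative; the mean of 65 and SD of 5 come from the exercise.

```python
MEAN = 65.0  # mean height of adult women (inches), from the exercise
SD = 5.0     # standard deviation (inches), from the exercise

def z_score(x: float, mean: float = MEAN, sd: float = SD) -> float:
    """z = (X - μ) / σ: how many SDs x lies from the mean."""
    return (x - mean) / sd

def raw_score(z: float, mean: float = MEAN, sd: float = SD) -> float:
    """X = μ + zσ: the score that sits z SDs from the mean."""
    return mean + z * sd

print(z_score(60), z_score(75))       # -1.0 2.0
print(raw_score(-3), raw_score(1.7))  # 50.0 73.5
```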
What is the sum of squares? What does it tell you about a dataset?
Also known as the sum of squared deviations, it's called that because it is the sum of the squared differences between each score and the mean. It tells you how much total variability there is in the dataset.
Theory
although it can be defined more formally, a theory is a hypothesized general principle or set of principles that explain known findings about a topic and from which new hypotheses can be generated.
Variance
an estimate of average variability (spread) of a set of data. It is the sum of squares divided by the number of values on which the sum of squares is based minus 1.
Standard deviation
an estimate of the average variability (spread) of a set of data measured in the same units of measurement as the original data. It is the square root of the variance.
Independent variable
another name for a predictor variable. This name is usually associated with experimental methodology and is used because it is the variable that is manipulated by the experimenter and so its value does not depend on any other variables (just on the experimenter).
Dependent variable
another name for outcome variable. This name is usually associated with experimental methodology and is used because it is the variable that is not manipulated by the experimenter and so its value depends on the variables that have been manipulated.
Second quartile
another name for the median.
Sum of squared errors
another name for the sum of squares.
Variables
anything that can be measured and can differ across entities or across time.
When is the definitional sum of squares formula useful?
As a starting point, because it explains the concept: you see that SS is the sum of squared deviations from the mean. There is another (computational) formula that uses totals for the whole dataset rather than each score's deviation: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$.
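A small Python sketch comparing the two formulas on the N = 3 population from the earlier exercise; the function names are illustrative. Both return the same SS, but the computational version only needs the totals ΣX and ΣX².

```python
def ss_definitional(scores):
    """SS = Σ(X - μ)²: sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def ss_computational(scores):
    """SS = ΣX² - (ΣX)² / N: uses only the totals ΣX and ΣX²."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x_squared = sum(x ** 2 for x in scores)
    return sum_x_squared - sum_x ** 2 / n

data = [1, 4, 7]
print(ss_definitional(data), ss_computational(data))  # 18.0 18.0
```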
A researcher records the change in weight during the 1st semester of college for each individual in a sample of 25 freshmen and calculates the average change in weight. The average is an example of a ________________. a. parameter b. statistic c. variable d. constant
b. statistic (because it describes a sample)
Which is the best measure of central tendency to use when the distribution is strongly skewed? a. Mean b. Median c. Mode d. Either mean or median
b. median
Which measure of central tendency would be most appropriate for describing the amount of time participants spend solving a cognitive problem, with some of the participants unable to solve it? a. Mean b. Median c. Mode d. None would be appropriate
b. median, because the distribution would be skewed (participants who never solve the problem have extreme, undetermined times)
Using letter grades to classify student performance on an exam is an example of a(n) ____________ scale of measurement. a. Nominal b. Ordinal c. Interval d. Ratio
b. ordinal because it is ordered or ranked
Larger sample sizes are what?
Better than smaller sample sizes, because they give more accurate estimates of the population.
Which of these statistics is not part of the formula for a z-score? a. Mean b. Raw score c. Sample size d. Standard deviation
c. sample size
What additional information is gained by measuring two individuals on an interval scale compared to an ordinal scale? a. Whether the measurements are the same or different b. The direction of the difference c. The size of the difference d. Whether the zero point is used
c. The size of the difference, because an interval scale tells you how much one measurement differs from another
A researcher is curious about the average monthly cell phone bill for high school students in Illinois. If this average could be obtained, it would be an example of a ______________. a. sample b. statistic c. population d. parameter
d. parameter (measures the population of interest).
What scale of measurement is being used when a teacher measures the number of correct answers on a quiz for each student? a. Nominal b. Ordinal c. Interval d. Ratio
d. ratio because it has a true zero point- 0 answers correct = none are correct
What is the final step to be performed in the mathematical expression $(\sum X)^2$? a. Square each score b. Add the scores c. Add the squared scores d. Square the sum of the scores
d. Square the sum of the scores (first add the scores, then square that total)
Which measure of variability is in squared units? a. Range b. Standard deviation c. Semi-interquartile range d. Variance
d. variance
Which of the following z-scores represents the location farthest from the mean? a. z = +0.50 b. z = -1.00 c. z = +2.00 d. z = -2.35
d. z = -2.35
What do Z-scores do and why?
Z-scores describe individual scores based on their location in the distribution. We need a common way of describing and comparing scores no matter what sample or population they're drawn from, and describing a score by its location in the distribution gives us this common point of comparison, e.g., "I'm in the top 10% in English and the top 20% in Math."
Categorical
entities are divided into distinct categories such as Binary, Ordinal, and Nominal
Validity
evidence that a study allows correct inferences about the question it was aimed to answer or that a test measures what it set out to measure conceptually (see also Content validity, Criterion validity).
Criterion validity
evidence that scores from an instrument correspond with (concurrent validity) or predict (predictive validity) external measures conceptually related to the measured construct.
Content validity
evidence that the content of a test corresponds to the content of the construct it was designed to cover.
Ecological validity
evidence that the results of a study, experiment or test can be applied, and allow inferences, to real-world conditions.
Qualitative methods
extrapolating evidence for a theory from what people say or write (cf. quantitative methods).
What is the mean, and how do we tell if it's representative of our data?
The mean is a simple statistical model of the centre of a distribution of scores: a hypothetical estimate of the 'typical' score. We use the variance, or standard deviation, to tell us whether it is representative of our data. The standard deviation is a measure of how much error there is associated with the mean: a small standard deviation indicates that the mean is a good representation of our data.
A researcher is interested in the texting habits of high school females in Will County. The researcher randomly selects 300 female students, measures the number of text messages each sends and receives for a week, and then calculates the average for all. What is the population? What is the sample? What type of scale is the number of text messages measured on (nominal, ordinal, interval, or ratio)?
Population: high school females in Will County. Sample: the 300 randomly selected female students. Scale: ratio (0 texts means no texts at all).
A researcher wants to assess the effect of alcohol on the reaction time of adults age 21+. He assigns 120 adults age 21+ into one of three conditions: 1) no alcohol, 2) one 8-oz glass of alcohol, and 3) one 16-oz glass of alcohol. He then measures how quickly each person presses a buzzer after seeing a flashing red light. What is the independent variable? What is the dependent variable? Because the researcher is trying to generalize his results to draw conclusions about the population of interest, this is an example of what type of statistics? (descriptive or inferential?)
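Independent variable: how much alcohol was consumed (none, 8 oz, or 16 oz). Dependent variable: reaction time (how quickly each person pressed the buzzer). Type of statistics: inferential.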
Quantitative methods
inferring evidence for a theory through measurement of variables that produce numeric outcomes (cf. qualitative methods).
What is the Standard normal distribution?
is a normal distribution that has been standardized to have a mean of 0 and a standard deviation of 1. If we convert scores from any distribution to a standard normal distribution - so that the scores have a mean of 0 and a standard deviation of 1 - we can determine what percent of scores fall above or below any value
What is an outlier?
A score that is substantially different (larger or smaller) from the other scores in the group or sample. Notice in this example that the median stays the same (7) even when the last value is multiplied by 100: X = 2, 3, 5, 7, 8, 10, 12 versus X = 2, 3, 5, 7, 8, 10, 1200. The 1200 is an outlier. By contrast, the mean goes from 6.7 in the first group of scores to 176.4 in the second. Because it's not susceptible to extreme scores, the median is the preferred measure of central tendency when the distribution of scores is skewed (remember the positively and negatively skewed distributions discussed earlier?) or when there are extreme scores in the distribution; for example, if you are measuring how long it takes a class of students to take a math test and one person forgets their calculator and so takes much longer than the rest.
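The outlier example can be reproduced with Python's standard `statistics` module, a minimal sketch using the two score lists from the card.

```python
from statistics import mean, median

without_outlier = [2, 3, 5, 7, 8, 10, 12]
with_outlier = [2, 3, 5, 7, 8, 10, 1200]  # last score multiplied by 100

for label, scores in (("without outlier", without_outlier),
                      ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(scores):.1f}, median = {median(scores)}")
# without outlier: mean = 6.7, median = 7
# with outlier: mean = 176.4, median = 7
```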
What is Variance?
is the average squared distance from the mean. Notice that variance is a DISTANCE measure: it tells you how far away, on average, each observation is from the mean, but in squared units.
Properties of the Mode
is the only measure of central tendency that has to be an actual score in the distribution and that is used the least. It's most useful for nominal data, where data is organized by categories and doesn't have mathematical properties (which makes the median and mean impossible to compute)
What is the Standard Deviation?
is the square root of the variance and provides a measure of the standard, or average, distance from the mean. To be most useful for us, our measure of variability should be in terms of the original units of measurement. Fortunately, that's easy to do: we simply take the square root of the variance.
How can we easily summarize and simplify our data?
Find the one value that best represents our data. Each dataset will have several options for this 'most representative' score; we find it using measures of central tendency.
Will applying the correction for variability to each sample make it equal to the population variability?
It will not. But if we could take all possible samples from a population (sampling with replacement) and made that correction for each sample, the average of the corrected sample variances would equal the population variance. The average corrected standard deviation still slightly underestimates the population standard deviation, though by less than the uncorrected version does.
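The claim can be checked exactly for a tiny case. This Python sketch is a hypothetical illustration, assuming sampling with replacement: it takes the population 1, 4, 7 (σ² = 6), lists all 9 possible samples of size n = 2, and averages their variances using exact fractions.

```python
import math
from fractions import Fraction
from itertools import product

population = [1, 4, 7]  # hypothetical tiny population; sigma^2 = 6

def sum_of_squares(scores):
    """SS = sum of squared deviations from the mean, kept as an exact Fraction."""
    m = Fraction(sum(scores), len(scores))
    return sum((x - m) ** 2 for x in scores)

N = len(population)
sigma_squared = sum_of_squares(population) / N  # population variance (divide by N)

n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples, with replacement

# Average variance across all samples, without and with the n - 1 correction
avg_uncorrected = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_corrected = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

# Average corrected standard deviation across all samples
avg_corrected_sd = sum(math.sqrt(sum_of_squares(s) / (n - 1)) for s in samples) / len(samples)

print(sigma_squared, avg_uncorrected, avg_corrected)                   # 6 3 6
print(round(avg_corrected_sd, 3), round(math.sqrt(sigma_squared), 3))  # 1.886 2.449
```

Dividing by n gives an average of 3 (too small), dividing by n - 1 gives exactly 6, and the average corrected SD (about 1.886) is still below σ (about 2.449).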
How does sample size relate to the correction for bias?
Larger sample sizes lead to better uncorrected estimates of the population standard deviation (and variance) than smaller sample sizes do, so the correction makes less difference as n increases.
What does high variability mean?
means that the scores are quite different and are spread out
What does low variability mean?
means the scores are similar and are clustered closer together
What is Variability?
provides a quantitative measure of the difference between scores in a distribution and describes the degree to which the scores are spread out or clustered together
Transforming Z-scores
put your scores on a new distribution, one that has a different mean and/or different standard deviation from the original distribution. For example, each year the scores from all students who take the GRE are transformed so that they look as though they came from a distribution with a mean of 500 and a standard deviation of 100. This allows scores to be compared from year to year.
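A minimal Python sketch of the two-step transformation: standardize the raw score, then place the z-score on the new scale with new_mean + z × new_sd. The original test's mean of 70 and SD of 4 are hypothetical values chosen for illustration; the function names are also illustrative.

```python
def to_z(x: float, mean: float, sd: float) -> float:
    """Standardize: z = (X - mean) / SD."""
    return (x - mean) / sd

def transform(z: float, new_mean: float, new_sd: float) -> float:
    """Place a z-score on a new scale: X_new = new_mean + z * new_sd."""
    return new_mean + z * new_sd

# Hypothetical original test: mean 70, SD 4; a score of 76
z = to_z(76, mean=70, sd=4)
print(z)                       # 1.5
print(transform(z, 500, 100))  # 650.0 on a mean-500, SD-100 scale
```

The score keeps its position (1.5 SDs above the mean) on both scales; only the units change.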
What is the formula for the sample standard deviation when not estimating the population? And what category is it?
$s = \sqrt{\frac{\sum (X - M)^2}{n}}$. Category: variability (used when not estimating the population).
What is the formula for the sample standard deviation when estimating the population? And what category is it?
$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$. Category: variability.
What is the formula for Sample variance? And what category is it in?
$s^2 = \frac{\sum (X - M)^2}{n - 1}$. Category: variability.
One commonly used approach is to create a frequency distribution from the data. A FREQUENCY DISTRIBUTION is an organized tabulation of the number of individuals located in each category on the scale of measurement.
shows you how many people / observations are in each category for each variable
What does $\sum X^2$ stand for? Solve $\sum X^2$ for X = 2, 3, 4, 5.
Square each value of X first, then sum them all together: $\sum X^2 = 4 + 9 + 16 + 25 = 54$.
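A short Python check of the difference between ΣX² and (ΣX)², using the card's scores; the order of operations is what separates them.

```python
x = [2, 3, 4, 5]

sum_of_squared_scores = sum(v ** 2 for v in x)  # ΣX²: square each score, then add
square_of_sum = sum(x) ** 2                     # (ΣX)²: add the scores, then square

print(sum_of_squared_scores, square_of_sum)     # 54 196
```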
What's the difference between the standard deviation and the standard error?
The standard deviation tells us how much observations in our sample differ from the mean value within our sample. The standard error tells us not how well the sample mean represents the sample itself, but how well the sample mean represents the population mean. It is the standard deviation of the sampling distribution of a statistic: for a given statistic (e.g., the mean), it tells us how much variability there is in that statistic across samples from the same population. Large values, therefore, indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came.
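The card doesn't give a formula, but the usual estimate of the standard error of the mean is s/√n. A minimal Python sketch, assuming that estimator, applied to the 30 scores from the sample variance exercise; `standard_error_of_mean` is an illustrative name.

```python
import math
from statistics import stdev

def standard_error_of_mean(scores):
    """Estimated standard error of the mean: s / sqrt(n), where s uses n - 1."""
    return stdev(scores) / math.sqrt(len(scores))

# The 30 scores from the sample variance exercise
scores = [45, 100, 75, 96, 68, 95, 93, 75, 100, 67,
          72, 83, 82, 99, 81, 82, 69, 69, 96, 87,
          65, 81, 65, 93, 99, 96, 96, 82, 94, 71]

print(f"s = {stdev(scores):.2f}, SE = {standard_error_of_mean(scores):.2f}")
# s = 13.91, SE = 2.54
```

Because of the √n in the denominator, the standard error shrinks as the sample gets larger even when s stays about the same.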
What would happen if you drew a sample size of 100 from this population, and that sample contained 1 extreme score?
that extreme score would be offset by the other 99 non-extreme scores and so the sample standard deviation would be much closer to the population standard deviation
Reliability
the ability of a measure to produce consistent results when the same entities are measured under different conditions.
Falsification
the act of disproving a hypothesis or theory.
What additional information is obtained by measuring two individuals on an ordinal scale compared to a nominal scale? a. Whether the measurements are the same or different b. The direction of the difference c. The size of the difference d. Whether the zero point is used
b. The direction of the difference, because an ordinal scale tells you whether one value is higher or lower than another
Measurement error
the discrepancy between the numbers used to represent the thing that we're measuring and the actual value of the thing we're measuring (i.e., the value we would get if we could measure it directly).
Interquartile range
the limits within which the middle 50% of an ordered set of observations fall. It is the difference between the value of the upper quartile and lower quartile.
How can we describe the extent to which all the scores in the distribution differ?
the mean of the distribution is our best guess of the 'average value' of that distribution, so it makes sense to use this average value as the anchor for our measure of variability. Our ideal measure of variability should indicate how far all the scores in the distribution differ from the average score in that distribution. This measure is the variance.
What would happen if you drew a sample size of 10 from this population, and the sample contained 1 extreme score?
the other 9 non-extreme scores would not be enough to offset that extreme value, and the estimated standard deviation would likely be quite far from the actual standard deviation
Randomization
the process of doing things in an unsystematic or random way. In the context of experimental research the word usually applies to the random assignment of participants to different treatment conditions.
Range
the range of scores is the value of the smallest score subtracted from the highest score. It is a measure of the dispersion of a set of scores. See also variance, standard deviation, and interquartile range.
Levels of measurement
the relationship between what is being measured and the numbers obtained on a scale.
What is the mode?
the score that occurs most often in the distribution of scores. The easiest way to determine it is to look at a frequency distribution or a graph of the scores.
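A minimal Python sketch using the standard library's `statistics.multimode` (Python 3.8+), which returns every most-frequent value. The party and score lists are hypothetical data for illustration.

```python
from statistics import multimode

# Nominal data: the mode is the only measure of central tendency that applies
parties = ["Dem", "Rep", "Dem", "Independent", "Dem", "Rep"]
print(multimode(parties))  # ['Dem']

# A distribution can have more than one mode
scores = [2, 3, 3, 5, 8, 8]
print(multimode(scores))   # [3, 8]
```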
z-score
the value of an observation expressed in standard deviation units. It is calculated by taking the observation, subtracting from it the mean of all observations, and dividing the result by the standard deviation of all observations. By converting a distribution of observations into z-scores a new distribution is created that has a mean of 0 and a standard deviation of 1.
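A minimal Python sketch of converting a whole distribution to z-scores, assuming the scores are treated as a population (dividing by N when computing σ). It uses the N = 3 population 1, 4, 7 from the earlier exercise; `standardize` is an illustrative name.

```python
import math

def standardize(scores):
    """Convert each score to z = (X - mu) / sigma, treating the scores as a population."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [(x - mu) / sigma for x in scores]

z = standardize([1, 4, 7])
print([round(v, 4) for v in z])  # [-1.2247, 0.0, 1.2247]

# The z-scores themselves have mean 0 and standard deviation 1
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(round(z_mean, 6), round(z_sd, 6))  # 0.0 1.0
```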
Upper quartile
the value that cuts off the highest 25% of ordered scores. If the scores are ordered and then divided into two halves at the median, then the upper quartile is the median of the top half of the scores.
Lower quartile
the value that cuts off the lowest 25% of the data. If the data are ordered and then divided into two halves at the median, then the lower quartile is the median of the lower half of the scores.
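A minimal Python sketch of the "median of each half" rule from these definitions, with the interquartile range as upper minus lower quartile. It assumes one particular convention (when n is odd, the overall median is left out of both halves); other textbooks and software can give slightly different quartiles. The example reuses the scores from the outlier card.

```python
from statistics import median

def quartiles(scores):
    """Lower quartile, median, upper quartile via the 'median of each half' rule.

    Assumption: when n is odd, the overall median is excluded from both halves.
    Needs at least 2 scores.
    """
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)

q1, q2, q3 = quartiles([2, 3, 5, 7, 8, 10, 12])
print(q1, q2, q3, "IQR =", q3 - q1)  # 3 7 10 IQR = 7
```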
When we draw a sample from the population what does that tell you about the population variability?
we expect that most of the scores in our sample will be closer to the middle of the distribution than to the ends, since that's where the majority of the population values fall. Therefore, the sample variability will generally be smaller than the population variability, since the values in the sample will not deviate as far from the mean.
What is the formula for the population Mean? And what category is it in?
$\mu = \frac{\sum X}{N}$. Category: central tendency.
What is the formula for the population standard deviation? And what category is it? When would you use it?
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$. Category: variability. Use it when you have scores for the entire population, so nothing needs to be estimated.
What is the formula for Population Variance? And what category is it in?
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$. Category: variability.