Social Stats
The distribution of heights of young men is approximately normal with mean = 68 inches and standard deviation = 2 inches. Which 2 heights will 95% of young men fall between?
64-72 inches. About 95% of a normal distribution falls within plus or minus 2 standard deviations of the mean. In this example, two standard deviations = 4 inches. Therefore, 95% of young men will fall between heights of 64 and 72 inches.
Given a normal distribution, what percentage of scores fall within one standard deviation of the mean?
68% of scores fall within one standard deviation of the mean. (95% within two standard deviations; and 99.7% within three standard deviations.)
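The 68 / 95 / 99.7 rule can be checked with a short sketch. This assumes Python's standard-library `statistics.NormalDist` and reuses the height example (mean 68 inches, SD 2 inches); the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Heights of young men from the card above: mean 68 inches, SD 2 inches.
heights = NormalDist(mu=68, sigma=2)

def share_within(dist, k):
    """Return (low, high, proportion) for the band within k SDs of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return low, high, dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high, share = share_within(heights, k)
    print(f"within {k} SD: {low:.0f} to {high:.0f} inches, about {share:.1%}")
```

The exact shares are slightly different from the rounded 68%, 95% and 99.7% used on the cards, which is why the rule is stated as an approximation.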
nominal variable
A categorical variable, also called a nominal variable, is for *mutually exclusive*, but not ordered, categories. For example, your study might compare five different genotypes. You can code the five genotypes with numbers if you want, but the order is arbitrary and any calculations (for example, computing an average) would be meaningless. (e.g., sex, religion, disease, social class, species, etc); variation ratio is the only measure of dispersion available for nominal variables
SPSS cross tabs
A cross-tabulation (or crosstab for short) is a table that depicts the number of times each of the possible category combinations occurred in the sample data. To create a crosstab, click Analyze > Descriptive Statistics > Crosstabs. Row(s): one or more variables to use in the rows of the crosstab(s).
Finding the Median - how?
Arrange the scores from lowest to highest; the middle score is the median. If you have an even number of scores, add the two middle scores and divide by 2! donezo
how to calculate percentage frequency
divide the frequency of the category by the total number of cases and multiply by 100.
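A minimal sketch of a percentage frequency calculation, assuming each category's count is divided by the total number of cases. The survey answers and the function name `percentage_frequencies` are hypothetical.

```python
from collections import Counter

def percentage_frequencies(values):
    """Map each category to its share of all cases, as a percentage."""
    counts = Counter(values)   # frequency of each category
    total = len(values)        # total number of cases
    return {category: count * 100 / total for category, count in counts.items()}

# Hypothetical answers to a pet-preference survey question.
pets = ["Otter", "Sloth", "Otter", "Hedgehog", "Otter", "Sloth", "Otter", "Sloth"]
percents = percentage_frequencies(pets)
print(percents)
```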
Frequency
is the number of times a particular value occurs in a set of data. Usually we would record the frequency of data in a frequency table.
What is the highest point of a frequency polygon?
the mode. In a frequency polygon, the height of each point corresponds to the number of scores at a particular value. Therefore, it is at its highest point at the score that occurs most frequently in the distribution. This is the mode.
Variation Ratio (v)
1 minus the number of cases in the mode divided by the total number of cases • The proportion of cases not in the modal category • Used for nominal-level variables. *variation ratio is the only measure of dispersion for nominal*
The standard deviation of a distribution is...
the square root of the variance
when the mean is higher than the standard deviation that means...
there is low variability M (ary) SD L (orne)
Variation Ratio. *Find the MODE first - line up the scores and whichever occurs most often is the MODE. With interval-ratio you find the most frequently occurring score; with nominal and ordinal you find the category with the highest frequency or %*
v = 1 − (fm / n), where fm = the number of cases in the mode and n = the total number of cases
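The formula v = 1 − (fm / n) can be sketched as follows. The category labels are placeholder data and `variation_ratio` is an illustrative name, not a standard library function.

```python
from collections import Counter

def variation_ratio(values):
    """v = 1 - (fm / n): the proportion of cases NOT in the modal category."""
    counts = Counter(values)
    fm = max(counts.values())  # number of cases in the mode
    n = len(values)            # total number of cases
    return 1 - fm / n

# Placeholder nominal data: 6 cases in category A, 3 in B, 1 in C.
answers = ["A"] * 6 + ["B"] * 3 + ["C"]
v = variation_ratio(answers)
print(round(v, 2))
```

A value near 0 means almost every case sits in the modal category; larger values mean the cases are spread across more categories.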
Population Variance and Standard Deviation vs Sample Variance and Standard Deviation
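We use the variance and standard deviation to find the spread of the scores. A high number means there is lots of variation, and of course, a low number means everything is squished together. The population versions divide the sum of squared deviations by the number of cases (n); the sample versions divide by (n-1).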
x avg (Xbar) n
x = one value in your set of data avg (Xbar) = the mean (average) of all values x in your set of data n = the number of values x in your set of data
Nominal levels
• Allow for only qualitative classification • Scores are different from each other but cannot be treated as numbers • Examples: • Sex • 1 = Female, 2 = Male • Immigrant Status • 1 = Canadian-born, 2 =Foreign-born • Grades • 1 = Pass, 2 = No pass (fail)
3. What is your approximate height in centimeters?
• Continuous • Interval-ratio. It is continuous because height can be subdivided infinitely. It is interval-ratio because the scores are actual numbers with equal, meaningful intervals between them.
1. What is your age in years?
• Continuous • Interval-ratio. It is interval-ratio because the scores are actual numbers with equal, meaningful intervals between them. It is continuous because age can be subdivided infinitely.
5. How many siblings do you have?
• Discrete • Interval-ratio
10. Which federal political party do you most identify with? • Bloc Quebecois • Conservative Party of Canada • Liberal Party of Canada • New Democratic Party (NDP) • Green Party • Other
• Discrete • Nominal
6. Of the following three choices, which type of animal would you prefer for a pet? • Otter • Hedgehog • Sloth
• Discrete • Nominal
7. Would you rather be attacked by a big bear or a swarm of bees?
• Discrete • Nominal
2. How would you describe the gender that you most identify with? • Male • Female • Other
• Discrete • Nominal. It is nominal because the categories have no order and any numbers assigned to them carry no meaning. It is discrete because the categories cannot be subdivided.
8. How often do you pay with cash (compared to a credit or debit card) when making a purchase? • Never • Rarely • Sometimes • Frequently • Always
• Discrete • Ordinal
9. How likely is it that intelligent life exists on other planets? • 1 - Not at all likely • 2 - Not very likely • 3 - Somewhat likely • 4 - Very likely
• Discrete • Ordinal
2A - conveying data and findings. vocabulary, formulas
...
2B - Measures of Central Tendency and Dispersion, Vocabulary, Formulas
...
2C - Measures of Central Tendency and Dispersion, Vocabulary, Formulas
...
Identify whether each question will produce: • A discrete or continuous variable and • Whether the level of measurement should be nominal, ordinal, or interval-ratio • And be able to explain why
...
The variables that we collect in our data can be (1) independent or dependent, depending on our analysis; (2) discrete or continuous; and (3) nominal, ordinal, or interval-ratio
...
Many schemes used to classify variables including:
1 Independent or dependent variables 2 Discrete or continuous variables 3 Nominal, ordinal, or interval-ratio variables (or levels of measurement)
what to select for nominal, ordinal and interval ratio levels of descriptive statistics
1. Nominal data - select this if your values are categories (e.g., sex, religion, disease, social class, species, etc); 2. Ordinal data - select this if your values are a series of ranks, such as in the case of a motor racing result (first, second, third, fourth, etc) or class standing. 3. Interval/ratio data - select this if your values are numerical, where each interval (e.g., 1 metre, 1 second, 1 inch, 1 correct answer, etc) is the same size, no matter where it is located on the scale (consider, for example, that 60 inches is exactly 10 inches longer than 50 inches, and that this difference is the same length as the distance between 30 inches and 20 inches). **Nominal - labeling categories (Male-1 female-2 ... Jewish-1 protestant-2 ... Jewish-3) **Ordinal - greater vs lesser (SES - upper, middle, lower ... AGE - old, middle aged, young ... GPA - high, moderate, low) **Interval/Ratio - numerical values (actual amount of annual income)
normal distribution/skewness
A normal distribution of data means that most of the examples in a set of data are close to the "average," while relatively few examples tend to one extreme or the other. The *mode* is always at the peak - the highest point on a distribution, even if skewed. The *MEAN* is most affected by outliers, so it is pulled furthest toward the tail: it is the largest of the three in a positive skew and the smallest in a negative skew, which leaves the median in the middle, between the two (most often). A skew tailing off to the right (the high side) is a positive skew. https://www.youtube.com/watch?v=xpbYKaEbcPA
ordinal variable
An ordinal variable is one where the *order matters* but not the difference between values (first, second, third, fourth, etc). For example, you might ask patients to express the amount of pain they are feeling on a scale of 1 to 10. A score of 7 means more pain than a score of 5, and that is more than a score of 3. But the difference between the 7 and the 5 may not be the same as that between 5 and 3. The values simply express an order. Another example would be movie ratings, from * to *****.
ratio variable
A ratio variable has all the properties of an interval variable, and also has a *clear definition of 0.0. When the variable equals 0.0, there is none of that variable*. Variables like height, weight, enzyme activity are ratio variables. *Temperature, expressed in F or C, is not a ratio variable.* A temperature of 0.0 on either of those scales does not mean 'no heat'. However, temperature in Kelvin is a ratio variable, as 0.0 Kelvin really does mean 'no heat'. Another counterexample is pH. It is not a ratio variable, as pH=0 just means 1 molar of H+, and the definition of molar is fairly arbitrary. A pH of 0.0 does not mean 'no acidity' (quite the opposite!). When working with ratio variables, but not interval variables, you can look at the ratio of two measurements. A weight of 4 grams is twice a weight of 2 grams, because weight is a ratio variable. A temperature of 100 degrees C is not twice as hot as 50 degrees C, because temperature C is not a ratio variable. A pH of 3 is not twice as acidic as a pH of 6, because pH is not a ratio variable.
If the standard deviation of a distribution is exactly zero...
All the scores are exactly the same. There is no deviation because the scores are all the same. To deviate is to "depart from". And in this case, there is no deviation.
confidence intervals (Z) 90% 95% 99% 99.9%
Alpha: 90% = .10, Z score = 1.65; 95% = .05, Z score = 1.96; 99% = .01, Z score = 2.58; 99.9% = .001, Z score = 3.29
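The Z scores on this card can be reproduced from alpha with a short sketch. It assumes two-tailed intervals and uses the standard library's `statistics.NormalDist`; the helper name `critical_z` is made up here.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-tailed critical Z score for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%} confidence: alpha = {1 - level:.3f}, Z = {critical_z(level):.3f}")
```

The 90% value comes out at about 1.645, which the card rounds up to 1.65.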
4. Approximately how many hours per week do you spend studying?
Continuous • Interval-ratio
Cumulative Frequency Distribution
Technically, a cumulative frequency distribution gives, for each class, the sum of the frequency of that class and of all classes below it in a frequency distribution. *All that means is you're adding up a value and all of the values that came before it*
Cumulative frequency
Cumulative frequency is used to determine the number of observations that lie above (or below) a particular value in a data set. ... The cumulative frequency is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors. So, take the first frequency, add the second to it, then add the third to that running total, and keep going until every frequency has been added. Frequency is the number of times a particular value occurs in a set of data. Usually we would record the frequency of data in a frequency table.
Discrete and Continuous
Discrete variables: measured in units that cannot be subdivided • Examples: marital status, country of birth Continuous variables: measured in units that can be subdivided (often infinitely) • Examples: age in years, income in dollars
how to calculate the standard deviation
For each value x, subtract the overall avg (Xbar) from x, then multiply that result by itself (otherwise known as determining the square of that value). Sum up all those squared values. Then divide that result by (n-1). Got it? Then, there's one more step... find the square root of that last number. That's the standard deviation of your set of data. This is one way of computing it. Sometimes, you divide by (n) instead of (n-1): use (n) when your data are the whole population and (n-1) when they are a sample.
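The steps above can be sketched as follows, with a flag for the choice between dividing by (n-1) and by (n). The scores are made-up illustration data and `standard_deviation` is a hypothetical helper name.

```python
from math import sqrt

def standard_deviation(values, sample=True):
    """Standard deviation dividing by n-1 (sample) or by n (population)."""
    n = len(values)
    mean = sum(values) / n                              # avg (Xbar)
    squared_deviations = [(x - mean) ** 2 for x in values]
    divisor = n - 1 if sample else n
    return sqrt(sum(squared_deviations) / divisor)

scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(scores, sample=True))   # divides by n-1
print(standard_deviation(scores, sample=False))  # divides by n
```

Dividing by (n-1) always gives a slightly larger result than dividing by (n), and the gap shrinks as n grows.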
What does variance measure?
How far a set of numbers are spread out from their mean.
Range
Range (R) = High Score - Low Score • Can be used with ordinal or interval-ratio variables • Limitations because based on only two scores
Range
Range = Maximum value - Minimum value. Therefore, the range of the four values {3, 5, 5, 7} would be 7 - 3, or 4. Not a very good measure because it is based only on the two most extreme scores, so a single major outlier can distort it.
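A small sketch of the range and of how much one extreme score can change it. The extra score of 40 is a made-up outlier added for illustration.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

scores = [3, 5, 5, 7]
with_outlier = scores + [40]  # hypothetical extreme score

print(value_range(scores))
print(value_range(with_outlier))
```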
cumulative percent
The cumulative percent is the sum of all the percentage values up to that category, as opposed to the individual percentages of each category.
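Cumulative frequency and cumulative percent are both running totals, which can be sketched with `itertools.accumulate`. The categories and frequencies below are invented for illustration.

```python
from itertools import accumulate

# Hypothetical frequency table for an ordered (ordinal) variable.
categories = ["Never", "Rarely", "Sometimes", "Frequently", "Always"]
frequencies = [2, 5, 8, 4, 1]

total = sum(frequencies)
percents = [f * 100 / total for f in frequencies]

# Each entry is its own frequency plus all the frequencies before it.
cumulative_frequencies = list(accumulate(frequencies))
cumulative_percents = [cf * 100 / total for cf in cumulative_frequencies]

for row in zip(categories, frequencies, percents, cumulative_frequencies, cumulative_percents):
    print(row)
```

The last cumulative frequency always equals the total number of cases, and the last cumulative percent is always 100.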
What is the Interquartile range?
The distance between the first and third quartile. The interquartile range is calculated by subtracting Q1 from Q3, which makes it the distance between the first and third quartile.
Interquartile range
The interquartile range (Q) is a measure of variability, based on dividing a data set into quartiles. OR The difference between the largest and smallest values in the middle 50% of a set of data. Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively. Q1 is the "middle" value in the first half of the rank-ordered data set. Q2 is the median value in the set. Q3 is the "middle" value in the second half of the rank-ordered data set. The 3rd quartile is the *75th* percentile. The interquartile range is equal to Q3 minus Q1.
mean, median, mode
To find the *mean*, add up the values in the data set and then divide by the number of values that you added. To find the *median*, list the values of the data set in numerical order and identify which value appears in the middle of the list. To find the *mode*, identify which value in the data set occurs most often.
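All three measures are available in Python's standard-library `statistics` module. This sketch uses made-up scores.

```python
from statistics import mean, median, mode

scores = [3, 5, 5, 7, 10]

print(mean(scores))    # sum of the values divided by how many there are
print(median(scores))  # middle value once the scores are sorted
print(mode(scores))    # value that occurs most often

# With an even number of scores, the median is the average of the two middle ones.
even_scores = [3, 5, 7, 10]
print(median(even_scores))
```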
standard deviation
a quantity calculated to indicate the extent of deviation for a group as a whole. also represented by the Greek letter sigma σ
Population Variance
Variance • The average of squared deviations from the mean 1. calculate the mean! (add the scores on the left, then divide by number of cases) 2. then subtract mean from each score (listed on left) 3. square them ("deviations") 4. add them 5. divide by number of cases 6. to find standard deviation, take the square root of the variance
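The numbered steps can be followed one at a time in code and then cross-checked against the standard library's `statistics.pvariance`. The five scores are made-up illustration data.

```python
from math import sqrt
from statistics import pvariance

scores = [1, 2, 3, 4, 5]

mean = sum(scores) / len(scores)          # 1. calculate the mean
deviations = [x - mean for x in scores]   # 2. subtract the mean from each score
squared = [d ** 2 for d in deviations]    # 3. square the deviations
sum_of_squares = sum(squared)             # 4. add them
variance = sum_of_squares / len(scores)   # 5. divide by the number of cases
std_dev = sqrt(variance)                  # 6. square ROOT of the variance

print(variance, std_dev)
print(pvariance(scores))  # library cross-check of step 5
```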
Central Tendency and Dispersion
We use Central Tendency to find the typical or middle score of a variable. For example, we use MODE to find the most common category (ie, Roman Catholic; we always name the category instead of listing its number or percentage). We use median to describe the midpoint (ie: the midpoint for this variable is 25.1%). We use mean to describe the arithmetic average of the variable. We use dispersion to find out the spread of the scores. For a positive skew, it goes: MODE, MEDIAN, MEAN (lowest to highest) because MEAN is pulled toward the outliers; for a negative skew the order reverses: MEAN, MEDIAN, MODE. -In a negative skew, the tail is to the left: A distribution with a few extremely low values -In a positive skew, the tail is to the right: A distribution with a few extremely high values
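As a rough check on how outliers move the mean, this sketch compares the three measures for one data set with a single extremely high value and one with a single extremely low value. Both data sets are invented for illustration.

```python
from statistics import mean, median, mode

# A few extremely high values: positive skew.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 10]
# A few extremely low values: negative skew.
negative_skew = [1, 7, 8, 8, 9, 9, 9, 10]

for name, data in (("positive", positive_skew), ("negative", negative_skew)):
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

In both cases the median lands between the mode and the mean, and the mean is the measure dragged furthest toward the extreme score.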
Mean and Median (measures of central tendency)
When statisticians talk about the mean of a population, they use the Greek letter *μ* to refer to the mean score. When they talk about the mean of a sample, statisticians use the symbol *X̄* (X-bar) to refer to the mean score. The MEAN is found by adding the scores and dividing by the total number of scores. The mean can be used with both discrete and continuous (age in years, income in dollars) data, although it is most often used with continuous data. The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. When outliers are present, it is therefore best to report the median, or to report both together to see how the distribution is skewed. The median may be a better indicator of the most typical value if a set of scores has an outlier. An outlier is an extreme value that differs greatly from other values. However, when the sample size is large and does not include outliers, the mean score usually provides a better measure of central tendency. You get the median by arranging the numbers from lowest to highest and then finding the middle number. If there is an EVEN number of scores, add the two middle numbers and divide by 2.
Does all data have a median, mode and mean?
Yes and no. All *interval-ratio* (age in years, income in dollars) data has a median, mode and mean. However, strictly speaking, *ordinal* data has a *median and mode* only... and *nominal* data has only a *mode* (no order to the scores and there is no difference between scores - yes and no type answers). which makes sense :)
x, y
____ is along bottom, ____ is along the side
Standard Deviation for sample
take the square root of the sample variance
What is true about a perfectly symmetrical normal distribution?
the mean, median and mode are all the same value.
Interquartile Range
• Distance from the third quartile (Q3) to the first quartile (Q1). Subtract Q1 (always the lower quartile) from Q3 (always the higher). EVEN number of scores - first identify the midpoint, then find the median of the lower half (Q1) and the median of the upper half (Q3), then subtract Q1 from Q3. ODD number of scores - identify the middle number and cross it out (don't use it), then take the median of each remaining half (the middle score, or the average of the middle two), then subtract. ie: 3, 4, 5, *7*, 8, 11, 11 (cross out 7); the lower half 3, 4, 5 gives Q1 = 4 and the upper half 8, 11, 11 gives Q3 = 11. Subtract Q1 (4) from Q3 (11); this equals 7! easy peasy lemon squeezy. Q1 = 4, Q2 = 7, Q3 = 11, IQR = *7*
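A minimal sketch of the halves method described on this card: sort the scores, set aside the middle score when the count is odd, and take the median of each half. Other quartile conventions exist and can give slightly different answers, so this is one approach under that assumption; `interquartile_range` is an illustrative name.

```python
from statistics import median

def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the median-of-each-half method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # scores below the midpoint
    upper = ordered[-half:]   # scores above the midpoint (middle score skipped if odd)
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1

# The seven scores used on the card.
print(interquartile_range([11, 11, 8, 7, 5, 4, 3]))
# An even number of scores.
print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))
```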
Variance and Standard Deviation
• Measure the degree of dispersion of the data from the mean • Measures of variation for interval-ratio variables • Variance • The average of squared deviations from the mean • Standard deviation • The square root of the variance
Levels of measurement
• Nominal • Ordinal • Interval-ratio
Interval-Ratio Level
• Scores are actual numbers and have equal intervals between them • *Can be discrete or continuous* • Examples • Age (in years) • Income (in dollars) • Number of children
Ordinal level variables
• Scores can be *ranked* from high to low or from more to less • Survey items that measure opinions and attitudes are typically ordinal • If you can *distinguish between the scores of the variable using terms such as "more, less, higher, or lower"* the variable is ordinal • Examples: • Students at a university were asked "Do you agree or disagree that smoking should be banned on campus?" • 1 = Strongly disagree, 2 = Disagree, 3 = Agree, 4 = Strongly agree • Grades • 5 = A, 4 = B, 3 = C, 2 = D, 1 = F
Standard Deviation for population
• Standard deviation • The square root of the variance (to find it, take the square root of the population variance)
Measures of Dispersion or Variability
• The variety, diversity, and amount of variation between scores • Variation ratio • Range and interquartile range • Variance and standard deviation
Measures of Dispersion
• Variation Ratio • Range • Interquartile Range • Variance and Standard Deviation