Statistics & Research Methods Exam One
T or F: When box plots are used to display means, they also provide information about the interquartile ranges and outliers.
False
What are the steps in sum of squares?
1. First calculate the mean and subtract it from each score to obtain the deviation scores 2. Square each deviation score 3. Now sum all of the squared deviation scores
In the computational formula for the sum of squares what are the steps?
1. First calculate the sum of all of the X scores (ΣX) 2. Next, square each score and add the squared scores together to get a sum (ΣX^2) 3. Finally, substitute these values, along with N (the number of cases), into the computational formula SS = ΣX^2 - (ΣX)^2 / N and solve for the sum of squares
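As a rough illustration, the definitional steps (deviations from the mean, squared, summed) and the computational formula can be sketched side by side in Python. The scores are hypothetical, and the function names are made up for this sketch; for the same data both routes should give the same sum of squares.

```python
def ss_definitional(scores):
    """SS as the sum of squared deviations from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # signed deviation scores
    return sum(d ** 2 for d in deviations)    # square each, then sum

def ss_computational(scores):
    """SS = ΣX^2 - (ΣX)^2 / N."""
    n = len(scores)
    sum_x = sum(scores)                       # step 1: ΣX
    sum_x_sq = sum(x ** 2 for x in scores)    # step 2: ΣX^2
    return sum_x_sq - sum_x ** 2 / n          # step 3: substitute with N

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(ss_definitional(scores), ss_computational(scores))  # 32.0 32.0
```

The computational version avoids computing every deviation score, which is why it is convenient by hand with large data sets.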
Three questions for presenting grouped data
1. How many groups should be reported? 2. What should the interval size be for each group? 3. What should the beginning value of the lowest interval be? No strict rules govern these issues, but these are the central questions to ask. The textbook specifies particular answers to them, but those answers do not necessarily apply to all cases.
Steps of finding the interquartile range
1. Order the scores from lowest to highest 2. Divide the ordered scores into four quartiles with an equal number of scores in each (e.g., three scores each when there are 12 scores) 3. Eliminate the bottom and top quartiles, leaving only the middle two quartiles 4. Calculate the range of what remains: the highest remaining score minus the lowest remaining score
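A minimal Python sketch of these trimming steps, assuming the number of scores is a multiple of 4 so each quartile holds the same number of scores. The function name and data are hypothetical. Statistical software often uses percentile interpolation instead, which can give slightly different values.

```python
def iqr_trimmed(scores):
    """Range of the middle 50% after trimming the bottom and top 25%.

    Assumes N is a positive multiple of 4 (e.g., 12 scores, three per quartile).
    """
    n = len(scores)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes N is a positive multiple of 4")
    ordered = sorted(scores)           # step 1: lowest to highest
    k = n // 4                         # step 2: scores per quartile
    middle = ordered[k:n - k]          # step 3: drop bottom and top quartiles
    return max(middle) - min(middle)   # step 4: range of what remains

scores = [10, 3, 8, 12, 5, 40, 7, 9, 11, 6, 15, 8]  # hypothetical; 40 is extreme
print(max(scores) - min(scores), iqr_trimmed(scores))  # 37 4
```

The comparison with the ordinary range shows why the IQR is less sensitive to a single extreme score.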
Identify each of the following as a quantitative or qualitative variable. 1. age 2. gender 3. eye color
1. Quantitative 2. Qualitative 3. Qualitative
Rounding a mean to ____ or _____ decimal places provides a better measure of where scores for a discrete variable tend to cluster, and is more useful than a crude index that rounds the mean to the nearest integer.
2, more
What is the upper real limit of 20?
20.5
What can a box plot also be referred to as?
A box and whiskers plot
What is another type of graph for displaying information about central tendency and variability?
A box plot
X with a bar over it is typically used to represent what?
A sample mean
Any outliers would be indicated by dots above or below the whiskers as is done with box plots of what type?
All types of box plots
The standard deviation thus represents?
An average deviation from the mean
Why does the median qualify as a measure of central tendency?
Because it is closer to the scores in a distribution than any other value in an absolute sense
Why does the mode qualify as a measure of central tendency?
Because it represents a typical case
If you want to make the smallest total absolute error across all scores, your best guess is the median. Why?
Because you are worried about the SIZE of the error, and the median minimizes absolute or unsigned error across all scores.
Stem and Leaf plot
Best for small sets of data. The digits to the left of the rightmost digit form the stem, and each rightmost digit forms a leaf. Use when the number of stems is more than 1 but less than 20. These plots provide a compact way of conveying both individual scores and the general "shape" of a frequency distribution.
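A small Python sketch of building a stem and leaf plot, assuming non-negative whole-number scores so that the stem is everything but the last digit (x // 10) and the leaf is the last digit (x % 10). The data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for x in sorted(scores):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [35, 23, 47, 31, 38, 27, 51, 35, 42]  # hypothetical
plot = stem_and_leaf(scores)
for stem in range(min(plot), max(plot) + 1):   # include empty stems, if any
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
# 2 | 37
# 3 | 1558
# 4 | 27
# 5 | 1
```

Turning the printed plot on its side shows the same shape a frequency histogram would.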
In a box plot graph how is the IQR represented?
By the height of the box
Distributions of scores can have identical variability, but very different ______ _______
Central tendencies
Bar Graphs
Charts that represent information using a series of vertical or horizontal bars
A line plot is constructed exactly like a frequency polygon except that it is not " __________"
Closed
Absolute frequencies
Commonly referred to simply as "frequencies", is a statistical term describing the number of times a particular piece of data or a particular value appears during a trial or set of trials
Third Approach to computing the median DUPLICATION OF MIDDLE SCORES
Computation of the median when the middle score(s) are duplicated. This formula assumes that the median lies within the real limits of the middle score(s), and it applies whether there is an even or odd number of scores. 1. Order the scores from lowest to highest 2. Identify the middle score(s) as in the first approach (even number of scores) 3. Think in terms of real limits: the median must be placed within the real limits of the duplicated middle score so that exactly half of the scores fall below it 4. Determine how far above the lower real limit the median must lie 5. Plug the values into the median formula 6. Solve for the answer. Ex: given 8, 6, 9, 6, 10, 8, 9, 6, 10, 8: 1. 6, 6, 6, 8, 8, 8, 9, 9, 10, 10 2. The two middle scores are both 8 3. LRL = 7.5, URL = 8.5; three scores (the 6s) fall below 7.5, so two more of the three 8s must fall below the median to reach five of the ten scores 4. Two-thirds (.67) of the way through the interval: 7.5 + .67 = 8.17 5. MDN = 7.5 + [((10)(.50) - 3) / 3](1.0) 6. Answer = 8.17
First Approach to computing the median EVEN NUMBERED
Computation of the median when there is an EVEN number of scores. 1. Order the scores from lowest to highest 2. The median is the average of the two middle scores: add them together and divide by 2 3. Solve. EX: given 20, 26, 25, 27, 22, 18 1. 18, 20, 22, 25, 26, 27 2. (22 + 25) / 2 = 23.5 3. Answer = 23.5
Second Approach to computing the median ODD NUMBERS
Computation of the median when there is an ODD number of scores. 1. First, order the scores from lowest to highest 2. The median is simply the middle score; to define it precisely, think in terms of its REAL LIMITS 3. Take the real limits of the middle score and divide that interval in half to locate the median 4. Solve. Ex: given 6, 16, 12, 14, 6, 8, 16 1. 6, 6, 8, 12, 14, 16, 16 2. The middle score is 12 (LRL = 11.5, URL = 12.5) 3. The midpoint of 11.5 to 12.5 is 12 4. Answer = 12
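The first two approaches reduce to a few lines of Python. This is a sketch that ignores duplicated middle scores (the third approach handles those with real limits); the even-N data set here is hypothetical.

```python
def median_simple(scores):
    """Middle score for odd N; average of the two middle scores for even N."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("no scores")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_simple([6, 16, 12, 14, 6, 8, 16]))  # 12 (odd-N example)
print(median_simple([16, 18, 20, 22, 25, 26]))   # 21.0 (hypothetical even-N set)
```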
Frequency polygons are typically utilized when the variables being reported are _________ in nature, whereas frequency histograms are typically utilized when the variables being reported are ________.
Continuous, discrete
Clearly there will typically be much greater variability in the first group (all men in America) than in the second group (college-aged men), because the individuals in the first group are much more diverse on characteristics such as age, nationality, gender, and education, and these characteristics should influence many different attitudes and behaviors. In fact, individual differences of this type are an important type of ______ ________
Disturbance Variable
If there were no _____ ______, the scores within a given group would all be the same. The greater the role of ______ ______, the more variability there will be, and this will be reflected in things such as a larger variance and standard deviation.
Disturbance Variables
The fact that members of a given group have different standing on a particular dimension is due to ________ _______.
Disturbance variables
The median is the point that...
Divides the distribution into halves
Maximum variability occurs when the number of individuals is _______ ______ among the possible categories, for example, when there are equal numbers of boys and girls.
Evenly divided
T or F: These deviation scores sum to ONE. If any value other than the mean were subtracted from the set of scores, the sum of the signed deviation scores would be greater than ONE in absolute value. This is true for any set of scores: the sum of signed deviations from the mean will always equal ONE.
FALSE, they will always sum to ZERO
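A quick numeric check in Python, using hypothetical scores whose mean is 5: signed deviations from the mean sum to zero, while signed deviations from another value (4 here, chosen arbitrarily) do not.

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean = sum(scores) / len(scores)   # 5.0

# Signed deviations from the mean always sum to zero.
signed_from_mean = sum(x - mean for x in scores)
# Signed deviations from any other value (here 4) do not.
signed_from_other = sum(x - 4 for x in scores)

print(signed_from_mean, signed_from_other)  # 0.0 8
```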
T or F: The sum of a set of signed deviations around the mean will always equal 1.00
False
T or F: Measures of variability can help to interpret measures of standard deviation.
False, it is that measures of variability can help to interpret measures of CENTRAL TENDENCY
A stem and leaf plot is similar to a
Frequency histogram turned on its side
It is traditional to denote indexes derived from populations with what kind of letters?
Greek letters
The symbols for population variance and population standard deviation are ?
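Greek sigmas: σ^2 (sigma squared) for the population variance and σ (lowercase sigma) for the population standard deviation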
Platykurtic Distribution
Has a flat peak and short, steep tails compared to a normal distribution
Leptokurtic Distribution
Has a sharp peak, and long, flat tails compared to a normal distribution
Frequency histograms and frequency polygons can be constructed for grouped as well as ungrouped scores. When the scores are grouped, the abscissa might list the midpoints of the score intervals rather than the _______ _______ _______
Individual score values
Usually we are not interested in how the scores themselves vary, but rather what?
Instead, how the phenomenon that the scores supposedly represent varies.
Mean
It is simply the arithmetic average of the scores, that is, the sum of all the scores in the data set divided by the total number of scores. Written formally as X̄ = ΣX / N
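A minimal Python sketch of X̄ = ΣX / N, with hypothetical income-like data (in thousands) to show how one extreme score pulls the mean much more than the median. `statistics.median` is from the standard library; the custom `mean` function is written out only to mirror the formula.

```python
from statistics import median

def mean(scores):
    """X̄ = ΣX / N."""
    return sum(scores) / len(scores)

incomes = [30, 32, 35, 38, 40]   # hypothetical, in thousands
with_extreme = incomes + [500]   # add one extreme score

print(mean(incomes), median(incomes))            # 35.0 35
print(mean(with_extreme), median(with_extreme))  # 112.5 36.5
```

One extreme score moves the mean from 35.0 to 112.5 but moves the median only from 35 to 36.5, which is why variables like income are often reported as medians.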
A _______ in the body of the graph identifies the groups
Legend
Distributions are often said to be ________ or _______ relative to a normal distribution
Leptokurtic, platykurtic
_____ ______ are particularly useful when one or more research participants receive the lowest or highest score possible on a particular measure.
Line plots
Formula for computing median
MDN = L + [((N)(.50) - nL) / nW](i), where MDN = median, L = lower real limit of the category that contains the median, N = total number of scores in the distribution, nL = number of scores that are less than L, nW = number of scores within the category that contains the median, and i = size of the interval of the category that contains the median
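The formula can be sketched in Python as follows. Two assumptions are built in and should be treated as such: each score X has real limits X - i/2 and X + i/2, and the category containing the median is taken to be the first score value whose cumulative frequency reaches N/2 (a common convention for grouped medians). The function name is hypothetical.

```python
from collections import Counter

def median_interpolated(scores, i=1.0):
    """MDN = L + [((N)(.50) - nL) / nW](i).

    Assumes real limits of X are X - i/2 and X + i/2, and that the median
    category is the first value whose cumulative frequency reaches N/2.
    """
    counts = Counter(scores)
    half = len(scores) * 0.50
    n_below = 0                            # nL: scores below the current value
    for value in sorted(counts):
        n_within = counts[value]           # nW
        if n_below + n_within >= half:
            lower_limit = value - i / 2    # L
            return lower_limit + ((half - n_below) / n_within) * i
        n_below += n_within
    raise ValueError("no scores")

example = [8, 6, 9, 6, 10, 8, 9, 6, 10, 8]       # third-approach example
print(round(median_interpolated(example), 2))   # 8.17
```

For the odd-N example (6, 16, 12, 14, 6, 8, 16), the same function returns 12.0.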
The ________ is the arithmetic average of a set of numbers.
Mean
Which measure of central tendency is more influenced by extreme scores?
Mean
Because the _______ is sensitive to extreme scores, variables like income that tend to include a meaningful percentage of very low or very high scores are often reported in terms of modes, and especially medians, rather than ________.
Mean, means
The frequency histogram tends to highlight?
The frequency of specific scores rather than the shape of the entire distribution
If we subtract the ______ from each score and retain the signs of the resulting differences, we get a set of ________ deviation scores
Mean, signed
The most frequently used descriptive statistics are the _____, _____, and _______
Mean, variance, and standard deviation
Kurtosis
A measure of the flatness of the tails of a probability distribution relative to those of a normal distribution. It indicates the likelihood of extreme outcomes. It also reflects how peaked or flat the distribution is relative to its tails.
Quantitative Measures
Measures that permit expression of various amounts of something, such as a trait. Variables that are measured on an ordinal, interval or ratio level.
The closest concept to central tendency for a qualitative variable is the ______ _______, that is, the category that occurs most frequently
Modal category
Which of the following measures is appropriate for use with qualitative variables?
Modal category
If you were to want the highest probability of being exactly correct, your best guess would be to use the value of the _____ because it is the most frequently occurring score and has the highest probability of predicting each score exactly.
Mode
When a quantitative variable is measured on an ordinal level that is far from having interval characteristics, the mean is NOT an appropriate index of central tendency, and the _____ or _____ should be used instead.
Mode, median
What are the three measures of central tendency?
Mode, median, and mean
Can the sum of squares, the variance, or the standard deviation ever be negative?
No
What are the four types of measurement?
Nominal, ordinal, interval, ratio
One very important type of theoretical distribution that is being studied by statisticians is the ______ ________
Normal distribution
What is one problem with the sum of squares being used as an index of variability?
One problem with the sum of squares as an index of variability is that its size depends not only on the amount of variability among scores, but also on the number of scores.
A double slash should break the abscissa anytime the abscissa "jumps" from zero to a larger number such that it is not drawn to scale. The same holds true for the ______
Ordinate
A case that shows very extreme scores relative to the majority of the cases in the data set is known as an
Outlier
In probability, P = ? and A = ?
P= probability A= the given event
In a graph of distribution, the modal score has the highest "_______" in the graph?
Peak
Frequency graphs can also be constructed for ________ variables
Qualitative
When a ___________ variable is measured on a level that at least approximates interval characteristics, all three measures of central tendency are ________.
Quantitative, meaningful
The frequency of a score in comparison to the total number of scores in the group is called a
Relative frequency
s^2 and s are symbols for what?
Sample variance and sample standard deviation
Sample
Simply a subset of a population
What are two other dimensions on which distributions of scores can differ?
Skewness and Kurtosis
Why would you prefer to use the IQR as opposed to the range?
Some researchers prefer to use the IQR rather than the range because the IQR is not sensitive to distortions from extreme cases. Unlike the range, the IQR is not biased by one extreme score.
How are line plots sometimes structured?
Sometimes, line plots are structured so that each vertical line encompasses a full standard deviation.
Standard Deviation Formula
Square root of the variance
The ______ _______ is the most easily interpreted measure of variability among a set of scores. Recall that the variance is the mean squared deviation score.
Standard Deviation
The "typical" sample has a relatively small ______ ________
Standard deviation
The vertical lines on a line plot convey information about the _______ associated with each mean
Standard deviation
Deviation scores
Such scores are calculated by subtracting some constant from each score in a set of scores. These deviations are said to be unsigned when their absolute values are taken.
One index of variability that considers all of the scores in a data set is the ____ __ _______?
Sum of squares
The standard deviation is found by?
Taking the positive square root of the variance
Median
The ______ is the point in the distribution of scores that divides the distribution into two equal parts. In other words, 50% of the scores occur below the _______ and 50% of the scores occur above the ______. When it is used as a measure of central tendency, the "middlemost" score serves as a representative value for the set of scores.
Mode
The ______ of a distribution of scores is the most easily computed index of central tendency. It is simply the score that occurs most frequently.
What is the interquartile range?
The difference between the highest and lowest scores (hence, it is a range) after the top 25% of the scores and the bottom 25% of the scores have been trimmed or eliminated from the data set. In short, it is the range of the middle 50% of the scores.
What is the symbol for a population mean?
The Greek letter μ, or "mu"
Which measure of central tendency is the balance point of a distribution, a property that is its reason for being a measure of central tendency?
The mean
If your goal was to make the signed error as close as possible to zero, then what would your best measure of central tendency to use be?
The mean, because across all scores, the sum of signed error from the mean will always equal zero.
What is the setup of a graph of central tendency and variability?
The names of the levels of the independent variable appear on the abscissa, and each level of this variable is represented by a bar that extends to the height on the ordinate that corresponds to the mean score on the dependent variable. Unlike frequency graphs, the ordinate for this type of graph does not need to start at zero.
To avoid a potential problem or inconsistency, measures of variability should take into account what?
The number of cases in the data set.
Dependent Variable
The outcome factor; the variable that may change in response to manipulations of the experiment. It is the variable being tested and measured in an experiment, and is '__________' on the other variable.
Standard Deviation
The positive square root of the variance; the sample standard deviation is denoted by the lowercase letter s
Under what condition is the range a misleading index of variability?
The range is a misleading index of variability when there is an extreme score in a set of scores that are otherwise similar to each other.
The frequency polygon tends to highlight what?
The shape of the entire distribution more so than does the frequency histogram
What do the parts of a box plot represent?
The small square in the middle of the rectangular box represents the mean, the top of the box represents one standard deviation above the mean, and the bottom of the box represents one standard deviation below the mean. The "T's" extending away from the box, referred to as "whiskers," mark the criteria used to define outliers. The solid dots above and below the whiskers reflect outlying scores.
Why is the standard deviation more "interpretable" than the variance ? That is, what is the advantage of reporting statistics in terms of the standard deviation as opposed to the variance?
The standard deviation is more "interpretable" than the variance because it represents an average deviation from the mean in the original unit of measurement. In contrast, the variance is in terms of squared deviation units.
In skewed distributions, the three measures of central tendency will take on what type of values?
The three measures of central tendency all take on different values in skewed distributions
What is the difference between parameters and statistics?
Numerical indexes can be based on data from an entire population or on data from a sample. When such indexes are based on data from an entire population, they are referred to as parameters. When they are based on data from a sample, they are referred to as statistics.
What is the major problem with mode?
There can be more than one modal score. A distribution with more than one mode is called multimodal; one with exactly two modes is called bimodal.
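Python's standard `statistics.multimode` (Python 3.8 or later) returns every most-frequent value, which makes the multimodal problem easy to see. It also works for qualitative categories, where the result is the modal category. The data below are hypothetical.

```python
from statistics import multimode

print(multimode([3, 5, 5, 6, 7]))            # [5]       one mode
print(multimode([2, 2, 4, 6, 6, 9]))         # [2, 6]    bimodal
print(multimode(["red", "blue", "blue"]))    # ['blue']  modal category
```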
What do standard deviations and other measures of variability have?
They can have real-world implications
For quantitative variables what do measures of central tendency indicate?
They indicate where the scores tend to cluster in a distribution
What does the notation above the summation sign tell us to do?
To keep adding up through the individual with that number (the last score included in the sum)
What does the notation below the summation sign tell us to do?
To start adding with the individual who has that number
T or F: (for rounding rule ONE) If the remainder to the right of the decimal place that you wish to round to is greater than 1/2 a measurement unit, increase the last digit kept by 1.
True
T or F: A negatively skewed distribution will be when the "tail" is towards the left, or is facing the negative end of the abscissa. Most scores in negatively skewed distributions occur above the mean and only a relatively few extreme scores occur below it. This is reflected in the fact that the mean of a negatively skewed distribution will always be smaller than the median.
True
T or F: Because the measure is only ordinal, the standard deviation of those scores does not help us appreciate how much variability there is on underlying dimensions.
True
T or F: By taking the square root of variance, we are in essence, eliminating the square and returning it to the original unit of measurement.
True
T or F: Central tendency and variability represent different characteristics of a distribution
True
T or F: In terms of variability for qualitative variables, we cannot meaningfully define a range because the categories cannot be ordered to define a low and a high score.
True
T or F: It is that property- the fact that the median minimizes the absolute difference between it and the scores in the distribution - that qualifies the median as a measure of central tendency
True
T or F: It is understood that X has an i subscript, that i = 1 applies below the summation sign, and that N applies above it.
True
T or F: Positively skewed is when the "tail" is toward the right, or positive end of the abscissa. In positively skewed distributions, most scores occur below the mean and only a relatively few scores occur above it. Thus, the mean will always be larger than the median in a positively skewed distribution.
True
As a general rule, to how many decimal places should you round?
Two decimal places.
In graphs of central tendency and variability what do the bars look like?
Usually the bars for the different groups do not touch one another, to indicate that the means are from different sets of scores. Sometimes the bars are allowed to touch if the independent variable is quantitative, but they are kept apart if the independent variable is qualitative.
Distributions can also have identical central tendencies but very different _________
Variability
A value of 0 for these statistics means that there is no ________ in the scores; they are all the same. As the values of the three statistics become increasingly greater than 0, more ______ among the scores is indicated, other things being equal.
Variability, variability
Dividing the sum of squares by N, that is, computing an average squared deviation from the mean, gives you what?
Variance
Is the standard deviation a type of average?
Yes
Can box plots be used to represent anything aside from means and standard deviations?
Yes, they can also be used to present medians and interquartile ranges. In fact, box plots were originally designed to represent the median and IQR.
What do you need to do in order to get the unsigned deviations from the median?
You must subtract the median from each score and take the absolute values of the resulting differences
The bars do not touch one another because each bar in a ____ _____ represents a distinct variable
bar graph
The use of numerical information to describe a group of scores in a clear, and precise manner is referred to as _______ statistics.
descriptive
If we let the "typical" score be represented by the mean, then we are concerned with what in the sum of squares?
how much each score deviates from the mean
Variability
in a set of numbers, how widely dispersed the values are from each other and the extent to which they vary from the mean
When the mean and median are different, the sum of the squared deviations from the mean will always be ______ than the sum of the squared deviations from the median
less
The more extreme a score is relative to the other scores in a distribution, the more it will alter the ______
mean
The ______ is the value that minimizes the sum of _______ deviations
mean, signed
The ________ is the value that minimizes the sum of ________ deviations
median, unsigned
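These minimizing properties can be checked numerically with a brute-force Python sketch: try candidate values from 0.0 to 10.0 in steps of 0.1 and see which one gives the smallest sum of squared deviations and which gives the smallest sum of absolute (unsigned) deviations. The scores and the grid of candidates are hypothetical choices for illustration.

```python
from statistics import mean, median

scores = [1, 2, 3, 4, 10]  # hypothetical; mean = 4, median = 3

def sum_squared(c):
    return sum((x - c) ** 2 for x in scores)

def sum_absolute(c):
    return sum(abs(x - c) for x in scores)

candidates = [c / 10 for c in range(0, 101)]   # 0.0, 0.1, ..., 10.0
best_sq = min(candidates, key=sum_squared)
best_abs = min(candidates, key=sum_absolute)

print(mean(scores), best_sq)     # mean, and the value minimizing squared deviations
print(median(scores), best_abs)  # median, and the value minimizing unsigned deviations
print(sum_squared(4) < sum_squared(3))  # True: squared deviations smaller around the mean
```

The search lands on 4.0 for squared deviations and 3.0 for unsigned deviations, and the last line illustrates that, when mean and median differ, the sum of squared deviations from the mean (50) is less than from the median (55).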
The ______ is the most frequently occurring score in a distribution
mode
If all individuals fall into a single category for qualitative variables there is ___ _____
no variability
Formula for Probability
p(A) = number of observations favoring event A / total number of possible observations
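A minimal Python sketch of this formula using exact fractions. The card-deck numbers are a hypothetical example (13 hearts in a standard 52-card deck), and the function name is made up for illustration.

```python
from fractions import Fraction

def probability(favoring, total):
    """p(A) = observations favoring A / total possible observations."""
    return Fraction(favoring, total)

p_heart = probability(13, 52)      # drawing a heart from a 52-card deck
print(p_heart, float(p_heart))     # 1/4 0.25
```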
Numerical indexes derived from population data are __________; numerical indexes derived from sample data are _________
parameters, statistics
A second reason for preferring the mean concerns the important role that the means and their sums of squares play in making inferences about?
populations from sample data
The simplest index of variability is the ________, which is the highest score minus the lowest score in a distribution
range
If a variable is continuous, the vertical boundaries of the bar for a score represent the _____ ______ of the score
real limits
Skewness
Refers to the tendency for scores to cluster on one side of the mean, so that one "tail" of the distribution, relative to the central section, is disproportionately long compared with the other tail.
Random sampling is a procedure for generating ________ samples.
representative
Variance Formula
s ^2 = SS / N
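A short Python sketch of s^2 = SS / N and s as its positive square root, with hypothetical scores. Note that this divides by N; the standard library's `statistics.pvariance` and `statistics.pstdev` also divide by N, whereas `statistics.variance` and `statistics.stdev` divide by N - 1.

```python
import math
import statistics

def variance(scores):
    """s^2 = SS / N (divides by N, as in the formula above)."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)
    return ss / n

def std_dev(scores):
    """s = positive square root of the variance."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical; SS = 32, N = 8
print(variance(scores), std_dev(scores))  # 4.0 2.0
print(statistics.pvariance(scores), statistics.pstdev(scores))
```

The standard deviation of 2.0 is back in the original units of the scores, which is what makes it easier to interpret than the variance.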
Stem and leaf plots provide a compact way of conveying both individual scores and the general " ________ " of a frequency distribution
shape
In an investigation on the effect of religious upbringing on moral development, moral development is the
the dependent variable
The sum of squares, the variance, and the standard deviation will always be greater or equal to _____
zero
Σ X(i) ^2 =
Σ X ^ 2
Some summation expressions require a mathematical operation to be applied to each score before the individual results are added together. For ΣX^2, each X score is first squared and then the squares are summed. What does this expression look like when written out?
ΣX^2 = (X1)^2 + (X2)^2 + (X3)^2
(Σ X(i) ) ^2 =
(Σ X) ^2
A third summation expression, (ΣX)^2, is not the same as expression 1.2, because the parentheses signal that the summation operation should be executed first (that is, the X scores should be summed) and then this sum should be squared. What does this expression look like when written out?
(ΣX) ^2 = (X1 + X2 + X3)^2
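With three hypothetical scores X1 = 1, X2 = 2, X3 = 3, a few lines of Python make the difference between the two expressions concrete.

```python
scores = [1, 2, 3]  # X1, X2, X3 (hypothetical)

sum_of_squares = sum(x ** 2 for x in scores)  # ΣX^2 = 1 + 4 + 9
square_of_sum = sum(scores) ** 2              # (ΣX)^2 = (1 + 2 + 3)^2
print(sum_of_squares, square_of_sum)          # 14 36
```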
Frequency Distribution
A useful tool for summarizing a large set of data. This is a compilation of all of the scores in a set of scores and the number of times that each occurs. These distributions are often presented in the form of frequency tables.
The ordinate lists _______ ________ from 0 to the highest frequency that was observed in the study
Frequency Values
What does the X(i) to the right of the summation sign tell us to do?
General term that stands for the individual X scores
Round each of the following numbers to three decimal places a. 4.8932 b. 8.9749 c. 1.4153 d. 4.1450 e. 6.245002 f. 2.615501 g. 6.3155
a. 4.893 b. 8.975 c. 1.415 d. 4.145 e. 6.245 f. 2.616 g. 6.316
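The three rounding rules match what is usually called "round half to even," which Python's standard `decimal` module provides as `ROUND_HALF_EVEN`. This sketch (the helper name is hypothetical) passes the numbers as strings so the written digits are kept exactly. Python's built-in `round()` on floats can surprise here, for example `round(2.675, 2)` gives 2.67 because 2.675 is not exactly representable in binary.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_textbook(value: str, places: int) -> Decimal:
    """Round to `places` decimals: above half a unit goes up, below half
    leaves the last kept digit alone, and exactly half goes to the even digit."""
    quantum = Decimal(10) ** -places   # Decimal('0.001') for 3 places
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)

exercise = ["4.8932", "8.9749", "1.4153", "4.1450", "6.245002", "2.615501", "6.3155"]
rounded = [str(round_textbook(v, 3)) for v in exercise]
print(rounded)
# ['4.893', '8.975', '1.415', '4.145', '6.245', '2.616', '6.316']
```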
Identify each of the following as a variable or a constant. Explain your answer 1. The number of hours in a day 2. People's attitudes toward abortion 3. The country of birth of presidents in the United States 4. The value of a number divided by itself 5. The total number of points scored in a football game
1. A constant because there are always 24 hours in a day 2. A variable because different people have different attitudes toward abortion 3. A constant because all presidents of the United States must be born in the United States 4. A constant because a number divided by itself always equals 1.00 5. A variable because different football games have different total numbers of points
Identify each of the following as a quantitative or qualitative variable. 1. weight 2. religion 3. income
1. Quantitative 2. Qualitative 3. Quantitative
Indicate whether each of the following variables is discrete or continuous 1. grains of sand on a beach 2. height 3. the annual federal budget 4. shyness
1. discrete 2. continuous 3. discrete 4. continuous
Frequency histogram
A bar graph that represents the frequency distribution of a data set. The horizontal dimension of this graph is referred to as the X axis or as the abscissa, and the vertical dimension is referred to as the Y axis or the ordinate
Frequency polygon
A graph of a frequency distribution that shows the number of instances of obtained scores, usually with the data points connected by straight lines. It is similar to a frequency histogram except that solid dots corresponding to the appropriate frequencies are placed directly above the score values, and the dots are connected with lines.
What is a probability distribution? Why is the nature of probability distributions for qualitative and discrete variables different from that for continuous variables?
A probability distribution is a distribution that represents the probabilities associated with all possible score values for a variable. The nature of probability distributions for qualitative variables and discrete quantitative variables differs from that for continuous variables because in the former case it is possible to list all possible values of the variable and their corresponding probabilities. Because the number of values that a continuous variable can have is, in principle, infinite, that is not possible for continuous variables. Instead, probabilities for continuous variables are conceptualized as the areas within the corresponding intervals of the density curve.
Outliers
A score that is so extreme relative to the majority of the scores in the data set that the score is suspect. When one of these scores occurs, it is important to figure out why.
Ordinal Measurement
A variable is said to be measured on this level when its categories can be ordered on some continuum, that is, when individuals or things are classified into categories that are ordered along a dimension of interest. EX: Finishing places in a race
Frequency graphs
A way of displaying frequencies on the Y axis and the values of the dependent variable on the X axis
Independent Variable
Any variable that is presumed to influence a second variable
Probability distributions
Are most commonly represented in graph form. When the score values of a qualitative variable are mutually exclusive and exhaustive, the probabilities associated with the individual score values constitute the probability distribution for that variable. This also holds for discrete quantitative variables.
Theoretical Distributions
Are not constructed by taking formal measurements, but rather by making assumptions and representing these assumptions mathematically
Line plot
Are particularly useful when one or more research participants receive the lowest or highest score possible on a particular measure. Line plots are not closed, which can also make them useful when comparing two groups.
Probability Distributions for Continuous Variables
Because the number of values that a continuous variable can have is, in principle, infinite, it is not possible to specify a probability distribution for a continuous variable by listing all possible values of the variable and their corresponding probabilities. Instead, probability distributions for continuous variables are conceptualized in terms of probability density functions.
Frequency distributions for grouped and ungrouped data
When the scores are grouped together into intervals, tables of this type are referred to as grouped frequency tables. As with ungrouped frequency tables, each score can fall into only one interval or category, and both grouped and ungrouped tables list values from highest to lowest.
Constant (variables)
Does not vary within given constraints
ROUNDING RULE THREE: If the remainder to the right of the decimal place that you wish to round to is exactly one half a measurement unit (that is, if it is a 5 followed by nothing or by nothing but zeros), leave the last digit kept as it is if it is an _______ number, but increase it by ____ if it is an _____ number.
Even, 1, odd
T or F: (for rounding rule TWO) If the remainder to the right of the decimal place that you wish to round to is less than 1/2 a measurement unit, decrease the digit kept by one.
False, you would leave the last digit kept as it is.
Probability Density Functions
For continuous variables, the probability distribution is always represented by a smooth curve over the abscissa. The ordinate, though not formally demarcated, is used to determine the probability of observing a specified range of score values. The higher the curve is between two points, the more "dense" the corresponding scores are and the more likely they are to occur. Thus statisticians refer to the probability associated with a specific range of score values as a density.
How do you determine the number of groups?
Generally speaking, using 5 to 15 groups tends to strike the appropriate balance between imprecision and incomprehensibility. If the number of possible score values is small, fewer groups can be used, whereas if the number of possible score values is large, more groups are required.
Ratio Measurement
Have all of the properties of interval and ordinal measures, but provide even more information. Specifically, these measures map onto the underlying dimension in such a way that ratios between the numbers represent ratios on the dimension that is being measured. EX: height
Interval Measurement
Have all of the properties of ordinal measures but allow us to do more than order people on a dimension. They also provide information about the magnitude of the differences between the individuals. Have the property that numerically equal distances on the scale represent equal distances on the dimension that is being measured. Ex: Temperature
Real Limits of a number
If a variable is continuous, it follows that measurements taken on that variable must be approximate. Thus, it is often inaccurate to talk about a specific value of a particular measurement for a continuous variable; rather, such measurements are more accurately represented in terms of their real limits.
Compare the general shape of this graph with the cumulative frequency of the graph from the previous exercise, what accounts for the similarities in their shapes?
In an ungrouped frequency table, each different score value is listed, whereas in a grouped frequency table, scores are grouped together into intervals.
What is the difference between inferential and descriptive statistics?
In descriptive statistics, we describe a group of scores, either a sample or a population on which measurements have been taken for all members. In inferential statistics we are also trying to describe a population, but we do so not by taking a measure on all cases in the population; rather, we select a sample, observe scores on the variable of interest for that sample, and then infer something about that variable for the entire population.
What is probability?
In statistics, probability has a precise meaning. In a given situation there may be several different possible outcomes that are equally likely to occur, any one of which can occur at random; the probability of an event is the number of outcomes that produce it divided by the total number of equally likely outcomes. Statisticians commonly report a probability to convey the likelihood that sample results do not accurately represent what is occurring in the population.
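As a small worked example of probability over equally likely outcomes, the sketch below assumes three fair coins, lists all 2³ = 8 outcomes, and counts the ones with exactly two heads. The helper `probability` is a hypothetical name used only for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping three fair coins, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

def probability(event) -> Fraction:
    """Favorable outcomes divided by the total number of equally likely outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_two_heads = probability(lambda outcome: outcome.count("H") == 2)
print(p_two_heads)  # 3/8 (HHT, HTH, THH)
```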
ROUNDING RULE ONE: If the remainder to the right of the decimal place that you wish to round to is greater than 1/2 a measurement unit, _______ the last digit kept by _____.
Increase, 1
Relative Frequency
Indicates the proportion of times that a score occurred and is derived by dividing the number of scores of a given value by the total number of scores in the distribution
Inferential Statistics
Involves taking measurements on a sample and then, from these observations, drawing a conclusion about a population.
Descriptive Statistics
Involves the use of numerical indexes to describe either a sample or a population (when measurements have been taken on all members of that population). In either case, the goal is to DESCRIBE a group of scores in a clear and precise manner.
Nominal Measurement
Involves using numbers merely as labels, with no special quantitative meaning. In behavioral statistics, the indexes of interest for variables measured this way are frequencies, proportions, and percentages. Ex: types of religion
A probability density function
Is a smooth curve including all possible values of a continuous variable
ROUNDING RULE TWO: If the remainder to the right of the decimal place that you wish to round to is less than 1/2 a measurement unit, ______ ____ ____ ____ ____ ___ ___ ___.
Leave the last digit kept as it is.
What is the difference between populations and samples?
Statements made with reference to a population are generalized to an entire group of individuals (or other cases). It is often not possible to make observations on every member of a population, so an investigator must use a sample, a subset of that population. Based on sample observations, it is often possible to generalize to the underlying population.
Population
The aggregate of all cases that one wishes to generalize to
Steps in the Scientific Method
The basic steps of the scientific method are: 1. make an observation that describes a problem 2. create a hypothesis 3. test the hypothesis 4. draw conclusions and refine the hypothesis.
Where do you start?
The conventional starting point for the lowest interval is the closest number that is equal to or less than the lowest score and evenly divisible by the interval size.
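The starting-point rule can be sketched with integer floor division. This assumes integer scores, and `interval_start` and `grouped_intervals` are hypothetical helper names; the example lowest score of 37, highest score of 61, and interval size of 5 are made up for illustration.

```python
def interval_start(lowest_score: int, interval_size: int) -> int:
    """Closest number <= lowest_score that is evenly divisible by interval_size."""
    return (lowest_score // interval_size) * interval_size

def grouped_intervals(lowest_score: int, highest_score: int, interval_size: int):
    """List (lower, upper) apparent limits of each interval, covering all scores."""
    intervals = []
    lower = interval_start(lowest_score, interval_size)
    while lower <= highest_score:
        intervals.append((lower, lower + interval_size - 1))
        lower += interval_size
    return intervals

print(interval_start(37, 5))          # 35
print(grouped_intervals(37, 61, 5))   # (35, 39), (40, 44), ..., (60, 64)
```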
What is the measurement hierarchy?
The four types of measurement can be thought of as a hierarchy. At the lowest level, nominal measurement allows us only to classify individuals into categories. At the second level, ordinal measurement not only allows us to classify individuals into categories but also indicates the relative ordering of the categories on a dimension of interest. Interval measurement is the next level: it possesses the same properties as ordinal measurement but, in addition, is sensitive to the magnitude of differences along the dimension. However, ratio statements are NOT possible at the interval level. Only at the final level, ratio measurement, are such statements possible: ratio measures have all of the properties of interval measures and also permit ratio statements to be made.
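Why ratio statements fail at the interval level can be checked numerically. The sketch below assumes Celsius and Fahrenheit as interval scales and centimeters and inches as ratio scales (the specific values are illustrative): a change of units preserves the ratio only when the zero point is a true zero, while equal differences stay equal on both temperature scales.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Interval scale to interval scale: the unit AND the zero point change."""
    return c * 9 / 5 + 32

def cm_to_inches(cm: float) -> float:
    """Ratio scale to ratio scale: only the unit changes; zero still means no height."""
    return cm / 2.54

# "20 degrees C is twice as hot as 10 degrees C"? The ratio depends on the arbitrary zero.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)   # 68 / 50

# Equal differences remain equal, which is what interval measurement guarantees.
diff_equal = (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)
              == celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10))

# Height is a ratio measure: 180 cm is twice 90 cm in any unit.
ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(ratio_c, ratio_f, diff_equal, ratio_in)
```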
How tall should the ordinate of the frequency graph be relative to the abscissa? Why is this important?
The ordinate of a frequency graph should be drawn so that its height at the demarcation for the highest frequency is approximately 2/3 to 3/4 the length of the abscissa. This helps to ensure a uniform, clearly interpretable presentation of graphed results.
Definition of Real Limits
The real limits of a number are the points that fall one half a measurement unit below that number and one half a measurement unit above that number. Real limits can be stated for numbers expressed as decimals as well as for whole numbers.
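The definition translates directly into code if the measurement unit is taken from the number of decimal places as written, so that "13", "13.0", and "13.00" get different limits. This is a sketch under that assumption; `real_limits` is a hypothetical name, and Python's `decimal` module is used so trailing zeros are preserved.

```python
from decimal import Decimal

def real_limits(reported: str) -> tuple[Decimal, Decimal]:
    """Half a measurement unit below and above a number, unit inferred from its decimals."""
    x = Decimal(reported.replace(",", ""))        # allow thousands separators like "21,384.11"
    exponent = x.as_tuple().exponent              # -2 for "21384.11", 0 for "13"
    half_unit = Decimal(1).scaleb(exponent) / 2   # half of one measurement unit
    return x - half_unit, x + half_unit

for number in ["21,384.11", "0.689", "13", "13.0", "13.00", "20"]:
    print(number, real_limits(number))
```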
Cumulative Frequency
The sums of the frequencies of the data values from smallest to largest. For any given score, the _____ ______ is the frequency associated with that score, plus the sum of all of the frequencies below that score
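Cumulative frequencies are a running total, which `itertools.accumulate` computes directly. The scores below are hypothetical; the last cumulative frequency always equals the total number of scores.

```python
from collections import Counter
from itertools import accumulate

scores = [3, 5, 4, 3, 2, 5, 4, 4, 1, 3]            # hypothetical scores
freq = Counter(scores)                             # frequency of each score value
values = sorted(freq)                              # smallest to largest
cum_freq = list(accumulate(freq[v] for v in values))

for value, cf in zip(values, cum_freq):
    print(value, freq[value], cf)                  # score, f, cumulative f
```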
How do you determine the size of the interval?
Typically an interval size of 2, 3, or a multiple of 5 (for instance 5, 10, or 15) is used. The first step in determining the appropriate interval size for a particular set of data is to subtract the lowest score from the highest score. The difference should then be divided by the desired number of groups and the result rounded to the nearest of the commonly used interval sizes.
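The interval-size procedure can be sketched as follows. The list `COMMON_SIZES` is an assumption built from the sizes named on this card (2, 3, and multiples of 5); the function name and example scores are hypothetical. When two sizes are equally close, this sketch picks the smaller one.

```python
# Assumed list of commonly used interval sizes: 2, 3, and some multiples of 5.
COMMON_SIZES = [2, 3, 5, 10, 15, 20, 25, 50, 100]

def choose_interval_size(scores, n_groups: int) -> int:
    """(highest - lowest) / desired number of groups, rounded to the nearest common size."""
    raw_size = (max(scores) - min(scores)) / n_groups
    return min(COMMON_SIZES, key=lambda size: abs(size - raw_size))

print(choose_interval_size([37, 45, 52, 61], n_groups=5))   # 24 / 5 = 4.8, nearest is 5
print(choose_interval_size([12, 50, 97], n_groups=10))      # 85 / 10 = 8.5, nearest is 10
```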
Qualitative Measures
Variables measured on a nominal level
Percentage
When a relative frequency is multiplied by 100, it reflects the _______ of times that score occurred.
What is the note on rounding rule three?
When this rule is used, the last digit of the answer will always be an even number (0,2,4,6, or 8)
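The three rounding rules can be sketched as a single function that measures the remainder in units of the place being rounded to. This assumes non-negative numbers supplied as strings (so binary floating-point error cannot change the remainder); `round_by_rules` is a hypothetical name.

```python
from decimal import Decimal, ROUND_DOWN

def round_by_rules(value: str, places: int) -> Decimal:
    """Round a non-negative number, given as a string, using rounding rules one to three."""
    x = Decimal(value)
    unit = Decimal(1).scaleb(-places)               # one measurement unit, e.g. 0.001 for 3 places
    kept = x.quantize(unit, rounding=ROUND_DOWN)    # digits up to the place being rounded to
    remainder = (x - kept) / unit                   # leftover as a fraction of one unit, 0 <= r < 1
    half = Decimal("0.5")
    if remainder > half:                            # Rule one: increase the last digit kept by 1
        return kept + unit
    if remainder < half:                            # Rule two: leave the last digit kept as it is
        return kept
    last_digit = int(kept.scaleb(places)) % 10      # Rule three: exactly half a unit
    return kept if last_digit % 2 == 0 else kept + unit   # even: leave it; odd: increase by 1

print(round_by_rules("3.6666", 3))     # rule one
print(round_by_rules("9.724001", 3))   # rule two
print(round_by_rules("2.0065", 3))     # rule three, last digit kept (6) is even
print(round_by_rules("2.0075", 3))     # rule three, last digit kept (7) is odd
```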
What is the conceptual difference between probabilities and relative frequencies?
Whereas a relative frequency indicates the proportion of times that some score was previously observed, a probability represents the likelihood of observing that score in the future
In statistical notation, X =? Σ = ?
X = general name for a variable; Σ = capital sigma, the summation sign
We will sometimes want to simultaneously consider two variables, when this is the case, the capital letter ______ can be used to represent the second variable.
Y
When rounding do not forget to include _______
Zeros
Normal Distributions
a function that represents the distribution of many random variables as a symmetrical bell-shaped graph.
round each of the following numbers to three decimal places: a. .39572 b. .9999 c. 3.6666 d. 12.2538 e. 9.724001 f. 1.9950 g. 2.0060
a. .396 b. 1.000 c. 3.667 d. 12.254 e. 9.724 f. 1.995 g. 2.006
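The answers to this exercise can be checked with Python's `decimal` module, whose `ROUND_HALF_EVEN` mode follows the same "leave an even digit, bump an odd one" convention as rounding rule three. The numbers are written as strings on purpose: with binary floats, `round(2.675, 2)` gives 2.67 because 2.675 cannot be stored exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Exercise values as strings, so no binary floating-point error sneaks in.
exercise = {"a": ".39572", "b": ".9999", "c": "3.6666", "d": "12.2538",
            "e": "9.724001", "f": "1.9950", "g": "2.0060"}

three_places = Decimal("0.001")
answers = {label: Decimal(value).quantize(three_places, rounding=ROUND_HALF_EVEN)
           for label, value in exercise.items()}

for label, rounded in answers.items():
    print(label, rounded)
```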
State the real limits of each of the following numbers, assuming they are measured in the units reported. a. 21,384.11 b. 0.689 c. 13 d. 13.0 e. 13.00
a. 21,384.105 and 21,384.115 b. 0.6885 and 0.6895 c. 12.5 and 13.5 d. 12.95 and 13.05 e. 12.995 and 13.005
The ______ lists the score values from low to high, extending one unit below the lowest score to 1 unit above the highest score
abscissa
Parameters
are numbers that summarize data for an entire population
Statistics
are numbers that summarize data from a sample, i.e. some subset of the entire population
Frequency Tables
are useful for summarizing how data are distributed; score values are usually listed from highest to lowest.
Empirical Distributions
based on measurements that are actually taken on a variable
How do you know if it is a measure VS. a scale?
Because a measure has as its referent not solely a particular scale but also the individual on whom the measurement is taken, the time, and the setting.
________ polygons are _______ "closed" with the abscissa in the sense that the abscissa always includes a value that is one unit lower than the lowest observed score and a value that is one unit higher than the highest observed score, with a frequency of 0 denoted for each. This serves to connect the lines to the abscissa and thus to form the _________.
frequency, always, polygon
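Closing a frequency polygon amounts to adding a zero-frequency point one unit below the lowest score and one unit above the highest. The sketch assumes integer scores measured in units of 1; `polygon_points` is a hypothetical name and the scores are made up.

```python
from collections import Counter

def polygon_points(scores):
    """(score, frequency) points for a frequency polygon, closed at both ends with f = 0."""
    freq = Counter(scores)
    low, high = min(scores), max(scores)
    # range runs from one unit below the lowest score to one unit above the highest
    return [(x, freq.get(x, 0)) for x in range(low - 1, high + 2)]

print(polygon_points([2, 3, 3, 4, 6]))
```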
Discrete Variables
is a variable whose value is obtained by counting (Ex: number of red marbles in a jar, number of heads when flipping three coins).
Continuous Variables
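is a variable whose value is obtained by measuring (Ex: height, weight, or the time taken to complete a task). Counting the dogs in a room yields a discrete variable, not a continuous one.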
Summation Notation
is used in statistics as a shorthand way of indicating that a set of scores should be summed
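A quick way to see what common summation expressions mean is to compute them for a few hypothetical scores. The example shows in particular that ΣX² (square each score, then sum) differs from (ΣX)² (sum, then square), and uses Y for a second variable on the same cases.

```python
xs = [2, 4, 6, 3]   # hypothetical X scores
ys = [1, 0, 2, 5]   # hypothetical Y scores for the same cases

sum_x = sum(xs)                                # ΣX
sum_x_squared = sum(x ** 2 for x in xs)        # ΣX²: square each score, then sum
square_of_sum_x = sum(xs) ** 2                 # (ΣX)²: sum first, then square
sum_xy = sum(x * y for x, y in zip(xs, ys))    # ΣXY: multiply paired scores, then sum

print(sum_x, sum_x_squared, square_of_sum_x, sum_xy)  # 15 65 225 29
```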
The formula for converting absolute frequency into relative frequency is
rf = f/N, where rf = relative frequency, f = absolute frequency of the score, and N = total number of cases
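The formula rf = f/N, and the percentage obtained by multiplying rf by 100, can be sketched for a small set of hypothetical scores. Percentages are computed as 100 × f / N to keep the arithmetic exact for these values.

```python
from collections import Counter

scores = [3, 5, 4, 3, 2, 5, 4, 4, 1, 3]             # hypothetical scores
N = len(scores)                                     # total number of cases
freq = Counter(scores)                              # absolute frequency f of each score

rel_freq = {v: freq[v] / N for v in freq}           # rf = f / N
percent = {v: 100 * freq[v] / N for v in freq}      # percentage = rf x 100

for value in sorted(freq, reverse=True):            # highest to lowest, as in a frequency table
    print(value, freq[value], rel_freq[value], percent[value])
```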
Cumulative Relative Frequency
the accumulation of the previous relative frequencies. They are computed in the same manner but use the column of relative frequencies instead, and for any given score the ____ _____ ______ is the relative frequency associated with that score plus the sum of all of the relative frequencies below that score.
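Cumulative relative frequencies are the running total of the relative frequencies, which also equals each cumulative frequency divided by N. The sketch uses `fractions.Fraction` so the final value comes out as exactly 1; the scores are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

scores = [3, 5, 4, 3, 2, 5, 4, 4, 1, 3]                    # hypothetical scores
N = len(scores)
freq = Counter(scores)
values = sorted(freq)                                      # smallest to largest

rel_freq = [Fraction(freq[v], N) for v in values]          # exact relative frequencies
cum_rel_freq = list(accumulate(rel_freq))                  # running total of rf

cum_freq = list(accumulate(freq[v] for v in values))       # cumulative frequencies
same_as_cf_over_n = all(crf == Fraction(cf, N) for crf, cf in zip(cum_rel_freq, cum_freq))

for value, crf in zip(values, cum_rel_freq):
    print(value, crf)
```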
Frequency
the number of times the observation occurred and was recorded in an experiment or study.