CJC STATS CHRISTMAN BALL STATE FINAL
Pearson correlation coefficient (Pearson r; Pearson product-moment correlation coefficient)
(Pearson r) A statistic calculated to reflect the degree of relationship between two interval/ratio level variables. Also called the Pearson product-moment correlation coefficient. It ranges from -1 to +1: values near 0 indicate a weak linear relationship, and values near -1 or +1 indicate a strong one.
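As a rough illustration of how Pearson's r behaves, the sketch below computes it from the deviation-score formula. The function name `pearson_r` and the two small data sets are made up for illustration; they are not from the course material.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviation scores (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]                      # invented data
r_pos = pearson_r(hours, [2, 4, 6, 8, 10])   # perfect positive relationship
r_neg = pearson_r(hours, [10, 8, 6, 4, 2])   # perfect negative relationship
print(r_pos, r_neg)
```

The sign gives the direction of the relationship and the absolute value gives its strength.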
Sample mean symbol
- x̄ (x-bar: x with a line on top)
The Variation Ratio for this data is .515. What does this mean? - 51.5% of the cases are in the Modal category - 51.5% of the cases are outliers - 51.5% of the cases are not in the Modal category - 51.5% of the cases are not outliers
- 51.5% of the cases are not in the Modal category
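The variation ratio is the proportion of cases that fall outside the modal category: VR = 1 - (modal frequency / n). A minimal sketch, using an invented 200-case variable whose modal category holds 97 cases:

```python
from collections import Counter

def variation_ratio(values):
    """Proportion of cases NOT in the modal category (hypothetical helper)."""
    counts = Counter(values)
    modal_count = max(counts.values())
    return 1 - modal_count / len(values)

# Invented data: category "A" is the mode with 97 of 200 cases
cases = ["A"] * 97 + ["B"] * 60 + ["C"] * 43
vr = variation_ratio(cases)
print(round(vr, 3))
```

With 97 of 200 cases in the mode, 103 of 200 (51.5%) are not in the modal category.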
What is required for regression analysis? - Interval/ratio independent variable and interval/ratio dependent variable. - Nominal independent variable and nominal dependent variable. - Ordinal independent variable and ordinal dependent variable. - Ordinal independent variable and interval/ratio dependent variable
- Interval/ratio independent variable and interval/ratio dependent variable.
The range of the middle 50% of scores in a data set is the: - range - variance - interquartile range - standard deviation
- interquartile range
For continuous data with the interval and ratio level measurement it is best to report both the ______ and the ______ for purposes of accurately understanding the statistical composition of the distribution. - mode; percent - cumulative frequency; median - mean; median - mean; cumulative percent
- mean; median
Which of the following is the correct way to match the terms below? 1. One-tailed 2. Two-tailed 3. Directional 4. Non-directional -1 and 2; 3 and 4 -1 and 4; 2 and 3 -1 and 3; 2 and 4 -All four terms are interchangeable and go together.
-1 and 3; 2 and 4
When outliers are included in the regression analysis, what is a possible outcome for the data? -The slope and correlation coefficients are not inflated. -The slope and correlation coefficients will remain unchanged. -The slope and correlation coefficients will be excessively influenced. -Only the slope coefficient will be excessively influenced while the correlation coefficient will not be influenced
-The slope and correlation coefficients will be excessively influenced.
Which of the following is the best example of an alternative hypothesis? -Fords are no more likely to be in an auto accident than any other model of car. -Generic drugs are just as effective in treating illnesses as brand-name drugs. -The networks of all the different cell phone providers are the same. -Younger people are more likely to use social media than are older people.
-Younger people are more likely to use social media than are older people.
A contingency table shows the joint distribution of two: -interval level variables -ratio level variables -categorical variables -dependent variables
-categorical variables Contingency tables (also called crosstabs or two-way tables) are used in statistics to summarize the relationship between several categorical variables. A contingency table is a special type of frequency distribution table, where two variables are shown simultaneously.
Probability can be defined as: -number of times a specific event can occur relative to the total number of times that any event can occur. -the total number of possible outcomes minus the number of ways a particular outcome may occur. -the total number of ways a possible outcome can occur. -the likelihood that the researcher would reject the null hypothesis.
-number of times a specific event can occur relative to the total number of times that any event can occur.
A confidence interval is: -the lower and upper boundaries of the confidence interval. -the margin of error around the point estimate that consists of a range of values into which the population value falls. -single values used to estimate an unknown population parameter. -range of values for a variable that has a stated probability of containing an unknown population mean.
-range of values for a variable that has a stated probability of containing an unknown population mean.
If the obtained z or t value is outside of the critical values, we.... -accept the null hypothesis -fail to reject the null hypothesis -reject the null hypothesis -accept the alternative hypothesis .
-reject the null hypothesis
A 'critical value': -is the same as alpha (α). -separates the regions of rejection and non-rejection within a probability distribution. -relates to the decision to reject the research hypothesis. -is a sample parameter.
-separates the regions of rejection and non-rejection within a probability distribution. In hypothesis testing, a critical value is a point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis. If the absolute value of your test statistic is greater than the critical value, you can declare statistical significance and reject the null hypothesis. Critical values correspond to α, so their values become fixed when you choose the test's α.
The standard deviation is: / The standard deviation tells us:
-the square root of the average squared deviation score -how far scores are from the mean on average
The variance is: -the extent to which the observations are not concentrated in the modal category of the variable. - distance of a score from the mean. -the average-squared difference of each score in a set of scores from the mean of those scores. - the square root of the average squared difference of each score in a set of scores from the mean of those scores.
-the average-squared difference of each score in a set of scores from the mean of those scores.
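The definition can be checked by hand on a small set of scores. This sketch follows the card's wording (average squared deviation, dividing by n); sample formulas divide by n - 1 instead. The scores are invented for illustration.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # invented scores
mean = sum(scores) / len(scores)

# Variance: the average squared difference of each score from the mean
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# Standard deviation: the square root of the variance
std_dev = math.sqrt(variance)
print(mean, variance, std_dev)
```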
The restricted addition rule of probabilities relates to: -determining the number of different ways a particular outcome can occur. -the probability of either of two mutually exclusive events occurring is equal to the sum of their separate probabilities. -the fact that the probability of an event occurring is between 0 and 100. -the probability of two non-mutually exclusive events occurring is equal to the sum of their separate probabilities.
-the probability of either of two mutually exclusive events occurring is equal to the sum of their separate probabilities.
The standard level of significance that social scientists strive for in hypothesis testing is .25 .10 .05 .01
.01 and .05
Alpha Levels
.01, .05, .10 Risk we are willing to take in rejecting a true null hypothesis.
Which of the following is correct? 1 - confidence interval = α (alpha) 1 + confidence interval = α (alpha) 0 + confidence interval = α (alpha) 2 - confidence interval = α (alpha)
1 - confidence interval = α (alpha). For example, a 95% confidence level corresponds to α = .05.
Hypothesis testing
1. Formally state the null and research hypotheses. 2. Select an appropriate test statistic and sampling distribution. 3. Select a level of significance. 4. Conduct the test. 5. Make a decision about the null hypothesis.
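The five steps can be sketched as a one-sample, one-tailed z test. Everything numeric here (the hypothesized mean of 5, the sample mean of 5.6, the population standard deviation of 2, and n = 100) is an invented example, not data from the course.

```python
import math
from statistics import NormalDist

# Step 1: H0: mu = 5, H1: mu > 5 (directional)
mu_0 = 5.0

# Step 2: large sample, so use the z test and the standard normal distribution
sample_mean, sigma, n = 5.6, 2.0, 100   # invented values

# Step 3: choose the level of significance
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical value

# Step 4: compute the obtained z
z_obt = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Step 5: reject H0 if the obtained z falls in the critical region
reject_null = z_obt > z_crit
print(round(z_obt, 2), round(z_crit, 3), reject_null)
```

For a two-tailed (non-directional) test, alpha would be split across both tails, which raises the critical value.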
Calculate the interquartile range from the following set of data. 16, 19, 24, 36, 41, 45, 48, 54, 62, 88, 91, 92
45
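The answer of 45 can be reproduced by taking Q1 and Q3 as the medians of the lower and upper halves of the ordered data. Other quartile conventions exist and can give slightly different values, so treat this as one possible method; the helper name `iqr_by_halves` is hypothetical.

```python
from statistics import median

def iqr_by_halves(values):
    """Q1 and Q3 as medians of the lower and upper halves (one convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])     # median of the lower half
    q3 = median(ordered[-half:])    # median of the upper half
    return q1, q3, q3 - q1

data = [16, 19, 24, 36, 41, 45, 48, 54, 62, 88, 91, 92]
q1, q3, iqr = iqr_by_halves(data)
print(q1, q3, iqr)
```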
In chi-square testing fo = fe =
= the observed cell frequencies from our sample data = the expected cell frequencies we should get under the null hypothesis
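Under the null hypothesis of independence, each expected frequency is (row total × column total) / n, and chi-square sums (fo - fe)² / fe over all cells. A minimal sketch with an invented 2 by 2 table:

```python
# Invented observed frequencies (rows and columns are two categorical variables)
observed = [[30, 20],
            [10, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# fe: what each cell would hold if the two variables were independent
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square: sum of (fo - fe)^2 / fe over every cell
chi_square = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(chi_square, 3), df)
```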
Regression Line
A line depicting the relationship between independent and dependent variables determined by a least-squares regression equation
What does a higher r2 represent? A stronger nonlinear relationship between X and Y. A weaker nonlinear relationship between X and Y. A stronger linear relationship between X and Y. A weaker linear relationship between X and Y.
A stronger linear relationship between X and Y.
Y = a + bX, and the point (X̄, Ȳ)
a = y-intercept; b = slope; (X̄, Ȳ) = centroid, the point of means that the least-squares regression line passes through
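A short sketch of how the slope and intercept can be computed by least squares, and a check that the line passes through the centroid. The five data points are invented for illustration.

```python
xs = [1, 2, 3, 4, 5]   # invented independent variable
ys = [2, 4, 5, 4, 5]   # invented dependent variable

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Slope: sum of cross-products divided by sum of squared x deviations
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))

# Intercept: chosen so that the line passes through the centroid
a = mean_y - b * mean_x

# Predicted y at the mean of x equals the mean of y
y_at_centroid = a + b * mean_x
print(round(a, 2), round(b, 2), y_at_centroid)
```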
_____ refers to the probability of a statistic needed to reject the null hypothesis. The population mean (μ) The assumption of normality Alpha (α) Standardized score
Alpha (α)
Which of the following is not a measure of association? Lambda Beta Gamma Cramer's V
Beta
Measures of dispersion
Capture how different the values of a variable are. The more dispersion there is in a variable, the more different the values are from each other or from some central tendency, and the more heterogeneity there is in the data. Examples: variation ratio, range, interquartile range.
Variable
Characteristic or property that can vary/change
Constant
Characteristic or property that does not vary
Partial correlation coefficient
Correlation between two variables after controlling for a third variable Partial correlation is a measure of the strength and direction of a linear relationship between two continuous variables whilst controlling for the effect of one or more other continuous variables (also known as 'covariates' or 'control' variables). Although partial correlation does not make the distinction between independent and dependent variables, the two variables are often considered in such a manner (i.e., you have one continuous dependent variable and one continuous independent variable, as well as one or more continuous control variables).
What are we looking for in a contingency table? Covariation Equilibrium Distinctions Anomalies
Covariation
One tailed
Directional Research alternative hypothesis that states direction
Formal statements of the Null and Research/Alternative Hypothesis: H0: μ = 5; H1: μ > 5
Directional hypotheses for a larger population mean
Formal statements of the Null and Research/Alternative Hypothesis: H0: μ = 5; H1: μ < 5
Directional hypotheses for a smaller population mean
Frequency Distribution
Distribution of values that make up a variable distribution
Which term describes the difference between the predicted value of y and the observed value of y? Regression Intercept Slope Error
Error
What does "fe" represent? Expected frequency for cell k Observed frequency for cell k Dsired frequency for cell k Excise frequency for cell k
Expected frequency for cell k
A researcher found that, in general, the more unstructured free time students have the more likely they are to get involved in gang activity. This would be an example of a positive relationship. T or F
T - both variables increase together, which is what defines a positive relationship.
An alpha level of .05 is used for all Z tests. T or F
F
As the sample size decreases, the variation around the mean of the sampling distribution decreases and is more likely to cluster around the true population mean. T or F
F
At any given level of alpha, a larger zobt is needed to reject the null hypothesis in the directional hypothesis test. T or F
F
Confidence intervals are the lower and upper boundaries on the confidence limits. T or F
F
If two different sets of data have the same range, the variability for both sets has to be the same. T or F
F
In a contingency table, the independent variable must be the row and the dependent variable must be the column. T or F
F
The chi-square test of independence indicates whether the observed frequencies are significantly different than the expected frequencies and the strength of the relationship. T or F
F
The contingency coefficient and Cramer's V are best applied to tables that are larger than 2 by 2. T or F
F
The estimated standard deviation of a sampling distribution gets smaller when the sample size is decreased. T or F
F
The main reason for creating a scatterplot between two variables is to determine whether the relationship is statistically significant. T or F
F
The φ coefficient can be used with chi-squares tables that are larger than two by two. T or F
F
There is no two-tailed alternative for chi-square tests. T or F
F
You cannot calculate a confidence interval using a proportion. T or F
F
The direction of the sign of a Pearson correlation coefficient ( r ) indicates the strength of the relationship. T or F
F - The sign of r indicates the direction of the relationship; its absolute value indicates the strength. The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no linear association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable decreases. The magnitude of r reflects how close the data points lie to the line of best fit (i.e., how well the data points fit that line).
The mean deviation is the most frequently used measure of dispersion for interval/ratio level variables. T or F
F - The standard deviation is the most widely used measure of dispersion. It reflects the average distance of the data points from the mean of the distribution: the larger the standard deviation, the larger that average distance. It is used with interval/ratio variables (but is often applied to ordinal level variables as well).
On a scatter plot graphing regression, the independent variable is along the Y axis. T or F
F - the independent variable is along the X axis
If the sample size is larger than 120, the t distribution is used instead of the z distribution. T or F
False
The mode is the most useful measure of central tendency for ordinal level measures. T or F
False - Median
The median can be used when analyzing nominal, ordinal, and interval level variables. T or F
False - Ordinal, Interval/Ratio (skewed)
The mean can be calculated for all levels of measurement. T or F
False - the mean requires interval/ratio level data
Cumulative Frequency distributions
Frequency distribution reserved for ordinal or interval/ratio level data, made by starting with the lowest value of the variable (or the highest value) and cumulating (keeping a running tally or sum of) the frequencies in each adjacent value until the highest (or lowest) value is reached. The final cumulative frequency should be equal to the total number of cases (n).
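The running tally can be sketched with `itertools.accumulate`. The categories and counts below are invented for illustration.

```python
from itertools import accumulate

# Invented frequency distribution, ordered from lowest to highest value
frequencies = {"0 arrests": 12, "1 arrest": 7, "2 arrests": 4, "3+ arrests": 2}

# Running sum of the frequencies, starting from the lowest value
cumulative = list(accumulate(frequencies.values()))
n = sum(frequencies.values())

for (label, f), cf in zip(frequencies.items(), cumulative):
    print(label, f, cf)
```

The last cumulative frequency equals n, the total number of cases.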
Which of the following measure of association is best used for a chi-square table that is 3 by 4 with ordinal level variables? φ Contingency coefficient Lambda Gamma
Gamma
Which rule of probability states that for two non-mutually exclusive events the probability of each event occurring is equal to the sum of their separate probabilities minus the probability of their joint occurrences? Bounding rule of probabilities Restricted addition rule of probabilities General addition rule of probabilities Restricted multiplication rule of probabilities
General addition rule of probabilities
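Both addition rules can be written as one formula: P(A or B) = P(A) + P(B) - P(A and B). When the events are mutually exclusive, the joint probability is 0 and the formula reduces to the restricted rule. The card-deck example and the helper name `p_either` are illustrative assumptions.

```python
from fractions import Fraction

def p_either(p_a, p_b, p_both=Fraction(0)):
    """General addition rule; p_both = 0 gives the restricted rule."""
    return p_a + p_b - p_both

# Non-mutually exclusive: king or heart (the king of hearts is counted once)
p_king_or_heart = p_either(Fraction(4, 52), Fraction(13, 52), Fraction(1, 52))

# Mutually exclusive: king or queen (no card is both)
p_king_or_queen = p_either(Fraction(4, 52), Fraction(4, 52))
print(p_king_or_heart, p_king_or_queen)
```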
Null Hypothesis
Hypothesis that is tested; it always assumes there is no relationship between the independent and dependent variables in a hypothesis test the null is the hypothesis that is initially assumed to be true.
What can you do to decrease the width of a confidence interval without compromising the level of confidence? Utilize a mobius strip. Collect a new random sample. Decrease sample size. Increase sample size.
Increase sample size.
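A rough sketch of why a larger sample narrows the interval: the margin of error is z × (standard deviation / √n), so it shrinks as n grows, while a higher confidence level raises z and widens it. The standard deviation of 10 and the sample sizes are invented, and the helper `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(sd, n, confidence=0.95):
    """Half-width of a z-based confidence interval for a mean."""
    alpha = 1 - confidence                     # alpha = 1 - confidence level
    z = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical z
    return z * sd / math.sqrt(n)

m_small = margin_of_error(sd=10, n=25)
m_large = margin_of_error(sd=10, n=100)
m_99 = margin_of_error(sd=10, n=100, confidence=0.99)
print(round(m_small, 2), round(m_large, 2), round(m_99, 2))
```

Quadrupling the sample size halves the margin of error at the same confidence level.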
Which type of relationship between two variables does the Pearson correlation assume? Linear Significant Skewed Nonlinear
Linear
Measurements of Central Tendency
Mean, Median, Mode summary statistics that capture the "typical,""average," or "most likely" score or value in a variable distribution
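The three measures can be compared on a small invented data set with one extreme score, using Python's standard `statistics` module:

```python
from statistics import mean, median, mode

scores = [1, 2, 2, 3, 12]   # invented scores; 12 is an extreme value

avg = mean(scores)      # sum of scores divided by number of cases
mid = median(scores)    # middle score of the ordered data
most = mode(scores)     # most frequent score
print(avg, mid, most)
```

The extreme score pulls the mean above the median, while the median and mode stay at 2.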
____________________ is a summary measure that captures the magnitude or strength of the relationship between two variables. Chi-square test Joint frequency distribution Yule's Q Measure of association
Measure of association
Interquartile range
Measure of dispersion appropriate for interval/ratio data. It measures the range of scores in the middle 50% of a distribution of continuous scores and is calculated as the difference between the score at the third quartile (the 75th percentile) and the score at the first quartile (the 25th percentile).
Variation Ratio
Measure of dispersion used at the nominal or ordinal level. It measures the proportion of cases not in the modal value. The greater the magnitude of the variation ratio, the more dispersion there is in a nominal or ordinal variable.
Interquartile Range
Measure of dispersion used for interval/ratio data. Q1 = 25th percentile; Q2 = 50th percentile; Q3 = 75th percentile.
Range
Measure of dispersion used for interval/ratio data: highest value minus lowest value.
Measures of Association
Measure the strength of a relationship and, when the variables are ordered, its direction. Examples: gamma, lambda, Cramer's V. Which measure applies depends on the level of measurement of the variables.
Variance
Measures the average squared deviations from the mean for an interval/ratio variable
Which measure of central tendency is best used for categorical variables? -Mode -Median -Mean -Bimodal distribution
Mode
Levels Of Measurement
Nominal, ordinal, interval, ratio: the mathematical nature of the values for a variable
Two Tailed
Non Directional Research alternative hypothesis that does not state direction. Only states there is a relationship between independent and dependent variables
Formal statements of the Null and Research/Alternative Hypothesis: H0: μ = 5; H1: μ ≠ 5
Non directional hypothesis for a population mean
The bounding rule of probability states that: A probability is bounded on each end by an unknown value Probabilities only have upper bounds. Probabilities only have lower bounds. Probabilities are bounded by 0 and 1.
Probabilities are bounded by 0 and 1.
Type I Error
Probability of rejecting a null hypothesis that is in fact true. It is equal to the alpha probability level
Type II Error
Probability of retaining a null hypothesis that is in fact false
Difference between interval and Ratio
Ratio has an absolute zero; interval does not
Ratio
Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero, which allows for a wide range of both descriptive and inferential statistics to be applied. Everything that applies to interval data also applies to ratio scales, and ratio scales additionally have a clear definition of zero. Good examples of ratio variables include height and weight. Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be meaningfully added, subtracted, multiplied, and divided (ratios). Central tendency can be measured by mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation, can also be calculated from ratio scales.
Standard score (Z-score)
Score from the standard normal probability distribution that indicates how many standard deviation units a score is from the mean of zero
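A z-score is computed as z = (score - mean) / standard deviation. In this small sketch the mean of 70 and standard deviation of 10 are invented values:

```python
def z_score(score, mean, std_dev):
    """Number of standard deviation units a score lies from the mean."""
    return (score - mean) / std_dev

z_above = z_score(85, mean=70, std_dev=10)   # above the mean: positive sign
z_below = z_score(55, mean=70, std_dev=10)   # below the mean: negative sign
print(z_above, z_below)
```

The sign shows whether the score is above or below the mean, and the magnitude shows how far.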
Which two pieces of information can be obtained from a z-score? Sign and frequency Sign and magnitude Magnitude and origin Frequency and origin
Sign and magnitude
The Median can be calculated or computed with grouped frequency distribution data. - Always - Sometimes - Never
Sometimes We can calculate a median with grouped data so long as it was originally in continuous (interval/ratio) form and our midpoints are meaningful.
Standard Deviation
Square root of the average squared deviations about the mean
Sample Statistic
Statistic (i.e., mean, proportion, etc.) obtained from a sample of the population
Pearson correlation coefficient (pearson's r)
Statistic that quantifies the direction and strength of the relationship between two interval/ratio level variables
Confidence intervals
Statistical interval around a point estimate (e.g., mean) for which we can state a level of confidence that it captures the true population parameter
t test
Statistical test used to test several null hypotheses, including the difference between two means. Use this for small samples.
Z test
Statistical test used to test several null hypotheses, including the difference between two means. Use this for large samples.
A contingency table is generally defined by the number of columns and rows in the table. T or F
T
A distribution that had two modes would be considered bimodal? T or F
T
A linear regression equation predicts a score on one variable from a score on another variable and that is based on the relationship between the two variables. T or F
T
A researcher believes that crime rises as individuals get older, but starts to decrease as they are past 30. This would be an example of a nonlinear relationship. T or F
T
A researcher has to compare percentage differences found in the categories for the independent variable at the same category of the dependent variable. T or F
T
A researcher wants to understand the probability of jury verdicts (guilty versus non-guilty). This is an example of a variable that would be considered a binomial variable. T or F
T
A type I error occurs when we reject the null hypothesis, even though it is true T or F
T
Chi-square allows for the rejection of a null hypothesis of independence but does not tell the researcher the magnitude or strength of that relationship. T or F
T
In a normal distribution the mode, median, and mean will all be the same. T or F
T
In the regression equation the predicted y value is based on an estimate and therefore can only be considered a best guess. T or F
T
Point estimates are the estimate of the mean and proportion that we obtained from a sample. T or F
T
The mean is calculated by summing up all of the scores for a particular variable and then dividing by the number of cases. T or F
T
The mode may give a misleading notion of the central tendency of the data. T or F
T
The null hypothesis indicates that there is no difference while the alternative hypothesis indicates that there is a difference between the means of two groups. T or F
T
The range and interquartile range use only two scores to estimate the amount of dispersion, making them more limited measures than the variance and standard deviation. T or F
T
The variation ratio is a measure of dispersion that is appropriate to use for variables such as religion, gender, and race. T or F
T
Unlike the mean, the median is not influenced by extreme scores, either low or high. T or F
T
When a researcher says they are 99% confident, they are saying that the population mean will not fall within the confidence interval 1% of the time. T or F
T
When an event is an independent event, its occurrence does not affect, nor is it affected by, another event's occurrence. T or F
T
When calculating the variance and standard deviation for grouped data, one would use the midpoint of the group instead of the individual case score. T or F
T
When estimating the regression model two degrees of freedom are lost because two parameters are being estimated. T or F
T
When trying to obtain greater confidence the researcher loses precision. T or F
T
Gamma (Yule's Q)
Tells us the strength or magnitude of a relationship between two ordinal level variables. It tells us how closely pairs of data points "match." Gamma tests for an association between the variables and also tells us the strength of that association. The goal is to be able to predict where new values will rank. For example, if case A scores "LOW" on question 1 and "HIGH" on question 2, will case B also give a LOW/HIGH response? Gamma can be calculated for ordinal (ordered) variables that are continuous variables (like height or weight) or discrete variables (like "hot," "hotter," and "hottest"). The gamma coefficient ranges between -1 and 1. 1 = perfect positive correlation: if one value goes up, so does the other. -1 = perfect inverse correlation: as one value goes up, the other goes down. 0 = there is no association between the variables.
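Gamma can be computed from pairs of cases as (C - D) / (C + D), where C counts concordant pairs (ranked in the same order on both variables) and D counts discordant pairs (ranked in opposite orders); tied pairs are ignored. A minimal sketch with invented ordinal codes (1 = low, 2 = medium, 3 = high):

```python
from itertools import combinations

def gamma(xs, ys):
    """Gamma from concordant and discordant pairs (hypothetical helper)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # same ordering on both variables
        elif product < 0:
            discordant += 1      # opposite ordering
        # product == 0 means a tie on x or y, which gamma ignores
    return (concordant - discordant) / (concordant + discordant)

x = [1, 1, 2, 2, 3, 3]   # invented ordinal scores
y = [1, 2, 1, 3, 2, 3]
g = gamma(x, y)
print(round(g, 3))
```

Here there are 7 concordant and 2 discordant pairs, so gamma is 5/9, a moderate positive association.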
Chi Squared test
Tests the null hypothesis that two categorical variables are independent of each other. Related concepts: alpha level, degrees of freedom, Type I and Type II error.
Linear Relationship
The effect of x on y is generally the same at all values of x
Validity
Validity is achieved when the statements or conclusions that you make about empirical reality are true.
What does the expected cell frequencies table represent? A directional hypothesis. A nondirectional hypothesis. An alternative hypothesis. The null hypothesis.
The null hypothesis.
Non Probability Sampling (quota)
These methods are not based on random selection and do not allow us to know in advance the likelihood of any element of a population being selected for the sample
Probability sampling methods
These methods rely on random selection or chance and allow us to know in advance how likely it is that any element of a population is selected for the sample
Which of the following would be a reason to create a scatterplot between two variables? To calculate the degrees of freedom To get a sense of the strength of the relationship To see if the variables are categorical or continuous To see if the group means are the same
To get a sense of the strength of the relationship
Which type of error occurs when we fail to reject the null hypothesis when we should reject it? Non-descript error Statistical error Type I error Type II error
Type II error - A Type II error occurs when the researcher fails to reject a null hypothesis that is false. By contrast, the probability of committing a Type I error is called the significance level; this probability is also called alpha and is often denoted by α.
Degrees of freedom (df)
Value necessary, along with an alpha value, to find the critical value and critical region for a null hypothesis test or confidence interval
Critical Value
Value that corresponds with an alpha level for any particular null hypothesis or confidence interval.
Which measure of dispersion is appropriate to use with nominal level variables? Variation ratio Range Variance Standard deviation
Variation ratio
Reliability
A measure is reliable when it yields consistent scores or observations of a given phenomenon on different occasions. Reliability is a prerequisite for measurement validity.
Causal Validity(Internal validity)
When we can assume that our independent variable did cause the dependent variable
Negative Linear relationship
X and Y go in opposite directions
Positive linear relationship
X and Y increase/decrease in the same direction
Multiple coefficient of determination (r squared)
You get a value of R squared when two or more independent variables are predicting a dependent variable. R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. The definition of R-squared is fairly straightforward: it is the percentage of the response variable variation that is explained by a linear model. Or: R-squared = Explained variation / Total variation. R-squared is always between 0 and 100%: 0% indicates that the model explains none of the variability of the response data around its mean; 100% indicates that the model explains all the variability of the response data around its mean.
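The ratio of explained to total variation can be sketched for the simplest case, one independent variable; the same ratio applies with several predictors. The data points are invented for illustration.

```python
xs = [1, 2, 3, 4, 5]   # invented independent variable
ys = [2, 4, 5, 4, 5]   # invented dependent variable

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Least-squares slope and intercept
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

# Total variation of y around its mean, and the part left unexplained
total_ss = sum((y - mean_y) ** 2 for y in ys)
residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))

# R-squared = explained variation / total variation
r_squared = (total_ss - residual_ss) / total_ss
print(round(r_squared, 3))
```

In this example the line explains 60% of the variation in y.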
Which of the following is the best example of an alternative hypothesis? Younger people are more likely to use social media than are older people. Fords are no more likely to be in an auto accident than any other model of car. Generic drugs are just as effective in treating illnesses as brand-name drugs. The networks of all the different cell phone providers are the same.
Younger people are more likely to use social media than are older people.
Which test statistic and sampling distribution do we use when we have one population mean and a large sample? Z test and distribution T test and distribution Student's t test and distribution Chi-squared test and distribution
Z test and distribution
The null hypothesis states that the µ = 110 and the alternative hypothesis states µ < 110. The mean of the sample was 109 with a standard deviation of 3. After calculating the t statistic at the α .05 level, the research would reject the null hypothesis. T or F
F
Cumulative Frequency Distribution
A frequency distribution, used for interval/ratio data, that keeps a running tally (sum) of the frequencies across the values of the variable
If two data sets have the same range: a. the distances from the smallest to largest observations in both sets will be the same. b. the smallest and largest observations are the same in both sets. c. both sets will have the same standard deviation. d. both sets will have the same interquartile range
a. the distances from the smallest to largest observations in both sets will be the same.
The Median is insensitive to high and low scores - called "outliers" - in a frequency distribution. -always -sometimes -never
always
Mean
average Can be used at the interval/ratio level
Mutually exclusive intervals
Class intervals must not overlap. Example: in a single or multiple response survey question, mutually exclusive answer choices ensure that no two of them can be true for the same respondent at the same time.
Exhaustive (collective) intervals
Class intervals must provide a place to count all original values of the variable distribution. Example: in a survey question, exhaustive answer choices cover the entire realm of possible answers. Survey writers listing answers in an aided single or multiple response question need to make sure that all potential answers to the particular question are listed for the respondent.
The probability of an event not occurring is called the ______. opposite of an event complement of an event inverse relation of an event façade of an event
complement of an event
A binomial distribution has a variable that: consists of three categories. consists of only one category. consists of multiple categories. consists of two categories.
consists of two categories.
Interval
Continuous data. Interval scales are numeric scales in which we know not only the order, but also the exact differences between the values. The classic example of an interval scale is Celsius temperature because the difference between each value is the same. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Time is another good example of an interval scale in which the increments are known, consistent, and measurable. Interval scales are nice because the realm of statistical analysis on these data sets opens up. For example, central tendency can be measured by mode, median, or mean; standard deviation can also be calculated. Here's the problem with interval scales: they don't have a "true zero." For example, there is no such thing as "no temperature." Without a true zero, it is impossible to compute ratios. With interval data, we can add and subtract, but cannot multiply or divide. Bottom line, interval scales are great, but we cannot calculate ratios.
A one-tailed test is also referred to as which of the following? single-point examination mono-polar directional nondirectional
directional
The statement "μ > 15" is an example of a: directional alternative hypothesis. non-directional alternative hypothesis. non-directional null hypothesis. directional null hypothesis.
directional alternative hypothesis.
In statistical terminology, we always either reject the null hypothesis or ______. accept the null hypothesis accept the alternative hypothesis reject the alternative hypothesis fail to reject the null hypothesis
fail to reject the null hypothesis If our statistical analysis shows that the significance level is below the cut-off value we have set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis. Alternatively, if the significance level is above the cut-off value, we fail to reject the null hypothesis and cannot accept the alternative hypothesis. You should note that you cannot accept the null hypothesis, but only find evidence against it.
Ordinal
Has rank order. With ordinal scales, it is the order of the values that is important and significant, but the differences between each one are not really known. Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc. "Ordinal" is easy to remember because it sounds like "order," and that's the key to remember with "ordinal scales": it is the order that matters, but that's all you really get from these. Advanced note: The best way to determine central tendency on a set of ordinal data is to use the mode or median; the mean cannot be defined from an ordinal set.
What is the range for the following set of data? 1, 2, 6, 4, 10, 9, 12, 5, 9, 6, 13 - 9 - 10 - 11 - 12
12 (highest number minus lowest number: 13 - 1 = 12)
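Worked example: the same calculation in Python, using the data from the card (the variable names are illustrative).

```python
scores = [1, 2, 6, 4, 10, 9, 12, 5, 9, 6, 13]

# Range = highest score minus lowest score.
data_range = max(scores) - min(scores)
print(data_range)  # 13 - 1 = 12
```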
Class intervals
In creating a grouped frequency distribution, the class interval defines the range of values included in each group. In statistics, data are arranged into classes, and the width of each class is called the class interval. Class intervals are generally equal in width, though not always, and they are generally mutually exclusive. They are very useful in drawing histograms. A class interval is a way to divide data and group certain answers together. When determining the class interval to use with your data, you must follow three rules: (1) the same person or unit can be in only one class interval; (2) the width or range of numbers in the class intervals must be equal; (3) no numbers are left out of the groupings.
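Worked example: a minimal sketch of building a grouped frequency distribution that follows the three rules above. It assumes whole-number scores, and the helper name `grouped_frequencies`, the starting value, and the width of 5 are all illustrative choices.

```python
from collections import Counter


def grouped_frequencies(scores, start, width):
    """Count integer scores in equal-width, non-overlapping class intervals."""
    if width <= 0:
        raise ValueError("width must be positive")
    if any(s < start for s in scores):
        raise ValueError("every score must be at or above the starting value")
    # Each score maps to exactly one interval index (rule 1).
    counts = Counter((s - start) // width for s in scores)
    table = []
    # Intervals share one width (rule 2) and cover every score (rule 3).
    for i in range(max(counts) + 1):
        low = start + i * width
        high = low + width - 1
        table.append(((low, high), counts.get(i, 0)))
    return table


scores = [1, 2, 6, 4, 10, 9, 12, 5, 9, 6, 13]
for (low, high), freq in grouped_frequencies(scores, start=1, width=5):
    print(f"{low}-{high}: {freq}")
```

With these settings the intervals are 1-5, 6-10, and 11-15, and the frequencies add up to the 11 scores in the data set.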
Measures of Dispersion really only have practical use for what level of measurement? - nominal - ordinal - interval/ratio - all the above
interval/ratio
Population
the larger set of cases or aggregate number of people that a researcher is actually interested in or wishes to know something about. *Bigger populations are better for sampling.
The chosen alpha level can also be referred to as the ______. - standard mean - level of significance - population parameter - critical window
level of significance
Correlation
measures the linear correlation between two interval/ratio level variables
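Worked example: a minimal sketch of Pearson's r computed from deviation scores, assuming two equal-length lists of interval/ratio values. The function name `pearson_r` and the sample data are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4]
score = [2, 4, 6, 8]
print(pearson_r(hours, score))        # perfect positive linear relationship
print(pearson_r(hours, score[::-1]))  # perfect negative linear relationship
```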
Measures that capture differences within a variable are called: measures of central tendency. measures of dispersion. summary measures. standard deviations.
measures of dispersion.
mode
most frequent - Can be used at the nominal, ordinal, or interval/ratio level
Degrees of Freedom (DF) = n n - 1 n + 1 n / x
n - 1
At what point do t-distributions and z-distributions appear virtually identical? n ≥ 30 n ≥ 60 n ≥ 120 n ≥ 1,000
n ≥ 120
Nominal
name alone - Nominal scales are used for labeling variables, without any quantitative value. "Nominal" scales could simply be called "labels." The categories of a nominal scale are mutually exclusive (no overlap) and none of them have any numerical significance. A good way to remember all of this is that "nominal" sounds a lot like "name" and nominal scales are kind of like "names" or labels.
The Range is susceptible to ________ in data distributions. - modes - variances - outliers - standard deviants
outliers
What is the probability of two independent events occurring simultaneously? p(A and B) = p(A) + p(B) p(A and B) = p(A) - p(B) p(A and B) = p(A) x p(B) p(A and B) = p(A) / p(B)
p(A and B) = p(A) x p(B)
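Worked example: the multiplication rule can be checked by listing every outcome. The coin-and-die setup below is an illustrative assumption, chosen because the two events are clearly independent.

```python
from fractions import Fraction
from itertools import product

p_heads = Fraction(1, 2)  # event A: a fair coin lands heads
p_six = Fraction(1, 6)    # event B: a fair die shows a six

# Multiplication rule for independent events.
p_both = p_heads * p_six

# Cross-check by enumerating the 12 equally likely (coin, die) outcomes.
outcomes = list(product(["H", "T"], range(1, 7)))
favourable = [o for o in outcomes if o == ("H", 6)]
p_enumerated = Fraction(len(favourable), len(outcomes))

print(p_both, p_enumerated)  # both are 1/12
```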
Another way of expressing a confidence interval is ______. - point estimate ± standard deviation - point estimate ± margin of error - margin of error ± standard deviation - Median ± margin of error
point estimate ± margin of error
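Worked example: a minimal sketch of "point estimate ± margin of error" for a sample mean. It assumes a large-sample z-based interval with z = 1.96 (roughly 95% confidence); for small samples a t critical value would normally be used instead. The function name and sample data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(sample, z=1.96):
    """Return (lower, upper) bounds: sample mean plus or minus z * standard error."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    point_estimate = mean(sample)
    margin_of_error = z * stdev(sample) / sqrt(n)
    return point_estimate - margin_of_error, point_estimate + margin_of_error


lower, upper = confidence_interval([10, 12, 14, 16, 18])
print(f"{lower:.2f} to {upper:.2f}")
```

The interval is always centered on the point estimate, here the sample mean of 14.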
The estimates of the mean and proportion that are obtained from a sample are referred to as ____________________ of the same values in the population. - sample statistics - point estimates - population parameters - confidence levels
point estimates
What is the fundamental aspect of probability sampling?
random selection
As a Measure of Dispersion, the _______ is simply the difference between the highest and lowest score in a distribution. -variance -standard deviation -range -variation ratio
range
median
rank - Can be used at the ordinal or interval/ratio level
In the regression equation, β is the: - variable score - regression coefficient - constant - degrees of freedom
regression coefficient
Hypothesis tests are sensitive to ______. - external factors - researcher error - outliers - sample size
sample size
In a given frequency distribution, the Mode is represented by one number only. -Always -Sometimes -Never
sometimes
The __________ of a distribution of scores for a variable is measured by the __________________. - mode; symmetry - variance; variability - mean; modality - standard deviation; variability
standard deviation; variability - Variability (also called spread or dispersion) refers to how spread out a set of data is. Variability gives you a way to describe how much data sets vary and allows you to use statistics to compare your data to other sets of data. The four main ways to describe variability in a data set are the range, interquartile range, variance, and standard deviation.
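Worked example: the four measures of variability named on this card, computed with Python's standard `statistics` module. The sketch assumes sample (n - 1) formulas for the variance and standard deviation and uses the module's default quartile method; other quartile conventions give slightly different interquartile ranges. The helper name `describe_spread` is illustrative.

```python
from statistics import quantiles, stdev, variance

scores = [1, 2, 6, 4, 10, 9, 12, 5, 9, 6, 13]


def describe_spread(data):
    """Return the range, interquartile range, variance, and standard deviation."""
    q1, _median, q3 = quantiles(data, n=4)  # three quartile cut points
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": variance(data),  # sample variance (divides by n - 1)
        "std_dev": stdev(data),      # square root of the sample variance
    }


spread = describe_spread(scores)
for name, value in spread.items():
    print(f"{name}: {value:.2f}")
```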
Population Parameter
a value obtained from a population. Since we rarely have data on the entire population, we typically estimate population parameters.
The slope coefficient (b) measures the ______________ of the linear relationship between the independent and dependent variable while the correlation coefficient indicates the _______________ of the relationship. - strength; direction - form; strength - direction; form - strength; form
form; strength
Population Sample
subset of the population that a researcher most often uses to make generalizations about a larger population
Lambda
tells us the strength of a relationship between two nominal level variables and helps with predictions. It may range from 0.0 to 1.0 and indicates the strength of the relationship between the independent and dependent variables.
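Worked example: a minimal sketch of lambda as a proportional-reduction-in-error measure, assuming the usual Goodman and Kruskal form: (errors made without knowing the independent variable minus errors made when knowing it) divided by the errors made without it. The table layout (rows are categories of the independent variable, columns are categories of the dependent variable), the function name, and the counts are illustrative assumptions.

```python
def goodman_kruskal_lambda(table):
    """Lambda for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # E1: prediction errors when always guessing the overall modal category.
    e1 = n - max(col_totals)
    # E2: prediction errors when guessing the modal category within each row.
    e2 = sum(sum(row) - max(row) for row in table)
    if e1 == 0:
        raise ValueError("lambda is undefined when all cases share one category")
    return (e1 - e2) / e1


print(goodman_kruskal_lambda([[30, 10], [10, 30]]))  # errors cut in half
print(goodman_kruskal_lambda([[20, 20], [20, 20]]))  # no improvement in prediction
```

A value of 0.0 means the independent variable does not improve prediction at all, and 1.0 means it removes every prediction error.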
Cramer's V
tells us the strength of a relationship between two nominal level variables. It is a statistical measure of association that quantifies the strength or magnitude of a relationship between two nominal level variables. Cramer's V is the most popular of the chi-square-based measures of nominal association because it gives good norming from 0 to 1 regardless of table size, when row marginals equal column marginals. V defines a perfect relationship as one which is predictive or ordered monotonic, and defines a null relationship as statistical independence. However, the more unequal the marginals, the more V will be less than 1.0.
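Worked example: a minimal sketch of Cramer's V built from the chi-square statistic, using the standard formula V = sqrt(chi-square / (n * (k - 1))), where k is the smaller of the number of rows and columns. It assumes a table of counts with no empty rows or columns and at least two rows and two columns; the function name and the counts are illustrative.

```python
from math import sqrt


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under statistical independence.
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return sqrt(chi_square / (n * (k - 1)))


print(cramers_v([[40, 0], [0, 40]]))    # perfect relationship
print(cramers_v([[20, 20], [20, 20]]))  # statistical independence
```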
Deciding to use a directional versus non-directional alternative hypothesis most directly affects: - alpha (α) - the critical value(s) - the degrees of freedom - the sample size
the critical value(s)
The distance of a score from the mean is referred to as: the range. the variance. the standard deviation. the mean deviation score.
the mean deviation score.
The mode is defined as: -the mathematical average. - the middle score of a distribution that splits it into two equal halves. - the most frequently occurring score in a distribution. - the average of all the midpoints in a distribution.
the most frequently occurring score in a distribution.
When testing the null hypothesis a researcher begins with the assumption that: the researcher can prove the null hypothesis is false. the researcher can prove the alternative hypothesis is false. the alternative hypothesis is true. the null hypothesis is true.
the null hypothesis is true.
Variable Vs. Constant
variables have different values, while a constant has only one value
The ________ is a Measure of Dispersion that can be used with nominal and ordinal level data. - standard deviation - variance - mode - variation ratio
variation ratio - The variation ratio is a simple measure of statistical dispersion in nominal distributions; it is the simplest measure of qualitative variation. It is defined as the proportion of cases which are not in the modal category. Just as with the range or standard deviation, the larger the variation ratio, the more differentiated or dispersed the data are; and the smaller the variation ratio, the more concentrated and similar the data are.
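Worked example: a minimal sketch of the variation ratio as 1 minus the share of cases in the modal category. The function name and the offense categories are made up for illustration.

```python
from collections import Counter


def variation_ratio(values):
    """Proportion of cases that are NOT in the modal (most frequent) category."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    modal_count = max(counts.values())
    return 1 - modal_count / len(values)


offenses = ["theft"] * 4 + ["assault"] * 3 + ["fraud"] * 3
print(variation_ratio(offenses))  # 6 of the 10 cases fall outside the mode
```

If every case falls in one category the result is 0, meaning there is no dispersion at all.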
Measurement Validity
when we have actually measured what we intended to measure
What is the equation for a straight line? y = mca y = ax + bc + n/df y = a + x2 y = a + bx
y = a + bx
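Worked example: a minimal sketch of fitting the line y = a + bx by ordinary least squares, where b is the slope and a is the intercept (constant). The function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: change in y per one-unit change in x
    a = mean_y - b * mean_x  # intercept: predicted y when x is 0
    return a, b


a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a:.1f} + {b:.1f}x")
print(a + b * 5)  # predicted y for x = 5
```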
Population mean symbol
μ