CJC STATS CHRISTMAN BALL STATE FINAL

Pearson correlation coefficient (Pearson r, Pearson product-moment correlation coefficient)

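(Pearson r) A statistic that is calculated to reflect the degree of relationship between two interval-level variables. Also called the Pearson product-moment correlation coefficient. It is a correlation coefficient that ranges from -1 to +1: values closer to 0 indicate a weaker relationship, and values closer to -1 or +1 indicate a stronger one.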

Sample mean symbol

- x̄ (x-bar: an x with a line on top)

The Variation Ratio for this data is .515. What does this mean? - 51.5% of the cases are in the Modal category - 51.5% of the cases are outliers - 51.5% of the cases are not in the Modal category - 51.5% of the cases are not outliers

- 51.5% of the cases are not in the Modal category
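As a minimal sketch of how a variation ratio could be computed, assuming it is defined as the proportion of cases that fall outside the modal category (the data and the function name `variation_ratio` are hypothetical, chosen only to reproduce a value of .515):

```python
from collections import Counter

def variation_ratio(values):
    """Proportion of cases that are NOT in the modal category."""
    counts = Counter(values)
    modal_count = max(counts.values())  # frequency of the modal category
    return 1 - modal_count / len(values)

# Hypothetical data: 200 cases, 97 of them in the modal category "A".
cases = ["A"] * 97 + ["B"] * 60 + ["C"] * 43
vr = variation_ratio(cases)
print(round(vr, 3))  # 0.515
```

Here 97 of 200 cases (48.5%) sit in the modal category, so the remaining 51.5% do not.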

What is required for regression analysis? - Interval/ratio independent variable and interval/ratio dependent variable. - Nominal independent variable and nominal dependent variable. - Ordinal independent variable and ordinal dependent variable. - Ordinal independent variable and interval/ratio dependent variable

- Interval/ratio independent variable and interval/ratio dependent variable.

The range of the middle 50% of scores in a data set is the: - range - variance - interquartile range - standard deviation

- interquartile range

For continuous data with the interval and ratio level measurement it is best to report both the ______ and the ______ for purposes of accurately understanding the statistical composition of the distribution. - mode; percent - cumulative frequency; median - mean; median - mean; cumulative percent

- mean; median

Which of the following is the correct way to match the terms below? 1. One-tailed 2. Two-tailed 3. Directional 4. Non-directional -1 and 2; 3 and 4 -1 and 4; 2 and 3 -1 and 3; 2 and 4 -All four terms are interchangeable and go together.

-1 and 3; 2 and 4

When outliers are included in the regression analysis, what is a possible outcome for the data? -The slope and correlation coefficients are not inflated. -The slope and correlation coefficients will remain unchanged. -The slope and correlation coefficients will be excessively influenced. -Only the slope coefficient will be excessively influenced while the correlation coefficient will not be influenced

-The slope and correlation coefficients will be excessively influenced.

Which of the following is the best example of an alternative hypothesis? -Fords are no more likely to be in an auto accident than any other model of car. -Generic drugs are just as effective in treating illnesses as brand-name drugs. -The networks of all the different cell phone providers are the same. -Younger people are more likely to use social media than are older people.

-Younger people are more likely to use social media than are older people.

A contingency table shows the joint distribution of two: -interval level variables -ratio level variables -categorical variables -dependent variables

-categorical variables Contingency tables (also called crosstabs or two-way tables) are used in statistics to summarize the relationship between several categorical variables. A contingency table is a special type of frequency distribution table, where two variables are shown simultaneously.

Probability can be defined as: -number of times a specific event can occur relative to the total number of times that any event can occur. -the total number of possible outcomes minus the number of ways a particular outcome may occur. -the total number of ways a possible outcome can occur. -the likelihood that the researcher would reject the null hypothesis.

-number of times a specific event can occur relative to the total number of times that any event can occur.

A confidence interval is: -the lower and upper boundaries of the confidence interval. -the margin of error around the point estimate that consists of a range of values into which the population value falls. -single values used to estimate an unknown population parameter. -range of values for a variable that has a stated probability of containing an unknown population mean.

-range of values for a variable that has a stated probability of containing an unknown population mean.

If the obtained z or t value is outside of the critical values, we.... -accept the null hypothesis -fail to reject the null hypothesis -reject the null hypothesis -accept the alternative hypothesis .

-reject the null hypothesis

A 'critical value': -is the same as alpha (α). -separates the regions of rejection and non-rejection within a probability distribution. -relates to the decision to reject the research hypothesis. -is a sample parameter.

-separates the regions of rejection and non-rejection within a probability distribution. In hypothesis testing, a critical value is a point on the test distribution that is compared to the test statistic to determine whether to reject the null hypothesis. If the absolute value of your test statistic is greater than the critical value, you can declare statistical significance and reject the null hypothesis. Critical values correspond to α, so their values become fixed when you choose the test's α.

The standard deviation is _____; the standard deviation tells _____

-square root of the average squared deviation score -how far people are from the mean on average

The variance is: -the extent to which the observations are not concentrated in the modal category of the variable. - distance of a score from the mean. -the average-squared difference of each score in a set of scores from the mean of those scores. - the square root of the average squared difference of each score in a set of scores from the mean of those scores.

-the average-squared difference of each score in a set of scores from the mean of those scores.
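A short sketch of this definition, assuming the "average" divides by the number of scores n (many textbooks divide by n - 1 when estimating from a sample). The scores are made up for illustration:

```python
import math

def variance(scores):
    """Average squared difference of each score from the mean (divides by n)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores, mean = 5
print(variance(scores))              # 4.0
print(standard_deviation(scores))    # 2.0
```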

The restricted addition rule of probabilities relates to: -determining the number of different ways a particular outcome can occur. -the probability of either of two mutually exclusive events occurring is equal to the sum of their separate probabilities. -the fact that the probability of an event occurring is between 0 and 100. -the probability of two non-mutually exclusive events occurring is equal to the sum of their separate probabilities.

-the probability of either of two mutually exclusive events occurring is equal to the sum of their separate probabilities.

The standard level of significance that social scientists strive for in hypothesis testing is .25 .10 .05 .01

.01 and .05

Alpha Levels

.01, .05, .10 Risk we are willing to take in rejecting a true null hypothesis.

Which of the following is correct? 1 - confidence interval = α (alpha) 1 + confidence interval = α (alpha) 0 + confidence interval = α (alpha) 2 - confidence interval = α (alpha)

1 - confidence interval = α (alpha)
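To illustrate how alpha relates to a confidence level and to critical values, here is a sketch using the standard normal (z) distribution from Python's standard library. The helper names are hypothetical, and it assumes a z test rather than a t test:

```python
from statistics import NormalDist

def alpha_from_confidence(confidence_level):
    """alpha = 1 - confidence level (e.g. 0.95 -> 0.05)."""
    return 1 - confidence_level

def critical_z(alpha, two_tailed=True):
    """Positive critical z value that cuts off the rejection region(s)."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

alpha = alpha_from_confidence(0.95)
z_two = critical_z(alpha, two_tailed=True)    # about 1.96
z_one = critical_z(alpha, two_tailed=False)   # about 1.645
print(round(z_two, 2), round(z_one, 3))
```

At the same alpha, the one-tailed critical value is smaller than the two-tailed one because the whole rejection region sits in a single tail.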

Hypothesis testing

1. Formally state the null and research hypotheses. 2. Select an appropriate test statistic and sampling distribution. 3. Select a level of significance. 4. Conduct the test. 5. Make a decision.

Calculate the interquartile range from the following set of data. 16, 19, 24, 36, 41, 45, 48, 54, 62, 88, 91, 92

45
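The answer of 45 can be reproduced with the "median of each half" method for quartiles, sketched below. This is one of several quartile conventions (an assumption here); interpolation-based methods in statistical software can give a different value for the same data.

```python
from statistics import median

def interquartile_range(data):
    """Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2     # for odd n, the middle score is left out
    q1 = median(ordered[:half])  # median of the lower half
    q3 = median(ordered[-half:]) # median of the upper half
    return q3 - q1

data = [16, 19, 24, 36, 41, 45, 48, 54, 62, 88, 91, 92]
print(interquartile_range(data))  # 45.0
```

For this data, Q1 = (24 + 36) / 2 = 30 and Q3 = (62 + 88) / 2 = 75, so the IQR is 75 - 30 = 45.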

In chi-square testing fo = fe =

= the observed cell frequencies from our sample data = the expected cell frequencies we should get under the null hypothesis
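A minimal sketch of how fo and fe enter the chi-square statistic, assuming the usual formulas fe = (row total × column total) / n and χ² = Σ (fo − fe)² / fe. The 2-by-2 table is hypothetical:

```python
def expected_frequencies(observed):
    """Expected cell counts under the null hypothesis of independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum over all cells of (fo - fe)^2 / fe."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

observed = [[30, 20],
            [20, 30]]                  # hypothetical observed counts (fo)
print(expected_frequencies(observed))  # [[25.0, 25.0], [25.0, 25.0]]
print(chi_square(observed))            # 4.0
```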

Regression Line

A line depicting the relationship between independent and dependent variables determined by a least-squares regression equation

What does a higher r² represent? A stronger nonlinear relationship between X and Y. A weaker nonlinear relationship between X and Y. A stronger linear relationship between X and Y. A weaker linear relationship between X and Y.

A stronger linear relationship between X and Y.

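Y = a + bX, with (X̄, Ȳ), where X̄ and Ȳ are X and Y with a line on top

a = y-intercept; b = slope; (X̄, Ȳ) = centroid (the point of means, through which the regression line passes)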
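As an illustration, the intercept and slope can be estimated with the standard least-squares formulas, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The data below are made up, and `least_squares` is a hypothetical helper name:

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept; forces the line through the centroid
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

Plugging the mean of X (3) into the fitted line returns the mean of Y (4), which is why the line always passes through the centroid.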

_____ refers to the probability of a statistic needed to reject the null hypothesis. The population mean (μ) The assumption of normality Alpha (α) Standardized score

Alpha (α)

Which of the following is not a measure of association? Lambda Beta Gamma Cramer's V

Beta

Measures of dispersion

Capture how different the values of a variable are. The more dispersion there is in a variable, the more different the values are from each other or from some central tendency, and the more heterogeneity there is in the data. Examples: variation ratio, range, interquartile range.

Variable

Characteristic of a property that can vary/change

Constant

Characteristic of a property that does not vary

Partial correlation coefficient

Correlation between two variables after controlling for a third variable Partial correlation is a measure of the strength and direction of a linear relationship between two continuous variables whilst controlling for the effect of one or more other continuous variables (also known as 'covariates' or 'control' variables). Although partial correlation does not make the distinction between independent and dependent variables, the two variables are often considered in such a manner (i.e., you have one continuous dependent variable and one continuous independent variable, as well as one or more continuous control variables).

What are we looking for in a contingency table? Covariation Equilibrium Distinctions Anomalies

Covariation

One tailed

Directional. A research/alternative hypothesis that states a direction.

Formal statements of the null and research/alternative hypotheses: H0: μ = 5; H1: μ > 5

Directional hypotheses for a larger population mean

Formal statements of the null and research/alternative hypotheses: H0: μ = 5; H1: μ < 5

Directional hypotheses for a smaller population mean

Frequency Distribution

Distribution of values that make up a variable distribution

Which term describes the difference between the predicted value of y and the observed value of y? Regression Intercept Slope Error

Error

What does "fe" represent? Expected frequency for cell k Observed frequency for cell k Desired frequency for cell k Excise frequency for cell k

Expected frequency for cell k

A research found that, in general, the more unstructured free time students have the more likely they are to get involved in gang activity. This would be an example of a positive relationship. T or F

T - both variables increase together, which is a positive relationship

An alpha level of .05 is used for all Z tests. T or F

F

As the sample size decreases, the variation around the mean of the sampling distribution decreases and is more likely to cluster around the true population mean. T or F

F

At any given level of alpha, a larger zobt is needed to reject the null hypothesis in the directional hypothesis test. T or F

F

Confidence intervals are the lower and upper boundaries on the confidence limits. T or F

F

If two different sets of data have the same range, the variability for both sets has to be the same. T or F

F

In a contingency table, the independent variable must be the row and the dependent variable must be the column. T or F

F

The chi-square test of independence indicates whether the observed frequencies are significantly different than the expected frequencies and the strength of the relationship. T or F

F

The contingency coefficient and Cramer's V are best applied to tables that are larger than 2 by 2. T or F

F

The estimated standard deviation of a sampling distribution gets smaller when the sample size is decreased. T or F

F

The main reason for creating a scatterplot between two variables is to determine whether the relationship is statistically significant. T or F

F

The φ coefficient can be used with chi-squares tables that are larger than two by two. T or F

F

There is no two-tailed alternative for chi-square tests. T or F

F

You cannot calculate a confidence interval using a proportion. T or F

F

The direction of the sign of a Pearson correlation coefficient ( r ) indicates the strength of the relationship. T or F

F - The sign of r indicates the direction of the relationship; its absolute value indicates the strength. The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable decreases. The magnitude of r indicates how close the data points are to the line of best fit (i.e., how well the data points fit this model).
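To make the sign-versus-strength distinction concrete, here is a sketch of the usual formula r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The data are hypothetical; flipping the sign of every y value reverses the direction of the relationship but leaves its strength unchanged:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_pos = pearson_r(xs, ys)                 # positive direction
r_neg = pearson_r(xs, [-y for y in ys])   # same strength, negative direction
print(round(r_pos, 3), round(r_neg, 3))
```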

The mean deviation is the most frequently used measure of dispersion for interval/ratio level variables. T or F

F - The most widely used measure of dispersion is the *standard deviation*, which reflects the average distance from the mean. The larger the standard deviation, the larger the average distance of the data points from the mean of the distribution. It is used with interval/ratio variables (but is often used with ordinal-level variables as well).

On a scatter plot graphing regression, the independent variable is along the Y axis. T or F

F- X axis

If the sample size is larger than 120, the t distribution is used instead of the z distribution. T or F

False

The mode is the most useful measure of central tendency for ordinal level measures?T or F

False - Median

The median can be used when analyzing nominal, ordinal, and interval level variables. T or F

False - Ordinal, Interval/Ratio (skewed)

The mean can be calculated for all levels of measurement. T or F

False - the mean requires interval/ratio level data

Cumulative Frequency distributions

Frequency distribution reserved for ordinal or interval/ratio level data, made by starting with the lowest value of the variable (or the highest value) and cumulating (keeping a running tally or sum of) the frequencies in each adjacent value until the highest (or lowest) value is reached. The final cumulative frequency should be equal to the total number of cases (n).
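A small sketch of building a cumulative frequency distribution as a running tally, using hypothetical frequencies for a variable with ordered values 1 through 4:

```python
from itertools import accumulate

# Hypothetical frequency distribution: value -> number of cases.
frequencies = {1: 4, 2: 7, 3: 5, 4: 4}

# Running tally of the frequencies, starting from the lowest value.
cumulative = list(accumulate(frequencies.values()))
print(cumulative)  # [4, 11, 16, 20]

n = sum(frequencies.values())
print(cumulative[-1] == n)  # True
```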

Which of the following measure of association is best used for a chi-square table that is 3 by 4 with ordinal level variables? φ Contingency coefficient Lambda Gamma

Gamma

Which rule of probability states that for two non-mutually exclusive events the probability of each event occurring is equal to the sum of their separate probabilities minus the probability of their joint occurrences? Bounding rule of probabilities Restricted addition rule of probabilities General addition rule of probabilities Restricted multiplication rule of probabilities

General addition rule of probabilities
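The rule can be written as P(A or B) = P(A) + P(B) − P(A and B). A quick sketch with a standard 52-card deck (an example chosen here for illustration), where A = "heart" and B = "king":

```python
from fractions import Fraction

def general_addition(p_a, p_b, p_a_and_b):
    """P(A or B) for events that may overlap."""
    return p_a + p_b - p_a_and_b

p_heart = Fraction(13, 52)
p_king = Fraction(4, 52)
p_king_of_hearts = Fraction(1, 52)   # the joint occurrence, counted twice otherwise

p_heart_or_king = general_addition(p_heart, p_king, p_king_of_hearts)
print(p_heart_or_king)  # 4/13
```

When the events are mutually exclusive, the joint probability is 0 and the rule reduces to the restricted addition rule.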

Null Hypothesis

The hypothesis that is tested. It always assumes there is no relationship between the independent and dependent variables. In a hypothesis test, the null is the hypothesis that is initially assumed to be true.

What can you do to decrease the width of a confidence interval without compromising the level of confidence? Utilize a mobius strip. Collect a new random sample. Decrease sample size. Increase sample size.

Increase sample size.
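A rough sketch of why this works, assuming a z-based interval for a mean with width 2 · z · s / √n (the standard deviation and sample sizes below are hypothetical):

```python
import math
from statistics import NormalDist

def ci_width(sd, n, confidence_level=0.95):
    """Full width of a z-based confidence interval for a mean."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sd / math.sqrt(n)

width_100 = ci_width(sd=10, n=100)
width_400 = ci_width(sd=10, n=400)        # four times the sample size
width_99 = ci_width(sd=10, n=100, confidence_level=0.99)

print(round(width_100, 2), round(width_400, 2), round(width_99, 2))
```

Quadrupling n halves the width at the same confidence level, while raising the confidence level at the same n widens the interval (more confidence, less precision).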

Which type of relationship between two variables does the Pearson correlation assume? Linear Significant Skewed Nonlinear

Linear

Measurements of Central Tendency

Mean, Median, Mode summary statistics that capture the "typical,""average," or "most likely" score or value in a variable distribution

____________________ is a summary measure that captures the magnitude or strength of the relationship between two variables. Chi-square test Joint frequency distribution Yule's Q Measure of association

Measure of association

Interquartile range

Measure of dispersion appropriate for interval/ratio data. It measures the range of scores in the middle 50% of a distribution of continuous scores and is calculated as the difference between the score at the third quartile (the 75th percentile) and the score at the first quartile (the 25th percentile).

Variation Ratio

Measure of dispersion used at the nominal or ordinal level. It measures the proportion of cases not in the modal value. The greater the magnitude of the variation ratio, the more dispersion there is in a nominal or ordinal variable.

Interquartile Range

Measure of dispersion used for interval/ratio data. Q1 = 25th percentile, Q2 = 50th percentile, Q3 = 75th percentile.

Range

Measure of dispersion used for interval/ratio data: highest value minus lowest value.

Measures of Association

Measure the strength and direction of a relationship (direction only when the level of measurement allows it). Examples: Gamma, Lambda, Cramer's V. The choice among them is linked to the level of measurement.

Variance

Measures the average squared deviations from the mean for an interval/ratio variable

Which measure of central tendency is best used for categorical variables? -Mode -Median -Mean -Bimodal distribution

Mode

Levels Of Measurement

Nominal, Ordinal, Interval, Ratio: the mathematical nature of the values for a variable

Two Tailed

Non-directional. A research/alternative hypothesis that does not state a direction; it only states that there is a relationship between the independent and dependent variables.

Formal statements of the null and research/alternative hypotheses: H0: μ = 5; H1: μ ≠ 5

Non directional hypothesis for a population mean

The bounding rule of probability states that: A probability is bounded on each end by an unknown value Probabilities only have upper bounds. Probabilities only have lower bounds. Probabilities are bounded by 0 and 1.

Probabilities are bounded by 0 and 1.

Type I Error

Probability of rejecting a null hypothesis that is in fact true. It is equal to the alpha probability level

Type II Error

Probability of retaining a null hypothesis that is in fact false

Difference between interval and Ratio

Ratio has an absolute zero; interval does not

Ratio

Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about the order, they tell us the exact value between units, AND they also have an absolute zero, which allows for a wide range of both descriptive and inferential statistics to be applied. At the risk of repeating myself, everything above about interval data applies to ratio scales, plus ratio scales have a clear definition of zero. Good examples of ratio variables include height and weight. Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be meaningfully added, subtracted, multiplied, and divided (ratios). Central tendency can be measured by mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation, can also be calculated from ratio scales.

Standard score (Z-score)

Score from the standard normal probability distribution that indicates how many standard deviation units a score is from the mean of zero

Which two pieces of information can be obtained from a z-score? Sign and frequency Sign and magnitude Magnitude and origin Frequency and origin

Sign and magnitude
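A minimal sketch of the z-score formula, z = (score − mean) / standard deviation, with hypothetical exam scores. The sign shows whether the score is above or below the mean, and the magnitude shows how many standard deviations away it is:

```python
def z_score(x, mean, sd):
    """Number of standard deviation units a score lies from the mean."""
    return (x - mean) / sd

# Hypothetical distribution with mean 70 and standard deviation 10.
z_above = z_score(85, mean=70, sd=10)
z_below = z_score(55, mean=70, sd=10)
print(z_above, z_below)  # 1.5 -1.5
```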

The Median can be calculated or computed with grouped frequency distribution data. - Always - Sometimes - Never

Sometimes We can calculate a median with grouped data so long as it was originally in continuous (interval/ratio) form and our midpoints are meaningful.

Standard Deviation

Square root of the average squared deviation about the mean (the square root of the variance)

Sample Statistic

Statistic (i.e., mean, proportion, etc.) obtained from a sample of the population

Pearson correlation coefficient (pearson's r)

Statistic that quantifies the direction and strength of the relationship between two interval/ratio level variables

Confidence intervals

Statistical interval around a point estimate (e.g., mean) for which we can state a level of confidence of capturing the true population parameter

t test

Statistical test used to test several null hypotheses, including the difference between two means. Use this for small samples.

Z test

Statistical test used to test several null hypotheses, including the difference between two means. Use this for large samples.

A contingency table is generally defined by the number of columns and rows in the table. T or F

T

A distribution that had two modes would be considered bimodal? T or F

T

A linear regression equation predicts a score on one variable from a score on another variable and that is based on the relationship between the two variables. T or F

T

A researcher believes that crime rises as individuals get older, but starts to decrease as they are past 30. This would be an example of a nonlinear relationship. T or F

T

A researcher has to compare percentage differences found in the categories for the independent variable at the same category of the dependent variable. T or F

T

A researcher wants to understand the probability of jury verdicts (guilty versus non-guilty). This is an example of a variable that would be considered a binomial variable. T or F

T

A type I error occurs when we reject the null hypothesis, even though it is true T or F

T

Chi-square allows for the rejection of a null hypothesis of independence but does not tell the researcher the magnitude or strength of that relationship. T or F

T

In a normal distribution the mode, median, and mean will all be the same. T or F

T

In the regression equation the predicted y value is based on an estimate and therefore can only be considered a best guess. T or F

T

Point estimates are the estimate of the mean and proportion that we obtained from a sample. T or F

T

The mean is calculated by summing up all of the scores for a particular variable and then dividing by the number of cases. T or F

T

The mode may give a misleading notion of the central tendency of the data. T or F

T

The null hypothesis indicates that there is no difference while the alternative hypothesis indicates that there is a difference between the means of two groups. T or F

T

The range and interquartile range use only two scores to estimate the amount of dispersion, making them more limited measures than the variance and standard deviation. T or F

T

The variation ratio is a measure of dispersion that is appropriate to use for variables such as religion, gender, and race. T or F

T

Unlike the mean, the median is not influenced by extreme scores, either low or high. T or F

T

When a researcher says they are 99% confident, they are saying that the population mean will not fall within the confidence interval 1% of the time. T or F

T

When an event is an independent event, its occurrence does not affect nor is it affected by another event's occurrence. T or F

T

When calculating the variance and standard deviation for grouped data, one would use the midpoint of the group instead of the individual case score. T or F

T

When estimating the regression model two degrees of freedom are lost because two parameters are being estimated. T or F

T

When trying to obtain greater confidence the researcher loses precision. T or F

T

Gamma (Yule's Q)

Tells us the strength or magnitude of a relationship between two ordinal-level variables. It tells us how closely two pairs of data points "match". Gamma tests for an association between points and also tells us the strength of the association. The goal of the test is to be able to predict where new values will rank. For example, if case A scores "LOW" for question 1 and "HIGH" for question 2, will case B also give a LOW/HIGH response? Gamma can be calculated for ordinal (ordered) variables that are continuous (like height or weight) or discrete (like "hot", "hotter", and "hottest"). The gamma coefficient ranges between -1 and 1. 1 = perfect positive correlation: if one value goes up, so does the other. -1 = perfect inverse correlation: as one value goes up, the other goes down. 0 = there is no association between the variables.
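One common way to compute gamma (Goodman and Kruskal's formulation, assumed here) counts concordant pairs (Nc) and discordant pairs (Nd) and takes (Nc − Nd) / (Nc + Nd), ignoring tied pairs. The sketch below uses hypothetical cases whose ordinal categories are coded 1 = low, 2 = medium, 3 = high:

```python
from itertools import combinations

def gamma(cases):
    """Goodman-Kruskal gamma for a list of (x, y) ordinal codes.

    Raises ZeroDivisionError if every pair of cases is tied.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables rank the two cases the same way
            concordant += 1
        elif product < 0:    # the variables rank the two cases in opposite ways
            discordant += 1
        # product == 0 means a tie on x or y; tied pairs are ignored
    return (concordant - discordant) / (concordant + discordant)

cases = [(1, 1), (2, 3), (3, 2), (3, 3)]  # hypothetical coded responses
print(gamma(cases))  # 0.5
```

With 3 concordant pairs and 1 discordant pair, gamma is (3 − 1) / (3 + 1) = 0.5, a moderate positive association.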

Chi Squared test

Tests the null hypothesis that two categorical variables are independent of each other. Related concepts: alpha level, degrees of freedom, Type I and Type II errors.

Linear Relationship

The effect of x on y is generally the same at all values of x

Validity

Validity is achieved when the statements or conclusions that you make about empirical reality are true

What does the expected cell frequencies table represent? A directional hypothesis. A nondirectional hypothesis. An alternative hypothesis. The null hypothesis.

The null hypothesis.

Non Probability Sampling (quota)

These methods are not based on random selection and do not allow us to know in advance the likelihood of any element of a population being selected for the sample

Probability sampling methods

These methods rely on random selection or chance and allow us to know in advance how likely it is that any element of a population is selected for the sample

Which of the following would be a reason to create a scatterplot between two variables? To calculate the degrees of freedom To get a sense of the strength of the relationship To see if the variables are categorical or continuous To see if the group means are the same

To get a sense of the strength of the relationship

Which type of error occurs when we fail to reject the null hypothesis when we should reject it? Non-descript error Statistical error Type I error Type II error

Type II error - A Type II error occurs when the researcher fails to reject a null hypothesis that is false. By contrast, the probability of committing a Type I error is called the significance level; this probability is also called alpha and is often denoted by α.

Degrees of freedom (df)

Value necessary, along with an alpha value, to find the critical value and region for a null hypothesis or confidence interval

Critical Value

Value that corresponds with an alpha level for any particular null hypothesis or confidence interval.

Which measure of dispersion is appropriate to use with nominal level variables? Variation ratio Range Variance Standard deviation

Variation ratio

Reliability

A measure is reliable when it yields consistent scores or observations of a given phenomenon on different occasions. Reliability is a prerequisite for measurement validity.

Causal Validity(Internal validity)

When we can assume that our independent variable did cause the dependent variable

Negative Linear relationship

X and Y go in opposite directions

Positive linear relationship

X and Y increase/decrease in the same direction

Multiple coefficient of determination (r squared)

You get a value of R squared when 2 or more independent variables are predicting a dependent variable R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. The definition of R-squared is fairly straight-forward; it is the percentage of the response variable variation that is explained by a linear model. Or: R-squared = Explained variation / Total variation R-squared is always between 0 and 100%: 0% indicates that the model explains none of the variability of the response data around its mean. 100% indicates that the model explains all the variability of the response data around its mean.
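The ratio "explained variation / total variation" can be sketched as follows. The observed values and predictions are hypothetical (the predictions come from a least-squares line y = 2.2 + 0.6x fitted to x = 1..5), and the identity used assumes a least-squares fit that includes an intercept:

```python
def r_squared(observed, predicted):
    """Explained variation divided by total variation around the mean."""
    mean_y = sum(observed) / len(observed)
    total = sum((y - mean_y) ** 2 for y in observed)       # total variation
    explained = sum((p - mean_y) ** 2 for p in predicted)  # explained variation
    return explained / total

observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]   # from the assumed line y = 2.2 + 0.6x
r2 = r_squared(observed, predicted)
print(round(r2, 2))  # 0.6
```

An R-squared of 0.6 means the model accounts for 60% of the variation in the response around its mean.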

Which of the following is the best example of an alternative hypothesis? Younger people are more likely to use social media than are older people. Fords are no more likely to be in an auto accident than any other model of car. Generic drugs are just as effective in treating illnesses as brand-name drugs. The networks of all the different cell phone providers are the same.

Younger people are more likely to use social media than are older people.

Which test statistic and sampling distribution do we use when we have one population mean and a large sample? Z test and distribution T test and distribution Student's t test and distribution Chi-squared test and distribution

Z test and distribution

The null hypothesis states that the µ = 110 and the alternative hypothesis states µ < 110. The mean of the sample was 109 with a standard deviation of 3. After calculating the t statistic at the α .05 level, the research would reject the null hypothesis. T or F

F

Cumulative Frequency Distribution

A frequency distribution used for interval/ratio data that keeps a running tally (sum) of the frequencies

If two data sets have the same range: a. the distances from the smallest to largest observations in both sets will be the same. b. the smallest and largest observations are the same in both sets. c. both sets will have the same standard deviation. d. both sets will have the same interquartile range

a. the distances from the smallest to largest observations in both sets will be the same.

The Median is insensitive to high and low scores - called "outliers" - in a frequency distribution. -always -sometimes -never

always

Mean

average Can be used at the interval/ratio level

Mutually exclusive intervals

Class intervals must not overlap. Example: this first rule ensures that the individual answers offered in a single- or multiple-response survey question cannot be true at the same time.

Exhaustive (collective) intervals

Class intervals must provide a place to count all original values of the variable distribution. Example: this second rule ensures that the answers offered for the question cover the entire realm of possible answers. Survey writers listing answers in an aided single- or multiple-response question need to make sure that all potential answers to the particular question are listed for the respondent.

The probability of an event not occurring is called the _____. opposite of an event complement of an event inverse relation of an event façade of an event

complement of an event

A binomial distribution has a variable that: consists of three categories. consists of only one category. consists of multiple categories. consists of two categories.

consists of two categories.

Interval

continuous data -Interval scales are numeric scales in which we know not only the order, but also the exact differences between the values. The classic example of an interval scale is Celsius temperature because the difference between each value is the same. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Time is another good example of an interval scale in which the increments are known, consistent, and measurable. Interval scales are nice because the realm of statistical analysis on these data sets opens up. For example, central tendency can be measured by mode, median, or mean; standard deviation can also be calculated. Here's the problem with interval scales: they don't have a "true zero." For example, there is no such thing as "no temperature." Without a true zero, it is impossible to compute ratios. With interval data, we can add and subtract, but cannot multiply or divide. Bottom line, interval scales are great, but we cannot calculate ratios, which brings us to our last measurement scale...

A one-tailed test is also referred to as which of the following? single-point examination mono-polar directional nondirectional

directional

The statement "μ > 15" is an example of a: directional alternative hypothesis. non-directional alternative hypothesis. non-directional null hypothesis. directional null hypothesis.

directional alternative hypothesis.

In statistical terminology, we always either reject the null hypothesis or _____. accept the null hypothesis accept the alternative hypothesis reject the alternative hypothesis fail to reject the null hypothesis

fail to reject the null hypothesis If our statistical analysis shows that the significance level is below the cut-off value we have set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis. Alternatively, if the significance level is above the cut-off value, we fail to reject the null hypothesis and cannot accept the alternative hypothesis. You should note that you cannot accept the null hypothesis, but only find evidence against it.

Ordinal

has rank order -With ordinal scales, it is the order of the values that is important and significant, but the differences between each one are not really known. Ordinal scales are typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc. "Ordinal" is easy to remember because it sounds like "order", and that's the key to remember with "ordinal scales": it is the order that matters, but that's all you really get from these. Advanced note: The best way to determine central tendency on a set of ordinal data is to use the mode or median; the mean cannot be defined from an ordinal set.

What is the range for the following set of data? 1, 2, 6, 4, 10, 9, 12, 5, 9, 6, 13 9 10 11 12

highest number - lowest number = 13 - 1 = 12
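
As a quick check of this answer, the range can be computed directly from the data in the question. A minimal Python sketch (the variable names are chosen here for illustration):

```python
scores = [1, 2, 6, 4, 10, 9, 12, 5, 9, 6, 13]

# Range = highest score minus lowest score.
data_range = max(scores) - min(scores)
print(data_range)  # 12
```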

Class intervals

In creating a grouped frequency distribution, the class interval defines the range of values that are included in each interval. In statistics, the data are arranged into different classes, and the width of such a class is called the class interval. Class intervals are generally equal in width, but this might not always be the case. Also, they are generally mutually exclusive. Class intervals are very useful in drawing histograms. A class interval is a way to divide data and group certain answers together. When determining the class interval to use with your data, you must follow three rules: The same person or unit can be in only one class interval. The width or range of numbers in the class intervals must be equal. There are no numbers left out of the groupings.

Measures of Dispersion really only have practical use for what level of measurement? - nominal - ordinal - interval/ratio - all the above

interval/ratio

Population

larger set of cases or aggregate number of people that a researcher is actually interested in or wishes to know something about. *Bigger populations are better for sampling.

The chosen alpha level can also be referred to as the ______. standard mean level of significance population parameter critical window

level of significance

Correlation

measures the linear relationship between two interval/ratio level variables

Measures that capture differences within a variable are called: measures of central tendency. measures of dispersion. summary measures. standard deviations.

measures of dispersion.

mode

most frequent. Can be used at the nominal, ordinal, or interval/ratio level

Degrees of Freedom (DF) = n n - 1 n + 1 n / x

n - 1

At what point do t-distributions and z-distributions appear virtually identical? n ≥ 30 n ≥ 60 n ≥ 120 n ≥ 1,000

n ≥ 120

Nominal

name alone -Nominal scales are used for labeling variables, without any quantitative value. "Nominal" scales could simply be called "labels." The categories of a nominal scale are mutually exclusive (no overlap) and none of them has any numerical significance. A good way to remember all of this is that "nominal" sounds a lot like "name" and nominal scales are kind of like "names" or labels.

The Range is susceptible to ________ in data distributions. modes variances outliers standard deviants

outliers

What is the probability of two independent events occurring simultaneously? p(A and B) = p(A) + p(B) p(A and B) = p(A) - p(B) p(A and B) = p(A) x p(B) p(A and B) = p(A) / p(B)

p(A and B) = p(A) x p(B)
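
A small worked example of the multiplication rule, sketched in Python. The two events (heads on a fair coin, a six on a fair die) are assumptions chosen for illustration and are not part of the original card.

```python
from fractions import Fraction

p_a = Fraction(1, 2)  # assumed event A: a fair coin lands heads
p_b = Fraction(1, 6)  # assumed event B: a fair die shows a six

# For independent events, the joint probability is the product.
p_a_and_b = p_a * p_b
print(p_a_and_b)  # 1/12
```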

Another way of expressing a confidence interval is ______. point estimate ± standard deviation point estimate ± margin of error margin of error ± standard deviation Median ± margin of error

point estimate ± margin of error
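
The "point estimate ± margin of error" form can be sketched as follows. The sample values are hypothetical, and the sketch assumes a z critical value for simplicity; with a sample this small, a t critical value would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]  # hypothetical data
confidence = 0.95

point_estimate = mean(sample)                       # sample mean
standard_error = stdev(sample) / sqrt(len(sample))  # estimated SE of the mean

# Two-sided z critical value (about 1.96 for 95% confidence).
z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin_of_error = z_critical * standard_error

lower = point_estimate - margin_of_error
upper = point_estimate + margin_of_error
print(f"{point_estimate} ± {margin_of_error:.2f} -> ({lower:.2f}, {upper:.2f})")
```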

The estimates of the mean and proportion that are obtained from a sample are referred to as ____________________ of the same values in the population. sample statistics point estimates population parameters confidence levels

point estimates

What is the fundamental aspect of probability sampling?

random selection

As a Measure of Dispersion, the _______ is simply the difference between the highest and lowest score in a distribution. -variance -standard deviation -range -variation ratio

range

median

rank (middle score). Can be used at the ordinal or interval/ratio level

In the regression equation β is the: variable score regression coefficient constant degrees of freedom

regression coefficient

Hypothesis tests are sensitive to ______. external factors researcher error outliers sample size

sample size

In a given frequency distribution, the Mode is represented by one number only. -Always -Sometimes -Never

sometimes

The __________ of a distribution of scores for a variable is measured by the __________________. mode; symmetry variance; variability mean; modality standard deviation; variability

standard deviation; variability. Variability (also called spread or dispersion) refers to how spread out a set of data is. Variability gives you a way to describe how much data sets vary and allows you to use statistics to compare your data to other sets of data. The four main ways to describe variability in a data set are the range, the interquartile range, the variance, and the standard deviation.
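
The four measures named above can be computed with Python's standard library. This is a sketch on hypothetical scores; note that `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method), so an interquartile range computed by hand or by other software may differ slightly.

```python
from statistics import quantiles, stdev, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

data_range = max(scores) - min(scores)

# Quartiles cut the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1

# Sample variance and standard deviation (both divide by n - 1).
sample_variance = variance(scores)
sample_sd = stdev(scores)

print(data_range, iqr, round(sample_variance, 3), round(sample_sd, 3))
```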

Population Parameter

statistic obtained from a population; since we rarely have data on the entire population, we typically estimate population parameters

The slope coefficient (b) measures the ______________ of the linear relationship between the independent and dependent variable while the correlation coefficient indicates the _______________ of the relationship. strength; direction form; strength direction; form strength; form

form; strength

Population Sample

subset of the population that a researcher most often uses to make generalizations about a larger population

Lambda

tells us the strength of a relationship between two nominal level variables and helps with predictions. It may range from 0.0 to 1.0. Lambda provides us with an indication of the strength of the relationship between independent and dependent variables.

Cramer's V

tells us the strength of a relationship between two nominal level variables. A statistical measure of association that quantifies the strength or magnitude of a relationship between two nominal level variables. Cramer's V is the most popular of the chi-square-based measures of nominal association because it gives good norming from 0 to 1 regardless of table size, when row marginals equal column marginals. V defines a perfect relationship as one which is predictive or ordered monotonic, and defines a null relationship as statistical independence. However, the more unequal the marginals, the more V will be less than 1.0.

Deciding to use a directional versus non-directional alternative hypothesis most directly affects: alpha (α) the critical value(s) the degrees of freedom the sample size

the critical value(s)
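
To see why the choice affects the critical value(s), the sketch below compares z critical values at the same alpha. It assumes a z-test and alpha = 0.05 purely for illustration.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

# Directional (one-tailed): all of alpha sits in one tail.
one_tailed = z.inv_cdf(1 - alpha)

# Non-directional (two-tailed): alpha is split between both tails.
two_tailed = z.inv_cdf(1 - alpha / 2)

print(round(one_tailed, 3))  # 1.645
print(round(two_tailed, 3))  # 1.96, used as ±1.96
```

Alpha itself does not change; only the cut-off point(s) on the distribution move.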

The distance of a score from the mean is referred to as: the range. the variance. the standard deviation. the mean deviation score.

the mean deviation score.
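
A short illustration of deviation scores, using hypothetical data. Each score's deviation is its distance from the mean, and the deviations always sum to zero.

```python
from statistics import mean

scores = [2, 4, 6, 8, 10]  # hypothetical data
m = mean(scores)           # 6

# Deviation score = score minus the mean.
deviations = [x - m for x in scores]
print(deviations)       # [-4, -2, 0, 2, 4]
print(sum(deviations))  # 0
```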

The mode is defined as: -the mathematical average. - the middle score of a distribution that splits it into two equal halves. - the most frequently occurring score in a distribution. - the average of all the midpoints in a distribution.

the most frequently occurring score in a distribution.

When testing the null hypothesis a researcher begins with the assumption that: the researcher can prove the null hypothesis is false. the researcher can prove the alternative hypothesis is false. the alternative hypothesis is true. the null hypothesis is true.

the null hypothesis is true.

Variable Vs. Constant

variables have different values, while a constant has only one value

The ________ is a Measure of Dispersion that can be used with nominal and ordinal level data. standard deviation variance mode variation ratio

variation ratio. The variation ratio is a simple measure of statistical dispersion in nominal distributions; it is the simplest measure of qualitative variation. It is defined as the proportion of cases which are not in the modal category. Just as with the range or standard deviation, the larger the variation ratio, the more differentiated or dispersed the data are; and the smaller the variation ratio, the more concentrated and similar the data are.
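
The definition above (the proportion of cases not in the modal category) can be sketched in Python. The response data are hypothetical and the variable names are chosen only for this example.

```python
from collections import Counter

# Hypothetical nominal data: how eight respondents commute.
responses = ["car", "bus", "car", "bike", "car", "bus", "car", "walk"]

counts = Counter(responses)
modal_category, modal_count = counts.most_common(1)[0]

# Variation ratio = 1 - (cases in the modal category / total cases).
variation_ratio = 1 - modal_count / len(responses)
print(modal_category, variation_ratio)  # car 0.5
```

Here half of the cases fall outside the modal category, so the variation ratio is 0.5.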

Measurement Validity

when we have actually measured what we intended to measure

What is the equation for a straight line? y = mca y = ax + bc + n/df y = a + x2 y = a + bx

y = a + bx
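
As an illustration of y = a + bx, the sketch below fits the intercept a and slope b by ordinary least squares. The data are hypothetical and were constructed to lie exactly on y = 1 + 2x.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # hypothetical data lying exactly on y = 1 + 2x

x_bar, y_bar = mean(x), mean(y)

# Slope b: sum of cross-products divided by sum of squared x deviations.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
# Intercept a: the line passes through the point (x_bar, y_bar).
a = y_bar - b * x_bar

print(a, b)       # 1.0 2.0
print(a + b * 6)  # predicted y when x = 6 -> 13.0
```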

Population mean symbol

μ

