Psych 200 Stats Exam One
Steps of hypothesis testing
1) State a hypothesis about a population parameter (e.g., μ = 14 hours per week of TV watching). 2) Predict the sample characteristics (M ≈ 14). 3) Obtain a random sample (n = 36 college students across the US). 4) Compare the sample data with the prediction and draw a conclusion about the hypothesis.
7) A biologist records the number of trout, bass, perch, and other types of fish caught in a local lake during a 2-week period. If the results are organized in a frequency distribution graph, what kind of graph should be used? a. A bar graph b. A polygon c. A histogram d. Either a histogram or a polygon
bar graph
2) What are the two types of frequency distribution graphs? When do you use each?
Bar graph vs. histogram. Bar graph: when the scores are measured on a nominal or ordinal scale (usually non-numerical values), the frequency distribution can be displayed in a bar graph. For a nominal scale, the space between bars emphasizes that the scale consists of separate, distinct categories. For ordinal scales, separate bars are used because you cannot assume that the categories are all the same size. Histogram: when the scores are measured on an interval or ratio scale.
What is the variance for the following population of scores? Scores: 5, 2, 5, 4
1.5
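The answer above can be checked step by step. This is a minimal sketch of the population variance formula (SS divided by N); the function name `population_variance` is made up for illustration.

```python
def population_variance(scores):
    """Population variance: sum of squared deviations (SS) divided by N."""
    n = len(scores)
    mean = sum(scores) / n                       # mu = 16 / 4 = 4
    ss = sum((x - mean) ** 2 for x in scores)    # SS = 1 + 4 + 1 + 0 = 6
    return ss / n                                # 6 / 4

print(population_variance([5, 2, 5, 4]))  # 1.5
```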
What is the range for the following set of scores? Scores: 5, 7, 9, 15 a. 4 points b. 10 points c. 5 points d. 15 points
10 points
15) A class consists of 10 males and 30 females. If one student is randomly selected from the class, what is the probability of selecting a male? a. 10/30 b. 1/10 c. 10/40 d. 1/40
10/40
A sample of n = 25 scores has M = 20 and s² = 9. What is the sample standard deviation?
3
19) A researcher conducts a hypothesis test using a sample from an unknown population. If the t statistic has df = 30, how many individuals were in the sample? a. n=29 b. n=30 c. n=31 d. Cannot be determined from the information given
31
16) For a population with μ = 80 and σ = 20, the distribution of sample means based on n = 16 will have an expected value of ____ and a standard error of ____.
80; 5
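A short sketch of where 80 and 5 come from: the expected value of M equals μ, and the standard error is σ divided by the square root of n. The function name is hypothetical.

```python
import math

def distribution_of_sample_means(mu, sigma, n):
    """Return (expected value of M, standard error of M)."""
    expected_value = mu                   # mean of the sample means equals mu
    standard_error = sigma / math.sqrt(n)
    return expected_value, standard_error

print(distribution_of_sample_means(mu=80, sigma=20, n=16))  # (80, 5.0)
```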
5) What is the difference between a Type I and a Type II error? What influences the risk of a Type I error?
A Type I error occurs when a researcher rejects a null hypothesis that is actually true. In a typical research situation, a Type I error means the researcher concludes that a treatment does have an effect when in fact it has no effect. A Type I error occurs when a researcher unknowingly obtains an extreme, nonrepresentative sample. The alpha level for a hypothesis test is the probability that the test will lead to a Type I error. That is, the alpha level determines the probability of obtaining sample data in the critical region even though the null hypothesis is true. Whenever a researcher rejects the null hypothesis, there is a risk of a Type I error. Similarly, whenever a researcher fails to reject the null hypothesis, there is a risk of a Type II error. A Type II error occurs when a researcher fails to reject a null hypothesis that is in fact false. In a typical research situation, a Type II error means that the hypothesis test has failed to detect a real treatment effect. Unlike a Type I error, it is impossible to determine a single, exact probability for a Type II error. Instead, the probability of a Type II error depends on a variety of factors and therefore is a function, rather than a specific number. Nonetheless, the probability of a Type II error is represented by the symbol β, the Greek letter beta.
2) Define and explain the following terms: parameter
A parameter is a value—usually a numerical value—that describes a population. A parameter is usually derived from measurements of the individuals in the population.
What is the relationship between a population and a sample with regard to research?
A population is the set of all the individuals of interest in a particular study. A sample is a set of individuals selected from a population, usually intended to represent the population in a research study. Because populations tend to be very large, it usually is impossible for a researcher to examine every individual in the population of interest.
2) Define and explain the following terms: statistic
A statistic is a value—usually a numerical value—that describes a sample. A statistic is usually derived from measurements of the individuals in the sample. Typically, the research process begins with a question about a population parameter. However, the actual data come from a sample and are used to compute sample statistics.
Grading variable: what scale of measurement is each? A-F; 0-100; Pass / Fail
A-F: ordinal; 0-100: ratio; Pass / Fail: nominal
2) How do we write alternative hypothesis?
Alternative hypothesis: there is a change, difference or relationship H1: μ ≠ 14
What position in the distribution corresponds to a z-score of z = -1.00?
Below the mean by a distance equal to 1 standard deviation
What is the purpose for obtaining a measure of central tendency?
Central tendency is a statistical measure to determine a single score that defines the center of a distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group.
Central Tendency Median
Data point that divides the distribution of points in half. Works for ordinal data; resistant to outliers. Notation: Mdn. Use with: skewed data or outliers; ordinal scales; undetermined data (e.g., did not complete task); open-ended data (e.g., 5 or more).
3) What is the difference between descriptive and inferential statistics?
Descriptive statistics are statistical procedures used to summarize, organize, and simplify data. Often the scores are organized in a table or a graph so that it is possible to see the entire set of scores. Another common technique is to summarize a set of scores by computing an average. Inferential statistics consist of techniques that allow us to study samples and then make generalizations about the populations from which they were selected.
4) What is the difference between a discrete and a continuous variable? What are examples of each?
Discrete data: clear spaces between values; usually units counted in whole numbers; finite number of possible values; cannot be meaningfully divided into smaller units. Examples: number of students in class; shoe size; number of home runs; number of questions you answered correctly. Continuous data: falls on a continuous sequence; infinite number of values within any interval; values can be divided into smaller and smaller pieces. Examples: amount of time required to finish a project; height of children; amount of rain in inches from a storm; weight of a truck; speed of cars.
6) Why do we need to calculate effect size in addition to our z-test? How do we calculate Cohen's d (know how to use the formula!)? How do we determine magnitude?
Effect size: a measure of how many standard deviations the sample mean is from the null hypothesis mean. Cohen's d = (M − μ) / σ. It measures the distance between two means and is typically reported as a positive number even when the formula produces a negative value. Notice that Cohen's d simply describes the size of the treatment effect and is not influenced by the number of scores in the sample. Magnitude: d = 0.2 is a small effect, d = 0.5 is a medium effect, d = 0.8 is a large effect.
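A minimal sketch of Cohen's d for a z-test, d = (M − μ) / σ, reported as a positive number. The sample mean (16) and σ (4) below are hypothetical values, and treating 0.2, 0.5, and 0.8 as lower cutoffs for small, medium, and large is an assumption made for illustration.

```python
def cohens_d(sample_mean, null_mean, sigma):
    """Mean difference expressed in standard deviation units (always positive)."""
    return abs(sample_mean - null_mean) / sigma

def magnitude(d):
    """Label d against the 0.2 / 0.5 / 0.8 benchmarks (cutoff handling is an assumption)."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "below the small benchmark"

d = cohens_d(sample_mean=16, null_mean=14, sigma=4)  # hypothetical M and sigma
print(d, magnitude(d))  # 0.5 medium
```

Note that n never appears in the calculation, which is why sample size does not influence d.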
1) __________ Using the average score to describe a sample is an example of inferential statistics.
False
3) What are degrees of freedom and why do we need this value for a t-test?
Degrees of freedom describe the number of scores in a sample that are independent and free to vary. Because the sample mean places a restriction on the value of one score in the sample, there are n − 1 degrees of freedom for a sample with n scores. For t statistics, how well the sample variance represents the population variance is typically expressed in terms of the degrees of freedom, or the df value (n − 1) for the sample variance, instead of sample size (n): as the df value increases, the better a t statistic approximates a z-score. Thus, the degrees of freedom associated with the sample variance also describe how well t represents z.
1) What is a frequency distribution table? What do the two columns show (don't worry about relative frequency)?
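Frequency distributions take a disorganized set of scores and place them in order from highest to lowest, grouping together individuals who all have the same score. The two columns are X (the score values) and f (the frequency, or number of individuals with each score). The highest X value should be at the top of the table.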
3) How do changes to scores in a distribution affect the overall mean (we discussed four possibilities)?
If a constant value is added to every score in a distribution, the same constant will be added to the mean. Similarly, if you subtract a constant from every score, the same constant will be subtracted from the mean. If every score in a distribution is multiplied by (or divided by) a constant value, the mean will change in the same way.
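A quick check of these rules, using the quiz scores 2, 6, and 10 that appear elsewhere in this set; the constants 3 and 2 are arbitrary choices for illustration.

```python
import statistics

scores = [2, 6, 10]
original_mean = statistics.mean(scores)                       # 6

added_mean = statistics.mean([x + 3 for x in scores])         # add 3 to every score
subtracted_mean = statistics.mean([x - 3 for x in scores])    # subtract 3 from every score
multiplied_mean = statistics.mean([x * 2 for x in scores])    # multiply every score by 2
divided_mean = statistics.mean([x / 2 for x in scores])       # divide every score by 2

print(original_mean, added_mean, subtracted_mean, multiplied_mean, divided_mean)
```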
17) What happens to the standard error of M as sample size increases? a. It increases. b. It decreases. c. It stays constant. d. The standard error does not change in a predictable manner when sample size increases.
It decreases
factors that influence t
Larger s² means a larger estimated standard error (sM), a smaller t value, and a lower likelihood of finding a significant difference. Larger n means a smaller sM, a bigger t value, and a higher likelihood of finding a significant difference.
Central Tendency: Mean
Mean: sum of all data points divided by the count of data points. Cannot be used on nominal or ordinal data. Most useful; used in inferential statistics. Notation: population mean = μ; sample mean = M or X̅. Whenever the scores are numerical values (interval or ratio scale) the mean is usually the preferred measure of central tendency. Because the mean uses every score in the distribution, it typically produces a good representative value. Remember that the goal of central tendency is to find the single value that best represents the entire distribution. The mean has the added advantage of being closely related to variance and standard deviation, the most common measures of variability.
5) What are the four types of measurement scales (i.e., variables)? What are examples of each?
Nominal: unordered named categories (e.g., parties: R-D-I); type of donuts. Ordinal: ordered categories, unequal intervals (e.g., contest); ranking of favorite donuts. Interval: ordered equal-sized intervals (e.g., depression scores); temperature of donuts. Ratio: interval plus absolute 0 (e.g., weight in lb); how many donuts are left after bringing them to class.
2) How do we write a null hypothesis?
Null hypothesis: there is no change, no difference, no relationship in population H0: μ = 14
Political orientation democrat, republican left, center, right -10....0....10
Political orientation Nominal: democrat, republican Ordinal: left, center, right Interval: -10....0....10
1) What is the goal of measuring variability?
Quantitative measure of the differences between scores (degree of spread or clustering) Variability provides a quantitative measure of the differences between scores in a distribution and describes the degree to which the scores are spread out or clustered together. If the scores in a distribution are all the same, then there is no variability. If there are small differences between scores, then the variability is small, and if there are large differences between scores, then the variability is large. Variability can also be viewed as measuring predictability, consistency, or even diversity. Variability describes the distribution of scores. Specifically, it tells whether the scores are clustered close together or are spread out over a large distance. Usually, variability is defined in terms of distance. It tells how much distance to expect between one score and another, or how much distance to expect between an individual score and the mean. For example, we know that the heights for most adult males are clustered close together, within 5 or 6 inches of the average. Variability measures how well an individual score (or group of scores) represents the entire distribution.
2) What are the two requirements for a random sample?
Random sampling requires that each individual in the population has an equal chance of being selected. A sample obtained by this process is called a simple random sample. A second requirement, necessary for many statistical formulas, states that if more than one individual is being selected, the probabilities must stay constant from one selection to the next. Adding this second requirement produces what is called independent random sampling. The term independent refers to the fact that the probability of selecting any particular individual is independent of the individuals already selected for the sample. Samples that are obtained using this technique are called independent random samples or random samples. Each of the two requirements for random sampling has some interesting consequences. The first assures that there is no bias in the selection process. For a population with N individuals, each individual must have the same probability, p = 1/N, of being selected. You also should note that the first requirement of random sampling prohibits you from applying the definition of probability to situations in which the possible outcomes are not equally likely.
Characteristics of the sampling distribution
Sample means should pile up around population mean Should form a normal distribution Larger sample size, closer sample means are to population mean
What problems arise when we try to make inferences about a population from sample?
Sampling error (i.e., natural discrepancy between sample statistic and corresponding parameter) Samples are variable (i.e., different samples from the same population have different scores, means, etc.)
2) Define and explain the following terms: sampling error
Sampling error is the difference between a sample statistic and the corresponding population parameter. When descriptive statistics show a difference in means between groups but inferential statistics do not indicate that the difference is statistically significant, the difference could be due to sampling error.
Define the concept of "sampling error." Note: your definition should include the concepts of sample, population, statistic, and parameter.
Sampling error is the naturally occurring discrepancy, or error, that exists between a sample statistic and the corresponding population parameter.
1) What is a z-score? What does the numerical z-value mean? What does the sign of a z-score indicate?
A score that has been recoded into how many standard deviations it is away from the mean. Equation: z = (X − μ) / σ, that is, (score − mean) / standard deviation. Statisticians often identify sections of a normal distribution by using z-scores. Z-scores measure positions in a distribution in terms of standard deviations from the mean. A z-score requires that we know the value of the population standard deviation (or variance), which is needed to compute the standard error. The z-score transforms each X value into a signed number (+ or −) so that the sign tells whether the score is located above (+) or below (−) the mean, and the number tells the distance between the score and the mean in terms of the number of standard deviations.
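A small sketch of the z-score formula, z = (X − μ) / σ, using the values from the true/false item in this set (σ = 4, X = 48, μ = 42). The second call uses a hypothetical score of 38 to show a negative sign.

```python
def z_score(x, mu, sigma):
    """Signed distance of x from the mean, in standard deviation units."""
    return (x - mu) / sigma

print(z_score(48, mu=42, sigma=4))  # 1.5, above the mean
print(z_score(38, mu=42, sigma=4))  # -1.0, below the mean
```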
1) What is the goal of central tendency?
Single value in the center of a distribution that is most typical or most representative; measures where the center of the distribution is located mean; median; mode
2) What is the difference between the three measures of variability?
Standard deviation and variance are the most commonly used measures of variability. Both of these measures are based on the idea that each score can be described in terms of its deviation or distance from the mean. Range: distance covered by the scores in a distribution, from smallest to largest. Equation: range = Xmax − Xmin. Variance: mean squared deviation (average squared distance from the mean). Notation: population = σ²; sample = s². Equations: population variance = SS / N; sample variance = SS / (n − 1). Standard deviation: measure of the standard, or average, distance from the mean. Notation: population = σ; sample = s. Equation: standard deviation = square root of the variance.
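The three measures can be compared side by side with Python's standard `statistics` module. This sketch reuses the scores 5, 2, 5, 4 from the variance question and treats them first as a population (divide SS by N) and then as a sample (divide SS by n − 1).

```python
import statistics

scores = [5, 2, 5, 4]

score_range = max(scores) - min(scores)         # Xmax - Xmin = 5 - 2

population_var = statistics.pvariance(scores)   # SS / N = 6 / 4
sample_var = statistics.variance(scores)        # SS / (n - 1) = 6 / 3

population_sd = statistics.pstdev(scores)       # square root of population variance
sample_sd = statistics.stdev(scores)            # square root of sample variance

print(score_range, population_var, sample_var)
print(round(population_sd, 3), round(sample_sd, 3))
```

Dividing by n − 1 instead of N makes the sample variance slightly larger than the population formula would give for the same scores.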
1) What is the goal of hypothesis testing?
Statistical method that uses sample data to evaluate a hypothesis about a population. Steps: 1) State a hypothesis about a population parameter (e.g., μ = 14 hours per week of TV watching). 2) Predict the sample characteristics (M ≈ 14). 3) Obtain a random sample (n = 36 college students across the US). 4) Compare the sample data with the prediction and draw a conclusion about the hypothesis.
4) What is the relationship between the three measures of central tendency for normally distributed data? Positively skewed? Negatively skewed?
Symmetrical (normal): mean, median, and mode are all the same. Positively skewed, from smallest to largest: mode, median, mean. Negatively skewed, from smallest to largest: mean, median, mode.
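The ordering for a positively skewed distribution can be seen with a small made-up data set (the scores below are hypothetical, chosen so that one high score pulls the mean to the right).

```python
import statistics

scores = [1, 2, 2, 2, 3, 3, 4, 10]  # hypothetical, positively skewed

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # middle of the sorted scores
mean = statistics.mean(scores)      # pulled toward the extreme score of 10

print(mode, median, mean)  # 2 2.5 3.375
```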
3) Which of the following is an example of a continuous variable? a. The gender of each student in a psychology class b. The number of males in each class offered by the college c. The amount of time to solve a problem d. The number of children in a family
The amount of time to solve a problem
Explain what is measured by the sign of a z-score and what is measured by its numerical value.
The sign of the z-score indicates whether it is above or below the mean. The numerical value indicates how many standard deviations a score is away from the mean.
3) How do we calculate the test statistic for a z-test?
The z-score statistic that is used in the hypothesis test is the first specific example of what is called a test statistic. The term test statistic simply indicates that the sample data are converted into a single, specific statistic that is used to test the hypotheses. The t test does not require any prior knowledge about the population mean or the population variance. All you need to compute a t statistic is a null hypothesis and a sample from the unknown population. Thus, a t test can be used in situations for which the null hypothesis is obtained from a theory, a logical prediction, or just wishful thinking.
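For the z-test itself, the test statistic is z = (M − μ) / σM, where σM = σ / √n is the standard error. The sketch below uses μ = 14 and n = 36 from the TV-watching example in this set; the sample mean (16) and σ (6) are hypothetical values chosen for illustration.

```python
import math

def z_test_statistic(sample_mean, null_mean, sigma, n):
    """z = (M - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error

# mu = 14 and n = 36 come from the example; M = 16 and sigma = 6 are assumed.
z = z_test_statistic(sample_mean=16, null_mean=14, sigma=6, n=36)
print(z)  # 2.0
```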
2) __________ A set of scores ranging from a high of 47 to a low of 6 is organized in a frequency distribution table. Another set of scores ranging from 40 to a low of 4 is organized in another table. If the distributions are shown in a graph, then a polygon should be used.
True
Central Tendency: Mode
Value(s) with the highest frequency; there can be more than one. Notation: none (reported in narrative text). Use with: nominal scales; discrete variables.
2) How do we calculate a t statistic? Where do we get the variability and standard error?
We use a corresponding value in place of population standard deviation/variance Estimate standard error using sample standard deviation/variance
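A minimal sketch of the t statistic, t = (M − μ) / sM, where the estimated standard error is sM = √(s² / n). It uses n = 25, M = 20, and s² = 9 from the sample described earlier in this set; the null hypothesis value μ = 18 is a hypothetical choice.

```python
import math

def t_statistic(sample_mean, null_mean, sample_variance, n):
    """Return (t, df) using the estimated standard error sqrt(s^2 / n)."""
    estimated_se = math.sqrt(sample_variance / n)
    t = (sample_mean - null_mean) / estimated_se
    df = n - 1
    return t, df

# n, M, and s^2 come from the earlier card; mu = 18 is assumed.
t, df = t_statistic(sample_mean=20, null_mean=18, sample_variance=9, n=25)
print(round(t, 2), df)  # 3.33 24
```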
18) You complete a hypothesis test using α = .05, and based on the evidence from the sample, your decision is to reject the null hypothesis. If the treatment actually has no effect, which of the following is true? a. You have made a Type I error. b. You have made a Type II error. c. You might have made a Type I error, but the probability is only 5% at most. d. You have made the correct decision.
You have made a Type I error.
1) When do we use a t test instead of a z-test?
The z-test requires more information than is usually available. The t statistic is used to test hypotheses about a population mean (μ) when the value of the population standard deviation (σ) is unknown. It takes into account the variance of the sample (s²) and the sample size (n).
Grouped Frequency Distribution Guidelines
About 10 class intervals. Width of each interval should be a simple number (e.g., 5, 10, etc.). Bottom score in each class interval should be a multiple of the width. All intervals should be the same width.
4) How do changes to variability and sample size affect t-tests?
As the df for a sample increases, the better the sample variance represents the population variance.
11) __________ If a hypothesis test leads to rejecting the null hypothesis, it means that the sample data failed to provide sufficient evidence to conclude that the treatment has an effect.
false
4) __________ A student takes a 10-point quiz each week in statistics class. If the student's quiz scores for the first three weeks are 2, 6, and 10, then the mean score is M = 9.
false
5) __________ A sample has a mean of M = 40. If a new score of X = 35 is added to the sample, then the sample mean would increase.
false
8) __________ A jar contains 10 red marbles and 20 blue marbles. If you take a random sample of two marbles from this jar, and the first marble is blue, then the probability that the second marble is blue is p = 19/29.
false
9) A researcher measures eye color for a sample of n = 50 people. Which measure of central tendency would be appropriate to summarize the measurements? a. Mean b. Mode c. Median d. Any of the three measures could be used
mode
8) The students in a psychology class seemed to think that the midterm exam was very easy. If they are correct, what is the most likely shape for the distribution of exam scores? a. Symmetrical b. Negatively skewed c. Positively skewed d. Normal
negatively skewed
1) How do we calculate the probability of an event?
number of outcomes classified as A / total number of possible outcomes
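A quick sketch of this definition, applied to the class of 10 males and 30 females from the multiple-choice item in this set. The function name is made up for illustration.

```python
from fractions import Fraction

def probability(outcomes_in_a, total_outcomes):
    """p(A) = outcomes classified as A / total number of possible outcomes."""
    return Fraction(outcomes_in_a, total_outcomes)

p_male = probability(10, 10 + 30)
print(p_male, float(p_male))  # 1/4 0.25
```

`Fraction` reduces 10/40 to 1/4; the two are the same probability.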
4) Using letter grades (A, B, C, D, and E) to classify student performance on an exam is an example of measurement on a(n) _______ scale of measurement. a. Nominal b. Interval c. Ordinal d. Ratio
ordinal
2) A researcher is curious about the average monthly cell phone bill for high school students in the state of Florida. If this average could be obtained, it would be an example of a ____. a. sample b. statistic c. population d. parameter
parameter
4) What does a positively skewed distribution look like? Negatively skewed?
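Positively skewed: the scores pile up on the left and the tail extends to the right. Negatively skewed: the scores pile up on the right and the tail extends to the left.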
10) A population of scores has a mean of μ = 26, a median of 23, and a mode of 22. What is the most likely shape for the population distribution? a. Symmetrical b. Positively skewed c. Negatively skewed d. Cannot be determined from the information given
positively skewed
5) What scale of measurement is being used when a teacher measures the number of correct answers on a quiz for each student? a. Nominal b. Interval c. Ordinal d. Ratio
ratio
4) How do we make a decision regarding whether to reject or fail to reject the null hypothesis?
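Reject the null hypothesis if the sample data fall in the critical region. For a two-tailed z-test with α = .05, that means the proportion in the tail beyond the obtained z-score is less than .025.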
1) A researcher uses an anonymous survey to investigate the television-viewing habits of American adolescents. Based on the set of 356 surveys that were completed and returned, the researcher finds that these students spend an average of 3.1 hours each day watching television. For this study, the set of 356 students who returned surveys is an example of a _______. a. parameter b. statistic c. population d. sample
sample
Standard Error
Standard deviation / square root of n (σM = σ / √n)
SS
sum of squares, is the sum of the squared deviation scores = numerator of variance fraction
10) __________ In general, the null hypothesis states that the treatment has no effect on the population mean.
true
12) __________ Although the size of the sample can influence the outcome of a hypothesis test, it has little or no influence on measures of effect size.
true
13) __________ If all other factors are held constant, increasing the sample size from n = 25 to n = 100 will increase the z-score.
true
14) __________ Compared to a z-score, a hypothesis test with a t statistic requires less information from the population.
true
3) __________ For a set of scores measured on an ordinal scale, the median is preferred to the mean as a measure of central tendency.
true
6) __________ In a population with σ = 4, a score of X = 48 corresponds to z = 1.50. The mean for this population is μ = 42.
true
7) __________ A z-score of z = +1.00 always indicates a location exactly 1 standard deviation above the mean.
true
9) __________ The mean for the distribution of sample means is always equal to the mean for the population from which the samples are obtained.
true
3) What are the properties of a z-score distribution?
When any distribution is transformed into z-scores, the mean becomes zero and the standard deviation becomes one.
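This property can be verified numerically. The sketch below transforms the population 5, 2, 5, 4 (from the variance question in this set) into z-scores and then recomputes the mean and standard deviation of the z-scores; the helper name is hypothetical.

```python
import math

def to_z_scores(scores):
    """Transform every score in a population into z = (X - mu) / sigma."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / n)
    return [(x - mu) / sigma for x in scores]

z = to_z_scores([5, 2, 5, 4])
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))

# Up to floating-point rounding, the mean is 0 and the standard deviation is 1.
print(abs(round(z_mean, 10)), round(z_sd, 10))
```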
3) When would you use a polygon rather than a histogram?
when you want to display more than one distribution
Calculating ∑X?
In a frequency distribution table, multiply each X by its frequency f, then add the products: ΣX = ΣfX
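A short sketch of computing ΣX from a frequency distribution table. The table below is hypothetical; each key is a score X and each value is its frequency f.

```python
def sum_from_frequency_table(table):
    """Return (sum of X, n) for a table mapping each score X to its frequency f."""
    sum_x = sum(x * f for x, f in table.items())  # sum of f * X
    n = sum(table.values())                       # n is the sum of the frequencies
    return sum_x, n

table = {5: 1, 4: 2, 3: 3, 2: 3, 1: 1}  # hypothetical X: f pairs
sum_x, n = sum_from_frequency_table(table)
print(sum_x, n, sum_x / n)  # 29 10 2.9
```

Simply adding the X column (5 + 4 + 3 + 2 + 1 = 15) would ignore how many individuals had each score.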
z score p value; t score p value
For a two-tailed test with α = .05: for a z-score, the proportion in the tail must be less than .025; for a t statistic, the p value must be less than .05.