AM/OM/PM Core Boards - Epidemiology and Biostatistics
For mutually exclusive events the probability of A or B happening is equal to: A) P(A) + P(B) B) P(A) + P(B) - P(AB) C) P(A) + P(B/A) D) P(A) X P(B)
The correct answer is: A The formula for mutually exclusive events which determines the probability of either one event or another happening is P(A or B) = P(A) + P(B). Mutually exclusive events cannot occur at the same time.
A Type II error (or a beta error) is: A) Acceptance of a false null hypothesis B) Acceptance of a true null hypothesis C) Rejection of a false null hypothesis D) Rejection of a true null hypothesis
The correct answer is: A A Type II error is a type of sampling error due to chance. A Type II error is the acceptance of a null hypothesis when it is in fact false. The probability of making a Type II error is beta.
One standard deviation (SD) is equivalent to what percentage? A) 34.135% B) 47.725% C) 49.865% D) 68.27%
The correct answer is: A In a normal distribution, 34.135% of the values fall between the mean and one standard deviation on either the left or the right, so about 68.27% fall within one standard deviation of the mean in both directions combined.
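The four answer choices can be reproduced as areas under the standard normal curve. The following is a minimal sketch using Python's standard-library statistics.NormalDist; the helper name area_between is an illustrative assumption and does not come from the source.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_between(lo, hi):
    """Fraction of a normal population lying between lo and hi SDs from the mean."""
    return std_normal.cdf(hi) - std_normal.cdf(lo)

one_side_1sd = area_between(0, 1)     # mean to +1 SD (choice A)
one_side_2sd = area_between(0, 2)     # mean to +2 SD (choice B)
one_side_3sd = area_between(0, 3)     # mean to +3 SD (choice C)
both_sides_1sd = area_between(-1, 1)  # -1 SD to +1 SD (choice D)

for label, value in [
    ("mean to +1 SD", one_side_1sd),
    ("mean to +2 SD", one_side_2sd),
    ("mean to +3 SD", one_side_3sd),
    ("-1 SD to +1 SD", both_sides_1sd),
]:
    print(f"{label}: {value:.3%}")
```

The printed values agree with the answer choices to within table rounding.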
For a positively skewed distribution the: A) Median is always to the left of the mean B) Mean and median are equal and to the right of the mode C) Mode is always equal to the median and always to the right of the mean D) Most of the values are towards the upper end of the scale
The correct answer is: A A positively skewed distribution is also said to be skewed to the right. Most of the values are on the lower end of the scale or to the left of the mean. Thus, the middle value or median is to the left of the mean.
A random variable: A) May be either discrete or continuous B) Is called "random" because it depends on the normal distribution C) Is called "variable" because it refers to the variance D) Is called "variable" because the laws of chance have no effect on its value
The correct answer is: A A random variable is a variable that can take on different values as a result of a random experiment. It is a discrete random variable if the values it can take on are limited, such as integers only. It is a continuous random variable, however, if it can take on any value in a given range.
Between column variance, within column variance, F Ratio and grand mean are all terms associated with: A) ANOVA B) Regression analysis C) Chi-square test D) Non-parametric statistics
The correct answer is: A ANOVA, which stands for analysis of variance, is associated with all of the terms mentioned in the question. An analysis of variance procedure involves many steps and is quite time consuming to do by hand. Among the calculated values are the between column variance, the within column variance, the F Ratio, and the grand mean. See "Statistics for Management" by Richard I. Levin and David S. Rubin for a good discussion on the ANOVA procedure.
If, in a t-test, alpha is 0.02: A) 2 percent of the time we will say that there is a real difference, when there really is not a difference B) 2 percent of the time we will make a correct inference C) 98 percent of the time we will make an incorrect inference D) 2 percent of the time we will say that there is no real difference, when in reality there is a difference
The correct answer is: A Alpha is the significance level of the test and is also the probability of a Type I error. A Type I error occurs when we reject the null hypothesis when it is true, or in this case, say that there is a real difference when there really is not a difference.
Which one of the following is a proper null hypothesis? A) H[O]:µ equals 100 B) H[O]:µ is less than 100 C) H[O]:µ is greater than 100 D) H[O]:µ is not equal to 100
The correct answer is: A Answers B, C, and D are all examples of an alternative hypothesis. Answers B and C are used for a one tailed test and D is used as an alternative hypothesis for a two tailed test.
Which of the following is an example of a dichotomous scale? A) Gender B) Race C) Blood pressure D) Severity of symptoms (mild, moderate, severe)
The correct answer is: A Dichotomous data are discrete, not ranked or ordered, and have only two possible categories. Other examples include alive/dead, smoker/non-smoker, and HIV negative/HIV positive.
You measure the IQ's of first-year medical students. In looking at the distribution curve, you note that this distribution is severely positively skewed. The value associated with the measure of central tendency of a distribution that is found the farthest to the right on this distribution is: A) Mean B) Mode C) µ D) Median
The correct answer is: A If a distribution is positively skewed, the skewed tail extends to the right. In a skewed distribution the median is between the mean and the mode because it is less affected by the outlying values than the mean and less affected by the frequency of a given value than the mode. The mean will be drawn out into the skewed tail and from the choices given will be the measure of central tendency that will be found the farthest to the right.
The Poisson distribution can be used as an approximation of the binomial distribution when: A) n is large and p is small B) n is small and p is small C) n is large and p is large D) n is small and p is large
The correct answer is: A If n is relatively large, say 20 or more, and p is small, say 0.1 or less, then the Poisson distribution approximates the binomial distribution quite well.
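How close the approximation is can be checked numerically. The sketch below compares the binomial and Poisson probability functions for n = 100 and p = 0.02; these values and the function names are assumptions chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

n, p = 100, 0.02  # illustrative values: n large, p small
mu = n * p        # the Poisson mean is set to the binomial mean np

# Largest absolute difference between the two distributions over all x.
max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(n + 1)
)

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))
print("largest difference:", max_gap)
```

Repeating the run with a small n or a large p makes the gap between the two columns visibly wider.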
All of the following statements regarding a histogram are correct EXCEPT: A) The class intervals must all be equal B) Each data point must fit into only one class C) Classes cannot overlap D) Open classes are not allowed
The correct answer is: A It is not necessary that class intervals be equal. However, unequal intervals are not recommended because they can make the histogram misleading.
All of the following are measures of central tendency EXCEPT: A) Standard deviation B) Median C) Mode D) Mean
The correct answer is: A Measures of central tendency indicate the location about which the data points tend to congregate. The standard deviation is a measure of dispersion.
Ratios of two variances drawn from the same normal population are described by which one of the following distributions: A) F B) Student's "t" C) Chi-square D) Normal
The correct answer is: A The F distribution is used in the F hypothesis test. This is used in the ANOVA and compares two estimates of the population variance. The first estimate of the population variance is based on the variance between the means of the samples and is divided by the second estimate of the population variance which is based on the variances within the samples themselves. If the null hypothesis is true, obviously these two estimates will be approximately the same and this ratio will have a value, the F statistic, which will be close to 1. The F distribution has a pair of degrees of freedom, one for the numerator and one for the denominator.
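The ratio described above can be sketched in a few lines of Python. The data are hypothetical and the function name f_ratio is an assumption; this illustrates a one-way ANOVA F statistic only, not a full test with a critical value.

```python
from statistics import mean, variance

def f_ratio(groups):
    """Return (F, (df_numerator, df_denominator)) for a one-way layout."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-column variance: spread of the sample means around the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)

    # Within-column variance: pooled variance inside the samples themselves.
    within = sum((len(g) - 1) * variance(g) for g in groups) / (n_total - k)

    return between / within, (k - 1, n_total - k)

# Hypothetical data, one list per column (sample).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_value, degrees_of_freedom = f_ratio(groups)
print(f_value, degrees_of_freedom)
```

For these numbers the between-column variance is 3 and the within-column variance is 1, so F is 3 with 2 and 6 degrees of freedom.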
If the correlation coefficient is r = .87, the coefficient of determination is equal to: A) .7569 B) .87, since they are equal for values of r less than 1.0 C) .9327 D) 1.74
The correct answer is: A The coefficient of determination is equal to r^2. Hence, given the correlation coefficient r, one would need to simply square the correlation coefficient: (.87)^2 = .7569
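The arithmetic can be checked with a short Python sketch; the function name is an assumption made for illustration.

```python
def coefficient_of_determination(r):
    """Square a correlation coefficient r, which must lie between -1 and +1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    return r * r

r_squared = coefficient_of_determination(0.87)
print(round(r_squared, 4))

# The sign of r is lost when squaring, so r = -0.87 gives the same value.
same_for_negative = coefficient_of_determination(-0.87) == r_squared
print(same_for_negative)
```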
Which of the following statements is correct? A) The higher the correlation, the better the regression equation estimate B) The lower the correlation, the better the regression equation estimate C) Regression estimates are better made with positive than with negative correlation D) The lower the correlation, the greater is the likelihood that homoscedasticity exists with respect to the predicted variable
The correct answer is: A The regression equation is an estimate of the linear relationship of the points. The higher the correlation, the closer this line is to representing the linear nature of these points.
The hypergeometric distribution is: A) Used to describe sampling without replacement from a finite population where there are several outcomes for each trial B) A continuous distribution C) A discrete distribution with its expected value equal to its variance D) The limiting distribution of the sum of several independent discrete random variables
The correct answer is: A These are some of the conditions for the use of the hypergeometric distribution. The binomial distribution could be considered to be a special case of the hypergeometric distribution. For more information see "Elementary Business Statistics" by Freund/Williams/Perles.
The expression P(x) = [µ^x * e^(-µ)] / x! represents the: A) Poisson distribution B) Pascal distribution C) Hypergeometric distribution D) Binomial distribution
The correct answer is: A This can be compared to the Poisson formula in any statistics book. However, it is wise to examine the equations on the CQE exam closely to make sure that they have not been changed in any way to render them invalid. In quality control work the value µ in the above equation may be replaced by the term np (the mean of the binomial distribution), where n is the number of trials or sample size and p is the probability of success. The Poisson distribution can be used to approximate the binomial distribution when certain conditions are met, i.e., when p is small and n is large.
The F-statistic is associated with problems involving: A) ANOVA and inferences about standard deviations B) Chi-Square distributions C) Inferences about means D) Inferences about unequal variances
The correct answer is: A When performing an analysis of variance test (ANOVA), the F-statistic is computed. Normality and equal variances are assumed in ANOVA. Wilcoxon's signed rank test and Wilcoxon's sample ranks test are examples of tests used when normality and equal variance are not satisfied. Chi-Square tests involve the Chi-Square distribution. Inferences about means usually involve the assumption of normal distributions for large samples and the "t" distribution for small samples.
In a distribution skewed to the right: A) The mean and median are of equal value B) The mean is greater than the median C) The median is greater than the mean D) The mode is greater than the mean
The correct answer is: B A distribution skewed to the right, or positively skewed, has a longer tail among the higher values. Therefore, the mean will be larger than the median and the mode. In a symmetrical distribution, the mean, median, and mode will be the same.
Given a finite population, when computing the standard error of the mean, a finite population multiplier is necessary: A) When the sample is small relative to the population size; e.g., n is less than 5% of the population size N B) When the sample is taken without replacement and the sample size is large relative to the population size C) Whenever the population is finite and the sample small relative to the population size D) All of the above
The correct answer is: B A finite population multiplier is used when the sample is taken without replacement and the sample size is large relative to the finite population, i.e., n is more than 5% of N. It is not used for sampling with replacement even if the population is finite.
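A minimal Python sketch of the calculation follows. It assumes the usual form of the multiplier, sqrt((N - n) / (N - 1)), applied to sigma / sqrt(n); the function name and the numbers are hypothetical.

```python
from math import sqrt

def standard_error(sigma, n, population_size=None):
    """Standard error of the mean, with an optional finite population multiplier.

    Pass population_size only when sampling without replacement from a
    finite population of that size.
    """
    se = sigma / sqrt(n)
    if population_size is not None:
        se *= sqrt((population_size - n) / (population_size - 1))
    return se

# Hypothetical example: sigma = 10, n = 25 drawn from N = 100 (n is 25% of N).
se_uncorrected = standard_error(10, 25)
se_corrected = standard_error(10, 25, population_size=100)
print(se_uncorrected, se_corrected)

# With N = 10000 the same sample is only 0.25% of the population,
# and the multiplier is very close to 1.
se_large_population = standard_error(10, 25, population_size=10000)
print(se_large_population)
```

The multiplier shrinks the standard error noticeably only when n is a sizable fraction of N, which is why it is ignored for relatively small samples.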
If the sample coefficient of determination is = 1, this indicates: A) No correlation between the independent and dependent variable B) The regression line is a perfect estimator C) The correlation coefficient is = 0 D) The regression line shows an inverse relationship between the variables
The correct answer is: B A perfect correlation is indicated by r^2 = 1, therefore every data point lies exactly on the regression line and hence it is a perfect estimator. The strong correlation between the independent and dependent variables does not, in itself, establish a causal relationship.
A null hypothesis assumes that the rates of morbidity and mortality in the immediate post-op period following CABG are the same for Hospital A (a tertiary medical center) and Hospital B (a community hospital). The type II error is to conclude that: A) There is a difference in morbidity and mortality rates when there actually is not B) There is no difference in morbidity and mortality rates when there actually is C) Hospital A has higher morbidity and mortality rates when it actually does not D) Hospitals A and B have the same morbidity and mortality rates when they actually do
The correct answer is: B A type II error is accepting the null hypothesis when it is false. A type I error is rejecting the null hypothesis when it is in fact true.
If we use the sample standard deviation to estimate the population standard deviation, this is an example of a(n): A) Interval estimate B) Point estimate C) Coefficient of variation D) Use of the standard error concept
The correct answer is: B A point estimate is a single value, here the sample standard deviation, used to estimate a population parameter. An interval estimate, Answer A, would contain an interval between which the population standard deviation would fall along with an associated probability of that happening. Answers C and D have nothing to do with this question.
You measure the IQ's of all first-year medical students. In looking at the histogram, you note that this distribution is severely positively skewed. The value associated with the measure of central tendency of a distribution that is found to the left of the others is: A) Mean B) Mode C) Sigma D) Median
The correct answer is: B If a distribution is positively skewed, the skewed tail extends to the right. In a skewed distribution the median is between the mean and the mode because it is less affected by the outlying values than the mean and less affected by the frequency of a given value than the mode. Since the mode is the value that is most frequent in the distribution, it will be at the peak and the mean will be in the skewed tail (which in this case is to the right), and the median will be to the right of the peak but before the mean. Therefore the mode is to the left of the mean and the median.
If in testing an assumption about a population mean, we accept the alternative hypothesis: A) The sample mean must have fallen within the acceptance region B) The sample mean must have fallen within the rejection region C) A larger sample should be taken and the hypothesis tested again D) The level of significance was probably too large and it should be reduced to increase beta
The correct answer is: B If the sample mean falls within the rejection region, we reject the null hypothesis and accept the alternative hypothesis.
A major advantage of using non-parametric statistics is: A) Sample sizes can be reduced significantly B) No assumption needs to be made regarding normality of the distribution C) It is easier to determine the correct hypothesis D) It is not necessary to worry about the significance level
The correct answer is: B It does not matter if a distribution is normal. However, estimates using non-parametric methods may be much less precise. Also, information is lost because of ranking rather than using actual values. However, nonparametric tests are very useful when the data contain extreme outliers.
All but one of the following statements about linear regression analysis are true. Select the statement that is FALSE: A) Linear regression analysis is used to describe an association between two variables B) The linear regression equation is frequently useful for extrapolating data C) If the slope of the linear regression is negative, the relationship is inverse D) Customarily, the independent variable is plotted on the x axis and the dependent variable is plotted on the y axis
The correct answer is: B It is dangerous to extrapolate from a regression equation. One should not make predictions outside the range of the given data. However, this may be common practice due to the ease of doing so and the impression that time and money may be saved by avoiding additional testing.
Which of the following is an example of a nominal scale? A) Gender B) Race C) Blood pressure D) Severity of symptoms (mild, moderate, severe)
The correct answer is: B Nominal data are discrete, not ranked or ordered, and have more than two possible categories. Other examples include childhood immunizations, cardiac medications, and ethnicity.
When performing a hypothesis test, a Type I error is: A) Accepting a null hypothesis when it is true B) Rejecting a null hypothesis when it is true C) Accepting a null hypothesis when it is false D) Rejecting a null hypothesis when it is false
The correct answer is: B Statistically, it is possible to get a sample in which the sample statistic falls outside the acceptance region but the null hypothesis is indeed true. This is called a Type I error.
The distribution of a characteristic is negatively skewed. The sampling distribution of the mean for large samples is: A) Negatively skewed B) Approximately normal C) Positively skewed D) Bimodal
The correct answer is: B The Central Limit Theorem states that the sampling distribution of the mean for large samples will be approximately normal. This is one of the most significant theorems in statistics.
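The effect can be seen in a small simulation. The sketch below is written under stated assumptions: the population (10 minus an exponential variate), the sample size of 50, the number of repetitions, and the seed are all arbitrary choices for illustration.

```python
import random

def skewness(values):
    """Moment coefficient of skewness: third central moment / variance ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw():
    """One observation from a negatively skewed population (long left tail)."""
    return 10.0 - rng.expovariate(1.0)

def sample_mean(n):
    """Mean of n independent observations from the population."""
    return sum(draw() for _ in range(n)) / n

population_sample = [draw() for _ in range(20000)]
sample_means = [sample_mean(50) for _ in range(2000)]

population_skew = skewness(population_sample)
means_skew = skewness(sample_means)
print("skewness of individual values:", population_skew)
print("skewness of sample means:", means_skew)
```

The individual values are strongly negatively skewed, while the skewness of the sample means is much closer to zero, as expected for an approximately normal sampling distribution.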
The Chi-Square test: A) Is useful for determining if a distribution is discrete or continuous B) Can be used to determine if a particular probability distribution is an appropriate distribution for data we are considering C) Is used to test for significance of the difference between two sample means D) Is only appropriate for discrete distributions
The correct answer is: B The Chi-Square test can be used to determine the appropriateness of a discrete or continuous distribution. Other tests are used to test significance of the difference between sample means.
How many outcomes are possible when performing a single trial of a binomial experiment? A) One B) Two C) Three D) Four
The correct answer is: B The binomial distribution requires that there are only two possible outcomes from a single trial: heads or tails, immunized or not immunized, good or bad, etc.
The binomial distribution is a discrete distribution and may be used to describe: A) Sampling without replacement from a finite population B) The case of n independent trials with probabilities constant from trial to trial C) The case of n independent trials with several outcomes for each trial D) Sampling without replacement from a finite population where there are several outcomes for each trial
The correct answer is: B The binomial distribution requires that there be only two possible outcomes from each trial, positive or negative for example. In addition, the probability of an outcome must be constant from trial to trial. When flipping a coin, the probability of getting a head is always 0.5.
For two events, A and B, which one of the following is a true probability statement: A) P(A or B) = P(A) + P(B) if A and B are independent B) P(A or B) = P(A) + P(B) if A and B are mutually exclusive C) P(A and B) = P(A) x P(B) if A and B are mutually exclusive D) P(A or B) = P(A) x P(B) if A and B are independent
The correct answer is: B The equation is correct for mutually exclusive events. All of the other equations are incorrect and are not true statements. As an example: equation D would be correct if it stated that P(A and B) = P(A) x P(B) if A and B are independent.
The price in dollars of a bottle of 100 pills of Antibiotic X from a sample of 8 different pharmacies is: 12.00, 12.50, 11.80, 12.10, 11.90, 12.60, 11.80, 11.80. The median cost of the bottle of pills is: A) 11.80 B) 11.95 C) 12.00 D) 12.06
The correct answer is: B The median is the middle value of the data arranged in ascending or descending order. When there is an even number of data points, it is the average of the middle two, in this case 11.9 and 12.0.
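The calculation can be checked with Python's standard-library statistics module. The sketch also computes the mode and the mean, which correspond to two of the other answer choices.

```python
from statistics import mean, median, mode

prices = [12.00, 12.50, 11.80, 12.10, 11.90, 12.60, 11.80, 11.80]

# Sorting shows the two middle values used for the median.
ordered = sorted(prices)
print(ordered)

median_price = median(prices)  # average of the 4th and 5th ordered values
mode_price = mode(prices)      # most frequent value
mean_price = mean(prices)      # arithmetic average

print(round(median_price, 2), mode_price, round(mean_price, 2))
```

The mode (11.80) and the rounded mean (12.06) appear among the incorrect choices, so the question also tests that the three measures are not confused.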
All of the following statements regarding the normal distribution are true EXCEPT: A) The mean, mode and median are the same value B) The normal distribution is a discrete distribution C) The tails never touch zero D) The normal distribution is symmetrical
The correct answer is: B The normal distribution is a continuous distribution.
The standard deviation of a set of data: A) Is the square of the variance B) Is the square root of the variance C) Is the sum of the square roots of the deviation of each data point from the mean D) Is the product of the square roots of the deviation of each data point from the mean
The correct answer is: B The standard deviation, s, is a measure of variation. If you know the variance, s^2, the standard deviation is easily calculated by taking its square root.
Which of the following statements concerning the coefficient of simple linear correlation, r, is not true? A) r = 0.00 represents the absence of a relationship B) The relationship between the two variables must be nonlinear C) r = 0.76 has the same predictive power as r= -0.76. D) r = 1.00 represents a perfect linear relationship
The correct answer is: B There are no requirements made concerning the linear relationship between the variables. The relationship between two variables may be linear or curvilinear, or the variables may be unrelated. r = 1.00 represents a perfect positive linear relationship, and r = -1.00 represents a perfect negative linear relationship.
The beta risk is the risk of: A) Selecting the wrong hypothesis B) Accepting a hypothesis when it is false C) Accepting a hypothesis when it is true D) Rejecting a hypothesis when it is true
The correct answer is: B This is a Type II error. A Type I error is the alpha risk, or the risk of rejecting the null hypothesis when it is true. Rejecting the hypothesis when it is false is the power of the test, which equals 1 - beta.
The standard deviation as a percent of the mean is called: A) Relative precision B) Coefficient of variability C) Standard deviation of the mean D) Standard error
The correct answer is: B This is also referred to as the coefficient of variation and is equal to: (the standard deviation of the population/the mean of the population) x 100.
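As a brief illustration, the formula can be written in Python as follows. The data values and the function name are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation expressed as a percent of the mean."""
    return pstdev(values) / mean(values) * 100

# Hypothetical measurements with mean 5 and population standard deviation 2.
measurements = [2, 4, 4, 4, 5, 5, 7, 9]
cv_percent = coefficient_of_variation(measurements)
print(cv_percent)
```

Because the result is a percentage of the mean, it allows the variability of data sets measured in different units to be compared.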
The sum of the squared deviations of a group of measurements from their mean divided by the number of measurements equals: A) sigma B) sigma^2 C) Zero D) X
The correct answer is: B This is the mathematical definition of population variance written in words. The formula is: sigma^2 = [Σ(x-µ)^2] / n.
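The definition translates directly into code. The following Python sketch uses hypothetical data and checks the hand-written version against the standard library's statistics.pvariance.

```python
from math import sqrt
from statistics import pvariance

def population_variance(values):
    """Sum of squared deviations from the mean, divided by the number of values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [1, 2, 3, 4, 5]  # hypothetical measurements
sigma_squared = population_variance(data)
sigma = sqrt(sigma_squared)  # the standard deviation is the square root of the variance

print(sigma_squared, sigma)
print(pvariance(data))  # library result for comparison
```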
If a distribution is skewed to the left, the median will always be: A) Less than the mean B) Between the mean and the mode C) Greater than the mode D) Equal to the mean
The correct answer is: B When a distribution is skewed, either left or right, the median will always be between the mean and the mode. This is because the median is less affected by outlying values than the mean and less affected by the frequency of a given value than the mode.
The correlation coefficient for two variables is determined to be .98. This indicates: A) The independent variable is the direct cause of the dependent variable B) The dependent variable and the independent variable are not well related to one another C) A high degree of relationship between the two variables D) A negative linear relationship between the two variables
The correct answer is: C A high correlation coefficient does not mean one variable causes the other, only that there is a strong relationship between the two variables. In this case, there is a strong (positive) linear relationship, i.e., as the independent variable increases, the dependent variable increases.
If a Venn Diagram is used to represent non-mutually exclusive events A and B: A) Two separate diagrams must be used B) The events can be represented by non-overlapping circles inside a rectangle C) The events can be represented by circles that overlap inside the rectangle D) The events can be represented by a single rectangle without circles
The correct answer is: C A single rectangle can be used, and since the events are not mutually exclusive they must be represented by overlapping circles.
The following tests are all non parametric statistical methods EXCEPT: A) Rank Correlation B) The Mann-Whitney U Test C) Analysis of Variance D) Kruskal-Wallis Test
The correct answer is: C All of the other tests are nonparametric methods. Nonparametric methods require no knowledge of the actual population distribution.
A number derived from the population data which describes the data in some useful way is called a: A) Constant B) Statistic C) Parameter D) Critical value
The correct answer is: C Calculated values such as the mean, variance, and standard deviation which are derived from the population data are called parameters of the population.
In order to use the binomial distribution to describe a process, each of the following conditions must be met EXCEPT: A) Each trial in the experiment must have only two possible outcomes which are mutually exclusive B) The probability of the outcome of any trial must remain fixed C) The trials must be statistically dependent D) Classes of outcomes are referred to as successes or failures
The correct answer is: C In fact, the trials must be statistically independent.
Which of the following is an example of an interval scale? A) Gender B) Race C) Blood pressure D) Severity of symptoms (mild, moderate, severe)
The correct answer is: C Interval data are continuous and ordered. They are measured on a scale with constant intervals. Therefore, the distance between values has meaning, and values can be subtracted from one another. Other examples include height.
A lack of statistical significance ( p > 0.05 ): A) Proves that no real difference exists B) Indicates that differences as large as, or larger than, that observed would occur by chance alone less than 5% of the time C) Results in acceptance of the null hypothesis D) Results in rejection of the null hypothesis
The correct answer is: C Lack of statistical significance indicates that differences as large as, or larger than, that observed would occur by chance more than 5% of the time. Therefore, you would accept the null hypothesis. Answer A is wrong because statistical tests do not prove or disprove the null hypothesis.
If, in a t-test, alpha is .05: A) 5% of the time we will say that there is no real difference but in reality there is a difference B) 5% of the time we will make a correct inference C) 5% of the time we will say that there is a real difference when there really is not a difference D) 95% of the time we will make an incorrect inference
The correct answer is: C Rejecting a null hypothesis when there is not a real difference is called a Type I error and is symbolized by the Greek letter alpha. An alpha of 5% means there is a 5% probability of rejecting the null hypothesis when it is true.
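The meaning of alpha can be illustrated by simulation. The sketch below is a simplification under stated assumptions: it uses a two-tailed z-test with a known standard deviation rather than a t-test, and the null mean, standard deviation, sample size, trial count, and seed are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(alpha, n=30, trials=5000, seed=1):
    """Fraction of tests that reject the null hypothesis when it is actually true."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0  # hypothetical null mean and known SD
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Every sample is drawn with the null hypothesis true (mean = mu0).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a Type I error: rejecting a true null hypothesis
    return rejections / trials

rate = type_i_error_rate(0.05)
print(rate)
```

The observed rejection rate should be close to 0.05, since every rejection here is a Type I error by construction.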
A sample of n observations has a mean of S-bar and standard deviation s[x] > 0. If a single observation which equals the value of the sample mean S-bar is removed from the sample, which of the following is true? A) S-bar and s[x] both change B) S-bar and s[x] remain the same C) S-bar remains the same but s[x] increases D) S-bar remains the same but s[x] decreases
The correct answer is: C Removing an observation from the sample equal to the mean does not change the mean. You can easily show this by taking the average of the numbers 1, 2, 3, 4, & 5 and then repeating the average for the numbers 1, 2, 4, & 5. The mean of 3 does not change. The sample standard deviation increases and again you can show this by taking the standard deviation of the two groups of numbers above. While this is not a rigorous proof, it is an easy way to demonstrate what happens.
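The demonstration described above can be run in a few lines of Python using the standard-library statistics module.

```python
from statistics import mean, stdev

full_sample = [1, 2, 3, 4, 5]
reduced_sample = [1, 2, 4, 5]  # the observation equal to the mean (3) is removed

mean_before, mean_after = mean(full_sample), mean(reduced_sample)
sd_before, sd_after = stdev(full_sample), stdev(reduced_sample)

print(mean_before, mean_after)
print(sd_before, sd_after)
```

The sum of squared deviations stays at 10 in both cases, but the divisor n - 1 drops from 4 to 3, which is why the sample standard deviation increases.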
The coefficient of determination has an advantage over the correlation coefficient in that it is: A) More accurate in indicating the relationship of variables B) Easier to calculate and thus more usable C) A direct measure of the proportion of the variation in Y explained by the regression line D) Negative or positive
The correct answer is: C The coefficient of determination r^2 is the proportion of the variation in Y that is explained by the regression line. For example, r^2 = .78 tells us that 78% of the variation in Y is explained by the regression line. Remember that the coefficient of determination, r^2, is between 0 and 1 (inclusive), whereas the correlation coefficient, r, is between -1 and +1 (inclusive).
In a normal distribution the mean plus two standard deviations estimates the _______ percentile of the distribution. A) 5 B) 68 C) 95 D) 99
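The correct answer is: C The interval from the mean minus 1.96 standard deviations to the mean plus 1.96 standard deviations contains the central 95% of a normal population, which is the association this question relies on. Strictly, the mean plus 1.96 standard deviations is the point below which 97.5% of the population will lie, and the mean minus 1.96 standard deviations is the point below which 2.5% of the population will lie.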
Which of the following statistical measures of variability is not dependent on the exact value of every measurement? A) Interquartile range B) Variance C) Range D) Coefficient of variation
The correct answer is: C The range is the difference between the largest and smallest value in a group of data. It does not depend on the data between these values.
Which of the following measures of variability is not dependent on the exact value of every measurement? A) Mean deviation B) Variance C) Range D) Standard deviation
The correct answer is: C The range is the difference between the largest and smallest value in a group of data. It does not take into account or depend on the data between these values.
The percentage of sample means, when performing a hypothesis test, that would be outside the acceptance region: A) Is the probable value B) Is the value called (beta) C) Is the significance level D) Is the expected value
The correct answer is: C The significance level, or alpha, is the probability of a Type I error, the probability of rejecting the null hypothesis when it is true.
Mutually exclusive events are: A) Events in which the occurrence of one event has no effect on the probability of the occurrence of another event B) Events in which the occurrence of one event is necessary for the occurrence of another event C) Events that cannot happen together D) Events that may occur together or in succession
The correct answer is: C The words mutually exclusive themselves are a tip-off to this answer, i.e., they cannot happen together.
A chrome plating process is known to cause 2 types of injuries. For the sake of simplicity we will call them injury type A and injury type B. These injuries can occur either alone or together. The historical annual rate of occurrence of injury type A is .0769 and injury type B is .2500. They are known to occur together at the rate of .0192. What is the probability, when sampling a group of workers performing this process, of getting either injury A or injury B? A) .0192 B) .1731 C) .3077 D) .3269
The correct answer is: C This is an example of events that are not mutually exclusive. The formula to use for this situation is: P(A or B) = P(A) + P(B) - P(AB). Thus, P(A or B) = .0769 + .2500 - .0192 = .3077.
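The addition rule can be checked with a short Python sketch; the function name is an assumption made for illustration.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events, P(A and B) is 0 and can be left out.
    """
    return p_a + p_b - p_a_and_b

# Injury rates from the question.
p_either = prob_a_or_b(0.0769, 0.2500, 0.0192)
print(round(p_either, 4))

# Treating the events as mutually exclusive by mistake gives choice D.
p_if_exclusive = prob_a_or_b(0.0769, 0.2500)
print(round(p_if_exclusive, 4))
```

Forgetting to subtract the joint probability counts the workers with both injuries twice, which is how the incorrect value .3269 arises.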
In the regression equation y = mx + b, y increases with x in all cases: A) If b is positive B) If b is negative C) If m is positive D) If m is negative
The correct answer is: C This is basic algebra. In this equation m equals the slope of the regression line. Obviously, if m is negative, the relationship between x and y would be inverse.
Which one of the following is a true statement of probability? A) P(E and F) = P(E) + P(F) B) P(E or F) = P(E) + P(E/F) C) P(E or F) = P(E) + P(F) - P(E and F) D) P(E and F) = P(E) + P(F) - P(E and F)
The correct answer is: C This is the formula for the probability for one or another of two events happening when they are not mutually exclusive. If this were a case of mutually exclusive events, the value of P (E and F) would be zero. This latter portion of the equation is omitted by some authors when discussing mutually exclusive events.
A Type I error (or an alpha error) is: A) Acceptance of a false null hypothesis B) Acceptance of a true null hypothesis C) Rejection of a false null hypothesis D) Rejection of a true null hypothesis
The correct answer is: D A Type I error is a type of sampling error due to chance. A Type I error is rejection of the null hypothesis when it is in fact true. The probability of making a Type I error is alpha.
A data point that is extremely far from most of the data is: A) Never due to biological variability B) Automatically excluded from consideration C) An indication to reject the null hypothesis D) Said to be an outlier
The correct answer is: D A data point that is much different from the main body of data is said to be an outlier. An outlier could be caused by an extraneous event, perhaps a measurement error, or by a situation that is not yet understood. Great care should be taken before rejecting such a data point from the analysis.
A parameter is: A) A random variable B) A systematic variable C) A sample value D) A population value
The correct answer is: D A parameter is a numerical value that describes some characteristic of a population, as opposed to a statistic which is a numerical value that describes some characteristic of a sample.
A number resulting from the manipulation of some raw data according to certain specified procedures is called a: A) Sample B) Population C) Constant D) Statistic
The correct answer is: D A statistic is a computed value from a portion of a population called a sample. It is therefore the best answer.
A statistic is: A) The solution to a problem B) A population value C) A positive number between 0 and 1 inclusive D) A sample value
The correct answer is: D A statistic is a numerical value that describes some characteristic of a sample as opposed to a parameter which is a numerical value that describes some characteristic of a population.
Accepting the null hypothesis when it is false is: A) The power of the hypothesis test B) The significance level effect C) An effect of the probable value D) None of the above
The correct answer is: D Accepting the null hypothesis when it is false is called a Type II error, and the probability of such an error is symbolized by the Greek letter beta.
The binomial distribution (select the correct statement): A) Is used to describe continuous data B) Is a symmetrical distribution C) Is used when trials are not independent D) Can be approximated by the normal distribution under certain conditions
The correct answer is: D All of the other statements (A, B, C) are false. Even though the normal distribution describes continuous data, it can be used to approximate the binomial distribution if certain conditions are met.
When taking samples from a population we can make what is called a point estimate from this sample. Choose the best answer below. A point estimate: A) Is not usually equal to the population parameter B) Can sometimes be sufficient for estimating the population mean C) Is a single value that is used to represent the parameter of a population D) All of these are correct
The correct answer is: D All of these are correct. A point estimate of the mean is a single value derived from a sample and may give a reasonable estimate of the true population parameter. However, it is not likely to be exact.
Which of the following are continuous variables? A) The number of patients in a study with a positive mammogram B) The weights of infants in a newborn nursery C) The noise level in a machine shop D) B and C
The correct answer is: D Answer A is an example of a discrete variable since the results can only be a whole number. For example, a patient cannot have half a positive mammogram. However, Answers B and C can take on any value within a given range and thus are continuous.
Joint probability is the probability of: A) Event A or B occurring B) Event A or B occurring, but not both C) Event A occurring given that event B has already happened D) Events A and B occurring together or in succession
The correct answer is: D Answers A and B are incorrect because each concerns one event or the other occurring, rather than both occurring together or "jointly". Answer C relates to conditional probability.
Pap smears are used widely as a screening test for cervical cancer. If the prevalence of cervical cancer decreases in the population: A) The sensitivity of the screening test increases B) The specificity of the screening test increases C) The specificity of the screening test decreases D) The negative predictive value of the screening test increases
The correct answer is: D As prevalence decreases, more of those screened will be disease-free. It is less likely that an individual with a negative test will have the disease. Sensitivity and specificity are not affected by the prevalence.
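A minimal sketch of this relationship follows. The sensitivity (0.80) and specificity (0.95) are hypothetical values chosen only for illustration, not figures for the Pap smear; they are held fixed while prevalence falls.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV = true negatives / (true negatives + false negatives)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test characteristics, unchanged across prevalence levels.
SENSITIVITY, SPECIFICITY = 0.80, 0.95

for prevalence in (0.10, 0.01, 0.001):
    npv = negative_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.3f}  NPV={npv:.4f}")
```

Under these assumed inputs the printed NPV rises as prevalence drops, while sensitivity and specificity never change.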
The "t" distribution should be used for interval estimates when: A) The sample size is 30 or less B) The population distribution is not normal C) The population standard deviation is not known D) A and C
The correct answer is: D Both conditions, A and C, are necessary. The normal distribution is used if the population standard deviation is known.
The y-intercept value in the least squares linear regression analysis equation is also known as: A) The slope of the estimated regression line B) The least squares indicator C) The regression dependent variable D) An estimated regression coefficient
The correct answer is: D Both the y-intercept represented by the letter "a" and the slope, represented by the letter "b" in the equation Y = a + bX are known as estimated regression coefficients. Do not confuse the dependent variable Y with the y-intercept in this discussion.
You measure the IQs of all lawyers. In looking at the line graph of the distribution curve, you note that this distribution is severely negatively skewed. The value associated with the measure of central tendency of a distribution that is found to the right of the peak but less than the average value is: A) Mean B) Mode C) µ D) None of the above
The correct answer is: D If a distribution is negatively skewed, the skewed tail extends to the left. In a skewed distribution the median is between the mean and the mode because it is less affected by the outlying values than the mean and less affected by the frequency of a given value than the mode. Since this distribution is skewed to the left, there are no measures normally associated with central tendency in the right tail.
If the correlation coefficient and the coefficient of determination are equal this indicates: A) An inverse relationship between the dependent and independent variable B) A curvilinear relationship C) A causal relationship D) Either no correlation or perfect correlation
The correct answer is: D If the correlation coefficient and the coefficient of determination both = 0 there is no correlation. If they both = 1 there is perfect correlation. These are the only times they are equal.
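The reasoning can be written out as a short derivation, assuming simple linear regression, where the coefficient of determination is the square of the correlation coefficient r:

```latex
r = r^2
\iff r - r^2 = 0
\iff r\,(1 - r) = 0
\iff r = 0 \ \text{or} \ r = 1
```

Note that r = -1 does not satisfy the equation, since r^2 would then be 1 while r is -1; the "perfect correlation" case here is the positive one.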
The Poisson distribution can probably be used as a good approximation of the binomial distribution if: A) p is greater than 0.15 B) The number of trials is less than 10 C) The study is blinded D) None of the above
The correct answer is: D In general, p should be less than .1 and the number of trials 15 or more. Some statisticians may argue that p should be less than .05 with at least 20 trials. While there is no agreement on the exact p and n values, p should be "small" and n "large".
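The quality of the approximation can be inspected directly. The sketch below compares the two distributions for n = 20 and p = .05, values picked only as an illustration of "small p, large n"; the function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Illustrative values only: small p, moderately large n.
n, p = 20, 0.05
for k in range(4):
    exact = binomial_pmf(k, n, p)
    approx = poisson_pmf(k, n * p)  # the Poisson mean is set to n * p
    print(k, round(exact, 4), round(approx, 4))
```

Rerunning the loop with a larger p, such as .5, shows the two columns drifting apart.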
All of the following statements are true EXCEPT: A) In multiple regression, extrapolation beyond the region of observations can lead to erroneous predictions B) Multiple regression may be used to isolate the effect of a variable from other independent and potentially confounding variables C) At least three variables are involved in multiple regression D) Multiple regression involves one independent and two or more dependent variables
The correct answer is: D Multiple regression actually involves one dependent variable and two or more independent variables. It is represented by the equation: y = a + b1x1 + b2x2 + ... + bnxn where y is the dependent variable and x1, x2, ..., xn are the independent variables.
In hypothesis testing, beta: A) Symbolizes the probability value B) Is the probability of a Type I error C) Symbolizes the probability of rejecting a null hypothesis when it is false D) None of the above
The correct answer is: D None of these is correct. Beta symbolizes the probability of a Type II error: accepting the null hypothesis when it is false. Alpha symbolizes the probability of a Type I error: rejecting the null hypothesis when it is true.
If there is an even number of data points in a set of data: A) The median cannot be calculated B) The mode is equal to the average of the middle two values C) The mean and the median are equal D) None of the above
The correct answer is: D None of these responses is correct. In fact, the median is equal to the average of the middle two values when there is an even number of data points.
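A small illustration using Python's standard library; the four data values are made up for the example.

```python
import statistics

# Hypothetical data set with an even number of points (already sorted).
values = [3, 5, 8, 12]

# The two middle values are 5 and 8; their average is the median.
middle_two_average = (values[1] + values[2]) / 2

print(middle_two_average)          # 6.5
print(statistics.median(values))   # 6.5
```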
Which of the following is an example of an ordinal scale? A) Gender B) Race C) Blood pressure D) Severity of symptoms (mild, moderate, severe)
The correct answer is: D Ordinal data are discrete and ordered. Although categories are graded, they have no mathematical relationship to one another and the distance between variables is not meaningful. (In other words, "severe" is not twice the value of "moderate." Nor can "moderate" be subtracted from "severe").
The principal reason for random sampling is to: A) Make certain that the sample represents the population B) Guarantee that two treatment groups will be comparable C) Give investigators discretion as to whom they include in the study D) Minimize the chance of bias
The correct answer is: D Random sampling gives each member of the population an equal chance of being selected for the study, and each member of the population is as likely to be selected for one sample group as the other (e.g., drug A versus drug B; control versus treatment group). It does not guarantee that the sample represents the population or that two treatment groups will be comparable. Additional statistical methods, such as hypothesis testing, must be performed to obtain this additional information. Random sampling minimizes variables that may bias the results, e.g., selection bias.
Which of the following is an example of a ratio scale? A) Gender B) Race C) Blood pressure D) Weight
The correct answer is: D Ratio data are continuous variables with an origin at zero (indicating lack of the measured variable). Variables have a mathematical relationship in that the ratio between variables has meaning. Other examples include white blood cell count and ferritin levels.
If we drew a large number of samples from a population, we would not be surprised to discover: A) Some differences among the values of the sample means B) A distribution of sample means around some central value C) That many sample means differ from the population mean D) All of the above
The correct answer is: D The behavior of the sampling distribution of the means is described by one of the most important theorems in statistics, the Central Limit Theorem. Although there may be some differences among sample means and between sample means and the population mean, as the sample size increases, the distribution of sample means approaches the normal (gaussian) distribution.
The coefficient of variation of a population is: A) The population variance divided by the population mean B) The average squared distance between the mean and each data point in the population C) A measure of the absolute variation of the population D) 100 times the population standard deviation divided by the mean
The correct answer is: D The coefficient of variation is the standard deviation expressed as a percentage of the mean. This is a relative measure of dispersion which allows us to more readily compare two distributions.
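As a sketch, the definition in answer D translates directly into code. The function name and the data are hypothetical; statistics.pstdev is used because the question refers to a population.

```python
import statistics


def coefficient_of_variation(population):
    """100 times the population standard deviation divided by the mean."""
    mean = statistics.fmean(population)
    return 100 * statistics.pstdev(population) / mean


# Made-up population: mean 5, population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
cv = coefficient_of_variation(data)
print(cv)  # 40.0
```

Because the result is a percentage of the mean, it can be compared across data sets measured in different units.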
Suppose that you are blindfolded and five coins are placed before you, each of which is either heads or tails. The probability that you will identify all items correctly is approximately: A) 1.00 B) 0.50 C) 0.25 D) 0.03
The correct answer is: D The coins are either heads or tails. You will either be right or wrong. This is a fixed probability and the events are statistically independent of each other. With this sample size, we use the binomial distribution with a p value of .5 and an n value of 5. This yields the outcome of 0.03. Another approach to this problem is to use the theory of joint probability under statistical independence; P(AB) = P(A) x P(B). Thus P = .5 x .5 x .5 x .5 x .5 = .03125.
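Both approaches described above can be reproduced in a few lines. This is only a check of the arithmetic; the variable names are chosen for illustration.

```python
import math

n, p = 5, 0.5  # five coins, each guess correct with probability .5

# Binomial approach: probability of exactly n successes in n trials.
p_binomial = math.comb(n, n) * p**n * (1 - p) ** (n - n)

# Joint probability under independence: .5 x .5 x .5 x .5 x .5
p_joint = math.prod([p] * n)

print(p_binomial, p_joint)  # 0.03125 0.03125
```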
When finding a confidence interval for µ based on a sample size of n: A) Having to use s[x] instead of sigma decreases the interval B) The larger the interval, the better the estimate of µ C) Increasing n increases the interval D) Increasing n decreases the interval
The correct answer is: D The confidence interval can be expressed as the mean plus or minus a certain number of standard errors. The formula for the standard error places the square root of the sample size, n, in the denominator. As n increases, the value of the standard error decreases, and therefore the size of the confidence interval decreases.
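The effect of n on the interval can be sketched as follows. The standard deviation of 10 is a hypothetical value, and z = 1.96 (the usual 95 percent multiplier) is assumed purely for illustration.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Half-width of the interval: z standard errors, where SE = sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    return z * standard_error


# Hypothetical standard deviation of 10; only the sample size varies.
for n in (25, 100, 400):
    print(n, round(ci_half_width(10.0, n), 2))
```

Each fourfold increase in n halves the half-width, because n enters through its square root.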
When deciding on a confidence interval for estimating the mean, a high confidence level will: A) Reduce the confidence interval B) Require a large sample size C) Require the use of a finite population multiplier D) None of the above
The correct answer is: D The confidence interval will become larger as the confidence level is increased. The confidence level is independent of the sample size, but the interval is reduced with a relatively larger sample size. The finite population multiplier affects the interval but not the level.
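To see why a higher confidence level widens the interval, one can look at the multiplier applied to the standard error. The sketch below assumes the normal distribution applies (population standard deviation known) and uses a hypothetical helper name.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided z multiplier for a given confidence level, e.g. 0.95."""
    # Half of the excluded area sits in each tail.
    return NormalDist().inv_cdf(0.5 + level / 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

The multiplier grows with the confidence level, so for a fixed standard error the interval grows with it.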
A study of the relationship between smoking and the incidence of respiratory infections yielded a correlation coefficient of 1.08. This means that: A) The number of cigarettes smoked is a good predictor of the incidence of future respiratory infections B) The number of cigarettes smoked is a poor predictor of the incidence of future respiratory infections C) A large number of cigarettes smoked produces a protective effect on susceptibility to future respiratory infections D) You need a new statistician
The correct answer is: D The correlation coefficient is either negative or positive depending on the slope of the regression line and has a maximum absolute value of 1.00.
For a two tailed "t" distribution problem with a 90 percent confidence interval, the t value is 1.729 (degrees of freedom = 19). The number of data points in the sample under question is: A) 17 B) 18 C) 19 D) 20
The correct answer is: D The degrees of freedom for this t value is 19. Since the d.o.f. is one less than the sample size, the number of data points in the sample would be 20.
When you perform "one experiment" with "forty-nine repetitions," what are the fifty experiments called? A) Randomization B) Sequential C) Planned grouping D) Replications
The correct answer is: D The fifty experiments are called replications because they represent the repetition of an observation or experiment.
The variance calculation of the binomial distribution depends on: A) The number of trials B) The probability of success C) The probability of failure D) A, B and C
The correct answer is: D The formula for the variance of a binomial distribution is sigma^2 = npq where n is the number of trials, p is the probability of a success, and q is the probability of a failure.
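As a check on the npq formula, the variance can also be computed from first principles using the binomial probabilities. The values n = 10 and p = .3 are arbitrary, and the function names are hypothetical.

```python
import math


def binomial_variance(n, p):
    """Closed-form variance: n * p * q, with q = 1 - p."""
    q = 1 - p
    return n * p * q


def variance_from_pmf(n, p):
    """Variance computed directly from the binomial probabilities."""
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    return sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))


n, p = 10, 0.3  # arbitrary illustrative values
print(round(binomial_variance(n, p), 4), round(variance_from_pmf(n, p), 4))
```

The two printed values should agree, which shows that all three quantities in answers A, B, and C enter the calculation.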
The mode of a data set: A) Is the middle value B) Is the most often occurring value C) May not be unique; there can be more than one D) B and C
The correct answer is: D The middle value in a set of data is the median. A disadvantage of the mode is that there can be more than one--in fact several can occur--thus answers B and C are correct.
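A brief illustration with made-up data in which two values tie for most frequent; statistics.multimode (Python 3.8 and later) returns every mode rather than just one.

```python
import statistics

# Hypothetical data: 2 and 3 each occur twice.
data = [1, 2, 2, 3, 3, 4]

print(statistics.multimode(data))  # [2, 3]
print(statistics.median(data))     # 2.5
```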
A new medication is known to produce only two side effects, headache or pruritus or both. If the probability of headaches is .10 and of pruritus is .20, the probability that a patient will have no side effects is: A) .02 B) .28 C) .30 D) .72
The correct answer is: D The probability of no side effects is the same as 1 minus the probability of having a side effect. The probability of having headaches or pruritus is based on the addition rule for non-mutually exclusive events, which is: P(A or B) = P(A) + P(B) - P(AB). Assuming the two side effects occur independently, P(AB) = .10 x .20 = .02. Substituting, we obtain P(A or B) = (.10)+(.20)-(.02) = .28. Subtracting this value, (.28), from 1 we obtain .72.
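The steps of this calculation can be sketched as below. As in the explanation, the joint probability is obtained by multiplying the two rates, which assumes the side effects are independent.

```python
p_headache, p_pruritus = 0.10, 0.20

# Joint probability, assuming the two side effects are independent.
p_both = p_headache * p_pruritus

# Addition rule for events that are not mutually exclusive.
p_any = p_headache + p_pruritus - p_both

# Complement: no side effect at all.
p_none = 1 - p_any

print(round(p_any, 2), round(p_none, 2))  # 0.28 0.72
```

Under the same independence assumption, the shortcut (1 - .10) x (1 - .20) gives the same .72.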
The use of the range in describing a set of data is disadvantageous because: A) It does not take into account all values in the data set B) It is heavily influenced by extreme values C) It ignores the nature of the variation of the data points D) A, B, and C
The correct answer is: D The range is the difference between the highest and lowest values; thus, all the other data points are ignored. The range also gives no indication as to how the data points are dispersed.
When computing a descriptive number from a set of sample data, the result is called a: A) Continuous variable B) Parameter C) Discrete number D) Statistic
The correct answer is: D The results of statistical calculations from a set of sample data are called statistics. The results from a population are called parameters.
The standard score of a normal distribution is equal to: A) The mean of the distribution divided by the standard deviation B) The difference between the highest and lowest values in the data set C) The absolute deviation divided by the standard deviation D) The number of standard deviations a data point lies from the mean
The correct answer is: D The standard score is also known as the Z score or the Z value and is the number of standard deviations from the mean. For example, a standard score of 2 means the data point is 2 standard deviations in the positive direction from the mean.
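A one-function sketch of the standard score; the mean of 100 and standard deviation of 15 are hypothetical values chosen for the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Hypothetical distribution: mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
```

A negative result indicates a data point below the mean.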
Which table should be used to determine a confidence interval on the mean when sigma is not known and the sample size is 10? A) alpha B) CHI^2 C) F D) t
The correct answer is: D The t distribution is used for determining confidence intervals when the sample size is small (less than 30) and the population standard deviation is not known. Note that although the t distribution is thought of as being used for small samples, another condition must exist (i.e., the population standard deviation is not known) before its use is required.
You measure the cholesterol level of 100 patients. Assume that this measurement always follows a perfect normal distribution. You obtain the value 200 from the exact center of the distribution. It is the: A) Mean B) Mode C) Median D) All of the above
The correct answer is: D This is a normal (gaussian) distribution. In a normal distribution, the mean, median, and mode are all equal.
Judgement sampling: A) Is never a good way to take a sample B) Is a form of random sampling C) Relies on the expertise of a single person D) None of the above
The correct answer is: D This is actually a form of nonrandom sampling and there are times when it is the only economical or logical way to take a sample. It is not necessary that only a single person make a decision regarding judgement sampling as implied by Answer C; a group of experts could be consulted.