biostats midterm
symbols for (a) sample standard deviation (b) population standard deviation (c) sample variance (d) population variance
(a) s (b) σ (c) s^2 (d) σ^2
the study is an ___ since ___
-an observational study -the subjects were not given any treatment
when trying to better understand the IQ data, what is the advantage of examining the histogram instead of the frequency distribution?
-it is easier to see the distribution of the data -it is easier to see the location of the center of the data -it is easier to see the spread of the data
what is wrong with this type of sampling method?
-many people may choose not to respond to the survey -responses may not reflect the opinions of the general population
the given value is___ because the numerical measurement describes a characteristic of a ______
-parameter -population
What term is used to describe this type of survey?
-the respondents are a voluntary response sample -the respondents are a self-selected sample
in the stemplot, the data are arranged in order from __ to ___ in ___ order; thus, the first leaf in the first stem corresponds to the ___ data value __ and the last leaf in the last stem corresponds to the ___ data ___
1. lowest 2. highest 3. increasing 4.lowest 5. 38 6. highest
Let A = the event of getting at least one defective pacemaker battery when 3 batteries are randomly selected with replacement from a batch. Write a statement describing event Ā (the complement of A)
Ā is the event that no pacemaker batteries are defective when 3 batteries are randomly selected with replacement
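A short sketch of why the complement matters here: "at least one defective" and "none defective" are complementary events, so P(at least one) = 1 - P(none). The defect rate `p_defective` below is a made-up value for illustration only; the card does not give one.

```python
# Complement rule: P(at least one defective) = 1 - P(none defective).
# p_defective is a hypothetical defect rate, not a value from the card.
p_defective = 0.02
n_selected = 3  # selected with replacement, so the selections are independent

p_none_defective = (1 - p_defective) ** n_selected  # P(no battery is defective)
p_at_least_one = 1 - p_none_defective               # P(at least one is defective)

print(f"P(none defective)         = {p_none_defective:.6f}")
print(f"P(at least one defective) = {p_at_least_one:.6f}")
```

Computing the "none" case first is usually easier than adding up the cases of exactly one, two, and three defective batteries.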
What is a scatterplot and how does it help us?
A scatterplot is a graph of paired (x, y) quantitative data. It provides a visual image of the data plotted as points, which helps show any patterns in the data.
A study has shown that there is a correlation between body weight and blood pressure. Higher body weights are associated with higher blood pressure levels. Can researchers conclude that gaining weight is a cause of increased blood pressure?
No. Correlation does not imply causality.
Let event G = subject has glaucoma and let event Y = test indicates "yes," the subject has glaucoma. Translate the notation P(Y|G) into a verbal statement
P(Y|G) is the probability that a subject tests positive for glaucoma, given that the subject has glaucoma
When testing a new treatment, what is the difference between statistical significance and practical significance? Can a treatment have statistical significance, but not practical significance?
Statistical significance is achieved when the result is very unlikely to occur by chance. Practical significance is related to whether common sense suggests that the treatment makes enough of a difference to justify its use. It is possible for a treatment to have statistical significance, but not practical significance.
An article noted that chocolate is rich in flavonoids. The article reports that "regular consumption of foods rich in flavonoids may reduce the risk of coronary heart disease." The study received funding from a candy company and a chocolate manufacturers association. Identify and explain at least one source of bias in the study described. Then suggest how the bias might have been avoided.
The researchers may have been more inclined to provide favorable results because funding was provided by a party with a definite interest. The bias could have been avoided if the researchers were not paid by the candy company and the chocolate manufacturers.
A certain medical organization tends to oppose the use of meat and dairy products in our diets, and that organization has received hundreds of thousands of dollars in funding from an animal rights foundation.
There does appear to be a potential to create a bias. There is an incentive to produce results that are in line with the organization's creed and that of its funders.
Based on the scatterplot, what can be concluded about a linear correlation?
There does not appear to be a linear correlation because there is no general pattern to the data.
In what sense are the mean, median, mode, and midrange measures of "center"?
They use different approaches for providing a value (or values) of the center or middle of the sorted list of data.
the ___ event A occurring are the ratio P(A)/P(Ā), where Ā is the complement of A
actual odds in favor of
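As a quick illustration, the actual odds in favor of an event can be computed as the probability of the event divided by the probability of its complement. The function name `odds_in_favor` and the example probability 1/5 are assumptions made for this sketch.

```python
from fractions import Fraction


def odds_in_favor(p_event):
    """Return the actual odds in favor of an event: P(A) / P(not A)."""
    p_event = Fraction(p_event)
    if not 0 <= p_event < 1:
        raise ValueError("P(A) must satisfy 0 <= P(A) < 1 for the odds to be defined")
    return p_event / (1 - p_event)


# Hypothetical example: P(A) = 1/5, so P(not A) = 4/5.
odds = odds_in_favor(Fraction(1, 5))
print(f"{odds.numerator}:{odds.denominator}")  # prints 1:4
```

Exact fractions are used so the odds come out as a clean ratio such as 1:4 instead of a rounded decimal.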
when using the ___ always be careful to avoid double counting outcomes
addition rule
which of the following is not a principle of probability?
all events are equally likely in any probability procedure
which of the following would not cast doubt on the usefulness of sample data?
an effective sampling method
which word is associated with multiplication when computing probabilities?
and
since the amounts of time spent exercising ____ countable, the data are from a ____ data set
-are not -continuous
to determine customer opinion of their inflight service, continental airlines randomly selects 30 flights during a certain week and surveys all passengers on the flights. which type of sampling is used?
cluster
a ____ probability of an event is a probability obtained with knowledge that some other event has already occurred
conditional
a ___ random variable has infinitely many values associated with measurements
continuous
methods that summarize or describe characteristics of data are called ____ statistics
descriptive
a ___ random variable has either a finite or a countable number of values
discrete
events that are ___ cannot occur at the same time
disjoint
the __ of a discrete random variable represents the mean value of the outcomes
expected value
the heights of the bars of a histogram correspond to___ values
frequency
a ___ helps us understand the nature of the distribution of a data set
frequency distribution
a ___ uses line segments to connect points located directly above class midpoint values
frequency polygon
we utilize statistical ___ to look for features that reveal some useful or interesting characteristics of the data set
graphs
which of the following would be classified as categorical data?
hair color
which is always true?
in a symmetric and bell shaped distribution, the mean, median and mode are the same
two events A and B are ___ if the occurrence of one does not affect the probability of the occurrence of the other
independent
a measure of the effectiveness of an influenza vaccine is the number needed to treat, which is 39 (under certain conditions) Interpret that number. Does the result apply to every particular group of 39 subjects?
-we would need to treat 39 subjects with the influenza vaccine in order to prevent one case of influenza -no; the result approximates the number needed to treat when there is a large number of subjects, but it does not apply to every particular group of 39 subjects
the entries are white blood cell counts and heights from male subjects examined as part of a large study conducted by a health organization. The data are matched, so that the first subject has a white blood cell count of 8.7 and a height of 70.8 and so on. Given the context of the data in the table, what issue can be addressed by conducting a statistical analysis of the measurements?
is there a relationship or an association between white blood cell count and height
of the 5000 14% were returned. this response rate appears to be low
it creates a serious potential for getting a biased sample that consists of those with a special interest in the topic
In a recent year, a health survey, sponsored by a polling company, selected more than 8,000 women who were given eye tests. Subjects elected to be a part of the study by answering a mail-in questionnaire
it is flawed because it is a voluntary response sample
which of the following is NOT a value in the 5-number summary?
mean
does misconduct appear to be a major factor?
misconduct appears to be a major factor because the combined impacts of fraud, duplication and plagiarism are the reason for a significant portion of the retractions
the measure of center that is the value that occurs with the greatest frequency is the
mode
if we have a large voluntary response sample consisting of weights of subjects who chose to respond to a survey posted on the internet, can a graph help to overcome the deficiency of having a voluntary response sample?
no, a graph can't help to overcome the deficiency. if the sample is a bad sample, there are no graphs or other techniques that can be used to salvage the data
Is this a binomial distribution? A certain method of gender selection was designed to increase the likelihood that a baby will be a boy. When 346 couples used the method and gave birth to 346 babies, the hair colors of the babies were recorded
no, because some trials have outcomes classified into more than two categories
does the graph depict the data fairly?
no, because the vertical scale does not start at zero
Refer to the accompanying data table. The entries are white blood cell counts (1000 cells) and arm circumferences from male subjects examined as part of a large study conducted by a health organization. The data are matched, so that the first subject has a white blood cell count of 8.7 and an arm circumference of 31.9 and so on. Given that the data are matched and considering the units of the data, does it make sense to use the difference between each white blood cell count and the corresponding arm circumference?
no, it does not make sense to use the difference between each white blood cell count and the corresponding arm circumference, because these measure different quantities with different units
Frequency distribution of cotinine (ng/mL): 0-99: 16; 100-199: 19; 200-299: 11; 300-399: 4; 400-499: 1. is it possible to identify the exact values of all the original cotinine measurements?
no. frequency table only shows frequencies of data values in each category
a ____ distribution has a "bell" shape
normal
in the binomial probability formula, the variable x represents the
number of successes
in modified box plots, a data value is an ___ if it is above Q3+(1.5)(IQR) or below Q1- (1.5)(IQR)
outlier
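The fence rule on this card can be sketched as below. The quartiles are passed in directly because different textbooks and software compute Q1 and Q3 slightly differently; the function names and the example quartiles (10 and 20) are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """A value is an outlier if it lies below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Hypothetical quartiles: Q1 = 10 and Q3 = 20, so IQR = 10.
lower, upper = outlier_fences(10, 20)
print(lower, upper)            # prints -5.0 35.0
print(is_outlier(36, 10, 20))  # prints True
print(is_outlier(35, 10, 20))  # prints False (on the fence, not above it)
```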
____ are sample values that lie very far away from the majority of the other sample values
outliers
the ___ distribution is a discrete probability distribution that applies to the number of occurrences of some event over a specified interval
poisson
p_t is the ___ of some characteristic in the ___ and p_c is the ___ of some characteristic in the ___
-proportion -treatment group -proportion -control group
a ___ variable is a variable that has a single numerical value, determined by chance, for each outcome of a procedure
random
___ is used when subjects are assigned to different groups through a process of random selection
randomization
which measure of variation is most sensitive to extreme values?
range
a ____ histogram has the same shape and horizontal scale as a histogram but the vertical scale is marked with relative frequencies instead of actual frequencies
relative frequency
in a ____ distribution, the frequency of a class is replaced with a proportion or percent
relative frequency
the range rule of thumb roughly estimates the standard deviation of a data set as
s ≈ range/4
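To see how rough the range rule of thumb is, the sketch below applies it to the ten body temperatures listed on a later card in this set and compares the estimate with the sample standard deviation from Python's `statistics` module. The variable names are assumptions made for this example.

```python
import statistics

# Ten body temperatures taken from a later card in this set.
temps = [98.6, 98.6, 98.0, 98.0, 99.0, 98.4, 98.4, 98.4, 98.4, 98.6]

data_range = max(temps) - min(temps)  # 99.0 - 98.0
s_estimate = data_range / 4           # range rule of thumb
s_actual = statistics.stdev(temps)    # sample standard deviation (n - 1 denominator)

print(f"range rule estimate: {s_estimate:.2f}")
print(f"sample std. dev.:    {s_actual:.2f}")
```

The two values will generally differ, which is expected: the rule gives only a rough estimate.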
the __ for a procedure consists of all possible simple events or all outcomes that can not be broken down any further
sample space
a histogram aids in analyzing the _____ of the data
shape of the distribution
which of the following corresponds to the case when every sample of size n has the same chance of being chosen?
simple random sample
why is it important to learn about bad graphs?
so that we can critically analyze a graph to determine whether it is misleading
the data are at the ordinal level of measurement
such data should not be used for calculations such as an average
to estimate the percentage of defects in a recent manufacturing batch, a quality control manager at Ford selects every 14th truck that comes off the assembly line, starting with the fourth, until she obtains a sample of 80 trucks. Which type of sampling is used?
systematic
A researcher selects every 540th social security number and surveys the corresponding person. which type of sampling did the researcher use?
systematic sampling
a man is selected by a marketing company to participate in a paid focus group. the company says that the man was selected because every 20,000th person in the phone number listings was being selected. which type of sampling did the marketing company use?
systematic sampling
98.6 98.6 98.0 98.0 99.0 98.4 98.4 98.4 98.4 98.6 why is it that a graph of these data would not be very effective in helping us understand the data?
the data set is too small for a graph to reveal important characteristics of the data
in a study designed to test the effectiveness of a drug as a treatment for lower back pain, 1643 patients were randomly assigned to one of the three groups
the given description corresponds to an experiment because the researchers applied a treatment
what impression does the graph create?
the graph creates the impression that men have salaries that are more than twice the salaries of women
body temperature (in degrees celsius)
the interval level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, but there is no natural zero starting point
as a procedure is repeated again and again, the relative frequency of an event tends to approach the actual probability. This is known as
the law of large numbers
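A small simulation can make the law of large numbers concrete. This is only a sketch: it models a fair coin (probability 0.5 of heads), and the function name, the seed, and the numbers of flips are all arbitrary choices for illustration.

```python
import random


def relative_frequency_of_heads(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the relative frequency of heads."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# The relative frequency should tend toward 0.5 as the number of flips grows.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Small runs can land far from 0.5; the law only describes the tendency over many repetitions.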
assume that 1400 births are randomly selected and 37 of the births are girls. Use subjective judgment to describe the number of girls as significantly high, significantly low, or neither
the number of girls is significantly low
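The card asks only for subjective judgment, but the judgment can be backed up with the range rule of thumb (values more than two standard deviations from the mean are significant). The sketch below assumes boys and girls are equally likely (p = 0.5), which the card does not state explicitly.

```python
from math import sqrt

n = 1400             # number of births
p = 0.5              # assumed probability of a girl (not stated on the card)
observed_girls = 37

mean = n * p                 # expected number of girls
sd = sqrt(n * p * (1 - p))   # binomial standard deviation
low_cutoff = mean - 2 * sd
high_cutoff = mean + 2 * sd

if observed_girls <= low_cutoff:
    verdict = "significantly low"
elif observed_girls >= high_cutoff:
    verdict = "significantly high"
else:
    verdict = "neither"

print(f"mean = {mean:.0f}, cutoffs = ({low_cutoff:.1f}, {high_cutoff:.1f}), verdict: {verdict}")
```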
which of the following is not a requirement of the poisson distribution?
the occurrences must be dependent
what are the possible values of x?
the possible x-values are 0, 1, 2, ... with no upper limit
what does P(B|A) represent?
the probability of event B occurring after it is assumed that event A has already occurred
P(A or B) indicates
the probability that in a single trial, event A occurs, event B occurs, or they both occur
volumes (cm^3) of brains
the ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point
Is the random variable given in the accompanying table discrete or continuous? Number of girls, x: 0, 1, 2, 3, 4; P(x): 0.063, 0.250, 0.375, 0.250, 0.063
the random variable given in the accompanying table is discrete because there is a finite number of values
is x a discrete random variable or a continuous random variable?
the random variable is discrete because the collection of possible values of x is infinite but countable
What is the random variable, what are its possible values, and are its values numerical? Number of girls, x: 0, 1, 2, 3; P(x): 0.125, 0.375, 0.375, 0.125
the random variable is x, which is the number of girls in three births. the possible values of x are 0, 1, 2, and 3. the values of the random variable x are numerical
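The table on this card is small enough to check by hand, but a short sketch shows how the expected value of a discrete random variable (see the earlier "expected value" card) comes from it: multiply each value by its probability and add. The dictionary name `distribution` is an assumption for this example.

```python
# Probability distribution from the card: number of girls in three births.
distribution = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# A valid probability distribution must have probabilities that sum to 1.
total_probability = sum(distribution.values())

# Expected value (mean) of a discrete random variable: sum of x * P(x).
expected_value = sum(x * p for x, p in distribution.items())

print(total_probability)  # prints 1.0
print(expected_value)     # prints 1.5
```

An expected value of 1.5 girls is not itself a possible outcome; it is the long-run average over many sets of three births.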
a research project on the effectiveness of liver transplants begins with the number of hospitals that provide liver transplants
the ratio level of measurement is most appropriate because the data can be ordered, differences can be found and are meaningful and there is a natural zero starting point
for 100 births, P(exactly 57 girls)=0.0301 and P(57 or more girls)=0.097. Is 57 girls in 100 births a significantly high number for girls? Which probability is relevant to answering that question? Consider a number of girls to be significantly high if the appropriate probability is 0.05 or less.
the relevant probability is P(57 or more girls) so 57 girls in 100 births is not a significantly high number of girls because the relevant probability is greater than 0.05
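The two probabilities quoted on this card can be reproduced with the binomial probability formula. The sketch assumes n = 100 births with boys and girls equally likely (p = 0.5); the function name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 100, 0.5  # p = 0.5 is an assumption: girls and boys equally likely

p_exactly_57 = binomial_pmf(57, n, p)
p_57_or_more = sum(binomial_pmf(x, n, p) for x in range(57, n + 1))

# "Significantly high" uses the tail probability P(57 or more), not P(exactly 57).
significantly_high = p_57_or_more <= 0.05

print(f"P(exactly 57) = {p_exactly_57:.4f}")
print(f"P(57 or more) = {p_57_or_more:.4f}")
print("significantly high:", significantly_high)
```

The tail probability is the relevant one because any single exact count becomes unlikely as n grows, even for perfectly ordinary results.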
a study involved 22,071 male physicians. this is an experiment because the researchers apply a treatment to the individuals. what is the major problem with the study?
the results apply only to male physicians
In this section we use r to denote the value of the linear correlation coefficient. Why do we refer to this correlation coefficient as being linear?
the term linear refers to a straight line, and r measures how well a scatterplot fits a straight-line pattern
which of the following is not a requirement of the binomial probability distribution?
the trials must be dependent
is a value of x= 90.3 possible
the value is not possible because the values of x in a poisson distribution are restricted to be integers greater than or equal to zero
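The restriction to whole-number counts can be seen in the Poisson probability formula itself, P(x) = μ^x · e^(−μ) / x!, since x! is defined only for nonnegative integers. Below is a minimal sketch; the function name and the mean of 2.0 are hypothetical, since the card does not give the mean.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(exactly x occurrences) for a Poisson distribution with mean mu."""
    if x < 0 or x != int(x):
        raise ValueError("x must be an integer greater than or equal to zero")
    x = int(x)
    return mu**x * exp(-mu) / factorial(x)


mu = 2.0  # hypothetical mean number of occurrences per interval
print(poisson_pmf(0, mu))
print(poisson_pmf(3, mu))

try:
    poisson_pmf(90.3, mu)
except ValueError as error:
    print("x = 90.3 rejected:", error)
```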
which of the following is NOT true about statistical graphs?
they utilize areas or volumes for data that are one-dimensional in nature
which of the following is not one of the three methods for finding binomial probabilities that is found in the chapter on discrete probability distributions?
use a simulation
which of the following is a common distortion that occurs in graphs?
using a two-dimensional object to represent data that are one-dimensional in nature
which characteristic of data is a measure of the amount that the data values vary?
variation
which of the following is not a property of the standard deviation?
when comparing variation in samples with very different means, it is good practice to compare the two sample standard deviations
is this a binomial distribution? A certain method of gender selection was designed to increase the likelihood that a baby will be a boy. When 266 couples used the method and gave birth to 266 babies, whether or not the babies have blue eyes is recorded
yes, because the procedure satisfies all of the requirements for a binomial distribution
which of the following values cannot be probabilities? 0, 5/3, 1, 0.02, 3/5, 1.39, √2, -0.51
√2 -0.51 5/3 1.39
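The rule behind this card is that a probability must lie between 0 and 1 inclusive. A short sketch that applies that check to the eight values on the card (the dictionary name and labels are assumptions for this example):

```python
from math import sqrt

# The eight candidate values from the card, keyed by a readable label.
candidates = {
    "0": 0,
    "5/3": 5 / 3,
    "1": 1,
    "0.02": 0.02,
    "3/5": 3 / 5,
    "1.39": 1.39,
    "sqrt(2)": sqrt(2),
    "-0.51": -0.51,
}

# A probability p must satisfy 0 <= p <= 1.
not_probabilities = [label for label, value in candidates.items() if not 0 <= value <= 1]

print(not_probabilities)  # prints ['5/3', '1.39', 'sqrt(2)', '-0.51']
```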