CIM 250 Midterm Review
mathematical expression of probability
Probability is always expressed as a value between 0 and 1: closer to 0 means less probable; closer to 1 means more probable. It can therefore always be written as a fraction, decimal, or percent.
Emma's probability of winning a school contest is 0.20. What is her probability of not winning the contest?
0.80
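The answer above follows from the complement rule, P(not A) = 1 - P(A). A minimal sketch of that arithmetic; the function name `complement` is only an illustrative choice:

```python
def complement(p: float) -> float:
    """Return P(not A) given P(A), using the complement rule."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p


p_win = 0.20  # Emma's probability of winning, from the card above
p_not_win = complement(p_win)
print(round(p_not_win, 2))  # 0.8
```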
Steps in Hypothesis Testing
1. Formulate the research question and state the null (H0) and alternative (H1) hypotheses. 2. State a significance (alpha) level. 3. Conduct the relevant statistical test and obtain the test statistic, p-value, and/or confidence interval. 4. Use these results to decide whether to reject or fail to reject the null hypothesis.
3 important reasons for random sampling
1. It avoids known and unknown biases on average. 2. It helps convince others that the trial was conducted properly. 3. It is the basis for the statistical theory that underlies hypothesis tests and confidence intervals.
Concern was expressed by the health educators on a particular college campus that students with serum cholesterol levels above the mean level of 195 would be at increased risk of heart disease. If the mean cholesterol level of students is 195, what percentage of students are at risk?
50%
In a normal distribution, about what percent of values lie within one standard deviation of the mean?
68%
Empirical Rule
About 68% of all data is within one standard deviation of the mean; about 95% is within two standard deviations of the mean; about 99.7% is within three standard deviations of the mean.
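These percentages can be checked against the standard normal curve. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is an illustrative choice, and the last line assumes the cholesterol values in the earlier card are normally distributed:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 68.3%, 95.4%, and 99.7%
    print(f"within {k} SD: {share_within(k):.1%}")

# Under a normal distribution, half of all values lie above the mean.
share_above_mean = 1.0 - standard_normal.cdf(0.0)
print(share_above_mean)  # 0.5
```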
Which of the following would indicate a test with the greatest sensitivity?
99.8% [highest number]
Which of the following would indicate a test with the greatest specificity?
99.8% [highest number]
All of the following are examples of inferential studies except:
A calculation of colon cancer incidence rates in the California population.
normal distribution
A function that represents the distribution of variables as a symmetrical bell-shaped graph.
Histogram
A graph of vertical bars representing the frequency distribution of a set of data.
probability
A number that describes how likely it is that an event will occur
cluster sampling
A probability sampling technique in which clusters of participants within the population of interest are selected at random, followed by data collection from all individuals in each cluster. In other words, select a simple random sample of groups.
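A minimal sketch of cluster sampling as described above: pick whole clusters at random, then keep every member of the chosen clusters. The school names, member labels, and the function name `cluster_sample` are hypothetical and used only for illustration:

```python
import random


def cluster_sample(clusters: dict[str, list[str]], n_clusters: int, seed: int = 0) -> list[str]:
    """Randomly choose n_clusters clusters and return all of their members."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical population grouped into clusters (for example, schools)
schools = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2"],
}
sample = cluster_sample(schools, n_clusters=2)
print(sample)
```

Note that the randomness applies to the choice of clusters, not to individuals inside a cluster.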
Bias occurs when:
A study sample is not representative of the underlying population
stratified sampling
A type of probability sampling in which the population is divided into groups with a common attribute and a random sample is chosen within each group. Used when we wish the sample to represent various subgroups of the population proportionally or to increase the precision of the estimate.
Which of the following are properties of the normal distribution?
ALL OF THE ABOVE: It has the appearance of a symmetrical, bell-shaped curve. It is defined by two parameters: the mean and the standard deviation. The area under the probability curve is always equal to 1 (100%).
Probability Rule 5
Addition Rule: used to determine the probability that one or another event (but not necessarily both) will occur. The probability that event A or event B (or both) will occur equals the sum of the probabilities of each individual event minus the probability of both: P(A or B) = P(A) + P(B) - P(A and B). P(A and B) is subtracted because, when the events are not mutually exclusive, that portion would otherwise be counted twice.
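A small worked example of the addition rule, using one roll of a fair six-sided die. The die and the two events are illustrative assumptions, not part of the course material:

```python
from fractions import Fraction

outcomes = set(range(1, 7))                    # one roll of a fair six-sided die
event_a = {n for n in outcomes if n % 2 == 0}  # A: the roll is even -> {2, 4, 6}
event_b = {n for n in outcomes if n > 3}       # B: the roll is above 3 -> {4, 5, 6}


def prob(event: set[int]) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


# P(A or B) = P(A) + P(B) - P(A and B) = 1/2 + 1/2 - 1/3
p_a_or_b = prob(event_a) + prob(event_b) - prob(event_a & event_b)
print(p_a_or_b)  # 2/3

# Counting the union directly gives the same answer.
assert p_a_or_b == prob(event_a | event_b)
```

The outcomes 4 and 6 belong to both events, which is why P(A and B) has to be subtracted once.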
In which of the following scenarios would the mode be the most appropriate measure to calculate?
An inventory manager at a store wants to determine the most frequently purchased television model
In interpreting the results of epidemiologic studies, the size of a sample is more important than the way in which it was selected.
FALSE
Measures of central tendency are useful when we are trying to draw conclusions from a larger population and apply them to a sample.
FALSE
Suppose we conduct a clinical trial to test the efficacy of a particular treatment. We find that the mean difference between two groups is 40%, with a 95% confidence interval of (35.0, 42.0). Which of the following interpretations of the confidence interval is true?
If we were to repeat this clinical trial 100 times and compute a 95% confidence interval each time, about 95 of those 100 intervals would contain the true mean difference; in this trial the interval runs from 35% to 42%.
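One common reading of a 95% confidence level is in terms of repeated sampling: if the study were repeated many times, roughly 95% of the intervals computed would contain the true value. A minimal simulation sketch, assuming a known standard deviation and made-up values for the true mean, spread, and sample size (none of these come from the trial above):

```python
import random
from statistics import NormalDist, mean


def coverage(true_mean: float = 40.0, sigma: float = 10.0, n: int = 50,
             trials: int = 1000, seed: int = 1) -> float:
    """Fraction of simulated 95% confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)      # about 1.96 for a two-sided 95% interval
    half_width = z * sigma / n ** 0.5    # assumes sigma is known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


rate = coverage()
print(f"share of intervals containing the true mean: {rate:.1%}")
```

The printed share should land close to 95%, though it varies a little with the seed and the number of simulated trials.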
The mode tumor size in a sample of breast cancer patients is 4 centimeters. Which of these interpretations is correct?
More patients have a tumor size of 4 centimeters than any other tumor size
Probability Rule 4
Multiplication Rule: two events are independent if the occurrence of one has no effect on the other; for independent events, P(A and B) = P(A) × P(B). The outcomes of coin tosses are independent because the outcome of one toss does not affect another. NOTE: independent and mutually exclusive are NOT the same.
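For independent events, the multiplication rule gives P(A and B) = P(A) × P(B). A minimal sketch that checks this by listing every outcome of two fair coin tosses; the variable names are illustrative:

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT
tosses = list(product("HT", repeat=2))


def prob(condition) -> Fraction:
    """Probability of the outcomes that satisfy condition, by counting."""
    return Fraction(sum(1 for t in tosses if condition(t)), len(tosses))


p_first_heads = prob(lambda t: t[0] == "H")      # 1/2
p_second_heads = prob(lambda t: t[1] == "H")     # 1/2
p_both_heads = prob(lambda t: t == ("H", "H"))   # counted directly

print(p_both_heads)  # 1/4
assert p_both_heads == p_first_heads * p_second_heads
```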
Suppose we wish to compare treatment efficacy of a drug for two groups in a breast cancer clinical trial. In the treatment group, the drug is effective for 35% of the patients, while in the placebo (control) group, it is effective for 39% of patients. We wish to determine if this difference of 4% is statistically significant or not. We perform a significance test, which yields a p-value of 0.08. If our significance level is set at 0.05, what can we say about the difference?
Since our calculated p-value of 0.08 is greater than our significance level of 0.05, we fail to reject the null hypothesis and conclude that the difference is not statistically significant.
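The comparison in this answer can be written as a small decision rule. This is only a sketch: the function name `decide` is illustrative, and it assumes the common convention of rejecting the null hypothesis when the p-value is at or below alpha:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with the significance level and report the decision."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumed convention: reject H0 when p <= alpha
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.08, alpha=0.05))  # fail to reject H0
print(decide(0.01, alpha=0.05))  # reject H0
```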
central tendency
Summary measures that describe a whole set of data with a single value around which other values tend to cluster
In simple random sampling, each subject has an equal chance of being selected.
TRUE
Probability is the likelihood that an event will occur.
TRUE
When data are symmetrically distributed, the mean, median and mode are equal.
TRUE
alternative hypothesis
The hypothesis that states there is a difference between two or more sets of data.
relevance
The quality of information that indicates the information makes a difference in a decision.
Probability Rule 2
The sum of the probabilities of all possible outcomes is 1
Which of the following represents information classified on an ordinal scale?
The top five most-watched shows on prime time TV
A major advantage of retrospective (case-control) studies is:
They are economical and particularly well suited to the study of rare diseases.
Specificity is calculated according to which of the following?
True negatives / (True negatives + False positives)
Sensitivity is calculated according to which of the following?
True positives / (True positives + False negatives)
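A minimal sketch of the sensitivity and specificity formulas. The counts below are hypothetical screening results chosen only to make the arithmetic easy to follow:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of people with the disease who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of people without the disease who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical results: 100 people with the disease, 200 without
tp, fn = 90, 10    # diseased people: 90 test positive, 10 test negative
tn, fp = 160, 40   # healthy people: 160 test negative, 40 test positive

print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.8
```

The denominators matter: sensitivity only looks at people who have the disease, and specificity only at people who do not.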
The mean is the measure of central tendency best used in which of the following situations?
When data have a relatively symmetric distribution
All of the following are important reasons for random sampling except:
[It decreases the validity of a study.] It is the basis for statistical theory that underlies hypothesis tests and confidence intervals. It helps convince others that the trial/study was conducted properly. It avoids known and unknown biases on average.
All of the following statements about probability are true except:
[Probabilities are expressed as values between 1 and 10.] Probability is a numeric expression of uncertainty about an event. The probability of a given event is equal to 1 minus the probability of its complement. The sum of the probabilities of all possible outcomes in a given situation is always equal to 1.
All of the following statements are true about hypothesis testing EXCEPT:
[The null hypothesis in hypothesis testing is that there is a significant difference between groups.] Hypothesis testing allows researchers to compare descriptive statistics between two populations or samples. Hypothesis testing allows researchers to make statistical inferences about a population from a sample. Hypothesis testing refers to the formal procedures used to determine the probability that a given hypothesis is true.
pie chart
a circular chart divided into wedge-shaped sectors whose areas are proportional to the percentages of the whole
sampling frame
a complete non-overlapping list of the persons or objects constituting the population
variance
the average of the squared differences between each value and the mean; it indicates the spread or dispersion of the data, but in practical terms it is useful mainly for calculating the standard deviation
line graph
a graph that uses one or more lines to show changes in statistics over time or space
bar graph
a graph that uses vertical or horizontal bars to show comparisons among two or more items
frequency distribution
a graphical representation of measurements arranged by the number of times each measurement was made
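A frequency distribution can be tallied before it is drawn. A minimal sketch using `collections.Counter`; the tumor sizes are hypothetical values made up for illustration:

```python
from collections import Counter

# Hypothetical measurements (tumor size in cm)
tumor_sizes_cm = [2, 4, 3, 4, 5, 4, 2, 3, 4]

# Count how many times each measurement was made
frequency = Counter(tumor_sizes_cm)

# Print a simple text version of the distribution, one row per value
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value} cm | {'#' * count} ({count})")
```

Each row shows a value and how often it occurred, which is the same information a histogram displays as bar heights.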
population
a set of persons (or objects) having a common observable characteristic
statistical significance
a statistical statement of how likely it is that an obtained result occurred by chance
longitudinal study
a study that observes the same participants on many occasions over a long period of time
sample
a subset of a population
To determine the probability that one or another event will occur, we use the ______ Rule.
addition
A census is:
an enumeration of an entire population
Probability Rule 1
any probability is a number between 0 and 1
All except which of the following are common elements of a frequency table?
axis
What type of graph would you use if you wanted to display the number of lung cancer cases in 2011 for each of four major race/ethnic groups?
bar chart
continuous variables
can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals.
Which of the following tests is used to compare categorical outcomes from two samples to determine if the differences are statistically significant?
chi-squared test
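A minimal sketch of a chi-squared test on a 2x2 table, written with the standard library only. The counts are hypothetical (1,000 patients per group is an assumption), the function name `chi_squared_2x2` is illustrative, and no continuity correction is applied:

```python
from statistics import NormalDist


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected count in each cell if rows and columns were unrelated
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the statistic is the square of a standard normal value
    p_value = 2 * (1 - NormalDist().cdf(stat ** 0.5))
    return stat, p_value


# Hypothetical counts: rows are groups, columns are (effective, not effective)
stat, p_value = chi_squared_2x2(350, 650, 390, 610)
print(f"chi-squared = {stat:.3f}, p = {p_value:.3f}")
```

For real analyses a statistics package would normally be used; this sketch only shows where the statistic comes from.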
false positive
classifying a person as diseased when they actually do not have the disease
false negative
classifying a person as not diseased when they actually do have the disease
Probability Rule 3
complement rule [P(not A)=1-P(A)]
inferential statistics
concerned with reaching conclusions from incomplete information (generalizing from the specific); uses information obtained from a sample to say something about the entire population (example: opinion poll)
A study of attitudes about flexible workplace policies is being conducted in a large company. An e-mail survey is distributed to a complete list of employees. Participation in the survey is anonymous and voluntary. This is an example of a:
convenience sample
descriptive statistics
deals with the enumeration, organization, and graphical representation of data (example: census)
The area of statistics that describes the data is called ______ statistics:
descriptive
experiments
the investigator designs a research plan and imposes controls
The specificity of a test refers to the ability of the test to:
detect negative diagnoses among individuals who do not have the disease
The sensitivity of a test refers to the ability of the test to:
detect positive diagnoses among individuals who actually have the disease
ordinal
have an intrinsic order, but the differences between levels are not meaningful; examples: low, medium, and high; age ranges
random sample
every subject has an equal chance at being selected
hypothesis testing
formal procedure used to determine the probability that a hypothesis is true
When you want to make a statement about a population using information from a sample, you use ________ statistics.
inferential
variables
information on specific characteristics
significance level
known as the alpha level, this refers to the probability of rejecting the null hypothesis when the null hypothesis is true
standard deviation
measure of how spread out the observations/data points are from the mean; it is equal to the square root of the variance; a low standard deviation means the values are clustered around the mean, while a high standard deviation indicates that they are spread out.
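A minimal sketch of the link between variance and standard deviation, using Python's `statistics` module. The data values are hypothetical, and the population formulas (dividing by n) are used here; `statistics.variance` and `statistics.stdev` give the sample versions, which divide by n - 1:

```python
from statistics import mean, pstdev, pvariance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

center = mean(values)            # 40 / 8 = 5
spread_squared = pvariance(values)  # average squared distance from the mean = 32 / 8 = 4
spread = pstdev(values)          # square root of the variance = 2

print(f"mean = {center:.1f}, variance = {spread_squared:.1f}, SD = {spread:.1f}")
# mean = 5.0, variance = 4.0, SD = 2.0
```

The standard deviation is easier to interpret than the variance because it is in the same units as the data.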
The survival time from diagnosis until death of four cancer patients was as follows: 8 months, 2 months, 1 month, and 3 months. Which of the following measures of central tendency best describes the distribution of survival times?
median
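A quick check of why the median suits these survival times, using the four values from the question. The 8-month value pulls the mean upward, while the median stays near the bulk of the data:

```python
from statistics import mean, median

survival_months = [8, 2, 1, 3]  # values from the question above

print(mean(survival_months))    # 3.5, pulled up by the 8-month survivor
print(median(survival_months))  # 2.5, the midpoint of the two middle values (2 and 3)
```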
prospective studies
members of the cohort are identified before the outcome occurs. Advantage: they permit the accurate estimation of disease incidence in a population. Disadvantage: they take a lot of time and are expensive.
The _____________ Rule of probability is used to determine the probability of occurrence of two independent events.
multiplication
nominal
no intrinsic order, and the differences between levels of the variable have no meaning; examples: sex, race, or exposure
The variable eye color can be classified using a ___________ scale.
nominal
The ______ is a number that indicates the probability that measures from two samples or groups are similar.
p-value
A characteristic of a population is called a(n):
parameter
All of the following except __________ are ways to graphically display continuous data.
pie charts
A group of healthy teachers is assembled and followed over time in order to determine how many of them develop breast cancer. This is an example of a:
prospective cohort study
systematic sampling
randomly select a first case then proceed by selecting every nth case, where n depends on the desired sample size
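A minimal sketch of the procedure described above: pick a random starting case, then take every nth case. The population of 100 numbered cases and the function name `systematic_sample` are illustrative assumptions:

```python
import random


def systematic_sample(population: list, sample_size: int, seed: int = 0) -> list:
    """Random start, then every nth case, where n = population size // sample size."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    step = len(population) // sample_size   # the sampling interval n
    rng = random.Random(seed)               # fixed seed so the sketch is repeatable
    start = rng.randrange(step)             # random first case within the first interval
    return population[start::step][:sample_size]


cases = list(range(1, 101))   # hypothetical list of 100 numbered cases
sample = systematic_sample(cases, sample_size=10)
print(sample)
```

Only the starting point is random; after that the selection is fixed by the interval.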
The difference between the highest and lowest value in a data set is referred to as the:
range
survey
represents observations; controls are seldom possible
parameter
a descriptive measure that summarizes a characteristic of a population
Which is a measure that describes how spread out the observations in a data set are from the mean?
standard deviation
A characteristic of a sample is called a(n):
statistic
An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst then randomly samples 100 buyers of each brand. This is an example of:
stratified sampling
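A minimal sketch of the sampling plan in this question: split the list by brand, then draw a simple random sample of 100 from each brand. The buyer labels and the function name `stratified_sample` are made up for illustration:

```python
import random


def stratified_sample(strata: dict[str, list[str]], per_stratum: int, seed: int = 0) -> dict[str, list[str]]:
    """Draw a simple random sample of per_stratum members from every stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}


# 2,500 hypothetical buyers for each of the four brands (10,000 in total)
buyers = {
    brand: [f"{brand}-{i}" for i in range(2500)]
    for brand in ("Ford", "GM", "Honda", "Toyota")
}
sample = stratified_sample(buyers, per_stratum=100)
print({brand: len(chosen) for brand, chosen in sample.items()})
```

Every brand is guaranteed to appear in the sample, which a single simple random sample of 400 buyers would not guarantee.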
placebos
substances or treatment that have no therapeutic value
A radio station is conducting a promotion where they are giving away a total of 100 free iPhones. Every 10th caller will receive an iPhone until all of them have been given away. This is an example of a:
systematic sample
mean
the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores
range
the difference between the highest and lowest scores in a distribution
null hypothesis
the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
median
the middle score in a distribution; half the scores are above it and half are below it
mode
the most frequently occurring score(s) in a distribution
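A minimal sketch of the mode and the range using the standard library. The data are hypothetical; `statistics.multimode` is used because a distribution can have more than one most frequent score:

```python
from statistics import multimode

sizes = [3, 4, 4, 5, 4, 6, 2]   # hypothetical scores

modes = multimode(sizes)                # most frequent score(s)
value_range = max(sizes) - min(sizes)   # highest minus lowest

print(modes)        # [4]
print(value_range)  # 4

# When two scores tie for most frequent, both are reported.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```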
p-value
the probability of observing a difference at least as large as the one found if the null hypothesis (that the 2 samples or groups are similar) were true
Sensitivity
the probability that a clinical test correctly identifies individuals with disease (i.e., produces a positive result)
Specificity
the probability that a clinical test correctly identifies individuals without disease (i.e., produces a negative result)
statistic
a descriptive measure that summarizes a characteristic of a sample
convenience sample
type of non-probability sampling that involves the sample being drawn from that part of the population that is close to hand
measures of variation
useful for measuring how spread out the data are; 3 main measures: range, variance and standard deviation
data
the values of the observations recorded for variables; the raw material of statistics
discrete variables
variables that are integers; variables that usually consist of whole number units or categories and are made up of chunks or units that are detached and distinct from one another
retrospective studies
the cohort is identified after the outcome occurs. Advantages: economical and particularly applicable to the study of rare diseases. Disadvantages: data are usually collected for other purposes and may be incomplete; surveys may fail to include relevant variables; unknown biases frequently hinder such studies.