Statistics Exam 1
Statistical analysis is most applied in what part of the cycle?
Check. Checking requires analyzing the data!
What is the notation for the sum of all the values?
∑ This is the symbol for the sum of all the values.
What did Shewhart mean when he talked about "chance cause variation"?
Random variation. The leftover variation that remains after all the assignable causes of the variation are accounted for.
Hans Rosling, in his video The Joy of Stats, presents data that shows that African countries have achieved almost no improvements in life expectancies over the past 100 years.
Hans Rosling shows using a great visual presentation of the data, that there have been tremendous improvements in life expectancy in Africa.
Using the table below, what was the cumulative proportion of patients admitted from January through March? Hint: see page 9 of the lesson 7 presentation.
Hint: see page 9 of the lesson 7 presentation.
Using the table below, what proportion of patients were admitted in August? Hint: see page 9 of the lesson 7 presentation.
Hint: see page 9 of the lesson 7 presentation. Day of the Week | Number of Patients Admitted | Proportion | Cumulative proportion
Approximately, 95% of catheterized patients suffer bacterial invasion after 1 month. In this example what is the 95%?
Probability Probability is the proportion of times an event is expected to occur in the population.
What is a mathematical relation that assigns probabilities to all possible outcomes for a discrete random variable?
Probability mass function
What is the 99% CI for the following: mean = 235, standard deviation=35 and n=455?
230.8 to 239.2 First calculate the standard error of the mean (standard deviation / square root of n): 35/21.33 = 1.64. The 99% CI is 235 +/- 2.576*1.64 = 230.8 to 239.2.
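The arithmetic above can be checked with a short Python sketch. The function name `mean_confidence_interval` is only an illustrative choice, and the critical value 2.576 is taken from the answer above rather than derived.

```python
import math

def mean_confidence_interval(mean, sd, n, z):
    """Return (lower, upper) for mean +/- z * SEM, where SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)   # standard error of the mean
    margin = z * sem          # half-width of the interval
    return mean - margin, mean + margin

# z = 2.576 is the critical value used above for a 99% interval
lower, upper = mean_confidence_interval(mean=235, sd=35, n=455, z=2.576)
print(f"99% CI: {lower:.1f} to {upper:.1f}")
```

Passing z=1.96 instead gives the corresponding 95% interval.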
What is the range of this dataset: 5, 12, 3, 8, 9, 10, 25, 14, 8, 18, 23?
25 - 3 = 22
On the Wisconsin Collaborative for Healthcare Quality website under reports (https://reports.wchq.org/ (Links to an external site.)), in their heart care reports, under controlling high blood pressure, for the Q1 2018 to Q4 2018 report, how many patients were reported by Bellin Medical Group for the "percent of patients with controlled blood pressure" measure?
30781 patients
What is the mode in the dataset 3, 3, 6, 5, 6, 7, 4, 8, 9, 9, 6, 10.
6
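The range and mode answers above can be confirmed with Python's standard library. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

range_data = [5, 12, 3, 8, 9, 10, 25, 14, 8, 18, 23]
mode_data = [3, 3, 6, 5, 6, 7, 4, 8, 9, 9, 6, 10]

# Range = largest value minus smallest value
data_range = max(range_data) - min(range_data)

# Mode = most frequently occurring value
data_mode = statistics.mode(mode_data)

print(data_range, data_mode)  # 22 6
```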
The incidence rate of new cases of childhood onset type 1 diabetes mellitus is 22 per 100000 in Wisconsin. If there are 38,520 children in Madison approximately how many new cases of type 1 diabetes would be expected to occur each year?
8 (22/100,000)*38,520=8.47 or 8 children with type 1 diabetes expected in Madison, WI
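The same rate-times-population calculation as a small Python sketch. The helper name `expected_cases` is an assumption made for illustration.

```python
def expected_cases(cases, per_population, population):
    """Scale a rate (cases per per_population) to a population of a given size."""
    return cases / per_population * population

# 22 new cases per 100,000 children, applied to 38,520 children
diabetes_cases = expected_cases(22, 100_000, 38_520)
print(round(diabetes_cases, 2))  # 8.47
```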
On the Wisconsin Collaborative for Healthcare Quality website under reports (https://reports.wchq.org/ (Links to an external site.)), what was the rate reported for breast cancer screening (look under the "system results") by the Marshfield Clinic in the Q1 2013 to Q4 2014 period?
82.86
Which of the following has a greater chance of rejecting the null hypothesis?
90% power Remember power = 1- type 2 error 90% power means the chances of having a type 2 error is only 10% in this example.
Ethics and informed consent are particularly important for RCTs because hypothesis testing often involves new drugs or treatments, the safety and effectiveness of which are probably unknown. Therefore before subjects are enrolled in these studies the investigators must:
All of these issues are important, especially for RCT's in healthcare that may involve unknown risk (like harmful side effects) Have the study reviewed and approved by the Human Subjects and Ethics Institutional Review Board (IRB) at their research institution. Ensure that potential participants have provided their "informed consent." Ensure the participants know the potential risks and benefits of participating in the study as explained in simple terms in their native language. Ensure that participants are aware that they can withdraw from the study at any time.
What is Kaizen?
Quality improvement as a continuous process requiring teamwork and open communication
Sometimes when continuous data are non-normal and cannot be made normal, one option is to categorize the data into equal groups. What is the term for 5 equal groups?
Quintiles
In probability, what is a numerical quantity that takes on different values depending on chance?
Random Variable Yes, a random variable is a numerical quantity that takes on different values depending on chance.
What does the following indicate: xi - x (bar)
Deviation Yes, this is the difference between a data value and the mean. The standard deviation is the overall average deviation in the data set.
In probability, what is a countable set of possible outcomes?
Discrete random variables Yes, countable finite outcomes. Like rolling a six sided die.
The sample size of cross-sectional studies are often too small to get reliable estimates of rare diseases. Therefore rare diseases are often measured by:
Disease registries. Reports of notifiable diseases (for infectious or communicable diseases). We tend to use disease registries to measure non-communicable rare diseases (like type 1 diabetes). For communicable diseases (like pertussis, also known as whooping cough), we measure these rare conditions by making them reportable to local, state and national health authorities.
What is pseudo-precision?
Reporting results with more significant digits than your measurement actually achieved. Yes, you see this a lot in published research. For example, "the mean height of children studied (n=50) was 149.86 centimeters." Reporting the results this way implies that the researchers measured the children's height to within a 100th of a centimeter (very, very unlikely).
The law of large numbers states that as n increases the sample mean becomes a better reflection of the population mean.
True
95% CI are calculated as mean + or - (1.96*SEMean)?
True See page 9 of the lesson 13 transcripts.
With large sample sizes, in general, there is greater study power than smaller sample sizes.
True Yes, with large sample sizes we have more power to reject the null. Remember, in general, the tails of the shape of the distribution with large sample sizes are pulled in. see page 30 and 31 of the lesson transcript.
Suppose you are studying height at age 18 between two groups: young men with asthma (n=100) and young men without asthma (n=100). Rejecting the null hypothesis when you should have accepted the null hypothesis is a ____.
Type I (alpha) error
On the Wisconsin Collaborative for Healthcare Quality website (https://reports.wchq.org/ (Links to an external site.)), in their heart care reports, which medical group reports the highest percent of patients with controlled blood pressure for the Q1 2018 to Q4 2018 reporting period?
UnityPoint Health - Meriter
In hypothesis testing, if mean 1 = 225, 95% CI = 223 to 227 and mean 2 = 220, 95% CI = 218 to 222, would you state that these means are statistically significantly different?
Yes. Because the 95% confidence intervals do not overlap, the population means are likely significantly different (we would reject the null).
Suppose you are studying height at age 18 between two groups young men with asthma (n=100) and young men without asthma (n=100). If we failed to reject the null hypothesis here when we should have rejected it we are creating a Type 1 (alpha) Error.
false It would be a type 2 (beta) error
The sales of new iphones from 2009 to 2012 is an example of a negative exponential probability distribution.
false It's a positive exponential distribution. On a graph, this means the curve is low at the start of the x axis and then increases dramatically as x gets bigger.
The gravitational center of a group of data is the median.
false The gravitational center (balance point) of a group of data is the mean. The median is the middle value of a set of data. For 3 4 5 6 82 the median is 5 and the mean is 20. Notice how the median is more resistant to the influence of the outlier data point of 82?
The Monty Hall problem _____.
is an example of conditional probability Yes, the probability changes once a door is revealed. With 3 doors, revealing one of the doors changes the underlying probability of choosing correctly. It's always better to switch your door choice. You have a 1/3 chance of winning if you stay with your first choice, 2/3 if you switch. It also shows that probability can be difficult to understand!
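A simulation makes the 1/3 versus 2/3 result easier to believe. The sketch below assumes the standard rules (the host always opens a non-prize door that the player did not pick); the function names and the fixed seed are illustrative choices.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends up with the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the prize
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining unopened door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning over many simulated rounds."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

With more trials the two estimates settle closer to 1/3 and 2/3, which also illustrates a sample proportion approaching the true probability as n increases.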
What type of sampling uses a staged approach where a random selection is made of natural groups of individuals (like a school or town) and then everyone in that natural group receives the same treatment assignment?
Cluster sampling These are sometimes used to evaluate the effect of changes in curriculum in schools or for community interventions where the entire community is either in the intervention or control group.
What type of sampling is it when you sample individuals with a given characteristic as they are presented until enough with that characteristic are acquired (quota)?
Consecutive sampling This is the kind of sampling used in most RCTs in healthcare. A physician sees a certain type of patient and then asks that person to be in a study. This is done until the total sample size required is obtained. It is important to note that this is not simple random sampling and the sample might not be truly representative of the entire population.
What type of sampling is also known as haphazard, volunteer or judgmental sampling?
Convenience sampling Yes, an example of this kind of sampling would be surveying people attending the State Fair. Are these people really representative of the total state population?
Kurtosis is the steepness of the curve of the tails.
True
For a binomial probability, suppose we have five patients and the probability of success is .65. What is the rounded expected value of the population mean µ?
3 (the expected value (mean) is μ = n∙p) Again, please refer to slide 13 of lesson 9: the expected population mean μ = n*p = 5*.65 = 3.25, which rounds to 3.
For a binomial probability, suppose we have five patients and the probability of success is .65. What is the probability of observing 2 successes?
0.1811 See slide 6 and 7 on this lesson. It's giving an example using 4 patients you need to calculate it using 5 patients.
For a binomial probability, suppose we have five patients and the probability of success is .65. What is the total cumulative probability of observing 2 or fewer successes (PrX<=2)?
0.2351 See slide 11 on this lesson 9.
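Both binomial answers above (exactly 2 successes, and 2 or fewer) follow from the binomial formula P(X=k) = C(n,k) p^k q^(n-k). A minimal sketch, with illustrative function names:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """Probability of k or fewer successes: sum of the pmf from 0 to k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

p_two = binomial_pmf(2, 5, 0.65)
p_two_or_fewer = binomial_cdf(2, 5, 0.65)
print(f"P(X=2)  = {p_two:.4f}")
print(f"P(X<=2) = {p_two_or_fewer:.5f}")
```

The unrounded cumulative value is about 0.23517, so it agrees with the 0.2351 above up to rounding in the last digit.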
Looking at the standard normal z table (table B in text), what is the z-score that corresponds to the cumulative probability of 0.6700?
0.44 In Table B, find .6700 and look up the associated 10ths on the left side of the table (0.4) and 100th on the top of the table (0.04). Add them together and you get .44
In a binomial random variable where n=10 and p=.2 what is q?
0.8 Correct, the answer is 1-p or 1-0.2= 0.8.
Suppose it is known that the weights of a certain group of individuals are approximately normally distributed with a mean of 140 pounds and a standard deviation of 25 pounds. What is the probability that a person picked at random from this group will weigh between 100 and 170 pounds? (Hint: see slide 16).
0.8301 Sketch a normal curve with mean μ=140 and lines at 100 and 170. You are estimating the area between 100 and 170, and this is done by finding the z-values for 100 and 170 and subtracting the cumulative probability for 100 from the cumulative probability for 170. To calculate the z-values see the equation on slide 14. For example, the z-value for 100 = (100-140)/25 or -1.6; the z-value for 170 = (170-140)/25 or 1.2. Look up the cumulative probabilities for these z-scores (0.0548 for the z-value of 100 and 0.8849 for the z-value of 170). The probability between 100 and 170 is 0.8849 - 0.0548 = 0.8301.
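The table lookup can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch of the same calculation; the variable names are illustrative.

```python
from statistics import NormalDist

weights = NormalDist(mu=140, sigma=25)

# z-values, as in the worked solution
z_low = (100 - weights.mean) / weights.stdev    # -1.6
z_high = (170 - weights.mean) / weights.stdev   # 1.2

# Area between 100 and 170 = cumulative probability at 170 minus at 100
prob_between = weights.cdf(170) - weights.cdf(100)
print(f"z = {z_low:.1f} and {z_high:.1f}, probability = {prob_between:.4f}")
```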
Study power is defined as:
1 - beta error
List 3 important considerations when comparing data from the clinics in the Wisconsin Collaborative for Healthcare Quality reports.
1) Were all the data for eligible patients reported? 2) Were there data errors? 3) To what extent are differences due to severity of illness differences (or health literacy differences or socio-economic differences) between clinic patient populations? These differences are referred to as case mix. My answer: 1) Different numbers of patients are included in the submitted data. 2) Whether all of the data for patients with a certain condition are being submitted to the corresponding registries; there could always be data that is being left out. 3) Different quality indicators to take into account.
For a binomial probability, suppose we have five patients and the probability of success is .65. What is the expected value of the population variance σ^2?
1.1375 Please refer to slide 13 of lesson 9 for the solution. n=5, p=.65, q=1-p or .35, and σ^2 = n*p*q = 5*.65*.35 = 1.1375
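The binomial mean and variance formulas used in these questions (μ = n*p and σ^2 = n*p*q) can be sketched as follows; the function name is an illustrative choice.

```python
def binomial_mean_variance(n, p):
    """Return (mean, variance) of a binomial random variable."""
    q = 1 - p              # probability of failure
    return n * p, n * p * q

mu, sigma_sq = binomial_mean_variance(n=5, p=0.65)
print(f"mean = {mu:.2f}, variance = {sigma_sq:.4f}")
```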
What is the median value of this dataset 3, 5, 12, 8, 9, 10, 14, 8, 18, 23, 25
10
We have established in a prior exercise that the heights of women in the US vary according to a normal distribution with a population mean µ=163.3 and a population standard deviation of 6.5 centimeters. How tall does a woman have to be to be taller than 95% of women (hint: see page 146 first edition of the book or page 158-160 second edition)?
173.99 Here's the solution: first look up the z-value for .95, which is 1.645. Next use the equation found on slide 15 (z = (x - population mean μ)/standard deviation). This would be 1.645 = (x-163.3)/6.5. Solve for x by multiplying both sides of the equation by 6.5 (giving 10.69 = x - 163.3) and then adding 163.3 to both sides of the equation. This leaves you with x = 173.99.
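Solving for x is the inverse of a z-table lookup. A minimal sketch using `statistics.NormalDist.inv_cdf`, which returns the value below which a given proportion of the distribution falls:

```python
from statistics import NormalDist

heights = NormalDist(mu=163.3, sigma=6.5)

# Height below which 95% of women fall
cutoff = heights.inv_cdf(0.95)

# The same answer by hand: x = mu + z * sigma, with z for 0.95
z_95 = NormalDist().inv_cdf(0.95)
by_hand = 163.3 + z_95 * 6.5

print(f"z = {z_95:.3f}, cutoff = {cutoff:.2f} cm")
```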
Given a mean=190 and a SEmean = 2.5, what are the 95% CI?
185.1 to 194.9 Given a mean=190 and a SEmean = 2.5, what are the 95% CI? 190+/- 1.96*2.5
The probability of being struck by lightning in your lifetime is 1 in 3,000. If there are 5,686,986 people in Wisconsin. How many Wisconsinites will likely suffer a lightning strike in their lifetime?
1896 (1/3000)*5,686,986= 1895.6 or rounded 1896
Which of the following is an example of a non-experimental (observational) design
Both Cohort and Case Control studies are examples of non-experimental designs because we are not testing an intervention in these studies; we are just observing differences and outcome among groups. Randomized Controlled Trials are experimental design because we are randomly assigning groups of people to an intervention and then following them to an outcome. For example, if we follow a group of people who smoke cigarettes and a group who do not smoke, to determine the rate of heart attacks among the two groups. This is an example of a non-experimental (observational) design. If we take a group of patients with newly diagnosed high cholesterol and randomly assign half to a new drug for lowering cholesterol and half to a statin drug for lowering cholesterol and we follow both groups over time for an outcome (like number of heart attacks per group). This is an example of an experimental design. Studies with experimental designs have a planned intervention.
Approximately 95% of all the data in a normally distributed data set falls between + or - 1 standard deviation.
False About 68% of the data fall within +/- 1 standard deviation of the mean. About 95% fall within +/- 2 standard deviations, and about 99.7% fall within +/- 3 standard deviations.
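These percentages can be checked against the standard normal distribution. A short sketch; `within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def within(k):
    """Proportion of a normal distribution within +/- k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"+/- {k} SD: {share:.1%}")
```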
The mean is a more robust summary statistic than a median for a small data set with an outlier.
False Actually, the median is more robust for a small data set with an outlier. For example: 2, 3, 4, 5, 6, 7, 83 The mean =15.7 The median is 5
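A quick check of how an outlier pulls the mean but not the median, using the dataset from the answer above (a sketch with the standard library):

```python
import statistics

data = [2, 3, 4, 5, 6, 7, 83]   # 83 is the outlier

data_mean = statistics.mean(data)
data_median = statistics.median(data)
print(f"mean = {data_mean:.1f}, median = {data_median}")

# Dropping the outlier moves the mean a lot and the median only slightly
trimmed = data[:-1]
trimmed_mean = statistics.mean(trimmed)
trimmed_median = statistics.median(trimmed)
print(f"without outlier: mean = {trimmed_mean:.1f}, median = {trimmed_median}")
```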
When making categories from continuous data that are non-normal and can't be normalized, it's best to use quintiles where the data set n <100.
False In general, when making categories from a data set of about n=100 it's important to make sure that you have fairly large categories; each category should have about 30 or more data values. We will learn why this is important in our lesson 20 on Chi-Square analysis. If you use 5 categories for a data set with n=100 you will have only 20 values in each category. In this example, it would be better to use tertiles (3 groups), each with about 33 data values.
The standard deviation is always larger than the sample variance.
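False The standard deviation is the square root of the variance, so it is smaller than the variance whenever the variance is greater than 1 (and larger only when the variance is between 0 and 1).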
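Because the standard deviation is the square root of the variance, which of the two is larger depends on whether the variance is above or below 1. The two small datasets below are made up purely for illustration.

```python
import statistics

spread_out = [10, 20, 30, 40, 50]       # illustrative data, variance > 1
tight = [1.0, 1.2, 0.8, 1.1, 0.9]       # illustrative data, variance < 1

wide_var = statistics.variance(spread_out)
wide_sd = statistics.stdev(spread_out)
tight_var = statistics.variance(tight)
tight_sd = statistics.stdev(tight)

print(f"spread out: variance = {wide_var:.2f}, sd = {wide_sd:.2f}")
print(f"tight:      variance = {tight_var:.3f}, sd = {tight_sd:.3f}")
```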
As n increases the standard error of the mean increases.
False No it decreases due to the square root law (see 10 and 11 in this lesson)
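The square root law can be seen by holding the standard deviation fixed and increasing n. In this sketch the standard deviation of 30 and the sample sizes are arbitrary illustrative values.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: sd divided by the square root of n."""
    return sd / math.sqrt(n)

sample_sizes = [25, 100, 400]
errors = [standard_error(30, n) for n in sample_sizes]
for n, se in zip(sample_sizes, errors):
    print(f"n = {n:>3}: SEM = {se:.2f}")
```

Quadrupling n halves the standard error, which is also why confidence intervals narrow as the sample size grows.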
Every statistically significant finding is also clinically significant?
False See page 37 of the lesson transcript. lesson 12
The mean of the sampling distribution of x-bars is often not equal to the population mean.
False The mean of the sampling distribution of X bar is equal to the population mean.
Even if n is large, the binomial distribution never takes on properties of a normal distribution.
False When n is large, the binomial distribution is approximately normal. More generally, the sampling distribution of the means has a normal shape even if the population distribution is non-normally shaped.
With large sample sizes, in general the type II (beta) error is larger than with smaller sample sizes.
False the beta error is smaller in general with large sample sizes (see page 30 and 31 of the lesson transcript). lesson 12
Looking at the standard normal z table (table B in text), what is cumulative probability that corresponds to the z-score 1.23?
Find the 10ths (1.2) in the row of table B and (0.03) in the 100ths column. The answer is .8907.
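Both directions of the Table B lookup (z-score to cumulative probability, and cumulative probability back to z-score) can be reproduced with `statistics.NormalDist`. A minimal sketch:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z-score -> cumulative probability (row 1.2, column 0.03 in the table)
prob_at_123 = standard_normal.cdf(1.23)

# cumulative probability -> z-score
z_for_67 = standard_normal.inv_cdf(0.67)

print(f"P(Z <= 1.23) = {prob_at_123:.4f}")
print(f"z for 0.6700 = {z_for_67:.2f}")
```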
In a one sample z-test for a mean, suppose our population mean = 185, n = 85 and standard deviation = 30. If we found a sample mean of 195, what would our calculated z-value be?
First calculate the standard error of the mean which is sd/sq root n= 30/9.219=3.25 Then calculate the Z value = (sample mean - population mean)/ standard error of the mean = (195-185)/3.25 or 10/3.25= 3.08
On the Wisconsin Collaborative for Healthcare Quality website under reports (https://reports.wchq.org/ (Links to an external site.)), what is the general historical trend in breast cancer screening in the Gundersen Health System patient population over time? View the breast cancer screening trends for Q1 2004 thru Q4 2018. Scroll down to find the data for the Gundersen Clinic and click on the "historical data" icon.
It's about the same, perhaps slightly improved
Why do you begin analyzing data by first running frequency distributions including histograms and scatter plots?
Look for outliers. Check for data entry errors. Check that quantitative (continuous) variables are normally distributed. Check to see if you need to combine categories for categorical data. This is an essential first step in analyzing data. Remember the GIGO rule? Your data should be error free before you start the analysis. You also have to look at the shape of your data, as this points to how you will analyze it. We will learn about this more in future lessons.
If you found a two-sided p-value of p=0.17 would you reject the null hypothesis?
No No, a p=0.17 is not good evidence to reject the null. In general we look for p values of less than 0.05 to reject the null.
Suppose you are studying height at age 18 between two groups young men with asthma (n=100) and young men without asthma (n=100). What would your null hypothesis be?
No difference in height between the two groups. Correct, you are testing that there is no difference in height between young men with asthma and young men without asthma.
Deming believed management should use quotas to judge worker productivity and conduct annual appraisals for workers.
No, Deming didn't approve of quotas. He was advocating about changing people and processes. He wanted workers to drive quality improvement at every step of the process. For example, in healthcare, it's not simply how many patients get treated but more importantly working to make sure they have good health outcomes. Is our system in the US set up to do this? We are not there yet!
In Kahneman's flight trainer example, rewarding the student pilots with money for a great landing appeared to work.
No, it did not appear to work, because after they were paid for a great landing, odds were their next landing was closer to their mean (worse). It did appear that yelling at them for a poor landing worked, because once again their next landing would be better, closer to their mean. The flight trainers didn't understand the concept of regression to the mean.
The major mathematical understanding of probability came from the ancient Greeks who were gamblers.
No, it really started later with the Romans.
Deming's ideas were rapidly taken up in US manufacturing.
No, it took many years to adopt Deming's approaches here in the US. Now they are widely used. In healthcare, there are many extensions of Deming's approaches including Six Sigma, Lean, etc.
Given the same sample size, the 90% CI are wider than the 99% CI.
No, the 99% CI are wider. It means we are 99% confident that the true population mean is in this range. See page 10 of the lesson transcripts.
Gerolamo Cardano introduced the term "probabilis."
No, the Roman Cicero introduced this term.
Non-experimental designs use random assignment to a treatment group.
No, they do not use a randomized design.
As the sample size increases, the 95% CI get larger (wider).
No, they get smaller. See page 17 of the lesson 13 transcript.
Suppose you are studying height at age 18 between two groups young men with asthma (n=100) and young men without asthma (n=100). If we reject the null hypothesis we would always use a one-sided alternative hypothesis (conclude that the young men with asthma are taller than young men without asthma).
No, we would use a two-sided alternative if we reject the null in this example. We do not know if the mean height of young men with asthma would be taller or shorter than the mean height of young men without asthma.
Deming expanded on Shewhart's quality improvement cycle. The order of the cycle is plan, act, check, do.
No. It's Plan, Do, Check, Act.
If you know the direction of your alternative hypothesis (i.e. that it will be greater than your comparison mean) would you use a two-sided or one-sided p-value?
One-sided
What type of sampling is it when each individual in the population has an equal chance of ending up in the sample?
Simple random sampling This is the optimal type of sampling but it requires that you know how many people are in the total population. Each individual in the population is assigned a number. Individual numbers are randomly selected for the study sample
The square root of the sample variance = ?
Standard deviation
What type of sampling is it when you first divide the population on a characteristic and then randomly sample within these groups?
Stratified sampling
A patient is undergoing a major coronary bypass surgery. The probability of success is 80%. In this example what would the "event" in probability terms be?
Success or failure An event is an outcome or set of outcomes.
What type of sampling starts with a random selection from the population then selects every "nth" individual until the sample size is full?
Systematic sampling Don't forget that the start is random; then it takes every nth person.
Believing that the outcome of an independent random event may influence the outcome of a future independent random event is an example of what?
The Gambler's Fallacy This is when people attribute an outcome based on the previous outcome, even though each of these events are independent. For example, saying you are on a lucky streak when playing dice.
On the Wisconsin Collaborative for Healthcare Quality website under healthcare reports (https://reports.wchq.org/ (Links to an external site.)) under cancer care, what is the defined measure for breast cancer screening? (Hint: click the "i" next to the breast cancer measure).
The percentage of women age 50 through 74 who had a minimum of one breast cancer screening test during the two year measurement period.
Which of the following is an example of a binomial random variable?
The probability of being alive 5 years after cancer treatment Yes, it's a binomial random variable. The patient is either alive or dead at year 5 post treatment. One of two possible outcomes
In the scenario in Question 7, "how many people were living or staying in the house, apartment, or mobile home on April 1, 2000" is the _____.
The variable Correct. In this example, the entire record of the survey for that household would be the observation (in a spreadsheet, the row), the question about household members is the variable (the column in a spreadsheet) and the number of individuals residing in the household on that date is the value (the cell in a spreadsheet).
Which of the following is true about probability density functions?
They are used for continuous random variables. They obey all the rules of probability. They come in many shapes (like the normal curve pdf).
What is the term used to describe a mathematical change in the shape of the data distribution from skewed to normal?
Transformation Remember that sometimes you can transform the data, by using a log or a square root transformation, to "normalize" the data. In healthcare, we often have to do this because clinic data such as LDL cholesterol, systolic blood pressure, BMI and other variables are often positively skewed in clinic data (that's because they contain a lot of high values)
Why do we use sampling?
To try and get a representative unbiased (error free) sample of the population. Because we often can get reliable estimates without having to study the entire population. Because studying the entire population is often too time consuming and expensive.
In hypothesis testing, we seek evidence against the null as a way of bolstering the alternative hypothesis.
True
Matching two datasets based on a key variable in common to both of them, such as medical record number, is called data linking.
Yes, and this is done using a "Key" variable, like Patient Medical Records Number in healthcare.
In healthcare, the central starting point in process improvement is making a careful care management plan and systematically collecting patient data.
Yes, and this is why electronic health records are so important in this process. They give us the opportunity to evaluate and change healthcare. You picked a good career!
Before electronic health records were widely available it was extremely difficult and costly to gather and evaluate patient data.
Yes, prior to electronic health information it was very difficult to collect patient data. It required abstracting medical records, finding and then recording the data. As a result, most patient health outcome studies were done at academic institutions (which had students / grant funding to do these chart reviews).
W. Edwards Deming said the key to improve was reducing variation.
Yes, quality can be improved by reducing or eliminating outliers. For example, in healthcare, understanding and reducing long hospital stays, complications from procedures, etc.
Blinding (or Masking) in Randomized Controlled Trials is when the subjects do not know which treatment group they are in. This minimizes potential errors or bias).
Yes, single masking (or blinding) occurs when the subjects don't know what group they are in. Double masking is when the subjects and investigators don't know who is in which group. Triple masking is when even the statistician does not know which subjects are in which group (subject data are coded and upon final analysis the code is broken). Masking helps prevent bias.
Generally, continuous, discrete and categorical variables can be summarized using tables.
Yes, tables are great for presenting any kind of data but if you want to really show differences in percents, etc. it's best to graph the data.
When datasets are large (n>40) the plot of the frequency of the values starts to look like a normal curve?
Yes, the larger the data set the more likely the data are to follow a normal distribution. However, this is not always the case, so you should always plot the data first before doing any further analysis. Remember there are other reasons to plot the data too. Remember these other reasons?
A sample proportion approaches the true probability in the population as n increases.
Yes, this is true. Think about the Monty Hall link example in lesson 1b. As the number of simulations increased, say from 100 to 1000, the estimates of winning by keeping your original door get closer to the true population probability of success of 1/3 (33%). The more simulations you do (larger n) the closer you get to the true population probability.
Although events appear to be random, by systematically collecting data on all the possible outcomes, the possible occurrence of future events can be estimated.
Yes, we can simulate the probability of events if we understand how often they are likely to occur.
The best presentation approach to display the relationship between smoking group (non-smoker, current smoker, ex-smoker) and gender (male, female) would be _____.
bar graph
What kind of variable is HIV positive (yes or no). Select the best answer:
categorical Yes, in this example it has two distinct categories. Categorical variables can sometimes have many categories, like selecting a race / ethnicity category box on a survey.
Cohort studies begin by identifying a group of cases and then selecting a comparable control group. Investigators then ask individuals in both groups about potential exposures.
false No. This question describes a Case Control Study. A traditional Cohort Study follows a group of people who are exposed to a factor and a group who are not exposed to that factor over time to determine the relative risk of disease in these two groups. For example, following a group of people who smoke vs. a group that does not smoke to determine rates of lung cancer in both groups. In a traditional cohort study no one has disease at the start of the study; they just have exposure to a factor. A case control study starts with cases of disease and a control group without disease and has them recall possible exposures related to their disease. For example, we might have a group of lung cancer cases and a control group without lung cancer and have them recall their smoking history. We might find that the odds of smoking are much greater in the lung cancer group compared to the lung cancer free control group.
Historically, humans have always had a good understanding of probability and estimating random events
false Our understanding of probability came after other forms of mathematics, such as geometry and algebra.
An asymmetrical shape with a long right tail is a negative skew.
false The direction of the skew indicates whether the skew is positive or negative. If the right tail is longer then it's a positive skew. If the left tail is longer it's a negative skew. Remember that sometimes you can transform the data, by using a log or a square root transformation, to "normalize" the data. In healthcare, we often have to do this because clinic data such as LDL cholesterol, systolic blood pressure, BMI and other variables are often positively skewed in clinic data (that's because they contain a lot of high values).
As p-values get smaller and smaller, there is less evidence to reject the null hypothesis.
false There is greater evidence against the null hypothesis.
The number of new cases occurring in a population over time is ______.
incidence Yes, Incidence is the number of new cases occurring in a population between two points of time. Prevalence is the number of cases with disease at one point in time. They give you different measurements. For example it's important to measure the prevalence of HIV on a population to determine how much HIV is in a population. However, if you measure the incidence of HIV it can indicate the characteristics of new cases - which is very important to monitoring and preventing this disease. For example, if you know who is a new case (incident case), you might be better able to target prevention programs to stopping or reducing new cases.
Random Variable
is a numerical quantity that takes on different values depending on chance.
What is the notation for sample size?
n
What kind of variable is the following 3-level glucose variable: Normal glucose, pre-diabetes, diabetes. Select the best answer.
ordinal Yes, an ordinal variable has ordered levels, such as levels of severity, like cancer stage (stage I - IV, with IV being the most advanced stage). Technically, an ordinal variable is a categorical variable where the order of the categories is very important (the order provides important information).
What kind of variable is systolic blood pressure (mmHg)? Select the best answer.
quantitative It's a continuous variable on a scale like height or weight.
In a one sample z-test for a mean, if our population mean was 185, n=85 and standard deviation = 30. If we found a sample mean of 195, would we reject or accept the null hypothesis?
reject First calculate the standard error of the mean: sd / square root of n = 30/9.219 = 3.25. Then calculate the z value = (sample mean - population mean) / standard error of the mean = (195-185)/3.25 = 10/3.25 = 3.08. This is quite a large z value (far out in the tail of the normal distribution), so we are almost certainly going to reject the null. The cumulative probability for z = 3.08 (found in Table B of the text, page 543) is 0.9990, so the upper-tail probability is 1 - 0.9990 = 0.001, which is very good evidence to reject the null.
Taking three consecutive blood pressure measurements on the same subject allows you to determine the _____ of the blood pressure measure.
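The arithmetic above can be reproduced in a few lines. This is a minimal sketch, not the textbook's method: `one_sample_z` is a hypothetical helper, and the upper-tail probability is computed from the standard normal distribution with `math.erfc` instead of being looked up in Table B. Carrying full precision gives z of about 3.07 rather than 3.08, because the worked answer rounds the standard error to 3.25 first; the conclusion does not change.

```python
import math

def one_sample_z(sample_mean, pop_mean, sd, n):
    """Return (standard error, z statistic, upper-tail probability)."""
    se = sd / math.sqrt(n)                 # standard error of the mean
    z = (sample_mean - pop_mean) / se      # z statistic
    # Upper-tail area of the standard normal: P(Z > z)
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return se, z, p_upper

se, z, p_upper = one_sample_z(sample_mean=195, pop_mean=185, sd=30, n=85)
print("standard error:", round(se, 2))
print("z:", round(z, 2))
print("upper-tail probability:", round(p_upper, 4))
```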
reliability Correct. Reliability can be determined by repeating the measurements. In a dart board example with three darts, a highly reliable throw would be getting all three darts to cluster closely together. Validity is how truthful the measurement is. In other words, is it measuring what it is supposed to be measuring? In the dart board example, high validity is hitting the bull's eye if that's what you are aiming at. Ideally, when measuring something, you want high reliability and high validity (in the dart board example, you want the darts to cluster closely together on the bull's eye). In healthcare, suppose you are evaluating the accuracy of a new automatic blood pressure (BP) arm cuff. You can take repeated measurements to determine the reliability of the BP monitor. Do these measurements agree with each other? If they do, the BP device has good reliability. To determine the validity of the new BP monitor, you can compare the new BP device readings with the results from a really expensive and accurate BP monitor. Is there good agreement between these two measures? If yes, then you have good validity.
When the data from the census form are entered into a spreadsheet format, what do the rows correspond to?
the observation Correct. In this example, the entire record of the survey for that household would be the observation (in a spreadsheet, the row), the question about household members is the variable (the column in a spreadsheet), and the number of individuals residing in the household on that date is the value (the cell in a spreadsheet).
Consider the following scenario. You are completing the US census form (like the form shown in this lesson). You enter "4" for the question asking "how many people were living or staying in the house, apartment, or mobile home on April 1, 2000." What is the "4"?
the value Correct. In this example, the entire record of the survey for that household would be the observation (in a spreadsheet, the row), the question about household members is the variable (the column in a spreadsheet), and the number of individuals residing in the household on that date is the value (the cell in a spreadsheet).
When the data from the census form are entered into a spreadsheet format, what do the columns correspond to?
the variable Correct. In this example, the entire record of the survey for that household would be the observation (in a spreadsheet, the row), the question about household members is the variable (the column in a spreadsheet), and the number of individuals residing in the household on that date is the value (the cell in a spreadsheet).
In the classic cohort study, none of the study subjects has disease at the start of the study. They are grouped according to exposure and then followed up over time until some study participants develop disease.
true
Population values are called parameters.
true
The Central Limit Theorem states that even though the population can be skewed, the sampling distribution of x-bars is normal.
true
The probability of disjoint (not overlapping) events can be added.
true
The standard deviation of the sampling distribution of the x-bars is called the standard error of the mean.
true
Discrete numerical data contains a finite number of results.
true Discrete data is information that can be categorized into a classification. Discrete data is based on counts, usually counted as whole numbers. There are a finite number of possible values. For example, the number of days hospitalized is usually rounded to the nearest whole day. The number of cancer patients treated by a hospital each year is discrete, but patient weight is continuous.
GI=GO conveys that your statistical conclusions are only as good as the data being analyzed.
true GIGO stands for garbage in - garbage out. The quality of your results rests on the quality of the data. This is always important to think about, especially in healthcare. Always think about the quality of the data being analysed and how this might impact your results.
Gerolamo Cardano was a physician who liked to gamble.
true He also published a book on probability in gambling.
Both experimental and non-experimental designs deal with studying the relationship between explanatory variables and response variables.
true Response variables are outcomes, and explanatory variables are factors that are associated with the outcome (sometimes they are shown to be causal risk factors). For example, we might conduct a case control study that shows an association between smoking cigarettes (explanatory variable) and the eye disease macular degeneration (response variable).
95% confidence intervals can be interpreted as: 95% of the confidence intervals for means will capture the population mean.
true See page 4 of the lesson 13 transcript.
Capture-Recapture studies, like census Post Enumeration Surveys, use multiple sampling to estimate population size.
true These methods are used to estimate wildlife populations (like the number of rabbits in a field or fish in a lake) hence the name "capture - recapture." These methods are being used widely now in healthcare too to evaluate the completeness of registries, estimate disease incidence and prevalence, etc.
Randomization is important in Randomized Controlled Trials (RCTs) because it balances lurking variables among the treatment groups (minimizes confounding effects).
true This is a very important component of the Randomized Controlled Trial (RCT) design. Randomization attempts to make the groups equal in all respects other than the planned intervention. If you take a group of patients (n=500) with high blood pressure and randomly assign half (250) to a new blood pressure control drug and half (250) to a known blood pressure comparison drug, the groups should be similar in terms of age, gender, other conditions, etc. A word of warning though: sometimes, due to chance, the randomization might not work perfectly and some differences between groups can occur. This is why Table 1 of most RCT publications shows the baseline characteristics of the two groups, to demonstrate that the randomization process worked and the groups are comparable with respect to certain major characteristics that might influence the outcome of the study (like age, gender, etc.).
Cross-sectional surveys study a sample of the population at one point in time.
true Yes, a cross-sectional survey measures a population at one point in time. Sometimes repeated cross-sectional surveys are done on the same population to detect changes in health conditions over time. For example, we can survey our HIMT350 class during the first week of the semester to determine the number of colds in our students. We could also survey our class every month to estimate how the number of colds at each survey changes in our student population.
Availability bias is like cherry picking evidence to support an argument. It's not objectively considering all the evidence.
true This kind of bias exists in healthcare too. For example, a clinician might remember one group of patients with a given condition more than others for some particular reason. They might think that their outcomes are better if they selectively remember those patients. It's better to review all the patients with a given condition and not just rely on memory when evaluating patient outcomes.
In a binomial distribution, if n=40, p=.3 and q=.7 would you use a normal approximation?
yes We use the rule that if n*p*q >= 5, then the normal approximation can be used. Here 40*.3*.7 = 8.4, which is greater than 5.
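The rule can be written as a short check. This is a sketch under the rule stated on this card (n*p*q >= 5); `normal_approx_ok` is a hypothetical helper name, and the threshold is passed as a parameter so a different cutoff can be tried. The mean n*p and standard deviation sqrt(n*p*q) are the parameters of the approximating normal distribution.

```python
import math

def normal_approx_ok(n, p, threshold=5):
    """Apply the n*p*q >= threshold rule for a binomial(n, p) variable."""
    q = 1 - p
    npq = n * p * q
    return npq, npq >= threshold

n, p = 40, 0.3
npq, ok = normal_approx_ok(n, p)

print("n*p*q:", round(npq, 1))          # 8.4
print("use normal approximation:", ok)  # True

# Parameters of the approximating normal distribution
mu = n * p               # mean of the binomial
sigma = math.sqrt(npq)   # standard deviation of the binomial
print("mean:", round(mu, 1), "sd:", round(sigma, 2))
```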
What is the notation for the population mean?
μ Yes this is mu, the symbol for population mean.