The scientific method
The scientific process: Study design. Avg. daily gain of control calves over 30-day feeding trial (in hundred pounds)
If we sample 25-30 calves receiving a control diet, we might observe something like this, where the green dots represent the individual calves. Over a 30-day period, some calves gained very little weight (maybe even lost some), some calves gained a lot of weight, and most were somewhere around 75 pounds for the period. We would then draw a black line over those dots, theorizing that, if we measured the performance of all calves ever fed this diet, they would generate this line. But recognize this line is an approximation, based only upon these 25-30 calves.
Establishing statistical significance. Most journals set a value of __________ in order for results to be considered 'statistically significant' •This threshold is termed __________ •The calculated _________ is compared to the pre-established α to determine if results are 'significant'
0.05 (5%) α P-value
One should examine:
1. What the target population is
   a. Is this defined?
   b. Is it relevant to your practice population?
2. How subjects were selected from the target population
   a. Was the sample size large enough to feel random selection is ideal?
   b. If not, were potential confounders identified and blocked appropriately?
3. How the treatment/exposure/independent variable was administered or assessed
   a. Was it objective and consistent?
   b. Were there opportunities for failure in administration or contamination of controls?
4. What outcomes were deemed important, and how they were measured
   a. Were they objective?
   b. Were they measured consistently and uniformly across individuals, groups, and time?
   c. Were those conducting the assessments blinded to treatment groups?
5. How outcomes were analyzed
   a. Is the statistical methodology adequately explained?
   b. Do the statistical methods appear 'typical' or 'exotic'?
6. What conclusions are justified
   a. Does the evidence support the conclusions the authors propose?
   b. Could alternative explanations exist?
7. Applicability to your practice
   a. Does the target population reflect the population you encounter?
   b. Can you implement the intervention in a manner similar to that used in the study?
The scientific process: Study design. Avg. daily gain of calves fed vitamin B1 over 30-day feeding trial (in hundred pounds)
We can then examine the weight of calves fed a ration that included vitamin B1. These calves appear to have done better, with only a very few gaining little weight and some gaining a lot. The average looks to be ~110 #. Again, we can draw a blue line to predict what we think the population average would be.
A fourth-year vet student on the small animal surgery rotation has noted that she is the only student to be called in on emergencies. She hypothesizes that there is a conspiracy among administrators to flunk her out and prevent her from graduating. She decides to examine the probability of there being five students on the rotation, yet only one getting after-hours calls. Which of the following would be an appropriate null hypothesis for her query?
A. She has received at least as many emergency calls as all other students
B. The other students on the rotation have received significantly fewer calls than she has received
C. There has been no significant difference in emergency calls between all students on the rotation
D. The other students on the rotation have received significantly more emergency calls than she has received
C. There has been no significant difference in emergency calls between all students on the rotation
If we can't prove something to be absolutely true, does that mean there is no 'truth,' i.e., that there is no absolute? Generally, we would say no; we would suggest that a truth does exist, we simply can never achieve certainty in saying we know it. As such, when we make a conclusion based upon a scientific investigation, there is the possibility that our finding is inconsistent with the truth.
•____________ - Outcome of interest; can include: disease; death; pregnancy; culling; "event"; etc.
Dependent variable
These errors occur during execution of the study, when the treatment, exposure, or independent variable is not appropriately distributed among study participants. Illustrations:
1. Failure to administer the treatment appropriately (too low a dose, dog hiding pills in the couch, etc.)
2. 'Contamination' of control subjects by treatment. For example, in the asthmatic cats study, cats from non-smoking households may be exposed to second-hand smoke by houseguests.
3. Withdrawal of some subjects from the study, with those individuals being different from those who remain. For example, dogs with severe cystitis being removed from the study.

Interpretation: Errors of interpretation result when the results of the study are used to reach an erroneous conclusion. The most common scenarios would be that either:
1. Incorrect assumptions were made about the subjects, treatment, outcome, or all of these; or,
2. Relevant information was not identified or reported by the researcher.
Either of these could result in confounding. Confounding results when a third variable exists (i.e., neither the independent nor the dependent variable) which is associated with both.
Execution/intervention/exposure
True or False? Double blinding is always necessary in veterinary studies
False. Double blinding is not necessary in veterinary studies, as the animal isn't going to be influenced by a placebo effect. However, in some cases it may be preferred that the owner is unaware of which group the animal belongs to, since that may alter how the owner treats and/or assesses the animal. If at all possible, blinding should remain throughout the analysis and interpretation of data to ensure bias doesn't influence the conclusions.
During the execution of the study and collection of data, ________ should be employed if at all possible. In human studies this should apply to both the investigator and the patient. This is termed "___________."
blinding double-blind
A specific statement or question being evaluated in the course of a research project.
Hypothesis
The scientific process: Study design. Avg. daily gain of all calves over 30-day feeding trial (in hundred pounds)
If we then overlay the two sets of data and the respective curves, we can appreciate an apparent small difference between the two populations. However, hopefully you can appreciate that there is quite a bit of overlap between the two groups. And if the observations were just a little different, the interpretation may be that the two populations had essentially the same ADG. Statistical analysis is needed to see if this small difference is a true reflection of a difference between the populations, or whether it is simply an artifact of the samples we used. This is important to recognize, as it is virtually inconceivable that you would ever truly see 'no difference.' Some difference will always be apparent, even if the two populations are identical. So we have to estimate the population curve since we can't sample the entire population. Then we need to determine statistically how likely it is that the two populations are the same.
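To make that concrete, here is a minimal sketch (not from any real trial) that simulates two overlapping calf groups like those described and runs a two-sample t-test on them. All means, spreads, and group sizes are invented for illustration.

```python
# Hypothetical illustration only: simulated 30-day gains (hundred pounds),
# control centered at 0.75 and treated at 1.10, with heavy overlap.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(loc=0.75, scale=0.60, size=28)
treated = rng.normal(loc=1.10, scale=0.60, size=28)

# Two-sided, two-sample t-test: how probable is a difference at least this
# large if the two populations were actually identical?
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"observed mean difference: {treated.mean() - control.mean():.2f}")
print(f"p-value: {p_value:.3f}")
```

A small p-value would suggest the gap is unlikely to be a sampling artifact; a large one means the overlap is too great to distinguish the groups from the samples alone.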
True or False? We can never truly prove anything to be the cause or result of something else.
True. In nearly all cases, the best we can do is to form a defensible explanation that seeks to approximate the truth by being consistent with all (or most) of the evidence available.
Statistical significance Back to our prototypical study... •Compare one treatment and control •We DON'T find a significant difference •p>0.05, p=0.20, p=0.44 •The treatment is no better than control, right?
In this instance, we FAIL TO REJECT THE NULL hypothesis. That doesn't mean that we've proven it to be true; we may be making a type II error.
•______________- The purported "cause;" synonyms include: -•Exposure; risk factor; treatment
Independent variable. For example, in a covid (SARS-CoV-2) study, the purported cause might be ivermectin lessening disease severity.
(with regard to science and the scientific process) An empirical generalization; a statement of a biological principle that appears to be without exception at the time it is made, and has become consolidated by repeated successful testing; rule (Lincoln et al., 1990); A set of observed regularities expressed in a concise verbal or mathematical statement (Krimsley, 1995).
Law
power is directly related to the accepted value for α. Accepting a larger α means you are more willing to incorrectly reject the null hypothesis; but it also increases the likelihood that you will correctly reject it if the null is wrong. (More/Less?) variability amongst the target population makes it easier to detect a difference.
Less.
Explanation: To understand this, let's return to the effect of vitamin B1 supplementation on ADG, examining two hypothetical scenarios. In the first scenario, the mean ADG for the vitamin B1 group was 2.55 pounds, but ranged from 1.9 to 3. The mean ADG for the control group was 2.45, but ranged from 1.8 to 2.9. The p-value between these two groups was 0.6. Examining the chart, it is hard to know which group did better. Generally, the treated animals (blue diamonds) are higher, but there are certainly some instances where the matched control animal (red square) did better than treated calves. And the way the data points are dispersed, it is impossible to judge by 'eye-balling' it.
In contrast, look at scenario two. The means for both groups are the same (2.55 and 2.45). However, the ranges are much smaller: only 2.51 to 2.6 for the vitamin B1 group, and 2.41 to 2.5 for the control group. Looking at the chart, it is readily apparent that the treated animals are consistently better than the controls. The p-value in this case is highly significant, at 0.0001.
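The two scenarios can be sketched in a few lines of Python: identical means, but wide spread in scenario one and tight spread in scenario two. Because the data are simulated, the printed p-values will not exactly match the 0.6 and 0.0001 quoted above, but the pattern (tighter spread, smaller p) should hold.

```python
# Same mean difference (2.55 vs. 2.45) under two amounts of variability.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
for label, spread in [("scenario one (wide spread)", 0.30),
                      ("scenario two (tight spread)", 0.025)]:
    b1 = rng.normal(2.55, spread, size=10)    # vitamin B1 group
    ctrl = rng.normal(2.45, spread, size=10)  # control group
    _, p = stats.ttest_ind(b1, ctrl)
    print(f"{label}: p = {p:.4f}")
```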
Designing the study
Now that we understand what affects power, we can use that knowledge to examine what should be done to design an effective study. First, the researcher should determine his/her values for α and β. Then, s/he must make an estimate about the degree of effect he/she anticipates. This should be based upon previously reported studies, preliminary trials, or personal experience. Once this has been done, it can be calculated how many animals to sample in order to achieve the desired power (again, power = 1 - β).

Unfortunately, some investigators fail to consider power prior to beginning the study, and use an unacceptably low number of participants. This may be for economic reasons, or in an effort to reduce the number of animals used in research. The net effect of a low-power study is a very high likelihood of getting negative results (finding no difference), regardless of whether or not a difference exists. This may mean a potentially effective treatment is abandoned, or that another study must be done.

Ideally, the power of the study is disclosed when the results are reported. A common form of this would be: "Given the expected prevalence of asthma rates in cats (1% in the general population, 2% in smoking households), we wanted to have an 80% chance of detecting a difference as small as a 2-fold increase in risk associated with exposure to smoking. To achieve this power required enrollment of 875 cats in the study." That means, if the study finds no difference, there is still a 20% chance that cats exposed to smoking really are 2 times more likely to develop asthma. Alternatively, if no difference is found, it is possible that exposure to smoke increases asthma, but by less than predicted.
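For what such a pre-study calculation might look like in code, here is a hedged sketch using statsmodels with the asthma example's inputs (1% vs. 2% prevalence, alpha = 0.05, power = 0.80). The required n depends heavily on which test and assumptions are used, so the output will not necessarily reproduce the 875 quoted above.

```python
# Sample-size sketch for comparing two proportions (1% vs. 2% asthma).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

effect = proportion_effectsize(0.02, 0.01)  # Cohen's h for 2% vs. 1%
n_per_group = NormalIndPower().solve_power(effect_size=effect,
                                           alpha=0.05, power=0.80)
print(f"cats needed per group: {n_per_group:.0f}")
```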
________: No difference exists between treatments
Null
Enrollment in the study
Once it has been determined how many subjects must be utilized, enrollment can begin. Preferably, a reasonably accurate census exists for the target population (remember, the target population should have been defined as part of generating the hypothesis). If so, a number of ways can be used to select participants. Most people assume that random assignment is the best way to assign participants to the treatment or control group. And indeed, if a large enough number of participants will be enrolled, random assignment will usually work best. However, if relatively few animals are to be used, it is possible that random distribution will still create some 'clustering' effects.

In this case, if a known factor is present that may impact the results, you may choose to block according to that factor. For example, one area of the feedlot may offer a poorer environment than another (more mud, greater exposure to weather, etc.). Therefore, instead of putting the treatment group in this area and the control group in the preferred region (or vice versa), each region should be divided into two pens. This would result in four pens (2 good and 2 bad); both treatment and control can then each have one group in the poorer pen and one group in the better pen. An alternative method to address potential variation is to stratify or match. Steers consistently have higher ADG than do heifers. It may therefore be preferred to ensure there are equal numbers of steers in the control group and the vitamin B1 group.

If a census of the population does not exist, one may be relegated to an alternative sampling method. That can include systematic (assigning every other dog with a UTI to marbofloxacin, and every other one to amoxicillin) or convenience (we'll enroll every cat that comes from a smoking household, and every 3rd cat from a non-smoking household). There are flaws with these approaches, but they can be used effectively in some cases. The study design should clearly state how animals were selected and enrolled!
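As one concrete illustration of the blocking described above, here is a minimal sketch (all names and counts hypothetical) that splits calves across a 'good' and a 'poor' region and gives both treatment and control one pen in each region.

```python
# Blocked randomization sketch: two environmental blocks, two groups each.
import random

random.seed(42)
calves = [f"calf_{i:02d}" for i in range(40)]
random.shuffle(calves)

blocks = {"good_region": calves[:20], "poor_region": calves[20:]}
assignment = {}
for region, animals in blocks.items():
    random.shuffle(animals)
    half = len(animals) // 2
    assignment[(region, "vitamin_B1")] = animals[:half]
    assignment[(region, "control")] = animals[half:]

for (region, group), members in assignment.items():
    print(region, group, len(members))
```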
A low p-value (e.g., 0.001) indicates which of the following:
A. The null hypothesis is improbable and thus should be rejected
B. The data acquired in the study appears to be representative of the population(s) from which it was collected
C. The difference found is likely to be clinically significant (i.e., it should impact your clinical decision-making)
D. The difference found is unlikely to reflect a real difference in the two groups compared (i.e., it is likely due to chance)
A. The null hypothesis is improbable and thus should be rejected.
A p-value represents the probability of observing a difference as large as what the study found, or larger, given that the null hypothesis is true. So if we choose to reject the null hypothesis, the p-value approximates the probability that the decision to reject is made in error. To decide whether or not to reject the null, we compare the p-value to a pre-established alpha value. If p is smaller than alpha, we'll reject the null hypothesis. If p is greater than alpha, we will NOT reject.
Errors are grouped into two main classes: systematic and random.

________ errors occur over and over again, for the same reason (or from the same cause), and are consistent in the direction and magnitude in which they distort a finding. As such, (answer 1) error is also called a _________. An easily understood example of a systematic bias is a scale that consistently reads 10 lbs. heavier than the actual weight. This always results in a higher weight than is accurate. The average of the weight values would be approximately 10 lbs. higher than it should be, but assuming this is a true systematic error and occurs very consistently, the distribution of the values would be very similar to the truth. Assuming both groups are weighed on the same scale, weights for those individuals receiving the vitamin B1 supplementation and those receiving the placebo would be equally affected, and the conclusion would not be significantly altered from what it should have been. If, however, control animals (those receiving a placebo) are weighed on one scale and the supplemented cattle are weighed on another, then the systematic error can greatly distort the findings of the study.

In contrast, ________ error does not reflect a flaw in instruments or execution; instead, it results from natural biologic and random variation (an animal's weight will vary ever so slightly depending on where on the scale they stand, whether they've defecated and urinated recently, etc.). As such, (answer 3) error will affect the results in both directions and be unpredictable in effect on a given event. However, with repeated measurements, we can begin to estimate that variability and predict its effect.
Systematic bias random
That is an important distinction between systematic and random errors: _________ errors can be eliminated; but if they are not eliminated, you are not able to estimate their impact (because you don't know they're there!). In contrast, ________ error CANNOT be eliminated, but oftentimes can be estimated. While (answer 2) error cannot be eliminated, it can be reduced. Ways to do this include minimizing the variability in your subjects and maximizing consistency in your protocol. For our feedlot trial, that may mean weighing all calves at the same time each day, prior to feeding. It would also be aided by enrolling uniform cattle. This could have a downside, though, by reducing external validity.
Systematic random
A scientifically accepted general principle supported by a substantial body of evidence offered to provide an explanation of observed facts and as a basis for future discussion or investigation (Lincoln et al., 1990)
Theory
Study execution: Statistical analysis True or False? •Sample(s) is/are NEVER a perfect representation of the population(s) •Some difference will ALWAYS be found -•Even between two samplings of the same population •What is the probability of finding a difference this large, or larger, if the null is true? -•Estimated by the _____________
True; the p-value.
We have established that when we sample a subset of the population, we obtain an average that does not perfectly reflect the true average of the entire population. When we compare a treatment and a control (or two treatments, etc.), we have actually obtained two averages and are seeking to compare them. Under virtually every imaginable scenario there will be some difference between these two averages. The question we are concerned with is: what is the probability of detecting a difference at least this large solely due to chance? In other words, the truth is that there is no difference between the treatment and control, but we just happened to sample individuals in such a way that there appears to be a difference.

Let's return to the vitamin B1 feeding trial example. Under both scenarios, the Excel spreadsheet shows a difference in ADG of 0.1 lb./day (2.55 for the treatment group, 2.45 for the control). Your client wants to know if this difference is real. Can he expect to see this benefit in all groups of similar cattle, or was this a one-time chance happening? Statistically, this question is stated as: "What is the probability of finding a difference at least as large as 0.1 in ten pairs of animals, if vitamin B1 really does not impact ADG?" Notice that the 0.1 difference could be either an improvement or a decrease; this is therefore a 2-sided hypothesis, as sketched below.
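Here is that question expressed as a paired, two-sided t-test, a minimal sketch only: the ten pairs of gains below are invented so the groups average roughly 2.55 and 2.45, and the resulting p-value is illustrative, not the study's.

```python
# Paired, two-sided t-test on ten invented treatment/control pairs.
from scipy import stats

b1_gain   = [2.5, 2.7, 2.4, 2.6, 2.8, 2.3, 2.6, 2.5, 2.7, 2.4]
ctrl_gain = [2.4, 2.5, 2.5, 2.4, 2.6, 2.4, 2.5, 2.3, 2.6, 2.3]

# Two-sided by default: a difference this large in EITHER direction
# (improvement or decrease) counts as evidence against the null.
t_stat, p_value = stats.ttest_rel(b1_gain, ctrl_gain)
print(f"mean difference: {sum(b1_gain)/10 - sum(ctrl_gain)/10:.2f}")
print(f"two-sided p-value: {p_value:.3f}")
```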
Is P<0.05 always good? •Could there be a situation where a researcher does NOT want a significant difference between groups?
Yes. For example, a researcher may hope to show that a cheaper or safer treatment is equivalent to the standard one, or to confirm that treatment and control groups did NOT differ in baseline characteristics.
How does research begin? The first step is to ask a question or develop a hypothesis. The scientific convention is that we will assume _____________ until proven otherwise. Therefore, the scientific idea that is to be tested is stated as a ____________. Thus the (answer 2) states that there is no association between the factor and the outcome of interest. Examples of (answer 2) would include:
1. Marbofloxacin is equivalent to amoxicillin for treating urinary tract infections.
2. Exposure to second-hand smoke does not influence the risk of asthma in cats.
3. Adding vitamin B1 to the feed has no effect on average daily gain (ADG).

By definition, there is an alternative to each of these hypotheses. Namely, the __________ hypotheses would be:
1. There is a difference between these two drugs.
2. Second-hand smoke influences asthma rates.
3. Vitamin B1 impacts ADG.

Note that a researcher likely has a hunch about what relationship may exist. For example, s/he may believe that marbofloxacin is better than amoxicillin, or that smoke increases asthma, or that B1 improves ADG. But they typically will entertain the possibility that marbofloxacin is actually less effective than amoxicillin, or that smoking protects against asthma, or that vitamin B1 decreases ADG. If you are willing to consider either possibility, you have created a 2-sided hypothesis.
equivalency null hypothesis alternative
The scientific method: hypothesis formation Examine probability of results, given null hypothesis •Remember: Null assumes NO association •Analogous to court of law: Require a large preponderance of evidence before concluding association exists; If null is improbable, alternative hypothesis is considered likely (though NOT definite!) •Direction of alternative is guided by _________________
findings •Treatment A is better (or worse) than B •Exposure harms (or improves) health
Before a study begins. Once a hypothesis has been generated, the researcher must ask __________. This will determine from where the study participants will be selected. The ability to generalize your results to "the real world" is called __________. Prior to enrolling participants, the researcher must next consider how s/he will __________.

True or False? The results of a study should not be taken to represent the absolute truth.

In considering the relationship between the study results and the actual "truth," there are essentially four possible outcomes (see table 1). One possible result is that an association exists and the study indeed detects it. A second possibility is that no association exists, and the study found no association. In either case, the conclusion reached is correct. Then there are two possible errors: the study says an association exists when in reality it doesn't; or, the study fails to detect an association that truly exists. We would rather our error be to fail to reject the null hypothesis than to incorrectly reject it when it is really true. Hopefully it is intuitive that there is a trade-off between these two possibilities: the more certain you want to be that the null is incorrect before you reject it, the more likely you will not reject it, even if it is in fact wrong.

The acceptable probability of making a type I error (rejecting the null hypothesis when it is true) is denoted by __________. The acceptable probability of making a type II error (failing to reject the null hypothesis when it is false) is denoted by _______. The researcher must decide how confident s/he wants to be for each of these, and that should be done before the experiment starts. Most disciplines have established a preferred value of 0.05 for (answer 4). That means that we are willing to make a type I error only 5% of the time. The maximum tolerable value for (answer 5) is usually said to be 0.2. That means in those situations where the null is false, we will still fail to reject the null 20% of the time. The complement of this (i.e., 1 - (answer 5)) is referred to as the _______ of a study. (answer 6) refers to the ability of the study to detect a difference in the two groups, given that a difference truly exists. Given the desired β of 0.2, that means you would want a power of 80%.
1. for what population s/he wants the results to be relevant - For example, you may want to compare marbofloxacin vs. amoxicillin in otherwise healthy dogs with a urinary tract infection (UTI). Alternatively, you may be more interested in dogs who previously had a UTI that has recurred. Or, you may be interested in infections that have developed while the dog was in a veterinary hospital with a urinary catheter. Obviously, these three questions are very different, and results from a study using one of these populations may not apply to either of the other two scenarios.
2. external validity
3. interpret the results
True or False: True
4. α
5. β
6. power
The scientific method: hypothesis formation •Can declare some __________ improbable •Can give estimations of values, with ___________ in those estimates
hypotheses; a stated degree of confidence
power is directly related to the accepted value for α. Accepting a larger α means you are more willing to incorrectly reject the null hypothesis; but it also increases the likelihood that you will correctly reject it if the null is wrong. A stronger association (or larger difference in treatment effect) also (increases/decreases?) the likelihood of rejecting the null hypothesis.
increases Example: This can be explored further with the cat/asthma example. You may hypothesize that second-hand smoke doubles the risk of asthma in cats. Further, it is estimated that asthma occurs in 1% of cats overall. This would predict a rate of ~2% in cats in smoking households. If the estimates were correct and a 100 cat study was done (½ from smoking households and ½ from non-smoking), you would anticipate 1 cat with asthma in the former group (50 cats * 2%) and no cats with asthma in the latter. Such a small difference would not be considered significant. But if exposure to second hand smoke really increases the risk of asthma 10 fold, you would expect to find 5 cats out of 50 with asthma in the exposed group, and none in the control group. This makes it easier to reject the null hypothesis, even with a relatively small sample size.
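Those two scenarios can be run through an exact test (appropriate here, since the expected counts are far too small for a chi-square). Each 2x2 table below encodes [cases, non-cases] for exposed vs. unexposed cats; the counts follow the hypothetical numbers above.

```python
# Exact test on the hypothetical 2-fold and 10-fold asthma scenarios.
from scipy import stats

two_fold = [[1, 49], [0, 50]]   # exposed: 1 of 50; unexposed: 0 of 50
ten_fold = [[5, 45], [0, 50]]   # exposed: 5 of 50; unexposed: 0 of 50

for label, table in [("2-fold risk", two_fold), ("10-fold risk", ten_fold)]:
    _, p = stats.fisher_exact(table)
    print(f"{label}: p = {p:.3f}")
```

The stronger association yields a far smaller p-value, though even 5 vs. 0 cases is only borderline with 50 cats per group, which previews the role of sample size discussed next.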
power is directly related to the accepted value for α. Accepting a (larger/smaller?) α means you are more willing to incorrectly reject the null hypothesis; but it also increases the likelihood that you will correctly reject it if the null is wrong.
larger
power is directly related to the accepted value for α. Accepting a larger α means you are more willing to incorrectly reject the null hypothesis; but it also increases the likelihood that you will correctly reject it if the null is wrong. A (larger/smaller?) sample size makes it easier to detect a difference/association compared to a similar study examining the same question but using a (larger/smaller?) sample size.
larger; smaller.
Explanation: Mathematically, this derives from the calculation of the standard deviation. Logically, it may be explained by the relationship of the sample to the population as a whole. If all members of a population were included in the study, there would be no need for statistical analysis. We would be able to determine what percentage of UTIs were cured by marbofloxacin, and that would either be greater than, equal to, or less than the percentage cured by amoxicillin. But it is impractical to survey all members of a population. Therefore, we take a sample, knowing that the average value from this sample will vary from the average value for the entire population (how likely is it that your sample will be PERFECTLY representative of the whole group?). The more members we sample, the closer we get to determining the actual 'true' value. If we only sample 10 dogs out of 5,000, we recognize the resulting average is not as accurate a reflection of the 'true' value of the population as if we sampled 500 out of 5,000.
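A quick simulation makes the point numerically. Below, an invented 'population' of 5,000 dogs has a true cure rate of 70%; repeated samples of 10, 100, and 500 dogs show how the spread of sample estimates narrows as n grows.

```python
# How sample size tightens the estimate of a population value.
import numpy as np

rng = np.random.default_rng(3)
population = rng.random(5000) < 0.70   # True = cured; true rate ~70%

for n in (10, 100, 500):
    estimates = [rng.choice(population, size=n, replace=False).mean()
                 for _ in range(1000)]
    print(f"n={n:3d}: sample cure rates span "
          f"{min(estimates):.2f} to {max(estimates):.2f}")
```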
This illustrates an important point about random error: it always biases results toward the _________ hypothesis. Random error always makes it harder to distinguish if a difference exists between two populations, because it results in loss of precision and greater 'blurring of the lines.' In essence, a p-value is an attempt to quantify the impact of random error by predicting the likelihood of random chance/random error giving you an extreme result. But just because we can quantify it to some degree doesn't mean it isn't important. A large amount of random error will result in a higher p-value and make it unlikely to reject the null hypothesis. Thus, random error is detrimental to answering questions, and should be minimized to the fullest extent possible.
null
The scientific method: hypothesis formation Start with a __________ hypothesis=assume there is no association •Treatment A is equivalent to treatment B •Exposure to substance X has no impact on health •____________ hypothesis is typically two-sided -Ivermectin may help you heal faster from covid -Ivermectin may make covid worse
null - For example, if we want to see if ivermectin will treat covid, the null would be that ivermectin has no impact on covid.
Alternate
So a ______________ represents the probability of observing a difference as large as what the study found, or larger, given that the null hypothesis was true. So if we choose to reject the null hypothesis, (answer) approximates the probability that the decision to reject is done in error. To decide whether or not to reject the null, we compare the (answer) to a pre-established ______________ value. If (answer 1) is smaller than (answer 2), we'll reject the null hypothesis. If (answer 1) is greater than (answer 2), we will NOT reject. Regardless of the decision, we still are facing the possibility of making a mistake. If we reject the null, the likelihood of an error is approximated by (answer 1) (assuming the study was well designed and executed). But what if we don't reject? Can we estimate the likelihood that we're making a type II error? Yes. And ideally, that should be reported as the '______________' of the study (where (answer 3) equals 1 - beta). We aren't going to go through calculating (answer 3) in nearly the depth we covered the p-value, but I do want you to understand the factors that influence a study's power.
p-value alpha power
Sampling/selection- As explained above, we generally select a small segment of the population and use them as a representation of the entire population. This fact, in and of itself, introduces random error. By selecting only some, we will be ignoring others, and this will result in the curve built from the observations being somewhat different from what the true curve looks like. We can minimize the impact of this by
selecting as large a sample as possible, and by selecting members from the population randomly for inclusion, rather than based upon some selection criteria.
Study execution: Statistical analysis. If you were to repeat any given study numerous times, it is highly unlikely that each replication would find the same difference between the two treatment groups; this estimate should not be confused with 'the truth' (which will never be known). Even if there is no real difference between the treatment and control, most studies would find at least some difference, although the results would center around zero.

ASSUMING THE NULL IS TRUE, if you were to repeat the same study multiple times and then plot the results, they would likely look like this. Note that this looks EXACTLY like a depiction of the observations of individuals enrolled in the study. The same principles apply to both situations. A given observation (be it a weight from an animal or a complete study) is merely a single observation from a whole population of possible individuals. Repeated observations (additional animals, or replicating a study) won't give you the same exact answer, because each is always a small snapshot of the whole truth. If the null is correct, repeating the study multiple times reveals that the most common value found would be zero, but many studies would find some value above zero, while an equal number of studies would find values below zero.
So if the null were true, you would end up with a bell curve like this when plotting the results of all the repeated studies.
We don't want to have to run the same trial 10 times or more. So instead, we run it once and then use those results to try to make some conclusions about what may be expected if the trial were repeated many times. We first use the variability we see in that study to build a curve, except we center it at the value theorized in the null hypothesis. In the feedlot study, the null stated that there would be no difference in ADG between cattle fed the control ration and those supplemented with vitamin B1. So our curve would center on zero (the null value) and have a standard deviation reflective of what was seen in the one trial. We then see where our observed value falls in this curve. If it is close to the center of the curve, it (supports/disproves?) the null hypothesis that there is no difference in rations. But if it lies toward one extreme or the other of the curve, then we may (accept/reject?) the null hypothesis. In other words, we are testing the probability of finding a difference as large as, or larger than, the one seen, if the null hypothesis were true.
supports reject

Picture: The figure above represents a theoretical distribution of differences in ADG across many studies, ASSUMING THAT THE NULL HYPOTHESIS IS CORRECT (i.e., there is no difference between treatments). If our study found a difference of 0.03 (shown by the star above), it is highly possible that the true difference in ADG is 0 (even though we did find a small difference). But if the observed value is to one edge or the other (say 0.1), you have a relatively small chance of a study finding a value that large or larger, GIVEN THAT THE 'TRUTH' IS THERE IS NO DIFFERENCE IN TREATMENTS. Recognize that the difference could also be less than 0 (i.e., -0.1) and it would still be unlikely for the null to be true. In that case, we would conclude that the treatment actually decreases ADG.

This is a central tenet of science. We don't seek to accept the alternative hypothesis. We simply choose to reject the null hypothesis if it is determined to be satisfactorily improbable. How improbable does it need to be? That is established by the value for α that the researcher set prior to beginning the study. In this case, the likelihood of finding a difference of 0.1 lbs. when addition of vitamin B1 actually had no effect is less than 0.0005. That means if the null hypothesis were true, we would see a value this extreme only one in 2,000 times we performed the study. This is certainly rare enough that we would feel comfortable saying the null hypothesis is unlikely to be true.
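The same logic can be sketched as a simulation: build many repetitions of the trial with the null true by construction, then count how often a difference as extreme as the observed 0.1 appears by chance. The spread used below is an invented stand-in for the variability seen in a single real trial, so the printed fraction will not match the 0.0005 quoted above; the mechanics are the point.

```python
# Simulated null distribution of ADG differences across many trials.
import numpy as np

rng = np.random.default_rng(11)
n_trials, n_calves, spread = 100_000, 10, 0.15

# Both groups drawn from the SAME population, so the null is true here.
group_a = rng.normal(0, spread, (n_trials, n_calves)).mean(axis=1)
group_b = rng.normal(0, spread, (n_trials, n_calves)).mean(axis=1)
diffs = group_a - group_b

observed = 0.10
p_two_sided = np.mean(np.abs(diffs) >= observed)
print(f"fraction of null trials with |difference| >= {observed}: "
      f"{p_two_sided:.4f}")
```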
Executing the study: Ground rules and terminology

#1. Is there a difference in efficacy of marbofloxacin compared to amoxicillin for treating urinary tract infections in dogs?
#2. Does second-hand cigarette smoke increase the risk of cats developing asthma?
#3. Does addition of vitamin B1 to a feedlot ration improve average daily gain?

For each of these scenarios, you have a population of subjects which you're interested in (dogs for #1, cats for #2, feedlot cattle for #3). Ideally, you would enroll every individual in the population into the study, but that is typically not feasible. So instead, you select a number of individuals to represent the population. You have at least two different groups that subjects will be divided into. For study #1, it would be marbofloxacin and amoxicillin. For #2, it would be exposure to smoke vs. no exposure to smoke. For #3, vitamin B1 vs. no vitamin B1 supplementation (perhaps a placebo or carrier is added). These are called '__________,' '________,' or '__________.' All of these terms mean basically the same thing and can be used largely interchangeably, although one term often makes more sense for a given situation than another.

Once subjects are exposed to one of the possible treatments, we are then going to measure an outcome. The outcome may be recovery, disease, death, a performance characteristic, or something else. The outcome is commonly called a _________, because, at least in theory, it is dependent upon the treatment (or exposure or independent variable) which the individual was subjected to. The outcome will be measured for each individual in each treatment group. Obviously, there will be variation among the individuals, and this will be characterized and some form of 'average' created for each group. The researcher will then assess what relationship (if any) exists between the treatment (or exposure/independent variable) and the 'average' outcome (or dependent variable), and try to determine if that relationship was due to chance, a true effect of treatment, or something else (a confounder).

Let's say the control group of calves averaged 75 lbs. gained over the course of a 30-day feeding trial. The green dots represent actual values observed for calves enrolled in the study. The blue line is an estimate of what would be seen if we measured the entire population of interest (ALL calves fed this diet, or an essentially identical one, without supplementation with vitamin B1). The group of calves supplemented with vitamin B1 gained an average of 125 lbs. over the same time period. Individual observations are seen as red dots, and again the blue line represents an estimate of the distribution if we examined all calves receiving vitamin B1 supplementation. The averages are different (75 vs. 125), but if you look at the two groups together, there is a lot of overlap, both in terms of individual observations and also the area covered by the curve intended to represent the entire population. The researcher will then have to analyze this data to determine if the weight gain is significantly different between the treatments.

An important thing to recognize at this point is the distribution of points and the shape of the curve. Specifically, as values get farther from the mean they are found with less and less frequency. This curve is what is termed a ____________. It is defined by ____________. Less than 5% of observations will be found more than two standard deviations away from the mean.
treatments exposures independent variables -Scenario one is clearly a treatment; scenario two would typically be called an exposure. And all of them can be called independent variables. dependent variable Normal distribution (or a bell-shaped curve) a mean (the 'average') and the standard deviation (which reflects the variability in the observations)
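That closing claim about two standard deviations is easy to verify with scipy:

```python
# Fraction of a normal distribution beyond two standard deviations.
from scipy.stats import norm

tail_fraction = 2 * norm.sf(2)   # both tails of a standard normal
print(f"beyond 2 SD: {tail_fraction:.4f}")   # ~0.0455, i.e., just under 5%
```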
True or False? Statistical significance and clinical significance are very different. It is possible to achieve statistical significance that has no relevance to clinical veterinary medicine. It is also possible for a study to find a clinically significant fact that does not achieve statistical significance.
True.
For example, a 1999 publication in the Journal of Veterinary Internal Medicine [1] examined a variety of parameters for association with sepsis in calves. They enrolled nearly 250 calves with diarrhea, including over 70 septic calves and 170 calves that were not septic. This relatively large number gave the study high power to detect small differences. They found statistically significant differences in a variety of factors, including rectal temperature, ionized calcium, and total serum protein. Specifically, septic calves had a lower temperature (99.7°F vs. 100.8°F), lower ionized calcium (4.72 mg/dL vs. 4.92 mg/dL), and lower serum total protein (5.4 g/dL vs. 5.94 g/dL). Are any of these differences clinically significant? Should you as a clinician make a decision on diagnosis, treatment, or prognosis based upon this information? I would strongly argue no. You should not choose to withhold treatment or give a poorer prognosis simply because of a one-degree difference in body temperature! This value is clinically meaningless, despite being highly significant statistically (p=0.009). The same holds true (in my opinion, at least) for the calcium and protein values. In fact, it is highly likely that the association between serum protein and sepsis is confounded by failure of passive transfer. A clinician reading this paper found little of importance in terms of management of calves with scours, despite the numerous statistically significant findings reported.

Contrast this with an examination of metastasis of anal sac adenocarcinomas in dogs [2]. This study was unable to demonstrate a statistically significant difference in survival between dogs who had metastasis to regional lymph nodes at diagnosis, compared to dogs with no evidence of metastasis. This is very important to a clinician, as conventional wisdom suggests that metastasis correlates with a poor prognosis. Such a finding warrants closer examination. Dogs without metastatic lymph node disease (MLN) at the time of diagnosis did in fact have a longer median survival (862 days, n=8) than dogs with MLN (260 days, n=7) at the time of diagnosis. Read that closely: dogs without MLN evidence lived 600 days longer than those with MLN! I believe that 600 days of life would be important to any owner and clinician, even though this difference was not statistically significant (p=0.1). Why was it not significant? Because the entire study consisted of only 15 dogs (7 with MLN and 8 without). A more powerful study may find the difference in survival isn't quite as large as this, but it seems very probable that the conclusion is the same: if a dog has lymph node metastasis at time of diagnosis, advise the owner to buy small bags of dog food from now on, because the dog isn't likely to be around a long time!

The take-home lesson is that the onus is upon you as a clinician to understand both the statistical and clinical significance of research. There are very valid reasons that we demand statistical analysis of research: just about any project would find a difference of some kind, just because of random variation and chance. Statistical analysis helps us sort out how likely it is that the findings are due to those possibilities. However, a p-value is not the end-all-be-all statement in regards to the impact the project may have on clinical practice. You need to read and critique the project, recognizing its attributes and limitations. Examine the conclusions of the author and see if they are applicable to your situation.
Assess the strength of the evidence, compare it to other evidence available to you, and then proceed in your pursuit of practicing EBM!
True or False? No study is perfect
True. No study is perfect, either in terms of design or execution. There will be errors in any and all scientific investigations. When we critique a study, either as its designer or as a reader of a paper, we must recognize this fact and avoid discounting the conclusions reached simply because it wasn't perfect. Instead, we should strive to identify what factors could give rise to errors, reduce or eliminate those we can control (if we're the investigator), and attempt to quantify those which we can't control.
True or False? It is imperative that the researcher decide ahead of time how s/he will deal with issues such as non-compliance and loss to follow-up.
True. These standards should be abided by and ultimately included in the reporting of the study. In some cases, treatment may result in unbalanced non-compliance or loss to follow-up. This is typically due to side effects associated with the treatment. In that case, it must be decided whether all originally enrolled participants are to be included in analysis, or only those that completed the study. There are advantages and disadvantages to each, which we won't discuss, but the approach taken should be detailed in the report of the study.
True or False? Science seeks to observe and offer the best explanation, not to PROVE.
True or False? Because of the answer to question one, facts actually don't exist.
True. For example, if you hit a table one hundred times and it makes a noise, it's likely that when you hit it again it'll make a noise; but technically, without doing that, you can't know for sure.
False. It doesn't mean facts don't exist!
We can't prove something works; only prove that it is highly improbable that it doesn't work. Recognize that the null hypothesis is typically counter to
what we 'want' to prove or are truly curious about.
The figure above represents a theoretical distribution of differences in ADG across many studies, ASSUMING THAT THE NULL HYPOTHESIS IS CORRECT (i.e., there is no difference between treatments). If we performed a single study and observed a difference of 0.04 lbs per day, we're not at all out of the realm of the null hypothesis being very probable. But if the difference is farther from the null value of 0, say, 0.1, then this is an improbable finding. Clearly not impossible; just as we cannot know the truth, we cannot say that the null is wrong. We can simply say that it would seem improbable to find this difference in one study, if the null were true. This is a central tenet of science. We don't seek to accept the alternative hypothesis. We simply choose to reject the null hypothesis if it is determined to be satisfactorily improbable. How improbable does it need to be? That is established by the value for α that the researcher set prior to beginning the study. In this case, the likelihood of finding a difference of 0.1 lbs. when addition of vitamin B1 actually had no effect is less than 0.0005. That means if the null hypothesis was true, we would see a value this extreme only in one of 2,000 times we performed the study. This is certainly rare enough that we would feel comfortable saying the null hypothesis is unlikely to be true.
There would be a 0.0005 probability (one in 2,000) that the calves would do this well if the null were true.
The scientific process: Study design. Avg. daily gain of all calves over 30-day feeding trial (in hundred pounds) •Mean (avg) daily gain, control: 0.75#, range from -0.85# to 2.1# •Mean (avg) gain, trt group: 1.10#, range from 0# to 2.8# •Difference between groups: 0.35# •Null hypothesis: No difference between groups (0#) •"What's the probability of finding a difference of 0.35# (or more), if the null is true?" •Calculated, considering the variation within each group
You would probably come up with a different number every time you do this study. This doesn't mean science has failed; if we did the study over and over again and the null were actually true, the results would cluster consistently around zero.
Power of a study to detect a difference What influences power?
•Accepted value for α -•Higher value for α increases power •Size/strength of association -•Stronger association increases power •Sample size -•Larger sample size increases power •Amount of variation among population -•Larger variation decreases power
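These four bullets can be demonstrated directly with a power calculation. The sketch below (all baseline numbers invented) varies one factor at a time; note that more variation in the population shrinks the standardized effect size (difference divided by SD), which is why it appears here as a smaller effect_size.

```python
# Power for a two-sample t-test as each factor moves, one at a time.
from statsmodels.stats.power import TTestIndPower

calc = TTestIndPower()
print(f"baseline (d=0.5, n=30/group, a=0.05): "
      f"{calc.power(effect_size=0.5, nobs1=30, alpha=0.05):.2f}")
print(f"larger alpha (0.10):        "
      f"{calc.power(effect_size=0.5, nobs1=30, alpha=0.10):.2f}")
print(f"stronger association (0.8): "
      f"{calc.power(effect_size=0.8, nobs1=30, alpha=0.05):.2f}")
print(f"larger sample (n=60):       "
      f"{calc.power(effect_size=0.5, nobs1=60, alpha=0.05):.2f}")
print(f"more variability (d=0.3):   "
      f"{calc.power(effect_size=0.3, nobs1=30, alpha=0.05):.2f}")
```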
Ground rules & terminology: (Almost) universal concepts
•Select a number of individuals to represent population •Compare treatment/exposure (independent variables) •Assigned to, or a characteristic of, the subject -•Abx A vs. B; smoker vs. non-smoker; Vitamin B1 vs. placebo •Measure disease/outcome(s) (dependent variables) -•Theorized to be related to the treatment --•Cure; asthma; avg daily gain (ADG) •Determine relationship (if any) between treatment(s) and outcome(s) -•Chance (i.e., p>0.05); treatment; confounder(s)
The scientific process: Study design •How can a wrong conclusion be made in a well-designed study?
•Using a (relatively) small group of subjects to represent the population of interest The previous slide laid out two scenarios where a wrong conclusion could be reached. You might rightfully ask, "if a study is well done, there shouldn't be the chance for a mistake, right?" Wrong. Why can we still reach a wrong conclusion when we do a good study? Because in almost all cases, we're only selecting a (relatively) small group of subjects and using them as representatives for the population of interest. If we could assess every member of the population, there wouldn't even be a need for statistical analysis- we could simply compare the two populations and know whether a difference exists. But surveying the entirety of two populations is virtually never possible, so we need to take a sample from each population and use statistical analysis to predict if the difference observed is real or an artifact of our sampling.