Chap 3: The phoenix of statistics


NHST is a hard habit to break bc it ...

requires many people to make the effort to broaden their statistical horizons

scientists may contribute to publication bias by capitalising on ... to show their results in the most favourable light possible

researcher degrees of freedom

what are Cohen's benchmarks for effect size?

d = .2 (small) d = .5 (medium) d = .8 (large)

what is the formula for Cohen's d?

d = (X̄1 - X̄2) / s, i.e. the difference between the two group means divided by the SD
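The formula above can be sketched in Python. This is a minimal illustration, not Field's code; the data values are invented, and the control group's SD is used as the standardiser (one of the two options the cards below discuss).

```python
# Sketch of Cohen's d: difference between group means divided by an SD.
# Here we standardise by the control group's SD (invented example data).

def cohens_d(group1, group2, sd):
    """d = (mean1 - mean2) / sd"""
    mean1 = sum(group1) / len(group1)
    mean2 = sum(group2) / len(group2)
    return (mean1 - mean2) / sd

treatment = [12, 14, 15, 13, 16]
control = [10, 11, 12, 10, 12]

# sample SD of the control group (n - 1 denominator)
m = sum(control) / len(control)
sd_control = (sum((x - m) ** 2 for x in control) / (len(control) - 1)) ** 0.5

d = cohens_d(treatment, control, sd_control)
print(round(d, 2))  # a standardised difference, in SD units
```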

pearson's correlation coefficient r can vary from ....

- 1 (perfect negative relationship) 0 (no relationship) 1 (perf positive)

pearson's correlation coefficient r is a measure of the strength of a relationship between ...

- 2 continuous variables - one continuous & one categorical variable containing 2 categories

what are the 3 effect sizes that Field discusses?

- Cohen's d - Pearson's correlation coefficient r - the odds ratio

what are other older terms for p-hacking?

- fishing - cherry-picking - data snooping - data mining - data diving - selective reporting - significance chasing

the transparency and openness promotion (TOP) guidelines are a set of standards for open science principles. They cover standards such as ...

- pre-registration of study & analysis protocols - transparency with data, analysis scripts, design & analysis plans, research materials - replication

open science initiatives push incentive structures towards ...

- quality not quantity - documenting research (not storytelling) - collaboration (not competition)

there are many ways scientists can influence the p-value, these are known as researcher degrees of freedom & include ...

- selective exclusion of data - fitting different statistical models, but only reporting most favourable results - stopping data collection at a point other than that decided at the study's conception - including only control variables that influence the p-value in a favourable way

what practices represent p-hacking?

- trying multiple analyses & reporting only the one that yields sig results - deciding to stop collecting data at a point other than when predetermined sample size is reached - including (or not) data based on the effect they have on the p-value

the transparency and openness promotion (TOP) guidelines are a set of standards for open science principles. For each standard, levels are defined from 0-3. What do they represent?

0 - the journal merely encourages data sharing or says nothing 3 - data is posted on a trusted repository & results reproduced independently before publication

what are the 3 common misconceptions about what a statistically significant result allows you to conclude?

1) a significant result means that the effect is important 2) a non-significant result means that the H0 is true 3) a significant result means that the H0 is false

the practice in research articles of presenting a hypothesis that was made after data collection as tho it were made before

HARKing

the ASA released 6 principles for scientists using NHST. The 2nd is that you don't interpret p-values as ...

a measure of the probability that the hypothesis in question is true

what is the 'open science' movement?

a movement to make the process, data and outcomes of research freely available to everyone

one way scientists have looked for evidence of p-hacking is by extracting the reported p-values across a range of studies on a topic, or within a discipline, and plotting their frequency. What does the line show?

a p-curve; the no. of p-values you'd expect to get for each value of p

NHST is the dominant method for testing theories using stats; it is compelling bc it offers ...

a rule-based framework for deciding whether to believe a hypothesis

one common misconception about what a statistically sig result allows you to conclude is that a sig result means that the H0 is false. Why is this not the case?

a sig test statistic is based on probabilistic reasoning, which limits what we can conclude

r quantifies the relationship between 2 variables, thus if one is categorical with 2 groups (coded as 0 and 1), then you get ...

a standardised measure of the difference between 2 means (kinda like cohen's d)
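The point above (r with a 0/1-coded group variable acting like a standardised mean difference) can be sketched in Python. The data and group coding are invented for illustration:

```python
# Sketch: Pearson's r between a binary (0/1) group variable and a
# continuous outcome gives a point-biserial correlation - a standardised
# measure of the group difference, conceptually similar to Cohen's d.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

group = [0, 0, 0, 1, 1, 1]        # categorical variable with 2 categories
score = [10, 11, 12, 14, 15, 16]  # continuous outcome

print(round(pearson_r(group, score), 3))  # large r = big group difference
```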

the most important conclusion about p-curves is that it's very difficult to extract enough info to ...

accurately model what p-hacking might look like & thus ascertain whether it happened

NHST is also based on long-run probabilities, i.e. the alpha value (e.g. .05) is a long-run probability & means that ...

across repeated identical experiments the probability of making a type 1 error is .05
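This long-run idea can be made concrete with a small simulation. The setup is invented (a one-sample z-test with known sigma, repeated many times with H0 true); the point is only that the .05 applies to the collective of experiments:

```python
# Simulation sketch: when H0 is true, the long-run proportion of
# significant results is ~alpha (.05), even though any single
# experiment either has or hasn't made a type 1 error.
import math
import random

random.seed(1)

def z_test_p(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

errors = 0
n_experiments = 5000
for _ in range(n_experiments):
    sample = [random.gauss(0, 1) for _ in range(20)]  # H0 true: mean is 0
    if z_test_p(sample) < 0.05:
        errors += 1  # a type 1 error in this experiment

print(errors / n_experiments)  # close to .05 across the collective
```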

other things being equal, effect sizes are not ...., unlike p-values

affected by sample size

why is a 'standardised' measure good? for example with effect sizes

allows you to compare effect sizes across different studies that have measured different variables, or have used different scales of measurement

NHST is also based on long-run probabilities, i.e. the alpha value (e.g. .05) is a long-run probability. That probability does not apply to ...

an individual study where you either have (p = 1) or have not (p = 0) made a type 1 error

the analysis of p-curves looks for p-hacking by examining ...

any p-value reported within and across the articles studied

the ASA published some guidelines about p-values and the ways in which you might address the problems with NHST. The first way to combat the problems is to ..

apply sense when you use it

the p-value is the probability of getting a test statistic ... from an infinite no. of identical replications of the experiment

at least as large as the one observed relative to all possible values of tnull

one common misconception about what a statistically sig result allows you to conclude is that it means that the effect is important. Why is statistical significance not the same thing as importance?

bc the p-value from which we determine significance is affected by sample size, e.g. v small effects will be sig in large samples (& vice versa)
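The sample-size dependence is easy to demonstrate. This sketch (invented setup: one-sample z-test with known sigma = 1) shows the same tiny effect size producing a non-significant p in a small sample and a highly significant one in a large sample:

```python
# Sketch: the same standardised effect gives very different p-values
# at different sample sizes (one-sample z-test, sigma known = 1).
import math

def p_value(d, n):
    """Two-sided p for standardised effect d with sample size n."""
    z = d * math.sqrt(n)
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

d = 0.1  # a very small effect
print(p_value(d, 25))    # small sample: well above .05
print(p_value(d, 2500))  # large sample: far below .05
```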

questionable research practices are not purely the fault of NHST, but NHST nurtures these temptations by fostering ...

black and white thinking, i.e. sig results garner much greater personal rewards than non-sig

altho the sample size doesn't affect the computation of the effect size in the sample, it does affect how ...

closely the sample effect size matches that of the population (the precision)

the peer reviewers openness initiative asks scientists to ...

commit to the principles of open science when they act as expert reviewers for journals

NHST is biased by researchers ...

deviating from their initial sampling frame (e.g. by stopping data collection earlier than planned)

cohen's d is the ...

difference between means divided by the SD

.... add info that you don't get from a p-value

effect sizes

the ASA released 6 principles for scientists using NHST. The 6th is that 'by itself, a p-value doesn't provide a good measure of ...'

evidence regarding a model or hypothesis, i.e. there may be many other hypotheses that are more compatible with the data than the hypotheses you have tested

when using cohen's d, and the groups do not have equal standard deviations, there are 2 common options. The first is to use the SD of the control group or baseline. This option makes sense bc any intervention might be ...

expected to change not just the mean but also the spread of scores - therefore the control group/baseline SD will be a more accurate estimate of the 'natural' SD for the measure you're using

scientists have looked at whether there's general evidence for p-hacking by looking at the distribution of p-values you would expect to get if p-hacking doesn't happen with the distributions you might expect if it does. One example is to ...

extract the reported p-values across a range of studies on a topic, or within a discipline, and plot their frequency

part of the open science movement is about providing ...

free access to scientific journals

researcher degrees of freedom refers to the fact that a scientist ...

has many decisions to make when designing & analysing a study

an empirical probability is the proportion of events that ...

have the outcome in which you're interested in an indefinitely large collective of events

one common misconception about what a statistically sig result allows you to conclude is that a non-significant result means that the H0 is true. Why is this wrong?

if p > .05, you fail to reject the H0 - but that doesn't mean the H0 is true, i.e. there may still be an effect, it just wasn't big enough to be found

the ASA released 6 principles for scientists using NHST. The first is that p-values can indicate how ...

incompatible the data are with a specified statistical model, i.e. we can use the value of p to indicate how incompatible the data is with the H0 (smaller ps = more incompatibility)

another way to see if there's evidence for p-hacking in science is to focus on p-values in ...

individual studies that report multiple experiments

the current incentive structures in science are ....

individualistic rather than collective

what is the crucial point about empirical probabilities?

it applies to the collective (not individual events)

when the group SDs are different, using the pooled estimate can be useful to calculate Cohen's d, however ...

it changes the meaning of d bc we're now comparing the difference between means against all the background noise in the measure

what is perhaps the biggest practical problem created by NHST?

it encourages all-or-nothing thinking, i.e. if p < .05 then an effect is sig, but if p > .05 it is not

effect sizes offer us something potentially less/more misleading than NHST

less

NHST works on the principle that you'll make a type 1 error in 5% of an infinite no. of repeated, identical experiments. The .05 value of alpha is a ...

long-run & empirical probability

when one variable increases and the other decreases

negative relationship

a non-sig result should never be interpreted as ...

no difference between means or no relationship between variables

the ASA released 6 principles for scientists using NHST. The 3rd is that scientific conclusions and policy decisions should ...

not be based only on whether a p-value passes a specific threshold

both p-hacking and HARKing are cheating the system of NHST, bc in both cases it means that you're ...

not controlling the type 1 error rate (bc you're deviating from the process that ensures that it is controlled) & thus you have no idea how many type 1 errors you will make in the long run

pearson's r is ...., so an effect with r = .60, is not twice as big as one with r = .30

not measured on a linear scale
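One way to see the non-linearity is to compare variance explained (r squared), a comparison I'm adding for illustration rather than one stated in the card:

```python
# Sketch: r isn't on a linear scale. In terms of variance explained
# (r^2), r = .60 accounts for 4x as much variance as r = .30, not 2x.
r_small, r_large = 0.30, 0.60
print(round(r_small ** 2, 2))  # proportion of variance for r = .30
print(round(r_large ** 2, 2))  # proportion of variance for r = .60
print(round((r_large ** 2) / (r_small ** 2), 2))  # ratio is 4, not 2
```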

.... help to combat many of the wider problems into which NHST feeds

open science practices

the ASA released 6 principles for scientists using NHST. The 4th is: DO NOT

p-hack

researcher degrees of freedom that lead to the selective reporting of significant p-values

p-hacking

what are researcher degrees of freedom that relate closely to NHST?

p-hacking & HARKing

when using cohen's d, and the groups do not have equal standard deviations, there are 2 common options. The second option is to ..

pool the SDs of the 2 groups (if the groups are independent)
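The pooled option can be sketched in Python using the usual sample-size-weighted pooled SD; the data values are invented:

```python
# Sketch: Cohen's d using the pooled SD of 2 independent groups.
import math

def pooled_sd(g1, g2):
    """Sample-size-weighted pooled standard deviation."""
    def var(g):
        m = sum(g) / len(g)
        return sum((x - m) ** 2 for x in g) / (len(g) - 1)
    n1, n2 = len(g1), len(g2)
    return math.sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))

treatment = [12, 14, 15, 13, 16]
control = [10, 11, 12, 10, 12]

d = (sum(treatment) / 5 - sum(control) / 5) / pooled_sd(treatment, control)
print(round(d, 2))
```

Note the trade-off the next card describes: pooling compares the mean difference against all the background noise in the measure, which changes what d means.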

when one variable increases and the other also increases

positive relationship

the ASA released 6 principles for scientists using NHST. The 5th is not to confuse statistical significance with ...

practical importance

HARKing is the practice in research articles of ....

presenting a hypothesis that was made after data collection as tho it were made before

a submission of pre-registered research is reviewed by relevant experts and if the ...., then it's accepted with a guarantee to publish the findings no matter what

protocol is deemed to be rigorous enough and the research question novel enough

significant findings are much more likely (7x) to be published than non-significant ones = ?

publication bias

what are the benchmarks for pearson's r?

r = .10 (small) r = .30 (medium) r = .50 (large)

by guaranteeing publication of the results, registered reports should ...

reduce publication bias and discourage questionable research practices aimed at nudging p-values below the .05 threshold

what are the consequences of the misconceptions of NHST?

scientists overestimate the importance of their effects, e.g. by: - ignoring effects that they falsely believe don't exist (bc they accepted the H0) - pursuing effects that they falsely believe exist (bc they rejected the H0)

a report by the APA on the limitations of NHST, didn't recommend against NHST, but instead suggested that ...

scientists report things like CIs & effect sizes to help them evaluate findings without reliance on p-values

scientists may contribute to publication bias by ... and excluding non-sig ones

selectively focusing on sig findings

success in science is largely defined by ...

significant results

an effect size is an objective and (usually) ..

standardised measure of the magnitude of an observed effect

pre-registration of research can be done in a registered report in an academic journal or more informally. A formal registered report is a ...

submission to an academic journal that outlines an intended research protocol before data are collected

Cohen's d is simple enough to compute & understand, but it has 2 small inconveniences. The second is that altho the difference between means gives an indication of the signal, it does not ...

tell us about the 'noise' in the measure

when using cohen's d, and the groups do not have equal standard deviations, there are 2 common options. The first is to use ...

the SD of the control group or baseline

the p-value is the frequency of the observed test statistic relative to all possible values that could be observed in ...

the collective of identical experiments

when computing p-curves (to test p-hacking), where the curve is affected by the effect size & the sample size - what happens when the effect size is greater than 0 (i.e. there is an effect)?

the curve has an exponential shape - bc smaller ps (more sig) occur more often than larger ps

when computing p-curves (to test p-hacking), where the curve is affected by the effect size & the sample size - what happens when there's no effect (effect size of 0)?

the curve is flat - all p-values are equally likely
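Both p-curve shapes can be reproduced with a small simulation. The setup (one-sample z-test, sigma known, invented sample sizes and effect) is mine, not Field's; it only illustrates the flat-vs-skewed contrast the two cards describe:

```python
# Simulation sketch of a p-curve: under H0, p-values are roughly
# uniform (flat curve); with a real effect, small p-values occur far
# more often (exponential-looking, right-skewed curve).
import math
import random

random.seed(2)

def one_sample_p(true_mean, n=20):
    sample = [random.gauss(true_mean, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))  # z-test, sigma known = 1
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def p_curve(true_mean, reps=2000):
    """Counts of significant p-values in bins .00-.01, .01-.02, ... .04-.05."""
    ps = [one_sample_p(true_mean) for _ in range(reps)]
    return [sum(1 for p in ps if lo <= p < lo + 0.01)
            for lo in (0.0, 0.01, 0.02, 0.03, 0.04)]

print(p_curve(0.0))  # no effect: bins roughly equal (flat curve)
print(p_curve(0.8))  # real effect: counts pile up in the smallest bin
```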

the pearson's correlation coefficient can also be used to quantify ...

the difference in means between 2 groups

scientists have looked at whether there's general evidence for p-hacking by looking at ...

the distribution of p-values you would expect to get if p-hacking doesn't happen with the distributions you might expect if it does

the p-value is affected by ... of the researcher

the intention

pre-registration of research, as part of the open science movement, refers to ...

the practice of making all aspects of your research process publicly available before data collection begins

altho NHST is the result of trying to find a system that can test which of 2 competing hypotheses (H0 or H1) is likely to be correct, it fails bc ...

the sig of the test provides no evidence about either hypothesis

pearson's correlation coefficient r, is a measure of ...

the strength of the relationship between 2 variables

Cohen's d is simple enough to compute & understand, but it has 2 small inconveniences. The first is that the difference in means will be expressed in ...

the units of measurement for the outcome variable (so if it's units of 'positivity' for example, it's less tangible)

incentive structures in science that reward publication of sig results also reward ...

the use of researcher degrees of freedom

what is the problem with using Cohen's benchmarks?

they encourage lazy thinking

the strength of the pearson correlation reflects how ...

tightly packed the observations are around the model (line) that summarises the relationship between variables

one problem with NHST is that significance does not tell us about the importance of an effect. What is a solution to this?

to measure the size of the effect being tested in a standardised way

researchers are engaging in practices ... and that's why these values are over-represented in the literature

to nudge their p-values below the threshold of significance

one way to see if there's evidence for p-hacking in science is to focus on p-values in individual studies that report multiple experiments. A low probability means that it's highly unlikely that the researcher would get results which are ...

too good to be true - with the implication that p-hacking has occurred

by having a public record of the planned analysis strategy, deviations from it will be ...

transparent, i.e. p-hacking & HARKing will be discouraged

p-curves can't be completely trusted to pinpoint p-hacking as when you use p-curves that better resemble p-hacking behaviour, the evidence ...

vanishes

Another problem with NHST is that the conclusions from it depend on ...

what the researcher intended to do before collecting data

when is cohen's d favoured over pearson's r?

when group sizes are discrepant, r can be quite biased compared to d

when is pooling the SDs of the 2 groups useful for calculating Cohen's d?

when the group SDs are different

what is an advantage of having your protocol for a study reviewed by experts?

you can get useful feedback before data collection

by using effect sizes we overcome one of the major problems of NHST (bc they're not affected by sample size). But the situation is more complex bc, like any parameter, ...

you will get better estimates of the population value in large samples (vs small)

you should evaluate an effect size within the context of ...

your specific research question

