STAT 5700 EXAM ONE
Induction by Enumeration
(enumerative induction) allows us to generalize from observed regularities/instances to unobserved regularities/instances. Often the only justification we have for a particular scientific fact. EX: P: All observed pigeons have been adults. C: All pigeons will be adults.
Hypothetical relative frequency
"the probability of an event A in a reference class B is the (limiting) relative frequency of occurrences of A within B if B were infinite." requires an infinite sequence of trials in order to define such probabilities believes over n trials your estimation will converge to the true parameter (law of large numbers)
Stake
(S) the total amount of money involved in the wager (bet): the amount paid out if E (the event) is true together with the amount forfeited if E is false
Subjective Interpretation DEF
(Bayesian interpretation) probabilities represent degrees of belief (or credences) for rational agents. Probability is then "the logic of partial belief": an agent's degree of belief/confidence/credence in a given proposition, measured on a scale from 0 to 1. Defended via the Dutch Book Argument/Theorem. EX: "I am not sure that it will rain in Philly this week, but it probably will."
3 Broad categories of interpretations of probability
1. measure of objective evidential support 2. degrees of belief 3. physical concept
What justifies hypothesis testing (or frequentist inference/methods/procedures) according to Mayo and Spanos
1. optimal inference procedures 2. adequacy of M_theta(x)
How Hypothesis testing works
Uses a test statistic (a summary of the experiment or of values in the data set) and a p-value. 1. Assume the statistical model given by the null hypothesis. 2. Calculate the probability of a result at least as extreme as the one observed (x0) under the statistical model given by the null hypothesis. -if the probability is small, reject / treat the data as evidence against the null (is the null false? or merely implausible?) -if the probability is large, fail to reject the null (not "accept the null" --> that is a problem). Even given the outcome of a hypothesis test we should not be entirely convinced of that outcome; we might act as if a hypothesis is false if, under that hypothesis, the data at hand are improbable.
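A sketch of steps 1-2 for a one-sided z-test (normal model with known sigma; the values mu0, sigma, n, and xbar_obs are made up for illustration):

import numpy as np
from scipy.stats import norm

# Step 1: assume the null model, here Xbar ~ N(mu0, sigma^2/n) with H0: mu = mu0.
mu0, sigma, n = 0.0, 2.0, 100      # hypothetical values
xbar_obs = 0.4                     # the observed result x0
z = (xbar_obs - mu0) / (sigma / np.sqrt(n))
# Step 2: probability of a result at least as extreme as x0 under the null model.
p_value = 1 - norm.cdf(z)          # one-sided test, H1: mu > mu0
print(z, p_value)                  # z = 2.0, p ~ 0.023: small p -> reject H0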
Betting quotient
(q) amount lost if E is false divided by the stake
Two Fallacies of Hypothesis testing
1. fallacy of acceptance 2. fallacy of rejection Mayo and Spanos have developed the concept of severity to deal with these fallacies
Epistemology
-study of knowledge (what are reliable and unreliable sources of knowledge?) -a true belief is not sufficient for claiming knowledge; knowledge must be justified -concerns the role of theory in observations (theory-laden measurement)
"theory laden" measurement
-"theory-laden" measurement --> measurements are loaded with theory -the idea that measurements used in stat models are not just simple reflections of the empirical (factual/true/verifiable) world -measurements often require the use of theory, and theories in turn require measurements
Why we need an interpretation of probability
-to understand/interpret probabilistic claims -to be able to produce conclusions to important questions (do rising CO2 concentrations cause temperature to rise?) -to accurately interpret outcomes produced by statistical procedures (procedures that rely entirely on probability theory) -probability is seen in all the sciences -to solve philosophical issues --> (stats is useful in answering phil questions, and stats relies entirely on probability)
Objections to Subjective Interpretation
1. A DBA assumes that if a DB exists for an agent's credences, then the agent can actually be induced into taking all of the bets in the book, which would yield a sure loss. Suppose after a few bets, Lindsey realizes that she will be DB'd. Could she stop at that point? (Ascertainability) 2. Assumes all uncertainty is epistemic. Cannot differentiate between uncertainty due to lack of information and inherent uncertainty. (Applicability) 3. Can we really assign a real number to our beliefs? Is my degree of belief in "I will go for a run tonight" exactly 0.4? 4. Does subjective probability provide an operational definition of beliefs?
cases where defining X in terms of the processes used to measure X can be problematic
1. A definition of intelligence might be whatever an IQ test measures. Can't IQ tests be misleading, biased, confounded, etc.? 2. If temperature just is the reading given on a given thermometer, then how can we account for the fact that the thermometer can be ill-calibrated?
Objections to propensity interpretations
1. Admissibility: can a propensity obey the axioms of probability? 2. Propensities are unobservable entities. In what contexts are these unobservables constant, and in what contexts might they change? 3. Theorists struggle to even define what "propensity" exactly is. (OP --> AUD)
What it means for an interpretation of probability to be good
1. Admissibility 2. Ascertainability 3. Applicability
Objections to Classical interpretation
1. "Equally possible": vacuous or circular? 2. Bertrand's paradoxes: invariance issues 3. Irrational probabilities (ratios of equally possible cases can only yield rational numbers)
Frequentist methods/inference in statistics
1. Hypothesis testing (z test, t test, chi squared test, F test) 2. point estimation (MLE, LSE, Method of moments) 3. interval estimation (CI)
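A toy example of each family on one simulated sample (illustrative parameters; uses scipy.stats, a common choice, not necessarily the course's):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=50)   # simulated sample

# 1. Hypothesis testing: one-sample t test of H0: mu = 4
t_stat, p = stats.ttest_1samp(x, popmean=4.0)

# 2. Point estimation: the MLE of mu under a normal model is the sample mean
mu_hat = x.mean()

# 3. Interval estimation: 95% t-based confidence interval for mu
ci = stats.t.interval(0.95, len(x) - 1, loc=x.mean(), scale=stats.sem(x))

print(t_stat, p, mu_hat, ci)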
Objections to logical/evidential interpretation
1. If logical probability is meant to be a generalization of deductive logic, it should be "objective" in some sense. But there are different ways to assign probabilities (issues with ascertainability) 2. It's not clear that logical probability can reasonably be applied in science (problems with applicability)
Properties of Subjective interpretation
1. If your beliefs do not follow probability theory, you are susceptible to a Dutch Book Argument (DBA); you will continue to lose money/utility. 2. If your beliefs do follow probability theory, you are not susceptible to a DBA. 3. For many subjectivists, bets are a way of operationalizing beliefs (DOB operationalized through betting (DBA))
Academic philosophy vs personal philosophy
1. Personal philosophies are not necessarily critical examinations 2. Personal philosophies are often absent of method 3. Academic philosophy critically examines "anything and everything"— including statistics! Personal philosophies are typically much more limited in scope.
Objections to frequentist inference
1. Bernoulli's fallacy 2. Lindley paradox 3. Tests do not provide inductive support 4. Easily misused
Pre/post data? --> severity, p val, type 1 error (alpha), type 2 error (beta), power
1. Severity: post-data. Severity depends on the data you observe (it uses the value of x bar). 2. p-value: post-data. It is a function of the observed data. 3. alpha and 4. beta: pre-data. Never functions of observed data (they are a function of the test setup) and don't depend on the data. 5. Power: pre-data.
Objections to Finite frequentism
1. problem of the single case 2. reference class problem 3. operationalism 4. defining X in terms of the processes used to measure X can be problematic OFF-ROXS
objections to hypothetical frequentist
1. single-case events 2. uncountably many events
Statistics
A mathematical and conceptual discipline that focuses on the relation between data and hypotheses. A response to the problem of induction; similar to inductive inference (we input the data, the statistical procedure outputs a verdict that transcends the data, and the verdict pertains to future or general states of affairs). Statistics is ampliative because we get more out than we put in.
differences + similarities between ampliative and inductive inference
Ampliative inference can deal with the past and need not involve the future; inductive inference involves the future (or general states of affairs); both can involve the future.
Error Statistical Philosophy
An extension and reinterpretation of frequentist methods. Components: 1. An evidential (NOT behavioral) construal of frequentist methods like hypothesis testing 2. A commitment to objectivity: getting it right, not just ensuring consistency 3. Active inquiry/examination/questioning 4. Severity (data provide good evidence for a hypothesis when the hypothesis is severely tested)
deductive argument
Arguments that either are or attempt to be deductively valid. Takes in what I already know and states a conclusion; doesn't draw on any new information. Based on a series of premises that lead to a conclusion.
Example of Severity
Before leaving the USA for the UK, I record my weight on two scales at home, one digital, one not, and the big medical scale at my doctor's office. Suppose they are well calibrated and nearly identical in their readings, and they also all pick up on the extra 3 pounds when I'm weighed carrying three copies of my 1-pound book, Error and the Growth of Experimental Knowledge (EGEK). Returning from the UK, to my astonishment, not one but all three scales show anywhere from a 4-5 pound gain. There's no difference when I place the three books on the scales, so I must conclude, unfortunately, that I've gained around 4 pounds. Even for me, that's a lot. I've surely falsified the supposition that I lost weight! (p. 14-15)
Operationalism What is operationalism with respect to interpretations of probability? Which interpretations of probability are operational? Do you find operationalism in probability problematic?
DEF: the idea that we do not know the meaning of a concept unless we have a method of measurement for it (ascertainability). RESPECT: means that we don't know the definition/interpretation of probability unless we have a method for calculating/measuring the probability. Frequentist, classical, and subjective (to a certain extent; this is debated) are operational; the evidential/logical interpretation arguably is, and I'm not sure about propensity theory. YES: if we are trying to make a claim about probability we must be able to calculate and measure it (ascertainability).
uniformity of nature principle
DEF: all observed instances of A have property P; therefore all unobserved instances of A also have property P (i.e., assume the future is going to be similar to the past, at least in the short term). Use in induction: -prove this principle through induction -use UP to show a general form of inductive argument is justified
Subdisciplines of statistics
DIEST 1. descriptive stats 2. exploratory data analysis 3. inferential statistics 4. spatial statistics (modeling spatially correlated residuals, have data that have a spatial location attached to them) 5. time series (one of the variables we care about is time--> applying statistical methods to data that is recorded over consistent intervals of time)
Ethics
Deals with what we should and shouldn't do; the principles of right and wrong that guide an individual in making decisions. Ethical questions (which ask what we ought to do) in statistics might include: -is the practice of not publishing null results unethical? -Should statistics/machine learning algorithms just tell us what we want to hear (e.g., Netflix recommendations)? Do we have obligations to get outside of our "filter bubble"? -What responsibility does a statistician have for the results and consequences of research for which they have provided statistical tools? -misinterpretation of statistics -statistics can carry personal bias; is this ok?
hypothesis tests do not provide inductive support
Defenders of tests often claim that a low p-value indicates that the alternative is more plausible than the null. (a) But what can the word "plausible" mean here? (b) Does a low p-value imply that the result (x0) is inconsistent with the null?
Example of theory-laden measurement and what to do
EX: we want to decide whether meteorological conditions, such as temperature, windspeed, and humidity, lead to systematic changes in atmospheric ozone concentration. We find several anomalous temperature readings from a thermometer. What to do: 1. Keep them (anomalous doesn't necessarily mean wrong; just different from what is expected or standard) 2. Decide whether they were measured incorrectly. We might decide that our thermometer was not calibrated correctly. But the calibration process requires the use of theories about the way that liquid takes up space at different temperatures. So, our measurements are laden ("loaded") with theory.
all valid deductive arguments are ampliative
FALSE
physical concept
1. finite frequentist interpretation 2. propensity interpretation 3. best-system interpretation (FFPB)
Statistical induction
Form of a statistical inductive argument: Data --> stat procedure --> data-transcending conclusion that pertains to the future, or to general states of affairs (shows how statistical arguments might be classified as inductive). Often quantifies the degree to which the premises support the conclusion.
Discrimination/misinterpretation in statistics (ethics)
Full linear model predictors: population size, percentage of population with refugee status, percent divorced, percent of kids without two parents, median family income, the poverty rate, income inequality, the unemployment rate. Reduced linear model: same predictors, except percentage of population with refugee status. Response = crime rate. Conclusion? --> R^2 is higher for the full model; therefore, cities with a higher refugee population have more crime. Yes? (No, because adding predictors always increases R^2; can we make these bold claims?)
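A quick sketch of why the comparison misleads: adding any predictor, even pure noise, cannot decrease training R^2 (hypothetical data; sklearn's LinearRegression):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 7))            # stand-ins for the seven shared predictors
y = X[:, 0] + rng.normal(size=n)       # response depends on only one of them
noise = rng.normal(size=(n, 1))        # an irrelevant extra predictor

r2_reduced = LinearRegression().fit(X, y).score(X, y)
X_full = np.hstack([X, noise])
r2_full = LinearRegression().fit(X_full, y).score(X_full, y)
print(r2_reduced, r2_full)             # r2_full >= r2_reduced, always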
Dutch Book Theorem
Given a set of betting quotients (q) that fails to satisfy the probability axioms, there is a set of bets with those quotients that guarantees a net loss to one side. (Stake, betting quotient, qS; see table in pictures.) The DBA associates betting quotients with degrees of belief: you can be made to always lose money if you are irrational. The DBA shows that degrees of belief should adhere to the axioms of probability.
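A hedged sketch of the theorem's content with hypothetical numbers: if an agent's quotients for E and not-E sum to more than 1 (violating the axioms), a bookie who sells the agent both bets guarantees the agent a sure loss:

# Incoherent betting quotients: q(E) + q(not E) = 1.2 > 1 violates the axioms.
S = 10.0                           # stake on each bet
q_E, q_notE = 0.7, 0.5             # the agent's quotients; price paid per bet is q*S
total_price = (q_E + q_notE) * S   # the agent pays 12.0 for the two bets

for E_true in (True, False):
    payout = S                     # exactly one of the two bets pays out the stake S
    print(f"E={E_true}: agent's net = {payout - total_price}")  # -2.0 either way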
Logic
The branch of philosophy that focuses on the analysis of arguments; the study of correct reasoning. Helps us reason properly about incomplete, uncertain data. Inductive vs deductive arguments.
Seven Pillars of Statistical Inquiry
LAIIDRR 1. aggregation 2. information 3. likelihood 4. intercomparison 5. regression 6. design 7. residual
Statistical model
M_theta(x) = (X, P_theta(x)), chosen to represent a description of a chance mechanism that accounts for the systematic information in the data. X = (X1, ..., Xn) = the data. P_theta(x) = {f(x; theta) : theta in Theta}, x in R_X^n = the family of probability distributions indexed by theta.
adequacy of M_theta(x)
M_theta(x) defines the premises for statistical (inductive) inferences to be drawn on the basis of data x0, and its probabilistic assumptions are selected with a view to rendering x0 a 'typical realization' of the model. This 'typicality' is tested using misspecification (M-S) testing to establish the statistical adequacy of M_theta(x), i.e., the validity of its probabilistic assumptions given the data x0. Adequacy is assessed via the residuals (e.g., the RSS) in regression models.
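One possible M-S probe, sketched with made-up data: check whether regression residuals look like a 'typical realization' of the assumed normal errors (Shapiro-Wilk is just one of many such tests, not necessarily the course's choice):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + rng.normal(0, 1, size=100)   # data consistent with the assumed model

slope, intercept = np.polyfit(x, y, 1)     # fit the linear regression model
resid = y - (slope * x + intercept)

# Probe the normality assumption on the errors; a small p-value flags misspecification.
w_stat, p_norm = stats.shapiro(resid)
print(w_stat, p_norm)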
Weak Severity Principle
One does not have evidence for a hypothesis H if nothing has been done to rule out ways that H may be false. If data agree with H but the method used is guaranteed to find such agreement, and had little capability of finding flaws with H even if they exist, then we have 'bad evidence, no test.' Tells me what I don't have evidence for (a negative claim). Part of the solution to the fallacies of acceptance and rejection.
optimal inference procedures
SEE flash card
debate about if statistics is inductive or not/problems that arise in statistical induction
Some say there is no proper justification for procedures that take data as input and return a verdict, an evaluation, or some other piece of advice that pertains to the future, or to general states of affairs --> much of the philosophy of statistics is about coping with this challenge (is statistics inductive?). Even if the premises are true, the conclusion does not necessarily follow.
statistics as a solution to the problem of induction Does it do any better than non-statistical inductive arguments?
Statistics is seen as a solution because it seeks to assert that one outcome or conclusion is significantly likely, or much more likely than any other. In other words, statistics seeks to quantify our uncertainty or certainty in a conclusion when compared to others. Statistics therefore allows us to quantify our confidence in a conclusion drawn from induction, and helps us justify the jump from data to a data-transcending conclusion that pertains to future or general states of affairs. Statistics can tell us how certain we are that our conclusion follows from our premises. Statistics does seem to do better than non-statistical inductive arguments, since it quantifies the empirical evidence from testing or observation in support of the arguments it makes.
Philosophy
Subdisciplines: 1. logic 2. ethics 3. metaphysics 4. epistemology. There is no clean and uncontroversial way to partition the field of philosophy; there is much overlap between the subdisciplines presented here. Philosophy involves the analysis of arguments and concepts, uses the power of reason and the weight of evidence, exposes unsupported assertions, and examines anything and everything. Reason, evidence, and the analysis of arguments, concepts, and assumptions are all core features of philosophy. Now, philosophy no longer includes the sciences.
Strong Severity Principle
"The data are good evidence for H if H passes a test that was highly capable of finding flaws or discrepancies from H, and yet none or few are found." Data provide good evidence for hypothesis H (just) to the extent that test T has severely passed H with x0. Part of the solution to the fallacies of acceptance and rejection; tells me what I do have evidence for.
how logical/evidential interpretation interprets probability claims
The evidence to date, e, supports hypothesis, H, to degree p.
Argument from analogy
The general form of an argument from analogy might look something like this: (P1') A and B share properties p1, ..., pn. (P2') A has property p. (C') Therefore, B has property p. (Almost) always categorized as inductive: make a generalization about one entity, then use it to make a generalization about another entity. EX: P1: Rams and football players have similar head-to-head collisions. P2: Rams show no signs of brain damage. C: Football players will show no signs of brain damage.
How is statistics philosophical + important philosophical issues that arise in statistics
We use statistical tools to address philosophical questions/solve philosophical problems (like scientific theory confirmation). Philosophical issues in statistics: -the inability to replicate many scientific results -to launch an effective critique of frequentist methods, one must often address the underlying philosophical and logical principles in play -interpretations of probability (defining probability) -how does each study avoid, or fail to avoid, data dredging?
deductively valid argument
When it is impossible for the premises to be true and the conclusion to be false the premises logically entail the conclusion
Basic form of enumerative induction
[P1] All observed instances of A have been B. [C] The next instance of A will be B. Call this inference I.
inductive argument
an argument whose premises do not logically entail the conclusion; even if the premises are true, the conclusion may be false
Bertrand's paradoxes
Arise in uncountable spaces: probabilities may not be well defined if the method that produces the random variable is not clearly specified. They turn on alternative parametrizations of a given problem that are non-linearly related to each other. SET UP: Consider an equilateral triangle inscribed in a circle and suppose a chord of the circle is chosen at random. What is the probability that the chord is longer than a side of the triangle? Bertrand gave three arguments, all apparently valid, yet yielding different results.
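A Monte Carlo sketch of the three standard chord-selection methods (a common reconstruction, not taken from the course notes); each is a defensible reading of "at random", yet they disagree:

import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Method 1: random endpoints. The chord beats the triangle's side iff the
# angular separation of its endpoints exceeds 2*pi/3.  ->  ~1/3
theta = rng.uniform(0, 2 * np.pi, size=(n, 2))
sep = np.abs(theta[:, 0] - theta[:, 1])
sep = np.minimum(sep, 2 * np.pi - sep)
print(np.mean(sep > 2 * np.pi / 3))

# Method 2: random midpoint along a random radius. The chord beats the
# side iff its midpoint sits within half the radius.  ->  ~1/2
d = rng.uniform(0, 1, size=n)
print(np.mean(d < 0.5))

# Method 3: midpoint uniform over the disk. The chord beats the side iff
# the midpoint falls in the concentric disk of radius 1/2.  ->  ~1/4
x, y = rng.uniform(-1, 1, size=(2, n))
inside = x**2 + y**2 <= 1                  # rejection-sample the unit disk
print(np.mean(x[inside]**2 + y[inside]**2 < 0.25))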
Finite relative frequency
Attaches probabilities to events in a finite reference class: the probability of an event A in a finite reference class B is the relative frequency of actual occurrences of A within B. Counts actual outcomes/occurrences.
Inference to the best explanation
characterized as the process of "accepting a hypothesis because [it] provides [a] better explanation of the given evidence when comparing to the other competing hypotheses." The information at hand changes which explanations are plausible. Not deductive. EX: Person A wants to figure out why they are missing their homework. The last time they saw it, they were in the Engineering Center working on it with Person B. They come up with 4 explanations for where their homework may be: 1. Person B accidentally took it 2. Person A left it in the Engineering Center 3. It flew out of a hole in Person A's backpack 4. Some random person stole it off the table to spite Person A. Person A texts Person B asking if they have Person A's homework. Person B says they do not, and the last time they saw it was in the Engineering Center when they were working on their homework together. Person A does not have any holes in their backpack, and it was fully zipped when they came home. Person A also does not know who would spite them, and no college student really cares enough to steal someone else's homework. As a result, the explanation that makes the most sense is that Person A forgot to grab their homework when packing up at the Engineering Center.
measure of objective evidential support
1. classical interpretation 2. logical/evidential interpretation (MOECL)
Suppose that a friend, Marilynne, claims that she can determine the order in which milk and tea were poured into the cup. Imagine that you prepare 10 cups of tea for her, randomly assigning one of two possible methods for making the tea: milk first, then tea tea first, then milk Once prepared, you ask Marilynne to taste each of the 10 cups and state which of the above methods were used. Based on the data (she guessed correctly on 4 out of 10 cups), you conclude that Marilynne does not have an ability to distinguish between the two methods. This conclusion is:
-data-transcending -without further information, an example of the fallacy of acceptance (because the null here was that she doesn't have the ability) -inductive
frequentist statistics
defined as statistical procedures that only rely on probability assignments over sample spaces Probability not attached to anything outside of sample space (no attachment to hypotheses)
Relative Frequentist Interpretation DEF
Defines probability by looking at limits of occurrences of event E. Suppose we can perform an experiment n times under the same conditions with sample space S. Then the probability of E is P(E) = lim as n --> infinity of n(E)/n, where n(E) = the number of times that the event E occurs in n experiments. The interpretation states that probability is just a relative frequency of some kind: 1. finite relative frequency 2. hypothetical relative frequency. Probabilities consider the limit as the number of trials goes to infinity (repeat the process over and over again); that limiting frequency is the probability of the event. Believes the future will be like the past. Wants to measure variability (in the data), to ensure we won't misinterpret data too often in the long run (of experience), and to quantify the strength of evidence for a particular claim or hypothesis.
fallacy of rejection
evidence against H0 is misinterpreted as evidence for a particular H1 i.e., a low p-value causing a researcher to accept a particular 𝜃1∈𝐻1
Predictions
form of inductive inference use information from the past, to say something about the future
frequentist interpretation of a probability claim
the chance of a coin flip coming up heads is the relative frequency of heads in a long series of repeated coin tosses
easily misused
Hypothesis testing can easily be misused: 1. p-hacking 2. multiple comparisons without adjustment 3. researcher degrees of freedom (bias in the exclusion of certain observations, how to group participants, etc.)
reference class problem
The idea is that relative frequencies must be relativized to a reference class; this concerns the denominator in the definition of probability under the frequentist interpretation. EX: consider the probability of having children. The answer will vary greatly depending on who is being surveyed and what characteristics are prioritized. For example, an unmarried, unemployed, upper-class woman belongs to several reference classes at once ("upper class," "unmarried," etc.), and the relative frequency of having children differs greatly across these classes.
Problem of Induction
The idea that there is no rigorous justification for inductive inference. It "concerns the justification of inferences or procedures that extrapolate from data to predictions and general facts." There is a logical gap between the information derived from empirical observation and the content of typical scientific theories. So how can such information give us reasonable confidence in those theories? --> the problem of induction. Even if the premises are true, the conclusion does not necessarily follow. So can we come up with an argument for the conclusion C: inductive inferences are justified?
ampliative inference
Inference that provides us with more knowledge than we had before; we can get more out than we put in; we get conclusions that go beyond the premises.
uncountably many events
Involves cases where there is an uncountable number of events (recall HF assumes a countable sequence of results). If we have uncountably many events, how are we to form a countable sequence of them?
problem of the single case
Involves the issue of assigning probabilities to events that only happen once; arises in situations where there is no theoretical way for an event to occur multiple times. EX: a coin that is tossed exactly once yields a relative frequency of heads of either 0 or 1. But the probability of heads can surely be intermediate, and a second toss need not yield the same value.
Probability Function/ Axioms of Probability
Let S be a sample space and let E1, E2, ... be events. A probability function is a function P that satisfies: 1. For any event E, P(E) >= 0. 2. P(S) = 1. 3. For mutually disjoint events E1, E2, ..., P(E1 U E2 U ...) = P(E1) + P(E2) + ...
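A tiny sketch for a finite sample space, where checking the axioms reduces to nonnegativity and total mass 1 (illustrative helper, not course code):

def satisfies_axioms(p, tol=1e-9):
    """p maps each outcome of a finite sample space S to a number.
    Axiom 1: nonnegativity; Axiom 2: P(S) = 1. With P(E) defined as the
    sum of p over the outcomes in E, additivity (Axiom 3) then follows."""
    return all(v >= 0 for v in p.values()) and abs(sum(p.values()) - 1.0) < tol

print(satisfies_axioms({i: 1/6 for i in range(1, 7)}))   # fair die: True
print(satisfies_axioms({"H": 0.7, "T": 0.5}))            # sums to 1.2: False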
admissible
meanings assigned to the primitive terms in the interpretation transform the formal axioms, and consequently all the theorems, into true statements
A "worse fit"
means that the observed sample mean xbar would be further from the claim C
"If C were false"
means that we evaluate the relevant probability under the complement of C. We choose the value of 𝜇 that is the smallest perturbation to make C false
Operationalism with respect to subjective interpretation
means we can assign probabilities/numbers to DOB
severity
Meant to answer the question: what is the probability of a worse fit with C if C were false? We can think of severity as a function of three items: 1. a test T (e.g., a specified statistical model, null, and alternative hypotheses) 2. an outcome O (e.g., an observed xbar) 3. a claim C (e.g., mu > mu0). High severity means the hypothesis is severely tested. When gamma = 0, severity equals 1 - p-value.
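A sketch of that computation for a one-sided normal test with known sigma and illustrative numbers (per the card above, mu1 = mu0 + gamma is the smallest perturbation that makes C false, and a "worse fit" means a smaller xbar):

import numpy as np
from scipy.stats import norm

def severity(xbar, mu0, gamma, sigma, n):
    # Severity for the claim C: mu > mu0 + gamma after a rejection.
    # "If C were false" is evaluated at mu1 = mu0 + gamma, the smallest
    # violation of C; "worse fit" means Xbar <= the observed xbar.
    mu1 = mu0 + gamma
    return norm.cdf((xbar - mu1) / (sigma / np.sqrt(n)))

print(severity(xbar=0.4, mu0=0.0, gamma=0.0, sigma=2.0, n=100))  # ~0.977 = 1 - p
print(severity(xbar=0.4, mu0=0.0, gamma=0.3, sigma=2.0, n=100))  # ~0.691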
Bernoulli's fallacy/base rate fallacy/prosecutors fallacy
The misguided belief that inference (and settling questions of inference) can be performed using sampling probabilities alone.
fallacy of acceptance
No evidence against H0 is misinterpreted as evidence for H0, i.e., a high p-value causing a researcher to accept the null hypothesis.
Problem of circularity
Outcomes for which we have symmetrically balanced evidence lead to the problem of circularity. We have this problem because a weighing of the evidence in favor of each outcome is required, and this seems to require a reference to probability.
Inductive Inference
pertains to future or general states of affairs forms of inductive inference: 1. inference to the best explanation 2. induction by enumeration 3. argument by analogy 4. prediction
qS
the price paid for the bet, where the stake S is won if E occurs; equal to the betting quotient q times the stake S
p value
The probability of observing data at least as extreme as the data we actually observe, under the null. Big p-value (p >= alpha) = fail to reject the null = the data are not rare under the null. Does NOT say anything about the probability of the null or alternative hypothesis being true/false. p = alpha means that if the null hypothesis is true, data at least as extreme as those observed occur with probability alpha. A correct interpretation needs to include both the observed data and "at least as extreme."
Propensity interpretation DEF
probabilities defined as "objective properties of the entities in the real world." Probability is a physical propensity, disposition or tendency of a given type of physical situation to yield an outcome of a certain kind Propensity interpretations come in two types: 1. Long-run propensity theories. 2. Single-case propensity theories.
Lindley Paradox
A problem of sample size: a counterintuitive situation where the Bayesian and frequentist approaches to a hypothesis-testing problem give different results for certain choices of the prior distribution. Occurs when p-values suggest rejecting the null hypothesis while posterior model probabilities indicate high belief in the null hypothesis. Let M_mu(x) = (X, f(x; mu)), where the random variables in X are independent and identically distributed and f(x; mu) is the joint normal pdf. Consider testing H0: mu = 0 vs H1: mu < 0 at the alpha = 0.05 significance level. As the sample size increases, the sample mean that would reject H0 approaches xbar = 0 (so should we really reject the null?): a small p-value causes us to reject the null even though xbar approaches the null value.
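A sketch under simple assumptions (known sigma = 1, a hypothetical N(0,1) prior on mu under H1, equal prior odds): holding the z-score at 2.5 while n grows keeps the p-value near 0.012, rejecting H0, while the posterior probability of H0 climbs toward 1:

import numpy as np
from scipy.stats import norm

def p_and_posterior(xbar, n, sigma=1.0, tau=1.0, prior_h0=0.5):
    se = sigma / np.sqrt(n)
    p = 2 * (1 - norm.cdf(abs(xbar) / se))             # two-sided p-value
    m0 = norm.pdf(xbar, 0.0, se)                       # marginal likelihood under H0
    m1 = norm.pdf(xbar, 0.0, np.sqrt(tau**2 + se**2))  # under H1, mu integrated out
    post_h0 = prior_h0 * m0 / (prior_h0 * m0 + (1 - prior_h0) * m1)
    return p, post_h0

for n in (100, 10_000, 1_000_000):
    print(n, p_and_posterior(2.5 / np.sqrt(n), n))     # p stays ~0.012; P(H0|x) rises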
long run propensity theories
Propensities are associated with repeatable conditions and are regarded as propensities to produce, in a long series of repetitions of these conditions, frequencies that are approximately equal to the probabilities. Propensities are not measured by the probability values themselves.
single case propensity theories
Propensities are regarded as propensities to produce a particular result on a specific occasion; these propensities are measured by the probability values.
applicability
requires that we can apply probabilities broadly "Probability is the very guide of life."
ascertainable
requires that we can, at least in principle, calculate probabilities
Operational definitions
seek to define concepts/ unobservable entities (e.g., probability!) concretely in terms of the physical and mental operations used to measure them
Logical/Evidential Interpretation DEF
Seeks to formalize the "degree of support" or "degree of confirmation" that some evidence e confers upon a given hypothesis H; an attempt to generalize deductive logic. Probability assignments do not need to be right in the sense of being verifiable externally by some experiment; they only need to be a valid expression of the information we are given and the assumptions we hold to be true. Probabilities can be determined a priori, and the interpretation generalizes the classical one in two important ways: 1. the probabilities may be assigned unequal weights 2. probabilities can be computed whatever the evidence may be, symmetrically balanced or not.
Hypothesis testing Classical frequentist vs error stat philosophy part two
Set up: we have evidence x0 for H0: theta = theta_0. The classical frequentist says: behave as if H0 is true. Error-statistical philosophy says: this is evidence for a discrepancy of less than gamma from theta_0.
Hypothesis testing Classical frequentist vs error stat philosophy part one
Set up: we have evidence x0 for H1: theta > theta_0. The classical frequentist says: behave as if H1 is true. Error-statistical philosophy says: this is evidence for a discrepancy of at least gamma from theta_0.
falsification
The process of proposing theories and attempting to refute them; a way of separating science from non-science.
metaphysics
the study of the fundamental nature of reality Ex- Why is there something rather than nothing? What is time, and what does it mean for entities to persist through time?
Classical Interpretation DEF
The theory of chances: reducing all events of the same kind to a certain number of equally possible cases. P(Event A) = (# ways event A can occur) / (# possible outcomes that can occur). Probability is seen as ratios of equally possible events; probability is about counting.