Stat 130 Final Exam
probability sampling
A sampling procedure that gives every element of the population a (known) nonzero chance of being selected in the sample
non-probability sampling
A sampling procedure in which some elements of the population have no chance of being selected in the sample, or in which the chance of selection cannot be determined
sample points
An event has occurred if the outcome of the experiment is one of the ___________ belonging to the event.
P10...P90
D1...D9
LEVEL OF SIGNIFICANCE (α)
It is the maximum probability of Type I error the researcher is willing to commit.
ALTERNATIVE HYPOTHESIS
It is the operational statement of the theory that the experimenter believes to be true and wishes to prove.
ACCEPTANCE REGION
It is the set of values of the test statistic for which the null hypothesis will not be rejected.
CRITICAL REGION
It is the set of values of the test statistic for which the null hypothesis will be rejected.
range
It is the simplest and easiest to use measure of dispersion.
Null Hypothesis
It is the statement being tested
ALTERNATIVE HYPOTHESIS
It is the statement that must be true if the other one is false.
CRITICAL VALUE
It is the value of the test statistic separating the acceptance and rejection regions.
Statistical Methods of Applied Statistics, Statistical Theory of Mathematical Statistics
Fields of Statistics
ESTIMATION
Finding a single value or a range of values computed from the sample data that may be used to make a statement about the unknown value of the parameter
P(A) = P(A1) + P(A2) +... + P(An)
Finite Additivity: given A = A1 ∪ A2 ∪ A3 ∪ ... ∪ An, where A1, ..., An are mutually exclusive events (the probabilities of mutually exclusive events can be added together)
decreases the length of the confidence interval
For a fixed level of confidence, a larger sample size ______
true
For a fixed sample size, a smaller α yields a smaller critical region. (That is why it is more difficult to reject the Ho)
increases the length of the confidence interval
For a fixed sample size, a higher level of confidence ______
One-tailed Test of Hypothesis
Ho: μ = 14 vs. Ha: μ > 14 Ho: μ = 14 vs. Ha: μ < 14
Two-tailed Test of Hypothesis
Ho: μ = 14 vs. Ha: μ ≠ 14
UCL - LCL (Upper confidence limit-Lower confidence limit)
Length of the interval
choose the shorter, because it is more reliable/precise
Length of the interval (choose the longer or choose the shorter)
Statistic
Let (X1, X2 ,...,Xn) be a random sample, ______ is a random variable that is a function of X1 , X2 ,...,Xn
Conditional Probability
Let A and B be two events where P(B) > 0. The conditional probability of event A given the occurrence of event B, denoted by P(A|B), is defined as P(A|B) = P(A ∩ B) / P(B)
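The ratio P(A ∩ B) / P(B) can be checked by brute-force enumeration. The sketch below is only an illustration: the two-dice experiment and the events A ("the sum is 8") and B ("the first die is even") are assumed examples, not taken from the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice (assumed example).
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def is_a(w):
    """Event A: the sum of the two dice is 8."""
    return w[0] + w[1] == 8

def is_b(w):
    """Event B: the first die shows an even number."""
    return w[0] % 2 == 0

p_b = prob(is_b)
p_a_and_b = prob(lambda w: is_a(w) and is_b(w))

# Conditional probability as the ratio P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

# Same value obtained by counting inside the reduced sample space B.
reduced = [w for w in omega if is_b(w)]
p_reduced = Fraction(sum(1 for w in reduced if is_a(w)), len(reduced))

print(p_a_given_b, p_reduced)
```

Both routes give the same number, which is the "reduced sample space" view of conditional probability.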
Ratio Interval Ordinal Nominal
Levels of Measurement
A ∩ B = ∅
Mutually Exclusive Events
Its properties are: ◦ Symmetric about μ ◦ Bell-shaped ◦ Its range of possible values is infinite in both directions. ◦ Asymptotic to the x-axis ◦ Area below the whole curve is 1
NORMAL properties
ESTIMATION
Obtaining a possible value or a set of values of the parameter using sample data.
true
The PDF of the chi-square distribution is positive for positive real numbers only; otherwise, it is zero.
Inferential Statistics
Process of drawing conclusions about certain characteristics of the population using information obtained from a sample.
true
The Binomial distribution is the generalization of the Bernoulli distribution (Be(p) = Bi(n = 1, p))
1. the sample size, n 2. method of choosing the random sample 3. population under study
The Factors that Affect the Sampling Distribution of a Statistic
reduced sample space
The conditional probability of event A given B can be regarded as the probability assigned to the event containing sample points that are in A and in the ______ B
BEFORE
The confidence coefficient gives the probability that the confidence interval, _______ sampling, will enclose the true parameter value.
true
The cut-off/decision rule gives us the confidence that we can support the said claim or disprove it.
Frequency Histogram
-bar graph that displays the intervals on the horizontal axis and the frequencies of classes on the vertical axis
range
-difference between the largest and the smallest value
Measure of Central Tendency
-finding the typical value
Variance
-gives an idea on how close the observations are to the mean
Measures of Dispersion
-indicate the extent to which individual items in a series are scattered about an average
Census
-process of gathering information from every unit in the population
Unbiasedness
-the expected value of the estimator is equal to the parameter it is estimating
Standard Deviation
positive square root of the variance
t-distribution
The t-distribution differs from the standard normal in terms of its variance: it has a larger variance.
Elementary Unit, Sampling Unit, Sampling Frame
student students list of all colleges
Elementary Unit, Sampling Unit, Sampling Frame
student students list of all students
event
subset of the sample space whose probability is defined
parameter
summary measure describing a specific characteristic of the population.
statistic
summary measure describing a specific characteristic of the sample.
sample space (Ω)
sure event
x(1) (the (1) is a subscript)
symbol for array
σ squared
symbol for population variance
x1 (the 1 is a subscript)
symbol for raw data
s squared
symbol for sample variance
Md
symbol of Median
σ
symbol of POPULATION STANDARD DEVIATION
μ
symbol of Population Mean
s
symbol of SAMPLE STANDARD DEVIATION
x̅
symbol of Sample Mean
Mo
symbol of mode
p
symbol of parameter
Ω
symbol of sample space
p̂
symbol of statistic
Measures of Dispersion
tells how varied the observations are from each other
CONTINUOUS UNIFORM
used in any situation when a value is picked "at random" from a given interval.
Partition
The number of distinct ways of arranging n objects of which n1 are of one kind, n2 are of a second kind, ..., nk are of a kth kind is n! / (n1! n2! ... nk!)
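A small sketch of this count, using the standard formula n! / (n1! n2! ... nk!). The function name `distinct_arrangements` and the word "STATISTICS" are illustrative choices, not part of the course notes.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(items):
    """Number of distinct orderings of items when some items are alike."""
    counts = Counter(items)          # n1, n2, ..., nk
    total = factorial(len(items))    # n!
    for c in counts.values():
        total //= factorial(c)       # divide by each ni!
    return total

# "STATISTICS" has 10 letters: S x3, T x3, I x2, A x1, C x1.
ways = distinct_arrangements("STATISTICS")
print(ways)
```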
Independent Sampling
The sample size from the first population may not be equal to the sample size from the second population.
true
The sampling distribution of a statistic is its probability distribution.
Independent Sampling
The selection of the random sample from one population will not affect the selection of the random sample from the other population.
parameter space
The set of all possible values that the parameter can take on; it is denoted by Θ
Standard Error
The standard deviation of a statistic
mean median mode
Three Measures of Central Tendencies
null hypothesis, either reject Ho or fail to reject Ho
We test the _______ directly. We assume it is true and reach a conclusion to either _____ or _____
outliers
data values that are extremely different from the rest of the data items
Statistical Theory of Mathematical Statistics
deals with the development and exposition of theories that serve as bases of statistical methods; more on proofs
a posteriori
definition of the probability of event A is the limiting value of its empirical probability if we repeat the process endlessly.
CUMULATIVE DISTRIBUTION FUNCTION
denoted by F(•), is a function defined for any real number x as F(x) = P(X ≤ x)
PROBABILITY DENSITY FUNCTION
denoted by f(•), is a function that is defined for any real number x and satisfies the following properties: 1. f(x) ≥ 0 2. The area below the curve, f(x), and above the x-axis is always equal to 1; and 3. P ( a ≤ X ≤ b) is the area bounded by the curve f(x), the x-axis, and the lines x = a and x = b.
X (uppercase) - denotes the random variable x (lowercase) - denotes the value of the random variable
denoting random variable and value
random variable
depends on the outcome of the random experiment
Probabilistic Model
describes a phenomenon by assigning a likelihood of occurrence to the different possible outcomes of the process.
Deterministic Model
describes a phenomenon through known relationships among the states and events, in which a given input will always produce the same output.
in the drawing, the tallest and most tightly concentrated curve is the one with minimum variance = most reliable
drawing, describe minimum variance
sample point.
element of the sample space
true
A lower level of significance means a "stricter" test, i.e., it would be more difficult to reject Ho.
Poisson
can be effectively used to approximate Binomial probabilities when the number of trials n is large, and the probability of success p is small.
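To see how close the approximation is, the sketch below compares the two probability mass functions side by side. The values n = 1000 and p = 0.002 are assumed purely for illustration, and the Poisson mean is taken as λ = np.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.002   # assumed: many trials, small success probability
lam = n * p          # matching Poisson mean

for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# Largest pointwise gap over k = 0, ..., 10.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
```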
Graphical Displays of Data
can be helpful to describe the distribution of dataset
Reliability
can be seen through minimum variance of the estimator
P50, Q2, D5
equivalent measure of location of median
EXPONENTIAL
continuous analogue of the geometric distribution
GAMMA
continuous counterpart of the negative binomial distribution
Hypergeometric
converges to the Binomial distribution as M becomes large. This is when the sample size n is small relative to the size of the population.
0
for continuous random variable, P( X = x) = ?
PROBABILITY MASS FUNCTION
function defined for any real number x as f(x) = P(X = x)
Probability
how likely it is that something will happen
Hypothesis Testing
gives a clear cut-off from where we can distinguish between events that could have occurred by chance and events that are highly unlikely.
Frequency Histogram
graphical display of data for grouped data (in intervals)
Stem and Leaf Display
graphical display of data which retains the actual observations (needs to be equally spaced)
EXPONENTIAL
has been used as a model for lifetimes of various things. May also be used for areas or volumes.
t-distribution and chi-square
it only has one parameter, its degrees of freedom (v)
Inferential
even though the data come only from a sample, the conclusions can be carried back to the population
Descriptive Statistics
the conclusions apply only to the data you actually collected
status
Latin word from which "statistics" is derived
sample space
list of all possible outcomes of a random experiment
Roster Method
listing down all the elements belonging in the set
mean
locates the center of mass of the dataset
Pearson's second coefficient of skewness
used when the data have more than one mode
Lower confidence limit
lower endpoint
Confidence Interval estimator
has room for error; a range of values; the question is whether it captures the true parameter
the value for which at least x% of the observations are less than or equal to it and at least 100-x% have values greater than or equal to it.
meaning of Px
state
meaning of status
measures of dispersion
measure of how different the observations are from each other
standard deviation
measure of absolute dispersion
mode
measure of central tendency that does not always exist
Median
measure of central tendency that is best used when dealing with data containing outliers
mean, median
measure of central tendency that is unique
mean
measure of central tendency that may not be one of the observations
mean
measure of central tendency that utilizes all of the observed values in the collection.
median
measure of central tendency which is also a measure of location.
Descriptive Statistics
methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a larger group
Inferential Statistics
methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data
Median
middle observation of the array or average of the two middle values in the array
Pearson's first coefficient of skewness
used when the data have exactly one mode
confidence level
other name of Confidence coefficient
relative frequency
other name of a posteriori
classical probability
other name of a priori
n = the number of pairs; the sample size, n, is equal to the number of pairs in the sample.
paired sampling's n
sample
part or subset of the population from which the information is collected.
PROPORTION
part/total
pdf = derivative of the cdf; cdf = integral of the pdf
pdf and cdf relationship
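The derivative/integral relationship can be checked numerically. The sketch below assumes an exponential distribution with rate 2 as the example; the helper names and the step sizes are arbitrary illustrative choices.

```python
from math import exp

RATE = 2.0  # assumed rate of an exponential distribution (illustrative)

def pdf(x):
    """Exponential density f(x)."""
    return RATE * exp(-RATE * x) if x >= 0 else 0.0

def cdf(x):
    """Exponential cumulative distribution function F(x) = P(X <= x)."""
    return 1.0 - exp(-RATE * x) if x >= 0 else 0.0

def numeric_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def numeric_integral(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

x = 0.7
slope = numeric_derivative(cdf, x)      # should be close to pdf(x)
area = numeric_integral(pdf, 0.0, x)    # should be close to cdf(x)
print(slope, pdf(x))
print(area, cdf(x))
```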
Simply put, any pdf must satisfy: 1. f(x) ≥ 0 for all x, and 2. the area under the entire density curve is 1, that is, ∫ f(x) dx from −∞ to ∞ = 1
pdf properties
Target population
population from which information is desired
Ratio
(Levels of Measurement) height of an adult (in cms.)
Nominal
(Levels of Measurement) major island group
Interval
(Levels of Measurement) ratio with no absolute zero
Nominal
(Levels of Measurement) religion,
Nominal
(Levels of Measurement) sex, religion, major island group
Ordinal
(Levels of Measurement) student ranking
Ratio
(Levels of Measurement) the speed of a car (in kms/hr)
Random Samples
A sample that is selected using probability sampling.
Negatively Skewed Distribution
Sk < 0
Symmetric Distribution
Sk = 0
Positively Skewed Distribution
Sk > 0
mode
best used for categorical data
GEOMETRIC BY MENDENHALL
number of trials until the first success
NEGATIVE BINOMIAL BY MENDENHALL
number of trials until the rth success
Paired Sampling
It may be done using: ◦ the same sample subjects for the two populations (before and after) ◦ pairing of subjects from the two groups with respect to an extraneous variable (the pairing removes the effects of the extraneous variable).
Sampling unit
unit of the population that we select in our sample
the square of the unit used to measure X.
unit of the variance
Upper confidence limit
upper endpoint
range
use this measure of dispersion if concentrated
POISSON
used for modelling the number of occurrences of a certain event (usually a rare event such as accidents and errors) in a specified space or time interval.
Confidence coefficient
(1 − α)100%
Nominal
(Levels of Measurement) distinct
CONTINUOUS
(DISCRETE or CONTINUOUS) Cooking Time in minutes
DISCRETE
(DISCRETE or CONTINUOUS) Height of a person (Monster, Model, Normal, Elf, and Smurf)
CONTINUOUS
(DISCRETE or CONTINUOUS) Height of a post in centimeters
CONTINUOUS
(DISCRETE or CONTINUOUS) Hours of sleep
DISCRETE
(DISCRETE or CONTINUOUS) Number of views a youtube video has
Ordinal
(Levels of Measurement) distinct, order
Set of all shuttlecocks of the particular brand produced this year
(Define population) A manufacturer of badminton shuttlecocks wishes to determine how many games their brand of shuttlecock (produced this year) will last on the average.
Set of all children below 12 years old in Metro Manila in 2011
(Define population) The Department of Health is interested in determining the percentage of children below 12 years old infected by the Hepatitis B virus in Metro Manila in 2011.
collection of all households in Metro Manila.
(Define population) to determine the average expenditure of all households in Metro Manila
Collection of all graduates of the University from the years 2006 to 2010
(Define population) The Office of Admissions is studying the relationship between the score in the entrance examination during application and the general weighted average (GWA) upon graduation among graduates of the university from 2006 to 2010.
the collection of all households in Quezon City or the collection of all households in Manila
(Define sample) determine the average expenditure of all households in Metro Manila. ◦ If we were to delimit the scope of the study, possibly due to budget constraints, then we would have to redefine the population of interest. This time we can delimit the scope of the study to include QC or Manila
Inferential
(Inferential or Descriptive) A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries
Descriptive
(Inferential or Descriptive) Janine wants to determine the variability of her six exam scores in Algebra.
Descriptive
(Inferential or Descriptive) Ms. Macasaet wants to determine the proportion spent on transportation during the past four months using the daily records of expenditure that she keeps.
Inferential
(Inferential or Descriptive) The Golden State Warriors wish to estimate their chance of winning the championship next season based on their average scores last season.
Inferential
(Inferential or Descriptive) Toyota wishes to estimate the average lifetime of all their batteries by testing a sample of 50 batteries.
Inferential
(Inferential or Descriptive) A bowler wants to estimate his chance of winning a game based on his current season averages and the averages of his opponents
Descriptive
(Inferential or Descriptive) A politician wants to know the exact number of votes he received in the last election
Inferential
(Inferential or Descriptive) A politician would like to estimate, based on an opinion poll, his chance of winning in the upcoming election
Descriptive
(Inferential or Descriptive) The marketing research group of a company wishes to determine the number of families not eating three times a day in the sample used for their survey.
Descriptive
(Inferential or Descriptive) Tina wants to determine her average score in her past three exams in Stat 101.
Inferential
(Inferential or Descriptive) To determine if reforestation is effective, we can take a representative portion of denuded forests and use inferential statistics to draw conclusions about the effect of reforestation on all denuded forests.
Descriptive
(Inferential or Descriptive) Tobey wants to know the proportion of his allowance spent on transportation during the past three weeks, using his personal daily expenditure records.
Descriptive
(Inferential or Descriptive) A bowler wants to find his bowling average for the past 12 games
Interval
(Levels of Measurement) distinct, order, fixed spacing
Ordinal
(Levels of Measurement) faculty rank
Ratio
(Levels of Measurement) weight of a newborn baby (in kgs.)
Interval
(Levels of Measurement) Centigrade
Nominal
(Levels of Measurement) Dichotomous
Ratio
(Levels of Measurement) Kelvin
Ordinal
(Levels of Measurement) Performance rating
Ratio
(Levels of Measurement) allowance of a student (in pesos),
Ordinal
(Levels of Measurement) can be arranged in order, but the differences are not fixed
Nominal
(Levels of Measurement) cannot be arranged in order
Ratio
(Levels of Measurement) distance traveled by an airplane (in kms.),
Probability Sampling
(Probability vs Non-Probability Sampling) choosing from the people numbered 1 to 35 in a class
Probability Sampling
(Probability vs Non-Probability Sampling) dice
Non-Probability Sampling
(Probability vs Non-Probability Sampling) a poll answered only by one's org-mates
Qualitative
(Qualitative or Quantitative) Color: 1- Black, 2- White
Qualitative
(Qualitative or Quantitative) Color: Black, White
Quantitative
(Qualitative or Quantitative) Age: 20, 32, 34
Random Sample from an Infinite Population
(X1, X2 ,...,Xn ) if the values of X1 , X2 ,...,Xn are n independent observations generated from the same probability distribution.
deterministic model
(deterministic/probabilistic) area of a circle, hypotenuse of a right triangle
probabilistic model
(deterministic/probabilistic) tossing a coin, medical experiment
higher
(higher/lower) confidence level = better = fewer errors
small
(small/large) measure of dispersion level = observations are not too different from each other
large
(small/large) measure of dispersion level = observations are very different from each other
small
(small/large) variance = observations are concentrated about the mean
large
(small/large) variance = observations are far or very different from the mean
Positively Skewed Distribution
- Concentration is on the left side, tapering-off on the right side (Skewed to the right)
Point Estimator
- It is a rule or a formula that tells us how to calculate a single value to estimate the unknown value of the parameter
Variance
-average squared difference of each observation from the mean
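As a worked instance of "average squared difference from the mean", the sketch below computes the variance by hand and compares it with Python's `statistics` module. The five data values are made up for illustration; the n − 1 divisor shown for the sample variance is the usual convention and is an assumption here, since the card does not state which divisor the course uses.

```python
from statistics import mean, pvariance, variance

data = [4, 8, 6, 5, 7]   # assumed small data set (illustrative)

m = mean(data)
squared_diffs = [(x - m) ** 2 for x in data]

# Population variance: average of the squared differences (divide by n).
pop_var = sum(squared_diffs) / len(data)

# Sample variance: same sum of squares, divided by n - 1.
samp_var = sum(squared_diffs) / (len(data) - 1)

# Standard deviation: positive square root of the variance.
pop_sd = pop_var ** 0.5

print(m, pop_var, samp_var, pop_sd)
```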
Discrete
- a variable which can assume finite, or, at most, countably infinite number of values; usually measured by counting or enumeration
Continuous
- a variable which can assume infinitely many values corresponding to a line interval
Census
- complete enumeration
Raw Data
- data in their original form, as you collect them
Elementary unit/Element
- is a member of the population whose measurement on the variable of interest is what we wish to examine
Type II Error
- is the error made by accepting (not rejecting) the null hypothesis when it is false.
Sampling frame
- listing of all the individual units in the population
Array
- ordered arrangement of data according to magnitude
Point Estimate
- realized value of the random variable (estimator)
Statistical Methods of Applied Statistics
- refer to procedures and techniques used in the collection, presentation, analysis, and interpretation of data
Measure of Central Tendency
- single value that is used to identify the "center" of the data
Rule Method
- stating a rule that the elements must satisfy in order to belong in the set
Random Component
- takes care of the variability
minimum
-An optimal property of interval estimators is (minimum, maximum) length
TEST OF INDEPENDENCE
-Contingency tables, r rows and c columns
TEST OF INDEPENDENCE
-Used to test whether or not two categorical variables are related
Reliability
-an estimator that does not differ much in value from sample to sample
1
0!
1. t-distribution 2. Chi-square distribution
2 NEW DISTRIBUTIONS
Uniform Experiments
A point x is selected at random from the interval [a,b] in such a way that the probability that X will belong to any subinterval of [a,b] is proportional to the length of that subinterval.
Bernoulli trial
A random experiment whose outcomes can be classified as "success" or "failure"
Discrete Random Variable
A random variable defined over a discrete sample space is called
independent
A, B; A, Bc; B, Ac; Ac, Bc (if A and B have this property, so do the other three pairs)
PROBABILITY SAMPLING
All the elements of the sampled population must have an opportunity of being a part of the sample.
Memory-Less Property of the Exponential Distribution
An "old" functioning component has the same lifetime distribution as a "new" functioning component or that the component is not subject to fatigue or to wear.
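In symbols, the memoryless property says P(X > s + t | X > s) = P(X > t). The sketch below checks this with the exponential survival function P(X > t) = e^(−λt); the rate and the times s and t are assumed values chosen only for illustration.

```python
from math import exp

def survival(t, rate):
    """P(X > t) for an exponential lifetime with the given rate."""
    return exp(-rate * t)

rate, s, t = 0.5, 3.0, 2.0   # assumed illustrative values

# An "old" component that has already survived s time units ...
conditional = survival(s + t, rate) / survival(s, rate)

# ... versus a "new" component asked to survive t time units.
unconditional = survival(t, rate)

print(conditional, unconditional)
```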
A priori A posteriori Subjective
Approaches to Assigning Probabilities
chi-square
As the degrees of freedom increase, so does the mean and the variance.
true
As the degrees of freedom increase, the variance of the t-distribution goes to 1
Test for the POPULATION MEAN
Assumes that the random sample is taken from a normal distribution
Test for the POPULATION PROPORTION
Assumes that the true population proportion is not close to 0 or 1 ◦ Assumes that n is large
The test for independence is valid only if: 1. At least 80% of the cells have expected frequencies of at least 5. 2. No cell has an expected frequency ≤ 1. // If many expected frequencies are very small, researchers commonly combine categories of variables to obtain a table having larger cell frequencies. Generally, one should not pool categories unless there is a natural way to combine them.
Assumptions for Test of Independence
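A rough sketch of how these two checks could be applied to an r × c contingency table. The observed counts are invented for illustration. The expected frequency (row total × column total / grand total), the Pearson statistic Σ(O − E)²/E, and the degrees of freedom (r − 1)(c − 1) are the standard textbook formulas, assumed here because the cards do not spell them out.

```python
# Observed counts for a 2 x 3 contingency table (made-up numbers).
observed = [
    [20, 30, 10],
    [30, 20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell under independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Validity checks from the card: at least 80% of cells with E >= 5,
# and no cell with E <= 1.
cells = [e for row in expected for e in row]
share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
rule_ok = share_at_least_5 >= 0.8 and all(e > 1 for e in cells)

print(chi_square, degrees_of_freedom, rule_ok)
```

The statistic would then be compared with a chi-square critical value at the chosen level of significance.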
Monotonicity Property
B ⊆ A, P(B) ≤ P(A)
Binomial: the number of trials is fixed and the number of successes is a random variable. Geometric: the number of trials is a random variable and the number of successes is fixed at 1.
Binomial vs Geometric
Negatively Skewed Distribution
Concentration is on the right side, tapering-off on the left side (Skewed to the left)
Inferential Statistics
Deals with the techniques used in analyzing the sample data that will lead to generalizations about a population from which the sample came from
Descriptive Statistics
Deals with the techniques used in the collection, presentation, organization, and analysis of the data on hand.
TYPE I ERROR: Saying that the mean number is less than 100, when it is actually greater than or equal to 100. TYPE II ERROR: Saying that the mean number is greater than or equal to 100, when it is actually less than 100.
Define Type I and Type II error: A specific brand of tissue pull-ups claims that each pack contains 100 pulls. Ho: The mean number of pull-ups for a pack of tissue is greater than or equal to 100. Ha: The mean number of pull-ups for a pack of tissue is less than 100.
Probabilistic Model
Describes a phenomenon whose outcomes now depend on some random component
Unbiasedness, Reliability
Desirable Properties of a Point Estimator
a. Ratio b. Ratio c. Interval
Determine the level of measurement used in each item. a. Height of a building in meters b. Age (0 years old, 1 year old, 2 years old,...) c. IQ
CRITICAL VALUE
Determines the "cut-off".
Quartiles
Divide the ordered observations into 4 equal parts.
Deciles
Divides the ordered observations into 10 equal parts
Bernoulli
Either success or not (failure)
True
Every sample point must be assigned a real number.
population mean and population proportion
Examples of parameters are ___ and ___
Interval - Frequency
Frequency Histogram Parts: Horizontal - Vertical
1. Frequency Histogram 2. Stem and Leaf Display
Graphical Displays of Data Types
0 or 1
However, once a sample has been observed, the CI is no longer random, and thus the probability that it will enclose the true parameter value is either ____ OR ____
Independent Sampling
If the sample sizes are unequal and there is no matching, this is automatically the type of sampling for the two populations
"weighted average".
If X is discrete, its expected value is similar to a ____
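The weighted-average reading of the expected value can be made concrete: each mass point is weighted by its probability. The sketch below uses the number of heads in two fair coin tosses as an assumed example.

```python
from fractions import Fraction

# pmf of X = number of heads in two fair coin tosses (assumed example):
# mass points 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

total_probability = sum(pmf.values())

# E(X): weighted average of the mass points, with the probabilities as weights.
expected_value = sum(x * p for x, p in pmf.items())

print(total_probability, expected_value)
```

The mass point 1 carries the heaviest weight, so it pulls the expected value toward itself.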
a posteriori
If a random experiment is repeated many times under uniform conditions, use the empirical probability of event A to assign its probability
a priori
If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes belong to event A, then P(A) = n/N
standard normal random variable
If the normal random variable has mean 0 and variance 1
true
If the p-value is small, the observed data is inconsistent with Ho. Thus, we tend to reject Ho for smaller p-values.
RIGHT INTERPRETATION OF CONFIDENCE COEFFICIENT
If we repeatedly take samples of size n, for each of which we compute (1-α)100% confidence intervals, then (1- α)100% of these computed confidence intervals will contain the unknown value of the parameter
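This long-run interpretation can be illustrated by simulation. The sketch below is a hedged example, not the course's procedure: it assumes a normal population with known σ, uses the z-interval x̄ ± z·σ/√n, and all constants (MU, SIGMA, N, ALPHA, REPS, the seed) are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist

random.seed(130)  # fixed seed so the run is reproducible

MU, SIGMA = 50.0, 10.0       # assumed "true" population mean and sd
N, ALPHA, REPS = 25, 0.05, 2000

# z value that leaves ALPHA/2 in each tail of the standard normal.
z = NormalDist().inv_cdf(1 - ALPHA / 2)
half_width = z * SIGMA / N ** 0.5

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    # Does this particular interval enclose the true mean?
    if xbar - half_width <= MU <= xbar + half_width:
        covered += 1

coverage = covered / REPS
print(coverage)
```

Each individual interval either contains MU or it does not; it is the proportion over many repetitions that should be close to 1 − ALPHA.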
true
If A1, A2, ..., An are independent, then the complements A1c, A2c, ..., Anc are independent.
CENTRAL LIMIT THEOREM
If X̄ is the mean of a random sample of size n from a large or infinite population (not necessarily normal) with mean μ and variance σ², then the sampling distribution of X̄ is approximately normal with mean μ and variance σ²/n when n is sufficiently large.
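A simulation sketch of the theorem, under assumed settings: the population is exponential with mean 1 and variance 1 (clearly not normal), the sample size is 40, and 5000 samples are drawn. The theorem predicts sample means centred near 1 with variance near 1/40, and roughly 95% of them within 1.96 standard errors of the population mean.

```python
import random
from statistics import mean, pvariance

random.seed(130)  # fixed seed so the run is reproducible

N, REPS = 40, 5000   # assumed sample size and number of repeated samples

# Population: Exponential(1), which has mean 1 and variance 1 and is skewed.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(N)) / N
    for _ in range(REPS)
]

mean_of_means = mean(sample_means)       # CLT: close to mu = 1
var_of_means = pvariance(sample_means)   # CLT: close to sigma^2 / n = 0.025

# Share of sample means within 1.96 standard errors of mu.
standard_error = (1.0 / N) ** 0.5
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in sample_means) / REPS

print(mean_of_means, var_of_means, within)
```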
Inferential Statistics
Information from the sample is used to characterize the population from where it is drawn
Instead of getting the probability of X being equal to a particular value, the probability of X falling between an interval [a, b] is computed.
Length of the interval, Level of Confidence
Interval Estimation: Measure of "Correctness"
true
It has been the practice that you work in the hopes of concluding Ha.
Continuous Random Variable
It is a random variable that can take on infinitely many values.
Test Statistic
It is a statistic whose value is calculated from sample measurements and on which the statistical decision will be based
Two-tailed Test of Hypothesis
It is a test where the alternative hypothesis does not specify a directional difference for the parameter of interest.
One-tailed Test of Hypothesis
It is a test where the alternative hypothesis specifies a one-directional difference for the parameter of interest.
CRITICAL REGION
It is also called the rejection region.
PERMUTATION
It is an ordered arrangement of r-distinct elements selected from the set Z. It can be represented by an ordered r-tuple with distinct coordinates.
ALTERNATIVE HYPOTHESIS
It is denoted by Ha.
Null Hypothesis
It is denoted by Ho.
ALTERNATIVE HYPOTHESIS
It is sometimes referred to as the research hypothesis
Null Hypothesis
It must contain the condition of equality and must be written with the symbol =, ≤, or ≥.
Null Hypothesis
It represents what the experimenter doubts to be true.
Discrete Random Variable
Its sample space must be discrete or countable; There are no in between values.
Continuous Random Variable
Its set of possible values consists of an entire interval on the number line - that is, if for some a < b, any number x between a and b is possible
random variable
Maps each element of the sample space to one and only one real number.
Negatively Skewed Distribution
Mean < Median < Mode
Symmetric Distribution
Mean = Median = Mode
Positively Skewed Distribution
Mean > Median > Mode
true
Narrower intervals and higher confidence levels are both associated with better estimates, but they work against each other: for a fixed sample size, raising the confidence level lengthens the interval.
P(A) ≥ 0
Non-negativity
mean
Outliers may greatly affect the value of this
For a continuous random variable, P(X = x) = 0: the event X = x is not truly impossible, but it almost never happens.
Multiplication Rule
P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) x P(A2|A1) x P(A3|A1 ∩ A2) x ... x P(An|A1 ∩ A2 ∩ ... ∩ An−1)
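A quick check of the multiplication rule as a chain of conditional probabilities, P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2). The card-drawing setup (three aces in a row, without replacement, from a standard 52-card deck) is an assumed example.

```python
from fractions import Fraction
from math import comb

# Ai = "the i-th card drawn is an ace", drawing without replacement.
p_chain = (
    Fraction(4, 52)      # P(A1)
    * Fraction(3, 51)    # P(A2 | A1)
    * Fraction(2, 50)    # P(A3 | A1 and A2)
)

# Cross-check by counting: choose 3 of the 4 aces out of all 3-card hands.
p_count = Fraction(comb(4, 3), comb(52, 3))

print(p_chain, p_count)
```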
P(A) - P(A ∩ B)
P(A ∩ Bc) = ?
Theorem of Total Probability
P(A) = P(A|B)P(B) + P(A|Bc)P(Bc)
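The theorem splits P(A) according to whether B or its complement occurs. The sketch below verifies it on one roll of a fair die; the events A ("even") and B ("greater than 3") are assumed purely for illustration.

```python
from fractions import Fraction

omega = set(range(1, 7))   # one roll of a fair die (assumed example)
A = {2, 4, 6}              # event A: the roll is even
B = {4, 5, 6}              # event B: the roll is greater than 3
B_c = omega - B            # complement of B

def prob(event):
    """Classical probability on equally likely outcomes."""
    return Fraction(len(event), len(omega))

def cond(event, given):
    """Conditional probability P(event | given)."""
    return prob(event & given) / prob(given)

total = cond(A, B) * prob(B) + cond(A, B_c) * prob(B_c)
print(total, prob(A))
```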
1 - P(A|D)
P(Ac | D) = ?
1 - P(A)
P(Ac) = ?
Independence of Events
P(A|B) = P(A) if P(B) > 0; P(B|A) = P(B) if P(A) > 0; P(A ∩ B) = P(A) x P(B)
P(A) + P(B) - P(A ∩ B)
P(A∪ B) = ?
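The complement, difference, and addition rules can all be confirmed by enumeration. The sketch below uses two fair dice with the assumed events A ("the sum is at least 10") and B ("both dice show the same number"); note that Python's `|`, `&`, and `-` on sets are union, intersection, and set difference.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))    # two fair dice (assumed example)
A = {w for w in omega if w[0] + w[1] >= 10}    # sum is at least 10
B = {w for w in omega if w[0] == w[1]}         # doubles

def prob(event):
    """Classical probability on equally likely outcomes."""
    return Fraction(len(event), len(omega))

p_complement = prob(omega - A)    # P(Ac)
p_a_not_b = prob(A - B)           # P(A ∩ Bc)
p_union = prob(A | B)             # P(A ∪ B)

print(p_complement, 1 - prob(A))
print(p_a_not_b, prob(A) - prob(A & B))
print(p_union, prob(A) + prob(B) - prob(A & B))
```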
P value
P(obtaining data at least as "extreme" as what has been observed|Ho is true)
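As a hedged illustration of this definition, the sketch below computes p-values for a z-test of a mean. It assumes a normal population with known σ; the function name `z_test_p_value`, the tail labels, and the numbers (hypothesized mean 14, sample mean 14.6, σ = 2, n = 36) are made up for the example.

```python
from statistics import NormalDist

def z_test_p_value(z, tail):
    """p-value of an observed z statistic, computed assuming Ho is true."""
    std_normal = NormalDist()
    if tail == "right":    # Ha: mu > mu0
        return 1 - std_normal.cdf(z)
    if tail == "left":     # Ha: mu < mu0
        return std_normal.cdf(z)
    if tail == "two":      # Ha: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")

# Assumed illustrative numbers: Ho: mu = 14.
mu0, xbar, sigma, n = 14.0, 14.6, 2.0, 36
z = (xbar - mu0) / (sigma / n ** 0.5)

p_right = z_test_p_value(z, "right")
p_two = z_test_p_value(z, "two")
print(round(z, 2), round(p_right, 4), round(p_two, 4))
```

With α = 0.05, this right-tailed p-value would lead to rejecting Ho, while the two-tailed p-value would not.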
0
P(∅) = ?
0
P(∅|D) = ?
1. Independent 2. Paired 3. Paired 4. Independent
Paired or Independent: 1. The principal wishes to determine if grade 6 boys are better than grade 6 girls in math. A random sample of boys was selected. A random sample of girls was also selected. All students were asked to take a standardized exam in math. 2. The police want to assess the effect of an obvious radar trap on the speed of cars. Ten cars were randomly selected on a highway, and their speeds were measured before a radar trap comes into view and after they pass the radar. 3. To test the effect of background music on productivity, factory workers were observed. For one week they worked with no music. For the next week, they worked with background music. 4. A drug company wishes to examine the effectiveness of a new drug in reducing blood pressure. A group of 50 people were given the new drug, while another group of 50 people were given a placebo.
99%
Upper limit of the percentiles (percentiles only go up to this)
Permutation
Permutation or Combination: Assigning students to their seats in the first day of class.
Combination
Permutation or Combination: Selecting 3 students to attend a meeting in Cebu.
Permutation
Permutation or Combination: Selecting a lead and an understudy for a play.
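The distinction comes down to whether order matters. The sketch below counts both cases for an assumed group of five students (the names are hypothetical), using `math.perm` and `math.comb` and cross-checking against explicit enumeration with `itertools`.

```python
from itertools import combinations, permutations
from math import comb, perm

students = ["Ana", "Ben", "Cara", "Dan", "Eli"]   # hypothetical names

# Lead and understudy: (Ana, Ben) differs from (Ben, Ana), so order matters.
ways_lead_understudy = perm(len(students), 2)

# Three students sent to a meeting: only who goes matters, not the order.
ways_delegation = comb(len(students), 3)

# Cross-check by listing the arrangements explicitly.
listed_ordered = list(permutations(students, 2))
listed_unordered = list(combinations(students, 3))

print(ways_lead_understudy, ways_delegation)
```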
◦ It is a function whose value ranges from 0 to 1. ◦ Its domain is the set of all real numbers while its range is the interval [0,1]. ◦ It is a nondecreasing function. ◦ Every random variable has one and only one CDF.
Properties of CDF
Non-negativity, Norming and Finite Additivity
Properties of Probability
(1) P(X = x) = f(x) > 0 if x is a mass point (2) ∑ f(x) = 1 across all mass points (3) P(X ∈ A) = ∑ f(x), summed over all mass points x in A
Properties of a PMF
P25, P50, P75
Q1, Q2, Q3 equivalent to percentiles
SRSWR
SRSWOR or SRSWR: a lottery draw in which each drawn item is returned to the fishbowl
SRSWOR
SRSWOR or SRSWR: prerog
Census, Sampling. Probability Sampling, Non-probability Sampling
Sampling Designs
1. F_X(−∞) = lim (x → −∞) F_X(x) = 0 and F_X(∞) = lim (x → ∞) F_X(x) = 1. 2. F_X(·) is a monotone, nondecreasing function; i.e., F_X(a) ≤ F_X(b) for any a < b. 3. F_X(·) is continuous from the right; that is, lim (h → 0+) F_X(x + h) = F_X(x) for all x. (Refer to the slide.)
Satisfying CDF
Mean
Sum of all the values in the collection divided by the total number of elements in the collection.
Generalized Basic Principle of Counting
Suppose an experiment can be performed in k stages. If there are n1 distinct possible outcomes in the first stage, and if for each of these n1 outcomes there are n2 distinct possible outcomes in the second stage, and if for each of the n1 x n2 outcomes of the first 2 stages, there are n3 distinct possible outcomes in the third stage; continuing in this manner, we reach the last stage where there are nk distinct possible outcomes for each of the outcomes of the first (k-1) stages, then there are n1 x n2 x ... x nk possible outcomes of the experiment.
Basic Principle of Counting
Suppose an experiment can be performed in two stages. If there are n distinct possible outcomes in the first stage of the experiment and if, for each outcome of the first stage, there are m distinct possible outcomes in the second stage, then there are n x m possible outcomes of this experiment
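The counting principle can be seen by listing the outcomes stage by stage. In the sketch below the stages (three shirts, two pairs of pants, then a coin toss as a third stage) are assumed examples; `itertools.product` generates every combination of stage outcomes.

```python
from itertools import product

shirts = ["red", "blue", "white"]   # first stage: n = 3 outcomes (assumed)
pants = ["jeans", "slacks"]         # second stage: m = 2 outcomes (assumed)

# Two stages: every shirt can be followed by every pair of pants.
outfits = list(product(shirts, pants))

# Generalized version with a third stage of 2 outcomes.
coin = ["heads", "tails"]
three_stage = list(product(shirts, pants, coin))

print(len(outfits), len(three_stage))
```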
Random Sample from a Finite Population
Suppose we select n distinct elements from a population consisting of N elements, using a particular sampling method.
Independent sampling Paired/Related sampling
TWO TYPES OF SAMPLING FROM 2 POPULATIONS
Reject Ho: Type I error if Ho is true, correct decision if Ho is false. Do not reject Ho: correct decision if Ho is true, Type II error if Ho is false.
Table of Errors
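The table of errors can also be written as a small lookup. This is only an illustrative sketch; the function name `decision_outcome` is an assumption, not part of the notes.

```python
def decision_outcome(ho_is_true: bool, reject_ho: bool) -> str:
    """Return the cell of the table of errors for a given truth state and decision."""
    if reject_ho:
        # Rejecting a true Ho is a Type I error; rejecting a false Ho is correct.
        return "Type I error" if ho_is_true else "Correct decision"
    # Not rejecting a true Ho is correct; not rejecting a false Ho is a Type II error.
    return "Correct decision" if ho_is_true else "Type II error"

print(decision_outcome(ho_is_true=True, reject_ho=True))    # Type I error
print(decision_outcome(ho_is_true=False, reject_ho=False))  # Type II error
```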
PROBABILITY SAMPLING
The chance that an element will be included in the sample need not be equal for all elements.
PROBABILITY SAMPLING
The chance that the element will be included in the sample can be determined.
true
The hypothesized value of 𝜇 or P can be obtained from previous studies or from knowledge about the population
EXPONENTIAL
The length of the time interval between successive happenings follows this distribution, provided that the number of happenings in a fixed time interval has a Poisson distribution
True
The mass points with larger chances of occurrence have heavier weights in the expected value
specification, scope
The _____ of the population of interest depends upon the _____ of the study
before
The parameter must be identified (before/after) analysis
Random Samples
The probabilities of including an element need not be equal for all elements. It just needs to be known and nonzero.
mass points
The values of the discrete random variable X for which f(x) > 0
estimation and hypothesis testing
There are two ways statistical inference about a parameter can be made from a sample: through _______ and by ______.
Wrong interpretation! Remember: the probability statement is about the estimator, not the estimate. It is the interval estimator that captures the parameter. The estimator is a random variable, and only a random variable can have a probability, not its realized value (the estimate).
There is a (1-𝛼)100% probability that the true value of the parameter falls within the computed interval.
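A simulation can make the distinction concrete: the (1-α)100% figure describes how often the interval estimator captures the parameter over repeated samples, not any single computed interval. The sketch below assumes a normal population with known σ and a z-interval for the mean; the function name `coverage` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist

def coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, reps=2000, seed=130):
    """Fraction of simulated samples whose z-interval for the mean contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Each realized interval either contains mu or it does not.
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())  # close to 0.95 for alpha = 0.05
```

Each individual interval is a fixed pair of numbers once computed; only the long-run proportion of intervals that capture μ is close to 1-α.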
Paired Sampling
This is accomplished by "matching" the measurements in two populations.
CRITICAL VALUE
This is the value that you compare to the test statistic to draw the conclusion of rejecting or not rejecting the null hypothesis.
Descriptive and Inferential
Two Branches of Statistics
1. CDF Technique 2. MGF Technique
Two Transformation Techniques
Null and alternative Hypothesis
Two Types of Hypotheses
Random Sample from a finite population, Random Sample from an infinite population
Two Types of Random Sample
roster method and rule method
Two Ways of Specifying a Set
Test for the POPULATION MEAN and Test for the POPULATION PROPORTION
Two types of Single Population Test
One-tailed test, Two-tailed test
Types of Test of Hypothesis
Discrete, Continuous
Types of Random Variable
Descriptive Statistics, Inferential Statistics
Types of Statistical Methods of Applied Statistics
HYPOTHESIS TESTING
Using information from the sample to either support or disprove a statement about a certain characteristic of the population
HYPOTHESIS TESTING
Usually the conjecture concerns one of the unknown parameters of the population.
TYPE I ERROR: Saying that the mean number is not 100, when it is 100. TYPE II ERROR: Saying that the mean number is 100, when it is not.
What are the errors: A specific brand of tissue pull-ups claims that each pack contains 100 pulls. Ho: The mean number of pull-ups for a pack of tissue is equal to 100. Ha: The mean number of pull-ups for a pack of tissue is not equal to 100.
TYPE I ERROR: Saying the bridge is not risky, when it is risky. TYPE II ERROR: Saying the bridge is risky, when it is not risky.
What are the errors: An inspector has to choose between certifying a bridge as risky or saying that the bridge is not risky. Ho: The bridge is risky. Ha: The bridge is not risky
true
When actually conducting a hypothesis test, we operate under the assumption that the parameter is equal to a specific value.
same
While the variance is in the squared unit of the measurements, the standard deviation possesses the ____ unit as the measurements.
P(Ω) = 1
[Norming]
Positively Skewed Distribution
^\______
Symmetric Distribution
__/^\__
Negatively Skewed Distribution
_______/^
random variable
a function whose value is a real number that is determined by each sample point in the sample space.
Measure of Skewness
a single value that indicates the degree and direction of asymmetry.
Quantitative
a variable that takes on numerical values representing an amount or quantity
Qualitative
a variable that yields categorical responses
simple random sampling without replacement (SRSWOR)
all the n elements in the sample must be distinct from each other.
Stem and Leaf Display
alternative method for describing a set of data and presents a histogram-like picture of the data
HYPOTHESIS TESTING
an area of statistical inference in which one evaluates a conjecture about some characteristic of the parent population based upon the information contained in the random sample.
E_ij's
are the expected frequencies if the variables are independent
a priori
assigns probabilities to events before the experiment is performed
A posteriori
assigns probabilities to events by repeating the experiment a large number of times.
Subjective Probability
assigns probabilities to events using intuition, personal beliefs, and other indirect information
Variance of X
average squared deviation between the realized value of X and its mean, mu (μ)
Paired Sampling
before vs after, twins.
"collection of", "set of"
beginning phrase of population and sample
Statistics
branch of science that deals with the collection, presentation, organization, analysis and interpretation of data.
Graphical Displays of Data
can give information on the location, spread, extremes, and shape of the distribution
n(Ω)
cardinality
variable
characteristic or attribute of the elements in a collection that can assume different values for the different elements.
parameter
characteristics of the population
equal to v
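mean of the chi-square distribution with v degrees of freedom
2v
variance of the chi-square distribution with v degrees of freedom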
Qualitative or Quantitative, Discrete or Continuous
classification of variables
universe/population
collection of all elements under consideration in a (statistical) inquiry
Sample Space
collection of all possible outcomes of a random experiment
parametric family of density functions
collection of density functions that is indexed by a quantity called a parameter.
Sampled population
collection of elements from which the sample is actually taken
Data
collection of observations
parameter (θ)
constant that determines the specific form of the density function
Abstract Model
description of the essential properties of a phenomenon that is formulated in mathematical terms
not unique
description of the sample space is (unique/not unique)
Ratio
distinct, order, fixed spacing, absolute zero
beta
distribution of R² under the standard regression assumptions.
Chi-square distribution
distribution skewed to the right
Percentiles
divide the ordered observations into 100 equal parts.
Median
divides the ordered set of observations into two equal parts.
domain: sample space counterdomain: the set of all real numbers
domain and range of random variables
GAMMA
extension of the exponential distribution where we now wait for the rth occurrence of an event.
DISCRETE UNIFORM
used when looking for the probability of obtaining a particular value of X
empty set (∅)
impossible event
hypergeometric experiment
involves the selection of a sample of size n using simple random sampling without replacement from a population consisting of N elements, K of which fall in the category of "success" and the remaining N-K in the category of "failure".
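The probability of getting exactly x successes in a hypergeometric experiment follows from counting the equally likely SRSWOR samples. This is a sketch using the standard hypergeometric formula, which is not stated in the notes; the function name and the numbers are illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in an SRSWOR sample of size n from N elements, K of them successes."""
    if x < 0 or x > n or x > K or n - x > N - K:
        return 0.0
    # Ways to pick x successes and n - x failures, over all possible samples of size n.
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Hypothetical example: N = 10 elements, K = 4 successes, sample of n = 3.
print(hypergeom_pmf(2, N=10, K=4, n=3))  # 0.3
```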
hypothesis
is a claim or statement about the population parameter
Simple random sampling (SRS)
is a method of selecting n units out of the N units in the population in such a way that every distinct sample of size n has an equal chance of being drawn.
Random Experiment
is a process that can be repeated under similar conditions but whose outcome cannot be predicted with certainty beforehand.
PROBABILITY
is a quantity between 0 and 1.
point estimator
is a single statistic whose realized value is used to estimate an unknown parameter
INDICATOR FUNCTIONS
is the function with domain Ω and counterdomain {0,1}
NORMAL
is the most widely known and used probability distribution
Factorial notation
is the compact representation for the product of the first n consecutive positive integers. n! = n × (n − 1) × (n − 2) × ... × 2 × 1
Type I Error
is the error made by rejecting the null hypothesis when it is true
P value
is the probability, computed assuming Ho is true, of getting results at least as extreme as ("worse" than) those actually observed.
Confidence coefficient
is the probability that the interval estimator encloses the true value of the parameter.
BETA
model linked with many common distributions like the binomial distribution, uniform distribution, and gamma distribution.
Mode
most frequent observed value in the data set
combination
number of distinct r-combinations that can be formed from the n elements of set Z is
NEGATIVE BINOMIAL
number of failures before the rth success.
Binomial
n Bernoulli trials, same probability of success, trials are independent
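Under these three conditions the number of successes has the binomial distribution. The sketch below uses the standard binomial PMF, which is not written out in the notes; the function name and the example values are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for the number of successes in n independent Bernoulli trials with success probability p."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical example: exactly 2 heads in 4 tosses of a fair coin.
print(binomial_pmf(2, n=4, p=0.5))  # 0.375
```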
n taken r
nCr
permutation n taken r
nPr read as this
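Python's standard library computes both counts directly. This short sketch uses `math.comb` for nCr and `math.perm` for nPr, and checks the usual relation nPr = nCr × r!; the values n = 5 and r = 2 are arbitrary.

```python
from math import comb, factorial, perm

n, r = 5, 2
print(comb(n, r))  # 10 combinations: order does not matter
print(perm(n, r))  # 20 permutations: order matters

# Each r-combination can be arranged in r! orders.
print(perm(n, r) == comb(n, r) * factorial(r))  # True
```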
sample variance
not exactly the mean of the squared deviations of the observations from the mean, because the sum of squared deviations is divided by n − 1 rather than n.
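The difference is easy to see numerically. This sketch uses a made-up data set and compares the mean of the squared deviations (divide by n) with the sample variance (divide by n − 1), cross-checking against `statistics.variance`.

```python
from statistics import variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
n = len(data)
xbar = sum(data) / n
sum_sq_dev = sum((x - xbar) ** 2 for x in data)

mean_sq_dev = sum_sq_dev / n        # mean of the squared deviations
sample_var = sum_sq_dev / (n - 1)   # sample variance

print(mean_sq_dev)                  # 4.0
print(sample_var > mean_sq_dev)     # True
print(abs(sample_var - variance(data)) < 1e-12)  # True: matches the library's sample variance
```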
geometric
number of failures until the first success.
variance
this is not an absolute measure of dispersion because it is not in the same unit as the measurements
Measurement
process of determining the value or label of the variable based on what has been observed.
Sampling
process of obtaining information from the units in the selected sample
PROPORTION
proportion of elements possessing a characteristic of interest in a collection
random experiment
can be repeated under certain conditions, but you still cannot be sure of the outcome
PROPORTION
quotient obtained when we divide the magnitude of a part by the magnitude of the whole.
GEOMETRIC
random experiment that satisfies the following properties: 1. It consists of repeated independent Bernoulli trials with probabilities of success remaining constant from trial to trial. 2. It will be concluded only after the first success is observed.
Poisson experiment
random experiment that satisfies the following properties: 1. It is possible to partition a specified time or space interval into many smaller non-overlapping subintervals; 2. The number of outcomes that occur in a given subinterval does not depend on the number of outcomes in any other disjoint subinterval; 3. The probability that a single outcome will occur in a very short subinterval is proportional to the length of the subinterval; and, 4. The probability that more than 1 outcome will occur in such a short subinterval is almost zero.
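For a Poisson experiment, the number of outcomes in the interval has the Poisson distribution. The sketch below uses the standard Poisson PMF, which the notes do not write out; `lam` (the average number of outcomes per interval) and the example value are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when outcomes occur at an average rate of lam per interval."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam ** x / factorial(x)

# Hypothetical example: on average 2 arrivals per hour.
probs = [poisson_pmf(x, 2.0) for x in range(50)]
print(abs(sum(probs) - 1.0) < 1e-12)  # True: the masses sum to 1 (up to rounding)
```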
waiting time for the next success
in the exponential distribution, the continuous counterpart of the geometric, the random variable X is this.
interval estimation
range of values
observation
realized value of a variable.
point estimate
realized value of an estimator
Confidence interval estimate
realized value of the interval
Partition
regrouping of the whole set
Confidence Interval estimator
rule that tells us how to calculate two numbers based on sample data that will form an interval where we expect the population parameter to fall with a specified degree of confidence.
probabilistic model
same input, different output
deterministic model
same input, same output
Measure of Skewness
shows the degree of asymmetry, or departure from symmetry of a distribution.
point estimation
single value
flat / tail
hint for skewness: look at where this part of the curve points; positive is right, negative is left
zero
smallest possible value of a measure of dispersion, meaning all the observations are the same
array
sorted data or ordered data; arranged
comparison, explanation, justification, prediction, estimation
sound decision making
Expected Value
the mean of the distribution of X; it is a measure of central tendency
PROBABILITY
the measure of the likelihood that an event will occur.
simple random sampling with replacement (SRSWR)
the n elements in the sample need not be distinct, that is, an element can be selected more than once to be part of the sample.
Statistics
the science dealing with data about the condition of a state or community
empty set (∅) and sample space (Ω)
two subsets of the sample space that will always be events
Deterministic Model
type of abstract model that does not leave room for random variation.
Deterministic, Probabilistic
types of abstract model
Median, Mode
unaffected by outliers.
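A small made-up data set shows the effect: replacing the largest value with an extreme one moves the mean a lot but leaves the median and mode where they were.

```python
from statistics import mean, median, mode

data = [3, 4, 4, 5, 6]            # hypothetical observations
with_outlier = [3, 4, 4, 5, 600]  # largest value replaced by an outlier

print(mean(data), mean(with_outlier))      # 4.4 123.2
print(median(data), median(with_outlier))  # 4 4
print(mode(data), mode(with_outlier))      # 4 4
```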
Measures of Location
values below which a specified fraction or percentage of the observations in a given set must fall
Standard Error
will be used as a measure of reliability of our statistic.
Rule Method
Ω = { (x,y) | x,y is a _____ }
Roster Method
Ω = { , , , }
Bayes' Theorem
P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)]
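The formula can be evaluated directly once the three inputs are known. This is a minimal sketch; the function name `bayes` and the probabilities in the example are hypothetical.

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|Bc)P(Bc)]."""
    p_not_b = 1 - p_b
    numerator = p_a_given_b * p_b
    return numerator / (numerator + p_a_given_not_b * p_not_b)

# Hypothetical inputs: P(B) = 1/100, P(A|B) = 9/10, P(A|Bc) = 1/10.
print(bayes(Fraction(9, 10), Fraction(1, 100), Fraction(1, 10)))  # 1/12
```

The denominator is P(A) written out with the two-part partition {B, Bᶜ} of the sample space.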