Stat 130 Final Exam

probability sampling

A sampling procedure that gives every element of the population a (known) nonzero chance of being selected in the sample

non-probability sampling

A sampling procedure in which some elements of the population have a zero or unknown chance of being selected in the sample

sample points

An event has occurred if the outcome of the experiment is one of the ___________ belonging to the event.

P10...P90

D1...D9

LEVEL OF SIGNIFICANCE (α)

It is the maximum probability of Type I error the researcher is willing to commit.

ALTERNATIVE HYPOTHESIS

It is the operational statement of the theory that the experimenter believes to be true and wishes to prove.

ACCEPTANCE REGION

It is the set of values of the test statistic for which the null hypothesis will not be rejected.

CRITICAL REGION

It is the set of values of the test statistic for which the null hypothesis will be rejected.

range

It is the simplest and easiest to use measure of dispersion.

Null Hypothesis

It is the statement being tested

ALTERNATIVE HYPOTHESIS

It is the statement that must be true if the other one is false.

CRITICAL VALUE

It is the value of the test statistic separating the acceptance and rejection regions.

Statistical Methods of Applied Statistics, Statistical Theory of Mathematical Statistics

Fields of Statistics

ESTIMATION

Finding a single value or a range of values computed from the sample data that may be used to make a statement about the unknown value of the parameter

P(A) = P(A1) + P(A2) +... + P(An)

Finite Additivity: given A = A1 ∪ A2 ∪ A3 ∪ ... ∪ An, where A1, ..., An are mutually exclusive; the probabilities of mutually exclusive events can be added together

decreases the length of the confidence interval

For a fixed level of confidence, a larger sample size ______

true

For a fixed sample size, a smaller α yields a smaller critical region. (That is why it is more difficult to reject Ho.)

increases the length of the confidence interval

For a fixed sample size, a higher level of confidence ______
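
The two cards on interval length can be checked numerically. This is only a sketch, assuming a z-interval for the mean with a known population standard deviation; the function name `ci_length` and all numbers are illustrative and not from the course.

```python
from math import sqrt
from statistics import NormalDist

def ci_length(sigma, n, confidence):
    """Length (UCL - LCL) of a z-interval for the mean with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_(alpha/2)
    return 2 * z * sigma / sqrt(n)

# Fixed confidence level, larger n: the interval gets shorter.
print(ci_length(sigma=10, n=25, confidence=0.95))
print(ci_length(sigma=10, n=100, confidence=0.95))

# Fixed n, higher confidence level: the interval gets longer.
print(ci_length(sigma=10, n=25, confidence=0.99))
```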

One-tailed Test of Hypothesis

Ho: μ = 14 vs. Ha: μ > 14; or Ho: μ = 14 vs. Ha: μ < 14

Two-tailed Test of Hypothesis

Ho: μ = 14 vs. Ha: μ ≠ 14

UCL - LCL (Upper confidence limit-Lower confidence limit)

Length of the interval

choose the shorter, because it is more reliable/precise

Length of the interval (choose the longer or choose the shorter)

Statistic

Let (X1, X2 ,...,Xn) be a random sample, ______ is a random variable that is a function of X1 , X2 ,...,Xn

Conditional Probability

Let A and B be two events where P(B) > 0. The conditional probability of event A given the occurrence of event B, denoted by P(A|B), is P(A ∩ B) / P(B)
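
A small worked example of P(A|B) = P(A ∩ B) / P(B), assuming two fair dice and classical (equally likely) probabilities. The events and the helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def in_a(w):
    return w[0] + w[1] == 8      # event A: the sum is 8

def in_b(w):
    return w[0] % 2 == 0         # event B: the first die is even

p_a_and_b = prob(lambda w: in_a(w) and in_b(w))
p_a_given_b = p_a_and_b / prob(in_b)   # P(A|B) = P(A ∩ B) / P(B)
print(p_a_given_b)
```

Conditioning on B shrinks the sample space to the 18 outcomes where the first die is even; 3 of those have a sum of 8.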

Ratio Interval Ordinal Nominal

Levels of Measurement

A ∩ B = ∅

Mutually Exclusive Events

Its properties are: ◦ Symmetric about μ ◦ Bell-shaped ◦ Its range of possible values is infinite in both directions ◦ Asymptotic to the x-axis ◦ Area below the whole curve is 1

NORMAL properties

ESTIMATION

Obtaining a possible value or a set of values of the parameter using sample data.

true

The PDF of the chi-square distribution is positive for positive real numbers only; otherwise, it is zero.

Inferential Statistics

Process of drawing conclusions about certain characteristics of the population using information obtained from a sample.

true

The Binomial distribution is the generalization of the Bernoulli distribution (Be(p) = Bi(n = 1, p))

1. the sample size, n 2. method of choosing the random sample 3. population under study

The Factors that Affect the Sampling Distribution of a Statistic

reduced sample space

The conditional probability of event A given B can be regarded as the probability assigned to the event containing sample points that are in A and in B

BEFORE

The confidence coefficient gives the probability that the confidence interval, _______ sampling, will enclose the true parameter value.

true

The cut-off/decision rule gives us the confidence that we can support the said claim or disprove it.

Frequency Histogram

-bar graph that displays the intervals on the horizontal axis and the frequencies of classes on the vertical axis

range

-difference between the largest and the smallest value

Measure of Central Tendency

-finding the typical value

Variance

-gives an idea on how close the observations are to the mean

Measures of Dispersion

-indicate the extent to which individual items in a series are scattered about an average

Census

-process of gathering information from every unit in the population

Unbiasedness

-the expected value of the estimator is equal to the parameter it is estimating

Standard Deviation

positive square root of the variance

t-distribution

The t-distribution differs from the standard normal in its variance: it has a larger variance.

Elementary Unit, Sampling Unit, Sampling Frame

student students list of all colleges

Elementary Unit, Sampling Unit, Sampling Frame

student students list of all students

event

subset of the sample space whose probability is defined

parameter

summary measure describing a specific characteristic of the population.

statistic

summary measure describing a specific characteristic of the sample.

sample space (Ω)

sure event

x(1) (the (1) is a subscript)

symbol for array

σ squared

symbol for population variance

x1 (the 1 is a subscript)

symbol for raw data

s squared

symbol for sample variance

Md

symbol of Median

σ

symbol of POPULATION STANDARD DEVIATION

μ

symbol of Population Mean

s

symbol of SAMPLE STANDARD DEVIATION

x̄

symbol of Sample Mean

Mo

symbol of mode

p

symbol of parameter

Ω

symbol of sample space

symbol of statistic

Measures of Dispersion

tells how varied the observations are from each other

CONTINUOUS UNIFORM

used in any situation when a value is picked "at random" from a given interval.

Partition

The number of distinct ways of arranging n objects of which n1 are of one kind, n2 are of a second kind, ..., nk are of a kth kind is n! / (n1! n2! ... nk!)
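
The count of distinct arrangements, n! / (n1! n2! ... nk!), can be computed directly. A minimal sketch; the function name `distinct_arrangements` and the example word are illustrative choices.

```python
from collections import Counter
from math import factorial

def distinct_arrangements(items):
    """n! / (n1! n2! ... nk!) for a sequence with repeated kinds."""
    counts = Counter(items)            # how many objects of each kind
    result = factorial(len(items))
    for c in counts.values():
        result //= factorial(c)        # stays an exact integer at every step
    return result

# STATISTICS: 10 letters with S x3, T x3, I x2, A x1, C x1.
print(distinct_arrangements("STATISTICS"))
```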

Independent Sampling

The sample size from the first population may not be equal to the sample size from the second population.

true

The sampling distribution of a statistic is its probability distribution.

Independent Sampling

The selection of the random sample from one population will not affect the selection of the random sample from the other population.

parameter space

The set of all possible values that the parameter can take on; denoted by Θ

Standard Error

The standard deviation of a statistic

mean median mode

Three Measures of Central Tendencies

null hypothesis, either reject Ho or fail to reject Ho

We test the _______ directly. We assume it is true and reach a conclusion to either _____ or _____

outliers

data values that are extremely different from the rest of the data items

Statistical Theory of Mathematical Statistics

deals with the development and exposition of theories that serve as bases of statistical methods; more on proofs

a posteriori

definition of the probability of event A is the limiting value of its empirical probability if we repeat the process endlessly.
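
The a posteriori idea can be illustrated by simulation: the relative frequency of an event settles down as the number of trials grows. A sketch assuming a fair coin simulated with Python's pseudo-random generator; the seed and trial counts are arbitrary.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Empirical probability of heads in n_trials tosses of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# More repetitions: the empirical probability moves toward 0.5.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```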

CUMULATIVE DISTRIBUTION FUNCTION

denoted by F(•), is a function defined for any real number x as F(x) = P(X ≤ x)

PROBABILITY DENSITY FUNCTION

denoted by f(•), is a function that is defined for any real number x and satisfies the following properties: 1. f(x) ≥ 0 2. The area below the curve, f(x), and above the x-axis is always equal to 1; and 3. P ( a ≤ X ≤ b) is the area bounded by the curve f(x), the x-axis, and the lines x = a and x = b.

X (uppercase) - denotes the random variable; x (lowercase) - denotes the value of the random variable

denoting random variable and value

random variable

depends on the outcome of the random experiment

Probabilistic Model

describes a phenomenon by assigning a likelihood of occurrence to the different possible outcomes of the process.

Deterministic Model

describes a phenomenon through known relationships among the states and events, in which a given input will always produce the same output.

in the drawing, the tallest and most tightly packed curve is the one with minimum variance = most reliable

drawing, describe minimum variance

sample point.

element of the sample space

true

a lower level of significance means a "stricter" test, i.e., it would be more difficult to reject Ho.

Poisson

can be effectively used to approximate Binomial probabilities when the number of trials n is large, and the probability of success p is small.
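
A quick numerical comparison of the two PMFs for a large n and small p, with the Poisson mean matched to np. The values of n and p are illustrative only.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.003          # large n, small p (illustrative values)
lam = n * p                 # matching Poisson mean
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
```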

Graphical Displays of Data

can be helpful in describing the distribution of a dataset

Reliability

can be seen through minimum variance of the estimator

P50, Q2, D5

measures of location equivalent to the median

EXPONENTIAL

continuous analogue of the geometric distribution

GAMMA

continuous counterpart of the negative binomial distribution

Hypergeometric

converges to the Binomial distribution as M becomes large. This is when the sample size n is small relative to the size of the population.

0

for a continuous random variable, P(X = x) = ?

PROBABILITY MASS FUNCTION

function defined for any real number x as f(x) = P(X = x)

Probability

how likely something is to happen

Hypothesis Testing

gives a clear cut-off from where we can distinguish between events that could have occurred by chance and events that are highly unlikely.

Frequency Histogram

graphical display of data for grouped data (in intervals)

Stem and Leaf Display

graphical display of data which retains the actual observations (needs to be equally spaced)

EXPONENTIAL

has been used as a model for lifetimes of various things. May also be used for areas or volumes.

t-distribution and chi-square

each has only one parameter, its degrees of freedom (v)

Inferential

even though you only took a sample, the conclusions can be carried back to the population

Descriptive Statistics

you draw conclusions only about the data you actually collected

status

Latin root of the word statistics

sample space

list of the possible outcomes in a random experiment

Roster Method

listing down all the elements belonging in the set

mean

locates the center of mass of the dataset

Pearson's second coefficient of skewness

used when there are several modes

Lower confidence limit

lower endpoint

Confidence Interval estimator

has room for error; a range of values; does it capture the true parameter?

the value for which at least x% of the observations are less than or equal to it and at least 100-x% have values greater than or equal to it.

meaning of Px

state

meaning of status

measures of dispersion

measure of how different the observations are

standard deviation

measure of absolute dispersion

mode

measure of central tendency that does not always exist

Median

measure of central tendency that is best used when dealing with data containing outliers

mean, median

measure of central tendency that is unique

mean

measure of central tendency that may not be one of the observations

mean

measure of central tendency that utilizes all of the observed values in the collection.

median

measure of central tendency which is also a measure of location.

Descriptive Statistics

methods concerned with collecting, describing, and analyzing a set of data without drawing conclusions (or inferences) about a large group

Inferential Statistics

methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data

Median

middle observation of the array or average of the two middle values in the array

Pearson's first coefficient of skewness

used when there is one mode

confidence level

other name of Confidence coefficient

relative frequency

other name of a posteriori

classical probability

other name of a priori

n = the number of pairs; the sample size, n, is equal to the number of pairs in the sample.

paired sampling's n

sample

part or subset of the population from which the information is collected.

PROPORTION

part/total

pdf = derivative of the cdf; cdf = integral of the pdf

pdf and cdf relationship

Simply put, any pdf must satisfy: 1. f(x) ≥ 0 for all x, and 2. the area under the entire density curve is 1, that is, ∫_{−∞}^{∞} f(x) dx = 1

pdf properties

Target population

population from which information is desired

Ratio

(Levels of Measurement) height of an adult (in cms.)

Nominal

(Levels of Measurement) major island group

Interval

(Levels of Measurement) ratio with no absolute zero

Nominal

(Levels of Measurement) religion

Nominal

(Levels of Measurement) sex, religion, major island group

Ordinal

(Levels of Measurement) student ranking

Ratio

(Levels of Measurement) the speed of a car (in kms/hr)

Random Samples

A sample that is selected using probability sampling.

Negatively Skewed Distribution

Sk < 0

Symmetric Distribution

Sk = 0

Positively Skewed Distribution

Sk > 0

mode

best used for categorical data

GEOMETRIC BY MENDENHALL

number of trials until the first success

NEGATIVE BINOMIAL BY MENDENHALL

number of trials until the rth success

Paired Sampling

It may be done using: ◦ the same sample subjects for the two populations (before and after) ◦ pairing of subjects from the two groups with respect to an extraneous variable (the pairing removes the effects of the extraneous variable).

Sampling unit

unit of the population that we select in our sample

the square of the unit used to measure X.

unit of the variance

Upper confidence limit

upper endpoint

range

use this measure of dispersion if the data are concentrated

POISSON

used for modelling the number of occurrences of a certain event (usually a rare event such as accidents and errors) in a specified space or time interval.

Confidence coefficient

(1 − α)100%

Nominal

(Levels of Measurement) distinct

CONTINUOUS

(DISCRETE or CONTINUOUS) Cooking Time in minutes

DISCRETE

(DISCRETE or CONTINUOUS) Height of a person (Monster, Model, Normal, Elf, and Smurf)

CONTINUOUS

(DISCRETE or CONTINUOUS) Height of a post in centimeters

CONTINUOUS

(DISCRETE or CONTINUOUS) Hours of sleep

DISCRETE

(DISCRETE or CONTINUOUS) Number of views a YouTube video has

Ordinal

(Levels of Measurement) distinct, order

Set of all shuttlecocks of the particular brand produced this year

(Define population) A manufacturer of badminton shuttlecocks wishes to determine how many games their brand of shuttlecock (produced this year) will last on the average.

Set of all children below 12 years old in Metro Manila in 2011

(Define population) The Department of Health is interested in determining the percentage of children below 12 years old infected by the Hepatitis B virus in Metro Manila in 2011.

collection of all households in Metro Manila.

(Define population) to determine the average expenditure of all households in Metro Manila

Collection of all graduates of the University from the years 2006 to 2010

(Define population) The Office of Admissions is studying the relationship between the score in the entrance examination during application and the general weighted average (GWA) upon graduation among graduates of the university from 2006 to 2010.

the collection of all households in Quezon City or the collection of all households in Manila

(Define sample) To determine the average expenditure of all households in Metro Manila. ◦ If we were to delimit the scope of the study, possibly due to budget constraints, then we would have to redefine the population of interest. This time we can delimit the scope of the study to include only Quezon City or Manila.

Inferential

(Inferential or Descriptive) A car manufacturer wishes to estimate the average lifetime of batteries by testing a sample of 50 batteries.

Descriptive

(Inferential or Descriptive) Janine wants to determine the variability of her six exam scores in Algebra.

Descriptive

(Inferential or Descriptive) Ms. Macasaet wants to determine the proportion spent on transportation during the past four months using the daily records of expenditure that she keeps.

Inferential

(Inferential or Descriptive) The Golden State Warriors wish to estimate their chance of winning in the championship next season based on their average scores last season.

Inferential

(Inferential or Descriptive) Toyota wishes to estimate the average lifetime of all their batteries by testing a sample of 50 batteries.

Inferential

(Inferential or Descriptive) A bowler wants to estimate his chance of winning a game based on his current season averages and the averages of his opponents

Descriptive

(Inferential or Descriptive) A politician wants to know the exact number of votes he received in the last election

Inferential

(Inferential or Descriptive) A politician would like to estimate, based on an opinion poll, his chance of winning in the upcoming election

Descriptive

(Inferential or Descriptive) The marketing research group of a company wishes to determine the number of families not eating three times a day in the sample used for their survey.

Descriptive

(Inferential or Descriptive) Tina wants to determine her average score in her past three exams in Stat 101.

Inferential

(Inferential or Descriptive) To determine if reforestation is effective, we can take a representative portion of denuded forests and use inferential statistics to draw conclusions about the effect of reforestation on all denuded forests.

Descriptive

(Inferential or Descriptive) Tobey wants to know the proportion of his allowance spent on transportation during the past three weeks, using his personal daily expenditure records.

Descriptive

(Inferential or Descriptive) A bowler wants to find his bowling average for the past 12 games

Interval

(Levels of Measurement) distinct, order, fixed spacing

Ordinal

(Levels of Measurement) faculty rank

Ratio

(Levels of Measurement) weight of a newborn baby (in kgs.)

Interval

(Levels of Measurement) Centigrade

Nominal

(Levels of Measurement) Dichotomous

Ratio

(Levels of Measurement) Kelvin

Ordinal

(Levels of Measurement) Performance rating

Ratio

(Levels of Measurement) allowance of a student (in pesos),

Ordinal

(Levels of Measurement) can be arranged in order, but the differences are not fixed

Nominal

(Levels of Measurement) cannot be arranged in order

Ratio

(Levels of Measurement) distance traveled by an airplane (in kms.),

Probability Sampling

(Probability vs Non-Probability Sampling) a person is to be chosen from the 35 people in a class, numbered 1-35

Probability Sampling

(Probability vs Non-Probability Sampling) dice

Non-Probability Sampling

(Probability vs Non-Probability Sampling) a poll answered only by one's orgmates

Qualitative

(Qualitative or Quantitative) Color: 1- Black, 2- White

Qualitative

(Qualitative or Quantitative) Color: Black, White

Quantitative

(Qualitative or Quantitative) Age: 20, 32, 34

Random Sample from an Infinite Population

(X1, X2 ,...,Xn ) if the values of X1 , X2 ,...,Xn are n independent observations generated from the same probability distribution.

deterministic model

(deterministic/probabilistic) area of a circle, hypotenuse of a right triangle

probabilistic model

(deterministic/probabilistic) tossing a coin, medical experiment

higher

(higher/lower) confidence level = better = fewer errors

small

(small/large) measure of dispersion level = observations are not too different from each other

large

(small/large) measure of dispersion level = observations are very different from each other

small

(small/large) variance = observations are concentrated about the mean

large

(small/large) variance = observations are far or very different from the mean

Positively Skewed Distribution

- Concentration is on the left side, tapering-off on the right side (Skewed to the right)

Point Estimator

- It is a rule or a formula that tells us how to calculate a single value to estimate the unknown value of the parameter

Variance

-average squared difference of each observation from the mean
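
The dispersion cards (variance, standard deviation, range) can be tied together in a few lines. A sketch with made-up scores; it divides by n, which matches the "average squared difference" wording and the population variance, whereas `statistics.variance` divides by n − 1 for the sample variance.

```python
from math import sqrt
from statistics import mean, pvariance

scores = [70, 75, 80, 85, 90]           # made-up observations

mu = mean(scores)
# Average squared difference of each observation from the mean.
variance = sum((x - mu) ** 2 for x in scores) / len(scores)
std_dev = sqrt(variance)                # positive square root of the variance
data_range = max(scores) - min(scores)  # largest minus smallest value

print(mu, variance, std_dev, data_range)
print(variance == pvariance(scores))    # compare with the library's population variance
```

Note the units: the variance is in squared score units, while the standard deviation and the range are in the original units.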

Discrete

- a variable which can assume finite, or, at most, countably infinite number of values; usually measured by counting or enumeration

Continuous

- a variable which can assume infinitely many values corresponding to a line interval

Census

- complete enumeration

Raw Data

- data in their original form; as you collect them

Elementary unit/Element

- is a member of the population whose measurement on the variable of interest is what we wish to examine

Type II Error

- is the error made by accepting (not rejecting) the null hypothesis when it is false.

Sampling frame

- listing of all the individual units in the population

Array

- ordered arrangement of data according to magnitude

Point Estimate

- realized value of the random variable (estimator)

Statistical Methods of Applied Statistics

- refer to procedures and techniques used in the collection, presentation, analysis, and interpretation of data

Measure of Central Tendency

- single value that is used to identify the "center" of the data

Rule Method

- stating a rule that the elements must satisfy in order to belong in the set

Random Component

- takes care of the variability

minimum

-An optimal property of interval estimators is (minimum, maximum) length

TEST OF INDEPENDENCE

-Contingency tables, r rows and c columns

TEST OF INDEPENDENCE

-Used to test whether or not two categorical variables are related

Reliability

-an estimator that does not differ much in value from sample to sample

1

0!

1. t-distribution 2. Chi-square distribution

2 NEW DISTRIBUTIONS

Uniform Experiments

A point x is selected at random from the interval [a,b] in such a way that the probability that X will belong to any subinterval of [a,b] is proportional to the length of that subinterval.

Bernoulli trial

A random experiment whose outcomes can be classified as "success" or "failure"

Discrete Random Variable

A random variable defined over a discrete sample space is called

independent

A and B; A and Bc; B and Ac; Ac and Bc

PROBABILITY SAMPLING

All the elements of the sampled population must have an opportunity of being a part of the sample.

Memory-Less Property of the Exponential Distribution

An "old" functioning component has the same lifetime distribution as a "new" functioning component or that the component is not subject to fatigue or to wear.
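
The memory-less property says P(X > s + t | X > s) = P(X > t). A numerical check, assuming an exponential lifetime with survival function P(X > t) = e^(−rate·t); the rate and the times s and t are arbitrary illustrative values.

```python
from math import exp, isclose

def survival(t, rate):
    """P(X > t) for an exponential lifetime with the given rate."""
    return exp(-rate * t)

rate = 0.5      # illustrative failure rate
s, t = 3.0, 2.0

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = survival(s + t, rate) / survival(s, rate)
print(conditional, survival(t, rate))
print(isclose(conditional, survival(t, rate)))
```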

A priori A posteriori Subjective

Approaches to Assigning Probabilities

chi-square

As the degrees of freedom increase, so does the mean and the variance.

true

As the degrees of freedom increase, the variance of the t-distribution goes to 1

Test for the POPULATION MEAN

Assumes that the random sample is taken from a normal distribution

Test for the POPULATION PROPORTION

Assumes that the true population proportion is not close to 0 or 1 ◦ Assumes that n is large

The test for independence is valid only if: 1. at least 80% of the cells have expected frequencies of at least 5; and 2. no cell has an expected frequency ≤ 1. If many expected frequencies are very small, researchers commonly combine categories of variables to obtain a table having larger cell frequencies. Generally, one should not pool categories unless there is a natural way to combine them.
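
A sketch of how the expected frequencies and the validity rule might be checked for an r x c contingency table. The observed counts are made up; the expected count per cell uses the usual (row total)(column total)/n, and the statistic is the usual sum of (observed − expected)² / expected, with (r − 1)(c − 1) degrees of freedom.

```python
# Observed counts in an r x c contingency table (made-up data).
observed = [
    [20, 30, 10],
    [30, 20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: (row total)(column total) / n.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Validity rule: at least 80% of cells with expected >= 5, none with expected <= 1.
cells = [e for row in expected for e in row]
share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
rule_ok = share_at_least_5 >= 0.80 and min(cells) > 1

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(rule_ok, round(chi_square, 3), df)
```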

Assumptions for Test of Independence

Monotonicity Property

B ⊆ A, P(B) ≤ P(A)

Binomial => the number of trials is fixed => the number of successes is a random variable. Geometric => the number of trials is a random variable => the number of successes is fixed at 1.

Binomial vs Geometric

Negatively Skewed Distribution

Concentration is on the right side, tapering-off on the left side (Skewed to the left)

Inferential Statistics

Deals with the techniques used in analyzing the sample data that will lead to generalizations about a population from which the sample came from

Descriptive Statistics

Deals with the techniques used in the collection, presentation, organization, and analysis of the data on hand.

TYPE I ERROR: Saying that the mean number is less than 100, when it is actually greater than or equal to 100. TYPE II ERROR: Saying that the mean number is greater than or equal to 100, when it is actually less than 100.

Define Type I and Type II error: A specific brand of tissue pull-ups claims that each pack contains 100 pulls. Ho: The mean number of pull-ups for a pack of tissue is greater than or equal to 100. Ha: The mean number of pull-ups for a pack of tissue is less than 100.

Probabilistic Model

Describes a phenomenon whose outcomes now depend on some random component

Unbiasedness, Reliability

Desirable Properties of a Point Estimator

Ratio Interval Interval

Determine the level of measurement used in each item. a. Height of a building in meters b. Age (0 years old, 1 year old, 2 years old,...) c. IQ

CRITICAL VALUE

Determines the "cut-off".

Quartiles

Divide the ordered observations into 4 equal parts.

Deciles

Divides the ordered observations into 10 equal parts

Bernoulli

Either success or not (failure)

True

Every sample point must be assigned a real number.

population mean and population proportion

Examples of parameters are ___ and ___

Interval - Frequency

Frequency Histogram Parts: Horizontal - Vertical

1. Frequency Histogram 2. Stem and Leaf Display

Graphical Displays of Data Types

0 or 1

However, once a sample has been observed, the CI is not anymore random and thus the probability that it will enclose the true parameter value is either ____ OR ____

Independent Sampling

If the sample sizes are unequal and there is no matching, this is immediately the sampling type for the two populations

"weighted average".

If X is discrete, its expected value is similar to a ____
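
For a discrete X, the expected value is the sum of each mass point times its probability, so mass points with larger probabilities carry heavier weights. A minimal sketch using a made-up example, the number of heads in two tosses of a fair coin.

```python
from fractions import Fraction

# PMF of X = number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

assert sum(pmf.values()) == 1    # probabilities over the mass points sum to 1

# Weighted average of the mass points; the weights are the probabilities.
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)
```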

a posteriori

If a random experiment is repeated many times under uniform conditions, use the empirical probability of event A to assign its probability

a priori

If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes belong to event A, then P(A) = n/N

standard normal random variable

If the normal random variable has mean 0 and variance 1

true

If the p-value is small, the observed data is inconsistent with Ho. Thus, we tend to reject Ho for smaller p-values.

RIGHT INTERPRETATION OF CONFIDENCE COEFFICIENT

If we repeatedly take samples of size n, for each of which we compute (1-α)100% confidence intervals, then (1- α)100% of these computed confidence intervals will contain the unknown value of the parameter
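
The repeated-sampling interpretation can be simulated: build many intervals from fresh samples and count how often they contain the true parameter. A sketch assuming a normal population with known sigma and a z-interval for the mean; all parameter values, the seed, and the function name `coverage` are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=30, confidence=0.95, reps=2000, seed=1):
    """Share of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        # Each computed interval either contains mu or it does not.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

print(coverage())   # should land near the confidence coefficient
```

The probability statement belongs to the procedure (the estimator) before sampling; each individual computed interval simply counts as a hit or a miss.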

true

If A1, A2, ..., An are independent, then the complements A1c, A2c, ..., Anc are independent.

CENTRAL LIMIT THEOREM

If X̄ is the mean of a random sample of size n from a large or infinite population with mean μ and variance σ² (not necessarily normal), then the sampling distribution of X̄ is approximately normal with mean μ and variance σ²/n when n is sufficiently large.
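
A simulation sketch of the Central Limit Theorem, assuming a non-normal population, here Uniform(0, 1) with mean 1/2 and variance 1/12. The sample size, number of replications, and seed are arbitrary; the means of repeated samples should center on μ with variance close to σ²/n.

```python
import random
from statistics import mean, pvariance

def sample_means(n, reps=5000, seed=2):
    """Means of reps random samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [mean(rng.random() for _ in range(n)) for _ in range(reps)]

mu, sigma2 = 0.5, 1 / 12       # mean and variance of Uniform(0, 1)
n = 40
means = sample_means(n)
print(mean(means), mu)                  # center of the sampling distribution
print(pvariance(means), sigma2 / n)     # spread of the sampling distribution
```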

Inferential Statistics

Information from the sample is used to characterize the population from where it is drawn

pdf

Instead of getting the probability of X being equal to a particular value, the probability of X falling between an interval [a, b] is computed.

Length of the interval, Level of Confidence

Interval Estimation: Measure of "Correctness"

true

It has been the practice that you work in the hopes of concluding Ha.

Continuous Random Variable

It is a random variable that can take on infinitely many values.

Test Statistic

It is a statistic whose value is calculated from sample measurements and on which the statistical decision will be based

Two-tailed Test of Hypothesis

It is a test where the alternative hypothesis does not specify a directional difference for the parameter of interest.

One-tailed Test of Hypothesis

It is a test where the alternative hypothesis specifies a one-directional difference for the parameter of interest.

CRITICAL REGION

It is also called the rejection region.

PERMUTATION

It is an ordered arrangement of r distinct elements selected from the set Z. It can be represented by an ordered r-tuple with distinct coordinates.

ALTERNATIVE HYPOTHESIS

It is denoted by Ha.

Null Hypothesis

It is denoted by Ho.

ALTERNATIVE HYPOTHESIS

It is sometimes referred to as the research hypothesis

Null Hypothesis

It must contain the condition of equality and must be written with the symbol =, ≤, or ≥.

Null Hypothesis

It represents what the experimenter doubts to be true.

Discrete Random Variable

Its sample space must be discrete or countable; There are no in between values.

Continuous Random Variable

Its set of possible values consists of an entire interval on the number line - that is, if for some a < b, any number x between a and b is possible

random variable

Maps each element of the sample space to one and only one real number.

Negatively Skewed Distribution

Mean < Median < Mode

Symmetric Distribution

Mean = Median = Mode

Positively Skewed Distribution

Mean > Median > Mode
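
A made-up dataset with a long right tail shows the ordering Mean > Median > Mode. The skewness lines use the common textbook forms of Pearson's coefficients, (mean − mode)/s and 3(mean − median)/s; using the population standard deviation here is an assumption, and the course may divide by the sample standard deviation instead.

```python
from statistics import mean, median, mode, pstdev

data = [1, 2, 2, 2, 3, 4, 5, 9, 12]     # made-up, long right tail

m, md, mo = mean(data), median(data), mode(data)
print(m, md, mo)

# Pearson's coefficients of skewness (common textbook forms).
sk1 = (m - mo) / pstdev(data)        # first coefficient: uses the mode
sk2 = 3 * (m - md) / pstdev(data)    # second coefficient: uses the median
print(sk1 > 0, sk2 > 0)              # both positive for a right-skewed set
```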

true

Narrower intervals and higher confidence levels are both associated with better estimates, but they work against each other: for a fixed sample size, raising the confidence level widens the interval.

P(A) ≥ 0

Non-negativity

mean

Outliers may greatly affect the value of this

pdf

For a continuous random variable, P(X = x) = 0; the event is not truly impossible, but it almost never happens.

Multiplication Rule

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) x P(A2|A1) x P(A3|A1 ∩ A2) x ... x P(An|A1 ∩ A2 ∩ ... ∩ An−1)
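
A worked instance of the multiplication rule for the probability that several events all occur, assuming three cards drawn without replacement from a standard 52-card deck. The events and variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# A1, A2, A3: the 1st, 2nd, 3rd card drawn (without replacement) is an ace.
p_a1 = Fraction(4, 52)
p_a2_given_a1 = Fraction(3, 51)       # one ace and one card already gone
p_a3_given_a1_a2 = Fraction(2, 50)    # two aces and two cards already gone

p_all_aces = p_a1 * p_a2_given_a1 * p_a3_given_a1_a2
print(p_all_aces)

# Cross-check by counting: choose 3 of the 4 aces out of all 3-card hands.
print(p_all_aces == Fraction(comb(4, 3), comb(52, 3)))
```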

P(A) - P(A ∩ B)

P(A ∩ Bc) = ?

Theorem of Total Probability

P(A) = P(A|B)P(B) + P(A|Bc)P(Bc)
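
A small numeric instance of the theorem of total probability. The scenario (two machines, defect rates) and all probabilities are hypothetical.

```python
from fractions import Fraction

# Hypothetical setup: B = "item came from machine 1", A = "item is defective".
p_b = Fraction(3, 5)
p_a_given_b = Fraction(1, 50)     # defect rate of machine 1
p_a_given_bc = Fraction(1, 20)    # defect rate of the other machine

p_bc = 1 - p_b                    # complement rule: P(Bc) = 1 - P(B)
p_a = p_a_given_b * p_b + p_a_given_bc * p_bc
print(p_a)
```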

1 - P(A|D)

P(Ac | D) = ?

1 - P(A)

P(Ac) = ?

Independence of Events

P(A|B) = P(A) if P(B) > 0; P(B|A) = P(B) if P(A) > 0; P(A ∩ B) = P(A) x P(B)

P(A) + P(B) - P(A ∩ B)

P(A∪ B) = ?

P value

P(obtaining data at least as "extreme" as what has been observed|Ho is true)
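
A sketch of a right-tailed p-value for Ho: μ = 14 vs. Ha: μ > 14, assuming a z-test with known population standard deviation. The sample size, sigma, observed mean, and α are hypothetical numbers chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Ho: mu = 14 vs. Ha: mu > 14, z-test with known sigma (all numbers illustrative).
mu_0, sigma, n = 14, 2.0, 36
x_bar = 14.7                      # hypothetical observed sample mean
alpha = 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))   # test statistic
p_value = 1 - NormalDist().cdf(z)        # P(Z >= z | Ho is true), right-tailed

print(round(z, 2), round(p_value, 4))
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

A small p-value means the observed data would be unusual if Ho were true, which is why smaller p-values lead toward rejecting Ho.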

0

P(∅) = ?

0

P(∅|D) = ?

1. Independent 2. Paired 3. Paired 4. Independent

Paired or Independent: 1. The principal wishes to determine if grade 6 boys are better than grade 6 girls in math. A random sample of boys was selected. A random sample of girls was also selected. All students were asked to take a standardized exam in math. 2. The police want to assess the effect of an obvious radar trap on the speed of cars. Ten cars were randomly selected on a highway, and their speeds were measured before the radar trap came into view and after they passed the radar. 3. To test the effect of background music on productivity, factory workers were observed. For one week they worked with no music. For the next week, they worked with background music. 4. A drug company wishes to examine the effectiveness of a new drug in reducing blood pressure. A group of 50 people was given the new drug, while another group of 50 people was given a placebo.

99%

Percentile limit / percentiles only go up to this value

Permutation

Permutation or Combination: Assigning students to their seats on the first day of class.

Combination

Permutation or Combination: Selecting 3 students to attend a meeting in Cebu.

Permutation

Permutation or Combination: Selecting a lead and an understudy for a play.
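
The permutation/combination cards can be checked with Python's `math.perm` and `math.comb` (available from Python 3.8). The class size of 10 is an illustrative assumption.

```python
from math import comb, perm

n_students = 10     # illustrative class size

# Lead and understudy: the two roles differ, so order matters (permutation).
print(perm(n_students, 2))

# Three students sent to a meeting: only the group matters (combination).
print(comb(n_students, 3))
```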

◦ It is a function whose value ranges from 0 to 1. ◦ Its domain is the set of all real numbers while its range is the interval [0,1]. ◦ It is a nondecreasing function. ◦ Every random variable has one and only one CDF.

Properties of CDF

Non-negativity, Norming and Finite Additivity

Properties of Probability

(1) P(X = x) = f(x) > 0 if x is a mass point (2) ∑ f(x) = 1 across all mass points (3) P(X ∈ A) = ∑_{x ∈ A} f(x)

Properties of a PMF

P25, P50, P75

Q1, Q2, Q3 equivalent to percentiles

SRSWR

SRSWOR or SRSWR: a lotto draw where the drawn ball is returned to the fishbowl

SRSWOR

SRSWOR or SRSWR: prerog

Census, Sampling. Probability Sampling, Non-probability Sampling

Sampling Designs

1. F_X(−∞) = lim_{x→−∞} F_X(x) = 0 and F_X(∞) = lim_{x→∞} F_X(x) = 1. 2. F_X(·) is a monotone, nondecreasing function; i.e., F_X(a) ≤ F_X(b) for any a < b. 3. F_X(·) is continuous from the right; that is, lim_{h→0+} F_X(x + h) = F_X(x) for all x.

Satisfying CDF

Mean

Sum of all the values in the collection divided by the total number of elements in the collection.

Generalized Basic Principle of Counting

Suppose an experiment can be performed in k stages. If there are n1 distinct possible outcomes in the first stage, and if for each of these n1 outcomes there are n2 distinct possible outcomes in the second stage, and if for each of the n1 x n2 outcomes of the first 2 stages, there are n3 distinct possible outcomes in the third stage; continuing in this manner, we reach the last stage where there are nk distinct possible outcomes for each of the outcomes of the first (k-1) stages, then there are n1 x n2 x ... x nk possible outcomes of the experiment.

Basic Principle of Counting

Suppose an experiment can be performed in two stages. If there are n distinct possible outcomes in the first stage of the experiment and if, for each outcome of the first stage, there are m distinct possible outcomes in the second stage, then there are n x m possible outcomes of this experiment
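
The counting principle can be verified by brute-force enumeration. The sketch below assumes a hypothetical three-stage experiment (a coin toss, a die roll, and a choice of color) and compares the enumerated outcomes with the product n1 x n2 x n3.

```python
from itertools import product
from math import prod

# Hypothetical three-stage experiment: coin toss, die roll, color choice.
stage_outcomes = [
    ["H", "T"],
    [1, 2, 3, 4, 5, 6],
    ["red", "blue", "green"],
]

# Enumerate every possible outcome of the whole experiment.
outcomes = list(product(*stage_outcomes))

# Count predicted by the generalized principle: n1 x n2 x ... x nk.
count_by_rule = prod(len(stage) for stage in stage_outcomes)
```

With k = 2 stages this reduces to the basic n x m rule.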

Random Sample from a Finite Population

Suppose we select n distinct elements from a population consisting of N elements, using a particular sampling method.

Independent sampling, Paired/Related sampling

TWO TYPES OF SAMPLING FROM 2 POPULATIONS

                    Ho is true      Ho is false
Reject Ho           Type I error    Correct
Do not reject Ho    Correct         Type II error

Table of Errors
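
The table of errors amounts to a lookup on two yes/no facts. A minimal sketch, with the hypothetical helper name `classify_decision`:

```python
def classify_decision(reject_h0: bool, h0_true: bool) -> str:
    """Return the cell of the table of errors for a decision and the true state."""
    if reject_h0 and h0_true:
        return "Type I error"    # rejected a true Ho
    if not reject_h0 and not h0_true:
        return "Type II error"   # failed to reject a false Ho
    return "Correct decision"

# Example: Ho is true but the test rejects it.
result = classify_decision(reject_h0=True, h0_true=True)
```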

PROBABILITY SAMPLING

The chance that an element will be included in the sample need not be equal for all elements.

PROBABILITY SAMPLING

The chance that the element will be included in the sample can be determined.

true

The hypothesized value of 𝜇 or P can be obtained from previous studies or from knowledge about the population

EXPONENTIAL

The length of the time interval between successive happenings follows this distribution, provided that the number of happenings in a fixed time interval has a Poisson distribution

True

The mass points with larger chances of occurrence have heavier weights in Expected value

specification, scope

The of the _____ of interest depends upon the ____ of the study

before

The parameter must be identified (before/after) analysis

Random Samples

The probabilities of including an element need not be equal for all elements. It just needs to be known and nonzero.

mass points

The values of the discrete random variable X for which f(x) > 0

estimation and hypothesis testing

There are two ways statistical inference about a parameter can be made from a sample: through _______ and by ______.

Wrong interpretation! Remember: the probability statement is about the estimator, not the estimate. The estimator is a random variable, and only a random variable can have a probability, not its realized value (the estimate).

There is a (1-𝛼)100% probability that the true value of the parameter falls within the computed interval.
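
The confidence level describes the interval procedure over repeated sampling, not any single computed interval. The simulation below is a sketch under stated assumptions: a normal population with known sigma, a z-based interval for the mean, and made-up values for mu, sigma, and n. The function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=50.0, sigma=10.0, n=25, alpha=0.05, reps=2000, seed=1):
    """Fraction of simulated z-intervals that enclose the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    half_width = z * sigma / n ** 0.5         # known-sigma interval half-width
    hits = 0
    for _ in range(reps):
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage()  # should land near 1 - alpha = 0.95
```

Each individual interval either contains mu or does not; only the long-run proportion of intervals that capture mu is close to (1-α)100%.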

Paired Sampling

This is accomplished by "matching" the measurements in two populations.

CRITICAL VALUE

This is the value that you compare to the test statistic to draw the conclusion of rejecting or not rejecting the null hypothesis.

Descriptive and Inferential

Two Branches of Statistics

1. CDF Technique 2. MGF Technique

Two Transformation Techniques

Null and alternative Hypothesis

Two Types of Hypotheses

Random Sample from a finite population, Random Sample from an infinite population

Two Types of Random Samples

roster method and rule method

Two Ways of Specifying a Set

Test for the POPULATION MEAN and Test for the POPULATION PROPORTION

Two types of Single Population Test

One-tailed test, Two-tailed test

Types of Tests of Hypothesis

Discrete, Continuous

Types of Random Variable

Descriptive Statistics, Inferential Statistics

Types of Statistical Methods of Applied Statistics

HYPOTHESIS TESTING

Using information from the sample to either support or disprove a statement about a certain characteristic of the population

HYPOTHESIS TESTING

Usually the conjecture concerns one of the unknown parameters of the population.

TYPE I ERROR: Saying that the mean number is not 100, when it is 100. TYPE II ERROR: Saying that the mean number is 100, when it is not.

What are the errors: A specific brand of tissue pull-ups claim that each pack contains 100 pulls. Ho: The mean number of pull-ups for a pack of tissue is equal to 100. Ha: The mean number of pull-ups for a pack of tissue is not equal to 100.

TYPE I ERROR: Saying the bridge is not risky, when it is risky. TYPE II ERROR: Saying the bridge is risky, when it is not risky.

What are the errors: An inspector has to choose between certifying a bridge as risky or saying that the bridge is not risky. Ho: The bridge is risky. Ha: The bridge is not risky

true

When actually conducting a hypothesis test, we operate under the assumption that the parameter is equal to a specific value.

same

While the variance is in the squared unit of the measurements, the standard deviation possesses the ____ unit as the measurements.

P(Ω) = 1

[Norming]

Positively Skewed Distribution

^\______

Symmetric Distribution

__/^\__

Negatively Skewed Distribution

_______/^

random variable

a function whose value is a real number that is determined by each sample point in the sample space.

Measure of Skewness

a single value that indicates the degree and direction of asymmetry.

Quantitative

a variable that takes on numerical values representing an amount or quantity

Qualitative

a variable that yields categorical responses

simple random sampling without replacement (SRSWOR)

all the n elements in the sample must be distinct from each other.

Stem and Leaf Display

alternative method for describing a set of data and presents a histogram-like picture of the data

HYPOTHESIS TESTING

an area of statistical inference in which one evaluates a conjecture about some characteristic of the parent population based upon the information contained in the random sample.

E_ij's

are the expected frequencies if the variables are independent

a priori

assigns probabilities to events before the experiment is performed

A posteriori

assigns probabilities to events by repeating the experiment a large number of times.

Subjective Probability

assigns probabilities to events using intuition, personal beliefs, and other indirect information

Variance of X

average squared deviation between the realized value of X and its mean μ
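
A short worked computation may help here. The sketch assumes a hypothetical fair six-sided die and computes the mean as a probability-weighted sum and the variance as the probability-weighted average squared deviation from that mean.

```python
from fractions import Fraction

# Hypothetical PMF: fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Expected value: each mass point weighted by its probability.
mu = sum(x * p for x, p in pmf.items())

# Variance: weighted average of squared deviations from mu.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
```

For the fair die this gives a mean of 7/2 and a variance of 35/12.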

Paired Sampling

before vs after, twins.

"collection of", "set of"

beginning phrase of population and sample

Statistics

branch of science that deals with the collection, presentation, organization, analysis and interpretation of data.

Graphical Displays of Data

can give information on the location, spread, extremes, and shape of the distribution

n(Ω)

cardinality

variable

characteristic or attribute of the elements in a collection that can assume different values for the different elements.

parameter

characteristics of the population

equal to v

mean of the chi-square distribution with v degrees of freedom

2v

variance of the chi-square distribution with v degrees of freedom

Qualitative or Quantitative, Discrete or Continuous

classification of variables

universe/population

collection of all elements under consideration in a (statistical) inquiry

Sample Space

collection of all possible outcomes of a random experiment

parametric family of density functions

collection of density functions that is indexed by a quantity called a parameter.

Sampled population

collection of elements from which the sample is actually taken

Data

collection of observations

parameter (θ)

constant that determines the specific form of the density function

Abstract Model

description of the essential properties of a phenomenon that is formulated in mathematical terms

not unique

description of the sample space is (unique/not unique)

Ratio

distinct, order, fixed spacing, absolute zero

beta

distribution of R² under the standard regression assumptions.

Chi-square distribution

distribution skewed to the right

Percentiles

divide the ordered observations into 100 equal parts.

Median

divides the ordered set of observations into two equal parts.

domain: sample space; counterdomain: the set of all real numbers

domain and range of random variables

GAMMA

extension of the exponential distribution where we now wait for the rth occurrence of an event.

DISCRETE UNIFORM

used when looking for the probability of obtaining a particular value of X

empty set (∅)

impossible event

hypergeometric experiment

involves the selection of a sample of size n using simple random sampling without replacement from a population consisting of N elements, K of which fall in the category of "success" and the remaining N-K in the category of "failure".
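
The probability of x successes in such an experiment is given by the standard hypergeometric PMF, C(K, x) C(N−K, n−x) / C(N, n), which is not stated on the card. A sketch with made-up values N = 10, K = 4, n = 3, and a hypothetical helper name:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in an SRSWOR sample of size n from N elements, K of them successes."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

# Hypothetical population: N = 10 elements, K = 4 successes, sample size n = 3.
p_one = hypergeom_pmf(1, N=10, K=4, n=3)
total = sum(hypergeom_pmf(x, N=10, K=4, n=3) for x in range(0, 4))
```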

hypothesis

is a claim or statement about the population parameter

Simple random sampling (SRS)

is a method of selecting n units out of the N units in the population in such a way that every distinct sample of size n has an equal chance of being drawn.

Random Experiment

is a process that can be repeated under similar conditions but whose outcome cannot be predicted with certainty beforehand.

PROBABILITY

is a quantity between 0 and 1.

point estimator

is a single statistic whose realized value is used to estimate an unknown parameter

INDICATOR FUNCTIONS

is the function with domain Ω and counterdomain {0,1}

NORMAL

is the most widely known and used probability distribution

Factorial notation

is the compact representation for the product of the first n consecutive positive integers. n! = n × (n − 1) × (n − 2) × ... × 2 × 1

Type I Error

is the error made by rejecting the null hypothesis when it is true

P value

is the probability, computed assuming the null hypothesis is true, of getting results at least as extreme as ("worse" than) those observed.

Confidence coefficient

is the probability that the interval estimator encloses the true value of the parameter.

BETA

model linked with many common distributions like the binomial distribution, uniform distribution, and gamma distribution.

Mode

most frequent observed value in the data set

combination

number of distinct r-combinations that can be formed from the n elements of set Z is

NEGATIVE BINOMIAL

number of failures before the rth success.

Binomial

n Bernoulli trials, same probability of success, trials are independent
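
Under these three conditions, the number of successes follows the standard binomial PMF, C(n, x) p^x (1−p)^(n−x), which is not stated on the card. A sketch with made-up values n = 4 and p = 1/2, and a hypothetical helper name:

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): x successes in n independent Bernoulli trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical experiment: 4 tosses of a fair coin, X = number of heads.
p = Fraction(1, 2)
p_two_heads = binomial_pmf(2, n=4, p=p)
total = sum(binomial_pmf(x, n=4, p=p) for x in range(5))
```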

n taken r

nCr

permutation n taken r

nPr read as this
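
Python's `math.perm` and `math.comb` (available from Python 3.8) compute nPr and nCr directly. The group sizes below are made-up numbers chosen to mirror the kind of "permutation or combination" scenarios in this set.

```python
from math import comb, factorial, perm

# Order matters (a lead and an understudy): permutation.
# Hypothetical cast of 5 actors, 2 distinct roles.
lead_and_understudy = perm(5, 2)

# Order does not matter (3 students attending a meeting): combination.
# Hypothetical class of 10 students.
meeting_groups = comb(10, 3)

# Each r-combination can be arranged in r! orders, so nPr = nCr x r!.
relation_holds = perm(10, 3) == comb(10, 3) * factorial(3)
```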

sample variance

not exactly the mean of the squared deviations of the observations from the sample mean, because the sum is divided by n − 1 rather than n.
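
The difference shows up numerically. In Python's `statistics` module, `variance` divides the sum of squared deviations by n − 1 while `pvariance` divides by n. The data values below are made up for illustration.

```python
from statistics import mean, pvariance, variance

# Hypothetical sample of 8 observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

xbar = mean(data)
sum_sq_dev = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations

s2 = variance(data)           # sample variance: sum_sq_dev / (n - 1)
mean_sq_dev = pvariance(data)  # plain mean of squared deviations: sum_sq_dev / n
```

Here the sum of squared deviations is 32, so the sample variance is 32/7 while the mean squared deviation is 4.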

geometric

number of failures until the first success.

variance

this measure of dispersion is not absolute because it is not in the same unit as the measurements

Measurement

process of determining the value or label of the variable based on what has been observed.

Sampling

process of obtaining information from the units in the selected sample

PROPORTION

proportion of elements possessing a characteristic of interest in a collection

random experiment

can be repeated under certain conditions, but you still cannot be sure of the outcome

PROPORTION

quotient obtained when we divide the magnitude of a part by the magnitude of the whole.

GEOMETRIC

random experiment that satisfies the following properties: 1. It consists of repeated independent Bernoulli trials with probabilities of success remaining constant from trial to trial. 2. It will be concluded only after the first success is observed.

Poisson experiment

random experiment that satisfies the following properties: 1. It is possible to partition a specified time or space interval into many smaller non-overlapping subintervals; 2. The number of outcomes that occur in a given subinterval does not depend on the number of outcomes in any other disjoint subinterval; 3. The probability that a single outcome will occur in a very short subinterval is proportional to the length of the subinterval; and, 4. The probability that more than 1 outcome will occur in such a short subinterval is almost zero.

waiting time for the next success

what the random variable X represents in the exponential distribution, the continuous counterpart of the geometric.

interval estimation

range of values

observation

realized value of a variable.

point estimate

realized value of an estimator

Confidence interval estimate

realized value of the interval

Partition

regrouping of the whole set

Confidence Interval estimator

rule that tells us how to calculate two numbers based on sample data that will form an interval where we expect the population parameter to fall with a specified degree of confidence.

probabilistic model

same input, different output

deterministic model

same input, same output

Measure of Skewness

shows the degree of asymmetry, or departure from symmetry of a distribution.

point estimation

single value

flat / tail

skewness hint: the direction follows this side of the curve; positive is right, negative is left

zero

smallest value of a measure of dispersion, meaning all the observations are the same

array

sorted data or ordered data; arranged

comparison, explanation, justification, prediction, estimation

sound decision making

Expected Value

the mean of the distribution of X, because it is really a measure of central tendency

PROBABILITY

the measure of the likelihood that an event will occur.

simple random sampling with replacement (SRSWR)

the n elements in the sample need not be distinct, that is, an element can be selected more than once to be part of the sample.
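
The two schemes map onto two functions in Python's `random` module: `sample` draws without replacement and `choices` draws with replacement. The population of 20 numbered units and the seed are assumptions for illustration.

```python
import random

rng = random.Random(130)          # fixed seed so the sketch is reproducible
population = list(range(1, 21))   # hypothetical population of N = 20 units
n = 5

# SRSWOR: all n selected elements are distinct.
srswor = rng.sample(population, n)

# SRSWR: each draw is made from the full population, so repeats are possible.
srswr = rng.choices(population, k=n)
```

A sample drawn with `sample` can never contain the same unit twice, while one drawn with `choices` may.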

Statistics

the science dealing with data about the condition of a state or community

empty set (∅) and sample space (Ω)

two subsets of the sample space that will always be events

Deterministic Model

type of abstract model that does not leave room for random variation.

Deterministic, Probabilistic

types of abstract model

Median, Mode

unaffected by outliers.
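
A small numerical illustration, using made-up data in which one value is replaced by an extreme outlier:

```python
from statistics import mean, median, mode

# Hypothetical data set, then the same set with its largest value made extreme.
clean = [1, 2, 2, 3, 4]
with_outlier = [1, 2, 2, 3, 400]

summary_clean = (mean(clean), median(clean), mode(clean))
summary_outlier = (mean(with_outlier), median(with_outlier), mode(with_outlier))
```

The mean jumps from 2.4 to 81.6, while the median and the mode both stay at 2.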

Measures of Location

values below which a specified fraction or percentage of the observations in a given set must fall

Standard Error

will be used as a measure of reliability of our statistic.

Rule Method

Ω = { (x,y) | x,y is a _____ }

Roster Method

Ω = { , , , }

Bayes' Theorem

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]
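
A worked instance of Bayes' Theorem for an event B and its complement. The probabilities are made-up numbers and the function name `bayes` is hypothetical.

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a_given_bc):
    """P(B|A) from P(A|B), P(B), and P(A|B^c), using P(B^c) = 1 - P(B)."""
    numerator = p_a_given_b * p_b
    denominator = numerator + p_a_given_bc * (1 - p_b)
    return numerator / denominator

# Hypothetical inputs: P(B) = 1/100, P(A|B) = 9/10, P(A|B^c) = 1/20.
posterior = bayes(Fraction(9, 10), Fraction(1, 100), Fraction(1, 20))
```

With these inputs P(B|A) works out to 2/13, about 0.154, even though P(A|B) is 0.9, because B itself is rare.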

