CTC MATH-1342 Final Exam
population mean
μ = Σx/N (read "mu equals sigma x over N"), where N is the population size
stem-and-leaf display
A device used to organize and group quantitative data while still letting us recover the original data values quickly. Similar to alphabetizing.
tree diagram
A diagram used to show the total number of possible outcomes
Exploratory Data Analysis (EDA)
A field of statistics that uses stem-and-leaf displays, box-and-whisker plots, histograms, etc., to detect patterns and extreme data values quickly.
time plot
A graph showing the data measurements in order over a length of time. (horizontal axis is time)
pictograph
A graph where pictures are used rather than solid bars.
standard normal distribution
A normal distribution with a mean of 0 and a standard deviation of 1.
box-and-whisker plot
A plot that shows how a set of data is distributed. The plot displays five numbers that summarize the data.
subjective probability
A probability assessment that is based on experience, intuitive judgment, or expertise
binomial probability distribution
A probability distribution showing the probability of x successes in n trials of a binomial experiment
chi-square distribution
A family of probability distributions: each is positively skewed (becoming nearly symmetric when df is large), χ² cannot be less than 0, and the distribution approaches a normal distribution as df increases without bound.
distinguishable permutations
If a set of n objects has n1 objects of one kind, n2 of a second kind, n3 of a third kind, and so on, with n = n1 + n2 + n3 + ... + nk, then the number of distinguishable permutations of the n objects is n! / (n1! · n2! · n3! · ... · nk!).
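A minimal Python sketch of this formula, counting how many of each kind appear and dividing n! by the product of their factorials; the word "BANANA" is only an illustrative input.

```python
from collections import Counter
from math import factorial, prod


def distinguishable_permutations(items):
    """Count arrangements of items where objects of the same kind look alike.

    Implements n! / (n1! * n2! * ... * nk!), where n1, ..., nk are the counts
    of each kind and n = n1 + n2 + ... + nk.
    """
    counts = Counter(items)            # n1, n2, ..., nk
    n = sum(counts.values())           # total number of objects
    return factorial(n) // prod(factorial(c) for c in counts.values())


# "BANANA": n = 6 with 3 A's, 2 N's, 1 B, so 6! / (3! * 2! * 1!) = 720 / 12
print(distinguishable_permutations("BANANA"))  # 60
```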
unbiased estimator
A statistic used to estimate a parameter is an unbiased estimator of the parameter if the mean of its sampling distribution is equal to the true value of the parameter. The sample mean x̅ is an unbiased estimator and the best point estimate of the population mean µ.
frequency table
A table that lists the number of times, or frequency, that each data value occurs.
outlier
A value much greater or much less than the others in a data set
pareto chart
A vertical bar graph for qualitative data in which the bars are arranged in order of decreasing height, so frequency is plotted from the most frequent category to the least frequent.
inherent zero
A zero that means "none" (a true absence). Ex: $0 in a bank account, 0 days old, 0 video games.
permutation
An arrangement of objects in which order is important
independent event
An event whose result does not depend on the result of another event
law of large numbers
As the number of trials increases, the empirical probability gets closer and closer to the theoretical probability.
Addition Rule
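For mutually exclusive events A and B, the probability that A or B occurs is the sum of the probabilities of each event: P(A or B) = P(A) + P(B). For events that are not mutually exclusive, P(A or B) = P(A) + P(B) − P(A and B).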
interquartile range
Describes the spread of the middle 50% of a data set: IQR = Q3 − Q1.
percentiles
Values that divide an ordered data set into 100 equal parts. An observation at the Pth percentile is greater than or equal to P percent of all observations.
deciles
Values that divide an ordered data set into ten equal parts with respect to a variable such as income. For example, the first decile of income marks off the 10% of the population with the lowest incomes.
bivariate normal distribution
For any fixed value of x, the corresponding values of y have a distribution that is approximately normal (requirements 2 and 3 of finding correlations)
minimum sample size to estimate µ
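Given a confidence level c and margin of error E, the minimum sample size n needed to estimate the population mean µ is n = (z_c σ / E)², rounded up to the next whole number. If σ is unknown, it can be estimated by s from a preliminary sample.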
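A minimal Python sketch, assuming the standard formula n = (z_c σ / E)² and using the standard library's `statistics.NormalDist` to look up z_c in place of a z-table. The inputs σ = 15 and E = 2 are hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size_mean(c, sigma, E):
    """Minimum n to estimate mu at confidence level c with margin of error E."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)   # two-sided critical value, e.g. 1.96 for c = 0.95
    n = (z_c * sigma / E) ** 2
    return math.ceil(n)                        # sample sizes are always rounded UP


# Hypothetical inputs: c = 95%, sigma = 15, E = 2  ->  (1.96 * 15 / 2)^2 ≈ 216.08
print(min_sample_size_mean(0.95, 15, 2))  # 217
```

Rounding up (never to the nearest integer) is what guarantees the margin of error is no larger than E.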
right-tailed test
If the primary concern is deciding whether a population mean, µ, is greater than a specified value µ₀, we express the alternative hypothesis as Hₐ: µ > µ₀.
left-tailed test
If the primary concern is deciding whether a population mean, µ, is less than a specified value µ₀, we express the alternative hypothesis as Hₐ: µ < µ₀.
fractiles
Numbers that partition, or divide, an ordered data set into equal parts (each part has the same number of data entries). For instance, the median is a fractile because it divides an ordered data set into two equal parts.
quantitative data
Numerical data such as the number of hours it takes to drive to different locations.
Data Collection Methods
Observational study, performing an experiment, simulation, survey
five-number summary
The five numbers that summarize a data set: the minimum, the first quartile Q1, the median Q2, the third quartile Q3, and the maximum. The quartiles are the cutoff points for 1/4, 1/2, and 3/4 of the data.
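A minimal Python sketch of the five-number summary. It assumes one common textbook quartile rule (Q1 and Q3 are the medians of the lower and upper halves, excluding the overall median when n is odd); other textbooks and software use slightly different rules. The data are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least two data values.

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and upper
    halves of the ordered data, leaving out the overall median when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    lower_half = xs[: n // 2]
    upper_half = xs[(n + 1) // 2 :]
    return xs[0], median(lower_half), median(xs), median(upper_half), xs[-1]


summary = five_number_summary([7, 2, 10, 4, 9, 4, 12, 5, 8])  # hypothetical data
low, q1, q2, q3, high = summary
iqr = q3 - q1                     # interquartile range
print(summary, iqr)  # (2, 4.0, 7, 9.5, 12) 5.5
```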
standard deviation for grouped data
s = √[Σ(x − x̅)² · f / (n − 1)], where x is each class midpoint, f is the class frequency, and n = Σf
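A minimal Python sketch of the grouped-data formulas x̅ = Σ(x · f)/n and s = √[Σ(x − x̅)² · f / (n − 1)], with each class represented by its midpoint. The three classes below are hypothetical.

```python
import math


def grouped_mean_sd(midpoints, freqs):
    """Approximate sample mean and standard deviation from a frequency distribution.

    Each class is represented by its midpoint x, counted f times.
    """
    n = sum(freqs)                                             # n = Σf
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n    # x̅ = Σ(x·f)/n
    ss = sum((x - mean) ** 2 * f for x, f in zip(midpoints, freqs))
    return mean, math.sqrt(ss / (n - 1))                       # s = √[Σ(x − x̅)²·f/(n − 1)]


# Hypothetical classes 0-9, 10-19, 20-29 (midpoints 4.5, 14.5, 24.5)
mean, s = grouped_mean_sd([4.5, 14.5, 24.5], [2, 5, 3])
print(mean, round(s, 3))  # 15.5 7.379
```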
relative frequency histogram
Same as a frequency histogram except that the vertical scale measures relative frequencies instead of frequencies (percentages instead of counts). y-axis = class relative frequency; x-axis = data values.
Empirical Rule
States that, in a normal distribution: 1. about 68% of the data values lie within one standard deviation of the mean, 2. about 95% lie within two standard deviations, and 3. about 99.7% lie within three standard deviations.
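The rule's percentages are rounded areas under the normal curve. A short Python sketch that computes the exact areas with the standard library's `statistics.NormalDist` (about 68.3%, 95.4%, and 99.7%):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean: P(-k < z < k)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} SD: {share:.1%}")
```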
Examples of Interval L.O.M.
Temperatures, years
residuals
The difference between the observed value of the response variable and the value predicted by the regression line
dependent variable
The experimental factor that is being measured; the variable that may change in response to manipulations of the independent variable
permutation of n objects taken r at a time
The number of different permutations of n distinct objects taken r at a time: nPr = n! / (n − r)!, where r ≤ n.
degrees of freedom
The number of values that are free to vary once the sample mean is fixed. For a single sample, d.f. = n − 1, where n is the sample size.
dependent event
The outcome of one event affects the probability of the second event.
sample space
The set of all possible outcomes
standard error of the mean
The standard deviation of the sampling distribution of the sample mean: σ_x̅ = σ/√n.
Central Limit Theorem
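As the sample size n increases, the sampling distribution of the means of random samples of size n approaches a normal distribution, whatever the shape of the population; its mean is µ and its standard deviation is σ/√n. A common rule of thumb is n ≥ 30.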
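A small simulation sketch in Python, assuming a uniform(0, 1) population (which is not normal; µ = 0.5, σ = √(1/12)). It draws many samples of size 30 and checks that the sample means center on µ with spread close to σ/√n. The seed and repetition count are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1342)  # fixed seed so the simulation is reproducible


def sample_means(n, reps=5000):
    """Means of `reps` random samples of size n from a uniform(0, 1) population."""
    return [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]


n = 30
means = sample_means(n)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Uniform(0, 1) population: mu = 0.5, sigma = sqrt(1/12), so sigma/sqrt(n) ≈ 0.0527
print(mean_of_means, sd_of_means, math.sqrt(1 / 12) / math.sqrt(n))
```

A histogram of `means` would look approximately bell-shaped even though the population is flat.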
independent variable
The variable that is varied or manipulated by the researcher.
stem and leaf plot
a data plot that uses part of each data value as the stem and the rest as the leaf to form groups or classes. There should be as many leaves as there are observations in the data set (number of leaves = sample size). For data from 0 to 39, the minimum is written 0 | 0 and the maximum 3 | 9. For 3-digit numbers, e.g., 102, the stem is 10 and the leaf is 2 (10 | 2).
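A minimal Python sketch of a stem-and-leaf plot, assuming non-negative integer data where the leaf is the last digit and the stem is everything before it (so 102 gives stem 10, leaf 2). The data values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem to its sorted leaves (non-negative integer data assumed)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # 102 -> stem 10, leaf 2
        plot[stem].append(leaf)
    return dict(plot)


def print_plot(plot):
    for stem in range(min(plot), max(plot) + 1):   # show empty stems too
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem:>3} | {leaves}")


plot = stem_and_leaf([102, 98, 113, 105, 102, 110])   # hypothetical data
print_plot(plot)
```

Every observation contributes exactly one leaf, so the number of leaves equals the sample size.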
time series chart
a graph of a time series, which is a data set composed of quantitative entries taken at regular intervals over a period of time
unimodal
a data set or distribution with a single mode
bimodal
a data set with two modes
mode
the data value that occurs with the greatest frequency. A data set can have more than one mode, or no mode if no value repeats.
negatively skewed or left-skewed distribution
a distribution in which the majority of the data values fall to the right of the mean
bell-shaped or mound-shaped distribution
a distribution shape that has a single peak and tapers off at either end; it is approximately symmetric
J-shaped distribution
a distribution shape that has few data values on the left side and increases as one moves to the right
reversed J-shaped distribution
a distribution shape that has few data values on the right side and increases as one moves to the left
uniform-shaped distribution
a distribution shape whose values are evenly distributed over its range
skewed
a distribution with its peak well to one side: a unimodal, asymmetric distribution that tends to slant, with most of the data clustered on one side and a tail trailing off on the other side.
open-ended distribution
a frequency distribution that has no specific beginning value or no specific ending value
categorical frequency distribution
a frequency distribution used when the data are categorical (nominal)
cumulative frequency distribution
a frequency distribution using cumulative frequencies for the data
probability density function
a function with non-negative values such that probabilities are described by areas under its graph; the total area under the curve is 1
scatter plot
a graph in which ordered pairs of data values are plotted as points in a coordinate plane; used to show the relationship between two quantitative variables
Bar Graph
a graph with bars of uniform width that are evenly spaced with gaps between them; used to display and analyze categorical (qualitative) data
histogram
a graph with bars that represent ranges of values on the horizontal axis, with no gaps between the bars; it shows the frequencies of data values in intervals of the same size and is used for quantitative data. Properties: 1. the horizontal scale is quantitative and measures the data values; 2. the vertical scale measures the frequencies of the classes; 3. consecutive bars must touch.
combination
a grouping of items in which order does not matter
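A short Python sketch contrasting combinations with permutations, using the standard library's `math.comb` and `math.perm` (Python 3.8+). The values n = 5 and r = 3 are illustrative.

```python
from math import comb, factorial, perm

n, r = 5, 3

ordered = perm(n, r)     # nPr = n! / (n - r)!       (order matters)
unordered = comb(n, r)   # nCr = n! / (r! (n - r)!)  (order does not matter)

# Each group of r items can be ordered in r! ways, so nCr = nPr / r!
print(ordered, unordered, ordered // factorial(r))  # 60 10 10
```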
point estimate
a single value estimate of a population parameter.
statistical hypothesis
a statement about a population parameter
frequency distribution
a table that shows classes or intervals of data entries with a count of the number of entries in each class
two-tailed test
a test that indicates the null hypothesis should be rejected when the test value is in either of the two critical regions
z-score or standard score
a type of standard score that tells how many standard deviations a given value is above or below the mean of its group: z = (x − µ)/σ
measures of central tendency
a value that represents a typical or central entry of a data set
discrete variable
a variable that has a finite or countable number of possible outcomes
continuous variable
a variable that has an uncountable number of possible outcomes represented by an interval on a number line
μ = Σx * P(x)
the mean of a discrete random variable
frequency - ƒ
the number of data entries in a class
0 ≤ P(x) ≤ 1
the probability of each discrete random variable is between 0 and 1, inclusive
confidence level for a population mean µ
the probability that the confidence interval contains µ is c, assuming that the estimation process is repeated a large number of times.
rejection region
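the range of values of the standardized test statistic for which the null hypothesis is not probable; if the test statistic falls in this region, H₀ is rejected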
Replication
the repetition of an experiment under the same or similar conditions
complement of E (E')
the set of all outcomes in a sample space that are not included in event E. P(E') = 1 − P(E)
population standard deviation
the square root of the population variance
sample standard deviation
the square root of the sample variance
∑P(x) = 1
the sum of all probabilities is equal to 1
mean
the sum of the data entries divided by the number of entries (the average). x̅ (x-bar) denotes the sample mean; μ (mu) denotes the population mean.
cumulative frequency
the sum of the frequencies of a class and all previous classes. The cumulative frequency of the first class always equals the first class frequency, and the cumulative frequency of the last class equals the sample size.
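A minimal Python sketch computing cumulative frequencies as running totals with `itertools.accumulate`; the class frequencies are hypothetical (n = 25).

```python
from itertools import accumulate

freqs = [7, 5, 6, 4, 3]              # hypothetical class frequencies, n = 25
cumulative = list(accumulate(freqs)) # running totals: each class plus all earlier ones
print(cumulative)  # [7, 12, 18, 22, 25]
```

The first entry equals the first class frequency and the last equals the sample size, matching the rule above.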
class midpoint
the sum of the lower and upper limits of the class divided by two, sometimes called the class mark: midpoint = (lower class limit + upper class limit) / 2
explained variation of regression line
the sum of the squares of the differences between each predicted y-value and the mean of y
unexplained variation of regression line
the sum of the squares of the differences between each observed y-value and its corresponding predicted y-value
total variation of regression line
the sum of the squares of the differences between the y-value of each ordered pair and the mean of y
expected value
the sum Σx · P(x), which is the mean (expected value) of a discrete random variable. Although a probability can never be negative, an expected value can be: e.g., the expected gain from playing the lottery is negative, because the probabilities are positive but the expected amount won is a loss.
Ordinal Level of Measurement (Ordinal = order)
qualitative or quantitative data at this level can be arranged in order, or ranked, but differences between data entries are not meaningful. Ex: 1st-, 2nd-, and 3rd-place schools (or the #1, #2, and #3 movies) can be ranked, but the differences between ranks cannot be determined or are meaningless.
Probability(at least one)
1 - P(none)
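A short worked example in Python of the complement rule P(at least one) = 1 − P(none), using exact fractions; the die-rolling scenario is illustrative.

```python
from fractions import Fraction

# Probability of at least one six in four rolls of a fair die (independent rolls)
p_no_six = Fraction(5, 6)             # P(not a six) on one roll
p_none = p_no_six ** 4                # P(no sixes in four rolls), by the multiplication rule
p_at_least_one = 1 - p_none           # complement rule
print(p_at_least_one, float(p_at_least_one))  # 671/1296, about 0.518
```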
normal distribution
A continuous probability distribution that appears in many situations, both natural and man-made. It has a bell-shape and the area under the normal density curve is always equal to 1.
correlation
A relationship between two variables in which a change in one coincides with a change in the other.
event
A subset of a sample space.
Systematic Sampling
Assign each member of the population a number, select some starting point, and then select members at regular intervals from that starting point (e.g., every nth member of the population).
standard error of estimate
Gives a measure of the standard distance between the predicted Y values on the regression line and the actual Y values in the data.
Examples of Ordinal L.O.M.
Grades, ranks
Examples of Ratio L.O.M.
Measurements, class times
qualitative data
Non-numerical data such as the color of a person's eyes.
decision rule based on P-value
If P ≤ α, reject H₀. If P > α, fail to reject H₀.
Bayes' Theorem
P(A | B) = P(B | A) · P(A) / P(B). P(A) is the probability of A before B is considered (e.g., the number of instances of a given value divided by the total number of instances). P(B) is often omitted when the formula is used in a ratio that compares two different values of A, because P(B) is the same for both.
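A minimal Python sketch of Bayes' Theorem with exact fractions. The screening-test numbers are hypothetical, and P(B) is found with the law of total probability (both ways a positive result can occur).

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical screening test: A = "has condition", B = "tests positive"
p_a = Fraction(1, 100)              # P(A)
p_b_given_a = Fraction(9, 10)       # P(B | A)
p_b_given_not_a = Fraction(1, 10)   # P(B | A')

# P(B) = P(B | A)P(A) + P(B | A')P(A')
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
print(p_b, bayes(p_b_given_a, p_a, p_b))   # 27/250 1/12
```

Even with a fairly accurate test, a rare condition gives a small P(A | B), because most positives come from the much larger group without the condition.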
inflection points
Points where the curvature of the graph changes. On the normal curve they are located at x = µ − σ and x = µ + σ.
Biased Sample Identification
Is the sample representative of the population? If not, the sample is biased and inferences drawn from it may not be valid.
t-distribution
Similar to the z-distribution but used when the population standard deviation σ is not known. Bell-shaped and symmetric about the mean; the total area under the curve equals 1. The mean, median, and mode are all 0. The standard deviation is greater than 1 and varies with the degrees of freedom. The curve is determined by the degrees of freedom, d.f. = n − 1. As d.f. increases, the t-distribution approaches the standard normal distribution.
conditional probability
The probability of an event occurring given that another has occurred. The probability of A given that B has occurred is denoted as P(A|B).
confidence interval for proportion p
p̂ − E < p < p̂ + E, where E = z_c √(p̂q̂/n). The probability that the confidence interval contains p is c, assuming that the estimation process is repeated a large number of times.
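A minimal Python sketch of the interval p̂ ± z_c √(p̂q̂/n), using `statistics.NormalDist` for z_c. It assumes the usual textbook check that n·p̂ ≥ 5 and n·q̂ ≥ 5 before using the normal approximation; the survey counts are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, c=0.95):
    """Interval p_hat - E < p < p_hat + E with E = z_c * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation needs n*p_hat >= 5 and n*q_hat >= 5")
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * math.sqrt(p_hat * q_hat / n)
    return p_hat - E, p_hat + E


# Hypothetical survey: 120 of 400 respondents answer "yes"
lower, upper = proportion_ci(120, 400)
print(round(lower, 3), round(upper, 3))   # 0.255 0.345
```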
Multiplication Rule
The probability that two events A and B will occur in sequence is P(A ∩ B) = P(A) · P(B|A). If the two events are independent, then P(A ∩ B) = P(A) · P(B).
correlation coefficient
A statistic r, between −1 and +1, that measures the strength and direction of the linear relationship between two variables (how closely they co-vary).
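A minimal Python sketch of r using the computational formula r = [nΣxy − (Σx)(Σy)] / (√[nΣx² − (Σx)²] · √[nΣy² − (Σy)²]). The paired data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient r via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])   # hypothetical paired data
print(round(r, 4))   # 0.7746
```

A value near +1 or −1 signals a strong linear relationship; the sign gives the direction.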
slope m
The steepness of a line on a graph
Example of Nominal L.O.M.
Yes/No/Undecided responses, political party, Social Security numbers (used as labels in place of names)
circle graph (pie chart)
a circle divided into sections (wedges) according to the percentage of frequencies in each category of the distribution; it shows how a total quantity is divided into component parts and is used for qualitative data. To find the angle of each wedge, multiply its percentage by 360°. Ex: 47.9% × 360° ≈ 172°.
Census
a count or measure of an entire population
Sampling
a count or measure of part of a population; more common than a census
ogive
a line graph that displays the cumulative frequency or cumulative relative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis (x-axis = data values) and the cumulative frequencies on the vertical axis (y-axis = class cumulative frequency). The last cumulative frequency must equal the sample size (e.g., 25 if n = 25).
frequency polygon
a line graph that emphasizes the continuous change in frequencies; it connects points plotted for the class frequencies at the class midpoints
Parameter
a numerical description of a population characteristic. Ex: 52% of the governors of the 50 states are Democrats. Because all 50 states are included, 52% is a parameter.
Statistic
a numerical description of a sample characteristic. Ex: of 300 computer users, 8% said they had repairs. The 300 users are only a subset of all computer users, so 8% is a statistic.
hypothesis test
a process that uses sample statistics to test a claim about the value of a population parameter
probability experiment
an action, or trial, through which specific results (counts, measurements, or responses) are obtained
simple event
an event consisting of only one outcome
binomial experiment
an experiment in which there are exactly two possible outcomes for each trial, a fixed number of independent trials, and the probabilities for each trial are the same
Survey
an investigation of one or more characteristics of a population. When designing a survey, it is important to word the questions so that they do not lead to biased results.
mean of a frequency distribution
approximated by x̅ = Σ(x · f)/n, where x and f are the midpoint and frequency of each class, respectively, and n = Σf
Nominal Level of Measurement (Nominal = name)
data at this level are qualitative only. They are categorized using names, labels, or qualities and cannot be arranged in an order. No mathematical computations can be made at this level.
class boundaries
the numbers used to separate classes without the gaps created by class limits. Class boundaries have one more decimal place than the data and end in the digit 5, so consecutive classes meet halfway between the upper limit of one class and the lower limit of the next. Ex: if the first lower class limit is 79, the first lower class boundary is 78.5.
negative linear correlation
as x increases, y tends to decrease
positive linear correlation
as x increases, y tends to increase
standardized test statistic
assuming the null hypothesis is true, the test statistic converted to a z-, t-, or χ²-value
Empirical probability
based on observations obtained from probability experiments
P(x) = nCx · p^x · q^(n−x) = n! / [(n − x)! x!] · p^x · q^(n−x)
binomial probability formula
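A minimal Python sketch of the binomial probability formula using `math.comb` for nCx; the values of n, p, and x are illustrative.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x) = nCx * p**x * q**(n - x): probability of exactly x successes in n trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


print(binomial_pmf(2, 3, 0.5))   # 0.375: 3 ways * 0.5**2 * 0.5**1

# The probabilities for x = 0, 1, ..., n form a distribution, so they sum to 1
dist = [binomial_pmf(x, 10, 0.3) for x in range(11)]
print(sum(dist))
```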
Interval Level of Measurement
data at this level can be ordered, and meaningful differences between data entries can be calculated. A zero entry simply represents a position on a scale; the entry is not an inherent zero. Ex: a temperature of 0 degrees is a position on the scale, not an inherent zero or starting point. Summary: can be arranged in order; differences can be found and are meaningful; no natural zero starting point.
relative frequency formula
relative frequency = class frequency / sample size = f/n. Ex: with n = 25 and a frequency of 7 for class 1, f/n = 7/25 = 0.28 = 28%. Rule: the relative frequencies of all classes always sum to 1.
Qualitative Data
consist of attributes, labels, or nonnumerical entries.
Quantitative Data
consist of numerical measurements or counts.
Data
consists of information coming from observations, counts, measurements or responses
discrete probability distribution
consists of the values a random variable can assume and the corresponding probabilities of the values
raw data
data collected in original form
sum of squares
denoted SSx. Because the deviations from the mean always sum to 0, we square each deviation; the sum of squares is the sum of these squared deviations, e.g., SSx = Σ(x − µ)² for a population.
shape of distribution
describes how data are distributed: normal, uniform, skewed right (positive), skewed left (negative), bimodal
geometric distribution
discrete probability distribution of a random variable x that satisfies these conditions: 1. a trial is repeated until a success occurs; 2. the repeated trials are independent; 3. the probability of success p is the same for each trial; 4. x represents the number of the trial on which the first success occurs. P(x) = p · q^(x−1), where q = 1 − p.
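A minimal Python sketch of the geometric probability P(x) = p · q^(x−1); p = 0.2 is an illustrative value.

```python
def geometric_pmf(x, p):
    """P(x) = p * q**(x - 1): first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x is a trial number, so x must be at least 1")
    q = 1 - p
    return p * q ** (x - 1)


print(geometric_pmf(3, 0.2))   # 0.2 * 0.8**2, about 0.128
```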
Poisson distribution
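discrete probability distribution of a random variable x that satisfies these conditions: 1. the experiment consists of counting the number of times x an event occurs in a given interval; 2. the probability of the event occurring is the same for each interval; 3. the number of occurrences in one interval is independent of the number in other intervals. P(x) = µ^x e^(−µ) / x!, where µ is the mean number of occurrences per interval.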
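A minimal Python sketch of the Poisson probability P(x) = µ^x e^(−µ) / x!; the mean µ = 3 occurrences per interval is hypothetical.

```python
import math


def poisson_pmf(x, mu):
    """P(x) = mu**x * e**(-mu) / x!, where mu is the mean number of occurrences per interval."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Hypothetical: an event averages mu = 3 occurrences per interval
print(round(poisson_pmf(2, 3), 4))   # 0.224
```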
positively skewed or right-skewed distribution
distribution in which the majority of the data values fall to the left of the mean
grouped frequency distribution
distribution used when the range is large and classes of several units in width are needed
Cluster Sampling
divide the population into groups (clusters) and select all of the members in one or more (but not all) of the clusters
direct cause-and-effect
does x cause y?
reverse cause-and-effect
does y cause x?
classical probability
each outcome in a sample space is equally likely, so P(E) = (number of outcomes in event E) / (total number of outcomes in the sample space)
mean of the sample means
equal to the mean of the population
coefficient of determination
equal to Pearson's correlation coefficient squared (r²); it is the proportion of the variation in the dependent variable that is explained by the independent variable. If r² = 0.81, then 81% of the variation in y can be explained by the regression line.
Random Sample
every member of the population has an equal chance of being selected
Simple Random Sampling
every possible sample of the same size has the same chance of being selected. Ex: selecting members with a table of random numbers
E(x) = μ = Σx * P(x)
expected value
Confounding Variable
experimenter cannot tell the difference between the effects of different factors on a variable
coefficient of variation
expresses the standard deviation as a percentage of the mean of a data set
type II error
fail to reject the null hypothesis when it is false
multinomial experiment
an experiment with a fixed number n of independent trials, each having k mutually exclusive outcomes E1, E2, ..., Ek. Each outcome has a fixed probability: P(E1) = p1, P(E2) = p2, ..., P(Ek) = pk, with p1 + p2 + ... + pk = 1. The discrete random variables x1, x2, ..., xk count the number of times E1, E2, ..., Ek occur in the n trials, so x1 + x2 + ... + xk = n.
Chebychev's Theorem
for any distribution, the proportion of observations that lie within k standard deviations of the mean is at least 1 − 1/k², for k > 1. For k = 1, 2, 3 this bound is 0%, 75%, and about 88.9%.
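A minimal Python sketch of the bound 1 − 1/k², plus a check of the guarantee on a hypothetical, strongly skewed data set (population standard deviation via `statistics.pstdev`).

```python
from statistics import fmean, pstdev


def chebyshev_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(k, f"{chebyshev_bound(k):.1%}")   # 2 75.0%, then 3 88.9%

# Check the guarantee on a hypothetical, strongly skewed data set
data = [1, 2, 2, 3, 3, 3, 4, 20]
mu, sigma = fmean(data), pstdev(data)
share_within_2 = sum(abs(x - mu) <= 2 * sigma for x in data) / len(data)
print(share_within_2 >= chebyshev_bound(2))   # True (7 of 8 values, 87.5%)
```

Unlike the Empirical Rule, the bound holds for any shape of distribution, which is why it is much more conservative.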
class
a grouping of data into intervals; a quantitative or qualitative category used to classify data
midpoint
the halfway point of a class interval: the average of the lower and upper limits of the class. The difference between consecutive midpoints equals the class width.
class limit
the highest and lowest possible values of a class (the upper and lower class limits)
fundamental counting principle
if an event can happen in N ways, and another, independent event can happen in M ways, then both events together can happen in N x M ways.
sampling distribution of sample means
the sampling distribution obtained when the sample statistic is the sample mean
frequency histogram
is a bar graph that represents the frequency distribution of a data set. (A relative frequency histogram instead shows the ratio f/n between each class frequency and the total of all frequencies, which can be expressed as a percent.)
null hypothesis
is a statistical hypothesis that contains a statement of equality such as ≤, =, or ≥
coincidental relationship/no cause-and-effect
is it possible that the relationship is a coincidence?
third variable cause-and-effect
is it possible that the relationship is caused by 3rd variable or combination of several other variables?
Descriptive Statistics
is the branch of statistics that involves the organization, summarization, and display of data.
Inferential Statistics
is the branch of statistics that involves using a sample to draw conclusions about a population. A basic tool in the study of inferential statistics is probability.
Population
is the collection of all outcomes, responses (observations), measurements, or counts that are of interest. Ex: the population of U.S. adults
class width
is the distance between the lower (or upper) limits of consecutive classes. Rule: the first lower class limit is the minimum data entry, and each upper class limit is one unit less than the lower limit of the next class.
Statistics
is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
Sample
is a subset, or part, of a population. Ex: 1,500 adults selected from the U.S. adult population form a sample.
level of significance
maximum allowable probability of making a type I error
Stratified Sampling
members of a population are divided into two or more subsets that share a similar characteristic (age, gender, ethnicity, political preference) a sample is then selected from each of the strata
Double-blind
neither the experimenter nor the subjects know if the subjects are receiving a treatment or placebo. The experimenter is informed after all data have been collected. This type of experimental design is preferred by researchers.
sample size
number of participants (people, plants, rats, etc.)
point estimate for p
p̂ = x/n, where x is the number of successes in the sample and n is the sample size
range of probabilities rule
the probability of an event E is between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1
level of confidence c
probability that an interval estimate contains the population parameter being estimated, assuming that the estimation process is repeated a large number of times.
P-value
probability value. If the null hypothesis is true, it is the probability of obtaining a sample statistic with a value as extreme as or more extreme than the one determined from the sample data.
Randomization
process of randomly assigning subjects to different treatment groups
proportion of failures
q̂ = 1 − p̂
population proportion
ratio of members of a population with a particular characteristic to the total members of the population
type I error
reject the null hypothesis when it is true
Σ (sigma)
represents the sum
Randomized block design
researcher divides subjects with similar characteristics into blocks and then, within each block, randomly assigns subjects to treatment groups
Observational Study
researcher observes and measures characteristics of interest of a part of a population but does not change existing conditions Ex: Wildlife photographer or observing
outcome
the result of a single trial in a probability experiment
point estimate for σ
s
interval estimate
an interval, or range of values, within which the parameter is expected to fall with some level of confidence
test statistic
the sample statistic (such as x̅, p̂, or s²) used to estimate the population parameter stated in the null hypothesis
critical value z₀
separates the rejection region from the nonrejection region
Ratio Level of Measurement
similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed so that one data value can be meaningfully expressed as a multiple of another. Ex: age: a 20-year-old is twice as old as a 10-year-old, and age has 0 as its starting point. Summary: can be arranged in order; differences can be found and are meaningful; natural zero starting point; ratios are meaningful, so one value can be a multiple of another.
σ = √ σ²
standard deviation of a discrete random variable
dotplots
a statistical graph in which each data value is plotted as a dot above a horizontal axis. It shows the location of every data value and can be used with categorical or quantitative variables.
z-test for mean μ
statistical test for a population mean. Test statistic is sample mean x̅. Used when population standard deviation is known.
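A minimal Python sketch of a z-test for a mean with σ known, reporting the P-value and applying the decision rule (reject H₀ if P ≤ α). The function name, its `tail` parameter, and the sample numbers are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alpha=0.05, tail="two"):
    """z-test for a mean with sigma known; returns (z, P-value, decision)."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))   # standardized test statistic
    phi = NormalDist().cdf                        # standard normal CDF
    if tail == "left":
        p_value = phi(z)
    elif tail == "right":
        p_value = 1 - phi(z)
    else:                                         # two-tailed
        p_value = 2 * (1 - phi(abs(z)))
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision


# Hypothetical sample: x̅ = 52, n = 64, known sigma = 8; H0: mu = 50 vs Ha: mu ≠ 50
z, p_value, decision = z_test_mean(52, 50, 8, 64)
print(z, round(p_value, 4), decision)   # 2.0 0.0455 reject H0
```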
t-test for mean μ
statistical test for a population mean. Test statistic is sample mean x̅. Used when population standard deviation is unknown.
z-test for a proportion p
statistical test for a population proportion p. Test statistic is the sample proportion p̂.
chi-square test for variance or standard deviation
statistical test for a population variance or standard deviation. Test statistic is sample s or s² and uses χ²
Completely random design
subjects are assigned to different treatment groups through random selection
regression line
summarizes the points of a scatterplot and provides the means for making predictions
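A minimal Python sketch of the least-squares regression line ŷ = mx + b, its residuals (observed − predicted), and the standard error of estimate s_e = √[Σ(y − ŷ)²/(n − 2)]. The paired data are hypothetical; with them the line works out to m = 1.4 and b = 0.5, up to floating-point rounding.

```python
import math


def least_squares_line(x, y):
    """Return slope m and y-intercept b of the regression line y_hat = m*x + b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    m = sxy / sxx
    b = y_bar - m * x_bar          # the line passes through (x_bar, y_bar)
    return m, b


x = [2, 4, 6, 8]                   # hypothetical paired data
y = [3, 7, 8, 12]
m, b = least_squares_line(x, y)
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]          # observed − predicted
s_e = math.sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))   # standard error of estimate
print(m, b, residuals, s_e)
```

The residuals of a least-squares line always sum to (essentially) zero, which is a quick sanity check on the fit.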
point estimate for σ²
s²
Blinding
technique where subject does not know whether they are receiving a treatment or a placebo
exploratory data analysis
the act of analyzing data to determine what information can be obtained by using stem and leaf plots, medians, interquartile ranges, and boxplots
population variance
the average of the squares of the deviations in a population data set: σ² = Σ(x − µ)²/N
sample variance
the sum of the squares of the deviations in a sample data set divided by n − 1 (rather than n): s² = Σ(x − x̅)²/(n − 1)
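A minimal Python sketch contrasting sample and population variance from the same sum of squares, cross-checked against the standard library's `statistics.variance` and `statistics.pvariance`. The data are hypothetical.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]                       # hypothetical sample
n = len(data)
x_bar = sum(data) / n
deviations = [x - x_bar for x in data]          # always sum to 0
ss = sum(d ** 2 for d in deviations)            # sum of squares SSx

sample_variance = ss / (n - 1)                  # s² divides by n − 1
population_variance = ss / n                    # σ² divides by N
print(ss, sample_variance, round(population_variance, 4))  # 17.5 3.5 2.9167

# Cross-check with the standard library
print(statistics.variance(data), statistics.pvariance(data))
```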
critical values
the boundaries between the rejection and nonrejection regions. Ex: if c = 90%, 5% of the area lies to the left of −z_c = −1.645 and 5% lies to the right of z_c = 1.645.
alternative hypothesis
the complement of the null hypothesis; a statement that must be true if the null hypothesis is false. It contains a statement of strict inequality such as <, ≠, or >.
deviation
the difference between a data entry x and the mean µ of the data set: deviation = x − µ
range
the difference between the maximum and minimum data entries
sampling error
the difference between the point estimate and the actual parameter value
Sampling Error
the difference between the results of the sample and those of the population
stem
the entry's leftmost digit(s). Ex: for 42, the stem is 4.
leaf
the entry's rightmost digit. Ex: for 42, the leaf is 2.
margin of error E
the greatest possible distance between the point estimate and the value of the parameter it is estimating; also called the maximum error of estimate or error tolerance. Requirement: the population must be normally distributed or the sample size must be n ≥ 30.
lower class limit
the lower value of a class in a frequency distribution that has the same decimal place value as the data
weighted mean
the mean of a data set whose entries have varying weights
relative frequency
the number of times an event occurs, DIVIDED by the total number of trials
quartiles
the numbers that separate the set into four equal parts
mutually exclusive
the occurrence of one means that none of the others can occur
sampling distribution
the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population; it describes how values of a sample statistic vary across all possible samples of a specific size that can be taken from a population
normal curve
the symmetrical bell-shaped curve that describes the distribution of many physical and psychological attributes. Most scores fall near the average, and fewer and fewer scores lie near the extremes.
upper class limit
the upper value of a class in a frequency distribution that has the same decimal place value as the data
Simulation
the use of a mathematical or physical model to reproduce the conditions of a situation or process. Simulations allow you to study situations that are impractical or even dangerous to recreate. Ex: crash test dummies, a wave pool with a tiny scale ship, a vibrating "earthquake" table.
median
the value that lies in the middle of the data when the data set is ordered: the middle entry when the number of entries is odd, or the mean of the two middle entries when the number of entries is even
y-intercept b
the y-coordinate of a point where a graph crosses the y-axis
continuous probability
theoretically an infinite number of outcomes within a given range
Experimentation
a treatment is applied to part of a population and responses are observed. A control group does not receive the treatment but is also observed, and the responses of the treatment group and control group are compared and studied. Ex: a science fair project
Convenience Sampling
using only results that are very easy to get (e.g., nearby, localized members of the population)
random variable
variable whose value is determined by the outcomes of a probability experiment
σ² = Σ(x-μ)²*P(x)
variance of a discrete random variable
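A minimal Python sketch tying together μ = Σx · P(x), σ² = Σ(x − μ)² · P(x), and σ = √σ² for a discrete random variable. It first checks that each P(x) is between 0 and 1 and that the probabilities sum to 1; the distribution itself is hypothetical.

```python
import math


def discrete_summary(dist):
    """Mean, variance, and standard deviation of a discrete random variable.

    `dist` maps each value x to its probability P(x).
    """
    probs = list(dist.values())
    if any(p < 0 or p > 1 for p in probs) or not math.isclose(sum(probs), 1):
        raise ValueError("each P(x) must be in [0, 1] and the P(x) must sum to 1")
    mu = sum(x * p for x, p in dist.items())                 # μ = Σx·P(x)
    var = sum((x - mu) ** 2 * p for x, p in dist.items())    # σ² = Σ(x − μ)²·P(x)
    return mu, var, math.sqrt(var)                           # σ = √σ²


mu, var, sd = discrete_summary({0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2})   # hypothetical P(x)
print(round(mu, 4), round(var, 4), round(sd, 4))   # 1.7 0.81 0.9
```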
symmetric distribution
when a vertical line can be drawn through the middle of the graph and the resulting halves are approximately mirror images
paired data sets
when each entry in one data set corresponds to one entry in a second data set
sample mean
x̅ = Σx/n (read "x-bar equals sigma x over n")
frequency histogram axis
y-axis= class frequency x-axis= data values
c-prediction interval for y
ŷ - E < y < ŷ + E