CTC MATH-1342 Final Exam

population mean

μ = Σx/N (mu = sigma x over N)

stem-and-leaf display

A device used to organize and group data that allows us to recover the data quickly. Similar to alphabetizing...quantitative

tree diagram

A diagram used to show the total number of possible outcomes

Exploratory Data Analysis (EDA)

A field of statistics that uses stem-and-leaf displays, box-and-whisker plots, histograms, etc., to detect patterns and extreme data values quickly.

time plot

A graph showing the data measurements in order over a length of time. (horizontal axis is time)

pictograph

A graph where pictures are used rather than solid bars.

standard normal distribution

A normal distribution with a mean of 0 and a standard deviation of 1.

box-and-whisker plot

A plot that shows how a set of data is distributed. The plot displays five numbers that summarize the data.

subjective probability

A probability assessment that is based on experience, intuitive judgment, or expertise

binomial probability distribution

A probability distribution showing the probability of x successes in n trials of a binomial experiment

chi-square distribution

A family of probability distributions indexed by degrees of freedom (df). χ² cannot be less than 0. The curves are positively skewed, become more symmetric as df grows, and approach a normal distribution as df goes to infinity.

distinguishable permutations

If a set of n objects has n1 of one kind, n2 of a second kind, n3 of a third kind, and so on, with n = n1 + n2 + n3 + ... + nk, then the number of distinguishable permutations of the n objects is n!/(n1! * n2! * n3! * ... * nk!)
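
A minimal Python sketch of this count, assuming the objects are given as a sequence in which equal items are the same "kind". The function name `distinguishable_permutations` and the example word are illustrative choices, not part of the original card.

```python
from collections import Counter
from math import factorial


def distinguishable_permutations(items):
    """Return n! / (n1! * n2! * ... * nk!) for a sequence with repeated items."""
    counts = Counter(items)           # n1, n2, ..., nk for each kind of object
    result = factorial(len(items))    # n!
    for c in counts.values():
        result //= factorial(c)       # divide out orderings of identical items
    return result


# Letters of MISSISSIPPI: 11 letters with M=1, I=4, S=4, P=2
print(distinguishable_permutations("MISSISSIPPI"))  # 34650
```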

unbiased estimator

A statistic used to estimate a parameter is an unbiased estimator of the parameter if the mean of its sampling distribution is equal to the true value of the parameter. The sample mean x̅ is an unbiased point estimate of the population mean µ.

frequency table

A table that lists the number of times, or frequency, that each data value occurs.

outlier

A value much greater or much less than the others in a data set

pareto chart

A vertical bar graph in which the bars are ordered by height, with frequency plotted from the most frequent to the least frequent. Used for qualitative data.

inherent zero

A zero that implies "none". Ex: $0 in my bank account, 0 days old, 0 video games

permutation

An arrangement of objects in which order is important

independent event

An event whose result does not depend on the result of another event

law of large numbers

As the number of trials increases the empirical probability gets closer and closer to the theoretical probability

Addition Rule

For mutually exclusive events, the probability that one or the other occurs is the sum of the probabilities of each event: P(A or B) = P(A) + P(B).

interquartile range

Describes the spread of middle 50% of a data set, IQR = Q3 - Q1.

percentiles

Divide the data set into 100 equal parts. An observation at the Pth percentile is higher than P percent of all observations.

deciles

Divisions of a population into ten equal groups with respect to the distribution of a variable, such as income. For example, the lowest decile is the 10% of the population with the lowest income.

bivariate normal distribution

For any fixed value of x, the corresponding values of y have a distribution that is approximately normal (requirements 2 and 3 of finding correlations)

minimum sample size to estimate µ

Given a confidence level c and margin of error E, the minimum sample size n needed to estimate the popu

right-tailed test

If the primary concern is deciding whether a population mean, µ, is greater than a specified value µ₀, we express the alternative hypothesis as Ha: µ > µ₀

left-tailed test

If the primary concern is deciding whether a population mean, µ, is less than a specified value µ₀, we express the alternative hypothesis as Ha: µ < µ₀

fractiles

Numbers that partition, or divide, an ordered data set into equal parts (each part has the same number of data entries). For instance, the median is a fractile because it divides an ordered data set into two equal parts

quantitative data

Numerical data such as the number of hours it takes to drive to different locations.

Data Collection Methods

Observational Study Perform Experiment Simulation Survey

five-number summary

Presents five numbers: the lowest value, the highest value, and the cut-off points for 1/4, 1/2, and 3/4 of the data.

standard deviation for grouped data

s = √[Σ(x - x̄)² f / (n - 1)], where x is the class midpoint, f is the class frequency, and n = Σf

relative frequency histogram

Same as a regular frequency histogram except the vertical scale measures relative frequencies instead of frequencies (% instead of #). y-axis = class relative frequency, x-axis = data values

Empirical Rule

States that, in a normal distribution: 1. about 68% of the terms are within one standard deviation of the mean 2. about 95% are within two standard deviations, 3. about 99.7% are within three standard deviations (normal curve).

Examples of Interval L.O.M.

Temperatures Years

residuals

The difference between the observed value of the response variable and the value predicted by the regression line

dependent variable

The experimental factor that is being measured; the variable that may change in response to manipulations of the independent variable

permutation of n objects taken r at a time

The number of different permutations of n distinct objects taken r at a time
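
As a quick illustration, Python's standard library exposes this count directly as `math.perm(n, r)`, which evaluates n!/(n - r)!. The sketch below also shows `math.comb` for contrast, since a combination ignores order. The values n = 5 and r = 3 are arbitrary examples.

```python
from math import comb, perm

n, r = 5, 3

ordered = perm(n, r)    # nPr = n! / (n - r)!, order matters
unordered = comb(n, r)  # nCr = n! / ((n - r)! * r!), order does not matter

print(ordered)    # 60
print(unordered)  # 10
```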

degrees of freedom

The number of individual scores that can vary without changing the sample mean. Statistically written as 'n-1' where n represents the number of subjects.

dependent event

The outcome of one event does affect the outcome of the second event

sample space

The set of all possible outcomes

standard error of the mean

The standard deviation of the sampling distribution of the sample mean

Central Limit Theorem

The theory that, as a sample size increases, the distribution of sample means of size n, randomly selected, approaches a normal distribution.

independent variable

The variable that is varied or manipulated by the researcher.

stem and leaf plot

a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes. There should be as many leaves as there are observations in the data set (number of leaves = sample size). Ex: min = 0|0, max = 3|9. For 3-digit numbers such as 102: stem 10 | leaf 2.

time series chart

a data set composed of quantitative entries taken at regular intervals over a period of time

unimodal

a data set or distribution with a single mode

bimodal

a data set with two modes

mode

a data value that occurs more often than any other data value. A data set can have more than one mode or no mode.

negatively skewed or left-skewed distribution

a distribution in which the majority of the data values fall to the right of the mean

bell-shaped or mound-shaped distribution

a distribution shape that has a single peak and tapers off at either end; it is approximately symmetric

J-shaped distribution

a distribution shape that has few data values on the left side and increases as one moves to the right

reversed J-shaped distribution

a distribution shape that has few data values on the right side and increases as one moves to the left

uniform-shaped distribution

a distribution shape whose values are evenly distributed over its range

skewed

a distribution with its peak well to one side. A unimodal, asymmetric distribution that tends to slant: most of the data are clustered on one side of the distribution, which "tails" off on the other side.

open-ended distribution

a frequency distribution that has no specific beginning value or no specific ending value

categorical frequency distribution

a frequency distribution used when the data are categorical (nominal)

cumulative frequency distribution

a frequency distribution using cumulative frequencies for the data

probability density function

a function with non-negative values such that probability can be described by areas under the curve graphing the function

scatter plot

a graph of paired data values in which the ordered pairs are plotted as points in a coordinate plane. Used to show the relationship between two quantitative variables

Bar Graph

a graph with bars of uniform width, evenly spaced with gaps between the bars. Used to analyze categorical (qualitative) data

histogram

a graph with bars that represent a range of values on the horizontal axis, with no gaps between the bars. Shows frequencies of data values in intervals of the same size. Used for quantitative data. Properties: 1. the horizontal scale is quantitative and measures the data values 2. the vertical scale measures the frequencies of the classes 3. consecutive bars must touch

combination

a grouping of items in which order does not matter

point estimate

a single value estimate of a population parameter.

statistical hypothesis

a statement about a population parameter

frequency distribution

a table that shows classes or intervals of data entries with a count of the number of entries in each class

two-tailed test

a test that indicates the null hypothesis should be rejected when the test value is in either of the two critical regions

z-score or standard score

a type of standard score that tells us how many standard deviation units a given score is above or below the mean for that group
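
A small sketch of the computation, assuming the usual formula z = (x - μ) / σ. The function name `z_score` and the numbers are made up for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Hypothetical test score of 85 in a group with mean 70 and standard deviation 10
print(z_score(85, 70, 10))  # 1.5
```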

measures of central tendency

a value that represents a typical or central entry of a data set

discrete variable

a variable that has a finite or countable number of possible outcomes

continuous variable

a variable that has an uncountable number of possible outcomes represented by an interval on a number line

μ = Σx * P(x)

the mean of a discrete random variable

frequency - ƒ

the number of data entries in a class

0 ≤ P(x) ≤ 1

the probability of each value of a discrete random variable is between 0 and 1, inclusive

confidence level for a population mean µ

the probability that the confidence interval contains µ is c, assuming that the estimation process is repeated a large number of times.

rejection region

the range of values for which the null hypothesis is not probable

Replication

the repetition of an experiment under the same or similar conditions

complement of E (E')

the set of all outcomes in a sample space that are not included in event E. P(E') = 1 - P(E)

population standard deviation

the square root of the population variance

sample standard deviation

the square root of the sample variance

∑P(x) = 1

the sum of all probabilities is equal to 1

mean

the sum of the data entries divided by the number of entries; the average. x̅ (x bar) = sample mean, μ (mu) = population mean

cumulative frequency

the sum of the frequency of that class and the frequencies of all previous classes. The cumulative frequency of the last class is equal to the sample size. Rule: the first entry is ALWAYS the 1st class frequency.

class midpoint

the sum of the lower and upper limits of the class divided by two, sometimes called the class mark: (lower class limit + upper class limit) / 2

explained variation of regression line

the sum of the squares of the differences between each predicted y-value and the mean of y

unexplained variation of regression line

the sum of the squares of the differences between the y-value and each corresponding predicted y-value

total variation of regression line

the sum of the squares of the differences between the y-value of each ordered pair and the mean of y
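
The three variation measures can be checked numerically. The sketch below fits a least-squares line to a small made-up data set and computes each sum of squares. The function name `regression_variation` and the data are assumptions for illustration only.

```python
def regression_variation(xs, ys):
    """Return (explained, unexplained, total) variation for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Least-squares slope m and intercept b
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    y_hat = [m * x + b for x in xs]                               # predicted y-values
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # predicted vs mean of y
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # observed vs predicted
    total = sum((y - y_bar) ** 2 for y in ys)                     # observed vs mean of y
    return explained, unexplained, total


explained, unexplained, total = regression_variation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(explained, 4), round(unexplained, 4), round(total, 4))  # 3.6 2.4 6.0
print(round(explained / total, 4))  # 0.6, the coefficient of determination r²
```

For a least-squares line, explained + unexplained = total, which the example confirms (3.6 + 2.4 = 6.0).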

expected value

the sum Σx * P(x) is the mean, or expected value. Although a probability can never be negative, the expected value can be (e.g., we expect to lose money playing the lottery: losing money is an outcome with a negative value and a positive probability).
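
A short sketch of a negative expected value, using a hypothetical raffle: a ticket costs $2 and pays $500 with probability 1/1000. The numbers and the name `expected_value` are invented for illustration.

```python
def expected_value(outcomes):
    """Sum of x * P(x) over (value, probability) pairs."""
    return sum(x * p for x, p in outcomes)


# Net gain for each outcome: win 500 - 2 = 498, or lose the 2-dollar ticket price
raffle = [(498, 0.001), (-2, 0.999)]

ev = expected_value(raffle)
print(round(ev, 2))  # -1.5, an average loss of $1.50 per ticket
```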

Ordinal Level of Measurement (Ordinal = order)

Data at this level are qualitative or quantitative and can be arranged in order, or ranked, but differences between data entries are not meaningful. Ex: 1 - First Place School (#1 movie), 2 - Second Place School (#2 movie), 3 - Third Place School (#3 movie). Can be arranged in order; differences between data values cannot be determined or are meaningless.

Probability(at least one)

1 - P(none)
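
A worked example of the complement shortcut, assuming independent trials: the probability of at least one six in four rolls of a fair die. The scenario and the name `at_least_one` are illustrative.

```python
from fractions import Fraction


def at_least_one(p_success, trials):
    """P(at least one success) = 1 - P(no successes), for independent trials."""
    p_none = (1 - p_success) ** trials
    return 1 - p_none


p = at_least_one(Fraction(1, 6), 4)
print(p)                   # 671/1296
print(round(float(p), 4))  # 0.5177
```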

normal distribution

A continuous probability distribution that appears in many situations, both natural and man-made. It has a bell-shape and the area under the normal density curve is always equal to 1.

correlation

A relationship between two variables in which a change in one coincides with a change in the other.

event

A subset of a sample space.

Systematic Sampling

Assign each member of the population a number. Select some starting point and then select every nth number in the population at regular intervals

standard error of estimate

Gives a measure of the standard distance between the predicted Y values on the regression line and the actual Y values in the data.

Examples of Ordinal L.O.M.

Grades Ranks

Examples of Ratio L.O.M.

Measurements Class Times

qualitative data

Non-numerical data such as the color of a person's eyes.

decision rule based on P-value

If P ≤ α, reject H₀. If P > α, fail to reject H₀.

Bayes' Theorem

P(A | B) = P(B | A) * P(A) / P(B); P(A) being the number of instances of a given value divided by the total number of instances; P(B) is often ignored since this equation is typically used in a probability ratio that compares two different values for A, with P(B) being the same for both
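
A numeric sketch of the formula with made-up values: a condition A with P(A) = 0.01 and a test result B with P(B | A) = 0.95 and P(B | not A) = 0.05. Here P(B) is computed from the law of total probability rather than ignored. All names and numbers are hypothetical.

```python
def bayes(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


posterior = bayes(p_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(posterior, 3))  # 0.161
```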

inflection points

Points where the curvature of the graph changes. Located at x = µ - σ and x = µ + σ on the normal curve.

Biased Sample Identification

Sample should be representative of the population and inferences are valid?

t-distribution

Similar to the z-distribution but used when the population standard deviation (σ) is not known. Bell-shaped around the mean. Area under the curve equals 1. Mean, mode, and median are all 0. Standard deviation is greater than 1 and varies with the degrees of freedom. The curve is determined by the degrees of freedom, d.f., which is the sample size minus 1 (n - 1). As d.f. increases, the curve approaches the standard normal distribution.

conditional probability

The probability of an event occurring given that another has occurred. The probability of A given that B has occurred is denoted as P(A|B).

confidence interval for proportion p

The probability that the confidence interval contains p is c assuming that the estimation process is repeated a large number of times.

Multiplication Rule

The probability that two events will occur in sequence is P(A ∩ B) = P(A) * P(B|A) If the two events are independent then P(A ∩ B) = P(A) * P(B)

correlation coefficient

A statistic between -1 and +1 that represents how closely two variables co-vary.

slope m

The steepness of a line on a graph

Example of Nominal L.O.M.

Yes/No/Undecided Political Party SSN (substitute for names)

circle graph (pie chart)

a circle divided into sections or wedges according to the percentage of frequencies in each category of the distribution. Shows the total quantity divided into component parts using proportional segments of a circle. Used for qualitative data. Multiply the percentage by 360 to get the degrees to plot. Ex: 47.9% = about 172 degrees.

Census

a count or measure of an entire population

Sampling

a count or measure of part of a population; more common than a census

ogive

a line graph that displays the cumulative frequency or cumulative relative frequency distribution of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis and the cumulative frequencies are marked on the vertical axis. y-axis = class cumulative frequency, x-axis = data values. The last cumulative frequency must equal the sample size (e.g., 25 if n = 25).

frequency polygon

a line graph that emphasizes the continuous change in frequencies displays data by using lines that connect points plotted for the frequencies at the midpoints of the classes

Parameter

a numerical description of a population characteristic. Ex: 52% of the governors of the 50 states are Democrats. All 50 states are included, so 52% is a parameter.

Statistic

a numerical description of a sample characteristic. Ex: of 300 computer users, 8% said they had repairs. The 300 users are a subset of all computer users in the world, so 8% is a statistic.

hypothesis test

a process that uses sample statistics to test a claim about the value of a population parameter

probability experiment

an action, or trial, through which specific results (counts, measurements, or responses) are obtained

simple event

an event consisting of only one outcome

binomial experiment

an experiment in which there are exactly two possible outcomes for each trial, a fixed number of independent trials, and the probabilities for each trial are the same

Survey

an investigation of one or more characteristics of a population. In designing a survey, it is important to word the questions so that they do not lead to biased results.

mean of a frequency distribution

approximated by x̅ = Σ(x f)/n, where x and f are the midpoint and frequency of each class, respectively, and n = Σf
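
A minimal sketch of the approximation, assuming the classes are given as (midpoint, frequency) pairs. The class values and the name `grouped_mean` are invented for illustration.

```python
def grouped_mean(classes):
    """Approximate the mean as sum(x * f) / n, where n is the sum of the frequencies."""
    n = sum(f for _, f in classes)
    return sum(x * f for x, f in classes) / n


# Hypothetical classes as (midpoint x, frequency f)
classes = [(10, 2), (20, 5), (30, 3)]
print(grouped_mean(classes))  # 21.0
```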

Nominal Level of Measurement (Nominal = name)

are qualitative only. Data at this level are categorized using names, labels, or qualities. No mathematical computations can be made at this level. Names, labels, or categories only; cannot be arranged in an order.

class boundaries

are the numbers used to separate classes without the gaps created by class limits. They are the upper and lower values of a class for a grouped frequency distribution; they have one more decimal place than the data and end in the digit 5, so adjacent classes meet halfway in between. Ex: if the first lower class limit is 79, the first lower class boundary is 78.5.

negative linear correlation

as x increases, y tends to decrease

positive linear correlation

as x increases, y tends to increase

standardized test statistic

assuming that the null hypothesis is true, test statistic is converted to z, t or chi-square value

Empirical probability

based on observations obtained from probability experiments

P(x) = nCx * p^x * q^(n-x) = n!/[(n-x)! x!] * p^x * q^(n-x)

binomial probability formula
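
A short sketch of the formula in Python, where q = 1 - p and `math.comb(n, x)` supplies nCx. The function name `binomial_pmf` and the coin-flip example are illustrative assumptions.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


# Probability of exactly 2 heads in 5 flips of a fair coin
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```

Summing `binomial_pmf(x, n, p)` over x = 0, ..., n gives 1, as required of any probability distribution.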

Interval Level of Measurement

can be ordered, and meaningful differences between data entries can be calculated. A zero entry simply represents a position on a scale; the entry is not an inherent zero. Ex: a temperature of 0 degrees is a position on a scale, not an inherent zero or starting point. Can be arranged in order; differences can be found and are meaningful; no natural zero starting point.

relative frequency formula

class frequency / sample size = ƒ / n. Ex: if n = 25 and the frequency for class 1 is 7, then f/n = 7/25 = 0.28 = 28%. Rule: the sum of all f/n WILL ALWAYS EQUAL 1.
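
The sketch below applies f/n to a hypothetical set of class frequencies with n = 25 (only the first class frequency, 7, comes from the card; the rest are made up) and also builds the running cumulative frequencies.

```python
from itertools import accumulate

# Hypothetical class frequencies; the first class has f = 7 and n = 25 overall
frequencies = [7, 6, 5, 4, 3]
n = sum(frequencies)

relative = [f / n for f in frequencies]   # f / n for each class
cumulative = list(accumulate(frequencies))  # running total of frequencies

print(relative)    # [0.28, 0.24, 0.2, 0.16, 0.12]
print(cumulative)  # [7, 13, 18, 22, 25]
```

The last cumulative frequency equals the sample size, and the relative frequencies sum to 1 (up to floating-point rounding).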

Qualitative Data

consist of attributes, labels, or nonnumerical entries.

Quantitative Data

consist of numerical measurements or counts.

Data

consists of information coming from observations, counts, measurements or responses

discrete probability distribution

consists of the values a random variable can assume and the corresponding probabilities of the values

raw data

data collected in original form

sum of squares

denoted by SSx. Because the sum of the deviations for any data set is 0, we instead square each deviation; SSx is the sum of these squares.

shape of distribution

describes how data is distributed. normal, uniform, skewed right (positive), skewed left (negative) , bimodal

geometric distribution

discrete probability distribution of a random variable x that satisfies: (1) a trial is repeated until a success occurs; (2) the repeated trials are independent; (3) the probability of success p is the same for each trial; (4) the random variable x represents the number of the trial in which the first success occurs. P(x) = pq^(x-1), where q = 1 - p.
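
A brief sketch of P(x) = pq^(x-1) in Python. The function name `geometric_pmf` and the value p = 0.2 are chosen for illustration.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    q = 1 - p
    return p * q ** (x - 1)


# With p = 0.2, probability that the first success comes on the 3rd trial
print(round(geometric_pmf(3, 0.2), 4))  # 0.128
```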

Poisson distribution

discrete probability distribution of a random variable x that satisfies: (1) the experiment consists of counting the number of times x an event occurs in a given interval; (2) the probability of the event occurring is the same for each interval; (3) the number of occurrences in one interval is independent of the number in other intervals

positively skewed or right-skewed distribution

distribution in which the majority of the data values fall to the left of the mean

grouped frequency distribution

distribution used when the range is large and classes of several units in width are needed

Cluster Sampling

divide the population into clusters and select all of the members in one or more (but not all) of the clusters

direct cause-and-effect

does x cause y?

reverse cause-and-effect

does y cause x?

classical probability

each outcome in a sample space is equally likely

mean of the sample means

equal to the mean of the population

coefficient of determination

equal to the pearson's correlation squared, it is the proportion of variance in the dependent variable that is explained by the independent variable. If r² = .81, it would mean that 81% of the variation can be explained by the regression line.

Random Sample

every member of the population has an equal chance of being selected

Simple Random Sampling

every possible sample of the same size has the same chance of being selected Ex: table of random numbers

E(x) = μ = Σx * P(x)

expected value

Confounding Variable

occurs when an experimenter cannot tell the difference between the effects of different factors on a variable

coefficient of variation

expresses the standard deviation as a percentage of the mean of a data set

type II error

fail to reject the null hypothesis when it is false

multinomial experiment

a fixed number of trials n, where each trial is independent and has k mutually exclusive outcomes (E1, E2, E3, ..., Ek). Each outcome has a fixed probability, so P(E1) = p1, P(E2) = p2, ..., P(Ek) = pk. The discrete random variables x1, x2, ..., xk count the number of times E1, E2, ..., Ek occur in the n independent trials (x1 + x2 + x3 + ... + xk = n).

Chebychev's Theorem

for any distribution, the proportion of observations that lie within k standard deviations of the mean is guaranteed to be at least 1 - 1/k², for k > 1. For k = 1, 2, 3 the formula gives 0%, 75%, 88.9%.
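
A quick check of the bound 1 - 1/k² for a few values of k. The function name `chebyshev_bound` is an illustrative choice.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


for k in (1, 2, 3):
    print(k, f"{chebyshev_bound(k):.1%}")
# 1 0.0%
# 2 75.0%
# 3 88.9%
```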

class

a quantitative or qualitative category used to classify data, formed by grouping the data into intervals

midpoint

halfway point within the class interval; the average of the upper and lower real limits of each class interval. The difference between consecutive midpoints equals the class interval size.

class limit

highest and lowest possible

fundamental counting principle

if an event can happen in N ways, and another, independent event can happen in M ways, then both events together can happen in N x M ways.

sampling distribution of sample means

if the sample statistic is the sample mean

frequency histogram

is a bar graph that represents the frequency distribution of a data set. A relative frequency version instead shows the ratio of each class frequency to the total of all frequencies (f/n), which can be expressed as a percent.

null hypothesis

is a statistical hypothesis that contains a statement of equality such as ≤, =, or ≥

coincidental relationship/no cause-and-effect

is it possible that the relationship is a coincidence?

third variable cause-and-effect

is it possible that the relationship is caused by 3rd variable or combination of several other variables?

Descriptive Statistics

is the branch of statistics that involves the organization, summarization, and display of data.

Inferential Statistics

is the branch of statistics that involves using a sample to draw conclusions about a population. A basic tool in the study of inferential statistics is probability.

Population

is the collection of all outcomes, responses (observations), measurements, or counts that are of interest. Ex: the adult population of the United States

class width

is the distance between lower (or upper) limits of consecutive classes. Rule: the first lower class limit starts at the minimum data value, and the upper limit of a class is one unit less than the lower limit of the next class.

Statistics

is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

Sample

is a subset, or part, of a population. Ex: 1,500 adults sampled from the US population

level of significance

maximum allowable probability of making a type I error

Stratified Sampling

members of a population are divided into two or more subsets, called strata, that share a similar characteristic (age, gender, ethnicity, political preference). A sample is then selected from each of the strata.

Double-blind

neither the experimenter nor the subjects know if the subjects are receiving a treatment or placebo. The experimenter is informed after all data have been collected. This type of experimental design is preferred by researchers.

sample size

number of participants (people, plants, rats, etc.)

point estimate for p

p hat = x/n

range of probabilities rule

probability of an event E is between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1

level of confidence c

probability that an interval estimate contains the population parameter being estimated, assuming that the estimation process is repeated a large number of times.

P-value

probability value. If the null hypothesis is true, it is the probability of obtaining a sample statistic with a value as extreme or more extreme than the one determined from the sample data.

Randomization

process of randomly assigning subjects to different treatment groups

proportion of failures

q hat = 1 - p hat

population proportion

ratio of members of a population with a particular characteristic to the total members of the population

type I error

reject the null hypothesis when it is true

Σ (sigma)

represents the sum

Randomized block design

researcher divides subjects with similar characteristics into blocks and then, within each block, randomly assigns subjects to treatment groups

Observational Study

researcher observes and measures characteristics of interest of a part of a population but does not change existing conditions Ex: Wildlife photographer or observing

outcome

result of a single trial

point estimate for σ

s

interval estimate

range of values within which the parameter will fall with some level of confidence

test statistic

sample statistic representing a population

critical value z sub-0

separates the rejection region from the non rejection region

Ratio Level of Measurement

similar to data at the interval level, with the added property that a zero entry is an inherent zero. A ratio of two data values can be formed so that one data value can be meaningfully expressed as a multiple of another. Ex: Age. A 20-year-old is twice the age of a 10-year-old, and age starts at 0. Can be arranged in order; differences and ratios are both meaningful; natural zero starting point.

σ = √ σ²

standard deviation of a discrete random variable

dotplots

statistical graph in which each data value is plotted as a dot above a horizontal axis. Can be used with categorical or quantitative variables. A simple graph that shows the location of every data value by plotting one dot per case on a single axis.

z-test for mean μ

statistical test for a population mean. Test statistic is sample mean x̅. Used when population standard deviation is known.

t-test for mean μ

statistical test for a population mean. Test statistic is sample mean x̅. Used when population standard deviation is unknown.

z-test for proportion p

statistical test for a population proportion. Test statistic is the sample proportion p̂

chi-square test for variance or standard deviation

statistical test for a population variance or standard deviation. Test statistic is sample s or s² and uses χ²

Completely random design

subjects are assigned to different treatment groups through random selection

regression line

summarizes the points of a scatterplot and provides the means for making predictions

point estimate for σ²

s²

Blinding

technique where subject does not know whether they are receiving a treatment or a placebo

exploratory data analysis

the act of analyzing data to determine what information can be obtained by using stem and leaf plots, medians, interquartile ranges, and boxplots

population variance

the average of the squares of the deviations in a population data set.

sample variance

the average of the squares of the deviations in a sample data set.

critical values

the boundaries for the rejection/non-rejection regions. Ex: if c = 90%, 5% lies to the left of zc = -1.645 and 5% lies to the right of zc = 1.645
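
The z critical values can be reproduced with Python's standard library, assuming a two-tailed split of the area 1 - c. `statistics.NormalDist().inv_cdf` is the inverse of the standard normal cumulative distribution function; the helper name `critical_z` is illustrative.

```python
from statistics import NormalDist


def critical_z(c):
    """Positive critical value z_c that leaves (1 - c) / 2 in each tail."""
    tail = (1 - c) / 2
    return NormalDist().inv_cdf(1 - tail)


z_c = critical_z(0.90)
print(round(z_c, 3))               # 1.645
print(round(critical_z(0.95), 3))  # 1.96
```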

alternative hypothesis

the complement of the null hypothesis; a statement that must be true if the null hypothesis is false. It contains a statement of strict inequality such as <, ≠, or >.

deviation

the difference between the entry x and the mean μ of the data set: deviation of x = x - μ

range

the difference between the maximum and minimum data entries

sampling error

the difference between the point estimate and the actual parameter value

Sampling Error

the difference between the results of the sample and those of the population

stem

the entry's left most digit ex: 42 = stem of 4

leaf

the entry's right most digit ex: 42 = leaf of 2

margin of error E

the greatest possible distance between the point estimate and the value of the parameter it is estimating; also called the maximum error of estimate or error tolerance. The population has to be normally distributed or the sample size has to be n ≥ 30.

lower class limit

the lower value of a class in a frequency distribution that has the same decimal place value as the data

weighted mean

the mean of a data set whose entries have varying weights

relative frequency

the number of times an event occurs, DIVIDED by the total number of trials

quartiles

the numbers that separate the set into four equal parts

mutually exclusive

the occurrence of one means that none of the others can occur

sampling distribution

the probability distribution of a sample statistic that is formed when samples of size n are repeatedly taken from a population; describes how values of a sample statistic vary across all possible samples of a specific size that can be taken from a population

normal curve

the symmetrical bell-shaped curve that describes the distribution of many physical and psychological attributes. Most scores fall near the average, and fewer and fewer scores lie near the extremes.

upper class limit

the upper value of a class in a frequency distribution that has the same decimal place value as the data

Simulation

the use of a mathematical or physical model to reproduce the conditions of a situation or process. Simulations allow you to study situations that are impractical or even dangerous to recreate. Ex: crash test dummy, wave pool with a tiny scale ship, vibrating "earthquake" table

median

the value that lies in the middle of the data when the data set is ordered: the middle number when there is an odd number of entries, or the mean of the middle 2 entries when there is an even number of entries

y-intercept b

the y-coordinate of a point where a graph crosses the y-axis

continuous probability

theoretically an infinite number of outcomes within a given range

Experimentation

a treatment is applied to part of a population and responses are observed. A control group does not receive the treatment but is observed. The responses of the treatment group and control group are compared and studied. Ex: Science Fair project

Convenience Sampling

use results that are very easy to get (localized)

random variable

variable whose value is determined by the outcomes of a probability experiment

σ² = Σ(x-μ)²*P(x)

variance of a discrete random variable

symmetric distribution

when a vertical line can be drawn through the middle of the graph and the resulting halves are approximately mirror images

paired data sets

when each entry in one data set corresponds to one entry in a second data set

sample mean

x̅ = Σx/n (x bar = sigma x over n)

frequency histogram axis

y-axis= class frequency x-axis= data values

c-prediction interval for y

ŷ - E < y < ŷ + E

