QMB- Summer Semester UWF
t distribution
A family of probability distributions that can be used to develop an interval estimate of a population mean whenever the population standard deviation σ is unknown and is estimated by the sample standard deviation s.
two tailed test
A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in either tail of its sampling distribution.
coefficient of determination
A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
unbiased
A property of a point estimator that is present when the expected value of the point estimator is equal to the population parameter it estimates.
point estimator
A single value used as an estimate of the corresponding population parameter.
regression analysis
A statistical procedure used to develop an equation showing how the variables are related.
best subsets
A variable selection procedure that constructs and compares all possible models with up to a specified number of independent variables
dummy variable
A variable used to model the effect of categorical independent variables in a regression model; generally takes only the value zero or one.
backward elimination
An iterative variable selection procedure that starts with a model with all independent variables and considers removing an independent variable at each step
cross-validation
Assessment of the performance of a model on data other than the data that were used to generate the model.
overfitting
Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population.
holdout method
Method of cross-validation in which sample data are randomly divided into mutually exclusive and collectively exhaustive sets, then one set is used to build the candidate models and the other set is used to compare model performances and ultimately select a model.
linear regression
Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.
multiple linear regression
Regression analysis involving one dependent variable and more than one independent variable.
quadratic regression model
Regression model in which a nonlinear relationship between the independent and dependent variables is fit by including the independent variable and the square of the independent variable in the model: $\hat{y} = b_0 + b_1 x_1 + b_2 x_1^2$; also referred to as a second-order polynomial model.
piecewise linear regression model
Regression model in which one linear relationship between the independent and dependent variables is fit for values of the independent variable below a prespecified value of the independent variable, a different linear relationship between the independent and dependent variables is fit for values of the independent variable above the prespecified value of the independent variable, and the two regressions have the same estimated value of the dependent variable (i.e., are joined) at the prespecified value of the independent variable.
regression model
The equation that describes how the dependent variable y is related to an independent variable x and an error term
estimated regression equation
The estimate of the regression equation developed from sample data by using the least squares method.
target population
The population for which statistical inferences such as point estimates are made. It is important for the target population to correspond as closely as possible to the sampled population.
statistical inference
The process of making estimates and drawing conclusions about one or more characteristics of a population (the value of one or more parameters) through analysis of sample data drawn from the population.
interval estimation
The use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.
one tailed test
a hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in one tail of its sampling distribution
sample statistic
a characteristic of sample data, such as a sample mean, a sample standard deviation, a sample proportion, and so on; the value of the sample statistic is used to estimate the value of the corresponding population parameter
variable
a characteristic or quantity of interest that can take on different values
event
a collection of outcomes
uniform probability distribution
a continuous probability distribution for which the probability that the random variable will assume a value in any interval is the same for each interval of equal length
normal probability distribution
a continuous probability distribution in which the probability density function is bell shaped and determined by its mean μ and standard deviation σ
triangular probability distribution
a continuous probability distribution in which the probability density function is shaped like a triangle defined by the minimum possible value a, the maximum possible value b, and the most likely value m; a triangular probability distribution is often used when only subjective estimates are available for the minimum, maximum, and most likely values
exponential probability distribution
a continuous probability distribution that is useful in computing probabilities for the time it takes to complete a task or the time between arrivals; the mean and standard deviation for an exponential probability distribution are equal to each other
tall data
a data set that has so many observations that traditional statistical inference has little meaning
wide data
a data set that has so many variables that simultaneous consideration of all variables is infeasible
probability distribution
a description of how probabilities are distributed over the values of a random variable
probability density function
a function used to compute probabilities for a continuous random variable; the area under the graph of a probability density function over an interval represents probability
probability mass function
a function, denoted by f(x), that provides the probability that x assumes a particular value for a discrete random variable
histogram
a graphical presentation of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the bin intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis
scatter chart
a graphical presentation of the relationship between two quantitative variables; one variable is shown on the horizontal axis and the other on the vertical axis
venn diagram
a graphical representation of the sample space and operations involving events, in which the sample space is represented by a rectangle and events are represented as circles within the sample space
box plot
a graphical summary of data based on the quartiles of a distribution
multiplication law
a law used to compute the probability of the intersection of events
frame
a listing of the elements from which the sample will be selected
parameter
a measurable factor that defines a characteristic of a population, process, or system, such as a population mean, a population standard deviation, a population proportion, and so on
parameter
a measurable factor that defines a characteristic of a population, process, or system
mean (arithmetic mean)
a measure of central location computed by summing the data values and dividing by the number of observations
mode
a measure of central location defined as the value that occurs with the greatest frequency
median
a measure of central location provided by the value in the middle when the data are arranged in ascending order
geometric mean
a measure of central location that is calculated by finding the nth root of the product of n values
covariance
a measure of linear association between two variables
coefficient of variation
a measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100
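As a rough illustration of the coefficient of variation, the sketch below uses made-up data values (an assumption for illustration only) and the sample standard deviation from Python's standard library.

```python
import statistics

# Hypothetical data values, chosen only for illustration
data = [46, 54, 42, 46, 32]

mean = statistics.mean(data)       # arithmetic mean
std_dev = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)

# Coefficient of variation: standard deviation relative to the mean, as a percentage
coefficient_of_variation = std_dev / mean * 100
print(f"mean={mean}, s={std_dev:.2f}, CV={coefficient_of_variation:.1f}%")
```

Because the result is a percentage of the mean, it can be used to compare variability across data sets measured in different units.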
expected value
a measure of the central location, or mean, of a random variable
skewness
a measure of the lack of symmetry in a distribution
variance
a measure of variability based on the squared deviations of the data values about the mean
standard deviation
a measure of variability computed by taking the positive square root of the variance
range
a measure of variability defined to be the largest value minus the smallest value
variance
a measure of variability, or dispersion, of a random variable
bayes' theorem
a method used to compute posterior probabilities
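A minimal sketch of how Bayes' theorem revises prior probabilities into posterior probabilities. The two suppliers and all probability values are hypothetical, chosen only to make the arithmetic visible.

```python
# Hypothetical prior probabilities that a part came from each supplier
prior = {"supplier_1": 0.65, "supplier_2": 0.35}
# Hypothetical conditional probabilities P(bad part | supplier)
likelihood_bad = {"supplier_1": 0.02, "supplier_2": 0.05}

# Joint probabilities P(supplier and bad part), via the multiplication law
joint = {s: prior[s] * likelihood_bad[s] for s in prior}
# Total probability of observing a bad part
p_bad = sum(joint.values())
# Posterior probabilities P(supplier | bad part)
posterior = {s: joint[s] / p_bad for s in prior}

for supplier, p in posterior.items():
    print(f"P({supplier} | bad part) = {p:.4f}")
```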
standard normal distribution
a normal distribution with a mean of zero and a standard deviation of one
random variables
a numerical description of the outcome of an experiment
probability
a numerical measure of the likelihood that an event will occur
degrees of freedom
a parameter of the t distribution; when the t distribution is used in the computation of an interval estimate of a population mean, the appropriate t distribution has n - 1 degrees of freedom, where n is the size of the sample
sampling distribution
a probability distribution consisting of all possible values of a sample statistic
custom discrete probability distribution
a probability distribution for a discrete random variable for which each value xi that the random variable assumes is associated with a defined probability f(xi)
poisson probability distribution
a probability distribution for a discrete random variable showing the probability of x occurrences of an event over a specified interval of time or space
binomial probability distribution
a probability distribution for a discrete random variable showing the probability of x successes in n trials
empirical probability distribution
a probability distribution for which the relative frequency method is used to assign probabilities
discrete uniform probability distribution
a probability distribution in which each possible value of the discrete random variable has the same probability
addition law
a probability law used to compute the probability of the union of events
least squares method
a procedure for using sample data to find the estimated regression equation
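The least squares computation for a simple linear regression can be sketched as follows. The (x, y) values are hypothetical, and the closing lines also compute the residuals and the coefficient of determination from the fitted equation.

```python
# Hypothetical (x, y) sample data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates for the estimated regression equation y-hat = b0 + b1 * x
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals: observed y minus the value predicted by the estimated equation
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Coefficient of determination: share of variability in y explained by the equation
sse = sum(e ** 2 for e in residuals)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst
print(f"y-hat = {b0:.2f} + {b1:.2f}x, r^2 = {r_squared:.2f}")
```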
random experiment
a process that generates well-defined experimental outcomes; on any single repetition or trial, the outcome that occurs is determined by chance
random variable
a quantity whose values are not known with certainty
random variable, or uncertain variable
a quantity whose values are not known with certainty
random sample
a random sample from an infinite population is a sample selected such that the following conditions are satisfied: (1) each element selected comes from the same population and (2) each element is selected independently
discrete random variable
a random variable that can take on only specified discrete values
continuous random variable
a random variable that may assume any numerical value in an interval or collection of intervals; an interval can include negative and positive infinity
empirical rule
a rule that can be used to compute the percentage of data values that must be within 1, 2, or 3 standard deviations of the mean for data that exhibit a bell-shaped distribution
observation
a set of values corresponding to a set of variables
simple random sample
a simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected
correlation coefficient
a standardized measure of linear association between two variables that takes on values between -1 and +1; values near -1 indicate a strong negative linear relationship; values near +1 indicate a strong positive linear relationship; and values near zero indicate the lack of a linear relationship
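To show how the correlation coefficient standardizes the covariance, here is a small sketch with hypothetical paired data that fall exactly on a downward-sloping line.

```python
import statistics

# Hypothetical paired observations with a perfectly negative linear relationship
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]

n = len(x)
x_bar = statistics.mean(x)
y_bar = statistics.mean(y)

# Sample covariance: average cross-product of deviations (n - 1 divisor)
covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Correlation coefficient: covariance divided by the product of the standard deviations
correlation = covariance / (statistics.stdev(x) * statistics.stdev(y))
print(f"covariance={covariance}, correlation={correlation:.2f}")
```

The covariance depends on the units of x and y, while the correlation coefficient does not, which is why only the latter is bounded between -1 and +1.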
test statistic
a statistic whose value helps determine whether a null hypothesis should be rejected
spillover
a subject continuing to rate something positively or negatively because that was the subject's earlier rating and the subject wants to stay consistent with it, rather than rating the current impression
sample
a subset of the population
relative frequency distribution
a tabular summary of data showing the fraction or proportion of data values in each of several nonoverlapping bins
frequency distribution
a tabular summary of data showing the number (frequency) of data values in each of several nonoverlapping bins
percent frequency distribution
a tabular summary of data showing the percentage of data values in each of several nonoverlapping bins
cumulative frequency distribution
a tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each bin
central limit theorem
a theorem stating that when enough independent random variables are added, the resulting sum is approximately a normally distributed random variable; this result allows one to use the normal probability distribution to approximate the sampling distributions of the sample mean and the sample proportion for sufficiently large sample sizes
z score
a value computed by dividing the deviation about the mean (xi - x̄) by the standard deviation; a z score is referred to as a standardized value and denotes the number of standard deviations that xi is from the mean
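A short sketch of the z-score calculation, assuming a small hypothetical data set and the sample standard deviation.

```python
import statistics

# Hypothetical data values, for illustration only
data = [46, 54, 42, 46, 32]

x_bar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)      # sample standard deviation

# z score: number of standard deviations each value lies from the mean
z_scores = [(xi - x_bar) / s for xi in data]
for xi, z in zip(data, z_scores):
    print(f"x={xi}, z={z:+.2f}")
```

A positive z score marks a value above the mean and a negative one a value below it; the z scores of a data set always sum to zero.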
percentile
a value such that approximately p% of the observations have values less than the pth percentile; hence, approximately (100-p)% of the observations have values greater than the pth percentile; the 50th percentile is the median
interval estimate
an estimate of a population parameter that provides an interval believed to contain the value of the parameter
confidence interval
an estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence
confidence level
an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating
prediction interval
an interval estimate of the prediction of an individual y value given values of the independent variable
stepwise selection
an iterative variable selection procedure that considers adding an independent variable and removing an independent variable at each step
forward selection
an iterative variable selection procedure that starts with a model with no variables and considers adding an independent variable at each step
outlier
an unusually large or unusually small data value
confidence interval
another name for an interval estimate
nonsampling error
any difference between the value of a sample statistic (such as the sample mean, sample standard deviation, or sample proportion) and the value of the corresponding population parameter (population mean, population standard deviation, or population proportion) that is not the result of sampling error; these include but are not limited to coverage error, nonresponse error, measurement error, interviewer error, and processing error
big data
any set of data that is too large or too complex to be handled by standard data processing techniques and typical desktop software
measurement error
anything that causes questions about the accuracy of the variable(s) measured
random sampling
collecting a sample that ensures that (1) each element selected comes from the same population and (2) each element is selected independently
census
collection of data from every element in the population of interest
cross-sectional data
data collected at the same or approximately the same point in time
categorical data
data for which categories of like items are identified by labels or names
quantitative data
data for which numerical values are used to indicate magnitude, such as how many or how much; arithmetic operations such as addition, subtraction, and multiplication can be performed on quantitative data
time series data
data that are collected over a period of time
variation
differences in values of a variable over observations
probability of an event
equal to the sum of the probabilities of outcomes for the event
mutually exclusive events
events that have no outcomes in common
sources of nonsampling error
generalizability, inappropriate sampling, non-response, self-selection, measurement error, experimenter bias, timing, experimental demand, spillover, poor question practices, poor survey practices, erroneous interpretation
non-response
inability to gather data from some of the entities; this always raises a question about whether those entities are somehow systematically different from the ones about which you do have information
prior probability
initial estimate of the probabilities of events
erroneous interpretation
making suggestions for actions that are not supported by the evidence provided
timing
measuring something too soon or too late after some change or intervention
leave-one-out cross-validation
method of cross-validation in which candidate models are repeatedly fit using n - 1 observations and evaluated with the remaining observation
k-fold cross-validation
method of cross-validation in which the sample data are randomly divided into k equal-sized, mutually exclusive and collectively exhaustive subsets; in each of k iterations, one of the k subsets is used to evaluate a candidate model that was constructed on the data from the other k - 1 subsets
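The splitting step of k-fold cross-validation can be sketched as below. The helper name `k_fold_indices` and its parameters are hypothetical, and the model fitting itself is left as a comment because the glossary does not specify a model.

```python
import random

def k_fold_indices(n_observations, k, seed=0):
    """Randomly split indices 0..n-1 into k mutually exclusive, collectively exhaustive folds."""
    indices = list(range(n_observations))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(n_observations=10, k=5)
for i, validation in enumerate(folds):
    # Training data: every fold except the one held out in this iteration
    training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    # A candidate model would be fit on `training` and evaluated on `validation` here
    print(f"iteration {i + 1}: train on {len(training)}, validate on {len(validation)}")
```

Setting k equal to the number of observations makes every fold a single observation, which corresponds to the leave-one-out variant.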
illegitimately missing data
missing data that do not occur naturally
legitimately missing data
missing data that occur naturally
generalizability
no population has been identified to which the study results should be generalized
measurement error
nonsampling error that results from the incorrect measurement of the population characteristics of interest
coverage error
nonsampling error that results when the research objective and the population from which the sample is to be drawn are not aligned
nonresponse error
nonsampling error that results when some segments of the population are more likely or less likely to respond to the survey mechanism
experimenter bias
often researchers inadvertently or intentionally influence studies so that the result supports their position (hypothesis)
standard deviation
positive square root of the variance
extrapolation
prediction of the mean value of the dependent variable y for values of the independent variables x1, x2, ... that are outside the experimental range
simple linear regression
regression analysis involving one independent variable and one dependent variable
interaction
regression modeling technique used when the relationship between the dependent variable and one independent variable is different at different values of a second independent variable
posterior probabilities
revised probabilities of events based on additional information
t test
statistical test based on the Student's t probability distribution that can be used to test the hypothesis that a regression parameter βj is zero; if this hypothesis is rejected, we conclude that there is a regression relationship between the jth independent variable and the dependent variable
imputation
systematic replacement of missing values with values that seem reasonable
margin of error
the + or - value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter
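As a sketch of how a margin of error turns a point estimate into an interval estimate of a population mean when σ is unknown, the example below uses a hypothetical sample of 10 values. The t value 2.262 is the standard table value for a 95% confidence level with 9 degrees of freedom; it is hardcoded because Python's standard library has no t distribution.

```python
import math
import statistics

# Hypothetical sample of n = 10 observations (illustrative values)
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
n = len(sample)

x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation estimates sigma

# t table value for 95% confidence with n - 1 = 9 degrees of freedom (hardcoded assumption)
t_value = 2.262

standard_error = s / math.sqrt(n)            # standard error of the sample mean
margin_of_error = t_value * standard_error
interval = (x_bar - margin_of_error, x_bar + margin_of_error)
print(f"{x_bar} +/- {margin_of_error:.3f} -> ({interval[0]:.3f}, {interval[1]:.3f})")
```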
quartiles
the 25th, 50th, and 75th percentiles, referred to as the first quartile, second quartile (median), and third quartile, respectively; the quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data
volume
the amount of data generated
confidence level
the confidence associated with an interval estimate; for example, if an interval estimation procedure provides intervals such that 95% of the intervals formed using the procedure will include the population parameter, the interval estimate is said to be constructed at the 95% confidence level
confidence coefficient
the confidence level expressed as a decimal value; for example 0.95 is the confidence coefficient for a 95% confidence level
training set
the data set used to build the candidate models
validation set
the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable
multicollinearity
the degree of correlation among independent variables in a regression model
residual
the difference between the observed value of the dependent variable and the value predicted using the estimated regression equation
interquartile range
the difference between the third and first quartiles
sampling error
the difference between the value of a sample statistic (such as the sample mean, sample standard deviation, or sample proportion) and the value of the corresponding population parameter (population mean, population standard deviation, or population proportion) that occurs because a random sample is used to estimate the population parameter
variety
the diversity in types and structures of data generated
inappropriate sample
the entities sampled do not represent the population that the researcher had in mind
type II error
the error of accepting H0 when it is false
type I error
the error of rejecting H0 when it is true
complement of A
the event consisting of all outcomes that are not in A
union of A and B
the event containing the outcomes belonging to A or B or both; the union of A and B is denoted by A ∪ B
intersection of A and B
the event containing the outcomes belonging to both A and B
data
the facts and figures collected, analyzed, and summarized for presentation and interpretation
alternative hypothesis
the hypothesis concluded to be true if the null hypothesis is rejected
null hypothesis
the hypothesis tentatively assumed to be true in the hypothesis testing procedure
bins
the nonoverlapping groupings of data used to create a frequency distribution
growth factor
the percentage increase of a value over a period of time is calculated using the formula (growth factor - 1)
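A brief sketch relating growth factors to percentage changes, and to the geometric mean as the average growth factor per period. The three growth factors are hypothetical.

```python
import math

# Hypothetical growth factors for three periods: +10%, -5%, +20%
growth_factors = [1.10, 0.95, 1.20]

# Percentage change in each period: (growth factor - 1), expressed as a percent
percent_changes = [(g - 1) * 100 for g in growth_factors]

# Geometric mean: nth root of the product of the n growth factors
n = len(growth_factors)
geometric_mean = math.prod(growth_factors) ** (1 / n)

# Mean growth rate per period implied by the geometric mean
mean_growth_rate_pct = (geometric_mean - 1) * 100
print(f"mean growth factor={geometric_mean:.4f}, mean growth rate={mean_growth_rate_pct:.2f}%")
```

A growth factor above 1 corresponds to positive growth and one below 1 to negative growth.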
sampled population
the population from which the sample is drawn
knot
the prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model; also called the break point or the joint
conditional probability
the probability of an event given that another event has already occurred
joint probabilities
the probability of two events both occurring; in other words, the probability of the intersection of two events
p value
the probability that a random sample of the same size collected from the same population using the same procedure will yield stronger evidence against a hypothesis than the evidence in the sample data, given that the hypothesis is actually true
level of significance
the probability that the interval estimation procedure will generate an interval that does not contain the value of the parameter being estimated; also the probability of making a type I error when the null hypothesis is true as an equality
p value
the probability, assuming that H0 is true, of obtaining a random sample of size n that results in a test statistic at least as extreme as the one observed in the current sample; for a lower tail test, the p value is the probability of obtaining a value for the test statistic as small as or smaller than that provided by the sample; for an upper tail test, the p value is the probability of obtaining a value for the test statistic as large as or larger than that provided by the sample; for a two tail test, the p value is the probability of obtaining a value for the test statistic at least as unlikely as or more unlikely than that provided by the sample
hypothesis testing
the process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture
dimension reduction
the process of removing variables from the analysis without losing crucial information
interval estimation
the process of using sample data to calculate a range of values that is believed to include the unknown value of a population parameter
experimental region
the range of values for the independent variables for the data that are used to estimate the regression model
practical significance
the real-world impact the result of statistical inference will have on business decisions
veracity
the reliability of the data generated
experimental demand
the researcher influences a subject (consciously or not) to provide data the researcher is hoping for
point estimator
the sample statistic that provides the point estimate of the population parameter
population
the set of all elements of interest in a particular study
sample space
the set of all outcomes
velocity
the speed at which the data are generated
standard error
the standard deviation of a point estimator
missing completely at random (MCAR)
the tendency for an observation to be missing a value of some variable is entirely random
missing not at random (MNAR)
the tendency for an observation to be missing a value of some variable is related to the missing value
missing at random (MAR)
the tendency for an observation to be missing a value of some variable is related to the value of some other variable(s) in the data
finite population correction factor
the term that is used in the formulas for computing the (estimated) standard error for the sample mean and the sample proportion whenever a finite population, rather than an infinite population, is being sampled
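A minimal sketch of the correction applied to the standard error of the sample mean, using the usual form sqrt((N - n) / (N - 1)). The population size, sample size, and sample standard deviation are hypothetical.

```python
import math

# Hypothetical finite population size, sample size, and sample standard deviation
N = 4000
n = 80
s = 25.0

# Finite population correction factor
fpc = math.sqrt((N - n) / (N - 1))

# Estimated standard error of the sample mean, without and with the correction
standard_error_infinite = s / math.sqrt(n)
standard_error_finite = fpc * standard_error_infinite
print(f"fpc={fpc:.4f}, SE without={standard_error_infinite:.4f}, SE with={standard_error_finite:.4f}")
```

When the sample is a small fraction of the population, the factor is close to 1 and the correction changes the standard error very little.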
point estimate
the value of a point estimator used in a particular instance as an estimate of a population parameter
marginal probabilities
the values in the margins of a joint probability table that provide the probabilities of each event separately
independent variable
the variable(s) used for predicting or explaining values of the dependent variable; it is denoted by x and is often referred to as the predictor variable
dependent variable
the variable that is being predicted or explained; it is denoted by y and is often referred to as the response
self-selection
those motivated to respond are more likely to share views or to participate in the study itself; this is the basis of an 'unscientific' survey
independent events
two events A and B are independent if P(A | B) = P(A) or P(B | A) = P(B); the events do not influence each other
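The independence condition can be checked by enumeration on a small sample space. The sketch below assumes two fair dice and two illustrative events; exact fractions avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Probability of an event as the share of outcomes that satisfy it."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

def event_a(outcome):
    return outcome[0] % 2 == 0   # A: first die is even

def event_b(outcome):
    return outcome[1] == 6       # B: second die shows 6

p_a = probability(event_a)
p_b = probability(event_b)
# Joint probability: probability of the intersection of A and B
p_a_and_b = probability(lambda o: event_a(o) and event_b(o))
# Conditional probability P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b

independent = p_a_given_b == p_a
print(f"P(A)={p_a}, P(A|B)={p_a_given_b}, independent={independent}")
```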
poor question practices / poor survey practices
wording responses in a way to influence the response, failing to randomize choices, using language too obscure for respondents to fully understand