QMB- Summer Semester UWF

t distribution

A family of probability distributions that can be used to develop an interval estimate of a population mean whenever the population standard deviation σ is unknown and is estimated by the sample standard deviation s.

two tailed test

A hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in either tail of its sampling distribution.

coefficient of determination

A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
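
As a minimal sketch of this definition, the coefficient of determination can be computed in the common form 1 − SSE/SST. The function name and the sample values below are made up for illustration.

```python
def r_squared(y, y_hat):
    """Proportion of the variability in y explained by the predictions y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - sse / sst

# Hypothetical observed values and predictions from an estimated regression equation
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]
print(r_squared(y, y_hat))
```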

unbiased

A property of a point estimator that is present when the expected value of the point estimator is equal to the population parameter it estimates.

point estimator

A single value used as an estimate of the corresponding population parameter.

regression analysis

A statistical procedure used to develop an equation showing how the variables are related.

best subsets

A variable selection procedure that constructs and compares all possible models with up to a specified number of independent variables

dummy variable

A variable used to model the effect of categorical independent variables in a regression model; generally takes only the value zero or one.

backward elimination

An iterative variable selection procedure that starts with a model with all independent variables and considers removing an independent variable at each step

cross-validation

Assessment of the performance of a model on data other than the data that were used to generate the model.

overfitting

Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population.

holdout method

Method of cross-validation in which sample data are randomly divided into mutually exclusive and collectively exhaustive sets, then one set is used to build the candidate models and the other set is used to compare model performances and ultimately select a model.
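
A rough sketch of the holdout split, assuming a 70/30 division and a fixed random seed (both choices are illustrative, not prescribed by the definition):

```python
import random

def holdout_split(data, train_fraction=0.7, seed=0):
    """Randomly divide data into two mutually exclusive, collectively exhaustive sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

observations = list(range(10))  # stand-in for 10 sample observations
training_set, validation_set = holdout_split(observations)
print(training_set, validation_set)
```

The first set would be used to build the candidate models and the second to compare their performance.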

linear regression

Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.

multiple linear regression

Regression analysis involving one dependent variable and more than one independent variable.

quadratic regression model

Regression model in which a nonlinear relationship between the independent and dependent variables is fit by including the independent variable and the square of the independent variable in the model: ŷ = b0 + b1x + b2x²; also referred to as a second-order polynomial model.

piecewise linear regression model

Regression model in which one linear relationship between the independent and dependent variables is fit for values of the independent variable below a prespecified value of the independent variable, a different linear relationship between the independent and dependent variables is fit for values of the independent variable above the prespecified value of the independent variable, and the two regressions have the same estimated value of the dependent variable (i.e., are joined) at the prespecified value of the independent variable.
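
One common way to write such a model is with a hinge term, max(x − knot, 0), which keeps the two line segments joined at the prespecified value. The sketch below assumes that formulation; the coefficient names b0, b1, b2 are illustrative.

```python
def piecewise_linear(x, b0, b1, b2, knot):
    """Slope is b1 below the knot and b1 + b2 above it; the segments meet at the knot."""
    return b0 + b1 * x + b2 * max(x - knot, 0.0)

# Hypothetical coefficients with a knot at x = 5
for x in (2, 5, 7):
    print(x, piecewise_linear(x, b0=1, b1=2, b2=3, knot=5))
```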

regression model

The equation that describes how the dependent variable y is related to an independent variable x and an error term

estimated regression equation

The estimate of the regression equation developed from sample data by using the least squares method.

target population

The population for which statistical inferences such as point estimates are made. It is important for the target population to correspond as closely as possible to the sampled population.

statistical inference

The process of making estimates and drawing conclusions about one or more characteristics of a population (the value of one or more parameters) through analysis of sample data drawn from the population.

interval estimation

The use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.

one tailed test

a hypothesis test in which rejection of the null hypothesis occurs for values of the test statistic in one tail of its sampling distribution

sample statistic

a characteristic of sample data, such as a sample mean, a sample standard deviation, a sample proportion, and so on; the value of the sample statistic is used to estimate the value of the corresponding population parameter

variable

a characteristic or quantity of interest that can take on different values

event

a collection of outcomes

uniform probability distribution

a continuous probability distribution for which the probability that the random variable will assume a value in any interval is the same for each interval of equal length

normal probability distribution

a continuous probability distribution in which the probability density function is bell shaped and determined by its mean μ and standard deviation σ

triangular probability distribution

a continuous probability distribution in which the probability density function is shaped like a triangle defined by the minimum possible value a, the maximum possible value b, and the most likely value m; a triangular probability distribution is often used when only subjective estimates are available for the minimum, maximum, and most likely values

exponential probability distribution

a continuous probability distribution that is useful in computing probabilities for the time it takes to complete a task or the time between arrivals; the mean and standard deviation for an exponential probability distribution are equal to each other

tall data

a data set that has so many observations that traditional statistical inference has little meaning

wide data

a data set that has so many variables that simultaneous consideration of all variables is infeasible

probability distribution

a description of how probabilities are distributed over the values of a random variable

probability density function

a function used to compute probabilities for a continuous random variable; the area under the graph of a probability density function over an interval represents probability

probability mass function

a function, denoted by f(x), that provides the probability that x assumes a particular value for a discrete random variable

histogram

a graphical presentation of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the bin intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis

scatter chart

a graphical presentation of the relationship between two quantitative variables; one variable is shown on the horizontal axis and the other on the vertical axis

Venn diagram

a graphical representation of the sample space and operations involving events, in which the sample space is represented by a rectangle and events are represented as circles within the sample space

box plot

a graphical summary of data based on the quartiles of a distribution

multiplication law

a law used to compute the probability of the intersection of events

frame

a listing of the elements from which the sample will be selected

parameter

a measurable factor that defines a characteristic of a population, process, or system, such as a population mean, a population standard deviation, a population proportion, and so on

parameter

a measurable factor that defines a characteristic of a population, process, or system

mean (arithmetic mean)

a measure of central location computed by summing the data values and dividing by the number of observations

mode

a measure of central location defined as the value that occurs with the greatest frequency

median

a measure of central location provided by the value in the middle when the data are arranged in ascending order

geometric mean

a measure of central location that is calculated by finding the nth root of the product of n values
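
A short illustration using Python's standard library, assuming Python 3.8 or later for statistics.geometric_mean; the three values are hypothetical.

```python
import statistics

# Hypothetical values for three periods
values = [1.10, 0.95, 1.20]

# nth root of the product of the n values
gm = statistics.geometric_mean(values)
print(gm)
```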

covariance

a measure of linear association between two variables

coefficient of variation

a measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100

expected value

a measure of the central location, or mean, of a random variable

skewness

a measure of the lack of symmetry in a distribution

variance

a measure of variability based on the squared deviations of the data values about the mean

standard deviation

a measure of variability computed by taking the positive square root of the variance

range

a measure of variability defined to be the largest value minus the smallest value

variance

a measure of variability, or dispersion, of a random variable

Bayes' theorem

a method used to compute posterior probabilities
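
The computation can be sketched as follows for a set of mutually exclusive, collectively exhaustive events. The prior probabilities and conditional probabilities below are hypothetical numbers chosen for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(A_i | B) given the priors P(A_i) and the likelihoods P(B | A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_i and B)
    total = sum(joint)                                    # P(B)
    return [j / total for j in joint]

# Hypothetical: two events with priors 0.65 and 0.35
post = posterior(priors=[0.65, 0.35], likelihoods=[0.02, 0.05])
print(post)
```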

standard normal distribution

a normal distribution with a mean of zero and a standard deviation of one

random variables

a numerical description of the outcome of an experiment

probability

a numerical measure of the likelihood that an event will occur

degrees of freedom

a parameter of the t distribution; when the t distribution is used in the computation of an interval estimate of a population mean, the appropriate t distribution has n-1 degrees of freedom, where n is the size of the sample

sampling distribution

a probability distribution consisting of all possible values of a sample statistic

custom discrete probability distribution

a probability distribution for a discrete random variable for which each value xi that the random variable assumes is associated with a defined probability f(xi)

Poisson probability distribution

a probability distribution for a discrete random variable showing the probability of x occurrences of an event over a specified interval of time or space
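
A minimal sketch of the probability function for this distribution, f(x) = μ^x e^(−μ) / x!, where μ is the mean number of occurrences in the interval. The function name and the value of μ are illustrative.

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x occurrences when the mean number of occurrences is mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

# Hypothetical: an average of 2 arrivals per interval
print(poisson_pmf(0, 2.0), poisson_pmf(3, 2.0))
```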

binomial probability distribution

a probability distribution for a discrete random variable showing the probability of x successes in n trials
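
As a sketch, the probability of x successes in n independent trials with success probability p is C(n, x) p^x (1 − p)^(n − x). The function name is illustrative; math.comb requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binomial_pmf(2, 3, 0.5))  # 0.375
```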

empirical probability distribution

a probability distribution for which the relative frequency method is used to assign probabilities

discrete uniform probability distribution

a probability distribution in which each possible value of the discrete random variable has the same probability

addition law

a probability law used to compute the probability of the union of events

least squares method

a procedure for using sample data to find the estimated regression equation
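
For the case of one independent variable, the procedure can be sketched with the usual closed-form slope and intercept. The data points are made up and lie exactly on a line so the result is easy to check.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y ≈ b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1

b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```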

random experiment

a process that generates well-defined experimental outcomes; on any single repetition or trial, the outcome that occurs is determined by chance

random variable

a quantity whose values are not known with certainty

random variable, or uncertain variable

a quantity whose values are not known with certainty

random sample

a random sample of size n from an infinite population is a sample selected such that the following conditions are satisfied: (1) each element selected comes from the same population and (2) each element is selected independently

discrete random variable

a random variable that can take on only specified discrete values

continuous random variable

a random variable that may assume any numerical value in an interval or collection of intervals; an interval can include negative and positive infinity

empirical rule

a rule that can be used to compute the percentage of data values that must be within 1, 2, or 3 standard deviations of the mean for data that exhibit a bell-shaped distribution
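
The percentages behind this rule can be checked against a normal distribution with the standard library's NormalDist; this is only a numerical illustration, and the variable names are arbitrary.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(k, round(100 * share, 1))
```

The printed shares are approximately 68%, 95%, and 99.7%.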

observation

a set of values corresponding to a set of variables

simple random sample

a simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected

correlation coefficient

a standardized measure of linear association between two variables that takes on values between -1 and +1; values near -1 indicate a strong negative linear relationship; values near +1 indicate a strong positive linear relationship; and values near zero indicate the lack of a linear relationship
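
A small sketch of how the sample covariance is standardized into a correlation coefficient; the function names and data are illustrative.

```python
import math

def sample_covariance(x, y):
    """Sample covariance: average cross-product of deviations, with divisor n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    return sample_covariance(x, y) / math.sqrt(
        sample_covariance(x, x) * sample_covariance(y, y)
    )

r_pos = correlation([1, 2, 3, 4], [2, 4, 6, 8])  # perfect positive linear relationship
r_neg = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # perfect negative linear relationship
print(r_pos, r_neg)
```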

test statistic

a statistic whose value helps determine whether a null hypothesis should be rejected

spillover

a subject continuing to rate something positively or negatively because that was his or her earlier rating and he or she wants to stay true to that earlier rating rather than to the current impression

sample

a subset of the population

relative frequency distribution

a tabular summary of data showing the fraction or proportion of data values in each of several nonoverlapping bins

frequency distribution

a tabular summary of data showing the number (frequency) of data values in each of several nonoverlapping bins

percent frequency distribution

a tabular summary of data showing the percentage of data values in each of several nonoverlapping bins

cumulative frequency distribution

a tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each bin

central limit theorem

a theorem stating that when enough independent random variables are added, the resulting sum is approximately a normally distributed random variable; this result allows one to use the normal probability distribution to approximate the sampling distributions of the sample mean and the sample proportion for sufficiently large sample sizes

z score

a value computed by dividing the deviation about the mean (xi − x̄) by the standard deviation; a z score is referred to as a standardized value and denotes the number of standard deviations that xi is from the mean
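
A brief worked example, assuming the sample standard deviation (divisor n − 1) is used; the data values are made up.

```python
import statistics

data = [10, 12, 14]
mean = statistics.mean(data)
s = statistics.stdev(data)  # sample standard deviation

# Number of standard deviations each value lies from the mean
z_scores = [(xi - mean) / s for xi in data]
print(z_scores)  # [-1.0, 0.0, 1.0]
```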

percentile

a value such that approximately p% of the observations have values less than the pth percentile; hence, approximately (100-p)% of the observations have values greater than the pth percentile; the 50th percentile is the median

interval estimate

an estimate of a population parameter that provides an interval believed to contain the value of the parameter

confidence interval

an estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence

confidence level

an indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating

prediction interval

an interval estimate of the prediction of an individual y value given values of the independent variable

stepwise selection

an iterative variable selection procedure that considers adding an independent variable and removing an independent variable at each step

forward selection

an iterative variable selection procedure that starts with a model with no variables and considers adding an independent variable at each step

outliers

an unusually large or unusually small data value

confidence interval

another name for an interval estimate

non sampling error

any difference between the value of a sample statistic (such as the sample mean, sample standard deviation, or sample proportion) and the value of the corresponding population parameter (population mean, population standard deviation, or population proportion) that is not the result of sampling error; these include but are not limited to coverage error, nonresponse error, measurement error, interviewer error, and processing error

big data

any set of data that is too large or too complex to be handled by standard data processing techniques and typical desktop software

measurement error

anything that causes questions about the accuracy of the variable(s) measured

random sampling

collecting a sample that ensures that (1) each element selected comes from the same population and (2) each element is selected independently

census

collection of data from every element in the population of interest

cross-sectional data

data collected at the same or approximately the same point in time

categorical data

data for which categories of like items are identified by labels or names

quantitative data

data for which numerical values are used to indicate magnitude, such as how many or how much; arithmetic operations such as addition, subtraction, and multiplication can be performed on quantitative data

time series data

data that are collected over a period of time

variation

differences in values of a variable over observations

probability of an event

equal to the sum of the probabilities of outcomes for the event

mutually exclusive events

events that have no outcomes in common

sources of non sampling error

generalizability, inappropriate sampling, non-response, self-selection, measurement error, experimenter bias, timing, experimental demand, spillover, poor question practices, poor survey practices, erroneous interpretation

non-response

inability to gather data from some of the entities always raises a question about whether those entities are somehow systematically different from the ones about which you do have information

prior probability

initial estimate of the probabilities of events

erroneous interpretation

making suggestions for actions that are not supported by the evidence provided

timing

measuring something too soon or too late after some change or intervention

leave-one-out cross-validation

method of cross-validation in which candidate models are repeatedly fit using n − 1 observations and evaluated with the remaining observation

k-fold cross validation

method of cross-validation in which the sample data set is randomly divided into k equal-sized, mutually exclusive and collectively exhaustive subsets; in each of k iterations, one of the k subsets is used to evaluate a candidate model that was constructed on the data from the other k-1 subsets
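
The splitting step can be sketched as below. The function name, the seed, and the use of observation indices rather than actual data are assumptions made for illustration; fitting and scoring a model on each split is left out.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (training, validation) index lists for each of the k iterations."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k mutually exclusive subsets
    for i in range(k):
        validation = folds[i]
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, validation

splits = list(k_fold_indices(n=10, k=5))
for training, validation in splits:
    print(sorted(training), sorted(validation))
```

Setting k equal to n gives subsets of a single observation each, so every model is fit on n − 1 observations.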

illegitimately missing data

missing data that do not occur naturally

legitimately missing data

missing data that occur naturally

generalizability

no population has been identified to which the study results should be generalized

measurement error

non sampling error that results from the incorrect measurement of the population characteristics of interest

coverage error

non sampling error that results when the research objective and the population from which the sample is to be drawn are not aligned

nonresponse error

nonsampling error that results when some segments of the population are more likely or less likely to respond to the survey mechanism

experimenter bias

often researchers inadvertently or intentionally influence studies so that the result supports their position (hypothesis)

standard deviation

positive square root of the variance

extrapolation

prediction of the mean value of the dependent variable y for values of the independent variables x₁, x₂, ... that are outside the experimental range

simple linear regression

regression analysis involving one independent variable and one dependent variable

interaction

regression modeling technique used when the relationship between the dependent variable and one independent variable is different at different values of a second independent variable

posterior probabilities

revised probabilities of events based on additional information

t test

statistical test based on the Student's t probability distribution that can be used to test the hypothesis that a regression parameter βj is zero; if this hypothesis is rejected, we conclude that there is a regression relationship between the jth independent variable and the dependent variable

imputation

systematic replacement of missing values with values that seem reasonable

margin of error

the ± value added to and subtracted from a point estimate in order to develop an interval estimate of a population parameter
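
A sketch for a population mean, under the simplifying assumption that the population standard deviation σ is known, so a standard normal critical value applies. With σ unknown, a t critical value would be used instead, which the standard library does not provide. All numbers are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """z critical value times the standard error of the sample mean (sigma assumed known)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)

x_bar = 50.0  # hypothetical point estimate
moe = margin_of_error(sigma=10.0, n=100)
interval = (x_bar - moe, x_bar + moe)
print(moe, interval)
```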

quartiles

the 25th, 50th, and 75th percentiles, referred to as the first quartile, second quartile (median), and third quartile, respectively; the quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data
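
One way to compute these in Python is statistics.quantiles (Python 3.8 or later). Different software packages interpolate percentiles slightly differently, so the first and third quartiles may not match a textbook's hand calculation exactly. The data are made up.

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13]

# n=4 returns the three cut points that divide the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # difference between the third and first quartiles
print(q1, q2, q3, iqr)
```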

volume

the amount of data generated

confidence level

the confidence associated with an interval estimate; for example, if an interval estimation procedure provides intervals such that 95% of the intervals formed using the procedure will include the population parameter, the interval estimate is said to be constructed at the 95% confidence level

confidence coefficient

the confidence level expressed as a decimal value; for example 0.95 is the confidence coefficient for a 95% confidence level

training set

the data set used to build the candidate models

validation set

the data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable

Multicollinearity

the degree of correlation among independent variables in a regression model

residual

the difference between the observed value of the dependent variable and the value predicted using the estimated regression equation

interquartile range

the difference between the third and first quartiles

sampling error

the difference between the value of a sample statistic (such as the sample mean, sample standard deviation, or sample proportion) and the value of the corresponding population parameter (population mean, population standard deviation, or population proportion) that occurs because a random sample is used to estimate the population parameter

variety

the diversity in types and structures of data generated

inappropriate sample

the entities sampled do not represent the population that the researcher had in mind

type II error

the error of accepting H₀ when it is false

type I error

the error of rejecting H₀ when it is true

complement of A

the event consisting of all outcomes that are not in A

union of A and B

the event containing the outcomes belonging to A or B or both; the union of A and B is denoted by A ∪ B

intersection of A and B

the event containing the outcomes belonging to both A and B

data

the facts and figures collected, analyzed, and summarized for presentation and interpretation

alternative hypothesis

the hypothesis concluded to be true if the null hypothesis is rejected

null hypothesis

the hypothesis tentatively assumed to be true in the hypothesis testing procedure

bins

the nonoverlapping groupings of data used to create a frequency distribution

growth factor

the percentage increase of a value over a period of time is calculated using the formula (growth factor − 1)

sampled population

the population from which the sample is drawn

knot

the prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model; also called the breakpoint or the joint

conditional probability

the probability of an event given that another event has already occurred

joint probabilities

the probability of two events both occurring; in other words, the probability of the intersection of two events

p value

the probability that a random sample of the same size collected from the same population using the same procedure will yield stronger evidence against a hypothesis than the evidence in the sample data, given that the hypothesis is actually true

level of significance

the probability that the interval estimation procedure will generate an interval that does not contain the value of the parameter being estimated; also the probability of making a type I error when the null hypothesis is true as an equality

p value

the probability, assuming that H₀ is true, of obtaining a random sample of size n that results in a test statistic at least as extreme as the one observed in the current sample; for a lower-tail test, the p value is the probability of obtaining a value for the test statistic as small as or smaller than that provided by the sample; for an upper-tail test, the p value is the probability of obtaining a value for the test statistic as large as or larger than that provided by the sample; for a two-tailed test, the p value is the probability of obtaining a value for the test statistic at least as unlikely as that provided by the sample
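
The three cases can be sketched as follows, under the assumption that the test statistic follows a standard normal distribution when the null hypothesis is true (a z test). For a t test the same logic applies with the t distribution in place of the normal. The function name and the tail labels are illustrative.

```python
from statistics import NormalDist

def p_value(z, tail="two"):
    """p value for an observed z test statistic; tail is 'lower', 'upper', or 'two'."""
    std_normal = NormalDist()
    if tail == "lower":
        return std_normal.cdf(z)           # as small as or smaller than z
    if tail == "upper":
        return 1 - std_normal.cdf(z)       # as large as or larger than z
    return 2 * (1 - std_normal.cdf(abs(z)))  # at least as extreme in either tail

print(p_value(1.96), p_value(1.96, "upper"), p_value(-1.96, "lower"))
```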

hypothesis testing

the process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture

statistical inference

the process of making estimates and drawing conclusions about one or more characteristics of a population (the value of one or more parameters) through analysis of sample data drawn from the population

dimension reduction

the process of removing variables from the analysis without losing crucial information

interval estimation

the process of using sample data to calculate a range of values that is believed to include the unknown value of a population parameter

experimental region

the range of values for the independent variables for the data that are used to estimate the regression model

practical significance

the real-world impact the results of statistical inference will have on business decisions

veracity

the reliability of the data generated

experimental demand

the researcher influences a subject (consciously or not) to provide data the researcher is hoping for

point estimator

the sample statistic that provides the point estimate of the population parameter

population

the set of all elements of interest in a particular study

sample space

the set of all outcomes

velocity

the speed at which the data are generated

standard error

the standard deviation of a point estimator

missing completely at random (MCAR)

the tendency for an observation to be missing a value of some variable is entirely random

missing not at random (MNAR)

the tendency for an observation to be missing a value of some variable is related to the missing value

missing at random (MAR)

the tendency for an observation to be missing a value of some variable is related to the value of some other variable(s) in the data

finite population correction factor

the term that is used in the formulas for computing the (estimated) standard error of the sample mean and the sample proportion whenever a finite population, rather than an infinite population, is being sampled
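
As a sketch, the usual form of this factor is sqrt((N − n) / (N − 1)), which multiplies the standard error computed for an infinite population. The function names and numbers are illustrative.

```python
import math

def finite_population_correction(N, n):
    """Correction factor for a sample of size n drawn from a population of size N."""
    return math.sqrt((N - n) / (N - 1))

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean; applies the correction when N is given."""
    se = sigma / math.sqrt(n)
    return se if N is None else se * finite_population_correction(N, n)

print(standard_error_mean(10.0, 25))         # infinite population
print(standard_error_mean(10.0, 25, N=100))  # finite population of 100
```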

point estimate

the value of a point estimator used in a particular instance as an estimate of a population parameter

marginal probabilities

the values in the margins of a joint probability table that provide the probabilities of each event separately

independent variable

the variable(s) used for predicting or explaining values of the dependent variable; it is denoted by x and is often referred to as the predictor variable

dependent variable

the variable that is being predicted or explained; it is denoted by y and is often referred to as the response

self-selection

those motivated to respond are more likely to share views or to participate in the study itself; this is the basis of an 'unscientific' survey

independent events

two events A and B are independent if P(A | B) = P(A) or P(B | A) = P(B); the events do not influence each other
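
A minimal check of this condition, computing the conditional probability as P(A and B) / P(B). The function name, tolerance, and probabilities are hypothetical.

```python
def is_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """True when P(A | B) equals P(A), within a small numerical tolerance."""
    p_a_given_b = p_a_and_b / p_b
    return abs(p_a_given_b - p_a) < tol

print(is_independent(p_a=0.5, p_b=0.4, p_a_and_b=0.2))  # True: P(A | B) = 0.5
print(is_independent(p_a=0.5, p_b=0.4, p_a_and_b=0.3))  # False: P(A | B) = 0.75
```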

poor question practices/ poor survey practices

wording questions or response options in a way that influences the response, failing to randomize choices, using language too obscure for respondents to fully understand

