MBA 601 DA Final Exam
Covariance
A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship.
Geometric Mean
A measure of location that is calculated by finding the nth root of the product of n values: (x1 * x2 * ... * xn)^(1/n). For two values a and b, the geometric mean x satisfies x^2 = a*b.
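As a rough illustration of the nth-root calculation, the sketch below uses Python's standard library. The growth factors are made-up values chosen for the example, not data from the course.

```python
import math
from statistics import geometric_mean

# Hypothetical growth factors for three periods (illustrative only)
values = [1.10, 0.95, 1.20]

# nth root of the product of the n values
n = len(values)
manual = math.prod(values) ** (1 / n)

# statistics.geometric_mean computes the same quantity
builtin = geometric_mean(values)

print(round(manual, 4), round(builtin, 4))
```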
Mode
A measure of location, defined as the value that occurs with greatest frequency
Coefficient of variation
A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100. (s/x̄)*100
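A small sketch of the (s/x̄)*100 calculation, assuming a made-up sample of five values and the sample standard deviation from Python's `statistics` module:

```python
from statistics import mean, stdev

data = [46, 54, 42, 46, 32]  # hypothetical sample

# Sample standard deviation divided by the sample mean, times 100
cv = stdev(data) / mean(data) * 100
print(f"{cv:.1f}%")  # 18.2%
```

Here the mean is 44 and the standard deviation is 8, so the standard deviation is about 18.2% of the mean.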
Pearson product moment correlation coefficient
A measure of the linear relationship between two variables
Variance
A measure of variability based on the squared deviations of the data values about the mean. Sample variance: s² = Σ(xi − x̄)² / (n − 1).
Standard Deviation
A measure of variability computed by taking the positive square root of the variance.
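The variance and standard deviation definitions can be checked by hand against Python's `statistics` module. This is only a sketch on a hypothetical sample; it uses the sample formulas, which divide by n − 1.

```python
from statistics import mean, stdev, variance

data = [46, 54, 42, 46, 32]  # hypothetical sample

# Squared deviations about the sample mean
x_bar = mean(data)
squared_devs = [(x - x_bar) ** 2 for x in data]

# Sample variance: sum of squared deviations divided by n - 1
s2_manual = sum(squared_devs) / (len(data) - 1)

# Library equivalents: sample variance and its positive square root
s2 = variance(data)
s = stdev(data)

print(s2_manual, s2, s)
```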
Interquartile Range (IQR)
A measure of variability, defined as the difference between the third and first quartiles. Q3-Q1
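A minimal sketch of Q3 − Q1 on a hypothetical sample. Textbooks and software use slightly different conventions for locating quartiles, so the cut points from Python's default method may not match a hand calculation done with another rule.

```python
from statistics import quantiles

data = [32, 42, 46, 46, 48, 50, 54]  # hypothetical sample

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1
print(q1, q3, iqr)
```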
Range
A measure of variability, defined to be the largest value minus the smallest value.
Subjective Method
A method of assigning probabilities on the basis of judgment
Classical Method
A method of assigning probabilities that is appropriate when all the experimental outcomes are equally likely.
Relative Frequency Method
A method of assigning probabilities that is appropriate when data are available to estimate the proportion of the time the experimental outcome will occur if the experiment is repeated a large number of times.
Bayes' Theorem
A method used to compute posterior probabilities
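To make the prior-to-posterior update concrete, here is a sketch with two hypothetical suppliers, A1 and A2, and an event B (a part is defective). All numbers are invented for the example.

```python
# Hypothetical inputs: two suppliers A1 and A2, event B = "part is defective"
prior = {"A1": 0.65, "A2": 0.35}        # prior probabilities P(Ai)
likelihood = {"A1": 0.02, "A2": 0.05}   # conditional probabilities P(B | Ai)

# Joint probabilities P(Ai and B) = P(Ai) * P(B | Ai); their sum is P(B)
joint = {a: prior[a] * likelihood[a] for a in prior}
p_b = sum(joint.values())

# Posterior probabilities P(Ai | B) = P(Ai and B) / P(B)
posterior = {a: joint[a] / p_b for a in prior}
print(posterior)
```

Under these assumed numbers, learning that a part is defective shifts probability toward A2, the supplier with the higher defect rate.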
Probability
A numerical measure of the likelihood that an event will occur
Population Parameters
A numerical value used as a summary measure for a population (e.g., the population mean, μ, the population variance, σ², and the population standard deviation, σ).
Sample Statistics
A numerical value used as a summary measure for a sample (e.g., the sample mean, x̄, the sample variance, s², and the sample standard deviation, s).
Multiplication Law
A probability law used to compute the probability of the intersection of two events. It is P(A ∩ B) = P(B) P(A | B) or P(A ∩ B) = P(A)P(B | A). For independent events it reduces to P(A ∩ B) = P(A)P(B).
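A quick sketch of P(A ∩ B) = P(A)P(B | A), using the standard example of drawing two aces in a row from a 52-card deck without replacement. Exact fractions avoid rounding.

```python
from fractions import Fraction

# A = first card drawn is an ace, B = second card drawn is an ace
p_a = Fraction(4, 52)
p_b_given_a = Fraction(3, 51)  # one ace already removed, no replacement

# Multiplication law: P(A and B) = P(A) * P(B | A)
p_a_and_b = p_a * p_b_given_a
print(p_a_and_b)  # 1/221
```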
pth percentile
Approximately p% of the observations are less than the pth percentile and approximately (100 − p)% of the observations are greater than the pth percentile.
Statistical Inference
The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population.
Ratio Scale
The scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric.
Interval Scale
The scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric.
Ordinal Scale
The scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data may be nonnumeric or numeric.
Nominal Scale
The scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. Nominal data may be nonnumeric or numeric.
Population
The set of all elements of interest in a particular study
Sample Space
The set of all experimental outcomes
Observation
The set of measurements obtained for a particular element.
Class Midpoint
The value halfway between the lower and upper class limits
Marginal Probabilities
The values in the margins of the joint probability table, which provide the probability of each event separately.
Independent Events
Two events A and B where P(A | B) = P(A) or P(B | A) = P(B); that is, the events have no influence on each other.
Descriptive Analytics
Analytical techniques that describe what has happened in the past
Stacked Bar Chart
A bar chart in which each bar is broken into rectangular segments of a different color showing the relative frequency of each class in a manner similar to a pie chart.
Variable
A characteristic of interest for the elements
Event
A collection of sample points
Bar Chart
A graphical device for depicting categorical data that have been summarized in a frequency, relative frequency, or percent frequency distribution.
Pie Chart
A graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class
Dot Plot
A graphical device that summarizes data by the number of dots above each data value on the horizontal axis.
Side-by-Side Bar Chart
A graphical display for depicting multiple bar charts on the same display
Histogram
A graphical display of a frequency distribution, relative frequency distribution, or percent frequency distribution of quantitative data constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis.
Scatter Diagram
A graphical display of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
Stem-and-leaf display
A graphical display used to show simultaneously the rank order and shape of a distribution of data.
Venn Diagram
A graphical representation for showing symbolically the sample space and operations involving events in which the sample space is represented by a rectangle and events are represented as circles within the sample space.
Boxplot
A graphical summary of the data based on a five-number summary
Trendline
A line that provides an approximation of the relationship between two variables.
Mean
A measure of central location computed by summing the data values and dividing by the number of observations. Example: (2+4+6+8+10)/5 = 6
Median
A measure of central location provided by the value in the middle when the data are arranged in ascending order
Correlation Coefficient
A measure of linear association between two variables that takes on values between −1 and +1. Values near +1 indicate a strong positive linear relationship; values near −1 indicate a strong negative linear relationship; and values near zero indicate the lack of a linear relationship.
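The sketch below computes a sample covariance and the resulting correlation coefficient by hand for a small made-up data set. It assumes the usual sample formulas (division by n − 1, and r = s_xy / (s_x * s_y)), which are not spelled out in these cards.

```python
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Sample covariance: sum of cross-products of deviations over n - 1
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation coefficient: covariance scaled to lie between -1 and +1
r = s_xy / (s_x * s_y)
print(s_xy, round(r, 4))  # 1.5 0.7746
```

The covariance is positive, and r of about 0.77 indicates a fairly strong positive linear relationship.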
Addition Law
A probability law used to compute the probability of the union of two events. It is P(A ∪ B) = P(A) + P(B) − P(A ∩ B). For mutually exclusive events, P(A ∩ B) = 0; in this case the addition law reduces to P(A ∪ B) = P(A) + P(B).
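The addition law, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), can be checked by enumeration. This sketch assumes one roll of a fair die, with events chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # even number
B = {5, 6}                         # greater than 4

def prob(event):
    """Classical method: favorable outcomes over equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

lhs = prob(A | B)                      # P(A union B) counted directly
rhs = prob(A) + prob(B) - prob(A & B)  # addition law
print(lhs, rhs)  # 2/3 2/3
```

Subtracting P(A ∩ B) prevents the outcome 6, which belongs to both events, from being counted twice.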
Experiment
A process that generates well-defined outcomes
Empirical Rule
A rule that can be used to compute the percentage of data values that must be within one, two, and three standard deviations of the mean for data that exhibit a bell-shaped distribution.
Point Estimator
A sample statistic, such as x̄, s^2, and s, used to estimate the corresponding population parameter.
Big Data
A set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. Characterized by great volume (a large amount of data), high velocity (fast collection and processing), or wide variety (could include nontraditional data such as video, audio, and text).
Data Dashboard
A set of visual displays that organizes and presents information that is used to monitor the performance of a company or organization in a manner that is easy to read, understand, and interpret.
Sample
A subset of the population
Sample Survey
A survey to collect data on a sample
Census
A survey to collect data on the entire population
Crosstabulation
A tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns.
Relative Frequency Distribution
A tabular summary of data showing the fraction or proportion of observations in each of several non-overlapping categories or classes.
Frequency Distribution
A tabular summary of data showing the number (frequency) of observations in each of several non-overlapping categories or classes.
Percent Frequency Distribution
A tabular summary of data showing the percentage of observations in each of several non-overlapping classes.
Cumulative Relative Frequency Distribution
A tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class.
Cumulative Frequency Distribution
A tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class.
Cumulative Percent Frequency Distribution
A tabular summary of quantitative data showing the percentage of data values that are less than or equal to the upper class limit of each class.
Five-Number Summary
A technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value. Displayed graphically as a boxplot.
Data Visualization
A term used to describe the use of graphical displays to summarize and present information about a data set.
Chebyshev's Theorem
A theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean. At least 1 − 1/k^2 of the data values lie within k standard deviations of the mean, for any k > 1.
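A short sketch of the 1 − 1/k^2 bound for a few values of k. The function name is a hypothetical helper chosen for this example.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(k, chebyshev_lower_bound(k))
```

For k = 2 the bound is 0.75, so at least 75% of the values lie within two standard deviations of the mean, whatever the shape of the distribution.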
z-score
A value computed by dividing the deviation about the mean by the standard deviation s. A z-score is referred to as a standardized value and denotes the number of standard deviations xi is from the mean. (xi − x̄)/s
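As an illustration, the sketch below standardizes each value in a hypothetical sample using the sample mean and sample standard deviation.

```python
from statistics import mean, stdev

data = [46, 54, 42, 46, 32]  # hypothetical sample

x_bar, s = mean(data), stdev(data)

# z-score: deviation about the mean divided by the standard deviation
z_scores = [(x - x_bar) / s for x in data]
print(z_scores)
```

With a mean of 44 and a standard deviation of 8, the value 54 has a z-score of 1.25, meaning it lies 1.25 standard deviations above the mean.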
Percentile
A value that provides information about how the data are spread over the interval from the smallest to the largest value.
Categorical Variable
A variable with categorical data
Data Set
All the data collected in a particular study
Sample Point
An element of the sample space. A sample point represents an experimental outcome.
Multiple-Step Experiments
An experiment that can be described as a sequence of steps. If a multiple-step experiment has k steps with n1 possible outcomes on the first step, n2 possible outcomes on the second step, and so on, the total number of experimental outcomes is given by (n1)(n2)...(nk).
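The counting rule (n1)(n2)...(nk) can be confirmed by listing outcomes. This sketch assumes a two-step experiment: toss a coin, then roll a die.

```python
from itertools import product

coin = ["H", "T"]         # step 1: n1 = 2 outcomes
die = [1, 2, 3, 4, 5, 6]  # step 2: n2 = 6 outcomes

# Every experimental outcome is one (coin, die) pair
outcomes = list(product(coin, die))
print(len(outcomes))  # 12
```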
Outliers
An unusually small or unusually large data value
Simpson's Paradox
Conclusions drawn from two or more separate crosstabulations that can be reversed when the data are aggregated into a single crosstabulation.
Cross-Sectional Data
Data collected at the same or approximately the same point in time
Time Series Data
Data collected over several time periods
Skewness
A measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness.
Mutually Exclusive Events
Events that have no sample points in common; that is, A ∩ B is empty and P(A ∩ B) = 0.
Joint Probability
The probability of two events both occurring; that is, the probability of the intersection of two events.
Prior Probability
Initial estimates of the probabilities of events
Categorical Data
Labels or names used to identify categories of like items
Quantitative Data
Numerical values that indicate how much or how many
Quantitative Data
Obtained using either the interval or ratio scale of measurement
Categorical Data
Obtained using either the nominal or ordinal scale of measurement
Posterior Probabilities
Revised probabilities of events based on additional information
Descriptive Statistics
Tabular, graphical, and numerical summaries of data
Quartiles
The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts, with each part containing approximately 25% of the data.
Statistics
The art and science of collecting, analyzing, presenting, and interpreting data.
Elements
The entities on which data are collected
Complement of A
The event consisting of all sample points that are not in A.
Union of A and B
The event containing all sample points belonging to A or B or both. The union is denoted A ∪ B.
Intersection of A and B
The event containing the sample points belonging to both A and B. Denoted as A ∩ B.
Data
The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Weighted Mean
The mean obtained by assigning each observation a weight that reflects its importance.
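A minimal sketch of a weighted mean, assuming the usual formula Σ(wi * xi) / Σwi and made-up purchase data in which the pounds bought serve as the weights.

```python
# Hypothetical purchases: cost per pound and pounds bought at each price
costs = [3.00, 3.40]
pounds = [1200, 500]

# Each cost is weighted by the number of pounds bought at that cost
weighted_mean = sum(w * x for w, x in zip(pounds, costs)) / sum(pounds)
print(round(weighted_mean, 4))
```

The result sits closer to 3.00 than the simple average of 3.20 because more pounds were bought at the lower price.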
Permutations
The number of ways n objects may be selected from among N objects when the order in which the n objects are selected is important. The total number of permutations of N objects taken n at a time is P(N, n) = N!/(N − n)! for n = 0, 1, 2, ..., N. This equals n! times the number of combinations of N objects taken n at a time.
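A quick check of the permutation count with Python's `math` module, assuming N = 5 objects taken n = 2 at a time:

```python
import math

N, n = 5, 2

# Ordered selections: N! / (N - n)!
perms = math.perm(N, n)
print(perms)  # 20

# Same count as n! times the number of unordered selections
print(math.factorial(n) * math.comb(N, n))  # 20
```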
Combinations
The number of ways n objects may be selected from among N objects without regard to the order in which the n objects are selected. The total number of combinations of N objects taken n at a time is C(N, n) = N!/(n!(N − n)!) for n = 0, 1, 2, ..., N.
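A sketch that counts combinations two ways, by formula and by listing them, assuming five objects labeled A through E taken two at a time:

```python
import math
from itertools import combinations

N, n = 5, 2

# Unordered selections: N! / (n! (N - n)!)
count = math.comb(N, n)

# List every unordered pair from five labeled objects
listed = list(combinations("ABCDE", n))
print(count, len(listed))  # 10 10
```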
Conditional Probabilities
The probability of an event given that another event has already occurred. The conditional probability of A given B is P(A | B) = P(A ∩ B)/P(B), provided P(B) > 0.
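A small numeric sketch of dividing a joint probability by a marginal probability. The two input values are hypothetical, as if read from a joint probability table.

```python
# Hypothetical values from a joint probability table
p_a_and_b = 0.24  # joint probability P(A and B)
p_b = 0.80        # marginal probability P(B)

# Conditional probability of A given B
p_a_given_b = p_a_and_b / p_b
print(round(p_a_given_b, 2))  # 0.3
```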
Basic Requirements for Assigning Probabilities
Two requirements that restrict the manner in which probability assignments can be made: (1) for each experimental outcome Ei, we must have 0 <= P(Ei) <= 1; (2) considering all experimental outcomes, we must have P(E1) + P(E2) + ... + P(En) = 1.
Tree Diagram
A graphical representation that helps in visualizing a multiple-step experiment