Business Analytics Chapter 2
Geometric Mean
nth root of the product of n values -Used in analyzing growth rates in financial data.
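A minimal sketch of the geometric mean applied to growth rates. The growth factors are hypothetical illustration values, and the function name `geometric_mean` is an assumption, not taken from the chapter.

```python
import math

def geometric_mean(values):
    """Return the nth root of the product of n positive values."""
    n = len(values)
    # Averaging the logs is equivalent to taking the nth root of the product
    # and avoids overflow when the series is long.
    return math.exp(sum(math.log(v) for v in values) / n)

# Hypothetical growth factors: +10%, -5%, +20% in three consecutive years.
growth_factors = [1.10, 0.95, 1.20]
mean_factor = geometric_mean(growth_factors)
average_growth_rate = mean_factor - 1  # average growth rate per period
```

Compounding `mean_factor` over three periods reproduces the total growth of the original series, which is why the geometric mean (rather than the arithmetic mean) suits growth rates.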
Quantitative data
Data on which numeric and arithmetic operations, such as addition, subtraction, multiplication, and division, can be performed
Standard Deviation
Positive square root of the variance -Measured in the same units as the original data
Observation
Set of values corresponding to a set of variables
Variation
The difference in a variable measured over observations
Data
The facts and figures collected, analyzed, and summarized for presentation and interpretation.
Coefficient of Variation
(Standard Deviation / Mean) * 100 -Measures the standard deviation relative to the mean -Expressed as a percentage.
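A short sketch of the coefficient of variation, assuming the sample standard deviation is intended (the chapter does not say whether sample or population). The data values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation relative to the mean, expressed as a percentage."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (assumption)
    return std_dev / mean * 100

sample = [46, 54, 42, 46, 32]  # hypothetical data: mean 44, sample std dev 8
cv = coefficient_of_variation(sample)
```

Here the standard deviation is about 18.2% of the mean.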
Population
All elements of interest
Experimental
Experimental Study: A variable of interest is first identified -Then one or more other variables are identified and controlled or manipulated so that data can be obtained about how they influence the variable of interest.
Outlier:
Extreme value in a data set -Can be identified using standardized values (z-scores) -Any data value with a z-score less than -3 or greater than +3 is an outlier.
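The z-score rule for outliers can be sketched as follows. The data set and the function name `find_outliers` are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (assumption)
    return [x for x in values if abs((x - mean) / std_dev) > threshold]

# Hypothetical data: twenty typical values and one extreme value.
data = [10] * 10 + [12] * 10 + [100]
outliers = find_outliers(data)
```

With this data only the value 100 lies more than 3 standard deviations from the mean.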
No Linear relationship
Near 0
Sample
Subset of the population
Percent frequency Distribution
Summarizes the percent frequency of the data for each bin -Used to provide estimates of the relative likelihoods of different values of a random variable.
Scatter Charts:
Useful graph for analyzing the relationship between two variables.
Categorical data
Data on which arithmetic operations cannot be performed
Three steps necessary to define the classes for a frequency distribution with quantitative data:
-Determine the number of nonoverlapping bins -Determine the width of each bin -Determine the bin limits
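The three steps above can be sketched as follows, under the assumption of integer-valued data and bins that start at the smallest observation. In practice the bin limits are often rounded to more convenient values. The function name `define_bins` and the data are hypothetical.

```python
import math

def define_bins(values, number_of_bins):
    """Return (lower, upper) limits for equal-width, nonoverlapping integer bins."""
    lowest, highest = min(values), max(values)
    # Step 1 is the caller's choice of number_of_bins.
    # Step 2: width large enough that the bins cover every integer from lowest to highest.
    width = math.ceil((highest - lowest + 1) / number_of_bins)
    # Step 3: bin limits, each bin covering `width` consecutive integers.
    bins = []
    lower = lowest
    for _ in range(number_of_bins):
        bins.append((lower, lower + width - 1))
        lower += width
    return bins

data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27, 22, 23, 33]  # hypothetical
bins = define_bins(data, number_of_bins=5)
```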
Variance
-Measure of variability that utilizes all the data -It is based on the deviation about the mean, which is the difference between the value of each observation (xi) and the mean
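A minimal sketch of the variance as built from deviations about the mean. It assumes the sample convention of dividing by n - 1, which the definition above does not state. The data are hypothetical.

```python
import math

def sample_variance(values):
    """Average squared deviation about the mean, using the n - 1 divisor (assumption)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]  # (xi - mean)^2
    return sum(squared_deviations) / (n - 1)

sample = [46, 54, 42, 46, 32]  # hypothetical data with mean 44
variance = sample_variance(sample)
std_dev = math.sqrt(variance)  # standard deviation: positive square root of the variance
```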
Negative Linear
<0
Positive Linear
>0
Variable
A characteristic or quantity of interest that can take on different values
Histogram
A common graphical presentation of quantitative data -Constructed by placing the variable of interest on the horizontal axis and the selected frequency measure (absolute frequency, relative frequency, or percent frequency) on the vertical axis -The frequency measure of each class is shown by drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the corresponding frequency measure -Provides information about the shape, or form, of a distribution
Random Variable/uncertain variable
A quantity whose values are not known with certainty
Random Sampling
A sampling method to gather a representative sample of the population data.
Frequency distribution
A summary of data that shows the number (frequency) of observations in each of several non-overlapping classes, typically referred to as bins when dealing with distributions.
Cumulative Frequency Distribution
A variation of the frequency distribution that provides another tabular summary of quantitative data -Uses the number of classes, class widths, and class limits developed for the frequency distribution -Shows the number of data items with values less than or equal to the upper class limit of each class
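The frequency, relative frequency, percent frequency, and cumulative frequency distributions can all be built from the same bins. The sketch below assumes inclusive bin limits. The data, bins, and function name `frequency_table` are hypothetical.

```python
def frequency_table(values, bins):
    """Summarize values by bin: frequency, relative, percent, and cumulative frequency."""
    n = len(values)
    rows = []
    cumulative = 0
    for lower, upper in bins:
        count = sum(1 for x in values if lower <= x <= upper)
        cumulative += count  # items with values <= this bin's upper limit
        rows.append({
            "bin": (lower, upper),
            "frequency": count,
            "relative": count / n,
            "percent": 100 * count / n,
            "cumulative": cumulative,
        })
    return rows

data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27]  # hypothetical
bins = [(10, 14), (15, 19), (20, 24), (25, 29)]
table = frequency_table(data, bins)
```

The last cumulative frequency always equals the number of observations, provided the bins cover all the data.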
Mean/Arithmetic Mean
Average value for a variable -Denoted by x̄ (x with a line above it) -Computed as the sum of the values divided by n, where n = sample size
Cross-Sectional data
Data collected from several entities at the same, or approximately the same, point in time.
Time series data
Data collected over several time periods -Graphs of time series data are frequently found in business and economic publications -Graphs help analysts understand what happened in the past, identify trends over time, and project future levels for the time series.
Multimodal Data
Data contain at least two modes
Bimodal Data
Data contain exactly two modes
Covariance:
Descriptive measure of the linear association between two variables
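A minimal sketch of the sample covariance, assuming the n - 1 divisor. The paired data are hypothetical.

```python
def sample_covariance(x, y):
    """Sum of products of paired deviations about the means, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

x = [1, 2, 3, 4, 5]    # hypothetical
y = [2, 4, 6, 8, 10]   # hypothetical, increases with x
covariance = sample_covariance(x, y)
```

A positive result indicates a positive linear association and a negative result a negative one. The magnitude depends on the units of measurement, so it is hard to interpret on its own.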
Q1
First Quartile: 25th Percentile
Empirical Rule
For data having a bell-shaped distribution: -Within 1 standard deviation of the mean: approximately 68% of the data values -Within 2 standard deviations: approximately 95% of the data values -Within 3 standard deviations: almost all the data values
Range
Found by subtracting the smallest value from the largest value in a data set -Drawback: the range is based on only two of the observations and thus is highly influenced by extreme values.
Box Plots
Graphical Summary of the Distribution of Data -Developed from the quartiles for a data set.
Relative Frequency Distribution
It is a tabular summary of data showing the relative frequency for each bin.
Skewness
Lack of Symmetry
Nonexperimental study or observational study
Makes no attempt to control the variables of interest -A survey is perhaps the most common type of observational study.
Z-Score
Measures the Relative Location of a value in the data set -Helps to determine how far a particular value is from the mean relative to the data set's standard deviation.
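A short sketch of the z-score calculation, z = (value - mean) / standard deviation, assuming the sample standard deviation. The data are hypothetical.

```python
import statistics

def z_scores(values):
    """Return each value's distance from the mean in units of standard deviation."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(x - mean) / std_dev for x in values]

sample = [46, 54, 42, 46, 32]  # hypothetical data: mean 44, sample std dev 8
scores = z_scores(sample)
```

A z-score of 1.25 for the value 54 means it lies 1.25 standard deviations above the mean. Negative scores lie below the mean.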
Correlation coefficient:
Measures the strength of the linear relationship between two variables -Takes values between -1 and +1.
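A minimal sketch of the correlation coefficient as the sample covariance divided by the product of the two sample standard deviations. The helper name `correlation` and the data are hypothetical.

```python
import statistics

def correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(x)
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    return covariance / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4, 5]        # hypothetical
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises
r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
```

Unlike the covariance, the result does not depend on the units of measurement: values near +1 or -1 indicate a strong linear relationship and values near 0 indicate none.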
Q3
Third quartile: 75th percentile
Median
Value in the middle when the data are arranged in ascending order -Middle value, for an odd number of observations -Average of the two middle values, for an even number of observations.
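The odd and even cases of the median can be sketched as follows, using hypothetical data.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd number of observations
    return (ordered[middle - 1] + ordered[middle]) / 2  # even number of observations

odd_case = median([46, 54, 42, 46, 32])  # sorted: 32, 42, 46, 46, 54
even_case = median([7, 1, 5, 3])         # sorted: 1, 3, 5, 7
```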
Percentiles
Value of a variable at which a specified (approximate) percentage of observations fall below that value
Mode
Value that occurs most frequently in a data set
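A short sketch for finding the mode or modes, using `statistics.multimode` from the Python standard library (Python 3.8 or later). The data are hypothetical.

```python
import statistics

data = [1, 2, 2, 3, 3, 4]  # hypothetical: 2 and 3 each occur twice
modes = statistics.multimode(data)  # all values tied for the highest frequency
is_bimodal = len(modes) == 2        # exactly two modes
```

Note that when every value occurs exactly once, `multimode` returns all of the values, so a count check like the one above is only meaningful for data with repeated values.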
Quartiles
When the data are divided into four equal parts: -Each part contains approximately 25% of the observations -Division points are referred to as quartiles
Q2
Second quartile: 50th percentile (median)
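A minimal sketch of the three quartiles using `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its default interpolation method is one of several conventions, so results may differ slightly from a textbook's or a spreadsheet's. The data are hypothetical.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical, already in ascending order
# n=4 splits the data into four parts and returns the three division points.
q1, q2, q3 = statistics.quantiles(data, n=4)
interquartile_range = q3 - q1  # spread of the middle 50% of the data
```

Q2 coincides with the median of the data, as expected for the 50th percentile.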