MSIS 3223 Exam 1
variance
"average" of the squared deviations from the mean
Problem Solving with Analytics
1.Recognizing a problem 2.Defining the problem 3.Structuring the problem 4.Analyzing the problem 5.Interpreting results and making a decision 6.Implementing the solution
Basic Excel Functions
=MIN(range) =MAX(range) =SUM(range) =AVERAGE(range) =COUNT(range) - (must contain numbers) =COUNTIF(range,criteria)
histogram
A graphical depiction of a frequency distribution in the form of a column chart
Common Mathematical Operators
Addition + Subtraction - Multiplication * Division / Exponential - ^
How to copy formulas
Copy/Paste button in home tab Ctrl C/V Drag bottom right corner of cell (fill handle) across a row or column
Combination Charts
Excel 2016 for Windows provides a Combo Chartoption for constructing such a combination chart; in Excel 2016 for Mac, it must be done manually. We can also plot a second data series on a secondary axis; this is particularly useful when the scales differ greatly.
Column and Bar Charts
Excel distinguishes between vertical and horizontal bar charts, calling the former column charts and the latter bar charts. Column and bar charts are useful for comparing categorical or ordinal data, for illustrating differences between sets of values, and for showing proportions or percentages of a whole.
Locating a particular function in Excel
Formulas - function library
How to create PivotTable
Insert > Tables > PivotTable
Line Charts
Line charts provide a useful means for displaying data over time.
Key measures in dispersion:
Range Interquartile range Variance Standard deviation
Identifying Outliers
There is no standard definition of what constitutes an outlier.
Descriptive Analytics
Use of data to understand past/current business performance and make informed decisions. Provides insight into the past. "What has happened?"
Absolute Addressing
Using a $ before either the row, column, or both. This makes it to where it does not change when copied or filed. Use this to keep a row or column constant. Example: $A$2, $C5, D$10
Relative Addressing
Using just the row/column label in cell references Example: A2, C5, D10
Data set
a collection of data
Database
a collection of related files containing records on people, places, or things
Decision Models
a logical or mathematical representation of a problem or business situation that can be used to understand, analyze, or facilitate making a decision. Inputs: -Data-assumed to be constant -Uncontrollable inputs - quantities that can change but cannot be controlled- -Decision options -controllable and selected at the discretion of the decision maker
Sample
a subset of the population
Metric
a unit of measurement that provides a way to objectively quantify performance
Cross-Tabulations
abular method that displays the number of observations in a data set for different subcategories of two categorical variables.
Population
all items of interest for a particular decision or investigation
PivotTables
allows you to create custom summaries and charts of key information in the data
Stock Charts
allows you to plot stock prices, such as daily high, low, and close values.
Model
an abstraction or representation of a real system, idea, or object. Captures the most important features. Can be a written or verbal description, a visual representation, a mathematical formula, or a spreadsheet.
Predictive Models
analyzes historical data assuming that the past is representative of the future. What will happen in the future
Continuous metrics
are based on a continuous scale of measurement.
Quartiles
break the data into four parts. The 25th percentile is called the first quartile The 50th percentile is called the second quartile The 75th percentile is called the third quartile The 100th percentile is the fourth quartile, Q4.
Ordinal data
can be ordered or ranked according to some relationship to one another.
Area Charts
combines the features of a pie chart with those of line charts.
Ratio data
continuous and have a natural zero.
Reliability
data are accurate and consistent.
Validity
data correctly measures what it is supposed to measure.
proportion
denoted by p, is the fraction of data that have a certain characteristic
Skewness
describes the lack of symmetry of data
Pie Charts
displays the relative proportion of each data source to the total by partitioning a circle into pie-shaped areas.
The subcategories of the variables must be mutually exclusive and exhaustive, meaning that..
each observation can be classified into only one subcategory, and, taken together over all subcategories, they must constitute the complete data set.
Descriptive Models
explain behavior and allow users to evaluate potential decisions by asking "what-if?" questions.
Optimization
finding values of decision variables that minimize (or maximize) something such as cost (or profit)
Using a $ sign before a column label (for example, $B4) keeps the reference to column B..
fixed but allows the row reference to change.
Using a $ sign before both the row and column labels (for example, $B$4) keeps the reference to cell B4...
fixed no matter where the formula is copied.
Using a $ sign before a row label (for example, B$4) keeps the reference...
fixed to row 4 but allows the column reference to change if the formula is copied to another cell.
Prescriptive Models
help decision makers identify the best solution to a decision problem. Optimization
Radar Charts
how multiple metrics on a spider web. This is a useful chart to compare survey data from one time period to another or to compare performance of different entities such as factories, companies, and so on using the same criteria.
Prescriptive analytics
identify the best alternatives to minimize or maximize some objective. "What should we do?"
You can easily identify the mode from a frequency distribution by
identifying the value or group having the largest frequency or from a histogram by identifying the highest bar.
The purpose of sampling is to obtain sufficient information to draw a valid
inference about a population.
range
is the simplest and is the difference between the maximum value and the minimum value in the data set.
Proportions are ________________________, such as defects or errors in quality control applications or consumer preferences in market research.
key descriptive statistics for categorical data
=VLOOKUP(lookup_value,table_array,col_index_num,[range_lookup])
looks up a value in the leftmost column of a table and returns a value in the same row from a column you specify
=HLOOKUP(lookup_value,table_array,row_index_num,[range lookup])
looks up a value in the top row of a table and returns a value in the same column from a row you specify.
Outliers can affect the value of the ______
mean
Net Present Value
measures the worth of a stream of cash flows, taking into account the time value of money =NPV(rate, value1, value2,...)
The ______ is most useful for data sets that contain a relatively small number of unique values.
mode
Data
numbers or textual data that are collected through some type of measurement process
Measures
numerical values associated with a metric.
mode
observation that occurs most frequently.
Discrete metric
one that is derived from counting something.
Interval data
ordinal but have constant differences between observations and have arbitrary zero points.
Median is not affected by
outliers
The range is affected by _____ and is often used only for very small data sets.
outliers
N represents the number of items in a __________ and n represents the number of observations in a __________.
population,sample
Predictive Analytics
predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time. "What could happen?"
coefficient of variation (CV)
provides a relative measure of dispersion in data relative to the mean
The median is meaningful for
ratio, interval, and ordinal data.
Dispersion
refers to the degree of variation in the data; that is, the numerical spread (or compactness) of the data.
Kurtosis
refers to the peakedness (i.e., high, narrow) or flatness (i.e., short, flat-topped) of a histogram.
Information
result of analyzing data; that is, extracting meaning from data to support evaluation and decision making
Logical Functions =AND(condition 1, condition 2...)
returns TRUE if all conditions are true and FALSE if not
Logical Functions =OR(condition 1, condition 2...)
returns TRUE if any condition is true and FALSE if not.
=CHOOSE(index_num, value1, value2, . . . )
returns a value from a list based on the position in the list, specified by index_num
=INDEX(array,row_num,col_num)
returns a value or reference of the cell at the intersection of a particular row and column in a given range.
Logical Functions =IF(condition, value if true, value if false)
returns one value if the condition is true and another if the condition is false
=MATCH(lookup_value,lookup_array,match_type)
returns the relative position of an item in an array that matches a specified value in a specified order.
Orbit Charts
scatter chart in which the points are connected in sequence, such as over time. Orbit charts show the "path" that the data take over time, often showing some unusual patterns that can provide unique insights.
Scatter Charts
show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of variables.
Categorical (nominal) data
sorted into categories according to specified characteristics.
median
specifies the middle value when the data are arranged from least to greatest.
standard deviation
square root of the variance.
frequency distribution
table that shows the number of observations in each of several nonoverlapping groups.
Measurement
the act of obtaining data associated with a metric.
midrange
the average of the greatest and least values in the data set
Objective function
the equation that minimizes (or maximizes) the quantity of interest
For an even number of observations
the median is the mean of the two middle numbers.
For an odd number of observations
the median is the middle of the sorted numbers.
Data visualization
the process of displaying data (often in large quantities) in a meaningful fashion to provide insights that will support better decisions.
Bubble Charts
type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions.
It is common practice in statistics to use Greek letters, such as
u(mu). o(sigma), pi(pi) to represent population measures and italic letters such as by x-bar) s, and p to represent sample statistics.
kth percentile
value at or below which at least kpercent of the observations lie. The most common way to compute the kth percentile is to order the data values from smallest to largest and calculate the rank of the kth percentile using the formula: nk/100 + 0.5
Optimal solution
values of the decision variables at the minimum (or maximum) point
Dashboards
visual representation of a set of key business measures. It is derived from the analogy of an automobile's control panel, which displays speed, gasoline level, temperature, and so on.
To construct a frequency distribution...
we need only count the number of observations that appear in each category. This can be done using the COUNTIFfunction
We typically label the elements of a data set using subscripted variables,
x1, x2, and so on. xi represents the ith observation.