Ali Bagheri MSIS 3223 Exam 1
Metric
a unit of measurement that provides a way to objectively quantify performance.
Optimal solution
values of the decision variables at the minimum (or maximum) point
frequency distribution
we need only count the number of observations that appear in each category. This can be done using the Excel COUNT IF function is a table that shows the number of observations in each of several nonoverlapping groups. A graphical depiction of a frequency distribution in the form of a column chart is called a histogram
Absolute references
$A$2, $C5, D$10
Data set
- a collection of data -Examples: Marketing survey responses, a table of historical stock prices, and a collection of measurements of dimensions of a manufactured item
Database
- a collection of related files containing records on people, places, or things -A database file is usually organized in a two-dimensional table, where the columns correspond to each individual element of data (called fields, or attributes), and the rows represent records of related data elements.
Column and Bar Charts
-A clustered column chart compares values across categories using vertical rectangles; -a stacked column chart displays the contribution of each value to the total by stacking the rectangles; -a 100% stacked column chart compares the percentage that each value contributes to a total. •Column and bar charts are useful for comparing categorical or ordinal data, for illustrating differences between sets of values, and for showing proportions or percentages of a whole.
Commercial software
-IBM Cognos Express -SAS Analytics -Tableau
Tableau vs. Excel: What are the differences?
-Many people are uncertain about the difference between Tableau and spreadsheet tools like Excel. And that's a reasonable doubt until we point out they serve different purposes. Using Tableau doesn't necessarily mean you can forget about Excel and vice versa. -While Excel is not as powerful or intuitive as Tableau when it comes to data visualization, Tableau is not optimal when you would like to use it as a data creation tool.
Advantages of Using Tableau (cont.)
-Tableau can help enterprises futuristically to: ▪Analyze data without any specific goals in mind. ▪Explore visualizations and have a look at the same data from different angles. ▪Frame 'what if' queries and work with data by hypothetically visualizing it in a different manner and dynamically adding components for comparison and analysis..
Spreadsheets
-Widely used -Effective for manipulating data and developing and solving models -Support powerful commercial add-ons -Facilitate communication of results
•Common mathematical operators are used
-addition (+) -subtraction (-) -multiplication (*) -division (/) -exponentiation (^)
Assumptions are made to
-simplify a model and make it more tractable; that is, able to be easily analyzed or solved. -better characterize historical data or past observations. •The task of the modeler is to select or build an appropriate model that best represents the behavior of the real situation.
IF Function
=IF(condition, value if true, value if false)•Conditions may include the following: = equal > greater than < less than <> not equal to >= greater than or equal to <= less than or equal to
Lookup Functions for Database Queries
=VLOOKUP - looks up a value in the leftmost column of a table and returns a value in the same row from a column you specify =HLOOKUP - looks up a value in the top row of a table and returns a value in the same column from a row you specify. =INDEX -returns a value or reference of the cell at the intersection of a particular row and column in a given range. =MATCH - returns the relative position of an item in an array that matches a specified value in a specified order. =CHOOSE(index_num, value1, value2, . . . ) - returns a value from a list based on the position in the list, specified by index_num .
Bubble Charts
A bubble chart is a type of scatter chart in which the size of the data marker corresponds to the value of a third variable; consequently, it is a way to plot three variables in two dimensions
Relative references
A2, C5, D10
Pareto Analysis
An Italian economist, Vilfredo Pareto, observed in 1906 that a large proportion of the wealth in Italy was owned by a small proportion of the people. Similarly, businesses often find that a large proportion of sales come from a small proportion of customers, the "80-20 rule." A Pareto analysis involves sorting data and calculating cumulative proportions.
Form Controls
Buttons, boxes, and other mechanisms for inputting or changing data on spreadsheets easily that can be used to design user-friendly spreadsheets -Spin button—a button used to increase or decrease a numerical value -Scroll bar—a slider used to change a numerical value -Check box—a box used to select or deselect a scenario -Option button—a radio button used to select an option -List box—a box that provides a list of options -Combo box—a box that provides an expandable list of options -Group box—a box that can hold a group of controls
MS Excel
Excel is basically a spreadsheet for working with data in rows and columns. You need to first represent your data into a tabular format and then you can apply visualizations on top of it. When it comes to Excel, you need to have a prior knowledge of the insight that you want and then work with various formulas or functions in order to get there, along with that tabulation is also needed. In Excel, you need to have some programming in order to come up with real-time data visualization.
Advantages of Using Tableau
Fantastic Visualizations-You can now work with a lot of data that doesn't have any order to it and create a range of visualizations. Well, thanks to the in-built features of Tableau which help you create visualizations that stand out from the crowd. You also have the option of switching between different visualizations to bring about a greater context, ways of drilling down data, and exploring the data at a minute level.
Line Charts
Line charts provide a useful means for displaying data over time -You may plot multiple data series in line charts; however, they can be difficult to interpret if the magnitude of the data values differs greatly. In that case, it would be advisable to create separate charts for each data series.
Working with Disparate Data Sources
Tableau has a powerful reason to be included by various organizations in today's data -driven world where data can come from any point and any disparate sources. -Tableau has an edge over other Business Intelligence and Analytics tools as it lets you work by connecting to various data sources, data warehouses, and files which exist in the cloud, big data that exists in spreadsheets, and non-relational data, among other types of data. -Tableau effortlessly blends all different types of data to help organizations come up with compelling visualizations.
User-friendly Approach
This is the greatest strength of Tableau. It is built from the ground level for people who don't have any technical skills or coding experience. So, everything can be done with this tool by anybody without any prior set of skills. Since most of the features are in a drag-and-drop format, each visualization is so intuitive and self-depicting.
Nesting IF Functions
You may "nest" up to seven IF functions by replacing value if true or value if false in an IF function with another IF function: =IF(A8= 2,(IF(B3= 5, "YES", "")),15)
Decision Model
a logical or mathematical representation of a problem or business situation that can be used to understand, analyze, or facilitate making a decision
Population
all items of interest for a particular decision or investigation -all married drivers over 25 years old -all subscribers to Netflix
stock chart
allows you to plot stock prices, such as daily high, low, and close values
Excel table
allows you to use table references to perform basic calculations. Select the data range including headers. Next, click Table from the Tables group on the Insert tab and make sure that the box for My Table Has Headers is checked. •The table name (default: Table1),can be found (and changed) in the Properties group of the Table Tools Design tab in Windows or in the Table tab on a Mac.
Model
an abstraction or representation of a real system, idea, or object. -Captures the most important features -Can be a written or verbal description, a visual representation, a mathematical formula, or a spreadsheet.
SUMIF Sum range
an optional argument that allows you to add cells in a different range.
Continuous metrics
are based on a continuous scale of measurement. Any metrics involving dollars, length, time, volume, or weight, for example, are continuous
Sparklines
are graphics that summarize a row or column of data in a single cell
Data analytics
are providing benefits to businesses from the beginning of a startup, facilitating expansive growth, and exploding them into large, formidable companies.
Slicers
are tools for drilling down to "slice" a PivotTable and display a subset of data •To create a slicer for any of the columns in the database, click on the PivotTable and choose Insert Slicer from the Analyze tab in the PivotTable Tools ribbon in Windows, or from the PivotTable Analyze tab on a Mac.
Statistics
as defined by David Hand, past president of the Royal Statistical Society in the UK, is both the science of uncertainty and the technology of extracting information from data. Statistics involves collecting, organizing, analyzing, interpreting, and presenting data. A statistic is a summary measure of data.
SUM IF, AVERAGE IF, SUM IFS, and AVERAGE IFS
can be used to embed IF logic within mathematical functions.
Quartiles
break the data into four parts. -The 25th percentile is called the first quartile,Q1; -the 50th percentile is called the second quartile, Q2; -the 75th percentile is called the third quartile, Q3; and -the 100th percentile is the fourth quartile, Q4. •One-fourth of the data fall below the first quartile, one-half are below the second quartile, and three-fourths are below the third quartile. Excel function QUARTILE.INC(array, quart), where array specifies the range of the data and quart is a whole number between 1 and 4, designating the desired quartile
Ordinal data
can be ordered or ranked according to some relationship to one another
Tabular data
can be used to determine exactly how many units of a certain product were sold in a particular month, or to compare one month to another.
PivotTables
can be used to quickly create cross-tabulations and to drill down into a large set of data in numerous ways. allows you to create custom summaries and charts of key information in the data
standardized value
commonly called a z-score, provides a relative measure of the distance an observation is from the mean, which is independent of the units of measurement.
Ratio data
continuous and have a natural zero
Reliability
data are accurate and consistent
Validity
data correctly measures what it is supposed to measure.
Skewness
describes the lack of symmetry of data. Distributions that tail off to the right are called positively skewed; those that tail off to the left are said to be negatively skewed.
Merchandising
determining brands to buy, quantities, and allocations
Supply Chain Design
determining the best sourcing and transportation options and finding the best delivery routes
Staffing
ensuring appropriate staffing levels and capabilities, and hiring the right people
Descriptive models
explain behavior and allow users to evaluate potential decisions by asking "what-if?" questions.
Location
finding the best location for bank branches and ATMs, or where to service industrial equipment
Optimization
finding values of decision variables that minimize (or maximize) something such as cost (or profit)
Predictive models
focus on what will happen in the future. •Many predictive models are developed by analyzing historical data and assuming that the past is representative of the future.
Prescriptive models
help decision makers identify the best solution to a decision problem.
Prescriptive analytics
identify the best alternatives to minimize or maximize some objective
Customer segmentation
identifying and targeting key customer groups in retail, insurance, and credit card industries
range name
is a descriptive label assigned to a cell or range of cells. There are several ways to create range names in Excel -Name box -Create from Selection -Define Name
Correlation
is a measure of the linear relationship between two variables, X and Y, which does not depend on the units of measurement.
Statistical Thinking
is a philosophy of learning and action for improvement, based on principles that: all work occurs in a system of interconnected processes variation exists in all processes better performance results from understanding and reducing variation
Tableau
is a powerful and quickly growing data visualization tool used in the Business Intelligence Industry. It helps in simplifying raw data into a very easily understandable format. Data analysis is very fast with Tableau and the visualizations created are in the form of dashboards and worksheets.
Orbit Charts
is a scatter chart in which the points are connected in sequence, such as over time. Orbit charts show the "path" that the data take over time, often showing some unusual patterns that can provide unique insights. -Create a scatter chart with smooth lines and markers.
cross-tabulation
is a tabular method that displays the number of observations in a data set for different subcategories of two categorical variables. A cross-tabulation table is often called a contingency table The subcategories of the variables must be mutually exclusive and exhaustive, meaning that each observation can be classified into only one subcategory, and, taken together over all subcategories, they must constitute the complete data set.
relative frequency distribution
is a tabular summary of the relative frequencies of all categories
kth percentile
is a value at or below which at least k percent of the observations lie. The most common way to compute the kth percentile is to order the data values from smallest to largest and calculate the rank of the kth percentile using the formula: nk/100+.05 The Excel function PERCENTILE.INC(array,l) computes the kth percentile of data in the range specified in the array field, where k is in the range 0 to 1, inclusive (i.e., including 0 and 1).
Risk
is associated with the consequences of what actually happens.
Filtering
is finding a subset of records that meet certain characteristics. Excel provides two filtering tools: -AutoFilter for simple criteria -Advanced Filter for more complex criteria.
Business Analytics
is s the use of:•data, •information technology, •statistical analysis, •quantitative methods, and •mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions.
midrange
is the average of the greatest and least values in the data set Caution must be exercised when using the midrange because extreme values easily distort the result. This is because the midrange uses only two pieces of data, whereas the mean uses all the data; thus, it is usually a much rougher estimate than the mean and is often used for only small sample sizes.
Relative frequency
is the fraction, or proportion, of the total.
mode
is the observation that occurs most frequently. The mode is most useful for data sets that contain a relatively small number of unique values. You can easily identify the mode from a frequency distribution by identifying the value or group having the largest frequency or from a histogram by identifying the highest bar.
range
is the simplest and is the difference between the maximum value and the minimum value in the data set. •In Excel, compute as =MAX(data range) − MIN(data range). The range is affected by outliers, and is often used only for very small data sets.
Challenges
lack of understanding of how to use analytics, competing business priorities, insufficient analytical skills, difficulty in getting good data and sharing information, and not understanding the benefits versus perceived costs of analytics studies.
Net Present Value (or discounted cash flow)
measures the worth of a stream of cash flows, taking into account the time value of money.
Categorical variables
naturally define the groups in a frequency distribution.
Data
numbers or textual data that are collected through some type of measurement process
Measures
numerical values associated with a metric.
Discrete metric
one that is derived from counting something. -For example, a delivery is either on time or not; an order is complete or incomplete; or an invoice can have one, two, three, or any number of errors. Some discrete metrics would be the proportion of on-time deliveries; the number of incomplete orders each day, and the number of errors per invoice.
interquartile range (I Q R)
or the midspread is the difference between the first and third quartiles,
Interval data
ordinal but have constant differences between observations and have arbitrary zero points
Predictive analytics
predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time.
Icon sets
provide similar information using various symbols such as arrows or stoplight colors
coefficient of variation (C V)
provides a relative measure of dispersion in data relative to the mean: -Sometimes expressed as a percentage. -Provides a relative measure of risk to return.
Advanced Filter
provides a way of explicitly defining criteria by which to filter a database. •First, copy the headers from the database to an open location in the worksheet. Under the headers, specify the criteria that you want to use to filter the data. -Multiple criteria in the same row are logically joined by "and," while criteria in rows are joined by "or."
Benefits
reduced costs, better risk management, faster decisions, better productivity and enhanced bottom-line performance such as profitability and customer satisfaction.
Big data
refers to massive amounts of business data (volume) from a wide variety of sources (variety), much of which is available in real time (velocity), and much of which is uncertain or unpredictable (veracity).
Descriptive statistics
refers to methods of describing and summarizing data using tabular, visual, and quantitative techniques.
Dispersion
refers to the degree of variation in the data; that is, the numerical spread (or compactness) of the data. •Key measures: -Range -Interquartile range -Variance -Standard deviation
cumulative relative frequency
represents the proportion of the total number of observations that fall at or below the upper limit of each group.
Information
result of analyzing data; that is, extracting meaning from data to support evaluation and decision making
Uncertainty
s imperfect knowledge of what will happen in the future.
Health care
scheduling operating rooms to improve utilization, improving patient flow and waiting times, purchasing supplies, and predicting health risk factors
Pricing
setting prices for consumer and industrial goods, government contracts, and maintenance contracts
Radar charts
show multiple metrics on a spider web. This is a useful chart to compare survey data from one time period to another or to compare performance of different entities such as factories, companies, and so on using the same criteria
Categorical (nominal) data
sorted into categories according to specified characteristics
Database functions
start with a "D" (for example, DSUM, DAVERAGE, DCOUNT) and allow you to specify criteria that limit the calculations to a subset of records using the same format as the Advanced Filter. For example, the syntax for the DSUM function is DSUM(database, field, criteria). Database is the range that includes the column labels; field is the column name that contains the values to sum, enclosed in quotation marks, or a reference to the column name; and criteria is the range that specifies the records you want to sum.
Measurement
the act of obtaining data associated with a metric.
Objective function
the equation that minimizes (or maximizes) the quantity of interest
Data visualization
the process of displaying data (often in large quantities) in a meaningful fashion to provide insights that will support better decisions -Data visualization improves decision-making, provides managers with better analysis capabilities that reduce reliance on I T professionals, and improves collaboration and information sharing.
Descriptive analytics
the use of data to understand past and current business performance and make informed decisions
Error Values
•#DIV/0!—A formula is trying to divide by zero. •#N/A—"Not available," meaning that the formula could not return a result. •#NAME?—An invalid name is used in a formula. •#NUM!—An invalid argument is used in a function, such as a negative number in SQRT. •#REF!—A formula contains an invalid cell reference. •#VALUE!—Excel cannot compute a function because of an invalid argument.
Logical Functions
•=IF(condition, value if true, value if false) - returns one value if the condition is true and another if the condition is false, =AND(condition1, condition2,...) - returns TRUEif all conditions are true and FALSE if not, =OR(condition1, condition2,...) - returns TRUE ifany condition is true and FALSE if not.
Pie Charts
•A pie chart displays the relative proportion of each data source to the total by partitioning a circle into pie-shaped areas.
Area Charts
•An area chart combines the features of a pie chart with those of line charts. -Area charts present more information than pie or line charts alone but may clutter the observer's mind with too many details if too many data series are used; thus, they should be used with care.
Tableau
•Capabilities of Tableau include: -Transforming data into actionable insights. -Exploring with limitless visual analytics. -Building dashboards and performing ad hoc analyses in just a few clicks. -Sharing your work with anyone and making an impact on your business. Tableau is basically a data visualization tool which provides pictorial and graphical representations of data. In Tableau, you can gain insights that you never thought possible. You can play with interactive visualizations, deploy data drilling tools, and explore various data that is available, and you don't need to have any specific knowledge of the insight you are looking for. With Tableau, it is all about an easy and interactive approach.
Paste Special
•Copy the range of cells of interest and click on the cell where you want to paste the results. Examples: -To paste only the values in cells (not the formulas), select Values and then click OK. -To transpose data in columns to rows and vice versa, use Transpose.
Other Excel Data Visualization Tools
•Data bars •Color scales •Icon sets •Sparklines
IFERROR Function
•IFERROR(value, value_if_error) displays a specific value if an error is present. •For example, if you are computing A1/B1, then if B1 is zero or blank, then = IFERROR(A1/B1, " ") will display a blank cell instead of #DIV/0!;
Geographic Data
•Many applications of business analytics involve geographic data. Visualizing geographic data can highlight key data relationships, identify trends, and uncover business opportunities. In addition, it can often help to spot data errors and help end users understand solutions, thus increasing the likelihood of acceptance of decision models. •Companies like Nike use geographic data and information systems for visualizing where products are being distributed and how that relates to demographic and sales information. This information is vital to marketing strategies. •Excel 2016 for Windows includes a geographic visualization tool called 3D Maps. Another excellent option is Tableau, which is described in the Appendix to this chapter.
Percentage Views
•Percent of the grand total, Percent of row total, Percent of column total, and other options In the Value Field Settings dialog, click the tab Show Values As. Select the option in the dropdown box. The % of Row Total is shown
Scatter Charts
•Scatter charts show the relationship between two variables. To construct a scatter chart, we need observations that consist of pairs of variables.
Data Validation Tools
•Select the cell range for which data validation will be applied, and then choose Data Validation from the Data Tools group on the Data tab in Excel 2016, or select Validation from the Data menu on the a Mac.
ISNUMBER Function
•This function returns TRUE if the cell or range contains a number; otherwise, it returns FALSE.
Concatenation
•To concatenate means to join. In many applications, you might wish to take text data that are in different columns and join them together (for example, first and last names). The Excel function CONCATENATE(text1, text2, . . . , text30) can be used to join up to 30 text strings into a single string. -Suppose that cell A1 contains the last name Smith, and cell B1 contains the first name John. Then CONCATENATE(B1, " ", A1) will result in the text string John Smith.
Sample
•a subset of the population -a list of individuals who rented a comedy from Netflix in the past year The purpose of sampling is to obtain sufficient information to draw a valid inference about a population.
proportion
•denoted by p, is the fraction of data that have a certain characteristic. •Proportions are key descriptive statistics for categorical data, such as defects or errors in quality control applications or consumer preferences in market research.
Data bars
•display colored bars that are scaled to the magnitude of the data values (similar to a bar chart) but placed directly within the cells of a range. -Highlight the data in each column, click the Conditional Formatting button in the Styles group within the Home tab, select Data Bars, and choose the fill option and color.
dashboard
•is a visual representation of a set of key business measures. It is derived from the analogy of an automobile's control panel, which displays speed, gasoline level, temperature, and so on. -Dashboards provide important summaries of key business information to help manage a business process or function.
standard deviation
•is the square root of the variance. -Note that the dimension of the variance is the square of the dimension of the observations, whereas the dimension of the standard deviation is the same as the data. This makes the standard deviation more practical to use in applications.
Excel has three types of sparklines
•line, column, and win/loss. -Line sparklines are clearly useful for time-series data. -Column sparklines are more appropriate for categorical data. -Win-loss sparklines are useful for data that move up or down over time.
Kurtosis
•refers to the peakedness (i.e., high, narrow) or flatness (i.e., short, flat-topped) of a histogram. •The coefficient of kurtosis (C K) measures the degree of kurtosis of a population -C K < 3 indicates the data is somewhat flat with a wide degree of dispersion. -C K > 3 indicates the data is somewhat peaked with less dispersion.
median
•specifies the middle value when the data are arranged from least to greatest. -Half the data are below the median, and half the data are above it. -For an odd number of observations, the median is the middle of the sorted numbers. -For an even number of observations, the median is the mean of the two middle numbers.