Business Analytics 2 ch1-3
Compute the coefficient of variation for the following sample data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37
21.36%
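A minimal sketch of the calculation behind this card, assuming the sample standard deviation (n - 1 denominator) because the prompt says "sample data". Carried at full precision the result is about 21.37%; the 21.36% on the card is what you get if the standard deviation is first rounded to 6.75 before dividing by the mean.

```python
import statistics

data = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]

mean = statistics.mean(data)            # 31.6
sample_sd = statistics.stdev(data)      # sample standard deviation (n - 1 denominator)
cv_percent = sample_sd / mean * 100     # coefficient of variation as a percentage

print(f"mean = {mean}, s = {sample_sd:.4f}, CV = {cv_percent:.2f}%")
```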
Compute the median of the following data. 32, 41, 36, 24, 29, 30, 40, 22, 25, 37
31
What is the mode of the data set given below? 35, 47, 65, 47, 22
47
A sample of 13 adult males' heights are listed below. Find the range of the data. 70, 72, 71, 70, 69, 73, 69, 68, 70, 71, 67, 71, 74
7
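The median, mode, and range cards above can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

scores = [32, 41, 36, 24, 29, 30, 40, 22, 25, 37]
median = statistics.median(scores)        # even count: average of the middle values 30 and 32

values = [35, 47, 65, 47, 22]
mode = statistics.mode(values)            # most frequently occurring value

heights = [70, 72, 71, 70, 69, 73, 69, 68, 70, 71, 67, 71, 74]
data_range = max(heights) - min(heights)  # largest value minus smallest value

print(median, mode, data_range)
```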
Which of the following is not included in the CIA triad, the framework we can use to think about how best to protect data security?
Accountability
Data-ink ratio
Data-ink is the ink used in a table or chart that is necessary to convey the meaning of the data to the audience; the data-ink ratio is the proportion of total ink that is data-ink
Data preparation/treatment of missing values
Discard observations (rows or columns) with any missing values; fill in missing entries with estimated values; or apply a data-mining algorithm that can handle missing values
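The first two treatments can be sketched as follows. The records are hypothetical, `None` stands in for a missing entry, and the column mean is just one possible estimate for imputation.

```python
import statistics

# Hypothetical records of (age, income); None marks a missing entry.
rows = [(25, 48000), (31, None), (None, 52000), (40, 61000)]

# Treatment 1: discard every observation (row) that has any missing value.
complete_rows = [r for r in rows if None not in r]

# Treatment 2: fill each missing entry with an estimate,
# here the mean of the observed values in the same column.
def impute_mean(records):
    columns = list(zip(*records))
    means = [statistics.mean([v for v in col if v is not None]) for col in columns]
    return [tuple(m if v is None else v for v, m in zip(r, means)) for r in records]

filled_rows = impute_mean(rows)

print(complete_rows)
print(filled_rows)
```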
Measuring similarity/dissimilarity between observations
Euclidean distance measures the similarity/dissimilarity between a pair of observations. Standardizing helps remove bias due to differences in measurement units, and variable weighting allows the analyst to introduce appropriate bias based on the business context. For binary variables, use the matching coefficient or Jaccard's coefficient; Jaccard's coefficient can be preferable to the matching coefficient because it does not count matching zero values as similarity.
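A small sketch of the three measures, assuming numeric tuples for Euclidean distance and 0/1 lists for the two coefficients. The function names and sample observations are illustrative, not taken from the course material.

```python
import math

def euclidean(u, v):
    # Straight-line distance between two numeric observations.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def matching_coefficient(u, v):
    # Share of variables on which the two observations agree (1-1 and 0-0 both count).
    return sum(a == b for a, b in zip(u, v)) / len(u)

def jaccard_coefficient(u, v):
    # Like the matching coefficient, but 0-0 agreements are ignored.
    both_one = sum(a == 1 and b == 1 for a, b in zip(u, v))
    mismatches = sum(a != b for a, b in zip(u, v))
    denominator = both_one + mismatches
    return both_one / denominator if denominator else 0.0  # guard: all-zero pair

obs_1 = [1, 0, 0, 1, 0]
obs_2 = [1, 0, 0, 0, 0]

distance = euclidean((1, 2), (4, 6))
matching = matching_coefficient(obs_1, obs_2)
jaccard = jaccard_coefficient(obs_1, obs_2)
print(distance, matching, jaccard)
```

The two observations agree on four of five variables, but three of those agreements are matching zeros, so the Jaccard value is lower than the matching value.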
Why did Dr. Li say orange and blue are good colors to choose for data visualization?
Good color contrast while considering the color-blind audience
__________ is an open-source programming environment that supports big data processing through distributed storage and distributed processing on clusters of computers.
Hadoop
__________ is the most critical step of the decision-making process.
Identifying and defining the problem
K-Means Clustering
In k-means clustering, the analyst must specify the number of clusters, k. If k is not clearly established by the context of the business problem, the k-means clustering algorithm can be repeated for several values of k. The algorithm repeats this process (calculate cluster centroid, assign observation to cluster with nearest centroid) until there is no change in the clusters or a specified maximum number of iterations is reached. In general, the larger the ratio of the distance between a pair of cluster centroids and the within-cluster distance, the more distinct the clustering is for the observations in the two clusters in the pair.
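The assign-and-recalculate loop described above can be sketched in plain Python. This is a simplified illustration, not a production implementation: the starting centroids are chosen by hand (an assumption; real tools usually pick them randomly), and `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Simplified k-means; k is the number of starting centroids supplied."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        # Assign each observation to the cluster with the nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no change in the clusters, so stop
        assignments = new_assignments
        # Recalculate each centroid as the mean of its assigned observations.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignments, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print(labels)
print(centers)
```

To compare several values of k, the function can simply be called again with a different number of starting centroids.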
Examples of Predictive Analytics
Linear regression; time-series analysis; data mining (used to find patterns or relationships among elements of the data in a large database); simulation (the use of probability and statistics to construct a computer model to study the impact of uncertainty on a decision)
To summarize and analyze data with both a cross tabulation and charting, Excel typically pairs
PivotCharts with PivotTables.
Which one of the following statements is not true concerning PivotTables in Excel?
PivotTables can be built using data arrayed in rows.
Types of data
Population, sample, observations, variables; quantitative data vs. categorical data; cross-sectional data vs. time-series data; frequency distribution, histogram
Which of the following gives the proportion of items in each bin?
Relative frequency
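As a quick illustration, relative frequency is each bin's count divided by the total number of items. The grade data below is made up for the example.

```python
from collections import Counter

grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]  # hypothetical data

counts = Counter(grades)                       # frequency of each bin
relative = {grade: count / len(grades)         # proportion of items in each bin
            for grade, count in counts.items()}

for grade in sorted(relative):
    print(grade, counts[grade], relative[grade])
```

The relative frequencies always sum to 1, which makes them easier to compare across data sets of different sizes than raw counts.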
Examples of Descriptive Analytics
Reports; descriptive statistics; data visualization (including data dashboards); data-mining techniques; basic what-if spreadsheet models
Bill, the manager of Columbus Café, schedules twice the number of waiters and cooks on holidays. Which of the following is the approach Bill used in his decision-making?
Rules of thumb
Types of charts
Scatter chart (presents the relationship between two quantitative variables); trendline (a line that provides an approximation of the relationship between variables); sparkline; bar charts; column charts; pie charts; bubble charts; heat map; stacked column chart; clustered column chart; scatter chart matrix; PivotChart
The 4 Vs of big data
Veracity, velocity, variety, volume
Hierarchical clustering:
a bottom-up hierarchical clustering approach starts with each observation in its own cluster and then iteratively combines the two clusters that are the most similar into a single cluster
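A bare-bones sketch of the bottom-up procedure. It assumes single linkage (the distance between the closest pair of members) as the similarity measure, which is only one of several possible choices, and it stops at a target number of clusters picked for the example.

```python
import math

def agglomerate(points, target_clusters):
    # Start with each observation in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None
        # Find the two most similar clusters (smallest single-linkage distance).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Combine them into a single cluster.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [(1, 1), (1, 2), (5, 5), (6, 5), (10, 9)]
result = agglomerate(points, target_clusters=3)
print(result)
```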
A data visualization tool that updates in real time and gives multiple outputs is called
a data dashboard
Data query
a request for information with certain characteristics from a database
MISM 3116 summer class students will be chosen to represent Turner College at the campus-wide modeling competition. Of the students in Turner College, MISM 3116 students are _______________________
a sample
Big data
a set of data that cannot be managed, processed or analyzed with commonly available software in a reasonable amount of time
Be familiar with how to read clustered bar charts
and how to interpret data from them
A chart that is recommended as an alternative to a pie chart is a
bar chart
Optimization models:
best decision subject to constraints of the situation, e.g. portfolio, supply network design models, price markdown models, etc.
A better understanding of consumer behavior through analytics directly leads to
better marketing strategies
The correlation coefficient will always take values
between -1 and 1
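A short sketch that computes the (Pearson) correlation coefficient by hand and shows the two extremes plus one value in between. The data sets are invented for illustration.

```python
import math

def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])     # perfect positive linear relationship
r_down = pearson_r(x, [10, 8, 6, 4, 2])   # perfect negative linear relationship
r_mixed = pearson_r(x, [2, 1, 4, 3, 5])   # positive but imperfect relationship

print(r_up, r_down, r_mixed)
```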
Simulation optimization:
combining the use of probability and statistics to model uncertainty with optimization techniques to find the best decisions in highly complex and highly uncertain situations
The financial dashboard on the second floor of CCT building is a type of _________ analytics.
descriptive
Descriptive analytics
encompasses the set of techniques that describes what has happened in the past
Tactical decisions are concerned with
how the organization should achieve the goals and objectives set by its strategy.
Deleting the grid lines in a table and the horizontal lines in a chart
increases the data-ink ratio
Prescriptive analytics
indicates the best course of action to take
The letter grades (A, B, C, D, F) of business analysis students are recorded by a professor. This variable's classification
is categorical data
Data-ink is the ink used in a table or chart that
is necessary to convey the meaning of the data to the audience
A disadvantage of stacked-column charts and stacked-bar charts is that
it can be difficult to perceive small differences in areas
In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as
key performance indicators (KPIs)
A time series plot is also known as a
line chart
A set of values corresponding to a set of variables is defined as a(n)
observation
Dr. Bill plans to open a donut store near CSU. Collecting data from students about their favorite donut flavors is an example of a(n)
observational study
Any data value with a z-score less than -3 or greater than +3 is considered to be a(n)
outlier
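The rule can be applied as below. This sketch assumes the sample standard deviation for the z-scores, and the data set is hypothetical. Note that with very small samples no value can reach a z-score of 3, so the example uses 21 observations.

```python
import statistics

def z_scores(data):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)            # sample standard deviation (assumption)
    return [(x - mean) / sd for x in data]

def find_outliers(data, threshold=3.0):
    # Flag any value whose z-score is below -threshold or above +threshold.
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

data = [48, 49, 50, 51, 52] * 4 + [95]     # hypothetical data with one extreme value
flagged = find_outliers(data)
print(flagged)
```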
Predictive analytics
predicting the future or ascertaining the impact of one variable on another
Advanced analytics generally refers to
predictive and prescriptive analytics
Data-driven decision making tends to decrease a firm's
risk
Williams & Lee Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major cities of Georgia to help complement its Internet-based strategy. This activity would be categorized as a(n)
strategic decision
The decisions concerning an organization's goals and future plans are called
strategic decisions
__________ merges maps and statistics to present data collected over different geographies.
A geographic information system (GIS)
Dimension reduction
the process of removing variables from the analysis without losing any crucial information
Simulation optimization helps
to find good decisions in highly complex and highly uncertain settings.
A _____________ is a line that provides an approximation of the relationship between the variables.
trendline
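One common way to obtain a linear trendline is a least-squares fit. The sketch below assumes that method and uses made-up data; other trendline types (polynomial, exponential, and so on) are not covered.

```python
def linear_trendline(x, y):
    # Least-squares slope and intercept for the line y = intercept + slope * x.
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]             # hypothetical observations

slope, intercept = linear_trendline(x, y)
print(f"y = {intercept:.2f} + {slope:.2f}x")
```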
Centroid linkage:
uses the averaging concept of cluster centroids to define between-cluster similarity
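As a sketch, centroid linkage can be computed by averaging the observations in each cluster and measuring the distance between the two averages. Euclidean distance is assumed here, and the clusters are illustrative.

```python
import math

def centroid(cluster):
    # Average of the observations in a cluster, taken variable by variable.
    return tuple(sum(values) / len(cluster) for values in zip(*cluster))

def centroid_linkage(cluster_a, cluster_b):
    # Between-cluster distance measured from centroid to centroid.
    return math.dist(centroid(cluster_a), centroid(cluster_b))

cluster_a = [(0, 0), (2, 0)]               # centroid (1.0, 0.0)
cluster_b = [(3, 4), (5, 4)]               # centroid (4.0, 4.0)

linkage = centroid_linkage(cluster_a, cluster_b)
print(centroid(cluster_a), centroid(cluster_b), linkage)
```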
A quantity of interest that can take on different values is known as a(n)
variable