Data Mid-Term
What is a histogram?
A common graphical presentation of quantitative data. • Constructed by placing the variable of interest on the horizontal axis and the selected frequency measure (absolute frequency, relative frequency, or percent frequency) on the vertical axis. • The frequency measure of each class is shown by drawing a rectangle whose base is the class limits on the horizontal axis and whose height is the corresponding frequency measure.
What is a line chart?
A line connects the points in the chart. • Useful for time series data collected over a period of time (minutes, hours, days, years, etc.).
What is population in statistics?
A population is the group of all items of interest to an analytics practitioner. - Frequently very large; sometimes infinite. E.g., all 5 million Florida voters.
linear scatter plot
A relationship is linear if one variable changes by approximately the same amount each time the other variable changes by one unit.
What is Sample in statistics?
A sample is a set of data drawn from the population. - Potentially very large, but smaller than the population. E.g., a sample of 765 voters exit polled on election day. - Random sampling: a sampling method to gather a representative sample of the population data.
Which of the following descriptions best defines a frequency distribution? a. A representation of data that shows the number of observations in each of several overlapping classes b. A summary of data that shows the number (frequency) of observations in each of several overlapping classes c. A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes, typically referred to as bins d. A tabular summary of data showing the relative frequency for each class
A summary of data that shows the number (frequency) of observations in each of several nonoverlapping classes, typically referred to as bins
Edward Tufte introduced the idea of the data-ink ratio, as a way of quantifying the proportion of "data-ink" to the total amount of ink used in a table or chart. Which of the following options would increase the data-ink ratio of a table? a. Rounding off the data to reduce excessive decimal precision b. Adding a title to the table c. Increasing the row heights and column widths for all rows and columns in the table d. Adding table borders
Adding a title to the table
What is a boxplot useful for in statistics? a. Understanding proportions of a dataset b. Interpreting correlations between variables c. Comparing different data sets. d. Uncovering relationships between different variables
Comparing different data sets.
Which of the following statistics measures the relationship between two variables? a. Mean value b. Variance c. Correlation d. Median value
Correlation
Which of the following best characterizes the use or functionality of the COUNTIF function in Excel? a. Directly calculates the percentage frequency of cells that meet multiple criteria. b. Directly calculates the percentage frequency of cells that meet a single criterion. c. Counts the number of cells that meet multiple criteria. d. Counts the number of cells that meet a single criterion.
Counts the number of cells that meet a single criterion.
Cross-Sectional Data
Cross-sectional data are collected at the same or approximately the same point in time. Example: data detailing the number of building permits issued in June 2006 in each of the counties of Ohio
What do corporate-level managers use to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen?
Data dashboards
frequency table summary
Data from a frequency table can be graphically pictured by a line graph, which plots the successive values on the horizontal axis and indicates the corresponding frequency by the height of a vertical line.
What are the types of analytics?
Descriptive, diagnostic, predictive, prescriptive
Which of the following encompasses reports, data dashboards, and descriptive statistics to describe past data?
Descriptive analytics
What type of analysis does a boxplot display? a. Distribution of data (value) b. Pairwise comparison c. Descriptive statistics (graphical) d. Mean comparison
Distribution of data (value)
What are independent events?
Events A and B are independent if $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$
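A minimal Python sketch of the independence check, using a hypothetical single draw from a standard 52-card deck: A = "the card is an ace" and B = "the card is a heart". It computes P(A | B) = P(A and B) / P(B) with exact fractions and compares it with P(A); the helper name `are_independent` is illustrative only.

```python
from fractions import Fraction

# Hypothetical example: one card drawn from a standard 52-card deck.
p_a = Fraction(4, 52)         # A: card is an ace (4 aces)
p_b = Fraction(13, 52)        # B: card is a heart (13 hearts)
p_a_and_b = Fraction(1, 52)   # A and B: the ace of hearts

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b


def are_independent(p_a, p_a_given_b):
    """A and B are independent when knowing B leaves P(A) unchanged."""
    return p_a_given_b == p_a


print(p_a_given_b, p_a)                   # 1/13 1/13
print(are_independent(p_a, p_a_given_b))  # True
```

Exact fractions are used so the equality test is not disturbed by floating-point rounding.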
What is the z score calculation?
For example, let's say you have a test score of 190. The test has a mean (μ) of 150 and a standard deviation (σ) of 25. Assuming a normal distribution, your z score would be: z = (x - μ) / σ = (190 - 150) / 25 = 1.6.
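The same calculation as a small Python function, reusing the numbers from the test-score example (x = 190, μ = 150, σ = 25). The function name `z_score` is illustrative, not from any library.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z = z_score(190, 150, 25)
print(z)  # 1.6
```

A negative result would mean the value lies below the mean.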
How does a histogram represent a frequency or relative frequency distribution? a. Audiovisually b. With a bar chart c. Tabularly d. Graphically
Graphically
Data can best be categorized based on which of the following? a. Why it is collected b. Where it is collected c. How it is collected d. When it is collected
How it is collected
What is the key benefit of using the covariance function to analyze statistical data? a. It enables clear visualizations of data. b. It decreases the risk of errors. c. It finds correlations between factors. d. It highlights a linear association between two variables.
It highlights a linear association between two variables
Which of the following descriptions defines the term variance in statistics? a. It is the average of the absolute differences between each value in a data set and the mean. b. It is the average of the differences between each value in a data set and the median. c. It is the average of the squared differences between each value in a data set and the mean. d. It is the sum of the differences between each value in a data set and the mean.
It is the average of the squared differences between each value in a data set and the mean.
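A minimal Python sketch of variance and standard deviation on a hypothetical data set. The card's definition (average of the squared differences from the mean) is the population variance; for a sample, the sum of squared differences is divided by n - 1 instead. The standard deviation is the positive square root of the variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared differences from the mean.
pop_variance = sum(squared_diffs) / len(data)
# Sample variance: divide by n - 1 instead of n.
sample_variance = sum(squared_diffs) / (len(data) - 1)

# Standard deviation: positive square root of the variance.
pop_std_dev = math.sqrt(pop_variance)
sample_std_dev = math.sqrt(sample_variance)

print(mean, pop_variance, pop_std_dev)  # 5.0 4.0 2.0

# Cross-check against the standard library's sample standard deviation.
print(math.isclose(sample_std_dev, statistics.stdev(data)))  # True
```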
Which of the following statements is true about a percentile in statistics? a. It is a measure of position in a frequency distribution. b. It is an average of two numbers. c. It is a measure of variability. d. It is the value at which a specified (approximate) percentage of observations fall below
It is the value at which a specified (approximate) percentage of observations fall below.
What is the mode in statistics? a. It is the value that appears most frequently in a data set. b. It is the value that is exactly in the middle of a data set when the values are arranged in ascending or descending order. c. It is the average of all values in a data set. d. It is the value that appears least frequently in a data set.
It is the value that appears most frequently in a data set.
Which of the following statements about z-scores is true? a. It measures the relative location of a value in the data set. b. It measures the absolute location of a value in the data set. c. It measures the correlation between two variables in the data set. d. It measures the frequency of a value in the data set.
It measures the relative location of a value in the data set.
How does a cumulative frequency distribution differ from a frequency distribution in terms of the information it presents? a. It shows comparison among the variable performance. b. It shows the number of data items with values less than or equal to the upper limit of each bin. c. It doesn't differ. d. It shows the number of data items with values greater than the upper limit of each bin.
It shows the number of data items with values less than or equal to the upper limit of each bin
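A minimal Python sketch, assuming hypothetical data grouped into nonoverlapping bins 10-19, 20-29, and 30-39. It counts the items less than or equal to each bin's upper limit (the cumulative frequency), recovers the ordinary frequency of each bin from the differences, and divides by the total to get the cumulative relative frequency.

```python
# Hypothetical data and bin upper limits for bins 10-19, 20-29, 30-39.
data = [12, 15, 18, 22, 25, 27, 29, 31, 35, 38]
upper_limits = [19, 29, 39]

# Cumulative frequency: number of items <= each bin's upper limit.
cumulative = [sum(1 for x in data if x <= limit) for limit in upper_limits]

# Ordinary frequency per bin: difference between successive cumulative counts.
frequency = [c - prev for c, prev in zip(cumulative, [0] + cumulative[:-1])]

# Cumulative relative frequency: cumulative count as a proportion of all items.
cumulative_relative = [c / len(data) for c in cumulative]

print(frequency)            # [3, 4, 3]
print(cumulative)           # [3, 7, 10]
print(cumulative_relative)  # [0.3, 0.7, 1.0]
```

The last cumulative frequency always equals the total number of items, so the last cumulative relative frequency is 1.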
Skewness is a distributional measure of: a. Correlation b. Lack of distributional symmetry c. Mean value d. Median value
Lack of distributional symmetry
Sorting and filtering data facilitates the identification of: a. Shape b. Pattern c. Lowest Value d. Highest value
Pattern
Quantitative Data
Quantitative data indicate how many or how much. Quantitative data are always numeric. Ordinary arithmetic operations are meaningful for quantitative data.
To better understand the relationship between advertising dollars spent and the subsequent sales, one could create what type of chart? a. Scatter chart b. Line chart c. Clustered column (bar) chart d. Pie chart
Scatter chart
What is skewness on a graph?
Skewness shows up on a graph when the data points are not distributed symmetrically to the left and right of the median, so the curve stretches out further on one side instead of forming a symmetric bell.
What type of correlation does a correlation coefficient of -0.93 indicate? a. Weak negative correlation b. No correlation c. Strong negative correlation d. Weak positive correlation
Strong negative correlation
What does a relative frequency distribution represent? a. Distribution pattern b. Average c. Tabular summary d. Frequency
Tabular summary
How does a cumulative frequency distribution provide a summary of quantitative data? a. Tabularly b. Numerically c. With a bar chart d. Graphically
Tabularly
How is the geometric mean calculated as a measure of location? a. The nth root of the sum of n values b. The average of the product of n values divided by n c. The nth root of the product of n values d. The average of the sum of n values divided by n
The nth root of the product of n values
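A minimal Python sketch of the geometric mean as the nth root of the product of n positive values. The growth factors are hypothetical (+10%, -5%, +20% in three years); the function name `geometric_mean` is illustrative, though Python 3.8+ also ships `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.prod(values) ** (1 / len(values))


print(geometric_mean([2, 8]))  # 4.0 (square root of 16)

# Hypothetical yearly growth factors: +10%, -5%, +20%.
factors = [1.10, 0.95, 1.20]
g = geometric_mean(factors)
print(f"average growth per year: {g - 1:.1%}")  # average growth per year: 7.8%
```

For growth rates the geometric mean is the right average: growing by the result every year compounds to the same total as the actual sequence of factors.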
What is a regression line in a scatter plot?
The regression line is a trend line we use to model the linear trend that we see in a scatter plot.
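A straight-line trendline is usually fit by ordinary least squares. A minimal Python sketch, assuming hypothetical x and y values; `least_squares_line` is an illustrative name. The slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the line passes through the point of means.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept: line passes through (x_bar, y_bar)
    return b1, b0


xs = [1, 2, 3, 4, 5]   # hypothetical
ys = [2, 4, 5, 4, 5]   # hypothetical
slope, intercept = least_squares_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```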
How can relative frequency be calculated?
The relative frequency is calculated by dividing the absolute frequency by the total number of values for the variable
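A minimal Python sketch using hypothetical soft-drink purchase data: each category's count is divided by the total number of values to get its relative frequency, and multiplying by 100 gives the percent frequency.

```python
from collections import Counter

# Hypothetical categorical data: ten soft-drink purchases.
purchases = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Coke", "Pepsi"]

counts = Counter(purchases)   # absolute frequency of each category
n = len(purchases)

relative = {item: count / n for item, count in counts.items()}
percent = {item: 100 * count / n for item, count in counts.items()}

print(relative)  # {'Coke': 0.5, 'Pepsi': 0.3, 'Sprite': 0.2}
print(percent)   # {'Coke': 50.0, 'Pepsi': 30.0, 'Sprite': 20.0}
```

The relative frequencies always add up to 1 (and the percent frequencies to 100).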
How many steps are necessary to define the classes for a frequency distribution with quantitative data? a. Three b. Two c. Four d. Five
Three
Time Series Data
Time series data are collected over several time periods. • Example: data detailing the number of building permits issued in Lucas County, Ohio in each of the last 36 months
Which of the following types of graphs is useful for visualizing hierarchical data along multiple dimensions? a. Trendline b. Parallel coordinates plot c. Treemap d. Heat map
Treemap
Nominal
Values are arbitrary numbers that represent categories. Only calculations based on the frequencies of occurrence are valid. Data may not be treated as ordinal or interval.
ordinal
Values must represent the ranked order of the data. Calculations based on an ordering process are valid. Data may be treated as nominal but not as interval.
Which of the following statements about conditional formatting in Excel is true? a. Cells with negative values are shaded in blue, while cells with positive values are shaded in red. b. Cells with negative values are shaded in green, and those with positive values are shaded in yellow. c. When using Data Bars, negative values are shown to the right side of the axis, and positive values are shown to the left. d. When using Data Bars, negative values are shown to the left side of the axis, and positive values are shown to the right.
When using Data Bars, negative values are shown to the left side of the axis, and positive values are shown to the right.
Which of the following can be used to identify outliers? a. Median b. Standard deviation c. Z-score d. Geometric mean
Z-score
What is a frequency table
a "t-chart" or two-column table that lists the possible outcomes and the frequency observed for each outcome in a sample.
What are prescriptive analytics?
a form of business analytics which suggests decision options for how to take advantage of a future opportunity or mitigate a future risk, and shows the implication of each decision option
positive scatter plot
a graph whose data points trend upward from left to right in a linear fashion.
negative scatter plot
a graph that shows that all of the data points are in a pattern trending down from left to right
Firms guided by data-driven decision making tend to have
a higher market value.
What are the three types of relations on a scatter plot?
a linear or non-linear relationship, a positive (direct) or negative (inverse) relationship
What is standard deviation?
a measure of how dispersed the data is in relation to the mean
What is skewness?
a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean
non-linear scatter plot
a relationship whose scatter plot does not resemble a straight line
What is kurtosis in graphs?
a statistical measure of how heavily the data are concentrated in the tails of a distribution rather than near the mean, usually judged relative to a normal distribution
What is a dashboard?
a summary of different, but related data sets, presented in a way that makes the related information easier to understand.
What is a column chart?
a technique for data visualization where categories are represented in the form of vertical columns
what is uniform distribution?
a type of probability distribution in which all outcomes are equally likely.
What is a scatter chart?
a useful graph for analyzing the relationship between two variables. • For example, a scatter chart of daily high temperature and bottled-water sales suggests that a straight line could be used to approximate the relationship between the two.
What is geometric mean?
an average found by multiplying all n values together and taking the nth root of the product
What are outliers?
an extremely high or extremely low value relative to the rest of the values in the data set or graph you're working with
What is a pivot table?
an interactive way to quickly summarize large amounts of data
What is considered an outlier?
any value whose z-score is less than -3 or greater than 3
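A minimal Python sketch of the z-score rule, assuming hypothetical data and the sample standard deviation; `z_score_outliers` is an illustrative name. In very small data sets no value can reach |z| > 3, so the rule is most useful for larger data sets.

```python
import statistics


def z_score_outliers(data, threshold=3.0):
    """Return values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)  # sample standard deviation
    return [x for x in data if abs((x - mean) / std_dev) > threshold]


# Hypothetical data: twenty ordinary values plus one extreme value.
data = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19] * 2 + [60]
print(z_score_outliers(data))  # [60]
```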
What is mean?
average: the sum of all values divided by the number of values
A scatter chart is best defined as a useful graph for analyzing the relationship a. among the top ten variables. b. among three variables. c. among all listed variables. d. between two variables.
between two variables.
A frequency polygon is useful because it compares distributions of a. highest values. b. qualitative variables. c. quantitative variables. d. hypothetical variables.
quantitative variables.
What is empirical rule?
can be used to determine the percentage of data values that are within a specified number of standard deviations of the mean
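For a normal (bell-shaped) distribution, the empirical rule says about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. A short Python sketch that recovers those percentages from the standard normal distribution using `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```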
When do you use continuous for quantitative data?
continuous, if measuring how much
Professional sports teams use analytics to
decide how much to offer players in contract negotiations.
What is relative frequency?
describes the number of times a particular value for a variable (data item) has been observed to occur in relation to the total number of values for that variable. It can be shown in a histogram.
When do you use discrete for quantitative data?
discrete, if measuring how many
The tools of business analytics are useful for all of the following except
enabling firms to eliminate all risk in decision making
What are diagnostic analytics?
examines data to understand the root causes of events, behaviors, and outcomes
Qualitative Data
-Labels or names used to identify an attribute of each element -Often referred to as categorical data -Use either the nominal or ordinal scale of measurement -Can be either numeric or nonnumeric -Appropriate statistical analyses are rather limited
Types of data
-cross-sectional -time-series -qualitative -experimental -observational -discrete -continuous -interval -nominal -ordinal
What is a normal distribution curve?
-data are symmetrically distributed about the mean with no skew -the horizontal axis is often marked in standard deviations from the mean, ranging from -3 to 3 in increments of 1
Which of the following statements comparing cumulative relative frequency distribution with cumulative percent frequency distribution is true? a. A cumulative relative frequency distribution shows the percentage of data items with values less than or equal to the upper limit of each bin, while a cumulative percent frequency distribution shows the proportion of data items. b. A cumulative relative frequency distribution shows the proportion of data items, while a cumulative percent frequency distribution shows the percentage of data items with values less than or equal to the upper limit of each bin. c. Both types of frequency distributions show the proportion of data items. d. Both types of frequency distributions show the percentage of data items with values less than or equal to the upper limit of each bin.
A cumulative relative frequency distribution shows the proportion of data items, while a cumulative percent frequency distribution shows the percentage of data items with values less than or equal to the upper limit of each bin.
Tactical decisions are concerned with
how the organization should achieve the goals and objectives set by its strategy.
What is multiplication law?
is based on the definition of conditional probability, and it provides a way to compute the probability of the intersection of two events: $P(A \cap B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$
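A minimal Python sketch of the multiplication law, using a hypothetical draw of two cards from a 52-card deck without replacement: A = "first card is an ace", B = "second card is an ace". Exact fractions keep the result readable.

```python
from fractions import Fraction

p_a = Fraction(4, 52)          # P(A): 4 aces among 52 cards
p_b_given_a = Fraction(3, 51)  # P(B | A): 3 aces left among 51 cards

# Multiplication law: P(A and B) = P(A) * P(B | A)
p_a_and_b = p_a * p_b_given_a
print(p_a_and_b)  # 1/221
```

Because the first draw changes the deck, P(B | A) differs from P(B) = 4/52 here, so the events are not independent and the conditional form of the law is needed.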
What is the union of events?
is the event containing all outcomes that are in A, in B, or in both, written $A \cup B$
What is percentile?
is the value of a variable at which a specified (approximate) percentage of observations are below that value. • The pth percentile tells us the point in the data where: • Approximately p percent of the observations have values less than the pth percentile. • Approximately (100 - p) percent of the observations have values greater than the pth percentile
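Several conventions exist for locating a percentile. A minimal Python sketch of one common one, assuming the location rule L_p = (p/100)(n + 1) with linear interpolation between neighboring sorted values (the rule Excel's PERCENTILE.EXC follows). The function name and data are hypothetical.

```python
def percentile(data, p):
    """pth percentile using location L_p = (p/100)(n + 1) and linear interpolation."""
    values = sorted(data)
    n = len(values)
    location = (p / 100) * (n + 1)  # 1-based position in the sorted data
    if location < 1 or location > n:
        raise ValueError("p is too extreme for this many observations")
    lower = int(location)           # whole-number part of the position
    fraction = location - lower     # how far to move toward the next value
    if lower == n:
        return float(values[-1])
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


data = [40, 15, 50, 20, 35]  # hypothetical
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 17.5 35.0 45.0
```

Other software may use a different location rule, so small data sets can give slightly different percentiles.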
What is IQR?
it is the interquartile range: the difference between the third and first quartiles, $IQR = Q_3 - Q_1$
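A minimal Python sketch computing the IQR with `statistics.quantiles` (default method "exclusive", which uses the same (n + 1) rule as Excel's QUARTILE.EXC) and applying the common boxplot convention that flags values more than 1.5 × IQR below Q1 or above Q3. The data are hypothetical.

```python
import statistics

data = [12, 14, 15, 16, 17, 18, 19, 40]  # hypothetical

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, method="exclusive"
iqr = q3 - q1

# Common 1.5 * IQR fences used for boxplot outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)  # 14.25 18.75 4.5
print(outliers)     # [40]
```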
An outlier in statistics is a value that a. lies far outside a range of numbers. b. within a range of numbers. c. between two numbers. d. at the end of a distribution.
lies far outside a range of numbers.
What is a correlation coefficient?
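measures the strength and direction of the linear relationship between two variables; values range from -1 to +1. • Not affected by the units of measurement for x and y. • $r_{xy} = \frac{s_{xy}}{s_x s_y}$, where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations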
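A minimal Python sketch of the sample correlation coefficient computed as the sample covariance divided by the product of the sample standard deviations. The advertising and sales figures are hypothetical; rescaling x (thousands of dollars to dollars) shows that r does not depend on the units.

```python
import math


def sample_covariance(xs, ys):
    """s_xy: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation r_xy = s_xy / (s_x * s_y)."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself = variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


ad_spend = [1, 2, 3, 4, 5]  # hypothetical, in $1,000s
sales = [2, 4, 5, 4, 5]     # hypothetical, in units
r = correlation(ad_spend, sales)
print(round(r, 3))  # 0.775

# Measuring ad spend in dollars instead of $1,000s leaves r unchanged.
r_dollars = correlation([x * 1000 for x in ad_spend], sales)
print(math.isclose(r, r_dollars))  # True
```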
What is a z-score?
measures the relative location of a value in the data set. • Helps to determine how far a particular value is from the mean relative to the data set's standard deviation
What are descriptive analytics?
methods of organizing, summarizing, and presenting data in a convenient and informative way.
What is median?
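middle value when the data are arranged in ascending or descending order (the average of the two middle values when there is an even number of values)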
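A minimal Python sketch computing the mean, median, and mode of a hypothetical data set directly from their definitions, including the even-count case for the median.

```python
from collections import Counter

data = [7, 2, 10, 4, 7, 6]  # hypothetical

mean = sum(data) / len(data)                 # 36 / 6

ordered = sorted(data)                       # [2, 4, 6, 7, 7, 10]
mid = len(ordered) // 2
if len(ordered) % 2:                         # odd count: single middle value
    median = ordered[mid]
else:                                        # even count: average the two middle values
    median = (ordered[mid - 1] + ordered[mid]) / 2

mode = Counter(data).most_common(1)[0][0]    # most frequent value

print(mean, median, mode)  # 6.0 6.5 7
```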
observational studies/ data
no attempt is made to control or influence the variables of interest
Standard deviation is best defined as a a. positive square root of the variance. b. measure of the spread of a data set, calculated as the square root of the mean. c. measure of the center of a data set, calculated as the middle value when the values are arranged in ascending or descending order. d. measure of the average of a data set, calculated as the sum of the values divided by the number of values.
positive square root of the variance.
What is addition law?
provides a way to compute the probability of the union of two events: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
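A minimal Python sketch checking the addition law against direct counting on a hypothetical single draw from a 52-card deck, with A = "ace" and B = "heart". The helper `prob` assumes all 52 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely outcomes


def prob(event):
    """Probability of an event = favorable outcomes / total outcomes."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


# Addition law: P(A or B) = P(A) + P(B) - P(A and B)
by_law = prob(is_ace) + prob(is_heart) - prob(lambda c: is_ace(c) and is_heart(c))
by_counting = prob(lambda c: is_ace(c) or is_heart(c))
print(by_law, by_counting)  # 4/13 4/13
```

Subtracting P(A and B) removes the ace of hearts, which would otherwise be counted once as an ace and once as a heart.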
Data on which arithmetic operations can be performed are considered to be a. categorical. b. cross-sectional. c. qualitative. d. quantitative.
quantitative
What is the probability of an event?
is equal to the sum of the probabilities of the outcomes in the event: $P(A) = \sum_{O \in A} P(O)$
A type of modeling that combines the use of probability and statistics, to model uncertainty with optimization techniques, to find good decisions in highly complex and highly uncertain settings is called
simulation optimization.
Bins are formed in a frequency distribution by a. converting observational data. b. averaging data. c. specifying the ranges used to group the data. d. specifying group data then average with standard one.
specifying the ranges used to group the data.
Picks and Axes Inc. is an Internet-based retail seller of hiking boots and mountaineering gear. The company decides to open retail stores across the major areas of the city to help complement its Internet-based sales. This activity would be categorized as a(n)
strategic decision.
The decisions concerning an organization's goals and future plans are called
strategic decisions.
Increasing the "white space" in a table by removing unnecessary lines increases all of the following except the a. data-ink ratio. b. simplicity in conveying information to the reader. c. table's size. d. table's readability.
table's size
What is the intersection of events?
the event containing all outcomes belonging to both A and B
What is coefficient of variation?
the ratio of the standard deviation to the mean expressed as a percentage
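A minimal Python sketch of the coefficient of variation, assuming the sample standard deviation (use `statistics.pstdev` for a population). The weights are hypothetical; recording them in milligrams instead of grams gives the same CV, which is why it is handy for comparing variability across different scales.

```python
import statistics


def coefficient_of_variation(data):
    """(standard deviation / mean) * 100, using the sample standard deviation."""
    return statistics.stdev(data) / statistics.mean(data) * 100


grams = [10, 12, 14]                   # hypothetical weights in grams
milligrams = [10_000, 12_000, 14_000]  # the same weights in milligrams

print(f"{coefficient_of_variation(grams):.1f}%")       # 16.7%
print(f"{coefficient_of_variation(milligrams):.1f}%")  # 16.7%
```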
What is expected value?
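the weighted average of the possible values of a random variable, with each value weighted by its probability; it is the long-run average over many repeated trials, not necessarily the most likely result of the next trial.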
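A minimal Python sketch for a hypothetical fair six-sided die: each value is multiplied by its probability and the products are summed. The result, 3.5, is not a face the die can show, which illustrates that the expected value is a long-run average rather than the most likely next outcome.

```python
from fractions import Fraction

# Fair six-sided die: each face 1-6 has probability 1/6.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value: sum of each value times its probability.
expected = sum(value * p for value, p in distribution.items())
print(expected)  # 7/2
```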
What is a quartile?
the values that divide the dataset into equal fourths
Experimental studies/ data
the variables of interest are first identified. Then one or more factors are controlled so that data can be obtained about how the factors influence the variables.
What is often called a standardized value?
the z-score
Simulation optimization helps
to find good decisions in highly complex and highly uncertain settings.
A bar chart is a graphical presentation that a. updates in real time and gives multiple outputs. b. indicates the trend of data but not magnitude. c. has two axes that represent two variables, and the magnitude of the third variable is given by the size of the bubble. d. uses horizontal bars to display the magnitude of quantitative data.
uses horizontal bars to display the magnitude of quantitative data.
What are predictive analytics?
uses techniques that extract information from data and use it to predict future trends and identify behavioral patterns
interval data
values are real numbers. All calculations are valid. Data may be treated as ordinal or nominal.
What are mutually exclusive events?
when they have no outcomes in common