Business analytics
The average weight of a bag of coffee is 460 grams with a standard deviation of 30 grams. You weigh a bag from the shelf at 430 grams. Is this weight an indication of an outlier that would merit further investigation?
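(430-460)/30---->-30/30 = -1. |z| = 1 is well within ±3, so No, 430 grams is not an outlier and does not merit further investigation.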
(TRUE or FALSE) The data average is 7500 with a standard deviation of 400. You observe a value of 6000. This observed value is an outlier. (Use the z-score.)
(6000-7500)/400---->-1500/400 = -3.75. Since z < -3, True, it's an outlier.
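Both z-score checks above can be reproduced with a short Python sketch. The function names are hypothetical, and the |z| > 3 cutoff is the common rule of thumb, assumed here as the default.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def is_outlier(x: float, mean: float, sd: float, cutoff: float = 3.0) -> bool:
    """Flag x as an outlier when |z| exceeds the cutoff (3 assumed by default)."""
    return abs(z_score(x, mean, sd)) > cutoff


# Coffee bag: z = (430 - 460) / 30 = -1.0, not an outlier
print(z_score(430, 460, 30), is_outlier(430, 460, 30))
# Observed value: z = (6000 - 7500) / 400 = -3.75, an outlier
print(z_score(6000, 7500, 400), is_outlier(6000, 7500, 400))
```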
What are bins?
A bin—sometimes called a class interval—is a way of sorting data in a histogram. It's very similar to the idea of putting data into categories.
• Box plot
A graph that displays the lowest and highest quarters of the data as whiskers, the middle two quarters of the data as a box, and the median as a line inside the box.
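A box plot is drawn from five numbers: the minimum and maximum (whisker ends), the first and third quartiles (box edges), and the median. The sketch below computes them with Python's `statistics.quantiles`; the sample values are hypothetical, whiskers are assumed to run to the min and max (some tools cap them at 1.5 × IQR instead), and the "inclusive" quartile method is one of several conventions.

```python
import statistics


def five_number_summary(data: list[float]) -> dict[str, float]:
    """Values a box plot draws: whisker ends, box edges, and the median line."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data)}


# Hypothetical sample values
print(five_number_summary([2, 4, 4, 5, 7, 9, 10, 12, 15]))
```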
Frequency distribution & histograms • What are they?
A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data.
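A frequency distribution is the table behind a histogram: pick equal-width bins, then count the values that fall in each. In this sketch the function name, the data, and the bin choices are hypothetical; each bin includes its lower limit, and the top edge value is placed in the last bin.

```python
def frequency_distribution(data: list[float], lower: float, width: float,
                           num_bins: int) -> list[tuple[str, int]]:
    """Count values per bin [lower + k*width, lower + (k+1)*width)."""
    upper = lower + width * num_bins
    counts = [0] * num_bins
    for x in data:
        if not lower <= x <= upper:
            raise ValueError(f"{x} is outside [{lower}, {upper}]")
        # Bin index; the value equal to the top edge goes into the last bin.
        k = min(int((x - lower) // width), num_bins - 1)
        counts[k] += 1
    labels = [f"{lower + i * width}-{lower + (i + 1) * width}" for i in range(num_bins)]
    return list(zip(labels, counts))


scores = [12, 15, 21, 22, 25, 28, 31, 33, 38, 40]  # hypothetical data
for label, count in frequency_distribution(scores, lower=10, width=10, num_bins=3):
    print(f"{label:>6} | {'*' * count}")  # text histogram: one * per value
```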
What is random sampling?
A sampling method is a procedure for selecting sample elements from a population. Simple random sampling is a sampling method in which every possible sample of n elements from a population of N elements is equally likely to be selected.
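Python's `random.sample` draws k distinct elements without replacement so that every subset of size k is equally likely, which matches the definition of simple random sampling. The population of 100 IDs and the seed are hypothetical; the seed only makes the draw reproducible.

```python
import random

population = list(range(1, 101))  # hypothetical population of N = 100 IDs
rng = random.Random(42)           # fixed seed so the draw is reproducible
sample = rng.sample(population, k=10)  # n = 10, drawn without replacement
print(sample)
```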
______________ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another. a. Predictive b. Descriptive c. Simulation d. Prescriptive e. None of the above
A. Predictive
A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of _______________. a. predictive analytics b. decision analysis c. prescriptive analytics d. descriptive analytics
A. Predictive analytics
A _______________ decision involves higher-level issues and is concerned with the overall direction of the organization, defining the overarching goals and aspirations for the organization's future. a. strategic b. tactical c. intuitive d. operational
A. Strategic
The ________________ is a point estimate of the population mean for the variable of interest. a. Sample mean b. Median c. Sample d. Geometric mean
A. sample mean
Select the correct response. a. The median is the same as the 50th percentile. b. The 25th percentile is the same as the first quartile. c. The 75th percentile score on a test means that 25% of the test takers did better. d. According to the Empirical Rule, in a normal distribution almost all of the data values will be within ± 3 standard deviations from the mean. e. All of the above are correct
E. All of the above are correct (statements a, b, c, and d are each true).
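Statements a and b can be checked numerically: the 50th percentile equals the median, and the 25th percentile equals the first quartile. The test scores below are hypothetical, and the "inclusive" percentile method is one convention among several.

```python
import statistics

scores = [55, 61, 64, 70, 72, 75, 78, 81, 85, 88, 90, 94]  # hypothetical test scores

pct = statistics.quantiles(scores, n=100, method="inclusive")  # 1st..99th percentiles
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

print(pct[49], statistics.median(scores), q2)  # 50th percentile, median, Q2: all 76.5
print(pct[24], q1)                             # 25th percentile and Q1: both 68.5
```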
What is business analytics?
Applying the scientific process to data in order to make better decisions; fact-based decision making.
Of the 4 Vs, the one that indicates the quality of the data is ________. a. variety b. veracity c. volume d. velocity
B. Veracity
A _____________________ determines how far a particular value is from the mean relative to the data set's standard deviation. a. coefficient of variation b. z-score c. Variance d. percentile
B. z-score
With Big data, what's the difference between long, wide, and streaming data?
Big data: a set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. • Long: each row is one time point per subject, so each subject (e.g., a county) has data in multiple rows; variables that don't change across time have the same value in all of that subject's rows. • Wide: each subject's repeated responses are in a single row, with each response in a separate column. • Streaming: data in motion; processing is done on a continuous stream of data as it arrives, with the emphasis on speed.
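Converting long data to wide data is a reshape: the subject becomes the row, and each time point becomes a column. This plain-Python sketch uses hypothetical county names and values; spreadsheet pivot tools or data-frame libraries do the same reshape at scale.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (county, year) time point.
long_rows = [
    ("Adams", 2021, 310),
    ("Adams", 2022, 325),
    ("Brown", 2021, 150),
    ("Brown", 2022, 162),
]


def long_to_wide(rows):
    """Reshape to wide format: one entry per county, one column per year."""
    wide = defaultdict(dict)
    for county, year, value in rows:
        wide[county][year] = value
    return dict(wide)


print(long_to_wide(long_rows))
# {'Adams': {2021: 310, 2022: 325}, 'Brown': {2021: 150, 2022: 162}}
```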
The correlation coefficient will always take values ______. a. greater than 0 b. between -1 and 0 c. between -1 and +1 d. less than -1
C. between -1 and +1
The variance is based on the _______________. a. deviation about the median b. number of variables c. deviation about the mean d. correlation in the data
C. deviation about the mean
The correlation coefficient is a ______________. a. Measure of central tendency b. Measure of variability c. Measure of association d. None of the above
C. measure of association
Analyzing these data would require understanding the _________ of the data. a. variety b. veracity c. volume d. velocity
D. Velocity
The simplest measure of variability is the __________________. a. Variance b. standard deviation c. coefficient of variation d. range
D. range
Sorting
Data sorting is any process that involves arranging the data into some meaningful order to make it easier to understand, analyze or visualize.
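As a small illustration of sorting, the sketch below orders hypothetical sales records by revenue, highest first. Python's sort is stable even with `reverse=True`, so records with equal revenue keep their original order.

```python
# Hypothetical sales records: (region, revenue)
sales = [("East", 420), ("West", 515), ("North", 380), ("South", 515)]

# Sort by revenue, descending; ties (West, South) keep their input order.
by_revenue = sorted(sales, key=lambda row: row[1], reverse=True)
print(by_revenue)
```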
What's the difference between descriptive, predictive, and prescriptive analytics?
Descriptive: encompasses the set of techniques that describe what has happened in the past. Predictive: uses models calibrated on past data to predict the future or ascertain the impact of one variable on another. Prescriptive: indicates a best course of action to take.
conditional formatting
Highlights worksheet data by changing the look of cells that meet a specified condition
3 main categories of descriptive statistics
Measures of location: summarize a list of numbers by a "typical" value; the three most common are the mean, the median, and the mode. • Measures of variability: the most common are the range, the interquartile range (IQR), the variance, and the standard deviation. • Measures of association: coefficients used to quantify the relationship between two or more variables, such as the correlation coefficient.
Sample vs. Population
Population: entire group of people about which information is wanted (e.g. American adults). Sample: a part or subset of the population that is used to gain information about the whole population.
Variable
Quantitative and qualitative
Different types of variables • Quantitative
Quantitative variables are numerical. They represent a measurable quantity.
What are histograms used for?
Histograms allow the inspection of the data for its underlying distribution (e.g., normal distribution), outliers, skewness, etc.
filtering
Filtering refines a data set down to just what a user (or set of users) needs, excluding data that is repetitive, irrelevant, or sensitive. Different types of data filters can be used to amend reports, query results, or other kinds of information results.
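Filtering can be sketched as keeping only the rows that meet a condition and only the columns a user needs. The customer records, field names, and the placeholder "ssn" values (standing in for sensitive data) are all hypothetical.

```python
# Hypothetical customer records; "ssn" stands in for a sensitive field.
customers = [
    {"name": "Ava", "state": "OH", "spend": 1200, "ssn": "xxx-xx-0001"},
    {"name": "Ben", "state": "PA", "spend": 800, "ssn": "xxx-xx-0002"},
    {"name": "Cara", "state": "OH", "spend": 450, "ssn": "xxx-xx-0003"},
]

# Keep only Ohio customers (row filter) and drop the sensitive column.
ohio = [{"name": c["name"], "spend": c["spend"]} for c in customers if c["state"] == "OH"]
print(ohio)
```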
observation
Time series, cross-sectional, panel
What drives the increased interest in business analytics?
Tools of business analytics can aid decision making by: • Creating insights from data • Improving ability to more accurately forecast for planning
(TRUE or FALSE) Strategic decisions are those that affect the direction of the firm over a long time horizon while tactical decisions concern how the organization should achieve the goals and objectives set by strategy
True
4Vs.
• Volume: data at rest; terabytes to exabytes of existing data to process. • Velocity: data in motion; streaming data, milliseconds-to-seconds response. • Variety: data in many forms; structured, unstructured, text, multimedia. • Veracity: data in doubt; uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations.
What's a sample?
A subset of the population
What is a random variable?
A characteristic of a sample measured with some sort of error associated with it.
• Time series
Data collected over a series of time periods; time series forecasting uses these past data points to make a forecast.
Pivot tables
a program tool that allows you to reorganize and summarize selected columns and rows of data in a spreadsheet or database table to obtain a desired report.
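A pivot table that sums a value by two fields can be sketched in plain Python: one field becomes the rows, the other the columns, and the values are aggregated. The transactions, field names, and the choice of sum as the aggregation are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (region, product, units)
rows = [
    ("East", "Coffee", 10),
    ("East", "Tea", 4),
    ("West", "Coffee", 7),
    ("East", "Coffee", 5),
    ("West", "Tea", 9),
]


def pivot_sum(rows):
    """Regions as rows, products as columns, total units in each cell."""
    table = defaultdict(lambda: defaultdict(int))
    for region, product, units in rows:
        table[region][product] += units
    return {region: dict(cols) for region, cols in table.items()}


print(pivot_sum(rows))
```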
From a survey. Are the data categorical or quantitative? a. Your date of birth b. Yes or No: You participated in sports in high school. c. If b=Yes, have you won any competitions? d. If c=Yes, how many have you won?
a. quantitative b. categorical (Yes/No) c. categorical (Yes/No) d. quantitative
• Cross sectional
Observations that come from different individuals or groups at a single point in time.
For data having a bell-shaped distribution, approximately _____ percent of the data values will be within one standard deviation of the mean. a. 95 b. 66 c. 68 d. 97
c. 68
qualitative variable
Describes an individual by placing the individual into a category or group, such as male or female; also called a categorical variable.
(TRUE or FALSE) You would rather have your math score be in the 1st percentile than in the 99th percentile.
False (a 99th-percentile score means you scored higher than about 99% of test takers).
• Panel
A mix of time series and cross-sectional data: multiple individuals or groups observed over several time periods.
Dataset
the complete set of raw data, for all observational units and variables, in a survey or experiment
What is skewness?
the lack of symmetry of a distribution
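Skewness is often quantified with the adjusted sample skewness coefficient, the formula used by Excel's SKEW function: positive for a long right tail, negative for a long left tail, and 0 for symmetric data. The function name and data below are hypothetical.

```python
import statistics


def sample_skewness(data: list[float]) -> float:
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


print(sample_skewness([1, 2, 2, 3, 3, 3, 4, 10]))  # long right tail: positive
print(sample_skewness([1, 2, 3, 4, 5]))            # symmetric: 0 up to rounding
```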
Using the zscore to analyze distributions
• Measures the relative location of a value in the data set: $z = \dfrac{x - \bar{x}}{s}$, where $z$ = z-score for $x$, $\bar{x}$ = sample mean, and $s$ = sample standard deviation • "Empirical Rule" for data having a bell-shaped distribution: within ±1 standard deviation of the mean, ~68% of the data values; within ±2 standard deviations, ~95%; within ±3 standard deviations, virtually all of the data values • Z-scores are used to identify outliers
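The Empirical Rule can be checked by simulation: draw bell-shaped data, convert each value to a z-score with $z = (x - \bar{x})/s$, and count the share within ±1, ±2, and ±3. The normal data (mean 100, standard deviation 15), sample size, seed, and helper name are all hypothetical choices.

```python
import random
import statistics


def share_within(z_scores: list[float], k: float) -> float:
    """Fraction of values whose z-score lies within ±k."""
    return sum(abs(z) <= k for z in z_scores) / len(z_scores)


rng = random.Random(7)  # fixed seed so the run is reproducible
data = [rng.gauss(100, 15) for _ in range(20_000)]  # hypothetical bell-shaped data

x_bar = statistics.mean(data)
s = statistics.stdev(data)
z_scores = [(x - x_bar) / s for x in data]  # z = (x - x̄) / s

for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {share_within(z_scores, k):.3f}")
```

With this many draws the printed shares should land close to 0.68, 0.95, and nearly 1.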