BSIS STUDY GUIDE EXAM #1
bins
intervals created to show the variability in data; the goal is to use enough bins to show the variation in the data, but not so many that some bins contain only a few data items. Stick to between 5 and 20 bins.
Tactical Decisions
decisions about how things will get done based on strategy
empirical rule
determine the % of data values that are within a specified number of standard deviations of the mean
interquartile range
difference between the third and first quartiles; a useful measure of variation for data that have extreme values or are highly skewed
Predictive Analytics
extracts information from data and uses it to predict future trends and identify behavioral patterns
a z score greater than zero occurs for observations with a value ____ than the mean and z score less than zero occurs for observations with a value ______ than the mean
greater and less
z score
helps determine how far a particular value is from the mean relative to the data set's standard deviation
coefficient of variation
how large the standard deviation is relative to the mean (a high number means greater variability; a low number means the data are close together with less variability); standard deviation / mean
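As a rough illustration of the ratio above, the sketch below computes a coefficient of variation in Python. The helper name `coefficient_of_variation` is made up for this example, and the scores are borrowed from the standard deviation question later in this guide; the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Test scores borrowed from the standard deviation question in this guide.
scores = [80, 82, 83, 86, 89, 92, 95, 99]
cv = coefficient_of_variation(scores)
print(f"CV = {cv:.3f}")   # CV = 0.076
```

A value this small suggests the scores sit fairly close to their mean.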
Prescriptive Analytics
identify the best alternatives to minimize or maximize some objective
quantitative data
data are quantitative if they are numeric and arithmetic operations such as addition, subtraction, multiplication, and division can be performed on them
symmetric histogram
left tail mirrors the shape of the right tail
geometric mean
mean rate of change over time; used to determine the rate of change over several successive periods
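A minimal sketch of the idea, assuming three made-up growth factors (they are not from the course material). It relies on `statistics.geometric_mean`, which is available in Python 3.8 and later.

```python
import statistics

# Hypothetical growth factors for three periods: +10%, -5%, +20%.
growth_factors = [1.10, 0.95, 1.20]

# Geometric mean = (product of the factors) ** (1 / number of factors).
mean_factor = statistics.geometric_mean(growth_factors)
mean_rate = mean_factor - 1
print(f"mean growth per period: {mean_rate:.2%}")
```

Growing by `mean_factor` in each of the three periods gives the same overall change as the three original factors applied in turn.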
Outliers Z score
Values with a z score either less than -3 or greater than 3
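The rule above can be sketched as a small filter. The data set and the helper name `z_score_outliers` are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def z_score_outliers(values, cutoff=3):
    """Return the values whose z score is below -cutoff or above +cutoff."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)   # sample standard deviation (n - 1)
    return [x for x in values if abs((x - mean) / std_dev) > cutoff]

# Hypothetical data: nineteen values near 50 and one extreme value.
data = [50, 51, 49, 52, 48, 50, 51, 49, 50, 52,
        48, 51, 49, 50, 50, 51, 49, 50, 52, 95]
outliers = z_score_outliers(data)
print(outliers)   # [95]
```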
Problem Solving Process
1. Identify and define the problem 2. Determine the criteria that will be used to evaluate alternative solutions 3. Determine the set of alternative solutions 4. Evaluate the alternatives 5. Choose an alternative
three types of decision making
1. Strategic Decisions - Big why and Whats 2. Tactical Decisions - Big Hows for Strategy 3. Operational Decisions - Day to Day Hows
Approaches for making decisions
1. We have always done it this way 2. Gut feel 3. Rules of Thumb 4. Using relevant data
calculate location of percentile
1. arrange data smallest to largest 2. the smallest value is in position 1, the next smallest value in position 2, and so on 3. Lp = (P/100)(n + 1), where n is the total number of data values 4. review page 48
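The steps above can be sketched in Python as follows. The function name `percentile` and the seven-value sample are made up for illustration; interpolating between neighbouring positions when the location is not a whole number is an assumption that matches the quartile feedback elsewhere in this guide.

```python
def percentile(values, p):
    """Value at the p-th percentile using location Lp = (p / 100) * (n + 1)."""
    data = sorted(values)                 # step 1: smallest to largest
    n = len(data)
    location = (p / 100) * (n + 1)        # step 3: location of the percentile
    lower = int(location)                 # whole-number part of the location
    fraction = location - lower
    # Clamp so very low or very high percentiles stay inside the data.
    if lower < 1:
        return data[0]
    if lower >= n:
        return data[-1]
    # Positions are 1-based in the formula, 0-based in Python lists.
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])

sample = [15, 20, 25, 30, 35, 40, 45]     # hypothetical data, n = 7
print(percentile(sample, 50))   # location 4.0 -> 30.0
print(percentile(sample, 25))   # location 2.0 -> 20.0
```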
3 steps for a frequency distribution with quantitative data
1. determine the number of non overlapping bins 2. determine the width of each bin 3. determine the bin limits
List the quartiles and their percentages.
25%: first quartile; 50%: second quartile (also the median); 75%: third quartile
almost all of the data values will be within
3 standard deviations of the mean
Histogram
A graph of vertical bars representing the frequency distribution of a set of data.
Covariance
A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship
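A short sketch of the sign interpretation, using the sample covariance (dividing by n - 1). All three data lists are invented for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of paired deviation products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data for five students.
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 60, 61, 70, 77]   # tends to rise with hours studied
errors_made = [9, 7, 6, 4, 2]        # tends to fall with hours studied

positive = sample_covariance(hours_studied, test_scores)
negative = sample_covariance(hours_studied, errors_made)
print(positive > 0, negative < 0)    # True True
```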
Boxplot
A plot of data that incorporates the maximum observation, the minimum observation, the first quartile, the second quartile (median), and the third quartile. A box is drawn with its ends located at the first and third quartiles, and a vertical line inside the box marks the median. The interquartile range sets the limits for the whiskers, which extend to the smallest and largest data values inside those limits.
Median (even)
Add two numbers in the middle, divide by 2
cross sectional data
collected from several entities at the same or approximately the same point in time
scatter chart
Analyze the relationship between two variables
The test scores of 8 students are listed below. Find the standard deviation of the test scores. Scores: 80 82 83 86 89 92 95 99. Choices: 6.71, 8, 45.02, 88.25
Feedback: Correct. The standard deviation is found by summing the squared deviations from the mean, dividing by (n - 1), and taking the square root of the result.
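The calculation for this question can be checked step by step. This is only a sketch of the sample formula (dividing by n - 1), with the library call `statistics.stdev` used as a cross-check.

```python
import math
import statistics

scores = [80, 82, 83, 86, 89, 92, 95, 99]
n = len(scores)
mean = sum(scores) / n                                  # 88.25
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / (n - 1)            # sample variance
std_dev = math.sqrt(variance)                           # square root of the variance
print(round(std_dev, 2))   # 6.71

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(std_dev, statistics.stdev(scores))
```

Note that 88.25, one of the answer choices, is the mean rather than the standard deviation.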
time series data
collected over several time periods
Student grades are shown in the table below. The relative frequency for students who earned a D is 0.07. A:10 B:31 C:36 D:6 True False
Correct. The relative frequency of a bin equals the number of items belonging to that class divided by the total sample size. Relative frequency of a bin = frequency of the bin / n. The total is 10 + 31 + 36 + 6 = 83, so 6/83 = 0.07.
categorical data
Data that consists of names, labels, or other nonnumerical values( can't perform arithmetic operations)
According to the Empirical Rule, for data having a bell-shaped distribution approximately what percent of the data falls within 2 standard deviations of the mean? 68% 90% 95% 99.7%
Feedback: According to the Empirical Rule, for data having a bell-shaped distribution, approximately 95% of the data falls within 2 standard deviations of the mean.
Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. The bin size for the histogram is 4. True False
Feedback: Correct. Bin width = (largest data value - smallest data value)/number of bins; (32 - 13)/5 = 3.8, so we round up to 4.
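The same arithmetic as a small sketch. The helper name `bin_width` is made up here, and rounding up to the next whole number is an assumption that suits whole-number data such as days.

```python
import math

def bin_width(smallest, largest, number_of_bins):
    """Approximate bin width, rounded up to the next whole number."""
    return math.ceil((largest - smallest) / number_of_bins)

print(bin_width(13, 32, 5))   # 19 / 5 = 3.8, rounded up to 4
```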
The test scores of 32 students are listed below. Find the interquartile range. 24.5 32 42.5 67
Feedback: Correct. IQR = Q3 - Q1. The third quartile is calculated by finding the location of the 75th percentile: (75/100)(32 + 1) = 24.75. We need the number that is in position 24 plus ¾ of the difference between the 24th and 25th values. Q3 = 79 + 0.75(80 - 79) = 79.75. The first quartile is calculated by finding the location of the 25th percentile: (25/100)(32 + 1) = 8.25. We need the number that is in position 8 plus ¼ of the difference between the 8th and 9th values. Q1 = 55 + 0.25(56 - 55) = 55.25. IQR = 79.75 - 55.25 = 24.5.
The heights of women ages 18 to 24 have a mean of 64.5 inches and a standard deviation of 2 inches. Suppose you have a female friend that is 20 years old and is 66 inches tall. How many standard deviations does she fall above the mean height for women age 18 to 24? 0.75 1 1.5 1.75
Feedback: Correct. Recall that a z-score indicates how many standard deviations a given value falls away from the mean. This question is asking you to calculate a z-score. The woman's z-score = (value - mean)/standard deviation, so (66 - 64.5)/2 = 0.75.
The College Board reported that in 2014, the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution; use the empirical rule to find the percent of students who scored less than 494. 2.5% 5% 16% 95%
Feedback: Correct. The z-score for this situation is (494 - 686)/96 = -2. Recall that 95% of observations will fall within 2 standard deviations of mean. This means that 2.5% of observations will fall above 2 standard deviations and 2.5% of observations will fall below 2 standard deviations. This question asks for the percent of students who scored less than 494 (below 2 standard deviations).
The test scores of 32 students are listed below. The median is 69.5. 32 37 41 44 46 48 53 55 56 57 59 63 65 66 68 69 70 71 74 74 75 77 78 79 80 82 83 86 89 92 95 99 True False
Feedback: Correct. To solve, arrange all values in ascending order, noting the position of each value: the final position will be n, or the sample size. Calculate the location (or position) of the 50th percentile: (50/100)(32 + 1) = 16.5. The median is the 16th term plus 0.5 times the difference between the 16th and 17th terms. Median = 69 + 0.5(70 - 69) = 69.5.
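For a quick cross-check of the median and quartiles of these 32 scores, Python's `statistics.quantiles` can be used. Its `"exclusive"` method places percentiles at (p/100)(n + 1), which appears to match the location formula used in this guide; treat the match as an assumption worth verifying against the textbook.

```python
import statistics

scores = [32, 37, 41, 44, 46, 48, 53, 55, 56, 57, 59, 63, 65, 66, 68, 69,
          70, 71, 74, 74, 75, 77, 78, 79, 80, 82, 83, 86, 89, 92, 95, 99]

# n=4 asks for the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1
print(q1, median, q3, iqr)   # 55.25 69.5 79.75 24.5
```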
Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48. Convert Michelle's score to a z-score. -2 2 1.2 0.5
Feedback: Correct. z-score = (sample - mean)/standard deviation, so (48 - 70)/11 = -2.
Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. David has a score of 52 on Ms. Bond's test. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 6. Steven has a score of 52 on Ms. Nash's test. In this scenario, David has the higher standardized score. True False
Feedback: David's standardized score is (52 - 70)/11 = -1.64 and Steven's standardized score is (52 - 64)/6 = -2. David has the higher standardized score.
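The comparison can be reproduced with a one-line helper. The function name `z_score` is chosen for this sketch only.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

david = z_score(52, mean=70, std_dev=11)    # Ms. Bond's test
steven = z_score(52, mean=64, std_dev=6)    # Ms. Nash's test
print(round(david, 2), steven)              # -1.64 -2.0
print(david > steven)                       # True
```

Both scores are below their class means, but David's is closer to the mean in standard deviation units, so his standardized score is the higher one.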
Compute the relative frequencies for students who earned a B, as shown in the table of grades below. A:10 B:31 C:36 D:6 0.37 0.43 0.62 2.67
Feedback: Incorrect. The relative frequency of a bin equals the number of items belonging to that class divided by the overall sample size. There were 31 students who earned a B. We divide 31 by the total of 83 to find the relative frequency of students who earned a B. 31/83 = 0.37.
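A small sketch of the relative frequency calculation for the grade table in this question. The variable names are illustrative.

```python
grades = {"A": 10, "B": 31, "C": 36, "D": 6}

total = sum(grades.values())   # 83 students in all
relative = {grade: count / total for grade, count in grades.items()}

for grade, share in relative.items():
    print(grade, round(share, 2))
# A 0.12
# B 0.37
# C 0.43
# D 0.07
```

The relative frequencies add up to 1, which is a handy check on the arithmetic.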
A Forbes subscriber survey asked 52 questions about subscriber characteristics and interests. What type of data is provided by the following question: "How long have you been in your present job or position?" categorical quantitative time series cross-sectional data
Feedback: The question posed provides quantitative data. Quantitative data is data where numerical values are used to indicate magnitude, such as how many or how much. Arithmetic operations such as addition, subtraction, and multiplication can be performed on quantitative data.
Attached is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the relative frequency for audits that took 25 or more days? 0.25 0.2 0.4 0.45
Feedback: The relative frequency for audits that took 25 or more days is (1 + 3)/20 = 0.2. This is found by summing the frequencies for bins 25-28 and 29-32, then dividing by the total frequency.
The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming SAT scores follow a bell-shaped distribution; use the empirical rule to find the percent of students who scored more than 600. 2.5% 16% 50% 68%
Feedback: The z-score for this situation is (600 - 500)/100 = 1. Recall that 68% of observations will fall within 1 standard deviation of the mean. This means that 16% of observations will fall above 1 standard deviation and 16% of observations will fall below 1 standard deviation. This question asks for the percent of students who scored more than 600 (above 1 standard deviation). The correct answer is 16%.
The College Board originally scaled SAT scores so that the scores for each section were approximately normally distributed with a mean of 500 and a standard deviation of 100. Assuming SAT scores follow a bell-shaped distribution; use the empirical rule to find the percent of students who scored less than 700. 2.5% 16% 95% 97.5%
Feedback: The z-score for this situation is (700 - 500)/100 = 2. Recall that 95% of observations will fall within 2 standard deviations of the mean. This means that 2.5% of observations will fall above +2 standard deviations and 2.5% of observations will fall below -2 standard deviations. Since 2.5% of students score above 700 (above +2 standard deviations), 97.5% of students score less than 700 (below +2 standard deviations).
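The tail reasoning in the empirical rule questions can be sketched as below. The names `WITHIN`, `percent_below`, and `percent_above` are made up for this example, and the sketch only handles whole-number z-scores of ±1, ±2, or ±3 for bell-shaped data.

```python
# Empirical rule: percent of data within 1, 2, and 3 standard deviations of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def percent_below(z):
    """Approximate percent of observations below a z-score of ±1, ±2, or ±3."""
    tail = (100.0 - WITHIN[abs(z)]) / 2   # share in each tail beyond |z|
    return tail if z < 0 else 100.0 - tail

def percent_above(z):
    """Approximate percent of observations above a z-score of ±1, ±2, or ±3."""
    return 100.0 - percent_below(z)

print(percent_below(-2))   # 2.5   (score below 494 when mean = 686, sd = 96)
print(percent_above(1))    # 16.0  (score above 600 when mean = 500, sd = 100)
print(percent_below(2))    # 97.5  (score below 700 when mean = 500, sd = 100)
```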
Operations Decisions
concerned with running day to day operations
relative frequency distribution
frequency of the bin (number of times an item shows up in the set) / n (total number of items in the set)
covariance is less than 0
x and y variables are negatively related which means as x increases y generally decreases
covariance is near 0
x and y variables are not linearly related
strategic decision making
Managers develop overall strategies, goals, and objectives
CORRELATION COEFFICIENT U SHAPE
NO LINEAR RELATIONSHIPS
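A sketch of why a U-shaped pattern shows no linear relationship: with made-up points on the curve y = x², the correlation coefficient comes out as zero even though y depends entirely on x. The helper `correlation` is written out by hand for illustration.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient: covariance / (sd of x * sd of y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Hypothetical U-shaped data: y falls, then rises, as x increases.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r = correlation(xs, ys)
print(abs(r) < 1e-12)   # True: no linear relationship detected
```

A correlation near zero therefore rules out a linear relationship only; a scatter chart is still needed to spot curved patterns like this one.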
Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. The relative frequency of the bin 21-24 is 0.2. True False
Relative frequency of a bin = frequency of the bin/n. The frequency of the bin 21-24 is 5 and the total across all bins is (4 + 7 + 5 + 1 + 3) = 20. Therefore, the relative frequency of the bin 21-24 is 5/20 = 0.25, so the statement is false.
Percentile
Specific point in a distribution of data that has a given percentage of cases below it.
Below is the data for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the median number of days that it took Wyche Accounting to perform audits in the last quarter of last year? 19.5 20 20.5 31
The median is the value in the middle when the data are arranged in ascending order. Median = average of the middle two values if n is even. In this case the median is (19 + 20)/2 = 19.5.
skewed right distribution
The peak of the data is to the left side of the graph. There are only a few data points to the right side of the graph.
Skewed Left Distribution
The peak of the data is to the right side of the graph. There are only a few data points to the left side of the graph.
Four V's of Big Data
Volume, Velocity, Variety, Veracity
big data
a set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time
frequency distribution
a summary of data that shows the number (frequency) of observations in each of several non-overlapping classes, also known as bins. For categorical data, count how many times each item appears among the observations; that count is its frequency.
Highly Skewed Right Histogram
a very long tail to the right
bin limits
must be chosen so that each data item belongs to one and only one class; the lower bin limit is the smallest possible value assigned to a class and the upper bin limit is the largest value allowed in that class
a z score of 0 indicates
observation is equal to mean
68% empirical rule
of data values will be within 1 standard deviation of the mean
95% empirical rule
of the data values will be within 2 standard deviations of the mean
covariance greater than 0
positive relationship
mean
the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores: sum of all data values / number of data values
range
the difference between the highest and lowest scores in a distribution
Median odd
the middle score in a distribution; half the scores are above it and half are below it
mode
the most frequently occurring score(s) in a distribution
Descriptive Analytics
the use of data to understand past and current business performance and make informed decisions ( data queries, reports)
Business Analytics (BA)
uses data and statistical methods to gain insight into the data and provide decision makers with information they can act on
width of bins
the width should be the same for each bin: (largest data value - smallest data value) / number of bins
z score formula
z = (x - mean)/standard deviation, where x is the value of the observation