BUAL Exam 1
Secondary Sources
data collected by existing sources
Time Series Data
data collected over different time periods
Experimental and observational studies
data we collect ourselves for a specific purpose
Existing sources
data already gathered by public or private sources
Cross-Sectional Data
data collected at the same or approximately the same point in time
Descriptive Statistics
"describe", the science of describing the important aspects of a set of measurements
Calculating Class Length
(largest number - smallest number)/number of classes
z score equation
(x-mean)/standard deviation
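As a minimal sketch, the z-score formula above can be checked in Python. The function name `z_score` is an illustrative choice, not something from the course materials; the inputs are the IQ-score values used in the practice questions in this set.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# IQ test with mean 100 and standard deviation 15
print(round(z_score(92, 100, 15), 2))   # -0.53
print(round(z_score(118, 100, 15), 2))  # 1.2
```

A negative z-score means the value is below the mean; a positive one means it is above.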
Find the z-score for an IQ test score of 92 when the mean is 100 and the standard deviation is 15. .53 .77 −.77 −.53 −8.00
-.53
What are the differences between a histogram and a bar graph?
-bars on the histogram touch to represent continuous data while the bars do not touch on a bar graph -one axis of a bar graph is categories (qualitative) where the axis on the histogram is quantitative data grouped into classes -frequencies on a bar graph represent the counts from categories (qualitative data) while frequencies on histograms present counts of quantitative data values grouped into classes
Measures of Variation
-knowing the measures of central tendency is not enough -range, variance, standard deviation
The following is a partial relative frequency distribution of grades in an introductory statistics course. grade and relative frequency: A (.22) B (?) C (.18) D (.17) F (.06) Find the relative frequency for the B grade.
.37
Example of an interval variable whose zero is not a true zero
temperature: 0 degrees means it is cold, not that there is no heat
Find the z-score for an IQ test score of 118 when the mean is 100 and the standard deviation is 15. 1.2 1.0 18.0 −1.03 −1.2
1.2
In a statistics class, 10 scores were randomly selected with the following results (mean = 71.5): 74, 73, 77, 77, 71, 68, 65, 77, 67, 66. What is the range? 22.72 12.00 4.77 516.20 144.00
12.00
The following is a relative frequency distribution of grades in an introductory statistics course. Grade and Relative Frequency: A (.22) B (.37) C (.18) D (.17) F (.06) If we wish to depict these data using a pie chart, find how many degrees (out of 360 degrees) should be assigned to grade B 133.2 degrees 37 degrees 79.2 degrees 140 degrees
133.2 degrees
Which percentile describes the first quartile, Q1? 25th 50th 75th 100th
25th
A normal population has 99.73 percent of the population measurements within ________ standard deviation(s) of the mean. 1 2 3 4
3
The following is a relative frequency distribution of grades in an introductory statistics course. Grade and Relative Frequency: A (.22) B (.37) C (.18) D (.17) F (.06) If this was the distribution of 200 students, give the frequency distribution for grade A. 44 22 200 22
44
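The grade-distribution cards above all rest on the same arithmetic: relative frequencies sum to 1, a pie slice gets relative frequency times 360 degrees, and a class frequency is relative frequency times the number of observations. A small sketch, with variable names chosen only for illustration:

```python
# Known relative frequencies; B is the missing one.
rel_freq = {"A": 0.22, "C": 0.18, "D": 0.17, "F": 0.06}

# Relative frequencies must sum to 1, so B is whatever is left over.
rel_freq["B"] = round(1 - sum(rel_freq.values()), 2)

# Pie chart: each slice gets its share of the 360 degrees.
degrees = {grade: round(rf * 360, 1) for grade, rf in rel_freq.items()}

# Frequency distribution for a class of 200 students.
counts = {grade: round(rf * 200) for grade, rf in rel_freq.items()}

print(rel_freq["B"])  # 0.37
print(degrees["B"])   # 133.2
print(counts["A"])    # 44
```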
The number of weekly sales calls by a sample of 25 pharmaceutical salespersons is below. 24, 56, 43, 35, 37, 27, 29, 44, 34, 28, 33, 28, 46, 31, 38, 41, 48, 38, 27, 29, 37, 33, 31, 40, 50. How many classes should be used in the construction of a histogram? 4 6 10 5 2
5
In a statistics class, the following 10 scores were randomly selected: 74, 73, 77, 77, 71, 68, 65, 77, 67, 66. What is the mean? 71.5 72.0 77.0 71.0
71.5
In a statistics class, the following 10 scores were randomly selected: 74, 73, 77, 77, 71, 68, 65, 77, 67, 66. What is the median? 71.5 72.0 77.0 71.0
72
Which percentile describes the third quartile, Q3? 25th 50th 75th 100th
75th
In a statistics class, the following 10 scores were randomly selected: 74, 73, 77, 77, 71, 68, 65, 77, 67, 66. What is the mode? 71.5 72.0 77.0 71.0
77
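The mean, median, mode, and range answers for the 10 scores can be verified with Python's standard `statistics` module. This is only a checking sketch; the variable names are illustrative.

```python
import statistics

scores = [74, 73, 77, 77, 71, 68, 65, 77, 67, 66]

mean = statistics.mean(scores)         # sum of values / number of values
median = statistics.median(scores)     # average of the two middle values (n is even)
mode = statistics.mode(scores)         # most frequent value
value_range = max(scores) - min(scores)  # largest minus smallest

print(mean, median, mode, value_range)  # 71.5 72.0 77 12
```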
If there are 130 values in a data set, how many classes should be created for a frequency histogram? 4 5 6 7 8
8
Which of the following is a type of question used in survey research? dichotomous open-ended multiple-choice All of the other answers are correct
All of the other answers are correct
Which of the following is a type of question used in survey research? open-ended All of the other answers are correct. multiple-choice dichotomous
All of the other answers are correct.
In ________ we select elements because they are easy to sample? random sampling probability sampling convenience sampling judgment sampling
Convenience sampling
A(n) ________ is a graphical presentation of the current status and historical trends of a business's key performance indicators. frequency distribution histogram Pareto chart Dashboard
Dashboard
Which of the following are quantitative variables? Nominative Ordinal Interval Ratio
Interval and ratio
Describing central tendency
a measure of central tendency represents the center or middle of the data
_______ uses traditional or newer graphics to present visual summaries of business information. Nonparametric predictive analytic Parametric predictive analytics Prescriptive analytics Graphical descriptive analytics
Graphical descriptive analytics
Which variables are qualitative? Nominative Ordinal Interval Ratio
Nominative and Ordinal
Primary sources
data collected by an individual or business directly through planned experimentation
Notation for population size and sample size
Population: N Sample: n
_______ sampling is where we know the chance that each element will be included in the sample, which allows us to make statistical inferences about the sample population. Convenience Voluntary Probability Judgment
Probability
________ sampling is where we know the chance that each element will be included in the sample, which allows us to make statistical inferences about the sample population. Voluntary Convenience Probability Judgment
Probability
Equation for variance
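σ² = Σ(x − μ)² / N, where μ is the population mean and N is the population size (the variance is the standard deviation squared)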
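A short sketch of the variance calculation, using the 10 statistics-class scores from the practice questions. The card gives the population form (divide by N); the sample form, which divides by n − 1, is included here as a standard textbook variant and is an addition to what the card states. Function names are illustrative.

```python
from math import sqrt

scores = [74, 73, 77, 77, 71, 68, 65, 77, 67, 66]


def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Same sum of squared deviations, divided by n - 1 instead of N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


print(round(population_variance(scores), 2))    # 20.45
print(round(sample_variance(scores), 2))        # 22.72
print(round(sqrt(sample_variance(scores)), 2))  # 4.77 (sample standard deviation)
```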
Examples of descriptive statistics
Salaries: high, low, mean, median, graph
________ is the difference between a numerical descriptor of the population and the corresponding descriptor of the sample. Nonresponse Sampling error Observation error Non observation error
Sampling error
You want to select a simple random sample of 100 employees of Company X. You assign a number to every employee in the company database from 1 to 1000 and use a random number generator to select 100 numbers. Simple random sample Cluster random sample Stratified random sample Analytics random sample
Simple random sample
The company has 800 female employees and 200 male employees. You want to ensure that the sample reflects the gender balance of the company, so you sort the population into two strata based on gender. Then you use random sampling on each group, selecting 80 women and 20 men, which gives you a representative sample of 100 people. What type of sampling is this?
Stratified sampling
All employees of the company are listed in alphabetical order. From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards, every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people Cluster random sample Stratified random sample Analytics random sample Systematic sampling
Systematic sampling
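The three Company X scenarios above can be sketched with Python's `random` module. The employee IDs and the assumption that IDs 1 to 800 are the women and 801 to 1000 are the men are hypothetical, made up only so the example runs; the fixed seed is there just to make the sketch repeatable.

```python
import random

rng = random.Random(42)           # fixed seed, only for repeatability
employees = list(range(1, 1001))  # hypothetical employee IDs 1..1000

# Simple random sample: every employee has the same chance of selection.
simple = rng.sample(employees, 100)

# Stratified random sample: split into non-overlapping strata,
# then draw a random sample from each stratum (80 women, 20 men).
women, men = employees[:800], employees[800:]  # hypothetical split by ID
stratified = rng.sample(women, 80) + rng.sample(men, 20)

# Systematic sample: random starting point among the first 10,
# then every 10th element on the list.
start = rng.randint(1, 10)
systematic = employees[start - 1::10]

print(len(simple), len(stratified), len(systematic))  # 100 100 100
```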
Skewed to the right
The right tail of the histogram is longer than the left tail
Ogive
a graph of a cumulative distribution -plot a point above each upper class boundary at a height equal to the cumulative frequency -connect the points with line segments -can also be drawn using cumulative relative frequencies or cumulative percent frequencies
Examples of factors
Will you get accepted into Auburn? the things that are taken into consideration are GPA, HS, rank, test scores, extracurricular activities
According to a survey of the top 10 employers in a major city in the Midwest, a worker spends an average of 413 minutes a day on the job. Suppose the standard deviation is 26.8 minutes, and the time spent is approximately a normal distribution. What are the times within which approximately 68.26 percent of all workers will fall? [394.8, 431.2] [386.2, 439.8] [372.8, 453.2] [359.4, 466.6] [332.6, 493.4]
[386.2, 439.8]
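The answer above is just the mean plus or minus one standard deviation. A small sketch that produces the one-, two-, and three-standard-deviation intervals for the same numbers (the function name is illustrative, and results are rounded to one decimal place for display):

```python
def empirical_rule_intervals(mean, std_dev):
    """Intervals mean +/- k standard deviations for k = 1, 2, 3, rounded to 1 decimal."""
    return {
        k: (round(mean - k * std_dev, 1), round(mean + k * std_dev, 1))
        for k in (1, 2, 3)
    }


intervals = empirical_rule_intervals(413, 26.8)
print(intervals[1])  # (386.2, 439.8)  about 68.26% of workers
print(intervals[2])  # (359.4, 466.6)  about 95.44%
print(intervals[3])  # (332.6, 493.4)  about 99.73%
```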
Example of judgment sampling
a class and only looking at student doctors
frequency distribution
a list of data classes with the count of values that belong to each class "classify and count" table
population parameter
a number calculated from all the population measurements that describes some aspect of the population -standard deviation -variance -population mean
Sample statistic
a number calculated using the sample measurements that describes some aspect of the sample
Finite population
a population of limited size
Data Warehousing
a process of centralized data management and retrieval (its objective is the creation and maintenance of a central repository for all of an organization's data)
Ordinal Variables
a qualitative variable for which there is meaningful ordering, or ranking, of the categories
Nominative Variables
a qualitative variable for which there is no meaningful ordering, or ranking, of the categories
Process
a sequence of operations that takes inputs and turns them into outputs
Population
a set of all elements about which we wish to draw conclusions, it is everything of a group
Sample
a subset of the elements of a population
Contingency table
a table consisting of rows and columns that is used to classify data on two dimensions
measurement
a way to assign a value to an element
How to calculate mean
add up all the values, then divide by the number of values you added
Example of population
all netflix customers, all amazon customers, all mastercard customers
Interval Variable
all of the characteristics of ordinal plus -measurements are on a numerical scale with an arbitrary zero point -the zero is assigned: it is nonphysical and not meaningful -zero does not mean the absence of the quantity that we are trying to measure
Ratio Variable
all the characteristics of interval plus -measurements are on a numerical scale with a meaningful zero point (zero means none or nothing) -values can be compared by their intervals and ratios -in business and finance, most quantitative variables are ratio variables, such as anything related to money
Multiple choice questions
allow more than two responses, usually analyzed with averages
Census
an examination of all of the population measurements, 100%
A measurement located outside the upper limits of a box-and-whiskers display is ________. always in the first quartile an outlier always the largest value in the dataset within the lower limits
an outlier
Example of sample
analytics students in the business school
variable
any characteristic of an element
Variance
average of the squared deviations of the individual measurements from the mean
Example of primary sources
banks, credit cards, etc
The general term for a graphical display of categorical data made up of vertical or horizontal bars is called a(n) ________. pie chart Pareto chart bar chart ogive plot
bar chart
Which of the following graphs is for qualitative data? histogram bar chart ogive plot stem-and-leaf
bar chart
Pareto Chart
bar chart having different kinds of defects listed on the horizontal scale -bar height represents frequency -arranged in decreasing height from left to right
A ________ displays the frequency of each class with qualitative data and a ________ displays the frequency of each class with quantitative data. histogram, stem-and-leaf display bar chart, histogram scatter plot, bar chart stem-and-leaf, pie chart
bar chart, histogram
Qualitative data is typically in the form of a
bar graph, pie chart, or pareto chart
What kinds of visualizations are associated with qualitative data?
bar graphs, pie charts
Improper sampling
biased, convenience, voluntary, and judgment
Transactional data are now used by businesses as part of experimental studies survey analysis big data descriptive statistics
big data
Two types of modes
bimodal, multimodal
As a business owner, I have requested my staff to develop a set of dashboards that can be used by the public to show wait time at each of my four local coffee shops at peak times during the day and whether the time is short, medium, or long. Which of the following graphical displays would be the best choice? bullet graph sparkline treemap gauges
bullet graph
Web surveys
cheaper still, same problems as mail surveys
Pie chart
circle divided into slices where the size of each slice represents its relative frequency and or percent frequency
Dichotomous questions
clearly stated, easy to answer, easy to analyze, limited information
A quantity that measures the variation of a population or a sample relative to its mean is called the ________. range standard deviation coefficient of variation variance
coefficient of variation
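The card does not give the formula. Assuming the usual textbook definition, coefficient of variation = (standard deviation / mean) × 100, a quick sketch using the 10 statistics-class scores from earlier in this set:

```python
import statistics

scores = [74, 73, 77, 77, 71, 68, 65, 77, 67, 66]

# Sample standard deviation expressed as a percentage of the mean.
cv = statistics.stdev(scores) / statistics.mean(scores) * 100

print(f"{cv:.2f}%")
```

Because it is a percentage of the mean, the coefficient of variation lets you compare the relative spread of data sets that have different means or units.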
A quality control worker at a factory selects the first 10 items she sees as her sample for the day. What is this an example of?
convenience sampling
A restaurant leaves comment cards on all of its tables and encourages customers to participate in a brief survey to learn about their overall experience. What is this an example of?
voluntary response sampling (customers self-select whether to fill out a card)
In ________ we select elements because they are easy to sample. random sampling convenience sampling judgment sampling probability sampling
convenience sampling
Which of the following is a measure of the strength of the linear relationship between x and y that is dependent on the units in which x and y are measured? covariance correlation coefficient slope least squares line
covariance
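To see why covariance depends on the units while the correlation coefficient does not, here is a sketch with made-up data (the spend and sales figures are hypothetical). Converting x from dollars to cents multiplies the covariance by 100 but leaves the correlation unchanged. The helper functions use the standard sample formulas.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: advertising spend and units sold
spend_dollars = [100, 200, 300, 400, 500]
sales = [12, 19, 33, 38, 52]
spend_cents = [100 * d for d in spend_dollars]  # same spend, different units

print(covariance(spend_dollars, sales), covariance(spend_cents, sales))
print(correlation(spend_dollars, sales), correlation(spend_cents, sales))
```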
A Yes or No question is ________. systematic evaluative dichotomous open-ended
dichotomous
A stem-and-leaf display is best used to ________. provide a point estimate of the variability of the data set provide a point estimate of the central tendency of the data set display the shape of the distribution display a two-variable treemap.
display the shape of the distribution
Multistage cluster sampling
divide population into clusters and then randomly select clusters to sample
Stratified random sample
divide the population into non-overlapping groups (strata), then select a random sample from each stratum
Examples of quantitative measurements
dollar amount, miles, gallons, feet, percentages, etc, selling price of a home, payment of bill, how many apples did you buy
Random Sample
equal chance of getting selected
Example of big data
facebook: pictures, videos, text, like and dislike
Which of the following is not a supervised learning technique in predictive analytics? linear regression factor analysis decision trees neural networks
factor analysis
Definition of data
facts and figures from which conclusions can be drawn
How to calculate mode
find the number that appears the most
A population that consists of all the customers who will use the drive-thru of the local fast food restaurant is called a(n) ________. infinite population random sample population statistical population
finite population (because you can count how many customers came to the drive-thru)
Examples of nominative variables
gender, car color
Stem and leaf display
graphical portrayal of a data set that shows the data set distributions by using stems consisting of leading digits and leaves consisting of trailing digits
Histogram
graphically displays a frequency distribution, relative frequency distribution, or percent frequency distribution. It divides the measurements into classes and graphs the frequency, relative frequency, or percent frequency for each class
Which of the following divides quantitative measurements into classes and graphs the frequency, relative frequency, or percentage frequency for each class? histogram dot plot stem-and-leaf display scatter plot
histogram
What kinds of visualizations are associated with quantitative data?
histograms, dot plots, line graph, scatterplots
The empirical rule for normal populations
if a population has a mean and standard deviation and is described by a normal curve, then -68.26% of the population measurements lie within one standard deviation of the mean -95.44% lie within two standard deviations of the mean -99.73% lie within three standard deviations of the mean
Multimodal
if there are more than two modes
Bimodal
if there are two modes
As the coefficient of variation ________, risk ________. increases, decreases decreases, increases increases, increases remains constant, increases
increases, increases
Phone surveys
inexpensive, low response rate
Mail surveys
inexpensive, low response rates (20-30 percent) Requires multiple mailings
Population mean
is the average of the population measurements
Range
largest measurement minus the smallest measurement
Sample frame
list from which the sample was selected
Systematic sampling
list the population, select a random starting point, then sample every "nth" element
Prescriptive analytics
looks at variables and constraints, along with predictions from predictive analytics, to recommend courses of action
Big Data
massive amount of data, often collected in real time in different forms, sometimes needing quick analysis
examples of measures of central tendency
mean, median, mode
If a population distribution is skewed to the right, then, given a random sample from that population, one would expect that the ________. median would be greater than the mean mode would be equal to the mean median would be less than the mean median would be equal to the mean
median would be less than the mean
Sampling designs
methods for obtaining a sample
Predictive analytics
methods used to find anomalies, patterns, and associations in data sets to predict future outcomes
Which of the following is a quantitative variable? a person's gender the manufacturer of a cell phone mileage of a car whether a person is a college graduate
mileage of a car
Which of the following is a quantitative variable? the manufacturer of a cell phone a person's gender mileage of a car whether a person is a college graduate whether a person has a charge account
mileage of a car
Personal interviews
more expensive, more control, and higher response rate
Open-ended questions
give the most honest and complete information, but cannot be readily summarized
Number of classes equation
n = number of measurements in the data set; the number of classes k is the smallest whole number such that 2^k > n
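A sketch of the 2^k rule together with the class length formula from earlier in this set, applied to the weekly sales calls data and to the 130-value question. Function names are illustrative; whether to round the class length up to a convenient value is left to your textbook's convention.

```python
def number_of_classes(n: int) -> int:
    """Smallest whole number k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_length(values, k: int) -> float:
    """(largest measurement - smallest measurement) / number of classes."""
    return (max(values) - min(values)) / k


sales_calls = [24, 56, 43, 35, 37, 27, 29, 44, 34, 28, 33, 28, 46,
               31, 38, 41, 48, 38, 27, 29, 37, 33, 31, 40, 50]

k = number_of_classes(len(sales_calls))
print(k)                             # 5, since 2**5 = 32 > 25
print(class_length(sales_calls, k))  # 6.4
print(number_of_classes(130))        # 8, since 2**7 = 128 is not > 130
```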
When developing a frequency distribution, the class (group) intervals must be ________. large small integer nonoverlapping
nonoverlapping
Examples of infinite population
numbers of stars in the sky, number of red blood cells in human body
Observational study
observes individuals and measures variables of interest but does not attempt to influence the responses
A(n) ________ is a graph of a cumulative distribution. histogram scatter plot ogive pie chart
ogive
An identification of police officers by rank would represent a(n) ________ level of measurement. nominative ordinal interval ratio
ordinal
Factors
other variables related to the response variable
A(n) ________ can be used to differentiate the "vital few" causes of quality problems from the "trivial many" causes of quality problems. histogram scatter plot pareto chart ogive plot stem-and-leaf display
pareto chart
Examples of qualitative measurements
phone number, zip code, social security number, address
Types of surveys
phone, mail, web, personal interviews
All of the following are used to describe quantitative data except the ________. histogram stem-and-leaf chart dot plot pie chart
pie chart
Data that are collected by an individual through personally planned experimentation or observation are ________. secondary data quantitative data primary data variables
primary data
Data that are collected by an individual through personally planned experimentation or observation are ________. variables secondary data primary data quantitative data
primary data
One method of being sure a sample being studied can be used to make statistical inferences about the population is to select a convenience sample. voluntary response sample. judgment sample probability sample.
probability sample
A sequence of operations that takes inputs and turns them into outputs is a ________. statistical inference random sampling process runs plot
process
Cross tabulation
process that classifies data into two dimensions
Dashboard
provides a graphical presentation of the current status and historical trends of key performance indicators
How to calculate median
put all the numbers in numerical order, find the middle number
what are two types of measurements?
qualitative and quantitative
All of the following are measures of central tendency except the ________. range mode mean median
range
Cumulative Distribution
rather than the count for each class alone, we record the number of measurements that are less than the upper boundary of that class (a "running total")
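A sketch of the "running total" idea using the weekly sales calls data from this set. The class boundaries are an assumption made for illustration: five classes of length 7 starting at 24 (24 to 30, 31 to 37, and so on). The cumulative frequencies are the heights you would plot on an ogive.

```python
from itertools import accumulate

sales_calls = [24, 56, 43, 35, 37, 27, 29, 44, 34, 28, 33, 28, 46,
               31, 38, 41, 48, 38, 27, 29, 37, 33, 31, 40, 50]

# Assumed classes: 24-30, 31-37, 38-44, 45-51, 52-58
lower, length, num_classes = 24, 7, 5

# Frequency distribution: "classify and count"
frequencies = [0] * num_classes
for x in sales_calls:
    frequencies[(x - lower) // length] += 1

# Cumulative distribution: running total of the class frequencies
cumulative = list(accumulate(frequencies))

print(frequencies)  # [7, 8, 6, 3, 1]
print(cumulative)   # [7, 15, 21, 24, 25]
```

The last cumulative frequency always equals the total number of measurements, here 25.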
The ________ is the positive square root of the sample variance. sample mean sample standard deviation range median
sample standard deviation
Judgment sampling
samples in which a person who is extremely knowledgeable about the population selects the population elements he or she feels are most representative
Voluntary response sampling
samples in which participants self select -frequently used by radio and television -over represent people with strong opinions
Probability Sampling
sampling where we know the chance that each element in the population will be included in the sample -required for statistical inference -random sample
Convenience sampling
sampling where we select elements because they are convenient to sample -easy and convenient -not a probability sample
A ________ shows the relationship between two variables. stem-and-leaf bar chart histogram scatter plot pie chart
scatter plot
Which of the following graphical tools is not used to study the shapes of distributions? stem-and-leaf display scatter plot histogram Bar graph
scatter plot
If the mean is greater than the median, then the relative frequency curve is most likely to be ________. skewed right skewed left symmetrical bimodal
skewed right
mode<median<mean
skewed to the right
mode>median>mean
skewed to the left
Mean=30.25 Median=31 Mode=32 How is this skewed?
skewed to the left mean<median<mode
A relative frequency histogram having a longer tail to the right than to the left is said to be ________. skewed to the left normal a scatter plot skewed to the right
skewed to the right
The number of weekly sales calls by a sample of 25 pharmaceutical salespersons is below. 24, 56, 43, 35, 37, 27, 29, 44, 34, 28, 33, 28, 46, 31, 38, 41, 48, 38, 27, 29, 37, 33, 31, 40, 50 What is the shape of the distribution of the data? skewed to the right skewed to the left normal bimodal
skewed to the right
In the least squares line, ________ is defined as rise/run. correlation coefficient predicted value of y y-intercept slope
slope
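A minimal sketch of how the least squares slope can be computed, assuming the standard formula (sum of cross-deviations divided by the sum of squared x deviations). The data are made up so that the points lie exactly on y = 2x + 1, which makes the rise/run interpretation easy to check.

```python
def least_squares_slope(xs, ys):
    """Slope b1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical points on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope = least_squares_slope(xs, ys)
print(slope)  # 2.0: y rises by 2 for each 1-unit run in x
```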
Nonresponse
some of the individuals who were supposed to be included in the sample are not
standard deviation equation
square root of the variance
Linear scatterplots
straight line relationship between two variables
Nonoverlapping groups of similar elements in a population are called clusters. frames strata. stages.
strata
Alternatives to random sampling
stratified random sample, multistage cluster sampling, systematic sampling
Example of a finite population
students in a class, number of cars in parking lot, number of births per year
Relative frequency
summarizes proportion of items in each class
Frequency distribution table
summarizes the number of items in each of the several non overlapping classes
Mean=median=mode
symmetrical distribution
Examples of ordinal variables
teaching effectiveness
Example of response variable
the "y" in the example y=5x+6x
Data set
the data that are collected for a particular study
Sampling error
the difference between a numerical descriptor of the population and the corresponding descriptor of the sample
Target population
the entire population of interest
Examples of existing sources
the internet, library, us government, data collection agency
Skewed to the left
the left tail of the histogram is longer than the right tail
Qualitative Measurement
the possible measurements fall into several categories; they are descriptive, cannot be meaningfully counted or used in calculations, and can be numerical or non-numerical labels
quantitative measurement
the possible measurements of values of a variable are numbers that represent quantities
Experimental Study
the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables
Symmetrical
the right and left tails of the histogram appear to be mirror images of each other
Sample survey
the sample we take
statistical inference
the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements, "drawing conclusions"
Composite score
the total number of scores added up to get another given number
Data mining
the use of predictive analytics, algorithms, and IS techniques to extract useful knowledge from huge amounts of data
Descriptive analytics
the use of traditional and newer graphics to present easy-to-understand visual summaries of up-to-the-minute data
Business Analytics
the use of traditional and newly developed statistical methods, advances in IS, and techniques from management science to explore and investigate past performance
No linear relationship
there is no coordinated linear movement between the two variables
The purpose of stem and leaf display
to see the overall pattern of the data by grouping it into classes -best for small to moderately sized data distributions
Scatterplots
used to study the relationship between two variables -place one variable on the x axis -place a second variable on the y axis -place a dot at each pair of coordinates
Response variable
variable of interest, it is the dependent variable
Bar chart
vertical or horizontal rectangle represents frequency of each category
Supervised learning
we observe values of a response variable and corresponding predictor variables 1. linear regression 2. logistic regression 3. neural networks 4. decision trees
Unsupervised learning
we observe values of variables but not a response variable 1. cluster analysis 2. factor analysis 3. association rules
Errors of observation
when data values are recorded incorrectly
Recording error
when either the respondent or interviewer incorrectly marks an answer
Negative linear scatterplots
when one variable goes up the other variable goes down
Positive linear scatterplots
when one variable goes up, the other variable goes up
Response bias
when respondents do not tell the truth (also occurs when biased questions are used)
Undercoverage
when some population elements are excluded from the process of selecting the sample
Selection bias
when the opinions of those who complete a survey vary dramatically from those who do not