Stat Exam Prep
Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. Michelle has a score of 48. Convert Michelle's score to a z-score. (Round to two decimal places if necessary.)
-2
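A minimal Python sketch of the z-score formula, z = (x - mean) / SD, using the numbers from this card. The function name z_score is illustrative and not from the source.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

michelle_z = z_score(48, mean=70, sd=11)
print(round(michelle_z, 2))  # -2.0
```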
If P(A)=.80, P(B)=.65 and P(AUB)=.78, then P(A|B)=
.9750
The starting salaries of individuals with an MBA degree are normally distributed with a mean of $40,000 and a standard deviation of $5,000. What percentage of MBA's will have starting salaries of $34,000 to $46,000?
0.7698 (about 77%)
Forty percent of all registered voters in a national election are female. A random sample of 5 voters is selected. What is the probability that there are no females in the sample?
0.0778
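This card can be checked with the binomial formula using n = 5, p = 0.40 and k = 0. The sketch assumes voters are sampled independently; binomial_pmf is a hypothetical helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# "Success" here means drawing a female voter; we want zero successes in five draws.
p_no_females = binomial_pmf(0, n=5, p=0.40)
print(round(p_no_females, 4))  # 0.0778
```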
The random variable x is known to be uniformly distributed between 70 and 90. The probability of x having a value between 80 and 95 is
0.5
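The interval 80 to 95 extends past the upper bound of 90, so only its overlap with [70, 90] carries probability. A small sketch, with uniform_prob as an illustrative helper name:

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b], clipping the query to the support."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)

prob = uniform_prob(80, 95, a=70, b=90)
print(prob)  # 0.5
```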
Suppose we had a data set of a call center where customers were asked to choose between the following three options: hear account information, billing questions, and customer service. Using the given order of the three options, using 0-1 dummy variables to encode the categorical variables, which of the following combinations would yield an entry "customer service"?
001
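A short sketch of 0-1 dummy coding for the three options, in the order given in the card. The function name one_hot is an assumption made for illustration.

```python
# Options in the order stated in the question.
OPTIONS = ["hear account information", "billing questions", "customer service"]

def one_hot(choice):
    """Return one 0/1 digit per option, with a 1 in the position of the chosen option."""
    if choice not in OPTIONS:
        raise ValueError(f"unknown option: {choice}")
    return "".join("1" if option == choice else "0" for option in OPTIONS)

print(one_hot("customer service"))  # 001
```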
The strength of a cluster can be measured by comparing the average distance in a cluster to the distance between cluster centroids. One rule of thumb is that the ratio for between-cluster distance to within-cluster distance should exceed what value for useful clusters?
1
What is the total area under the normal distribution curve?
1
Compute the geometric mean for the following data on growth factors of an investment for 10 years. 1.10, .50, .70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22
1.0148
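The geometric mean is the n-th root of the product of the n growth factors. A minimal sketch with the ten factors from this card:

```python
from math import prod

growth_factors = [1.10, 0.50, 0.70, 1.21, 1.25, 1.12, 1.16, 1.11, 1.13, 1.22]

# n-th root of the product of the n growth factors
geo_mean = prod(growth_factors) ** (1 / len(growth_factors))
print(round(geo_mean, 4))
```

A value slightly above 1 means the investment grew a little on average per year, despite the two years with factors of .50 and .70.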
A sample of 2,500 people was asked how many cups of coffee they drink in the morning. You are given the following information: 0=700, 1=900, 2=600, 3=300. The expected number of cups of coffee is
1.2
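The expected value here is the frequency-weighted average of the number of cups. A sketch using the counts from the card:

```python
# cups of coffee -> number of respondents
counts = {0: 700, 1: 900, 2: 600, 3: 300}

total_people = sum(counts.values())  # 2,500
total_cups = sum(cups * people for cups, people in counts.items())
expected_cups = total_cups / total_people
print(expected_cups)  # 1.2
```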
A survey of 100 random high school students finds that 85 students watched the Super Bowl, 25 students watched the Stanley Cup Finals, and 20 students watched both games. How many students did not watch either?
10
The College Board reported that the mean Math Level 2 SAT subject test score was 686 with a standard deviation of 96. Assuming scores follow a bell-shaped distribution, use the empirical rule to find the percentage of students who scored less than 494.
2.5%
The manager of a grocery store has selected a random sample of 100 customers. The average length of time it took these 100 customers to check out was 3.0 minutes. It is known that the SD of the checkout time is one minute. The 95% confidence interval for the average checkout time for all customers is ________.
2.804 to 3.196
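Because the population SD is known, the interval is the sample mean plus or minus z * SD / sqrt(n). A sketch using Python's statistics.NormalDist for the z value; z_interval is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(3.0, sigma=1.0, n=100)
print(round(low, 3), round(high, 3))  # 2.804 3.196
```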
In order to determine an interval for the mean of a population with unknown SD, a sample of 24 items is selected. The mean of the sample is determined to be 23. The number of degrees of freedom for reading the t-value is
23
Manhattan distance is the distance traveled as if traveled along rectangular city blocks. The Manhattan distance for the standardized observations of (-1.85, 0.65) and (0.55, -0.75) is
3.80
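Manhattan distance adds up the absolute differences in each coordinate. A minimal sketch for the two standardized observations in this card:

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences between two observations."""
    return sum(abs(a - b) for a, b in zip(u, v))

d = manhattan((-1.85, 0.65), (0.55, -0.75))  # |-2.40| + |1.40|
print(round(d, 2))  # 3.8
```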
A researcher has collected the following sample data. The mean of the sample is 5. 3,5,12,3,2. What is the SD?
4.062
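The sample SD divides the sum of squared deviations by n - 1 before taking the square root. The sketch below uses Python's statistics module on the five values and also derives the variance and coefficient of variation, which other cards in this set ask about for the same data.

```python
from statistics import mean, stdev, variance

data = [3, 5, 12, 3, 2]

sample_mean = mean(data)     # 5
sample_var = variance(data)  # sum of squared deviations / (n - 1) = 66 / 4
sample_sd = stdev(data)      # square root of the sample variance
cv_percent = sample_sd / sample_mean * 100

print(sample_var, round(sample_sd, 3), round(cv_percent, 2))  # 16.5 4.062 81.24
```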
Compute the relative frequency for students who earned a C shown in the table below. A=10, B=31, C=36, D=8. total 83
0.43 (43%)
Euclidean distance can be used to calculate the dissimilarity between two observations. Let u = (25, $350) correspond to a 25-year-old customer that spent $350 at Store A in the previous fiscal year. Let v = (53, $420) correspond to a 53-year-old customer that spent $420 at Store A in the previous fiscal year. Calculate the dissimilarity between these two observations using Euclidean distance.
75.39
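Euclidean distance is the square root of the sum of squared coordinate differences. A sketch using the coordinates (25, 350) and (53, 420), the values that reproduce the stated answer:

```python
from math import dist

u = (25, 350)  # age, dollars spent
v = (53, 420)

# sqrt((53 - 25)**2 + (420 - 350)**2) = sqrt(784 + 4900)
d = dist(u, v)
print(round(d, 2))  # 75.39
```

The dollar difference dominates the result because the variables are not standardized, which is why standardization is usually applied before clustering.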
Which of the following graphs provides information on outliers and IQR of a data set?
Box Plot
_____________ is a measure of calculating dissimilarity between clusters by calculating the distance between the centroids of the two clusters.
Centroid linkage
Natalie needs to compare the number of employees by job title for the last five years. Which of the following charts should Natalie use?
Clustered-column (bar) chart
______________ is a measure of calculating dissimilarity between clusters by considering only the two most dissimilar observations in the two clusters.
Complete linkage
________________ are collected from several entities at the same point in time.
Cross-sectional data
A retail owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such prediction.
Data Mining
A student wants to determine if pennies are really fair, meaning equally likely to land heads up or tails up. He flips a random sample of 50 pennies and finds that 28 of them land heads up. What are the appropriate null and alternative hypotheses?
H0: p = 0.5, Ha: p ≠ 0.5
_____________ is the most critical step of the decision making process.
Identifying and defining the problem
Which statement is true about mutually exclusive events?
If events A and B cannot occur at the same time, they are called mutually exclusive.
Data-ink is the ink used in a table or chart that
Is necessary to convey the meaning of the data to the audience
Which of the following is true of Euclidean distances?
It is commonly used as a method of measuring dissimilarity between quantitative observations.
DJ needs to display data over time. Which of the following charts should he use?
Line chart
_____________ is the dissimilarity measure that is more robust to outliers than Euclidean distance.
Manhattan distance
If A and B are independent events with P(A)=.38 and P(B)=.55, then P(A|B)=
None of the answers are correct (independence gives P(A|B) = P(A) = .38)
To summarize and analyze data with both a crosstabulation and charting, Excel typically pairs
PivotCharts with PivotTables
Which of the following analytical techniques helps us arrive at the best decision?
Prescriptive analytics
Data-driven decision making tends to decrease a firm's _________.
Risk
A __________ is a graphical presentation of the relationship between two quantitative variables.
Scatter chart
A ___________ decision is concerned with how the organization should achieve the goals and objectives set by its strategy.
Tactical
Which of the following statements is correct?
The binomial distribution is a discrete probability distribution and the normal distribution is a continuous probability distribution.
For a population with an unknown distribution, the form of the sampling distribution of the sample mean is __________.
approximately normal for a large sample size
A chart that is recommended as an alternative to a pie chart is a
bar chart
The charts that are helpful in making comparisons between categorical variables are
bar charts and column charts
In interval estimation, as the sample size becomes larger the interval estimate
becomes narrower
The correlation coefficient will always take the values
between -1 and +1
The data preparation technique used in market segmentation to divide consumers into different homogeneous groups is called
cluster analysis
An alternative for a stacked column chart when comparing more than a couple of quantitative variables in each category is
clustered column chart
Complete linkage can be used to measure the distance between _______________ in cluster analysis.
clusters
A graphical presentation that uses vertical bars to display the magnitude of quantitative data is known as a
column chart
In preparing categorical variables for analysis, it is usually best to _____________.
convert the categories to binary, dummy variables
A data visualization tool that updates in real time and gives multiple outputs is called
data dashboard
The US Internal Revenue Service uses _________ to identify patterns that distinguish questionable annual income tax filing.
data mining
Fields may be chosen to represent all of the following except ________ in the body of a PivotTable.
filters
The finite population correction factor should be used in the computation of the standard deviation of the sample mean and the sample proportion when n/N is
greater than .05
A two-dimensional graph representing the data using different shades of color to indicate magnitude is called
heat map
Bar charts use
horizontal bars to display the magnitude of the quantitative variable
Tactical decisions are concerned with
how the organization should achieve the goals and objectives set by its strategy.
Deleting the grid lines in a table and the horizontal lines in a chart
increases the data-ink ratio
A disadvantage of stacked-column charts and stacked-bar charts is that
it can be difficult to perceive small differences in areas
In a business, the values indicating the business's current operating characteristics, such as its financial position, the inventory on hand, and customer service metrics, are typically known as
key performance indicators
A time series plot is also known as a
line chart
Single linkage can be used to measure the distance between clusters that are the _______ in cluster analysis.
most similar
A simple random sample of size n from a finite population of size N is a sample selected such that each possible sample of size
n has the same probability of being selected
In k-means clustering, k represents the
number of clusters
Euclidean distance can be used to measure the distance between ______ in cluster analysis.
observations.
K-means clustering is the process of ______________.
organizing observations into distinct groups based on a measure of similarity.
In many cases, white space in a chart can improve
readability
A _____________ acts as a representative of the population.
sample
The value of the _____ is used to estimate the value of the population parameter.
sample statistic
A useful chart for displaying multiple variables is the ____________.
scatter chart matrix
The basis for using a normal probability distribution to approximate the sampling distribution of the sample mean and the sample proportion is
the central limit theorem
All of the events in the sample space that are not part of the specified event are called
the complement of the event
When clustering only by dummy variables that represent categorical variables, the simplest measure of similarity between two observations is called the
matching coefficient
Simulation optimization helps ______________.
to find good decisions in highly complex and highly uncertain settings.
The symbol U indicates the
union of events
The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the estimate of the standard error of the proportion?
.039, from √(.353(1 - .353)/150)
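The estimated standard error of a sample proportion is sqrt(p_hat * (1 - p_hat) / n). A minimal sketch with the numbers from this card:

```python
from math import sqrt

n = 150
logged_on = 53

p_hat = logged_on / n  # point estimate of the proportion, about .353
standard_error = sqrt(p_hat * (1 - p_hat) / n)
print(round(p_hat, 3), round(standard_error, 3))  # 0.353 0.039
```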
A sample of 51 observations will be taken from a process (an infinite population). The population proportion equals 0.85. The probability that the sample proportion will be between 0.9115 and 0.946 is
.0819
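With n = 51 and p = 0.85, the sample proportion is treated as approximately normal with mean p and standard error sqrt(p(1 - p)/n) = 0.05. The sketch below assumes that normal approximation and uses statistics.NormalDist.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.85, 51
standard_error = sqrt(p * (1 - p) / n)  # 0.05

# Normal approximation to the sampling distribution of the sample proportion
sampling_dist = NormalDist(mu=p, sigma=standard_error)
prob = sampling_dist.cdf(0.946) - sampling_dist.cdf(0.9115)
print(round(prob, 4))
```

The bounds correspond to z = 1.23 and z = 1.92, so the result is the area between those two z values.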
The p-value is equal to: (n = 49, x̄ = 54.8, σ = 28, H0: μ = 50, Ha: μ ≠ 50)
.2302
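A sketch of the p-value calculation, assuming a two-tailed z test of a mean of 50 with known population SD 28, sample mean 54.8 and n = 49. That reading is an assumption, chosen because it reproduces the stated answer.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sigma, mu_0 = 49, 54.8, 28, 50

z = (sample_mean - mu_0) / (sigma / sqrt(n))  # 4.8 / 4 = 1.2
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
print(round(z, 2), round(p_value, 3))  # 1.2 0.23
```

A z table rounded to four decimals gives 2 x .1151 = .2302; the unrounded value differs slightly in the fourth decimal.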
The assembly time for a product is uniformly distributed between 6 and 10 minutes. The probability density function has what value in the interval between 6 and 10?
.25
The CEO of a company wants to estimate the percent of employees that use company computers to go on Facebook during work hours with 95% confidence. He selects a random sample of 150 of the employees and finds that 53 of them logged onto Facebook that day. What is the point estimate of the proportion of the population that logged onto Facebook that day?
.35
The time between arrivals of vehicles at a particular intersection follows an exponential probability distribution with a mean of 12 seconds. What is the probability that the arrival time between vehicles is 6 seconds or less?
.3935
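For an exponential distribution with mean m, P(X <= x) = 1 - e^(-x/m). A minimal sketch with m = 12 seconds and x = 6 seconds:

```python
from math import exp

def exponential_cdf(x, mean):
    """P(X <= x) for an exponential random variable with the given mean."""
    return 1 - exp(-x / mean)

prob = exponential_cdf(6, mean=12)
print(round(prob, 4))  # 0.3935
```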
In a multiple regression analysis involving 15 independent variables and 200 observations, SST=800 and SSE= 240. The coefficient of determination is
.700
A researcher has collected the following sample data. The mean of the sample is 5. 3,5,12,3,2. What is the variance?
16.5
A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 steps and a SD of 1,500 steps. What percent of the days does he exceed 13,000 steps?
2.28%
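Exceeding 13,000 steps means being more than z = (13,000 - 10,000) / 1,500 = 2 standard deviations above the mean. A sketch of the upper-tail calculation with statistics.NormalDist:

```python
from statistics import NormalDist

steps = NormalDist(mu=10_000, sigma=1_500)

# Upper-tail probability: P(X > 13,000) = 1 - P(X <= 13,000)
p_exceed = 1 - steps.cdf(13_000)
print(f"{p_exceed:.2%}")  # 2.28%
```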
A researcher has collected the following sample data. The mean of the sample is 5. 3,5,12,3,2. What is the coefficient of variation?
81.24%
Compute the IQR for the following data set: 10, 15, 17, 21, 25, 12, 16, 13, 11, 22
9.50
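The answer of 9.50 follows from locating the quartiles at positions 0.25(n + 1) and 0.75(n + 1) in the sorted data. Python's statistics.quantiles uses the same rule by default (method="exclusive"); other software may use a different quartile convention and give a slightly different IQR.

```python
from statistics import quantiles

data = [10, 15, 17, 21, 25, 12, 16, 13, 11, 22]

# quantiles() sorts the data itself; n=4 returns the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # 11.75 21.25 9.5
```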
A manager of a fast food restaurant wants the drive through employee to ask every 5th customer if he or she is satisfied with the service. Who makes up the population?
All customers who use the drive through window at the restaurant.
The ratio of the amount of ink used in a table or chart that is necessary to convey information to the total amount of ink used in the table and chart is known as data-ink ratio. Using additional ink that is not necessary to convey information has what effect on the data-ink ratio?
It reduces the data-ink ratio
____________ merges maps and statistics to present data collected over different geographies.
The geographic information system
If the Euclidean distance were to be represented in a right triangle, which of the following would be considered the distance between two observations?
The hypotenuse
Larger values of α have the disadvantage of increasing the probability of making a
Type 1 error
As the number of degrees of freedom for a t distribution increases, the difference between the t distribution and the standard normal distribution
becomes smaller
In order to visualize three variables in a two-dimensional graph, we use a
bubble chart
Average linkage is a measure of calculating dissimilarity between two clusters by
computing the average distance between every pair of observations between two clusters.
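The linkage measures in this set (single, complete, average, centroid) differ only in how they summarize distances between two clusters. The sketch below compares them on two small made-up clusters; the points and the linkages helper are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import dist
from statistics import mean

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(mean(coords) for coords in zip(*cluster))

def linkages(c1, c2):
    """Dissimilarity between two clusters under four linkage rules."""
    pair_distances = [dist(p, q) for p, q in product(c1, c2)]
    return {
        "single": min(pair_distances),    # two most similar observations
        "complete": max(pair_distances),  # two most dissimilar observations
        "average": mean(pair_distances),  # average over every cross-cluster pair
        "centroid": dist(centroid(c1), centroid(c2)),
    }

cluster_a = [(0, 0), (0, 2)]  # hypothetical observations
cluster_b = [(3, 0), (5, 0)]
result = linkages(cluster_a, cluster_b)
for name, value in result.items():
    print(name, round(value, 3))
```

Single linkage is never larger than average linkage, which in turn is never larger than complete linkage, because they are the minimum, mean and maximum of the same set of pairwise distances.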
Corporate-level managers use _____________ to summarize sales by region, current inventory levels, and other company wide metrics all in a single screen.
data dashboards
A tree diagram used to illustrate the sequence of nested clusters produced by hierarchical clustering is known as a _____.
dendrogram
In order to manage an organization's human resource activities, such as hiring employees, tracking, and influencing employee retention, HR personnel use ________________.
descriptive and predictive analytics
Jaccard's coefficient is different from the matching coefficient in that the former:
does not count matching zero entries while the latter does
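A sketch contrasting the two measures on a pair of made-up 0-1 observations. The vectors and function names are hypothetical; the Jaccard helper assumes at least one variable is nonzero in one of the observations, otherwise the denominator is zero.

```python
def matching_coefficient(u, v):
    """Share of variables on which the two observations agree (1-1 or 0-0)."""
    matches = sum(a == b for a, b in zip(u, v))
    return matches / len(u)

def jaccard_coefficient(u, v):
    """Share of 1-1 matches among the variables that are not 0-0 matches."""
    both_one = sum(a == 1 and b == 1 for a, b in zip(u, v))
    both_zero = sum(a == 0 and b == 0 for a, b in zip(u, v))
    return both_one / (len(u) - both_zero)

u = (1, 0, 0, 1, 0)  # hypothetical dummy-coded observations
v = (1, 0, 0, 0, 0)

print(matching_coefficient(u, v))  # 0.8
print(jaccard_coefficient(u, v))   # 0.5
```

The three shared zeros raise the matching coefficient but are ignored by Jaccard's coefficient, so Jaccard reports a lower similarity here.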
Any data value with a z-score less than -3 or greater than +3 is considered to be a(n)
outlier
A __________ is used for examining data with more than two variables, and it includes a different vertical axis for each variable.
parallel-coordinates plot
A simple random sample of 31 observations was taken from a large population. The sample mean equals 5. Five is a
point estimate
Bayes' theorem is a method used to compute __________ probabilities.
posterior
Revised probabilities of events based on additional information are
posterior probabilities
Observation refers to the
set of recorded values of variables associated with a single entity
A line chart that has no axes but is used to provide information on overall trends for time series data is called a
sparkline
A method for modifying variables that reduces bias prior to cluster analysis is
standardization
The goal of __________ is to use the variable values to identify relationships between observations.
unsupervised learning