BAnal Exam 1
73) What is the arithmetic mean in the following table on the variable score? Student_ID Score R304110 0.98 R304003 0.88 R102234 0.65 R209939 0.92 A) 0.92 B) 0.88 C) 0.765 D) 0.8575
D) 0.8575
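As a quick check of the arithmetic, here is a minimal Python sketch that averages the four scores from the table. The variable names are illustrative and not part of the exam.

```python
from statistics import mean

# Scores from the table in the question (Student_ID column omitted).
scores = [0.98, 0.88, 0.65, 0.92]

# Arithmetic mean = sum of the observations / number of observations.
score_mean = mean(scores)
print(round(score_mean, 4))
```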
85) As observations become more dispersed, the difference between the minimum and maximum observation of a variable is called __________. A) the range B) the variable C) the interquartile D) the mean absolute deviation
A) the range
46) Four observations were binned into one group. In this group, the values are: 40, 37, 38, and 33. What is the average of the group? A) 39 B) 38 C) 36 D) 37
D) 37
14) Unstructured data is best defined as A) not conforming to a predefined, row-column format. B) not conforming to a way to analyze data. C) conforming to a predefined, row-column format. D) conforming to a managing velocity.
A) not conforming to a predefined, row-column format.
28) During the winter, the ice festival committee measures the depth of the ice during the month of February. What is the type of measurement scale? A) ratio B) interval C) nominal D) numerical
A) ratio
68) The correlation coefficient for rent and square footage is computed to be 0.84. This means the relationship between the two variables is __________. A) strong and positive B) weak and negative C) weak but positive D) strong but negative
A) strong and positive
34) Mary in the accounting department has been assigned a specific vehicle as her company car to perform audits. This represents which type of relationship? A) 1 : 1 B) 1 : M C) M : N D) M : M
A) 1 : 1
48) The following table contains 2 variables with 2 observations. A new variable was created named Sum. This is the sum of the values x1 and x2 for each observation. What is the average value of Sum if the chart is completed? x1 x2 Sum 76 22 80 30 A) 104 B) 51 C) 98 D) 110
A) 104
49) The following table contains 2 variables with 2 observations. A new variable was created named Sum. This is the sum of the values x1 and x2 for each observation. What is the average value of Sum if the chart is completed? x1 x2 Sum 76 22 82 32 A) 106 B) 53 C) 98 D) 114
A) 106
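The derived variable can be sketched in a few lines of Python, using the two observations from this question. The names `rows`, `sums`, and `avg_sum` are illustrative only.

```python
# Each tuple is one observation: (x1, x2).
rows = [(76, 22), (82, 32)]

# Sum is computed per observation, then averaged across observations.
sums = [x1 + x2 for x1, x2 in rows]
avg_sum = sum(sums) / len(sums)
print(sums, avg_sum)
```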
99) Marin produced the following histogram based on his observations on the age of players willing to sample a video game. He then organized the age into frequencies and interval width of the respective intervals. Based on the results, which range has the best frequency for future video game sampling? A) 16-18 B) 19-22 C) 23-25 D) 8-10
A) 16-18
52) Ann is analyzing a data set that contains two variables, Job Title and 401K. 401K contains the name of the three companies that carry the retirement accounts. It is mandatory to have an account, thus no observation is blank. If 401K was transformed to dummy variables, how many should be created? A) 2 B) 3 C) 4 D) 1
A) 2
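The k − 1 rule behind this answer can be sketched as follows: one category serves as the reference, and every other category gets its own 0/1 column. The provider names below are hypothetical, since the question does not name the three companies. The same rule gives 11 dummy variables for 12 months.

```python
def dummy_encode(values, reference):
    """Return one 0/1 column per category, skipping the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Hypothetical provider names (assumption, not from the exam).
providers = ["Alpha", "Beta", "Gamma", "Beta", "Alpha"]
encoded = dummy_encode(providers, reference="Alpha")

# Three categories produce two dummy columns.
print(len(encoded))
```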
98) Using the following table, what is the percent of the relative frequency of a Blue car being observed? Car Color Frequency Relative Frequency Black 251 0.2656 Red 198 0.2095 Blue 203 0.2148 Silver 293 0.3101 A) 21.48% B) 2.1480% C) 20.30% D) 0.2148%
A) 21.48%
97) Using the following table, what is the percent of the relative frequency of a Blue car being observed? Car Color Frequency Relative Frequency Black 251 0.2623 Red 198 0.2069 Blue 215 0.2247 Silver 293 0.3062 A) 22.47% B) 2.2470% C) 21.50% D) 0.2247%
A) 22.47%
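Relative frequency is each category's count divided by the total count. A minimal Python sketch using the frequencies from question 97 (names are illustrative):

```python
# Frequencies from the table in question 97.
freq = {"Black": 251, "Red": 198, "Blue": 215, "Silver": 293}

total = sum(freq.values())
relative = {color: n / total for color, n in freq.items()}

# Express the Blue share as a percentage, rounded to two decimals.
blue_pct = round(relative["Blue"] * 100, 2)
print(total, blue_pct)
```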
62) Which of the following is NOT an example of categorical data transformation? A) Category binning B) Category reduction C) Category scores D) Dummy variables
A) Category binning
42) Using the omission strategy, what value would be placed in the missing observation in x1? x1 x2 76 22 82 86 32 41 83 28 A) No value because excluded B) 82 C) 80 D) 65
A) No value because excluded
43) Using the omission strategy, what value would be placed in the missing observation in x1? x1 x2 76 22 82 91 32 41 88 28 A) No value because excluded B) 84 C) 83 D) 90
A) No value because excluded
64) Which of the following functions do you use to calculate the pth percentile in Excel? A) PERCENTILE.INC B) QUANTILE.INC C) quantile D) summary
A) PERCENTILE.INC
22) Which one is a drawback of interval-scaled data? A) The zero point is arbitrarily chosen. B) The degree of measurement is not a whole number. C) The scale is categorized and qualitative. D) The scale is nominal and the zero point is meaningful.
A) The zero point is arbitrarily chosen.
93) If the correlation coefficient is 0, then x and y A) are not linearly related. B) are absolute and perfectly related. C) have a perfect positive relationship. D) have a perfect negative relationship.
A) are not linearly related.
35) The primary purpose of a(n) _____________ is to support decision-making and provide a composite view of the organization. A) data warehouse B) data mart C) entity D) attribute
A) data warehouse
1) Which of the following broad categories is not a type of analytic technique? A) manipulative analytics B) descriptive analytics C) predictive analytics D) prescriptive analytics
A) manipulative analytics
37) In the presence of outliers (extremely small or large values) in a data set, it is preferred to use the _____________ instead of the _____________ to impute missing values. A) median; mean B) mean; median C) subset; total D) average; range
A) median; mean
83) In the following Boxplot, the left whisker is longer than the right whisker. This indicates that the underlying distribution is __________. A) negatively skewed B) median skewed C) positively skewed D) no indicated skew
A) negatively skewed
13) A New York Times article notes that there are an increasing number of people calling for tech companies to ease their grip on the personal data of consumers. The concern is that a handful of companies holds most of the data. The immense amount of data is also called A) volume. B) veracity. C) values. D) variety.
A) volume.
47) Four observations were binned into one group. In this group, the values are: 40, 45, 38, and 33. What is the average of the group? A) 41 B) 40 C) 38 D) 39
D) 39
92) In analyzing the S&P 500 and XYZ Incorporated in a five-year study, the covariance (S&P 500, XYZ Incorporated) is 9,107.30. What kind of linear relationship do the S&P 500 and XYZ Incorporated have? A) a negative linear relationship B) a positive linear relationship C) no linear relationship D) a neutral linear relationship
B) a positive linear relationship
82) Using the following Boxplot, what is the star to the far right considered? A) upper quartile mark B) an outlier C) a whisker D) a deviation
B) an outlier
3) A massive volume of both structured and unstructured data that is extremely difficult to manage, process, and analyze is known by which catch phrase? A) wrangling B) big data C) data mining D) general data
B) big data
9) Mary asks her friends on Facebook for recommendations for the best restaurants in Chicago. The results are then placed in a table for review. What does the data represent? A) time-series data B) cross-sectional data C) numerical data D) quantitative data
B) cross-sectional data
33) Which term represents data items, events, or things stored in a database file? A) instance B) entity C) settings D) quantitative
B) entity
51) In the following table, there are four observations with three variables. Which category is the best fit to be transferred into dummy variables? Marital Status Age Income Single 24 $45,000 Married 26 $33,000 Single 33 $53,000 Married 28 $59,000 A) age B) marital status C) income D) none are a good fit for a dummy variable
B) marital status
69) In a boxplot, the dashed vertical line in the middle of the box represents which of the following measures of location? A) mean B) median C) mode D) percentile
B) median
71) What is the only meaningful measure of the central location for a categorical variable? A) mean B) mode C) median D) model
B) mode
1) Find the mode of the values: 1, 4, 5, 3, 4, 6, 1, 2, 4, 3, 2 A) 1 B) 2 C) 3 D) 4
D) 4
21) A large retailer is asking each customer at checkout for their zip code. If the zip code is the only recorded variable, what is the type of measurement scale? A) ratio B) nominal C) ordinal D) interval
B) nominal
27) An instructor hands out course evaluations where students have a rank of 0 to 5. What is the best way for the data to be measured? A) filtered B) ordinal C) nominal D) numerical
B) ordinal
25) Which one of the following variables is numerical? A) country B) population C) name D) religion
B) population
26) Which one of the following variables is numerical? A) city B) population C) state D) color
B) population
63) The variable x1 contains three categories ranging from "Poor" to "Good." Convert the category names into category scores into x2 (i.e., 1 = "Poor", 2 = "OK", and 3 = "Good"). How many observations have a category score of 1? x1 x2 OK Good Good Poor OK Good A) 0 B) 1 C) 2 D) 3
B) 1
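Category scores can be sketched as a simple lookup, using the mapping and the six x1 values given in the question. The names `SCORES`, `x2`, and `count_ones` are illustrative.

```python
# Mapping stated in the question: 1 = "Poor", 2 = "OK", 3 = "Good".
SCORES = {"Poor": 1, "OK": 2, "Good": 3}

x1 = ["OK", "Good", "Good", "Poor", "OK", "Good"]

# x2 holds the category score for each observation in x1.
x2 = [SCORES[v] for v in x1]
count_ones = x2.count(1)
print(x2, count_ones)
```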
87) Find the Mean Absolute Deviation (MAD) of 13, 9, 9, 11, 13. A) 4.0 B) 1.60 C) 11.0 D) 3.0
B) 1.60
56) Marcus wants to include the month of the year in the analysis as categories. How many dummy variables will be needed? A) 12 B) 11 C) 6 D) 1
B) 11
88) Find the Mean Absolute Deviation (MAD) of 10, 9, 3, 8, 10. A) 5 B) 2 C) 8 D) 4
B) 2
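The two MAD questions follow the same steps: find the mean, take each absolute deviation from it, and average those deviations. A minimal Python sketch (the function name is illustrative):

```python
def mean_absolute_deviation(values):
    """Average absolute distance of each observation from the mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

mad_q87 = mean_absolute_deviation([13, 9, 9, 11, 13])  # mean is 11
mad_q88 = mean_absolute_deviation([10, 9, 3, 8, 10])   # mean is 8
print(mad_q87, mad_q88)
```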
75) Carmen is a professor at a local university. After collecting data on her Introduction to Business course for a year, she wants to calculate the z-score for a student who scores 90 on the final exam. The mean and the standard deviation scores on the exam are 76 and 6, respectively. Calculate the z-score. A) 1.33 B) 2.33 C) 0.33 D) 2.67
B) 2.33
74) Carmen is a professor at a local university. After collecting data on her Introduction to Business course for a year, she wants to calculate the z-score for a student who scores 86 on the final exam. The mean and the standard deviation scores on the exam are 76 and 4, respectively. Calculate the z-score. A) 1.50 B) 2.50 C) 0.50 D) 1.75
B) 2.50
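Both z-score questions use z = (x − mean) / standard deviation. A short Python sketch with the numbers from questions 75 and 74 (the function name is illustrative):

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z_q75 = z_score(90, 76, 6)
z_q74 = z_score(86, 76, 4)
print(round(z_q75, 2), round(z_q74, 2))
```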
96) Find the mean of the values: 1, 4, 5, 3, 4, 6, 1, 2, 4, 3, 2 A) 3 B) 3.18 C) 3.27 D) 4
B) 3.18
41) Using the simple mean imputation strategy, what value would be placed in the missing observation in x1? x1 x2 76 22 82 91 32 41 88 28 A) No value because excluded B) 84 C) 83 D) 90
B) 84
40) Using the simple mean imputation strategy, what value would be placed in the missing observation in x1? x1 x2 79 22 85 91 32 41 88 28 A) No value because excluded B) 86 C) 84 D) 69
B) 86
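Simple mean imputation replaces a missing value with the mean of the observed values in the same variable. The sketch below rests on two assumptions: the table layout did not survive, so the position of the missing x1 entry is a guess, and the mean is rounded to a whole number to line up with the answer choices.

```python
def impute_mean(values):
    """Fill None entries with the rounded mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = round(sum(observed) / len(observed))
    return [fill if v is None else v for v in values]

# Assumed layout: four observed x1 values and one missing entry (None).
x1_q40 = [79, 85, 91, None, 88]  # observed mean is 85.75
x1_q41 = [76, 82, 91, None, 88]  # observed mean is 84.25

print(impute_mean(x1_q40))
print(impute_mean(x1_q41))
```

Under the omission strategy, the observation with the missing value would be dropped instead of filled.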
45) The function that provides a natural logarithm in Excel is? A) INT function B) LN function C) YEARFRAC function D) VLOOKUP function
B) LN function
80) Candice is preparing for her final exam in Statistics. She knows she needs 80 out of 100 to earn an A overall in the course. Her instructor provided the following information to the students. On the final, 200 students have taken it with a mean score of 72 and a standard deviation of 6. Assume the distribution of scores is bell-shaped. Calculate to see if a score of 80 is within one standard deviation of the mean. A) Yes, 80 is the upper number of one standard deviation from the mean. B) No, the upper level of one standard deviation is 78. C) Yes, 80 is greater than the 66, one standard deviation below the mean. D) No, 80 is greater than the mean of 72.
B) No, the upper level of one standard deviation is 78.
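The check amounts to comparing the score with mean ± one standard deviation. A minimal Python sketch (function names are illustrative):

```python
def one_sd_interval(mean, sd):
    """Lower and upper bounds of mean ± one standard deviation."""
    return mean - sd, mean + sd

def within_one_sd(score, mean, sd):
    low, high = one_sd_interval(mean, sd)
    return low <= score <= high

# Mean 72, standard deviation 6: the interval is 66 to 78, so 80 falls outside.
print(one_sd_interval(72, 6), within_one_sd(80, 72, 6))
```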
61) What data preparation technique is Maeve using when she extracts a payroll data set into two separate files, one for hourly employees and one for salary employees? A) Separating B) Subsetting C) Typesetting D) Wrangling
B) Subsetting
89) The standard deviation of midterm scores and the final exam are 5.0 and 4.5, respectively. Which of the two exams is riskier and why? A) Both the midterm and the final share the same amount of risk. B) The midterm exam is riskier because the standard deviation is higher. C) The midterm exam is riskier because the standard deviation is lower. D) There is not enough information to determine which is the riskier of the two.
B) The midterm exam is riskier because the standard deviation is higher.
90) The standard deviation of midterm scores and the final exam are 8 and 6, respectively. Which of the two exams is riskier and why? A) Both the midterm and the final share the same amount of risk. B) The midterm exam is riskier because the standard deviation is higher. C) The final exam is riskier because the standard deviation is lower. D) There is not enough information to determine which is the riskier of the two.
B) The midterm exam is riskier because the standard deviation is higher.
91) In analyzing the S&P 500 and XYZ Incorporated in a five-year study, the covariance (S&P 500, XYZ Incorporated) is −7,303.30. What kind of linear relationship do the S&P 500 and XYZ Incorporated have? A) a positive linear relationship B) a negative linear relationship C) no linear relationship D) a neutral linear relationship
B) a negative linear relationship
44) Mark wants to have a better understanding of his client base at the credit union. To do so, he is running a report to show loan amount approval with corresponding credit scores. He realized the data set is quite large and wants to create categories by grouping. To do this, he needs to do all the following except A) identify the value he wants to transform into smaller groups or bins. B) remove 20% of the data to create a training set. C) ensure the data sets are not overlapping. D) identify how he wants the observations to be labeled in the bin.
B) remove 20% of the data to create a training set.
11) What term refers to the credibility and quality of data? A) volume B) veracity C) values D) variety
B) veracity
100) Using the following histogram, how would the distribution be described? A) bell-shaped B) symmetric skewed C) positively skewed D) negatively skewed
C) positively skewed
86) The interquartile range is IQR = Q3 − Q1. Thus, it can be thought of as A) the 75% interquartile range. B) the quartile or 25% of the variable. C) the middle 50% of the variable. D) the incorporation of all observations.
C) the middle 50% of the variable.
10) In big data, the most important aspect of any analytic initiative is __________. A) volume B) veracity C) values D) variety
C) values
20) A large retailer is asking each customer at checkout for their zip code. If the zip code is the only recorded variable, what would the summarized results field headers be in tabular format? A) zip code B) customer number, zip code, count C) zip code, count D) count
C) zip code, count
70) What is the median in the following table on the variable score? Student_ID Score R304110 0.98 R304003 0.88 R102234 0.65 R209939 0.92 A) 0.86 B) 0.88 C) 0.90 D) 0.92
C) 0.90
38) In a data set with 24 variables, if 17% of the values, randomly spread across observations, are missing (blank), what is the probable percent of complete and usable observations? A) 83% B) 17% C) 1.14% D) 1.66%
C) 1.14%
39) In a data set with 20 variables, if 8% of the values, randomly spread across observations, are missing (blank), what is the probable percent of complete and usable observations? A) 92% B) 8% C) 18.87% D) 15.29%
C) 18.87%
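Both answers come from the same reasoning: if each value is missing independently with probability p, an observation is complete only when all k of its values are present, which happens with probability (1 − p)^k. Independence is the assumption implied by "randomly spread". A minimal Python sketch (the function name is illustrative):

```python
def complete_share(n_variables, missing_rate):
    """Probability that one observation has no missing values,
    assuming each value is missing independently."""
    return (1 - missing_rate) ** n_variables

pct_q38 = round(complete_share(24, 0.17) * 100, 2)
pct_q39 = round(complete_share(20, 0.08) * 100, 2)
print(pct_q38, pct_q39)
```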
78) The mean credit score of 300 used car loan applicants is 640, with a standard deviation of 16. Assuming a bell-shaped curve, how many loan applicants fall between scores of 608 and 672? A) 96 B) 204 C) 285 D) 300
C) 285
77) The mean credit score of 310 used car loan applicants is 645, with a standard deviation of 17. Assuming a bell-shaped curve, how many loan applicants fall between scores of 611 and 679? A) 68 B) 211 C) 295 D) 310
C) 295
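In both credit score questions the interval is the mean ± two standard deviations, so the empirical rule puts roughly 95% of applicants inside it. The sketch below is one way to check this in Python. The function name and the half-up rounding are choices made here, not part of the exam. Python's built-in `round` rounds 294.5 down to 294, so the sketch rounds half up instead.

```python
import math

# Approximate empirical-rule shares in percent, keyed by k (mean ± k·sd).
EMPIRICAL_PCT = {1: 68, 2: 95, 3: 99.7}

def expected_within(n, mean, sd, low, high):
    """Expected count inside [low, high] when the interval is mean ± k·sd."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if k_low != k_high or k_low not in EMPIRICAL_PCT:
        raise ValueError("interval is not mean ± 1, 2, or 3 standard deviations")
    expected = n * EMPIRICAL_PCT[k_low] / 100
    return math.floor(expected + 0.5)  # round half up

print(expected_within(300, 640, 16, 608, 672))
print(expected_within(310, 645, 17, 611, 679))
```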
53) Transform the marital status into dummy variables where Single = 1 and Married = 0. How many would have the category score of 0? Marital Status Age Income Married 24 $45,000 Married 26 $33,000 Married 33 $53,000 Single 28 $59,000 Single 36 $62,000 Single 29 $48,000 A) 1 B) 6 C) 3 D) 0
C) 3
66) Find the median of the values: 1, 4, 5, 3, 4, 6, 1, 2, 4, 3, 2 A) 1 B) 2 C) 3 D) 4
C) 3
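The same eleven values appear in the mode, mean, and median questions, so all three measures can be checked together. A minimal sketch using Python's standard `statistics` module:

```python
from statistics import mean, median, mode

values = [1, 4, 5, 3, 4, 6, 1, 2, 4, 3, 2]

# Sorted: 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 6 (the 6th of 11 values is the median).
values_mean = mean(values)
values_median = median(values)
values_mode = mode(values)
print(round(values_mean, 2), values_median, values_mode)
```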
1) Transform the marital status into dummy variables where Single = 1 and Married = 0. How many would have the category score of 0? Marital Status Age Income Single 24 $45,000 Married 26 $33,000 Single 33 $53,000 Married 28 $59,000 Married 36 $62,000 Married 29 $48,000 A) 2 B) 6 C) 4 D) 0
C) 4
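The transformation in this question can be sketched in Python as a single 0/1 column, using the six marital status values from the table (names are illustrative):

```python
status = ["Single", "Married", "Single", "Married", "Married", "Married"]

# Dummy variable as defined in the question: Single = 1, Married = 0.
single = [1 if s == "Single" else 0 for s in status]
zeros = single.count(0)
print(single, zeros)
```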
81) Using the following Boxplot, identify the median score on the test. A) 28 B) 68 C) 60 D) 90
C) 60
76) The empirical rule states all the following except A) almost all observations fall in the interval. B) approximately 95% of all observations fall in the interval. C) approximately 65% of all observations fall in the interval. D) approximately 68% of all observations fall in the interval.
C) approximately 65% of all observations fall in the interval.
58) Which of the following Excel functions will Ibrahim use to determine how many employees make more than $20 per hour? A) COUNT B) COUNTA C) COUNTIF D) COUNTIFS
C) COUNTIF
30) Which of the following is not related to data privacy? A) Data collection B) Data ethics C) Data usage D) Data transmission
C) Data usage
57) Kara is reviewing categories where a series of numbers represent the type of loan. She would prefer the actual name of the loan be retained when running her analysis. Using Microsoft Excel, what function will allow Kara to retain the category name instead of recording them in numbers? A) log function B) view function C) IF function D) head function
C) IF function
60) Using the imputation strategy for categorical values, what value would be placed in the missing observation in Fav_Color? Age Fav_Color 22 Purple Blue 32 Purple 41 Red 28 A) No value because excluded B) Blue C) Purple D) Red
C) Purple
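For a categorical variable, imputation fills the gap with the mode, the most frequent observed category. The sketch below assumes the missing Fav_Color entry is the last row. The table layout did not survive, so that position is a guess.

```python
from statistics import mode

# Assumed layout: four observed colors and one missing entry (None).
fav_color = ["Purple", "Blue", "Purple", "Red", None]

observed = [c for c in fav_color if c is not None]
fill = mode(observed)  # most frequent observed category
imputed = [fill if c is None else c for c in fav_color]
print(fill, imputed)
```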
4) The 2019 FIFA Women's World Cup contained 52 matches in total with 24 teams competing. The use of __________ data will display team standings during and at the end of the tournament. A) split-section B) organized C) cross-sectional D) numerical
C) cross-sectional
29) At the local animal shelter, each animal is marked if they are a boy or a girl. What is this type of measurement scale? A) filtered B) ordinal C) nominal D) numerical
C) nominal
19) Cassidy is researching the impacts of eating breakfast on college students who have classes prior to 9 am. To do this, she issued a Likert scale questionnaire to the students, with a scale of 1 through 10 to answer a series of 20 questions. What is the type of measurement scale? A) ratio B) nominal C) ordinal D) interval
C) ordinal
7) According to a report in US Today, 45% of young people between the ages of 18-28 have at least two tattoos. What do the overall observations in the study represent? A) categorical data B) random data C) population D) sample set
C) population
8) According to a report in US Today, 38% of young people between the ages of 18-29 have at least one tattoo. What do the overall observations in the study represent? A) categorical data B) random data C) population D) sample set
C) population
72) What is the arithmetic mean in the following table on the variable score? Student_ID Score R304110 0.99 R304003 0.82 R102234 0.65 R209939 0.92 A) 0.92 B) 0.82 C) 0.735 D) 0.8450
D) 0.8450
59) Which of the following Excel equations will identify the number of married individuals under the age of 30? Marital Status Age Income Single 24 $45,000 Married 26 $33,000 Single 33 $53,000 Married 28 $59,000 Married 36 $62,000 Married 29 $48,000 A) =COUNT(A2:A7, "=Married", B2:B7,"<30") B) =COUNTA(A2:A7, "=Married", B2:B7,"<30") C) =COUNTIF(A2:A7, "=Married", B2:B7,"<30") D) =COUNTIFS(A2:A7, "=Married", B2:B7,"<30")
D) =COUNTIFS(A2:A7, "=Married", B2:B7,"<30")
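COUNTIFS counts the rows that satisfy every condition at once. For comparison, the same count can be sketched in Python over the six rows from the table (names are illustrative):

```python
# (marital status, age, income) for each row of the table.
rows = [
    ("Single", 24, 45000),
    ("Married", 26, 33000),
    ("Single", 33, 53000),
    ("Married", 28, 59000),
    ("Married", 36, 62000),
    ("Married", 29, 48000),
]

# Count rows where both conditions hold: status is Married and age is under 30.
married_under_30 = sum(
    1 for status, age, _income in rows if status == "Married" and age < 30
)
print(married_under_30)
```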
36) Mary has been tasked with reviewing a large data file. She wants to begin by first inspecting the number of values in each cell, both numeric and non-numeric, for any blank entries. The plan is to first find the blank or missing values for first review. Using Excel, what function(s) should she use to complete this task? A) COUNT B) COUNTA C) COUNTIF D) Both COUNT and COUNTA
D) Both COUNT and COUNTA
31) __________ is a set of data that are organized and processed in a meaningful and purposeful way. A) Data B) Knowledge C) Statistics D) Information
D) Information
79) Candice is preparing for her final exam in Statistics. She knows she needs 76 out of 100 to earn an A overall in the course. Her instructor provided the following information to the students. On the final, 200 students have taken it with a mean score of 68 and a standard deviation of 6. Assume the distribution of scores is bell-shaped. Calculate to see if a score of 76 is within one standard deviation of the mean. A) Yes, 76 is the upper limit of one standard deviation from the mean. B) Yes, the upper level of one standard deviation is 74. C) Yes, 76 is greater than the 62, one standard deviation below the mean. D) No, 76 is greater than one standard deviation above the mean, 74.
D) No, 76 is greater than one standard deviation above the mean, 74.
50) When too many variables are categorized in an analysis, several potential issues may occur. Which of the following is not one of the issues that may occur? A) model performance suffers B) rarely occurring categories may not be captured accurately C) difficulty in differentiating among observations D) an increase in the number of categories as the data set becomes larger
D) an increase in the number of categories as the data set becomes larger
16) The length of fishes in a pond is what type of variable? A) distraction B) discrete numerical C) categorical D) continuous numerical
D) continuous numerical
17) The time in hours spent sleeping per day is what kind of variable? A) distraction B) discrete numerical C) categorical D) continuous numerical
D) continuous numerical
84) The degree of strength of the linear relationship between x and y is called? A) correlation determination B) index C) standard deviation D) correlation coefficient
D) correlation coefficient
23) Of the following numerical variables, which is continuous? A) number of stocks B) cars sold by a car dealer C) number of children D) height
D) height
18) Molly Nelson has been collecting temperatures in degrees Fahrenheit, daily over the past five spring seasons, to determine the optimal point to plant her heirloom tomatoes. Because the difference between each degree is the same, irrespective of the temperature, this is what type of measurement scale? A) ratio B) nominal C) ordinal D) interval
D) interval
2) The people of Appleton, WI represent the __________, whereas we analyze the education level of a subset or __________ to make inferences about the population. A) information; cross-section B) population; information C) items; sample D) population; sample
D) population; sample
67) Survey results provided the skewness coefficient is 0.21672 and the (excess) kurtosis coefficient is −1.15926. These values imply that the return value for the survey is __________ skewed, and the distribution has a __________ tail than a normal distribution. A) negatively; longer B) negatively; shorter C) positively; longer D) positively; shorter
D) positively; shorter
15) Tobias Smith is working with his company's data to examine inventory information. His intent is to use the variables to express ratios on inventory turnover. Based on this description, what is the strongest level of measurement being used? A) continuous variable B) interval scale C) categorical D) ratio scale
D) ratio scale
5) According to a report in US Today, 37% of young people between the ages of 19-30 have at least one tattoo. What does the 37% represent? A) categorical data B) random data C) population D) sample set
D) sample set
6) According to a report in US Today, 38% of young people between the ages of 18-29 have at least one tattoo. What does the 38% represent? A) categorical data B) random data C) population D) sample set
D) sample set
55) Michael is examining a data set and trying to determine which category he can transform into a dummy variable. Of the four variables, Employee Number, Pay Rate, Hire Date, and Sex, which is the best fit to use a dummy variable? A) employee number B) pay rate C) hire date D) sex
D) sex
95) If the correlation coefficient is computed to be −0.85, this means the relationship between the two variables is __________. A) strong, positive B) weak, negative C) weak, positive D) strong, negative
D) strong, negative
32) Which of the following is NOT a process of the data management system? A) acquire B) distribute C) store D) summarize
D) summarize
12) When compiling data, it is important to know data comes in all types, forms, and granularity. This is known as A) volume. B) veracity. C) values. D) variety.
D) variety.
94) If the correlation coefficient is computed to be 0.25, this means the relationship between the two variables is __________. A) strong, positive B) weak, negative C) strong, negative D) weak, positive
D) weak, positive
24) Of the following numerical variables, which is continuous? A) number of goals scored B) number of stocks C) number of children D) weight
D) weight