SOM 307 - HW Topics 1-4


Data that are too large or too complex to be handled by standard data-processing techniques and typical desktop software are called _____. Predictive analytics Big data Small Data Python

Big data

Alice and Bob are both investors. Over the past year, they've invested in different assets and have kept track of the monthly returns. Alice invested in Stock A, while Bob invested in Stock B. The monthly returns (in percentage) for Stock A (Alice's investment) are: 2%, 3%, -1%, 2.5%, -0.5%, 1%, 2.5%, 3%, -1.5%, 2%, 2.5%, 1.5%. The monthly returns for Stock B (Bob's investment) are: 5%, 6%, -4%, 5%, -3%, 5%, 6%, -4%, 5%, 6%, -4%, 5%. Using the coefficient of variation (CV), determine which investment was more volatile over the year (Hint: The stock with the higher CV is more volatile). Bob's investment Not sufficient information Both have same volatility Alice's investment

Bob's investment
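The comparison can be checked numerically. The sketch below assumes the coefficient of variation is the sample standard deviation divided by the mean (the course may use the population standard deviation instead, which would not change the ranking here); the function name is illustrative.

```python
from statistics import mean, stdev

# Monthly returns in percent, copied from the question.
stock_a = [2, 3, -1, 2.5, -0.5, 1, 2.5, 3, -1.5, 2, 2.5, 1.5]  # Alice
stock_b = [5, 6, -4, 5, -3, 5, 6, -4, 5, 6, -4, 5]              # Bob


def coefficient_of_variation(returns):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(returns) / mean(returns)


cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

# The stock with the higher CV is the more volatile one.
more_volatile = "Bob's investment" if cv_b > cv_a else "Alice's investment"
print(round(cv_a, 2), round(cv_b, 2), more_volatile)
```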

An environmental scientist gathered data on the concentration of a certain pollutant in a river for a month. The scientist observes that on one particular day, the concentration was exceptionally high due to an accidental spill from a factory. Which visualization technique would be the best for her to identify this specific day as an outlier based on the above scenario? Pie chart showing monthly averages of pollutant concentrations Line graph plotting daily pollutant concentrations over the month Scatter plot of pollutant concentration versus the day of the month Box plot showing the distribution of pollutant concentrations over the month

Box plot showing the distribution of pollutant concentrations over the month

A major e-commerce company wants to predict and reduce the number of product returns by customers. Which step in the CRISP procedure would involve the company defining the specific issues related to product returns and understanding the financial implications of those returns? Data preparation. Deployment. Modeling. Business understanding.

Business understanding.

Sarah is a data analyst who collected monthly salaries (in thousands) of a company's employees. She presented the following data summary: minimum salary: $40,000; 1st quartile (Q1): $50,000; median (Q2): $60,000; 3rd quartile (Q3): $70,000; maximum salary: $100,000. Based on the provided data, what is the interquartile range (IQR) of the monthly salaries in this company? $20,000 $30,000 $10,000 $60,000

$20,000

Given that P(A) = 0.3, P(A | B) = 0.4, and P(B) = 0.5, compute P(A∩B) 0.15 0.02 0.12 0.60

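0.20 (multiplication rule: P(A∩B) = P(A | B) × P(B) = 0.4 × 0.5 = 0.20; the choice printed as 0.02 appears to be a misprint of 0.20)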

A health-conscious student faithfully wears a device that tracks his steps. Suppose that the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. What is the probability that fewer than 8,000 steps were taken? 0.952 0.977 0.091 0.908

0.091
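A minimal sketch of this calculation, using `NormalDist` from Python's standard library in place of a z-table; the variable names are illustrative.

```python
from statistics import NormalDist

# Daily steps: normal with mean 10,000 and standard deviation 1,500.
steps = NormalDist(mu=10_000, sigma=1_500)

# P(X < 8000) is the cumulative distribution function evaluated at 8000.
p_below_8000 = steps.cdf(8_000)
print(round(p_below_8000, 3))
```

The same object can answer upper-tail questions with `1 - steps.cdf(x)`.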

Let X be a random variable with a Uniform distribution between 8 and 20. Find the probability that X is less than 10? 0.10 0.20 0.50 0.16

0.16
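For a uniform distribution the probability is just the fraction of the interval that lies below the cutoff. The helper below is a hypothetical illustration, not a library function; it gives 2/12 ≈ 0.167, which the listed choice 0.16 approximates.

```python
def uniform_cdf(x, low, high):
    """P(X < x) for X uniform on [low, high]."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# X ~ Uniform(8, 20): P(X < 10) = (10 - 8) / (20 - 8)
p_less_than_10 = uniform_cdf(10, 8, 20)
print(round(p_less_than_10, 4))
```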

A gym instructor is analyzing the workout data for four participants over a week. He recorded the number of hours spent on two activities: Cardio and Strength Training. The data is as follows: Using mean imputation, determine the missing value for Participant 2's Strength Training. Round your answer to the nearest whole number if necessary. 1 3 4 5

3

Below is a histogram for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year. What is the frequency of the 21-24 bin?

3

The number of minutes that Samantha waits to catch the bus is uniformly distributed between 0 and 15 minutes. What is the probability that Samantha has to wait less than 4.5 minutes to catch the bus? 10% 20% 30% 3%

30%

Consider the data below. What percentage of students scored grade C?
Grades | Number of students
A | 16
B | 28
C | 33
D | 13
Total | 90
33% 37% 28% 31%

37%

At a school, 40% of students play basketball, 30% play soccer, and 15% play both basketball and soccer. If a student is selected at random and it's known that they play basketball, what is the probability they also play soccer? 15% 30% 37.5% 50%

37.5%

Below are the data for the number of days that it took Wyche Accounting to perform audits in the last quarter of last year: 56, 42, 37, 29, 45, 51, 30, 25, 34, 57. What is the median number of days that it took Wyche Accounting to perform audits in the last quarter of last year?

39.5
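With ten observations, the median is the average of the 5th and 6th values after sorting (37 and 42). A quick check using the standard library:

```python
from statistics import median

# Audit durations in days, copied from the question.
audit_days = [56, 42, 37, 29, 45, 51, 30, 25, 34, 57]

# For an even number of observations, median() averages the two middle values.
median_days = median(audit_days)
print(sorted(audit_days), median_days)
```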

A school is planning its annual picnic. Based on historical weather data: There's a 30% chance that it will rain on the chosen date (Event A). There's a 25% chance that it will be too windy for outdoor activities on that date (Event B). There's a 10% chance that both rain and wind will occur together. If the school decides to proceed with the picnic on that date, what is the probability that the picnic will be disrupted by either rain or excessive wind? 35% 45% 55% 65%

45%

You're a data analyst given a dataset with over 100 potential predictor variables to predict the sale prices of houses. You've decided that manually picking variables isn't feasible and you want a systematic approach. Which of the following methods would you use to systematically select, add, or remove predictor variables to improve the model's performance? R-squared distribution Forward variable selection Outlier detection method Box plot

Forward variable selection

Which of the following is not an approach to making decisions? Tradition Rules of thumb Intuition Guess and check

Guess and check

_____ is the most critical step of the decision-making process. Choosing an alternative Identifying and defining the problem Evaluating the alternatives Determining the set of alternatives

Identifying and defining the problem

An experiment consists of determining the speed of automobiles on a highway by the use of radar equipment. The random variable in this experiment is a _____. discrete random variable. continuous random variable. complex random variable. categorical random variable.

continuous random variable.

Compute the relative frequency of grade A for the data given in the table below:
Grades | Number of students
A | 16
B | 28
C | 33
D | 13
Total | 90
0.18 0.16 0.31 0.37

0.18

A nickel and a dime are tossed. If an event is defined as a single toss of both coins where at least one head appears, what is the probability of the complement of that event? 0 0.25 0.50 1.00

0.25

Given that A and B are independent events and P(A | B) = 0.7, what is P(A^c)? 0.7 0.3 0 1

0.3

Reviews of call center representatives over the last three years showed that 10% of all call center representatives were rated as outstanding, 75% were rated as excellent/good, 10% were rated as satisfactory, and 5% were considered unsatisfactory. For a sample of 10 reps selected at random, what is the probability that exactly 1 will be rated as unsatisfactory? 0.315 0.075 0.125 0.981

0.315
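This is a binomial probability with n = 10 trials and success probability p = 0.05 (a rep being rated unsatisfactory). A minimal sketch, with `binomial_pmf` as an illustrative helper name:

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 1 unsatisfactory rep out of 10, each with probability 0.05.
p_one_unsatisfactory = binomial_pmf(1, 10, 0.05)
print(round(p_one_unsatisfactory, 3))
```

In Excel the equivalent call would be `=BINOM.DIST(1, 10, 0.05, FALSE)`.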

In the probability table below, which value is a marginal probability?
Obstacle Course Level | Completed: No | Completed: Yes | Total
Challenging | 0.4 | 0.3 | 0.7
Easy | 0.1 | 0.2 | 0.3
Total | 0.5 | 0.5 | 1.0
0.1 1.0 0.5 0.4

0.5

Fast food restaurants pride themselves on being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive-thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability that it takes less than two minutes to fill an order? (You may use Excel.) 0.1813 0.4866 0.6321 0.736

0.736
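For an exponential distribution with mean 1.5, the rate is 1/1.5 = 2/3 and P(X < x) = 1 − e^(−rate·x). A short sketch (helper name is illustrative):

```python
from math import exp

mean_minutes = 1.5
rate = 1 / mean_minutes  # lambda = 2/3 orders per minute


def exponential_cdf(x, rate):
    """P(X < x) for an exponential random variable with the given rate."""
    return 1 - exp(-rate * x)


p_under_two = exponential_cdf(2, rate)
print(round(p_under_two, 3))
```

The corresponding Excel formula would be `=EXPON.DIST(2, 1/1.5, TRUE)`.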

A bucket contains 3 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and is not replaced. Another ball is taken from the bucket. What is the probability that the first ball is red and the second ball is yellow? 10/121 1/11 7/11 0.35

1/11
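Because the first ball is not replaced, the second draw is made from 11 balls rather than 12. Exact fractions make the arithmetic easy to verify:

```python
from fractions import Fraction

red, yellow, purple = 3, 4, 5
total = red + yellow + purple  # 12 balls to start

p_red_first = Fraction(red, total)             # 3/12
p_yellow_second = Fraction(yellow, total - 1)  # 4/11, one ball already removed

p_red_then_yellow = p_red_first * p_yellow_second
print(p_red_then_yellow)
```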

A survey of 100 random high school students finds that 85 students watched the Super Bowl, 25 students watched the Stanley Cup Finals, and 20 students watched both games. How many students did not watch either game? 15 30 10 20

10
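The count follows from inclusion-exclusion: students who watched both games are counted twice when the two totals are added, so they are subtracted once.

```python
surveyed = 100
super_bowl = 85
stanley_cup = 25
both = 20

# |A or B| = |A| + |B| - |A and B|
watched_at_least_one = super_bowl + stanley_cup - both
watched_neither = surveyed - watched_at_least_one
print(watched_at_least_one, watched_neither)
```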

Compute the 50th percentile for the following data (you may use Excel if you want): 10, 15, 17, 21, 25, 12, 16, 11, 13, 22. 18.6 15.5 17.7 13.3

15.5

Given a dataset with a mean of 60 and standard deviation of 10, what is the z-score for a data point of 80? 3 2 -2 0.5

2

A health conscious student faithfully wears a device that tracks his steps. Suppose that the distribution of the number of steps he takes in a day is normally distributed with a mean of 10,000 and a standard deviation of 1,500 steps. What percent of the days does he exceed 13,000 steps? 2.28% 5% 95% 97.72%

2.28%

The newest model of smart car is supposed to get excellent gas mileage. A thorough study showed that gas mileage (measured in miles per gallon) is normally distributed with a mean of 75 miles per gallon and a standard deviation of 10 miles per gallon. What is the probability that, if driven normally, the car will get 75 miles per gallon or better? 0.6% 2.5% 6% 50%

50%

A game at an arcade is in the form of a large wheel that a player spins. The wheel is programmed to give 2 tickets 50% of the time, 5 tickets 25% of the time, 10 tickets 23% of the time, and 100 tickets 2% of the time. If a player spins the wheel once, what is the expected number of tickets the player will win? 24.55 117 100 6.55

6.55
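The expected value is the probability-weighted sum of the payouts. A minimal sketch, with the wheel's payout table written as a dictionary for illustration:

```python
# Tickets awarded -> probability, copied from the question.
payouts = {2: 0.50, 5: 0.25, 10: 0.23, 100: 0.02}

# Sanity check: the probabilities should sum to 1.
total_probability = sum(payouts.values())

# E[X] = sum of (value * probability) over all outcomes.
expected_tickets = sum(tickets * prob for tickets, prob in payouts.items())
print(round(expected_tickets, 2))
```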

The random variable X is known to be uniformly distributed between 2 and 12. Compute E(X), the expected value of the distribution. 4 5 6 7

7

A bucket contains 2 red balls, 4 yellow balls, and 5 purple balls. One ball is taken from the bucket and then replaced. Another ball is taken from the bucket. What is the probability that the first ball is red and the second ball is yellow? 0.083 10/11 0.35 8/121

8/121

A company claims that its light bulbs are 95% reliable. If you purchase a pack of 10 light bulbs, what is the probability that exactly 9 of them work properly? =BINOM.DIST(9, 10, 0.05, FALSE) =BINOM.DIST(9, 10, 0.95, FALSE) =POISSON.DIST(9, 10, FALSE) =NORM.DIST(9, 10, 0.95, TRUE)

=BINOM.DIST(9, 10, 0.95, FALSE)

_____ acts as a representative of the population. The variance The variable A random variable A sample

A sample

A manager of a fast food restaurant wants the drive-thru employee to ask every fifth customer if he or she is satisfied with the service. Who makes up the population? All customers who use the drive-thru window of this fast food restaurant All survey respondents All customers of this restaurant The proportion of customers who say they are satisfied with their service

All customers who use the drive-thru window of this fast food restaurant

You are working on building a regression model for a marketing campaign. You compute the correlation between various predictor variables and find the following correlations: 1. Advertisement Reach and Website Visits have a correlation of 0.82. 2. Customer Age and Customer Loyalty have a correlation of 0.45. 3. Social Media Engagement and Product Sales have a correlation of -0.15. 4. CARDPROM and NUMPROM have a correlation of 0.95. Based on the guidance for identifying redundant variables, which pair(s) of variables would you consider redundant and potentially remove one from the analysis? Advertisement Reach and NUMPROM Customer Age and Customer Loyalty Social Media Engagement and Product Sales CARDPROM and NUMPROM

CARDPROM and NUMPROM

Which of the following best exemplifies big data? Five hundred Facebook users upload one thousand pictures per day. Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis. A local grocery store collects data from those that scan their loyalty card. A pharmacy keeps track of customer purchases to send its customers coupons.

Cellphone owners around the world generate vast amounts of data by calling, texting, tweeting, and browsing the Web on a daily basis.

_____ are collected from several entities at the same point in time. Cross-sectional data Categorical and quantitative data Random data Time series data

Cross-sectional data

Which one of the following approaches is used to control overfitting? Box-plot approach Cross-validation Forward variable selection Stepwise regression

Cross-validation

Given that some dates of birth are recorded as mm/dd/yyyy and others as mm/dd/yy, this kind of disparity is an example of: Measurement Error Manual Data Entry Mistake Data Inconsistency Data Accuracy

Data Inconsistency

A healthcare clinic maintains a digital patient record system. While reviewing the records, the system administrator observed that for some patients, the blood type was recorded as "O pos" while for others it was listed as "O+". This variation is an example of: System glitches Data redundancy Data inconsistency Data encryption

Data inconsistency

An e-commerce website displays products with their respective weights. Upon inspection, it was noticed that a 5-pound bag of rice was listed as "5 lb" in one product description and "5lbs" in another. This discrepancy represents: Product mislabeling Data inconsistency Data compression Data normalization

Data inconsistency

A retail store owner offers a discount on product A and predicts that the customers would purchase products B and C in addition to product A. Identify the technique used to make such a prediction. Data query Simulation Data mining Data dashboards

Data mining

The use of analytical techniques for better understanding patterns and relationships that exist in large data sets is _____. Decision making Hadoop Data mining Data cleaning

Data mining

Scores on Ms. Bond's test have a mean of 70 and a standard deviation of 11. David has a score of 52 on Ms. Bond's test. Scores on Ms. Nash's test have a mean of 64 and a standard deviation of 6. Steven has a score of 52 on Ms. Nash's test. Which student has the higher standardized score? Cannot be determined with the information provided. David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score. David's standardized score is 1.64 and Steven's standardized score is 2.00. Therefore, Steven has the higher standardized score. David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, Steven has the higher standardized score.

David's standardized score is -1.64 and Steven's standardized score is -2.00. Therefore, David has the higher standardized score.
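Both z-scores are negative, so the "higher" standardized score is the one closer to zero. A short check (the helper name is illustrative):

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev


david_z = z_score(52, 70, 11)   # Ms. Bond's test
steven_z = z_score(52, 64, 6)   # Ms. Nash's test

higher = "David" if david_z > steven_z else "Steven"
print(round(david_z, 2), round(steven_z, 2), higher)
```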

An online survey was distributed by a tech company to gather feedback on their newly released app. The survey included a mix of required and optional questions. After gathering the survey results, the company noticed that many respondents skipped the optional questions, resulting in blank cells in the dataset. Concerned about losing valuable insights, the company opted to eliminate the entries (rows) where respondents didn't answer all questions. Which approach to manage missing data is the company using? Use advanced statistical techniques Median imputation Deletion Mean imputation

Deletion

_____ are analytical tools that describe what has happened. Descriptive analytics Predictive analytics Simulation Prescriptive analytics

Descriptive analytics

_____ refers to the technology that allows data, collected from sensors in all types of machines, to be sent over the Internet to repositories where it can be stored and analyzed. Internet of Things (IoT) MapReduce Hadoop Advanced analytics

Internet of Things (IoT)

You are provided with a dataset from a pet store, which lists the types of pets people buy. The 'Pet Type' column includes categories like "Dog", "Cat", "Bird", and "Fish". How would this data look after applying the dummy variable transformation? Choose the correct transformation from the options below: Pet Type Dog Cat Bird Fish Is_Dog Is_Cat Is_Bird Is_Fish 1 1 0 0 1 0 1 0 1 0 0 1 Is_Dog Is_Cat Is_Bird Is_Fish 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 Dog Cat Bird 1 0 0 0 1 0 0 0 1 Pets Dog Cat Bird Fish

Is_Dog Is_Cat Is_Bird Is_Fish 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
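The transformation creates one 0/1 column per category, with exactly one 1 in each row. The sketch below uses plain Python and a hypothetical `to_dummies` helper; in practice a data library would usually do this step.

```python
# One observation per pet type, in the order listed in the question.
pet_types = ["Dog", "Cat", "Bird", "Fish"]
categories = ["Dog", "Cat", "Bird", "Fish"]


def to_dummies(values, categories):
    """Return one dict per row with an Is_<category> 0/1 indicator for each category."""
    return [{f"Is_{c}": int(v == c) for c in categories} for v in values]


dummies = to_dummies(pet_types, categories)
for row in dummies:
    print(row)
```

Some regression workflows drop one of the indicator columns to avoid perfect collinearity; the quiz answer keeps all four.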

A pharmaceutical company conducted clinical trials for a new drug. They collected data on various health metrics of participants throughout the trial. However, some participants missed certain check-ups, resulting in missing data in the dataset. The company decided to fill in these missing values using the average readings of the metrics. Which method of managing missing data did they employ? Deletion Mean imputation Regression imputation Median imputation

Mean imputation

An online streaming platform, like Netflix, wants to suggest shows and movies tailored to users' tastes. After collecting user activity data like watch history, search queries, and show ratings, they wish to identify patterns and features in this data, like genre preferences or watching times. Which step of the CRISP procedure best describes this activity? Data understanding Modeling Evaluation Deployment

Modeling

A manufacturing company produces light bulbs with a mean lifespan of 1,200 hours and a standard deviation of 50 hours. The lifespan of the light bulbs follows a normal distribution. Using Excel, what is the probability that a randomly chosen light bulb from this company will have a lifespan between 1,150 hours and 1,250 hours? NORM.DIST(1250,1200,50,TRUE) NORM.DIST(1150,1200,50,TRUE) NORM.DIST(1150,1200,50,TRUE) - NORM.DIST(1250,1200,50,TRUE) NORM.DIST(1250,1200,50,TRUE)−NORM.DIST(1150,1200,50,TRUE)

NORM.DIST(1250,1200,50,TRUE)−NORM.DIST(1150,1200,50,TRUE)
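The Excel expression subtracts the cumulative probability at the lower bound from the cumulative probability at the upper bound. The same calculation, sketched with Python's standard library:

```python
from statistics import NormalDist

# Bulb lifespan: normal with mean 1,200 hours and standard deviation 50 hours.
lifespan = NormalDist(mu=1_200, sigma=50)

# P(1150 < X < 1250) = CDF(1250) - CDF(1150)
p_between = lifespan.cdf(1_250) - lifespan.cdf(1_150)
print(round(p_between, 4))
```

Since the bounds sit one standard deviation on either side of the mean, the result is roughly 68%, in line with the empirical rule.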

In a normal distribution, which is greater, the mean or the median? Mean Median Neither the mean nor the median (they are equal) Cannot be determined with the information provided

Neither the mean nor the median (they are equal)

A call center receives an average of 10 calls every hour. Using Excel, what is the probability that the call center will receive exactly 7 calls in a given hour? POISSON.DIST(7, 10, FALSE) =NORM.DIST(7, 10, 3, TRUE) =BINOM.DIST(7, 10, 0.1, FALSE) =EXP(-7,10)

POISSON.DIST(7, 10, FALSE)
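`POISSON.DIST(7, 10, FALSE)` returns the Poisson probability mass at exactly 7 events when the mean is 10. A minimal sketch of the same formula, with `poisson_pmf` as an illustrative helper:

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)


# Exactly 7 calls in an hour when the average is 10 calls per hour.
p_seven_calls = poisson_pmf(7, 10)
print(round(p_seven_calls, 4))
```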

You've been hired as a junior data scientist for a sports analytics company. Your first task is to build a predictive model to determine the future performance of football (soccer) players based on various metrics. You've been handed a dataset that contains the following variables: Player's Name, Goals Scored, Assists Made, Minutes Played, Favorite Movie Genre, Position Played, Number of Yellow Cards, and Shoe Size. Which of these variables would you consider least relevant and likely remove first when building a model focused purely on player performance? Goals Scored, Assists Made, Position Played Player's Name, Favorite Movie Genre, Shoe Size Number of Yellow Cards, Minutes Played Position Played, Shoe Size, Assists Made

Player's Name, Favorite Movie Genre, Shoe Size

The __________ probability distribution can be used to estimate the number of vehicles that go through an intersection during the lunch hour. binomial normal triangular Poisson

Poisson

_____ analytics are techniques that use models, constructed from past data, to predict the future or to ascertain the impact of one variable on another. Predictive Descriptive Simulation Prescriptive

Predictive

In the spectrum of business analytics, which is the most complex? Descriptive Predictive Prescriptive Operational

Prescriptive

_____ analytics use techniques that take input data and yield a best course of action. Prescriptive Simulation Strategic Operational

Prescriptive

Which of the following analytical techniques helps us arrive at the best decision? Predictive analytics Data mining Prescriptive analytics Descriptive analytics

Prescriptive analytics

Which of the following is NOT an example of predictive analytics in the business world? Identifying fraudulent transactions by analyzing patterns in banking data. Showing weather patterns for the upcoming month. Forecasting future sales based on historical sales data. Optimizing supply chain operations by predicting demand.

Showing weather patterns for the upcoming month.

Identify the shape of the distribution in the figure below.

Skewed right

A nutritionist conducted a study to understand if there is a relationship between the number of hours a person watches TV and the amount of Vitamin C they have in their bloodstream. After analyzing data from a sample of individuals, the calculated correlation coefficient was 0.00. Based on this result, how would you describe the strength of the relationship between the number of hours a person watches TV and the amount of Vitamin C in their bloodstream? There is a strong positive relationship. There is a strong negative relationship. There is no linear relationship. The relationship is undetermined.

There is no linear relationship.

Why is normalization of features important when building predictive models? To increase the variability in the dataset. To ensure that all features contribute equally to the model's prediction. To convert categorical data to numerical data. To handle missing values in the dataset.

To ensure that all features contribute equally to the model's prediction.

One of the 4 Vs of big data that refers to uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations is _____. Volume Velocity Variety Veracity

Veracity

A student willing to participate in a debate competition is required to fill out a registration form. State whether each of the following pieces of information about the participant provides categorical or quantitative data. a. What is your birth month? b. Have you participated in any debate competition previously? c. If yes, in how many debate competitions have you participated so far? d. Have you won any of the competitions? e. If yes, how many have you won?
a. Quantitative, b. Categorical, c. Quantitative, d. Categorical, e. Quantitative
a. Categorical, b. Categorical, c. Categorical, d. Categorical, e. Quantitative
a. Categorical, b. Categorical, c. Quantitative, d. Categorical, e. Quantitative
a. Categorical, b. Categorical, c. Quantitative, d. Categorical, e. Qualitative

a. Categorical, b. Categorical, c. Quantitative, d. Categorical, e. Quantitative

Corporate-level managers use ______ to summarize sales by region, current inventory levels, and other company-wide metrics all in a single screen. simulations crosstabulation data dashboards tables

data dashboards

The U.S. Internal Revenue Service uses _____ to identify patterns that distinguish questionable annual personal income tax filings. utility theory prescriptive analytics data mining decision analysis

data mining

Data dashboards are a type of _____ analytics. predictive descriptive prescriptive decision

descriptive

Fast food restaurants pride themselves on being able to fill orders quickly. A study was done at a local fast food restaurant to determine how long it took customers to receive their order at the drive-thru. It was discovered that the time it takes for orders to be filled is exponentially distributed with a mean of 1.5 minutes. What is the probability density function for the time it takes to fill an order? f(x) = (1/5)e^(-x/5) f(x) = (1/3)e^(-x/3) f(x) = (2/3)e^(-(2/3)x) None of these are correct.

f(x) = (2/3)e^(-(2/3)x)

A summary of data that shows the number of observations in each of several nonoverlapping bins is called a(n) _____. sample summary frequency distribution bin distribution observed distribution

frequency distribution

A _____ is a graphical summary of data previously summarized in a frequency distribution. line chart box plot histogram scatter chart

histogram

A dashboard is a collection of tables, charts, and maps to help management _____ selected aspects of the company's performance. predict prescribe monitor underline

monitor

A forecast that helps direct police officers to areas where crimes are likely to occur based on past data is an example of _____. predictive analytics decision analysis prescriptive analytics descriptive analytics

predictive analytics

A mathematical model that gives the best decision, subject to the situation's constraints, is a(n) _____ model. prescriptive predictive descriptive pie

prescriptive

Data-driven decision making tends to decrease a firm's _____. market value productivity risk profit

risk

A __________ describes the range and relative likelihood of all possible values for a random variable. normal distribution statistical distribution density function probability

statistical distribution

If the correlation between two variables is near 0, it implies that ______. the variables are negatively related a positive relationship exists between the variables the variables are strongly related the variables are not linearly related

the variables are not linearly related

Which of the following is an example of a continuous random variable? gender. marital status. time. population of a city.

time.

Data collected from several entities over a period of time (minutes, hours, days, etc.) are called _____. source data cross-sectional data categorical and quantitative data time series data

time series data

A _____ determines how far a particular value is from the mean relative to the data set's standard deviation. variance coefficient of variation percentile z-score

z-score

A student scored 85 marks in a SOM 307 test. The class mean score for the test is 78, with a standard deviation of 5 marks. Calculate the student's z-score and interpret its meaning. z=7; The student scored 7 standard deviations above the mean. z=1.4; The student scored 1.4 standard deviations above the mean. z=−1.4; The student scored 1.4 standard deviations below the mean. z=0.85; The student scored 0.85 standard deviations above the mean.

z=1.4; The student scored 1.4 standard deviations above the mean.

