Data for Decision Making
You are interning at a large e-commerce company as a data analyst. Your manager sends you a large file containing over 500,000 records. Each record represents a transaction, including the revenue (i.e., the dollar amount of the sale) generated from that transaction. If your task is to send back a report showing the quarterly revenue for 2020, how many rows of data will be shown in your report?
4
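As a minimal sketch of why a quarterly report for one year collapses to one row per quarter, the snippet below groups transactions by calendar quarter. The sample transactions and the `quarter_of` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample: (transaction date, revenue in dollars).
transactions = [
    (date(2020, 1, 15), 120.00),
    (date(2020, 2, 3), 80.50),
    (date(2020, 5, 20), 45.25),
    (date(2020, 8, 9), 310.00),
    (date(2020, 11, 30), 99.75),
    (date(2020, 12, 24), 150.00),
]

def quarter_of(d):
    """Map a date to its calendar quarter (1 to 4)."""
    return (d.month - 1) // 3 + 1

# Group by quarter and sum revenue within each group.
quarterly_revenue = defaultdict(float)
for sold_on, revenue in transactions:
    if sold_on.year == 2020:
        quarterly_revenue[quarter_of(sold_on)] += revenue

for q in sorted(quarterly_revenue):
    print(f"Q{q} 2020: {quarterly_revenue[q]:.2f}")
```

However many transactions go in, the grouped result has at most one row per quarter.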
The database model shown below is partially completed. Assuming that the model should reflect that each invoice can document the acquisition of many materials in the invoice's details, and each material can be acquired many times through details on invoices, which of the following describes relationship(s) that should be added to the model based on this information? NOTE: In the answers below, 1:M reads as "One-to-Many".
1:M relationship from INVOICE to INVOICE_DETAILS and a 1:M relationship from MATERIAL to INVOICE_DETAILS
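The two 1:M relationships can be sketched with Python's built-in `sqlite3` module. The column names below are assumptions (the original diagram is not shown); the point is that the INVOICE_DETAILS table carries a foreign key to each parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical columns; only the key structure matters here.
conn.executescript("""
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT);
CREATE TABLE material (material_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice_details (
    invoice_id  INTEGER NOT NULL REFERENCES invoice(invoice_id),
    material_id INTEGER NOT NULL REFERENCES material(material_id),
    quantity    INTEGER,
    PRIMARY KEY (invoice_id, material_id)
);
""")

conn.executemany("INSERT INTO invoice VALUES (?, ?)",
                 [(1, "2020-01-05"), (2, "2020-01-09")])
conn.executemany("INSERT INTO material VALUES (?, ?)",
                 [(10, "steel"), (11, "copper")])
conn.executemany("INSERT INTO invoice_details VALUES (?, ?, ?)",
                 [(1, 10, 5), (1, 11, 2), (2, 10, 7)])

# One invoice lists many materials...
materials_on_invoice_1 = conn.execute(
    "SELECT COUNT(*) FROM invoice_details WHERE invoice_id = 1").fetchone()[0]
# ...and one material appears on many invoices.
invoices_for_material_10 = conn.execute(
    "SELECT COUNT(*) FROM invoice_details WHERE material_id = 10").fetchone()[0]
print(materials_on_invoice_1, invoices_for_material_10)
```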
Which of the following statements is correct? A. A child table must have a foreign key. B. Every parent table must have a foreign key. C. A zip code attribute is a good selection for primary in a customer table. D. A child table will either have a foreign key or a primary key. E. A foreign key field cannot be duplicated.
A child table must have a foreign key.
Consider the diagram below. Which of the following best describes the data model regarding clients making payments on loans?
A client can make payments towards multiple loans.
What is a primary key?
A field that uniquely identifies a record in a table.
A Word cloud will be least effective in visualizing
A paragraph where each word occurs the same number of times
Karen downloaded App review data from the Apple App Store. Using her data skills, Karen created a data model that accurately represents the review data. Referring to the data model below, which table(s) would be required in a query to answer a question about the average number of installations by App?
App
When creating a column chart that displays the total quantity sold of each type of cookie (e.g., chocolate chip, sugar), what type of visualization could be used to allow the user of the chart to view the total quantity sold of each type of cookie for a specific region of the United States in the column chart?
A slicer configured to slice on the region field
Which of the following must be included in the introduction statement for a survey?
A statement that motivates the respondent to participate in the survey
Which of the following is used by companies to make their data accessible as public sources?
API
Sam's Subs collected customers' email addresses during a customer satisfaction survey. In the survey, it was optional to provide an email address. Sam would like to send a promotional coupon to customers using these email addresses, but he noticed some emails are missing the '.' (dot character), for example mike@gmailcom. He would like to identify the bad emails. Which cleaning strategy technique should Sam use?
Add a conditional column to indicate whether or not an email is valid based on whether or not the email contains a dot (.) character
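A conditional column of this kind can be sketched in plain Python. The sample emails and the `has_dot` helper are hypothetical, and the dot check is only the rule described in this card, not full email validation.

```python
# Hypothetical survey responses; None stands for a skipped (optional) email.
emails = ["mike@gmailcom", "ana@example.com", None, "lee@mail.org"]

def has_dot(email):
    """Return True when an email was provided and contains a dot."""
    return email is not None and "." in email

# The new 'is_valid' field plays the role of the conditional column.
rows = [{"email": e, "is_valid": has_dot(e)} for e in emails]

# Provided emails that fail the check are the ones to review.
bad_emails = [r["email"] for r in rows
              if r["email"] is not None and not r["is_valid"]]
print(bad_emails)
```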
Jerry would like to create a new attribute in a table that is equal to multiplying two other attributes in that table. Which of the following strategies will allow Jerry to create this new attribute?
Add a custom column
Sharon would like to create a new attribute in a table that is equal to an existing attribute in that table plus 0.5. Which of the following strategies will allow Sharon to create this new attribute?
Add a custom column
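Both custom-column cards above boil down to computing a new field from existing fields, row by row. A minimal sketch, with field names that are invented for illustration:

```python
# Hypothetical rows; 'unit_price', 'quantity' and 'rating' are assumed field names.
rows = [
    {"unit_price": 2.5, "quantity": 4, "rating": 3.0},
    {"unit_price": 1.0, "quantity": 10, "rating": 4.5},
]

for row in rows:
    # Custom column 1: product of two existing attributes.
    row["line_total"] = row["unit_price"] * row["quantity"]
    # Custom column 2: an existing attribute plus 0.5.
    row["rating_plus_half"] = row["rating"] + 0.5

print(rows[0]["line_total"], rows[0]["rating_plus_half"])
```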
If the purpose of this visualization is to compare the number of monthly orders in 2017 and 2018, which of the following changes should be made to the visualization in order to improve its effectiveness?
Add axis titles and a graph title
Each of the following represents sound advice when it comes to determining the appropriate visualization type, except _____. A. Consider the content that you're trying to visualize B. Focus on requirements. That is, what is the business need and how is this information going to be used, and by whom C. Consider the data available for your visualization D. Consider any aggregation (e.g., sum, count, average) or filter requirements E. All of above
All of above
Which of the following visualizations can be made interactive using a slicer? A. Line B. Cluster column C. Scatter Plot D. Stacked Bar E. All of Above
All of above
Tom sent his last month's sales report to his supervisor. The report shows sales for each customer, for each product category and for each state. His supervisor returned the report back to Tom and asked for a consistency check. Which of the following needs to be considered while checking for consistency? A. Does the sum of each state's sales equal the overall total amount of sales? B. All of the other answers should be considered when evaluating consistency C. Is all summary information in agreement with detailed information? D. Does the sum of each product category sales equal the overall total amount of sales? E. Was each sale conducted using the same currency?
All of the other answers should be considered when evaluating consistency
Which of the following is a common problem with secondary data? A. Different objectives B. Outdated information C. All other choices are correct D. Possible low data quality E. Different unit of measurement
All other choices are correct
What is the primary difference between an entity and an attribute?
An entity is a table that stores information about objects, whereas an attribute is a column or specific field of the data elements associated with an entity.
If we store information about the performance of a business process, such as average number of times per month we cycle through our inventory, that information is known as _____.
Analytical information
Consider Table 1 and Table 2 below. What steps would need to be performed in order to combine Table1 and Table 2 to create a query showing average graduation rate by state?
Append Table1 and Table2
Consider Table 1 and Table 2 below. What steps would need to be performed in order to combine Table1 and Table 2 to create a query showing average graduation rate of all schools?
Append Table1 and Table2
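Appending stacks the rows of two tables that share the same columns, after which either aggregation (by state, or across all schools) runs on the combined rows. A sketch, assuming both tables have `school`, `state` and `grad_rate` fields (the original tables are not shown, so these names and values are invented):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tables with identical columns.
table1 = [
    {"school": "A", "state": "PA", "grad_rate": 80.0},
    {"school": "B", "state": "OH", "grad_rate": 70.0},
]
table2 = [
    {"school": "C", "state": "PA", "grad_rate": 90.0},
    {"school": "D", "state": "OH", "grad_rate": 60.0},
]

# Append: stack the rows of Table2 under Table1.
combined = table1 + table2

# Average graduation rate by state (group by state, then average).
rates_by_state = defaultdict(list)
for row in combined:
    rates_by_state[row["state"]].append(row["grad_rate"])
avg_by_state = {state: mean(rates) for state, rates in rates_by_state.items()}

# Average graduation rate of all schools (no grouping).
overall_avg = mean([row["grad_rate"] for row in combined])
print(avg_by_state, overall_avg)
```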
Which of the following survey questions is not constructed correctly?
Are you satisfied with your pay and work environment?
Which of the following is a "tool" composed of multiple visualizations (usually interactive) that allow the user to draw their own conclusions (and thus support decision making) by examining the data?
Business intelligence dashboard
A column classifying customers by their gender is an example of what type of variable?
Categorical - Nominal
Which of the following is an advantage of closed-ended questions?
Closed-ended questions are easier to tabulate and analyze
A car dealership's month-end report includes the number of cars sold by car make (e.g., Porsche, Ford) and vehicle type (e.g., SUV, truck) within the given month. Which visualization type would be best for presenting the number of cars sold that month by vehicle make and type?
Clustered bar chart
A grocery store's daily sales report includes the total number of products sold by department (e.g., meat, produce, dairy) and brand (e.g., Sunkist, Perdue). Which visualization type would be best for presenting the total number of products sold per day by department and brand?
Clustered bar chart
Consider the data below that was collected about a few songs. Considering only the given data, which of the attributes meets the requirements for a primary key?
Code
An attribute in a data model is equivalent to a _______ in the database table.
Column
The Giant Beagle is a regional food store chain that uses a customer loyalty card to collect information about customer purchase habits. The loyalty cards are free, and the sign-up process is, frankly, quite lax. A summer intern was asked to evaluate the quality of the data. In her evaluation, she identified a recurring problem: the recording of the customer's email address in the zip code field. From a data quality perspective, the loyalty card data is an example of data that is not ___________.
Consistent or Accurate
Consider the ecommerce dashboard shown below. The choice of white text on the navy background is an example of which professional design principle?
Contrast
Consider the sample data below regarding purchases made via an e-commerce website. Which of the following transformations should be performed to determine the number of unique customers that have placed an order at least one time?
Count distinct values on customerID column
Which of the following attributes would contain discrete numerical data?
Count of the number of persons living in a household
The following is a snapshot of a database that captures the movie tickets sold in 2018. Primary keys are underlined, foreign keys are italicized. The first table is the movie table. The second table is the order table. Each table has more records than shown in the figure. You would like to have a list of movies longer than 80 minutes in the Crime genre. The list should also show the lowest budget movie in the first row. How would you achieve that?
Create a filter that indicates the Length must be greater than 80 and Genre must equal Crime, and then sort by Budget ascending
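The filter-then-sort answer can be sketched as follows: keep rows where length exceeds 80 and genre equals Crime, then sort by budget ascending so the lowest-budget movie comes first. The movie rows are invented placeholders.

```python
# Hypothetical movie table; titles and values are placeholders.
movies = [
    {"title": "Movie A", "length": 95, "genre": "Crime", "budget": 40_000_000},
    {"title": "Movie B", "length": 78, "genre": "Crime", "budget": 5_000_000},
    {"title": "Movie C", "length": 120, "genre": "Drama", "budget": 10_000_000},
    {"title": "Movie D", "length": 110, "genre": "Crime", "budget": 12_000_000},
]

# Filter: both conditions must hold (AND).
long_crime = [m for m in movies if m["length"] > 80 and m["genre"] == "Crime"]

# Sort ascending by budget, so the cheapest movie is in the first row.
result = sorted(long_crime, key=lambda m: m["budget"])
print([m["title"] for m in result])
```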
Refer to the product, order, and order_line tables below. Which of the following best describes what is wrong with the row highlighted in yellow?
Creating this row in order_line violates a referential integrity constraint because order 024 does not exist.
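A referential integrity violation of this kind can be reproduced with Python's built-in `sqlite3` module. The schema below is an assumption (the original tables are not shown), and the parent table is named `orders` here only to avoid the SQL reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

# Hypothetical parent and child tables.
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY)")
conn.execute("""
CREATE TABLE order_line (
    order_id   TEXT NOT NULL REFERENCES orders(order_id),
    product_id TEXT NOT NULL,
    quantity   INTEGER
)
""")

conn.execute("INSERT INTO orders VALUES (?)", ("023",))
# Allowed: order 023 exists in the parent table.
conn.execute("INSERT INTO order_line VALUES (?, ?, ?)", ("023", "P1", 2))

# Rejected: order 024 has no matching row in the parent table.
try:
    conn.execute("INSERT INTO order_line VALUES (?, ?, ?)", ("024", "P1", 1))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("insert for order 024 rejected:", rejected)
```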
The database follows the below data model that has three related tables: Customer, Order, Product. Which of the following tables should be used in a query in order to show the total number of customers from the state of PA?
Customer table
______ involves the visual representation of data, ranging from single charts to comprehensive dashboards.
Data Visualization
Which of the following allows an additional question to be added to a respondent's survey based on a specific answer to a prior question?
Display logic
Which of the following survey questions is incorrectly constructed?
Do you enjoy spending time with your friends and family?
Consider the diagram below. Which of the following statements is correct about the relationships shown in the model?
Each bank account belongs to only one customer.
Bill's Goatee Grooming Paraphernalia (BGGP) spent Q3 and Q4 of the last calendar year implementing various strategies designed to boost its on-line sales. The figure below reflects BGGP's total sales for last year. Which of the following statements accurately reflects the results of BGGP's efforts to boost on-line sales?
Efforts to boost on-line sales were unsuccessful
Joe works for a company and his address and contact information are stored in the Employee_Contact entity shown below. His bank account information for paycheck direct deposit is stored in the Employee_Deposit entity. Joe's paycheck is deposited into two different accounts at two different banks. Which field (attribute) serves as the Foreign Key in the Employee_Deposit entity?
Employee_ID
Consider the sample data and query below regarding hotel room reservations. Primary keys are underlined, foreign keys are italicized. Which of the following is a claim that could be supported by the QUERY RESULT showing breakfast_id and how many times that breakfast was served?
English breakfast is the most popular breakfast
Which of the following statements about primary keys is incorrect? A. Values in a primary key column should not be modified or updated. B. Every record must have a primary key value. C. A primary key field cannot contain nulls. D. Every primary key must be linked to a foreign key. E. No two records can have the same primary key value.
Every primary key must be linked to a foreign key.
Consider the sample data shown below. What type of query was used to produce the query result?
Filter products.product_name to include products that contain Brownie AND Sugar Cookie, then group by products.product_name with sum of quantity
Consider the sample data shown below. If you want to create a query that shows only the Cookie products, which of the following transformations would you need to perform?
Filter products.product_name to include products whose name contains Cookie
The following image is a snapshot of two queries (MOVIES and ORDER) that capture 2018 movie ticket sales. In each table, primary keys are underlined and foreign keys are italicized. Each query contains more records than is shown in the image. Which of the following answers describes steps that can be used to determine the total count of tickets sold for each movie on 10/21/2018? The results must, at a minimum, display the total count of tickets sold for each movie by name. NOTE: The order of the fields in the result is not important.
Filter the ORDER query's Date column for the date 10/21/2018 only. Group the records in the ORDER query by Movie id using sum aggregation on the Number of tickets field creating a result that displays total number of tickets sold by Movie id. Merge the MOVIE table into the current query using an Inner Join based on the Movie Id field found in both queries. Expand the merged MOVIE fields and retain the Movie title field.
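The three steps in this answer (filter by date, group with a sum, inner-join to the movie table) can be sketched in plain Python. All rows and field names below are hypothetical stand-ins for the ORDER and MOVIES queries.

```python
from collections import defaultdict

# Hypothetical ORDER rows.
orders = [
    {"order_id": 1, "movie_id": "M1", "date": "10/21/2018", "tickets": 2},
    {"order_id": 2, "movie_id": "M2", "date": "10/21/2018", "tickets": 1},
    {"order_id": 3, "movie_id": "M1", "date": "10/22/2018", "tickets": 4},
    {"order_id": 4, "movie_id": "M1", "date": "10/21/2018", "tickets": 3},
]
# Hypothetical MOVIES rows.
movies = [
    {"movie_id": "M1", "title": "Movie A"},
    {"movie_id": "M2", "title": "Movie B"},
    {"movie_id": "M3", "title": "Movie C"},
]

# Step 1: filter orders to the target date.
filtered = [o for o in orders if o["date"] == "10/21/2018"]

# Step 2: group by movie id and sum the number of tickets.
totals = defaultdict(int)
for o in filtered:
    totals[o["movie_id"]] += o["tickets"]

# Step 3: inner join on movie id, keeping only the title from MOVIES.
titles = {m["movie_id"]: m["title"] for m in movies}
result = [{"title": titles[mid], "tickets_sold": total}
          for mid, total in totals.items() if mid in titles]
print(result)
```

Movie C does not appear in the result because an inner join keeps only movie ids present on both sides.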
Consider the sample data below. How should one transform the data to determine the total procedure fees charged for procedures on Joe Kelly?
Filter the procedure table to only include patientID 2 and then sum the procedure fee column
Consider the sample data below. Primary keys are underlined, foreign keys are italicized. How should one transform the data to determine the total of fees charged for procedure type 43192?
Filter the procedure table to only include procedure type 43192 and then sum the procedure fee column
What aspect of a table will make it possible to retrieve additional information about each row from another table?
Foreign key
Consider Table 1 below. How should one transform the data to create a query showing the number of customers at each company?
Group by company name and count rows within each group
Consider the sample data shown below. How would you determine the total transaction amount for each payment type used by customers?
Group by payment_type and sum the total_transaction_amount within each group
When importing a commodity, the Shipper is the company who is the supplier of the commodity and Shipment is the cargo of commodities. Given the relationship shown below between Shipment and Shipper, how could you determine the number of shipments carried out by each ShipperID?
Group by the SHIPPER ID in the SHIPMENT table and count rows in the SHIPMENT table
Consider the sample data and query below regarding purchases made via an e-commerce website. What type of transformation was used to produce the query result?
Group by with count aggregation
Consider the sample data and query below regarding purchases made via an e-commerce website. What type of transformation was used to produce the query result?
Group by with sum aggregation
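The difference between the two group-by answers above is only the aggregation applied within each group. A small sketch on invented purchase data:

```python
from collections import Counter, defaultdict

# Hypothetical purchases: (customer id, order amount).
purchases = [("c1", 20.0), ("c2", 15.0), ("c1", 5.0), ("c3", 40.0), ("c1", 10.0)]

# Group by customer with COUNT aggregation: number of rows per customer.
order_count = Counter(customer for customer, _ in purchases)

# Group by customer with SUM aggregation: total amount per customer.
amount_sum = defaultdict(float)
for customer, amount in purchases:
    amount_sum[customer] += amount

print(dict(order_count))
print(dict(amount_sum))
```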
The following is a snapshot of a database that captures the movie tickets sold in 2018. Primary keys are underlined, foreign keys are italicized. The first table is the movie table. The second table is the order table. Each table has more records than shown in the figure. You would like to know the average budget for each genre. How would you achieve this?
Group the records in movie table by genre and average budget within each group
Which of the following is NOT a goal of infographics? A. Simplify complex information B. Engage audiences by telling a story C. Condense long documents into fewer pages D. Incorporate images unrelated to the content E. Enhance boring text documents with illustrative visuals
Incorporate images unrelated to the content
What error was made in constructing the following survey question?
It is a double-barreled question
What error was made in constructing the following survey question? Indicate your agreement with the following statement: Pizza and spaghetti are delicious foods. A. Strongly agree B. Somewhat agree C. Neither agree or disagree D. Somewhat disagree E. Strongly disagree
It is a double-barreled question.
Consider the visualization below that shows total units sold by month and manufacturer (e.g., Aliqui, Natura). During what month of 2014 did total units sold of Natura products exceed total units sold of VanArsdel products?
June
You would like to show the trend in Apple's daily stock price during the last quarter. Which of the following visualizations is appropriate?
Line Chart
When considering the line chart below, we can clearly conclude that the Budget Variance % is ________.
Lowest in Month 4
Shown below is a visualization created by Borizen Wireless using two customer related variables. The techsavy variable (shown on the Y axis) reflects customer use of Borizen's advanced technology features and the income variable (shown on the X axis) reflects customer annual income in dollars. If a "dot" in the figure below represents one customer, which of the following statements is true?
More lower income customers have techsavy scores above 30 than higher income customers
Consider the product data below that was collected for a few of the products available in an apparel store. When designing a product table that will contain all of the store's products, which of the following attributes could be used as the primary key? That is, which of the following attributes meets the requirements for a primary key? A. price B. department C. size D. color E. None of the other choices is correct.
None of the other choices is correct
Jessie works for United Health. Using her data skills, Jessie created a data model that accurately represents the relationships between Patients, Visits, Doctors and doctor Specialty. Referring to the data model below, which table(s) would be required in a query whose purpose is to display the count of visits by patient? The query results must display the patient's last name along with the count of visits.
PATIENT and VISIT
A primary key combined with a foreign key creates:
Parent-Child relationship between the tables that connect them
Consider the visualization below. Which of the following claims can be supported by this visualization?
Pennsylvania has the highest number of calls to customer service among households with 1 to 3 individuals
Consider the visualization below. The X axis represents order amount ($) and Y axis represents shipping cost ($) across several transactions in a warehouse system. Which kind of relationship does the visualization reveal?
Positive; as order amount increases shipping cost increases
JoAnn works for a market research firm. Her firm has been hired by the Giant Beagle food stores to gauge customers' interest in an increased variety of international foods being available at the stores. As project manager, JoAnn has decided that a survey of a random sample of Giant Beagle's existing customers is the best way to collect data to answer this question. JoAnn is suggesting a _____ data collection effort A. secondary B. none of the other answers are correct C. convenient D. purposeful E. primary
Primary
Sam is collecting information for the quarterly board meeting. She is very thorough and double-checks the information used for the meeting to ensure high-quality information is used. Which of the following should she NOT consider as it is not a characteristic of high-quality information?
Quantity
You would like to explore the relationship between the time spent on social media (e.g. 3 hours) and the time spent on physical exercise (e.g. 2 hours). Which of the following visualizations is appropriate?
Scatter plot
Sebastian works for a local hardware store and is seeking information about upcoming weather patterns so that he can decide whether to purchase additional shovels, snow blowers, and salt from his supplier. He is able to obtain weather data from Weather.com for the next 60 days and view it in Power BI. What type of data source is this?
Secondary public data
If Duquesne was trying to determine how many people visit campus each day, which of the following represents the best source of data to answer this business question?
Send a survey to all DUQ.EDU email addresses and ask them how many times they visit campus each week
Java-O, a coffee franchise, would like to know what their customers like the most about their coffee offerings (e.g. taste, smell, price, etc.). Which data source should Java-O choose based on their objective?
Send an online survey to existing customers
JuJu works for Yachts-R-Us, and he wants to identify the 5 most expensive yachts listed in the Product table. Assuming the yacht price is an attribute in the Product table, JuJu should ______.
Sort the Product table in Descending order by price and look at the first 5 yachts
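Sorting in descending order and taking the first five rows can be sketched like this. The product names and prices are made up.

```python
# Hypothetical Product table: (name, price in dollars).
products = [
    ("Y1", 250_000), ("Y2", 900_000), ("Y3", 120_000), ("Y4", 475_000),
    ("Y5", 1_300_000), ("Y6", 60_000), ("Y7", 710_000),
]

# Sort by price, highest first, then keep the first 5 rows.
top5 = sorted(products, key=lambda p: p[1], reverse=True)[:5]
print([name for name, _ in top5])
```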
Jeremy works for a veterinarian's office and has access to the data about customers' pets. He is sourcing this data to determine the average age of each type of pet (e.g., dog, cat, snake) the office serves. What type of sourcing does this represent?
Sourcing from the internal enterprise database
Sebastian works for a local hardware store and is seeking information about upcoming weather patterns so that he can decide whether to purchase additional shovels, snow blowers, and salt from his supplier. He is able to obtain weather data from Weather.com for the next 60 days and view it in Power BI. What type of sourcing does this represent?
Sourcing via web scraping
A street address attribute of a customer table should not be the primary key since ______.
Street address is likely to change.
A university dining hall is running different queries based on their transactions in 2020 to answer different business questions. Which of the below queries would return the largest number of rows?
Students who live on campus or their first name starts with T or are in the School of Business
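The reason an OR query returns the most rows is that a row qualifies when any single condition holds, whereas AND requires all of them. A sketch on invented student rows:

```python
# Hypothetical student rows.
students = [
    {"first_name": "Tom", "on_campus": True, "school": "Business"},
    {"first_name": "Ana", "on_campus": True, "school": "Nursing"},
    {"first_name": "Tia", "on_campus": False, "school": "Music"},
    {"first_name": "Raj", "on_campus": False, "school": "Business"},
    {"first_name": "Lee", "on_campus": False, "school": "Law"},
]

def conditions(s):
    """The three conditions from the card, evaluated for one student."""
    return (
        s["on_campus"],
        s["first_name"].startswith("T"),
        s["school"] == "Business",
    )

# OR: at least one condition is true.
any_match = [s for s in students if any(conditions(s))]
# AND: every condition is true.
all_match = [s for s in students if all(conditions(s))]

print(len(any_match), len(all_match))
```

Every row that satisfies the AND version also satisfies the OR version, so the OR result can never be smaller.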
Which of the following indicates that the Customers table shown below is not well-structured?
The data in the table represents more than one entity.
Consider the database model below. Assuming the model should show that staff members and clients can have many appointments at a salon, and salons can house many appointments, which of the following describes the error in the database model?
The direction of the relationship between Salon and Appointment tables is incorrect.
You collected the product reviews your organization has received on social media and created the chart shown below. The chart shows the topics of the reviews (e.g. reviews about pricing, customer service etc) as well as the sentiment of the reviews (i.e. whether the review was a positive, negative or a neutral review). Which of the following conclusions can be drawn based on the chart?
The least frequent customer review topic is reliability.
In a survey, what is wrong with the following set of responses for the question: "What is your current age?" if only one response can be selected by respondents? 1-5 5-10 10-20 20-30 30-40
The responses are not mutually exclusive AND not exhaustive
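Both flaws can be checked mechanically: an age that falls into two ranges shows the options are not mutually exclusive, and an age that falls into none shows they are not exhaustive. A minimal sketch (the `matching_bins` helper is hypothetical):

```python
# Response options from the card, as inclusive (low, high) ranges.
bins = [(1, 5), (5, 10), (10, 20), (20, 30), (30, 40)]

def matching_bins(age):
    """Return every response option that a given age could select."""
    return [(low, high) for low, high in bins if low <= age <= high]

# Age 5 fits two options: the responses are not mutually exclusive.
print(matching_bins(5))
# Age 45 fits no option: the responses are not exhaustive.
print(matching_bins(45))
```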
Which of the following is not an advantage of closed-ended questions?
They allow more freedom in response
You are cleaning a data set. The address column includes street name, city, state and zip code. It follows the format "StreetName, City; State_Zip". For example, one address is recorded as "600 Forbes Avenue, Pittsburgh; PA_15282". You are trying to extract the zip code information (15282) using a delimiter. Which delimiter should you use?
Underline and comma
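As a rough sketch of delimiter-based splitting on the example address, the snippet below splits on the underscore to isolate the zip code. It assumes every address follows exactly the format given in the card.

```python
address = "600 Forbes Avenue, Pittsburgh; PA_15282"

# Split once on the last underscore: everything after it is the zip code.
before_underscore, zip_code = address.rsplit("_", 1)

# Splitting the remainder on the comma separates the street from the rest.
street, city_state = before_underscore.split(",", 1)

print(zip_code)
print(street)
```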
Jack (employee ID 115) recently got married and he and his wife both adopted a new surname. As his email address also changed, a new record was entered into the employee table in the organization's database under employee ID 77 to store his new email address. His paychecks are recorded using his new employee ID (77), while his performance reviews and promotion information are recorded using his old employee ID (115) that is associated with his previous name. Which of the common characteristics of high-quality information did the database administrator disregard?
Uniqueness
Effective _____ significantly reduce the amount of time it takes for your audience to process information and access valuable insights.
Visualization
Which of the following questions should be listed near the end of a survey?
What is your household income?
Which of the following statements is correct regarding count vs count distinct?
When counting values of the primary key column, count and count distinct will always give the same answer
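The count versus count distinct distinction can be illustrated on a tiny, invented order table where `order_id` is the primary key and `customer_id` repeats:

```python
# Hypothetical orders: (order_id as primary key, customer_id).
orders = [(1, "c1"), (2, "c2"), (3, "c1"), (4, "c3"), (5, "c1")]

order_ids = [order_id for order_id, _ in orders]
customer_ids = [customer_id for _, customer_id in orders]

# Primary key values are unique, so count and count distinct agree.
count_pk, distinct_pk = len(order_ids), len(set(order_ids))

# A non-key column can repeat, so the two counts can differ.
count_customers, distinct_customers = len(customer_ids), len(set(customer_ids))

print(count_pk, distinct_pk, count_customers, distinct_customers)
```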
Which of the following answers best describes the purpose of a query?
a query involves the identification and transformation of data to answer a question
If one creates a relationship and chooses to enforce referential integrity, then __________________.
a value must exist in the primary key column of the parent table before it can be entered into the foreign key column of the child table
One of the biggest mistakes made in data sourcing is to:
collect data before setting objectives
Using orange and green text together in a design will make it look more professional because it creates:
contrast
Giant Beagle, a chain of local grocery stores, wants to survey customers to decide what type of new cookies to offer in the stores' bakery departments. A team of survey takers visited each store on Saturday and Wednesday and asked customers browsing the bakery department what type of cookie they would be interested in buying. What type of sampling strategy did Giant Beagle use?
convenience
Consider the sample data shown below. How would you determine the number of unique customers who own a car?
count distinct customer_id
What is data sourcing?
Identifying sources of data that can be collected to answer business questions and support decision-making
What is the best placement for the following survey question: "What is your age?"?
end of survey
It is important to utilize clean data when performing analyses to answer business questions in order to _____.
ensure the integrity of the analysis results
A state fair had 500,000 people in attendance. The fair committee performed a survey to understand what fairgoers enjoy most about the fair. Of those who took the survey, 83% reported eating traditional foods. Which group of people represents the sample?
fair goers who took the survey
Consider the screen shot below of a portion of a dashboard made in Power BI. How could the slicer be adjusted to include the week starting with September 23, 2019?
in the slicer, slide the right white circle to the right
The database model shown below is partially completed. Assuming that the model should reflect that each invoice can document the acquisition of many materials in the invoice's details, and each material can be acquired many times through details on invoices, which of the tables would require the addition of a foreign key to document these relationship(s) in the model?
invoice_details table only
Data is considered of high quality when:
it satisfies the requirements of its intended use
If every individual in a population has the same chance of being included in a sample, the sample is a(n) _________ sample.
random
Consider the product data below that was collected for a few of the products available in an apparel store. If a data set showing 3000 of the store's products was collected in the same fashion, how could one use this data to create a unique list of the colors offered in the store?
remove duplicates in the color column
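Removing duplicates from a single column leaves one row per distinct value. A short sketch with invented color values:

```python
# Hypothetical color column from the product data.
colors = ["red", "blue", "red", "green", "blue"]

# dict.fromkeys keeps the first occurrence of each value, in order.
unique_colors = list(dict.fromkeys(colors))
print(unique_colors)
```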
Which of the following shows the parts of a website that you are permitted and not permitted to scrape?
robots.txt
Each law firm in one state registers its phone number with the state court system. Using a computer program, an employee of the state court system selects 50 random registered phone numbers, and the law firms associated with those numbers will be sent an audit survey. What type of sample is this?
simple random
The Dean of the School of Business wanted to learn more about what students would like to purchase in the cafe located in Rockwell. He obtained a list of all students in the School of Business and used Excel to randomly identify 200 students to whom to send a survey. What type of sampling strategy did the Dean use?
simple random
[dropped] The following is a snapshot of a query that captures products and their prices. You would like to rank the products from lowest to highest price within each category where the categories are to be sorted in A-Z (alphabetical) order. How would you achieve that?
sort by product_category in ascending order followed by sorting by price in descending order
A school chooses 4 randomly selected athletes from each of its sports teams to participate in a survey about athletics at the school. What type of sampling strategy was used?
stratified
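The contrast between simple random and stratified sampling in the cards above can be sketched with Python's `random` module. The team rosters are invented, and the fixed seed is only there to make the run repeatable.

```python
import random

rng = random.Random(42)  # fixed seed for a repeatable illustration

# Hypothetical rosters, one list of athletes per team (the strata).
teams = {
    "soccer": [f"soccer_{i}" for i in range(12)],
    "tennis": [f"tennis_{i}" for i in range(8)],
    "swim": [f"swim_{i}" for i in range(10)],
}
all_athletes = [a for roster in teams.values() for a in roster]

# Simple random sample: every athlete has the same chance of selection,
# so some teams may end up over- or under-represented.
simple_random = rng.sample(all_athletes, 12)

# Stratified sample: draw 4 athletes at random from EACH team.
stratified = {team: rng.sample(roster, 4) for team, roster in teams.items()}

print({team: len(picked) for team, picked in stratified.items()})
```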
Consider the sample iPhone sales data below. If making a visualization, what type of aggregation would be required to display the total number of phones sold each month and what field should it be applied to? This could be used, for example, in a column chart that shows the total number of phones sold per month.
sum aggregation on the quantity field
Consider the sample iPhone sales data below. If making a visualization, what type of aggregation would be required to display the total quantity sold of each product and what field should it be applied to? This could be used, for example, in a column chart that shows the total quantity sold of each phone.
sum aggregation on the quantity field
Consider the sample data shown below. How would you determine the total quantity of products sold?
sum quantity
Consider the sample data shown below. How would you determine the total of all transaction amounts?
sum the total_transaction_amount
If the football coach surveys the starting lineup (i.e., the players that are on the field at the start of the game) regarding which uniform the football team should wear, what is the population?
the football team
Open-ended questions should be used
when respondents are required to elaborate on a question
