Data Analytics for Accounting - Midterm Set

Which skills were not emphasized that analytic-minded accountants should have? Multiple Choice A. Statistical data analysis competency B. Classification of test approaches C. Developed an analytics mindset D. Data scrubbing and data preparation

B. Classification of test approaches

In which areas were skills not emphasized for analytic-minded accountants? Multiple Choice A. Descriptive data analysis B. Data and systems analysis and design C. Data visualization and data reporting D. Data quality

B. Data and systems analysis and design

Which of the following is not a typical example of nominal data? A. Gender B. SAT scores C. Hair color D. Ethnic group

B. SAT scores

Match the desired visualization for quantitative data to the following chart types: 5. Data trends for net income over the past eight quarters

Bar Charts

An observation about the frequency of leading digits in many real-life sets of numerical data is called:

Benford's Law
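A minimal sketch of how the comparison could be set up, assuming the standard form of Benford's Law in which a leading digit d is expected with probability log10(1 + 1/d). The function names and the payment amounts are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(amount):
    """First significant digit, read from scientific notation."""
    return int(f"{abs(amount):e}"[0])

def observed_shares(amounts):
    """Observed share of each leading digit, ignoring zero amounts."""
    counts = Counter(leading_digit(a) for a in amounts if a != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Hypothetical vendor payments, invented for this example.
payments = [1200.50, 1890.00, 132.10, 2450.00, 310.75, 48.20, 9100.00, 17.99]

expected = benford_expected()
observed = observed_shares(payments)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

A set this small will never track the expected shares closely. In practice the comparison is only meaningful on a large population of transactions.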

Match the desired visualization for quantitative data to the following chart types: 1. Useful for showing quartiles, medians, and outliers

Box and whisker plots

As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation? Multiple Choice A. Remove headings and subtotals. B. Format negative numbers. C. Clean up trailing zeroes. D. Correct inconsistencies across data.

C. Clean up trailing zeroes.
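Two of the cleaning steps named above (removing headings and subtotals, and formatting negative numbers) can be sketched as follows. The row layout, the label list, and the helper names are assumptions made for this example, not part of any particular tool.

```python
def parse_amount(text):
    """Convert accounting-style text such as '(1,234.50)' to a float."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])  # parentheses mark a negative amount
    return float(cleaned)

def clean_rows(rows):
    """Drop heading and subtotal rows, then normalize the amount column."""
    skip_labels = {"account", "subtotal", "total"}  # assumed non-data labels
    result = []
    for label, amount in rows:
        if label.strip().lower() in skip_labels:
            continue
        result.append((label.strip(), parse_amount(amount)))
    return result

# Hypothetical extract from a report, invented for this example.
raw = [
    ("Account", "Amount"),
    ("Office supplies", "1,250.00"),
    ("Vendor refund", "(300.00)"),
    ("Subtotal", "950.00"),
    ("Travel", "$480.25"),
]
cleaned = clean_rows(raw)
print(cleaned)
```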

What data analytics test will predict which firms will go bankrupt and which firms will not go bankrupt?

Classification

Which approach to data analytics attempts to assign each unit in a population into a small set of classes where the unit belongs?

Classification

What data analytics test will segment all of the company's customers into groups that will allow further specific analysis?

Clustering

What data analytics test will identify customers who buy product X and are most likely to also be interested in product Y?

Co-occurrence Grouping
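One simple way to illustrate co-occurrence grouping is to count how often pairs of products appear in the same basket. This is only a sketch under that assumption; the baskets and the names `pair_counts` and `share_x_with_y` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical purchase baskets, invented for this example.
baskets = [
    ["X", "Y", "Z"],
    ["X", "Y"],
    ["X", "W"],
    ["Y", "Z"],
]

counts = pair_counts(baskets)
bought_x = sum(1 for basket in baskets if "X" in basket)

# Share of baskets containing X that also contain Y.
share_x_with_y = counts[("X", "Y")] / bought_x
print(counts[("X", "Y")], round(share_x_with_y, 2))
```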

Which of the following describes part of the goal of the ETL process? Multiple Choice A. Identify which approach to data analytics should be used. B. Load the data into a relational database for storage. C. Communicate the results and insights found through the analysis. D. Identify and obtain the data needed for solving the problem.

D. Identify and obtain the data needed for solving the problem.

The IMPACT cycle specifically includes all except the following steps: Multiple Choice A. address and refine results. B. perform test plan. C. communicate insights. D. data preparation.

D. data preparation.

Models associated with regression and classification data approaches all have these important parts except: A. identifying which variables (we'll call these independent variables) might help predict an outcome (we'll call this the dependent variable). B. the functional form of the relationship (linear, nonlinear, etc.). C. the numeric parameters of the model (detailing the relative weights of each of the variables associated with the prediction). D. test data.

D. test data.

The IMPACT cycle includes all except the following steps: Multiple Choice A. perform test plan. B. master the data. C. track outcomes. D. visualize the data.

D. visualize the data.

What data analytics test will use stratified sampling to focus audit effort on transactions with greatest risk?

Data Reduction
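As a rough illustration of stratified sampling for audit work, the sketch below splits transactions into a high-value stratum that is examined in full and a low-value stratum that is sampled. The cutoff, the sampling rate, and the transaction records are assumptions chosen for the example.

```python
import random

def stratified_sample(transactions, cutoff, low_rate, seed=0):
    """Keep every transaction at or above the cutoff; sample the rest."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    high = [t for t in transactions if t["amount"] >= cutoff]
    low = [t for t in transactions if t["amount"] < cutoff]
    k = round(len(low) * low_rate)
    return high + rng.sample(low, k)

# Hypothetical transactions with amounts 100, 200, ..., 2000.
transactions = [{"id": i, "amount": 100 * i} for i in range(1, 21)]

sample = stratified_sample(transactions, cutoff=1500, low_rate=0.5)
print(len(sample), "of", len(transactions), "transactions selected")
```

The design choice here is that risk is proxied by amount alone. A real engagement would likely define strata from several risk factors.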

Why is identifying the question such a critical first step in the IMPACT process cycle?

Data analysis is most effective when a question is identified that needs to be addressed. That will focus the analysis on which data and which test method might be most effective in addressing or answering the question.

The metadata that describes each attribute in a database is called:

Data dictionary

3. Summary statistics

Descriptive analytics

4. What were the total taxes paid in the past 5 years?

Descriptive analytics

8. Which product sold the most last month?

Descriptive analytics

9. Data reduction or filtering

Descriptive analytics

What are attributes that exist in a relational database that are neither primary nor foreign keys?

Descriptive attributes

1. Clustering

Diagnostic analytics

10. Profiling

Diagnostic analytics

2. What was the price and quantity variance associated with the production of chicken at Tyson?

Diagnostic analytics

6. Co-occurrence grouping

Diagnostic analytics

7. Our refunds seem to be high. Are they fraudulent?

Diagnostic analytics

8. Similarity matching

Diagnostic analytics

Identify the behavior, error, or fraudulent scheme that could be detected when you apply Benford's Law to the following accounts: Vendor payments

Duplicate checks

Identify the behavior, error, or fraudulent scheme that could be detected when you apply Benford's Law to the following accounts: Travel and entertainment

Expense approval circumvention

Identify the behavior, error, or fraudulent scheme that could be detected when you apply Benford's Law to the following accounts: Sales Records

Fictitious sales transactions

Match the desired visualization for quantitative data to the following chart types: 3. Distribution of sales across states or countries

Filled geographic maps

Even though it is preferable to store data in a relational database, storing data across separate tables can make data analysis cumbersome. Describe three reasons it is worth the trouble to store data in a relational database.

Flat files make maintenance cumbersome because they invite redundancy, inaccuracy, and incomplete data. In a relational database, there is only one table where you would update the supplier's address. This ensures that all staff have the same accurate information, without having to type in the updated address in every table where the supplier's location is linked.

Among the advantages of using a relational database is enforcing business rules. Based on your understanding of how the structure of a relational database helps prevent data redundancy and other advantages, how does the primary key/foreign key relationship structure help enforce a business rule that indicates that a company shouldn't process any purchase orders from suppliers who don't exist in the database?

For purchase orders, the primary key would be the PO number, while one of the required foreign keys would be the supplier ID. The supplier ID is a foreign key in the PO table and the primary key in the supplier table. A relational database can enforce supplier registration by making the supplier ID a required data point. If the supplier has not been registered, the individual setting up the PO will be unable to issue it, since a valid supplier ID is required.
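The business rule can be demonstrated with Python's built-in sqlite3 module. This is a sketch only: the table and column names (Supplier, Purchase_Order, Supplier_ID, PO_Number) and the sample rows are assumptions, and SQLite enforces foreign keys only after the PRAGMA shown is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute(
    "CREATE TABLE Supplier (Supplier_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE Purchase_Order (
        PO_Number INTEGER PRIMARY KEY,
        Supplier_ID INTEGER NOT NULL REFERENCES Supplier (Supplier_ID)
    )
    """
)

# Hypothetical data: one registered supplier and one valid purchase order.
conn.execute("INSERT INTO Supplier VALUES (1, 'Acme Paper')")
conn.execute("INSERT INTO Purchase_Order VALUES (1001, 1)")

# A purchase order for a supplier that does not exist should be refused.
try:
    conn.execute("INSERT INTO Purchase_Order VALUES (1002, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

po_count = conn.execute("SELECT COUNT(*) FROM Purchase_Order").fetchone()[0]
print(rejected, po_count)
```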

Why is Supplier ID considered to be a primary key for a Supplier table?

It contains a unique identifier for each supplier.

Match the desired visualization for quantitative data to the following chart types: 6. Data trends for stock price over the past five years

Line charts

What data analytics test will look for relationships between related parties that are not otherwise disclosed?

Link Prediction

Which approach to data analytics attempts to predict a relationship between two data items?

Link prediction

Identify the behavior, error, or fraudulent scheme that could be detected when you apply Benford's Law to the following accounts: Purchases

Potential kickback schemes

Identify the behavior, error, or fraudulent scheme that could be detected when you apply Benford's Law to the following accounts: Sales returns

Potential kickback schemes

1. What are the expected stock returns to our investment in Facebook stock?

Predictive analytics

11. Regression

Predictive analytics

2. Classification

Predictive analytics

3. What are the cash needs and projections over the next 3 months?

Predictive analytics

5. Link prediction

Predictive analytics

6. Should we ship by truck, rail, or air given the expected increase in fuel expenses?

Prescriptive analytics

4. Decision support systems

Prescriptive analytics

5. If we expect our Asian sales to increase, where should we produce them?

Prescriptive analytics

7. Machine learning and artificial intelligence

Prescriptive analytics

Which attribute is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table?

Primary key

What data analytics test will work to understand normal behavior, to then be able to identify abnormal behavior (such as fraud)?

Profiling

What data analytics test will predict the relationship between an investment in advertising expenditures and subsequent operating income?

Regression
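A simple linear regression of operating income on advertising spend can be sketched with ordinary least squares. The figures below are invented, and `fit_line` is a hypothetical helper written for this example rather than a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data in thousands of dollars, invented for this example.
advertising = [10, 20, 30, 40, 50]
operating_income = [27, 44, 66, 83, 105]

slope, intercept = fit_line(advertising, operating_income)

# Predicted operating income for a planned advertising spend of 60.
predicted = intercept + slope * 60
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```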

Match the desired visualization for quantitative data to the following chart types: 2. Correlation between two variables

Scatter plots

Match the desired visualization for quantitative data to the following chart types: 4. Visualize the line of best fit

Scatter plots

What data analytics test will predict which new customers resemble the company's best customers?

Similarity Matching
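Similarity matching is often illustrated with a distance measure. The sketch below ranks new customers by Euclidean distance to a "best customer" profile. The two features, their 0-to-1 scaling, and all of the values are assumptions made for the example.

```python
import math

# Hypothetical profile of the company's best customers:
# (purchase frequency, average order size), both already scaled to 0..1.
best_profile = (0.8, 0.9)

# Hypothetical new customers described with the same two features.
new_customers = {
    "A": (0.9, 0.8),
    "B": (0.2, 0.3),
    "C": (0.7, 0.9),
}

# Smaller distance means the customer looks more like the best profile.
distances = {
    name: math.dist(features, best_profile)
    for name, features in new_customers.items()
}
ranked = sorted(distances, key=distances.get)
print(ranked)
```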

Box and whisker plots (or box plots) are particularly adept at showing extreme observations and outliers. In what situations would it be important to communicate these data to a reader? Any particular accounts on the balance sheet or income statement?

Since box plots are useful for outlier detection, they are useful for visualizing exceptions to internal controls. Plotting transaction accounts (such as purchase card transactions) allows auditors to quickly observe whether there is unusual user activity and to focus their effort on those high-risk transactions.
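The numbers behind a box plot can be computed directly. The sketch below assumes the common convention that whiskers extend 1.5 times the interquartile range beyond the quartiles, and it uses invented purchase card amounts.

```python
import statistics

# Hypothetical purchase card transactions, invented for this example.
amounts = [20, 25, 30, 35, 40, 45, 50, 500]

# Quartiles as a box plot would draw them (default "exclusive" method).
q1, median, q3 = statistics.quantiles(amounts, n=4)
iqr = q3 - q1

# Assumed 1.5 * IQR fences; points outside them are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in amounts if a < lower_fence or a > upper_fence]

print(q1, median, q3, outliers)
```

An auditor would then review the flagged transactions individually instead of the whole population.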

The Big Four accounting firms (Deloitte, EY, KPMG, and PwC) dominate the audit and tax market in the United States. What chart would you use to show which accounting firm dominates in each state in terms of audit revenues? 1. Area Chart 2. Line Chart 3. Column Chart 4. Histogram 5. Bubble Chart 6. Stacked Column Chart 7. Stacked Area Chart 8. Pie Chart 9. Waterfall Chart 10. Symbol Chart

Since we are comparing relative sizes for each state, the following charts would be effective: pie chart, stacked column chart, bubble map, or symbol map. The other choices are better suited to numerical data depicting trends. Answers: 5. Bubble Chart 6. Stacked Column Chart 8. Pie Chart 10. Symbol Chart

These data are organized and reside in a fixed field with a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. The term matching this definition is:

Structured Data

______ is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line.

Support vector machine

______ is a set of data used to assess the degree and strength of a predicted relationship.

Test Data

In the ETL process, if the analyst does not have the security permissions to access the data directly, then he or she will need to fill out a data request form. While this doesn't necessarily require the analyst to know extraction techniques, why does the analyst still need to understand the raw data very well in order to complete the data request?

The analyst needs to understand the data very well in order to request the correct and complete data needed for the extraction. Requesting data can be a time-intensive and iterative process: the more prepared an analyst is in understanding the raw data, the more they can save on time and labor costs of the IT team.

What is the purpose of a data dictionary? Identify four different attributes that could be stored in a data dictionary, and describe the purpose of each.

The data dictionary is a centralized repository of descriptions for all of the data attributes of the data set. Attributes it could store include: 1. variable name 2. brief description 3. whether the field is made up of numbers, text, or alphanumerics 4. the size 5. whether it serves as a primary or foreign key 6. notes, etc.

Based on the data from datavizcatalogue.com, what are some major flaws of using word clouds to communicate the frequency of words in a document?

The main flaws of word clouds include: long words are emphasised over short words and words whose letters contain many ascenders and descenders may receive more attention.

To address the question "Will I receive a loan from LendingClub?" we had available data to assess the relationship among (1) the debt-to-income ratios and number of rejected loans, (2) the length of employment and number of rejected loans, and (3) the credit (or risk) score and number of rejected loans. What additional data would you recommend to further assess whether a loan would be offered? Why would they be helpful?

There are many other potential predictors of whether LendingClub would offer a loan. Here are a few possibilities: What other debt do they have? How much is their disposable income? Do they have a clean criminal record? Have they had a loan with LendingClub before and did they repay it? Do they rent or own their house?

What is the difference between training datasets and test (or testing) datasets?

Training datasets are used to help teach the algorithm certain classifying skillsets and can be introduced first in the process, while testing data must be kept separate in order to properly assess the applicability of the algorithm's modeling.
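Keeping the two datasets separate can be sketched with a simple random split. The 80/20 proportion, the seed, and the helper name `train_test_split` are assumptions for this example; the records are placeholder integers.

```python
import random

def train_test_split(records, test_share=0.2, seed=42):
    """Shuffle a copy of the records and hold out a share for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder records; real data would be rows with known outcomes.
records = list(range(100))

train_set, test_set = train_test_split(records)
print(len(train_set), len(test_set))
```

The model is fitted on `train_set` only, and `test_set` is used afterwards to check how well it generalizes.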

Match the data examples to one of the following data types: (Interval Data, Ratio Data, Ordinal Data, Nominal Data, Structured Data, Unstructured Data) Data Examples: 1. GMAT Score 2. Total Sales 3. Blue Ribbon, Yellow Ribbon, Red Ribbon 4. Company Use of Cash Basis vs. Accrual Basis 5. Depreciation Method (Declining Balance, Straight-Line, etc.) 6. Management Discussion and Analysis 7. Income Statement 8. Inventory Method (FIFO, LIFO, etc.) 9. Blogs 10. Total Liabilities

1. GMAT Score = Interval data 2. Total Sales = Ratio data 3. Blue Ribbon, Yellow Ribbon, Red Ribbon = Ordinal data 4. Company Use of Cash Basis vs. Accrual Basis = Nominal data 5. Depreciation Method (Declining Balance, Straight-Line, etc.) = Nominal data 6. Management Discussion and Analysis = Unstructured data 7. Income Statement = Structured data 8. Inventory Method (FIFO, LIFO, etc.) = Nominal data 9. Blogs = Unstructured data 10. Total Liabilities = Ratio data

Identify the order sequence from least sophisticated (1) to most sophisticated data type (4). Interval Data Ordinal Data Nominal Data Ratio Data

1. Nominal Data 2. Ordinal Data 3. Interval Data 4. Ratio Data

What is the difference between a supervised and an unsupervised approach?

A supervised approach means that the analyst starts with a business question or theory and selects the relevant test area. An unsupervised approach involves data exploration; for example, cluster mapping allows an analyst to look for potential patterns in the data.

"Customers" CustomerID (PK) FirstName LastName City State Phone_Number "Sales_Orders" Sales_Order_ID (PK) InventoryID Quantity_Sold Price CustomerID "Inventory" InventoryID (PK) Inventory_Description Price Write SQL queries for the following: A. The description and prices of all items with unit prices of at least $1,000. B. The average price of all inventory items. Rename the column in the output Avg_Price. C. The total number of orders for each state with more than one order. The aggregate should be renamed Total_Orders and the information should be presented alphabetically by state.

A. SELECT Inventory_Description, Price FROM Inventory WHERE Price >= 1000; B. SELECT AVG(Price) AS Avg_Price FROM Inventory; C. SELECT State, COUNT(Sales_Order_ID) AS Total_Orders FROM Sales_Orders INNER JOIN Customers ON Sales_Orders.CustomerID = Customers.CustomerID GROUP BY State HAVING COUNT(Sales_Order_ID) > 1 ORDER BY State ASC;
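The three queries can be tried against an in-memory SQLite database using Python's built-in sqlite3 module. The table and column names follow the question; the column types and every sample row are assumptions invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
        City TEXT, State TEXT, Phone_Number TEXT
    );
    CREATE TABLE Inventory (
        InventoryID INTEGER PRIMARY KEY, Inventory_Description TEXT, Price REAL
    );
    CREATE TABLE Sales_Orders (
        Sales_Order_ID INTEGER PRIMARY KEY, InventoryID INTEGER,
        Quantity_Sold INTEGER, Price REAL, CustomerID INTEGER
    );
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "Ann", "Lee", "Austin", "TX", "555-0101"),
    (2, "Bo", "Kim", "Dallas", "TX", "555-0102"),
    (3, "Cy", "Ng", "Boise", "ID", "555-0103"),
    (4, "Di", "Roy", "Tulsa", "OK", "555-0104"),
])
conn.executemany("INSERT INTO Inventory VALUES (?, ?, ?)", [
    (1, "Laptop", 1200.0),
    (2, "Monitor", 300.0),
    (3, "Server", 5000.0),
])
conn.executemany("INSERT INTO Sales_Orders VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 1200.0, 1),
    (2, 2, 2, 300.0, 2),
    (3, 3, 1, 5000.0, 3),
    (4, 2, 1, 300.0, 4),
    (5, 1, 1, 1200.0, 4),
])

query_a = "SELECT Inventory_Description, Price FROM Inventory WHERE Price >= 1000"
query_b = "SELECT AVG(Price) AS Avg_Price FROM Inventory"
query_c = """
    SELECT State, COUNT(Sales_Order_ID) AS Total_Orders
    FROM Sales_Orders
    INNER JOIN Customers ON Sales_Orders.CustomerID = Customers.CustomerID
    GROUP BY State
    HAVING COUNT(Sales_Order_ID) > 1
    ORDER BY State ASC
"""

result_a = conn.execute(query_a).fetchall()
result_b = conn.execute(query_b).fetchone()[0]
result_c = conn.execute(query_c).fetchall()
print(result_a, result_b, result_c)
```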

In general, the more complex the model, the greater the chance of: Multiple Choice A. overfitting the data. B. underfitting the data. C. pruning the data. D. a more accurate prediction of the data.

A. overfitting the data.

The purpose of transforming data is: A. to validate the data for completeness and integrity. B. to load the data into the appropriate tool for analysis. C. to obtain the data from the appropriate source. D. to identify which data are necessary to complete the analysis.

A. to validate the data for completeness and integrity.

In the ETL process, one important step to process when transforming the data is to work with NULL, N/A, and zero values in the dataset. If you have a field of quantitative data (e.g., number of years each individual in the table has held a full-time job), what would be the effect of the following? 1. Transforming NULL and N/A values into blanks 2. Transforming NULL and N/A values into zeroes 3. Deleting records that have NULL and N/A values from your dataset

1. Transforming NULL and N/A values into blanks: The COUNT and AVERAGE functions would not include these fields in their computation. 2. Transforming NULL and N/A values into zeroes: The COUNT and AVERAGE functions would include these zeroes in their computation. The effect would be greatest on the average, since each such record would enter with a value of zero. 3. Deleting records that have NULL and N/A values from your dataset: The COUNT and AVERAGE functions would not include these records in their computation. Deleting a record also deletes all of its other fields and variables, so this has a bigger impact on the overall dataset.
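The three treatments can be compared on a small made-up column. In this sketch `None` stands in for NULL or N/A, and `count_and_average` is a hypothetical helper that mimics spreadsheet-style COUNT and AVERAGE by skipping blanks.

```python
def count_and_average(values):
    """COUNT and AVERAGE over the non-blank (non-None) values."""
    present = [v for v in values if v is not None]
    return len(present), sum(present) / len(present)

# Hypothetical "years in a full-time job" field; None marks NULL or N/A.
years_employed = [10, None, 4, None, 7]

# 1. Blanks: the missing values are skipped.
as_blanks = count_and_average(years_employed)

# 2. Zeroes: the missing values are counted and drag the average down.
as_zeroes = count_and_average([0 if v is None else v for v in years_employed])

# 3. Deletion: same COUNT and AVERAGE as blanks for this field, but the
#    rest of each deleted record would be lost from the dataset as well.
after_delete = count_and_average([v for v in years_employed if v is not None])

print(as_blanks, as_zeroes, after_delete)
```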

Based on the data from datavizcatalogue.com, a line graph is best at showing trends, relationships, compositions, or distributions?

Trends over time.

Identify the order sequence in the ETL process as part of mastering the data (i.e., 1 is first; 5 is last). a. Validate the data for completeness and integrity. b. Sanitize the data. c. Obtain the data. d. Load the data in preparation for data analysis. e. Determine the purpose and scope of the data request.

e c a b d

Mastering the data can also be described via the ETL process. The ETL process stands for:

extract, transform, and load data

The Fahrenheit scale of temperature measurement would best be described as an example of:

interval data.
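A short numerical check shows why Fahrenheit behaves as interval rather than ratio data: differences are meaningful, but ratios change when the same temperatures are expressed on another scale with a different zero point. The temperatures are arbitrary example values.

```python
def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9

low_f, high_f = 40.0, 80.0  # arbitrary example temperatures

# The ratio depends on the scale's arbitrary zero point.
ratio_f = high_f / low_f
ratio_c = f_to_c(high_f) / f_to_c(low_f)

# The difference converts consistently (by the fixed factor 5/9).
diff_f = high_f - low_f
diff_c = f_to_c(high_f) - f_to_c(low_f)

print(ratio_f, round(ratio_c, 2), diff_f, round(diff_c, 2))
```

So 80 degrees Fahrenheit is not "twice as hot" as 40 degrees, even though the two readings are 40 degrees apart.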

