Data Analytics Exam

What are standardized metrics?

Metrics used by data vendors to allow easier comparison of company reported XBRL data

What did Loughran and McDonald find in their paper related to this type of analysis?

Their research suggests that the stock market reaction is related to the proportion of negative words (or, inversely, the proportion of positive words) in a disclosure. They call this method overlap. Using this method to define the tone of the text, they find a direct association between the proportion of negative words and the stock market reaction to the disclosure of 10-K reports.

What is profiling?

As you recall, profiling involves gaining an understanding of a typical behavior of an individual, group, or population (or sample).

What are the general steps to accomplish the classification model?

1. Identify the classes you wish to predict.
2. Manually classify an existing set of records.
3. Select a set of classification models.
4. Divide your data into training and testing sets.
5. Generate your model.
6. Interpret the results and select the "best" model.
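
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vendor records, the single "average days late" attribute, and the simple threshold model are all hypothetical, and a real analysis would compare several model types.

```python
# Hypothetical, manually classified vendor records: (average days late, risk class).
records = [
    (1, "low"), (2, "low"), (3, "low"), (4, "low"),
    (10, "high"), (12, "high"), (15, "high"), (20, "high"),
    (5, "low"), (11, "high"), (2, "low"), (18, "high"),
]

# Step 4: hold out every fourth record for testing; train on the rest.
test_set = records[::4]
train_set = [r for i, r in enumerate(records) if i % 4 != 0]


def predict(days_late, threshold):
    """Classify a vendor as high risk when it is at least `threshold` days late."""
    return "high" if days_late >= threshold else "low"


def accuracy(dataset, threshold):
    """Share of records in `dataset` whose predicted class matches the label."""
    hits = sum(predict(x, threshold) == label for x, label in dataset)
    return hits / len(dataset)


# Step 5: "generate" the model by picking the threshold that best fits the training data.
candidates = sorted({x for x, _ in train_set})
best_threshold = max(candidates, key=lambda t: accuracy(train_set, t))

# Step 6: interpret the result on records the model has not seen.
train_accuracy = accuracy(train_set, best_threshold)
test_accuracy = accuracy(test_set, best_threshold)
print(best_threshold, train_accuracy, test_accuracy)
```

The chosen threshold fits every training record but misclassifies one held-out record, which is why a model is judged on test data and not on the data used to build it.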

Briefly list and describe the six steps in the IMPACT cycle.

1. Identify the questions. Understand the business problems that need to be addressed.
• Are employees circumventing internal controls over payments?
• Are there any suspicious travel and entertainment expenses?
• How can we increase the amount of add-on sales of additional goods to our customers?
• Are our customers paying us in a timely manner?
• How can we predict the allowance for loan losses for our bank loans?
• How can we find transactions that are risky in terms of accounting issues?
• Who authorizes checks above $100,000?
• How can errors be identified?
2. Master the data. Know what data are available and how they relate to the problem.
• Internal ERP systems
• External networks and data warehouses
• Data dictionaries
• Extraction, transformation, and loading
• Data validation and completeness
• Data normalization
• Data preparation and scrubbing
3. Perform the test plan. Select an appropriate model to find a target variable.
• Classification
• Regression
• Similarity matching
• Clustering
• Co-occurrence grouping
• Profiling
4. Address and refine results. Identify issues with the analyses and refine the model.
• Ask further questions
• Explore the data
• Rerun analyses
5. Communicate insights. Communicate effectively using clear language and visualizations.
• Dashboards
• Static reports
• Summaries
6. Track outcomes. Follow up on the results of the analysis.
• How frequently should the analysis be performed?
• Have the analytics changed?
• What are the trends?

What is a supervised approach to data modeling? Give some examples of these types of techniques.

A supervised approach is used when you are trying to predict a future outcome based on historical data. Example: "Will a new vendor ship a large order on time?"
• Classification - predict whether data belong to one class or another
• Similarity matching - group data by attributes
• Regression - predict a specific value
• Link prediction - predict relationships between items, as in social networks

What is an unsupervised approach to data modeling? Give some examples of these types of techniques.

An unsupervised approach is used when you don't have a specific question. Example: "Do our vendors form natural groups based on similar attributes?"
• Clustering - find undiscovered natural groupings in the data
• Co-occurrence grouping - find events that happen together
• Profiling - identify typical behavior in the data
• Data reduction - filter or group the data to simplify the analysis

What is the purpose of this technique?

Analysts can track sentiment proportion over time and analyze other text sources (e.g. news articles, social media) to realize gains in the market.

What are some examples where DA can help improve the quality of estimates and valuations?

Better estimates of collectability, write-downs, etc. Managers can better understand the business environment through social media. Risks and opportunities can be identified through analysis of Internet searches.

Define Big Data. What are the 3 Vs?

Big Data refers to datasets that are too large and complex to be analyzed traditionally. Remember the 3 Vs: Volume refers to size, Velocity refers to frequency, and Variety refers to different types of data.

What is the difference between Data Analytics and Big Data?

Big data is what data analytics is trying to evaluate and deal with.

What are the four classes of ratios, what does each one measure, and provide some examples of each?

Liquidity ratios measure the ability to satisfy the company's short-term obligations (e.g., current and acid-test ratios).
Activity ratios are a computation of a firm's operating efficiency (e.g., asset turnover).
Solvency (or financing) ratios help assess a company's ability to pay its debts and stay in business (e.g., debt-to-equity).
Profitability ratios provide information on the profitability of a company and its prospects for the future (e.g., profit margin).
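
A short Python sketch of one example ratio from each class. All figures are hypothetical, and the acid-test ratio here uses the common simplification of current assets less inventory; a course or textbook may define the quick assets differently.

```python
# Hypothetical balance sheet and income statement figures (in thousands).
current_assets = 500.0
inventory = 200.0
current_liabilities = 250.0
total_liabilities = 600.0
total_equity = 400.0
average_total_assets = 1000.0
sales = 800.0
net_income = 80.0

# Liquidity: ability to cover short-term obligations.
current_ratio = current_assets / current_liabilities
acid_test_ratio = (current_assets - inventory) / current_liabilities

# Activity: operating efficiency.
asset_turnover = sales / average_total_assets

# Solvency: ability to pay debts and stay in business.
debt_to_equity = total_liabilities / total_equity

# Profitability: earnings relative to sales.
profit_margin = net_income / sales

print(current_ratio, acid_test_ratio, asset_turnover, debt_to_equity, profit_margin)
```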

In performing a test plan, list and describe the eight approaches that could be taken.

Classification: An attempt to assign each unit (or individual) in a population into a few categories. An example classification might be, of all the loans this bank has offered, which are most likely to default? Or which loan applications are expected to be approved? Or which transactions would a credit card company flag as potentially being fraudulent and deny payment?
Regression: A data approach used to predict a specific dependent variable value based on independent variable inputs using a statistical model. An example regression analysis might be, given a balance of total accounts receivable held by a firm, what is the appropriate level of allowance for doubtful accounts for bad debts?
Similarity matching: An attempt to identify similar individuals based on data known about them. The opening vignette mentioned Alibaba and its attempt to identify seller and customer fraud based on various characteristics known about them to see if they were similar to known fraud cases.
Clustering: An attempt to divide individuals (like customers) into groups (or clusters) in a useful or meaningful way. In other words, identifying groups of similar data elements and the underlying drivers of those groups. For example, clustering might be used to[...]
Co-occurrence grouping: An attempt to discover associations between individuals based on transactions involving them. Amazon might use this to sell another item to you by knowing what items are "frequently bought together" or "Customers who bought this item also bought . . ." as shown in Exhibit 1-2.
Profiling: An attempt to characterize the "typical" behavior of an individual, group, or population by generating summary statistics about the data (including mean, standard deviations, etc.). By understanding the typical behavior, we'll be able to more easily identify abnormal behavior. When behavior departs from that typical behavior (which we'll call an anomaly), further investigation is warranted. Profiling might be used in accounting to identify fraud or just those transactions that might warrant some additional investigation (e.g., travel expenses that are three standard deviations above the norm).
Link prediction: An attempt to predict a relationship between two data items. This might be used in social media. For example, because an individual might have 22 mutual Facebook friends with me and we both attended Brigham Young University, is there a chance we would like to be Facebook friends as well? Exhibit 1-3 provides an example of this used in Facebook. Link prediction in an accounting setting might work to use social media to look for relationships between related parties that are not otherwise disclosed.
Data reduction: A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items (i.e., highest cost, highest risk, largest impact, etc.). It does this by taking a large set of data (perhaps the population) and replacing it with a smaller set that has the vast majority of the critical information of the larger set. An example might include the potential to use these techniques in auditing. While auditing has employed various random and stratified sampling over the years, Data Analytics suggests new ways to highlight which transactions do not need the same level of vetting as other transactions.

Accountants don't need to become full-fledged data scientists, but they do need to be able to what?

Clearly articulate the business problem the company is facing. Communicate with the data scientists about specific data needs and understand the underlying quality of the data. Draw appropriate conclusions to the business problem based on the data and make recommendations on a timely basis. Present their results to individual members of management (CEOs, audit managers, etc.) in an accessible manner to each member.

What is clustering?

Clustering is used to identify groups of similar data elements and the underlying drivers of those groups.
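
One common clustering technique is k-means, sketched below for a single attribute. The payment amounts and starting centroids are hypothetical, and the source does not prescribe k-means specifically; it is used here only to show how similar values end up grouped together.

```python
def k_means_1d(values, centroids, max_iterations=100):
    """Assign each value to its nearest centroid, then recompute centroids until stable."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Keep the old centroid if a cluster ends up empty.
        updated = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical payment amounts with two natural groups.
payments = [10, 12, 11, 50, 52, 49]
centroids, clusters = k_means_1d(payments, centroids=[10.0, 50.0])
print(centroids, clusters)
```

An auditor would then look at what drives each group, for example why one set of payments is consistently larger than the other.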

What are the seven skills that an analytic-minded accountant should have?

In this text, we emphasize seven skills that analytic-minded accountants should have:
1. Develop an analytics mindset: recognize when and how data analytics can address business questions.
2. Data scrubbing and data preparation: comprehend the process needed to clean and prepare the data before analysis.
3. Data quality: recognize what is meant by data quality, be it completeness, reliability, or validity.
4. Descriptive data analysis: perform basic analysis to understand the quality of the underlying data and its ability to address the business question.
5. Data analysis through data manipulation: demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis.
6. Define and address problems through statistical data analysis: identify and implement an approach that will use statistical data analysis to draw conclusions and make recommendations on a timely basis.
7. Data visualization and data reporting: report results of analysis in an accessible way to each varied decision maker and his or her specific needs.

Define Data Analytics (DA)

Data Analytics is the process of evaluating data with the purpose of drawing conclusions to address business questions. Effective Data Analytics provides a way to search through large structured and unstructured data to identify unknown patterns or relationships.

Given that operational data are more relevant and accessible, accounting firms will approach audits differently. How?

Data Analytics may also allow an accountant or auditor to assess the probability of a goodwill write-down, warranty claims or the collectability of bad debts based on what customers, investors, and other stakeholders are saying about the company in blogs and in social media (like Facebook and Twitter). This information might help the firm determine both its optimal response to the situation and appropriate adjustment to its financial reporting.

What are some examples of use of profiling for accountants?

Data profiling can be as simple as calculating summary statistics on transactional data, such as the average number of days to ship a product, the typical amount we pay for a product, or the number of hours an employee is expected to work. On the other hand, profiling can be used to develop complex models to predict potential fraud. For example, you might create a profile for each employee in a company that may include a combination of salary, hours worked, and travel and entertainment purchasing behavior. Sudden deviations from an employee's past behavior may represent risk and warrant follow-up by the internal auditors. Similar to evaluating behavior, data profiling is typically used to assess data quality and internal controls. For example, data profiling may identify customers with incomplete or erroneous master data or mistyped transactions.

What is data reduction?

Data reduction is used to filter results. 1. Identify the attribute you would like to reduce or focus on. 2. Filter the results. 3. Interpret the results. 4. Follow up on the results.
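
The four steps can be sketched as a simple filter. The payment records, field names, and the $100,000 cutoff below are hypothetical and chosen only for illustration.

```python
# Hypothetical payment records; the field names are illustrative only.
payments = [
    {"id": 1, "vendor": "Vendor A", "amount": 1200.00},
    {"id": 2, "vendor": "Vendor B", "amount": 45.50},
    {"id": 3, "vendor": "Vendor C", "amount": 150000.00},
    {"id": 4, "vendor": "Vendor D", "amount": 310.00},
    {"id": 5, "vendor": "Vendor E", "amount": 250000.00},
]

# Step 1: identify the attribute to focus on (here, the payment amount).
threshold = 100_000

# Step 2: filter the results.
large_payments = [p for p in payments if p["amount"] > threshold]

# Step 3: interpret the results, e.g. how much of total spend the filtered set covers.
total_spend = sum(p["amount"] for p in payments)
share_of_spend = sum(p["amount"] for p in large_payments) / total_spend

# Step 4: follow up on the remaining items.
for payment in large_payments:
    print(f"Review payment {payment['id']} to {payment['vendor']}: {payment['amount']:,.2f}")
print(f"Share of total spend covered: {share_of_spend:.1%}")
```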

What is Data science (use Wikipedia)?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.

Know the difference between a decision tree and decision boundaries.

Decision trees are used to divide data into smaller groups. Decision boundaries mark the split between one class and another.

What is a taxonomy?

Defines and describes each key data element (like cash or accounts payable). The taxonomy also defines the relationships between each element (like inventory is a component of current assets and current assets is a component of total assets).

What is the purpose of financial statement analysis?

Financial statement analysis is used by investors, analysts, auditors, and other interested stakeholders to review and evaluate a company's financial statements and financial performance.

What are some examples of the use of regression for accountants?

In managerial accounting, regression may predict employee turnover: Employee turnover = f(current professional salaries, health of the economy [GDP], salaries offered by other accounting firms or by corporate accounting, etc.).
In auditing, regression may be used to determine the appropriateness of allowance accounts: Allowance for loan losses amount = f(current aged loans, loan type, customer loan history, collections success).

What are some examples of use cases of clustering models for accountants?

Internal auditors can use clustering to identify groups of transactions that may indicate risk or fraud in insurance or other payments.

What are some examples of use of data reduction for accountants?

Internal auditors may want to locate payments made to Square vendors. Financial statement analysts will take XBRL instance documents and filter on specific tags.

What does extensible mean?

Extensible means that the language itself can expand over time, and that the tags in use can be expanded upon by the company itself.

What are some types of information that Data Analytics can help companies discover?

No longer will they be simply checking for errors, material misstatements, fraud, and risk in financial statements or merely be reporting their findings at the end of the engagement. Instead, audit professionals will now be collecting and analyzing the company's data similar to the way a business analyst would to help management make better business decisions. This means that, in many cases, external auditors will stay engaged with clients beyond the audit. This is a significant paradigm shift. The audit process will be changed from a traditional process toward a more automated one, which will allow audit professionals to focus more on the logic and rationale behind data queries and less on the gathering of the actual data. As a result, audits will not only yield important findings from a financial perspective, but also information that can help companies refine processes, improve efficiency, and anticipate future problems.

Did they use a standard dictionary in their analysis? If not, why not?

No. They created their own sentiment dictionary based on financial terms. Using the financial dictionary instead of a standard dictionary gives a truer picture of the correlation between what was said and the actual outcome.

What does the SEC require to be tagged with XBRL for public company filers?

Numbers and dates: balance sheet, income statement, statement of comprehensive income, statement of cash flows, statement of stockholders' equity.
Text blocks, including numbers and dates: footnotes, management discussion and analysis (MD&A).

What is overfitting or underfitting?

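Overfitting occurs in overly complex models that fit the existing data but are not as good with new data. Underfitting is the opposite: the model is too simple to capture the pattern even in the existing data.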

What kind of data is normally used in profiling? What are some common statistics that are generated in this approach?

Profiling is done primarily using structured data—data that are stored in a database or spreadsheet and are readily searchable. Using these data, analysts can use common summary statistics to describe the individual, group, or population, including knowing its mean, standard deviation, sum, etc. Profiling is generally performed on data that are readily available, so the data have already been gathered and are ready for further analysis.
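
A minimal sketch of profiling with summary statistics, using Python's standard `statistics` module. The expense amounts are hypothetical; the three-standard-deviation cutoff is one possible rule for flagging items for follow-up, not a required one.

```python
from statistics import mean, stdev

# Hypothetical historical travel expense claims for one employee.
history = [100, 120, 110, 90, 105, 115, 95, 100]

# The profile: summary statistics describing typical behavior.
profile = {
    "count": len(history),
    "sum": sum(history),
    "mean": mean(history),
    "std_dev": stdev(history),  # sample standard deviation
}

# New claims are compared against the profile built from past behavior.
new_claims = [98, 130, 450]
upper_limit = profile["mean"] + 3 * profile["std_dev"]
flagged = [claim for claim in new_claims if claim > upper_limit]

print(profile)
print("Flagged for follow-up:", flagged)
```

Building the profile from past data and testing new items against it keeps a single extreme value from distorting the baseline it is being compared to.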

What is pruning?

Pruning removes branches from a decision tree to avoid overfitting the model.

What is the purpose of ratio analysis?

Ratio analysis is a tool used to evaluate relationships among different financial statement items to assess the financial health of a business.

What is regression?

Regression allows the accountant to develop models to predict expected outcomes. 1. Identify the variables that might predict an outcome. 2. Determine the functional form of the relationship. 3. Identify the parameters of the model. Dependent variable = f(independent variables)
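
As an illustration of step 3, the sketch below estimates the parameters of a straight-line model by ordinary least squares. The linear functional form, the variable names, and the data points are assumptions made for the example.

```python
def fit_line(x_values, y_values):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    y_mean = sum(y_values) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values))
    s_xx = sum((x - x_mean) ** 2 for x in x_values)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data: aged receivables (independent) and write-offs (dependent), in millions.
aged_receivables = [1, 2, 3, 4]
write_offs = [3, 5, 7, 9]

intercept, slope = fit_line(aged_receivables, write_offs)

# Use the fitted model to predict the outcome for a new value of the independent variable.
predicted = intercept + slope * 5
print(intercept, slope, predicted)
```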

What are sparklines and what are they used for?

Sparklines are small graphic trendlines that efficiently summarize numbers or statistics in a graph without axes in a single spreadsheet cell.

When would you use sparklines?

Sparklines are useful for creating simple dashboards or summarizing a large sheet of numbers.

What is XBRL-GL? What are some reasons why XBRL-GL information may not be transmitted in real-time?

Stands for XBRL-General Ledger; relates to the ability of an enterprise system to tag financial elements within the firm's financial reporting system.

What is one way that DA can help tax accountants?

Tax strategy and planning.
Understanding of tax consequences of international transactions, investment, mergers and acquisitions.
Better organization of tax tables and other tax data.
Now, however, tax executives must develop sophisticated tax planning capabilities that assist the company with minimizing its taxes in such a way to avoid or prepare for a potential audit. This shift in focus makes tax data analytics valuable for its ability to help tax staffs predict what will happen rather than react to what has just happened. Arguably, one of the things that Data Analytics does best is predictive analytics: predicting the future! An example of how tax data analytics might be used is the capability to predict the potential tax consequences of a potential international transaction, R&D investment, or proposed merger or acquisition.

What is text mining and sentiment analysis?

Text mining analyzes the frequency of words in unstructured data (e.g. financial disclosure) and matches those to a sentiment dictionary (e.g. words identified as positive or negative).
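
A minimal sketch of the idea: count the words in a text and match them against a sentiment dictionary. The word lists and the sample sentence are hypothetical and far smaller than any dictionary used in practice.

```python
import re

# Tiny hypothetical sentiment dictionary; a real study would use a much larger word list.
NEGATIVE_WORDS = {"loss", "impairment", "litigation", "decline"}
POSITIVE_WORDS = {"growth", "strong", "improved"}


def sentiment_proportions(text):
    """Return (negative share, positive share) of the words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, 0.0
    negative = sum(word in NEGATIVE_WORDS for word in words)
    positive = sum(word in POSITIVE_WORDS for word in words)
    return negative / len(words), positive / len(words)


disclosure = "The company reported a loss and impairment despite strong growth"
negative_share, positive_share = sentiment_proportions(disclosure)
print(negative_share, positive_share)
```

Tracking these proportions across filings over time gives the sentiment trend that analysts compare with market reactions.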

What is a DuPont Ratio Analysis? Be able to compute each part of this analysis.

The DuPont ratio measures components of return on equity.
Return on equity (ROE) = Profit margin × Asset turnover × Financial leverage
= (Net profit / Sales) × (Sales / Average total assets) × (Average total assets / Average equity)
For example, DuPont's 2009 Q2 return on equity is 27.8% = 29.4% PM × 20.1% AT × 471.7% FL.
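
To practice computing each part, the decomposition can be written as a small Python function. The input figures below are hypothetical and are not DuPont's actual numbers.

```python
def dupont(net_profit, sales, average_total_assets, average_equity):
    """Break return on equity into its three DuPont components."""
    profit_margin = net_profit / sales
    asset_turnover = sales / average_total_assets
    financial_leverage = average_total_assets / average_equity
    return {
        "profit_margin": profit_margin,
        "asset_turnover": asset_turnover,
        "financial_leverage": financial_leverage,
        "roe": profit_margin * asset_turnover * financial_leverage,
    }


# Hypothetical figures (in millions).
result = dupont(net_profit=50, sales=500, average_total_assets=1000, average_equity=400)
print(result)
```

Because sales and average total assets cancel out, the product always equals net profit divided by average equity, which is a quick way to check the arithmetic.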

What determines the specific model(s) to choose to evaluate a set of data?

The choice of Data Analytics model depends largely on the type of question that you're trying to answer and your access to the data needed to answer the question. Descriptive and diagnostic analytics are typically paired when you would want to describe the past data and then compare it to a benchmark to determine why the results are the way they are, similar to the accounting concepts of planning and controlling. Likewise, predictive and prescriptive analytics make good partners when you would want to predict an outcome and then make a recommendation on how to follow up, similar to an auditor flagging a transaction as high risk and then following a decision flowchart to determine whether to request additional evidence or include it in audit findings. Ultimately, the model you use comes down to the questions you are trying to answer.

What is classification?

The goal of classification is to predict whether an individual we know very little about will belong to one class or another.

Know the difference between training data and test data.

Training data are existing data that have been manually evaluated and assigned a class. Test data are existing data used to evaluate the model.

What is an example use case of classification model for accountants?

Using a classification model, you can predict whether a new vendor belongs to one class or another based on the behavior of the others, as shown in Exhibit 3-10. Classification in auditing is going to be mainly focused on risk assessment. The predicted classes may be low risk or high risk, where an individual transaction is classified in either group. In the case of known fraud, auditors would classify those cases or transactions as fraud/not fraud and develop a classification model that could predict whether similar transactions might also be potentially fraudulent. There is a longstanding classification method used to predict whether a company is expected to go bankrupt or not. Altman's Z is a calculated score that helps predict bankruptcy and might be useful for auditors to evaluate a company's ability to continue as a going concern.

What does XBRL stand for and what is its purpose?

XBRL stands for eXtensible Business Reporting Language and is a type of XML (extensible markup language) used for organizing and defining financial elements.

There is a movement toward leveraging advanced business analytic techniques to refine the focus on ______________ and derive deeper insights into _________________.

risk; the organization

What are the three types of data profiling?

The three types are structure discovery, content discovery, and relationship discovery. The goals, though, are consistent: improving data quality and gaining more understanding of the data.
Structure discovery, also known as structure analysis, validates that the data you have are consistent and formatted correctly. There are several different processes that you can use for this, such as pattern matching. For example, if you have a data set of phone numbers, pattern matching helps you find the valid sets of formats within the data set. Pattern matching also helps you understand whether a field is text- or number-based, along with other format-specific information. Structure discovery also examines simple basic statistics in the data. By using statistics like the minimum and maximum values, means, medians, modes, and standard deviations, you can gain insight into the validity of the data.
Content discovery is the process of looking more closely into the individual elements of the database to check data quality. This can help you find areas that contain null values or values that are incorrect or ambiguous. Many data management tasks start with an accounting for all the inconsistent and ambiguous entries in your data sets. The standardization process in content discovery plays a major role in fixing these little problems. For example, finding and correcting your data to fit street addresses into the correct format is an essential part of this step. The potential problems that could arise from non-standard data, like being unable to reach customers via mail because the data set includes incorrectly formatted addresses, are costly and can be addressed early in the data management process.
Finally, relationship discovery involves discovering what data are in use and trying to gain a better understanding of the connections between the data sets. This process starts with metadata analysis to determine key relationships between the data and narrows down the connections between specific fields, particularly where the data overlap. This process can help cut down on some of the problems that arise in your data warehouse or other data sets when data are not aligned.

Scanning ______________ ______________ may result in information about potential risks or opportunities related to the industry in which the firm operates as well as its competitors

the internet

