Exam 1


Duplicate Records Detection Methods:

1. Exact matching:
   - Records are identical
2. Fuzzy (near-identical) matching (Weis et al., 2008):
   - Records have similar values for certain relevant fields
   - Causes: data entry errors, different value formats, etc. (e.g., 10/21/10 vs. October 21, 2010)
   - Classified as duplicates based on a threshold and a similarity criterion (e.g., Levenshtein distance)
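
As a rough illustration of fuzzy matching, the sketch below computes the Levenshtein (edit) distance between two strings and flags them as likely duplicates when the distance falls within a threshold. The function names, the lowercase/strip normalization, and the default threshold of 2 are assumptions made for this example, not part of the course material.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def is_fuzzy_duplicate(a: str, b: str, max_distance: int = 2) -> bool:
    """Hypothetical rule: duplicates if the edit distance is within the threshold."""
    return levenshtein(a.lower().strip(), b.lower().strip()) <= max_distance


print(levenshtein("kitten", "sitting"))               # 3
print(is_fuzzy_duplicate("Acme Corp", "Acme Corp."))  # True
```

In practice the threshold would be tuned to the field being compared: a distance of 2 is loose for a five-character invoice number but strict for a long vendor name.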

Can emphasize through position

2-D position, spatial grouping


A measure of the spread of the recorded values on a variable. A measure of dispersion. The larger the variance, the further the individual cases are from the mean. The smaller the variance, the closer the individual scores are to the mean.

Which of the following is NOT true? The analytical mindset is the ability to:

Buy the right software


CLEANING THE DATA (convert the data)

How can unstructured text be converted to structured data


Can emphasize through motion

Direction of motion

The DuPont method includes profit margin ratio, asset turnover ratio, and financial leverage ratio. Profit margin ratio can be calculated by

Dividing net income by sales
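
A minimal sketch of the DuPont decomposition follows. The profit margin and asset turnover formulas come from these cards; the financial leverage formula (total assets divided by equity) is the common textbook form and is an assumption here, as are the function name and the sample figures.

```python
def dupont(net_income: float, sales: float, total_assets: float, equity: float) -> dict:
    """Break return on equity into its three DuPont components."""
    profit_margin = net_income / sales         # net income divided by sales
    asset_turnover = sales / total_assets      # sales divided by total assets
    financial_leverage = total_assets / equity  # assumed formula (equity multiplier)
    return {
        "profit_margin": profit_margin,
        "asset_turnover": asset_turnover,
        "financial_leverage": financial_leverage,
        "return_on_equity": profit_margin * asset_turnover * financial_leverage,
    }


# Hypothetical company: the product of the three ratios equals net income / equity.
ratios = dupont(net_income=50, sales=500, total_assets=1000, equity=400)
print(round(ratios["return_on_equity"], 4))  # 0.125
```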

Field Statistics do not include descriptive statistics such as mean, minimum value, maximum value, and standard deviation.


(1) serve as the primary means to communicate the data, but also can be accompanied by other communication means, such as text.


Which of the following is not a key risk associated with accounts payable?

Items (e.g., purchase orders, checks) are not missing.

Lead to action

Research shows that storytelling can engage parts of the brain that lead to action.

Which popular accounting ratio was originally developed and used by DuPont?

Return on equity


Stories lead to emotional coupling. Both the storyteller and the audience go through and relate to the same experience.


Stories will make it easier for the audience to connect and remember the information you are trying to convey


The middle value when a variable's values are ranked in order; the point that divides a distribution into two equal halves.


The spread, or the distance, between the lowest and highest values of a variable.

Standard Deviation

The square root of the variance reveals the average deviation of the observations from the mean.
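
The relationship between variance and standard deviation can be checked with a small worked example. The scores below are made up, and the sketch uses the population formulas (dividing by n); a sample calculation would divide by n - 1 instead.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = sum(scores) / len(scores)
# Variance: the average squared distance of each value from the mean.
variance = sum((x - mean) ** 2 for x in scores) / len(scores)
# Standard deviation: the square root of the variance, in the original units.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```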

In the data set associated with the DuPont Case, how can each row be uniquely identified in both the balance sheet data and the income statement data?

Ticker and Year

Which of the following is NOT a component of Big Data?


Ability to interpret and share results with stakeholders

Your insights are derived from your interpretation of the analytics results.

This question is from the syllabus: Does this class use Webex, Zoom, Teams, or in-person meetings for office hours?


Event Log is

a chronological record of computer systems activities which are saved to a file on the system

Each node has

a complete record of the entire block chain.

The analytics mindset includes the

ability to interpret and share the results of data analytics techniques with stakeholders.



There are two main types of data visualization

exploratory and explanatory

Some of the steps in capturing data can be automated. As you can imagine, it is better to have something automated than require manual intervention due to the possibility of

human error

Process mining

is a technique that extracts the information from event logs to discover, monitor, and improve business processes

Big Data traits

large population of data

INDEX MATCH can be used to return a value from a column to the

left of the lookup range.

Can emphasize through form

length, width, orientation, size, shape, curvature, enclosure, blur

data-to-ink ratio

less can be more

To compare quantity, position is the

most accurate way


most common data point

Gestalt principles describe how

our mind organizes individual visual elements into groups to make sense of an entire visual.

There has been a disproportionate investment in and management attention focused on technology compared with

people and processes

VLOOKUP cannot

return a result from a column to the left of the lookup range (1st column in the table_array)

Which of the following is a pre-attentive attribute that emphasizes Form?

shape, curvature, enclosure

(1) is the most effective visualizations incorporate a storytelling approach.


Process mining is an enabler that

uncovers the root causes of process inefficiencies by reconstructing and visualizing as-is business process flows and their many variations.

VLOOKUP returns a cell's


Duplicate Records Causes

-Different formats, structures or schema of databases -Lack of a global or unique identifier -Human factors (data entry, lack of constraints, intentional)

Continuous Auditing Performed by Internal Audit

• Gain audit evidence more effectively and efficiently • React more timely to business risks • Leverage technology to perform more efficient internal audits • Focus audits more specifically • Help monitor compliance with policies, procedures, and regulations

What is Artificial Intelligence?

Definition: "Use of a computer to model intelligent behavior with minimal human intervention" •Intelligence exhibited by machines. •Artificial Intelligence: a machine mimics 'cognitive' functions that humans associate with other human minds •Machines & computer programs are capable of problem solving and learning, like a human brain.

In Tableau, measures are automatically aggregated to the granularity of the view, and the granularity, or number of marks, is set by


Limitations to process mining

Executable models may be used to force people to work in a particular manner. However, most models are not well-aligned with reality. ● Most hand-made models are disconnected from reality and provide only an idealized view on the processes at hand: "paper tigers". ● Given (a) the interest in process models, (b) the abundance of event data, and (c) the limited quality of hand-made models, it seems worthwhile to relate event data to process models: process mining

The Vlookup strategy for looking up values is MORE flexible than the Index/Match strategy for looking up values.


A ________________ is a convenient tool in Excel for summarizing data and computing basic statistics.

Pivot table

Which of the following emerging technologies was NOT covered in this week's slides?

Process Mining

In Tableau, the Show Me pane allows you to quickly see different visualizations and choose one appropriate for your data.


The Control Total is used to verify that the data has been imported correctly.
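
One way to picture a control total check: after import, recompute the record count and the sum of a numeric field, then compare both with the figures supplied by the data owner. The field name `amount`, the function names, and the sample rows are hypothetical.

```python
from decimal import Decimal


def control_total(rows, field="amount"):
    """Sum a numeric field exactly (Decimal avoids floating-point rounding)."""
    return sum((Decimal(row[field]) for row in rows), Decimal("0"))


def import_is_complete(rows, expected_count, expected_total, field="amount"):
    """True when both the record count and the control total match the source."""
    return (len(rows) == expected_count
            and control_total(rows, field) == Decimal(expected_total))


imported = [{"amount": "100.10"}, {"amount": "250.00"}, {"amount": "49.90"}]
print(import_is_complete(imported, expected_count=3, expected_total="400.00"))  # True
```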


Which of the following is not a key risk associated with accounts receivable?

Unauthorized premiums are given to suppliers.

Big Data 4 V's

Volume, Velocity, Variety, Veracity

Ask the right questions to then know...

WHAT data to collect

Despite the increased sophistication of the data and the analytics, the most important aspect continues to be

the human element

Throughout the ETL process, it is important to maintain

the integrity of your data. This is often done through data validation

INDEX Returns a Cell's Value based on

the intersection of the row and column number of a range.
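
The INDEX and MATCH behavior described in these cards can be mimicked in Python to show why the combination can look to the left. This is only an analogy: `excel_index` and `excel_match` are hypothetical helpers that imitate exact-match lookups with 1-based positions, and the vendor table is invented.

```python
def excel_match(lookup_value, lookup_range):
    """Imitates MATCH with exact matching: 1-based position of the first match."""
    return lookup_range.index(lookup_value) + 1  # raises ValueError if absent


def excel_index(table, row_num, column_num):
    """Imitates INDEX: the value at the intersection of a 1-based row and column."""
    return table[row_num - 1][column_num - 1]


table = [
    ["Acme Supply", "V001", 1200],
    ["Birch Paper", "V002", 830],
    ["Cobalt Tools", "V003", 455],
]

# Look up by vendor ID (column 2) and return the name from column 1, which sits
# to the LEFT of the lookup column. VLOOKUP cannot do this.
vendor_ids = [row[1] for row in table]
name = excel_index(table, excel_match("V002", vendor_ids), 1)
print(name)  # Birch Paper
```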

Color, through hue, saturation or density, is

the least accurate way to compare quantity

In symmetric distributions, the mean, median, and mode are

the same

Data scientists are

the users


-Bar chart or histogram -Stem and Leaf Plot -Frequency polygon

Which of the following are key risks associated with accounts receivable?

-Credit is granted to customers that are likely to default. -Accounts Receivable is not properly aged. -A significant percentage of the receivables is concentrated in a few customers.

Human Element includes

-Critical thinking -Judgment

behavioral alignment includes

-Culture and mental models -Organization and process design -Learning and development -Incentives and rewards

The following are all possible causes of duplicate records

-Different format (e.g. Date format) -Lack of global or unique identifier -Human error

AI Enablers

-Faster technology -Larger yet cheaper storage -Computerization -High level of investments by industry (Google, Baidu, Microsoft, etc) -Deepmind developed AlphaGo -IBM Watson uses in healthcare -Deloitte and Kira systems in contract analysis


-Frequency Distributions -Relative Frequency Distributions

Audit Analytics is transforming data into a way that is helpful for us "...x..."

"Extracting data from data"

Central Tendency

"Middle Values" -Mean, median, mode


"Summary of differences within groups" -Range, interquartile range, variance, standard deviation

Descriptive Statistics are Used by (1) to report on (2)

(1) Researchers (2) Populations and Samples

What is "analytics"?

-Analytics is a means of extracting value from data. -Analytics is the assessment of data with technology tools. -Today, there are more powerful analytics tools to more efficiently and effectively analyze a broader range of data and types of data than in the past. -Thus, there is an increased opportunity for enhanced insights about what stories data can tell to address business issues and transform the way decisions are made.

Continuous Auditing refers to (select all that apply):

-Audit by Exception -Entire population testing

Gestalt principles

-Gestalt principles describe how our mind organizes individual visual elements into groups to make sense of an entire visual. -Gestalt principles can be used to de-emphasize unimportant patterns. -Gestalt principles can be used to highlight important patterns.

The analytical mindset is the ability to:

-Interpret and share the results with stakeholders -Apply appropriate data analytics techniques -Ask the right questions

Which of the following are key risks associated with accounts payable?

-Payments are made to unauthorized suppliers. -Payments are made to individuals or employees. -Invoices are processed twice.

•There are many ways to analyze data. Some of the more fundamental analyses that you should be able to understand and apply include:

-Ratios (e.g., gross margin or days' sales in accounts receivable) -Sorting (e.g., by industry or month) -Aggregation (e.g., total of an account balance) -Trends (e.g., the movement in inventory associated with both purchases and sales) -Comparisons (e.g., sales month to month) -Forecasting (e.g., budgeted expenses)

Examples that are NOT unstructured data

-Relational databases -Spreadsheet tabular AP data

•Blockchain was originally designed to:

-Solve the problem of double spending cyber currency. -Enable trading in a low or zero trust environment. -Create a distributed ledger that is robust to failure of various nodes. -Operate without a centralized authority.

1. Ask the right questions. 2. Extract, transform and load relevant data (i.e., the ETL process). 3. Apply appropriate data analytics techniques. 4. Interpret and share the results with stakeholders.

-Step 2: Must include cleaning the data -If step 4 doesn't happen then your work is useless -Steps after 1 will not be able to happen if within the first step the questions asked are wrong

Blockchain is designed to be:

-Tamper resistant. -Robust -Pseudo-anonymous -Decentralized

Apply appropriate data analytics techniques. It is important to understand:

-The purpose of different types of data analytics techniques -How to determine which techniques are most appropriate for the objectives of your analysis -Your objectives might include the need to prove or disprove your expectation if you developed one (a confirmatory approach versus an exploratory approach).

Blockchain technology was designed for all of the following

-To be tamper resistant -To use cryptography -To be robust

Components of Big Data

-Variety -Volume -Velocity

Heat maps in Tableau

-are useful for identifying concentrated areas of data points -have dynamically displayed density, such that when zooming in on a map density will be recomputed and displayed -can be adjusted for opacity and intensity

Tufte Principles of Analytical Design

-data to ink ratio -remove backgrounds -remove redundant labels -remove borders -remove colors -remove special effects -remove bold effects -lighten labels -lighten or remove lines -label directly -no color overload

3 main reasons storytelling is the most effective visualization

-memorable -relatable -lead to action

Which of the following is an example of unstructured data? (Select all that apply)

-raw text -images

VLOOKUP can return

-the wrong result when columns are added/deleted from the table_array.

How does Blockchain actually work?

-Transactions are collected to be loaded onto a block. -Once a block is completed, it is "mined" and added to the blockchain.
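
A toy sketch of the collect, mine, and append cycle is shown below. It is deliberately simplified and is not how any production blockchain is implemented: the block layout, the two-zero difficulty target, and all function names are assumptions chosen so the example runs quickly.

```python
import hashlib
import json

DIFFICULTY_PREFIX = "00"  # hypothetical target: a valid hash must start with this


def block_hash(block: dict) -> str:
    """SHA-256 hash of the block's contents in a stable JSON encoding."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def mine_block(transactions: list, previous_hash: str) -> dict:
    """Try nonces until the block's hash meets the difficulty target."""
    block = {"transactions": transactions, "previous_hash": previous_hash, "nonce": 0}
    while not block_hash(block).startswith(DIFFICULTY_PREFIX):
        block["nonce"] += 1
    return block


def chain_is_valid(chain: list) -> bool:
    """Every block must be mined and must store the hash of the block before it."""
    links_ok = all(later["previous_hash"] == block_hash(earlier)
                   for earlier, later in zip(chain, chain[1:]))
    mined_ok = all(block_hash(b).startswith(DIFFICULTY_PREFIX) for b in chain)
    return links_ok and mined_ok


chain = [mine_block(["genesis"], "0" * 64)]
chain.append(mine_block(["A pays B 5", "B pays C 2"], block_hash(chain[-1])))
print(chain_is_valid(chain))  # True
```

Because each block stores the hash of the block before it, editing an earlier transaction changes that block's hash and breaks the link, which is the intuition behind "tamper resistant."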

Audit data analytics (ADA or ADAs) are defined by the AICPA as

...the science and art of discovering and analyzing patterns, identifying anomalies, and extracting other useful information in data underlying or related to the subject matter of an audit through analysis, modeling, and visualization for the purpose of planning or performing the audit.

Duplicate Payment Prioritization Framework phases

1. Duplication detection 2. Duplication prevention 3. Model review

An analytics mindset is the ability to

1. Ask the right questions. 2. Extract, transform and load relevant data (i.e., the ETL process). 3. Apply appropriate data analytics techniques. 4. Interpret and share the results with stakeholders.

Perspectives covered by process mining models

1. The control flow perspective focuses on the control flow, i.e., the ordering of activities. (Focuses on the variants)
2. The organizational perspective focuses on information about resources in the activity, i.e., which actors (e.g., people, systems, roles or departments) are involved and how they are related. (Focuses on people and their roles)
3. The case perspective focuses on properties of cases. (A case is what you are looking at as one instance; focuses on key events within the case)
4. The time perspective is concerned with the timing and frequency of events. (Focuses on the timing of these events)

Mean and outliers

1. Means can be badly affected by outliers (data points with extreme values unlike the rest). 2. Outliers can make the mean a bad measure of central tendency or common experience.

Median and outliers

The median is largely unaffected by outliers, making it a better measure of central tendency than the mean when data are skewed; it better describes the "typical person."
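
A quick numeric check makes the contrast concrete. The salary figures (in thousands) are invented for illustration.

```python
import statistics

salaries = [40, 42, 45, 47, 50]      # hypothetical salaries, in thousands
with_outlier = salaries + [1000]     # add one extreme value

print(statistics.mean(salaries), statistics.median(salaries))          # 44.8 45
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 204 46.0
```

One extreme value pulls the mean from 44.8 to 204, while the median moves only from 45 to 46.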

Interquartile Range

A quartile is a value that marks one of the divisions that break a series of values into four equal parts. The interquartile range is the distance between the first and third quartiles.
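
The sketch below computes the quartiles, the interquartile range, and the plain range for a small made-up data set. Quartile conventions differ between textbooks and tools; this example assumes the "inclusive" method of Python's `statistics.quantiles`, so other software may report slightly different cut points.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                            # spread of the middle 50% of values
value_range = max(values) - min(values)  # distance between lowest and highest

print(q1, q2, q3)        # 3.0 5.0 7.0
print(iqr, value_range)  # 4.0 8
```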

What is a data visualization?

A visualization is a means of communicating data, primarily through imagery, that is both readable and recognizable.


A way to graphically portray almost all the descriptive statistics at once is the box-plot.

Which of the following is NOT true about Gestalt principles?

Gestalt Principles encourage the use of the color yellow.

The following are all possible causes of duplicate records EXCEPT:

Good controls

The employee that authorized questionable payments throughout the accounts payable audit had the initials:


In general, is it better for asset turnover ratio (Sales / Total Assets) to be higher or lower?



How you ENGAGE your audience

Visual Design

How you SHOW your story


How you TELL your story

Can emphasize through color

Hue, Intensity

Extract, Transform, Load (ETL) is the process of obtaining data, cleaning it up, and loading it into relevant software in preparation for analysis.


Blockchain technology was designed for all of the following EXCEPT:

To be a centralized Ledger

Ask the right questions first

To drive better decisions, you must ask the right questions first and then seek answers in the data

Benford's law analysis can detect deviations from expected digit frequencies.
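
For reference, Benford's law gives the expected share of leading digit d as log10(1 + 1/d), so about 30.1% of amounts should start with 1 and about 4.6% with 9. The sketch below compares observed first-digit frequencies with those expectations; the helper names and the sample amounts are hypothetical, and deciding whether a deviation is significant would need a statistical test that is not shown here.

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Expected proportion of values whose first significant digit is `digit`."""
    return math.log10(1 + 1 / digit)


def first_digit(amount) -> int:
    """First significant digit of a nonzero number, ignoring sign and leading zeros."""
    return int(str(abs(amount)).lstrip("0.")[0])


def first_digit_deviations(amounts) -> dict:
    """Observed minus expected proportion for each leading digit 1 through 9."""
    digits = [first_digit(a) for a in amounts if a != 0]
    if not digits:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) - benford_expected(d) for d in range(1, 10)}


invoices = [912.50, 9450, 98.10, 9999, 130.25]  # hypothetical amounts
deviations = first_digit_deviations(invoices)
print(round(deviations[9], 3))  # 0.754, far more leading 9s than expected
```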


In the TechWear case was there a collectability issue in 2015? Were cash receipts keeping up with sales through the end of 2015?

Yes, there was a collectability issue, and cash receipts were not keeping up with sales in 2015.


be consistent and effective when using color, cater the color for your audience, understand color context

Data visualization can be described as

blending the art of design with the science of data.

Big data is a

collection of data sets that are so large or complex that it is impossible to analyze them with traditional databases and tools. Sources include bank transactions, financial and market reports, orders and invoicing, surveys, online activities, and even weather or traffic reports.

In skewed data, the mean and median lie (1) toward the skew than the mode


mean lies (1) toward the skew than the median.


Tufte's principles

highlight that "excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency."

Exploratory visualization

is what we do to understand the data, e.g., to develop and assess a hypothesis or question or find a pattern in the data. ►Allows the audience to explore data for further analyses ►Is conducted for a problem that has not been clearly defined

MATCH returns a column/row


Blockchain is a chronological chain of blocks where

-Each block is linked to all previous blocks. -Each block contains identifying information about the information contained therein. -Each transaction is verified before being loaded onto the blockchain.

Benefits of making more use of ADAs include

— Improved understanding of an entity's operations and associated risks — Increased potential for detecting material misstatements — Improved communications with those charged with governance of audited entities

Continuous Monitoring Responsibility of Management

• Improve governance - aligning business/ compliance risk to internal controls and remediation • Improve transparency and react more timely to make better day-to-day decisions • Strive to reduce cost of controls and cost of Testing / monitoring • Leverage technology to create efficiencies and opportunities for performance improvements

Emerging Technological Landscape

•Artificial Intelligence •Drones •Blockchain •XBRL •Text mining

Continuous Auditing

•Audit by exception (Vasarhelyi and Halper, 1991) •Entire population testing •Close to real-time •Continuous assurance, utilizing techniques such as analytical methods and validity tests, could detect anomalous transactions sooner than a traditional audit does (Vasarhelyi et al., 2002).

Extract, transform and load relevant data (the ETL process)

•Data characteristics and data relevance

Why Do We Need Standardized Syntax for Business Reporting?

•Format, syntax, semantics


•GPS receiver in your cell phone •Cash registers when you make a purchase •Cameras in public places •Your car •Your digital photos •Your IoT devices •Sensors

Summarize your results... Think about:

•How your stakeholder will best receive the information •How much time you will have to present •Whether the presentation will be in person or virtual •What the best format of the presentation will be •Whether you have any additional recommendations for further analysis

What is Big Data

•No unified definition: -Data exceeding the level of efficient manageability within traditional DB (Harris 2013) -Process of analyzing a large volume of diverse data, in any variety of form, using ground-breaking apparatus to identify opportunities to improve overall value (Miller 2012; Moore et al. 2013; Wyner 2013)

Robotic Process Automation

•RPA is software that can automate repetitive and rule-based tasks. •RPA robots are capable of mimicking many, if not all, human user actions. •They log into applications, move files and folders, copy and paste data, fill in forms, extract structured and semi-structured data from documents, scrape browsers, and more.

RPA defined

•RPA is the use of a software "robot" (a program) that replicates the actions of a human being interacting with the user interface of a computer system. •RPA is a rule-based system that executes processes without the need for constant human supervision, and connects multiple systems without changing the existing IT landscape. •When applied to the right tasks, RPA can provide significant time and money savings as it allows actions and workflows to run autonomously without manpower. •Robots in an RPA process might also be referred to as bots or agents. As they replace the work of a human in a process, they can also be referred to as a virtual workforce.

Robotic Process Automation

•Robotic Process Automation (RPA) runs application software in the same way that a person works with that software.

Understand the flow of data in an accounting information system

•Type of accounting information systems •What modules are in this context •Capabilities and limitations of the data •Routine versus non-routine flows of data •Who generates and oversees the data and in what capacity

The ETL process - extraction of data

•What data to ask for •How to ask for data •What format the data needs to be in •Data transforming or cleansing involves converting data from one format to another to load it into an analytics tool. •This includes making certain that only the data needed is extracted and that this data is complete and accurate. •Data cleansing needs to be performed both before and after the data loading process.

Visualization can be used as

•a technique and a way to present findings as well.

Data transforming or cleansing involves

•converting data from one format to another to load it into an analytics tool. •This includes making certain that only the data needed is extracted and that this data is complete and accurate. •Data cleansing needs to be performed both before and after the data loading process.
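
As a small example of the "convert from one format to another" step, the sketch below normalizes dates written in several styles (such as 10/21/10 and October 21, 2010) into a single ISO format before loading. The list of accepted formats and the function name are assumptions, and month names are parsed according to the system locale (English is assumed here).

```python
from datetime import datetime

# Assumed input formats; extend this list to match the actual source data.
KNOWN_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d"]


def to_iso_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError so bad rows surface."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {raw!r}")


print(to_iso_date("10/21/10"))          # 2010-10-21
print(to_iso_date("October 21, 2010"))  # 2010-10-21
```

Raising an error on unrecognized values, rather than silently skipping them, supports the completeness and accuracy checks the card mentions.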

Blockchain is a

•decentralized ledger system developed originally for Bitcoin trading.

One of the benefits of new analytics tools is the ability to

•present your results easily in a more sophisticated visual manner (i.e., visualization). Snapshots of different visual components are called dashboards.

The analyst needs to

•understand who the relevant stakeholders are and their objectives. Knowing your audience and what they want to accomplish is critical to understanding value and how to identify a "right" question.

Pre-attentive attributes

►Emphasis ►Quantity ►Color

Explanatory data visualization

►Explains what the audience needs to know ►Shows specific relationships in data, such as link between causes and results

Three key capabilities of process mining

● Discovery (the model) ● Conformance (compare the model to the data) ● Enhancement (improving the model)
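
The first two capabilities can be illustrated on a tiny, invented event log. This is a heavily simplified sketch: real process mining tools discover full process models, whereas here "discovery" only counts the distinct activity sequences (variants) and "conformance" only flags cases that differ from an assumed expected sequence. The log layout (case ID, timestamp, activity) and all names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log rows: (case_id, timestamp, activity)
event_log = [
    ("PO-1", 1, "Create PO"), ("PO-1", 2, "Approve PO"),
    ("PO-1", 3, "Receive Goods"), ("PO-1", 4, "Pay Invoice"),
    ("PO-2", 1, "Create PO"), ("PO-2", 2, "Receive Goods"),
    ("PO-2", 3, "Pay Invoice"),
    ("PO-3", 1, "Create PO"), ("PO-3", 2, "Approve PO"),
    ("PO-3", 3, "Receive Goods"), ("PO-3", 4, "Pay Invoice"),
]


def traces_by_case(log):
    """Order each case's events by timestamp and collect the activity sequence."""
    cases = defaultdict(list)
    for case_id, _timestamp, activity in sorted(log, key=lambda e: (e[0], e[1])):
        cases[case_id].append(activity)
    return {case_id: tuple(activities) for case_id, activities in cases.items()}


def discover_variants(log):
    """Discovery (simplified): count how often each activity sequence occurs."""
    return Counter(traces_by_case(log).values())


def nonconforming_cases(log, expected_trace):
    """Conformance (simplified): cases whose sequence differs from the expected one."""
    expected = tuple(expected_trace)
    return [cid for cid, trace in traces_by_case(log).items() if trace != expected]


expected = ["Create PO", "Approve PO", "Receive Goods", "Pay Invoice"]
print(len(discover_variants(event_log)))         # 2
print(nonconforming_cases(event_log, expected))  # ['PO-2']
```

Here PO-2 skipped the approval step, the kind of deviation an auditor would follow up on.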

Advantages of process mining for evaluating the effectiveness of internal control

● Full population of data ● Provide a process model reflecting real situation ● Analysis of the process deviations enables auditors to assess the internal control system (Alrefai, 2019). ● Additional analysis on attributes of event logs ● Automation of auditing process with real-time data
