Audit Midterm
b
A DFD is a representation of which of the following? a. The logical operations performed by a computer program b. Flow of data in an organization. c. Decision rules in a computer program d. Computer hardware configuration
area chart
A second chart type used to show trends is the
c
Which type of fraud is associated with 50% of all auditor lawsuits? a. Ponzi schemes b. Kiting c. Fraudulent financial reporting d. Lapping
True
Within diagnostic analysis, both informal and formal analyses can be conducted.
c
You co-own a theme park. You believe that the longer customers stay in the park, the hungrier they will be which would increase the amount they spend on food. Your co-owner believes that the longer customers stay in the park, the more likely they are to feel nauseated which would decrease the amount they spend on food. Both of you gather data and find some evidence supporting your belief. If the true relation is that there is no relation between time in the park and food sales, what type of error did your co-owner make? a. data overfitting error b. GIGO error c. type I error d. type II error
Violated attribute dependencies
are errors that occur when a secondary attribute does not match the primary attribute
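A minimal sketch of checking for a violated attribute dependency, assuming a hypothetical lookup table in which each ingredient (the primary attribute) has exactly one approved vendor (the secondary attribute). The table, vendor names, and records are illustrative only.

```python
# Hypothetical reference table: ingredient -> approved vendor.
approved_vendor = {"flour": "Mill Co", "salt": "Salt Co"}

# Hypothetical recipe records pulled from a database.
recipe_lines = [
    {"ingredient": "flour", "vendor": "Salt Co"},  # vendor does not match the ingredient
    {"ingredient": "salt", "vendor": "Salt Co"},
]

# A record violates the dependency when its vendor is not the one
# the reference table lists for its ingredient.
violations = [
    line for line in recipe_lines
    if approved_vendor.get(line["ingredient"]) != line["vendor"]
]

for line in violations:
    print("Violated attribute dependency:", line)
```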
Advanced testing techniques
are possible with a deeper understanding of the content of data
DFD
are visually simple and can be used to represent the same process at a high abstract (summary) or detailed level
Informal diagnostic analysis
builds on descriptive analytics. It includes using logic and basic tests to try to reveal relationships in the data that explain why something happened.
Prescriptive analytics
can be either recommendations to take or programmed actions a system can take based on predictive analytics results.
Basic Statistical Test
can be performed to validate the data ex: min, max, mean, sum
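A short sketch of basic statistical tests used as a validation step. The pay-raise values and the plausibility limits are assumptions made up for this illustration.

```python
from statistics import mean, median

# Hypothetical pay-raise percentages; one value looks like a keying error.
raises = [2.5, 3.0, 3.2, 2.8, 30.0, 3.1]

summary = {
    "min": min(raises),
    "max": max(raises),
    "mean": round(mean(raises), 2),
    "median": median(raises),
    "sum": round(sum(raises), 2),
}

# Assumed plausibility limits, chosen only for this example.
LOW, HIGH = 0.0, 10.0
suspicious = [r for r in raises if not LOW <= r <= HIGH]

print(summary)
print("Values outside the expected range:", suspicious)
```

A maximum far above the median is the kind of signal that would prompt a closer look before the data is used.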
True
data validation is an iterative process that both helps identify what needs to be transformed in the data as well as verify that the data has been transformed correctly.
Data Structuring
is the process of changing the organization and relationships among data fields to prepare the data for analysis.
Data Joining
is the process of combining different data sources
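A minimal sketch of joining two data sources on a shared key, assuming hypothetical customer and invoice extracts linked by a `customer_id` field. Unmatched rows are kept aside so they can be investigated rather than silently dropped.

```python
# Hypothetical extracts from two sources that share a customer_id key.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Birch"},
]
invoices = [
    {"invoice_id": 101, "customer_id": 1, "amount": 250.0},
    {"invoice_id": 102, "customer_id": 2, "amount": 75.0},
    {"invoice_id": 103, "customer_id": 9, "amount": 40.0},  # no matching customer
]

# Index one source by the key, then look up each row of the other source.
names_by_id = {c["customer_id"]: c["name"] for c in customers}

joined, unmatched = [], []
for inv in invoices:
    name = names_by_id.get(inv["customer_id"])
    if name is None:
        unmatched.append(inv)
    else:
        joined.append({**inv, "name": name})

print("Joined rows:", joined)
print("Invoices with no matching customer:", unmatched)
```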
Data Standardization
is the process of standardizing the structure and meaning of each data element so it can be analyzed and used in decision making.
test dataset
is used to assess how well the model predicts the target outcome
Computer Instructions Fraud
modifying software, illegal copying of software, using software in an unauthorized manner, creating software to undergo unauthorized activities
Data overfitting
occurs when a model fits training data very well but does not predict well when applied to other datasets
Fraud occurs when
people have a perceived non-shareable pressure, the opportunity gateway is left open, and they can rationalize their actions to reduce the moral impact in their minds
null hypothesis
proposed explanation worded in the form of an equality, meaning that one of the two concepts, ideas, or groups will be no different than the other concept, idea, or group.
ethical presentation.
refers to avoiding the intentional or unintentional use of deceptive practices that can alter the user's understanding of the data being presented.
distance (simplifying)
refers to how far apart related information is presented (simplification)
simplification
refers to making a visualization easy to interpret and understand.
Visual weight (emphasis)
refers to the amount of attention an element attracts. There are various techniques to increase visual weight, including color, complexity, contrast, density, and size
Auditor's Responsibility (via SAS No. 99)
requires auditors to: - Understand fraud - Discuss the risks of material fraudulent misstatements - Obtain information - Identify, assess, and respond to risks - Evaluate the results of their audit tests - Document and communicate findings - Incorporate a technology focus
Sarbanes-Oxley Act (SOX)
requires management to assess internal controls and auditors to evaluate the assessment
extrapolation beyond the range of data
is a process of estimating a value that is beyond the data used to create the model
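One simple guard against this problem is to check whether a new input falls inside the range of values the model was built on. The sketch below assumes a hypothetical model built on 1 to 6 hours in the park; the function name is illustrative.

```python
def within_training_range(x, training_x):
    """Return True when x lies inside the range of the data used to build the model."""
    return min(training_x) <= x <= max(training_x)

# Hypothetical predictor values used to create the model.
training_hours = [1, 2, 3, 4, 5, 6]

for hours in (4, 12):
    if within_training_range(hours, training_hours):
        print(hours, "hours: inside the modeled range")
    else:
        print(hours, "hours: extrapolation, treat any prediction with caution")
```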
Level 0 Diagram (DFD)
is a projection of the process on the Context diagram. It is like opening up that process and looking inside to see how it works (to show the internal sub-processes). You repeat the external entities, but you also expand the main process into its subprocesses. (Data stores also appear at this level.)
Business Process Diagram
is a visual way to represent the activities in a business process and the intent is that all business users can easily understand the process from a standard notation (BPMN: business process modeling notation)
Data ordering
is the intentional arranging of visualization items to produce emphasis. The two most common ways of ordering data are (1) by using categories on the axes and (2) by the values of the data.
Line chart
is the most typical viz used to show trends
Attributes of High-Quality Data
Accurate, Complete, Consistent, Timely, Valid
c (All data elements should be named, with the exception of data flows into data stores, when the inflows and outflows make naming the data store redundant.)
All of the following are guidelines that should be followed in naming DFD data elements EXCEPT: a. Process names should include action verbs such as update, edit, prepare, and record. b. Make sure the names describe all the data or the entire process c. Name only the most important DFD elements d. Choose active and descriptive names
Diagnostic
An internal auditor notices an increase in a company's inventory shrinkage (i.e., inventory being stolen). The internal auditor creates a data model that explains what types of inventory are being stolen.
Advanced testing techniques
An internal auditor validates the daily changes in customer's accounts receivable balances against daily sales made on account less cash collected on receivables.
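The relationship in this example can be tested directly: each day's change in accounts receivable should equal sales on account less cash collected. A minimal sketch, with hypothetical daily figures and an assumed rounding tolerance:

```python
# Hypothetical daily figures.
days = [
    {"day": "Mon", "ar_change": 500.0, "credit_sales": 800.0, "cash_collected": 300.0},
    {"day": "Tue", "ar_change": 150.0, "credit_sales": 400.0, "cash_collected": 200.0},  # off by 50
]

TOLERANCE = 0.01  # assumed allowance for rounding

# Flag any day where the recorded change differs from the expected change.
exceptions = [
    d["day"] for d in days
    if abs(d["ar_change"] - (d["credit_sales"] - d["cash_collected"])) > TOLERANCE
]

print("Days that do not reconcile:", exceptions)
```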
Data Flow Diagram (DFD)
a process model that focuses on: data flows, processes, sources and destinations of the data, data stores
Fraud
any means a person uses to gain an unfair advantage over another person; includes: - A false statement, representation, or disclosure - A material fact, which induces a victim to act - An intent to deceive - Victim relied on the misrepresentation - Injury or loss was suffered by the victim Fraud is a white collar crime
Fraudulent Financial Reporting
"cooking the books" (booking fictitious revenue, overstating assets, etc.)
Quantity
(goldilocks principle) axis increments, information in labeling of axis, improper use of too many colors, number of data points
Vulnerabilities of computer systems
- Company databases can be huge and access privileges can be difficult to create and enforce. Consequently, individuals can steal, destroy, or alter massive amounts of data in very little time - Organizations often want employees, customers, suppliers, and others to have access to their system from inside the organization and without. This access also creates vulnerability - Computer programs only need to be altered once, and they will operate that way until the system is no longer in use or someone notices - Modern systems are accessed by PCs, which are inherently more vulnerable to security risks and difficult to control *It is hard to control physical access to each PC *PCs are portable, and if they are stolen, the data and access capabilities go with them *PCs tend to be located in user departments, where one person may perform multiple functions that should be segregated *PC users tend to be more oblivious to security concerns
Basic Guidelines for creating a DFD
- Understand the system that you are trying to represent - A DFD is a simple representation meaning that you need to consider what is relevant and what needs to be included - Start with a high level (context diagram) to show how data flows between outside entities and inside the system. Use additional DFDs at the detailed level to show how data flows within the system - Identify and group all the basic elements of the DFD - Name data elements with descriptive names, use action verbs for processes (e.g., update, edit, prepare, validate, etc.) - Give each process a sequential number to help the reader navigate from the abstract to the detailed levels - Edit/Review/Refine your DFD to make it easy to read and understand
Flowcharts vs. DFDs
- DFDs place a heavy emphasis on the logical aspects of a system - Flowcharts place more emphasis on the physical characteristics of the system. Changes in the physical characteristics of the process do affect the flowchart but have little or no impact on the DFD. When deciding which tool to employ, consider the information needs of those who will view it
Goal of document flowchart
- show where each document originates and its final disposition
Preventing and Detecting fraud
1. Make fraud less likely to occur 2. Make it difficult to commit 3. Improve detection 4. Reduce fraud losses
Four threats to AIS
1. Natural and political disasters 2. Software errors and equipment malfunctions 3. Unintentional acts 4. Intentional acts
Basic Statistical Test
A CFO receives a spreadsheet file for review that contains annual pay raises for all company employees. The CFO examines the minimum, maximum, average, and median to make sure the data looks correct before making the final approval for pay increases. What Technique?
True
A data source and a data destination are entities that send or receive data that the system uses or produces.
b (Including all exception procedures and error routines clutters the flowchart and makes it difficult to read and understand.)
All of the following are recommended guidelines for making flowcharts more readable, clear, concise, consistent, and understandable EXCEPT: a. Divide a document flowchart into columns with labels. b. Flowchart all data flows, especially exception procedures and error routines. c. Design the flowchart so that flow proceeds from top to bottom and from left to right. d. Show the final disposition of all documents to prevent loose ends that leave the reader dangling.
visualization
Any visual representation of data, such as a graph, diagram, or animation; called a viz for short.
True
Best practice for storing data is to store the information in as disaggregated form as possible and then aggregate the data through querying for other uses
a
Billy-Bob Barker bakes big, beautiful brownies. However, Billy-Bob notices that the recipe he printed from the company database correctly states that the recipe needs flour but incorrectly lists the approved flour company and instead lists the approved salt vendor. This is an example of which of the following? a. Violated attribute dependency b. Data contradiction error c. Data imputation error d. Data duplication error
Business Process Diagrams Structure
Can show the organizational unit performing the activity
d
Catalina was reviewing the data imputation formula for missing values in a customer's credit score. She found that two lines of data had the exact same customer name, birthdate, address, and buying history, but they had two different social security numbers. Assuming there was no fraud by the customer, what best describes what Catalina likely found? a. Violated attribute dependency b. Misfielded data value c. Data threshold violation d. Data contradiction error
True
Choosing the right type of visualization strengthens the ability of the viz to communicate effectively.
Comparison
Comparing data across categories or groups represents the most common reason to create a visualization in business. Comparisons require both numeric and categorical data values. Ex: A bar chart comparing pay differences between men and women. A bullet chart that compares current year performance ratings for each employee to their previous year's performance ratings (using a bullet to show prior year performance).
Accurate
Correct; free of error; accurately represents events and activities
True
Data entry errors may be indistinguishable from data formatting and data consistency errors in an output data file.
data cleaning
Data validation is an important precursor to ________________
true
Descriptive analytics can also be performed on qualitative data by first transforming the qualitative data into numbers
True
Diagnostic analytics can also be much more formal and employ confirmatory data analysis techniques.
T
Documentation methods such as DFDs, BPDs, and flowcharts save both time and money, adding value to an organization. T or F
Complete
Does not omit aspects of events or activities; of enough breadth and depth
shape
Every _______ on a flowchart depicts a unique operation, input, processing activity, or storage medium
Consistent
Example of a violation of this attribute: A company switches the denomination of amounts (thousands, millions, etc.) irregularly.
misfielded
Example: if a data field for a city contains the country name Germany, the data values are ________.
Computer Fraud
Exists if a computer is used to commit fraud. In using a computer, fraud perpetrators can steal: More of something, in less time, with less effort - They may also leave very little evidence, which can make these crimes more difficult to detect
Aggregating the data at different levels of detail, Joining different data together, and/or Pivoting the data
Extracted data often needs to be structured in a manner that will enable analysis which can entail:
Employee fraud (pressure)
Fraud type: misappropriation of assets. Pressures that lead to employee fraud: financial, emotional, lifestyle
Rationalization
Fraudsters do not regard themselves as unprincipled, they regard themselves as highly principled individuals. The only way they can commit their frauds and maintain their self image as principled individuals is to create rationalizations that recast their actions as "morally acceptable" behaviors. These rationalizations may include: - I was just borrowing the money - It wasn't really hurting anyone (corporations are often seen as non-persons, therefore crimes against them are not "hurting" anyone) - Everybody does it - I've worked for them for 35 years and been underpaid all that time. I wasn't stealing; I was only taking what was owed to me - I didn't take it for myself. I needed it to pay my child's medical bills
unintentional acts
Greatest risk. Accidents caused by human carelessness, failure to follow established procedures, and poorly trained or supervised personnel; innocent errors or omissions; lost, erroneous, destroyed, or misplaced data; logic errors; systems that do not meet company needs or cannot handle intended tasks. Example: A data entry clerk at Mizuho Securities mistakenly keyed in a sale for 610,000 shares of J-Com for 1 yen instead of the sale of 1 share for 610,000 yen. The error cost the company $250 million.
Software errors and equipment malfunctions
Hardware or software failures, software errors or bugs, operating system crashes, power outages and fluctuations, undetected data transmission errors. Example: As a result of tax system bugs, California failed to collect $635 million in business taxes.
simplification, emphasis, and ethical presentation.
High-quality visualizations follow three important design principles:
d
In row 8 and row 9, which of the following problems do you find? a. Data contradiction error b. Data concatenation error c. Data aggregation error d. Duplicate values
d
In which step of the data transformation process would you analyze whether the data has the properties of high-quality data? a. Data structuring b. Data standardization c. Data cleaning d. Data validation
C
Indicate which option orders the type of analytic from the one that provides the most value added to an organization to the least value added to the organization. A.Predictive, prescriptive, diagnostic, descriptive B.Prescriptive, predictive, descriptive, diagnostic C.Prescriptive, predictive, diagnostic, descriptive D.Predictive, prescriptive, descriptive, diagnostic
aggregated
Information is lost when data is __________.
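A small sketch of why aggregation loses information, using hypothetical sales rows. Two regions with different underlying transactions produce the same total, so the detail cannot be recovered from the aggregate.

```python
from collections import defaultdict

# Hypothetical disaggregated sales rows: (region, amount).
sales = [("East", 100), ("East", 300), ("West", 200), ("West", 200)]

# Aggregate to one total per region.
totals = defaultdict(int)
for region, amount in sales:
    totals[region] += amount

print(dict(totals))
# Both regions show the same total even though their individual sales differ,
# which is why storing disaggregated data and aggregating by query is preferred.
```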
Classifications of Computer Fraud
Input fraud, processor fraud, computer instructions fraud, data fraud, and output fraud
c
Making an item in the data area of a viz larger to increase emphasis is an example of using which principle? a. highlighting b. ordering c. weighting d. It's a poor design choice; items should all be the same size.
D
Making sure to use separate training datasets and test datasets is especially important for creating what type of analytic? A.Diagnostic analytic B.Descriptive analytic C.Prescriptive analytic D.Predictive analytic
miracle
Outflow but no inflow
Bar charts
Pie charts are the most over-used type of charts. This is because they are often used to show comparison. Select which chart type is best for making comparisons.
Emphasis
Techniques include Highlighting, Weighting, and Ordering
Simplification
Techniques include Quantity, Distance, and Orientation
a
The documentation skills that accountants require vary with their job function. However, all accountants should at least be able to do which of the following? a. Read documentation to determine how the system works b. Critique and correct documentation that others prepare c. Prepare documentation for a newly developed information system d. Teach others how to prepare documentation
histograms and box plots
The two most common distribution visualizations are
Input Fraud (Computer Fraud)
alteration or falsifying input
Flowchart
analytical technique used to describe some aspect of an information system in a clear, concise, and logical manner. Flowcharts use a set of standard symbols to depict procedures and the flow of data
Prescriptive analytics
answers the question "what should be done?"
Descriptive Analysis
are computations that address the basic question of "what happened?"
Valid
data measures what it is intended to measure; conforms to syntax rules and to requirements
Common Problems with Data Analytics
data overfitting, extrapolation beyond the range of data, failing to consider the variation inherent in a model
Predictive analytics
go a step further than diagnostic analytics to answer the question "what is likely to happen in the future?"
Diagnostic analysis
goes beyond examining what happened to try to answer the question, "why did this happen?"
Consistent
presented in same format over time
scatterplot
the most common correlation viz is a ______
Processor Fraud
unauthorized system use
non-proportional display of data (ethical presentation)
y axis should always start at 0
Misappropriation of assets and fraudulent financial reporting
Two Categories of Fraud
a
Using which of the following data validation techniques, can the validator estimate a likely error rate in the population of data? a. Audit a sample b. Visual Inspection c. Advanced testing techniques d. Basic statistical tests
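A minimal sketch of auditing a sample to estimate a population error rate. The population, its built-in 5% error rate, the sample size of 100, and the fixed random seed are all assumptions for illustration; in practice the true rate is unknown and only the sample estimate is available.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 records; every 20th record contains an error.
population = [{"id": i, "has_error": i % 20 == 0} for i in range(1, 1001)]
true_rate = sum(r["has_error"] for r in population) / len(population)

# Audit a random sample and use its error rate as the estimate.
sample = random.sample(population, 100)
sample_error_rate = sum(r["has_error"] for r in sample) / len(sample)

print("Estimated error rate from sample:", sample_error_rate)
print("Rate built into the hypothetical population:", true_rate)
```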
Employee Pressure and Financial Statement Pressure
Two types of pressure depending on what fraud you are going to commit
Guidelines for drawing flowcharts
Understand the system you are trying to represent - Identify business processes, documents, data flows, and data processing procedures - Organize the flowchart so that it reads from top to bottom and left to right - Clearly label all symbols - Use page connectors (if it cannot fit on a single page) - Edit/review/refine to make it easy to read and understand
B
When confirmatory data analysis techniques are used, what type of analytic is likely being computed? A.Predictive analytic B.Diagnostic analytic C.Prescriptive analytic D.Descriptive analytic
a
Which chart type is best for depicting trends over time? a. area chart b. histogram c. bar chart d. pie chart
d
Which of the following can be used to present data unethically? a. selectively presenting only part of a viz b. with an axis, showing the most recent time closest to the origin c. truncating or stretching the axes d. All the above
a
Which of the following causes the majority of computer security problems? a. Human errors b. Power outages c. Software errors d. Natural disasters
d
Which of the following control procedures is most likely to deter lapping? a. Continual update of the access control matrix b. Encryption c. Background check on employees d. Periodic rotation of duties
outliers
Identifying __________ is important because they can exert undue influence on the computation of many analytics—which may lead to erroneous interpretations of the data.
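A short sketch of flagging outliers and showing their influence on a mean. It uses the common 1.5 x IQR rule, which is one convention among several and is an assumption here, not a rule stated in these notes. The values are hypothetical.

```python
from statistics import mean, quantiles

# Hypothetical values with one extreme observation.
values = [48, 50, 51, 52, 49, 50, 47, 500]

# Quartiles and the interquartile range (IQR).
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in values if v < low or v > high]
without_outliers = [v for v in values if v not in outliers]

print("Outliers:", outliers)
print("Mean with outliers:", mean(values))
print("Mean without outliers:", round(mean(without_outliers), 2))
```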
b
In column 2, row 7, which of the following problems do you find? a. Data threshold violation b. Data entry error c. Violated attribute dependencies d. Dichotomous variable problem
a
In column 3, which of the following problems do you find? a. Data consistency error b. Data imputation error c. Data contradiction error d. Violated attribute dependencies
d
In column 5, which of the following problems do you find? a. Data pivoting error b. Violated attribute dependencies c. Data consistency error d. Cryptic values
c
In column 7, row 1, which of the following problems do you find? a. Data consistency error b. Data parsing error c. Data threshold violation d. Misfielded data value
d
Joleen queried the company database and returned 23 columns of information for her report. In examining the data, she noticed that one column only had values half of the time. Joleen decided to delete this column from her report. This is an example of which of the following? a. Data imputation b. Data entry error c. Data de-duplication d. Data filtering
Descriptive, Diagnostic, Predictive, and Prescriptive
There are four categories of data analytics:
dummy variable (dichotomous variable)
When a field contains only two different responses, typically 0 or 1, this field is called a
Misfielded data values
are data values that are correctly formatted but do not belong in the field. can be a problem with an entire field (i.e., the entire column) or with individual values (i.e., entries in a row)
True
A DFD consists of the following four basic elements: data sources and destinations, data flows, transformation processes, and data stores. Each is represented on a DFD by a different symbol. T or F
d
A company uses a boxplot in a visualization. What is likely the purpose of the visualization? a. comparison. b. correlation c. part to whole d. distribution
d
A company wants to determine how to decrease employee turnover. In order to do this, they test whether paying off an employee's student debt will cause fewer employees to leave. The analytic testing whether paying off an employee's student debt causes lower turnover is an example of which type of analytic? a. prescriptive b. predictive c. descriptive d. diagnostic
Timely
provided in time for decision makers to make decisions
Misappropriation of Assets
theft of company assets which can include physical assets (cash, inventory, etc.) and digital assets (intellectual property such as protected trade secrets, customer data)
pie charts
they are the most overused and misused Part-to-whole visualization type
White-collar criminals
those who commit fraud
Correlation
Another common visualization compares how two numeric variables fluctuate with each other. Ex: A scatterplot showing the relation between job performance and income. A heatmap that shows the relation between training and job performance. In this case, you might use a heatmap instead of a scatterplot if you think the relationship is not linear but rather depends on "levels" of training; for example, creating groups for every 10 hours of completed training.
a
At what point in the ETL process should data validation take place? a. All of these b. During data cleaning c. During data structuring d. During data standardization
A
Chibuzo creates a chart to show the percentage of activities in the accounting function that have been automated over time. She wants to stress the slow rate of change by the department to adopt automation. What is the purpose of Chibuzo's visualization and what type of chart would be best for this purpose? A.Trend evaluation, line chart B.Correlation, scatterplot C.Comparison, bar chart D.Comparison, line chart
lapping
Concealing the theft of cash by means of a series of delays in posting collections to accounts receivable. For example, a perpetrator steals customer A's accounts receivable payment. Funds received at a later date from customer B are used to pay off customer A's balance. Funds from customer C are used to pay off B's balance, and so forth.
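The mechanics of lapping can be traced with a toy ledger. This sketch assumes three hypothetical customers who each owe and pay the same amount; it shows that the shortfall never disappears but simply moves to the most recent payer.

```python
# Hypothetical payments received, in order; equal amounts assumed for simplicity.
payments = [("A", 1000), ("B", 1000), ("C", 1000)]

# Accounts receivable balances on the books before any payment is posted.
owed = {name: amount for name, amount in payments}

stolen = 0
cash_deposited = 0
uncredited = None  # customer who has paid but has not been credited yet

for name, amount in payments:
    if uncredited is None:
        stolen += amount            # the first receipt is taken by the perpetrator
    else:
        cash_deposited += amount
        owed[uncredited] -= amount  # later receipt is posted to the earlier customer
    uncredited = name

print("Balances still shown as owed:", owed)
print("Customer whose payment is uncredited:", uncredited)
print("Amount stolen:", stolen)
```

The balance still shown as owed always equals the amount stolen, which is why the scheme needs continual attention from the perpetrator and why rotating duties tends to expose it.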
Knowledgeable insiders
Former and current employees who are much more likely than non-employees to perpetrate frauds (and big ones) against companies. - Largely owing to their understanding of the company's systems and its weaknesses, which enables them to commit the fraud and cover their tracks
True
Fraud against companies may be committed by an employee or an external party
How data analytics can be used to prevent and detect fraud
Fraud detection is much more effective when data analytics software tools are used to examine an entire data population - Using data analytics software, every transaction or item in the data can be compared against selected criteria and any items identified as anomalies, unusual, or unexpected could be tagged for human examination - Data analytics don't directly detect fraud (Experienced humans are needed to examine and understand any suspicious activities identified and to determine if fraud is involved) - There are benefits as well as challenges when using data analytics to prevent and detect fraud
Financial Statement Fraud (Pressure)
Fraud type: fraudulent financial reporting. It is distinct from other types of fraud in that the individuals who commit the fraud are not the direct beneficiaries (the company is the direct beneficiary; the perpetrator is an indirect beneficiary). Pressures that lead to financial statement fraud: - deceive investors/creditors - increase a company's stock price - meet cash flow needs - hide company losses or other problems
as a square
How are data sources and destinations represented in a data flow diagram?
data should be split into a training dataset and a test dataset
How to Create and validate a model?
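A minimal sketch of the split, using made-up data and a hand-written least-squares line as the model. The 80/20 split, the noise level, and the helper names `fit_line` and `mean_abs_error` are assumptions for illustration. A test error far above the training error would suggest overfitting.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical data: y is roughly 2x + 1 plus a little noise.
data = [(x, 2 * x + 1 + random.uniform(-1, 1)) for x in range(20)]

# Shuffle, then hold out 20% of the rows as the test dataset.
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def mean_abs_error(points, slope, intercept):
    """Average absolute difference between actual and predicted y."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

# Create the model on the training dataset only.
slope, intercept = fit_line(train)

# Assess it on both datasets; the test error is the honest measure.
train_err = mean_abs_error(train, slope, intercept)
test_err = mean_abs_error(test, slope, intercept)
print("Training error:", round(train_err, 3))
print("Test error:", round(test_err, 3))
```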
Fraud Triangle (Conditions for Fraud)
Pressure, opportunity, and rationalization
System flowcharts
depict the relationships among system inputs, processes, and outputs of an AIS. They are pictorial representations of automated processes and files. A system flowchart begins by identifying the inputs to the system. Each input is followed by a process (the steps performed on the data). The process is followed by outputs (the resulting new information). (lots of process symbols)
DFD requirement
every process must have at least one data inflow and at least one data outflow
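This requirement can be checked mechanically. The sketch below assumes a hypothetical DFD written as (source, destination) pairs and a hypothetical set of process names; a process with an outflow but no inflow is the "miracle" case.

```python
# Hypothetical data flows as (source, destination) pairs.
flows = [
    ("Customer", "P1"),
    ("P1", "P2"),
    ("P2", "Orders store"),
    ("P3", "Manager"),  # P3 sends data but never receives any
]
processes = {"P1", "P2", "P3"}

has_inflow = {dst for _, dst in flows}
has_outflow = {src for src, _ in flows}

# Processes that break the requirement.
no_inflow = sorted(p for p in processes if p not in has_inflow)
no_outflow = sorted(p for p in processes if p not in has_outflow)

print("Processes with no inflow (miracles):", no_inflow)
print("Processes with no outflow:", no_outflow)
```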
Documentation
explains how a system works, including the who, what, when, where, why, and how of data entry, data processing, data storage, information output, and system controls
Natural and political disasters
fires, floods, earthquakes, hurricanes, tornadoes, blizzards, wars, and attacks by terrorists—can destroy an information system and cause many companies to fail. For example: Terrorist attacks on the World Trade Center in New York City and on the Federal Building in Oklahoma City destroyed or disrupted all the systems in those buildings. A flood in Chicago destroyed or damaged 400 data processing centers.
Document Flowcharts
identify: - all departments using the system - all documents - all processes performed on the documents. Each department gets its own column
Data Fraud
illegally using, copying, browsing, searching, or harming company data
Program flowchart
illustrates the sequence of logical operations performed in a computer program
Program flowcharts
illustrate the sequence of logical operations performed by a computer in executing a program. They also follow an input-process-output pattern. Each rectangle (automated process) on a system flowchart requires a program flowchart
Confirmatory data analysis
tests a hypothesis and provides statistical measures of the likelihood that the evidence (data) refutes or supports a hypothesis.
Bar chart
the most common type of visualization used in making comparisons is a ________
Opportunity
the opening or gateway that allows an individual to -commit the fraud, -conceal the fraud, and -convert the proceeds
Convert (opportunity)
unless the target of the theft is cash, then the stolen goods must be converted to cash or some form that is beneficial to the perpetrator - Checks can be converted through alterations, forged endorsements, check washing, etc. - Non-cash assets can be sold (online auctions are a favorite form) or returned to the company for cash
Prescriptive
A corporate accountant designs a cook scheduling system based on past data for meal preparation. The new system should assure that there are always enough cooks scheduled for peak demand at the restaurant.
comparison, correlation, distribution, trend evaluation, and part-to-whole.
The five main purposes for visualization are:
True
The key to being successful is the development of initial predictive models and then applying appropriate learning algorithms so those models continue to improve their recommendations over time.
balance
The level 0 diagram must "_________" with the Context diagram. This means they should both have the same external entities with the same flows to and from those entities
Heat map
The second most frequent correlation visualization is a _________. It allows a representation of correlation between a numeric and a non-numeric field if the non-numeric field can be ordered in a meaningful way.
emphasis
in design is assuring the most important message is easily identifiable.
highlighting (emphasis)
includes using colors, contrasts, call-outs, labeling, fonts, arrows, and any other technique that brings attention to an item.
Orientation
information should be presented and able to be read in a horizontal fashion
failing to consider the variation
inherent in a model (refers to the spread of the data about a prediction)
Data deception
is "a graphical depiction of information, designed with or without an intent to deceive, that may create a belief about the message and/or its components, which varies from the actual message."
training dataset
is used to create the model for future prediction
intentional acts
sabotage, misrepresentation, false use, or unauthorized disclosure of data, misappropriation of assets, financial statement fraud, corruption, computer fraud. Example: A hacker stole 1.5 million credit and debit card numbers from Global Payments, resulting in an $84 million loss and a 90% drop in profits in the quarter following disclosure.
Document flowchart
shows the flow of documents and data between departments or units, useful in evaluating internal controls
Output Fraud
stealing, copying, or misusing computer printouts or displayed information
Treemaps
use nested rectangles to show the amount that each group or category contributes.
Prescriptive analytics
use techniques such as artificial intelligence, machine learning, and other statistics to generate predictions
Trend Evaluation
visualizations show changes over an ordered variable, most often a measurement of time. ex: A line chart showing how compensation has changed each year for each employee. An area chart showing how total compensation for each year differs by department.
Distribution
visualizations show the spread of numeric data values. Ex: A histogram showing salary by bins of $10,000. A boxplot showing the distribution of employee performance ratings.
Part-to-whole
visualizations show which items make up the parts of a total. Ex: A pie chart showing the percentage of total employee pay for each department. A treemap showing how different academic degrees make up the total amount of pay.
Bullet graph
A variation of the bar chart that is also useful for comparison is the
Commit (opportunity)
- Lack of internal controls - Failure to enforce controls (the most prevalent reason) - Excessive trust in key employees - Incompetent supervisory personnel - Inattention to details - Inadequate staff Management may allow fraud by: Not getting involved in the design or enforcement of internal controls, inattention or carelessness, overriding controls, using their power to compel subordinates to carry out the fraud
b
A construction company classifies their projects into one of seven different types. To keep track of project classification, the clerk enters a number from 1 to 7 in the ProjectType field. The values 1 to 7 are best described as ________. a. Imputed data values b. Cryptic data values c. Misfielded data values d. Duplicate data values
Data Structuring, Data Standardization, Data Cleaning, Data Validation
A four-step process for transforming data that will maintain or improve data quality:
alternative hypothesis
A proposed explanation worded in the form of an inequality, meaning that one of the two concepts, ideas, or groups will be greater or less than the other concept, idea, or group.
Predictive
A tax accountant prepares analyses that show what will happen to the customers of his client if the country adopts a new tax law.
Visual Inspection
A terrorist group launches a computer virus aimed at corrupting all transaction data for a corporation by randomly changing the currency of transactions. An internal auditor scans the company's database to see if the transactions appear to be in multiple different currencies. What technique?
Descriptive
A virus pandemic causes governments to shut down all restaurants. A national restaurant chain creates an analysis to see how long their cash reserves can continue to pay employees before the company runs out of money. Data Analytics Category?
Conceal (opportunity)
Concealing the fraud often takes more time and effort and leaves more evidence than the actual theft or misrepresentation and may include: - Charge a stolen asset to an expense account or to an account receivable that is about to be written off - Create a ghost employee who receives an extra paycheck - Lapping (A/R) or kiting (banks)
check kiting
Creating cash using the lag between the time a check is deposited and the time it clears the bank. Suppose an account is opened in banks A, B, and C. The perpetrator "creates" cash by depositing a $1,000 check from bank B in bank C and withdrawing the funds. If it takes two days for the check to clear bank B, he has created $1,000 for two days. After two days, the perpetrator deposits a $1,000 check from bank A in bank B to cover the created $1,000 for two more days. At the appropriate time, $1,000 is deposited from bank C in bank A. The scheme continues—writing checks and making deposits as needed to keep the checks from bouncing—until the person is caught or he deposits money to cover the created and stolen cash.
Context diagram
Highest-level DFD; provides a summary-level view of a system. Depicts a data processing system and the external entities that are sources of input and destinations of output. The process symbol is numbered with a "0". Data stores are ignored.
Data presentation
Researchers have identified several benefits of visualizing data relative to reading, including: 1. Visualized data is processed faster than written or tabular information. 2. Visualizations are easier to use. Users need less guidance to find information with visualized data. 3. Visualization supports the dominant learning style of the population because most learners are visual learners.
Comparison of 3 people groups
They found significant differences between violent and white-collar criminals and few differences between white-collar criminals and the general public
How accountants use documentation
They have to read documentation to understand how a system works; they have to evaluate the strengths and weaknesses of an entity's internal controls; and they may prepare documentation to demonstrate how a proposed system would work or to demonstrate their understanding of a system of internal controls.
Document, System, Program
Three types of flowcharts
True
To avoid data deception, consider the following principles: Show representations of numbers proportional to the reported number (starting the y-axis at zero helps ensure this). In visualizations designed to depict trends, show time progressing from left to right on the x-axis. Present complete data given the context.
d (A document flowchart traces the life of a document from its cradle to its grave as it works its way through the areas of responsibility within an organization.)
Which of the following flowcharts illustrates the flow of data among areas of responsibility in an organization? a. Program flowchart b. Computer configuration chart c. System flowchart d. Document flowchart
b
Which of the following is NOT a good reason to visualize data? a. Visualizations help the majority of people to learn better. b. Building visualizations does not take as much time as writing a report. c. Users can find information more quickly with visualized data. d. Visualized data is processed faster than written information.
c
Which of the following is NOT an example of computer fraud? a. Unauthorized modification of a software program b. Obtaining information illegally using a computer c. Failure to perform preventive maintenance on a computer d. Theft of money by altering computer records
d
Which of the following is NOT one of the responsibilities of auditors in detecting fraud according to SAS No. 99? a. Evaluating the results of their audit tests. b. Incorporating a technology focus c. Discussing the risks of material fraudulent misstatements. d. Catching the perpetrators in the act of committing the fraud.
a
Which of the following is a fraud in which later payments on account are used to pay off earlier payments that were stolen? a. Lapping b. Kiting c. Salami technique d. Ponzi scheme
c
Which of the following is a technique to simplify data presentations? a. highlighting b. weighting c. distance d. ordering
c (DFDs show data movement, but not necessarily the timing of the movement.)
Which of the following statements is FALSE? a. Flowcharts make use of many symbols. b. A document flowchart emphasizes the flow of documents or records containing data. c. DFDs help convey the timing of events. d. Both a and b are false.
d
Which of the following statements is FALSE? a. A flowchart is an analytical technique used to describe some aspect of an information system in a clear, concise, and logical manner. b. Flowcharts use a standard set of symbols to describe pictorially the flow of documents and data through a system c. Flowcharts are easy to prepare and revise when the designer utilizes a flowcharting software package d. A system flowchart is a narrative representation of an information system
c
Which of the following statements is FALSE? a. There is little difference between computer fraud perpetrators and other types of white-collar criminals. b. Some computer fraud perpetrators do not view themselves as criminals. c. The psychological profiles of white-collar criminals are significantly different from those of the general public. d. The psychological profiles of white-collar criminals differ from those of violent criminals.
D
________ often make use of exploratory data analytic techniques, while _______ make use of machine learning techniques. a. Diagnostic analytics, prescriptive analytics b. Descriptive analytics, prescriptive analytics c. Diagnostic analytics, predictive analytics d. Descriptive analytics, predictive analytics
black hole
a flow coming in but no flows coming out (we didn't do anything with data we received)
System flowchart
depicts the data processing cycle for a process; describes the relationship between inputs, processing, and outputs
SAS-94
requires that auditors understand the automated and manual procedures an entity uses. This understanding can be gleaned through documenting the internal control system, a process that effectively exposes the strengths and weaknesses of the system.
Audit a sample
A computer engineer performs a complicated merge of data from five different accounting systems of company subsidiaries. To check her work, she randomly selects 50 transactions from each system to validate to make sure the merge worked correctly. What Technique?
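A rough sketch of the sampling step in this scenario, assuming each source system's transactions are available as a list. The system names, transaction IDs, and the `audit_sample` helper are hypothetical; only the sample size of 50 per system comes from the card.

```python
import random


def audit_sample(records, sample_size=50, seed=None):
    """Return a simple random sample of records for manual review."""
    rng = random.Random(seed)
    # Never ask for more records than exist.
    k = min(sample_size, len(records))
    return rng.sample(records, k)


# Hypothetical data: five source systems with 1,000 transaction IDs each.
systems = {
    f"system_{i}": [f"S{i}-T{n}" for n in range(1000)] for i in range(1, 6)
}

# Draw 50 transactions per system; each would then be traced into the
# merged dataset and compared field by field.
samples = {name: audit_sample(txns, 50, seed=42) for name, txns in systems.items()}
```

Fixing the seed makes the sample reproducible, which helps if the selection itself has to be documented.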
d
Adi queries the company database to return all values from the field "FullAddress." Adi reviews the information and finds that half of the time the values store the city and country values before the street address and the other half of the time the street address is listed before the city and country. What type of error did Adi find in the database? a. Cryptic data value error b. Data formatting error c. Misfielded data value error d. Data consistency error
c
Among the following statements, which is likely to be detected using visual inspection? The setting: a company extracts data from one system, transforms the data into a new format, and then loads it into a new system. The visual inspection validation tests are performed on a portion of the data in the new setting. a. There are minor data imputation errors for some missing values. b. Not all data was extracted from the original system. c. Data from two fields was not concatenated into one field during the transformation process. d. Not all data was loaded into the new system.
d
Analyn spent the entire day entering information about suppliers into the company database. She did not make a single spelling mistake in any of the entries. However, at the end of the day, Analyn notices that she entered the state into the country field for all of the data. The mistaken data values in the country field are best described as which of the following? a. Cryptic data values b. Data formatting errors c. Data contradiction errors d. Misfielded data values
c
As part of the data standardization process, often items contained in different fields for the same record need to be combined into a single field. This process is called: a. Aggregating data b. Data aggregation c. Data concatenation d. Data parsing
Accurate
Example of violation of attribute: A sale occurred on December 27 but is recorded as occurring in the following year on January 4.
Complete
Example of violation of attribute: An annual evaluation of vendor performance only contains 7 months of data.
Timely
Example of violation of attribute: Customer purchasing metrics are 2 years old.
Valid
Example of violation of attribute: There are only 7 unique job positions at a company, but 9 different positions are attributed to employees; 2 of them are not valid.
visual inspection, basic statistical tests, auditing a sample of data, and advanced testing techniques.
The techniques used to validate data can be thought of as a continuum from simple to complex. These techniques include
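As an illustration of the second technique on the continuum, here is a minimal sketch of basic statistical tests (min, max, mean, sum). The credit-limit values are hypothetical, loosely modeled on the credit-limit question elsewhere in this set, and `basic_stats` is an illustrative helper name.

```python
from statistics import mean


def basic_stats(values):
    """Summarize a numeric field with the basic tests used for validation."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sum": sum(values),
    }


# Hypothetical credit limits; the last value is suspiciously large.
credit_limits = [5_000, 7_500, 10_000, 1_000_000_000]
stats = basic_stats(credit_limits)

# A max far above the other values is a cue to investigate further.
print(stats)
```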
d
Column 4, row 12, is most likely an example of which of the following? a. Data imputation b. Cryptic data value c. Violated attribute dependencies d. Listing the date in serial date format
b
Julie knows that her report printout should have only two columns of information from the database and that each column should have a dummy variable in it. The report she receives from IT has a single column and some examples of the values in the column are "10", "01", "11", "00." Julie surmises that what likely is the problem? a. The fields in the database have misfielded data values. b. The IT department improperly concatenated the data. c. The database has become corrupted and has nonsensical values in all fields. d. The IT department did not properly parse the data.
Audit a sample
One of the best techniques for assuring data quality
a
Santiago reviewed a recent extract of data about customer credit limits. He noticed that one company had a credit limit of $1,000,000,000 USD, whereas the next highest credit limit was $10,000 USD. What might have Santiago discovered? a. Data threshold violation b. Violated attribute dependency c. Data contradiction error d. None of these
a
Suzette sends Jimmy a flat file with a list of all sales transactions the company made during the last year. Each line contains all the information about a single sale. Jimmy prepares a report that shows three different views of the data (1) the total sales for each quarter, (2) the total sales by customer, and (3) the total sales for the entire year. To make this report, Jimmy had to do which of the following to the data Suzette sent? a. Aggregate data b. Join Data c. Pivot Data d. All of these
a
When data is aggregated, some of the detailed information is lost. Which of the following is needed if you want to show both the aggregated and disaggregated data together? a. Joining the aggregated data with the disaggregated data b. Pivoting the aggregated data c. Cleaning the aggregated data d. None of these will restore the lost information.
d
When data is joined together it is called _________, when it is split apart it is called ________. a. Data parsing, data concatenation b. data formatting, data standardization c. data standardization, data formatting d. data concatenation, data parsing
d
Which of the following could be used to catch the problems listed in the figure? a. Visual inspection b. Basic statistical tests c. Auditing a sample d. All of the above
d
Which of the following reasons describes why transforming data is necessary? a. Data aggregated at different levels needs to be joined b. Data within a field has various formats c. Multiple data values are contained in the same field and need to be separated d. All of the above
d
Which of the following techniques is most likely to discover a very large data threshold violation in a dataset containing 10 billion transactions? a. Audit a sample b. Visual Inspection c. Advanced testing techniques d. Basic statistical tests
a
Which of the following techniques is most likely to discover an error where a data analyst did not correctly parse data from one field into two fields? a. Visual Inspection b. Basic statistical tests c. Advanced testing techniques d. All of these
Data contradiction error
a data file contains information about a manufacturing plant in two different records; however, the physical address of the manufacturing plant is different in each record even though each record is meant to reference the same physical location.
Data entry errors
are all types of errors that come from inputting data incorrectly. These errors often occur in human data entry, such as misspelling words, transposing digits in numeric strings, and failing to enter data. They can also be introduced by the computer system.
Data threshold violations
are data errors that occur when a data value falls outside an allowable level.
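A minimal sketch of a threshold check, assuming the allowable range is known in advance. The limits and values below are hypothetical, and `find_threshold_violations` is an illustrative name.

```python
def find_threshold_violations(values, lower, upper):
    """Return (index, value) pairs that fall outside [lower, upper]."""
    return [(i, v) for i, v in enumerate(values) if not (lower <= v <= upper)]


# Hypothetical rule: credit limits must be between $0 and $1,000,000.
credit_limits = [5_000, 7_500, -200, 10_000, 1_000_000_000]
violations = find_threshold_violations(credit_limits, 0, 1_000_000)
print(violations)
```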
Cryptic data values
are data items that have no apparent meaning without understanding the underlying coding scheme.
Visual Inspection
examining data using human vision to see if there are problems.
Data contradiction errors
exist when the same entity is described in two conflicting ways. They need to be investigated and resolved appropriately.
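One way to surface contradictions, sketched under the assumption that each record carries an identifier for the entity it describes. The `plant_id` and `address` fields, the sample records, and `find_contradictions` are all hypothetical.

```python
from collections import defaultdict


def find_contradictions(records, key_field, value_field):
    """Map each entity to its values when it has more than one distinct value."""
    seen = defaultdict(set)
    for record in records:
        seen[record[key_field]].add(record[value_field])
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


# Hypothetical plant records: P1 appears twice with different addresses.
plants = [
    {"plant_id": "P1", "address": "100 Mill Rd"},
    {"plant_id": "P1", "address": "100 Mill Road, Suite B"},
    {"plant_id": "P2", "address": "7 Dock St"},
]
conflicts = find_contradictions(plants, "plant_id", "address")
print(conflicts)
```

The check only flags the conflict; deciding which address is correct still takes investigation.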
violated attribute dependency
if a record accurately indicates a person lives in Nauvoo, Illinois but mistakenly lists the zip code as 26354, there is a _____________________ because the zip code for Nauvoo, Illinois is 62354.
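The Nauvoo example can be sketched as a lookup check, assuming a trusted reference table of city and zip code pairs exists. The `EXPECTED_ZIP` table and field names are hypothetical; a real check would use a full postal reference dataset.

```python
# Hypothetical reference table: (city, state) -> expected zip code.
EXPECTED_ZIP = {("Nauvoo", "Illinois"): "62354"}


def find_dependency_violations(records):
    """Return records whose zip code disagrees with the reference table."""
    violations = []
    for record in records:
        expected = EXPECTED_ZIP.get((record["city"], record["state"]))
        # Records with no reference entry cannot be judged, so they are skipped.
        if expected is not None and record["zip"] != expected:
            violations.append(record)
    return violations


records = [
    {"city": "Nauvoo", "state": "Illinois", "zip": "26354"},  # transposed digits
    {"city": "Nauvoo", "state": "Illinois", "zip": "62354"},  # correct
]
bad = find_dependency_violations(records)
print(bad)
```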
Data parsing
involves separating data from a single field into multiple fields. It is often an iterative process that relies heavily on pattern recognition.
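A small sketch of pattern-based parsing, assuming the combined field looks like "City, State 12345". The pattern and the `parse_location` helper are hypothetical; real data usually needs several rounds of pattern refinement.

```python
import re

# Assumed layout: "<city>, <state> <5-digit zip>".
LOCATION_PATTERN = re.compile(r"^(?P<city>[^,]+),\s*(?P<state>.+?)\s+(?P<zip>\d{5})$")


def parse_location(value):
    """Split one combined field into city, state, and zip; None if no match."""
    match = LOCATION_PATTERN.match(value.strip())
    return match.groupdict() if match else None


parsed = parse_location("Nauvoo, Illinois 62354")
unparsed = parse_location("62354 Nauvoo Illinois")  # different layout, no match
print(parsed, unparsed)
```

Values that return `None` are the ones to inspect when deciding how to extend the pattern on the next pass.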
Outlier
is a data point, or a few data points, that lie an abnormal distance from other values in the data.
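The card does not say how to measure "abnormal distance." One common convention, used here purely as an assumption, is to flag values more than 1.5 times the interquartile range beyond the quartiles. The data values are hypothetical.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Flag values beyond k * IQR from the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one value far from the rest.
data = [40, 42, 45, 47, 50, 52, 55, 300]
outliers = find_outliers(data)
print(outliers)
```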
Exploratory data analysis
is an approach that explores data without testing formal models or hypotheses
Data concatenation
is combining data from two or more fields into a single field (is often used to create a unique identifier for a row)
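A minimal sketch of concatenating fields into a row identifier. The field names, separator, and `make_row_id` helper are hypothetical choices for illustration.

```python
def make_row_id(record, fields, sep="-"):
    """Join the chosen fields into one identifier string."""
    return sep.join(str(record[field]) for field in fields)


# Hypothetical sales row: no single field is unique, but the combination is.
row = {"store": "S01", "register": 3, "receipt": 10457}
row_id = make_row_id(row, ["store", "register", "receipt"])
print(row_id)
```

Using a separator keeps values such as ("1", "23") and ("12", "3") from collapsing into the same identifier.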
Dirty data
is data that is inconsistent, inaccurate or incomplete. To be useful, it must be cleaned
Data standardization
is particularly important when merging data from several sources
Data Pivoting
is rotating data from rows to columns
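A sketch of pivoting in plain Python, assuming the input is in "long" form with one row per department and year. The data and the `pivot` helper are hypothetical.

```python
def pivot(rows, index, column, value):
    """Rotate long rows into {index_value: {column_value: value}}."""
    table = {}
    for row in rows:
        table.setdefault(row[index], {})[row[column]] = row[value]
    return table


# Hypothetical long-form data: one row per (department, year).
rows = [
    {"dept": "Audit", "year": 2022, "pay": 500},
    {"dept": "Audit", "year": 2023, "pay": 550},
    {"dept": "Tax", "year": 2022, "pay": 400},
    {"dept": "Tax", "year": 2023, "pay": 420},
]
wide = pivot(rows, index="dept", column="year", value="pay")
print(wide)  # years are now columns under each department
```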
type II error
is the failure to reject a false null hypothesis.
type I error
is the incorrect rejection of a true null hypothesis.
Aggregate Data
is the presentation of data in a summarized form.
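A sketch of aggregation in the spirit of the sales-report question in this set: totals by quarter, by customer, and overall. The transactions, field names, and helper functions are hypothetical.

```python
from collections import defaultdict
from datetime import date


def quarter(d):
    """Return the calendar quarter (1 to 4) for a date."""
    return (d.month - 1) // 3 + 1


def total_by(sales, key_func):
    """Sum the amount field for each group produced by key_func."""
    totals = defaultdict(int)
    for sale in sales:
        totals[key_func(sale)] += sale["amount"]
    return dict(totals)


# Hypothetical flat file: one record per sale.
sales = [
    {"date": date(2023, 1, 15), "customer": "Acme", "amount": 100},
    {"date": date(2023, 2, 3), "customer": "Bolt", "amount": 250},
    {"date": date(2023, 7, 9), "customer": "Acme", "amount": 300},
]
by_quarter = total_by(sales, lambda s: quarter(s["date"]))
by_customer = total_by(sales, lambda s: s["customer"])
year_total = sum(s["amount"] for s in sales)
```

Each summary drops the transaction-level detail, which is why the original records must be kept if detail is needed later.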
Data de-duplication
is the process of analyzing the data and removing two or more records that contain identical information
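A minimal sketch of de-duplication, assuming "identical" means every field matches exactly and that the first occurrence should be kept. Both choices are assumptions; the vendor records are hypothetical.

```python
def deduplicate(records):
    """Keep the first copy of each record; drop later exact duplicates."""
    seen = set()
    unique = []
    for record in records:
        # A sorted tuple of items is hashable and ignores field order.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


vendors = [
    {"id": "V1", "name": "Acme"},
    {"name": "Acme", "id": "V1"},  # same content, different field order
    {"id": "V2", "name": "Bolt"},
]
clean = deduplicate(vendors)
print(clean)
```

Near-duplicates, such as "Acme" versus "ACME Inc.", would not be caught by an exact match like this.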
Data Cleaning
is the process of updating data to be consistent, accurate, and complete.
Data consistency
the principle that every value in a field should be stored the same way.
Data Validation
the process of analyzing data to make certain the data has the properties of high-quality data. It is both a formal and informal process.
Data filtering
the process of removing records or fields of information from a data source.
Data imputation
the process of replacing a null or missing value with a substituted value. Only works with numeric data. It is critical to document in the analysis log when values have been ______
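A sketch of one simple approach, assuming missing values are stored as `None` and that the mean of the known values is an acceptable substitute. Mean substitution is only one of several possible methods, and `impute_with_mean` is an illustrative name. The returned log reflects the card's point about documenting what was changed.

```python
from statistics import mean


def impute_with_mean(values):
    """Replace None with the mean of known values; also return a change log."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    imputed, log = [], []
    for i, v in enumerate(values):
        if v is None:
            imputed.append(fill)
            log.append({"index": i, "substituted": fill})  # analysis log entry
        else:
            imputed.append(v)
    return imputed, log


# Hypothetical numeric field with two missing entries.
filled, analysis_log = impute_with_mean([10, None, 20, None, 30])
print(filled, analysis_log)
```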
Select the target outcome. Find and prepare the appropriate data. Create and validate a model.
there are three steps to creating a predictive analytic model:
Descriptive Analytics
use exploratory data analysis techniques
Predictive analytics
use historical data to find patterns likely to manifest themselves in the future—the more data, the better chance of finding patterns.