Fraud Chapter 9 - Transforming Data Into Evidence (Part 2)

Daubert factors

(1) Can the method be tested using the scientific method? (2) Has it been subject to peer review and publication? (3) Is there a known or potential rate of error? (4) Is it generally accepted?

Two common Excel add-ins are

Analysis ToolPak and ActiveData for Excel

Benford's Law used to identify patterns in Income Tax

Christian and Gupta 1993 identified a tendency among individuals to claim additional deductions for the purpose of decreasing taxable income to fall within the next-lowest tax bracket. Nigrini 1996 examined interest income and interest expense reported on tax returns and found evidence of understatement in the former and overstatement in the latter.

Two basic charts that can be used for a variety of purposes are

pie charts and bar graphs

Data sets for financial transactions such as accounts payable, accounts receivable, and sales are

positively skewed and not normally distributed

According to Benford's law the distribution of first digits is

positively skewed, or more heavily weighted toward smaller numbers; this means the first (left-most) digit is more often low than high

The ultimate goal of comparative analysis of data is

prediction of the likelihood that a deviant observation is the result of some external influence such as error or fraud and not attributable to mere chance

data mining can serve as a

preventative or detective role in fraud investigations

Skewness

a measure of the degree of asymmetry of a data distribution around its Mean

Negative relationship data

data moving in opposite directions

The value of RSF is that it provides a

numeric measure that can be compared to a benchmark or tracked over time

Discrete distribution

observations are countable, and there is a discrete "jump" between successive values

Every observation should be included in only _____ interval

one

Applications of Benford's law

-a number series must approximately follow a geometric sequence in which each successive number is calculated as a fixed percentage increase over the previous number

Examples of credit card transaction frauds/patterns

-abrupt shifts in the curves or changes in a spending slope indicate fraud -customers use specific cards for specific types of purchases -a fraudster usually spends as much as possible on the card in a short amount of time before the theft is discovered -transactions of first-time users are usually less frequent than those of long-time users -certain transactional patterns are red flags: frequent purchases of small electronics or jewelry, which can be resold on the black market, and usage across a wide geographic area

Data mining methods to detect potentially fraudulent transactions are based on

-customer usage patterns -expected usage patterns -patterns that are known to be associated with fraud

Trade-off between Type I and Type II errors

-decreasing the occurrence of one increases the occurrence of the other

Advantages of basic data analysis programs

-ease of use -flexibility of application -various functions are available

Useful tasks from sorting

-identifying duplicate entries -identifying transactions with round numbers -identify gaps in the data sequence (such as dates, check numbers, or invoice numbers) -identify matches in data fields (such as employees and vendors with the same name or contact information) -Compute category totals (such as total payments made to a specific vendor or employee or total payments for specific expense category) -Highlight blanks (or lack of data) in a particular data field (such as employees without a social security number or vendors without an address) -identify inconsistencies among data fields (such as incompatible telephone numbers and addresses or back-dated checks)
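
A few of these sorting tasks can be sketched in plain Python. This is only an illustration: the payment records, the field names, the "same vendor and amount" definition of a duplicate, and the multiple-of-1,000 definition of a round number are all assumptions, not part of the chapter.

```python
from collections import Counter

# Hypothetical payment records; field names are illustrative only.
payments = [
    {"check_no": 1005, "vendor": "Cole Freight", "amount": 318.42},
    {"check_no": 1001, "vendor": "Acme Supply", "amount": 1250.00},
    {"check_no": 1003, "vendor": "Acme Supply", "amount": 1250.00},
    {"check_no": 1002, "vendor": "Baker LLC", "amount": 5000.00},
    {"check_no": 1006, "vendor": "Baker LLC", "amount": 742.10},
]

# Sort by check number so sequence gaps become visible.
ordered = sorted(payments, key=lambda p: p["check_no"])

# Duplicate entries: here, the same vendor paid the same amount more than once.
pair_counts = Counter((p["vendor"], p["amount"]) for p in ordered)
duplicates = [pair for pair, count in pair_counts.items() if count > 1]

# Gaps in the check-number sequence.
seen = {p["check_no"] for p in ordered}
gaps = [n for n in range(min(seen), max(seen) + 1) if n not in seen]

# Round numbers: here, whole multiples of 1,000.
round_amounts = [p["check_no"] for p in ordered if p["amount"] % 1000 == 0]

print("duplicates:", duplicates)
print("missing check numbers:", gaps)
print("checks with round amounts:", round_amounts)
```

A flagged item is only a candidate for closer review, not proof of fraud.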

Advantages of Visual exhibits and methods of displaying data

-images are more effective than words in conveying ideas, especially complex ideas -information can be communicated more efficiently in visual form, in less time and with more precision

Important features of the normal distribution

-it is symmetric around its Mean -it has zero skewness -because it is symmetric, its mean, median, and mode are all equal -it is completely described by its mean and standard deviation; graphing the distribution does not require knowledge of the individual data points, just the mean and standard deviation -the curve is bell-shaped; the normal distribution is often called a "bell curve"

Examples of data mining applications

-marketing research: predicting customer demand and sales -drug research: predicting the effectiveness of drugs and the likelihood of side effects -credit scoring: predicting the likelihood of default or bankruptcy -operations management: predicting input usage and productive efficiency -actuarial analysis: predicting life expectancies and probabilities of other insurable events -fraud detection: predicting the likelihood that irregular transactions reflect unlawful practices

Commonly employed ratios

-ratio of the largest value to the smallest value: a larger ratio indicates greater variation in the data set -ratio of the largest value to the second-largest value: known as the Relative Size Factor (RSF); a large RSF indicates an outlier in the data set -ratio of the smallest value to the second-smallest value: identifies outliers on the opposite side of the distribution -ratio of the largest or smallest value to the mean: a means of identifying outliers using a different reference point
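
The ratios above can be computed directly from a sorted list. The sketch below is a minimal illustration; the function name `outlier_ratios`, the dictionary keys, and the payment amounts are hypothetical, and it assumes all values are positive.

```python
def outlier_ratios(values):
    """Compute the ratios listed above for a list of positive numbers."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    smallest, second_smallest = ordered[0], ordered[1]
    largest, second_largest = ordered[-1], ordered[-2]
    mean = sum(ordered) / len(ordered)
    return {
        "largest_to_smallest": largest / smallest,
        "rsf": largest / second_largest,  # Relative Size Factor
        "smallest_to_second_smallest": smallest / second_smallest,
        "largest_to_mean": largest / mean,
        "smallest_to_mean": smallest / mean,
    }

# Hypothetical payments to one vendor; the last one stands out.
payments = [120.0, 135.0, 150.0, 160.0, 1600.0]
ratios = outlier_ratios(payments)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

In this made-up data the RSF is 10, because the largest payment is ten times the second-largest.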

Ways to manipulate graphs

-scale (should always be included) -inclusion or omission of labels, including the title of the graph, labels of the horizontal and vertical axes, and labels of individual data points

Common types of credit card fraud

-stolen card: unauthorized usage of a stolen card -counterfeit card: duplicating credit cards for the purpose of fraudulent transactions -cardholder-not-present fraud: unauthorized usage of credit card information for transactions via phone, Internet, or mail -Application fraud: opening a credit account using another person's personal information

Data profiles may reflect

-the past behavior of the system being studied -may be extrapolated from other similar systems -may be the product of complex models that consider multiple factors

Disadvantages of Microsoft Excel and Access

-they allow data to be altered intentionally or unintentionally without record -errors can be easily introduced, most common via formulas, copy paste, incorrect cell references, improperly defined cell ranges -sometimes cannot accommodate data in certain formats; data must be converted where data could be further compromised

Data profiles are often defined in terms of

-trends over time (time series models, data distributions) -changes in expected trends or observations that fall outside the expected distributions suggest the need for closer investigation

USDA high-tech strategies against fraud

-working with social media firms and using mining techniques -data is collected from LINK terminals and reviewed for suspicious transactions -uses Anti-Fraud locator using EBT Retailer Transactions Alert System

Two common examples of GAS

1. Audit Command Language (ACL) 2. Interactive Data Extraction and Analysis (IDEA)

Categories for Descriptive measures

1. Measures of central tendency (where observations are concentrated) 2. Measures of variability (how the observations are dispersed)

Three Common Measures of Variability

1. Range 2. Variance 3. Standard deviation

Major disadvantages of GAS programs

1. higher cost compared to basic programs 2. extensive training required to use them effectively

Total area under the curve is

1.00 or 100%

Close conformity to Benford's Law requires a large data set, often defined as at least

1000 observations with numbers having at least four digits

Benford's Law used to identify patterns in reported net income

Carslaw 1988 found evidence that companies with net income below a certain threshold have a tendency to round the income number up

Data Analysis tests used in Benford's Law to identify irregularities in data sets

First-Digit test, First-Two-Digits test, Last-Two-Digits test

GAS

Generalized Audit Software developed for use in auditing and fraud investigation engagements

data mining

The goal is to find individual items of value; the accountant uses it to reduce a large number of observations to a smaller number that can be examined more closely; it allows the analyst to screen all the observations in a data set instead of relying on a sample

For a symmetric distribution (with no skewness), the

Mean, Median, and Mode are all equal

Two basic data analysis programs

Microsoft Excel and Microsoft Access

Benford's Law used to identify patterns in Fraud detection

Nigrini 1994 was the first to use digital analysis for fraud detection

Benford's Law formula

P(d) = log10(1 + 1/d), where P represents probability or frequency and d is an integer from 1 to 9
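
As a quick check of the formula, the sketch below tabulates the expected first-digit frequencies. The names `benford_first_digit` and `benford_profile` are illustrative only.

```python
import math

def benford_first_digit(d: int) -> float:
    """Expected frequency of leading digit d (1-9) under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)

# Expected first-digit profile: digit -> probability.
benford_profile = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in benford_profile.items():
    print(f"{d}: {p:.1%}")
```

The nine probabilities sum to 1, and they fall steadily from about 30.1% for a leading 1 to about 4.6% for a leading 9.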

Benford's Law

A pattern that describes the expected frequencies of digits in numbers: the probability that any given digit in a number will take a certain value. A mathematical algorithm (or series of formulas) that accurately predicts that, for many data sets, the first digit of each number in a random sample will be 1 more often than 2, 2 more often than 3, 3 more often than 4, and so on. It predicts the percentage of the time each digit will appear in a sequence of numbers.

Descriptive Statistics

Purpose: to describe data using various numerical measures and graphical depictions -measures that describe samples of data -can be used in any engagement that involves analysis of a numerical data set; the larger the data set, the more valuable the summary measures

SNAP

Supplemental Nutrition Assistance Program -a federally funded benefit program that assists low-income individuals and families with purchasing eligible food items.

Variance

The average of the squared deviations of the observations from the mean

Mean

The average value, calculated by adding all the observations and dividing by the number of observations

Median

The center point of the data set, when the observations are ordered by magnitude. This can be a single observation or point between two observations

Range

The difference between the values of the largest and smallest observations

Standard deviation

The square root of the variance
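
The mean, median, range, variance, and standard deviation defined above can be computed with Python's standard `statistics` module. The observations are made-up values. `pvariance` and `pstdev` are used because they divide by the number of observations, matching the "average of the squared deviations" definition; `statistics.variance` would divide by n - 1 instead.

```python
import statistics

observations = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = statistics.mean(observations)
median = statistics.median(observations)            # point between 4 and 5
value_range = max(observations) - min(observations)
variance = statistics.pvariance(observations)       # average squared deviation
std_dev = statistics.pstdev(observations)           # square root of the variance

print(f"mean={mean}, median={median}, range={value_range}")
print(f"variance={variance}, standard deviation={std_dev}")
```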

Benford's Law used to identify patterns in Earnings per share

Thomas 1989 found that EPS numbers in the U.S. displayed unusually high frequencies of multiples of 5 and 10 cents; this provided additional evidence of rounding numbers up

In statistics, a false positive is called a

Type I error

A failure to identify a true signal is called a

Type II error

SNAP Fraud

USDA defines it as the exchange of SNAP benefits for cash, also known as trafficking or discounting; prohibited by federal law -most is committed by retailers exchanging benefits for cash or by people selling or trading their LINK cards in the open market, often through websites -smaller stores are usually more likely to participate in fraud

Histogram

a graph of the frequencies of grouped data; a bar chart in which each bar represents a single interval and its height represents the frequency of observations in that interval; an important data analysis tool because it illustrates the shape of the data distribution, which is a key determinant of the analytic methods that can be applied

Negatively Skewed

a distribution that extends farther to the left than to the right; the distribution is more heavily weighted toward larger numbers

A measure that describes a population is?

a parameter

To perform data comparisons you must determine

a) what the data set actually looks like b) what it should look like

When can data be sorted?

after the data has been compiled into a spreadsheet with various data fields

Specialized programs' ability to record the analytics

allows the forensic accountant to review the analysis that has already been completed, for the purpose of guiding future efforts and avoiding duplicated work; a complete record of the data analysis process provides essential context for the results of the analysis

Using a larger number of intervals provides

a more detailed picture of the data distribution, but may be misleading if the observations are heavily weighted in only a few intervals; between 7 and 12 intervals is sufficient

What is the most common way that graphs are biased

by manipulating the scale

Data mining cannot detect fraudulent transactions with

certainty; it is limited to identifying irregular transactions that have a higher likelihood of being fraudulent

Almost all natural numbers display geometric tendency including

city populations, sizes of geologic objects, and accounting numbers (stock prices, company revenues, trading volume)

For a fraud scheme to be eligible for data analysis, the data must be

collected, recorded, stored, and organized -bribery, kickbacks, and other forms of corruption do not create such data

First-Two-Digits Test

compares the first two digits of a data set with Benford's profile for the first two digits; slightly steep slope; this test offers more precision; 90 total combinations (10-99)
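
The expected profile for this test can be tabulated the same way as the first-digit profile. Note the assumption: the chapter states only the first-digit formula, and the sketch below uses the standard two-digit generalization, P(n) = log10(1 + 1/n) for n from 10 to 99. The function and variable names are illustrative.

```python
import math

def benford_first_two_digits(n: int) -> float:
    """Expected frequency of leading two-digit combination n (10-99)."""
    if not 10 <= n <= 99:
        raise ValueError("n must be an integer from 10 to 99")
    return math.log10(1 + 1 / n)

# 90 combinations, 10 through 99.
first_two_profile = {n: benford_first_two_digits(n) for n in range(10, 100)}

print(f"combinations: {len(first_two_profile)}")
print(f"P(10) = {first_two_profile[10]:.4f}")
print(f"P(99) = {first_two_profile[99]:.4f}")
```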

Last-Two-Digits Test

compares the last two digits of a data set with Benford's profile for the last two digits; 100 possible combinations (00-99); each combination has the same probability of occurrence (1%); this test is useful for identifying round or whole numbers, which are red flags for invented numbers
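
A minimal sketch of this test follows. It assumes amounts are stored as whole numbers of cents, so the "last two digits" are the cents; that choice, the function name, and the sample amounts are all assumptions for illustration. A real test would need far more observations than this toy list.

```python
from collections import Counter

def last_two_digits_profile(amounts_in_cents):
    """Observed share of each last-two-digits combination, '00' to '99'."""
    counts = Counter(abs(a) % 100 for a in amounts_in_cents)
    n = len(amounts_in_cents)
    return {f"{k:02d}": counts.get(k, 0) / n for k in range(100)}

# Hypothetical payment amounts in cents; half are whole-dollar amounts.
amounts_in_cents = [125037, 50000, 75000, 4219, 120000,
                    8865, 30000, 991, 45000, 6730]

profile = last_two_digits_profile(amounts_in_cents)
expected = 0.01  # each combination is expected 1% of the time

# List combinations that occur more often than expected.
for combo, share in profile.items():
    if share > expected:
        print(f"{combo}: observed {share:.0%}, expected {expected:.0%}")
```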

data profiles

created from patterns in existing data that reflect expected or normal experience, so that they can be compared to new data

Five dimensions Banks use to evaluate transactions

customer, account, product, geography, time

In Access each row in the table contains

data for a single record (a transaction); fields are columns, and each field can store only one type of data for all the records

Positive relationship data

data moving in the same direction

Two ____ _____ can have the same number of observations and the same Mean, Median, and mode but have different variability

data sets

Examples of numbers that do not follow Benford's Law

data sets with built-in maximums or minimums, and assigned numbers (such as Social Security numbers, account numbers, and zip codes); some unnatural (external) influence on the numbers stifles the development of a geometric pattern

What is the first step in creating a histogram?

defining the intervals, which is a matter of judgement for the analyst

Measures of Variability

describe how the observations are dispersed around the Mean

The challenge in applying the last-two-digits test is

determining which last-two digits are appropriate for the analysis

Both _____ and ____ distributions can be graphed as histograms

discrete; continuous

Pie charts are used to

display categories of data that sum to a total; slices of the pie represent the percentage of the total contained within each category

For a right-skewed distribution, the Mean is

greater than the Median, which is greater than the Mode
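
The ordering can be checked on a small right-skewed data set. The amounts are made up, and the skewness formula used here (third central moment divided by the 1.5 power of the second) is one common definition, assumed for illustration; the chapter does not specify one.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical amounts with a long right tail.
amounts = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mean = statistics.mean(amounts)
median = statistics.median(amounts)
mode = statistics.mode(amounts)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"skewness={skewness(amounts):.2f}")  # positive for a right-skewed set
```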

The magnitude of an observation can

have a negative value, a zero value, or a positive value

The key function of statistics is defining

imprecision, which is represented by terms such as the error rate, significance level, and confidence level

Analysis Toolpak

included with Excel and can be accessed through a simple loading process; includes analysis tools such as descriptive statistics, histogram, and sampling

Compared to the variance, the standard deviation is more easily

interpreted because it is stated in terms of basic units rather than squared units

What kind of relationship exists between size of the intervals and the number of intervals for a given data set?

inverse: the smaller the intervals, the larger the number of intervals

digital analysis

is founded on the counterintuitive observation that individual digits of multidigit numbers are not random, but follow a pattern known as Benford's Law

Normal distribution

is the most prominent continuous distribution

Defining the intervals for a histogram is a_____ process

iterative, where the analyst considers various alternatives before selecting one that is most appropriate for the specific purpose of the analysis

In an embezzlement scenario, a fraudster may deliberately issue payments

just below some threshold, or with a transposition of digits to make it seem like an innocent error (12,323 vs. 21,323)
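
A simple screen for the first pattern is to list payments that fall within a narrow band under the threshold. The threshold of 5,000, the 2% band, the function name, and the sample amounts below are all hypothetical.

```python
def just_below_threshold(amounts, threshold, margin=0.02):
    """Return amounts within `margin` (as a fraction) below the threshold."""
    lower = threshold * (1 - margin)
    return [a for a in amounts if lower <= a < threshold]

# Hypothetical payments, with an assumed approval threshold of 5,000.
amounts = [4999.00, 2500.00, 4950.00, 5000.00, 4800.00, 120.00]
flagged = just_below_threshold(amounts, threshold=5000.00)

print(flagged)
```

A cluster of flagged payments is a reason to look closer, not evidence of intent on its own.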

The most basic form of presenting quantitative data is a

listing of the value for each individual observation; this is not feasible for large data sets, where it is useful to summarize the data in some way (such as by value intervals, time intervals, or categories) and present the summary measures in tabular format

With Access it is always clear whether you are

looking at data (input) or results (output)

Numbers with fewer digits display a slightly higher bias toward

lower digits

The advantage of the Mean is that it considers the

magnitude of all the observations, representing the point where their mass (or weight) is concentrated

The Median does not consider the

magnitude of each observation, only whether it is located in the upper or lower half of the distribution; for this reason it is not affected by extreme observations (outliers)

ActiveData for Excel

more than 100 tools available that must be purchased and installed; offered in two versions

Experts can be challenged in deposition or cross-examination about

probabilities related to their conclusions; an effective response requires knowledge of whether such probabilities can be determined and, if not, the ability to explain why

Known or potential error rate of the method

probability concept

Inferential Statistics

purpose: to draw conclusions (inferences) about a population based on information obtained from a sample -limited -requires that a sample be drawn randomly from a population

Observations can be compiled in a

single point in time (a cross section) or over some period of time (a time series); different statistical sets are needed for each

It is more difficult to identify significant observations from Benford's profile in

small data sets

What is a straightforward form of data mining?

sorting

A measure that describes a sample is called a

statistic

Financial data that is normally distributed includes

stock prices, rates of return, profits, commodity or currency prices

A sample is a

subset of observations selected from the population

A key difference between Microsoft Excel and Microsoft Access is

that Access forces some structure on the data analysis project, while Excel allows more flexibility

The key measures of central tendency are

the Mean, Median, and Mode; of these three, the Mean is the most commonly recognized

A key advantage of data mining is

the ability to examine all of the data, not just samples

The standard deviation and the variance have

the advantage of considering all the observations in the data set

Absolute frequency

the count of observations within an interval

A population is

the entire group of observations in which we are interested

First Digit Test

the first-digit profile of a data set is compared to Benford's first-digit profile.
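
The comparison can be sketched as follows. The helper names are illustrative, and the sample amounts are a synthetic geometric sequence generated only to have something that should roughly conform; they are not real data.

```python
import math
from collections import Counter
from decimal import Decimal

def first_digit(amount):
    """Left-most non-zero digit of a non-zero number."""
    return Decimal(str(abs(amount))).as_tuple().digits[0]

def first_digit_test(amounts):
    """Map each digit 1-9 to (observed, expected, difference)."""
    digits = [first_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    rows = {}
    for d in range(1, 10):
        observed = counts.get(d, 0) / n
        expected = math.log10(1 + 1 / d)  # Benford's first-digit profile
        rows[d] = (observed, expected, observed - expected)
    return rows

# Synthetic geometric sequence: 1,000 values spanning four orders of magnitude.
amounts = [100 * 10 ** (k / 250) for k in range(1000)]
results = first_digit_test(amounts)

for d, (observed, expected, diff) in results.items():
    print(f"{d}: observed {observed:.3f}  expected {expected:.3f}  diff {diff:+.3f}")
```

Large differences for particular digits would point to the subsets of transactions worth examining more closely.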

For a left-skewed distribution

the median is larger than the mean

Mode

the most frequently occurring value

The most basic feature of a data set is

the number of observations -it is important because it determines the scope of the analysis; - determines what methods can be applied to the data, what technological resources are needed, and how long the analysis will take

Low variability indicates that

the observations are located closer to the Mean

Relative Frequency

the percentage of the total number of observations that fall within an interval
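
Absolute and relative frequencies can be tabulated together. In this sketch the interval boundaries and values are made up; each interval includes its lower boundary and excludes its upper one (except the last), so every observation lands in exactly one interval.

```python
from bisect import bisect_right

def frequency_table(values, edges):
    """Rows of (lower, upper, absolute frequency, relative frequency)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not edges[0] <= v <= edges[-1]:
            raise ValueError(f"{v} falls outside the defined intervals")
        # The last interval is closed on the right so the maximum is included.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    n = len(values)
    return [(edges[i], edges[i + 1], c, c / n) for i, c in enumerate(counts)]

values = [12, 18, 25, 25, 31, 47, 50, 68, 75, 100]  # hypothetical observations
edges = [0, 25, 50, 75, 100]                        # four intervals

table = frequency_table(values, edges)
for lower, upper, absolute, relative in table:
    print(f"{lower:>3} to {upper:<3}  count={absolute}  share={relative:.0%}")
```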

Measures of Central Tendency

the tendency for quantitative data to cluster around certain values -clustering often (but not always) occurs near the center of the data distribution

The efficiency of data mining can be evaluated in terms of

the true signals it identifies relative to false signals, also described as false positives or noise
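
As a small illustration of this measure, the sketch below compares a set of flagged transactions with the set later confirmed as fraudulent. The transaction IDs and both sets are invented for the example.

```python
# Hypothetical transaction IDs.
flagged = {101, 102, 103, 104, 105}      # identified by the data mining screen
fraudulent = {102, 105, 106}             # later confirmed as fraud

true_signals = flagged & fraudulent      # correctly flagged
false_positives = flagged - fraudulent   # Type I errors (noise)
missed = fraudulent - flagged            # Type II errors

signal_to_noise = len(true_signals) / len(false_positives)

print(f"true signals: {sorted(true_signals)}")
print(f"false positives (Type I): {sorted(false_positives)}")
print(f"missed fraud (Type II): {sorted(missed)}")
print(f"true signals per false signal: {signal_to_noise:.2f}")
```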

Advantages of Relative frequencies

they are standardized, or described relative to a standard quantity: the total number of observations

Advantages of specialized software

they can process data in a wide variety of formats, which eliminates the need for conversion; they analyze the database as read-only to avoid altering data; they can record the analytics that have been performed, creating an audit trail

What is the purpose of statistics?

to summarize data, analyze them, and draw meaningful inferences that lead to improved decisions

Second important feature in a data set is

two dimensions: 1. time 2. magnitude (amount)

Continuous Distribution

values can be measured to an infinitesimally small degree of accuracy; values are continuous, and there is no discrete jump; e.g., time, weight, and distance

Specialized software programs address the

weaknesses of Microsoft Excel and Access

Positively skewed

when a distribution extends farther to the right than to the left; more heavily weighted toward smaller numbers

