ACIS 3504 Exam 2

Which of the following is NOT one of the responsibilities of auditors in detecting fraud according to SAS No. 99?

Catching the perpetrators in the act of committing the fraud.

Examples of concealment efforts include

Charging a stolen asset to an expense account or to an account receivable that is about to be written off; creating a ghost employee who receives an extra paycheck; lapping; kiting.

Techniques under the principle of simplification include

Color, Quantity, distance, orientation

Techniques under the principle of emphasis include

Color, highlighting, weighting, ordering

Opportunity is the opening or gateway that allows an individual to do what three things?

Commit the fraud; conceal the fraud; convert the proceeds

What type of computer fraud involves tampering with software, illegally copying software, using software in an unauthorized manner, or creating software to carry out unauthorized activities?

Computer instructions fraud

The highest-level DFD, which provides a summary-level view of the system and depicts a data processing system and the external entities that are the sources of its input and the destinations of its output. The process symbol is numbered with a "0".

Context diagram

Employees who steal inventory or equipment and then sell the items or otherwise convert them to cash are an example of

Convert the theft or misrepresentation to personal gain

Place a heavy emphasis on the logical aspects of a system.

DFDs

Which of the following statements is FALSE?

DFDs help convey the timing of events.

Catalina was reviewing the data imputation formula for missing values in a customer's credit score. She found that two lines of data had the exact same customer name, birthdate, address, and buying history, but two different social security numbers. Assuming there was no fraud by the customer, what best describes what Catalina likely found?

Data Contradiction Error

Santiago reviewed a recent extract of data about customer credit limits. He noticed that one company had a credit limit of $1,000,000,000 USD, whereas the next highest credit limit was $10,000 USD. What might Santiago have discovered?

Data Threshold Violation

The process of updating data to be consistent, accurate, and complete.

Data cleaning

The combining of data from two or more fields into a single field.

Data concatenation

the principle that every value in a field should be stored in the same way.

Data consistency

Errors that exist when the same entity is described in two conflicting ways and that need to be investigated and resolved appropriately

Data contradiction errors

The entity that receives data produced by a system

Data destination

When an employee makes a mistake typing data into the system, it is called a _______.

Data entry error

Types of errors that come from inputting data incorrectly. They often occur in human data entry and can also be introduced by the computer system. They may be indistinguishable from data formatting and data consistency errors in an output data file.

Data entry errors

the process of removing records or fields of information from a data source

Data filtering

The movement of data among processes, stores, sources and destinations

Data flow

A graphical description of the flow of data within an organization, including data sources/destinations, data flows, transformation processes, and data storage

Data flow diagram (DFD)

Data flow diagram symbol that is represented by an arrow

Data flows

What do Data Flow Diagrams (DFD) focus on?

Data flows, processes, sources and destinations of the data, data stores

What type of computer fraud is illegally using, copying, browsing, searching, or harming company data

Data fraud

The intentional arranging of visualization items in a way to produce emphasis. Can be used in ascending, descending order, random, or alphabetically

Data ordering

The entity that produces or sends the data entered into a system

Data source

Data flow diagram symbol that is represented by a square

Data sources and destinations

The place or medium where system data is stored

Data store

Data flow diagram symbol that is represented by two parallel lines

Data stores

Computer systems are vulnerable to computer crimes because

Databases can be huge, and access privileges can be difficult to create and enforce; organizations want employees, customers, suppliers, and others to have access to their system; computer programs only need to be altered once

What are some of the reasons for fraudulent financial statements

Deceive investors or creditors; increase a company's stock price; meet cash flow needs; hide company losses or other problems

________ often make use of exploratory data analytic techniques, while _______ make use of machine learning techniques.

Descriptive analytics, predictive analytics

Goes beyond examining what happened to try to answer the question, "why did this happen?"

Diagnostic

A company wants to determine how to decrease employee turnover. In order to do this, they test whether paying off an employee's student debt will cause fewer employees to leave. The analytic testing whether paying off an employee's student debt causes lower turnover is an example of which type of analytic?

Diagnostic

When confirmatory data analysis techniques are used, what type of analytic is likely being computed?

Diagnostic analytic

Which of the following is a technique to simplify data presentations?

Distance

Illustrate the flow of documents and data among areas of responsibility within an organization, from cradle to grave; shows where each document originates, its distribution, its purpose and its ultimate disposition

Document flowcharts

Assuring the most important message is easily identifiable.

Emphasis

Process represented by a small bolded circle

End

refers to avoiding the intentional or unintentional use of deceptive practices that can alter the user's understanding of the data being presented.

Ethical presentation

An internal auditor validates the daily changes in customers' accounts receivable balances against daily sales made on account less cash collected on receivables

Example of Advanced Testing Techniques

A computer engineer performs a complicated merge of data from five different accounting systems of company subsidiaries. To check her work, she randomly selects 50 transactions from each system to validate to make sure the merge worked correctly.

Example of Audit a Sample

Ashton has 100,000 records. Ashton chooses to audit 1,000 records, or 1% of the total number of records. If in those 1,000 records Ashton finds 70 errors, Ashton can compute a 7% error rate.

Example of Audit sample

A CFO receives a spreadsheet file for review that contains annual pay raises for all company employees. The CFO examines the minimum, maximum, average, and median to make sure the data looks correct before making the final approval for pay increases.

Example of Basic Statistical Tests

Masks do not prevent COVID, but a company concluded that they do, so people are wearing masks for no reason

Example of Type 1 Error

Masks do prevent COVID, but the company concluded that they do not, so people are not wearing masks and are spreading the virus

Example of Type 2 Error

If an alarm goes off while there is no fire

Example of Type I error

If an alarm doesn't go off and there is a fire

Example of Type II error

Which of the following reasons describes why transforming data is necessary?

All of the above: data aggregated at different levels need to be joined; data within a field has various formats; multiple data values contained in the same field need to be separated

Which of the following can be used to present data unethically?

All of the above: selectively presenting only part of a viz; with an axis, showing the most recent time closest to the origin; truncating or stretching the axes

At what point in the ETL process should data validation take place?

All of these: during data cleaning; during data structuring; during data standardization

a record accurately indicates a person lives in Nauvoo, Illinois but mistakenly lists the zip code as 26354, but the actual zip code for Nauvoo, Illinois is 62354

Example of Violated attribute dependencies
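
One way to catch a violated attribute dependency like this is to check each record against a reference table. The sketch below is only an illustration: the `ZIP_TO_CITY` table and the record layout are assumptions, and the only entry in the table is the Nauvoo zip code given in the card above.

```python
# Hypothetical reference table: zip code -> (city, state).
ZIP_TO_CITY = {"62354": ("Nauvoo", "IL")}


def dependency_ok(record: dict) -> bool:
    """True when the zip code on the record maps to the record's city and state."""
    return ZIP_TO_CITY.get(record["zip"]) == (record["city"], record["state"])


good = {"city": "Nauvoo", "state": "IL", "zip": "62354"}
bad = {"city": "Nauvoo", "state": "IL", "zip": "26354"}  # transposed digits

print(dependency_ok(good), dependency_ok(bad))
```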

A terrorist group launches a computer virus aimed at corrupting all transaction data for a corporation by randomly changing the currency of transactions. An internal auditor scans the company's database to see if the transactions appear to be in multiple different currencies.

Example of Visual Inspection

paying employees more decreases the likelihood of employees leaving the company

Example of alternative hypothesis

consulting firm may keep track of positions in the organization such as partner, senior consultant, and research analyst by entering into the database the number 1 for partner, 2 for senior consultant, and 3 for research analyst.

Example of cryptic data values
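
Cryptic values only become readable once the coding scheme is applied. A minimal sketch, assuming the three codes from the card above are the whole scheme (the function name is hypothetical):

```python
# Coding scheme taken from the consulting-firm example.
POSITION_CODES = {1: "partner", 2: "senior consultant", 3: "research analyst"}


def decode_position(code: int) -> str:
    """Translate a stored position code into a readable title."""
    try:
        return POSITION_CODES[code]
    except KeyError:
        raise ValueError(f"unknown position code: {code}") from None


print(decode_position(2))
```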

each of these date formats represents the same date: April 3, 1982; 3 April 1982; 03/04/82; and 04/03/82. A single format should be chosen and used for all dates in a field and typically all dates contained in a file

Example of data consistency
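
Enforcing one date format can be sketched as below. The choice of ISO 8601 as the target and the function name `to_iso` are assumptions for illustration. Note that a numeric date such as 03/04/82 is ambiguous, so the caller has to state whether the source writes the day or the month first.

```python
from datetime import datetime


def to_iso(text: str, numeric_format: str = "%m/%d/%y") -> str:
    """Convert a date written in one of a few known formats to YYYY-MM-DD.

    numeric_format says how all-numeric dates should be read, because
    03/04/82 could mean April 3 or March 4 depending on the source.
    """
    for fmt in ("%B %d, %Y", "%d %B %Y", numeric_format):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {text!r}")


for raw in ("April 3, 1982", "3 April 1982", "04/03/82"):
    print(raw, "->", to_iso(raw))
print("03/04/82 ->", to_iso("03/04/82", numeric_format="%d/%m/%y"))
```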

Milton Armstrong's telephone number on line 16 is different than his phone number on all other lines. Due to the contradiction error in Milton's phone number, we do not know the true value. The phone number should be corrected so that Milton's phone number is the same throughout the dataset.

Example of data contradiction errors
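
A contradiction like this can be surfaced by grouping rows on the entity and looking for more than one distinct value. This is a rough sketch; the phone numbers and the row layout are made up for illustration.

```python
from collections import defaultdict


def contradictions(rows: list[dict], key: str, field: str) -> dict:
    """Return entities that have more than one distinct value for `field`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[field])
    return {entity: values for entity, values in seen.items() if len(values) > 1}


# Hypothetical rows; only the customer name comes from the example above.
rows = [
    {"name": "Milton Armstrong", "phone": "555-0100"},
    {"name": "Milton Armstrong", "phone": "555-0100"},
    {"name": "Milton Armstrong", "phone": "555-0199"},
    {"name": "Jane Doe", "phone": "555-0142"},
]

print(contradictions(rows, key="name", field="phone"))
```

The check only flags the conflict. Deciding which value is true still requires investigation.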

a system may fail to record the first two digits of a year, and so it is not clear if the date is meant to be 1910 or 2010

Example of data entry errors

The office manager of a Wall Street law firm sold information to friends and relatives about prospective mergers and acquisitions found in Word files. They made several million dollars trading the securities.

Example of data fraud

A data set is created showing the number of hours employees work a week. Sam Howell's data was entered as "400" hours per week. This would be an example of

Example of data threshold violation

a field capturing the number of children a taxpayer claims as dependents in which the taxpayer lists the value of "300."

Example of data threshold violation
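
A threshold test is a simple range check. The sketch below assumes a bound of 0 to 168 hours (the number of hours in a week); the function name and the second employee are hypothetical.

```python
def threshold_violations(records: list[dict], field: str, low: float, high: float) -> list[dict]:
    """Return the records whose `field` value falls outside [low, high]."""
    return [r for r in records if not (low <= r[field] <= high)]


timesheet = [
    {"name": "Sam Howell", "hours": 400},
    {"name": "Pat Lee", "hours": 40},  # hypothetical employee
]

# A week has 168 hours, so anything above that cannot be correct.
print(threshold_violations(timesheet, "hours", low=0, high=168))
```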

if a column of information listing customer names shows that all names were recorded fully capitalized, one might want to change the formatting such that only the first letter of the first and last name are capitalized.

Example of data validation

A virus pandemic causes governments to shut down all restaurants. A national restaurant chain creates an analysis to see how long their cash reserves can continue to pay employees before the company runs out of money.

Example of descriptive data

A company wants to examine social media data to see if people are saying positive or negative things about it. Counting positive and negative social media mentions, or using text analysis software to give a numerical score of the tone of the tweets, is an example of

Example of descriptive data analytics

An internal auditor notices an increase in a company's inventory shrinkage (i.e., inventory being stolen). The internal auditor creates a data model that explains what types of inventory are being stolen.

Example of diagnostic data

After a large quantity of low-gross-margin products was sold, the data analyst finds that the marketing department advertised these products heavily in the last quarter. He wants to know why the marketing department focused on those products

Example of diagnostic data analytics

Railroad employees entered data to scrap more than 200 railroad cars. They removed the cars from the railway system, repainted them, and sold them.

Example of input fraud

a data field for city contains the country name Germany, the data values are misfielded. The value Germany should be entered in a data field for country.

Example of misfielded data values

paying employees more will have no effect on their likelihood of leaving the company

Example of null hypothesis

An employee scans a company paycheck, uses desktop publishing software to erase the payee and amount, and prints fictitious paychecks.

Example of output fraud

A tax accountant prepares analyses that show what will happen to the customers of his client if the country adopts a new tax law.

Example of predictive data

Match.com uses sophisticated prediction algorithms that consider users' stated preferences and their browsing and searching activities in order to match each client with potentially successful future love interests

Example of predictive data Analytics

Amazon.com uses customer purchasing and search patterns to predict (and then display) other products the customer might be interested in purchasing.

Example of predictive data analytics

A corporate accountant designs a cook scheduling system based on past data for meal preparation. The new system should assure that there are always enough cooks scheduled for peak demand at the restaurant.

Example of prescriptive data

United Parcel Service (UPS) designed a real-time solution. The program and subsequent updates optimize drivers' delivery routes to save time, minimize driving distance, reduce emissions, increase safety, and ultimately boost the bottom line

Example of prescriptive data analytics

An insurance company installed software to detect abnormal system activity and found that employees were using company computers to run an illegal gambling website.

Example of processor fraud

if a column should have numeric values, sorting will show if there are also characters contained in some entries in the column.

Example of visual inspection

the spread of the data about a prediction inherent in a model.

Failing to consider the variation

Which of the following is NOT an example of computer fraud?

Failure to perform preventive maintenance on a computer

Distinct from other types of fraud in that the individuals who commit the fraud are not the direct beneficiaries.

Financial Statement fraud

Which of the following statements is FALSE regarding flowcharts?

A system flowchart is a narrative representation of an information system

Flowchart symbols that indicate the flow of data, where flowcharts begin or end, where decisions are made, and how to add explanatory notes to flowcharts

Flow and miscellaneous symbols

An analytical technique that describes some aspect of an information system in a clear, concise, and logical manner. Use a set of standard symbols to depict processing procedures and the flow of data

Flowchart

All of the following are recommended guidelines for making flowcharts more readable, clear, concise, consistent, and understandable EXCEPT:

Flowchart all data flows, especially exception procedures and error routines

place more emphasis on the physical characteristics of the system.

Flowcharts

Intentional or reckless conduct, whether by act or omission, that results in materially misleading financial statements. "Cooking the books" (booking fictitious revenue, overstating assets)

Fraudulent financial reporting

Which type of fraud is associated with 50% of all auditor lawsuits?

Fraudulent financial reporting

a process is represented by a rounded-edge rectangle. An explanation of the activity is placed inside the rectangle.

Activity in a process

The presentation of data in a summarized form.

Aggregate data

In a document flowchart you want to identify

All departments, documents, and processes

Wearing masks does decrease the chances of catching COVID

Alternative hypothesis example

Information that helps explain a business process is entered in the BPD and, if needed, a bolded dashed arrow is drawn from the explanation to the symbol.

Annotation information

Which chart type is best for depicting trends over time?

Area chart

How are data sources and destinations represented in a data flow diagram?

As a square

Pie charts are the most over-used type of charts. This is because they are often used to show comparison. Select which chart type is best for making comparisons

Bar charts

What is the term used for a data flow diagram where there is an inflow of data but no outflow of data

Black hole

Which of the following is NOT a good reason to visualize data?

Building visualizations does not take as much time as writing a report.

The intent is that all business users can easily understand the process from a standard notation. Can show the organizational unit performing the activity.

Business Process Modeling Notation (BPMN)

receiving an order, checking customer credit, verifying inventory availability, and confirming customer order acceptance, shipping the goods ordered, billing the customer, and collecting customer payments are all examples used in a

Business process diagram

A visual way to describe the different steps or activities in a business process, providing a reader with an easily understood pictorial view of what takes place in a business process

Business process diagram (BPD)

A general rule of thumb is that a visualization should only have 3-5 groups in the data area. Putting in more or less than this amount violates which principle?

Goldilocks principle

What pattern do a system flowchart and a process flowchart follow?

Identify the inputs; each input is followed by a process (steps performed on the data); the process is followed by outputs (the resulting new information)

Flowchart symbol that shows input to or output from a system

Input/output symbols

In a document flowchart what does each department get

Its own column

Factors that allow opportunity include:

Lack of internal controls; failure to enforce controls (the most prevalent reason); excessive trust in key employees; incompetent supervisory personnel; inattention to details; inadequate staff

A projection of the process on the context diagram. It is like opening up that process and looking inside to see how it works, showing the internal subprocesses. You repeat the external entities, but you also expand the main process into its subprocesses. Data stores also appear at this level.

Level 0 diagram

Theft of company assets by employees, which can include physical assets (e.g., cash, inventory) and digital assets (e.g., intellectual property such as protected trade secrets, customer data)

Misappropriation of assets

Analyn spent the entire day entering information about suppliers into the company database. She did not make a single spelling mistake in any of the entries. However, at the end of the day, Analyn notices that she entered the state into the country field for all of the data. The mistaken data values in the country field are best described as which of the following?

Misfielded data values

Data values that are correctly formatted but not listed in the correct field

Misfielded data values

All of the following are guidelines that should be followed in naming DFD data elements EXCEPT:

Name only the most important DFD elements

When one bar on a bar chart is displayed much thicker than the others, the thicker bar appears much more important because of its increased visual weight. This is an example of

Non-proportional display of data

A proposed explanation worded as a statement of equality, meaning that one of the two concepts, ideas, or groups will be no different from the other concept, idea, or group

Null hypothesis

Wearing masks doesn't affect the likelihood of catching COVID

Null hypothesis example

Compute the minimum, maximum, mean, median, and sum for numeric fields to see whether the dataset contains a complete set of all the original transactions.

Numeric values in basic statistical tests
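
These basic tests can be sketched with Python's standard library. The raise amounts and the control totals below are hypothetical. The idea is that the record count and sum from the received file should match the totals reported by the source system, while the minimum, maximum, mean, and median help spot values that look wrong.

```python
from statistics import mean, median


def summarize(values: list[float]) -> dict:
    """Basic statistics used to sanity-check a numeric field."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "sum": sum(values),
    }


# Hypothetical annual raises; the last value looks suspicious.
raises = [1200, 1500, 1800, 2100, 95000]
summary = summarize(raises)
print(summary)

# Hypothetical control totals supplied by the source system.
source_count, source_sum = 5, 101600
is_complete = summary["count"] == source_count and summary["sum"] == source_sum
print("complete:", is_complete)
```

A mean far above the median, as here, is a quick hint that an extreme value deserves a closer look.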

Ashton has 100,000 records. Ashton chooses to audit 1,000 records, or 1% of the total number of records. If in those 1,000 records Ashton finds 70 errors, Ashton can compute a 7% error rate. Ashton can assume that

Out of the 100,000 records there could be 7,000 total errors
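
The arithmetic behind this projection is short enough to sketch. The function name is hypothetical, and the projection assumes the sample is representative of the full population.

```python
def project_errors(population_size: int, sample_size: int, errors_in_sample: int) -> tuple[float, float]:
    """Return (sample error rate, projected number of errors in the population)."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    rate = errors_in_sample / sample_size
    # Multiply before dividing so whole-number inputs give an exact result.
    projected = errors_in_sample * population_size / sample_size
    return rate, projected


rate, projected = project_errors(100_000, 1_000, 70)
print(f"error rate {rate:.0%}, projected errors {projected:,.0f}")
```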

Data analytics techniques to detect fraud include

Outlier detection, anomaly detection using trends and patterns, regression analysis, semantic modeling, and Benford's Law.
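
As an illustration of one of these techniques, Benford's Law states that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so a leading 1 is expected about 30% of the time. The sketch below compares observed leading-digit shares with that expectation. The invoice amounts are made up, the helper names are hypothetical, and amounts are assumed to be nonzero whole numbers.

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(amount: int) -> int:
    """Leading digit of a nonzero whole-number amount."""
    return int(str(abs(amount))[0])


def compare_to_benford(amounts: list[int]) -> dict[int, tuple[float, float]]:
    """Map each digit 1-9 to (observed share, expected share)."""
    nonzero = [a for a in amounts if a != 0]
    if not nonzero:
        raise ValueError("need at least one nonzero amount")
    counts = Counter(leading_digit(a) for a in nonzero)
    return {d: (counts[d] / len(nonzero), benford_expected(d)) for d in range(1, 10)}


# Hypothetical invoice amounts in whole dollars.
invoices = [1200, 1875, 132, 2410, 960, 1040, 310, 18, 7600, 1499]
for digit, (observed, expected) in compare_to_benford(invoices).items():
    print(f"{digit}: observed {observed:.0%}, expected {expected:.1%}")
```

A real test needs far more than ten values, and a large gap between observed and expected shares is a reason to investigate, not proof of fraud.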

What type of computer fraud is stealing, copying, or misusing computer printouts or displayed information

Output Fraud

Which of the following control procedures is most likely to deter lapping?

Periodic rotation of duties

Part to Whole uses what two types of visualizations

Pie chart, treemap

Making sure to use separate training datasets and test datasets is especially important for creating what type of analytic?

Predictive analytic

Indicate which option orders the type of analytic from the one that provides the most value added to an organization to the least value added to the organization.

Prescriptive, predictive, diagnostic, descriptive

These three conditions must be present for fraud to occur

Pressure, opportunity, rationalization

Actions that transform data into other data or information

Processes

Flowchart symbol that shows data processing, either electronically or by hand

Processing symbols

illustrates the sequence of logical operations performed in a computer program

Program

Illustrates the sequence of logical operations performed by a computer in executing a program; describes the specific logic to perform a process shown on a system flowchart

Program Flowchart

The documentation skills that accountants require vary with their job function. However, all accountants should at least be able to do which of the following?

Read documentation to determine how the system works

How do accountants use documentation?

Read documentation to understand how a system works (auditors assess risk); evaluate strengths and weaknesses of an entity's internal controls; prepare documentation to demonstrate how a proposed system would work or to demonstrate their understanding of a system of internal controls

Requires that auditors understand the automated and manual procedures an entity uses. This understanding can be gleaned through documenting the internal control system—a process that effectively exposes the strengths and weaknesses of the system.

SAS-94

Legislation intended to prevent financial statement fraud, make financial reports more transparent, provide protection to investors, strengthen internal controls at public companies, and punish executives who perpetrate fraud.

Sarbanes-Oxley Act

Requires management to assess internal controls and auditors to evaluate the assessment

Sarbanes-Oxley Act (SOX)

Correlation uses what two types of visualizations

Scatterplot, heat map

refers to making a visualization easy to interpret and understand

Simplification

a process is represented by a small circle.

Start/Begin

Flowchart symbol that shows where data is stored

Storage symbols

Which of the following statements is FALSE about fraud criminals

The psychological profiles of white-collar criminals are significantly different from those of the general public.

Data flow diagram symbol that is represented by a circle

Transformation processes

Chibuzo creates a chart to show the percentage of activities in the accounting function that have been automated over time. She wants to stress the slow rate of change by the department to adopt automation. What is the purpose of Chibuzo's visualization and what type of chart would be best for this purpose?

Trend evaluation, line chart

A DFD consists of the following four basic elements: data sources and destinations, data flows, transformation processes, and data stores. Each is represented on a DFD by a different symbol.

True

Documentation methods such as DFDs, BPDs, and flowcharts save both time and money, adding value to an organization.

True

Making an item in the data area of a viz larger to increase emphasis is an example of using which principle?

Weighting

In a document flowchart what does it show for documents

Where each document originated from and its final disposition

Researchers found significant differences between what two types of people

White collar criminals, violent criminals

The flow of data or information is indicated by

an arrow

help focus on a trend rather than individual values, and are useful when trying to show a progression over time.

area chart

puts the categorical data variable on the x-axis (or on the y-axis if the chart is rotated) and then plots the numerical value on the other axis.

bar chart

draws a line at the median value for a numeric variable and then shows another line for the upper quartile and lower quartile (the connection of these lines forms the box).

boxplot
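
The lines a boxplot draws can be computed directly. This is a minimal sketch using `statistics.quantiles` from the standard library. The sample data is made up, and the `inclusive` method is one of several quartile conventions, so other tools may report slightly different quartiles.

```python
from statistics import quantiles


def box_lines(values: list[float]) -> dict:
    """Lower quartile, median, and upper quartile that form the box."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"lower_quartile": q1, "median": q2, "upper_quartile": q3}


print(box_lines([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```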

adds a "bullet" or a small line by each bar that indicates an important benchmark

bullet graph

Any type of fraud that requires computer technology to perpetrate. It leaves little evidence, making it more difficult to detect, and the perpetrator can steal more of something, in less time, with less effort.

computer fraud

When data is joined together it is called _________, when it is split apart it is called ________.

data concatenation, data parsing
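
Both operations are easy to see on a name field. The sketch below is only an illustration: the function names and the sample name are hypothetical, and it assumes a single space separates first and last name.

```python
def concatenate_name(first: str, last: str) -> str:
    """Data concatenation: join two fields into one."""
    return f"{first} {last}"


def parse_name(full_name: str) -> tuple[str, str]:
    """Data parsing: split one field into two at the first space."""
    first, _, last = full_name.partition(" ")
    return first, last


full = concatenate_name("Ada", "Lovelace")
print(full)
print(parse_name(full))
```

Real names with middle names or multi-word surnames would need a more careful parsing rule than a single split.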

A graphical depiction of information designed with or without an intent to deceive, that may create a belief about the message and/or its components, which varies from the actual message

data deception

the process is represented by a diamond. An explanation of the decision is placed inside the symbol.

decision

A company uses a boxplot in a visualization. What is likely the purpose of the visualization?

distribution

Shows the flow of documents and data between departments or units; useful in evaluating internal controls

document

Which of the following flowcharts illustrates the flow of data among areas of responsibility in an organization?

document flowcharts

Whether or not someone is registered to vote could be an example of a

dummy variable

A pharmaceutical company is trying to develop a drug that will help cure the most people with a serious disease. To choose the drug that can cure the most people, the data analyst should look at what?

effect size

A level 0 diagram and context diagram should both have the same

external entities with the same flows to and from those entities.

Pressures That Lead To Employee Fraud include

financial, emotional, lifestyle

A DFD is a representation of which of the following?

flow of data in an organization

Changes in the physical characteristics of the process do affect the ___________ but have little or no impact on the ___________

flowchart, DFD

Any means a person uses to gain an unfair advantage over another person.

fraud

shows colors that relate to the magnitude of the different entries.

heat map

a single numeric value is divided into equal-sized bins, and the bin sizes are listed on the x-axis. Then, a bar is used to show the count of each value that falls into the bins.

histogram
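
The binning step behind a histogram can be sketched as follows. The function name, the bin width, and the sample values are assumptions for illustration. Each value is assigned to the bin whose lower edge is the largest multiple of the bin width not exceeding it.

```python
def histogram_counts(values: list[float], bin_width: float, start: float = 0) -> dict:
    """Count how many values fall in each equal-sized bin, keyed by the bin's lower edge."""
    counts: dict = {}
    for value in values:
        lower_edge = start + ((value - start) // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


# Bins are [0, 5), [5, 10), [10, 15).
print(histogram_counts([1, 4, 5, 9, 10, 14], bin_width=5))
```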

Which of the following causes the majority of computer security problems?

human errors

What type of computer fraud is alteration or falsifying input

input fraud

Data flow diagram symbol that is represented by an orange triangle

internal control

Creating cash using the lag between the time a check is deposited and the time it clears the bank.

kiting

Former and current employees, who are much more likely than non-employees to perpetrate frauds (and big ones) against companies, are also called

knowledgeable insiders

Which of the following is a fraud in which later payments on accounts receivable are used to pay off earlier payments that were stolen?

lapping

the x-axis is an ordered unit such as days, months, or years.

line chart

What is the term used in a data flow diagram where there is an outflow of data but no inflow of data

miracle

Every data flow diagram must have

one data inflow and one data outflow

The condition or situation that allows a person or organization to commit and conceal a dishonest act and convert it to a personal gain

opportunity

show which items make up the parts of a total. Appropriate when showing percentages that sum up to 100%

pie chart

A person's incentive or motivation for committing fraud

pressure

What type of computer fraud includes unauthorized system use, including theft of computer time and services

processor fraud

Which of the following conditions is/are usually necessary for fraud to occur? Please select ALL of the correct answers.

rationalization, pressure, opportunity

recasting actions as "morally acceptable" behaviors to maintain self image

rationalizations

In a system flowchart a process will almost always be represented by a

rectangle

where a numeric variable is listed on the x-axis, a different numeric variable is listed on the y-axis, and the values of each are plotted in the data area.

scatterplot

Depicts the data processing cycle for a process; describes the relationship between inputs, processing, and outputs

system

Depicts the relationship among the inputs, processes, and outputs of an AIS.

system flowchart

A program flow chart is drawn for each rectangle in

the system flowchart

A subset of data used to train a model for future prediction

training dataset
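
Keeping training and test data separate can be sketched with a simple random split. The 80/20 ratio, the seed, and the function name are assumptions for illustration, not a recommended setting.

```python
import random


def train_test_split(records: list, test_fraction: float = 0.2, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of the records and split it into (training, test) subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - test_size
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(list(range(10)))
print(len(train), "training records,", len(test), "test records")
```

The model is fit on the training subset only, and the held-out test subset is used to check how well it predicts data it has not seen.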

nested rectangles to show the amount that each group or category contributes

treemaps

You co-own a theme park. You believe that the longer customers stay in the park, the hungrier they will be which would increase the amount they spend on food. Your co-owner believes that the longer customers stay in the park, the more likely they are to feel nauseated which would decrease the amount they spend on food. Both of you gather data and find some evidence supporting your belief. If the true relation is that there is no relation between time in the park and food sales, what type of error did your co-owner make?

type 1 error

the amount of attention an element attracts

visual weighting

any visual representation of data, such as a graph, diagram, or animation; called a viz for short

visualization

Fraud is a

white collar crime

Researchers found few differences between what two types of people

white collar criminals, the general public

Typically, businesspeople who commit fraud. Usually resort to trickery or cunning and their crimes usually involve a violation of trust or confidence

white-collar criminals

Threats to AIS include

•Natural and political disasters •Software errors and equipment malfunctions •Unintentional acts •Intentional acts

The auditor's responsibility of SAS No. 99 includes

•Understand fraud •Discuss the risks of material fraudulent misstatements •Obtain information •Identify, assess, and respond to risks •Evaluate the results of their audit tests •Document and communicate findings •Incorporate a technology focus

Guidelines for creating a DFD include

•Understand the system that you are trying to represent. •A DFD is a simple representation, meaning that you need to consider what is relevant and what needs to be included. •Start with a high level (context diagram) to show how data flows between outside entities and inside the system. Use additional DFDs at the detailed level to show how data flows within the system. •Identify and group all the basic elements of the DFD. •Name data elements with descriptive names; use action verbs for processes (e.g., update, edit, prepare, validate). •Give each process a sequential number to help the reader navigate from the abstract to the detailed levels. •Edit/review/refine your DFD to make it easy to read and understand.

Guidelines for Drawing Flowcharts include

•Understand the system you are trying to represent. •Identify business processes, documents, data flows, and data processing procedures. •Organize the flowchart so that it reads from top to bottom and left to right. •Clearly label all symbols. •Use page connectors (if it cannot fit on a single page). •Draw a rough sketch of the flowchart. •Edit/review/refine to make it easy to read and understand. •Draw a final copy of the flowchart.

There are only 7 unique job positions at a company but 9 different positions are attributed to employees

Violation of validity

Which of the following techniques is most likely to discover an error where a data analyst did not correctly parse data from one field into two fields?

Visual Inspection

Process of examining data using human vision to see if there are problems.

Visual inspection

Correct; free of error; accurately represents events and activities

Accuracy

possible with a deeper understanding of the content of data.

Advanced testing techniques

Suzette sends Jimmy a flat file with a list of all sales transactions the company made during the last year. Each line contains all the information about a single sale. Jimmy prepares a report that shows three different views of the data (1) the total sales for each quarter, (2) the total sales by customer, and (3) the total sales for the entire year. To make this report, Jimmy had to do which of the following to the data Suzette sent?

Aggregate the data
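As a rough illustration of the three views in Jimmy's report, the sketch below aggregates a small flat list of sales. The rows, customer names, and variable names are made up for the example and are not from the course material.

```python
from collections import defaultdict

# Hypothetical flat file: one row per sale (date, customer, amount).
sales = [
    ("2023-01-15", "Acme", 100.0),
    ("2023-02-03", "Birch", 250.0),
    ("2023-05-20", "Acme", 75.0),
    ("2023-11-30", "Birch", 125.0),
]

by_quarter = defaultdict(float)
by_customer = defaultdict(float)
for date, customer, amount in sales:
    month = int(date[5:7])
    quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, and so on
    by_quarter[f"Q{quarter}"] += amount   # view 1: total sales per quarter
    by_customer[customer] += amount       # view 2: total sales per customer

# View 3: total sales for the entire year.
total_for_year = sum(amount for _, _, amount in sales)
```

Each view collapses many detail rows into fewer summary values, which is why the individual sales can no longer be recovered from the report alone.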

Comparison uses what two types of visualizations

Bar chart, bullet graph

Does not omit aspects of events or activities; of enough breadth and depth

Completeness

Tests a hypothesis and provides statistical measures of the likelihood that the evidence (data) refutes or supports a hypothesis.

Confirmatory data analysis

Presented in same format over time

Consistency

Data items that have no meaning without understanding a coding scheme.

Cryptic data values

Joleen queried the company database and returned 23 columns of information for her report. In examining the data, she noticed that one column only had values half of the time. Joleen decided to delete this column from her report. This is an example of which of the following?

Data filtering

Among the following statements, which is likely to be detected using visual inspection? The setting: a company extracts data from one system, transforms the data into a new format, and then loads it into a new system. The visual inspection validation tests are performed on a portion of the data in the new setting.

Data from two fields was not concatenated into one field during the transformation process.

process of analyzing data to make certain the data has the properties of high-quality data

Data validation

When a field contains only two different responses, typically 0 or 1, this field is called

Dummy/dichotomous variable

Distribution uses what two types of visualizations

Histogram, boxplot

When data is aggregated, some of the detailed information is lost. Which of the following is needed if you want to show both the aggregated and disaggregated data together?

Joining the aggregated data with the disaggregated data
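One way to picture this, as a minimal sketch with made-up rows and field names: compute the totals, then join them back onto every detail row so both levels appear side by side.

```python
# Hypothetical detail rows, one per sale.
rows = [
    {"customer": "Acme", "amount": 100.0},
    {"customer": "Birch", "amount": 250.0},
    {"customer": "Acme", "amount": 75.0},
]

# Step 1: aggregate (total per customer).
totals = {}
for row in rows:
    totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]

# Step 2: join the aggregate back to each detail row on the customer key.
joined = [{**row, "customer_total": totals[row["customer"]]} for row in rows]
```

Every row in `joined` keeps its own `amount` and also carries the `customer_total` for its customer.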

answers the question "what should be done?"

Prescriptive

In what different forms can data be presented?

Static graphics, tables, videos, static and dynamic models

Provided in time for decision makers to make decisions

Timely

Incorrect rejection of a true null hypothesis.

Type I error

Failure to reject a false null hypothesis.

Type II error

Data measures what it is intended to measure; conforms to syntax rules and to requirements

Validity

Billy-Bob Barker bakes big, beautiful brownies. However, Billy-Bob notices that the recipe he printed from the company database correctly states that the recipe needs flour but incorrectly lists the approved flour company and instead lists the approved salt vendor. This is an example of which of the following?

Violated Attribute dependency

Errors that occur when a secondary attribute in a row of data does not match the primary attribute.

Violated attribute dependencies
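A check for this kind of error could look like the sketch below, loosely modeled on the brownie recipe card. The vendor names and the lookup table are hypothetical; the idea is only that the secondary attribute (vendor) must agree with the primary attribute (ingredient).

```python
# Hypothetical reference table: ingredient -> approved vendor.
approved_vendor = {"flour": "Mill Co", "salt": "Salt Co"}

# Hypothetical recipe rows as (ingredient, vendor listed in the database).
recipe_rows = [
    ("flour", "Salt Co"),  # wrong: lists the salt vendor for flour
    ("salt", "Salt Co"),
]

# A row violates the dependency when its vendor does not match its ingredient.
violations = [
    (ingredient, vendor)
    for ingredient, vendor in recipe_rows
    if approved_vendor[ingredient] != vendor
]
```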

A sale occurred on December 27 but is recorded as occurring the following year on January 4.

Violation of accuracy

An annual evaluation of vendor performance only contains 7 months of data

Violation of completeness

A company switches the denomination of amounts regularly (thousands to millions)

Violation of consistency

Customer purchasing metrics are 2 years old

Violation of timeliness

As part of the data standardization process, items contained in different fields for the same record often need to be combined into a single field. This process is called:

data concatenation
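A minimal sketch of concatenation, using made-up field names: two fields from the same record are combined into one new field.

```python
# Hypothetical record with the name split across two fields.
record = {"first_name": "Ana", "last_name": "Lopez"}

# Concatenate the two fields into a single field, with a space as separator.
record["full_name"] = record["first_name"] + " " + record["last_name"]

# Concatenating without a separator can make the result ambiguous:
# two dummy fields "1" and "0" become the single value "10".
flag_a, flag_b = "1", "0"
merged_flags = flag_a + flag_b
```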

A data point, or a few data points, that lie an abnormal distance from other values in the data

outlier

A proposed explanation worded as a statement of inequality, meaning that one of the two concepts, ideas, or groups will be greater or less than the other concept, idea, or group

Alternative hypothesis

Using which of the following data validation techniques can the validator estimate a likely error rate in the population of data?

Audit Of A Sample

One of the best techniques for assuring data quality

Audit a sample

select a sample of data items from the original data sources and make sure all those items are listed in the final dataset.

Audit a sample
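The sampling idea can be sketched as follows. The record IDs, the sample size of 50, and the simulated loss of every 25th record are all assumptions made for illustration.

```python
import random

# Hypothetical source system: 1,000 record IDs.
source_ids = list(range(1000))

# Hypothetical final dataset in which every 25th record was dropped.
final_ids = {i for i in source_ids if i % 25 != 0}

# Draw a random sample from the ORIGINAL source (seeded for repeatability).
rng = random.Random(42)
sample = rng.sample(source_ids, 50)

# Check that each sampled item is listed in the final dataset.
missing = [i for i in sample if i not in final_ids]

# The share of sampled items that failed estimates the population error rate.
estimated_error_rate = len(missing) / len(sample)
```

Because only a sample is checked, `estimated_error_rate` is an estimate and will vary with the sample drawn.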

if the field captures data about whether a vendor is a preferred vendor or not, the value of 1 would suggest they are a preferred vendor and 0 that they are not. With dummy variables, best practice is to give them a meaningful name rather than a generic name.

Example of dichotomous variable

Data that is inconsistent, inaccurate, or incomplete.

Dirty data

Aggregating data, data joining, and data pivoting are examples of which of the following?

Examples of data structuring

Process of estimating a value that is beyond the data used to create the model.

Extrapolation beyond the range of data

Which of the following techniques is most likely to discover a very large data threshold violation in a dataset containing 10 billion transactions?

Basic Statistical Test

Performed to validate the data

Basic statistical tests

A construction company classifies their projects into one of seven different types. To keep track of project classification, the clerk enters a number from 1 to 7 in the ProjectType field. The values 1 to 7 are best described as ________.

Cryptic data values

the process of analyzing data and removing two or more records that contain identical information.

Data de-duplication
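A simple version of de-duplication, sketched with made-up customer records: keep the first copy of each record and drop any later record that is identical in every field.

```python
# Hypothetical records as (customer_id, name, email).
records = [
    ("C001", "Ana Lopez", "ana@example.com"),
    ("C002", "Ben Ortiz", "ben@example.com"),
    ("C001", "Ana Lopez", "ana@example.com"),  # exact duplicate of the first row
]

seen = set()
unique_records = []
for rec in records:
    if rec not in seen:      # first time this exact record appears
        seen.add(rec)
        unique_records.append(rec)
```

This only catches exact matches. Records that differ in one field, such as a different spelling of the name, would need a fuzzier comparison.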

The process of replacing a null or missing value with a substituted value. This process only works with numeric data.

Data imputation
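Mean imputation is one common choice of substituted value. The sketch below assumes made-up credit scores with `None` marking missing values; other substitutes (median, a model-based estimate) are also possible.

```python
from statistics import mean

# Hypothetical credit scores; None marks a missing value.
scores = [700, None, 650, 720, None]

# Compute the mean from the observed values only.
observed = [s for s in scores if s is not None]
fill_value = mean(observed)

# Replace each missing value with the mean of the observed values.
imputed = [fill_value if s is None else s for s in scores]
```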

The process of combining different data sources

Data joining

When a model is designed to fit training data very well but does not predict well when applied to other datasets.

Data overfitting

Involves separating data from a single field into multiple fields.

Data parsing
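Parsing is the reverse of concatenation. As a small sketch, assuming a made-up address field in the form "City, ST ZIP":

```python
# Hypothetical single field holding city, state, and ZIP code together.
city_state_zip = "Blacksburg, VA 24061"

# Split on the first ", " to separate the city from the rest.
city, rest = city_state_zip.split(", ", 1)

# Split the remainder on the first space to separate state and ZIP code.
state, zip_code = rest.split(" ", 1)
```

Real data rarely follows one pattern perfectly, so parsed output is a good candidate for visual inspection afterward.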

A technique that rotates data from rows to columns

Data pivoting
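A minimal sketch of pivoting, with made-up data: each quarter starts out as its own row and ends up as its own column under the customer.

```python
# Hypothetical "long" data: one row per (customer, quarter, amount).
long_rows = [
    ("Acme", "Q1", 100.0),
    ("Acme", "Q2", 75.0),
    ("Birch", "Q1", 250.0),
]

# Pivot: one entry per customer, with quarters rotated into columns.
pivot = {}
for customer, quarter, amount in long_rows:
    columns = pivot.setdefault(customer, {})
    columns[quarter] = columns.get(quarter, 0.0) + amount
```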

The process of standardizing the structure and meaning of each data element so it can be analyzed and used in decision making.

Data standardization

The process of changing the organization and relationships among data fields to prepare the data for analysis.

Data structuring

data errors that occur when a data value falls outside an allowable level.

Data threshold violations
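A basic range test is often enough to surface these. The sketch below reuses the credit-limit scenario; the customer IDs and the allowable maximum of 50,000 are assumptions for illustration.

```python
# Hypothetical credit limits by customer ID.
credit_limits = {"A": 5_000, "B": 10_000, "C": 1_000_000_000}

# Hypothetical allowable range for a credit limit.
MIN_ALLOWED = 0
MAX_ALLOWED = 50_000

# Basic statistics such as the maximum reveal an out-of-range value quickly.
largest = max(credit_limits.values())

# List every value that falls outside the allowable range.
violations = {
    cust: limit
    for cust, limit in credit_limits.items()
    if not MIN_ALLOWED <= limit <= MAX_ALLOWED
}
```

Because the check is a single pass over the values, it scales to very large datasets where visual inspection is impractical.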

In which step of the data transformation process would you analyze whether the data has the properties of high-quality data?

Data validation

Computations that address the basic question of "what happened?"

Descriptive

Goes a step further than diagnostic analytics to answer the question "what is likely to happen in the future?"

Predictive

The benefits of visualizing data relative to reading are

•Processed faster than written or tabular information •Easier to use: users need less guidance to find information with visualized data •Supports the dominant learning style, because most people are visual learners

A subset of data not used for the development of a model but used to test how well the model predicts the target outcome

Test dataset

Counting the number of total records or the number of distinct records present before and after a data transformation. It is also possible to compute the length of each value and compare it to the pre-transformation length to see if there are changes.

Text fields in basic statistical tests
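These checks might be sketched as follows, assuming a made-up transformation that only upper-cases a company name field, so counts and lengths should be unchanged.

```python
# Hypothetical text field before and after a transformation.
before = ["Acme Corp", "Birch LLC", "Acme Corp"]
after = ["ACME CORP", "BIRCH LLC", "ACME CORP"]

# Total record count should be the same on both sides.
counts_match = len(before) == len(after)

# Distinct record count should be the same on both sides.
distinct_match = len(set(before)) == len(set(after))

# The length of each value should be unchanged by this transformation.
lengths_match = [len(v) for v in before] == [len(v) for v in after]
```

A `False` on any of the three flags would point to dropped, duplicated, truncated, or padded values.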

Julie knows that her report printout should have only two columns of information from the database and that each column should have a dummy variable in it. The report she receives from IT has a single column, and some examples of the values in the column are "10", "01", "11", and "00". What does Julie surmise is the likely problem?

The IT department improperly concatenated the data.

What are the five main purposes of visualization

comparison, correlation, distribution, trend evaluation, and part-to-whole.

Adi queries the company database to return all values from the field "FullAddress." Adi reviews the information and finds that half of the time the values store the city and country values before the street address and the other half of the time the street address is listed before the city and country. What type of error did Adi find in the database?

data consistency error

Approach that explores data without testing formal models or hypotheses.

Exploratory data analysis

