test 1

volume refers to

size

the purpose of loading data is to

put the data into the appropriate tool for analysis

most immediate/significant effect of database technology on accounting

quicker access and greater use of accounting info for decision making

when obtaining data yourself, you should do all of the following except

identify any errors or issues from the extraction

steps of data reduction

identify attributes to reduce or focus on, filter results, interpret results, follow up on results

examples of clustering

identify groups of transactions that may indicate risk or fraud

step 4: address and refine results

identify issues with the analyses and other possible issues, then refine the model

data analytics

the process of evaluating data with the intent of drawing conclusions to address business questions

which approach to data analytics would you use to identify fraud or transactions that might warrant additional investigation

profiling

5 most frequently used approaches

profiling, data reduction, regression, classification and clustering

ADS (audit data standards) aim

provide a guide to standardize audit data requests and the format in which data from the company is provided to the auditor; provide an opportunity for standardization

the IMPACT cycle is _________ in nature and suggests that _____

recursive; as questions are addressed, new important questions may emerge that can be addressed in a similar way

vlookup: col_index_num

refers to the column in selected table_array that contains data you wish to view; indicates what you want the function to return

Given a balance of total A/R held by a firm, what is the appropriate level of allowance for doubtful accounts for bad debts? is an example of which data analytics approach

regression

structured data should be stored in a normalized

relational database

type of database you are most likely to come across when extracting and using financial data

relational database

relational databases and enforcing business rules

relational databases can be designed to aid in the placement and enforcement of internal controls and business rules in ways that flat files cannot

foreign key creates the ____________ between tables

relationship

link prediction key word

relationship/related

cleaning the data

remove headings or subtotals, clean leading zeros and nonprintable characters, format negative numbers, correct inconsistencies

how to clean the data

remove headings or subtotals, clean leading zeros and nonprintable characters, format negative numbers, correct inconsistencies
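
The cleaning steps above can be sketched in a few small helpers. This is only an illustration: the function names are hypothetical, and it assumes negatives arrive in accounting style, wrapped in parentheses.

```python
import re


def clean_amount(raw: str) -> float:
    """Convert an accounting-style amount string to a float.

    Handles a currency symbol, thousands separators, and negatives
    written in parentheses, e.g. "(1,234.50)" becomes -1234.5.
    """
    text = raw.strip()
    negative = text.startswith("(") and text.endswith(")")
    # Keep only digits, the decimal point, and minus signs.
    digits = re.sub(r"[^0-9.\-]", "", text)
    value = float(digits)
    return -abs(value) if negative else value


def clean_text(raw: str) -> str:
    """Drop nonprintable characters and surrounding whitespace."""
    return "".join(ch for ch in raw if ch.isprintable()).strip()


def strip_leading_zeros(code: str) -> str:
    """Remove leading zeros from an identifier, keeping a lone "0"."""
    return code.lstrip("0") or "0"
```

Removing headings and subtotals depends on how the export marks those rows, so it is left out of this sketch.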

pruning

removes branches from a decision tree to avoid overfitting the model

data visualization and data reporting

report results of analysis in an accessible way to each varied decision maker and their specific needs

data analytics provides a way

to search through large (un)structured data to identify unknown relationships or patterns

best describes the purpose of a relational database

to support business processes across the organization

VLookup

tool for looking up data from two separate tables and matching them based on a primary/foreign key relationship
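
For readers who think in code rather than spreadsheets, an exact-match VLOOKUP behaves roughly like a dictionary lookup on the primary key. The sketch below is an analogy, not Excel's implementation; the tables and the `vlookup` helper are made up for illustration.

```python
# Hypothetical customer table keyed by its primary key (customer_id).
customers = {
    "C001": ("Acme Corp", "Tulsa"),
    "C002": ("Birch LLC", "Omaha"),
}

# Hypothetical invoice rows; the second field is the foreign key.
invoices = [
    ("INV-1", "C002", 450.0),
    ("INV-2", "C001", 120.0),
    ("INV-3", "C999", 75.0),
]


def vlookup(lookup_value, table, col_index_num):
    """Exact-match lookup, similar to range_lookup = FALSE.

    col_index_num is 1-based and counts the key column as column 1,
    mirroring the spreadsheet convention. Returns None when the key
    is missing (a spreadsheet would show #N/A instead).
    """
    if lookup_value not in table:
        return None
    row = (lookup_value,) + table[lookup_value]
    return row[col_index_num - 1]


# Attach the customer name (column 2) to every invoice.
matched = [(inv_id, vlookup(cust_id, customers, 2)) for inv_id, cust_id, _ in invoices]
```

Here the foreign key on each invoice plays the role of lookup_value, the customer table plays table_array, and the number 2 plays col_index_num.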

a digital dashboard would be used during which step of the impact cycle

track outcomes

co-occurrence grouping key word

transactions

support vector machines have ___ decision boundaries

two

two linked tables do not necessarily have

two foreign keys

profiling key word

typical

the simpler the model, the greater the chances of

underfitting the model

step 1: identify the questions

understand the business problems that need to be addressed

every column in a table must be

unique and relevant to the purpose of the table

unsupervised approach

used for data exploration looking for potential patterns

decision trees

used to divide data into smaller groups

data reduction is used to

filter results

step 6: track outcomes

follow up on the results of the analysis

link rows in one table to rows in another table

foreign key

vlookup: lookup_value

foreign key you wish to look up; single cell reference

velocity refers to

frequency

profiling relies on

gathering summary statistics and identifying outliers

visual example of classification

graph with different symbols and colors

visual example of clustering

graph with dots separated by color

clustering is used to identify

groups of similar data elements and the underlying drivers of those groups

three types of columns

primary keys, foreign keys and descriptive attributes

logical data model

abstract representation of a database's contents

each process-specific schema is a piece of

a greater whole, combining to form one integrated database

classification key words

classes, categories

SQL allows you to

extract only a portion of the data

supervised data mining

used when you are trying to predict a future outcome based on historical data; a form of data mining in which data miners develop a model prior to the analysis and apply statistical techniques to data to estimate values of the parameters of the model

linear classifiers

useful for ranking items rather than simply predicting the class probability

the purpose of transforming the data is to

validate for completeness and integrity

3 Vs of big data

volume, velocity and variety

join clause

way to extract data from more than one table in SQL

for each attribute, we learn:

what kind of key it is, what data is required, what data can be stored in it, how much data is stored

post-pruning occurs

when model is completed; evaluates completed model and discards branches after the fact

unsupervised data mining

when you don't have a specific question; a form of data mining where the analysts do not create a model or hypothesis before running the analysis. instead, they apply the data mining technique to the data and observe the results. with this method, analysts create hypotheses after the analysis to explain the patterns found.

physical views of database

where the data is physically arranged and sorted

In most cases, you need to know

which tables and attributes contain the relevant data

can a foreign key be null

yes
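
A small experiment can make this concrete. The sketch below uses Python's built-in sqlite3 module with made-up `employee` and `sale` tables; other database systems may behave differently, and SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        employee_id INTEGER REFERENCES employee(employee_id),
        amount REAL
    );
    INSERT INTO employee VALUES (1, 'Dana');
""")

# A NULL foreign key is accepted: the sale is simply not linked to anyone.
conn.execute("INSERT INTO sale VALUES (100, NULL, 25.0)")

# A non-NULL foreign key must point at an existing primary key.
try:
    conn.execute("INSERT INTO sale VALUES (101, 99, 40.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

sale_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
conn.close()
```

Only the row with the NULL foreign key is stored; the row pointing at a nonexistent employee is rejected.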

visual example of profiling

z score

order of SQL code to create a query

1) SELECT * 2) FROM (table) 3) INNER JOIN (other table) 4) ON (the fields the tables have in common)
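
The clause order can be tried out with Python's built-in sqlite3 module. The `supplier` and `purchase` tables and their columns are hypothetical, invented only to give the query something to join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        supplier_id INTEGER REFERENCES supplier(supplier_id),
        amount REAL
    );
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Birch');
    INSERT INTO purchase VALUES (10, 1, 250.0), (11, 2, 90.0), (12, 1, 40.0);
""")

# SELECT the columns, FROM the first table, INNER JOIN the second,
# ON the primary key / foreign key pair the tables share.
rows = conn.execute("""
    SELECT purchase.purchase_id, supplier.name, purchase.amount
    FROM purchase
    INNER JOIN supplier
        ON purchase.supplier_id = supplier.supplier_id
    ORDER BY purchase.purchase_id
""").fetchall()
conn.close()
```

Each purchase row comes back with the supplier name attached, which is the same result a VLOOKUP on supplier_id would give in a spreadsheet.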

5 steps of ETL

1) determine purpose/scope of data request 2) obtain the data 3) validate data for completeness and integrity 4) clean the data 5) load the data

Extraction

1) determine the purpose and scope of the data request 2) obtain the data

skills that analytic-minded accountant should possess

1) develop an analytic mindset 2) data scrubbing and data preparation 3) data quality 4) descriptive data analysis 5) data analysis through data manipulation 6) define and address problems through statistical analysis 7) data visualization and data reporting

5 steps of requesting data

1. determine purpose and scope of data request 2. obtain data 3. validate data 4. clean the data 5. load the data

a transaction with a z score of ___ or above would represent abnormal transactions

3
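
As a rough illustration of that cutoff, the sketch below computes z scores with Python's statistics module and flags values at or beyond 3. Two assumptions go beyond the card: it uses the population standard deviation, and it treats large negative z scores as abnormal too. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_abnormal(values, threshold=3.0):
    """Return the values whose absolute z score meets the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) >= threshold]
```

For example, `flag_abnormal([100] * 19 + [1000])` singles out the 1000 transaction, whose z score is about 4.4. With very few observations no value can reach a z score of 3, so the cutoff only makes sense on reasonably large data sets.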

composite keys

A composite key is a combination of two or more foreign keys in a table to create a Primary Key; the use of more than one column of data to uniquely identify each row in a relational database table.

IMPACT Cycle

Identify questions, Master the data, Perform the test, Address and refine results, Communicate insights, Track outcomes

an example of classification

Of all the loans a bank has offered, which are most likely to default? Which loan applications are expected to be approved?

if you have direct access to a data warehouse, you can use

SQL and other tools to pull the data yourself

SQL example in excel

VLookup

questions for determining the purpose and scope of the data request

What is the purpose of the data request? What do you need the data to solve? What business problems will it address? What risks exist in data integrity? What is the mitigation plan? What other information will impact the nature, timing and extent of the analysis?

similarity matching

an attempt to identify similar individuals based on data known about them

foreign keys

a column or group of columns used to represent relationships. values of the foreign key are attributes that point to a primary key in another table

The primary key in the second table is

a combination of primary keys in the first table

the primary key in the first table is

a foreign key in the second table

use SQL to

combine data from one or more tables and organize it in a way that is more intuitive than how it is stored in a relational database

integration

combining databases

clustering

an attempt to divide individuals into groups (clusters) in a useful or meaningful way; identifying groups of similar data elements and the underlying drivers of those groups

regression

an attempt to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model; predict specific values

class examples

accept/reject, fraud/not fraud

descriptive attributes provide

actual business information

asking colleagues what they think of the analysis would be considered part of which stage of the IMPACT cycle

address and refine results

slicing/dicing the data, finding correlations, revising and rerunning the analysis are part of which stage of impact cycle

address and refine results

classification

an attempt to assign each unit (or individual) we know very little about in a population into a few categories

profiling

an attempt to characterize the "typical" behavior of an individual, group or population by generating summary statistics about the data (mean, standard deviations, etc) so that we can more easily identify abnormal behavior (anomalies)

co-occurrence grouping

an attempt to discover associations between individuals based on transactions involving them

link prediction

an attempt to predict relationships between two data items; social media

target

an expected attribute or value that we want to evaluate

examples of profiling

analyzing travel and entertainment expenses; comparing variances from target ranges; Benford's law

accountants should be able to

articulate business problems, communicate with data scientists, draw conclusions, present results, develop an analytical mindset

profiling is typically used to

assess data quality and internal controls

test data is used to

assess the degree and strength of a relationship

data reduction

attempts to reduce the amount of information that needs to be considered to focus on the most critical items by taking a large set of data and replacing it with a smaller set that has the vast majority of the critical information from the larger set

step 5: communicate insights

communicate effectively using clear language and visualizations

after revising and rerunning the analysis, what comes next in the IMPACT cycle

communicate insights

one of the biggest differences between a flat file and a relational database is

how many tables there are; relational databases have multiple tables

validating data for completeness and integrity

compare the number of records and descriptive statistics for numeric fields; validate date and time fields; compare string limits

logical view of database

how the data is conceptually organized/understood

how to ensure data is valid for completeness and integrity

compare the number of records, compare descriptive stats, validate date/time fields, compare string limits for text fields
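
These checks translate fairly directly into code. The sketch below is one possible shape, assuming each record is a dict with hypothetical `amount`, `invoice_date` (ISO format) and `vendor` fields; real extracts will have different fields and limits.

```python
from datetime import date


def validate_load(source_rows, loaded_rows, max_name_length=50):
    """Return a dict of named checks; True means the check passed."""

    def total(rows):
        # A control total: one simple descriptive statistic to compare.
        return round(sum(r["amount"] for r in rows), 2)

    def dates_ok(rows):
        try:
            for r in rows:
                date.fromisoformat(r["invoice_date"])
        except ValueError:
            return False
        return True

    return {
        "record_count": len(source_rows) == len(loaded_rows),
        "amount_total": total(source_rows) == total(loaded_rows),
        "dates_valid": dates_ok(loaded_rows),
        "text_within_limit": all(
            len(r["vendor"]) <= max_name_length for r in loaded_rows
        ),
    }
```

A failed record count or control total suggests rows were dropped or duplicated during extraction, while a failed date or text check points to formatting or truncation problems.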

when evaluating classifiers, need to strike a balance between

complexity of the model and accuracy of the classification

how does data analytics affect financial reporting

better estimates of collectability and write downs, better understanding of business environment through social media, identifying risks and opportunities through analysis of internet searches

use of data warehouse in decision making

business intelligence

how do users retrieve data stored in a database

by executing a query

profiling regarding T&E expenses, which is not one of the areas that the analyst would try to uncover A) lack of controls B) change in procedures C) significant variances in standard cost D) individuals more willing to spend excessively

c

use a flowchart to

identify an appropriate approach

the purpose of extracting data is to

identify and obtain data from the appropriate source

the goal of ETL is to

identify and obtain the data needed for solving a problem

linear discriminants use mathematical equations to

draw the line that separates the two classes

pre-pruning occurs

during model generation

unsupervised approaches

clustering, profiling, co-occurrence grouping, data reduction

attempting to sell additional items by suggesting "customers who bought this also liked..." or "frequently bought together" is an example of which approach to data analytics

co-occurrence grouping
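
A bare-bones version of the "frequently bought together" idea is to count how often each pair of items appears in the same transaction. This is only a sketch with a hypothetical `pair_counts` helper; production recommenders use more elaborate measures.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions.
baskets = [
    ["printer", "ink", "paper"],
    ["ink", "paper"],
    ["printer", "ink"],
]
counts = pair_counts(baskets)
```

Pairs with high counts, such as ink and paper here, are candidates for a "customers who bought this also liked" suggestion.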

relational databases and redundancy

each element of data is stored in only one place

clustering algorithms

calculate the minimum distance between observations and group the closest elements together
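
One common way to group by minimum distance is k-means, sketched below for one-dimensional values. The card does not name a specific algorithm, so k-means is an assumption, as are the function name and the starting centroids.

```python
def k_means_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Minimum distance decides which group the value joins.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```

Calling `k_means_1d([10, 12, 11, 95, 100, 105], [0, 50])` separates the small amounts from the large ones, with group centers settling at 11 and 100.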

data dictionary

centralized repository of descriptions for all of the data attributes of a data set; contains information about the structure of the database

Which transactions should a credit card company flag as potentially fraudulent and deny payment on? is an example of which data analytics approach

classification

skill not emphasized that analytic-minded accountants should have

classification of test approaches; data and systems analysis and design

supervised approaches

classification, regression, similarity matching, link prediction, causal modeling

segmenting a customer into a small number of groups for additional analysis and marketing activities is an example of which approach to data analytics

clustering

vlookup: range_lookup

either FALSE or TRUE; false indicates you want an exact match

how does data analytics affect auditing

enhance audit quality, expand services, add value to clients and allow auditors to stay engaged beyond the audit

in a well structured relational database

every table should be related to at least one other table; every column in a row must be single valued

first assumption in normalization approach

everything initially stored in one large table

training data

existing data that has been manually evaluated and assigned to a class

test data

existing data used to evaluate the model; data that exists (for example, in a database) before a test is executed, and that affects or is affected by the component or system under test.

T/F: a data dictionary will be more robust and have more attributes to keep track of for a dataset stored as a flat file

false

data reduction key word

filter

______________ is the metadata that describes each attribute in a database.

data dictionary

storing data in a normalized relational database ensures that

data is complete, not redundant and that business rules are enforced; aids in communication and integration across business processes

not a benefit of using a normalized relational database

data is stored in one place

asking accountant to identify customers who might be candidates

data mining

profiling is used to assess

data quality and internal controls

when a manager wants to gather info about employees, use which language

data query language

data analytics that suggests new ways to highlight which transactions do not need the same level of vetting as the other transactions is an example of which approach to data analytics

data reduction

structured data

data that is organized and resides in a fixed field within a record or file

big data refers to

datasets that are too large and complex to be analyzed traditionally

not a benefit of database approach

decentralized management of data

linear classifiers identify

decision boundaries; ranks

formula for regression

dependent variable = f(independent variables)
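
The simplest choice of f is a straight line fitted by ordinary least squares. The sketch below assumes one independent variable; the `fit_line` name and the sample figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: A/R balance (independent) and bad debts written off (dependent).
receivables = [100.0, 200.0, 300.0, 400.0]
write_offs = [5.0, 10.0, 15.0, 20.0]
slope, intercept = fit_line(receivables, write_offs)
```

With these made-up numbers the slope is 0.05, meaning about 5% of the receivable balance has historically gone bad, which could inform an allowance estimate.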

primary keys are rarely

descriptive

critical data but not necessary to build the data model

descriptive attributes

after you have identified the objects/activity you want to profile, what should you do next?

determine the types of profiling you want to perform

linear classifiers are useful for

determining the really important values

ETL-- what other info will impact data analysis

determining the scope/purpose of data request

variety refers to

different types

support vector machine

discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin and then to find the middle line

steps of classification

identify the classes you wish to predict, manually classify an existing set of records, select a set of classification models, divide data into training and testing sets, generate model, interpret results and select the best model
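
The "divide data into training and testing sets" step might look like the sketch below. The 75/25 split, the fixed seed and the function name are assumptions chosen for illustration.

```python
import random


def train_test_split(records, test_fraction=0.25, seed=42):
    """Shuffle the records and split them into training and testing sets."""
    shuffled = list(records)
    # A fixed seed makes the split repeatable.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]
```

The model is generated from the training set only, and the held-back testing set is then used to evaluate how well it classifies records it has not seen.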

data dictionaries help analysts

identify the data they need to use

steps of regression

identify variables that might predict an outcome, determine the functional form of the relationship, identify the parameters of the model

steps of profiling

identify what you want to profile, the type of profiling you want to perform, set boundaries/thresholds, interpret results, follow up on exceptions

the ETL process begins with

identifying what data you need

relational databases should be designed to support business processes which results in

improved communication across functional areas and more integrated business processes

independent variables

inputs; x axis

examples of targets

interest rate, fraud score

relational databases ensure that data:

is complete, is not redundant, follows business rules and internal controls, and aids communication

descriptive attribute

it is an attribute that is used to describe or record information about the 'relationship'; includes everything else

why are relational databases preferred

their ability to store data and maintain data integrity; "one version of the truth" across multiple data elements

step 2: master the data

know what data is available and how it relates to the problems

similarity matching key words

known data, similar

visual example of regression

line of best fit on graph

suggesting friends to add on social media based on mutual friends is an example of which approach to data analytics

link prediction

examples of data reduction

locating payments made to specific vendors, using XBRL to filter specific tags
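
In code, the vendor example amounts to a filter. The records, field names and `reduce_payments` helper below are hypothetical.

```python
# Hypothetical payment records.
payments = [
    {"vendor": "Acme", "amount": 1200.0},
    {"vendor": "Birch", "amount": 80.0},
    {"vendor": "Acme", "amount": 15000.0},
    {"vendor": "Cedar", "amount": 9500.0},
]


def reduce_payments(rows, vendors=None, min_amount=0.0):
    """Keep only rows for the chosen vendors at or above a minimum amount."""
    return [
        r
        for r in rows
        if (vendors is None or r["vendor"] in vendors) and r["amount"] >= min_amount
    ]


acme_large = reduce_payments(payments, vendors={"Acme"}, min_amount=5000.0)
```

The four payments shrink to the single large Acme payment, which is the item an analyst would then interpret and follow up on.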

data dictionaries help administrators

maintain databases

class

manually assigned category applied to a record based on an event

after you have identified classes you wish to predict, what is the next step

manually classify an existing set of records

decision boundaries

mark the split between one class and another

can primary keys be null

no

process of first developing a relational database and then breaking the table down into smaller tables

normalization

ETL-- where is data located in systems

obtain the data

data warehouse data storage

often fed by a variety of sources, and data is analyzed centrally

Unified Modeling Language (UML) is

one way to understand databases

the primary key is typically made of ___ column

one; but it can occasionally be made of multiple columns

problem to normalization

only having one primary key

dependent variable

output; y axis

the more complex the model, the greater the chance of

overfitting the model

not found in a data dictionary

physical location of data

the goal of classification is to

predict whether an individual we know very little about will belong to one class or another

regression key words

predict, numerical values

causal modeling

predicting an outcome by identifying its relationship with one or more other factors; independent variables cause or are associated with dependent variables

examples of regression

predicting employee turnover; determining the appropriateness of allowance accounts

a data warehouse

used primarily for analysis rather than transaction processing

structured data is readily

searchable

by 2020, 1.7 megabytes of new information will be created every

second

clustering key words

segments, similar

step 3: perform the test plan

select an appropriate model to find a target variable

Attempting to identify seller and customer fraud based on various characteristics known about them to see if they are similar to known fraud cases is an example of which data analytics approach

similarity matching

benford's law states that in many naturally occurring collections of numbers, the significant leading digit is likely to be

small
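
More precisely, Benford's law puts the expected share of leading digit d at log10(1 + 1/d), so 1 leads about 30% of the time and 9 under 5%. The sketch below compares observed shares with that expectation; the helper names are hypothetical.

```python
import math
from collections import Counter


def benford_expected(digit):
    """Expected share of values whose leading digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


def leading_digit(amount):
    """First significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit up front.
    return int(f"{abs(amount):e}"[0])


def observed_shares(amounts):
    """Share of nonzero amounts starting with each digit 1 to 9."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    counts = Counter(digits)
    total = len(digits) or 1  # avoid dividing by zero on empty input
    return {d: counts[d] / total for d in range(1, 10)}
```

A large gap between `observed_shares(...)` and `benford_expected(...)` for some digit is a profiling signal worth following up, not proof of fraud on its own.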

flat file

stores data in one place as opposed to multiple tables, such as a relational database

profiling is done primarily using

structured data; data that is readily available

SQL

structured query language; used to create, update and delete records and tables in databases, extract data, select precise attributes and records that fit criteria of analysis goal

vlookup: table_array

table that contains the corresponding primary key; always looks in the first column

consider when obtaining data

tables that contain info you need (data dictionary/relationship model), identify which attributes hold the info you need, identify how the tables relate to each other

how does data analytics affect taxes

tax strategy and planning, understanding tax consequences of international transactions/investments/M&A, better organization of tax tables and other tax data

models associated with regression and classification do not have

test data

regression allows

the accountant to develop models to predict outcomes

the ETL process ends when

the clean data is loaded into the appropriate format into the tool to be used for analysis

primary keys

the column in a database that uniquely identifies each row; unique identifiers

ETL

the extract, transform and load process that is integral to mastering the data

the first argument in a vlookup is

the foreign key

business intelligence

the practice of monitoring customers, competitors and suppliers to better understand opportunities and threats

the model you should use depends on

the questions you are trying to answer

primary and foreign keys facilitate

the structure of a relational database

joins rely on

the structure of normalized relational databases that have tables related through primary and foreign keys

when you need to extract data from more than one table in a SQL query, what do you need to identify to properly join tables

the two fields that the tables have in common

schemas do not represent

their own separate databases

