ACCT 3130 Midterm

An observation about the frequency of leading digits in many real-life sets of numerical data is called:
- leading digits hypothesis.
- Moore's law.
- Benford's law.
- clustering.

Benford's law.
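As a rough illustration of the card above, Benford's law puts the expected share of values with leading digit d at log10(1 + 1/d). The sketch below compares that expectation with observed shares. The invoice amounts and the helper names (`benford_expected`, `leading_digit`, `observed_shares`) are hypothetical, chosen only for this example.

```python
import math
from collections import Counter

def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)

def leading_digit(value: float) -> int:
    """First nonzero digit of a nonzero number, ignoring sign and scale."""
    # Scientific notation always starts with the first significant digit.
    return int(f"{abs(value):.15e}"[0])

def observed_shares(values):
    """Share of nonzero values starting with each digit 1-9."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Hypothetical invoice amounts, for illustration only.
amounts = [120.50, 1875.00, 19.99, 342.10, 1050.00, 27.45, 130.00, 910.25]
shares = observed_shares(amounts)

for d in range(1, 10):
    print(d, round(benford_expected(d), 3), round(shares[d], 3))
```

A sample this small will not follow the expected distribution closely. In practice the comparison is run on large populations, and large gaps between observed and expected shares are treated as items to investigate, not as proof of a problem.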

While accountants don't need to become data scientists, they must know how to do the following except:
- Comprehend the process needed to clean and prepare the data before analysis
- Build a data repository
- Communicate with the data scientists about specific data needs and understand the underlying quality of the data
- Clearly articulate the business problem the company is facing

Build a data repository

__________ mark the split between one class and another.
- Decision trees
- Identified questions
- Decision boundaries
- Linear classifiers

Decision boundaries

What are attributes that exist in a relational database that are neither primary nor foreign keys?
- Nondescript attributes
- Descriptive attributes
- Composite key
- Relational table attributes

Descriptive attributes

Which of these is not included in the five steps of the ETL process?
- Determine the purpose and scope of the data request.
- Obtain the data.
- Validate the data for completeness and integrity.
- Scrub the data.

Scrub the data.

Data profiling is used to assess data quality and internal controls and typically involves the following steps except:
- Identify the objects or activity you want to profile.
- Set boundaries or thresholds for the activity.
- Filter the results.
- Determine the types of profiling you want to perform.

Filter the results.

IMPACT: Track Outcomes

Follow up on the results of the analysis

IMPACT Model

1. Identify the questions
2. Master the data
3. Perform the test plan
4. Address and refine results
5. Communicate insights
6. Track outcomes

Which of the following best describes an independent variable?
- Application
- Output
- Input
- Operation

Input

Why is Supplier ID considered to be a primary key for a Supplier table?
- It contains a unique identifier for each supplier.
- It is a 10-digit number.
- It can either be for a vendor or miscellaneous provider.
- It is used to identify different supplier categories.

It contains a unique identifier for each supplier.

IMPACT: Master the Data

Know what data are available and how they relate to the problem (carried out through the ETL process)

Which of these is not included in the five steps of the ETL process?
- Learn what data is available in the data warehouse.
- Determine the purpose and scope of the data request.
- Obtain the data.
- Validate the data for completeness and integrity.

Learn what data is available in the data warehouse.

Which approach to Data Analytics attempts to predict a relationship between two data items?
- Profiling
- Classification
- Link prediction
- Regression

Link prediction

Which approach to data analytics attempts to predict a relationship between two data items?
- Similarity matching
- Classification
- Link prediction
- Co-occurrence grouping

Link prediction

The advantages of storing data in a relational database include which of the following?
A. Help in enforcing business rules
B. Increased information redundancy
C. Integrating business processes
D. All of the above
E. Only A and B
F. Only B and C
G. Only A and C

Only A and C

Which attribute is required to exist in each table of a relational database and serves as the "unique identifier" for each record in a table?
- Foreign key
- Unique identifier
- Primary key
- Key attribute

Primary key

Which approach to data analytics attempts to characterize the typical behavior of an individual, group or population by generating summary statistics about the data?
- Classification
- Profiling
- Link prediction
- Regression

Profiling

Which approach to Data Analytics attempts to identify similar individuals based on data known about them?
- Classification
- Regression
- Similarity matching
- Data reduction

Similarity matching

_________ is a discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line.
- Linear classifier
- Support vector machine
- Decision tree
- Multiple regression

Support vector machine

___________ is a set of data used to assess the degree and strength of a predicted relationship.
- Training data
- Unstructured data
- Structured data
- Test data

Test data

fuzzy matching

Locates approximate matches. Useful for identifying relationships in imperfect data.
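One way to sketch fuzzy matching is with Python's standard `difflib` module, which scores how similar two strings are. The vendor names, the typed value, and the 0.8 cutoff below are assumptions made up for this example; a real matching job would tune the cutoff to the data.

```python
from difflib import SequenceMatcher, get_close_matches

def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical vendor master list and a misspelled name from an invoice.
vendors = ["Acme Supply Co.", "Baker Industrial", "Northwind Traders"]
typed = "Acme Suply Co"

# Keep only the single best candidate scoring at least 0.8.
best = get_close_matches(typed, vendors, n=1, cutoff=0.8)
print(best)
print(round(similarity(typed, "Acme Supply Co."), 2))
```

Lowering the cutoff finds more candidate matches but also more false ones, so approximate matches are usually reviewed before they are relied on.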

decision boundaries

mark the split between one class and another

overfitting

models that fit the training data too closely are actually worse at predicting future observations. The goal is to maximize accuracy on the test data without overfitting.

Gold, silver, and bronze medals would be examples of:
- nominal data.
- ordinal data.
- structured data.
- test data.

ordinal data.

In general, the more complex the model, the greater the chance of:
- overfitting the data.
- underfitting the data.
- pruning the data.
- a more accurate prediction of the data.

overfitting the data.

How does data analytics affect financial reporting?

- better estimates of collectability
- better understand business environment
- identify risks and opportunities

Patterns discovered from ________ enable businesses to identify opportunities and risks and better plan for ________.
- past archives; today
- current data; the future
- current data; today
- past archives; the future

past archives; the future

Big Data is often described by the three Vs, or:
- volume, velocity, and variability.
- volume, velocity, and variety.
- volume, volatility, and variability.
- variability, velocity, and variety.

volume, velocity, and variety.

Accountants need to be able to:

• Articulate business problems.
• Communicate with data scientists.
• Draw appropriate conclusions.
• Present results in an accessible manner.
• Develop an analytics mindset.

Accountants need to be comfortable with:

• Data scrubbing and data preparation
• Data quality
• Descriptive data analysis
• Data analysis through data manipulation
• Define and address problems through statistical analysis
• Data visualization and data reporting

How does data analytics affect tax?

- better tax planning strategies
- understand tax consequences for international, investments, mergers, etc
- aid compliance

How does data analytics affect auditing?

- enhances audit quality
- expanded services
- added value to clients
- auditors engaged beyond audit

relational database

- ensures data are complete
- avoids redundant data
- enforces business rules
- aids communication and integration of business processes

5 Steps to Requesting data

1. Determine the purpose and scope of the data request
2. Obtain the data (yourself or through IT dept)
3. Validate the data for completeness and integrity
4. Clean the data
5. Load the data for data analysis

Ways to make sure the data are valid

1. Compare the number of records
2. Compare descriptive statistics for numeric fields
3. Validate Date/Time fields
4. Compare string limits for text fields
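The four validation checks above can be sketched as one small function. Everything specific here is an assumption for illustration: the field names (`invoice_id`, `amount`, `date`), the control totals, and the `validate_extract` helper itself are hypothetical and do not come from any particular tool.

```python
from datetime import date

def validate_extract(rows, expected_count, expected_total, date_range, max_lengths):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # 1. Compare the number of records with the count reported by the source.
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(rows)}")

    # 2. Compare a descriptive statistic (here, the total) for a numeric field.
    total = round(sum(row["amount"] for row in rows), 2)
    if total != round(expected_total, 2):
        problems.append(f"expected total {expected_total}, found {total}")

    # 3. Validate date fields against the requested period.
    start, end = date_range
    for row in rows:
        if not start <= row["date"] <= end:
            problems.append(f"{row['invoice_id']}: date {row['date']} outside requested range")

    # 4. Compare string lengths with the limits expected for text fields.
    for field, limit in max_lengths.items():
        for row in rows:
            if len(row[field]) > limit:
                problems.append(f"{row['invoice_id']}: {field} longer than {limit} characters")

    return problems

# Hypothetical extract of three invoices requested for January 2023.
rows = [
    {"invoice_id": "INV-001", "amount": 120.50, "date": date(2023, 1, 5)},
    {"invoice_id": "INV-002", "amount": 75.25, "date": date(2023, 1, 17)},
    {"invoice_id": "INV-003", "amount": 310.00, "date": date(2023, 2, 2)},
]
problems = validate_extract(
    rows,
    expected_count=3,
    expected_total=505.75,
    date_range=(date(2023, 1, 1), date(2023, 1, 31)),
    max_lengths={"invoice_id": 7},
)
print(problems)
```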

Steps of Classification

1. Identify the classes you wish to predict.
2. Manually classify an existing set of records.
3. Select a set of classification models.
4. Divide your data into training and testing sets.
5. Generate your model.
6. Interpret the results and select the "best" model.
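The classification steps above can be walked through with a deliberately tiny example. The data, the labels, and the three threshold "models" are all made up for illustration; real classification work would use a richer model and far more records.

```python
import random

# Steps 1-2: two classes ("late", "on_time") and hypothetical invoices that
# have already been labeled by hand, as (days outstanding, class).
records = [(days, "late" if days > 30 else "on_time") for days in range(0, 61, 3)]

# Step 3: candidate models. Each is a simple rule: late if days > threshold.
candidates = [30, 15, 45]

def predict(threshold, days):
    return "late" if days > threshold else "on_time"

def accuracy(threshold, data):
    """Share of records whose predicted class matches the manual label."""
    return sum(predict(threshold, days) == label for days, label in data) / len(data)

# Step 4: divide the labeled data into training and testing sets (70/30).
rng = random.Random(42)  # fixed seed so the split is repeatable
shuffled = records[:]
rng.shuffle(shuffled)
cut = int(len(shuffled) * 0.7)
train, test = shuffled[:cut], shuffled[cut:]

# Step 5: generate the model by picking the rule that fits the training data best.
best = max(candidates, key=lambda t: accuracy(t, train))

# Step 6: interpret the result using data the model has not seen.
test_accuracy = accuracy(best, test)
print(best, test_accuracy)
```

Scoring the chosen model on the test set, not the training set, is what guards against picking a model that merely memorized the training records.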

Profiling steps

1. Identify the objects or activity you want to profile.
2. Determine the types of profiling you want to perform.
3. Set boundaries or thresholds for the activity.
4. Interpret the results and monitor the activity and/or generate a list of exceptions.
5. Follow up on exceptions.
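As a sketch of the profiling steps, the example below profiles monthly expense claims per employee, uses a z-score as the type of profiling, sets a threshold of 2 standard deviations, and produces a list of exceptions. The employee IDs, amounts, threshold, and the `flag_exceptions` helper are hypothetical.

```python
from statistics import mean, pstdev

def flag_exceptions(amounts_by_id, threshold=2.0):
    """Return {id: z-score} for items more than `threshold` standard deviations from the mean."""
    values = list(amounts_by_id.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return {}  # every value is identical, so nothing stands out
    z_scores = {key: (value - mu) / sigma for key, value in amounts_by_id.items()}
    return {key: z for key, z in z_scores.items() if abs(z) > threshold}

# Hypothetical monthly expense claims by employee ID.
claims = {
    "E01": 410, "E02": 395, "E03": 405, "E04": 420,
    "E05": 390, "E06": 400, "E07": 1250,
}
exceptions = flag_exceptions(claims, threshold=2.0)
print(exceptions)
```

The exception list is only a starting point: step 5 is to follow up on each flagged item, since an unusual value may have a legitimate explanation.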

Steps of regression

1. Identify the variables that might predict an outcome.
2. Determine the functional form of the relationship.
3. Identify the parameters of the model.

Common corrections when cleaning data

1. Remove headings or subtotals
2. Clean leading zeroes and nonprintable characters
3. Format negative numbers
4. Correct inconsistencies across data, in general
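The first three corrections above might look like this in code. This is a sketch under stated assumptions: identifiers are numeric codes whose leading zeroes are padding (stripping zeroes from values such as zip codes would damage them), negative amounts arrive as `(1,234.50)` or `1,234.50-`, and subtotal rows are marked by the word "total" in the ID column. All helper names are hypothetical.

```python
def drop_subtotals(rows):
    """Remove rows whose ID column marks them as a total or subtotal line."""
    return [row for row in rows if "total" not in row["id"].lower()]

def clean_id(raw: str) -> str:
    """Strip nonprintable characters, surrounding spaces, and leading zeroes."""
    printable = "".join(ch for ch in raw if ch.isprintable()).strip()
    return printable.lstrip("0") or "0"

def parse_amount(raw: str) -> float:
    """Convert text such as '(1,234.50)' or '1,234.50-' to a signed number."""
    text = raw.strip().replace(",", "").replace("$", "")
    negative = (
        (text.startswith("(") and text.endswith(")"))
        or text.startswith("-")
        or text.endswith("-")
    )
    value = float(text.strip("()-"))
    return -value if negative else value

# Hypothetical raw export with padding, a stray tab, and a subtotal row.
raw_rows = [
    {"id": "00123\t", "amount": "1,200.00"},
    {"id": "00124", "amount": "(350.25)"},
    {"id": "Subtotal", "amount": "849.75"},
]
cleaned = [
    {"id": clean_id(row["id"]), "amount": parse_amount(row["amount"])}
    for row in drop_subtotals(raw_rows)
]
print(cleaned)
```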

Which approach to Data Analytics attempts to assign each unit in a population into a small set of classes (or groups) where the unit best fits?
- Regression
- Similarity matching
- Co-occurrence grouping
- Classification

Classification

Which approach to data analytics attempts to assign each unit in a population into a small set of classes where the unit belongs?
- Classification
- Regression
- Similarity matching
- Co-occurrence grouping

Classification

Which skills were not emphasized that analytic-minded accountants should have?
- Develop an analytics mindset
- Data scrubbing and data preparation
- Classification of test approaches
- Define and address problems through statistical data analysis

Classification of test approaches

As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction and validation?
- Remove headings and subtotals.
- Format negative numbers.
- Clean up trailing zeroes.
- Correct inconsistencies across data.

Clean up trailing zeroes.

Correcting inconsistencies across data is an example of which of the following?
- Validating the data for integrity
- Validating the data for completeness
- Cleaning the data
- Obtaining the data

Cleaning the data

Retail stores often request customers' zip codes at the end of a sales transaction. This is an example of which data approach?
- Clustering
- Similarity matching
- Classification
- Regression

Clustering

IMPACT: Communicate Insights

Communicate effectively using clear language and visualizations

Which skills were not emphasized that analytic-minded accountants should have?
- Data quality
- Descriptive data analysis
- Data visualization
- Data and systems analysis and design

Data and systems analysis and design

The metadata that describes each attribute in a database is which of the following?
- Composite primary key
- Data dictionary
- Descriptive attributes
- Flat file

Data dictionary

Which of these terms is defined as being a central repository of descriptions for all of the data attributes of the dataset?
- Big Data
- Data warehouse
- Data dictionary
- Data Analytics

Data dictionary

The objective of data extraction is:
- To validate the data for completeness and integrity
- To identify and obtain the data from the appropriate source
- To identify which approach to data analytics should be used
- To load the data into the appropriate tool for analysis

To identify and obtain the data from the appropriate source

The objective of loading data is:
- To identify and obtain the data from the appropriate source
- To validate the data for completeness and integrity
- To identify which approach to data analytics should be used
- To load the data into the appropriate tool for analysis

To load the data into the appropriate tool for analysis

Which of the following best describes the purpose of relational databases?
- To ensure that business rules are enforced
- To support business processes across the organization
- To provide business information to data analysts
- To increase information redundancy in the organization

To support business processes across the organization

Data analytics is the process of evaluating data with the purpose of drawing conclusions to address business questions. (T/F)

True

IMPACT: Identify the questions

Understand the business problems that need to be addressed

UML

Unified Modeling Language: a way of visualizing the relationships between classes in a program.

foreign key

attributes that point to a primary key in another table

composite key

a primary key formed by combining two foreign keys, typically used for line items
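The key types defined in these cards can be seen together in a small schema. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names are hypothetical. Each table has a primary key, `purchase_order.supplier_id` is a foreign key, and `po_line` uses a composite key made of two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,           -- primary key: unique identifier
    name        TEXT NOT NULL                  -- descriptive attribute
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE purchase_order (
    po_id       INTEGER PRIMARY KEY,
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),  -- foreign key
    order_date  TEXT
);
CREATE TABLE po_line (
    po_id      INTEGER NOT NULL REFERENCES purchase_order(po_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (po_id, product_id)            -- composite key for line items
);
""")

conn.execute("INSERT INTO supplier VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO product VALUES (10, 'Toner'), (11, 'Paper')")
conn.execute("INSERT INTO purchase_order VALUES (500, 1, '2023-03-01')")
conn.execute("INSERT INTO po_line VALUES (500, 10, 3)")

def violates_constraint(sql, params=()):
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False

# A purchase order pointing at a supplier that does not exist is rejected.
print(violates_constraint("INSERT INTO purchase_order VALUES (?, ?, ?)", (501, 99, "2023-03-02")))
```

This is how a relational database helps enforce business rules: the database itself refuses duplicate identifiers and references to records that do not exist.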

The IMPACT cycle includes all except the following process:
- data preparation.
- communicate insights.
- address and refine results.
- perform test plan.

data preparation.

test data

existing data set aside and used to evaluate the model

Summary Statistics

describe a set of data in terms of their location (mean, median), range (standard deviation, minimum, maximum), shape (quartile), and size (count).
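The measures named in this definition are all available in Python's standard `statistics` module. The sketch below computes them for a made-up list of daily sales counts; the data and the `summarize` helper are hypothetical.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Location, range, shape, and size measures for a list of numbers."""
    q1, q2, q3 = quantiles(values, n=4)  # cut points that split the data into quarters
    return {
        "count": len(values),             # size
        "mean": mean(values),             # location
        "median": median(values),         # location
        "std_dev": stdev(values),         # spread (sample standard deviation)
        "min": min(values),
        "max": max(values),
        "quartiles": (q1, q2, q3),        # shape
    }

# Hypothetical daily sales counts for one week.
daily_sales = [12, 15, 11, 19, 15, 22, 14]
summary = summarize(daily_sales)

for name, value in summary.items():
    print(name, value)
```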

Variety

the different types of data

Support vector machine

discriminating classifier that is defined by a separating hyperplane that works first to find the widest margin (or biggest pipe) and then works to find the middle line.

Regression

estimates or predicts the numerical value of a dependent variable based on the slope and intercept of a line and the value of an independent variable.
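For a single independent variable, the slope and intercept can be estimated with ordinary least squares, which is one common way to fit the line. The sketch below does this by hand so the arithmetic is visible; the advertising and sales figures and the `fit_line` helper are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (independent) and sales (dependent),
# both in thousands of dollars.
ad_spend = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, sales)
predicted = intercept + slope * 6  # predicted sales at a spend of 6
print(slope, intercept, predicted)
```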

training data

existing data that have been manually evaluated and assigned a class

Mastering the data can also be described via the ETL process. The ETL process stands for:
- extract, total, and load data.
- enter, transform, and load data.
- extract, transform, and load data.
- enter, total, and load data.

extract, transform, and load data.

ETL

extraction, transformation, and loading

Velocity

the speed or frequency at which data are generated

Co-occurrence grouping

discovers associations between individuals based on common events, such as transactions they are involved in.
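A very simple form of co-occurrence grouping counts how often two items appear in the same transaction. The sketch below does that with the standard library; the transactions and the `count_pairs` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_pairs(transactions):
    """Count how many transactions contain each pair of items."""
    pair_counts = Counter()
    for items in transactions:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical sales transactions, each a set of items bought together.
transactions = [
    {"printer", "ink", "paper"},
    {"ink", "paper"},
    {"printer", "ink"},
    {"paper", "stapler"},
]
pair_counts = count_pairs(transactions)

for pair, count in pair_counts.most_common():
    print(pair, count)
```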

Similarity matching

grouping technique used to identify similar individuals based on data known about them.

With a goal to give organizations the information they need to make sound and timely business decisions, data analytics often involves all of the following except:
- technologies.
- growth.
- databases.
- statistics.

growth.

profiling

identifies the "typical" behavior of an individual, group, or population by compiling summary statistics about the data (including mean, standard deviations, etc.) and comparing individuals to the population. Typically applied to structured data. ex: z-score

Which of the following describes part of the goal of the ETL process:
- identify which approach to data analytics should be used.
- load the data into a relational database for storage.
- communicate the results and insights found through the analysis.
- identify and obtain the data needed for solving the problem.

identify and obtain the data needed for solving the problem.

clustering

identify groups (or clusters) of individuals (such as customers) that share common underlying characteristics—in other words, identifying groups of similar data elements and the underlying drivers of those groups.

IMPACT: Address and refine results

identify issues with the analyses and refine the model
- ask further questions
- explore data
- rerun analyses

Descriptive attributes

include everything else: the attributes that are neither primary nor foreign keys

Machine learning and artificial intelligence

learning models or intelligent agents that adapt to new external data to recommend a course of action.

Which of the following best describes the goal of descriptive data analysis:
- demonstrate ability to sort, rearrange, merge, and reconfigure data in a manner that allows enhanced analysis
- comprehend the process needed to clean and prepare the data before analysis
- recognize what is meant by data quality, be it completeness, reliability or validity
- perform basic analysis to understand the quality of the underlying data and its ability to address the business question

perform basic analysis to understand the quality of the underlying data and its ability to address the business question

Classification

predicts a class or category for a new observation based on the manual identification of classes from previous observations.

Link prediction

predicts a relationship between two data items, such as members of a social media platform.

4 types of attributes

primary keys, foreign keys, composite keys, descriptive attributes

Diagnostic Analytics

procedures that explore the current data to determine why something has happened the way it has, typically comparing the data to a benchmark. ex: profiling, similarity matching, co-occurrence grouping & clustering

Prescriptive Analytics

procedures that model data to enable recommendations for what should be done in the future. ex: decision support systems & Machine learning and artificial intelligence

Descriptive Analytics

procedures that summarize existing data to determine what has happened in the past. ex: summary stats & data reduction or filtering

Predictive Analytics

procedures used to generate a model that can be used to determine what is likely to happen in the future. ex: regression, link prediction, & classification

Data Analytics

process of evaluating data with the purpose of drawing conclusions to address business questions. provides a way to search through large structured and unstructured data to identify unknown patterns or relationships

Data reduction or filtering

reduces the number of observations to focus on relevant items (i.e., highest cost, highest risk, largest impact, etc.). It does this by taking a large set of data (perhaps the population) and reducing it to a smaller set that has the vast majority of the critical information of the larger set.
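One way to sketch data reduction is to keep the largest items until they cover a chosen share of the total value. The 80 percent target, the vendor balances, and the `reduce_to_share` helper below are assumptions for illustration only.

```python
def reduce_to_share(amounts_by_name, share_pct=80):
    """Keep the largest items until they cover at least `share_pct` percent of the total."""
    total = sum(amounts_by_name.values())
    kept, running = [], 0
    ranked = sorted(amounts_by_name.items(), key=lambda item: item[1], reverse=True)
    for name, amount in ranked:
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= share_pct * total:
            break
        kept.append(name)
        running += amount
    return kept

# Hypothetical payable balances by vendor, in whole dollars.
balances = {"A": 5000, "B": 3000, "C": 1000, "D": 600, "E": 400}
focus = reduce_to_share(balances, share_pct=80)
print(focus)
```

Here two of the five vendors account for 80 percent of the total, so the analysis can concentrate on them while still covering most of the value.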

pruning

removes branches from decision tree to avoid overfitting the model

Decision support systems

rule-based systems that gather data and recommend actions based on the input.

By the year 2020, about 1.7 megabytes of new information will be created every:
- week.
- second.
- minute.
- day.

second.

IMPACT: Perform the test plan

select an appropriate model to find a target variable ex: classification, regression, similarity matching, clustering, co-occurrence grouping, profiling

Volume

the sheer size of the dataset

Data that are organized and reside in a fixed field with a record or a file. Such data are generally contained in a relational database or spreadsheet and are readily searchable by search algorithms. The term matching this definition is:
- training data.
- unstructured data.
- structured data.
- test data.

structured data.

Models associated with regression and classification data approaches have all except this important part:
- identifying which variables (we'll call these independent variables) might help predict an outcome (we'll call this the dependent variable).
- the functional form of the relationship (linear, nonlinear, etc.).
- the numeric parameters of the model (detailing the relative weights of each of the variables associated with the prediction).
- test data.

test data.

Big Data

datasets that are too large and complex to be analyzed traditionally

The purpose of transforming data is:
- to validate the data for completeness and integrity.
- to load the data into the appropriate tool for analysis.
- to obtain the data from the appropriate source.
- to identify which data are necessary to complete the analysis.

to validate the data for completeness and integrity.

In general, the simpler the model, the greater the chance of:
- overfitting the data.
- underfitting the data.
- pruning the data.
- the need to reduce the amount of data considered.

underfitting the data.

primary key

unique identifier

Decision tree

used to divide data into smaller groups

Linear classifiers

useful for ranking items rather than simply predicting class probability. useful for determining the really important values, such as valuable customers, or which transactions are most likely fraudulent

The IMPACT cycle includes all except the following process:
- perform test plan.
- visualize the data.
- master the data.
- track outcomes.

visualize the data.

