CH 1 -

Effective Data Analytics

provides a way to search through large structured and unstructured data to identify unknown patterns or relationships.

Big Data

refers to datasets that are too large and complex to be analyzed traditionally

Data reduction

A data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items (e.g., highest cost, highest risk, and largest impact)
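
As a rough illustration of data reduction, the sketch below keeps only the largest items from a list so that attention goes to the highest-value records. The invoice data and the `reduce_to_largest` helper are hypothetical, made up for this example.

```python
# Hypothetical invoice records: (invoice_id, amount). Not from any real system.
invoices = [
    ("INV-1", 120.0),
    ("INV-2", 9800.0),
    ("INV-3", 45.5),
    ("INV-4", 15250.0),
    ("INV-5", 610.0),
]

def reduce_to_largest(records, top_n):
    """Keep only the top_n records by amount, largest first."""
    return sorted(records, key=lambda r: r[1], reverse=True)[:top_n]

largest = reduce_to_largest(invoices, 2)
print(largest)
```

Other reduction criteria (highest risk, largest impact) would only change the sort key or the filter.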

Regression

A data approach used to predict a specific dependent variable value based on independent variable inputs using a statistical model.
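
A minimal sketch of regression, assuming a single independent variable and an ordinary least-squares line. The cost-driver data (machine hours and overhead cost) and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: overhead cost is exactly 50 + 20 * machine hours.
machine_hours = [10, 20, 30, 40]   # independent variable
overhead_cost = [250, 450, 650, 850]  # dependent variable

slope, intercept = fit_line(machine_hours, overhead_cost)
predicted = intercept + slope * 50  # predicted cost at 50 machine hours
print(slope, intercept, predicted)
```

Real data would not fall exactly on a line, so the fitted slope and intercept would be estimates with some error.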

Classification

An attempt to assign each unit (or individual) in a population to one of a few categories.
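
One simple way to sketch classification is a nearest-neighbor rule: a new unit takes the category of the most similar labeled example. The labeled transactions, the feature choice (amount, approvals skipped), and the `classify` helper are all hypothetical, and the features are left unscaled for brevity, so the amount dominates the distance.

```python
# Hypothetical labeled transactions: ((amount, approvals_skipped), category)
labeled = [
    ((500.0, 1), "low risk"),
    ((700.0, 2), "low risk"),
    ((9500.0, 14), "high risk"),
    ((12000.0, 20), "high risk"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(features, examples):
    """Return the category of the closest labeled example (1-nearest neighbor)."""
    _, category = min(examples, key=lambda ex: squared_distance(features, ex[0]))
    return category

print(classify((11000.0, 3), labeled))
print(classify((600.0, 18), labeled))
```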

Profiling

An attempt to characterize the "typical" behavior of an individual, group, or population by generating summary statistics about the data (e.g., descriptive statistics such as mean or frequency)
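
The sketch below shows one possible form of profiling: summary statistics from past behavior define what is "typical", and new observations far from that profile are flagged. The expense-claim figures, the `is_atypical` helper, and the threshold of 3 standard deviations are assumptions for illustration.

```python
import statistics

# Hypothetical history of one employee's expense claims.
history = [102.0, 98.0, 105.0, 97.0, 101.0, 99.0]

# The profile is just descriptive statistics about typical behavior.
profile = {
    "mean": statistics.mean(history),
    "stdev": statistics.stdev(history),
}

def is_atypical(value, profile, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    z = abs(value - profile["mean"]) / profile["stdev"]
    return z > threshold

new_claims = [103.0, 480.0]
flagged = [c for c in new_claims if is_atypical(c, profile)]
print(flagged)
```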

Co-occurrence grouping

An attempt to discover associations between individuals based on transactions involving them
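
Co-occurrence grouping can be sketched by counting how often two items appear in the same transaction. The baskets below are hypothetical, and counting raw pairs is only the simplest possible measure of association.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions (one set of items per transaction).
baskets = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"paper", "stapler"},
    {"ink", "paper"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always stored in the same order.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common():
    print(pair, count)
```

Pairs with the highest counts are the candidates for a "frequently bought together" style recommendation.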

Clustering

An attempt to divide individuals (like customers) into groups (or clusters) in a useful or meaningful way.
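
As an illustration of clustering, here is a small one-dimensional k-means sketch that groups customers by annual spend. The spend figures, the `kmeans_1d` helper, and the choice of starting centroids are assumptions; k-means is only one of many clustering methods.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Group numeric values around centroids using plain k-means."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each value to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

# Hypothetical annual spend per customer.
annual_spend = [100, 120, 130, 900, 950, 1000]
clusters, centroids = kmeans_1d(annual_spend, centroids=[100, 1000])
print(clusters)
```

Unlike classification, no categories are given in advance; the groups emerge from the data and are labeled afterwards.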

Similarity matching

An attempt to identify similar individuals based on data known about them
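
A minimal sketch of similarity matching, assuming each customer is described by a vector of purchase counts per product category and that cosine similarity is the chosen measure. The customer data and helper names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(target, profiles):
    """Return the name of the individual most similar to `target`."""
    others = (name for name in profiles if name != target)
    return max(
        others,
        key=lambda name: cosine_similarity(profiles[target], profiles[name]),
    )

# Hypothetical purchase counts per category: [electronics, groceries, books]
customers = {
    "A": [5, 0, 2],
    "B": [4, 0, 3],
    "C": [0, 6, 1],
}
print(most_similar("A", customers))
```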

Link Prediction

An attempt to predict connections between two data items
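
Link prediction can be sketched with a common-neighbors score: two people who are not yet connected but share many connections are likely candidates for a future link. The small network below and the `common_neighbor_scores` helper are hypothetical, and common neighbors is just one simple scoring rule.

```python
from itertools import combinations

# Hypothetical undirected network: each person maps to their direct connections.
friends = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "cara", "dev"},
    "cara": {"ann", "bob", "dev"},
    "dev": {"bob", "cara", "eli"},
    "eli": {"dev"},
}

def common_neighbor_scores(graph):
    """Score every unconnected pair by the number of shared connections."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b not in graph[a]:
            scores[(a, b)] = len(graph[a] & graph[b])
    return scores

scores = common_neighbor_scores(friends)
best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair])
```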

Identify the questions

Are employees circumventing internal controls in day-to-day transactions (think about the steps in the purchase-to-pay and order-to-cash cycles)?
How can top-tier customers be identified for targeted promotion and marketing (think about ABC analysis and cluster analysis)?
What are appropriate cost drivers for activity-based costing purposes (think about regression analysis)?
How strong is our accounting information system in terms of preventing cyber attacks (think about preventive controls)?
How can errors made in journal entries be identified (think about tracing and vouching)?

Perform test plan

Classification (e.g., identify high-risk transactions)
Regression (e.g., establish a relationship between X and Y)
Similarity matching (e.g., identify similar customers)
Clustering (e.g., grouping and labeling the groups)
Co-occurrence grouping (e.g., Amazon "frequently bought together")
Profiling (e.g., find anomalies that depart from typical behavior)
Link prediction (e.g., Facebook "people you may know")
Data reduction (e.g., a subset of the data with critical information)

IMPACT

Identify the questions
Master the data
Perform the test plan
Address/refine results
Communicate insights
Track outcomes

Master the data

Know what data are available and how they relate to the problem. Points to consider: internal ERP systems; external networks and data warehouses; data dictionaries; extraction, transformation, and loading (ETL); data validation and completeness; data normalization; data preparation and scrubbing.

Data Analytics

Process of evaluating data with the purpose of drawing conclusions to address business questions

Difference between similarity matching and link prediction

Similarity matching focuses on common behaviors, transactions, or patterns. It is like finding "twins" in the dataset. Link prediction is based on the likelihood of a connection being established. Dissimilar people can still be connected in link prediction, such as the connection between a publisher and a colleague professor. That same connection would not show up in similarity matching.

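4 V's

Volume refers to the size of the data.
Velocity refers to the frequency at which the data are generated or updated.
Variety refers to the different types of data.
Veracity refers to the quality (reliability) of the data.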

