Analytics - Midterm 1

What are the attributes that exist in a relational database that are neither primary nor foreign keys?

descriptive attributes

completeness

ensures that all data required for a business process are included in the dataset

If your data analysis project is more declarative than explanatory, it is more likely that you will perform your data visualization to communicate results in

Excel

training data

existing data that have been manually evaluated and assigned a class

target

expected attribute or value that we want to evaluate

Mastering the data can be described via the ETL process. The ETL process stands for

extract, transform, and load

Which of the following describes part of the goal of the ETL process?

identify and obtain the data needed for solving the problem

decision support systems

information systems that support decision-making activity within a business by combining data and expertise to solve problems and perform calculations

support vector machines

a discriminative classifier defined by a separating hyperplane; it first finds the widest margin between the classes and then finds the middle line of that margin

XBRL

is used to facilitate the exchange of financial reporting information between the company and the SEC

quantitative data charts

line, box and whisker plot, scatter plot, filled geographic maps

class

manually assigned category applied to a record based on an event

decision boundaries

mark the split between one class and another

Benford's law

observation about the frequency of leading digits in many real life sets of numerical data
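Under Benford's law the expected share of values with leading digit d is log10(1 + 1/d), so about 30.1% of values start with 1 and about 4.6% start with 9. A minimal sketch of comparing observed leading digits with those expectations, assuming a hypothetical list of invoice amounts (a real test would use a much larger population):

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def leading_digit(value: float) -> int:
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(value):.10e}"[0])


def observed_shares(values):
    """Share of each leading digit 1-9 among the nonzero values."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Hypothetical invoice amounts, for illustration only
invoices = [1200.50, 1875.00, 240.10, 3100.00, 15.75, 1.99, 920.00, 110.00, 47.30, 1999.99]
shares = observed_shares(invoices)
for d in range(1, 10):
    print(d, f"observed {shares[d]:.2f}", f"expected {benford_expected(d):.2f}")
```

Large gaps between the observed and expected columns are a prompt for follow-up, not proof of error or fraud.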

In general, the more complex the model, the greater the chance of

overfitting the data

declarative visualizations

product of wanting to declare or present your findings to an audience

data dictionary

provides descriptions for all of the data attributes of the dataset

pruning

removes branches from a decision tree to avoid overfitting the model

discrete data

represented by whole numbers

What is the most appropriate chart for showing a relationship between two variables?

scatter chart

By the year 2020, about 1.7 megabytes of new information will be created every

second

In the late 1960s, Ed Altman developed a model to predict if a company was at severe risk of going bankrupt. He called his statistic Altman's Z-score, now a widely used score in finance. Based on the name of the statistic, which statistical distribution would you guess this came from?

standardized normal distribution
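The name alludes to the z-score of the standard normal distribution, which measures how many standard deviations a value lies from the mean: z = (x - mean) / standard deviation. A minimal sketch of that standardization on hypothetical data; it illustrates only the general idea behind the name, not Altman's bankruptcy model, which combines several financial ratios with fixed weights:

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize each value: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]


ratios = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, standard deviation 2
print(z_scores(ratios))  # → [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After standardization the values have mean 0 and standard deviation 1, which is what makes them comparable to the standard normal distribution.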

Models associated with regression and classification data approaches have all of the following important parts except

test data

The purpose of transforming data is

to validate the data for completeness and integrity

charts that show proportions

tree/heat maps, symbol maps, word clouds

composite primary key

two foreign keys from the tables that it is linking combine to make up a unique identifier
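A minimal sketch of a linking table whose primary key is made of two foreign keys, using Python's built-in sqlite3 module. The table and column names (Sales, Inventory, SalesOrderLines) are hypothetical, chosen only to illustrate the idea:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE Sales (
    SalesOrderID INTEGER PRIMARY KEY,
    OrderDate    TEXT
);
CREATE TABLE Inventory (
    ItemID      INTEGER PRIMARY KEY,
    Description TEXT
);
-- Linking table: each key column is a foreign key, and the pair is the primary key
CREATE TABLE SalesOrderLines (
    SalesOrderID INTEGER REFERENCES Sales (SalesOrderID),
    ItemID       INTEGER REFERENCES Inventory (ItemID),
    Quantity     INTEGER,
    PRIMARY KEY (SalesOrderID, ItemID)
);
""")

conn.execute("INSERT INTO Sales VALUES (1, '2020-01-15')")
conn.executemany("INSERT INTO Inventory VALUES (?, ?)", [(10, "Widget"), (20, "Gadget")])
conn.execute("INSERT INTO SalesOrderLines VALUES (1, 10, 5)")  # order 1, item 10
conn.execute("INSERT INTO SalesOrderLines VALUES (1, 20, 2)")  # same order, different item: allowed


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO SalesOrderLines VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((1, 10, 3)))  # False: the pair (1, 10) already exists
print(try_insert((2, 10, 1)))  # False: order 2 does not exist in Sales
```

Neither column is unique on its own (order 1 appears twice, and an item can appear on many orders), but each combination of the two can appear only once.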

decision trees

used to divide data into smaller groups

profiling

- attempt to characterize the typical behavior of an individual, group, or population by generating summary statistics about the data
- primarily uses structured data
- discover patterns of behavior
- assess data quality and internal controls
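A minimal profiling sketch that generates summary statistics per individual, assuming hypothetical expense-report records of (employee, amount):

```python
from collections import defaultdict
from statistics import mean, median, pstdev

# Hypothetical expense-report records: (employee, amount)
expenses = [
    ("Avery", 120.0), ("Avery", 95.0), ("Avery", 130.0),
    ("Blake", 60.0), ("Blake", 75.0), ("Blake", 980.0),
]


def profile(records):
    """Summary statistics of the amounts recorded for each individual."""
    by_person = defaultdict(list)
    for person, amount in records:
        by_person[person].append(amount)
    return {
        person: {
            "count": len(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
            "stdev": pstdev(amounts),
        }
        for person, amounts in by_person.items()
    }


for person, stats in profile(expenses).items():
    print(person, {k: round(v, 2) for k, v in stats.items()})
```

Here Blake's mean (about 371.67) sits far above his median (75.0) because of one large claim, the kind of departure from typical behavior that profiling is meant to surface.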

qualitative data

- categorical
- split into nominal and ordinal data

big data

- datasets that are too large and complex for businesses' existing systems to capture, store, manage, and analyze using their traditional capabilities
- characterized by volume, velocity, and variety

Five steps of the ETL process in order

- determine the purpose and scope of the data request
- obtain the data
- validate the data for completeness and integrity
- clean the data
- load the data for data analysis
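A minimal sketch of the five steps, run as extract (purpose and scope, obtain), transform (validate, clean), and load. The raw extract, field names, and cleaning rules are hypothetical, and an in-memory SQLite database stands in for the analysis tool:

```python
import csv
import io
import sqlite3

# Step 1 (determine purpose and scope) is a planning decision: here, assume the
# request is "all invoices with customer and amount". Field names are hypothetical.
REQUIRED = ("invoice_id", "customer", "amount")

# Step 2 (obtain the data): a hypothetical raw extract
RAW = """invoice_id,customer,amount
1001, Acme Corp ,250.00
1002,Beta LLC,
1003,Gamma Inc,"1,200.50"
1001, Acme Corp ,250.00
"""


def obtain(text):
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Step 3: separate complete rows from rows missing a required field."""
    complete, incomplete = [], []
    for row in rows:
        ok = all((row.get(field) or "").strip() for field in REQUIRED)
        (complete if ok else incomplete).append(row)
    return complete, incomplete


def clean(rows):
    """Step 4: trim text, fix number formats, convert types, drop duplicate invoices."""
    seen, cleaned = set(), []
    for row in rows:
        invoice_id = int(row["invoice_id"])
        if invoice_id in seen:
            continue
        seen.add(invoice_id)
        cleaned.append((invoice_id, row["customer"].strip(), float(row["amount"].replace(",", ""))))
    return cleaned


def load(rows):
    """Step 5: load the cleaned rows into an analysis database (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    return conn


complete, incomplete = validate(obtain(RAW))
conn = load(clean(complete))
print(f"{len(incomplete)} incomplete row(s) sent back for follow-up")
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM invoices").fetchone())
```

Validation flags invoice 1002 (missing amount) instead of silently dropping it, while cleaning removes the duplicate 1001 row and converts "1,200.50" to a number before loading.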

data analytic skills needed by analytic minded accountants

- develop an analytics mindset
- data scrubbing and data preparation
- data quality
- descriptive data analysis
- data analysis through data manipulation
- define and address problems through statistical data analysis
- data visualization and data reporting

test data

- existing data used to evaluate the model
- set of data used to assess the degree and strength of a predicted relationship

ratio data

- have a meaningful zero
- 0 means the absence of the quantity being measured
- example: money
- most sophisticated type of data

data reduction steps

- identify the attribute you would like to reduce or focus on
- filter the results
- interpret the results
- follow up on results
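A minimal sketch of the first two data reduction steps, assuming hypothetical journal entries, "amount" as the attribute to focus on, and a hypothetical materiality threshold for the filter:

```python
# Hypothetical journal entries; the attribute to focus on is the amount
entries = [
    {"id": "JE01", "amount": 250.0},
    {"id": "JE02", "amount": 12000.0},
    {"id": "JE03", "amount": 480.0},
    {"id": "JE04", "amount": 35000.0},
    {"id": "JE05", "amount": 90.0},
    {"id": "JE06", "amount": 15000.0},
    {"id": "JE07", "amount": 700.0},
]
MATERIALITY = 10_000.0  # hypothetical threshold chosen for illustration


def reduce_to_material(rows, attribute="amount", threshold=MATERIALITY):
    """Filter step: keep only records whose chosen attribute meets the threshold."""
    return [row for row in rows if row[attribute] >= threshold]


focus = reduce_to_material(entries)
coverage = sum(r["amount"] for r in focus) / sum(r["amount"] for r in entries)
print([r["id"] for r in focus], f"cover {coverage:.1%} of the total amount")
```

Three of seven entries carry most of the total value; interpreting and following up on those three is the remaining work.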

classification steps

- identify the classes you wish to predict
- manually classify an existing set of records
- select a set of classification models
- divide your data into training and testing sets
- generate your model
- interpret the results and select the best model
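A minimal sketch of dividing classified records into training and testing sets and generating a model, assuming hypothetical records labeled by transaction amount and a one-split decision tree (a "decision stump") as the only model. A real project would compare a set of models before selecting the best one:

```python
import random

# Steps 1-2: classes are "review" and "normal"; these hypothetical records were
# "manually classified" by a rule: amounts above 5,000 were marked for review.
records = [(amt, "review" if amt > 5000 else "normal") for amt in range(500, 10001, 500)]


def split(data, test_share=0.3, seed=42):
    """Step 4: shuffle, then hold out a share of the records as test data."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


def fit_stump(train):
    """Step 5: choose the amount threshold that best separates the training classes."""
    best_threshold, best_correct = None, -1
    for threshold, _ in train:
        correct = sum((amt > threshold) == (label == "review") for amt, label in train)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, data):
    """Share of records whose predicted class matches the manually assigned class."""
    return sum((amt > threshold) == (label == "review") for amt, label in data) / len(data)


train, test = split(records)
threshold = fit_stump(train)
print(f"threshold {threshold}: train accuracy {accuracy(threshold, train):.2f}, "
      f"test accuracy {accuracy(threshold, test):.2f}")
```

The test set is never used to choose the threshold, so its accuracy is the fairer estimate of how the model would classify new records.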

IMPACT cycle

- identify the questions
- master the data
- perform test plan
- address and refine results
- communicate insights
- track outcomes

explanatory visualization

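- the lines between steps P, A, and C of the IMPACT cycle (perform test plan, address and refine results, communicate insights) are not clearly divided
- aligns with performing the test plan within visualization software
- example: Tableau, for gaining insight while working with the data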

quantitative data

- more complex than qualitative data
- the differences between data points are meaningful
- can be counted, ranked, and averaged, and a standard deviation can be computed
- split into interval and ratio data

interval data

- no meaningful zero
- example: temperature

data analytics

- process of evaluating data with the purpose of drawing conclusions to address business questions
- aims to transform raw data into knowledge to create value

nominal data

- simplest form of data
- can only be counted

Slicing and dicing the data, finding correlations, and revising and re-running the analysis would be considered part of which stage of the IMPACT cycle?

Address and refine results

When evaluating classifiers, you need to be careful to strike a balance between what two things?

Complexity of the model and accuracy of the classification

Which of the following is not a step for cleaning the data?

Deleting any results that are unfavorable to the results you were hoping to retrieve

Which of the following is not one of the considerations for determining the purpose and scope of the data request?

Determining how the data will be cleaned

Variance analysis, a common practice in management accounting, is an example of _____ analytics

Diagnostic

In which format do analysts typically prefer to analyze data?

Flat file

______ looks for similarities between portions, or segments, of the text of each potential match

Fuzzy match
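A minimal fuzzy-matching sketch using Python's standard difflib.SequenceMatcher, which scores two strings by the matching segments they share. This is a stand-in technique: the fuzzy-match tool used in the course may score differently, and the vendor list and 0.8 cutoff are hypothetical:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Score in [0, 1] from the matching segments the two strings share (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_matches(name, candidates, threshold=0.8):
    """Candidates whose similarity to `name` meets the threshold, best match first."""
    scored = sorted(((similarity(name, c), c) for c in candidates), reverse=True)
    return [c for score, c in scored if score >= threshold]


vendor_master = ["Acme Corporation", "Acme Corp.", "Apex Supply"]  # hypothetical vendor list
print(fuzzy_matches("ACME Corp", vendor_master))
```

Raising the cutoff trades missed matches for fewer false matches, so the threshold itself is a judgment call to revisit after reviewing results.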

When is a foreign key required?

If two tables are related in a relational database, one of the two must have a foreign key

The four benefits of storing data in a relational database are completeness of data, no redundant data, business rules are enforced, and communication and ________ of business processes

Integration

Which of the following is not one of the means of cleaning the data after extraction and validation?

Load the data into the software program in preparation for analysis

Which of the following is not an existing Audit Data Standard?

Manufacturing subledger

Machine learning, artificial intelligence, and decision support systems are all examples of _____ analytics

Prescriptive

The extraction process requires two steps. One of the steps is determine ___________ of the data request

Purpose and scope

Data Analytics may use what source to assess the probability of a goodwill write-down, warranty claims, or the collectibility of bad debts?

Social media

Which of the following is not one of the means of cleaning the data after extraction and validation?

Transform the data into a usable form

McKinsey Global Institute estimates that Data Analytics could generate up to $3_______ in value each year

Trillion

classification

attempt to assign each unit in a population into a few categories

co-occurrence grouping

attempt to discover associations between individuals based on transactions involving them
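A minimal co-occurrence sketch that counts how often each pair of items appears on the same transaction, assuming hypothetical sales orders:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of items on each sales order
orders = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"paper", "pens"},
    {"ink", "paper"},
    {"printer", "ink", "pens"},
]


def pair_counts(transactions):
    """Count how often each pair of items appears together in the same transaction."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):  # sorting gives each pair one spelling
            counts[pair] += 1
    return counts


print(pair_counts(orders).most_common(2))
```

Ink and printers appear together on 3 of 5 orders (a support of 0.6), which is the kind of association co-occurrence grouping looks for.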

clustering

attempt to divide individuals into groups in a useful or meaningful way

similarity matching

attempt to identify similar individuals based on data known about them
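A minimal similarity-matching sketch that finds the nearest neighbor by Euclidean distance, assuming hypothetical customer profiles of (annual purchases in $000s, average days to pay):

```python
import math

# Hypothetical customer profiles: (annual purchases in $000s, average days to pay)
customers = {
    "C1": (120.0, 30.0),
    "C2": (115.0, 32.0),
    "C3": (40.0, 75.0),
    "C4": (45.0, 70.0),
}


def most_similar(target_id, profiles):
    """Return the customer whose profile is closest (Euclidean distance) to the target's."""
    target = profiles[target_id]
    others = (cid for cid in profiles if cid != target_id)
    return min(others, key=lambda cid: math.dist(target, profiles[cid]))


print(most_similar("C1", customers), most_similar("C3", customers))
```

When attributes sit on very different scales, they are usually standardized first so that no single attribute dominates the distance.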

link prediction

attempt to predict a relationship between two data items

qualitative data charts

bar, pie, stacked bar

ordinal data

can be counted, categorized, and ranked

continuous data

can take on any value within a range

As mentioned in the chapter, which of the following is not a common way that data will need to be cleaned after extraction?

clean up trailing zeroes

data reduction

data approach that attempts to reduce the amount of information that needs to be considered to focus on the most critical items

regression

data approach used to predict a specific dependent variable value based on independent variable inputs using a statistical model
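A minimal sketch of simple linear regression by ordinary least squares, predicting a dependent variable from one independent variable. The machine-hours and overhead figures are hypothetical:

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one independent variable: y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: machine hours (independent) and overhead cost (dependent)
hours = [10, 20, 30, 40, 50]
overhead = [1600, 2400, 3500, 4600, 5400]

intercept, slope = fit_line(hours, overhead)
print(f"overhead = {intercept:.0f} + {slope:.0f} * hours")
print("predicted overhead at 60 hours:", intercept + slope * 60)
```

On Python 3.10 and later, `statistics.linear_regression(hours, overhead)` returns the same slope and intercept.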

structured data

data that are stored in a database or spreadsheet and are readily searchable

