Data Analytics Exam 1
3 types of aggregates:
1. SUM 2. COUNT 3. AVG
4 categories of data analytics:
1. descriptive 2. diagnostic 3. predictive 4. prescriptive
5 steps of ETL
1. determine purpose and scope 2. obtain data 3. validate data 4. clean data 5. load data for data analysis
4 of the 7 skills analytic-minded accountants should have:
1. develop an analytics mindset 2. data scrubbing and data prep 3. data quality 4. descriptive data analysis
IMPACT cycle
1. identify the questions 2. master the data 3. perform the test plan 4. address and refine results 5. communicate insights 6. track outcomes
4 basic types of JOIN:
1. inner 2. outer 3. left 4. right
JOIN clause is part of what clause?
FROM clause
you can use SELECT ______ and _____ to refine your query
aggregates; aliases
OUTER JOIN =
attributes from other table are NULL
The real value inherent in data comes from ____________, discovering the various buying patterns of customers, investigating anomalies that were not predicted in firm operations, forecasting future demand and supply and so on.
data analytics
What process involves the technologies, systems, practices, methodologies, databases, statistics, and applications used to analyze diverse business data to give organizations the information they need to make decisions?
data analytics
big data
datasets that are too large and complex to be analyzed traditionally
data dictionary
defines what data are acceptable
summary statistics
describe set of data in terms of location, range, shape, and size
summary statistics and data reduction are what type of data analytics?
descriptive
if a z-score is higher, then it is ______ from the mean
farther
what is the power of the JOIN clause?
foreign keys
Benford's law evaluates the ______ distribution of the ____ digits of a large set of ______ data
frequency; first; numerical
predictive analytics
generates a model to predict what could happen in the future
to aggregate means to...
group or cluster
Which of the following is an accurate description of the Audit Data Standards?
guide for formatting the way in which data are provided to auditors
Tax compliance deals primarily with filing tax returns. In contrast, tax planning primarily helps
minimize the amount of taxes paid
in a WHERE clause, number attributes need...
no special criteria
mean, median, and mode are always the same in a ______ _______
normal distribution
As part of mastering the data, data analysts perform data ________________ to reduce data redundancy and improve data integrity.
normalization
in a WHERE clause, date attributes need to be enclosed by ______ _____
number signs
predictive and prescriptive analytics are considered to be _______ and ______
optimization; foresight
descriptive data analysis
perform basis analysis to understand the quality of the underlying data and its ability to address the business question
foreign keys
point to a primary key in another table
an alias ______ a column by using the ___ clause
renames; AS
JOIN clause is used to...
retrieve info from multiple tables
RIGHT JOIN
retrieved from the joined table
LEFT JOIN
retrieved from the main table
decision support systems
rule based systems that gather data and recommend actions
descriptive analytics
summarize existing data to determine what happened in the past
the safest type of data is ____
text
HAVING clause is a filter for when you are using which clause?
the GROUP BY clause
A digital dashboard would be part of which step of the IMPACT cycle?
track outcomes
primary keys
unique identifiers
4 types of attributes:
1. primary keys 2. foreign keys 3. composite keys 4. descriptive attributes
what are the 3 V's?
1. volume- size 2. velocity- frequency 3. variety- different types
data dictionary answers what 4 questions for each attribute we learn about?
1. what type of key is it 2. what data are required 3. what data can be stored in it 4. how much data is stored
___% of CEOs say data analytics are the 2nd most important strategic technology
80
___% of CEOs put a high value on data analytics
85
the operator for the HAVING clause can be...
=, <, >, =<, =>, <>
what clause is always the last line in any query?
ORDER BY clause
which clause can not be used with the GROUP BY clause?
WHERE clause
Financial accounting often has challenges with valuation and estimation in all but the following area:
accounts payable
machine learning and artificial intelligence
adapt to new external data to recommend a course of action
what is the default order for attributes?
ascending
ORDER BY clause sorts the records in...
ascending or descending order
Why does data analytics matter to accountants?
because it allows them to test the whole population rather than just the sample (enhances audit quality)
Why does data analytics matter to business?
because it translates into higher profits (up to $3 trillion per year)
relational databases ensure that data is ______, is not _______, follows _____ _____ and ______ ______, and aids _______ and _______ of business practices
complete; redundant; business rules; internal controls; communication; integration
data scrubbing and data prep
comprehend the process needed to clean and prepare the data before analysis
what does a foreign key do?
carries out the relationship between 2 tables
Which testing approach would be considered to be an attempt to discover associations between individuals based on transactions involving them?
co occurrence grouping
composite keys
combo of 2 foreign keys for line items
After data analysts slice and dice the data, find correlations, ask ourselves further questions, ask colleagues what they think, and revise and rerun the analysis, what comes next in the IMPACT cycle?
communicate insights
Data Visualization would be part of which step of the IMPACT cycle?
communicate insights
Variance analysis, a common practice in management accounting, is an example of _____ analytics.
diagnostic
profiling, clustering, similarity matching, and co occurrence grouping are what type of data analytics?
diagnostic
co occurence grouping
discovers associations between individuals based on common events
prescriptive analytics
enable recommendations for what should be done in the future
GROUP BY clause is the...
engine that powers subtotaling data
regression
estimates numerical value of a dependent variable based on the slope and intersect of a line and value of an independent variable
training data
existing data that have been manually evaluated and assigned a class
test data
existing data used to evaluate the model
diagnostic analytics
explore the current data to determine why something happened the way it did
mastering the data involves what 3 things?
extract, transform, load (ETL)
clustering
identifies groups/clusters of individuals that share common underlying characteristics
similarity matching
identifies similar individuals based on data known about them
profiling
identifies typical behavior of an individual, group, pop by compiling summary statistics and comparing the individuals to the pop
which step of the IMPACT cycle is the most important?
identify the questions
descriptive and diagnostic analytics are considered to be _____ and _______
info; hindsight
unified modeling language (UML)
know which tables and attributes contain relevant data
data reduction takes a _____ set of data and reduces it to a smaller set that has the vast majority of the _____ ____ of the larger set
large; critical info
you can have _____ and _____ skewed distributions
left; right
Which of the following is not an existing Audit Data Standard?
manufacturing subledger
INNER JOIN =
match in both tables
regression, classification, and link prediction are what type of data analytics?
predictive
classification
predicts a class/category based on the manual identification of classes from previous observations
link prediciton
predicts a relationship between 2 data items
decision supports systems, machine learning, and AI are what type of data analytics?
prescriptive
data analytics
process of evaluating data with the purpose of drawing conclusions to address business equations
_____ might be used to identify areas where there is a lack of controls, changes in procedures, or individuals more willing to spend excessively in potential types of T&E expenses which might be associated with higher risk.
profiling
in a WHERE clause, text attributes need _____
quotes
data quality
recognize what is meant by data quality (completeness, reliability, validity)
develop an analytics mindset
recognize when and how data analytics can address business equations
data reduction/filtering
reduces the amount of observations to focus on relevant items
Structured data is stored in a database or spreadsheet and are readily _________
searchable
profiling uses what type of data?
structured
effective data analytics provide a way to search through large ______ and __________ data to identify ______ patterns and relationships
structured; unstructured; unknown
if all attributes are selected, then the order is based on...
the primary key
An _________ approach is used when you don't have a specific question and are simply exploring the data for potential patterns of interest.
unsupervised
Profiling is an __________ method that is used to discover patterns of behavior, based on the distance of z-scores from the mean.
unsupervised
what is the 4th V?
veracity- quality of data
SELECT clause indicates...
which attribute you wish to view
FROM clause indicates...
which table contains the attributes you are selecting
WHERE clause...
works like a filter in excel