A355 Final Exam
Which of the following best addresses outliers? Please select the best answer. a. univariate regression b. multivariate regression c. logistic regression d. sample selection e. adjusted r^2
Answer: D
what do accountants provide that machines do not? Please select the best answer a. alternative methods of analysis b. accurate computation c. fast computation d. interpretation and expertise related to understanding and communicating the computation
Answer: D
which of the following is a balanced panel? (each pair is Company / Year) Example A: A/1, A/2, B/1, B/2, C/1. Example B: A/1, A/2, A/3, B/1, B/2. Example C: A/1, B/1, C/1, A/2, B/2, D/2. Example D: A/1, A/2, A/3, B/1, B/2, B/3.
Answer: Example D
in accounting analytics, which item of the framework do we view differently than data scientists? Please select the best answer a. ask the question b. master the data c. perform the analysis d. share the story
Answer: a
ledger
a list of events or transactions, establishes consensus about facts
excel
best at data analysis and exploring data
robotic process automation
can automate simple tasks, process information very quickly (EX: using a function in excel and dragging it into another cell)
DeFi
decentralized finance; covers a broad array of topics: -smart contracts -stablecoins -decentralized exchanges (DEX) -NFTs
NFTs
essentially a way to digitally store and secure property rights, and can be used in any fashion; non-fungible: each token is unique and cannot be mistaken for another; OpenSea is the biggest NFT marketplace (currently most used for art)
volume
how much data
outer join
maximum information, keep all information
p-value
probability of observing a t-statistic at least as extreme as the one shown if the null hypothesis is true (a small p-value indicates that we can reject the null hypothesis of no relation)
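To make the p-value definition concrete, here is a minimal sketch that converts a t-statistic into a two-sided p-value. It assumes a standard normal approximation to the t distribution (reasonable for large samples, not something the card states), and the function name `two_sided_p_value` is made up for illustration.

```python
import math

def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value under a standard normal approximation (assumption)."""
    # For a standard normal Z, P(|Z| >= |t|) equals erfc(|t| / sqrt(2)).
    return math.erfc(abs(t_stat) / math.sqrt(2))

p_small = two_sided_p_value(3.0)  # t-statistic far from zero: small p-value
p_large = two_sided_p_value(0.5)  # t-statistic near zero: large p-value
print(p_small, p_large)
```

The larger the t-statistic in absolute value, the smaller the p-value, and the stronger the case for rejecting the null hypothesis of no relation.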
triple entry accounting
public receipt allows for perfect verifiability of every transaction, instantly in real time; no more need to sample transactions because you have all of them instantly; becomes more difficult to commit fraud because all books are linked and the transaction is public
winsorize the outlier
pulls the outlier back into the distribution, retains the outlier observation but changes its value; sample size remains the same, but value of certain observations changes to be closer to the 'center' of the distribution
variety
unstructured, semi-structured, structured
descriptive analytics
what happened? what is happening?
we consider accounting and non-accounting data sources available for data analysis including:
-financial statements -macroeconomic statistics -supply chain data -financial analyst reports
AMPS model
1. ask the question 2. master the data 3. perform the analysis 4. share the story
Bloom's Taxonomy
1. create 2. evaluate 3. analyze 4. apply 5. understand 6. remember (machines do 4-6, accountants do 1-3)
two broad types of diagnostic analytics
1. identifying anomalies/outliers 2. finding previously unknown linkages, patterns, or relationships between and among variables
two common ways of dealing with outliers
1. trim the outlier 2. winsorize the outlier
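The difference between trimming and winsorizing can be sketched in a few lines of Python. The cutoffs (0 and 10) and the sample data are hypothetical; in practice the cutoffs are usually chosen as percentiles of the distribution.

```python
def trim(values, low, high):
    """Drop observations outside [low, high]; the sample gets smaller."""
    return [v for v in values if low <= v <= high]

def winsorize(values, low, high):
    """Keep every observation, but pull extreme values back to the cutoffs."""
    return [min(max(v, low), high) for v in values]

data = [2, 3, 4, 5, 100]             # 100 is the outlier
trimmed = trim(data, 0, 10)          # outlier removed
winsorized = winsorize(data, 0, 10)  # outlier replaced by the upper cutoff
print(trimmed, winsorized)
```

Trimming shrinks the sample from five observations to four, while winsorizing keeps all five and only changes the value of the outlier.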
unbalanced panel
A panel data set in which some data are missing.
Total Value Locked, or TVL, is an example of a new accounting metric that is made possible through DLT technology. In the scenario below, what is the TVL for A355 Protocol? Abby deposits $1,000 into A355 protocol to validate transactions (staking) Lydia lends $1,000 in digital assets into A355 protocol for lending. A355 protocol itself lends out $800 of Lydia's original investment. a. $2,000 b. $2,800 c. $1,200 d. $3,600
Answer: A
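The arithmetic behind this answer, as a small sketch. It follows the card's answer key: only assets deposited into the protocol count toward TVL, and the $800 the protocol lends back out comes from Lydia's deposit, so it is not a new deposit. The variable names are made up for illustration.

```python
# Assets deposited into the A355 protocol (from the scenario above).
deposits = {
    "Abby (staking)": 1000,
    "Lydia (lending)": 1000,
}

# The protocol re-lends part of Lydia's deposit; this is not a new deposit,
# so under the card's answer key it does not add to TVL.
lent_out_by_protocol = 800

tvl = sum(deposits.values())
print(tvl)
```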
You are worried that you have a correlated omitted variables problem. Which would best help you address this problem? a. ensuring your model has a high r^2 b. Ensuring your Beta coefficient is statistically different from zero at a tail end of the probability distribution (i.e. 1% or lower) c. Including the correlated omitted variable you are concerned about as an additional explanatory variable in your regression. d. ensuring the intercept of your model goes through the origin
Answer: C
You estimate the following trend line analysis (linear regression) in Tableau, and get the following output. What is B1 in this regression? Awesomeness = 0.620356*You + 0.0370451 a. awesomeness b. 0.0370451 c. 0.620356 d. 0.0001 e. You.
Answer: C
what type of task would Robotic Process Automation, a currently commonly used 4th industrial revolution technology, be well suited for? Please select the best answer. a. providing the firm guidance b. creating balance sheets given unstructured information c. classifying lease agreements based on shared keywords d. identifying the most profitable future business opportunities, given accounting data
Answer: C
which of the following does Tableau not do? Please select the best answer a. visualize data b. compute measures c. create data d. organize data e. tableau does all of the other listed answers
Answer: C
What aspect of distributed ledger technology allowed for the development of airdrops, a brand new capital allocation mechanism? Please select the best answer. a. Because consensus for the ledger is generated via mining, the winning accountant can share their reward for entering transactions into the ledger. b. The decentralized portion of the ledger allows for investors to search out early stage ventures. c. Triple entry accounting allows for startups to interact with established corporations d. Because the ledgers are publicly available, start-up companies can send assets to entities based on their transaction history with other companies.
Answer: D
You wish to investigate how well the independent variables in your regression explain the variance in Y. Which statistic should you use? a. beta coefficients b. alpha coefficients c. p values d. R^2
Answer: D
You have two datasets that you wish to combine. If you did a left-join, which firms WOULD be in the final dataset? Please select ALL firms that would be in the final dataset. Dataset #1 (Firm / Accruals): A 0.50; B 2.70; C -0.55. Dataset #2 (Firm / Cash Flow): A -1.0; C 5.0; D 7.0. a. Firm A b. Firm B c. Firm C d. Firm D
Answer: Firm A, Firm B, Firm C
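The join types can be checked against the two datasets in this question with a short standard-library sketch. The `join` helper is hypothetical (written for this example, not a Tableau or pandas function); missing values are shown as `None`.

```python
accruals = {"A": 0.50, "B": 2.70, "C": -0.55}   # Dataset #1
cash_flow = {"A": -1.0, "C": 5.0, "D": 7.0}     # Dataset #2

def join(left, right, how):
    """Return {firm: (left value, right value)} for the chosen join type."""
    if how == "left":      # every firm in the first dataset
        keys = list(left)
    elif how == "right":   # every firm in the second dataset
        keys = list(right)
    elif how == "inner":   # only firms present in both datasets
        keys = [k for k in left if k in right]
    elif how == "outer":   # every firm in either dataset
        keys = list(left) + [k for k in right if k not in left]
    else:
        raise ValueError(f"unknown join type: {how}")
    return {k: (left.get(k), right.get(k)) for k in keys}

left_join = join(accruals, cash_flow, "left")
print(list(left_join))   # firms kept by the left join
print(left_join["B"])    # Firm B has accruals but no cash flow match
```

Firm D drops out of the left join because it appears only in the second dataset, while Firm B stays with a missing cash flow value.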
blockchain trilemma
a blockchain has tradeoffs, and it's very difficult to simultaneously achieve 1. decentralization 2. security 3. scalability (Ex: Ethereum aims for decentralization and security, but as a consequence it is relatively slow and expensive)
random forest
a grouping of decision trees, injecting randomness at each stage of the process
management by exception
a managerial style that allows management to spend its time addressing issues/problems
regression
a predictive analytics technique that allows the accountant to estimate a specific dependent variable outcome value based on independent inputs (shows the on-average associations between variables of interest, standard regressions assume linear relationships, altman's z score is derived from the coefficients estimated using a linear regression)
classification
a predictive analytics technique used to separate or classify a sample (or population) into two or more groups or classes--fraud/no fraud, extend loan/do not extend loan, etc. (predicting a probabilistic outcome, or what might happen based on our forecasts)
smart contract
a self-executing agreement between two or more parties where the terms are written directly into code--they have dominion over the assets in the contract (Ethereum)
time series analysis
a tool/technique used to predict future values based on past values of the same variable
machine learning
a type of artificial intelligence: the ability of a computer to learn automatically on its own without being explicitly programmed to do so
smart contracts - wallet
a wallet is a way to access your funds on the blockchain; secured with a private key; MetaMask is the most common wallet
stablecoins
attempt to stabilize value at $1 (or whatever); imagine a crypto asset that is always (hopefully) worth a dollar; can use very quickly to send money (biggest ones currently are USDT and USDC); not the same as CBDCs (Central Bank Digital Currencies), which are issued by central banks rather than private issuers
decentralized exchanges (DEX)
automated, permissionless, marketplaces that allow instant real-time trading (or swapping) of digital assets
4th Industrial Revolution
based on the use of cyber physical systems
tableau
best for data visualization; does not allow for original data entry
double entry accounting
both parties record the transaction, but there is no way to know if either recorded it right
base rates
defined as probability of an event occurring based on a related historical average
blockchain
distributed, permissionless ledger (anyone can see the ledger, ledger is updated in real time), essentially a distributed database
in class example of hash
hash = nonce + a + b + c - value of last 2 digits of previous hash a = value of the first letter of the transaction b = value of the first letter of the "from" party c = value of the first letter of the "to" party
veracity
how trustworthy
Altman's Z score decision rules
if z < 1.8: classify as significant risk of bankruptcy, or in the "distress zone"; if z >= 1.8 and z < 3: classify as at risk of bankruptcy, or "gray zone"; if z >= 3: classify as not currently at risk of bankruptcy, or "safe zone"
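These decision rules translate directly into an if / elif / else chain. A minimal Python sketch, with a made-up function name:

```python
def classify_altman_z(z: float) -> str:
    """Map an Altman's Z score to its zone using the cutoffs on this card."""
    if z < 1.8:
        return "Distress Zone"
    elif z < 3:   # reached only when z >= 1.8
        return "Gray Zone"
    else:         # z >= 3
        return "Safe Zone"

for score in (1.2, 1.8, 2.5, 3.0):
    print(score, classify_altman_z(score))
```

Because each branch is checked only when the previous ones failed, the second test does not need to repeat `z >= 1.8`.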
inner join
keep only the rows that match in both datasets
hash functions
mathematical functions that transform a given set of data into a string of fixed size; it is deterministic: same input will always produce the same output; one way encoding - can't get original message from encoded
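The two properties on this card (fixed output size, deterministic) can be seen with Python's built-in `hashlib`. SHA-256 is used here only as a familiar real-world hash function; it is not the simplified hash from the in-class example, and the helper name `sha256_hex` is made up.

```python
import hashlib

def sha256_hex(message: str) -> str:
    """Return the SHA-256 hash of a text message as a hexadecimal string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

short_hash = sha256_hex("A pays B 5")
long_hash = sha256_hex("A pays B 5 " * 1000)  # much longer input

print(len(short_hash), len(long_hash))        # same length for any input
print(short_hash == sha256_hex("A pays B 5")) # same input, same output
```

Changing even one character of the input produces a different hash, and there is no practical way to recover the input from the output.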
proof of stake
miners put up collateral (stake). they stake their collateral on not submitting false entries. if found to commit false entries, they lose their collateral entirely (miners putting up stake are rewarded with more tokens, don't have to be winning miner to get tokens)
databases
most secure method of storing data
nonce
number only used once, helps calculate the hash function
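One common use of a nonce, sketched below as an illustration: keep trying nonce values until the hash of the data plus the nonce meets a target. The target (a hash starting with "00"), the use of SHA-256, and the function name `find_nonce` are all assumptions for this example, not the in-class hash.

```python
import hashlib

def find_nonce(block_data: str, prefix: str = "00") -> int:
    """Try nonces 0, 1, 2, ... until the hash starts with the given prefix."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce
        nonce += 1

nonce = find_nonce("A pays B 5")
print(nonce)
```

Finding the nonce takes trial and error, but anyone can verify it with a single hash computation.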
base rate fallacy
occurs when the prediction places too little weight on the base rates of the past and instead uses different or new information
to be useful to decision makers, accounting needs to be both _______ and ____________ the substance of what actually occurred
relevant; faithfully represent
r^2
represents the proportion of variance of the y variable that is explained by the x variables
velocity
speed of generation or rate of analysis
avoid overfitting by
splitting into training and test samples (can be sequential or random)
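Both kinds of split can be sketched with the standard library. The 20% test share, the seed, and the function names are arbitrary choices for illustration.

```python
import random

def sequential_split(rows, test_share=0.2):
    """Use the earliest rows for training and the latest rows for testing."""
    n_test = int(len(rows) * test_share)
    cut = len(rows) - n_test
    return list(rows[:cut]), list(rows[cut:])

def random_split(rows, test_share=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out a share for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))  # stand-in for ten observations ordered by time
seq_train, seq_test = sequential_split(rows)
rnd_train, rnd_test = random_split(rows, seed=42)
print(seq_train, seq_test)
print(rnd_train, rnd_test)
```

The model is fit on the training sample only; performance on the held-out test sample shows whether it generalizes or has merely memorized the training data.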
left join
start with the first dataset and keep all of its rows; add columns from the second dataset only for the rows that match
right join
start with the second dataset and keep all of its rows; add columns from the first dataset only for the rows that match
total value locked (TVL)
sum of all assets deposited in crypto protocols earning rewards, interest, new coins and tokens, and fixed income
ordinary least squares (standard regression)
the regression line is fit so that the sum of the squared errors from the line is minimized
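For a single x variable, the line that minimizes the sum of squared errors has a closed-form solution. The sketch below computes the intercept, the slope (B1), and R^2 on made-up data; the function names are hypothetical.

```python
def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]                 # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_ols(x, y)
r2 = r_squared(x, y, intercept, slope)
print(intercept, slope, r2)
```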
balanced panel
the variables are observed for each entity and each time period
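A quick way to test this definition: a panel is balanced when every company appears in every year. The sketch below applies a hypothetical `is_balanced` helper to the four examples from the balanced-panel question earlier in this set.

```python
def is_balanced(panel):
    """True if every entity is observed in every time period of the panel."""
    entities = {entity for entity, _ in panel}
    periods = {period for _, period in panel}
    observed = set(panel)
    return all((e, t) in observed for e in entities for t in periods)

example_a = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1)]
example_b = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2)]
example_c = [("A", 1), ("B", 1), ("C", 1), ("A", 2), ("B", 2), ("D", 2)]
example_d = [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("B", 3)]

for name, panel in [("A", example_a), ("B", example_b),
                    ("C", example_c), ("D", example_d)]:
    print(name, is_balanced(panel))
```

Only Example D passes: in the others at least one company is missing a year.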
box charts can be used to...
visualize the distribution of data points, identify outliers
value transfer in a distributed ledger network
we call the computers in this network "miners" but really they are accountants (they all vote on the correct hash; it is decentralized because each one solves and votes separately); they are verifying authentication and authorization, just like banks do
supervised learning
we tell the algorithm what the input/output data is
hypothesis testing
we test the hypothesis and then see if our hypothesized relation is significantly different from zero
prescriptive analytics
what should we do, based on what we expect will happen? how do we optimize our performance based on potential constraints?
diagnostic analytics
why did it happen? what are the root causes of past results?
predictive analytics
will it happen in the future? what is the probability something will happen? is it forecastable?
Altman's Z 5 factors that predict bankruptcy
x1: working capital / total assets; x2: retained earnings / total assets; x3: earnings before interest and taxes / total assets; x4: market value of stockholders' equity / book value of total debt owed; x5: sales / total assets
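The five ratios combine into a single score as a weighted sum. The sketch below uses the commonly cited weights from Altman's original model for public manufacturing firms (1.2, 1.4, 3.3, 0.6, 1.0); only the 1.2 weight on x1 appears elsewhere in this set, so treat the others as an assumption about which version of the model the course uses. The firm's numbers are made up.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, total_debt, sales, total_assets):
    """Altman's Z from the five ratios, using the commonly cited weights."""
    x1 = working_capital / total_assets
    x2 = retained_earnings / total_assets
    x3 = ebit / total_assets
    x4 = market_value_equity / total_debt
    x5 = sales / total_assets
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5

# Hypothetical firm (all amounts in the same currency units).
z = altman_z(working_capital=20, retained_earnings=30, ebit=10,
             market_value_equity=120, total_debt=60, sales=150,
             total_assets=100)
print(round(z, 2))
```

Every weight is positive, so a higher value of any ratio raises Z and lowers the predicted bankruptcy risk.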
type 2 error
your pregnancy test says you're not pregnant but you are (false negative)
type 1 error
your pregnancy test says you're pregnant but you're not (false positive)
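Counting both error types from a list of predictions takes only a few lines. The data and the function name below are made up; `True` means a positive prediction or a truly positive case.

```python
def count_errors(predicted, actual):
    """Return (type 1 errors, type 2 errors) for paired predictions and outcomes."""
    type_1 = sum(1 for p, a in zip(predicted, actual) if p and not a)  # false positives
    type_2 = sum(1 for p, a in zip(predicted, actual) if not p and a)  # false negatives
    return type_1, type_2

predicted = [True, True, False, False, True]
actual    = [True, False, True, False, True]
type_1, type_2 = count_errors(predicted, actual)
print(type_1, type_2)
```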
What formula in Tableau allows us to classify firms based on their Altman's Z scores? Assume that we have previously coded the Altman's Z formula in Tableau (as in our lab), and that we named this encoded variable "Altman's Z score". Please read carefully. (Hint, compare the formulas and you should be able to identify the errors in the incorrect ones). a. IF [Altman's Z Score]< 1.8 THEN "Distress Zone" ELSEIF [Altman's Z Score] >=1.8 AND [Altman's Z Score] < 3 THEN "Grey Zone" ELSE "Safe Zone" END b. IF [Altman's Z Score]< 1.8 THEN "Distress Zone" IF [Altman's Z Score] >=1.8 AND [Altman's Z Score] < 3 THEN "Grey Zone" ELSE "Safe Zone" END c. [Altman's Z Score]< 1.8 THEN "Distress Zone" [Altman's Z Score] >=1.8 AND [Altman's Z Score] < 3 THEN "Grey Zone" ELSE "Safe Zone" END d. IF [Altman's Z Score]< 1.8 THEN "Distress Zone" ELSEIF [Altman's Z Score] >=1.8 AND [Altman's Z Score] < 3 THEN "Grey Zone" ELSE "Safe Zone"
Answer: A
What is one problem of double entry accounting that distributed ledgers can solve? a. Distributed ledgers allow for triple entry accounting, in which each party must record the same value for the transaction, where in double entry accounting each party could record a different entry. b. Distributed Ledgers use less energy to maintain compared to double entry ledgers. c. Distributed ledgers allow for each party to keep their transactions completely private and secret from competitors, whereas for double entry accounting these transactions are open for anyone to see. d. Double entry accounting requires a lot of complex understanding of accounting concepts, whereas triple entry accounting requires less understanding of accounting concepts.
Answer: A
Assume we have a regression of the form: Consulting Fees = α + β1(Kelley Alum) + β2(High School GPA) + ε assume that: Consulting Fees is measured in Millions Kelley Alum is an indicator variable equal to one if the consultant is a Kelley Alum, and zero if they are not a Kelley Alum. High School GPA is a continuous variable which is equivalent to the consultant's high school GPA. Assume that β1 is equal to 3, α is equal to 1, and β2 is equal to 2. Assume all variables are statistically significant at the <1% level. Which of the following are correct interpretations of this result? MORE THAN ONE CAN BE CORRECT, SO PLEASE SELECT ALL CORRECT INTERPRETATIONS. a. On average Kelley alums are associated with higher consulting fees, even after controlling for the association between high school GPA. b. On average Kelley alums are associated with higher consulting fees, but this effect is conditional upon having a high school GPA greater than or equal to 2. c. On average a higher high school GPA is associated with lower consulting fees. d. As high school GPA increases by one unit, consulting fees are on average 2 million dollars higher.
Answer: A and D
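Plugging numbers into the fitted equation shows why A and D are correct. The coefficients come from the question; the function name and the example consultants are made up.

```python
ALPHA = 1.0      # intercept
BETA_ALUM = 3.0  # coefficient on the Kelley Alum indicator
BETA_GPA = 2.0   # coefficient on High School GPA

def predicted_fees(kelley_alum: int, hs_gpa: float) -> float:
    """Predicted consulting fees in millions of dollars."""
    return ALPHA + BETA_ALUM * kelley_alum + BETA_GPA * hs_gpa

alum = predicted_fees(kelley_alum=1, hs_gpa=3.0)
non_alum = predicted_fees(kelley_alum=0, hs_gpa=3.0)
higher_gpa = predicted_fees(kelley_alum=0, hs_gpa=4.0)

print(alum - non_alum)        # alum difference, holding GPA fixed
print(higher_gpa - non_alum)  # effect of one more GPA point
```

Holding GPA fixed, the alum is predicted to earn $3 million more (answer A); holding alum status fixed, one more GPA point adds $2 million (answer D).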
Assume we have a regression of the form: Consulting Fees = α + β1(Kelley Alum) + ε and that Consulting Fees is measured in Millions, while Kelley Alum is an indicator variable equal to one if the consultant is a Kelley Alum, and zero if they are not a Kelley Alum. Assume that β1 is equal to 3, while α is equal to 1. What is the correct interpretation of this result? a. On average Kelley Alums are associated with an increase of $4 Million in consulting fees. When the individual is not a Kelley Alum, consulting fees are expected to be $1 million. b. On average Kelley Alums are associated with an increase of $3 Million in consulting fees. When the individual is not a Kelley Alum, consulting fees are expected to be $1 million. c. On average Kelley Alums are associated with an increase of $1 Million in consulting fees. When the individual is not a Kelley Alum, consulting fees are expected to be $3 million. d. On average Kelley Alums are associated with an increase of $4 Million in consulting fees. When the individual is not a Kelley Alum, consulting fees are expected to be $1 million.
Answer: B
Which way of dealing with an outlier retains the observation that contains an outlier, but changes the value of the variable in question? a. trim b. winsorize c. cluster d. trim
Answer: B
why are balanced panels helpful in accounting analytics? a. they allow us to more easily compute variable in Tableau b. they allow us to ensure that changes over time are not driven by changes in sample composition c. they allow us to best generalize to the newest real-world data d. they allow us to make sure our economic magnitudes are balanced e. they allow for more accurate tests of statistical significance
Answer: B
Based on firm characteristics, you predict that Company A is likely to commit fraud, while Company B is not likely to commit fraud. What type of analysis is this? a. Regression Analysis b. Deterministic Analysis c. Probabilistic Analysis d. Prescriptive Analysis e. none of the other answers are correct
Answer: C
If we wanted to treat each observation from a dataset as an individual unit in Tableau, so that, for example, we could do a trend line analysis... What would we classify the data type as? Hint - think about the steps we took to do trend line analysis in either in-class diagnostic exercise #1, or in-class diagnostic exercise #2. a. attribute b. measure c. dimension d. count
Answer: C
In accounting analytics we use different tools. When comparing Python versus Tableau, which is TRUE? a. python is easier to work with in creating visualizations b. tableau is cheaper than python c. Python allows for multiple independent variables to be used in a regression, while Tableau only allows for one independent variable. d. tableau allows for more advanced machine learning algorithms
Answer: C
Of the choices below, which type of analysis would you choose if you wish to easily convey a correlation or association between two variables? Choose the best answer a. pie chart b. histogram c. trend line d. bar chart
Answer: C
Which of the following best describes t-statistics? a. they tell you the economic significance of the variable in question b. if a t-statistic is below 1.65, you shouldn't care about what the analysis says c. they are measures of statistical significance d. they inform you whether the relationship examined is causal
Answer: C
Which of the following is NOT true? a. In Altman's Z classification, we can take the ratio of Working Capital/Total Assets, and multiply this variable by 1.2 to help in calculating the Z score. b. The Z scores were originally derived from regressions, and we use the estimated coefficients from that regression to now classify firms based on bankruptcy risk. c. Altman's Z score predicts that as the ratio of Retained Earnings/Total Assets increases, the company has a higher bankruptcy risk. d. An Altman's Z score of greater than 3 indicates the company is classified as not currently at risk of bankruptcy, or is in the "safe zone" e. all of the answers are true
Answer: C
