SPI Interview
A random variable X follows a normal distribution. An observation of X,
denoted by x, has a z-score of 2. This means that x is two standard deviations above the mean of X
Long term sources of finance for private firm
-Listing on Stock Exchange, Private equity -Bank
Formula: ∆put
-e^(-∂T)*N(-d1)
if product is in growth stage
-emphasize marketing and competition
EFE matrix steps:
1. List key external factors 2. Weight each from 0 to 1 3. Rate the effectiveness of current strategies 4. Multiply weight by rating 5. Sum the weighted scores
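The five steps above can be sketched in a few lines. The factor names, weights, and the 1-to-4 rating scale below are hypothetical illustrations, not values from these notes.

```python
# Hypothetical external factors as (name, weight, rating).
# Assumption: weights sum to 1.0 and ratings run from 1 (poor) to 4 (superior).
factors = [
    ("Growing demand in segment", 0.30, 4),
    ("New low-cost entrant", 0.25, 2),
    ("Favorable regulation", 0.20, 3),
    ("Rising input costs", 0.25, 1),
]


def efe_total(factors):
    """Return the total weighted score: the sum of weight * rating."""
    total_weight = sum(weight for _, weight, _ in factors)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weight * rating for _, weight, rating in factors)


print(round(efe_total(factors), 2))
```

A total near the middle of the rating scale suggests an average response to the external environment under this scoring.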
110010
62
Possible Values: Vega(put)
>0
What is cross-validation?
It's a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. Mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independent data set.
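A minimal sketch of one common variant, k-fold cross-validation, assuming samples are addressed by index. The function name `k_fold_indices` is hypothetical, and the sketch only produces the splits; fitting and scoring a model on each split is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(10, 3):
    print(len(train), validation)
```

Each sample lands in exactly one validation fold, so every observation is used for both training and validation across the k rounds.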
Synthetic Treasury
Ke^(-rT)=P(S,K,T)-C(S,K,T)+F0,T[S]*e^(-rT)
KPI
Key Performance Indicator or metric or feature.
Why is sales revenue important?
Knowledge of revenue and profit by customer, by product, by sales channel, for example, is important.
What is a term loan?
A loan of a fixed amount for an agreed time and on specified terms
What are percentiles?
A percentile is a metric indicating a value, below which a percentage of values falls.
What is latent semantic indexing used for?
Learning correct word meanings, subject matter comprehension, information retrieval, sentiment analysis (social network analysis)
4 Ps
Product Price Promotion Place
21 Ways to Cut Costs Finance
Have customers pay sooner, refinance your debt, sell nonessential assets, hedge currency rates, redesign health insurance
000100
04
Going concern assumption
Assume the company has the ability to operate indefinitely, unless told otherwise
010001
21
1000
8
Principles - Monitoring Activities
*S*eparate and/or *O*ngoing Evaluations Communication of *D*eficiencies
5%
#/2 and #/10
33 and 1/3%
#/3
Current assets
-Cash -Short-term investments -Accounts receivable -Inventory -Prepaid expenses
What should we worry about if we have an experiment with 20 different metrics?
The more metrics you are measuring, the more likely it is you'll get a false positive
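A quick calculation shows how fast the risk grows. This sketch assumes the 20 tests are independent and each uses a 5% significance level; the Bonferroni adjustment is one standard remedy, added here as an illustration and not taken from these notes.

```python
def familywise_error_rate(n_metrics, alpha=0.05):
    """P(at least one false positive) across independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_metrics


def bonferroni_alpha(n_metrics, alpha=0.05):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / n_metrics


print(round(familywise_error_rate(20), 3))
print(bonferroni_alpha(20))
```

With 20 metrics the chance of at least one spurious "significant" result is well over one half, which is why a stricter per-metric threshold is usually applied.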
Star Schema
The star schema is a simple relational model that is easy to understand and that represents business transactions in a lucid manner
What are examples of supervised learning?
Classification, Neural Networks, Regression
14
E (hex)
P(S,K,T) Payoff
max(0,K-S(T))
Liquidation
Selling all of a company's assets, in parts, for their tangible worth -Most extreme -Can be very emotional
reducing churn can significantly impact your __________________
bottom line (profitability)
chance / event node
circle
Company
how does the company create and capture value
t-scores are _____ than z-scores
larger
Covariance
measures the direction of a linear relationship between two variables; its magnitude depends on the units, so strength is read from the correlation
Correlation is useful only for
measuring the strength of a linear relationship
Symmetric distributions
n>15
Standard Deviation
the square root of variance
What is recall?
tp / (tp + fn)
7-S framework
"hard" -- strategy, structure, systems "soft" -- style, skills, staff, shared values
10%
#/10
1%
#/100
Legacy systems
out of date systems
000101
05
NYC population
8.5 million
d ? exp[(r-∂)h] ? u
<
Possible Values: Ψput
>0
What are Recommender Systems?
A subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product. Recommender systems are widely used in movies, news, research articles, products, social tags, music, etc.
What is a Broadcast Variable?
Ans: Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.
Growth Strategies Approaches
Assessment Strategies
what is click stream
Clickstream analysis is the process of collecting and analyzing data about website visitors' mouse clicks
Option Greek Definition: Delta
Increase in option value per increase in stock price (∂C/∂S) (SLOPE)
At-the-Money
S=K
Calendar Spread
Sell C(S,K,T) and Buy C(S,K,T+t) for t>0
Maximax
Tries for the best outcome "Optimist"
What percentile does the median represent?
The 50th percentile
snowflake schema
an enhancement to the standard star schema
ed (economic depreciation)
annual cash investment required to replace fixed assets
What is a Liability?
(Anything the company owes) A present obligation of the entity arising from past events, the settlement of which is expected to result in an outflow of resources.
sloan ratio
(net income - operating cash flow - investing cash flows) divided by (total assets)
Fact table
contains facts about the business
Variety
different forms of data
Two external dimensions
Stability positions and industry position (SP & IP)
Porter's 5 Forces 3. Pressure from substitute Products
Sugar vs High fructose Corn Syrup
Possible Values: Vega(call)
>0
Liquidity and solvency are associated with the
Balance sheet
What does the income statement report?
The profit (or loss) made by the organization in the period that has just ended
Strategy/Performance *Technology leader*
Short product cycles, high inventory turnover, high R&D costs.
debt to assets ratio
Total debt/total assets
ST: Strengths and threats
Use a firm's strengths to avoid or reduce the impact of external threats
Earnings per share
Used heavily with the P/E ratio
Are expected value and mean value different?
They are not different, but the terms are used in different contexts. Mean is generally used when talking about a probability distribution or sample population, whereas expected value is generally used in a random variable context.
Virtual Cubes
Virtual cubes are a combination of cubes or a portion of a cube that has been segmented for business analysis or for security reasons.
Ansoff Matrix
Ways to grow: Product Development (new p, existing m) Market Development (existing p, new m) Market Penetration (existing p, existing m) Diversification (new p, new m)
what is the leading indicator of future sales?
CSAT (customer satisfaction)
Needed when there are outliers
median
Liquidity, solvency, and profitability are ______. A company that cannot pay its debts will have difficulty obtaining credit, which can decrease its profitability.
Interrelated
The sampling distribution of the mean becomes approximately normal as
the sample size grows
what are foreign keys?
Columns that reference the primary key of another table. E.g., in a Sales Order table, the Customer Number is a foreign key because it references the Customer table's primary key.
Of the four interactions, which has no impact on the tables?
Read
100%
x1
Unmodified (Unqualified) Opinion
*Clean Opinion* -States that the FS present fairly, in all material respects, the financial position, results of operations, and cash flows in conformity with the financial reporting framework. -Unmodified: nonissuers -Unqualified: issuers -The auditor may determine it necessary to add communication to the auditor's report without modifying the opinion: emphasis-of-matter (nonissuers), other-matter (nonissuers), and explanatory paragraphs (issuers)
Misstatements Related to the Appropriateness of Financial Statement Presentation or Disclosure
- FS do not include all required disclosures - Disclosures are not in accordance with the framework - FS do not provide necessary disclosures - Required information (Statement of CF) has not been included
Why use Spark when Hadoop exists? (streaming)
-Near real-time data processing: Spark also supports near real-time streaming workloads via Spark Streaming application framework
000011
03
Qualified Opinion Due to Inadequate Disclosure - Nonissuers (Private Company)
1. Intro Paragraph 2. Management's Responsibility Paragraph 3. Auditor's Responsibility Paragraph - "qualified audit opinion" 4. Basis for Qualified Opinion 5. Qualified Opinion Paragraph - "except for," "the financial statements are presented fairly"
Developing a New Product Step
1. Think about the Product 2. Think about Market Strategy 3. Think about the Customers 4. How are we going to get Funding
Nonissuer Report Qualified Opinion
1.Introductory paragraph: no change. 2.Management's responsibility paragraph: no change. 3.Auditor's responsibility paragraph. 4.Basis for qualified opinion paragraph. 5.Qualified opinion paragraph.
P(1/x,1/K,T)
C(x,K,T)/(Kx)
010111
27
Inventory Conversion Period
365/Inventory Turnover
011111
37
What is a Transformer?
A Transformer is an algorithm which can transform one DataFrame into another.
Options Combination
A combination is defined as any strategy that uses both puts and calls
Cross Validation
A model validation technique that splits training data into two parts: one is a training set and the other is a validation set. Checks how well a model will generalize to new data.
Explain what a local optimum is?
A solution that is optimal within a neighboring set of candidate solutions, in contrast with a global optimum: the optimal solution among all candidates
How would you evaluate a logistic regression model?
A subsection of the question above. You have to demonstrate an understanding of what the typical goals of a logistic regression are (classification, prediction etc.) and bring up a few examples and use cases.
Estimator
An estimator is an algorithm which is fit on a DataFrame to produce a Transformer
Please define executors in detail?
Ans: Executors are distributed agents responsible for executing tasks. Executors provide in-memory storage for RDDs that are cached in Spark applications. When executors are started they register themselves with the driver and communicate directly to execute tasks.
How would you set the amount of memory to allocate to each executor?
Ans: SPARK_EXECUTOR_MEMORY sets the amount of memory to allocate to each executor.
Why is it important to have a robust set of metrics for machine learning?
Any ML technique should be evaluated by using metrics for assessing the quality of results.
Type 5:
Best value focus strategy that offers products or services to a small group of customers at the best price available on the market
BCG Matrix
Cash Cows, Stars, Pets, Question Marks. Companies that have diversified portfolios have Cash Cows, Stars, and Question Marks
What do you understand by a closure in Scala?
Closure is a function in Scala where the return value of the function depends on the value of one or more variables that have been declared outside the function.
5 Cs
Company Costs Consumers Competition Climate
Comprehensive income
Company takes net income and adjusts it for unrealized gains and losses (in investments in stocks and bonds and foreign currency translations)
Market Penetration
Current Customers / Total Potential Customers
Nestle- Company
Current Market Share Growth Rate WCS Brands Pricing Strategy
Perpetuities (Zero Growth Stock)
Dividend/Required Return
Advantages of Equity Financing
Does not need to be paid back. Dividend can be cut/ stopped. Improves Solvency/Capital
BCG Matrix Stars
High Market Share High Growth Can be Leader in the market which gives a lot of added benefits Growth in Market>Growth in Share Increases in Share> Increase Margins High Margin Will eventually become a cash cow
Developing a New Product 2. Think about the Marketing Strategy
How does this strategy affect our existing product line? Are we cannibalizing our own sales from an existing product? Are we replacing an existing product? How will this strategy expand our customer base and increase our sales? What will the competitive response be? If we are entering a new market what are the barriers to entry? Who are the major players and what are their respective market shares?
Practicable
Information is reasonably obtainable from management's accounts and records and that providing the information in the auditor's report doesn't require the auditor to assume the position of a preparer of financial information
what is the difference between informational and transactional systems?
Informational is used for analysis and transactional is used to record transactions.
Diversification strategies:
Introducing a new product or service -Related diversification, unrelated diversification
What is the 68-95-99.7 / Empirical Rule?
It is a shorthand rule for remembering the percentage of values in a normal distribution that lie within a given number of standard deviations of the mean: about 68% of values fall within 1 SD, 95% within 2 SD, and 99.7% within 3 SD
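The three percentages can be checked numerically. This sketch uses the standard identity P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal Z; the function name `prob_within` is hypothetical.

```python
import math


def prob_within(k):
    """P(|Z| <= k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

By symmetry, the probability of falling more than k standard deviations below the mean is (1 - prob_within(k)) / 2, which is about 2.3% for k = 2.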
Put Boundaries
K≥Pamer≥Peur≥max(0,Ke^(-rT)-F0,T[S]*e^(-rT))
Long Term Assets
Land+Building+Equipment
What is Linear Regression?
Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
Strategy/Performance *Strategy - Superior quality*
Lower volumes, higher margins -higher marketing/R&D costs
Response to competition - investigate
-What is the competitor's new product and how does it differ from what we offer? -What has the competitor done differently? -What's changed? - Have any other competitors picked up market share?
Developing a new product - Why this product
-What's special or proprietary about our product? -Is the product patented? -Are there similar products out there? Substitutions? -What are the advantages and disadvantages of the new product? -How does this new product fit into the rest of product line?
internal factors
-strategy -operations: marketing & sales, operations & logistics, finance & control, organization & culture, R&D
l
01101100
n
01101110
what are the components of enterprise warehouse?
1. *data acquisition layer*-This is where data from the source system(s) enter the data warehousing system. 2. *propagation layer*- stores data so that they can be used and reused for multiple applications. 3. *Corporate memory* is the destination of all corporate (organizational) data in their granular and harmonized forms.
What do Investors look for? (5)
1. A viable business opportunity 2. Realistic path to profitability 3. Understanding of any technology 4. The abilities of the people involved 5. Commitment of key personnel
What are the contents of a Business Plan?
1.Executive Summary 2.Company Description 3.Products and Services 4.Markets 5.Technology 6.Competition/Competitive Advantage 7.Business Model/Key Customers 8.Operations 9.Management 10.Financial Information - current position 11.Financial Projections - milestones
Nonissuer Report Adverse Opinion
1.Introductory paragraph: no change. 2.Management's responsibility paragraph: no change. 3.Auditor's responsibility paragraph. 4.Basis for adverse opinion paragraph. 5.Adverse opinion paragraph.
Asia
4.436 billion
100000
40
110001
61
110011
63
Example of Tagged data
<message> <to>mary</to> <from> john </from> <date> april 9, 2015 </date> <content> hello mary </content> </message>
Days Sales of Inventory
= Inventory / cost of sales * 365; the average length of time a company's cash is tied up as inventory
acquisition cost
= cost per contact / take rate (aka % of customers who accept an offer)
what does the process of analytics include?
(a) identifying the problem, (b) gathering relevant data that frequently are not in a usable form (c) cleaning up the data to make them usable, (d) loading them into data storage models,
what does NLP reflect?
(a) when people are communicating with their computer they should be able to speak as they would to another person (b) the computer should be able to translate the speech into information and commands it can understand.
quick ratio
(current assets-inventory)/current liabilities
Central limit theorem
As long as the sample size is large enough, the distribution of the sample mean will be approximately normal, whatever the shape of the underlying population
Revenue
Comes before earnings- known as sales or top line
Principles - Control Environment
Commitment to *E*thics and Integrity *B*oard Independence *O*rganizational Structure Commitment to *C*ompetence *A*ccountability
Competition
Consolidation Industry Growth Product differences Exit Barriers
000000
00
b
01100010
z
01111010
starting a new business
1. entering a new market. does it make good business sense? -who's the competition? market share? how do their products compare to ours? -barriers to entry? 2. venture capitalist perspective -management -market and strategic plans -distribution channels -products -customers -finance
Barriers to studying a whole population
COST and SPEED are the main barriers to studying the whole population
Shareholders equity
Capital stock+ retained earnings
Average Absolute Variation
1. Find how far (in absolute value) each data point is from the average 2. Take the average of these distances
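The two steps translate directly into code. A minimal sketch, with a hypothetical function name and made-up data:

```python
def average_absolute_deviation(values):
    """Mean of the absolute distances between each value and the average."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


# Hypothetical data: mean is 5, distances are 3, 1, 1, 3.
print(average_absolute_deviation([2, 4, 6, 8]))
```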
Use Mergers and Acquisitions framework when you hear
Is this merger a good idea?
13
D (hex)
reflexive
For every x ∈ A, xRx | every guy in the set is related to itself
Anti Symmetric
For every x,y ∈ A, xRy and yRx → x=y. | if xRy and yRx then x equals y
multidimensional model
one where a value such as revenue can be viewed by multiple aspects such as
Price drivers
price elasticity, commodity product vs. differentiated, positioning, pricing strategy
what is a web service?
Simply an XML-based software system that enables users to access computing resources via a network. It uses the Simple Object Access Protocol (SOAP)
Explain what regularization is and why it is useful. What are the benefits and drawbacks of specific methods, such as ridge regression and lasso?
Used to prevent overfitting: improves the generalization of a model. Decreases the complexity of a model. Introduces a regularization term into a general loss function: adds a term to the minimization problem. Imposes Occam's Razor on the solution
Form and Content of Auditor's Report (Nonissuers = Private company): When the auditor expresses a qualified or adverse opinion due to material misstatement of the financial statements
The "Auditor's Responsibility" paragraph is modified, and the auditor's report will include a "Basis for Modification" paragraph and a "Qualified Opinion" or "Adverse Opinion" paragraph, as appropriate.
Historisation
the ability to store the historical changes to a dimension attribute
Relational Database
the relationships between tables are created through the use of unique identifiers called primary keys.
When A and B are independent
P(A and B) = P(A)P(B)
Bayes' Rule
P(A|B) = P(B|A)P(A)/P(B)
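A worked example of Bayes' Rule, expanding P(B) with the law of total probability. The screening-test numbers are hypothetical, chosen only to show how a rare event keeps the posterior low.

```python
def bayes_posterior(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A|B) = P(B|A)P(A) / P(B), with P(B) from total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical test: 99% sensitivity, 5% false-positive rate, 1% prevalence.
print(round(bayes_posterior(0.99, 0.01, 0.05), 4))
```

Even with a sensitive test, most positives are false when the condition is rare, because P(B) is dominated by the false-positive term.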
Contribution Margin ($)
Price - Variable Cost
Financial Statement Issues - Material but not pervasive
Qualified Opinion
If a Product is in its Emerging Growth Stage concentrate on.....
R&D Competition Pricing
Can you write the formula to calculate R-square?
R-Square can be calculated using the below formula - 1 - (Residual Sum of Squares/ Total Sum of Squares)
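The formula can be applied directly to observed and predicted values. A minimal sketch with made-up data; `r_squared` is a hypothetical helper name.

```python
def r_squared(y_true, y_pred):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observations and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(observed, predicted), 4))
```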
Defensive strategies:
Retrenchment, divestiture, liquidation
Gross Profit ($)
Revenue - COGS
Issuer Report (adverse opinion)
Same as adverse with middle paragraph(s) explaining the substantive reasons and the disclosure of the principal effects. The opinion paragraph has familiar language with the "because of" and "do not present fairly" wording.
Nonissuer Report (adverse opinion)
Same as qualified except for a basis for ADVERSE opinion paragraph that lays out the same types of issues. In opinion paragraph, the language includes "because of" and "do not present fairly".
Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer Introductory Paragraph:
Same as standard nonissuer audit report.
Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Management's Responsibility Paragraph.
Same as standard nonissuer audit report.
What are the three stages to build the hypotheses or model in machine learning?
The standard approach to supervised learning is to split the set of examples into a training set and a test set.
What is variance?
The tendency to learn random things irrespective of the true signal
What are the two paradigms of ensemble methods?
The two paradigms of ensemble methods are a) Sequential ensemble methods b) Parallel ensemble methods
What are two techniques of Machine Learning ?
The two techniques of Machine Learning are a) Genetic Programming b) Inductive Learning
Three Vs of Big Data
Volume, Velocity and Variety
What does the balance sheet show?
What the organization owns and owes at the end of the period
Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?
What's important here is to define your views on how to properly visualize data and your personal preferences when it comes to tools. Popular tools include R's ggplot, Python's seaborn and matplotlib, and tools such as Plot.ly and Tableau.
Explain how a ROC curve works.
The ROC curve is a graphical representation of the contrast between true positive rates and the false positive rate at various thresholds. It's often used as a proxy for the trade-off between the sensitivity of the model (true positives) vs the fall-out or the probability it will trigger a false alarm (false positives).
Predictive Analytics
analyzes past performance to predict future outcomes
Jarrow-Rudd (Lognormal) Binomial Tree: u
exp[(r-∂-0.5σ^2)h+σ√h]
Jarrow-Rudd (Lognormal) Binomial Tree: d
exp[(r-∂-0.5σ^2)h-σ√h]
Cox-Ross-Rubinstein Binomial Tree: d
exp[-σ√h]
gross profit margin %
gross profit / revenue
Describe a hash table.
A hash table is a data structure that implements an associative array. A key is mapped to its value through the use of a hash function. Hash tables are often used for tasks such as database indexing.
correlation
how securities move together; ranges from -1 to 1
what is HTTP?
hypertext transfer protocol is a standard set of rules for sending web pages over the internet
101111
57
population data: LA
4M
0101
5
u ? exp[(r-∂)h] ? d
>
t
01110100
Knowledge
created when we learn from information
10
A (hex)
15
F (hex)
Competition segments acronym
MCBSR
in other words
a subset of the data.
Comparability
inter-company comparisons
staff
people -- brains, management, motivation
Text Tables
where textual master data are stored
000010
02
Historical cost principle
An issue related to the balance sheet and income statement -Analysts want to see the fair market value
Does shuffling change the number of partitions?
Ans: No. By default, shuffling doesn't change the number of partitions, only their content
What is F1?
Combines precision and recall into a single value: their harmonic mean, 2 * precision * recall / (precision + recall)
Sources of Short-term Finance
Debt Factoring Invoice Discounting
Transitive
For every x,y,z ∈ A, xRy and yRz → xRz. | if there's some guy x that's related to y and y's related to z then, x is related to z
What are the sources of Medium-term Finance?
Leasing Benefits Hire Purchase Term Loan Securitisation
Equivalence Relation
Reflexive, Symmetric, and Transitive
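For a finite set, these properties can be tested by brute force. This sketch represents a relation R as a set of (x, y) pairs; the helper names and the two example relations are illustrative assumptions.

```python
def is_reflexive(A, R):
    return all((x, x) in R for x in A)


def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)


def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)


def is_transitive(R):
    # For every pair of links x->y and y->w, the shortcut x->w must exist.
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)


def is_equivalence(A, R):
    return is_reflexive(A, R) and is_symmetric(R) and is_transitive(R)


A = {0, 1, 2, 3}
same_parity = {(x, y) for x in A for y in A if x % 2 == y % 2}
less_or_equal = {(x, y) for x in A for y in A if x <= y}

print(is_equivalence(A, same_parity))
print(is_equivalence(A, less_or_equal))
```

"Same parity" passes all three checks, while "less than or equal" fails symmetry and is antisymmetric instead.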
Replicating Portfolio
Sue^(∂h)∆+Be^(rh)=Cu | Sde^(∂h)∆+Be^(rh)=Cd
Liquidity
The ability to pay debts as they fall due
Analytics
The use of data, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers make better, fact‐based decisions
Adverse Opinion
This opinion is expressed when the auditor concludes that the misstatements, individually or in aggregate, are both material *and* pervasive to the financial statements. (GAAP Problem)
What's a false negative?
When we wrongly fail to reject a null hypothesis that is actually false (a Type II error)
what is transactional data?
Whenever a business event is recorded by an information system, the relevant data are written to and stored in the database as transactional data
Can you use machine learning for time series analysis?
Yes, it can be used but it depends on the applications.
Probabilistic merging ( fuzzy merging )
You do a join on two tables A and B, but the keys are not compatible.
liabilities
future obligations a business is likely to owe
A business plan is used..
to facilitate implementation of the selected strategy
rule of 72
years to double = 72 / r, with r in percent (at a 10% return, an investment doubles roughly every 7.2 years)
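The rule is an approximation, so it helps to compare it with the exact doubling time. This sketch assumes annual compounding, where the exact figure is ln(2) / ln(1 + r); the function names are hypothetical.

```python
import math


def years_to_double_rule_of_72(rate_percent):
    """Approximate doubling time using the rule of 72."""
    return 72 / rate_percent


def years_to_double_exact(rate_percent):
    """Exact doubling time under annual compounding."""
    return math.log(2) / math.log(1 + rate_percent / 100)


for rate in (2, 6, 10):
    print(rate, round(years_to_double_rule_of_72(rate), 2), round(years_to_double_exact(rate), 2))
```

The approximation is closest for rates in the high single digits and drifts slowly outside that range.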
What is Chi-Square Selection?
Chi-Square is a statistical test used to understand if two categorical features are correlated.
Explain what resampling methods are and why they are useful. Also explain their limitations.
Classical parametric statistical tests compare observed statistics to theoretical sampling distributions. Resampling is a data-driven, not theory-driven, methodology based on repeated sampling within the same sample. Resampling refers to methods for doing one of the following: -Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping) -Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests) -Validating models by using random subsets (bootstrapping, cross validation)
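A minimal bootstrapping sketch: it estimates the standard error of a statistic by resampling the data with replacement. The function name, the fixed seed, and the sample values are assumptions made for illustration.

```python
import random
import statistics


def bootstrap_standard_error(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Hypothetical sample.
sample = [2, 4, 4, 5, 7, 9, 10, 12]
print(bootstrap_standard_error(sample, statistics.median))
```

No distributional formula is needed for the median here, which is the practical appeal of resampling; the cost is that the estimate can only be as representative as the original sample.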
Discrete Dividend Exercise (PUT)
Exercise ex-dividend
p > α
Fail to reject null hypothesis
Type I error
Ho is true, but we mistakenly reject it
what are the steps in figure 1.5?
Identify the goals, gather the data, design model, apply model, review results, present findings, derive insights, make decisions, deploy strategy, improve
What is involved in strategic planning?
Identify the problem you are solving by producing your product/service i.e. What is the gap in the market you are filling?
Growth Strategies Assessment Elements
Is the industry growing? How are we growing compared to the industry? Are our prices relative to our competitors? What are our competitors marketing and development strategies? Which Segments have the most potential? Funding for higher growth
What is precision?
Precision: How many selected items are relevant? TP / (TP + FP) Recall: How many relevant elements were selected? TP / (TP + FN)
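Precision, recall, and F1 can all be computed from confusion-matrix counts. The counts below are hypothetical, and the function names are illustrative.

```python
def precision(tp, fp):
    """Share of selected items that are relevant: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of relevant items that were selected: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(precision(8, 2), round(recall(8, 4), 4), round(f1_score(8, 2, 4), 4))
```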
What do you understand by "Unit" and "()" in Scala?
Unit is a subtype of scala.AnyVal and is the Scala equivalent of Java's void, giving Scala an abstraction over the Java platform. The empty tuple, i.e. (), is the term that represents the Unit value.
When are expenses accounted for?
When incurred, not when paid for
What are the two classification methods that SVM ( Support Vector Machine) can handle?
a) Combining binary classifiers b) Modifying binary to incorporate multiclass learning
Serious challenges with big data
correlation does not imply causality Large sample sizes can still be problematic Hard to analyze unstructured data (esp. video) Concerns about privacy
A company's ability to pay its current liabilities is called ______ analysis.
current position
Aspects of Big Data
data is available in real time data is available at larger scale data with less structure data on novel types of variables
Forward Price Binomial Tree: u
exp[(r-∂)h+σ√h]
Forward Price Binomial Tree: d
exp[(r-∂)h-σ√h]
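The forward-price tree factors can be computed and checked against the no-arbitrage condition d < exp[(r-∂)h] < u. The risk-neutral probability formula used below, p* = (exp[(r-∂)h] - d) / (u - d), is the standard one but does not appear in these notes, and the parameter values are hypothetical.

```python
import math


def forward_tree_factors(r, delta, sigma, h):
    """Up and down factors for the forward-price binomial tree."""
    drift = (r - delta) * h
    u = math.exp(drift + sigma * math.sqrt(h))
    d = math.exp(drift - sigma * math.sqrt(h))
    return u, d


def risk_neutral_prob(r, delta, u, d, h):
    """Risk-neutral probability of an up move (standard formula, assumed here)."""
    return (math.exp((r - delta) * h) - d) / (u - d)


# Hypothetical inputs: 5% rate, 2% dividend yield, 30% volatility, one-year step.
u, d = forward_tree_factors(0.05, 0.02, 0.30, 1.0)
p_up = risk_neutral_prob(0.05, 0.02, u, d, 1.0)
print(round(u, 4), round(d, 4), round(p_up, 4))
```

For this tree the probability simplifies to 1 / (1 + exp[σ√h]), so it is always below one half.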
Ratio of Fixed Assets to Long-Term Liabilities = ? / ?
fixed assets (net) / long-term liabilities
Return on stockholders equity
net income available to common stockholders/shareholders equity
Accounts Receivable Turnover = ? / ?
net sales / average accounts receivable
what is another name for informational systems?
online analytical processing or OLAP system
what is another name for transactional systems?
online transaction processing (OLTP) systems.
Analysis error
survey owner adjustments to represent what they believe is the true underlying population (i.e., "likely voter" adjustments)
What information does the government want?
tax due, regulatory requirements, grants
Descriptive analytics
uses data to understand past and present
Prescriptive analytics
uses optimization techniques
Which script will you use to launch a Spark application (using spark-shell)?
Ans: You use the spark-submit script to launch a Spark application, i.e. submit the application to a Spark deployment environment.
what is web scraping
Web scraping is the process of searching for information on web pages and then stripping the HTML tags so the data can be stored in a structured format.
WACC
weighted average cost of capital = E/(E+D) * Cost of E + D/(E+D) * Cost of D * (1 - tax)
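A short worked example of the WACC formula. The capital structure, costs, and tax rate are hypothetical inputs.

```python
def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate):
    """Weighted average cost of capital, with the tax shield applied to debt."""
    total = equity + debt
    equity_part = equity / total * cost_of_equity
    debt_part = debt / total * cost_of_debt * (1 - tax_rate)
    return equity_part + debt_part


# Hypothetical firm: 600 equity, 400 debt, 10% cost of equity,
# 5% cost of debt, 25% tax rate.
print(round(wacc(600, 400, 0.10, 0.05, 0.25), 4))
```

Only the debt term is multiplied by (1 - tax), because interest is tax-deductible while dividends are not.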
dividend yield
dividend payment per share divided by its share price
The ______ on common stock measures the rate of return to common stockholders from cash dividends.
dividend yield
dividend payout ratio
dividends divided by earnings percentage of earnings being paid as dividend
How can you define Spark Accumulators?
Ans: These are similar to counters in the Hadoop MapReduce framework, which give information regarding completion of tasks, how much data is processed, etc.
What is a prepayment?
advance payments for goods/services An asset not an expense
current liabilities
liabilities due within one year
Starting a New Business Steps 5. Products and Services
What is the product service or technology? What is the competitive edge? What are the disadvantages? Is the tech proprietary?
Give example of transformations that do trigger jobs(Spark)
Ans: There are a couple of transformations that do trigger jobs, e.g. sortBy , zipWithIndex , etc
Acceptability of the Financial Reporting Framework
Auditor should obtain an understanding of: 1. The purpose for which the single FS or specific element is prepared 2. The intended users 3. The steps taken by mgt to determine that the framework is acceptable in the circumstances
d
01100100
e
01100101
i
01101001
k
01101011
110110
66
Return on Sales (ROS)
Net Income / Sales Revenue Profit as a percentage of revenue
ROA
Net Income/Assets
ROI (formula ext)
Net Income/Invested Capital
Pre-paid Forward on a Stock with Continuous Dividends
S(t)*exp[-∂(T-t)]
Current Ratio
CA/CL
Entering the market - good business sense
Brainstorm answers to these questions aloud: -Who is our competition and what size market share does each competitor have? -How do their products and services differ from ours? -How will we price our products or services? -Are there substitutions available? -Are there any barriers to entry? Such as: capital requirements, access to distribution channels, proprietary product technology, or government policy. -Are there any barriers to exit? How do we exit if this market sours? -What are the risks? Such as: market, regulation, or technology?
volume
amount of shares of stock traded over a defined time period (typically one day)
net margin
(profit margin) net income divided by revenue
common barriers to entry
-capital requirements -access to distribution channels -proprietary product technology -government policy
1/9
.111
1/8
.125
1/7
.143
1/6
.167
1/5
.20
What are the methods of data feedback?
1. Monitoring transactions, incidents, and other feedback mechanisms provides additional data for analysis. 2. Exception reporting. Embedded audit modules (EAMs) are one type of exception reporting. The EAM highlights data outside the expected values
Two ways to employ a cost-leadership strategy
1. Perform value chain activities more efficiently than rivals and control the factors that drive the costs of value chain activities 2. Revamp the firm's overall value chain to eliminate or bypass some cost-producing activities
100111
47
101000
50
101001
51
101010
52
101011
53
101100
54
101101
55
101110
56
110000
60
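The number cards in this deck appear to cover three conversions: six-bit binary to octal, letters to 8-bit ASCII binary, and decimal to hex. That reading is an inference from the answers, not something the cards state. Under that assumption, the conversions can be checked with Python's built-in formatting; the helper names are hypothetical.

```python
def binary_to_octal(bits):
    """Convert a binary string to octal, one octal digit per three bits."""
    width = (len(bits) + 2) // 3  # keep leading zeros, e.g. "000100" -> "04"
    return format(int(bits, 2), "o").zfill(width)


def char_to_binary(ch):
    """Return the 8-bit ASCII code of a character as a binary string."""
    return format(ord(ch), "08b")


def decimal_to_hex(n):
    """Return the uppercase hexadecimal form of a non-negative integer."""
    return format(n, "X")


print(binary_to_octal("110000"), char_to_binary("l"), decimal_to_hex(14))
```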
population data: NYC
8M
1001
9
What is the probability of obtaining a value less than or equal to two standard deviations below the mean?
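About 2.5% for a normal distribution (roughly 95% of values lie within two standard deviations of the mean, leaving about 2.5% in each tail)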
Confidence Interval
For a normal distribution, 95% of data points lie within 1.96σ of the mean; equivalently, 95% of the time the mean lies within 1.96σ of a data point
Possible Values: Ψcall
<0
Possible Values: ρput
<0
Possible Values: ∆put
<0
Advantages and Drawbacks of a Star Schema
Advantages: -easy to understand -only a single level of JOIN for any query Drawbacks: -it can contain a great deal of duplication of dimensional data -when the star contains alphanumeric primary and foreign keys, the query joins are slower, and performance may suffer
What is a distributed system?
A system whose components (data and processes) are spread out over several locations instead of in one central location and thus can manage workload sharing.
What information do suppliers want?
Ability to pay for goods, long term viability of customer
Related diversification
Adding new but related products/services -Value chains possess competitively valuable cross-business strategic fits
Unrelated diversification
Adding new, unrelated products/services -Value chains are so dissimilar that no competitively valuable cross-business relationships exist
Random Note
An omission of the statement of cash flows counts as a qualified opinion
Competitive Response Solutions
Analyze our current product and redesign, repackage, or move upmarket Introduce a new product Increase our profile with a marketing and public relations campaign Build Customer Loyalty Cut Prices Lock up raw materials and talent Acquire the competitor or another player in the same market Merge with a competitor to create a strategic advantage and become more powerful Copy the competitor
How can you create an RDD for a text file?
Ans: SparkContext.textFile
Break Even Market Share
BE Volume/Total market volume
Calmar ratio and MAR ratio
Calmar = 3 yr annualized return/ max draw down in past three years MAR ratio= annualized return since inception / max draw down since inception
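Both ratios divide an annualized return by a maximum drawdown, so the drawdown calculation is the part worth sketching. The function names and the price series below are illustrative assumptions; the window (three years or since inception) is decided by which prices are passed in.

```python
def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                      # highest value seen so far
        worst = max(worst, (peak - v) / peak)    # decline from that peak
    return worst


def return_over_drawdown(annualized_return, values):
    """Calmar/MAR-style ratio: annualized return / max drawdown of the window."""
    dd = max_drawdown(values)
    if dd == 0:
        raise ValueError("no drawdown in this window; ratio is undefined")
    return annualized_return / dd


# Hypothetical portfolio values, for illustration only.
prices = [100, 120, 90, 110, 80]
print(max_drawdown(prices), return_over_drawdown(0.12, prices))
```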
Response to competition - choosing
Consider each, choose one: - Acquire the competitor, or another player in the same market. - Merge with a competitor to create a strategic advantage and make us more powerful. - Copy the competitor (e.g., Amazon.com vs. BarnesandNoble.com). - Hire the competitor's top management. - Increase our profile with a marketing and public relations campaign.
Design of experiments
Design of experiments or experimental design is the initial process used (before data is collected) to split your data, sample and set up a data set for statistical analysis, for instance in A/B testing frameworks or clinical trials.
___________ measures the share of profits that are earned by a share of common stock. GAAP requires the reporting of earnings per share in the income statement.
Earnings per share (EPS) on common stock
Ways to expand global
Globalization Transnational Localization
How can you prove an improvement to an algorithm is an improvement over doing nothing?
Good experimental design 1. No selection bias in test data 2. Test data is a good model of the real world 3. Ensure results are repeatable
PP&E is usually on balance sheet at
Historical cost
Type II error
Ho is false, but we fail to reject it
Adjustments *Inventory*
LIFO vs FIFO
What is bias / variance trade off?
More powerful methods have less bias but more variance
For K₁>K₂, P(S,K₁,T) ? P(S,K₂,T)
≥
Debt-to-Equity Ratio
Total Debt/Total Shareholder's Equity
Five C's Competition
Who are the biggest competitors? What market share do they each hold? Has the market changed in the last year? How do our services or products differ from the competition? Do we hold any strategic advantage over our competitors?
Starting a New Business Steps 6. Customers
Who are the customers? How can we best reach them? How can we ensure that we retain them?
Common‐sized statements are useful for comparing ________, _________, or _______.
the current period with prior periods, individual businesses with one another, or one business with industry averages
What information do lenders want?
ability to repay debt
Appropriateness of Accounting Policies
accounting policies aren't in accordance with the applicable financial reporting framework financial statements don't represent the underlying transactions entity hasn't complied with the financial reporting framework requirements for accounting for and disclosing changes in accounting policies
What do you understand by the term Normal Distribution?
bias, central value, bell shaped curve, random variables
Adverse Opinion Paragraph
because of ______ do not present fairly
statistically significant
been predicted as unlikely to have occurred by sampling error alone, according to a threshold probability—the significance level
Strategic decisions need to be made...
before a business plan is written
csat links ____________ and _________________ ____________
branding and customer loyalty
variable cost drivers
cogs, raw materials, energy inputs, labor, service
More C's
collaborators, costs, channels, competencies, capacity, culture
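What does ASCII mean?
American Standard Code for Information Interchange, a character encoding for plain text (comma separated values files are CSV, not ASCII)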
z-score
counts the number of SDs (σ) that an observation is away from the mean (μ) If Z > 0, the observation is above the mean If Z < 0, the observation is below the mean
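A minimal sketch of the z-score calculation, assuming the population standard deviation is the one wanted; the function name and the sample values are illustrative.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Number of standard deviations each observation lies from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]


# Illustrative data with mean 5 and population SD 2.
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(scores)
```

Positive entries are observations above the mean and negative entries are below it.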
Calculating correlation
covariance(x, y) / (stdev(x) * stdev(y))
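The correlation is the covariance divided by the product of the two standard deviations. The sketch below computes it directly from that definition; the function name is an assumption, and the 1/n factors cancel, so they are left out.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation: cov(x, y) / (stdev(x) * stdev(y))."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```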
Business situation segments acronym
cpcc
current ratio
current assets/current liabilities
upside and downside capture ratio
downside- measures how a portfolio performed vs a benchmark when the benchmark fell in value upside- measures how a portfolio performed vs a benchmark when the benchmark rose in value portfolio performance over period of time / benchmark performance over period of time
Risk-Neutral Pricing (C)
e^(-rh)*[p*Cu+(1-p*)Cd]
Real-World Pricing (C)
e^(-γh)[pCu+(1-p)Cd]
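A one-period binomial call price is the discounted expected payoff, with the up and down payoffs weighted by p* and 1 - p*. The sketch below assumes the standard risk-neutral probability p* = (e^((r-δ)h) - d)/(u - d), which is not stated in the cards; the function and parameter names are also illustrative.

```python
import math


def one_period_call(S, K, r, h, u, d, delta=0.0):
    """Risk-neutral price of a European call on a one-period binomial tree."""
    # Assumed formula for the risk-neutral probability of an up move.
    p_star = (math.exp((r - delta) * h) - d) / (u - d)
    Cu = max(S * u - K, 0.0)  # payoff after an up move
    Cd = max(S * d - K, 0.0)  # payoff after a down move
    return math.exp(-r * h) * (p_star * Cu + (1.0 - p_star) * Cd)


# Illustrative inputs: zero interest rate, u = 1.2, d = 0.8.
print(one_period_call(S=100, K=100, r=0.0, h=1.0, u=1.2, d=0.8))
```

Real-world pricing has the same shape, but uses the true probability p and the option's expected return γ as the discount rate.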
What are referential integrity constraints?
Rules that keep the relationships between normalized tables consistent via primary key and foreign key pairs.
gross margin
gross profit divided by revenue - gross profit is revenue minus costs of goods sold - tells what percentage of revenue the business keeps before paying other expenses
gross profitability ratio
gross profits divided by assets higher ratio will typically outperform lower ratio
The percentage analysis of increases and decreases in related items in comparative financial statements is called _____ analysis.
horizontal
Pick an algorithm you like and walk me through the math and then the implementation of it in pseudo-code.
OK now let's pick another one, maybe more advanced.
What are Recommender Systems?
information filtering systems used to predict what a user will prefer, e.g. movie recommendations
Measures of central tendency
median, average, mode
value at risk (VaR)
loss threshold at a given confidence level: the maximum loss expected at that confidence, or equivalently the minimum loss in the remaining worst cases. typically uses historical returns and normal distributions
Omega Ratio
must first pick target return threshold (typically risk free or 0%) Sum of (returns above threshold - threshold return) / sum of abs(returns below threshold - threshold return) can be used on non-normal returns
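The Omega ratio definition above translates almost line for line into code. This is a minimal sketch with an assumed function name and made-up returns.

```python
def omega_ratio(returns, threshold=0.0):
    """Sum of excess returns above the threshold / sum of shortfalls below it."""
    gains = sum(r - threshold for r in returns if r > threshold)
    losses = sum(threshold - r for r in returns if r < threshold)
    if losses == 0:
        raise ValueError("no returns below the threshold; ratio is undefined")
    return gains / losses


# Hypothetical periodic returns, for illustration only.
print(omega_ratio([0.02, -0.01, 0.03, -0.02], threshold=0.0))
```

No distributional assumption is made, which is why the ratio can be used on non-normal returns.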
fixed cost drivers
overhead, machinery, distribution, interest, depreciation, rent
Quick Ratio = ? / ?
quick assets / current liabilities
risk free rate of return
typically one year treasury bill no risk associated. used as comparative benchmark
For t<T, Ceur(T) ? Ceur(t) on a Non-Dividend Paying Stock
≥
Instead of changing the industry price level (which requires a lot of cooperation), you should focus on.....
Change the volume or costs because it is a lot easier and under the company's control
Entering a New Market Market Elements
Competition Market Share Comparative Products and Services Barriers to Entry
What is associative rule learning?
Computer is given a large set of observations made up of multiple variables. The task is to learn relationships between variables. If A and B => C
If Profits are declining because of rising expenses.....
Concentrate on operational and financial issues ie COGS, labor, rent, and marketing costs
reducing costs: sales flat, profits declining (surge in costs)
focus on internal costs vs. external costs internal: union wages, suppliers, materials, economies of scale, increased support systems external: economy, interest rates, government regulations, transportation/shipping strikes
sterling ratio
annualized return since inception / (max draw down since inception + 10%)
dividend discount model
next years expected dividend/ (discount rate - growth rate) basic stock pricing formula to determine fair value
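A minimal sketch of the dividend discount formula, with an assumed function name and illustrative inputs. The formula only makes sense when the discount rate exceeds the growth rate, so the sketch checks for that.

```python
def dividend_discount_value(next_dividend, discount_rate, growth_rate):
    """Fair value = next year's expected dividend / (discount rate - growth rate)."""
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed the growth rate")
    return next_dividend / (discount_rate - growth_rate)


# Illustrative inputs: $2.00 dividend, 8% discount rate, 3% growth.
print(dividend_discount_value(2.00, 0.08, 0.03))
```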
How long are Current Assets held by the entity?
no more than one period Inventory, trade receivables, prepayments, cash (and equivalents)
dividend payback period
number of years it would take for dividend growth to equal initial cost of purchasing stock. lower period is better
Sampling error
occurs whenever n < N, i.e., the sample is smaller than the relevant population.
The auditor cannot issue an unmodified report if the client
omits the statement of cash flows from the financial statements.
operating margin
operating profit (includes most expense but not interest or taxes) divided by revenue
depreciation
reduction in value of an asset over time
Central limit theorem states
for large samples, the sampling distribution of the sample mean is approximately normal, regardless of the way in which the population data is distributed
Domain knowledge
relates to the expertise gained by individuals in certain areas or fields. For example, medicine is a domain.
Gross Profit Margin (%)
Gross Profit ($) / Revenue
Revenue
Price*Quantity
Four P's
Product Price Place Promotion
The _______ measures the rate of income earned on the amount invested by the stockholders.
rate earned on stockholders' equity
If a Product is in its Declining Stage....
Define niche market, analyze the competition's play, or think exit strategy
What is the difference between artificial intelligence and machine learning?
Machine Learning is the design and development of algorithms that learn behaviours from empirical data. Artificial intelligence includes machine learning but also covers other aspects such as knowledge representation, natural language processing, planning, robotics etc.
Data
Data are the raw figures, numbers, or text that serve as the starting point of analysis.
Information
Data become information when they reveal the causes or results of the event.
Data calibration
Data calibration is the method of establishing a relationship between a data point and a unit of measure that has been formally defined
what are examples of master data?
data about customers, products, vendors, employees, and fixed assets
what is anomalies?
data anomalies (irregularities). They threaten data integrity.
volume inputs and drivers
increase overall market share, overall market growth, new markets, changing customer demand,
Vasicek model
In finance, the Vasicek model is a mathematical model describing the evolution of interest rates. It is a type of one-factor short rate model as it describes interest rate movements as driven by only one source of market risk. The model can be used in the valuation of interest rate derivatives, and has also been adapted for credit markets. It was introduced in 1977 by Oldřich Vašíček[1] and can be also seen as a stochastic investment model.
Reporting on Incomplete Presentation
Incomplete presentation that is otherwise in accordance with GAAP is a type of single FS. When reporting on an incomplete presentation that is otherwise in accordance with GAAP, the report should include an emphasis of matter paragraph (after opinion) or explanatory paragraph (before): 1. States the purpose for which the presentation is prepared and refers to the note that describes the basis of presentation 2. Indicates that the presentation is not intended to be a complete presentation of the entity's A, L, R, E
Mergers and Acquisitions Objectives Elements
Increase Market Access Diversify Holdings Pre-empt the competition Enjoy the tax advantages Incorporate synergies Increase Shareholder value
International Standards
International Financial Reporting Standard Generally Accepted Accounting Principles
What is the Poisson Distribution?
It is a discrete probability distribution that describes the number of events occurring in a given unit of time or space. A discrete distribution has a countable # of outcomes, whereas continuous distributions can take infinitely many values. • Calls received by TD EasyLine during payday • Number of bacteria per volume of mucus • Number of customers at the checkout counter at Wal-Mart at any given minute • Number of breakdowns in the TTC per day In each example, x refers to the # of events that occur in a period of time during which an average of mu events can be expected to occur. Events HAVE TO BE INDEPENDENT. Mean = mu | variance = mu
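The statement that the mean and the variance both equal mu can be checked numerically from the probability mass function. The sketch below assumes mu = 3 and truncates the infinite sum at 60 terms, which is enough for that value; the names are illustrative.

```python
import math


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson variable with an average of mu events."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


mu = 3.0
ks = range(60)  # truncation point is an assumption; the tail beyond it is negligible
total = sum(poisson_pmf(k, mu) for k in ks)
mean_x = sum(k * poisson_pmf(k, mu) for k in ks)
var_x = sum((k - mean_x) ** 2 * poisson_pmf(k, mu) for k in ks)
print(total, mean_x, var_x)
```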
What is the goal of A/B Testing?
It is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B Testing is to identify changes to a web page that maximize or increase an outcome of interest. An example of this could be identifying the click-through rate for a banner ad.
Why Spark when Hadoop exists? (1)
Iterative algorithms: MapReduce is generally not good at processing iterative algorithms such as Machine Learning and Graph processing. These algorithms are iterative by nature: they need data in memory to run the same steps again and again, and fewer saves to disk and fewer transfers over the network mean better performance.
GBI- Global Bike Inc. History
John Davis is a bicyclist and a mountain racing champion. He created a company in the United States to produce trail bikes. Peter Weiss of Germany is an engineer who not only races road bikes but also designs bike frames. He formed a company to manufacture lightweight touring bike frames. John and Peter met in 2000 and merged their two companies to form GBI.
How is KNN different from k-means clustering?
K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an unsupervised clustering algorithm. While the mechanisms may seem similar at first, what this really means is that in order for K-Nearest Neighbors to work, you need labeled data against which to classify an unlabeled point (thus the nearest neighbor part). K-means clustering requires only a set of unlabeled points and the number of clusters k: the algorithm takes the unlabeled points and gradually learns how to cluster them into groups by repeatedly assigning each point to the nearest cluster mean and recomputing the means. The critical difference here is that KNN needs labeled points and is thus supervised learning, while k-means doesn't, and is thus unsupervised learning.
Explain LSA
Layered-LSA model consists of multiple layers. Scalable- Data warehouses typically grow in scope and size. Architecture- LSA is an architecture that does not depend on any specific technology.
Turnaround Approach Strategy Elements
Learn about the company. Review services, products and finances. Secure funding. Review talent and culture. Determine short-term and long-term goals. Write a business plan. Reassure clients, suppliers and distributors. Prioritize goals and develop some small successes for momentum.
Advantages of Debt Investing
Less Risk of losing investment More reliable income stream
What is logistic regression?
Logistic Regression is a regression model where the dependent variable (DV) is categorical. It is used when we are trying to predict binary outcomes (e.g. predicting whether a student will PASS/FAIL a test given the number of hours he spent studying). Another example is trying to predict whether a political candidate will WIN/LOSE, Predictor variables: -amt of $ spent on campaign -time spent on the campaign and so on...
For Sampling Data & For Distributions: Define Mean Value & Expected Value
Mean value is the only value that comes from the sampling data. Expected Value is the mean of all the means i.e. the value that is built from multiple samples. Expected value is the population mean. For Distributions Mean value and Expected value are same irrespective of the distribution, under the condition that the distribution is in the same population.
When to Issue an "Adverse Opinion"
Misstatements are material *AND* pervasive Example: 1. Material Misstatement
Model Fitting
Model fitting is a procedure that takes three steps: First you need a function that takes in a set of parameters and returns a predicted data set. Second you need an 'error function' that provides a number representing the difference between your data and the model's prediction for any given set of model parameters. This is usually either the sums of squared error (SSE) or maximum likelihood. Third you need to find the parameters that minimize this difference
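The three steps can be sketched for the simplest case, a straight line fitted by minimizing the sum of squared errors. The choice of a linear model, plain gradient descent as the minimizer, the learning rate, and all names are assumptions made for illustration.

```python
def predict(params, xs):
    """Step 1: take parameters and return a predicted data set."""
    slope, intercept = params
    return [slope * x + intercept for x in xs]


def sse(params, xs, ys):
    """Step 2: error function, here the sum of squared errors (SSE)."""
    return sum((y - p) ** 2 for y, p in zip(ys, predict(params, xs)))


def fit(xs, ys, lr=0.05, steps=2000):
    """Step 3: find the parameters that minimize the error (gradient descent)."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        preds = predict((slope, intercept), xs)
        # Gradient of the mean squared error with respect to each parameter.
        grad_slope = sum(-2 * x * (y - p) for x, y, p in zip(xs, ys, preds)) / n
        grad_intercept = sum(-2 * (y - p) for y, p in zip(ys, preds)) / n
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Illustrative data generated from y = 2x + 1.
xs, ys = [0, 1, 2, 3, 4], [1, 3, 5, 7, 9]
best = fit(xs, ys)
print(best, sse(best, xs, ys))
```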
Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Auditor's Responsibility Paragraph:
Modify the paragraph to state: "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the qualified audit opinion."
Nestle- Alt Mrkts
N America S America Europe Asia
Product segments acronym
NCDCSL
Net Profit Margin
NI/Rev
Explain what is "Over the Counter Market"?
Over the counter market is a decentralized market, which does not have a physical location, where market traders or participants trade with one another through various communication modes such as telephone, e-mail and proprietary electronic trading systems.
If decline in Sales analyze these three things.....
Overall declining market demand(soda sales have dropped as bottled water becomes the drink of choice) The possibility that the current marketplace is mature or your product is obsolete(Vinyl records compared to CDs) Loss of Market share due to substitutions(Video rentals got owned like blockbuster)
P/B Ratio
P0/BV Common Equity
What is PAC Learning?
PAC (Probably Approximately Correct) learning is a learning framework that has been introduced to analyze learning algorithms and their statistical efficiency.
Valuing Equity with PEG
PEG * Expected EPS * Growth Rate
For t<T, Pamer(T) ? Pamer(t)
≥
In what areas Pattern Recognition is used?
Pattern Recognition can be used in a) Computer Vision b) Speech Recognition c) Data Mining d) Statistics e) Information Retrieval f) Bio-Informatics
What is Interpolation and Extrapolation?
Estimating a value from 2 known values from a list of values is Interpolation. Extrapolation is approximating a value by extending a known set of values or facts.
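A minimal sketch using a straight line through two known points, which is one common way to do both; the function name and sample points are illustrative. Inside the known range the result is an interpolation, and outside it the same formula gives an extrapolation.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Known points (0, 0) and (10, 100).
print(linear_estimate(0, 0, 10, 100, 5))   # x between the known points: interpolation
print(linear_estimate(0, 0, 10, 100, 12))  # x beyond the known points: extrapolation
```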
Management's Responsibility
Financial Statements & Internal Control 1. Preparation/fair presentation of FS in accordance with the framework 2. Design, implementation, and maintenance of IC relevant to the preparation and fair presentation of FS free from material misstatement 3. Providing the auditor with access to information and persons
Two internal dimensions
Financial and competitive position (FP & CP)
Increasing Profits Revenue Elements
Identification of Revenue Streams Percentage of total revenue of each? Unusual balance? Have percentages changed?
Competitive Response Questions
If Competition comes out with a new product/ How does it differ from ours? What has the competitor done differently? Have any other competitors picked up market share? Have the consumers needs changed? Did they increase or expand into new channels?
Cox-Ross-Rubinstein Binomial Tree: u
exp[σ√h]
w
01110111
1/14
.0714
COGS
cost of goods sold
Cash Ratio
(Cash + Cash Equivalents)/CL
Asymmetric
∀a,b∈X(aRb→¬(bRa)) i.e. antisymmetric ⋀ irreflexive
001000
10
What is securitisation?
bundling similar assets to provide the backing for bonds
eva
NOPAT - WACC × (TA - CL)
discrete uniform distribution: Variance
((b-a+1)^2 - 1) / 12
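The closed-form variance for integers a through b, ((b-a+1)^2 - 1)/12, can be compared against a brute-force calculation. Both function names are assumptions; the example uses a fair six-sided die.

```python
def discrete_uniform_variance(a, b):
    """Closed form for integers a..b: (n^2 - 1) / 12 with n = b - a + 1."""
    n = b - a + 1
    return (n * n - 1) / 12


def brute_force_variance(a, b):
    """Average squared deviation from the mean, computed directly."""
    values = list(range(a, b + 1))
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


print(discrete_uniform_variance(1, 6), brute_force_variance(1, 6))
```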
if product in mature...
-manufacturing, cost, competition
residual income
operating income-rr
Data Provisioning
Provisioning -the process of providing users and systems with access to data.
With which programming languages and environments are you most comfortable working?
Python, Anaconda environment, PySpark
Qualified vs. Adverse (when to express)
Qualified - material but not pervasive Adverse - material AND pervasive
Audit Issues - Material but not pervasive
Qualified Opinion
Forward on a Stock with Discrete Dividends
S(t)*exp[r(T-t)]-CumValue(Div)
Pre-paid Forward on a Stock with Discrete Dividends
S(t)-PV(Div)
In-memory databases
Databases, such as SAP HANA, that employ columnar storage and other technologies. The data in an in-memory database are stored in a columnar store in memory (RAM).
Customer segments acronym
SGMNPD
Give some situations where you will use an SVM over a RandomForest Machine Learning algorithm and vice-versa.
SVM and Random Forest are both used in classification problems. a) If you are sure that your data is outlier free and clean then go for SVM. It is the opposite - if your data might contain outliers then Random forest would be the best choice b) Generally, SVM consumes more computational power than Random Forest, so if you are constrained with memory go for Random Forest machine learning algorithm. c) Random Forest gives you a very good idea of variable importance in your data, so if you want to have variable importance then choose Random Forest machine learning algorithm. d) Random Forest machine learning algorithms are preferred for multiclass problems. e) SVM is preferred in multi-dimensional problem set - like text classification but as a good data scientist, you should experiment with both of them and test for accuracy or rather you can use ensemble of many Machine Learning techniques.
generic frameworks
SWOT, cost-benefit analysis
what are the components of data warehousing
Sources Systems Data Staging Data warehouses Data Mart Analytics Tool
Spark Low-Latency
Spark can cache/store intermediate data in memory for faster model building and training. Also, when graph algorithms are processed then it traverses graphs one connection per iteration with the partial result in memory. Less disk access and network traffic can make a huge difference when you need to process lots of data.
Speculative Execution (SPARK)
Speculative execution of tasks is a health-check procedure that checks for tasks to be speculated, i.e. running slower in a stage than the median of all successfully completed tasks in a taskset . Such slow tasks will be re-launched in another worker. It will not stop the slow tasks, but run a new copy in parallel.
Velocity
Speed of data generation
What are support vector machines?
Support vector machines are supervised learning algorithms used for classification and regression analysis.
Early Exercise IS Optimal for Put (Necessary Conditions)
S∂<Kr | K(1-e^(-rT))>S(1-e^(-∂T))+C(S,K,T) or K-S>Peur(S,K,T)
Working Capital Turnover
Sales/Average Working Capital
Backward integration
Seeking ownership or increased control over suppliers
Growth Rate
Retention Ratio * ROE Retention Ratio = (1 - Payout)
Operating Income
Revenue - Operating Expense
Gross Profit
Revenue*Gross Margin %
Increasing Profits Approachs
Revenue- E(P=R - C)M Always look at external first Costs Volume
What is selection bias?
Selection bias is the bias introduced by the selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed. It is sometimes referred to as the selection effect.
What is sequence learning?
Sequence learning is a method of teaching and learning in a logical manner.
3 Areas where Material Misstatements may Arise
(1) Appropriateness of accounting policies (2) Application of Accounting Policies (3) Appropriateness of the financial statement presentation/appropriateness or adequacy of disclosures in financial statements.
terminal value
(CF last yr x (1 + growth rate))/(cost of capital - growth rate)
cfroi
(cfo-ed)/cash invested
Generally Accepted Government Auditing Standards
*GAGAS* - Audits -Section: GAGAS -Standard Setting: Government Accountability Office -Provide guidance for audits of government organizations, programs, activities and of entities that receive government funds -Financial or performance audits of gov organizations, programs, activities, and of entities that receive government funds
The grand strategy matrix
-Based on two evaluative dimensions: competitive position and market (industry) growth
barriers of entry
-economies of scale -capital requirements -government policy -switching costs -access to distribution channels -product differentiation -proprietary product technology
Long term sources of finance for listed companies
-financial markets
How can you use a Machine Learning library written in Python, such as the SciKit library, with the Spark engine?
Ans: A machine learning tool written in Python, e.g. the SciKit library, can be used as a Pipeline API in Spark MLlib or by calling pipe().
How would you broadcast a collection of values over the Spark executors?
Ans: sc.broadcast("hello")
If the hypothesized value falls within the confidence interval
Fail to reject Ho
The failure of the financial statements to contain adequate disclosure of related party transactions, or other required disclosures,
would result in a qualified or adverse opinion, not a disclaimer of opinion.
If the hypothesized value falls outside the confidence interval
Reject Ho
111111
77
100110
46
What do you mean by Dependencies in RDD lineage graph?
Ans: Dependency is a connection between RDDs after applying a transformation.
Possible Values: Γput
>0
What is a Gaussian?
A family of functions that show a "bell curve" shape.
Use supply and Demand framework when you hear
Capacity change through acquisitions, merger Build shut-down factory Capacity shift in response to change in demand
What are Bayesian Networks (BN) ?
Bayesian Network is used to represent the graphical model for probability relationship among a set of variables .
12
C (hex)
What is a non-current/long term liability?
Due by the entity after more than one period: long-term loans, mortgages, debt instruments
Gamma is the greatest when an option: A) is deep out of the money. B) is deep in the money. C) is at the money.
Gamma, the curvature of the option price/asset price function, is greatest when the option is at the money.
BCG Matrix Cash Cows
High Market Share Low Growth Generate excess amount of cash, past the amount that should be reinvested.
Stars II
High market share, high growth rate
What is an Incremental Learning algorithm in ensemble?
Incremental learning method is the ability of an algorithm to learn from new data that may be available after classifier has already been generated from already available dataset.
Ethical Requirements
Independent -Comply with ethical requirements related to FS, including independence in both fact and appearance. Includes the AICPA Code of Professional Conduct and rules of the state boards.
BCG Matrix Question Marks
Low Market Share High Growth They almost always require more cash than they can generate. If there is no cash they will fall behind and die It is a liability that does have a good pay off but needs to be fed capital
BCG Matrix Pets(Dogs)
Low Market Share Low Growth They may show profit but the cash needs to be reinvested for them to keep share Product is worthless except in liquidation No excess cash flows
Range
Maximum - Minimum
How would you deal with categorical features?
One-hot encoding
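A minimal pure-Python sketch of one-hot encoding: each category gets its own 0/1 column. The function name and the colour values are illustrative, and sorting the categories is just one way to fix a column order.

```python
def one_hot(values):
    """Return the category order and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # mark the column for this value's category
        rows.append(row)
    return categories, rows


categories, rows = one_hot(["red", "green", "red"])
print(categories, rows)
```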
Differentiation
Producing products/services considered unique and directed at consumers who are relatively price-insensitive
New Product Approaches
Product Market Strategy Customers Financing
How do you avoid false positive?
Set a proper sample size
Data Staging
The process whereby data are organized and prepared for analysis
What is ensemble learning?
To solve a particular computational problem, multiple models such as classifiers or experts are strategically generated and combined. This process is known as ensemble learning.
Debt-to-Total-Capital Ratio
Total Debt/Total Capital Total Capital = Debt + Equity = Total Assets
Correlation
Unitless measure of the relationship between two variables. Always between -1 and +1. Strength cutoffs: weak if |r| < .3, strong if |r| > .7
What evaluation approaches would you work to gauge the effectiveness of a machine learning model?
You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data. You should then implement a choice selection of performance metrics. You could use measures such as the F1 score, the accuracy, and the confusion matrix. What's important here is to demonstrate that you understand the nuances of how a model is measured and how to choose the right performance measures for the right situations.
Null Hypothesis
a statement about the population value that will be tested; it will be rejected only if the sample data provide enough contradictory evidence (Important! We can never accept the null! We can only reject it, or fail to reject it.)
Why is Naive Bayes so bad?
assumes features are not correlated
product factors
attributes, buyer decisions, competition, substitutions
appropriate when there are no outliers
average
Number of Days' Sales in Inventory = ? / ?
average inventory / average daily cost of goods sold
Data: Variance
average squared deviation from the average
promotion factors
awareness --> information search --> evaluation --> purchase --> repurchase
2. Data Visualization: Tools that feature advanced graphical representations such as heat maps, waterfall charts
3. Data mining: Advanced statistical tools that can be either descriptive or predictive in nature
Business situation segments
customer, product, company, competition
For options on Futures, d is
e^(-σ√h)
strategy
increased growth, increased profits, lower costs, new product development, new market
price earnings ratio
market price per share/earnings per share
Competition segments
market share, company, barriers to entry, supplier concentration, regulatory environment
npv formula
npv = pv - cost
revenue
price x volume (quantity)
annual market size
product rev/useful life(yrs)
Not informative of outliers
range
xCy if and only if |x - y| ≤ 1
where x is a Real number, and y is an integer,ex)
Formula: Elasticity
Ω=S∆/C
Into what categories can the sequence learning process be divided?
a) Sequence prediction b) Sequence generation c) Sequence recognition d) Sequential decision
tagged data
employ identifiers known as tags that are attached to the data elements to make them readable by a computer.
What information do employees want?
employment prospects, wage negotiations
Altman z plus score
estimates bankruptcy risk for companies 6.56A + 3.26B + 6.72C + 1.05D A= working capital / total assets B= retained earnings / total assets C= EBIT / Total assets D= Book value of equity / Total Liabilities >2.6 financially sound <1.1 likely to go bankrupt
When Spark works with file.txt.gz, how many partitions can be created?
Ans: When using textFile with compressed files (file.txt.gz, not file.txt or similar), Spark disables splitting, which makes for an RDD with only 1 partition (reads against gzipped files cannot be parallelized). To change the number of partitions, repartition the RDD: rdd = sc.textFile('demo.gz') followed by rdd = rdd.repartition(100). With these lines, rdd ends up with exactly 100 partitions of roughly equal size.
The _____ ratio, sometimes called the working capital ratio or bankers' ratio, also measures a company's ability to pay its current liabilities.
Current
what are the three most common data structures?
spreadsheets, flat files, and databases.
market capitalization
total market value of a company's outstanding shares (price times outstanding shares)
Types of Material Misstatements
appropriateness of accounting policies application of accounting policies appropriateness of financial statement presentation or disclosures
Qualified Opinion Paragraph
except for _____ presented fairly
alpha
excess return. return greater than what would be expected from an investment using the capital asset pricing model. high alpha is outperforming market while controlled for risk (beta)
Companies will recognize revenues in the year when 3 specific conditions are met
1. Have provided the goods/services to customers 2. Reasonably assured we will collect $ 3. We can determine the cost of providing those goods/services
Possible sections of the income stmt
1. Income from continuing operations 2. Income from discontinued operations 3. Extraordinary items 4. Cumulative effect of a change in accounting principle 5. Net income 6. Comprehensive income 7. Earnings per share
Adjustments *Investments*
Classified as either 'available for sale' or 'trading'
Adjustments *Goodwill*
Company with internal growth vs company growth by M&A. -use tangible book value
demand elasticity
% change demand/%change in price
elasticity
% change volume / % change price
Material Misstatements related to Appropriateness of Financial Statement Presentation or Disclosures
(1) Financials do not include all required disclosures. (2) The disclosures are not presented in accordance with the applicable financial reporting framework. (3) Financials do not provide the disclosures needed to achieve fair presentation. (4) Info that is required to be presented, such as statement of cash flows, has not been included or disclosed in the financials.
Most Commonly Encountered GAAP Problems
(1) GAAP Consistency Change (unjustified) = Auditor Disagrees. (2) Inadequate Disclosure (3) Departure from GAAP (unjustified) (4) Unreasonable Accounting Estimates
Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Qualified Opinion Paragraph: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include: (3)
(3) A description of the nature of omitted information and inclusion of the omitted information, when practicable, if there is an omission of information that is required to be presented or disclosed.
Treynor Ratio
(Annualized return - risk free return) / Beta appropriate for use when a portfolio has diversified away non-systematic risk and only has systematic risk remaining
Turnarounds Steps
1. Analyze the Company and the industry 2. Possible actions
population data: US
300M
Options Credit Spread
A credit spread results from buying a long position that costs less than the premium received selling the short position of the spread
Data flow diagram (DFD)
A data flow diagram (DFD) models the flow of data from one object (such as a process, data store, or external entity) to another.
For an interest rate swap the swap spread is the difference between the: A) swap rate and the corresponding Treasury rate. B) fixed rate and the floating rate in a given period. C) average fixed rate and the average floating rate over the life of the contract.
A) swap rate and the corresponding Treasury rate. The swap spread is the swap rate minus the corresponding Treasury rate.
How would we reduce bias?
Add more features / more complex model
What's a Spark RDD?
An abstraction that distributes data and marshalls data behind the scenes
What is the advantage of using Scala over other functional programming languages?
As its name indicates, Scala (short for Scalable Language) is advantageous for its scalability, maintainability, productivity, and testability. Singleton and companion objects in Scala provide a cleaner solution than static members in other JVM languages such as Java. Scala also eliminates the need for a ternary operator, because 'if' blocks, 'for-yield' loops, and code in braces all return a value.
Entering the Market - Size of current & Future market
Ask interviewer: -What is the size of the market? -What is the growth rate? -Where is it in its life cycle?(stage of development:Emerging?Mature? Decline?) -Who are the customers and how are they segmented? - What role does technology play in the industry and how quickly does it change? - How will the competition respond?
Reducing Cost Approach
Assessment Cost Analysis- Internal Cost Analysis- External
Auditor's Responsibility
Auditing & Giving opinion (Attest Function: Opinion) 1. Maintain professional skepticism 2. Comply with ethical requirements 3. Exercise professional judgment throughout audit 4. Obtain sufficient appropriate audit evidence 5. Comply w GAAS
11
B (hex)
Standard Reports
Balance Sheet Income Statement Cash flow Statement
Five C's
Company Costs Competition Consumers/Clients Channels
Contribution Margin (%)
Contribution Margin ($) / Price
21 Ways to Cut Costs Labor
Cross-Train Workers Cut overtime Reduce employer 401k or 403k match Raise employee contribution to health-care premium Institute 4 10hr days instead of 5 8hr days Convert workers into owners (if they are stakeholders they will want to work harder) Contemplate layoffs Institute across the board pay decreases
DAGScheduler
DAGScheduler uses an event queue architecture in which a thread can post DAGSchedulerEvent events, e.g. a new job or stage being submitted, that DAGScheduler reads and executes sequentially.
Nestle- Growth Strategies
Distribution Channels Product Line Brands, Types of Waters Mktg Campaign(Acquire Competitor, Create Seasonal Balance) Prices
Constant (Gordon) Growth Dividend Discount Model
Dividend One Year After Period "t" / (R-G) D1 = D0 * (1+g)
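A minimal sketch of the constant-growth formula on this card, assuming the usual reading: value = D1 / (r - g) with D1 = D0 * (1 + g). The function name and the sample inputs are illustrative assumptions, not part of the card.

```python
def gordon_growth_value(d0: float, r: float, g: float) -> float:
    """Constant-growth dividend discount value: D1 / (r - g), D1 = D0 * (1 + g)."""
    if r <= g:
        # The formula only makes sense when the required return exceeds growth.
        raise ValueError("required return r must exceed growth rate g")
    d1 = d0 * (1 + g)  # dividend one year after period t
    return d1 / (r - g)


# Hypothetical inputs: $2.00 current dividend, 8% required return, 3% growth.
price = gordon_growth_value(d0=2.00, r=0.08, g=0.03)
print(round(price, 2))
```

The guard on r <= g matters in practice: as g approaches r the value blows up, which is why the model is only used for stable, slow-growing payers.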
interest coverage ratio
EBIT / interest expense. higher is better: shows how well a company's interest expenses are covered by its earnings before interest and tax. over 1.5 is stable
Times-Interest Earned Ratio
EBIT/Interest Expense
Company segments acronym
EDCIFO
New entrants
Economies of scale Product differences Regulation retaliation
Income from discontinued operations
Ex. Maybe a company is trying to drop segment A bc they are no longer as profitable as the others
Why is feature engineering so important ?
Features are what you use to make predictions. Your choice of features can dramatically affect your model regardless of the algorithm you use. A simple algorithm on a good set of features can perform better than a sophisticated algorithm on a bad set
Pricing Strategies 1.Investigate the Company
How big is it? What Products does it have? Is it a Market Leader in this Field? What is their Objective? (Profits, Market Share, or Brand Positioning?) Is the company in charge of their own pricing strategies, or is it reacting to suppliers, market, or competition?
Five C's Channels
How do we get our product in the hands of the end consumer? How can we increase our distribution channels? Are there areas of our market that we are not reaching? How do we reach them?
Four P's Place
How do we get the products to the end user? How can we increase our distribution channels? Do our competitors have products in places that we don't? Do they serve markets that we can't reach? If so, why? How can we reach them?
What is the difference between Supervised Learning and Unsupervised Learning?
If an algorithm learns something from the training data so that the knowledge can be applied to the test data, then it is referred to as Supervised Learning. Classification is an example for Supervised Learning. If the algorithm does not learn anything beforehand because there is no response variable or any training data, then it is referred to as unsupervised learning. Clustering is an example for unsupervised learning.
Turnarounds 2. Possible Actions
Learn as much as possible about the company and its operations Analyze services, products and finances Secure sufficient financing, so your plan has a chance Review the talent and temperament of all employees and get rid of the deadwood Determine short term and long term goals Devise a business plan Visit clients, suppliers, and distributors to reassure them Prioritize goals and get some small successes under your belt ASAP to build confidence
Best Fit Line
Line that is drawn through the dots in a scatter plot. upward sloping line = positive correlation. flat line = no correlation
Price-earnings (P/E) ratio = ? / ?
Market Price per Share of Common Stock / Earnings per Share on Common Stock
If a Product is in its Growth Stage emphasize the.....
Marketing Competition
The Value Chain Marketing and Sales
Marketing Strategy Id of customer base and the cost of customer acquisition sales force issues
What are some classification methods?
Naive Bayes, SVM, Decision Trees, and Neural Networks
what is NLP?
Natural Language Processing- Programming languages such as Python allow developers to write programs that translate human voice and language into computer-readable text.
Rate Earned on Total Assets = ( ? + ? ) / ?
(Net Income + Interest Expense) / Average Total Assets
Out-of-the-Money
Option would not have a payout if it could be exercised.
The auditor's inability to observe physical inventories
Or apply alternative procedures to verify their balances could result in a disclaimer.
Characteristics of Business Entities (3)
Ownership structure Objective Liability
Trailing P/E
P0 / EPS for Past Year
Professional Skepticism
Professional judgment: make assessment yourself each year -Auditor plan and perform audit w professional skepticism. Recognition that circumstances may exist that cause FS to be materially misstated. Necessary to the critical assessment of audit evidence Alert for: -Evidence that contradicts other evidence obtained -Info that calls into question reliability of documents and responses to inquiries -Conditions that indicate possible fraud (Pressure, Opportunity, Rationalization) -Circumstances that suggest need for audit procedures in addition to GAAS
Nestle- Market
Size Growth Rate Major Players/MKT Share Changes in Industry Barriers- Gov Reg? Markets?- Home, Retail, Office
Black-Scholes Model Assumptions
Stock returns are normally distributed and independent over time. Risk-free rate, volatility and dividends are known and constant. No transaction costs. Possible to short-sell any stock and borrow any amount of money at the risk-free rate.
What is the advantage of performing dimensionality reduction before fitting an SVM?
Support Vector Machine Learning Algorithm performs better in the reduced space. It is beneficial to perform dimensionality reduction before fitting an SVM if the number of features is large when compared to the number of observations.
Early Exercise IS Optimal for Call (Necessary Conditions)
S∂>Kr | S(1-e^(-∂T))>K(1-e^(-rT))+P(S,K,T) or S-K>Ceur(S,K,T)
Call Boundaries
S≥Camer≥Ceur≥max(0,F0,T[S]*e^(-rT)-K*e^(-rT))
What percentile does the mode represent?
The answer cannot be determined without further information. The mode's location depends upon the distribution of the data set.
How would you create a taxonomy to identify key customer trends in unstructured data?
The best way to approach this question is to mention that it is good to check with the business owner and understand their objectives before categorizing the data. Having done this, follow an iterative approach: pull new data samples, validate the model for accuracy by soliciting feedback from the business stakeholders, and improve it accordingly. This helps ensure that your model is producing actionable results and improving over time.
Cumulative effect of a change in accounting principle
The catch-up adjustment of changing from one accounting principle to another -Ex. going from LIFO to FIFO
Explain the quote-to-cash process
The complete set of business processes involved in selling, from creating initial offers for prospects to collecting cash.
Semi structured data
The data are considered semi-structured because they may contain both unstructured data in the form of text like what gets typed into searches and structured data stored automatically by the system based on the movements within the site
Randomization condition
The data values must be sampled randomly
Low p-value:
The data we have observed would be very unlikely if our null hypothesis were true -Should reject null hypotheses
Data Modeling
The definition of the data and their relationships
XBRL-extensible business reporting language
The goal was for business to report clear and understandable financial statements to the SEC. But now any activity that requires communicating unstructured data to a computer and a structured taxonomy of tags can use XBRL.
When an auditor issues an Adverse Opinion how is the opinion paragraph modified?
The opinion paragraph should include the following, "In the auditor's opinion, *because of* the significance of the matter(s) described in the basis for adverse opinion paragraph, the financial statements *do not present fairly* in accordance with the applicable financial reporting framework."
What is Model Selection in Machine Learning?
The process of selecting models among different mathematical models, which are used to describe the same data set is known as Model Selection. Model selection is applied to the fields of statistics, machine learning and data mining.
Give a popular application of machine learning that you see on day to day basis?
The recommendation engine implemented by major ecommerce websites uses Machine Learning
What are the assumptions required for linear regression?
The regression has five key assumptions: Linear relationship Multivariate normality No or little multicollinearity No auto-correlation Homoscedasticity
what is one example of transactional data?
The sales order data contain information about the date and time the order was created, the sales person who created it, the types and quantities of products ordered etc.
What happens to the sample mean and standard deviation as you increase the sample size?
The sample mean and standard deviation generally become closer to the population mean and standard deviation
What happens to the sample mean and standard deviation as you take new samples of equal size?
The sample mean and standard deviation vary but remain fairly close to the population mean and standard deviation
Independence assumption
The sampled values must be independent of each other
What are the components of the three-tiered architecture?
The user interface or presentation tier The business services or business logic tier The data services and programming tier
If you decrease the confidence level (e.g. from 99% to 95%)
The width of the confidence interval decreases, so the interval is more precise, but we are less confident that it contains the true parameter
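As a rough numerical illustration of how the confidence level drives interval width, the sketch below computes the half-width of a z-interval for a mean. The function name `ci_half_width` and the inputs (sigma = 10, n = 100) are illustrative assumptions, not from the card.

```python
from statistics import NormalDist


def ci_half_width(sigma: float, n: int, confidence: float) -> float:
    """Half-width of a two-sided z-interval for a mean: z * sigma / sqrt(n)."""
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / n ** 0.5


# Hypothetical sample: known sigma of 10 and 100 observations.
w99 = ci_half_width(sigma=10, n=100, confidence=0.99)
w95 = ci_half_width(sigma=10, n=100, confidence=0.95)
print(w99 > w95)  # the 95% interval is the narrower of the two
```

With the data held fixed, only the critical value z changes, so dropping from 99% to 95% shrinks the interval by the ratio of the two z values.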
Five C's Consumer/Clients
Who are they? What do they want? Are we fulfilling their needs? How can we get more? Are we keeping the one we have?
short ratio
aka short float it shows the percentage of tradeable shares being sold short. higher ratio means more people are betting stock price will fall.
churn
aka, the loyalty metric = the percentage of existing customers who stop purchasing your product/service (measured in 30 days, 90 days, or 1 year)
what is LSA?
layered scalable architecture (LSA) is a flexible framework for data acquisition, storage, and retrieval that provides for a robust data warehousing process
d1
{ln{[F0,T[S]*e^(-rT)]/[F0,T[K]*e^(-rT)]}+0.5σ^2T}/(σ√T)
outsourcing 2x2
low competitiveness & high strategic importance: improve or seek partner high competitiveness & high strategic importance: keep and leverage low competitiveness & low strategic importance: outsource high competitiveness & low strategic importance: seek different advantage
Threat of substitutes
price sensitivity product differences switching costs
discrete uniform distribution
where a finite number of values are equally likely to be observed • If there are n possible values, each value has a chance of 1/n of happening • EV = Mean = (a+b)/2, where a is the lowest possible value and b is the highest possible value
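A small check of the mean formula, assuming the possible values are the consecutive integers from a to b (a fair six-sided die is the example used here; the function name is an illustrative assumption).

```python
from fractions import Fraction


def discrete_uniform_mean(a: int, b: int) -> Fraction:
    """Expected value of a discrete uniform variable on the integers a..b."""
    values = range(a, b + 1)
    p = Fraction(1, len(values))  # each of the n values has probability 1/n
    return sum(v * p for v in values)


# Fair die: values 1..6, so the shortcut (a + b) / 2 should give the same result.
die_mean = discrete_uniform_mean(1, 6)
print(die_mean == Fraction(1 + 6, 2))
```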
Call-Put Option Relationship: Delta
∆call-∆put=e^(-∂T)
Ωportfolio
∆portfolio*S/Value(Portfolio) (Assumes S is the underlying asset for all portfolio instruments.)
Portfolio Greeks
∑Greek(i) where Greek(i) is the Greek for investment i in the portfolio
Possible Values Ωput
≤0
Possible Values: Ωcall
≥1
If two events are independent, what is the probability that they both occur?
If A and B are independent, P(A and B) = P(A) * P(B).
Bayesian logic program: what each component does
The logical component consists of a set of Bayesian clauses, which captures the qualitative structure of the domain. The quantitative component encodes the quantitative information about the domain.
Forward on Currency
x(t)*exp[(rd-rf)(T-t)]
Adverse Opinion Due to Material Misstatement of the Financial Statements - Nonissuers (Private Company)
1. Intro Paragraph 2. Management's Responsibility Paragraph 3. Auditor's Responsibility Paragraph - "adverse audit opinion" 4. Basis for Adverse Opinion 5. Adverse Opinion Paragraph - "because of," "the financial statements do not present fairly"
Qualified Opinion Due to Material Misstatement of the Financial Statements - Nonissuers (Private Company)
1. Intro Paragraph 2. Management's Responsibility Paragraph 3. Auditor's Responsibility Paragraph - "qualified audit opinion" 4. Basis for Qualified Opinion 5. Qualified Opinion Paragraph - "except for," "the financial statements are presented fairly"
customer factors
1. who? segmentation -demographics -socioeconomics -needs 2. market data -share -size -growth 3. decision-making -what's driving the buying decision?
Appropriateness of Accounting Policies
1.Accounting policies aren't in accordance w/ the applicable financial reporting framework. 2.Financial statements don't represent the underlying transactions. 3.Entity hasn't complied w/ the financial reporting framework requirements for accounting for & disclosing changes in accounting policies.
Middle Paragraph
1.All of the substantive reasons. 2.Disclosure of the principal effects.
1111
15
0011
3
0110
6
What probability falls within one standard deviation of the mean?
68%
What is Interpolation and Extrapolation?
Interpolation estimates a value between known data points; extrapolation approximates a value beyond the range of the known data
Discrete Dividend Exercise (CALL)
Exercise cum-dividend
Grow and Increasing Sales 3. How to increase Volume?
Expand the number of distribution channels Increase product line through diversification of products or services (particularly where they won't cannibalize other products or services you already have) Analyze the segments of the business that have the highest potential Invest in a Marketing Campaign Acquire a Competitor (if you want to increase Market Share) Adjust Prices Create a Seasonal Balance (increase sales in every quarter: sell flowers in the spring, herbs in the summer, pumpkins in the fall and Christmas trees in the winter)
|Critical value|>|Test statistic|
Fail to reject Ho, there is not sufficient evidence that Ha is true
What is logistic regression? Or State an example when you have used logistic regression recently.
binary outcome, predict, political leader, outcome, binary, predictor variables
Hypothesis testing
is a way of summarizing our conclusions about data based on confidence intervals
Information Ratio
measures a portfolio's consistency and returns relative to a benchmark: (Portfolio return - index return) / tracking error. A high information ratio suggests a successful, value-adding portfolio manager
Frame error
mismatch between the people who could possibly have been sampled and the true target population
capital expenditures
money spent by a business to purchase assets. (cap ex)
assets
property (tangible and intangible) that has value and could likely be used to meet debt, commitments, or liabilities
example: PC1=spendy axis (proportion of baskets containing spendy items
raw counts of items and visits)
ways to segment a population
-age -gender -geography -income -married/single
001100
14
What is the Central Limit Theorem?
Averages of random variables drawn independently from the same distribution are approximately normally distributed when the sample size is large
Type 2:
Best-value strategy that offers products/services to a wide range of customers at the best price-value available on the market
What is regression?
Regression gives the computer pairs of (inputs, continuous targets) and the computer learns to predict continuous values on unseen data
p<a
Reject null hypothesis
Pre-paid Forward on a Non-Dividend Paying Stock
S(t)
Matching principle
Will match revenues and expenses to come up with net income
equity
assets minus liabilities- gauge overall built up value in a corporation
Collaborative filtering
friends like, used in social context networks
revenue
-price/unit -number of units sold
What is market analysis?
Research on the target market segment -industry/retail -market tastes-price/quality -market spending Info can help define business plan and viability
New Product Product Elements
Special or Proprietary? Financing? Patented? Substitutions? Advantages/Disadvantages? Place in product line? Cannibalizing our own products? Replacing existing products?
Economic Order Quantity
Sq. Root ((2(Annual Sales Units)(Cost/Purchase Order))/Annual Carrying Cost/Unit)
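The formula on this card can be sketched as a short function. The parameter names and the sample numbers (1,000 units a year, $50 per purchase order, $4 carrying cost per unit) are illustrative assumptions.

```python
import math


def economic_order_quantity(annual_units: float,
                            cost_per_order: float,
                            carrying_cost_per_unit: float) -> float:
    """EOQ = sqrt(2 * annual sales units * cost per order / annual carrying cost per unit)."""
    return math.sqrt(2 * annual_units * cost_per_order / carrying_cost_per_unit)


# Hypothetical inputs.
eoq = economic_order_quantity(annual_units=1000,
                              cost_per_order=50,
                              carrying_cost_per_unit=4)
print(round(eoq, 1))
```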
Entering a New Market Entry Elements
Start from scratch Acquire an existing player Form a joint venture/strategic alliance with existing player
ROI
(rev-cost-inv)/capital inv
Principles - Information and Communication
*O*btain and Use Information *I*nternally Communicate Information *E*xternally Communicate Information
reducing VC
-(think about risks related to quality, customer satisfaction, return on investments, etc.) -look for inefficiencies in the manufacturing process -renegotiate contracts with suppliers and distributors (consolidate purchasing, volume discounts) -vertical integration -get cheaper raw materials or labor -outsource to cheaper regions
Industry/Acquiring a diverse company - Future outlook
-Are players coming into or leaving the industry? -Have there been many mergers or acquisitions lately? -What are the barriers to entry and/or to exit
Extraordinary items (gains/losses)
-Associated with natural catastrophe -A company experiencing a large loss from an ice storm in Alabama -Unusual and infrequent
Lack of Consistency
-Deals w comparability of FS from year to year. -Evaluate whether due to change in acc principle or adj to correct material misstatement *Acceptability of Change in Accounting Principle* Justified -Consider whether: 1. Newly adopted acc principle is in accordance w framework 2. Method of accounting for change is acceptable 3. Disclosure related to change are appropriate and adequate 4. Entity has justified alternative acc principle is preferable -If satisfied w all four criteria, the auditor should include an emphasis-of-matter paragraph in report. Examples that Affect Consistency: -Change in acc estimate that is inseparable from change in acc principle: change in depreciation method. *Change in estimates ONLY or errors are not consistency issues* -Correction of error in acc principle - cash method to accrual -Change in reporting entity results in FS that are those of different reporting entity -Using equity method and any changes made Effect of Acceptable Change on Report -If effect of change in accounting principle is immaterial, no revision -If material, emphasis-of-matter paragraph should be added
The grand strategy matrix: Quadrant III
-Must make some drastic changes quickly to avoid further decline and possible liquidation -Extensive cost and asset reduction should be pursued first
QSPM
-Objectively indicates which alternative strategies are best -Uses input from stage 1 analyses and matching results from stage 2 analyses to decide objectively among alternative strategies
Developing a new product - customers
-Who are our customers -How can we best reach them? -Can we reach them through internet/direct sales? -How can we ensure that we retain them?
Starting a new business - Investigate the market
-Who is our competition? -What size of the market does each hold? -How do products/sales compare to ours? -Are there any barriers to entry? (capital requirements, access to distribution channels, proprietary product technology, government services)
if product is in emerging growth stage...
-concentrate on R&D, competition, and pricing
cost (would like to segment costs)
-cost/unit -> FC/unit and VC/unit -number of units sold
reducing FC
-reduce overhead -excess capacity -get to economies of scale -trim extra employees -> think automation, union negotiations, reduce overtime -going up the learning curve -one-time investments (decrease) -reduce seasonality of demand by finding good alternate use of PP&E -outsource to cheaper regions
Annual reports/ 10K reports filed w/ the SEC
1. Management's discussion and analysis 2. Income stmt, balance sheet, stmt of cash flows 3. Footnotes, supplementary schedules 4. Auditor's report
continuous probability function
1. Probability that x is between two points, a and b, is the integral of f(x) from a to b 2. It is non-negative for all real x 3. The integral of f(x) from negative infinity to infinity is 1
Types of Material Misstatements
1.Appropriateness of accounting policies. 2.Application of accounting policies. 3.Appropriateness of financial statement presentation or disclosures.
Appropriateness of Financial Statement Presentation or Disclosures
1.FS don't include all required disclosures. 2.Disclosures aren't presented in accordance w/ the applicable financial reporting framework. 3.FS don't provide disclosures needed to achieve fair presentation. 4.Info required hasn't been included or disclosed in the FS.
Application of Accounting Policies
1.Management hasn't applied accounting policies in accordance with the applicable financial reporting framework. 2.Management hasn't applied accounting policies consistently. 3.Error in the application of an accounting policy.
Basis for Qualified Opinion Paragraph
1.Totally new. 2.Description & quantification of the financial statement effects of any misstatement. 3.Explanation of how disclosures are misstated. 4.Description of the nature of omitted information and the inclusion of the omitted information if practicable.
Basis for Adverse Opinion Paragraph
1.Totally new. 2.Description and quantification of the financial statement effects of any misstatement. 3.Explanation of how disclosures are misstated. 4.Description of the nature of omitted information and the inclusion of the omitted information if practicable.
K
1000 = 10^3
001001
11
1011
11
001010
12
0010
2
{(1, 2), (2, 3), (3, 4)}: is it antisymmetric and why?
Yes. There is no pair x ≠ y with both xRy and yRx.
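The antisymmetry check can be written directly from the definition: whenever both (x, y) and (y, x) are in the relation, x must equal y. A minimal sketch, with the function name chosen for illustration:

```python
def is_antisymmetric(relation) -> bool:
    """True if no two distinct elements x, y have both (x, y) and (y, x) in the relation."""
    pairs = set(relation)
    return all(x == y or (y, x) not in pairs for (x, y) in pairs)


print(is_antisymmetric({(1, 2), (2, 3), (3, 4)}))  # the relation from the card
print(is_antisymmetric({(1, 2), (2, 1)}))          # a counterexample for contrast
```

Pairs of the form (x, x) never break antisymmetry, which is why the check skips them with `x == y`.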
010000
20
Quartiles
25th percentile (Q1 = lower median) 50th percentile (Q2 - same as median) 75th percentile (Q3 - upper median)
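A quick way to compute quartiles is the standard library's `statistics.quantiles`. The data set below is made up for illustration. Different quartile conventions can give slightly different Q1 and Q3, so this sketch only relies on Q2 matching the median.

```python
import statistics

# Hypothetical data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four equal-probability groups and returns
# the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q2 == statistics.median(data))  # Q2 is the median
print(q1 < q2 < q3)
```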
010110
26
011000
30
Los Angeles
4 million
100010
42
100011
43
100100
44
population data: Great Britain
60M
110101
65
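The six-bit cards in this deck appear to pair a binary string with its octal value (each group of three bits is one octal digit), while the four-bit cards give decimal values. A small sketch for checking them, with function names chosen for illustration:

```python
def binary_to_octal(bits: str) -> str:
    """Convert a binary string to a two-digit octal string, e.g. '110101' -> '65'."""
    return format(int(bits, 2), "02o")


def binary_to_decimal(bits: str) -> int:
    """Convert a binary string to its decimal value, e.g. '1011' -> 11."""
    return int(bits, 2)


for bits in ("001100", "100010", "110101"):
    print(bits, binary_to_octal(bits))

print("1011", binary_to_decimal("1011"))
```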
Normal Distribution: 3 Empirical Rules
68% observations fall within 1 SD 95% observations fall within 2 SD 99.7% (almost all) observations fall within 3 SD
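The three percentages can be reproduced from the normal CDF: for a normal variable, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). A minimal sketch (the function name is an illustrative choice):

```python
import math


def prob_within(k: float) -> float:
    """P(|Z| < k) for a standard normal Z, i.e. within k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```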
0111
7
population data: world
7 billion
What section(s) is added to the Auditor's report when a Qualified or Adverse Opinion is issued for a non-issuer?
A *Basis for Modification* paragraph and a *Qualified Opinion* or *Adverse Opinion* paragraph are added, as appropriate. (GAAP Problems)
Give an example of a Transformer
A ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions
Pipeline
A Pipeline chains multiple Transformers and Estimators together to specify a workflow
If two events A and B are mutually exclusive
A and B are disjoint events P (A and B) = 0
denormalized database
A denormalized database is one that was originally normalized to eliminate anomalies, after which select redundant data were restored.
What is a business plan?
A document outlining the key details of a future business (associated with start up or small companies seeking funding)
data scientist
A practitioner of data science, they are trained in mathematics, computer science, and statistics.
What is the mean?
Arithmetic mean is the sum of values / number of values. Central value of a discrete set of numbers.
Increasing Sales Approach
Assessment (increasing sales doesn't necessarily mean increasing profits) How?
Maximin
Avoids the worst outcome "Pessimist"
Explain the two components of Bayesian logic program?
A Bayesian logic program consists of two components: a logical one and a quantitative one.
Bonds
Bonds have a face value, maturity and coupon Sold by company to raise funds Periodic coupon payments and final repayment of face value.
Entering the Market - How
Brainstorm pros and cons of: -Start from scratch (new business) -Acquire an existing player -Form a joint venture/strategic alliance
Annuity Present Value
C * (1 - 1/(1+r)^t) / r, where C = Amount of Annuity (Equal Future CF), r = rate of return, t = number of years
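As a sanity check, the closed-form annuity value C * (1 - (1 + r)^-t) / r should equal the sum of the individually discounted payments. The sketch below compares the two; the function names and the sample inputs are illustrative assumptions.

```python
def annuity_pv(c: float, r: float, t: int) -> float:
    """Present value of an ordinary annuity: C * (1 - (1 + r)**-t) / r."""
    return c * (1 - (1 + r) ** -t) / r


def annuity_pv_by_summing(c: float, r: float, t: int) -> float:
    """Same value computed by discounting each end-of-year payment separately."""
    return sum(c / (1 + r) ** year for year in range(1, t + 1))


# Hypothetical inputs: $100 a year for 10 years at a 5% rate of return.
pv = annuity_pv(100, 0.05, 10)
print(round(pv, 2), round(annuity_pv_by_summing(100, 0.05, 10), 2))
```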
Put-Call Parity (General)
C(S,K,T)-P(S,K,T)=F0,T[S]*exp[-rT]-K*exp[-rT]
Black-Scholes Formula
C(S,K,σ,r,T,∂)=F0,T[S]*e^(-rT)*N(d1)-F0,T[K]*e^(-rT)*N(d2)
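A hedged sketch of the formula on this card, reading F0,T[S]*e^(-rT) as the prepaid forward on the stock (S*e^(-∂T), with ∂ the continuous dividend yield) and F0,T[K]*e^(-rT) as K*e^(-rT). It uses the standard d1 = [ln(prepaid S / prepaid K) + 0.5σ²T] / (σ√T) and d2 = d1 - σ√T. Function names and sample inputs are illustrative assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF N(x), via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def black_scholes(S, K, sigma, r, T, delta=0.0):
    """Return (call, put) prices for European options under Black-Scholes."""
    fp_s = S * math.exp(-delta * T)  # prepaid forward on the stock
    fp_k = K * math.exp(-r * T)      # prepaid forward on the strike
    d1 = (math.log(fp_s / fp_k) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = fp_s * norm_cdf(d1) - fp_k * norm_cdf(d2)
    put = fp_k * norm_cdf(-d2) - fp_s * norm_cdf(-d1)
    return call, put


# Hypothetical inputs: at-the-money, 20% volatility, 5% rate, one year, no dividends.
call, put = black_scholes(S=100, K=100, sigma=0.2, r=0.05, T=1.0)

# Put-call parity: C - P should equal prepaid forward on S minus prepaid forward on K.
parity_gap = (call - put) - (100 - 100 * math.exp(-0.05))
print(round(call, 4), round(put, 4), abs(parity_gap) < 1e-9)
```

Pricing both legs from the same d1 and d2 makes put-call parity hold up to rounding, which is a convenient self-check on any implementation.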
For K₁>K₂: C(S,K₁,T) ? C(S,K₂,T)
≤
For K₁>K₂: C(S,K₁,T)-C(S,K₂,T) ? K₂-K₁
≥
Put-Call Parity (Exchange Options)
C(S,Q,T)-P(S,Q,T)=F0,T[S]*e^(-rT)-F0,T[Q]*e^(-rT)
Put-Call Parity (Currency Options)
C(x,K,T)-P(x,K,T)=xe^(-rfT)-Ke^(-rdT)
Current Assets
Cash + Supplies (inventory) + Accounts Receivable
What is an example of unsupervised learning?
Clustering and Density Estimations
Accounts Receivable Turnover
Credit Sales/Average AR
Volume
Data size
non-sampling error
Estimation errors Biased survey responses Using a non-representative sample Measurement errors
Mergers and Acquisitions Price Elements
Fair? Affordable? How to pay? If the economy sours?
Greek Relationship: Written vs Purchased
Greek(Written) = -Greek(Purchased)
Net Profit ($)
Gross Profit - Depreciation - Amortization - Other Expenses - Interest - Tax OR Sales Revenue - Total Costs
Four P's Promotion
How can we best market our products? Are we reaching the right market? What kind of marketing campaigns have we conducted in the past? Were they effective? Can we afford to increase our marketing campaign?
Is it better to have 100 small hash tables or one big hash table in memory
I would say 100 small hash tables, it's because of how the hashtables are implemented internally. As the number of records grow the constant in O(1) will increase and you see the performance degradation.
Increasing Profits Costs Elements
ID fixed costs ID variable costs Shifts in cost Unusual costs Benchmark competition Reduce costs without damaging revenue streams
The client's refusal to provide access to the minutes of the Board of Director's meeting would result
In a disclaimer of opinion.
Option Greek Definition: Gamma
Increase in delta per increase in stock price (∂∆/∂S = ∂^2C/∂S^2) (CONVEXITY)
Developing a New Product Consumer Adoption Rates
Innovators - 2.5% Early Adopters - 13.5% Early Majority- 34% Late Majority- 34% Laggards- 16%
What is linear least squares regression?
Linear
New Business Cost-Benefits Analysis Elements
Management Marketing and Strategic Plan Distribution Channels Product Customers Finance
New Business Approaches
Market Cost-Benefit Analysis
Adjustments *Long-lived Assets*
Methods & estimates
ROE
Net Income/SE
Do gradient descent methods always converge to same point?
No. In some cases gradient descent reaches a local minimum (a local optimum) rather than the global optimum. The result depends on the data and the starting conditions
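To make the dependence on starting conditions concrete, the sketch below runs plain gradient descent on a made-up non-convex function, f(x) = x^4 - 3x^2 + x, from two different starting points. The function, learning rate, and step count are illustrative assumptions.

```python
def f(x: float) -> float:
    """A non-convex function with two separate local minima."""
    return x ** 4 - 3 * x ** 2 + x


def grad(x: float) -> float:
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


left = gradient_descent(-2.0)   # settles in the minimum on the negative side
right = gradient_descent(2.0)   # settles in the minimum on the positive side
print(round(left, 3), round(right, 3), f(left) < f(right))
```

Both runs stop where the gradient is zero, but only one of the two points is the global minimum, which is why practitioners often restart from several initial values.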
Operating Margin
Operating Income / Total Revenue
Adjustments *Off-balance Sheet Financing*
Operating vs Financing
When A and B are not independent
P(A and B) = P(A)P(B|A)
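A worked instance of the multiplication rule, using a made-up example: drawing two aces in a row from a standard 52-card deck without replacement, so the second draw depends on the first.

```python
from fractions import Fraction

# A = first card is an ace; B = second card is an ace.
p_a = Fraction(4, 52)          # 4 aces among 52 cards
p_b_given_a = Fraction(3, 51)  # one ace already removed

# Dependent events: P(A and B) = P(A) * P(B|A).
p_both = p_a * p_b_given_a

# If the draws were (wrongly) treated as independent, we would get P(A) * P(B).
p_if_independent = p_a * Fraction(4, 52)

print(p_both, p_if_independent)
```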
Binary Relation
Relationship between two sets, A and B, where the relation is a subset of A x B
Reorder Point
Safety Stock + (Lead Time in Days * Average Daily Sales)
Market penetration
Seeking increased market share for present products or services in present markets through greater marketing efforts
Paired data assumption
The data must be paired
What is bias?
The learner's tendency to consistently learn the same wrong thing
What is Invoice Discounting?
Third party provides a loan based on debt due Recourse -If debtor defaults the factoring company can seek payment from company.
Porter's 5 Forces 2. Intensity of Rivalry among Competition
This is a factor that plays a big role, could force the market into a price war and the company with the lowest costs would be able to survive
Architected data mart layer
This layer enables users to access data stored in the warehouse logically and efficiently
Cost leadership strategies
To employ one successfully, a firm must ensure that its total costs across its overall value chain are lower than competitors' total costs
Audit Issues - None or Immaterial
Unmodified (Unqualified)
Solvency
Value of assets is greater than the value of liabilities. capital (equity) = assets - liabilities
What is Variance?
Variance is the error representing sensitivities to small training data fluctuations. (overfitting)
Call-Put Option Relationship: Vega
Vega(call)=Vega(put)
What are sources of funds? (5)
Venture Capital Business Angels Bank Loan Informal Government Grants
The Value Chain Delivery
Warehousing and Distribution Channels
Overfitting
When a model makes good predictions on training data but has poor performance on the test data
If you lower price and volume rises and you are pushed beyond capacity.....
Your costs will rise as employees will have to work overtime and your profits will suffer
Real-World Pricing (p)
[exp[(α-∂)h]-d]/[u-d]
Options Debit Spread
a debit spread results when the long position costs more than the premium received for the short position — nonetheless, the debit spread still lowers the cost of the position.
Probability
a number between 0 and 1 that measures the likelihood that some event will occur
Overdraft
a permit to overdraw an account up to a stated limit. Repayable on demand
In a _____, all items are expressed as percentages with no dollar amounts shown.
common-sized statement
Random Variable
associates a numeric value with each possible random outcome Random variables can be discrete or continuous.
All users of financial statements are interested in the ability of a company to: Maintain _____ and _____. Earn income, called _____.
Liquidity, solvency, profitability
Nonsampling error
can come in many forms, and bigger samples don't help us reduce the problem: • Selection of sample (non-response, survival) • Non-truthfulness • Measurement error
cfroi definition
cash-based metric that computes the real rate of return on a company's assets, expressed as a rate (CFO - cash invested in FA)
Elasticity
% change in quantity demanded / % change in price
3 Cs
company, customers, competitors
Flat file
contains data in text format with no structured relationship among the data.
In a vertical analysis of the _____, each asset item is stated as a percent of the total assets.
balance sheet
Company segments
expertise, distribution channels, cost structure, intangibles, financial situation, organisational structure
ETL
extraction, transformation, loading
Adverse Opinion Due to Material Misstatement of Financial Statements: Issuer. Middle Paragraph(s): A paragraph should be placed immediately before the opinion paragraph. This paragraph should include: (2). Disclosure of the principal effects of the subject matter of the adverse opinion on
financial position, results of operations, and cash flows, if practicable. (a). If the effects are not reasonably determinable, the report should so state. (b). If such disclosures are made in a note to the financial statements, the explanatory paragraph(s) may be shortened by referring to it.
The ratio of ______ to _____ is a solvency measure that indicates the margin of safety of the note‐holders or bondholders. It also indicates the ability of the business to borrow additional funds on a long‐term basis.
fixed assets to long‐term liabilities
core competency
hard to imitate efficiency of competitors perceived customer benefits
How would you improve a spam detection algorithm that uses Naïve Bayes?
hidden decision trees or decorrelate your features.
Corporations in some industries normally have _____ ratios of debt to stockholders' equity.
high
52 week range
high low over the past year
The relationship between the volume of goods (merchandise) sold and inventory may be stated as the ______ turnover. The purpose of this ratio is to assess the efficiency of a firm in managing its inventory.
inventory
in other words
it contains a list of historical transactions.
style
leadership style, meritocracy, etc.
d2
{ln{[F0,T[S]*e^(-rT)]/[F0,T[K]*e^(-rT)]}-0.5σ^2T}/(σ√T)
Long Term liabilities
long term debt
market book value
market price per share/book value per share of equity
what is master data?
master data represent business entities that support business transactions
Other
more specific strategies,-Cooperation among competitors -Joint venture/partnering -Merger/acquisition -First mover advantages -Outsourcing
standard deviation
most commonly used risk metric- calculated as the annualized stock price standard deviation.
Qualified Opinion Due to Material Misstatement of Financial Statements: Basis for Qualified Opinion: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include: (1) A description and quantification
of the financial effects of any misstatement that relates to specific amounts in the financial statements. (b). If disclosure of the financial effects is made in the notes to the financial statements, the basis for the modification paragraph can be shortened by referring to the disclosure.
take rate is an ____________________ metric
operational (measures internal effectiveness)
Mean Absolute Deviation
or the average absolute deviation, is the average of the absolute deviations from the mean (not squared)
discrete probability function
p(x), is a function that satisfies the following properties: 1. Probability that x can take on a specific value is p(x) 2. p(x) is non-negative for all real x 3. The sum of p(x) over all possible values of x is 1
For options on Futures, p* is
[1-d]/[u-d]
Each liability and stockholders' equity item is stated as a _____ of the total liabilities and stockholders' equity.
percent
How will you explain logistic regression to an economist
physician, scientist and biologist?
The number of times interest charges are earned can be adapted for use with dividends on ______ stock.
preferred
what is the most common data model?
relational data model
contribution margin
revenue - variable costs (1 - COGS + all other expenses / revenue)
Earnings and adjusted earnings
revenue minus expenses= earnings (profit) adjusted earnings for 1 time or unusual expenses (lawsuit, restructuring, acquisitions)
Statistics
rigorous branch of mathematics that deals with understanding data. It involves the collection or sampling, organization, modeling, analysis, interpretation, and presentation of data.
balanced scorecard looks at
shareholders, customer satisfaction, internal functions, innovation and learning
tracking error
shows the difference in performance between a fund or index and its benchmark
Customer segments
size, growth, market share, needs, price sensitivity, distribution channels
what are intelligent control systems?
software processes that work autonomously with distributed systems to control or run a system both with and without human intervention.
what is image recognition?
software scans a picture and translates what it "sees" into a textual description of whatever is depicted in the picture.
Why is a local optimum important in a specific context, such as K-means clustering?
K-means clustering context: It's proven that the objective cost function will always decrease until a local optimum is reached. Results will depend on the initial random cluster assignment
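A minimal sketch of that dependence on initialization, assuming a hypothetical four-point data set and two hand-picked sets of starting centers (fixed instead of random so the run is repeatable). The `kmeans` helper below is an illustrative implementation of Lloyd's algorithm, not a library call.

```python
def kmeans(points, centers, iters=20):
    """Lloyd's algorithm on 2-D points; returns final centers and squared-distance cost."""
    def sq_dist(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


# Hypothetical data: the corners of a 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Centers that split the rectangle left/right versus top/bottom.
_, cost_good = kmeans(points, centers=[(0, 0.5), (4, 0.5)])
_, cost_bad = kmeans(points, centers=[(2, 0), (2, 1)])
print(cost_good, cost_bad)
```

Both runs converge, but the second is stuck in a worse local optimum, which is why K-means is usually run from several initializations and the lowest-cost result kept.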
or (4). Information that is required to be presented
such as a statement of cash flows, has not been included or disclosed in the financial statements.
The relationship between the total claims of the creditors and the owners, the ratio of _____ to ______, is a solvency measure that indicates the margin of safety for creditors.
liabilities to stockholders' equity
Data Source
the set of fields that are chosen for analysis in the source system.
what are data models?
the structure of a database
Computer Science
the study of how computers work and the application of theory to improve computing methods and capabilities.
Non-integrated special system
Some systems simply stand alone. E.g., a P.O.S. is usually linked with the accounting and inventory systems, but most small-business owners can understand everything just by looking at their P.O.S. system.
Nonresponse error
there is a consistent relationship between who answered and their response
what are the three dimension tables in star schema?
time, unit, data package
why do we use flat files?
to transfer data from one flat file location to another.
Correlation is not affected by
units of x and y
A percentage analysis used to show the relationship of each component to the total within a single financial statement is called _____ analysis.
vertical
You have RDD storage level defined as MEMORY_ONLY_2, what does _2 mean?
Ans: the number _2 in the name denotes 2 replicas
What is a stage, with regards to Spark Job execution?
Ans: A stage is a set of parallel tasks, one per partition of an RDD, that compute partial results of a function executed as part of a Spark job.
Pre-paid Forward on Currency
x(t)*exp[-rf(T-t)]
Call-Put Option Relationship: Gamma
Γcall=Γput
Call-Put Option Relationship: Psi
Ψcall-Ψput=-0.01TSe^(-∂T)
net profit margin %
net income / revenue
Return on Equity (ROE)
net income divided by equity
roi
net income/total assets
What is bias?
Bias is the error representing missing relations between features and outputs
Types of LT Finance (2)
Equity - sell part of company; Debt - borrow money
Forward on a Stock with Continuous Dividends
S(t)*exp[(r-∂)(T-t)]
Forward on a Non-Dividend Paying Stock
S(t)*exp[r(T-t)]
What are the areas in robotics and information processing where sequential prediction problem arises?
The areas in robotics and information processing where sequential prediction problem arises are a) Imitation Learning b) Structured prediction c) Model based reinforcement learning
Why are vectors used in machine learning?
They give a synthetic summary of characteristics of real world objects.
What is Chi-Square distribution?
The chi-square test is a tool that is used to identify if there is a relationship that exists between two given categorical data types. It is calculated by comparing observed vs. expected frequencies *R-squared: numerical data | Chi-squared: categorical data • Do certain products sell better in certain geographic regions? • Does gender influence car-color preference? (X <- gender, Y <- color) • In consumer marketing, a common problem that any marketing manager faces is the selection of colors for packaging. Assume that a manager wishes to compare five different colors. He is interested in knowing which of the five is the most preferred one so that it can be introduced in the market. A random sample of 400 consumers reveals the following:
Sample Report: Qualified Opinion Due to a Material Misstatement of the Financial Statements Issuer (Public Company). Report of Independent Registered Public Accounting Firm.
The company has excluded, from property and debt in the accompanying balance sheets, certain lease obligations that, in our opinion, should be capitalized in order to conform to accounting principles generally accepted in the United States of America.
What is classification?
The computer is given pairs of (inputs, target classes) and the computer learns to attribute classes to unseen data.
If Sales are Flat but Market Share is the same.....
This could be that the industry sales are flat and your competition is also facing the same issues
Pick an algorithm. Write the pseudo-code for a parallel implementation.
This kind of question demonstrates your ability to think in parallelism and how you could handle concurrency in programming implementations dealing with big data. Take a look at pseudocode frameworks such as Peril-L and visualization tools such as Web Sequence Diagrams to help you demonstrate your ability to write code that reflects parallelism.
Qualified Opinion
This opinion is expressed when the auditor concludes that the misstatements, individually or aggregate, are material *but* not pervasive to the financial statements.
Basis for Modification Paragraph (Qualified or Adverse)
This paragraph contains the following: (1) A description and quantification of the financial effects of any misstatement that relates to specific amounts in the financial statements. (2) An explanation of how disclosures are misstated if there is a material misstatement related to narrative disclosure. (3) A description of omitted information and inclusion of the omitted information, *when practicable*, if there is an omission of info that is required to be presented or disclosed.
Pricing Strategies 3. Determine the Pricing Strategy
Three options: Competitive Analysis, Cost-Based Pricing, Price-Based Costing. Run through all of these and determine the pros/cons
Can you explain the difference between a Test Set and a Validation Set?
The validation set can be considered part of the training set, as it is used for parameter selection and to avoid overfitting of the model being built. The test set, on the other hand, is used for testing or evaluating the performance of a trained machine learning model. In simple terms, the differences can be summarized as: Training Set is to fit the parameters, i.e. weights. Test Set is to assess the performance of the model, i.e. evaluating the predictive power and generalization. Validation Set is to tune the parameters.
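A minimal sketch of carving one data set into the three parts described above. The function name `three_way_split`, the 60/20/20 proportions and the fixed seed are assumptions made for illustration only.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split repeatable
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                # held out until the very end
    val = rows[n_test:n_test + n_val]   # used to tune parameters
    train = rows[n_test + n_val:]       # used to fit the weights
    return train, val, test


train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

The three lists never overlap, so a score measured on the test list is not contaminated by tuning decisions made on the validation list.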
25%
#/4 or #/2 twice
BE Price
(FC/BE Volume) +VC/unit
Statements on Auditing Standards
*SAS* - Audits -Section: AU-C -Standard Setting: AICPA Auditing Standards Board -Provides generally accepted auditing standards for audits of *nonissuers*. Provide guidance for other services like review of interim financial information and letters to underwriters *Private Company* -Audits of annual FS: nonissuers, Special reports: nonissuers, Interim FS: nonissuers
if sales are flat but market share is constant...
-could indicate industry sales are flat -examine competition
Vertical Spread
A money spread, or vertical spread, involves the buying of options and the writing of other options with different strike prices, but with the same expiration dates.
A stock is priced at 38 and the periodic risk-free rate of interest is 6%. What is the value of a two-period European put option with a strike price of 35 on a share of stock using a binomial model with an up factor of 1.15 and a risk-neutral probability of 68%? A) $0.57. B) $0.64. C) $2.58.
A) $0.57. Given an up factor of 1.15, the down factor is simply the reciprocal of this number: 1/1.15 = 0.87. Two down moves produce a stock price of 38 × 0.87^2 = 28.73 and a put value at the end of two periods of 6.27. An up and a down move, as well as two up moves, leave the put option out of the money. You are directly given the probability of up = 0.68. The down probability = 0.32. The value of the put option is [0.32^2 × 6.27] / 1.06^2 = $0.57.
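The arithmetic in this card can be checked with a short sketch. The function name and parameter names are illustrative assumptions; the inputs are the ones given in the question, and the down factor is taken as the reciprocal of the up factor, as the answer states.

```python
def two_period_european_put(s0, strike, up, p_up, rate):
    """Two-period binomial value of a European put, with down = 1 / up."""
    down = 1.0 / up
    p_down = 1.0 - p_up
    # Terminal stock prices and their risk-neutral probabilities.
    terminal = [
        (s0 * up * up, p_up * p_up),          # two up moves
        (s0 * up * down, 2 * p_up * p_down),  # one up, one down (either order)
        (s0 * down * down, p_down * p_down),  # two down moves
    ]
    expected_payoff = sum(prob * max(strike - s, 0.0) for s, prob in terminal)
    # Discount two periods at the periodic risk-free rate.
    return expected_payoff / (1.0 + rate) ** 2


value = two_period_european_put(s0=38, strike=35, up=1.15, p_up=0.68, rate=0.06)
print(round(value, 2))
```

Only the two-down-moves node pays off, so the whole value comes from 0.32^2 times that payoff, discounted twice.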
Financial Statement Issues - Material and Pervasive
Adverse Opinion
Which cluster managers can be used with Spark?
Apache Mesos, Hadoop YARN, Spark standalone and Spark local (local node or single JVM: the driver and executor run in the same JVM, so the same node is used for execution).
An adverse opinion is issued when the financial statements
Are not presented in accordance with GAAP.
A portfolio manager holds 100,000 shares of IPRD Company (which is trading today for $9 per share) for a client. The client informs the manager that he would like to liquidate the position on the last day of the quarter, which is 2 months from today. To hedge against a possible decline in price during the next two months, the manager enters into a forward contract to sell the IPRD shares in 2 months. The risk-free rate is 2.5%, and no dividends are expected to be received during this time. However, IPRD has a historical dividend yield of 3.5%. The forward price on this contract is closest to: A) $905,175. B) $903,712. C) $901,494
B) $903,712. The historical dividend yield is irrelevant for calculating the no-arbitrage forward price because no dividends are expected to be paid during the life of the forward contract. FP = S0 × (1 + Rf)^T; 903,712 = 900,000 × (1.025)^(2/12)
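A quick check of this answer, sketched with an illustrative `forward_price` helper (the name is an assumption). It uses annual compounding over 2/12 of a year, matching the formula in the card, and ignores the historical dividend yield for the reason the answer gives.

```python
def forward_price(spot, annual_rate, years):
    """No-arbitrage forward price with annual compounding and no dividends."""
    return spot * (1.0 + annual_rate) ** years


# 100,000 shares at $9 each, 2.5% risk-free rate, 2 months to delivery.
fp = forward_price(spot=100_000 * 9.0, annual_rate=0.025, years=2 / 12)
print(round(fp))
```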
An instantaneously riskless hedged portfolio has a delta of: A) anything; gamma determines the instantaneous risk of a hedge portfolio. B) 0. C) 1.
B) 0. A riskless portfolio is delta neutral: the delta is zero.
Long term sources of finance for small/expanding company
Banks-Debt Venture/Seed Capital- Equity Government Grants
Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Adverse Opinion. In our opinion, because of the significance of the matter discussed in the Basis for Adverse Opinion paragraph,
the consolidated financial statements referred to above Do Not Present Fairly the financial position of ABC Company and its subsidiary as of December 31, 2001, or the results of their operations or their cash flows for the year then ended.
Consider a fixed-rate semiannual-pay equity swap where the equity payments are the total return on a $1 million portfolio and the following information: 180-day LIBOR is 4.2%; 360-day LIBOR is 4.5%; Div. yield on the portfolio = 1.2%. What is the fixed rate on the swap? A) 4.5143%. B) 4.3232%. C) 4.4477%.
C) 4.4477%. (1-(1/1.045))/((1/(1+0.042(180/360)))+(1/(1+0.045(360/360)))) = 0.022239; 0.022239 × 2 = 4.4477%
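The same calculation as a short sketch. The helper name `swap_fixed_rate` and its dictionary input are assumptions for illustration; the discount factors use simple interest on a 360-day year, as in the card's formula.

```python
def swap_fixed_rate(libor_by_days, periods_per_year=2):
    """Annualized fixed swap rate from simple-interest LIBOR quotes keyed by maturity in days."""
    # Discount factor for each payment date, shortest maturity first.
    discount_factors = [
        1.0 / (1.0 + rate * days / 360)
        for days, rate in sorted(libor_by_days.items())
    ]
    # Periodic rate: (1 - last discount factor) / sum of all discount factors.
    periodic = (1.0 - discount_factors[-1]) / sum(discount_factors)
    return periodic * periods_per_year


rate = swap_fixed_rate({180: 0.042, 360: 0.045})
print(f"{rate:.4%}")
```

The dividend yield on the portfolio does not enter the calculation; only the LIBOR term structure determines the fixed rate.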
Compared to the value of a call option on a stock with no dividends a call option on an identical stock expected to pay a dividend during the term of the option will have a: A) higher value only if it is an American style option. B) lower value only if it is an American style option. C) lower value in all cases.
C) lower value in all cases. An expected dividend during the term of an option will decrease the value of a call option.
For t<T, Camer(T) ? Camer(t)
≥
MetaData
Data about the data. Metadata provide context, meaning, and purpose to data
WT: Weaknesses and threats
Defensive tactics directed at reducing internal weaknesses and avoiding external threats
Do you have experience with R (or Weka
Scikit-learn, SAS, Spark, etc.)? Tell me what you've done with that. Write some example data pipelines in that environment.,
Reporting on Complete Set of FS and Single FS/Specific Element
The auditor should: -Issue a separate auditor's report and express a separate opinion for each engagement. May be published together provided they are sufficiently differentiated and report on the complete set of FS is unmodified/unqualified -Indicate in the report on a specific element the date of the auditor's report on the complete set of FS and nature of opinion expressed on the FS
Why overfitting happens?
The possibility of overfitting exists as the criteria used for training the model is not the same as the criteria used to judge the efficacy of a model.
Five C's Costs
What are the major costs? How have they changed in the past year? How do the costs compare to those of others in the industry? How can we reduce costs?
Five C's Company
What do you know about the company? How big is it? Is it public or private? What kinds of products or services?
What questions does data analytics enable us to answer?
What has happened in the past? Why did it happen? What could happen in the future? Can some of the actions resulting from our insights be automated? Can the analytics process be automated?
sector
broad business category a stock falls into
whereas a causal factor is one that affects an event's outcome
but is not a root cause. Essentially, you can find the root cause of a problem and show the relationship of causes by repeatedly asking the question, "Why?", until you find the root of the problem. This technique is commonly called "5 Whys", although it can involve more or fewer than 5 questions.
What information do competitors want?
competitive analysis, benchmarking
geometric growth rate
compounded growth rate or time series growth rate: ((ending value / beginning value) ^ (1/# yrs)) - 1
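A minimal sketch of the compounded growth rate, using the standard form (ending value / beginning value)^(1/years) - 1. The function name and the sample figures (100 growing to 121 over 2 years) are hypothetical.

```python
def geometric_growth_rate(beginning_value, ending_value, years):
    """Compound annual growth rate between two values."""
    return (ending_value / beginning_value) ** (1.0 / years) - 1.0


# Hypothetical figures: 100 grows to 121 over 2 years.
g = geometric_growth_rate(100.0, 121.0, 2)
print(f"{g:.2%}")
```

Dividing the ending value by the beginning value matters here: 10% compounded for two years turns 100 into 121, whereas a rate built from the difference (121 - 100) would not reproduce that.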
Working Capital = ? - ?
current assets - current liabilities
Current Ratio = ? / ?
current assets / current liabilities
In a vertical analysis of the _____
each item is stated as a percent of net sales.,income statement
Sensitivity testing
looks at how changing the inputs changes the output, and provides insights into why the results occur. We change inputs and assumptions, and then recalculate our decision tree.
The ________ requires a report stating management's responsibility for establishing and maintaining internal control. In addition, management's assessment of the effectiveness of internal controls over financial reporting is included in the report.
Sarbanes-Oxley Act of 2002
what are the transactional data of fact tables?
measures or key figures.
sharpe ratio
risk metric equal to: (asset return - risk free) / standard deviation
asset turnover
sales/total assets
What is Linear Regression?
A statistical technique in which the score of a variable Y is predicted from the score of a second variable X, the predictor variable
Which languages would you choose for semi-structured text data reconciliation?
scripting languages (Python and Perl)
Data Gathering
selecting the data
If you are performing a hypothesis test based on a 90% confidence level, what are your chances of making a type I error?
10%. The probability of a type I error is equal to the significance level, which is 1 - confidence level. (A 90% confidence level indicates that the significance level is 10%. Therefore there is a 10% chance of making a type I error.)
Enterprise value
what it would cost to completely take over a business. EV = mkt cap + debt - cash
Please explain how workers work when a new Job is submitted to them.
Ans: When SparkContext is created, each worker starts one executor. This is a separate Java process (a new JVM), and it loads the application jar into this JVM. The executors then connect back to your driver program, and the driver sends them commands such as foreach, filter, map, etc. As soon as the driver quits, the executors shut down.
csat is a double edged sword because.....
if someone really doesn't like your brand, they'll recommend for others to avoid it
What are the trade-offs between closed-form and iterative implementations of an algorithm
in the context of distributed systems?,
SWOT framework
internal: strengths & weaknesses external: opportunities & threats
Buyers
intrinsic power, consolidation, volume, threat of backward integration
Qualified Opinion (issuer/public company/GAAP-material problem.) Qualified Opinion Due to Material Misstatement of Financial Statements: Issuer. Middle Paragraph(s): (2). Disclosure of the principal effects of the subject matter of the qualification on financial position, results of operations,
and cash flows, if practicable. (a). If the effects are not reasonably determinable, the report should so state. (b). If such disclosures are made in a note to the financial statements, the explanatory paragraph(s) may be shortened by referring to it.
contribution margin %
(revenue - variable costs) / revenue
Margin of safety
risk management method: never pay fair value, in case your fair value estimate was wrong, i.e. only pay $6.70 for a $10 stock to ensure a margin of safety
For options on Futures, u is
e^(σ√h)
Sortino Ratio
update to sharpe ratio (Annualized return - risk free rate of return) / Std Dev of negative return series
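A minimal sketch comparing the Sharpe and Sortino ratios as these cards define them. The return series and risk-free rate are hypothetical, the figures are per-period and not annualized, and the population standard deviation (`statistics.pstdev`) is an assumed choice; a sample standard deviation would give different numbers.

```python
from statistics import mean, pstdev


def sharpe_ratio(returns, risk_free):
    """(average return - risk-free rate) / standard deviation of all returns."""
    return (mean(returns) - risk_free) / pstdev(returns)


def sortino_ratio(returns, risk_free):
    """Same numerator, but divided by the standard deviation of negative returns only."""
    negatives = [r for r in returns if r < 0]
    return (mean(returns) - risk_free) / pstdev(negatives)


# Hypothetical per-period returns and risk-free rate.
returns = [0.10, -0.05, 0.20, -0.15, 0.05]
risk_free = 0.01

sharpe = sharpe_ratio(returns, risk_free)
sortino = sortino_ratio(returns, risk_free)
print(round(sharpe, 3), round(sortino, 3))
```

Because the Sortino denominator ignores upside swings, a series with large gains and modest losses scores better on Sortino than on Sharpe.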
Funds from operation FFO
used for REITs instead of earnings. Because a REIT's assets are primarily its business, depreciation can significantly impact the results. Calculated as net income excluding gains or losses on the sale of property, with depreciation added back in.
cash conversion cycle
used to measure how quickly a company converts cash on hand into more cash. 3 parts: Days Sales of Inventory (DSI), Days Sales Outstanding (DSO), Days Payable Outstanding (DPO). CCC = DSI + DSO - DPO
Hierarchies
used to organize dimension attributes in a tree-like structure for reporting purposes
Which form of income is the highest risk, with the greatest potential return?
Ordinary Shares, Preference Shares, Bonds, Debentures (Asset Backed)
The _______ measures the profitability of total assets, without considering how the assets are financed.
rate earned on total assets
011101
35
011110
36
Accounts Payable Deferral Period
365/AP Turnover
Receivables Collection Period
365/AR Turnover
0100
4
110100
64
110111
67
world population
7.5 billion
111001
71
111010
72
111011
73
111100
74
population data: Europe
740M
Europe
743.1 million
111101
75
111110
76
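The bit-string cards in this stretch (011101 = 35, 110100 = 64, 111110 = 76, 0100 = 4) all match a binary-to-octal reading, where each group of three bits becomes one octal digit. That interpretation is an inference from the answers, and the helper name below is illustrative.

```python
def bits_to_octal(bits):
    """Read a bit string as base 2 and render it in base 8."""
    # Pad to one octal digit per full group of three bits, so "000100" gives "04".
    width = len(bits) // 3
    return format(int(bits, 2), "o").zfill(width)


for bits in ["011101", "110100", "111110", "0100"]:
    print(bits, "->", bits_to_octal(bits))
```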
population data: London
7M
Possible Values: θcall
<0 (usually)
Possible Values: θput
<0 (usually)
Possible Values: Γcall
>0
Mapreduce
process large data sets
Quick (Acid-Test) Ratio
(Cash + Marketable Sec. + Receivables)/CL
LIFO liquidation:
Way for a company to manipulate earnings
What does the business plan include?
product offering customer base budget & finance
What is equity?
(Capital) The residual interest in the assets of an entity after deducting all its liabilities. Equity = assets - liabilities. Also called: shareholders' equity, shareholders' funds, stockholders' equity, capital
Appropriateness of Financial Statement Presentation or Disclosure (≠ GAAP) Material misstatements related to the appropriateness of financial statement presentation or the appropriateness or adequacy of disclosures may arise when:
(1). The financial statements do not include all required disclosures
ROI
(Gain on Investment - Cost of Investment)/ Cost of Investment Typically shown as a percentage
Valuing Equity with P/B Ratio
(P0/BV Common Equity) * BV Common Equity
Valuing Equity with Price-to-Cash-Flow
(P0/Expected CF in One Year) * Expected CF in One Year
Valuing Equity with Price-to-Sales
(P0/Expected Sales in One Year) * Expected Sales in One Year
Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Adverse Opinion Paragraph. This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Adverse Opinion." This paragraph should include:(2)
(b). If disclosure of the financial effects is made in the notes to the financial statements, the basis for the modification paragraph can be shortened by referring to the disclosure.
Gearing (Leverage)
(debt to equity ratio) Mgmt seek optimum leverage Too high- increased risk of bankruptcy Too low- inefficient use of capital
Contribution Margin %
(rev-vc)/rev
return on investment (ROI)
(revenue - cost - investment) / capital invested
Gross profit margin
(sales revenue-cost of goods sold)/Sales revenue
Misstatements Related to Appropriateness of Accounting Policies
- Accounting policies are not in accordance with applicable financial reporting frameworks - FS are not fairly represented - Entity has not complied with the financial reporting framework requirements
Sharpe Ratio (φ)
(α-r)/σ(stock)=(γ-r)/σ(call)
Interpretive Publications
*Second level of audit guidance: middle authoritative -Recommendations regarding how SASs should be applied in specific situations. *Not considered auditing standards* -Auditor should consider the guidance in performing the audit and be able to explain any departures and how compliance with standards was achieved. -Examples: Auditing interpretations of GAAS, exhibits to GAAS, auditing guidance provided in AICPA Audit and Accounting Guides, and AICPA Auditing Statements of Position (SOP)
Auditing Sales Transactions
*C*ompleteness - Trace from Shipping Doc -> Invoice -> Sales Journal Cut*o*ff - Compare a sample of sales invoices from shortly before and after year-end with the shipment dates and with the date the sales were recorded in the sales journal *V*aluation, Allocation, and Accuracy - *E*xistence and Occurrence *U*nderstandability and Classification
Five Components of Internal Control
*C*ontrol Environment - tone at the top *R*isk Assessment - FS misstated, not efficient, breaking law *I*nformation and Communication - Fair, Accurate, Complete, Timely -> FACT *M*onitoring - effectiveness of controls and report deficiencies *E*xisting Control Activities - policies/procedures to mitigate risks
SAS nonissuers and PCAOB AS issuers
*First level of audit guidance*: most authoritative -Auditors should use professional judgment in applying SAS or PCAOB standards to a particular engagement, and be prepared to justify any departures from mandatory requirements -"Must" or "is required": unconditional requirement that must be followed -"Should": indicates presumptively mandatory requirement, followed in cases when relevant -"May", "might" or "could": indicates explanatory material that does not impose a requirement
Definition of Pervasive
*Very Material* -Auditor's professional judgement: effects are not confined to specific elements, accounts, or items; if confined, they represent a substantial proportion of the FS; or they are disclosures fundamental to users' understanding
Pricing strategies - supply & demand
*graph answer if possible... -What's the supply? How is demand? -How will pricing impact market equilibrium? -Matching competition: What are similar products selling for? -Are there substitutions?
In Spark-Shell which all contexts are available by default?
Ans: SparkContext and SQLContext
Increasing sales - choose strategy
- Increase volume. (Get more buyers, increase distribution channels, intensify marketing.) - Increase amount of each sale. (Get each buyer to spend more.) - Increase prices. - Create seasonal balance
Misstatements Related to the Application of Accounting Policies
- Management has not applied accounting policies in accordance with the applicable framework - Management has not applied policies consistently - There is an error in the application of an accounting policy
A Material Misstatement of the financial Statements may arise in relation to the following:
- The appropriateness of accounting policies - The application of accounting policies - The appropriateness of the financial statement presentation or the appropriateness or adequacy of disclosures in the financial statements
Mergers and Acquisitions - researching company & industry
- What kind of shape is the company in? -How secure are its markets and customers? -How is the industry doing overall? -And how is this company doing compared to the industry? - How will our competitors respond to this acquisition? -Are there any legal reasons why we can't, or shouldn't, acquire it?
The grand strategy matrix: Quadrant I
-Continued concentration on current markets (market penetration and development) and products is an appropriate strategy
Pricing strategies - choosing
-Cost-based pricing vs. price-based costing (i.e., do you decide pricing based on how much the product costs to produce or on how much people will pay?) - How much does it cost to make or deliver/provide? -What does the market expect to pay? - Is it a "must have" product? - Do we need to spend money to educate the consumer?
Investing activities reflected in statement of cash flows
-Dividends received -Interest received -Loans to/from associate /subsidiary companies -Proceeds from sales of assets and investments -Purchases of assets & investments
Analytics Methodology within a Framework
-Enablers are the essential components needed for the methodology to work. They include technology, infrastructure, tools, and techniques. -The benefits of analytics are vast and varied. Examples are value/profit, performance, safety, health and longevity of the system, and many others. -People are generally both the creators and the beneficiaries of analytics activities. User authorizations, internal controls, and training are required for the framework to work.
Increasing profit - costs
-Identify the major variable and fixed costs. -Have there been any major shifts in costs? (e.g., labor or raw material costs) -Do any of these costs seem out of line? -How can we reduce costs without damaging the revenue streams? -Benchmark costs against our competitors.
Why does Spark exist, even though Hadoop exists? (2)
-In Memory Processing: MapReduce uses disk storage for storing processed intermediate data and also reads from disk, which is not good for fast processing. Spark keeps data in memory (configurable), which saves a lot of time by not reading and writing data to disk as happens in Hadoop.
Mergers and Acquisitions - Goals & Objectives
-Increase market access - Diversify their holdings - Pre-empt the competition - Gain tax advantages -Incorporate synergies: marketing, financial, operations
The internal-external (IE) matrix
-Put the EFE on the Y axis and the IFE on the X axis -Quadrants are grouped -Three major regions: 1. Grow and build 2. Hold and maintain 3. Harvest or divest
Financial Accountants' responsibilities can include:
-Recording sales, expenses, bank transactions -Analysing the records -Providing a control function -Preparing information for external audit -Controlling cash - Liquidity -Responsibility for payroll
SWOT matrix
-SO: Strengths and opportunities strategies -WO: Weaknesses and opportunities strategies -ST: Strengths and threats -WT: Weaknesses and threats
Strategy-formulation analytical framework
-Stage 1: Input stage -Stage 2: The matching stage: Matches info to compatible strategies -Stage 3: The decision stage: Choose the best strategy
Turning around troubled co - industry
-Tell me about the company. -Why is it failing -Bad products, bad management, bad economy? - Tell me about the industry. - Are our competitors facing the same problems? - Do we have access to capital? - Is it a public or privately-held company?
What are the conditions in which sampling is appropriate?
-The analysts are certain that each data point is representative of the entire set -The source dataset is too large for the planned analysis -The application specifically calls for a data sample, as is the case with some accounting and regulatory compliance audits
The grand strategy matrix: Quadrant II
-Unable to compete effectively -Need to determine why the firm's current approach is ineffective and how the company can best change to improve its competitiveness
Increasing profit - revenue/price
-What are the revenue streams? (Where does the money come from?) -What percentage of the total revenue does each stream represent? -Does anything seem unusual in the balance of percentages? -Have those percentages changed lately? If so, why?
Pricing strategies - investigate product
-What's special or proprietary about our product? -Are there similar products out there, and how are they priced? -Where are we in the growth cycle of this industry? (Growth phase? Transition phase? Maturity phase?) -How big is the market? -What were our R&D costs?
Industry/Acquiring a diverse company - Investigate Industry
-Where is it in its lifecycle? (Emerging? Maturity? Decline?) -How has the industry been performing (growing or declining) over the last 1, 2, 5, and 10 years? -How have we been doing compared to the industry? -Who are the major players and what kind of market share does each have? -Who has the rest? -Has the industry seen any major changes lately, such as new players, new technology, or increased regulation? -What drives the industry? Brand, products, size, or technology?
Customer (cheng)
-Who is the customer -identify segments (segment size, growth rate, percentage of market) -compare current year metrics to historical (look for trends) - What does each customer segment want (identify key needs) - Price sensitivity of each segment - distribution channel preference of each segment - customer concentration and power
if product in decline
-define niche market -analyze competition -think exit strategy
promotion
-do we have a desirable brand image -what are the metrics for awareness, trial, and retention -marketing campaign to increase awareness -discounts initially to capture the market and promote trial -promotions to encourage trial and referrals -warranties and refunds to encourage trial and lower risk -have a trained salesforce to talk about benefits of product
potential reasons for M&A
-increase market access -diversify holdings -pre-empt competition -gain tax advantages -incorporate synergies -create shareholder value
growth strategies
-increase sales -increase distribution channels -increase product line -invest in major marketing campaign -diversify products and services -acquire competitors
if sales are flat and profits are taking a header
-investigate both revenue and costs -start with revenue
if profits declining because of drop in rev
-investigate marketing and distribution issues
if profits declining because of rising expenses
-investigate price drop and/or cost increase
external factors
-market -customers -industry -competitors -risk
Assessing Credit Quality *Lower Risk:*
-more revenue sources -more operational margins -stable/sustainable margins -higher FCF/Debt, FCF/Int.
ways to grow (other than volume)
-new customers -new products -new geographies -new channels -backward/forward integration -M&A -new tech -new capabilities
decline in sales problem...
-overall declining market demand -the current marketplace is mature or product is obsolete -loss of market share due to substitution
Financing activities reflected in statement of cash flows
-proceeds from issue of ordinary shares, or from borrowings -Repayments of borrowings -Dividends paid to equity shareholders or minority interests
Applications of Analytics
-retail- used to assist in pricing strategies -marketing- used in predicting customers behavior -supply chain- selecting suppliers and optimizing distribution costs -customer service- Customized service is based on analysis of prior work orders -financial investors- looking at companies that are great investments or risks
1/15
.0667
1/3
.333
10 x 100
000,1,000,000
Space
00100000
Period
00101110
000001
01
a
01100001
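The binary cards above (Space, Period, a) are 8-bit ASCII codes. A minimal Python sketch for checking them; the helper name `to_bits` is an illustration and not part of the original deck.

```python
def to_bits(ch: str) -> str:
    """Return the 8-bit binary string for a single ASCII character."""
    return format(ord(ch), "08b")


print(to_bits(" "))  # 00100000
print(to_bits("."))  # 00101110
print(to_bits("a"))  # 01100001
```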
M
1,000,000 = 10^6
B
1,000,000,000 = 10^9
3 Main Questions of Financial Statements
1. How much is the company worth? 2. How much profit did the company make? 3. What cash movement took place?
Starting a New Business Steps
1. Initial Questions 2. Management 3. Market and Strategic Plans 4. Distribution Channels 5. Products and Services 6. Customers 7. Finance
what are the ERP business Areas?
1. Inventory control for reservation of the goods for the customer 2. Procurement (purchasing) for replacement of the sold goods 3. Warehouse management for picking and packing the goods for delivery 4. Shipping for the goods to be shipped 5. Billing so that the sale can be invoiced 6. Human Resources for calculation of commissions 7. Accounting to be recorded in the general ledger 8. Budgeting for comparison of actual results to forecasted sales
Pricing Case
1. Investigate the product 2. Choose a pricing strategy 3. Supply and demand (graph) -3 main ways to price a product -Competitive analysis: are there similar products out there, how does ours compare to the competition, do we know their costs, how are they priced, any substitutions -Cost-based pricing: take all our costs, add them up, and add profit to it -> break-even point -Price-based costing: what is the willingness to pay
What are the key concepts introduced by the Spark ML APIs?
1. ML Dataset 2. Transformer 3. Estimator 4. Pipeline 5. Param
Sales - Segregation of the Functions
1. Preparation of the Sales Order - Sales department receives a customer PO and prepares a prenumbered sales order 2. Credit Approval - Credit department approves the sales order and sends a copy of the approved sales order to the shipping, billing, and accounting departments 3. Shipment - Shipping department prepares a prenumbered bill of lading and sends a copy to the customer. They then ship the goods 4. Billing - Billing department prepares a prenumbered sales invoice. They then compare shipping documents, sales orders, and invoices. Invoices are then sent to the customer and to the AR department. 5. Accounting - Sale is entered into the sales journal and a receivable is recorded
Entering a New Market Steps
1. Questions about the Company 2. Determine Current and Future State of the Market 3. Investigate the Market to determine if it makes good business sense 4. If we decide to enter, what way should we choose to do so?
AR - Segregation of the Functions
1. Sales - A receivable is recorded in the AR control account and in the GL. An independent person should periodically reconcile the two. 2. Collection of Cash Receipts - When payment is received, the receivable is eliminated. 3. Uncollectible Receivables - An aging schedule is prepared and sent to the credit department. * in order to write-off a receivable, the treasurer must authorize 4. Sales Returns - Prenumbered receiving report may be used as a sales return slip. Once approved, the return is recorded and the related outstanding receivable is eliminated - Credit memos should *NOT* be prepared by those who collect/receive cash payments on AR 5. Sales Discounts - Sales discount procedures should be reviewed to make sure that discounts are recorded properly
What are the three types of data anomalies?
1. Update anomalies occur when the same data are stored in multiple places, e.g. a customer's address is stored in both the customer billing and shipping tables. 2. Insert anomalies result when there is no place within the table to store new data until another event occurs, e.g. creating customer info when the only place to store the customer name is the sales transaction table. 3. Delete anomalies occur when deleting some data results in the unintentional deletion of other data, e.g. deleting a customer's sales transaction also removes part of our sales history.
how to calculate brand equity
1. ask the brand equity question about paying a premium for a branded product 2. subtract all the tangible assets of a firm from the market valuation of the firm based on stock price
reducing costs: cash flow problem
1. breakdown of costs 2. is anything out of line? why? 3. benchmark the competitors 4. determine whether there are any labor-saving technologies that would help reduce costs
placement factors
1. channels: intensive v. selective 2. inventory -push v. pull -stock v. just in time -carrying cost 3. transportation -cost: inhouse v. outsource
five Cs
1. company 2. costs 3. competition 4. consumers/clients 5. channels
price factors
1. product: commodity vs. highly differentiated 2. competitor pricing 3. strategy -goal: penetrate, retain, convert, loss leader, etc. -positioning & perception -cost plus margin 4. customer -segmentation: differences in willingness to pay -elasticity
increasing sales
1. relationship between increasing sales and increasing profits -how are we growing relative to industry? -what has our market share done lately? -do we know what customers want? -are prices in line with our competitors? -what have comps done in marketing & product development? 2. ways to increase sales -increase volume: more buyers, increase distribution channels, intensify marketing -increase amount of each sale (get each buyer to spend more) -increase prices -create seasonal balance
steps
1. summarize question 2. verify objectives 3. ask clarifying questions -- company, industry, competition, product 4. lay out structure
industry analysis
1. things to investigate -life cycle -performance over last 1, 2, 5, 10 years -our performance compared to industry -major players/market share -recent major changes in industry -what drives industry? brand, products, size, technology -profitability -- margins 2. suppliers -how many? -product availability -what's going on in their market? 3. future -players entering or leaving market -recent M&A? -barriers to entry and exit? -substitutes?
What are the new Spark DataFrame and the Spark Pipeline?
A Spark DataFrame is a table where columns are explicitly associated with names.
Bullish Spread
A bullish spread increases in value as the stock price increases, whereas a bearish spread increases in value as the stock price decreases.
How would you implement a recommendation system for our company's users?
A lot of machine learning interview questions of this type will involve implementation of machine learning models to a company's problems. You'll have to research the company and its industry in-depth, especially the revenue drivers the company has, and the types of users the company takes on in the context of the industry it's in.
Audits of Single FS and Specific Elements, Accounts, or Items of FS
An audit of a single FS or of specific elements, accounts, or items of FS may be performed as a separate engagement or in conjunction with an audit of an entity's complete set of FS. *Single FS* -BS, Statement of income, RE, CF, Changes in Owner's Equity, Statement of operations by product line. *Specific Elements, Accounts, or Items* -AR -Allowance for doubtful accounts receivable -Inventory -Intangible assets -Schedule of disbursements regarding lease property -Schedule of profit participation or employee bonuses
What are specific ways of determining if you have a local optimum problem?
Determining if you have a local optimum problem: Tendency of premature convergence Different initialization induces different optima
Param
All Transformers and Estimators now share a common API for specifying parameters
continuous uniform distribution
All simulation software packages use the continuous uniform distribution, generating random numbers uniformly distributed between 0 and 1
What is Shuffling?
Shuffling is the process of repartitioning (redistributing) data across partitions; it may move data across JVMs or even over the network when data is redistributed among executors. Avoid shuffling wherever possible: think about ways to leverage existing partitions, and use partial aggregation to reduce data transfer.
What is the difference between groupByKey and reduceByKey?
groupByKey shuffles all the data, which is slow. reduceByKey shuffles only the results of the sub-aggregations computed in each partition. Prefer reduceByKey or combineByKey over groupByKey.
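The difference in shuffle volume can be illustrated without a cluster. The sketch below is a plain-Python simulation, not the Spark API: the two partitions, the function names, and the "records shuffled" count are all hypothetical stand-ins for what Spark does internally.

```python
from collections import defaultdict

# Two hypothetical partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def group_by_key_style(parts):
    """Ship every record across the 'network', then sum per key."""
    shuffled = [pair for part in parts for pair in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def reduce_by_key_style(parts):
    """Sum per key inside each partition first, then ship only the partial sums."""
    shuffled = []
    for part in parts:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


print(group_by_key_style(partitions))   # ({'a': 3, 'b': 3}, 6)
print(reduce_by_key_style(partitions))  # ({'a': 3, 'b': 3}, 4)
```

Both approaches give the same totals, but the second moves four records instead of six; the gap widens as keys repeat more often within a partition.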
What is the transformation?
A transformation is a lazy operation on an RDD that returns another RDD, such as map, flatMap, filter, reduceByKey, join, cogroup, etc. Transformations are not executed immediately, but only once an action is invoked.
total return
stock price appreciation plus dividend payments
Which kinds of data processing are supported by Spark?
Spark offers three kinds of data processing: batch, interactive (Spark Shell), and stream processing, with a unified API and data structures.
Pricing Strategies Competitive Analysis
Are there similar products out there? How does our product compare to the competition? Do we know the competitors' costs? How are they priced? Are there substitutions available? Is there a supply-and-demand issue? What will the competitive response be?
Can you cite some examples where a false negative is more important than a false positive?
Assume an airport 'A' has received high-security threats and uses certain characteristics to identify whether a particular passenger may be a threat. Due to a shortage of staff, it decides to scan only passengers that its predictive model flags as risk positives. What happens if a true threat is flagged as a non-threat by the airport's model? Another example is the judicial system: what if a jury or judge lets a criminal go free? Or what if you declined to marry a very good person based on your predictive model, then met him/her a few years later and realized that you had a false negative?
An investor who anticipates the need to exit a pay-fixed interest rate swap prior to expiration might: A) buy a payer swaption. B) buy a receiver swaption. C) sell a payer swaption.
B) buy a receiver swaption. A receiver swaption will, if exercised, provide a fixed payment to offset the investor's fixed obligation, and allow him to pay floating rates if they decrease.
In order to compute the implied asset price volatility for a particular option, an investor: A) must have a series of asset prices. B) must have the market price of the option. C) does not need to know the risk-free rate.
B) must have the market price of the option. In order to compute the implied volatility we need the risk-free rate, the current asset price, the time to expiration, the exercise price, and the market price of the option.
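A minimal sketch of how those five inputs produce an implied volatility. It assumes a European call on a non-dividend-paying asset priced with Black-Scholes and solves for volatility by bisection; the function names, the search bounds, and the sample inputs are illustrative assumptions, not something the card specifies.

```python
from math import exp, log, sqrt
from statistics import NormalDist

_norm_cdf = NormalDist().cdf


def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call with no dividends (assumed model)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * _norm_cdf(d1) - K * exp(-r * T) * _norm_cdf(d2)


def implied_vol(market_price, S, K, T, r, lo=1e-6, hi=5.0, tol=1e-8):
    """Find sigma such that bs_call(...) matches market_price.

    Bisection works because the call price increases with sigma.
    The bounds lo/hi are arbitrary illustrative choices.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if bs_call(S, K, T, r, mid) > market_price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


# Round trip: price an option at 20% volatility, then recover that volatility.
price = bs_call(S=100, K=100, T=1.0, r=0.05, sigma=0.20)
print(round(implied_vol(price, S=100, K=100, T=1.0, r=0.05), 4))  # 0.2
```

Note that no series of historical asset prices appears anywhere in the calculation, which is why answer A is wrong.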
What's the trade-off between bias and variance?
Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you're using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set. Variance is error due to too much complexity in the learning algorithm you're using. This makes the algorithm highly sensitive to variation in your training data, which can lead your model to overfit the data. You'll be carrying too much noise from your training data for your model to be very useful for your test data. The bias-variance decomposition breaks down the learning error of any algorithm into the sum of the bias, the variance, and a bit of irreducible error due to noise in the underlying dataset. Essentially, if you make the model more complex and add more variables, you'll lose bias but gain some variance; to get the optimally reduced amount of error, you'll have to trade off bias and variance. You don't want either high bias or high variance in your model.
Call Profit
C(S(h),K,T-h)-C(S(0),K,T)e^(rh)
How is market backwardation related to an asset's convenience yield? If the convenience yield is: A) positive, causing the futures price to be below the spot price and the market is in backwardation. B) negative, causing the futures price to be below the spot price and the market is in backwardation. C) larger than the borrowing rate, causing the futures price to be below the spot price and the market is in backwardation.
C) larger than the borrowing rate, causing the futures price to be below the spot price and the market is in backwardation. When the convenience yield is more than the borrowing rate, the no-arbitrage cost-of-carry model will not apply. It means that the convenience of holding the asset is worth more than the cost of funds to purchase it. This usually applies to nonfinancial futures contracts.
Accounts Payable Turnover
COGS/Average AP
What attributes does useful information have? (6)
Clarity Consistency Relevance Accuracy Reliability Timeliness
If Profits are declining because of a drop in revenues...
Concentrate on Marketing and Distribution Issues
Porter's generic strategies
Cost, Differentiation, Focus
What is covariance?
Covariance is a measure of how much two random variables change together.
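A small worked example of the definition above, assuming the sample (n - 1) form of covariance; the function name and the data are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance: average product of paired deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


xs = [1, 2, 3, 4]
print(sample_covariance(xs, [2, 4, 6, 8]))  # positive: the variables rise together
print(sample_covariance(xs, [8, 6, 4, 2]))  # negative: one falls as the other rises
```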
The Value Chain Service
Customer support and retention
Retention Rate
Customers kept at the end of a period / Total customers available at the beginning of the period OR 1 - Churn Rate
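The two forms of the formula above agree, as this short sketch shows. The function names and the customer counts are hypothetical.

```python
def retention_rate(customers_at_start, customers_kept):
    """Customers kept at period end divided by customers at period start."""
    return customers_kept / customers_at_start


def churn_rate(customers_at_start, customers_lost):
    """Customers lost during the period divided by customers at period start."""
    return customers_lost / customers_at_start


start, kept = 200, 170
lost = start - kept
print(retention_rate(start, kept))    # 0.85
print(1 - churn_rate(start, lost))    # same value, up to floating-point rounding
```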
K-fold cross validation
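The data is split into k folds. The model is trained k times, each time holding out a different fold as the validation set and training on the remaining k-1 folds. The validation errors are averaged, and the model or parameter combination that minimizes this error is selected.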
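A minimal sketch of the k-fold procedure in plain Python. The contiguous (unshuffled) fold layout, the function names, and the "predict the training mean" baseline model are illustrative assumptions chosen to keep the example short.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size


def cross_validated_mse(ys, k):
    """Average validation MSE of a baseline that always predicts the training mean."""
    fold_errors = []
    for train, validation in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)
        mse = sum((ys[i] - prediction) ** 2 for i in validation) / len(validation)
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)


print(list(k_fold_indices(5, 2)))
# [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
print(cross_validated_mse([3.0, 5.0, 4.0, 6.0, 5.0, 7.0], k=3))
```

To compare candidate models, the same folds would be reused for each candidate and the one with the lowest averaged error kept.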
What do you understand by the term Normal Distribution?
Data can be distributed in many ways: skewed to the left, skewed to the right, or jumbled with no clear pattern. When data is distributed around a central value with no bias to the left or right, it may follow a normal distribution, in which the random variable is distributed as a symmetrical bell-shaped curve.
Data Science
Data science involves the use of computers to acquire knowledge by analyzing large amounts of data using models and domain expertise.
data Staging area
Data source --ETL--> Data staging --ETL--> Data target
Databases
Databases are organized collections of data that enable users to access, manage, and update the data.
What is an example of a dataset with a non-Gaussian distribution?
Days to receive payment from the time invoice is sent
What is the Default level of parallelism in Spark?
Default level of parallelism is the number of partitions when not specified explicitly by a user.
Cause of Non-Recombining Trees
Discrete Dividends (e.g., (Se^(rh)-D)e^(σ√h))
Dividends per Share = ? / ?
Dividends / Shares of Common Stock Outstanding
_______ can be reported with earnings per share to indicate the relationship between dividends and earnings.
Dividends per share
Advantages of Debt Financing
Do not need to give up share of company. Interest payments made before tax
What is a current liability?
An obligation due from the entity within one year, e.g. trade payables, accruals, overdraft
Gross Profit Margin
Gross profit / Revenue
Reducing Cost Cost Analysis - external elements
Economy, interest rates government regulation transportation/ shipping strikes
Sustain competitive advantage through
Efficiency, quality, innovation, customer responsiveness
What is an Eigenvalue and Eigenvector?
Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors for a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing or stretching. Eigenvalue can be referred to as the strength of the transformation in the direction of eigenvector or the factor by which the compression occurs.
Who and what are the enablers and benefits respectively?
Enablers: technology, infrastructure, tools, techniques. Benefits: value/profit, performance, safety, health/longevity
Sample Report: Qualified Opinion Due to a Material Misstatement of the Financial Statements Issuer (Public Company). In our opinion
Except For the effects of not capitalizing certain lease obligations as discussed in the preceding paragraph, the financial statements referred to above Present Fairly, in all material respects, the financial position of X Company as of December 31, 2002 and 2001, and the results of its operations and its cash flows for the years then ended in conformity with accounting principles generally accepted in the United States of America.
Topics in a Business Plan
Executive summary Market analysis and marketing strategy Financial plan (revenue streams and cost structure) Strategic Rationale (unique value proposition) Environmental Analysis and SWOT Analysis
Increasing Profits Volume Elements
Expand into new areas Increase sales(Volume and Force) Increase Marketing Reduce Prices Improve customer service
Industry Analysis Future
Expanding or Shrinking? Mergers and Acquisition? Barriers to entry or Exit?
New Product Market Strategy Elements
Expansion of Customer Base Prompts to competitive response Barriers to entry Major Players and Market Share
What is a key component of planning?
Extensive market research
What is Debt Factoring?
External company takes over managing debtors Advances payment Non-Recourse -The factoring company accepts risk of default.
Regulation (F/M)
F- Companies act, accounting standards, stock exchange M-none
Reporting Focus (F/M)
F- Historical M-past, present & future
Primary user (F/M)
F- external/public M-Internal/private
Type of Information (F/M)
F-Aggregate/summarised M-Detailed/specific
Financial Reports (F/M)
F-structured: balance sheet, income statement, cash flow statement M-ad hoc, combines financial and non-financial indicators
BE volume
FC/(Rev-VC)
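A worked example of the break-even formula, reading Rev and VC as per-unit price and per-unit variable cost so that Rev - VC is the contribution margin per unit. The function names and the figures are hypothetical.

```python
def contribution_margin_per_unit(price, variable_cost):
    """Revenue per unit minus variable cost per unit."""
    return price - variable_cost


def break_even_volume(fixed_costs, price, variable_cost):
    """Units that must be sold for contribution margin to cover fixed costs."""
    margin = contribution_margin_per_unit(price, variable_cost)
    if margin <= 0:
        raise ValueError("price must exceed variable cost to break even")
    return fixed_costs / margin


# Fixed costs of 50,000; each unit sells for 25 and costs 15 to make.
print(break_even_volume(fixed_costs=50_000, price=25, variable_cost=15))  # 5000.0
```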
Materiality of Problem: Material and pervasive
Financial Statements Are Materially Misstated (Financial Statement Issues):Adverse opinion. Inability to Obtain Sufficient Appropriate Audit Evidence (Audit Issues): Disclaimer of opinion.
what are the forms of normalization
First normal form: 1NF Second normal form: 2NF Third normal form: 3NF Boyce-Codd normal form: BCNF
Costs
Fixed Costs + Variable Costs
CAGE Framework
For analyzing global markets: Culture Administrative and political Geographic Economic / Wealth Distance from us
Central Limit Theorem (in Math)
For any population distribution with mean μ and variance σ²: •The distribution of the sample mean (x bar) is approximately normal with mean μ and variance σ²/n •This approximation improves as n increases
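The theorem can be checked by simulation. The sketch below assumes a Uniform(0, 1) population (mean 1/2, variance 1/12), which is clearly not normal, and compares the simulated mean and variance of the sample mean with μ and σ²/n. The seed, sample size, and trial count are arbitrary choices.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(n, trials=5000):
    """Means of `trials` independent samples of size n drawn from Uniform(0, 1)."""
    return [mean(random.random() for _ in range(n)) for _ in range(trials)]


n = 30
means = sample_means(n)
print("mean of sample means:", mean(means), "theory:", 0.5)
print("variance of sample means:", pvariance(means), "theory:", (1 / 12) / n)
```

A histogram of `means` would look bell-shaped even though the underlying population is flat.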
Practicable means that the information is reasonably obtainable from management's accounts and records and that providing the information in the auditor's report does not require the auditor to assume the position of preparer of financial information.
For example, the auditor is not expected to prepare a basic financial statement, such as an omitted statement of cash flows, or segment information and include it in the auditor's report when management omits such information.
Binomial Variables
Formally, a binomial distribution has parameters n and p, where there are n independent trials, and each trial has probability p of success • EV=Mean=np • Variance = np(1-p)
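The mean and variance formulas can be verified directly from the probability mass function. This is a small check with hypothetical values n = 10 and p = 0.3; the function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_moments(n, p):
    """Mean and variance computed by summing over the whole distribution."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, variance


mean, variance = binomial_moments(10, 0.3)
print(mean, 10 * 0.3)            # both approximately 3.0
print(variance, 10 * 0.3 * 0.7)  # both approximately 2.1
```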
GAAS
Generally Accepted Auditing Standards
What is Genetic Programming?
Genetic programming is an evolutionary technique used in machine learning. The model is based on testing candidate solutions and selecting the best among a set of results.
Reducing Cost Assessment Elements
Get cost breakdown Investigate for irregularities Benchmark competitors Consider Labor saving technologies
The ratio of ______ to _____ is a profitability measure that shows how effectively a company utilizes its assets.
Net sales to assets
What are PCA, KPCA and ICA used for?
PCA (Principal Component Analysis), KPCA (Kernel-based Principal Component Analysis) and ICA (Independent Component Analysis) are important feature extraction techniques used for dimensionality reduction.
Four P's Price
How does our price compare to the competition's? How was our price determined? Are we priced right? If we changed our price, what would that do to our sales volume?
Starting a New Business Steps 2. Management
How experienced is the team? What are its core competencies? Have they worked together before? Is there an advisory board?
Explain what significance means
If a statistical test returns significant, then the effect is unlikely to be from random chance alone
What is dimension reduction in Machine Learning?
In machine learning and statistics, dimension reduction is the process of reducing the number of random variables under consideration. It can be divided into feature selection and feature extraction.
What is unsupervised learning?
In unsupervised learning, the computer searches for patterns in the data without any examples.
The Value Chain Raw Materials
Inbound logistics are included here: receiving raw materials into the warehouse, relationships with suppliers, just-in-time delivery
Early Exercise on a Non-Dividend Paying Stock
Never exercise an American call early! (i.e., C_Amer = C_Eur)
Growth Strategies Strategy elements
Increase Distribution Channels Increase product line Invest in major marketing campaign Diversify products or services offered
What is logistic regression? Or State an example when you have used logistic regression recently.
Logistic regression, often referred to as the logit model, is a technique to predict a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win an election or not. In this case, the outcome of the prediction is binary, i.e. 0 or 1 (Lose/Win). The predictor variables here would be the amount of money spent on a particular candidate's election campaign, the amount of time spent campaigning, etc.
Type 4:
Low cost focus strategy that offers products or services to a niche group of customers at the lowest price available on the market
Question marks I
Low market share, high growth rate -Must decide to strengthen by pursuing an intensive strategy or to sell them
Type 1
Low-cost strategy that offers products or services to a wide range of customers at the lowest price available on the market
Strategy/Performance *Niche market*
Lower volumes, higher margins -higher marketing/R&D costs
What is algorithm independent machine learning?
Machine learning whose mathematical foundations are independent of any particular classifier or learning algorithm is referred to as algorithm independent machine learning.
Where do you usually source datasets?
Machine learning interview questions like these try to get at the heart of your machine learning interest. Somebody who is truly passionate about machine learning will have gone off and done side projects on their own, and have a good idea of what great datasets are out there. If you're missing any, check out Quandl for economic and financial data, and Kaggle's Datasets collection for another great list.
What is Machine learning?
Machine learning is a branch of computer science that deals with programming systems so that they automatically learn and improve with experience. For example, robots are programmed so that they can perform a task based on data they gather from sensors. The system automatically learns programs from data.
_______________ is required in annual reports filed with the SEC. It contains management's analysis of current operations and its plans for the future. Typical items included are: Management's analysis and explanations of any significant changes between the current and prior year's financial statements. Important accounting principles or policies that could affect interpretation of the financial statements. Management's assessment of the company's liquidity and the availability of capital to the company. Significant risk exposures that might affect the company. Any "off-balance-sheet" arrangements such as leases not included directly in the financial statements.
Management's Discussion and Analysis (MD&A)
Types of Assets
Non-Current/Fixed Assets Current Assets Intangible Assets
What are the basic assumptions to be made for linear regression?
Normality of error distribution, statistical independence of errors, linearity and additivity.
Sufficient Appropriate Audit Evidence and Risk
To obtain reasonable assurance, the auditor obtains sufficient appropriate audit evidence to reduce audit risk to an acceptably low level. *Weak internal control DOES NOT equal adverse opinion*
In-the-Money
Option would have positive payout if it could be exercised.
For K₁>K₂
P(S,K₁,T)-P(S,K₂,T) ≤ K₁-K₂
Exchange Option Equality
P(S,Q,T)=C(Q,S,T)
Price-to-Cash-Flow Ratio
P0/Expected CF in One Year
Price-to-Sales Ratio
P0/Expected Sales in One Year
Barriers to Entry may include (8):
Patents Key partnerships Key customer relationships Expert management team Superior product, functionality Time to Market Cost advantages Switching Costs
What is statistical power?
Probability that the test correctly rejects the null hypothesis when the alternate hypothesis is true
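As a concrete case, power has a closed form for a one-sided z-test with known standard deviation. The sketch below assumes that test; the function name and the example effect size, sigma, and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 against mu = mu0 + effect.

    Assumes known sigma and effect >= 0. The test rejects when the z statistic
    exceeds the (1 - alpha) quantile; power is the chance of that under the alternative.
    """
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha)
    shift = effect * sqrt(n) / sigma
    return 1 - standard_normal.cdf(critical - shift)


print(z_test_power(effect=0.5, sigma=1.0, n=10))
print(z_test_power(effect=0.5, sigma=1.0, n=25))  # larger sample, higher power
print(z_test_power(effect=0.0, sigma=1.0, n=25))  # no true effect: equals alpha
```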
How is a decision tree pruned?
Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning and cost complexity pruning. Reduced error pruning is perhaps the simplest version: replace each node with its most popular class. If that doesn't decrease predictive accuracy, keep it pruned. While simple, this heuristic actually comes pretty close to an approach that would optimize for maximum accuracy.
Liabilities
Put on balance sheet in increasing order of maturity 1. Current vs long term liabs 2. Accounts payable 3. Notes payable 4. Mortgage payable 5. Lease liabilities 6. Pension liabilities 7. Liabs related to other postretirement benefits 8. Accrued liabilities 9. Unearned revenues 10. Income tax liabilities 11. Short-term debt 12. Current maturities of long-term debt
relative strength index
RSI = 100 - 100 / (1 + RS), where RS = average gain / average loss over the lookback period. Compares recent gains to recent losses to indicate whether an asset is overbought or oversold.
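A minimal sketch of the RSI calculation, assuming the common definition in which the ratio inside the formula is the average gain divided by the average loss over the lookback window. It uses simple averages over the whole price list rather than the smoothed averages often used in practice; the function name and prices are illustrative.

```python
def rsi(closes):
    """Relative strength index over a list of closing prices (simple averages)."""
    if len(closes) < 2:
        raise ValueError("need at least two closing prices")
    changes = [later - earlier for earlier, later in zip(closes, closes[1:])]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_loss == 0:
        return 100.0  # no losing periods in the window
    relative_strength = avg_gain / avg_loss
    return 100 - 100 / (1 + relative_strength)


# Changes are +1, -0.5, +1, +0.5: gains total 2.5, losses total 0.5, so RS = 5.
print(round(rsi([10, 11, 10.5, 11.5, 12]), 2))  # 83.33
```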
Measures of Variability
Range Interquartile Range Average Absolute Deviation Variance Standard Deviation
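All five measures can be computed with Python's standard library. The sketch below uses population formulas for variance and standard deviation and the default quartile method of `statistics.quantiles`; the function name and data set are made up.

```python
from statistics import mean, pstdev, pvariance, quantiles


def variability_summary(data):
    """Return the five spread measures for a list of numbers."""
    center = mean(data)
    q1, _, q3 = quantiles(data, n=4)  # the three quartile cut points
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "avg_abs_dev": sum(abs(x - center) for x in data) / len(data),
        "variance": pvariance(data),
        "std_dev": pstdev(data),
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in variability_summary(data).items():
    print(name, value)
```

For this data the range is 7, the variance is 4, and the standard deviation is 2; the interquartile range depends on which quartile convention is used.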
The Value Chain
Raw Materials Operations Delivery Marketing and Sales Service
Retrenchment
Regrouping through cost and asset reduction to reverse declining sales and profit -Also called a turnaround or re-organizational strategy -Designed to fortify a firm's basic distinctive competence
RDD
Resilient: fault-tolerant, and so able to recompute missing or damaged partitions on node failures with the help of the RDD lineage graph. Distributed: across clusters. Dataset: a collection of partitioned data.
Yarn Components
ResourceManager: runs as a master daemon and manages ApplicationMasters and NodeManagers. ApplicationMaster: is a lightweight process that coordinates the execution of tasks of an application and asks the ResourceManager for resource containers for tasks. It monitors tasks, restarts failed ones, etc. It can run any type of task, be they MapReduce tasks, Giraph tasks, or Spark tasks. NodeManager: offers resources (memory and CPU) as resource containers. NameNode Container: can run tasks, including ApplicationMasters.
What are internal sources of finance?
Retained earnings -profits made in previous years Benefits gained from more effective mgmt of its working capital (these sources are unleveraged, not optimum in the long run)
Contribution Margin
Rev-VC
Nonissuer Report (qualified opinion)
Same as unmodified except add a basis for qualified opinion paragraph that describes the issues and explains how disclosures are misstated. In opinion paragraph, the language includes "except for" and "presented fairly".
Issuer Report (qualified opinion)
Same as unqualified except middle paragraph(s) are added in that explain all of the substantive reasons that lead to the auditor's conclusion and disclosure of the principal effects on the company's financial statements, if practicable. This paragraph goes before the opinion paragraph. The opinion paragraph has familiar language with the "except for" and "presented fairly" wording.
Divestiture
Selling a division or part of an organization -Used to raise capital for further acquisitions or investments
What do you understand by statistical power of sensitivity and how do you calculate it?
Sensitivity is commonly used to validate the accuracy of a classifier (Logistic, SVM, RF etc.). Sensitivity is nothing but "Predicted TRUE events / Total events". True events here are the events which were true and which the model also predicted as true. Calculation of sensitivity is pretty straightforward: Sensitivity = True Positives / Positives in Actual Dependent Variable, where True Positives are positive events which are correctly classified as positive.
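The ratio above can be sketched as follows, assuming labels are coded 1 for a positive event and 0 otherwise. The function name `sensitivity` and the sample labels in the usage note are made up for illustration.

```python
def sensitivity(actual, predicted):
    """True positives divided by all actual positives (labels are 0 or 1)."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    actual_positives = sum(1 for a in actual if a == 1)
    if actual_positives == 0:
        raise ValueError("no positive events in the actual labels")
    return true_positives / actual_positives
```

For example, with actual labels `[1, 1, 1, 0, 0]` and predictions `[1, 0, 1, 1, 0]`, two of the three actual positives are caught, so the sensitivity is 2/3.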
data from sensors
Sensor data are the data gathered from devices such as heating units, vehicles, electrical transformers, airplanes, health monitors
Cluster Sampling
Separate the population into clusters, and take a random sample of the clusters selected, and then survey everyone in the selected clusters • Easier since it reduces the number of sampling locations • May not be a representative sample
Economist guide to visualizing data
Show data Reduced Clutter Integrate text with graph
Assets
Shown on the balance sheet in declining order of liquidity 1. Current assets 2. Long-term assets
What is batch statistical learning?
Statistical learning techniques allow learning a function or predictor from a set of observed data that can make predictions about unseen or future data. These techniques provide guarantees on the performance of the learned predictor on the future unseen data based on a statistical assumption on the data generating process.
what are the differences and similarities between structured and unstructured data?
Structured: organised; fixed cell widths; easily scanned and examined; structured text; understandable by computers; values are prescribed. Unstructured: unorganised; varying length; difficult to scan; unstructured text; needs to be translated; values not typical.
What is the difference between supervised learning and unsupervised learning? Give concrete examples
Supervised learning: inferring a function from labeled training data. Predictor measurements are associated with a response measurement, and we wish to fit a model that relates the two, either to better understand the relation between them (inference) or to accurately predict the response for future observations (prediction). Methods: support vector machines, neural networks, linear regression, logistic regression, extreme gradient boosting. Examples: churn prediction, predicting the price of a house based on its area and size, predicting the relevance of search engine results. Unsupervised learning: inferring a function to describe the hidden structure of unlabeled data. We lack a response variable that can supervise our analysis. Methods: clustering, principal component analysis, singular value decomposition. Examples: finding customer segments, image segmentation, grouping US senators by their voting.
What's the "kernel trick" and how is it useful?
The kernel trick involves kernel functions that let an algorithm operate in a higher-dimensional space without explicitly calculating the coordinates of points in that space: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space. This gives the very useful property of working with the higher-dimensional representation while being computationally cheaper than the explicit calculation of its coordinates. Many algorithms can be expressed in terms of inner products. Using the kernel trick enables us to effectively run algorithms in a high-dimensional space with lower-dimensional data.
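As an illustration of the idea, here is a small sketch using the degree-2 polynomial kernel k(x, y) = (x . y)^2 on 2-D points, which is an example chosen here, not one named in the card. The explicit feature map `phi` sends a point to 3-D, and the kernel gives the same inner product without ever building those 3-D coordinates. All function names are hypothetical.

```python
import math

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, y):
    """Same inner product as dot(phi(x), phi(y)), computed in the original 2-D space."""
    return dot(x, y) ** 2
```

For x = (1, 2) and y = (3, 4), both routes give 121, but `poly_kernel` only needs one 2-D dot product and a square.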
What are your favorite use cases of machine learning models?
The Quora thread above contains some examples, such as decision trees that categorize people into different tiers of intelligence based on IQ scores. Make sure that you have a few examples in mind and describe what resonated with you. It's important that you demonstrate an interest in how machine learning is implemented.
What percentile does the mean represent?
The answer cannot be determined without further information. The mean's location depends upon the distribution of the data set.
Why do we need/want the bias term?
The answer is that bias values allow a neural network to output a value of zero even when the input is near one. Adding a bias permits the output of the activation function to be shifted to the left or right on the x-axis. Consider a simple neural network where a single input neuron I1 is directly connected to an output neuron O1. Bias is a vital concept for neural networks. Bias neurons are added to every non-output layer of the neural network. They differ from ordinary neurons in two very significant ways. Firstly, the output from a bias neuron is always one. Secondly, a bias neuron has no inbound connections. The constant value of one lets the layer respond with non-zero values even when the input to the layer is zero. This may be crucial for certain data sets.
Python or R - Which one would you prefer for text analytics?
The best possible answer for this would be Python because it has Pandas library that provides easy to use data structures and high performance data analysis tools.
Revenue realization principle
The company recognizes revenues on the accrual basis of accounting -Related to the income stmt
Sample Report:Qualified Opinion Due to Inadequate Disclosure Nonissuer. Independent Auditor's Report. Basis for Qualified Opinion:
The company's financial statements do not disclose [describe the nature of the omitted information that is not practicable to present in the auditor's report]. In our opinion, disclosure of this information is required by accounting principles generally accepted in the United States of America.
What are the different Algorithm techniques in Machine Learning?
The different types of techniques in Machine Learning are a) Supervised Learning b) Unsupervised Learning c) Semi-supervised Learning d) Reinforcement Learning e) Transduction f) Learning to Learn
enterprise data warehouse
The enterprise data warehouse (EDW) layer refers to the layers in which data are acquired, transformed, and stored for the long term in full granularity so the data either expand or shrink.
Standard error:
The estimated standard deviation of a sampling distribution
What is the general principle of an ensemble method and what is bagging and boosting in ensemble method?
The general principle of an ensemble method is to combine the predictions of several models built with a given learning algorithm in order to improve robustness over a single model. Bagging is an ensemble method for improving unstable estimation or classification schemes. Boosting methods, by contrast, are applied sequentially to reduce the bias of the combined model. Boosting and bagging can both reduce errors by reducing the variance term.
When an auditor issues a Qualified Opinion how is the opinion paragraph modified?
The opinion paragraph should include the following, "In the auditor's opinion, *except for* the effects of the matter(s) described in the basis for qualified opinion paragraph, the financial statements are *presented fairly*, in all material respects, in accordance with the applicable financial reporting framework."
How to define/select metrics?
Type of task: regression? Classification? Business goal? What is the distribution of the target variable? What metric do we optimize for? Regression: RMSE (root mean squared error), MAE (mean absolute error), WMAE(weighted mean absolute error), RMSLE (root mean squared logarithmic error)... Classification: recall, AUC, accuracy, misclassification error, Cohen's Kappa...
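Two of the regression metrics listed above, sketched in plain Python so the definitions are explicit. The function names are hypothetical and the numbers in the usage note are made up.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

With targets `[3, 5, 7]` and predictions `[2, 5, 9]`, the MAE is 1.0 while the RMSE is about 1.29, because the single error of 2 weighs more once squared.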
What are various steps involved in an analytics project?
Understand the business problem • Explore the data and become familiar with it. • Prepare the data for modelling by detecting outliers, treating missing values, transforming variables, etc. • After data preparation, start running the model, analyse the result and tweak the approach. This is an iterative step till the best possible outcome is achieved. • Validate the model using a new data set. • Start implementing the model and track the result to analyse the performance of the model over the period of time.
Financial Statement Issues - None or immaterial
Unmodified (Unqualified)
Explain what regularization is and why it is useful.
Used to prevent overfitting: improve the generalization of a model Decreases complexity of a model Introducing a regularization term to a general loss function: adding a term to the minimization problem Impose Occam's Razor in the solution
When you call join operation on two pair RDDs e.g. (K, V) and (K, W), what is the result?
Ans: When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key
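To make the result shape concrete, here is a plain-Python sketch of inner-join semantics on lists of key/value pairs. It does not use Spark; `join_pairs` is a hypothetical helper that only mimics the (K, (V, W)) output described above.

```python
from collections import defaultdict

def join_pairs(left, right):
    """Inner join of [(k, v), ...] with [(k, w), ...] into [(k, (v, w)), ...]."""
    right_by_key = defaultdict(list)
    for k, w in right:
        right_by_key[k].append(w)
    # Every v is paired with every w that shares its key; unmatched keys are dropped.
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]
```

Keys that appear on only one side produce no output, and a key with two values on each side produces four pairs.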
Assume you need to generate a predictive model using multiple regression. Explain how you intend to validate this model
Validation using R^2: - % of variance retained by the model - Issue: R^2 is always increased when adding variables - R^2 = (RSS_tot - RSS_res)/RSS_tot = RSS_reg/RSS_tot = 1 - RSS_res/RSS_tot Analysis of residuals: - Heteroskedasticity (relation between the variance of the model errors and the size of an independent variable's observations) - Scatter plots of residuals vs. predictors - Normality of errors - Etc.: diagnostic plots Out-of-sample evaluation: with cross-validation
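The coefficient of determination can be computed directly from the sums of squares, using the form 1 - RSS_res/RSS_tot. This is a minimal sketch; `r_squared` is a hypothetical name, and the fitted values are assumed to come from whatever regression was run.

```python
def r_squared(y, y_hat):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    rss_tot = sum((yi - mean_y) ** 2 for yi in y)
    rss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - rss_res / rss_tot
```

A perfect fit gives 1.0, and predicting the mean for every observation gives 0.0. Computing the same quantity on held-out folds is one way to do the out-of-sample check.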
Sample Report:Qualified Opinion Due to Inadequate Disclosure Nonissuer. Independent Auditor's Report. Auditor's Responsibility:
We believe that the audit evidence we have obtained is sufficient and appropriate to provide a basis for our QUALIFIED audit opinion.
Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Independent Auditor's Report. Auditor's Responsibility.
We believe that the audit evidence we have obtained is sufficient and appropriate to provide a basis for our adverse audit opinion.
What are the benefits and drawbacks of specific methods such as lasso regression?
We use an L1 penalty when fitting the model using least squares. Can force regression coefficients to be exactly zero: a feature selection method by itself. β^lasso = argmin_β { Σ_{i=1..n} (y_i - β_0 - Σ_{j=1..p} x_ij β_j)^2 + λ Σ_{j=1..p} |β_j| }
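Why the L1 penalty produces exact zeros can be seen in the simplest case. The sketch below assumes a single coefficient b, no intercept, and a predictor scaled so that the objective reduces to (z - b)^2 + λ|b|, where z is the unpenalized least-squares estimate. Under that assumption the minimizer is a soft-threshold at λ/2. The names `soft_threshold` and `objective` are hypothetical.

```python
def objective(b, z, lam):
    """One-coefficient lasso objective: squared error plus L1 penalty."""
    return (z - b) ** 2 + lam * abs(b)

def soft_threshold(z, lam):
    """Minimizer of objective(b, z, lam) over b."""
    t = lam / 2
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0  # small estimates are set exactly to zero
```

Any estimate with |z| at or below λ/2 is shrunk all the way to zero, which is the feature-selection behaviour; larger estimates are pulled toward zero by λ/2.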
What is a hire purchase?
When a company hires equipment with the option to buy at the end of the term
what are web crawlers?
Web crawlers, also known as info agents or web spiders, are internet bots (short term for robots) that search web sites one page at a time for information
Four P's Product
What are our Products and services? What is the company's niche?
Pricing Strategies Price-Based Costing
What are people willing to pay for the product? (If it is not more than your costs, it may not be worth making.) (On the other hand, consumers may be willing to pay much more than you could get just by adding a profit margin.) What is the product worth to your buyer? Compare it to other products or services in their lives: what did they pay in those cases? Sometimes you need to look at the factors that could come from pricing something necessary (like heart medication for babies) too high; a PR nightmare could result.
Starting a New Business Steps 3. Market and Strategic Plans
What are the barriers to entering this market? Who are the major players and what are their respective market shares? What will the competitive response be?
Mergers and Acquisitions Due Diligence Elements
What shape is the company in? The industry? How secure are its markets and customers? What are the margins? What is the best competitive response to the acquisition? What are the legal issues?
Starting a New Business Steps 4. Distribution Channels
What types of distribution? How will we do this efficiently? Are they reliable?
Porter's 5 Forces 5. Bargaining Power of the Suppliers
When there are many suppliers and few buyers the buyers have the advantage but when there are many buyers and few suppliers the suppliers have the power
Which method is frequently used to prevent overfitting?
When there is sufficient data 'Isotonic Regression' is used to prevent an overfitting issue.
Whats a false positive?
When we wrongly reject a null hypothesis that is actually true (a Type I error)
Developing a New Product 3. Think about the Customers
Who are our customers and what is important to them? How are they segmented? How can we best reach them? How can we ensure that we retain them? Consumer Adoption Rates (See Chart)
Both XML and HTML use tagging but what are their purposes?
XML tags are used to create metadata about data so that the data can be understood by computers for further processing and structuring. HTML is used to tag data so that browsers can display that data as a web page.
Check Pointing (SPARK)
You mark an RDD for checkpointing by calling RDD.checkpoint() . The RDD will be saved to a file inside the checkpoint directory and all references to its parent RDDs will be removed. This function has to be called before any job has been executed on this RDD.
If Sales are Flat and profits are down.....
You need to examine both revenue and costs. Start with revenue: you can't make educated decisions about costs until you identify and understand the revenue streams.
What is not Machine Learning?
a) Artificial Intelligence b) Rule based inference
Explain what is the function of 'Supervised Learning'?
a) Classifications b) Speech recognition c) Regression d) Predict time series e) Annotate strings
What are the five popular algorithms of Machine Learning?
a) Decision Trees b) Neural Networks (back propagation) c) Probabilistic networks d) Nearest Neighbor e) Support vector machines
Explain what is the function of 'Unsupervised Learning'?
a) Find clusters of the data b) Find low-dimensional representations of the data c) Find interesting directions in data d) Interesting coordinates and correlations e) Find novel observations/ database cleaning
How are neural nets related to Fourier transforms? What are Fourier transforms, for that matter?
csat is the ___________ metric
golden
binomial distributions
have a fixed number of trials
yield on cost
measures the percent of dividend income your investment is generating from the purchase price
Which is more important to you: model accuracy, or model performance?
This question tests your grasp of the nuances of machine learning model performance! Machine learning interview questions often look towards the details. There are models with higher accuracy that can perform worse in predictive power. How does that make sense? Well, it has everything to do with how model accuracy is only a subset of model performance, and at that, a sometimes misleading one. For example, if you wanted to detect fraud in a massive dataset with a sample of millions, a more accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud. However, this would be useless for a predictive model: a model designed to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that you understand model accuracy isn't the be-all and end-all of model performance.
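The fraud example can be reproduced with a toy data set. The sketch below assumes 10 fraud cases in 1,000 transactions (made-up numbers) and a "model" that always predicts no fraud; the helper names are hypothetical.

```python
def accuracy(actual, predicted):
    """Share of predictions that match the actual labels."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)

def recall(actual, predicted):
    """Share of actual positives (fraud = 1) that the model catches."""
    positives = sum(1 for a in actual if a == 1)
    caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return caught / positives

# Hypothetical data: 1 = fraud, 0 = legitimate.
actual = [1] * 10 + [0] * 990
always_no_fraud = [0] * len(actual)
```

Here `accuracy(actual, always_no_fraud)` is 99% while `recall(actual, always_no_fraud)` is zero: the model looks accurate and finds no fraud at all.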
Is it better to have too many false positives, or too many false negatives? Explain.
It depends on the question as well as on the domain for which we are trying to solve the question. In medical testing, false negatives may provide a falsely reassuring message to patients and physicians that disease is absent when it is actually present. This sometimes leads to inappropriate or inadequate treatment of both the patient and their disease. So, here it is preferable to have too many false positives. For spam filtering, a false positive occurs when spam filtering or spam blocking techniques wrongly classify a legitimate email message as spam and, as a result, interfere with its delivery. While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task. So, here we prefer too many false negatives over too many false positives.
Insider ownership
percentage of a business owned by insiders: holders of more than 10% of the shares, and officers or directors of the company
what are examples of unstructured data?
pictures, audio recordings, and videos, although they commonly consist of blocks of text.
other than changing price, profit can increase through
-increase revenue w/o increasing volume (1. look for additional products/services you can bundle 2. branch into related capabilities) -increase volume (1. look for new uses of the product or service 2. market efforts to increase awareness, trial, and repeat purchase 3. ensure quality 4. add value-added features)
improving bottom line -- profits E(P=R-C)M (sales up, profits flat)
look at external factors first -- economy & market/industry -industry-wide problem or company problem? 1. analyze revenues: revenue streams, % of total revenue for each stream, is anything unusual in balance of %s, have %s changed lately? why? 2. examine costs: major costs, any major shifts in costs, any costs out of line? benchmark costs against competitors 3. volume: expand into new areas, increase sales force, increase marketing, reduce prices, improve customer service
return on invested capital (ROIC)
profits made by a company on money from its capital base: net operating profit after taxes (doesn't include interest expense) divided by invested capital (assets minus cash and non-interest-bearing current liabilities)
The __________ measures the rate of profits earned on the amount invested by the common stockholders.
rate earned on common stockholders' equity
dupont roi
return on sales x assets turn over
outcome - decision tree
triangle
current assets
subset of assets- any asset that can be quickly converted to cash
get rid of deadwood -determine short-term and long-term company goals -devise business plan -visit clients, suppliers & distributors, & reassure them -prioritize goals -- get some small successes ASAP to build confidence
surrogate ID (SID)
surrogate ID (SID) table to map the alphanumeric master data primary key to the numeric characteristic. Here is an example. A product may have the key DXTR1000 (Deluxe Touring Bike—Black).
Ticker
symbol used to identify the shares of a specific corporation
adjusted funds from operation AFFO
takes funds from operations and adjusts them for recurring capital expenditures, as well as other adjustments from management
Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Basis for Adverse Opinion:
As described in Note X, the Company has not consolidated the financial statements of subsidiary XYZ company that it acquired during 2001, because it has not yet been able to ascertain the fair value of certain of the subsidiary's material assets and liabilities at the acquisition date. This investment is therefore accounted for on a cost basis by the Company.
The perfect strategy for the high cost producer is one that convinces....
the competition that market shares cannot be shifted except over long periods of time aka highest practical industry prices are an advantage to all because price wars are detrimental to all players in the market
customer journey
the complete sum of experiences that customers go through when interacting with your company and brand
Qualified Opinion (issuer/public company/GAAP-material problem.) Qualified Opinion Due to Material Misstatement of Financial Statements: Issuer. Qualified Opinion Paragraph:
When the auditor expresses a qualified opinion due to a material misstatement in the financial statements, the opinion paragraph should state that, in the auditor's opinion, Except For the effects of the matter(s) discussed in the preceding paragraph, the financial statements are Presented Fairly, in all material respects, in conformity with the accounting principles generally accepted in the United States of America.
What does NLP stand for?
"Natural language processing"! Interaction between computers and human (natural) languages. Involves natural language understanding. Major tasks: - Machine translation - Question answering: "what's the capital of Canada?" - Sentiment analysis: extract subjective information from a set of documents, identify trends or public opinions in social media - Information retrieval
Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Adverse Opinion Paragraph. This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Adverse Opinion." This paragraph should include:
(1).A description and quantification of the financial effects of any misstatement that relates to specific amounts in the financial statements. (a). If it is not practicable to quantify the financial effects, this should be stated.
PEG Ratio
(P0 / Expected EPS)/Growth Rate
Starting a new business - Venture capitalist appeal
*Management -What is the management team like? -What are their core competencies? -Have they worked together before? -Is there an advisory board? *Market & Strategic Plans -What are the barriers to entering this market? -Who are the major players and what kind of market share does each firm have? -What will the competitive response be? *Distribution Channels -What are our distribution channels? *Products -What is the product and technology? -What is the competitive edge? -What are the disadvantages of this product? -Is the technology proprietary? *Customers -Who are our customers? -How can we best reach them? Can we reach them on the Internet? -How can we ensure that we retain them? *Finance -How is the project being funded? -What is the best allocation of funds? -Can we support the debt? (What if interest rates change? What if the economy sours?)
Emphasis-of-Matter Paragraphs
*Nonissuers = Private* -Included in auditor's report when required by GAAS or auditor's discretion. Used when referring to a matter that is appropriately presented or disclosed in the FS and is very important and fundamental to users for understanding FS. *Does NOT affect the auditor's opinion - Stays unmodified* *Requirements* -Immediately after the opinion paragraph (before Other-Matters) -Titled "Emphasis-of-Matter" -Describe matter being emphasized and location of disclosures in FS -Indicate auditor's opinion is not modified w respect to matter emphasized
Three Framework Objectives
*O*perations Objectives - effectiveness and efficiency - ensuring that the assets of the organization are adequately safeguarded *R*eporting Objectives - focus of COSO - reliability, timeliness, transparency of an entity's financial and non financial reporting *C*ompliance Objectives - adhering to all applicable laws and regulations
Governance issues
-Board of directors -A group of individuals who are elected by the ownership of a corporation to have oversight and guidance over management and who look out for the shareholders' interests -Help explain political choices
Porter's 5 generic strategies:
-Cost leadership -Differentiation -Focus -Cost focus -Differentiation focus
Increasing profit - volume
-Expand into new areas. -Increase sales force. -Increase marketing. - Reduce prices. - Improve customer service.
Industry/Acquiring a diverse company - Suppliers
-Have the suppliers been consistent? -What is going on in their industry? -Will they continue to supply us?
What are Non-current/Fixed Assets/ how long are they held in the entity for?
-Held for more than one accounting period -Not intended for resale -Used in production of entity's goods or services Buildings, plant, equipment
Increasing sales - investigate industry & market
-How are we growing relative to the industry? -What has our market share done lately? - Have we gone out and asked customers what they want from us? - Are our prices in line with our competitors? -What have our competitors done in marketing and product development?
Developing a new product - market strategy
-How does this impact existing product line? -Are we cannibalizing one of our existing products? -Are we replacing one of our existing products? -How will this expand our customer base and increase our sales? -What will competitor's response be? -If it is a new market - what are barriers to entry? -Who are the major players? How much market share do they have? Who has the rest?
Developing a new product - financing
-How is the project being funded? - What is the best allocation of funds? - Can we support the debt? (What if interest rates change? What if the economy sours?)
Mergers and acquisitions - Exit strategies
-How long are they planning to keep it? - Did they buy it to break it up and sell off parts of it?
Company and industry analysis
-Identify economic characteristics -Identify strategies to achieve company's goals and objectives -Keys to competing in the industry (Quantitative analysis)
Stage 3: Decision stage
-Involves the quantitative strategic planning matrix (QSPM) -Reveals the relative attractiveness of alternative strategies and thus provides objective basis for selecting specific strategies
Profitability analysis
-Operating results of the company -Analyze the 12-month time period for the company
Objectives:
-Provide direction -Allow synergy -Aid in evaluation -Establish priorities -Reduce uncertainty -Minimize conflicts -Aid in both the allocation of resources and the design of jobs
s
01110011
u
01110101
x
01111000
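The letter cards above pair a character with its 8-bit ASCII code. A minimal sketch of that conversion, with hypothetical helper names:

```python
def char_to_bits(ch):
    """8-bit binary string for a single ASCII character."""
    return format(ord(ch), "08b")

def bits_to_char(bits):
    """Inverse of char_to_bits: binary string back to the character."""
    return chr(int(bits, 2))
```

`char_to_bits("s")` reproduces the card value 01110011, and `bits_to_char` goes the other way.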
what are the four types of interactions?
1. Create new records or rows of data, 2. Read records 3. Update or change the value of the attributes in the record 4. Delete records
what are the types of tools
1. Data Exploration and Reporting—Typically these are tools that feature slicing and dicing
What are the benefits that ERP offers?
1. Data are entered only once and can be shared across different areas. 2. Changes to master data such as names and addresses are entered only once and are then used many times. 3. The data processing and storage functionality of all of the business processes are consolidated in a single system, the ERP, which helps reduce IT costs.
Dominant assets:
1. Receivables 2. Inventory 3. PP&E
Purpose of the Z-Score
1. To calculate the probability of an outcome occurring, given a normal distribution of outcomes 2. To compare two or more outcomes that come from different normal distributions
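Both uses can be sketched with the standard library: convert a raw value to a z-score, then look up the cumulative probability under a standard normal. The function names and the sample numbers in the usage note are hypothetical; the normal CDF is computed from `math.erf`.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

For example, a value of 130 from a distribution with mean 100 and standard deviation 15 has a z-score of 2, and about 97.7% of outcomes fall at or below it. Because z-scores are unit-free, scores from two different normal distributions can be compared on the same scale.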
Describe supervised learning in more details
1. Training phase: Sample extracted from true labels is used to learn a family of models. 2. Validation phase 3. Test Phase 4. Application phases
competitive response
1. competitive analysis -what is new product? how does it differ from ours? -what has competition done differently? what's changed? -have any other comps picked up market share? 2. response actions -acquire competitor -merge w/ competitor -hire competitor's management -increase own profile with marketing and PR campaign
what questions do you ask for brand awareness?
1. for [product/service] what is the first brand name you can think of? 2. for [product/service] what are other brands you've heard of?
company factors
1. profit equation 2. product/service offering -value chain -differentiation 3. more Cs: collaborators, channels, competencies, capacity, culture
developing new product
1. think about product -what's special or proprietary? -is it patented? -substitutions? -advantages & disadvantages? -fit with rest of product line? 2. think about market strategy -effects on existing product line -cannibalizing existing products? replacing? -will it expand customer base and increase sales? -competitive response? -if new market, entrance barriers? -major players/market share 3. customers -who are they? -how best to reach them? Internet? -ensure retention 4. financing -how is it funded -best allocation of funds? -can we support the debt?
Five Forces (state of competition depends on these)
1. threat of new or potential entrants (barriers to entry) 2. intensity of rivalry among existing competitors 3. pressure from substitute products 4. bargaining power of buyers 5. bargaining power of suppliers
market sizing formula
1. total population x % of customers in segment = # of customers targeted 2. # customers targeted x # units purchased per year = total # units 3. total # units x price per unit = total annual market size
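The three steps chain into a single calculation. A minimal sketch, with a hypothetical function name and made-up inputs in the usage note:

```python
def market_size(population, segment_share, units_per_year, price_per_unit):
    """Total annual market size, following the three steps of the formula."""
    customers = population * segment_share   # step 1: customers targeted
    total_units = customers * units_per_year  # step 2: total units per year
    return total_units * price_per_unit       # step 3: annual market size
```

For instance, a population of 1,000,000 with 10% in the segment, 2 units per customer per year and a price of 5 per unit gives a market of 1,000,000 per year.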
001011
13
1101
13
1110
14
001101
15
001110
16
001111
17
010010
22
010011
23
010100
24
011001
31
011010
32
U.S. population
323 million
100001
41
100101
45
111000
70
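Judging from the values, the six-bit cards above pair each binary string with its octal value (two groups of three bits), while the four-bit cards pair it with its decimal value. That reading is an inference from the numbers, not something the cards state. A sketch of both conversions, with hypothetical helper names:

```python
def bits_to_octal(bits):
    """Octal digits for a binary string, e.g. two digits for six bits."""
    return format(int(bits, 2), "o")

def bits_to_decimal(bits):
    """Decimal value of a binary string."""
    return int(bits, 2)
```

Under this reading, 001011 maps to octal 13 (decimal 11), while the four-bit 1101 maps to decimal 13.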
rule of 72
72/r = number of years it would take an investment to double in value (ex: discount rate is 12%, so 72/12=6)
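The rule is an approximation of the exact doubling time under annual compounding, ln(2) / ln(1 + r). A small sketch comparing the two, with hypothetical function names:

```python
import math

def rule_of_72(rate_percent):
    """Approximate years to double at the given annual rate (in percent)."""
    return 72 / rate_percent

def exact_doubling_years(rate_percent):
    """Exact years to double with annual compounding."""
    return math.log(2) / math.log(1 + rate_percent / 100)
```

At 12% the rule gives 6 years, against an exact figure of roughly 6.1 years, so the shortcut is close enough for mental arithmetic.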
EVPI: Expected Value of Perfect Information
= The most a firm should pay for perfect information = EV(with perfect info.) - EV(baseline)
EVSI: Expected Value of Sample Information
= The most a firm should pay for sample information = EV(with sample info.) - EV(baseline)
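A sketch of the EVPI calculation for a small payoff table, assuming rows are decisions, columns are states of the world, and `probs` holds the state probabilities. The function names and the numbers in the usage note are invented for illustration; EVSI follows the same subtraction but needs the posterior probabilities from the sample, which are not modelled here.

```python
def ev_baseline(payoffs, probs):
    """Best expected value when choosing one decision before the state is known."""
    return max(sum(p * x for p, x in zip(probs, row)) for row in payoffs)

def ev_with_perfect_info(payoffs, probs):
    """Expected value when the best decision can be picked for each state."""
    return sum(p * max(column) for p, column in zip(probs, zip(*payoffs)))

def evpi(payoffs, probs):
    """EV(with perfect info) minus EV(baseline)."""
    return ev_with_perfect_info(payoffs, probs) - ev_baseline(payoffs, probs)
```

With payoffs `[[100, -20], [30, 30]]` and equal state probabilities, the baseline is 40, perfect information is worth 65 in expectation, and the EVPI is 25.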
Days payable outstanding
=Accounts payable/ cost of sales *365 how long a company waits before repaying creditors
Possible Values: ρcall
>0
Possible Values: ∆call
>0
What's a Fourier transform?
A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions. Or as this more intuitive tutorial puts it, given a smoothie, it's how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes and phases to match any time signal. A Fourier transform converts a signal from time to frequency domain — it's a very common way to extract features from audio signals or other time series such as sensor data.
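To illustrate the time-to-frequency conversion, here is a naive discrete Fourier transform in plain Python. It is a slow O(n^2) sketch for short signals only, not how production libraries compute it, and the function names are hypothetical.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def dominant_frequency(signal):
    """Index (cycles per window) of the strongest non-constant component."""
    spectrum = dft(signal)
    half = len(signal) // 2
    magnitudes = [abs(c) for c in spectrum[1:half + 1]]
    return 1 + magnitudes.index(max(magnitudes))

# A sine wave completing 3 cycles over 32 samples.
wave = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
```

Calling `dominant_frequency(wave)` recovers the 3 cycles that went into the signal, which is the "recipe from the smoothie" idea in miniature.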
What is classifier in machine learning?
A classifier in a Machine Learning is a system that inputs a vector of discrete or continuous feature values and outputs a single discrete value, the class.
What is leasing?
A company gets the right to use an asset in exchange for regular payments to the owner.
Give an example of an Estimator
A learning algorithm is an Estimator which trains on a training set and produces a model
What is a sigmoid function and what is a logistic function?
A logistic function is a sigmoid used in logistic regression
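The standard logistic function is 1 / (1 + e^(-x)), an S-shaped (sigmoid) curve that maps any real number into the interval (0, 1). A minimal sketch with a hypothetical function name; it is not hardened against overflow for very large negative inputs.

```python
import math

def logistic(x):
    """Standard logistic (sigmoid) function."""
    return 1 / (1 + math.exp(-x))
```

In logistic regression the input x is the linear combination of the features, and the output is read as the probability of the positive class.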
Which of the following is equivalent to a plain vanilla receive fixed currency swap? A) A long position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. B) A short position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. C) A short position in a foreign bond coupled with a long position in a dollar-denominated floating rate note.
A) A long position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. A long position in a fixed rate foreign bond will receive fixed coupons denominated in a foreign currency. The short floating rate note requires U.S. dollar denominated floating-rate payments. Combined, these are the same cash flows as a plain vanilla currency swap.
How can we distribute JARs to workers?
Ans: The jar you specify with SparkContext.addJar will be copied to all the worker nodes.
what is a common use in business in spreadsheets?
Budgeting
Business intelligence (BI)
Business intelligence (BI) has been used to describe analytics in the context of business data. It focuses on business data, financial data, and marketing data to gain business value, customer loyalty, and other benefits.
If a company issues financial statements that purport to present financial position and results of operation
but omits the related statement of cash flows, the auditor will normally conclude that the omission requires qualification of the opinion.
Inventory Turnover
COGS/Average Inventory
What does the Cash Flow statement show?
Cash inflows and outflows of the organization for the period that has just ended
Real-World Pricing (Replicating Portfolio)
Ce^(γh)=S∆e^(αh)+Be^(rh)
what is data analytics
Data analytics is the process that takes us from data to decision.
Use profitability framework when you hear
Decline in prices Decline in volume Increase in costs
Why is ensemble learning used?
Ensemble learning combines several models to improve the classification, prediction, function approximation, etc. of a single model.
Professional Judgement
Exercised in planning and performing an audit. The audit requires interpretation of ethical requirements and GAAS. Make decisions about: -Materiality -Audit Risk -Nature, extent, and timing of audit procedures (*NET*) -Evaluating whether sufficient, appropriate evidence has been obtained (Support audit opinion, not FS) -Evaluating Management judgments in applying applicable financial reporting framework -Drawing conclusions based on evidence obtained
Screening
Filtering a set of potential investments into a smaller set that meet certain criteria -back-testing - applies securities selected to historical data
Branches of Accounting
Financial Accounting MGMT Accounting Treasury MGMT Auditing Taxation and VAT Consultancy (Financial MGMT Corp. Finance?)
Materiality of Problem: Material but not pervasive
Financial Statements Are Materially Misstated (Financial Statement Issues): Qualified opinion. Inability to Obtain Sufficient Appropriate Audit Evidence (Audit Issues): Qualified opinion.
Increasing Sales Assessment Elements
Growth relative to market share Changes in market share Customer polls Prices Competitive? Competitors strategies(marketing and product development)
Code of Professional Conduct
Guidelines -Section: ET -Standard Setting: AICPA -AICPA Code of Professional Conduct provides members with guidelines for behavior in the conduct of their professional affairs. Provides assurance to public that profession intends to maintain high standards and to enforce compliance with these standards by its members -Applies to: Members of AICPA
Developing a New Product 4. Funding
How is the product being funded? Does our company have the cash or are they taking on debt? Can we support this debt under various economic conditions? What is the best allocation for funds?
Starting a New Business Steps 7. Finances
How is the project being funded? What is the best allocation of funds? Can we support the debt under various economic conditions?
Mergers and Acquisitions Exit Strategies Elements
How long to keep it? Divest parts of the organization?
Industry Analysis Suppliers
How many? Product availability? What's going on in their market?
High p-value:
The data are consistent with the null hypothesis, so we fail to reject it. (This does not prove that the hypothesis is true.)
What is Inductive Logic Programming in Machine Learning?
Inductive Logic Programming (ILP) is a subfield of machine learning which uses logic programming to represent background knowledge and examples.
Estimating Volatility
Let x(i)=ln[S(i)/S(i-1)]. Then E[x^2]=∑[x(i)^2/n], x-bar=∑[x(i)/n] and s^2=[n/(n-1)]*(E[x^2]-x-bar^2) => σ≈√s^2√t
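The estimator above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `estimate_volatility` and the sample prices are hypothetical, and `t` is read as the number of return periods per year (for example 52 for weekly prices), which the card does not spell out.

```python
import math


def estimate_volatility(prices, periods_per_year):
    """Annualized volatility from a price series, following the card's formula."""
    # x(i) = ln[S(i) / S(i-1)]: continuously compounded return per period
    x = [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]
    n = len(x)
    mean_x = sum(x) / n                      # x-bar
    mean_x2 = sum(v * v for v in x) / n      # E[x^2]
    # s^2 = [n / (n - 1)] * (E[x^2] - x-bar^2); guard against tiny negative rounding
    s2 = max((n / (n - 1)) * (mean_x2 - mean_x ** 2), 0.0)
    # Assumption: t is the number of periods per year, so sigma = s * sqrt(t)
    return math.sqrt(s2) * math.sqrt(periods_per_year)


weekly_prices = [100.0, 102.0, 101.0, 105.0, 107.0]  # hypothetical data
print(estimate_volatility(weekly_prices, periods_per_year=52))
```

The n/(n-1) factor makes s^2 the usual unbiased sample variance of the log returns.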
what are the characteristics for informational systems?
Level of detail Periodic Requirements are not always known Managerial requirements Optimized for access Historical data Data can be integrated Availability
Industry Analysis Current Industry Structure
Life Cycle(Growth, transition, maturity) Performance, Margins Major Player and Market Share Industry change(new players, new technology) Drivers(brand, size, technology)
If Profits are declining yet Revenues have gone up......
Look to see if there were Changes in Cost any additional expenses changes in price the product mix changes in customer needs
Mention the difference between Data Mining and Machine learning?
Machine learning relates to the study, design and development of algorithms that give computers the capability to learn without being explicitly programmed. Data mining can be defined as the process of extracting knowledge or previously unknown interesting patterns from (often unstructured) data. Machine learning algorithms are used during this process.
When to Issue a "Qualified Opinion"
Misstatements are material but *NOT* pervasive Examples: 1. Inadequate Disclosure 2. Material Misstatement
Power
My decision: Reject Ho The truth: Ho is false
What are the main components of the statement of cash flows?
Opening balance Receipts Payments Closing Balance
What are the 3 components of the statement of cash flows?
Operating Investing Financing
Profit
Revenue - Costs OR Revenue*Margin %
How can you iterate over a list and also retrieve element indices at the same time?
This can be done with the built-in enumerate function, which takes each element of a sequence (such as a list) and pairs it with its index, yielding (index, element) tuples.
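A short illustration of `enumerate` (the list contents are made up for the example):

```python
fruits = ["apple", "banana", "cherry"]

# enumerate yields (index, element) pairs, with the index first.
pairs = list(enumerate(fruits))

for index, fruit in enumerate(fruits):
    print(index, fruit)

# The optional start argument changes the first index (here 1 instead of 0).
numbered = list(enumerate(fruits, start=1))
```

Unpacking the pair directly in the `for` statement avoids indexing into the list by hand.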
Unstructured Data
Unstructured data, are just that, "unstructured," meaning that they do not conform to data models and associated metadata.
The relationship between sales and accounts receivable may be stated as a _____ turnover.
accounts receivable
shared values
company's principles
skills
competencies
If a company issues financial statements that purport to present financial position and results of operations but
omits the related statement of cash flows, the auditor will normally conclude that the omission requires a qualified opinion.
asset to
roi/profit margin
Sample size assumption
sample size must be sufficiently large
The excess of current assets over current liabilities is called_____.
working capital
if profits are declining
yet rev increased examine...,-change in costs -additional expenses -changes in prices -the product mix -change in customer needs
company (cheng)
- capabilities and expertise - distribution channels - cost structure (fixed vs. variable) - investment cost - intangibles (brand, reputation, etc.) - financial situation - organizational structure
competition (cheng)
- competitor market share concentration - competitor behaviors (target segment, products, pricing, distribution) - best practices (are they doing things we're not) -barriers to entry (do we need to worry about new entrants to market) - supplier concentration -regulatory environment
The income stmt is a presentation of operating results under accrual basis
-Assumed it is not a good reflection of cash inflows and outflows (current) -BUT, if company is doing a reasonable job collecting receivables and paying payables, accrual basis should give us a good idea of what the future will look like -Illustrate what cash flows will look like
Stage 2: Matching stage
-Focuses on generating feasible alternative strategies by aligning key external and internal factors -Techniques include the strengths-weaknesses-opportunities-threats (SWOT) matrix, the strategic position and action evaluation (SPACE) matrix
Strategic position and action evaluation (SPACE) matrix
-Four quadrant framework indicates whether aggressive, conservative, defensive or competitive strategies are most appropriate for a given organization -2 internal and 2 external dimensions -End up with a coordinate point
When to give an adverse opinion (examples)
-GAAP consistency change (unjustified) = auditor disagrees -Inadequate disclosure -Departure from GAAP (unjustified) -Unreasonable accounting estimate
The grand strategy matrix: Quadrant IV
-Have characteristically high cash flow levels and limited internal growth needs and often can pursue related or unrelated diversification successfully
Competitive profile matrix
-Identifies firm's major competitors and their strengths and weaknesses in relation to a sample firm's strategic positions -Critical success factors include internal and external issues
Growth strategies - Investigate industry
-Is the industry growing? - How are we growing relative to the industry? -Are our prices in line with our competitors? - What have our competitors done in marketing and product development? - Which segments of our business have the highest future potential? - Do we have funding to support higher growth?
Mergers & Acquisitions - Price
-Is the price fair? - How are they going to pay for it? - Can they afford it? - If the economy sours, can they still make their debt payments?
Strategy/Performance *levels of and changes in:*
-Performance Measures -Critical Success Factors -Alignment of Strategy and results Time Series and/or Cross-sectional
Accounting is
-the classification and recording of transactions. -the presentation and interpretation of the results of those transactions. -the projection of future activities, selection of best strategy
entering a new market
1. why? goal/objective 2. determine state of current and future market 3. investigate market -- does entering it make good business sense? 4. major ways to enter new market: start from scratch, acquire existing player, joint venture
population data: India
1.2B
India
1.3 billion
population data: China
1.3B
China
1.4 billion
Issuer Report Adverse Opinion
1.Introductory paragraph: no change. 2.Scope paragraph: no change. 3.Middle paragraph. 4.Adverse opinion paragraph.
Issuer Report Qualified Opinion
1.Introductory paragraph: no change. 2.Scope paragraph: no change. 3.Middle paragraph. 4.Qualified opinion paragraph.
1100
12
population data: Japan
125M
010101
25
011011
33
Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Qualified Opinion Paragraph: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include:
(1) A description and quantification of the financial effects of any misstatement that relates to specific amounts in the financial statements. (a). If it is not practicable to quantify the financial effects, this should be stated.
Material Misstatements related to Appropriateness of Accounting Policies
(1) Accounting policies are not in accordance with the applicable financial reporting framework. (2) The financial statements do not represent the underlying transactions and events in a manner that achieves fair presentation. (3) The entity has not complied with the financial reporting framework requirements for accounting for and disclosing changes in accounting policies.
Material Misstatements related to Application of Accounting Policies
(1) Management has not applied accounting policies in accordance with the applicable reporting framework (i.e. Expensing rather than capitalizing a fixed asset). (2) Management has not applied accounting policies consistently between periods or to similar transactions and events. (3) There is an error in application of an accounting policy.
Nature of Material Misstatements. (≠ GAAP) A material misstatement of the financial statements may arise in relation to the following:
(1) The appropriateness of accounting policies. (2) The application of accounting policies. (3) The appropriateness of the financial statement presentation or the appropriateness or adequacy of disclosure in the financial statements.
Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Adverse Opinion Paragraph. This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Adverse Opinion." This paragraph should include:(3)
(2). An explanation of how disclosures are misstated if there is a material misstatement related to narrative disclosure. (3). A description of the nature of omitted information and inclusion of information, when practicable, if there is an omission that is required to be presented or disclosed.
The Independent Audit Function: The Basics
*GAAS* -Provide FS users with an opinion on whether the FS are presented fairly, in all material respects, in accordance with the applicable financial reporting framework. -Applicable reporting framework: acceptable in view of the nature of the entity and objective of FS or required by law or regulation. Ex. GAAP or IFRSs, and special purpose framework -Auditor report gives credibility to FS. Have an objective view and report on the company's activities without bias or conflict of interest -FS: prepared by management of company, not by auditor. They are product and property of company.
Other Auditing Publications
*Least Authoritative - 3rd Level of audit guidance* -No authoritative status but may be helpful to auditor -Examples: Auditing articles in the Journal of Accountancy, textbooks
Financial Statements
-A Summary of Financial Transactions -Make up Annual Report
What are the components of the architected data mart layer?
1. Business transformation layer uses business logic to transform transactional data from the propagation layer into their business context. 2. reporting layer- the read-optimized data cube that is used for queries and analytics. 3. operational data store (ODS)- maintains operational data that may be subject to changes. 4. virtualization layer- This layer is available to virtualize reporting structures.
3 Questions generally answered by accounting
1. How are we doing? -a scorecard 2. What problems should be looked at? -attention directing 3. What is the best way to do a job? -problem solving
four Ps
1. product 2. price 3. place/placement 4. promotion
011100
34
days sales outstanding
= accounts receivable / total sales * 365 how long it takes to collect its sales
csat
= asking the question: how likely are you to recommend __________ to a friend/colleague? (aka net promoter scale)
take rate
= number of accepted offers / number of contacts
test drive
= the customer pretest of a product or service prior to purchase
Most organizations simultaneously pursue
A combination of two or more strategies, but a combination strategy can be exceptionally risky if carried too far -No org can afford to pursue all the strategies that might benefit them -Difficult decisions must be made and priorities must be established
data warehouse
A data warehouse is a database architecture that provides the persistent (permanent) storage of summarized, harmonized, cleansed, and consolidated data, often from multiple sources, to serve as a single source of truth for decision making
What's the difference between a generative and discriminative model?
A generative model will learn categories of data while a discriminative model will simply learn the distinction between different categories of data. Discriminative models will generally outperform generative models on classification tasks.
What section(s) is added to the Auditor's report when a Qualified or Adverse Opinion is issued for an issuer?
A middle paragraph(s) is added to the Auditor's report that is placed immediately before the opinion paragraph.
For a change in which of the following inputs into the Black-Scholes-Merton option pricing model will the direction of the change in a put's value and the direction of the change in a call's value be the same? A) Volatility. B) Exercise price. C) Risk-free rate.
A) Volatility. A decrease/increase in the volatility of the price of the underlying asset will decrease/increase both put values and call values. A change in the values of the other inputs will have opposite effects on the values of puts and calls.
What is A/B Testing?
A/B Testing is an experiment design method where we compare 2 versions of a web page or an ad and see which one performs better. An example of this is email marketing: -Customer database of 2000 people -Send 1000 people: "offer ends this week!" -Send the other 1000: "offer ends soon!" -Compare which one is better in terms of purchases
average true range
ATR: a measure of the volatility of a security. It is the simple moving average (14-day) of the security's true range. True range is the highest of: (1) the period high minus the period low; (2) abs(period high - previous close); (3) abs(period low - previous close).
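The true range and its moving average can be sketched as follows. The function names and the price data are hypothetical; the sketch uses a simple moving average of the most recent `window` true ranges, as the card describes, rather than any smoothed variant.

```python
def true_ranges(highs, lows, closes):
    """True range per period, starting at the second period (needs a previous close)."""
    ranges = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        ranges.append(max(
            highs[i] - lows[i],           # high minus low in the period
            abs(highs[i] - prev_close),   # abs(period high - previous close)
            abs(lows[i] - prev_close),    # abs(period low - previous close)
        ))
    return ranges


def average_true_range(highs, lows, closes, window=14):
    """Simple moving average of the last `window` true ranges."""
    ranges = true_ranges(highs, lows, closes)
    if len(ranges) < window:
        raise ValueError("not enough periods for the requested window")
    return sum(ranges[-window:]) / window


# Hypothetical four-period example with a 3-period window.
highs = [10, 12, 11, 15]
lows = [8, 9, 9, 12]
closes = [9, 11, 10, 14]
print(true_ranges(highs, lows, closes))
print(average_true_range(highs, lows, closes, window=3))
```

Including the previous close lets the measure capture gaps between periods, which a plain high-minus-low range would miss.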
Which is true about the normal distribution?
About 95% of observations are within 2 standard deviations of the mean About 68% of observations are within 1 standard deviation of the mean It is symmetric
What is the center value of the distribution of the sample means?
According to the Central Limit Theorem, if we take enough large samples, the mean of the set of sample means equals the population mean.
Current Liabilities
Accounts payable +notes payable (short term)
Competitive Response Strategy Element
Acquire a Competitor Merge with Competitor Copy a Competitor Hire the Competitor's Management Increase profile with marketing campaign
WO: Weaknesses and opportunities
Aim at improving internal weaknesses by taking advantage of external opportunities
How would you simulate the approach AlphaGo took to beat Lee Sedol at Go?
AlphaGo beating Lee Sedol, the best human player at Go, in a best-of-five series was a truly seminal event in the history of machine learning and deep learning. The Nature paper on AlphaGo describes how this was accomplished with "Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play."
Operating cycle
Amount of time it takes to turn inventory into cash
What are some differences between a linked list and an array?
An array is an ordered collection of objects. A linked list is a series of objects with pointers that direct how to process them sequentially. An array assumes that every element has the same size, unlike the linked list. A linked list can more easily grow organically: an array has to be pre-defined or re-defined for organic growth. Shuffling a linked list involves changing which pointers direct where; shuffling an array is more complex and takes more memory.
A material change in accounting principle would result in the addition of
An emphasis-of-matter paragraph to the unmodified opinion.
What is an example of structured data?
An example of structured data is a database about restaurants. It stores attributes such as restaurant name, location, phone number, and cuisine.
What is power analysis?
An experimental design technique for determining the effect of a given sample size.
How would you control the number of partitions of a RDD?
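Ans: You can control the number of partitions of an RDD using the repartition or coalesce operations.
What is speculative execution of tasks?
Ans: Stragglers are tasks that run slower than most of the other tasks in a job. With speculative execution, a copy of such a slow task is launched on another node, and the result of whichever copy finishes first is used.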
what are the characteristics of transactional Systems?
Availability—Because businesses cannot afford to lose any computing time due to system failures, systems that process transactions should be available as close to 100% of the time as possible. Level of Detail—The data of transactional systems should be available in full detail so that each transaction as well as its content, creator, date, and details are available at all times. Updatable—By their very nature, business transactions are created, updated or changed and deleted quite frequently. Speed—The ability to process large quantities of transactions is critical in business systems. Current—Transactional systems are current, which means they store only recent transactions, frequently a year or two of data. Operational—OLTP systems are operational in nature
Customer Lifetime Value
Avg Customer Contribution Margin ($) per year * Customer Lifetime OR Avg Customer Contribution Margin ($) * (Retention Rate / (1 + Discount Rate - Retention Rate)) --> the second form accounts for the time value of money using a discount rate (more time consuming)
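Both versions of the formula can be written directly. The function names and the input figures below are hypothetical, chosen only to show the arithmetic.

```python
def clv_simple(margin_per_year, lifetime_years):
    """Average yearly contribution margin times expected customer lifetime."""
    return margin_per_year * lifetime_years


def clv_discounted(margin_per_year, retention_rate, discount_rate):
    """Margin * (retention / (1 + discount - retention)), per the card's second form."""
    return margin_per_year * retention_rate / (1 + discount_rate - retention_rate)


# Hypothetical customer: $100 margin per year, 80% retention, 10% discount rate.
print(clv_simple(100.0, 5))
print(clv_discounted(100.0, 0.8, 0.1))
```

The second form shrinks as the discount rate rises or retention falls, which is what makes it the more conservative estimate.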
The floating-rate payer in a simple interest-rate swap has a position that is equivalent to: A) a series of long forward rate agreements (FRAs). B) a series of short FRAs. C) issuing a floating-rate bond and a series of long FRAs.
B) a series of short FRAs. The floating-rate payer has a liability/gain when rates increase/decrease above the fixed contract rate; the short position in an FRA has a liability/gain when rates increase/decrease above the contract rate.
Breakeven
BE rev = FC / CM%, where CM% = (rev - vc) / rev; BE units = FC / CM, where CM = $/unit - vc/unit
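A small worked sketch of the two breakeven formulas, with FC as fixed costs, CM as contribution margin per unit and CM% as contribution margin ratio. The function names and all figures are hypothetical.

```python
def breakeven_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """BE units = FC / CM, where CM = price per unit - variable cost per unit."""
    contribution_margin = price_per_unit - variable_cost_per_unit
    return fixed_costs / contribution_margin


def breakeven_revenue(fixed_costs, revenue, variable_costs):
    """BE rev = FC / CM%, where CM% = (revenue - variable costs) / revenue."""
    cm_ratio = (revenue - variable_costs) / revenue
    return fixed_costs / cm_ratio


# Hypothetical firm: $10,000 fixed costs, $50 price, $30 variable cost per unit.
print(breakeven_units(10_000, 50, 30))
# Same firm viewed in totals: $50,000 revenue against $30,000 variable costs.
print(breakeven_revenue(10_000, 50_000, 30_000))
```

The two views agree: 500 units at $50 each is $25,000 of revenue.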
What is Bayes' Theorem? How is it useful in a machine learning context?
Bayes' Theorem gives you the posterior probability of an event given what is known as prior knowledge. Mathematically, it's expressed as the true positive rate of a condition sample divided by the sum of the false positive rate of the population and the true positive rate of a condition. Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test will be false 50% of the time, and the overall population only has a 5% chance of having the flu. Would you actually have a 60% chance of having the flu after having a positive test? Bayes' Theorem says no. It says that you have a (0.6 * 0.05) / ((0.6 * 0.05) + (0.5 * 0.95)) = 0.0594, or 5.94%, chance of having the flu, where 0.6 * 0.05 is the true positive rate of a condition sample and 0.5 * 0.95 is the false positive rate of the population. Bayes' Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier. That's something important to consider when you're faced with machine learning interview questions.
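The arithmetic in the flu example can be reproduced with a short function. This sketch assumes the reading that the card's own calculation implies: 0.05 is the prior probability of flu, 0.6 is the probability of a positive test given flu, and 0.5 is the probability of a positive test given no flu. The function and parameter names are illustrative.

```python
def posterior(prior, p_pos_given_condition, p_pos_given_no_condition):
    """P(condition | positive test) via Bayes' Theorem."""
    true_positive = p_pos_given_condition * prior
    false_positive = p_pos_given_no_condition * (1 - prior)
    return true_positive / (true_positive + false_positive)


# Numbers from the flu example: 5% prior, 0.6 and 0.5 as the two test rates.
p_flu_given_positive = posterior(0.05, 0.6, 0.5)
print(round(p_flu_given_positive, 4))
```

Because the prior is small, the false positives from the healthy 95% of the population dominate the denominator, which is why the posterior ends up near 6% rather than 60%.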
Why is randomization important in experimental design?
Because it balances out confounding variables. You can ensure possible confounding variables are balanced out.
Entering the Market - Why?
Brainstorm assumptions & clarify with interviewer: -What is the goal? -What is our objective? -How does this fit into overall strategy?
Why is Accounting Necessary?
Budgeting Financial Statements Investment Review Business Plan
A U.S. firm (U.S.) and a foreign firm (F) engage in a 3-year, annual-pay plain-vanilla currency swap. U.S. is the fixed-rate payer in FC. The fixed rate at initiation was 5%. The variable rate at the end of year 1 was 4%, at the end of year 2 was 6%, and at the end of year 3 was 7%. At the beginning of the swap, $2 million was exchanged at an exchange rate of 2 foreign units per $1. At the end of the swap period the exchange rate was 1.75 foreign units per $1. At the end of year 1, firm: A) F pays firm U.S. $200,000. B) U.S. pays firm F $200,000. C) U.S. pays firm F 200,000 foreign units.
C) U.S. pays firm F 200,000 foreign units. A plain-vanilla currency swap pays floating on dollars and fixed on foreign. Fixed on foreign: 0.05 × $2,000,000 × 2 foreign units per $1 = 200,000 foreign units paid by the U.S. firm.
The major benefit of the BCG matrix is that it draws attention to the
Cash flow, investment characteristics, and needs of an organization's various divisions
Profitability is associated with the
Cash flows and income statements
When should you use classification over regression?
Classification produces discrete values and maps data to strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points. You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories (ex: if you wanted to know whether a name was male or female rather than just how correlated it was with male and female names).
Which technique is used to predict categorical responses?
Classification technique is used widely in mining for classifying data sets.
Why does data cleaning play a vital role in analysis?
Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work with is a cumbersome process: as the number of data sources increases, the time taken to clean the data increases exponentially due to the number of sources and the volume of data generated in these sources. Cleaning might take up to 80% of the time, making it a critical part of the analysis task.
What is the difference between Cluster and Systematic Sampling?
Cluster sampling is a technique used when it becomes difficult to study the target population spread across a wide area and simple random sampling cannot be applied. A cluster sample is a probability sample where each sampling unit is a collection, or cluster, of elements. Systematic sampling is a statistical technique where elements are selected from an ordered sampling frame. In systematic sampling, the list is progressed in a circular manner, so once you reach the end of the list, it is progressed from the top again. The best example of systematic sampling is the equal probability method.
Industry Analysis Approaches
Current Industry Structure Supplier Future
Which data scientists do you admire most? which startups?
DJ Patil, First US Chief Data Scientist, for using Data Science to make US government work better. Hadley Wickham, for his fantastic work on Data Science and Data Visualization in R, including dplyr, ggplot2, and Rstudio.
Key features of big data
Data is available in real time Data is available at a larger scale Data is available on novel types of variables
What are Expenses?
Decreases in economic benefits during the accounting period in the form of outflows or depletions of assets or increases of liabilities that result in decreases in equity, other than those relating to distributions to equity participants. Salaries and wages Rent and rates Light and heat Stationery costs Director Expenses Depreciation
Dimension Tables
Details regarding the master data are stored in separate tables
Competitors include (3):
Direct competition Substitute solutions Potential entrants
Sample Report:Qualified Opinion Due to Inadequate Disclosure Nonissuer. Independent Auditor's Report. Qualified Opinion: In our opinion
EXCEPT for the omission of the information described in the Basis for Qualified Opinion paragraph, the financial statements referred to above PRESENT FAIRLY, in all material respects, the financial position of ABC Company as of December 31, 2001 and 2000, and the results of its operations and its cash flows for the years then ended in accordance with accounting principles generally accepted in the United States of America.
Ex: Hacker Guard has been the industry leader in ID-theft monitoring, but inconsistent profits and losses in 6 of the last 10 quarters have made it hard to do an IPO. How can it reduce the turmoil and increase profits?
Economy Industry Revenues Costs
Symmetric
For every x,y ∈ A, xRy → yRx | if x is related to y then y is related to x
An auditor may express a disclaimer of opinion when the auditor is unable to obtain sufficient appropriate audit evidence on which to base an opinion.
For example, when an auditor is unable to determine the extent of or the amounts associated with a pervasive employee fraud scheme , there is not sufficient appropriate audit evidence , and an expression of disclaimer of opinion may be appropriate.
Forward integration
Gaining ownership or increased control over distributors or retailers
The Audit Process
General Principles: Overall objectives, documentation, communication, quality control - firm 1. Engagement Acceptance: -Ethics and independence -Terms of engagement 2. Assess Risk and Plan Response: -Audit planning, including audit strategy -Materiality -Risk assessment procedures: understand the entity and environment & understand internal control -Identify and assess risk -Respond to risk 3. Perform Procedures and Obtain Evidence -Test of controls, if applicable -Substantive Testing 4. Form Conclusions -Subsequent Events -Management representation -Evaluate audit results -Quality Control - engagement 5. Reporting -Report on audited financial statements -Other reporting considerations
How would we reduce variance?
Get more data / decrease complexity of the model
Porter's 5 forces
Good for Entering New Market Developing New Product Starting a new business 1. The threat of new or potential entrants 2. Intensity of rivalry among existing competitors 3. Pressure from substitution products 4. Bargaining power of buyers 5. Bargaining power of suppliers
what languages use tagged data?
HTML XML XBRL
Cash cows III
High market share, low growth rate -Generating cash in excess
Strategy/Performance *Strategy - Low cost provider*
High volume, low margin
Porter's 5 Forces 1. Threat of New or Potential Entrants
If barriers are high, then newcomers can expect entrenchment or retaliatory forces from the existing competitors. Some Barriers to Entry are: Economies of Scale Capital Requirements Government Policy Switching Costs Access to Distribution Channels Product Differentiation Proprietary Product Technology
What is Perceptron in Machine Learning?
In Machine Learning, the Perceptron is an algorithm for supervised learning of binary classifiers: it uses a linear function of the input features to decide which of two classes an input belongs to.
What is a statistical interaction?
In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive. Most commonly, interactions are considered in the context of regression analyses.
What is supervised learning?
In supervised learning, tuples of examples (input, desired output) are available and the computer uses this to build a model where a given input produces an output (with minimal error)
Can you cite some examples where both false positive and false negatives are equally important?
In the banking industry giving loans is the primary source of making money but at the same time if your repayment rate is not good you will not make any profit, rather you will risk huge losses. Banks don't want to lose good customers and at the same point of time they don't want to acquire bad customers. In this scenario both the false positives and false negatives become very important to measure. These days we hear many cases of players using steroids during sport competitions Every player has to go through a steroid test before the game starts. A false positive can ruin the career of a Great sportsman and a false negative can make the game unfair.
Define Unsupervised Machine Learning
In unsupervised learning, there is only input data (X) and the output variable isn't known. The algorithm is left on its own to discover underlying patterns or structures within the data. Two common types of unsupervised learning algorithms: 1. Clustering: used in marketing, where we try to group customers by purchasing behaviour. 2. Association Rule Mining: determines underlying patterns, such as people who buy X tend to also buy Y
What is 'Training set' and 'Test set'?
In various areas of information science like machine learning, the set of data used to discover a potentially predictive relationship is known as the 'Training Set'. The training set is the set of examples given to the learner, while the test set is used to test the accuracy of the hypotheses generated by the learner; it is the set of examples held back from the learner. The training set is distinct from the test set.
Option Greek Definition: Vega
Increase in option value per percentage point increase in volatility (0.01∂C/∂σ)
Increasing Sales How? Element
Increase volume Increase amount of each sale Increase Prices Create seasonal balance
Why is an instance-based learning algorithm sometimes referred to as a lazy learning algorithm?
An instance-based learning algorithm is also referred to as a lazy learning algorithm because it delays the induction or generalization process until classification is performed.
What cross-validation technique would you use on a time series dataset?
Instead of using standard k-fold cross-validation, you have to pay attention to the fact that a time series is not randomly distributed data: it is inherently ordered chronologically. With randomly assigned folds, the model could be trained on later periods and tested on earlier ones, so a pattern that only emerges later would leak into the evaluation. You'll want to do something like forward chaining, where you model on past data and then test on the data that follows: fold 1: training [1], test [2]; fold 2: training [1 2], test [3]; fold 3: training [1 2 3], test [4]; fold 4: training [1 2 3 4], test [5]; fold 5: training [1 2 3 4 5], test [6]
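The forward-chaining folds listed above can be generated with a small helper. The function name `forward_chaining_splits` is hypothetical, and periods are labelled 1..n to match the card's notation.

```python
def forward_chaining_splits(n_periods):
    """Yield (train, test) period labels: train on periods 1..k, test on period k+1."""
    for k in range(1, n_periods):
        train = list(range(1, k + 1))
        test = [k + 1]
        yield train, test


# Six periods give the five folds listed in the card.
splits = list(forward_chaining_splits(6))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: training {train}, test {test}")
```

Every test period comes strictly after all of its training periods, so no future information reaches the model during training.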
Cash Conversion Cycle
Inventory Conversion Period + Receivables Collection Period - Payables Deferral Period
How do you think Google is training data for self-driving cars?
Machine learning interview questions like this one really test your knowledge of different machine learning methods, and your inventiveness if you don't know the answer. Google is currently using recaptcha to source labelled data on storefronts and traffic signs. They are also building on training data collected by Sebastian Thrun at GoogleX — some of which was obtained by his grad students driving buggies on desert dunes!
Rate Earned on Common Stockholders' Equity = ( ? - ? ) / ?
(Net Income - Preferred Dividends) / Average Common Stockholders' Equity
Earnings per Share (EPS) on Common Stock = ( ? - ? ) / ?
(Net Income - Preferred Dividends) / Shares of Common Stock Outstanding
Rate Earned on Stockholders' Equity = ? / ?
Net Income / Average Total Stockholders' Equity
Number of Times Preferred Dividends Are Earned = ? / ?
Net Income / Preferred Dividends
Net Profit Margin (%)
Net Profit ($) / Revenue
Ratio of Net Sales to Assets = ? / ?
Net Sales / Average Total Assets (excluding long-term investments)
Net profit margin
Net income/sales revenue
Competitive Response Why? element
New Product? Competitor Strategy Changed? Other competitors increased market share?
Intangible Asset
Not a separate entity in balance sheet Intellectual property: patents, copyright, goodwill
How can outlier values be treated?
Outlier values can be identified by using univariate or any other graphical analysis method. If there are only a few outliers, they can be assessed individually; for a large number of outliers, the values can be substituted with either the 99th or the 1st percentile values. Not all extreme values are outliers. The most common ways to treat outlier values: 1) change the value to bring it within a range; 2) simply remove the value.
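Substituting extreme values with the 1st or 99th percentile value is often called winsorizing. The sketch below shows one way to do it in plain Python; the helper names percentile and winsorize, and the choice of linear interpolation between ranks, are assumptions for this example.

```python
def percentile(sorted_vals, p):
    """Percentile p (0-100) of an already sorted list, by linear interpolation."""
    if not sorted_vals:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(sorted_vals) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = rank - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clamp every value into the [lower_pct, upper_pct] percentile range."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


data = list(range(1, 101)) + [10_000]  # one obvious extreme value
capped = winsorize(data)
print(max(data), "->", max(capped))
```

Values inside the percentile range are left untouched; only the tails are pulled in, so the sample size stays the same.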
What are the elements of the Company Description (7)?
Overview/Goals Basic product offering Company history Markets to be served Company location Stage of business Financing to date
P(A and B) =
P(A) * P(B) if A and B are independent; in general, P(A) * P(B|A)
Put Profit
P(S(h),K,T-h)-P(S(0),K,T)e^(rh)
What does P-value signify about the statistical data?
P-value is used to determine the significance of results after a hypothesis test in statistics. P-value helps the readers to draw conclusions and is always between 0 and 1. • P-value > 0.05 denotes weak evidence against the null hypothesis, which means the null hypothesis cannot be rejected. • P-value <= 0.05 denotes strong evidence against the null hypothesis, which means the null hypothesis can be rejected. • P-value = 0.05 is the marginal value, indicating it is possible to go either way.
P/E Ratio
P0 / EPS Expected in One Year
|Critical value|<|Test statistic|
Reject null hypothesis. There is sufficient evidence that Ha is true
Replication
Replication ensures that the source data remain intact; it can be done in real time or in batches.
Regularization
Smoothing a model to prevent overfitting
Facial recognition software is able to recognize your friends on FB so you can tag them on it
Supervised Learning
How is kNN different from k-means clustering?
kNN is a supervised classification algorithm; k-means is an unsupervised clustering algorithm.
What is the difference between supervised and unsupervised machine learning?
Supervised learning requires labeled training data. For example, in order to do classification (a supervised learning task), you'll need to first label the data you'll use to train the model to classify data into your labeled groups. Unsupervised learning, in contrast, does not require labeling data explicitly.
What is the difference between heuristic for rule learning and heuristics for decision trees?
The difference is that the heuristics for decision trees evaluate the average quality of a number of disjointed sets while rule learners only evaluate the quality of the set of instances that is covered with the candidate rule.
List down various approaches for machine learning?
The different approaches in Machine Learning are a) Concept Vs Classification Learning b) Symbolic Vs Statistical Learning c) Inductive Vs Analytical Learning
What are the different methods for Sequential Supervised Learning?
The different methods to solve Sequential Supervised Learning problems are a) Sliding-window methods b) Recurrent sliding windows c) Hidden Markov models d) Maximum entropy Markov models e) Conditional random fields f) Graph transformer networks
Central Limit Theorem
As the sample size grows, the distribution of the sample average tends to be normal, even when the distribution from which it is taken is non-normal!
What is bias-variance decomposition of classification error in ensemble method?
The expected error of a learning algorithm can be decomposed into bias and variance. A bias term measures how closely the average classifier produced by the learning algorithm matches the target function. The variance term measures how much the learning algorithm's prediction fluctuates for different training sets.
Prices are only stable when 3 conditions are met.....
The growth rate for all competitors is approx. the same The prices are paralleling costs The prices of all competitors are roughly of equal value
what are the different types of data modeling?
The hierarchical model- assume parent child relationship between data. The network model- enhanced version of the first model where the child can have more than one parent. Object-oriented data modeling- organisation of entities as objects. Relational data modeling
What are the components of relational evaluation techniques?
The important components of relational evaluation techniques are a) Data Acquisition b) Ground Truth Acquisition c) Cross Validation Technique d) Query Type e) Scoring Metric f) Significance Test
What is inductive machine learning?
The inductive machine learning involves the process of learning by examples, where a system, from a set of observed instances tries to induce a general rule.
What is P-value and what does it signify about the statistical data?
The p-value is used to determine the significance of a result when we are conducting a Hypothesis Test in Statistics. Generally, when the p-value <= 0.05, it indicates strong evidence AGAINST the null hypothesis (or status quo) which means the null hypothesis can be rejected. (e.g. Pizza Hut claims 30 minute guarantee)
If Sales and Marketshare are increasing but profits are declining.....
Then you need to investigate whether prices are dropping and/or costs are climbing. However, if costs aren't the issue, then investigate the product mix and check to see if the margins have changed.
Credit event binary options (CEBO)
These are options that provide a fixed payoff if a particular company suffers a credit event such as bankruptcy, failure to pay interest or principal on debt, or a restructuring of debt.
Unrelated diversification v Related Diversification
Unrelated diversification No exchanges or linkages among divisions Easiest and cheapest strategy to manage Allows corporate managers to evaluate division performance accurately Divisions have considerable autonomy unless Related Diversification Gains from pursuing multibusiness model are derived from the transfer, sharing, and leveraging of R&D knowledge, industry information, customer bases across divisions Company needs to develop corporate culture that stresses cooperation among divisions and the corporate team rather than focusing purely on divisional goals Rewarding divisions more difficult because divisions share activities
SO: Strengths and opportunities
Use a firm's internal strengths to take advantage of external opportunities
What's a null hypothesis?
The status quo assumption. We test whether we can reject it as being highly improbable given the data.
Entering a New Market 2. Determine Current and Future Market States
What is the size of the current market? What is the growth trend? Where is the industry in its life cycle? (Stage of development: Emerging, Mature, Declining?) Who are the customers and how are they segmented? What role does technology play in the industry and how quickly will it change? How will the competition respond?
Entering a New Market 3. Investigate the market to determine whether it makes good business sense(Porters 5 forces)
Who are our competitors and what size market share do they have? How do their products differ from ours? How will we price our products or services? Are substitutions available? Are there any barriers to entry? (Ex: Capital Reqs, Access to Raw Materials, Access to Distribution Channels, Gov Policy) Are there any barriers to exit? How would we exit if the market sours? What are the risks? (Market regulations or Technology)
New Product Customers Elements
Who? How to reach them? Retention- how to hold them?
Entering a New Market 1. Questions about the Company!
Why does the company want to enter this market? What are the different Revenue Streams and Trends? What is their Product Mix? What are their costs and how have they changed over time? What makes up their customer segmentation? What constitutes Success?(How much market share and what time frame?)
Competitive Response Approach
Why? Strategy
Spark and HDFS
With HDFS, the Spark driver contacts the NameNode about the DataNodes (ideally local) containing the various blocks of a file or directory as well as their locations (represented as InputSplits), and then schedules the work to the Spark workers. Spark's compute nodes / workers should be running on storage nodes.
How do you handle missing or corrupted data in a dataset?
You could find missing/corrupted data in a dataset and either drop those rows or columns, or decide to replace them with another value. In Pandas, there are two very useful methods: isnull() and dropna() that will help you find columns of data with missing or corrupted data and drop those values. If you want to fill the invalid values with a placeholder value (for example, 0), you could use the fillna() method.
Do you have experience with Spark or big data tools for machine learning?
You'll want to get familiar with the meaning of big data for different companies and the different tools they'll want. Spark is the big data tool most in demand now, able to handle immense datasets with speed. Be honest if you don't have experience with the tools demanded, but also take a look at job descriptions and see what tools pop up: you'll want to invest in familiarizing yourself with them.
For K₁>K₂>K₃
[P(S,K₁,T)-P(S,K₂,T)]/[K₁-K₂] ≥ [P(S,K₂,T)-P(S,K₃,T)]/[K₂-K₃]
X follows a normal distribution with mean m and standard deviation s. Which of the following must be true about X?
a. Median of X is m b. Expected value of X is m c. Nearly 95% of values of X lie between m-2s and m+2s d. 50% of value lie above m e. Interquartile range of X is less than 2s
Auditor's Responsibility Paragraph Changes
add in "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the adverse audit opinion"
Auditor's Responsibility Paragraph Changes
add in "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the qualified audit opinion"
dividend history
amount of time a company has paid a dividend dividend aristocrats is 25+ yrs of growth dividend kings is 50+ yrs of growth
An algorithm gets input data of a thousand pictures and groups the pictures into humans, animals, scenery
Unsupervised Learning
Beneish- M score
Are companies manipulating earnings? The version given here includes 5 variables: M = -6.065 + 0.823(DSRI) + 0.906(GMI) + 0.593(AQI) + 0.717(SGI) + 0.107(DEPI), where DSRI = days sales in receivables index, GMI = gross margin index, AQI = asset quality index, SGI = sales growth index, DEPI = depreciation index
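As a quick arithmetic check, the 5-variable formula on this card can be written as a one-line function. This is a minimal sketch that uses only the coefficients stated above; the function name is an assumption, and no cutoff for flagging manipulation is implied because the card does not give one.

```python
def beneish_m_score_5(dsri, gmi, aqi, sgi, depi):
    """5-variable M-score using the coefficients quoted on the card."""
    return (-6.065
            + 0.823 * dsri   # days sales in receivables index
            + 0.906 * gmi    # gross margin index
            + 0.593 * aqi    # asset quality index
            + 0.717 * sgi    # sales growth index
            + 0.107 * depi)  # depreciation index


# A company whose indices are all 1.0 (no year-over-year change).
print(round(beneish_m_score_5(1.0, 1.0, 1.0, 1.0, 1.0), 3))
```

With every index at 1.0 the score is just the intercept plus the sum of the coefficients, a convenient baseline for comparing real companies.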
Differentiate between univariate, bivariate and multivariate analysis.
These are descriptive statistical analysis techniques, differentiated by the number of variables involved. Univariate: one variable, e.g. pie charts of sales based on territory. Bivariate: the difference between 2 variables, e.g. a scatter plot analyzing the volume of sales and spending. Multivariate: the study of more than two variables.
You have 20 bottles of pills. 19 bottles have 1.0 gram pills, but one has pills of weight 1.1 grams. Given a scale that provides an exact measurement, how would you find the heavy bottle? You can only use the scale once.
Because we can only use the scale once, we know something interesting: we must weigh multiple pills at the same time. In fact, we know we must weigh pills from at least 19 bottles at the same time. Otherwise, if we skipped two or more bottles entirely, how could we distinguish between those missed bottles? Remember that we only have one chance to use the scale. So how can we weigh pills from more than one bottle and discover which bottle has the heavy pills? Let's suppose there were just two bottles, one of which had heavier pills. If we took one pill from each bottle, we would get a weight of 2.1 grams, but we wouldn't know which bottle contributed the extra 0.1 grams. We know we must treat the bottles differently somehow. If we took one pill from Bottle #1 and two pills from Bottle #2, what would the scale show? It depends. If Bottle #1 were the heavy bottle, we would get 3.1 grams. If Bottle #2 were the heavy bottle, we would get 3.2 grams. And that is the trick to this problem. We know the "expected" weight of a bunch of pills. The difference between the expected weight and the actual weight will indicate which bottle contributed the heavier pills, provided we select a different number of pills from each bottle. We can generalize this to the full solution: take one pill from Bottle #1, two pills from Bottle #2, three pills from Bottle #3, and so on. Weigh this mix of pills. If all pills were one gram each, the scale would read 210 grams (1 + 2 + ... + 20 = 20 * 21 / 2 = 210). Any "overage" must come from the extra 0.1 gram pills. This formula will tell you the bottle number: (weight - 210 grams) / 0.1 grams. So, if the set of pills weighed 211.3 grams, then Bottle #13 would have the heavy pills.
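The weighing trick can be checked with a few lines of Python. In this sketch, weigh simulates the single scale reading and find_heavy_bottle applies the formula from the answer; both function names and the default parameters are assumptions for illustration.

```python
def weigh(heavy_bottle, n_bottles=20, normal=1.0, extra=0.1):
    """Scale reading when k pills are taken from bottle k (k = 1..n_bottles)."""
    return sum(
        k * (normal + (extra if k == heavy_bottle else 0.0))
        for k in range(1, n_bottles + 1)
    )


def find_heavy_bottle(measured, n_bottles=20, normal=1.0, extra=0.1):
    """Recover the heavy bottle's number from the one measurement."""
    expected = normal * n_bottles * (n_bottles + 1) / 2  # 210 g for 20 bottles
    # round() absorbs floating-point noise in the overage.
    return round((measured - expected) / extra)


print(find_heavy_bottle(211.3))  # the 211.3 g example from the answer
```

Taking a different number of pills from each bottle is what makes the overage identify the bottle uniquely.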
why do most ads fail?
consumers don't know which company it's for (example- nivea electric shaver)
Inventory Turnover = ? / ?
cost of goods sold / average inventory
Formula: ∆call
e^(-∂T)*N(d1)
in other words
exceptions to the norm. 3. Intelligent control systems
Abnormal non-recurring revenues, expenses, gains, and losses (Possible but not probable)
1. Income from discontinued operations 2. Extraordinary items 3. Cumulative effect of a change in accounting principle
Appropriateness of Financial Statement Presentation or Disclosures
financial statements don't include all required disclosures disclosures aren't presented in accordance with the applicable financial reporting framework financial statements don't provide the disclosures needed to achieve fair presentation information required hasn't been included or disclosed in the financial statements
Cash Return on invested capital (CROIC)
free cash flow divided by invested capital (assets minus cash and non-interest-bearing current liabilities)
Non-sampling errors cannot be fixed by
having a larger sample size
active share
how different a fund's or portfolio's holdings are from the benchmark; the greater the difference, the higher the active share
Entering a New Market 4. If we decide to enter the market, how do we do it?
Start from scratch and grow organically. Acquire an existing player from within our industry. Form a joint venture / strategic alliance with another player with a similar interest. What can both sides bring to the venture? Do a cost-benefit analysis of each one.
During analysis, how do you treat missing values?
First identify the variables with missing values, then the extent of the missing values. If any patterns are identified, the analyst has to concentrate on them, as they could lead to interesting and meaningful business insights. If no patterns are identified, the missing values can be substituted with mean or median values (imputation) or simply ignored. Several factors should be considered when answering this question: understand the problem statement, understand the data, and then give the answer. A missing value can be assigned a default value, which can be the mean, minimum or maximum value; for a categorical variable, a default value is assigned; for normally distributed data, use the mean. Whether missing values should be treated at all is another important point to consider: if 80% of the values for a variable are missing, you can answer that you would drop the variable instead of treating the missing values.
Please tell me how execution starts and ends on an RDD or Spark job
Ans: The execution plan starts with the earliest RDDs (those with no dependencies on other RDDs, or that reference cached data) and ends with the RDD that produces the result of the action that has been called to execute.
inventory turnover ratio
how many times in one year a company's inventory is replaced. sales divided by average inventory in the period
brand equity question
how much more are consumers willing to pay for a branded product versus a non-branded product (good branding enables a price premium over non-branded products)
systems
information, budgeting, planning, innovation, compensation, performance measurement
In experimental design
is it necessary to do randomization? If yes, why?,
p-value
is the probability of obtaining the observed sample results, or more extreme results, when the null hypothesis is actually true.
A key limitation of covariance as a descriptive measure is that it
is very sensitive to the units of the variables
momentum
it is a measure of the past performance of a stock; positive momentum has been shown to produce market-beating returns over the next 1 to 12 months
The difference between the rate earned on stockholders' equity and the rate earned on total assets is called ______.
leverage
structure
lines of authority, chains of command, communication channels
Application of Accounting Policies
management hasn't applied accounting policies in accordance with the applicable financial reporting framework management hasn't applied accounting policies consistently error in the application of an accounting policy
How would you hint the minimum number of partitions for a transformation?
Ans: You can request the minimum number of partitions using the second input parameter to many transformations: scala> sc.parallelize(1 to 100, 2).count The preferred way to set the number of partitions for an RDD is to pass it directly as the second input parameter in the call, like rdd = sc.textFile("hdfs://... /file.txt", 400), where 400 is the number of partitions. In this case, the partitioning makes for 400 splits that would be done by Hadoop's TextInputFormat, not Spark, and it would work much faster. It also means that the code spawns 400 concurrent tasks to try to load file.txt directly into 400 partitions.
What are feature vectors?
An n-dimensional vector of numerical features that represent some object: term occurrence frequencies, pixels of an image, etc. Feature space: the vector space associated with these vectors
Product segments
nature, commodity vs differentiable good, complementary goods, substitute goods, life cycle
In most cases, net income and income from continuing operations
Are one and the same
return on assets (ROA)
net income divided by total assets
ros
net income/sales
nopat
net operating profit after tax
ROIC
net profit / invested capital
test drive conversion rate
number of purchases / number of test drives aka- conversion rates to sales
operating cash flow & cash flow from operations
on cash flow statement - shows a company's cash flows from normal operations. it does not include depreciation and amortization, as well as other non-cash charges
free cash flow
operating cash flow minus capital expenditures. cash based measure that does not suffer from the issues of accrual based accounting
Profitability analysis focuses primarily on the relationship between _____ and ______.
operating results and the resources available to a business
Expenses
opposite side of revenue- all costs of business (tax, interest, payroll, r & d)
A ratio that measures the "instant" debt-paying ability of a company is called the _____ ratio, or acid-test ratio.
quick
institutional ownership
percentage of ownership of a stock by large investors such as hedge funds, ETFs, mutual funds, private equity funds, and pension funds
Are you familiar with price optimization, price elasticity, inventory management, competitive intelligence? Give examples.
Price optimization is the use of mathematical tools to determine how customers will respond to different prices for a company's products and services through different channels. Big Data and data mining enable the use of personalization for price optimization. Now companies like Amazon can even take optimization further and show different prices to different visitors, based on their history, although there is a strong debate about whether this is fair. Price elasticity in common usage typically refers to price elasticity of demand, a measure of price sensitivity. It is computed as: Price Elasticity of Demand = % Change in Quantity Demanded / % Change in Price. Similarly, price elasticity of supply is an economics measure that shows how the quantity supplied of a good or service responds to a change in its price. Inventory management is the overseeing and controlling of the ordering, storage and use of components that a company will use in the production of the items it will sell, as well as the overseeing and controlling of quantities of finished products for sale. Wikipedia defines competitive intelligence as the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers making strategic decisions for an organization. Tools like Google Trends, Alexa, and Compete can be used to determine general trends and analyze your competitors on the web.
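The elasticity formula in this answer is easy to make concrete. The sketch below computes price elasticity of demand from simple percentage changes relative to the starting values; the function name and the sample numbers are assumptions for illustration.

```python
def price_elasticity_of_demand(q_old, q_new, p_old, p_new):
    """% change in quantity demanded divided by % change in price."""
    pct_change_quantity = (q_new - q_old) / q_old
    pct_change_price = (p_new - p_old) / p_old
    return pct_change_quantity / pct_change_price


# Hypothetical numbers: price rises 10% (10 -> 11), demand falls 20% (100 -> 80).
elasticity = price_elasticity_of_demand(q_old=100, q_new=80, p_old=10, p_new=11)
print(round(elasticity, 2))
```

A magnitude above 1, as in this made-up case, means demand is elastic: quantity responds more than proportionally to the price change.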
4 Ps/marketing mix
price, product, promotion, placement
Another profitability measure quoted by the financial press is the _______ ratio on common stock. This ratio measures a company's future earnings prospects.
price‐earnings (P/E)
Sample Report: Qualified Opinion Due to a Material Misstatement of the Financial Statements Issuer (Public Company). Report of Independent Registered Public Accounting Firm. If these lease obligations were capitalized, property would be increased by $______ and $_____, long-term debt
by $_____ and $_____, and retained earnings by $_____ and $______ as of December 31, 2002 and 2001, respectively. Additionally, net income would be increased (decreased) by $____ and $____ and earnings per share would be increased (decreased) by $_____ and $_____, respectively, for years then ended.
10% condition:
sample size, n, must be no more than 10% of the population
Sampling error example
sampling less than the population size
what are the types of data gathering?
sampling-the act of extracting only certain data values from a dataset
Beta
sensitivity to moves in the overall market. a beta above 1 means the stock tends to move more than the market; a beta below 1 means it tends to move less. the higher the beta, the more risky the stock is presumed to be
The fixed-rate payer in an interest-rate swap has a position equivalent to a series of: A) long interest-rate puts and short interest-rate calls. B) short interest-rate puts and long interest-rate calls. C) long interest-rate puts and calls.
short interest-rate puts and long interest-rate calls. The fixed-rate payer has profits when short rates rise and losses when short rates fall, equivalent to writing puts and buying calls.
piotroski f- score
simple 9-point scoring system to separate businesses based on success. points are assigned based on 9 criteria, typically different ratio metrics (profitability, leverage, liquidity, sources of funds and operating efficiency)
what is social media?
social media as online channels for communication. eg Snapchat, Instagram etc.
Qualified Opinion Vs. Adverse Opinion = GAAP Problem.
the auditor uses professional judgement to determine whether to issue a qualified opinion or an adverse opinion when audit evidence indicates that there is material misstatement of the financial statements.
how does npv enable the marketer to compare marketing campaigns or initiatives?
the cost of the campaign is subtracted from the present value for each campaign and you compare these npvs
Alternative Hypothesis
the hypothesis that includes all values not covered by the null. the alternative hypothesis is deemed to be true if the null hypothesis is rejected (HA or H1)
Financial MGMT
the management of all processes associated with the efficient acquisition and deployment of financial resources
Decreasing returns to larger sampling
the margin of error decrease is greatest when going from 100 samples to 200 samples
Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Adverse Opinion Paragraph. When the auditor expresses an adverse opinion, the opinion paragraph should state that, in the auditor's opinion,
BECAUSE OF the significance of the matter(s) described in the basis for adverse opinion paragraph, the financial statements DO NOT PRESENT FAIRLY in accordance with the applicable financial reporting framework.
Adverse Opinion Due to Material Misstatement of Financial Statements: Issuer. Adverse Opinion Paragraph: When the auditor expresses an adverse opinion, the opinion paragraph should state that, in the auditor's opinion,
BECAUSE OF the effects of matters discussed in preceding paragraph(s), the financial statements DO NOT PRESENT FAIRLY, in conformity with accounting principles generally accepted in the United States of America, the financial statements.
For European options, the probability for each ending node is
(nCx)(p*^x)[(1-p*)^(n-x)] for n = number of periods and x = number of up moves
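The ending-node formula is the binomial probability mass function evaluated at the risk-neutral up probability p*. A minimal sketch, assuming n counts the periods in the tree and x the up moves; the function name and the value 0.55 are illustrative only.

```python
from math import comb


def node_probability(n, x, p_star):
    """Risk-neutral probability of ending at the node with x up moves in n periods."""
    return comb(n, x) * p_star ** x * (1 - p_star) ** (n - x)


# Probabilities of all ending nodes of a 3-period tree with p* = 0.55 (made-up value).
probs = [node_probability(3, x, 0.55) for x in range(4)]
print([round(p, 4) for p in probs], round(sum(probs), 6))
```

The ending-node probabilities sum to 1, which is a handy sanity check before discounting the expected payoff.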
Financial Accounting
the process of designing and operating an information system for collecting, measuring and recording an enterprise's transactions and summarising and communicating the results of these transactions to users to facilitate making financial decisions.
Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Basis for Adverse Opinion: Under accounting principles generally accepted in the United States of America, the subsidiary should have been consolidated because it is controlled by the company.
Had XYZ Company been consolidated, many elements in the accompanying consolidated financial statements would have been materially affected. The effects on the consolidated financial statements of the failure to consolidate have not been determined.
How to do cross-validation right?
The training and validation data sets have to be drawn from the same population. Example, predicting stock prices: if the model is trained on a certain 5-year period, it's unrealistic to treat the subsequent 5-year period as a draw from the same population. Common mistake: every model-selection step, for instance choosing the kernel parameters of an SVM, should be cross-validated as well. Bias-variance trade-off for k-fold cross-validation: leave-one-out cross-validation (LOOCV) gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n−1 observations). But we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, hence the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation of these quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation. Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
In the new post-apocalyptic world, the world queen is desperately concerned about the birth rate. Therefore, she decrees that all families should ensure that they have one girl or else they face massive fines. If all families abide by this policy (that is, they continue to have children until they have one girl, at which point they immediately stop), what will the gender ratio of the new generation be? (Assume that the odds of someone having a boy or a girl on any given pregnancy are equal.) Solve this out logically and then write a computer simulation of it.
If each family abides by this policy, then each family will have a sequence of zero or more boys followed by a single girl. That is, if "G" indicates a girl and "B" indicates a boy, the sequence of children will look like one of: G
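The question also asks for a computer simulation. Since every birth is an independent 50/50 event, the stopping rule cannot change the overall ratio, and a simulation should land close to half girls and half boys. The sketch below is one possible version; the function name and the fixed seed are assumptions for reproducibility.

```python
import random


def simulate_families(n_families, rng):
    """Each family has children until its first girl; return (girls, boys)."""
    girls = boys = 0
    for _ in range(n_families):
        while True:
            if rng.random() < 0.5:  # girl: the family stops here
                girls += 1
                break
            boys += 1               # boy: the family tries again
    return girls, boys


rng = random.Random(0)  # fixed seed so repeated runs give the same counts
girls, boys = simulate_families(100_000, rng)
print(f"girls: {girls}, boys: {boys}, share of girls: {girls / (girls + boys):.3f}")
```

Every family ends with exactly one girl, so the number of girls equals the number of families, while the number of boys per family averages out to one as well.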
debt to equity
total debt/total equity
Which all are the ways to configure Spark Properties and order them least important to the most important.
Ans: There are the following ways to set up properties for Spark and user programs (in the order of importance from the least important to the most important): 1. conf/spark-defaults.conf (the default) 2. --conf (the command line option used by spark-shell and spark-submit) 3. SparkConf
wacc
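weighted average cost of capital: the after-tax cost of debt (cost of debt less the marginal tax rate effect) and the cost of equity, each weighted by its portion of total capital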
Gamble: Variance
weighted average squared deviation from expected value
A test has a true positive rate of 100% and false positive rate of 5%. There is a population with a 1/1000 rate of having the condition the test identifies. Considering a positive test, what is the probability of having that condition?
Suppose you are being tested for a disease. If you have the illness, the test will always say you have it. If you don't have the illness, 5% of the time the test will wrongly say you have it, and 95% of the time it will correctly say you don't. Out of 1000 people, the 1 person who has the disease will get a true positive result. Out of the remaining 999 people, 5% (close to 50 people) will get a false positive result. This means that out of 1000 people, about 51 will test positive even though only one has the illness. So there is only about a 2% probability (1/51) that you have the disease even if your report says that you do.
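The same result follows directly from Bayes' theorem: P(condition | positive) = P(positive | condition) * P(condition) / P(positive). A minimal sketch with the numbers from the question; the function name is an assumption.

```python
def posterior_given_positive(prevalence, true_positive_rate, false_positive_rate):
    """P(condition | positive test) by Bayes' theorem."""
    # Total probability of testing positive: true positives plus false positives.
    p_positive = (true_positive_rate * prevalence
                  + false_positive_rate * (1 - prevalence))
    return true_positive_rate * prevalence / p_positive


p = posterior_given_positive(prevalence=1 / 1000,
                             true_positive_rate=1.0,
                             false_positive_rate=0.05)
print(f"{p:.2%}")
```

The low prevalence dominates: even a perfectly sensitive test yields mostly false positives when only 1 person in 1000 has the condition.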
For a standard normal distribution (µ=0, σ=1), the area under the curve less than 1.5 is 93.32%. What is the approximate percentage of the area under the curve less than -1.5?
6.68%. 1-93.32%=6.68% is the area under the curve greater than 1.5. Since the normal distribution is symmetric, 6.68% is also the area under the curve less than -1.5.
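The symmetry argument can be confirmed numerically with Python's standard library, which provides the normal CDF through statistics.NormalDist. A small sketch:

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal distribution

area_below_pos = z.cdf(1.5)    # area to the left of 1.5
area_below_neg = z.cdf(-1.5)   # area to the left of -1.5

print(f"P(Z < 1.5)  = {area_below_pos:.4f}")
print(f"P(Z < -1.5) = {area_below_neg:.4f}")
print(f"1 - P(Z < 1.5) = {1 - area_below_pos:.4f}")
```

The last two printed values agree, which is exactly the symmetry used in the answer.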
Sharpe Ratio Relationship: φcall vs φput
φcall=-φput
What are some feature engineering techniques?
1. TF x IDF 2. ChiSquare 3. Kernel Trick 4. Hashing 5. Binning
Porter's 5 Forces 4. Bargaining Power of the Buyers
Buyers compete with the industry by forcing down prices and bargaining for higher quality or better services. They play competitors against each other, all at the expense of industry profitability.
Which scheduler is used by SparkContext by default?
By default, SparkContext uses DAGScheduler, but you can develop your own custom DAGScheduler implementation.
How can you avoid overfitting ?
Overfitting can be avoided by using a lot of data; it tends to happen when you have a small dataset and try to learn from it. If you have a small dataset and are forced to build a model from it, you can use a technique known as cross-validation. In this method the dataset is split into two sections, a training dataset and a testing dataset: the data points in the training dataset are used to fit the model, and the testing dataset is used only to test it. In other words, the model is given a dataset of known data on which training is run (the training data set) and a dataset of unknown data against which the model is tested. The idea of cross-validation is to define a dataset to "test" the model in the training phase.
Use business situations framework when you hear
Enter a new market / start a new Introduce a new product Respond to competitors behavior Respond to changes in demand How to grow
Which technique is used to predict categorical responses?
Classification technique
The auditor's inability to determine the amounts associated with illegal acts
Committed by the client's management could result in a disclaimer.
What is the advantage of companion objects in Scala?
Companion objects are beneficial for encapsulating things and they act as a bridge for writing functional and object oriented programming code. Using companion objects, the Scala programming code can be kept more concise as the static keyword need not be added to each and every attribute. Companion objects provide a clear separation between static and non-static methods in a class because everything that is located inside a companion object is not a part of the class's runtime objects but is available from a static context and vice versa.
Pricing Strategies Approach Pricing Elements
Company Objective Competitive Pricing Cost-based Pricing Price-based Costing
Pricing Strategies 2.Investigate the Product
How does it compare to the competition? Are there substitutes or alternatives? Where is the product in its growth cycle? Is there a supply-and-demand issue at work?
New Product Financing Elements
How funded? Best allocation of funds? Debt Viable?
What is the variance?
How spread out a set of numbers is. With a small variance, the values lie close to the mean.
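To make the idea concrete, here is a small sketch that computes the population variance by hand (the two sample lists are made up for illustration):

```python
def population_variance(xs):
    """Average squared distance of each value from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

tight = [9, 10, 11]   # values close to the mean of 10
wide = [0, 10, 20]    # same mean, much more spread

print(population_variance(tight), population_variance(wide))
```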
Things that should be in the back of your mind in every case:
How the internet and technology, the economy, and competition (internal and external, including substitutes) affect the company
Market development
Introducing present products or services into a new geographic area
What is power analysis?
Power analysis is an important part of experimental design because it lets us find the minimum sample size needed to detect an effect of a given size with a given level of confidence.
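As a rough illustration, the sketch below estimates the per-group sample size for a two-sided comparison of two means using the normal approximation n = 2((z_{1-α/2} + z_{power}) / d)², where d is the standardized effect size. The function name and the default α = 0.05 and power = 0.80 are assumptions chosen for the example, not values from the source.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_power = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n_medium = sample_size_per_group(effect_size=0.5)
n_small = sample_size_per_group(effect_size=0.2)
print(n_medium, n_small)  # smaller effects need larger samples
```

An exact calculation based on the t distribution gives slightly larger numbers; this approximation is only meant to show how effect size, α, and power trade off.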
R-Squared mean value?
R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. ... 100% indicates that the model explains all the variability of the response data around its mean.
The Value Chain Operations
Raw Materials become product in this phase through the use of capital equipment and labor
Substantial doubt with regard to the entity's ability to continue as a going concern
Should be disclosed in an emphasis-of-matter paragraph appended to an otherwise unmodified opinion.
Differentiation strategies
Should be pursued only after a study of buyers' needs and preferences to determine the feasibility of incorporating one or more differentiating features into a unique product
What is one issue with storing data in spreadsheets?
Since protection of the data is limited, users can easily introduce errors in formulas (processing) if they are unfamiliar with how the spreadsheet works.
The ability of a business to pay debts is called ______.
Solvency
Structured Data
Structured data are computer-readable and usable.
What is principal component analysis? Explain the sort of problems you would use PCA for. Also explain its limitations as a method
A statistical method that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables called principal components. It reduces the data from n to k dimensions by finding the k vectors onto which to project the data so as to minimize the projection error. Algorithm: 1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables 2) Compute the covariance matrix Σ 3) Compute the eigenvectors of Σ 4) Choose k principal components so as to retain x% of the variance (typically x=99) Applications: 1) Compression: reduce the disk/memory needed to store data and speed up the learning algorithm. Warning: the mapping should be defined only on the training set and then applied to the test set 2) Visualization: use 2 or 3 principal components to summarize the data Limitations: - PCA is not scale invariant - The directions with the largest variance are assumed to be of most interest - It only considers orthogonal transformations (rotations) of the original variables - PCA is based only on the mean vector and covariance matrix. Some distributions (multivariate normal) are characterized by these but some are not - If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances
Explain what a false positive and a false negative are. Why is it important to distinguish these from each other? Provide examples when false positives are more important than false negatives, when false negatives are more important than false positives, and when these two types of errors are equally important
False positive: improperly reporting the presence of a condition when it is not actually present. Example: an HIV-positive test result when the patient is actually HIV negative. False negative: improperly reporting the absence of a condition when it is actually present. Example: not detecting a disease when the patient has it. When false positives are more important than false negatives: - In a non-contagious disease, where treatment delay doesn't have any long-term consequences but the treatment itself is grueling - HIV test: psychological impact When false negatives are more important than false positives: - If early treatment is important for good outcomes - In quality control: a defective item slips through the cracks - Software testing: a test to catch a virus has failed
economies of scale
The increase in efficiency of production as the number of goods being produced increases. Typically, a company that achieves economies of scale lowers the average cost per unit through increased production since fixed costs are shared over an increased number of goods.
module 4
We use regression analysis for two primary purposes: studying the magnitude and structure of the relationship between two variables, and forecasting a variable based on its relationship with another variable. The structure of the single variable linear regression line is ŷ = a + bx. ŷ is the expected value of y, the dependent variable, for a given value of x. x is the independent variable, the variable we are using to help us predict or better understand the dependent variable. a is the y-intercept, the point at which the regression line intersects the vertical axis. This is the value of ŷ when the independent variable, x, is set equal to 0. b is the slope, the average change in the dependent variable y as the independent variable x increases by one. The true relationship between two variables is described by the equation y = α + βx + ε, where ε is the error term (ε = y − ŷ). The idealized equation that describes the true regression line is ŷ = α + βx. We determine a point forecast by entering the desired value of x into the regression equation. We must be extremely cautious about using regression to forecast for values outside of the historically observed range of the independent variable (x-values). Instead of predicting a single point, we can construct a prediction interval, an interval around the point forecast that is likely to contain, for example, the actual selling price of a house of a given size. The width of a prediction interval varies based on the standard deviation of the regression (the standard error of the regression), the desired level of confidence, and the location of the x-value of interest in relation to the historical values of the independent variable. It is important to evaluate several metrics in order to determine whether a single variable linear regression model is a good fit for a data set, rather than looking at single metrics in isolation.
R² measures the percent of total variation in the dependent variable, y, that is explained by the regression line. R² = (Variation explained by the regression line) / (Total variation) = (Regression Sum of Squares) / (Total Sum of Squares), with 0 ≤ R² ≤ 1. For a single variable linear regression, R² is equal to the square of the correlation coefficient. In addition to analyzing R², we must test whether the relationship between the dependent and independent variable is significant and whether the linear model is a good fit for the data. We do this by analyzing the p-value (or confidence interval) associated with the independent variable and the regression's residual plot. The p-value of an independent variable is the result of the hypothesis test that tests whether there is a significant linear relationship
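The quantities named in this card (intercept a, slope b, R²) can be computed directly with ordinary least squares. A minimal sketch on a made-up five-point data set; the function names and data are illustrative, not from the module:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y-hat = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope: average change in y per unit of x
    a = mean_y - b * mean_x    # intercept: value of y-hat at x = 0
    return a, b

def r_squared(xs, ys, a, b):
    """Regression sum of squares divided by total sum of squares."""
    mean_y = sum(ys) / len(ys)
    regression_ss = sum((a + b * x - mean_y) ** 2 for x in xs)
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    return regression_ss / total_ss

xs = [1, 2, 3, 4, 5]   # hypothetical independent variable
ys = [2, 4, 5, 4, 5]   # hypothetical dependent variable
a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
point_forecast = a + b * 3.5   # stays inside the observed x range
print(round(a, 3), round(b, 3), round(r2, 3))
```

The forecast is deliberately made at x = 3.5, inside the observed range, in line with the card's warning about extrapolation.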
1/2%
#/100 and #/2
50%
#/2
1010
10
Determine the Nature and Scope of Engagement
Auditor may be hired to perform an audit for a single period or multiple periods. Audit may be on the complete FS, a single FS, or a specific element, account, or item of the FS. Many audit firms are hired to perform tax services in addition to audit services. *Nonissuers* Private companies have a choice of: -Financial statement audit: fairness of FS -Integrated audit: 1 opinion on fairness of FS, 1 opinion on operating effectiveness of IC over financial reporting *Issuers* Public companies must have an integrated audit performed
Which of the following is the best approximation of the gamma of an option if its delta is equal to 0.6 when the price of the underlying security is 100 and 0.7 when the price of the underlying security is 110? A) 1.00. B) 0.01. C) 0.10.
B) 0.01. The gamma of an option is computed as follows: Gamma = change in delta/change in the price of the underlying = (0.7 - 0.6)/(110 - 100) = 0.01
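The same finite-difference estimate, written as a tiny sketch (the function name is illustrative):

```python
def approximate_gamma(delta_low, delta_high, price_low, price_high):
    """Change in delta divided by change in the underlying price."""
    return (delta_high - delta_low) / (price_high - price_low)

gamma = approximate_gamma(delta_low=0.6, delta_high=0.7,
                          price_low=100, price_high=110)
print(round(gamma, 4))
```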
Writing a series of interest-rate puts and buying a series of interest-rate calls, all at the same exercise rate, is equivalent to: A) a short position in a series of forward rate agreements. B) being the fixed-rate payer in an interest rate swap. C) being the floating-rate payer in an interest rate swap.
B) being the fixed-rate payer in an interest rate swap. A short position in interest-rate puts will have a negative payoff when rates are below the exercise rate; the calls will have positive payoffs when rates exceed the exercise rate. This mirrors the payoffs of the fixed-rate payer, who will receive positive net payments when settlement rates are above the fixed rate.
What's the difference between Type I and Type II error?
Don't think that this is a trick question! Many machine learning interview questions will be an attempt to lob basic questions at you just to make sure you're on top of your game and you've prepared all of your bases. Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I error means claiming something has happened when it hasn't, while Type II error means that you claim nothing is happening when in fact something is. A clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn't carrying a baby.
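A short sketch that counts both error types from labelled outcomes may help keep them apart. The labels below are made up, and `True` stands for "the event really happened" or "the model claimed it happened":

```python
def count_errors(actual, predicted):
    """Return (false_positives, false_negatives) for boolean labels."""
    false_positives = sum(1 for a, p in zip(actual, predicted) if not a and p)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return false_positives, false_negatives

actual = [True, True, False, False, False, True]
predicted = [True, False, True, False, False, True]

fp, fn = count_errors(actual, predicted)
print("Type I (false positives):", fp)
print("Type II (false negatives):", fn)
```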
Consistency
Intra-company comparisons
Mergers and Acquisitions Approach
Objectives Price Due Diligence Exit Strategies
Interquartile Range
Q3 - Q1 = IQR
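A quick sketch of the calculation with Python's standard library. Note that quartile conventions differ between tools; the "inclusive" method used here is one common choice, and the data are made up:

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```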
n-gram
token permutations associated with a keyword
Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Qualified Opinion Paragraph: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include:
(2). An explanation of how disclosures are misstated if there is a material misstatement related to narrative disclosures.
Appropriateness of Financial Statement Presentation or Disclosure (≠ GAAP) Material misstatements related to the appropriateness of financial statement presentation or the appropriateness or adequacy of disclosures may arise when: (2)
(3). The financial statements do not provide the disclosures needed to achieve fair presentation
f
01100110
Adverse Opinion (Issuer/public company/GAAP-very material problem). Adverse Opinion Due to Material Misstatement of Financial Statements: Issuer. Middle Paragraph(s):
A paragraph should be placed immediately before the opinion paragraph. This paragraph should include: (1). All of the substantive reasons that lead the auditor to conclude that there has been a departure from generally accepted accounting principles.
Qualified Opinion (issuer/public company/GAAP-material problem.) Qualified Opinion Due to Material Misstatement of Financial Statements:Issuer. Middle Paragraph(s):
A paragraph should be placed immediately before the opinion paragraph. This paragraph should include: (1). All of the substantive reasons that lead the auditor to conclude that there has been a departure from generally accepted accounting principles.
Alerts that Restrict Use of Auditor's Written Communication
Auditor may be required by GAAS, or may decide that it is necessary, to include language in the auditor's report that restricts the use of the auditor's written communication. In the report, such language is included in an other-matter paragraph. *Use* -Include an alert that restricts its use when the subject matter of the auditor's written communication is based on: measurement or disclosure criteria suitable for limited users who have adequate understanding, measurement or disclosure criteria available to only specific parties, matters identified during the audit engagement that are not the primary objective of the engagement *Content* -Statement that the auditor's written communication is intended solely for the information and use of specified parties -Identification of the parties for whom it is intended -Statement that the auditor's written communication is not intended to be and should not be used by anyone other than the specified parties
Reasonable Assurance and Inherent Limitations of Audit
Auditor obtains reasonable assurance about whether FS are free from material misstatement, whether due to error or fraud -Reasonable assurance is a high, but not absolute, level of assurance. To obtain it, the auditor must: 1. Plan work and properly supervise assistants 2. Determine appropriate materiality levels 3. Identify and assess risks of material misstatement 4. Obtain sufficient appropriate audit evidence Auditor is unable to obtain absolute assurance because of inherent limitations: -Nature of Financial Reporting: FS items include subjective decisions or judgment by management: estimates - AR: bad debt, inventory: obsolete, PPE: life and salvage, Intangible: Cash flow, Impairment, warranties, contingency, lawsuit -Nature of Audit Procedures: Management or others may not provide, intentionally or not, the complete information. Fraud may be concealed. Fraud = intentional, Error = unintentional -Timeliness of Financial Reporting and Balance Between Cost and Benefit: Form opinion within a reasonable period of time and achieve balance between benefit and cost. Impracticable to address all information. Necessary for auditor to: Plan audit so it is performed effectively, Direct efforts to areas that are expected to contain risks of material misstatement, and Use testing and other means of examining populations for misstatements
What can be done to avoid local optima?
Avoid local optima in a K-means context: repeat K-means and take the solution that has the lowest cost
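The restart strategy can be sketched as follows for 1-D data. This is an illustrative toy, not a production implementation: cost is taken to be the within-cluster sum of squared distances, and the function names, data, and restart count are assumptions.

```python
import random

def kmeans_1d(points, k, rng, n_iter=50):
    """One run of Lloyd's algorithm on 1-D data from a random initialization."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    cost = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, cost

def kmeans_with_restarts(points, k, n_restarts=10, seed=0):
    """Run K-means several times and keep the lowest-cost solution."""
    rng = random.Random(seed)
    runs = [kmeans_1d(points, k, rng) for _ in range(n_restarts)]
    best = min(runs, key=lambda run: run[1])
    return best, runs

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]
(best_centers, best_cost), runs = kmeans_with_restarts(points, k=3)
print(sorted(best_centers), best_cost)
```

Each restart may converge to a different local optimum depending on its initial centers; keeping the minimum-cost run is what guards against a poor one.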
What is one-hot encoding?
Maps a column of categories to a column of sparse binary vectors. Use it when you don't want to impose an order on categorical variables
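A minimal sketch of the mapping in plain Python (the helper name and the colour values are illustrative; real projects would typically use a library encoder):

```python
def one_hot_encode(values):
    """Map each category to a binary vector with a single 1."""
    categories = sorted(set(values))                 # fixed column order
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
for row in encoded:
    print(row)
```

No category is numerically larger than another in this representation, which is why it avoids implying an order.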
Can you cite some examples where a false positive is more important than a false negative? Define false positive and false negative.
Before we start, let us define the terms. False positives are cases where you wrongly classify a non-event as an event, a.k.a. Type I error. False negatives are cases where you wrongly classify an event as a non-event, a.k.a. Type II error. In the medical field, assume you have to give chemotherapy to patients. Your lab tests patients for certain vital information, and based on those results the decision to give the therapy is made. Assume a patient comes to that hospital and tests positive for cancer (but he doesn't have cancer) based on the lab prediction. What will happen to him? (Assuming sensitivity is 1.) One more example comes from marketing. Let's say an e-commerce company decides to give a $1000 gift voucher to customers whom it expects to purchase at least $5000 worth of items. It sends the vouchers directly to 100 customers without any minimum purchase condition, because it expects to make at least 20% profit on items sold above $5000. Now what if the vouchers went to false positive cases?
variable costs
COGS, raw materials, energy inputs, labor, service
Ex: Nestle wants to grow market share in China, which has poor water quality and growing demand for bottled water
Company Market Growth Strategies Alt Markets
Suppliers
Consolidation threat of integration pull through by customers
Schroder Method
Construct tree using pre-paid forward price (i.e., S-PV(Div)). The stock price at each node is pre-paid forward price + present value of unpaid dividends (only used to determine payoff at a node).
Advantages of Equity Investing
Control over company-Voting Rights. Participation in future profits.
Why is "Naive" Bayes naive?
Despite its practical applications, especially in text mining, Naive Bayes is considered "Naive" because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of components. This implies the absolute independence of features — a condition probably never met in real life. As a Quora commenter put it whimsically, a Naive Bayes classifier that figured out that you liked pickles and ice cream would probably naively recommend you a pickle ice cream. Bayes' Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier. That's something important to consider when you're faced with machine learning interview questions.
Audit Issues - Material and Pervasive
Disclaimer of Opinion
Black-Scholes Model Pricing on Futures
Discount at FUTURE expiry instead of option expiry.
Dividend Yield = ? / ?
Dividends per Share of Common Stock / Market Price per Share of Common Stock
When to use ensemble learning?
Ensemble learning is used when you build component classifiers that are more accurate and independent from each other.
Companies use LIFO for:
Tax advantages
What is normalization?
elimination of data anomalies
A business plan document is the culmination of...
the business planning process
c
01100011
Most other distributions
n>30
decision node
square
Materiality
*Audit of Single FS* Auditor should determine materiality for the single FS rather than the complete set of FS *Audit of Specific Element* Auditor should determine materiality separately for each element, rather than aggregate of all elements.
Qualified v Adverse Opinion
*GAAP Problem* -Qualified Opinion: Auditor concludes misstatements are material but not pervasive to FS -Adverse Opinion: Auditor concludes that misstatements are both material and pervasive to the FS Examples: -GAAP consistency change (unjustified): auditor disagrees -Inadequate disclosures -Departure from GAAP (unjustified) -Unreasonable accounting estimate *Nature of Material Misstatements ≠ GAAP* -Appropriateness of Accounting Policies ∙Acc policies are not in accordance w framework ∙FS do not represent underlying transaction/event in a manner to achieve fair presentation ∙Entity has not complied w framework requirements for accounting/disclosing changes in acc policies -Application of Accounting Policies ∙Mgt has not applied acc policies in accordance w framework ∙Mgt has not applied acc policies consistently between periods/similar transactions ∙Error in application of acc policy -Appropriateness of Financial Statement Presentation or Disclosure ∙FS do not include all required disclosures ∙Disclosures are not presented in accordance w framework ∙FS do not provide disclosures needed for fair presentation ∙Info that is required to be presented has not been included or disclosed
Auditor's Report Issuer Adverse Opinion
*Issuer = Public* *GAAP = Large Material Problem* *Report* Introductory Paragraph -Same (FS were audited, FS are responsibility of mgt and auditor is responsible for opinion) Scope Paragraph -Same (Audit in accordance w PCAOB, audit was planned and performed to obtain assurance that FS are free from error, Examined evidence on test basis, assessed acc principles and estimates made by mgt, audit provides reasonable basis for opinion) Middle Paragraph -Immediately before the opinion -List substantive reasons that lead auditor to conclude departure from GAAP -Disclosure of principal effects Adverse Opinion Paragraph -Because of the effects of matter, FS do not present fairly
Auditor's Report Issuer Qualified Opinion
*Issuer = Public* *GAAP = Material Problem* *Report* Introductory Paragraph -Same (FS were audited, FS are responsibility of mgt and auditor is responsible for opinion) Scope Paragraph -Same (Audit in accordance w PCAOB, audit was planned and performed to obtain assurance that FS are free from error, Examined evidence on test basis, assessed acc principles and estimates made by mgt, audit provides reasonable basis for opinion) Middle Paragraph -Immediately before the opinion -List substantive reasons that lead auditor to conclude departure from GAAP -Disclosure of principal effects Qualified Opinion Paragraph -Except for effects of matter... FS are presented fairly
Explanatory Paragraph
*Issuer=Public* -Included in report when required by PCAOB or at auditor's discretion. Does not affect the auditor's opinion. *Requirements* -Does not have a title -Describe matter being emphasized and location of relevant disclosures -Will generally follow the opinion paragraph when added to unqualified report -May be placed before or after opinion paragraph to emphasize a matter Place before opinion when: -FS are prepared in accordance w special purpose framework -Prior year audit opinion is updated
Modified Opinion on Complete Set of FS
*Modified Opinion Relevant to Audit of Specific Element* If a modified opinion on the complete set of FS is relevant to the audit of a specific element of the FS, auditor should either: -express an adverse opinion on element when modified opinion on complete set of FS is due to MM of FS (GAAP) -express a disclaimer of opinion on element when modified opinion on complete set of FS is due to scope limitation (GAAS) *Piecemeal Opinion* Auditor expresses an adverse opinion or disclaimer of opinion on complete set of FS and an unmodified opinion on a specific element in the same report. If the auditor considers it appropriate to express an unmodified opinion on specific elements, the auditor should do so ONLY if: 1. opinion on specific element is not published w and does not accompany the auditor's report on the complete set of FS (which have adverse/disclaimer) 2. specific element does not constitute a major portion of the entity's complete set of FS or element is not based on SHE or NI. Single FS is considered a major portion of complete set of FS, so an unmodified opinion should not be expressed on a single FS if auditor has expressed adverse or disclaimer on complete set *Emphasis-of-matter, Other-Matter, Explanatory Paragraph* If auditor's report on complete set includes emphasis of matter that is relevant to audit of single FS or specific element, auditor should include similar paragraph in auditor's report on single FS or specific element.
Other-Matter Paragraphs
*Nonissuer = Private* Included in report when required by GAAS or at auditor's discretion. Refer to matters other than those presented or disclosed in the FS that are relevant to user's understanding of audit, auditor's responsibilities, or auditor's report *Requirements* -Immediately after the opinion paragraph, after any emphasis-of-matter paragraph. After opinion & emphasis of matter. -"Other-Matter" -Describe matter being emphasized and location of disclosures in FS
Auditor's Report Nonissuer Adverse Opinion
*Nonissuer = Private* *Large Material Misstatement = GAAP* *Report* Introductory Paragraph -Same (Entity being audited, FS were audited, name of FS) Management's Responsibility Paragraph -Same (MR DIM): mgt is responsible for FS, responsibility includes design, implementation, and maintenance of internal control. Auditor's Responsibility Paragraph -Same (express opinion, accordance with auditing standards of US, plan, perform, obtain evidence, assess risk of MM, test IC) Modified to say basis is adverse audit opinion Basis for Adverse Opinion Paragraph -Immediately before opinion paragraph -Description and quantification of effects -Explanation of how disclosures are misstated -Description and inclusion of nature of omitted info and omitted info when practicable (reasonably obtainable from mgt accounts) Adverse Opinion Paragraph -Because of the significance of the matter... FS do not present fairly in accordance with framework
Auditor's Report Nonissuer Qualified Opinion
*Nonissuer = Private* *Material Misstatement = GAAP* *Report* Introductory Paragraph -Same (Entity being audited, FS were audited, name of FS) Management's Responsibility Paragraph -Same (MR DIM): mgt is responsible for FS, responsibility includes design, implementation, and maintenance of internal control. Auditor's Responsibility Paragraph -Same (express opinion, accordance with auditing standards of US, plan, perform, obtain evidence, assess risk of MM, test IC) Modified to say basis is qualified audit opinion Basis for Qualified Opinion Paragraph -Immediately before opinion paragraph -Description and quantification of effects -Explanation of how disclosures are misstated -Description and inclusion of nature of omitted info and omitted info when practicable (reasonably obtainable from mgt accounts) Qualified Opinion Paragraph -Except for the effects of matters described in basis... FS are presented fairly *Make sure omission does not make FS false, fraudulent, deceptive, or misleading, if so WITHDRAW*
Public Company Accounting Oversight Board Auditing Standards
*PCAOB AS* - Audits -Section: PCAOB AS -Standard Setting: Public Company Accounting Oversight Board -Provides generally accepted auditing standards for audits of *issuers*. Provide guidance for other services like review of interim financial information and letters to underwriters *Public* -Audits of annual FS: issuers, Special reports: issuers, Interim FS: issuers
Use of Other-Matter Paragraph
*Required* -Auditor includes alert in report that restricts the use of report -Subsequently discovered facts lead to change in audit opinion (option to put in emphasis-of-matter) -FS of prior period were audited by predecessor auditor and predecessor audit report is not reissued -Current FS are audited and presented in comparative form w FS from prior period or in comparative form w prior FS that were not audited, reviewed, or compiled -Prior to audit report date, auditor identifies a material inconsistency in other information -Auditor chooses to report on supplementary information -Refer to required supplementary information -Restrict the use of auditor's report when special purpose FS are prepared -Report on compliance is included in report on FS *May be Necessary* Professional Judgement -Describe reason why auditor cannot withdraw from engagement when auditor is unable to obtain sufficient evidence. -Law, regulation, or generally accepted practice require auditor to provide further explanation of auditor's responsibilities -Auditor engaged to report on more than 1 set of FS when each set has been prepared in accordance with a different general-purpose framework
Use Emphasis-of-Matter Paragraphs
*Required* -Conclude substantial doubt in ability to continue as going concern -Describe a justified change in acc principle that has material effect on FS -Subsequently discovered facts lead to a change in audit opinion -FS are prepared in accordance w applicable special purpose framework *May be Necessary* Professional Judgement -Uncertainty related to outcome of unusually important litigation or regulatory action -Major catastrophe having significant effect on fin position -Significant related party transactions -Unusually important subsequent events
Use of Explanatory Paragraphs
*Required* -Prior year opinion is updated -FS are prepared in accordance w special purpose framework -Substantial doubt about ability to continue as going concern -Material change between periods in acc principles or method -Material misstatement in previously issued FS has been corrected -Other info in document containing audited FS is materially inconsistent w FS -Selected quarterly fin data required by SEC Regulation S-K has been omitted or not reviewed -Supplementary info has been omitted *May be Necessary* Professional Judgement -Wishes to emphasize a matter regarding FS *Intro paragraph modified when prior FS audited by prior auditor and prior auditors report is not presented*
Principles - Risk Assessment
*S*pecify Objectives Identify and *A*ssess Changes Consider Potential for *F*raud Identify and Analyze *R*isks
Statements on Quality Control Standards
*SQCS* - Guidelines -Section: QC -AICPA -Provides guidance to CPA firms about the quality control system. Consists of policies and procedures designed, implemented, and maintained to ensure that the firm complies with professional standards and appropriate legal and regulatory requirements and that any reports issued are appropriate in circumstances -Applies to: CPA firms providing auditing, attestation, and accounting and review services
Statements on Standards for Attestation Engagements
*SSAE* - Other Engagements -Section: AT-C -Standard Setting: AICPA -Provide guidance for attestation engagements. -*Examination, review, and agreed upon procedures* report on a subject matter, or an assertion about a subject matter, that is the responsibility of another party
Statements on Standards for Accounting and Review Services
*SSARS* - Other Engagements -Section: AR-C -Standard Setting: AICPA Accounting and Review Services Committee -Provide guidance for *unaudited* FS or unaudited financial information of *nonissuers* -Preparation, compilation, and reviews of FS: nonissuers, Preparation or compilation of pro forma fin information: nonissuers
If the auditor is unable to observe physical inventory and is unable to become satisfied through alternative means
That is a scope limitation. Scope limitation results in either a qualified opinion or a disclaimer of opinion.
product (cheng)
- nature of the product (what it does, how it's used, why it's useful) - commodity or differentiable good - identify complementary goods - identify substitutes (indirect competitors? don't buy anything?) -product's life cycle - how is it packaged
Quote-to-Cash Business Process
Presales activity -> Sales order processing -> Inventory sourcing -> Delivery -> Billing -> Payment
Responsibilities of financial managers
-Forecasting revenues and costs -Planning activities -Managing costs -Identifying alternative sources and costs of finance -Managing cash -Negotiations with bankers -Evaluation of investments -Measurement and control of performance
Turning around troubled co - choose strategy
-Learn as much about the business and its operations as possible. -Review services, products, and finances. (Are products out of date? Do we have a high debt load?) -Secure sufficient financing so your plan has a chance. -Review talent and temperament of all employees, and get rid of the deadwood. -Determine short-term and long-term company goals. -Devise a business plan. -Visit clients, suppliers, and distributors, and reassure them. -Prioritize goals and get some small successes under your belt ASAP to build confidence
The process of generating and selecting strategies
-A manageable set of the most attractive alternative strategies must be developed -The advantages, disadvantages, tradeoffs, costs and benefits of these strategies should be determined -Identifying and evaluating alternative strategies should involve many of the managers and employees who earlier assembled the organizational vision and mission statements, performed the external audit, and conducted the internal audit
The politics of strategy choice
-Political maneuvering consumes valuable time, subverts organizational objectives, diverts human energy, and results in the loss of some valuable employees -Political biases and personal preferences get unduly embedded in strategy choice decisions -The hierarchy of command in an organization, combined with the career aspirations of different people and the need to allocate scarce resources, guarantees the formation of coalitions of individuals who strive to take care of themselves first and the organization second, third, or fourth
Stage 1: Input stage
-Summarizes the basic input information needed to formulate strategies -Consists of the EFE matrix, the IFE matrix and the competitive profile matrix CPM
prices are stable when:
-growth rate for all competitors is approx. the same -prices are paralleling costs -prices of all competitors are roughly of equal value
Net Income/ Cash Flow *Now estimate income/cash flow*
-historical levels/trends in ratios -gross m./Op. m/ etc. -separate forecasts for expense items -based on some relationship with sales or state company strategy. -forecast cash flows -assume non-cash WC/sales constant -required increases in EC -CAPEX -CFF - debt, equity
fixed costs
-overhead -machinery -distribution -rent -interest -depreciation
g
01100111
h
01101000
j
01101010
m
01101101
o
01101111
p
01110000
q
01110001
r
01110010
v
01110110
y
01111001
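The letter cards above pair each lowercase letter with its 8-bit ASCII code. A minimal sketch for checking such pairs, where `ascii_bits` is a hypothetical helper name and not something from the cards:

```python
def ascii_bits(ch):
    """Return the 8-bit binary string for a single ASCII character."""
    return format(ord(ch), "08b")

# Print the same letters that appear on the cards.
for ch in "ghjmopqrvy":
    print(ch, ascii_bits(ch))
```

The zero-padded width of 8 matches the way the cards write the codes.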
Consider a 9-month forward contract on a 10-year 7% Treasury note just issued at par. The effective annual risk-free rate is 5% over the near term and the first coupon is to be paid in 182 days. The price of the forward is closest to: A) 1,037.27. B) 1,001.84. C) 965.84.
B) 1,001.84. The forward price is calculated as the bond price minus the present value of the coupon, times one plus the risk-free rate for the term of the forward: (1,000 - 35/1.05^(182/365)) x 1.05^(9/12) = $1,001.84
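The forward-price arithmetic in this card can be reproduced with a few lines of Python. This is a sketch under the card's own assumptions (one coupon before expiry, effective annual compounding); the function and parameter names are illustrative only.

```python
def forward_price_coupon_bond(spot, coupon, rate, days_to_coupon, months_to_expiry):
    """Forward price = (spot - PV of the coupon), compounded to the forward's expiry."""
    pv_coupon = coupon / (1 + rate) ** (days_to_coupon / 365)
    return (spot - pv_coupon) * (1 + rate) ** (months_to_expiry / 12)

# 7% annual coupon paid semiannually on 1,000 par gives a 35 coupon in 182 days.
price = forward_price_coupon_bond(1000.0, 35.0, 0.05, 182, 9)
print(round(price, 2))
```

The result rounds to the 1,001.84 quoted in the answer.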
000110
06
000111
07
0001
1
gross profit margin
1 - (COGS/Rev)
Customer Lifetime
1 / Customer Churn Rate
Net Profit Margin
1 - ((COGS + all other expenses)/Rev)
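The three ratio cards above (gross profit margin, customer lifetime, net profit margin) can be written as small functions. The figures in the example are made up purely for illustration.

```python
def gross_profit_margin(revenue, cogs):
    """1 - COGS / revenue."""
    return 1 - cogs / revenue

def net_profit_margin(revenue, cogs, other_expenses):
    """1 - (COGS + all other expenses) / revenue."""
    return 1 - (cogs + other_expenses) / revenue

def customer_lifetime(churn_rate):
    """Expected lifetime in periods, given the churn rate per period."""
    return 1 / churn_rate

# Hypothetical company: revenue 200, COGS 120, other expenses 50, 25% churn per year.
gross = gross_profit_margin(200.0, 120.0)
net = net_profit_margin(200.0, 120.0, 50.0)
lifetime_years = customer_lifetime(0.25)
print(gross, net, lifetime_years)
```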
Why are business plans necessary? (5)
1. Attract funding 2. Attract key personnel 3. Budgeting 4. Clarify the business model 5. Benchmark mechanism
What are the types of master data tables?
1. Display attributes are attributes that are presented alongside their primary key in analytic reports. 2. Navigational attributes, like display attributes, are displayed alongside their primary key in analytic reports. 3. Time-dependent attributes are attributes such as price that change over time. 4. Time-independent attributes such as product weight do not change over time.
How would you validate a regression model?
1. Eyeball it. If predicted values fall outside the range of the response variable's values, that could indicate poor accuracy
Adverse Opinion Due to Material Misstatement of the Financial Statements - Issuers (Public Company)
1. Intro Paragraph 2. Scope Paragraph 3. Middle Paragraph(s) 4. Adverse Opinion - "because of," "the financial statements do not present fairly"
Qualified Opinion Due to Material Misstatement of the Financial Statements - Issuers (Public Company)
1. Intro Paragraph 2. Scope Paragraph 3. Middle Paragraph(s) 4. Qualified Opinion - "except for," "the financial statements are presented fairly"
Pricing Strategies Steps
1. Investigate the Company 2. Investigate the Product 3. Determine the Pricing Strategy
Grow and Increasing Sales Steps
1. Learn about the company, and its size, resources and products 2. Investigate the Industry and compare company to it
Overall Objective of Audit Engagements
1. Objectives of a financial statement audit (1 of 2) - issuers, nonissuers, and governmental entities: a. Obtain reasonable assurance about whether the FS are free from material misstatement, whether due to error or fraud, which enables the auditor to express an opinion b. Report on the FS and communicate as required by GAAS 2. Objectives of an audit of internal control over financial reporting (2 of 2) - required for issuers: a. Express an opinion on the effectiveness of IC over financial reporting b. Plan and perform the audit to obtain appropriate evidence that is sufficient to obtain reasonable assurance about whether a material weakness exists
Steps to develop a SPACE matrix
1. Select a set of variables to define financial position (FP), competitive position (CP), stability position (SP), and industry position (IP) 2. Assign a numerical value ranging from +1 (worst) to +7 (best) to each of the variables that make up the FP and IP dimensions; assign a numerical value ranging from -1 (best) to -7 (worst) to each of the variables that make up the SP and CP dimensions 3. Compute an average score for FP, CP, IP, and SP 4. Plot the average scores for FP, IP, SP, and CP on the appropriate axis in the SPACE matrix 5. Add the two scores on the x-axis (CP and IP) and plot the resultant point on X; add the two scores on the y-axis (FP and SP) and plot the resultant point on Y; plot the intersection of the new XY point 6. Draw a directional vector from the origin of the SPACE matrix through the new intersection point
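Steps 3 to 5 of the SPACE matrix are plain arithmetic, so a short sketch may help. It assumes the usual axis layout (CP and IP on the x-axis, FP and SP on the y-axis); the ratings below are invented for illustration and `space_vector` is a hypothetical name.

```python
from statistics import mean

def space_vector(fp, ip, sp, cp):
    """Return the (x, y) end point of the SPACE directional vector.

    fp, ip: ratings from +1 (worst) to +7 (best)
    sp, cp: ratings from -1 (best) to -7 (worst)
    """
    x = mean(cp) + mean(ip)  # competitive position + industry position
    y = mean(fp) + mean(sp)  # financial position + stability position
    return x, y

# Hypothetical ratings for one firm.
x, y = space_vector(fp=[5, 4, 6], ip=[6, 5, 4], sp=[-2, -3, -1], cp=[-1, -2, -3])
print(x, y)
```

The vector in step 6 is then drawn from the origin (0, 0) to the point (x, y).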
Porter's five forces
1. Supplier Power 2. Buyer Power 3. Barriers to Entry 4. Threat of Substitutes 5. Competitive Rivalry
turnarounds
1. gather information -tell me about company -why is it failing? bad products, management, economy? -tell me about industry -are competitors facing same problems? -access to capital? 2. action -learn about business and operations -review services, products, finances: products out of date? high debt load? -secure sufficient financing -review talent and temperament of all employees
M&A
1. goals and objectives -- why? good business sense? better alternatives? good strategic move? 2. how much are they paying? 3. due diligence -- research company and industry. -shape company is in -how secure are markets, customers, suppliers? -how is industry doing overall? -what are margins like -- high-volume low margins or low-volume high margins? -legal reasons to prevent acquisition? 4. exit strategy
Variance
1. How far (as a negative or positive number) is each data point from the average? 2. Square each of those deviations and average the squares. The result equals (standard deviation) x (standard deviation)
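The variance card can be checked numerically. The sketch below uses a small made-up data set and the population form of the variance (dividing by n); the sample form would divide by n - 1 instead.

```python
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

average = sum(data) / len(data)
deviations = [x - average for x in data]            # signed distance from the average
variance = sum(d * d for d in deviations) / len(data)

# The variance equals the (population) standard deviation squared.
std_dev = pstdev(data)
print(variance, std_dev)
```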
pricing strategies
1. investigate product -what's special or proprietary? -do similar products exist? how are they priced? -where are we in industry's growth cycle? -how big is market? -what were R&D costs? 2. pricing strategies -cost-based pricing: production costs, breakeven point, profit margin -price-based costing: what are customers willing to pay, what's it worth to them compared to other things, supply & demand
what are two examples of how ERP is used in business?
1. Large retailers such as Costco and Sam's Club collect sales data from their customers. They can use that data to decide, for example, whether to cut back operating hours. 2. G.B.I. wishes to optimize its logistics, such as shipping. Using its ERP system, an analyst can determine which shippers are reliable on time and delivery.
competitor factors
1. product/service offering -value chain 2. advantages & disadvantages in capabilities -marketing -operating efficiencies -talented people 3. key data -market share -total number -fragmentation/concentration
Explain what a local optimum is and why it is important in a specific context such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?
A solution that is optimal within a neighboring set of candidate solutions, in contrast with the global optimum, which is the optimal solution among all others. K-means clustering context: it is proven that the objective cost function will always decrease until a local optimum is reached, and results will depend on the initial random cluster assignment. Determining if you have a local optimum problem: - tendency of premature convergence - different initializations induce different optima. Avoiding local optima in a K-means context: repeat K-means and take the solution that has the lowest cost
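The "repeat K-means and keep the lowest cost" advice can be sketched in pure Python. This is a toy one-dimensional version of Lloyd's algorithm with made-up data, intended only to show the restart pattern, not a production implementation.

```python
import random

def kmeans_1d(points, k, rng, iters=100):
    """Plain Lloyd's algorithm on 1-D data; returns (centers, cost)."""
    centers = rng.sample(points, k)  # random initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, cost

def best_of_restarts(points, k, restarts=20, seed=0):
    """Run K-means from several random starts and keep the lowest-cost run."""
    rng = random.Random(seed)
    runs = [kmeans_1d(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1]), runs

points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
(best_centers, best_cost), runs = best_of_restarts(points, k=3)
print(best_centers, best_cost)
```

If the costs in `runs` differ from one start to another, that is the "different initializations induce different optima" symptom described on the card.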
How would you handle an imbalanced dataset?
An imbalanced dataset is when you have, for example, a classification task and 90% of the data is in one class. That leads to problems: an accuracy of 90% can be misleading if you have no predictive power on the other category of data! Here are a few tactics to get over the hump: 1- Collect more data to even the imbalances in the dataset. 2- Resample the dataset to correct for imbalances. 3- Try a different algorithm altogether on your dataset. What's important here is that you have a keen sense for what damage an unbalanced dataset can cause, and how to balance that.
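Tactic 2 (resampling) is often done by randomly oversampling the minority class. The sketch below is one simple way to do that in pure Python; `oversample_minority` is a hypothetical helper and the 9-to-1 data set is invented to mirror the 90% example.

```python
import random
from collections import Counter

def oversample_minority(rows, labels, seed=0):
    """Duplicate random rows of the smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

rows = list(range(10))
labels = ["a"] * 9 + ["b"]          # 90% of the data is in class "a"
new_rows, new_labels = oversample_minority(rows, labels)
balanced_counts = Counter(new_labels)
print(balanced_counts)
```

Oversampling should be applied to the training split only, otherwise duplicated rows can leak into the evaluation data.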
What data sources can Spark process?
Ans: -Hadoop File System (HDFS) - Cassandra (NoSQL databases) - HBase (NoSQL database) - S3 (Amazon WebService Storage : AWS Cloud)
What is a RDD Lineage Graph
Ans: An RDD lineage graph (aka RDD operator graph) is a graph of all the parent RDDs of an RDD. It is built as a result of applying transformations to the RDD. An RDD lineage graph is hence a graph of what transformations need to be executed after an action has been called
How do you define RDD?
Ans: A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. It represents an immutable, partitioned collection of elements that can be operated on in parallel. Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.
What is the purpose of Driver in Spark Architecture?
Ans: A Spark driver is the process that creates and owns an instance of SparkContext. It is your Spark application that launches the main method in which the instance of SparkContext is created. - The driver splits a Spark application into tasks and schedules them to run on executors. - A driver is where the task scheduler lives and spawns tasks across workers. - A driver coordinates workers and overall execution of tasks.
Can you define the purpose of master in Spark architecture?
Ans: A master is a running Spark instance that connects to a cluster manager for resources. The master acquires cluster nodes to run executors.
What is Preferred Locations
Ans: A preferred location (aka locality preferences or placement preferences) is a block location for an HDFS file where to compute each partition on. def getPreferredLocations(split: Partition): Seq[String] specifies placement preferences for a partition in an RDD.
How do you define actions?
Ans: An action is an operation that triggers execution of RDD transformations and returns a value (to a Spark driver - the user program). Simply put, an action evaluates the RDD lineage graph. You can think of actions as a valve: until an action is fired, the data to be processed is not even in the pipes, i.e. transformations. Only actions can materialize the entire processing pipeline with real data.
What is Apache Parquet format?
Ans: Apache Parquet is a columnar storage format
What is a BlockManager?
Ans: Block Manager is a key-value store for blocks that acts as a cache. It runs on every node, i.e. a driver and executors, in a Spark runtime environment. It provides interfaces for putting and retrieving blocks both locally and remotely into various stores, i.e. memory, disk, and offheap.
Block Manager
Ans: Block Manager is a key-value store for blocks that acts as a cache. It runs on every node, i.e. a driver and executors, in a Spark runtime environment. It provides interfaces for putting and retrieving blocks both locally and remotely into various stores, i.e. memory, disk, and offheap. A BlockManager manages the storage for most of the data in Spark, i.e. block that represent a cached RDD partition, intermediate shuffle data, and broadcast data.
What is checkpointing?
Ans: Checkpointing is the process of truncating the RDD lineage graph and saving the RDD to a reliable distributed (HDFS) or local file system. Reliable RDD checkpointing saves the actual intermediate RDD data to a reliable distributed file system.
What is DAGScheduler and how does it work?
Ans: DAGScheduler is the scheduling layer of Apache Spark that implements stage-oriented scheduling, i.e. after an RDD action has been called it becomes a job that is then transformed into a set of stages that are submitted as TaskSets for execution.
How do you define SparkContext?
Ans: It's an entry point for a Spark Job. Each Spark application starts by instantiating a Spark context. A Spark application is an instance of SparkContext. Or you can say, a Spark context constitutes a Spark application. SparkContext represents the connection to a Spark execution environment (deployment mode). A Spark context can be used to create RDDs, accumulators and broadcast variables, access Spark services and run jobs.
What does lazily evaluated RDD mean?
Ans: Lazily evaluated means that the data inside an RDD is not available or transformed until an action is executed that triggers the execution.
Why is Spark good at low-latency iterative workloads, e.g. graphs and machine learning?
Ans: Machine learning algorithms, for instance logistic regression, require many iterations before producing an optimal model. The same holds for graph algorithms, which traverse all the nodes and edges. Any algorithm that needs many iterations before producing results can improve its performance when the intermediate partial results are stored in memory or on very fast solid state drives.
What is Narrow Transformations? (Spark)
Ans: Narrow transformations are the result of map, filter and similar operations where each output partition is computed from the data of a single partition only, i.e. it is self-sustained. An output RDD has partitions with records that originate from a single partition in the parent RDD. Only a limited subset of partitions is used to calculate the result. Spark groups narrow transformations into a stage.
Can RDD be shared between SparkContexts?
Ans: No. When an RDD is created it belongs to and is completely owned by the Spark context it originated from. RDDs can't be shared between SparkContexts.
What is the difference between cache() and persist() method of RDD
Ans: RDDs can be cached (using RDD's cache() operation) or persisted (using RDD's persist(newLevel: StorageLevel) operation). The cache() operation is a synonym of persist() that uses the default storage level MEMORY_ONLY.
What are the possible operations on RDD?
Ans: RDDs support two kinds of operations: - transformations - lazy operations that return another RDD. - actions - operations that trigger computation and return values.
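The split between lazy transformations and actions can be illustrated without Spark. The class below is a toy pure-Python analogy, not the Spark API: `LazyDataset` and its methods are hypothetical names that only mimic the idea that transformations are recorded and nothing runs until an action is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations, runs them only on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)  # the recorded "lineage"

    def map(self, fn):
        # Transformation: returns a new dataset, computes nothing.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: returns a new dataset, computes nothing.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: replays the recorded steps over the data and returns values.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)

calls = []

def double(x):
    calls.append(x)  # lets us observe when the function actually runs
    return x * 2

pending = LazyDataset(range(5)).map(double).filter(lambda x: x > 2)
calls_before_action = list(calls)   # still empty: nothing has executed yet
result = pending.collect()          # the action triggers the whole pipeline
print(calls_before_action, result)
```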
What is Apache Spark Streaming?
Ans: Spark Streaming helps to process live stream data. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.
How many concurrent tasks can Spark run for an RDD partition?
Ans: Spark can only run 1 concurrent task for every partition of an RDD, up to the number of cores in your cluster. So if you have a cluster with 50 cores, you want your RDDs to have at least 50 partitions (and probably 2-3x that). As far as choosing a "good" number of partitions, you generally want at least as many as the number of executors for parallelism. You can get this computed value by calling sc.defaultParallelism.
How RDD helps parallel job processing?
Ans: Spark does jobs in parallel, and RDDs are split into partitions to be processed and written in parallel. Inside a partition, data is processed sequentially.
Why are both Spark and Hadoop needed?
Ans: Spark is often called a cluster computing engine or simply an execution engine. Spark uses many concepts from Hadoop MapReduce, and the two work together well. Spark with HDFS and YARN gives better performance and also simplifies work distribution on the cluster: HDFS is the storage engine for storing huge volumes of data, and Spark is the processing engine (in-memory as well as more efficient data processing). HDFS: used as a storage engine for Spark as well as Hadoop. YARN: a framework to manage the cluster using a pluggable scheduler. Run more than MapReduce: with Spark you can run the MapReduce algorithm as well as other higher-level operators, for instance map(), filter(), reduceByKey(), groupByKey() etc.
How can you define SparkConf?
Ans: Spark properties control most application settings and are configured separately for each application. These properties can be set directly on a SparkConf passed to your SparkContext. SparkConf allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through the set() method. For example, we could initialize an application with two threads by setting the master URL to local[2], e.g. new SparkConf().setMaster("local[2]").setAppName("CountingSheep"). Running with local[2] means two threads, which represents minimal parallelism and can help detect bugs that only exist when we run in a distributed context.
What is Data locality / placement?
Ans: Spark relies on data locality or data placement or proximity to data source, that makes Spark jobs sensitive to where the data is located. It is therefore important to have Spark running on Hadoop YARN cluster if the data comes from HDFS.
What is the advantage of broadcasting values across Spark Cluster?
Ans: Spark transfers the value to Spark executors once, and tasks can share it without incurring repetitive network transmissions when requested multiple times.
Define Spark architecture
Ans: Spark uses a master/worker architecture. There is a driver that talks to a single coordinator called master that manages workers in which executors run. The driver and the executors run in their own Java processes.
Give few examples how RDD can be created using SparkContext,
Ans: SparkContext allows you to create many different RDDs from input sources like: -Scala's collections: i.e. sc.parallelize(0 to 100) -Local or remote filesystems :sc.textFile("README.md") -Any Hadoop InputSource : using sc.newAPIHadoopFile
What is coalesce transformation?
Ans: The coalesce transformation is used to change the number of partitions. It can trigger RDD shuffling depending on the second shuffle boolean input parameter (defaults to false ).
What limits the maximum size of a partition?
Ans: The maximum size of a partition is ultimately limited by the available memory of an executor.
How many type of transformations exist?(spark)
Ans: There are two kinds of transformations: -narrow transformations -wide transformations
What is wide Transformations?(Spark)
Ans: Wide transformations are the result of groupByKey and reduceByKey . The data required to compute the records in a single partition may reside in many partitions of the parent RDD. All of the tuples with the same key must end up in the same partition, processed by the same task. To satisfy these operations, Spark must execute RDD shuffle, which transfers data across cluster and results in a new stage with a new set of partitions.
What are the workers?
Ans: Workers or slaves are running Spark instances where executors live to execute tasks. They are the compute nodes in Spark. A worker receives serialized/marshalled tasks that it runs in a thread pool.
Is it possible to have multiple SparkContext in single JVM?
Ans: Yes, if spark.driver.allowMultipleContexts is true (default: false). If true, Spark logs warnings instead of throwing exceptions when multiple SparkContexts are active, i.e. multiple SparkContexts are running in this JVM.
Can we broadcast an RDD?
Ans: Yes, but you should not broadcast an RDD to use in tasks, and Spark will warn you if you do. It will not stop you, though.
What is master URL in local mode?
Ans: You can run Spark in local mode using local , local[n] or the most general local[*]. The URL says how many threads can be used in total: -local uses 1 thread only. - local[n] uses n threads. - local[*] uses as many threads as the number of processors available to the Java virtual machine (it uses Runtime.getRuntime.availableProcessors() to know the number).
How can you stop SparkContext and what is the impact if stopped?
Ans: You can stop a Spark context using SparkContext.stop() method. Stopping a Spark context stops the Spark Runtime Environment and effectively shuts down the entire Spark application
What is an Asset?
Anything worth money A resource owned by an entity as a result of past events and from which future economic benefits are expected to flow to the entity -Probable future benefit -Arising from some past transaction -Control the resource -Capable of measurement (money)
Grow and Increasing Sales 2. Investigate the Industry and compare company to it
Are the client's prices in line with the competition?
Referring to put-call parity, which one of the following alternatives would allow you to create a synthetic European call option? A) Sell the stock, buy a European put option on the same stock with the same exercise price and the same maturity, and invest an amount equal to the present value of the exercise price in a pure-discount riskless bond. B) Buy the stock, buy a European put option on the same stock with the same exercise price and the same maturity, and short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond. C) Buy the stock, sell a European put option on the same stock with the same exercise price and the same maturity, and short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond.
B) Buy the stock, buy a European put option on the same stock with the same exercise price and the same maturity, and short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond. According to put-call parity we can write a European call as: C0 = P0 + S0 - X/(1+Rf)^T. We can then read off the right-hand side of the equation to create a synthetic position in the call. We would need to buy the European put, buy the stock, and short or issue a riskless pure-discount bond equal in value to the present value of the exercise price.
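One way to convince yourself of answer B is to compare payoffs at expiry: long put plus long stock plus a short bond repaying the exercise price should match a long call at every stock price. The sketch below checks this for a few made-up prices and also evaluates the parity formula for an invented put price.

```python
def call_payoff(spot, strike):
    return max(spot - strike, 0.0)

def put_payoff(spot, strike):
    return max(strike - spot, 0.0)

def synthetic_call_payoff(spot, strike):
    # long put + long stock + short bond that repays the strike at expiry
    return put_payoff(spot, strike) + spot - strike

def synthetic_call_price(put_price, stock_price, strike, rf, years):
    # C0 = P0 + S0 - X / (1 + Rf)^T
    return put_price + stock_price - strike / (1 + rf) ** years

strike = 100.0
payoffs_match = all(
    abs(synthetic_call_payoff(s, strike) - call_payoff(s, strike)) < 1e-12
    for s in (60.0, 100.0, 140.0)
)

# Hypothetical inputs: put worth 5, stock at 100, strike 100, 5% rate, 1 year.
price = synthetic_call_price(put_price=5.0, stock_price=100.0,
                             strike=100.0, rf=0.05, years=1.0)
print(payoffs_match, round(price, 4))
```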
Which of the following statements regarding an option's price is CORRECT? An option's price is: A) a decreasing function of the underlying asset's volatility when it has a long time remaining until expiration and an increasing function of its volatility if the option is close to expiration. B) an increasing function of the underlying asset's volatility. C) a decreasing function of the underlying asset's volatility.
B) an increasing function of the underlying asset's volatility. Since an option has limited risk but significant upside potential, its value always increases when the volatility of the underlying asset increases.
Which of the following statements regarding the goal of a delta-neutral portfolio is most accurate? One example of a delta-neutral portfolio is to combine a: A) long position in a stock with a short position in a call option so that the value of the portfolio changes with changes in the value of the stock. B) long position in a stock with a short position in call options so that the value of the portfolio does not change with changes in the value of the stock. C) long position in a stock with a long position in call options so that the value of the portfolio does not change with changes in the value of the stock.
B) long position in a stock with a short position in call options so that the value of the portfolio does not change with changes in the value of the stock. A delta-neutral portfolio can be created with any of the following combinations: long stock and short calls, long stock and long puts, short stock and long calls, and short stock and short puts.
Growth strategies - choosing
Determine fit for each: -Increase sales -Increase distribution channels -Increase product line -Diversify products or services offered -Acquire competitors or a company in a different industry
Name an example where ensemble techniques might be useful.
Ensemble techniques use a combination of learning algorithms to obtain better predictive performance. They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by small changes in the training data). You could list some examples of ensemble methods, from bagging to boosting to a "bucket of models" method, and demonstrate how they could increase predictive power.
what are ERP systems?
Enterprise resource planning (ERP) systems are integrated transactional systems that enable all the functional areas of a business to share data
Reporting Period (F/M)
F-usually annually M- As required by mgmt
Materiality of Problem: None or immaterial
Financial Statements Are Materially Misstated (Financial Statement Issues): Unmodified (Unqualified). Inability to Obtain Sufficient Appropriate Audit Evidence (Audit Issues): Unmodified (Unqualified).
Write a function that takes in two sorted lists and outputs a sorted list that is their union.
The first solution that comes to mind is to concatenate the two lists and sort the result (duplicates are kept, so this is a merge rather than a set union).
Python code:
    def return_union(list_a, list_b):
        return sorted(list_a + list_b)
R code:
    return_union <- function(list_a, list_b) {
      list_c <- c(unlist(list_a), unlist(list_b))
      return(list(list_c[order(list_c)]))
    }
Generally, the tricky part of the question is not to use any sorting or ordering function. In that case you will have to write your own logic to answer the question and impress your interviewer.
Python code:
    def return_union(list_a, list_b):
        len1 = len(list_a)
        len2 = len(list_b)
        final_sorted_list = []
        j = 0  # index into list_b
        k = 0  # index into list_a
        for i in range(len1 + len2):
            if k == len1:
                # list_a is exhausted: append the rest of list_b
                final_sorted_list.extend(list_b[j:])
                break
            elif j == len2:
                # list_b is exhausted: append the rest of list_a
                final_sorted_list.extend(list_a[k:])
                break
            elif list_a[k] < list_b[j]:
                final_sorted_list.append(list_a[k])
                k += 1
            else:
                final_sorted_list.append(list_b[j])
                j += 1
        return final_sorted_list
A similar function can be written in R by following the same steps.
R code:
    return_union <- function(list_a, list_b) {
      # Initializing length variables
      len_a <- length(list_a)
      len_b <- length(list_b)
      len <- len_a + len_b
      # Initializing counter variables
      j <- 1
      k <- 1
      # Creating an empty list whose length equals the sum of both lists
      list_c <- vector("list", len)
      for (i in 1:len) {
        if (j > len_a) {
          list_c[i:len] <- list_b[k:len_b]
          break
        } else if (k > len_b) {
          list_c[i:len] <- list_a[j:len_a]
          break
        } else if (list_a[[j]] <= list_b[[k]]) {
          list_c[[i]] <- list_a[[j]]
          j <- j + 1
        } else {
          list_c[[i]] <- list_b[[k]]
          k <- k + 1
        }
      }
      return(list(unlist(list_c)))
    }
antireflexive (irreflexive)
For every x ∈ A, x(not R) x | every guy in the set is NOT related to itself
Integration strategies:
Forward, backward, and horizontal integration
Large enough sample condition
If the population is unimodal and symmetric, even a fairly small sample is okay. For highly skewed distributions you may need several hundred observations for the sampling distribution to be approximately normal
Explain what a confidence interval means
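A 95% confidence interval comes from a procedure that, across repeated samples, captures the true parameter value about 95% of the time. In testing terms: if you reject something with 95% confidence, then in the case where there is no true effect, a result like ours will happen in less than 5% of all possible samples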
When would you use random forests Vs SVM and why?
Prefer random forests: - In a multi-class classification problem: SVM requires a one-against-all method (memory intensive) - If one needs to know the variable importance (random forests can provide it) - If one needs to get a model fast (SVM takes long to tune: one needs to choose the appropriate kernel and its parameters, for instance sigma and epsilon) - In a semi-supervised learning context (random forest and dissimilarity measure): SVM can work only in a supervised learning mode
What are the advantages of Naive Bayes?
A Naïve Bayes classifier will converge quicker than discriminative models like logistic regression (when its conditional independence assumption holds), so you need less training data. The main disadvantage is that it can't learn interactions between features.
Define Supervised Learning:
In Supervised learning, the algorithm learns from the training data so that the knowledge can be applied to predict outcomes of the test data. It can be further grouped into regression and classification problems. A. Classification: output variable is a category, such as "red" or "blue" or "disease" and "no disease". B. Regression: A regression problem is when the output variable is a numerical value, such as "dollars" or "weight".
What is 'Overfitting' in Machine learning?
In machine learning, 'overfitting' occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting is normally observed when a model is excessively complex, because it has too many parameters relative to the amount of training data. A model that has been overfit exhibits poor predictive performance.
What is the Central Limit Theorem and why is it important?
In probability theory, the central limit theorem (CLT) establishes that, in most situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed.
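A quick simulation makes the theorem concrete. The sketch below draws samples from a skewed exponential distribution (mean 1, standard deviation 1) and looks at the distribution of sample means; the seed, sample size, and trial count are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

def sample_means(draw, n, trials, rng):
    """Return `trials` sample means, each computed from `n` independent draws."""
    return [mean(draw(rng) for _ in range(n)) for _ in range(trials)]

rng = random.Random(42)

def exp_draw(r):
    return r.expovariate(1.0)  # skewed, clearly non-normal

means = sample_means(exp_draw, n=50, trials=2000, rng=rng)
center = mean(means)    # should be close to the population mean, 1
spread = pstdev(means)  # should be close to 1 / sqrt(50), about 0.141
print(round(center, 3), round(spread, 3))
```

Even though the individual draws are skewed, the sample means cluster symmetrically around 1 with a spread near sigma / sqrt(n), which is why normal-based inference works for averages.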
Cash Receipts - Segregation of the Functions
Incoming mail must be opened by a person who does not have access to the AR ledger. Three copies must be distributed to the following: 1. Cashier - Receives the actual receipts and prepares the bank deposit. 2. AR Department - Enters receipts into the AR subsidiary records - Matches the details from the bank deposit ticket with the details from the remittance advices 3. Accounting Department - Enters receipts into the AR control account
Reasons to go global
-Increase demand -Decrease cost -Circumvent or follow competition -Leave saturated home market -Create economies of scale -Build power as a buyer -Hurry down the experience curve -Transfer DCs to where competitors don't have them -Achieve location economies -Leverage skills of global organization
Option Greek Definition: Theta
Increase in option value per decrease in time to expiry (-1/365∂C/∂t)
Option Greek Definition: Psi
Increase in option value per percentage point increase in the dividend yield (0.01∂C/∂δ)
Option Greek Definition: Rho
Increase in option value per percentage point increase in the risk-free rate (0.01∂C/∂r)
What is income?
Increases in economic benefits during the accounting period in the form of inflows or enhancements of assets or decreases of liabilities that result in increases in equity other than those relating to contributions from equity participants Sales Interest Income Dividend Income Other Income
What would increase the width of the confidence interval?
Increasing the confidence level, Decreasing the sample size
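Both effects can be seen from the half-width of a normal-based interval, z * sigma / sqrt(n). The sketch below assumes a known standard deviation and uses illustrative numbers; `ci_half_width` is a hypothetical helper name.

```python
from statistics import NormalDist

def ci_half_width(sigma, n, confidence):
    """Half-width of a two-sided normal confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / n ** 0.5

base = ci_half_width(sigma=10.0, n=100, confidence=0.95)
higher_confidence = ci_half_width(sigma=10.0, n=100, confidence=0.99)
smaller_sample = ci_half_width(sigma=10.0, n=25, confidence=0.95)
print(round(base, 3), round(higher_confidence, 3), round(smaller_sample, 3))
```

A more variable population (larger sigma) widens the interval in the same way.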
What is latent semantic indexing?
Indexing and retrieval method that uses singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. Based on the principle that words that are used in the same contexts tend to have similar meanings. "Latent": semantic associations between words are present not explicitly but only latently. For example: two synonyms may never occur in the same passage but should nonetheless have highly associated representations
What is latent semantic indexing? What is it used for? What are the specific limitations of the method?
Indexing and retrieval method that uses singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. Based on the principle that words that are used in the same contexts tend to have similar meanings. "Latent": semantic associations between words are present not explicitly but only latently. For example: two synonyms may never occur in the same passage but should nonetheless have highly associated representations. Used for: - Learning correct word meanings - Subject matter comprehension - Information retrieval - Sentiment analysis (social network analysis)
Informational Systems
Informational systems are used to provide a place for data to be stored and prepared for analytical purposes
21 Ways to Cut Costs Production
Invest in Technology Consolidate production space to gain scale and increase accountability Create flexible production lines Reduce Inventory(JIT) Outsource Renegotiate with Suppliers Consolidate Suppliers Import Parts
What is the Hypergeometric Distribution?
It is a discrete probability distribution that describes outcomes when sampling from a finite population without replacement. Suppose a batch of N items contains K defectives and we draw n items without replacement. The number of defectives in the batch is fixed; the number of defectives in the sample is the random variable, and it follows a hypergeometric distribution. Because items are not replaced, the probability that the next item is defective changes after every draw, which is what distinguishes it from the binomial distribution. Examples: • Wallet with 3 $100 bills and 5 $1 bills: probability of drawing 2 $100 bills • Checking defective products in a batch of manufactured goods
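The wallet example can be worked through with a small sketch. The card does not say how many bills are drawn, so the code below assumes exactly 2 bills are drawn; the function name `hypergeom_pmf` and its parameter names are illustrative only.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(exactly k successes in n draws without replacement
    from N items, K of which are successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Wallet: 8 bills in total, 3 of them $100 bills.
# Assumption: we draw 2 bills and want both to be $100 bills.
p_two_hundreds = hypergeom_pmf(k=2, N=8, K=3, n=2)
print(p_two_hundreds)  # equals 3/28
```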
What is the Binomial Distribution?
It is a discrete probability distribution that describes the outcome of n independent trials in an experiment. In every trial there can only be 2 outcomes (success or failure). The binomial distribution describes the behavior of a count variable X if the following conditions apply (p is the probability of observing a success): 1: The experiment consists of n identical trials. 2: Each trial is independent. 3: Each trial results in one of two outcomes ("success" or "failure"). 4: The probability of "success" p is the same for each trial. Mean = np | variance = np(1-p) Example: a market research experiment asking whether people prefer Coke or Pepsi
Is it possible to perform logistic regression with Microsoft Excel?
It is possible to perform logistic regression with Microsoft Excel. There are two ways to do it: a) use one of the add-ins available from various websites; b) use the fundamentals of logistic regression and Excel's computational power to build the model directly. When this question is asked in an interview, the interviewer is not looking for the name of an add-in but for a method that uses base Excel functionality. (The example assumes that you are familiar with the basic concepts of logistic regression.) The sample data consists of three variables, where X1 and X2 are independent variables and Y is a class variable. We keep only 2 categories for the purpose of a binary logistic regression classifier. Next we create a logit function using the independent variables, i.e. Logit = L = β0 + β1*X1 + β2*X2
HTML - HyperText Markup Language
It uses tags to mark how content is structured within a web page so that a web browser can process the tags and display the intended content.
C(S,K,T) Payoff
max(0,S(T)-K)
What is cross-validation? How to do it right?
It's a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. a validation data set) in order to limit problems like overfitting and to get insight into how the model will generalize to an independent data set. Examples: leave-one-out cross-validation, k-fold cross-validation. How to do it right? • The training and validation data sets have to be drawn from the same population. For example, when predicting stock prices with a model trained on a certain 5-year period, it is unrealistic to treat the subsequent 5-year period as a draw from the same population. • Common mistake: every step that is tuned on the data, for instance choosing the kernel parameters of an SVM, should be cross-validated as well. Bias-variance trade-off for k-fold cross-validation: leave-one-out cross-validation (LOOCV) gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n−1 observations). But we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, hence the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation between these quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation. Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
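A minimal sketch of k-fold splitting, assuming contiguous folds with no shuffling and a toy "predict the training mean" model standing in for a real learner. The helper names `k_fold_indices` and `cross_val_mse` are hypothetical; setting k equal to the number of observations gives leave-one-out.

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_val_mse(ys, k=5):
    """Mean validation MSE of a toy model that predicts the training mean."""
    fold_scores = []
    for train, val in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)
        fold_mse = sum((ys[i] - prediction) ** 2 for i in val) / len(val)
        fold_scores.append(fold_mse)
    return sum(fold_scores) / len(fold_scores)


ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
print(cross_val_mse(ys, k=5))        # 5-fold estimate
print(cross_val_mse(ys, k=len(ys)))  # k = n is leave-one-out
```

In practice the rows are usually shuffled before splitting, unless the data is ordered in time.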
what is an example of image recognition?
Jetpac is an application that uses public Instagram data to create "Jetpac City Guides".
What are the last machine learning papers you've read?
Keeping up with the latest scientific literature on machine learning is a must if you want to demonstrate interest in a machine learning position. This overview of deep learning in Nature by the scions of deep learning themselves (from Hinton to Bengio to LeCun) can be a good reference paper and an overview of what's happening in deep learning — and the kind of paper you might want to cite.
Explain the difference between L1 and L2 regularization.
L2 regularization tends to spread error among all the terms, while L1 is more binary/sparse, with many variables being assigned a weight of either 1 or 0. L1 corresponds to setting a Laplacian prior on the terms, while L2 corresponds to a Gaussian prior.
21 Ways to Cut Costs 3 Categories
Labor Production Finance
What is Working Capital?
Liquid assets held to meet day to day running costs of the company. (Short term finance)
The ability to convert assets into cash is called ______.
Liquidity
Dogs IV
Low market share, low growth rate -Compete in slow or no market growth industry, consider liquidation
How are kernel methods different?
In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in raw representation. In kernel methods we implicitly map our data into higher dimensions and then classify it. Researchers use kernel methods for problems that are not linearly separable. The most interesting point is that if you can find a proper kernel function for your data set, you can classify it very accurately. The most difficult part is finding a proper kernel function for your problem; some researchers use genetic algorithms (GA) for this search.
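To make "a similarity function over pairs of data points" concrete, here is a sketch using the Gaussian (RBF) kernel, one common choice that the card itself does not name. The function names, the `gamma` value and the sample points are assumptions for illustration.

```python
from math import exp


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq_dist)


def gram_matrix(points, kernel):
    """Pairwise similarities; this matrix is all a kernel method sees."""
    return [[kernel(p, q) for q in points] for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]  # hypothetical raw data
K = gram_matrix(points, rbf_kernel)
for row in K:
    print(row)
```

The algorithm never needs the higher-dimensional feature vectors themselves, only the entries of `K`.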
If a Product is in its Mature Stage focus on.....
Manufacturing Costs Competition
Why study data analytics?
Many business professionals are trying to understand the data to make better decisions. Employers are pushing educators to better train students on the fundamentals of analytics.
Entering a New Market Approaches
Market Entry
Common metrics in regression:
Mean Squared Error vs Mean Absolute Error: RMSE gives a relatively high weight to large errors, so it is most useful when large errors are particularly undesirable. The MAE is a linear score: all the individual differences are weighted equally in the average. MAE is more robust to outliers than MSE. $\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$, $\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}|y_i-\hat{y}_i|$. Root Mean Squared Logarithmic Error: RMSLE penalizes an under-predicted estimate more than an over-predicted estimate (unlike RMSE, which is symmetric). $\mathrm{RMSLE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\log(p_i+1)-\log(a_i+1))^2}$, where $p_i$ is the ith prediction, $a_i$ the ith actual response, and $\log(b)$ the natural logarithm of $b$. Weighted Mean Absolute Error: the weighted average of absolute errors. MAE and RMSE assume that each prediction provides equally precise information about the error variation, i.e. that the standard deviation of the error term is constant over all the predictions. Examples: recommender systems (differences between past and recent products). $\mathrm{WMAE}=\frac{1}{\sum_{i} w_i}\sum_{i=1}^{n}w_i|y_i-\hat{y}_i|$
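The four metrics can be sketched directly from their definitions. The data values below are made up for illustration, and the function names are not from any particular library.

```python
from math import log, sqrt


def rmse(y, y_hat):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmsle(actual, predicted):
    """Root mean squared error on the log(1 + value) scale."""
    sq = [(log(p + 1) - log(a + 1)) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(sq) / len(sq))


def wmae(y, y_hat, weights):
    total = sum(w * abs(a - b) for a, b, w in zip(y, y_hat, weights))
    return total / sum(weights)


y = [3.0, 5.0, 2.0, 7.0]      # hypothetical actual values
y_hat = [2.0, 5.0, 4.0, 7.0]  # hypothetical predictions
print(rmse(y, y_hat), mae(y, y_hat))

# Same absolute miss of 50, but under-prediction costs more under RMSLE.
print(rmsle([100.0], [50.0]), rmsle([100.0], [150.0]))
```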
How do data management procedures like missing data handling make selection bias worse?
Missing value treatment is one of the primary tasks a data scientist performs before starting data analysis. There are multiple methods for missing value treatment, and if not done properly it can result in selection bias. A few missing value treatments and their impact on selection: Complete case treatment: you remove an entire row of data if even one value is missing. You can introduce selection bias if your values are not missing at random and follow some pattern. Assume you are conducting a survey and a few people didn't specify their gender. Would you remove all those people? Couldn't the remainder tell a different story? Available case analysis: say you are calculating a correlation matrix, so you remove missing values only from the variables needed for each particular correlation coefficient. In this case your values will not be fully consistent, as each coefficient is computed from a different subset of the observations. Mean substitution: missing values are replaced with the mean of the other available values. This can bias your distribution, e.g. the standard deviation is understated, and correlation and regression estimates that depend on the spread of the variables are distorted. Hence, data management procedures can introduce selection bias into your data if not chosen correctly.
Modification to Auditor's Opinion
Modified when: 1. Auditor concludes FS as a whole are materially misstated (FS issue) 2. Auditor is unable to obtain sufficient appropriate audit evidence to conclude FS are free from material misstatement (audit issue)
-*Qualified Opinion*: States that, except for the matter, FS present fairly in all material respects the financial position, results of operations, and CF. (GAAP or GAAS)
-*Adverse Opinion*: FS do not present fairly the financial position. (GAAP)
-*Disclaimer of Opinion*: Auditor does not express an opinion on FS. (GAAS)
*None or Immaterial* -FS Materially Misstated (GAAP): Unmodified (unqualified) -Inability to Obtain Sufficient Appropriate Audit Evidence (GAAS): Unmodified (unqualified)
*Material but not pervasive* -FS Materially Misstated (GAAP): Qualified opinion -Inability to Obtain Sufficient Appropriate Audit Evidence (GAAS): Qualified opinion
*Material and pervasive* -FS Materially Misstated (GAAP): Adverse opinion -Inability to Obtain Sufficient Appropriate Audit Evidence (GAAS): Disclaimer of opinion
When a material GAAP problem is discovered how is the Auditor's Responsibility Paragraph modified (Non-issuer)?
Modify the paragraph to state: "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the *qualified* audit opinion."
Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Auditor's Responsibility Paragraph.
Modify the paragraph to state: "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the ADVERSE AUDIT OPINION. "
How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything?
In the pursuit of rapid innovation (aka "quick fame"), the principles of scientific methodology are often violated, leading to misleading innovations, i.e. appealing insights that are confirmed without rigorous validation. One such scenario: given the task of improving an algorithm to yield better results, you might come up with several ideas with potential for improvement. An obvious human urge is to announce these ideas ASAP and ask for their implementation. When asked for supporting data, often limited results are shared, which are very likely to be affected by selection bias (known or unknown) or a misleading global minimum (due to a lack of appropriate variety in the test data). Data scientists do not let their emotions overrun their logical reasoning. While the exact approach to proving that an improvement you've brought to an algorithm is really an improvement over not doing anything depends on the case at hand, there are a few common guidelines: • Ensure that there is no selection bias in the test data used for performance comparison • Ensure that the test data has sufficient variety to be representative of real-life data (helps avoid overfitting) • Ensure that "controlled experiment" principles are followed, i.e. while comparing performance, the test environment (hardware, etc.) must be exactly the same for the original algorithm and the new algorithm • Ensure that the results are repeatable with near-similar results • Examine whether the results reflect local maxima/minima or global maxima/minima. One common way to follow these guidelines is A/B testing, where both versions of the algorithm are kept running in similar environments for a considerably long time and real-life input data is randomly split between the two. This approach is particularly common in web analytics.
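One way to judge the outcome of such an A/B test is a two-proportion z-test. This is a sketch of one standard choice, not necessarily the method the card's author had in mind; the success counts are invented and the normal approximation assumes reasonably large groups.

```python
from math import erf, sqrt


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two success rates."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical counts: original algorithm (A) vs new algorithm (B),
# with traffic split randomly and equally between the two.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(z, p_value)
```

A small p-value suggests the observed difference is unlikely to be due to chance alone, provided the guidelines above (random split, same environment) actually held.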
Forming an Opinion on FS
Opinion on whether FS are presented fairly in all material respects. -To form the opinion, the auditor takes into account: 1. Whether sufficient appropriate audit evidence was obtained 2. Whether FS are prepared in accordance with the financial reporting framework -FS are a complete set of general-purpose FS, including notes. (US GAAP: BS, statement of income, changes in equity, CF statement, related notes) When forming the opinion, evaluate whether: -FS adequately disclose significant accounting policies -Accounting policies are consistent with the framework -Accounting estimates made by mgt are reasonable -Info presented in FS is relevant, reliable, comparable, and understandable -FS provide adequate disclosures -Terminology is appropriate -Overall structure is fairly presented -FS represent the underlying transactions in a way that achieves fair presentation *Departure from GAAP is permissible if FS would otherwise be misleading (unmodified/unqualified opinion) -Use generally accepted auditing standards (GAAS) as guidelines to perform the audit -Refer to the financial reporting framework (GAAP) to evaluate whether transactions are recorded and reported fairly in FS
Python or R - Which one would you prefer for text analytics?
Python, because of the pandas library, which provides easy-to-use data structures and high-performance data analysis tools.
What is Parquet?
Parquet is a columnar file format for saving and retrieving tabular data.
Shares
Partial ownership of the company Ordinary Shares, Preference Shares Initial Public Offering, Seasoned Offering, Rights Issue
What is the difference between concurrency and parallelism?
People often confuse the terms concurrency and parallelism. When several computations make progress during overlapping time periods, for example by interleaving on a single core, it is referred to as concurrency, whereas when computations are executed at the same instant it is known as parallelism. Parallel collections, Futures and the Async library are examples of achieving parallelism in Scala.
Definition: Elasticity
Percentage change in option value as a function of the percentage change in the value of the underlying asset.
3 Steps of Strategic Planning
Problem Solution Business Model -how are you planning on making money solving this problem
An auditor may express a disclaimer of opinion when the auditor is unable to obtain sufficient appropriate audit evidence on which to base an opinion. When management refuses to
Produce documentation verifying the ownership of its equipment and production facilities, a client-imposed scope limitation exists, and an expression of disclaimer of opinion may be appropriate.
Calendar Spread Profit
Profit if S(0)=S(T), but loss possible for S(0) substantially different from S(T).
How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression.
Proposed methods for model validation: If the values predicted by the model are far outside of the response variable range, this would immediately indicate poor estimation or model inaccuracy. If the values seem to be reasonable, examine the parameters
value chain
R&D --> sourcing --> inbound logistics --> manufacturing --> distribution --> sales & marketing --> service
Common metrics in classification:
Recall / Sensitivity / True Positive Rate: high when FN is low. Sensitive to unbalanced classes. $\mathrm{Sensitivity}=\frac{TP}{TP+FN}$ Precision / Positive Predictive Value: high when FP is low. Sensitive to unbalanced classes. $\mathrm{Precision}=\frac{TP}{TP+FP}$ Specificity / True Negative Rate: high when FP is low. Sensitive to unbalanced classes. $\mathrm{Specificity}=\frac{TN}{TN+FP}$ Accuracy: high when FP and FN are low. Sensitive to unbalanced classes (see "Accuracy paradox"). $\mathrm{Accuracy}=\frac{TP+TN}{TN+TP+FP+FN}$ ROC / AUC: ROC is a graphical plot that illustrates the performance of a binary classifier (Sensitivity vs 1−Specificity, or Sensitivity vs Specificity). They are not sensitive to unbalanced classes. AUC is the area under the ROC curve. Perfect classifier: AUC=1, and the ROC curve passes through the point (0,1), i.e. 100% sensitivity (no FN) and 100% specificity (no FP). Logarithmic loss: the penalty grows without bound as a confident prediction moves away from the true value. It's better to be somewhat wrong than emphatically wrong! $\mathrm{logloss}=-\frac{1}{N}\sum_{i=1}^{N}\left(y_i\log(p_i)+(1-y_i)\log(1-p_i)\right)$ Misclassification Rate: $\mathrm{Misclassification}=\frac{1}{n}\sum_{i}I(y_i\neq\hat{y}_i)$ F1-Score: used when the target variable is unbalanced. $\mathrm{F1}=2\,\frac{\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$
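The count-based metrics above can be computed from the four cells of a confusion matrix. This is a small sketch with invented counts; it does not guard against zero denominators.

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics derived from confusion-matrix counts (no zero-division guard)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical confusion matrix for 100 test cases.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```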
Define precision and recall
Recall is also known as the true positive rate: the amount of positives your model claims compared to the actual number of positives there are throughout the data. Precision is also known as the positive predictive value, and it is a measure of the amount of accurate positives your model claims compared to the number of positives it actually claims. It can be easier to think of recall and precision in the context of a case where you've predicted that there were 10 apples and 5 oranges in a case of 10 apples. You'd have perfect recall (there are actually 10 apples, and you predicted there would be 10) but 66.7% precision because out of the 15 events you predicted, only 10 (the apples) are correct.
Explain what regularization is and why it is useful.
Regularization is the process of adding a penalty term, controlled by a tuning parameter, to a model in order to induce smoothness and prevent overfitting. This is most often done by adding a constant multiple of a norm of the weight vector to the loss function. The norm is often the L1 norm (lasso) or the L2 norm (ridge), but can in fact be any norm. The model is then fit by minimizing the regularized loss over the training set.
Why L1 regularizations causes parameter sparsity whereas L2 regularization does not?
Regularization in statistics and machine learning adds extra information in order to solve a problem better. L1 and L2 regularization are generally used to add constraints to optimization problems. Geometrically, the L1 constraint region is a diamond with corners on the axes, while the L2 region is a circle. With L1 there is a high likelihood that the solution hits a corner, where some coefficients are exactly zero; with L2 there are no corners, so this does not happen. In other words, L1 applies the same penalty pressure to small weights as to large ones, which drives small weights all the way to zero and results in sparsity. In L2 the weights are squared, so the penalty on small weights is tiny and they shrink toward zero without reaching it.
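The effect is easiest to see in one dimension, where both penalized problems have closed-form solutions. The sketch below assumes a squared-error loss 0.5*(w - a)**2 around an unregularized estimate `a`; the function names, the estimates and the penalty strength are illustrative.

```python
def l1_solution(a, lam):
    """argmin over w of 0.5*(w - a)**2 + lam*abs(w): soft-thresholding."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def l2_solution(a, lam):
    """argmin over w of 0.5*(w - a)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return a / (1 + lam)


unregularized = [3.0, 0.4, -0.2, -2.5]  # hypothetical least-squares weights
lam = 0.5
l1_weights = [l1_solution(a, lam) for a in unregularized]
l2_weights = [l2_solution(a, lam) for a in unregularized]
print(l1_weights)  # [2.5, 0.0, 0.0, -2.0]: small weights become exactly zero
print(l2_weights)  # every weight shrinks, none becomes exactly zero
```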
Do you have research experience in machine learning?
Related to the last point, most organizations hiring for machine learning positions will look for your formal experience in the field. Research papers, co-authored or supervised by leaders in the field, can make the difference between you being hired and not. Make sure you have a summary of your research experience and papers ready — and an explanation for your background and lack of formal research experience if you don't.
Net Profit
Revenue*Net Margin %
What is root cause analysis?
Root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. A factor is considered a root cause if removal thereof from the problem-fault-sequence prevents the final undesirable event from recurring
How will you assess the statistical significance of an insight whether it is a real insight or just by chance?
The statistical significance of an insight can be assessed using hypothesis testing.
What is sampling? How many sampling methods do you know?
Sampling methods can be classified into one of two categories: Probability sampling: each sample has a known probability of being selected. Non-probability sampling: the sample does not have a known probability of being selected, as in convenience or voluntary response surveys. In probability sampling it is possible to determine both which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling: simple random sampling (SRS), stratified sampling, cluster sampling, systematic sampling, and multistage sampling (in which some of the methods above are combined in stages).
What is a Scala Map?
Scala Map is a collection of key-value pairs in which a value can be retrieved using its key. Values in a Scala Map need not be unique, but the keys are unique. Scala supports two kinds of maps: mutable and immutable. By default, Scala uses the immutable Map; to use the mutable one, programmers have to import scala.collection.mutable.Map explicitly. When programmers want to use mutable and immutable maps together in the same program, the mutable map can be accessed as mutable.Map and the immutable map can simply be accessed as Map.
Which Scala library is used for functional programming?
Scalaz library has purely functional data structures that complement the standard Scala library. It has pre-defined set of foundational type classes like Monad, Functor, etc.
How can you deal with different types of seasonality in time series modelling?
Seasonality in a time series occurs when the series shows a repeated pattern over time. E.g., stationery sales decrease during the holiday season and air conditioner sales increase during the summer. Seasonality makes your time series non-stationary because the average value of the variable differs across time periods. Differencing a time series is generally regarded as the best method of removing seasonality from it. Seasonal differencing can be defined as the numerical difference between a particular value and a value with a periodic lag (i.e. 12, if monthly seasonality is present).
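Seasonal differencing is a one-line operation. The sketch below uses a synthetic series with a period of 4 plus a linear trend, chosen only to keep the example short; for monthly data the lag would be 12 as stated above.

```python
def seasonal_difference(series, lag=12):
    """Subtract from each value the value observed one season (lag) earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic data: a repeating pattern of length 4 plus an upward trend.
pattern = [5, 3, 8, 6]
series = [pattern[t % 4] + t for t in range(12)]
differenced = seasonal_difference(series, lag=4)
print(series)
print(differenced)  # the repeating pattern cancels out
```

After differencing, only the constant contribution of the trend over one season remains.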
Horizontal integration
Seeking ownership or increased control over competitors -We want to do this when there is a major benefit absorbing the competitor
Principles - Existing Control Activities
Select and Develop *C*ontrol *A*ctivities Select and Develop *T*echnology Controls Deployment of *P*olicies and Procedures
Systematic Sampling
Select one of the first k members randomly, and then every kth member after the selected one • k is the sample interval and equals the ratio N/n
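The procedure can be sketched as follows, assuming the population size N is an exact multiple of the sample size n so that k = N/n is an integer. The function name and the `rng` parameter are illustrative.

```python
import random


def systematic_sample(population, n, rng=random):
    """Pick a random start among the first k members, then every kth member."""
    k = len(population) // n  # sampling interval k = N / n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


population = list(range(100))  # hypothetical sampling frame, N = 100
sample = systematic_sample(population, n=10, rng=random.Random(0))
print(sample)
```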
What is selection bias? why is it important and how can you avoid it?
Selection bias, in general, is a problematic situation in which error is introduced due to a non-random population sample. For example, if a given sample of 100 test cases was made up of a 60/20/15/5 split of 4 classes which actually occur in relatively equal numbers in the population, then a model may make the false assumption that class frequency is a determining predictive factor. Avoiding non-random samples is the best way to deal with bias; however, when this is impractical, techniques such as resampling, boosting, and weighting can help deal with the situation.
Users of Financial Information
Shareholders/Investors Managers/directors Lenders Investment Analysts Government General Public Employees Customers Competitors Suppliers
Compliance with GAAS
Should not represent compliance with GAAS in auditor's report unless auditor has complied with all GAAS relevant to audit. If cannot be achieved, consider whether this prevents auditor from achieving the overall objectives of auditor and thereby requires the auditor to modify the opinion or withdraw from engagement. -GAAS does not override laws or regulations that govern an audit of FS. May conduct in accordance with GAAS and: -auditing standards by PCAOB - public -International standards on auditing - ISAs international -Government auditing standards - GAGAS -Auditing standards of specific jurisdiction
You created a predictive model of a quantitative outcome variable using multiple regressions. What are the steps you would follow to validate the model?
Since the question is about the post-model-building exercise, we will assume that you have already tested the null hypothesis, multicollinearity and the standard error of the coefficients. Once you have built the model, you should check the following: · Global F-test to see the significance of the group of independent variables on the dependent variable · R^2 · Adjusted R^2 · RMSE, MAPE. In addition to the quantitative metrics above you should also check: · Residual plot · Assumptions of linear regression
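The quantitative checks can be sketched from their textbook definitions. The observed values, the predictions and the number of predictors below are invented, and MAPE as written assumes no actual value is zero.

```python
from math import sqrt


def regression_fit_stats(y, y_hat, n_predictors):
    """Return (R^2, adjusted R^2, RMSE, MAPE in percent)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    rmse = sqrt(ss_res / n)
    mape = 100 / n * sum(abs((a - b) / a) for a, b in zip(y, y_hat))
    return r2, adj_r2, rmse, mape


y = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]      # hypothetical observations
y_hat = [11.0, 12.0, 13.0, 16.0, 19.0, 20.0]  # hypothetical model output
r2, adj_r2, rmse, mape = regression_fit_stats(y, y_hat, n_predictors=2)
print(r2, adj_r2, rmse, mape)
```

Adjusted R^2 is never larger than R^2 because it penalizes each additional predictor.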
owners earnings
Start with earnings Add back depreciation and amortization Add back non-cash charges Subtract maintenance capital expenditures If working capital increased, subtract change in working capital If working capital decreased, add change in working capital The difficult part of calculating owner's earnings is finding maintenance capital expenditures.
Income statement
Statement of operating results presented under the accrual basis of accounting
Reducing costs -
Step 1: Ask for a breakdown of costs. Step 2: If any cost seems out of line, investigate why. Step 3: Benchmark the competitors. Step 4: Determine whether there are any labor-saving technologies that would help reduce costs. Or investigate internal vs external costs: *Internal -union wages -suppliers -materials -economies of scale -increased support system *External -economy -interest rates -government relations -transportation/shipping strikes
Pricing Strategies Cost-Based Pricing
Take all of our costs and add them up, add profit to it This way you will know the break even point
What is TFIDF?
Term frequency-inverse document frequency. It is a weighting technique for text classification that answers the question: how important is a word to a document contained in a corpus?
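A minimal sketch of one common TF-IDF variant, assuming tf = count / document length and idf = log(N / document frequency). Libraries often use smoothed variants, so exact numbers differ between implementations; the tiny corpus is invented.

```python
from math import log


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    weights = []
    for doc in docs:
        scores = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            idf = log(n_docs / doc_freq[term])
            scores[term] = tf * idf
        weights.append(scores)
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

A word such as "the" that appears in every document gets weight 0, while a word unique to one document ("mat") gets the highest weight there.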
Example of feature engineering
Text files: bag of words 1. Each word is associated with a unique integer 2. For each document, the number of occurrences of each word is computed and stored in a matrix
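The two steps can be sketched as follows, assuming whitespace tokenization and integer ids assigned in order of first appearance; both choices and the sample documents are illustrative.

```python
def bag_of_words(docs):
    """Return (vocabulary, count matrix) for a list of text documents."""
    # Step 1: associate each distinct word with a unique integer.
    vocab = {}
    for doc in docs:
        for word in doc.split():
            vocab.setdefault(word, len(vocab))
    # Step 2: count occurrences of each word per document.
    matrix = [[0] * len(vocab) for _ in docs]
    for row, doc in enumerate(docs):
        for word in doc.split():
            matrix[row][vocab[word]] += 1
    return vocab, matrix


vocab, matrix = bag_of_words(["red fish blue fish", "one fish"])
print(vocab)   # {'red': 0, 'fish': 1, 'blue': 2, 'one': 3}
print(matrix)  # [[1, 2, 1, 0], [0, 1, 0, 1]]
```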
What is the Central Limit Theorem?
The Central Limit Theorem states that if we repeatedly draw samples of a sufficiently large size from a population, the distribution of the sample means will be approximately normal (as long as the observations are random and independent). This holds regardless of the distribution of the original population, provided it has finite variance. The main idea behind it is that it is expensive and impractical to measure the entire population, so we infer the characteristics of a population from a sample.
What's the F1 score? How would you use it?
The F1 score is a measure of a model's performance. It is the harmonic mean of the precision and recall of a model, with results tending to 1 being the best and those tending to 0 being the worst. You would use it in classification tests where true negatives don't matter much.
How would you approach the "Netflix Prize" competition?
The Netflix Prize was a famed competition in which Netflix offered $1,000,000 for a better collaborative filtering algorithm. The winning team, BellKor's Pragmatic Chaos, achieved a 10% improvement using an ensemble of different methods. Some familiarity with the case and its solution will help demonstrate you've paid attention to machine learning for a while.
What does the Statement of Cash flows show?
The actual cash received during the period and how that cash was spent during the period Includes cash in hand and demand deposits and overdrafts
What's the difference between probability and likelihood?
The answer depends on whether you are dealing with discrete or continuous random variables. Discrete random variables: Suppose that you have a stochastic process that takes discrete values (e.g., outcomes of tossing a coin 10 times, number of customers who arrive at a store in 10 minutes, etc.). In such cases, we can calculate the probability of observing a particular set of outcomes by making suitable assumptions about the underlying stochastic process (e.g., the probability of the coin landing heads is p and the coin tosses are independent). Denote the observed outcomes by O and the set of parameters that describe the stochastic process by θ. When we speak of probability we want to calculate P(O|θ): given specific values for θ, P(O|θ) is the probability that we would observe the outcomes represented by O. However, when we model a real-life stochastic process, we often do not know θ. We simply observe O, and the goal is to arrive at an estimate for θ that would be a plausible choice given the observed outcomes O. We know that given a value of θ the probability of observing O is P(O|θ). Thus, a 'natural' estimation process is to choose the value of θ that maximizes the probability that we would actually observe O. In other words, we find the parameter values θ that maximize the function L(θ|O)=P(O|θ). L(θ|O) is called the likelihood function. Notice that by definition the likelihood function is conditioned on the observed O and is a function of the unknown parameters θ. Continuous random variables: In the continuous case the situation is similar, with one important difference. We can no longer talk about the probability that we observed O given θ, because in the continuous case P(O|θ)=0. Without getting into technicalities, the basic idea is as follows: denote the probability density function (pdf) associated with the outcomes O as f(O|θ). In the continuous case we estimate θ given observed outcomes O by maximizing the function L(θ|O)=f(O|θ). In this situation, we cannot technically assert that we are finding the parameter value that maximizes the probability that we observe O, because what we maximize is the pdf associated with the observed outcomes O.
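For the coin-tossing case, the likelihood can be evaluated directly and maximized over a grid of candidate values. The observed outcome (7 heads in 10 tosses) and the grid resolution are assumptions made for this sketch; a grid search is used only for transparency, since the maximum is known in closed form (heads / tosses).

```python
from math import comb


def likelihood(p, heads, tosses):
    """L(p | O) = P(O | p) for `heads` heads in `tosses` independent tosses."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)


heads, tosses = 7, 10  # hypothetical observed outcomes O
grid = [i / 100 for i in range(1, 100)]  # candidate values of p
p_hat = max(grid, key=lambda p: likelihood(p, heads, tosses))
print(p_hat)  # the grid value with the highest likelihood
```

The same function read with p fixed and `heads` varying is a probability; read with the data fixed and p varying it is a likelihood.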
Audit Procedures
The auditor should perform procedures on any interrelated items as necessary. Examples are sales/receivables, inventory/payables, fixed assets/depreciation. *Audit of SHE* Specific elements based on SHE, auditor should perform procedures necessary to express an opinion on financial position because of interrelationship between SHE and BS *Audit of NI* Specific elements based on NI, auditor should perform procedures necessary to express an opinion on financial position and results of operations because of interrelationship between income, BS, and IS accounts
What is clustering?
The computers learn how to partition observations in various subsets. So each partition will be made of similar observations
Why do we call it GLM when it's clearly non-linear? (somewhat tricky question)
The "linear" in "generalized linear model" says the parameters enter the model linearly. Specifically, what's meant is that on the scale of the linear predictor η=g(μ), the model is of the form η=Xβ, which can in turn be fitted within the linear model framework by using the appropriate link function. "Logistic", on the other hand, refers to the description of a mean (that the mean is logistic in the predictors). It's not a GLM unless you combine it with a conditional distribution that's in the exponential family. When people say "logistic regression", however, they almost always mean a binomial model with logit link. That does have a mean that's logistic in the predictors, the model is linear in the parameters and is in the exponential family, so it is a GLM.
What is the Binomial Probability Formula?
The first variable in the binomial formula, n, stands for the number of times the experiment is performed. The second variable, p, represents the probability of one specific outcome. For example, let's suppose you wanted to know the probability of getting a 1 on a die roll. If you were to roll a die 20 times, the probability of rolling a one on any throw is 1/6. Roll twenty times and you have a binomial distribution of (n=20, p=1/6). SUCCESS would be "roll a one" and FAILURE would be "roll anything else." If the outcome in question was the die landing on an even number, the binomial distribution would then become (n=20, p=1/2). That's because your probability of throwing an even number is one half. Binomial distributions must also meet the following three criteria: (1) The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. This is common sense: if you toss a coin once, your probability of getting a tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very close to 100%. (2) Each observation or trial is independent. In other words, none of your trials have an effect on the probability of the next trial. (3) The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another. Once you know that your distribution is binomial, you can apply the binomial distribution formula to calculate the probability. What is a Binomial Distribution? The Bernoulli Distribution. The binomial distribution is closely related to the Bernoulli distribution. According to Washington State University, "If each Bernoulli trial is independent, then the number of successes in Bernoulli trials has a Binomial Distribution. On the other hand, the Bernoulli distribution is the Binomial distribution with n=1." A Bernoulli distribution is a set of Bernoulli trials.
Each Bernoulli trial has one outcome, chosen from S (success) or F (failure). In each trial, the probability of success, P(S) = p, is the same. The probability of failure is just 1 minus the probability of success: P(F) = 1 - p. (Remember that "1" is the total probability of an event occurring; probability is always between zero and 1.) Finally, all Bernoulli trials are independent from each other and the probability of success doesn't change from trial to trial, even if you have information about the other trials' outcomes. What is a Binomial Distribution? Real Life Examples. Many instances of binomial distributions can be found in real life. For example, if a new drug is introduced to cure a disease, it either cures the disease (it's successful) or it doesn't cure the disease (it's a failure). If you purchase a lottery ticket, you're either going to win money, or you aren't. Basically, anything you can think of that can only be a success or a failure can be represented by a binomial distribution. The Binomial Distribution Formula. A Binomial Distribution shows either (S)uccess or (F)ailure. The binomial distribution formula is: b(x; n, P) = nCx * P^x * (1 - P)^(n - x), where: b = binomial probability, x = total number of "successes" (pass or fail, heads or tails etc.), P = probability of a success on an individual trial, n = number of trials. Equivalently, P(x) = n! / ((n - x)! x!) * p^x * q^(n - x), where q = 1 - p.
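The formula above can be sketched in a few lines of Python. This is an illustrative helper (the name `binomial_pmf` is not from the source), using the die-roll and coin-toss examples from the answer.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = nCx * p^x * (1 - p)^(n - x): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Probability of rolling exactly one "1" in 20 die rolls (n=20, p=1/6).
p_exactly_one = binomial_pmf(1, 20, 1 / 6)

# Probability of at least one tails in 20 coin tosses: 1 minus the probability of zero tails.
p_at_least_one_tails = 1 - binomial_pmf(0, 20, 0.5)
```

Summing `binomial_pmf(x, n, p)` over x = 0..n gives 1, which is a quick sanity check on any implementation.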
what is a primary key?
The primary key is a unique identifier for each record in a table. E.g., in a Customer table, the Customer Number could be the primary key.
What is Collaborative filtering?
The process of filtering used by most of the recommender systems to find patterns or information by collaborating viewpoints, various data sources and multiple agents.
What is Machine Learning?
The simplest way to answer this question is: we give the data and an equation to the machine, and ask the machine to look at the data and identify the coefficient values in the equation. For example, for the linear regression y=mx+c, we give the data for the variables x and y, and the machine learns the values of m and c from the data.
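As a small illustration of "learning m and c from the data", here is a sketch of ordinary least squares for y=mx+c. The function name `fit_line` and the sample data are made up for the example.

```python
def fit_line(xs, ys):
    """Return (m, c) minimizing the sum of squared errors of y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x (both unnormalized).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    # The fitted line always passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c = fit_line(xs, ys)
```

With noise-free data the fit recovers the generating slope and intercept; with noisy data it returns the best compromise in the least-squares sense.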
What is a Monad in Scala?
The simplest way to define a monad is to relate it to a wrapper. Any class object can be wrapped with a monad in Scala. Just as you wrap a gift in a shiny wrapper with ribbons to make it look attractive, monads in Scala are used to wrap objects and provide two important operations: identity, through "unit" in Scala, and bind, through "flatMap" in Scala.
What are the two methods used for the calibration in Supervised Learning?
The two methods used for predicting good probabilities in Supervised Learning are a) Platt Calibration and b) Isotonic Regression. These methods are designed for binary classification, and extending them beyond it is not trivial.
Is it better to design robust or accurate algorithms?
The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before. The generalization performance of a learning system strongly depends on the complexity of the model assumed. If the model is too simple, the system can only capture the actual data regularities in a rough manner. In this case, the system has poor generalization properties and is said to suffer from underfitting. By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during the data collection process. In this case, the generalization capacity of the learning system is also poor. The learning system is said to be affected by overfitting. Spurious patterns, which are only present by accident in the data, tend to have complex forms. This is the idea behind the principle of Occam's razor for avoiding overfitting: simpler models are preferred if more complex models do not significantly improve the quality of the description for the observations. Quick response: Occam's razor. It depends on the learning task. Choose the right balance. Ensemble learning can help balance bias and variance (several weak learners together = strong learner).
How can you assess a good logistic model?
There are various methods to assess the results of a logistic regression analysis- • Using Classification Matrix to look at the true negatives and false positives. • Concordance that helps identify the ability of the logistic model to differentiate between the event happening and not happening. • Lift helps assess the logistic model by comparing it with random selection.
How do you ensure you're not overfitting with a model?
This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations. There are three main methods to avoid overfitting: 1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data. 2- Use cross-validation techniques such as k-folds cross-validation. 3- Use regularization techniques such as LASSO that penalize certain model parameters if they're likely to cause overfitting.
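To make method 2 concrete, here is a minimal sketch of how k-fold cross-validation splits the data. The helper `k_fold_indices` is illustrative; it assumes the rows are already in random order (no shuffling is done here) and leaves the model fitting and scoring to the caller.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Hypothetical dataset of 10 rows split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

Each row is used for validation exactly once and for training k-1 times; averaging the k validation scores gives the cross-validated estimate of out-of-sample performance.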
How can we use your machine learning skills to generate revenue?
This is a tricky question. The ideal answer would demonstrate knowledge of what drives the business and how your skills could relate. For example, if you were interviewing for music-streaming startup Spotify, you could remark that your skills at developing a better recommendation model would increase user retention, which would then increase revenue in the long run. The startup metrics Slideshare linked above will help you understand exactly what performance indicators are important for startups and tech companies as they think about revenue and growth.
What do you think of our current data process?
This kind of question requires you to listen carefully and impart feedback in a manner that is constructive and insightful. Your interviewer is trying to gauge if you'd be a valuable member of their team and whether you grasp the nuances of why certain things are set the way they are in the company's data process based on company- or industry-specific conditions. They're trying to see if you can be an intellectual peer. Act accordingly.
Middle Paragraph(s)
This paragraph contains the following: (1) All of the substantive reasons that lead the auditor to conclude that there has been a departure from GAAP. (2) Disclosure of the principal effects of the subject matter, if practicable.
How will you define the number of clusters in a clustering algorithm?
Though the clustering algorithm is not specified, this question is mostly asked in reference to K-Means clustering, where "K" defines the number of clusters. The objective of clustering is to group similar entities so that the entities within a group are similar to each other but the groups are different from each other. (Figure: K-Means clustering example showing three different groups.) The within-cluster sum of squares (WSS) is generally used to explain the homogeneity within a cluster. If you plot WSS for a range of numbers of clusters, you get a graph generally known as the Elbow Curve. (Figure: Elbow Curve of WSS against number of clusters, with the point at number of clusters = 6 circled in red.) The red-circled point in that graph, i.e. number of clusters = 6, is the point after which you don't see any meaningful decrease in WSS. This point is known as the bending point and is taken as K in K-Means. This is the most widely used approach, but a few data scientists also use hierarchical clustering first to create dendrograms and identify the distinct groups from there.
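A rough sketch of how the WSS values behind an Elbow Curve can be produced. To stay short it uses one-dimensional points and a deterministic start (evenly spaced quantiles), both of which are simplifying assumptions; the function name `kmeans_1d_wss` and the sample data are hypothetical.

```python
def kmeans_1d_wss(points, k, max_iter=100):
    """Run a basic Lloyd-style k-means on 1-D points and return the within-cluster sum of squares."""
    data = sorted(points)
    n = len(data)
    # Deterministic initialization: evenly spaced quantiles of the sorted data.
    centroids = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Update step: each centroid moves to the mean of its cluster (empty clusters stay put).
        new_centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum((x - centroids[j]) ** 2 for j, c in enumerate(clusters) for x in c)


# Hypothetical data with three well-separated groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wss_by_k = {k: kmeans_1d_wss(points, k) for k in range(1, 5)}
```

Plotting `wss_by_k` against k would show a sharp drop up to k = 3 and little improvement afterwards, which is the "elbow" the answer describes.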
Breakeven Analysis
Total Fixed Costs / Contribution Margin per Unit
Ratio of Liabilities to Stockholders' Equity = ? / ?
Total Liabilities / Total Stockholders' Equity
What are the phases of supervised machine learning?
Training phase, validation phase, test phase, application
What is binning?
Binning transforms a continuous feature into a discrete one.
Reducing Cost Cost Analysis- Internal Elements
Union Wages Suppliers Materials Economies of Scale Increase Support System
what is WSDL?
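WSDL (Web Services Description Language) is an XML-based language for describing a web service: the operations it offers, the messages it exchanges, and where it can be reached. Client programs, written in a language such as Java, use the WSDL description to access data from the web service.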
Ridge regression:
We use an L2 penalty when fitting the model using least squares. We add to the minimization problem an expression (shrinkage penalty) of the form λ×∑(coefficients²). λ: tuning parameter; controls the bias-variance tradeoff; assessed with cross-validation. A bit faster than the lasso. β̂_ridge = argmin_β { ∑_i (y_i − β_0 − ∑_j x_ij β_j)^2 + λ ∑_j β_j^2 }
What are the benefits and drawbacks of specific methods such as ridge regression?
We use an L2 penalty when fitting the model using least squares. We add to the minimization problem an expression (shrinkage penalty) of the form λ×∑(coefficients²). λ: tuning parameter; controls the bias-variance tradeoff; assessed with cross-validation. A bit faster than the lasso. β̂_ridge = argmin_β { ∑_{i=1..n} (y_i − β_0 − ∑_{j=1..p} x_ij β_j)^2 + λ ∑_{j=1..p} β_j^2 }
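To see the shrinkage at work, here is a sketch of the ridge objective above for the special case of a single predictor (p = 1), where the minimizer has a simple closed form. The intercept β_0 is left unpenalized, as in the formula; the function name `ridge_fit_1d` and the data are illustrative.

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize sum((y - b0 - b1*x)^2) + lam * b1^2 and return (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The penalty adds lam to the denominator, pulling the slope toward zero.
    b1 = sxy / (sxx + lam)
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
ols_fit = ridge_fit_1d(xs, ys, lam=0.0)    # lam = 0 reduces to ordinary least squares
ridge_fit = ridge_fit_1d(xs, ys, lam=5.0)  # a larger lam shrinks the slope
```

Increasing `lam` trades variance for bias: the slope moves toward zero and the intercept moves toward the mean of y. In practice `lam` would be chosen by cross-validation.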
Developing a New Product 1. Think about the Product
What is proprietary or special about it? Is the product patented? for how long? Are there similar products? Substitutes? What are the advantages or disadvantages of the new product? How does this new product fit in with the rest of our product line? Can our sales force sell it?
What questions can help us in quote to cash?
What percentage of quotes convert to sales orders? What percentage of customers don't pay their bills? How much of a discount does GBI give each year to customers who take advantage of discounted payment terms? What is the average delay in delivery? What percentage of products are damaged during shipment?
P-value?
When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. Hypothesis tests are used to test the validity of a claim that is made about a population. This claim that's on trial, in essence, is called the null hypothesis. The alternative hypothesis is the one you would believe if the null hypothesis is concluded to be untrue. The evidence in the trial is your data and the statistics that go along with it. All hypothesis tests ultimately use a p-value to weigh the strength of the evidence (what the data are telling you about the population). The p-value is a number between 0 and 1 and interpreted in the following way: A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis. A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
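As an illustration, the sketch below computes a two-sided p-value for a test statistic that follows a standard normal distribution under the null hypothesis (a z-test). That distributional assumption, and the function names, are choices made for the example; other tests use other reference distributions.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def two_sided_p_value(z: float) -> float:
    """Probability, under the null, of a statistic at least as extreme as z in either tail."""
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Hypothetical test statistic of z = 2 (two standard deviations from the null value).
p = two_sided_p_value(2.0)
reject_null = p <= 0.05
```

A z-score near 1.96 sits right at the conventional 0.05 cutoff, which is why "about two standard deviations" is the usual rule of thumb for significance.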
Starting a New Business 1. Initial Questions
Who is our competition? What market share does each competitor have? How do competitors' products or services differ from ours? Are there barriers to enter or exit?
New Business Market Elements
Who is the competition? What is their market share? Products comparison? Barriers to entry?
Turnarounds 1. Analyze the Company and the Industry
Why is it failing? Bad products or services? bad management? bad economy? Are the competitors facing the same problem? Do we have access to capital? Is the company publicly-traded or privately held?
For K₁>K₂>K₃: [C(S,K₁,T)-C(S,K₂,T)]/[K₁-K₂] ? [C(S,K₂,T)-C(S,K₃,T)]/[K₂-K₃]
≥
Risk-Neutral Pricing (p*)
[exp[(r-∂)h]-d]/[u-d]
How do you take millions of users with 100's of transactions each, amongst 10k's of products, and group the users together in meaningful segments?
1. Some exploratory data analysis (get a first insight): transactions by date; count of customers vs number of items bought; total items vs total basket per customer; total items vs total basket per area. 2. Create new features (per customer). Counts: total baskets (unique days), total items, total spent, unique product ids. Distributions: items per basket, spent per basket, product ids per basket, duration between visits. Product preferences: proportion of items per product category per basket. 3. Too many features: dimension reduction? PCA? 4. Clustering: PCA. 5. Interpreting model fit: view the clustering by principal component axis pairs (PC1 vs PC2, PC2 vs PC1). Interpret each principal component with regard to the linear combination it is obtained from.
Random Note
Omitting the statement of cash flows results in a qualified opinion.
Dominos: There is an 8x8 chessboard in which two diagonally opposite corners have been cut off. You are given 31 dominoes, and a single domino can cover exactly two squares. Can you use the 31 dominoes to cover the entire board? Prove your answer (by providing an example or showing why it's impossible).
At first, it seems like this should be possible. It's an 8 x 8 board, which has 64 squares, but two have been cut off, so we're down to 62 squares. A set of 31 dominoes should be able to fit there, right? When we try to lay down dominoes on row 1, which only has 7 squares, we may notice that one domino must stretch into row 2. Then, when we try to lay down dominoes onto row 2, again we need to stretch a domino into row 3. For each row we place, we'll always have one domino that needs to poke into the next row. No matter how many times and ways we try to solve this issue, we won't be able to successfully lay down all the dominoes. There's a cleaner, more solid proof for why it won't work. The chessboard initially has 32 black and 32 white squares. By removing opposite corners (which must be the same color), we're left with 30 of one color and 32 of the other color. Let's say, for the sake of argument, that we have 30 black and 32 white squares. Each domino we set on the board will always take up one white and one black square. Therefore, 31 dominoes will take up exactly 31 white squares and 31 black squares. On this board, however, we must have 30 black squares and 32 white squares. Hence, it is impossible.
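The counting step of the coloring proof can be checked with a short sketch. The function name and the convention that a square at (row, column) has color (row + column) mod 2 are assumptions made for the illustration.

```python
def remaining_color_counts(size: int = 8):
    """Count squares of each color left after removing two diagonally opposite corners."""
    removed = {(0, 0), (size - 1, size - 1)}
    counts = [0, 0]
    for row in range(size):
        for col in range(size):
            if (row, col) not in removed:
                # Adjacent squares always differ in (row + col) parity, like chessboard colors.
                counts[(row + col) % 2] += 1
    return counts


counts = remaining_color_counts()
# A domino always covers one square of each color, so a full tiling needs equal counts.
tiling_possible = counts[0] == counts[1]
```

Both removed corners have even (row + column), so they share a color, and the two counts can never be balanced by dominoes that each take one square of each color.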
What's your favorite algorithm, and can you explain it to me in less than a minute?
This type of question tests your understanding of how to communicate complex and technical nuances with poise and the ability to summarize quickly and efficiently. Make sure you have a choice and make sure you can explain different algorithms so simply and effectively that a five-year-old could grasp the basics!
What is deep learning, and how does it contrast with other machine learning algorithms?
Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data. In that sense, deep learning represents an unsupervised learning algorithm that learns representations of data through the use of neural nets.
Grow and Increasing Sales 1. Learn about the company and its size, resources and products
How big is it? What products does it have? Is it a market leader in this field? What is its objective (profits, market share, or brand positioning)? Is the company in charge of its own pricing strategies, or is it reacting to suppliers, the market, or competition? How do its products compare to the competition? Are there substitutes or alternatives? Where is the product in its growth cycle? Is there a supply-and-demand issue at work?
Discuss the meaning of the ROC curve, and write pseudo-code to generate the data for such a curve.
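The card leaves the answer blank. As one possible sketch: a ROC curve plots the true positive rate against the false positive rate as the classification threshold is swept across the model's scores. The function name `roc_points`, the labels, and the scores below are all made up for illustration, and the code assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (false_positive_rate, true_positive_rate) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive whenever the score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical true labels (1 = positive) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
```

A curve that hugs the top-left corner means positives are ranked above negatives; the diagonal from (0, 0) to (1, 1) corresponds to random guessing.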
your goal should be to get your contact costs as ________ as possible, and your take rate as _________ as possible
low, high
maximum drawdown
biggest decline a stock has suffered as measured from stock price high to low
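A short sketch of the calculation, expressing the drawdown as a fraction of the running peak (a common convention, assumed here). The function name and the price series are hypothetical.

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the peak, over a price series."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        # Track the highest price seen so far, then measure the fall from it.
        peak = max(peak, price)
        worst = max(worst, (peak - price) / peak)
    return worst


# Hypothetical closing prices: the high of 120 is followed by a low of 80.
prices = [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]
drawdown = max_drawdown(prices)
```

Only declines that come after a peak count, so the later recovery to 130 does not reduce the reported drawdown.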
Differentiate between univariate, bivariate and multivariate analysis.
These are descriptive statistical analysis techniques which can be differentiated based on the number of variables involved at a given point in time. For example, a pie chart of sales by territory involves only one variable and can be referred to as univariate analysis. If the analysis attempts to understand the relationship between 2 variables at a time, as in a scatterplot, then it is referred to as bivariate analysis. For example, analysing sales volume against spending is bivariate analysis. Analysis that deals with more than two variables, to understand the effect of variables on the responses, is referred to as multivariate analysis.
Differentiate between univariate, bivariate and multivariate analysis.
Univariate analysis is concerned with understanding one variable. For example, we can make a pie chart to analyze the breakdown of market share of coffee chains (e.g. Starbucks, Timmies, McDonald's, et al.). Bivariate analysis is concerned with the relationship between 2 variables, such as sales as a function of dollars spent on advertising.
How would you create a taxonomy to identify key customer trends in unstructured data?
business owner, accuracy, results
offer a "free" repeat product/service that is of ______ cost to you, but _______ perceived value to your customer
low, high
Operating activities reflected in statement of cash flows
cash generated from operations interest paid income tax paid
simple moving average
Used to decide hold vs sell, typically with the 200-day moving average: stocks trading above their 200-day moving average tend to have higher returns.
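A minimal sketch of the calculation. A 3-day window on made-up prices is used so the numbers are easy to follow; the card's 200-day version only changes the `window` argument.

```python
def simple_moving_average(prices, window):
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0 or window > len(prices):
        return []
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


# Hypothetical closing prices with a 3-day window.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
sma = simple_moving_average(prices, 3)

# Compare the latest price with the latest average, as in the hold-vs-sell rule of thumb.
trading_above_average = prices[-1] > sma[-1]
```

The first average only becomes available once `window` prices have been seen, so the output is shorter than the input by `window - 1` values.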
earnings per share
earnings divided by total outstanding shares
product-market 2x2
existing products & existing markets: market penetration new products & existing markets: product development existing products & new markets: market development new products & new markets: diversify
You have a basketball hoop and someone says that you can play one of two games. Game 1: You get one shot to make the hoop. Game 2: You get three shots and you have to make two of three shots. If p is the probability of making a particular shot, for which values of p should you pick one game or the other?
Probability of winning Game 1: the probability of winning Game 1 is p, by definition. Probability of winning Game 2: let s(k, n) be the probability of making exactly k shots out of n. The probability of winning Game 2 is the probability of making exactly two shots out of three OR making all three shots. In other words: P(winning) = s(2, 3) + s(3, 3). The probability of making all three shots is s(3, 3) = p^3. The probability of making exactly two shots is P(making 1 and 2, and missing 3) + P(making 1 and 3, and missing 2) + P(missing 1, and making 2 and 3) = p * p * (1 - p) + p * (1 - p) * p + (1 - p) * p * p = 3(1 - p)p^2. Adding these together, we get: p^3 + 3(1 - p)p^2 = p^3 + 3p^2 - 3p^3 = 3p^2 - 2p^3. Which game should you play? You should play Game 1 if P(Game 1) > P(Game 2): p > 3p^2 - 2p^3. Dividing by p (for p > 0): 1 > 3p - 2p^2, so 2p^2 - 3p + 1 > 0, that is, (2p - 1)(p - 1) > 0. Both terms must be positive, or both must be negative. But we know p < 1, so p - 1 < 0. This means both terms must be negative: 2p - 1 < 0, so 2p < 1 and p < 0.5. So, we should play Game 1 if 0 < p < 0.5 and Game 2 if 0.5 < p < 1. If p = 0, 0.5, or 1, then P(Game 1) = P(Game 2), so it doesn't matter which game we play.
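The conclusion can be checked numerically with a short sketch. The function names are illustrative; `better_game` returns 0 when the two games are equally good.

```python
def p_win_game1(p: float) -> float:
    """Game 1: make a single shot."""
    return p


def p_win_game2(p: float) -> float:
    """Game 2: make at least two of three shots, i.e. 3p^2 - 2p^3."""
    return 3 * p**2 - 2 * p**3


def better_game(p: float) -> int:
    """Return 1 or 2 for the game with the higher win probability, or 0 for a tie."""
    g1, g2 = p_win_game1(p), p_win_game2(p)
    if g1 > g2:
        return 1
    if g2 > g1:
        return 2
    return 0


weak_shooter = better_game(0.3)    # below 0.5
strong_shooter = better_game(0.7)  # above 0.5
```

Intuitively, a weak shooter prefers the game that needs only one lucky shot, while a strong shooter prefers more attempts because extra shots reduce the role of luck.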
Data is spread across all the nodes of a cluster; how does Spark try to process this data?
By default, Spark tries to read data into an RDD from the nodes that are close to it. Since Spark usually accesses distributed partitioned data, it creates partitions to hold the data chunks in order to optimize transformation operations.
Number of Times Interest Charges Are Earned = ( ? + ? ) / ?
(income before income tax + interest expense) / interest expense
Qualified Opinion vs. Adverse Opinion: A qualified opinion should be expressed when the auditor concludes that misstatements, individually or in the aggregate, are material but not pervasive to the financial statements.
An adverse opinion should be expressed when the auditor concludes that misstatements, individually or in the aggregate, are both material and pervasive to the financial statements.
"a strong brand drives ______________ _________________ in purchasing"
initial preference
What information do shareholders want?
investment prospects, mgmt performance
r- square
is a measure, from 0 to 100, of how well the data fit a linear regression; higher is better
XML- extensible markup language
is a method of tagging or coding data in documents, so that they can be read by both people and computers.
Management Accounting
is primarily concerned with the provision of information to managers. Deals with current problems and looking ahead, unlike financial accounting
that is
it tests whether the slope of the regression line is zero, H0: β=0 and Ha: β≠0. If the coefficient's p-value is less than 0.05, we reject the null hypothesis and conclude that we have sufficient evidence to be 95% confident that there is a significant linear relationship between the dependent and independent variables. Note that the p-value and R² provide different information. A linear relationship can be significant (have a low p-value) but not explain a large percentage of the variation (not have a high R²). A confidence interval associated with an independent variable's coefficient indicates the likely range for that coefficient. If the 95% confidence interval does not contain zero, we can be 95% confident that there is a significant linear relationship between the variables. Residual plots can provide important insights into whether a linear model is a good fit. Each observation in a data set has a residual equal to the historically observed value minus the regression's predicted value, that is, ε=y−ŷ. Linear regression models assume that the regression's residuals follow a normal distribution with a mean of zero and fixed variance. We can also perform regression analyses using qualitative, or categorical, variables. To do so, we must convert data to dummy (0, 1) variables. After that, we can proceed as we would with any other regression analysis. A dummy variable is equal to 1 when the variable of interest fits a certain criterion. For example, a dummy variable for "Female" would equal 1 for all female observations and 0 for male observations.
Return on total assets
net income available to common stockholders/total assets
Discuss MapReduce (or your favorite parallelization abstraction). Why is MapReduce referred to as a "shared-nothing" architecture (clearly the nodes have to share something, no?) What are the advantages/disadvantages of "shared-nothing"?
What is probabilistic merging (aka fuzzy merging)? Is it easier to handle with SQL or other languages?
on A the key is first name/lastname in some char set
debt
refers to bonds, credit lines and other borrowings
what is artificial Intelligence?
refers to the development of computer technologies that can reason and otherwise function in manners similar to humans. eg. self driving cars, robots
Poset
reflexive ⋀ antisymmetric ⋀ transitive
Explain what resampling methods are and why they are useful
Resampling means repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. Example: repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ. The most common methods are cross-validation and the bootstrap. Cross-validation: random sampling without replacement. Bootstrap: random sampling with replacement. Cross-validation is used for evaluating model performance and for model selection (selecting the appropriate level of flexibility). The bootstrap is mostly used to quantify the uncertainty associated with a given estimator or statistical learning method.
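A small sketch of the bootstrap, assuming the estimator of interest is the sample mean. The function name, the seed, and the sample values are illustrative.

```python
import random
from statistics import mean, stdev


def bootstrap_standard_error(sample, n_resamples=1000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_resamples):
        # Draw a resample of the same size as the original, with replacement.
        resample = [rng.choice(sample) for _ in sample]
        estimates.append(mean(resample))
    # The spread of the resampled means approximates the estimator's uncertainty.
    return stdev(estimates)


# Hypothetical sample of eight observations.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
se = bootstrap_standard_error(sample)
```

The same loop works for estimators with no simple standard-error formula (a median, a regression coefficient); only the line computing the estimate changes.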
The ________ is a rough measure of the length of time it takes to purchase, sell, and replace the inventory
number of days' sales in inventory
arithmetic growth rate
simple average of returns (and wrong)
BCG matrix
stars: high market share and high industry growth question mark: low market share and high growth cash cow: high market share and low industry growth dog/pet: low market share and low growth management should BUILD question marks, HARVEST cash cows, HOLD stars, DIVEST dogs/pets
case scenarios
strategy 1. entering new market 2. industry analysis 3. mergers and acquisitions 4. developing new product 5. pricing strategies 6. growth strategies 7. starting a new business 8. competitive response operations 9. increasing sales 10. reducing costs 11. improving bottom line (profitability) 12. turnarounds
what are the techniques of processing unstructured data?
tagged data, natural language processing, image recognition, and artificial intelligence.
Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Qualified Opinion Paragraph. When the auditor expresses a qualified opinion due to a material misstatement in the financial statements, the opinion paragraph should state that, in the auditor's opinion,
EXCEPT FOR the effects of the matter(s) described in the basis for qualified opinion paragraph, the financial statements are PRESENTED FAIRLY, in all material respects, in accordance with the applicable financial reporting framework.
For such corporations, the relative risk of the debtholders is normally measured as the _______ (during the year), sometimes called the fixed charge coverage ratio.
number of times interest charges are earned
For American options, the value at each node is
max(calculated value, exercise value)
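A sketch of how this rule fits into a binomial tree for a put option, using the risk-neutral probability p* = [exp((r-∂)h) - d]/[u - d] from the card above. The up and down factors u and d are taken as inputs rather than derived, and the function name and numbers are hypothetical.

```python
from math import exp


def binomial_put(s0, k, r, delta, u, d, h, steps, american=True):
    """Price a put on a recombining binomial tree with `steps` periods of length h."""
    p_star = (exp((r - delta) * h) - d) / (u - d)  # risk-neutral probability of an up move
    discount = exp(-r * h)
    # Payoffs at expiry; index j is the number of up moves.
    values = [max(k - s0 * u**j * d ** (steps - j), 0.0) for j in range(steps + 1)]
    for step in range(steps - 1, -1, -1):
        new_values = []
        for j in range(step + 1):
            # Calculated value: discounted risk-neutral expectation of the next step.
            calculated = discount * (p_star * values[j + 1] + (1 - p_star) * values[j])
            if american:
                exercise = k - s0 * u**j * d ** (step - j)
                calculated = max(calculated, exercise)  # max(calculated value, exercise value)
            new_values.append(calculated)
        values = new_values
    return values[0]


# Hypothetical one-period example: deep in-the-money put, so early exercise matters.
american_value = binomial_put(100.0, 150.0, 0.05, 0.0, 1.2, 0.8, 1.0, 1, american=True)
european_value = binomial_put(100.0, 150.0, 0.05, 0.0, 1.2, 0.8, 1.0, 1, american=False)
```

Because the American holder can always choose not to exercise early, the American value is never below the European value computed on the same tree.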
that is
they support the organization's business functions. Concurrent— Small uniform transactions Optimized for storage—
If a value represents the 99th percentile, this means that
99% of all values are below this value
and so on. We can solve this problem multiple ways. Logically If the earlier sum is 1
this would mean that the gender ratio is even. Families contribute exactly one girl and on average one boy. The birth policy is therefore ineffective. Does this make sense? At first glance, this seems wrong. The policy is designed to favor girls as it ensures that all families have a girl. On the other hand, the families that keep having children contribute (potentially) multiple boys to the population. This could offset the impact of the "one girl" policy. One way to think about this is to imagine that we put the gender sequences of all families into one giant string. So if family 1 has BG, family 2 has BBG, and family 3 has G, we would write BGBBGG. In fact, we don't really care about the groupings of families because we're concerned about the population as a whole. As soon as a child is born, we can just append its gender (B or G) to the string. What are the odds of the next character being a G? Well, if the odds of having a boy and a girl are the same, then the odds of the next character being a G are 50%. Therefore, roughly half of the string should be Gs and half should be Bs, giving an even gender ratio. This actually makes a lot of sense. Biology hasn't been changed. Half of newborn babies are girls and half are boys. Abiding by some rule about when to stop having children doesn't change this fact. Therefore, the gender ratio is 50% girls and 50% boys.
what are the types of hierarchy tables
time dependent, time independent, version dependent, interval dependent
Why data cleaning plays a vital role in analysis?
time take, 80%
What is precision?
tp / (tp + fp)
what are the types of data sources?
transactional systems; informational systems; Excel spreadsheets; ERP: integrated Enterprise Resource Planning systems such as SAP and Oracle
Is Naïve Bayes bad? If yes, under what aspects?
any of the following would indicate poor estimation or multi-collinearity: opposite signs of expectations
unusually large or small values, or observed inconsistency when the model is fed new data. Use the model for prediction by feeding it new data, and use the coefficient of determination (R squared) as a model validity measure. Use data splitting to form a separate dataset for estimating model parameters, and another for validating predictions. Use jackknife resampling if the dataset contains a small number of instances, and measure validity with R squared and mean squared error (MSE).
Blue-Eyed Island: A bunch of people are living on an island, when a visitor comes with a strange order: all blue-eyed people must leave the island as soon as possible. There will be a flight out at 8:00pm every evening. Each person can see everyone else's eye color, but they do not know their own (nor is anyone allowed to tell them). Additionally, they do not know how many people have blue eyes, although they do know that at least one person does. How many days will it take the blue-eyed people to leave?
Let's apply the Base Case and Build approach. Assume that there are n people on the island and c of them have blue eyes. We are explicitly told that c > 0. Case c = 1: Exactly one person has blue eyes. Assuming all the people are intelligent, the blue-eyed person should look around and realize that no one else has blue eyes. Since he knows that at least one person has blue eyes, he must conclude that it is he who has blue eyes. Therefore, he would take the flight that evening. Case c = 2: Exactly two people have blue eyes. The two blue-eyed people see each other, but are unsure whether c is 1 or 2. They know, from the previous case, that if c = 1, the blue-eyed person would leave on the first night. Therefore, if the other blue-eyed person is still there, he must deduce that c = 2, which means that he himself has blue eyes. Both men would then leave on the second night. Case c > 2: The General Case. As we increase c, we can see that this logic continues to apply. If c = 3, then those three people will immediately know that there are either 2 or 3 people with blue eyes. If there were two people, then those two people would have left on the second night. So, when the others are still around after that night, each person would conclude that c = 3 and that they, therefore, have blue eyes too. They would leave that night. This same pattern extends up through any value of c. Therefore, if c men have blue eyes, it will take c nights for the blue-eyed men to leave. All will leave on the same night.
What is a Task
with regard to Spark job execution?,Ans: A task is an individual unit of work for executors to run. It is an individual unit of physical execution (computation) that runs on a single machine over part of your Spark application's data. All tasks in a stage should be completed before moving on to another stage. -A task can also be considered a computation in a stage on a partition in a given job attempt. -A task belongs to a single stage and operates on a single partition (a part of an RDD). -One task is spawned for each data partition in each stage.
When looking at long term assets and long term liabilities
you are analyzing,Solvency
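Nonlinear relationships can have a (Pearson) correlation close to
zero, even when the variables are strongly dependent (e.g. y = x^2 over a range symmetric around 0). Correlation only measures linear association.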
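A quick numerical check of this card, as a sketch: the helper `pearson` and the two sample relationships (y = x^2 and y = e^x) are illustrative choices, not from the card. The symmetric relationship gives a correlation of essentially zero, while a monotone nonlinear one does not, so "close to zero" is a possibility rather than a rule.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# x values evenly spaced on [-5, 5], symmetric around 0.
xs = [i / 10 for i in range(-50, 51)]

r_symmetric = pearson(xs, [x ** 2 for x in xs])      # y is fully determined by x
r_monotone = pearson(xs, [math.exp(x) for x in xs])  # nonlinear but increasing

print(f"y = x^2 : r = {r_symmetric:.6f}")
print(f"y = e^x : r = {r_monotone:.6f}")
```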
Call-Put Option Relationship: Theta
θcall-θput=[δSe^(-δT)-rKe^(-rT)]/365
Call-Put Option Relationship: Rho
ρcall-ρput=0.01TKe^(-rT)
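Both relationships follow from differentiating put-call parity, C - P = Se^(-δT) - Ke^(-rT), and can be sanity-checked numerically. The sketch below assumes Black-Scholes prices with a continuous dividend yield (called `delta` in the code), theta quoted per day and rho quoted per 1% change in r; the parameter values are arbitrary illustrations.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_minus_put(S, K, r, delta, sigma, T):
    """Black-Scholes call price minus put price (same strike and expiry)."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = S * math.exp(-delta * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    put = K * math.exp(-r * T) * norm_cdf(-d2) - S * math.exp(-delta * T) * norm_cdf(-d1)
    return call - put


# Illustrative inputs (assumptions, not from the cards).
S, K, r, delta, sigma, T = 100.0, 95.0, 0.05, 0.02, 0.25, 0.75
h = 1e-5  # step for central finite differences

# Theta per day: change in value as time passes, i.e. minus d/dT, divided by 365.
theta_numeric = (
    call_minus_put(S, K, r, delta, sigma, T - h)
    - call_minus_put(S, K, r, delta, sigma, T + h)
) / (2 * h) / 365
theta_formula = (delta * S * math.exp(-delta * T) - r * K * math.exp(-r * T)) / 365

# Rho per 1% change in r: d/dr scaled by 0.01.
rho_numeric = 0.01 * (
    call_minus_put(S, K, r + h, delta, sigma, T)
    - call_minus_put(S, K, r - h, delta, sigma, T)
) / (2 * h)
rho_formula = 0.01 * T * K * math.exp(-r * T)

print(f"theta difference: numeric {theta_numeric:.8f}, formula {theta_formula:.8f}")
print(f"rho difference:   numeric {rho_numeric:.8f}, formula {rho_formula:.8f}")
```

The volatility σ drops out of both differences, since neither side of put-call parity depends on it.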
Schroder Volatility Adjustment
σ(F)=σ(S)×S/F (Adjust ONLY if given historical σ)
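A one-line illustration of the adjustment, as a sketch. It assumes F denotes the prepaid forward price of the stock (the stock price less the present value of dividends paid before expiry); the card itself does not say so, and the numbers are made up.

```python
def schroder_adjusted_vol(sigma_stock, stock_price, prepaid_forward):
    """Scale the stock's volatility by S / F, as in the card's formula.

    Assumption: `prepaid_forward` is the prepaid forward price of the stock.
    """
    return sigma_stock * stock_price / prepaid_forward


# Hypothetical inputs: S = 50, present value of dividends = 2, so F = 48.
sigma_f = schroder_adjusted_vol(sigma_stock=0.30, stock_price=50.0, prepaid_forward=48.0)
print(f"adjusted volatility: {sigma_f:.4f}")
```

Because F is below S when dividends are paid, the adjusted volatility comes out slightly higher than the historical stock volatility.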
Volatility Relationship: Option vs Underlying Asset
σ(option)=σ(stock)×|Ω|, where Ω is the option's elasticity
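A short sketch of how this relationship is used in practice. It assumes the usual definition of elasticity, Ω = S × Δ / (option price), with Black-Scholes prices and deltas under a continuous dividend yield; the definition of Ω and all input values are assumptions added here for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def elasticity_and_vol(S, K, r, delta, sigma, T, kind):
    """Return (elasticity, option volatility) for a Black-Scholes call or put.

    Assumes elasticity = S * option delta / option price.
    """
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if kind == "call":
        price = S * math.exp(-delta * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
        option_delta = math.exp(-delta * T) * norm_cdf(d1)
    elif kind == "put":
        price = K * math.exp(-r * T) * norm_cdf(-d2) - S * math.exp(-delta * T) * norm_cdf(-d1)
        option_delta = -math.exp(-delta * T) * norm_cdf(-d1)
    else:
        raise ValueError("kind must be 'call' or 'put'")
    omega = S * option_delta / price
    return omega, sigma * abs(omega)


# Illustrative inputs (assumptions, not from the card).
params = dict(S=100.0, K=100.0, r=0.05, delta=0.0, sigma=0.20, T=1.0)
omega_call, vol_call = elasticity_and_vol(kind="call", **params)
omega_put, vol_put = elasticity_and_vol(kind="put", **params)

print(f"call: elasticity {omega_call:.3f}, volatility {vol_call:.3f}")
print(f"put:  elasticity {omega_put:.3f}, volatility {vol_put:.3f}")
```

The put's elasticity is negative, which is why the card takes the absolute value: volatility itself cannot be negative.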
Revenue recognition
-Accrual basis -Cash basis
Net Income/Cash Flow *Forecast Sales*
-Begin with economy's GDP forecast (x) -Company growth rate based on market share analysis: -retain share => company growth rate = industry -gain share => > industry -lose share => < industry
Focus strategies
-Depends on an industry segment that is of sufficient size, has good growth potential, and is not crucial to the success of other major competitors -Most effective when consumers have distinctive preferences
Financial statement analysis
-Equity investment decisions -Credit decisions -Review a supplier, customer, or a competitor -Audit/consulting engagement planning process -Corporate acquisitions and consolidations -Internal company review -Valuation engagement -A review of the past and projection of the future
Accrual basis
-Expenses recognized when they are incurred, regardless of when they are paid -Revenues recognized when realized, regardless of when cash is collected
Net income/Cash Flow *Top-down Approach*
-Forecast Sales -Now estimate income/cash flow
Intensive strategies
-Has to do with current products or services -Market penetration, market development, product development
A review of the past and a projection of the future: Is the company going to continue indefinitely?
-Liquidity analysis -Solvency analysis -Profitability analysis
Long-term assets
-Long-term investments -PP&E -Intangible assets -Other assets
Objectives should be:
-Specific -Measurable -Achievable -Relevant -Time bound -Congruent among organizational units
Balance sheet
-Statement of financial position -Current assets vs long term assets -Shows the economic resources of the business, then shows how the acquisition of those resources was financed in 1 of 3 ways: 1. Creditors 2. Owner contributions 3. Prior years' earnings
Synergies of related diversification
-Transferring competitively valuable expertise or other capabilities from one business to another -Combining the related activities of separate businesses into a single operation to achieve lower costs -Exploiting use of a known brand name -Using cross-business collaboration to create strengths
Analyzing performance measures by organizational level
-Trying to create long and short term objectives -At higher level, focusing more on long-term -At lower level, focusing more on short-term
Screening *Limits*
-When screening with ratios, they are not adjusted for GAAP vs IFRS differences -Back-testing may not be relevant for future periods
Assessing Credit Quality *4 C's of credit*
1) Character - quality of management 2) Capacity - ability to pay 3) Collateral - assets pledged 4) Covenants - limitations/restrictions
Footnotes
Current value of inventory (LIFO/FIFO)
Liquidity analysis
Focusing on current assets and current liabilities
Solvency analysis
Focusing on long term assets and long term liabilities
Full disclosure principle
If info would make a difference in an economic decision, that piece of info must be in a report
Conservatism constraint
Info being analyzed is more conservative than optimistic -Company anticipates losses, not gains
Most important and most closely scrutinized by internal and external representatives:
Management's discussion and analysis
Current
Paid in upcoming year, or operating cycle of business, whichever is longer (usually 1 yr) -If this is not met, it is a long term liability
Cost leadership
Producing standardized products at a very low cost for consumers who are price-sensitive -Low end and traditional
Product development
Seeking increased sales by improving present products/services or developing new ones
Income from continuing operations
Where the company shows normal and recurring revenues, expenses, gains, and losses -Most important part of the income statement for financial analysts
When looking at current assets and current liabilities
you are looking at,Liquidity