SPI Interview

A random variable X follows a normal distribution. An observation of X, denoted by x, has a z-score of 2. It means that

x is two standard deviations above the mean of X
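
To make the z-score card concrete, here is a minimal sketch that standardizes an observation. The mean of 100 and standard deviation of 15 are made-up values chosen for illustration, not figures from the card.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


# Hypothetical distribution: mean 100, standard deviation 15.
z = z_score(130.0, 100.0, 15.0)
```

A positive result means the observation is above the mean, a negative one below it.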

Long term sources of finance for private firm

-Listing on Stock Exchange, Private equity -Bank

Formula: ∆put

-e^(-∂T)*N(-d1)

if product is in growth stage

-emphasize marketing and competition

EFE matrix steps:

1. List key external factors 2. Weight from 0-1 3. Rate effectiveness of current strategies 4. Multiply weight * rating 5. Sum weighted scores

110010

62
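
The six-bit cards in this set pair a binary string with its octal value, and other cards give hex digits and 8-bit ASCII codes. A small sketch for checking them with Python built-ins follows; the helper name `binary_to_octal` is hypothetical.

```python
def binary_to_octal(bits: str) -> str:
    """Convert a binary string to octal, one octal digit per 3 bits."""
    width = -(-len(bits) // 3)  # ceiling division keeps leading zeros
    return format(int(bits, 2), "o").zfill(width)


octal_62 = binary_to_octal("110010")      # six bits -> two octal digits
hex_of_14 = format(14, "X")               # decimal -> hex digit
ascii_of_l = format(ord("l"), "08b")      # character -> 8-bit ASCII code
```

Grouping the bits in threes from the right gives the same result by hand: 110 010 reads as 6 and 2.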

Possible Values: Vega(put)

>0

What is cross-validation?

It's a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. Mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independent data set.
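
As a rough sketch of the idea, the function below produces k-fold train/validation index splits using only the standard library. The name `k_fold_splits` and the unshuffled, contiguous folds are simplifying assumptions; real projects usually shuffle first or use a library routine.

```python
def k_fold_splits(num_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(num_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size
```

Each sample lands in exactly one validation fold, so every observation is used for both training and validation across the k rounds.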

Synthetic Treasury

Ke^(-rT)=P(S,K,T)-C(S,K,T)+F0,T[S]*e^(-rT)

KPI

Key Performance Indicator or metric or feature.

why is sales revenue important?

Knowledge of revenue and profit by customer, by product, by sales channel, for example, is important.

What is a term loan?

A loan of a fixed amount for an agreed time and on specified terms

What are percentiles?

A percentile is a metric indicating a value, below which a percentage of values falls.

What is latent semantic indexing used for?

Learning correct word meanings Subject matter comprehension Information retrieval Sentiment analysis (social network analysis)

4 Ps

Product Price Promotion Place

21 Ways to Cut Costs Finance

Have customers pay sooner Refinance your debt Sell nonessential assets Hedge currency rates Redesign health insurance

000100

04

Going concern assumption

Assume the company has the ability to operate indefinitely, unless told otherwise

010001

21

1000

8

Principles - Monitoring Activities

*S*eparate and/or *O*ngoing Evaluations Communication of *D*eficiencies

5%

#/2 and #/10

33 and 1/3%

#/3

Current assets

-Cash -Short-term investments -Accounts receivable -Inventory -Prepaid expenses

What should we worry about if we have an experiment with 20 different metrics?

The more metrics you are measuring, the more likely it is you'll get a false positive
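
A back-of-the-envelope sketch of why this happens, assuming the 20 metrics are tested independently at a 5% significance level (both assumptions are mine, not the card's). The Bonferroni correction shown is one common remedy, not the only one.

```python
def familywise_error_rate(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_metrics


def bonferroni_alpha(num_metrics: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / num_metrics


rate_for_20 = familywise_error_rate(20)
```

With 20 metrics the chance of at least one spurious "significant" result is roughly 64%, which is why a stricter per-metric threshold is often applied.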

Star Schema

The star schema is a simple relational model that is easy to understand and that represents business transactions in a lucid manner

What are examples of supervised learning?

Classification, Neural Networks, Regression

14

E (hex)

P(S,K,T) Payoff

max(0,K-S(T))

Liquidation

Selling all of a company's assets, in parts, for their tangible worth -Most extreme -Can be very emotional

reducing churn can significantly impact your __________________

bottom line (profitability)

chance / event node

circle

Company

how does the company create and capture value

t-scores are _____ than z-scores

larger

Covariance

measures the direction of a linear relationship between two variables; its magnitude depends on the variables' units, so strength is read from the correlation instead

Correlation is useful only for

measuring the strength of a linear relationship

Symmetric distributions

n>15

Standard Deviation

the square root of variance

What is recall?

tp / (tp + fn)

7-S framework

"hard" -- strategy, structure, systems "soft" -- style, skills, staff, shared values

10%

#/10

1%

#/100

Legacy systems

out of date systems

000101

05

NYC population

8.5 million

d ? exp[(r-∂)h] ? u

<

Possible Values: Ψput

>0

What are Recommender Systems?

A subclass of information filtering systems that are meant to predict the preferences or ratings that a user would give to a product. Recommender systems are widely used in movies, news, research articles, products, social tags, music, etc.

What is a Broadcast Variable?

Ans: Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.

Growth Strategies Approaches

Assessment Strategies

what is click stream

Clickstream analysis is the process of collecting and analyzing data about website visitors' mouse clicks

Option Greek Definition: Delta

Increase in option value per increase in stock price (∂C/∂S) (SLOPE)

At-the-Money

S=K

Calendar Spread

Sell C(S,K,T) and Buy C(S,K,T+t) for t>0

Maximax

Tries for the best outcome "Optimist"

What percentile does the median represent?

The 50th percentile

snowflake schema

an enhancement to the standard star schema

ed (economic depreciation)

annual cash investment required to replace fixed assets

What is a Liability?

(Anything the company owes) A present obligation of the entity arising from past events, the settlement of which is expected to result in an outflow of resources.

sloan ratio

(net income - operating cash flow - investing cash flows) divided by (total assets)

Fact table

contains facts about the business

Variety

different forms of data

Two external dimensions

Stability position and industry position (SP & IP)

Porter's 5 Forces 3. Pressure from substitute Products

Sugar vs High fructose Corn Syrup

Possible Values: Vega(call)

>0

Liquidity and solvency are associated with the

Balance sheet

What does the income statement report?

The profit (or loss) made by the organization in the period that has just ended

Strategy/Performance *Technology leader*

Short product cycles, high inventory turnover, high R&D costs.

debt to assets ratio

Total debt/total assets

ST: Strengths and threats

Use a firm's strengths to avoid or reduce the impact of external threats

Earnings per share

Used heavily with the P/E ratio

Are expected value and mean value different?

They are not different, but the terms are used in different contexts. Mean is generally used when talking about a probability distribution or sample population, whereas expected value is generally used in a random variable context.

Virtual Cubes

Virtual cubes are a combination of cubes or a portion of a cube that has been segmented for business analysis or for security reasons.

Ansoff Matrix

Ways to grow: Product Development (new p, existing m) Market Development (existing p, new m) Market Penetration (existing p, existing m) Diversification (new p, new m)

what is the leading indicator of future sales?

CSAT (customer satisfaction)

Needed when there are outliers

median

Liquidity, solvency, and profitability are ______. A company that cannot pay its debts will have difficulty obtaining credit, which can decrease its profitability.

Interrelated

The sampling distribution of the mean becomes approximately normal as

the sample size grows

what are foreign keys?

Columns in one table that reference the primary key of another table. E.g. in a Sales Order table, the Customer Number is a foreign key because it references the Customer table's primary key.
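
The Sales Order example can be sketched with Python's built-in `sqlite3` module. The table and column names below are hypothetical, chosen to mirror the card; note that SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute(
    "CREATE TABLE customer (customer_number INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE sales_order ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_number INTEGER NOT NULL REFERENCES customer(customer_number))"
)
conn.execute("INSERT INTO customer VALUES (1, 'Mary')")
conn.execute("INSERT INTO sales_order VALUES (100, 1)")  # valid reference

try:
    # Customer 999 does not exist, so the foreign key constraint rejects this row.
    conn.execute("INSERT INTO sales_order VALUES (101, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The rejected insert is the point of the constraint: an order cannot refer to a customer that is not in the Customer table.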

Of the four interactions, which has no impact on the tables?

Read

100%

x1

Unmodified (Unqualified) Opinion

*Clean Opinion* -States that FS present fairly, in all material respects, the financial position, results of operations, and cash flows in conformity with the financial reporting framework. -Unmodified: nonissuers -Unqualified: issuers -The auditor may determine it necessary to add additional communication to the auditor's report without modifying the opinion: emphasis-of-matter (nonissuers), other-matter (nonissuers), and explanatory paragraphs (issuers)

Misstatements Related to the Appropriateness of Financial Statement Presentation or Disclosure

- FS do not include all required disclosures - Disclosures are not in accordance with the framework - FS do not provide necessary disclosures - Required information (Statement of CF) has not been included

Why use Spark even though Hadoop exists? (stream)

-Near real-time data processing: Spark also supports near real-time streaming workloads via Spark Streaming application framework

000011

03

Qualified Opinion Due to Inadequate Disclosure - Nonissuers (Private Company)

1. Intro Paragraph 2. Management's Responsibility Paragraph 3. Auditor's Responsibility Paragraph - "qualified audit opinion" 4. Basis for Qualified Opinion 5. Qualified Opinion Paragraph - "except for," "the financial statements are presented fairly"

Developing a New Product Step

1. Think about the Product 2. Think about Market Strategy 3. Think about the Customers 4. How are we going to get Funding

Nonissuer Report Qualified Opinion

1.Introductory paragraph: no change. 2.Management's responsibility paragraph: no change. 3.Auditor's responsibility paragraph. 4.Basis for qualified opinion paragraph. 5.Qualified opinion paragraph.

P(1/x,1/K,T)

C(x,K,T)/(Kx)

010111

27

Inventory Conversion Period

365/Inventory Turnover

011111

37

What is a Transformer?

A Transformer is an algorithm which can transform one DataFrame into another.

Options Combination

A combination is defined as any strategy that uses both puts and calls

Cross Validation

A model validation technique that splits training data into two parts: one is a training set and the other is a validation set. Checks how well a model will generalize to new data.

Explain what a local optimum is?

A solution that is optimal within a neighboring set of candidate solutions. In contrast with the global optimum: the optimal solution among all others

How would you evaluate a logistic regression model?

Demonstrate an understanding of the typical goals of a logistic regression (classification, prediction, etc.) and bring up a few examples and use cases.

Estimator

An estimator is an algorithm which is fit on a DataFrame to produce a Transformer

Please define executors in detail?

Ans: Executors are distributed agents responsible for executing tasks. Executors provide in-memory storage for RDDs that are cached in Spark applications. When executors are started they register themselves with the driver and communicate directly to execute tasks.

How would you set the amount of memory to allocate to each executor?

Ans: SPARK_EXECUTOR_MEMORY sets the amount of memory to allocate to each executor.

Why is it important to have a robust set of metrics for machine learning?

Any ml technique should be evaluated by using metrics for assessing the quality of results.

Type 5:

Best value focus strategy that offers products or services to a small group of customers at the best price available on the market

BCG Matrix

Cash Cows, Stars, Pets, Question Marks. Companies that have diversified portfolios have Cash Cows, Stars, and Question Marks

What do you understand by a closure in Scala?

Closure is a function in Scala where the return value of the function depends on the value of one or more variables that have been declared outside the function.

5 Cs

Company Costs Consumers Competition Climate

Comprehensive income

Company takes net income and adjusts it for unrealized gains and losses (in investments in stocks and bonds and foreign currency translations)

Market Penetration

Current Customers / Total Potential Customers

Nestle- Company

Current Market Share Growth Rate WCS Brands Pricing Strategy

Perpetuities (Zero Growth Stock)

Dividend/Required Return

Advantages of Equity Financing

Does not need to be paid back. Dividend can be cut/ stopped. Improves Solvency/Capital

BCG Matrix Stars

High Market Share High Growth Can be Leader in the market which gives a lot of added benefits Growth in Market>Growth in Share Increases in Share> Increase Margins High Margin Will eventually become a cash cow

Developing a New Product 2. Think about the Marketing Strategy

How does this strategy affect our existing product line? Are we cannibalizing our own sales from an existing product? Are we replacing an existing product? How will this strategy expand our customer base and increase our sales? What will the competitive response be? If we are entering a new market what are the barriers to entry? Who are the major players and what are their respective market shares?

Practicable

Information is reasonably obtainable from management's accounts and records and that providing the information in the auditor's report doesn't require the auditor to assume the position of a preparer of financial information

what is the difference between informational and transactional systems?

Informational is used for analysis and transactional is used to record transactions.

Diversification strategies:

Introducing a new product or service -Related diversification, unrelated diversification

What is the 68-95-99.7 / Empirical Rule?

It is a shorthand rule for remembering the percentage of values that lie within a given number of standard deviations of the mean of a normal distribution: 68% of values fall within 1 SD, 95% within 2 SD, and 99.7% within 3 SD
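
The three percentages can be checked numerically. This sketch uses the standard identity P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal variable; the function name `prob_within` is just a label for this example.

```python
import math


def prob_within(k: float) -> float:
    """P(|Z| <= k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2.0))


within_1_sd = prob_within(1.0)
within_2_sd = prob_within(2.0)
within_3_sd = prob_within(3.0)
```

The rule rounds these slightly: the value for two standard deviations is closer to 95.45%, and exactly 95% corresponds to about 1.96 standard deviations.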

Put Boundaries

K≥Pamer≥Peur≥max(0,Ke^(-rT)-F0,T[S]*e^(-rT))

Long Term Assets

Land+Building+Equipment

What is Linear Regression?

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.

Strategy/Performance *Strategy - Superior quality*

Lower volumes, higher margins -higher marketing/R&D costs

Response to competition - investigate

-What is the competitor's new product and how does it differ from what we offer? -What has the competitor done differently? -What's changed? - Have any other competitors picked up market share?

Developing a new product - Why this product

-What's special or proprietary about our product? -Is the product patented? -Are there similar products out there? Substitutions? -What are the advantages and disadvantages of the new product? -How does this new product fit into the rest of product line?

internal factors

-strategy -operations: marketing & sales, operations & logistics, finance & control, organization & culture, R&D

l

01101100

n

01101110

what are the components of enterprise warehouse?

1. *data acquisition layer*-This is where data from the source system(s) enter the data warehousing system. 2. *propagation layer*- stores data so that they can be used and reused for multiple applications. 3. *Corporate memory* is the destination of all corporate (organizational) data in their granular and harmonized forms.

What do Investors look for? (5)

1. A viable business opportunity 2. Realistic path to profitability 3. Understanding of any technology 4. The abilities of the people involved 5. Commitment of key personnel

What are the contents of a Business Plan?

1.Executive Summary 2.Company Description 3.Products and Services 4.Markets 5.Technology 6.Competition/Competitive Advantage 7.Business Model/Key Customers 8.Operations 9.Management 10.Financial Information - current position 11.Financial Projections - milestones

Nonissuer Report Adverse Opinion

1.Introductory paragraph: no change. 2.Management's responsibility paragraph: no change. 3.Auditor's responsibility paragraph. 4.Basis for adverse opinion paragraph. 5.Adverse opinion paragraph.

Asia

4.436 billion

100000

40

110001

61

110011

63

Example of Tagged data

<message> <to>mary</to> <from> john </from> <date> april 9, 2015 </date> <content> hello mary </content> </message>

Day Sales of inventory

= Inventory / cost of sales * 365 (average length of time a company's cash is tied up as inventory)

acquisition cost

= cost per contact / take rate (aka % of customers who accept an offer)

what does the process of analytics include?

(a) identifying the problem, (b) gathering relevant data that frequently are not in a usable form (c) cleaning up the data to make them usable, (d) loading them into data storage models,

what does NLP reflect?

(a) when people are communicating with their computer they should be able to speak as they would to another person (b) the computer should be able to translate the speech into information and commands it can understand.

quick ratio

(current assets-inventory)/current liabilities

Central limit theorem

With a large enough sample size, the distribution of the sample mean is approximately normal, whatever the shape of the underlying population (provided its variance is finite)

Revenue

Comes before earnings- known as sales or top line

Principles - Control Environment

Commitment to *E*thics and Integrity *B*oard Independence *O*rganizational Structure Commitment to *C*ompetence *A*ccountability

Competition

Consolidation Industry Growth Product differences Exit Barriers

000000

00

b

01100010

z

01111010

starting a new business

1. entering a new market. does it make good business sense? -who's the competition? market share? how do their products compare to ours? -barriers to entry? 2. venture capitalist perspective -management -market and strategic plans -distribution channels -products -customers -finance

Barriers to studying a whole population

COST and SPEED are the main barriers to studying the whole population

Shareholders equity

Capital stock+ retained earnings

Average Absolute Variation

1. how far (absolute value) each data point is away from the average 2. what is the average of these numbers
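
The two steps translate directly into code. This is a minimal sketch with a made-up data set; the function name simply follows the card's term.

```python
def average_absolute_variation(values):
    """Mean of the absolute distances between each value and the average."""
    mean = sum(values) / len(values)
    # Step 1: absolute distance of each point from the average.
    distances = [abs(v - mean) for v in values]
    # Step 2: average of those distances.
    return sum(distances) / len(distances)


# Hypothetical data: mean is 5, distances are 3, 1, 1, 3.
result = average_absolute_variation([2, 4, 6, 8])
```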

Use Mergers and Acquisitions framework when you hear

Is this merger a good idea?

13

D (hex)

reflexive

For every x ∈ A, xRx | every guy in the set is related to itself

Anti Symmetric

For every x,y ∈ A, xRy and yRx → x=y. | if xRy and yRx then x equals y

multidimensional model

one where a value such as revenue can be viewed by multiple aspects such as

Price drivers

price elasticity, commodity product vs. differentiated, positioning, pricing strategy

what is a web service?

simply an XML-based software system that enables users to access computing resources via a network. Uses the Simple Object Access Protocol (SOAP)

Explain what regularization is and why it is useful. What are the benefits and drawbacks of specific methods, such as ridge regression and lasso?

Used to prevent overfitting and improve the generalization of a model. Decreases the complexity of a model. Introduces a regularization term to a general loss function, i.e. adds a term to the minimization problem. Imposes Occam's Razor on the solution.

Form and Content of Auditor's Report Nonissuers= Private company When the auditor expresses a qualified or adverse opinion due to material misstatement of the financial statements

the "Auditor's Responsibility" paragraph is modified and the auditor's report will include a "Basis for Modification" paragraph and a "Qualified Opinion" or "Adverse Opinion" paragraph, as appropriate.

Historisation

the ability to store the historical changes to a dimension attribute

Relational Database

the relationships between tables are created through the use of unique identifiers called primary keys.

When A and B are independent

P(A and B) = P(A)P(B)

Bayes' Rule

P(A|B) = P(B|A)P(A)/P(B)
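
A worked instance of the formula may help. The sketch below expands P(B) with the law of total probability; the numbers (a 1% prior, a 90% true-positive rate, a 5% false-positive rate) are invented for illustration.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) from the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical screening test: A = "has condition", B = "test is positive".
posterior = bayes_posterior(p_b_given_a=0.90, p_a=0.01, p_b_given_not_a=0.05)
```

Even with a fairly accurate test, the posterior here is only about 15%, because the condition is rare to begin with.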

Contribution Margin ($)

Price - Variable Cost

Financial Statement Issues - Material but not pervasive

Qualified Opinion

If a Product is in its Emerging Growth Stage concentrate on.....

R&D Competition Pricing

Can you write the formula to calculate R-square?

R-Square can be calculated using the below formula - 1 - (Residual Sum of Squares/ Total Sum of Squares)
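
The formula can be written out in a few lines. This is a plain-Python sketch with made-up numbers; it assumes the actual values are not all identical, since the total sum of squares would otherwise be zero.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: residuals are 0.5, 0, -0.5.
score = r_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5])
```

A model that always predicts the mean of the actual values scores 0, and a perfect fit scores 1.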

Defensive strategies:

Retrenchment, divestiture, liquidation

Gross Profit ($)

Revenue - COGS

Issuer Report (adverse opinion)

Same as adverse with middle paragraph(s) explaining the substantive reasons and the disclosure of the principal effects. The opinion paragraph has familiar language with the "because of" and "do not present fairly" wording.

Nonissuer Report (adverse opinion)

Same as qualified except for a basis for ADVERSE opinion paragraph that lays out the same types of issues. In opinion paragraph, the language includes "because of" and "do not present fairly".

Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer Introductory Paragraph:

Same as standard nonissuer audit report.

Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Management's Responsibility Paragraph.

Same as standard nonissuer audit report.

What are the three stages to build the hypotheses or model in machine learning?

The standard approach to supervised learning is to split the set of examples into a training set and a test set.

What is variance?

The tendency to learn random things irrespective of the true signal

What are the two paradigms of ensemble methods?

The two paradigms of ensemble methods are a) Sequential ensemble methods b) Parallel ensemble methods

What are two techniques of Machine Learning ?

The two techniques of Machine Learning are a) Genetic Programming b) Inductive Learning

Three Vs of Big Data

Volume, Velocity and Variety

What does the balance sheet show?

What the organization owns and owes at the end of the period

Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

What's important here is to define your views on how to properly visualize data and your personal preferences when it comes to tools. Popular tools include R's ggplot, Python's seaborn and matplotlib, and tools such as Plot.ly and Tableau.

Explain how a ROC curve works.

The ROC curve is a graphical representation of the contrast between true positive rates and the false positive rate at various thresholds. It's often used as a proxy for the trade-off between the sensitivity of the model (true positives) vs the fall-out or the probability it will trigger a false alarm (false positives).
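
To see where the points on the curve come from, the sketch below computes (false positive rate, true positive rate) pairs at a few thresholds. The labels, scores, and thresholds are invented, and the helper name `roc_points` is hypothetical.

```python
def roc_points(labels, scores, thresholds):
    """Return (FPR, TPR) for each threshold; score >= threshold means 'positive'."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
        fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical classifier output: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
points = roc_points(labels, scores, thresholds=[1.0, 0.75, 0.5, 0.0])
```

Lowering the threshold moves along the curve from (0, 0), where nothing is flagged, to (1, 1), where everything is flagged.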

Predictive Analytics

analyzes past performance in an effort to predict the future

Jarrow-Rudd (Lognormal) Binomial Tree: u

exp[(r-∂-0.5σ^2)h+σ√h]

Jarrow-Rudd (Lognormal) Binomial Tree: d

exp[(r-∂-0.5σ^2)h-σ√h]

Cox-Ross-Rubinstein Binomial Tree: d

exp[-σ√h]

gross profit margin %

gross profit / revenue

Describe a hash table.

hash table is a data structure that produces an associative array. A key is mapped to certain values through the use of a hash function. They are often used for tasks such as database indexing.
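
A toy version shows the mechanics. This sketch resolves collisions by chaining (a list per bucket), which is one common design among several; Python's own `dict` is the production-ready equivalent.

```python
class HashTable:
    """Minimal hash table using separate chaining for collisions."""

    def __init__(self, num_buckets: int = 8):
        self._buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function maps the key to one of the buckets.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default
```

With only a couple of buckets, several keys share a bucket and lookups scan the chain; with enough buckets, lookups take close to constant time on average.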

correlation

how securities move together; ranges from -1 to 1

what is HTTP?

hypertext transfer protocol is a standard set of rules for sending web pages over the internet

101111

57

population data: LA

4M

0101

5

u ? exp[(r-∂)h] ? d

>

t

01110100

Knowledge

created when we learn from information

10

A (hex)

15

F (hex)

Competition segments acronym

MCBSR

in other words

a subset of the data.

Comparability

inter-company comparisons

staff

people -- brains, management, motivation

Text Tables

where textual master data are stored

000010

02

Historical cost principle

An issue related to the balance sheet and income statement -Analysts want to see the fair market value

Does shuffling change the number of partitions?

Ans: No. By default, shuffling doesn't change the number of partitions, only their content

What is F1?

Combines precision and recall into a single value
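
F1 is conventionally the harmonic mean of precision and recall. The sketch below computes all three from confusion-matrix counts; the counts are made up, and the functions assume the denominators are nonzero.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were found."""
    return tp / (tp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
f1 = f1_score(tp=8, fp=2, fn=8)
```

Because the harmonic mean is pulled toward the smaller value, the F1 here (about 0.62) sits closer to the recall of 0.5 than to the precision of 0.8.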

Sources of Short-term Finance

Debt Factoring Invoice Discounting

Transitive

For every x,y,z ∈ A, xRy and yRz → xRz. | if there's some guy x that's related to y and y's related to z then, x is related to z

What are the sources of Medium-term Finance?

Leasing Benefits Hire Purchase Term Loan Securitisation

Equivalence Relation

Reflexive, Symmetric, and Transitive
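
For a finite set, the three properties can be checked by brute force. In this sketch a relation is represented as a set of ordered pairs, which is an assumption of the example; the function names are hypothetical.

```python
def is_reflexive(elements, relation) -> bool:
    return all((x, x) in relation for x in elements)


def is_symmetric(relation) -> bool:
    return all((y, x) in relation for (x, y) in relation)


def is_transitive(relation) -> bool:
    return all((x, z) in relation
               for (x, y) in relation
               for (y2, z) in relation if y == y2)


def is_equivalence(elements, relation) -> bool:
    return (is_reflexive(elements, relation)
            and is_symmetric(relation)
            and is_transitive(relation))


A = {0, 1, 2, 3}
same_parity = {(a, b) for a in A for b in A if (a - b) % 2 == 0}
less_or_equal = {(a, b) for a in A for b in A if a <= b}
```

"Same parity" passes all three checks, while "less than or equal" is reflexive and transitive but fails symmetry, so it is not an equivalence relation.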

Replicating Portfolio

Sue^(∂h)∆+Be^(rh)=Cu | Sde^(∂h)∆+Be^(rh)=Cd
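
Subtracting the second equation from the first gives ∆ = e^(-∂h)(Cu-Cd)/(S(u-d)), and substituting back gives B = e^(-rh)(uCd-dCu)/(u-d). The sketch below applies those two results; the input values are hypothetical, and `delta` in the code stands for the dividend yield written ∂ on the cards.

```python
import math


def replicating_portfolio(s, u, d, c_up, c_down, r, delta, h):
    """Solve the two replication equations for shares (Delta) and bonds (B)."""
    shares = math.exp(-delta * h) * (c_up - c_down) / (s * (u - d))
    bonds = math.exp(-r * h) * (u * c_down - d * c_up) / (u - d)
    return shares, bonds


# Hypothetical one-period inputs (not taken from the card).
S, U, D, R, DELTA, H = 100.0, 1.2, 0.9, 0.05, 0.02, 1.0
C_UP, C_DOWN = 20.0, 0.0
shares, bonds = replicating_portfolio(S, U, D, C_UP, C_DOWN, R, DELTA, H)
```

For a call, the solution holds a positive number of shares financed by borrowing, so B comes out negative.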

Liquidity

The ability to pay debts as they fall due

Analytics

The use of data, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers make better, fact‐based decisions

Adverse Opinion

This opinion is expressed when the auditor concludes that the misstatements, individually or in aggregate, are both material *and* pervasive to the financial statements. (GAAP Problem)

Whats a false negative?

When we wrongly accept the null hypothesis as highly probable

what is transactional data?

Whenever a business event is recorded by an information system, the relevant data are written to and stored in the database as transactional data

Can you use machine learning for time series analysis?

Yes, it can be used but it depends on the applications.

Probabilistic merging ( fuzzy merging )

You do a join on two tables A and B, but the keys are not compatible.

liabilities

future obligations a business is likely to owe

A business plan is used..

to facilitate implementation of the selected strategy

rule of 72

years to double = 72 / r (at a 10% return, an investment will double about every 7.2 years)
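
The rule is an approximation to the exact doubling time under annual compounding, ln(2) / ln(1 + r). A short comparison, with hypothetical function names:

```python
import math


def years_to_double_rule_of_72(rate_percent: float) -> float:
    """Rule-of-72 estimate; rate is given in percent per year."""
    return 72.0 / rate_percent


def years_to_double_exact(rate_percent: float) -> float:
    """Exact doubling time with annual compounding."""
    return math.log(2.0) / math.log(1.0 + rate_percent / 100.0)


estimate = years_to_double_rule_of_72(10.0)
exact = years_to_double_exact(10.0)
```

At 10% the estimate and the exact figure differ by less than a tenth of a year; the gap widens at very high rates.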

What is Chi-Square Selection?

Chi-Square is a statistical test used to understand if two categorical features are correlated.

Explain what resampling methods are and why they are useful. Also explain their limitations.

Classical statistical parametric tests compare observed statistics to theoretical sampling distributions. Resampling is a data-driven, not theory-driven, methodology based on repeated sampling within the same sample. Resampling refers to methods for doing one of the following: (1) estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping); (2) exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests); (3) validating models by using random subsets (bootstrapping, cross validation).

Discrete Dividend Exercise (PUT)

Exercise ex-dividend

p>a

Fail to reject null hypothesis

Type I error

Ho is true, but we mistakenly reject it

what are the steps in figure 1.5?

Identify the goals gather the data design model apply model review results present findings Derive insights make decisions deploy strategy Improve

What is involved in strategic planning?

Identify the problem you are solving by producing your product/service i.e. What is the gap in the market you are filling?

Growth Strategies Assessment Elements

Is the industry growing? How are we growing compared to the industry? How are our prices relative to our competitors'? What are our competitors' marketing and development strategies? Which segments have the most potential? Funding for higher growth

What is precision?

Precision: How many selected items are relevant? TP / (TP + FP) Recall: How many relevant elements were selected? TP / (TP + FN)

What do you understand by "Unit" and "()" in Scala?

Unit is a subtype of scala.AnyVal and is the Scala equivalent of Java's void, providing Scala with an abstraction of the Java platform. The empty tuple, i.e. (), is the term that represents the Unit value.

When are expenses accounted for?

When incurred, not when paid for

What are the two classification methods that SVM ( Support Vector Machine) can handle?

a) Combining binary classifiers b) Modifying binary to incorporate multiclass learning

Serious challenges with big data

correlation does not imply causality Large sample sizes can still be problematic Hard to analyze unstructured data (esp. video) Concerns about privacy

A company's ability to pay its current liabilities is called ______ analysis.

current position

Aspects of Big Data

data is available in real time data is available at larger scale data with less structure data on novel types of variables

Forward Price Binomial Tree: u

exp[(r-∂)h+σ√h]

Forward Price Binomial Tree: d

exp[(r-∂)h-σ√h]
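
The forward-price tree factors can be computed directly, along with the no-arbitrage check d < exp[(r-∂)h] < u. The parameter values below are hypothetical, and `delta` in the code is the dividend yield written ∂ on the cards.

```python
import math


def forward_tree_factors(r: float, delta: float, sigma: float, h: float):
    """Up and down factors for the forward-price binomial tree."""
    drift = (r - delta) * h
    u = math.exp(drift + sigma * math.sqrt(h))
    d = math.exp(drift - sigma * math.sqrt(h))
    return u, d


# Hypothetical inputs: 5% rate, 2% dividend yield, 30% volatility, 1-year step.
u, d = forward_tree_factors(r=0.05, delta=0.02, sigma=0.30, h=1.0)
forward_growth = math.exp((0.05 - 0.02) * 1.0)
```

Because u and d sit symmetrically (in log terms) around the forward growth factor, the no-arbitrage condition holds for any positive volatility.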

Ratio of Fixed Assets to Long-Term Liabilities = ? / ?

fixed assets (net) / long-term liabilities

Return on stockholders equity

net income available to common stockholders/shareholders equity

Accounts Receivable Turnover = ? / ?

net sales / average accounts receivable

what is another name for informational systems?

online analytical processing or OLAP system

what is another name for Transactional Systems?

online transaction processing (OLTP) systems.

Analysis error

survey owner adjustments to represent what they believe is the true underlying population (i.e., "likely voter" adjustments)

What information does the government want?

tax due, regulatory requirements, grants

Descriptive analytics

uses data to understand past and present

Prescriptive analytics

uses optimization techniques

Which script will you use to launch a Spark application, using spark-shell?

Ans: You use the spark-submit script to launch a Spark application, i.e. submit the application to a Spark deployment environment.

what is web scraping

web scraping, which is the process of searching for information on web pages and then stripping the html tags so the data can be stored in a structured format.

WACC

weighted average cost of capital = E/(E+D) * Cost of E + D/(E+D) * Cost of D * (1 - tax)
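
A small worked example of the formula, using invented capital-structure figures:

```python
def wacc(equity: float, debt: float, cost_of_equity: float,
         cost_of_debt: float, tax_rate: float) -> float:
    """Weighted average cost of capital with the debt tax shield."""
    total = equity + debt
    equity_part = equity / total * cost_of_equity
    debt_part = debt / total * cost_of_debt * (1.0 - tax_rate)
    return equity_part + debt_part


# Hypothetical firm: 600 equity, 400 debt, 10% cost of equity,
# 5% cost of debt, 30% tax rate.
rate = wacc(600.0, 400.0, 0.10, 0.05, 0.30)
```

Here the equity portion contributes 6.0% and the after-tax debt portion 1.4%, for a WACC of 7.4%.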

dividend yield

dividend payment per share divided by its share price

The ______ on common stock measures the rate of return to common stockholders from cash dividends.

dividend yield

dividend payout ratio

dividends divided by earnings percentage of earnings being paid as dividend

How can you define Spark Accumulators?

Ans: These are similar to counters in the Hadoop MapReduce framework, which give information regarding completion of tasks, how much data is processed, etc.

What is a prepayment?

advance payments for goods/services An asset not an expense

current liabilities

liabilities due within one year

Starting a New Business Steps 5. Products and Services

What is the product service or technology? What is the competitive edge? What are the disadvantages? Is the tech proprietary?

Give example of transformations that do trigger jobs(Spark)

Ans: There are a couple of transformations that do trigger jobs, e.g. sortBy , zipWithIndex , etc

Acceptability of the Financial Reporting Framework

Auditor should obtain an understanding of: 1. The purpose for which the single FS or specific element is prepared 2. The intended users 3. The steps taken by mgt to determine that the framework is acceptable in the circumstances

d

01100100

e

01100101

i

01101001

k

01101011

110110

66

Return on Sales (ROS)

Net Income / Sales Revenue Profit as a percentage of revenue

ROA

Net Income/Assets

ROI (formula ext)

Net Income/Invested Capital

Pre-paid Forward on a Stock with Continuous Dividends

S(t)*exp[-∂(T-t)]

Current Ratio

CA/CL

Entering the market - good business sense

Brainstorm answers to these questions aloud: -Who is our competition and what size market share does each competitor have? -How do their products and services differ from ours? -How will we price our products or services? -Are there substitutions available? -Are there any barriers to entry? Such as: capital requirements, access to distribution channels, proprietary product technology, or government policy. -Are there any barriers to exit? How do we exit if this market sours? -What are the risks? Such as: market, regulation, or technology?

volume

amount of shares of stock traded over a defined time period (typically one day)

net margin

(profit margin) net income divided by revenue

common barriers to entry

-capital requirements -access to distribution channels -proprietary product technology -government policy

1/9

.111

1/8

.125

1/7

.143

1/6

.167

1/5

.20

what are the methods of data feedback?

1. Monitoring transactions, incidents, and other feedback mechanisms provides additional data for analysis. 2. Exception reporting. Embedded audit modules (EAMs) are one type of exception reporting. The EAM highlights data outside the expected values

Two ways to employ a cost-leadership strategy

1. Perform value chain activities more efficiently than rivals and control the factors that drive the costs of value chain activities 2. Revamp the firm's overall value chain to eliminate or bypass some cost-producing activities

100111

47

101000

50

101001

51

101010

52

101011

53

101100

54

101101

55

101110

56

110000

60

population data: NYC

8M

1001

9

What is the probability of obtaining a value less than or equal to two standard deviations below the mean?

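About 2.3%: for a normal distribution, P(Z ≤ -2) ≈ 0.0228. (About 95% of values fall within two standard deviations of the mean.)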

Confidence Interval

95% of data points are within 1.96σ of the mean. Equivalently, 95% of the time, the mean is within 1.96σ of a data point
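
The normal-distribution probabilities on these cards can be checked numerically. A minimal sketch using Python's standard library; the helper names `prob_within` and `prob_at_or_below` are illustrative and not part of the original cards:

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(z: float) -> float:
    """Probability that a normal value lies within z standard deviations of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)


def prob_at_or_below(z: float) -> float:
    """Probability that a normal value is at or below z standard deviations from the mean."""
    return std_normal.cdf(z)


if __name__ == "__main__":
    print(round(prob_within(1.96), 4))       # coverage of the +/- 1.96 sigma band
    print(round(prob_at_or_below(-2.0), 4))  # lower tail at two sigma below the mean
```

The two-sided band and the one-sided tail answer different questions, which is why the "95%" figure applies to the band and not to the tail.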

Possible Values: Ψcall

<0

Possible Values: ρput

<0

Possible Values: ∆put

<0

Advantages and Drawbacks of a Star Schema

Advantages: -easy to understand -only a single level of JOIN for any query Drawbacks: -it can contain a great deal of duplication of dimensional data -when the star contains alphanumeric primary and foreign keys, the query joins are slower, and performance may suffer

What is a distributed system?

A system whose components (data and processes) are spread out over several locations instead of in one central location and thus can manage workload sharing.

What information do suppliers want?

Ability to pay for goods, long term viability of customer

Related diversification

Adding new but related products/services -Value chains possess competitively valuable cross-business strategic fits

Unrelated diversification

Adding new, unrelated products/services -Value chains are so dissimilar that no competitively valuable cross-business relationships exist

Random Note

An omission of the statement of cash flows results in a qualified opinion

Competitive Response Solutions

Analyze our current product and redesign, repackage, or move upmarket Introduce a new product Increase our profile with a marketing and public relations campaign Build Customer Loyalty Cut Prices Lock up raw materials and talent Acquire the competitor or another player in the same market Merge with a competitor to create a strategic advantage and become more powerful Copy the competitor

How can you create an RDD for a text file?

Ans: SparkContext.textFile

Break Even Market Share

BE Volume/Total market volume

Calmar ratio and MAR ratio

Calmar = 3 yr annualized return/ max draw down in past three years MAR ratio= annualized return since inception / max draw down since inception
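
Both ratios divide an annualized return by a maximum drawdown and differ only in the window used. A rough sketch of that calculation, assuming drawdown is measured as a fraction of the running peak (the card does not say how it is measured); the function names are illustrative:

```python
def max_drawdown(values):
    """Largest peak-to-trough decline, expressed as a fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                    # highest value seen so far
        worst = max(worst, (peak - v) / peak)  # decline from that peak
    return worst


def return_over_drawdown(annualized_return, values):
    """Annualized return divided by max drawdown over the same window.

    Pass three years of values for a Calmar-style figure, or the full
    history since inception for a MAR-style figure.
    """
    drawdown = max_drawdown(values)
    if drawdown == 0:
        raise ValueError("no drawdown in this window")
    return annualized_return / drawdown
```

For example, a portfolio that went 100, 120, 90, 110 has a max drawdown of 25% (from 120 down to 90).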

Response to competition - choosing

Consider each, choose one: - Acquire the competitor, or another player in the same market. - Merge with a competitor to create a strategic advantage and make us more powerful. - Copy the competitor (e.g., Amazon.com vs. BarnesandNoble.com). - Hire the competitor's top management. - Increase our profile with a marketing and public relations campaign.

Design of experiments

Design of experiments or experimental design is the initial process used (before data is collected) to split your data, sample and set up a data set for statistical analysis, for instance in A/B testing frameworks or clinical trials.

___________ measures the share of profits that are earned by a share of common stock. GAAP requires the reporting of earnings per share in the income statement.

Earnings per share (EPS) on common stock

Ways to expand global

Globalization Transnational Localization

How can you prove an improvement to an algorithm is an improvement over doing nothing?

Good experimental design 1. No selection bias in test data 2. Test data is a good model of the real world 3. Ensure results are repeatable

PP&E is usually on balance sheet at

Historical cost

Type II error

Ho is false, but we fail to reject it

Adjustments *Inventory*

LIFO vs FIFO

What is bias / variance trade off?

More powerful methods have less bias but more variance

For K₁>K₂

P(S,K₁,T) ≥ P(S,K₂,T)

Debt-to-Equity Ratio

Total Debt/Total Shareholder's Equity

Five C's Competition

Who are the biggest competitors? What market share do they each hold? Has the market changed in the last year? How do our services or products differ from the competition? Do we hold any strategic advantage over our competitors?

Starting a New Business Steps 6. Customers

Who are the customers? How can we best reach them? How can we ensure that we retain them?

Common-sized statements are useful for comparing ________, _________, or _______.

the current period with prior periods, individual businesses with one another, or one business with industry averages

What information do lenders want?

ability to repay debt

Appropriateness of Accounting Policies

accounting policies aren't in accordance with the applicable financial reporting framework financial statements don't represent the underlying transactions entity hasn't complied with the financial reporting framework requirements for accounting for and disclosing changes in accounting policies

What do you understand by the term Normal Distribution?

bias, central value, bell shaped curve, random variables

Adverse Opinion Paragraph

because of ______ do not present fairly

statistically significant

been predicted as unlikely to have occurred by sampling error alone, according to a threshold probability—the significance level

Strategic decisions need to be made...

before a business plan is written

csat links ____________ and _________________ ____________

branding and customer loyalty

variable cost drivers

cogs, raw materials, energy inputs, labor, service

More C's

collaborators, costs, channels, competencies, capacity, culture

what does ASCII mean?

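American Standard Code for Information Interchange; a character encoding used by plain-text files such as comma separated values (CSV) files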

z-score

counts the number of SDs (σ) that an observation is away from the mean (μ) If Z > 0, the observation is above the mean If Z < 0, the observation is below the mean

Calculating correlation

covariance(x,y) / (stdev(x) * stdev(y))
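
A small sketch of this calculation in plain Python. Population (divide-by-n) formulas are assumed for both the covariance and the standard deviations; the ratio is the same if sample formulas are used consistently. The function name is illustrative:

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # The denominator is the product of both standard deviations.
    return cov / (std_x * std_y)
```

Note that the whole product sits in the denominator; dividing by one standard deviation and then multiplying by the other would not give a unitless value.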

Business situation segments acronym

cpcc

current ratio

current assets/current liabilities

upside and downside capture ratio

downside- measures how a portfolio performed vs a benchmark when the benchmark fell in value upside- measures how a portfolio performed vs a benchmark when the benchmark rose in value portfolio performance over period of time / benchmark performance over period of time

Risk-Neutral Pricing (C)

e^(-rh)*[p*Cu+(1-p*)Cd]
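
Risk-neutral pricing discounts the probability-weighted average of the up and down payoffs at the risk-free rate. A one-period sketch for a call, assuming the usual risk-neutral probability p* = (e^((r-∂)h) - d)/(u - d), which is not stated on this card; the function and parameter names are illustrative:

```python
from math import exp


def risk_neutral_probability(r, delta, h, u, d):
    """Assumed form: p* = (e^((r - delta) h) - d) / (u - d)."""
    return (exp((r - delta) * h) - d) / (u - d)


def one_period_call(S, K, r, delta, h, u, d):
    """Discounted risk-neutral expectation of the call payoffs in the up and down states."""
    p_star = risk_neutral_probability(r, delta, h, u, d)
    c_up = max(S * u - K, 0.0)    # payoff if the stock moves up
    c_down = max(S * d - K, 0.0)  # payoff if the stock moves down
    return exp(-r * h) * (p_star * c_up + (1.0 - p_star) * c_down)


if __name__ == "__main__":
    # Hypothetical inputs: S = K = 100, r = 5%, no dividends, one-year step.
    print(round(one_period_call(100.0, 100.0, 0.05, 0.0, 1.0, 1.2, 0.8), 4))
```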

Real-World Pricing (C)

e^(-γh)[pCu+(1-p)Cd]

What are referential integrity constraints?

Rules that link normalized tables to one another via primary key and foreign key pairs.

gross margin

gross profit divided by revenue - gross profit is revenue minus costs of goods sold - tells what percentage of revenue the business keeps before paying other expenses

gross profitability ratio

gross profits divided by assets higher ratio will typically outperform lower ratio

The percentage analysis of increases and decreases in related items in comparative financial statements is called _____ analysis.

horizontal

Pick an algorithm you like and walk me through the math and then the implementation of it

in pseudo-code. OK now let's pick another one, maybe more advanced.

What are Recommender Systems?

Information filtering systems that predict a user's preference for an item; used, for example, to recommend movies

Measures of central tendency

median, average, mode

value at risk (VaR)

minimum potential loss at a given confidence interval- typically uses historical returns and normal distributions

Omega Ratio

must first pick target return threshold (typically risk free or 0%) Sum of (returns above threshold - threshold return) / sum of abs(returns below threshold - threshold return) can be used on non-normal returns
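
The calculation on this card can be sketched as follows, assuming a list of periodic returns and a threshold of 0% by default; the function name is illustrative:

```python
def omega_ratio(returns, threshold=0.0):
    """Sum of excess returns above the threshold over the sum of shortfalls below it."""
    gains = sum(r - threshold for r in returns if r > threshold)
    losses = sum(threshold - r for r in returns if r < threshold)
    if losses == 0:
        raise ValueError("no returns below the threshold; ratio is undefined")
    return gains / losses


if __name__ == "__main__":
    # Hypothetical monthly returns.
    print(omega_ratio([0.02, -0.01, 0.03, -0.02]))
```

Because it only sums gains and shortfalls, the ratio makes no assumption about the shape of the return distribution.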

fixed cost drivers

overhead, machinery, distribution, interest, depreciation, rent

Quick Ratio = ? / ?

quick assets / current liabilities

risk free rate of return

typically one year treasury bill no risk associated. used as comparative benchmark

For t<T

Ceur(T) ≥ Ceur(t) on a Non-Dividend Paying Stock

Instead of changing the industry price level (which requires a lot of cooperation), you should focus on.....

Changing the volume or costs, because it is a lot easier and under the company's control

Entering a New Market Market Elements

Competition Market Share Comparative Products and Services Barriers to Entry

What is associative rule learning?

Computer is given a large set of observations made up of multiple variables. The task is to learn relationships between variables. If A and B => C

If Profits are declining because of rising expenses.....

Concentrate on operational and financial issues ie COGS, labor, rent, and marketing costs

reducing costs: sales flat, profits declining (surge in costs)

focus on internal costs vs. external costs. Internal: union wages, suppliers, materials, economies of scale, increased support systems. External: economy, interest rates, government regulations, transportation/shipping strikes

sterling ratio

annualized return since inception / (max draw down since inception + 10%)

dividend discount model

next years expected dividend/ (discount rate - growth rate) basic stock pricing formula to determine fair value
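
A minimal sketch of this formula. The guard for a discount rate at or below the growth rate is an added assumption, since the formula has no sensible value in that case; the function name is illustrative:

```python
def dividend_discount_price(next_dividend, discount_rate, growth_rate):
    """Fair value = next year's expected dividend / (discount rate - growth rate)."""
    if discount_rate <= growth_rate:
        raise ValueError("discount rate must exceed the growth rate")
    return next_dividend / (discount_rate - growth_rate)


if __name__ == "__main__":
    # Hypothetical inputs: $2.00 expected dividend, 8% discount rate, 3% growth.
    print(round(dividend_discount_price(2.0, 0.08, 0.03), 2))
```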

How long are Current Assets held by the entity?

no more than one period Inventory, trade receivables, prepayments, cash (and equivalents)

dividend payback period

number of years it would take for dividend growth to equal initial cost of purchasing stock. lower period is better

Sampling error

occurs whenever n < N, i.e., the sample is smaller than the relevant population.

The auditor cannot issue an unmodified report if the client

omits the statement of cash flows from the financial statements.

operating margin

operating profit (includes most expense but not interest or taxes) divided by revenue

depreciation

reduction in value of an asset over time

Central limit theorem states

the sampling distribution of the sample mean is approximately normal for large samples, regardless of the way in which the population data is distributed

Domain knowledge

relates to the expertise gained by individuals in certain areas or fields. For example, medicine is a domain.

Gross Profit Margin (%)

Gross Profit ($) / Revenue

Revenue

Price*Quantity

Four P's

Product Price Place Promotion

The _______ measures the rate of income earned on the amount invested by the stockholders.

rate earned on stockholders' equity

If a Product is in its Declining Stage....

Define niche market, analyze the competition's play, or think exit strategy

What is the difference between artificial intelligence and machine learning?

Machine learning is the design and development of algorithms whose behaviour is learned from empirical data. Artificial intelligence includes machine learning but also covers other aspects such as knowledge representation, natural language processing, planning, robotics etc.

Data

Data are the raw figures, numbers, or text that serve as the starting point of analysis.

Information

Data become information when they reveal the causes or results of the event.

Data calibration

Data calibration is the method of establishing a relationship between a data point and a unit of measure that has been formally defined

what are examples of master data?

data about customers, products, vendors, employees, and fixed assets

What are anomalies?

data anomalies (irregularities). They threaten data integrity.

volume inputs and drivers

increase overall market share, overall market growth, new markets, changing customer demand,

Vasicek model

In finance, the Vasicek model is a mathematical model describing the evolution of interest rates. It is a type of one-factor short rate model as it describes interest rate movements as driven by only one source of market risk. The model can be used in the valuation of interest rate derivatives, and has also been adapted for credit markets. It was introduced in 1977 by Oldřich Vašíček[1] and can be also seen as a stochastic investment model.

Reporting on Incomplete Presentation

Incomplete presentation that is otherwise in accordance with GAAP is a type of single FS. When reporting on an incomplete presentation that is otherwise in accordance with GAAP, the report should include an emphasis-of-matter paragraph (after opinion) or explanatory paragraph (before) that: 1. States the purpose for which the presentation is prepared and refers to the note that describes the basis of presentation 2. Indicates that the presentation is not intended to be a complete presentation of the entity's A, L, R, E

Mergers and Acquisitions Objectives Elements

Increase Market Access Diversify Holdings Pre-empt the competition Enjoy the tax advantages Incorporate synergies Increase Shareholder value

International Standards

International Financial Reporting Standard Generally Accepted Accounting Principles

What is the Poisson Distribution?

It is a discrete probability distribution that describes the number of events occurring in a given unit of time or space. A discrete distribution has a countable # of outcomes, whereas continuous distributions can take infinitely many values. • Calls received by TD EasyLine during payday • Number of bacteria per volume of mucus • Number of customers at the checkout counter at Wal-Mart at any given minute • Number of breakdowns in the TTC per day In each example, x refers to the # of events that occur in that period of time, during which an average of Mu events can be expected to occur. Events HAVE TO BE INDEPENDENT. Mean = mu | variance = mu
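
The card does not give the probability formula itself. A short sketch using the standard Poisson probability mass function, P(X = k) = e^(-mu) * mu^k / k!, with a numerical check that the mean and variance both come out close to mu; the function name is illustrative:

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """Probability of exactly k events when mu events are expected on average."""
    return exp(-mu) * mu ** k / factorial(k)


if __name__ == "__main__":
    mu = 3.0  # hypothetical average, e.g. 3 breakdowns per day
    ks = range(60)  # the tail beyond 60 is negligible for mu = 3
    mean = sum(k * poisson_pmf(k, mu) for k in ks)
    variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)
    print(round(mean, 6), round(variance, 6))
```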

What is the goal of A/B Testing?

It is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B Testing is to identify changes to a web page that maximize or increase an outcome of interest. An example is comparing the click-through rate of two versions of a banner ad.
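
One common way to run the hypothesis test for a click-through comparison is a two-proportion z-test. The card does not name a specific test, so this choice is an assumption, as are the function name and the example counts:

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference in rates between B and A."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 100 clicks out of 1000 views for A, 150 out of 1000 for B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(round(z, 2), round(p, 4))
```

A small p-value suggests the difference in click-through rate is unlikely to be due to sampling error alone.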

Why does Spark exist even though Hadoop exists? (1)

Iterative algorithms: MapReduce is generally not well suited to iterative algorithms such as machine learning and graph processing. These algorithms run the same steps over the data again and again, so they benefit from keeping the data in memory. Fewer saves to disk and fewer transfers over the network mean better performance.

GBI- Global Bike Inc. History

John Davis is a bicyclist and a mountain racing champion. He created a company in the United States to produce trail bikes. Peter Weiss of Germany is an engineer who not only races road bikes but also designs bike frames. He formed a company to manufacture lightweight touring bike frames. John and Peter met in 2000 and merged their two companies to form GBI.

How is KNN different from k-means clustering?

K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an unsupervised clustering algorithm. While the mechanisms may seem similar at first, what this really means is that in order for K-Nearest Neighbors to work, you need labeled data against which to classify an unlabeled point (thus the nearest neighbor part). K-means clustering requires only a set of unlabeled points and the number of clusters k: the algorithm repeatedly assigns each point to its nearest cluster center and recomputes each center as the mean of the points assigned to it. The critical difference here is that KNN needs labeled points and is thus supervised learning, while k-means doesn't and is thus unsupervised learning.
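
A toy sketch of the contrast on one-dimensional data: the KNN function needs labels, the k-means function only needs points and starting centers. The function names, the 1-D simplification, and the fixed starting centroids are all assumptions made to keep the example short:

```python
from collections import Counter


def knn_predict(points, labels, query, k=3):
    """Supervised: majority vote among the labels of the k nearest labeled points."""
    nearest = sorted(range(len(points)), key=lambda i: abs(points[i] - query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised: alternate nearest-centroid assignment and mean recomputation."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12]
    print(knn_predict(data, ["low"] * 3 + ["high"] * 3, 2.5))  # uses the labels
    print(kmeans_1d(data, [1, 12]))                             # uses no labels
```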

Explain LSA

Layered-LSA model consists of multiple layers. Scalable- Data warehouses typically grow in scope and size. Architecture- LSA is an architecture that does not depend on any specific technology.

Turnaround Approach Strategy Elements

Learn about the company; Review services, products and finances; Secure funding; Review talent and culture; Determine short-term and long-term goals; Write a business plan; Reassure clients, suppliers and distributors; Prioritize goals and develop some small successes for momentum

Advantages of Debt Investing

Less Risk of losing investment More reliable income stream

What is logistic regression?

Logistic Regression is a regression model where the dependent variable (DV) is categorical. It is used when we are trying to predict binary outcomes (e.g. predicting whether a student will PASS/FAIL a test given the number of hours he spent studying). Another example is trying to predict whether a political candidate will WIN/LOSE, Predictor variables: -amt of $ spent on campaign -time spent on the campaign and so on...

For Sampling Data & For Distributions: Define Mean Value & Expected Value

Mean value is the only value that comes from the sampling data. Expected Value is the mean of all the means i.e. the value that is built from multiple samples. Expected value is the population mean. For Distributions Mean value and Expected value are same irrespective of the distribution, under the condition that the distribution is in the same population.

When to Issue an "Adverse Opinion"

Misstatements are material *AND* pervasive Example: 1. Material Misstatement

Model Fitting

Model fitting is a procedure that takes three steps: First you need a function that takes in a set of parameters and returns a predicted data set. Second you need an 'error function' that provides a number representing the difference between your data and the model's prediction for any given set of model parameters. This is usually either the sums of squared error (SSE) or maximum likelihood. Third you need to find the parameters that minimize this difference
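
The three steps above can be sketched for the simplest case, a straight line. The model form (slope and intercept), the SSE error function, and the closed-form least-squares minimizer are choices made for this example; other models would typically need a numerical optimizer for the third step:

```python
def predict(params, xs):
    """Step 1: a function that takes parameters and returns predicted data."""
    slope, intercept = params
    return [slope * x + intercept for x in xs]


def sse(params, xs, ys):
    """Step 2: an error function, here the sum of squared errors."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predict(params, xs)))


def fit_line(xs, ys):
    """Step 3: find the parameters that minimize the SSE (closed form for a line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]  # hypothetical data lying on y = 2x + 1
    best = fit_line(xs, ys)
    print(best, sse(best, xs, ys))
```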

Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Auditor's Responsibility Paragraph:

Modify the paragraph to state: "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the qualified audit opinion."

Nestle- Alt Mrkts

N America S America Europe Asia

Product segments acronym

NCDCSL

Net Profit Margin

NI/Rev

Explain what is "Over the Counter Market"?

Over the counter market is a decentralized market, which does not have a physical location, where market traders or participants trade with one another through various communication modes such as telephone, e-mail and proprietary electronic trading systems.

If decline in Sales analyze these three things.....

Overall declining market demand (soda sales have dropped as bottled water becomes the drink of choice) The possibility that the current marketplace is mature or your product is obsolete (vinyl records compared to CDs) Loss of market share due to substitutions (video rental stores such as Blockbuster lost share to substitutes)

P/B Ratio

P0/BV Common Equity

What is PAC Learning?

PAC (Probably Approximately Correct) learning is a learning framework that has been introduced to analyze learning algorithms and their statistical efficiency.

Valuing Equity with PEG

PEG * Expected EPS * Growth Rate

For t<T

Pamer(T) ≥ Pamer(t)

In what areas Pattern Recognition is used?

Pattern Recognition can be used in a) Computer Vision b) Speech Recognition c) Data Mining d) Statistics e) Information Retrieval f) Bio-Informatics

What is Interpolation and Extrapolation?

Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value by extending a known set of values or facts beyond their range.
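
With a straight-line assumption, the same formula covers both cases: it interpolates when x lies between the two known points and extrapolates when x lies outside them. A minimal sketch; the linear form and the function name are assumptions, since the card does not specify a method:

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the line through (x0, y0) and (x1, y1)."""
    if x0 == x1:
        raise ValueError("the two known points must have different x values")
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    # Hypothetical known points (0, 0) and (10, 100).
    print(linear_estimate(0, 0, 10, 100, 5))   # inside the known range: interpolation
    print(linear_estimate(0, 0, 10, 100, 12))  # outside the known range: extrapolation
```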

Management's Responsibility

Financial Statements & Internal Control 1. Preparation/fair presentation of FS in accordance with the framework 2. Design, implementation, and maintenance of IC relevant to preparation and fair presentation of FS free from material misstatement 3. Providing auditor with access to information and persons

Two internal dimensions

Financial and competitive position (FP & CP)

Increasing Profits Revenue Elements

Identification of Revenue Streams Percentage of total revenue of each? Unusual balance? Have percentages changed?

Competitive Response Questions

If Competition comes out with a new product/ How does it differ from ours? What has the competitor done differently? Have any other competitors picked up market share? Have the consumers needs changed? Did they increase or expand into new channels?

Cox-Ross-Rubinstein Binomial Tree: u

exp[σ√h]

w

01110111

1/14

.0714

COGS

cost of goods sold

Cash Ratio

(Cash + Cash Equivalents)/CL

Asymmetric

∀a,b∈X(aRb→¬(bRa)) i.e. antisymmetric ⋀ irreflexive

001000

10

What is securitisation?

bundling similar assets to provide the backing for bonds

eva

NOPAT - WACC × (TA - CL)

discrete uniform distribution: Variance

((b-a+1)^2 - 1)/12
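
The formula can be checked against a brute-force variance over the integers a..b. A small sketch using exact fractions so the comparison is not affected by rounding; the function names are illustrative:

```python
from fractions import Fraction


def discrete_uniform_variance(a, b):
    """Closed form: ((b - a + 1)^2 - 1) / 12 for integers a..b, each equally likely."""
    n = b - a + 1
    return Fraction(n * n - 1, 12)


def brute_force_variance(a, b):
    """Average squared deviation from the mean over the integers a..b."""
    values = range(a, b + 1)
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n


if __name__ == "__main__":
    # A fair six-sided die: a = 1, b = 6.
    print(discrete_uniform_variance(1, 6), brute_force_variance(1, 6))
```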

if product in mature...

-manufacturing, cost, compt

residual income

operating income-rr

Data Provisioning

Provisioning -the process of providing users and systems with access to data.

With which programming languages and environments are you most comfortable working?

Python, Anaconda environment, PySpark

Qualified vs. Adverse (when to express)

Qualified - material but not pervasive Adverse - material AND pervasive

Audit Issues - Material but not pervasive

Qualified Opinion

Forward on a Stock with Discrete Dividends

S(t)*exp[r(T-t)]-CumValue(Div)

Pre-paid Forward on a Stock with Discrete Dividends

S(t)-PV(Div)

In-memory databases

Databases, such as SAP HANA, that employ columnar storage and other technologies. The data in an in-memory database are stored in a columnar store in memory (RAM).

Customer segments acronym

SGMNPD

Give some situations where you will use an SVM over a RandomForest Machine Learning algorithm and vice-versa.

SVM and Random Forest are both used in classification problems. a) If you are sure that your data is outlier free and clean then go for SVM. It is the opposite - if your data might contain outliers then Random forest would be the best choice b) Generally, SVM consumes more computational power than Random Forest, so if you are constrained with memory go for Random Forest machine learning algorithm. c) Random Forest gives you a very good idea of variable importance in your data, so if you want to have variable importance then choose Random Forest machine learning algorithm. d) Random Forest machine learning algorithms are preferred for multiclass problems. e) SVM is preferred in multi-dimensional problem set - like text classification but as a good data scientist, you should experiment with both of them and test for accuracy or rather you can use ensemble of many Machine Learning techniques.

generic frameworks

SWOT, cost-benefit analysis

what are the components of data warehousing

Sources Systems Data Staging Data warehouses Data Mart Analytics Tool

Spark Low-Latency

Spark can cache/store intermediate data in memory for faster model building and training. Also, when graph algorithms are processed then it traverses graphs one connection per iteration with the partial result in memory. Less disk access and network traffic can make a huge difference when you need to process lots of data.

Speculative Execution (SPARK)

Speculative execution of tasks is a health-check procedure that checks for tasks to be speculated, i.e. running slower in a stage than the median of all successfully completed tasks in a taskset . Such slow tasks will be re-launched in another worker. It will not stop the slow tasks, but run a new copy in parallel.

Velocity

Speed of data generation

What are support vector machines?

Support vector machines are supervised learning algorithms used for classification and regression analysis.

Early Exercise IS Optimal for Put (Necessary Conditions)

S∂<Kr | K(1-e^(-rT))>S(1-e^(-∂T))+C(S,K,T) or S-K>Ceur(S,K,T)

Working Capital Turnover

Sales/Average Working Capital

Backward integration

Seeking ownership or increased control over suppliers

Growth Rate

Retention Ratio * ROE Retention Ratio = (1 - Payout)

Operating Income

Revenue - Operating Expense

Gross Profit

Revenue*Gross Margin %

Increasing Profits Approachs

Revenue- E(P=R - C)M Always look at external first Costs Volume

What is selection bias?

Selection bias is the bias introduced by the selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed. It is sometimes referred to as the selection effect.

What is sequence learning?

Sequence learning is a method of teaching and learning in a logical manner.

3 Areas where Material Misstatements may Arise

(1) Appropriateness of accounting policies (2) Application of Accounting Policies (3) Appropriateness of the financial statement presentation/appropriateness or adequacy of disclosures in financial statements.

terminal value

(CF last yr x (1 +grwth rate))/(cost of capital -grwth rate)

cfroi

(cfo-ed)/cash invested

Generally Accepted Government Auditing Standards

*GAGAS* - Audits -Section: GAGAS -Standard Setting: Government Accountability Office -Provide guidance for audits of government organizations, programs, activities and of entities that receive government funds -Financial or performance audits of gov organizations, programs, activities, and of entities that receive government funds

The grand strategy matrix

-Based on two evaluative dimensions: competitive position and market (industry) growth

barriers of entry

-economies of scale -capital requirements -government policy -switching costs -access to distribution channels -product differentiation -proprietary product technology

Long term sources of finance for listed companies

-financial markets

How can you use a machine learning library written in Python, such as the SciKit library, with the Spark engine?

Ans: A machine learning tool written in Python, e.g. the SciKit library, can be used as a Pipeline API in Spark MLlib or by calling pipe().

How would you broadcast a collection of values over the Spark executors?

Ans: sc.broadcast("hello")

If the hypothesized value falls within the confidence interval

Fail to reject Ho

The failure of the financial statements to contain adequate disclosure of related party transactions, or other required disclosures,

would result in a qualified or adverse opinion, not a disclaimer of opinion.

If the hypothesized value falls outside the confidence interval

Reject Ho

111111

77

100110

46

What do you mean by Dependencies in RDD lineage graph?

Ans: Dependency is a connection between RDDs after applying a transformation.

Possible Values: Γput

>0

What is a Gaussian?

A family of functions that show a "bell curve" shape.

Use supply and Demand framework when you hear

Capacity change through acquisitions, merger Build shut-down factory Capacity shift in response to change in demand

What are Bayesian Networks (BN) ?

A Bayesian Network is a graphical model that represents the probabilistic relationships among a set of variables.

12

C (hex)

What is a non-current/long term liability?

Due by the entity after more than one period: long-term loans, mortgages, debt instruments

Gamma is the greatest when an option: A) is deep out of the money. B) is deep in the money. C) is at the money.

C. Gamma, the curvature of the option price/asset price function, is greatest when the option is at the money.

BCG Matrix Cash Cows

High Market Share Low Growth Generate excess amount of cash, past the amount that should be reinvested.

Stars II

High market share, high growth rate

What is an Incremental Learning algorithm in ensemble?

Incremental learning method is the ability of an algorithm to learn from new data that may be available after classifier has already been generated from already available dataset.

Ethical Requirements

Independent -Comply with ethical requirements related to FS, including independence in both fact and appearance. Include AICPA Code of Professional Conduct and rules of the state boards.

BCG Matrix Question Marks

Low Market Share High Growth They almost always require more cash than they can generate. If there is no cash they will fall behind and die. It is a liability that does have a good pay off but needs to be fed capital

BCG Matrix Pets(Dogs)

Low Market Share Low Growth They may show profit but the cash needs to be reinvested for them to keep share Product is worthless except in liquidation No excess cash flows

Range

Maximum - Minimum

How would you deal with categorical features?

One-hot encoding

Differentiation

Producing products/services considered unique and directed at consumers who are relatively price-insensitive

New Product Approaches

Product Market Strategy Customers Financing

How do you avoid false positive?

Set a proper sample size

Data Staging

The process whereby data are organized and prepared for analysis

What is ensemble learning?

To solve a particular computational program, multiple models such as classifiers or experts are strategically generated and combined. This process is known as ensemble learning.

Debt-to-Total-Capital Ratio

Total Debt/Total Capital Total Capital = Debt + Equity = Total Assets

Correlation

Unitless measure of relationship between two variables. Always between -1 and +1. Cutoffs: weak if |r| < .3, strong if |r| > .7

What evaluation approaches would you work to gauge the effectiveness of a machine learning model?

You would first split the dataset into training and test sets, or perhaps use cross-validation techniques to further segment the dataset into composite sets of training and test sets within the data. You should then choose a selection of performance metrics. You could use measures such as the F1 score, the accuracy, and the confusion matrix. What's important here is to demonstrate that you understand the nuances of how a model is measured and how to choose the right performance measures for the right situations.
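
The measures named above can be computed from the four confusion-matrix counts. A minimal sketch for binary labels, assuming 1 marks the positive class; the function names and the example labels are illustrative:

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    actual = [1, 1, 1, 0, 0, 0, 0, 0]     # hypothetical test-set labels
    predicted = [1, 1, 0, 1, 0, 0, 0, 0]  # hypothetical model output
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    print((tp, fp, fn, tn), accuracy(tp, fp, fn, tn), f1_score(tp, fp, fn))
```

Accuracy can look good on imbalanced data even when the positive class is predicted poorly, which is one reason to report F1 alongside it.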

Null Hypothesis

a statement about the population value that will be tested; it will be rejected only if the sample data provide enough contradictory evidence (Important! We can never accept the null! We can only reject it, or fail to reject it.)

Why is Naive Bayes so bad?

assumes features are not correlated

product factors

attributes, buyer decisions, competition, substitutions

appropriate when there are no outliers

average

Number of Days' Sales in Inventory = ? / ?

average inventory / average daily cost of goods sold

Data: Variance

average squared deviation from the average

promotion factors

awareness --> information search --> evaluation --> purchase --> repurchase

2. Data Visualization—Tools that feature advanced graphical representations such as heat maps

waterfall charts 3. Data mining—Advanced statistical tools that can be either descriptive or predictive in nature

Business situation segments

customer, product, company, competition

For options on Futures, d is

e^(-σ√h)

strategy

increased growth, increased profits, lower costs, new product development, new market

price earnings ratio

market price per share/earnings per share

Competition segments

market share, company, barriers to entry, supplier concentration, regulatory environment

npv formula

npv = pv - cost

revenue

price x volume (quantity)

annual market size

product rev/useful life(yrs)

Not informative of outliers

range

xCy if and only if |x - y| ≤ 1

where x is a Real number, and y is an integer,ex)

Formula: Elasticity

Ω=S∆/C

Into what categories can the sequence learning process be divided?

a) Sequence prediction b) Sequence generation c) Sequence recognition d) Sequential decision

tagged data

employ identifiers known as tags that are attached to the data elements to make them readable by a computer.

What information do employees want?

employment prospects, wage negotiations

Altman z plus score

estimates bankruptcy risk for companies: 6.56A + 3.26B + 6.72C + 1.05D, where A = working capital / total assets, B = retained earnings / total assets, C = EBIT / total assets, D = book value of equity / total liabilities. >2.6 financially sound; <1.1 likely to go bankrupt

When Spark works with file.txt.gz, how many partitions can be created?

Ans: When using textFile with compressed files (file.txt.gz, not file.txt or similar), Spark disables splitting, which makes for an RDD with only 1 partition (as reads against gzipped files cannot be parallelized). In this case, to change the number of partitions you should repartition: rdd = sc.textFile('demo.gz') followed by rdd = rdd.repartition(100). With these lines, you end up with rdd having exactly 100 partitions of roughly equal size.

The _____ ratio

sometimes called the working capital ratio or bankers' ratio, also measures a company's ability to pay its current liabilities. Answer: Current

what are the three most common data structures?

spreadsheets, flat files, and databases.

market capitalization

total market value of a company's outstanding shares (share price times shares outstanding)

Types of Material Misstatements

appropriateness of accounting policies; application of accounting policies; appropriateness of financial statement presentation or disclosures

Qualified Opinion Paragraph

except for _____ presented fairly

alpha

Excess return: the return above what the capital asset pricing model would predict for the investment. A high alpha means the investment is outperforming the market after controlling for risk (beta).

Companies recognize revenue in the year in which 3 specific conditions are met

1. Have provided the goods/services to customers 2. Reasonably assured we will collect $ 3. We can determine the cost of providing those goods/services

Possible sections of the income stmt

1. Income from continuing operations 2. Income from discontinued operations 3. Extraordinary items 4. Cumulative effect of a change in accounting principle 5. Net income 6. Comprehensive income 7. Earnings per share

Adjustments *Investments*

Classified as either 'available for sale' or 'trading'

Adjustments *Goodwill*

Company with internal growth vs company growth by M&A. -use tangible book value

demand elasticity

% change demand/%change in price

elasticity

% change volume / % change price

Material Misstatements related to Appropriateness of Financial Statement Presentation or Disclosures

(1) Financials do not include all required disclosures. (2) The disclosures are not presented in accordance with the applicable financial reporting framework. (3) Financials do not provide the disclosures needed to achieve fair presentation. (4) Info that is required to be presented, such as statement of cash flows, has not been included or disclosed in the financials.

Most Commonly Encountered GAAP Problems

(1) GAAP Consistency Change (unjustified) = Auditor Disagrees. (2) Inadequate Disclosure (3) Departure from GAAP (unjustified) (4) Unreasonable Accounting Estimates

Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Qualified Opinion Paragraph: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include: (3)

(3) A description of the nature of omitted information and inclusion of the omitted information, when practicable, if there is an omission of information that is required to be presented or disclosed.

Treynor Ratio

(Annualized return - risk-free return) / Beta. Appropriate for use when a portfolio has diversified away non-systematic risk and only has systematic risk remaining.
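
A short sketch of the ratio as a function may help fix the formula. The function name and the sample figures are hypothetical; returns are assumed to be expressed as decimals (0.12 for 12%).

```python
def treynor_ratio(annualized_return, risk_free_return, beta):
    """Excess return earned per unit of systematic risk (beta)."""
    if beta == 0:
        raise ValueError("beta must be non-zero")
    return (annualized_return - risk_free_return) / beta


# Hypothetical portfolio: 12% return, 3% risk-free rate, beta of 1.5.
example_treynor = treynor_ratio(0.12, 0.03, 1.5)
```

Because the denominator is beta rather than total volatility, two portfolios are only comparable on this measure if both are well diversified, which is the condition the card states.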

Turnarounds Steps

1. Analyze the Company and the industry 2. Possible actions

population data: US

300M

Options Credit Spread

A credit spread results from buying a long position that costs less than the premium received selling the short position of the spread

Data flow diagram (DFD)

A data flow diagram (DFD) is used to model the flow of data from one such object to another.

For an interest rate swap the swap spread is the difference between the: A) swap rate and the corresponding Treasury rate. B) fixed rate and the floating rate in a given period. C) average fixed rate and the average floating rate over the life of the contract.

A) swap rate and the corresponding Treasury rate. The swap spread is the swap rate minus the corresponding Treasury rate.

How would we reduce bias?

Add more features / more complex model

What's a Spark RDD?

Resilient Distributed Dataset: an abstraction that distributes data across the cluster and marshals it behind the scenes

What is the advantage of using Scala over other functional programming languages?

As the name indicates, Scala means Scalable Language; its high scalability, maintainability, productivity and testability make it advantageous to use. Singleton and companion objects in Scala provide a cleaner solution than static in other JVM languages like Java. It also eliminates the need for a ternary operator, because 'if' blocks, 'for-yield' loops, and code in braces return a value in Scala.

Entering the Market - Size of current & Future market

Ask interviewer: -What is the size of the market? -What is the growth rate? -Where is it in its life cycle?(stage of development:Emerging?Mature? Decline?) -Who are the customers and how are they segmented? - What role does technology play in the industry and how quickly does it change? - How will the competition respond?

Reducing Cost Approach

Assessment Cost Analysis- Internal Cost Analysis- External

Auditor's Responsibility

Auditing & Giving opinion (Attest Function: Opinion) 1. Maintain professional skepticism 2. Comply with ethical requirements 3. Exercise professional judgment throughout audit 4. Obtain sufficient appropriate audit evidence 5. Comply w GAAS

11

B (hex)

Standard Reports

Balance Sheet Income Statement Cash flow Statement

Five C's

Company Costs Competition Consumers/Clients Channels

Contribution Margin (%)

Contribution Margin ($) / Price

21 Ways to Cut Costs Labor

Cross-Train Workers Cut overtime Reduce employer 401k or 403k match Raise employee contribution to health-care premium Institute 4 10hr days instead of 5 8hr days Convert workers into owners (if they are stakeholders they will want to work harder) Contemplate layoffs Institute across-the-board pay decreases

DAGScheduler

DAGScheduler uses an event queue architecture in which a thread can post DAGSchedulerEvent events, e.g. a new job or stage being submitted, that DAGScheduler reads and executes sequentially.

Nestle- Growth Strategies

Distribution Channels Product Line Brands, Types of Waters Mktg Campaign(Acquire Competitor, Create Seasonal Balance) Prices

Constant (Gordon) Growth Dividend Discount Model

P0 = D1 / (r - g), where D1 = D0 * (1 + g) is the dividend one year after period t, r is the required return and g is the constant growth rate
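
The constant-growth model can be sketched as a one-line valuation function. The names below are hypothetical, and the sketch assumes the required return r is strictly greater than the growth rate g, since the formula has no finite value otherwise.

```python
def gordon_growth_price(d0, r, g):
    """Price today as next year's dividend divided by (r - g)."""
    if r <= g:
        raise ValueError("required return must exceed the growth rate")
    d1 = d0 * (1 + g)  # dividend one year ahead
    return d1 / (r - g)


# Hypothetical stock: last dividend 2.00, required return 8%, growth 3%.
example_price = gordon_growth_price(2.00, 0.08, 0.03)
```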

interest coverage ratio

EBIT / interest expense. Higher is better: it shows how well a company's interest expenses are covered by its earnings before interest and tax. Over 1.5 is considered stable.

Times-Interest Earned Ratio

EBIT/Interest Expense

Company segments acronym

EDCIFO

New entrants

Economies of scale Product differences Regulation retaliation

Income from discontinued operations

Ex. Maybe a company is trying to drop segment A bc they are no longer as profitable as the others

Why is feature engineering so important ?

Features are what you use to make predictions. Your choice of features can dramatically affect your model regardless of the algorithm you use. A simple algorithm on a good set of features can perform better than a sophisticated algorithm on a bad set

Pricing Strategies 1.Investigate the Company

How big is it? What products does it have? Is it a market leader in this field? What is its objective? (Profits, market share, or brand positioning?) Is the company in charge of its own pricing strategies, or is it reacting to suppliers, the market, or competition?

Five C's Channels

How do we get our product in the hands of the end consumer? How can we increase our distribution channels? Are there areas of our market that we are not reaching? How do we reach them?

Four P's Place

How do we get the products to the end user? How can we increase our distribution channels? Do our competitors have products in places that we don't? Do they serve markets that we can't reach? If so, why? How can we reach them?

What is the difference between Supervised Learning an Unsupervised Learning?

If an algorithm learns something from the training data so that the knowledge can be applied to the test data, then it is referred to as Supervised Learning. Classification is an example for Supervised Learning. If the algorithm does not learn anything beforehand because there is no response variable or any training data, then it is referred to as unsupervised learning. Clustering is an example for unsupervised learning.

Turnarounds 2. Possible Actions

Learn as much as possible about the company and its operations Analyze services, products and finances Secure sufficient financing, so your plan has a chance Review the talent and temperament of all employees and get rid of the deadwood Determine short term and long term goals Devise a business plan Visit clients, suppliers, and distributors to reassure them Prioritize goals and get some small successes under your belt ASAP to build confidence

Best Fit Line

Line drawn through the dots in a scatter plot. Upward-sloping line = positive correlation; flat line = no correlation.

Price-earnings (P/E) ratio = ? / ?

Market Price per Share of Common Stock / Earnings per Share on Common Stock

If a Product is in its Growth Stage emphasize the.....

Marketing Competition

The Value Chain Marketing and Sales

Marketing Strategy Id of customer base and the cost of customer acquisition sales force issues

What are some classification methods?

Naive Bayes, SVM, Decision Trees, and Neural Networks

what is NLP?

Natural Language Processing- Programming languages such as Python allow developers to write programs that translate human voice and language into computer-readable text.

Rate Earned on Total Assets = ( ? + ? ) / ?

(Net Income + Interest Expense) / Average Total Assets

Out-of-the-Money

Option would not have a payout if it could be exercised.

The auditor's inability to observe physical inventories

Or apply alternative procedures to verify their balances could result in a disclaimer.

Characteristics of Business Entities (3)

Ownership structure Objective Liability

Trailing P/E

P0 / EPS for Past Year

Professional Skepticism

Professional judgment: make assessment yourself each year -Auditor plan and perform audit w professional skepticism. Recognition that circumstances may exist that cause FS to be materially misstated. Necessary to the critical assessment of audit evidence Alert for: -Evidence that contradicts other evidence obtained -Info that calls into question reliability of documents and responses to inquiries -Conditions that indicate possible fraud (Pressure, Opportunity, Rationalization) -Circumstances that suggest need for audit procedures in addition to GAAS

Nestle- Market

Size Growth Rate Major Players/MKT Share Changes in Industry Barriers- Gov Reg? Markets?- Home, Retail, Office

Black-Scholes Model Assumptions

Stock returns are normally distributed and independent over time. Risk-free rate, volatility and dividends are known and constant. No transaction costs. Possible to short-sell any stock and borrow any amount of money at the risk-free rate.

What is the advantage of performing dimensionality reduction before fitting an SVM?

Support Vector Machine Learning Algorithm performs better in the reduced space. It is beneficial to perform dimensionality reduction before fitting an SVM if the number of features is large when compared to the number of observations.

Early Exercise IS Optimal for Call (Necessary Conditions)

S∂>Kr | S(1-e^(-∂T))>K(1-e^(-rT))+P(S,K,T) or S-K>Ceur(S,K,T)

Call Boundaries

S≥Camer≥Ceur≥max(0,F0,T[S]*e^(-rT)-K*e^(-rT))

What percentile does the mode represent?

The answer cannot be determined without further information. The mode's location depends upon the distribution of the data set.

How would you create a taxonomy to identify key customer trends in unstructured data?

The best way to approach this question is to mention that it is good to check with the business owner and understand their objectives before categorizing the data. Having done this, it is always good to follow an iterative approach by pulling new data samples and improving the model accordingly by validating it for accuracy by soliciting feedback from the stakeholders of the business. This helps ensure that your model is producing actionable results and improving over the time.

Cumulative effect of a change in accounting principle

The catch-up adjustment of changing from one accounting principle to another -Ex. going from LIFO to FIFO

Explain the quote-to-cash process

The complete set of business processes involved in selling, from creating initial offers for prospects to collecting cash.

Semi structured data

The data are considered semi-structured because they may contain both unstructured data in the form of text like what gets typed into searches and structured data stored automatically by the system based on the movements within the site

Randomization condition

The data values must be sampled randomly

Low p-value:

The data we have observed would be very unlikely if our null hypothesis were true -Should reject null hypotheses

Data Modeling

The definition of the data and their relationships

XBRL-extensible business reporting language

The goal was for business to report clear and understandable financial statements to the SEC. But now any activity that requires communicating unstructured data to a computer and a structured taxonomy of tags can use XBRL.

When an auditor issues an Adverse Opinion how is the opinion paragraph modified?

The opinion paragraph should include the following, "In the auditor's opinion, *because of* the significance of the matter(s) described in the basis for adverse opinion paragraph, the financial statements *do not present fairly* in accordance with the applicable financial reporting framework."

What is Model Selection in Machine Learning?

The process of selecting models among different mathematical models, which are used to describe the same data set is known as Model Selection. Model selection is applied to the fields of statistics, machine learning and data mining.

Give a popular application of machine learning that you see on day to day basis?

The recommendation engine implemented by major ecommerce websites uses Machine Learning

What are the assumptions required for linear regression?

The regression has five key assumptions: linear relationship, multivariate normality, no or little multicollinearity, no auto-correlation, homoscedasticity

what is one example of transactional data?

The sales order data contain information about the date and time the order was created, the sales person who created it, the types and quantities of products ordered etc.

What happens to the sample mean and standard deviation as you increase the sample size?

The sample mean and standard deviation generally become closer to the population mean and standard deviation

What happens to the sample mean and standard deviation as you take new samples of equal size?

The sample mean and standard deviation vary but remain fairly close to the population mean and standard deviation

Independence assumption

The sampled values must be independent of each other

What are the components of the three-tiered architecture?

The user interface or presentation tier; the business services or business logic tier; the data services and programming tier

If you decrease the confidence level (e.g. from 99% to 95%)

The width of the confidence interval decreases, so the estimate is more precise, but we are less confident that the interval contains the true value

Five C's Consumer/Clients

Who are they? What do they want? Are we fulfilling their needs? How can we get more? Are we keeping the ones we have?

short ratio

Aka short float. It shows the percentage of tradeable shares being sold short. A higher ratio means more people are betting the stock price will fall.

churn

aka, the loyalty metric = the percentage of existing customers who stop purchasing your product/service (measured in 30 days, 90 days, or 1 year)

what is LSA?

layered scalable architecture (LSA) is a flexible framework for data acquisition, storage, and retrieval that provides for a robust data warehousing process

d1

{ln{[F0,T[S]*e^(-rT)]/[F0,T[K]*e^(-rT)]}+0.5σ^2T}/(σ√T)

outsourcing 2x2

low competitiveness & high strategic importance: improve or seek partner; high competitiveness & high strategic importance: keep and leverage; low competitiveness & low strategic importance: outsource; high competitiveness & low strategic importance: seek different advantage

Threat of substitutes

price sensitivity product differences switching costs

discrete uniform distribution

where a finite number of values are equally likely to be observed • If there are n possible values, each value has a chance of 1/n of happening • EV = Mean = (a+b)/2, where a is the lowest possible value and b is the highest possible value
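
The two facts on this card (each value has probability 1/n, and the mean is (a+b)/2) can be checked by direct enumeration. The sketch below assumes the possible values are the consecutive integers from a to b; the function name is hypothetical.

```python
from fractions import Fraction


def discrete_uniform(a, b):
    """Return (probability of each value, expected value) for integers a..b."""
    values = range(a, b + 1)
    n = len(values)
    p = Fraction(1, n)                # every value is equally likely
    mean = sum(v * p for v in values) # expected value by definition
    return p, mean


# A fair six-sided die: values 1 through 6.
die_p, die_mean = discrete_uniform(1, 6)
```

Using exact fractions avoids floating-point noise, so the enumerated mean can be compared directly with (a+b)/2.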

Call-Put Option Relationship: Delta

∆call-∆put=e^(-∂T)

Ωportfolio

∆portfolio*S/Value(Portfolio) (Assumes S is the underlying asset for all portfolio instruments.)

Portfolio Greeks

∑Greek(i) where Greek(i) is the Greek for investment i in the portfolio

Possible Values Ωput

≤0

Possible Values: Ωcall

≥1

If two events are independent

what is the probability that they both occur? Answer: if A and B are independent, P(A and B) = P(A) * P(B).

it consists of a set of Bayesian Clauses

which captures the qualitative structure of the domain. The second component is a quantitative one, it encodes the quantitative information about the domain.

Forward on Currency

x(t)*exp[(rd-rf)(T-t)]

Adverse Opinion Due to Material Misstatement of the Financial Statements - Nonissuers (Private Company)

1. Intro Paragraph 2. Management's Responsibility Paragraph 3. Auditor's Responsibility Paragraph - "adverse audit opinion" 4. Basis for Adverse Opinion 5. Adverse Opinion Paragraph - "because of," "the financial statements do not present fairly"

Qualified Opinion Due to Material Misstatement of the Financial Statements - Nonissuers (Private Company)

1. Intro Paragraph 2. Management's Responsibility Paragraph 3. Auditor's Responsibility Paragraph - "qualified audit opinion" 4. Basis for Qualified Opinion 5. Qualified Opinion Paragraph - "except for," "the financial statements are presented fairly"

customer factors

1. who? segmentation -demographics -socioeconomics -needs 2. market data -share -size -growth 3. decision-making -what's driving the buying decision?

Appropriateness of Accounting Policies

1.Accounting policies aren't in accordance w/ the applicable financial reporting framework. 2.Financial statements don't represent the underlying transactions. 3.Entity hasn't complied w/ the financial reporting framework requirements for accounting for & disclosing changes in accounting policies.

Middle Paragraph

1.All of the substantive reasons. 2.Disclosure of the principal effects.

1111

15

0011

3

0110

6

What probability falls within one standard deviation of the mean?

68%

What is Interpolation and Extrapolation?

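Interpolation: estimating a value that lies between two known values. Extrapolation: approximating a value by extending a known set of values beyond its observed range.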
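
One way to see the difference between the two terms is with a straight line through two known points: the same formula interpolates when the query lies between the points and extrapolates when it lies outside them. This is only a linear sketch with hypothetical names; other interpolation schemes exist.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


def kind_of_estimate(x0, x1, x):
    """Label the estimate by whether x falls inside the known range."""
    lo, hi = min(x0, x1), max(x0, x1)
    return "interpolation" if lo <= x <= hi else "extrapolation"


inside = linear_estimate(0, 0, 10, 20, 5)    # query between the known points
outside = linear_estimate(0, 0, 10, 20, 15)  # query beyond the known points
```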

Discrete Dividend Exercise (CALL)

Exercise cum-dividend

Grow and Increasing Sales 3. How to increase Volume?

Expand the number of distribution channels Increase product line through diversification of products or services (particularly where they won't cannibalize other products or services you already have) Analyze the segments of the business that have the highest potential Invest in a Marketing Campaign Acquire a Competitor (if you want to increase Market Share) Adjust Prices Create a Seasonal Balance (increase sales in every quarter: sell flowers in the spring, herbs in the summer, pumpkins in the fall and Christmas trees in the winter)

|Critical value|>|Test statistic|

Fail to reject Ho, there is not sufficient evidence that Ha is true

What is logistic regression? Or State an example when you have used logistic regression recently.

binary outcome, predict, political leader, outcome, binary, predictor variables

Hypothesis testing

is a way of summarizing our conclusions about data based on confidence intervals

Information Ratio

Measures a portfolio's consistency and returns relative to a benchmark: (portfolio return - index return) / tracking error. A high information ratio suggests a portfolio manager who is successfully adding value.

Frame error

mismatch between the people who could possibly have been sampled (the sampling frame) and the true target population

capital expenditures

money spent by a business to purchase assets (capex)

assets

property (tangible and intangible) that has value and could likely be used to meet debt, commitments, or liabilities

example: PC1=spendy axis (proportion of baskets containing spendy items

raw counts of items and visits)

ways to segment a population

-age -gender -geography -income -married/single

001100

14 (octal)

What is the Central Limit Theorem?

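The average of a large number of independent random variables drawn from the same distribution (with finite variance) is approximately normally distributed, whatever the shape of the original distribution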
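
A small simulation can make the theorem concrete. The sketch below draws from a uniform distribution on [0, 1), which is far from bell-shaped, and checks that the sample means cluster around 0.5 with the spread the theorem predicts. The sample size, number of trials and seed are arbitrary choices for illustration.

```python
import random
import statistics


def sample_means(n, trials, seed=0):
    """Return `trials` sample means, each computed from n uniform draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]


means = sample_means(n=30, trials=5000)
center = statistics.mean(means)     # should be close to 0.5
spread = statistics.pstdev(means)   # should be close to sqrt((1/12) / 30)
predicted_spread = ((1 / 12) / 30) ** 0.5

# Share of sample means within one predicted standard deviation of 0.5;
# for a normal distribution this is roughly 68%.
share_within_1sd = sum(abs(m - 0.5) <= predicted_spread for m in means) / len(means)
```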

Type 2:

Best-value strategy that offers products/services to a wide range of customers at the best price-value available on the market

What is regression?

Regression gives the computer pairs of (inputs, continuous targets) and the computer learns to predict continuous values on unseen data

p<a

Reject null hypothesis

Pre-paid Forward on a Non-Dividend Paying Stock

S(t)

Matching principle

Will match revenues and expenses to come up with net income

equity

assets minus liabilities- gauge overall built up value in a corporation

Collaborative filtering

friends like, used in social context networks

revenue

-price/unit -number of units sold

What is market analysis?

Research on the target market segment -industry/retail -market tastes-price/quality -market spending Info can help define business plan and viability

New Product Product Elements

Special or Proprietary? Financing? Patented? Substitutions? Advantages/Disadvantages? Place in product line? Cannibalizing our own products? Replacing existing products?

Economic Order Quantity

Sq. Root ((2(Annual Sales Units)(Cost/Purchase Order))/Annual Carrying Cost/Unit)
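
The square-root formula above is easy to mistype, so a short sketch may help. The argument names are hypothetical labels for the three quantities on the card.

```python
import math


def economic_order_quantity(annual_units, cost_per_order, carrying_cost_per_unit):
    """Order size that balances ordering cost against carrying cost."""
    return math.sqrt(2 * annual_units * cost_per_order / carrying_cost_per_unit)


# Hypothetical inputs: 1,200 units a year, 50 per order, 3 per unit carried.
example_eoq = economic_order_quantity(1200, 50, 3)
```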

Entering a New Market Entry Elements

Start from scratch Acquire an existing player Form a joint venture/strategic alliance with existing player

ROI

(rev - cost - inv) / capital inv

Principles - Information and Communication

*O*btain and Use Information *I*nternally Communicate Information *E*xternally Communicate Information

reducing VC

-(think about risks related to quality, customer satisfaction, return on investments, etc.) -look for inefficiencies in the manufacturing process -renegotiate contracts with suppliers and distributors (consolidate purchasing, volume discounts) -vertical integration -get cheaper raw materials or labor -outsource to cheaper regions

Industry/Acquiring a diverse company - Future outlook

-Are players coming into or leaving the industry? -Have there been many mergers or acquisitions lately? -What are the barriers to entry and/or to exit

Extraordinary items (gains/losses)

-Associated with natural catastrophe -A company experiencing a large loss from an ice storm in Alabama -Unusual and infrequent

Lack of Consistency

-Deals w comparability of FS from year to year. -Evaluate whether due to change in acc principle or adj to correct material misstatement *Acceptability of Change in Accounting Principle* Justified -Consider whether: 1. Newly adopted acc principle is in accordance w framework 2. Method of accounting for change is acceptable 3. Disclosures related to change are appropriate and adequate 4. Entity has justified that the alternative acc principle is preferable -If satisfied w all four criteria, the auditor should include an emphasis-of-matter paragraph in report. Examples that Affect Consistency: -Change in acc estimate that is inseparable from change in acc principle: change in depreciation method. *Change in estimates ONLY or errors are not consistency issues* -Correction of error in acc principle - cash method to accrual -Change in reporting entity results in FS that are those of different reporting entity -Using equity method and any changes made Effect of Acceptable Change on Report -If effect of change in accounting principle is immaterial, no revision -If material, emphasis-of-matter paragraph should be added

The grand strategy matrix: Quadrant III

-Must make some drastic changes quickly to avoid further decline and possible liquidation -Extensive cost and asset reduction should be pursued first

QSPM

-Objectively indicates which alternative strategies are best -Uses input from stage 1 analyses and matching results from stage 2 analyses to decide objectively among alternative strategies

Developing a new product - customers

-Who are our customers -How can we best reach them? -Can we reach them through internet/direct sales? -How can we ensure that we retain them?

Starting a new business - Investigate the market

-Who is our competition? -What size of the market does each hold? -How do products/sales compare to ours? -Are there any barriers to entry? (capital requirements, access to distribution channels, proprietary product technology, government services)

if product is in emerging growth stage...

-concentrate on R&D, competition, and pricing

cost (would like to segment costs)

-cost/unit -> FC/unit and VC/unit -number of units sold

reducing FC

-reduce overhead -excess capacity -get to economies of scale -trim extra employees -> think automation, union negotiations, reduce overtime -going up the learning curve -one-time investments (decrease) -reduce seasonality of demand by finding good alternate use of PP&E -outsource to cheaper regions

Annual reports/ 10K reports filed w/ the SEC

1. Management's discussion and analysis 2. Income stmt, balance sheet, stmt of cash flows 3. Footnotes, supplementary schedules 4. Auditor's report

continuous probability function

1. Probability that x is between two points, a and b, is the integral of f(x) from a to b 2. It is non-negative for all real x 3. The integral of f(x) from negative infinity to infinity is 1

Types of Material Misstatements

1.Appropriateness of accounting policies. 2.Application of accounting policies. 3.Appropriateness of financial statement presentation or disclosures.

Appropriateness of Financial Statement Presentation or Disclosures

1.FS don't include all required disclosures. 2.Disclosures aren't presented in accordance w/ the applicable financial reporting framework. 3.FS don't provide disclosures needed to achieve fair presentation. 4.Info required hasn't been included or disclosed in the FS.

Application of Accounting Policies

1.Management hasn't applied accounting policies in accordance with the applicable financial reporting framework. 2.Management hasn't applied accounting policies consistently. 3.Error in the application of an accounting policy.

Basis for Qualified Opinion Paragraph

1.Totally new. 2.Description & quantification of the financial statement effects of any misstatement. 3.Explanation of how disclosures are misstated. 4.Description of the nature of omitted information and the inclusion of the omitted information if practicable.

Basis for Adverse Opinion Paragraph

1.Totally new. 2.Description and quantification of the financial statement effects of any misstatement. 3.Explanation of how disclosures are misstated. 4.Description of the nature of omitted information and the inclusion of the omitted information if practicable.

K

1000 = 10^3

001001

11 (octal)

1011

11

001010

12 (octal)

0010

2

{(1, 2), (2, 3), (3, 4)}: is it antisymmetric, and why?

Yes: there is no pair of distinct elements with both xRy and yRx
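
The definition can be checked mechanically: a relation is antisymmetric when xRy and yRx together force x = y. The sketch below represents a relation as a set of pairs; the function name is hypothetical.

```python
def is_antisymmetric(relation):
    """True when no two distinct elements are related in both directions."""
    pairs = set(relation)
    return all(x == y or (y, x) not in pairs for (x, y) in pairs)


card_relation = {(1, 2), (2, 3), (3, 4)}
card_result = is_antisymmetric(card_relation)

# Adding the reverse of an existing pair breaks antisymmetry.
counter_example = is_antisymmetric({(1, 2), (2, 1)})
```

Pairs of the form (x, x) never violate the property, which is why the check skips them.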

010000

20 (octal)

Quartiles

25th percentile (Q1 = lower median) 50th percentile (Q2 - same as median) 75th percentile (Q3 - upper median)

010110

26 (octal)

011000

30 (octal)

Los Angeles

4 million

100010

42 (octal)

100011

43 (octal)

100100

44 (octal)

population data: Great Britain

60M

110101

65 (octal)
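
The six-digit binary cards in this set appear to give their answers in octal (each group of three bits becomes one octal digit), while the four-digit cards give decimal. A minimal sketch for checking any of them, using only Python built-ins; the function name is hypothetical.

```python
def convert_binary(bits):
    """Return the decimal, octal and hexadecimal forms of a binary string."""
    value = int(bits, 2)
    return {
        "decimal": value,
        "octal": format(value, "o"),
        "hex": format(value, "X"),
    }


six_bit = convert_binary("110101")   # groups 110 and 101
four_bit = convert_binary("1011")
```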

Normal Distribution: 3 Empirical Rules

68% of observations fall within 1 SD; 95% of observations fall within 2 SD; 99.7% (almost all) of observations fall within 3 SD
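
The three percentages are rounded values of the normal distribution's probability mass within k standard deviations of the mean, which equals erf(k / sqrt(2)). A short sketch using the standard library; the function name is hypothetical.

```python
import math


def prob_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))


within_1, within_2, within_3 = (prob_within(k) for k in (1, 2, 3))
```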

0111

7

population data: world

7 billion

What section(s) is added to the Auditor's report when a Qualified or Adverse Opinion is issued for a non-issuer?

A *Basis for Modification* paragraph and a *Qualified Opinion* or *Adverse Opinion* paragraph are added, as appropriate. (GAAP Problems)

Give an example of a Transformer

A ML model is a Transformer which transforms a DataFrame with features into a DataFrame with predictions

Pipeline

A Pipeline chains multiple Transformers and Estimators together to specify a workflow

If two events A and B are mutually exclusive

A and B are disjoint events P (A and B) = 0

denormalized database

A denormalized database is one that was originally normalized to eliminate anomalies, after which select redundant data were restored.

What is a business plan?

A document outlining the key details of a future business (associated with start up or small companies seeking funding)

data scientist

A practitioner of data science, they are trained in mathematics, computer science, and statistics.

What is the mean?

Arithmetic mean is the sum of values / number of values. Central value of a discrete set of numbers.

Increasing Sales Approach

Assessment (increasing sales doesn't necessarily mean increasing profits) How?

Maximin

Avoids the worst outcome "Pessimist"

Explain the two components of Bayesian logic program?

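A Bayesian logic program consists of two components. The first component is a logical one: it consists of a set of Bayesian clauses, which captures the qualitative structure of the domain. The second component is a quantitative one: it encodes the quantitative information about the domain.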

Bonds

Bonds have a face value, maturity and coupon Sold by company to raise funds Periodic coupon payments and final repayment of face value.

Entering the Market - How

Brainstorm pros and cons of: -Start from scratch (new business) -Acquire an existing player -Form a joint venture/strategic alliance

Annuity Present Value

C * (1 - 1/(1+r)^t) / r, where C = amount of annuity (equal future CF), r = rate of return, t = number of years
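
The closed form for an ordinary annuity can be cross-checked against a direct sum of discounted payments. This is a sketch with hypothetical names, assuming payments arrive at the end of each year and r is greater than zero.

```python
def annuity_present_value(c, r, t):
    """Closed form: C * (1 - (1 + r)**-t) / r for end-of-year payments."""
    return c * (1 - (1 + r) ** -t) / r


def annuity_present_value_by_sum(c, r, t):
    """Same quantity, discounting each of the t payments one by one."""
    return sum(c / (1 + r) ** year for year in range(1, t + 1))


# Hypothetical annuity: 100 a year for 10 years at 5%.
closed_form = annuity_present_value(100, 0.05, 10)
summed = annuity_present_value_by_sum(100, 0.05, 10)
```

The two functions should agree to floating-point precision, which is a quick way to confirm the parentheses in the closed form are placed correctly.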

Put-Call Parity (General)

C(S,K,T)-P(S,K,T)=F0,T[S]*exp[-rT]-K*exp[-rT]

Black-Scholes Formula

C(S,K,σ,r,T,∂)=F0,T[S]*e^(-rT)*N(d1)-F0,T[K]*e^(-rT)*N(d2)
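
A sketch of the formula in code, under one stated assumption: the stock pays a continuous dividend yield (written ∂ on these cards, `div_yield` below), so the prepaid forward F0,T[S]*e^(-rT) is S*e^(-∂T) and F0,T[K]*e^(-rT) is K*e^(-rT). The function names are hypothetical, d1 uses the usual division by σ√T, and d2 = d1 - σ√T.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d1_d2(s, k, sigma, r, t, div_yield):
    prepaid_s = s * math.exp(-div_yield * t)  # prepaid forward on the stock
    prepaid_k = k * math.exp(-r * t)          # present value of the strike
    d1 = (math.log(prepaid_s / prepaid_k) + 0.5 * sigma**2 * t) / (sigma * math.sqrt(t))
    return d1, d1 - sigma * math.sqrt(t)


def bs_call(s, k, sigma, r, t, div_yield=0.0):
    d1, d2 = d1_d2(s, k, sigma, r, t, div_yield)
    return s * math.exp(-div_yield * t) * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def bs_put(s, k, sigma, r, t, div_yield=0.0):
    d1, d2 = d1_d2(s, k, sigma, r, t, div_yield)
    return k * math.exp(-r * t) * norm_cdf(-d2) - s * math.exp(-div_yield * t) * norm_cdf(-d1)


# Hypothetical at-the-money option: S = K = 100, 20% volatility, 5% rate, 1 year.
call_price = bs_call(100, 100, 0.2, 0.05, 1.0)
put_price = bs_put(100, 100, 0.2, 0.05, 1.0)
parity_gap = (call_price - put_price) - (100 - 100 * math.exp(-0.05))
```

The last line checks put-call parity: the call price minus the put price should equal the prepaid forward on the stock minus the present value of the strike, so `parity_gap` should be zero up to rounding.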

For K₁>K₂

C(S,K₁,T) ? C(S,K₂,T). Answer: ≤

For K₁>K₂

C(S,K₁,T)-C(S,K₂,T) ? K₂-K₁. Answer: ≥

Put-Call Parity (Exchange Options)

C(S,Q,T)-P(S,Q,T)=F0,T[S]*e^(-rT)-F0,T[Q]*e^(-rT)

Put-Call Parity (Currency Options)

C(x,K,T)-P(x,K,T)=xe^(-rfT)-Ke^(-rdT)

Current Assets

Cash + Supplies (inventory) + Accounts Receivable

What is an example of unsupervised learning?

Clustering and Density Estimations

Accounts Receivable Turnover

Credit Sales/Average AR

Volume

Data size

non-sampling error

Estimation errors Biased survey responses Using a non-representative sample Measurement errors

Mergers and Acquisitions Price Elements

Fair? Affordable? How to pay? If the economy sours?

Greek Relationship: Written vs Purchased

Greek(Written) = -Greek(Purchased)

Net Profit ($)

Gross Profit - Depreciation - Amortization - Other Expenses - Interest - Tax OR Sales Revenue - Total Costs

Four P's Promotion

How can we best market our products? Are we reaching the right market? What kind of marketing campaigns have we conducted in the past? Were they effective? Can we afford to increase our marketing campaign?

Is it better to have 100 small hash tables or one big hash table in memory

I would say 100 small hash tables, it's because of how the hashtables are implemented internally. As the number of records grow the constant in O(1) will increase and you see the performance degradation.

Increasing Profits Costs Elements

ID fixed costs ID variable costs Shifts in cost Unusual costs Benchmark competition Reduce costs without damaging revenue streams

The client's refusal to provide access to the minutes of the Board of Director's meeting would result

In a disclaimer of opinion.

Option Greek Definition: Gamma

Increase in delta per increase in stock price (∂∆/∂S = ∂^2V/∂S^2, where V is the option price) (CONVEXITY)

Developing a New Product Consumer Adoption Rates

Innovators - 2.5% Early Adopters - 13.5% Early Majority - 34% Late Majority - 34% Laggards - 16%

What is linear least squares regression?

Linear

New Business Cost-Benefits Analysis Elements

Management Marketing and Strategic Plan Distribution Channels Product Customers Finance

New Business Approaches

Market Cost-Benefit Analysis

Adjustments *Long-lived Assets*

Methods & estimates

ROE

Net Income/SE

Do gradient descent methods always converge to same point?

No, they do not, because in some cases gradient descent reaches a local minimum (a local optimum) rather than the global optimum. It depends on the data and the starting conditions.
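
A minimal sketch of this behaviour, assuming a hand-picked one-dimensional function f(x) = x^4 - 3x^2 + x with two minima. The function, learning rate, and starting points are illustrative choices, not from the card.

```python
def f(x):
    # Illustrative non-convex objective with two local minima.
    return x ** 4 - 3 * x ** 2 + x

def grad(x):
    # Derivative of f.
    return 4 * x ** 3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=5000):
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

left = gradient_descent(-2.0)   # settles in the minimum on the negative side
right = gradient_descent(2.0)   # settles in the minimum on the positive side
```

Both runs stop where the gradient is essentially zero, but they end at different points, and only the run that started on the left reaches the lower of the two minima.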

Operating Margin

Operating Income / Total Revenue

Adjustments *Off-balance Sheet Financing*

Operating vs Financing

When A and B are not independent

P(A and B) = P(A)P(B|A)
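
A small worked example of the multiplication rule, assuming a standard 52-card deck and two draws without replacement (an illustration, not part of the card). A is "first card is an ace" and B is "second card is an ace"; the brute-force count checks the formula.

```python
from fractions import Fraction
from itertools import permutations

p_a = Fraction(4, 52)          # P(A): first card is an ace
p_b_given_a = Fraction(3, 51)  # P(B|A): second is an ace, given the first was
p_both = p_a * p_b_given_a     # P(A and B) = P(A) * P(B|A)

# Brute force: cards 0..3 stand for the four aces.
pairs = list(permutations(range(52), 2))
hits = sum(1 for first, second in pairs if first < 4 and second < 4)
p_brute = Fraction(hits, len(pairs))
```

Multiplying P(A) by P(B) alone would give (4/52)^2, which overstates the probability because the draws are dependent.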

Binary Relation

Relationship between two sets, A and B, where the relation is a subset of A x B

Reorder Point

Safety Stock + (Lead Time * Average Sales per Period)

Market penetration

Seeking increased market share for present products or services in present markets through greater marketing efforts

Paired data assumption

The data must be paired

What is bias?

The learner's tendency to consistently learn the same wrong thing

What is Invoice Discounting?

A third party provides a loan based on debts due. Recourse: if the debtor defaults, the factoring company can seek payment from the company.

Porter's 5 Forces 2. Intensity of Rivalry among Competition

This is a factor that plays a big role, could force the market into a price war and the company with the lowest costs would be able to survive

Architected data mart layer

This layer enables users to access data stored in the warehouse logically and efficiently

Cost leadership strategies

To employ one successfully, a firm must ensure that its total costs across its overall value chain are lower than competitors' total costs

Audit Issues - None or Immaterial

Unmodified (Unqualified)

Solvency

Value of assets is greater than the value of liabilities. Capital (equity) = assets - liabilities

What is Variance?

Variance is the error representing sensitivities to small training data fluctuations. (overfitting)

Call-Put Option Relationship: Vega

Vega(call)=Vega(put)

What are sources of funds? (5)

Venture Capital Business Angels Bank Loan Informal Government Grants

The Value Chain Delivery

Warehousing and Distribution Channels

Overfitting

When a model makes good predictions on training data but has poor performance on the test data

If you lower price and volume rises and you are pushed beyond capacity.....

Your costs will rise as employees will have to work overtime and your profits will suffer

Real-World Pricing (p)

[exp[(α-∂)h]-d]/[u-d]

Options Debit Spread

a debit spread results when the long position costs more than the premium received for the short position; nonetheless, the debit spread still lowers the cost of the position.

Probability

a number between 0 and 1 that measures the likelihood that some event will occur

Overdraft

a permit to overdraw an account up to a stated limit. Repayable on demand

In a _____, all items are expressed as percentages with no dollar amounts shown.

common-sized statement

Random Variable

associates a numeric value with each possible random outcome Random variables can be discrete or continuous.

All users of financial statements are interested in the ability of a company to: Maintain _____ and _____. Earn income, called _____.

Liquidity, solvency, profitability

Nonsampling error

can come in many forms, and bigger samples don't help us reduce the problem: • Selection of sample (non-response, survival) • Non-truthfulness • Measurement error

cfroi definition

cash-based metric to compute the real rate of return on a company's assets, expressed as a rate (CFO - cash invested in fixed assets)

Elasticity

% change in quantity demanded / % change in price

3 Cs

company, customers, competitors

Flat file

contains data in text format with no structured relationship among the data.

In a vertical analysis of the _____, each asset item is stated as a percent of the total assets.

balance sheet

Company segments

expertise, distribution channels, cost structure, intangibles, financial situation, organisational structure

ETL

extraction, transformation, loading

Adverse Opinion Due to Material Misstatement of Financial Statements: Issuer. Middle Paragraph(s): A paragraph should be placed immediately before the opinion paragraph. This paragraph should include: (2). Disclosure of the principal effects of the subject matter of the adverse opinion on

financial position, results of operations, and cash flows, if practicable. (a). If the effects are not reasonably determinable, the report should so state. (b). If such disclosures are made in a note to the financial statements, the explanatory paragraph(s) may be shortened by referring to it.

The ratio of ______ to _____ is a solvency measure that indicates the margin of safety of the note‐holders or bondholders. It also indicates the ability of the business to borrow additional funds on a long‐term basis.

fixed assets to long‐term liabilities

core competency

hard to imitate efficiency of competitors perceived customer benefits

How would you improve a spam detection algorithm that uses Naïve Bayes?

hidden decision trees or decorrelate your features.

Corporations in some industries normally have _____ ratios of debt to stockholders' equity.

high

52 week range

high low over the past year

The relationship between the volume of goods (merchandise) sold and inventory may be stated as the ______ turnover. The purpose of this ratio is to assess the efficiency of a firm in managing its stuff.

inventory

in other words

it contains a list of historical transactions.

style

leadership style, meritocracy, etc.

d2

{ln{[F0,T[S]*e^(-rT)]/[F0,T[K]*e^(-rT)]}-0.5σ^2T}/(σ√T)

Long Term liabilities

long term debt

market book value

market price per share/book value per share of equity

what is master data?

master data represent business entities that support business transactions

Other, more specific strategies

-Cooperation among competitors -Joint venture/partnering -Merger/acquisition -First mover advantages -Outsourcing

standard deviation

most commonly used risk metric- calculated as the annualized stock price standard deviation.

Qualified Opinion Due to Material Misstatement of Financial Statements: Basis for Qualified Opinion: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include: (1) A description and quantification

of the financial effects of any misstatement that relates to specific amounts in the financial statements. (b). If disclosure of the financial effects is made in the notes to the financial statements, the basis for the modification paragraph can be shortened by referring to the disclosure.

take rate is an ____________________ metric

operational (measures internal effectiveness)

Mean Absolute Deviation

or the average absolute deviation, is the average of the absolute deviations from the mean (not squared)
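
A minimal sketch of the calculation, taking deviations from the mean and averaging their absolute values. The function name and the sample values are illustrative assumptions.

```python
from statistics import mean

def mean_absolute_deviation(values):
    """Average of |x - mean| over all observations."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

# Hypothetical sample: mean is 5, absolute deviations sum to 12 over 8 values.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mad = mean_absolute_deviation(sample)
```

Unlike the variance, nothing is squared, so the result stays in the same units as the data.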

discrete probability function

p(x), is a function that satisfies the following properties: 1. Probability that x can take on a specific value is p(x) 2. p(x) is non-negative for all real x 3. The sum of p(x) over all possible values of x is 1

For options on Futures, p* is

[1-d]/[u-d]

Each liability and stockholders' equity item is stated as a _____ of the total liabilities and stockholders' equity.

percent

How will you explain logistic regression to an economist, physician scientist and biologist?

The number of times interest charges are earned can be adapted for use with dividends on ______ stock.

preferred

what is the most common data model?

relational data model

contribution margin

revenue - variable costs (1 - COGS + all other expenses / revenue)

Earnings and adjusted earnings

revenue minus expenses = earnings (profit). Adjusted earnings: earnings adjusted for one-time or unusual expenses (lawsuit, restructuring, acquisitions)

Statistics

rigorous branch of mathematics that deals with understanding data. It involves the collection or sampling, organization, modeling, analysis, interpretation, and presentation of data.

balanced scorecard looks at

shareholders, customer satisfaction, internal functions, innovation and learning

tracking error

shows the difference in performance between a fund or index and its benchmark

Customer segments

size, growth, market share, needs, price sensitivity, distribution channels

what are intelligent control systems?

software processes that work autonomously with distributed systems to control or run a system both with and without human intervention.

what is image recognition?

software scans a picture and translates what it "sees" into a textual description of whatever is depicted in the picture.

Why is a local optimum important in a specific context, such as K-means clustering?

K-means clustering context: It's proven that the objective cost function will always decrease until a local optimum is reached. Results will depend on the initial random cluster assignment.
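
The sketch below illustrates both claims on a toy one-dimensional data set: the cost never increases from one iteration to the next, and two different initializations end in different local optima. The data, the starting centers, and the helper name `kmeans_1d` are hypothetical and chosen only to make the effect visible.

```python
def kmeans_1d(points, centers, iters=20):
    """Lloyd's algorithm in one dimension; returns final centers and cost history."""
    history = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[idx].append(p)
        # Within-cluster sum of squares for the current centers.
        cost = sum((p - centers[i]) ** 2
                   for i, cluster in enumerate(clusters) for p in cluster)
        history.append(cost)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, history

points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
bad_centers, bad_history = kmeans_1d(points, [0, 1, 2])     # poor start
good_centers, good_history = kmeans_1d(points, [1, 11, 21])  # good start
```

With the poor start, two centers stay stuck in the first group of points and the final cost is far higher than with the good start, which is why K-means is usually run from several initializations.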

or (4). Information that is required to be presented

such as a statement of cash flows, has not been included or disclosed in the financial statements.

The relationship between the total claims of the creditors and the owners, the ratio of _____ to ______, is a solvency measure that indicates the margin of safety for creditors.

liabilities to stockholders' equity

Data Source

the set of fields that are chosen for analysis in the source system.

what are data models?

the structure of a database

Computer Science

the study of how computers work and the application of theory to improve computing methods and capabilities.

Non-integrated special system

there are some systems that just stand alone. E.g., a P.O.S. is usually linked with the accounting and inventory systems, but most small-business owners can understand everything just by looking at their P.O.S. system.

Nonresponse error

there is a consistent relationship between who answered and their response

what are the three dimension tables in star schema?

time, unit, data packaging

why do we use flat files?

to transfer data from one flat file location to another.

Correlation is not affected by

units of x and y

A percentage analysis used to show the relationship of each component to the total within a single financial statement is called _____ analysis.

vertical

You have RDD storage level defined as MEMORY_ONLY_2. What does _2 mean?

Ans: The _2 in the name denotes 2 replicas

What is a stage, with regard to Spark job execution?

Ans: A stage is a set of parallel tasks, one per partition of an RDD, that compute partial results of a function executed as part of a Spark job.

Pre-paid Forward on Currency

x(t)*exp[-rf(T-t)]

Call-Put Option Relationship: Gamma

Γcall=Γput

Call-Put Option Relationship: Psi

Ψcall-Ψput=-0.01TSe^(-∂T)

net profit margin %

net income / revenue

Return on Equity (ROE)

net income divided by equity

roi

net income/total assets

What is bias?

Bias is the error representing missing relations between features and outputs

Types of LT Finance (2)

Equity - sell part of company; Debt - borrow money

Forward on a Stock with Continuous Dividends

S(t)*exp[(r-∂)(T-t)]

Forward on a Non-Dividend Paying Stock

S(t)*exp[r(T-t)]
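
The two stock forward formulas above differ only in the dividend yield, so one helper covers both. This is a minimal sketch; the function name, the parameter names, and the inputs are illustrative assumptions.

```python
import math

def forward_price(spot, r, tau, dividend_yield=0.0):
    """Forward price S(t) * exp[(r - dividend_yield) * (T - t)], with tau = T - t."""
    return spot * math.exp((r - dividend_yield) * tau)

# Hypothetical inputs: spot 100, 5% rate, one year to delivery.
no_dividend = forward_price(100.0, 0.05, 1.0)
with_dividend = forward_price(100.0, 0.05, 1.0, dividend_yield=0.02)
```

A positive continuous dividend yield lowers the forward price, because the holder of the forward does not receive the dividends paid before delivery.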

What are the areas in robotics and information processing where sequential prediction problem arises?

The areas in robotics and information processing where sequential prediction problem arises are a) Imitation Learning b) Structured prediction c) Model based reinforcement learning

Why are vectors used in machine learning?

They give a synthetic summary of the characteristics of real-world objects.

What is Chi-Square distribution?

The chi-square test is a tool that is used to identify if there is a relationship that exists between two given categorical data types. It is calculated by comparing observed vs. expected frequencies *R-squared: numerical data | Chi-squared: categorical data • Do certain products sell better in certain geographic regions? • Does gender influence car-color preference? (X <- gender, Y <- color) • In consumer marketing, a common problem that any marketing manager faces is the selection of colors for packaging. Assume that a manager wishes to compare five different colors. He is interested in knowing which of the five is the most preferred one so that it can be introduced in the market. A random sample of 400 consumers reveals the following:

Sample Report: Qualified Opinion Due to a Material Misstatement of the Financial Statements Issuer (Public Company). Report of Independent Registered Public Accounting Firm.

The company has excluded, from property and debt in the accompanying balance sheets, certain lease obligations that, in our opinion, should be capitalized in order to conform to accounting principles generally accepted in the United States of America.

What is classification?

The computer is given pairs of (inputs, target classes) and the computer learns to attribute classes to unseen data.

If Sales are Flat but Market Share is the same.....

This could be that the industry sales are flat and your competition is also facing the same issues

Pick an algorithm. Write the pseudo-code for a parallel implementation.

This kind of question demonstrates your ability to think in parallelism and how you could handle concurrency in programming implementations dealing with big data. Take a look at pseudocode frameworks such as Peril-L and visualization tools such as Web Sequence Diagrams to help you demonstrate your ability to write code that reflects parallelism.

Qualified Opinion

This opinion is expressed when the auditor concludes that the misstatements, individually or in the aggregate, are material *but* not pervasive to the financial statements.

Basis for Modification Paragraph (Qualified or Adverse)

This paragraph contains the following: (1) A description and quantification of the financial effects of any misstatement that relates to specific amounts in the financial statements. (2) An explanation of how disclosures are misstated if there is a material misstatement related to narrative disclosure. (3) A description of omitted information and inclusion of the omitted information, *when practicable*, if there is an omission of info that is required to be presented or disclosed.

Pricing Strategies 3. Determine the Pricing Strategy

Three options: Competitive Analysis, Cost-Based Pricing, Price-Based Costing. Run through all of these and determine the pros/cons.

Can you explain the difference between a Test Set and a Validation Set?

The validation set can be considered part of the training set, as it is used for parameter selection and to avoid overfitting of the model being built. The test set, on the other hand, is used for testing or evaluating the performance of a trained machine learning model. In simple terms, the differences can be summarized as: the training set is used to fit the parameters, i.e. weights; the test set is used to assess the performance of the model, i.e. its predictive power and generalization; the validation set is used to tune the parameters.

25%

#/4 or #/2 twice

BE Price

(FC/BE Volume) +VC/unit

Statements on Auditing Standards

*SAS* - Audits -Section: AU-C -Standard Setting: AICPA Auditing Standards Board -Provides generally accepted auditing standards for audits of *nonissuers*. Provide guidance for other services like review of interim financial information and letters to underwriters *Private Company* -Audits of annual FS: nonissuers, Special reports: nonissuers, Interim FS: nonissuers

if sales are flat but market share is constant...

-could indicate industry sales are flat -examine competition

Vertical Spread

A money spread, or vertical spread, involves the buying of options and the writing of other options with different strike prices, but with the same expiration dates.

A stock is priced at 38 and the periodic risk-free rate of interest is 6%. What is the value of a two-period European put option with a strike price of 35 on a share of stock using a binomial model with an up factor of 1.15 and a risk-neutral probability of 68%? A) $0.57. B) $0.64. C) $2.58.

A) $0.57. Given an up factor of 1.15, the down factor is simply the reciprocal of this number: 1/1.15 = 0.87. Two down moves produce a stock price of 38 × 0.87^2 = 28.73 and a put value at the end of two periods of 6.27. An up and a down move, as well as two up moves, leave the put option out of the money. You are directly given the probability of up = 0.68. The down probability = 0.32. The value of the put option is [0.32^2 × 6.27] / 1.06^2 = $0.57.
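
The calculation above can be reproduced with a short script. This is a sketch under the card's own assumptions (down factor equal to 1/u, the given risk-neutral probability, and discrete per-period discounting); the function name is hypothetical.

```python
def two_period_put(s0, strike, up, p_up, rate):
    """Value of a two-period European put in a recombining binomial tree."""
    down = 1.0 / up
    p_down = 1.0 - p_up
    # (probability, terminal stock price) for the three end nodes.
    terminal = [
        (p_up * p_up, s0 * up * up),
        (2.0 * p_up * p_down, s0 * up * down),
        (p_down * p_down, s0 * down * down),
    ]
    expected_payoff = sum(prob * max(strike - price, 0.0)
                          for prob, price in terminal)
    return expected_payoff / (1.0 + rate) ** 2

put_value = two_period_put(38.0, 35.0, 1.15, 0.68, 0.06)
```

Only the down-down node finishes in the money, so the expected payoff reduces to the single term used in the answer.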

Financial Statement Issues - Material and Pervasive

Adverse Opinion

Which cluster managers can be used with Spark?

Apache Mesos, Hadoop YARN, Spark standalone, and Spark local (a local node, or a single JVM). In local mode the driver and executor run in the same JVM, so the same node is used for execution.

An adverse opinion is issued when the financial statements

Are not presented in accordance with GAAP.

A portfolio manager holds 100,000 shares of IPRD Company (which is trading today for $9 per share) for a client. The client informs the manager that he would like to liquidate the position on the last day of the quarter, which is 2 months from today. To hedge against a possible decline in price during the next two months, the manager enters into a forward contract to sell the IPRD shares in 2 months. The risk-free rate is 2.5%, and no dividends are expected to be received during this time. However, IPRD has a historical dividend yield of 3.5%. The forward price on this contract is closest to: A) $905,175. B) $903,712. C) $901,494.

B) $903,712. The historical dividend yield is irrelevant for calculating the no-arbitrage forward price because no dividends are expected to be paid during the life of the forward contract. FP = S0 × (1 + Rf)^T, so 903,712 = 900,000 × (1.025)^(2/12).
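
A quick numerical check of this answer, using the same discrete-compounding formula FP = S0 × (1 + Rf)^T. The function name is an illustrative assumption.

```python
def forward_price_discrete(spot_value, annual_rate, years):
    """No-arbitrage forward price with annual compounding and no dividends."""
    return spot_value * (1.0 + annual_rate) ** years

position_value = 100_000 * 9.0  # 100,000 shares at $9
fp = forward_price_discrete(position_value, 0.025, 2.0 / 12.0)
```

The historical dividend yield never enters the function, which mirrors the reasoning in the answer.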

An instantaneously riskless hedged portfolio has a delta of: A) anything; gamma determines the instantaneous risk of a hedge portfolio. B) 0. C) 1.

B) 0. A riskless portfolio is delta neutral; the delta is zero.

Long term sources of finance for small/expanding company

Banks-Debt Venture/Seed Capital- Equity Government Grants

Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Adverse Opinion. In our opinion, because of the significance of the matter discussed in the Basis for Adverse Opinion paragraph,

the consolidated financial statements referred to above Do Not Present Fairly the financial position of ABC Company and its subsidiary as of December 31, 2001, or the results of their operations or their cash flows for the year then ended.

Consider a fixed-rate semiannual-pay equity swap where the equity payments are the total return on a $1 million portfolio and the following information: 180-day LIBOR is 4.2%; 360-day LIBOR is 4.5%; Div. yield on the portfolio = 1.2%. What is the fixed rate on the swap? A) 4.5143%. B) 4.3232%. C) 4.4477%.

C) 4.4477%. (1 - 1/1.045) / (1/(1 + 0.042(180/360)) + 1/(1 + 0.045(360/360))) = 0.022239; 0.022239 × 2 = 4.4477%
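
The same arithmetic as a short script: discount factors come from the two LIBOR quotes, and the periodic fixed rate is (1 - last discount factor) divided by the sum of the discount factors. The function name and the dictionary layout are illustrative assumptions.

```python
def swap_fixed_rate(libor_by_days, periods_per_year=2, day_count=360):
    """Annualized fixed rate of a swap from simple-interest LIBOR quotes."""
    discount_factors = [
        1.0 / (1.0 + rate * days / day_count)
        for days, rate in sorted(libor_by_days.items())
    ]
    periodic_rate = (1.0 - discount_factors[-1]) / sum(discount_factors)
    return periodic_rate * periods_per_year

fixed_rate = swap_fixed_rate({180: 0.042, 360: 0.045})
```

The portfolio's dividend yield is not an input, since the fixed rate on an equity swap is set the same way as on a plain interest rate swap.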

Compared to the value of a call option on a stock with no dividends a call option on an identical stock expected to pay a dividend during the term of the option will have a: A) higher value only if it is an American style option. B) lower value only if it is an American style option. C) lower value in all cases.

C) lower value in all cases. An expected dividend during the term of an option will decrease the value of a call option.

For t<T, Camer(T) ? Camer(t)

≥

MetaData

Data about the data. Metadata provide context, meaning, and purpose to data

WT: Weaknesses and threats

Defensive tactics directed at reducing internal weaknesses and avoiding external threats

Do you have experience with R (or Weka, Scikit-learn, SAS, Spark, etc.)? Tell me what you've done with that. Write some example data pipelines in that environment.

Reporting on Complete Set of FS and Single FS/Specific Element

The auditor should: -Issue a separate auditor's report and express a separate opinion for each engagement. May be published together provided they are sufficiently differentiated and report on the complete set of FS is unmodified/unqualified -Indicate in the report on a specific element the date of the auditor's report on the complete set of FS and nature of opinion expressed on the FS

Why overfitting happens?

The possibility of overfitting exists because the criteria used for training the model are not the same as the criteria used to judge the efficacy of the model.

Five C's Costs

What are the major costs? How have they changed in the past year? How do the costs compare to those of others in the industry? How can we reduce costs?

Five C's Company

What do you know about the company? How big is it? Is it public or private? What kinds of products or services?

What questions does data analytics enable us to answer?

What has happened in the past? Why did it happen? What could happen in the future? Can some of the actions resulting from our insights be automated? Can the analytics process be automated?

sector

broad business category a stock falls into

whereas a causal factor is one that affects an event's outcome

but is not a root cause. Essentially, you can find the root cause of a problem and show the relationship of causes by repeatedly asking the question, "Why?", until you find the root of the problem. This technique is commonly called "5 Whys", although it can involve more or fewer than 5 questions.

What information do competitors want?

competitive analysis, benchmarking

geometric growth rate

compounded growth rate or time series growth rate: ((end value / beginning value) ^ (1/# yrs)) - 1
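
A minimal sketch of the compounded growth rate, computed as (end / beginning)^(1/years) - 1. The function name and the sample values are illustrative assumptions.

```python
def geometric_growth_rate(beginning, end, years):
    """Compound annual growth rate between a beginning and an ending value."""
    return (end / beginning) ** (1.0 / years) - 1.0

# Hypothetical example: 100 grows to 121 over two years.
rate = geometric_growth_rate(100.0, 121.0, 2)
```

Growing 100 at the resulting rate for two years gets back to 121, which is the property a simple average of yearly changes does not guarantee.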

Working Capital = ? - ?

current assets - current liabilities

Current Ratio = ? / ?

current assets / current liabilities

In a vertical analysis of the _____, each item is stated as a percent of net sales.

income statement

Sensitivity testing

looks at changing the inputs and how that changes the output provides insights into why the results occur. We change inputs and assumptions, and then recalculate our decision tree.

The ________ requires a report stating management's responsibility for establishing and maintaining internal control. In addition, management's assessment of the effectiveness of internal controls over financial reporting is included in the report.

Sarbanes-Oxley Act of 2002

what are the transactional data of fact tables?

measures or key figures.

sharpe ratio

risk metric equal to: (asset return - risk free) / standard deviation

asset turnover

sales/total assets

What is Linear Regression?

score of a variable Y, predictor variable

Which languages would you choose for semi-structured text data reconciliation?

scripting languages (Python and Perl)

Data Gathering

selecting the data

If you are performing a hypothesis test based on a 90% confidence level, what are your chances of making a type I error?

10%. The probability of a type I error is equal to the significance level, which is 1 - confidence level. (A 90% confidence level indicates that the significance level is 10%. Therefore there is a 10% chance of making a type I error.)

Enterprise value

what it would cost to completely take over a business. EV = mkt cap + debt - cash

Please explain how workers work when a new job is submitted to them.

Ans: When SparkContext is created, each worker starts one executor. This is a separate Java process (a new JVM), and it loads the application jar into this JVM. The executors then connect back to your driver program, and the driver sends them commands such as foreach, filter, and map. As soon as the driver quits, the executors shut down.

csat is a double edged sword because.....

if someone really doesn't like your brand, they'll recommend for others to avoid it

What are the trade-offs between closed-form and iterative implementations of an algorithm, in the context of distributed systems?

SWOT framework

internal: strengths & weaknesses external: opportunities & threats

Buyers

intrinsic power, consolidation, volume, threat of backward integration

Qualified Opinion (issuer/public company/GAAP-material problem.) Qualified Opinion Due to Material Misstatement of Financial Statements: Issuer. Middle Paragraph(s): (2). Disclosure of the principal effects of the subject matter of the qualification on financial position,

results of operations, and cash flows, if practicable. (a). If the effects are not reasonably determinable, the report should so state. (b). If such disclosures are made in a note to the financial statements, the explanatory paragraph(s) may be shortened by referring to it.

contribution margin %

(revenue - variable costs) / revenue

Margin of safety

risk management method: never pay fair value, in case your fair value estimate is wrong; e.g., pay only $6.70 for a $10 stock to ensure a margin of safety

For options on Futures, u is

e^(σ√h)

Sortino Ratio

update to sharpe ratio (Annualized return - risk free rate of return) / Std Dev of negative return series
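
A sketch comparing the Sharpe and Sortino ratios as the cards define them: the same excess return in the numerator, with the standard deviation of all returns versus only the negative returns in the denominator. The annual return series and the 2% risk-free rate are hypothetical, and note that many practitioners define the Sortino denominator as downside deviation against a target return instead.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free):
    """(average return - risk-free rate) / standard deviation of all returns."""
    return (mean(returns) - risk_free) / stdev(returns)

def sortino_ratio(returns, risk_free):
    """Same numerator, but only negative returns enter the denominator."""
    negative = [r for r in returns if r < 0]
    return (mean(returns) - risk_free) / stdev(negative)

# Hypothetical annual returns for illustration only.
annual_returns = [0.10, -0.05, 0.20, -0.10, 0.15]
sharpe = sharpe_ratio(annual_returns, 0.02)
sortino = sortino_ratio(annual_returns, 0.02)
```

Here the Sortino ratio comes out higher than the Sharpe ratio because most of the variability in this series is on the upside, which the Sortino ratio does not penalize.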

Funds from operation FFO

used for REITs instead of earnings. Because a REIT's assets are primarily its business, depreciation can significantly impact the results. Calculated as net income excluding gains or losses on the sale of property, with depreciation added back in.

cash conversion cycle

used to measure how quickly a company converts cash on hand into more cash. 3 parts: Days Sales of Inventory (DSI), Days Sales Outstanding (DSO), Days Payable Outstanding (DPO). CCC = DSI + DSO - DPO
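
A minimal sketch of DSI + DSO - DPO. It assumes the common definitions DSI = inventory / COGS × 365, DSO = receivables / revenue × 365, and DPO = payables / COGS × 365, with period-end balances rather than averages; all figures are hypothetical.

```python
def cash_conversion_cycle(inventory, receivables, payables, cogs, revenue, days=365):
    """Cash conversion cycle in days: DSI + DSO - DPO."""
    dsi = inventory / cogs * days       # days sales of inventory
    dso = receivables / revenue * days  # days sales outstanding
    dpo = payables / cogs * days        # days payable outstanding
    return dsi + dso - dpo

# Hypothetical figures: DSI = 100 days, DSO = 50 days, DPO = 50 days.
ccc = cash_conversion_cycle(
    inventory=200.0, receivables=150.0, payables=100.0,
    cogs=730.0, revenue=1095.0,
)
```

A shorter cycle means cash is tied up in operations for fewer days; stretching payables lowers the figure, while slow-moving inventory raises it.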

Hierarchies

used to organize dimension attributes in a tree-like structure for reporting purposes

Which form of income is the highest risk, with the greatest potential return?

Ordinary Shares, Preference Shares, Bonds, Debentures (Asset Backed)

The _______ measures the profitability of total assets, without considering how the assets are financed.

rate earned on total assets

011101

35

011110

36

Accounts Payable Deferral Period

365/AP Turnover

Receivables Collection Period

365/AR Turnover

0100

4

110100

64

110111

67

world population

7.5 billion

111001

71

111010

72

111011

73

111100

74

population data: Europe

740M

Europe

743.1 million

111101

75

111110

76
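
The numeric cards in this stretch pair a binary string with its octal value (each group of three bits becomes one octal digit, so 011101 reads as 011 101, i.e. 35). A small sketch for checking such cards; the function name is an illustrative assumption.

```python
def binary_to_octal(bits):
    """Convert a string of binary digits to its octal representation."""
    return format(int(bits, 2), "o")

# Cards taken from this set.
cards = {"011101": "35", "110100": "64", "0100": "4", "111110": "76"}
converted = {bits: binary_to_octal(bits) for bits in cards}
```

Leading zeros in the binary string do not change the result, which is why 0100 maps to a single octal digit.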

population data: London

7M

Possible Values: θcall

<0 (usually)

Possible Values: θput

<0 (usually)

Possible Values: Γcall

>0

Mapreduce

process large data sets

Quick (Acid-Test) Ratio

(Cash + Marketable Sec. + Receivables)/CL

LIFO liquidation:

Way for a company to manipulate earnings

What does the business plan include?

product offering customer base budget & finance

What is equity?

(Capital) The residual interest in the assets of an entity after deducting all its liabilities. Equity = assets - liabilities. Also called: shareholders' equity, shareholders' funds, stockholders' equity, capital.

Appropriateness of Financial Statement Presentation or Disclosure (≠ GAAP) Material misstatements related to the appropriateness of financial statement presentation or the appropriateness or adequacy of disclosures may arise when:

(1). The financial statements do not include all required disclosures

ROI

(Gain on Investment - Cost of Investment)/ Cost of Investment Typically shown as a percentage

Valuing Equity with P/B Ratio

(P0/BV Common Equity) * BV Common Equity

Valuing Equity with Price-to-Cash-Flow

(P0/Expected CF in One Year) * Expected CF in One Year

Valuing Equity with Price-to-Sales

(P0/Expected Sales in One Year) * Expected Sales in One Year

Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Adverse Opinion Paragraph. This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Adverse Opinion." This paragraph should include:(2)

(b). If disclosure of the financial effects is made in the notes to the financial statements, the basis for the modification paragraph can be shortened by referring to the disclosure.

Gearing (Leverage)

(debt to equity ratio) Mgmt seek optimum leverage Too high- increased risk of bankruptcy Too low- inefficient use of capital

Contribution Margin %

(rev-vc)/rev

return on investment (ROI)

(revenue - cost - investment) / capital invested

Gross profit margin

(sales revenue-cost of goods sold)/Sales revenue

Misstatements Related to Appropriateness of Accounting Policies

- Accounting policies are not in accordance with applicable financial reporting frameworks - FS are not fairly represented - Entity has not complied with the financial reporting framework requirements

Sharpe Ratio (φ)

(α-r)/σ(stock)=(γ-r)/σ(call)

Interpretive Publications

*Second level of audit guidance: middle authoritative -Recommendations regarding how SASs should be applied in situations. *Not considered auditing standards* -Auditor should consider the guidance in performing the audit and be able to explain any departures and how compliance with standards was achieved. -Examples: Auditing interpretations of GAAS, exhibits to GAAS, auditing guidance provided in AICPA Audit and Accounting Guides, and AICPA Auditing Statements of Position (SOP)

Auditing Sales Transactions

*C*ompleteness - Trace from Shipping Doc -> Invoice -> Sales Journal Cut*o*ff - Compare a sample of sales invoices from shortly before and after year-end with the shipment dates and with the date the sales were recorded in the sales journal *V*aluation, Allocation, and Accuracy - *E*xistence and Occurrence *U*nderstandability and Classification

Five Components of Internal Control

*C*ontrol Environment - tone at the top *R*isk Assessment - FS misstated, not efficient, breaking law *I*nformation and Communication - Fair, Accurate, Complete, Timely -> FACT *M*onitoring - effectiveness of controls and report deficiencies *E*xisting Control Activities - policies/procedures to mitigate risks

SAS nonissuers and PCAOB AS issuers

*First level of audit guidance*: most authoritative -Auditors should use professional judgment in applying SAS or PCAOB standards to a particular engagement and be prepared to justify any departures from mandatory requirements -"Must" or "is required": unconditional requirement that must be followed -"Should": indicates a presumptively mandatory requirement, followed in cases when relevant -"May", "might" or "could": indicates explanatory material that does not impose a requirement

Definition of Pervasive

*Very Material* -Auditor's professional judgment: effects are not confined to specific elements, accounts, or items. If confined, they represent a substantial proportion of the FS, or are disclosures fundamental to users' understanding

Pricing strategies - supply & demand

*graph answer if possible... -What's the supply? How is demand? -How will pricing impact market equilibrium? -Matching competition: What are similar products selling for? -Are there substitutions?

In Spark-Shell, which contexts are available by default?

Ans: SparkContext and SQLContext

Increasing sales - choose strategy

- Increase volume. (Get more buyers, increase distribution channels, intensify marketing.) - Increase amount of each sale. (Get each buyer to spend more.) - Increase prices. - Create seasonal balance

Misstatements Related to the Application of Accounting Policies

- Management has not applied accounting policies in accordance with the applicable framework - Management has not applied policies consistently - There is an error in the application of an accounting policy

A Material Misstatement of the financial Statements may arise in relation to the following:

- The appropriateness of accounting policies - The application of accounting policies - The appropriateness of the financial statement presentation or the appropriateness or adequacy of disclosures in the financial statements

Mergers and Acquisitions - researching company & industry

- What kind of shape is the company in? -How secure are its markets and customers? -How is the industry doing overall? -And how is this company doing compared to the industry? - How will our competitors respond to this acquisition? -Are there any legal reasons why we can't, or shouldn't, acquire it?

The grand strategy matrix: Quadrant I

-Continued concentration on current markets (market penetration and development) and products is an appropriate strategy

Pricing strategies - choosing

-Cost-based pricing vs. price-based costing (i.e., do you decide pricing based on how much the product costs to produce or on how much people will pay?) - How much does it cost to make or deliver/provide? -What does the market expect to pay? - Is it a "must have" product? - Do we need to spend money to educate the consumer?

Investing activities reflected in statement of cash flows

-Dividends received -Interest received -Loans to/from associate /subsidiary companies -Proceeds from sales of assets and investments -Purchases of assets & investments

Analytics Methodology within a Framework

-Enablers are the essential components needed for the methodology to work. They include technology, infrastructure, tools, and techniques. -The benefits of analytics are vast and varied. Examples are value/profit, performance, safety, health and longevity of the system, and many others. -People are generally both the creators and the beneficiaries of analytics activities. User authorizations, internal controls, and training are required for the framework to work.

Increasing profit - costs

-Identify the major variable and fixed costs. -Have there been any major shifts in costs? (e.g., labor or raw material costs) -Do any of these costs seem out of line? -How can we reduce costs without damaging the revenue streams? -Benchmark costs against our competitors.

Why Spark, even though Hadoop exists? (2)

-In-Memory Processing: MapReduce stores intermediate data on disk and reads it back from disk, which is slow. Spark keeps data in memory (configurable), which saves a lot of time by avoiding the disk reads and writes that Hadoop MapReduce performs.

Mergers and Acquisitions - Goals & Objectives

-Increase market access - Diversify their holdings - Pre-empt the competition - Gain tax advantages -Incorporate synergies: marketing, financial, operations

The internal-external (IE) matrix

-Put the EFE on the Y axis and the IFE on the X axis -Quadrants are grouped -Three major regions: 1. Grow and build 2. Hold and maintain 3. Harvest or divest

Financial Accountants' responsibilities can include:

-Recording sales, expenses, bank transactions -Analysing the records -Providing a control function -Preparing information for external audit -Controlling cash - Liquidity -Responsibility for payroll

SWOT matrix

-SO: Strengths and opportunities strategies -WO: Weaknesses and opportunities strategies -ST: Strengths and threats -WT: Weaknesses and threats

Strategy-formulation analytical framework

-Stage 1: Input stage -Stage 2: The matching stage: Matches info to compatible strategies -Stage 3: The decision stage: Choose the best strategy

Turning around troubled co - industry

-Tell me about the company. -Why is it failing? Bad products, bad management, bad economy? - Tell me about the industry. - Are our competitors facing the same problems? - Do we have access to capital? - Is it a public or privately-held company?

what are the conditions in which sample is appropriate?

-The analysts are certain that each data point is representative of the entire set -The source dataset is too large for the planned analysis -The application specifically calls for a data sample, as is the case with some accounting and regulatory compliance audits

The grand strategy matrix: Quadrant II

-Unable to compete effectively -Need to determine why the firm's current approach is ineffective and how the company can best change to improve its competitiveness

Increasing profit - revenue/price

-What are the revenue streams? (Where does the money come from?) -What percentage of the total revenue does each stream represent? -Does anything seem unusual in the balance of percentages? -Have those percentages changed lately? If so, why?

Pricing strategies - investigate product

-What's special or proprietary about our product? -Are there similar products out there, and how are they priced? - Where are we in the growth cycle of this industry?(Growth phase? Transition phase? Maturity phase?) - How big is the market? - What were our R&D costs?

Industry/Acquiring a diverse company - Investigate Industry

-Where is it in its lifecycle? (Emerging? Maturity? Decline?) - How has the industry been performing (growing or declining) over the last 1, 2, 5, and 10 years? -How have we been doing compared to the industry? -Who are the major players and what kind of market share does each have? -Who has the rest? -Has the industry seen any major changes lately, such as new players, new technology, or increased regulation? -What drives the industry? Brand, products, size, or technology?

Customer (cheng)

-Who is the customer -identify segments (segment size, growth rate, percentage of market) -compare current year metrics to historical (look for trends) - What does each customer segment want (identify key needs) - Price sensitivity of each segment - distribution channel preference of each segment - customer concentration and power

if product in decline

-define niche market -analyze compt -think exit strategy

promotion

-do we have a desirable brand image -what are the metrics for awareness, trial, and retention -marketing campaign to increase awareness -discounts initially to capture the market and promote trial -promotions to encourage trial and referrals -warranties and refunds to encourage trial and lower risk -have a trained salesforce to talk about the benefits of the product

potential reasons for M&A

-increase market access -diversify holdings -pre-empt competition -gain tax advantages -incorporate synergies -create shareholder value

growth strategies

-increase sales -increase distribution channels -increase product line -invest in major marketing campaign -diversify products and services -acquire competitors

if sales are flat and profits are taking a header

-investigate both revenue and costs -start with revenue

if profits declining because of drop in rev

-investigate marketing and distribution issues

if profits declining bc of rising expenses

-investigate price drop and/or cost increase

external factors

-market -customers -industry -competitors -risk

Assessing Credit Quality *Lower Risk:*

-more revenue sources -more operational margins -stable/sustainable margins -higher FCF/Debt, FCF/Int.

ways to grow (other than volume)

-new customers -new products -new geographies -new channels -backward/forward integration -M&A -new tech -new capabilities

decline in sales problem...

-overall declining market demand -the current marketplace is mature or product is obsolete -loss of market share due to substitution

Financing activities reflected in statement of cash flows

-proceeds from issue of ordinary shares, or from borrowings -Repayments of borrowings -Dividends paid to equity shareholders or minority interests

Applications of Analytics

-retail- used to assist in pricing strategies -marketing- used in predicting customers behavior -supply chain- selecting suppliers and optimizing distribution costs -customer service- Customized service is based on analysis of prior work orders -financial investors- looking at companies that are great investments or risks

1/15

.0667

1/3

.333

10 x 100

000,1,000,000

Space

00100000

Period

00101110

000001

01

a

01100001
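
The character cards above (Space, Period, a) appear to list 8-bit ASCII codes. A minimal Python sketch for checking them, assuming standard ASCII; the helper name `to_ascii_bits` is hypothetical and not from the original cards:

```python
def to_ascii_bits(ch: str) -> str:
    """Return the 8-bit binary string for a single ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    # "08b" pads the binary representation with leading zeros to 8 digits.
    return format(code, "08b")


for ch in (" ", ".", "a"):
    print(repr(ch), to_ascii_bits(ch))
```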

M

1,000,000 = 10^6

B

1,000,000,000 = 10^9

3 Main Questions of Financial Statements

1. How much is the company worth? 2. How much profit did the company make? 3. What cash movement took place?

Starting a New Business Steps

1. Initial Questions 2. Management 3. Market and Strategic Plans 4. Distribution Channels 5. Products and Services 6. Customers 7. Finance

what are the ERP business Areas?

1. Inventory control for reservation of the goods for the customer 2. Procurement (purchasing) for replacement of the sold goods 3. Warehouse management for picking and packing the goods for delivery 4. Shipping for the goods to be shipped 5. Billing so that the sale can be invoiced 6. Human Resources for calculation of commissions 7. Accounting to be recorded in the general ledger 8. Budgeting for comparison of actual results to forecasted sales

Pricing Case

1. Investigate the product 2. Choosing a Pricing Strategy 3. Supply and Demand (graph) -3 main ways to price a product -Competitive analysis: are there similar products out there, how does ours compare to the competition, do we know their costs, how are they priced, any substitutions - Cost-based pricing: take all our costs, add them up, and add profit to it -> break-even point - Price-based costing: what is the willingness to pay,

What are the key concepts introduced by the Spark ML APIs?

1. ML Dataset 2. Transformer 3. Estimator 4. Pipeline 5. Param

Sales - Segregation of the Functions

1. Preparation of the Sales Order - Sales department receives a customer PO and prepares a prenumbered sales order 2. Credit Approval - Credit department approves the sales order and sends a copy of the approved sales order to the shipping, billing, and accounting departments 3. Shipment - Shipping department prepares a prenumbered bill of lading and sends a copy to the customer. They then ship the goods 4. Billing - Billing department prepares a prenumbered sales invoice. They then compare shipping documents, sales orders, and invoices. Invoices are then sent to the customer and to the AR department. 5. Accounting - Sale is entered into the sales journal and a receivable is recorded

Entering a New Market Steps

1. Questions about the Company 2.Determine Current and Future State of the Market 3. Investigate the Market to determine if it makes good business sense 4. If we decide to enter, what way should we choose to do so?

AR - Segregation of the Functions

1. Sales - A receivable is recorded in the AR control account and in the GL. An independent person should periodically reconcile the two. 2. Collection of Cash Receipts - When payment is received, the receivable is eliminated. 3. Uncollectible Receivables - An aging schedule is prepared and sent to the credit department. * in order to write-off a receivable, the treasurer must authorize 4. Sales Returns - Prenumbered receiving report may be used as a sales return slip. Once approved, the return is recorded and the related outstanding receivable is eliminated - Credit memos should *NOT* be prepared by those who collect/receive cash payments on AR 5. Sales Discounts - Sales discount procedures should be reviewed to make sure that discounts are recorded properly

what are the three types of data anomalies

1. Update anomalies occur when the same data are stored in multiple places. E.g., a customer's address is stored in both the customer billing and shipping tables 2. Insert anomalies result when there is no place within the table to store new data until another event occurs. E.g., creating customer info when the only place to store the customer name is the sales transaction table 3. Delete anomalies occur when deleting some data results in the unintentional deletion of other data. E.g., deleting a customer's sales transaction would also affect our sales history.

how to calculate brand equity

1. ask the brand equity question about paying a premium for a branded product 2. subtract all the tangible assets of a firm from the market valuation of the firm based on stock price

reducing costs: cash flow problem

1. breakdown of costs 2. is anything out of line? why? 3. benchmark the competitors 4. determine whether there are any labor-saving technologies that would help reduce costs

placement factors

1. channels: intensive v. selective 2. inventory -push v. pull -stock v. just in time -carrying cost 3. transportation -cost: inhouse v. outsource

five Cs

1. company 2. costs 3. competition 4. consumers/clients 5. channels

price factors

1. product: commodity vs. highly differentiated 2. competitor pricing 3. strategy -goal: penetrate, retain, convert, loss leader, etc. -positioning & perception -cost plus margin 4. customer -segmentation: differences in willingness to pay -elasticity

increasing sales

1. relationship between increasing sales and increasing profits -how are we growing relative to industry? -what has our market share done lately? -do we know what customers want? -are prices in line with our competitors? -what have comps done in marketing & product development? 2. ways to increase sales -increase volume: more buyers, increase distribution channels, intensify marketing -increase amount of each sale (get each buyer to spend more) -increase prices -create seasonal balance

steps

1. summarize question 2. verify objectives 3. ask clarifying questions -- company, industry, competition, product 4. lay out structure

industry analysis

1. things to investigate -life cycle -performance over last 1, 2, 5, 10 years -our performance compared to industry -major players/market share -recent major changes in industry -what drives industry? brand, products, size, technology -profitability -- margins 2. suppliers -how many? -product availability -what's going on in their market? 3. future -players entering or leaving market -recent M&A? -barriers to entry and exit? -substitutes?

What are the new Spark DataFrame and the Spark Pipeline?

A Spark DataFrame is a table where columns are explicitly associated with names.

Bullish Spread

A bullish spread increases in value as the stock price increases, whereas a bearish spread increases in value as the stock price decreases.
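
One way to see this is through the payoff at expiration. The sketch below assumes the spreads are built from two calls on the same stock (long the lower strike and short the higher strike for the bull spread, the reverse for the bear spread) and ignores premiums; the function names are illustrative only:

```python
def call_payoff(spot: float, strike: float) -> float:
    """Payoff of a long call at expiration."""
    return max(spot - strike, 0.0)


def bull_call_spread_payoff(spot: float, k_low: float, k_high: float) -> float:
    """Long call at k_low, short call at k_high (k_low < k_high)."""
    return call_payoff(spot, k_low) - call_payoff(spot, k_high)


def bear_call_spread_payoff(spot: float, k_low: float, k_high: float) -> float:
    """Opposite position: short call at k_low, long call at k_high."""
    return -bull_call_spread_payoff(spot, k_low, k_high)


# The bull spread payoff rises with the stock price, the bear spread payoff falls.
for spot in (30.0, 45.0, 60.0):
    print(spot,
          bull_call_spread_payoff(spot, 40.0, 50.0),
          bear_call_spread_payoff(spot, 40.0, 50.0))
```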

How would you implement a recommendation system for our company's users?

A lot of machine learning interview questions of this type will involve implementation of machine learning models to a company's problems. You'll have to research the company and its industry in-depth, especially the revenue drivers the company has, and the types of users the company takes on in the context of the industry it's in.

Audits of Single FS and Specific Elements, Accounts, or Items of FS

Audit of single FS or specific elements, accounts, or items of FS may be performed as a separate engagement or in conjunction with an audit of an entity's complete set of FS *Single FS* -BS, Statement of income, RE, CF, Changes in Owner's Equity, Statement of operations by product line. *Specific Elements, Accounts, or Items* -AR -Allowance for doubtful accounts receivable -Inventory -Intangible assets -Schedule of disbursements regarding lease property -Schedule of profit participation or employee bonuses

What are specific ways of determining if you have a local optimum problem?

Determining if you have a local optimum problem: Tendency of premature convergence Different initialization induces different optima

Param

All Transformers and Estimators now share a common API for specifying parameters

continuous uniform distribution

All simulation software packages use the continuous uniform distribution, generating random numbers uniformly distributed between 0 and 1

What is Shuffling?

Ans: Shuffling is a process of repartitioning (redistributing) data across partitions and may cause moving it across JVMs or even network when it is redistributed among executors. Avoid shuffling at all cost. Think about ways to leverage existing partitions. Leverage partial aggregation to reduce data transfer

What is the difference between groupByKey and use reduceByKey ?

Ans : Avoid groupByKey and use reduceByKey or combineByKey instead. groupByKey shuffles all the data, which is slow. reduceByKey shuffles only the results of sub-aggregations in each partition of the data.
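
The difference in shuffle volume can be illustrated without a cluster. The following is a plain-Python simulation, not Spark code: each inner list stands in for one partition, and the function names are hypothetical. It only counts how many records would cross the shuffle in each style.

```python
from collections import defaultdict


def group_by_key_then_sum(partitions):
    """groupByKey style: every (key, value) record crosses the shuffle."""
    shuffled = [record for part in partitions for record in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def reduce_by_key_sum(partitions):
    """reduceByKey style: sum within each partition, shuffle only partial sums."""
    shuffled = []
    for part in partitions:
        partial = defaultdict(int)
        for key, value in part:
            partial[key] += value
        shuffled.extend(partial.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("a", 1), ("b", 1)],
]
print(group_by_key_then_sum(partitions))
print(reduce_by_key_sum(partitions))
```

Both functions return the same totals; only the number of shuffled records differs, and the gap grows with the number of repeated keys per partition.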

What is the transformation?

Ans: A transformation is a lazy operation on an RDD that returns another RDD, like map, flatMap, filter, reduceByKey, join, cogroup, etc. Transformations are lazy: they are not executed immediately, but only when an action is invoked.

total return

stock price appreciation plus dividend payments

Which kinds of data processing are supported by Spark?

Ans: Spark offers three kinds of data processing using batch, interactive (Spark Shell), and stream processing with the unified API and data structures.

Pricing Strategies Competitive Analysis

Are there similar products out there? How does our product compare to the competition? Do we know the competitors' costs? How are they priced? Are there substitutions available? Is there a supply-and-demand issue? What will the competitive response be?

Can you cite some examples where a false negative is more important than a false positive?

Assume there is an airport 'A' which has received high security threats and based on certain characteristics they identify whether a particular passenger can be a threat or not. Due to shortage of staff they decided to scan passenger being predicted as risk positives by their predictive model. What will happen if a true threat customer is being flagged as non-threat by airport model? Another example can be judicial system. What if Jury or judge decide to make a criminal go free? What if you rejected to marry a very good person based on your predictive model and you happen to meet him/her after few years and realize that you had a false negative?

An investor who anticipates the need to exit a pay-fixed interest rate swap prior to expiration might: A) buy a payer swaption. B) buy a receiver swaption. C) sell a payer swaption.

B) buy a receiver swaption. A receiver swaption will, if exercised, provide a fixed payment to offset the investor's fixed obligation, and allow him to pay floating rates if they decrease.

In order to compute the implied asset price volatility for a particular option an investor: A) must have a series of asset prices. B) must have the market price of the option. C) does not need to know the risk-free rate.

B) must have the market price of the option. In order to compute the implied volatility we need the risk-free rate, the current asset price, the time to expiration, the exercise price, and the market price of the option.
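
The inputs listed in the answer can be turned into a small solver. This is a sketch under stated assumptions: a European call on a non-dividend-paying asset priced with Black-Scholes (the card does not name a model), solved by bisection. All function and parameter names are illustrative.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot, strike, rate, sigma, t):
    """Black-Scholes price of a European call with no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def implied_vol(market_price, spot, strike, rate, t, lo=1e-6, hi=5.0, tol=1e-8):
    """Find sigma such that the model price matches the market price.

    Bisection works because the call price is increasing in sigma.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, rate, mid, t) > market_price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip: price an option at 20% volatility, then recover the 20%.
price = bs_call_price(100.0, 100.0, 0.05, 0.20, 1.0)
print(price, implied_vol(price, 100.0, 100.0, 0.05, 1.0))
```

Note that no series of historical asset prices appears anywhere in the calculation, which is why answer A is wrong.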

What's the trade-off between bias and variance?

Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you're using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set. Variance is error due to too much complexity in the learning algorithm you're using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You'll be carrying too much noise from your training data for your model to be very useful for your test data. The bias-variance decomposition essentially decomposes the learning error from any algorithm by adding the bias, the variance and a bit of irreducible error due to noise in the underlying dataset. Essentially, if you make the model more complex and add more variables, you'll lose bias but gain some variance — in order to get the optimally reduced amount of error, you'll have to tradeoff bias and variance. You don't want either high bias or high variance in your model.

Call Profit

C(S(h),K,T-h)-C(S(0),K,T)e^(rh)

How is market backwardation related to an asset's convenience yield? If the convenience yield is: A) positive causing the futures price to be below the spot price and the market is in backwardation. B) negative, causing the futures price to be below the spot price and the market is in backwardation. C) larger than the borrowing rate, causing the futures price to be below the spot price and the market is in backwardation.

C) larger than the borrowing rate, causing the futures price to be below the spot price and the market is in backwardation. When the convenience yield is more than the borrowing rate, the no-arbitrage cost-of-carry model will not apply. It means that the convenience of holding the asset is worth more than the cost of funds to purchase it. This usually applies to non-financial futures contracts.

Accounts Payable Turnover

COGS/Average AP

What attributes does useful information have? (6)

Clarity Consistency Relevance Accuracy Reliability Timeliness

If Profits are declining because of a drop in revenues...

Concentrate on Marketing and Distribution Issues

Porter's generic strategies

Cost, Differentiation, Focus

What is covariance?

Covariance is a measure of how much two random variables change together.
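
A minimal sketch of the calculation, assuming the sample (n - 1) version of covariance; the function name is illustrative:

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Average product of deviations from the means, with n - 1 in the denominator.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Positive when the variables move together, negative when they move oppositely.
print(sample_covariance([1, 2, 3, 4], [2, 4, 6, 8]))
print(sample_covariance([1, 2, 3, 4], [8, 6, 4, 2]))
```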

The Value Chain Service

Customer support and retention

Retention Rate

Customers kept at the end of a period / Total customers available at the beginning of the period OR 1 - Churn Rate

K-fold cross validation

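The data is split into k folds. The model is trained k times, each time holding out a different fold for validation and training on the remaining k-1 folds. The validation errors are averaged, and the model or parameter combination that minimizes the average error is selected.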
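
The splitting step can be sketched as follows. This is a plain-Python illustration with contiguous, unshuffled folds; the generator name `k_fold_indices` is hypothetical, and libraries such as scikit-learn provide their own implementations.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of the k folds.

    Every sample lands in the validation set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(10, 3):
    print("train:", train, "validation:", validation)
```

In practice each (train, validation) pair would be used to fit and score the model, and the k scores averaged.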

What do you understand by the term Normal Distribution?

Data is usually distributed in different ways with a bias to the left or to the right or it can all be jumbled up. However, there are chances that data is distributed around a central value without any bias to the left or right and reaches normal distribution in the form of a bell-shaped curve. The random variables are distributed in the form of a symmetrical bell-shaped curve.

Data Science

Data science involves the use of computers to acquire knowledge by analyzing large amounts of data using models and domain expertise.

data Staging area

Data source-->Data staging--->Data Target ETL ETL

Databases

Databases are organized collections of data that enable users to access, manage, and update the data.

What is an example of a dataset with a non-Gaussian distribution?

Days to receive payment from the time invoice is sent

What is the Default level of parallelism in Spark?

Default level of parallelism is the number of partitions when not specified explicitly by a user.

Cause of Non-Recombining Trees

Discrete Dividends (e.g., (Se^(rh)-D)e^(σ√h))

Dividends per Share = ? / ?

Dividends / Shares of Common Stock Outstanding

_______ can be reported with earnings per share to indicate the relationship between dividends and earnings.

Dividends per share

Advantages of Debt Financing

Do not need to give up share of company. Interest payments made before tax

What is a current liability?

Due by the entity within one year, e.g. trade payables, accruals, overdraft

Gross Profit Margin

Gross profit / Revenue

Reducing Cost Cost Analysis - external elements

Economy, interest rates government regulation transportation/ shipping strikes

Sustain competitive advantage through

Efficiency, quality, innovation, customer responsiveness

What is an Eigenvalue and Eigenvector?

Eigenvectors are used for understanding linear transformations. In data analysis, we usually calculate the eigenvectors for a correlation or covariance matrix. Eigenvectors are the directions along which a particular linear transformation acts by flipping, compressing or stretching. Eigenvalue can be referred to as the strength of the transformation in the direction of eigenvector or the factor by which the compression occurs.

Who and what are the enablers and benefits respectively?

Enablers- technology, infrastructure, tools, techniques Benefits-value/profit, performance, safety, health / longevity

Sample Report: Qualified Opinion Due to a Material Misstatement of the Financial Statements Issuer (Public Company). In our opinion

Except for the effects of not capitalizing certain lease obligations as discussed in the preceding paragraph, the financial statements referred to above present fairly, in all material respects, the financial position of X Company as of December 31, 2002 and 2001, and the results of its operations and its cash flows for the years then ended in conformity with accounting principles generally accepted in the United States of America.

Topics in a Business Plan

Executive summary Market analysis and marketing strategy Financial plan (revenue streams and cost structure) Strategic Rationale (unique value proposition) Environmental Analysis and SWOT Analysis

Increasing Profits Volume Elements

Expand into new areas Increase sales(Volume and Force) Increase Marketing Reduce Prices Improve customer service

Industry Analysis Future

Expanding or Shrinking? Mergers and Acquisition? Barriers to entry or Exit?

New Product Market Strategy Elements

Expansion of Customer Base Prompts to competitive response Barriers to entry Major Players and Market Share

What is a key component of planning?

Extensive market research

What is Debt Factoring?

External company takes over managing debtors Advances payment Non-Recourse -The factoring company accepts risk of default.

Regulation (F/M)

F- Companies act, accounting standards, stock exchange M-none

Reporting Focus (F/M)

F- Historical M-past, present & future

Primary user (F/M)

F- external/public M-Internal/private

Type of Information (F/M)

F-Aggregate/summarised M-Detailed/specific

Financial Reports (F/M)

F-structured, balance sheet, income statement, cash flow statement M- ad hoc, combines financial and non-financial indicators

BE volume

FC/(Rev-VC)
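
A minimal sketch of the formula, assuming Rev and VC are the revenue (price) and variable cost per unit and FC is total fixed cost; the function name and the figures are illustrative only:

```python
def break_even_volume(fixed_costs: float, price_per_unit: float,
                      variable_cost_per_unit: float) -> float:
    """Units that must be sold for total contribution to cover fixed costs."""
    contribution_margin = price_per_unit - variable_cost_per_unit
    if contribution_margin <= 0:
        raise ValueError("price must exceed variable cost per unit")
    return fixed_costs / contribution_margin


# Hypothetical figures: 10,000 fixed costs, price 25, variable cost 15 per unit.
print(break_even_volume(10_000, 25, 15))
```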

Materiality of Problem: Material and pervasive

Financial Statements Are Materially Misstated (Financial Statement Issues):Adverse opinion. Inability to Obtain Sufficient Appropriate Audit Evidence (Audit Issues): Disclaimer of opinion.

what are the forms of normalization

First normal form: 1NF Second normal form: 2NF Third normal form: 3NF Boyce-Codd normal form: BCNF

Costs

Fixed Costs + Variable Costs

CAGE Framework

For analyzing global markets: Culture Administrative and political Geographic Economic / Wealth Distance from us

Central Limit Theorem (in Math)

For any population distribution with mean μ and variance σ²: •The distribution of the sample mean (x bar) is approximately normal with mean μ and variance σ²/n •This approximation improves as n increases
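
The statement can be checked with a short simulation. The sketch below assumes a Uniform(0, 1) population (mean 1/2, variance 1/12), a sample size of 30, and a fixed random seed; these choices and the function name are illustrative, not part of the card.

```python
import random
import statistics


def sample_means(n: int, trials: int, rng: random.Random):
    """Means of `trials` independent samples of size n from Uniform(0, 1)."""
    return [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(trials)]


rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(n=30, trials=5000, rng=rng)

# Theory: mean of sample means ~ 1/2, variance ~ (1/12) / 30 = 1/360.
print(statistics.fmean(means), statistics.variance(means), 1 / 360)
```

Plotting a histogram of `means` would show the bell shape even though the underlying population is flat.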

Practicable means that the information is reasonably obtainable from management's accounts and records and that providing the information in the auditor's report does not require the auditor to assume the position of preparer of financial information.

For example, the auditor is not expected to prepare a basic financial statement, such as an omitted statement of cash flows, or segment information and include it in the auditor's report when management omits such information.

Binomial Variables

Formally, a binomial distribution has parameters n and p, where there are n independent trials, and each trial has probability p of success • EV=Mean=np • Variance = np(1-p)
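
The two formulas can be verified directly from the probability mass function. A minimal sketch; the function names are illustrative:

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def mean_and_variance(n: int, p: float):
    """Compute E[X] and Var(X) by summing over the full distribution."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, variance


n, p = 10, 0.3
print(mean_and_variance(n, p))      # computed from the pmf
print(n * p, n * p * (1 - p))       # closed forms np and np(1-p)
```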

GAAS

Generally Accepted Auditing Standards

What is Genetic Programming?

Genetic programming is one of the two techniques used in machine learning. The model is based on the testing and selecting the best choice among a set of results.

Reducing Cost Assessment Elements

Get cost breakdown Investigate for irregularities Benchmark competitors Consider Labor saving technologies

The ratio of ______ to _____ is a profitability measure that shows how effectively a company utilizes its assets.

Net sales to assets

What are PCA, KPCA and ICA used for?

PCA (Principal Components Analysis), KPCA (Kernel-based Principal Component Analysis) and ICA (Independent Component Analysis) are important feature extraction techniques used for dimensionality reduction.

Four P's Price

How does our price compare to the competitions? How was our price determined? Are we priced right? If we changed our price what will that do to our sales volume?

Starting a New Business Steps 2. Management

How experienced is the team? What are its core competencies? Have they worked together before? Is there an advisory board?

Explain what significance means

If a statistical test returns significant, then the effect is unlikely to be from random chance alone

What is dimension reduction in Machine Learning?

In Machine Learning and statistics, dimension reduction is the process of reducing the number of random variables under considerations and can be divided into feature selection and feature extraction

What is unsupervised learning?

In unsupervised learning, the computer searches for patterns in the data without any examples.

The Value Chain Raw Materials

Inbound logistics included here the receiving raw materials into the warehouse Relationships with suppliers Just In Time Delivery

Early Exercise on a Non-Dividend Paying Stock

Never exercise C_Amer early! (i.e., C_Amer = C_Eur)

Growth Strategies Strategy elements

Increase Distribution Channels Increase product line Invest in major marketing campaign Diversify products or services offered

What is logistic regression? Or State an example when you have used logistic regression recently.

Logistic Regression often referred as logit model is a technique to predict the binary outcome from a linear combination of predictor variables. For example, if you want to predict whether a particular political leader will win the election or not. In this case, the outcome of prediction is binary i.e. 0 or 1 (Win/Lose). The predictor variables here would be the amount of money spent for election campaigning of a particular candidate, the amount of time spent in campaigning, etc.
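
The "linear combination of predictor variables" is passed through the logistic (sigmoid) function to give a probability between 0 and 1. A minimal sketch of the prediction step only, with made-up weights for the election example (a fitted model would estimate them from data); all names and numbers are hypothetical:

```python
import math


def predict_win_probability(features, weights, bias):
    """Logistic model: sigmoid of a linear combination of the predictors."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical predictors: campaign spend (millions) and campaign time (months).
weights = [0.8, 0.3]   # assumed coefficients, not estimated from real data
bias = -4.0

probability = predict_win_probability([3.0, 6.0], weights, bias)
print(probability, "Win" if probability >= 0.5 else "Lose")
```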

Type 4:

Low cost focus strategy that offers products or services to a niche group of customers at the lowest price available on the market

Question marks I

Low market share, high growth rate -Must decide to strengthen by pursuing an intensive strategy or to sell them

Type 1

Low-cost strategy that offers products or services to a wide range of customers at the lowest price available on the market

Strategy/Performance *Niche market*

Lower volumes, higher margins -higher marketing/R&D costs

What is algorithm independent machine learning?

Machine learning whose mathematical foundations are independent of any particular classifier or learning algorithm is referred to as algorithm-independent machine learning.

Where do you usually source datasets?

Machine learning interview questions like these try to get at the heart of your machine learning interest. Somebody who is truly passionate about machine learning will have gone off and done side projects on their own, and have a good idea of what great datasets are out there. If you're missing any, check out Quandl for economic and financial data, and Kaggle's Datasets collection for another great list.

What is Machine learning?

Machine learning is a branch of computer science that deals with programming systems so that they automatically learn and improve with experience. For example, robots are programmed so that they can perform a task based on data they gather from sensors. The system automatically learns programs from data.

_______________ is required in annual reports filed with the SEC. It contains management's analysis of current operations and its plans for the future. Typical items included are: Management's analysis and explanations of any significant changes between the current and prior year's financial statements. Important accounting principles or policies that could affect interpretation of the financial statements. Management's assessment of the company's liquidity and the availability of capital to the company. Significant risk exposures that might affect the company. Any "off-balance-sheet" arrangements such as leases not included directly in the financial statements.

Management's Discussion and Analysis (MD&A)

Types of Assets

Non-Current/Fixed Assets Current Assets Intangible Assets

What are the basic assumptions to be made for linear regression?

Normality of error distribution, statistical independence of errors, linearity and additivity.

Sufficient Appropriate Audit Evidence and Risk

Obtain reasonable assurance, auditor obtain sufficient appropriate audit evidence to reduce audit risk to acceptably low level. *Weak internal control DOES NOT equal adverse opinion*

In-the-Money

Option would have positive payout if it could be exercised.

For K₁>K₂

P(S,K₁,T)-P(S,K₂,T) ≤ K₁-K₂

Exchange Option Equality

P(S,Q,T)=C(Q,S,T)

Price-to-Cash-Flow Ratio

P0/Expected CF in One Year

Price-to-Sales Ratio

P0/Expected Sales in One Year

Barriers to Entry may include (8):

Patents Key partnerships Key customer relationships Expert management team Superior product, functionality Time to Market Cost advantages Switching Costs

What is statistical power?

Probability that the test correctly rejects the null hypothesis when the alternate hypothesis is true

How is a decision tree pruned?

Pruning is what happens in decision trees when branches that have weak predictive power are removed in order to reduce the complexity of the model and increase the predictive accuracy of a decision tree model. Pruning can happen bottom-up and top-down, with approaches such as reduced error pruning and cost complexity pruning. Reduced error pruning is perhaps the simplest version: starting at the leaves, replace each node with its most popular class. If that doesn't decrease predictive accuracy, keep it pruned. While simple, this heuristic actually comes pretty close to an approach that would optimize for maximum accuracy.

Liabilities

Put on balance sheet in increasing order of maturity 1. Current vs long term liabs 2. Accounts payable 3. Notes payable 4. Mortgage payable 5. Lease liabilities 6. Pension liabilities 7. Liabs related to other postretirement benefits 8. Accrued liabilities 9. Unearned revenues 10. Income tax liabilities 11. Short-term debt 12. Current maturities of long-term debt

relative strength index

RSI = 100 - 100 / (1 + average close price on up days / average close price on down days). Compares recent gains to recent losses to indicate whether a security is overbought or oversold.
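
A minimal sketch of the RSI formula above. It assumes the two averages are taken over the size of the price change on up days and on down days (a common reading of the formula); the function name `rsi` and the sample prices are illustrative only.

```python
def rsi(closes):
    """Relative strength index over a whole series of closing prices."""
    # Day-over-day changes in the closing price.
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [c for c in changes if c > 0]
    losses = [-c for c in changes if c < 0]
    if not losses:
        return 100.0  # no down days: the ratio is unbounded, RSI saturates
    if not gains:
        return 0.0    # no up days
    avg_gain = sum(gains) / len(gains)      # average move on up days
    avg_loss = sum(losses) / len(losses)    # average move on down days
    return 100 - 100 / (1 + avg_gain / avg_loss)


sample_closes = [10, 11, 10, 12, 11]
sample_rsi = rsi(sample_closes)
```

Readings near 100 suggest recent gains dominate (possibly overbought); readings near 0 suggest recent losses dominate (possibly oversold).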

Measures of Variability

Range Interquartile Range Average Absolute Deviation Variance Standard Deviation

The Value Chain

Raw Materials Operations Delivery Marketing and Sales Service

Retrenchment

Regrouping through cost and asset reduction to reverse declining sales and profit -Also called a turnaround or re-organizational strategy -Designed to fortify a firm's basic distinctive competence

RDD

Resilient: fault-tolerant, and so able to recompute missing or damaged partitions on node failures with the help of the RDD lineage graph. Distributed: across clusters. Dataset: a collection of partitioned data.

Yarn Components

ResourceManager: runs as a master daemon and manages ApplicationMasters and NodeManagers. ApplicationMaster: a lightweight process that coordinates the execution of tasks of an application and asks the ResourceManager for resource containers for tasks. It monitors tasks, restarts failed ones, etc. It can run any type of task, be they MapReduce tasks, Giraph tasks, or Spark tasks. NodeManager: offers resources (memory and CPU) as resource containers. Container: can run tasks, including ApplicationMasters.

What are internal sources of finance?

Retained earnings -profits made in previous years Benefits gained from more effective mgmt of its working capital (these sources are unleveraged, not optimum in the long run)

Contribution Margin

Rev-VC

Nonissuer Report (qualified opinion)

Same as unmodified except add a basis for qualified opinion paragraph that describes the issues and explains how disclosures are misstated. In opinion paragraph, the language includes "except for" and "presented fairly".

Issuer Report (qualified opinion)

Same as unqualified except middle paragraph(s) are added in that explain all of the substantive reasons that lead to the auditor's conclusion and disclosure of the principal effects on the company's financial statements, if practicable. This paragraph goes before the opinion paragraph. The opinion paragraph has familiar language with the "except for" and "presented fairly" wording.

Divestiture

Selling a division or part of an organization -Used to raise capital for further acquisitions or investments

What do you understand by statistical power of sensitivity and how do you calculate it?

Sensitivity is commonly used to validate the accuracy of a classifier (Logistic, SVM, RF etc.). Sensitivity is "predicted TRUE events / total events", where true events are the events that were actually true and that the model also predicted as true. The calculation of sensitivity is straightforward: Sensitivity = True Positives / Positives in Actual Dependent Variable, where true positives are positive events that are correctly classified as positives.
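
A small sketch of the ratio described above (true positives divided by actual positives), assuming labels are encoded as 1 for positive and 0 for negative. The function name and the sample labels are made up for illustration.

```python
def sensitivity(actual, predicted):
    """True positives divided by all actual positives."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if true_pos + false_neg == 0:
        raise ValueError("no actual positives in the data")
    return true_pos / (true_pos + false_neg)


actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
score = sensitivity(actual, predicted)  # 3 of the 4 actual positives are found
```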

data from sensors

Sensor data are the data gathered from devices such as heating units, vehicles, electrical transformers, airplanes, health monitors

Cluster Sampling

Separate the population into clusters, take a random sample of the clusters, and then survey everyone in the selected clusters • Easier since it reduces the number of sampling locations • May not be a representative sample

Economist guide to visualizing data

Show data Reduced Clutter Integrate text with graph

Assets

Shown on the balance sheet in declining order of liquidity 1. Current assets 2. Long-term assets

What is batch statistical learning?

Statistical learning techniques allow learning a function or predictor from a set of observed data that can make predictions about unseen or future data. These techniques provide guarantees on the performance of the learned predictor on the future unseen data based on a statistical assumption on the data generating process.

what are the differences and similarities between structured and unstructured data?

Structured vs. unstructured: organised vs. unorganised; fixed cell widths vs. varying length; easily scanned and examined vs. difficult to scan; structured text vs. unstructured text; understandable by computers vs. needs to be translated; values are prescribed vs. values not typical.

What is the difference between supervised learning and unsupervised learning? Give concrete examples

Supervised learning: inferring a function from labeled training data. Predictor measurements are associated with a response measurement, and we wish to fit a model that relates the two, either to better understand the relation between them (inference) or to accurately predict the response for future observations (prediction). Supervised learning methods: support vector machines, neural networks, linear regression, logistic regression, extreme gradient boosting. Supervised learning examples: churn prediction, predicting the price of a house based on its area and size, predicting the relevance of search engine results. Unsupervised learning: inferring a function to describe the hidden structure of unlabeled data. We lack a response variable that can supervise our analysis. Unsupervised learning methods: clustering, principal component analysis, singular value decomposition. Unsupervised learning examples: finding customer segments, image segmentation, grouping US senators by their voting.

What's the "kernel trick" and how is it useful?

The kernel trick involves kernel functions that enable operating in higher-dimensional spaces without explicitly calculating the coordinates of points within that space: instead, kernel functions compute the inner products between the images of all pairs of data in a feature space. This gives them the very useful property of working with higher dimensions while being computationally cheaper than the explicit calculation of those coordinates. Many algorithms can be expressed in terms of inner products. Using the kernel trick enables us to effectively run algorithms in a high-dimensional space with lower-dimensional data.
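
To make the idea concrete, here is a small sketch using one well-known case: the degree-2 polynomial kernel k(x, y) = (x·y)² on 2-D points, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The choice of kernel and the helper names are assumptions made for illustration; the point is only that the kernel value equals the inner product in the 3-D feature space without ever building φ.

```python
import math


def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def phi(v):
    """Explicit degree-2 feature map for a 2-D point (the expensive route)."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Kernel route: one 2-D inner product, then a square."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, 4.0)
via_kernel = poly_kernel(x, y)          # computed in the original 2-D space
via_features = dot(phi(x), phi(y))      # computed in the 3-D feature space
```

Both routes give the same number; for higher degrees or dimensions the explicit map grows quickly while the kernel stays a single inner product.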

What are your favorite use cases of machine learning models?

The Quora thread above contains some examples, such as decision trees that categorize people into different tiers of intelligence based on IQ scores. Make sure that you have a few examples in mind and describe what resonated with you. It's important that you demonstrate an interest in how machine learning is implemented.

What percentile does the mean represent?

The answer cannot be determined without further information. The mean's location depends upon the distribution of the data set.

Why do we need/want the bias term?

Bias values allow a neural network to output a value of zero even when the input is near one. Adding a bias permits the output of the activation function to be shifted to the left or right on the x-axis. Consider a simple neural network where a single input neuron I1 is directly connected to an output neuron O1. Bias is a vital concept for neural networks. Bias neurons are added to every non-output layer of the neural network. They differ from ordinary neurons in two significant ways. Firstly, the output from a bias neuron is always one. Secondly, a bias neuron has no inbound connections. The constant value of one lets the layer respond with non-zero values even when the input to the layer is zero. This may be crucial for certain data sets.

Python or R - Which one would you prefer for text analytics?

The best possible answer for this would be Python because it has Pandas library that provides easy to use data structures and high performance data analysis tools.

Revenue realization principle

The company recognizes revenues on the accrual basis of accounting -Related to the income stmt

Sample Report:Qualified Opinion Due to Inadequate Disclosure Nonissuer. Independent Auditor's Report. Basis for Qualified Opinion:

The company's financial statements do not disclose [describe the nature of the omitted information that is not practicable to present in the auditor's report]. In our opinion, disclosure of this information is required by accounting principles generally accepted in the United States of America.

What are the different Algorithm techniques in Machine Learning?

The different types of techniques in Machine Learning are a) Supervised Learning b) Unsupervised Learning c) Semi-supervised Learning d) Reinforcement Learning e) Transduction f) Learning to Learn

enterprise data warehouse

The enterprise data warehouse (EDW) layer refers to the layers in which data are acquired, transformed, and stored for the long term in full granularity so the data either expand or shrink.

Standard error:

The estimated standard deviation of a sampling distribution

What is the general principle of an ensemble method and what is bagging and boosting in ensemble method?

The general principle of an ensemble method is to combine the predictions of several models built with a given learning algorithm in order to improve robustness over a single model. Bagging is an ensemble method for improving unstable estimation or classification schemes, while boosting methods are applied sequentially to reduce the bias of the combined model. Both boosting and bagging can reduce errors by reducing the variance term.

When an auditor issues a Qualified Opinion how is the opinion paragraph modified?

The opinion paragraph should include the following, "In the auditor's opinion, *except for* the effects of the matter(s) described in the basis for qualified opinion paragraph, the financial statements are *presented fairly*, in all material respects, in accordance with the applicable financial reporting framework."

How to define/select metrics?

Type of task: regression? Classification? Business goal? What is the distribution of the target variable? What metric do we optimize for? Regression: RMSE (root mean squared error), MAE (mean absolute error), WMAE(weighted mean absolute error), RMSLE (root mean squared logarithmic error)... Classification: recall, AUC, accuracy, misclassification error, Cohen's Kappa...

What are various steps involved in an analytics project?

Understand the business problem • Explore the data and become familiar with it. • Prepare the data for modelling by detecting outliers, treating missing values, transforming variables, etc. • After data preparation, start running the model, analyse the result and tweak the approach. This is an iterative step till the best possible outcome is achieved. • Validate the model using a new data set. • Start implementing the model and track the result to analyse the performance of the model over the period of time.

Financial Statement Issues - None or immaterial

Unmodified (Unqualified)

Explain what regularization is and why it is useful.

Used to prevent overfitting: improve the generalization of a model Decreases complexity of a model Introducing a regularization term to a general loss function: adding a term to the minimization problem Impose Occam's Razor in the solution

When you call the join operation on two pair RDDs, e.g. (K, V) and (K, W), what is the result?

When called on datasets of type (K, V) and (K, W), join returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.
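
The join semantics described above can be sketched without a Spark cluster. The helper below is a plain-Python stand-in, not Spark's API: it takes two lists of key-value pairs and produces every (K, (V, W)) combination for keys present on both sides. The name `join_pairs` and the sample data are hypothetical, and a real RDD join does not guarantee any particular output order.

```python
from collections import defaultdict


def join_pairs(left, right):
    """Inner join of two lists of (key, value) pairs, like a pair-RDD join."""
    right_by_key = defaultdict(list)
    for key, w in right:
        right_by_key[key].append(w)
    # Every left value is paired with every right value sharing its key.
    return [(key, (v, w)) for key, v in left for w in right_by_key.get(key, [])]


left = [("a", 1), ("b", 2), ("a", 3)]
right = [("a", "x"), ("a", "y"), ("c", "z")]
joined = join_pairs(left, right)
```

Keys that appear on only one side ("b" and "c" here) are dropped, which is what distinguishes an inner join from the outer variants.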

Assume you need to generate a predictive model using multiple regression. Explain how you intend to validate this model

Validation using $R^2$: - % of variance retained by the model - Issue: $R^2$ always increases when adding variables - $R^2 = \frac{RSS_{tot} - RSS_{res}}{RSS_{tot}} = \frac{RSS_{reg}}{RSS_{tot}} = 1 - \frac{RSS_{res}}{RSS_{tot}}$ Analysis of residuals: - Heteroskedasticity (relation between the variance of the model errors and the size of an independent variable's observations) - Scatter plots of residuals vs. predictors - Normality of errors - Etc.: diagnostic plots Out-of-sample evaluation: with cross-validation
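
As a quick illustration of the R² definition above (one minus the residual sum of squares over the total sum of squares), here is a minimal sketch. The function name and the sample observations are invented for the example.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS_res / RSS_tot."""
    mean_y = sum(y) / len(y)
    rss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    rss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained part
    return 1 - rss_res / rss_tot


observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, fitted)
```

Because this in-sample score never decreases as variables are added, it is usually paired with residual diagnostics and out-of-sample checks such as cross-validation.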

Sample Report:Qualified Opinion Due to Inadequate Disclosure Nonissuer. Independent Auditor's Report. Auditor's Responsibility:

We believe that the audit evidence we have obtained is sufficient and appropriate to provide a basis for our QUALIFIED audit opinion.

Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Independent Auditor's Report. Auditor's Responsibility.

We believe that the audit evidence we have obtained is sufficient and appropriate to provide a basis for our adverse audit opinion.

What are the benefits and drawbacks of specific methods such as lasso regression?

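We use an $L_1$ penalty when fitting the model using least squares. Can force regression coefficients to be exactly zero: a feature selection method by itself. $\hat{\beta}^{lasso} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$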

What is a hire purchase?

When a company hires equipment with the option to buy at the end of the term

what are web crawlers?

Web crawlers, also known as info agents or web spiders, are internet bots (short term for robots) that search web sites one page at a time for information

Four P's Product

What are our Products and services? What is the company's niche?

Pricing Strategies Price-Based Costing

What are people willing to pay for the product? (If it is not more than your costs, it may not be worth making.) (On the other hand, consumers may be willing to pay much more than you could get just by adding a profit margin.) What is the product worth to your buyer? Compare it to other products or services in their lives: what did they pay in those cases? Sometimes you need to look at the factors that could come from pricing something necessary (like heart medication for babies) too high; a PR nightmare could result.

Starting a New Business Steps 3. Market and Strategic Plans

What are the barriers to entering this market? Who are the major players and what are their respective market shares? What will the competitive response be?

Mergers and Acquisitions Due Diligence Elements

What shape is the company in? The industry? How secure are its markets and customers? What are the margins? What is the best competitive response to the acquisition? What are the legal issues?

Starting a New Business Steps 4. Distribution Channels

What types of distribution? How will we do this efficiently? Are they reliable?

Porter's 5 Forces 5. Bargaining Power of the Suppliers

When there are many suppliers and few buyers the buyers have the advantage but when there are many buyers and few suppliers the suppliers have the power

Which method is frequently used to prevent overfitting?

When there is sufficient data 'Isotonic Regression' is used to prevent an overfitting issue.

What's a false positive?

When we wrongly reject a null hypothesis that is actually true (a Type I error)

Developing a New Product 3. Think about the Customers

Who are our customers and what is important to them? How are they segmented? How can we best reach them? How can we ensure that we retain them? Consumer Adoption Rates (See Chart)

Both XML and HTML use tagging but what are their purposes?

XML tags are used to create metadata about data so that the data can be understood by computers for further processing and structuring. HTML is used to tag data so that browsers can display that data as a web page.

Check Pointing (SPARK)

You mark an RDD for checkpointing by calling RDD.checkpoint() . The RDD will be saved to a file inside the checkpoint directory and all references to its parent RDDs will be removed. This function has to be called before any job has been executed on this RDD.

If Sales are Flat and profits are down.....

You need to examine both revenue and costs. Start with revenue: you can't make educated decisions about costs until you identify and understand the revenue streams.

What is not Machine Learning?

a) Artificial Intelligence b) Rule based inference

Explain what is the function of 'Supervised Learning'?

a) Classifications b) Speech recognition c) Regression d) Predict time series e) Annotate strings

What are the five popular algorithms of Machine Learning?

a) Decision Trees b) Neural Networks (back propagation) c) Probabilistic networks d) Nearest Neighbor e) Support vector machines

Explain what is the function of 'Unsupervised Learning'?

a) Find clusters of the data b) Find low-dimensional representations of the data c) Find interesting directions in data d) Interesting coordinates and correlations e) Find novel observations/ database cleaning

How are neural nets related to Fourier transforms? What are Fourier transforms, for that matter?

csat is the ___________ metric

golden

binomial distributions

have a fixed number of trials

yield on cost

measures the percent of dividend income your investment is generating from the purchase price

Which is more important to you: model accuracy or model performance?

This question tests your grasp of the nuances of machine learning model performance. Machine learning interview questions often look towards the details. There are models with higher accuracy that can perform worse in predictive power. How does that make sense? It has everything to do with how model accuracy is only a subset of model performance, and at that, a sometimes misleading one. For example, if you wanted to detect fraud in a massive dataset with a sample of millions, a more accurate model would most likely predict no fraud at all if only a tiny minority of cases were fraud. However, this would be useless for a predictive model: a model designed to find fraud that asserted there was no fraud at all! Questions like this help you demonstrate that you understand model accuracy isn't the be-all and end-all of model performance.
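
The fraud example above can be reproduced in a few lines. This sketch assumes a made-up dataset of 10,000 transactions with 10 fraud cases (label 1) and a "model" that always predicts no fraud; the numbers and helper names are illustrative only.

```python
def accuracy(actual, predicted):
    """Share of cases where the prediction matches the label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted):
    """Share of actual fraud cases that the model catches."""
    positives = sum(actual)
    caught = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    return caught / positives


actual = [1] * 10 + [0] * 9990        # 10 fraud cases in 10,000 transactions
predicted = [0] * 10000               # always predict "no fraud"

model_accuracy = accuracy(actual, predicted)   # looks excellent
model_recall = recall(actual, predicted)       # finds no fraud at all
```

The accuracy is very high while the recall is zero, which is why a single accuracy figure can mislead on imbalanced data.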

Is it better to have too many false positives or too many false negatives? Explain.

It depends on the question as well as on the domain for which we are trying to solve the question. In medical testing, false negatives may provide a falsely reassuring message to patients and physicians that disease is absent when it is actually present. This sometimes leads to inappropriate or inadequate treatment of both the patient and their disease. So here it is preferable to have too many false positives. For spam filtering, a false positive occurs when spam filtering or spam blocking techniques wrongly classify a legitimate email message as spam and, as a result, interfere with its delivery. While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task. So here we prefer too many false negatives over too many false positives.

Insider ownership

percentage of ownership in a business that is held by owners of more than 10% of the shares or by officers or directors of the company

what are examples of unstructured data?

pictures, audio recordings, and videos, although they commonly consist of blocks of text.

other than changing price, profit can increase through

-increase revenue w/o increasing volume (1. look for additional products/services you can bundle 2. branch into related capabilities) -increase volume (1. look for new uses of the product or service 2. market efforts to increase awareness, trial, and repeat purchase 3. ensure quality 4. add value-added features)

improving bottom line -- profits E(P=R-C)M (sales up, profits flat)

look at external factors first -- economy & market/industry -industry-wide problem or company problem? 1. analyze revenues: revenue streams, % of total revenue for each stream, is anything unusual in balance of %s, have %s changed lately? why? 2. examine costs: major costs, any major shifts in costs, any costs out of line? benchmark costs against competitors 3. volume: expand into new areas, increase sales force, increase marketing, reduce prices, improve customer service

return on invested capital (ROIC)

profits made by a company on money from its capital base - net operating profits after taxes (doesn't include interest expense) divided by invested capital (assets minus cash and non-interest-bearing current liabilities)

The __________ measures the rate of profits earned on the amount invested by the common stockholders.

rate earned on common stockholders' equity

dupont roi

return on sales x assets turn over

outcome - decision tree

triangle

current assets

subset of assets- any asset that can be quickly converted to cash

-get rid of deadwood -determine short-term and long-term company goals -devise business plan -visit clients, suppliers & distributors, & reassure them -prioritize goals -- get some small successes ASAP to build confidence

surrogate ID (SID)

A surrogate ID (SID) table maps the alphanumeric master data primary key to the numeric characteristic. For example, a product may have the key DXTR1000 (Deluxe Touring Bike—Black).

Ticker

symbol used to identify the shares of a specific corporation

adjusted funds from operation AFFO

takes funds from operations and adjusts them for recurring capital expenditures, as well as other adjustments from management

Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Basis for Adverse Opinion:

As described in Note X, the Company has not consolidated the financial statements of subsidiary XYZ company that it acquired during 2001, because it has not yet been able to ascertain the fair value of certain of the subsidiary's material assets and liabilities at the acquisition date. This investment is therefore accounted for on a cost basis by the Company.

The perfect strategy for the high cost producer is one that convinces....

the competition that market shares cannot be shifted except over long periods of time aka highest practical industry prices are an advantage to all because price wars are detrimental to all players in the market

customer journey

the complete sum of experiences that customers go through when interacting with your company and brand

Qualified Opinion (issuer/public company/GAAP-material problem.) Qualified Opinion Due to Material Misstatement of Financial Statements: Issuer. Qualified Opinion Paragraph:

When the auditor expresses a qualified opinion due to a material misstatement in the financial statements, the opinion paragraph should state that, in the auditor's opinion, Except For the effects of the matter(s) discussed in the preceding paragraph, the financial statements are Presented Fairly, in all material respects, in conformity with the accounting principles generally accepted in the United States of America.

What does NLP stand for?

"Natural language processing"! Interaction with human (natural) and computers languages Involves natural language understanding Major tasks: - Machine translation - Question answering: "what's the capital of Canada?" - Sentiment analysis: extract subjective information from a set of documents, identify trends or public opinions in the social media - Information retrieval

Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Adverse Opinion Paragraph. This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Adverse Opinion." This paragraph should include:

(1).A description and quantification of the financial effects of any misstatement that relates to specific amounts in the financial statements. (a). If it is not practicable to quantify the financial effects, this should be stated.

PEG Ratio

(P0 / Expected EPS)/Growth Rate

Starting a new business - Venture capitalist appeal

*Management -What is the management team like? -What are their core competencies? -Have they worked together before? -Is there an advisory board? *Market & Strategic Plans -What are the barriers to entering this market? -Who are the major players and what kind of market share does each firm have? -What will the competitive response be? *Distribution Channels -What are our distribution channels? *Products -What is the product and technology? -What is the competitive edge? -What are the disadvantages of this product? -Is the technology proprietary? *Customers -Who are our customers? -How can we best reach them? Can we reach them on the Internet? -How can we ensure that we retain them? *Finance -How is the project being funded? -What is the best allocation of funds? -Can we support the debt? (What if interest rates change? What if the economy sours?)

Emphasis-of-Matter Paragraphs

*Nonissuers = Private* -Included in auditor's report when required by GAAS or auditor's discretion. Used when referring to a matter that is appropriately presented or disclosed in the FS and is very important and fundamental to users for understanding FS. *Does NOT affect the auditor's opinion - Stays unmodified* *Requirements* -Immediately after the opinion paragraph (before Other-Matters) -Titled "Emphasis-of-Matter" -Describe matter being emphasized and location of disclosures in FS -Indicate auditor's opinion is not modified w respect to matter emphasized

Three Framework Objectives

*O*perations Objectives - effectiveness and efficiency - ensuring that the assets of the organization are adequately safeguarded *R*eporting Objectives - focus of COSO - reliability, timeliness, transparency of an entity's financial and non financial reporting *C*ompliance Objectives - adhering to all applicable laws and regulations

Governance issues

-Board of directors -A group of individuals who are elected by the ownership of a corporation to have oversight and guidance over management and who look out for the shareholders' interests -Help explain political choices

Porter's 5 generic strategies:

-Cost leadership -Differentiation -Focus -Cost focus -Differentiation focus

Increasing profit - volume

-Expand into new areas. -Increase sales force. -Increase marketing. - Reduce prices. - Improve customer service.

Industry/Acquiring a diverse company - Suppliers

-Have the suppliers been consistent? -What is going on in their industry? -Will they continue to supply us?

What are Non-current/Fixed Assets/ how long are they held in the entity for?

-Held for more than one accounting period -Not intended for resale -Used in production of entity's goods or services Buildings, plant, equipment

Increasing sales - investigate industry & market

-How are we growing relative to the industry? -What has our market share done lately? - Have we gone out and asked customers what they want from us? - Are our prices in line with our competitors? -What have our competitors done in marketing and product development?

Developing a new product - market strategy

-How does this impact existing product line? -Are we cannibalizing one of our existing products? -Are we replacing one of our existing products? -How will this expand our customer base and increase our sales? -What will competitor's response be? -If it is a new market - what are barriers to entry? -Who are the major players? How much market share do they have? Who has the rest?

Developing a new product - financing

-How is the project being funded? - What is the best allocation of funds? - Can we support the debt? (What if interest rates change? What if the economy sours?)

Mergers and acquisitions - Exit strategies

-How long are they planning to keep it? - Did they buy it to break it up and sell off parts of it?

Company and industry analysis

-Identify economic characteristics -Identify strategies to achieve company's goals and objectives -Keys to competing in the industry (Quantitative analysis)

Stage 3: Decision stage

-Involves the quantitative strategic planning matrix (QSPM) -Reveals the relative attractiveness of alternative strategies and thus provides objective basis for selecting specific strategies

Profitability analysis

-Operating results of the company -Analyze the 12-month time period for the company

Objectives:

-Provide direction -Allow synergy -Aid in evaluation -Establish priorities -Reduce uncertainty -Minimize conflicts -Aid in both the allocation of resources and the design of jobs

s

01110011

u

01110101

x

01111000
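
The three letter cards above (s, u, x) pair a character with its 8-bit ASCII code. A tiny sketch for checking or generating such cards, using only Python built-ins; the helper name `ascii_bits` is made up.

```python
def ascii_bits(ch):
    """Return the 8-bit binary string for a single ASCII character."""
    return format(ord(ch), "08b")   # ord gives the code point, 08b pads to 8 bits


codes = {ch: ascii_bits(ch) for ch in "sux"}
```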

what are the four types of interactions?

1. Create new records or rows of data, 2. Read records 3. Update or change the value of the attributes in the record 4. Delete records

what are the types of tools

1. Data Exploration and Reporting—Typically these are tools that feature slicing and dicing

What are the benefits that ERP offers?

1. Data are entered only once and can be shared across different areas. 2. Changes to master data such as names and addresses are entered only once and are then used many times. 3. The data processing and storage functionality of all of the business processes are consolidated in a single system, the ERP, which helps reduce IT costs.

Dominant assets:

1. Receivables 2. Inventory 3. PP&E

Purpose of the Z-Score

1. To calculate the probability of an outcome occurring, given a normal distribution of outcomes 2. To compare two or more outcomes that come from different normal distributions
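
Both purposes listed above can be sketched with the standard library. The example assumes normally distributed outcomes; the test scores, means, and standard deviations are invented for illustration.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Purpose 1: probability of an outcome at or below a given z-score.
prob_below_two_sd = normal_cdf(2.0)

# Purpose 2: compare outcomes from different normal distributions.
z_exam_a = z_score(85, mean=70, sd=10)     # hypothetical exam A
z_exam_b = z_score(620, mean=500, sd=100)  # hypothetical exam B
better_relative_result = "A" if z_exam_a > z_exam_b else "B"
```

Converting both scores to z-scores puts them on the same scale, so the larger z-score marks the stronger result relative to its own distribution.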

Describe supervised learning in more details

1. Training phase: Sample extracted from true labels is used to learn a family of models. 2. Validation phase 3. Test Phase 4. Application phases

competitive response

1. competitive analysis -what is new product? how does it differ from ours? -what has competition done differently? what's changed? -have any other comps picked up market share? 2. response actions -acquire competitor -merge w/ competitor -hire competitor's management -increase own profile with marketing and PR campaign

what questions do you ask for brand awareness?

1. for [product/service] what is the first brand name you can think of? 2. for [product/service] what are other brands you've heard of?

company factors

1. profit equation 2. product/service offering -value chain -differentiation 3. more Cs: collaborators, channels, competencies, capacity, culture

developing new product

1. think about product -what's special or proprietary? -is it patented? -substitutions? -advantages & disadvantages? -fit with rest of product line? 2. think about market strategy -effects on existing product line -cannibalizing existing products? replacing? -will it expand customer base and increase sales? -competitive response? -if new market, entrance barriers? -major players/market share 3. customers -who are they? -how best to reach them? Internet? -ensure retention 4. financing -how is it funded -best allocation of funds? -can we support the debt?

Five Forces (state of competition depends on these)

1. threat of new or potential entrants (barriers to entry) 2. intensity of rivalry among existing competitors 3. pressure from substitution products 4. bargaining power of buyers 5. bargaining power of suppliers

market sizing formula

1. total population x % of customers in segment = # of customers targeted 2. # customers targeted x # units purchased per year = total # units 3. total # units x price per unit = total annual market size
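
The three steps chain together as plain multiplication. The sketch below assumes illustrative inputs (a 10% segment share, 2 units per year, $5 per unit); only the 323 million population figure comes from these cards, and the function name `market_size` is hypothetical.

```python
def market_size(population, segment_share, units_per_year, price_per_unit):
    """Top-down market sizing following the three steps on the card."""
    customers = population * segment_share        # step 1
    total_units = customers * units_per_year      # step 2
    annual_size = total_units * price_per_unit    # step 3
    return customers, total_units, annual_size

# Assumed inputs for illustration only.
customers, units, size = market_size(
    population=323_000_000,
    segment_share=0.10,
    units_per_year=2,
    price_per_unit=5.0,
)
print(f"{customers:,.0f} customers, {units:,.0f} units, ${size:,.0f} per year")
```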

001011

13

1101

13

1110

14

001101

15

001110

16

001111

17

010010

22

010011

23

010100

24

011001

31

011010

32

U.S. population

323 million

100001

41

100101

45

111000

70

rule of 72

72/r = number of years it would take an investment to double in value (ex: discount rate is 12%, so 72/12=6)
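
As a quick check of how close the shortcut is, the sketch below compares it with the exact doubling time under annual compounding, ln(2) / ln(1 + r). The function names are hypothetical.

```python
import math

def doubling_years_rule_of_72(rate_percent):
    """Shortcut estimate: 72 divided by the rate in percent."""
    return 72.0 / rate_percent

def doubling_years_exact(rate_percent):
    """Exact doubling time assuming annual compounding."""
    return math.log(2.0) / math.log(1.0 + rate_percent / 100.0)

approx = doubling_years_rule_of_72(12)
exact = doubling_years_exact(12)
print(approx, round(exact, 2))
```

At 12% the shortcut gives 6 years while the exact figure is a little over 6.1 years, so the rule is close enough for mental arithmetic.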

EVPI: Expected Value of Perfect Information

= The most a firm should pay for perfect information = EV(with perfect info.) - EV(baseline)

EVSI: Expected Value of Sample Information

= The most a firm should pay for sample information = EV(with sample info.) - EV(baseline)

Days payable outstanding

= accounts payable / cost of sales * 365 how long a company waits before repaying creditors

Possible Values: ρcall

>0

Possible Values: ∆call

>0

What's a Fourier transform?

A Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions. Put more intuitively, given a smoothie, it's how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes and phases to match any time signal. A Fourier transform converts a signal from the time domain to the frequency domain; it's a very common way to extract features from audio signals or other time series such as sensor data.
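
To make the time-to-frequency idea concrete, here is a minimal sketch of a naive discrete Fourier transform using only the standard library. It is an O(n^2) teaching version, not what production libraries use (they use the FFT), and the test signal is an assumed example: a sine wave that completes 2 cycles over 8 samples.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform: one complex coefficient per frequency bin."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

n = 8
# Assumed test signal: 2 full cycles across the 8-sample window.
signal = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
magnitudes = [abs(c) for c in dft(signal)]

# The strongest bin in the first half of the spectrum is the signal's frequency.
peak_bin = max(range(n // 2), key=lambda k: magnitudes[k])
print(peak_bin)
```

The energy shows up in bin 2 (and its mirror image, bin 6), which is the "recipe" for this signal.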

What is classifier in machine learning?

A classifier in machine learning is a system that inputs a vector of discrete or continuous feature values and outputs a single discrete value, the class.

What is leasing?

A company gets the right to use an asset in exchange for regular payments to the owner.

Give an example of an Estimator

A learning algorithm is an Estimator which trains on a training set and produces a model

What is a sigmoid function and what is a logistic function?

A logistic function is a sigmoid used in logistic regression
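
For reference, the standard logistic function is 1 / (1 + e^(-x)), which maps any real number into the interval (0, 1). The sketch below is one numerically careful way to write it; the function name `logistic` is just a label chosen here.

```python
import math

def logistic(x):
    """Standard logistic (sigmoid) function, 1 / (1 + exp(-x)).

    The two branches avoid overflow in math.exp for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

print(logistic(0.0), logistic(4.0), logistic(-4.0))
```

In logistic regression the output is read as the probability of the positive class, and a threshold such as 0.5 turns it into a class label.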

Which of the following is equivalent to a plain vanilla receive fixed currency swap? A) A long position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. B) A short position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. C) A short position in a foreign bond coupled with a long position in a dollar-denominated floating rate note.

A) A long position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. A long position in a fixed rate foreign bond will receive fixed coupons denominated in a foreign currency. The short floating rate note requires U.S. dollar denominated floating-rate payments. Combined, these are the same cash flows as a plain vanilla currency swap.

How can we distribute JARs to workers?

Ans: The jar you specify with SparkContext.addJar will be copied to all the worker nodes.

what is a common use in business in spreadsheets?

Budgeting

Business intelligence (BI)

Business intelligence (BI) has been used to describe analytics in the context of business data. It focuses on business data, financial data, and marketing data to gain business value, customer loyalty, and other benefits.

If a company issues financial statements that purport to present financial position and results of operation

But omits the related statement of cash flows, the auditor will normally conclude that the omission requires qualification of the opinion.

Inventory Turnover

COGS/Average Inventory

What does the Cash Flow statement show?

Cash inflows and outflow of the organizations for the period that has just ended

Real-World Pricing (Replicating Portfolio)

Ce^(γh)=S∆e^(αh)+Be^(rh)

what is data analytics

Data analytics is the process that takes us from data to decision.

Use profitability framework when you hear

Decline in prices Decline in volume Increase in costs

Why ensemble learning is used?

Ensemble learning is used to improve the classification, prediction, function approximation etc of a model.

Professional Judgement

Exercised in planning and performing an audit. The audit requires interpretation of ethical requirements and GAAS. Make decisions about: -Materiality -Audit Risk -Nature, extent, and timing of audit procedures (*NET*) -Evaluating whether sufficient, appropriate evidence has been obtained (Support audit opinion, not FS) -Evaluating management judgments in applying the applicable financial reporting framework -Drawing conclusions based on evidence obtained

Screening

Filtering a set of potential investments into a smaller set that meet certain criteria -back-testing - applies securities selected to historical data

Branches of Accounting

Financial Accounting MGMT Accounting Treasury MGMT Auditing Taxation and VAT Consultancy (Financial MGMT Corp. Finance?)

Materiality of Problem: Material but not pervasive

Financial Statements Are Materially Misstated (Financial Statement Issues): Qualified opinion. Inability to Obtain Sufficient Appropriate Audit Evidence (Audit Issues): Qualified opinion.

Increasing Sales Assessment Elements

Growth relative to market share Changes in market share Customer polls Prices Competitive? Competitors strategies(marketing and product development)

Code of Professional Conduct

Guidelines -Section: ET -Standard Setting: AICPA -AICPA Code of Professional Conduct provides members with guidelines for behavior in the conduct of their professional affairs. Provides assurance to public that profession intends to maintain high standards and to enforce compliance with these standards by its members -Applies to: Members of AICPA

Developing a New Product 4. Funding

How is the product being funded? Does our company have the cash or are they taking on debt? Can we support this debt under various economic conditions? What is the best allocation for funds?

Starting a New Business Steps 7. Finances

How is the project being funded? What is the best allocation of funds? Can we support the debt under various economic conditions?

Mergers and Acquisitions Exit Strategies Elements

How long to keep it? Divest parts of the organization?

Industry Analysis Suppliers

How many? Product availability? What's going on in their market?

High p-value:

The observed data are consistent with the null hypothesis, therefore we do not reject it. (Failing to reject does not prove the hypothesis is true.)

What is Inductive Logic Programming in Machine Learning?

Inductive Logic Programming (ILP) is a subfield of machine learning which uses logical programming representing background knowledge and examples.

Estimating Volatility

Let x(i)=ln[S(i)/S(i-1)]. Then E[x^2]=∑[x(i)^2/n], x-bar=∑[x(i)/n] and s^2=[n/(n-1)]*(E[x^2]-x-bar^2) => σ≈√s^2√t
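
The estimator above can be sketched as follows. Two things here are assumptions rather than part of the card: the price series is made up, and the final scaling factor is taken to be the square root of the number of observation periods per year (for example 252 for daily data), which is one common reading of the annualization step.

```python
import math

def historical_volatility(prices, periods_per_year):
    """Annualized volatility from log returns, following the card's formulas."""
    x = [math.log(prices[i] / prices[i - 1]) for i in range(1, len(prices))]
    n = len(x)
    mean_sq = sum(v * v for v in x) / n          # E[x^2]
    x_bar = sum(x) / n                           # x-bar
    s2 = n / (n - 1) * (mean_sq - x_bar ** 2)    # unbiased sample variance
    # Assumed annualization: scale by sqrt(periods per year).
    return math.sqrt(s2) * math.sqrt(periods_per_year)

# Hypothetical daily closing prices.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
sigma = historical_volatility(prices, periods_per_year=252)
print(round(sigma, 4))
```

The n / (n - 1) factor is the usual correction that turns the population variance into the sample variance.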

what are the characteristics for informational systems?

Level of detail Periodic Requirements are not always known Managerial requirements Optimized for access Historical data Data can be integrated Availability

Industry Analysis Current Industry Structure

Life Cycle(Growth, transition, maturity) Performance, Margins Major Player and Market Share Industry change(new players, new technology) Drivers(brand, size, technology)

If Profits are declining yet Revenues have gone up......

Look to see if there were Changes in Cost any additional expenses changes in price the product mix changes in customer needs

Mention the difference between Data Mining and Machine learning?

Machine learning relates to the study, design and development of algorithms that give computers the capability to learn without being explicitly programmed. Data mining, by contrast, can be defined as the process of extracting knowledge or previously unknown interesting patterns from data, which is often unstructured. During this process, machine learning algorithms are used.

When to Issue a "Qualified Opinion"

Misstatements are material but *NOT* pervasive Examples: 1. Inadequate Disclosure 2. Material Misstatement

Power

My decision: Reject Ho The truth: Ho is false

What are the main components of the statement of cash flows?

Opening balance Receipts Payments Closing Balance

What are the 3 components of the statement of cash flows?

Operating Investing Financing

Profit

Revenue - Costs OR Revenue*Margin %

How can you iterate over a list and also retrieve element indices at the same time?

This can be done using the enumerate function which takes every element in a sequence just like in a list and adds its location just before it.
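
A short illustration in Python; the list contents are arbitrary sample data.

```python
fruits = ["apple", "banana", "cherry"]

# enumerate yields (index, element) pairs, starting at 0 by default.
for index, fruit in enumerate(fruits):
    print(index, fruit)

pairs = list(enumerate(fruits))

# The optional start argument changes the first index.
numbered = list(enumerate(fruits, start=1))
```

This avoids the more error-prone pattern of looping over `range(len(fruits))` and indexing manually.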

Unstructured Data

Unstructured data, are just that, "unstructured," meaning that they do not conform to data models and associated metadata.

The relationship between sales and accounts receivable may be stated as a _____ turnover.

accounts receivable

shared values

company's principles

skills

competencies

If a company issues financial statements that purport to present financial position and results of operations but

omits the related statement of cash flows, the auditor will normally conclude that the omission requires a qualified opinion.

asset to

roi/profit margin

Sample size assumption

sample size must be sufficiently large

The excess of current assets over current liabilities is called_____.

working capital

if profits are declining

yet revenue increased, examine: -change in costs -additional expenses -changes in prices -the product mix -change in customer needs

company (cheng)

- capabilities and expertise - distribution channels - cost structure (fixed vs. variable) - investment cost - intangibles (brand, reputation, etc.) - financial situation - organizational structure

competition (cheng)

- competitor market share concentration - competitor behaviors (target segment, products, pricing, distribution) - best practices (are they doing things we're not) -barriers to entry (do we need to worry about new entrants to market) - supplier concentration -regulatory environment

The income stmt is a presentation of operating results under accrual basis

-Assumed it is not a good reflection of cash inflows and outflows (current) -BUT, if company is doing a reasonable job collecting receivables and paying payables, accrual basis should give us a good idea of what the future will look like -Illustrate what cash flows will look like

Stage 2: Matching stage

-Focuses on generating feasible alternative strategies by aligning key external and internal factors -Techniques include the strengths-weaknesses-opportunities-threats (SWOT) matrix, the strategic position and action evaluation (SPACE) matrix

Strategic position and action evaluation (SPACE) matrix

-Four quadrant framework indicates whether aggressive, conservative, defensive or competitive strategies are most appropriate for a given organization -2 internal and 2 external dimensions -End up with a coordinate point

When to give an adverse opinion (examples)

-GAAP consistency change (unjustified) = auditor disagrees -Inadequate disclosure -Departure from GAAP (unjustified) -Unreasonable accounting estimate

The grand strategy matrix: Quadrant IV

-Have characteristically high cash flow levels and limited internal growth needs and often can pursue related or unrelated diversification successfully

Competitive profile matrix

-Identifies firm's major competitors and their strengths and weaknesses in relation to a sample firm's strategic positions -Critical success factors include internal and external issues

Growth strategies - Investigate industry

-Is the industry growing? - How are we growing relative to the industry? -Are our prices in line with our competitors? - What have our competitors done in marketing and product development? - Which segments of our business have the highest future potential? - Do we have funding to support higher growth?

Mergers & Acquisitions - Price

-Is the price fair? - How are they going to pay for it? - Can they afford it? - If the economy sours, can they still make their debt payments?

Strategy/Performance *levels of and changes in:*

-Performance Measures -Critical Success Factors -Alignment of Strategy and results Time Series and/or Cross-sectional

Accounting is

-the classification and recording of transactions. -the presentation and interpretation of the results of those transactions. -the projection of future activities, selection of best strategy

entering a new market

1. why? goal/objective 2. determine state of current and future market 3. investigate market -- does entering it make good business sense? 4. major ways to enter new market: start from scratch, acquire existing player, joint venture

population data: India

1.2B

India

1.3 billion

population data: China

1.3B

China

1.4 billion

Issuer Report Adverse Opinion

1.Introductory paragraph: no change. 2.Scope paragraph: no change. 3.Middle paragraph. 4.Adverse opinion paragraph.

Issuer Report Qualified Opinion

1.Introductory paragraph: no change. 2.Scope paragraph: no change. 3.Middle paragraph. 4.Qualified opinion paragraph.

1100

12

population data: Japan

125M

010101

25

011011

33

Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Qualified Opinion Paragraph: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include:

(1) A description and quantification of the financial effects of any misstatement that relates to specific amounts in the financial statements. (a). If it is not practicable to quantify the financial effects, this should be stated.

Material Misstatements related to Appropriateness of Accounting Policies

(1) Accounting policies are not in accordance with the applicable financial reporting framework. (2) The financial statements do not represent the underlying transactions and events in a manner that achieves fair presentation. (3) The entity has not complied with the financial reporting framework requirements for accounting for and disclosing changes in accounting policies.

Material Misstatements related to Application of Accounting Policies

(1) Management has not applied accounting policies in accordance with the applicable reporting framework (i.e. Expensing rather than capitalizing a fixed asset). (2) Management has not applied accounting policies consistently between periods or to similar transactions and events. (3) There is an error in application of an accounting policy.

Nature of Material Misstatements. (≠ GAAP) A material misstatement of the financial statements may arise in relation to the following:

(1) The appropriateness of accounting policies. (2) The application of accounting policies. (3) The appropriateness of the financial statement presentation or the appropriateness or adequacy of disclosure in the financial statements.

Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Adverse Opinion Paragraph. This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Adverse Opinion." This paragraph should include:

(2). An explanation of how disclosures are misstated if there is a material misstatement related to narrative disclosure. (3). A description of the nature of omitted information and inclusion of information, when practicable, if there is an omission that is required to be presented or disclosed.

The Independent Audit Function: The Basics

*GAAS* -Provide FS users with an opinion on whether the FS are presented fairly, in all material respects, in accordance with the applicable financial reporting framework. -Applicable reporting framework: acceptable in view of the nature of the entity and objective of FS or required by law or regulation. Ex. GAAP or IFRSs, and special purpose framework -Auditor report gives credibility to FS. Have an objective view and report on the company's activities without bias or conflict of interest -FS: prepared by management of company, not by auditor. They are product and property of company.

Other Auditing Publications

*Least Authoritative - 3rd Level of audit guidance* -No authoritative status but may be helpful to auditor -Examples: Auditing articles in the Journal of Accountancy, textbooks

Financial Statements

-A Summary of Financial Transactions -Make up Annual Report

what are the components of the architected data mart layer?

1. Business transformation layer uses business logic to transform transactional data from the propagation layer into their business context. 2. reporting layer- the read-optimized data cube that is used for queries and analytics. 3. operational data store (ODS)- maintains operational data that may be subject to changes. 4. virtualization layer- This layer is available to virtualize reporting structures.

3 Questions generally answered by accounting

1. How are we doing? -a scorecard 2. What problems should be looked at? -attention directing 3. What is the best way to do a job? -problem solving

four Ps

1. product 2. price 3. place/placement 4. promotion

011100

34

days sales outstanding

= accounts receivable / total sales * 365 how long it takes to collect its sales

csat

= asking the question: how likely are you to recommend __________ to a friend/colleague? (aka net promoter scale)

take rate

= number of accepted offers / number of contacts

test drive

= the customer pretest of a product or service prior to purchase

Most organizations simultaneously pursue

A combination of two or more strategies, but a combination strategy can be exceptionally risky if carried too far -No org can afford to pursue all the strategies that might benefit them -Difficult decisions must be made and priorities must be established

data warehouse

A data warehouse is a database architecture that provides the persistent (permanent) storage of summarized, harmonized, cleansed, and consolidated data, often from multiple sources, to serve as a single source of truth for decision making

What's the difference between a generative and discriminative model?

A generative model will learn categories of data while a discriminative model will simply learn the distinction between different categories of data. Discriminative models will generally outperform generative models on classification tasks.

What section(s) is added to the Auditor's report when a Qualified or Adverse Opinion is issued for an issuer?

A middle paragraph(s) is added to the Auditor's report that is placed immediately before the opinion paragraph.

For a change in which of the following inputs into the Black-Scholes-Merton option pricing model will the direction of the change in a put's value and the direction of the change in a call's value be the same? A) Volatility. B) Exercise price. C) Risk-free rate.

A) Volatility. A decrease/increase in the volatility of the price of the underlying asset will decrease/increase both put values and call values. A change in the values of the other inputs will have opposite effects on the values of puts and calls.

What is A/B Testing?

A/B Testing is an experiment design method where we compare 2 versions of a web page or an ad and see which one performs better. An example of this is email marketing: -Customer database of 2000 people -1000 are sent: "offer ends this week!" -1000 are sent: "offer ends soon!" -and compare which one is better in terms of purchases

average true range

ATR: a measure of the volatility of a security. It is the simple moving average (14 day) of the security's true range. True range is the highest of: (1) period high minus period low, (2) abs(period high - previous close), (3) abs(period low - previous close)
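
The definition translates directly into code. The sketch below follows the card's simple-moving-average version (other smoothing schemes exist); the function names and the three sample bars are hypothetical.

```python
def true_range(high, low, prev_close):
    """Largest of the three candidate ranges for one period."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))

def average_true_range(bars, period=14):
    """Simple moving average of the last `period` true ranges.

    bars: list of (high, low, close) tuples, oldest first.
    One extra bar is needed to supply the first previous close.
    """
    if len(bars) < period + 1:
        raise ValueError("need at least period + 1 bars")
    ranges = []
    for i in range(1, len(bars)):
        high, low, _close = bars[i]
        prev_close = bars[i - 1][2]
        ranges.append(true_range(high, low, prev_close))
    return sum(ranges[-period:]) / period

# Made-up bars, using a 2-period window to keep the example short.
bars = [(10.0, 9.0, 9.5), (11.0, 10.0, 10.5), (12.0, 10.0, 11.0)]
atr = average_true_range(bars, period=2)
print(atr)
```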

Which is true about the normal distribution?

About 95% of observations are within 2 standard deviations of the mean About 68% of observations are within 1 standard deviation of the mean It is symmetric

What is the center value of the distribution of the sample means?

According to the Central Limit Theorem, if we take enough large samples, the mean of the set of sample means equals the population mean.

Current Liabilities

Accounts payable +notes payable (short term)

Competitive Response Strategy Element

Acquire a Competitor Merge with Competitor Copy a Competitor Hire the Competitor's Management Increase profile with marketing campaign

WO: Weaknesses and opportunities

Aim at improving internal weaknesses by taking advantage of external opportunities

How would you simulate the approach AlphaGo took to beat Lee Sedol at Go?

AlphaGo beating Lee Sedol, the best human player at Go, in a best-of-five series was a truly seminal event in the history of machine learning and deep learning. The Nature paper describes how this was accomplished with "Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning from games of self-play."

Operating cycle

Amount of time it takes to turn inventory into cash

What are some differences between a linked list and an array?

An array is an ordered collection of objects. A linked list is a series of objects with pointers that direct how to process them sequentially. An array assumes that every element has the same size, unlike the linked list. A linked list can more easily grow organically: an array has to be pre-defined or re-defined for organic growth. Shuffling a linked list involves changing which pointers point where; shuffling an array is more complex and takes more memory.

A material change in accounting principle would result in the addition of

An emphasis-of-matter paragraph to the unmodified opinion.

what is an example of structured data?

An example of structured data is a database about restaurants. It stores attributes such as restaurant name, location, phone number, and cuisine.

What is power analysis?

An experimental design technique for determining the effect of a given sample size.

How would you control the number of partitions of a RDD?

Ans: You can control the number of partitions of a RDD using repartition or coalesce operations.

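What is Speculative Execution of tasks?

Ans: Speculative tasks, or task stragglers, are tasks that run slower than most of the other tasks in a job. With speculative execution, a copy of such a task is launched on another worker and the result of whichever copy finishes first is used.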

what are the characteristics of transactional Systems?

Availability—Because businesses cannot afford to lose any computing time due to system failures, systems that process transactions should be available as close to 100% of the time as possible. Level of Detail—The data of transactional systems should be available in full detail so that each transaction as well as its content, creator, date, and details are available at all times. Updatable—By their very nature, business transactions are created, updated or changed and deleted quite frequently. Speed—The ability to process large quantities of transactions is critical in business systems. Current—Transactional systems are current, which means they store only recent transactions, frequently a year or two of data. Operational—OLTP systems are operational in nature

Customer Lifetime Value

Avg Customer Contribution Margin ($) per year * Customer Lifetime OR Avg Customer Contribution Margin ($) * (Retention Rate / (1 + Discount Rate - Retention Rate)) --> accounts for the time value of money using a discount rate (more time consuming)
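
Both versions of the formula are shown below as a minimal sketch. The input figures ($100 margin per year, 5-year lifetime, 80% retention, 10% discount rate) are assumed for illustration, and the function names are hypothetical.

```python
def clv_simple(margin_per_year, lifetime_years):
    """Undiscounted CLV: yearly contribution margin times customer lifetime."""
    return margin_per_year * lifetime_years

def clv_discounted(margin_per_year, retention_rate, discount_rate):
    """CLV with time value of money, per the second formula on the card."""
    return margin_per_year * retention_rate / (1.0 + discount_rate - retention_rate)

# Assumed inputs for illustration only.
simple = clv_simple(100.0, 5)
discounted = clv_discounted(100.0, retention_rate=0.8, discount_rate=0.1)
print(simple, round(discounted, 2))
```

The discounted figure comes out lower than the simple one because later years are both less likely to occur (churn) and worth less today (discounting).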

The floating-rate payer in a simple interest-rate swap has a position that is equivalent to: A) a series of long forward rate agreements (FRAs). B) a series of short FRAs. C) issuing a floating-rate bond and a series of long FRAs.

B) a series of short FRAs. The floating-rate payer has a liability/gain when rates increase/decrease above the fixed contract rate; the short position in an FRA has a liability/gain when rates increase/decrease above the contract rate.

Breakeven

BE revenue = FC / CM%, where CM% = (rev - vc) / rev. BE units = FC / CM, where CM = $/unit - vc/unit.
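
A small worked example of both breakeven formulas, using assumed figures ($50,000 fixed costs, $25 price, $15 variable cost per unit). The function names are hypothetical.

```python
def breakeven_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Units needed so that total contribution margin covers fixed costs."""
    contribution_margin = price_per_unit - variable_cost_per_unit
    return fixed_costs / contribution_margin

def breakeven_revenue(fixed_costs, revenue, variable_costs):
    """Revenue needed to break even, using the contribution margin ratio."""
    cm_ratio = (revenue - variable_costs) / revenue
    return fixed_costs / cm_ratio

# Assumed inputs for illustration only.
units = breakeven_units(50_000, 25.0, 15.0)
# Per-unit price and cost give the same ratio as totals would.
revenue = breakeven_revenue(50_000, 25.0, 15.0)
print(units, revenue)
```

The two answers agree with each other: breakeven units times the unit price equals breakeven revenue.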

What is Bayes' Theorem? How is it useful in a machine learning context?

Bayes' Theorem gives you the posterior probability of an event given what is known as prior knowledge. Mathematically, it's expressed as the true positive rate of a condition sample divided by the sum of the false positive rate of the population and the true positive rate of a condition. Say you had a 60% chance of actually having the flu after a flu test, but out of people who had the flu, the test will be false 50% of the time, and the overall population only has a 5% chance of having the flu. Would you actually have a 60% chance of having the flu after having a positive test? Bayes' Theorem says no. It says that you have a (0.6 * 0.05) (True Positive Rate of a Condition Sample) / [(0.6 * 0.05) (True Positive Rate of a Condition Sample) + (0.5 * 0.95) (False Positive Rate of a Population)] = 0.0594 or 5.94% chance of having the flu. Bayes' Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier. That's something important to consider when you're faced with machine learning interview questions.
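
The arithmetic in the flu example can be checked with a few lines. The function and parameter names are hypothetical; the three input numbers (0.6, 0.05, 0.5) are the ones used in the card.

```python
def posterior(p_pos_given_condition, prior, p_pos_given_no_condition):
    """Bayes' Theorem: P(condition | positive test)."""
    true_positive = p_pos_given_condition * prior
    false_positive = p_pos_given_no_condition * (1.0 - prior)
    return true_positive / (true_positive + false_positive)

p_flu = posterior(
    p_pos_given_condition=0.6,
    prior=0.05,
    p_pos_given_no_condition=0.5,
)
print(round(p_flu, 4))
```

The low prior (5%) dominates the result, which is why the posterior stays close to 6% despite the positive test.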

Why is randomization important in experimental design?

Because it balances out confounding variables. You can ensure possible confounding variables are balanced out.

Entering the Market - Why?

Brainstorm assumptions & clarify with interviewer: -What is the goal? -What is our objective? -How does this fit into overall strategy?

Why is Accounting Necessary?

Budgeting Financial Statements Investment Review Business Plan

A U.S. firm (U.S.) and a foreign firm (F) engage in a 3-year annual pay plain-vanilla currency swap. U.S. is the fixed rate payer in FC. The fixed rate at initiation was 5%. The variable rate at the end of year 1 was 4%, at the end of year 2 was 6%, and at the end of year 3 was 7%. At the beginning of the swap, $2 million was exchanged at an exchange rate of 2 foreign units per $1. At the end of the swap period the exchange rate was 1.75 foreign units per $1. At the end of year 1, firm: A) F pays firm U.S. $200,000. B) U.S. pays firm F $200,000. C) U.S. pays firm F 200,000 foreign units.

C) U.S. pays firm F 200,000 foreign units. A plain-vanilla currency swap pays floating on dollars and fixed on foreign. Fixed on foreign: 0.05 × $2,000,000 × 2 foreign units per $1 = 200,000 foreign units paid by the U.S. firm.

The major benefit of the BCG matrix is that it draws attention to the

Cash flow, investment characteristics, and needs of an organization's various divisions

Profitability is associated with the

Cash flows and income statements

When should you use classification over regression?

Classification produces discrete values and assigns the dataset to strict categories, while regression gives you continuous results that allow you to better distinguish differences between individual points. You would use classification over regression if you wanted your results to reflect the belongingness of data points in your dataset to certain explicit categories (ex: If you wanted to know whether a name was male or female rather than just how correlated they were with male and female names.)

Which technique is used to predict categorical responses?

Classification technique is used widely in mining for classifying data sets.

Why data cleaning plays a vital role in analysis?

Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work with is a cumbersome process because, as the number of data sources increases, the time taken to clean the data increases exponentially due to the number of sources and the volume of data generated in these sources. Cleaning alone might take up to 80% of the time, making it a critical part of the analysis task.

What is the difference between Cluster and Systematic Sampling?

Cluster sampling is a technique used when it becomes difficult to study the target population spread across a wide area and simple random sampling cannot be applied. A cluster sample is a probability sample where each sampling unit is a collection, or cluster, of elements. Systematic sampling is a statistical technique where elements are selected from an ordered sampling frame. In systematic sampling, the list is progressed in a circular manner, so once you reach the end of the list, it is progressed from the top again. The best example of systematic sampling is the equal probability method.

Industry Analysis Approaches

Current Industry Structure Supplier Future

Which data scientists do you admire most? which startups?

DJ Patil, First US Chief Data Scientist, for using Data Science to make US government work better. Hadley Wickham, for his fantastic work on Data Science and Data Visualization in R, including dplyr, ggplot2, and RStudio.

Key features of big data

Data is available in real time Data is available at a larger scale Data is available on novel types of variables

What are Expenses?

Decreases in economic benefits during the accounting period in the form of outflows or depletions of assets or increases of liabilities that result in decreases in equity, other than those relating to distributions to equity participants. Examples: Salaries and wages Rent and rates Light and heat Stationery costs Director Expenses Depreciation

Dimension Tables

Details regarding the master data are stored in separate tables

Competitors include (3):

Direct competition Substitute solutions Potential entrants

Sample Report:Qualified Opinion Due to Inadequate Disclosure Nonissuer. Independent Auditor's Report. Qualified Opinion: In our opinion

EXCEPT for the omission of the information described in the Basis for Qualified Opinion paragraph, the financial statements referred to above PRESENT FAIRLY, in all material respects, the financial position of ABC Company as of December 31, 2001 and 2000, and the results of its operations and its cash flows for the years then ended in accordance with accounting principles generally accepted in the United States of America.

Ex: Hacker Guard has been the industry leader in ID-theft monitoring, but inconsistent profits and losses in 6 of the last 10 quarters have made it hard to do an IPO. How can it reduce the turmoil and increase profits?

Economy Industry Revenues Costs

Symmetric

For every x,y ∈ A, xRy → yRx | if x is related to y then y is related to x

An auditor may express a disclaimer of opinion when the auditor is unable to obtain sufficient appropriate audit evidence on which to base an opinion.

For example, when an auditor is unable to determine the extent of or the amounts associated with a pervasive employee fraud scheme, there is not sufficient appropriate audit evidence, and an expression of disclaimer of opinion may be appropriate.

Forward integration

Gaining ownership or increased control over distributors or retailers

The Audit Process

General Principles: Overall objectives, documentation, communication, quality control - firm 1. Engagement Acceptance: -Ethics and independence -Terms of engagement 2. Assess Risk and Plan Response: -Audit planning, including audit strategy -Materiality -Risk assessment procedures: understand the entity and environment & understand internal control -Identify and assess risk -Respond to risk 3. Perform Procedures and Obtain Evidence -Test of controls, if applicable -Substantive Testing 4. Form Conclusions -Subsequent Events -Management representation -Evaluate audit results -Quality Control - engagement 5. Reporting -Report on audited financial statements -Other reporting considerations

How would we reduce variance?

Get more data / decrease complexity of the model

Porter's 5 forces

Good for Entering New Market Developing New Product Starting a new business 1. The threat of new or potential entrants 2. Intensity of rivalry among existing competitors 3. Pressure from substitution products 4. Bargaining power of buyers 5. Bargaining power of suppliers

what languages use tagged data?

HTML XML XBRL

Cash cows III

High market share, low growth rate -Generating cash in excess

Strategy/Performance *Strategy - Low cost provider*

High volume, low margin

Porter's 5 Forces 1. Threat of New or Potential Entrants

If barriers are high then new comers can expect entrenchment or retaliatory forces from the existing competitors. Some Barriers to Entry are: Economies of Scale Capital Requirements Government Policy Switching Costs Access to Distribution Channels Product Differentiation Proprietary Product Technology

What is Perceptron in Machine Learning?

In Machine Learning, the Perceptron is an algorithm for supervised binary classification: it learns a linear decision boundary that assigns each input to one of two classes.

What is a statistical interaction?

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the simultaneous influence of two variables on a third is not additive. Most commonly, interactions are considered in the context of regression analyses.

What is supervised learning?

In supervised learning, tuples of examples (input, desired output) are available and the computer uses this to build a model where a given input produces an output (with minimal error)

Can you cite some examples where both false positive and false negatives are equally important?

In the banking industry, giving loans is the primary source of making money, but if the repayment rate is not good the bank will not make any profit and instead risks huge losses. Banks don't want to lose good customers, and at the same time they don't want to acquire bad customers. In this scenario both the false positives and false negatives become very important to measure. Another example is steroid testing in sport: every player has to go through a steroid test before the game starts. A false positive can ruin the career of a great sportsman, and a false negative can make the game unfair.

Define Unsupervised Machine Learning

In unsupervised learning, there is only input data (X) and the output variable isn't known. The algorithm is left on its own to discover underlying patterns or structures within the data. Two common types of unsupervised learning algorithms: 1. Clustering: used in marketing, where we try to group customers by purchasing behaviour. 2. Association Rule Mining: determines underlying patterns, such as people who buy X also tending to buy Y

What is 'Training set' and 'Test set'?

In machine learning and related areas of information science, the set of data used to discover potentially predictive relationships is known as the 'Training Set'. The training set is the set of examples given to the learner, while the 'Test Set' is the set of examples held back from the learner and used to test the accuracy of the hypotheses it generates. The training set is distinct from the test set.

Option Greek Definition: Vega

Increase in option value per percentage point increase in volatility (0.01∂C/∂σ)

Increasing Sales How? Element

Increase volume Increase amount of each sale Increase Prices Create seasonal balance

Why instance based learning algorithm sometimes referred as Lazy learning algorithm?

Instance-based learning algorithms are also referred to as lazy learning algorithms because they delay the induction or generalization process until classification is performed.

What cross-validation technique would you use on a time series dataset?

Instead of using standard k-fold cross-validation, you have to pay attention to the fact that a time series is not randomly distributed data; it is inherently ordered chronologically. If a pattern emerges in later time periods, for example, your model may still pick up on it even if that effect doesn't hold in earlier years! You'll want to do something like forward chaining, where you model on past data and then evaluate on forward-facing data. fold 1: training [1], test [2]; fold 2: training [1 2], test [3]; fold 3: training [1 2 3], test [4]; fold 4: training [1 2 3 4], test [5]; fold 5: training [1 2 3 4 5], test [6]
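The forward-chaining folds listed in this card can be generated with a few lines of Python. This is a minimal sketch, not a reference implementation: the function name forward_chaining_folds is hypothetical, and periods are 0-indexed internally and printed 1-indexed to match the card.

```python
from typing import Iterator, List, Tuple


def forward_chaining_folds(n_periods: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) period indices; each fold trains only on earlier periods."""
    for test_period in range(1, n_periods):
        train = list(range(test_period))  # every period before the test period
        test = [test_period]              # the single next period
        yield train, test


folds = list(forward_chaining_folds(6))
for i, (train, test) in enumerate(folds, start=1):
    # Shift to 1-based period labels for display
    print(f"fold {i}: training {[p + 1 for p in train]}, test {[p + 1 for p in test]}")
```

The key design point is that no test period ever precedes a training period, so the model is never evaluated on data older than what it was trained on.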

Cash Conversion Cycle

Inventory Conversion Period + Receivables Collection Period - Payables Deferral Period

How do you think Google is training data for self-driving cars?

Machine learning interview questions like this one really test your knowledge of different machine learning methods, and your inventiveness if you don't know the answer. Google is currently using recaptcha to source labelled data on storefronts and traffic signs. They are also building on training data collected by Sebastian Thrun at GoogleX — some of which was obtained by his grad students driving buggies on desert dunes!

Rate Earned on Common Stockholders' Equity = ( ? - ? ) / ?

Net Income - Preferred Dividends Average Common Stockholders' Equity

Earnings per Share (EPS) on Common Stock = ( ? - ? ) / ?

Net Income - Preferred Dividends Shares of Common Stock Outstanding

Rate Earned on Stockholders' Equity = ? / ?

Net Income / Average Total Stockholders' Equity

Number of Times Preferred Dividends Are Earned = ? / ?

Net Income / Preferred Dividends

Net Profit Margin (%)

Net Profit ($) / Revenue

Ratio of Net Sales to Assets = ? / ?

Net Sales / Average Total Assets (excluding long-term investments)

Net profit margin

Net income/sales revenue

Competitive Response Why? element

New Product? Competitor Strategy Changed? Other competitors increased markets share?

Intangible Asset

Not a separate entity in balance sheet Intellectual property: patents, copyright, goodwill

How can outlier values be treated?

Outlier values can be identified by using univariate or any other graphical analysis method. If the number of outlier values is small, they can be assessed individually, but for a large number of outliers the values can be substituted with either the 99th or the 1st percentile values. Not all extreme values are outliers. The most common ways to treat outlier values: 1) change the value to bring it within a range; 2) simply remove the value.

What are the elements of the Company Description (7)?

Overview/Goals Basic product offering Company history Markets to be served Company location Stage of business Financing to date

P(A and B) =

P(A) * P(B), provided A and B are independent (in general, P(A) * P(B|A))

Put Profit

P(S(h),K,T-h)-P(S(0),K,T)e^(rh)

What does P-value signify about the statistical data?

P-value is used to determine the significance of results after a hypothesis test in statistics. P-value helps the readers to draw conclusions and is always between 0 and 1. • P-value > 0.05 denotes weak evidence against the null hypothesis, which means the null hypothesis cannot be rejected. • P-value <= 0.05 denotes strong evidence against the null hypothesis, which means the null hypothesis can be rejected. • P-value = 0.05 is the marginal value, indicating it is possible to go either way

P/E Ratio

P0 / EPS Expected in One Year

|Critical value|<|Test statistic|

Reject null hypothesis. There is sufficient evidence that Ha is true

Replication

Replication ensures that the source data remain intact, they can be in real time or in batches.

Regularization

Smoothing a model to prevent overfitting

Facial recognition software is able to recognize your friends on FB so you can tag them on it

Supervised Learning

How is kNN different from k-means clustering?

Supervised classification algorithm, unsupervised clustering algorithm

What is the difference between supervised and unsupervised machine learning?

Supervised learning requires labeled training data. For example, in order to do classification (a supervised learning task), you'll need to first label the data you'll use to train the model to classify data into your labeled groups. Unsupervised learning, in contrast, does not require labeling data explicitly.

What is the difference between heuristic for rule learning and heuristics for decision trees?

The difference is that the heuristics for decision trees evaluate the average quality of a number of disjointed sets while rule learners only evaluate the quality of the set of instances that is covered with the candidate rule.

List down various approaches for machine learning?

The different approaches in Machine Learning are a) Concept Vs Classification Learning b) Symbolic Vs Statistical Learning c) Inductive Vs Analytical Learning

What are the different methods for Sequential Supervised Learning?

The different methods to solve Sequential Supervised Learning problems are a) Sliding-window methods b) Recurrent sliding windows c) Hidden Markov models d) Maximum entropy Markov models e) Conditional random fields f) Graph transformer networks

Central Limit Theorem

The distribution of the sample average tends to be normal, even when the distribution from which it is taken is non-normal!

What is bias-variance decomposition of classification error in ensemble method?

The expected error of a learning algorithm can be decomposed into bias and variance. A bias term measures how closely the average classifier produced by the learning algorithm matches the target function. The variance term measures how much the learning algorithm's prediction fluctuates for different training sets.

Prices are only stable when 3 conditions are met.....

The growth rate for all competitors is approx. the same The prices are paralleling costs The prices of all competitors are roughly of equal value

what are the different types of data modeling?

The hierarchical model- assume parent child relationship between data. The network model- enhanced version of the first model where the child can have more than one parent. Object-oriented data modeling- organisation of entities as objects. Relational data modeling

What are the components of relational evaluation techniques?

The important components of relational evaluation techniques are a) Data Acquisition b) Ground Truth Acquisition c) Cross Validation Technique d) Query Type e) Scoring Metric f) Significance Test

What is inductive machine learning?

The inductive machine learning involves the process of learning by examples, where a system, from a set of observed instances tries to induce a general rule.

What is P-value and what does it signify about the statistical data?

The p-value is used to determine the significance of a result when we are conducting a Hypothesis Test in Statistics. Generally, when the p-value <= 0.05, it indicates strong evidence AGAINST the null hypothesis (or status quo) which means the null hypothesis can be rejected. (e.g. Pizza Hut claims 30 minute guarantee)

If Sales and Marketshare are increasing but profits are declining.....

Then you need to investigate whether prices are dropping and/or costs are climbing. However, if costs aren't the issue, then investigate the product mix and check to see if the margins have changed

Credit event binary options (CEBO)

These are options that provide a fixed payoff if a particular company suffers a credit event such as bankruptcy, failure to pay interest or principal on debt, or a restructuring of debt.

Unrelated diversification v Related Diversification

Unrelated diversification No exchanges or linkages among divisions Easiest and cheapest strategy to manage Allows corporate managers to evaluate division performance accurately Divisions have considerable autonomy unless Related Diversification Gains from pursuing multibusiness model are derived from the transfer, sharing, and leveraging of R&D knowledge, industry information, customer bases across divisions Company needs to develop corporate culture that stresses cooperation among divisions and the corporate team rather than focusing purely on divisional goals Rewarding divisions more difficult because divisions share activities

SO: Strengths and opportunities

Use a firm's internal strengths to take advantage of external opportunities

What's a null hypothesis?

We want to see if we can reject the status quo as being highly improbable

Entering a New Market 2. Determine Current and Future Market States

What is the size of the current market? What is the growth trend? Where is the industry in its life cycle? (Stage of development: Emerging, Mature, Declining?) Who are the customers and how are they segmented? What role does technology play in the industry and how quickly will it change? How will the competition respond?

Entering a New Market 3. Investigate the market to determine whether it makes good business sense(Porters 5 forces)

Who are our competitors and what size market share do they have? How do their products differ from ours? How will we price our products or services? Are substitutions available? Are there any barriers to entry? (Ex: Capital Reqs, Access to Raw Materials, Access to Distribution Channels, Gov Policy) Are there any barriers to exit? How would we exit if the market sours? What are the risks? (Market regulations or Technology)

New Product Customers Elements

Who? How to reach them? Retention- how to hold them?

Entering a New Market 1. Questions about the Company!

Why does the company want to enter this market? What are the different Revenue Streams and Trends? What is their Product Mix? What are their costs and how have they changed over time? What makes up their customer segmentation? What constitutes Success?(How much market share and what time frame?)

Competitive Response Approach

Why? Strategy

Spark and HDFS

With HDFS, the Spark driver contacts the NameNode about the DataNodes (ideally local) containing the various blocks of a file or directory as well as their locations (represented as InputSplits), and then schedules the work to the Spark workers. Spark's compute nodes / workers should be running on storage nodes.

How do you handle missing or corrupted data in a dataset?

You could find missing/corrupted data in a dataset and either drop those rows or columns, or decide to replace them with another value. In Pandas, there are two very useful methods: isnull() and dropna() that will help you find columns of data with missing or corrupted data and drop those values. If you want to fill the invalid values with a placeholder value (for example, 0), you could use the fillna() method.
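The three pandas methods named in this card can be seen side by side on a toy table. This is a minimal sketch: the DataFrame contents and column names are made up for illustration, and NaN stands in for a missing or corrupted entry.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NaN marks a missing value.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [50000.0, 62000.0, np.nan, 48000.0],
})

missing_per_column = df.isnull().sum()  # number of missing values in each column
dropped = df.dropna()                   # keep only rows with no missing values
filled = df.fillna(0)                   # replace every missing value with the placeholder 0

print(missing_per_column)
print(dropped)
print(filled)
```

Whether to drop or fill depends on how much data is missing and why; a placeholder such as 0 can distort averages, so it is worth checking the effect on downstream statistics.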

Do you have experience with Spark or big data tools for machine learning?

You'll want to get familiar with the meaning of big data for different companies and the different tools they'll want. Spark is the big data tool most in demand now, able to handle immense datasets with speed. Be honest if you don't have experience with the tools demanded, but also take a look at job descriptions and see what tools pop up: you'll want to invest in familiarizing yourself with them.

For K₁>K₂>K₃

[P(S,K₁,T)-P(S,K₂,T)]/[K₁-K₂] ? [P(S,K₂,T)-P(S,K₃,T)]/[K₂-K₃],≥

X follows a normal distribution with mean m and standard deviation s. Which of the following must be true about X?

a. Median of X is m b. Expected value of X is m c. Nearly 95% of values of X lie between m-2s and m+2s d. 50% of value lie above m e. Interquartile range of X is less than 2s

Auditor's Responsibility Paragraph Changes

add in "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the adverse audit opinion"

Auditor's Responsibility Paragraph Changes

add in "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the qualified audit opinion"

dividend history

amount of time a company has paid a dividend dividend aristocrats is 25+ yrs of growth dividend kings is 50+ yrs of growth

An algorithm gets input data of a thousand pictures and groups the pictures into humans, animals, scenery

Unsupervised Learning

Beneish- M score

Indicates whether a company may be manipulating earnings. This version includes 5 variables: M = -6.065 + 0.823(DSRI) + 0.906(GMI) + 0.593(AQI) + 0.717(SGI) + 0.107(DEPI), where DSRI = days sales in receivables index, GMI = gross margin index, AQI = asset quality index, SGI = sales growth index, DEPI = depreciation index
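As a quick arithmetic check, the five-variable formula on this card can be written as a small function. This is a sketch using only the coefficients quoted above; the function name and the baseline example (all indices equal to 1.0, i.e. no change between the two years being compared) are illustrative assumptions.

```python
def beneish_m_score_5(dsri: float, gmi: float, aqi: float, sgi: float, depi: float) -> float:
    """Five-variable Beneish M-score with the coefficients quoted on the card."""
    return (
        -6.065
        + 0.823 * dsri   # days sales in receivables index
        + 0.906 * gmi    # gross margin index
        + 0.593 * aqi    # asset quality index
        + 0.717 * sgi    # sales growth index
        + 0.107 * depi   # depreciation index
    )


# Hypothetical baseline: every index equals 1.0
baseline = beneish_m_score_5(1.0, 1.0, 1.0, 1.0, 1.0)
print(round(baseline, 3))
```

Higher (less negative) scores point toward a greater likelihood of manipulation; the cut-off used in practice depends on the model version and is not given on this card.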

Differentiate between univariate, bivariate and multivariate analysis.

descriptive statistical analysis techniques, pie charts of sales based on territory, difference between 2 variables, scatter plot, analyzing the volume of sale and a spending, study of more than two variables

You have 20 bottles of pills. 19 bottles have 1.0 gram pills, but one has pills of weight 1.1 grams. Given a scale that provides an exact measurement, how would you find the heavy bottle? You can only use the scale once.

Because we can only use the scale once, we know something interesting: we must weigh multiple pills at the same time. In fact, we know we must weigh pills from at least 19 bottles at the same time. Otherwise, if we skipped two or more bottles entirely, how could we distinguish between those missed bottles? Remember that we only have one chance to use the scale. So how can we weigh pills from more than one bottle and discover which bottle has the heavy pills? Let's suppose there were just two bottles, one of which had heavier pills. If we took one pill from each bottle, we would get a weight of 2.1 grams, but we wouldn't know which bottle contributed the extra 0.1 grams. We know we must treat the bottles differently somehow. If we took one pill from Bottle #1 and two pills from Bottle #2, what would the scale show? It depends. If Bottle #1 were the heavy bottle, we would get 3.1 grams. If Bottle #2 were the heavy bottle, we would get 3.2 grams. And that is the trick to this problem. We know the "expected" weight of a bunch of pills. The difference between the expected weight and the actual weight will indicate which bottle contributed the heavier pills, provided we select a different number of pills from each bottle. We can generalize this to the full solution: take one pill from Bottle #1, two pills from Bottle #2, three pills from Bottle #3, and so on. Weigh this mix of pills. If all pills were one gram each, the scale would read 210 grams (1 + 2 + ... + 20 = 20 * 21 / 2 = 210). Any "overage" must come from the extra 0.1 gram pills. This formula will tell you the bottle number: (weight - 210 grams) / 0.1 grams. So, if the set of pills weighed 211.3 grams, then Bottle #13 would have the heavy pills.
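The one-weighing trick can be checked in code. This is a minimal sketch: simulate_weighing and find_heavy_bottle are hypothetical helper names, and rounding is used because floating-point sums of 1.1 are not exact.

```python
def simulate_weighing(heavy_bottle: int, n_bottles: int = 20,
                      normal: float = 1.0, extra: float = 0.1) -> float:
    """Scale reading when k pills are taken from bottle k (k = 1..n_bottles)."""
    return sum(
        k * (normal + extra if k == heavy_bottle else normal)
        for k in range(1, n_bottles + 1)
    )


def find_heavy_bottle(scale_reading: float, n_bottles: int = 20,
                      normal: float = 1.0, extra: float = 0.1) -> int:
    """Recover the heavy bottle from the overage above the expected weight."""
    expected = normal * n_bottles * (n_bottles + 1) / 2  # 210 grams for 20 bottles
    return round((scale_reading - expected) / extra)


print(find_heavy_bottle(211.3))  # the card's example reading
```

Taking a different number of pills from each bottle is what makes the overage uniquely identify the bottle.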

why do most ads fail?

consumers don't know which company it's for (example- nivea electric shaver)

Inventory Turnover = ? / ?

costs of goods sold / average inventory

Formula: ∆call

e^(-∂T)*N(d1)

in other words

exceptions to the norm. 3. Intelligent control systems

Abnormal non-recurring revenues, expenses, gains, and losses (Possible but not probable)

1. Income from discontinued operations 2. Extraordinary items 3. Cumulative effect of a change in accounting principle

Appropriateness of Financial Statement Presentation or Disclosures

financial statements don't include all required disclosures disclosures aren't presented in accordance with the applicable financial reporting framework financial statements don't provide the disclosures needed to achieve fair presentation information required hasn't been included or disclosed in the financial statements

Cash Return on invested capital (CROIC)

free cash flow divided by invested capital (assets minus cash and non-interest-bearing current liabilities)

Non-sampling errors cannot be fixed by

having a larger sample size

active share

how different a fund or portfolios holdings are from the benchmark greater the difference means a higher active share

Entering a New Market 4. If we decide to enter the market, how do we do it?

Start from scratch and grow organically. Acquire an existing player from within our industry. Form a joint venture / strategic alliance with another player with a similar interest. What can both sides bring to the venture? Do a cost-benefit analysis of each option.

During analysis, how do you treat missing values?

The extent of the missing values is identified after identifying the variables with missing values. If any patterns are identified, the analyst has to concentrate on them, as they could lead to interesting and meaningful business insights. If no patterns are identified, the missing values can be substituted with mean or median values (imputation) or they can simply be ignored. There are various factors to be considered when answering this question: understand the problem statement, understand the data, and then give the answer. One option is assigning a default value, which can be the mean, minimum or maximum value. Getting into the data is important. If it is a categorical variable, the missing value is assigned a default value. If the data follow a normal distribution, use the mean value. Whether missing values should be treated at all is another important point to consider. If 80% of the values for a variable are missing, you can answer that you would drop the variable instead of treating the missing values.

Please tell me how execution starts and ends on an RDD or Spark job

Ans: The execution plan starts with the earliest RDDs (those with no dependencies on other RDDs, or that reference cached data) and ends with the RDD that produces the result of the action that has been called to execute.

inventory turnover ratio

how many times in one year a company's inventory is replaced. sales divided by average inventory in the period

brand equity question

how much more are consumers willing to pay for a branded product versus a non-branded product (good branding enables a price premium over non-branded products)

systems

information, budgeting, planning, innovation, compensation, performance measurement

In experimental design, is it necessary to do randomization? If yes, why?

p-value

is the probability of obtaining the observed sample results, or more extreme results, when the null hypothesis is actually true.

A key limitation of covariance as a descriptive measure is that it

is very sensitive to the units of the variables

momentum

it is a measure of the past performance of a stock. positive momentum has been shown to produce market-beating returns over the next 1 to 12 months

The difference between the rate earned on stockholders' equity and the rate earned on total assets is called ______.

leverage

structure

lines of authority, chains of command, communication channels

Application of Accounting Policies

management hasn't applied accounting policies in accordance with the applicable financial reporting framework management hasn't applied accounting policies consistently error in the application of an accounting policy

How would you hint the minimum number of partitions during a transformation?

Ans: You can request the minimum number of partitions using the second input parameter to many transformations: scala> sc.parallelize(1 to 100, 2).count The preferred way to set the number of partitions for an RDD is to pass it directly as the second input parameter in the call, like rdd = sc.textFile("hdfs://... /file.txt", 400), where 400 is the number of partitions. In this case, the partitioning makes for 400 splits that would be done by Hadoop's TextInputFormat, not Spark, and it would work much faster. It also means that the code spawns 400 concurrent tasks to try to load file.txt directly into 400 partitions.

What are feature vectors?

n-dimensional vector of numerical features that represent some object term occurrences frequencies, pixels of an image etc. Feature space: vector space associated with these vectors

Product segments

nature, commodity vs differentiable good, complementary goods, substitute goods, life cycle

In most cases, net income and income from continuing operations

Are one and the same

return on assets (ROA)

net income divided by total assets

ros

net income/sales

nopat

net operating profit after tax

ROIC

net profit / invested capital

test drive conversion rate

number of purchases / number of test drives aka- conversion rates to sales

operating cash flow & cash flow from operations

on cash flow statement - shows a company's cash flows from normal operations. it is not reduced by depreciation and amortization or other non-cash charges (these are added back to net income)

free cash flow

operating cash flow minus capital expenditures. cash based measure that does not suffer from the issues of accrual based accounting

Profitability analysis focuses primarily on the relationship between _____ and ______.

operating results and the resources available to a business

Expenses

opposite side of revenue- all costs of business (tax, interest, payroll, r & d)

A ratio that measures the "instant" debt‐paying ability of a company is called the _____ ratio or acid‐test ratio.

quick

institutional ownership

percentage of ownership of a stock by large investors such as hedge funds, ETFs, mutual funds, private equity funds, and pension funds

Are you familiar with price optimization, price elasticity, inventory management, competitive intelligence? Give examples.

Price optimization is the use of mathematical tools to determine how customers will respond to different prices for a company's products and services through different channels. Big Data and data mining enable the use of personalization for price optimization. Companies like Amazon can take optimization further and show different prices to different visitors based on their history, although there is a strong debate about whether this is fair. Price elasticity in common usage typically refers to price elasticity of demand, a measure of price sensitivity. It is computed as: Price Elasticity of Demand = % Change in Quantity Demanded / % Change in Price. Similarly, price elasticity of supply is an economics measure that shows how the quantity supplied of a good or service responds to a change in its price. Inventory management is the overseeing and controlling of the ordering, storage and use of components that a company will use in the production of the items it will sell, as well as the overseeing and controlling of quantities of finished products for sale. Wikipedia defines competitive intelligence as the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers making strategic decisions for an organization. Tools like Google Trends, Alexa, and Compete can be used to determine general trends and analyze your competitors on the web.
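The elasticity formula quoted in this card is simple enough to compute directly. This is a minimal sketch using plain percentage changes relative to the starting value; the function names and the price/quantity figures are hypothetical.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


def price_elasticity_of_demand(q_old: float, q_new: float,
                               p_old: float, p_new: float) -> float:
    """% change in quantity demanded divided by % change in price."""
    return pct_change(q_old, q_new) / pct_change(p_old, p_new)


# Hypothetical example: price rises from 10 to 11 (+10%), quantity falls from 100 to 80 (-20%)
elasticity = price_elasticity_of_demand(100, 80, 10, 11)
print(elasticity)
```

An absolute value above 1 (as in this example, roughly -2) means demand is elastic: quantity changes proportionally more than price.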

4 Ps/marketing mix

price, product, promotion, placement

Another profitability measure quoted by the financial press is the _______ ratio on common stock. This ratio measures a company's future earnings prospects.

price‐earnings (P/E)

Sample Report: Qualified Opinion Due to a Material Misstatement of the Financial Statements Issuer (Public Company). Report of Independent Registered Public Accounting Firm. If these lease obligations were capitalized

property would be increased by $______ and $_____ long-term debt,by $_____ and $_____, and retained earnings by $_____ and $______ as of December 31, 2002 and 2001, respectively. Additionally, net income would be increased (decreased) by $____ and $____ and earnings per share would be increased (decreased) by $_____ and $_____, respectively, for years then ended.

10% condition:

sample size, n, must be no more than 10% of the population

Sampling error example

sampling less than the population size

what are the types of data gathering?

sampling-the act of extracting only certain data values from a dataset

Beta

sensitivity to moves in the overall market. farther from 1, above or below, the more sensitive the stock. higher the beta the more risky it is presumed to be

The fixed-rate payer in an interest-rate swap has a position equivalent to a series of: A) long interest-rate puts and short interest-rate calls. B) short interest-rate puts and long interest-rate calls. C) long interest-rate puts and calls.

short interest-rate puts and long interest-rate calls. The fixed-rate payer has profits when short rates rise and losses when short rates fall, equivalent to writing puts and buying calls.

piotroski f- score

simple 9-point scoring system to separate businesses based on success. points are assigned based on 9 criteria, typically different ratio metrics (profitability, leverage, liquidity, sources of funds and operating efficiency)

what is social media?

social media as online channels for communication. eg Snapchat, Instagram etc.

Qualified Opinion Vs. Adverse Opinion = GAAP Problem.

the auditor uses professional judgement to determine whether to issue a qualified opinion or an adverse opinion when audit evidence indicates that there is material misstatement of the financial statements.

how does npv enable the marketer to compare marketing campaigns or initiatives?

the cost of the campaign is subtracted from the present value for each campaign and you compare these npvs

Alternative Hypothesis

the hypothesis that includes all values not covered by the null. the alternative hypothesis is deemed to be true if the null hypothesis is rejected (HA or H1)

Financial MGMT

the management of all processes associated with the efficient acquisition and deployment of financial resources

Decreasing returns to larger sampling

the margin of error decrease is greatest when going from 100 samples to 200 samples

Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Adverse Opinion Paragraph. When the auditor expresses an adverse opinion, the opinion paragraph should state that, in the auditor's opinion,

BECAUSE OF the significance of the matter(s) described in the basis for adverse opinion paragraph, the financial statements DO NOT PRESENT FAIRLY in accordance with the applicable financial reporting framework.

Adverse Opinion Due to Material Misstatement of Financial Statements: Issuer. Adverse Opinion Paragraph: When the auditor expresses an adverse opinion, the opinion paragraph should state that, in the auditor's opinion,

Because of the effects of matters discussed in preceding paragraph(s), the financial statements Do Not Present Fairly, in conformity with accounting principles generally accepted in the United States of America.

For European options, the probability for each ending node is

(nCx)(p*^x)[(1-p*)^(n-x)] for n = number of binomial periods and x = number of up moves
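The ending-node probabilities can be tabulated with Python's math.comb. This is a sketch under the assumption that n is the number of binomial periods, x is the number of up moves, and p* is the risk-neutral up probability; the function name and the 3-period, p* = 0.5 example are illustrative.

```python
from math import comb


def terminal_node_probability(n: int, x: int, p_star: float) -> float:
    """Risk-neutral probability of reaching the ending node with x up moves in n periods."""
    return comb(n, x) * p_star**x * (1 - p_star) ** (n - x)


# Hypothetical 3-period tree with p* = 0.5: one probability per ending node (x = 0..3)
probs = [terminal_node_probability(3, x, 0.5) for x in range(4)]
print(probs)
```

Because the probabilities over all ending nodes sum to 1, a European option value can be obtained by weighting each ending payoff by its probability and discounting once.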

Financial Accounting

the process of designing and operating an information system for collecting, measuring and recording an enterprise's transactions and summarising and communicating the results of these transactions to users to facilitate making financial decisions.

Sample Report: Adverse Opinion Due to a Material Misstatement of the Financial Statements (Nonissuer). Basis for Adverse Opinion: Under accounting principles generally accepted in the United States of America

the subsidiary should have been consolidated because it is controlled by the company.,Had XYZ Company been consolidated, many elements in the accompanying consolidated financial statements would have been materially affected. The effects on the consolidated financial statements of the failure to consolidate have not been determined.

How to do cross-validation right?

The training and validation data sets have to be drawn from the same population. Example, predicting stock prices: if the model is trained on a certain 5-year period, it's unrealistic to treat the subsequent 5-year period as a draw from the same population. Common mistake: every tuning step, for instance choosing the kernel parameters of an SVM, should be cross-validated as well. Bias-variance trade-off for k-fold cross-validation: leave-one-out cross-validation (LOOCV) gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n−1 observations). But we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, hence the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation of these quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation. Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor high variance.

In the new post-apocalyptic world

the world queen is desperately concerned about the birth rate. Therefore, she decrees that all families should ensure that they have one girl or else they face massive fines. If all families abide by this policy (that is, they continue to have children until they have one girl, at which point they immediately stop), what will the gender ratio of the new generation be? (Assume that the odds of someone having a boy or a girl on any given pregnancy are equal.) Solve this out logically and then write a computer simulation of it.,If each family abides by this policy, then each family will have a sequence of zero or more boys followed by a single girl. That is, if "G" indicates a girl and "B" indicates a boy, the sequence of children will look like one of: G
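
The card asks for a simulation, so here is one possible sketch. Logically, every birth is an independent 50/50 event regardless of the stopping rule, so the expected ratio stays at 50% girls; the simulation below should agree with that. The function name `simulate_families` and the choice of 100,000 families are assumptions made for illustration.

```python
import random

def simulate_families(n_families, seed=0):
    """Each family has children until its first girl; return (boys, girls)."""
    rng = random.Random(seed)
    boys = girls = 0
    for _ in range(n_families):
        # Every birth is a boy with probability 0.5; stop at the first girl.
        while rng.random() < 0.5:
            boys += 1
        girls += 1
    return boys, girls

boys, girls = simulate_families(100_000)
girl_share = girls / (boys + girls)   # should be close to 0.5
```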

debt to equity

total debt/total equity

Which all are the

ways to configure Spark Properties and order them least important to the most important.,Ans: There are the following ways to set up properties for Spark and user programs (in the order of importance from the least important to the most important): -conf/spark-defaults.conf - the default --conf - the command line option used by spark-shell and spark-submit -SparkConf

wacc

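weighted average of the after-tax cost of debt (cost of debt less the marginal tax rate effect) and the cost of equity, each weighted by its share of total capital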
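
As a worked illustration, the standard WACC formula is E/V × Re + D/V × Rd × (1 − T). The sketch below applies it; the function name `wacc` and all the input numbers are hypothetical.

```python
def wacc(equity_value, debt_value, cost_of_equity, cost_of_debt, tax_rate):
    """Weighted average cost of capital with an after-tax cost of debt."""
    total = equity_value + debt_value
    equity_part = (equity_value / total) * cost_of_equity
    debt_part = (debt_value / total) * cost_of_debt * (1 - tax_rate)
    return equity_part + debt_part

# Hypothetical firm: 600 equity, 400 debt, 10% cost of equity,
# 5% cost of debt, 25% marginal tax rate
example_wacc = wacc(600, 400, 0.10, 0.05, 0.25)
```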

Gamble: Variance

weighted average squared deviation from expected value

A test has a true positive rate of 100% and false positive rate of 5%. There is a population with a 1/1000 rate of having the condition the test identifies. Considering a positive test

what is the probability of having that condition?,Let's suppose you are being tested for a disease. If you have the illness, the test will always say you have the illness. However, if you don't have the illness, 5% of the time the test will say you have it and 95% of the time the test will correctly say you don't. Thus there is a 5% error rate in case you do not have the illness. Out of 1000 people, the 1 person who has the disease will get a true positive result. Out of the remaining 999 people, 5% will get a false positive result, which is close to 50 people. This means that out of 1000 people, about 51 will test positive for the disease even though only one person has the illness. There is only about a 2% probability (1 in 51) of you having the disease even if your report says that you have it.
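
The same answer follows directly from Bayes' theorem. The snippet below is a minimal sketch of that calculation; the function name `posterior_given_positive` is an assumption, while the three input rates come from the question.

```python
def posterior_given_positive(prevalence, true_positive_rate, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    p_positive = (prevalence * true_positive_rate
                  + (1 - prevalence) * false_positive_rate)
    return prevalence * true_positive_rate / p_positive

# Rates from the card: prevalence 1/1000, TPR 100%, FPR 5%
p = posterior_given_positive(0.001, 1.0, 0.05)   # just under 2%
```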

For a standard normal distribution (µ=0

σ=1), the area under the curve less than 1.5 is 93.32%. What is the approximate percentage of the area under the curve less than -1.5?,6.68%. 1-93.32%=6.68% is the area under the curve greater than 1.5. Since the normal distribution is symmetric, 6.68% is also the area under the curve less than -1.5.
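
The symmetry argument can be checked numerically. This sketch uses the identity Φ(z) = 0.5 × (1 + erf(z/√2)) for the standard normal CDF; the helper name `standard_normal_cdf` is an assumption.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

area_below_plus = standard_normal_cdf(1.5)    # about 0.9332
area_below_minus = standard_normal_cdf(-1.5)  # about 0.0668, by symmetry
```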

Sharpe Ratio Relationship: φcall vs φput

φcall=-φput

What are some feature engineering techniques?

1. TF x IDF 2. ChiSquare 3. Kernel Trick 4. Hashing 5. Binning

Porter's 5 Forces 4. Bargaining Power of the Buyers

Buyers compete with the industry by forcing down prices and bargaining for higher quality or better services. They play competitors against each other, all at the expense of industry profitability

Which scheduler is used by SparkContext by default?

By default, SparkContext uses DAGScheduler, but you can develop your own custom DAGScheduler implementation.

How can you avoid overfitting ?

Overfitting can be avoided by using a lot of data; it tends to happen when you have a small dataset and try to learn from it. If you have a small dataset and are forced to build a model from it, you can use a technique known as cross-validation. In this method the dataset is split into two sections, a training dataset and a testing dataset: the data points in the training dataset are used to build the model, while the testing dataset is used only to test it. In this technique, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown data against which the model is tested. The idea of cross-validation is to define a dataset to "test" the model in the training phase.

Use business situations framework when you hear

Enter a new market / start a new; Introduce a new product; Respond to competitors' behavior; Respond to changes in demand; How to grow

Which technique is used to predict categorical responses?

Classification technique

The auditor's inability to determine the amounts associated with illegal acts

Committed by the client's management could result in a disclaimer.

What is the advantage of companion objects in Scala?

Companion objects are beneficial for encapsulating things and they act as a bridge for writing functional and object oriented programming code. Using companion objects, the Scala programming code can be kept more concise as the static keyword need not be added to each and every attribute. Companion objects provide a clear separation between static and non-static methods in a class because everything that is located inside a companion object is not a part of the class's runtime objects but is available from a static context and vice versa.

Pricing Strategies Approach Pricing Elements

Company Objective Competitive Pricing Cost-based Pricing Price-based Costing

Pricing Strategies 2.Investigate the Product

How does it compare to the competition? Are there substitutes or alternatives? Where is the product in its growth cycle? Is there a supply-and-demand issue at work?

New Product Financing Elements

How funded? Best allocation of funds? Debt Viable?

What is the variance?

A measure of how spread out a set of numbers is. With a small variance, the values are close to the mean.

Things that should be in the back of your mind in every case:

How the internet and technology, the economy, and competition (internal and external, including substitutes) affect the company

Market development

Introducing present products or services into a new geographic area

What is power analysis?

Power analysis is an important part of experimental design because it allows us to find the minimum sample size needed to detect an effect of a given size with a certain level of confidence.

R-Squared mean value?

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. ... 100% indicates that the model explains all the variability of the response data around its mean.

The Value Chain Operations

Raw Materials become product in this phase through the use of capital equipment and labor

Substantial doubt with regard to the entity's ability to continue as a going concern

Should be disclosed in an emphasis-of-matter paragraph appended to an otherwise unmodified opinion.

Differentiation strategies

Should be pursued only after a study of buyers' needs and preferences to determine the feasibility of incorporating one or more differentiating features into a unique product

what is one issue storing data in spreadsheets?

Since protection of the data is limited, users can easily introduce errors in formulas (processing) if they are unfamiliar with how the spreadsheet works.

The ability of a business to pay debts is called ______.

Solvency

Structured Data

Structured data are computer-readable and usable.

What is principal component analysis? Explain the sort of problems you would use PCA for. Also explain its limitations as a method

Statistical method that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables called principal components. Reduce the data from n to k dimensions: find the k vectors onto which to project the data so as to minimize the projection error.
Algorithm: 1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables 2) Compute the covariance matrix Σ 3) Compute the eigenvectors of Σ 4) Choose k principal components so as to retain x% of the variance (typically x=99)
Applications: 1) Compression: reduce the disk/memory needed to store data and speed up the learning algorithm. Warning: the mapping should be defined only on the training set and then applied to the test set 2) Visualization: 2 or 3 principal components, so as to summarize data
Limitations: - PCA is not scale invariant - The directions with largest variance are assumed to be of most interest - Only considers orthogonal transformations (rotations) of the original variables - PCA is only based on the mean vector and covariance matrix. Some distributions (multivariate normal) are characterized by this but some are not - If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances
Explain what a false positive and a false negative are. Why is it important to distinguish these from each other? Provide examples when false positives are more important than false negatives, when false negatives are more important than false positives, and when these two types of errors are equally important.
False positive: improperly reporting the presence of a condition when it is not present in reality. Example: an HIV-positive test result when the patient is actually HIV negative.
False negative: improperly reporting the absence of a condition when in reality it is present. Example: not detecting a disease when the patient has this disease.
When false positives are more important than false negatives: - In a non-contagious disease, where treatment delay doesn't have any long-term consequences but the treatment itself is grueling - HIV test: psychological impact
When false negatives are more important than false positives: - If early treatment is important for good outcomes - In quality control: a defective item passes through the cracks! - Software testing: a test to catch a virus has failed
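
The four PCA algorithm steps listed in this card (standardize, covariance matrix, eigenvectors, choose k) can be sketched with NumPy as follows. This is a minimal illustration, not a production implementation: the function name `pca`, the synthetic data, and the assumption that no column is constant (non-zero standard deviation) are all choices made here.

```python
import numpy as np

def pca(X, variance_to_retain=0.99):
    """Project X onto the fewest principal components that keep the given variance share."""
    # 1) Standardize: PCA is sensitive to the scale of each variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2) Covariance matrix of the standardized data.
    cov = np.cov(Z, rowvar=False)
    # 3) Eigen-decomposition (eigh is meant for symmetric matrices), sorted descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4) Smallest k whose cumulative explained variance reaches the target.
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, variance_to_retain)) + 1, X.shape[1])
    components = eigvecs[:, :k]
    return Z @ components, components, explained[:k]

# Synthetic example: two almost perfectly correlated columns plus one independent column
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + 0.01 * rng.normal(size=200), rng.normal(size=200)])
projected, components, explained = pca(X, 0.99)
```

Following the warning in the card, the mean, standard deviation, and components would be computed on the training set only and then reused unchanged on the test set.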

economies of scale

The increase in efficiency of production as the number of goods being produced increases. Typically, a company that achieves economies of scale lowers the average cost per unit through increased production since fixed costs are shared over an increased number of goods.

module 4

We use regression analysis for two primary purposes: studying the magnitude and structure of the relationship between two variables, and forecasting a variable based on its relationship with another variable. The structure of the single variable linear regression line is ŷ = a + bx. ŷ is the expected value of y, the dependent variable, for a given value of x. x is the independent variable, the variable we are using to help us predict or better understand the dependent variable. a is the y-intercept, the point at which the regression line intersects the vertical axis. This is the value of ŷ when the independent variable, x, is set equal to 0. b is the slope, the average change in the dependent variable y as the independent variable x increases by one. The true relationship between two variables is described by the equation y = α + βx + ε, where ε is the error term (ε = y − ŷ). The idealized equation that describes the true regression line is ŷ = α + βx. We determine a point forecast by entering the desired value of x into the regression equation. We must be extremely cautious about using regression to forecast for values outside of the historically observed range of the independent variable (x-values). Instead of predicting a single point, we can construct a prediction interval, an interval around the point forecast that is likely to contain, for example, the actual selling price of a house of a given size. The width of a prediction interval varies based on the standard deviation of the regression (the standard error of the regression), the desired level of confidence, and the location of the x-value of interest in relation to the historical values of the independent variable. It is important to evaluate several metrics in order to determine whether a single variable linear regression model is a good fit for a data set, rather than looking at single metrics in isolation.
R² measures the percent of total variation in the dependent variable, y, that is explained by the regression line. R² = (Variation explained by the regression line) / (Total variation) = (Regression Sum of Squares) / (Total Sum of Squares), with 0 ≤ R² ≤ 1. For a single variable linear regression, R² is equal to the square of the correlation coefficient. In addition to analyzing R², we must test whether the relationship between the dependent and independent variable is significant and whether the linear model is a good fit for the data. We do this by analyzing the p-value (or confidence interval) associated with the independent variable and the regression's residual plot. The p-value of an independent variable is the result of the hypothesis test that tests whether there is a significant linear relationship
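
As an illustration of the regression line ŷ = a + bx and of R² as regression sum of squares over total sum of squares, here is a small least-squares sketch in plain Python. The function names and the five data points are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y-hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # y-intercept
    return a, b

def r_squared(xs, ys, a, b):
    """Regression sum of squares divided by total sum of squares."""
    y_bar = sum(ys) / len(ys)
    ss_regression = sum((a + b * x - y_bar) ** 2 for x in xs)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return ss_regression / ss_total

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
point_forecast = a + b * 3.5      # forecast inside the observed x range
```

The forecast is taken at x = 3.5, inside the observed range, in line with the caution about extrapolating beyond historical x-values.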

1/2%

#/100 and #/2

50%

#/2

1010

10

Determine the Nature and Scope of Engagement

Auditor may be hired to perform an audit for a single period or multiple periods. May be on complete FS, single FS, or specific element, account or item of FS. Many audit firms are hired to perform tax services in addition to audit services. *Nonissuers* Private have choice of: -Financial Statement audit - fairness of FS -Integrated audit: 1 opinion on fairness of FS, 1 opinion on operating effectiveness of IC over financial reporting *Issuers* Public must perform integrated

Which of the following is the best approximation of the gamma of an option if its delta is equal to 0.6 when the price of the underlying security is 100 and 0.7 when the price of the underlying security is 110? A) 1.00. B) 0.01. C) 0.10.

B) 0.01. The gamma of an option is computed as follows: Gamma = change in delta / change in the price of the underlying = (0.7 - 0.6)/(110 - 100) = 0.01
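
The same finite-difference approximation can be written as a one-line helper. The function name `approx_gamma` is an assumption; the inputs are the numbers from the question.

```python
def approx_gamma(delta_low, delta_high, price_low, price_high):
    """Approximate gamma as change in delta per unit change in the underlying price."""
    return (delta_high - delta_low) / (price_high - price_low)

gamma = approx_gamma(0.6, 0.7, 100, 110)   # about 0.01
```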

Writing a series of interest-rate puts and buying a series of interest-rate calls, all at the same exercise rate, is equivalent to: A) a short position in a series of forward rate agreements. B) being the fixed-rate payer in an interest rate swap. C) being the floating-rate payer in an interest rate swap.

B) being the fixed-rate payer in an interest rate swap. A short position in interest rate puts will have a negative payoff when rates are below the exercise rate; the calls will have positive payoffs when rates exceed the exercise rate. This mirrors the payoffs of the fixed-rate payer, who will receive positive net payments when settlement rates are above the fixed rate.

What's the difference between Type I and Type II error?

Don't think that this is a trick question! Many machine learning interview questions will be an attempt to lob basic questions at you just to make sure you're on top of your game and you've prepared all of your bases. Type I error is a false positive, while Type II error is a false negative. Briefly stated, Type I error means claiming something has happened when it hasn't, while Type II error means that you claim nothing is happening when in fact something is. A clever way to think about this is to think of Type I error as telling a man he is pregnant, while Type II error means you tell a pregnant woman she isn't carrying a baby.

Consistency

Intra-company comparisons

Mergers and Acquisitions Approach

Objectives Price Due Diligence Exit Strategies

Interquartile Range

Q3 - Q1 = IQR
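
A quick sketch of the calculation using the standard library is shown below. Quartile conventions differ between textbooks and tools, so the `method="inclusive"` choice and the sample data here are assumptions for illustration only.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]                     # hypothetical sample
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                       # interquartile range
```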

n-gram

token permutations associated with a keyword

Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Basis for Qualified Opinion Paragraph: This paragraph should be placed immediately before the opinion paragraph and use the heading "Basis for Qualified Opinion." This paragraph should include:

(2). An explanation of how disclosures are misstated if there is a material misstatement related to narrative disclosure.

Appropriateness of Financial Statement Presentation or Disclosure (≠ GAAP) Material misstatements related to the appropriateness of financial statement presentation or the appropriateness or adequacy of disclosures may arise when: (2)

(3). The financial statements do not provide the disclosures needed to achieve fair presentation

f

01100110

Adverse Opinion (Issuer/public company/GAAP-very material problem). Adverse Opinion Due to Material Misstatement of Financial Statements: Issuer. Middle Paragraph(s):

A paragraph should be placed immediately before the opinion paragraph. This paragraph should include: (1). All of the substantive reasons that lead the auditor to conclude that there has been a departure from generally accepted accounting principles.

Qualified Opinion (issuer/public company/GAAP-material problem). Qualified Opinion Due to Material Misstatement of Financial Statements: Issuer. Middle Paragraph(s):

A paragraph should be placed immediately before the opinion paragraph. This paragraph should include: (1). All of the substantive reasons that lead the auditor to conclude that there has been a departure from generally accepted accounting principles.

Alerts that Restrict Use of Auditor's Written Communication

Auditor may be required by GAAS or may decide that it is necessary to include language in auditor's report that restricts the use of the auditor's written communication. In the report, such language is included in an other-matter paragraph. *Use* -Include an alert that restricts its use when subject matter of auditor written communication is based on: measurement or disclosure criteria suitable for limited users who have adequate understanding, measurement or disclosure criteria available to only specific parties, matters identified during audit engagement that are not primary objective of engagement *Content* -Statement that auditor's written communication is intended solely for the information and use of specified parties -Identification of parties for whom it is intended -Statement that auditor written communication is not intended for and should not be used by anyone else

Reasonable Assurance and Inherent Limitations of Audit

Auditor obtains reasonable assurance about whether FS are free from material misstatement whether due to error or fraud -Reasonable assurance is high, but not absolute, level of assurance. To obtain, auditor must: 1. Plan work and properly supervise assistants 2. Determine appropriate materiality levels 3. Identify and assess risks of material misstatement 4. Obtain sufficient appropriate audit evidence Auditor unable to obtain absolute assurance because of inherent limitations: -Nature of Financial Reporting: FS items include subjective decisions or judgment by management: estimates - AR: bad debt, inventory: obsolete, PPE: life and salvage, Intangible: Cash flow, Impairment, warranties, contingency, lawsuit -Nature of Audit Procedures: Management or others may not provide, intentionally or not, the complete information. Fraud may be concealed. Fraud = intentional Error = Unintentional -Timeliness of Financial Reporting and Balance Between Cost and Benefit: Form opinion w/in reasonable period of time and achieve balance between benefit and cost. Impracticable to address all information. Necessary for auditor to: Plan audit so performed effectively, Direct efforts to areas that are expected to contain risks of material misstatement, and Use testing and other means of examining populations for misstatements

What can be done to avoid local optima?

Avoid local optima in a K-means context: repeat K-means and take the solution that has the lowest cost
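
The restart idea can be sketched in plain Python as below: run K-means several times from different random initial centers and keep the run with the lowest cost (sum of squared distances to the nearest center). The function names, the fixed iteration count, and the toy 2-D points are assumptions made for this example.

```python
import random

def sq_dist(p, c):
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

def kmeans(points, k, rng, n_iter=50):
    """One K-means run from random initial centers; returns (centers, cost)."""
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep it in place if the cluster is empty.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost

def kmeans_with_restarts(points, k, n_restarts=10, seed=0):
    """Repeat K-means and keep the lowest-cost solution."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[1])

# Toy data: three small groups of points
offsets = [(-0.5, 0), (0.5, 0), (0, -0.5), (0, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]
best_centers, best_cost = kmeans_with_restarts(points, k=3)
```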

What is one-hot encoding?

Maps a column of categories to a column of sparse binary vectors. Use if you don't want to order categorical variables
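
A minimal sketch of the mapping in plain Python follows. The function name `one_hot_encode` and the colour values are illustrative assumptions, and a dense list of 0/1 values is used here instead of a true sparse vector type.

```python
def one_hot_encode(values):
    """Map each category to a binary vector with a single 1 at its category index."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = [
        [1 if index[value] == i else 0 for i in range(len(categories))]
        for value in values
    ]
    return categories, encoded

categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
```

Because every category gets its own position, no ordering between categories is implied.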

Can you cite some examples where a false positive is more important than a false negative? Define false positive & false negative.

Before we start, let us understand what false positives and false negatives are. False positives are the cases where you wrongly classify a non-event as an event, a.k.a. Type I error. False negatives are the cases where you wrongly classify events as non-events, a.k.a. Type II error. In the medical field, assume you have to give chemotherapy to patients. Your lab tests patients for certain vital information and, based on those results, decides whether to give radiation therapy to a patient. Assume a patient comes to that hospital and tests positive for cancer based on the lab prediction, but he doesn't have cancer. What will happen to him? (Assuming sensitivity is 1.) One more example might come from marketing. Let's say an e-commerce company decides to give a $1000 gift voucher to the customers it expects to purchase at least $5000 worth of items. It sends the voucher directly to 100 customers without any minimum purchase condition because it assumes it will make at least 20% profit on items sold above $5K. Now what if it has sent vouchers to false positive cases?

variable costs

COGS, raw materials, energy inputs, labor, service

Ex: Nestle wants to grow market share in China who has bad water quality and the demand for bottled water is growing

Company Market Growth Strategies Alt Markets

Suppliers

Consolidation threat of integration pull through by customers

Schroder Method

Construct tree using pre-paid forward price (i.e., S-PV(Div)). The stock price at each node is pre-paid forward price + present value of unpaid dividends (only used to determine payoff at a node).

Advantages of Equity Investing

Control over company-Voting Rights. Participation in future profits.

Why is "Naive" Bayes naive?

Despite its practical applications, especially in text mining, Naive Bayes is considered "Naive" because it makes an assumption that is virtually impossible to see in real-life data: the conditional probability is calculated as the pure product of the individual probabilities of components. This implies the absolute independence of features — a condition probably never met in real life. As a Quora commenter put it whimsically, a Naive Bayes classifier that figured out that you liked pickles and ice cream would probably naively recommend you a pickle ice cream. Bayes' Theorem is the basis behind a branch of machine learning that most notably includes the Naive Bayes classifier. That's something important to consider when you're faced with machine learning interview questions.

Audit Issues - Material and Pervasive

Disclaimer of Opinion

Black-Scholes Model Pricing on Futures

Discount at FUTURE expiry instead of option expiry.

Dividend Yield = ? / ?

Dividends per Share of Common Stock / Market Price per Share of Common Stock

When to use ensemble learning?

Ensemble learning is used when you build component classifiers that are more accurate and independent from each other.

Companies use LIFO for:

Tax advantages

What is normalization?

elimination of data anomalies

A business plan document is the culmination of...

the business planning process

c

01100011

Most other distributions

n>30

decision node

square

Materiality

*Audit of Single FS* Auditor should determine materiality for the single FS rather than the complete set of FS *Audit of Specific Element* Auditor should determine materiality separately for each element, rather than aggregate of all elements.

Qualified v Adverse Opinion

*GAAP Problem* -Qualified Opinion: Auditor concludes misstatements are material but not pervasive to FS -Adverse Opinion: Auditor concludes that misstatements are both material and pervasive to the FS Examples: -GAAP consistency change (unjustified): auditor disagrees -Inadequate disclosures -Departure from GAAP (unjustified) -Unreasonable accounting estimate *Nature of Material Misstatements ≠ GAAP* -Appropriateness of Accounting Policies ∙Acc policies are not in accordance w framework ∙FS do not represent underlying transaction/event in a manner to achieve fair presentation ∙Entity has not complied w framework requirements for accounting/disclosing changes in acc policies -Application of Accounting Policies ∙Mgt has not applied acc policies in accordance w framework ∙Mgt has not applied acc policies consistently between periods/similar transactions ∙Error in application of acc policy -Appropriateness of Financial Statement Presentation or Disclosure ∙FS do not include all required disclosures ∙Disclosures are not presented in accordance w framework ∙FS do not provide disclosures needed for fair presentation ∙Info that is required to be presented has not been included or disclosed

Auditor's Report Issuer Adverse Opinion

*Issuer = Public* *GAAP = Large Material Problem* *Report* Introductory Paragraph -Same (FS were audited, FS are responsibility of mgt and auditor is responsible for opinion) Scope Paragraph -Same (Audit in accordance w PCAOB, audit was planned and performed to obtain assurance that FS are free from error, Examined evidence on test basis, assessed acc principles and estimates made by mgt, audit provides reasonable basis for opinion) Middle Paragraph -Immediately before the opinion -List substantive reasons that lead auditor to conclude departure from GAAP -Disclosure of principal effects Adverse Opinion Paragraph -Because of the effects of matter, FS do not present fairly

Auditor's Report Issuer Qualified Opinion

*Issuer = Public* *GAAP = Material Problem* *Report* Introductory Paragraph -Same (FS were audited, FS are responsibility of mgt and auditor is responsible for opinion) Scope Paragraph -Same (Audit in accordance w PCAOB, audit was planned and performed to obtain assurance that FS are free from error, Examined evidence on test basis, assessed acc principles and estimates made by mgt, audit provides reasonable basis for opinion) Middle Paragraph -Immediately before the opinion -List substantive reasons that lead auditor to conclude departure from GAAP -Disclosure of principal effects Qualified Opinion Paragraph -Except for effects of matter... FS are presented fairly

Explanatory Paragraph

*Issuer=Public* -Included in report when required by PCAOB or at auditor's discretion. Does not affect the auditor's opinion. *Requirements* -Does not have a title -Describe matter being emphasized and location of relevant disclosures -Will generally follow the opinion paragraph when added to unqualified report -May be placed before or after opinion paragraph to emphasize a matter Place before opinion when: -FS are prepared in accordance w special purpose framework -Prior year audit opinion is updated

Modified Opinion on Complete Set of FS

*Modified Opinion Relevant to Audit of Specific Element* If modified opinion on complete set of FS is relevant to audit of specific element of FS, auditor should either: -express an adverse opinion on element when modified opinion on complete set of FS is due to MM of FS (GAAP) -express a disclaimer of opinion on element when modified opinion on complete set of FS is due to scope limitation (GAAS) *Piecemeal Opinion* Auditor expresses an adverse or disclaimer of opinion on complete set of FS and an unmodified opinion on specific element in same report. If auditor considers it appropriate to express an unmodified opinion on specific elements, should do so ONLY if: 1. opinion on specific element is not published w and does not accompany the auditor's report on the complete set of FS (which have adverse/disclaimer) 2. specific element does not constitute a major portion of the entity's complete set of FS or element is not based on SHE or NI. Single FS is considered a major portion of complete set of FS; unmodified should not be expressed on single FS if auditor has expressed adverse or disclaimer on complete set *Emphasis-of-matter, Other-Matter, Explanatory Paragraph* If auditor's report on complete set includes emphasis of matter that is relevant to audit of single FS or specific element, auditor should include similar paragraph in auditor's report on single FS or specific element.

Other-Matter Paragraphs

*Nonissuer = Private* Included in report when required by GAAS or at auditor's discretion. Refer to matters other than those presented or disclosed in the FS that are relevant to user's understanding of audit, auditor's responsibilities, or auditor's report *Requirements* -Immediately after the opinion paragraph, after any emphasis-of-matter paragraph. After opinion & emphasis of matter. -"Other-Matter" -Describe matter being emphasized and location of disclosures in FS

Auditor's Report Nonissuer Adverse Opinion

*Nonissuer = Private* *Large Material Misstatement = GAAP* *Report* Introductory Paragraph -Same (Entity being audited, FS were audited, name of FS) Management's Responsibility Paragraph -Same (MR DIM): mgt is responsible for FS, responsibility includes design, implementation, and maintenance of internal control. Auditor's Responsibility Paragraph -Same (express opinion, accordance with auditing standards of US, plan, perform, obtain evidence, assess risk of MM, test IC) Modified to say basis is adverse audit opinion Basis for Adverse Opinion Paragraph -Immediately before opinion paragraph -Description and quantification of effects -Explanation of how disclosures are misstated -Description and inclusion of nature of omitted info and omitted info when practicable (reasonably obtainable from mgt accounts) Adverse Opinion Paragraph -Because of the significance of the matter... FS do not present fairly in accordance with framework

Auditor's Report Nonissuer Qualified Opinion

*Nonissuer = Private* *Material Misstatement = GAAP* *Report* Introductory Paragraph -Same (Entity being audited, FS were audited, name of FS) Management's Responsibility Paragraph -Same (MR DIM): mgt is responsible for FS, responsibility includes design, implementation, and maintenance of internal control. Auditor's Responsibility Paragraph -Same (express opinion, accordance with auditing standards of US, plan, perform, obtain evidence, assess risk of MM, test IC) Modified to say basis is qualified audit opinion Basis for Qualified Opinion Paragraph -Immediately before opinion paragraph -Description and quantification of effects -Explanation of how disclosures are misstated -Description and inclusion of nature of omitted info and omitted info when practicable (reasonably obtainable from mgt accounts) Qualified Opinion Paragraph -Except for the effects of matters described in basis... FS are presented fairly *Make sure omission does not make FS false, fraudulent, deceptive, or misleading, if so WITHDRAW*

Public Company Accounting Oversight Board Auditing Standards

*PCAOB AS* - Audits -Section: PCAOB AS -Standard Setting: Public Company Accounting Oversight Board -Provides generally accepted auditing standards for audits of *issuers*. Provide guidance for other services like review of interim financial information and letters to underwriters *Public* -Audits of annual FS: issuers, Special reports: issuers, Interim FS: issuers

Use of Other-Matter Paragraph

*Required* -Auditor includes alert in report that restricts the use of report -Subsequently discovered facts lead to change in audit opinion (option to put in emphasis-of-matter) -FS of prior period were audited by predecessor auditor and predecessor audit report is not reissued -Current FS are audited and presented in comparative form w FS from prior period or in comparative form w prior FS that were not audited, reviewed, or compiled -Prior to audit report date, auditor identifies a material inconsistency in other information -Auditor chooses to report on supplementary information -Refer to required supplementary information -Restrict the use of auditor's report when special purpose FS are prepared -Report on compliance is included in report on FS *May be Necessary* Professional Judgement -Describe reason why auditor cannot withdraw from engagement when auditor is unable to obtain sufficient evidence. -Law, regulation, or generally accepted practice require auditor to provide further explanation of auditor's responsibilities -Auditor engaged to report on more than 1 set of FS when each set has been prepared in accordance with a different general-purpose framework

Use Emphasis-of-Matter Paragraphs

*Required* -Conclude substantial doubt in ability to continue as going concern -Describe a justified change in acc principle that has material effect on FS -Subsequently discovered facts lead to a change in audit opinion -FS are prepared in accordance w applicable special purpose framework *May be Necessary* Professional Judgement -Uncertainty related to outcome of unusually important litigation or regulatory action -Major catastrophe having significant effect on fin position -Significant related party transactions -Unusually important subsequent events

Use of Explanatory Paragraphs

*Required* -Prior year opinion is updated -FS are prepared in accordance w special purpose framework -Substantial doubt about ability to continue as going concern -Material change between periods in acc principles or method -Material misstatement in previously issued FS has been corrected -Other info in document containing audited FS is materially inconsistent w FS -Selected quarterly fin data required by SEC Regulation S-K has been omitted or not reviewed -Supplementary info has been omitted *May be Necessary* Professional Judgement -Wishes to emphasize a matter regarding FS *Intro paragraph modified when prior FS audited by prior auditor and prior auditors report is not presented*

Principles - Risk Assessment

*S*pecify Objectives Identify and *A*ssess Changes Consider Potential for *F*raud Identify and Analyze *R*isks

Statements on Quality Control Standards

*SQCS* - Guidelines -Section: QC -AICPA -Provides guidance to CPA firms about the quality control system. Consists of policies and procedures designed, implemented, and maintained to ensure that the firm complies with professional standards and appropriate legal and regulatory requirements and that any reports issued are appropriate in circumstances -Applies to: CPA firms providing auditing, attestation, and accounting and review services

Statements on Standards for Attestation Engagements

*SSAE* - Other Engagements -Section: AT-C -Standard Setting: AICPA -Provide guidance for attestation engagements. -*Examination, review, and agreed upon procedures* report on a subject matter, or an assertion about a subject matter, that is the responsibility of another party

Statements on Standards for Accounting and Review Services

*SSARS* - Other Engagements -Section: AR-C -Standard Setting: AICPA Accounting and Review Services Committee -Provide guidance for *unaudited* FS or unaudited financial information of *nonissuers* -Preparation, compilation, and reviews of FS: nonissuers, Preparation or compilation of pro forma fin information: nonissuers

If the auditor is unable to observe physical inventory and is unable to become satisfied through alternative means

That is a scope limitation. A scope limitation results in either a qualified opinion or a disclaimer of opinion.

product (cheng)

- nature of the product (what it does, how it's used, why it's useful) - commodity or differentiable good - identify complementary goods - identify substitutes (indirect competitors? don't buy anything?) -product's life cycle - how is it packaged

Quote-to-Cash Business Process

Presales activity -> Sales order processing -> Inventory sourcing -> Delivery -> Billing -> Payment

Responsibilities of financial managers

-Forecasting revenues and costs -Planning activities -Managing costs -Identifying alternative sources and costs of finance -Managing cash -Negotiations with bankers -Evaluation of investments -Measurement and control of performance

Turning around troubled co - choose strategy

-Learn as much about the business and its operations as possible. -Review services, products, and finances. (Are products out of date? Do we have a high debt load?) -Secure sufficient financing so your plan has a chance. -Review talent and temperament of all employees, and get rid of the deadwood. -Determine short-term and long-term company goals. -Devise a business plan. -Visit clients, suppliers, and distributors, and reassure them. -Prioritize goals and get some small successes under your belt ASAP to build confidence

The process of generating and selecting strategies

-A manageable set of the most attractive alternative strategies must be developed -The advantages, disadvantages, tradeoffs, costs and benefits of these strategies should be determined -Identifying and evaluating alternative strategies should involve many of the managers and employees who earlier assembled the organizational vision and mission statements, performed the external audit, and conducted the internal audit

The politics of strategy choice

-Political maneuvering consumes valuable time, subverts organizational objectives, diverts human energy, and results in the loss of some valuable employees -Political biases and personal preferences get unduly embedded in strategy choice decisions -The hierarchy of command in an organization, combined with the career aspirations of different people and the need to allocate scarce resources, guarantees the formation of coalitions of individuals who strive to take care of themselves first and the organization second, third, or fourth

Stage 1: Input stage

-Summarizes the basic input information needed to formulate strategies -Consists of the EFE matrix, the IFE matrix and the competitive profile matrix CPM

prices are stable when:

-growth rate for all competitors is approx. the same -prices are paralleling costs -prices of all competitors are roughly of equal value

Net Income/ Cash Flow *Now estimate income/cash flow*

-historical levels/trends in ratios -gross m./Op. m/ etc. -separate forecasts for expense items -based on some relationship with sales or state company strategy. -forecast cash flows -assume non-cash WC/sales constant -required increases in EC -CAPEX -CFF - debt, equity

fixed costs

-overhead -machinery -distribution -rent -interest -depreciation

g

01100111

h

01101000

j

01101010

m

01101101

o

01101111

p

01110000

q

01110001

r

01110010

v

01110110

y

01111001

Consider a 9-month forward contract on a 10-year 7% Treasury note just issued at par. The effective annual risk-free rate is 5% over the near term and the first coupon is to be paid in 182 days. The price of the forward is closest to: A) 1,037.27. B) 1,001.84. C) 965.84.

B) 1,001.84. The forward price is calculated as the bond price minus the present value of the coupon, times one plus the risk-free rate for the term of the forward. (1,000 - 35/1.05^(182/365)) × 1.05^(9/12) = $1,001.84
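The arithmetic in this answer can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not part of the card; the inputs are the ones stated in the question (par of 1,000, one semiannual coupon of 35 before delivery).

```python
def forward_price(spot, coupon, days_to_coupon, annual_rate, years_to_delivery):
    """Forward price of a bond that pays one coupon before delivery."""
    # Present value of the coupon, discounted at the effective annual rate.
    pv_coupon = coupon / (1 + annual_rate) ** (days_to_coupon / 365)
    # Grow the coupon-stripped bond price to the delivery date.
    return (spot - pv_coupon) * (1 + annual_rate) ** years_to_delivery


# Inputs from the card: coupon of 35 in 182 days, 5% effective annual rate,
# delivery in 9 months.
price = forward_price(1000.0, 35.0, 182, 0.05, 9 / 12)
print(round(price, 2))  # 1001.84
```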

000110

06

000111

07

0001

1

gross profit margin

1 - (COGS/Rev)

Customer Lifetime

1 / Customer Churn Rate

Net Profit Margin

1 - ((COGS + all other expenses)/Rev)
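A minimal sketch of the three ratio cards above (gross profit margin, customer lifetime, net profit margin). The function names and the sample figures are made up for illustration.

```python
def gross_profit_margin(revenue, cogs):
    # 1 - (COGS / Rev)
    return 1 - cogs / revenue


def net_profit_margin(revenue, cogs, other_expenses):
    # 1 - ((COGS + all other expenses) / Rev)
    return 1 - (cogs + other_expenses) / revenue


def customer_lifetime(churn_rate):
    # 1 / customer churn rate, measured in the same period as the churn rate
    return 1 / churn_rate


# Hypothetical figures: revenue 200, COGS 120, other expenses 50, 25% churn.
gross = gross_profit_margin(200.0, 120.0)
net = net_profit_margin(200.0, 120.0, 50.0)
lifetime = customer_lifetime(0.25)
print(gross, net, lifetime)
```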

Why are business plans necessary? (5)

1. Attract funding 2. Attract key personnel 3. Budgeting 4. Clarify the business model 5. Benchmark mechanism

What are the types of master data tables?

1. Display attributes are attributes that are presented alongside their primary key in analytic reports. 2. Navigational attributes, like display attributes, are displayed alongside their primary key in analytic reports. 3. Time-dependent attributes are attributes such as price that change over time. 4. Time-independent attributes such as product weight do not change over time.

How would you validate a regression model.

1. Eyeball it. If predicted values fall outside the range of the response variable's observed values, that could indicate poor accuracy.

Adverse Opinion Due to Material Misstatement of the Financial Statements - Issuers (Public Company)

1. Intro Paragraph 2. Scope Paragraph 3. Middle Paragraph(s) 4. Adverse Opinion - "because of," "the financial statements do not present fairly"

Qualified Opinion Due to Material Misstatement of the Financial Statements - Issuers (Public Company)

1. Intro Paragraph 2. Scope Paragraph 3. Middle Paragraph(s) 4. Qualified Opinion - "except for," "the financial statements are presented fairly"

Pricing Strategies Steps

1. Investigate the Company 2. Investigate the Product 3. Determine the Pricing Strategy

Grow and Increasing Sales Steps

1. Learn about the company, and its size, resources and products 2. Investigate the Industry and compare company to it

Overall Objective of Audit Engagements

1. Objectives of Financial Statement Audit (1 of 2): issuers, nonissuers, and governmental: 1. Obtain reasonable assurance whether FS are free from material misstatement, whether due to error or fraud, which enables auditor to express opinion 2. Report on FS and communicate as required by GAAS 2. Objectives of Audit of Internal Control Over Financial Reporting (2 of 2): Issuers are required. 1. Express opinion on effectiveness of IC over financial reporting 2. Plan and perform audit to obtain appropriate evidence that is sufficient to obtain reasonable assurance about whether material weakness exists

Steps to develop a SPACE matrix

1. Select a set of variables to define financial position (FP), competitive position (CP), stability position (SP), and industry position (IP) 2. Assign a numerical value ranging from +1 (worst) to +7 (best) to each of the variables that make up the FP and IP dimensions -Assign a numerical value ranging from -1 (best) to -7 (worst) to each of the variables that make up the SP and CP dimensions 3. Compute an average score for FP, CP, IP, and SP 4. Plot the average scores for FP, IP, SP, and CP on the appropriate axis in the SPACE matrix 5. Add the two scores on the x-axis and plot the resultant point on X. Add the two scores on the y-axis and plot the resultant point on Y. Plot the intersection of the new XY point 6. Draw a directional vector from the origin of the SPACE matrix through the new intersection point
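Steps 3 to 5 can be sketched as below. This assumes the usual SPACE layout, with CP and IP on the x-axis and FP and SP on the y-axis (the card does not name the axes), and the variable ratings are made up for illustration.

```python
def average(scores):
    return sum(scores) / len(scores)


def space_point(fp, ip, sp, cp):
    """Return the (x, y) end point of the SPACE directional vector."""
    # Step 3: average score per dimension.
    fp_avg, ip_avg = average(fp), average(ip)
    sp_avg, cp_avg = average(sp), average(cp)
    # Step 5: add the two x-axis scores and the two y-axis scores.
    x = cp_avg + ip_avg  # assumed x-axis: CP (negative) and IP (positive)
    y = fp_avg + sp_avg  # assumed y-axis: FP (positive) and SP (negative)
    return x, y


# Hypothetical ratings: FP and IP use +1..+7, SP and CP use -1..-7.
point = space_point(fp=[4, 5, 3], ip=[6, 4, 5], sp=[-2, -4, -3], cp=[-1, -3, -2])
print(point)  # (3.0, 1.0)
```

The vector in step 6 then runs from the origin (0, 0) to this point.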

Porter's five forces

1. Supplier Power 2. Buyer Power 3. Barriers to Entry 4. Threat of Substitutes 5. Competitive Rivalry

turnarounds

1. gather information -tell me about company -why is it failing? bad products, management, economy? -tell me about industry -are competitors facing same problems? -access to capital? 2. action -learn about business and operations -review services, products, finances: products out of date? high debt load? -secure sufficient financing -review talent and temperament of all employees

M&A

1. goals and objectives -- why? good business sense? better alternatives? good strategic move? 2. how much are they paying? 3. due diligence -- research company and industry. -shape company is in -how secure are markets, customers, suppliers? -how is industry doing overall? -what are margins like -- high-volume low margins or low-volume high margins? -legal reasons to prevent acquisition? 4. exit strategy

Variance

1. How far (as a negative or positive number) is each data point from the average? Variance is the average of those squared distances, which equals (standard deviation) x (standard deviation)
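A small worked example of that relationship, using the population variance (dividing by n). The data values are made up for illustration.

```python
from math import sqrt

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Step 1: signed distance of each point from the average.
deviations = [x - mean for x in data]
# Variance: average of the squared distances.
variance = sum(d ** 2 for d in deviations) / len(data)
std_dev = sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```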

pricing strategies

1. investigate product -what's special or proprietary? -do similar products exist? how are they priced? -where are we in industry's growth cycle? -how big is market? -what were R&D costs? 2. pricing strategies -cost-based pricing: production costs, breakeven point, profit margin -price-based costing: what are customers willing to pay, what's it worth to them compared to other things, supply & demand

what are two examples of how ERP is used in business?

1. Large retailers such as Costco and Sam's Club collect sales data from their customers. They can use that data to decide, for example, whether to cut operating hours. 2. G.B.I wishes to optimize its logistics, such as shipping. Using its ERP system, an analyst can determine which shippers are reliable on time and delivery.

competitor factors

1. product/service offering -value chain 2. advantages & disadvantages in capabilities -marketing -operating efficiencies -talented people 3. key data -market share -total number -fragmentation/concentration

Explain what a local optimum is and why it is important in a specific context such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A solution that is optimal within a neighboring set of candidate solutions, in contrast with a global optimum, which is the optimal solution among all others. K-means clustering context: it's proven that the objective cost function will always decrease until a local optimum is reached, and results will depend on the initial random cluster assignment. Determining if you have a local optimum problem: a tendency toward premature convergence; different initializations induce different optima. Avoiding local optima in a K-means context: repeat K-means and take the solution that has the lowest cost.
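The "repeat K-means and keep the lowest cost" advice can be sketched in plain Python. This is a minimal 1-D version of Lloyd's algorithm written for illustration; the function names, the data, and the number of restarts are assumptions.

```python
import random


def kmeans_1d(points, k, rng, max_iter=100):
    """One run of K-means on 1-D data; returns (centers, cost)."""
    centers = rng.sample(points, k)  # random initialization
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged, possibly to a local optimum
            break
        centers = new_centers
    cost = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, cost


def best_of_restarts(points, k, restarts=20, seed=0):
    """Run K-means several times and keep the lowest-cost solution."""
    rng = random.Random(seed)
    runs = [kmeans_1d(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])


points = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1, 8.8, 9.0, 9.2]
centers, cost = best_of_restarts(points, k=3)
print(sorted(centers), cost)
```

If different restarts report noticeably different costs, that is the "different initialization induces different optima" symptom described above.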

How would you handle an imbalanced dataset?

An imbalanced dataset is when you have, for example, a classification test and 90% of the data is in one class. That leads to problems: an accuracy of 90% can be skewed if you have no predictive power on the other category of data! Here are a few tactics to get over the hump: 1- Collect more data to even the imbalances in the dataset. 2- Resample the dataset to correct for imbalances. 3- Try a different algorithm altogether on your dataset. What's important here is that you have a keen sense for what damage an unbalanced dataset can cause, and how to balance that.
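Tactic 2 (resampling) and the misleading 90% accuracy can be sketched as follows. This uses naive random oversampling of the minority class; the function name and the toy data are assumptions for illustration.

```python
import random
from collections import Counter


def oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class is the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 9 rows of class "a" and 1 row of class "b".
rows = list(range(10))
labels = ["a"] * 9 + ["b"]

# A model that always predicts "a" scores 90% accuracy with no skill on "b".
majority_accuracy = sum(l == "a" for l in labels) / len(labels)

balanced_rows, balanced_labels = oversample(rows, labels)
print(majority_accuracy, Counter(balanced_labels))
```

Oversampling should be applied to the training split only, otherwise duplicated rows leak into the evaluation data.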

What data sources can Spark process?

Ans: -Hadoop File System (HDFS) - Cassandra (NoSQL databases) - HBase (NoSQL database) - S3 (Amazon WebService Storage : AWS Cloud)

What is a RDD Lineage Graph

Ans: An RDD lineage graph (aka RDD operator graph) is a graph of all the parent RDDs of an RDD. It is built as a result of applying transformations to the RDD. An RDD lineage graph is hence a graph of the transformations that need to be executed after an action has been called

How do you define RDD?

Ans: A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. It represents an immutable, partitioned collection of elements that can be operated on in parallel. Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.

What is the purpose of Driver in Spark Architecture?

Ans: A Spark driver is the process that creates and owns an instance of SparkContext. It is your Spark application that launches the main method in which the instance of SparkContext is created. - The driver splits a Spark application into tasks and schedules them to run on executors. - A driver is where the task scheduler lives and spawns tasks across workers. - A driver coordinates workers and overall execution of tasks.

Can you define the purpose of master in Spark architecture?

Ans: A master is a running Spark instance that connects to a cluster manager for resources. The master acquires cluster nodes to run executors.

What is Preferred Locations

Ans: A preferred location (aka locality preference or placement preference) tells Spark where to compute each partition, e.g. the block location of an HDFS file. def getPreferredLocations(split: Partition): Seq[String] specifies placement preferences for a partition in an RDD.

How do you define actions?

Ans: An action is an operation that triggers execution of RDD transformations and returns a value (to a Spark driver - the user program). Simply put, an action evaluates the RDD lineage graph. You can think of actions as a valve: until an action is fired, the data to be processed is not even in the pipes, i.e. transformations. Only actions can materialize the entire processing pipeline with real data.

What is Apache Parquet format?

Ans: Apache Parquet is a columnar storage format

What is a BlockManager?

Ans: Block Manager is a key-value store for blocks that acts as a cache. It runs on every node, i.e. a driver and executors, in a Spark runtime environment. It provides interfaces for putting and retrieving blocks both locally and remotely into various stores, i.e. memory, disk, and offheap.

Block Manager

Ans: Block Manager is a key-value store for blocks that acts as a cache. It runs on every node, i.e. a driver and executors, in a Spark runtime environment. It provides interfaces for putting and retrieving blocks both locally and remotely into various stores, i.e. memory, disk, and offheap. A BlockManager manages the storage for most of the data in Spark, i.e. blocks that represent a cached RDD partition, intermediate shuffle data, and broadcast data.

What is checkpointing?

Ans: Checkpointing is a process of truncating the RDD lineage graph and saving it to a reliable distributed (HDFS) or local file system. Reliable RDD checkpointing saves the actual intermediate RDD data to a reliable distributed file system.

What is DAGScheduler and how does it work?

Ans: DAGScheduler is the scheduling layer of Apache Spark that implements stage-oriented scheduling, i.e. after an RDD action has been called it becomes a job that is then transformed into a set of stages that are submitted as TaskSets for execution.

How do you define SparkContext?

Ans: It's an entry point for a Spark Job. Each Spark application starts by instantiating a Spark context. A Spark application is an instance of SparkContext. Or you can say, a Spark context constitutes a Spark application. SparkContext represents the connection to a Spark execution environment (deployment mode). A Spark context can be used to create RDDs, accumulators and broadcast variables, access Spark services and run jobs.

What does lazily evaluated RDD mean?

Ans: Lazily evaluated means that the data inside an RDD is not available or transformed until an action is executed that triggers the execution.
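The idea can be illustrated with a plain-Python analogy. This is not the Spark API: the built-in lazy `map` stands in for a transformation and `sum` for an action, and `traced_double` is a made-up helper that records when work actually happens.

```python
log = []


def traced_double(x):
    log.append(x)  # record that this element was actually processed
    return x * 2


data = [1, 2, 3]

# "Transformation": builds the pipeline, but nothing is computed yet.
pipeline = map(traced_double, data)
processed_before_action = list(log)

# "Action": pulls the data through the pipeline and returns a value.
result = sum(pipeline)

print(processed_before_action, result, log)  # [] 12 [1, 2, 3]
```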

Why is Spark good at low-latency iterative workloads, e.g. graphs and machine learning?

Ans: Machine learning algorithms, for instance logistic regression, require many iterations before producing an optimal model, and graph algorithms similarly traverse all the nodes and edges. Any algorithm that needs many iterations before producing results can perform better when the intermediate partial results are stored in memory or on very fast solid-state drives.

What are narrow transformations? (Spark)

Ans: Narrow transformations are the result of map, filter and similar operations whose output comes from the data in a single partition only, i.e. they are self-sustained. An output RDD has partitions with records that originate from a single partition in the parent RDD. Only a limited subset of partitions is used to calculate the result. Spark groups narrow transformations into a stage.

Can RDD be shared between SparkContexts?

Ans: No. When an RDD is created, it belongs to and is completely owned by the Spark context it originated from. RDDs can't be shared between SparkContexts.

What is the difference between cache() and persist() method of RDD

Ans: RDDs can be cached (using RDD's cache() operation) or persisted (using RDD's persist(newLevel: StorageLevel) operation). The cache() operation is a synonym of persist() that uses the default storage level MEMORY_ONLY.

What are the possible operations on RDD?

Ans: RDDs support two kinds of operations: - transformations - lazy operations that return another RDD. - actions - operations that trigger computation and return values.

What is Apache Spark Streaming?

Ans: Spark Streaming helps to process live stream data. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.

How many concurrent tasks can Spark run for an RDD partition?

Ans: Spark can run only 1 concurrent task for every partition of an RDD, up to the number of cores in your cluster. So if you have a cluster with 50 cores, you want your RDDs to have at least 50 partitions (and probably 2-3 times that). As far as choosing a "good" number of partitions, you generally want at least as many as the number of executors for parallelism. You can get this computed value by calling sc.defaultParallelism.

How does an RDD help parallel job processing?

Ans: Spark does jobs in parallel, and RDDs are split into partitions to be processed and written in parallel. Inside a partition, data is processed sequentially.

Why are both Spark and Hadoop needed?

Ans: Spark is often called a cluster computing engine or simply an execution engine. Spark uses many concepts from Hadoop MapReduce, and the two work well together. Spark with HDFS and YARN gives better performance and also simplifies work distribution on the cluster: HDFS is the storage engine for huge volumes of data, and Spark is the processing engine (in-memory as well as more efficient data processing). HDFS: used as a storage engine for Spark as well as Hadoop. YARN: a framework to manage the cluster using a pluggable scheduler. Running more than MapReduce: with Spark you can run the MapReduce algorithm as well as other higher-level operators, for instance map(), filter(), reduceByKey(), groupByKey() etc.

How can you define SparkConf?

Ans: Spark properties control most application settings and are configured separately for each application. These properties can be set directly on a SparkConf passed to your SparkContext. SparkConf allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through the set() method. For example, we could initialize an application with two threads by setting the master URL to local[2]. Running with local[2] means two threads, which represents minimal parallelism and can help detect bugs that only exist when we run in a distributed context.

What is Data locality / placement?

Ans: Spark relies on data locality or data placement or proximity to data source, that makes Spark jobs sensitive to where the data is located. It is therefore important to have Spark running on Hadoop YARN cluster if the data comes from HDFS.

What is the advantage of broadcasting values across Spark Cluster?

Ans: Spark transfers the value to Spark executors once, and tasks can share it without incurring repetitive network transmissions when requested multiple times.

Define Spark architecture

Ans: Spark uses a master/worker architecture. There is a driver that talks to a single coordinator called master that manages workers in which executors run. The driver and the executors run in their own Java processes.

Give a few examples of how an RDD can be created using SparkContext.

Ans: SparkContext allows you to create many different RDDs from input sources like: -Scala's collections: e.g. sc.parallelize(0 to 100) -Local or remote filesystems: sc.textFile("README.md") -Any Hadoop InputSource: using sc.newAPIHadoopFile

What is coalesce transformation?

Ans: The coalesce transformation is used to change the number of partitions. It can trigger RDD shuffling depending on the second shuffle boolean input parameter (defaults to false).

What limits the maximum size of a partition?

Ans: The maximum size of a partition is ultimately limited by the available memory of an executor.

How many types of transformations exist? (Spark)

Ans: There are two kinds of transformations: -narrow transformations -wide transformations

What are wide transformations? (Spark)

Ans: Wide transformations are the result of groupByKey and reduceByKey. The data required to compute the records in a single partition may reside in many partitions of the parent RDD. All of the tuples with the same key must end up in the same partition, processed by the same task. To satisfy these operations, Spark must execute an RDD shuffle, which transfers data across the cluster and results in a new stage with a new set of partitions.
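The difference between narrow and wide transformations can be modelled with plain Python lists standing in for partitions. This is a toy model, not the Spark API: `shuffle_by_key` and its character-code partitioner are made up for illustration.

```python
# Three "partitions" of (key, value) records.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]

# Narrow (map-like): each output partition is computed from exactly one
# parent partition, so no data moves between partitions.
narrow = [[(key, value * 10) for key, value in part] for part in partitions]


def shuffle_by_key(parts, num_partitions):
    """Wide (groupByKey-like): records with the same key are moved together."""
    out = [dict() for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            target = ord(key[0]) % num_partitions  # toy partitioner
            out[target].setdefault(key, []).append(value)
    return out


wide = shuffle_by_key(partitions, 2)
print(narrow)
print(wide)  # [{'b': [2, 5]}, {'a': [1, 3], 'c': [4]}]
```

Note how the values for key "a" start in two different parent partitions and end up in one output partition, which is the data movement a shuffle performs.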

What are the workers?

Ans: Workers or slaves are running Spark instances where executors live to execute tasks. They are the compute nodes in Spark. A worker receives serialized/marshalled tasks that it runs in a thread pool.

Is it possible to have multiple SparkContext in single JVM?

Ans: Yes, if spark.driver.allowMultipleContexts is set to true (default: false). If true, Spark logs warnings instead of throwing exceptions when multiple SparkContexts are active, i.e. multiple SparkContexts are running in this JVM.

Can we broadcast an RDD?

Ans: Yes, but you should not broadcast an RDD to use in tasks, and Spark will warn you if you do. It will not stop you, though.

What is master URL in local mode?

Ans: You can run Spark in local mode using local, local[n] or the most general local[*]. The URL says how many threads can be used in total: - local uses 1 thread only. - local[n] uses n threads. - local[*] uses as many threads as the number of processors available to the Java virtual machine (it uses Runtime.getRuntime.availableProcessors() to know the number).

How can you stop SparkContext and what is the impact if stopped?

Ans: You can stop a Spark context using SparkContext.stop() method. Stopping a Spark context stops the Spark Runtime Environment and effectively shuts down the entire Spark application

What is an Asset?

Anything worth money A resource owned by an entity as a result of past events and from which future economic benefits are expected to flow to the entity -Probable future benefit -Arising from some past transaction -Control the resource -Capable of measurement (money)

Grow and Increasing Sales 2. Investigate the Industry and compare company to it

Are the client's prices in line with the competition?

Referring to put-call parity, which one of the following alternatives would allow you to create a synthetic European call option? A) Sell the stock, buy a European put option on the same stock with the same exercise price and the same maturity, and invest an amount equal to the present value of the exercise price in a pure-discount riskless bond. B) Buy the stock, buy a European put option on the same stock with the same exercise price and the same maturity, and short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond. C) Buy the stock, sell a European put option on the same stock with the same exercise price and the same maturity, and short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond.

B) Buy the stock, buy a European put option on the same stock with the same exercise price and the same maturity, and short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond. According to put-call parity we can write a European call as: C0 = P0 + S0 - X/(1+Rf)^T. We can then read off the right-hand side of the equation to create a synthetic position in the call. We would need to buy the European put, buy the stock, and short or issue a riskless pure-discount bond equal in value to the present value of the exercise price.
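A short sketch of why the synthetic position works: at expiration, long put plus long stock minus the bond's face value X pays exactly what the call pays. The function names and the sample prices are assumptions for illustration.

```python
def synthetic_call_cost(put, stock, strike, risk_free, years):
    # C0 = P0 + S0 - X / (1 + Rf)^T
    return put + stock - strike / (1 + risk_free) ** years


def call_payoff(stock_at_expiry, strike):
    return max(stock_at_expiry - strike, 0.0)


def synthetic_payoff(stock_at_expiry, strike):
    put_payoff = max(strike - stock_at_expiry, 0.0)
    # Long put + long stock - repayment of the shorted bond's face value X.
    return put_payoff + stock_at_expiry - strike


# Hypothetical market: put at 5, stock at 100, strike 100, 5% rate, 1 year.
cost = synthetic_call_cost(5.0, 100.0, 100.0, 0.05, 1.0)
print(round(cost, 2))  # 9.76

for s_t in (60.0, 100.0, 140.0):
    print(s_t, call_payoff(s_t, 100.0), synthetic_payoff(s_t, 100.0))
```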

Which of the following statements regarding an option's price is CORRECT? An option's price is: A) a decreasing function of the underlying asset's volatility when it has a long time remaining until expiration and an increasing function of its volatility if the option is close to expiration. B) an increasing function of the underlying asset's volatility. C) a decreasing function of the underlying asset's volatility.

B) an increasing function of the underlying asset's volatility. Since an option has limited risk but significant upside potential, its value always increases when the volatility of the underlying asset increases.
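This can be checked numerically. The sketch below assumes Black-Scholes pricing of European options on a non-dividend-paying stock (the card itself names no model); the function names and inputs are illustrative.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_prices(spot, strike, rate, years, sigma):
    """Return (call, put) Black-Scholes prices for a non-dividend stock."""
    d1 = (log(spot / strike) + (rate + 0.5 * sigma ** 2) * years) / (sigma * sqrt(years))
    d2 = d1 - sigma * sqrt(years)
    discount = exp(-rate * years)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


vols = [0.10, 0.20, 0.30, 0.40]
prices = [bs_prices(100.0, 100.0, 0.05, 1.0, v) for v in vols]
for v, (call, put) in zip(vols, prices):
    print(f"sigma={v:.2f} call={call:.2f} put={put:.2f}")
```

Both the call and the put price rise with each step up in volatility, consistent with answer B.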

Which of the following statements regarding the goal of a delta-neutral portfolio is most accurate? One example of a delta-neutral portfolio is to combine a: A) long position in a stock with a short position in a call option so that the value of the portfolio changes with changes in the value of the stock. B) long position in a stock with a short position in call options so that the value of the portfolio does not change with changes in the value of the stock. C) long position in a stock with a long position in call options so that the value of the portfolio does not change with changes in the value of the stock.

B) long position in a stock with a short position in call options so that the value of the portfolio does not change with changes in the value of the stock. A delta-neutral portfolio can be created with any of the following combinations: long stock and short calls, long stock and long puts, short stock and long calls, and short stock and short puts.
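The four combinations follow from one rule: choose the option quantity so that its total delta cancels the stock's delta. A minimal sketch, assuming each option covers one share and using made-up deltas (0.5 for the call, -0.4 for the put):

```python
def hedge_quantity(shares, option_delta):
    """Options needed to offset the shares (positive = long, negative = short)."""
    return -shares / option_delta


def portfolio_delta(shares, options, option_delta):
    # Each share has a delta of 1.
    return shares * 1.0 + options * option_delta


call_delta, put_delta = 0.5, -0.4

calls_for_long_stock = hedge_quantity(100, call_delta)   # -200.0: short calls
puts_for_long_stock = hedge_quantity(100, put_delta)     # positive: long puts
calls_for_short_stock = hedge_quantity(-100, call_delta)  # positive: long calls
puts_for_short_stock = hedge_quantity(-100, put_delta)    # negative: short puts

print(portfolio_delta(100, calls_for_long_stock, call_delta))
```

Because option deltas move as the stock price changes, a position like this is only neutral for small price moves and would need rebalancing over time.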

Growth strategies - choosing

Determine fit for each: -Increase sales -Increase distribution channels -Increase product line -Diversify products or services offered -Acquire competitors or a company in a different industry

Name an example where ensemble techniques might be useful.

Ensemble techniques use a combination of learning algorithms to obtain better predictive performance. They typically reduce overfitting in models and make the model more robust (unlikely to be influenced by small changes in the training data). You could list some examples of ensemble methods, from bagging to boosting to a "bucket of models" method and demonstrate how they could increase predictive power.

what are ERP systems?

Enterprise resource planning (ERP) systems are integrated transactional systems that enable all the functional areas of a business to share data

Reporting Period (F/M)

F-usually annually M- As required by mgmt

Materiality of Problem: None or immaterial

Financial Statements Are Materially Misstated (Financial Statement Issues): Unmodified (Unqualified). Inability to Obtain Sufficient Appropriate Audit Evidence (Audit Issues): Unmodified (Unqualified).

Write a function that takes in two sorted lists and outputs a sorted list that is their union.

The first solution that will come to mind is to merge the two lists and sort them afterwards.
Python code-
def return_union(list_a, list_b):
    return sorted(list_a + list_b)
R code-
return_union <- function(list_a, list_b) {
  list_c <- list(c(unlist(list_a), unlist(list_b)))
  return(list(list_c[[1]][order(list_c[[1]])]))
}
Generally, the tricky part of the question is not to use any sorting or ordering function. In that case you will have to write your own logic to answer the question and impress your interviewer.
Python code-
def return_union(list_a, list_b):
    len1 = len(list_a)
    len2 = len(list_b)
    final_sorted_list = []
    j = 0  # position in list_b
    k = 0  # position in list_a
    for i in range(len1 + len2):
        if k == len1:
            # list_a is exhausted: append the rest of list_b
            final_sorted_list.extend(list_b[j:])
            break
        elif j == len2:
            # list_b is exhausted: append the rest of list_a
            final_sorted_list.extend(list_a[k:])
            break
        elif list_a[k] < list_b[j]:
            final_sorted_list.append(list_a[k])
            k += 1
        else:
            final_sorted_list.append(list_b[j])
            j += 1
    return final_sorted_list
A similar function can be written in R by following the same steps.
return_union <- function(list_a, list_b) {
  # Initializing length variables
  len_a <- length(list_a)
  len_b <- length(list_b)
  len <- len_a + len_b
  # Initializing counter variables
  j <- 1
  k <- 1
  # Creating an empty list whose length equals the sum of both lists
  list_c <- vector("list", len)
  # Here goes our for loop
  for (i in 1:len) {
    if (j > len_a) {
      list_c[i:len] <- list_b[k:len_b]
      break
    } else if (k > len_b) {
      list_c[i:len] <- list_a[j:len_a]
      break
    } else if (list_a[[j]] <= list_b[[k]]) {
      list_c[[i]] <- list_a[[j]]
      j <- j + 1
    } else if (list_a[[j]] > list_b[[k]]) {
      list_c[[i]] <- list_b[[k]]
      k <- k + 1
    }
  }
  return(list(unlist(list_c)))
}

antireflexive (irreflexive)

For every x ∈ A, x(not R) x | every guy in the set is NOT related to itself

Integration strategies:

Forward, backward, and horizontal integration

Large enough sample condition

If the population is unimodal and symmetric, even a fairly small sample is okay. For highly skewed distributions you may need several hundred observations before the sampling distribution of the mean is approximately normal

Explain what a confidence interval means

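If you reject something with 95% confidence, then in the case where there is no true effect, a result like ours will happen in less than 5% of all possible samples. Equivalently, a 95% confidence interval comes from a procedure that captures the true parameter in about 95% of repeated samples.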
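The "repeated samples" reading can be checked by simulation. This sketch assumes normally distributed data with a known standard deviation, so a z-interval applies; the function names and all parameter values are illustrative.

```python
import random
from statistics import NormalDist


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / len(sample) ** 0.5
    sample_mean = sum(sample) / len(sample)
    return sample_mean - half_width, sample_mean + half_width


def coverage(true_mean=10.0, sigma=2.0, n=25, trials=2000, seed=1):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / trials


observed = coverage()
print(observed)  # expected to land close to 0.95
```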

When would you use random forests Vs SVM and why?

In the case of a multi-class classification problem: SVM will require a one-against-all method (memory intensive). If one needs to know the variable importance (random forests can provide it). If one needs to get a model fast (SVM is slow to tune: you need to choose the appropriate kernel and its parameters, for instance sigma and epsilon). In a semi-supervised learning context (random forest and dissimilarity measure): SVM can work only in a supervised learning mode.

What are the advantages of Naive Bayes?

A Naïve Bayes classifier will converge quicker than discriminative models like logistic regression, so you need less training data. The main disadvantage is that it can't learn interactions between features.

Define Supervised Learning:

In Supervised learning, the algorithm learns from the training data so that the knowledge can be applied to predict outcomes of the test data. It can be further grouped into regression and classification problems. A. Classification: output variable is a category, such as "red" or "blue" or "disease" and "no disease". B. Regression: A regression problem is when the output variable is a numerical value, such as "dollars" or "weight".

What is 'Overfitting' in Machine learning?

In machine learning, 'overfitting' occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting is normally observed when a model is excessively complex, i.e. when it has too many parameters relative to the amount of training data. A model that has been overfit exhibits poor predictive performance.

What is the Central Limit Theorem and why is it important?

In probability theory, the central limit theorem (CLT) establishes that, in most situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed.

Cash Receipts - Segregation of the Functions

Incoming mail must be opened by a person who does not have access to the AR ledger. Three copies must be distributed to the following:
1. Cashiers - Receive the actual receipts and prepare the bank deposit.
2. AR Department - Enters receipts into the AR subsidiary records - Matches the details from the bank deposit ticket with the details from the remittance advices
3. Accounting Department - Enters receipts into the AR control account

Reasons to go global

Increase demand Decrease cost Circumvent or follow competition Leave saturated home market Create economies of scale Build power as a buyer Hurry down the experience curve Transfer DCs where competitors don't have them Achieve location economies Leverage skills of global organization

Option Greek Definition: Theta

Increase in option value per decrease in time to expiry (-1/365∂C/∂t)

Option Greek Definition: Psi

Increase in option value per percentage point increase in the dividend yield (0.01∂C/∂δ)

Option Greek Definition: Rho

Increase in option value per percentage point increase in the risk-free rate (0.01∂C/∂r)

What is income?

Increases in economic benefits during the accounting period in the form of inflows or enhancements of assets or decreases of liabilities that result in increases in equity, other than those relating to contributions from equity participants. Examples: Sales, Interest Income, Dividend Income, Other Income

What would increase the width of the confidence interval?

Increasing the confidence level, Decreasing the sample size
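
A minimal sketch of why both changes widen the interval, assuming a z-based interval for a mean with known standard deviation (the function name `ci_half_width` and the numbers are illustrative, not from the card):

```python
import math
from statistics import NormalDist

def ci_half_width(sigma, n, confidence):
    """Half-width z * sigma / sqrt(n) of a z-based interval for a mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / math.sqrt(n)

base = ci_half_width(sigma=10.0, n=100, confidence=0.95)
higher_confidence = ci_half_width(sigma=10.0, n=100, confidence=0.99)
smaller_sample = ci_half_width(sigma=10.0, n=25, confidence=0.95)

print(base, higher_confidence, smaller_sample)
```

Raising the confidence level increases z, and shrinking n reduces the denominator, so both produce a wider interval than the baseline.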

What is latent semantic indexing?

Indexing and retrieval method that uses singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. Based on the principle that words that are used in the same contexts tend to have similar meanings. "Latent": semantic associations between words are present not explicitly but only latently. For example: two synonyms may never occur in the same passage but should nonetheless have highly associated representations.

What is latent semantic indexing? What is it used for? What are the specific limitations of the method?

Indexing and retrieval method that uses singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. Based on the principle that words that are used in the same contexts tend to have similar meanings. "Latent": semantic associations between words are present not explicitly but only latently. For example: two synonyms may never occur in the same passage but should nonetheless have highly associated representations.
Used for:
-Learning correct word meanings
-Subject matter comprehension
-Information retrieval
-Sentiment analysis (social network analysis)

Informational Systems

Informational systems are used to provide a place for data to be stored and prepared for analytical purposes

21 Ways to Cut Costs Production

Invest in Technology Consolidate production space to gain scale and increase accountability Create flexible production lines Reduce Inventory(JIT) Outsource Renegotiate with Suppliers Consolidate Suppliers Import Parts

What is the Hypergeometric Distribution?

It is a discrete probability distribution that describes outcomes when sampling from a population without replacement. Suppose we are sampling without replacement from a batch of items containing a variable number of defectives. We are essentially assuming that we know the probability p that a given item is defective but not the actual number of defective items contained in the batch. The number of defective items in the batch is a random variable in this case.
Examples:
• Wallet with 3 $100 bills and 5 $1 bills: probability of getting 2 $100 bills
• Checking defective products in a batch of manufactured goods
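
The wallet example can be worked through with the standard hypergeometric probability mass function. This is a sketch under one assumption the card does not state: that exactly 2 bills are drawn. The function name `hypergeom_pmf` is illustrative.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (comb(successes, k)
            * comb(population - successes, draws - k)
            / comb(population, draws))

# Wallet: 8 bills in total, 3 of them $100 bills; assume 2 bills are drawn.
p_two_hundreds = hypergeom_pmf(k=2, population=8, successes=3, draws=2)
print(p_two_hundreds)  # 3/28, roughly 0.107
```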

What is the Binomial Distribution?

It is a discrete probability distribution that describes the outcome of n independent trials in an experiment. In every trial there can only be 2 outcomes (success or failure). The binomial distribution describes the behavior of a count variable X if the following conditions apply (p is the probability of observing a success):
1: The experiment consists of n identical trials
2: Each event/observation is independent
3: Each observation represents one of two outcomes ("success" or "failure")
4: The probability of "success" p is the same for each trial
Mean = np | variance = np(1-p)
Example: Market research experiment on whether people prefer Coke or Pepsi
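
As a quick check of the mean and variance formulas, the sketch below builds the probability mass function directly and compares the moments with np and np(1-p). The values n = 10 and p = 0.3 are arbitrary illustrative choices.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial count of successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Moments computed from the distribution itself.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                 # both close to 3.0
print(variance, n * p * (1 - p))   # both close to 2.1
```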

Is it possible to perform logistic regression with Microsoft Excel?

It is possible to perform logistic regression with Microsoft Excel. There are two ways to do it: a) Use one of the add-ins offered by many websites. b) Use the fundamentals of logistic regression and Excel's computational power to build the model yourself. When this question is asked in an interview, the interviewer is not looking for the name of an add-in but for a method that uses base Excel functionality. Let's use sample data to learn about logistic regression using Excel. (The example assumes that you are familiar with the basic concepts of logistic regression.)
[Figure: Sample Data for Logistic Regression Demo using Excel]
The data consist of three variables, where X1 and X2 are independent variables and Y is a class variable. We have kept only 2 categories for the purpose of a binary logistic regression classifier. Next we have to create a logit function using the independent variables, i.e. Logit = L = β0 + β1*X1 + β2*X2
[Figure: Logit Function Applied]
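
The card stops at the logit. As a hedged sketch of that step and the usual next one (turning L into a probability with the logistic function, which is standard but not shown in the card), the same arithmetic looks like this outside Excel. The coefficient values are hypothetical placeholders, not fitted values.

```python
import math

def logit_score(x1, x2, b0, b1, b2):
    """L = b0 + b1*X1 + b2*X2, as in the card."""
    return b0 + b1 * x1 + b2 * x2

def probability(logit):
    """Logistic function: maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients chosen only for illustration.
L = logit_score(x1=1.0, x2=2.0, b0=0.5, b1=-1.0, b2=0.25)
print(L, probability(L))
```

In a spreadsheet, each of these would be one formula column per row of data.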

HTML - Hypertext Markup Language

It uses tags to mark how content is structured within a web page so that a web browser can process the tags and display the intended content.

C(S,K,T) Payoff

max(0,S(T)-K)

What is cross-validation? How to do it right?

It's a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. a validation data set) in order to limit problems like overfitting, and to get an insight into how the model will generalize to an independent data set.
Examples: leave-one-out cross-validation, k-fold cross-validation.
How to do it right?
-The training and validation data sets have to be drawn from the same population. Example: when predicting stock prices with a model trained on a certain 5-year period, it's unrealistic to treat the subsequent 5-year period as a draw from the same population.
-Common mistake: every tuning step, for instance choosing the kernel parameters of an SVM, should be cross-validated as well.
Bias-variance trade-off for k-fold cross-validation:
-Leave-one-out cross-validation (LOOCV) gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n−1 observations).
-But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, hence the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation between these quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.
-Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
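
A minimal sketch of how k-fold splitting partitions the data, written without any library. The helper name `k_fold_indices` is illustrative, and the sketch assumes the rows are already in random order (it does not shuffle).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Setting k equal to the number of samples gives leave-one-out cross-validation as a special case.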

what is an example of image recognition?

Jetpac is an application that uses public Instagram data to create "Jetpac City Guides".

What are the last machine learning papers you've read?

Keeping up with the latest scientific literature on machine learning is a must if you want to demonstrate interest in a machine learning position. This overview of deep learning in Nature by the scions of deep learning themselves (from Hinton to Bengio to LeCun) can be a good reference paper and an overview of what's happening in deep learning — and the kind of paper you might want to cite.

Explain the difference between L1 and L2 regularization.

L2 regularization tends to spread error among all the terms, while L1 is more binary/sparse, with many variables either being assigned a 1 or 0 in weighting. L1 corresponds to setting a Laplacian prior on the terms, while L2 corresponds to a Gaussian prior.

21 Ways to Cut Costs 3 Categories

Labor Production Finance

What is Working Capital?

Liquid assets held to meet day to day running costs of the company. (Short term finance)

The ability to convert assets into cash is called ______.

Liquidity

Dogs IV

Low market share, low growth rate -Compete in slow or no market growth industry, consider liquidation

How are kernel methods different?

In machine learning, kernel methods are a class of algorithms for pattern analysis, whose best known member is the support vector machine (SVM). The general task of pattern analysis is to find and study general types of relations (for example clusters, rankings, principal components, correlations, classifications) in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over pairs of data points in raw representation. In kernel methods we map our data into higher dimensions and then classify it. Researchers use kernel methods for problems that are not linearly separable. The most interesting point is that if you can find a proper kernel function for your data set, you can classify it very accurately. The most difficult part is finding a proper kernel function for your problem; some researchers use genetic algorithms (GA) to search for one.
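
To illustrate the contrast between an explicit feature map and a kernel, the sketch below uses a degree-2 polynomial kernel on 2-D points, a textbook case chosen for illustration rather than taken from the card. The kernel value equals the dot product in the mapped space without ever constructing that space.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def feature_map(x):
    """Explicit map whose dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
implicit = poly_kernel(x, y)                      # computed in the raw 2-D space
explicit = dot(feature_map(x), feature_map(y))    # computed in the mapped 3-D space
print(implicit, explicit)
```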

If a Product is in its Mature Stage focus on.....

Manufacturing Costs Competition

Why study data analytics?

Many business professionals are trying to understand the data to make better decisions. Employers are pushing educators to better train students on the fundamentals of analytics.

Entering a New Market Approaches

Market Entry

Common metrics in regression:

Mean Squared Error vs Mean Absolute Error
RMSE gives a relatively high weight to large errors, so it is most useful when large errors are particularly undesirable. The MAE is a linear score: all the individual differences are weighted equally in the average. MAE is more robust to outliers than MSE.
$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$
$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i-\hat{y}_i|$
Root Mean Squared Logarithmic Error
RMSLE penalizes an under-predicted estimate more than an over-predicted estimate (unlike RMSE, which penalizes both equally).
$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\log(p_i+1)-\log(a_i+1))^2}$
where $p_i$ is the ith prediction, $a_i$ the ith actual response, and $\log(b)$ the natural logarithm of $b$.
Weighted Mean Absolute Error
The weighted average of absolute errors. MAE and RMSE assume that each prediction provides equally precise information about the error variation, i.e. that the standard deviation of the error term is constant over all the predictions. Example: recommender systems (differences between past and recent products).
$\mathrm{WMAE} = \frac{1}{\sum_{i} w_i}\sum_{i=1}^{n} w_i|y_i-\hat{y}_i|$
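
A small sketch of these four metrics in plain Python. The function names and sample numbers are illustrative; the RMSLE comparison at the end shows the asymmetry between under- and over-prediction.

```python
import math

def rmse(y_true, y_pred):
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(y - f) for y, f in zip(y_true, y_pred)) / len(y_true)

def rmsle(actual, predicted):
    # log1p(v) is log(v + 1), matching the formula's log(p_i + 1).
    return math.sqrt(sum((math.log1p(p) - math.log1p(a)) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

def wmae(y_true, y_pred, weights):
    total = sum(w * abs(y - f) for y, f, w in zip(y_true, y_pred, weights))
    return total / sum(weights)

y_true, y_pred = [3, 5, 2, 7], [2, 5, 4, 7]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
print(wmae(y_true, y_pred, [2, 1, 1, 1]))

# Same absolute miss of 50, but under-prediction costs more under RMSLE.
print(rmsle([100], [50]), rmsle([100], [150]))
```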

How do data management procedures like missing data handling make selection bias worse?

Missing value treatment is one of the primary tasks a data scientist is supposed to do before starting data analysis. There are multiple methods for missing value treatment. If not done properly, it could potentially result in selection bias. Let's see a few missing value treatment examples and their impact on selection:
Complete Case Treatment: Complete case treatment is when you remove an entire row of data even if only one value is missing. You could introduce selection bias if your values are not missing at random and follow some pattern. Assume you are conducting a survey and a few people didn't specify their gender. Would you remove all those people? Couldn't that tell a different story?
Available Case Analysis: Say you are trying to calculate a correlation matrix for the data, so you remove missing values only from the variables needed for each particular correlation coefficient. In this case your values will not be fully comparable, because each coefficient is computed from a different subset of the data.
Mean Substitution: In this method missing values are replaced with the mean of the other available values. This might bias your distribution, e.g. the standard deviation, correlation and regression estimates, which all depend on how the values spread around the mean.
Hence, various data management procedures might introduce selection bias in your data if not chosen correctly.
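
The effect of mean substitution on spread can be seen with a few made-up numbers. In this sketch the observed values and the count of missing entries are hypothetical: the mean is preserved, but the standard deviation shrinks.

```python
from statistics import mean, pstdev

observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical non-missing values
n_missing = 4                      # hypothetical number of missing entries

# Mean substitution: every missing entry is replaced by the observed mean.
imputed = observed + [mean(observed)] * n_missing

print(mean(observed), mean(imputed))      # the mean is unchanged
print(pstdev(observed), pstdev(imputed))  # the spread is understated
```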

Modification to Auditor's Opinion

Modified when:
1. Conclude FS as a whole are materially misstated (FS issue)
2. Unable to obtain sufficient appropriate audit evidence to conclude FS are free from material misstatement (audit issue)
-*Qualified Opinion*: States that, except for the matter, FS present fairly in all material respects the financial position, results of operations, and CF. (GAAP or GAAS)
-*Adverse Opinion*: FS do not present fairly the fin position. (GAAP)
-*Disclaimer of Opinion*: Auditor does not express an opinion on FS (GAAS)
*None or Immaterial*
-FS Materially Misstated (GAAP): Unmodified (unqualified)
-Inability to Obtain Sufficient Appropriate Audit Evidence (GAAS): Unmodified (unqualified)
*Material but not pervasive*
-FS Materially Misstated (GAAP): Qualified opinion
-Inability to Obtain Sufficient Appropriate Audit Evidence (GAAS): Qualified opinion
*Material and pervasive*
-FS Materially Misstated (GAAP): Adverse opinion
-Inability to Obtain Sufficient Appropriate Audit Evidence (GAAS): Disclaimer of opinion

When a material GAAP problem is discovered how is the Auditor's Responsibility Paragraph modified (Non-issuer)?

Modify the paragraph to state: "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the *qualified* audit opinion."

Adverse Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Auditor's Responsibility Paragraph.

Modify the paragraph to state: "Auditor believes that the audit evidence obtained is sufficient and appropriate to provide a basis for the ADVERSE AUDIT OPINION. "

How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything?

It is often observed that in the pursuit of rapid innovation (aka "quick fame"), the principles of scientific methodology are violated, leading to misleading innovations, i.e. appealing insights that are confirmed without rigorous validation. One such scenario: given the task of improving an algorithm to yield better results, you might come up with several ideas with potential for improvement. An obvious human urge is to announce these ideas ASAP and ask for their implementation. When asked for supporting data, often limited results are shared, which are very likely to be affected by selection bias (known or unknown) or by a misleading global minimum (due to a lack of appropriate variety in the test data). Data scientists do not let their emotions overrun their logical reasoning. While the exact approach to proving that an improvement you've brought to an algorithm is really an improvement over not doing anything depends on the actual case at hand, there are a few common guidelines:
-Ensure that there is no selection bias in the test data used for performance comparison
-Ensure that the test data has sufficient variety to be representative of real-life data (helps avoid overfitting)
-Ensure that "controlled experiment" principles are followed, i.e. while comparing performance, the test environment (hardware, etc.) must be exactly the same when running the original algorithm and the new algorithm
-Ensure that the results are repeatable with near-similar results
-Examine whether the results reflect local maxima/minima or global maxima/minima
One common way to follow the above guidelines is A/B testing, where both versions of the algorithm are kept running in similar environments for a considerably long time and real-life input data is randomly split between the two. This approach is particularly common in Web Analytics.

Forming an Opinion on FS

Opinion on whether FS are presented fairly in all material respect. -To form opinion, auditor take into account: 1. Sufficient appropriate audit evidence was obtained 2. FS are prepared in accordance to financial reporting framework -FS are complete set of general-purpose FS, including notes. (US GAAP: BS, statement of income, changes in equity, CF statement, related notes) When forming opinion: -FS adequately disclose significant accounting policies -Accounting policies are consistent with framework -Accounting estimates made by mgt reasonable -Info presented in FS is relevant, reliable, comparable, and understandable -FS provide adequate disclosures -Terminology is appropriate -Overall structure is fairly presented -FS represent the underlying transactions that achieves fair presentation *Departure from GAAP is permissible if FS would be otherwise misleading (unmodified/unqualified opinion) -Use generally acceptable auditing standards (GAAS) for guidelines to perform the audit -Refer to financial reporting framework (GAAP) to evaluate whether transactions are recorded and reported fairly in FS

Python or R - Which one would you prefer for text analytics?

Python, because it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools.

What is Parquet?

Parquet is a columnar file format for saving and retrieving tabular data.

Shares

Partial ownership of the company Ordinary Shares, Preference Shares Initial Public Offering, Seasoned Offering, Rights Issue

What is the difference between concurrency and parallelism?

People often confuse the terms concurrency and parallelism. When several computations make progress during overlapping time periods, without necessarily running at the same instant, it is referred to as concurrency, whereas when processes are executed simultaneously it is known as parallelism. Parallel collections, Futures and the Async library are examples of ways to achieve parallelism in Scala.

Definition: Elasticity

Percentage change in option value as a function of the percentage change in the value of the underlying asset.

3 Steps of Strategic Planning

Problem Solution Business Model -how are you planning on making money solving this problem

An auditor may express a disclaimer of opinion when the auditor is unable to obtain sufficient appropriate audit evidence on which to base an opinion. When management refuses to

Produce documentation verifying the ownership of its equipment and production facilities, a client-imposed scope limitation exists, and an expression of disclaimer of opinion may be appropriate.

Calendar Spread Profit

Profit if S(0)=S(T), but loss possible for S(0) substantially different from S(T).

How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression?

Proposed methods for model validation: If the values predicted by the model are far outside of the response variable range, this would immediately indicate poor estimation or model inaccuracy. If the values seem to be reasonable, examine the parameters

value chain

R&D --> sourcing --> inbound logistics --> manufacturing --> distribution --> sales & marketing --> service

Common metrics in classification:

Recall / Sensitivity / True Positive Rate: High when FN is low. Sensitive to unbalanced classes.
$\mathrm{Sensitivity} = \frac{TP}{TP+FN}$
Precision / Positive Predictive Value: High when FP is low. Sensitive to unbalanced classes.
$\mathrm{Precision} = \frac{TP}{TP+FP}$
Specificity / True Negative Rate: High when FP is low. Sensitive to unbalanced classes.
$\mathrm{Specificity} = \frac{TN}{TN+FP}$
Accuracy: High when FP and FN are low. Sensitive to unbalanced classes (see "Accuracy paradox").
$\mathrm{Accuracy} = \frac{TP+TN}{TN+TP+FP+FN}$
ROC / AUC: ROC is a graphical plot that illustrates the performance of a binary classifier (Sensitivity vs 1−Specificity, or Sensitivity vs Specificity). They are not sensitive to unbalanced classes. AUC is the area under the ROC curve. Perfect classifier: AUC = 1, and the curve passes through the point (0,1): 100% sensitivity (no FN) and 100% specificity (no FP).
Logarithmic loss: Punishes deviation from the true value without bound. It's better to be somewhat wrong than emphatically wrong!
$\mathrm{logloss} = -\frac{1}{n}\sum_{i=1}^{n}\left(y_i\log(p_i)+(1-y_i)\log(1-p_i)\right)$
Misclassification Rate:
$\mathrm{Misclassification} = \frac{1}{n}\sum_{i} I(y_i \neq \hat{y}_i)$
F1-Score: Used when the target variable is unbalanced.
$\mathrm{F1Score} = 2\,\frac{\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$
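
To make the "somewhat wrong versus emphatically wrong" point concrete, here is a minimal sketch of logarithmic loss. The probabilities are made up, and the function assumes every predicted probability lies strictly between 0 and 1.

```python
import math

def log_loss(y_true, p_pred):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    n = len(y_true)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / n

# The true label is 1 in both cases.
somewhat_wrong = log_loss([1], [0.4])       # hesitant wrong prediction
emphatically_wrong = log_loss([1], [0.01])  # confident wrong prediction
print(somewhat_wrong, emphatically_wrong)
```

As the predicted probability for the true class approaches 0, the loss grows without bound.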

Define precision and recall

Recall is also known as the true positive rate: the amount of positives your model claims compared to the actual number of positives there are throughout the data. Precision is also known as the positive predictive value, and it is a measure of the amount of accurate positives your model claims compared to the number of positives it actually claims. It can be easier to think of recall and precision in the context of a case where you've predicted that there were 10 apples and 5 oranges in a case of 10 apples. You'd have perfect recall (there are actually 10 apples, and you predicted there would be 10) but 66.7% precision because out of the 15 events you predicted, only 10 (the apples) are correct.
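
The apple example can be checked with the count-based definitions. In this sketch the 10 correctly claimed apples are true positives, the 5 wrongly claimed items are false positives, and no apple is missed; F1 is added for comparison.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

tp, fp, fn = 10, 5, 0   # counts from the apple example
p = precision(tp, fp)
r = recall(tp, fn)
print(p, r, f1_score(p, r))
```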

Explain what regularization is and why it is useful.

Regularization is the process of adding a tuning parameter to a model to induce smoothness in order to prevent overfitting. This is most often done by adding a constant multiple of a norm of the weight vector to the loss function. This norm is often either the L1 norm (Lasso) or the L2 norm (ridge), but can in fact be any norm. The model predictions should then minimize the mean of the regularized loss function calculated on the training set.

Why L1 regularizations causes parameter sparsity whereas L2 regularization does not?

Regularization in statistics or in machine learning is used to include some extra information in order to solve a problem in a better way. L1 and L2 regularization are generally used to add constraints to optimization problems.
[Figure: L1 and L2 regularization]
In the figure, H0 is a hypothesis. If you observe, in L1 there is a high likelihood that the solution hits one of the corners, while in L2 there is not. So in L1 variables are penalized more as compared to L2, which results in sparsity. In other words, errors are squared in L2, so the model sees a higher error and tries to minimize that squared error.
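
A one-dimensional sketch can make the sparsity argument concrete. It assumes a simplified objective not given in the card: fit a single weight w to an unpenalized estimate a by minimizing 0.5*(w - a)^2 plus a penalty. The closed-form solutions below are the standard ones for that simplified problem; the numbers are illustrative.

```python
def l1_solution(a, lam):
    """argmin over w of 0.5*(w - a)**2 + lam*|w| (soft-thresholding)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0   # small estimates are set exactly to zero

def l2_solution(a, lam):
    """argmin over w of 0.5*(w - a)**2 + 0.5*lam*w**2 (proportional shrinkage)."""
    return a / (1.0 + lam)

estimates = [3.0, 0.4, -0.2, -2.0]
lam = 0.5
l1_weights = [l1_solution(a, lam) for a in estimates]
l2_weights = [l2_solution(a, lam) for a in estimates]
print(l1_weights)  # small entries become exactly 0
print(l2_weights)  # every entry shrinks but stays nonzero
```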

Do you have research experience in machine learning?

Related to the last point, most organizations hiring for machine learning positions will look for your formal experience in the field. Research papers, co-authored or supervised by leaders in the field, can make the difference between you being hired and not. Make sure you have a summary of your research experience and papers ready — and an explanation for your background and lack of formal research experience if you don't.

Net Profit

Revenue*Net Margin %

What is root cause analysis?

Root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. A factor is considered a root cause if removal thereof from the problem-fault-sequence prevents the final undesirable event from recurring

How will you assess the statistical significance of an insight whether it is a real insight or just by chance?

The statistical significance of an insight can be assessed using hypothesis testing.
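
One concrete form of hypothesis testing is an exact permutation test, sketched below as an illustration (the card does not name a specific test). It asks how often a random relabelling of the pooled data gives a difference in means at least as large as the observed one. The data are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact one-sided p-value for mean(a) - mean(b) under random relabelling."""
    pooled = group_a + group_b
    observed = mean(group_a) - mean(group_b)
    count = total = 0
    # Enumerate every way of choosing which pooled values are labelled "a".
    for idx in combinations(range(len(pooled)), len(group_a)):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if mean(a) - mean(b) >= observed - 1e-12:
            count += 1
    return count / total

p_value = permutation_p_value([12.1, 11.8, 12.6, 12.9], [10.2, 10.9, 11.1, 10.5])
print(p_value)
```

A small p-value suggests the observed difference is unlikely to be due to chance alone; full enumeration is only practical for small samples.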

What is sampling? How many sampling methods do you know?

Sampling methods can be classified into one of two categories:
-Probability Sampling: Sample has a known probability of being selected
-Non-probability Sampling: Sample does not have a known probability of being selected, as in convenience or voluntary response surveys
Probability Sampling: In probability sampling it is possible to determine both which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling:
-Simple Random Sampling (SRS)
-Stratified Sampling
-Cluster Sampling
-Systematic Sampling
-Multistage Sampling (in which some of the methods above are combined in stages)

What is a Scala Map?

A Scala Map is a collection of key-value pairs in which a value can be retrieved using its key. Values in a Scala Map are not unique but the keys are unique. Scala supports two kinds of maps: mutable and immutable. By default, Scala uses the immutable map; to make use of the mutable map, programmers have to import the scala.collection.mutable.Map class explicitly. When programmers want to use mutable and immutable maps together in the same program, the mutable map can be accessed as mutable.Map and the immutable map can just be accessed as Map.

Which Scala library is used for functional programming?

Scalaz library has purely functional data structures that complement the standard Scala library. It has pre-defined set of foundational type classes like Monad, Functor, etc.

How can you deal with different types of seasonality in time series modelling?

Seasonality in a time series occurs when the series shows a repeated pattern over time. E.g., stationery sales decreasing during the holiday season and air conditioner sales increasing during the summer are examples of seasonality in a time series. Seasonality makes your time series non-stationary because the average value of the variable differs across time periods. Differencing a time series is generally known as the best method of removing seasonality from it. Seasonal differencing can be defined as the numerical difference between a particular value and the value one periodic lag earlier (i.e. a lag of 12, if monthly seasonality is present).
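
A short sketch of seasonal differencing on a made-up series. The series repeats a pattern of period 4 and rises by 1 each cycle; a lag of 4 is used here only to keep the example small (monthly data would use 12).

```python
def seasonal_difference(series, lag):
    """Subtract from each value the value one seasonal period earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series: a repeating seasonal pattern plus a slow upward drift.
pattern = [10, 20, 30, 20]
series = [pattern[t % 4] + t // 4 for t in range(12)]

diffed = seasonal_difference(series, lag=4)
print(series)
print(diffed)  # the repeating pattern is gone; only the per-cycle drift remains
```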

Horizontal integration

Seeking ownership or increased control over competitors -We want to do this when there is a major benefit absorbing the competitor

Principles - Existing Control Activities

Select and Develop *C*ontrol *A*ctivities Select and Develop *T*echnology Controls Deployment of *P*olicies and Procedures

Systematic Sampling

Select one of the first k members randomly, and then every kth member after the selected one • k is the sample interval and equals the ratio N/n
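
The procedure can be sketched as follows, assuming for simplicity that the population size N is an exact multiple of the sample size n. The function name and the seeded generator are illustrative choices.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start among the first k members, then every kth member."""
    k = len(population) // n     # sampling interval N/n
    start = rng.randrange(k)     # random position within the first k members
    return population[start::k][:n]

population = list(range(100))    # N = 100
sample = systematic_sample(population, n=10, rng=random.Random(0))
print(sample)
```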

What is selection bias? why is it important and how can you avoid it?

Selection bias, in general, is a problematic situation in which error is introduced due to a non-random population sample. For example, if a given sample of 100 test cases was made up of a 60/20/15/5 split of 4 classes which actually occurred in relatively equal numbers in the population, then a given model may make the false assumption that probability could be the determining predictive factor. Avoiding non-random samples is the best way to deal with bias however when this is impractical, techniques such as resampling, boosting, and weighting are strategies which can be introduced to help deal with the situation.

Users of Financial Information

Shareholders/Investors Managers/directors Lenders Investment Analysis Government General Public Employees Customers Competitors Suppliers

Compliance with GAAS

Should not represent compliance with GAAS in auditor's report unless auditor has complied with all GAAS relevant to audit. If cannot be achieved, consider whether this prevents auditor from achieving the overall objectives of auditor and thereby requires the auditor to modify the opinion or withdraw from engagement. -GAAS does not override laws or regulations that govern an audit of FS. May conduct in accordance with GAAS and: -auditing standards by PCAOB - public -International standards on auditing - ISAs international -Government auditing standards - GAGAS -Auditing standards of specific jurisdiction

You created a predictive model of a quantitative outcome variable using multiple regressions. What are the steps you would follow to validate the model?

Since the question is about the post-model-building exercise, we will assume that you have already tested for the null hypothesis, multicollinearity and the standard error of the coefficients. Once you have built the model, you should check the following:
· Global F-test to see the significance of the group of independent variables on the dependent variable
· R^2
· Adjusted R^2
· RMSE, MAPE
In addition to the above quantitative metrics you should also check:
· Residual plot
· Assumptions of linear regression
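
Two of the listed checks, R^2 and adjusted R^2, can be computed directly from actual and fitted values. This is a sketch with made-up numbers, where p denotes the number of predictors.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: share of variance explained by the fit."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalizes R^2 for the number of predictors p given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]   # hypothetical fitted values
r2 = r_squared(y_true, y_pred)
print(r2, adjusted_r_squared(r2, n=5, p=2))
```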

owners earnings

Start with earnings Add back depreciation and amortization Add back non-cash charges Subtract maintenance capital expenditures If working capital increased, subtract change in working capital If working capital decreased, add change in working capital The difficult part of calculating owner's earnings is finding maintenance capital expenditures.

Income statement

Statement of operating results presented under the accrual basis of accounting

Reducing costs -

Step 1: Ask for a breakdown of costs. Step 2: If any cost seems out of line, investigate why. Step 3: Benchmark the competitors. Step 4: Determine whether there are any labor-saving technologies that would help reduce costs. Or investigate internal v external costs: *Internal -union wages -suppliers -materials -economies of scale -increased support system *External -economy -interest rates -government relations -transportation/shipping strikes

Pricing Strategies Cost-Based Pricing

Take all of our costs and add them up, add profit to it This way you will know the break even point

What is TFIDF?

Term frequency inverse document frequency. It is a weighting technique for text classifications. How important is a word in a document contained in a corpus?
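
A minimal sketch of one common TF-IDF variant. Several weighting schemes exist; this one assumes tf = count / document length and idf = log(N / document frequency). The tiny corpus is made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
weights = tf_idf(docs)
print(weights[1])
```

A word that occurs in every document ("the") gets weight 0, while a word unique to one document ("dog") gets the highest weight there.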

Example of feature engineering

Text files: bag of words 1. Each word is associated with a unique integer 2. For each document, the number of occurrences of each word is computed and stored in a matrix
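
The two steps can be sketched in a few lines. This assumes simple lower-casing and whitespace tokenisation, which the card does not specify; the function name is illustrative.

```python
def bag_of_words(docs):
    """Return (vocab, matrix): word -> column index, and per-document counts."""
    # Step 1: associate each distinct word with a unique integer.
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab))
    # Step 2: count occurrences of each word in each document.
    matrix = [[0] * len(vocab) for _ in docs]
    for row, doc in zip(matrix, docs):
        for word in doc.lower().split():
            row[vocab[word]] += 1
    return vocab, matrix

vocab, matrix = bag_of_words(["the cat sat", "the cat saw the dog"])
print(vocab)
print(matrix)
```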

What is the Central Limit Theorem?

The Central Limit Theorem states that if we repeatedly draw samples of a sufficiently large size from a population, the distribution of the sample means will be approximately normal (as long as the observations are random and independent and the population has finite variance). This holds regardless of the shape of the original population's distribution. The practical point is that it is expensive and impractical to measure an entire population, so the theorem lets us make inferences about a population's characteristics from a sample.
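
A small simulation can make this concrete. The sketch below assumes an exponential population with mean 1 (clearly non-normal) and a fixed random seed; the names are hypothetical. The sample means should centre near 1 with a spread near 1/sqrt(sample_size).

```python
import random
import statistics

def sample_means(n_samples, sample_size, seed=0):
    """Draw n_samples samples from an exponential(1) population; return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

means = sample_means(n_samples=2000, sample_size=50)
center = statistics.fmean(means)   # expected to be close to the population mean, 1
spread = statistics.stdev(means)   # expected to be close to 1 / sqrt(50)
```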

What's the F1 score? How would you use it?

The F1 score is a measure of a model's performance. It is the harmonic mean of the precision and recall of a model, F1 = 2 × precision × recall / (precision + recall), with results tending to 1 being the best and those tending to 0 being the worst. You would use it in classification tests where true negatives don't matter much.
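
A minimal sketch of the calculation from confusion-matrix counts. The function name is hypothetical, and returning 0 when precision or recall is undefined is a convention chosen here, not part of the definition.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; true negatives are not used."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(true_pos=8, false_pos=2, false_neg=2)
lopsided = f1_score(true_pos=5, false_pos=5, false_neg=0)
```

Note that true negatives never enter the formula, which is why the score suits problems where they matter little.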

How would you approach the "Netflix Prize" competition?

The Netflix Prize was a famed competition where Netflix offered $1,000,000 for a better collaborative filtering algorithm. The winning team, BellKor's Pragmatic Chaos, achieved a 10% improvement and used an ensemble of different methods to win. Some familiarity with the case and its solution will help demonstrate that you've paid attention to machine learning for a while.

What does the Statement of Cash flows show?

The actual cash received during the period and how that cash was spent during the period. Includes cash in hand, demand deposits, and overdrafts.

What's the difference between probability and likelihood?

The answer depends on whether you are dealing with discrete or continuous random variables.
Discrete random variables: Suppose that you have a stochastic process that takes discrete values (e.g., outcomes of tossing a coin 10 times, or the number of customers who arrive at a store in 10 minutes). In such cases, we can calculate the probability of observing a particular set of outcomes by making suitable assumptions about the underlying stochastic process (e.g., the probability of the coin landing heads is p and the coin tosses are independent). Denote the observed outcomes by O and the set of parameters that describe the stochastic process by θ. When we speak of probability, we want to calculate P(O|θ): given specific values for θ, P(O|θ) is the probability that we would observe the outcomes represented by O. However, when we model a real-life stochastic process, we often do not know θ. We simply observe O, and the goal is to arrive at an estimate for θ that would be a plausible choice given the observed outcomes O. A 'natural' estimation process is to choose the value of θ that maximizes the probability that we would actually observe O. In other words, we find the parameter values θ that maximize the function L(θ|O) = P(O|θ). L(θ|O) is called the likelihood function. Notice that, by definition, the likelihood function is conditioned on the observed O and is a function of the unknown parameters θ.
Continuous random variables: In the continuous case the situation is similar, with one important difference. We can no longer talk about the probability that we observed O given θ, because in the continuous case P(O|θ) = 0. Without getting into technicalities, the basic idea is as follows. Denote the probability density function (pdf) associated with the outcomes O by f(O|θ). In the continuous case we estimate θ given observed outcomes O by maximizing the function L(θ|O) = f(O|θ). In this situation, we cannot technically assert that we are finding the parameter value that maximizes the probability of observing O, since what we maximize is the pdf associated with the observed outcomes O.
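
To illustrate the discrete case, here is a small sketch using coin tosses, where O is the number of heads and θ is the heads probability p. The function names and the grid search are illustrative assumptions; in practice the maximum is usually found analytically or with an optimizer.

```python
from math import comb

def prob_of_outcome(heads, tosses, p):
    """P(O | theta): probability of `heads` heads in `tosses` tosses given p."""
    return comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads)

def grid_max_likelihood(heads, tosses, steps=1000):
    """Treat the same expression as L(theta | O) and maximize it over a grid of p."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: prob_of_outcome(heads, tosses, p))

# As a function of the outcome (p fixed), the values sum to 1: a probability distribution.
total_probability = sum(prob_of_outcome(h, 10, 0.3) for h in range(11))
# As a function of p (outcome fixed), we look for the maximizing parameter value.
p_hat = grid_max_likelihood(heads=7, tosses=10)
```

With 7 heads in 10 tosses, the grid search lands on p = 0.7, the observed proportion.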

Audit Procedures

The auditor should perform procedures on any interrelated items as necessary. Examples are sales/receivables, inventory/payables, fixed assets/depreciation. *Audit of SHE* For specific elements based on SHE (stockholders' equity), the auditor should perform the procedures necessary to express an opinion on financial position because of the interrelationship between SHE and the BS (balance sheet). *Audit of NI* For specific elements based on NI (net income), the auditor should perform the procedures necessary to express an opinion on financial position and results of operations because of the interrelationship between income, BS, and IS (income statement) accounts.

What is clustering?

An unsupervised learning task in which the computer learns to partition observations into subsets (clusters), so that each cluster is made up of similar observations.

Why do we call it GLM when it's clearly non-linear? (somewhat tricky question)

The "linear" in "generalized linear model" says that the parameters enter the model linearly. Specifically, what's meant is that on the scale of the linear predictor η = g(μ), the model is of the form η = Xβ, which may in turn be modeled using the linear model framework with the appropriate link function. "Logistic", on the other hand, refers to the description of a mean (that the mean is logistic in the predictors). It's not a GLM unless you combine it with a conditional distribution that's in the exponential family. When people say "logistic regression", they almost always mean a binomial model with a logit link: that model has a mean that's logistic in the predictors, is linear in the parameters, and is in the exponential family, so it is a GLM.

What is the Binomial Probability Formula?

The binomial distribution has two parameters. The first, n, is the number of times the experiment is performed. The second, p, is the probability of one specific outcome on a single trial. For example, suppose you wanted to know the probability of getting a 1 on a die roll. If you roll a die 20 times, the probability of rolling a one on any throw is 1/6, so the number of ones follows a binomial distribution with (n=20, p=1/6). SUCCESS would be "roll a one" and FAILURE would be "roll anything else." If the outcome in question were the die landing on an even number, the binomial distribution would become (n=20, p=1/2), because the probability of throwing an even number is one half.
Binomial distributions must also meet the following three criteria: (1) The number of observations or trials is fixed. In other words, you can only figure out the probability of something happening if you do it a certain number of times. If you toss a coin once, your probability of getting a tails is 50%. If you toss a coin 20 times, your probability of getting at least one tails is very, very close to 100%. (2) Each observation or trial is independent. In other words, none of your trials has an effect on the probability of the next trial. (3) The probability of success (tails, heads, fail or pass) is exactly the same from one trial to another. Once you know that your distribution is binomial, you can apply the binomial distribution formula to calculate the probability.
The Bernoulli distribution: The binomial distribution is closely related to the Bernoulli distribution. According to Washington State University, "If each Bernoulli trial is independent, then the number of successes in Bernoulli trials has a Binomial Distribution. On the other hand, the Bernoulli distribution is the Binomial distribution with n=1." Each Bernoulli trial has exactly one of two possible outcomes, S (success) or F (failure). In each trial, the probability of success, P(S) = p, is the same. The probability of failure is 1 minus the probability of success: P(F) = 1 - p. (Remember that 1 is the total probability across all outcomes; a probability is always between 0 and 1.) Finally, all Bernoulli trials are independent of each other, and the probability of success doesn't change from trial to trial, even if you have information about the other trials' outcomes.
Real-life examples: Many instances of binomial distributions can be found in real life. For example, if a new drug is introduced to cure a disease, it either cures the disease (it's successful) or it doesn't cure the disease (it's a failure). If you purchase a lottery ticket, you're either going to win money, or you aren't. Basically, anything you can think of that can only be a success or a failure can be represented by a binomial distribution.
The binomial distribution formula: b(x; n, P) = nCx * P^x * (1 - P)^(n - x), where b = binomial probability, x = total number of "successes" (pass or fail, heads or tails, etc.), P = probability of a success on an individual trial, and n = number of trials. Equivalently, P(x) = n! / ((n - x)! x!) * p^x * q^(n - x), where q = 1 - p.
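
The formula can be checked numerically with a short sketch. The function name `binomial_pmf` is hypothetical; `math.comb` supplies nCx.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Die example: probability of exactly three ones in 20 rolls (n=20, p=1/6).
three_ones = binomial_pmf(3, 20, 1 / 6)
# Coin example: probability of at least one tails in 20 tosses.
at_least_one_tails = 1 - binomial_pmf(0, 20, 0.5)
# The probabilities over all possible x sum to 1.
total = sum(binomial_pmf(x, 20, 1 / 6) for x in range(21))
```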

What is a primary key?

A unique identifier for each row in a table. E.g., in a Customer table, the Customer Number could be the primary key.

What is Collaborative filtering?

The process of filtering used by most of the recommender systems to find patterns or information by collaborating viewpoints, various data sources and multiple agents.

What is Machine Learning?

The simplest way to answer this question: we give the machine the data and the form of an equation, and ask it to look at the data and identify the coefficient values in that equation. For example, for the linear regression y = mx + c, we supply data for the variables x and y, and the machine learns the values of m and c from the data.

What is a Monad in Scala?

The simplest way to define a monad is to relate it to a wrapper. In Scala, a monad wraps an object of any class. Just as you wrap a gift in shiny paper with ribbons to make it look attractive, monads in Scala wrap objects and provide two important operations: identity, through "unit" in Scala, and bind, through "flatMap" in Scala.

What are the two methods used for the calibration in Supervised Learning?

The two methods used for predicting well-calibrated probabilities in Supervised Learning are a) Platt calibration (Platt scaling) b) Isotonic regression. These methods are designed for binary classification, and extending them to multi-class problems is not trivial.

Is it better to design robust or accurate algorithms?

The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before. The generalization performance of a learning system strongly depends on the complexity of the model assumed. If the model is too simple, the system can only capture the actual data regularities in a rough manner. In this case, the system has poor generalization properties and is said to suffer from underfitting. By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during the data collection process. In this case, the generalization capacity of the learning system is also poor, and the learning system is said to be affected by overfitting. Spurious patterns, which are only present by accident in the data, tend to have complex forms. This is the idea behind the principle of Occam's razor for avoiding overfitting: simpler models are preferred if more complex models do not significantly improve the quality of the description of the observations.
Quick response: Occam's razor. It depends on the learning task; choose the right balance. Ensemble learning can help balance bias and variance (several weak learners together = a strong learner).

How can you assess a good logistic model?

There are various methods to assess the results of a logistic regression analysis- • Using Classification Matrix to look at the true negatives and false positives. • Concordance that helps identify the ability of the logistic model to differentiate between the event happening and not happening. • Lift helps assess the logistic model by comparing it with random selection.

How do you ensure you're not overfitting with a model?

This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting training data and carrying the noise of that data through to the test set, thereby providing inaccurate generalizations. There are three main methods to avoid overfitting: 1- Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data. 2- Use cross-validation techniques such as k-fold cross-validation. 3- Use regularization techniques such as LASSO that penalize certain model parameters if they're likely to cause overfitting.
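
As an illustration of the second method, the sketch below shows how k-fold cross-validation partitions the data: each observation is used for testing exactly once and for training in the other folds. The function name is hypothetical, and real workflows usually shuffle the indices first.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
```

A model would be fitted on each `train` part and scored on the matching `test` part, and the k scores averaged.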

How can we use your machine learning skills to generate revenue?

This is a tricky question. The ideal answer would demonstrate knowledge of what drives the business and how your skills could relate. For example, if you were interviewing for music-streaming startup Spotify, you could remark that your skills at developing a better recommendation model would increase user retention, which would then increase revenue in the long run. Resources on startup metrics will help you understand exactly which performance indicators are important for startups and tech companies as they think about revenue and growth.

What do you think of our current data process?

This kind of question requires you to listen carefully and impart feedback in a manner that is constructive and insightful. Your interviewer is trying to gauge if you'd be a valuable member of their team and whether you grasp the nuances of why certain things are set the way they are in the company's data process based on company- or industry-specific conditions. They're trying to see if you can be an intellectual peer. Act accordingly.

Middle Paragraph(s)

This paragraph contains the following: (1) All of the substantive reasons that lead the auditor to conclude that there has been a departure from GAAP. (2) Disclosure of the principal effects of the subject matter, if practicable.

How will you define the number of clusters in a clustering algorithm?

Though the clustering algorithm is not specified, this question is mostly asked in reference to K-Means clustering, where "K" defines the number of clusters. The objective of clustering is to group similar entities so that the entities within a group are similar to each other but the groups are different from each other. [Figure: three distinct groups of points.] Within Sum of Squares (WSS) is generally used to measure the homogeneity within a cluster. If you plot WSS against a range of numbers of clusters, the resulting graph is generally known as the Elbow Curve. [Figure: elbow curve of WSS versus number of clusters, with the point at 6 clusters circled in red.] The circled point (number of clusters = 6 in this example) is the point after which you don't see any meaningful decrease in WSS. This point is known as the bending point and is taken as K in K-Means. This is the most widely used approach, but a few data scientists also use hierarchical clustering first to create dendrograms and identify the distinct groups from there.
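
The elbow idea can be sketched on toy data. The code below is a deliberately simple one-dimensional K-Means with a deterministic initialization (evenly spaced order statistics), which is an assumption made for reproducibility and not how libraries initialize; all names are hypothetical.

```python
def kmeans_1d_wss(points, k, iterations=20):
    """Run a basic 1-D K-Means and return the within sum of squares (WSS)."""
    data = sorted(points)
    # Deterministic start: k evenly spaced values taken from the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return sum(min((x - c) ** 2 for c in centers) for x in data)

points = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 8.9, 9.0, 9.1]  # three obvious groups
wss_by_k = [kmeans_1d_wss(points, k) for k in (1, 2, 3)]
```

On this data WSS falls sharply up to K = 3 and has almost nothing left to lose afterwards, which is the bend the elbow curve looks for.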

Breakeven Analysis

Total Fixed Costs / Contribution Margin per Unit
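
A quick worked sketch of the formula, assuming contribution margin per unit means price per unit minus variable cost per unit. The function name and the figures are hypothetical.

```python
def breakeven_units(total_fixed_costs, price_per_unit, variable_cost_per_unit):
    """Units that must be sold for contribution margin to cover fixed costs."""
    contribution_margin = price_per_unit - variable_cost_per_unit
    if contribution_margin <= 0:
        raise ValueError("price must exceed variable cost per unit to break even")
    return total_fixed_costs / contribution_margin

units = breakeven_units(total_fixed_costs=10_000, price_per_unit=25, variable_cost_per_unit=15)
```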

Ratio of Liabilities to Stockholders' Equity = ? / ?

Total Liabilities / Total Stockholders' Equity

What are the phases of supervised machine learning?

Training phase, validation phase, test phase, application

What is binning?

Transforms continuous features into a discrete one

Reducing Cost Cost Analysis- Internal Elements

Union Wages Suppliers Materials Economies of Scale Increase Support System

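What is WSDL?

WSDL (Web Services Description Language) is an XML-based language for describing a web service: its operations, message formats, and endpoints. Client applications written in languages such as Java can use a service's WSDL description to access data from the web service.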

Ridge regression:

We use an L2 penalty when fitting the model using least squares. We add to the minimization problem an expression (shrinkage penalty) of the form λ × ∑ β_j^2 (the sum of the squared coefficients). λ: tuning parameter that controls the bias-variance tradeoff, assessed with cross-validation. A bit faster than the lasso.
$\hat{\beta}^{ridge} = \arg\min_{\beta} \{ \sum_{i=1}^{n} (y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \}$
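
For a single predictor the ridge minimization has a simple closed form, which makes the shrinkage effect easy to see. This is a sketch under that one-predictor assumption (p = 1, intercept not penalized); the function name is hypothetical.

```python
def ridge_one_predictor(x, y, lam):
    """Minimize sum (y_i - b0 - b1*x_i)^2 + lam * b1^2 for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / (s_xx + lam)          # lam = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x  # the intercept itself is not shrunk
    return intercept, slope

ols_fit = ridge_one_predictor([1, 2, 3], [2, 4, 6], lam=0)
ridge_fit = ridge_one_predictor([1, 2, 3], [2, 4, 6], lam=2)
```

With λ = 0 the slope is 2; with λ = 2 it shrinks to 1, trading some bias for lower variance.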

What are the benefits and drawbacks of specific methods such as ridge regression?

We use an L2 penalty when fitting the model using least squares. We add to the minimization problem an expression (shrinkage penalty) of the form λ × ∑ β_j^2 (the sum of the squared coefficients). λ: tuning parameter that controls the bias-variance tradeoff, assessed with cross-validation. A bit faster than the lasso.
$\hat{\beta}^{ridge} = \arg\min_{\beta} \{ \sum_{i=1}^{n} (y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \}$

Developing a New Product 1. Think about the Product

What is proprietary or special about it? Is the product patented? for how long? Are there similar products? Substitutes? What are the advantages or disadvantages of the new product? How does this new product fit in with the rest of our product line? Can our sales force sell it?

What questions can help us in quote to cash?

What percentage of quotes convert to sales orders? What percentage of customers don't pay their bills? How much of a discount does GBI give each year to customers who take advantage of discounted payment terms? What is the average delay in delivery? What percentage of products are damaged during shipment?

P-value?

When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. Hypothesis tests are used to test the validity of a claim that is made about a population. This claim that's on trial, in essence, is called the null hypothesis. The alternative hypothesis is the one you would believe if the null hypothesis is concluded to be untrue. The evidence in the trial is your data and the statistics that go along with it. All hypothesis tests ultimately use a p-value to weigh the strength of the evidence (what the data are telling you about the population). The p-value is a number between 0 and 1 and interpreted in the following way: A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis. A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.

Starting a New Business 1. Initial Questions

Who is our competition? What market share does each competitor have? How do competitors' products or services differ from ours? Are there barriers to enter or exit?

New Business Market Elements

Who is the competition? What is their market share? Products comparison? Barriers to entry?

Turnarounds 1. Analyze the Company and the Industry

Why is it failing? Bad products or services? bad management? bad economy? Are the competitors facing the same problem? Do we have access to capital? Is the company publicly-traded or privately held?

For K₁>K₂>K₃: [C(S,K₁,T)-C(S,K₂,T)]/[K₁-K₂] ? [C(S,K₂,T)-C(S,K₃,T)]/[K₂-K₃]

≥

Risk-Neutral Pricing (p*)

[exp[(r-∂)h]-d]/[u-d]
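
The formula translates directly into code. In this sketch `r` is the risk-free rate, `delta` the dividend yield (the card's ∂), `h` the length of one period, and `u` and `d` the up and down factors; the parameter names and example values are assumptions.

```python
import math

def risk_neutral_probability(r, delta, h, u, d):
    """p* = (exp((r - delta) * h) - d) / (u - d)."""
    return (math.exp((r - delta) * h) - d) / (u - d)

# When r equals the dividend yield, the growth factor is 1 and p* = (1 - d) / (u - d).
p_star = risk_neutral_probability(r=0.05, delta=0.05, h=1.0, u=1.2, d=0.8)
```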

How do you take millions of users with 100's of transactions each, amongst 10k's of products, and group the users together in meaningful segments?

1. Some exploratory data analysis (get a first insight): transactions by date; count of customers vs. number of items bought; total items vs. total basket per customer; total items vs. total basket per area.
2. Create new features (per customer). Counts: total baskets (unique days), total items, total spent, unique product ids. Distributions: items per basket, spent per basket, product ids per basket, duration between visits. Product preferences: proportion of items per product category per basket.
3. Too many features: dimension reduction? PCA?
4. Clustering: PCA
5. Interpreting model fit: view the clustering by principal component axis pairs (PC1 vs. PC2, PC2 vs. PC1). Interpret each principal component regarding the linear combination it's obtained from.

Random Note

an omission of the statement of cash flows counts as a qualified opinion

Dominos: There is an 8x8 chessboard in which two diagonally opposite corners have been cut off. You are given 31 dominos, and a single domino can cover exactly two squares. Can you use the 31 dominos to cover the entire board? Prove your answer (by providing an example or showing why it's impossible).

At first, it seems like this should be possible. It's an 8 x 8 board, which has 64 squares, but two have been cut off, so we're down to 62 squares. A set of 31 dominoes should be able to fit there, right? When we try to lay down dominoes on row 1, which only has 7 squares, we may notice that one domino must stretch into row 2. Then, when we try to lay down dominoes onto row 2, again we need to stretch a domino into row 3. For each row we place, we'll always have one domino that needs to poke into the next row. No matter how many times and ways we try to solve this issue, we won't be able to successfully lay down all the dominoes. There's a cleaner, more solid proof for why it won't work. The chessboard initially has 32 black and 32 white squares. By removing opposite corners (which must be the same color), we're left with 30 of one color and 32 of the other color. Let's say, for the sake of argument, that we have 30 black and 32 white squares. Each domino we set on the board will always take up one white and one black square. Therefore, 31 dominos will take up exactly 31 white squares and 31 black squares. On this board, however, we must have 30 black squares and 32 white squares. Hence, it is impossible.
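
The colour count at the heart of the proof can be verified with a few lines. This sketch assumes squares are indexed by (row, column) from 0 and that a square is "black" when row + column is even; the function name is hypothetical.

```python
def color_counts(size=8, removed=((0, 0), (7, 7))):
    """Count black and white squares left after removing the given squares."""
    counts = {"black": 0, "white": 0}
    for row in range(size):
        for col in range(size):
            if (row, col) in removed:
                continue
            counts["black" if (row + col) % 2 == 0 else "white"] += 1
    return counts

remaining = color_counts()
```

Both removed corners have an even row + column, so they share a colour, leaving 30 squares of that colour and 32 of the other. Since every domino covers one of each, 31 dominos cannot cover the board.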

What's your favorite algorithm, and can you explain it to me in less than a minute?

This type of question tests your understanding of how to communicate complex and technical nuances with poise and the ability to summarize quickly and efficiently. Make sure you have a choice and make sure you can explain different algorithms so simply and effectively that a five-year-old could grasp the basics!

What is deep learning, and how does it contrast with other machine learning algorithms?

Deep learning is a subset of machine learning that is concerned with neural networks: how to use backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data. In that sense, deep learning can act as an unsupervised learning approach that learns representations of data through the use of neural nets.

Growth and Increasing Sales 1. Learn about the company and its size, resources and products

How big is it? What products does it have? Is it a market leader in this field? What is its objective (profits, market share, or brand positioning)? Is the company in charge of its own pricing strategies, or is it reacting to suppliers, the market, or the competition? How do its products compare to the competition? Are there substitutes or alternatives? Where is the product in its growth cycle? Is there a supply-and-demand issue at work?

Discuss the meaning of the ROC curve, and write pseudo-code to generate the data for such a curve.
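
The card gives no answer, so the following is only one possible sketch. A ROC curve plots the true positive rate against the false positive rate as the classification threshold is lowered; the code assumes binary labels (1 = positive), at least one example of each class, and hypothetical names throughout.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every example sharing this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points

curve = roc_points(labels=[1, 1, 0, 1, 0], scores=[0.9, 0.8, 0.7, 0.6, 0.2])
```

The curve always starts at (0, 0) and ends at (1, 1); the closer it bends toward the top-left corner, the better the classifier separates the classes.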

Your goal should be to get your contact costs as ________ as possible and your take rate as _________ as possible

low, high

maximum drawdown

biggest decline a stock has suffered as measured from stock price high to low
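
A minimal sketch of the calculation, assuming a chronological list of prices and expressing the drawdown as a fraction of the running peak. The function name and prices are hypothetical.

```python
def max_drawdown(prices):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)                     # highest price seen so far
        worst = max(worst, (peak - price) / peak)   # decline from that high
    return worst

drawdown = max_drawdown([100, 120, 90, 110, 80])
```

Here the high of 120 is followed by a low of 80, a decline of one third.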

Differentiate between univariate, bivariate and multivariate analysis.

These are descriptive statistical analysis techniques that can be differentiated based on the number of variables involved at a given point in time. For example, pie charts of sales by territory involve only one variable and can be referred to as univariate analysis. If the analysis attempts to understand the relationship between two variables at a time, as in a scatterplot, then it is referred to as bivariate analysis. For example, analysing the volume of sales against spending can be considered an example of bivariate analysis. Analysis that deals with more than two variables, to understand the effect of the variables on the responses, is referred to as multivariate analysis.

Differentiate between univariate, bivariate and multivariate analysis.

Univariate analysis is concerned with understanding one variable. For example, we can make a pie chart to analyze the breakdown of market share of coffee chains (e.g. Starbucks, Timmies, McDonald's, et al.). Bivariate analysis is concerned with the relationship between 2 variables, such as sales as a function of $ spent on advertising.

How would you create a taxonomy to identify key customer trends in unstructured data?

business owner, accuracy, results

Offer a "free" repeat product/service that is of ______ cost to you but _______ perceived value to your customer

low, high

Operating activities reflected in statement of cash flows

cash generated from operations interest paid income tax paid

simple moving average

Used to decide hold vs. sell, commonly with the 200-day moving average. Stocks trading above their 200-day moving average are said to have higher returns.

earnings per share

earnings divided by total outstanding shares

product-market 2x2

existing products & existing markets: market penetration new products & existing markets: product development existing products & new markets: market development new products & new markets: diversify

You have a basketball hoop and someone says that you can play one of two games. Game 1: You get one shot to make the hoop. Game 2: You get three shots and you have to make two of three shots. If p is the probability of making a particular shot, for which values of p should you pick one game or the other?

Probability of winning Game 1: The probability of winning Game 1 is p, by definition.
Probability of winning Game 2: Let s(k, n) be the probability of making exactly k shots out of n. The probability of winning Game 2 is the probability of making exactly two shots out of three OR making all three shots. In other words: P(winning) = s(2, 3) + s(3, 3). The probability of making all three shots is s(3, 3) = p^3. The probability of making exactly two shots is P(making 1 and 2, and missing 3) + P(making 1 and 3, and missing 2) + P(missing 1, and making 2 and 3) = p * p * (1 - p) + p * (1 - p) * p + (1 - p) * p * p = 3(1 - p)p^2. Adding these together, we get: p^3 + 3(1 - p)p^2 = p^3 + 3p^2 - 3p^3 = 3p^2 - 2p^3.
Which game should you play? You should play Game 1 if P(Game 1) > P(Game 2): p > 3p^2 - 2p^3, so (for p > 0) 1 > 3p - 2p^2, so 2p^2 - 3p + 1 > 0, so (2p - 1)(p - 1) > 0. Both terms must be positive, or both must be negative. But we know p < 1, so p - 1 < 0. This means both terms must be negative: 2p - 1 < 0, so 2p < 1, so p < 0.5. So, we should play Game 1 if 0 < p < 0.5 and Game 2 if 0.5 < p < 1. If p = 0, 0.5, or 1, then P(Game 1) = P(Game 2), so it doesn't matter which game we play.
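
The result of the derivation can be checked numerically. The function names below are hypothetical; `p_game2` is the closed form 3p^2 - 2p^3 from the algebra above.

```python
def p_game1(p):
    """Win probability with one shot."""
    return p

def p_game2(p):
    """Win probability when at least two of three shots must go in."""
    return 3 * p ** 2 - 2 * p ** 3

def better_game(p):
    """Return 1 or 2 for the better game, or 0 when they are equally good."""
    if p_game1(p) > p_game2(p):
        return 1
    if p_game2(p) > p_game1(p):
        return 2
    return 0

choices = [better_game(p) for p in (0.3, 0.5, 0.8)]
```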

Data is spread across all the nodes of a cluster. How does Spark try to process this data?

By default, Spark tries to read data into an RDD from the nodes that are close to it. Since Spark usually accesses distributed, partitioned data, it creates partitions to hold the data chunks in order to optimize transformation operations.

Number of Times Interest Charges Are Earned = ( ? + ? ) / ?

(income before income tax + interest expense) / interest expense

Qualified Opinion vs. Adverse Opinion: A qualified opinion should be expressed when the auditor concludes that misstatements, individually or in the aggregate, are material but not pervasive to the financial statements.

An adverse opinion should be expressed when the auditor concludes that misstatements, individually or in the aggregate, are both material and pervasive to the financial statements.

"a strong brand drives ______________ _________________ in purchasing"

initial preference

What information do shareholders want?

investment prospects, mgmt performance

R-squared

A measure of how well data fit a linear regression, expressed from 0 to 100% (or 0 to 1). Higher means the model explains more of the variation.

XML- extensible markup language

is a method of tagging or coding data in documents, so that they can be read by both people and computers.

Management Accounting

is primarily concerned with the provision of information to managers. Deals with current problems and looking ahead, unlike financial accounting

that is

it tests whether the slope of the regression line is zero, H0: β = 0 and Ha: β ≠ 0. If the coefficient's p-value is less than 0.05, we reject the null hypothesis and conclude that we have sufficient evidence to be 95% confident that there is a significant linear relationship between the dependent and independent variables. Note that the p-value and R^2 provide different information. A linear relationship can be significant (have a low p-value) but not explain a large percentage of the variation (not have a high R^2). A confidence interval associated with an independent variable's coefficient indicates the likely range for that coefficient. If the 95% confidence interval does not contain zero, we can be 95% confident that there is a significant linear relationship between the variables. Residual plots can provide important insights into whether a linear model is a good fit. Each observation in a data set has a residual equal to the historically observed value minus the regression's predicted value, that is, ε = y − ŷ. Linear regression models assume that the regression's residuals follow a normal distribution with a mean of zero and fixed variance. We can also perform regression analyses using qualitative, or categorical, variables. To do so, we must convert data to dummy (0, 1) variables. After that, we can proceed as we would with any other regression analysis. A dummy variable is equal to 1 when the variable of interest fits a certain criterion. For example, a dummy variable for "Female" would equal 1 for all female observations and 0 for male observations.

Return on total assets

net income available to common stockholders/total assets

Discuss MapReduce (or your favorite parallelization abstraction). Why is MapReduce referred to as a "shared-nothing" architecture (clearly the nodes have to share something, no?) What are the advantages/disadvantages of "shared-nothing"?

What is probabilistic merging (aka fuzzy merging)? Is it easier to handle with SQL or other languages?

on A the key is first name/lastname in some char set

debt

refers to bonds, credit lines and other borrowings

What is artificial intelligence?

Refers to the development of computer technologies that can reason and otherwise function in ways similar to humans, e.g. self-driving cars and robots.

Poset

reflexive ⋀ antisymmetric ⋀ transitive

Explain what resampling methods are and why they are useful

Repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. Example: repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ. The most common methods are cross-validation and the bootstrap. Cross-validation: random sampling without replacement; used for evaluating model performance and for model selection (selecting the appropriate level of flexibility). Bootstrap: random sampling with replacement; mostly used to quantify the uncertainty associated with a given estimator or statistical learning method.
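
A short sketch of the bootstrap, assuming the estimator of interest is the sample mean and using a fixed seed for reproducibility. The function name and data are hypothetical.

```python
import random
import statistics

def bootstrap_se_of_mean(data, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)
    resampled_means = [
        statistics.fmean(rng.choices(data, k=len(data)))  # same size, with replacement
        for _ in range(n_resamples)
    ]
    return statistics.stdev(resampled_means)

data = list(range(1, 21))
standard_error = bootstrap_se_of_mean(data)
```

For this data the estimate should land near the population standard deviation divided by sqrt(20), roughly 1.29.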

The ________ is a rough measure of the length of time it takes to purchase, sell, and replace the inventory

number of days' sales in inventory

arithmetic growth rate

simple average of returns (and wrong)

BCG matrix

Stars: high market share and high industry growth. Question marks: low market share and high growth. Cash cows: high market share and low industry growth. Dogs/pets: low market share and low growth. Management should BUILD question marks, HARVEST cash cows, HOLD stars, and DIVEST dogs/pets.

case scenarios

Strategy: 1. entering new market 2. industry analysis 3. mergers and acquisitions 4. developing new product 5. pricing strategies 6. growth strategies 7. starting a new business 8. competitive response. Operations: 9. increasing sales 10. reducing costs 11. improving bottom line (profitability) 12. turnarounds

What are the techniques for processing unstructured data?

tagged data, natural language processing, image recognition, and artificial intelligence.

Qualified Opinion Due to Material Misstatement of Financial Statements: Nonissuer. Qualified Opinion Paragraph. When the auditor expresses a qualified opinion due to a material misstatement in the financial statements, the opinion paragraph should state that,

in the auditor's opinion, EXCEPT FOR the effects of the matter(s) described in the basis for qualified opinion paragraph, the financial statements are PRESENTED FAIRLY, in all material respects, in accordance with the applicable financial reporting framework.

For such corporations, the relative risk of the debtholders is normally measured as the _______ (during the year), sometimes called the fixed charge coverage ratio.

number of times interest charges are earned

For American options, the value at each node is

max(calculated value, exercise value)
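
The node rule above can be sketched with a binomial tree for a put. The Cox-Ross-Rubinstein parametrization (`u = exp(sigma * sqrt(dt))`, `d = 1/u`), the absence of dividends, the function name `put_binomial`, and the input values are all assumptions of this example rather than anything stated in the card.

```python
import math

def put_binomial(S, K, r, sigma, T, steps, american=True):
    """Price a put on a CRR binomial tree (no dividends assumed)."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral probability of an up-move
    disc = math.exp(-r * dt)
    # Payoffs at expiry; j counts the number of up-moves.
    values = [max(K - S * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]
    # Roll back through the tree one time step at a time.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            calculated = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:
                exercise = K - S * u**j * d**(i - j)
                values[j] = max(calculated, exercise)   # the American node rule
            else:
                values[j] = calculated
    return values[0]

# Hypothetical inputs, chosen only for illustration.
american = put_binomial(100.0, 100.0, 0.05, 0.20, 1.0, 200)
european = put_binomial(100.0, 100.0, 0.05, 0.20, 1.0, 200, american=False)
```

With these inputs the American value should come out above the European one, the difference being the early-exercise premium.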

that is

they support the organization's business functions. Concurrent— Small uniform transactions Optimized for storage—

If a value represents the 99th percentile, this means that

99% of all values are below this value

and so on. We can solve this problem multiple ways. Logically: if the earlier sum is 1,

this would mean that the gender ratio is even. Families contribute exactly one girl and on average one boy. The birth policy is therefore ineffective. Does this make sense? At first glance, this seems wrong. The policy is designed to favor girls, as it ensures that all families have a girl. On the other hand, the families that keep having children contribute (potentially) multiple boys to the population. This could offset the impact of the "one girl" policy. One way to think about this is to imagine that we put the gender sequences of all families into one giant string. So if family 1 has BG, family 2 has BBG, and family 3 has G, we would write BGBBGG. In fact, we don't really care about the groupings of families because we're concerned about the population as a whole. As soon as a child is born, we can just append its gender (B or G) to the string. What are the odds of the next character being a G? Well, if the odds of having a boy and a girl are the same, then the odds of the next character being a G are 50%. Therefore, roughly half of the string should be Gs and half should be Bs, giving an even gender ratio. This actually makes a lot of sense. Biology hasn't been changed. Half of newborn babies are girls and half are boys. Abiding by some rule about when to stop having children doesn't change this fact. Therefore, the gender ratio is 50% girls and 50% boys.
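
The argument can also be checked empirically. The simulation below is a rough sketch that assumes each birth is independently a girl with probability 0.5 and that every family stops at its first girl; the function name, the family count, and the seed are arbitrary choices for this example.

```python
import random

def simulate_girl_fraction(n_families, seed=0):
    """Simulate families that keep having children until the first girl."""
    rng = random.Random(seed)
    girls = boys = 0
    for _ in range(n_families):
        while True:
            if rng.random() < 0.5:   # assumed 50% chance that a birth is a girl
                girls += 1
                break                # the family stops after its first girl
            boys += 1
    return girls / (girls + boys)

fraction = simulate_girl_fraction(100_000)   # should land close to 0.5
```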

What are the types of hierarchy tables?

Time-dependent, time-independent, version-dependent, interval-dependent

Why does data cleaning play a vital role in analysis?

It can take up to 80% of the time spent on an analysis.

What is precision?

tp / (tp + fp)
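
The formula can be illustrated with a short function. The labels below are made up for the example, and treating `1` as the positive class is an assumption.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive: tp / (tp + fp)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        raise ValueError("no positive predictions; precision is undefined")
    return tp / (tp + fp)

# Hypothetical labels: 5 positive predictions, 3 of them correct.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]
score = precision(y_true, y_pred)   # 3 / (3 + 2)
```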

What are the types of data sources?

Transactional systems, informational systems, Excel spreadsheets, and ERP (integrated Enterprise Resource Planning) systems such as SAP and Oracle

Is Naïve Bayes bad? If yes, under what aspects?

Any of the following would indicate poor estimation or multicollinearity:

coefficient signs opposite to expectations, unusually large or small values, or observed inconsistency when the model is fed new data. Use the model for prediction by feeding it new data, and use the coefficient of determination (R squared) as a model validity measure. Use data splitting to form a separate dataset for estimating model parameters, and another for validating predictions. Use jackknife resampling if the dataset contains a small number of instances, and measure validity with R squared and mean squared error (MSE).
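
The data-splitting check described above can be sketched as follows. To stay short, the example fits a single-predictor line rather than a full multiple regression; that simplification, the synthetic data, the 150/50 split, and the helper names are all assumptions of this sketch.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared_and_mse(xs, ys, slope, intercept):
    """Score predictions on data that was NOT used for fitting."""
    preds = [intercept + slope * x for x in xs]
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot, ss_res / len(ys)

# Synthetic data (an assumption): y = 3 + 2x plus unit-variance noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Estimate parameters on one part, validate predictions on the other.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]
slope, intercept = fit_line(train_x, train_y)
r2, mse = r_squared_and_mse(test_x, test_y, slope, intercept)
```

A holdout R squared far below the training R squared, or a fitted slope with an unexpected sign, would be the kind of inconsistency the card warns about.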

Blue-Eyed Island: A bunch of people are living on an island, when a visitor comes with a strange order: all blue-eyed people must leave the island as soon as possible. There will be a flight out at 8:00pm every evening. Each person can see everyone else's eye color, but they do not know their own (nor is anyone allowed to tell them). Additionally, they do not know how many people have blue eyes, although they do know that at least one person does. How many days will it take the blue-eyed people to leave?

Let's apply the Base Case and Build approach. Assume that there are n people on the island and c of them have blue eyes. We are explicitly told that c > 0. Case c = 1: Exactly one person has blue eyes. Assuming all the people are intelligent, the blue-eyed person should look around and realize that no one else has blue eyes. Since he knows that at least one person has blue eyes, he must conclude that it is he who has blue eyes. Therefore, he would take the flight that evening. Case c = 2: Exactly two people have blue eyes. The two blue-eyed people see each other, but are unsure whether c is 1 or 2. They know, from the previous case, that if c = 1, the blue-eyed person would leave on the first night. Therefore, if the other blue-eyed person is still there, he must deduce that c = 2, which means that he himself has blue eyes. Both men would then leave on the second night. Case c > 2: The General Case. As we increase c, we can see that this logic continues to apply. If c = 3, then those three people will immediately know that there are either 2 or 3 people with blue eyes. If there were two people, then those two people would have left on the second night. So, when the others are still around after that night, each person would conclude that c = 3 and that they, therefore, have blue eyes too. They would leave that night. This same pattern extends up through any value of c. Therefore, if c men have blue eyes, it will take c nights for the blue-eyed men to leave. All will leave on the same night.

What is a Task with regard to Spark job execution?

A task is an individual unit of work for executors to run. It is an individual unit of physical execution (computation) that runs on a single machine for part of your Spark application, on one partition of the data. All tasks in a stage should be completed before moving on to another stage. -A task can also be considered a computation in a stage on a partition in a given job attempt. -A task belongs to a single stage and operates on a single partition (a part of an RDD). -Tasks are spawned one by one for each stage and data partition.

When looking at long term assets and long term liabilities, you are analyzing

Solvency

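Nonlinear relationships can have a (Pearson) correlation close to

zero (for example, y = x^2 over a range symmetric about 0), even when the variables are strongly related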
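
A quick numerical check may make this concrete. The sketch assumes the Pearson correlation is meant; the `pearson` helper and the two example relationships are chosen only for illustration. It also shows that the statement is not universal: a monotone nonlinear relationship can still have a clearly nonzero correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [i / 10 for i in range(-50, 51)]                 # symmetric grid on [-5, 5]
symmetric = pearson(xs, [x ** 2 for x in xs])         # y = x^2: essentially zero
monotone = pearson(xs, [math.exp(x) for x in xs])     # y = e^x: clearly positive
```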

Call-Put Option Relationship: Theta

θcall-θput=[∂Se^(-∂T)-rKe^(-rT)]/365

Call-Put Option Relationship: Rho

ρcall-ρput=0.01TKe^(-rT)
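
Both call-put relationships (theta and rho) follow from differentiating put-call parity, C - P = S e^(-δT) - K e^(-rT), and can be checked numerically. The sketch below assumes Black-Scholes prices with a continuous dividend yield, which the cards write as ∂ and the code calls `delta`; the input values, the step size `h`, and the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def bs_price(S, K, r, delta, sigma, T, kind):
    """Black-Scholes price with continuous dividend yield `delta`."""
    d1 = (math.log(S / K) + (r - delta + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if kind == "call":
        return S * math.exp(-delta * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S * math.exp(-delta * T) * norm_cdf(-d1)

# Hypothetical inputs, chosen only for illustration.
S, K, r, delta, sigma, T = 100.0, 95.0, 0.05, 0.02, 0.25, 0.5
h = 1e-5

def gap(r_, T_):
    """Call price minus put price for the given rate and time to expiry."""
    return bs_price(S, K, r_, delta, sigma, T_, "call") - bs_price(S, K, r_, delta, sigma, T_, "put")

# Rho per 1% change in r, by central difference.
rho_numeric = 0.01 * (gap(r + h, T) - gap(r - h, T)) / (2 * h)
rho_formula = 0.01 * T * K * math.exp(-r * T)

# Theta per day: time passing shortens T, hence the minus sign.
theta_numeric = -(gap(r, T + h) - gap(r, T - h)) / (2 * h) / 365
theta_formula = (delta * S * math.exp(-delta * T) - r * K * math.exp(-r * T)) / 365
```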

Schroder Volatility Adjustment

σ(F)=σ(S)×S/F (Adjust ONLY if given historical σ)

Volatility Relationship: Option vs Underlying Asset

σ(option)=σ(stock)|Ω|

Revenue recognition

-Accrual basis -Cash basis

Net Income/ Cash Flow *Forecast Sales*

-Begin with economy's GDP forecast (x) - -Company growth rate based on market share analysis. -retain => company growth rate = industry -gain => >industry -lose => <industry

Focus strategies

-Depends on an industry segment that is of sufficient size, has good growth potential, and is not crucial to the success of other major competitors -Most effective when consumers have distinctive preferences

Financial statement analysis

-Equity investment decisions -Credit decisions -Review a supplier, customer, or a competitor -Audit/consulting engagement planning process -Corporate acquisitions and consolidations -Internal company review -Valuation engagement -A review of the past and projection of the future

Accrual basis

-Expenses recognized when they are incurred, regardless of when they are paid -Revenues recognized when realized, regardless of when cash is collected

Net income/Cash Flow *Top-down Approach*

-Forecast Sales -Now estimate income/cash flow

Intensive strategies

-Has to do with current products or services -Market penetration, market development, product development

A review of the past and a projection of the future: Is the company going to continue indefinitely?

-Liquidity analysis -Solvency analysis -Profitability analysis

Long-term assets

-Long-term investments -PP&E -Intangible assets -Other assets

Objectives should be:

-Specific -Measurable -Achievable -Relevant -Time bound -Congruent among organizational units

Balance sheet

-Statement of financial position -Current assets vs long term assets -Shows the economic resources of the business, then shows how the acquisition of those economic resources was financed in 1 of 3 ways: 1. Creditors 2. Owner contributions 3. Prior years' earnings

Synergies of related diversification

-Transferring competitively valuable expertise or other capabilities from one business to another -Combining the related activities of separate businesses into a single operation to achieve lower costs -Exploiting use of a known brand name -Using cross-business collaboration to create strengths

Analyzing performance measures by organizational level

-Trying to create long and short term objectives -At higher level, focusing more on long-term -At lower level, focusing more on short-term

Screening *Limits*

-when screening with rates, they are not adjusted for GAAP vs IFRS -back-testing may not be relevant for future periods.

Assessing Credit Quality *4 C's of credit*

1)Character - quality of management 2)Capacity - ability to pay 3)Collateral - assets pledged 4)Covenants - limitations/restrictions

Footnotes

Current value of inventory (LIFO/FIFO)

Liquidity analysis

Focusing on current assets and current liabilities

Solvency analysis

Focusing on long term assets and long term liabilities

Full disclosure principle

If info would make a difference in an economic decision, that piece of info must be in a report

Conservatism constraint

Info being analyzed is more conservative than optimistic -Company anticipates losses, not gains

Most important and most closely scrutinized by internal and external representatives:

Management's discussion and analysis

Current

Paid in upcoming year, or operating cycle of business, whichever is longer (usually 1 yr) -If this is not met, it is a long term liability

Cost leadership

Producing standardized products at a very low cost for consumers who are price-sensitive -Low end and traditional

Product development

Seeking increased sales by improving present products/services or developing new ones

Income from continuing operations

Where the company shows normal and recurring revenues, expenses, gains, and losses -Most important part of the income statement for financial analysts

When looking at current assets and current liabilities, you are looking at

Liquidity

