Quant Cards

b

01100010

c

01100011

d

01100100

e

01100101

f

01100110

g

01100111

h

01101000

i

01101001

p

01110000

q

01110001

s

01110011

v

01110110

0010

2

{(1, 2), (2, 3), (3, 4)}: is it antisymmetric and why?

Yes; there is no pair with both xRy and yRx for x ≠ y, so antisymmetry holds vacuously.

010000

20

010001

21

010010

22

010011

23

010100

24

010101

25

010110

26

010111

27

0011

3

Which cluster managers can be used with Spark?

Apache Mesos, Hadoop YARN, Spark standalone, and Spark local mode (a single node or a single JVM, where the driver and executors run in the same JVM and the same node is used for execution).

14

E (hex)

For options on Futures: d

e^(-σ√h)

Risk-Neutral Pricing (C)

e^(-rh)*[p*Cu+(1-p*)Cd]
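
A minimal Scala sketch of this one-period risk-neutral valuation, added for illustration (the forward-tree u and d and the call payoffs are assumptions; the function name is made up):

def riskNeutralCall(s: Double, k: Double, r: Double, div: Double, sigma: Double, h: Double): Double = {
  val u = math.exp((r - div) * h + sigma * math.sqrt(h))    // forward-tree up factor
  val d = math.exp((r - div) * h - sigma * math.sqrt(h))    // forward-tree down factor
  val pStar = (math.exp((r - div) * h) - d) / (u - d)       // risk-neutral probability p*
  val cu = math.max(s * u - k, 0.0)                         // call payoff after an up move
  val cd = math.max(s * d - k, 0.0)                         // call payoff after a down move
  math.exp(-r * h) * (pStar * cu + (1 - pStar) * cd)        // C = e^(-rh)[p*Cu + (1-p*)Cd]
}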

Real-World Pricing (C)

e^(-γh)[pCu+(1-p)Cd]

Formula: ∆call

e^(-∂T)*N(d1)

Forward Price Binomial Tree: u

exp[(r-∂)h+σ√h]

Forward Price Binomial Tree: d

exp[(r-∂)h-σ√h]

Jarrow-Rudd (Lognormal) Binomial Tree: u

exp[(r-∂-0.5σ^2)h+σ√h]

Jarrow-Rudd (Lognormal) Binomial Tree: d

exp[(r-∂-0.5σ^2)h-σ√h]

Cox-Ross-Rubinstein Binomial Tree: d

exp[-σ√h]

Cox-Ross-Rubinstein Binomial Tree: u

exp[σ√h]

Poset

reflexive ⋀ antisymmetric ⋀ transitive

o

01101111

r

01110010

t

01110100

u

01110101

w

01110111

x

01111000

y

01111001

z

01111010

000010

02

000011

03

110011

63

110101

65

110110

66

110111

67

0111

7

Call-Put Option Relationship: Gamma

Γcall=Γput

Call-Put Option Relationship: Psi

Ψcall-Ψput=-0.01TSe^(-∂T)

Formula: Elasticity

Ω=S∆/C

Call-Put Option Relationship: Theta

θcall-θput=[∂Se^(-∂T)-rKe^(-rT)]/365

Call-Put Option Relationship: Rho

ρcall-ρput=0.01TKe^(-rT)

Schroder Volatility Adjustment

σ(F)=σ(S)×S/F (Adjust ONLY if given historical σ)

Volatility Relationship: Option vs Underlying Asset

σ(option)=σ(stock)|Ω|

Sharpe Ratio Relationship: φcall vs φput

φcall=-φput

Sharpe Ratio (φ)

(α-r)/σ(stock)=(γ-r)/σ(call)

111011

73

j

01101010

k

01101011

Formula: ∆put

-e^(-∂T)*N(-d1)

l

01101100

m

01101101

n

01101110

000000

00

Space

00100000

Period

00101110

000001

01

a

01100001

000100

04

000101

05

000110

06

000111

07

0001

1

111101

75

111110

76

111111

77

1000

8

P(1/x, 1/K, T)

C(x,K,T)/(Kx)

001000

10

1010

10

001001

11

1011

11

001010

12

1100

12

001011

13

1101

13

001100

14

1110

14

001101

15

1111

15

001110

16

001111

17

d ? exp[(r-∂)h] ? u

<

Possible Values: Ψcall

<0

Possible Values: ρput

<0

Possible Values: ∆put

<0

Possible Values: θcall

<0 (usually)

Possible Values: θput

<0 (usually)

u ? exp[(r-∂)h] ? d

>

Possible Values: Vega(call)

>0

Possible Values: Vega(put)

>0

Possible Values: Γcall

>0

Possible Values: Γput

>0

Possible Values: Ψput

>0

Possible Values: ρcall

>0

Possible Values: ∆call

>0

10

A (hex)

What is a BlockManager?

Ans: Block Manager is a key-value store for blocks that acts as a cache. It runs on every node, i.e. a driver and executors, in a Spark runtime environment. It provides interfaces for putting and retrieving blocks both locally and remotely into various stores, i.e. memory, disk, and offheap.

What is the difference between groupByKey and reduceByKey?

Ans : Avoid groupByKey and use reduceByKey or combineByKey instead. groupByKey shuffles all the data, which is slow. reduceByKey shuffles only the results of sub-aggregations in each partition of the data.

How would you control the number of partitions of an RDD?

Ans: You can control the number of partitions of an RDD using the repartition or coalesce operations.
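
A short Scala sketch for illustration (assumes an existing SparkContext sc; the file name is made up):

val rdd = sc.textFile("events.log")        // hypothetical input
val widened = rdd.repartition(100)         // full shuffle; can increase or decrease the partition count
val narrowed = rdd.coalesce(10)            // avoids a full shuffle when only reducing partitions
println(widened.getNumPartitions)          // 100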

Which data sources can Spark process?

Ans: - Hadoop Distributed File System (HDFS) - Cassandra (NoSQL database) - HBase (NoSQL database) - S3 (Amazon Web Services storage, AWS cloud)

Can you define the purpose of master in Spark architecture?

Ans: A master is a running Spark instance that connects to a cluster manager for resources. The master acquires cluster nodes to run executors.

Block Manager

Ans: Block Manager is a key-value store for blocks that acts as a cache. It runs on every node, i.e. a driver and executors, in a Spark runtime environment. It provides interfaces for putting and retrieving blocks both locally and remotely into various stores, i.e. memory, disk, and offheap. A BlockManager manages the storage for most of the data in Spark, i.e. block that represent a cached RDD partition, intermediate shuffle data, and broadcast data.

What is a Broadcast Variable?

Ans: Broadcast variables allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks.
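
A minimal Scala sketch of the idea (the lookup table and values are made-up examples; assumes an existing SparkContext sc):

val countries = sc.broadcast(Map("US" -> "United States", "CA" -> "Canada"))  // shipped once per executor, read-only
val codes = sc.parallelize(Seq("US", "CA", "US"))
val expanded = codes.map(c => countries.value.getOrElse(c, "unknown"))        // tasks read the cached copy
expanded.collect().foreach(println)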

What is checkpointing?

Ans: Checkpointing is the process of truncating an RDD lineage graph and saving the RDD to a reliable distributed (e.g. HDFS) or local file system. RDD checkpointing saves the actual intermediate RDD data to a reliable distributed file system.

What is DAGScheduler and how does it work?

Ans: DAGScheduler is the scheduling layer of Apache Spark that implements stage-oriented scheduling, i.e. after an RDD action has been called it becomes a job that is then transformed into a set of stages that are submitted as TaskSets for execution.

What do you mean by Dependencies in RDD lineage graph?

Ans: Dependency is a connection between RDDs after applying a transformation.

What is the difference between the cache() and persist() methods of an RDD?

Ans: RDDs can be cached (using RDD's cache() operation) or persisted (using RDD's persist(newLevel: StorageLevel) operation). The cache() operation is a synonym of persist() that uses the default storage level MEMORY_ONLY .
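
A small Scala sketch of the difference (assumes an existing RDD rdd; the transformations are arbitrary):

import org.apache.spark.storage.StorageLevel
val wordCounts = rdd.map(line => (line, 1))
wordCounts.cache()                                          // same as persist(StorageLevel.MEMORY_ONLY)
val nonEmpty = rdd.filter(_.nonEmpty)
nonEmpty.persist(StorageLevel.MEMORY_AND_DISK)              // explicit storage level via persist(newLevel)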

What is Shuffling?

Ans: Shuffling is the process of repartitioning (redistributing) data across partitions, which may move data across JVMs or even across the network when it is redistributed among executors. Avoid shuffling at all cost: think about ways to leverage existing partitions, and leverage partial aggregation to reduce data transfer.

What is Apache Spark Streaming?

Ans: Spark Streaming helps to process live stream data. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.

What is Data locality / placement?

Ans: Spark relies on data locality or data placement or proximity to data source, that makes Spark jobs sensitive to where the data is located. It is therefore important to have Spark running on Hadoop YARN cluster if the data comes from HDFS.

Define Spark architecture

Ans: Spark uses a master/worker architecture. There is a driver that talks to a single coordinator called master that manages workers in which executors run. The driver and the executors run in their own Java processes.

What is Speculative Execution of tasks?

Ans: Speculative tasks, or task stragglers, are tasks that run slower than most of the other tasks in a job.

What is Task, with regards to Spark Job execution?

Ans: A task is an individual unit of work for executors to run. It is an individual unit of physical execution (computation) that runs on a single machine, on a partition of data, for part of your Spark application. All tasks in a stage must be completed before moving on to another stage. - A task can also be considered a computation in a stage on a partition in a given job attempt. - A task belongs to a single stage and operates on a single partition (a part of an RDD). - Tasks are spawned one by one for each stage and data partition.

What are the workers?

Ans: Workers or slaves are running Spark instances where executors live to execute tasks. They are the compute nodes in Spark. A worker receives serialized/marshalled tasks that it runs in a thread pool.

What is master URL in local mode?

Ans: You can run Spark in local mode using local , local[n] or the most general local[*]. The URL says how many threads can be used in total: -local uses 1 thread only. - local[n] uses n threads. - local[*] uses as many threads as the number of processors available to the Java virtual machine (it uses Runtime.getRuntime.availableProcessors() to know the number).

Which script do you use to launch a Spark application?

Ans: You use spark-submit script to launch a Spark application, i.e. submit the application to a Spark deployment environment.

11

B (hex)

You have 20 bottles of pills. 19 bottles have 1.0 gram pills, but one has pills of weight 1.1 grams. Given a scale that provides an exact measurement, how would you find the heavy bottle? You can only use the scale once.

Because we can only use the scale once, we know something interesting: we must weigh multiple pills at the same time. In fact, we know we must weigh pills from at least 19 bottles at the same time. Otherwise, if we skipped two or more bottles entirely, how could we distinguish between those skipped bottles? Remember that we only have one chance to use the scale. So how can we weigh pills from more than one bottle and discover which bottle has the heavy pills? Let's suppose there were just two bottles, one of which had heavier pills. If we took one pill from each bottle, we would get a weight of 2.1 grams, but we wouldn't know which bottle contributed the extra 0.1 grams. We know we must treat the bottles differently somehow. If we took one pill from Bottle #1 and two pills from Bottle #2, what would the scale show? It depends. If Bottle #1 were the heavy bottle, we would get 3.1 grams. If Bottle #2 were the heavy bottle, we would get 3.2 grams. And that is the trick to this problem. We know the "expected" weight of a bunch of pills. The difference between the expected weight and the actual weight will indicate which bottle contributed the heavier pills, provided we select a different number of pills from each bottle. We can generalize this to the full solution: take one pill from Bottle #1, two pills from Bottle #2, three pills from Bottle #3, and so on. Weigh this mix of pills. If all pills were one gram each, the scale would read 210 grams (1 + 2 + ... + 20 = 20 * 21 / 2 = 210). Any "overage" must come from the extra 0.1 gram pills. The bottle number is therefore (weight - 210 grams) / 0.1 grams. So, if the set of pills weighed 211.3 grams, then Bottle #13 would have the heavy pills.
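
A quick Scala check of the arithmetic, added for illustration (the heavy bottle number is an arbitrary choice):

val heavy = 13                                                       // pretend bottle #13 is the heavy one
val weights = (1 to 20).map(b => if (b == heavy) 1.1 else 1.0)       // grams per pill in each bottle
val reading = (1 to 20).map(b => b * weights(b - 1)).sum             // take b pills from bottle #b
println(math.round((reading - 210.0) / 0.1))                         // prints 13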

12

C (hex)

Call Profit

C(S(h),K,T-h)-C(S(0),K,T)e^(rh)

Put-Call Parity (General)

C(S,K,T)-P(S,K,T)=F0,T[S]*exp[-rT]-K*exp[-rT]

Black-Scholes Formula

C(S,K,σ,r,T,∂)=F0,T[S]*e^(-rT)*N(d1)-F0,T[K]*e^(-rT)*N(d2)

For K₁>K₂: C(S,K₁,T) ? C(S,K₂,T)

≤

For K₁>K₂: C(S,K₁,T)-C(S,K₂,T) ? K₂-K₁

≥

Put-Call Parity (Exchange Options)

C(S,Q,T)-P(S,Q,T)=F0,T[S]*e^(-rT)-F0,T[Q]*e^(-rT)

Put-Call Parity (Currency Options)

C(x,K,T)-P(x,K,T)=xe^(-rfT)-Ke^(-rdT)

For t<T: Camer(T) ? Camer(t)

≥

What do you understand by a closure in Scala?

Closure is a function in Scala where the return value of the function depends on the value of one or more variables that have been declared outside the function.
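
A tiny Scala example of a closure (names and values are illustrative):

var rate = 0.10                                        // declared outside the function
val applyRate = (amount: Double) => amount * rate      // closure: captures rate from the enclosing scope
println(applyRate(100))                                // 10.0
rate = 0.20
println(applyRate(100))                                // 20.0, because the closure reads the current value of rate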

What is the advantage of companion objects in Scala?

Companion objects are beneficial for encapsulating things and they act as a bridge for writing functional and object oriented programming code. Using companion objects, the Scala programming code can be kept more concise as the static keyword need not be added to each and every attribute. Companion objects provide a clear separation between static and non-static methods in a class because everything that is located inside a companion object is not a part of the class's runtime objects but is available from a static context and vice versa.
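
A small Scala sketch of the companion-object pattern (class and method names are made up):

class Account private (val id: Int)                     // constructor is private to the companion

object Account {                                        // companion object: holds what Java would mark static
  private var nextId = 0
  def apply(): Account = { nextId += 1; new Account(nextId) }   // factory; can access the class's private members
}

val a = Account()                                       // no static keyword and no new needed
println(a.id)                                           // 1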

Schroder Method

Construct tree using pre-paid forward price (i.e., S-PV(Div)). The stock price at each node is pre-paid forward price + present value of unpaid dividends (only used to determine payoff at a node).

13

D (hex)

Black-Scholes Model Pricing on Futures

Discount at FUTURE expiry instead of option expiry.

Cause of Non-Recombining Trees

Discrete Dividends (e.g., (Se^(rh)-D)e^(σ√h))

Discrete Dividend Exercise (CALL)

Exercise cum-dividend

Discrete Dividend Exercise (PUT)

Exercise ex-dividend

15

F (hex)

antireflexive (irreflexive)

For every x ∈ A, x(not R) x | every guy in the set is NOT related to itself

reflexive

For every x ∈ A, xRx | every guy in the set is related to its self

Anti Symmetric

For every x,y ∈ A, xRy and yRx → x=y. | if xRy and yRx then x equals y

Symmetric

For every x,y ∈ A, xRy → yRx | if x is related to y then y is related to x

Transitive

For every x,y,z ∈ A, xRy and yRz → xRz. | if there's some guy x that's related to y and y's related to z then, x is related to z

Greek Relationship: Written vs Purchased

Greek(Written) = -Greek(Purchased)

Vasicek model

In finance, the Vasicek model is a mathematical model describing the evolution of interest rates. It is a type of one-factor short rate model as it describes interest rate movements as driven by only one source of market risk. The model can be used in the valuation of interest rate derivatives, and has also been adapted for credit markets. It was introduced in 1977 by Oldřich Vašíček[1] and can be also seen as a stochastic investment model.

Option Greek Definition: Gamma

Increase in delta per increase in stock price (∂∆/∂S = ∂²C/∂S²) (CONVEXITY)

Option Greek Definition: Theta

Increase in option value per decrease in time to expiry (-1/365∂C/∂t)

Option Greek Definition: Delta

Increase in option value per increase in stock price (∂C/∂S) (SLOPE)

Option Greek Definition: Psi

Increase in option value per percentage point increase in the dividend yield (0.01∂C/∂∂)

Option Greek Definition: Rho

Increase in option value per percentage point increase in the risk-free rate (0.01∂C/∂r)

P(S,K,T) Payoff

max(0, K-S(T))

C(S,K,T) Payoff

max(0, S(T)-K)

Synthetic Treasury

Ke^(-rT)=P(S,K,T)-C(S,K,T)+F0,T[S]*e^(-rT)

Put Boundaries

K≥Pamer≥Peur≥max(0,Ke^(-rT)-F0,T[S]*e^(-rT))

Estimating Volatility

Let x(i)=ln[S(i)/S(i-1)]. Then E[x^2]=∑[x(i)^2/n], x-bar=∑[x(i)/n] and s^2=[n/(n-1)]*(E[x^2]-(x-bar)^2) => σ ≈ s/√h = s·√(periods per year), where h is the length of each observation period in years.
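
A compact Scala sketch of this estimator (function and parameter names are made up; h is the length of each observation period in years):

def estimateSigma(prices: Seq[Double], h: Double): Double = {
  val x = prices.sliding(2).map { case Seq(prev, cur) => math.log(cur / prev) }.toSeq  // x(i) = ln[S(i)/S(i-1)]
  val n = x.length
  val xBar = x.sum / n
  val s2 = x.map(xi => (xi - xBar) * (xi - xBar)).sum / (n - 1)   // same value as [n/(n-1)]*(E[x^2] - xBar^2)
  math.sqrt(s2) / math.sqrt(h)                                    // annualize: sigma ≈ s / sqrt(h)
}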

Early Exercise on a Non-Dividend Paying Stock

Never exercise Camer early! (i.e., Camer=Ceur)

In-the-Money

Option would have positive payout if it could be exercised.

Out-of-the-Money

Option would not have a payout if it could be exercised.

Put Profit

P(S(h),K,T-h)-P(S(0),K,T)e^(rh)

For K₁>K₂: P(S,K₁,T) ? P(S,K₂,T)

≥

For K₁>K₂: P(S,K₁,T)-P(S,K₂,T) ? K₁-K₂

≤

Exchange Option Equality

P(S,Q,T)=C(Q,S,T)

For t<T: Pamer(T) ? Pamer(t)

≥

What is the difference between concurrency and parallelism?

People often confuse the terms concurrency and parallelism. When several computations execute sequentially during overlapping time periods it is referred to as concurrency, whereas when processes are executed simultaneously it is known as parallelism. Parallel collections, Futures, and the Async library are examples of achieving parallelism in Scala.

Definition: Elasticity

Percentage change in option value as a function of the percentage change in the value of the underlying asset.

Calendar Spread Profit

Profit if S(0)=S(T), but loss possible for S(0) substantially different from S(T).

Equivalence Relation

Reflexive, Symmetric, and Transitive

Binary Relation

Relationship between two sets, A and B, where the relation is a subset of A x B

Yarn Components

ResourceManager: runs as a master daemon and manages ApplicationMasters and NodeManagers. ApplicationMaster: a lightweight process that coordinates the execution of tasks of an application and asks the ResourceManager for resource containers for tasks. It monitors tasks, restarts failed ones, etc. It can run any type of task, be they MapReduce tasks, Giraph tasks, or Spark tasks. NodeManager: offers resources (memory and CPU) as resource containers. Container: can run tasks, including ApplicationMasters.

Pre-paid Forward on a Non-Dividend Paying Stock

S(t)

Forward on a Stock with Continuous Dividends

S(t)*exp[(r-∂)(T-t)]

Pre-paid Forward on a Stock with Continuous Dividends

S(t)*exp[-∂(T-t)]

Forward on a Non-Dividend Paying Stock

S(t)*exp[r(T-t)]

Forward on a Stock with Discrete Dividends

S(t)*exp[r(T-t)]-CumValue(Div)

Pre-paid Forward on a Stock with Discrete Dividends

S(t)-PV(Div)

At-the-Money

S=K

What is a Scala Map?

Scala Map is a collection of key value pairs wherein the value in a map can be retrieved using the key. Values in a Scala Map are not unique but the keys are unique. Scala supports two kinds of maps- mutable and immutable. By default, Scala supports immutable map and to make use of the mutable map, programmers have to import the scala.collection.mutable.Map class explicitly. When programmers want to use mutable and immutable map together in the same program then the mutable map can be accessed as mutable.map and the immutable map can just be accessed with the name of the map.
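
A short Scala illustration of immutable vs mutable maps (the keys and values are made up):

val prices = Map("AAPL" -> 1.0, "MSFT" -> 2.0)        // immutable by default
// prices("GOOG") = 3.0                               // would not compile: immutable maps cannot be updated

import scala.collection.mutable
val quotes = mutable.Map("AAPL" -> 1.0)
quotes("GOOG") = 3.0                                  // mutable maps support in-place update
println(prices.get("MSFT"))                           // Some(2.0): values are retrieved by key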

Which Scala library is used for functional programming?

Scalaz library has purely functional data structures that complement the standard Scala library. It has pre-defined set of foundational type classes like Monad, Functor, etc.

Calendar Spread

Sell C(S,K,T) and Buy C(S,K,T+t) for t>0

Speculative Execution (SPARK)

Speculative execution of tasks is a health-check procedure that checks for tasks to be speculated, i.e. running slower in a stage than the median of all successfully completed tasks in a taskset . Such slow tasks will be re-launched in another worker. It will not stop the slow tasks, but run a new copy in parallel.

Black-Scholes Model Assumptions

Stock returns are normally distributed and independent over time. Risk-free rate, volatility and dividends are known and constant. No transaction costs. Possible to short-sell any stock and borrow any amount of money at the risk-free rate.

Replicating Portfolio

Sue^(∂h)∆+Be^(rh)=Cu | Sde^(∂h)∆+Be^(rh)=Cd

Early Exercise IS Optimal for Put (Necessary Conditions)

S∂<Kr | K(1-e^(-rT))>S(1-e^(-∂T))+C(S,K,T) or K-S>Peur(S,K,T)

Early Exercise IS Optimal for Call (Necessary Conditions)

S∂>Kr | S(1-e^(-∂T))>K(1-e^(-rT))+P(S,K,T) or S-K>Ceur(S,K,T)

Call Boundaries

S≥Camer≥Ceur≥max(0,F0,T[S]*e^(-rT)-K*e^(-rT))

What is a Monad in Scala?

The simplest way to define a monad is to relate it to a wrapper. Any class object is taken wrapped with a monad in Scala. Just like you wrap any gift or present into a shiny wrapper with ribbons to make them look attractive, Monads in Scala are used to wrap objects and provide two important operations - Identity through "unit" in Scala Bind through "flatMap" in Scala
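
A minimal Scala illustration using Option, one of the standard monad-like wrappers (the values are made up):

val wrapped: Option[Int] = Option(41)                 // "unit": lift a plain value into the wrapper
val result = wrapped.flatMap(n => Option(n + 1))      // "bind"/flatMap: chain computations inside the wrapper
println(result)                                       // Some(42)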

Credit event binary options (CEBO)

These are options that provide a fixed payoff if a particular company suffers a credit event such as bankruptcy, failure to pay interest or principal on debt, or a restructuring of debt.

What do you understand by "Unit" and "()" in Scala?

Unit is a subtype of scala.AnyVal and is the Scala equivalent of Java's void; it provides Scala with an abstraction over the Java platform. The empty tuple, i.e. () in Scala, is the term that represents the unit value.
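
A two-line Scala illustration (names are made up):

def logLine(msg: String): Unit = println(msg)   // Unit plays the role of Java's void
val u: Unit = ()                                // () is the single value of type Unit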

Call-Put Option Relationship: Vega

Vega(call)=Vega(put)

Spark and HDFS

With HDFS, the Spark driver contacts the NameNode about the DataNodes (ideally local) containing the various blocks of a file or directory as well as their locations (represented as InputSplits), and then schedules the work to the Spark workers. Spark's compute nodes / workers should be running on storage nodes.

Check Pointing (SPARK)

You mark an RDD for checkpointing by calling RDD.checkpoint() . The RDD will be saved to a file inside the checkpoint directory and all references to its parent RDDs will be removed. This function has to be called before any job has been executed on this RDD.
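
A minimal Scala sketch of the sequence (the directory and file names are made up; assumes an existing SparkContext sc):

sc.setCheckpointDir("hdfs:///tmp/checkpoints")            // checkpoint directory on a reliable file system
val parsed = sc.textFile("events.log").map(_.split(","))
parsed.checkpoint()                                       // call before any job has run on this RDD
parsed.count()                                            // the first action materializes and saves the checkpoint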

For K₁>K₂>K₃: [C(S,K₁,T)-C(S,K₂,T)]/[K₁-K₂] ? [C(S,K₂,T)-C(S,K₃,T)]/[K₂-K₃]

≥

For K₁>K₂>K₃: [P(S,K₁,T)-P(S,K₂,T)]/[K₁-K₂] ? [P(S,K₂,T)-P(S,K₃,T)]/[K₂-K₃]

≥

Risk-Neutral Pricing (p*)

[exp[(r-∂)h]-d]/[u-d]

Real-World Pricing (p)

[exp[(α-∂)h]-d]/[u-d]

d2

{ln{[F0,T[S]*e^(-rT)]/[F0,T[K]*e^(-rT)]}-0.5σ^2T}/(σ√T) = d1-σ√T

d1

{ln{[F0,T[S]*e^(-rT)]/[F0,T[K]*e^(-rT)]}+0.5σ^2T}/(σ√T)
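
A compact Scala sketch that assembles these d1/d2 definitions into a Black-Scholes call price, added for illustration (the normal CDF uses the standard Abramowitz-Stegun 26.2.17 approximation, which is not part of these cards; names are made up):

def normCdf(x: Double): Double = {                        // N(x) via the Abramowitz-Stegun 26.2.17 approximation
  val t = 1.0 / (1.0 + 0.2316419 * math.abs(x))
  val poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))))
  val n = 1.0 - math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.Pi) * poly
  if (x >= 0) n else 1.0 - n
}

def bsCall(s: Double, k: Double, sigma: Double, r: Double, t: Double, div: Double): Double = {
  val fpS = s * math.exp(-div * t)                        // pre-paid forward on the stock
  val fpK = k * math.exp(-r * t)                          // present value of the strike
  val d1 = (math.log(fpS / fpK) + 0.5 * sigma * sigma * t) / (sigma * math.sqrt(t))
  val d2 = d1 - sigma * math.sqrt(t)
  fpS * normCdf(d1) - fpK * normCdf(d2)                   // C = F^P(S)*N(d1) - F^P(K)*N(d2)
}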

For options on Futures: p*

[1-d]/[u-d]

For European options: the probability of reaching each ending node

(nCx)(p*^x)[(1-p*)^(n-x)], where n = number of periods and x = number of up moves

For American options: the value at each node

max(calculated value, exercise value)

For options on Futures: u

e^(σ√h)

xCy if and only if |x - y| ≤ 1, where x is a real number and y is an integer

Forward on Currency

x(t)*exp[(rd-rf)(T-t)]

Pre-paid Forward on Currency

x(t)*exp[-rf(T-t)]

011000

30

011001

31

011010

32

011011

33

011100

34

011101

35

011110

36

011111

37

0100

4

100000

40

100001

41

100010

42

100011

43

100100

44

100101

45

100110

46

100111

47

0101

5

101000

50

101001

51

101010

52

101011

53

101100

54

101101

55

101110

56

101111

57

0110

6

110000

60

110001

61

110010

62

110100

64

111000

70

111001

71

111010

72

111100

74

1001

9

Please define executors in detail?

Ans: Executors are distributed agents responsible for executing tasks. Executors provide in- memory storage for RDDs that are cached in Spark applications. When executors are started they register themselves with the driver and communicate directly to execute tasks.

What is the purpose of Driver in Spark Architecture?

Ans: A Spark driver is the process that creates and owns an instance of SparkContext. It is your Spark application that launches the main method in which the instance of SparkContext is created. - The driver splits a Spark application into tasks and schedules them to run on executors. - A driver is where the task scheduler lives and spawns tasks across workers. - A driver coordinates workers and the overall execution of tasks.

What is stage, with regards to Spark Job execution?

Ans: A stage is a set of parallel tasks, one per partition of an RDD, that compute partial results of a function executed as part of a Spark job.

What is Apache Parquet format?

Ans: Apache Parquet is a columnar storage format

You have an RDD storage level defined as MEMORY_ONLY_2; what does _2 mean?

Ans: number _2 in the name denotes 2 replicas

Does shuffling change the number of partitions?

Ans: No. By default, shuffling doesn't change the number of partitions, only their content.

What is coalesce transformation?

Ans: The coalesce transformation is used to change the number of partitions. It can trigger RDD shuffling depending on the second shuffle boolean input parameter (defaults to false ).

How can you define Spark Accumulators?

Ans: These are similar to counters in the Hadoop MapReduce framework; they give information regarding completion of tasks, how much data is processed, etc.

Please explain how workers work when a new job is submitted to them.

Ans: When a SparkContext is created, each worker starts one executor. This is a separate Java process (a new JVM), and it loads the application jar into this JVM. The executors then connect back to your driver program, and the driver sends them commands like foreach, filter, map, etc. As soon as the driver quits, the executors shut down.

When you call join operation on two pair RDDs e.g. (K, V) and (K, W), what is the result?

Ans: When called on datasets of type (K, V) and (K, W), returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key

What is the advantage of using Scala over other functional programming languages?

As the name itself indicates (Scala meaning Scalable Language), its high scalability, maintainability, productivity and testability make it advantageous to use. Singleton and companion objects in Scala provide a cleaner solution than static in other JVM languages like Java. It eliminates the need for a ternary operator, since 'if' blocks, 'for-yield' loops, and code in braces return a value in Scala.

Dominos: There is an 8x8 chessboard in which two diagonally opposite corners have been cut off. You are given 31 dominos, and a single domino can cover exactly two squares. Can you use the 31 dominos to cover the entire board? Prove your answer (by providing an example or showing why it's impossible).

At first, it seems like this should be possible. It's an 8 x 8 board, which has 64 squares, but two have been cut off, so we're down to 62 squares. A set of 31 dominoes should be able to fit there, right? When we try to lay down dominoes on row 1, which only has 7 squares, we may notice that one domino must stretch into the row 2. Then, when we try to lay down dominoes onto row 2, again we need to stretch a domino into row 3. For each row we place, we'll always have one domino that needs to poke into the next row. No matter how many times and ways we try to solve this issue, we won't be able to successfully lay down all the dominoes. There's a cleaner, more solid proof for why it won't work. The chessboard initially has 32 black and 32 white squares. By removing opposite corners (which must be the same color), we're left with 30 of one color and 32 of the other color. Let's say, for the sake of argument, that we have 30 black and 32 white squares. Each domino we set on the board will always take up one white and one black square. Therefore, 31 dominos will take up 31 white squares and 31 black squares exactly. On this board, however, we must have 30 black squares and 32 white squares. Hence, it is impossible.

Real-World Pricing (Replicating Portfolio)

Ce^(γh)=S∆e^(αh)+Be^(rh)

For t<T: Ceur(T) ? Ceur(t) on a Non-Dividend Paying Stock

≥

DAGScheduler

DAGScheduler uses an event queue architecture in which a thread can post DAGSchedulerEvent events, e.g. a new job or stage being submitted, that DAGScheduler reads and executes sequentially.

In the new post-apocalyptic world, the world queen is desperately concerned about the birth rate. Therefore, she decrees that all families should ensure that they have one girl or else they face massive fines. If all families abide by this policy (that is, they continue to have children until they have one girl, at which point they immediately stop), what will the gender ratio of the new generation be? (Assume that the odds of someone having a boy or a girl on any given pregnancy are equal.) Solve this out logically and then write a computer simulation of it.

If each family abides by this policy, then each family will have a sequence of zero or more boys followed by a single girl. That is, if "G" indicates a girl and "B" indicates a boy, the sequence of children will look like one of: G; BG; BBG; BBBG; BBBBG; and so on. We can solve this problem multiple ways. Mathematically, the probability of a family having exactly i boys before its girl is (1/2)^(i+1), so the expected number of boys per family is the sum over i of i*(1/2)^(i+1), which equals 1. Logically: since that sum is 1, the gender ratio is even. Families contribute exactly one girl and, on average, one boy. The birth policy is therefore ineffective. Does this make sense? At first glance, this seems wrong. The policy is designed to favor girls, as it ensures that all families have a girl. On the other hand, the families that keep having children contribute (potentially) multiple boys to the population. This could offset the impact of the "one girl" policy. One way to think about this is to imagine that we put the gender sequence of each family into one giant string. So if family 1 has BG, family 2 has BBG, and family 3 has G, we would write BGBBGG. In fact, we don't really care about the groupings of families because we're concerned about the population as a whole. As soon as a child is born, we can just append its gender (B or G) to the string. What are the odds of the next character being a G? Well, if the odds of having a boy and a girl are the same, then the odds of the next character being a G are 50%. Therefore, roughly half of the string should be Gs and half should be Bs, giving an even gender ratio. This actually makes a lot of sense. Biology hasn't been changed. Half of newborn babies are girls and half are boys. Abiding by some rule about when to stop having children doesn't change this fact. Therefore, the gender ratio is 50% girls and 50% boys.
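
The card asks for a computer simulation but does not include one; here is a minimal Scala sketch (the number of families and the random seed are arbitrary):

import scala.util.Random
val rng = new Random(42)
val families = 1000000
var boys = 0L
var girls = 0L
for (_ <- 1 to families) {
  while (rng.nextBoolean()) boys += 1    // each "true" is a boy; the family keeps having children
  girls += 1                             // ...until its first girl, then stops
}
println(girls.toDouble / (boys + girls)) // ≈ 0.5, i.e. an even gender ratio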

Option Greek Definition: Vega

Increase in option value per percentage point increase in volatility (0.01∂C/∂σ)

Blue-Eyed Island: A bunch of people are living on an island, when a visitor comes with a strange order: all blue-eyed people must leave the island as soon as possible. There will be a flight out at 8:00pm every evening. Each person can see everyone else's eye color, but they do not know their own (nor is anyone allowed to tell them). Additionally, they do not know how many people have blue eyes, although they do know that at least one person does. How many days will it take the blue-eyed people to leave?

Let's apply the Base Case and Build approach. Assume that there are n people on the island and c of them have blue eyes. We are explicitly told that c > 0. Case c = 1: Exactly one person has blue eyes. Assuming all the people are intelligent, the blue-eyed person should look around and realize that no one else has blue eyes. Since he knows that at least one person has blue eyes, he must conclude that it is he who has blue eyes. Therefore, he would take the flight that evening. Case c = 2: Exactly two people have blue eyes. The two blue-eyed people see each other, but are unsure whether c is 1 or 2. They know, from the previous case, that if c = 1, the blue-eyed person would leave on the first night. Therefore, if the other blue-eyed person is still there, he must deduce that c = 2, which means that he himself has blue eyes. Both men would then leave on the second night. Case c > 2: The General Case. As we increase c, we can see that this logic continues to apply. If c = 3, then those three people will immediately know that there are either 2 or 3 people with blue eyes. If there were two people, then those two people would have left on the second night. So, when the others are still around after that night, each person would conclude that c = 3 and that they, therefore, have blue eyes too. They would leave that night. This same pattern extends up through any value of c. Therefore, if c men have blue eyes, it will take c nights for the blue-eyed men to leave. All will leave on the same night

You have a basketball hoop and someone says that you can play one of two games. Game 1: You get one shot to make the hoop. Game 2: You get three shots and you have to make two of three shots. If p is the probability of making a particular shot, for which values of p should you pick one game or the other?

Probability of winning Game 1: The probability of winning Game 1 is p, by definition. Probability of winning Game 2: Let s(k, n) be the probability of making exactly k shots out of n. The probability of winning Game 2 is the probability of making exactly two shots out of three OR making all three shots. In other words: P(winning) = s(2, 3) + s(3, 3). The probability of making all three shots is: s(3, 3) = p^3. The probability of making exactly two shots is: P(making 1 and 2, and missing 3) + P(making 1 and 3, and missing 2) + P(missing 1, and making 2 and 3) = p*p*(1-p) + p*(1-p)*p + (1-p)*p*p = 3(1-p)p^2. Adding these together, we get: p^3 + 3(1-p)p^2 = p^3 + 3p^2 - 3p^3 = 3p^2 - 2p^3. Which game should you play? You should play Game 1 if P(Game 1) > P(Game 2): p > 3p^2 - 2p^3, so 1 > 3p - 2p^2, so 2p^2 - 3p + 1 > 0, i.e. (2p - 1)(p - 1) > 0. Both terms must be positive, or both must be negative. But we know p < 1, so p - 1 < 0. This means both terms must be negative: 2p - 1 < 0, so 2p < 1, so p < 0.5. So, we should play Game 1 if 0 < p < 0.5 and Game 2 if 0.5 < p < 1. If p = 0, 0.5, or 1, then P(Game 1) = P(Game 2), so it doesn't matter which game we play.
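
A quick Scala check of the two win probabilities (the sampled values of p are arbitrary):

def game1(p: Double): Double = p
def game2(p: Double): Double = 3 * p * p - 2 * p * p * p   // P(2 of 3) + P(3 of 3) = 3p^2 - 2p^3
for (p <- Seq(0.3, 0.5, 0.7))
  println(f"p=$p%.1f  game1=${game1(p)}%.3f  game2=${game2(p)}%.3f")
// p < 0.5 favors Game 1, p > 0.5 favors Game 2, p = 0.5 is a tie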

Asymmetric

∀a,b∈X(aRb→¬(bRa)) i.e. antisymmetric ⋀ irreflexive

Call-Put Option Relationship: Delta

∆call-∆put=e^(-∂T)

Ωportfolio

∆portfolio*S/Value(Portfolio) (Assumes S is the underlying asset for all portfolio instruments.)

Portfolio Greeks

∑Greek(i) where Greek(i) is the Greek for investment i in the portfolio

Possible Values Ωput

≤0

Possible Values: Ωcall

≥1

What is the Default level of parallelism in Spark?

Default level of parallelism is the number of partitions when not specified explicitly by a user.

Gamma is the greatest when an option: A) is deep out of the money. B) is deep in the money. C) is at the money.

C) is at the money. Gamma, the curvature of the option price / asset price function, is greatest when the asset is at the money.

Why does Spark exist when Hadoop already exists? (streaming)

-Near real-time data processing: Spark also supports near real-time streaming workloads via Spark Streaming application framework

Bullish Spread

A bullish spread increases in value as the stock price increases, whereas a bearish spread increases in value as the stock price decreases.

Can we broadcast an RDD?

Ans: Yes, it is possible, but you should not broadcast an RDD to use in tasks, and Spark will warn you. It will not stop you, though.

What is statistical power?

Wikipedia defines the statistical power (or sensitivity) of a binary hypothesis test as the probability that the test correctly rejects the null hypothesis (H0) when the alternative hypothesis (H1) is true. To put it another way, statistical power is the likelihood that a study will detect an effect when the effect is present. The higher the statistical power, the less likely you are to make a Type II error (concluding there is no effect when, in fact, there is).

Why does Spark exist when Hadoop already exists? (2)

- In-memory processing: MapReduce uses disk storage for storing processed intermediate data and also reads from disk, which is not good for fast processing. Spark keeps data in memory (configurable), which saves a lot of time by not reading and writing data to disk as happens in the case of Hadoop.

How do you take millions of users with 100's transactions each, amongst 10k's of products and group the users together in meaningful segments?

1. Some exploratory data analysis (get a first insight) Transactions by date Count of customers Vs number of items bought Total items Vs total basket per customer Total items Vs total basket per area 2. Create new features (per customer): Counts: Total baskets (unique days) Total items Total spent Unique product id Distributions: Items per basket Spent per basket Product id per basket Duration between visits Product preferences: proportion of items per product cat per basket 3. Too many features, dimension-reduction? PCA? 4. Clustering: PCA 5. Interpreting model fit View the clustering by principal component axis pairs PC1 Vs PC2, PC2 Vs PC1. Interpret each principal component regarding the linear combination it's obtained from; example: PC1=spendy axis (proportion of baskets containing spendy items, raw counts of items and visits)

Options Combination

A combination is defined as any strategy that uses both puts and calls

When Spark works with file.txt.gz, how many partitions can be created?

Ans: When using textFile with compressed files (file.txt.gz, not file.txt or similar), Spark disables splitting, which makes an RDD with only 1 partition (as reads against gzipped files cannot be parallelized). In such cases, to change the number of partitions you should do repartitioning, e.g. read with sc.textFile('demo.gz') and then repartition: rdd = sc.textFile('demo.gz') rdd = rdd.repartition(100) With these lines, you end up with rdd having exactly 100 partitions of roughly equal size.

What are wide transformations? (Spark)

Ans: Wide transformations are the result of groupByKey and reduceByKey . The data required to compute the records in a single partition may reside in many partitions of the parent RDD. All of the tuples with the same key must end up in the same partition, processed by the same task. To satisfy these operations, Spark must execute RDD shuffle, which transfers data across cluster and results in a new stage with a new set of partitions.

Is it possible to have multiple SparkContext in single JVM?

Ans: Yes, if spark.driver.allowMultipleContexts is true (default: false). If true, Spark logs warnings instead of throwing an exception when multiple SparkContexts are active, i.e. when multiple SparkContexts are running in the same JVM while creating an instance of SparkContext.

How would you broadcast a collection of values over the Spark executors?

Ans: sc.broadcast("hello")

When would you use random forests Vs SVM and why?

In a case of a multi-class classification problem: SVM will require one-against-all method (memory intensive) If one needs to know the variable importance (random forests can perform it as well) If one needs to get a model fast (SVM is long to tune, need to choose the appropriate kernel and its parameters, for instance sigma and epsilon) In a semi-supervised learning context (random forest and dissimilarity measure): SVM can work only in a supervised learning mode

What is latent semantic indexing? What is it used for? What are the specific limitations of the method?

Indexing and retrieval method that uses singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text Based on the principle that words that are used in the same contexts tend to have similar meanings "Latent": semantic associations between words is present not explicitly but only latently For example: two synonyms may never occur in the same passage but should nonetheless have highly associated representations Used for: Learning correct word meanings Subject matter comprehension Information retrieval Sentiment analysis (social network analysis)

Is it better to have too many false positives, or too many false negatives? Explain.

It depends on the question as well as on the domain for which we are trying to solve the question. In medical testing, false negatives may provide a falsely reassuring message to patients and physicians that disease is absent, when it is actually present. This sometimes leads to inappropriate or inadequate treatment of both the patient and their disease. So, it is desired to have too many false positive. For spam filtering, a false positive occurs when spam filtering or spam blocking techniques wrongly classify a legitimate email message as spam and, as a result, interferes with its delivery. While most anti-spam tactics can block or filter a high percentage of unwanted emails, doing so without creating significant false-positive results is a much more demanding task. So, we prefer too many false negatives over many false positives.

Why does Spark exist when Hadoop already exists?

Iterative algorithms: MapReduce is generally not good at processing iterative algorithms like machine learning and graph processing. Graph and machine learning algorithms are iterative by nature; they need data in memory to run the algorithm's steps again and again, and fewer saves to disk and fewer transfers over the network mean better performance.

Common metrics in regression:

Mean Squared Error Vs Mean Absolute Error: RMSE gives a relatively high weight to large errors. The RMSE is most useful when large errors are particularly undesirable. The MAE is a linear score: all the individual differences are weighted equally in the average. MAE is more robust to outliers than MSE. RMSE = sqrt((1/n)∑(yi−ŷi)^2); MAE = (1/n)∑|yi−ŷi|. Root Mean Squared Logarithmic Error: RMSLE penalizes an under-predicted estimate more than an over-predicted estimate (opposite to RMSE). RMSLE = sqrt((1/n)∑(log(pi+1)−log(ai+1))^2), where pi is the ith prediction, ai the ith actual response, and log(b) the natural logarithm of b. Weighted Mean Absolute Error: the weighted average of absolute errors. MAE and RMSE consider that each prediction provides equally precise information about the error variation, i.e. the standard deviation of the error term is constant over all the predictions. Examples: recommender systems (differences between past and recent products). WMAE = (1/∑wi)∑wi|yi−ŷi|.

How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything?

Often it is observed that in the pursuit of rapid innovation (aka "quick fame"), the principles of scientific methodology are violated leading to misleading innovations, i.e. appealing insights that are confirmed without rigorous validation. One such scenario is the case that given the task of improving an algorithm to yield better results, you might come with several ideas with potential for improvement. An obvious human urge is to announce these ideas ASAP and ask for their implementation. When asked for supporting data, often limited results are shared, which are very likely to be impacted by selection bias (known or unknown) or a misleading global minima (due to lack of appropriate variety in test data). Data scientists do not let their human emotions overrun their logical reasoning. While the exact approach to prove that one improvement you've brought to an algorithm is really an improvement over not doing anything would depend on the actual case at hand, there are a few common guidelines: Ensure that there is no selection bias in test data used for performance comparison Ensure that the test data has sufficient variety in order to be symbolic of real-life data (helps avoid overfitting) Ensure that "controlled experiment" principles are followed i.e. while comparing performance, the test environment (hardware, etc.) must be exactly the same while running original algorithm and new algorithm Ensure that the results are repeatable with near similar results Examine whether the results reflect local maxima/minima or global maxima/minima One common way to achieve the above guidelines is through A/B testing, where both the versions of algorithm are kept running on similar environment for a considerably long time and real-life input data is randomly split between the two. This approach is particularly common in Web Analytics.

How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression.

Proposed methods for model validation: If the values predicted by the model are far outside of the response variable range, this would immediately indicate poor estimation or model inaccuracy. If the values seem to be reasonable, examine the parameters; any of the following would indicate poor estimation or multi-collinearity: opposite signs of expectations, unusually large or small values, or observed inconsistency when the model is fed new data. Use the model for prediction by feeding it new data, and use the coefficient of determination (R squared) as a model validity measure. Use data splitting to form a separate dataset for estimating model parameters, and another for validating predictions. Use jackknife resampling if the dataset contains a small number of instances, and measure validity with R squared and mean squared error (MSE).

Explain what precision and recall are. How do they relate to the ROC curve?

ROC curve represents a relation between sensitivity (RECALL) and specificity(NOT PRECISION) and is commonly used to measure the performance of binary classifiers. However, when dealing with highly skewed datasets, Precision-Recall (PR) curves give a more representative picture of performance.

What is principal component analysis? Explain the sort of problems you would use PCA for. Also explain its limitations as a method

Statistical method that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables called principal components. Reduce the data from n to k dimensions: find the k vectors onto which to project the data so as to minimize the projection error. Algorithm: 1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables 2) Compute covariance matrix Σ 3) Compute eigenvectors of Σ 4) Choose k principal components so as to retain x% of the variance (typically x=99). Applications: 1) Compression - reduce disk/memory needed to store data - speed up the learning algorithm. Warning: the mapping should be defined only on the training set and then applied to the test set 2) Visualization: 2 or 3 principal components, so as to summarize data. Limitations: - PCA is not scale invariant - The directions with largest variance are assumed to be of most interest - Only considers orthogonal transformations (rotations) of the original variables - PCA is only based on the mean vector and covariance matrix. Some distributions (multivariate normal) are characterized by this but some are not - If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances

Explain what a false positive and a false negative are. Why is it important to distinguish these from each other? Provide examples of when false positives are more important than false negatives, when false negatives are more important than false positives, and when these two types of errors are equally important

False positive: improperly reporting the presence of a condition when it is not present in reality. Example: an HIV-positive test when the patient is actually HIV negative. False negative: improperly reporting the absence of a condition when in reality it is the case. Example: not detecting a disease when the patient has this disease. When false positives are more important than false negatives: - In a non-contagious disease, where treatment delay doesn't have any long-term consequences but the treatment itself is grueling - HIV test: psychological impact. When false negatives are more important than false positives: - If early treatment is important for good outcomes - In quality control: a defective item passes through the cracks! - Software testing: a test to catch a virus has failed

How to define/select metrics?

Type of task: regression? Classification? Business goal? What is the distribution of the target variable? What metric do we optimize for? Regression: RMSE (root mean squared error), MAE (mean absolute error), WMAE(weighted mean absolute error), RMSLE (root mean squared logarithmic error)... Classification: recall, AUC, accuracy, misclassification error, Cohen's Kappa...

Assume you need to generate a predictive model using multiple regression. Explain how you intend to validate this model

Validation using R²: - % of variance retained by the model - Issue: R² is always increased when adding variables - R² = (RSStot − RSSres)/RSStot = RSSreg/RSStot = 1 − RSSres/RSStot. Analysis of residuals: - Heteroskedasticity (relation between the variance of the model errors and the size of an independent variable's observations) - Scatter plots of residuals Vs predictors - Normality of errors - Etc.: diagnostic plots. Out-of-sample evaluation: with cross-validation.

Ridge regression:

We use an L2 penalty when fitting the model using least squares. We add to the minimization problem an expression (shrinkage penalty) of the form λ × ∑(squared coefficients). λ: tuning parameter; controls the bias-variance tradeoff; assessed with cross-validation. A bit faster than the lasso. β̂(ridge) = argmin over β of {∑i(yi − β0 − ∑j xij βj)² + λ∑j βj²}

What are feature vectors?

n-dimensional vector of numerical features that represent some object term occurrences frequencies, pixels of an image etc. Feature space: vector space associated with these vectors

Explain what resampling methods are and why they are useful

repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model example: repeatedly draw different samples from training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fit differ most common are: cross-validation and the bootstrap cross-validation: random sampling with no replacement bootstrap: random sampling with replacement cross-validation: evaluating model performance, model selection (select the appropriate level of flexibility) bootstrap: mostly used to quantify the uncertainty associated with a given estimator or statistical learning method

The fixed-rate payer in an interest-rate swap has a position equivalent to a series of: A) long interest-rate puts and short interest-rate calls. B) short interest-rate puts and long interest-rate calls. C) long interest-rate puts and calls.

B) short interest-rate puts and long interest-rate calls. The fixed-rate payer has profits when short rates rise and losses when short rates fall, equivalent to writing puts and buying calls.

What does NLP stand for?

"Natural language processing"! Interaction with human (natural) and computers languages Involves natural language understanding Major tasks: - Machine translation - Question answering: "what's the capital of Canada?" - Sentiment analysis: extract subjective information from a set of documents, identify trends or public opinions in the social media - Information retrieval

Options Credit Spread

A credit spread results from buying a long position that costs less than the premium received selling the short position of the spread

Which data scientists do you admire most? which startups?

DJ Patil, First US Chief Data Scientist, for using Data Science to make US government work better. Hadley Wickham, for his fantastic work on Data Science and Data Visualization in R, including dplyr, ggplot2, and Rstudio.

RDD

Resilient: fault-tolerant, so able to recompute missing or damaged partitions on node failures with the help of the RDD lineage graph. Distributed: across clusters. Dataset: a collection of partitioned data.

Which scheduler is used by SparkContext by default?

By default, SparkContext uses DAGScheduler , but you can develop your own custom DAGScheduler implementation.

Vertical Spread

A money spread, or vertical spread, involves the buying of options and the writing of other options with different strike prices, but with the same expiration dates.

Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A solution that is optimal within a neighboring set of candidate solutions, in contrast with the global optimum: the optimal solution among all others. K-means clustering context: it's proven that the objective cost function will always decrease until a local optimum is reached, and results will depend on the initial random cluster assignment. Determining if you have a local optimum problem: tendency of premature convergence; different initializations induce different optima. Avoiding local optima in a K-means context: repeat K-means and take the solution that has the lowest cost.

Calendar Spread

A time spread, or calendar spread, involves buying and writing options with different expiration dates. A horizontal spread is a time spread with the same strike prices. A diagonal spread has different strike prices and different expiration dates.

A stock is priced at 38 and the periodic risk-free rate of interest is 6%. What is the value of a two-period European put option with a strike price of 35 on a share of stock, using a binomial model with an up factor of 1.15 and a risk-neutral probability of 68%? A) $0.57. B) $0.64. C) $2.58.

A) $0.57. Given an up factor of 1.15, the down factor is simply the reciprocal of this number: 1/1.15 = 0.87. Two down moves produce a stock price of 38 × 0.87² = 28.73 and a put value at the end of two periods of 6.27. An up and a down move, as well as two up moves, leave the put option out of the money. You are directly given the probability of up = 0.68. The down probability = 0.32. The value of the put option is [0.32² × 6.27] / 1.06² = $0.57.

Which of the following is equivalent to a plain vanilla receive-fixed currency swap? A) A long position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. B) A short position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. C) A short position in a foreign bond coupled with a long position in a dollar-denominated floating rate note.

A) A long position in a foreign bond coupled with the issuance of a dollar-denominated floating rate note. A long position in a fixed-rate foreign bond will receive fixed coupons denominated in a foreign currency. The short floating rate note requires U.S. dollar-denominated floating-rate payments. Combined, these are the same cash flows as a plain vanilla currency swap.

For a change in which of the following inputs into the Black-Scholes-Merton option pricing model will the direction of the change in a put's value and the direction of the change in a call's value be the same? A) Volatility. B) Exercise price. C) Risk-free rate.

A) Volatility. A decrease/increase in the volatility of the price of the underlying asset will decrease/increase both put values and call values. A change in the values of the other inputs will have opposite effects on the values of puts and calls.

For an interest rate swap, the swap spread is the difference between the: A) swap rate and the corresponding Treasury rate. B) fixed rate and the floating rate in a given period. C) average fixed rate and the average floating rate over the life of the contract.

A) swap rate and the corresponding Treasury rate. The swap spread is the swap rate minus the corresponding Treasury rate.

What is a RDD Lineage Graph

Ans: A RDD Lineage Graph (aka RDD operator graph) is a graph of the parent RDD of a RDD. It is built as a result of applying transformations to the RDD. A RDD lineage graph is hence a graph of what transformations need to be executed after an action has been called

How do you define RDD?

Ans: A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. It represents an immutable, partitioned collection of elements that can be operated on in parallel. Resilient Distributed Datasets (RDDs) are a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.

What is Preferred Locations

Ans: A preferred location (aka locality preferences or placement preferences) is a block location for an HDFS file where to compute each partition on. def getPreferredLocations(split: Partition): Seq[String] specifies placement preferences for a partition in an RDD.

What is the transformation?

Ans: A transformation is a lazy operation on an RDD that returns another RDD, like map, flatMap, filter, reduceByKey, join, cogroup, etc. Transformations are lazy and are not executed immediately, but only after an action has been executed.

How do you define actions?

Ans: An action is an operation that triggers execution of RDD transformations and returns a value (to a Spark driver - the user program). They trigger execution of RDD transformations to return values. Simply put, an action evaluates the RDD lineage graph. You can think of actions as a valve and until no action is fired, the data to be processed is not even in the pipes, i.e. transformations. Only actions can materialize the entire processing pipeline with real data.

Data is spread in all the nodes of cluster, how spark tries to process this data?

Ans: By default, Spark tries to read data into an RDD from the nodes that are close to it. Since Spark usually accesses distributed partitioned data, to optimize transformation operations it creates partitions to hold the data chunks

Please tell me how execution starts and ends on an RDD or Spark job.

Ans: Execution Plan starts with the earliest RDDs (those with no dependencies on other RDDs or reference cached data) and ends with the RDD that produces the result of the action that has been called to execute.

How do you define SparkContext?

Ans: It's an entry point for a Spark Job. Each Spark application starts by instantiating a Spark context. A Spark application is an instance of SparkContext. Or you can say, a Spark context constitutes a Spark application. SparkContext represents the connection to a Spark execution environment (deployment mode). A Spark context can be used to create RDDs, accumulators and broadcast variables, access Spark services and run jobs.

What is Lazy evaluated RDD mean?

Ans: Lazy evaluated, i.e. the data inside RDD is not available or transformed until an action is executed that triggers the execution.

Why Spark is good at low-latency iterative workloads e.g. Graphs and Machine Learning?

Ans: Machine learning algorithms, for instance logistic regression, require many iterations before creating an optimal resulting model, and similarly graph algorithms traverse all the nodes and edges. Any algorithm which needs many iterations before producing results can increase its performance when the intermediate partial results are stored in memory or on very fast solid state drives.

How can you use the SciKit machine learning library, which is written in Python, with the Spark engine?

Ans: A machine learning tool written in Python, e.g. the SciKit library, can be used via the Pipeline API in Spark MLlib or by calling pipe().

What is Narrow Transformations? (Spark)

Ans: Narrow transformations are the result of map, filter and similar operations whose output depends on the data from a single partition only, i.e. each output partition is self-contained. An output RDD has partitions whose records originate from a single partition in the parent RDD, so only a limited subset of partitions is used to calculate the result. Spark groups narrow transformations into a single stage.

Can RDD be shared between SparkContexts?

Ans: No. When an RDD is created, it belongs to and is completely owned by the Spark context it originated from. RDDs can't be shared between SparkContexts.

What are the possible operations on RDD?

Ans: RDDs support two kinds of operations: transformations (lazy operations that return another RDD) and actions (operations that trigger computation and return values).

How would you set the amount of memory to allocate to each executor?

Ans: SPARK_EXECUTOR_MEMORY sets the amount of memory to allocate to each executor.

How many concurrent tasks can Spark run for an RDD partition?

Ans: Spark can only run 1 concurrent task for every partition of an RDD, up to the number of cores in your cluster. So if you have a cluster with 50 cores, you want your RDDs to have at least 50 partitions (and probably 2-3x that). As far as choosing a "good" number of partitions, you generally want at least as many as the number of executors for parallelism. You can get this computed value by calling sc.defaultParallelism.
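
A minimal Spark-shell sketch: inspect the default parallelism and an RDD's partition count, then repartition to roughly 2-3x the available cores, as suggested above (the data and the factor of 3 are assumed examples).

    val rdd = sc.parallelize(1 to 1000)
    println(sc.defaultParallelism)                          // parallelism Spark computed for the cluster
    println(rdd.partitions.length)                          // partitions of this RDD
    val wider = rdd.repartition(sc.defaultParallelism * 3)  // spread work over more tasks
    println(wider.partitions.length)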

How does RDD help parallel job processing?

Ans: Spark does jobs in parallel, and RDDs are split into partitions to be processed and written in parallel. Inside a partition, data is processed sequentially.

Why are both Spark and Hadoop needed?

Ans: Spark is often called a cluster computing engine or simply an execution engine. Spark uses many concepts from Hadoop MapReduce, and the two work well together. Spark with HDFS and YARN gives better performance and also simplifies the work distribution on the cluster: HDFS is the storage engine for huge volumes of data, and Spark is the processing engine (in-memory and more efficient data processing). HDFS: used as a storage engine for Spark as well as Hadoop. YARN: a framework to manage the cluster using a pluggable scheduler. Run more than MapReduce: with Spark you can run MapReduce algorithms as well as higher-level operators, for instance map(), filter(), reduceByKey(), groupByKey(), etc.

Which kinds of data processing are supported by Spark?

Ans: Spark offers three kinds of data processing using batch, interactive (Spark Shell), and stream processing with the unified API and data structures.

How can you define SparkConf?

Ans: Spark properties control most application settings and are configured separately for each application. These properties can be set directly on a SparkConf passed to your SparkContext. SparkConf allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through the set() method. For example, we could initialize an application with two threads, as in the sketch below. Note that we run with local[2], meaning two threads, which represents minimal parallelism and can help detect bugs that only exist when we run in a distributed context.
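
A minimal sketch of the example referred to above; the application name and the extra executor-memory property are arbitrary placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("local[2]")                  // two local threads: minimal parallelism
      .setAppName("MyTestApp")                // assumed application name
      .set("spark.executor.memory", "1g")     // an arbitrary key-value pair via set()
    val sc = new SparkContext(conf)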

What is the advantage of broadcasting values across Spark Cluster?

Ans: Spark transfers the value to Spark executors once, and tasks can share it without incurring repetitive network transmissions when requested multiple times.
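
A minimal Spark-shell sketch: broadcast a small lookup table once so every task reads its executor's copy instead of having the table shipped with each task (the table and codes are toy example data).

    val countryNames = Map("DE" -> "Germany", "FR" -> "France")
    val bc = sc.broadcast(countryNames)                     // sent to executors once
    val codes = sc.parallelize(Seq("DE", "FR", "DE"))
    val named = codes.map(code => bc.value.getOrElse(code, "unknown"))
    named.collect().foreach(println)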

Give a few examples of how an RDD can be created using SparkContext.

Ans: SparkContext allows you to create many different RDDs from input sources such as: Scala collections, e.g. sc.parallelize(0 to 100); local or remote filesystems, e.g. sc.textFile("README.md"); any Hadoop InputSource, using sc.newAPIHadoopFile.
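
Minimal sketches of the three input sources named above; the file paths are assumed examples.

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    val fromCollection = sc.parallelize(0 to 100)           // Scala collection
    val fromTextFile   = sc.textFile("README.md")           // local or remote file
    val fromHadoop     = sc.newAPIHadoopFile(                // any Hadoop InputFormat
      "hdfs:///tmp/input.txt",                               // assumed path
      classOf[TextInputFormat], classOf[LongWritable], classOf[Text])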

In Spark-Shell, which all contexts are available by default?

Ans: SparkContext and SQLContext

How can you create an RDD for a text file?

Ans: SparkContext.textFile

How can we distribute JARs to workers?

Ans: The jar you specify with SparkContext.addJar will be copied to all the worker nodes.

Which limits the maximum size of a partition?

Ans: The maximum size of a partition is ultimately limited by the available memory of an executor.

Give examples of transformations that do trigger jobs (Spark).

Ans: There are a couple of transformations that do trigger jobs, e.g. sortBy, zipWithIndex, etc.

Which are the ways to configure Spark properties? Order them from least important to most important.

Ans: There are the following ways to set up properties for Spark and user programs, in order from least important to most important: conf/spark-defaults.conf (the defaults); the --conf command-line option used by spark-shell and spark-submit; and SparkConf in the application code.

How many types of transformations exist? (Spark)

Ans: There are two kinds of transformations: -narrow transformations -wide transformations

How would you hint a minimum number of partitions for a transformation?

Ans: You can request a minimum number of partitions using the second input parameter of many transformations, e.g. scala> sc.parallelize(1 to 100, 2).count. The preferred way to set the number of partitions for an RDD is to pass it directly as the second input parameter in the call, e.g. rdd = sc.textFile("hdfs://... /file.txt", 400), where 400 is the number of partitions. In this case the partitioning into 400 splits is done by Hadoop's TextInputFormat, not Spark, and it works much faster. The code also spawns 400 concurrent tasks to try to load file.txt directly into 400 partitions.

How can you stop SparkContext and what is the impact if stopped?

Ans: You can stop a Spark context using SparkContext.stop() method. Stopping a Spark context stops the Spark Runtime Environment and effectively shuts down the entire Spark application

A portfolio manager holds 100,000 shares of IPRD Company (which is trading today for $9 per share) for a client. The client informs the manager that he would like to liquidate the position on the last day of the quarter, which is 2 months from today. To hedge against a possible decline in price during the next two months, the manager enters into a forward contract to sell the IPRD shares in 2 months. The risk-free rate is 2.5%, and no dividends are expected to be received during this time. However, IPRD has a historical dividend yield of 3.5%. The forward price on this contract is closest to: A) $905,175. B) $903,712. C) $901,494.

B) $903,712. The historical dividend yield is irrelevant for calculating the no-arbitrage forward price because no dividends are expected to be paid during the life of the forward contract. FP = S0 × (1 + Rf)^T = 900,000 × (1.025)^(2/12) = $903,712.
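
A quick check of the arithmetic above, as a plain Scala sketch.

    val fp = 900000.0 * math.pow(1.025, 2.0 / 12.0)   // S0 x (1 + Rf)^T
    println(f"$fp%.0f")                               // prints approximately 903712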

An instantaneously riskless hedged portfolio has a delta of: A) anything; gamma determines the instantaneous risk of a hedge portfolio. B) 0. C) 1.

B) 0. A riskless portfolio is delta-neutral; its delta is zero.

Which of the following is the best approximation of the gamma of an option if its delta is equal to 0.6 when the price of the underlying security is 100 and 0.7 when the price of the underlying security is 110? A) 1.00. B) 0.01. C) 0.10.

B) 0.01. The gamma of an option is computed as follows: Gamma = change in delta / change in the price of the underlying = (0.7 − 0.6)/(110 − 100) = 0.01

Referring to put-call parity, which one of the following alternatives would allow you to create a synthetic European call option? A) Sell the stock; buy a European put option on the same stock with the same exercise price and the same maturity; invest an amount equal to the present value of the exercise price in a pure-discount riskless bond. B) Buy the stock; buy a European put option on the same stock with the same exercise price and the same maturity; short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond. C) Buy the stock; sell a European put option on the same stock with the same exercise price and the same maturity; short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond.

B) Buy the stock; buy a European put option on the same stock with the same exercise price and the same maturity; short an amount equal to the present value of the exercise price worth of a pure-discount riskless bond. According to put-call parity we can write a European call as: C0 = P0 + S0 − X/(1 + Rf)^T. We can then read off the right-hand side of the equation to create a synthetic position in the call. We would need to buy the European put, buy the stock, and short or issue a riskless pure-discount bond equal in value to the present value of the exercise price.
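
A quick numeric illustration of the parity relation above, as a plain Scala sketch; all inputs (S0 = 100, X = 100, Rf = 5%, T = 1 year, put price P0 = 5.57) are assumed toy values.

    val s0 = 100.0; val x = 100.0; val rf = 0.05; val t = 1.0; val p0 = 5.57
    val syntheticCall = p0 + s0 - x / math.pow(1 + rf, t)   // C0 = P0 + S0 - X/(1+Rf)^T
    println(f"synthetic call value: $syntheticCall%.2f")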

The floating-rate payer in a simple interest-rate swap has a position that is equivalent to: A) a series of long forward rate agreements (FRAs). B) a series of short FRAs. C) issuing a floating-rate bond and a series of long FRAs.

B) a series of short FRAs. The floating-rate payer has a liability/gain when rates increase/decrease above the fixed contract rate; the short position in an FRA has a liability/gain when rates increase/decrease above the contract rate.

Which of the following statements regarding an option's price is CORRECT? An option's price is: A) a decreasing function of the underlying asset's volatility when it has a long time remaining until expiration and an increasing function of its volatility if the option is close to expiration. B) an increasing function of the underlying asset's volatility. C) a decreasing function of the underlying asset's volatility.

B) an increasing function of the underlying asset's volatility. Since an option has limited risk but significant upside potential, its value always increases when the volatility of the underlying asset increases.

Writing a series of interest-rate puts and buying a series of interest-rate calls, all at the same exercise rate, is equivalent to: A) a short position in a series of forward rate agreements. B) being the fixed-rate payer in an interest rate swap. C) being the floating-rate payer in an interest rate swap.

B) being the fixed-rate payer in an interest rate swap. A short position in interest-rate puts will have a negative payoff when rates are below the exercise rate; the calls will have positive payoffs when rates exceed the exercise rate. This mirrors the payoffs of the fixed-rate payer, who will receive positive net payments when settlement rates are above the fixed rate.

An investor who anticipates the need to exit a pay-fixed interest rate swap prior to expiration might: A) buy a payer swaption. B) buy a receiver swaption. C) sell a payer swaption.

B) buy a receiver swaption. A receiver swaption will, if exercised, provide a fixed payment to offset the investor's fixed obligation, and allow him to pay floating rates if they decrease.

Which of the following statements regarding the goal of a delta-neutral portfolio is most accurate? One example of a delta-neutral portfolio is to combine a: A) long position in a stock with a short position in a call option so that the value of the portfolio changes with changes in the value of the stock. B) long position in a stock with a short position in call options so that the value of the portfolio does not change with changes in the value of the stock. C) long position in a stock with a long position in call options so that the value of the portfolio does not change with changes in the value of the stock.

B) long position in a stock with a short position in call options so that the value of the portfolio does not change with changes in the value of the stock. A delta-neutral portfolio can be created with any of the following combinations: long stock and short calls, long stock and long puts, short stock and long calls, and short stock and short puts.

In order to compute the implied asset price volatility for a particular option, an investor: A) must have a series of asset prices. B) must have the market price of the option. C) does not need to know the risk-free rate.

B) must have the market price of the option. In order to compute the implied volatility we need the risk-free rate, the current asset price, the time to expiration, the exercise price, and the market price of the option.

Consider a fixed-rate semiannual-pay equity swap where the equity payments are the total return on a $1 million portfolio and the following information: 180-day LIBOR is 4.2%, 360-day LIBOR is 4.5%, dividend yield on the portfolio = 1.2%. What is the fixed rate on the swap? A) 4.5143%. B) 4.3232%. C) 4.4477%.

C) 4.4477%. The semiannual fixed rate is (1 − Z360)/(Z180 + Z360), where Z180 = 1/(1 + 0.042 × 180/360) = 0.97943 and Z360 = 1/(1 + 0.045 × 360/360) = 0.95694. That gives (1 − 0.95694)/(0.97943 + 0.95694) = 0.022239 per period, or 0.022239 × 2 = 4.4477% annualized.
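
A quick check of the arithmetic above, as a plain Scala sketch.

    val z180 = 1.0 / (1.0 + 0.042 * 180.0 / 360.0)   // 180-day discount factor
    val z360 = 1.0 / (1.0 + 0.045 * 360.0 / 360.0)   // 360-day discount factor
    val periodic = (1.0 - z360) / (z180 + z360)      // approximately 0.022239 per semiannual period
    println(periodic * 2)                            // approximately 0.044477, i.e. 4.4477% annualized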

A U.S. firm (U.S.) and a foreign firm (F) engage in a 3-year, annual-pay plain-vanilla currency swap; U.S. is the fixed rate payer in FC. The fixed rate at initiation was 5%. The variable rate at the end of year 1 was 4%, at the end of year 2 was 6%, and at the end of year 3 was 7%. At the beginning of the swap, $2 million was exchanged at an exchange rate of 2 foreign units per $1. At the end of the swap period the exchange rate was 1.75 foreign units per $1. At the end of year 1, firm: A) F pays firm U.S. $200,000. B) U.S. pays firm F $200,000. C) U.S. pays firm F 200,000 foreign units.

C) U.S. pays firm F 200,000 foreign units. A plain-vanilla currency swap pays floating on dollars and fixed on the foreign currency. Fixed on foreign: 0.05 × $2,000,000 × 2 foreign units per $1 = 200,000 foreign units paid by the U.S. firm.

How is market backwardation related to an asset's convenience yield? If the convenience yield is: A) positive, causing the futures price to be below the spot price and the market is in backwardation. B) negative, causing the futures price to be below the spot price and the market is in backwardation. C) larger than the borrowing rate, causing the futures price to be below the spot price and the market is in backwardation.

C) larger than the borrowing rate, causing the futures price to be below the spot price and the market is in backwardation. When the convenience yield is more than the borrowing rate, the no-arbitrage cost-of-carry model will not apply. It means that the convenience of holding the asset is worth more than the cost of funds to purchase it. This usually applies to non-financial futures contracts.

Compared to the value of a call option on a stock with no dividends, a call option on an identical stock expected to pay a dividend during the term of the option will have a: A) higher value only if it is an American style option. B) lower value only if it is an American style option. C) lower value in all cases.

C) lower value in all cases. An expected dividend during the term of an option will decrease the value of a call option.

Explain what resampling methods are and why they are useful. Also explain their limitations.

Classical parametric statistical tests compare observed statistics to theoretical sampling distributions. Resampling is a data-driven, not theory-driven, methodology based upon repeated sampling from within the same sample. Resampling refers to methods for doing one of the following: estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of the available data (jackknifing) or by drawing randomly with replacement from a set of data points (bootstrapping); exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests); and validating models by using random subsets (bootstrapping, cross-validation). The main limitations are computational cost, since the statistic or model must be recomputed many times, and the fact that resampling cannot fix a sample that is not representative of the population in the first place.
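
A minimal bootstrap sketch in plain Scala: estimate the sampling variability of the median by resampling with replacement. The toy data and the choice of 1,000 replicates are assumptions for illustration.

    import scala.util.Random

    val data = Vector(3.1, 4.7, 2.2, 5.9, 4.1, 3.8, 6.3, 2.9)   // assumed toy sample
    def median(xs: Vector[Double]): Double = {
      val s = xs.sorted
      if (s.length % 2 == 1) s(s.length / 2) else (s(s.length / 2 - 1) + s(s.length / 2)) / 2.0
    }
    def stddev(xs: Vector[Double]): Double = {
      val m = xs.sum / xs.length
      math.sqrt(xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1))
    }
    val rng = new Random(42)
    val medians = Vector.fill(1000) {                            // 1,000 bootstrap replicates
      val resample = Vector.fill(data.length)(data(rng.nextInt(data.length)))  // draw with replacement
      median(resample)
    }
    println(f"bootstrap standard error of the median: ${stddev(medians)}%.3f")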

What is cross-validation? How to do it right?

It's a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model during the training phase (i.e. a validation data set) in order to limit problems like overfitting and to get an insight into how the model will generalize to an independent data set. Examples: leave-one-out cross-validation, k-fold cross-validation. How to do it right? The training and validation data sets have to be drawn from the same population; when predicting stock prices, for instance, a model trained on a certain 5-year period cannot realistically treat the subsequent 5-year period as a draw from the same population. A common mistake: steps such as choosing the kernel parameters of an SVM should be cross-validated as well. Bias-variance trade-off for k-fold cross-validation: leave-one-out cross-validation gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n − 1 observations). But we average the outputs of n fitted models, each trained on an almost identical set of observations, so the outputs are highly correlated; since the variance of a mean of quantities increases when the correlation of those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation. Typically we choose k = 5 or k = 10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
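
A minimal fold-splitting sketch in plain Scala (n = 100 and k = 5 are assumed values). It returns (training indices, validation indices) pairs: fit on the first, evaluate on the second, and average the k error estimates.

    import scala.util.Random

    def kFoldIndices(n: Int, k: Int, seed: Long = 0L): Seq[(Seq[Int], Seq[Int])] = {
      val shuffled = new Random(seed).shuffle((0 until n).toVector)
      val folds = shuffled.grouped(math.ceil(n.toDouble / k).toInt).toVector
      folds.indices.map { i =>
        val validation = folds(i)                                   // held-out fold
        val training = folds.indices.filter(_ != i).flatMap(j => folds(j))  // remaining folds
        (training, validation)
      }
    }
    val splits = kFoldIndices(n = 100, k = 5)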

Explain what is "Over the Counter Market"?

Over the counter market is a decentralized market, which does not have a physical location, where market traders or participants trade with one another through various communication modes such as telephone, e-mail and proprietary electronic trading systems.

What is root cause analysis?

Root cause analysis (RCA) is a method of problem solving used for identifying the root causes of faults or problems. A factor is considered a root cause if removal thereof from the problem-fault sequence prevents the final undesirable event from recurring; whereas a causal factor is one that affects an event's outcome but is not a root cause. Essentially, you can find the root cause of a problem and show the relationship of causes by repeatedly asking the question "Why?" until you find the root of the problem. This technique is commonly called "5 Whys", although it can involve more or fewer than five questions.

Are you familiar with price optimization, price elasticity, inventory management, competitive intelligence? Give examples.

Price optimization is the use of mathematical tools to determine how customers will respond to different prices for products and services offered through different channels. Big data and data mining enable the use of personalization for price optimization; companies like Amazon can even take optimization further and show different prices to different visitors based on their history, although there is a strong debate about whether this is fair. Price elasticity in common usage typically refers to price elasticity of demand, a measure of price sensitivity. It is computed as: Price Elasticity of Demand = % Change in Quantity Demanded / % Change in Price. Similarly, price elasticity of supply is an economics measure that shows how the quantity supplied of a good or service responds to a change in its price. Inventory management is the overseeing and controlling of the ordering, storage and use of components that a company will use in the production of the items it will sell, as well as the overseeing and controlling of quantities of finished products for sale. Wikipedia defines competitive intelligence as the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers in making strategic decisions for an organization. Tools like Google Trends, Alexa, and Compete can be used to determine general trends and analyze your competitors on the web.

Common metrics in classification:

Recall / Sensitivity / True Positive Rate: high when FN is low; sensitive to unbalanced classes. Sensitivity = TP/(TP + FN). Precision / Positive Predictive Value: high when FP is low; sensitive to unbalanced classes. Precision = TP/(TP + FP). Specificity / True Negative Rate: high when FP is low; sensitive to unbalanced classes. Specificity = TN/(TN + FP). Accuracy: high when FP and FN are low; sensitive to unbalanced classes (see "accuracy paradox"). Accuracy = (TP + TN)/(TN + TP + FP + FN). ROC / AUC: the ROC is a graphical plot that illustrates the performance of a binary classifier (sensitivity vs 1 − specificity, or sensitivity vs specificity); it is not sensitive to unbalanced classes. AUC is the area under the ROC curve; a perfect classifier has AUC = 1 and falls on (0, 1), i.e. 100% sensitivity (no FN) and 100% specificity (no FP). Logarithmic loss: punishes deviation from the true value without bound, so it is better to be somewhat wrong than emphatically wrong. LogLoss = −(1/N) × Σ [yi × log(pi) + (1 − yi) × log(1 − pi)]. Misclassification rate: Misclassification = (1/n) × Σ I(yi ≠ ŷi). F1-Score: used when the target variable is unbalanced. F1 = 2 × (Precision × Recall)/(Precision + Recall).
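
A minimal Scala sketch computing the point metrics above from confusion-matrix counts; the counts in the usage line are assumed toy values.

    case class Confusion(tp: Double, fp: Double, tn: Double, fn: Double) {
      def recall: Double      = tp / (tp + fn)                 // sensitivity / true positive rate
      def precision: Double   = tp / (tp + fp)                 // positive predictive value
      def specificity: Double = tn / (tn + fp)                 // true negative rate
      def accuracy: Double    = (tp + tn) / (tp + tn + fp + fn)
      def f1: Double          = 2 * precision * recall / (precision + recall)
    }
    val c = Confusion(tp = 80, fp = 10, tn = 95, fn = 15)
    println(f"recall=${c.recall}%.2f precision=${c.precision}%.2f F1=${c.f1}%.2f")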

Explain what regularization is and why it is useful.

Regularization is the process of adding a tuning parameter to a model to induce smoothness in order to prevent overfitting. This is most often done by adding a constant multiple of an existing weight vector's norm to the loss; this norm is often the L1 (lasso) or L2 (ridge) norm, but it can in actuality be any norm. The model should then minimize the regularized loss function calculated on the training set.
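
A minimal sketch in plain Scala of adding an L2 (ridge) or L1 (lasso) penalty to a mean-squared-error loss; the weight vector, data, and lambda are placeholders, not a specific library API.

    def mse(w: Vector[Double], xs: Vector[Vector[Double]], ys: Vector[Double]): Double =
      xs.zip(ys).map { case (x, y) =>
        val pred = x.zip(w).map { case (xi, wi) => xi * wi }.sum   // linear prediction
        (pred - y) * (pred - y)
      }.sum / xs.length

    def ridgeLoss(w: Vector[Double], xs: Vector[Vector[Double]], ys: Vector[Double], lambda: Double): Double =
      mse(w, xs, ys) + lambda * w.map(wi => wi * wi).sum           // L2 penalty

    def lassoLoss(w: Vector[Double], xs: Vector[Vector[Double]], ys: Vector[Double], lambda: Double): Double =
      mse(w, xs, ys) + lambda * w.map(wi => math.abs(wi)).sum      // L1 penalty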

What is selection bias, why is it important and how can you avoid it?

Selection bias, in general, is a problematic situation in which error is introduced due to a non-random population sample. For example, if a given sample of 100 test cases was made up of a 60/20/15/5 split of 4 classes which actually occurred in relatively equal numbers in the population, then a given model may make the false assumption that probability could be the determining predictive factor. Avoiding non-random samples is the best way to deal with bias; however, when this is impractical, techniques such as resampling, boosting, and weighting are strategies which can be introduced to help deal with the situation.

Spark Low-Latency

Spark can cache/store intermediate data in memory, which makes model building and training faster. Also, when graph algorithms are processed, Spark traverses the graph one connection per iteration with the partial results kept in memory. Less disk access and less network traffic can make a huge difference when you need to process lots of data.
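
A minimal Spark-shell sketch: cache() keeps the parsed RDD in memory so each iteration reuses it instead of re-reading and re-parsing the file. The file name, parsing, update rule, and iteration count are placeholder assumptions, only meant to illustrate repeated passes over cached data.

    val points = sc.textFile("points.txt")
      .map(_.split(",").map(_.toDouble))
      .cache()                                               // keep partitions in memory after first use
    var w = 0.0
    for (_ <- 1 to 10) {                                     // each iteration reuses the cached RDD
      val update = points.map(p => p(0) * (p(1) - w)).sum()  // one full pass per iteration
      w += 0.01 * update
    }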

What is the difference between supervised learning and unsupervised learning? Give concrete examples

Supervised learning is inferring a function from labeled training data: predictor measurements are associated with a response measurement, and we fit a model relating the two, either to better understand the relation between them (inference) or to accurately predict the response for future observations (prediction). Supervised methods: support vector machines, neural networks, linear regression, logistic regression, extreme gradient boosting. Supervised examples: predict the price of a house based on its area and size; churn prediction; predict the relevance of search engine results. Unsupervised learning is inferring a function to describe the hidden structure of unlabeled data: we lack a response variable that can supervise our analysis. Unsupervised methods: clustering, principal component analysis, singular value decomposition. Unsupervised examples: find customer segments; image segmentation; classify US senators by their voting records.

Consider a 9-month forward contract on a 10-year 7% Treasury note just issued at par. The effective annual risk-free rate is 5% over the near term and the first coupon is to be paid in 182 days. The price of the forward is closest to: A) 1,037.27. B) 1,001.84. C) 965.84.

B) 1,001.84. The forward price is calculated as the bond price minus the present value of the coupon, times one plus the risk-free rate for the term of the forward: (1,000 − 35/1.05^(182/365)) × 1.05^(9/12) = $1,001.84
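
A quick check of the arithmetic above, as a plain Scala sketch.

    val pvCoupon = 35.0 / math.pow(1.05, 182.0 / 365.0)         // PV of the coupon due in 182 days
    val fp = (1000.0 - pvCoupon) * math.pow(1.05, 9.0 / 12.0)   // forward price
    println(f"$fp%.2f")                                         // prints approximately 1001.84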

Is it better to design robust or accurate algorithms?

The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before. The generalization performance of a learning system strongly depends on the complexity of the assumed model. If the model is too simple, the system can only capture the actual data regularities in a rough manner; it has poor generalization properties and is said to suffer from underfitting. By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during data collection; in this case the generalization capacity of the learning system is also poor, and it is said to be affected by overfitting. Spurious patterns, which are only present in the data by accident, tend to have complex forms. This is the idea behind the principle of Occam's razor for avoiding overfitting: simpler models are preferred if more complex models do not significantly improve the quality of the description of the observations. Quick response: Occam's razor. It depends on the learning task; choose the right balance. Ensemble learning can help balance bias and variance (several weak learners together form a strong learner).

Explain what regularization is and why it is useful. What are the benefits and drawbacks of specific methods, such as ridge regression and lasso?

Regularization is used to prevent overfitting and improve the generalization of a model: it decreases the complexity of the model by introducing a regularization term into the general loss function, i.e. adding a penalty term to the minimization problem, thereby imposing Occam's razor on the solution. Ridge regression (L2) shrinks coefficients smoothly and handles correlated predictors well, but keeps all features in the model; the lasso (L1) can shrink coefficients exactly to zero and thus performs feature selection, but it can behave erratically when predictors are highly correlated.

Options Debit Spread

A debit spread results when the long position costs more than the premium received for the short position; nonetheless, the debit spread still lowers the cost of the position.

