Competitive Programming - Bit Shifting, Formulas, and More., Python Language & ML Libs (Pandas, SciKit Learn, NumPy), Mental Math, Important Math for CS (Notation, Linear Algebra, Number Theory, Foundations, Calculus, etc), Codewars in JS, Machine Le...
What is an octet? Why do we use the term?
An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, since historically there was no standard definition for the size of the byte.
Synonym for Input variable?
Features
pandas.DataFrame.fillna
Fill NA/NaN values using the specified method - returns the filled DataFrame
Matrix Addition & Subtraction rules?
Two matrices may be added or subtracted only if they have the same dimension; that is, they must have the same number of rows and columns. Addition or subtraction is accomplished by adding or subtracting corresponding elements. For example, [[1, 2], [3, 4]] + [[5, 6], [7, 8]] = [[6, 8], [10, 12]].
When is Matrix Multiplication commutative?
AB = BA holds, for example, when: one matrix is the Identity matrix; one matrix is the Zero matrix; both matrices are Diagonal matrices; or both matrices are 2D rotation matrices (rotations in 3D do not commute in general).
Classifier?
Classifier: A classifier is a special case of a hypothesis (nowadays, often learned by a machine learning algorithm). A classifier is a hypothesis or discrete-valued function that is used to assign (categorical) class labels to particular data points. In the email classification example, this classifier could be a hypothesis for labeling emails as spam or non-spam. However, a hypothesis must not necessarily be synonymous to a classifier. In a different application, our hypothesis could be a function for mapping study time and educational backgrounds of students to their future SAT scores.
What is a nibble in Computer Programming?
Half an octet: a 4-bit aggregation, which fits in a single hexadecimal digit (0x0 to 0xF).
What is "hypothesis" in machine learning?
Hypothesis: A hypothesis is a certain function that we believe (or hope) is similar to the true function, the target function that we want to model. In context of email spam classification, it would be the rule we came up with that allows us to separate spam from non-spam emails.
What is a stream?
In computer science, a stream is a sequence of data elements made available over time. A stream can be thought of as items on a conveyor belt being processed one at a time rather than in large batches. Streams are processed differently from batch data: normal functions cannot operate on streams as a whole, as they have potentially unlimited data, and formally, streams are codata (potentially unlimited), not data (which is finite). Functions that operate on a stream, producing another stream, are known as filters, and can be connected in pipelines, analogously to function composition. Filters may operate on one item of a stream at a time, or may base an item of output on multiple items of input, such as a moving average. - *processed one at a time rather than in batches*
transpose matrix
In linear algebra, the transpose of a matrix A is another matrix A^T (also written A′, A^tr, ^tA or A^t) created by any one of the following equivalent actions: reflect A over its main diagonal (which runs from top-left to bottom-right) to obtain A^T; write the rows of A as the columns of A^T; write the columns of A as the rows of A^T. Formally, the i-th row, j-th column element of A^T is the j-th row, i-th column element of A: $[\mathbf{A}^{\mathrm{T}}]_{ij} = [\mathbf{A}]_{ji}$. If A is an m × n matrix, then A^T is an n × m matrix.
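The "write the rows of A as the columns" description can be sketched in a few lines of Python. This is a minimal illustration that assumes a matrix is stored as a list of equal-length rows; the helper name `transpose` is made up for this card.

```python
def transpose(matrix):
    """Return the transpose of a matrix given as a list of equal-length rows."""
    # zip(*matrix) groups the j-th entry of every row, which is column j of A.
    return [list(column) for column in zip(*matrix)]


A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
A_T = transpose(A)     # 3 x 2
print(A_T)             # [[1, 4], [2, 5], [3, 6]]
```

Entry `A_T[i][j]` equals `A[j][i]`, matching the formal definition.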
Partial derivative?
In mathematics, a partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). Partial derivatives are used in vector calculus and differential geometry.
What is differential calculus?
In mathematics, differential calculus is a subfield of calculus concerned with the study of the rates at which quantities change. It is one of the two traditional divisions of calculus, the other being integral calculus. The primary objects of study in differential calculus are the derivative of a function, related notions such as the differential, and their applications. The derivative of a function at a chosen input value describes the rate of change of the function near that input value. The process of finding a derivative is called differentiation. Geometrically, the derivative at a point is the slope of the tangent line to the graph of the function at that point, provided that the derivative exists and is defined at that point. For a real-valued function of a single real variable, the derivative of a function at a point generally determines the best linear approximation to the function at that point.
Differential calculus intuition
It answers the question of how fast things are changing at a certain instant rather than across a long period of time. We attempt to find the slope at a point by using a tangent line. Because a slope needs two points, we cannot compute it directly at a single point; instead, we compute rise over run between two nearby points and let the difference between them approach zero.
Learning algorithm?
Learning algorithm: Again, our goal is to find or approximate the target function, and the learning algorithm is a set of instructions that tries to model the target function using our training dataset. A learning algorithm comes with a hypothesis space, the set of possible hypotheses it can come up with in order to model the unknown target function by formulating the final hypothesis.
Can you assign to a const after declaring the const?
No. A const promises that we won't change the value of the variable for the rest of its life and this is considered a reassignment or rewrite.
Bitwise Swap Intuition: (1) x = x xor y; (2) y = x xor y; (3) x = x xor y
On line 1 we combine x and y (using XOR) to get a "hybrid" and store it back in x. XOR is a great way to save information, because you can remove it by doing an XOR again. This is exactly what we do on line 2: we XOR the hybrid with y, which cancels out all the y information, leaving us only with x's original value. We save this result into y, so y now holds what x started with. On the last line, x still has the hybrid value. We XOR it yet again with y (which now holds x's original value) to remove all traces of x from the hybrid. This leaves us with y's original value in x, and the swap is complete! *The mathematics are fairly simple and work because XOR has a useful property: when you XOR A and B, you get a value C. If you XOR C and A you get B back, and if you XOR C and B you get A back.*
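The three XOR steps described above can be traced in Python. This is a sketch for intuition only; the function name `xor_swap` is made up here, and in Python a plain tuple assignment (`x, y = y, x`) is the idiomatic swap.

```python
def xor_swap(x, y):
    """Swap two integers using three XOR steps and no temporary variable."""
    x = x ^ y   # x now holds the "hybrid" of both original values
    y = x ^ y   # hybrid XOR original y leaves original x, stored in y
    x = x ^ y   # hybrid XOR original x leaves original y, stored in x
    return x, y


print(xor_swap(5, 9))  # (9, 5)
```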
pandas.DataFrame.loc
Purely label-location based indexer for selection by label. .loc[] is primarily label based, but may also be used with a boolean array. Allowed inputs are: A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index). A list or array of labels, e.g. ['a', 'b', 'c']. A slice object with labels, e.g. 'a':'f' (note that contrary to usual python slices, both the start and the stop are included!). A boolean array. A callable function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing (one of the above) .loc will raise a KeyError when the items are not found.
Regression vs Classification?
Regression: the output variable takes continuous values. - Price of house given a size. Classification: the output variable takes class labels, or discrete value output - Breast cancer, malignant or benign? Almost like quantitative vs categorical
Supervised learning
Supervised learning is a type of machine learning algorithm that uses a known dataset (called the training dataset) to make predictions. The training dataset includes input data and response values. From it, the supervised learning algorithm seeks to build a model that can make predictions of the response values for a new dataset. A test dataset is often used to validate the model. Using larger training datasets often yields models with higher predictive power that can generalize well to new datasets. It is called supervised learning because the data is labeled with the "correct" responses.
Synonym for output variable?
Targets
Distributive Property
The Distributive Property is easy to remember, if you recall that "multiplication distributes over addition". Formally, they write this property as "a(b + c) = ab + ac". In numbers, this means, that 2(3 + 4) = 2×3 + 2×4. Any time they refer in a problem to using the Distributive Property, they want you to take something through the parentheses (or factor something out); any time a computation depends on multiplying through a parentheses (or factoring something out), they want you to say that the computation used the Distributive Property.
Cocktail party effect/problem
The cocktail party effect is the phenomenon of being able to focus one's auditory attention on a particular stimulus while filtering out a range of other stimuli, much the same way that a partygoer can focus on a single conversation in a noisy room. Example of source separation.
Associative Property
The word "associative" comes from "associate" or "group"; the Associative Property is the rule that refers to grouping. For addition, the rule is "a + (b + c) = (a + b) + c"; in numbers, this means 2 + (3 + 4) = (2 + 3) + 4. For multiplication, the rule is "a(bc) = (ab)c"; in numbers, this means 2(3×4) = (2×3)4. Any time they refer to the Associative Property, they want you to regroup things; any time a computation depends on things being regrouped, they want you to say that the computation uses the Associative Property.
Derivatives?
a derivative is the rate of change of a function with respect to changes in its variable
Clustering
a method of unsupervised learning - a good way of discovering unknown relationships in datasets. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
The sum of an arithmetic sequence?
S_n = (n(a_1 + a_n))/2
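As a quick check of the formula, the sketch below compares it with a term-by-term sum. It assumes integer terms with first term `a_1` and common difference `d`; the function name is made up for illustration.

```python
def arithmetic_sum(a_1, d, n):
    """Sum of the first n terms of an arithmetic sequence: n(a_1 + a_n)/2."""
    a_n = a_1 + (n - 1) * d          # n-th term
    # n * (a_1 + a_n) is always even for integer inputs, so // is exact.
    return n * (a_1 + a_n) // 2


print(arithmetic_sum(1, 1, 100))  # 5050
```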
Sum of geometric series?
S_n = a_1(1-r^n)/(1-r)
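The closed form can be checked against a direct sum. This sketch assumes integer `a_1` and `r`, and handles `r = 1` separately because the formula divides by `1 - r`; the function name is made up for illustration.

```python
def geometric_sum(a_1, r, n):
    """Sum of the first n terms of a geometric series: a_1(1 - r^n)/(1 - r)."""
    if r == 1:
        return a_1 * n               # every term equals a_1
    # For integers, (1 - r**n) is an exact multiple of (1 - r).
    return a_1 * (1 - r**n) // (1 - r)


print(geometric_sum(3, 2, 5))  # 93, i.e. 3 + 6 + 12 + 24 + 48
```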
Sum of an arithmetic sequence?
S_n = terms(a_1 + a_n) / 2
Why is the following true? 2(x + y) = 2x + 2y
Since they distributed through the parentheses, this is true by the Distributive Property.
How to check if a number is a power of two?
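n > 0 && (n & (n - 1)) == 0. This works because every power of 2 has exactly one 1 bit in binary, and n & (n - 1) clears the lowest set bit. The n > 0 check is needed because 0 also passes the bit test but is not a power of two.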
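A small Python sketch of the test follows. It includes an explicit `n > 0` guard, since `0 & -1` is also `0` and zero is not a power of two; the function name is made up for illustration.

```python
def is_power_of_two(n):
    """True when n is a positive integer with exactly one bit set."""
    # n & (n - 1) clears the lowest set bit; a power of two has only one.
    return n > 0 and (n & (n - 1)) == 0


print([n for n in range(20) if is_power_of_two(n)])  # [1, 2, 4, 8, 16]
```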
Get absolute value of n using bit ops?
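(n ^ (n >> 31)) - (n >> 31); this assumes a 32-bit signed int and an arithmetic (sign-extending) right shift, so n >> 31 is 0 for non-negative n and -1 (all ones) for negative n.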
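The trick can be traced in Python, whose `>>` on negative integers is also sign-extending. This is a sketch that assumes the input fits in a 32-bit signed range; the function name `abs_32` is made up for illustration.

```python
def abs_32(n):
    """Absolute value via bit operations, for -2**31 <= n < 2**31."""
    mask = n >> 31            # 0 if n >= 0, -1 (all one bits) if n < 0
    # XOR with -1 flips every bit (~n); subtracting -1 adds one: ~n + 1 == -n.
    return (n ^ mask) - mask


print(abs_32(-5), abs_32(5))  # 5 5
```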
Scalar official definition?
(of a quantity) having only magnitude, not direction.
Define the properties of two matrices and those particular of square matrices
*All Matrices* Matrix multiplication is not commutative; it is distributive over matrix addition; scalar multiplication is compatible with matrix multiplication; further properties involve the Transpose, Complex Conjugate, Conjugate Transpose, and Trace. *Square Matrices* Identity Element - If A is a square matrix, then AI = IA = A where I is the identity matrix of the same order. Inverse Matrix - ?? Determinants - if A and B are square matrices of the same order, the determinant of their product AB equals the product of their determinants: det(AB) = det(A) det(B).
Add one to n using bit ops
-~n
In 8-bit Two's Complement what is -1 in binary, what about -127?
-1 is 1111 1111; -127 is 1000 0001
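One way to reproduce such bit patterns is to mask the value to 8 bits and print it in binary. The helper name below is made up for illustration, and the sketch assumes the usual 8-bit two's complement range of -128 to 127.

```python
def twos_complement_8bit(n):
    """Return the 8-bit two's complement bit pattern of n as 'xxxx xxxx'."""
    if not -128 <= n <= 127:
        raise ValueError("value does not fit in 8-bit two's complement")
    bits = format(n & 0xFF, "08b")   # masking maps negatives to 256 + n
    return bits[:4] + " " + bits[4:]


print(twos_complement_8bit(-1))    # 1111 1111
print(twos_complement_8bit(-127))  # 1000 0001
```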
Simplify 2(3x), and justify your steps.
2(3x): original (given) statement; (2×3)x: by the Associative Property; 6x: simplification (2×3 = 6)
The sum of powers of two is...
2^0 + 2^1 + ... + 2^n = 2^(n+1) - 1
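The identity can be checked numerically. The sketch assumes the sum runs over exponents 0 through n inclusive, which is the reading under which the result is 2^(n+1) - 1; the function name is made up for illustration.

```python
def sum_powers_of_two(n):
    """Add 2**0 + 2**1 + ... + 2**n term by term."""
    return sum(2**k for k in range(n + 1))


# In binary, the sum is n + 1 consecutive one bits, i.e. (1 << (n + 1)) - 1.
print(sum_powers_of_two(4), (1 << 5) - 1)  # 31 31
```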
Commutative Property
3 * 2 = 2 * 3, 5 + 3 = 3 + 5 The word "commutative" comes from "commute" or "move around", so the Commutative Property is the one that refers to moving stuff around. For addition, the rule is "a + b = b + a"; in numbers, this means 2 + 3 = 3 + 2. For multiplication, the rule is "ab = ba"; in numbers, this means 2×3 = 3×2. Any time they refer to the Commutative Property, they want you to move stuff around; any time a computation depends on moving stuff around, they want you to say that the computation uses the Commutative Property.
3x4 matrix * 4x1 vector result is...
3x1 matrix
Cost function vs Gradient Descent?
A cost function is something you want to minimize. For example, your cost function might be the sum of squared errors over your training set. Gradient descent is a method for finding the minimum of a function of multiple variables. So you can use gradient descent to minimize your cost function. If your cost is a function of K variables, then the gradient is the length-K vector that defines the direction in which the cost is increasing most rapidly. So in gradient descent, you follow the negative of the gradient to the point where the cost is a minimum. If someone is talking about gradient descent in a machine learning context, the cost function is probably implied (it is the function to which you are applying the gradient descent algorithm).
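To make the relationship concrete, here is a minimal sketch that applies gradient descent to a toy cost function of two variables. The cost, learning rate, and step count are all assumptions chosen for illustration, not values from any particular course or library.

```python
def cost(x, y):
    """Toy cost with its minimum at (3, -1)."""
    return (x - 3) ** 2 + (y + 1) ** 2


def gradient(x, y):
    """Partial derivatives of the cost with respect to x and y."""
    return 2 * (x - 3), 2 * (y + 1)


def gradient_descent(x, y, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient, the direction of steepest increase."""
    for _ in range(steps):
        grad_x, grad_y = gradient(x, y)
        x -= learning_rate * grad_x
        y -= learning_rate * grad_y
    return x, y


x_min, y_min = gradient_descent(0.0, 0.0)
print(round(x_min, 4), round(y_min, 4))  # 3.0 -1.0
```

Here the cost function is the thing being minimized and gradient descent is the procedure doing the minimizing, which is the distinction the card draws.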
What is a vector in linear algebra?
A vector is a list of numbers (can be in a row or column) - think Array. n x 1 matrix (https://en.wikipedia.org/wiki/Row_and_column_vectors)
Divisible by 3?
Add up all the digits and check whether the sum is divisible by 3. If so, the number is divisible by 3.
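The digit-sum rule can be checked against the `%` operator. This is a small sketch; the function name is made up for illustration.

```python
def divisible_by_3(n):
    """Apply the digit-sum rule: n is divisible by 3 iff its digit sum is."""
    digit_sum = sum(int(ch) for ch in str(abs(n)))
    return digit_sum % 3 == 0


print(divisible_by_3(123), divisible_by_3(124))  # True False
```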
What is ceteris paribus?
All other things equal, all else unchanged, etc (think partial derivative)
Properties of matrix multiplication?
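Associative: (AB)C = A(BC). Trace: tr(AB) = tr(BA) whenever both products are defined. Determinant: for square matrices of the same order only, the determinant of a product is the product of determinants: det(AB) = det(A) det(B).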
Divisible by 4?
Check the last two digits: if the number they form is divisible by 4, then the whole number is divisible by 4. Also note that all numbers divisible by 4 end in 0, 2, 4, 6, or 8.
pandas.Series.describe
Generate various summary statistics, excluding NaN values. For numeric dtypes, it will include: count, mean, std, min, max, and lower, 50, and upper percentiles. For object dtypes (e.g. timestamps or strings), the index will include the count, unique, most common, and frequency of the most common. Timestamps also include the first and last items. For mixed dtypes, the index will be the union of the corresponding output types. Non-applicable entries will be filled with NaN. Note that mixed-dtype outputs can only be returned from mixed-dtype inputs and appropriate use of the include/exclude arguments. If multiple values have the highest count, then the count and most common pair will be arbitrarily chosen from among those with the highest count.
What is an identity matrix?
In linear algebra, the identity matrix, or sometimes ambiguously called a unit matrix, of size n is the n × n square matrix with ones on the main diagonal and zeros elsewhere. Multiplying by the identity matrix I doesn't change anything, just like multiplying a number by 1 doesn't change anything. This property is why I and 1 are each called the "multiplicative identity". But while there is only one "multiplicative identity" for regular numbers (namely the number 1), there are lots of different identity matrices.
why did they name this subject calculus?
It's from the Latin meaning reckon or account (like calculate), which came from using pebbles (calx) to count with.
Model definition?
Model: In the machine learning field, the terms hypothesis and model are often used interchangeably. In other sciences, they can have different meanings, i.e., the hypothesis would be the "educated guess" by the scientist, and the model would be the manifestation of this guess that can be used to test the hypothesis.
Matrix scalar multiplication rules?
Multiply the scalar against each number in the matrix.
Target Function definition?
Target function: In predictive modeling, we are typically interested in modeling a particular process; we want to learn or approximate a particular function that, for example, lets us distinguish spam from non-spam email. The target function f(x) = y is the true function f that we want to model. The target function is the (unknown) function which the learning problem attempts to approximate.
Matrix multiplication?
The number of columns of the 1st matrix must equal the number of rows of the 2nd matrix. The result has the same number of rows as the 1st matrix and the same number of columns as the 2nd matrix. Remember that matrix multiplication does not have the commutative property. Each entry of the result is the dot product of a row of the 1st matrix with a column of the 2nd matrix.
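The shape rule and the dot-product rule can be sketched in plain Python, with matrices stored as lists of rows. The function name `matmul` and the sample values are made up for illustration; in practice a library such as NumPy would normally be used.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, giving m x p."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 0, 2, 0],
     [0, 1, 0, 2],
     [1, 1, 1, 1]]            # 3 x 4
v = [[1], [2], [3], [4]]      # 4 x 1
print(matmul(A, v))           # [[7], [10], [10]], a 3 x 1 result
```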
Partial derivative notation?
∂z/∂x: the partial derivative of z with respect to x. Remember that a partial derivative is the derivative with respect to one variable, with the others held constant.
What does the R mean in matrices?
ℝ^(rows × columns) denotes the set of all rows × columns matrices with real-number entries.
Training sample definition?
Training sample: A training sample is a data point x in an available training set that we use for tackling a predictive modeling task. For example, if we are interested in classifying emails, one email in our dataset would be one training sample. Sometimes, people also use the synonymous terms training instance or training example.
Unsupervised learning
Unsupervised learning is the machine learning task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning. Unsupervised learning is closely related to the problem of density estimation in statistics.[1] However unsupervised learning also encompasses many other techniques that seek to summarize and explain key features of the data.
X && Y equivalent?
X ∧ Y (called wedge)
X || Y equivalent?
X ∨ Y (called Vee)
A square matrix that is not invertible is called ..
singular or degenerate
⊆
subset of or equal to
What is left associativity?
Given a Q b Q c: if Q is left associative, it evaluates as (a Q b) Q c; if Q is right associative, it evaluates as a Q (b Q c).
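Python itself offers one operator of each kind, which makes for a quick illustration: subtraction is left associative and exponentiation is right associative.

```python
# Subtraction groups from the left: (10 - 4) - 3
left = 10 - 4 - 3
print(left, (10 - 4) - 3, 10 - (4 - 3))      # 3 3 9

# Exponentiation groups from the right: 2 ** (3 ** 2)
right = 2 ** 3 ** 2
print(right, 2 ** (3 ** 2), (2 ** 3) ** 2)   # 512 512 64
```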
How to swap two values a,b using bits?
a ^= b; b ^= a; a ^= b; OR (easier to memorize) x = x xor y y = x xor y x = x xor y
⊇
superset of or equal to
Nth term of geometric series?
a_n = a_1 * r^(n-1)
Find nth term of an arithmetic sequence?
a_n = a_1 + (n-1)d
Nth term of arithmetic sequence?
a_n = a_1 + (n-1)d
character literals in C++
completely distinct from string literals - and always enclosed in single quotes whereas strings are always enclosed in double quotes.
∋
contains as a member, "ni"
∀ meaning?
for all..
Import the linear regression class from Scikit
from sklearn.linear_model import LinearRegression
Find the odd int - Given an array, find the int that appears an odd number of times. There will always be only one integer that appears an odd number of times.
function findOdd(A) {
  // Each entry of searched is a [value, count] pair.
  var searched = [];
  for (let i = 0; i < A.length; i++) {
    var index = searched.findIndex((val) => A[i] === val[0]);
    if (index === -1) {
      searched.push([A[i], 1]); // first time we see this element
    } else {
      searched[index][1]++;     // seen before: bump its count
    }
  }
  // Return the element whose count is odd.
  for (let i = 0; i < searched.length; i++) {
    if (searched[i][1] % 2 === 1) return searched[i][0];
  }
  return 0;
}
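A different approach from the counting solution above, in keeping with the bit tricks elsewhere in this deck, is to XOR every element together. Pairs cancel out (a ^ a = 0), so only the value that appears an odd number of times survives. This sketch relies on the problem's guarantee that exactly one such value exists; the function name is made up for illustration.

```python
from functools import reduce
from operator import xor


def find_odd(values):
    """Return the integer that appears an odd number of times."""
    # XOR is commutative and associative, so equal values cancel in pairs.
    return reduce(xor, values, 0)


print(find_odd([1, 1, 2, 2, 3, 3, 3]))  # 3
```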
How to check if the nth bit is set?
if (x & (1 << n)) { //set } else { //not-set }
Get Max Int (Bit Ops)
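Any one of: int maxInt = ~(1 << 31); or int maxInt = (1 << 31) - 1; or int maxInt = (1 << -1) - 1; These assume a 32-bit int as in Java, where the shift count is masked to its low 5 bits so 1 << -1 equals 1 << 31. In C and C++, shifting by a negative count is undefined behavior.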
Get Min Int (Bit Ops)
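Either: int minInt = 1 << 31; or int minInt = 1 << -1; These assume a 32-bit int as in Java, where the shift count is masked to its low 5 bits so 1 << -1 equals 1 << 31. In C and C++, shifting by a negative count is undefined behavior.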
How to detect if two integers have opposite signs?
int x, y; // input values to compare signs bool f = ((x ^ y) < 0); // true iff x and y have opposite signs
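The same test works in Python, where integers behave as if they had an infinitely sign-extended two's complement form: the XOR is negative exactly when the two sign bits differ. The function name is made up for illustration, and note that 0 counts as non-negative here.

```python
def opposite_signs(x, y):
    """True iff exactly one of x and y is negative."""
    return (x ^ y) < 0


print(opposite_signs(3, -4), opposite_signs(3, 4), opposite_signs(-3, -4))
# True False False
```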
∈
is in, element of..
lowercase letter convention in reference to matrices?
lowercase indicates vectors, uppercase indicates matrices
What is a zero matrix?
In mathematics, particularly linear algebra, a zero matrix or null matrix is a matrix with all its entries being zero. It acts as the additive identity.
How to clear a bit
number &= ~(1 << x);
How to toggle a bit?
number ^= 1 << x;
How to set any bit?
number |= 1 << x;
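The set, clear, toggle, and check operations from these cards can be collected into small Python helpers. The function names are made up for illustration, and bit positions are counted from 0 at the least significant bit.

```python
def set_bit(number, x):
    return number | (1 << x)       # force bit x to 1


def clear_bit(number, x):
    return number & ~(1 << x)      # force bit x to 0


def toggle_bit(number, x):
    return number ^ (1 << x)       # flip bit x


def is_bit_set(number, x):
    return (number & (1 << x)) != 0


n = 0b1010
print(bin(set_bit(n, 0)), bin(clear_bit(n, 3)), bin(toggle_bit(n, 1)))
# 0b1011 0b10 0b1000
```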
∂ meaning?
partial, often used for partial differentials
How are matrices denoted? Columns then rows or rows then columns?
rows x columns
∴
therefore
What does this hypothesis represent? h_theta(x) = theta_0 + theta_1 x
univariate linear regression model
local variables
Variables defined inside a pair of curly braces; they exist only while the program is executing the code within those braces.
when does a stream stop writing
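When it encounters whitespace (in C++, for example, std::cin >> s stops extracting into s at the first whitespace character).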
How to fill first byte with 1's to a variable x?
x |= (0xFF)
Check for even or odd number using bits
x&1 ? "odd" : "even"
Does the order of operations matter in Matrices?
yes
Find the partial derivative of 3x-2y^4 with respect to x and with respect to y.
Treat every other variable as a constant: f_x = 3 and f_y = -8y^3.
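The two answers can be sanity-checked numerically with central differences, varying one variable while holding the other constant. The step size `h` and the evaluation point are arbitrary choices for this sketch.

```python
def f(x, y):
    return 3 * x - 2 * y**4


def partial_x(func, x, y, h=1e-5):
    """Approximate the partial derivative in x, holding y constant."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-5):
    """Approximate the partial derivative in y, holding x constant."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


# At (x, y) = (1, 2) the exact values are f_x = 3 and f_y = -8 * 2**3 = -64.
print(round(partial_x(f, 1.0, 2.0), 3), round(partial_y(f, 1.0, 2.0), 3))
```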
Subtract one from n using bit ops
~-n
How do you change the sign of an integer?
~x + 1
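This card, along with the add-one (-~n) and subtract-one (~-n) cards, follows from a single two's complement identity: ~x equals -x - 1. A short Python check, with function names made up for illustration:

```python
def negate(x):
    return ~x + 1      # ~x == -x - 1, so adding 1 gives -x


def add_one(n):
    return -~n         # -(-n - 1) == n + 1


def subtract_one(n):
    return ~-n         # -(-n) - 1 == n - 1


print(negate(7), add_one(7), subtract_one(7))  # -7 8 6
```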