QM+C
Poisson Distribution
a discrete probability distribution describing the likelihood of a particular number of independent events occurring within a particular interval, given the average rate at which events occur
The Handshaking lemma
If G is a graph then the sum of the degrees of all the vertices of G is twice the number of edges of G. Proof: Each edge joins two vertices and thus contributes to the degree of two vertices. To use this result we apply the following corollaries (a corollary is an easy consequence of a lemma or theorem). HAND SHAKING LEMMA (Graphs) COROLLARIES i) For any graph the sum of the vertex degrees must be an even number. ii) For any graph there must be an even number of vertices of odd degree. iii) If the graph G contains n vertices and is regular of degree r then G has nr/2 edges.
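A minimal sketch that checks the lemma and corollary ii) on one small graph. The edge list is a made-up example, and `degrees` is a hypothetical helper name.

```python
# Hypothetical example graph: an edge list over the vertices 1..4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]

def degrees(edge_list):
    """Return {vertex: degree}. A loop (v, v) adds 2 to deg(v)."""
    deg = {}
    for u, v in edge_list:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

deg = degrees(edges)
degree_sum = sum(deg.values())                         # sum of all vertex degrees
odd_vertices = [v for v, d in deg.items() if d % 2 == 1]

print(degree_sum, 2 * len(edges))   # the two numbers should match
print(len(odd_vertices))            # should be an even number
```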
theoretical probability
an ideal we can't see
Digraph
a non-empty set of elements called the vertex set and a list of ordered pairs of these vertices called arcs
graph
a non empty set of elements called vertices (singular: vertex) and a list of unordered pairs of these vertices called edges.
Eulerian Trail
a trail that traverses every edge exactly once - it is not necessarily a circuit
one time pad
a polyalphabetic substitution cipher in which the cipher key is at least as long as the message and (importantly) the key is only used once.
discrete random variable
a random variable that may assume either a finite number of values or an infinite sequence of values. Denoted by a small letter
empty set
a set with no elements. Extremely important. Represented by empty braces {} or an o with a slash through it (∅).
line of best fit
a smooth line that reflects the general pattern in a graph
Caesar Cipher
a technique for encryption that shifts the alphabet by some number of characters
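A short sketch of the shift, assuming a 26-letter uppercase alphabet. The function name `caesar` is illustrative only. Decryption is the same shift with the sign reversed.

```python
import string

ALPHABET = string.ascii_uppercase  # assumed alphabet: A..Z only

def caesar(text, shift):
    """Shift every letter by `shift` places, wrapping around mod 26."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation unchanged
    return "".join(out)

ciphertext = caesar("HELLO", 3)
plaintext = caesar(ciphertext, -3)
print(ciphertext, plaintext)
```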
Circuit
a trail that starts and ends at the same vertex
independent variable
a variable (often denoted by x ) whose variation does not depend on that of another.
Min Weight Spanning Tree
a spanning tree of a weighted graph in which the sum of the weights on the edges of the tree is as small as possible over all the possible spanning trees.
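One common way to find such a tree is Kruskal's algorithm. The card does not name a method, so the sketch below is only one possible approach, on a made-up weighted graph.

```python
def minimum_spanning_tree(vertices, weighted_edges):
    """Kruskal: take edges in order of weight, skipping any that would form a cycle."""
    parent = {v: v for v in vertices}  # each vertex starts in its own component

    def find(v):
        # Follow parent links up to the representative of v's component.
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:           # different components, so no cycle
            parent[root_u] = root_v    # merge the two components
            tree.append((u, v, weight))
    return tree

# Hypothetical example data: (weight, vertex, vertex).
vertices = ["A", "B", "C", "D"]
weighted_edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"),
                  (5, "B", "D"), (3, "C", "D")]

tree = minimum_spanning_tree(vertices, weighted_edges)
total_weight = sum(w for _, _, w in tree)
print(tree, total_weight)
```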
parity bit
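an extra check bit appended to a word so that the number of 1s has a fixed parity. As written here (odd parity): add 1 if the number of 1s is even, add 0 if it's odd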
Lossy Compression
allows for a small loss of data consistent with little loss of 'information' in the content. data compression techniques in which some amount of data is lost. This technique attempts to eliminate redundant information.
linear regression
an algorithm to find a precise line of fit for a set of data
Exponential dist
service rate μ (mu), so the average service time is 1/μ - gives the probability that a service will exceed time t: Pr(T > t) = e^(-μt)
Incident
an edge (e) is incident to vertices v and w if and only if e = vw
The 10 digit ISBN
an example of an error detection code - it can detect one error (and the interchange of two symbols), but not more.
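A sketch of the check, assuming the standard ISBN-10 weighting (weights 10 down to 1, total divisible by 11, with X standing for 10 in the last place). The function name is illustrative.

```python
def isbn10_is_valid(isbn):
    """Return True if the weighted sum of the 10 symbols is divisible by 11."""
    chars = isbn.replace("-", "")
    if len(chars) != 10:
        return False
    total = 0
    for position, ch in enumerate(chars):
        if ch in "xX" and position == 9:
            value = 10                    # X is only allowed as the check symbol
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += (10 - position) * value  # weights 10, 9, ..., 1
    return total % 11 == 0

print(isbn10_is_valid("0-306-40615-2"))  # a valid ISBN
print(isbn10_is_valid("0-306-40615-3"))  # one digit changed
print(isbn10_is_valid("0-360-40615-2"))  # two adjacent digits swapped
```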
Non-Linear Feedback shift registers
an improvement on the LFSR - an LFSR is susceptible to a known-plaintext attack.
Transposition ciphers
rearrangement of the symbols in the plaintext.
simple queue m/m/1
arrival rate λ (lambda); mean service time μ^-1 (the reciprocal of the service rate μ)
Feistel Function
based around different Boolean functions.
collision
because the digest is generally smaller than the message there are different messages that will hash to the same value; i.e. there are distinct messages $m_1$ and $m_2$ such that $h(m_1) = h(m_2)$. This is called a collision.
Complete Bipartite
bipartite graph in which every member of one of the disjoint subsets is joined to every member of the other subset. These graphs are denoted Kp,q where p and q are the numbers of vertices in each of the two sets
In-Degree Sequence
bracketed list of the in-degrees of all the vertices in a digraph written in non-decreasing order.
Out-Degree Sequence
bracketed list of the out-degrees of all the vertices in a digraph written in non-decreasing order.
hill cipher
break the message into vectors; encrypt: key matrix * vector (using mod arithmetic); decipher: inverse key matrix * encrypted vector
Gray code
can be used to minimise errors in the representation of the orientation of the platform. Using two codes that differ by a single digit to represent 'neighbouring' orientations will usually result in a read value within one position when a misread of a code occurs. e.g. 000 001 011 111 101 100 110 010
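A quick check that the eight codes listed on this card really do differ by a single digit between neighbours, treating the list as a cycle (the last code is next to the first). The helper name is illustrative.

```python
codes = ["000", "001", "011", "111", "101", "100", "110", "010"]

def bit_differences(a, b):
    """Count the positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Compare each code with the next one, wrapping round to the start.
neighbour_gaps = [
    bit_differences(codes[i], codes[(i + 1) % len(codes)])
    for i in range(len(codes))
]
print(neighbour_gaps)
```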
two-tailed test
can fail either way; a test of the null hypothesis where the alternative hypothesis is not expressed directionally
one-tailed test
can only fail one way
Arc-list
collection of all the arcs of a digraph is called the arc-list and is denoted A(D).
Hamiltonian cycle
A cycle in a graph is a Hamiltonian cycle if and only if it visits all the vertices in the graph and, apart from the start/terminal vertex of the cycle, it visits each vertex exactly once. A graph that contains a Hamiltonian cycle is called Hamiltonian
relative frequency
divide all totals by the sample size: the fraction or percent of the time that an event occurs in an experiment. If sample size = 100, use %
Shortest path
doesn't have to pass through all nodes; least cost from a to z = shortest path
Mono-alphabetic substitution
each letter of the alphabet is replaced by a single cipher symbol (mod 27 if a symbol is added?)
beaufort table
essentially the same as the Vigenère except the rows are reversed: z, y, x, ...
type 2 error
failing to reject a false null hypothesis (false negative)
inverse functions
functions that undo each other; the inverse of f is written f^-1
Tell me three times system
each symbol is sent three times; gives more info, since it is more likely that two copies are right than that two are wrong
Samples
glimpses of a bigger picture
Regular
A graph G is regular if and only if all the vertices of G have the same degree. If a regular graph is such that all the vertices have degree r then the graph is regular of degree r.
Bipartite
a graph in which the vertex set can be partitioned into two disjoint subsets A, B in such a way that each edge of the graph joins an element of A to an element of B.
Vertex colouring
A vertex colouring of a graph without loops is an assignment of colours (usually represented by integers) to the vertices of the graph in such a way that two adjacent vertices have NOT been assigned the same colour (integer). The chromatic number of a graph is the minimum number of colours (numbers) required to produce a vertex colouring of the graph. We denote the chromatic number of a graph G by χ(G) [spoken: chi of G]
lfsr
helps generate a sequence of pseudo random numbers.
complement
if A is an event, Not A is also an event. This is known as the complement
coprime
a pair of integers is coprime if their only common divisor is 1
xor
⊕ looks like a target symbol; true when exactly one of the two inputs is true
test statistic
(mean - hypothesised μ) / (s.d / √n): the sample mean minus the hypothesised mean, divided by the population std deviation divided by the square root of the sample size
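A small worked sketch of that formula. The numbers are made up for illustration and the function name is hypothetical.

```python
from math import sqrt

def z_statistic(sample_mean, hypothesised_mean, population_sd, n):
    """(sample mean - hypothesised mean) / (population s.d. / sqrt(n))."""
    return (sample_mean - hypothesised_mean) / (population_sd / sqrt(n))

# Made-up example: sample of 25 with mean 52, testing H0: mu = 50, sigma = 10.
z = z_statistic(sample_mean=52, hypothesised_mean=50, population_sd=10, n=25)
print(z)
```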
chi-squared test
used for category stats: work out (obs - exp)^2 / exp for each value in the table, then add them all together = chi-squared value
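The calculation can be sketched as below. The observed and expected counts are invented purely to show the arithmetic.

```python
def chi_squared(observed, expected):
    """Sum of (obs - exp)^2 / exp over every cell of the table."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts for four categories, each expected 25 times.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

chi = chi_squared(observed, expected)
print(chi)
```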
Continuous data
measured data, often time. Data that can take on any value. There is no space between data values for a given domain. Graphs are represented by solid lines. Data is continuous because it can theoretically take any value in a range (think numbers with decimal places). Continuous data is an idealised c
element
members of a set, represented by small letters; each element should only be listed once.
send and send again system
method of error detection
Transportation problem
min cost path (a12 + a13 + a14 etc.), subject to any constraints for the path; remember non-negativity: all values >= 0
Simple finite Capacity Queue M/M/1/K
modelling assumptions of the M/M/1/K are identical to M/M/1 with one exception: we assume that there is a finite capacity to the system. We denote the system capacity by K (some authors use C).
observation
n
intersection
∩ (looks like an n): commonalities; the elements that are in both sets.
binomial coefficient
n = number of trials, k = number of successes: the number of ways of choosing k from n, n! / (k!(n-k)!)
Sample
In a random sample a 'small' number of the population are drawn at random and (only) their data is collected. We tacitly assume that the sample is sufficiently large so as to be representative of the whole population, at least from the point of view of drawing conclusions about the population. A sample is a subset of the population
Null Hypothesis (H0)
no change in mean
Determining probability from physical characteristics
no. of ways it can occur / total no. of possible outcomes
sequence
ordered collection of objects
mu (μ)
population mean
Pr(A and B)
Pr(A) x Pr(B), provided A and B are independent
The OR rule
Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
Frequency continuous
ranges, e.g. 10-20, 20-30, 30-40 or 10-<20, 20-<30, 30-<40 or 10-15, 16-20, 21-25
Lossless Compression Algorithms
records data perfectly - the original file can be reproduced exactly. e.g. a mathematical formula for image compression that assumes that the likely value of a pixel can be inferred from the values of surrounding pixels. *Not always effective in reducing the size of a file.
sets
represented using capital letters
Parity Check Matrix
row = pcm column
std dev
s
variance
s^2
Pseudo-random number generator
seed - a truly random number - multiply the seed by itself then output the middle digits of the result - this output is then used as the next seed.
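This is the middle-square method. A minimal sketch follows, assuming 4-digit seeds and zero-padding the square to 8 digits. Both are choices made for this example, not something the card specifies.

```python
def middle_square(seed, digits=4):
    """Square the seed and return the middle `digits` digits of the result."""
    squared = str(seed * seed).zfill(2 * digits)  # pad so the middle is well defined
    start = digits // 2
    return int(squared[start:start + digits])

def sequence(seed, count, digits=4):
    """Each output becomes the next seed."""
    values = []
    for _ in range(count):
        seed = middle_square(seed, digits)
        values.append(seed)
    return values

values = sequence(1234, 3)
print(values)
```

The sequence is completely determined by the seed, which is why the output is only pseudo-random.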
Queueing theory
service rate (μ, mu); mean rate of arrival (λ, lambda); number of servers (n); time (t); wait (w); queue (q); client (x); avg number of customers (L); service utilisation (ρ); arrival rate <= service rate
co domain set
set containing all the assigned values.
range of a random variable
set of all possible values of the random variable
Vertex-Set
set of vertices of a digraph D is called the vertex-set and is denoted V(D)
a (alpha)
significance level
Empirical
something we observe in actual data
Variance
standard deviation squared
o with curve thing
σ (sigma): std deviation
markov chain transition matrix
takes info and puts input/output into a matrix: row = input (current state), column = output (next state); the numbers in each row add up to 1
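A tiny sketch with a made-up two-state chain: each row is a current state, each column a next state, and one step multiplies the current distribution by the matrix.

```python
# Hypothetical 2-state transition matrix: P[i][j] = Pr(move from state i to state j).
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

def step(distribution, matrix):
    """One step of the chain: new[j] = sum over i of distribution[i] * matrix[i][j]."""
    n = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(n))
        for j in range(n)
    ]

row_sums = [sum(row) for row in P]   # every row should add up to 1
after_one = step([1.0, 0.0], P)      # start in state 0 with certainty
after_two = step(after_one, P)
print(row_sums, after_one, after_two)
```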
z-score
the distance between the mean of a dist & a data point, measured in std deviations: z = (x - μ) / σ
the identity function
the domain and the codomain of the identity function are the same, and each element is mapped to itself. (id)
Median
the middle score in a distribution; half the scores are above it and half are below it
Mode
the most frequently occurring score(s) in a distribution
Dispersion
how spread out the values in a distribution are (e.g. range, interquartile range, std deviation)
Homophonic substitution cipher
the plaintext alphabet is mapped into a larger ciphertext alphabet. This allows more than one symbol to be associated with the plaintext symbol. In particular common letters such as e can be replaced by one of several options.
cumulative probability distribution
the probability that the random variable is less than or equal to a particular value
relative frequency in probability
the proportion of times an outcome would occur in the long run
rate of code
the ratio of useful information in a codeword to the total codeword length: k/n
domain set
the set containing the elements to be assigned values by the rule.
Distributions
the shape of data
cumulative frequency
the sum of the frequencies for that class and all previous classes
Primary function of statistics
to conclude information about a population through analysis of samples. It is the study of relationships between populations and the samples drawn from them that forms the theoretical underpinning of the science of statistics.
positive skew
tail to the right (the peak sits to the left)
negative skew
tail to the left (the peak sits to the right)
bivariate data
two measurements taken from the same entity. independent(x)- input dependent(y)- response
Bimodal data
two peaks (camel). Two sets of data measured together (disguised as one)
sample statistics
uncertain and random
error syndrome
vector that tells you which equations are not satisfied in a received codeword
AND
x men symbol
Function Composition
you could potentially construct a new function from two existing functions - when all the output values of the 'first' function are contained in the domain of the second.
Run length Encoding (RLE)
- simplest compression technique. A compression algorithm that represents an image in terms of the length of runs of identical pixels. Think ross' cyber escape room
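A minimal sketch of the idea, using a string of W/B characters as a stand-in for a row of pixels. The representation as (value, run length) pairs is an assumption for this example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Replace each run of identical symbols with a (symbol, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Rebuild the original string from the (symbol, run length) pairs."""
    return "".join(value * count for value, count in pairs)

row = "WWWWBBWWW"
encoded = rle_encode(row)
print(encoded)
print(rle_decode(encoded) == row)
```

Because decoding reproduces the input exactly, this is a lossless technique. It only saves space when the data contains long runs.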
normal distribution
Bell curve. Unimodal - only one peak. Mean, mode and median are the same as the normal distribution is symmetric. Mean - where the center of the dist is. S.D - how thin or squished the dist is; roughly the typical distance from a point to the mean.
Real numbers
ℝ: numbers with a decimal point
uniform distribution
each value has the same frequency (equally likely)
factorials
!
Probability theory
"the science of dealing with uncertainty" Event(informal.): an occurrence of interest to the analyst. The probability of an event A written Pr(A): a measure of the likelihood that the event will occur.
set of words
*
The 13 digit ISBN
- start with the 9 digit code which makes up the 10 digit ISBN without the checksum.
bayesian stats
- updating -changing stats
hash function
- divide into vectors of the same size (n), e.g. (1010) - pad when necessary - add all vectors mod 2. Accepts an input message of any length and generates, through a one-way operation, a fixed-length output.
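The block-and-add recipe above can be sketched as a toy hash. Padding with zeros on the right and the block size n = 4 are assumptions for this example. It is far too simple to be a secure hash and is only meant to show the mechanics.

```python
def xor_hash(bits, n=4):
    """Split a bit string into blocks of n bits, pad the last block, add blocks mod 2."""
    if len(bits) % n:
        bits += "0" * (n - len(bits) % n)   # assumed padding rule: zeros on the right
    digest = [0] * n
    for start in range(0, len(bits), n):
        block = bits[start:start + n]
        digest = [(d + int(b)) % 2 for d, b in zip(digest, block)]
    return "".join(str(d) for d in digest)

print(xor_hash("1010110"))  # blocks 1010 and 1100
print(xor_hash("0110"))     # a different message with the same digest
```

The second call shows how easily two different messages give the same output with this toy scheme.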
max flow
- edges have flow values & a max capacity - send as much as you can without exceeding capacity: flow <= capacity
Min spanning tree
- gets rid of unnecessary arcs, while ensuring nodes are still connected
specifying a function
- requires two sets - a rule that assigns a *unique* element in the second set to each element in the first.
frequency discrete
0 1 2 3 4 5 6 etc.
properties of a good hash
1. It is easy to compute the hash value for each message (desirable for efficiency in implementation) 2. Given a hash it is impractical to generate a message with that hash value. 3. Any changes to the message will almost certainly result in a changed hash value 4. It is infeasible to find two different messages with the same hash
playfair cipher
5x5 square. Pair in different row and column - take the other corners of the rectangle. Adjacent letters in the same row - take the letter to the right. Same column - take the letter below
Degree Sequence
A bracketed list of the degrees of all the vertices in a graph written in non-decreasing order.
Causation
A cause and effect relationship in which one variable controls the changes in another variable.
Eulerian Graph
A circuit in a graph is Eulerian if and only if it traverses every edge in the graph once. (Note the fact that it is a circuit implies that it must start and terminate at the same vertex). A graph that contains an Eulerian circuit is called Eulerian.
Cycle
A circuit in which the only vertex to appear twice is the start/end vertex.
unique decomposability
A code is uniquely decomposable (UD) if any string of codewords corresponds to a unique message.
Linear feedback Shift Registers
A commonly used procedure for generating (psuedo) random bits is to use feedback shift registers.
corollary
A connected graph G contains an Eulerian path if and only if there are exactly two vertices of odd degree
cycle
A cycle graph is composed of a single cycle. The cycle graphs are regular of degree 2 and are denoted by Cn where n is the number of vertices
(A|B)
A given B: A depends on B, i.e. the event A conditional on B having occurred
Weighted graph
A graph in which there is a number associated with each edge [its weight].
Connected
A graph is connected if there is a path from every vertex to every other vertex. Otherwise it is disconnected
null
A graph with no edges. The null graph with n vertices is denoted Nn and is regular of degree 0
Eulerian graph
A connected graph with no vertices of odd degree
Labelled
A labelled graph is one in which every vertex has been assigned an identifier. A graph in which no vertex has been assigned an identifier is called unlabelled
Incidence matrix
A matrix representing the edges in a graph. If a graph has n vertices and m edges, then the incidence matrix is an n x m matrix. The rows identify the vertices and the columns the edges.
Correlation
A measure of the relationship between two variables
Path
A non empty sequence of vertices such that between consecutive pairs in the sequence there is an edge, forms a walk. A walk in which no edge is repeated is called a trail. A trail in which no vertex is repeated is called a path.
Hamiltonian path
A path that contains all the vertices of the graph
Transposition Matrices
A permutation matrix is a square matrix in which a 1 appears precisely once in each row and column; the remaining elements are all zero. The matrix can be any size
Population
A population is the set of all possible measurements of a defined type. Note that we are referring to the set (population) of data values (not the actual items that gave rise to the data.)
Markov Chain
A process in which, from any one time to another, the probability of moving from any given value on a measure to another value stays the same
Spanning Tree
A spanning tree for a connected graph G is a connected subgraph on all the vertices of G which is also a tree.
Interpreting probabilities
All probabilities lie between 0 and 1 and have the following interpretation. Pr(Event) = 0: the event cannot occur. Near 0: is unlikely. Near 0.5: is as likely to happen as not. Near 1: is likely to happen. 1: must occur. You may prefer to multiply a probability by 100 and interpret the result as the percentage of times the event will happen
Tree
Any connected graph with no cycles. The tree with n nodes has (n-1) edges. (Note the star graphs and path graphs are examples of trees).
Concatenation
Attaching codewords in a message side-by-side. The risk in this is that the message may not be able to be decoded properly
Frequency
Data in its originally collected form is referred to as raw data. Prior to analysis raw data needs to be organised into a manageable form. Typically this involves ordering and/or grouping it. To organise the data we consider the range of values that it can take and divide this up into a manageable number of smaller ranges
E and slash E
∈ denotes element membership, for example x ∈ A means x is an element of set A. *Note it isn't actually an E, it just looks like one. Slash E (∉) = not a member of
mutually exclusive
Events that cannot occur at the same time.
Bayes' Theorem
Expansion of conditional probabilities. The probability of an event occurring based upon other event probabilities: Pr(A|B) = Pr(B|A) x Pr(A) / Pr(B)
In-Degree
If D is a digraph with vertex v then the in-degree of v, denoted in-deg(v), is the number of loops incident to v plus the number of remaining arcs incident to v. (Note that in-deg(v) is obvious from a pictorial representation of the graph - it is the number of arrowed lines pointing towards the vertex.)
Out-Degree
If D is a digraph with vertex v then the out-degree of v, denoted out-deg(v), is the number of loops incident to v plus the number of remaining arcs incident from v.
Vertex degree
If G is a graph with vertex v then the degree of v, denoted deg(v), is twice the number of loops incident to v plus the number of remaining edges incident to v. (Note deg(v) is obvious from a pictorial representation of the graph - it is the number of lines entering v.)
Multiple Edge
If a pair of vertices has more than one edge connecting them then the edges are referred to as multiple edges.
LOOP
If a vertex has an edge from itself to itself then this edge is called a loop. A graph without loops or multiple edges is called simple.
The hand shaking (Di)Lemma
In any digraph the sum of the out-degrees is equal to the sum of the in-degrees is equal to the number of arcs
natural numbers
ℕ: numbers used for counting; positive
mean of binomial dist
n x p, where n = no. of trials and p = probability of success
Discrete data
Numerical data values that can be obtained from counting - usually whole numbers.
The Not rule
Pr(not A) = 1 - Pr(A)
Poisson Distribution
Probability distribution for the number of arrivals during each time period
prefix free codes
Rather than struggle to find UD codes directly, we look for prefix free (PF) codes, which are easy to find, since PF implies UD. A prefix free (PF) code requires that no code member be the prefix of another code member.
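A short sketch of the prefix-free test. The two example codes are made up, and the function name is illustrative.

```python
def is_prefix_free(codewords):
    """True if no codeword is the prefix of a different codeword."""
    for a in codewords:
        for b in codewords:
            if a != b and b.startswith(a):
                return False
    return True

good_code = ["0", "10", "110", "111"]
bad_code = ["0", "01", "11"]       # "0" is a prefix of "01"

print(is_prefix_free(good_code))
print(is_prefix_free(bad_code))
```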
Adjacency Matrix
Records the number of direct links between vertices. The row sum is the degree of the vertex represented by the row (same for the column sum). Because a simple graph has no loops there are always zeros on the main diagonal, e.g. the matrix with rows 0111, 1011, 1101, 1110
Type 1 error
Rejecting the null hypothesis when it is true (false positive)
Hill Cipher
Square key matrix operating with mod arithmetic. The message is broken down into vectors ( , ), then multiplied by the key matrix using the mod. To decipher, use the inverse matrix (key matrix^-1, worked out mod the alphabet size)
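A sketch with a 2x2 key working mod 26 on uppercase letters (A = 0, ..., Z = 25). The key matrix and its inverse below are an assumed example pair, and the text length is assumed to be even.

```python
# Assumed example key and its inverse mod 26 (KEY * KEY_INVERSE = identity mod 26).
KEY = [[3, 3], [2, 5]]
KEY_INVERSE = [[15, 17], [20, 9]]

def apply_matrix(matrix, text):
    """Split text into 2-letter vectors and multiply each by the matrix mod 26."""
    numbers = [ord(ch) - ord("A") for ch in text]
    out = []
    for i in range(0, len(numbers), 2):
        x, y = numbers[i], numbers[i + 1]
        out.append((matrix[0][0] * x + matrix[0][1] * y) % 26)
        out.append((matrix[1][0] * x + matrix[1][1] * y) % 26)
    return "".join(chr(n + ord("A")) for n in out)

ciphertext = apply_matrix(KEY, "HELP")
recovered = apply_matrix(KEY_INVERSE, ciphertext)
print(ciphertext, recovered)
```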
Little's Law
States a mathematical relationship between throughput rate, flow time, and the amount of work-in-process inventory
Huffman Coding
The Huffman code can reduce the amount of space required to store a file and it is straightforward to decode since it is PF. Huffman codes are optimal in the sense that no other lossless fixed-to-variable length code has a lower average rate. *learn example* Huffman coding (algorithm). Definition: A minimal variable-length character coding based on the frequency of each character. First, each character becomes a one-node binary tree, with the character as the only node. The character's frequency is the tree's frequency.
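The card stops after the first step. The sketch below assumes the usual continuation: repeatedly merge the two lowest-frequency trees until one tree is left. The frequencies are made up, and which branch gets 0 or 1 is an arbitrary choice.

```python
import heapq

def huffman_code(frequencies):
    """Return {symbol: codeword} built by repeatedly merging the two rarest trees."""
    # Each heap entry: (frequency, tie-breaker, {symbol: code so far}).
    heap = [(freq, index, {symbol: ""})
            for index, (symbol, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        freq_a, _, codes_a = heapq.heappop(heap)   # lowest frequency
        freq_b, _, codes_b = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + c for s, c in codes_a.items()}
        merged.update({s: "1" + c for s, c in codes_b.items()})
        heapq.heappush(heap, (freq_a + freq_b, counter, merged))
        counter += 1
    return heap[0][2]

# Made-up character frequencies.
code = huffman_code({"a": 5, "b": 2, "c": 1, "d": 1})
lengths = {symbol: len(word) for symbol, word in code.items()}
print(code, lengths)
```

The most frequent character ends up with the shortest codeword, which is where the saving in space comes from.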
laws of probability
The basis for hypothesis testing and confidence interval estimation.
Edge - List
The collection of all the edges of a graph is called the edge-list and is denoted E(G)
interquartile range
The difference between the upper and lower quartiles.
Edge Connectivity
The edge connectivity of a connected graph is the smallest number of edges that can be removed from the graph and cause it to become disconnected. We denote the edge connectivity of a graph G by λ(G). [Spoken: lambda of G]
alternative hypothesis
The hypothesis that states there is a difference between two or more sets of data. - significant difference of mean
The key distribution problem.
The key distribution problem refers to the problem of distributing keys without a safe channel; and if you have a safe channel why the need to distribute keys. Public keys are available to anyone (and hence easily distributed) but are of no use in decryption: this requires the corresponding private key, which is held by one 'person' only.
significance level
The probability of a Type I error. A benchmark against which the P-value is compared to determine if the null hypothesis will be rejected. See also alpha.
Vertex -Set
The set of vertices of a graph G is called the vertex-set and is denoted V(G)
vertex connectivity
The vertex connectivity of a connected graph (with the exception of complete graphs - why?) is the smallest number of vertices that can be removed from the graph with their incident edges and cause it to become disconnected. We denote the vertex connectivity of the graph G as κ(G). [Spoken: kappa of G]
Polyalphabetic Cipher
The way you scramble the alphabet actually changes throughout the message Example: Vigenère cipher
Minimum distance of a code
The weight of a binary codeword is the sum of the bits in the word. We denote the weight as a function w. For linear codes the metric used is called the Hamming distance. For two codewords x and y taken from a linear (n,k) code, we represent the Hamming distance mathematically by $d(x, y)$. It is calculated by counting the number of positions in which the two codewords differ. The minimum distance of a code is the smallest value of the Hamming distance taken over all possible pairs of distinct codewords. The minimum distance is represented using the symbol δ (delta - hooked o thing).
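These definitions can be sketched directly. The four codewords below form a made-up linear (5,2) code chosen for illustration.

```python
from itertools import combinations

def weight(word):
    """Weight w: the number of 1s (the sum of the bits) in the codeword."""
    return word.count("1")

def hamming_distance(x, y):
    """d(x, y): the number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

# Hypothetical linear (5,2) code.
code = ["00000", "01011", "10101", "11110"]

delta = minimum_distance(code)
smallest_nonzero_weight = min(weight(w) for w in code if weight(w) > 0)
print(delta, smallest_nonzero_weight)
```

For a linear code the two printed numbers agree, since the difference of two codewords is itself a codeword.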
code properties
There are three properties of codes that are important when considering the application to which the code is to be put: Economy, Reliability and Security. By economy we mean reducing the lengths of communications as much as possible without loss of meaning in the content. Economy is obtained using compression techniques. This is clearly important in an age where gigabytes of data are moving around the internet. By reliability we primarily mean the detection of errors in transmitted messages. This can be done using parity check matrices or hash functions. But reliability can also include the correction of detected errors. By security we mean ensuring that our messages are not meaningful to third parties.
Graph Isomorphism
Two graphs that have the same structure
adjacent
Two vertices u and v are adjacent if and only if there is an edge between them. i.e. (uv) is in the edge list.
union
∪ (looks like a U): a combo; all the elements that are in either set
Polyalphabetic
Uses more than one alphabet to defeat frequency analysis - the shift changes; the longer the shift word, the stronger the cipher, e.g. one time pad
parity code for error correction
We can use two parity bits to identify the location of an error (and since we are using binary) we can then correct it. This procedure is based on blocking the codes
The Kraft-McMillan Number
When we are using variable length strings it is convenient to have a test of when particular assignments of codewords should be rejected. (K)
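The card does not give the formula. The sketch below assumes the standard definition: K is the sum of r^(-length) over all codewords (r = 2 for binary), and an assignment of lengths with K > 1 cannot be uniquely decodable, so it is rejected.

```python
from fractions import Fraction

def kraft_mcmillan(lengths, radix=2):
    """K = sum over codewords of radix ** (-length), kept as an exact fraction."""
    return sum(Fraction(1, radix ** length) for length in lengths)

ok_lengths = [1, 2, 3, 3]    # e.g. the lengths of 0, 10, 110, 111
bad_lengths = [1, 1, 2]      # too many short codewords

k_ok = kraft_mcmillan(ok_lengths)
k_bad = kraft_mcmillan(bad_lengths)
print(k_ok, k_ok <= 1)
print(k_bad, k_bad <= 1)
```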
set of integers
ℤ: natural extension of the natural numbers; negatives, zero and positives
Cube
a bipartite graph that has n = 2^k vertices (for some positive integer k) and is regular of degree k. The cube graph regular of degree k is denoted Qk
list
a collection of objects in which repetitions are allowed.
Star
a complete bipartite graph in which there is only one element in one of the subsets and is denoted K1,s
T- Dist
a cont. probability dist. that's unimodal; a useful way to rep a sample dist.
Probability Tree
a diagram that can be used to calculate the probabilities of combinations of events resulting from multiple random trials
Jackson Networks
a class of queueing network (queueing theory being a discipline within the mathematical theory of probability) where the equilibrium distribution is particularly simple to compute as the network has a product-form solution