Lecture 2 - Stream Ciphers
Can we formally prove that a particular PRG is secure, i.e., unpredictable?
No, this can't be proven formally, but there are good heuristics.
Define PRG and its property.
G: {0,1}^s → {0,1}^n, where n >> s. The generator takes a seed (seed space: {0,1}^s) and produces a much longer pseudorandom string. It must be computed by an efficient (time-bounded), deterministic algorithm.
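To make the signature G: {0,1}^s → {0,1}^n concrete, here is a minimal sketch of a seed-expanding generator. The construction (SHA-256 of the seed plus a counter) and the name `toy_prg` are assumptions for illustration only; it is a heuristic, not a PRG with a security proof. It does show the two required properties: it is deterministic (same seed, same output) and efficient, and the output is far longer than the seed.

```python
import hashlib
import os

def toy_prg(seed: bytes, n_bytes: int) -> bytes:
    """Hypothetical PRG: SHA-256(seed || counter) for counter = 0, 1, 2, ...

    Deterministic and efficient, but only a heuristic sketch, not a proven PRG.
    """
    out = bytearray()
    counter = 0
    while len(out) < n_bytes:
        out.extend(hashlib.sha256(seed + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:n_bytes])

seed = os.urandom(16)          # s = 128 truly random bits
stream = toy_prg(seed, 1024)   # n = 8192 bits, so n >> s
```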
What is the trade off between performance and security properties of random number generators?
Faster, cheaper generators tend to be less secure; generators with stronger security tend to be slower and more costly.
What is the important property of XOR?
If we take a random variable Y over {0,1}^n with an arbitrary distribution and XOR it with an independent, uniform random variable X over {0,1}^n, the result Z = Y XOR X is a uniform random variable over {0,1}^n.
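The XOR property can be checked exactly for small n by enumerating every (y, x) pair. This sketch uses exact fractions so there is no sampling noise; the biased distribution for Y is an arbitrary example chosen for illustration.

```python
from fractions import Fraction

def xor_distribution(dist_y: dict[int, Fraction], n: int) -> dict[int, Fraction]:
    """Exact distribution of Z = Y XOR X, with X uniform on {0,1}^n and independent of Y."""
    size = 2 ** n
    dist_z = {z: Fraction(0) for z in range(size)}
    for y, p_y in dist_y.items():
        for x in range(size):
            # Independence: Pr[Y = y, X = x] = Pr[Y = y] * Pr[X = x]
            dist_z[y ^ x] += p_y * Fraction(1, size)
    return dist_z

# A heavily biased Y on 2-bit strings (probabilities chosen arbitrarily)
biased_y = {0b00: Fraction(7, 10), 0b01: Fraction(2, 10), 0b11: Fraction(1, 10)}
print(xor_distribution(biased_y, 2))  # every value of Z has probability 1/4
```

For each fixed y, exactly one x gives a particular z, so every z collects (sum of Pr[Y = y]) · 1/4 = 1/4.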
How do you prove that OTP has perfect secrecy?
Lemma: OTP has perfect secrecy. Proof: ∀m,c: Pr_k[E(k,m) = c] = #{k ∈ K : E(k,m) = c} / |K| -the number of keys that map m to c divided by the total number of keys. -Because k is uniformly random, the probability that encrypting m yields c is just the fraction of keys that produce c. If that count is the same constant for all messages and ciphertexts, then Pr_k[E(k,m) = c] is the same no matter which m was encrypted, so seeing c tells you nothing about m. Hence: if ∀m,c: #{k ∈ K : E(k,m) = c} = const, the cipher has perfect secrecy.
Explain the following: Lemma: OTP has perfect secrecy. For OTP: ∀m,c: if E(k,m) = c → k XOR m = c → k = m XOR c → #{k ∈ K : E(k,m) = c} = 1 ∀m,c → the cipher has perfect secrecy
Exactly one key maps m to c, namely k = m XOR c. Since that count is 1 for every m and c, Pr_k[E(k,m) = c] = 1/|K|, the same constant for every m, so given c we can't tell what m is. But having perfect secrecy does not mean that OTP is a good cipher to use in practice.
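The counting argument can be verified by brute force over all 3-bit strings: for every (m, c) pair, count the keys k with E(k, m) = k XOR m = c. Encoding n-bit strings as integers is a choice made for this sketch.

```python
from itertools import product

n = 3                      # small bit length so every (m, c, k) triple can be enumerated
strings = range(2 ** n)    # n-bit strings encoded as integers

def otp_encrypt(k: int, m: int) -> int:
    return k ^ m

# For every message/ciphertext pair, count the keys k with E(k, m) = c.
counts = {
    (m, c): sum(1 for k in strings if otp_encrypt(k, m) == c)
    for m, c in product(strings, repeat=2)
}
print(set(counts.values()))  # {1}: exactly one key for every (m, c)
```

Since the count is always 1, Pr_k[E(k,m) = c] = 1/8 for every m and c, which is the constant required by the lemma.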
Why can't you just generate a keystream from a smaller key?
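You can (that is exactly what a stream cipher does), but the resulting keystream must be cryptographically strong, i.e., unpredictable. So we expand the key with a PRG, and finding a good PRG is difficult. Also, since the key is then shorter than the message, the cipher can no longer have perfect secrecy.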
If the key is a binary string of 128 bits, what is the key space?
The key space is {0,1}^128, which contains 2^128 keys.
In layman's terms, what is the Formal Definition of Information Theoretic Security saying?
Given the ciphertext, you can't tell which message was encrypted, for any pair of equal-length messages. Even the most powerful adversary learns nothing about the plaintext from the ciphertext. So no ciphertext-only attack is possible, though other attacks may still be.
Why did we need a different definition of security for stream ciphers?
Because stream ciphers can't have perfect secrecy (the key is shorter than the message), their security has to rest on the PRG instead. The PRG must be unpredictable; if it isn't, everything breaks down, because an attacker can predict the keystream.
Describe the relationship between the plaintext, key, and ciphertext in One Time Pad.
The plaintext, key, and ciphertext all have the same size, because encryption and decryption are bitwise XOR operations: c = m XOR k and m = c XOR k.
What is the importance of stream ciphers?
Stream ciphers make OTP practical: the key is small, but it can encrypt a large amount of data, e.g., a 128-bit key can encrypt megabytes of data.
Describe the Two Time Pad. What does it tell us?
This is when you encrypt two different messages with the same key (the same output of PRG(k)); an attacker can then deduce information and break the encryption. Say Eve eavesdrops and obtains both ciphertexts. XORing them cancels the key and leaves the two messages XORed together: c₁ XOR c₂ = m₁ XOR m₂. Messages are not uniformly random: English text in ASCII has enough redundancy that from m₁ XOR m₂ an attacker can recover information about (often all of) m₁ and m₂. If the two messages are identical, the result is all 0's. The ciphertexts should not leak any information about the plaintext, so this is a violation of perfect secrecy: Two Time Pad is insecure. Never use a stream cipher key more than once; use a different key each time.
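A short demonstration of the Two Time Pad, assuming a random pad stands in for PRG(k). The attacker never sees the pad, yet c₁ XOR c₂ equals m₁ XOR m₂, and knowing (or guessing) one message reveals the other. The two messages are made-up examples of equal length.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"attack at dawn"
m2 = b"defend at dusk"
pad = os.urandom(len(m1))   # the same pad (keystream) used twice: the mistake

c1 = xor_bytes(m1, pad)
c2 = xor_bytes(m2, pad)

# The pad cancels out: c1 XOR c2 == m1 XOR m2, without knowing the pad.
leak = xor_bytes(c1, c2)
assert leak == xor_bytes(m1, m2)

# Positions where the messages agree show up as 0 in the leak.
print([i for i, b in enumerate(leak) if b == 0])  # [6, 7, 8, 9, 10]

# If the attacker knows (or guesses) m1, m2 falls out immediately.
recovered_m2 = xor_bytes(leak, m1)
print(recovered_m2)  # b'defend at dusk'
```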
What is the Consistency Property?
For every message m ∈ M and every key k ∈ K: D(k, E(k,m)) = m. In words: if you encrypt a message with a key and then decrypt with the same key, you get back m. This must hold for every cipher; otherwise decryption is impossible and the cipher is useless.
What would be a better construction for security with WEP that had the Two Time Pad issue?
To avoid Two Time Pad, each frame must be encrypted with a key that is never repeated. One option is to treat the messages as one long message by concatenating them. Another is to use the long-term key as a seed, generate one long pseudorandom string PRG(k) from it, and break that string into pieces k₁, k₂, k₃, ... so that each message (frame) is encrypted with its own piece.
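The second option can be sketched as follows. The SHA-256 counter-mode generator is a hypothetical stand-in for a real PRG (it is not what WEP or any standard uses), and `frame_keys` is an illustrative name. The point is that consecutive per-frame keys are unrelated-looking slices of PRG(k) rather than keys that share most of their bits.

```python
import hashlib

def prg(seed: bytes, n_bytes: int) -> bytes:
    # Hypothetical stand-in PRG (SHA-256 in counter mode); not what WEP uses.
    out = b"".join(
        hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        for i in range((n_bytes + 31) // 32)
    )
    return out[:n_bytes]

def frame_keys(long_term_key: bytes, n_frames: int, key_len: int = 16) -> list[bytes]:
    """Expand one long-term key and cut the output into per-frame keys k1, k2, ..."""
    stream = prg(long_term_key, n_frames * key_len)
    return [stream[i * key_len:(i + 1) * key_len] for i in range(n_frames)]

keys = frame_keys(b"long-term key 16", 4)   # one fresh key per frame
```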
Explain what had occurred with 802.11 WEP (Wired Equivalent Privacy).
To encrypt a frame, WEP concatenates m with CRC(m), a checksum for detecting errors, and XORs the result with PRG(IV || k). The idea was that concatenating an Initialization Vector (IV) with k would make the PRG produce a different keystream for each packet, because the IV changes even though k does not. However, the IV is only 24 bits long, so after 2^24 (about 16M) frames the IV repeats and you get a Two Time Pad. Moreover, on some 802.11 cards the IV resets to 0 after a power cycle. Additionally, the per-frame keys IV || k are closely related (they all share k), and with RC4 as the PRG it's possible to recover k by observing enough frames (on the order of 10^6 in the original Fluhrer-Mantin-Shamir attack).
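The "about 16M frames" figure is just the size of the IV space. This sketch assumes the IV behaves like a counter that wraps around mod 2^24 (`iv_for_frame` is a hypothetical helper), which is enough to show that two frames end up with the same IV and therefore the same keystream PRG(IV || k).

```python
IV_BITS = 24
IV_SPACE = 2 ** IV_BITS
print(IV_SPACE)  # 16777216, i.e. about 16M frames

def iv_for_frame(frame_number: int) -> int:
    # Assumption for illustration: the IV is a counter that wraps around mod 2^24.
    return frame_number % IV_SPACE

# Frame 0 and frame 2^24 get the same IV, hence the same PRG(IV || k)
# keystream: a Two Time Pad.
assert iv_for_frame(0) == iv_for_frame(IV_SPACE)
```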
Explain the following in respect to stream ciphers: c = E(k,m) = m XOR G(k) D(k,c) = c XOR G(k)
To produce the ciphertext c, encryption takes k and m as input. The key k, which is uniformly random but short, is fed to a generator G. G(k) produces a pseudorandom string as long as m, and the two are XORed to produce c. Decryption XORs the ciphertext with the same pseudorandom string G(k) to recover m.
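A minimal sketch of c = m XOR G(k) and D(k, c) = c XOR G(k). Here G is a hypothetical stand-in built from SHA-256 in counter mode, not a standardized stream cipher; the names `G`, `E`, `D` simply mirror the notation above.

```python
import hashlib
import os

def G(k: bytes, n_bytes: int) -> bytes:
    """Hypothetical PRG stand-in: SHA-256(k || counter) blocks, truncated to n_bytes."""
    blocks = (hashlib.sha256(k + i.to_bytes(8, "big")).digest()
              for i in range((n_bytes + 31) // 32))
    return b"".join(blocks)[:n_bytes]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(k: bytes, m: bytes) -> bytes:
    return xor_bytes(m, G(k, len(m)))   # c = m XOR G(k)

def D(k: bytes, c: bytes) -> bytes:
    return xor_bytes(c, G(k, len(c)))   # m = c XOR G(k)

k = os.urandom(16)                      # short key (128 bits)
m = b"a message much longer than the 16-byte key " * 10
assert D(k, E(k, m)) == m
```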
Explain Negligible and Non-negligible Event with ε, a scalar value.
In practice: ε is non-negligible if ε ≥ 1/2^30, meaning the event is likely to occur; it's not a small, ignorable amount. ε is negligible if ε ≤ 1/2^80, meaning the event won't happen over the life of the key, and if it did, we don't care. In theory, ε is a function of the security parameter λ: ε is negligible if it eventually becomes smaller than 1/p(λ) for every polynomial p, i.e., it shrinks faster than any inverse polynomial (e.g., 1/2^λ). ε is non-negligible if it is at least 1/p(λ) for some polynomial p (infinitely often).
What are the desired properties for random numbers?
-Unbiased (uniform distribution) -Unpredictable (independence) -Irreproducible. Unbiased: all values of a given sample size are equally likely (uniform distribution). Unpredictable: given all previous outputs, but not the internal "hidden" state, it is impossible to predict the next output; you shouldn't be able to predict the next number from the previous ones. Irreproducible: two instances of the generator should not produce the same results; two identical generators started under the same conditions will still produce different outputs.
What's a secure cipher? Consider a passive attacker who can obtain only one ciphertext, and three candidate security requirements for a secure cipher: 1. The attacker cannot recover the secret key 2. The attacker cannot recover all of the plaintext 3. The attacker cannot learn any character of the plaintext from the ciphertext. Are these candidates good? Why or why not?
1. Not good: the cipher E(k,m) = m satisfies it (you can't infer anything about the key) yet outputs m in plaintext. 2. Not good: consider E(k, m0 || m1) = m0 || (m1 XOR k), where m0 || m1 means m0 concatenated with m1. -m1 is OTP-encrypted, so the attacker can't recover all of the plaintext, but m0 is output in the clear. -The requirement holds, yet the cipher is clearly insecure. 3. Still not enough: the attacker may learn no single character yet still learn some sensitive function of the plaintext, e.g., your salary amount. So the right requirement is that the ciphertext reveal no information about the plaintext.
What is the Formal Definition of Symmetric Ciphers?
A cipher defined over (K, M, C) is a pair of efficient algorithms (E, D), i.e., they run in polynomial time in the size of their inputs (or, in practice, within concrete time constraints), where: E: K × M → C and D: K × C → M (× is the Cartesian product), such that: ∀m ∈ M, k ∈ K: D(k, E(k,m)) = m. E is often randomized (a randomized algorithm); D is always deterministic (a deterministic algorithm). K is the set of keys, M is the set of plaintexts, and C is the set of ciphertexts.
Name the advantage and disadvantage of OTP.
Advantage: encryption and decryption are very fast and cheap (a single XOR). Disadvantage: the key is as long as the message, which makes it hard to manage. -Since OTP is symmetric, the key also has to be communicated to the other party. -If you have a secure channel to transmit the key, you might as well send the message over it, since they're the same size.
Explain why the following is a weak PRG: r[i] ← a·r[i−1] + b mod p; output a few bits of r[i]; i++. Here seed = r[0], a and b are integers, and p is a prime.
Because it is a linear congruential generator (LCG). It takes the seed r[0] as its initial state and repeatedly computes r[i] = a·r[i−1] + b mod p. It has some good statistical properties, but each state is an affine function of the previous one, so from a few outputs an attacker can solve for the state (and for a and b) and predict all future outputs. Outputting only a few bits of each r[i] is known not to fix this.
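A sketch of why the LCG is predictable, under a simplifying assumption: the generator here outputs the full state r[i] rather than a few bits (the truncated version needs heavier, lattice-based attacks). From three consecutive outputs, a and b follow from two linear equations mod p, and then every future output can be computed. The parameters are arbitrary illustrative values.

```python
def lcg(seed: int, a: int, b: int, p: int):
    """r[i] = a * r[i-1] + b mod p; this sketch outputs the full state each step."""
    r = seed
    while True:
        r = (a * r + b) % p
        yield r

def predict_next(r0: int, r1: int, r2: int, p: int) -> int:
    """Recover a and b from three consecutive outputs, then predict the fourth."""
    # r2 - r1 = a * (r1 - r0) mod p, so a = (r2 - r1) / (r1 - r0) mod p
    a = (r2 - r1) * pow((r1 - r0) % p, -1, p) % p
    b = (r1 - a * r0) % p
    return (a * r2 + b) % p

p = 2_147_483_647                          # the prime 2^31 - 1
gen = lcg(seed=12345, a=48271, b=11, p=p)  # "secret" parameters, chosen arbitrarily
r0, r1, r2, r3 = (next(gen) for _ in range(4))
assert predict_next(r0, r1, r2, p) == r3
```

`pow(x, -1, p)` computes a modular inverse (Python 3.8+); it exists because p is prime and r1 ≠ r0.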
Describe an attack on OTP showing no integrity.
Bob sends Alice an email encrypted with OTP (XORed with k), and the email starts with "From Bob". Eve knows this format, so she knows where "Bob" sits in the ciphertext. She XORs those ciphertext bytes with ("Bob" XOR "Eve"), the XOR of the ASCII values of the two names, which have the same length. When Alice decrypts the modified ciphertext, the k's cancel and those bytes become "Bob" XOR "Bob" XOR "Eve" = "Eve", so the message reads "From Eve". Eve could not have created a ciphertext saying "From Eve" from scratch, since she doesn't know k, but by modifying an existing ciphertext she achieves exactly that, and nothing detects it.
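The attack above, sketched end to end. The email text and the byte position of "Bob" are illustrative assumptions; what matters is that Eve edits the ciphertext using only public information ("Bob" XOR "Eve") and never touches k.

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Bob's side: OTP-encrypt an email whose format Eve knows.
m = b"From Bob\nMeet at noon."
k = os.urandom(len(m))
c = xor_bytes(m, k)

# Eve's side: she knows "Bob" sits at bytes 5..7 but does not know k.
pos = 5
delta = xor_bytes(b"Bob", b"Eve")          # ASCII("Bob") XOR ASCII("Eve")
c_forged = c[:pos] + xor_bytes(c[pos:pos + 3], delta) + c[pos + 3:]

# Alice decrypts the forged ciphertext with the real key.
print(xor_bytes(c_forged, k))  # b'From Eve\nMeet at noon.'
```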
Describe what happened with MS-PPTP (Microsoft's Point-to-Point Tunneling Protocol, Windows NT).
The client and server shared one symmetric key k and used the same keystream PRG(k) in both directions. The client concatenated its messages, XORed them with PRG(k), and sent the result to the server. The server also concatenated its messages, XORed them with the same PRG(k), and sent them to the client. This resulted in a Two Time Pad. Messages from client to server and from server to client must use different keys. OTP-style stream encryption is a good fit when you don't know the total size of the messages, but each direction needs its own key.
What is the Formal Definition of Information Theoretic Security?
A cipher (E, D), a pair of encryption and decryption algorithms over (K, M, C), has perfect secrecy if ∀m₀, m₁ ∈ M with |m₀| = |m₁| and ∀c ∈ C: Pr[E(k, m₀) = c] = Pr[E(k, m₁) = c], where k ←R K (k is a random variable sampled uniformly from the key space K). -For any two messages of the same length and any ciphertext c, the probability that encrypting m₀ gives c is exactly the same as the probability that encrypting m₁ gives c, so you can't determine which one was encrypted.
Explain the following with the One Time Pad: - E(k, m) - D(k, c)
E(k,m) is encryption: it XORs the plaintext m with the key to produce the ciphertext c: c = E(k,m) = k XOR m. D(k,c) is decryption: it XORs the ciphertext c with the key to recover m: D(k,c) = k XOR c. Correctness: D(k, E(k,m)) = D(k, k XOR m) = k XOR (k XOR m) = (k XOR k) XOR m [XOR is associative] = 0 XOR m [k XOR k is all 0's] = m
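E and D for OTP as a minimal sketch on byte strings, with the key drawn from Python's `secrets` module. The final assertion is the consistency property D(k, E(k,m)) = m from the derivation above.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("OTP needs |k| == |m|")
    return bytes(x ^ y for x, y in zip(a, b))

def otp_encrypt(k: bytes, m: bytes) -> bytes:
    return xor_bytes(k, m)        # E(k, m) = k XOR m

def otp_decrypt(k: bytes, c: bytes) -> bytes:
    return xor_bytes(k, c)        # D(k, c) = k XOR c

m = b"hello world"
k = secrets.token_bytes(len(m))   # fresh, uniformly random key, used once
c = otp_encrypt(k, m)
assert otp_decrypt(k, c) == m     # consistency: D(k, E(k, m)) = m
```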
Describe a scenario where we see OTP being malleable.
E(k,m) = m XOR k. An attacker intercepts the ciphertext and XORs it with p, some string of the same size: (m XOR k) XOR p. Decrypting with k gives D(k, (m XOR k) XOR p) = m XOR p. So the attacker modified the message without knowing what's inside: the change to the ciphertext has a known, predictable effect on the decrypted message, even though the message itself is unknown. Knowing this, an attacker can mount different attacks.
Explain the One Time Pad.
It's the first example of a secure cipher, and it works bit by bit: you XOR the plaintext with a key of the same length to produce the ciphertext. M = C = K = {0,1}^n: messages, ciphertexts, and keys are all n-bit strings, because encryption is a bitwise operation.
Why was Shannon's work or paper instrumental in modern cryptography?
He applied mathematical tools from information theory (which he developed for communication over noisy channels) to cryptography. This made it possible to define perfect secrecy precisely and to prove that OTP is unbreakable under certain assumptions.
What is Perfect Secrecy?
Informally: regardless of any prior information the attacker has about the plaintext, the ciphertext should leak no additional information about it. Blindly guessing a character of the plaintext is not a violation of security, because the attacker could have guessed it without seeing the ciphertext. -You can't get any information by looking at the ciphertext: whatever you already know about the plaintext (e.g., it's English, so you know letter frequencies), the ciphertext gives you no additional insight. -You can guess, but you could have guessed without looking at the ciphertext.
Why was Content Scrambling System (CSS) broken?
It's not random: the LFSRs' tap positions are fixed, and the only secret is the initial state (the seed). Using cryptanalysis, we can recover it. A ciphertext is an encrypted movie, typically an MPEG file, and MPEG files have a specific structure with a known prefix. XORing the first 20 bytes of the ciphertext with this known prefix gives the first 20 bytes of the PRG output (c XOR m → PRG output). Then, for each of the 2^17 possible initial settings of the 17-bit LFSR, run it to produce 20 bytes of output and subtract this from the 20 bytes of CSS output; the result is a candidate 20-byte output of the 25-bit LFSR. It's easy to check whether those 20 bytes are consistent with some setting of the 25-bit LFSR. If they are, we've found the correct initial settings for both LFSRs. With them, we can run the generator for as long as needed to produce the entire CSS output and decrypt the movie. So the attack takes about 2^17 trials instead of the 2^40 needed to try every 5-byte key.
What is the main idea behind PRG?
It is to replace the truly random key with a pseudorandom one: a PRG is a function that expands a short random seed into a much longer pseudorandom string.
Why is it important to find what secure cipher means?
We can't decide whether a given cipher is secure without saying precisely what "secure" means. Hence, we need Formal Definitions of Security.
Describe RC4.
It's a stream cipher that takes a seed (e.g., 128 bits), expands it into a 2048-bit internal state, and then outputs one byte per iteration. It was used in HTTPS and WEP. Weaknesses include: 1. Bias in initial output: Pr[2nd byte = 0] = 2/256 (it should be 1/256). The second keystream byte is 0 twice as often as it should be, and XORing with 0 leaves that plaintext byte unencrypted. 2. The probability that two consecutive bytes are (0,0) is 1/256^2 + 1/256^3. 1/256^2 is what a truly random string gives, but the extra 1/256^3 is not negligible, so this bias distinguishes RC4 output from random. 3. Related-key attacks: if the keys are related (as in WEP), the key can be recovered once you collect enough messages.
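The second-byte bias can be observed empirically. This is a textbook RC4 (key scheduling, then output generation), written for illustration only; RC4 should not be used for new designs. Sampling many random 16-byte keys and counting how often the second output byte is 0 should give a rate near 2/256 rather than 1/256. The run takes a few seconds in pure Python.

```python
import os

def rc4_keystream(key: bytes, n: int) -> bytes:
    """Textbook RC4: key-scheduling (KSA), then n bytes from the PRGA."""
    S = list(range(256))
    j = 0
    for i in range(256):                      # KSA: mix the key into the 256-byte state
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):                        # PRGA: one output byte per iteration
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

trials = 20_000
zeros = sum(rc4_keystream(os.urandom(16), 2)[1] == 0 for _ in range(trials))
# Typically close to 2/256 (about 0.0078), not 1/256 (about 0.0039)
print(zeros / trials)
```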
Describe Content Scrambling System (CSS). Where is it used?
It's designed for hardware implementation and is built from Linear Feedback Shift Registers (LFSRs). Each cell in the register holds one bit, 0 or 1, and the register size depends on the implementation (e.g., 17 or 25 bits). The seed is the initial state of the LFSR. Each clock cycle, the last bit falls off (it is the output), the bits in some of the cells (the taps) are XORed together to form a new bit, and the new bit is shifted into the register. Repeating this produces output of any length. CSS uses two LFSRs, one of 17 bits and one of 25 bits, with initial states Seed1 and Seed2. The CSS key (seed) is only 5 bytes (40 bits) because of export restrictions the government placed on key length. The first 2 bytes of the key seed the 17-bit LFSR (preceded by a 1 bit), and the remaining 3 bytes seed the 25-bit LFSR (also preceded by a 1 bit). Each LFSR produces 1 bit per clock cycle, so 8 cycles give 1 byte. The two bytes are added mod 256 (with a carry from the previous addition) to produce one byte of PRG output, and this byte is XORed with a byte of the movie. CSS is used for DVD encryption (2 LFSRs); LFSR-based ciphers are also used in GSM (3 LFSRs) and Bluetooth (4 LFSRs) encryption.
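A simplified sketch of an LFSR and a CSS-style combiner. The tap positions (0, 3) are hypothetical toy choices, not the real CSS taps, and real CSS has further details this omits. The seeding mirrors the description: 1 || first 2 key bytes for the 17-cell register, 1 || last 3 key bytes for the 25-cell register, and the two output bytes are added mod 256 with carry.

```python
class LFSR:
    """Fibonacci LFSR: each clock outputs the bit that falls off and shifts in
    the XOR of the tapped cells."""

    def __init__(self, seed_bits: list[int], taps: tuple[int, ...]):
        if not any(seed_bits):
            raise ValueError("an all-zero state never changes")
        self.state = list(seed_bits)
        self.taps = taps

    def clock(self) -> int:
        out = self.state[0]                   # bit that falls off
        new = 0
        for t in self.taps:
            new ^= self.state[t]              # XOR of the tapped cells
        self.state = self.state[1:] + [new]   # shift by one, append the new bit
        return out

    def next_byte(self) -> int:
        value = 0
        for _ in range(8):                    # 8 clock cycles -> 1 byte
            value = (value << 1) | self.clock()
        return value

def css_like_keystream(lfsr_a: LFSR, lfsr_b: LFSR, n: int) -> bytes:
    """Simplified CSS-style combiner: add the two LFSR bytes mod 256 with carry."""
    out, carry = bytearray(), 0
    for _ in range(n):
        total = lfsr_a.next_byte() + lfsr_b.next_byte() + carry
        out.append(total % 256)
        carry = total >> 8
    return bytes(out)

def bits(data: bytes) -> list[int]:
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

key = bytes.fromhex("0123456789")                 # 5-byte (40-bit) seed
lfsr17 = LFSR([1] + bits(key[:2]), taps=(0, 3))   # 1 || first 2 bytes -> 17 cells
lfsr25 = LFSR([1] + bits(key[2:]), taps=(0, 3))   # 1 || last 3 bytes -> 25 cells
keystream = css_like_keystream(lfsr17, lfsr25, 16)
```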
Describe Salsa20.
It's optimized for both hardware and software. It takes a random seed (key) in 2 versions, 128 or 256 bits, plus a 64-bit nonce, and produces a long pseudorandom string. Salsa20: {0,1}^128 or 256 × {0,1}^64 → {0,1}^n, with a maximum output length n of 2^73 bits. Salsa20(k; r) := H(k, (r, 0)) || H(k, (r, 1)) || ... -apply the function H, described in the Salsa20 spec, to the key k (seed) and to the nonce r together with a counter i = 0, 1, 2, ..., producing as many 64-byte blocks as you need. For a 128-bit key, H takes a 16-byte key, an 8-byte nonce, and an 8-byte counter (32 bytes) and expands them to τ₀ k τ₁ r i τ₂ k τ₃ (64 bytes), where τ₀, τ₁, τ₂, τ₃ are 4-byte constants defined in the spec. This 64-byte block goes through an invertible function h, designed to be fast on x86, which is applied 10 times (10 double rounds). At the end, the output of the last round is added word by word (32-bit words, mod 2^32) to the original τ₀ k τ₁ r i τ₂ k τ₃, giving a 64-byte output that serves as the pseudorandom keystream for OTP-style encryption.
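A sketch of one Salsa20 block for a 16-byte key, written from the published Salsa20 description to show the τ₀ k τ₁ r i τ₂ k τ₃ layout, the 10 double rounds, and the final word-wise addition. It has not been checked against official test vectors here, so treat it as illustrative and use a vetted library for real encryption.

```python
import struct

MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK

def quarter_round(y: list[int], a: int, b: int, c: int, d: int) -> None:
    y[b] ^= rotl((y[a] + y[d]) & MASK, 7)
    y[c] ^= rotl((y[b] + y[a]) & MASK, 9)
    y[d] ^= rotl((y[c] + y[b]) & MASK, 13)
    y[a] ^= rotl((y[d] + y[c]) & MASK, 18)

def salsa20_block(key16: bytes, nonce8: bytes, counter: int) -> bytes:
    """One 64-byte Salsa20/20 block for a 16-byte key ("expand 16-byte k" constants)."""
    tau = [struct.unpack("<I", w)[0] for w in (b"expa", b"nd 1", b"6-by", b"te k")]
    k = list(struct.unpack("<4I", key16))
    r = list(struct.unpack("<2I", nonce8))
    i = list(struct.unpack("<2I", counter.to_bytes(8, "little")))
    # 64-byte input: tau0 k tau1 r i tau2 k tau3 (the 16-byte key appears twice)
    x = [tau[0], *k, tau[1], *r, *i, tau[2], *k, tau[3]]
    y = x.copy()
    for _ in range(10):                       # 10 double rounds = 20 rounds
        for q in ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)):
            quarter_round(y, *q)              # column round
        for q in ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)):
            quarter_round(y, *q)              # row round
    # Feed-forward: add the original input word by word (mod 2^32), not XOR.
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(y, x)))

def salsa20_keystream(key16: bytes, nonce8: bytes, n_bytes: int) -> bytes:
    # Salsa20(k; r) = H(k, (r, 0)) || H(k, (r, 1)) || ...
    blocks = (salsa20_block(key16, nonce8, c) for c in range((n_bytes + 63) // 64))
    return b"".join(blocks)[:n_bytes]
```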
Suppose G: K → {0,1}^n is such that for all k: XOR(G(k)) = 1. That is, if you XOR all the bits of the generator's output, you always get 1: e.g., for 00101 the XOR is 0, but this generator never produces such a string. Is G predictable? Why or why not?
It's predictable. By the definition of predictability, given the first n−1 bits you can predict the last bit, here with probability 1: it must be the value that makes the XOR of all n bits equal 1. E.g., for 0010?, the first four bits XOR to 1, so ? must be 0. Good generators should not have this property.
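A sketch of the predictor. `biased_G` is a hypothetical generator that models G(k) for a random k: it emits random bits except that the last bit always forces the total XOR to 1. The predictor A only needs the first n−1 bits, and it succeeds every time, far above the 1/2 + ε threshold.

```python
import secrets

N = 16

def biased_G() -> list[int]:
    """Hypothetical generator: random first N-1 bits, last bit forces XOR of all bits = 1."""
    bits = [secrets.randbelow(2) for _ in range(N - 1)]
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity ^ 1]

def predict_last_bit(prefix: list[int]) -> int:
    """Efficient predictor A: the last bit is whatever makes the total XOR equal 1."""
    parity = 0
    for b in prefix:
        parity ^= b
    return parity ^ 1

trials = 1000
outputs = [biased_G() for _ in range(trials)]
hits = sum(predict_last_bit(out[:-1]) == out[-1] for out in outputs)
print(hits / trials)  # 1.0: bit n is predicted with probability 1
```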
In the Formal Definition of Symmetric Ciphers, it includes the following: such that ∀m ∈ M, k ∈ K: D(k, E(k,m)) = m. Why is this important?
It's the consistency property. For every message m ∈ M and key k ∈ K, if you encrypt m with k and then decrypt with the same k, you get back m. This must hold for every cipher; otherwise decryption is impossible and the cipher is useless.
What is the malleability property? And how does it affect OTP?
A cipher is malleable when you can modify a ciphertext to produce a known effect on the decrypted plaintext: modifications to the ciphertext go undetected and have a predictable impact on the plaintext. OTP is malleable, so it has to be used with some integrity mechanism (e.g., a MAC).
What are the advantages and disadvantages of using OTP?
Disadvantage: you need as many key bits as message bits, which is difficult in practice. Advantages: it's fast, cheap, and efficient, and it is unconditionally secure (perfect secrecy), provided the key is truly random and never reused.
Can a stream cipher have perfect secrecy?
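No. Perfect secrecy requires the key to be at least as long as the message (|K| ≥ |M|), but a stream cipher's key (the PRG seed) is much shorter than the message.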
OTP has perfect secrecy. So why doesn't that necessarily make it a good cipher to use?
Perfect secrecy requires |K| ≥ |M|: the key can't be shorter than the message, so OTP, whose key is exactly as long as the message, is optimal. This is very hard to use in practice. The reason for the bound: given any ciphertext, try decrypting it under every possible key in K; this gives a list of at most |K| possible messages. If |K| < |M|, some messages are not in the list, so the ciphertext rules them out (the probability of getting c from them is 0 while it is positive for the real message), which violates perfect secrecy. Therefore the key space must be at least as large as the message space, i.e., the key must be at least as long as the message.
When is PRG unpredictable?
G is unpredictable if it is not predictable: ∀i, no "efficient" algorithm can predict bit i+1 from the first i bits with probability 1/2 + ε for a non-negligible ε.
Describe eStream ciphers.
PRG: {0,1}^s × R → {0,1}^n, n >> s. It takes a seed from {0,1}^s and a nonce from R, and produces a binary string much longer than the seed. A nonce is a non-repeating value for a given key: it doesn't have to be random, it just must never repeat for the life of the key, so it can be a counter (i.e., predictable). This lets you reuse the key: E(k, m; r) = m XOR PRG(k; r), as long as the pair (k, r) is never used more than once. If you change the key, you can repeat nonces. This way you don't have to change the key every time to avoid Two Time Pad, and you can keep the key for a long time.
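A sketch of E(k, m; r) = m XOR PRG(k; r) with a counter as the nonce. The PRG here is a hypothetical SHA-256 stand-in, not an eStream cipher. One long-lived key encrypts many messages safely because each message gets a fresh (k, r) pair.

```python
import hashlib
import os

def prg(k: bytes, r: bytes, n_bytes: int) -> bytes:
    """Hypothetical PRG(k; r): SHA-256(k || r || counter) blocks (stand-in, not an eStream cipher)."""
    blocks = (hashlib.sha256(k + r + i.to_bytes(8, "big")).digest()
              for i in range((n_bytes + 31) // 32))
    return b"".join(blocks)[:n_bytes]

def encrypt(k: bytes, m: bytes, r: bytes) -> bytes:
    # E(k, m; r) = m XOR PRG(k; r); decryption is the same operation
    return bytes(x ^ y for x, y in zip(m, prg(k, r, len(m))))

k = os.urandom(16)                                            # one long-lived key
c1 = encrypt(k, b"first message", (0).to_bytes(8, "big"))    # nonce = counter 0
c2 = encrypt(k, b"second message", (1).to_bytes(8, "big"))   # nonce = counter 1
```

A counter is a natural nonce because it is trivially non-repeating; the sender only has to make sure it is never reset while the same key is in use.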
Explain what a Discrete Random Variable is.
A random variable X is a function X: U → V that induces a probability distribution on V: each value occurs with some probability. It takes an input from U and returns a value in V; how likely each value is depends on the distribution, e.g., a uniform random variable gives every value in V the same probability.
State the components of the random variable X: U → V and its distribution for the following example: tossing a fair coin 3 times and counting the heads. What would be the events in this example?
Set U: the 8 equally likely outcomes {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. X maps each outcome to its number of heads: -0: TTT -1: HTT, THT, TTH -2: HHT, HTH, THH -3: HHH. Set V: {0, 1, 2, 3}. Probability distribution (8 outcomes in total): -0: 1/8 -1: 3/8 -2: 3/8 -3: 1/8. Events are sets of outcomes described by values of X, e.g., Pr[X = 1 or X = 2] = 6/8.
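The distribution can be computed by enumerating U directly. This sketch builds the 8 outcomes, applies X (count of heads), and uses exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

U = ["".join(t) for t in product("HT", repeat=3)]   # 8 equally likely outcomes

def X(outcome: str) -> int:
    """Random variable X: U -> V = {0, 1, 2, 3}, the number of heads."""
    return outcome.count("H")

counts = Counter(X(u) for u in U)
dist = {v: Fraction(counts[v], len(U)) for v in sorted(counts)}
print(dist)   # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Event "X is 1 or 2"
print(dist[1] + dist[2])  # 3/4
```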
Why are stream ciphers used?
Long keys make OTP impractical, so stream ciphers are used to make it practical. The idea is to take a small, truly random seed and expand it into a large pseudorandom string of arbitrary size, i.e., to replace the random key with a pseudorandom key. Stream ciphers suit applications where plaintext comes in quantities of unknowable length, for example a secure wireless connection, or communication between server and client where you don't know how many messages there will be.
How is the key generated with PRGs?
Take a random seed and feed it to a Pseudo Random Generator (PRG), which produces a pseudorandom string as long as the data to be encrypted; that string is then XORed bitwise with the data.
Why shouldn't we use OTP or stream ciphers for disc encryption?
A stream cipher encrypts each bit with the keystream bit at the same position. If a file on disk is modified and re-encrypted with the same key, every unchanged part encrypts to the same ciphertext as before, so it's easy to tell where the change occurred. Say Bob encrypts a file whose header says "To Bob", and later only the header changes to "To Eve" while the body stays the same. An attacker who sees both encrypted versions immediately sees which part changed, because the same keystream was used twice (a Two Time Pad). The ciphertext thus leaks information about the plaintext.
Explain the idea behind Information Theoretic Security.
The idea is that the ciphertext should reveal no information about the plaintext. -i.e., the distribution of the ciphertext is the same no matter which plaintext was encrypted.
Prove the important property of XOR when n = 1.
We have a random variable Y and an independent uniform variable X, both over {0,1}. Random variable Y: -Y=0: P0 -Y=1: P1 (P0 + P1 = 1). Uniform variable X: -X=0: 1/2 -X=1: 1/2. Joint distribution (the variables are independent, so multiply the probabilities): -XY=00: P0/2 -XY=01: P1/2 -XY=10: P0/2 -XY=11: P1/2. Z = X XOR Y = 0 exactly when (X, Y) is (0,0) or (1,1): Pr[Z=0] = Pr[(X,Y) = (0,0) or (X,Y) = (1,1)] = Pr[(X,Y) = (0,0)] + Pr[(X,Y) = (1,1)] = P0/2 + P1/2 = (P0 + P1)/2 = 1/2. We can add the two probabilities because the events are disjoint (they can't both happen), so the probability of their union is exactly the sum. Hence Pr[Z=0] = Pr[Z=1] = 1/2, and Z is uniform.
How do we formally define PRG predictability?
We say that G: K → {0,1}^n is predictable if ∃ an "efficient" algorithm A and ∃ 1 ≤ i ≤ n−1 such that Pr[A(G(k)|1,...,i) = G(k)|i+1] ≥ 1/2 + ε, where k ←R K (k is chosen uniformly at random from K), for some non-negligible ε (e.g., ε = 1/2^30). -That is, there's an efficient algorithm A that, given the first i bits of G(k), predicts bit i+1 with probability noticeably better than a random guess (1/2).
Explain why Random Variable is important with keys.
Keys should be chosen by a random variable that is uniformly distributed over the key space. If the distribution isn't uniform, the key becomes predictable: an attacker can guess it with some non-negligible probability.
Why is uniform distribution important?
We typically want keys to be uniformly distributed, which means generating them at random: every key in the key space should be equally likely to be chosen.
Given a message m and its OTP encryption c: 1. Can you compute the OTP key from m and c? If yes, how? 2. How many OTP keys map m to c?
1. Yes. Since c = k XOR m, k = m XOR c. 2. Exactly one key. This holds for every message and ciphertext, so looking at c alone you can't say which m was encrypted: every m is equally likely to have produced that ciphertext.
Explain why PRG must be unpredictable.
If ∃ an efficient algorithm and ∃ i such that G(k)|1,...,i → G(k)|i+1,...,n, then G is predictable, and a stream cipher built on it is broken. Say we intercept a ciphertext that we know is an encryption of an MPEG file. MPEG files start with a standard prefix that's always present, so we know part of the plaintext. XORing that known plaintext with the corresponding ciphertext gives us a part of G(k): c XOR (known part of m) = part of G(k). Since G is predictable, given the first i bits of its output (for some position 1 ≤ i ≤ n−1), an efficient algorithm computes the remaining bits. So from this part of G(k) we compute the rest of G(k) and recover the rest of the plaintext.
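A sketch of the known-prefix attack against a deliberately predictable generator. `weak_keystream` is hypothetical: each keystream byte is an affine function of the previous one with public constants, so one known keystream byte determines all later ones. `KNOWN_PREFIX` stands in for a fixed file header such as the MPEG prefix.

```python
import secrets

def weak_keystream(seed: int, n: int) -> bytes:
    """Hypothetical predictable PRG: each byte is an affine function of the previous one."""
    out, x = bytearray(), seed
    for _ in range(n):
        x = (5 * x + 3) % 256
        out.append(x)
    return bytes(out)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

# Sender: the plaintext starts with a standard header the attacker knows.
KNOWN_PREFIX = b"HDR0"                    # hypothetical fixed file header
secret = KNOWN_PREFIX + b"the secret movie bytes"
c = xor_bytes(secret, weak_keystream(secrets.randbelow(256), len(secret)))

# Attacker: ciphertext XOR known prefix = the first bytes of G(k) ...
ks_prefix = xor_bytes(c[:len(KNOWN_PREFIX)], KNOWN_PREFIX)
# ... and predictability extends G(k) from its last known byte.
rest = weak_keystream(ks_prefix[-1], len(c) - len(KNOWN_PREFIX))
print(xor_bytes(c[len(KNOWN_PREFIX):], rest))   # b'the secret movie bytes'
```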