T6: Compression Techniques (Ch 11,12,13,14)
Progressive Mode
"Starts with nothing -> Blurry -> More and more refined" -Amari Lewis, 2017
JPEG Header
(Frame and scan headers.) Contains information about the frame (i.e., image width and height, precision, number of components, unique ID, sampling factors, Q table). Also contains the Huffman tables and user application data.
Image Compression: Image Preparation
Analog-to-Digital Conversion
Compression Technique: Source
Considers data semantics and prioritizes information by importance. Uses prediction, transformation, and layered approaches. Applies quantization to the less important information.
Compression Environments: Retrieval
Data is coded only once. User only receives data, so only needs decoding.
Huffman Coding
Huffman coding aims to derive more efficient codes by using the occurrence probabilities of symbols. Unlike fixed-length coding, it assigns shorter codewords to more frequent symbols. KNOW HOW TO ENCODE HUFFMAN (T6 page 30); also know how to calculate the average code length.
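A minimal sketch of Huffman encoding and the average code length calculation, assuming a small made-up alphabet with known probabilities (the symbols and probabilities below are illustrative, not taken from the slides). The function names `huffman_code` and `average_length` are hypothetical helpers.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a prefix code {symbol: bitstring} from {symbol: probability}."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: one symbol still needs one bit
        return {sym: "0" for sym in probs}
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes with 0/1.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def average_length(probs, code):
    """Average code length = sum of p(symbol) * len(codeword)."""
    return sum(probs[s] * len(code[s]) for s in probs)


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}  # illustrative values
code = huffman_code(probs)
avg = average_length(probs, code)
print(code, avg)
```

For these probabilities the average length is 1.75 bits per symbol, compared with 2 bits for a fixed-length code over four symbols.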
Image Redundancy Types
Human perceptual sensitivity to different color channels - lower resolution coding in chrominance (UV, or CrCb) components Spatial redundancy - reduce by intra-frame coding Perceptual redundancy - reduce by quantization Coding redundancy - reduce by Run-length and Huffman codings
IDCT
I = Inverse Used to convert frequency domain to spatial domain in an image
MCU
Minimum Coded Unit: used to define the image region level an operation is applied on
Why Compression?
Motivation for Compression: Reduction in storage space, reduction in transmission time for large amount of information. Cost for Compression: Lossy and Lossless.
MPEG
Moving Picture Experts Group. Mission: to develop standards for the coded representation of motion pictures and audio at a bit rate of up to 1.5 Mb/s. MPEG-1: 1.5 Mbits/sec (VCD). MPEG-2: higher quality, 2-10 Mb/s (DVD).
Bottleneck for Digital Video
Much greater than audio. (1000x greater) Compression is required. Cannot send video data RAW.
Why 8 x 8 DCT Block?
N=8 gives best compromise between error & complexity Reduce coding error (larger N --> smaller error) Computational complexity (larger N --> more complex)
Captured Image Format
spatial resolution: pixel x pixel color encoding: bits/pixel dependent on hardware and software
JPEG: Color Image Preparation
take into consideration R,G,B. However, RGB is designed for Hardware so we need to convert to YUV (luminance (Y) and chrominance(U,V))
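A small sketch of the RGB to YUV conversion, assuming the common BT.601 weights (the notes do not say which coefficients the course uses, so treat the constants as an assumption). `rgb_to_yuv` is a hypothetical helper name.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601 weights (assumed here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance: weighted sum of R, G, B
    u = 0.492 * (b - y)                    # chrominance: blue difference
    v = 0.877 * (r - y)                    # chrominance: red difference
    return y, u, v


gray = rgb_to_yuv(128, 128, 128)  # a gray pixel carries no chrominance
red = rgb_to_yuv(255, 0, 0)
print(gray, red)
```

A gray pixel gives U and V of approximately zero, which is why the Y channel alone is the grayscale version of the image.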
MPEG Frame Header
temporal references, frame type, frame structure...
Matching Criterion for Macroblock
Minimum MSE (Mean Squared Error). Minimum MAD (Mean Absolute Difference). The best match is the candidate block with the lowest error.
General Compression Phases
1. Data Preparation 2. Data Processing 3. Quantization 4. Entropy Encoding
How does Entropy Encoding Work
1. Order the coeffs in zig-zag sequence shown (most trailing coeffs will be zero, after quantization...higher frequency = 0) 2. Use run-length encoding to encode the resulting sequence 3. Use Huffman or Arithmetic coding to minimize the coding sequence (Apply Huffman after RLE)
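The first two steps above can be sketched as follows. This is a simplified illustration, not the exact JPEG bitstream format: it assumes the AC coefficients are coded as (number of preceding zeros, value) pairs ended by an "EOB" (end of block) marker, and it omits the Huffman stage. The function names and the sample block are hypothetical.

```python
def zigzag_indices(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked bottom-left to top-right, odd ones reversed.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def run_length_ac(seq):
    """Encode AC values as (zeros_before, value) pairs plus an 'EOB' marker."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the end-of-block marker
    return pairs


# Illustrative quantized block: only a few low-frequency entries are non-zero.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 26, -3, 1, -2

order = zigzag_indices()
scanned = [block[r][c] for r, c in order]
dc, ac = scanned[0], scanned[1:]
pairs = run_length_ac(ac)
print(dc, pairs)
```

Because the zig-zag scan pushes the high-frequency zeros to the end, the 63 AC values collapse to three pairs and one marker in this example.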
JPEG Requirements
1. should be independent of image size 2. image content may be of any complexity 3. should have a good compression ratio without sacrificing image quality 4. should be able to run on most standard platforms 5. should have a sequential decoding and progressive decoding abilities
Macroblock
16 x 16 pixel blocks (2 x 2 DCT blocks of 8 x 8 pixels each), used as motion compensation units
Image Sizes
1995: 307.2K 2013: 15MP
Data Processing
2nd phase of the general compression phases. First step of actual compression
Intensity Value: Color Image
3 8-bit integers are used. (total 24 bits per pixel, 0-255 for each integer) RGB channel
Basic Compression Techniques
3 Techniques: Entropy, Source, and Hybrid
Quantization
3rd phase. Processing results of previous step: mapping real numbers into integers resulting in loss of precision. STAGE FOR LOSSY COMPRESSION. THERE IS NO LOSS IN OTHER 3 STAGES
Entropy Encoding
4th/Last Phase. Compress sequential data (integers from previous step) without loss. (e.g., run-length coding or Huffman coding)
Intensity Value: Gray level Images
8-bit integer used (0-255, from black to white)
Video Compression Redundancy
Additional: Temporal Redundancy (reduce by inter-frame coding)
MPEG Macroblock
Address, Type, Q Scale, Motion Vectors, Blocks....
Encoding Algorithm:
Algorithm: 1. Down-sample the original image by a factor of 2 (or a multiple of 2) in each dimension 2. Encode this reduced image using standard JPEG compression 3. Decode this compressed image and up-sample it 4. Use this up-sampled image as a prediction of the original image and encode the difference using standard JPEG compression
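A rough sketch of the four steps above on a tiny image. Two loud assumptions: the JPEG encode/decode of each layer is replaced by the identity (no codec is involved), and down-sampling simply keeps every second pixel while up-sampling replicates pixels. All function names are hypothetical.

```python
def downsample(img):
    """Keep every second pixel in each dimension (factor of 2)."""
    return [row[::2] for row in img[::2]]


def upsample(small, h, w):
    """Replicate each pixel to get back an h x w prediction image."""
    return [[small[y // 2][x // 2] for x in range(w)] for y in range(h)]


def encode(img):
    """Return the reduced image and the difference to its up-sampled version."""
    h, w = len(img), len(img[0])
    small = downsample(img)              # steps 1-2 (JPEG coding omitted)
    prediction = upsample(small, h, w)   # step 3
    diff = [[img[y][x] - prediction[y][x] for x in range(w)] for y in range(h)]
    return small, diff                   # step 4 (JPEG coding omitted)


def decode(small, diff):
    """Rebuild the full-resolution image from the reduced image plus difference."""
    h, w = len(diff), len(diff[0])
    prediction = upsample(small, h, w)
    return [[prediction[y][x] + diff[y][x] for x in range(w)] for y in range(h)]


image = [
    [10, 12, 20, 22],
    [11, 13, 21, 23],
    [30, 32, 40, 42],
    [31, 33, 41, 43],
]
small, diff = encode(image)
restored = decode(small, diff)
print(small, diff[0], restored == image)
```

The difference image holds small values, which is what makes it cheap to code. A decoder can show the reduced image first and refine it once the difference arrives.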
Types of Video Compression (MPEG)
Asymmetric Compression: Compression process is done only once and at the time of storage Examples: Video-on-Demand and News-on-Demand servers MPEG is an asymmetric standard Symmetric Compression: Equal use of compression and decompression process Examples: video conferencing, video telephone, and desktop video publishing
Advantages of Interpolative Coding
Better compression ratio Deals with uncovered areas properly better statistical properties and noise reduction Can decouple between prediction and coding
Motion Compensation Predictors:
Causal Predictors: Pure Predictive Coding P-frames = prediction frames Noncausal Predictors: Interpolative coding Contents of a frame is generated based on both a previous and a successive frame Interpolative Frame = B-Frame
Compression Technique: Hybrid
Combination of Entropy and Source. Most standards use different schemes at different stages of compression.
JPEG 2000
Complements JPEG. Better quality at lower bit-rates. Lossy and Lossless compression with Discrete Wavelet Transform (DWT) Progressive Coding Increased robustness to errors Content-based description Protective image security (Watermarking) Interfacing with MPEG-4
DCT Transformation
Done in the Image Processing stage. An 8 x 8 block of pixels (DCT: N=8) maps to the frequency domain: 1 DC value and 63 AC values, i.e., the coefficients of 64 orthogonal basis functions. Refer to slide T6.35 for an example
Predictor Encoding
Each pixel is encoded as a pair of values: the number of the chosen predictor (one of 8 possible predictors) and the difference between the prediction and the actual value. Both are entropy encoded
MPEG Intraframe Coding
Encoding of a single picture Discrete Cosine Transform- Converts spatial to frequency domain Quantization of spectral coefficients DPCM to encode DC terms Zigzag scan to group zeros into long sequences, followed by run-length coding Lossless, Variable Length Coding to encode AC coefficients SIMILAR TO IMAGE COMPRESSION (JPEG) *Difference is DPCT instead of FDCT
Hierarchical Mode II
Encoding of an image is done at successively lower resolutions as shown Decoding can be started at the lowest resolution and repeated till the highest resolution is reached Example: Showing lower resolution first then high resolution later
Dialog Requirements
End-to-End delay should not exceed 150ms (compress, network, protocol processing, data transfer delays)
EXIF
Exchangeable Image File Format. Used by digital cameras IMPORTANT METADATA FOR IMAGE RETRIEVAL
Retrieval Requirements
Fast Forward, Fast Rewind, Random Access, Decompression from a random standing point in data stream
Data Preparation
First phase of compression. Analog - Digital conversion (compression has not begun)
Image Compression: Image Processing
First step of compression (e.g., applying transform coding to convert the representation from pixels to frequencies)
General Requirements
Frame size/rate independency, Supporting various rates for different types of compression (audio/video) Synchronization Economical Implementation Portability
Three step search
Coarse-to-fine (logarithmic) search strategy. Idea: halve the search distance each time to obtain finer resolution estimates. Step 1: Evaluate nine candidate positions: the center (marked 0, meaning no motion) and the eight points around it at the initial search distance (marked 1). Select the position with the best (lowest) MSE or MAD. Step 2: Halve the distance and evaluate the eight points around the position chosen in step 1. Step 3: Halve the distance again and repeat; the best match position found gives the motion vector
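A minimal sketch of a three step search using MAD as the matching criterion. It assumes frames are lists of rows of integer intensities, a 4 x 4 block, and search distances of 4, 2, and 1 pixels. The frame contents, block position, and function names are made up for illustration.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def three_step_search(cur, ref, bx, by, n=4, step=4):
    """Return (motion_vector, cost), halving the search distance each step."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), mad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best  # search around the best position of the previous step
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                inside = (0 <= bx + mx and bx + mx + n <= w
                          and 0 <= by + my and by + my + n <= h)
                if not inside:
                    continue  # candidate block would fall outside the frame
                cost = mad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best, best_cost = (mx, my), cost
        step //= 2
    return best, best_cost


def shifted(ref, sx, sy):
    """Build a frame whose pixel (x, y) equals ref pixel (x + sx, y + sy)."""
    h, w = len(ref), len(ref[0])
    return [[ref[y + sy][x + sx] if 0 <= y + sy < h and 0 <= x + sx < w else 0
             for x in range(w)] for y in range(h)]


# Synthetic 32 x 32 reference frame and a current frame displaced by (4, -4).
ref = [[x * x + 3 * y * y for x in range(32)] for y in range(32)]
cur = shifted(ref, 4, -4)
vector, cost = three_step_search(cur, ref, 12, 12)
print(vector, cost)
```

The search evaluates at most 25 positions instead of the 225 an exhaustive search over the same range would need. It is a heuristic, so on real content it can settle on a local minimum rather than the true best match.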
MPEG Sequence
GoP Header, Frame Header, Frame....
MPEG1 Characteristics for I,P,B GoP
I only: Compression: Low Random Access: Highest (can point to exact frame) Coding Delay: Low I and P: Compression: Medium Random Access: Medium Coding Delay: Medium I, P, B: Compression: High Random Access: Medium Coding Delay: High Tradeoff between coding delay and compression
P(E): probability of occurrences of a random event E
I(E) = -log P(E), where I(E) = the amount of information carried by E. If P(E) = 1 (the event occurs with 100% certainty), E carries no information, since I(E) = 0. Use fewer bits to code more frequent occurrences
average self-information generated by the production of a single source symbol
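H = -(sum over j of) p(aj) log {p(aj)}, i.e., the expected value of the self-information I(aj) = -log {p(aj)}. This average is the entropy of the source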
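A short sketch of self-information and its probability-weighted average, assuming base-2 logarithms so the results are in bits. The probabilities are illustrative and the function names are hypothetical.

```python
import math


def self_information(p):
    """I = -log2(p): information in bits of an event with probability p."""
    return math.log2(1 / p)


def entropy(probs):
    """Average self-information per source symbol, in bits."""
    return sum(p * self_information(p) for p in probs if p > 0)


certain = self_information(1.0)   # a certain event carries no information
rare = self_information(0.125)    # a rare event carries more bits
h = entropy([0.5, 0.25, 0.125, 0.125])
print(certain, rare, h)
```

Here the entropy is 1.75 bits per symbol, which is the lower bound that a lossless code for this source can approach.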
MPEG Coding Frames
I, P, B frames. I = Intra-coded, P = Predicted, B = Bidirectional (Interpolative)
Epitome of Compression
Identify and remove redundancy. Final quality is acceptable if lossy. Price-performance tradeoff
Image Process
Image -> Scanner -> Captured Format -> Stored Format -> Database
Phases of Image Compression
Image Preparation -> Image Processing -> Quantization -> Entropy Encoding
Purpose of DCT Transform
In the frequency domain, we can remove more of the high-frequency components, which contribute mainly to the fine details of an image. The DCT itself = LOSSLESS; the loss comes from quantization. Example: zig-zag sequence. Compression is also achieved by removing high-frequency terms in the AC coefficients
How is intensity in a pixel of an image measured?
Intensity at each pixel is represented by an integer. The integer value is obtained from the analog (continuous) image by averaging over a small neighborhood around the pixel location. There are 2^P possible values for each pixel, where P = # of bits per pixel.
Video Frame Coding for Compression
Intra-coded frames - good for random access Inter-coded frames - higher compression rate
JPEG
JPEG: Joint Photographic Experts Group. Established: 1986, Adopted: 1991. Applies to gray-scale and color images. Files have a .jpg or .jpeg extension
Image Compression: Entropy Encoding
Last step -- compress sequential data (integers from previous step) without loss. (e.g., runlength coding or Huffman coding)
Compression Types
Lossless (maintain original quality). Lossy (inferior to original quality by controllable amount)
MPEG Encoding Characteristics:
Lossy compression Trade off image quality with bit rate according to objective or subjective criteria Intraframe coding Interframe coding Group of Picture (GOP) - one group of I, P, and B-frames
MPEG 4 Motivations
Object Oriented Concepts New Compression Techniques
Hierarchical Mode
Obtain I_0 from I_-1 + Diff(I_0 - I_-1) I_0 = original image, I_-1 = new image from sampling Basically: can obtain original image with new image and the difference between old image and new image...k, got it? good.
Analog Video
One or more analog signals that contain time-varying 2-D intensity (monochrome or color) patterns and timing info to align the pictures. Examples: Component Analog Video (CAV) = RGB, YUV, YCrCb video; Composite Video = NTSC; S-Video
Lossless Encoding
Operates at pixel level instead of an 8 x 8 block Uses predictive techniques instead of DCT to remove redundancy in data. Image -> Predictor -> Entropy Encoder (/w table) -> Compressed Image
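A simplified sketch of the predictor stage in the pipeline above. It assumes the simplest predictor (the left neighbor) and a prediction of 0 for the first pixel in a row, which is a simplification of what real lossless JPEG does. The entropy coder is omitted and the function names are hypothetical.

```python
def predict_residuals(row):
    """Replace each pixel by its difference from the left neighbor."""
    residuals, previous = [], 0  # assumed prediction of 0 for the first pixel
    for pixel in row:
        residuals.append(pixel - previous)
        previous = pixel
    return residuals


def reconstruct(residuals):
    """Invert the prediction exactly: running sum of the residuals."""
    row, previous = [], 0
    for r in residuals:
        previous += r
        row.append(previous)
    return row


row = [100, 102, 103, 103, 101]  # illustrative gray values
residuals = predict_residuals(row)
restored = reconstruct(residuals)
print(residuals, restored)
```

The residuals cluster around zero, so an entropy coder can give them short codes, and the reconstruction is exact because nothing is quantized.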
Frame Sequence
Organize frame sequence in terms of Group of Picture (GoP) Example: [ I B B P B B P] [ I B B P B B P] .... GoP needs to be uniform Each GoP is an independent entity I-Frame provides random access point, & as re-synchronization point P and B frames allows for greater compression efficiency
DCT Transform Process
Original Image -> FDCT -> DCT Image Storage -> IDCT -> Display Image
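A direct sketch of the FDCT and IDCT for one 8 x 8 block, written straight from the textbook DCT-II formula with orthonormal scaling. This is an unoptimized illustration with assumed helper names, not how a production codec computes it.

```python
import math

N = 8


def _c(k):
    """Normalization factor: 1/sqrt(2) for the zero-frequency term, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _basis(k, n):
    """Cosine basis function of frequency index k at sample position n."""
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def fdct(block):
    """Forward DCT: spatial domain (8 x 8 pixels) -> frequency domain."""
    return [[0.25 * _c(u) * _c(v) * sum(block[r][c] * _basis(u, r) * _basis(v, c)
                                         for r in range(N) for c in range(N))
             for v in range(N)] for u in range(N)]


def idct(coeffs):
    """Inverse DCT: frequency domain -> spatial domain."""
    return [[0.25 * sum(_c(u) * _c(v) * coeffs[u][v] * _basis(u, r) * _basis(v, c)
                        for u in range(N) for v in range(N))
             for c in range(N)] for r in range(N)]


flat = [[100] * N for _ in range(N)]              # uniform block
flat_coeffs = fdct(flat)                          # only the DC term is non-zero
ramp = [[r * N + c for c in range(N)] for r in range(N)]
ramp_back = idct(fdct(ramp))                      # round trip without quantization
print(round(flat_coeffs[0][0], 6))
```

A uniform block puts all of its energy into the DC coefficient (800 for a block of 100s), and the round trip recovers the pixels up to floating-point error, which illustrates that the transform alone loses nothing.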
MPEG Picture
Picture Header, Slice Header, Macroblocks
MPEG Sequence Header
Picture Width, Height, Aspect Ratio, Bit Rate, Picture Rate
Block-based Motion Compensation
Principle: predict contents of current frames (at block level) from previous or subsequent frames Motion information comprises the amplitude and direction of displacement of the contents Advantages: Low overhead - needs only one motion vector per block Availability of low-cost VLSI implementation Disadvantages Fails for zoom, rotation motion and under local deformation Discontinuity at block boundaries Serious blocking artifacts, especially at low bit-rate
Image Compression: Quantization
Processing results of previous step: mapping real numbers into integers resulting in loss of precision
Common Image Formats
RIFF: Resource Interchange File Format. GIF: Graphics Interchange Format. TIFF: Tagged Image File Format. JPEG: Joint Photographic Experts Group
Desired Features of Video
Random Access Fast Forward and Fast Rewind Think YOUTUBE, NETFLIX Reverse Playback Audio-Visual Synchronization: Robustness to Errors Coding/Decoding Delay: Total System Delay < 150 ms Editability Format Flexibility
Requirements of MPEG Encoding
Random access requirements --> pure intra-frame coding Higher compression rates ---> inter-frame coding
Principles of Compression
Redundancy and Matching user Expectations
MPEG Interframe Coding
Remove temporal redundancies between frames Use extensively in MPEG-1 and MPEG-2 Based on estimation of motion between video frames Use of motion vectors to describe displacement of pixels from one frame to the next One motion vector can represent the motion of a block of pixels.
How does Quantization Work?
Divide each value by a quantization step (e.g., 10) and round to the nearest integer; decoding reverses the process by multiplying by the same step. A smaller quantization step gives greater accuracy and less error. Smaller quantization steps are used for lower-frequency components. Psycho-visual analysis decides how much to quantize each position in the 8x8 DCT array, and why
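A small sketch of quantization and dequantization on a 2 x 2 corner of a coefficient array. The coefficient values and step sizes are illustrative only, and the function names are hypothetical.

```python
def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to an integer."""
    return [[round(c / s) for c, s in zip(crow, srow)]
            for crow, srow in zip(coeffs, steps)]


def dequantize(levels, steps):
    """Multiply each integer level by the same step size used when encoding."""
    return [[lv * s for lv, s in zip(lrow, srow)]
            for lrow, srow in zip(levels, steps)]


coeffs = [[-415.4, -30.2], [4.5, -21.9]]  # illustrative DCT coefficients
steps = [[16, 11], [12, 12]]              # illustrative quantization steps
levels = quantize(coeffs, steps)
approx = dequantize(levels, steps)
print(levels, approx)
```

The reconstructed values differ from the originals by at most half a step, and the small coefficient 4.5 becomes 0, which is where both the loss and the compression gain come from.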
MPEG1 Encoding Scheme
STEPS Partitioning of images into Macroblocks (MB) size 16X16 Intraframe coding on one out of every K images - GOP size = K Motion estimation on MBs Generate (K-1) predicted frames Encode residual error images
Imaging and Video Requirements
Simple Image (307.2KByte) Color Image (921.6 KByte) Video (27.648MBytes/sec) ~30 frames/second
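The figures above can be reproduced with a little arithmetic, assuming a 640 x 480 image, 1 byte per pixel for the simple image, 3 bytes per pixel for color, and 30 frames per second. The 640 x 480 resolution is an assumption chosen because it matches the 307.2 KByte figure.

```python
WIDTH, HEIGHT = 640, 480   # assumed resolution
FRAMES_PER_SECOND = 30

simple_image_bytes = WIDTH * HEIGHT * 1            # 8 bits per pixel
color_image_bytes = WIDTH * HEIGHT * 3             # 24 bits per pixel (RGB)
video_bytes_per_second = color_image_bytes * FRAMES_PER_SECOND

print(simple_image_bytes / 1000, "KByte")
print(color_image_bytes / 1000, "KByte")
print(video_bytes_per_second / 1_000_000, "MByte/s")
```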
Conditional Replenishment
Skipped MB - Zero motion vector, the MB is neither encoded nor transmitted Inter MB - Motion Prediction is valid, the MB type and address, motion vector and the coded DCT coefficients are transmitted Intra MB - Encoded DCT coefficients of the MB are transmitted. No Motion Compensation is used
Stages of Quantization
Stage 1: Use Human Visual System (Psychovisual Features) Stage 2: Use FDCT Coefficients and obtain Q Matrix Stage 3: Scale each coefficient by the Q factor Stage 4: Most entries become 0 after applying Q matrix
Stored Image Format
Storage Options: 2D Array of Values. (Each value represents data from image pixel) Bitmap: binary digit value Color Image: Numbers or Color Lookup Table If enough space, store in RGB triplets (for color image) or compress the info Other necessary information: Width, Height, Image Depth, Creator, etc. (EXIF info)
Compression Audio Requirements
Telephone Speech sampled at 8kHz (8 bit/sample) (64Kbits/second) Audio CD Quality sampled at 44.1KHz (16 bits/sample) (176.4 KBytes/second)
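A quick check of the two audio rates, assuming mono telephone speech and two channels (stereo) for CD audio. The stereo assumption is what makes the CD figure come out to 176.4 KBytes/second.

```python
# Telephone speech: 8 kHz sampling, 8 bits per sample, one channel (assumed).
telephone_bits_per_second = 8000 * 8 * 1

# CD audio: 44.1 kHz sampling, 16 bits (2 bytes) per sample, two channels (assumed).
cd_bytes_per_second = 44100 * 2 * 2

print(telephone_bits_per_second / 1000, "Kbit/s")
print(cd_bytes_per_second / 1000, "KByte/s")
```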
Temporal Redundancy
Temporal redundancy: subsequent frames carry similar but slightly varying content
Compression Requirements
Text (Lowest common size = 640 x 480 screen size, 9.6KBytes) Vector Graphics (~500 lines, 2,875Bytes)
Lossless Coding Theorem
The minimum bit rate that can be achieved by lossless coding of discrete memoryless source X is given by: min {R} = H(X) + ε bits per symbol where R is the transmission rate, H is the entropy of the source and ε is a positive quantity which can be made arbitrarily close to zero
Search Strategies in Motion Compensation
Three step search Cross search
Image Decoding
Work backwards: Decode Huffman/RLE Dequantize Data with same Q matrix Apply IDCT to transform back to Spatial Domain (some loss here because of limitations)
Intensity Value: Black and White Image
Two Values (0 and 1). This is a binary-value image
Video: Scanning
Two types. Progressive Scanning: each frame is rendered completely in each display cycle. Interlaced Scanning: each frame is split into two fields, and only one field (half the lines of the frame) is rendered in each cycle. Tradeoffs between speed and quality. NEW SCAN METHODS AVAILABLE
Run-length Coding
Type: ENTROPY. Replaces a run of repeated bytes with the byte and the number of occurrences. The number of occurrences is indicated by a special flag like "!" E.g.: BBBBBBBBB --> B!9. This technique is useful for images which have large regions of uniform color (or gray value). GIF (which uses LZW dictionary coding) benefits from repeated content in a similar way, hence it is good for cartoon images but not for natural scenery images
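A minimal sketch of the flag-based scheme described above. It assumes the input never contains the flag character "!" or digits (otherwise decoding would be ambiguous) and that only runs of four or more are replaced. Both the threshold and the function names are choices made for this example.

```python
import re
from itertools import groupby

FLAG = "!"
MIN_RUN = 4  # assumed threshold: shorter runs are cheaper to leave as-is


def rle_encode(text):
    """Replace each long run with <char><flag><count>."""
    out = []
    for ch, group in groupby(text):
        n = len(list(group))
        out.append(f"{ch}{FLAG}{n}" if n >= MIN_RUN else ch * n)
    return "".join(out)


def rle_decode(coded):
    """Expand every <char>!<count> back into the original run."""
    return re.sub(r"(.)!(\d+)", lambda m: m.group(1) * int(m.group(2)), coded)


short = rle_encode("BBBBBBBBB")
mixed = rle_encode("AAABBBBBBBBBCD")
print(short, mixed, rle_decode(mixed))
```

The run of nine B's shrinks to three characters, while the short run of A's is left untouched because coding it would not save space.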
FDCT
Used to convert spatial domain to frequency domain in an image
Compression Environments: Dialog Mode
User both sends/receives data. (Both transmitter and receiver)
Compression Technique: Entropy
Uses only the statistics of the data to reduce redundancy. Ignores media semantics and human perceptual characteristics, so it is lossless. (i.e., run-length coding, Huffman coding)
Structure of MPEG (Syntax Layers)
Video sequence -> group of pictures -> picture (or frame) -> slice (e.g., the first row of macroblocks in a picture) -> macroblock (2x2 blocks) in a slice -> block (one block in a macroblock)
MJPEG
Video JPEG where each frame is coded using JPEG
YUV, YCrCb
Y = luminance (grayscale component). UV = color (chrominance) components. Chroma = high frequency; luminance = low frequency. Humans are more sensitive to luminance, so we can afford to lose a lot more info in the chrominance components than in the luminance component. CUT CHROMINANCE FIRST. Typical choice: leave the luminance component alone at full resolution; reduce the chrominance components 2:1 horizontally and either 2:1 or 1:1 (no change) vertically. These alternatives are called 2h2v and 2h1v sampling (horizontal, vertical). This immediately reduces the data size by 1/2 (2h2v) or 1/3 (2h1v).
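The 1/2 and 1/3 savings can be checked by counting samples, assuming three equally sized components (Y, U, V) before subsampling. `relative_size` is a hypothetical helper.

```python
from fractions import Fraction


def relative_size(h_factor, v_factor):
    """Data size after chroma subsampling, relative to full-resolution YUV."""
    luma = Fraction(1)                               # Y stays at full resolution
    chroma = 2 * Fraction(1, h_factor * v_factor)    # U and V are both reduced
    return (luma + chroma) / 3


size_2h2v = relative_size(2, 2)   # chroma halved horizontally and vertically
size_2h1v = relative_size(2, 1)   # chroma halved horizontally only
print(1 - size_2h2v, 1 - size_2h1v)
```

With 2h2v the two chroma planes shrink to a quarter each, so the total drops from 3 units to 1.5. With 2h1v they shrink to a half each, so the total drops from 3 units to 2.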
Evolution of Computer Standards
from 16K color graphics to up to 786.4K with 256 colors.
What does an 8 x 8 DCT Block look like
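Top-left corner is the DC value (the uniform component, called the DC coefficient). First row = horizontal frequencies (intensity varying left to right, i.e., vertical edges). First column = vertical frequencies (intensity varying top to bottom, i.e., horizontal edges). Remember: 1 DC, 63 AC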