MPEG-2 is an international standard for video and audio compression released by the MPEG working group in 1994. MPEG-2 is commonly used to provide video and audio encoding for broadcast signals, including satellite TV and cable TV. After a few modifications, MPEG-2 also became the core technology for DVD products.
The system description part of MPEG-2 (Part 1) defines the transport stream, which provides a mechanism for transmitting digital video and audio signals over unreliable media, primarily used in the broadcasting field.
The second part of MPEG-2, the video part, is similar to MPEG-1 but adds support for interlaced video (interlacing is widely used in broadcasting). MPEG-2 video is not optimized for low bit rates (below 1 Mbit/s); however, it significantly outperforms MPEG-1 at bit rates of 3 Mbit/s and above. MPEG-2 is backward compatible, meaning all compliant MPEG-2 decoders can also play MPEG-1 video streams.
MPEG-2 technology is also applied in HDTV transmission systems and Blu-ray discs.
The third part of MPEG-2 defines the audio compression standard. This part improves upon MPEG-1 audio compression, supporting audio with more than two channels. The audio compression part of MPEG-2 also maintains backward compatibility.
The seventh part of MPEG-2 defines an audio compression format that is not backward compatible but offers stronger audio capabilities. This part is what is commonly called MPEG-2 AAC.
An MPEG-2 system stream generally consists of two basic elements:
The principle of MPEG-2 video compression utilizes two characteristics of the image: spatial correlation and temporal correlation. These correlations result in a significant amount of redundant information within the image. If we can remove this redundant information and only retain a small amount of non-correlated information for transmission, we can greatly save on bandwidth. The receiver can use this non-correlated information, along with a specific decoding algorithm, to reconstruct the original image while maintaining a certain quality.
MPEG-2 video typically contains multiple GOPs (Groups of Pictures), each of which includes multiple frames. There are usually three frame types: I-frames, P-frames, and B-frames. I-frames use intraframe coding, P-frames use forward prediction, and B-frames use bidirectional prediction.
I-frames use intraframe coding: they exploit only the spatial correlation within a single frame, with no temporal prediction and no motion compensation. Since I-frames do not depend on other frames, they serve as random access entry points and as reference frames for decoding. I-frames are primarily used for initializing the receiver, acquiring the channel, and switching and inserting programs, and their compression ratio is relatively low. I-frames appear periodically in the image sequence, with the frequency determined by the encoder.
P-frames and B-frames use interframe coding, utilizing both spatial and temporal correlations. P-frames use only forward temporal prediction, which can improve compression efficiency and image quality. P-frames can also contain portions coded with intraframe coding, meaning each macroblock in the P-frame can be either forward predicted or intraframe coded.
B-frames use bidirectional temporal prediction, greatly improving the compression ratio. It is worth noting that because B-frames use future frames as references, the transmission order of frames in the MPEG-2 encoded bitstream differs from the display order.
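The reordering can be illustrated with a minimal Python sketch. The function name `display_to_coded_order` and the frame labels are hypothetical, chosen only for this illustration; the sketch assumes that every B-frame is predicted from the nearest I- or P-frame on each side.

```python
def display_to_coded_order(display):
    """Reorder frame types from display order to coded (transmission) order.

    A B-frame needs its future reference (the next I- or P-frame) to be
    decoded first, so that anchor frame is moved ahead of the B-frames
    that precede it in display order.
    """
    coded = []
    pending_b = []  # B-frames waiting for their future reference
    for index, frame_type in enumerate(display):
        label = f"{frame_type}{index}"
        if frame_type == "B":
            pending_b.append(label)
        else:
            # I or P: an anchor frame; emit it, then the B-frames it unblocks
            coded.append(label)
            coded.extend(pending_b)
            pending_b = []
    # Trailing B-frames would really wait for the next GOP's I-frame;
    # this sketch simply appends them.
    coded.extend(pending_b)
    return coded


print(display_to_coded_order("IBBPBBP"))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The numbers in the labels are display positions, so the output shows P3 being sent before B1 and B2 even though it is displayed after them.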
To represent the encoded data, MPEG-2 defines a syntax with a six-layer hierarchy: Video Sequence Layer, Group of Pictures (GOP) Layer, Picture Layer, Slice Layer, Macroblock Layer, and Block Layer. The upper four layers each have a corresponding start code (SC), which the decoder can use to resynchronize after errors or after the sending and receiving ends lose synchronization. The Macroblock Layer and Block Layer have no start codes, so a single loss of synchronization results in at least one slice of data being lost.
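As a rough illustration of how a decoder might locate these resynchronization points, the following Python sketch scans a byte string for start codes. It assumes the usual MPEG video convention of a three-byte prefix 0x000001 followed by a one-byte code (0xB3 for a sequence header, 0xB8 for a GOP, 0x00 for a picture, 0x01 to 0xAF for slices). The function names and the sample stream are made up for the example, and the payload bytes are placeholders rather than valid MPEG-2 data.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def classify(code):
    """Map the byte after the prefix to the layer it introduces."""
    if code == 0xB3:
        return "sequence_header"
    if code == 0xB8:
        return "group_of_pictures"
    if code == 0x00:
        return "picture"
    if 0x01 <= code <= 0xAF:
        return "slice"
    return "other"


def find_start_codes(data):
    """Return (offset, layer) for every start code found in data."""
    results = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        results.append((pos, classify(data[pos + 3])))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return results


# Synthetic stream: four start codes separated by placeholder payload bytes.
stream = (
    b"\x00\x00\x01\xb3" + b"\xaa" * 4
    + b"\x00\x00\x01\xb8" + b"\xbb" * 2
    + b"\x00\x00\x01\x00" + b"\xcc"
    + b"\x00\x00\x01\x01" + b"\xdd" * 3
)
for offset, layer in find_start_codes(stream):
    print(offset, layer)
```

After a transmission error, a decoder following this idea would discard data up to the next start code it finds, which is why the smallest unit that can be lost is a slice.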
The input video typically has a frame rate of 25 frames per second (CCIR standard) or 29.97 frames per second (FCC).
MPEG-2 supports both interlaced and progressive scan. In progressive scan mode, the basic unit for encoding is the frame. In interlaced scan mode, the basic unit for encoding can be either a frame or a field.
The input image is first converted to the YCbCr color space, where Y is the luma (brightness) channel and Cb and Cr are the two chroma channels. Each channel is partitioned into blocks that together form "macroblocks," which are the basic unit of encoding. Each macroblock is further divided into 8x8 blocks. The number of chroma blocks depends on the chroma format chosen in the initial parameter settings. For example, in the commonly used 4:2:0 format, a macroblock holds four luma blocks and one block for each chroma channel, so the total number of blocks per macroblock is 4+1+1=6.
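The block arithmetic can be checked with a short Python sketch. The 4:2:0 entry comes from the text; the 4:2:2 and 4:4:4 entries are added here for comparison, and the 720x576 picture size is only an assumed example. The sketch treats a macroblock as covering 16x16 luma samples and ignores any interlace-specific padding.

```python
# 8x8 blocks per macroblock, as (luma, Cb, Cr)
BLOCKS_PER_MACROBLOCK = {
    "4:2:0": (4, 1, 1),
    "4:2:2": (4, 2, 2),
    "4:4:4": (4, 4, 4),
}


def blocks_in_macroblock(chroma_format):
    """Total number of 8x8 blocks in one macroblock."""
    return sum(BLOCKS_PER_MACROBLOCK[chroma_format])


def macroblocks_in_picture(width, height):
    """Number of 16x16 macroblocks, rounding each dimension up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def blocks_in_picture(width, height, chroma_format):
    return macroblocks_in_picture(width, height) * blocks_in_macroblock(chroma_format)


print(blocks_in_macroblock("4:2:0"))           # 6
print(macroblocks_in_picture(720, 576))        # 1620
print(blocks_in_picture(720, 576, "4:2:0"))    # 9720
```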
For I-frames, the entire image enters the encoding process directly. For P-frames and B-frames, motion compensation is performed first. Because adjacent frames are strongly correlated, a macroblock can usually be matched to a similar area at a nearby position in the previous frame (and, for B-frames, in the next frame as well). The offset to the matching area is recorded as the motion vector, and the residual, that is, the difference between the macroblock and its motion-compensated prediction, is passed on for encoding.
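The matching step can be sketched as an exhaustive block search in Python. This is a toy under stated assumptions, not the method of any particular encoder: the standard leaves the search strategy to the encoder, and this sketch uses a 4x4 block, whole-pixel offsets, and the sum of absolute differences (SAD) as the matching cost. All names (`sad`, `motion_search`, `make_frame`) are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def motion_search(cur, ref, top, left, size=4, search_range=2):
    """Find the offset (dy, dx) into ref that best matches the block of
    cur at (top, left); return the motion vector and the residual."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, top, left, ry, rx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    _, dy, dx = best
    residual = [
        [cur[top + i][left + j] - ref[top + dy + i][left + dx + j]
         for j in range(size)]
        for i in range(size)
    ]
    return (dy, dx), residual


def make_frame(top, left, size=8):
    """8x8 black frame with a textured 4x4 patch at (top, left)."""
    frame = [[0] * size for _ in range(size)]
    for i in range(4):
        for j in range(4):
            frame[top + i][left + j] = 10 + 4 * i + j
    return frame


reference = make_frame(2, 1)   # patch position in the previous frame
current = make_frame(3, 3)     # patch has moved down 1 and right 2
vector, residual = motion_search(current, reference, top=3, left=3)
print(vector)    # (-1, -2): the match lies 1 row up and 2 columns left
print(residual)  # all zeros, because the patch moved without changing
```

When the match is perfect, as here, only the motion vector needs to be transmitted; in real video the residual is small but nonzero and goes through the transform stage.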
For each 8x8 block, a discrete cosine transform (DCT) converts the image from the spatial domain to the frequency domain. The resulting transform coefficients are quantized and then reordered (typically with a zigzag scan) to increase the likelihood of long runs of zeros. The reordered coefficients are run-length coded, and finally Huffman coded.
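The transform, quantization, reordering, and run-length stages can be sketched in plain Python. Several simplifications are assumptions of this sketch rather than features of MPEG-2: a single uniform quantization step stands in for the quantization matrix and scale, only the zigzag scan is shown, the DC coefficient is treated like any other coefficient, and the final Huffman stage is omitted. The function names are hypothetical.

```python
import math

N = 8


def dct_2d(block):
    """Direct (slow) 8x8 two-dimensional DCT-II with orthonormal scaling."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                    )
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step and round to an integer."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def zigzag_order():
    """(row, col) pairs along anti-diagonals, alternating direction."""
    cells = [(r, c) for r in range(N) for c in range(N)]
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_length(levels):
    """Encode as (zero_run, level) pairs, closed by an end-of-block mark."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the marker
    return pairs


def encode_block(block, step=16):
    quantized = quantize(dct_2d(block), step)
    scanned = [quantized[r][c] for r, c in zigzag_order()]
    return run_length(scanned)


flat = [[100] * N for _ in range(N)]
print(encode_block(flat))
```

For the flat block in the usage example, all of the energy lands in the DC coefficient (800 before quantization, 50 after dividing by 16), so the 64 samples collapse to `[(0, 50), 'EOB']`.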
I-frame coding is aimed at reducing spatial domain redundancy, while P-frames and B-frames are aimed at reducing temporal domain redundancy.
A GOP consists of a fixed pattern of I-frames, P-frames, and B-frames. A commonly used structure comprises 15 frames in the form IBBPBBPBBPBBPBB. The selection of the proportions of frames in the GOP is related to bandwidth and image quality requirements. For instance, since the compression time for B-frames may be three times that of I-frames, certain real-time systems with limited computational power may need to reduce the proportion of B-frames.
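A GOP pattern like the one above is often described by two numbers: the GOP length and the spacing between anchor (I or P) frames. The following Python sketch builds the pattern from those two values and counts the frame types. The parameter names `n` and `m` and the function name are assumptions made for this illustration.

```python
from collections import Counter


def gop_pattern(n, m):
    """Build a GOP in display order.

    n: total number of frames in the GOP
    m: distance between anchor frames (I or P); m - 1 B-frames sit between them
    """
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")
        elif i % m == 0:
            frames.append("P")
        else:
            frames.append("B")
    return "".join(frames)


pattern = gop_pattern(15, 3)
print(pattern)                 # IBBPBBPBBPBBPBB
print(dict(Counter(pattern)))  # {'I': 1, 'B': 10, 'P': 4}
```

Setting `m` to 1 removes B-frames entirely, which is one way a system with limited computational power could trade compression efficiency for lower encoding cost.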
The output bitstream of MPEG-2 can be either constant bit rate (CBR) or variable bit rate (VBR). The maximum bit rate can reach 10.4 Mbit/s, as in DVD applications. To hold a constant bit rate, the encoder must continuously adjust the quantization scale. However, increasing the quantization scale may produce visible distortion, such as a mosaic (blocking) effect.
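The feedback between bit rate and quantization scale can be illustrated with a deliberately simplified Python model. Everything here is an assumption for illustration: the rule that a frame's size equals a "complexity" value divided by the quantization scale is a toy model, the update rule is not taken from the standard or any real encoder, and the scale is clamped to 1 through 31 (the range of a 5-bit scale code).

```python
def simulate_cbr(complexities, target_bits, qscale=8.0):
    """Adjust the quantization scale frame by frame toward target_bits.

    Toy model: bits produced = complexity / quantization scale.
    Returns a list of (scale used, bits produced) per frame.
    """
    history = []
    for complexity in complexities:
        q = min(31, max(1, round(qscale)))
        bits = complexity / q
        history.append((q, bits))
        # Too many bits -> raise the scale (coarser quantization);
        # too few bits -> lower it (finer quantization).
        qscale = qscale * bits / target_bits
    return history


# Five easy frames followed by five frames that are twice as complex.
frames = [1_200_000] * 5 + [2_400_000] * 5
for q, bits in simulate_cbr(frames, target_bits=100_000):
    print(q, int(bits))
```

In this model the scale settles at 12 for the easy frames and climbs to 24 when the content becomes harder to compress, which corresponds to the coarser quantization that produces visible blocking.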
The audio coding in MPEG-2 includes:
The MPEG-2 standard is used in DVDs and introduces the following technical parameter restrictions:
MPEG-2 in NTSC must meet one of the following resolutions: