MPEG-2


MPEG-2 is an international standard for video and audio compression, released by the MPEG (Moving Picture Experts Group) working group in 1994. It is commonly used to encode video and audio for broadcast signals, including satellite and cable TV. With a few modifications, MPEG-2 also became the core technology for DVD products.

Part 1 of MPEG-2, the systems part, defines the transport stream, a mechanism for carrying digital video and audio signals over unreliable media. It is used primarily in broadcasting.

Part 2 of MPEG-2, the video part, is similar to MPEG-1 but adds support for interlaced video (interlacing is widely used in broadcasting). MPEG-2 video is not optimized for low bit rates (below 1 Mbit/s), but it significantly outperforms MPEG-1 at bit rates of 3 Mbit/s and above. MPEG-2 is backward compatible: all compliant MPEG-2 decoders can also play MPEG-1 video streams.

MPEG-2 technology is also applied in HDTV transmission systems and Blu-ray discs.

Part 3 of MPEG-2 defines the audio compression standard. It extends MPEG-1 audio compression to support more than two channels while remaining backward compatible.

Part 7 of MPEG-2 defines an audio compression format that is not backward compatible but offers stronger audio capabilities. This is the part commonly referred to as MPEG-2 AAC.

An MPEG-2 system stream generally consists of two basic elements:

  • Video data + timestamps
  • Audio data + timestamps
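The role of the timestamps can be illustrated with a small sketch. It assumes details the article does not state: that presentation timestamps count ticks of a 90 kHz clock, and that an MPEG Layer II audio frame holds 1152 samples. The function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

CLOCK_HZ = 90_000  # assumption: timestamps count ticks of a 90 kHz clock


def video_timestamps(frame_rate, frame_count):
    """Return one timestamp (in clock ticks) per video frame."""
    ticks_per_frame = Fraction(CLOCK_HZ) / frame_rate
    return [int(i * ticks_per_frame) for i in range(frame_count)]


def audio_timestamps(sample_rate, samples_per_frame, frame_count):
    """Return one timestamp (in clock ticks) per audio frame."""
    ticks_per_frame = Fraction(CLOCK_HZ * samples_per_frame, sample_rate)
    return [int(i * ticks_per_frame) for i in range(frame_count)]


pal_video = video_timestamps(Fraction(25), 3)
ntsc_video = video_timestamps(Fraction(30000, 1001), 3)  # 29.97 frames/second
layer2_audio = audio_timestamps(48_000, 1152, 3)

print(pal_video)     # [0, 3600, 7200]
print(ntsc_video)    # [0, 3003, 6006]
print(layer2_audio)  # [0, 2160, 4320]
```

Because both streams are stamped against the same clock, a decoder can present audio and video frames at matching instants even though their frame durations differ.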

MPEG-2 video compression exploits two characteristics of images: spatial correlation and temporal correlation. These correlations mean that an image sequence carries a large amount of redundant information. Removing that redundancy and transmitting only the small amount of uncorrelated information that remains saves a great deal of bandwidth. The receiver uses this information, together with the decoding algorithm, to reconstruct the original image at a given level of quality.

MPEG-2 video typically contains multiple GOPs (Groups of Pictures), each of which includes multiple frames. There are usually three frame types: I-frames, P-frames, and B-frames. I-frames use intraframe coding, P-frames use forward prediction, and B-frames use bidirectional prediction.

I-frames are intraframe coded: they exploit only the spatial correlation within a single frame and use neither temporal correlation nor motion compensation. Because they do not depend on other frames, I-frames serve as random access points and as reference frames for decoding. They are used mainly for initializing the receiver, acquiring the channel, and switching and inserting programs, and their compression ratio is relatively low. I-frames appear periodically in the image sequence, at a frequency determined by the encoder.

P-frames and B-frames are interframe coded, exploiting both spatial and temporal correlation. P-frames use only forward temporal prediction, which improves compression efficiency and image quality. A P-frame may also contain intraframe-coded portions: each macroblock in a P-frame can be either forward predicted or intraframe coded.

B-frames use bidirectional temporal prediction, which greatly improves the compression ratio. Because B-frames use future frames as references, the transmission order of frames in an MPEG-2 bitstream differs from the display order.
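The reordering can be sketched in a few lines. This is a minimal illustration, not an encoder: it assumes each B-frame is simply held back until the next I- or P-frame in display order has been sent. The function name is hypothetical.

```python
def display_to_coded_order(frame_types):
    """Reorder display-order frame types ('I', 'P', 'B') into transmission order.

    Returns a list of (display_index, frame_type) tuples.
    """
    coded = []
    pending_b = []  # B-frames waiting for their future reference frame
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            # An I- or P-frame is sent first, then the B-frames that
            # precede it in display order and depend on it.
            coded.append((index, frame_type))
            coded.extend(pending_b)
            pending_b.clear()
    # Trailing B-frames have no later reference in this list; in this
    # sketch they are simply appended at the end.
    coded.extend(pending_b)
    return coded


coded = display_to_coded_order(list("IBBPBBP"))
print(" ".join(f"{frame_type}{index}" for index, frame_type in coded))
# I0 P3 B1 B2 P6 B4 B5
```

The decoder reverses this step: it decodes frames in the order received and holds reference frames until it is time to display them.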

MPEG-2 uses a syntax that organizes the encoded bitstream into six hierarchical layers: Video Sequence Layer (Sequence), Group of Pictures Layer (GOP), Picture Layer, Slice Layer, Macroblock Layer, and Block Layer. The upper four layers each have a corresponding start code (SC); the Macroblock Layer and Block Layer do not. After an error or a loss of synchronization between the sending and receiving ends, the decoder can use the start codes to resynchronize. Because the slice is the lowest layer with a start code, one loss of synchronization results in at least one slice of data being lost.
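Resynchronization on start codes can be sketched as a byte scan. The byte values below are assumptions not given in the article: a start code is taken to be the prefix 0x000001 followed by one identifying byte (0xB3 for a sequence header, 0xB8 for a GOP, 0x00 for a picture, 0x01 to 0xAF for a slice). The function names are hypothetical.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def classify_start_code(value):
    """Map the byte after the prefix to a layer name (assumed values)."""
    if value == 0x00:
        return "picture"
    if 0x01 <= value <= 0xAF:
        return "slice"
    if value == 0xB3:
        return "sequence_header"
    if value == 0xB8:
        return "group_of_pictures"
    return "other"


def find_start_codes(data):
    """Return (offset, layer) for every start code found in data."""
    results = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        results.append((pos, classify_start_code(data[pos + 3])))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return results


# Synthetic bytes for illustration only, not a valid MPEG-2 stream.
stream = (
    b"\x00\x00\x01\xb3\xaa\xbb"   # sequence header + payload
    b"\x00\x00\x01\xb8\x11"       # GOP header + payload
    b"\x00\x00\x01\x00\x22\x33"   # picture header + payload
    b"\x00\x00\x01\x01\x44"       # first slice + payload
)
print(find_start_codes(stream))
# [(0, 'sequence_header'), (6, 'group_of_pictures'), (11, 'picture'), (17, 'slice')]
```

A decoder that loses its place inside a slice can call such a scan from its current offset and resume at the next slice (or higher-layer) start code.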

Generally, the input video format is 25 (CCIR standard) or 29.97 (FCC) frames per second.

MPEG-2 supports both interlaced and progressive scan. In progressive scan mode, the basic unit of encoding is the frame. In interlaced scan mode, the basic unit of encoding can be either a frame or a field.
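The relationship between a frame and its two fields can be shown with a short sketch. It assumes a frame is a list of scan lines with an even line count, where the top field holds the even-numbered lines and the bottom field the odd-numbered lines. The function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of scan lines) into top and bottom fields."""
    top_field = frame[0::2]     # lines 0, 2, 4, ...
    bottom_field = frame[1::2]  # lines 1, 3, 5, ...
    return top_field, bottom_field


def weave_fields(top_field, bottom_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


frame = [f"line{n}" for n in range(6)]
top, bottom = split_fields(frame)
print(top)     # ['line0', 'line2', 'line4']
print(bottom)  # ['line1', 'line3', 'line5']
print(weave_fields(top, bottom) == frame)  # True
```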

The input image is first converted to the YCbCr color space, where Y represents brightness (luma) and Cb and Cr are the two chroma channels. The image is then partitioned into macroblocks, which are the basic unit of encoding. Each macroblock covers a 16x16 luma area and is further divided into 8x8 blocks. The number of chroma blocks per macroblock depends on the chroma format. In the commonly used 4:2:0 format, for example, each macroblock has four luma blocks and one block for each chroma channel, for a total of 4+1+1=6 blocks.
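The block arithmetic can be checked with a small sketch. The 4:2:0 row follows the text; the 4:2:2 and 4:4:4 rows, and the rounding of picture dimensions up to whole 16x16 macroblocks, are assumptions added for illustration. The names are hypothetical.

```python
# (luma blocks, Cb blocks, Cr blocks) per macroblock for each chroma format
BLOCKS_PER_MACROBLOCK = {
    "4:2:0": (4, 1, 1),
    "4:2:2": (4, 2, 2),  # assumption, not stated in the article
    "4:4:4": (4, 4, 4),  # assumption, not stated in the article
}


def macroblock_layout(width, height, chroma_format="4:2:0"):
    """Count macroblocks and 8x8 blocks for a picture of the given size."""
    mb_columns = (width + 15) // 16  # round up to whole macroblocks
    mb_rows = (height + 15) // 16
    blocks = sum(BLOCKS_PER_MACROBLOCK[chroma_format])
    macroblocks = mb_columns * mb_rows
    return {
        "macroblocks": macroblocks,
        "blocks_per_macroblock": blocks,
        "total_blocks": macroblocks * blocks,
    }


print(macroblock_layout(720, 480))
# {'macroblocks': 1350, 'blocks_per_macroblock': 6, 'total_blocks': 8100}
print(macroblock_layout(720, 576)["macroblocks"])  # 1620
```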

I-frames enter the encoding process directly, without motion compensation. For P-frames and B-frames, motion compensation is performed first. Because adjacent frames are strongly correlated, a macroblock can usually be matched to a similar area at a nearby position in the previous and next frames. The offset to the matching area is recorded as the motion vector, and the prediction residual (the difference between the macroblock and its motion-compensated prediction) is sent on for encoding.

Each 8x8 block is converted from the spatial domain to the frequency domain by a discrete cosine transform (DCT). The resulting transform coefficients are quantized and then reordered, typically in a zigzag scan, to increase the likelihood of long runs of zeros. The reordered coefficients are run-length coded and, finally, Huffman coded.
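Block matching can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). This is one simple matching criterion chosen for illustration; the article does not say which search strategy an encoder uses. Frames are plain lists of rows, a 4x4 block stands in for a macroblock, and all names are hypothetical.

```python
def sad(current, reference, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[cy + y][cx + x] - reference[ry + y][rx + x])
    return total


def find_motion_vector(current, reference, cx, cy, size=4, search=2):
    """Full search in a +/-search window; returns (dx, dy, best_sad)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(current, reference, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


def residual_block(current, reference, cx, cy, dx, dy, size=4):
    """Difference between the block and its motion-compensated prediction."""
    return [[current[cy + y][cx + x] - reference[cy + dy + y][cx + dx + x]
             for x in range(size)] for y in range(size)]


# Synthetic frames: the current frame is the reference moved
# 1 pixel right and 2 pixels down (uncovered pixels set to 0).
SIZE = 12
reference = [[x * 7 + y * 13 for x in range(SIZE)] for y in range(SIZE)]
current = [[reference[y - 2][x - 1] if y >= 2 and x >= 1 else 0
            for x in range(SIZE)] for y in range(SIZE)]

dx, dy, cost = find_motion_vector(current, reference, 4, 4)
print(dx, dy, cost)  # -1 -2 0
```

With a perfect match the residual is all zeros, so only the motion vector carries information for that block.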
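The transform, quantization, reordering, and run-length steps can be sketched as follows. Several simplifications are assumptions, not MPEG-2 behavior: a single uniform quantizer step replaces the quantization matrices, a zigzag scan is used for reordering, the DC coefficient is treated like any other, and the final Huffman stage is omitted. All names are hypothetical.

```python
import math

N = 8


def dct_8x8(block):
    """Two-dimensional DCT-II of an 8x8 block (direct, unoptimized form)."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * scale(u) * scale(v) * total
    return out


def quantize(coefficients, step):
    """Uniform quantization with one step size (a simplification)."""
    return [[int(round(c / step)) for c in row] for row in coefficients]


# Zigzag scan order as (row, column) pairs: walk the anti-diagonals,
# alternating direction on each one.
ZIGZAG = sorted(
    ((r, c) for r in range(N) for c in range(N)),
    key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 == 0 else p[0]),
)


def run_length(levels):
    """Encode as (zero_run, level) pairs followed by an end-of-block marker."""
    pairs = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the marker
    return pairs


def encode_block(block, step=16):
    quantized = quantize(dct_8x8(block), step)
    scanned = [quantized[r][c] for r, c in ZIGZAG]
    return run_length(scanned)


flat = [[128] * N for _ in range(N)]
print(encode_block(flat))  # [(0, 64), 'EOB']
```

A uniform block collapses to a single coefficient, which shows why smooth image areas compress so well under this scheme.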

I-frame coding is aimed at reducing spatial domain redundancy, while P-frames and B-frames are aimed at reducing temporal domain redundancy.

A GOP consists of a fixed pattern of I-frames, P-frames, and B-frames. A commonly used structure comprises 15 frames in the form IBBPBBPBBPBBPBB. The selection of the proportions of frames in the GOP is related to bandwidth and image quality requirements. For instance, since the compression time for B-frames may be three times that of I-frames, certain real-time systems with limited computational power may need to reduce the proportion of B-frames.
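A GOP pattern such as the one above can be generated from two numbers. The parameter names follow a common convention that the article does not use: n is the number of frames in the GOP and m is the spacing between consecutive I- or P-frames. The function name is hypothetical.

```python
from collections import Counter


def gop_pattern(n, m):
    """Build a display-order GOP pattern of n frames with anchors every m frames."""
    frames = []
    for index in range(n):
        if index == 0:
            frames.append("I")
        elif index % m == 0:
            frames.append("P")
        else:
            frames.append("B")
    return "".join(frames)


pattern = gop_pattern(15, 3)
print(pattern)                 # IBBPBBPBBPBBPBB
print(dict(Counter(pattern)))  # {'I': 1, 'B': 10, 'P': 4}
print(gop_pattern(12, 1))      # IPPPPPPPPPPP
```

Setting m to 1 removes B-frames entirely, which is the kind of trade-off a real-time system with limited computational power might make.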

The output bitstream of MPEG-2 can be either constant bit rate (CBR) or variable bit rate (VBR). The maximum bit rate can reach 10.4 Mbit/s, as in DVD applications. To produce a constant bit rate, the encoder must continuously adjust the quantization scale. Increasing the quantization scale, however, may result in visible distortion, such as blocking (a mosaic effect).
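The feedback between bit rate and quantization scale can be sketched with a toy model. Everything here is an assumption made for illustration: the bits produced by a frame are modeled as complexity divided by the scale, the update is a simple proportional rule, and the scale is limited to the range 1 to 31. It is not the rate control of any real encoder, and the names are hypothetical.

```python
def next_quantizer_scale(scale, produced_bits, target_bits,
                         min_scale=1, max_scale=31):
    """Raise the scale after an overshoot, lower it after an undershoot."""
    adjusted = round(scale * produced_bits / target_bits)
    return max(min_scale, min(max_scale, adjusted))


def simulate(complexities, target_bits, scale=8):
    """Return (scale used, bits produced) per frame under the toy model."""
    history = []
    for complexity in complexities:
        produced = complexity / scale  # toy model: bits fall as scale rises
        history.append((scale, round(produced)))
        scale = next_quantizer_scale(scale, produced, target_bits)
    return history


# 6 Mbit/s at 25 frames/second leaves 240000 bits per frame on average.
target = 6_000_000 // 25
for used_scale, bits in simulate([1_920_000, 3_840_000, 3_840_000, 960_000], target):
    print(used_scale, bits)
```

In this model a sudden jump in scene complexity overshoots the target for one frame, after which the scale rises and the output returns toward the target, at the cost of coarser quantization.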

The audio coding in MPEG-2 includes:

  • Using half the sampling rate for low bit rate audio.
  • Multi-channel encoding achieving up to 5.1 channels.
  • Providing MPEG-2 AAC, which is not backward compatible.

DVDs use the MPEG-2 standard with the following additional restrictions on technical parameters:

  • Resolution
    • 720 x 480, 704 x 480, 352 x 480, 352 x 240 pixels (NTSC format)
    • 720 x 576, 704 x 576, 352 x 576, 352 x 288 pixels (PAL format)
  • Aspect Ratio
    • 4:3
    • 16:9
  • Frame Rate (frames per second)
    • 59.94 fields/second, 23.976 frames/second, 29.97 frames/second (NTSC)
    • 50 fields/second, 25 frames/second (PAL)
  • Video + Audio Bit Rate
    • Maximum buffered average 9.8 Mbit/s
    • Peak 15 Mbit/s
    • Minimum 300 kbit/s
  • YUV 4:2:0
  • Subtitle support
  • Closed caption support (NTSC only)
  • Audio
    • LPCM encoding: 48 kHz or 96 kHz; 16- or 24-bit; up to 6 channels
    • MPEG Layer 2 (MP2): 48 kHz, up to 5.1 channels
    • Dolby Digital (DD, also known as AC-3): 48 kHz, 32-448 kbit/s, up to 5.1 channels
    • Digital Theater Systems (DTS): 754 kbit/s or 1510 kbit/s
    • NTSC DVDs must include at least one LPCM or Dolby Digital track
    • PAL DVDs must include at least one MPEG Layer 2, LPCM, or Dolby Digital track
  • GOP Structure
    • Must provide header information for the GOP sequence
    • Maximum number of frames in a GOP: 18 (NTSC) / 15 (PAL)
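Some of the restrictions above can be expressed as a simple check. The sketch covers only resolution, aspect ratio, and GOP length, using the values from the list; the function and constant names are hypothetical, and this is not a complete DVD compliance test.

```python
DVD_RESOLUTIONS = {
    "NTSC": {(720, 480), (704, 480), (352, 480), (352, 240)},
    "PAL": {(720, 576), (704, 576), (352, 576), (352, 288)},
}
DVD_ASPECT_RATIOS = {"4:3", "16:9"}
DVD_MAX_GOP_FRAMES = {"NTSC": 18, "PAL": 15}


def check_dvd_video(system, width, height, aspect_ratio, gop_frames):
    """Return a list of violated restrictions (empty if none are found)."""
    if system not in DVD_RESOLUTIONS:
        return [f"unknown system {system!r}"]
    problems = []
    if (width, height) not in DVD_RESOLUTIONS[system]:
        problems.append(f"resolution {width}x{height} not allowed for {system}")
    if aspect_ratio not in DVD_ASPECT_RATIOS:
        problems.append(f"aspect ratio {aspect_ratio} not allowed")
    if gop_frames > DVD_MAX_GOP_FRAMES[system]:
        problems.append(
            f"GOP of {gop_frames} frames exceeds {DVD_MAX_GOP_FRAMES[system]} for {system}"
        )
    return problems


print(check_dvd_video("NTSC", 720, 480, "16:9", 15))  # []
print(check_dvd_video("PAL", 720, 480, "16:9", 18))
```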

MPEG-2 in ATSC must meet one of the following resolutions:

  • 1920 × 1080 pixels, up to 60 fields/second (1080i)
  • 1280 × 720 pixels, up to 60 frames/second (720p)
  • 720 × 576 pixels, up to 50 fields/second or 25 frames/second (576i, 576p)
  • 720 × 480 pixels, up to 60 fields/second or 30 frames/second (480i, 480p)
  • 640 × 480 pixels, up to 60 frames/second

Note: 1080i is encoded as 1920 × 1088 pixel frames, but the last 8 lines are discarded before display.

The content of this article is excerpted from Wikipedia.
