H.264
H.264, also known as MPEG-4 Part 10, is a highly compressed digital video coding standard developed by the Joint Video Team (JVT), a partnership of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG).
The ITU-T H.264 standard and ISO/IEC MPEG-4 Part 10 (formally ISO/IEC 14496-10) are technically identical, and the coding technique is also referred to as AVC, for Advanced Video Coding. The final draft of the first edition of the standard was completed in May 2003.
H.264 is the name used in the ITU-T H.26x series of standards, while AVC is the name used by ISO/IEC MPEG. The standard is commonly referred to as H.264/AVC (or AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC) to make its dual parentage clear. It originated from an ITU-T project known as H.26L; although the name H.26L is less common, it is still sometimes used.
Sometimes the standard is also referred to as the "JVT codec" because it was developed by the JVT. (Joint development of a single standard by the two organizations is not unprecedented: the earlier video coding standard MPEG-2 was likewise developed jointly by MPEG and ITU-T, which is why MPEG-2 is named H.262 in the ITU-T naming scheme.)
The initial goal of the H.264/AVC project was a codec that could provide good video quality at bit rates substantially lower than those required by previous standards (such as MPEG-2 or H.263), half or less, without introducing coding tools so complex that hardware implementation would become impractical. Another goal was adaptability: the codec should work across a wide range of scenarios (both high and low bit rates, and different video resolutions) and function on a variety of networks and systems (such as broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems).
Recently, the JVT completed an extension to the original standard known as the Fidelity Range Extensions (FRExt). The extension supports higher-fidelity video coding through greater sample bit depths (including 10-bit and 12-bit samples) and higher-resolution chroma sampling (including YUV 4:2:2 and YUV 4:4:4). It also introduced several new features, such as adaptive switching between 4×4 and 8×8 integer transforms, user-defined quantization weighting matrices, efficient inter-picture lossless coding, and support for additional color spaces and chroma sub-sampling conversions. The design work on the extension was completed in July 2004, and the draft was finished in September 2004. Since the first version of the standard was completed in May 2003, the JVT has finished one round of corrections, and a further round has recently been completed and approved by ITU-T, with MPEG approval expected to follow.
H.264/AVC includes a range of new features that make it more efficient for encoding compared to previous codecs and adaptable for various network applications. These new features include:
- Multi-reference frame motion compensation. Compared to previous video coding standards, H.264/AVC uses more previously encoded frames as reference frames, and in a more flexible manner: in some cases, up to 32 reference frames can be used (previous standards typically allowed only one reference frame, or two for B-frames). This feature yields modest bit rate reductions or quality improvements for most scene sequences, and it can significantly reduce the bit rate for certain kinds of content, such as rapidly repeating flashes, repeated cuts, or background occlusions.
- Variable block-size motion compensation. Block sizes from a maximum of 16×16 down to a minimum of 4×4 can be used for motion estimation and compensation, allowing more precise segmentation of moving regions in an image sequence. The supported sizes are 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4.
- To reduce aliasing and achieve sharper images, a six-tap filter (sixth-order digital filter) is used to generate predictions for the luminance component at half pixel positions.
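The filtering step above can be sketched in one dimension as follows. The six-tap coefficients (1, -5, 20, 20, -5, 1) and the rounding-and-clip step are taken from the standard's half-pel luma interpolation; the function name and sample values are illustrative.

```python
def half_pel(samples):
    """Interpolate the half-pixel value between samples[2] and samples[3]
    using H.264's six-tap filter with coefficients (1, -5, 20, 20, -5, 1)."""
    e, f, g, h, i, j = samples
    val = e - 5 * f + 20 * g + 20 * h - 5 * i + j
    # Round (add half the divisor), divide by 32, and clip to the 8-bit range.
    return max(0, min(255, (val + 16) >> 5))

# Six consecutive integer-position luma samples; the half-pel position
# lies between the middle two (120 and 130).
print(half_pel([100, 110, 120, 130, 140, 150]))
```

Because the middle two taps dominate, the result stays close to the average of the two neighboring samples while the outer taps sharpen the interpolated edge.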
- The macroblock structure allows the use of 16×16 macroblocks in field mode (as opposed to 16×8 in MPEG-2).
- Quarter-pixel precision motion compensation provides more accurate prediction of motion blocks; since chroma is typically sampled at half the luma resolution (see 4:2:0), chroma motion compensation reaches 1/8-pixel precision.
- Weighted prediction, which allows motion-compensated prediction samples to be scaled by a weight and shifted by an offset. This can provide significant coding gains in special cases, such as fade-to-black, fade-in, and cross-fade transitions.
- An in-loop deblocking filter reduces the blocking artifacts common in other video codecs based on the discrete cosine transform (DCT).
- An exact-match integer 4×4 spatial transform (conceptually similar to the discrete cosine transform). The high-fidelity extensions add an integer 8×8 transform, with adaptive selection between the 4×4 and 8×8 sizes.
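A minimal sketch of the forward 4×4 core transform follows. The matrix is the standard's integer core transform; the helper names are illustrative, and the scaling that the standard folds into quantization is omitted, so the output coefficients are unnormalized.

```python
# Forward core matrix of the H.264 4x4 integer transform (an integer
# approximation of the DCT; normalization is folded into quantization).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_transform(block):
    """Y = CF * X * CF^T, computed entirely in integer arithmetic,
    so encoder and decoder can match bit-exactly."""
    ct = [[CF[j][i] for j in range(4)] for i in range(4)]  # transpose of CF
    return matmul(matmul(CF, block), ct)

# A flat (constant) 4x4 residual block: all energy lands in the DC term.
flat = [[10] * 4 for _ in range(4)]
print(forward_transform(flat)[0][0])  # unnormalized DC coefficient
```

Because every entry of the matrix is a small integer, the transform can be implemented with only additions and bit shifts, and the inverse reconstructs the input exactly, which is what "exact-match" refers to.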
- After the primary 4×4 transform, a secondary Hadamard transform is applied to the DC coefficients (always for the chroma DC coefficients, and for the luma DC coefficients in the special Intra_16×16 macroblock mode) to improve compression in smooth areas.
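The secondary DC transform can be sketched like this. The 4×4 Hadamard matrix is the one applied to the 16 luma DC coefficients of an Intra_16×16 macroblock; helper names are illustrative, and normalization is again left to the quantization stage.

```python
# 4x4 Hadamard matrix: entries are +/-1, so no multiplications are needed.
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]

def hadamard4x4(dc):
    """Y = H * DC * H (H is symmetric), computed in integer arithmetic."""
    def mul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]
    return mul(mul(H, dc), H)

# In a smooth area the 16 per-block DC values are nearly equal; the
# Hadamard transform concentrates their energy into a single coefficient.
smooth = [[100] * 4 for _ in range(4)]
out = hadamard4x4(smooth)
print(out[0][0], out[1][1])
```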
- Spatial intra prediction from the edge pixels of neighboring blocks (more effective than the DC-coefficient prediction used in MPEG-2 video or the transform-coefficient prediction used in H.263+ and MPEG-4 video).
- Context-adaptive binary arithmetic coding (CABAC), which enables more efficient lossless entropy coding of various syntax elements when the respective context probability distributions are known.
- Context-adaptive variable-length coding (CAVLC), used for coding quantized transform coefficients. Its complexity is lower than CABAC's; although it compresses less effectively, it is still markedly better than the entropy coding schemes used in earlier video coding standards.
- For syntax elements that are not coded with CABAC or CAVLC, the Exponential-Golomb (Exp-Golomb) entropy coding scheme is used.
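The unsigned Exp-Golomb code ue(v) mentioned above has a simple structure: a prefix of leading zeros followed by the binary representation of v+1, with the prefix length equal to the number of binary digits minus one. A sketch (function names illustrative):

```python
def ue_encode(v):
    """Unsigned Exp-Golomb: v -> (len-1 leading zeros) + binary of v+1."""
    bits = bin(v + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def ue_decode(bitstring):
    """Decode one ue(v) codeword from the front of a bit string:
    count leading zeros, read that many more bits plus one, subtract 1."""
    zeros = 0
    while bitstring[zeros] == "0":
        zeros += 1
    return int(bitstring[zeros:2 * zeros + 1], 2) - 1

for v in range(5):
    print(v, ue_encode(v))  # small values get the shortest codewords
```

The code is self-synchronizing per codeword and assigns shorter codes to smaller values, which suits syntax elements whose small values are most frequent.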
- Using a Network Abstraction Layer (NAL) allows the same video syntax to apply across multiple network environments; it also employs Sequence Parameter Sets (SPSs) and Picture Parameter Sets (PPSs) to enhance robustness and flexibility.
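Each NAL unit begins with a one-byte header packing three fields. The following sketch decodes them; the field names follow the standard, while the function itself is illustrative.

```python
def parse_nal_header(byte):
    """Split the one-byte H.264 NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x1,  # must be 0 in a valid stream
        "nal_ref_idc": (byte >> 5) & 0x3,         # importance as a reference
        "nal_unit_type": byte & 0x1F,             # 5 = IDR slice, 7 = SPS, 8 = PPS
    }

# 0x67 is the typical first byte of a sequence parameter set (SPS).
print(parse_nal_header(0x67))
```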
- Switching slices (SP and SI slices), which let an encoder direct a decoder to jump into an ongoing video bitstream, for example to switch between streams encoded at different bit rates or to support trick-mode operation. When a decoder switches into a stream mid-stream using SP/SI slices, it obtains an exactly matching decoded picture, unless subsequently decoded frames reference pictures from before the switching point.
- Flexible macroblock ordering (FMO) and arbitrary slice ordering (ASO) are used to change the coding order of macroblocks, the most basic unit of image coding. This can improve robustness in variable channel conditions and serve other purposes.
- Data partitioning (DP), which can separate and package syntax elements of varying importance for transmission, using unequal error protection (UEP) techniques to improve the robustness of the video stream against channel errors and packet loss.
- Redundant slices (RS), another technique for enhancing bitstream robustness. The encoder can send an additional encoded representation of a region (or of the entire picture), usually at lower fidelity, so that if the primary representation is corrupted or lost, the redundant representation can be decoded instead.
- A byte-stream packing mechanism that prevents codewords in the bitstream from duplicating the start code, the codeword used for random access and resynchronization.
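This packing is known in the standard as start-code emulation prevention: the encoder inserts an extra 0x03 byte wherever two zero bytes would otherwise be followed by a byte value of 3 or less, so the pattern 0x00 0x00 0x01 can only ever appear as a genuine start code. A sketch (function name illustrative):

```python
def add_emulation_prevention(payload):
    """Insert an emulation_prevention_three_byte (0x03) whenever the raw
    payload contains 0x00 0x00 followed by a byte <= 0x03, so the output
    can never contain a start-code prefix (0x00 0x00 0x01)."""
    out = bytearray()
    zeros = 0  # count of consecutive zero bytes already emitted
    for b in payload:
        if zeros == 2 and b <= 0x03:
            out.append(0x03)  # break up the would-be start code
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)

# A payload that accidentally contains a start-code pattern:
print(add_emulation_prevention(bytes([0x00, 0x00, 0x01, 0x42])).hex())
```

The decoder reverses the process by discarding any 0x03 that follows two zero bytes.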
- Supplemental Enhancement Information (SEI) and Video Usability Information (VUI) provide methods for adding information to the video stream for various applications.
- Auxiliary pictures can be used for special functions, such as alpha compositing.
- Frame numbering, which supports the creation of subsequences within a video sequence (enabling temporal scalability) and the detection and concealment of lost frames (whether caused by network packet loss or channel errors).
- Picture order count, which keeps the ordering of pictures independent of both decoded pixel values and timing information, so that timing can be carried, controlled, and changed by a separate system without affecting the pixel values of decoded pictures.
The combination of these technologies with others has enabled H.264 to significantly improve performance compared to previous video codecs and achieve wider applications across various environments. H.264 provides a substantial increase in compression efficiency over MPEG-2, allowing for bit rates to be reduced to half or less at the same image quality.
Like other MPEG video standards, H.264/AVC has a reference software implementation that can be downloaded free of charge. Its primary purpose is to demonstrate the standard's features rather than to serve as a production implementation. MPEG is also working on reference hardware designs.
Manufacturers and service providers whose products use H.264/AVC must pay licensing fees to the patent holders, as with MPEG-2 Parts 1 and 2 and MPEG-4 Part 2. The main source of these licenses is a private organization called MPEG LA, LLC, which has no relation to the MPEG standardization body but also administers patent pools for MPEG-2 Part 1 systems, MPEG-2 Part 2 video, MPEG-4 Part 2 video, and several other technologies.
Additional patent licenses must be obtained from another private organization, Via Licensing, which also administers patent pools for audio compression standards such as MPEG-2 AAC and MPEG-4 Audio.
The content of this article is sourced from Wikipedia.