                        DIVX3 / MS-MPEG4v1-v3 / WMV7-8
                           Version: 0.07 2003-01-13

Copyright (c) 2003 Michael Niedermayer
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".

Version numbers used in this doc:
1       MS MPEG4v1
2       MS MPEG4v2
3       MS MPEG4v3 / divx3
4       Windows Media Video 7 / WMV7 / WMV1
5       Windows Media Video 8 / WMV8 / WMV2

latest version is available at:
http://www.mplayerhq.hu/~michael/msmpeg4.txt

Introduction:
MS MPEG4/DIVX3/WMV will be referred to by using MSMPEG4 throughout this
document. It is assumed that the reader is familiar with video coding concepts
including VLCs, run-length encoding, DCTs, quantization, motion compensation,
frame types, etc. (FIXME some links to MPEG video intros)

MSMPEG4 up to version 3 is pretty much ISO-MPEG4 with most advanced features
removed, and different VLC tables. WMV1 additionally uses different scantables.
WMV2 furthermore uses 8x4 and 4x8 DCTs in addition to the 8x8 DCT for P Frames,
and supports horizontal quarterpel motion compensation.

Note: The WMV8=WMV2 info is not complete yet.

===============================================================================
Terms and Definitions:
ABT             Adaptive block transform
AC              Any DCT coefficient for which the frequency in one or both
                dimensions is non-zero.
CBP             Coded block pattern
DC              The DCT coefficient for which the frequency is zero in both
                dimensions.
DCT             Discrete cosine transform
GOP             Group of pictures (starts with an I Frame)
IDCT            Inverse discrete cosine transform
MB              Macroblock
MPEG            Moving Picture Experts Group
MS              Microsoft
MV              Motion Vector
MC              Motion Compensation
Slice           for MSMPEG4: consecutive macroblock rows, so a slice is a
                rectangle of macroblocks which always touches the left and
                right edge of the image
VLC             Variable length code
WMV             Windows Media Video
median(a,b,c)   a+b+c - MAX(a,b,c) - MIN(a,b,c) (= the middle value of 3)

===============================================================================
High Level Description:

MSMPEG4 supports the following frame types:
  I Frames (these are decoded without using any previous frames)
  P Frames (these use the last decoded frame for prediction)
  J Frames (they are like I Frames but use a different encoding, they only
            exist in WMV2)
Note: None of the MSMPEG4 versions support B frames

---------------
Colorspace / Output Format:
YV12 / planar 4:2:0 YCbCr (like MPEG1/2/4)

---------------
Macroblocks:
Each macroblock consists of 16x16 luma samples and 8x8 samples for each of the
two chroma planes (Cb and Cr). There are intra & inter type macroblocks.
While MSMPEG4 splits a MB into 4 luma 8x8 blocks & 2 chroma 8x8 blocks, WMV2
also splits some 8x8 blocks into two 8x4 or 4x8 blocks, but only in P frames.
Note: J Frames do not use Macroblocks.

The coded order of 8x8 blocks in a MB is luma then Cb then Cr.
The luma blocks are organized as:
  Y->Y
    /
   /
  Y->Y
If an 8x8 block is split into subblocks then the left or top subblock is
transmitted first.

---------------
Slices:
MSMPEG4 frames can be split into slices, but only at the start of a MB row.
All slices in all pictures in a GOP have the same height (except the last
slice in a picture, which may be shorter). Normally the height stays the same
throughout the video, and WMV2 cannot even change it between GOPs, as the
number of slices is encoded in a global header for WMV2.
The slice height is encoded directly in MSMPEG4 v1.
For the later versions the number of slices is encoded instead.
Vectors for motion vector prediction across a slice boundary are not
available, like for the top edge.
AC, DC, and coded values are not available for prediction across a slice
boundary in versions 1-3.
Note: There is no special marker or header at the slice start or end.

---------------
Motion Compensation:
MSMPEG4 motion compensation is macroblock-based like in MPEG1, so there is
exactly one motion vector for every MB in a P frame.
MSMPEG4 supports rounded and not rounded halfpel interpolation like ISO-MPEG4.
WMV2 supports 2 different motion compensation methods:
  1. The ISO-MPEG4 standard halfpel interpolation with 2 rounding modes like in
     the older MSMPEG4 versions
  2. The WMV2 horizontal quarter pel interpolation with only 1 rounding mode

---------------
Block Coding:
MSMPEG4 is identical to ISO-MPEG4 in zigzag / horizontal and vertical alternate
scans with run/level/last VLC, with the exception of WMV1&2 which use different
scantables.

The decoding process for an individual block is:
  bitstream ---vlc decode---> run/level/last
  ---RLE decode with scantable---> 8x8 block of coefficients
  (+dc/ac prediction for intra MBs)
  ---dequantize--IDCT---> residual 8x8 block

For intra MBs the residual block is identical to what is displayed.
For inter MBs the residual block is the difference between what should
actually be displayed and what was predicted by motion compensation.

---------------
DC Prediction For Intra Macroblocks:
DC prediction for intra MBs for MSMPEG4 v1 is identical to MPEG1. The dc is
simply predicted from the "last" block of the same type (left block for
Cb/Cr). The prediction is reset to gray (128) after each macroblock row.
The luma prediction order for MBs follows this pattern:
  Y->Y  Y->Y  Y->Y
    /  /  /  /  /
   /  /  /  /  /
  Y->Y  Y->Y  Y->Y

DC prediction for intra MBs in version 2-3 is nearly identical to ISO-MPEG4.
  A B
  C X
A is the dc value of the top left 8x8 block
B is the dc value of the top 8x8 block
C is the dc value of the left 8x8 block
X is the predicted dc value of the current 8x8 block
Unavailable blocks (left or top edge, or inter blocks) are set to gray, where:
  gray = (1024 + dc_scale/2)/dc_scale

  if(abs(A-C) <= abs(A-B)) X= B
  else                     X= C

Note: The difference between this scheme and ISO-MPEG4 is that the latter uses
"abs(A-C) < abs(A-B)" instead of "abs(A-C) <= abs(A-B)".

DC prediction for intra MBs in version 4-5 is identical to ISO-MPEG4. In other
words, '<=' is replaced by '<'.
However, if inter_intra_pred==1 then the following prediction is used instead:
( Note: inter_intra_pred ==
        (width*height < 320*240 && bit_rate<=128kbit && P Frame); )
For the top left luma block in a macroblock the prediction direction is coded
in the bitstream. The prediction is the average of the 64 samples of the top
or left 8x8 block (even if it is inter). The prediction is gray if it is on an
edge.
The prediction direction of the 2 chroma blocks is also encoded in the
bitstream (note that the bitstream stores just 1 direction for both Cb and Cr
blocks).
The top right and bottom left luma blocks in a MB are predicted by using the
DC of the top left luma block.
The bottom right luma block in a MB uses the same rule as non inter_intra
predicted blocks.

This is the prediction direction for the luma blocks in a MB:
     |
     v
  -->Y->Y
     |  |
     v  v
     Y->Y

---------------
AC Prediction:
AC prediction in MSMPEG4 is identical to ISO-MPEG4.
Intra blocks are AC predicted in the same direction as the DC prediction if
the ac_pred bit is set for the current macroblock.
Left AC prediction: the left column of AC values (=7) in the current 8x8 block
                    is predicted from the left block.
Top AC prediction:  the top row of AC values (=7) in the current 8x8 block is
                    predicted from the top block.

  DAAAAAAA    DAAAAAAA
  A.......    A|||||||
  A.......    A|||||||
  A.......    A|||||||
  A.......    A|||||||
  A.......    A|||||||
  A.......    A|||||||
  A.......    A|||||||
               |||||||
               |||||||
               vvvvvvv
  DAAAAAAA    DAAAAAAA
  A---------->A.......
  A---------->A.......
  A---------->A.......
  A---------->A.......
  A---------->A.......
  A---------->A.......
  A---------->A.......

Unavailable blocks and inter blocks are replaced by all 0.

---------------
Scantables:
Version 1-3: (identical to ISO-MPEG4)
  Inter and intra blocks without AC prediction use the zigzag scantable.
  AC-predicted intra blocks use the alternate horizontal and vertical
  scantables depending upon the prediction direction.
Version 4-5:
  Identical to version 1-3 except that all tables are replaced by different
  ones, and inter blocks and intra blocks without AC prediction use different
  tables.
  Note that in version 5 the 8x4 & 4x8 subblocks use a fixed scantable.

---------------
Luma Coded Block Pattern Prediction For I Frames:
  A B
  C X
A is the coded value of the top left 8x8 luma block
B is the coded value of the top 8x8 luma block
C is the coded value of the left 8x8 luma block
X is the predicted value of the current 8x8 luma block
Unavailable blocks (left & top edge) have coded=0.

  if(A==B) X=C;
  else     X=B;

Note: Neither the chroma CBP, nor the CBP of any MB in a P Frame is predicted.
Note: Coded means that there are non zero AC coefficients in the 8x8 block,
      which are coded in the bitstream.
Note: Due to the AC prediction, it is possible that there are non zero AC
      coefficients which are not coded. In that case, coded=0.

---------------
Motion Vector Prediction:
The motion vector prediction scheme used in v1-4 is identical to the scheme
used in ISO MPEG-4. The scheme used in v5 differs in that the halfpel motion
vector is predicted from the left, top & top right halfpel motion vectors.
If only the left vector is available then it is used as predictor.
If only the top and top right vectors are available then the vector of the
medians of their components and 0 is used as predictor.
If all 3 vectors are available then the vector of the medians of their
components is used as predictor unless all of the following conditions are
true:
  1. version==5
  2. mspel==0
  3. top_left_mv_flag==1
  4. the horizontal or vertical difference of the top & left vectors is >= 8
If these conditions are met, an extra bit is transmitted which selects the
left or top vector as predictor.

Note: median((a0,a1), (b0,b1), (c0,c1)) = (median(a0,b0,c0), median(a1,b1,c1))
      median(a,b,c) = a+b+c - MAX(a,b,c) - MIN(a,b,c)
                      (= the middle value of 3)

---------------
Differential Encoding of Motion Vectors:
For halfpel:
(dmvx, dmvy) is the motion vector difference from the bitstream
(pmvx, pmvy) is the predicted motion vector
(mvx, mvy)   is the motion vector

  mvx= pmvx + dmvx
  if     (mvx<=-64) mvx+= 64;
  else if(mvx>= 64) mvx-= 64;

The same calculations apply for the y components.
So the components of the motion vectors are limited to -63 ... 63 which is
-31.5 ... 31.5 in fullpel units.

For horizontal qpel in WMV2 (mspel==1):
If the halfpel vector does not point to a full pel position then an additional
bit is transmitted which shifts the vector right by 1/4 pixel if the bit is 1.

---------------
Halfpel Motion Compensation:
Halfpel MC in MSMPEG4 is identical to ISO-MPEG4.
F       Fullpel position
H       H or V halfpel position
h       HV halfpel position

  FHF
  HhH
  FHF

  F->H<-F
  |\   /|
  v \ / v
  H  h  H
  ^ / \ ^
  |/   \|
  F->H<-F

  H= (F+F+r)>>1
  h= (F+F+F+F+1+r)>>2
r is 0 or 1 depending upon the rounding mode for the current frame

---------------
WMV2 Horizontal Quarter Pel:
F       Fullpel position
H       H or V halfpel position
h       HV halfpel position
Q,q     quarterpel positions

  FQHQF
  .....
  HqhqH
  .....
  FQHQF

  F-----F->H<-F-----F
  |     |     |     |
  |     |     |     |
  |     |     |     |
  F-----F->H<-F-----F
        |  |  |
        v  v  v
        H  h  H
        ^  ^  ^
        |  |  |
  F-----F->H<-F-----F
  |     |     |     |
  |     |     |     |
  |     |     |     |
  F-----F->H<-F-----F

  F->Q<-H->Q<-F
  H->q<-h->q<-H

H = from F by 4tap              (halfpel samples)
h = from H by 4tap vertically   (halfpel samples)
Q = (F+H+1)>>1                  (quarterpel samples)
q = (H+h+1)>>1                  (quarterpel samples)

4tap = clip((-a + 9b + 9c - d + 8)>>4, 0, 255);

---------------
8x8 IDCT (identical to MPEG1/2/4):

           2  N-1 N-1                  (2x+1)u*pi    (2y+1)v*pi
 f(x,y) =  -  SUM SUM C(u)C(v)F(u,v)cos---------- cos----------
           N  u=0 v=0                      2N            2N

 N=8
 C(u) = 1/sqrt(2) if u==0
        1         if u!=0

---------------
8x4 and 4x8 IDCT:

              2     N-1 M-1                  (2x+1)u*pi    (2y+1)v*pi
 f(x,y) =  -------  SUM SUM C(u)C(v)F(u,v)cos---------- cos----------
           sqrt(MN) u=0 v=0                      2N            2M

 N=8 M=4
 C(u) = 1/sqrt(2) if u==0
        1         if u!=0

---------------
(de)quantization:
For AC coefficients in intra blocks and all coefficients in inter blocks, the
quantization is the same as H263 / ISO-MPEG4.
For the DC coefficient:
  for luma   dequant_dc= quant_dc * luma_scale  [quantizer_scale]
  for chroma dequant_dc= quant_dc * chroma_scale[quantizer_scale]

For version 1-2:
  both luma_scale & chroma_scale are constant 8 (like ISO-MPEG1)
For version 3 (like ISO-MPEG4):
  luma_scale[32]=  { 0, 8, 8, 8, 8,10,12,14,16,17,18,19,20,21,22,23,
                    24,25,26,27,28,29,30,31,32,34,36,38,40,42,44,46};
  chroma_scale[32]={ 0, 8, 8, 8, 8, 9, 9,10,10,11,11,12,12,13,13,14,
                    14,15,15,16,16,17,17,18,18,19,20,21,22,23,24,25};
For version 4-5:
  luma_scale[32]=  { 0, 8, 8, 8, 8, 8, 9, 9,10,10,11,11,12,12,13,13,
                    14,14,15,15,16,16,17,17,18,18,19,19,20,20,21,21};
  chroma_scale[32]={ 0, 8, 8, 8, 8, 9, 9,10,10,11,11,12,12,13,13,14,
                    14,15,15,16,16,17,17,18,18,19,19,20,20,21,21,22};

===============================================================================
Bitstream:

Name                                    type    Values          Version Default
-----------------
Header:
  Startcode                             u(24)   0x100           1
  Frame number                          u(5)                    1
  Picture type                          u(2)    00 I-Frame      1234
                                                01 P-Frame      1234
                                        u(1)    0  I-Frame      5
                                                1  P-Frame      5
  if(Picture type == I-Frame)
    code                                u(7)                    5
  Quantizer scale                       u(5)    1-31            12345
  if(Picture type == I-Frame){
    slice code                          u(5)                    1234
    ext_header                                                  4
    if(j_type_bit)
      j_type                            u(1)                    5       0
    if(j_type==0){
      if(per_mb_rl_bit)
        per_mb_rl_table                 u(1)                    45
      else
        per_mb_rl_table                 NC      0               12345
      if(!per_mb_rl_table){
        rl chroma_table_index           c3                      345     2
        rl table_index                  c3                      345     2
      }
      dc_table_index                    u(1)                    345     0
    }
  }else{
    wmv2_skip                                                   5
    use_mb_skip_code                    NC      1               1
                                        u(1)                    234
                                        NC      0               5
    if(quantizer scale<=10)
      cbp_index                         c3      0  0            5       3
                                                10 2            5       3
                                                11 1            5       3
    else if(quantizer scale<=20)
      cbp_index                         c3      0  1            5       3
                                                10 0            5       3
                                                11 2            5       3
    else
      cbp_index                         c3      0  2            5       3
                                                10 1            5       3
                                                11 0            5       3
    if(mspel_bit)
      mspel                             u(1)                    5       0
    if(abt_flag){
      per_mb_abt                        u(1)^1                  5       0
      if(!per_mb_abt)
        abt_type                        c3                      5
    }
    if(per_mb_rl_bit)
      per_mb_rl_table                   u(1)                    45
    else
      per_mb_rl_table                   NC      0               12345
    if(!per_mb_rl_table){
      rl chroma_table_index             c3                      345     2
      rl table_index = rl chroma_table_index
    }
    dc_table_index                      u(1)                    345     0
    mv_table_index                      u(1)                    345     0
  }

The frame number is the frame number modulo 31 (skipped frames are counted
too).
The slice code is the height of each slice in MSMPEG4 v1. In later versions,
it is the number of slices + 0x16.
j_type_bit:       indicates that the j_type flag is coded
j_type:           is 1 if the frame is a J Frame,
                     0 if it is an I Frame
per_mb_rl_bit:    is 1 if the ac vlc table is selected per Macroblock
use_mb_skip_code: is 1 if the skip bit is coded for each MB

----------------
Ext Header:
ext_header:
  fps                                   u(5)                    12345
  bitrate (in kbit)                     u(11)                   12345
  flipflop_rounding                     NC      0               12
                                        u(1)                    34
                                        NC      1               5
  mspel_bit                             u(1)                    5       0
  ?                                     u(1)                    5       ?
  abt_flag                              u(1)                    5       0
  j_type_bit                            u(1)                    5       0
  top_left_mv_flag                      u(1)                    5       0
  per_mb_rl_bit                         NC      0               123
                                        NC      bit_rate>50kbit 4
                                        u(1)                    5
  slice_code                            u(3)    >0              5

FPS is truncated to an integer value (e.g., 30.9 -> 30).
The slice_code is the number of slices (note no + 0x16 here).
The ext header is at the end of each I Frame for version 1-3, part of the main
I frame header for version 4, and in the global ASF/AVI header for version 5.

---------------
3rd Level AC ESC Coding, Versions 1-3:
esc3:
  last                                  u(1)                    123
  run                                   u(6)                    123
  level                                 s(8)                    123

----------------
3rd Level AC ESC Coding, Versions 4-5:
esc3:
  last                                  u(1)                    45
  if(first_esc3_in_frame){
    if(quantizer_scale<8){
      level_length                      u(3)                    45
      if(level_length==0){
        level_length                    u(1)+8                  45
      }
    }else{
      level_length                      1       2               45
                                        01      3               45
                                        ...
                                        000001  7               45
                                        000000  8               45
    }
    run_length                          u(2)+3                  45
  }
  run                                   u(run_length)           45
  sign                                  u(1)                    45
  level                                 u(level_length)         45

---------------
WMV2 MB Skip Bitstream:
wmv2_skip:
  skip_type                             u(2)                    5
  if(skip_type==0){ // no macroblock is skipped
    for(mb_y=0; mb_y < mb_height; mb_y++){
      for(mb_x=0; mb_x < mb_width; mb_x++){
        skip[mb_y][mb_x]                NC      0               5
      }
    }
  }else if(skip_type==1){ // 1 skip bit for every macroblock
    for(mb_y=0; mb_y < mb_height; mb_y++){
      for(mb_x=0; mb_x < mb_width; mb_x++){
        skip[mb_y][mb_x]                u(1)                    5
      }
    }
  }else if(skip_type==2){
    for(mb_y=0; mb_y < mb_height; mb_y++){
      skip_row[mb_y]                    u(1)                    5
      if(!skip_row[mb_y]){
        for(mb_x=0; mb_x < mb_width; mb_x++){
          skip[mb_y][mb_x]              u(1)                    5       1
        }
      }
    }
  }else{
    for(mb_x=0; mb_x < mb_width; mb_x++){
      skip_column[mb_x]                 u(1)                    5
      if(!skip_column[mb_x]){
        for(mb_y=0; mb_y < mb_height; mb_y++){
          skip[mb_y][mb_x]              u(1)                    5       1
        }
      }
    }
  }

---------------
Macroblock Bitstream, Versions 1-2:
mb:
  if(P Frame){
    if(use_mb_skip_code)
      skip[mb_y][mb_x]                  u(1)                    1234    0
    code                                v1_inter_cbpc           1
    code                                v2_mb_type              2
    intra_mb= code>>2
  }else{
    code                                v1_intra_cbpc           1
    code                                v2_intra_cbpc           2
    intra_mb= 1
  }
  if(intra_mb)
    ac_pred                             u(1)                    2       0
  cbpy                                  cbpy_tab                12
  cbp= (code & 0x3) | (cbpy<<2)
  if(intra_mb){
    if(P Frame && version==1) cbp^= 0x3C;
  }else{
    if(version==1 || (cbp&3) != 3) cbp^= 0x3C;
    mv
  }
  for(i=0; i<6; i++){
    block(i, (cbp>>(5-i))&1)
  }

---------------
Macroblock Bitstream, Versions 3-5:
mb:
  if(P Frame){
    // Note: for version 5 skip is stored directly after the main header
    if(use_mb_skip_code)
      skip[mb_y][mb_x]                  u(1)                    1234
    else
      skip[mb_y][mb_x]                  NC      0               1234
    code                                table_mb_non_intra      34
    code                                wmv2_inter_table[cbp_index] 5
    intra_mb= (~code&0x40)>>6
    cbp= code &0x3F
  }else{
    code                                table_mb_intra          345
    intra_mb= 1
    cbp=0
    for(i=0;i<6;i++) {
      int val = ((code >> (5 - i)) & 1);
      if (i < 4) {
        val = val ^ coded_block_pred(i);
      }
      cbp |= val << (5 - i);
    }
  }
  if(intra_mb){
    ac_pred                             u(1)                    345
    if(inter_intra_pred)
      pred_direction                    table_inter_intra       45
  }else{
    if(!mspel && top_left_mv_flag && mb_x && mb_y && top_left_mv_diff >=8)
      mv_prediction_direction           u(1)                    5       2
  }
  if(per_mb_rl_table && cbp){
    rl_table_index                      c3                      45
    chroma_rl_table_index= rl_table_index
  }
  if(!intra_mb){
    if(abt_flag && per_mb_abt && cbp){
      per_block_abt                     u(1)                    5       0
      if(!per_block_abt)
        abt_type                        c3                      5       0
    }else
      per_block_abt=0
    mv                                                          345
  }
  for(i=0; i<6; i++){
    block(i, (cbp>>(5-i))&1)
  }

mv_prediction_direction:
  0-> predict from left MB
  1-> predict from top MB
  2-> predict from median

table_inter_intra[4][2]={
  {0,1} /*Luma-Left Chroma-Left*/,
  {2,2} /*Luma-Top  Chroma-Left*/,
  {6,3} /*Luma-Left Chroma-Top */,
  {7,3} /*Luma-Top  Chroma-Top */
};

---------------
Block Bitstream:
block(i, coded):
  if(intra_mb)
    subblock(i, coded)
  else if(coded){
    if(per_block_abt)
      abt_type                          c3                      5       0
    if(abt_type)
      sub_cbp                           0       2               5       1
                                        10      3               5       1
                                        11      1               5       1
    if(sub_cbp&1) subblock(i, 1)
    if(sub_cbp&2) subblock(i, 1)
  }

Note: For versions 1-4 subblocks == blocks.
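
As a rough illustration of the block syntax above, the following C sketch
decodes a c3 code (0 -> 0, 10 -> 1, 11 -> 2) and maps it to sub_cbp as listed
in the table (0 -> 2, 10 -> 3, 11 -> 1). The BitReader type and the helper
names get_bit, decode_c3 and decode_sub_cbp are hypothetical and exist only for
this example; a real decoder would use its own bitstream reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit reader (illustration only). */
typedef struct {
    const uint8_t *buf;   /* bitstream bytes */
    size_t size_bits;     /* number of valid bits in buf */
    size_t pos;           /* index of the next bit to read */
} BitReader;

/* Read one bit, most significant bit of each byte first. */
static int get_bit(BitReader *br)
{
    assert(br->pos < br->size_bits);
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* c3 data type: "0" -> 0, "10" -> 1, "11" -> 2. */
static int decode_c3(BitReader *br)
{
    if (!get_bit(br))
        return 0;
    return get_bit(br) + 1;
}

/* sub_cbp of a split block: "0" -> 2, "10" -> 3, "11" -> 1.
 * Bit 0 set: the first subblock is coded; bit 1 set: the second one is. */
static int decode_sub_cbp(BitReader *br)
{
    static const int sub_cbp_map[3] = { 2, 3, 1 };
    return sub_cbp_map[decode_c3(br)];
}
```

For example, the byte 0x58 (bits 0 10 11 000) would decode to the sub_cbp
values 2, 3 and 1 in turn, consuming 5 bits.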

---------------
Subblock Bitstream (mostly identical to ISO-MPEG4 for version 2-5):
subblock(i, coded):
  if(intra_mb){
    dc
    if(i<4) // luma
      rl= rl_table[rl_table_index]
    else    // chroma
      rl= rl_table[3 + chroma_rl_table_index]
  }else{
    rl= rl_table[3 + rl_table_index]
    run_diff=1                                                  345     0
  }
  last= !coded
  while(!last){
    code                                rl                      12345
    if(code==rl->n){ // Escape
      escape                            1-2     1  1            2345    3
                                                01 2
                                                00 3
      if(escape==1){
        code                            rl                      2345
        sign                            u(1)                    2345
        run=   rl->run[code]
        last=  rl->last[code]
        level= rl->level[code] + rl->max_level[last][run]
      }else if(escape==2){
        code                            rl                      2345
        sign                            u(1)                    2345
        level= rl->level[code]
        last=  rl->last[code]
        run=   rl->run[code] + rl->max_run[last][level] + run_diff
      }else
        esc3
    }else{
      sign                              u(1)                    12345
      level= rl->level[code]
      last=  rl->last[code]
      run=   rl->run[code]
    }
  }

---------------
Motion Vector Bitstream, Versions 1-2 (identical to ISO-MPEG4 except for the
limited range):
mv:
  mv_component(dmvx)
  mv_component(dmvy)

mv_component(dmv):
  code                                  mv2_vlc                 12
  if(code){
    sign                                u(1)                    12
    if(sign) dmv= -code
    else     dmv=  code
  }else
    dmv= 0

---------------
Motion Vector Bitstream, Versions 3-5:
mv:
  code                                  mv_vlc[mv_table_index]  345
  if(code==1099){
    dmvx                                u(6)-32                 345
    dmvy                                u(6)-32                 345
  }else{
    dmvx = mvx_table[code]
    dmvy = mvy_table[code]
  }
  if(mspel && ((dmvx|dmvy)&1))
    hshift                              u(1)                    5       0

---------------
DC Bitstream, Versions 1-2 (identical to ISO-MPEG4 except that 0 and 1 are
exchanged):
dc:
  len                                   dc2_vlc[isChroma]^-1    12
  if(len!=0){
    level                               u(len)                  12
    if((level >> (len - 1)) == 0) /* if MSB not set it is negative */
      level = - (level ^ ((1 << len) - 1));
    if(len>8)
      marker                            u(1)    1               12
  }else
    level=0

---------------
DC Bitstream, Versions 3-5:
dc:
  if(isLumaBlock)
    level                               luma_dc_vlc[dc_table_index] 345
  else
    level                               chroma_dc_vlc[dc_table_index] 345
  if(level==119)
    level                               u(8)                    345
  if(level!=0)
    sign                                u(1)                    345
  if(sign) level= -level

------------------------------
Data Types:
u(x)    unsigned number encoded in x bits in MSB first order
s(x)    a two's complement signed number encoded in x bits in MSB first order
        = simply the x least significant bits of a variable (on a normal cpu)
c3      value   bitstream
        0       0
        1       10
        2       11
NC      Not coded (always a constant)

------------------------------
MSMPEG4 tables:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ffmpeg/ffmpeg/libavcodec/msmpeg4data.h?rev=HEAD&content-type=text/vnd.viewcvs-markup

Standard ISO-MPEG4 & H263 tables:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ffmpeg/ffmpeg/libavcodec/mpeg4data.h?rev=HEAD&content-type=text/vnd.viewcvs-markup
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ffmpeg/ffmpeg/libavcodec/h263data.h?rev=HEAD&content-type=text/vnd.viewcvs-markup

===============================================================================
Changelog:
0.07    2003-01-13
        cosmetics by Mike Melanson
        minor changes
0.06    2003-01-13 (still non public)
        dc prediction direction fix (variable names were exchanged)
        Grammar, punctuation, spelling, capitalization by Mike Melanson
        formatting, 80 column linewrap by Mike Melanson
        dc & mv prediction fix
        minor changes
0.05    2003-01-10 (non public AFAIK)
        minor changes
        non linear dc quantization
        ext header is also in msmpeg4v1
        finished header bitstream
        mb bitstream
        block & subblock bitstream
0.04    2003-01-10 (non public AFAIK)
        minor changes
        cbp prediction
        ac prediction & scantables
        dc bitstream
        slices & prediction
        more cosmetics by Mike Melanson
0.03    (non public AFAIK)
        spelling fixes & comments & improvements by Mike Melanson
        fixed table links
0.02    (non public AFAIK)
        added defaults to the bitstream syntax instead of lots of not coded
        cases
        dct/idct
        MV prediction
        MV bitstream
        MB skip encoding
        ext header
        ac esc3 encoding
        dc prediction
0.01    2003-01-07
        initial (non public AFAIK)

Note: Entries without a name are written by Michael Niedermayer

===============================================================================
GNU Free Documentation License:
see http://www.gnu.org/licenses/fdl.html