[MPlayer-dev-eng] Cleaning your nuts

Thu Apr 22 17:32:06 CEST 2004

Hi

On Thursday 22 April 2004 04:11, D Richard Felker III wrote:
> OK, there's been some discussion/complaints/objections regarding some
> decisions in nut spec, and imo we need to figure out what the issues
> are and fix them so we can move on and finish it.
>
> Here are Ivan's complaints as I understand them:
> - error recovery sucks now
> - framecode is obfuscated nonsense
>
> In principle I agree with both of these. But all of Ivan's proposals
> to fix them have been bad and contradicted what I see as the main
> goals of nut:
> - correct & strictly ordered timestamps
> - read/write without seeking (including live streaming)
> - minimal filesize
> - simple implementation
>
> That's in order of importance to me (the first two have to be first
> because if you give up on them, filesize overhead can easily be an
> arbitrarily small constant).
>
> I would like to reconcile error recovery with these goals, but quite
> frankly it seems very difficult to do. Personally I'm inclined to
> think that error recovery does not belong in the file, but in the
> transport/storage medium. At the same time, however, it would be nice
> to be able to play incomplete downloads or recover movies from heavily
> damaged media.
>
> My challenge to us is to come up with a system that allows at least a
> minimal level of recovery/resync (without waiting for next syncpoint)
> while not increasing the filesize/overhead beyond that of any other
> container format.
>
> As of Michael's latest draft, there are only two types of frames:
> type2, with startcode to identify them and full vlc timestamps &
> datasize, and type0, with no startcode and predicted timestamp & size.
> Neither has backward-pointers anymore, so the only way to recover
> after an error is to search for a startcode. This means that all type0
> frames until the next type2 frame will be lost. Unless we use lots of
> type2 frames (which bloats the file size), that means losing a big
> chunk of data.
>
> Ivan has pointed out that the bidirectional pointers are better in
> some cases, since you can walk backwards from the next startcode to
> get data after the point of corruption. But this doesn't seem to work
> well with header prediction/compression. Also, checking that
> forward/backward pointers match isn't necessarily so easy, since a
> common form of corruption is a uniform byte value over the damaged
> region. And this sort of recovery requires seeking quite a few times,
> rather than just continuing to read forward.
there are a few more issues with the bidirectional pointers / type 1 frames:
the pointers needed 4 bytes per packet, while the startcode needs 8 bytes. a
simple improvement would be to add the type 1 packets again, but with a 4-byte
startcode instead of pointers. that way the overhead would be smaller than it
ever was (no pointers in type 2 frames) and it would be simpler than the
bidir pointer case. furthermore, as a type 1 packet search would only be used
when frames are actually damaged, the 1 misdetection in 4 GB shouldn't be an
issue.
probably 3-byte startcodes would be enough for these?
lsb timestamps should be enough for these packets too, so i guess the type 1
headers would be 50-70% smaller than type 2
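
to put a number on the "1 misdetection in 4 GB" figure, here is a rough
sketch. it assumes the payload bytes are uniformly random, which real codec
data only approximates, and the function name is made up for illustration:

```python
def expected_false_matches(stream_bytes: int, startcode_bytes: int) -> float:
    """Expected number of accidental startcode matches in random payload data.

    Assumption: every byte is independent and uniformly distributed, so any
    given position matches an n-byte startcode with probability 256**-n.
    """
    positions = max(stream_bytes - startcode_bytes + 1, 0)
    return positions / 256 ** startcode_bytes


GIB = 1 << 30

if __name__ == "__main__":
    # Compare candidate startcode lengths over a 4 GiB scan.
    for n in (2, 3, 4, 8):
        print(n, "byte startcode:", expected_false_matches(4 * GIB, n))
```

under that assumption a 4-byte code gives about one false match per 4 GiB
scanned and a 3-byte code about 256, but the scan only happens over the
damaged region, not the whole file.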

there are several other ideas i have for how error recovery with low overhead
might be possible, but they all have a serious disadvantage:
1. possibility: use the startcodes in the video or audio stream. mpeg4,
mp3, ... have startcodes which could be used for recovery and even for
finding the size of a frame, but this introduces an ugly codec-muxer
dependency and won't work with all codecs ...

2. possibility: use 2-byte startcodes, and escape all occurrences of this
2-byte code in the stream. we wouldn't even need to store the packet size in
this case, but there's the (un)escaping which needs to be done, which wastes
cpu time. we could also use a few bits (~4) of these 2 bytes as flags
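
a minimal sketch of how possibility 2 could look. the byte values (0xFF
marker, 0xE0 | flags as second byte, 0x00 stuffing byte) and all function
names are assumptions picked for illustration, not anything from the nut
draft:

```python
MARK = 0xFF     # hypothetical first byte of the 2-byte startcode
CODE_HI = 0xE0  # hypothetical second byte: 0xE0 | 4 flag bits
STUFF = 0x00    # hypothetical stuffing byte inserted by the escaper


def escape(payload: bytes) -> bytes:
    """Insert STUFF after MARK wherever the payload would mimic a startcode."""
    out = bytearray()
    for i, b in enumerate(payload):
        out.append(b)
        if b == MARK and i + 1 < len(payload):
            nxt = payload[i + 1]
            # Also stuff before a literal STUFF so unescape() stays unambiguous.
            if (nxt & 0xF0) == CODE_HI or nxt == STUFF:
                out.append(STUFF)
    return bytes(out)


def unescape(data: bytes) -> bytes:
    """Drop every STUFF byte that directly follows a MARK byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == MARK and i + 1 < len(data) and data[i + 1] == STUFF:
            i += 1  # skip the stuffing byte
        i += 1
    return bytes(out)


def mux(packets):
    """packets: iterable of (flags, payload). No size field is written."""
    out = bytearray()
    for flags, payload in packets:
        out += bytes([MARK, CODE_HI | (flags & 0x0F)]) + escape(payload)
    return bytes(out)


def demux(stream: bytes):
    """Split purely on startcodes; works from any resync position."""
    starts = [i for i in range(len(stream) - 1)
              if stream[i] == MARK and (stream[i + 1] & 0xF0) == CODE_HI]
    packets = []
    for k, s in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(stream)
        packets.append((stream[s + 1] & 0x0F, unescape(stream[s + 2:end])))
    return packets
```

the per-byte loops in escape() and unescape() are exactly the cpu cost
mentioned above. in exchange, a packet boundary can be found by scanning
forward from any position, with no stored size.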

>
> Finally, I have some proposals of my own. I'm not sure if they're good
> as-is, but I want to discuss them anyway.
>
> 1. Require perfect interleaving.
>
> This means if packet1 comes before packet2, packet1's timestamp is
> less than or equal to that of packet2. Presumably we have an exception
> for out-of-order frames, allowing them to be stored anywhere between
> the surrounding two frames in decode-order.
>
> Rationale: Demuxing AVI is hell because idiots make files with broken
> interleaving or no interleaving. We should stop this before it starts
> by strictly specifying the interleaving, and the only natural choice
> is monotone ordering.
>
> Caveats: Before writing any packet, the muxer must know that no other
> stream will want to write a packet before it. Either the muxer can
> buffer one packet in each stream, or the calling app can just call the
> muxer in the proper order.
further caveat: do we use the timestamp of the first, last or middle sample of
an audio frame?
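
for illustration, the "buffer one packet in each stream" variant is basically
a k-way merge on timestamps. a small sketch, assuming packets are
(timestamp, name) tuples, each stream is already sorted, and the timestamp is
that of the first sample (which is exactly the open question above). it
ignores the out-of-order frame exception:

```python
import heapq


def interleave(*streams):
    """Merge per-stream packet lists into one monotone-timestamp sequence.

    heapq.merge only looks at the head of each input, which corresponds to
    the muxer holding one pending packet per stream.
    """
    return list(heapq.merge(*streams, key=lambda pkt: pkt[0]))


# Hypothetical example data: timestamps in milliseconds.
video = [(0, "v0"), (40, "v1"), (80, "v2")]
audio = [(0, "a0"), (26, "a1"), (52, "a2"), (78, "a3")]
muxed = interleave(video, audio)

if __name__ == "__main__":
    for pts, name in muxed:
        print(pts, name)
```

choosing the last or middle sample instead would shift every audio timestamp
and so could change where audio packets land relative to video.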

>
> 2. Get rid of framecode and replace it with something non-obfuscated.
>
> Only if we can avoid increasing overhead, of course. My original
> proposal was to replace the flags byte (which then became framecode)
> with a bitfield containing streamid and pts delta predictor, and maybe
> some flags too. Each field could be optionally vlc-coded after the
> bitfield byte too, in case of overflow. Leave the size (forward
> pointer) coded in a traditional way (without prediction) so that it
> can't be ruined by past corrupt frames, allowing a slightly better
> degree of recovery.
>
> Rationale: Meets the simplicity goal and perhaps improves error
> recovery.
simplicity, yes,
error recovery IMHO no; it's easily possible to require 50% of the framecodes
to be invalid to detect errors

>
> Caveats: Might increase file size too much.
>
> 3. Store size (forward pointer) with bias.
>
> For audio streams, we may have very small frames. Somewhere around 128
> bytes. 128 is a magic number, because beyond 127 we need two bytes for
> vlc. Michael's solution is to use predictors and just code lsb of
> size, which is probably a good idea. But another idea I had (worth
> thinking about) is to store in the stream header a "base size" for
> each stream (minimum likely size of a packet) and have the forward
> pointer be relative to that. This way we could extend the range of a
> 1byte vlc up to 150 or 200 or something. A bit in the flags could
> indicate "absolute size" for rare packets that are smaller than the
> base size.
>
> Of course, it might be more efficient to just use several bits of the
> flags/bitfield byte for lsb of the size. For example 2 bits for stream
> id, 2 bits for pts predictor, and 3 or 4 bits for size lsb. That would
> allow packets up to 1024 bytes (or 2048) with just one byte spent for
> size.
>
> At the very least, we should make the size coding a little more
> efficient if we don't want to use predictors. For example, 0-byte
> packet is never possible, and neither is 1-byte. So we could always
> add 2 to the size. To allow a range of 2-1025 instead of 0-1023.
maybe i should mention here that for CBR audio there will only be 2 sizes,
like 125 and 126, never another, and for vorbis there are 128 and 1024 sample
packets, their bitstream size also differs similarly
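
the size arithmetic from the quoted proposal can be checked with a small
sketch. it assumes a vlc with 7 payload bits per byte (consistent with "beyond
127 we need two bytes"), and the helper names are invented for illustration:

```python
def vlc_len(value: int) -> int:
    """Bytes needed for a non-negative integer at 7 payload bits per byte."""
    n = 1
    while value >= 128:
        value >>= 7
        n += 1
    return n


def biased_len(size: int, base: int) -> int:
    """Bytes needed when the size is stored relative to a per-stream base size."""
    if size < base:
        raise ValueError("needs the 'absolute size' flag instead")
    return vlc_len(size - base)


def sizes_with_lsb_bits(lsb_bits: int) -> int:
    """Distinct sizes reachable with lsb_bits in the bitfield plus one vlc byte."""
    return 1 << (lsb_bits + 7)


if __name__ == "__main__":
    print(vlc_len(127), vlc_len(128))   # plain vlc around the 128 boundary
    print(biased_len(200, 100))         # same vlc, base size 100
    print(sizes_with_lsb_bits(3), sizes_with_lsb_bits(4))
```

with only two sizes per CBR stream, a predictor or a 1-bit choice between
them would beat all of these, which is the point of the remark above.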

[...]
-- 
Michael
level[i]= get_vlc(); i+=get_vlc();		(violates patent EP0266049)
median(mv[y-1][x], mv[y][x-1], mv[y+1][x+1]);	(violates patent #5,905,535)
buf[i]= qp - buf[i-1];				(violates patent #?)
for more examples, see http://mplayerhq.hu/~michael/patent.html
stop it, see http://petition.eurolinux.org & http://petition.ffii.org/eubsa/en