[MPlayer-dev-eng] [PATCH] Detect "restrict" support
Falk Hueffner
falk.hueffner at student.uni-tuebingen.de
Thu Jul 4 21:32:41 CEST 2002
Arpi <arpi at thot.banki.hu> writes:
> what is that 'restrict' keyword good for?
It tells the compiler that the data a pointer points to cannot be
aliased by another pointer. Take for example this function:
void foo(short *block, const unsigned char *pixels, int line_size) {
    int i;
    for (i = 0; i < 8; i++) {
        block[0] = pixels[0];
        block[1] = pixels[1];
        block[2] = pixels[2];
        block[3] = pixels[3];
        pixels += line_size;
        block += 8;
    }
}
The C standard mandates that char pointers can alias anything, so the
compiler must assume block and pixels may point to the same region. It
will generate code like this (Alpha assembly):
ldbu t4,0(a1)
addl t3,0x1,t3
cmple t3,0x7,v0
stw t4,0(a0)
ldbu t2,1(a1)
stw t2,2(a0)
ldbu t0,2(a1)
stw t0,4(a0)
ldbu t1,3(a1)
addq a1,a2,a1
stw t1,6(a0)
lda a0,16(a0)
bne v0,10
i.e. it won't change the order of the loads (ldbu) and stores
(stw). Every store will then stall 2 cycles till the load is
finished. This loop takes 13 cycles (assuming data is in L1 cache).
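To see why the compiler is not allowed to reorder here, consider what
happens if the two pointers really do overlap. The following is only a
sketch: the function names are made up for illustration, it handles a
single row of four pixels, and it assumes a 2-byte short. One version
keeps loads and stores in source order, the other does all loads
first (which is what reordering amounts to), and on overlapping data
the two give different results:

```c
#include <string.h>

/* Source order: each store is done before the next load. */
static void copy4_in_order(short *block, const unsigned char *pixels)
{
    block[0] = pixels[0];
    block[1] = pixels[1];
    block[2] = pixels[2];
    block[3] = pixels[3];
}

/* All loads first, then all stores, as a reordering compiler would do. */
static void copy4_loads_first(short *block, const unsigned char *pixels)
{
    unsigned char p0 = pixels[0];
    unsigned char p1 = pixels[1];
    unsigned char p2 = pixels[2];
    unsigned char p3 = pixels[3];
    block[0] = p0;
    block[1] = p1;
    block[2] = p2;
    block[3] = p3;
}

/* Returns 1 if the two orderings disagree on overlapping data. */
int overlap_changes_result(void)
{
    short a[8], b[8];
    unsigned char init[sizeof a];
    size_t k;

    /* Fill both buffers with the byte pattern 1, 2, 3, ... */
    for (k = 0; k < sizeof init; k++)
        init[k] = (unsigned char)(k + 1);
    memcpy(a, init, sizeof a);
    memcpy(b, init, sizeof b);

    /* pixels points one byte into the same memory as block. */
    copy4_in_order(a, (const unsigned char *)a + 1);
    copy4_loads_first(b, (const unsigned char *)b + 1);

    return memcmp(a, b, sizeof a) != 0;
}
```

In the in-order version the store to block[1] overwrites the byte that
pixels[2] reads next, so block[2] ends up with a different value than
in the loads-first version.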
However, we know that block and pixels can never actually alias, so we
mark them as restrict. The compiler can then reorder loads and stores:
ldbu v0,0(a1)
ldbu t3,1(a1)
ldbu t4,2(a1)
ldbu t0,3(a1)
addl t5,0x1,t5
addq a1,a2,a1
stw v0,0(a0)
cmple t5,0x7,v0
stw t3,2(a0)
stw t4,4(a0)
stw t0,6(a0)
lda a0,16(a0)
bne v0,10
The stores now no longer stall, the loads can double issue, and the
loop runs in 8 cycles.
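For reference, marking the pointers would look roughly like this. It
is a sketch, not the code from the patch: the name foo_restrict is
made up, and it assumes a compiler that accepts the C99 keyword
"restrict" (gcc also accepts the spelling __restrict outside C99
mode). The body is unchanged, only the declaration differs:

```c
/* Same loop as foo, but both pointers are restrict-qualified, which
 * promises the compiler that block and pixels never overlap. */
void foo_restrict(short *restrict block,
                  const unsigned char *restrict pixels,
                  int line_size)
{
    int i;
    for (i = 0; i < 8; i++) {
        block[0] = pixels[0];
        block[1] = pixels[1];
        block[2] = pixels[2];
        block[3] = pixels[3];
        pixels += line_size;
        block += 8;
    }
}
```

Note that this is a promise by the programmer, not something the
compiler checks. Calling the function with overlapping buffers is
undefined behaviour.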
gcc has not supported this well for a long time, so you might need a
very recent gcc version; it seems 3.1 doesn't do this reordering, but
the CVS head does.
--
Falk