Hehe, that's an awesomely suspicious speed gain!
When I saw that, I immediately suspected compiler optimization, so I compiled your example and looked at the generated code... and sure enough:
Code: Select all
while(!GetVsyncFlag()){
SETTILE(0,0,0);
2088: 10 92 00 01 sts 0x0100, r1
ts++;
208c: 21 96 adiw r28, 0x01 ; 1
PrintInt(10,10,ts,true);
That single "sts 0x0100, r1" is the compiler's "super optimized" version of SETTILE. 0x0100 is vram[0], and r1 is the register that avr-gcc always keeps at zero. That happens because none of the parameters ever vary, so the compiler folds the whole address calculation into a constant. If you pass volatile variables as the parameters, the code becomes:
Code: Select all
while(!GetVsyncFlag()){
SETTILE(x,y,t);
2088: 20 91 bb 07 lds r18, 0x07BB
208c: 30 91 ba 07 lds r19, 0x07BA
2090: 40 91 b9 07 lds r20, 0x07B9
2094: 82 2f mov r24, r18
2096: 90 e0 ldi r25, 0x00 ; 0
2098: fc 01 movw r30, r24
209a: 55 e0 ldi r21, 0x05 ; 5
209c: ee 0f add r30, r30
209e: ff 1f adc r31, r31
20a0: 5a 95 dec r21
20a2: e1 f7 brne .-8 ; 0x209c <main+0x5e>
20a4: 88 0f add r24, r24
20a6: 99 1f adc r25, r25
20a8: e8 1b sub r30, r24
20aa: f9 0b sbc r31, r25
20ac: e3 0f add r30, r19
20ae: f1 1d adc r31, r1
20b0: e0 50 subi r30, 0x00 ; 0
20b2: ff 4f sbci r31, 0xFF ; 255
20b4: 40 83 st Z, r20
ts++;
20b6: 21 96 adiw r28, 0x01 ; 1
ClearVsyncFlag();
ts = 0;
Which is pretty terrible code! (No mul!?) The function gets 760 while the macro yields only 581. There are certainly gains to be had by removing the call/ret overhead of the SetTile function (which takes up about 25-30% of the cycles), but for that the inline macro would need to be written in assembler. I recalled that Paul McPhee made a nice inline asm macro for SetTile in B.C dash, so I dug through his sources:
Code: Select all
#define inline_set_tile(x,y,tileId) \
asm ( \
"mov r24,%2" "\n\t" \
"ldi r25,%4" "\n\t" \
"ldi %A3,lo8(vram)" "\n\t" \
"ldi %B3,hi8(vram)" "\n\t" \
"mul %1,r25" "\n\t" \
"clr r25" "\n\t" \
"add r0,%0" "\n\t" \
"adc r1,r25" "\n\t" \
"add %A3,r0" "\n\t" \
"adc %B3,r1" "\n\t" \
"subi r24,%5" "\n\t" \
"st %a3,r24" "\n\t" \
"clr r1" \
: /* no outputs */ \
: "r" (x), \
"r" (y), \
"r" (tileId), \
"e" (vram), \
"M" (VRAM_TILES_H), \
"M" (~(RAM_TILES_COUNT-1)) \
: "r24", \
"r25" \
)
This inline macro yields a nice 917, or about a 20% gain over the function.
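For anyone wondering what the two listings above actually compute, here is a small PC-side C sketch of the address arithmetic. It assumes a 30-tile-wide VRAM, which is what the "shift left 5 times, then subtract twice the value" sequence in the compiler output implies (32y - 2y = 30y). The names TILES_PER_ROW, tile_offset_shift, tile_offset_mul and check_offsets are made up for illustration and are not kernel functions, and the 28-row height is also an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed row width: (y << 5) - (y << 1) in the listing equals y * 30. */
#define TILES_PER_ROW 30u

/* Roughly what GCC emitted for the C macro: shifts and a subtract, no mul. */
static uint16_t tile_offset_shift(uint8_t x, uint8_t y)
{
    uint16_t row = y;
    uint16_t times32 = (uint16_t)(row << 5); /* the 5-pass add/adc loop */
    uint16_t times2  = (uint16_t)(row << 1); /* add r24,r24 / adc r25,r25 */
    return (uint16_t)(times32 - times2 + x);
}

/* What the inline asm does instead: one multiply, then add x. */
static uint16_t tile_offset_mul(uint8_t x, uint8_t y)
{
    return (uint16_t)((uint16_t)y * TILES_PER_ROW + x);
}

/* Both forms should agree for every cell of a 30x28 screen (height assumed). */
static void check_offsets(void)
{
    for (uint8_t y = 0; y < 28; y++)
        for (uint8_t x = 0; x < TILES_PER_ROW; x++)
            assert(tile_offset_shift(x, y) == tile_offset_mul(x, y));
}
```

Both give the same offset; the difference on the AVR is only in how many cycles it takes to get there, since the hardware mul replaces the whole shift loop.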
It's a great idea to have a macro version in addition to the function nowadays. It would need a distinct name though; what about... SetTileMacro or InlineSetTile? Other ideas?
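To make the naming question a bit more concrete, here is a rough sketch of how the two could sit side by side. Everything in it is hypothetical: InlineSetTile is just one of the names floated above, the macro body is plain C rather than Paul's asm, the SetTile signature is simplified, and vram is a mock 30x28 array so the thing builds on a PC. The volatile inputs are there for the same reason as in the test earlier, to stop the compiler from folding the store down to a constant address.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_TILES_H 30   /* assumed width in tiles */
#define VRAM_TILES_V 28   /* assumed height in tiles */

/* Mock tile buffer standing in for the kernel's vram. */
static uint8_t vram[VRAM_TILES_H * VRAM_TILES_V];

/* Out-of-line version (mock, simplified signature): pays call/ret each use. */
static void SetTile(uint8_t x, uint8_t y, uint8_t tileId)
{
    vram[(uint16_t)y * VRAM_TILES_H + x] = tileId;
}

/* Inline version under a distinct, hypothetical name. */
#define InlineSetTile(x, y, tileId) \
    do { vram[(uint16_t)(y) * VRAM_TILES_H + (x)] = (uint8_t)(tileId); } while (0)

/* volatile inputs keep the compiler from precomputing the address */
static volatile uint8_t tx = 4, ty = 2, tt = 7;

/* Writes two neighbouring tiles, one through each variant. */
static void demo(void)
{
    SetTile(tx, ty, tt);
    InlineSetTile(tx + 1, ty, tt);
}
```

With separate names, existing games keep calling SetTile unchanged and only the hot loops would opt in to the inline one.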