Asteroids - maybe not - now with movement.
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
I'm trying to port Asteroids now.
This is a static test scene being drawn.
With any more rocks on screen than that I run out of CPU time.
I can see some areas for marginal improvement in line_draw and set_pixel, but I don't think it is enough to get a very well featured game.
I can try to split the "logic" and "render" on odd/even VSync.
Possibly drop to 256x224 pixels, use 2 x 768 bytes of RAM to double buffer, and dedicate half of the 256 RAM tiles to each buffer.
I am already trying to rewrite mode6 a bit to save 1000+ cycles by clearing the VRAM during callbacks.
Anyone else have any other ideas?
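The odd/even VSync split mentioned above could look roughly like the following C sketch. The names on_vsync, update_logic and render_scene are hypothetical placeholders, not Uzebox kernel functions, and the counters are only there so the behaviour can be checked on a PC.

```c
#include <stdint.h>

/* Counters only exist to make the sketch observable on a host PC. */
static unsigned logic_runs;
static unsigned render_runs;

/* Hypothetical game-side hooks: move rocks, test collisions, etc. */
static void update_logic(void) { logic_runs++; }

/* Hypothetical renderer: draw all lines into the RAM tiles. */
static void render_scene(void) { render_runs++; }

/* Call once per vertical sync with a running frame counter.
 * Even frames do the game logic, odd frames do the drawing. */
static void on_vsync(uint8_t frame)
{
    if (frame & 1u)
        render_scene();
    else
        update_logic();
}
```

Each half then runs only every second frame, so the game updates at half the display rate in exchange for not having to fit logic and rendering into the same frame.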
Attachments: ast.hex (28.68 KiB)
Last edited by CunningFellow on Mon Apr 22, 2013 6:44 am, edited 1 time in total.
Re: Asteroids - maybe not
If the asteroids aren't going to rotate, you could try making some kind of 1-bit-per-pixel ROM-to-RAM blitter and pre-draw the rocks in ROM. There might actually be enough ROM to have rotating rocks.
Re: Asteroids - maybe not
The bit-blit may be an idea if others fail.
You would need to store 8 versions of each rock, each with a +1 pixel horizontal offset, to make it faster.
You would also then have ZX Spectrum-like collision flicker when they overlapped.
I have a few ideas I am working on now for speed:
Aligning the VRAM and RAM_TILES to 1K boundaries.
Shortening the display to 256x224 (still the same aspect ratio) and using the extra H_Sync clocks to pre-clear VRAM and maybe RAM_Tiles.
Maybe even shortening the display to 256x208 to gain an extra 16 lines' worth of CPU time.
Finally, having odd/even v_sync and splitting logic/render.
The most important thing, I think, is to save clocks on SETPIXEL as it is called a lot. The 1K alignment should help there.
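For reference, the "8 pre-shifted versions" idea for a blitter can be sketched in C as below. This is only an illustration under stated assumptions: it targets a plain linear 1bpp buffer (not the tile-based mode6 layout), treats the most significant bit as the leftmost pixel, and the names preshift and blit_or are made up for this example.

```c
#include <stdint.h>

#define SPR_H 8 /* sprite height in rows; the sprite is 8 pixels wide */

/* Build 8 copies of an 8x8 1bpp sprite, copy s being shifted right by
 * s pixels. Each row widens to 16 bits so no pixels fall off the edge.
 * Assumption: bit 15 is the leftmost pixel. */
static void preshift(const uint8_t src[SPR_H], uint16_t out[8][SPR_H])
{
    for (unsigned s = 0; s < 8; s++)
        for (unsigned row = 0; row < SPR_H; row++)
            out[s][row] = (uint16_t)(((uint16_t)src[row] << 8) >> s);
}

/* OR the sprite into a linear 1bpp buffer that is `stride` bytes wide.
 * Picking the copy for (x & 7) means no per-pixel shifting at draw time.
 * The caller must keep the sprite inside the buffer (two bytes per row). */
static void blit_or(uint8_t *fb, unsigned stride,
                    uint16_t shifted[8][SPR_H], unsigned x, unsigned y)
{
    const uint16_t *rows = shifted[x & 7u];
    uint8_t *p = fb + y * stride + (x >> 3);

    for (unsigned row = 0; row < SPR_H; row++, p += stride) {
        p[0] |= (uint8_t)(rows[row] >> 8);
        p[1] |= (uint8_t)(rows[row] & 0xFFu);
    }
}
```

The cost is storage: with this layout each 8x8 rock needs 8 copies of 16 bytes, so 128 bytes per rock shape.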
Re: Asteroids - maybe not
CunningFellow wrote:
As it stands I have done about as much optimizing of my asteroids code as I think can be done.
(It is from another AVR project with Asteroids playing on an LCD screen.)
My only options for optimization now are the video kernel and line/pixel routines.
I am going to try to shorten the display to 256x224, so X and Y can both be uint8_t.
This is also going to give me extra clocks each H_Sync that I will use to pre-clear VRAM (and maybe TILEs if I can work out a quick way).
Line/Pixel routines are going to have 2 entry points: one that is C callable and saves registers, and one that is ASM callable and doesn't need to push/pop as much.
Finally I am going to try to align VRAM and TILES to 1K boundaries to save some MULs:
TILEs 1Kbyte at 0x0800 (-60)
VRAM 896byte at 0x0c00 (-60)
Does any of this break your kernel/code philosophy?

I agree that 256x224 will allow more optimization without sacrificing too much screen real estate. Pre-clearing ramtiles during hsync will be tricky since they can be located on any scanline. An idea: the black bars on the side could also be made into a static arcade cabinet-style marquee.
The align trick is interesting, but doesn't it waste RAM? I was wondering whether some memory interleave trick, like mode 3 with scrolling, could be used instead...
Re: Asteroids - maybe not
OK, the rationale for aligning ramTiles and vram to 0x0800 and 0x0C00 respectively:
I think to go from X,Y to VRAM char could be
Code: Select all
MOV HiVram, Y ; Assume this is a MOVW and only one clock. It is written as two MOVs for clarity
MOV LoVram, X
LSR HiVram ; Three plain shifts drop y0..y2, so HiVram = Y / 8 (char row)
LSR HiVram
LSR HiVram
LSR HiVram ; Three LSR/ROR pairs shift HiVram:LoVram right by 3 as a 16-bit value,
ROR LoVram ; leaving (row * 32) + (X / 8)
LSR HiVram
ROR LoVram
LSR HiVram
ROR LoVram
SBR HiVram, 0x0C ; OR in the 0x0C00 base; the alignment means no 16-bit ADD is needed
LD LoRamTile, VRAM ; Fetch the tile number for this char cell
13 clocks instead of the current 18 with unaligned VRAM.
Changing the logic about RamTiles can also save a few clocks I think.
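To check the shift sequence on a PC, here is a small C model of it next to the plain "base + row * 32 + column" formula. It is a sketch for verification only: VRAM_BASE mirrors the 0x0C00 address from the post, and the function names are invented for this example.

```c
#include <stdint.h>

#define VRAM_BASE 0x0C00u /* 1K-aligned VRAM address from the post */

/* Straightforward version: base + (y / 8) * 32 + (x / 8). */
static uint16_t vram_addr_direct(uint8_t x, uint8_t y)
{
    return (uint16_t)(VRAM_BASE + (uint16_t)(y >> 3) * 32u + (x >> 3));
}

/* Model of the register sequence: hi/lo stand in for HiVram/LoVram. */
static uint16_t vram_addr_shift(uint8_t x, uint8_t y)
{
    uint8_t hi = y;
    uint8_t lo = x;

    hi >>= 3;                       /* three plain LSRs: hi = y / 8 */
    for (int i = 0; i < 3; i++) {   /* three LSR/ROR pairs */
        uint8_t carry = hi & 1u;    /* bit shifted out of hi by LSR */
        hi >>= 1;
        lo = (uint8_t)((lo >> 1) | (carry << 7)); /* ROR pulls carry in */
    }
    hi |= (uint8_t)(VRAM_BASE >> 8); /* OR replaces ADD: base is aligned */
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

The OR only equals an ADD because the offset is always below 1024 and the base is a multiple of 1024, which is the whole point of the alignment.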
Re: Asteroids - maybe not
uze6666 wrote:
I agree that 256x224 will allow more optimization without sacrificing too much screen real estate. Pre-clearing ramtiles during hsync will be tricky since they can be located on any scanline. An idea: the black bars on the side could also be made into a static arcade cabinet-style marquee.

At least 1700 clocks can be saved by clearing only the VRAM.
I realise that clearing the tile RAM is tricky. I was thinking I may be able to read VRAM after line 7 and then clear only the tiles that were used in the previous Char_Row.
It's going to be messy code, but it might work.
uze6666 wrote:
The align trick is interesting, but doesn't it waste RAM? I was wondering whether some memory interleave trick, like mode 3 with scrolling, could be used instead...

128 Ramtiles = 1024 bytes.
32x24 Chars in VRAM = 768 bytes.
Align those with 0x0800 and 0x0C00 and there is no wasted space between them.
There are 256 bytes ABOVE them that could either be left as stack space or used for something else.
Should not be too much waste.
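The arithmetic above can be written down as a few C constants so the compiler checks it. This is just a sketch: the enum names are made up, and REGION_END (0x1000) is an assumption standing for the 4K boundary implied by "256 bytes above".

```c
#include <stdint.h>

enum {
    RAM_TILES_BASE = 0x0800,  /* from the post */
    RAM_TILES_SIZE = 128 * 8, /* 128 tiles x 8 bytes = 1024 */
    VRAM_BASE      = 0x0C00,  /* from the post */
    VRAM_SIZE      = 32 * 24, /* 32 x 24 chars = 768 */
    REGION_END     = 0x1000   /* assumed end of the region (4K boundary) */
};

/* The tiles end exactly where VRAM starts, so there is no gap between them. */
_Static_assert(RAM_TILES_BASE + RAM_TILES_SIZE == VRAM_BASE,
               "ramTiles must end exactly at the start of vram");

/* Bytes left between the end of VRAM and the assumed region end. */
static unsigned spare_bytes_above_vram(void)
{
    return (unsigned)(REGION_END - (VRAM_BASE + VRAM_SIZE));
}
```

With these numbers VRAM ends at 0x0F00, which leaves the 256 bytes mentioned above.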
Re: Asteroids - maybe not
Thinking about it, the obvious choice for the 256 bytes above is some ((SIN/COS)*scale) tables.
I think my ASM macro set_pixel routine is down to
14 clocks best
51 clocks worst
This is an improvement over the non-aligned version in the videomode6 kernel, which has
22 clocks best
58 clocks worst
Also, the C-callable version of set pixel should be 34/44 clocks best/worst.
Re: Asteroids - maybe not
OK - here is a SetPixel that is callable from C.
It is 37 clocks best, 47 clocks worst (including the RET).
It still has a MUL in the (TILE_NUMBER << 3) | (Y & 0x07) part to be compatible with the old render engine. I can save 4 clocks here by changing the render engine to expect
0 0 0 0 0 0 y1 y0 : y2 t6 t5 t4 t3 t2 t1 t0
for the tile addressing.
I can also see that, with some big sheets of paper to draw some flow diagrams, I should be able to get the line draw and pixel ASM routines at least 2x as fast as they currently are.
Comments, critique and improvements welcome.
Code: Select all
.global SetPixelFastC
; C-Callable
; X in r24
; Y in r22
; Uses the Z pointer (r30:r31) because avr-gcc treats it as call-clobbered.
; The Y pointer (r28:r29) is call-saved and would need a push/pop.
; Clobbers r0, r19-r22, r24, r25, r30, r31. r1 is restored to zero.
SetPixelFastC:
mov r25,r22 ; Copy Y from r22 to r25 so X/Y sit in the consecutive pair r24/r25
movw r30,r24 ; Copy X/Y into the Z pointer. Z becomes the "VRAM address Hi/Lo"
; R31 R30 Carry
; y7y6y5y4y3y2y1y0 x7x6x5x4x3x2x1x0 -
lsr r31 ; 0 y7y6y5y4y3y2y1 x7x6x5x4x3x2x1x0 y0
lsr r31 ; 0 0 y7y6y5y4y3y2 x7x6x5x4x3x2x1x0 y1
lsr r31 ; 0 0 0 y7y6y5y4y3 x7x6x5x4x3x2x1x0 y2
lsr r31 ; 0 0 0 0 y7y6y5y4 x7x6x5x4x3x2x1x0 y3
ror r30 ; 0 0 0 0 y7y6y5y4 y3x7x6x5x4x3x2x1 x0
lsr r31 ; 0 0 0 0 0 y7y6y5 y3x7x6x5x4x3x2x1 y4
ror r30 ; 0 0 0 0 0 y7y6y5 y4y3x7x6x5x4x3x2 x1
lsr r31 ; 0 0 0 0 0 0 y7y6 y4y3x7x6x5x4x3x2 y5
ror r30 ; 0 0 0 0 0 0 y7y6 y5y4y3x7x6x5x4x3 x2
ori r31, hi8(vram) ; Fixed in linker to 0x0C00 0 0 0 0 1 1 y7y6 y5y4y3x7x6x5x4x3 -
ld r22, Z ; Get the tile to use from the VRAM address. r22 is now Tile#
cpi r22, 0x00 ; See if there is already a tile allocated at this X/Y address
brne SPF_Allocated
lds r22,nextFreeRamTile ; If not allocated then we need to get the # of the next free tile
cpi r22,(RAM_TILES_COUNT-1) ; Make sure we have not run out of RAM tiles
breq SPF_Fail
st Z, r22 ; After allocating the new tile, save its # in the VRAM location for X/Y
inc r22 ; Save the new value of "next free" into nextFreeRamTile
sts nextFreeRamTile, r22
dec r22 ; Undo the INC two lines above because we want THIS tile, not the next
SPF_Allocated:
; R1 R0 Carry
; (r22 holds 0 t6t5t4t3t2t1t0) -
ldi r19, 0x08 ;
mul r22,r19 ; x8 and leave result in r1:r0 0 0 0 0 0 0 t6t5 t4t3t2t1t0 0 0 0
andi r25, 0x07 ; Reduce r25 to 0 0 0 0 0 y2y1y0
or r0, r25 ; 0 0 0 0 0 0 t6t5 t4t3t2t1t0y2y1y0 -
; NEXT LINE IS DEFERRED TILL BELOW. Duplicated here for comment clarity
; ori r31, hi8(ramTiles) ; Fixed in linker at 0x0800 0 0 0 0 1 0 t6t5 t4t3t2t1t0y2y1y0 -
; R25 R24 Carry
; 0 0 0 0 0 y2y1y0 x7x6x5x4x3x2x1x0 -
andi r24, 0x07 ; 0 0 0 0 0 y2y1y0 0 0 0 0 0 x2x1x0 -
ori r24, lo8(shift_tbl_ram) ; Table must be 8-byte aligned for this OR t7t6t5t4t3x2x1x0 -
ldi r25, hi8(shift_tbl_ram) ; T7T6T5T4T3T2T1T0 t7t6t5t4t3x2x1x0 -
movw r30, r24 ; Z = address of the pixel mask for this X
ld r20, Z ; Get pixel mask
movw r30, r0 ; Move Tile_Row_Byte_Address into Z from r1:r0 where it was left by MUL
; NEXT LINE IS DEFERRED FROM ABOVE
ori r31, hi8(ramTiles)
ld r21, Z ; Get TileRowByte
or r21, r20 ; OR TileRowByte with the pixel mask
st Z, r21 ; Write TileRowByte back to memory
clr r1 ; Clear r1 back to zero after the MUL trashed it
SPF_Fail:
ret
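For anyone who, like the later reply, finds the assembly hard to follow, here is a rough C model of the same steps that can be run on a PC. It is a sketch, not the kernel's code: the model_ names are invented, the free-tile counter is assumed to start at 1 (tile 0 means "no tile yet"), the pixel mask is assumed to be 0x80 >> (x & 7), and the y range check is an addition that the assembly does not have. The second function models the alternative "y1 y0 : y2 t6..t0" tile addressing from the post.

```c
#include <stdint.h>

#define RAM_TILES_COUNT 128

static uint8_t model_ram_tiles[RAM_TILES_COUNT * 8]; /* 8 row bytes per tile */
static uint8_t model_vram[32 * 24];                  /* one tile # per cell  */
static uint8_t model_next_free = 1;                  /* assumed start value  */

/* Returns 1 if the pixel was set, 0 if out of range or out of tiles. */
static int set_pixel_model(uint8_t x, uint8_t y)
{
    if (y >= 24 * 8)
        return 0; /* guard added for the model; not in the assembly */

    uint16_t cell = (uint16_t)(((uint16_t)(y >> 3) << 5) | (x >> 3));
    uint8_t tile = model_vram[cell];

    if (tile == 0) {                       /* no tile allocated here yet */
        if (model_next_free == RAM_TILES_COUNT - 1)
            return 0;                      /* same limit as the cpi/breq */
        tile = model_next_free++;
        model_vram[cell] = tile;
    }

    /* (tile << 3) | (y & 7) is what the MUL/OR pair computes. */
    uint16_t row_byte = (uint16_t)(((uint16_t)tile << 3) | (y & 0x07u));
    model_ram_tiles[row_byte] |= (uint8_t)(0x80u >> (x & 7u)); /* assumed mask */
    return 1;
}

/* Alternative layout: high byte 0 0 0 0 0 0 y1 y0, low byte y2 t6..t0. */
static uint16_t tile_row_offset_interleaved(uint8_t tile, uint8_t y)
{
    return (uint16_t)(((uint16_t)(y & 0x03u) << 8) |
                      ((uint16_t)(y & 0x04u) << 5) |
                      (tile & 0x7Fu));
}
```

In the interleaved form the tile number lands in the low byte untouched, which is why no multiply is needed.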
Re: Asteroids - maybe not
Oh - someone here asked me about aligning vram and ramTiles to the specific locations.
I am not real smart and I don't know much about linker scripts. I can, however, ape other people who are smart.
In the C code
Code: Select all
unsigned char ramTiles[128*8] __attribute__ ((section (".ramtiles")));
unsigned char vram[32*24] __attribute__ ((section (".vram")));
then in the makefile add
Code: Select all
LDFLAGS += -Wl,--section-start=.ramtiles=0x00800800
LDFLAGS += -Wl,--section-start=.vram=0x00800C00
to the
## Linker flags
bit.
This will waste the top 256 bytes in its current setup. I can either alter the main target sections for the 644 to move the stack to 0x00FFF, or I can fill that top 1/4 K with SIN/COS tables.
Re: Asteroids - maybe not
A pretty good improvement! I admit I had a hard time figuring out the algorithm at first sight, but if it works, eh, that's all that counts.