Asteroids - maybe not - now with movement.

CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Post by CunningFellow »

I am trying to port Asteroids now.

This is a static test scene being drawn.

With any more rocks on screen than that I run out of CPU time.

I can see some areas for some marginal improvement in line_draw and set_pixel but I don't think enough to get a very well featured game.

I can try splitting the "logic" and "render" work across odd/even VSyncs.

Possibly drop to 256x224 pixels, use 2 x 768 bytes of RAM to double buffer, and dedicate half of the 256 RAM tiles to each buffer.

I am already trying to rewrite mode6 a bit to save 1000+ cycles by clearing the VRAM during callbacks.

Anyone else have any other ideas?
Attachments
ast.hex
(28.68 KiB)
Last edited by CunningFellow on Mon Apr 22, 2013 6:44 am, edited 1 time in total.
JRoatch
Posts: 108
Joined: Mon May 11, 2009 11:48 pm

Re: Asteroids - maybe not

Post by JRoatch »

If the asteroids aren't going to rotate, you could try making some kind of 1-bit-per-pixel ROM-to-RAM blitter and pre-draw the rocks in ROM. There might actually be enough ROM to have rotating rocks.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Asteroids - maybe not

Post by CunningFellow »

The bit-blit may be an idea if others fail.

You would need to store 8 versions of each rock, each with a +1 pixel horizontal offset, to make it faster.

You would also then have ZX Spectrum-like collision flicker when they overlapped.
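A rough sketch of the pre-shifted blit idea, for illustration only. It assumes a hypothetical linear 1-bit-per-pixel buffer (`FB_STRIDE` bytes per line) and 8x8 sprites, neither of which is the actual mode 6 RAM-tile layout, and the names `preshift` and `blit` are made up here. The point is that with 8 pre-shifted copies the blit is two ORs per row and never shifts at draw time.

```c
#include <assert.h>
#include <stdint.h>

#define SPRITE_ROWS 8
#define FB_STRIDE   32   /* assumption: 256 pixels wide at 1 bit per pixel */

/* Build 8 copies of an 8-pixel-wide sprite, copy s moved right by s pixels.
   Each shifted row spills into a second byte, so rows are 2 bytes wide. */
static void preshift(const uint8_t src[SPRITE_ROWS],
                     uint8_t shifted[8][SPRITE_ROWS][2])
{
    for (int s = 0; s < 8; s++) {
        for (int r = 0; r < SPRITE_ROWS; r++) {
            uint16_t wide = (uint16_t)(((uint16_t)src[r] << 8) >> s);
            shifted[s][r][0] = (uint8_t)(wide >> 8);
            shifted[s][r][1] = (uint8_t)(wide & 0xFFu);
        }
    }
}

/* OR the copy selected by (x & 7) into the buffer at byte column (x >> 3).
   The caller is expected to keep y inside the buffer. */
static void blit(uint8_t *fb, uint8_t shifted[8][SPRITE_ROWS][2],
                 uint8_t x, uint8_t y)
{
    assert((x >> 3) + 1 < FB_STRIDE);   /* second byte must stay on the line */
    uint8_t (*rows)[2] = shifted[x & 7u];
    uint8_t *dst = fb + (unsigned)y * FB_STRIDE + (x >> 3);
    for (int r = 0; r < SPRITE_ROWS; r++, dst += FB_STRIDE) {
        dst[0] |= rows[r][0];
        dst[1] |= rows[r][1];
    }
}
```

On the real hardware the pre-shifted copies would live in flash rather than being built at run time, which is where the 8x storage cost comes from.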

I have a few ideas I am working on now for speed.

Aligning the VRAM and RAM_TILES to 1K boundaries.

Shortening the display to 256x224 (still same aspect ratio) and using the extra H_Sync clocks to pre-clear VRAM and maybe RAM_Tiles.

Maybe even shorten the display to 256x208 and gain an extra 16 lines' worth of CPU time.

Finally, alternating odd/even VSyncs to split logic and render.

The most important thing I think is to save clocks on SETPIXEL as it is called a lot. The 1K alignment should help there.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Asteroids - maybe not

Post by uze6666 »

CunningFellow wrote: As it stands I have done about as much optimizing of my asteroids code as I think can be done

(It is from another AVR project with Asteroids playing on an LCD screen)

My only options for optimization now are the video kernel and line/pixel routines.

I am going to try shortening the display to 256x224, so X and Y can both be uint8_t.

This is also going to give me extra clocks each H_Sync that I will use to pre-clear VRAM (and maybe TILEs if I can work out a quick way).

Line/Pixel routines are going to have 2 entry points: one that is C callable and saves registers, and one that is ASM callable and doesn't need to push/pop as much.

Finally I am going to try aligning VRAM and TILES to 1K boundaries to save some MULs.

TILEs 1Kbyte at 0x0800 (-60)
VRAM 896byte at 0x0c00 (-60)

Does any of this break your kernel/code philosophy?
I agree that 256x224 will allow more optimization without sacrificing too much screen real estate. Pre-clearing ramtiles during hsync will be tricky since they can be located on any scanline. An idea: the black bars on the side could also be made into a static arcade cabinet-style marquee.

The align trick is interesting, but doesn't it waste RAM? I was wondering if some memory interleave trick, like mode 3 with scrolling, could be used instead...
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Asteroids - maybe not

Post by CunningFellow »

OK, the rationale for aligning ramTiles and vram to 0x0800 and 0x0C00 respectively:

I think to go from X,Y to VRAM char could be

Code:

MOV HiVram, Y    ; Assume this is a MOVW and costs only one clock. The two MOVs are shown for clarity
MOV LoVram, X

LSR HiVram
LSR HiVram
LSR HiVram

LSR HiVram
ROR LoVram
LSR HiVram
ROR LoVram
LSR HiVram
ROR LoVram

SBR HiVram, 0x0C
LD LoRamTile, VRAM
That is 13 clocks instead of the current 18 with unaligned VRAM.

Changing the logic about RamTiles can also save a few clocks I think.
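For anyone who wants to check the shift sequence on a PC, here is a small C model of it. This is only a sketch: it assumes vram sits at 0x0C00 with 32 cells per character row, and the function names are made up for the example. `vram_addr_shifts` follows the LSR/ROR steps one by one, and `vram_addr_plain` is the ordinary base + (y/8)*32 + (x/8) form it should agree with.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_BASE 0x0C00u   /* assumption: vram linked at 0x0C00 */

/* Step-by-step model of the LSR/ROR sequence on the HiVram:LoVram pair. */
static uint16_t vram_addr_shifts(uint8_t x, uint8_t y)
{
    uint8_t hi = y;
    uint8_t lo = x;
    hi >>= 3;                                      /* three LSR HiVram */
    for (int i = 0; i < 3; i++) {
        uint8_t carry = hi & 1u;                   /* LSR HiVram: bit 0 to carry */
        hi >>= 1;
        lo = (uint8_t)((lo >> 1) | (carry << 7));  /* ROR LoVram: carry into bit 7 */
    }
    hi |= (uint8_t)(VRAM_BASE >> 8);               /* SBR HiVram, 0x0C */
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* The same address written as base + row * 32 + column. */
static uint16_t vram_addr_plain(uint8_t x, uint8_t y)
{
    return (uint16_t)(VRAM_BASE + (y / 8u) * 32u + (x / 8u));
}

/* Exhaustive comparison over every 8-bit X/Y pair. */
static void check_vram_addr(void)
{
    for (unsigned y = 0; y < 256; y++)
        for (unsigned x = 0; x < 256; x++)
            assert(vram_addr_shifts((uint8_t)x, (uint8_t)y) ==
                   vram_addr_plain((uint8_t)x, (uint8_t)y));
}
```

The OR with 0x0C only works as an add because the base has its low 10 bits clear, which is the whole reason for the 1K alignment.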
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Asteroids - maybe not

Post by CunningFellow »

uze6666 wrote: I agree that 256x224 will allow more optimization without sacrificing too much screen real estate. Pre-clearing ramtiles during hsync will be tricky since they can be located on any scanline. An idea: the black bars on the side could also be made into a static arcade cabinet-style marquee.
At least 1700 clocks can be saved by clearing only the VRAM.

I realise that clearing tile_ram is tricky. I was thinking I may be able to read VRAM after line 7 and then clear only the tiles that were used in the previous Char_Row.

It's going to be messy code, but it might work.
uze6666 wrote: The align trick is interesting, but doesn't it waste RAM? I was wondering if some memory interleave trick, like mode 3 with scrolling, could be used instead...
128 Ramtiles = 1024 bytes.
32x24 Chars in VRAM = 768 bytes

Align those at 0x0800 and 0x0C00 and there is no wasted space between them.

There are 256 bytes ABOVE them that could either be left as stack space or used for something else.

Should not be too much waste.
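The arithmetic behind that can be written out as a quick check. This is just a sketch using the sizes and addresses quoted above; `BLOCK_END` (the next 1K boundary, 0x1000) and the function names are introduced here for illustration.

```c
#include <assert.h>

enum {
    RAMTILES_BASE  = 0x0800,
    RAMTILES_BYTES = 128 * 8,   /* 128 tiles, 8 bytes each at 1 bit per pixel */
    VRAM_BASE      = 0x0C00,
    VRAM_BYTES     = 32 * 24,   /* 32x24 character cells */
    BLOCK_END      = 0x1000     /* assumption: next 1K boundary above vram */
};

/* Bytes left unused between the end of ramTiles and the start of vram. */
static int ramtiles_gap(void)
{
    return VRAM_BASE - (RAMTILES_BASE + RAMTILES_BYTES);
}

/* Bytes left between the end of vram and the next 1K boundary. */
static int bytes_above_vram(void)
{
    return BLOCK_END - (VRAM_BASE + VRAM_BYTES);
}

static void check_layout(void)
{
    assert(ramtiles_gap() == 0);        /* 0x0800 + 1024 == 0x0C00 */
    assert(bytes_above_vram() == 256);  /* 0x0F00 .. 0x0FFF */
}
```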
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Asteroids - maybe not

Post by CunningFellow »

Thinking about it

For the 256 bytes above, the obvious choice is some ((SIN/COS)*scale) tables.

I think my ASM macro set_pixel routine is down to

14 Clocks Best
51 Clocks worst

This is an improvement over the non-aligned version in videomode6 kernel that has

22 clocks Best
58 clocks Worst

Also, the C-callable version of set pixel should be 34/44 clocks best/worst.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Asteroids - maybe not

Post by CunningFellow »

OK - here is SetPixel that is callable from C

It is 37 clocks best, 47 clocks worst (including the RET).

It still has a MUL in the (TILE_NUMBER << 3) | (Y & 0x07) part to stay compatible with the old render engine. I can save 4 clocks here by changing the render engine to expect

0 0 0 0 0 0 y1 y0 : y2 t6 t5 t4 t3 t2 t1 t0

for the tile addressing.
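To make the two tile addressing layouts concrete, here is a sketch in C of both offset calculations (offset within the 1K ramTiles block). It assumes 128 tiles of 8 rows each, so the row index is y & 0x07. The function names are made up for the example, and the second function is only one reading of the bit diagram above. `layout_is_bijective` checks that every tile/row pair lands on its own byte.

```c
#include <assert.h>
#include <stdint.h>

/* Current layout: 8 consecutive bytes per tile, which needs the x8 multiply. */
static uint16_t row_offset_current(uint8_t tile, uint8_t y)
{
    assert(tile < 128);
    return (uint16_t)(((uint16_t)tile << 3) | (y & 0x07u));
}

/* Proposed layout: 0 0 0 0 0 0 y1 y0 : y2 t6 t5 t4 t3 t2 t1 t0
   The tile number stays in the low byte untouched, so no multiply is needed. */
static uint16_t row_offset_proposed(uint8_t tile, uint8_t y)
{
    assert(tile < 128);
    uint8_t hi = (uint8_t)(y & 0x03u);                     /* y1 y0 */
    uint8_t lo = (uint8_t)(((y & 0x04u) << 5) | tile);     /* y2, then t6..t0 */
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* Returns 1 if all 128 x 8 tile/row pairs map to distinct offsets below 1024. */
static int layout_is_bijective(uint16_t (*f)(uint8_t, uint8_t))
{
    uint8_t seen[1024] = {0};
    for (unsigned t = 0; t < 128; t++) {
        for (unsigned r = 0; r < 8; r++) {
            uint16_t off = f((uint8_t)t, (uint8_t)r);
            if (off >= 1024 || seen[off])
                return 0;
            seen[off] = 1;
        }
    }
    return 1;
}
```

In the proposed layout the 8 rows of one tile are no longer adjacent in memory, so the render engine has to use the same bit arrangement when it reads them back.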

I can also see that, with some big sheets of paper to draw flow diagrams on, I should be able to get the line-draw and pixel ASM routines at least 2x as fast as they currently are.

Code:

.global SetPixelFastC
; C-callable
; X in r24
; Y in r22
; Uses the Z pointer (r31:r30), which avr-gcc treats as call-clobbered.
; The Y pointer (r29:r28) is call-saved and would need a push/pop pair.

SetPixelFastC:

    mov     r25,r22             ; Move Y from r22 to r25 so X/Y sit in consecutive regs r24/r25

    movw    r30,r24             ; Copy X/Y into the Z pointer. Z is now "VRAM address Hi/Lo"

                                ;                                   R31              R30                Carry
                                ;                                   y7y6y5y4y3y2y1y0 x7x6x5x4x3x2x1x0   -
    lsr     r31                 ;                                   0 y7y6y5y4y3y2y1 x7x6x5x4x3x2x1x0   y0
    lsr     r31                 ;                                   0 0 y7y6y5y4y3y2 x7x6x5x4x3x2x1x0   y1
    lsr     r31                 ;                                   0 0 0 y7y6y5y4y3 x7x6x5x4x3x2x1x0   y2

    lsr     r31                 ;                                   0 0 0 0 y7y6y5y4 x7x6x5x4x3x2x1x0   y3
    ror     r30                 ;                                   0 0 0 0 y7y6y5y4 y3x7x6x5x4x3x2x1   x0

    lsr     r31                 ;                                   0 0 0 0 0 y7y6y5 y3x7x6x5x4x3x2x1   y4
    ror     r30                 ;                                   0 0 0 0 0 y7y6y5 y4y3x7x6x5x4x3x2   x1

    lsr     r31                 ;                                   0 0 0 0 0 0 y7y6 y4y3x7x6x5x4x3x2   y5
    ror     r30                 ;                                   0 0 0 0 0 0 y7y6 y5y4y3x7x6x5x4x3   x2

    ori     r31, hi8(vram)      ; Fixed in linker to 0x0C00         0 0 0 0 1 1 y7y6 y5y4y3x7x6x5x4x3   x2

    ld      r22, Z              ; Get the tile to use from the VRAM address. r22 is now Tile#

    cpi     r22, 0x00           ; See if there is already a tile allocated at this X/Y address
    brne    SPF_Allocated

    lds     r22,nextFreeRamTile         ; If not allocated then we need to get # of the next free tile
    cpi     r22,(RAM_TILES_COUNT-1)     ; Make sure we have not run out of RAM tiles
    breq    SPF_Fail

    st      Z, r22                      ; After allocating the new tile, save its # in the VRAM location X/Y

    inc     r22                         ; Save the new value of "next free" into
    sts     nextFreeRamTile, r22
    dec     r22                         ; Undo the INC two lines above because we want THIS tile, not the next

SPF_Allocated:
                                ;                                   R23 / R1         R22 / R0           Carry
                                ;                                   - - - - - - - -  0 t6t5t4t3t2t1t0   -
    ldi     r19, 0x08           ;
    mul     r22,r19             ; x8 and leave result in r1:r0      0 0 0 0 0 0 t6t5 t4t3t2t1t00 0 0    -
    andi    r25, 0x07           ; Reduce r25 to 0 0 0 0 0 y2y1y0
    or      r0, r25             ;                                   0 0 0 0 0 0 t6t5 t4t3t2t1t0y2y1y0   -

;   NEXT LINE IS DEFERRED TILL BELOW. Duplicated here for comment clarity
;   ori     r31, hi8(ramTiles)  ; Fixed in linker at 0x0800         0 0 0 0 1 0 t6t5 t4t3t2t1t0y2y1y0   -


                                    ;                               R25              R24                Carry
                                    ;                               0 0 0 0 0 y2y1y0 x7x6x5x4x3x2x1x0   -
    andi    r24, 0x07               ;                               0 0 0 0 0 y2y1y0 0 0 0 0 0 x2x1x0   -
    ori     r24, lo8(shift_tbl_ram) ;                               0 0 0 0 0 y2y1y0 t7t6t5t4t3x2x1x0   -
    ldi     r25, hi8(shift_tbl_ram) ;                               T7T6T5T4T3T2T1T0 t7t6t5t4t3x2x1x0   -

    movw    r30, r24                ; Get pixel mask
    ld      r20, Z

    movw    r30, r0             ; Move Tile_Row_Byte_Address into Z from r1:r0 where the MUL left it

;   NEXT LINE IS DEFERRED FROM ABOVE
    ori     r31, hi8(ramTiles)
    ld      r21, Z              ; Get TileRowByte
    or      r21, r20            ; OR TileRowByte with the pixel mask
    st      Z, r21              ; Write TileRowByte back to memory

    clr     r1                  ; Clear r1 back to zero after the MUL trashed it (avr-gcc expects r1 == 0)

SPF_Fail:

    ret
Comments, critique and improvements welcome.
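For testing on a PC, the same logic can be modelled in plain C. This is a sketch, not the kernel code: it assumes tile 0 means "nothing allocated", that `nextFreeRamTile` starts at 1, and that `shift_tbl_ram` holds the masks 0x80 >> n (the table contents are not shown in the post, so the bit order is a guess). `set_pixel_model` and `reset_model` are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RAM_TILES_COUNT 128

static uint8_t vram[32 * 24];
static uint8_t ramTiles[RAM_TILES_COUNT * 8];
static uint8_t nextFreeRamTile = 1;   /* assumption: tile 0 means "not allocated" */

static void reset_model(void)
{
    memset(vram, 0, sizeof vram);
    memset(ramTiles, 0, sizeof ramTiles);
    nextFreeRamTile = 1;
}

/* Returns 1 if the pixel was set, 0 if no RAM tile was left (SPF_Fail). */
static int set_pixel_model(uint8_t x, uint8_t y)
{
    assert(y < 192);                                  /* 24 character rows */
    uint16_t cell = (uint16_t)((y >> 3) * 32 + (x >> 3));
    uint8_t tile = vram[cell];

    if (tile == 0) {                                  /* no tile at this cell yet */
        tile = nextFreeRamTile;
        if (tile == RAM_TILES_COUNT - 1)              /* same limit test as the asm */
            return 0;
        vram[cell] = tile;
        nextFreeRamTile = (uint8_t)(tile + 1);
    }

    uint8_t mask = (uint8_t)(0x80u >> (x & 7u));      /* assumed shift_tbl_ram entry */
    ramTiles[((uint16_t)tile << 3) | (y & 7u)] |= mask;
    return 1;
}
```

Running the model and the ASM over the same list of points and comparing vram and ramTiles afterwards is a cheap way to catch register mix-ups.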
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Asteroids - maybe not

Post by CunningFellow »

OH - someone here asked me about the alignment of vram and ramTiles to the specific locations.

I am not real smart and I don't know much about linker scripts. I can however ape other people who are smart

In the C code

Code:

unsigned char ramTiles[128*8]  __attribute__ ((section (".ramtiles")));
unsigned char vram[32*24] __attribute__ ((section (".vram")));
then in the make file add

Code:

LDFLAGS += -Wl,--section-start=.ramtiles=0x00800800
LDFLAGS += -Wl,--section-start=.vram=0x00800C00
to the

## Linker flags

bit.

In its current setup this will waste the top 256 bytes. I can either alter the main target sections for the 644 to move the stack to 0x00FFF, or I can fill that top 1/4 K with SIN/COS tables.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Asteroids - maybe not

Post by uze6666 »

A pretty good improvement! I admit I had a hard time figuring out the algorithm at first sight, but if it works, hey, that's all that counts. :lol: