In the mode's description, the code block for a tile row takes 21 words (42 bytes, so 336 bytes for a tile). I was wondering whether this could be trimmed down considerably for an attribute-mode-like result (which Yllawwally uses in his roguelike: only 2 colors per tile). This seems very much possible!
Code:
common:
out PIXOUT, ZL
movw ZL, r0
add ZH, r6 ; Block base for the row (at a 256 word boundary)
dec r2 ; Count of remaining tiles
out PIXOUT, r3
breq commone
ijmp
commone:
nop
out PIXOUT, r4
code_blocks:
out PIXOUT, r4
ldi r18, bgcol
ldi r19, fgcol
mov ZL, r18/r19 ; Px. 3 of tile
out PIXOUT, r18/r19 ; Px. 0 of tile
ld r0, X+ ; Tile index
mov r3, r18/r19 ; Px. 4 of tile
out PIXOUT, r18/r19 ; Px. 1 of tile
mul r0, r5 ; Code block size: 13 words
mov r4, r18/r19 ; Px. 5 of tile
out PIXOUT, r18/r19 ; Px. 2 of tile
jmp common
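To put the saving in numbers, here is a rough sketch of the flash footprint arithmetic. It assumes 2 bytes per AVR program word and 8 rows per tile (the latter is only inferred from 336 / 42 = 8 above); the function name is made up for illustration.

```python
# Flash footprint of one tile's row code blocks.
BYTES_PER_WORD = 2   # AVR program words are 16 bits
ROWS_PER_TILE = 8    # assumption, inferred from 336 bytes / 42 bytes per row

def tile_bytes(block_words, rows=ROWS_PER_TILE):
    """Bytes of flash used by all row code blocks of one tile."""
    return block_words * BYTES_PER_WORD * rows

original = tile_bytes(21)   # 21-word blocks from the mode's description
trimmed = tile_bytes(13)    # 13-word blocks from the listing above
saving_percent = 100 * (original - trimmed) / original
print(original, trimmed, round(saving_percent, 1))  # 336 208 38.1
```

So under these assumptions the 13-word block costs 208 bytes per tile instead of 336.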
An even smaller alternative is the following concept:
Code:
common:
out PIXOUT, r1
breq commone
mul r0, r4 ; Code block size: 11 words
out PIXOUT, r2
movw ZL, r0
add ZL, r6 ; Block base for the row, low
adc ZH, r7 ; Block base for the row, high
out PIXOUT, r3
ijmp
commone:
nop
out PIXOUT, r2
lpm r0, Z
out PIXOUT, r3
code_blocks:
movw ZL, cpair ; A reg. pair supplying the color set
out PIXOUT, ZL/ZH ; Px. 0 of tile
mov r1, ZL/ZH ; Px. 3 of tile
mov r2, ZL/ZH ; Px. 4 of tile
mov r3, ZL/ZH ; Px. 5 of tile
out PIXOUT, ZL/ZH ; Px. 1 of tile
ld r0, X+ ; Tile index
dec r5 ; Count of remaining tiles
out PIXOUT, ZL/ZH ; Px. 2 of tile
jmp common
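A quick way to sanity-check the timing of a layout like this is to add up the cycles between consecutive "out" instructions. The sketch below does that for the steady-state path of the listing above (common, then ijmp, then one code block, then jmp back). The cycle counts are the usual AVR ones for a part with a 16-bit program counter, which is an assumption about the target; the 4 cycles per pixel figure is my own reading of the code, not something stated in the mode's description.

```python
# Steady-state path of the 11-word attribute variant: (mnemonic, cycles).
# Timings assume a classic AVR core with a 16-bit PC (jmp = 3, ijmp = 2).
LOOP = [
    ("out", 1), ("breq", 1), ("mul", 2),              # common: Px. 3, branch not taken
    ("out", 1), ("movw", 1), ("add", 1), ("adc", 1),  # Px. 4, block address
    ("out", 1), ("ijmp", 2), ("movw", 1),             # Px. 5, enter code block
    ("out", 1), ("mov", 1), ("mov", 1), ("mov", 1),   # Px. 0
    ("out", 1), ("ld", 2), ("dec", 1),                # Px. 1
    ("out", 1), ("jmp", 3),                           # Px. 2, back to common
]

def out_cycles(loop):
    """Return the cycle at which each 'out' starts, and the loop length."""
    t = 0
    starts = []
    for name, cycles in loop:
        if name == "out":
            starts.append(t)
        t += cycles
    return starts, t

starts, total = out_cycles(LOOP)
print(starts, total)  # [0, 4, 8, 12, 16, 20] 24
```

Every pixel lands exactly 4 cycles after the previous one, and the whole 6 pixel tile takes 24 cycles, so the path is evenly paced.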
The freely colorable alternative can also be brought down to 11 words per code block if you are willing to write a slightly more elaborate code block generator for it:
Code:
common:
out PIXOUT, r1
mul r0, r4 ; Code block size: 11 words
movw ZL, r0
out PIXOUT, r2
dec r5 ; Count of remaining tiles
breq commone
add ZH, r7 ; Block base for the row (at a 256 word boundary)
out PIXOUT, r3
ijmp
commone:
out PIXOUT, r3
code_blocks:
ldi ZL, bgcol ; Or "ldi ZH, fgcol" depending on first pixel
out PIXOUT, ZL/ZH ; Px. 0 of tile
ldi ZH, fgcol ; Or "ldi ZL, bgcol" depending on first pixel
mov r1, ZL/ZH ; Px. 3 of tile
mov r2, ZL/ZH ; Px. 4 of tile
out PIXOUT, ZL/ZH ; Px. 1 of tile
ld r0, X+ ; Tile index
mov r3, ZL/ZH ; Px. 5 of tile
out PIXOUT, ZL/ZH ; Px. 2 of tile
jmp common
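The "more elaborate generator" part could look something like the sketch below. It is only an illustration: the function names, the 0/1 pattern format (0 = background kept in ZL, 1 = foreground kept in ZH) and the hex formatting are my assumptions, and it just emits assembly text for one tile row following the listing above.

```python
def emit_block(pattern, bgcol, fgcol):
    """Emit one 11-word code block for a 6 pixel tile row.

    pattern: six ints, 0 = background (ZL), 1 = foreground (ZH).
    """
    if len(pattern) != 6 or any(p not in (0, 1) for p in pattern):
        raise ValueError("pattern must be six 0/1 values")
    reg = {0: "ZL", 1: "ZH"}
    col = {0: bgcol, 1: fgcol}
    first = pattern[0]       # the color needed for Px. 0 is loaded first
    other = 1 - first
    px = [reg[p] for p in pattern]
    return [
        f"ldi {reg[first]}, 0x{col[first]:02X}",
        f"out PIXOUT, {px[0]}",   # Px. 0 of tile
        f"ldi {reg[other]}, 0x{col[other]:02X}",
        f"mov r1, {px[3]}",       # Px. 3 of tile
        f"mov r2, {px[4]}",       # Px. 4 of tile
        f"out PIXOUT, {px[1]}",   # Px. 1 of tile
        "ld r0, X+",              # Tile index
        f"mov r3, {px[5]}",       # Px. 5 of tile
        f"out PIXOUT, {px[2]}",   # Px. 2 of tile
        "jmp common",
    ]

def block_words(lines):
    """Program words used: jmp is 2 words, everything else here is 1."""
    return sum(2 if line.startswith("jmp") else 1 for line in lines)

lines = emit_block([1, 0, 0, 1, 1, 0], bgcol=0x00, fgcol=0xFF)
print("\n".join(lines))
print(block_words(lines))  # 11
```

The only real decision the generator makes is which "ldi" goes first, so that the register needed for pixel 0 is already loaded when the first "out" executes.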
Note that all scanline cores read one byte past the end of the VRAM line (since the "ld r0, X+" instruction comes before the line termination check). The proper way to enter is to calculate the first tile's block address as in the "common" code, set up the remaining tile count appropriately (such as to 60), then execute an "ijmp" into the code blocks.
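As a small model of the two points above, the sketch below mimics the "mul / movw / add ZH" address calculation for the variants with a 256-word aligned row base, and counts the VRAM reads for a line. The function names are hypothetical, and the read count assumes the entry code fetches the first tile index with its own "ld r0, X+", which is how I read the entry description.

```python
def block_address(tile_index, block_words, row_base_hi):
    """Word address reached by: mul r0, rN / movw ZL, r0 / add ZH, rBase."""
    product = tile_index * block_words            # r1:r0 after the mul
    zl = product & 0xFF
    zh = ((product >> 8) + row_base_hi) & 0xFF    # add ZH, rBase (8-bit add)
    return (zh << 8) | zl

def vram_reads(tiles):
    """One read at entry for the first tile, then one per executed block."""
    return 1 + tiles

print(hex(block_address(0, 13, 0x10)))    # 0x1000
print(hex(block_address(255, 13, 0x10)))  # 0x1cf3
print(vram_reads(60) - 60)                # 1 byte read past the line
```

In practice this means leaving at least one spare byte after each VRAM line (or accepting that the last read picks up the first byte of whatever follows).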
Palette effects are possible with all variants. The most obvious case is the second form, where the color pairs are fetched from registers, but you can also replace the "ldi" instructions in the other variants with "mov"s to load from a set of colors filled in somewhere during HSync.
Hope this helps in realizing some high-res ideas!