A little bit of Mode 9 insanity

Jubatian
Posts: 1563
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Post by Jubatian »

Looking at Yllawwally's WIP screenshot in the UCC topic, I started to think a bit, assuming it was Mode 9 at 4 cycles per pixel.

In the mode's description, the code block for a tile row takes 21 words (42 bytes, so 336 bytes for a tile). I was wondering whether it was possible to trim this down considerably for an attribute-mode-like result (which Yllawwally uses in his roguelike: only 2 colors per tile). This seems very much possible!

Code:

common:
	out   PIXOUT,  ZL
	movw  ZL,      r0
	add   ZH,      r6      ; Block base for the row (at a 256 word boundary)
	dec   r2               ; Count of remaining tiles
	out   PIXOUT,  r3
	breq  commone
	ijmp
commone:
	nop
	out   PIXOUT,  r4

code_blocks:
	out   PIXOUT,  r4
	ldi   r18,     bgcol
	ldi   r19,     fgcol
	mov   ZL,      r18/r19 ; Px. 3 of tile
	out   PIXOUT,  r18/r19 ; Px. 0 of tile
	ld    r0,      X+      ; Tile index
	mov   r3,      r18/r19 ; Px. 4 of tile
	out   PIXOUT,  r18/r19 ; Px. 1 of tile
	mul   r0,      r5      ; Code block size: 13 words
	mov   r4,      r18/r19 ; Px. 5 of tile
	out   PIXOUT,  r18/r19 ; Px. 2 of tile
	jmp   common

Hope this is somewhat self-explanatory. The main thing is that a code block in this one takes 13 words (26 bytes, so 208 bytes for a tile). However, the blocks of each tile row have to start at a 256 word boundary, which gives big steps between certain tile counts. Here are some: 19 or fewer tiles: 4 KB; 39 or fewer tiles: 8 KB; 59 or fewer tiles: 12 KB; 78 or fewer tiles: 16 KB; 98 or fewer tiles: 20 KB; 118 or fewer tiles: 24 KB. Anyway, this is considerably smaller than the "stock" Mode 9.
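
As a rough cross-check of the banding figures, here is a small host-side Python sketch (not Uzebox code). It assumes 8 tile rows per tile and that each row's array of code blocks is padded up to the next 256 word boundary, which is my reading of the "add ZH, r6" step; the function names are made up for illustration.

```python
ROWS_PER_TILE = 8      # assumed: one array of code blocks per tile row
BOUNDARY_WORDS = 256   # each row's array starts on a 256 word boundary
BYTES_PER_WORD = 2     # AVR flash words are 16 bits wide


def rom_bytes(tiles, block_words):
    """ROM used by all row arrays when each one is padded to the boundary."""
    words = tiles * block_words
    padded = -(-words // BOUNDARY_WORDS) * BOUNDARY_WORDS  # round up
    return ROWS_PER_TILE * padded * BYTES_PER_WORD


def max_tiles(budget_bytes, block_words):
    """Largest tile count whose padded row arrays fit in budget_bytes."""
    words_per_row = budget_bytes // (ROWS_PER_TILE * BYTES_PER_WORD)
    words_per_row -= words_per_row % BOUNDARY_WORDS  # only whole pages count
    return words_per_row // block_words


if __name__ == "__main__":
    for kb in (4, 8, 12, 16, 20, 24):
        print(f"{kb} KB: up to {max_tiles(kb * 1024, 13)} tiles")
```

One more tile than a band allows pushes every row array into another 256 word page, so the cost jumps by 4 KB at once (8 rows times 512 bytes).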

An even smaller alternative is the following concept:

Code:

common:
	out   PIXOUT,  r1
	breq  commone
	mul   r0,      r4      ; Code block size: 11 words
	out   PIXOUT,  r2
	movw  ZL,      r0
	add   ZL,      r6      ; Block base for the row, low
	adc   ZH,      r7      ; Block base for the row, high
	out   PIXOUT,  r3
	ijmp
commone:
	nop
	out   PIXOUT,  r2
	lpm   r0,      Z
	out   PIXOUT,  r3

code_blocks:
	movw  ZL,      cpair   ; A reg. pair supplying the color set
	out   PIXOUT,  ZL/ZH   ; Px. 0 of tile
	mov   r1,      ZL/ZH   ; Px. 3 of tile
	mov   r2,      ZL/ZH   ; Px. 4 of tile
	mov   r3,      ZL/ZH   ; Px. 5 of tile
	out   PIXOUT,  ZL/ZH   ; Px. 1 of tile
	ld    r0,      X+      ; Tile index
	dec   r5               ; Count of remaining tiles
	out   PIXOUT,  ZL/ZH   ; Px. 2 of tile
	jmp   common

This does the trick by limiting the choice of color pairs for the line: in this form there are at most 10 color pairs to select from (since register pairs r8:r9, r10:r11, r12:r13, r14:r15, r16:r17, r18:r19, r20:r21, r22:r23, r24:r25 and YL:YH are free here). If you abuse the stack to load the tile index using "pop", 11 pairs become available (with XL:XH). Of course this type of code may also be used if you decide on a fixed background color; in that case the "movw ZL, cpair" instruction should be replaced with an appropriate "ldi" (and the persistent bg. color kept in a different register which is not overwritten). This takes 11 words for a code block (22 bytes, so 176 bytes for a tile), and it is not constrained to 256 word boundaries. With this, you could have almost twice as many code tiles in the same ROM space as with the normal 4cy / pixel Mode 9.

The freely colorable alternative can also be brought down to 11 words for a code block if you are willing to write a somewhat more elaborate code block generator for it:

Code:

common:
	out   PIXOUT,  r1
	mul   r0,      r4      ; Code block size: 11 words
	movw  ZL,      r0
	out   PIXOUT,  r2
	dec   r5               ; Count of remaining tiles
	breq  commone
	add   ZH,      r7      ; Block base for the row (at a 256 word boundary)
	out   PIXOUT,  r3
	ijmp
commone:
	out   PIXOUT,  r3

code_blocks:
	ldi   ZL,      bgcol   ; Or "ldi ZH, fgcol" depending on first pixel
	out   PIXOUT,  ZL/ZH   ; Px. 0 of tile
	ldi   ZH,      fgcol   ; Or "ldi ZL, bgcol" depending on first pixel
	mov   r1,      ZL/ZH   ; Px. 3 of tile
	mov   r2,      ZL/ZH   ; Px. 4 of tile
	out   PIXOUT,  ZL/ZH   ; Px. 1 of tile
	ld    r0,      X+      ; Tile index
	mov   r3,      ZL/ZH   ; Px. 5 of tile
	out   PIXOUT,  ZL/ZH   ; Px. 2 of tile
	jmp   common

The trick is the order in which the background and foreground colors are loaded: the one needed for the first pixel of the tile is loaded first. The same "banding" in ROM consumption applies due to the 256 word boundaries, but here more tiles fit in a band: 23: 4 KB, 46: 8 KB, 69: 12 KB, 93: 16 KB, 116: 20 KB, 139: 24 KB.
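
To illustrate what the "more elaborate" generator has to do, here is a hypothetical host-side Python sketch that emits one such 11 word code block as assembly text. The function name and the 0/1 pixel encoding are my assumptions; the instruction order simply follows the listing above, with ZL carrying the background and ZH the foreground color.

```python
def emit_row_block(bits, bgcol, fgcol):
    """Return the assembly lines of one code block for a 6 pixel tile row.

    bits:  six values, 0 = background, 1 = foreground, pixel 0 first
    bgcol, fgcol: color constants as they should appear in the source
    """
    if len(bits) != 6 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected six 0/1 pixel values")
    reg = ("ZL", "ZH")  # ZL carries bgcol, ZH carries fgcol
    load = (f"ldi   ZL,      {bgcol}", f"ldi   ZH,      {fgcol}")
    px = [reg[b] for b in bits]
    return [
        load[bits[0]],               # load the color pixel 0 needs first
        f"out   PIXOUT,  {px[0]}",   # Px. 0 of tile
        load[1 - bits[0]],           # then load the other color
        f"mov   r1,      {px[3]}",   # Px. 3 of tile
        f"mov   r2,      {px[4]}",   # Px. 4 of tile
        f"out   PIXOUT,  {px[1]}",   # Px. 1 of tile
        "ld    r0,      X+",         # Tile index
        f"mov   r3,      {px[5]}",   # Px. 5 of tile
        f"out   PIXOUT,  {px[2]}",   # Px. 2 of tile
        "jmp   common",
    ]


if __name__ == "__main__":
    print("\n".join(emit_row_block([1, 0, 0, 1, 1, 0], "bgcol", "fgcol")))
```

A real generator would additionally have to place each row's blocks at a 256 word boundary and keep every block exactly 11 words long.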

Note that all of the scanline cores above read one byte past the end of the VRAM line (since the "ld r0, X+" instruction comes before the line termination check). The proper entry is to calculate the address for the first tile as in the "common" code, set up the count of remaining tiles appropriately (such as to 60), then execute an "ijmp" into the code blocks.

Palette effects are possible with all variants. The most obvious case is the second form, where the color pairs are fetched from registers, but you can also replace the "ldi" instructions in the others with "mov"s to load from a set of color registers filled in somewhere in HSync.

Hope this helps in realizing some high-res ideas! :)

CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: A little bit of Mode 9 insanity

Post by CunningFellow »

I'm not in a good headspace to think about this kind of thing at the moment.

However I see you are doing a DEC/BREQ to end the tiles.

You could save a few clocks per tile by doing the interrupt ended scanline thing. That might help you get down a few more words.

Re: A little bit of Mode 9 insanity

Post by Jubatian »

Yes, I thought about that, but aimed to avoid the complexity coming with it, and the glitch of having the last pixel column 7 cycles wide (almost two pixels in this mode). The counter approach also enables mixing this mode with my kernel boost hack. Using timer termination would give 2 additional cycles to work with, but I don't think it is possible to exploit them. You need 7 or 8 instructions to get the job done (one or two to load colors, six to set or buffer pixels), which requires using 2 full gaps between pixels, while after the last pixel out you must have the "jmp", forcing you to have 3 pixels within the code block, which is a very tight constraint. I couldn't even ram it down to 10 words, no matter what I tried.

I however found a 10 word alternative:

Code:

common:
   dec   r5               ; Count of remaining tiles
   out   PIXOUT,  r1
   breq  commone
   mul   r0,      r4      ; Code block size: 10 words
   out   PIXOUT,  r2
   movw  ZL,      r0
   add   ZL,      r6      ; Block base for the row, low
   adc   ZH,      r7      ; Block base for the row, high
   out   PIXOUT,  r3
   ijmp
commone:
   nop
   out   PIXOUT,  r2
   lpm   r0,      Z
   out   PIXOUT,  r3

code_blocks:
   movw  r18,     cpair   ; A reg. pair supplying the color set
   out   PIXOUT,  r18/r19 ; Px. 0 of tile
   mov   r1,      r18/r19 ; Px. 3 of tile
   mov   r2,      r18/r19 ; Px. 4 of tile
   mov   r3,      r18/r19 ; Px. 5 of tile
   out   PIXOUT,  r18/r19 ; Px. 1 of tile
   ld    r0,      X+      ; Tile index
   movw  ZL,      r8      ; Address of common in r9:r8
   out   PIXOUT,  r18/r19 ; Px. 2 of tile
   ijmp

This is a modification of the color pair variant, using one register pair (r9:r8) loaded with the address of "common", and another pair (r19:r18) for the colors, since ZL:ZH gets overwritten. This gives you a bit more than two tiles for one compared to normal Mode 9 (160 bytes per tile instead of 336), but of course you would have only 8 (or 9 with the stack abuse) color pairs to choose from. Or you could use the variant described above having a fixed background color.

Re: A little bit of Mode 9 insanity

Post by Jubatian »

And here it comes, an 8 word variant!

Code:

common:
   out   PIXOUT,  r0
   breq  commone
   ld    r0,      X+      ; Tile index
   out   PIXOUT,  r1
   mul   r0,      r4      ; Code block size: 8 words
   movw  ZL,      r0
   out   PIXOUT,  r2
   add   ZL,      r6      ; Block base for the row, low
   adc   ZH,      r7      ; Block base for the row, high
   dec   r5               ; Count of remaining tiles (Z flag preserved until next entry)
   out   PIXOUT,  r3
   ijmp
commone:
   nop
   out   PIXOUT,  r1
   lpm   r0,      Z
   out   PIXOUT,  r2
   lpm   r0,      Z
   out   PIXOUT,  r3

code_blocks:
   mov   r0,      cfg/cbg ; Px. 2 of tile
   out   PIXOUT,  cfg/cbg ; Px. 0 of tile
   mov   r1,      cfg/cbg ; Px. 3 of tile
   mov   r2,      cfg/cbg ; Px. 4 of tile
   mov   r3,      cfg/cbg ; Px. 5 of tile
   out   PIXOUT,  cfg/cbg ; Px. 1 of tile
   jmp   common

I just discovered that the color pair variant doesn't actually need to load the color pair. Just load a set of registers with colors in HSync (there are 20 registers free in this) and use those in any combination you like for the tiles (as "cfg" and "cbg"), realizing essentially the same output as the proposed color pair solution. 8 words for a code block, 128 bytes for a tile. It is nearly impossible to go any lower than this, and this would likely be suitable for Yllawwally's game. Of course this solution also doesn't constrain you to 2 colors per tile row: you can use any register loaded with a color for any of the pixels.

Well, "nearly impossible". There is one thing which can get one word down: using "rjmp" instead of "jmp" (padding with a "nop" in the common block). This would appear to limit the count of possible tiles to about 73 (due to the 8 Kbyte range of the relative jump), but you can get around this by replicating the common block for excess tiles, so all jumps can have one in range. With this you get 112 bytes for a tile, which means you have 3 tiles in the space of one with the "stock" Mode 9.
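
The numbers in the last two paragraphs can be re-derived with a few lines of host-side Python. This is only a sketch of the arithmetic: it assumes 8 tile rows per tile, and it assumes the "about 73" figure comes from dividing the 8 Kbyte span of "rjmp" (2K words in either direction) by the size of one tile.

```python
ROWS_PER_TILE = 8        # assumed: 8 tile rows per tile
BYTES_PER_WORD = 2       # AVR flash words are 16 bits wide
RJMP_SPAN_BYTES = 8192   # the 8 Kbyte range quoted for the relative jump


def tile_bytes(block_words):
    """ROM bytes per tile for a given code block size in words."""
    return block_words * BYTES_PER_WORD * ROWS_PER_TILE


STOCK = tile_bytes(21)       # "stock" Mode 9 code block
WITH_JMP = tile_bytes(8)     # 8 word variant ending in "jmp"
WITH_RJMP = tile_bytes(7)    # 7 word variant ending in "rjmp"

# Tiles whose blocks fit within reach of a single copy of the common block.
TILES_PER_COMMON = RJMP_SPAN_BYTES // WITH_RJMP

if __name__ == "__main__":
    print(STOCK, WITH_JMP, WITH_RJMP)   # bytes per tile
    print(TILES_PER_COMMON)             # tiles per common block copy
    print(STOCK // WITH_RJMP)           # 7 word tiles per stock tile
```

Replicating the common block costs only a handful of words per copy, which is small next to the word saved in every tile row.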

Re: A little bit of Mode 9 insanity

Post by Jubatian »

Another massive pile of insanity would be completing this concept:

Code:

common:
   dec   r5               ; Count of remaining tiles
   out   PIXOUT,  r0
   breq  commone
   ld    r0,      X+      ; Tile index
   out   PIXOUT,  r1
   mul   r0,      r4      ; Code block size: 2 words
   movw  ZL,      r0
   out   PIXOUT,  r2
   add   ZH,      r7      ; Block base for the row (at a 256 word boundary)
   ijmp
commone:
   nop
   out   PIXOUT,  r1
   lpm   r0,      Z
   out   PIXOUT,  r2
   lpm   r0,      Z
   out   PIXOUT,  r3

code_block_selectors:
   out   PIXOUT,  r3
   rjmp  code_block_xxx

code_blocks:
   mov   r0,      cfg/cbg ; Px. 2 of tile
   out   PIXOUT,  cfg/cbg ; Px. 0 of tile
   mov   r1,      cfg/cbg ; Px. 3 of tile
   mov   r2,      cfg/cbg ; Px. 4 of tile
   mov   r3,      cfg/cbg ; Px. 5 of tile
   out   PIXOUT,  cfg/cbg ; Px. 1 of tile
   rjmp  common

This concept allows reusing tile rows, potentially achieving even smaller sizes in normal usage. However, the "rjmp" instructions (especially the one in the code_block_selectors) make it quite nontrivial to exploit. The most likely use would be through a resource compiler which generates a callable scanline function at a fixed address, which could then be merged into the output binary (optionally through the C compiler, if it is supplied as a big C array placed at a fixed address by the use of a section). This solution would allow the resource compiler to compress the input properly, probably halving the size again for "normal" usage (a monochrome character set would likely be quite compressible).

Anyway, I say this is rather for the really desperate, as it needs a lot of work to devise a good compressor.
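
To get a feeling for how much row sharing could save, a resource compiler might start with an estimate like the following host-side Python sketch. It is hypothetical: it counts 2 words per selector and 7 words per unique code block as in the listing above, and it deliberately ignores the 256 word alignment of the selectors and the limited reach of "rjmp", which a real tool would have to handle.

```python
SELECTOR_WORDS = 2   # "out" + "rjmp" per tile row
BLOCK_WORDS = 7      # four "mov", two "out", one "rjmp"
ROWS_PER_TILE = 8    # assumed: 8 tile rows per tile


def estimate_words(tiles):
    """Estimate flash words for a tile set with shared rows.

    tiles: list of tiles; each tile is a list of 8 rows; each row is a
           tuple of 6 register names, one per pixel.
    Returns (total_words, unique_row_count).
    """
    for tile in tiles:
        if len(tile) != ROWS_PER_TILE:
            raise ValueError("every tile needs 8 rows")
    unique_rows = {tuple(row) for tile in tiles for row in tile}
    selectors = len(tiles) * ROWS_PER_TILE * SELECTOR_WORDS
    blocks = len(unique_rows) * BLOCK_WORDS
    return selectors + blocks, len(unique_rows)


if __name__ == "__main__":
    blank_row = ("r8",) * 6
    edge_row = ("r9",) * 6
    side_row = ("r9", "r8", "r8", "r8", "r8", "r9")
    blank = [blank_row] * 8
    box = [edge_row] + [side_row] * 6 + [edge_row]
    print(estimate_words([blank, box]))
```

For this toy set of two tiles the estimate is 53 words, against 128 words for two tiles of the 8 word variant; how close a real character set gets to that depends entirely on how many rows repeat.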
yllawwally
Posts: 73
Joined: Tue Mar 05, 2013 7:29 pm

Re: A little bit of Mode 9 insanity

Post by yllawwally »

I am using mode 9, but with 60 columns, not 80. I just don't use more than 2 colors for most of the game. I haven't had time to figure out how to set up a palette properly in Gimp, so I just left the characters with random colors. Currently my game is using 96 tiles, taking literally half of my ROM space at 32256 bytes. I would go to dithered images to save space, however they take the same amount of ROM space. The 73 tile limitation would not work for me. I will probably end up using a couple more tiles before I finish. I'm at 94% ROM usage. I would reduce the colors, but at the moment it provides no advantage.

Being able to set up some RAM tiles would come in handy. I was looking into adding that to a custom mode 9, but haven't had the chance. The other thing I was planning to work on was adapting the MIDI player to play from SD instead.

The only way I could go down to 73 tiles would be if tiles could be double mapped. Then I would want 64 tiles, so that tile 0 would be the same as tile 63, 127, and 255. This helps internally to reduce the amount of RAM I need. Or some other method to make one tile be referenced by several different numbers. I use the screen as RAM for the game: there is not enough RAM to hold all the map info, so I only store the level details on the screen, using duplicate tiles to accomplish that. It uses some more ROM, but it's better than the amount of RAM that would be needed.

Re: A little bit of Mode 9 insanity

Post by Jubatian »

There is no 73 tile limitation anywhere; I just mentioned it above as an apparent limit of the 7 word / tile row variant, which is easy to overcome to give you up to 256 tiles (at 28.5 Kbytes). If you prefer, in the 7 word / tile row variant there is a "nop" which might be used to mask off the highest bit (or more bits) of the tile index, giving you only 128 tiles (at 14 Kbytes), but also a free bit to use in the VRAM. (A RAM-economical alternative, if you prefer to have more than 128 tiles, is to keep bitmaps as additional VRAM, which take 210 bytes for each extra bit per tile; this is simply user side programming.)

Normal RAM tiles are not possible since only one 2 cycle instruction fits within the pixel gaps. 1bpp RAM tiles do not fit either, since the color decision also counts as a 2 cycle instruction (an "sbrs" skip paired with a "mov"). They could possibly fit only as code-like tiles as described in the high resolution attribute mode topic. It is possible to make room for the branch-off (by getting the condition using a "muls" instead of "mul", giving a 128 ROM tile + 128 RAM tile split), but that also takes a ridiculous amount of ROM space, and I don't think it would fit in 6 pixels / tile either.

Re: A little bit of Mode 9 insanity

Post by yllawwally »

The way I use tiles has 3 tiles for each thing on the screen, except monsters: a fully lit version, a dim version, and a black version. Defining extra black tiles, I think, only costs ~400 bytes of ROM. Altogether an extra 6k of ROM space. Another way would be if a whole set of tile indices could all use the same tile. For example, every tile over 191 would be black. Then I could save the 6k of ROM space. Although I'm currently very low on remaining ROM, I think getting the MIDI files to sit on the SD would give me plenty of space.

The way I was considering doing the RAM tiles was that they would appear at a specific location, like a window in the middle of the screen. I figured that wouldn't be very tricky. I haven't looked at the engine very closely. They don't have to be the same resolution as the rest of the screen. They aren't a replacement for the regular tiles.

Re: A little bit of Mode 9 insanity

Post by Jubatian »

I checked a few things.

First, the current Mode 9 and GConvert's respective code. I think the assembly on the wiki should be correct, that is, GConvert generates this code for the tiles. This means that every tile you have would consume 336 ROM bytes, and it can't do any tile merging (so every tile index you need for producing a black tile would take 336 additional ROM bytes). I couldn't come up with a variant which would support tile merging; there is simply no room for that, except if you went for the tile compression insanity (which would require a quite complex generator). Still, with the 7 word per tile row variant, populating the full 256 tile set would take a bit less than your current tile ROM consumption.

If you have 210 bytes of RAM to spare, I would suggest going for 128 tiles (14K ROM using the 7 word / tile row variant) to get one bit in the VRAM free. Then you add a 60x28 bitmap (that's 210 bytes) in which you store another bit for each tile, so you end up with 2 additional information bits for each tile. This would serve your current goals as far as I understand.
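
The bookkeeping for that extra bit plane is simple. Below is a host-side Python model of it, purely as a sketch: on the Uzebox this would be a few lines of C over a 210 byte array, and the class name, the row-major bit layout and the helper names are my assumptions.

```python
COLS, ROWS = 60, 28   # screen size in tiles, as quoted above


class TileBitPlane:
    """One extra bit per tile position, packed 8 positions per byte."""

    def __init__(self):
        self.data = bytearray((COLS * ROWS + 7) // 8)   # 210 bytes

    def _locate(self, col, row):
        if not (0 <= col < COLS and 0 <= row < ROWS):
            raise IndexError("tile position out of range")
        index = row * COLS + col            # assumed row-major layout
        return index >> 3, 1 << (index & 7)

    def get(self, col, row):
        byte, mask = self._locate(col, row)
        return 1 if self.data[byte] & mask else 0

    def set(self, col, row, value):
        byte, mask = self._locate(col, row)
        if value:
            self.data[byte] |= mask
        else:
            self.data[byte] &= ~mask & 0xFF


def split_vram_byte(value):
    """Split a VRAM byte into (7 bit tile index, free top bit)."""
    return value & 0x7F, value >> 7


if __name__ == "__main__":
    plane = TileBitPlane()
    plane.set(59, 27, 1)
    print(len(plane.data), plane.get(59, 27), split_vram_byte(0x85))
```

Together with the top bit of each VRAM byte, this gives the two information bits per tile position mentioned above.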

A horizontally fixed, low res (6 cycles / pixel to align nicely with the 3 cycles / pixel Mode 9 tiles) RAM tile area should be possible, depending on available RAM for it and your ideas.

What do you think about the 7 word per tile row alternative in general? Could you handle it if I only created some code for it which produces the code tiles through Gimp's header export feature? (That's about the simplest thing to do: generating an assembly source which you can compile together with your program, allowing the linker to resolve jumps and locate it. I just don't want to actually compile, which the GConvert approach would require in this case, as the 7 word per tile row variant requires some support code and a lot of jumping around.)

Of course it won't be done very soon, but if you find it useful for realizing the game, I would likely make it.

Re: A little bit of Mode 9 insanity

Post by yllawwally »

Creating each black tile separately is not a problem. That would only have been a concern if the number of tiles available had to be cut, so tile merging isn't really very important. I don't have 200 extra bytes I could dedicate to something. The monsters and items are entries in a pointer list, so that RAM requirement isn't shown at compile time. However, I do have a 28x20 (560 byte) array that could be used when I wanted to use RAM tiles. I could certainly use the 7 word variant. I have a little wiggle room on ROM space at the moment, but I still have a number of features that I haven't had time to implement yet. I was just thinking about attribute mode. Depending on how it worked, I could make the foreground and background the same color, which would get rid of the need for black tiles, and simplify some of the code I use to move the monsters around.