Palettization - color; colour; couleur; färg; farbe...

Jubatian
Posts: 1569
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Palettization - color; colour; couleur; färg; farbe...

Post by Jubatian »

Some thoughts then on these modes...

Mode 13: There is no real reason to keep the 28-tile-wide variant around; maybe its source was a little cleaner, but that's all (I tried to compensate with an easier-to-read layout and commenting). 30 tiles fit after organizing the code better and removing the Timer1 reset in every scanline (which is completely unnecessary and took lots of cycles for nothing).

Sprite blitter: M13, 15 colors is the slowest, then comes M13, 8 colors. Both of these are C code. M74's blitter would normally be about as fast as M13, 8 colors, but it is hand-optimized assembly using techniques which no C code can access (such as the half carry flag). I haven't analyzed the code yet, but even the slowest M74 blitter (using arbitrary color remapping) might be 2 times faster than M13, 8 colors (it may be possible to port M74's blitter to M13, 8 colors).

M13 potential capabilities: With M13's technique there are sufficient cycles to produce 192 4bpp ROM tiles, but this capability is not exploited. So it has 128 ROM tiles and up to 128 RAM tiles (of course using all 128 would take 4K of RAM, so that is impossible in practice). It should support bank switching (demonstrated in the Mode13ExtendedDemo, check its code, particularly line 181).
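
As a quick check of the 4K figure, here is a minimal sketch, assuming 8x8 pixel tiles stored packed at the given bit depth. The function name is made up for illustration and is not part of the kernel.

```c
#include <stdint.h>

/* Bytes occupied by `count` tiles of 8x8 pixels at `bpp` bits per pixel.
   Hypothetical helper, only meant to show the arithmetic. */
static uint32_t tile_bytes(uint32_t count, uint32_t bpp)
{
    return count * 8u * 8u * bpp / 8u;
}
```

With this, `tile_bytes(128, 4)` comes out at 4096 bytes, which is the 4K mentioned above, while a single 4bpp tile takes 32 bytes.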

M64 potential capabilities: I now have a scanline loop which would be capable of handling 128 x 4bpp ROM tiles + 64 x 1bpp ROM tiles + 64 x 4bpp RAM tiles for a tile row.

M13 has 30 x 8 pixels wide tiles in one row. M64 would have the same (same resolution). M74 has 24 x 8 pixels wide tiles in one row.
D3thAdd3r
Posts: 3293
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: Palettization - color; colour; couleur; färg; farbe...

Post by D3thAdd3r »

Jubatian wrote: M64 would have the same (same resolution)
Since I have not used M74 in a game, let alone started to contemplate M64, maybe I have some things confused. From what I gather from that last post, M64 can have the same resolution as M13 at 30 tiles? I had thought M64 was lower resolution or had some other tradeoff. If it does everything Mode 13 does, and besides the 4bpp tiles it also has the very handy 64 1bpp tiles for font stuff (64 RAM tiles is about all you will get in M13 anyway), then it seems to make M13 obsolete? I suppose even so, M13 having tile banks keeps it useful in specific situations. If all that is true, damn, M64 might even let me finish Dig Dug, where M13 just didn't work out (blitter cycles).
Jubatian
Posts: 1569
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Palettization - color; colour; couleur; färg; farbe...

Post by Jubatian »

The difference between M13 and the M64 concept is that M13 uses "ldd - ldd" for palette lookup, while M64 uses "ld - swap - ld". So an M64 scanline loop for an 8-pixel-wide tile has 4 fewer cycles for doing things beyond calculating pixels: M13 has 12 cycles, M64 would have 8 cycles.
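
The arithmetic behind those figures can be sketched as follows. This assumes 6 CPU clocks per pixel, the usual AVR timings of 2 cycles for `ld`/`ldd` and 1 cycle for `swap`, and that the lookup sequence runs once per pixel pair (one 4bpp byte), which is what makes the 4-cycle difference come out. All names are mine, not from the kernel.

```c
enum {
    CLOCKS_PER_PIXEL = 6,  /* assumption: 6 CPU clocks per pixel */
    PIXELS_PER_TILE  = 8,
    LD_CYCLES        = 2,  /* AVR ld / ldd from SRAM */
    SWAP_CYCLES      = 1   /* AVR swap */
};

/* Total cycles available while one 8-pixel tile row is on screen. */
static int tile_cycle_budget(void)
{
    return CLOCKS_PER_PIXEL * PIXELS_PER_TILE;
}

/* Lookup cost per tile row: one sequence per pixel pair, 4 pairs per tile. */
static int lookup_cycles(int cycles_per_pair)
{
    return cycles_per_pair * (PIXELS_PER_TILE / 2);
}

static int m13_lookup(void) { return lookup_cycles(LD_CYCLES + LD_CYCLES); }
static int m64_lookup(void) { return lookup_cycles(LD_CYCLES + SWAP_CYCLES + LD_CYCLES); }
```

Under these assumptions the budget is 48 cycles per tile, the lookups cost 16 (M13) and 20 (M64) cycles, and the difference of 4 is exactly the gap between the 12 and 8 spare cycles quoted above.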

M13 pushed to the extreme could possibly be terminated normally (not by Timer), or have 192, or maybe even an arbitrary number of ROM tiles. M64's scanline loop needed some insane consideration even to make the original concept (128 ROM tiles + up to 128 RAM tiles) possible, then some more to squeeze in the possibility of 64 1bpp ROM tiles (so now I have a scanline loop which can deal with 128 4bpp ROM tiles + 64 1bpp ROM tiles + 64 4bpp RAM tiles).

So if I did M64, then the current M13 could indeed become obsolete. However an M13 which uses those four cycles to give you more 4bpp ROM tiles or more options in selecting your ROM tiles (such as having 224 ROM + 32 RAM tiles if you don't need many sprites but would like to have more RAM than in Mode 3) could still be useful.

And of course currently you have M13 only.

Vote for M64! :D Anyway, I see there is need for this thing, of course, I am just a little swamped (SDHC, occasionally also progressing with FoaD, and that's just Uzebox).
D3thAdd3r
Posts: 3293
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: Palettization - color; colour; couleur; färg; farbe...

Post by D3thAdd3r »

Jubatian wrote: M13 pushed to the extreme could possibly be terminated normally (not Timer), or have 192, or maybe even arbitrary amount of ROM tiles.
As far as I can see, there is no advantage for M13 in *not* terminating with the timer, except maybe saving a few flash bytes; or do you see a case where not using the timer would be an advantage?

Well, whenever you get M64 done I will definitely use it for some game (Lolo, if it can blit fast enough for ~38 sprite/RAM tiles with a 26-tile-high screen... another game where I have to switch away from M13 due to cycles, with no replacement available). My only hope of doing it is to stick to levels of asm I can actually handle, so basically higher-level things that free up resources in other ways so that M3-likes can be used (SPI VRAM to increase RAM tiles, streaming music to make up for 8bpp).
nicksen782 wrote: 30 tiles wide? Mode 13 with scrolling? I can put a max of 15 (2x2) sprites on screen right now (comfortably 12) with my game in mode 3 (no scrolling).
M13 with scrolling is the only working M13 option right now, and it *does* support 30 tiles wide. If you have, say, 32 RAM tiles right now with M3, then for the same amount of RAM you should be able to have ~(64 RAM tiles) - (256 bytes of palette, worth 8 tiles) = 56 RAM tiles. One extra restriction: the number of RAM tiles must be a multiple of 8, so 8, 16, 24, 32, 40, 48, 56, 64 and 72 are the possible values.
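
That conversion can be written down as a small sketch. It assumes 64-byte (8bpp) RAM tiles in M3, 32-byte (4bpp) RAM tiles in M13, the 256-byte palette and the multiple-of-8 rule quoted above; the function and constant names are hypothetical.

```c
#include <stdint.h>

enum {
    M3_TILE_BYTES  = 64,   /* 8x8 pixels at 8bpp */
    M13_TILE_BYTES = 32,   /* 8x8 pixels at 4bpp */
    PALETTE_BYTES  = 256,  /* palette size quoted above */
    TILE_STEP      = 8     /* RAM tile count must be a multiple of 8 */
};

/* How many M13 RAM tiles fit in the RAM that `m3_ram_tiles` M3 RAM tiles use,
   after paying for the palette and rounding down to a multiple of 8. */
static uint16_t m13_ram_tiles_for(uint16_t m3_ram_tiles)
{
    uint32_t budget = (uint32_t)m3_ram_tiles * M3_TILE_BYTES;
    if (budget < PALETTE_BYTES) {
        return 0;  /* not even room for the palette */
    }
    uint32_t tiles = (budget - PALETTE_BYTES) / M13_TILE_BYTES;
    return (uint16_t)(tiles - tiles % TILE_STEP);
}
```

For 32 M3 RAM tiles this gives 56, matching the figure above; for 30 it would give 48, because 52 is not a multiple of 8.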

There is a real issue right now though, and it might not be possible to improve it enough to hit equilibrium with M3. It seems like you can blit about 28 tiles with M13 15 color in the same amount of time it takes to do 38 tiles with M3. Since 38 tiles is very close to the maximum possible with a short screen and a game on top, those extra RAM tiles are basically worthless for sprites, and I would expect you will actually have *fewer* sprites possible with M13, even though you will have more RAM tiles (there is no time to blit to them). Honestly, I have a hard time believing that an asm blitter or anything else will change that fundamental fact (that palette modes are inherently slower to blit, and non-palette modes run out of RAM almost exactly as they run out of cycles: perfect... if only they didn't use so much space!).
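
One way to see where the extra cost comes from is a sketch of writing a single pixel in each format. This is not the kernel's blitter, just an illustration; the nibble order (even x in the high nibble) and the function names are assumptions.

```c
#include <stdint.h>

/* 8bpp RAM tile (64 bytes): one pixel is one byte, so a plain store. */
static void put_pixel_8bpp(uint8_t *tile, uint8_t x, uint8_t y, uint8_t color)
{
    tile[y * 8 + x] = color;
}

/* 4bpp RAM tile (32 bytes): two pixels share a byte, so every write is a
   read-modify-write that must preserve the neighbouring pixel.
   Assumption: even x lives in the high nibble, odd x in the low nibble. */
static void put_pixel_4bpp(uint8_t *tile, uint8_t x, uint8_t y, uint8_t index)
{
    uint8_t *p = &tile[y * 4 + (x >> 1)];
    if (x & 1) {
        *p = (uint8_t)((*p & 0xF0) | (index & 0x0F));
    } else {
        *p = (uint8_t)((*p & 0x0F) | (uint8_t)(index << 4));
    }
}
```

The 4bpp version needs a load, a mask, a merge and a store where the 8bpp version needs only a store, which is the kind of overhead a hand-written asm blitter can reduce but not remove entirely.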

@Jubatian do you know a relative speed for the M74 blitter versus the M3 one? I don't think it can ever possibly be faster than M3, what do you think?
Jubatian
Posts: 1569
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Palettization - color; colour; couleur; färg; farbe...

Post by Jubatian »

D3thAdd3r wrote (Wed Aug 09, 2017 1:38 am): @Jubatian do you know a relative speed for the M74 blitter versus the M3 one? I don't think it can ever possibly be faster than M3, what do you think?
M74's sprite fraction blitter for a 7px wide segment (an offset of 1px, this is the slowest) takes around 80 cycles in the fastest configuration with a ROM source (no color remapping or masking by foreground). The equivalent part of Mode 3 is here (videoMode3core.s, line 1296), which for 8 pixels actually takes longer! The Y loop and the set-up time are similar for the two. So usually M74 blits should take about the same time as M3 blits even if some extra features are thrown in.

(It would be possible to make M3 blits significantly faster by unrolling the X loop, but as it is now, this is it; M74 in its fastest configuration might even beat it. In FoaD, on a tightly packed screen you could have as many as 40 8x8 sprite blocks or possibly even more, which of course could break down to more than 100 fractions. Of course FoaD gives as much time for blitting as possible by running itself at 30 FPS, with one frame dedicated to graphics and the other to logic. That game uses the slowest blitter configuration available without SPI RAM.)
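
How 40 blocks turn into more than 100 fractions can be sketched like this, assuming an 8x8 tile grid with no scroll offset and that every tile a block overlaps needs its own blit fraction. The function name is hypothetical.

```c
#include <stdint.h>

/* Number of tiles (and so blit fractions) that one 8x8 sprite block
   overlaps when its top-left corner is at pixel position (x, y). */
static uint8_t fractions_for_block(uint16_t x, uint16_t y)
{
    uint8_t cols = (x & 7u) ? 2u : 1u;  /* unaligned in X: spans 2 tile columns */
    uint8_t rows = (y & 7u) ? 2u : 1u;  /* unaligned in Y: spans 2 tile rows */
    return (uint8_t)(cols * rows);
}
```

A block aligned to the grid costs 1 fraction, one that is unaligned on both axes costs 4, so 40 blocks can need anywhere from 40 up to 160 fractions.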

Timer termination for M13: True, there are only slight advantages to not terminating with the timer. For example, you can modify the kernel's timing without affecting the video mode (the kernel boost hacks). M13 also has its last pixel 7 clocks wide instead of 6; barely noticeable, true (it would be more so if someone decided he needed to mix it with a 3 clocks per pixel mode, as done with the SPI RAM Mode 74, which has a 384 pixels wide 1bpp option). More significant is that it is messier to use 4 pointers in the scanline logic (as done in M74, which abuses the stack pointer by turning it into an index for VRAM) with a timer-terminated mode (my M64 scanline loop plan doesn't need this, though).