Video Mode 3
Mode 3 uses a "restore buffer" in addition to a VRAM array of bytes. This buffer has as many slots as there are ramtiles. Each slot is composed of a VRAM address (2 bytes) and a tile index (1 byte). At the beginning of each frame (during VSYNC), sprites are processed. For each 8x8 pixel sprite, we compute which VRAM background tiles are overlapped (a sprite can overlap up to 4 tiles, so 4 tests are done). Two cases can then arise:
1) the VRAM tile index is >= RAM_TILES_COUNT, which indicates the sprite will overlap a regular flash tile. In this case, we check if we still have unallocated ramtiles. If so, the overlapped flash tile's pixels are copied into the newly allocated ramtile. The new ramtile index is set in VRAM (the previous VRAM value and address are preserved in the restore buffer). The sprite is blitted against this ramtile (which contains only the flash tile pixels at this point). If no more ramtiles are available, the sprite blitting against this VRAM location is clipped/ignored.
2) the VRAM tile index is < RAM_TILES_COUNT, which indicates the sprite will overlap a previously allocated ramtile. The sprite is blitted against this ramtile (which contains the background flash tile pixels, possibly already overlaid with pixels from one or more sprites).
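The two cases above can be sketched in C as follows. This is a minimal illustration, not the kernel's actual code: the names restore_slot_t, restore_buf, copy_flash_tile and claim_ramtile are hypothetical, RAM_TILES_COUNT is set to an arbitrary small value, and the flash read is replaced by a stand-in that fills the tile with a constant.

```c
#include <stdint.h>
#include <string.h>

#define RAM_TILES_COUNT 4            /* assumed value for this sketch */
#define VRAM_TILES_H    32
#define VRAM_TILES_V    32
#define TILE_PIXELS     64           /* 8x8 pixels, one byte per pixel */

/* One restore buffer slot: VRAM address (2 bytes) + tile index (1 byte). */
typedef struct {
    uint16_t addr;                   /* VRAM offset that was overwritten */
    uint8_t  tile;                   /* tile index that was stored there */
} restore_slot_t;

uint8_t vram[VRAM_TILES_H * VRAM_TILES_V];
uint8_t ram_tiles[RAM_TILES_COUNT][TILE_PIXELS];
restore_slot_t restore_buf[RAM_TILES_COUNT];
uint8_t free_tile_index;             /* next unallocated ramtile */

/* Hypothetical stand-in for copying a tile's pixels out of flash. */
void copy_flash_tile(uint8_t tile_index, uint8_t *dst) {
    memset(dst, tile_index, TILE_PIXELS);
}

/* Returns the ramtile to blit against for one overlapped VRAM location,
 * or -1 when no ramtile is left and the blit must be clipped. */
int claim_ramtile(uint16_t vram_addr) {
    uint8_t idx = vram[vram_addr];

    if (idx < RAM_TILES_COUNT) {
        return idx;                  /* case 2: already a ramtile */
    }
    if (free_tile_index >= RAM_TILES_COUNT) {
        return -1;                   /* case 1, but out of ramtiles */
    }

    /* case 1: allocate a ramtile and seed it with the flash tile pixels */
    uint8_t rt = free_tile_index++;
    restore_buf[rt].addr = vram_addr;
    restore_buf[rt].tile = idx;      /* remember the previous VRAM value */
    copy_flash_tile(idx, ram_tiles[rt]);
    vram[vram_addr] = rt;            /* VRAM now points at the ramtile */
    return rt;
}
```

A sprite would call claim_ramtile() once per overlapped tile (up to 4 times) and blit only into the ramtiles it gets back.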
When rendering begins, the tile indexes in the restore buffer are updated with the current VRAM values. This is required since the main program may have updated the VRAM between VSYNC and rendering. So for each slot in the restore buffer, we check if the tile index in the matching VRAM location is < RAM_TILES_COUNT. If not, we know there has been an update, so we store the new VRAM tile index in the restore buffer and write the restore buffer's slot index into that VRAM location.
The main rendering loop then goes through the VRAM tile by tile. Two separate "sub loops" handle flash and ram tiles. When using scrolling, there were no cycles left in the flash inner loop to perform X wrapping, so we need to "pre-wrap" the current tile row into a "linear" buffer so that no wrapping is needed during rendering. This unfortunately adds overhead to each scanline and reduces the number of displayable tiles from 30 to 28.
Once frame rendering is finished, the VRAM is restored using the "restore buffer" values.
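The restore buffer handling at the start and end of rendering can be sketched as below. This is only an illustration of the description above: the function names sync_restore_buffer and restore_vram, the used_slots parameter and the value of RAM_TILES_COUNT are assumptions, and the real kernel does this in assembler.

```c
#include <stdint.h>

#define RAM_TILES_COUNT 4            /* assumed value for this sketch */
#define VRAM_SIZE       (32 * 32)

typedef struct {
    uint16_t addr;                   /* VRAM offset covered by this ramtile */
    uint8_t  tile;                   /* background tile to put back later */
} restore_slot_t;

uint8_t vram[VRAM_SIZE];
restore_slot_t restore_buf[RAM_TILES_COUNT];

/* Before rendering: pick up any VRAM change made by the main program.
 * The slot number doubles as the ramtile index. */
void sync_restore_buffer(uint8_t used_slots) {
    for (uint8_t slot = 0; slot < used_slots; slot++) {
        uint16_t addr = restore_buf[slot].addr;
        uint8_t current = vram[addr];
        if (current >= RAM_TILES_COUNT) {
            /* Not a ramtile index: the main program wrote a new tile. */
            restore_buf[slot].tile = current;
            vram[addr] = slot;
        }
    }
}

/* After rendering: give the main program its unaltered VRAM back. */
void restore_vram(uint8_t used_slots) {
    for (uint8_t slot = 0; slot < used_slots; slot++) {
        vram[restore_buf[slot].addr] = restore_buf[slot].tile;
    }
}
```

Calling sync_restore_buffer() before the first scanline and restore_vram() after the last one means the main program only ever sees background tile indexes in VRAM.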
The number of ramtiles is related to the number of sprites you expect to have on screen. The point of ramtiles is to allow the rendering engine to render both tiles and sprites at a higher resolution than the old mode 2, with the added benefit of handling overlapping sprites. Mode 2 has to run at a lower resolution than mode 1 because of the extra time used up by checking for sprites and handling sprite transparency. Ramtiles are a way of allowing some of the rendering tasks to be done during VBL (vertical blanking) rather than while actually outputting scanlines. It works by finding where sprites are on the screen, looking at what tiles would be under those sprites, then compositing the sprites onto the tiles and storing the result in RAM. That way, when it's time to output pixels on a scanline, the renderer can just fetch the pre-rendered images from RAM rather than having to do the sprite compositing on the fly. This saves enough cycles to allow for better resolution, and as a side effect we get nifty things like overlapping sprites and sprite flipping. The tradeoff is that games have less RAM to work with, as well as fewer CPU cycles for running game logic (plus, if I remember correctly, additional limitations on sound channels).
MAX_SPRITES really controls the allocation of memory for the sprites structure and also slightly speeds up the blitting of sprites. RAM_TILES_COUNT controls the allocation of memory for the ramtiles. For example, a 12-sprite "megasprite" will consume up to 20 ramtiles if allowed to move freely.
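The figure of 20 ramtiles can be reproduced with a little arithmetic, assuming the 12 sprites form a contiguous 3x4 block (an assumption; the text does not state the layout). A block that is not aligned to the tile grid spills into one extra tile column and one extra tile row. The function names below are made up for this sketch.

```c
/* Ramtiles needed by a block of w x h 8x8 sprites whose top-left corner
 * is at pixel position (x, y). Each unaligned axis adds one tile. */
unsigned ramtiles_covered(unsigned x, unsigned y, unsigned w, unsigned h) {
    unsigned cols = w + ((x % 8) ? 1u : 0u);
    unsigned rows = h + ((y % 8) ? 1u : 0u);
    return cols * rows;
}

/* Worst case: unaligned on both axes. */
unsigned worst_case_ramtiles(unsigned w, unsigned h) {
    return (w + 1) * (h + 1);
}
```

For a 3x4 megasprite this gives 12 ramtiles when aligned on both axes, 16 when unaligned on X only, and 20 when unaligned on both, so RAM_TILES_COUNT has to be sized for the worst case plus whatever other sprites are on screen.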
Additional Info (WIP)
The rendering flow is as follows:
1) Frame renders
2) At end of frame, the VSYNC flag is set
3) VSYNC routine begins immediately. Note that control will not return to the main program until the end of the VSYNC routine.
4) VideoModeVsync function is executed for the active video mode. For mode 3, this will blit the sprites for the *next* frame.
5) Pre-vsync user callback is executed
6) Joypads are read
7) Music is processed and sound mixed
8) Post-vsync user callback is executed
9) VSYNC routine ends, control is returned to the main program.
All steps from 1 to 9 are executed sequentially in one shot. Typically rendering begins at line 20 and ends at line 244. The VSYNC code then executes right after and extends up to line 262, then continues from line 1 of the next frame. Depending on the sprites to process, this could extend to near line 20, in which case the main program will have just a few lines' worth of CPU time to execute and will slow down. Naturally, going over line 20 will cause a stack overflow and crash the program.
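As a rough illustration of the budget described above, the following sketch counts how many scanlines are left for the main program once the VSYNC routine has run. The line numbers come from the text; the constant and function names are assumptions, and the exact boundaries (inclusive or exclusive) are approximated.

```c
#define TOTAL_LINES  262   /* scanlines per frame */
#define RENDER_START 20    /* first rendered line (typical) */
#define RENDER_END   244   /* rendering ends here (typical) */

/* Scanlines left for the main program when the VSYNC routine takes
 * vsync_lines. A negative result means VSYNC ran past line 20. */
int main_program_lines(int vsync_lines) {
    int free_lines = TOTAL_LINES - (RENDER_END - RENDER_START);
    return free_lines - vsync_lines;
}
```

With these numbers there are about 38 non-rendered lines per frame, so a VSYNC routine that takes 30 lines leaves only about 8 for game logic.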
As for the RestoreBG() part, shame on me for not putting in any comments; I spent a lot of time working out how it works! The whole thing has to do with showing the main program an unaltered VRAM. When VSYNC begins blitting sprites, the VRAM is updated with the indexes of the ram tiles used. This is required to handle overlapping sprites. Now the tricky part: as each ramtile is blitted, its VRAM pointer is saved in the ram_tiles_restore buffer along with the *previous* value at that location (a regular "rom tile"), and free_tile_index is incremented. At the end of sprite blitting, the VRAM is restored to the initial "rom tile" values. The main program then executes and may modify the VRAM. When rendering begins, the ram_tiles_restore buffer is iterated (note that the iterator variable corresponds to the ramtile number). For each entry, the current value is read through the VRAM pointer and stored in ram_tiles_restore. This is required because the main program may have altered the VRAM since process_sprite ran at VSYNC, and otherwise wrong or old background tiles could be restored. Finally, the VRAM pointer is used to write the ram tile index.
(To be rewritten, this is just a cut-and-paste from the forums)
- New memory arrangement removes the need to update the "linear buffer" for each scanline and frees enough cycles to use the inline mixer. This in turn frees >512 bytes of RAM (more ramtiles!) and lots of cycles used for mixing during VSYNC. Here's the new way to access a specific X/Y location in VRAM: Ptr=((y>>3)*256)+(x*8)+(y&7). It looks complex, but it takes barely more cycles in assembler than the usual (y*VRAM_WIDTH)+x. This is relevant only for direct access to the vram array. No changes are needed if you were using SetTile().
- Due to the new arrangement, VRAM_TILES_V can only be a multiple of 8, so basically it has to be 24 or 32. Defaults to 32.
- The overlay VRAM is located within the VRAM_TILES_V allocated region. For example, to draw a map in an overlay region 4 tiles high, use: DrawMap2(0,VRAM_TILES_V-4,my_overlay_map);
- Set the overlay dynamically with Screen.overlayHeight=4
- Y scroll wrap height is controlled by Screen.scrollHeight=28 (28 is the default). However, Y scrolling is currently broken, so it should not be used.
- Add -DSCROLLING=1 -DSOUND_MIXER=1 to the kernel compile switches.
- To force the alignment of the VRAM at 0x0100 and not waste space, this MUST be added to the linker flags section:
# Adjust the 0x800500 value to be 0x800100+VRAM_SIZE.
LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800500
- Finally, check the SuperMarioDemo project for more details.
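The VRAM addressing formula and the linker address from the notes above can be checked with a small C sketch. The helper names vram_offset, vram_size and data_section_start are hypothetical, and the 32-tile row width is an assumption inferred from the 256-byte stride in the formula (8 rows of 32 tiles).

```c
#include <stdint.h>

#define VRAM_TILES_H 32                  /* assumed: implied by the 256 stride */

/* Offset of tile (x, y) in the vram array under the new arrangement:
 * rows are grouped in blocks of 8, and within a block the 8 rows of a
 * given column are stored next to each other. */
uint16_t vram_offset(uint8_t x, uint8_t y) {
    return (uint16_t)(((y >> 3) * 256) + (x * 8) + (y & 7));
}

/* VRAM size in bytes for a given VRAM_TILES_V (24 or 32). */
uint16_t vram_size(uint8_t vram_tiles_v) {
    return (uint16_t)(VRAM_TILES_H * vram_tiles_v);
}

/* Value to use for --section-start,.data: 0x800100 + VRAM_SIZE. */
uint32_t data_section_start(uint16_t size) {
    return 0x800100UL + size;
}
```

With VRAM_TILES_V=32 the VRAM is 1024 bytes and .data starts at 0x800500, matching the default flags above; with VRAM_TILES_V=24 the same arithmetic gives 0x800400.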