Video Mode 3

From Uzebox Wiki

Introduction

Mode 3 with scrolling - Super Mario demo
Mode 3 without scrolling - Donkey Kong

Mode 3 is the most popular video mode on the Uzebox. It is functionally the closest to the video modes of familiar retro consoles such as the NES. It supports a 240x224 or 256x224 resolution, sprites and full-screen scrolling. With it you can implement games like Pac-Man or platformers like Super Mario Bros.

As with most things on the Uzebox, this video mode can be customized with compiler switches in order to balance RAM, flash and CPU consumption. Mode 3 comes in two main configurations: with scrolling and without scrolling. Internally, the implementations are different, and RAM requirements and trade-offs vary slightly depending on which configuration you choose.

Both configurations support sprites and an "overlay", a special section you can enable at the top of the screen to display scores, etc. To determine how many sprites can be used at once, you will have to understand the concept of "ramtiles". This is discussed in a later section.

Mode 3 Without Scrolling

Mode 3 with no scrolling has the following specifications:

  • Up to 240x224 or 256x224 resolution
  • Up to 32x28 tiles VRAM using 8-bit indices
  • Up to 32x28 tiles visible window
  • 8x8 pixels sprites with X/Y-flipping function
  • Variable-height overlay

To use Mode 3 with no scrolling, configure your Makefile with these parameters:

 KERNEL_OPTIONS += -DVIDEO_MODE=3 -DSCROLLING=0 -DSOUND_MIXER=1 -DMAX_SPRITES=xx -DRAM_TILES_COUNT=yy

Where:

  • xx is the maximum number of simultaneous sprites you intend to use (the greater the number, the more RAM is allocated)
  • yy is the number of ramtiles to allocate (each ramtile consumes 64 bytes of RAM; ramtiles are discussed later)
  • SOUND_MIXER=1 enables the inline audio mixer and lowers RAM usage by the audio engine

This will configure Mode 3 for a 240x224 pixel resolution (30x28 tiles). If you want more (256x224 pixels, 32 tiles wide), you can add the following:

 KERNEL_OPTIONS += -DRESOLUTION_EXT=1 -DVRAM_TILES_H=32
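
To get a feel for how these switches trade RAM, the figures quoted on this page can simply be added up. The sketch below is a host-side estimate only. `mode3_ram_estimate` is a hypothetical helper, not a kernel function, and it counts just the VRAM (one byte per tile index), the sprite table (four bytes per sprite), the ramtiles (64 bytes each) and the restore buffer (three bytes per ramtile). Everything else the kernel allocates is ignored.

```c
#include <stdint.h>

/* Hypothetical helper: rough RAM cost, in bytes, of the Mode 3 structures
 * described on this page. It ignores the audio mixer and every other
 * kernel buffer, so treat the result as a lower bound. */
uint16_t mode3_ram_estimate(uint8_t vram_tiles_h, uint8_t vram_tiles_v,
                            uint8_t max_sprites, uint8_t ram_tiles_count)
{
    uint16_t vram     = (uint16_t)vram_tiles_h * vram_tiles_v; /* 1 byte per tile index    */
    uint16_t sprites  = (uint16_t)max_sprites * 4;             /* x, y, tileIndex, flags   */
    uint16_t ramtiles = (uint16_t)ram_tiles_count * 64;        /* 64 bytes per ramtile     */
    uint16_t restore  = (uint16_t)ram_tiles_count * 3;         /* VRAM address + old index */
    return (uint16_t)(vram + sprites + ramtiles + restore);
}
```

For a 30x28 VRAM with 8 sprites and 20 ramtiles this comes to 2212 bytes, most of it ramtiles, which is why RAM_TILES_COUNT is usually the first value to tune.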

Mode 3 With Scrolling

Mode 3 with scrolling has the following specifications:

  • Up to 240x224 or 256x224 resolution
  • 32x32, 32x24 or 32x16 tiles VRAM using 8-bit indices
  • Up to 32x28 tiles visible window
  • 8x8 pixels sprites with X/Y-flipping function
  • Variable-height overlay


IMPORTANT: To use Mode 3 with scrolling, you must adjust your Makefile with the following parameters. Failing to do step 2 will result in video memory corruption.

1) Set the following compile switches:

 KERNEL_OPTIONS += -DVIDEO_MODE=3 -DSCROLLING=1 -DSOUND_MIXER=1 -DMAX_SPRITES=xx -DRAM_TILES_COUNT=yy

Where:

  • xx is the maximum number of simultaneous sprites you intend to use (the greater the number, the more RAM is allocated)
  • yy is the number of ramtiles to allocate (each ramtile consumes 64 bytes of RAM; ramtiles are discussed later)
  • SOUND_MIXER=1 enables the inline audio mixer and lowers RAM usage by the audio engine


2) Add the following line to the linker parameters. This forces alignment of the VRAM (the .noinit section) on a 256-byte boundary without wasting any RAM. 0x800100 is the first memory location right after the register file and the I/O ports. It is also very important to set the start of the .data section to .noinit+(VRAM_TILES_H*VRAM_TILES_V). Since VRAM_TILES_V must be a multiple of 8, use the following values:

For VRAM_TILES_V==32

 LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800500  

For VRAM_TILES_V==24 (0x800100 + 32*24 = 0x800400)

 LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800400
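
The .data start address is plain arithmetic on the formula given in step 2. As a minimal sketch, the hypothetical helper below evaluates .noinit+(VRAM_TILES_H*VRAM_TILES_V) on a host machine; it assumes VRAM_TILES_H is 32 (as it is when SCROLLING=1) and that nothing other than the VRAM is placed in .noinit.

```c
#include <stdint.h>

#define NOINIT_START 0x800100UL /* first address after the register file and I/O ports */
#define VRAM_TILES_H 32         /* fixed at 32 when SCROLLING=1 */

/* Hypothetical helper: address to pass to --section-start,.data
 * for a given VRAM_TILES_V (a multiple of 8). */
uint32_t data_section_start(uint8_t vram_tiles_v)
{
    return (uint32_t)(NOINIT_START + (uint32_t)VRAM_TILES_H * vram_tiles_v);
}
```

With VRAM_TILES_V set to 32 this returns 0x800500, the value used in the first LDFLAGS line above.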

By default with the above parameters, Mode 3 will have 224x224 (28x28 tiles) resolution. You can add the following parameter to get 240x224 (30x28 tiles):

 KERNEL_OPTIONS += -DSCREEN_TILES_H=30

If you still want more, you may use RESOLUTION_EXT to allow a width of 31 or 32 tiles. Note that with a width of 32 tiles you will no longer be able to scroll in the X dimension (there is no spare column in the 32-tile-wide VRAM to fill in scrolled tiles):

 KERNEL_OPTIONS += -DRESOLUTION_EXT=1 -DSCREEN_TILES_H=32


Example Makefile with parameters set:

...
## Kernel settings
KERNEL_DIR = ../../../../kernel
KERNEL_OPTIONS  = -DVIDEO_MODE=3 -DINTRO_LOGO=0 -DSCROLLING=1  -DSOUND_MIXER=1 -DMAX_SPRITES=8 -DRAM_TILES_COUNT=20

## Options common to compile, link and assembly rules
COMMON = -mmcu=$(MCU)

## Compile options common for all C compilation units.
CFLAGS = $(COMMON)
CFLAGS += -Wall -gdwarf-2 -std=gnu99 -DF_CPU=28636360UL -Os -fsigned-char -ffunction-sections 
CFLAGS += -MD -MP -MT $(*F).o -MF dep/$(@F).d 
CFLAGS += $(KERNEL_OPTIONS)

## Assembly specific flags
ASMFLAGS = $(COMMON)
ASMFLAGS += $(CFLAGS)
ASMFLAGS += -x assembler-with-cpp -Wa,-gdwarf2

## Linker flags
LDFLAGS = $(COMMON)
LDFLAGS += -Wl,-Map=$(GAME).map 
LDFLAGS += -Wl,-gc-sections 
LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800500
...

Compile-time Switches

These configuration switches are supported in the Makefile's KERNEL_OPTIONS.

  • SCROLLING: Controls if scrolling will be used. 0=no scrolling, 1=use scrolling. Default=0.
  • TILE_HEIGHT: Specify the height of tiles and sprites. Default=8, cannot be changed.
  • VRAM_TILES_H: Horizontal size of the VRAM. Fixed at 32 when SCROLLING=1. Configurable when SCROLLING=0, in which case SCREEN_TILES_H follows this value.
  • VRAM_TILES_V: Vertical size of VRAM. Must be 16, 24 or 32 when SCROLLING=1. Default=32 when SCROLLING=1. Default=28 when SCROLLING=0.
  • SCREEN_TILES_H: Horizontal number of tiles displayed. Only use with SCROLLING=1. Default=28.
  • RESOLUTION_EXT: Increases the mode's physical resolution, allowing up to 32 horizontal tiles (256 pixels) to fit instead of 30. Default=0 (30 tiles).
  • OVERLAY_LINES: Allocate RAM for the specified number of lines. RAM usage will be OVERLAY_LINES*VRAM_TILES_H. Default=0. Note that this option is only valid when SCROLLING=0. When using scrolling with an overlay, set VRAM_TILES_V=32 instead to allocate the required VRAM; this is due to the memory organization.
  • FRAME_LINES: The number of video lines to render. Rendering fewer video lines leaves the main program with more CPU cycles. Defaults to SCREEN_TILES_V*TILE_HEIGHT.
  • FIRST_RENDER_LINE: When changing FRAME_LINES, the picture will not be centered in the screen. Use this parameter to adjust Y centering of the picture. Default=20.
  • TRANSLUCENT_COLOR: The color index to use as translucent pixel for sprites. Default=0xfe.
  • RAM_TILES_COUNT: The number of ramtiles to allocate. Default=0.
  • SPRITES_VSYNC_PROCESS: Default=1. If turned off (0), sprite blitting won't be handled by the VSync interrupt. You have to call ProcessSprites() and RestoreBackground() explicitly.
  • SPRITES_AUTO_PROCESS: Default=1. If turned off (0), the default sprite engine is removed, and you have to blit sprite tiles manually using BlitSprite().
  • RTLIST_ENABLE: Default=1. If turned off (0), the ramtile restore list is removed. Use only if you know what you are doing!
  • RT_ALIGNED: Default=0. If enabled (1), ramtiles are aligned to a boundary, which improves performance but is more complex to set up (Makefile). Use only if you know what you are doing!

Sprites

Sprites are tile-sized objects (usually 8x8 pixels) that can be moved freely over the background. They can be flipped on the X or Y axis dynamically and can use tile data from one of four shared banks. Sprites are usually assembled into larger groups, also called composite or "mega" sprites, such as Jumpman in the following picture, which is a 2x2 mega sprite:

Dk sprites zoom.png

Initializing the sprites engine

Before using sprites, you must define the tileset(s) they will use. For this, you will use the SetSpritesTileBank() function:

#include "data/belmont.inc"
...
SetSpritesTileBank(0,belmont_tiles);
...
SetSpriteVisibility(true);

In this example we set bank #0 to point to the belmont_tiles tileset we have imported into our project, then activate the sprite engine.

See the Gconvert tool to learn how to create tileset include files.

Sprites structure

At run time, sprites are moved and controlled via an array of SpriteStruct structures.

 struct SpriteStruct
 {
   u8 x;
   u8 y;
   u8 tileIndex;
   u8 flags;		
 };	
 extern struct SpriteStruct sprites[];

Where:

  • x: the horizontal position of the sprite relative to the left side of the screen
  • y: the vertical position of the sprite relative to the top of the screen
  • tileIndex: the tile number to use from the sprite's tile bank
  • flags: options that control the sprite's behavior at run time. Flags can be updated at any time.
    • SPRITE_FLIP_X: The sprite will be drawn flipped on its horizontal axis
    • SPRITE_FLIP_Y: The sprite will be drawn flipped on its vertical axis
    • SPRITE_BANK0, SPRITE_BANK1, SPRITE_BANK2, SPRITE_BANK3: Specify which of the four tile banks to use to draw the sprite. The SPRITE_BANKn flags are mutually exclusive.
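
Because the flip flags combine freely while the bank flags are mutually exclusive, it can help to build the flags byte in one place. The following is a minimal host-side sketch. The numeric values given to the flag names are stand-ins chosen so the example compiles on its own (in a real project the kernel headers define them), and `make_sprite_flags` and `toggle_flip_x` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-in values (assumptions). In a real project these names come
 * from the kernel headers and must not be redefined. */
#define SPRITE_FLIP_X 0x01
#define SPRITE_FLIP_Y 0x02
#define SPRITE_BANK0  (0 << 6)
#define SPRITE_BANK1  (1 << 6)
#define SPRITE_BANK2  (2 << 6)
#define SPRITE_BANK3  (3 << 6)

/* Hypothetical helper: combine exactly one bank flag with the flip choices. */
uint8_t make_sprite_flags(uint8_t bank_flag, bool flip_x, bool flip_y)
{
    uint8_t flags = bank_flag;
    if (flip_x) flags |= SPRITE_FLIP_X;
    if (flip_y) flags |= SPRITE_FLIP_Y;
    return flags;
}

/* Hypothetical helper: toggle the horizontal flip and leave the
 * bank selection and vertical flip untouched. */
uint8_t toggle_flip_x(uint8_t flags)
{
    return (uint8_t)(flags ^ SPRITE_FLIP_X);
}
```

A character that turns around would then only need `sprites[n].flags = toggle_flip_x(sprites[n].flags);` rather than rebuilding the whole byte.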

Using sprites

You can draw and move sprites using the API or the manual way. In the "manual" way, you must move every 8x8 sprite one by one. This is usually done by game engines or when you need more control than what is offered by the API functions MapSprite() and MoveSprite().

Assuming the engine has been set up and the tile banks are set, this will display sprite #0 at the center of the screen. It will use tile #6 from bank #0 (the default bank) and will be drawn flipped on both the X and Y axes.

sprites[0].x=120;
sprites[0].y=112;
sprites[0].tileIndex=6;
sprites[0].flags=SPRITE_FLIP_X | SPRITE_FLIP_Y;

As mentioned earlier, to turn all sprites on or off at once, use the SetSpriteVisibility() function. To turn off a single sprite, its X position must be set off screen. This intent can be declared explicitly by using the following define:

sprites[0].x=OFF_SCREEN;
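
The same idea extends to hiding a whole group of consecutive sprites, for example the pieces of a composite sprite. This is a minimal sketch that compiles on a host machine: the `sprites[]` table, MAX_SPRITES and the OFF_SCREEN value are stand-ins for the kernel's own definitions (the number 240 is an assumption), and `hide_sprites` is a hypothetical helper.

```c
#include <stdint.h>

typedef uint8_t u8;

struct SpriteStruct { u8 x; u8 y; u8 tileIndex; u8 flags; };

/* Stand-ins for the kernel's definitions so the sketch is self-contained.
 * The OFF_SCREEN value and the table size are assumptions. */
#define MAX_SPRITES 8
#define OFF_SCREEN  240
struct SpriteStruct sprites[MAX_SPRITES];

/* Hypothetical helper: hide `count` consecutive sprites starting at
 * index `first`, never writing past the end of the table. */
void hide_sprites(u8 first, u8 count)
{
    for (u8 i = 0; i < count && (first + i) < MAX_SPRITES; i++) {
        sprites[first + i].x = OFF_SCREEN;
    }
}
```

Only the X position is touched, so the tile index and flags survive and the sprites can be shown again just by giving them a valid position.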

When you begin with Mode 3, it is easier to use the API functions, since they can move mega sprites as a unit.

First, MapSprite2() is used to assign tiles to multiple consecutive sprite indexes and to set their flags.

//prototype is void MapSprite2(unsigned char startSprite,const char *map,u8 spriteFlags)
MapSprite2(3,mario_walk1_map,SPRITE_FLIP_X|SPRITE_BANK0);

Where:

  • 3: the first sprite index to start mapping
  • mario_walk1_map: a u8[] array that defines tile indexes in the tileset defined by bank #0. Mega sprites are mapped row by row from their top-left corner to their bottom-right corner.
  • SPRITE_FLIP_X|SPRITE_BANK0: flips the mega sprite horizontally and uses bank #0 for all of its sprites.

Second, use MoveSprite() to actually move the mega sprite:

//prototype is: void MoveSprite(unsigned char startSprite,unsigned char x,unsigned char y,unsigned char width,unsigned char height)
MoveSprite(3,100,120,2,2);

Where:

  • 3: the first sprite index of our mega sprite
  • 100,120: x,y position of the mega sprite
  • 2,2: width and height (in tiles) of our mega sprite
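
To make the row-by-row layout concrete, here is a host-side re-creation of the placement a call like the one above is described as performing. It is a sketch under stated assumptions, not the kernel's code: `move_mega_sprite` is a hypothetical name, the `sprites[]` table and MAX_SPRITES are stand-ins, and flipping is not handled.

```c
#include <stdint.h>

typedef uint8_t u8;

struct SpriteStruct { u8 x; u8 y; u8 tileIndex; u8 flags; };

#define MAX_SPRITES 8 /* stand-in for the -DMAX_SPRITES value */
#define TILE_SIZE   8 /* each sprite is 8x8 pixels */
struct SpriteStruct sprites[MAX_SPRITES];

/* Hypothetical re-creation: position width*height consecutive sprites,
 * row by row, starting from the top-left corner at (x, y). */
void move_mega_sprite(u8 startSprite, u8 x, u8 y, u8 width, u8 height)
{
    for (u8 row = 0; row < height; row++) {
        for (u8 col = 0; col < width; col++) {
            u8 i = (u8)(startSprite + row * width + col);
            if (i >= MAX_SPRITES) return;          /* stay inside the table */
            sprites[i].x = (u8)(x + col * TILE_SIZE);
            sprites[i].y = (u8)(y + row * TILE_SIZE);
        }
    }
}
```

With the arguments (3,100,120,2,2), sprites 3 and 4 form the top row at x=100 and x=108, and sprites 5 and 6 form the bottom row 8 pixels lower.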

TODO

  • overlay

Implementation

Mode 3 uses a "restore buffer" in addition to a byte VRAM array. This buffer has as many slots as there are ramtiles. Each slot is composed of a VRAM address (2 bytes) and a tile index (1 byte). At the beginning of each frame (during VSYNC), sprites are processed. For each 8x8 pixel sprite, we compute which VRAM background tile(s) are overlapped (up to 4 tiles can be overlapped, so 4 tests are done). Two cases can then arise:

1) the VRAM tile index is >= RAM_TILES_COUNT, which indicates the sprite will overlap a regular flash tile. In this case, we check if we still have unallocated ramtiles. If so, the overlapped flash tile's pixels are copied into the newly allocated ramtile. The new ramtile index is set in VRAM (the previous VRAM value and address are preserved in the restore buffer). The sprite is blitted against this ramtile (which contains only the flash tile pixels at this point). If no more ramtiles are available, the sprite blitting against this VRAM location is clipped/ignored.

2) the VRAM tile index is < RAM_TILES_COUNT, which indicates the sprite will overlap a previously allocated ramtile. The sprite is blitted against this ramtile (which contains the background flash tile pixels, perhaps already overlaid with pixels from one or more sprites).
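
The "up to 4 tiles" figure follows from the sprite's pixel position: a sprite that is not aligned to the 8-pixel grid on an axis spills into the next tile on that axis. The sketch below is a simplified host-side illustration. `overlapped_tiles` is a hypothetical helper, it ignores scrolling and screen-edge clipping, and unlike the kernel (described above as always performing four tests) it only lists the tiles that are actually touched.

```c
#include <stdint.h>

/* Hypothetical helper: list the background tiles touched by an 8x8 sprite
 * whose top-left pixel is at (x, y). Writes up to 4 (column, row) pairs
 * into `out` and returns how many were written. */
uint8_t overlapped_tiles(uint8_t x, uint8_t y, uint8_t out[4][2])
{
    uint8_t tx = x >> 3, ty = y >> 3; /* tile holding the top-left pixel */
    uint8_t dx = (x & 7) ? 1 : 0;     /* spills into the next column?    */
    uint8_t dy = (y & 7) ? 1 : 0;     /* spills into the next row?       */
    uint8_t n = 0;

    for (uint8_t r = 0; r <= dy; r++) {
        for (uint8_t c = 0; c <= dx; c++) {
            out[n][0] = (uint8_t)(tx + c);
            out[n][1] = (uint8_t)(ty + r);
            n++;
        }
    }
    return n;
}
```

A grid-aligned sprite therefore costs one ramtile, a sprite misaligned on one axis costs two, and a sprite misaligned on both axes costs four.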

When rendering begins, the tile indexes in the restore buffer are updated with the current VRAM values. This is required since the main program may have updated the VRAM between VSYNC and rendering. So for each slot in the restore buffer, we check if the tile index in the matching VRAM location is < RAM_TILES_COUNT. If not, we know there has been an update, so we store the new VRAM tile index in the restore buffer and write the restore buffer's slot index into that VRAM location.

The main rendering loop then goes through the VRAM tile by tile. Two separate "sub loops" handle flash and ram tiles. When using scrolling, there were no cycles left in the flash inner loop to perform X wrapping, so we need to "pre-wrap" the current tile row into a "linear" buffer so that no wrapping is needed during rendering. This unfortunately adds overhead to each scanline and reduces the number of displayable tiles from 30 to 28.

Once frame rendering is finished, the VRAM is restored using the "restore buffer" values.
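
The allocate-then-restore cycle can be modelled in a few lines. What follows is a host-side sketch, not the kernel's implementation: the names `RestoreSlot`, `ramtile_for`, `restore_vram` and `NO_RAMTILE` are hypothetical, the pixel copy and the blit are left out, and the render-time refresh of the restore buffer is skipped. It only shows how the two cases and the final restore interact.

```c
#include <stdint.h>

#define RAM_TILES_COUNT 4          /* stand-in for the -DRAM_TILES_COUNT value */
#define VRAM_SIZE       (32 * 32)
#define NO_RAMTILE      0xFF       /* hypothetical "pool exhausted" marker */

/* One restore-buffer slot: a VRAM address and the tile index it held. */
struct RestoreSlot { uint16_t addr; uint8_t tile; };

uint8_t vram[VRAM_SIZE];
struct RestoreSlot restore_buf[RAM_TILES_COUNT];
uint8_t free_tile_index = 0;

/* Returns the ramtile a sprite should be blitted into at VRAM offset
 * `addr`, or NO_RAMTILE when the blit has to be skipped. */
uint8_t ramtile_for(uint16_t addr)
{
    uint8_t t = vram[addr];
    if (t < RAM_TILES_COUNT) return t;             /* case 2: already a ramtile */
    if (free_tile_index >= RAM_TILES_COUNT) return NO_RAMTILE;
    restore_buf[free_tile_index].addr = addr;      /* case 1: remember old value */
    restore_buf[free_tile_index].tile = t;
    /* the real blitter would copy the flash tile's pixels here */
    vram[addr] = free_tile_index;
    return free_tile_index++;
}

/* After the frame has been rendered, put the original indexes back. */
void restore_vram(void)
{
    while (free_tile_index > 0) {
        free_tile_index--;
        vram[restore_buf[free_tile_index].addr] = restore_buf[free_tile_index].tile;
    }
}
```

Two sprites that touch the same background tile share one ramtile in this model, which is what makes overlapping sprites composite correctly.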

The following image shows the various compile-time switches and dynamic variables in relation to the final display. Note: The values displayed are arbitrary and are the ones used by the Castlevania game.

Mode3 vram.png

Mode 3 with scrolling

When using scrolling, the video scanline renderer needs to "wrap" horizontally every 32 tiles. An earlier way to do this used some pointer-wrapping code within the inner loops. Although it needed only 3 assembler instructions per tile, there were not enough cycles to do it. The trick was to "linearize" the memory used to render the scanlines. This, however, required many cycles in the HSYNC period and wasted some RAM. A newer way to wrap was found; it uses the "natural" rollover of the AVR's 8-bit registers that happens when incrementing past 255 (the value rolls back to zero). Since we don't have a VRAM that is 256 tiles wide but 32, the trick is to interleave the tile indexes every 8 memory addresses. The physical memory arrangement is represented by this table (VRAM starts at 0x100):

Mode3memory.png

That way, the tile pointer in memory is advanced by 8 for each consecutive tile on the screen. Using such a scheme, the tile indexing formula is t(x,y)=vram[((y>>3)*256)+(8*x)+(y&7)].
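
The formula and the rollover trick can be checked on a host machine. In this minimal sketch `vram_offset` and `next_tile_low_byte` are hypothetical helper names, and the second function only models the 8-bit addition that the renderer is described as relying on.

```c
#include <stdint.h>

/* Offset into vram[] of the tile at column x (0..31), row y (0..31),
 * using the interleaved layout ((y>>3)*256) + (8*x) + (y&7). */
uint16_t vram_offset(uint8_t x, uint8_t y)
{
    return (uint16_t)(((y >> 3) * 256) + (x * 8) + (y & 7));
}

/* Stepping one tile to the right only changes the low byte of the
 * address. The 8-bit addition wraps from column 31 back to column 0
 * of the same row with no extra instructions. */
uint8_t next_tile_low_byte(uint8_t low)
{
    return (uint8_t)(low + 8);
}
```

For example, column 31 of row 5 sits at offset 253; adding 8 in an 8-bit register gives 5, which is the offset of column 0 of the same row.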

This approach frees up the cycles used during HSYNC. This in turn allows the use of the inline audio mixer, which frees up 524 bytes of RAM and lots of cycles during VSYNC. These freed cycles can be used by the main program or by the kernel to process more ramtiles (hence more sprites).

Ramtiles

The number of ramtiles is related to the number of sprites you expect to have on screen. The point of ramtiles is to allow the rendering engine to render both tiles and sprites at a higher resolution than the old mode 2, with the added benefit of handling overlapping sprites. Mode 2 has to run at a lower resolution than mode 1 because of the extra time used up by checking for sprites and handling sprite transparency. Ramtiles are a way of allowing some of the rendering tasks to be done during VBL rather than while actually outputting scanlines. It works by finding where sprites are on the screen, looking at what tiles would be under those sprites, then compositing the sprite onto the tiles and storing the result in RAM. That way when it's time to output pixels on a scanline, it can just fetch the pre-rendered images from RAM rather than having to do the sprite compositing on the fly. This saves enough cycles to allow for better resolution, and as a side effect we get nifty things like overlapping sprites and sprite flipping. The tradeoff is that games have less RAM to work with, as well as fewer CPU cycles for running game logic. (plus additional limitations on sound channels, if I remember correctly.)[1]

MAX_SPRITES really controls the allocation of memory for the sprites structure and also slightly speeds up the blitting of sprites. RAM_TILES_COUNT controls the allocation of memory for the ramtiles. For example, a 12-sprite mega sprite will consume up to 20 ramtiles if allowed to move freely.[2]
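
The figure of 20 ramtiles is consistent with a 3x4 (or 4x3) arrangement: a mega sprite that is aligned to neither tile axis spills into one extra column and one extra row of background tiles. As a minimal sketch under that assumption, with `worst_case_ramtiles` being a hypothetical helper:

```c
#include <stdint.h>

/* Worst-case number of background tiles (and therefore ramtiles) touched
 * by a mega sprite of width x height 8x8 sprites when it is misaligned
 * on both axes. Other sprites on screen are not accounted for. */
uint16_t worst_case_ramtiles(uint8_t width, uint8_t height)
{
    return (uint16_t)((width + 1) * (height + 1));
}
```

A single 8x8 sprite needs up to 4 ramtiles and a 2x2 mega sprite up to 9, so RAM_TILES_COUNT has to be budgeted from the shapes that can be on screen together rather than from MAX_SPRITES alone.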

User Ramtiles

It is possible to dynamically reserve a number of ramtiles for user use from the pool whose size is defined by RAM_TILES_COUNT. These tiles will then not be used by the kernel and the sprite blitter. They can be used as a mini frame buffer, and the user has total control of the pixels. The user can write a user ramtile index to VRAM. It will behave like a flash tile: if a sprite comes to overlay one, the kernel will copy the user tile to a free ramtile before blitting. The available functions are:

/*Set the number of ramtiles to allocate for the user program. User ramtiles are
* not used by the kernel and the sprite blitter. User ramtiles are allocated from
* the beginning of the ramtiles table.*/
extern void SetUserRamTilesCount(u8 count);
/*Get a pointer to the specified ramtile index. User ramtiles are allocated from
* the beginning of the ramtiles table.*/
extern u8* GetUserRamTile(u8 index);
/*Copy srcTile from the active tileset in flash to destTile ramtile*/
extern void CopyFlashTile(u8 srcTile,u8 destTile);
/*Copy srcTile ramtile to destTile ramtile*/
extern void CopyRamTile(u8 srcTile,u8 destTile);
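
Treating a user ramtile as a tiny frame buffer comes down to indexing its 64 bytes. The sketch below runs on a host machine and makes one assumption that is not stated on this page: that a ramtile is stored row by row, one byte per pixel. `ramtile_set_pixel` is a hypothetical helper; on the Uzebox the `tile` pointer would be the one returned by GetUserRamTile().

```c
#include <stdint.h>

typedef uint8_t u8;

#define TILE_WIDTH  8
#define TILE_HEIGHT 8

/* Hypothetical helper: plot one pixel in a 64-byte ramtile.
 * Assumes row-major storage with one byte per pixel. Coordinates
 * outside the tile are ignored. */
void ramtile_set_pixel(u8 *tile, u8 px, u8 py, u8 color)
{
    if (px < TILE_WIDTH && py < TILE_HEIGHT) {
        tile[py * TILE_WIDTH + px] = color;
    }
}
```

On the console, a typical sequence would be to reserve a tile with SetUserRamTilesCount(), obtain its pointer with GetUserRamTile(), draw into it, and then write its index to VRAM so that it appears on screen.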

Additional Info (WIP)

The rendering flow is as follows:

1) Frame renders
2) At end of frame VSYNC flag is set
3) VSYNC routine begins immediately. Note that control will not return to the main program until the end of the VSYNC routine.
4) VideoModeVsync function executed for the active video mode. For mode 3, this will blit the sprites for the *next* frame.
5) Pre-vsync user callback is executed
6) Joypads are read
7) Music is processed and sound mixed
8) Post-vsync user callback is executed
9) VSYNC routine ends, control returned to main program.

Renderflow.png

All steps from 1 to 9 are executed sequentially in one shot. Typically rendering will begin at line 20 and end at line 244. The VSYNC code then executes right after and will extend up to line 262, then continue back to line 1 of the next frame. Depending on the sprites to process, this could extend to near line 20, in which case the main program will have just a few lines' worth of CPU time to execute and will slow down. Naturally, going over line 20 will cause a stack overflow and crash the program.
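
The line budget in this description is simple arithmetic: with 262 lines per frame and 224 of them spent rendering, 38 lines remain for the VSYNC routine and the main program together. As a rough sketch (both helpers are hypothetical, and the number of lines the VSYNC routine takes is something you would have to measure for your own game):

```c
#include <stdint.h>

#define TOTAL_LINES 262 /* scanlines per frame, per the description above */

/* Scanlines per frame that are not spent rendering. They are shared
 * by the VSYNC routine and the main program. */
uint16_t non_render_lines(uint16_t frame_lines)
{
    return (frame_lines < TOTAL_LINES) ? (uint16_t)(TOTAL_LINES - frame_lines) : 0;
}

/* Lines left to the main program once the VSYNC routine has used
 * `vsync_lines` of them. Zero means the frame budget is blown. */
uint16_t main_program_lines(uint16_t frame_lines, uint16_t vsync_lines)
{
    uint16_t spare = non_render_lines(frame_lines);
    return (vsync_lines < spare) ? (uint16_t)(spare - vsync_lines) : 0;
}
```

Lowering FRAME_LINES from 224 to 208, for instance, raises the shared budget from 38 to 54 lines, which is the effect the FRAME_LINES switch is meant to provide.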

As for the RestoreBG() part, geez, shame on me for not putting any comments... I spent a lot of time understanding how the hell that works! The whole thing has to do with showing the main program an unaltered VRAM. When VSYNC begins blitting sprites, the VRAM is updated with the ramtile indexes used. This is required to handle overlapping sprites. Now the tricky part: as each ramtile is blitted, its pointer in VRAM is saved in the ram_tiles_restore[] buffer along with the *previous* value (that is, a regular "rom tile"), and free_tile_index is incremented. At the end of sprite blitting, the VRAM is restored to the initial "rom tile" values. Now the main program executes and may modify the VRAM. Then rendering begins and the ram_tiles_restore[] buffer is iterated (note that the iterator variable actually corresponds to the ramtile number). For each entry, the current value is read through the VRAM pointer and stored in ram_tiles_restore[]. This is required because, since process_sprite ran at VSYNC, the main program may have altered the VRAM, and wrong or old background tiles could otherwise be restored. Finally, the VRAM pointer is used to write the ramtile index.

Using Scrolling

Please see the forum thread for the Uzebox Mode 3 with Scrolling Guide

(To be rewritten, this is just a cut-and-paste from the forums)

  • New memory arrangement removes the need to update the "linear buffer" for each scanline and frees enough cycles to use the inline mixer. This in turn frees >512 bytes of RAM (more ramtiles!) and lots of cycles used for mixing during VSYNC. Here's the new way to access a specific X/Y location in VRAM: Ptr=((y>>3)*256)+(x*8)+(y&7). It looks complex but it takes barely more cycles in assembler than the usual (y*VRAM_WIDTH)+x. This is relevant only for direct access to the vram[] array. No changes if you were using SetTile().
  • Due to the new arrangement, VRAM_TILES_V can only be a multiple of 8, so basically it has to be 24 or 32. Defaults to 32.
  • The overlay VRAM is located within the VRAM_TILES_V allocated region. E.g., to draw a map in an overlay region 4 tiles high, use: DrawMap2(0,VRAM_TILES_V-4,my_overlay_map);
  • Set the overlay dynamically with Screen.overlayHeight=4
  • Y scroll wrap height is controlled by Screen.scrollHeight=28 (28 is the default).
  • Add -DSCROLLING=1 -DSOUND_MIXER=1 to the kernel compile switches.
  • To force the alignment of the VRAM at 0x0100 and not waste space, this MUST be added to the linker flags section:
# Adjust the 0x800500 value to be 0x800100+VRAM_SIZE.  
LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800500   
  • Finally, check the SuperMarioDemo project for more details.