Video Mode 3: Difference between revisions

From Uzebox Wiki

Revision as of 23:34, 15 February 2013

Introduction

Mode 3 with scrolling - Super Mario demo
Mode 3 without scrolling - Donkey Kong

Mode 3 is the most popular video mode on the Uzebox. Functionally, it is the closest to the video modes of familiar retro consoles such as the NES. It supports a 240x224 resolution, sprites and full-screen scrolling. With it you can implement games like Pac-Man or platformers like Super Mario Bros.

As with most things on the Uzebox, this video mode can be customized with compiler switches to balance RAM, flash and CPU consumption. Mode 3 comes in two main configurations: with scrolling and without scrolling. Internally, the two implementations are very different, and RAM requirements and trade-offs will vary slightly based on which configuration you choose.

Both configurations support sprites and an "overlay", a special section you can enable at the top of the screen to display scores, etc. To determine how many sprites can be used at once, you will have to understand the concept of "ramtiles", which is discussed in a later section.

Mode 3 Without Scrolling

Mode 3 with no scrolling has the following specifications:

  • 240x224 resolution
  • 30x28 tiles VRAM using 8-bit indices
  • 30x28 visible window
  • 8x8 pixels sprites with X-flipping function
  • Variable-height overlay

To use Mode 3 with no scrolling, configure your Makefile with these parameters:

 KERNEL_OPTIONS += -DVIDEO_MODE=3 -DSCROLLING=0 -DMAX_SPRITES=xx -DRAM_TILES_COUNT=yy

Where:

  • xx is the maximum number of simultaneous sprites you intend to use (the greater the number, the more RAM is allocated)
  • yy is the number of ramtiles to allocate (each ramtile consumes 64 bytes of RAM; ramtiles are discussed later)

Mode 3 With Scrolling

Mode 3 with scrolling has the following specifications:

  • 240x224 resolution
  • 32x32, 32x28 or 32x24 tiles VRAM using 8-bit indices
  • 28x28 visible window
  • 8x8 pixels sprites with X-flipping function
  • Variable-height overlay


IMPORTANT: To use Mode 3 with scrolling, you must adjust your Makefile with the following parameters. Skipping step 2 will result in video memory corruption.

1) Set the following compile switches:

 KERNEL_OPTIONS += -DVIDEO_MODE=3 -DSCROLLING=1 -DMAX_SPRITES=xx -DRAM_TILES_COUNT=yy

Where:

  • xx is the maximum number of simultaneous sprites you intend to use (the greater the number, the more RAM is allocated)
  • yy is the number of ramtiles to allocate (each ramtile consumes 64 bytes of RAM; ramtiles are discussed later)


2) Add the following line to the linker parameters. This forces alignment of the VRAM (the .noinit section) on an 8-bit boundary, that is, at an address whose low byte is zero, without wasting any RAM. 0x800100 is the first memory location right after the register file and the I/O ports. It is also very important to set the start of the .data section to .noinit+(VRAM_TILES_H*VRAM_TILES_V). Since VRAM_TILES_V must be a multiple of 4, use the following values:

For VRAM_TILES_V==32

 LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800500  

For VRAM_TILES_V==24

 LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800400
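
The .data start address follows directly from the rule stated in step 2: the .noinit start plus one byte per VRAM tile. The following is a minimal sketch of that arithmetic; the helper name data_section_start is hypothetical and is not part of the Uzebox kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the .noinit section, as given in the linker flags above. */
#define NOINIT_START 0x800100UL

/* Hypothetical helper (not a kernel function): returns the address to pass
 * to --section-start,.data= for a given VRAM size expressed in tiles. */
uint32_t data_section_start(uint32_t vram_tiles_h, uint32_t vram_tiles_v)
{
    assert(vram_tiles_h > 0 && vram_tiles_v > 0);
    /* VRAM uses 8-bit tile indices, so it occupies one byte per tile. */
    return NOINIT_START + vram_tiles_h * vram_tiles_v;
}
```

With VRAM_TILES_H=32, this gives 0x800500 for VRAM_TILES_V=32 (32*32 = 0x400 bytes) and 0x800400 for VRAM_TILES_V=24 (32*24 = 0x300 bytes).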


Example makefile with parameters set:

...
## Kernel settings
KERNEL_DIR = ../../../../kernel
KERNEL_OPTIONS  = -DVIDEO_MODE=3 -DINTRO_LOGO=0 -DSCROLLING=1 -DMAX_SPRITES=8 -DRAM_TILES_COUNT=20

## Options common to compile, link and assembly rules
COMMON = -mmcu=$(MCU)

## Compile options common for all C compilation units.
CFLAGS = $(COMMON)
CFLAGS += -Wall -gdwarf-2 -std=gnu99 -DF_CPU=28636360UL -Os -fsigned-char -ffunction-sections 
CFLAGS += -MD -MP -MT $(*F).o -MF dep/$(@F).d 
CFLAGS += $(KERNEL_OPTIONS)

## Assembly specific flags
ASMFLAGS = $(COMMON)
ASMFLAGS += $(CFLAGS)
ASMFLAGS += -x assembler-with-cpp -Wa,-gdwarf2

## Linker flags
LDFLAGS = $(COMMON)
LDFLAGS += -Wl,-Map=$(GAME).map 
LDFLAGS += -Wl,-gc-sections 
LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800500
...

Compile-time Switches

These configuration switches are supported in the Makefile's KERNEL_OPTIONS.

  • SCROLLING: Controls if scrolling will be used. 0=no scrolling, 1=use scrolling. Default=0.
  • TILE_HEIGHT: Specifies the height of tiles and sprites. Default=8.
  • VRAM_TILES_H: Horizontal size of the VRAM. Fixed at 32 when SCROLLING=1. Configurable when SCROLLING=0, in which case SCREEN_TILES_H follows this value.
  • VRAM_TILES_V: Vertical size of the VRAM. Must be 16, 24 or 32 when SCROLLING=1. Default=32 when SCROLLING=1. Default=28 when SCROLLING=0.
  • OVERLAY_LINES: Allocates RAM for the specified number of overlay lines. RAM usage will be OVERLAY_LINES*VRAM_TILES_H. Default=0. Note that this option is only valid when SCROLLING=0. When using scrolling with an overlay, set VRAM_TILES_V=32 to allocate the required VRAM; this is due to the memory organization in that configuration.
  • FRAME_LINES: The number of video lines to render. Rendering fewer video lines leaves the main program with more CPU cycles. Defaults to SCREEN_TILES_V*TILE_HEIGHT.
  • FIRST_RENDER_LINE: When FRAME_LINES is changed, the picture will no longer be centered on the screen. Use this parameter to adjust the Y centering of the picture. Default=20.
  • TRANSLUCENT_COLOR: The color index to use as translucent pixel for sprites. Default=0xfe.
  • RAM_TILES_COUNT: The number of ramtiles to allocate. Default=0.
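
Several of these switches translate directly into RAM usage. The sketch below adds up only the figures this page states: one byte per VRAM tile, OVERLAY_LINES*VRAM_TILES_H bytes for the overlay, 64 bytes per ramtile, and the 3-byte restore buffer slot per ramtile described in the Implementation section. The struct and function names are hypothetical, and the sprite structure sized by MAX_SPRITES is left out because its per-sprite size is not given here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the switches that affect RAM usage. */
typedef struct {
    uint16_t vram_tiles_h;
    uint16_t vram_tiles_v;
    uint16_t overlay_lines;    /* only valid when SCROLLING=0 */
    uint16_t ram_tiles_count;
} Mode3Config;

/* Rough estimate in bytes; excludes the sprite structure (MAX_SPRITES). */
uint32_t mode3_ram_estimate(Mode3Config c)
{
    uint32_t vram     = (uint32_t)c.vram_tiles_h * c.vram_tiles_v;  /* 1 byte per tile   */
    uint32_t overlay  = (uint32_t)c.overlay_lines * c.vram_tiles_h; /* overlay VRAM      */
    uint32_t ramtiles = (uint32_t)c.ram_tiles_count * 64u;          /* 8x8 pixel buffers */
    uint32_t restore  = (uint32_t)c.ram_tiles_count * 3u;           /* address + index   */
    return vram + overlay + ramtiles + restore;
}
```

For example, a non-scrolling 30x28 VRAM with 20 ramtiles and no overlay comes to 840 + 1280 + 60 = 2180 bytes under these assumptions.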

Using mode 3

TBD

  • sprites
  • overlay
  • scrolling

Implementation

Mode 3 uses a "restore buffer" in addition to a byte VRAM array. This buffer has as many slots as there are ramtiles. Each slot is composed of a VRAM address (2 bytes) and a tile index (1 byte). At the beginning of each frame (during VSYNC), sprites are processed. For each 8x8 pixel sprite, we compute which VRAM background tile(s) are overlapped (up to 4 tiles can be overlapped, so 4 tests are done). Two cases can then arise:

1) the VRAM tile index is >= RAM_TILES_COUNT, which indicates the sprite will overlap a regular flash tile. In this case, we check if we still have unallocated ramtiles. If so, the overlapped flash tile's pixels are copied into the newly allocated ramtile. The new ramtile index is set in VRAM (the previous VRAM value and address are preserved in the restore buffer). The sprite is blitted against this ramtile (which contains only the flash tile pixels at this point). If no more ramtiles are available, the sprite blitting against this VRAM location is clipped/ignored.

2) the VRAM tile index is < RAM_TILES_COUNT which indicates the sprite will overlap a previously allocated ramtile. The sprite is blitted against this ramtile (which contains the bg flash tile pixels overlaid with perhaps one or more sprites pixels).
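
The two cases above can be sketched in plain C as follows. This is a simplified model for illustration, not the kernel's code: the function names, the memset stand-in for the flash copy, and the tiny RAM_TILES_COUNT are all assumptions, the pixel blit itself is left as a comment, and only the tiles a sprite actually touches are tested (1, 2 or 4) rather than always performing 4 tests.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VRAM_TILES_H    30
#define VRAM_TILES_V    28
#define RAM_TILES_COUNT 4    /* kept small so exhaustion is easy to trigger */
#define TILE_BYTES      64   /* 8x8 pixels, one byte per pixel */

/* One restore buffer slot: where the ramtile was placed and what it replaced. */
typedef struct {
    uint16_t vram_offset;
    uint8_t  tile_index;
} RestoreSlot;

uint8_t     vram[VRAM_TILES_H * VRAM_TILES_V];
uint8_t     ram_tiles[RAM_TILES_COUNT][TILE_BYTES];
RestoreSlot ram_tiles_restore[RAM_TILES_COUNT];
uint8_t     free_tile_index;

/* Fill the background with one flash tile and release all ramtiles. */
void reset_state(uint8_t flash_index)
{
    assert(flash_index >= RAM_TILES_COUNT);
    memset(vram, flash_index, sizeof vram);
    free_tile_index = 0;
}

/* Returns the ramtile to blit against, or -1 when the blit must be clipped. */
int claim_ramtile(uint8_t tx, uint8_t ty)
{
    uint16_t offset = (uint16_t)(ty * VRAM_TILES_H + tx);
    uint8_t  index  = vram[offset];

    if (index < RAM_TILES_COUNT)
        return index;                          /* case 2: already a ramtile */
    if (free_tile_index >= RAM_TILES_COUNT)
        return -1;                             /* no ramtile left: clip */

    uint8_t rt = free_tile_index++;            /* case 1: allocate a ramtile */
    ram_tiles_restore[rt].vram_offset = offset;
    ram_tiles_restore[rt].tile_index  = index;
    memset(ram_tiles[rt], index, TILE_BYTES);  /* stand-in for the flash tile copy */
    vram[offset] = rt;
    return rt;
}

/* Claims a ramtile for each tile under an 8x8 sprite at pixel (x, y).
 * Returns how many of the overlapped tiles had to be clipped. */
int blit_sprite(uint8_t x, uint8_t y)
{
    int clipped = 0;
    uint8_t tx = x >> 3, ty = y >> 3;
    uint8_t nx = (x & 7) ? 2 : 1;              /* unaligned sprites span two columns */
    uint8_t ny = (y & 7) ? 2 : 1;              /* ... and two rows */

    for (uint8_t dy = 0; dy < ny; dy++) {
        for (uint8_t dx = 0; dx < nx; dx++) {
            if (tx + dx >= VRAM_TILES_H || ty + dy >= VRAM_TILES_V)
                continue;                      /* part of the sprite is off-screen */
            if (claim_ramtile((uint8_t)(tx + dx), (uint8_t)(ty + dy)) < 0)
                clipped++;
            /* A real blitter would now copy the sprite's non-translucent
             * pixels into the claimed ramtile. */
        }
    }
    return clipped;
}
```

In this model, a sprite at (4,4) on a fresh background allocates four ramtiles, a second sprite at (6,6) reuses them without allocating anything, and a third sprite elsewhere is clipped because RAM_TILES_COUNT is exhausted.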

When rendering begins, the tile indexes in the restore buffer are updated with the current VRAM values. This is required since the main program may have updated the VRAM between VSYNC and rendering. So for each slot in the restore buffer, we check if the tile index in the matching VRAM location is < RAM_TILES_COUNT. If not we know there has been an update and we store the new VRAM tile index in the restore buffer and write the restore buffer's slot index into that VRAM location.

The main rendering loop then goes through the VRAM tile by tile. Two separate "sub loops" handle flash and RAM tiles. When using scrolling, there were no cycles left in the flash inner loop to perform X wrapping, so the current tile row is "pre-wrapped" into a "linear" buffer and no wrapping is needed during rendering. This unfortunately adds overhead to each scanline and reduces the number of displayable tiles per row from 30 to 28.

Once frame rendering is finished, the VRAM is restored using the "restore buffer" values.
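
The refresh done when rendering begins and the restore done after the frame can be sketched as two small loops over the restore buffer. This is a minimal model that follows the description above; the names restore_buf, used_slots, refresh_before_render and restore_after_render are assumptions for illustration and do not come from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_SIZE       (30 * 28)
#define RAM_TILES_COUNT 4

typedef struct {
    uint16_t vram_offset;  /* where the ramtile index was written */
    uint8_t  tile_index;   /* background tile to put back after rendering */
} RestoreSlot;

uint8_t     vram[VRAM_SIZE];
RestoreSlot restore_buf[RAM_TILES_COUNT];
uint8_t     used_slots;    /* number of ramtiles allocated this frame */

/* When rendering begins: pick up any VRAM change made by the main program. */
void refresh_before_render(void)
{
    for (uint8_t slot = 0; slot < used_slots; slot++) {
        uint16_t offset  = restore_buf[slot].vram_offset;
        uint8_t  current = vram[offset];
        if (current >= RAM_TILES_COUNT) {
            /* The main program wrote a new background tile here. */
            restore_buf[slot].tile_index = current;
            vram[offset] = slot;   /* the slot number is the ramtile index */
        }
    }
}

/* After the frame is rendered: show the main program an unaltered VRAM. */
void restore_after_render(void)
{
    for (uint8_t slot = 0; slot < used_slots; slot++)
        vram[restore_buf[slot].vram_offset] = restore_buf[slot].tile_index;
}
```

If the main program overwrites a location that holds a ramtile index between VSYNC and rendering, the first loop records the new tile so that the second loop restores the updated background rather than the stale one.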

Ramtiles

The number of ramtiles is related to the number of sprites you expect to have on screen. The point of ramtiles is to allow the rendering engine to render both tiles and sprites at a higher resolution than the old mode 2, with the added benefit of handling overlapping sprites. Mode 2 has to run at a lower resolution than mode 1 because of the extra time used up by checking for sprites and handling sprite transparency. Ramtiles are a way of allowing some of the rendering tasks to be done during VBL rather than while actually outputting scanlines. It works by finding where sprites are on the screen, looking at what tiles would be under those sprites, then compositing the sprite onto the tiles and storing the result in RAM. That way when it's time to output pixels on a scanline, it can just fetch the pre-rendered images from RAM rather than having to do the sprite compositing on the fly. This saves enough cycles to allow for better resolution, and as a side effect we get nifty things like overlapping sprites and sprite flipping. The tradeoff is that games have less RAM to work with, as well as fewer CPU cycles for running game logic. (plus additional limitations on sound channels, if I remember correctly.)[1]

MAX_SPRITES really controls the allocation of memory for the sprites structure and also slightly speeds up the blitting of sprites. RAM_TILES_COUNT controls the allocation of memory for the ramtiles. For example, a 12-sprite "megasprite" will consume up to 20 ramtiles if allowed to move freely.[2]
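
The 12-sprite figure can be checked by counting how many background tiles a block of sprites overlaps at a given pixel position. The sketch below assumes the megasprite is laid out as 3x4 sprites of 8x8 pixels (the layout is not stated above) and that every overlapped tile is a flash tile needing its own ramtile; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_SIZE 8  /* tiles and sprites are 8x8 pixels */

/* Number of background tiles overlapped by a block of w_sprites x h_sprites
 * sprites whose top-left corner is at pixel (x, y). */
unsigned ramtiles_needed(unsigned x, unsigned y,
                         unsigned w_sprites, unsigned h_sprites)
{
    assert(w_sprites > 0 && h_sprites > 0);
    unsigned first_col = x / TILE_SIZE;
    unsigned last_col  = (x + w_sprites * TILE_SIZE - 1) / TILE_SIZE;
    unsigned first_row = y / TILE_SIZE;
    unsigned last_row  = (y + h_sprites * TILE_SIZE - 1) / TILE_SIZE;
    return (last_col - first_col + 1) * (last_row - first_row + 1);
}
```

When the block is aligned to the tile grid it needs 12 ramtiles; when it is misaligned on both axes it spills into one extra column and one extra row, giving (3+1)*(4+1) = 20.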

Additional Info (WIP)

The rendering flow is as follows:

1) Frame renders
2) At end of frame VSYNC flag is set
3) VSYNC routine begins immediately. Note that control will not return to the main program until the end of the VSYNC routine.
4) VideoModeVsync function executed for the active video mode. For mode 3, this will blit the sprites for the *next* frame.
5) Pre-vsync user callback is executed
6) Joypads are read
7) Music is processed and sound mixed
8) Post-vsync user callback is executed
9) VSYNC routine ends, control returned to main program.

Renderflow.png

All steps from 1 to 9 are executed sequentially in one shot. Typically, rendering begins at line 20 and ends at line 244. The VSYNC code executes right after and can extend up to line 262, then continue into line 1 of the next frame. Depending on the number of sprites to process, it could extend to near line 20, in which case the main program will have just a few lines' worth of CPU time to execute and will slow down. Naturally, going over line 20 will cause a stack overflow and crash the program.

As for the RestoreBG() part, geez, shame on me for not putting any comments... I spent a lot of time understanding how the hell that works! The whole thing has to do with showing the main program an unaltered VRAM. When VSYNC begins blitting sprites, the VRAM is updated with the ramtile indexes used. This is required to handle overlapping sprites. Now the tricky part: as each ramtile is blitted, its pointer in VRAM is saved in the ram_tiles_restore[] buffer along with the *previous* value (that is, a regular "rom tile"), and free_tile_index is incremented. At the end of sprite blitting, the VRAM is restored to the initial "rom tile" values. Now the main program executes and may modify the VRAM. Then rendering begins and the ram_tiles_restore[] buffer is iterated (note that the iterator variable actually corresponds to the ramtile number). For each entry, the current value is read through the VRAM pointer and ram_tiles_restore[] is updated with it (this is required because, since process_sprite was executed at VSYNC, the main program may have altered the VRAM, and wrong/old background tiles could otherwise be restored). Finally, the VRAM pointer is used to write the ramtile index.

Using Scrolling

(To be rewritten, this is just a cut-and-paste from the forums)

  • New memory arrangement removes the need to update the "linear buffer" for each scanline and frees enough cycles to use the inline mixer. This in turn frees >512 bytes of RAM (more ramtiles!) and lots of cycles used for mixing during VSYNC. Here's the new way to access a specific X/Y location in VRAM: Ptr=((y>>3)*256)+(x*8)+(y&7). It looks complex but it takes barely more cycles in assembler than the usual (y*VRAM_WIDTH)+x. This is relevant only for direct access to the vram[] array. No changes if you were using SetTile().
  • Due to the new arrangement, VRAM_TILES_V can only be a multiple of 8, so basically it has to be 24 or 32. Defaults to 32.
  • The overlay VRAM is located within the VRAM_TILES_V allocated region. For example, to draw a map in an overlay region 4 tiles high, use: DrawMap2(0,VRAM_TILES_V-4,my_overlay_map);
  • Set the overlay dynamically with Screen.overlayHeight=4
  • Y scroll wrap height is controlled by Screen.scrollHeight=28 (28 is the default). However Y scrolling is currently broken, so not to be used.
  • Add -DSCROLLING=1 -DSOUND_MIXER=1 to the kernel compile switches.
  • To force the alignment of the VRAM at 0x0100 and not waste space, this MUST be added to the linker flags section:
# Adjust the 0x800500 value to be 0x800100+VRAM_SIZE.  
LDFLAGS += -Wl,--section-start,.noinit=0x800100 -Wl,--section-start,.data=0x800500   
  • Finally, check the SuperMarioDemo project for more details.
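
The VRAM addressing formula from the first bullet above can be written as a small C function. This is a sketch that takes x and y to be tile coordinates (column 0..31, row 0..VRAM_TILES_V-1), which is an interpretation of the formula rather than something stated explicitly; the function names are hypothetical, and the linear version is included only for comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Offset into vram[] under the scrolling memory arrangement:
 * Ptr = ((y>>3)*256) + (x*8) + (y&7).
 * Each group of 8 tile rows occupies 256 bytes (32 columns x 8 rows),
 * and the 8 rows of one column are stored next to each other. */
uint16_t scroll_vram_offset(uint8_t x, uint8_t y)
{
    assert(x < 32 && y < 32);
    return (uint16_t)(((y >> 3) * 256) + (x * 8) + (y & 7));
}

/* The usual row-major offset, (y*VRAM_WIDTH)+x, for comparison. */
uint16_t linear_vram_offset(uint8_t x, uint8_t y, uint8_t vram_width)
{
    return (uint16_t)(y * vram_width + x);
}
```

Moving one tile down within a group of 8 rows advances the offset by 1, moving one tile to the right advances it by 8, and the last tile of a 32x32 VRAM lands at offset 1023, so the arrangement fills exactly VRAM_TILES_H*VRAM_TILES_V bytes.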