Sound Engine

From Uzebox Wiki

The mixer: Theory of Sound Generation

The Buffer

The sound engine uses a circular buffer, sometimes called a ring buffer: a byte array in RAM logically split into two halves. One half plays while the other half is mixed. Each half contains exactly as many samples as there are scan lines in a video field (two interlaced fields make a frame), in this case 262. *UPDATE*: there is now an option "-DMIXER=1" to mix during the HSYNC period, which does not require this buffer and therefore saves RAM.

So at 15.7 kHz (the NTSC line rate), during each HSYNC pulse a byte is read from the circular buffer and output to the sound port (by means of PWM). HSYNC pulses occur non-stop, even during blanking intervals.

During each VSYNC (once, at the beginning of each field), the first operation performed is mixing the music for the next field. Naturally, the music mixer code is in assembler for optimal speed. A whole field's worth of music is mixed in one shot, and all four channels are mixed simultaneously without resorting to a temporary 16-bit signed buffer (not enough RAM!). This implies *all* registers are used during mixing.

Sound Generation

The engine uses a table of short, repeating waveforms for the first 3 channels. Each wave is exactly 256 samples long (8-bit signed) and is force-aligned on a 256-byte boundary in ROM. Because of this, only an 8-bit pointer is needed for the waveform's position. The position wraps automatically, effectively giving "free-running oscillators".

Using a tool like CoolEdit, it's easy to create waveforms that vary from a simple square wave to a sine or filtered triangle.

The 4th channel is a noise channel based on a switchable 7/15-bit LFSR. The 7-bit mode sounds more metallic because the bit pattern repeats every 127 samples; the 15-bit mode sounds much more like white noise.

Pitch is handled as in most MOD players, with a "step table". This ROM table consists of pre-calculated 8:8 fixed-point values that represent the input pointer increment (into the 256-byte wave) per output sample (into the mix buffer). There is one fixed-point word per note, for a total of 127 notes. The wavetable is composed of 256-byte waves, and each wave models exactly one "sound cycle"; i.e. for a triangle wave, it would contain: /\/ .

Let's say the mixing rate is 8 kHz and we want to play a C5. We look in the step table at note 48 (C5); say its calculated stepping is 1.000. For each output sample, we increment the input pointer by exactly one. Now take a C6, an octave higher (so double the frequency): its stepping is 2.000, meaning that for each output sample we advance the input pointer by two samples, effectively skipping every other one. You get the idea. Note that with high steppings a lot of samples are skipped, which, combined with wrapping, introduces aliasing. That can be somewhat minimized by using waves with slow rises and falls.

Mixing Procedure

The mixing procedure for each of the 256 wave samples is as follows:

  • (For each channel) The sample pointer is advanced by the note's step (8:8 fixed point)
  • (For each channel) The next sample is read from the wave table
  • (For each channel) The sample is multiplied by its volume and added to a 16-bit signed accumulator
  • The accumulator is divided by 2 and clipped back to 8-bit
  • The final value is stored in the mix buffer
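The steps above can be sketched in C. This is a hedged illustration, not the kernel's assembler: the vol/256 volume scale and the Channel struct layout are assumptions made for the sketch.

```c
#include <stdint.h>

#define NUM_CHANNELS 4

typedef struct {
    const int8_t *wave;  /* 256-byte, 256-byte-aligned wave */
    uint16_t pos;        /* 8:8 fixed-point position */
    uint16_t step;       /* 8:8 fixed-point step from the note table */
    uint8_t  vol;        /* channel volume, 0..255 */
} Channel;

/* Mix one output sample following the procedure in the list above. */
static int8_t mix_sample(Channel ch[NUM_CHANNELS])
{
    int16_t acc = 0;                            /* 16-bit signed accumulator */
    for (int c = 0; c < NUM_CHANNELS; c++) {
        ch[c].pos += ch[c].step;                /* advance by the note's step */
        int8_t s = ch[c].wave[ch[c].pos >> 8];  /* read next sample */
        acc += ((int16_t)s * ch[c].vol) >> 8;   /* scale by volume, accumulate */
    }
    acc /= 2;                                   /* divide by 2 */
    if (acc > 127)  acc = 127;                  /* clip back to 8-bit */
    if (acc < -128) acc = -128;
    return (int8_t)acc;                         /* goes into the mix buffer */
}
```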

Mixing on the Fly

Note it could be possible to mix on the fly, hence not requiring the circular buffer. There are basically 2 reasons why mixing isn't (currently) done on the fly:

  • When rendering a field (especially in mode 2), all cycles are taken. At the end of each scan line there is a short period, the horizontal blanking, while the electron beam returns to the beginning of the next line. The problem is that not enough cycles are left there during rendering, especially considering the setup required (loading all registers, etc.). That said, simple video modes like mode 1 have plenty of slack, so on-the-fly mixing could be possible there.
  • The rendering phase is coded in assembler because it requires "cycle-perfect" code: each clock cycle counts, or the image will shear. Code like the mixer and music engine is best kept outside the rendering path to allow easy modification and optimization. Currently they execute right after the VSYNC "cycle-perfect" code is done; this lasts for about 25 scan lines, after which control is returned to the main program for a number of lines before rendering begins again.

Music Replayer

The music replayer sits on top of the mixer and was designed to play MIDI streams. MIDI is a very compact and space efficient format for music. It is made of a continuous stream of events, each of which is separated in time using a number of 'ticks' (a delta-time value).

Events can be notes, tempo changes, modulation changes, etc., and can be associated with a specific channel or be global (like tempo events). Currently only the following event types are supported; any other event, like NOTEOFF, is filtered out by the conversion tool:

  • Meta 0x2f : End of song
  • Meta 0x06 : Marker (For start and end of loop. Only two values are accepted: "S" for start and "E" for end of loop.)
  • 0x90  : Note on
  • 0xB0  : Controllers (Volume, Expression, Tremolo Level and Tremolo Rate)
  • 0xC0  : Program Change (Patch change)

The music engine also supports direct input through a MIDI port. To enable this code, the MIDI interface must be built and support added to the kernel using the MIDI_IN=1 build switch.

MIDI Converter

MIDI files can be converted to the Uzebox format using a console Java-based conversion tool. A DOS batch file makes it easier to call the tool. Since the conversion tool is Java-based, you need to install the JRE (version 7+) and have its /bin directory on your system's PATH.

   $ java -cp ~/uzebox/tools/JavaTools/dist/uzetools.jar -h
   Uzebox (tm) MIDI converter 1.1
   (c)2009 Alec Bourque. This tool is released under the GNU GPL V3.
   usage: midiconv [options] inputfile outputfile
   Converts a MIDI song in format 0 or 1 to a Uzebox MIDI stream outputted as
   a C include file.
    -d         Prints debug info.
    -e <arg>   Force a loop end (specified in tick). Any existing loop end in
               the input will be discarded.
    -f <arg>   Speed correction factor (double). Defaults to 30.0
    -h         Prints this screen.
    -no1       Include note off events for channel 1
    -no2       Include note off events for channel 2
    -no3       Include note off events for channel 3
    -no4       Include note off events for channel 4
    -no5       Include note off events for channel 5
    -s <arg>   Force a loop start (specified in tick). Any existing loop
               start in the input will be discarded.
    -v <arg>   variable name used in the include file. Defaults to 'midisong'
   Ex: midiconv -s32 -vmy_song -ls200 -le22340 c:\mysong.mid c:\


  • Ticks per quarter note should ideally be 120.
  • Since the converter strips out note-off events to save space, you'll end up with "stuck" notes if your instruments don't have fade-out envelopes. Three possible solutions:

1) Ensure all your patches include a fade-out or note-cut command
2) Add notes with zero volume at the very end of the song
3) In the conversion tool, keep note-off events by using the -no1, -no2, -no3, -no4, or -no5 switches

Using the Player

When the conversion process is complete, it is pretty easy to play songs. Simply initialize the engine using:


Then start the song using:


Stop/pause the song using:


And resume where you last stopped with:


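As a sketch, those player calls look like this in the kernel API (function names as found in the Uzebox kernel headers; verify them against your kernel version, and note that 'patches' and 'midisong' are assumed to come from your converted data files):

```c
#include "uzebox.h"

int main(void)
{
    InitMusicPlayer(patches);   // initialize the engine with your patch set
    StartSong(midisong);        // start the song

    /* ... later ... */
    StopSong();                 // stop/pause the song
    ResumeSong();               // resume where you last stopped

    while (1);
}
```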
Patch, Instruments & FXs

The music replayer engine works with the concept of "patches". A patch is a sequence of commands that defines how your notes or sound effects evolve over time. There are already a couple of patches made for Megatris. Add this include to your program file:

#include "data/"

Look into it, you will see something like this:

//FX: "Echo Droplet"
const char patch01[] PROGMEM ={ 

This is called a command stream. The first byte is the sound type: 0 for wavetable sounds, 1 for noise-channel sounds. (From beta3 onwards, this byte is removed; more on that later.) The rest is a sequence of commands. Commands are made of 3 bytes: the first is a time delta, in frames (a frame happens each 1/60 of a second), to wait before the command is executed; the second byte is the command type; and the last byte is the command value. So in this example we have a wavetable sound (0). When the sound is triggered (time zero), the volume envelope decay speed is set to -12; on each frame afterwards, 12 is automatically subtracted from the sound's volume. Then, after a wait of 5 frames, the sound's pitch is raised by 12 semitones (one octave). Then, after another 5 frames, the pitch is lowered by an octave. And so on, until the last command, which must be a PATCH_END command (no value byte for this one). Have a look at the .h files for all the possible commands.
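A command stream matching that description could look like the sketch below. The command names come from the kernel's patch command set; the byte values are illustrative, not the original "Echo Droplet" data.

```c
//FX: "Echo Droplet" (illustrative reconstruction)
const char patch01[] PROGMEM = {
    0,                     // sound type: 0 = wavetable channel (pre-beta3)
    0, PC_ENV_SPEED, -12,  // t=0: volume decays by 12 per frame
    5, PC_NOTE_UP,    12,  // after 5 frames: pitch up one octave
    5, PC_NOTE_DOWN,  12,  // after 5 more frames: pitch back down
    5, PC_NOTE_UP,    12,  // and so on, fading "droplets"
    5, PC_NOTE_DOWN,  12,
    0, PATCH_END           // must terminate the stream (no value byte)
};
```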

From beta3, the patch system has been tweaked to support a PCM channel, which allows playing samples of arbitrary length. The sound type byte has been removed from the command stream and moved into a special array of structs:

const struct PatchStruct patches[] PROGMEM = {

Let's interpret one entry:

  • First parameter (1) is the sound type; in this case it is to be played on the noise channel (0=wave, 1=noise, 2=PCM)
  • Second parameter (NULL) is a pointer to the PCM data if it were a PCM patch
  • Third parameter (patch04) is the patch's command stream pointer
  • Fourth (0) is the loop start position for PCM samples
  • Fifth (0) is the loop end position for PCM samples
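Putting the five fields together, the entry being interpreted would look like this (a sketch; the field order follows the list above):

```c
const struct PatchStruct patches[] PROGMEM = {
    // type, PCM data, command stream, PCM loop start, PCM loop end
    { 1, NULL, patch04, 0, 0 },   // a noise-channel patch
};
```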

Note for PCM: a bit like the old wavetable-based sound cards (e.g. the AWE32 and GUS), looping is always on internally. If you do not want looping, set both the loop start and loop end parameters to the sample size, effectively looping on the same byte at the end of the sample.

So, now let's play some sounds! Add this line to your main() function:


And *then* you can trigger FXs anywhere in your code. If you look at the patches, number 19 is the "t-spin" FX from Megatris:


In this case:

  • 19 is the patch number
  • 0xff is the volume
  • true is the 'retrig' attribute: if another TriggerFx() call is made for the same patch *before* the previous one has finished playing, it will re-trigger right away on the same channel instead of starting another simultaneous instance of the sound on another channel (determined by the voice-stealing "algorithm").
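Putting the parameters above together, the trigger call would be sketched like this (TriggerFx() as found in the kernel API; the surrounding function is just illustrative scaffolding):

```c
#include <stdbool.h>
#include "uzebox.h"

void OnTSpin(void)
{
    // Trigger patch 19 ("t-spin") at full volume, with retrig enabled.
    TriggerFx(19, 0xff, true);
}
```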

Creating new patches is really trial and error. We suggest you make an empty project to create new patches; it will compile and flash faster.

All patch commands are described here.