Theory of sound generation

Topics related to the API, programming discussions & questions, coding tips, bugs, etc. should go here.
EmbeddedMan
Posts: 2
Joined: Mon Aug 25, 2008 10:19 pm

Theory of sound generation

Post by EmbeddedMan »

Wow! This is simply an amazing project - I'm very impressed with what you designed and built. Way to go!

I have a basic question about the implementation of the sound engine. I'd like to understand the ideas and theory you used to construct the code that generates the four channels of sound. Where did you go to learn about this? Are there any sources on the web I can turn to for learning more about it? I want to implement a similar (although simpler) system on a different micro, so your assembly code won't be of direct use to me. I'm also not an Atmel guy, so I can't easily grok the assembly file.

Can you describe, in general terms, what happens at the 15KHz rate? Why did you need to use an assembly file to call the ProcessMusic() function? How often are the other functions run? I'd like to know anything you're willing to share!

Thanks again for publishing such an awesome project.

*Brian
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Theory of sound generation

Post by uze6666 »

Whoa. That's quite complex stuff to explain in a forum thread, even simplified!

I'm currently writing some more advanced documentation, with pictures and everything, to make the sound and video principles easier to digest. It won't be ready for a couple of days but, in the meantime, I can give you some "quick" info.

The sound engine uses a circular buffer, sometimes called a ring buffer, that is logically segmented in two parts: one half plays while the other half is being mixed. Each half contains exactly as many samples as there are scanlines in a video field (two interlaced fields make a frame), in this case 262. So at 15.7 kHz (the NTSC line rate), during each HSYNC pulse, a byte is read from the mix buffer and output to the PWM. HSYNC pulses happen non-stop, even during blanking intervals. Also, during each VSYNC (once, at the beginning of each field), the first thing done is to mix the music for the next field.
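In rough C terms, that double-buffered arrangement could be sketched like this (all names here are mine, not the actual kernel's; set_pwm() stands in for the real PWM register write, and mix_field() is fleshed out in the next sketch):

Code: Select all

#include <stdint.h>

#define LINES_PER_FIELD 262

void set_pwm(uint8_t sample);   /* hypothetical PWM output write */
void mix_field(uint8_t *out);   /* mixes one field's worth of samples */

static uint8_t mix_buf[2][LINES_PER_FIELD]; /* one half plays, one half mixes */
static volatile uint8_t  play_half = 0;
static volatile uint16_t play_pos  = 0;

/* Called on every HSYNC, i.e. at ~15.7 kHz, even during blanking. */
void hsync_tick(void)
{
    set_pwm(mix_buf[play_half][play_pos++]);  /* one byte out per scanline */
    if (play_pos == LINES_PER_FIELD) {        /* field done: swap halves */
        play_pos = 0;
        play_half ^= 1;
    }
}

/* Called once per VSYNC: mix a whole field's worth into the idle half. */
void vsync_tick(void)
{
    mix_field(mix_buf[play_half ^ 1]);
}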

Naturally, the music mixer code is in assembler for optimal speed. A whole field's worth of music is mixed in one shot, and all four channels are mixed simultaneously without resorting to a temporary 16-bit signed buffer (not enough RAM!). This implies *all* registers are used during mixing. The main loop goes like this: the current sample for each channel is calculated (volume and pitch) and added to a 16-bit signed accumulator register, one channel after the other. After that, the accumulator is divided and clipped back to 8 bits and stored in the mix buffer as the final sample value.
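A rough C equivalent of that inner loop (the real thing is hand-tuned assembly; the channel structure and the exact scale/clip details here are my own sketch, and the wave pointers are in RAM for clarity while the real tables live in flash):

Code: Select all

#include <stdint.h>

typedef struct {
    const int8_t *wave;  /* one 256-sample waveform */
    uint16_t pos;        /* 8:8 fixed point; the high byte indexes the wave */
    uint16_t step;       /* 8:8 fixed point increment per output sample */
    uint8_t  vol;        /* 0..255 */
} Channel;

static Channel ch[4];

void mix_field(uint8_t *out)
{
    for (uint16_t i = 0; i < 262; i++) {
        int16_t acc = 0;  /* 16-bit signed accumulator, no temp buffer */
        for (uint8_t c = 0; c < 4; c++) {
            int8_t s = ch[c].wave[ch[c].pos >> 8];
            acc += ((int16_t)s * ch[c].vol) >> 8;  /* apply volume */
            ch[c].pos += ch[c].step;               /* pitch; wraps freely */
        }
        /* scale and clip back to 8 bits */
        if (acc > 127)  acc = 127;
        if (acc < -128) acc = -128;
        out[i] = (uint8_t)(acc + 128);  /* unsigned, centered for the PWM */
    }
}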

Although the engine could mix longer, more complex samples, the limited ROM would not have permitted interesting sounds. Instead, a table made of short, repeating waveforms is used for the first three channels. Each wave is exactly 256 samples long (8 bits signed) and is force-aligned on a 256-byte boundary in ROM. Because of this, we only need an 8-bit pointer for the waveform's position, and the position wraps automatically, effectively giving "free-running" oscillators. Using a tool like CoolEdit, it's easy to create waveforms that vary from a simple square wave to a filtered triangle.
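In avr-gcc C, that kind of forced alignment can be expressed like this (the table and function names are hypothetical; the contents would come from a tool like CoolEdit):

Code: Select all

#include <stdint.h>
#include <avr/pgmspace.h>

/* Each row is one 256-sample wave cycle. Aligning the table on a 256-byte
   boundary means the low address byte alone is the position, and an 8-bit
   position wraps around for free: a "free-running" oscillator. */
extern const int8_t waveforms[][256] PROGMEM __attribute__((aligned(256)));

static inline int8_t wave_sample(uint8_t wave_num, uint8_t pos)
{
    return (int8_t)pgm_read_byte(&waveforms[wave_num][pos]);
}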

The 4th channel is based on a switchable 7/15-bit LFSR. The 7-bit mode sounds more metallic because the bit pattern repeats every 127 samples. The 15-bit mode sounds much more like white noise.
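The kernel's exact taps aren't given here, so this minimal C sketch assumes the classic maximal-length trinomials x^7+x^6+1 and x^15+x^14+1, which give the 127-sample and 32767-sample periods mentioned above:

Code: Select all

#include <stdint.h>

static uint16_t lfsr  = 1;   /* state; must never become 0 */
static uint8_t  mode7 = 0;   /* 1 = 7-bit (metallic), 0 = 15-bit (white-ish) */

/* One Galois LFSR step; returns the output bit. */
static uint8_t noise_next(void)
{
    uint8_t out = lfsr & 1;
    lfsr >>= 1;
    if (out)
        lfsr ^= mode7 ? 0x60 : 0x6000; /* taps: x^7+x^6+1 or x^15+x^14+1 */
    return out;
}

/* Map the bit to a full-scale signed sample for the mixer. */
static int8_t noise_sample(void)
{
    return noise_next() ? 127 : -128;
}

When switching to 7-bit mode, make sure the low 7 bits of the state are nonzero, or the register locks up at zero.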

To answer your last question: it just happens that I started coding the mixer in assembler first and the music player in C afterwards! And since the video interrupt is in assembler and calls the mixer, which is also in assembler, they're like the "parents" in the call stack. That's it.

More to come on the main site, stay tuned!

Cheers,

Uze
CompMan
Posts: 91
Joined: Mon Aug 25, 2008 3:48 am
Location: Kent, WA

Re: Theory of sound generation

Post by CompMan »

Wow, that is complex. I can't wait for the advanced documentation.

Thanks for the explanation,
Compman
EmbeddedMan
Posts: 2
Joined: Mon Aug 25, 2008 10:19 pm

Re: Theory of sound generation

Post by EmbeddedMan »

uze6666 - very nice explanation. It makes a lot more sense to me now, thank you. I too look forward to the documentation - there is a huge amount of very useful knowledge contained in your code and techniques, and having it documented with pictures and such will mean a lot of people can learn from it. Thanks again for making this whole project available.

So the ring buffer in two parts makes sense to me. But I'm interested in why you need a buffer at all - can't you just compute what the next PWM value should be at the 15.7 kHz rate? There must be a good reason you do all the mixing for the next part of the buffer in one chunk rather than sample by sample as they are output, since the other way would save a bunch of RAM.

Also, how do you vary the pitch of the samples during playback and mixing? If each sample is 256 bytes long, with a single period of a square wave in those 256 bytes, and you output one PWM value at the 15.7 kHz rate, then the pitch would be about 61 Hz. But if you want to play it back at 62 Hz, you have to interpolate between sample values when you generate your mixed-down buffer, no?

This is all just way too cool.

:)
*Brian
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Theory of sound generation

Post by uze6666 »

There are basically two reasons why mixing isn't done on the fly:

1. When rendering a field, all cycles are taken. At the end of each scanline, there's a short period of time where the electron beam returns to the beginning of the next line: the horizontal blanking. And that's where the problem is; there aren't enough cycles left there when rendering a field, especially considering the setup required, like loading all the registers, etc. Well, currently there is, but I'm finishing a sprite engine which eats up pretty much all the cycles left. That was planned up front.

2. The rendering phase is coded in assembler because it requires "cycle-perfect" code; each clock cycle is important, or the image will shear. For most code, like the mixer and music engine, you want to keep it outside the rendering path to allow easy modifications and optimizations. And currently it is: this code is executed right after the VSYNC "cycle-perfect" code is done. It lasts for about 25 scanlines, after which control is returned to the main program for a bunch of other lines; then rendering begins.

That said, you have a point. There is enough time when not using the sprite engine. I could make that configurable at build time with conditional #ifdef blocks. That could save some precious RAM for those who don't need sprites in their application.
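For instance (MIX_ON_THE_FLY is a made-up option name, just to illustrate the idea):

Code: Select all

#ifdef MIX_ON_THE_FLY
  /* no field buffer: one sample is computed inside the HSYNC handler */
#else
  /* default: double-buffered field mixing, as described earlier */
  static uint8_t mix_buf[2][262];
#endif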

Pitch is done like in most MOD players out there, with a "step table". This ROM table consists of pre-calculated 8:8 fixed-point values that represent the input sample's (i.e. the 256-byte wave's) pointer increment per output sample (of the mix buffer). There is one fixed-point word per note, for a total of 127 notes. Here's a quick example (these are not the actual values, just a simplification):

The wavetable is composed of 256-byte samples. Each sample models exactly one wave cycle; i.e. for a triangle wave, it would contain: /\.
Let's say the mixing rate is 8 kHz and we want to play a C5. We look in the step table for note 48 (C5). Say the note's rate works out to exactly the mixing rate, so its calculated stepping is 1.000: for each output sample, we increment the input pointer by exactly one. Now say we have a C6, an octave higher (so double the frequency). The stepping of this note will be 2.000. That means that for each output sample, we increment the input pointer by two samples, effectively skipping one of them. You get the idea. Note that for high steppings, a lot of samples are skipped, which, combined with the wrapping, introduces aliasing. That can be somewhat minimized by using slowly rising/ending waves.
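To put numbers on the real case: with a 256-sample wave, the pointer must advance 256*f/mixrate input samples per output sample. A sketch of how the table entries could be pre-calculated (my code, not the kernel's):

Code: Select all

#include <stdint.h>
#include <math.h>

/* 8:8 fixed-point stepping for one note.
   Example: 440 Hz at a 15734 Hz mixing rate ->
   256*440/15734 = 7.16 -> 0x0729 in 8:8. */
uint16_t step_for(double f_note, double mix_rate)
{
    double step = 256.0 * f_note / mix_rate; /* input samples per output sample */
    return (uint16_t)lround(step * 256.0);   /* scale to 8:8 fixed point */
}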

Seems I'm finally writing the official documentation here ;). Hope that helped.

Cheers,

Uze
Acedio
Posts: 2
Joined: Tue Aug 26, 2008 10:22 pm

Re: Theory of sound generation

Post by Acedio »

I have to agree: this is a SWEET project. I've been working on sound generation for a couple of days now on an ATmega168, and I have a question... If you're generating sound at a sample rate of X kHz, is there any way to output a frequency that doesn't line up with an even division, such as X/3.3 kHz? For example, is there a way to generate a 440 Hz square wave at an 8 kHz sample rate? 8000/440 = 18.18. The closest I can come is by adding 18 to the wave table position every time, but that gives me a frequency of 444.44 Hz. Is there any way to get around this that doesn't eat up a bunch of cycles? You've got me really interested in this project; I'll definitely have to make an Uzebox =D
Ceriand
Posts: 1
Joined: Wed Aug 27, 2008 4:03 am

Re: Theory of sound generation

Post by Ceriand »

A friend of mine and I made a stepper motion controller as part of a robot for our senior EE project. We ended up using a version of the Bresenham algorithm to do our step modulation when accelerating. You could probably use a similar method here by keeping track of the amount of error on each sample, and when it overflows a threshold, you add an extra sample.
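Applied to Acedio's 440 Hz example, that error-accumulation idea might look like this in C (my sketch, not something from the Uzebox kernel):

Code: Select all

#include <stdint.h>

#define SAMPLE_RATE 8000
#define FREQ        440

/* Bresenham-style square wave: most periods are 18 samples, but the
   accumulated remainder (8000 % 440 = 80) stretches a period to 19
   samples roughly every 5.5 periods, so the average period is exactly
   8000/440 samples and the average frequency exactly 440 Hz. */
static uint16_t phase = 0, err = 0;
static uint16_t period = SAMPLE_RATE / FREQ;   /* 18 */

int8_t square_next(void)
{
    if (++phase >= period) {
        phase = 0;
        period = SAMPLE_RATE / FREQ;
        err += SAMPLE_RATE % FREQ;   /* accumulate the fractional part */
        if (err >= FREQ) {           /* error overflowed the threshold */
            err -= FREQ;
            period++;                /* add the extra sample */
        }
    }
    return (phase < period / 2) ? 127 : -128;
}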
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Theory of sound generation

Post by uze6666 »

Acedio: I'm not sure I fully understand your question but, as mentioned, you must use fixed-point arithmetic for this. It's tough to explain without a picture, so let's try an asm code example.

Let's use an 8:8 fixed-point step (8 bits for the integral part and 8 bits for the fractional part):

Code: Select all

data: .db 0x25,0x34,0x78,0xc0  ;assume this table is in RAM; a table in flash would need lpm

ldi XL,lo8(data)
ldi XH,hi8(data)
clr r18            ;this will be our sample pointer's fractional part
clr r19            ;zero register, used to propagate the carry into XH
ldi r17,1          ;this is our stepping's integral part
ldi r16,0x80       ;this is our stepping's fractional part: .5 in decimal (.5*256)
ld r0,X            ;r0=0x25, the first byte, since we didn't add the step yet

add r18,r16        ;we add the step to the pointer, including the fractional part
adc XL,r17         ;since the step is 1.5, X now equals "data address+1.5"
adc XH,r19         ;however X doesn't know about the fractional part, so
                   ;the pointer increments only by one
ld r0,X            ;r0=0x34

add r18,r16        ;add the step again
adc XL,r17         ;X now equals "data address+3.0"
adc XH,r19
ld r0,X            ;r0=0xc0
Samples now play at 1.5x the mixing rate. The trick is now to pre-calculate the correct step for the sample you want to play. This depends on the mixing frequency, and also on the sample rate and note frequency of the sample itself. I'll add that to the docs shortly (when I find a bit of time).
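As a C sketch of that pre-calculation for the general case (my formula; rec_rate is the rate the sample was recorded at and f_recorded its natural pitch, both hypothetical parameters):

Code: Select all

/* 8:8 step: ratio of sample rates, scaled by the desired pitch shift. */
uint16_t calc_step(double rec_rate, double mix_rate,
                   double f_wanted, double f_recorded)
{
    return (uint16_t)(256.0 * (rec_rate / mix_rate)
                            * (f_wanted / f_recorded) + 0.5);
}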

Cheers,

Uze
Acedio
Posts: 2
Joined: Tue Aug 26, 2008 10:22 pm

Re: Theory of sound generation

Post by Acedio »

Thanks Uze and Ceriand, that's exactly what I was looking for. I had completely forgotten about fixed-point arithmetic! I'll be sure to post any cool things I come up with =D
Tinctu
Posts: 65
Joined: Sun Aug 31, 2008 2:22 pm

Re: Theory of sound generation

Post by Tinctu »

What about making a simple cross-platform PC music tracker/sequencer that can save Uzebox songs?
I mean something like this one for Atari...

SEQUENCER [screenshot]
INSTRUMENT EDITOR [screenshot]

Then I could simply make tunes on my notebook :D...

Or I will just use this MIDI Tracker... :roll:
http://www.rf1.net/software/mt
Last edited by Tinctu on Sun Aug 31, 2008 6:39 pm, edited 1 time in total.