Tempest is possible

CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Tempest is possible

Post by CunningFellow »

T2K already has black bars on the left/right of the screen - so has no "non visible" pixels. PLUS all that black area is used to pre-clear the VRAM if the "Clear VRAM flag" is set.

WRT the PCM thing.

Currently the PCM channel 5 can play PCM sounds AND change the pitch and change the volume. Now you have told me that if you set the PCM channel to point to a 256 byte wave - it can emulate a wave table channel.

For all this functionality - that channel number 5 takes a whopping 45 clocks to process, whereas a normal wavetable channel only takes 27 clocks.

SO - my thought was - split that 45 clocks in two.

27 clocks for another normal wave table channel.
18 clocks for a PCM channel that can only play a sample at a single speed and single volume that is terminated by a token (0xFF maybe)

Code:

lds   ZL, PCM_Position_Lo       ; Get low byte of PCM address in flash
lds   ZH, PCM_Position_Hi       ; Get high byte of PCM address in flash
lpm   r16, Z                    ; Get the PCM data
cpi   r16, 0xFF                 ; carry is set if r16 < 0xFF, i.e. the sample is NOT the end token
adc   ZL, r0                    ; increment the 16 bit value in Z only if the sample was not 0xFF
adc   ZH, r0                    ;   by adding ZERO with carry (assumes r0 holds zero here)
sts   PCM_Position_Lo, ZL       ; Store the PCM flash address regardless of if it has
sts   PCM_Position_Hi, ZH       ;   been incremented or not
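
For anyone who would rather follow the pointer logic in C, here is a minimal sketch of the same one-shot behaviour. The struct, the function name and the 0xFF token are illustrative assumptions, not part of the Uzebox kernel. It mirrors the cpi/adc trick: the position advances by 1 while the sample is below the token and by 0 once it reaches it.

```c
#include <stdint.h>
#include <stddef.h>

#define PCM_END 0xFF  /* assumed terminator token, as floated in the post */

/* Hypothetical one-shot PCM channel state: a pointer to the data plus an index. */
typedef struct {
    const uint8_t *data;  /* sample stream, must end with PCM_END */
    size_t pos;           /* current read position */
} pcm_oneshot;

/* Return the current sample and advance unless it is the terminator.
   (s < PCM_END) plays the role of the carry flag left by cpi. */
static uint8_t pcm_oneshot_step(pcm_oneshot *ch)
{
    uint8_t s = ch->data[ch->pos];
    ch->pos += (s < PCM_END);  /* 1 while playing, 0 once parked on the token */
    return s;
}
```

Once the stream is exhausted the function keeps returning the token, so a mixer built on this would presumably have to treat that value as silence.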
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Tempest is possible

Post by CunningFellow »

CunningFellow wrote: We should still be slightly ahead if we change the PCM to fixed freq/length though.
OH - saying "fixed length" was probably the wrong way to say it. I meant the sample played ONCE without a repeat/sustain/loop at the end.

It is just a stream of bytes that is terminated with an 0xFF (or 0x00 or some other token)

So to play the "PLAY" sample - the main music/sequencer code would point to the start of "PLAY" and then the inline mixer would play the PCM until it hit 0xFF and then the pointer would not move FWD.
D3thAdd3r
Posts: 3222
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: Tempest is possible

Post by D3thAdd3r »

Also keep in mind that if you are opting to simply include a custom sound engine in your makefile like Alec suggested, the priority system should make this all work. You wouldn't have to resort to squeezing in more channels, which seems daunting; though I am sure you could pull it off if it is possible. I guess the advantage would be that the sound effects are better, but reworking the priorities so TriggerNote()/T2k_TriggerFx work together nicely should let everything work including PLAY.

If the 2 part effects always played both parts, then I would add more to make everything 2 parts (like the explosion noise channel gets extra low notes for a bass sound on channel 5). No way around it, you can make things sound better when you are not limited to 1 note/wave at a time. Flash is tight no doubt for even adding those second patches. PLAY costs 3.5k which is probably 1 extra MOD song, more sfx secondary depth (and an extra channel to play them if basically making 5 copies of channel 1), and some extra code space. I am not so biased against PLAY as I seem, it is just that in many a game I had some cool PCM and always traded it for other things I thought were worth more. In FrogFeast I managed a PCM but I regretted being out of space, plus why the hell would a frog "ribbit" every time it jumps...hmmm..time for sleep :lol:
D3thAdd3r
Posts: 3222
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: Tempest is possible

Post by D3thAdd3r »

CunningFellow wrote: SO - my thought was - split that 45 clocks in two.

27 clocks for another normal wave table channel.
18 clocks for a PCM channel that can only play a sample at a single speed and single volume that is terminated by a token (0xFF maybe)
Oh I see, I like it. If the cycles lay out OK for that it should work since you only have to worry about 1 sample rate for PLAY. Though it would require probably as much rework to the C parts of the sound engine as the priority setup, it would allow sfx to play simultaneously with PLAY at least.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Tempest is possible

Post by CunningFellow »

I've had a crazy idea that may be able to squeeze in 8 channels (5 wave, 2 noise, 1 fixed PCM)

I am going to have to start converting my object management routines to ASM to try and save a few K of flash so I can pull it off.

It's not quite as crazy as the T2K video mode. It is going to be hard to get your head around though.

Watch this space :)
D3thAdd3r
Posts: 3222
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: Tempest is possible

Post by D3thAdd3r »

Oh man, Cunning has finally lost his grip on reality :P

This would really be great if it works out. At this point I just default to believing you can do it when you utter what sounds impossible heh. All those channels would be a nice addition, and if you manage to save a bit extra I would go back through the effects and add more secondary parts to make sure everything was utilized full time. The only concern might be how much RAM you have left for all the channel variables.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Tempest is possible

Post by CunningFellow »

OK - here is the idea.

Each channel (wave or noise) takes 27 clock cycles.

The most expensive part of each channel's 27 clocks is all the LDS and STS instructions.

example channel 1 here

Code:

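lds   r16, tr1_step_lo      ;load low byte of step
lds   r17, tr1_pos_frac     ;load fractional part of sample pos
add   r17, r16              ;add step to fractional part of sample pos
lds   r16, tr1_step_hi      ;load high byte of step
lds   ZL,  tr1_pos_lo       ;load low byte of sample pos
lds   ZH,  tr1_pos_hi       ;load high byte of sample pos
adc   ZL,  r16              ;add step to low byte of sample pos
lpm   r16, Z                ;load sample
sts   tr1_pos_lo,   ZL      ;store updated low byte of sample pos
sts   tr1_pos_frac, r17     ;store updated fractional part
lds   r17, tr1_vol          ;load mixing volume
mulsu r16, r17              ;(sample*mixing vol)
sbc   r0,r0                 ;sign extend
mov   r28,r1                ;set (sample*vol>>8) to mix buffer lsb
mov   r29,r0                ;set mix buffer msb
nop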

16 of those 27 clock cycles are for loading and storing parameters
7 of them are for the frequency divider, reading the sample and applying the volume <<< THE ACTUAL WORK
4 of them are for mixing
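
As a rough C model of what those 7 "actual work" clocks compute, here is a sketch of one wavetable step: an 8.8 fixed-point position is advanced by the step, the sample is fetched at the updated position, and the signed sample is scaled by the unsigned volume, keeping the high byte. The struct and function names are made up for illustration and only loosely mirror the tr1_* variables in the listing.

```c
#include <stdint.h>

/* Hypothetical mirror of the tr1_* variables used in the listing. */
typedef struct {
    const int8_t *wave;   /* 256-entry signed wavetable (the page held in ZH) */
    uint8_t pos_frac;     /* tr1_pos_frac */
    uint8_t pos_lo;       /* tr1_pos_lo: index into the wave */
    uint16_t step;        /* tr1_step_hi:tr1_step_lo, 8.8 fixed point */
    uint8_t vol;          /* tr1_vol, unsigned 0..255 */
} wave_channel;

/* Advance the channel by one output sample and return the scaled sample. */
static int16_t wave_channel_step(wave_channel *ch)
{
    /* add/adc: 16-bit phase accumulator; the index wraps within the table */
    uint16_t phase = (uint16_t)((ch->pos_lo << 8) | ch->pos_frac);
    phase = (uint16_t)(phase + ch->step);
    ch->pos_lo = (uint8_t)(phase >> 8);
    ch->pos_frac = (uint8_t)phase;

    /* lpm: fetch the sample at the updated position */
    int8_t sample = ch->wave[ch->pos_lo];

    /* mulsu, then keep r1: signed sample * unsigned volume, high byte only.
       Assumes >> on a negative int is an arithmetic shift, as on GCC. */
    return (int16_t)(((int16_t)sample * ch->vol) >> 8);
}
```

The sign-extended result is what ends up in r29:r28 in the listing. Everything else in the 27 clocks is moving state between RAM and registers.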

If we change the whole thing so it is part like the VSync mixer and part like the current inline mixer - we can almost double the channels. What I mean by this - is instead of processing each line separately you can bundle them up into runs of 4 lines and save samples to RAM, like the VSync mixer saves 252 samples to RAM. It will be slightly different as we will be saving pre-mixed 8 bit values for 8 channels for 4 samples (32 bytes) instead. (Maybe you can pre-mix channel pairs and make that 16 bytes; I am not sure how that maths works yet.)

Split the inline-mixer into 4 phases

Phase 1
Read 4 Samples from Channel 1 (Wave)
Write 3 Samples to RAM
Read 4 Samples from Channel 2 (Wave)
Write 3 Samples to RAM
Read Sample 1 of Channel 3,4,5,6,7 and 8 from RAM
Mix all 8 channels for sample 1

Phase 2
Read 4 Samples from Channel 3 (Wave)
Write 3 Samples to RAM
Read 4 Samples from Channel 4 (Wave)
Write 3 Samples to RAM
Read Sample 2 of Channel 1,2,5,6,7 and 8 from RAM
Mix all 8 channels for sample 2

Phase 3
Read 4 Samples from Channel 5 (Wave)
Write 3 Samples to RAM
Read 4 Samples from Channel 6 (Wave)
Write 3 Samples to RAM
Read Sample 3 of Channel 1,2,3,4,7 and 8 from RAM
Mix all 8 channels for sample 3

Phase 4
Read 4 Samples from Channel 7 (Noise)
Write 3 Samples to RAM
Read 4 Samples from Channel 8 (PCM)
Write 3 Samples to RAM
Read Sample 4 of Channel 1,2,3,4,5 and 6 from RAM
Mix all 8 channels for sample 4
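
The four phases above can be sketched in C roughly as follows. This is only a model of the scheduling, under the assumption that each channel can be stored as one pre-scaled signed 8-bit value per sample. All names (phased_mixer, gen_fn, gen_const) are hypothetical. The buffer is the 8 x 4 = 32 bytes mentioned earlier, and the mix uses a 16-bit accumulator because eight signed 8-bit values can sum to anywhere between -1024 and 1016.

```c
#include <stdint.h>

#define NUM_CH     8   /* channels in the proposal */
#define PHASES     4   /* one phase per scanline, repeating */
#define CH_PER_PH  2   /* channels refreshed in each phase */

/* Hypothetical per-channel generator: returns one volume-scaled 8-bit sample. */
typedef int8_t (*gen_fn)(void *state);

typedef struct {
    gen_fn  gen[NUM_CH];
    void   *state[NUM_CH];
    int8_t  buf[NUM_CH][PHASES];  /* the 32 bytes of buffered samples */
    uint8_t phase;                /* 0..3, the extra phase counter */
} phased_mixer;

/* Stand-in generator for experiments: returns the int8_t that state points to. */
static int8_t gen_const(void *state) { return *(int8_t *)state; }

/* Run one phase: refresh two channels, then mix one output sample. */
static int16_t phased_mixer_line(phased_mixer *m)
{
    uint8_t p = m->phase;
    for (uint8_t k = 0; k < CH_PER_PH; k++) {
        uint8_t c = (uint8_t)(p * CH_PER_PH + k);
        for (uint8_t i = 0; i < PHASES; i++) {
            /* slot (p + i) % 4 is consumed i lines from now */
            m->buf[c][(p + i) % PHASES] = m->gen[c](m->state[c]);
        }
    }
    int16_t mix = 0;  /* 16-bit accumulator over 8-bit stored samples */
    for (uint8_t c = 0; c < NUM_CH; c++)
        mix = (int16_t)(mix + m->buf[c][p]);
    m->phase = (uint8_t)((p + 1) % PHASES);
    return mix;
}
```

Starting from a zeroed struct, the first three lines mix in silence for channels that have not been refreshed yet. From the fourth line on, every channel contributes a sample on every line, even though only two channels touch their parameters per line.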

There is a little more overhead and RAM usage from reading and writing the other 3 phases' samples each time, but you save a lot on those 16 clocks of reading/saving the channel parameters.


Other things to consider to fully flesh this out

PCM Currently takes 45 clocks - but if I make that a fixed freq-non-wrapping channel it can be ~18 clocks
Noise also takes 27 clocks but has less LDS/STS fat in it (so may only be able to share with PCM)
There will be about another 20 clocks of overhead pushing and popping more stack
There will be more overhead (and a counter) deciding which phase to run each time "update sound" is called (guess at 20 clocks also)
There will be 32+ bytes extra RAM used and I will have to find that somewhere. At present I have 36 bytes free in T2K below object store
There will be 600 words extra flash taken up which is about 2x what I currently have free in T2K
I don't fully understand the mixing maths. It seems to be 16 bit math, but the samples are only 8 bits. So this might be a show stopper if I can't just save an 8 bit value for the samples.
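
To get a feel for whether the numbers above could add up, here is a back-of-envelope budget in C. The 16/7/4 split and the two 20-clock overhead guesses come from this post. The 2 clocks per RAM write and per RAM read are assumptions based on typical AVR ST/LD timing, and treating all 8 channels like a 27-clock wave channel is a simplification (noise and the fixed PCM channel would differ). Treat the result as a rough estimate only.

```c
/* Figures taken from the post */
enum {
    CLK_PARAMS = 16,  /* LDS/STS of channel parameters */
    CLK_WORK   = 7,   /* step, fetch sample, apply volume */
    CLK_MIX    = 4,   /* mixing */
    CLK_STACK  = 20,  /* guessed extra push/pop overhead */
    CLK_PHASE  = 20   /* guessed phase-select overhead */
};

/* Assumptions, not from the post */
enum {
    CLK_RAM_WR = 2,   /* one store per buffered sample */
    CLK_RAM_RD = 2    /* one load per buffered sample */
};

/* Current scheme: every channel pays the full 27 clocks on every line. */
static int clocks_inline(int channels)
{
    return channels * (CLK_PARAMS + CLK_WORK + CLK_MIX);
}

/* Proposed scheme: per line, 2 channels generate 4 samples each and
   write 3 of them, 6 other channels are read back, all 8 are mixed. */
static int clocks_phased(void)
{
    int generate = 2 * (CLK_PARAMS + 4 * CLK_WORK + 3 * CLK_RAM_WR);
    int fetch    = 6 * CLK_RAM_RD;
    int mix      = 8 * CLK_MIX;
    return generate + fetch + mix + CLK_STACK + CLK_PHASE;
}
```

Under these assumptions clocks_inline(8) comes to 216 and clocks_phased() to 184, so the phased layout would save about 32 clocks per line even after paying the two guessed overheads.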
D3thAdd3r
Posts: 3222
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: Tempest is possible

Post by D3thAdd3r »

I think I get the basics of your idea. It seems that this should work out OK for the normal HSYNC period that constantly interrupts user code as well?

Pretty intense! You could do some really amazing music with this concept as well.
Jubatian
Posts: 1564
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Tempest is possible

Post by Jubatian »

Interesting ideas! I didn't yet look into what happens when one calls update_sound, only digging into it when I noticed how it also uses the T flag (got some nasty bug from that; it's not the register usage which is the problem, rather the occasional lack of correct documentation for it). I saw there is a lot of housekeeping in there (stack and load/store), nice that someone is thinking about how to use those cycles more efficiently!

Just some general ideas for making it fit with other video modes. Of course register usage on the interface should stay (since hsync_pulse, in which it is, is called by video modes which usually also use all the regs they can). More important are the cycle counts, currently 213 at most (due to line 369 in uzeboxVideoEngineCore.s it is not possible to have both UART and 5ch). You should also be careful with the stack: the video already uses a considerable stack frame (saving all the registers), and the audio pushes stuff onto that (a bit OFF: if in a game you could make sure that the game logic using a deep stack can not be interrupted by video, you can also gain a considerable amount of RAM, like 40 bytes or so, because you use less stack).

Just some ramblings. Keep going, hope it can be accomplished! :)
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Tempest is possible

Post by CunningFellow »

Uze,

Am I correct in thinking that the Wave Tables default to include all 10 waves even if they are not all used?

The patches D3thAdd3r and yourself have given me only use 7 of the 10 waves in the default include file

That would mean if I did a custom "sounds.inc" I could save 768 bytes of FLASH yes ?

Also the "Align 8" could waste up to 255 bytes of flash if the linker does not put it at the absolute end. In my latest compile the HEX and LST files show it wasting 252 bytes.

If I use a "section" to align it manually I should be able to save that 252 bytes from no-mans-land.

If all this is true/correct then I think I can get almost 1K of flash free again and I am in business with the new custom sound mixer.
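
The arithmetic behind that "almost 1K" can be checked with a small C sketch. It assumes each wave is 256 bytes and that the alignment in question is to a 256-byte boundary, which is what the "up to 255 bytes" figure implies. The function names and the example address are illustrative only.

```c
#include <stdint.h>

#define WAVE_SIZE 256u  /* bytes per wavetable (assumed) */

/* Bytes of padding needed to bring addr up to the next multiple of
   align. align must be a power of two. */
static uint32_t align_padding(uint32_t addr, uint32_t align)
{
    return (align - (addr & (align - 1u))) & (align - 1u);
}

/* Flash recovered by dropping unused waves from a custom table. */
static uint32_t wave_savings(uint32_t total_waves, uint32_t used_waves)
{
    return (total_waves - used_waves) * WAVE_SIZE;
}
```

Dropping 3 of the 10 waves gives 768 bytes, and a table that would otherwise start 4 bytes past a 256-byte boundary accounts for 252 bytes of padding, for 1020 bytes in total.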