Inline Mixer - saving clocks
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
Uze, What is the story with GPIOR0 ?
Did you decide to use that in the sound engine somehow ?
That could give me 2 extra clocks to play with.
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
Uze, aside from the above question about GPIOR0, does this look correct?
Your code here (the first block below), I think, checks the 16-bit mixing register:
less than -128: the result will end up 0
between -128 and +127: the result is adjusted to be 0..255
greater than 127: the result is 255
It takes 13 clocks.
I think the second block below does the same in 9 clocks, and could possibly be 8 clocks if Z is favorable to us and the TST is redundant (can't think too straight now).
Does this look right?
Code:
;final processing
;clip
clr r0
cpi r28,128 ;> 127?
cpc r29,r0 ;0
brlt .+2
ldi r28,127
dec r0
cpi r28,-128; <-128?
cpc r29,r0 ;0xff
brge .+2
ldi r28,-128
subi r28,128 ;convert to unsigned
sts _SFR_MEM_ADDR(OCR2A),r28 ;output sound byte
Code:
subi r28, 0x80
sbci r29, 0xFF
brcs .+2
ldi r28, 0x00
tst r29
breq .+2
ldi r29, 0xFF
sts _SFR_MEM_ADDR(OCR2A),r28
Re: Inline Mixer - saving clocks
I want to reserve GPIOR0 for kernel use, perhaps for crazy optimizations like this. I'm ok to use it in the mixer.
The code is really there to clamp/saturate the 16-bit accumulator in r28:r29 to a signed 8-bit value (then make it unsigned).
But yeah, the end result, accounting for the sign conversion, is right. I can't wrap my head around the flag logic right now; I'd need to validate it in the simulator.
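For anyone following along, the clamp-then-convert step described here can be modelled in a few lines of C. This is only a sketch of the intended behaviour, not the kernel code; the function name `clip_to_unsigned8` is made up for illustration.

```c
#include <stdint.h>

/* Saturate a signed 16-bit mix accumulator to the signed 8-bit range,
   then shift it into the unsigned 0..255 range written to the PWM register.
   Hypothetical helper name, used only for this illustration. */
uint8_t clip_to_unsigned8(int16_t acc)
{
    if (acc > 127)  acc = 127;     /* clip positive overflow */
    if (acc < -128) acc = -128;    /* clip negative overflow */
    return (uint8_t)(acc + 128);   /* -128..127 becomes 0..255 */
}
```

So -129 and everything below it comes out as 0, 0 comes out as 128, and +128 and everything above it comes out as 255.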
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
Got it down to 8 clocks in the end (saving 5 clocks)
Code:
; r28:29 contains the 16 bit signed value for the mixing accumulator
;
; after adding all the channels together the min/max legal values are -128 and +127
; any values outside this range are illegal and need to be clipped to the maximum
; allowed values
;
;
; examples
;                    +128   +128
;     Dec    Hex     Dec    Hex
;
;    -129   FF7F      -1   FFFF   << Illegal value and must be clipped to 0
;
;    -128   FF80       0   0000   << Legal values all have 0x00 as the high byte
;      -1   FFFF    +127   007F   <<
;       0   0000    +128   0080   << So we can test for legal values by comparing
;      +1   0001    +129   0081   << the high byte with 0x00
;    +127   007F    +255   00FF   <<
;
;    +128   0080    +256   0100   << Illegal value and must be clipped to 255
subi r28, 0x80 ; Convert to (8 bit) unsigned by adding 128
sbci r29, 0xFF ; and continue with the 16 bit math to get high byte for "0x00 test"
cpi r29, 0x00 ; if the hi byte is 0x00
breq No_Over_Under_Flow ; then there has been no underflow or overflow during mixing
; and branch "out of band" for the legal cases
cpi r29, 0x80 ; otherwise work out if it was an underflow (leave 0 in C)
; or an overflow (leave 1 in C)
sbc r28, r28 ; and use carry to set r28 to either 0x00 (underflow) or 0xFF (overflow)
sts _SFR_MEM_ADDR(OCR2A), r28 ; output the sound sample to the PWM generator
ret
No_Over_Under_Flow:
nop ; equalize clocks for two paths
sts _SFR_MEM_ADDR(OCR2A), r28 ; output the sound sample to the PWM generator
ret
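The flag logic in the 8-clock version can be checked off-target with a small C model. This is a sketch under my own assumptions: the function names are hypothetical, the carry is modelled by hand, and it only covers accumulator values that do not wrap when 128 is added (up to +32639), which is far beyond the range in the table above.

```c
#include <stdint.h>

/* Straightforward clamp, used as the reference behaviour. */
uint8_t clip_reference(int16_t acc)
{
    if (acc > 127)  acc = 127;
    if (acc < -128) acc = -128;
    return (uint8_t)(acc + 128);
}

/* Model of the add-128-then-test-high-byte approach. */
uint8_t clip_high_byte_test(int16_t acc)
{
    /* subi/sbci pair: add 128 with 16-bit wraparound */
    uint16_t biased = (uint16_t)((uint16_t)acc + 0x0080u);
    uint8_t lo = (uint8_t)(biased & 0xFFu);
    uint8_t hi = (uint8_t)(biased >> 8);

    if (hi == 0x00)
        return lo;                  /* legal value: low byte is already 0..255 */

    /* cpi r29,0x80 sets carry when hi < 0x80 (unsigned), i.e. positive overflow */
    uint8_t carry = (uint8_t)(hi < 0x80);

    /* sbc r28,r28 computes r28 - r28 - C, giving 0x00 or 0xFF */
    return (uint8_t)(0u - carry);
}
```

Comparing the two functions over every input from -32768 to +32639 is a cheap way to confirm the carry trick before trying it in the simulator.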
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
Oh - and I wrote some test code to see it works OK
the test cases for r28:29 are on the left side and range from -897 to +890
The values in SREG after the SUBI and SBCI are in the middle
The modified values in R28:29 are on the far right with the last two digits showing the expected result (I tested it with the original code and got same result for r28)
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
So adding the savings up now
1 Nop
3 Hsync
7 LFSR
4 Wave Channels
5 Final Mixing
4 One less push/pop
3 Changing PCM for Dumb PCM + Normal Wave
27 clocks
I am now only 5 clocks shy of a 2nd noise channel
Using GPIOR0 could save another 2 clocks.
The only spot I have not looked at to optimize yet is the old full featured PCM. If I could save enough clocks off it then I could have
3 wave
2 noise
1 PCM (that doubles as a wave channel when not PCM-ing)
which, seeing as D3thAdd3r pointed out that "play" only happens in quiet music parts, is quite OK.
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
Well - assuming I have my dissection of the full featured PCM channel correct - then there are no clock savings I can see in it.
Can't win them all
I did find savings of
5% on the Wave channels
18% on the noise channel
38% on the final mixing
I'll call it a day here till I decide if I am going to go with
wave, noise, dumb PCM
3+2+1
5+1+1 (with the dumb PCM being able to be switched to a 6th wave channel)
6+0+1 (with the dumb PCM being able to be switched to a 7th wave channel)
Code:
;channel 5 PCM -- 45 cycles
;
; pass n-3 S...................E...... << Keep adding divider to sample position
; pass n-2 ........S...........E......
; pass n-1 ................S...E......
; pass n ....N...............E....I. << When sample position is past end
; <--------------------| << Then subtract sample repeat length rather
; than just resetting to sample start
; to ensure pitch is continuous
;
; S = Sample position
; E = End of PCM sample
; I = Invalid sample position past end of PCM sample
; N = New sample position after subtracting sample length
;
; C equivalent of this logic
;
; if(samplePosition >= sampleEnd) samplePosition -= loopLength;
; Add fractional part -
lds r16, tr5_pos_frac ; Get the fractional part of the 16:8 bit sample position
lds r17, tr5_step_lo ; Get the fractional 8 bits of the 8:8 divider
add r16, r17 ; Add the two fractional parts together setting CARRY
sts tr5_pos_frac, r16 ; Save the fractional part of the sample position
; Add lo 8 bits -
lds ZL, tr5_pos_lo ; Get the lo whole 8 bits of the 16:8 bit sample position
lds r17, tr5_step_hi ; Get the whole 8 bits of the 8:8 divider
adc ZL, r17 ; Add the two whole bits together with carry from fractional add
; Add hi 8 bits
lds ZH, tr5_pos_hi ; Get the hi whole 8 bits of the 16:8 bit sample position
ldi r16, 0 ;
adc ZH, r16 ; Add the carry bit to the hi 8 bits
; At this stage ZL:ZH contains the current sample position
movw r16, ZL ; Make a copy of the current sample position to R16:r17
lds r0, tr5_loop_len_lo ; Get the 16 bit value for the loop length in r0:r1
lds r1, tr5_loop_len_hi
sub r16, r0 ; subtract the length of the "loop" from the copy of current
sbc r17, r1 ; position to give the loop_repeat position
lds r0, tr5_loop_end_lo ; Get the end position of the PCM sample
lds r1, tr5_loop_end_hi
cp ZL, r0 ; Compare the current position to the end position
cpc ZH, r1
brlo .+2 ; If the current position is past the end position
movw ZL, r16 ; then overwrite the current position with the loop_repeat
; position calculated above
sts tr5_pos_lo, ZL ; Save the current position
sts tr5_pos_hi, ZH
lpm r16, Z ; load sample
lds r17, tr5_vol ; Get the channel volume
mulsu r16, r17 ; R1 = (sample*mixing vol) >> 8
sbc r0, r0 ; sign extend 8 bit R1 to 16 bits (MULSU leaves C = MSB of result)
add r28, r1 ; add low byte of (sample*vol>>8) to mixing accumulator
adc r29, r0 ; add high byte to mixing accumulator
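As a cross-check of the dissection, here is a rough C model of what the channel does each sample: step a 16.8 fixed-point position by an 8.8 step, wrap by the loop length once the position reaches the loop end, and scale the sample by the volume. The struct, field and function names are my own stand-ins for the tr5_* variables, not kernel definitions.

```c
#include <stdint.h>

/* Hypothetical mirror of the tr5_* state used by the assembly. */
typedef struct {
    uint16_t pos;       /* whole sample position (tr5_pos_hi:tr5_pos_lo) */
    uint8_t  pos_frac;  /* fractional part of the position (tr5_pos_frac) */
    uint16_t step;      /* 8.8 step: high byte whole, low byte fraction */
    uint16_t loop_len;  /* tr5_loop_len_hi:lo */
    uint16_t loop_end;  /* tr5_loop_end_hi:lo */
    uint8_t  vol;       /* tr5_vol */
} PcmChannel;

/* Advance the position; returns the new whole sample position. */
uint16_t pcm_advance(PcmChannel *ch)
{
    uint32_t p = ((uint32_t)ch->pos << 8) | ch->pos_frac;  /* 16.8 value */
    p += ch->step;                                         /* add/adc/adc chain */
    ch->pos_frac = (uint8_t)p;
    uint16_t pos = (uint16_t)(p >> 8);
    if (pos >= ch->loop_end)                 /* brlo skips the wrap when pos < end */
        pos = (uint16_t)(pos - ch->loop_len);
    ch->pos = pos;
    return pos;
}

/* mulsu + sbc: signed sample times unsigned volume, high byte kept and
   sign-extended, i.e. floor(sample * vol / 256). */
int16_t pcm_scale(int8_t sample, uint8_t vol)
{
    int product = (int)sample * (int)vol;
    if (product >= 0)
        return (int16_t)(product >> 8);
    return (int16_t)(-((-product + 255) >> 8));  /* floor for negative values */
}
```

Subtracting the loop length instead of jumping back to a fixed start keeps the fractional overshoot, which is what keeps the pitch continuous across the loop point.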
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
Next idea that has arisen from the T2K thread.
Sure yeah - I got the LFSR noise channel down from 39 clocks to 32 clocks. But that just wasn't good enough
How about we try to make "noise" a different way?
The LFSR was probably used in AY-3 and older synthesisers because it is very very little silicon. Back in those days every gate counted.
D3thAdd3r had the idea of a noise "waveform" so you could just reuse a wave channel as noise.
I am now thinking - what if we could find a way to make "noise" that throws tables and/or MUL at the problem that can come in at 24 clocks or less.
In fact, if we can make "noise" for 20 clocks or less then we can just have "7 channels" and on the fly the sequencer can decide "wave, noise or dumb-pcm"
Re: Inline Mixer - saving clocks
Using a wave channel for noise can work, but only for metallic-sounding effects. This is due to the short repeating sequence that comes from having only 8 bits. You won't be able to do good-sounding cymbals, hats and such. Another option I once thought of back then is to just read from some arbitrary point in flash. Code tends to sound relatively random, but graphics or other data will not. Plus, depending on the compilation, it will always sound a bit different, so not good at all. So that's why I went for a 7/15-bit LFSR. The sound palette is richer and more varied for the relatively small footprint in CPU and code.
But with a custom mixer like you are doing, perhaps two noise channels with the constraint of not being multimode could save cycles? Say one being 7 bits and the other 15 bits. Or both being 15 bits and using a wave channel for 7-bit-sounding effects?
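To put numbers on the sequence-length point, here is a small C sketch of a right-shifting LFSR in 7-bit and 15-bit widths. The tap choice (bit 0 XOR bit 1 fed into the top bit) is an assumption made for illustration and is not taken from the kernel source; the function names are hypothetical too.

```c
#include <stdint.h>

/* One step of a right-shifting LFSR of the given width (7 or 15 bits).
   Feedback is bit0 XOR bit1, shifted into the top bit.
   The taps are an assumption for this sketch. */
uint16_t lfsr_step(uint16_t state, unsigned width)
{
    uint16_t feedback = (uint16_t)((state ^ (state >> 1)) & 1u);
    state >>= 1;
    state |= (uint16_t)(feedback << (width - 1));
    return (uint16_t)(state & ((1u << width) - 1u));
}

/* Count steps until the register returns to its (non-zero) seed. */
uint32_t lfsr_period(uint16_t seed, unsigned width)
{
    uint16_t s = seed;
    uint32_t n = 0;
    do {
        s = lfsr_step(s, width);
        ++n;
    } while (s != seed);
    return n;
}
```

With these taps the 7-bit register repeats after 127 steps and the 15-bit one after 32767, which lines up with the short mode sounding metallic and the long mode sounding like proper noise.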
-
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Inline Mixer - saving clocks
How often do noise patches have a divider of 0?
Would it be OK to have one of the two noise channels not able to have a divider of zero?