Inline Mixer - saving clocks

CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Inline Mixer - saving clocks

Post by CunningFellow »

Uze, what is the story with GPIOR0?

Did you decide to use it in the sound engine somehow?

That could give me 2 extra clocks to play with.
CunningFellow

Post by CunningFellow »

Uze, aside from the question above about GPIOR0, does this look correct?

I think your code here is checking whether the 16-bit mixing register is:

less than -128, in which case the result ends up 0
between -128 and +127, in which case the result is adjusted to 0..255
greater than +127, in which case the result is 255
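For reference, the three cases above amount to a saturating clamp followed by a +128 offset. Here is a minimal C sketch of that intended behaviour, useful as a yardstick when checking the asm; the function names are hypothetical and not from the Uzebox kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Intended result of the final mixing step (hypothetical helper).
 * acc is the 16-bit signed mixing accumulator (r28:r29 in the asm).
 * Below -128 -> 0, above +127 -> 255, otherwise acc + 128 (0..255). */
uint8_t clip_to_pwm(int16_t acc)
{
    if (acc < -128)
        return 0;
    if (acc > 127)
        return 255;
    return (uint8_t)(acc + 128);
}

/* Spot checks of the three cases and both boundaries. */
void clip_to_pwm_examples(void)
{
    assert(clip_to_pwm(-129) == 0);    /* underflow, clipped  */
    assert(clip_to_pwm(-128) == 0);    /* lowest legal value  */
    assert(clip_to_pwm(0)    == 128);  /* midpoint (silence)  */
    assert(clip_to_pwm(127)  == 255);  /* highest legal value */
    assert(clip_to_pwm(128)  == 255);  /* overflow, clipped   */
}
```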

Code:

	;final processing

	;clip
	clr r0
	cpi r28,128	;> 127?
	cpc r29,r0 ;0	
	brlt .+2
	ldi r28,127
	
	dec r0
	cpi r28,-128; <-128?
	cpc r29,r0 ;0xff
	brge .+2
	ldi r28,-128

	subi r28,128	;convert to unsigned		
	sts _SFR_MEM_ADDR(OCR2A),r28 ;output sound byte
It takes 13 clocks

Code:

subi  r28, 0x80
sbci  r29, 0xFF
brcs  .+2
ldi   r28, 0x00
tst   r29
breq  .+2
ldi   r29, 0xFF
sts   _SFR_MEM_ADDR(OCR2A),r28
I think the above does the same in 9 clocks, and it could possibly be 8 clocks if Z is favorable to us and the TST is redundant (can't think too straight now).
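On the flag question, here is a rough C model of what SUBI/SBCI leave in r28:r29, C and Z, written from the flag rules in the AVR instruction set manual (the struct and function names are made up for illustration). If the model is right, Z after SBCI is only set when the whole 16-bit result is zero (acc = -128), so it can't stand in for the TST on r29. Also, C is set for every non-negative accumulator and for anything below -128, so carry alone can't separate legal values from illegal ones:

```c
#include <assert.h>
#include <stdint.h>

/* Registers and flags after "subi r28,0x80 / sbci r29,0xFF" (hypothetical model).
 * Flag rules as given in the AVR instruction set manual:
 *   SUBI: C = 1 if K > Rd (unsigned), Z = 1 if the result is 0
 *   SBCI: C = 1 if K + C > Rd (unsigned),
 *         Z keeps its previous value if the result is 0, else it is cleared */
typedef struct {
    uint8_t lo, hi;  /* r28, r29 */
    int c, z;        /* carry and zero flags after SBCI */
} subi_sbci_t;

subi_sbci_t subi_sbci_add128(int16_t acc)
{
    uint8_t lo = (uint8_t)((uint16_t)acc & 0xFF);
    uint8_t hi = (uint8_t)((uint16_t)acc >> 8);
    subi_sbci_t r;

    /* subi r28, 0x80 */
    int c = 0x80 > lo;
    r.lo = (uint8_t)(lo - 0x80);
    int z = (r.lo == 0);

    /* sbci r29, 0xFF (also subtracts the carry from SUBI) */
    r.c  = (0xFF + c) > hi;
    r.hi = (uint8_t)(hi - 0xFF - c);
    if (r.hi != 0)
        z = 0;
    r.z = z;
    return r;
}

void subi_sbci_examples(void)
{
    subi_sbci_t r = subi_sbci_add128(-1);  /* legal: becomes 0x007F */
    assert(r.hi == 0x00 && r.lo == 0x7F);
    assert(r.z == 0);                      /* Z is clear even though r29 == 0 */

    r = subi_sbci_add128(0);               /* legal: becomes 0x0080 */
    assert(r.hi == 0x00 && r.c == 1);      /* yet carry is set */
}
```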

Does this look right ?
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Post by uze6666 »

I want to reserve GPIOR0 for kernel use, perhaps for crazy optimizations like this. :) I'm OK with using it in the mixer.

That code is really there to clamp/saturate the 16-bit accumulator in r28:r29 to a signed 8-bit value (and then make it unsigned).

But yeah, accounting for the sign conversion, the end result is right. I can't wrap my head around the flag logic right now; I'd need to validate it in the simulator.
CunningFellow

Post by CunningFellow »

Got it down to 8 clocks in the end (saving 5 clocks)

Code:

; r28:r29 contains the 16-bit signed value of the mixing accumulator
; 
; after adding all the channels together the min/max legal values are -128 and +127
; any values outside this range are illegal and need to be clipped to the nearest
; allowed value (-128 or +127)
;
;
; examples
;             +128  +128
;  Dec   Hex   Dec   Hex
;
; -129  FF7F    -1  FFFF  << Illegal value and must be clipped to 0
;
; -128  FF80     0  0000  << Legal values all have 0x00 as the high byte
;   -1  FFFF  +127  007F  << 
;    0  0000  +128  0080  << So we can test for legal values by comparing
;   +1  0001  +129  0081  << the high byte with 0x00
; +127  007F  +255  00FF  <<
;
; +128  0080  +256  0100  << Illegal value and must be clipped to 255


    subi   r28, 0x80                    ; Convert to (8 bit) unsigned by adding 128
    sbci   r29, 0xFF                    ; and continue with the 16 bit math to get high byte for "0x00 test"

    cpi    r29, 0x00                    ; if the hi byte is 0x00
    breq   No_Over_Under_Flow           ; then there has been no underflow or overflow during mixing
                                        ; and branch "out of band" for the legal cases

    cpi    r29, 0x80                    ; otherwise work out if it was an underflow (leave 0 in C)
                                        ;                           or an overflow  (leave 1 in C)
    sbc    r28, r28                     ; and use carry to set r28 to either 0x00 (underflow) or 0xFF (overflow)

    sts    _SFR_MEM_ADDR(OCR2A), r28    ; output the sound sample to the PWM generator
    ret

No_Over_Under_Flow:
    nop                                 ; equalize clocks for two paths
    sts    _SFR_MEM_ADDR(OCR2A), r28    ; output the sound sample to the PWM generator
    ret
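A line-by-line C model of the 8-clock version, handy for checking the branch logic off-target. It is a sketch: clip8 is a made-up name, and the model assumes the accumulator never goes above +32639. Above that, adding 128 wraps into the sign bit and the code would treat the value as an underflow, but 8-bit channel mixing shouldn't come anywhere near that:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the 8-clock clip, one step per instruction.
 * Valid for accumulators in -32768..+32639. */
uint8_t clip8(int16_t acc)
{
    /* subi r28,0x80 / sbci r29,0xFF: subtracting 0xFF80 == adding 128 */
    uint16_t v  = (uint16_t)((uint16_t)acc + 0x80u);
    uint8_t  lo = (uint8_t)(v & 0xFF);   /* r28 */
    uint8_t  hi = (uint8_t)(v >> 8);     /* r29 */

    /* cpi r29,0x00 / breq: high byte 0x00 means the value was legal */
    if (hi == 0x00)
        return lo;

    /* cpi r29,0x80: C = 1 when hi < 0x80 (overflow), C = 0 otherwise (underflow) */
    unsigned carry = (hi < 0x80) ? 1u : 0u;

    /* sbc r28,r28: r28 - r28 - C gives 0x00 or 0xFF */
    return (uint8_t)(0u - carry);
}

void clip8_examples(void)
{
    assert(clip8(-129) == 0x00);  /* underflow */
    assert(clip8(-128) == 0x00);
    assert(clip8(0)    == 0x80);
    assert(clip8(127)  == 0xFF);
    assert(clip8(128)  == 0xFF);  /* overflow */
}
```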

CunningFellow

Post by CunningFellow »

Oh, and I wrote some test code to check that it works OK:
[Image: MixSimulate.png]
The test cases for r28:r29 are on the left side and range from -897 to +890.

The values in SREG after the SUBI and SBCI are in the middle.

The modified values in r28:r29 are on the far right, with the last two digits showing the expected result (I tested it with the original code and got the same result for r28).
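The same cross-check can be done off-target. Below is a sketch that models the original 13-clock compare/ldi sequence and the new 8-clock sequence in C, then counts disagreements over the -897..+890 range mentioned in the post. All function names are hypothetical. Note that the original sequence never fixes up r29 after ldi r28,127; this is harmless because the second compare still sees a value >= -128:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the original 13-clock sequence (hypothetical name). */
uint8_t clip13(int16_t acc)
{
    uint8_t lo = (uint8_t)((uint16_t)acc & 0xFF);  /* r28 */
    uint8_t hi = (uint8_t)((uint16_t)acc >> 8);    /* r29 */

    /* cpi r28,128 / cpc r29,r0 (r0 = 0) / brlt: signed compare with +128 */
    if (!(acc < 128))
        lo = 127;                                  /* ldi r28,127 (r29 untouched) */

    /* cpi r28,-128 / cpc r29,r0 (r0 = 0xFF) / brge: signed compare with -128 */
    int16_t v = (int16_t)(uint16_t)((hi << 8) | lo);
    if (!(v >= -128))
        lo = 0x80;                                 /* ldi r28,-128 */

    return (uint8_t)(lo - 0x80);                   /* subi r28,128 */
}

/* Model of the new 8-clock sequence (hypothetical name). */
uint8_t clip8(int16_t acc)
{
    uint16_t v  = (uint16_t)((uint16_t)acc + 0x80u);  /* subi / sbci */
    uint8_t  hi = (uint8_t)(v >> 8);
    if (hi == 0x00)                                   /* cpi / breq */
        return (uint8_t)(v & 0xFF);
    return (hi < 0x80) ? 0xFF : 0x00;                 /* cpi r29,0x80 / sbc r28,r28 */
}

/* Count accumulator values in [from, to] where the two sequences disagree. */
int count_mismatches(int from, int to)
{
    int bad = 0;
    for (int a = from; a <= to; a++)
        if (clip13((int16_t)a) != clip8((int16_t)a))
            bad++;
    return bad;
}

void mismatch_example(void)
{
    assert(count_mismatches(-897, 890) == 0);  /* range from the post */
}
```

Running the sweep over the full 16-bit range, the two models only part ways above +32639, where the new sequence's +128 wraps into the sign bit.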
CunningFellow

Post by CunningFellow »

So, adding the savings up now:

1 Nop
3 Hsync
7 LFSR
4 Wave Channels
5 Final Mixing
4 One less push/pop
3 Changing PCM for Dumb PCM + Normal Wave

27 clocks
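The tally checks out. A trivial C version follows. It assumes the 32-clock figure for the optimized LFSR noise channel, quoted later in the thread, is what a second noise channel would cost; that link is a reading of the thread, not something the post states:

```c
#include <assert.h>

/* Clock savings from the list above (CPU clocks). */
static const int savings[] = {
    1,  /* Nop */
    3,  /* Hsync */
    7,  /* LFSR */
    4,  /* Wave channels */
    5,  /* Final mixing */
    4,  /* One less push/pop */
    3,  /* Dumb PCM + normal wave instead of full PCM */
};

int total_savings(void)
{
    int total = 0;
    for (unsigned i = 0; i < sizeof savings / sizeof savings[0]; i++)
        total += savings[i];
    return total;
}

void savings_example(void)
{
    const int noise_channel_cost = 32;  /* assumption: optimized LFSR channel cost */
    assert(total_savings() == 27);
    assert(noise_channel_cost - total_savings() == 5);        /* "5 clocks shy" */
    assert(noise_channel_cost - (total_savings() + 2) == 3);  /* with GPIOR0 */
}
```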

I am now only 5 clocks shy of a 2nd noise channel

Using GPIOR0 could save another 2 clocks.

The only spot I have not looked at optimizing yet is the old full-featured PCM. If I could save enough clocks off it, then I could have:

3 wave
2 noise
1 PCM (that doubles as a wave channel when not PCM-ing)

which is quite OK, seeing as D3thAdd3r pointed out that "play" only happens in quiet parts of the music.
CunningFellow

Post by CunningFellow »

Well, assuming my dissection of the full-featured PCM channel is correct, there are no clock savings I can see in it.

Can't win them all :(

I did find savings of

5% on the Wave channels
18% on the noise channel
38% on the final mixing
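The noise and final-mixing percentages follow from clock counts given in the thread: 39 down to 32 clocks for the LFSR noise channel, and 13 down to 8 for the final clip. Here is a quick C check. The wave-channel base count isn't stated, so that 5% isn't reproduced:

```c
#include <assert.h>

/* Percentage of clocks saved, rounded to the nearest whole percent. */
int percent_saved(int before, int after)
{
    assert(before > 0 && after >= 0 && after <= before);
    return (int)(100.0 * (before - after) / before + 0.5);
}

void percent_examples(void)
{
    assert(percent_saved(39, 32) == 18);  /* LFSR noise channel */
    assert(percent_saved(13, 8)  == 38);  /* final mixing/clip  */
}
```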

I'll call it a day here until I decide whether I am going to go with:

wave, noise, dumb PCM
3+2+1
5+1+1 (with the dumb PCM being able to be switched to a 6th wave channel)
6+0+1 (with the dumb PCM being able to be switched to a 7th wave channel)

Code:

;channel 5 PCM -- 45 cycles
;
; pass n-3  S...................E......   << Keep adding divider to sample position
; pass n-2  ........S...........E......
; pass n-1  ................S...E......
; pass n    ....N...............E....I.   << When sample position is past end
;               <--------------------|    << Then subtract sample repeat length rather
;                                            than just resetting to sample start
;                                            to ensure pitch is continuous
;
; S = Sample position
; E = End of PCM sample
; I = Invalid sample position past end of PCM sample
; N = New sample position after subtracting sample length
;
; C equivalent of this logic
;
; if(samplePosition >= sampleEnd) samplePosition -= loopLength;

                            ; Add fractional part -
lds   r16, tr5_pos_frac     ; Get the fractional part of the 16:8 bit sample position
lds   r17, tr5_step_lo      ; Get the fractional 8 bits of the 8:8 divider
add   r16, r17              ; Add the two fractional parts together setting CARRY
sts   tr5_pos_frac, r16     ; Save the fractional part of the sample position

                            ; Add lo 8 bits -
lds   ZL, tr5_pos_lo        ; Get the lo whole 8 bits of the 16:8 bit sample position
lds   r17, tr5_step_hi      ; Get the whole 8 bits of the 8:8 divider
adc   ZL, r17               ; Add the two whole bits together with carry from fractional add

                            ; Add hi 8 bits
lds   ZH, tr5_pos_hi        ; Get the hi whole 8 bits of the 16:8 bit sample position
ldi   r16, 0                ;
adc   ZH, r16               ; Add the carry bit to the hi 8 bits

                            ; At this stage ZL:ZH contains the current sample position

movw  r16, ZL               ; Make a copy of the current sample position to r16:r17
lds   r0, tr5_loop_len_lo   ; Get the 16 bit value for the loop length in r0:r1
lds   r1, tr5_loop_len_hi	
sub   r16, r0               ; subtract the length of the "loop" from the copy of current
sbc   r17, r1               ;     position to give the loop_repeat position

lds   r0, tr5_loop_end_lo   ; Get the end position of the PCM sample
lds   r1, tr5_loop_end_hi
cp    ZL, r0                ; Compare the current position to the end position
cpc   ZH, r1
brlo  .+2                   ; If the current position is at or past the end position
movw  ZL, r16               ; then overwrite the current position with the loop_repeat
                            ;     position calculated above

sts   tr5_pos_lo, ZL        ; Save the current position
sts   tr5_pos_hi, ZH

lpm   r16, Z                ; load sample

lds   r17, tr5_vol          ; Get the channel volume
mulsu r16, r17              ; R1 = (sample*mixing vol) >> 8
sbc   r0,  r0               ; sign extend 8 bit R1 to 16 bits (MULSU leaves C = MSB of result)
add   r28, r1               ; add low  byte of (sample*vol>>8) to mixing accumulator
adc   r29, r0               ; add high byte to mixing accumulator
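For readers following the asm, here is a rough C model of the same channel. It advances a 16:8 position by an 8:8 step, pulls the position back by the loop length once it reaches the loop end, then scales the sample with a signed-by-unsigned multiply and adds the sign-extended high byte to the accumulator. The struct and function names are hypothetical (the fields just mirror the tr5_* variables), and the sample fetch from flash (lpm) is left out. Because brlo only skips the movw while the position is below tr5_loop_end, the wrap happens when the position is at or past the end:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PCM channel state; fields mirror the tr5_* variables. */
typedef struct {
    uint8_t  pos_frac;  /* tr5_pos_frac: fractional 8 bits of the 16:8 position */
    uint16_t pos;       /* tr5_pos_hi:lo: whole 16 bits of the position        */
    uint16_t step;      /* tr5_step_hi:lo: 8:8 fixed-point step                */
    uint16_t loop_len;  /* tr5_loop_len_hi:lo */
    uint16_t loop_end;  /* tr5_loop_end_hi:lo */
} pcm_channel_t;

/* Advance the position and wrap it; returns the new whole position (Z). */
uint16_t pcm_advance(pcm_channel_t *ch)
{
    unsigned frac = ch->pos_frac + (ch->step & 0xFFu);  /* add r16,r17 */
    ch->pos_frac = (uint8_t)frac;                       /* carry is bit 8 */

    /* adc ZL,r17 / adc ZH,0: add the whole part of the step plus the carry */
    uint16_t z = (uint16_t)(ch->pos + (ch->step >> 8) + (frac >> 8));

    /* cp/cpc then brlo: the wrap is skipped only while z < loop_end */
    if (z >= ch->loop_end)
        z = (uint16_t)(z - ch->loop_len);               /* movw ZL,r16 */

    ch->pos = z;
    return z;
}

/* mulsu + sbc + add/adc: add (sample * vol) >> 8, sign extended, to acc. */
int16_t pcm_mix(int16_t acc, int8_t sample, uint8_t vol)
{
    int16_t prod = (int16_t)(sample * vol);                /* r1:r0 */
    int8_t  r1   = (int8_t)(uint8_t)((uint16_t)prod >> 8); /* high byte, signed */
    return (int16_t)(acc + r1);
}

void pcm_examples(void)
{
    /* 98.5 + 1.5 = 100.0 reaches loop_end, so it is pulled back by 50 */
    pcm_channel_t ch = { 0x80, 98, 0x0180, 50, 100 };
    assert(pcm_advance(&ch) == 50 && ch.pos_frac == 0x00);

    assert(pcm_mix(0, 64, 128) == 32);
    assert(pcm_mix(0, -64, 128) == -32);
}
```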
CunningFellow

Post by CunningFellow »

Next idea that has arisen from the T2K thread.

Sure, yeah, I got the LFSR noise channel down from 39 clocks to 32 clocks. But that just wasn't good enough :(

How about we try to make "noise" a different way?

The LFSR was probably used in the AY-3 and older synthesisers because it takes very, very little silicon. Back in those days every gate counted.

D3thAdd3r had the idea of a noise "waveform" so you could just reuse a wave channel as noise.

I am now thinking: what if we could find a way to make "noise" by throwing tables and/or MUL at the problem, one that comes in at 24 clocks or less?

In fact, if we can make "noise" in 20 clocks or less, then we can just have "7 channels", and the sequencer can decide on the fly between "wave, noise or dumb-pcm".
uze6666

Post by uze6666 »

Using a wave channel for noise can work, but only for metallic-sounding effects. This is due to the short repeating sequence you get with only 8 bits. You won't be able to do good-sounding cymbals, hats and such. Another option I once thought of back then is to just read from some arbitrary point in flash. Code tends to sound relatively random, but graphics or other data will not. Plus, depending on the compilation, it will always sound a bit different, so not good at all. That's why I went for a 7/15-bit LFSR. The sound palette is richer and more varied for a relatively small footprint in CPU and code.

But with a custom mixer like the one you are doing, perhaps two noise channels with the constraint of not being multimode could save cycles? Say, one being 7-bit and the other 15-bit. Or both being 15-bit, with a wave channel used for 7-bit-sounding effects?
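To put numbers on the repeat lengths being discussed, here is a small C sketch of a Fibonacci LFSR that can run in 7-bit or 15-bit mode. The feedback (XOR of bits 0 and 1, fed back into the top bit, i.e. polynomial x^n + x + 1) is an assumption, chosen because it is maximal-length for both widths. It is not necessarily the tap layout the Uzebox kernel uses:

```c
#include <assert.h>
#include <stdint.h>

/* One step of a `bits`-wide Fibonacci LFSR (hypothetical sketch):
 * XOR bits 0 and 1, shift right, feed the result into the top bit.
 * Feedback polynomial x^bits + x + 1, maximal-length for 7 and 15 bits.
 * Not necessarily the taps the Uzebox kernel uses. */
uint16_t lfsr_step(uint16_t state, unsigned bits)
{
    uint16_t fb = (uint16_t)((state ^ (state >> 1)) & 1u);
    return (uint16_t)((state >> 1) | (fb << (bits - 1)));
}

/* Steps until the register comes back to its seed (the repeat length). */
unsigned lfsr_period(uint16_t seed, unsigned bits)
{
    assert(bits >= 2 && bits <= 16);
    assert(seed != 0 && seed < (1ul << bits));  /* all-zero state never moves */

    uint16_t s = seed;
    unsigned n = 0;
    do {
        s = lfsr_step(s, bits);
        n++;
    } while (s != seed);
    return n;
}
```

Starting from any non-zero seed, the 7-bit register repeats after 127 steps and the 15-bit one after 32767.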
CunningFellow

Post by CunningFellow »

How often do noise patches have a divider of 0?

Would it be OK to have one of the two noise channels not able to have a divider of zero?