I will try to teach you.
It's not actually very glamorous; it is mostly just a matter of being methodical.
The first step (as in any endeavour) is to define the problem. If you write all your ASM code something like this, it is not only easier for someone else (or future you) to understand, it is also easier for you to see the big picture.
Code:
; r0 : trash
; r1 : trash
; r16 : Low 8 bits of LFSR Barrel Shifter / Later reused for "Volume"
; r17 : High 8 bits of LFSR Barrel Shifter
; r28 : Low byte of sample accumulator (to add up Channels 1..5)
; r29 : High byte of sample accumulator (to add up Channels 1..5)
; ZL : Divider / Countdown till next action
; ZH : Channel Parameters: Bit 7..1 = Divider Reload Value, Bit 0 = 7/15 bit LFSR mode
; Each 15.734 kHz tick:
;
; Decrement a counter
; If the counter has rolled over then
;     Reload the counter
;     Perform the Linear Feedback Shift Register Function
;         Calculate the XOR of Bit 0 and Bit 1
;         if LFSR mode is 15 bit then store this result in bit 14 (the 15th bit)
;         else store the result in bit 6 (the 7th bit)
; else do nothing
; If bit 0 of the LFSR is 1 then "Sample" = +127   (The noise channel is a square wave
; else "Sample" = -128                             that can only be MAX/MIN SIGNED BYTE)
; Multiply the "Sample" by "Volume"
; Add the "Sample" to the "Accumulator"
;channel 4 - 7/15 bit LFSR
lds r16, tr4_barrel_lo ; Load in the 16 bit LFSR
lds r17, tr4_barrel_hi
lds ZL, tr4_divider ; Load the divider
dec ZL ; Decrement the Divider
brpl ch4_no_shift ; If no overflow then do nothing
lds ZH, tr4_params ; Otherwise get the parameters (bit7..1 divider reload val, bit 0 LFSR mode)
mov ZL, ZH ; copy the parameters to "divider"
lsr ZL ; Shift Divider to keep bits7:1 (and get rid of the mode bit)
mov r0, r16 ; Make a copy of the low byte of LFSR for XOR operation
lsr r0 ; Shift the copy so Bit 1 in the copy aligns with bit 0 in the original
eor r0, r16 ; XOR bit0 and bit1
bst r0, 0 ; And copy that result to T
lsr r17 ; Shift the 16 bit LFSR
ror r16
; Copy the XOR'd value into bit 14, the 15th bit of the LFSR (Regardless of mode;
bld r17, 6 ; in 7 bit mode the write to bit 6 after this is the one that matters)
sbrs ZH, 0 ; Check to see if 7 or 15 bit LFSR mode
bld r16, 6 ; and if 7 bit mode then also copy the XOR'd value into bit 6 (the 7th bit)
sts tr4_barrel_lo, r16 ; Save the 16 bit LFSR
sts tr4_barrel_hi, r17
rjmp ch4_end ; Skip over the wait routine
ch4_no_shift:
;wait loop 18 cycles ; Wait routine to make both paths same length (20 cycles with the taken BRPL)
ldi r17, 6
dec r17
brne .-4
ch4_end:
sts tr4_divider, ZL ; Save the divider (This may have been decremented OR reloaded)
ldi r17, 0x80 ; If the lowest bit of the LFSR is 1 then load "Sample" with
sbrc r16, 0 ; +127 otherwise load "Sample" with -128
ldi r17, 0x7f ;
lds r16, tr4_vol ; Get the channel volume
mulsu r17, r16 ; R1 = (sample*mixing vol) >> 8
sbc r0, r0 ; sign extend 8 bit R1 to 16 bits (MULSU leaves C = MSB of result)
add r28, r1 ; add low byte of (sample*vol>>8) to mixing accumulator
adc r29, r0 ; add high byte to mixing accumulator
; 39 Clocks
Now the first and most obvious thing when looking at it: there are two paths through this
- Process LFSR
- Do nothing but wait
(Note here that the second path is not actually "Do nothing" but is "Do nothing but wait". It is the "but wait" part that makes it a separate path. If it was just "Do nothing" it would just be a skip down the same path.)
Optimization Rule : Minimize the critical path.
Now the "critical path" is not always the longest. Sometimes the critical path is the one run most often. Sometimes it is just the most important (Save the girl OR save Gotham City).
In this case the critical path is very obvious, as the other path is "Do nothing but wait".
So we make the "Do nothing but wait" branch "Out of band". Every time a path splits and then has to come back together there is going to be a BRANCH to split the path and a JUMP to rejoin them.
If we make the BRANCH to somewhere outside of the code then the JUMP to join the paths back together can be in the "Do nothing but wait" path that is otherwise just sitting there twiddling its thumbs.
Now the Critical Path is not being lumbered with the RJMP.
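A rough sketch of how that might look for the channel 4 code above, keeping the same labels and variables. This is only an illustration under a few assumptions: the usual AVR timings (taken branch, RJMP, LDS and STS are 2 cycles each), a new label ch4_wait that is not in the original, and an out-of-band block placed somewhere the main flow never falls into but still within BRPL's reach (roughly 64 words either way).

```asm
;channel 4 - 7/15 bit LFSR, wait path moved out of band
lds r16, tr4_barrel_lo      ; Load in the 16 bit LFSR
lds r17, tr4_barrel_hi
lds ZL, tr4_divider         ; Load the divider
dec ZL                      ; Decrement the divider
brpl ch4_no_shift           ; BRANCH out of the main flow (2 cycles taken, 1 not taken)
lds ZH, tr4_params          ; Critical path: reload the divider and step the LFSR
mov ZL, ZH
lsr ZL
mov r0, r16
lsr r0
eor r0, r16
bst r0, 0
lsr r17
ror r16
bld r17, 6
sbrs ZH, 0
bld r16, 6
sts tr4_barrel_lo, r16
sts tr4_barrel_hi, r17      ; No RJMP here any more, just fall through
ch4_end:
sts tr4_divider, ZL         ; Mixing code carries on unchanged from here

; Somewhere outside the main flow (for example after the routine's final jump or return):
ch4_no_shift:
ldi r17, 4                  ; ldi + loop = 1 + (3*4 - 1) = 12 cycles
ch4_wait:
dec r17
brne ch4_wait
nop                         ; pad the wait to 14 cycles
nop
rjmp ch4_end                ; The JUMP that rejoins the paths (2 cycles)
```

By my count both paths from the BRPL to ch4_end now take 18 cycles instead of 20 (critical path: 1 for the untaken branch plus 17 of work; wait path: 2 + 14 + 2), so the channel would come in at 37 clocks instead of 39. Check the numbers against your own build before relying on them.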
There are many other ways you might optimize the critical path. Another example is when the decision is based on a modified variable: the critical path needs the modified value, but the non-critical path needs the original value.
Code:
Read variable from memory
Make a copy of the variable
Modify the copy of the variable
Make Decision based on the copy of the variable
Critical Path                           non-Critical Path
Do something with the Modified copy     Do something with original variable
Do other critical stuff
could be changed to
Code:
Read variable from memory
Modify the variable
Make Decision based on the modified variable
Critical Path                               non-Critical Path
Do something with the Modified Variable     Re-Load the un-modified variable from RAM
Do other critical stuff                     Do something with un-modified variable
So you can see we have made the non-critical path slower - but have sped up the critical path by not having to make a COPY of the variable.
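To put some instructions on that, here is a small made-up example of the two versions. None of these names come from the real code: my_var, result_a and result_b are hypothetical SRAM bytes, and "decrement, then branch if it went negative" just stands in for whatever the real modification and decision are. Cycle counts in brackets assume the usual AVR timings.

```asm
; Version 1: keep a copy so the original survives
v1:
lds r16, my_var             ; Read variable from memory (2)
mov r17, r16                ; Make a copy of the variable (1)
dec r17                     ; Modify the copy (1)
brmi v1_non_critical        ; Make decision based on the copy (1 not taken / 2 taken)
sts result_a, r17           ; Critical path: use the modified copy (2)
v1_done:
ret
v1_non_critical:
sts result_b, r16           ; Non-critical path: use the original variable (2)
rjmp v1_done

; Version 2: modify in place, reload only on the slow path
v2:
lds r16, my_var             ; Read variable from memory (2)
dec r16                     ; Modify the variable itself (1)
brmi v2_non_critical        ; Make decision based on the modified variable (1 / 2)
sts result_a, r16           ; Critical path: use the modified variable (2)
v2_done:
ret
v2_non_critical:
lds r16, my_var             ; Re-load the un-modified variable from RAM (2)
sts result_b, r16           ; Non-critical path: use the un-modified variable (2)
rjmp v2_done
```

In this sketch the critical path loses the 1 cycle MOV, while the non-critical path swaps it for a 2 cycle LDS and so ends up 1 cycle slower. It also frees up r17, which can matter just as much when registers are tight.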