Assembler Tips
Interfacing with C language components from assembler
You can call C routines or write C callable assembly routines by understanding the calling convention and how the C compiler uses the AVR's registers:
- Using the assembler: http://www.nongnu.org/avr-libc/user-manual/assembler.html
- How to interface with C routines: http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_reg_usage
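One very important thing to remember is that avr-gcc uses r1 as a fixed zero register (__zero_reg__), so r1 *must always* be cleared before invoking C functions. Similarly, r1 must be cleared before returning to a C function from assembler. Watch out for the MUL family of instructions, which overwrite r1.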
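You can sketch the argument-passing rule from the avr-libc FAQ above in C, which helps check register assignments before writing a C-callable routine. avr_arg_reg is a hypothetical helper and is not part of avr-libc. The sketch assumes a fixed (non-varargs) argument list that fits entirely in r25 down to r8. The case where arguments overflow to the stack is not modelled.

```c
#include <assert.h>

/* Hypothetical helper (not part of avr-libc): sketch of how avr-gcc assigns
   registers to a fixed argument list. sizes[] holds each argument's size in
   bytes, left to right. Returns the lowest register of argument idx
   (24 means r24). Assumes every argument fits in r25..r8; arguments that
   overflow to the stack are not modelled. */
int avr_arg_reg(const int sizes[], int idx)
{
    int next = 26;  /* allocation starts just above r25 and moves down */

    for (int i = 0; i <= idx; i++) {
        int bytes = (sizes[i] + 1) & ~1;  /* round up to an even size */
        next -= bytes;
        assert(next >= 8 && "argument would not fit in r25..r8");
    }
    return next;
}
```

With this model, uint16_t f(uint8_t a, uint8_t b) receives a in r24 and b in r22. avr-gcc expects an 8-bit result back in r24, a 16-bit result in r25:r24 and a 32-bit result in r25:r22.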
16 bit and 32 bit math
Atmel has a nice cheat-sheet that outlines the algorithms to perform arithmetic on an 8-bit processor.
AVR 201: multiplication and division:
- http://ww1.microchip.com/downloads/en/AppNotes/Atmel-1631-Using-the-AVR-Hardware-Multiplier_ApplicationNote_AVR201.pdf
- http://ww1.microchip.com/downloads/en/AppNotes/avr201.zip
AVR 202: addition, subtraction and comparisons:
- http://ww1.microchip.com/downloads/en/AppNotes/doc0937.pdf
- http://ww1.microchip.com/downloads/en/AppNotes/avr202.zip
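The core idea behind these notes is carry propagation between 8-bit halves. The C model below illustrates it; it is a sketch and is not code taken from the application notes. It models a 16-bit addition done as ADD on the low bytes followed by ADC on the high bytes, so you can check carry handling on a PC. Wider operands simply chain more ADC instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: models "add rLo_a, rLo_b / adc rHi_a, rHi_b" on 8-bit halves.
   The carry out of the low-byte ADD is fed into the high-byte ADC. */
uint16_t add16_bytewise(uint16_t a, uint16_t b)
{
    uint8_t a_lo = a & 0xFF, a_hi = a >> 8;
    uint8_t b_lo = b & 0xFF, b_hi = b >> 8;

    unsigned lo = (unsigned)a_lo + b_lo;          /* add: up to 9 bits */
    unsigned carry = lo >> 8;                     /* C flag after add */
    unsigned hi = (unsigned)a_hi + b_hi + carry;  /* adc */

    return (uint16_t)(((hi & 0xFF) << 8) | (lo & 0xFF));
}
```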
Use SUBI+SBCI for Fast additions
When you are in tight loops and need to add constant values to registers, you can save the scratch registers and the LDI instructions by using SUBI + SBCI instead of ADD + ADC. The trick is to subtract the negated (two's complement) constant. Note that SUBI and SBCI only work on registers r16 to r31.
So instead of:
    ldi r20, lo8(0x1234)
    ldi r21, hi8(0x1234)
    add ZL, r20
    adc ZH, r21
do:
    subi ZL, lo8(-(0x1234))
    sbci ZH, hi8(-(0x1234))
If the constant you want to add is small (0 to 63), keep in mind that you can also use the ADIW and SBIW instructions (for the register pairs r25:r24, XH:XL, YH:YL and ZH:ZL)!
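The following C model is a minimal sketch that lets you check why the SUBI + SBCI trick works. It models the two instructions on the 8-bit halves of Z, with the borrow from SUBI feeding SBCI. Subtracting the two's complement of k is the same as adding k modulo 2^16. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: models "subi ZL, lo8(-(k))" followed by "sbci ZH, hi8(-(k))".
   Subtracting the two's complement of k equals adding k modulo 2^16. */
uint16_t add_via_subi_sbci(uint16_t z, uint16_t k)
{
    uint16_t neg = (uint16_t)-k;          /* -(k) truncated to 16 bits */
    uint8_t imm_lo = neg & 0xFF;          /* lo8(-(k)) */
    uint8_t imm_hi = neg >> 8;            /* hi8(-(k)) */
    uint8_t zl = z & 0xFF, zh = z >> 8;

    int borrow = zl < imm_lo;             /* C flag after subi */
    zl = (uint8_t)(zl - imm_lo);          /* subi */
    zh = (uint8_t)(zh - imm_hi - borrow); /* sbci also subtracts the borrow */

    return (uint16_t)((zh << 8) | zl);
}
```

One side effect, visible from the model: for a non-zero constant, the carry flag left by SUBI/SBCI is the inverse of the one ADD/ADC would leave. Don't use this trick if you need to branch on the carry of the addition afterwards.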
Branch by a Single Bit
Sometimes, you just want to test if a bit is set, like a flag. Normally you would use the SBRS or SBRC instructions for this purpose, like this:
    sbrs r0, 1          ; Skip if bit 1 is set ((r0 & 0x02) != 0)
    rjmp bit1_was_clear
    (...)
bit1_was_clear:
    (...)
Of course, if you only have a single instruction to perform when the bit is set or clear, you can let SBRS or SBRC skip that instruction directly instead of jumping.
In some cases, you might not want to keep the value in a register (for example when you are short of registers, such as when designing a video mode or inline mixer code). In this case the T flag comes in handy:
    bst r0, 1           ; Store bit 1 of the register in the T flag for later use
    (...)
    brtc bit1_was_clear
    (...)
bit1_was_clear:
    (...)
The T flag also has the peculiar property of being set or cleared only by BST, SET and CLT (or by a direct write to SREG). So it is immune to arithmetic, comparisons and other bit operations.
Jump Table
Jump to a location based on a calculation, much like a switch() in C:
    mov ZL, r0                       ; Load index (in this example in r0), not range-checked
    ldi ZH, 0
    subi ZL, lo8(-(pm(jump_table)))  ; Z = index + word address of jump_table
    sbci ZH, hi8(-(pm(jump_table)))
    ijmp
jump_table:
    rjmp jump_target_0
    rjmp jump_target_1
    rjmp jump_target_2
jump_target_0:
    (...)
jump_target_1:
    (...)
jump_target_2:
    (...)
You can also call functions this way via icall, instead of ijmp.
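For comparison, the icall variant corresponds roughly to a table of function pointers in C. This is a sketch with hypothetical target names. As in the assembler version, the index has to be range-checked, or the call goes somewhere arbitrary.

```c
#include <assert.h>

/* Sketch: C counterpart of the rjmp/icall jump table, an array of function
   pointers indexed by a small integer. Target names are hypothetical. */
static int target_0(void) { return 10; }
static int target_1(void) { return 11; }
static int target_2(void) { return 12; }

static int (*const jump_table[])(void) = { target_0, target_1, target_2 };

int dispatch(unsigned index)
{
    /* The assembler version has no bounds check; this one asserts. */
    assert(index < sizeof jump_table / sizeof jump_table[0]);
    return jump_table[index]();   /* an indirect call, like ICALL */
}
```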
Multiply without MUL
In AVR assembler, the MUL family of instructions treats r1:r0 like an accumulator: it is always overwritten with the result (and r1 must be cleared again before returning to C code). If you can afford to destroy your operands instead, use bit shifts and additions.
; r25:r24 * 2 -> r25:r24
    lsl r24   ; high bit goes into carry
    rol r25   ; move the carry into low bit of r25

; r25:r24 / 2 -> r25:r24
    lsr r25   ; low bit goes into carry
    ror r24   ; move the carry into high bit of r24

; r24 * 2 -> r25:r24
    ldi r25, 0
    lsl r24   ; high bit goes into carry
    rol r25   ; move the carry into low bit of r25
Each shift left is equivalent to multiplying by two, and each shift right is equivalent to dividing by two (unsigned; for signed values, shift the high byte with ASR instead of LSR). Keep in mind that the MUL family of instructions only takes 2 cycles, so it is most often better to just use them!
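Multiplying by a constant other than a power of two combines the shifts with additions. The C model below is a sketch: it mimics the LSL/ROL pair on r25:r24 and then builds x * 10 as x * 8 + x * 2. The constant 10 and the function names are only illustrative. Results wrap at 16 bits, just like the register pair.

```c
#include <assert.h>
#include <stdint.h>

/* Models "lsl r24 / rol r25": shift a 16-bit value left by one,
   passing bit 7 of the low byte through the carry. */
static uint16_t lsl_rol(uint16_t v)
{
    uint8_t lo = v & 0xFF, hi = v >> 8;
    uint8_t carry = lo >> 7;               /* lsl: bit 7 of low byte -> C */
    lo = (uint8_t)(lo << 1);
    hi = (uint8_t)((hi << 1) | carry);     /* rol: C -> bit 0 of high byte */
    return (uint16_t)((hi << 8) | lo);
}

/* x * 10 = x * 8 + x * 2, using only shifts and one 16-bit addition. */
uint16_t times10(uint16_t x)
{
    uint16_t x2 = lsl_rol(x);              /* x * 2 */
    uint16_t x8 = lsl_rol(lsl_rol(x2));    /* x * 8 */
    return (uint16_t)(x8 + x2);            /* add/adc in assembler */
}
```

In assembler this costs a second register pair to hold x * 2 while you shift on to x * 8. That extra pair is the price of not touching r1:r0.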
Use sbc for fast set/clear of a register based on the carry
    lsl r24     ; shift the msb of r24 into the carry
    sbc r0, r0  ; if carry == 0 -> r0 = 0, if carry == 1 -> r0 = 0xff
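You may use this technique to sign-extend a value, for example after a MULS when processing fixed-point numbers (often useful in inline mixer design). Right after a MULS you don't even need the LSL: MULS leaves bit 15 of the product in the carry flag, so a single SBC of a spare register with itself yields the sign extension byte.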
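Here is a C model (a sketch with hypothetical function names) of why SBC of a register with itself produces 0x00 or 0xFF, and how LSL on a copy followed by that SBC sign-extends an 8-bit value to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* "sbc r, r" computes r - r - C: 0x00 when the carry is clear,
   0xFF when it is set, whatever r held before. */
static uint8_t sbc_self(uint8_t r, int carry)
{
    return (uint8_t)(r - r - carry);
}

/* Sign-extends an 8-bit value: LSL on a copy moves its sign bit into
   the carry, then SBC of the copy with itself gives the high byte. */
uint16_t sign_extend8(uint8_t v)
{
    int carry = v >> 7;                /* lsl copy: msb -> C */
    uint8_t hi = sbc_self(v, carry);   /* 0x00 or 0xFF */
    return (uint16_t)((hi << 8) | v);
}
```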
Video mode & Inline mixer design
Timing
    nop     ; 1 cycle
    rjmp .  ; 2 cycles
    lpm     ; 3 cycles - but destroys a register (r0 if plain "lpm")
To kill off 3N cycles: (N > 0)
.macro delay3N value
    ldi r19, \value
    dec r19
    brne .-4
.endm
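You can use any register from r16 up for the counter (LDI only accepts r16 to r31); the macro above clobbers r19. The branch costs one less cycle on the last iteration, but that is "paid for" by the LDI instruction up front, so the total is exactly 3N cycles for N from 1 to 255. A value of 0 makes the counter wrap around and gives 768 cycles.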
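The cycle count of delay3N can be checked with a small C model. It is a sketch that assumes LDI and DEC take 1 cycle and BRNE takes 2 cycles when taken and 1 when not, as stated in the AVR instruction set manual. The function name is hypothetical.

```c
#include <assert.h>

/* Counts the cycles of the delay3N macro for a given immediate value.
   An immediate of 0 makes DEC wrap to 255 first, so the loop runs 256 times. */
unsigned delay3N_cycles(unsigned value)
{
    unsigned char counter;
    unsigned cycles = 1;                     /* ldi r19, value */

    assert(value <= 255);
    counter = (unsigned char)value;
    do {
        counter--;                           /* dec r19 (wraps 0 -> 255) */
        cycles += 1;
        cycles += (counter != 0) ? 2 : 1;    /* brne .-4 */
    } while (counter != 0);
    return cycles;
}
```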
To kill off a variable number of cycles
You may also write a routine which waits an arbitrary number of cycles as follows:
delay_cycles:
    lsr r24
    brcs .      ; +1 if bit0 was set
    lsr r24
    brcs .      ; +1 if bit1 was set
    brcs .      ; +1 if bit1 was set
    dec r24
    nop
    brne .-6    ; 4 cycle loop
    ret
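This produces a delay of 12 cycles when r24 is 4 (including the RET, excluding the CALL or RCALL used to call it). In general the delay is r24 + 8 cycles, so by incrementing r24 you can increase the delay cycle by cycle, up to 263 cycles (r24 = 255). Values of r24 below 4 make the loop counter wrap around: the 4-cycle loop then runs 256 times and the delay jumps to 1032 to 1035 cycles, so avoid them.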
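Below is a cycle-by-cycle C model of delay_cycles, a sketch for checking the figures above. It assumes 1-cycle LSR/DEC/NOP, branches taking 2 cycles when taken and 1 when not, and a 4-cycle RET (devices with a 16-bit program counter). The CALL or RCALL is not counted. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle-count model of delay_cycles, excluding the CALL/RCALL. */
unsigned delay_cycles_model(uint8_t r24)
{
    unsigned cycles = 0;
    int carry;

    carry = r24 & 1; r24 >>= 1; cycles += 1;  /* lsr r24 */
    cycles += carry ? 2 : 1;                  /* brcs . */
    carry = r24 & 1; r24 >>= 1; cycles += 1;  /* lsr r24 */
    cycles += carry ? 2 : 1;                  /* brcs . */
    cycles += carry ? 2 : 1;                  /* brcs . (carry unchanged) */
    do {
        r24--; cycles += 1;                   /* dec r24 (wraps 0 -> 255) */
        cycles += 1;                          /* nop */
        cycles += (r24 != 0) ? 2 : 1;         /* brne .-6 */
    } while (r24 != 0);
    return cycles + 4;                        /* ret (assumed 4 cycles) */
}
```

The two lone BRCS instructions add the low two bits of r24 (bit 1 counts twice). The 4-cycle loop covers the remaining multiples of four, which is why the total grows by exactly one cycle per increment.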
Useful Macros
Tired of the verbose way to send a pixel out, and of copy-pasting it every time? Here's a shorter way:
.macro pixel reg
    out _SFR_IO_ADDR(DATA_PORT),\reg ; 1 cycle
.endm
Timing measurements
The emulators can report the cycles consumed between WDR instructions. You can use this to measure the performance or the timing of a block of code like this:
    wdr
    (...)   ; Block of code to measure
    wdr
Note that you don't need this to check whether your HSync timing is right when designing a video mode, as the CUzeBox emulator can indicate the proper timing by itself.
Indirect Jump Without Z
Since the Z register is the only pointer allowed to index Program Memory, it tends to get used heavily. Unfortunately, Z is also necessary in order to execute an indirect jump, suitable for a jump table. If your Z register is busy, you can use the program stack instead and execute a RET instruction to perform the jump:
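    mov r24, r0                       ; Load index (in this example in r0)
    ldi r25, 0
    subi r24, lo8(-(pm(jump_table)))  ; r25:r24 = index + word address of jump_table
    sbci r25, hi8(-(pm(jump_table)))
    push r24                          ; low byte of the target first
    push r25                          ; then the high byte
    ret                               ; pops the target into PC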
Keep in mind that this is costly (PUSH, PUSH and RET take 8 cycles versus the 2 cycles of IJMP), so it is rarely worth it, but it is a useful trick to know.
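The push order matters because RET pops the high byte of the return address first. The C model below is a sketch of the stack on a device with a 16-bit program counter. PUSH stores at SP and then decrements it; POP (and each byte of RET) increments SP before reading. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tiny hypothetical stack area with an AVR-style stack pointer. */
struct avr_stack {
    uint8_t ram[32];
    unsigned sp;       /* post-decrement on push, pre-increment on pop */
};

static void push(struct avr_stack *s, uint8_t v)
{
    assert(s->sp > 0);
    s->ram[s->sp--] = v;
}

static uint8_t pop(struct avr_stack *s)
{
    assert(s->sp < sizeof s->ram - 1);
    return s->ram[++s->sp];
}

/* Returns the word address RET jumps to after "push lo; push hi; ret". */
uint16_t ret_target(uint8_t lo, uint8_t hi)
{
    struct avr_stack s = { {0}, 31 };
    push(&s, lo);                      /* push r24 */
    push(&s, hi);                      /* push r25 */
    uint8_t pc_hi = pop(&s);           /* ret: PC(15:8) comes off first */
    uint8_t pc_lo = pop(&s);           /*      then PC(7:0) */
    return (uint16_t)((pc_hi << 8) | pc_lo);
}
```

Swapping the two PUSH instructions would make RET jump to the byte-swapped address. That is an easy mistake to make and a hard one to spot in a running program.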