C Tips, Tricks and Optimizations

From Uzebox Wiki

Using Named Address Spaces

Named Address Spaces is a feature often available in C compilers targeting architectures requiring multiple address spaces. GCC has it for the AVR platform, as described in its documentation:

https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html

This is a very useful feature: it eases both programming (compared to using the pgmspace functions) and porting to other architectures.

For example, if you previously had a ROM array like this:

   static u8 const level_data[20 * 16] PROGMEM = {
     (...)
   };

   int main(void)
   {
     u8  tile;
     u16 addr;

     (...)

     tile = pgm_read_byte(&level_data[addr]);

     (...)
   }

With Named Address Spaces, you can do this instead:

   static u8 const __flash level_data[20 * 16] = {
     (...)
   };

   int main(void)
   {
     u8  tile;
     u16 addr;

     (...)

     tile = level_data[addr];

     (...)
   }

The point is that you no longer need pgm_read_byte() and the related macros. You can use the data just like any other data accessed through pointers or array indexing.

In addition, the compiler performs proper type checks on these arrays and pointers, and will report an error if you accidentally mix them up (such as trying to pass something in ROM to a function accepting a RAM pointer). Of course, you need to declare your functions accordingly, like:

   void load_level(u8 const __flash* ldata);

Moreover, Named Address Spaces can help with porting if you want to compile your game for other targets. You can neutralize the __flash keyword with:

   #define __flash

This definition essentially removes the keyword, so the __flash pointers and arrays become ordinary pointers and arrays, which a compiler targeting, for example, a normal PC can handle.
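
As a minimal sketch of how this can look in a source file shared between targets, the definition can be guarded so that it only applies when not compiling for the AVR. The u8 and u16 typedefs below are stand-ins for the Uzebox types, and the level contents and the level_sum() function are made up for illustration:

```c
#include <stdint.h>

typedef uint8_t  u8;   /* Stand-ins for the Uzebox typedefs */
typedef uint16_t u16;

/* Other compilers don't know __flash, so define it away for them */
#ifndef __AVR__
#define __flash
#endif

/* Hypothetical level data: in ROM on the AVR, an ordinary array elsewhere */
static u8 const __flash level_data[4 * 2] = {
  1U, 2U, 3U, 4U,
  5U, 6U, 7U, 8U
};

/* Hypothetical function: takes a ROM pointer on the AVR, a plain pointer elsewhere */
static u16 level_sum(u8 const __flash* ldata, u16 count)
{
  u16 sum = 0U;
  for (u16 i = 0U; i < count; i ++){
    sum += ldata[i];
  }
  return sum;
}
```

The same source then compiles unchanged for both targets: on the AVR the compiler generates ROM reads for ldata[i], elsewhere it is a normal memory access.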

Use -fno-tree-switch-conversion

At higher optimization levels, GCC likes to convert large switches into lookup tables, which end up in RAM on the AVR. This can bump up your RAM consumption, which is often a bottleneck in Uzebox games, and it is not very elegant either. You can disable this behavior by adding the following parameter to your CFLAGS in the Makefile:

-fno-tree-switch-conversion

The resulting code may be a little slower, but usually this is much less of a problem than consuming RAM (GCC apparently cannot place such a table in ROM on its own).
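
To illustrate which code is affected, here is a sketch of the kind of switch this is about: every case only selects a constant. The function name and the values are made up; whether GCC actually converts a particular switch depends on its version and the optimization level, so check the .lss file and the RAM usage report.

```c
#include <stdint.h>

typedef uint8_t u8;   /* Stand-in for the Uzebox typedef */

/* Hypothetical helper mapping a tile category to a score. With the
** conversion enabled, GCC may replace the switch with an array of the
** five constants indexed by category. */
u8 tile_score(u8 category)
{
  switch (category){
    case 0U: return 10U;
    case 1U: return 25U;
    case 2U: return 40U;
    case 3U: return 55U;
    case 4U: return 80U;
    default: return 0U;
  }
}
```

If you do want a table for speed, you can write one explicitly as a __flash array, which keeps it in ROM under your control.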

Pointer arithmetic is often faster

On the AVR target it is often faster to use pointer arithmetic in tight loops than indexing. For example:

   u8* data;

   (...)

   sum = 0U;
   for (u8 i = 0U; i < 100U; i ++){
     sum += data[i];
   }

Such a loop might compile to something performing better if it was written as:

   u8* data;

   (...)

   sum = 0U;
   for (u8 i = 0U; i < 100U; i ++){
     sum += *(data++);
   }

Most often the compiler can figure out how to generate optimal code in these cases, so you don't need to care (use whichever form feels more intuitive to you). However, if you are experiencing a performance bottleneck, such tight loops are good candidates for improvement. Sometimes the compiler fails to do a decent job with them, and rewriting them in this manner can help. Of course, look at the assembly output first! (The usual Makefiles in the Uzebox project generate it: these are the .lss files.)
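
For comparing the two forms in the .lss output, they can be put side by side as complete functions. This is only a sketch: the function names are made up, and the pointer version here also drops the separate counter in favor of an end pointer, which is one possible variation.

```c
#include <stdint.h>

typedef uint8_t  u8;   /* Stand-ins for the Uzebox typedefs */
typedef uint16_t u16;

/* Indexed form: the address is derived from the base and i */
u16 sum_indexed(u8 const* data, u8 count)
{
  u16 sum = 0U;
  for (u8 i = 0U; i < count; i ++){
    sum += data[i];
  }
  return sum;
}

/* Pointer form: the pointer itself walks through the data */
u16 sum_pointer(u8 const* data, u8 count)
{
  u16       sum = 0U;
  u8 const* end = data + count;
  while (data != end){
    sum += *(data ++);
  }
  return sum;
}
```

Both functions return the same result, so you can swap one for the other and measure which one the compiler handles better.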

Function calls

On architectures with many registers, function calls can have significant overhead compared to the work they do. This stems from the Application Binary Interface (ABI) marking many registers as call-clobbered, meaning that a called function may freely overwrite them. On the AVR, these are r0, r18-r25, X (r26:r27) and Z (r30:r31).

When the compiler generates code, whenever it sees a function call, it must assume all these registers are changed (as it cannot see what is inside the called function). So the caller has to save whatever it keeps in these registers before making the call. This is made worse by the caller itself being a function: it has the exact same set of working registers as whatever it calls, so it has to move values into call-saved registers (which it must first save on the stack) or onto the stack itself.

The coding techniques outlined below help minimize the overhead coming from this architectural property.

Don't forget static

Don't forget about the static keyword. Declare functions that are only used within a module (C source file) as static, like:

   static u8 actor_get_sprite(actor_t* actor)
   {
     (...)
   }

A static function is only visible within the source file it is defined in, and the compiler will often eliminate the function call if it determines that inlining produces better code. If you forget the static, it becomes an externally visible function (even if you don't have its declaration in a header file), which forces the compiler to always emit it as a standalone function!

So write as many functions as you need for a clean, easy to read code, and have them static when they are not meant to be called from outside! The compiler will do the rest.
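
A small sketch of this layout follows. The actor_t fields, the sprite calculation and the actor_draw_index() function are all hypothetical; only the split between a static helper and an externally visible function matters.

```c
#include <stdint.h>

typedef uint8_t u8;   /* Stand-in for the Uzebox typedef */

/* Hypothetical actor structure */
typedef struct {
  u8 type;
  u8 frame;
} actor_t;

/* Local helper: static, so the compiler may inline it into its callers
** and omit the standalone copy entirely */
static u8 actor_get_sprite(actor_t const* actor)
{
  return (u8)((actor->type << 2) + (actor->frame & 3U));
}

/* Externally visible function, this one would be declared in the header */
u8 actor_draw_index(actor_t const* actor, u8 base)
{
  return (u8)(base + actor_get_sprite(actor));
}
```

In the .lss file you would then typically find only actor_draw_index, with the helper's calculation merged into it.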

Use tail calling

If you can organize a function so that its call to another function is the very last thing it does before returning, the compiler can optimize the construct into a tail call: instead of calling, it transfers control to the called function with a jump. For example:

   void map_changetile(u16 x, u16 y, u8 tile)
   {
     (...) /* Calculate screen coordinates, early return if off-screen */
     screen_settile((u8)x, (u8)y, tile); /* Last statement: can become a jump */
   }

In this case, since screen_settile() is at the end, the compiler most likely generates code which prepares its parameters in the appropriate registers, then jumps to it.
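
A complete version of this could look like the sketch below. The screen buffer, its dimensions and the scroll variables are assumptions made up for the example, and in a real project screen_settile() would normally live in another module (within one source file the compiler may simply inline it instead).

```c
#include <stdint.h>

typedef uint8_t  u8;   /* Stand-ins for the Uzebox typedefs */
typedef uint16_t u16;

#define SCREEN_W 30U   /* Hypothetical visible area in tiles */
#define SCREEN_H 28U

static u8  screen[SCREEN_H][SCREEN_W];  /* Hypothetical tile buffer */
static u16 scroll_x;                    /* Hypothetical map scroll position */
static u16 scroll_y;

void screen_settile(u8 x, u8 y, u8 tile)
{
  screen[y][x] = tile;
}

void map_changetile(u16 x, u16 y, u8 tile)
{
  /* Unsigned wrap-around makes tiles left of or above the screen large */
  u16 sx = (u16)(x - scroll_x);
  u16 sy = (u16)(y - scroll_y);

  if ((sx >= SCREEN_W) || (sy >= SCREEN_H)){ return; }

  /* Nothing follows the call, so it can be emitted as a jump */
  screen_settile((u8)sx, (u8)sy, tile);
}
```

Note that anything placed after the call (even a simple assignment) would turn it back into an ordinary call.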

Call before calculating

The opposite of tail calling may also result in better performing code: if possible, call the functions you need input from early, before doing too many calculations. This ensures that there are fewer live values to save before the call and restore after it.
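
The sketch below shows the same hypothetical function written both ways. All names are made up, and input_read() stands for a function in another module. In the first form the two 16-bit intermediate results have to survive the call, in the second only the two 8-bit parameters do. The compiler may reorder things on its own, so treat this as something to verify in the .lss file.

```c
#include <stdint.h>

typedef uint8_t  u8;   /* Stand-ins for the Uzebox typedefs */
typedef uint16_t u16;

/* Hypothetical input function, normally in another module */
u8 input_read(void)
{
  return 3U;
}

/* Calls late: px and py are live across the call */
u16 move_late(u8 x, u8 y)
{
  u16 px  = (u16)(x * 8U);
  u16 py  = (u16)(y * 8U);
  u8  dir = input_read();
  return (u16)(px + py + dir);
}

/* Calls early: only x and y are live across the call */
u16 move_early(u8 x, u8 y)
{
  u8  dir = input_read();
  u16 px  = (u16)(x * 8U);
  u16 py  = (u16)(y * 8U);
  return (u16)(px + py + dir);
}
```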

Using global variables

Having global variables is generally bad practice. However, they can help if you have performance-critical code which would otherwise have to call setter and getter functions in another module.

Reading or writing a global variable translates to just a load or store instruction, while a call forces the compiler to treat every register which the ABI lets functions use freely as clobbered.

So don't use them, but know you may cheat a bit by using them if you have something performance-critical which could benefit from them.

(Global variables in this context don't include variables marked static, which are only visible in the source file they are defined in. You can always use as many of those as you like.)
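
As a sketch of the trade-off, the two access styles are shown below next to each other. All names are hypothetical. In a real project player_lives and player_get_lives() would be defined in another module and only declared here, so the getter would be a real call.

```c
#include <stdint.h>

typedef uint8_t u8;   /* Stand-in for the Uzebox typedef */

/* Normally defined in another module (a hypothetical player.c) */
u8 player_lives = 3U;

u8 player_get_lives(void)
{
  return player_lives;
}

/* Direct access: a single load from RAM, no registers clobbered */
u8 hud_digit_direct(void)
{
  return (u8)(player_lives + '0');
}

/* Through the getter: a call, with all its register saving implications */
u8 hud_digit_getter(void)
{
  return (u8)(player_get_lives() + '0');
}
```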