Get that emu faster

The Uzebox now have a fully functional emulator! Download and discuss it here.
Jubatian
Posts: 1355
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Get that emu faster

Post by Jubatian » Thu Oct 01, 2015 11:16 pm

Hi!

I was a bit shocked when I tried uzem. It works, but it can barely do its job: it gets stuck somewhere near 26MHz while keeping a 2.2GHz core busy (not great for the audio experience). Looking at the code, I could push it past this point with a small modification:

Code: Select all

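inline void set_bit(u8 &dest, unsigned int bit, unsigned int value)
{
 // Collapse 'value' to 0 or 1 without a branch (valid while value fits in 16 bits)
 value = ((value + 0xFFFFU) >> 16);
 // Clear the target bit, then OR in the new state
 dest = dest & (~(1U << bit));
 dest = dest | (value << bit);
}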
It is not fully equivalent to what was there (it only works as long as the 'value' input fits within 16 bits), but it looked like it passed. Apparently gcc couldn't figure out any way to eliminate the if in the original, and there are lots of calls to this thing. I am not sure either, since this source would be a beast to dig through in assembly; all I know is that after I compiled it with this change, I could get 28MHz and continuous music.
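To make the "not fully equivalent" caveat concrete, here is a minimal sketch that compares the branchless helper with a branching reference. The body of `set_bit_branchy` is an assumption about what the original looked like (the thread does not show it), and `set_bit_bool` and `agrees_with_branchy` are hypothetical names added for illustration. The sketch also assumes `unsigned int` is at least 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t u8;

// ASSUMED shape of the original helper (not shown in the thread): one branch per call.
inline void set_bit_branchy(u8 &dest, unsigned int bit, unsigned int value)
{
    assert(bit < 8);
    if (value)
        dest = static_cast<u8>(dest | (1U << bit));
    else
        dest = static_cast<u8>(dest & ~(1U << bit));
}

// The version from the post: the add-and-shift maps 0 to 0 and 1..0x10000 to 1.
inline void set_bit_add(u8 &dest, unsigned int bit, unsigned int value)
{
    assert(bit < 8);
    value = ((value + 0xFFFFU) >> 16);
    dest = static_cast<u8>(dest & ~(1U << bit));
    dest = static_cast<u8>(dest | (value << bit));
}

// Hypothetical alternative: (value != 0) converts to 0 or 1 for every input,
// and compilers typically emit a flag-to-register instruction rather than a jump.
inline void set_bit_bool(u8 &dest, unsigned int bit, unsigned int value)
{
    assert(bit < 8);
    unsigned int v = (value != 0U);
    dest = static_cast<u8>((dest & ~(1U << bit)) | (v << bit));
}

typedef void (*set_bit_fn)(u8 &, unsigned int, unsigned int);

// Returns true when 'candidate' matches the branching reference for every
// starting byte and every bit position, for the given 'value'.
bool agrees_with_branchy(set_bit_fn candidate, unsigned int value)
{
    for (unsigned int d = 0; d < 256; ++d) {
        for (unsigned int bit = 0; bit < 8; ++bit) {
            u8 a = static_cast<u8>(d);
            u8 b = static_cast<u8>(d);
            set_bit_branchy(a, bit, value);
            candidate(b, bit, value);
            if (a != b) return false;
        }
    }
    return true;
}
```

With these definitions, `agrees_with_branchy(set_bit_add, 0xFFFFU)` holds, while `agrees_with_branchy(set_bit_add, 0x10001U)` does not, because the shift then yields 2 instead of 1. Whether any of the branchless forms is actually faster depends on the compiler and the host CPU, so it is worth measuring.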

What shocks me is that I designed a 16-bit CPU and wrote an emulator for it running at 12.5MHz, which I occasionally tested on a 233MHz Pentium MMX, and that could almost keep up (although that CPU is designed to work without flags, so there is probably much less work per instruction). Maybe the big switch-case blocks aren't being turned into jump tables here and there is a lot of branch misprediction going on, or maybe it is something else, but it is quite disappointing.

I am thinking about trying to do something about it, but to be honest what is in there looks a bit like a mess (compared to, say, the assembly of the display modes in the Uzebox kernel, whose construction I could understand well from the code). It would take a while to analyze the assembly output to see how the compiler deals with it (and where so many cycles may be burning away).

By the way, another thing which bugs me is that I cannot get any input into it. From what I could deduce from the code, it looks for joysticks, and if none exists, there is no input, despite what its help says about keyboard controls. I don't feel confident enough with the code yet to hack my way around this and actually play games... (I tried softgun though, which seems fast, but it had a few other oddities and some weird ALSA bug, so it lacks sound. At least I could try a few games with it for the "look and feel".)

Artcfox
Posts: 944
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox » Fri Oct 02, 2015 12:54 am

Nice work! :)

If you haven't already, edit the Makefile to uncomment these two lines:

Code: Select all

ARCH=native
TUNE=y
That's what made the biggest speed difference for me on my old laptop.

Also, you should try the uzem140 branch, as that uses SDL2 and it's much faster on both my laptop and desktop machines. It renders into a smaller memory buffer, and uses the GPU to scale it up. The master branch that uses SDL1.2 falls short of 100% speed on my laptop no matter what I do.

I did a quick test with your optimization. Before the change, with sound (and vsync disabled) on my desktop I can run it at 72.6 MHz, and with this change, it can run at 73.5 MHz. So it definitely works!

uze6666
Site Admin
Posts: 4449
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Get that emu faster

Post by uze6666 » Fri Oct 02, 2015 3:11 am

And I'm glad the original programmer made it, since it helped a lot to build the Uzebox community. It was coded in only about 3 weeks so, yeah, it's not perfect and it's very slow. Since it just barely works at 28MHz on my machine, I never bothered to improve it. And as you said, the code is a bit messy and convoluted. But eh, it works, and a sub-optimal emu is way better than no emu at all! :P You can send me pull requests on GitHub for any improvements (just post something about it first, so it doesn't clash with someone else's work).


Re: Get that emu faster

Post by Artcfox » Fri Oct 02, 2015 5:57 am

Jubatian wrote:By the way another thing which bugs me is that I cannot get any input to it. What I could deduct from the code seems like it looks for joysticks, and if none exists, there is no input despite what its help says (the keyboard controls).
Try hitting the '5' key until the console outputs:

Code: Select all

SNES pad.
and then you should be able to use the keyboard.


Re: Get that emu faster

Post by Jubatian » Fri Oct 02, 2015 8:01 am

Huh, fast responses here! :)

For now I hacked the generation of assembly output into the Makefile and checked what happens with those big switches. Mostly they turned into jump tables all right, so those alone aren't the culprit, although maybe their nesting is (each nesting level is another indirect jump and a likely pipeline flush; there is even a switch nested 4 levels deep in there, along with a huge pile of level 3's, and that is just in the visible code, who knows what lurks beyond, behind the calls).

Otherwise there might be stuff hidden in the general execution paths. I won't be at my Linux box over the weekend, but I guess I will take the code and the assembly with me to try to trace a few paths in it. And maybe to see what could be done about "flattening" those switches or building some call table instead.
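A minimal sketch of what "flattening" into a call table could look like. Everything here is hypothetical: `ToyCpu`, the handlers, and the 8-bit opcode layout (high nibble as group, low nibble as sub-operation) are invented for illustration and are not taken from uzem or the AVR. The nested version needs two dependent dispatches per instruction, while the flat version needs a single indexed call.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy machine, NOT the AVR: high nibble = group, low nibble = sub-op.
struct ToyCpu {
    uint8_t  acc = 0;
    uint32_t cycles = 0;
};

typedef void (*OpHandler)(ToyCpu &cpu, uint8_t operand);

static void op_nop (ToyCpu &cpu, uint8_t)   { cpu.cycles += 1; }
static void op_add (ToyCpu &cpu, uint8_t v) { cpu.acc = static_cast<uint8_t>(cpu.acc + v); cpu.cycles += 1; }
static void op_sub (ToyCpu &cpu, uint8_t v) { cpu.acc = static_cast<uint8_t>(cpu.acc - v); cpu.cycles += 1; }
static void op_load(ToyCpu &cpu, uint8_t v) { cpu.acc = v; cpu.cycles += 2; }

// Nested form: the inner switch can only start once the outer one has resolved.
void step_nested(ToyCpu &cpu, uint8_t opcode, uint8_t operand)
{
    switch (opcode >> 4) {
    case 0x1:
        switch (opcode & 0x0F) {
        case 0x0: op_add(cpu, operand); break;
        case 0x1: op_sub(cpu, operand); break;
        default:  op_nop(cpu, operand); break;
        }
        break;
    case 0x2:
        switch (opcode & 0x0F) {
        case 0x0: op_load(cpu, operand); break;
        default:  op_nop(cpu, operand);  break;
        }
        break;
    default:
        op_nop(cpu, operand);
        break;
    }
}

// Flat form: one 256-entry table built once, indexed by the whole opcode.
std::array<OpHandler, 256> build_table()
{
    std::array<OpHandler, 256> table;
    table.fill(&op_nop);
    table[0x10] = &op_add;
    table[0x11] = &op_sub;
    table[0x20] = &op_load;
    return table;
}

void step_flat(ToyCpu &cpu, uint8_t opcode, uint8_t operand)
{
    static const std::array<OpHandler, 256> table = build_table();
    assert(table[opcode] != nullptr);
    table[opcode](cpu, operand);
}

// Returns true when both dispatchers leave the CPU in the same state for all opcodes.
bool dispatchers_agree()
{
    for (unsigned int op = 0; op < 256; ++op) {
        ToyCpu a, b;
        a.acc = b.acc = 7;
        step_nested(a, static_cast<uint8_t>(op), 3);
        step_flat(b, static_cast<uint8_t>(op), 3);
        if (a.acc != b.acc || a.cycles != b.cycles) return false;
    }
    return true;
}

// Small usage example: load 5, add 3, subtract 1.
uint8_t demo_flat()
{
    ToyCpu cpu;
    step_flat(cpu, 0x20, 5);
    step_flat(cpu, 0x10, 3);
    step_flat(cpu, 0x11, 1);
    return cpu.acc;
}
```

A function-pointer table trades the nested jumps for one indirect call, which also prevents the compiler from inlining the handlers. Whether that is a net win for a given emulator is something only profiling can tell.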

(Regarding GitHub: yes, I will do that once I get familiar with the code! If interested, you may check out my profile there, along with my RRPGE system's emulator, which has that 16-bit CPU I mentioned.)
Artcfox wrote:Try hitting the '5' key until the console outputs:
Tried, no luck. I tested it with the Controller Tester app, which keeps spitting out "no controller" no matter what. If I enable the mouse ('-m' flag) or hit '5' until it says SNES mouse, then when I move the mouse over uzem's window, the mouse cursor flickers in at the top (but there is no reaction to clicks).

Yes, having an emu is definitely better than none :) - and the system feels simple and manageable enough not to make it too complicated (it is already something that it fits within so little code, convoluted as it may be). It raised my interest; hopefully I can get a few things done in it here and there.


Re: Get that emu faster

Post by Artcfox » Fri Oct 02, 2015 11:19 am

Jubatian wrote:Tried, no luck. I tested it with the Controller Tester app, which keeps spitting out "no controller" no matter what.
Bummer. What version of the compiler are you using? I tested it with clang++ today, and it gave much better warnings than gcc ever did.

Also, could you post your Makefile hack? I'm curious what the assembly looks like as well.


Re: Get that emu faster

Post by Jubatian » Fri Oct 02, 2015 6:04 pm

Artcfox wrote:Bummer. What version of the compiler are you using? I tested it with clang++ today, and it gave much better warnings than gcc ever did.
Also, could you post your hack to the Makefile, I'm curious what the assembly looks like as well.
Eh, the fun thing is that now I am sitting in front of a different Debian PC (I --forgot-- I set up one for my parents a couple of months ago, anyway, so I have Linux here, too :) ), and on this one even the binaries work! These are the ones I compiled at home, where they wouldn't take any input. Weird (SDL apps otherwise work properly there).

The Makefile hack is adding the following command to the "$(TARGET_OBJ_DIR)/%.o: %.cpp" rule:

Code: Select all

<TAB> $(CC) -c -S $< -o $@.asm $(TARGET_CPPFLAGS) $(DEPFLAGS) $(TARGET_D_DEFINES)
Of course, leave the original command intact so the object files are still generated and linking still works. This additional command will spit out .o.asm files alongside the .o files, which you can then explore.


Re: Get that emu faster

Post by Artcfox » Fri Oct 02, 2015 9:16 pm

Jubatian wrote:Eh, the fun thing is that now I am sitting in front of a different Debian PC (I --forgot-- I set up one for my parents a couple of months ago, anyway, so I have Linux here, too :) ), and there even the binaries work! Which I compiled at home, and wouldn't take any input there. Weird (SDL apps otherwise work proper there).
Weird! I'm glad it works for you now, but I wonder what could be going on. Are you using the master branch version, or the uzem140 branch version?
Jubatian wrote:The Makefile hack is adding the following command to the "$(TARGET_OBJ_DIR)/%.o: %.cpp" rule:

Code: Select all

<TAB> $(CC) -c -S $< -o $@.asm $(TARGET_CPPFLAGS) $(DEPFLAGS) $(TARGET_D_DEFINES)
Of course leave the original command intact there, so the object files are still generated, so it can link. This additional command will spit out .o.asm files alongside the .o files which you can explore.
Thanks! I changed mine to this:

Code: Select all

$(CC) -c -S -masm=intel $< -o $@.asm $(TARGET_CPPFLAGS) $(DEPFLAGS) $(TARGET_D_DEFINES)
so I can see it in Intel syntax.


Re: Get that emu faster

Post by Artcfox » Fri Oct 02, 2015 9:53 pm

What are your thoughts on removing -O3 for the release build, and replacing it with:

Code: Select all

-Ofast -flto -fwhole-program
That seems to have given me a 1 to 2 MHz bump in emulation speed (with sound disabled).


Re: Get that emu faster

Post by Jubatian » Fri Oct 02, 2015 11:07 pm

Artcfox wrote:Are you using the master branch version, or the uzem140 branch version?
Master branch for now. Hey, I just wanted to play around, and suddenly got involved in coding! I will look into it later; I see there is some SDL2 porting going on here along with other optimizations.

For now I forked the repo and added a bunch of little bithacks to it (https://github.com/Jubatian/uzebox/tree/uzem-hacks). I don't yet know how collaboration works on GitHub, being new to this aspect. Are there any guidelines on this regarding Uzebox?

Next, I think I will look at the switches, which should be where most of the CPU power is wasted (too deep nesting), and which definitely affect every single instruction's decoding. Does anyone have a nice instruction matrix for the AVR?

In general I would concentrate on the algorithm, hoping to improve things here and there in that regard. I feel like there should be plenty of possibilities here (for now, the apparent ones are transforming the big switch tree to have fewer levels and eliminating switches which can't compile to jump tables). Anyway, I will also read through what is going on on the SDL2 side.
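One way to reduce the switch tree to a single level is to pre-decode: build a table once that maps every possible 16-bit opcode word to an instruction ID, so the hot loop does one lookup followed by one flat switch. The sketch below is an illustration under stated assumptions, not uzem's code. The names (`AvrOp`, `Pattern`, `build_decode_table`, `decode`) are hypothetical, and only four AVR encodings are listed (NOP, ADD, LDI, RJMP); a real table would cover the whole instruction set.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instruction IDs; a full decoder would have one per AVR instruction.
enum AvrOp : uint8_t { OP_UNKNOWN, OP_NOP, OP_ADD, OP_LDI, OP_RJMP };

struct Pattern {
    uint16_t mask;   // which bits of the opcode word are fixed
    uint16_t match;  // the value those fixed bits must have
    AvrOp    op;
};

// A handful of AVR encodings, listed only as examples.
static const Pattern kPatterns[] = {
    { 0xFFFF, 0x0000, OP_NOP  },  // 0000 0000 0000 0000
    { 0xFC00, 0x0C00, OP_ADD  },  // 0000 11rd dddd rrrr
    { 0xF000, 0xE000, OP_LDI  },  // 1110 KKKK dddd KKKK
    { 0xF000, 0xC000, OP_RJMP },  // 1100 kkkk kkkk kkkk
};

// Built once at startup: every 16-bit word gets the ID of the first matching pattern.
std::vector<AvrOp> build_decode_table()
{
    std::vector<AvrOp> table(65536, OP_UNKNOWN);
    for (uint32_t word = 0; word < 65536; ++word) {
        for (const Pattern &p : kPatterns) {
            if ((word & p.mask) == p.match) {
                table[word] = p.op;
                break;
            }
        }
    }
    return table;
}

// Hot path: a single table lookup, no nested switches.
AvrOp decode(uint16_t word)
{
    static const std::vector<AvrOp> table = build_decode_table();
    assert(table.size() == 65536);
    return table[word];
}

// Operand extraction for the two register formats used above.
unsigned int add_rd(uint16_t w) { return (w >> 4) & 0x1F; }
unsigned int add_rr(uint16_t w) { return ((w >> 5) & 0x10) | (w & 0x0F); }
unsigned int ldi_rd(uint16_t w) { return 16 + ((w >> 4) & 0x0F); }
unsigned int ldi_k (uint16_t w) { return ((w >> 4) & 0xF0) | (w & 0x0F); }
```

For example, `decode(0x0C12)` yields `OP_ADD` with `add_rd` giving 1 and `add_rr` giving 2, and `decode(0xEA0B)` yields `OP_LDI` for register 16 with the constant 0xAB. The table costs 64 KiB at one byte per entry, which competes with the emulator's other data for cache, so this is a trade-off to measure rather than a guaranteed gain. Two-word instructions would still need their second word fetched after the lookup.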
