Get that emu faster

The Uzebox now has a fully functional emulator! Download and discuss it here.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

uze6666 wrote:You guys are killing me! ;) Too many great brains on the same problem is hard to manage.

I checked out Artcfox's last PR and can't run it. Executing:

Code: Select all

uzem -n -v Mode13ExtendedDemo.hex
or even just

Code: Select all

uzem Mode13ExtendedDemo.hex
goes into an endless loop, printing something like this to the console forever (no SDL window ever appears):

Code: Select all

"C:\work\uzebox\git\tools\uzem>uzem -n -v -n "
Can anybody on Windows replicate this?

Btw, are those the switches you guys use to measure the top speed emulation? -v disabling vsync and -n disabling sound?
Yes, those switches run it as fast as it can.

What does the command:

Code: Select all

uzem Mode13ExtendedDemo.hex -vn
give you? That runs fine for me (on Linux).

The Mode13ExtendedDemo has the following compile error:

In videoMode13.c:

Code: Select all

	#if EXTENDED_PALETTE
		#include "videomode13/paletteTable.h"
	#endif
needs to be:

Code: Select all

	#if EXTENDED_PALETTE
		#include "videoMode13/paletteTable.h"
	#endif
I hate Windows and its case-insensitivity. That's caused so many pointless errors.
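
A mismatch like this can be caught on any platform by comparing a path component against the directory listing byte for byte instead of trusting the file system. What follows is only a sketch: `classify_name` and `check_dir_entry` are hypothetical helpers, not part of the Uzebox build, and the directory scan assumes a POSIX system (`dirent.h`, `strings.h`).

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Result of looking up one path component. */
enum name_match { NAME_MISSING = 0, NAME_EXACT = 1, NAME_CASE_ONLY = 2 };

/* Pure helper: compare `wanted` against a list of directory entry names. */
enum name_match classify_name(const char *wanted,
                              const char *const *entries, size_t count)
{
    assert(wanted != NULL);
    enum name_match result = NAME_MISSING;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(entries[i], wanted) == 0)
            return NAME_EXACT;            /* byte-for-byte match */
        if (strcasecmp(entries[i], wanted) == 0)
            result = NAME_CASE_ONLY;      /* differs only in letter case */
    }
    return result;
}

/* Same check against a real directory.
   Returns NAME_MISSING if the directory cannot be opened. */
enum name_match check_dir_entry(const char *dir, const char *wanted)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return NAME_MISSING;

    enum name_match result = NAME_MISSING;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        const char *one[1] = { e->d_name };
        enum name_match m = classify_name(wanted, one, 1);
        if (m == NAME_EXACT) {
            result = m;
            break;
        }
        if (m == NAME_CASE_ONLY)
            result = m;
    }
    closedir(d);
    return result;
}
```

A result of `NAME_CASE_ONLY` for `videomode13` would flag exactly the kind of include that builds on Windows but fails on Linux.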
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

Jubatian wrote:Fasten your seatbelts!

https://github.com/Jubatian/uzebox/tree/uzem140-hacks

This thing might even outdo CunningFellow's branch, so it might fly even without his changes! :) (Obviously, in the longer run, combine and smile.) I was a bit shocked by the end result: four changes that I couldn't measure individually, but which combined gave a whopping 30 percent on my PC, with Arkanoid suddenly jumping from 55MHz to 72MHz, whereas before, no matter what I did, I was very lucky to get 60MHz. It is a consistent improvement for everything. Realistically, each distinct part should produce a small performance bump if compiled by a compiler with consistent behavior, but by their nature these changes work best combined (not just by luck; given how they work this is expected: three changes trim down the main path, then a fourth inlines it, so no wonder the combination is what really kicks).

They are distinct from CunningFellow's work, so the combination of the two should add up nicely in performance. For now no pull request, I will wait for others to do their part first, and merge then. But it is up for experimenting if you like.

Uze6666: Yes, that variable nightmare should be fixed. Those u8, u32 and similar names are just as nasty as names like "k12" and "k7": impossible to search for properly. The types from "<stdint.h>" should be used (what you mention, uint8_t and friends). The proper type for a variable that should use whatever width is fastest on the architecture (at least 32 bits) is "uint_fast32_t" (or "int_fast32_t" if you really need signedness).
Nice work Jubatian! The next huge performance gain will be to fix the non-native integer sizes (the proper way, not blindly changing them to u32 and s32 like in my proof-of-concept test) in CunningFellow's patch, but first we should really get the PR merged in, and then get your changes integrated properly before we start improving CunningFellow's instruction reworking. Based on my tests, we should see at least another 10% speed improvement by using unsigned native integer sizes.
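
To make the type discussion concrete, here is a minimal sketch of the split being proposed: exact-width types where the emulated hardware dictates the size, and `uint_fast32_t` for values that only live in host registers. This is not Uzem code; `cpu_state`, `sum_bytes`, and `store_low_byte` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulated state keeps exact-width types: an AVR register really is 8 bits. */
typedef struct {
    uint8_t  r[32];   /* general-purpose registers */
    uint16_t pc;      /* program counter */
} cpu_state;

/* Host-side temporaries use the "fast" types: at least 32 bits wide, but the
   compiler may pick a wider native type when that is quicker on the target. */
uint_fast32_t sum_bytes(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint_fast32_t total = 0;
    for (size_t i = 0; i < len; i++)
        total += data[i];
    return total;
}

/* Truncation back to the emulated width is explicit at the point of the store. */
void store_low_byte(cpu_state *cpu, unsigned reg, uint_fast32_t value)
{
    assert(cpu != NULL && reg < 32);
    cpu->r[reg] = (uint8_t)(value & 0xFFu);
}
```

The point of the explicit mask in `store_low_byte` is that widening the temporaries must not change what ends up in the emulated registers, which is why blindly swapping every type is not enough.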

Edit: Here is the branch that I have with everyone's optimizations (even Jubatian's latest) on it, but GitHub is saying that it can no longer be cleanly applied to uzem140. I fear that if something else got merged in recently, then the PR I submitted a few days ago that has CunningFellow's changes will have major merge conflicts. I can't really pull another all-nighter to do another manual merge of everything, so I'll just have to let Uze sort it out.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

I was trying out some more optimizations that removed two additional branches from inside every call to ::exec(), and this is what I have so far on my desktop:
225MHz.png
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Get that emu faster

Post by uze6666 »

Artcfox wrote:Yes, those switches run it as fast as it can.

What does the command:

Code: Select all

uzem Mode13ExtendedDemo.hex -vn
give you? That runs fine for me (on Linux).
I rebuilt on my laptop after merging your PR, and everything works fine. Something must be wrong on my main machine.
Artcfox wrote:The Mode13ExtendedDemo has the following compile error:

In videoMode13.c:

Code: Select all

#if EXTENDED_PALETTE
#include "videomode13/paletteTable.h"
#endif
needs to be:

Code: Select all

#if EXTENDED_PALETTE
#include "videoMode13/paletteTable.h"
#endif
I hate Windows and its case-insensitivity. That's caused so many pointless errors.
Thanks for pointing this out, I have fixed it.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

Awesome! Thanks.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

To better coordinate everything that's happening with Uzem right now (since Uze wants to do some refactoring and cleanup once things settle down a bit), I figured I'd let everyone know what I hope can happen before the refactoring. It looks like the only major optimizations left to be pulled into the uzem140 branch are Jubatian's "Fasten your seatbelts" changes. After that, I plan to do some profiling and benchmarking to see what impact (if any) calling his update_hardware_fast method everywhere it can be called has, versus calling it in only a few places. When I hit 225MHz emulation speed on my 4th gen i7, that was with it being called everywhere possible (and with some Makefile changes I've been working on), but once Jubatian's changes are integrated, I'd like to do a solid benchmark and submit at least one more round of optimizations for the stuff I've been working on.

After that, I think that things will have settled down enough for the code cleanup to take place unhindered. What do you guys think?
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Get that emu faster

Post by uze6666 »

Artcfox wrote:To better coordinate everything that's happening with Uzem right now (since Uze wants to do some refactoring and cleanup once things settle down a bit), I figured I'd let everyone know what I hope can happen before the refactoring. It looks like the only major optimizations left to be pulled into the uzem140 branch are Jubatian's "Fasten your seatbelts" changes. After that, I plan to do some profiling and benchmarking to see what impact (if any) calling his update_hardware_fast method everywhere it can be called has, versus calling it in only a few places. When I hit 225MHz emulation speed on my 4th gen i7, that was with it being called everywhere possible (and with some Makefile changes I've been working on), but once Jubatian's changes are integrated, I'd like to do a solid benchmark and submit at least one more round of optimizations for the stuff I've been working on.

After that, I think that things will have settled down enough for the code cleanup to take place unhindered. What do you guys think?
I'm fine with that, I can wait. :)
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Get that emu faster

Post by Jubatian »

I did the merge, along with performance measurements, which are in the commit message:

https://github.com/Jubatian/uzebox/tree/uzem140-hacks

However, I am simply unable to create a pull request today: I have been trying to load that page on GitHub for about half an hour without success. My net access is atrocious. Please pull it in yourself if possible, since I really don't want to spend half the day hitting refresh on this crap while my CPU burns itself to the ground battling with the damned wifi card. (I hope I can send this post within a few minutes.)
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

Jubatian wrote:I did the merge, along with performance measurements, which are in the commit message:

https://github.com/Jubatian/uzebox/tree/uzem140-hacks

However, I am simply unable to create a pull request today: I have been trying to load that page on GitHub for about half an hour without success. My net access is atrocious. Please pull it in yourself if possible, since I really don't want to spend half the day hitting refresh on this crap while my CPU burns itself to the ground battling with the damned wifi card. (I hope I can send this post within a few minutes.)
Awesome, thanks! I'll test it on my end (and remove the uzem and uzemdbg binaries that got checked in by mistake, so those don't get merged into the official repo) and submit a PR along with my newest (unrelated) optimizations.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

I merged Jubatian's code into my uzem140 branch and added my unrelated fix for the linker errors I was getting with the Emscripten build. I also made GDB support a compile-time option, since it involved two conditionals that got executed 28M times per second even when you don't activate it with the command-line option. This gives a noticeable speedup in the web build, and it also speeds up the native build if you choose to compile it using:

Code: Select all

NOGDB=1 make release
The default behavior is unchanged.

I also tweaked the link-time optimizations for the web build to make that even faster. The web build got fast enough that I actually had to decrease the number of cycles it executes every 1/60th of a second, because it was running too fast between frames, resulting in jitter and distorted sound. Ideally it would execute 28636360 / 60 emulated cycles per iteration, but that doesn't account for the time the web browser spends doing other tasks, so it still needs to execute somewhat more cycles than that per iteration to compensate.

On my desktop, I did notice a slowdown when update_hardware_fast is called only in the few places that Jubatian chose, versus when it's called everywhere (215 MHz the way it is now, versus 225 MHz when called everywhere), but I left it the way it is so we have a baseline to build on top of.
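
For reference, these figures can be put in relation to each other with a little arithmetic. The helpers below are hypothetical and only do the unit conversions; the real hardware clock is taken from the 28636360 figure mentioned earlier in this post.

```c
#include <assert.h>
#include <stdint.h>

#define UZEBOX_HZ 28636360.0   /* real hardware clock in Hz */

/* Emulated clock in MHz, given cycles executed and wall-clock seconds. */
double emulated_mhz(uint64_t cycles, double seconds)
{
    assert(seconds > 0.0);
    return (double)cycles / seconds / 1e6;
}

/* How many times faster than real hardware a given emulated speed is. */
double speed_vs_hardware(double mhz)
{
    return mhz * 1e6 / UZEBOX_HZ;
}

/* Relative gain of `fast` over `slow`, in percent. */
double gain_percent(double slow, double fast)
{
    assert(slow > 0.0);
    return (fast - slow) / slow * 100.0;
}
```

With the numbers from this post, `gain_percent(215.0, 225.0)` comes to a bit under 5 percent, and 225 MHz is a little under 8 times real speed.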

Jubatian, can you benchmark the version that's in my PR the way it currently is, but compile it using:

Code: Select all

# First pass: build with GEN=1
make clean
rm -rf Release
GEN=1 NOGDB=1 ARCH=core2 make release

./uzem bugz.uze -w
# (play it for a minute or so)

# Second pass: rebuild with USE=1
make clean
USE=1 NOGDB=1 ARCH=core2 make release

./uzem bugz.uze -vnw
(It's very important that for the second compile you switch the GEN=1 to USE=1)

and then add update_hardware_fast() everywhere it can possibly go, and re-benchmark it using the same procedure as above? I'm curious whether your core2 will show the same gains that I saw on my core-avx2.

Edit: Uze, I sent you a PR.