Get that emu faster

The Uzebox now has a fully functional emulator! Download and discuss it here.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Get that emu faster

Post by uze6666 »

Just saw your PM. Will merge tomorrow after work. After that I'll look at applying my stuff, probably in another branch for you to validate.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

uze6666 wrote:Just saw your PM. Will merge tomorrow after work. After that I'll look at applying my stuff, probably in another branch for you to validate.
Awesome. :)
Jubatian
Posts: 1564
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Get that emu faster

Post by Jubatian »

Fasten your seatbelts!

https://github.com/Jubatian/uzebox/tree/uzem140-hacks

This thing might even outdo CunningFellow's branch, so it will probably fly even without his changes! :) (Obviously, in the longer run, combine and smile.) I was a bit shocked by the end result: four changes, which individually I couldn't measure, but which combined gave a whopping 30 percent on my PC, with Arkanoid suddenly jumping from 55MHz to 72MHz, while before, no matter what I did, I was very lucky if I got 60MHz. It is a consistent improvement for everything. Realistically each of the distinct parts, if compiled by a compiler with consistent behavior, should produce a small performance bump, but the nature of these changes is such that they work best combined (not just by luck; given how they work this is expected: three changes trim down the main path, then a fourth inlines it, so no wonder it is their combination that really kicks).

They are distinct from CunningFellow's work, so the combination of the two should add up nicely in performance. For now no pull request, I will wait for others to do their part first, and merge then. But it is up for experimenting if you like.

Uze6666: Yes, that variable nightmare should be fixed. Those u8, u32 and such things are just as nasty as names like "k12", "k7" and the like: impossible to search for properly. The types from "<stdint.h>" should be used (what you mention, uint8_t and friends). The proper type for variables intended to use a fast, native-sized type of at least 32 bits is "uint_fast32_t" (or "int_fast32_t" if you really need signedness).
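
As a rough illustration of the "<stdint.h>" naming, here is a minimal sketch. The struct, its fields and the helper functions are made up for the example and are not taken from the uzem source; the only point is where an exact-width type fits and where a "fast" type fits.

```cpp
#include <cstdint>

// Hypothetical CPU state; names are illustrative, not taken from uzem.
struct CpuState {
    uint8_t       r[32];   // AVR registers are exactly 8 bits wide
    uint16_t      pc;      // exact-width type (width chosen for illustration)
    uint_fast32_t cycles;  // at least 32 bits, whatever is fastest on this host
};

// Exact-width types wrap at their width on every platform.
inline uint8_t add8(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(a + b);
}

// A hot loop counter can use the fast type; its width may exceed 32 bits.
inline uint_fast32_t countCycles(uint_fast32_t n) {
    uint_fast32_t total = 0;
    for (uint_fast32_t i = 0; i < n; ++i) total += 1;
    return total;
}
```

Note that uint_fast32_t is only guaranteed to hold at least 32 bits, so code must not rely on it wrapping at 2^32.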
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

Jubatian wrote:Fasten your seatbelts!

Sweet! I look forward to trying this tonight! :)

Even without this latest improvement, I was able to go from 124 MHz to 195 MHz with the Makefile enhancement I made last night (on the branch that's waiting to be pulled):

Code: Select all

make clean && GEN=1 make release
(Play through a game)

Code: Select all

make clean && USE=1 make release
Unfortunately the Emscripten compiler doesn't support profile-guided optimization; the roughly 1.5x speedup I got from PGO would have been most appreciated there.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Get that emu faster

Post by CunningFellow »

Jubatian wrote:Fasten your seatbelts!

https://github.com/Jubatian/uzebox/tree/uzem140-hacks

This thing might even outdo CunningFellow's branch

Yep - I can stop working on mine now.

Your update_hardware_fast() is almost exactly what I was working on.

My update_hardware was looking like this:

static inline void update_hardware(void)
{
    bitmap[index++] = pixel;   // store the current output pixel
    nextEvent--;               // one clock closer to the next event
    if (nextEvent == 0) {
        processEvent();
        getNextEventTime();
    }
}

Though I was planning on not having a linebuffer and just writing straight to a 1440x(244+2) bitmap.

AND the "events" would include not only the hardware AVR events but also two special emulator-related events.

audio_scanline_watchdog: Which would fire at about 15 kHz to keep everything from going bung if the AVR code crashes (audio ring buffer and pixel index)
h_sync_to_porch_time: That would reset the pixel "index" to the start of a line the correct number of clocks after sync.

Your latest stuff looks pretty much as low overhead as that anyways - so I will get back to AVR ASM code now :)
Jubatian
Posts: 1564
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Get that emu faster

Post by Jubatian »

CunningFellow: I don't think the audio watchdog idea is the right approach for that problem. In my opinion things are severely messed up around there: it is not the AVR code which should trigger the UI of the emulator whenever it likes, but rather the other way around. That is, the UI is "outside", calling the emulator to advance "some" (like 100K or so) cycles. That should be about the first major change in modularizing this thing properly, that is, separating the emulation from the user interface.

(Judging from this glimpse, the bitmap approach may also carry a potential segfault unless you planned to guard that index somehow. The linebuffer is nice in that even if the AVR code goes completely south, it just spins pixels within that 2K buffer, which is inherently safe.)
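
To illustrate why a wrapping linebuffer is inherently safe, here is a minimal sketch. It is not the uzem code: the struct and method names are hypothetical, and the 2048-entry size is only an assumption based on the "2K" figure above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical line buffer; the 2048-entry size is assumed from the "2K" figure.
struct LineBuffer {
    static constexpr std::size_t kSize = 2048;   // power of two, so masking works
    std::array<uint8_t, kSize> pixels{};
    std::size_t index = 0;

    // Store one pixel per clock. The mask keeps the index inside the buffer
    // even if the emulated program never produces a sync pulse.
    void put(uint8_t pixel) {
        pixels[index] = pixel;
        index = (index + 1) & (kSize - 1);
    }

    // Called on horizontal sync: start the next line at the left edge.
    void hsync() { index = 0; }
};
```

Because the index is masked on every store, no separate guard event is needed; runaway AVR code only overwrites pixels of the current line.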
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Get that emu faster

Post by CunningFellow »

Jubatian wrote:CunningFellow: I don't think the audio watchdog idea is the right approach for that problem. In my opinion things are severely messed up around there: it is not the AVR code which should trigger the UI of the emulator whenever it likes, but rather the other way around. That is, the UI is "outside", calling the emulator to advance "some" (like 100K or so) cycles. That should be about the first major change in modularizing this thing properly, that is, separating the emulation from the user interface.
From a modularising C++ code point of view I understand that. But from the Uzebox hardware perspective I don't understand it.

The TV does not ask for an H-Sync or a pixel. It accepts that when the AVR pulls a port pin low, that is a command to "sync up".

In my mind the logical way is for the buffer to act just like a TV signal and react to the AVR. Modularise/compartmentalize the C++ code higher up than the bitmap/linebuffer. The pixels are so tied to the hardware anyway.
Jubatian wrote:(Judging from this glimpse, the bitmap approach may also carry a potential segfault unless you planned to guard that index somehow. The linebuffer is nice in that even if the AVR code goes completely south, it just spins pixels within that 2K buffer, which is inherently safe.)
The index was to be guarded by the hidden "audio_scanline_watchdog:" event.

If the AVR did not put out an audio sample OR a HSync within 1820+x clocks then INDEX would be reset to somewhere safe and the audio buffer still serviced with silence.

You could only segfault if the emulator went haywire and didn't service the watchdog every 1820+x clocks.
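
A minimal sketch of that guard, simplified to a single line's worth of pixels and leaving out the audio side. All names and the slack value are hypothetical; only the 1820 clock figure comes from the description above.

```cpp
#include <cstdint>
#include <vector>

// All names and the slack value are hypothetical; only 1820 comes from the thread.
constexpr uint32_t kLineClocks    = 1820;  // clocks per scanline
constexpr uint32_t kWatchdogSlack = 16;    // the "+x" margin (assumed value)
constexpr uint32_t kTimeout       = kLineClocks + kWatchdogSlack;

struct GuardedLine {
    std::vector<uint8_t> bitmap = std::vector<uint8_t>(kTimeout, 0);
    uint32_t index = 0;
    uint32_t nextEvent = kTimeout;     // clocks left until the watchdog fires
    uint32_t watchdogFired = 0;

    // The AVR produced a sync in time: restart the line and re-arm the watchdog.
    void hsync() {
        index = 0;
        nextEvent = kTimeout;
    }

    // One emulated clock: store a pixel, then count down to the watchdog.
    void updateHardware(uint8_t pixel) {
        bitmap[index++] = pixel;
        if (--nextEvent == 0) {        // no sync within 1820+x clocks
            index = 0;                 // reset to somewhere safe
            nextEvent = kTimeout;
            ++watchdogFired;
        }
    }
};
```

The index and the countdown are always reset together, so the index can never reach the end of the buffer as long as updateHardware() is the only writer.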

Moot point now though as your new update_hardware_fast is going to be so close to my idea that it's not worth me implementing.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Get that emu faster

Post by uze6666 »

You guys are killing me! ;) Too many great brains on the same problem is hard to manage.

I checked out Artcfox's last PR and can't run it. Executing:

Code: Select all

uzem -n -v Mode13ExtendedDemo.hex
or even just

Code: Select all

uzem Mode13ExtendedDemo.hex
goes into an endless loop in the console, forever outputting something like this (no SDL window ever appears):

Code: Select all

"C:\work\uzebox\git\tools\uzem>uzem -n -v -n "
Can anybody on Windows replicate this?

Btw, are those the switches you guys use to measure the top emulation speed? -v disabling vsync and -n disabling sound?

PS: Btw, don't get too picky about such stuff; the emu is just a tool, and it just has to be good enough when you think about it. Be picky about the kernel or other things that affect the platform itself!! That's the essential part. ;)
Jubatian
Posts: 1564
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Get that emu faster

Post by Jubatian »

CunningFellow wrote:From a modularising C++ code point of view I understand that. But from the Uzebox hardware perspective I don't understand it.

The TV does not ask for a H-Sync or a pixel. ...
It is not the modularising point of view, but rather the user interface responsiveness point of view. Within the emulator, the "television" is not necessarily equivalent to the window or display of the program, even though it seems so as of now. The emulator should still drive the emulated television, whose output is grabbed by the user interface for actual display.

My design in my own emulator project looks like this:

The UI component drives the emulation, using the sound hardware as the real-time source (it determines how many cycles to request from the emulator by how full the audio buffer is, to keep it consistently, evenly filled). The emulation then runs as many cycles as requested, driving the various pieces of the hardware as required. Before starting emulation, the UI component sets up a line render callback, which the emulator can call whenever it completes a line, so the UI component can draw lines as soon as they arrive.

Since the UI governs the run of the program, it always stays responsive even when some foul code is fed into the emulator, which, if the emulator had such a feature, would also be useful for debugging that code. It can also manage frameskipping, dropping the rendering of frames as necessary to keep up with the real-time source from the audio, which is useful on older hardware or on machines with crappy video card drivers.
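
The design described above can be sketched roughly as follows. This is not code from uzem or from any existing emulator: every class, function and constant name is hypothetical, the 1820 clocks per line is the figure quoted earlier in the thread, and one audio sample per scanline is an assumption made to keep the arithmetic simple.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical interfaces; none of these names come from a real project.
constexpr uint32_t kCyclesPerLine   = 1820;  // clocks per scanline (from the thread)
constexpr uint32_t kCyclesPerSample = 1820;  // assumes one audio sample per scanline

class Emulator {
public:
    using LineCallback = std::function<void(uint32_t line)>;

    // The UI registers this before starting emulation.
    void setLineCallback(LineCallback cb) { onLine_ = std::move(cb); }

    // Advance the emulation by the requested number of cycles, reporting each
    // completed scanline to the UI through the callback.
    void run(uint32_t cycles) {
        while (cycles-- > 0) {
            if (++cycleInLine_ == kCyclesPerLine) {
                cycleInLine_ = 0;
                if (onLine_) onLine_(line_);
                ++line_;
            }
        }
    }

private:
    LineCallback onLine_;
    uint32_t cycleInLine_ = 0;
    uint32_t line_ = 0;
};

// UI side: request just enough cycles to bring the audio buffer back up to
// its target fill level; request nothing if it is already full enough.
inline uint32_t cyclesToRequest(uint32_t samplesQueued, uint32_t targetSamples) {
    if (samplesQueued >= targetSamples) return 0;
    return (targetSamples - samplesQueued) * kCyclesPerSample;
}
```

A UI loop would then call cyclesToRequest() with the current audio queue depth, pass the result to run(), and handle input between calls. Frameskipping falls out naturally: the callback can simply return without drawing when the UI is behind.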

Uze: Where is that Mode 13 demo? So far I haven't stumbled upon it on the branches I have seen.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Get that emu faster

Post by Artcfox »

Jubatian wrote: Uze: Where is that Mode 13 demo? So far I haven't stumbled upon it on the branches I have seen.
It's in the paletteMode branch, in the demos/Mode13ExtendedDemo/ directory, but it doesn't compile, so I'm not sure how he's building it.