Too many stuff going on

The Uzebox now has a fully functional emulator! Download and discuss it here.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Too many stuff going on

Post by uze6666 »

Hi guys,

I can't follow all those topics; there is too much stuff going on, and there seem to be different views on how to optimize. I have received a pull request from Jubatian which seems good to me, since it simplifies the code by implementing emulation by cycle instead of by instruction. I lose between 2 and 5 MHz of emulation speed, but my slowest PC still runs it at 31 MHz or better.

I propose we merge this and you can proceed with any further optimizations (like CunningFellow's table approach).

Edit: Just noticed I can't run T2K at full speed on any version with SDL2. It falls just short, though. :(
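
For readers following along, here is a minimal sketch of the difference between emulating by instruction and emulating by cycle. It is not uzem's actual code: `Machine`, `catch_up`, `run_by_instruction` and `run_by_cycle` are made-up names, and each instruction is reduced to its cycle cost. Both loops end at the same cycle count, but the per-cycle loop calls into the hardware model once per cycle, which is simpler to reason about and costs more host work on multi-cycle instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy machine (hypothetical): only tracks elapsed cycles and one timer.
struct Machine {
    uint32_t cycle_counter = 0;   // total emulated CPU cycles
    uint32_t timer = 0;           // free-running timer, ticks once per cycle
    uint32_t hardware_calls = 0;  // how often the hardware model was entered

    // Advance the peripherals by exactly one CPU cycle.
    void update_hardware() {
        ++cycle_counter;
        ++timer;
        ++hardware_calls;
    }

    // Advance the peripherals by several cycles in one lump.
    void catch_up(int cycles) {
        assert(cycles > 0);
        cycle_counter += static_cast<uint32_t>(cycles);
        timer += static_cast<uint32_t>(cycles);
        ++hardware_calls;
    }
};

// Emulation by instruction: run the whole instruction, then catch up once.
uint32_t run_by_instruction(Machine& m, const std::vector<int>& costs) {
    for (int c : costs) {
        m.catch_up(c);
    }
    return m.cycle_counter;
}

// Emulation by cycle: step the hardware once for every cycle of every
// instruction, so a 2-cycle instruction enters the hardware model twice.
uint32_t run_by_cycle(Machine& m, const std::vector<int>& costs) {
    for (int c : costs) {
        assert(c > 0);
        for (int i = 0; i < c; ++i) {
            m.update_hardware();
        }
    }
    return m.cycle_counter;
}
```

With a mix of 1-cycle and 2-cycle instructions, the per-cycle version makes more hardware calls for the same emulated time, which is one plausible reason for a few MHz of lost emulation speed.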
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Too many stuff going on

Post by Artcfox »

uze6666 wrote: Hi guys,

I can't follow all those topics; there is too much stuff going on, and there seem to be different views on how to optimize. I have received a pull request from Jubatian which seems good to me, since it simplifies the code by implementing emulation by cycle instead of by instruction. I lose between 2 and 5 MHz of emulation speed, but my slowest PC still runs it at 31 MHz or better.

I propose we merge this and you can proceed with any further optimizations (like CunningFellow's table approach).

Edit: Just noticed I can't run T2K at full speed on any version with SDL2. It falls just short, though. :(

What if you pass the -w flag to use the software renderer?

Does this one run at full speed for you?

That build is CunningFellow's pre-decode changes merged with my patch that uses the native word size for some things that happen after decoding. I can run Bugz at 43 MHz on my 2.0 GHz Core 2 using the -wn flags, and it runs at full speed with no flags, or with only the -w flag.

That version runs 10% faster than the one from the currently open PR from Jubatian. I would say hold off a bit on merging that PR because 10% makes a huge difference, especially for low end computers and the web version.
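
To put the MHz figures in this thread in context, here is a rough sketch of how an emulated clock rate can be computed and compared against real hardware. It assumes the Uzebox's 28.63636 MHz CPU clock as the "full speed" threshold; `emulated_mhz` and `runs_full_speed` are hypothetical helpers, not functions from uzem.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Real Uzebox CPU clock in MHz (8x the NTSC colorburst frequency).
constexpr double kUzeboxClockMHz = 28.63636;

// Emulated speed in MHz: emulated cycles divided by host wall-clock time.
double emulated_mhz(uint64_t cycles, std::chrono::duration<double> elapsed) {
    assert(elapsed.count() > 0.0);
    return static_cast<double>(cycles) / elapsed.count() / 1e6;
}

// The emulator keeps up with real hardware only at or above the real clock.
bool runs_full_speed(double mhz) {
    return mhz >= kUzeboxClockMHz;
}
```

On this reading, a build that manages 31 MHz has under 10% of headroom, so a 10% difference between two branches can decide whether a slow machine reaches full speed at all.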
Last edited by Artcfox on Thu Oct 22, 2015 7:13 am, edited 1 time in total.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Too many stuff going on

Post by CunningFellow »

uze6666 wrote:
Edit: Just noticed I can't run T2K at full speed on any version with SDL2. It falls just short, though. :(

Two things it could be.

One is the SD card: T2K obviously hits the SD card very hard.

The other is that, if the changes Jubatian made for cycle accuracy slowed down 2-cycle instructions, T2K would suffer: it is chock-full of IJMP, MUL, LD and ST.

In fact I doubt anything else hits IJMP anywhere near as much as T2K does.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Too many stuff going on

Post by Artcfox »

CunningFellow wrote:
uze6666 wrote:
Edit: Just noticed I can't run T2K at full speed on any version with SDL2. It falls just short, though. :(

Two things it could be.

One is the SD card: T2K obviously hits the SD card very hard.

The other is that, if the changes Jubatian made for cycle accuracy slowed down 2-cycle instructions, T2K would suffer: it is chock-full of IJMP, MUL, LD and ST.

In fact I doubt anything else hits IJMP anywhere near as much as T2K does.

I just tried T2K on my 2.0 GHz Core 2, and it runs at full speed if you pass it the -w flag. :)
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Too many stuff going on

Post by Artcfox »

uze6666 wrote: I propose we merge this and you can proceed with any further optimizations (like CunningFellow's table approach).
Can we evaluate the performance of Jubatian's uzem140-hacks branch combined with CunningFellow's uzem140-fast-pre-decode branch first? My hope is that when combined we at least end up faster than the current upstream uzem140 branch, with both cleaner code that is more cycle perfect, and fast instruction pre-decoding, which also cleans up the code.

Edit: Okay, so I merged them together (without any of my unproven 32-bit integer enhancements), and the combined uzem140-hacks-uzem140-fast-pre-decode code is faster than the upstream uzem140 branch. Using the resulting binary I can play T2K at full speed on my 2.0 GHz Core 2 laptop with the -w flag, and run Bugz at 43.3 MHz! The code is definitely much cleaner this way, and the net result is a speedup, so I'd be happy if both Jubatian's uzem140-hacks branch and CunningFellow's uzem140-fast-pre-decode branch got merged into the upstream uzem140 branch.
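
As a rough illustration of what instruction pre-decoding means in general (this is a sketch under my own assumptions, not the code from the uzem140-fast-pre-decode branch): the raw opcode words are decoded once into a table of small structs, and the run loop then dispatches on the already-decoded fields instead of re-extracting bit fields every time. The names `Op`, `Decoded`, `predecode` and `run` are hypothetical; only three AVR instructions are handled, and status flags are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op : uint8_t { NOP, LDI, ADD, UNKNOWN };

// One pre-decoded instruction: operation plus already-extracted operands.
struct Decoded {
    Op op;
    uint8_t arg1;  // destination register
    uint8_t arg2;  // source register (ADD) or immediate (LDI)
};

// Decode a single 16-bit AVR opcode word (only three encodings handled).
Decoded decode(uint16_t w) {
    if (w == 0x0000) {
        return {Op::NOP, 0, 0};
    }
    if ((w & 0xF000) == 0xE000) {  // LDI Rd,K: 1110 KKKK dddd KKKK
        uint8_t d = static_cast<uint8_t>(16 + ((w >> 4) & 0x0F));
        uint8_t k = static_cast<uint8_t>(((w >> 4) & 0xF0) | (w & 0x0F));
        return {Op::LDI, d, k};
    }
    if ((w & 0xFC00) == 0x0C00) {  // ADD Rd,Rr: 0000 11rd dddd rrrr
        uint8_t d = static_cast<uint8_t>((w >> 4) & 0x1F);
        uint8_t r = static_cast<uint8_t>(((w >> 5) & 0x10) | (w & 0x0F));
        return {Op::ADD, d, r};
    }
    return {Op::UNKNOWN, 0, 0};
}

// Pay the decoding cost once, when the program is loaded.
std::vector<Decoded> predecode(const std::vector<uint16_t>& flash) {
    std::vector<Decoded> out;
    out.reserve(flash.size());
    for (uint16_t w : flash) {
        out.push_back(decode(w));
    }
    return out;
}

struct Cpu {
    uint8_t r[32] = {};  // general-purpose registers r0..r31
};

// Run loop: dispatches on pre-decoded fields only (flags not modelled).
void run(Cpu& cpu, const std::vector<Decoded>& prog) {
    for (const Decoded& in : prog) {
        assert(in.arg1 < 32);
        switch (in.op) {
        case Op::NOP:
            break;
        case Op::LDI:
            cpu.r[in.arg1] = in.arg2;
            break;
        case Op::ADD:
            assert(in.arg2 < 32);
            cpu.r[in.arg1] =
                static_cast<uint8_t>(cpu.r[in.arg1] + cpu.r[in.arg2]);
            break;
        case Op::UNKNOWN:
            return;  // stop on anything this sketch does not handle
        }
    }
}
```

For example, the words 0xE005, 0xE017, 0x0F01 encode LDI r16,5; LDI r17,7; ADD r16,r17. After `predecode` the loop in `run` never touches the raw opcode bits again, which is where the speedup in the hot path would come from.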
Last edited by Artcfox on Thu Oct 22, 2015 12:19 pm, edited 2 times in total.
Jubatian
Posts: 1563
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Too many stuff going on

Post by Jubatian »

I just got an idea which will likely ramp up performance, especially for multi-cycle instructions in my branch. It is actually a very simple thing, and it doesn't wreck any of the clean-up. As far as I remember the assembly output, it should deliver a notable kick. Should... I will see what happens in the evening; if it works, I will update my pull request to include it (I expect it to be only twenty or so lines of change).

EDIT: Done, it is part of the pull request. On my computer it gave about a 5% improvement. I did some further experiments, but with the crazy optimization goals it is hard to determine the outcome of a dubious change. Anyway, this change, merging the timer registers, is definitely beneficial in concept.

What I tried in addition and discarded was tweaking update_hardware itself by moving parts of it into inline functions to be used with multi-cycle instructions. One combination of this gave me a 10% speed bump, but the overall results simply weren't consistent, usually varying between 0% and about 7% with barely any sane relation to what I did to the code. For now I am dropping this; I think performance is now capped somewhere else. Maybe it could be re-attempted after polishing up CunningFellow's changes.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Too many stuff going on

Post by uze6666 »

Artcfox, can you merge Jubatian's latest improvements with CunningFellow's and send a PR? From what I hear this would bring all the best together, so that should become the baseline.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Too many stuff going on

Post by Artcfox »

uze6666 wrote: Artcfox, can you merge Jubatian's latest improvements with CunningFellow's and send a PR? From what I hear this would bring all the best together, so that should become the baseline.

I did merge them on my own last night, but as per the discussion with Jubatian over the past few days, I agree that the official merge for the PR should probably be performed (or at least verified) by one of the original authors of the code that's being merged. I found it much easier to start with Jubatian's branch and pull CunningFellow's changes into it, which is what I made available online here. I think the best thing might be to have CunningFellow perform the merge himself, and then we can compare notes as a way to double-check that nothing got missed.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Too many stuff going on

Post by uze6666 »

OK. So I'll merge Jubatian's pull request, then when ready someone will send another one with CunningFellow's changes.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Too many stuff going on

Post by Artcfox »

uze6666 wrote: OK. So I'll merge Jubatian's pull request, then when ready someone will send another one with CunningFellow's changes.

Sounds good.