Uzem ported to SDL2

The Uzebox now has a fully functional emulator! Download and discuss it here.
User avatar
Artcfox
Posts: 944
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Uzem ported to SDL2

Post by Artcfox » Fri Oct 02, 2015 10:21 am

uze6666 wrote:Thanks for the rom and capture. I was able to reproduce the line skip issue. I will review your patch to fully understand what is going on.

I will do some audio tests with 15700 and various buffer sizes.
Awesome!

About the patch: at the time I just wanted to get the emulator back to a working state, so it is a bit heavy-handed and removes all support for the delayed setting of flags in the TIFR1 register, because being more delicate did not work. You are correct that this also makes it not cycle perfect, so a more careful approach is probably needed to actually solve the bug correctly.

TIFR1 is the "TIFR1 – Timer/Counter1 Interrupt Flag Register" described on page 131 of the datasheet.

Here is what the datasheet has to say about "Bit 0 – TOV1: Timer/Counter1, Overflow Flag":
The setting of this flag is dependent of the WGMn3:0 bits setting. In Normal and CTC modes, the TOV1 Flag is set when the timer overflows. Refer to Table 14-5 on page 127 for the TOV1 Flag behavior when using another WGMn3:0 bit setting.
TOV1 is automatically cleared when the Timer/Counter1 Overflow Interrupt Vector is executed. Alternatively, TOV1 can be cleared by writing a logic one to its bit location.
But in the uzem140 code, the setting of TOV1 does not happen immediately, as I believe it should; instead it is delayed by one instruction. That still does not fix the vertical jitter, so there has to be at least one more bug. I believe another bug occurs because at the end of the function there is a:

Code: Select all

TIFR1 = tempTIFR1;
which blindly overwrites TIFR1 with whatever is currently in the tempTIFR1 variable, even if something else in that call to update_hardware (or a recursive call to update_hardware via its call to trigger_interrupt) has directly written to TIFR1 that cycle. That would in effect "undo" the direct write, so anything that was written directly into TIFR1 during that instruction (or during a recursive call) could be missed. Those direct writes include:

Code: Select all

TIFR1|=OCF1B; //CTC match flag interrupt
TIFR1|=OCF1A; //CTC match flag interrupt
TIFR1&= ~OCF1A; //clear CTC match flag
TIFR1&= ~OCF1B; //clear CTC match flag
TIFR1&= ~TOV1; //clear TOV1 flag
which would definitely explain why we are missing an entire scanline (and correspondingly, why the sounds might be messed up).

One possible solution I thought of was to set tempTIFR1 = TIFR1 any time you need to do a direct write to TIFR1, but that doesn't work, for exactly the same reason: as soon as any one thing sets tempTIFR1, it blows away any other changes to the tempTIFR1 variable that happened during that call (or a recursive call). Since the only changes to TIFR1 that need to be delayed happen when setting bits, we can probably get away with just merging those bit changes into TIFR1 at the end of the function using:

Code: Select all

TIFR1|=tempTIFR1;
I just tested it with only those changes, and it looks like this is all the patch needs in order to not drop scanlines (verified using the test case I sent you) while staying true to the goal of being cycle perfect:

Code: Select all

diff --git a/tools/uzem/avr8.cpp b/tools/uzem/avr8.cpp
index 204d768..b9ad0b7 100644
--- a/tools/uzem/avr8.cpp
+++ b/tools/uzem/avr8.cpp
@@ -648,8 +648,7 @@ void avr8::update_hardware(int cycles)
 
                        if (TCNT1 > (0xFFFF - cycles)){
 
-                                //TIFR1|=TOV1; //overflow interrupt
-                               tempTIFR1|=TOV1; //overflow interrupt
+                                TIFR1|=TOV1; //overflow interrupt
 
                        }
 
@@ -684,8 +683,7 @@ void avr8::update_hardware(int cycles)
 
                        if (TCNT1 > (0xFFFF - cycles)){
                                if (TIMSK1 & TOIE1){
-                                        //TIFR1|=TOV1; //overflow interrupt
-                                       tempTIFR1|=TOV1; //overflow interrupt
+                                       TIFR1|=TOV1; //overflow interrupt
                                }
                        }
                        TCNT1 += cycles;
@@ -830,7 +828,7 @@ void avr8::update_hardware(int cycles)
 
 
        TCCR1B=newTCCR1B; //Timer increments starts after executing the instruction that sets TCCR1B
-       TIFR1=tempTIFR1;
+       TIFR1|=tempTIFR1;
 }
 
 u8 avr8::exec()
It looks like you knew at one point that the TOV1 flag needed to be set during the initial cycle, because those lines were there, but commented out.

It seems a bit odd that the only interrupt missed would be the very first scanline all the time, but I'm guessing that instead of missing an interrupt, it missed the clearing of the interrupt flag, causing that interrupt to fire twice in a row, which would have incorrectly incremented the scanline counter variable twice, eating the first scanline.

I'm submitting this patch as a pull request. :D

User avatar
Artcfox
Posts: 944
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Uzem ported to SDL2

Post by Artcfox » Fri Oct 02, 2015 5:09 pm

I saw your comment that this is still not completely cycle perfect. :cry:

I wanted to put my reply from GitHub here too, so when I go looking for it in the future I'll find it:
Artcfox wrote:I think there may be a third bug that is still not fixed, that involves trigger_interrupt potentially recursing back into update_hardware. If that happens, then any flags that should have been set delayed one cycle (pending changes in tempTIFR1) will be overwritten and lost by the tempTIFR1 = TIFR1; at the beginning of any recursive calls to update_hardware (via trigger_interrupt), since that's one of the first things update_hardware does.

I think it may be necessary to have some sort of data structure that specifies which bit in which register needs to be set or cleared, and something like a TTL value that specifies how many clock ticks need to occur before that bit set/clear occurs. Then it wouldn't matter if the recursion happens or not as long as you decrement the TTL during every call to update_hardware, and perform the bit set/clear when that TTL reaches zero.

User avatar
Jubatian
Posts: 1355
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary
Contact:

Re: Uzem ported to SDL2

Post by Jubatian » Sat Oct 03, 2015 7:58 pm

I checked out this branch to see what it could offer, but for me the results were notably inferior. Uzem on the master branch runs stuff at 26 MHz here; this one can barely reach 22 MHz. Debian Linux Jessie with XFCE, an old Nvidia video card (GeForce 8400 GS), a 2.66 GHz Core 2 Duo.

However, if I exit and restart it several times, sometimes it flies at 28 MHz for a short while (up to 15 seconds, graphics and all) before falling back to a crawl. Resizing its window a few times might kick it back to full speed.

The master branch version doesn't seem to do anything like this.

User avatar
Artcfox
Posts: 944
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Uzem ported to SDL2

Post by Artcfox » Sat Oct 03, 2015 8:25 pm

Jubatian wrote:I checked out this branch to see what it could offer, but for me the results were notably inferior. Uzem on the master branch runs stuff at 26 MHz here; this one can barely reach 22 MHz. Debian Linux Jessie with XFCE, an old Nvidia video card (GeForce 8400 GS), a 2.66 GHz Core 2 Duo.

However, if I exit and restart it several times, sometimes it flies at 28 MHz for a short while (up to 15 seconds, graphics and all) before falling back to a crawl. Resizing its window a few times might kick it back to full speed.

The master branch version doesn't seem to do anything like this.
Does glxinfo report that you have actual hardware acceleration enabled, or is it using Mesa to do the work of the GPU on your CPU?

A very easy way to check whether the slowdown is caused by the GPU scaling is to see what happens after making this change:

Code: Select all

diff --git a/tools/uzem/avr8.cpp b/tools/uzem/avr8.cpp
index 38bbc8a..5e2328b 100644
--- a/tools/uzem/avr8.cpp
+++ b/tools/uzem/avr8.cpp
@@ -1659,7 +1659,7 @@ bool avr8::init_gui()
                return false;
        }
        SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "nearest");
-       SDL_RenderSetLogicalSize(renderer, 640, 480);
+       SDL_RenderSetLogicalSize(renderer, 640, 240);
 
        surface = SDL_CreateRGBSurface(0, 640, 240, 32, 0x000000ff, 0x0000ff00, 0x00ff0000, 0xff000000);
        if(!surface){
This will make it so the GPU won't have to upscale the texture.

User avatar
Jubatian
Posts: 1355
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary
Contact:

Re: Uzem ported to SDL2

Post by Jubatian » Sat Oct 03, 2015 9:42 pm

I checked, and hardware acceleration is fine (glxinfo reports it, as does everything else I know to check). When I apply the diff, the same thing still happens. I even applied it after adding my CPU instruction decoder hacks; still the exact same result. Something is seriously going south here.

User avatar
Artcfox
Posts: 944
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Uzem ported to SDL2

Post by Artcfox » Sat Oct 03, 2015 9:50 pm

Jubatian wrote:I checked, and hardware acceleration is fine (glxinfo reports it, as does everything else I know to check). When I apply the diff, the same thing still happens. I even applied it after adding my CPU instruction decoder hacks; still the exact same result. Something is seriously going south here.
Weird! Some people have reported the SDL2 version working better than the SDL1.2 version (on both Linux and Windows), while others have better luck with the SDL1.2 version. I think that older computers have trouble with SDL2 more often than newer computers do, though the SDL2 port runs better on my 5-year-old laptop than the SDL1.2 version ever did; granted, it's a pretty high-end "old laptop."

Maybe I should add a preprocessor flag that you can set in the Makefile, so the SDL version can be chosen at compile time?
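That could be as simple as a preprocessor guard; a sketch assuming a hypothetical USE_SDL2 macro defined from the Makefile (e.g. CXXFLAGS += -DUSE_SDL2):

```cpp
// Hypothetical compile-time switch between SDL versions. USE_SDL2 would
// be defined (or not) in the Makefile rather than hard-coded here.
#ifdef USE_SDL2
#include <SDL2/SDL.h>   // SDL2 code path
#else
#include <SDL/SDL.h>    // SDL 1.2 code path
#endif
```

The rendering and audio code would then need matching #ifdef blocks wherever the two APIs diverge.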

User avatar
uze6666
Site Admin
Posts: 4449
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Uzem ported to SDL2

Post by uze6666 » Sun Oct 04, 2015 10:11 pm

I checked out this branch to see what it could offer, but for me the results were notably inferior. Uzem on the master branch runs stuff at 26 MHz here; this one can barely reach 22 MHz. Debian Linux Jessie with XFCE, an old Nvidia video card (GeForce 8400 GS), a 2.66 GHz Core 2 Duo.
We have very similarly specced machines. Mine is a 2.4 GHz Core 2 with an 8600GTS, and it can run uzem from the master branch at full speed under Windows 7 32-bit with no sound drops ever.

I am doing many tests, and so far removing the SDL_RENDERER_PRESENTVSYNC flag causes a small increase in frame rate (~0.5%). I'm using Mode13ExtendedDemo.hex from the palleteMode branch, and it averages 23.8 MHz with SDL2. The same demo runs at full speed with the master branch uzem.

However, in the init code, replacing the following lines boosts the emulation to ~28.5 MHz, a 16% increase in frame rate.

Code: Select all

bool avr8::init_gui()
...
	surface = SDL_CreateRGBSurface(0, 640, 240, 32, 0,0,0,0);
	//surface = SDL_CreateRGBSurface(0, 640, 240, 32, 0x000000ff, 0x0000ff00, 0x00ff0000, 0xff000000);

	if(!surface){
		fprintf(stderr, "CreateRGBSurface failed: %s\n", SDL_GetError());
		return false;
	}

	//texture = SDL_CreateTexture(renderer,SDL_PIXELFORMAT_ABGR8888,SDL_TEXTUREACCESS_STREAMING,640,240);
	texture = SDL_CreateTexture(renderer,surface->format->format,SDL_TEXTUREACCESS_STREAMING,640,240);
...
It seems there's some pixel format transformation going on that slows things down. Doing more tests.

Update: in fact, 28.5 MHz is what uzem from the master branch reports, so I think that was the issue. With SDL_RENDERER_PRESENTVSYNC, the scrolling in the demo is *really* much smoother than with SDL1, but I lose perhaps 0.1 MHz in emulation speed. Will test for the sound issues now.
Attachments
Mode13ExtendedDemo.hex
(62.14 KiB) Downloaded 128 times

User avatar
uze6666
Site Admin
Posts: 4449
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Uzem ported to SDL2

Post by uze6666 » Sun Oct 04, 2015 11:34 pm

Ok, did more tests. Since the GPU will do all the scaling for us, we should push as few pixels as possible to the video card. I obtained 31.8 MHz emulation (sound disabled) with the following:

Code: Select all

bool avr8::init_gui(){
...
	surface = SDL_CreateRGBSurface(0, 720, 224, 32, 0,0,0,0);
...
	texture = SDL_CreateTexture(renderer,surface->format->format,SDL_TEXTUREACCESS_STREAMING,720,224);
...
	inset =0;
...
void avr8::update_hardware(int cycles){
...

    //draw pixels on scanline
	if (scanline_count >= 0 && current_cycle < 1440)
	{
		while (cycles)
		{
			if (current_cycle >= 0 && current_cycle < 1440)
			{
				current_scanline[current_cycle>>1] = pixel;
...
So I expanded the texture width to 720, which reduces the CPU cycles required to address the current_scanline array, and reduced the texture height to 224 lines. Hopefully this won't break the video recording.

User avatar
Artcfox
Posts: 944
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Uzem ported to SDL2

Post by Artcfox » Sun Oct 04, 2015 11:37 pm

Beautiful discovery! When it worked fine on Intel integrated graphics but ran slow on two Nvidia cards, I suspected that some sort of texture translation might be happening behind the scenes, but I didn't realize that passing all 0's would still work. Since we're still forcing a pixel format in the call to CreateTexture, I wonder if there is still a potential conversion happening?

If so, it may be possible to have Uzem auto-detect the fastest texture format and use that. I've also been discussing with CunningFellow how we could go about using an 8-bit texture. Initially I thought it wasn't possible, but the more I look at the API (the SDL2 documentation is terrible!), the more I think it might be possible to use an 8-bit texture (using purely SDL calls) and have it create an empty palette that we could fill in instead of using SDL_MapRGB. Then we wouldn't have to do the 8 -> 32 bit expansion in software, and we'd be streaming 1/4 the amount of data to the GPU each frame. If that works, it could really speed things up on the rendering side.
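On the auto-detection part: SDL2 exposes the formats a renderer supports natively through SDL_GetRendererInfo, and texture_formats[0] is typically the preferred one. A rough, untested sketch of how that could look (function name and fallback choice are mine, not uzem code):

```cpp
#include <SDL2/SDL.h>

// Create the streaming texture in the renderer's preferred pixel format,
// so SDL shouldn't have to convert pixels on every upload.
SDL_Texture* create_fast_texture(SDL_Renderer* renderer, int w, int h) {
    SDL_RendererInfo info;
    Uint32 format = SDL_PIXELFORMAT_ARGB8888;  // fallback if the query fails
    if (SDL_GetRendererInfo(renderer, &info) == 0 &&
        info.num_texture_formats > 0) {
        format = info.texture_formats[0];      // renderer's preferred format
    }
    return SDL_CreateTexture(renderer, format,
                             SDL_TEXTUREACCESS_STREAMING, w, h);
}
```

The pixel-copy loop would then need to write pixels in whatever format was chosen, rather than assuming a fixed byte order.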

So with that change, you're happy with the performance of SDL2?

I too am very impressed with the smoothness when I completely comment out the:

Code: Select all

//while (audioRing.isFull())SDL_Delay(1);
line and let SDL_RENDERER_PRESENTVSYNC be the only thing that limits the speed of the emulator. Everything becomes as smooth as butter! Not all monitors have a 60 Hz refresh rate though, so maybe that should be configurable through a command-line argument. That's definitely the way I prefer to run my copy of Uzem. Such smoothness!

User avatar
uze6666
Site Admin
Posts: 4449
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Uzem ported to SDL2

Post by uze6666 » Mon Oct 05, 2015 12:06 am

If so, it may be possible to have Uzem auto-detect the fastest texture format and use that. I've also been discussing with CunningFellow how we could go about using an 8-bit texture. Initially I thought it wasn't possible, but the more I look at the API (the SDL2 documentation is terrible!), the more I think it might be possible to use an 8-bit texture (using purely SDL calls) and have it create an empty palette that we could fill in instead of using SDL_MapRGB. Then we wouldn't have to do the 8 -> 32 bit expansion in software, and we'd be streaming 1/4 the amount of data to the GPU each frame. If that works, it could really speed things up on the rendering side.
That's definitely worth a try. However, in my tests, if I disabled the whole pixel copying block (and the current_scanline[current_cycle>>1] = pixel line), the performance only increased by about 1 MHz of emulation speed. Also, does palette mode work in windowed mode?
So with that change, you're happy with the performance of SDL2?
Absolutely, with all the fixes applied, it's even a bit better on my machine.
I too am very impressed with the smoothness when I completely comment out the:

Code: Select all

//while (audioRing.isFull())SDL_Delay(1);
line and let SDL_RENDERER_PRESENTVSYNC be the only thing that limits the speed of the emulator. Everything becomes as smooth as butter! Not all monitors have a 60 Hz refresh rate though, so maybe that should be configurable through a command-line argument. That's definitely the way I prefer to run my copy of Uzem. Such smoothness!
The improvement in "smoothness" is really night and day. Even with the audio ring delay, it's near perfect. The attached ROM has a lot of scrolling in it, and before, the stuttering was a mess. Now it's silky smooth! :D

If I disable that delay line, it looks better at first sight, and I haven't gotten sound drops so far. But you bring up a good point: can we assume all monitors run at 60 Hz? Not so sure. Perhaps it's better to keep that line for now (or add some flag or config option). I'll commit my changes to the branch for others to check out.
Attachments
sprite-demo2.hex
(178.34 KiB) Downloaded 129 times
