Smoother rendering of high res modes

The Uzebox now has a fully functional emulator! Download and discuss it here.
Jubatian
Posts: 1563
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Smoother rendering of high res modes

Post by Jubatian »

I don't know whether SDL can support Deep color (or whether it might become able to support it later), but that has more than 8 bits per channel and is still a 32-bit mode. Normally its masks would look as follows:
  • Red: 0x3FF00000
  • Green: 0x000FFC00
  • Blue: 0x000003FF
Moreover, while the code above can certainly produce a proper output string for all possible channel orders, avconv itself might not support all of them. See the documentation of the "-pix_fmt" flag (for example on this doc). Maybe it does, but judging by the description of the program's interface there is a possibility that it doesn't (the robust method would be querying the supported formats with "-pix_fmts" and using one which we can supply). And even then you have only solved the problem for avconv, while other possible future capture devices (such as libpng, which was mentioned here as a feature some wish for) might not be this generous and would make it necessary to add kludges.
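To illustrate the masks listed above, here is a minimal sketch in plain C of how a renderer could derive each channel's position and width from its mask instead of assuming 8 bits per channel. The names `channel_layout`, `channel_from_mask` and `channel_pack` are hypothetical and do not come from Uzem or SDL; the sketch also assumes each mask is one contiguous run of bits no wider than 16.

```c
#include <stdint.h>

/* Position and width of one color channel inside a 32-bit pixel. */
typedef struct {
    unsigned shift; /* index of the lowest set bit of the mask */
    unsigned bits;  /* number of contiguous set bits */
} channel_layout;

/* Derive shift and width from a contiguous channel mask.
 * A zero mask yields shift 0 and bits 0. */
channel_layout channel_from_mask(uint32_t mask)
{
    channel_layout c = { 0u, 0u };
    if (mask == 0u) {
        return c;
    }
    while ((mask & 1u) == 0u) { mask >>= 1; c.shift++; }
    while ((mask & 1u) != 0u) { mask >>= 1; c.bits++; }
    return c;
}

/* Scale an 8-bit intensity to the channel width and move it into place.
 * Assumption: c.bits is at most 16. Wider channels are filled by
 * replicating the top bits so that 255 maps to the channel maximum. */
uint32_t channel_pack(channel_layout c, uint8_t value)
{
    uint32_t v = value;
    uint32_t wide;
    if (c.bits == 0u) {
        return 0u;
    }
    if (c.bits <= 8u) {
        wide = v >> (8u - c.bits);
    } else {
        wide = (v << (c.bits - 8u)) | (v >> (16u - c.bits));
    }
    return wide << c.shift;
}
```

With the Deep color red mask 0x3FF00000 this yields a shift of 20 and a width of 10 bits, so the same code path can serve both ordinary 8-bit and deep color layouts.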

Re: Smoother rendering of high res modes

Post by Jubatian »

I am giving up on this branch for now.

The problem with modularization right now is not simply how a damn surface gets passed around so that video capturing can be a little bit faster in the most common cases with avconv. As things stand now, the renderer would be commanded by the Uzebox emulation (emulator => renderer calls), which is a big limitation. The renderer governs the user interface, and for decent debug support it would be nice if it (or the complex of modules running it) were able to query parts of the emulator, such as the state of the RAM, so that it could display RAM usage (like I have seen somewhere here). This is not really feasible with this approach, and using it this way will likely bite back later.

In my own emulator project I got around this problem by initializing the emulator core with a set of callbacks, which it could use to ask the host to do things, such as rendering a line. It is a plain C project. In OO, callbacks are not that simple (you can't pass a non-static class method as a plain function pointer), and I would need to research the proper design pattern for this case (which I wouldn't like to do due to my net access constraints; I am also not that interested in OO languages, only in using OO as a design pattern within C, which I am pretty familiar with).
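A minimal sketch of the callback approach described above, in plain C. Every name here (`host_callbacks`, `emu_core`, `core_init`, `core_run_line`, `core_ram_used`, `example_count_line`) is hypothetical and is not taken from Jubatian's emulator or from Uzem; the "emulation" is a toy that just copies RAM bytes into a line, and counting non-zero bytes is only a stand-in for a real RAM usage measure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CORE_RAM_SIZE   4096u /* the ATmega644 has 4 KiB of SRAM */
#define CORE_LINE_WIDTH 16u   /* toy value, not the real line length */

/* Services the host offers to the core. */
typedef struct {
    void *user; /* opaque host state, handed back on every call */
    void (*render_line)(void *user, unsigned line,
                        const uint8_t *pixels, size_t count);
} host_callbacks;

typedef struct {
    host_callbacks host;
    uint8_t ram[CORE_RAM_SIZE];
    unsigned line;
} emu_core;

/* The core only learns about the host through the callback set. */
void core_init(emu_core *core, const host_callbacks *host)
{
    memset(core, 0, sizeof *core);
    core->host = *host;
}

/* Toy stand-in for emulating one scanline: produce pixels, then ask
 * the host to render them through the callback. */
void core_run_line(emu_core *core)
{
    uint8_t pixels[CORE_LINE_WIDTH];
    size_t i;
    for (i = 0; i < CORE_LINE_WIDTH; i++) {
        pixels[i] = core->ram[(core->line * CORE_LINE_WIDTH + i) % CORE_RAM_SIZE];
    }
    if (core->host.render_line != NULL) {
        core->host.render_line(core->host.user, core->line, pixels, CORE_LINE_WIDTH);
    }
    core->line++;
}

/* Query direction: the host (for example a debug UI) asks the core
 * about its state. Here: the number of non-zero RAM bytes. */
size_t core_ram_used(const emu_core *core)
{
    size_t i, used = 0;
    for (i = 0; i < CORE_RAM_SIZE; i++) {
        if (core->ram[i] != 0u) {
            used++;
        }
    }
    return used;
}

/* Example host callback: counts rendered lines in an unsigned
 * counter passed through the user pointer. */
void example_count_line(void *user, unsigned line,
                        const uint8_t *pixels, size_t count)
{
    (void)line; (void)pixels; (void)count;
    (*(unsigned *)user)++;
}
```

The point of the design is that calls can go both ways without the core knowing anything about the renderer: the core reaches the host only through function pointers, while the host is free to call query functions such as `core_ram_used`.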

I think the emulator core should rather be cleaned up as well as possible first.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Re: Smoother rendering of high res modes

Post by Artcfox »

Sorry, I was just trying to contribute what I thought was a modular and generic approach to video recording to your branch. I thought it was pretty neat that it kept your special CRT effects.

So are you thinking a complete rewrite in C?

Uzem doesn't really use that many C++ constructs, it feels more like a C program, but with the file extensions ending in ".cpp" so I wouldn't really be opposed to one written purely in C.

One thing that I have been researching is using GLUT (freeglut) instead of SDL, because that would give us direct access to OpenGL calls (specifically custom shader programs) which would allow us to emulate CRT effects like this.

An additional benefit is that we wouldn't have to worry about pixel formats at all, because the custom shader program can use two textures, one being the raw pixel indices, and the other being a 1D texture that specifies the palette. Emscripten supports cross-platform programs that use GLUT, so the web build should still work.

The other thing I looked into for modularity is the simavr project. Apparently that is what another AVR-based gaming console (Gamebuino) uses as a base for its emulator, and simavr is written using OO constructs in pure C. Someone wrote it as their thesis project, and it has a great PDF manual/white paper that explains in great detail how the entire simulator works. Apparently it is also very easy to create and attach new virtual hardware modules to it. I'm not sure about the speed (the Gamebuino claims it runs at 10x the speed of their actual hardware), but it looks modular enough that we could even undefine the parts of the microcontroller that aren't necessary to run the Uzebox kernel to make it run even faster if necessary.

We might be able to borrow some ideas from that project for a more modular Uzem core, or go in the complete opposite direction and port/rewrite our display, sound, SD card emulation over to new virtual modules for simavr.

Edit: Someone already wrote an SD card module for simavr that we could use as a base (it doesn't support streaming reads).

Re: Smoother rendering of high res modes

Post by Jubatian »

No problem. Interface design is always messy and prone to sparking heated debates.

Yes, I have an idea for later: ripping the innards out for a plain C port. I am quite familiar with the language and with how to solve things nice and proper in it (it is also my profession). What I have in mind for later is creating a very light Uzebox emulator, not to compete with this one, but to serve a very specific purpose: just gaming, particularly with Emscripten, to get a nice fast emulator with which you could publish your games for everyone.

A major thing on my mind is a bit of "butchering" in this regard: implementing the common graphics modes in native C code instead of running AVR emulation, so a huge chunk of instructions can be skipped in games using those modes. This would be very useful for Emscripten usage, to demonstrate specific games. Another use case would be compiling this emulator with a game embedded, so you could also distribute native binaries, hopefully broadening the audience by involving people who don't necessarily have an interest in tinkering with hardware, just in retro-gaming.

Of course this is just daydreaming for now. I am pretty sure I could go down this road, but right now I am more interested in tinkering with the Uzebox itself. I have buried myself deep in assembler, hopefully in the process of creating a shiny new graphics mode.

For performance I doubt much more is possible unless you JIT compile the thing. Uzem's code may be a mess in parts, but the AVR emulation dominates its cycle budget, and I already understand well all the parts of it which matter. My box spends some 35 cycles on an AVR instruction. The AVR instruction table is simply built so that you need to go through 2 branches to emulate one instruction, and that's two pipeline flushes. 28MHz with most instructions taking 1 cycle is steep. Then it is also necessary to advance other parts of the hardware with proper timing.

I checked simavr, the core of it is this file:
https://github.com/buserror/simavr/blob ... sim_core.c
If you stroll through it, it will look almost suspiciously familiar. And it has deeper switches than our emulator, including ones which cannot be unfolded into jump tables. So this part should definitely perform worse. The core's performance also depends on the game of course, notably on how the video mode of the game is implemented. If it always runs through that maze on different routes, then the poor CPU will be flushing away its pipeline like there's no tomorrow while emulating, while if it uses some quite repetitive instruction pattern, the branch predictor might remember it and allow running it much faster.

The 10x faster figure is something I simply won't believe unless they have a monster of a CPU to run it, and/or the AVR code itself is laid out so that the branch predictor can memorize the paths through it. It would mean 160MHz (!) since the Gamebuino is clocked at 16MHz. If I applied it to my box, that would mean it can emulate one AVR instruction in 12-14 CPU cycles, hardware and all! Now cram all the flag logic that an ADD, for example, produces into that, and two pipeline flushes... Or even just one.
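The arithmetic behind these figures can be written out as a small C sketch. The function names are made up for illustration, and the 2000 MHz host clock in the usage note is an assumption, since the post does not state the clock of the machine in question.

```c
/* Emulated clock implied by a claimed speed-up over the real hardware. */
double emulated_mhz(double hardware_mhz, double speedup)
{
    return hardware_mhz * speedup;
}

/* Host CPU cycles available per emulated AVR cycle. Since most AVR
 * instructions take 1 cycle, this is also roughly the budget per
 * emulated instruction. */
double host_cycles_per_avr_cycle(double host_mhz, double target_mhz)
{
    return host_mhz / target_mhz;
}
```

`emulated_mhz(16.0, 10.0)` gives the 160MHz mentioned above. Assuming a host clocked at 2000 MHz, `host_cycles_per_avr_cycle(2000.0, 160.0)` gives 12.5 cycles, which falls inside the 12-14 cycle range quoted in the post and is well below the roughly 35 cycles per instruction reported for Uzem.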

The modularization of the simavr project does of course seem nice, at least from what I got at first glimpse. But it may have more overhead than a gaming-oriented Uzebox emulator would reasonably demand, where it is not necessary to emulate every little component of the AVR.

Anyway, I drifted a bit off topic. I don't think Uzem should be ditched in favor of something else. The Uzebox-oriented core is well worth keeping once the cruft is peeled off of it: it is a good base for producing a speedy gaming-oriented emulator (even if you would rather move on to something more beefy for development and debugging).

Re: Smoother rendering of high res modes

Post by Artcfox »

Emscripten first compiles down into LLVM IR asm, so if we JIT-compiled the AVR assembly into LLVM IR code, and added the bits necessary for interrupts and interfacing with hardware, I can see it running crazy fast, but that seems like a huge undertaking, and I'm not even sure that Emscripten allows one to use hand-generated LLVM IR.

I haven't tried simavr yet, but time-permitting, I would like to try creating some input and output modules for it to see how well it can perform as an Uzebox emulator. Even if it requires a beefy CPU, the fact that it implements everything that the ATmega644 supports would mean that a game that uses features of the chip that aren't available in Uzem might run properly.
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Smoother rendering of high res modes

Post by uze6666 »

You guys have already made impressive improvements to Uzem, but I think you should not go overboard with all the optimizing. At this point it's fast enough to run at full speed on a 10-year-old machine. Uzem was meant just as a tool to help development, not an end in itself. Keep some brain power for the platform itself! ;)

Re: Smoother rendering of high res modes

Post by Artcfox »

uze6666 wrote:You guys have already made impressive improvements to Uzem, but I think you should not go overboard with all the optimizing. At this point it's fast enough to run at full speed on a 10-year-old machine. Uzem was meant just as a tool to help development, not an end in itself. Keep some brain power for the platform itself! ;)
But the web version doesn't run smoothly on my Chromebook (or cellphone) yet, so the quest will continue! :P

Re: Smoother rendering of high res modes

Post by Artcfox »

Jubatian wrote:https://github.com/Jubatian/uzebox/tree ... tools/uzem

Any opinion on this branch? For me it seems functional so far, nothing broken. Does it seem to achieve its goal? (Cleaning up the cycle perfection related problems, obviously, while correctly implementing the cases experienced so far.)
I tested it out, and was even able to (manually) merge it into the upstream uzem140 branch (with a change to make it work). Using git merge and trying to resolve conflicts gave me a mess, so I used:

git difftool -t meld HEAD upstream/uzem140
and then once that ran, I checked out a clean copy of upstream/uzem140 and copied over the changed files, to end up with a patch that goes the right way (one that applies your change onto the uzem140 branch). Attached is that patch if you want to take a look at it.

Your branch alone, and your branch merged into uzem140 both run a bit slower on my desktop (~5MHz), but if it's more cycle perfect, then that's probably a good tradeoff, as long as it can still run at 100% emulation speed on the slower machines.

I haven't extensively tested my manual merge, so you probably should take a look at it to see if it makes sense.

Edit: I also pushed the result of the manual merge to a branch on my GitHub if that makes it easier to look at the resulting merge.
Attachments
jubatian.patch.zip
patch file to apply your changes on top of the latest upstream uzem140, with the tweak I had to make to change 1440 to 1820 so it draws all the columns
(9.36 KiB)

Re: Smoother rendering of high res modes

Post by Jubatian »

Back to the original subject of this topic.

I was merrily developing my new video mode using a version of the emulator from the linebuffer branch, patched up to output cycle counts for testing and verifying the workings of the graphics mode. The code I used for testing always generated a slow scroll, which goes nice and smooth with my software renderer. Then I "accidentally" started the current Uzem on it. I think the result is abysmal, especially concerning scrolling.

I noticed that the current branch also simply discards the pixel value on the port on every second cycle, then further mangles the result by achieving the proper aspect ratio with linear interpolation (which can result in completely discarding up to 3 consecutive cycles of the output, visible in the Mode 9 demo as missing columns).

Something should be done about this... As of now this "emulation" might only faithfully reproduce the output of some sloppy low resolution LCD mini television. I can fix up my software renderer so that it properly supports all possible 32-bit color modes (including deep color). It is rather trivial, I just didn't do it because the branch was dropped. It doesn't need anything from that branch, it is just a stand-alone function processing the line buffer into a surface with the correct aspect ratio (scanline emulation is not a necessity, it can work well with 224 lines to leave more for the hardware; the main benefit of the renderer is actually how it somewhat reproduces the behavior of a relatively slow response electron beam, fuzzing pixels together horizontally while correctly retaining every cycle of output).
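One way to shrink a line while retaining every cycle of output is area averaging (a box filter), sketched below in plain C for a single 8-bit channel. This is only an illustration of the idea under stated assumptions, not the software renderer discussed in this post; the name `line_downscale_box` and the one-channel buffer layout are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Resample src[0..src_len) into dst[0..dst_len) by area averaging.
 * Every source sample contributes to the output in proportion to how
 * much of it overlaps the output pixel, so no sample is dropped. */
void line_downscale_box(const uint8_t *src, size_t src_len,
                        uint8_t *dst, size_t dst_len)
{
    size_t x;
    if (src_len == 0 || dst_len == 0) {
        return;
    }
    for (x = 0; x < dst_len; x++) {
        /* Work on a common grid: a source sample spans dst_len units,
         * an output pixel spans src_len units. */
        uint64_t pos = (uint64_t)x * src_len;
        uint64_t end = pos + src_len;
        uint64_t acc = 0;
        while (pos < end) {
            size_t s = (size_t)(pos / dst_len);          /* source index */
            uint64_t next = (uint64_t)(s + 1) * dst_len; /* end of sample s */
            if (next > end) {
                next = end;
            }
            acc += (uint64_t)src[s] * (next - pos);      /* weight = overlap */
            pos = next;
        }
        /* The weights sum to src_len; divide with rounding. */
        dst[x] = (uint8_t)((acc + src_len / 2) / src_len);
    }
}
```

For an alternating line such as {0, 255, 0, 255} shrunk to two pixels, discarding every second sample would produce {0, 0}, while this sketch produces {128, 128}: the detail cannot be shown at that width, but its energy is kept instead of vanishing.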

Re: Smoother rendering of high res modes

Post by Artcfox »

Once all of the latest speed improvements have been integrated I was going to re-test the speed of using the full 1440 pixels and then scaling that 1440x224 surface down (or up) to whatever the window size is. In the case where it gets scaled up (say, because you clicked on the maximize button of the Uzem window) nearest neighbor does a great job and slow smooth scrolling looks great. It's pretty easy to change the existing version to use linear rather than nearest, which is what my original implementation had, but Uze requested that we use nearest neighbor instead. It wouldn't be hard to add a command line argument that allows one to switch between nearest and linear, and that would apply to both the hardware and software renderer. (That wouldn't give you the fancy CRT effects that you like, but it should display the higher resolution modes better.)
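For reference, the difference between the two scaling modes can be sketched on a single line of 8-bit samples in plain C. The names `scale_mode` and `line_sample` are hypothetical and this is not how Uzem or SDL implement scaling; it only shows what each mode reads from the source.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SCALE_NEAREST, SCALE_LINEAR } scale_mode;

/* Value of output pixel x when a line of src_len samples is stretched
 * or shrunk to dst_len pixels. Assumes src_len >= 1, dst_len >= 1 and
 * x < dst_len. */
uint8_t line_sample(const uint8_t *src, size_t src_len,
                    size_t x, size_t dst_len, scale_mode mode)
{
    /* Map the center of the output pixel into source coordinates. */
    double pos = ((double)x + 0.5) * (double)src_len / (double)dst_len - 0.5;
    size_t i0, i1;
    double f;

    if (pos < 0.0) {
        pos = 0.0;
    }
    if (pos > (double)(src_len - 1)) {
        pos = (double)(src_len - 1);
    }
    if (mode == SCALE_NEAREST) {
        return src[(size_t)(pos + 0.5)]; /* closest source sample */
    }
    i0 = (size_t)pos;                    /* left neighbor */
    i1 = (i0 + 1 < src_len) ? i0 + 1 : i0;
    f = pos - (double)i0;                /* blend factor toward i1 */
    return (uint8_t)((double)src[i0] * (1.0 - f) + (double)src[i1] * f + 0.5);
}
```

Stretching {0, 100} to four pixels gives 0, 0, 100, 100 with nearest and 0, 25, 75, 100 with linear. Note that in this sketch both modes read at most two source samples per output pixel, so when shrinking by a large factor either one can still skip columns.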