SFML vs SDL

Post by **D3thAdd3r** » Fri Sep 09, 2016 2:09 am

Does anyone have experience with SFML? I have seen several benchmarks on their site, and I am always a bit skeptical about arbitrary benchmarks, but the speed differences in those cases seem amazing. Even if they were on average 50% realistic indicators of real world performance, they might increase Uzem speed by a considerable percentage.

If someone knew that it wouldn't be too difficult to convert over to that, it might be worth looking into. If no one else does it, I might even give it a try, as a large speed increase would very much help my current developments, where system requirements are at the highest end of machines. For those that don't know, it is cross-platform (Windows, Linux, OS X), and perhaps mobile and more in the future. It has C and Java bindings, among others.

The downside, as far as I can see, is that it would take extra code to work with Emscripten, and/or it cannot be done until SFML supports OpenGL ES.

Post by **Jubatian** » Fri Sep 09, 2016 5:11 am

It won't help performance at all. Uzem's bottleneck is the AVR emulation, not anything about SDL. I don't know the exact percentages there, but in my own emulator development the split is around 95% to 5%, with the AVR emulation taking the larger share. I won't use it myself since it is (primarily) C++, and I only code in that language when it's absolutely necessary (the C family has many quirks, and I don't feel confident enough with C++ to be sure that I can properly handle or avoid them).

SDL's performance could possibly be relevant only on old machines where, in my experience, imperfect video drivers (particularly on Linux) could cripple things. But SFML is newer, so it is even less likely to support those, and I mean "old machines" by my Hungarian standards (with me using a nine-year-old Core 2 Duo laptop which I don't yet consider old enough to qualify as "old", as in "must be replaced since it is sooo outdated and slow"). So I think SDL does its job as well as is reasonably possible.

(With heavy hardware accelerated rendering there might be things to consider, but they are irrelevant for Uzebox emulation)

Post by **D3thAdd3r** » Fri Sep 09, 2016 8:07 am

Primarily I am thinking about bottom-of-the-line hardware (~1.2 GHz), where Uzem 2.0 seems to run full speed at 1.6 GHz (AMD even). More honestly, I am very curious what core improvements can be made to speed up ESP8266 emulation.

About the ratio, I did not know how extreme it was. If your estimate there is on the order of 19:1, where there could only possibly be gains of < 1-2% and maybe 0%, then I would agree that it is entirely a waste of time versus other things. That curiosity is settled then: it is all about the opcode/hardware update loop. Things are tied together very tightly, so multithreading is useless.
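
To put rough numbers on that ratio, here is a minimal sketch using Amdahl's law. The function name and the example speedup factors are assumptions for illustration, not measurements from Uzem; only the 95%/5% split comes from the estimate above.

```c
/* Amdahl's law: overall speedup when only a fraction `part` of the total
 * run time (0.0 .. 1.0) is made `factor` times faster.
 * `overall_speedup` is a hypothetical helper name, not part of Uzem. */
double overall_speedup(double part, double factor)
{
    return 1.0 / ((1.0 - part) + part / factor);
}

/* Upper limit for the same fraction if that part cost nothing at all. */
double speedup_limit(double part)
{
    return 1.0 / (1.0 - part);
}
```

With `part = 0.05` (the non-AVR share), doubling the speed of the video/audio library gives `overall_speedup(0.05, 2.0)`, which is about 1.026, and even a library with zero cost tops out at `speedup_limit(0.05)`, about 1.053.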

Is there some radical thing to be done if complexity and clean code were not a consideration? I would assume the compiler is turning the switch() into a jump table already. I have seen some Chip8 emulators use an array of function pointers, where the instruction just calls the correct function, but then you get function call overhead. Recompilation seems difficult and perhaps error prone... or could it work? I saw a discussion somewhere about using goto on a table of precomputed labels. It's just a curiosity; I don't imagine those ideas would do anything or that anyone would actually do them. I have not noticed any way to increase Uzem speed myself. The only thing I noticed was that the SPI handling seems slightly hacked, and it would be slower to emulate it correctly.
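
For reference, a minimal sketch of the first two dispatch styles mentioned above, on a made-up three-opcode machine. Everything here (`toy_cpu`, the opcodes, `run_switch`, `run_table`) is hypothetical and has nothing to do with the real AVR decoder; it only shows the shape of each approach.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy machine: one accumulator, three opcodes. */
enum { OP_INC = 0, OP_DBL = 1, OP_DEC = 2, OP_COUNT = 3 };

typedef struct { uint32_t acc; } toy_cpu;

/* Style 1: one switch per instruction; compilers usually lower a dense
 * switch like this to a jump table. */
uint32_t run_switch(const uint8_t *code, size_t len)
{
    toy_cpu cpu = { 0 };
    for (size_t i = 0; i < len; i++) {
        switch (code[i]) {
        case OP_INC: cpu.acc += 1; break;
        case OP_DBL: cpu.acc *= 2; break;
        case OP_DEC: cpu.acc -= 1; break;
        default: break; /* unknown opcode: ignored in this sketch */
        }
    }
    return cpu.acc;
}

/* Style 2: table of function pointers indexed by opcode. */
static void op_inc(toy_cpu *c) { c->acc += 1; }
static void op_dbl(toy_cpu *c) { c->acc *= 2; }
static void op_dec(toy_cpu *c) { c->acc -= 1; }

static void (*const op_table[OP_COUNT])(toy_cpu *) = {
    op_inc, op_dbl, op_dec
};

uint32_t run_table(const uint8_t *code, size_t len)
{
    toy_cpu cpu = { 0 };
    for (size_t i = 0; i < len; i++) {
        if (code[i] < OP_COUNT) {
            op_table[code[i]](&cpu); /* indirect call per instruction */
        }
    }
    return cpu.acc;
}
```

Both loops end in one indirect branch per emulated instruction. The "goto on a table of precomputed labels" idea relies on the GCC/Clang "labels as values" extension, which is not standard C, so it is left out of this sketch.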

Post by **Jubatian** » Fri Sep 09, 2016 10:50 am

You may look in my new emulator implementation (here). For other stuff (RRPGE) I also used function pointer table implementations for the instruction decoder; here I stuck to a big switch-case, exploiting it with tail merging to minimize code size at low cost (forward "goto"s). Keeping the code small helps it fit in the caches more easily. In the case of the AVR, due to the ridiculous amount of code required to handle the flags, I think this is a more optimal implementation than a function pointer table. The cost of switch versus function table is about the same: the most substantial component is one branch misprediction (which is very costly).
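
One possible reading of "tail merging with forward gotos", as a minimal sketch: several cases compute a result and then jump forward to a single shared piece of flag code, so that code exists only once in the decoder. The state layout, opcode numbers and flag bits below are invented for illustration and are not taken from the emulator's source.

```c
#include <stdint.h>

/* Hypothetical flag bits and state; the real AVR has more flags. */
enum { FLAG_Z = 1, FLAG_N = 2 };

typedef struct {
    uint8_t r;     /* single 8-bit register */
    uint8_t flags; /* Z and N only in this sketch */
} toy_state;

/* op: 0 = ADD, 1 = SUB, 2 = LOAD (LOAD leaves the flags untouched). */
void toy_exec(toy_state *s, int op, uint8_t operand)
{
    uint8_t res;

    switch (op) {
    case 0: /* ADD */
        res = (uint8_t)(s->r + operand);
        goto set_zn; /* forward jump into the shared tail */
    case 1: /* SUB */
        res = (uint8_t)(s->r - operand);
        goto set_zn;
    case 2: /* LOAD */
        s->r = operand;
        return;
    default:
        return;
    }

set_zn: /* shared tail: write back the result and update Z and N once */
    s->r = res;
    s->flags = (uint8_t)((res == 0 ? FLAG_Z : 0) |
                         ((res & 0x80) ? FLAG_N : 0));
}
```

The trade is an extra unconditional jump in exchange for not duplicating the flag code in every case.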

My emulator core on my 2GHz core can do over 100MHz, which means it completes an AVR instruction (and associated hardware emulation tasks) in 20 clock cycles on average. Considering that every AVR instruction is most likely to be mispredicted by the branch predictor (it is just too chaotic), this is quite remarkable. But the current Uzem could also peak at 70-80MHz on the same machine.

I had an idea for doing a kind of post-processing to remove flag calculations where they were certainly unnecessary. But I don't see it too useful as the bulk of any reasonable Uzebox game is video generation, which is load and out heavy (if it has any flag generator, those are usually used especially to act on the resulting flags), and I also improved the flag calculations using different algorithms. And such an effort could also backfire by increasing the size of the instruction decoder (needing the no flag variant of several instructions).

I don't think butchering the code any more than what I cooked up (in my emulator) could help in any substantial manner to get better performance. Maintainability is also important, since if it is absolutely "write only code", then someone coming along a year later will shoehorn in some hideous kludge, then another, crippling performance in the end.

By the way, what is so steep on the ESP8266? I would think that the most common use case was simply networking, sending and receiving packets by some variant of the AT protocol, possibly at most driving the UART at 115Kbaud (which fits with the current method of polling it on every scanline, since you have 15720 such lines in a second, and 115Kbaud generates at most ~11K of data per second), but that would require ridiculous buffer sizes making it impractical (256 bytes receive, 256 bytes transmit), of course unless you also use the SPI RAM for your application logic otherwise (so you can spare that hefty 512 bytes). I can't see why it would be anything too costly; it is just a (special) modem to emulate, interfacing with plain sockets on the host's end.
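
The UART arithmetic above can be checked with a short sketch. Assumptions not stated in the post: "115Kbaud" is taken as 115200 baud, framing is 8N1 (10 bits on the wire per byte), and a frame has 262 scanlines (which gives 60 frames per second at 15720 lines per second). The function names are made up.

```c
/* Assumed constants; only SCANLINES_PER_SEC is a figure from the post. */
#define UART_BAUD            115200u
#define WIRE_BITS_PER_BYTE   10u     /* assumed 8N1: start + 8 data + stop */
#define SCANLINES_PER_SEC    15720u
#define SCANLINES_PER_FRAME  262u    /* assumed NTSC-style frame */

/* Peak payload rate of the UART in bytes per second. */
unsigned uart_bytes_per_second(void)
{
    return UART_BAUD / WIRE_BITS_PER_BYTE;
}

/* Average bytes arriving between two per-scanline polls. */
double uart_bytes_per_scanline(void)
{
    return (double)uart_bytes_per_second() / (double)SCANLINES_PER_SEC;
}

/* Peak bytes arriving during one video frame. */
unsigned uart_bytes_per_frame(void)
{
    unsigned frames_per_sec = SCANLINES_PER_SEC / SCANLINES_PER_FRAME;
    return uart_bytes_per_second() / frames_per_sec;
}
```

Under these assumptions the UART delivers 11520 bytes per second, less than one byte per scanline, so polling once per scanline keeps up, and at most 192 bytes per frame, which is consistent with the 256-byte buffers mentioned.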

Post by **Jubatian** » Sun Sep 11, 2016 8:49 am

You might look at my emulator now to check how it behaves in terms of performance.

I added GUI controls to it, so you may turn off its frame rate limiter to see how far it can go (the documentation is also updated; see the README on its GitHub front page).

In the current state it is a fair comparison with Uzem: everything Uzem has in the AVR core is implemented (while it also tracks memory accesses). The SD card is not there, but the SPI peripheral is processed, so with games not accessing the SD card, it is the same. On my machine I get around 20% improvement compared to the best results I ever got with Uzem. You might check how it works on those old machines you mentioned.

Post by **D3thAdd3r** » Wed Sep 14, 2016 6:02 am

I think I might look into changing some core details, and see what speed gains can be had by simply clocking the module based on HSYNC. I don't expect single-threaded would be fast enough still, but there might be some ways to optimize it. Unfortunately there are several timers and states that need to be updated/checked all the time, so there is always quite a bit going on under the hood that wouldn't appear obvious at first. I believe there are about as many lines of code just for the ESP8266 as there are for the entire normal Uzem.

I will try to compile your CUzebox. If you implement SPI RAM and SD support, then it might be best for my goals to jump aboard for the speed gains and functionality. There are network things I would like to try that simply can't work without more RAM as well.
