I've yet to delve into the world of writing networking code, so I don't really have anything interesting to add at this stage, but I'm interested to know what language, libraries etc. you're likely to choose to write the server software. People seem to like using Go for stuff like this these days, but don't take that as a recommendation; I wouldn't know what to recommend. Speaking of Go, there is also TinyGo:
https://tinygo.org/
I think we can safely assume that whatever you have access to isn't even going to notice it's running once it is up and going, until UZENET dwarfs Steam and the Play Store etc., that is!
MegaBomber
Re: MegaBomber
For me personally, I like C and raw Berkeley sockets. These have worked great for decades, and are easy. There are libraries/dependencies to "simplify" this with abstractions, but I never found one too useful. SDL might be useful and is already required for CUzeBox, but even then, for these little details I'd rather it be able to compile using straight Linux default stuff. I'd support Windows later if there was demand. Winsock is based on Berkeley sockets, and only a few OS specifics are needed.
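As a rough illustration of how little code the raw sockets route needs, here is a minimal sketch of opening a TCP listening socket with nothing but the standard Linux headers. The function name `uzenet_listen` and its parameters are made up for this example; it is not taken from any existing UZENET server code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listening socket on the given port.
 * Returns the socket descriptor, or -1 on failure. */
int uzenet_listen(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Allow quick restarts without waiting for old sockets to time out. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) < 0)
        goto fail;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* listen on all interfaces */
    addr.sin_port = htons(port);              /* port 0 lets the OS pick one */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto fail;
    if (listen(fd, backlog) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

A server loop would then call `accept()` on the returned descriptor. A Winsock port would mostly differ in startup/cleanup calls and in using `closesocket()` instead of `close()`.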
Yes, the load on the server would be quite low as it's a glorified chatroom with useful Uzebox networking commands. Additional services I imagine as bots that connect to the server, but are potentially hosted somewhere else. I have just a humble spare rackmount: 8? core 2.6GHz with 32GB RAM, SSD, and 10Gbps SFP+, which is overkill but doing nothing now.
Basically I'll run what I have, and if some other server standard exists we can see what to do then to merge efforts.
Re: MegaBomber
"For me personally, I like C and raw Berkeley sockets."
Sockets do indeed seem like a great abstraction layer. Love the idea.
"Yes the load of the server would be quite low as it's a glorified chatroom, with useful Uzebox networking commands. Additional services I imagine as bots that connect to the server, but are hosted somewhere else potentially."
A subject for another thread, but the whole uzebox.org domain is hosted at GoDaddy, and on my plan I seemingly have access to develop database-backed apps in PHP, Perl, Python and Ruby too. I'm not too fond of PHP and Perl, and I've never tried Python or Ruby. For low-bandwidth / high-latency games we could develop some sort of hub on there at no extra cost. At least with Python, I think you guys could also develop stuff that I could then upload/install on my hosting provider with relative ease.
Re: MegaBomber
That would be cool for a high score table that integrates into the website!
I was screwing around with completely write-only ESP8266 code for Solitaire. Basically, if an ESP8266 is listening and was set up to autoconnect somewhere, it just sent out a new world record during the high score screen. If it's not there, nothing happens, oh well. The resources for a 4800 baud buffer and a tiny code segment are modest enough to put into most existing games like Donkey Kong, etc.
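To put a number on how modest a 4800 baud link is, here is a small worked calculation. It assumes 8N1 framing (10 bits on the wire per byte) and a 60Hz frame rate; the function names are invented for the example.

```c
#include <stdint.h>

/* Whole bytes the UART can move in one video frame.
 * bitsPerByte is 10 for 8N1 framing (start + 8 data + stop). */
uint16_t uart_bytes_per_frame(uint32_t baud, uint8_t bitsPerByte,
                              uint8_t framesPerSec)
{
    return (uint16_t)(baud / ((uint32_t)bitsPerByte * framesPerSec));
}

/* Frames needed to drain a message of msgLen bytes, rounded up. */
uint16_t frames_to_send(uint16_t msgLen, uint16_t bytesPerFrame)
{
    return (uint16_t)((msgLen + bytesPerFrame - 1) / bytesPerFrame);
}
```

Under those assumptions the link carries 8 bytes per frame, so a hypothetical 20-byte score message would be gone within 3 frames of the high score screen.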
Re: MegaBomber
Added a lot to the demo, went nuts and made a dialogue script which follows along with song changes, effects, and probably some stuff no one would expect. Very silly random stuff; you'll find it humorous or think I'm an idiot. Either way, it's probably the most engaging original content I've ever made. It should make use of just about every Track and SFX. Trying to add more cameo appearances to justify the 50MB music file and fill out flash more. Flash space is wide open, 30K
There is never really an issue mixing music in time and the SD buffer could be way smaller just for that. I'm working on the demo demonstrating filling the buffer, then using the SD to drop graphics for a while, then fill buffer back up. So the script follows the implemented and planned events/effects and I'm still running it all at 60Hz. But it wouldn't work for a game even though the demo load at this point is more than a lot of games. It's brittle and the stack can fall out when changing songs.
Mixing always happens in time but sector reading always gets caught in the middle. No way around it, I tried about every arrangement of things. It reads SD as fast as possible at 11K cycles a sector. You can't use SPI RAM in the middle or stop the transfer, because partial sector reads as a strategy are not sustainable and have no usable benefit. I tried weird arrangements but it's no use, so all reads are sector aligned now (except the loop, but the last sector read has prestored data from the loop beginning to help). The RAM gained from the stock mixer works out to be less than half the sector buffer. Have a crazy idea on that.
PCM mixer has lots of free cycles, and I'm starting to unravel sdBase.S into HSYNC SD actions. By my calculations, it may be possible without losing any user time over VSYNC mixer, to read 2-3 bytes per scanline during screen rendering. The idea is that user code runs, uses SPI RAM as needed and cleans up, then the CustomWaitVsync() will consume remaining time to mix the buffers and determine where and what needs buffering for the song.
The change is that instead of attempting to get a sector read done before the flag (which will always get interrupted, though that technically works so far), it just queues the sector read and waits. The bulk of the work will get done while the video mode is outputting lines. This would basically be a 75% decrease in cycle load over the existing approach, since those cycles are currently wasted in PCMMixer.s.
The HSYNC code, being so open in this mixer, will read bytes from SD to wherever it was instructed to write them. Even with UART I think I can get 3 bytes in per line and be compatible with all video modes. This would allow a full sector read to complete, even if a game had a screen only 22 tiles high (like the demo). If the end of transmission can also happen in HSYNC after that, it yields some more cycles back.
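The per-line numbers can be checked with a quick calculation. This sketch assumes a 512-byte sector, 8-pixel-high tiles, and ignores any command or CRC overhead around the data; the function names are made up for the example.

```c
#include <stdint.h>

#define SD_SECTOR_BYTES 512u

/* Scanlines needed to pull one sector at bytesPerLine bytes per HSYNC,
 * rounded up. */
uint16_t lines_for_sector(uint16_t bytesPerLine)
{
    return (uint16_t)((SD_SECTOR_BYTES + bytesPerLine - 1u) / bytesPerLine);
}

/* Nonzero if a full sector fits within the rendered scanlines of a screen
 * that is tilesHigh tiles tall. */
int sector_fits(uint16_t tilesHigh, uint16_t tileHeight, uint16_t bytesPerLine)
{
    return lines_for_sector(bytesPerLine) <= (uint16_t)(tilesHigh * tileHeight);
}
```

At 3 bytes per line a sector needs 171 lines, which fits inside the 176 rendered lines of a 22-tile screen. At 2 bytes per line it needs 256 lines and would not fit, which is why the third byte matters so much here.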
Mixing music and SFX (in that order) can't happen in HSYNC, but by intertwining the signed calculations of samples with the delays on SPI RAM data, the asm version of the mixer (chopping up spiram.s, basically) could probably save a chunk of cycles as well. User code could still use SPI RAM directly, and also access the SD at will for multiple seconds through the HSYNC sector read interface. All theory, but the cycles add up as I'm looking through the code so far.
Huge speculation, but the mix_buf itself, which represents the main downside at 524 bytes, can actually get dual use in this scheme, since SD access is controlled during this time. Here, just use the buffer space for already output sound samples. It just might work; if it did, it would mean a game would need to immediately take that data to SPI RAM. Except... there are still scanlines left after the sector at 3 bytes. If sector reading is done, it could start freeing up bytes that are just about needed again as it loops back around.
With a call before user code to finish up the remaining SPI RAM fill, the user would not need to consider anything. It couldn't miss (extra disastrous if it did), since both SD and SPI RAM control are complete before the user can do anything. 3 bytes a scanline would be crucial for all this, but if it were so, I feel those are realistic if speculative implications. Nothing really tricky in asm either since it's spelled out in existing code, so I'm attacking the asm part with the assumption it will work at the end and pay off. This best case (with ~120 recovered stock player bytes) would mean a PCM game would basically lose ~6 ram_tiles[] over the inline mixer; less than the vsync mixer.
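One way to arrive at the ~6 ram_tiles[] figure, as a sketch: take the 524-byte mix_buf, subtract the ~120 recovered bytes, and divide by the size of a RAM tile. The 64-byte tile size (8x8 pixels at one byte per pixel) is an assumption here, and the function name is made up.

```c
#include <stdint.h>

#define RAM_TILE_BYTES 64u /* assumption: 8x8 pixels, one byte per pixel */

/* Net RAM cost expressed in tenths of a RAM tile, so 63 means 6.3 tiles. */
uint16_t ram_tiles_lost_x10(uint16_t bufferBytes, uint16_t recoveredBytes)
{
    uint16_t net = (bufferBytes > recoveredBytes)
                       ? (uint16_t)(bufferBytes - recoveredBytes)
                       : 0u;
    return (uint16_t)((net * 10u) / RAM_TILE_BYTES);
}
```

With 524 bytes of buffer and 120 recovered, this gives 6.3 tiles. If a separate 512-byte sector buffer had to stay as well (1036 bytes total), the same calculation gives 14.3 tiles, which is the loss the multiplexing idea is trying to avoid.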
Basically the "use half mix_buf and race it" idea turned into "use full mix_buf, win an easy race against it, but eliminate a 512 byte SD buffer by multiplexing it during the race". Last idea is to have an arbitrary pointer replace mix_buf[], so the user might in very limited cases, multiplex the RAM even further to defeat more of the loss. Since the audio buffer to be consumed during user time is totally predictable after vsync, this would mean there is a temporary safe spot to unpack game state variables, operate on them, and pack them back into SPI RAM. This in turn might allow a game to recover back yet more ram_tiles[]. Obviously never miss a frame in this crazy scenario! Sounds like I'm blowing smoke, but I don't see any logical inconsistency with limits I can see in the hmixer, SPI RAM, or SD asm that is known working.

Re: MegaBomber
Found the catch. Yeah I write it out to sanity check it sometimes, or just to see a compilation of theory. A lot of times it's also documentation for when I forget something I used to know 
The concept should be valid to save RAM, but the necessity to achieve a sector per frame means the mix_buf would need to become larger to fill both roles. It would overwrite the sound otherwise...you know, details details.
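The overwrite can be illustrated with a toy model of the race. The assumptions are mine, not measured from the kernel: one 8-bit sample is played per scanline, the already-played half of the 524-byte mix_buf (262 bytes) is free when the sector read starts, and the SD writer lands 3 bytes per line behind the play position.

```c
#include <stdint.h>

/* Returns the 1-based scanline on which the SD writer would first overwrite
 * a sample that has not been played yet, or -1 if the whole sector fits. */
int first_collision_line(int freeAtStart, int bytesPerLine, int sectorBytes)
{
    int written = 0;

    if (bytesPerLine <= 0 || sectorBytes <= 0)
        return -1; /* nothing to write */

    for (int line = 1;; line++) {
        int freed = freeAtStart + line; /* one more sample played per line */

        written += bytesPerLine;
        if (written > sectorBytes)
            written = sectorBytes;      /* last line may be a partial write */
        if (written > freed)
            return line;                /* writer caught the play position */
        if (written == sectorBytes)
            return -1;                  /* sector complete, no collision */
    }
}
```

In this model, starting with 262 free bytes, the writer catches the play position on line 132, well short of the 171 lines a full sector needs. Roughly 341 bytes would have to be free up front for a whole sector to land safely, which is consistent with the buffer needing to grow.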
Still, instead of a 100% gain, it would then be maybe almost ~50%. That does end up costing fewer ram_tiles[] than the VSYNC mixer in reality. If it were only RAM considerations, one could just eat the buffer loss up front, and offload VRAM to SPI RAM to mitigate the buffer. Not being able to swap back and forth with SPI RAM mid-sector pretty much ends any other idea at the drawing board. There's really no fat to trim on the kernel either, once you exhaust all the existing compile switches.
Last ditch is to multiplex over mix_buf until you can't, then change the pointer to VRAM. But that's a tough cycle trade to force a tile redraw every frame. Or do a half-size mix_buf, and do a direct cycle->RAM trade by forcing a full VRAM redraw each frame. I almost think that's a fair trade.
A fun idea that fails right at the surface, because there aren't enough cycles, is to pack/unpack VRAM so as not to give it back to the user trashed.

Re: MegaBomber
There's no need to apologise for your detailed mental notes. Most of it goes over my head, but others will understand more of it. It's a valuable resource for your future self and other Uzebox devs, and it's good to know you're making progress.
I solve most of my computing problems by writing them out in detail like this. If I get stuck, I find that posting to a forum or mailing list, or mailing workmates, can speed things up a bit, as I often work it out before anyone answers. It's like tricking myself into thinking about it differently or better.
Last edited by danboid on Sun Mar 19, 2023 8:31 am, edited 1 time in total.
Re: MegaBomber
This sounds great! I think you will beat me to the HSYNC SD code, which I don't mind. I think this will bring a lot of games that were otherwise impossible into the realm of possibility.
I love hearing your stream of consciousness ideas. I wish I had more time to just focus on learning AVR ASM.
Re: MegaBomber
Yes, I agree, it really does change something to state it in peer-reviewable writing.
Maybe, maybe not, hopefully there are correlations to be had either way since I think this will be a powerful concept in general.
I'm looking at how this setup can cooperate with other items, so I will probably leave the last 32K (~2 seconds) for general use alongside the music data. I wrote a section of the demo dialog/character script which requires music playback while accessing a different part of the file for a long animation. 32K is probably enough to preload level data larger than most platformers ever need, or for whatever other uses. I'm just doing a high speed ram_tiles[] Mandelbrot animation to demonstrate a roughly maxed out simultaneous offloading case without dropping music.
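The "32K is about 2 seconds" figure follows from the playback rate. This small sketch assumes 8-bit mono PCM at 15734Hz (one sample per scanline) and takes 32K to mean 32768 bytes; the function name is made up.

```c
#include <stdint.h>

/* Playback time in milliseconds for a block of 8-bit mono PCM data. */
uint32_t pcm_duration_ms(uint32_t bytes, uint32_t sampleRateHz)
{
    return (uint32_t)(((uint64_t)bytes * 1000u) / sampleRateHz);
}
```

Under those assumptions 32768 bytes lasts 2082 ms, so anything parked in that region gets roughly two seconds of music before it is needed again.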
Inline mixer looks like maybe 1 byte per line to me, but someone experienced would probably find a way to get 2. It would be an unexpected selling point, but possibly only PCM mixer is open enough to get the full sector in 1 frame at 3 bytes.
Either way I'd guess the interface I'm thinking about is a reasonably general form. Let the user decide where the buffer is, and I suspect in many cases it will point to VRAM. For that scenario, it also means the scanline to start on needs to be controllable so the first HSYNC wouldn't trash the top left of the display before it's rendered. For advanced uses, an adjustable increment value for the write location in RAM would allow them to "multiplex onto a minefield".
I like Jubatian's approach with the display lists he runs in a few different modes/places for per-scanline control. As a simple approach here, one could chain lists (sequential in flash) to put different portions of sector data into different buffers in rather complex patterns.
Basically 1 pointer to a flash structure like:
Code:
u8 startLine; // scanline to wait until, 255 for skip
u8 readLen;   // bytes to read; saves RAM, need at least 2 entries per sector
u8 *dstBuf;   // target buffer for this sequence
u8 bufInc;    // amount to increment the pointer per byte read
// ...(more entries, or)
// ...(a next entry with startLine of 0 ends the SD read)
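To make the list idea concrete, here is a host-side sketch of walking such a list, with the sector data coming from a plain array instead of the SD card. The struct name, the function `sd_list_run`, and the reading of 255 as "do not wait, continue on the current line" are all assumptions for illustration. On real hardware the list would sit in flash and the copy would happen a few bytes at a time inside the HSYNC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SD_LIST_END    0   /* startLine of 0 terminates the list */
#define SD_LIST_NOWAIT 255 /* assumption: 255 means "do not wait" */

typedef struct {
    uint8_t startLine; /* scanline to wait until */
    uint8_t readLen;   /* bytes this entry consumes from the sector */
    uint8_t *dstBuf;   /* where the bytes go */
    uint8_t bufInc;    /* pointer step per byte written */
} SdReadEntry;

/* Simulates scanlines firstLine..lastLine, moving up to bytesPerLine bytes
 * per line from src to the list's targets. Returns bytes consumed. */
size_t sd_list_run(const SdReadEntry *list, const uint8_t *src, size_t srcLen,
                   unsigned firstLine, unsigned lastLine, unsigned bytesPerLine)
{
    size_t used = 0;
    unsigned line = firstLine;

    assert(list != NULL && src != NULL && bytesPerLine > 0);

    for (const SdReadEntry *e = list; e->startLine != SD_LIST_END; e++) {
        uint8_t *dst = e->dstBuf;
        unsigned left = e->readLen;

        if (e->startLine != SD_LIST_NOWAIT && line < e->startLine)
            line = e->startLine; /* idle until the requested scanline */

        while (left > 0) {
            if (line > lastLine || used >= srcLen)
                return used;     /* out of scanlines or out of data */
            for (unsigned i = 0; i < bytesPerLine && left > 0 && used < srcLen; i++) {
                *dst = src[used++];
                dst += e->bufInc;
                left--;
            }
            line++;
        }
    }
    return used;
}
```

A `bufInc` larger than 1 scatters the bytes with gaps between them, which is the "multiplex onto a minefield" case: the list can thread data through whatever bytes happen to be safe to overwrite.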
For PCM, if you really didn't want to redraw VRAM, were a complete Mad Lad, and spent the time to plan/test, I'm guessing many parts of the kernel could have default values reset after serving this purpose and before they are needed. Otherwise presave VRAM or game variables to SPI RAM. Drastic but flexible.