An RPG on Uzebox

Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: An RPG on Uzebox

Post by Jubatian »

nicksen782 wrote: Fri Jul 20, 2018 2:59 pm These are the blitting functions. Could they be more efficient? Can you see what I am doing here?
Apart from Lee's suggestion, I can't really add anything; it should do the job properly. The main blitting loop should optimize well. One thing, maybe, but only if you really wish to push it: check the assembly. Normally the compiler does a good job with constructs like "ramTile[thisPixel]" in a loop, converting them to indexing by an incrementing pointer with the proper base (instead of calculating the offset every time). Here it could get confused because the ROM source also uses "thisPixel", which may prevent it from synthesizing the most efficient code. In such a case you could rewrite the loop to have two pointers incrementing through the respective source and target tiles. But do this only if you really need the cycles and the assembly output shows that poor optimization indeed happens.
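To illustrate the two loop shapes discussed above, here is a minimal sketch. The original blitting functions are not shown in this thread, so the function names, the 64 byte tile size and the transparency test are all assumptions, and plain arrays stand in for the ROM source (on the AVR a ROM tile would typically be read with pgm_read_byte()).

```c
#include <stdint.h>

#define TILE_PIXELS 64u  /* assumption: 8x8 tile, one byte per pixel */

/* Indexed form: source and target share the index thisPixel. */
static void blit_indexed(uint8_t *ramTile, const uint8_t *romTile,
                         uint8_t transparent)
{
    for (uint8_t thisPixel = 0; thisPixel < TILE_PIXELS; thisPixel++) {
        uint8_t px = romTile[thisPixel];
        if (px != transparent) {
            ramTile[thisPixel] = px;
        }
    }
}

/* Pointer form: two independent pointers walk source and target. */
static void blit_pointers(uint8_t *dst, const uint8_t *src,
                          uint8_t transparent)
{
    const uint8_t *end = src + TILE_PIXELS;
    while (src != end) {
        uint8_t px = *src++;
        if (px != transparent) {
            *dst = px;   /* opaque pixel: overwrite the target */
        }
        dst++;           /* the target advances for every pixel */
    }
}
```

Both functions produce the same result; whether the second one is actually faster can only be judged from the assembly output of the real build.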
nicksen782
Posts: 714
Joined: Wed Feb 01, 2012 8:23 pm
Location: Detroit, United States

Post by nicksen782 »

Checking the assembly makes sense. I can follow bits and pieces at this point.

How would I take a look at the assembly?

The reason that I was doing SpiRamReadInto was to just get 1 byte at a time. I am trying to avoid the streaming music from being interrupted by the draw/blit routines holding onto the bus for too long. The streaming music is a pre-vsync operation.
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Post by Jubatian »

nicksen782 wrote: Sun Jul 22, 2018 5:02 pm Checking the assembly makes sense. I can follow bits and pieces at this point.

How would I take a look at the assembly?

The reason that I was doing SpiRamReadInto was to just get 1 byte at a time. I am trying to avoid the streaming music from being interrupted by the draw/blit routines holding onto the bus for too long. The streaming music is a pre-vsync operation.
For the assembly, you should usually get a .lss file in your build; at least the master repo's Makefiles and most other games' Makefiles contain the appropriate rules to produce one. That's the disassembly listing, which you can browse for such things.

The pre-vsync is called from an interrupt at a fixed number of scanlines after the end of the video frame, which in Mode 3 is the practical vertical blanking period useful for running user code (as the kernel's sprite engine processes sprites in vsync, and if you have a lot of them, you wouldn't have much time after the end of sprite processing before the next frame). No matter how briefly you hold onto the bus, if the two happen to meet, it may have nasty consequences.

I think it would be more robust if you removed the streaming music's processing from there, and rather called it from your main program at the appropriate time (after a WaitVsync(1) to synchronize with the video frame). If your game wasn't losing frames, this would work. If it was losing frames, the music would become slower (as some commands will wait a frame), but that's not the end of the world. That however needs kernel modding (as far as I see, assuming you use Lee's implementation for the streaming music), the same thing I did in FoaD where I was experiencing nasty race conditions. The solution was basically processing music just after WaitVsync (FoaD is somewhat different, but the idea is this).
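The idea of processing music right after WaitVsync(1) could look roughly like the sketch below. WaitVsync() is stubbed here only so the sketch compiles off-target (on Uzebox it comes from the kernel), and process_streaming_music() and update_game() are hypothetical names, not functions from the kernel or from Lee's streaming music code.

```c
#include <stdint.h>

/* Stand-ins so the sketch compiles off-target. */
static uint16_t frames_waited;
static uint16_t music_ticks;
static uint16_t game_updates;

static void WaitVsync(int count)          { frames_waited += (uint16_t)count; }
static void process_streaming_music(void) { music_ticks++;  }  /* hypothetical */
static void update_game(void)             { game_updates++; }  /* hypothetical */

/* One main loop iteration: every SPI RAM user runs one after another,
   so no access can be interrupted by another SPI RAM user. */
static void run_one_frame(void)
{
    WaitVsync(1);               /* synchronize with the video frame */
    process_streaming_music();  /* music first, right after the wait */
    update_game();              /* then game logic and drawing */
}
```

If an iteration takes longer than a frame, the music simply advances one step later, which matches the slowdown described above.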

Even if you didn't do this, doing the job faster (like Lee suggested) is more advantageous than trying to split the SPI access into tiny fragments: if you mess around for too long and get your code spilling over Vsync, bad things are bound to happen.

Moreover, you could add a more elaborate frame rate manager that skips rendering a frame if it detects that your code took too long (spilled into the next video frame). This helps because it ensures that your sensitive display code always has a full VBlank to work with (it won't be started at an arbitrary point, which may get it tangled up with the Vsync interrupt).
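A minimal sketch of such a frame rate manager follows. It assumes a counter that is incremented once whenever a new Video frame starts; how that counter is hooked up on real hardware is not shown, and all names here are hypothetical. slow_logic() only simulates an overrun by bumping the counter itself.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: incremented once whenever a new Video frame starts. */
static volatile uint8_t frame_counter;
static uint16_t rendered_frames;
static uint16_t skipped_frames;

typedef void (*StepFn)(void);

/* Hypothetical stand-ins for game code. */
static void fast_logic(void)  { /* finishes within the VBlank */ }
static void slow_logic(void)  { frame_counter++; /* simulated overrun */ }
static void draw_screen(void) { /* sensitive display code */ }

/* Runs logic, then renders only if no Video frame started meanwhile.
   Intended to be called right after WaitVsync(1). */
static bool run_iteration(StepFn logic, StepFn render)
{
    uint8_t start = frame_counter;
    logic();
    if (frame_counter != start) {
        skipped_frames++;       /* overran: skip rendering this time */
        return false;
    }
    render();
    rendered_frames++;
    return true;
}
```

After a skipped render, the next iteration starts from WaitVsync(1) again, so the display code gets a fresh VBlank.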

Anyway, I think the best option would be removing that music processor from Vsync: if you call it from main, at least your game becomes sequential with regard to accessing the SPI RAM, so it can not just break due to a hazard condition. That's the most important thing, more so than occasional skips in the music, for example.
D3thAdd3r
Posts: 3221
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Post by D3thAdd3r »

Streaming Music is sort of designed to be last priority, since it is inherently more tolerant of occasional stalls due to SPI usage than other things. It also seems to me that doing this in user code time eliminates potential problems with no disadvantage that I can see. This is why I went the custom WaitVsync route, though other solutions are certainly possible.

With Mode 15 (SPI Vram) I am at a loss to find an easy way to coordinate it with Streaming Music. "Concurrent" SPI access gets tricky really fast. As much as I have tried, I think the only way to prevent race conditions in a system that demands immediate access to SPI Ram every frame is to have a task queue with a state machine, so that on the first wasted line of rendering it can finish the current access and start again later to fill some buffer. I would do anything else to avoid that nightmare though. It seems you can easily avoid it in your case, which I would really recommend for sanity!
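As a rough illustration of the "finish the current access and start again later" idea, here is a sketch of one resumable transfer job. It is not Mode 15 code: the names are hypothetical, a plain array stands in for the SPI RAM, and on real hardware resuming would also mean ending the SPI transaction and re-issuing the address, which is not shown.

```c
#include <stdint.h>

typedef enum { JOB_IDLE, JOB_RUNNING, JOB_DONE } JobState;

typedef struct {
    const uint8_t *src;       /* stand-in for the SPI RAM read position */
    uint8_t       *dst;       /* buffer being filled */
    uint16_t       remaining; /* bytes still to transfer */
    JobState       state;
} CopyJob;

static void job_start(CopyJob *job, const uint8_t *src, uint8_t *dst,
                      uint16_t len)
{
    job->src = src;
    job->dst = dst;
    job->remaining = len;
    job->state = (len > 0) ? JOB_RUNNING : JOB_DONE;
}

/* Transfers at most `budget` bytes, then returns so that a higher
   priority user can take the bus. Call again later to continue. */
static JobState job_step(CopyJob *job, uint16_t budget)
{
    if (job->state != JOB_RUNNING) {
        return job->state;
    }
    while (budget > 0 && job->remaining > 0) {
        *job->dst++ = *job->src++;
        job->remaining--;
        budget--;
    }
    if (job->remaining == 0) {
        job->state = JOB_DONE;
    }
    return job->state;
}
```

A queue of such jobs would then be stepped whenever the bus is free, which is where the complexity mentioned above comes from.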
nicksen782
Posts: 714
Joined: Wed Feb 01, 2012 8:23 pm
Location: Detroit, United States

Post by nicksen782 »

This is how my music streaming handler starts:

Code: Select all

	// Skip the buffering if the mplayer is set to not active or if the SPI bus is in use.
	if(
		   !playSong                  // Not playing a song.
		|| !musicStruct.mplayerActive // Music player set to inactive.
		|| !(PORTD & (1<<6))          // SD Card chip select.
		|| !(PORTA & (1<<4))          // SPIRAM chip select.
	){
		return;
	}
Basically, it just bails if the SPI bus is in use or the player is set to off.

In my game state machine I have a while(gamestate==samegamestate) sort of thing. Inside is a WaitVsync(1). Without it the game loop may happen too fast. The music streamer function runs pre-vsync along with some simple counters.

Code: Select all

void pre_VsyncCallBack(){
	vCounter_8b_1++;
	vCounter_8b_2++;
	vCounter_8b_3++;
	vCounter_8b_4++;
	vCounter_16b_1++;
	N782_fillStreamingMusicBuffer(1);
}
The argument in N782_fillStreamingMusicBuffer specifies how many vsyncs to read music through. 1 is enough. It IS possible for some other SPI read to be happening at the same time as the music stream but that initial check defends against this. The music buffer only gets filled if the SPI bus is not in use for that vsync.

Mode 15 sure does sound pretty cool but you would need to read vram each vsync, right? You probably would need some sort of data access queue then. Should you then do the other SPI access directly AFTER the vram is read?

I have a great data structure generator for the game but I have a feeling I will need to refactor it before the game can grow. I always wanted to swap things back and forth with the SD card. I was thinking about breaking up the SPI RAM into logical partitions (like how a cell phone file system works, or Linux.) Right now the whole system depends on a series of #define offsets. In some cases at that offset is a table of offsets to where the needed data is (like a filesystem.) But, those offsets are useless once I start swapping data around. We shall see. I will likely post more on this in the future.
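The partition idea could be sketched like this: instead of fixed #define offsets, every access goes through a small table that maps a partition and a relative offset to an absolute SPI RAM address, so a partition can be relocated by updating one entry. The partition names, bases and sizes below are made up for illustration and are not taken from the actual game.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical partitions. */
enum { PART_TILES, PART_MAPS, PART_MUSIC, PART_COUNT };

typedef struct {
    uint32_t base;    /* absolute start address in SPI RAM */
    uint32_t length;  /* size of the partition in bytes */
} Partition;

static Partition partitions[PART_COUNT] = {
    [PART_TILES] = { 0x00000UL, 0x4000UL },
    [PART_MAPS]  = { 0x04000UL, 0x2000UL },
    [PART_MUSIC] = { 0x06000UL, 0x8000UL },
};

/* Translates (partition, relative offset) to an absolute address.
   Returns false if the partition or the offset is out of range. */
static bool partition_resolve(uint8_t part, uint32_t rel,
                              uint32_t *abs_addr)
{
    if (part >= PART_COUNT || rel >= partitions[part].length) {
        return false;
    }
    *abs_addr = partitions[part].base + rel;
    return true;
}

/* After swapping data around, only the table entry changes. */
static void partition_move(uint8_t part, uint32_t new_base)
{
    if (part < PART_COUNT) {
        partitions[part].base = new_base;
    }
}
```

Offsets stored inside the data stay valid across a move as long as they are relative to their partition's base.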
D3thAdd3r
Posts: 3221
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Post by D3thAdd3r »

Ah, I get what you are doing then, so it seems there should be no random issues, since basically you are doing something equivalent to a yield() as far as I can tell. In a sense, these are a lot like concurrency problems/solutions when dealing with multithreaded programs. I guess it doesn't get too nasty unless the access happening during an interrupt must always take control immediately.

Mode 15 I really want to finish when I have time, but it definitely has lots of opportunities for race conditions. Even for collision detection with some queue system: imagine breaking a block in a platformer and queueing that to be removed from vram. A few hundred cycles later, before it was removed, a check is made against it for some other object as a collision test, where the object should pass through since the block is supposed to be gone... but it isn't yet. Then it seems the generic answer is always more buffers (like storing collision data in a separate 644 ram bitmap), but that is not always ideal. I think for the racing game... I will just race it and finish vram changes before proceeding, and hopefully only do things that always win the race against vsync. A lot easier, unless it isn't due to unexpected problems, but it has potential for real headaches. I sort of expect to check against the timer to see if game logic should continue or WaitVsync early before something expensive late in the frame.
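The "separate 644 ram bitmap" for collision data might look something like the sketch below: one bit per tile kept in internal RAM, updated immediately when a block breaks, independent of when the queued vram change actually lands. The map dimensions and function names are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAP_W 32u  /* assumption: map width in tiles */
#define MAP_H 28u  /* assumption: map height in tiles */

/* One bit per tile: 32 * 28 / 8 = 112 bytes of internal RAM. */
static uint8_t solid_bits[(MAP_W * MAP_H + 7u) / 8u];

/* Callers must keep x < MAP_W and y < MAP_H. */
static void set_solid(uint8_t x, uint8_t y, bool solid)
{
    uint16_t bit  = (uint16_t)(y * MAP_W + x);
    uint8_t  mask = (uint8_t)(1u << (bit & 7u));
    if (solid) {
        solid_bits[bit >> 3] |= mask;
    } else {
        solid_bits[bit >> 3] &= (uint8_t)~mask;
    }
}

static bool is_solid(uint8_t x, uint8_t y)
{
    uint16_t bit = (uint16_t)(y * MAP_W + x);
    return (solid_bits[bit >> 3] & (1u << (bit & 7u))) != 0;
}
```

Collision tests then read is_solid() only, so a block that is queued for removal but still present in vram no longer blocks anything.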

I would guess development might be a lot easier if you made it all a filesystem so your tools can make all the changes to the SD file instead of manual define changes. No doubt this is ambitious as hell, and changes are going to happen several times. Alternatively (and I think the performance gain would be meaningless, just avoiding lookups/sector cues), could the tools generate something like a separate sd_offsets.h which has things that never need to be manually modified? I would expect very small savings and functionally equivalent behavior.
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Post by Jubatian »

nicksen782 wrote: Mon Jul 23, 2018 6:56 pm This is how my music streaming handler starts: (...)
Should be OK, especially if, as you mention further down, it is possible to prefill more frames. The condition looking at the chip select lines' states guards properly against corrupting a main program SPI access.

The only problem to look out for is deterministically poor timing, that is, the main program somewhat systematically ending up interrupted within an SPI access, which would prevent filling the buffer. This could be crucial. Whether such a condition can happen depends only on the overall performance characteristics of the code: if it is somewhat deterministic, it may end up aligning poorly with the relatively rare interrupt, and you could get severely distorted music playback. The cause would then be nothing in the code itself but the timing, which is a fragile thing. Anyway, it can only cause hiccups in music playback without really breaking anything.

However if you have a buffer (N782_fillStreamingMusicBuffer()), you might be able to organize it so you fill it up from the main program, thus avoiding the timing hazard.
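Filling the buffer from the main program could be organized around a small ring buffer, roughly as sketched here. This is not the code behind N782_fillStreamingMusicBuffer(); the names and the buffer size are assumptions, and the source array stands in for data read from the SPI RAM.

```c
#include <stdint.h>
#include <stdbool.h>

#define MUSIC_BUF_SIZE 64u  /* must be a power of two, at most 128 */

static uint8_t music_buf[MUSIC_BUF_SIZE];
static volatile uint8_t write_pos;  /* free-running, advanced by main */
static volatile uint8_t read_pos;   /* free-running, advanced by the player */

static uint8_t music_buf_used(void)
{
    return (uint8_t)(write_pos - read_pos);
}

/* Main program side: copies up to `count` bytes, returns bytes stored. */
static uint8_t music_buf_fill(const uint8_t *src, uint8_t count)
{
    uint8_t written = 0;
    while (written < count && music_buf_used() < MUSIC_BUF_SIZE) {
        music_buf[write_pos & (MUSIC_BUF_SIZE - 1u)] = src[written++];
        write_pos++;
    }
    return written;
}

/* Player side: takes one byte, returns false if the buffer is empty. */
static bool music_buf_read(uint8_t *out)
{
    if (music_buf_used() == 0) {
        return false;
    }
    *out = music_buf[read_pos & (MUSIC_BUF_SIZE - 1u)];
    read_pos++;
    return true;
}
```

With this split, only the main program touches the SPI RAM, while the player just drains internal RAM, so the two can never meet on the bus.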

Mode 15 is similar to Mode 748 in that it uses the SPI RAM during the Video frame (it is about having the VRAM in the SPI RAM). That imposes timing problems similar to handling the streamed music, and you can't buffer it, nor put a similar guarding condition on it (as then you would miss a complete frame and the screen would flicker). I don't know whether it does anything in Vsync too; however, if it is otherwise Mode 3, it does, including accessing the SPI RAM.

Also, just to mention it: Vsync isn't the Video frame. The Video frame (at least that is what I call it) is the period when the AVR is busy drawing pixels on the screen, those 224 scanlines (if you use the full screen). WaitVsync() actually waits for the Video frame to pass (so it is NOT actually related to Vsync!). The VBlank is the period between two Video frames, where User code and Vsync code may run. The Vsync always happens when the TV set is ordered to start the next frame, which is somewhere around the middle of VBlank.

The Vsync interrupt processes music, controller input and, in the case of Mode 3 with the normal kernel, sprites. After it returns, however, you are still in the same VBlank, and User code may continue running before the next Video frame. (I removed all Vsync interrupt tasks in Flight of a Dragon to utilize the VBlank better since, as you may see, doing critical stuff in Vsync splits the VBlank, limiting what you can do in it.) Probably this really should be documented somewhere...

This figure in the Wiki (part of Mode 3's documentation) is not right: the bulk of the user code usually executes after "Kernel Renders Frame" (the Video frame, as I call it) and before Vsync. In simpler games there is often no User code executing where the figure indicates it (in a sprite-intensive game, that region is usually where sprites render, called from the Vsync interrupt). In Iros in particular the situation can be observed well, especially in the space shooter sections: you can recognize where the user code spills past Vsync by how the game behaves.
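To put rough numbers on the VBlank, here is a small arithmetic sketch. The 224 scanline Video frame is from the post above and the 262 scanlines per frame figure comes up later in the thread; the 1820 CPU cycles per scanline is an assumption based on the usual Uzebox clock, and the result is only an upper bound since the kernel's per-line work still takes its share.

```c
#include <stdint.h>

#define SCANLINES_PER_FRAME   262UL
#define VIDEO_FRAME_SCANLINES 224UL   /* full screen Video frame */
#define CYCLES_PER_SCANLINE   1820UL  /* assumption, see above */

/* Scanlines between two Video frames. */
static uint32_t vblank_scanlines(void)
{
    return SCANLINES_PER_FRAME - VIDEO_FRAME_SCANLINES;
}

/* Upper bound of CPU cycles in one VBlank. */
static uint32_t vblank_cycles(void)
{
    return vblank_scanlines() * CYCLES_PER_SCANLINE;
}

/* CPU cycles in one complete frame. */
static uint32_t frame_cycles(void)
{
    return SCANLINES_PER_FRAME * CYCLES_PER_SCANLINE;
}
```

Under these assumptions the VBlank is 38 scanlines, or at most 69160 cycles out of 476840 per frame, and the Vsync interrupt lands somewhere in the middle of that.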
D3thAdd3r wrote: Tue Jul 24, 2018 5:30 pm I will just race it and finish vram changes before proceeding and hopefully only do things that always win the race against vsync.
Maybe rather try to win the race against the next video frame! That's easier (you have the full VBlank). I think the original kernel's and Mode 3's methods should really be left behind for such SPI RAM intensive stuff; rather try to work as if Mode 3 was used with SPRITES_VSYNC_PROCESS set to zero (so you can still use the normal sprite engine, but the kernel no longer does the sprites "automagically" from Vsync). With that you only have the Video frame versus User code (which could spill out of VBlank) conflict, without the Vsync interrupt for added trouble (if you do streaming music, you still have SPI RAM access in Vsync...).
D3thAdd3r
Posts: 3221
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Post by D3thAdd3r »

Ah I am guilty of what I hate the most: false positive/misinformation! I will need to update that old diagram then. I do think it is very important to have something that exactly explains the sequence, and I would be in favor of some wrapper/define or something to keep compatibility with old source(or just update it), and use a more correct name for user code and in the kernel(WaitVblank()?). I throw around vblank and vsync interchangeably like others, mostly because when discussing it, the name of that function sort of dictates the language, to avoid confusion. What do you guys think about changing it? I will try and post an updated diagram and confirm all info is correct, since this I believe is likely the most commonly misunderstood thing by developers. To be honest, I always forget the details when I am away for a bit, and only look them up(based on that wrong diagram!) when I need to remember.
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Post by Jubatian »

Some good in-depth info for the kernel would be nice indeed. It is designed so normally you can just slap together a simple game and it would run (normal kernel + Mode 3 doing everything automatically), but when one starts requiring advanced stuff, things start to fall apart subtly.

Vsync is required for one important reason: precise timing. The Vsync interrupt is exactly 60Hz. The end of the video frame isn't necessarily exactly 60Hz if you dynamically alter the render start and height (SetRenderingParameters). That has legitimate uses (such as screen shaking), although currently it can not really be done anyway because the kernel can't handle it too well. The exact 60Hz (262 scanlines) nature of the Vsync is necessary for some types of audio code, for example (putting aside anything currently sitting there).

Altering function names and type signatures could be nice for cleaning things up (the plain char type used extensively is especially nasty as its signedness is implementation-defined, which could hinder creating code that can also be compiled for something else), but it is difficult due to compatibility. This includes just adding things, which might later produce confusion too (if I have a WaitVblank() and a WaitVsync(), then surely the latter is for waiting for the Vsync... and so another bug happens).
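The plain char problem can be shown in a few lines. This is only an illustration with hypothetical function names, not kernel code: the same byte widens to a different int depending on whether the compiler treats plain char as signed, while a fixed-width type behaves the same everywhere.

```c
#include <limits.h>
#include <stdint.h>
#include <stdbool.h>

/* True if this compiler (with its current flags) treats plain char
   as a signed type. */
static bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Result depends on the signedness of plain char. */
static int widen_plain(char c)
{
    return c;
}

/* Always yields 0..255, regardless of platform or compiler flags. */
static int widen_fixed(uint8_t c)
{
    return c;
}
```

Using uint8_t or int8_t in signatures states the intent explicitly, so the code does not change meaning when built with a different compiler or different flags.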

Anyway for SPI RAM stuff like this RPG game, it is important to have a good understanding of what happens during and between frames to see where conflicts in SPI usage may arise and how they could possibly be dealt with.
Jubatian
Posts: 1561
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Post by Jubatian »

I added a new page to the Wiki:

http://uzebox.org/wiki/Video_mode_operation

Hopefully it helps clear up some of these matters. It is linked from the Video modes page.