MegaBomber

D3thAdd3r
Posts: 3221
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: MegaBomber

Post by D3thAdd3r »

danboid wrote: Sat Mar 04, 2023 8:15 pm ...You don't seem to hint at any kind of non coder, musician friendly composition workflow in your last post, not that I could discern at least...
I encourage all musicians who are not currently programmers to become programmers :lol: But in seriousness, musicians working in MIDI would likely be best served by the UzeSynth idea you speak of, and/or Rosegarden. For PCM, any tools will work outside Uzebox; just mix down to 8-bit mono at 15.7 kHz. I think you will find the SD image build script makes this as easy as possible and hides some tedious inner workings.

Integrating sound into a game will always require programming to some degree, but Artcfox, Uze, myself, and others have always been about making it easier to get started, and adding tools when there is a tangible, clearly defined idea for one. I just have no new ideas on that, and Artcfox has already made comprehensive, beginner-friendly tutorial videos on making Uzebox MIDI.
Artcfox wrote: Sat Mar 04, 2023 8:17 pm ...you are essentially using the SPI RAM as a swap space for actual RAM, paging in chunks as you need them, and the SD card will feed the SPI RAM with even more pages that can be swapped into actual RAM, allow you to achieve multiple PCM sound effects?...
Exactly. I'm still working on this part, as I don't have the entire system cooperating with itself yet (I needed a full format spec and data). I also should have mentioned that the intent is to add an option like -DSOUND_MIXER=2, where dropping the channel controls and song player items will yield back some bytes (I forgot the magnitude; small but significant).
Artcfox wrote: Sat Mar 04, 2023 8:17 pm ...I wonder if you could use the RAM tiles space as a temporary intermediate buffer for going from SD card to SPI RAM, as long as you do it before blitting happens, and can finish before blitting starts, you should be able to reuse that RAM for free essentially I think.
This ^ I think that is probably the best idea where possible. I don't have enough built out to do a true cycle test yet (a visualizer and a bouncing ball sprite are it, maybe 3K cycles), so I have a dedicated buffer for simplicity. Even if a game can't hit that on cycles, as you know there are still graphical benefits to having a larger ram_tiles[].

I actually hadn't gotten to thinking about this much... Really, for Mode 3, ideally even the HSYNC buffer should point at unused ram_tiles (which might allow tricks with extreme timing, or extra graphics with volume set to 0). A SetUserRamTiles() kind of thing... I'll convert the demo to use ram_tiles[] now, since that is the case everyone should target if using M3.
danboid
Posts: 1931
Joined: Sun Jun 14, 2020 12:14 am

Post by danboid »

D3thAdd3r wrote: Sat Mar 04, 2023 8:56 pm This ^ I think that is probably the best idea where possible. I don't have enough built out to do a true cycle test yet(a visualizer and a bouncing ball sprite are it, maybe 3K cycles)
When I read this, my retro-computing-wired brain interpreted it as DA having ported the classic Amiga Boing Ball demo to the Uzebox. Then I made the mistake of going back to re-read what you said, and that's not what you said at all. But now this seems almost more important to get running than the holy SID player: a recreation of the classic Amiga Boing Ball demo on the Uzebox. Could it be done?

Post by D3thAdd3r »

I'm interested in SID, I did briefly try to port some Arduino player but never got anywhere. I do think it's possible minus filters.

So a small megasprite version of the Amiga ball with enough frames to appear to rotate would be cool, and I might take that idea. But to really do it justice and replicate it as faithfully as possible would take a custom video mode, I think. That's more the territory of CunningFellow, Uze, Jubatian, ry755, and others who are good at assembly. There is probably already a Jubatian video mode that would work for it, actually, if someone was interested. I was at one point half competent in AVR asm, and had some working if not ideal mode tests, but I'm now a rusty novice at best :roll:

Post by danboid »

Think how much Commodore / demo scene buzz and new dev interest you could harvest with one bouncing ball CF! We won't be able to keep them away.

I'd probably have my Uzebox running the boing ball demo most of the time if someone pulled it off.
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm

Post by Artcfox »

D3thAdd3r wrote: Sat Mar 04, 2023 8:56 pm
Artcfox wrote: Sat Mar 04, 2023 8:17 pm ...you are essentially using the SPI RAM as a swap space for actual RAM, paging in chunks as you need them, and the SD card will feed the SPI RAM with even more pages that can be swapped into actual RAM, allow you to achieve multiple PCM sound effects?...
Exactly, still working on this part as I don't have the entire system cooperating with itself yet(needed a full format spec and data). Also should have mentioned the intent is to add an option like -DSOUND_MIXER=2, where dropping the channel controls and song player items will yield back(forgot magnitude, small but significant) bytes.
Artcfox wrote: Sat Mar 04, 2023 8:17 pm ...I wonder if you could use the RAM tiles space as a temporary intermediate buffer for going from SD card to SPI RAM, as long as you do it before blitting happens, and can finish before blitting starts, you should be able to reuse that RAM for free essentially I think.
This ^ I think that is probably the best idea where possible. I don't have enough built out to do a true cycle test yet(a visualizer and a bouncing ball sprite are it, maybe 3K cycles)so I have a dedicated buffer for simplicity. Even if a game can't hit that on cycles, as you know there is still graphical benefits of having larger ram_tiles[].

I actually hadn't got to thinking about this much...really for Mode 3, ideally even the HSYNC buffer should point at unused ram_tiles(which might allow tricks with extreme timing, or extra graphics with volume set to 0). SetUserRamTiles() kind of thing...I'll convert the demo to use ram_tiles[] now, since that is the case everyone should target if using M3.
Very exciting! I think using the RAM tile memory is the easiest win, especially if you are using mode 3 as a blitting mode where you have total control over when everything happens.

Post by D3thAdd3r »

The file builder works now with correct pointers/offsets, so it's quite easy to make whatever arrangement of PCM you want from an individual file for each sound effect and each song. I was crazy to imagine I was just going to "real quick" beat this up in a hex editor. It's an aggressive little shell script, but it seems to do the job and the configuration couldn't be simpler (it would require WSL bash on Windows). I'll write a tutorial once I have something to show.

I am aiming to change the ball graphics to small Amiga-like ones (2x2) for the visualizer, as a representative sprite load for testing. It actually looks cool so far, bouncing off the dancing sound visualizer (sort of "equalizer" looking). Physics are 8.8 fixed point (with super fake collisions against the visualizer ATM), so they are actually pretty neat looking in motion.
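For readers who haven't used 8.8 fixed point, here is a minimal sketch of what a bouncing ball update can look like. This is not the demo's code: the `Ball` struct, `GRAVITY`, and `FLOOR_Y` are hypothetical names and values picked only to illustrate the idea of keeping a whole pixel in the high byte and a 1/256 fraction in the low byte.

```c
#include <stdint.h>

/* 8.8 fixed point: high byte = whole pixels, low byte = 1/256 of a pixel. */
typedef int16_t fix8_8;

#define FIX(n)     ((fix8_8)((n) * 256))  /* whole number -> 8.8 */
#define FIX_INT(f) ((int16_t)((f) >> 8))  /* 8.8 -> whole pixels (non-negative values) */

/* Hypothetical constants, for illustration only. */
#define GRAVITY ((fix8_8)32)              /* 32/256 = 0.125 pixels per frame^2 */
#define FLOOR_Y FIX(100)

typedef struct {
    fix8_8 y;   /* vertical position */
    fix8_8 vy;  /* vertical velocity per frame */
} Ball;

/* Advance one frame: apply gravity, move, then bounce off the floor. */
static void ball_step(Ball *b)
{
    b->vy = (fix8_8)(b->vy + GRAVITY);
    b->y  = (fix8_8)(b->y + b->vy);
    if (b->y >= FLOOR_Y) {
        b->y  = FLOOR_Y;
        b->vy = (fix8_8)(-b->vy);         /* perfectly elastic "fake" bounce */
    }
}
```

The sprite would be drawn at `FIX_INT(b->y)`, so sub-pixel velocity accumulates smoothly even though the screen position is a whole number.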

Got this switched over to ram_tiles[] for the SD buffer and started working towards making the HSYNC buffer a user pointer. I need to improve things, but various compile flags have been added to remove the normal sound engine items across the kernel. So far this yields back 133 bytes of RAM (2K+ flash). Aaaaand I picked up a buzzing sound I will need to figure out. The pulse timing is still correct according to Cuzebox, so there is something I'm not doing to the buffer samples that the standard kernel does do... I haven't looked at this stuff in so long, but the amount of work to do in HSYNC for this mixer is very small and simple in comparison to the quite complex standard inline mixer.

So in theory this means the normal benefits of decreasing screen height are amplified significantly (someone curb my enthusiasm if I'm missing something). I haven't figured it out yet, but it could be thousands of extra cycles gained back for user code. Enough to offset SPI RAM copies?
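As a rough way to put numbers on that, here is a back-of-the-envelope sketch. It assumes 1820 CPU cycles per scanline (28.63636 MHz divided by the roughly 15.734 kHz line rate) and 262 lines per frame. The `hsync_cost` parameter is a hypothetical per-line cost of the sound interrupt, and the estimate ignores vsync pulse handling and any mode-specific overhead, so treat it as an upper bound rather than a measurement.

```c
#include <stdint.h>

#define CYCLES_PER_LINE 1820u  /* assumed: 28.63636 MHz / ~15.734 kHz */
#define LINES_PER_FRAME 262u

/* Rough upper bound on cycles available to user code in one frame.
 * rendered_lines: scanlines the video mode actively draws.
 * hsync_cost:     cycles the hsync handler takes on each non-rendered
 *                 line (hypothetical; measure the real mixer). */
static uint32_t user_cycles_per_frame(uint16_t rendered_lines, uint16_t hsync_cost)
{
    if (rendered_lines > LINES_PER_FRAME || hsync_cost > CYCLES_PER_LINE)
        return 0;  /* nonsensical input */
    uint32_t free_lines = LINES_PER_FRAME - rendered_lines;
    return free_lines * (uint32_t)(CYCLES_PER_LINE - hsync_cost);
}
```

Under these assumptions, both levers show up in the same formula: dropping rendered lines adds whole lines of user time, and a cheaper hsync handler adds a little to every one of those lines.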

@Alec
So I remember making some poor video modes in the past, where you need to modify the interrupt to end a scanline. This seems to be absolutely required for Mode 3 and others, but then it seems like this should not be modified... I just don't recall exactly how that worked.
Basically, at the end of your vsync mixer, which I'm stripping down, I have:

Code:

;...much code above...
	pop ZL
	pop r18
	pop r17
	pop r16

	;*** Video sync update ***
	sbrc ZL,0								;pre-eq/post-eq sync
	sbi _SFR_IO_ADDR(SYNC_PORT),SYNC_PIN	;TCNT1=0xAC
	sbrs ZL,0								
	rjmp .+2
	ret

	ldi ZH,20
	dec ZH
	brne .-4
	rjmp .

	;*** Video sync update ***
	sbrc ZL,1								;hsync
	sbi _SFR_IO_ADDR(SYNC_PORT),SYNC_PIN	;TCNT1=0xF0
	sbrs ZL,1								
	rjmp .

	ret 
Currently the HSYNC interrupt is triggered some cycles before the pulse so that there is time to do the work, I think. Can one modify TCNT1 and just pop into the code later (hitting the same HSYNC timing overall), mix a single sample, send the pulse, and bail out early?
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Post by CunningFellow »

I am not sure how other modes did it, but T2K, which was the first interrupt-ended scanline mode, did not use the timer counter interrupt to end the scanline. It left that one untouched. It used the OVERFLOW interrupt and set the counter to [0x10000 - time].
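To make the [0x10000 - time] trick concrete, here is a small sketch of the arithmetic. It assumes a 16-bit up-counter clocked at the CPU rate with no prescaler; `overflow_preload` is a hypothetical helper name, not something from T2K or the kernel.

```c
#include <stdint.h>

/* Value to load into a 16-bit up-counter so that it overflows
 * (wraps from 0xFFFF to 0x0000) after exactly `cycles` timer ticks.
 * On an AVR this would be written to TCNT1 with the Timer1 overflow
 * interrupt enabled; that part is left out to keep the sketch portable. */
static uint16_t overflow_preload(uint16_t cycles)
{
    return (uint16_t)(0x10000UL - cycles);
}
```

Because only the counter value changes, the compare-match setup used for normal sync timing can stay exactly as it was.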
uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Post by uze6666 »

I'm a bit confused when you guys talk about the mixers; the terms are sometimes reversed. The inline (or HSYNC) mixer mixes one sample for every video scanline. The VSYNC mixer, on the other hand, mixes 262 samples on every video frame (60 Hz).
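Either way, the output rate works out the same: one sample per scanline. A tiny sketch of that arithmetic, using the nominal 262 lines and 60 frames per second (the real NTSC line rate is slightly higher, around 15.734 kHz); the helper names are made up for illustration.

```c
#include <stdint.h>

#define LINES_PER_FRAME   262u
#define FRAMES_PER_SECOND 60u   /* nominal */

/* One sample per scanline gives the nominal sample rate in Hz. */
static uint32_t sample_rate_hz(void)
{
    return LINES_PER_FRAME * FRAMES_PER_SECOND;
}

/* Bytes of 8-bit mono PCM needed for a given number of seconds. */
static uint32_t pcm_bytes(uint32_t seconds)
{
    return seconds * sample_rate_hz();
}
```

This is where the "15.7 kHz" figure used throughout the thread comes from.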

Looking at the Mode 3 code, it appears to have received the Jubatian treatment, so I am a bit lost. Now I can't recall if I did or he did, but the timer overflow trick is in there. You can see here where it is set and here is where the line ends after the interrupt.
Currently the HSYNC is triggered for the cycles before the pulse to have time, I think. Can one modify TCNT1 and just pop into the code later(hitting the same HSYNC timing overall), mixed a single sample, send the pulse, and bail out early?
I'm afraid I'm not grasping what you are trying to achieve here. But for sure, you can pop out of code but not in at an arbitrary place. This is because the interrupt vector is fixed in flash.

Post by D3thAdd3r »

I'm probably making things confusing on terms.

This is in the context of a sound system which has no traditional wave/noise channels, just a single PCM channel. In fact this PCM channel is simpler than the normal one, in that there is no volume scaling or fractional positioning. It is literally 1 raw sample, which should already have been mixed at the appropriate volume and at a 15.7 kHz sample rate. Prebaked music should not be so loud that it clips once sound effects are added to the waveform (I'm hoping this pure integer path maximizes sound quality versus fractional). OK, here is the basic understanding I have, if you will do me the large favor of sanity checking it:

I'm basically cutting down to a simplified version of the original vsync mixer. No channels at all, no noise, no flash waves. No fractional volume or position. Every sample is prebaked, ready for direct output. For that, should I be prebaking as unsigned 8-bit? Was the signed logic (and subsequent conversion to unsigned) of the existing code only necessary for the fractional flash-based mechanics in the normal mixers?
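If the answer turns out to be "yes, prebake as unsigned", the conversion in the offline tool is a one-liner. A minimal sketch, assuming the output stage wants unsigned samples with silence at 0x80 (this is an assumption, not confirmed kernel behavior); both helper names are hypothetical.

```c
#include <stdint.h>

/* Signed 8-bit PCM (-128..127, silence at 0) to unsigned 8-bit
 * (0..255, silence at 0x80): flip the top bit. */
static uint8_t s8_to_u8(int8_t s)
{
    return (uint8_t)((uint8_t)s ^ 0x80u);
}

/* Signed 16-bit PCM to unsigned 8-bit: keep the top byte, then re-bias.
 * This truncates; a real tool might dither or round instead. */
static uint8_t s16_to_u8(int16_t s)
{
    return (uint8_t)(((uint16_t)s >> 8) ^ 0x80u);
}
```

Doing this once at build time means the hsync path only has to copy a byte to the output, with no sign handling at runtime.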

The NTSC standard mandates a consistent 59.X Hz pulse sequence. This sequence needs to happen to indicate vertical sync (return the electron gun to the top). Uzebox generates 60 FPS progressive, with no interlacing (though interlacing is technically possible).

After each 60 Hz vertical sync, 262 horizontal sync pulse sequences happen (return the electron gun to the left). By spec, interlaced video uses 262.5 lines per field, but a flat 262 works, as old machines did the same. The starting point of these pulses, and the duration between them, are fixed so that 262 complete every 1/60 second. The first hsync must also happen with perfect timing after the vsync (vsync itself doesn't reset horizontal).

Left alone at black level voltage during the scanlines, this sequence continues rendering a black screen. To generate sound that is not horrible, samples need to be output at the same interval each time. 15.7 kHz is the only logical frequency to run for games/demos, but would it be possible to generate (15.7/2) or (15.7*2) kHz by halving or doubling the number of evenly spaced samples across a raster line? Or is the clocking of the PWM more involved with this than I'm thinking?

User code operates after all scanlines have been drawn, which is a function of SCREEN_TILES_V and render_lines for Mode 3-likes and others. This is to avoid interrupts (the hsync interrupt is turned off during this period and cycle-counted pulse timing is used?), because all cycles are required with perfect timing during this sequence to generate the appropriate pixel colors for the mode's fixed scanline resolution. The fewer lines drawn, the more time user code has? Are there caveats to these cycle gains, depending on when the first/last line is drawn relative to the start of vsync? I think that's actually a tricky question to quantify.

Sound output must happen in the horizontal sync period, because it is the only consistent high-frequency event where the CPU isn't locked into pixel generation. Other things may be possible, but not practical. The vsync mixer, and what I'm attempting, require the 262*2 buffer to precalculate samples, as there isn't time while rendering a scanline. The inline mixer doesn't need it, because sound generation happens incrementally during every hsync (complicated!). This revolution was why we got all the "free ram_tiles[] and cycles" back then, when you first made that mixer. I then promised to name my firstborn Uze, and technically, I still haven't broken that promise :lol:

User code is also interrupted at hsync intervals for sound generation. These are faster updates, because they return immediately instead of burning a few extra cycles to keep a rendering line in sync. A precomputed sample is output from mix_buf[] just like at every other hsync. But it seems that if you have less work to do (like a single prebaked sample to output), you still need to burn cycles to align hsync. Could the hsync interrupt be adjusted so that fewer cycles are burned for alignment during user time? Basically, the interrupt would happen later but achieve the same pulse timing, reflecting that there is less to calculate than in the vsync mixer I'm copying from. I'm wondering if that would somehow break compatibility with video modes.
uze6666 wrote: Sun Mar 05, 2023 10:22 pm But for sure, you can pop out of code but not in at an arbitrary place. This is because the interrupt vector is fixed in flash.
So I want to pop in at the same location, but starting later in time, since there is far less code before sending the hsync pulse, then immediately return. Right now I have replaced the unneeded stuff, like volume/position fractionals, with timed delays. Obviously I'd rather that somehow be user time instead. I feel that if it is possible to optimize for the reduced hsync work, it offsets part of the otherwise unoptimizable SPI RAM buffer juggling requirement.

The plan then was to check how far into mix_buf the kernel is each frame and, if there is still time in the frame, fill all the way to right behind that point (the sample just output), possibly moving more than 262 samples per frame. This should hopefully help get ahead in otherwise wasted time, to help on following frames where SD card "spool up time" and a read are required. A more complex multipart pre-spool before user code might be better... I haven't thought that through yet. It would seem to break user code's ability to access SPI RAM, which is almost certainly required to make up for the buffer loss.

The other important component is that mix_buf isn't a dedicated array, but a user adjustable pointer. From a Mode 3 perspective, this would allow clever use of SetUserRamTiles() in conjunction with the mixer knowing if it is about to read SD or not.

The above would allow a variable-length sound buffer to grow or shrink based on load. This would mean that on heavier frames, more ram_tiles are used as mix_buf so we don't run out of samples. That has the nice automatic side effect that fewer ram_tiles are blitted that frame, allowing an automatic catch-up while still delivering max ram_tiles when it can. I'm not sure what will work; maybe a start and a stop pointer, where the hsync code outputs 0x80 if it runs out. User code would need to update the start before the stop point was reached.
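One way the start/stop idea could be sketched is as a plain ring buffer over user-supplied memory. Everything here is hypothetical (the `MixRing` struct, the function names, and which index is called start versus stop); a real hsync handler would be hand-written asm without the modulo, and would need care around interrupt safety. It only illustrates the behavior described above: fill up to just behind the read position, and output 0x80 on underrun.

```c
#include <stddef.h>
#include <stdint.h>

#define SILENCE 0x80u

typedef struct {
    uint8_t *buf;   /* user-supplied memory, e.g. spare RAM tiles */
    size_t   cap;   /* bytes available in buf */
    size_t   start; /* next sample the hsync side will output */
    size_t   stop;  /* one past the last valid sample */
} MixRing;

/* Samples currently queued. */
static size_t ring_count(const MixRing *r)
{
    return (r->stop + r->cap - r->start) % r->cap;
}

/* Room left for the producer; one slot stays empty so full != empty. */
static size_t ring_free(const MixRing *r)
{
    return r->cap - 1 - ring_count(r);
}

/* Consumer: what the hsync handler would do once per scanline. */
static uint8_t ring_next_sample(MixRing *r)
{
    if (r->start == r->stop)
        return SILENCE;                 /* buffer ran dry */
    uint8_t s = r->buf[r->start];
    r->start = (r->start + 1) % r->cap;
    return s;
}

/* Producer: user code tops the buffer up; returns samples actually queued. */
static size_t ring_fill(MixRing *r, const uint8_t *src, size_t n)
{
    size_t room = ring_free(r);
    if (n > room)
        n = room;
    for (size_t i = 0; i < n; i++) {
        r->buf[r->stop] = src[i];
        r->stop = (r->stop + 1) % r->cap;
    }
    return n;
}
```

Growing or shrinking the buffer per frame would then amount to changing `buf` and `cap` while the ring is empty or freshly rebased, which is the part that would need the most thought in practice.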

Post by D3thAdd3r »

Reasonable progress made. The demo now has a full load of SFX (4 banks, about 16 seconds). MegaBomber did not need all 4 banks, so there is some unrelated diversity. This demonstrates what you can expect for quality, and the limitations of the bank size. The scheme could support fewer banks that are longer; I may add the option later if there is demand. It seems just about enough for most things though. 1 bank in this, IMO, sounds damn close to general PS1-quality SFX.

Very fast and easy to add SFX now. If they are already framed out in a file for you, it's < 1 minute of work to downsample one and add it to the SD image config. Rebuild is 250 ms. 4 seconds of bank time runs out fast though, as you'll see. Not going to pull off a faithful Doom port with this, but at least some of it 8-)

There is now a wide variety of music downsampled; this is the best part. This is very easy for certain cases; however, some PCM songs are *not* easy to loop. If it's just a PCM of a chiptune (the SNES Bomberman stuff), it's pretty obvious how to do it. I have an N64 one that I think loops OK. But in the interest of time, I won't perfect this for all the songs here (16). They will play once with a fade out, then be done (but looping is supported if you prepare waveforms to do so). Some good songs here IMO; this has me pretty excited. Got the buzzing sound fixed, though it's still wasting cycles and will need some asm before it's all done. Music buffer shuffling is currently broken. The SFX system is basically in its final state.

On the programming side, things are becoming clearer about how this can work in a game. I think it's possible to run at 60 Hz, but it requires the game code to cooperate for the fastest solution. I haven't even started it, but I see that a separate SD state machine would yield the highest gains. Right now several FSMs nested inside one make it run, and everything yields to keeping the buffer full and avoiding static (even the game code is prevented from running if it would make the buffer run dry).

Without the SD wait spinlock, there is nothing here that takes unpredictably long (bank changes are ~300 ms, but you control those at will). If the game code can periodically drive the SD wait polling (it can't use SPI RAM at that time), and do anything requiring SPI RAM for game logic inside of a callback, the only real choke point would be reasonably mitigated. If it really works like that, I think this is going to be something.
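A minimal sketch of what "game code drives the SD wait polling" could look like, replacing a spinlock with a state machine that does one short step per call. This is not the demo's code: `FakeCard` is a stand-in that simulates a card being busy for a few polls, and all names are hypothetical. The point is only that no call ever loops waiting on the card.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for real SD access, so the sketch runs anywhere. */
typedef struct {
    uint8_t busy_polls_left;  /* simulated spool-up time, in polls */
    uint8_t blocks_read;
} FakeCard;

typedef enum { SD_IDLE, SD_WAIT_READY, SD_READ_BLOCK } SdState;

typedef struct {
    SdState   state;
    FakeCard *card;
} SdFsm;

/* Ask for the next block; ignored if a request is already in flight. */
static void sd_request_block(SdFsm *f)
{
    if (f->state == SD_IDLE)
        f->state = SD_WAIT_READY;
}

/* True while the SD side owns the bus, i.e. SPI RAM is off limits. */
static bool sd_bus_busy(const SdFsm *f)
{
    return f->state != SD_IDLE;
}

/* One short, bounded step. Game code calls this periodically. */
static void sd_poll(SdFsm *f)
{
    switch (f->state) {
    case SD_WAIT_READY:
        if (f->card->busy_polls_left > 0)
            f->card->busy_polls_left--;  /* still busy; try again next call */
        else
            f->state = SD_READ_BLOCK;
        break;
    case SD_READ_BLOCK:
        f->card->blocks_read++;          /* the real version would copy data here */
        f->state = SD_IDLE;
        break;
    case SD_IDLE:
    default:
        break;
    }
}
```

Game logic that needs SPI RAM would check `sd_bus_busy()` first, or be run from a callback fired when the state returns to `SD_IDLE`.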

Bank changes need to happen during a transition, which forces some design constraints on games. The demo will give a good feel for this. Starting a song is also different (perhaps easier than in SPIRamMusicDemo though), in that it does require planning. I'm adding a mechanism that lets the user play out a song that's in the buffer, while instructing a preload of the next song to happen. It takes 2 SPI RAM writes by user code to do it (inside a callback if the SD FSM idea pans out). In general, game logic is always aware of when a transition will happen; here, you would just need to prepare some frames prior. The reason is that a heavy frame on top of an empty buffer is really bad here. Streaming music can gracefully slow down (as it has prepared waveforms it can spin on), but this can't at all. Clicks and static, very jarring. It must perpetually continue the waveform it started, so preparation always needs to be there. Hopefully it's still easy to use.