I'm probably muddling the terminology here.
Context: a sound system with no traditional wave/noise channels, just a single PCM channel. In fact this PCM channel is simpler than the normal one, in that there is no volume scaling or fractional positioning. Literally one raw sample, which should have been mixed at the appropriate volume beforehand, at a 15.7 kHz sample rate. Prebaked music should not be so loud that it clips once sound effects are added to the waveform (I'm hoping this pure-integer path maximizes sound quality versus fractional). OK, here's the basic understanding I have, if you'll do me the large favor of sanity-checking the following:
I'm basically cutting down to a simplified version of the original vsync mixer: no channels at all, no noise, no flash waves, no fractional volume or position. Every sample is prebaked, ready for direct output. For that, should I be prebaking as unsigned 8-bit? Was the signed logic (and subsequent conversion to unsigned) in the existing code only necessary for the fractional, flash-based mechanics of the normal mixers?
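To make sure I have the signed/unsigned part straight, here's a quick C sketch of what I think the prebake-time mixing looks like: music stored unsigned, a signed effect added with clipping, result kept unsigned so the hsync side can output it raw. The function and names are mine for illustration, not kernel API.

```c
#include <stdint.h>

/* Hypothetical prebake-time mix: unsigned 8-bit music plus a signed
   sound effect, clamped, returned unsigned and PWM-ready. */
static uint8_t mix_prebaked(uint8_t music_u8, int8_t sfx)
{
    int16_t s = (int16_t)music_u8 - 128;   /* back to signed for the add */
    s += sfx;
    if (s > 127)  s = 127;                 /* clip guard */
    if (s < -128) s = -128;
    return (uint8_t)(s + 128);             /* back to unsigned */
}
```

If that's right, the signed domain only ever exists at bake time, and the runtime path never needs a conversion.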
The NTSC standard mandates a consistent ~59.94 Hz pulse sequence. This sequence needs to happen to indicate vertical sync (return the electron gun to the top). The Uzebox generates 60 FPS progressive, with no interlacing (interlacing is technically possible).
After each 60 Hz vertical sync, 262 horizontal sync pulse sequences happen (by spec, interlaced is 262.5 per field, but 262 progressive works as old machines did it too), each returning the electron gun to the left. The starting point of, and the duration between, these pulses is fixed so that 262 complete every 1/60 second. The first hsync must also happen in perfect timing after the vsync (vsync itself doesn't reset horizontal).
Left alone at black-level voltage during the scanlines, this sequence keeps rendering a black screen. To generate sound that is not horrible, samples need to be output at the same interval every time. 15.7 kHz is the only logical frequency to run for games/demos, but would it be possible to generate 15.7/2 or 15.7×2 kHz by halving or doubling the number of samples (evenly spaced) across a raster line? Or is the clocking of the PWM more involved than I'm thinking?
User code runs after all scanlines have been drawn; how many there are is a function of SCREEN_TILES_V and render_lines for Mode 3-likes and others.
This is to avoid interrupts (the hsync interrupt is turned off during this period and cycle-counted pulse timing is used?), because every cycle is required with perfect timing during this sequence to generate the appropriate pixel colors for the mode's fixed scanline resolution.
The fewer lines drawn, the more time user code has? Are there caveats to these cycle gains, depending on when the first/last line is drawn relative to the start of vsync? I think that's actually a tricky question to quantify.
The horizontal sync period must be where sound output happens, because it's the only consistent high-frequency event where the CPU isn't locked into pixel generation. Other schemes may be possible, but not practical. The vsync mixer, and what I'm attempting, require the 262×2 buffer to precalculate samples, since there's no time while rendering a scanline. The inline mixer doesn't need it, because sound generation happens incrementally during every hsync (complicated!). That revolution was why we got all the "free ram_tiles[] and cycles" back then, when you first made that mixer. I then promised to name my first born Uze, and technically, I still haven't broken that promise.
User code is also interrupted at hsync intervals for sound output. These are faster updates, because they return immediately instead of burning a few extra cycles to keep a rendering line in sync. A precomputed sample is output from mix_buf[] just like on every other hsync. But it seems that even if you have less work to do (like a single prebaked sample to output), you still need to burn cycles to align to hsync.
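To make sure we're talking about the same thing, here's the shape of the hsync sample output as I picture it, written as a C sketch of what the kernel actually does in assembly. mix_buf, mix_pos, and pwm_out are illustrative stand-ins (pwm_out standing in for the PWM output-compare register), not real kernel symbols.

```c
#include <stdint.h>

#define MIX_BUF_SIZE (262 * 2)          /* double buffer: 262 samples/field */

static uint8_t  mix_buf[MIX_BUF_SIZE];  /* precomputed unsigned samples */
static uint16_t mix_pos;                /* read index, wraps at the end */
static uint8_t  pwm_out;                /* stand-in for the PWM OCR register */

/* Called once per hsync: latch one precomputed sample, advance, wrap. */
static void hsync_output_sample(void)
{
    pwm_out = mix_buf[mix_pos];
    if (++mix_pos >= MIX_BUF_SIZE)
        mix_pos = 0;
}
```

All the real work is done ahead of time; the interrupt itself is just a load, a store, and an index bump, which is why the remaining cycle burn feels like pure alignment overhead.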
Could the hsync interrupt be adjusted so that fewer cycles (during user time) are burned for alignment? Basically, the interrupt happens later but achieves the same pulse timing, to account for there being fewer calculations than in the vsync mixer I'm copying from. I'm wondering if that would somehow break compatibility with video modes.
uze6666 wrote: ↑Sun Mar 05, 2023 10:22 pm
But for sure, you can pop out of code but not in at an arbitrary place. This is because the interrupt vector is fixed in flash.
So I want to pop in at the same location, but starting later in time, since there's far less code before sending the hsync pulse, then immediately return. Right now I've replaced the unneeded stuff like volume/position fractionals with timed delays. Obviously I'd rather that somehow be user time instead. If it's possible to optimize for the reduced hsync work, I feel it nullifies part of the otherwise unoptimizable SPI RAM buffer-juggling requirement.
The plan then was to check how far into mix_buf the kernel is each frame, and if there's still time left in the frame, fill all the way up to right behind that point (the sample just output; possibly more than 262 samples moved per frame). This should hopefully accelerate progress in otherwise wasted time, to help on following frames where SD card "spool-up time" and a read are required. A more complex multi-part pre-spool before user code might be better... I haven't thought that through yet. It would seem to break user code's ability to access SPI RAM, which is almost certainly required to make up for the buffer loss.
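The fill-ahead budget I have in mind is just standard ring-buffer accounting: user code may write from its write position up to one slot behind the sample the hsync side will read next. A sketch, with illustrative names:

```c
#include <stdint.h>

#define MIX_BUF_SIZE (262 * 2)

/* How many samples user code can safely write without overtaking the
   sample the hsync ISR will read next. One slot is left empty so a
   full buffer is distinguishable from an empty one. write_pos and
   read_pos are illustrative names, not kernel symbols. */
static uint16_t fill_ahead_budget(uint16_t write_pos, uint16_t read_pos)
{
    return (uint16_t)((read_pos + MIX_BUF_SIZE - write_pos - 1)
                      % MIX_BUF_SIZE);
}
```

On light frames the budget can exceed 262, which is exactly the "more than 262 samples moved per frame" catch-up I'm after.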
The other important component is that mix_buf isn't a dedicated array, but a user-adjustable pointer. From a Mode 3 perspective, this would allow clever use of SetUserRamTiles() in conjunction with the mixer knowing whether it's about to read the SD card or not.
The above would allow a variable-length sound buffer that grows or shrinks based on load. On heavier frames, more ram_tiles would be used as mix_buf so we don't run out of samples. That has the nice automatic feature that fewer ram_tiles are blitted that frame, allowing an automatic catch-up while still delivering max ram_tiles when it can. Not sure what will work; maybe a start and a stop pointer, with the hsync code outputting 0x80 if it runs out. User code would need to update the start before the stop point is reached.
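The start/stop-pointer idea with the 0x80 fallback would look something like this on the consuming side (all names are mine for illustration; 0x80 being the midpoint of unsigned 8-bit, i.e. silence):

```c
#include <stdint.h>

/* Hsync-side consumer for a variable-length sample window set by user
   code. On underrun it outputs 0x80, the unsigned-PCM midpoint, which
   is silence rather than a pop at full rail. */
static const uint8_t *mix_read;   /* next sample to output   */
static const uint8_t *mix_stop;   /* one past the last valid */

static uint8_t next_sample(void)
{
    if (mix_read == mix_stop)     /* ran dry before user code refilled */
        return 0x80;              /* silence */
    return *mix_read++;
}
```

User code resizing the window then amounts to moving mix_stop forward (or pointing both at a fresh ram_tiles region) before the read pointer catches up.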