Streaming Music(SD, SPI ram, Network, etc. source)

Post by **D3thAdd3r** » Fri Nov 03, 2017 5:48 pm

The mipmap approach is about the only thing that brings the PCM instruments back into consideration. I have been looking into baking in some nearly lossless prediction based on the state of a simulation both sides agree on, which encodes only the error versus the model at each step, as a differential. I think this is ultimately about as high quality as compression can get, but it is not ideal to have to run the simulation on Uzebox, even though it is not too bad. One model is not best suited for all things, so the trade in code space comes into the equation. Maybe a few models, and the conversion program just picks the one that works best and builds that into the format. I have been messing with some 4 bit stuff as well, and I do think it can be very good; it is just that the development time to actually make it usable and tested is eluding me lately (60 hour work weeks).
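As a rough illustration of "encode only the error versus a model both sides run", here is a minimal sketch. The predictor (linear extrapolation from the two previous samples) and the function names are assumptions made for the example, not the model described in the post.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical shared model: extrapolate from the two previous samples.
   Encoder and decoder must run exactly the same function. */
static int predict(int prev1, int prev2) {
    int p = 2 * prev1 - prev2;
    if (p < 0) p = 0;
    if (p > 255) p = 255;
    return p;
}

/* Encoder: store only the difference between the real sample and the model. */
void residual_encode(const uint8_t *pcm, int16_t *err, size_t n) {
    int prev1 = 128, prev2 = 128;          /* agreed starting state */
    for (size_t i = 0; i < n; i++) {
        err[i] = (int16_t)(pcm[i] - predict(prev1, prev2));
        prev2 = prev1;
        prev1 = pcm[i];
    }
}

/* Decoder: run the same model and add the stored error back. */
void residual_decode(const int16_t *err, uint8_t *pcm, size_t n) {
    int prev1 = 128, prev2 = 128;          /* same starting state */
    for (size_t i = 0; i < n; i++) {
        pcm[i] = (uint8_t)(predict(prev1, prev2) + err[i]);
        prev2 = prev1;
        prev1 = pcm[i];
    }
}
```

The sketch is lossless and saves nothing by itself; the gain would come from storing the (usually small) errors in fewer bits, which is where the choice of model matters.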

Post by **nicksen782** » Sat Nov 04, 2017 2:40 am

@nicksen782 if you would, for now just replace the version of SongBufBytes() in your uzeboxSoundEngine.c. Also I believe somehow this did not make it into the version I put up, in defines.h

Confirmed. This fixed the problem.

I'm continuing my study.

Post by **D3thAdd3r** » Sat Nov 04, 2017 8:50 pm

Saving cycles is far more important than saving space; saving space is only interesting if it saves cycles by eliminating reads. If doing only sound effects... I think 8 seconds of PCM is rather overkill anyway. ADPCM bases its predictions on previous data in the stream... that doesn't work for instruments/notes, versus just reading more data (which, as the last demo shows, already takes monstrous amounts of cycles if you wanted 3 or 4 channels). However, for a different approach, where a lossy compression would be ideal, I think it wins hands down. A 5:1 compression scheme would let you directly take a high quality remake of a retro song (the type where they use modern synths or real instruments) as a WAV, and fit a 30 second song into about 100K. The quantization noise bothers me, as it seems out of place in a game. It would be much faster than reading 4 different PCM streams for different channels, which so far looks impossible to fit inside most games.
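The "30 seconds in about 100K" figure can be checked with a little arithmetic. The sketch below assumes 8-bit mono PCM at 15734 Hz (one sample per NTSC scanline); that rate is an assumption for the example, not something stated in the post.

```c
#include <stdint.h>

/* Bytes needed for `seconds` of 8-bit mono PCM at `rate_hz`,
   after an assumed compression ratio of `ratio`:1. */
uint32_t compressed_bytes(uint32_t seconds, uint32_t rate_hz, uint32_t ratio) {
    return (seconds * rate_hz) / ratio;
}
```

With those assumptions, `compressed_bytes(30, 15734, 5)` gives 94404 bytes, against 472020 bytes uncompressed, which is in line with the rough 100K estimate.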

I consider ~2:1 compression that can be quickly decompressed, is simple to implement, and is good for skipping, such as the solution Jubatian mentioned, to be the ultimate endpoint for instrument based PCM and probably PCM sound effects. The reason I think the uncompressed method is not ideal even for sound effects is that most games probably need more than 1 simultaneous effect, and that already takes a ton of cycles. I also found this random thing and thought it sounded decent, though I wish the noise could be shaped lower or higher in the spectrum. Then I saw how crazy simple it is, which almost makes me forget about the obvious quantization noise... I actually can't believe it works:

This is simple 1 bit PCM... where all the examples I had noticed before (besides something like Super CD format, which is awesome IMO) would literally cause pain in my ears. I don't think this directly works for us, as it is an NES simulation with an output frequency of 32 kHz. It is so easy to try that of course I will, but I expect dropping to ~16 kHz is going to seriously break the 1 bit concept. At twice the bits, maybe it makes up for half the output rate. What interests me is how dead simple it is; here is the source listing for a program that takes in a raw file and outputs a compressed version:

Code: Select all

/* 2A03 D+PCM simulator - a program that simulates the audio output of the "2A03 D+PCM" (Delta+PCM) audio codec for the Nintendo Entertainment System.
Made by Victor Carneiro (github.com/VitinhoCarneiro)..
This program is licensed under the GNU Affero General Public License v3 (AGPLv3), available here: https://www.gnu.org/licenses/agpl-3.0.html

2A03 D+PCM codec specification:
	-The codec is made for low playback overhead - meaning that it can be easily integrated into CPU-demanding programs and does not require cycle-accurate timing, since it uses the DMC channel as an interrupt.
	-The samples are encoded into 2-byte blocks containing 8 samples.
	-The first byte holds the 7-bit PCM value for the first sample, with 1-bit of padding (currently unused, can be used ad libitum for some sort of signaling).
	-The second byte holds a block of 8 1-bit DPCM samples to be played back by the 2A03 after the first sample.
	
	Playback is done as follows:
		-The first byte is directly written to the DMC delta counter.
		-The second byte is then passed as the sample pointer for the DMC sample playback - that address's value should be the same as the 8-bit pointer address.
		-The DMC is then set to playback a sample of length 1 at speed $F and set to interrupt the CPU at the sample end.
		-The CPU goes back to executing normal code, until the interrupt is fired, when the process is repeated.
	
	In NTSC systems, the interrupt happens at a frequency of ~4142Hz; the actual sample rate is ~33143Hz.
	In PAL systems, the interrupt happens at a frequency of ~4156Hz; the actual sample rate is ~33252Hz.
	If desired, one can lower the sample rate by simply changing the DPCM speed - this will cause the interrupt to be executed at a lower frequency.
	
	The audio bitrate is fixed, at 64kbps for 32KHz playback.
	
	As of the date of creation of this program (October 8, 2016), there does not exist an actual decoder for this format that runs on the NES system.

This encoder has 3 different encoding modes:
	-Normal delta - The delta values are determined by simple DPCM encoding.
	-Differential delta (-dd, --diff-delta) - The delta values are determined by a hybrid algorithm - if the output error is low enough, it uses the derivative of the input values rather than the output error as deltas.
	-No delta - No deltas are written (all AA/55 values) - this results in very low sound quality and should generally not be used, since it basically outputs 7-bit PCM at 1/8th of the input sample rate with an extra wasted byte.
	
	Normal delta yields a higher treble noise floor, but better treble response, and may sound better in louder passages due to auditory masking caused by the high-frequency carrier noise.
	Differential delta yields a lower noise level in quieter passages, but has higher quantization noise at 2KHz.
	
The program accepts an unsigned 8-bit raw audio stream as input, and outputs an unsigned 8-bit raw audio stream that simulates how the input audio would sound like when encoded into the 2A03 D+PCM codec.
Usage:
	2a03-d+pcm [-dd|-nd] [ns]
	
	-h / --help - display usage
	-dd / --diff-delta - chooses the Differential Delta encoding mode. 
		  An optional number (0-255) determines noise supression value.
		(defaults to 1 if not informed)
		  Higher values supress high-frequency noise, but can increase aliasing
		artifacts.
	-nd / --no-delta - disables delta encoding (not recommended)
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>

static char* usage = "2a03-d+pcm - 2A03 D+PCM codec simulator by Victor Carneiro\n(github.com/VitinhoCarneiro) - Version 1.0\nUsage:	2a03-d+pcm [-dd|-nd] [ns]\n\n  -h | --help - display usage\n -dd | --diff-delta - chooses the Differential Delta encoding mode.\n	  An optional number (0-255) determines noise supression value.\n	(defaults to 1 if not informed)\n	  Higher values supress high-frequency noise, but can increase aliasing\n	artifacts.\n -nd | --no-delta - disables delta encoding (not recommended)\n\n";

int main(int argc, char** argv){
	uint8_t input[8];
	int16_t output;
	char delta = 1;
	uint8_t ns = 1;
	signed char zerodelta = 2;
	if(argc > 1 && (strcmp(argv[1],"--help") == 0 || strcmp(argv[1],"-h") == 0)){
		fprintf(stderr, "%s", usage);
		exit(0);
	}
	if(argc > 1 && (strcmp(argv[1],"--no-delta") == 0 || strcmp(argv[1],"-nd") == 0)){
		delta = 0;
	}
	else if(argc > 1 && (strcmp(argv[1],"--diff-delta") == 0 || strcmp(argv[1],"-dd") == 0)){
		delta = 2;
		if(argc > 2){
			if(sscanf(argv[2], "%hhu", &ns) != 1){
				fprintf(stderr, "Invalid value given for \"diff-delta\" - \'%s\' (must be 0-255 - defaulting to \'1\')\n", argv[2]);
			}
		}
	} 
	else if(argc > 1){
		fprintf(stderr, "Invalid argument \"%s\".\n%s", argv[1], usage);
		exit(0);
	}
	/*uint8_t ns_trig;
	if (ns <= 3){
		ns_trig = 2;
	} else {
		ns_trig = (ns >> 1) + 1;
	}*/
	while (read(STDIN_FILENO, input, 8) == 8){
		output = input[0] >> 1;
		putchar((unsigned char)output << 1);
		if (delta == 1){
			for(int i = 1; i < 8; i++){
				if (input[i] > output << 1){
					output += 2;
				} else {
					output -= 2;
				}
				if(output > 127){
					output = 127;
				} else if(output < 0){
					output = 0;
				}
				putchar((unsigned char)output << 1);
			}
		} else if (delta == 2){
			zerodelta = -zerodelta;
			for(int i = 1; i < 8; i++){
				if (input[i] - (output << 1) > ns){
					output += 2;
					zerodelta = -2;
				} else if (input[i] - (output << 1) < -ns){
					output -= 2;
					zerodelta = 2;
				} else if (input[i] - input[i - 1] > 2){
					output += 2;
					zerodelta = -2;
				} else if (input[i] - input[i - 1] < -2){
					output -= 2;
					zerodelta = 2;
				} else {
					output += zerodelta;
					zerodelta = -zerodelta;
				}
				if(output > 127){
					output = 127;
				} else if(output < 0){
					output = 0;
				}
				putchar((unsigned char)output << 1);
			}
		} else {
			for(int i = 0; i < 7; i++){
				output += zerodelta;
				zerodelta = -zerodelta;
				if(output > 127){
					output = 127;
				} else if(output < 0){
					output = 0;
				}
				putchar((unsigned char)output << 1);
			}
		}
	}
	return 0;
}

Anyway, this stuff is pretty interesting and I am nerding out on it.
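The listing above only simulates the encoder output, so for reference here is a minimal sketch of what decoding one 2-byte block might look like. It mirrors the simulator (one PCM level followed by seven +/-2 steps, clamped to 0..127). The function name, the PCM value sitting in the upper 7 bits, and the LSB-first bit order are all assumptions; the format description itself mentions 8 delta bits, of which this sketch uses seven.

```c
#include <stdint.h>

/* Decode one block into eight 7-bit levels.
   pcm:    first byte, level assumed to be in the upper 7 bits
   deltas: second byte, bit = 1 steps up by 2, bit = 0 steps down by 2
           (consumed LSB first, which is an assumption) */
void dpcm_block_decode(uint8_t pcm, uint8_t deltas, uint8_t out[8]) {
    int level = pcm >> 1;
    out[0] = (uint8_t)level;
    for (int i = 1; i < 8; i++) {
        int bit = (deltas >> (i - 1)) & 1;
        level += bit ? 2 : -2;
        if (level > 127) level = 127;      /* same clamp as the simulator */
        if (level < 0)   level = 0;
        out[i] = (uint8_t)level;
    }
}
```

Feeding it a delta byte of 0x55 produces the alternating up/down pattern that the "no delta" mode describes.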

Edit - @Jubatian, does this seem similar to the quantization noise you noticed in your 2 bit normalized chunks approach?

Post by **Jubatian** » Mon Nov 06, 2017 6:11 pm

This is different from mine. The problem with this is that it loses high-frequency components: with 7 bits, the 1 bit delta just can't do anything to replicate stuff above 4 kHz (32 kHz / 8). If you halve the frequency, you would lose a lot of the typical frequencies we recognize.

My stuff on the other hand is a simple bit depth reduction preserving frequency. It adds quantization noise, but doesn't filter out higher frequencies.

I found the best is probably this combination:

Use 8 sample blocks.
One sample is 3 bits
The volume multiplier is 8 bits

So a block takes 8 * 3 + 8 = 32 bits (4 bytes), half the size of uncompressed 8 bit PCM. There is not much noise when using it on actual music. Reducing the sample size further has a drastic effect from this point: 2 bits / sample is quite noisy. Encoding 3 samples in a byte (6 values / sample; 6 * 6 * 6 = 216) sounds a lot better than that.

I think only something like MP3 or Ogg Vorbis would really take things further on this. Otherwise you could always generate procedural samples, especially with the vsync mixer (you can realize AM or FM with the AVR). That of course doesn't need any SPI RAM; it is just a possibility which might solve the problem in some cases.
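A block of this shape could be packed and unpacked roughly as follows. This is only a sketch: the `Block3` layout, the signed -4..+3 code range and the peak-based multiplier are assumptions for illustration, not the actual implementation.

```c
#include <stdint.h>

#define BLOCK_SAMPLES 8

/* One compressed block: 8-bit multiplier plus eight 3-bit codes (24 bits). */
typedef struct {
    uint8_t scale;
    uint8_t codes[3];
} Block3;

static int clampi(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Encode eight unsigned 8-bit samples (centered on 128) into 4 bytes. */
void block3_encode(const uint8_t in[BLOCK_SAMPLES], Block3 *b) {
    int peak = 1;                               /* avoid dividing by zero */
    for (int i = 0; i < BLOCK_SAMPLES; i++) {
        int d = (int)in[i] - 128;
        if (d < 0) d = -d;
        if (d > peak) peak = d;
    }
    b->scale = (uint8_t)peak;                   /* 1..128 */

    uint32_t bits = 0;
    for (int i = 0; i < BLOCK_SAMPLES; i++) {
        int d = (int)in[i] - 128;
        /* Map -peak..+peak onto -4..+3, rounding to nearest. */
        int q = (d * 4 + (d >= 0 ? peak / 2 : -(peak / 2))) / peak;
        q = clampi(q, -4, 3);
        bits |= (uint32_t)(q + 4) << (3 * i);
    }
    b->codes[0] = (uint8_t)(bits & 0xFF);
    b->codes[1] = (uint8_t)((bits >> 8) & 0xFF);
    b->codes[2] = (uint8_t)((bits >> 16) & 0xFF);
}

/* Decode 4 bytes back into eight unsigned 8-bit samples. */
void block3_decode(const Block3 *b, uint8_t out[BLOCK_SAMPLES]) {
    uint32_t bits = (uint32_t)b->codes[0]
                  | ((uint32_t)b->codes[1] << 8)
                  | ((uint32_t)b->codes[2] << 16);
    for (int i = 0; i < BLOCK_SAMPLES; i++) {
        int q = (int)((bits >> (3 * i)) & 7) - 4;
        out[i] = (uint8_t)clampi(128 + (q * b->scale) / 4, 0, 255);
    }
}
```

Because every sample keeps its own code, the sample rate (and so the frequency content) is preserved; only the resolution within each block is reduced. Note that with this particular code range the loudest positive peaks clip to +3, which a real encoder would probably want to handle more carefully.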

Post by **nicksen782** » Tue Nov 07, 2017 7:05 pm

Simple question (I hope.)

How do I detect that the song has stopped playing? My MIDI tracks don't have the loop start and loop end markers. Some songs I would like to play without repeat and other songs I would like to repeat. I need to be able to figure out when they end.

... and I have begun to integrate the streaming music into Bubble Bobble. So, real world tests are coming soon!

Post by **D3thAdd3r** » Tue Nov 07, 2017 8:20 pm

IsSongPlaying() should work; all it does is return the value of playSong in uzeboxSoundEngine.c. You could wait to complete something until it returns 0 (indicating the song is over). I probably never tested it with this, so please let me know if it does not work, and I will fix mconvert to add the end event if it is not present in the MIDI.

I would recommend manually inserting S and E with midiconv, as it often needs several tweaks before it sounds right. It is extra code to do what the song player already does and I can't see any advantage to doing it manually in user code.

Post by **nicksen782** » Tue Nov 07, 2017 8:24 pm

Perhaps there is a better way but here is what I came up with.

I don't actually need a loopstart value since I only need to be able to determine the beginning and end of song in my use case. My method is pretty simple though as you'll see below.

If a song is playing then playSong is true. If a song isn't playing or isn't playing anymore then playSong is false.

You shouldn't just check for playSong == false all the time because there may be times where you want silence. I just want to make sure that when a song ends that IT ENDS. With my code as long as the buffer isn't full yet then bytes will be read into it. Once the buffer is full then no bytes are written to the buffer. However, they are still there and the player hangs on whatever the last note was.

Solution? Memset.

Code: Select all

memset(&songBuf, 0, SONG_BUFFER_SIZE);

In the function I have for changing the song it resets the same things that the normal demo does but it also runs memset. Result? Buffer is actually cleared. Setting songBufIn, and songBufOut to 0 doesn't seem to be enough.

So, here is my change the track code for your review.

Code: Select all

void N782_changeSong(uint8_t songNumber, uint8_t autoStart){
	// Make sure the current song isn't playing.
	StopSong();

	// Clear the songBuf buffer. (If this isn't done then previous songs may hang throughout the next song.)
	memset(&songBuf, 0, SONG_BUFFER_SIZE);

	// Reset pointers and values.
	songBufIn   = 0;
	songBufOut  = 0;
	songOff     = 0;
	songStalls  = 0;

	// Get the base address of the next song from the beginning of the SPIRAM file.
	SpiRamSeqReadEnd();

	// Address the SPI RAM for a sequential read.
	uint8_t  bank = (uint8_t)  ((((uint32_t)( songNumber*4 ))) >> 16);
	uint16_t addr = (uint16_t) ((uint32_t) (( songNumber*4 )));

	// Set the new song base address.
	songBase = SpiRamReadU32(bank, addr);

	if(autoStart){
		// Reactivate the music player. It will play whatever is in the buffer.
		N782_fillStreamingMusicBuffer(1);
		StartSong();
	}
}

And, if I want the song to repeat I can do this. Notice that the song counter is commented out?

Code: Select all

		// Is the song over?
		if( !playSong && SongBufFull() ){
			// If at the last song start over on song 0.
			// if(++songNum >= songCount){	songNum = 0; }

			N782_changeSong(songNum, true);
		}

Any thoughts? At least I can determine if the song has finished now. Looping is easy since I just change the song to the same song which restarts anything anyway.

Post by **D3thAdd3r** » Wed Nov 08, 2017 2:36 am

nicksen782 wrote: ↑Tue Nov 07, 2017 8:24 pm However, they are still there and the player hangs on whatever the last note was.

If you call StopSong(), regardless of how full the buffer is, it should immediately stop processing note events and set some rather sharp envelopes that will quickly decay the volume of any active notes. The buffer could still be full of course, but if you ResumeSong() then that is what you want. If you change songs as the demo does, then you set the head and tail to 0, which effectively empties the buffer without any memset overhead (the old values are still there, but they will be overwritten before they are used again). It should work exactly like that, otherwise there is a bug somewhere.

nicksen782 wrote: ↑Tue Nov 07, 2017 8:24 pm Setting songBufIn, and songBufOut to 0 doesn't seem to be enough.

I really need to understand a bit better, can I get a .hex? The reason I can't understand(and might suspect something else) is this:

Code: Select all

while(!nextDeltaTime){

	if(SongBufBytes() < SONG_BUFFER_MIN){//no notes can play if this evaluates < 2(default)
#if STREAM_MUSIC_DEBUG == 1
		songStalls++;
#endif
		break;
	}

Then:

Code: Select all

u8 SongBufBytes(){//this just got fixed, but perhaps a bug still remains?
	if(songBufIn > songBufOut)
		return (songBufIn-songBufOut);
	else
		return ((sizeof(songBuf)+songBufIn)-songBufOut)%sizeof(songBuf);
}
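The head/tail arithmetic can be exercised on a PC with a small stand-alone ring buffer. The `Ring` type, its function names and the 16-byte size are made up for this sketch; only the count formula follows the function above. It also shows why resetting both indices to 0 is enough to empty the buffer without a memset.

```c
#include <stdint.h>

#define RING_SIZE 16u   /* hypothetical; the real SONG_BUFFER_SIZE may differ */

typedef struct {
    uint8_t  data[RING_SIZE];
    uint16_t in;    /* next write position (head) */
    uint16_t out;   /* next read position (tail)  */
} Ring;

/* Bytes waiting to be consumed; correct when `in` has wrapped past `out`. */
uint16_t ring_bytes(const Ring *r) {
    return (uint16_t)((RING_SIZE + r->in - r->out) % RING_SIZE);
}

/* Returns 1 if the byte was stored, 0 if the buffer is full.
   One slot stays unused so that in == out always means "empty". */
int ring_put(Ring *r, uint8_t b) {
    if (ring_bytes(r) == RING_SIZE - 1) return 0;
    r->data[r->in] = b;
    r->in = (uint16_t)((r->in + 1) % RING_SIZE);
    return 1;
}

/* Returns 1 and writes *b if a byte was available, 0 if empty. */
int ring_get(Ring *r, uint8_t *b) {
    if (ring_bytes(r) == 0) return 0;
    *b = r->data[r->out];
    r->out = (uint16_t)((r->out + 1) % RING_SIZE);
    return 1;
}

/* Emptying the buffer only needs the indices reset; stale bytes left in
   data[] are unreachable and get overwritten by later writes. */
void ring_reset(Ring *r) {
    r->in = 0;
    r->out = 0;
}
```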

I really should get the non-buffered mode added; I was just not sure how to do it cleanly, so I am going to #define it in there and be done with it. The way you code it, and the way the demo does it, is basically filling whatever is missing each frame. For fewer cycles and less RAM, the song player might as well read it directly if there is time, and if not, the effect will be the same (or some byte per frame limiter could be implemented to "get ahead" on game logic and not eat too many cycles right before a worst case frame). Then things like changing songs are exactly the same to the user as stock, and the code is smaller. The buffer started off with the SD in mind so a game could survive crossing sector boundaries, with the idea that some smarter code would not even wait on that, giving essentially seamless SD streaming (this should be possible, just more complex than the example). Anyway, let me think on this, but are you able to reproduce any such thing in the stock SPIRamMusicDemo? I have not seen anything like this.

Jubatian wrote:I think it is only doing something like MP3 or Ogg Vorbis would take things really further on this.

I have been looking at ffmpeg and other things, and as I understand it, .ogg does a first pass with something like ADPCM which, based on the characteristics of that moment, can change the scaling/predictor, hence the variable bit rate. It doesn't stop there (and for Uzebox, unfortunately, I think we would have to stop there); to get the big compression they do something else... it is not Huffman, but something like that, I believe. I find it unlikely we will be able to afford such resources in most games to do all that, and if there were that many cycles lying around in some non-time-critical game, we might as well do it like Uzeamp, which sounds flawless without all the playing around. Meh... not sure what to do here, but I think I will implement something like your idea as I understand it, then see what it sounds like and where it can go for instruments.

Post by **nicksen782** » Wed Nov 08, 2017 2:54 am

Great news!

I've integrated the streaming music into Bubble Bobble. There is music at the title screen, the intro screen right before the game starts, and within the game itself. Oh, and the game over screen. I track a flag for repeating the song or not.

In the buffer fill routine I make sure that the SDCARD and SPIRAM are not already in use (by checking their chip select pins). If the SPIRAM is available, then I read into the buffer until the specified number of vsyncs (1 in my case). This is called from a vsync post callback. I had to adjust some of my other streaming code (the level scrolling code, for example) to not hold onto the SPIRAM's chip select the whole time. It now takes it and releases it for each drawn row. This gives the SPIRAM buffer fill function more opportunities to run.

Result? I can scroll in a level AND play music at what appears to be the same time with no noticeable jitter. I can also move the character around and all the bad guys move on their own with no noticeable slow down.

To play another song you set the song number (each song has a #define title so that makes it easy), set if it will repeat, and then run the function that changes to the next song. The buffer fill function auto repeats the song if the repeat flag is set.

I was wrong about songBufIn, and songBufOut. Also the memset wasn't required either. It turned out that my MIDI files needed some tweaking.

I'm still finalizing the code. I haven't changed the sound engine or anything. I just wrote some custom routines that could generically handle the streaming music.

Post by **D3thAdd3r** » Wed Nov 08, 2017 3:11 am

nicksen782 wrote: ↑Wed Nov 08, 2017 2:54 am I had to adjust some of my other streaming code (the level scrolling code for example) to not hold onto the SPIRAM's chip select the whole time. It takes it and releases it for each drawn row. This gives more opportunities for the SPIRAM buffer fill function to be able to run.

Nice, so you are taking advantage of it for multiple things

nicksen782 wrote: ↑Wed Nov 08, 2017 2:54 am Result? I can scroll in a level AND play music at what appears to be the same time with no noticeable jitter. I can also move the character around and all the bad guys move on their own with no noticeable slow down.

Very cool, can't wait to see the next demo!

BTW I am not sure what the current state of your code is exactly during the integration, but one thing would be quite easy to overlook. If a song reaches the loop end(if you decide to add it in midiconv), there is data after that. That data matches exactly the data that is after loop start, and so it might seem superfluous to do it the way the demo does, but the purpose is so that you never buffer a byte that is unusable even if you have 256 bytes after the loop. Just so that there is no weak point where you are out of buffer or have to throw away buffered data(like you would have to, if data after the loop end was garbage or the start of another song). This had to be, since you could be several frames buffered ahead, by the time the song player ever told you the song was over. You might have understood that, but it is the trickiest part to grasp on this so just want to make sure.
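One way to picture the duplicated data after the loop end is the sketch below. The helper names, the marker byte and the idea of "rebasing" the read offset are assumptions made for illustration; they are not taken from mconvert or the demo.

```c
#include <stdint.h>
#include <stddef.h>

/* Build a stream: the song bytes, an end marker, then a copy of the first
   `tail` bytes that follow the loop start. `dst` must have room for
   len + 1 + tail bytes. Assumes loop_start < len. */
size_t build_looped_stream(const uint8_t *song, size_t len, size_t loop_start,
                           size_t tail, uint8_t end_marker, uint8_t *dst) {
    size_t n = 0;
    size_t loop_len = len - loop_start;
    for (size_t i = 0; i < len; i++) dst[n++] = song[i];
    dst[n++] = end_marker;
    for (size_t i = 0; i < tail; i++) dst[n++] = song[loop_start + (i % loop_len)];
    return n;
}

/* When the player finally reaches the end marker, the filler has already
   buffered `ahead` bytes past it. Those bytes equal the bytes after the loop
   start, so nothing is thrown away: the filler just continues reading from
   loop_start + ahead. */
size_t rebase_after_loop(size_t loop_start, size_t ahead) {
    return loop_start + ahead;
}
```

For example, with a 6-byte song looping from offset 2 and a 3-byte tail, the bytes buffered past the marker are the same three bytes found at offsets 2..4, so the next read resumes at offset 5 and the player never sees a gap.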
