Five clock per pixel RLE mode

CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia


Post by CunningFellow »

Hi,

I've just finished the core of a new RLE mode.

It runs at 5 clocks per pixel and is 256 pixels wide, with the full 256 colours except for one minor case: if you have a single run of pixels that is only 2 wide, the second pixel can only choose from 128 colours (its LSB must be set).

Any single change of pixel colour uses up 2 bytes of RAM.

It also uses less than 1K of flash.

It runs fine on real hardware, but at the moment neither of the "release" versions of the emulators (Uzem or Cuzebox) supports it, because it uses the USART0 Transmit Complete interrupt to end the scanline.

This is very similar to how I used Timer1_Overflow to end the scanline in Tornado2000 (a trick that has since been used in a few other modes). It sets the USART to 7 data bits, no parity and 1 stop bit, with the baud rate divider set to 8. The USART then acts as a pseudo-timer that ticks over every 1304 clocks.

The reason I had to use the USART as a timer is that TIMER1 was already being used for something else amazing.

Any time there is a run of 10 pixels or more of the same colour, the pixel-render-loop can return from an interrupt and start running user ASM code. This means you can keep filling the RLE buffer as the screen progresses.

You are REALLY "racing the beam"

For every pixel beyond 10 in a run, the AVR gets to execute 5 extra CPU clocks. So the included UZE file, which has 4 colour changes per line with each run approx 64 pixels wide, gets:

(4 x (64-10)) x 5 = 1080 cycles per scanline

1080 x 224 = 241920 cycles per field

So that is about 8x as many free clocks as a standard video mode gives. Of course, that assumes 4 colour changes per line. If, for example, you had 8 colour changes per line, you would only end up with 90K free cycles (3x more than normal).
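The arithmetic above generalises to any mix of run lengths on a line. A minimal C sketch, assuming the (pixels - 10) x 5 rule of thumb from this post; the overhead is a parameter because a later post in this thread revises it to 11 pixels. The function names are hypothetical and not part of the mode's API:

```c
#include <assert.h>
#include <stdint.h>

/* Free CPU clocks one run of `run_px` pixels hands to user code: each pixel
   past the overhead threshold is worth 5 clocks (5 clocks per pixel).
   Runs at or below the threshold never leave the render loop. */
static uint32_t free_clocks_for_run(uint16_t run_px, uint16_t overhead_px)
{
    assert(run_px <= 256);               /* a line is only 256 pixels wide */
    if (run_px <= overhead_px)
        return 0;
    return (uint32_t)(run_px - overhead_px) * 5u;
}

/* Sum the free clocks over every run on one scanline. */
static uint32_t free_clocks_per_line(const uint16_t *runs, int n_runs,
                                     uint16_t overhead_px)
{
    uint32_t total = 0;
    for (int i = 0; i < n_runs; i++)
        total += free_clocks_for_run(runs[i], overhead_px);
    return total;
}
```

With the demo's four 64-pixel runs and an overhead of 10 this gives 1080 clocks per line, and multiplying by 224 lines gives the 241920 per field quoted above.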

For the moment I have called it mode1337, due to how many "wait" clocks it had free on the first line of the video mode.

It probably works well for things like 3D space ships flying around with lots of pixel runs of black for empty space :)

I'm having a few issues again so I'll be back in about 2 weeks.
Attachments
RLEDemo.uze
(10.99 KiB) Downloaded 675 times
Jubatian
Posts: 1563
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Five clock per pixel RLE mode

Post by Jubatian »

Interesting stuff for sure.

What I can't get is how the user code is supposed to operate. 10 pixels make 50 cycles, which is just about enough for an interrupt exit and re-entry with resync, and the user code is constrained to a set of free registers. I guess Timer1 OVF would do this then; at least that way it feels doable.

A better looking demo would be nice.

Possibly it would work well with an SPI RAM source; maybe that could allow the Uzebox to do some flat-polygon 3D rendering.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Five clock per pixel RLE mode

Post by CunningFellow »

Jubatian wrote: Sat Oct 14, 2017 7:46 pm A better looking demo would be nice.
Are you not entertained?
Jubatian wrote: Sat Oct 14, 2017 7:46 pm Possibly it would work well with SPI RAM source, maybe that could allow the Uzebox to do some flat-polygon 3D rendering.
I don't overly like the idea of SPI RAM usage. To me it feels "not pure". The reason I developed this mode is so I could have lots of free clocks to fill up a small SRAM circular buffer with RLE data while it was getting emptied from the other end.

I am pretty sure I can do flat polygon rendering with the 256 bytes I have set aside for the RLE buffer as it is. If 256 bytes is too tight I can expand the RLE buffer to 512 or 1024 bytes, but I want to keep as much RAM free for the polygon stuff as possible. I am doing my "back of napkin" calculations based on an average of 8 pixel changes per line. With a 256-byte buffer, that gives me a 16-line head start on the electron beam. It also gives me 440 CPU clocks per line to keep ahead of the beam. That is 55 CPU clocks per pixel change. Heaps of time, I think.
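The arrangement described above, with user code filling one end of a small SRAM circular buffer while the renderer empties the other, can be modelled on the host. A minimal C sketch, assuming each colour change is stored as a [colour, length] byte pair (the real layout lives in the ASM and isn't shown); the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-byte circular RLE buffer (host-side model, not the mode's
   real data structure). uint8_t indices make the wrap from 255 back to 0
   free. The producer (user code) writes at head; the renderer reads at tail.
   On the AVR, count would be shared with the scanline ISR and need atomic
   updates. */
typedef struct {
    uint8_t  buf[256];
    uint8_t  head;    /* next byte the producer writes */
    uint8_t  tail;    /* next byte the renderer reads  */
    uint16_t count;   /* bytes queued, 0..256          */
} rle_ring;

/* One colour change = 2 bytes; the [colour, length] order is an assumption. */
static int rle_push_change(rle_ring *r, uint8_t colour, uint8_t length)
{
    if (256u - r->count < 2u)
        return 0;                     /* full: producer must wait */
    r->buf[r->head++] = colour;
    r->buf[r->head++] = length;
    r->count += 2;
    return 1;
}

static int rle_pop_change(rle_ring *r, uint8_t *colour, uint8_t *length)
{
    if (r->count < 2u)
        return 0;                     /* empty: renderer has caught up */
    *colour = r->buf[r->tail++];
    *length = r->buf[r->tail++];
    r->count -= 2;
    return 1;
}

/* Lines of head start a full buffer gives at `changes` colour changes per line. */
static unsigned lines_of_head_start(unsigned buf_bytes, unsigned changes)
{
    assert(changes > 0);
    return buf_bytes / (2u * changes);
}
```

lines_of_head_start(256, 8) returns 16, matching the 16-line estimate, and a full buffer holds 128 colour changes.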
Jubatian wrote: Sat Oct 14, 2017 7:46 pm What I can't get is how the user code is supposed to be operating. 10 pixels make 50 cycles, that's just about enough for an interrupt exit and re-entry with resync while the user code is constrained to a set of free registers. Timer1 OVF I guess would do this then, at least this way it feels doable.
Yes - I used TIMER1 to trigger an interrupt after X pixels have elapsed. 10 is the minimum number of pixels that leaves enough clocks to enter and exit the interrupt.

I may have oversimplified the maths, as the 10th pixel might not get a full 5 clocks of free time. I have not rechecked whether I have lost any recently. I know the original numbers were ((Pixels - 11) * 5) + 3 and I got that down to (Pixels - 10) * 5. But it is about right.

The reason any "User code" needs to be in ASM is that some registers C needs are already in use. But really, if you only have 55 clocks per pixel change to do the polygon edges and RLE encoding, you want it in ASM anyway.

The pseudocode of the interrupt part looks like this:

If [free pixels] < 10 then branch back and keep counting pixels
Otherwise subtract 6 from [free pixels] and multiply the result by 5 to get [Clocks to run timer]
Store [Clocks to run timer] in TIMER1
Restore r0:r1
Restore YH:YL
Restore ZH:ZL
Restore SREG
Return from interrupt

ISR_Code:
Save SReg
Save ZH:ZL
Save YH:YL
Save Zero
Read TCNT1
Correct for Jitter
Re-enable interrupts
Continue processing RLE-to-pixel loop
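The decision in the first block of pseudocode boils down to a threshold and a multiply. A minimal C model of it, assuming [free pixels] is the number of pixels left in the current run and that the 6 is subtracted once before multiplying by 5; it does not emulate the AVR timer, the register saves or the jitter correction, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the "stay in the loop or hand off to user code" branch.
   Returns the number of TIMER1 clocks to program, or 0 meaning
   "run too short: branch back and keep counting pixels". */
static uint16_t timer_clocks_for_run(uint16_t free_px)
{
    assert(free_px <= 256);                  /* a line is 256 pixels wide */
    if (free_px < 10)
        return 0;                            /* not enough time to exit and re-enter */
    return (uint16_t)((free_px - 6u) * 5u);  /* 5 clocks per pixel */
}
```

For example, a 64-pixel run would program the timer for (64 - 6) x 5 = 290 clocks.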


I'll post the heart of the ASM code once I have checked the comments a bit more. I think it is right/good, but it can't hurt to have more eyes looking for problems with it.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Five clock per pixel RLE mode

Post by CunningFellow »

Here is the core of the ASM code for rendering a 256-pixel line. There is some setup code that runs before this. The part with lots of time-wasting RJMPs and the "here be dragons" comments is obviously not complete.

For anyone that can't look at it on real hardware, here is what it looks like on the modified Cuzebox.
[Image: RLEDemo.png, the demo running on the modified Cuzebox]
Attachments
gcrt1.zip
(6.25 KiB) Downloaded 627 times
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Five clock per pixel RLE mode

Post by CunningFellow »

And the C code in main() that makes that pattern is:

	RLE_ClrScreen();

	for (i=0; i<7; i++){
		RLE_AddRun(Blue2, 64);
		RLE_AddRun(Red3, (64-i));
		RLE_AddRun(Green3, (64+i));
		RLE_AddRun(White, 64);

		RLE_NewLine();
	}

	vram[VramPointer++] = 0x00;
	vram[VramPointer++] = 0xFE;         // 0xFE is currently a marker for "end of data" until the real ASM code is complete.
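A quick sanity check for a pattern like this is that each line's runs add up to the full 256 pixels: the lengths are 64, 64 - i, 64 + i and 64, so the Red/Green boundary moves one pixel per line while the width stays constant. A small C sketch of that check, with hypothetical helper names that mirror the loop above rather than calling the real RLE_AddRun:

```c
#include <assert.h>

/* Width of demo line i, using the same run lengths as the loop above. */
static int demo_line_width(int i)
{
    assert(i >= 0 && i < 64);            /* keeps every run at least 1 pixel */
    const int runs[4] = { 64, 64 - i, 64 + i, 64 };
    int width = 0;
    for (int k = 0; k < 4; k++)
        width += runs[k];
    return width;
}

/* Returns 1 if all 7 demo lines cover exactly 256 pixels. */
static int demo_lines_all_full_width(void)
{
    for (int i = 0; i < 7; i++)
        if (demo_line_width(i) != 256)
            return 0;
    return 1;
}
```

At 2 bytes per colour change, the 28 runs come to 56 bytes, before counting whatever RLE_NewLine() and the two-byte end marker add.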
D3thAdd3r
Posts: 3222
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Re: Five clock per pixel RLE mode

Post by D3thAdd3r »

Huh, this sounds potentially pretty damn good with all those cycles. I have a hard time imagining exactly what the possibilities would be for real-time images, but with enough care I bet impressive things can be done this way that cannot be done otherwise. Being able to reuse RAM as the screen is drawn seems pretty clever indeed (correct me if I understood this wrong), because I suspect otherwise RAM becomes a serious issue. I'd love to do Missile Command some day; it's just that so far no mode could pull it off well.

I could imagine some demo with a Star Fox Arwing rotating over a scrolling star field or similar. Even a flat-shaded 25-polygon model with moving dots behind it would be steps above what is probably possible with different forms of RAM tile concepts.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Five clock per pixel RLE mode

Post by CunningFellow »

D3thAdd3r wrote: Sun Oct 15, 2017 1:20 am being able to reuse ram as the screen is drawn (correct me if I understood this wrong), because I suspect otherwise ram becomes a serious issue. I'd love to do Missile Command some day, just that so far no mode could pull it off well.

I could imagine some demo with a Star Fox Arwing rotating with a scrolling star field or similar. Even a flat shaded 25 polygon model with moving dots behind it would be steps above what is probably possible with different forms of ram tile concepts.
Yes - the whole point of this mode is to do 3D polygon things like Elite. The sparse black areas of space make for plenty of time to re-fill the RLE buffer.

The RLE RAM only has enough space for 16 or so raster-lines of data when set at 256 bytes. That is assuming 8 equally spaced colour changes per row.

There is going to be another hunk of RAM filled with triangles pre-sorted into Y order. The job of the code that runs during 10+ pixel runs in the renderer is to turn those pre-sorted triangles into RLE data.
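For one scanline, that conversion might look roughly like this: once the pre-sorted triangles have been intersected with the current line, the covered x-ranges become coloured runs and everything else becomes black space. A minimal C sketch, assuming the spans are already computed, clipped, sorted by x and non-overlapping (hidden-surface handling is out of scope); the types and names are hypothetical and this is not the planned ASM:

```c
#include <assert.h>
#include <stdint.h>

#define LINE_W    256
#define BG_COLOUR 0x00   /* assumed: black "space" background */

/* Hypothetical types: a span covers pixels x0..x1-1 on the current line. */
typedef struct { uint16_t x0, x1; uint8_t colour; } span;
typedef struct { uint8_t colour; uint16_t length; } run;

/* Append a run, merging it into the previous one if the colour matches,
   since every extra colour change costs buffer space. Returns 0 if full. */
static int emit_run(run *out, int *n_out, int max_out,
                    uint8_t colour, uint16_t length)
{
    if (*n_out > 0 && out[*n_out - 1].colour == colour) {
        out[*n_out - 1].length += length;
        return 1;
    }
    if (*n_out == max_out)
        return 0;
    out[*n_out].colour = colour;
    out[*n_out].length = length;
    (*n_out)++;
    return 1;
}

/* Turn one scanline's spans into runs covering all LINE_W pixels, filling
   gaps with background. Returns the number of runs, or -1 if `out` is full. */
static int spans_to_runs(const span *s, int n, run *out, int max_out)
{
    int n_out = 0;
    uint16_t x = 0;                          /* first pixel not yet covered */
    for (int i = 0; i <= n; i++) {
        uint16_t start = (i < n) ? s[i].x0 : LINE_W;
        assert(start >= x && start <= LINE_W);
        if (start > x &&                     /* gap before span i or line end */
            !emit_run(out, &n_out, max_out, BG_COLOUR, (uint16_t)(start - x)))
            return -1;
        if (i == n)
            break;
        assert(s[i].x1 > s[i].x0 && s[i].x1 <= LINE_W);
        if (!emit_run(out, &n_out, max_out, s[i].colour,
                      (uint16_t)(s[i].x1 - s[i].x0)))
            return -1;
        x = s[i].x1;
    }
    return n_out;
}
```

Merging adjacent runs of the same colour matters here because every colour change costs 2 bytes of the RLE buffer.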

If everything goes to plan, I should be able to stay ahead of the electron beam. If I can't then the "here be dragons" part is going to abort the next line and show a black row, while the RLE-Filler-Upper gets a bit more of a chance.
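The fallback can be modelled as a simple per-line check: before each scanline, the renderer asks whether the filler code has finished encoding a line, and shows a black row if not. A minimal host-side C sketch, assuming a hypothetical counter of completed lines shared between the two sides (on the AVR the renderer side would run in the scanline ISR); all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LINE_FROM_BUFFER, LINE_BLACK } line_action;

typedef struct {
    volatile uint8_t lines_ready;    /* fully encoded lines waiting */
    uint16_t black_rows_shown;       /* times the beam caught up */
} beam_state;

/* Producer side: call after writing the last run of a line. */
static void line_finished(beam_state *b)
{
    assert(b->lines_ready < 255);    /* would overflow the counter */
    b->lines_ready++;
}

/* Renderer side: decide what the next scanline shows. */
static line_action next_line_action(beam_state *b)
{
    if (b->lines_ready > 0) {
        b->lines_ready--;            /* consume one encoded line */
        return LINE_FROM_BUFFER;
    }
    b->black_rows_shown++;           /* fell behind: blank this row */
    return LINE_BLACK;
}
```

Counting the black rows gives a cheap way to see how often the filler code falls behind during testing.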

There is no reason someone could not replace the 3D polygon RLE-Filler-Upper code with race-car-race-track-filler-upper code or Incoming-Missile-and-Explosion-filler-upper code.
Jubatian
Posts: 1563
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary

Re: Five clock per pixel RLE mode

Post by Jubatian »

CunningFellow wrote: Sat Oct 14, 2017 9:39 pmAre you not entertained?
Eh, I kind of expected a little more visual entertainment :D (Although Mode 74's first demo didn't have any better showoff either :) )

On the subject of running user code, I have been pondering the inline mixer for a while: swapping it for the VSync mixer's code and running user code from the HSync instead, but so far I haven't ventured into that. It is a bit tricky with the "cbi" and "sbi" for the sync, and I would really like to make it C code compatible. It would be quite wild. Maybe with Mode 3's extended resolution it could be viable to retain a 30 pixels wide Mode 3 (at 5.5 clocks per pixel) with such an extension. On a normal tiled mode this would mean you could Y-sort your sprites to draw a lot more than the RAM (RAM tiles) would otherwise permit. The VSync mixer is also a lot more flexible than the HSync one (you could easily rip the current audio engine out and use your own on it; you just need to fill a buffer, after all).

Sure, if you can do such a vertically split region polygon renderer, this could be an amazing flat polygon mode on the "unexpanded" Uzebox!
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Five clock per pixel RLE mode

Post by CunningFellow »

Jubatian wrote: Sun Oct 15, 2017 7:41 am Eh, I kind of expected a little more visual entertainment :D (Although Mode 74's first demo didn't have any better showoff either :) )
Tornado2000's first posted HEX file didn't show off how amazing it could be either:

viewtopic.php?f=5&t=1905&start=30#p11690

It just proved that the idea was working. As long as you have got something working you can always improve on it :)
Jubatian wrote: Sun Oct 15, 2017 7:41 am Sure if you can do such a vertically split region polygon renderer, this could be an amazing flat polygon mode on the "unexpanded" Uzebox!
I've already counted out the clocks on the back of a napkin. I am certain I can do the flat polygons with an average of 8 changes per scanline, with peaks of up to 32 changes per scanline as long as that only lasts for a few lines in a row.

This should be enough for my two target games: Elite and Super Hexagon.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Five clock per pixel RLE mode

Post by CunningFellow »

Still not an exciting demo.

But this one is actually running code in the background.

The first section of the screen (Blue, Red, Green, Purple, White and Black) is being filled by C code before the video mode is called. It is filling the 256 byte buffer up.

The Black section below it is a feature/function I just added, where you can command N totally blank lines and get about 1200 free clocks per line.

The bottom section of the screen (Blue, Green, Red, White) is code running in the background. It is just testing the RLE_Add function.

Not amazing visuals, but the background code is proving more parts of the overall plan. Sadly, I did lose 5 clocks per interrupt, so the free-clock formula is now (r-11)*5, where r is the run length in pixels.
Attachments
RLE_Demo2.uze
(10.77 KiB) Downloaded 685 times