Raycaster Experiment

uze6666
Site Admin
Posts: 4801
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada

Re: Raycaster Experiment

Post by uze6666 »

Awesome!!! And I didn't expect the music! Damn you Lee, now we totally need Doom on the Uzebox! :D

I think the resolution will have to be raised a bit if possible; it's a tad too pixelated, and it's hard to see what's going on in the distance when the scene is open. I'll look at your code for sure.

Post by uze6666 »

Oh btw, I just recalled that I have this awesome Doom NSF... it really rocks! I've always wanted to convert it; I guess it's time. It could require a new vibrato function in the sound engine, though.
Attachments
OBE_-_Doom_Hangar(e1m1)1.5.zip
(5.89 KiB) Downloaded 363 times
D3thAdd3r
Posts: 3222
Joined: Wed Apr 29, 2009 10:00 am
Location: Minneapolis, United States

Post by D3thAdd3r »

Yes, in my mind 64 seemed like it would be OK, but I was a bit disappointed. My newer version isn't working yet, but I believe my cycles are laid out almost right. It seems I can output a pixel every 12 cycles from the scanline buffer. In between, while calculating pixels to put into the buffer for the next line, it seems I can do 1 new pixel for every 3 I have to output. So with tripled lines, I should be able to have a new 120-pixel line ready by the time it's needed... I'm not ultra crafty with asm, comparing my own to the mode 3 stuff, hehe. It's driving me crazy, so I'm working on sprites now.

Wow, that sounds great. I can't even believe that is NES. I think vibrato is quite important; I just never wanted to make another feature request :lol:
Janka
Posts: 214
Joined: Fri Sep 21, 2012 10:46 pm
Location: inside Out

Post by Janka »

Great! I think even the small resolution is acceptable if we only use it for the main image and not for the game stats. You already know where I'm heading ;) pluggable screen sections so modes can be mixed.

Post by D3thAdd3r »

That would be very nice, and a lot of cool things could be done then. It was quite easy to plug in mode 5 right after my video mode, so it seems possible and hopefully clean.

Oh BTW, I correct myself: my idea on 120 resolution was totally wrong (obvious oversight). I'm not even sure I can figure out how to get 80.
Harty123
Posts: 467
Joined: Wed Jan 12, 2011 9:30 pm
Location: PM, Germany

Post by Harty123 »

This looks really awesome!!! WOW!
Roukan
Posts: 113
Joined: Sat Oct 27, 2012 7:50 pm
Location: Lancashire / England

Post by Roukan »

Oh my that looks superb.

You guys never cease to amaze me with the new stuff you come up with.


Cheers

Roukan / Jim

Post by D3thAdd3r »

The best possible is 80 horizontal resolution at 18 cycles, and it's not really much better... With unrolled loops it can be 14 cycles @ 102 resolution (requiring ~7k flash...). Probably a lame question, but I can't figure out how to do something like this:

Code:

#define i 0
#define j 0
.rept 2
.rept 102/2

lds r16,(j*64)+i
;do other stuff

#define i i+1
.endr
#define j j+1
.endr
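
A note on why the snippet above cannot count as written: `#define` is handled by the C preprocessor in one pass before the assembler expands `.rept`, so `i` and `j` are never re-evaluated per repetition. GNU as does have `.set`, whose symbols can be reassigned inside `.rept`, but another common workaround is to generate the unrolled block with a small script and include its output. The Python below is only a sketch under assumptions: the function name is made up, and it assumes the intent is one `lds` per pixel at address `(j*64)+i`, with `i` restarting at 0 for each `j`.

```python
def unrolled_loads(outer=2, inner=102 // 2, stride=64):
    """Return assembly text with one lds per (j, i) pair, address precomputed."""
    lines = []
    for j in range(outer):
        for i in range(inner):
            address = j * stride + i  # same expression as (j*64)+i in the post
            lines.append(f"\tlds r16,{address}")
            lines.append("\t;do other stuff")
    return "\n".join(lines) + "\n"
```

Writing the result of `unrolled_loads()` to a file and pulling it in with `.include` would keep the hand-written source short while still giving the assembler fully unrolled code.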

Post by D3thAdd3r »

I thought about this last night and remember feeling like it was too complicated and limited to go any further. I sat down with the code late last night, and by 3:00am it turns out that the line buffer shuffling idea works better than I thought. 130 horizontal resolution is possible, which is more than double the resolution of the last video mode (it essentially works the same way). So with the higher resolution and the sprite support that was already there I am pretty psyched now.

It basically works like this: Each scanline is triplicated or drawn 3 times just like the last version. The first time through the scanline we read precalculated pixel values from the line buffer, while simultaneously performing calculations to replenish the buffer for the next set of triplicated scanlines. We calculate 1 new pixel for every 2 that we take from the buffer. This all works at 11 cycles per pixel, and by the time we have drawn 2 pixels to the screen, we already have calculated and pushed a pixel for the next line to the stack. So by the time we have drawn 2 of the same scanline, we have all the pixels for the next scanline buffer on the stack.

The first 2 lines are drawn with the exact same code, the third scanline is much simpler and just draws from the line buffer like the others, while simultaneously popping values from the stack and stuffing them back in the line buffer.

I don't have a demo yet, and I have some stuff to finish before I tackle this again, but this code will work. I figure it is to a point where the work of a raycaster in assembly would be worth it.

Code:

render_texture_line_PUSH:			
;~130 horizontal pixels max. 128x96 @8bpp raycasting mode with sprite capabilities

	ldi XL,lo8(vram);vram is laid out in 2 byte pairs. first byte is how high the wall is
	ldi XH,hi8(vram);second byte is which vertical slice or column of the texture to use.

	;using precalculated offsets for each possible wall height we achieve precalculated
	;texture mapping simply based on x-offset and wall height which are automatically
	;calculated just to have a raycaster anyways...ie. free texture mapping
	;if MSBit of first byte is set then it is a ram_column so the 2 byte pair is a different format
	;ram_columns must be blit before we get here.

	ldi YL,lo8(line_buffer)
	ldi YH,hi8(line_buffer)
	clr r4

MAINLOOP_PUSH_0:  
	ld r18,Y+					;get precalculated pixel from line buffer
	out _SFR_IO_ADDR(DATA_PORT),r18	;1 update the screen with it

	ld ZL,X+					;2 load LSB to offset column
	ld ZH,X+					;2 load MSB
	
	sbrc ZH,7					;1 if MSBit is set, it's a ram column
	rjmp RAMLOOP_PUSH_0		
	;Z points to start of offset table column
ROMLOOP_PUSH_0:
	add ZL,r20				  	;1 add row offset
	adc ZH,r4					;1 add clear reg for carry bit
	
	ld r18,Y+					;2 get precalculated pixel from line buffer
	out _SFR_IO_ADDR(DATA_PORT),r18	;1 draw a precalculated pixel from earlier

	lpm r16,Z					;3 get offset byte

	ld ZL,X+					;2 get address of texture column
	ld ZH,X+					;2
	add ZL,r16					;1 add offset table(which was adjusted for row)

	ld r18,Y+					;2 get another precalculated value
	out _SFR_IO_ADDR(DATA_PORT),r18	;1 update screen pixel 

	adc ZH,r4					;1 add carry bit
	lpm r16,Z					;3 get texture pixel based on scanline and column height
	push r16	  				;2 store calculated pixel onto stack for later

	rjmp MAINLOOP_PUSH_1			;2 jump into next section of inlined code
									
RAMLOOP_PUSH_0:
	mov r16,ZL					;1 save ram column index

	ld r18,Y+					;load precalculated pixel from line buffer
	out _SFR_IO_ADDR(DATA_PORT),r18

	ldi ZL,lo8(ram_columns)		;1 get pointer to ram_columns
	ldi ZH,hi8(ram_columns)		;1

	ldi r17,96					;1 multiply by vertical resolution
	mul r16,r17				;2 for which ram_column index we are on
	nop						;1
	add ZL,r20					;1 add row offset based on scanline
	adc ZH,r4					;1

	ld r18,Y+					;2 load precalculated pixel from line buffer
	out _SFR_IO_ADDR(DATA_PORT),r18	;1 maintain our 11 cycle pixel rate

	add ZL,r0					;1 add offset into ram_column
	adc ZH,r1					;1
	ld  r16,Z+					;2 now load the pixel from ram_columns
	
	push r16					;2 store calculated pixel to stack
	rjmp MAINLOOP_PUSH_1			;2 jump to next inlined section


MAINLOOP_PUSH_1:  

	ld r18,Y+;get precalculated pixel from line buffer
	out _SFR_IO_ADDR(DATA_PORT),r18;

	;....and so on, it's inlined for every pixel requiring a bit of flash space....
	;the resolution is pretty nice though :)
	;......
	;...
	;..
MAINLOOP_PUSH_129:
	;...
	;..

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


render_texture_line_POP:
	;3rd scanline in the series. Here we turn all our calculated values that are on the stack
	;into usable data in the line_buffer for the next group of 3 scanlines.

	ldi YL,lo8(line_buffer);setup our read pointer
	ldi YH,hi8(line_buffer)
	ldi ZL,lo8(line_buffer);setup our write pointer
	ldi ZH,hi8(line_buffer)

	clr r4
	ldi r19,VRAM_COLUMNS/2
	ld r18,Y+					;get first precalculated pixel

MAINLOOP_POP:
	out _SFR_IO_ADDR(DATA_PORT),r18	;update screen
	ld r17,Y+					;get precalculated pixel from line buffer
	ld r18,Y+					;get another one so we don't have to shuffle pointers twice
	pop r16					;get precalculated pixel from stack
	movw XL,YL					;save offset in line buffer we are drawing from
	movw YL,ZL					;get offset in line buffer we are drawing to
	st Y+,r16					;store calculated value from stack to buffer
	out _SFR_IO_ADDR(DATA_PORT),r17	;update screen
	pop r16					;get another one
	st Y+,r16					;store it, we are ahead of the game now
	movw ZL,YL					;1 keep the advanced write pointer for the next pass
	movw YL,XL					;restore pointer to where we are drawing from in line buffer
	dec r19
	nop						;1 together with the movw above, replaces the 2 cycle "rjmp ."
	brne MAINLOOP_POP

	;we maintain 11 cycles per pixel throughout entire 3 scanline process.

This only works with inlined code as is, and that takes a lot of flash space, which is at a premium because the textures take lots of space. Cunning, did you have any example code you could help me out with on this for ending the scanline with an interrupt?

Edit: late-night coding... well, I guess it takes 3 scanlines to fill the buffer back up this way, so the scanlines are quadrupled.
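
To see why the repeat count depends on the push rate, here is a small Python model of the buffer shuffling described above. It is a sketch under assumptions, not the video mode itself: the function name is made up, the example width is chosen to divide evenly, and the model computes the next line right to left so that the pops return it left to right (the posted assembly does not say which order it relies on).

```python
def draw_group(line_buffer, next_line, outputs_per_push):
    """Model one group of repeated scanlines; returns (rows, refilled_buffer)."""
    width = len(line_buffer)
    if len(next_line) != width or not 1 <= outputs_per_push <= width:
        raise ValueError("lines must match in width and the ratio must fit")
    # Assumption: pixels are computed right to left, so popping the stack
    # later hands them back left to right.
    to_compute = list(reversed(next_line))
    stack = []
    rows = []

    # PUSH scanlines: draw from the buffer and push one freshly computed
    # pixel for every outputs_per_push pixels drawn.
    while to_compute:
        row = []
        for x in range(width):
            row.append(line_buffer[x])
            if (x + 1) % outputs_per_push == 0 and to_compute:
                stack.append(to_compute.pop(0))
        rows.append(row)

    # POP scanline: keep drawing the old line, refilling each buffer slot
    # only after it has been read.
    buffer = list(line_buffer)
    row = []
    for x in range(width):
        row.append(buffer[x])
        buffer[x] = stack.pop()
    rows.append(row)
    return rows, buffer


old = list(range(12))
new = list(range(100, 112))
for ratio in (2, 3):
    rows, refilled = draw_group(old, new, ratio)
    print(ratio, len(rows), refilled == new)
```

In this model, pushing 1 pixel per 2 drawn gives a group of 3 scanlines, while 1 per 3 (the posted loop draws three pixels per `push`) gives a group of 4, which matches the edit above.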
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Post by CunningFellow »

D3thAdd3r wrote: Cunning, did you have any example code you could help me out with on this for ending the scanline with an interrupt?
It's a little bit of dicking around. Once you figure out how it works it makes things easier though, as you no longer have to count pixels.

You have to enable the Timer1 overflow interrupt.
Change the TCNT1 value to something that will cause the interrupt to happen just as the last pixel is put out.
SEI
Go output all your pixels without caring how many you have done.
Pop 2 values off the stack (the return address for RETI that the interrupt call places there).
Repeat till all scanlines are done.
Disable TOV1 and restore TCNT and OCIE to what will make the normal HSync things work.
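
The TCNT1 value in the second step is plain arithmetic on a 16-bit counter that overflows when it wraps past 0xFFFF. A hedged Python sketch of that calculation follows; the function name, the `latency` allowance, and the example cycle count are assumptions for illustration, not values taken from the Uzebox kernel, and it assumes one timer tick per CPU cycle.

```python
TIMER1_WRAP = 0x10000  # a 16-bit timer overflows when it wraps past 0xFFFF


def tcnt1_preload(cycles_until_line_end, latency=0):
    """Value to load into TCNT1 so the overflow fires after the given cycles.

    latency is a hypothetical allowance for interrupt response time,
    subtracted so the handler starts on time rather than the flag.
    """
    ticks = cycles_until_line_end - latency
    if not 0 < ticks <= TIMER1_WRAP:
        raise ValueError("cycle count does not fit a 16-bit timer")
    return (TIMER1_WRAP - ticks) & 0xFFFF


# Example: 130 pixels at 11 cycles each is 1430 cycles of pixel output.
print(hex(tcnt1_preload(130 * 11)))
```

The exact count would still need tuning on hardware, since the cycles spent between writing TCNT1 and emitting the first pixel also have to be included.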

If you are having problems with too many NOPs because of branch equalization, that sounds like you have a job for IJMP. IJMP can process multiple decisions in 2 clocks PLUS the overhead of getting your decisions into Z. It is also the "LOOP" rjmp at the same time. You are basically trading "loop unrolling" for "decision unrolling". In T2K I had 1 of 512 decisions to make, with each being 16 words. So my complete "decision unroll" was 16 kilobytes, BUT it did allow me to do a multi colour RAMTile screen with a background BMP from SD card in 5 clocks per pixel.
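
The 16 kilobyte figure follows directly from the entry count and entry size, since each AVR flash word is 2 bytes. A small Python sketch of that arithmetic, plus the word address that would go into Z for a given decision; the function name and the base address parameter are hypothetical.

```python
def decision_table(decisions, words_per_entry, base_word_address=0):
    """Return (flash size in bytes, function mapping a decision to a Z value)."""
    size_bytes = decisions * words_per_entry * 2  # 2 bytes per flash word

    def z_for(decision):
        # IJMP jumps to the word address held in Z, so entries sit
        # words_per_entry apart starting at the table base.
        if not 0 <= decision < decisions:
            raise IndexError("decision out of range")
        return base_word_address + decision * words_per_entry

    return size_bytes, z_for


size, z_for = decision_table(512, 16)
print(size, z_for(3))
```

With 16-word entries the Z value is just the decision number shifted left by 4 and added to the base, which is part of why the dispatch overhead stays small.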

I don't really understand what you're trying to do yet. I will have to look at the code a bit more, but I am more than willing to help you speed it up.

The first things I would be looking at trying to eliminate would be memory moves and branches.

60 pixels across the screen is 24 clocks per pixel. Are you trying to sneak some bitcoin mining in there?
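
The resolution figures quoted in this thread all line up if the visible part of a scanline is about 1440 CPU cycles. That number is an assumption here (it is not stated in the thread), and the function name is made up, but the Python sketch below shows the arithmetic:

```python
VISIBLE_CYCLES = 1440  # assumed visible window per scanline, in CPU cycles


def pixels_per_line(cycles_per_pixel, visible_cycles=VISIBLE_CYCLES):
    """Whole pixels that fit in the visible window at a given pixel rate."""
    return visible_cycles // cycles_per_pixel


for cycles in (11, 12, 14, 18, 24):
    print(cycles, "cycles/pixel ->", pixels_per_line(cycles), "pixels")
```

Under that assumption, 11, 12, 14, 18 and 24 cycles per pixel give 130, 120, 102, 80 and 60 pixels, the same pairs mentioned in the posts above.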