Raycaster Experiment
Re: Raycaster Experiment
Awesome!!! And I didn't expect the music! Damn you Lee, now we totally need Doom on the Uzebox!
I think the resolution will have to be raised a bit if possible; it's a tad too pixelated, and it's hard to see what's going on in the distance when the scene is open. I'll look at your code for sure.
Re: Raycaster Experiment
Oh btw, I just recalled I have this awesome Doom NSF... it really rocks! I've always wanted to convert it; I guess it's time. It could require a new vibrato function in the sound engine, though.
Attachment: OBE_-_Doom_Hangar(e1m1)1.5.zip (5.89 KiB)
Re: Raycaster Experiment
Yes, in my mind 64 seemed like it would be OK, but I was a bit disappointed. My newer version isn't working yet, but I believe my cycles are laid out almost right. It seems I can output a pixel from the scanline buffer every 12 cycles. In between, while calculating pixels to put into the buffer for the next line, it seems I can compute 1 pixel for every 3 I have to output. So with tripled lines I should be able to have a new 120-pixel line ready by the time it's needed... though I'm not ultra crafty with asm, judging by my own code compared to the mode 3 stuff, hehe. It's driving me crazy, so I'm working on sprites now.
Wow, that sounds great! I can't even believe that is NES. I think vibrato is quite important; I just never wanted to make another feature request.
Re: Raycaster Experiment
Great! I think even the small resolution is acceptable if we only use it for the main image and not for the game stats. You already know where I'm heading: pluggable screen sections so modes can be mixed.
Re: Raycaster Experiment
That would be very nice, and a lot of cool things could be done then. It was quite easy to plug in mode 5 right after my video mode, so it seems possible and hopefully clean.
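One way to picture "pluggable screen sections" is a table of scanline ranges, each naming the renderer that owns it, looked up as the frame is drawn. This is only a hypothetical C sketch: `struct section`, `render_fn`, `section_for_line` and the 192/32 line split are illustrative names and numbers, not an existing Uzebox kernel API, and a real implementation would switch modes from the HSync code in asm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical renderer type: draws one scanline of its section,
   where `line` is relative to the start of that section. */
typedef void (*render_fn)(unsigned line);

/* Hypothetical section descriptor. */
struct section {
    unsigned lines;     /* how many scanlines this section occupies */
    render_fn render;   /* video-mode routine that draws them */
};

/* Find which section owns an absolute scanline and the line offset
   within it. Returns the section index, or -1 past the last section. */
int section_for_line(const struct section *s, size_t count,
                     unsigned scanline, unsigned *offset)
{
    unsigned start = 0;
    for (size_t i = 0; i < count; i++) {
        if (scanline < start + s[i].lines) {
            *offset = scanline - start;
            return (int)i;
        }
        start += s[i].lines;
    }
    return -1;
}
```

With a layout of a 192-line raycaster section followed by a 32-line stats section, scanline 200 falls in the second section at offset 8, so the stats renderer would be called with line 8.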
Oh BTW, I correct myself... my idea about a 120 resolution was totally wrong (obvious oversight). I'm not even sure I can figure out how to get 80.
Re: Raycaster Experiment
This looks really awesome!!! WOW!
Life's too short to remove usb safely
Web: www.hwhardsoft.de
http://www.facebook.com/hwhardsoft
YouTube: http://www.youtube.com/user/hwhardsoft
Re: Raycaster Experiment
Oh my, that looks superb.
You guys never cease to amaze me with the new stuff you come up with.
Cheers
Roukan / JIm
Re: Raycaster Experiment
The best possible is 80 horizontal resolution at 18 cycles per pixel, and it's not really much better... With unrolled loops it can be 14 cycles per pixel at 102 resolution (requiring ~7K of flash...). Probably a lame question, but I can't figure out how to do something like this:
Code: Select all
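;.set (not #define): the C preprocessor runs before .rept expands, so a #define can't count iterations
.set j,0
.set i,0 ;set once, as in the #define version, so i keeps counting across both outer passes
.rept 2
.rept 102/2
lds r16,(j*64)+i ;each expanded copy uses the current values of i and j
;do other stuff
.set i,i+1
.endr
.set j,j+1
.endr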
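The resolution figures in this thread all land on roughly the same per-line budget: 60 × 24 and 80 × 18 are both 1440 cycles, and 102 × 14 is 1428. A small hypothetical C helper makes it quick to check a candidate resolution and cycle count before writing any asm. It assumes a fixed 1440-cycle active region, which is inferred from those products rather than taken from the Uzebox kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed active-display budget per scanline, in CPU cycles. Inferred from
   the resolution x cycles-per-pixel products in this thread, not taken
   from the Uzebox kernel sources. */
#define ACTIVE_CYCLES 1440u

/* Cycles needed to output one line at a fixed cycles-per-pixel rate. */
unsigned line_cycles(unsigned pixels, unsigned cycles_per_pixel)
{
    return pixels * cycles_per_pixel;
}

/* True if a line of `pixels` at `cycles_per_pixel` fits the assumed budget. */
bool fits_budget(unsigned pixels, unsigned cycles_per_pixel)
{
    return line_cycles(pixels, cycles_per_pixel) <= ACTIVE_CYCLES;
}

/* Widest line that fits at a given rate; integer division rounds down. */
unsigned max_pixels(unsigned cycles_per_pixel)
{
    assert(cycles_per_pixel > 0);
    return ACTIVE_CYCLES / cycles_per_pixel;
}
```

Under that assumption, max_pixels(18) gives 80, max_pixels(14) gives 102, and max_pixels(11) gives 130, in line with the ~130-pixel figure for the 11-cycle mode later in the thread.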
Re: Raycaster Experiment
I thought about this last night and remember feeling like it was too complicated and limited to go any further. I sat down with the code late last night, and by 3:00 am it turned out that the line buffer shuffling idea works better than I thought. A horizontal resolution of 130 is possible, which is more than double that of the last video mode (it essentially works the same way). So with the higher resolution and the sprite support that was already there, I am pretty psyched now.
It basically works like this: each scanline is triplicated, or drawn 3 times, just like the last version. The first time through the scanline we read precalculated pixel values from the line buffer, while simultaneously performing calculations to replenish the buffer for the next set of triplicated scanlines. We calculate 1 new pixel for every 2 that we take from the buffer. This all works at 11 cycles per pixel, and by the time we have drawn 2 pixels to the screen, we have already calculated a pixel for the next line and pushed it to the stack. So by the time we have drawn the same scanline twice, all the pixels for the next scanline buffer are on the stack. The first 2 lines are drawn with exactly the same code; the third scanline is much simpler and just draws from the line buffer like the others, while simultaneously popping values from the stack and stuffing them back into the line buffer. I don't have a demo yet, and I have some stuff to finish before I tackle this again, but this code will work. I figure it has reached the point where the work of writing a raycaster in assembly would be worth it.
As is, this only works with inlined code, and that takes a lot of flash space, which is at a premium because the textures take lots of space. Cunning, did you have any example code you could help me out with on this for ending the scanline with an interrupt?
Edit: late-night coding... well, I guess it takes 3 scanlines to fill the buffer back up this way, so the scanlines are quadrupled.
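The refill arithmetic, and why the count went from tripled to quadrupled, can be modeled in plain C. This is a hypothetical model, not the mode's code: it assumes every pass draws the whole buffered line and that one new pixel is computed and pushed for every `out_per_push` pixels drawn, with the cadence running continuously across passes as in fully inlined code. With `out_per_push = 2` (the plan described above) two pushing passes refill the buffer, giving tripled scanlines. With 3, which matches the listing below where each PUSH follows three OUTs, it takes three pushing passes plus the popping pass.

```c
#include <assert.h>

/* Count the drawing passes (repeats of the same scanline) needed before
   every pixel of the next line has been computed and pushed, when one new
   pixel is computed for every `out_per_push` pixels drawn. */
unsigned push_passes(unsigned width, unsigned out_per_push)
{
    assert(width > 0 && out_per_push > 0);
    unsigned drawn = 0, pushed = 0, passes = 0;
    while (pushed < width) {
        for (unsigned x = 0; x < width && pushed < width; x++) {
            drawn++;                       /* OUT one buffered pixel */
            if (drawn % out_per_push == 0)
                pushed++;                  /* compute + PUSH one new pixel */
        }
        passes++;
    }
    return passes;
}

/* Total repeats per scanline: the pushing passes plus one pass that pops
   the stack back into the line buffer. */
unsigned line_repeats(unsigned width, unsigned out_per_push)
{
    return push_passes(width, out_per_push) + 1;
}

/* Pop pass: drain `count` stacked values into the buffer front to back,
   as a write pointer starting at the buffer's first byte would. The stack
   is LIFO, so buf[0] receives the value that was pushed last. */
void pop_into_buffer(unsigned char *buf, const unsigned char *stack,
                     unsigned count)
{
    for (unsigned i = 0; i < count; i++)
        buf[i] = stack[count - 1 - i];
}
```

For a 130-pixel line the model gives 3 repeats at a 2:1 ratio and 4 repeats at 3:1. The last function is a reminder that whichever pass consumes the buffer next has to account for the reversed order.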
Code: Select all
render_texture_line_PUSH:
;~130 horizontal pixels max. 128x96 @8bpp raycasting mode with sprite capabilities
ldi XL,lo8(vram);vram is laid out in 2 byte pairs. first byte is how high the wall is
ldi XH,hi8(vram);second byte is which vertical slice or column of the texture to use.
;using precalculated offsets for each possible wall height we achieve precalculated
;texture mapping simply based on x-offset and wall height which are automatically
;calculated just to have a raycaster anyways...ie. free texture mapping
;if MSBit of first byte is set then it is a ram_column so the 2 byte pair is a different format
;ram_columns must be blit before we get here.
ldi YL,lo8(line_buffer)
ldi YH,hi8(line_buffer)
clr r4
MAINLOOP_PUSH_0:
ld r18,Y+ ;get precalculated pixel from line buffer
out _SFR_IO_ADDR(DATA_PORT),r18 ;1 update the screen with it
ld ZL,X+ ;2 load LSB to offset column
ld ZH,X+ ;2 load MSB
sbrc ZH,7 ;1 if MSBit is set, it's a ram column
rjmp RAMLOOP_PUSH_0
;Z points to start of offset table column
ROMLOOP_PUSH_0:
add ZL,r20 ;1 add row offset
adc ZH,r4 ;1 add clear reg for carry bit
ld r18,Y+ ;2 get precalculated pixel from line buffer
out _SFR_IO_ADDR(DATA_PORT),r18 ;1 draw a precalculated pixel from earlier
lpm r16,Z ;3 get offset byte
ld ZL,X+ ;2 get address of texture column
ld ZH,X+ ;2
add ZL,r16 ;1 add offset table(which was adjusted for row)
ld r18,Y+ ;2 get another precalculated value
out _SFR_IO_ADDR(DATA_PORT),r18 ;1 update screen pixel
adc ZH,r4 ;1 add carry bit
lpm r16,Z ;3 get texture pixel based on scanline and column height
push r16 ;2 store calculated pixel onto stack for later
rjmp MAINLOOP_PUSH_1 ;2 jump into next section of inlined code
RAMLOOP_PUSH_0:
mov r16,ZL ;1 save ram column index
ld r18,Y+ ;load precalculated pixel from line buffer
out _SFR_IO_ADDR(DATA_PORT),r18
ldi ZL,lo8(ram_columns) ;1 get pointer to ram_columns
ldi ZH,hi8(ram_columns) ;1
ldi r17,96 ;1 multiply by vertical resolution
mul r16,r17 ;2 for which ram_column index we are on
nop ;1
add ZL,r20 ;1 add row offset based on scanline
adc ZH,r4 ;1
ld r18,Y+ ;2 load precalculated pixel from line buffer
out _SFR_IO_ADDR(DATA_PORT),r18 ;1 maintain our 11 cycle pixel rate
add ZL,r0 ;1 add offset into ram_column
adc ZH,r1 ;1
ld r16,Z+ ;2 now load the pixel from ram_columns
push r16 ;2 store calculated pixel to stack
rjmp MAINLOOP_PUSH_1 ;2 jump to next inlined section
MAINLOOP_PUSH_1:
ld r18,Y+;get precalculated pixel from line buffer
out _SFR_IO_ADDR(DATA_PORT),r18;
;....and so on, it's inlined for every pixel requiring a bit of flash space....
;the resolution is pretty nice though :)
;......
;...
;..
MAINLOOP_PUSH_129:
;...
;..
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
render_texture_line_POP:
;3rd scanline in the series. Here we turn all our calculated values that are on the stack
;into usable data in the line_buffer for the next group of 3 scanlines.
ldi YL,lo8(line_buffer);setup our read pointer
ldi YH,hi8(line_buffer)
ldi ZL,lo8(line_buffer);setup our write pointer
ldi ZH,hi8(line_buffer)
clr r4
ldi r19,VRAM_COLUMNS/2 ;loop counter: 2 pixels drawn and 2 values popped per pass
ld r18,Y+ ;get first precalculated pixel
;note: the stack is LIFO, so values pop off in the reverse order they were pushed
MAINLOOP_POP:
out _SFR_IO_ADDR(DATA_PORT),r18 ;1 update screen
ld r17,Y+ ;2 get precalculated pixel from line buffer
ld r18,Y+ ;2 get another one for the second half of the loop
pop r16 ;2 get calculated pixel from stack
st Z+,r16 ;2 store it via the write pointer, which trails the read pointer
rjmp . ;2 delay to keep 11 cycles between OUTs
out _SFR_IO_ADDR(DATA_PORT),r17 ;1 update screen
pop r16 ;2 get another one
st Z+,r16 ;2 store it
nop ;1 delay
dec r19 ;1
rjmp . ;2 delay
brne MAINLOOP_POP ;2 when taken
;we maintain 11 cycles per pixel throughout entire 3 scanline process.
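Keeping every OUT exactly 11 cycles apart is the part that is easiest to get wrong by hand. A throwaway C tally can check each straight-line segment between OUTs. It is a hypothetical helper: the per-instruction cycle counts are the ones written in the listing's comments, and for SBRC and BRNE you enter the count for the path you are checking (skip or no skip, taken or not).

```c
#include <assert.h>
#include <stddef.h>

/* One instruction in a timed segment: its cycle count on the path being
   checked, and whether it is an OUT to the video port. */
struct insn {
    unsigned cycles;
    int is_out;
};

/* Cycles from the start of the first OUT to the start of the next OUT,
   or 0 if the segment contains fewer than two OUTs. */
unsigned out_spacing(const struct insn *seg, size_t n)
{
    size_t first = n;          /* index of the first OUT, n = not found yet */
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        if (seg[i].is_out) {
            if (first == n)
                first = i;     /* start counting at the first OUT */
            else
                return total;  /* reached the next OUT */
        }
        if (first != n)
            total += seg[i].cycles;
    }
    return 0;
}
```

For example, the ROM path from the first OUT to the second is out 1 + ld 2 + ld 2 + sbrc 2 (skipping) + add 1 + adc 1 + ld 2 = 11 cycles, which the tally confirms.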
- Posts: 1445
- Joined: Mon Feb 11, 2013 8:08 am
- Location: Brisbane, Australia
Re: Raycaster Experiment
D3thAdd3r wrote: Cunning, did you have any example code you could help me out with on this for ending the scanline with an interrupt?
It's a little bit of dicking around. Once you figure out how it works, it makes things easier though, as you no longer have to count pixels.
You have to enable the Timer1 Overflow Interrupt.
Change the TCNT1 value to something that will cause the interrupt to happen just as the last pixel is put out.
SEI
Go output all your pixels without caring how many you have done.
Pop 2 values off the stack (the return address the interrupt call pushed, which RETI would otherwise use).
Repeat till all scanlines are done.
Disable the Timer1 overflow interrupt (TOIE1) and restore TCNT1 and OCIE to what will make the normal HSync things work.
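The "change the TCNT1 value" step is arithmetic on a 16-bit up-counter: TOV1 is set when the counter wraps from 0xFFFF to 0, so writing 0x10000 minus the desired tick count makes the overflow land that many ticks later. A sketch in C, assuming a prescaler of 1 (one tick per CPU cycle) and that the counter runs up through 0xFFFF; `tcnt1_preload` is a hypothetical helper, and the interrupt-entry latency is a parameter you would measure rather than a datasheet figure, since it also depends on which instruction is executing when the overflow hits.

```c
#include <assert.h>
#include <stdint.h>

/* Value to write to TCNT1 so Timer1 overflows `cycles` ticks later, minus
   an assumed interrupt-entry `latency` so the handler starts close to the
   last pixel. Assumes prescaler 1 and a counter that wraps 0xFFFF -> 0. */
uint16_t tcnt1_preload(uint32_t cycles, uint32_t latency)
{
    assert(cycles > latency);
    uint32_t ticks = cycles - latency;  /* ticks until the wrap */
    assert(ticks <= 0x10000u);          /* a 16-bit counter can't wait longer */
    return (uint16_t)(0x10000u - ticks);
}
```

For example, with a 1440-cycle line and no latency correction the preload is 65536 - 1440 = 64096.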
If you are having problems with too many NOPs because of branch equalization, that sounds like you have a job for IJMP. IJMP can process multiple decisions in 2 clocks PLUS the overhead of getting your decision into Z. It is also the "LOOP" rjmp at the same time. You are basically trading "loop unrolling" for "decision unrolling". In T2K I had 1 of 512 decisions to make, each being 16 words, so my complete "decision unroll" was 16 kilobytes, BUT it did allow me to do a multi-colour RAMTile screen with a background BMP from SD card in 5 clocks per pixel.
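IJMP jumps to the address in Z, so with fixed-size code blocks the target is just base + decision × block size, and the dispatch cost is the same whether there are 2 decisions or 512. The closest portable C analogue is a table of function pointers indexed by the decision value. This hypothetical sketch shows that dispatch shape plus the flash arithmetic from the T2K example (512 decisions × 16 words × 2 bytes per word = 16384 bytes); the handler names and table contents are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* IJMP table arithmetic for fixed-size code blocks:
   target word address = base + decision * block_words. */
uint32_t ijmp_target(uint32_t base_word, uint32_t decision, uint32_t block_words)
{
    return base_word + decision * block_words;
}

/* Flash used by a full "decision unroll"; AVR instruction words are 2 bytes. */
uint32_t unroll_bytes(uint32_t decisions, uint32_t block_words)
{
    return decisions * block_words * 2u;
}

/* C analogue of the dispatch: one specialised handler per decision,
   selected by an indexed indirect call instead of a chain of branches. */
typedef int (*handler)(int pixel);

static int keep(int p)   { return p; }
static int invert(int p) { return 255 - p; }
static int black(int p)  { (void)p; return 0; }
static int white(int p)  { (void)p; return 255; }

static const handler table[4] = { keep, invert, black, white };

int dispatch(unsigned decision, int pixel)
{
    assert(decision < 4);
    return table[decision](pixel);
}
```

Each handler does only the straight-line work for its case, which is the point of decision unrolling: no branches to equalize, at the cost of one block of code per decision.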
I don't really understand what you're trying to do yet. I will have to look at the code a bit more, but I am more than willing to help you speed it up.
The first things I would be looking at trying to eliminate would be memory moves and branches.
60 pixels across the screen is 24 clocks per pixel. Are you trying to sneak some bitcoin mining in there?