Missing first scanline on the uzem140 branch

The Uzebox now have a fully functional emulator! Download and discuss it here.
User avatar
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Missing first scanline on the uzem140 branch

Post by Artcfox »

When running the uzem140 branch of the emulator, it sometimes misses the first scanline, and the entire frame jumps up a pixel and then jumps back down again. This also causes input captures using Uzem from the master branch to de-synchronize when playing them back on the uzem140 branch and vice-versa. :(

If you change the playback speed to 0.25x, you can see it happen very clearly in this video: https://youtu.be/Na5ePkX9mQY?t=72

This happens often and very reliably on certain levels of the game I'm working on, but it does not happen at all with the uzem from the master branch.

I ran git bisect telling it that the uzem140 branch was bad, and the master branch was good, and it automatically picked a commit in the middle and I tested that using a short input capture that I had saved. If the test passed, I would type git bisect good, if it failed, I would type git bisect bad. Each time it would automatically checkout a commit in the middle and tell me to test that one and report whether it was bad or good. Eventually it stopped and told me exactly which commit had introduced the error:

Code: Select all

9d65b0e0a98336dc718fae01703f77a25863699f is the first bad commit
commit 9d65b0e0a98336dc718fae01703f77a25863699f
Author: Uze <uze@belogic.com>
Date:   Fri Aug 7 01:14:05 2015 -0400

    work on timing
With the help of a friend (it was actually the first thing that he had pointed out when I showed him the code!) and that git bisect, we found the bug in the avr8::update_hardware function.

In the working code, TIFR1 was updated directly, but that commit introduced a tempTIFR1 variable that sometimes gets written into (and sometimes not), and then TIFR gets updated with tempTIFR at the end of that function. The problem lies further down in that function when trigger_interrupt() gets called. Right before the call to trigger_interrupt, TIFR1 is updated directly, however when it returns from trigger_interrupt() it will eventually get to the bottom of the update_hardware function, and overwrite TIFR1 with whatever was stored in tempTIFR1. Additionally, trigger_interrupt calls back into update_hardware, which means that when all is said and done, TIFR1 has who-knows-what stored inside it, and then we'll sometimes miss the first scanline, and input captures won't be recorded correctly.

Setting a flag when trigger_interrupt gets called and then only writing the tempTIFR1 back into TIFR1 at the end of the function if that flag is not set did not solve the problem, so I went ahead and removed that tempTIFR1 variable completely, so it just always writes directly into TIFR1, and I added back in the two conditionals that got dropped:

Code: Select all

if (TIMSK1 & OCIE1B)
and

Code: Select all

if (TIMSK1 & OCIE1A)
to make this part of the new code behave the same way that the old (working) code did.

Lo and behold, that solved both the missing scanline problem (I verified this by playing through my entire WIP), and I was able to use a 25+ minute long input capture file recorded from one version with the other version, and they stayed perfectly in sync! :D

Here is exactly what I did to the code to make the fix:

Code: Select all

diff --git a/tools/uzem/avr8.cpp b/tools/uzem/avr8.cpp
index b1e5f62..4679826 100644
--- a/tools/uzem/avr8.cpp
+++ b/tools/uzem/avr8.cpp
@@ -632,24 +632,17 @@ void avr8::update_hardware(int cycles)
 		TCNT1 = TCNT1L | (TCNT1H<<8);
 		OCR1A = OCR1AL | (OCR1AH<<8);
 		OCR1B = OCR1BL | (OCR1BH<<8);
-		tempTIFR1=TIFR1;
 
 		if(TCCR1B & WGM12){ //timer in CTC mode: count up to OCRnA then resets to zero
 
 			if (TCNT1 > (0xFFFF - cycles)){
 
-				 //TIFR1|=TOV1; //overflow interrupt
-				tempTIFR1|=TOV1; //overflow interrupt
+				 TIFR1|=TOV1; //overflow interrupt
 
 			}
 
 			if (TCNT1 <= OCR1B && (TCNT1 + cycles) > OCR1B){
-				u16 tmp=TCNT1;
-				tmp-=(OCR1B - cycles + 1);
-
-				if(tmp==0){
-					tempTIFR1|=OCF1B; //CTC match flag interrupt
-				}else{
+				if (TIMSK1 & OCIE1B){
 					TIFR1|=OCF1B; //CTC match flag interrupt
 				}
 
@@ -658,11 +651,7 @@ void avr8::update_hardware(int cycles)
 			if (TCNT1 <= OCR1A && (TCNT1 + cycles) > OCR1A){
 				TCNT1 -= (OCR1A - cycles + 1);
 
-				if(TCNT1==0){
-					//if timer rolls over during the last cycle of an instruction, the int is acknowledged on the next machine cycle
-					tempTIFR1|=OCF1A; //CTC match flag interrupt
-				}else{
-					//otherwise the int is acknowledged for multi-cycles instructions
+				if (TIMSK1 & OCIE1A){
 					TIFR1|=OCF1A; //CTC match flag interrupt
 				}
 
@@ -674,8 +663,7 @@ void avr8::update_hardware(int cycles)
 
 			if (TCNT1 > (0xFFFF - cycles)){
 				if (TIMSK1 & TOIE1){
-					 //TIFR1|=TOV1; //overflow interrupt
-					tempTIFR1|=TOV1; //overflow interrupt
+					TIFR1|=TOV1; //overflow interrupt
 				}
 			}
 			TCNT1 += cycles;
@@ -820,7 +808,6 @@ void avr8::update_hardware(int cycles)
 
 
 	TCCR1B=newTCCR1B; //Timer increments starts after executing the instruction that sets TCCR1B
-	TIFR1=tempTIFR1;
 }
 
 u8 avr8::exec()
I'm not sure exactly what impact this will have on the timing, but I think it beats the broken implementation that drops scanlines and breaks compatibility with previously recorded input captures. (Based on that alone, I would guess that this patch improves the timing?)

Thoughts?
User avatar
uze6666
Site Admin
Posts: 4778
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Missing first scanline on the uzem140 branch

Post by uze6666 »

Ha, yeah quite possible I introduced some weird bug. These commits were done so uzem is cycle perfect. The original author decided to not make the emulation 1 cycle at a time but emulate per instruction. Although this makes thing faster a bit it becomes a nightmare when you consider I/O and timer continue to work during instructions and can generate ints etc. With Tornado 2000 and mode 13, cycle perfect emulation was required to avoid jitters. Specifically I/O reading must happen after the first cycle of a 2 cycles instruction etc. So I will have to review you fix to insure it does not break timing. Only one cycle off will not show in uzem but can cause jitter in modes 13.
User avatar
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Missing first scanline on the uzem140 branch

Post by Artcfox »

Okay. I fully expect that you may have to make changes to my patch to make it cycle perfect.

I just can't agree that it was cycle perfect when it often drops an entire scanline. ;)

Having update_hardware call trigger_interrupt, which may recurse back into update_hardware looks like it would be tricky to get correct.
User avatar
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Missing first scanline on the uzem140 branch

Post by Artcfox »

I think I just solved this bug here, while I was in the process of trying to explain why my patch was so heavy-handed. Hooray for rubber duck debugging!
User avatar
Jubatian
Posts: 1560
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary
Contact:

Re: Missing first scanline on the uzem140 branch

Post by Jubatian »

I got an idea over this problem.
Uze6666 wrote:Specifically I/O reading must happen after the first cycle of a 2 cycles instruction etc.
Is there anything else you know of? I can not really imagine anything else.

If only this is the problem, that there are a few instructions whose effect on the hardware is carried out with an offset relative to the start of the instruction, that could be solved without much hassle, with negligible effect on the emulator's performance (I don't know the code well enough yet though to see how easy it would be to implement on the hardware emulation part).

Basically add an another variable, lets name it "cycleseff" which defaults to zero when starting the instruction's decoding. Those instructions whose effect (write) is not at zero would set it to whatever necessary. Then on the hardware side, consider not only that "cycles" cycles were elapsed, but also how much their effect is offset, so the problem would be solved.

What I see is that this would only work for writes since they don't feed back into whatever the instruction did. For reads this solution wouldn't work (if the instruction would sample the input with some offset, and some hardware would change it before the instruction's execution arrived to fetching it).
User avatar
Jubatian
Posts: 1560
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary
Contact:

Re: Missing first scanline on the uzem140 branch

Post by Jubatian »

A bit of gardening helps stirring thoughts.

The emulator's code as it is now feels suggesting an another probably possible way to get things cycle perfect. The update_hardware function is now called right after the instruction decoder, and it even has a cycles input. What if it was called within the instruction decoder at the appropriate places, so the hardware's state progresses proper with the instruction's decoding? The only thing which seems to get in the way is trigger_interrupt, which would hinder this approach as now, mangling the processing instruction. It doesn't seem a huge deal of work to solve this.

What also seems probably getting in the way that as of now I have the impression that some hacks to get nearer to cycle perfection sit in the write_io and read_sram_ld functions, which would mangle things when attempting such a change.

Anyone could point me to some docs or anything describing the sub-instruction timing peculiarities of the ATMega? Or summarize what was attempted on this field? I might try to do a few things about this (will take a longer while, sure), and need some info to check against. Maybe games and apps to test with, to see how the system reacts.
User avatar
Artcfox
Posts: 1382
Joined: Thu Jun 04, 2015 5:35 pm
Contact:

Re: Missing first scanline on the uzem140 branch

Post by Artcfox »

That sounds much better than the idea I had with the abstract syntax tree.

Uze (uze6666) would be the person to ask about the current instruction timings. I believe he used a combination of an official Atmel tool, and simulavr to verify timings.
User avatar
uze6666
Site Admin
Posts: 4778
Joined: Tue Aug 12, 2008 9:13 pm
Location: Montreal, Canada
Contact:

Re: Missing first scanline on the uzem140 branch

Post by uze6666 »

A bit of gardening helps stirring thoughts.

The emulator's code as it is now feels suggesting an another probably possible way to get things cycle perfect. The update_hardware function is now called right after the instruction decoder, and it even has a cycles input. What if it was called within the instruction decoder at the appropriate places, so the hardware's state progresses proper with the instruction's decoding? The only thing which seems to get in the way is trigger_interrupt, which would hinder this approach as now, mangling the processing instruction. It doesn't seem a huge deal of work to solve this.
I would have to look at it, but I won't have enough time in the next few days. If you guys can think of something feel free to try it out.
What also seems probably getting in the way that as of now I have the impression that some hacks to get nearer to cycle perfection sit in the write_io and read_sram_ld functions, which would mangle things when attempting such a change.
Your impression is right, read_sram_ld functions is a ugly hack. I hoped to find something else better but was getting fed up at the time. :?
Anyone could point me to some docs or anything describing the sub-instruction timing peculiarities of the ATMega? Or summarize what was attempted on this field? I might try to do a few things about this (will take a longer while, sure), and need some info to check against. Maybe games and apps to test with, to see how the system reacts.
None that I am aware of. The only way I could find was to trace the code, clock by clock, with the AVR Simulator in Atmel Studio. I had read on AvrFreaks that the emulation code is directly generated using the MCU engineering models, so they are like the real thing.
User avatar
Jubatian
Posts: 1560
Joined: Thu Oct 01, 2015 9:44 pm
Location: Hungary
Contact:

Re: Missing first scanline on the uzem140 branch

Post by Jubatian »

I had a quarter hour to spare, so strolled through the current code.

It seems like the cycle perfection hacks are buggy even now! They actually do what I outlined (that is, call update_hardware within the decoder) for the timer register, without considering that update_hardware could trigger interrupts, potentially causing an interrupt entry while the instruction is still decoding.

What do you think about my instruction decoder hacks? ( https://github.com/Jubatian/uzebox/tree/uzem140-hacks ) If OK, it should be a good time to merge them. I think as soon as I get home I will sit down to code, to get this cycle perfection completed the "right" way, but I would prefer to go on starting from my changes on the decoder. As far as I see the underlaying problem is simple: the 2 cycle instructions accessing I/O read or write in their second cycle and not in their first, the only real problem to solve is to properly code the interrupt triggering mechanism so it still fires an interrupt only when an instruction is completed.
CunningFellow
Posts: 1445
Joined: Mon Feb 11, 2013 8:08 am
Location: Brisbane, Australia

Re: Missing first scanline on the uzem140 branch

Post by CunningFellow »

Is the way to make the interrupts only fire when an instruction is complete the same way hardware does it in a lot of cases?

The thing the wants the interrupt to happen can't make it happen. It can only set a flag to request an interrupt.

Then the only place that can actually jump to the interrupt (when requested) is the instruction decoder, and it can only do that at the 0th tick.
Post Reply