Bootloader API library with SDHC and FAT32

Re: Bootloader API library with SDHC and FAT32

Post by nicksen782 »

Ha! My work is outdated! So, yeah, read my stuff for fun... use the new FS library for real life.

Thanks Jubatian! I like the new FS library.

Post by Jubatian »

nicksen782 wrote: Fri Mar 30, 2018 2:05 pm
> Ha! My work is outdated! So, yeah, read my stuff for fun... use the new FS library for real life.

It was still a good idea! Without it, I would have had to search for something that actually writes to the SD card to see an example. So I took the code you mentioned in PFF, plus some sites on SD interfacing, to assemble it properly.

Post by Artcfox »

So I was planning on using this library to get the start sector based off a filename, and then manually sending a CMD17 in order to be able to read some data from various offsets within that sector without having to store the entire sector in RAM (since I certainly don't have 512 bytes free).
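A minimal sketch of that partial-read idea, in portable C rather than the library's assembly. `spi_read_fn`, `read_sector_range` and the mock card are hypothetical names; the sketch assumes CMD17 was already accepted and the 0xFE data token already received. All 512 data bytes and the 2 CRC bytes still have to be clocked out, but only the requested range is kept, so only `len` bytes of RAM are needed:

```c
#include <stdint.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Hypothetical byte source: in real code this would clock one byte out of
   the SD card over SPI; here it is a function pointer so it can be mocked. */
typedef uint8_t (*spi_read_fn)(void *ctx);

/* Clocks through one full 512-byte data block plus its 2 CRC bytes
   (assumes the 0xFE data token was already received), copying only the
   bytes in [offset, offset + len) into dst. Returns the number copied. */
size_t read_sector_range(spi_read_fn spi_read, void *ctx,
                         uint16_t offset, uint16_t len, uint8_t *dst)
{
    size_t copied = 0;
    for (uint16_t i = 0; i < SECTOR_SIZE; i++) {
        uint8_t b = spi_read(ctx);
        if (i >= offset && (uint32_t)i < (uint32_t)offset + len) {
            dst[copied++] = b;
        }
    }
    /* The data CRC16 must still be clocked out; it is ignored here. */
    (void)spi_read(ctx);
    (void)spi_read(ctx);
    return copied;
}

/* Mock card for testing: returns the low byte of a running counter. */
struct mock_card { uint16_t pos; };

uint8_t mock_read(void *ctx)
{
    struct mock_card *c = ctx;
    return (uint8_t)(c->pos++ & 0xFF);
}
```

If the data CRC is wanted as well, it could be updated byte by byte inside the same loop, still without storing the whole sector.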

But folks here have mentioned using RAM tiles as a 512-byte buffer, which seems a lot easier and would let me keep CRC checking on. However, I'm not 100% certain where in my code I can use the RAM tile memory as a buffer without clobbering anything, especially since I pulled ProcessSprites() and RestoreBackground() out of the kernel's domain and now call them manually.

My main loop looks like this now:

Code:

  for (;;) {
    ProcessSprites();
    WaitVsync(1);
    RestoreBackground();
    ReadControllers();

    Player_input(&p);
    Player_update(&p);

    int16_t targetX = nearestScreenPixel(p.x) - ((SCREEN_TILES_H * TILE_WIDTH) / 2) + TILE_WIDTH;
    int16_t targetY = nearestScreenPixel(p.y) - ((SCREEN_TILES_V * TILE_HEIGHT) / 2) + TILE_HEIGHT;
    Camera_moveTo(&c, targetX, targetY);

    Camera_update(&c);
    Player_render(&p, &c);
  }

The Player_input function reads the state of the controller, but it also calls GetTile() in order to do some advanced input filtering based on the current contents of VRAM.

The Player_update function definitely calls GetTile(), because that's what calculates the physics, and does the collision checking based on the current contents of VRAM.

The Camera_moveTo function just does validation on a camera position and sets internal variables.

Inside the Camera_update function is where I'd want to call SDC_Read_Sector and have access to the 512-byte buffer, so I can zip through it and write a row, a column, or both a row and a column of tiles to VRAM, and maybe iterate through the last half or 3/4 of that buffer to handle other level-related things such as collectables, enemies, and maybe triggers.
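For the row/column decision, a sketch with hypothetical names: comparing the camera's tile coordinate before and after the move tells whether a new row, a new column, or both have scrolled into view. `TILE_W`/`TILE_H` stand in for the kernel's TILE_WIDTH/TILE_HEIGHT (8x8 assumed), and the camera is assumed to move less than one tile per frame; how the level data is laid out in the sector is not modeled:

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-ins for the kernel's TILE_WIDTH / TILE_HEIGHT (assumed 8x8 here). */
#define TILE_W 8
#define TILE_H 8

struct scroll_update {
    bool row;     /* a new tile row scrolled into view */
    bool column;  /* a new tile column scrolled into view */
};

/* Floor division, so negative pixel coordinates map to the correct tile. */
static int16_t tile_of(int16_t px, int16_t tile)
{
    return (int16_t)(px >= 0 ? px / tile : -((-px + tile - 1) / tile));
}

/* Compares camera positions (in pixels) before and after the move. */
struct scroll_update scroll_needs(int16_t old_x, int16_t old_y,
                                  int16_t new_x, int16_t new_y)
{
    struct scroll_update u;
    u.column = tile_of(old_x, TILE_W) != tile_of(new_x, TILE_W);
    u.row    = tile_of(old_y, TILE_H) != tile_of(new_y, TILE_H);
    return u;
}
```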

Inside the Player_render function is where I call:

Code:

  MapSprite2(0, map, flags | SPRITE_BANK0);
  MoveSprite(0, nearestScreenPixel(p->x) - c->x - 8, nearestScreenPixel(p->y) - c->y - 24, 3, 4);

Is calling SDC_Read_Sector inside the Camera_update function possible, or am I just out of luck trying to use SDC_Read_Sector like this?

Also, will SDC_Read_Sector run with the SPI speed set to the fastest allowed? The comments indicate that it may use a slower clock speed, and this may be called 60 times per second (one randomly accessed sector per frame). If it's too slow, I'm thinking of disabling the CRC checks. However, I believe I have worked out a level encoding that only requires a single sector read for 8-way scrolling plus some additional information, so I'm hoping to keep the CRCs on so my level data is reliable. I just want to make sure the SPI speed is as fast as possible, because there is a lot of other stuff my game will need to do every frame.

Post by Jubatian »

The SDC* functions don't set SPI speed, only the FS* functions do so. So if you want, you can do low-level SD access at full speed with the SDC* functions.
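To put "full speed" into rough numbers, a back-of-the-envelope calculation under assumptions not stated in the thread: 1820 CPU cycles per scanline and 262 scanlines per frame (the Uzebox's 28.63636 MHz NTSC timing), and SPI running at fosc/2, i.e. 2 CPU cycles per bit. It counts only raw SPI clocks for a 512-byte block plus its 2 CRC bytes, not card latency or per-byte loop overhead:

```c
#include <stdint.h>

/* Assumptions (not taken from the library): 1820 cycles per NTSC scanline,
   262 scanlines per frame, SPI clock = fosc / 2. Card latency (waiting for
   the 0xFE data token) and per-byte loop overhead are NOT included. */
#define CYCLES_PER_LINE  1820u
#define LINES_PER_FRAME   262u
#define CYCLES_PER_BIT      2u

uint32_t frame_cycles(void)
{
    return (uint32_t)CYCLES_PER_LINE * LINES_PER_FRAME;
}

/* Minimum CPU cycles to clock `bytes` bytes over SPI at fosc/2. */
uint32_t spi_min_cycles(uint32_t bytes)
{
    return bytes * 8u * CYCLES_PER_BIT;
}

/* The same figure as a share of one frame, in tenths of a percent. */
uint32_t spi_permille_of_frame(uint32_t bytes)
{
    return (spi_min_cycles(bytes) * 1000u) / frame_cycles();
}
```

Under these assumptions the raw transfer of 514 bytes is 8224 cycles, about 1.7% of a 476,840-cycle frame, which suggests that CRC calculation and card latency, rather than the SPI clock itself, set the real cost.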

If you want to use the RAM tiles as a buffer, you have to do it after RestoreBackground() but before ProcessSprites(). In this interval the RAM tiles have already been removed from VRAM and the sprite engine has not started yet, so their memory area is completely unused. However, if you fail to finish the job within one VBlank, sprites will disappear from the screen for the next frame (as RestoreBackground() removed them). So what you want to do seems fine for the task.
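That ordering rule can be made explicit with a small guard. This is a sketch with hypothetical names (`ramtiles_free`, `frame_process_sprites`, `frame_restore_background`, `scratch_buffer`) and stub functions in place of the real kernel calls; it assumes the game calls ProcessSprites() and RestoreBackground() itself, as in the main loop posted earlier, so each call can be wrapped:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard, not part of the Uzebox kernel: true only between
   RestoreBackground() and the next ProcessSprites(). */
static bool ramtiles_free = false;

/* Stubs standing in for the real kernel calls. */
static void ProcessSprites_stub(void)    {}
static void RestoreBackground_stub(void) {}

void frame_process_sprites(void)
{
    ramtiles_free = false;   /* the sprite engine is about to use the RAM tiles */
    ProcessSprites_stub();
}

void frame_restore_background(void)
{
    RestoreBackground_stub();
    ramtiles_free = true;    /* RAM tiles swept off VRAM: usable as scratch */
}

/* Returns the scratch area, or NULL while the sprite engine owns it.
   `ramtiles` stands in for the kernel's RAM tile array (>= 512 bytes). */
uint8_t *scratch_buffer(uint8_t *ramtiles)
{
    return ramtiles_free ? ramtiles : NULL;
}
```

With the main loop from the question, Camera_update() runs after RestoreBackground() and before the next ProcessSprites(), so `scratch_buffer()` would return the buffer there. The guard only documents the ordering; it does not enforce the one-VBlank time limit.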

Keep in mind, though, that CRC calculation is slow, since it is done by a heavily size-optimized routine (to fit in the bootloader), so bumping the SPI speed up to max won't help you much. I actually chose the SPI speed to match the performance of the CRC routine: there was no reason to push it faster if it would have to wait for the CRC to finish anyway, and throttling down the SPI speed also makes things more stable, so it was a sensible thing to do. So if you really want to be fast and robust at the same time, you would have to add your own table-based CRC routines.
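A table-driven CRC16 of the kind suggested here, as a portable C sketch (the function names are hypothetical; this is not the library's routine). SD data blocks use CRC-16 with polynomial 0x1021 and initial value 0, processed MSB first; the table trades 512 bytes of storage for one lookup per byte instead of an 8-step bit loop. On the AVR the table would normally live in flash (PROGMEM, read with pgm_read_word()), which this sketch does not show:

```c
#include <stdint.h>
#include <stddef.h>

/* CRC16 as used for SD card data blocks: polynomial x^16 + x^12 + x^5 + 1
   (0x1021), initial value 0, MSB first, no final XOR (CRC-16/XMODEM). */

/* Bit-at-a-time reference version: small but slow. */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* 256-entry lookup table (512 bytes). */
static uint16_t crc16_table[256];

void crc16_init_table(void)
{
    for (uint16_t b = 0; b < 256; b++) {
        uint16_t crc = (uint16_t)(b << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
        crc16_table[b] = crc;
    }
}

/* One table lookup, one shift and two XORs per byte. */
uint16_t crc16_table_update(uint16_t crc, uint8_t byte)
{
    return (uint16_t)((crc << 8) ^ crc16_table[(uint8_t)((crc >> 8) ^ byte)]);
}

uint16_t crc16_table_driven(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc = crc16_table_update(crc, data[i]);
    }
    return crc;
}
```

Because `crc16_table_update()` works one byte at a time, it can be applied to each byte as it comes off SPI, so checking the CRC does not require the whole sector to be in RAM.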

Post by Artcfox »

Jubatian wrote: Sun Apr 01, 2018 9:19 am
> The SDC* functions don't set SPI speed, only the FS* functions do so. So if you want, you can do low-level SD access at full speed with the SDC* functions.
>
> If you want to use the RAM tiles as a buffer, you have to do it after RestoreBackground(), but before ProcessSprites(). In this interval, since the RAM tiles are already swept off from the VRAM and the sprite engine did not start yet, their memory area is completely unused. However if you fail to do the job within one VBlank, sprites will disappear from the screen for the next frame (as RestoreBackground() removed them). So what you want to do seems fine for the task.
>
> Keep in mind though that CRC calculation is slow since it is done by a heavily size optimized routine (to fit in the bootloader), so bumping up SPI speed to max won't help you much (I actually chose SPI speed to match the performance of the CRC routine as there was no reason to try to push it faster if it would have to wait anyway for the CRC to finish: throttling down SPI speed also makes things more stable, so it was a sensible thing to do). So if you really want to be fast & robust at the same time, you would have to add own CRC routines using tables.

Sweet! :D :D :D

So that means I have metric crap tons of scratch RAM available during most of the execution of my game! :shock: I think this realization will change a lot, because it opens up a lot more possibilities for things that need temporary scratch space.

For getting a proof of concept working with no CRC, but using the bootlib functions, it looks like I could just do:

Code:

// Call only once, outside my main loop
FS_Init(&sds);
uint32_t cluster = FS_Find(&sds,
	    ((u16)('B') << 8) |
	    ((u16)('I')     ),
	    ((u16)('G') << 8) |
	    ((u16)('M')     ),
	    ((u16)('A') << 8) |
	    ((u16)('P')     ),
	    ((u16)(' ') << 8) |  // FAT 8.3 names are space-padded to 8 chars (assuming FS_Find matches the raw directory entry)
	    ((u16)(' ')     ),
	    ((u16)('L') << 8) |
	    ((u16)('V')     ),
	    ((u16)('L') << 8) |
	    ((u16)(0)       ));
FS_Select_Cluster(&sds, cluster);
uint32_t startSector = FS_Get_Sector(&sds);
SPI_Set_Max();
SDC_CRC_Enable(&sds, FALSE);

// Call inside Camera_update
SDC_Read_Sector(&sds, SDC_Command_Address(&sds, startSector + calculatedSectorOffset));

Then I can do what I need with the 512-byte buffer, write to VRAM, etc.
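To avoid packing the name by hand, a hypothetical helper (not part of bootlib) can build the six 16-bit words laid out like the FS_Find() arguments above, with the first character in the high byte. It assumes FS_Find expects the raw FAT directory form, i.e. upper case with name and extension padded with spaces; the low byte of the last word is set to 0, as in the call above:

```c
#include <stdint.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical helper: packs an 8.3 name into six words for FS_Find().
   Assumes the raw FAT form: upper case, space-padded name and extension. */
void pack_83_name(const char *name, const char *ext, uint16_t out[6])
{
    char raw[12];
    memset(raw, ' ', 11);   /* 8 name chars + 3 extension chars */
    raw[11] = 0;            /* 12th byte: low byte of the last word */
    for (size_t i = 0; i < 8 && name[i] != '\0'; i++) {
        raw[i] = (char)toupper((unsigned char)name[i]);
    }
    for (size_t i = 0; i < 3 && ext[i] != '\0'; i++) {
        raw[8 + i] = (char)toupper((unsigned char)ext[i]);
    }
    for (int w = 0; w < 6; w++) {
        out[w] = (uint16_t)(((uint16_t)(uint8_t)raw[2 * w] << 8) |
                            (uint8_t)raw[2 * w + 1]);
    }
}
```

Usage would then be along the lines of filling `uint16_t n[6]` with `pack_83_name("bigmap", "lvl", n)` and passing `n[0]` to `n[5]` to FS_Find.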

Post by Jubatian »

Artcfox wrote: Sun Apr 01, 2018 10:01 am
> For getting a proof of concept working with no CRC, but using the bootlib functions it looks like I could just do: (...)

A problem with this is that the bootloader library routines won't stop calculating the CRC, so they will remain as slow as they were. I can't realistically change that: when the filesystem routines are used, it is important that they hand over to the bootloader when it is available, so the bootloader can handle SDHC, FAT32 and fragmentation, and the bootloader always calculates the CRC. It won't do harm, but it means that things may be fast or slow depending on whether you have the bootloader or not, so the performance would be inconsistent.

If you really need speed in certain routines, you have to implement an SDC_Read_Sector equivalent yourself and use it in performance-critical paths. The library has support for this (including mitigating SDSC / SDHC differences), so you can start from what is already in the library.
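As an illustration of the SDSC / SDHC difference (standard SD card behavior, not the library's code): CMD17 takes a byte address on SDSC cards and a 512-byte block number on SDHC/SDXC cards. `sd_command_address` and `is_sdhc` are hypothetical names; in a real driver the flag comes from the CCS bit of the OCR, read with CMD58 during initialization:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for what SDC_Command_Address() is described as
   mitigating: SDSC cards are addressed by byte, SDHC/SDXC by 512-byte block. */
uint32_t sd_command_address(uint32_t sector, bool is_sdhc)
{
    return is_sdhc ? sector : sector << 9;   /* << 9 == * 512 */
}
```

SDSC cards are at most 2 GB, so the shifted byte address still fits in 32 bits.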

If you do this for reading, by the way, you don't even have to turn off CRC. You would then use SDC_Command to send the command (which calculates the CRC), and read the sector at max speed, simply ignoring the returned CRC.
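A sketch of that read step in portable C, with hypothetical names and a mock byte stream in place of real SPI: after CMD17 has been sent (in the library, via SDC_Command), poll for the 0xFE data token with a limit, read the 512 data bytes, and clock out the two CRC bytes without checking them. The 512-byte `buf` would be the RAM tile area discussed earlier in the thread:

```c
#include <stdint.h>
#include <stddef.h>

#define SD_DATA_TOKEN 0xFEu

typedef uint8_t (*spi_xfer_fn)(void *ctx);   /* hypothetical: clocks one byte in */

enum read_result { READ_OK = 0, READ_TIMEOUT = 1, READ_ERROR_TOKEN = 2 };

/* Reads one 512-byte block after CMD17 has been sent and accepted.
   The trailing data CRC16 is clocked out and ignored. */
enum read_result sd_read_block_nocrc(spi_xfer_fn xfer, void *ctx,
                                     uint8_t *buf, uint16_t max_polls)
{
    uint8_t b;
    uint16_t polls = 0;
    /* The card sends 0xFF while busy, then the data token 0xFE
       or an error token. */
    do {
        if (polls++ == max_polls) return READ_TIMEOUT;
        b = xfer(ctx);
    } while (b == 0xFF);
    if (b != SD_DATA_TOKEN) return READ_ERROR_TOKEN;

    for (uint16_t i = 0; i < 512; i++) buf[i] = xfer(ctx);
    (void)xfer(ctx);   /* CRC16 high byte, not checked */
    (void)xfer(ctx);   /* CRC16 low byte, not checked */
    return READ_OK;
}

/* Mock card stream for testing: replays a fixed byte sequence, then 0xFF. */
struct mock_stream { const uint8_t *bytes; size_t len, pos; };

uint8_t mock_xfer(void *ctx)
{
    struct mock_stream *s = ctx;
    return s->pos < s->len ? s->bytes[s->pos++] : 0xFF;
}
```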

Post by Artcfox »

So turning the CRC off only turns it off in the case where the bootloader is not available and it has to call the fallback routines? And the code in kernel/bootlib.[hs] is the fallback library?
Jubatian wrote: Sun Apr 01, 2018 11:26 am
> If you really need speed with certain routines, you have to implement an SDC_Read_Sector equivalent yourself and use that in performance-critical paths. The library has support for such (including mitigating SDSC / SDHC differences), you may do it by starting off with what is in the library.
>
> If you do this for reading, by the way, you don't even have to turn off CRC. You would use SDC_Command then to send the command (which calculates CRC), then you can read the sector at max speed, simply ignoring the returned CRC.

So that wouldn't slow down the SD card hardware at all, because it calculates the CRC in hardware? It would just add a minor slowdown from calculating the CRC for the SDC_Command that has to be sent for the CMD17?

Okay, so it looks like what I need is a local copy of SDC_Read_Sector that skips the CRC check. SPI_Set_Max (which I don't need to copy) should make that local copy fast, and as long as I rename the local copy and skip the first part, my copy will be called instead of the bootloader's version. I don't need retries. I just figure that using your asm version would be faster than my C implementation of sending a CMD17 (or a C version that calls SDC_Command), because it looks like you are using the Z register in a fancy way that I don't think I can replicate in C.

I just have no idea whether the rcall stuff will work unless I also copy those routines into my local copy (or do rcalls work across different .s files?), and any fallthrough code probably needs to be copied after the functions as well. I've never made my own .s file before, which is why I am apprehensive about hacking on any of your asm code and expecting the result to work. (It also uses rjmp to reach other asm functions, so I'd probably need to copy those too, unless rjmp can work across different .s files.)

Since this might be a common request, would it be possible for you to add a (conditionally included, so no bloat) SDC_Read_Sector_Fast_No_CRC function to bootlib.[hs] that is biased toward speed over size, uses the max SPI speed, doesn't retry, and doesn't calculate the CRC? I could see a lot of use cases for that, versus trying to mix multiple libraries together.

It is a personal goal of mine to be able to write and call my own .s files, but I want to start with baby steps: a wholly self-contained function with no clobbers, then one with a few clobbers and proper save/restore before and after, then something that accepts C arguments, one that returns something to C, one that modifies a pointer argument passed in from C, and then defining an ISR within the .s file.

Post by Jubatian »

Artcfox wrote: Sun Apr 01, 2018 2:38 pm
> So turning the CRC off only turns it off in the case where the bootloader is not available and it has to call the fallback routines? And the code in kernel/bootlib.[hs] is the fallback library?

No, it turns CRC checking off on the SD card by sending it the appropriate command. The library is completely unaware of whether CRC was turned off or not. The effect is on the SD card: if CRC is off, the card no longer checks the CRCs, so it becomes unable to detect transmission errors. It doesn't affect the speed of the SD card either.
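As background on "sending it the appropriate command": in SPI mode, CRC checking is switched with CMD59 (CRC_ON_OFF; argument bit 0 is 1 for on, 0 for off), and every command frame carries a CRC7, which is what SDC_Command computes as mentioned earlier. A sketch of building such a frame, independent of the library's own code (function names are hypothetical):

```c
#include <stdint.h>

/* CRC7 over the first 5 bytes of an SD command frame:
   polynomial x^7 + x^3 + 1, MSB first, initial value 0. */
static uint8_t crc7(const uint8_t *data, int len)
{
    uint8_t crc = 0;
    for (int i = 0; i < len; i++) {
        uint8_t byte = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc <<= 1;
            if ((byte ^ crc) & 0x80) crc ^= 0x09;
            byte <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Builds the 6-byte command frame: start bits + command index, 32-bit
   argument MSB first, then CRC7 shifted left with the end bit set. */
void sd_build_command(uint8_t cmd, uint32_t arg, uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40 | (cmd & 0x3F));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((crc7(frame, 5) << 1) | 1);
}
```

For example, CMD0 with argument 0 gives the well-known final byte 0x95, and `sd_build_command(59, 0, f)` would build the "CRC off" command.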
Artcfox wrote: Sun Apr 01, 2018 2:38 pm
> I just have no idea if the rcall stuff will work unless I also copy those things to my local copy (or will those work across different .s files?)

rcall is a short-range call, so it is only likely to work within one assembly file (unless you mess with linker scripts to ensure that a group of files are within each other's reach).
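The reach limit comes down to simple arithmetic: RCALL and RJMP encode a signed 12-bit word offset, with target = PC + k + 1, so k must be within -2048..+2047 words (roughly ±4 KiB). A sketch with a hypothetical helper, using byte addresses as a linker map shows them:

```c
#include <stdint.h>
#include <stdbool.h>

/* RCALL/RJMP: target = PC + k + 1 (in words), k in -2048..+2047.
   Ignores the wrap-around that lets RCALL reach all of flash on parts
   with at most 4K words of program memory. */
bool rcall_reachable(uint32_t call_byte_addr, uint32_t target_byte_addr)
{
    int32_t pc = (int32_t)(call_byte_addr / 2);
    int32_t target = (int32_t)(target_byte_addr / 2);
    int32_t k = target - (pc + 1);
    return k >= -2048 && k <= 2047;
}
```

When the linker cannot satisfy this, it reports a "relocation truncated to fit" error. Using the full-range CALL/JMP instructions for calls between .s files avoids the problem at the cost of one extra word and one extra cycle per call.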
Artcfox wrote: Sun Apr 01, 2018 2:38 pm
> Since this might be a common request, would it be possible for you to add a (conditionally included, so no bloat) SDC_Read_Sector_Fast_No_CRC function to bootlib.[hs] that is biased toward speed vs size, uses the max spi speed, doesn't retry, and doesn't calculate the CRC. I think I could see a lot of use cases for that versus trying to mix multiple libraries together.

CunningFellow said a while ago that he would create a streaming library cooperating with the bootloader library. I really don't want to butcher the bootloader library for performance, since a proper streaming library (working with CMD18) would perform much better than any fiddling with CMD17 loads for multiple blocks. That would be the proper solution, as I think the bootloader library is already complex and delicate enough.

Post by Artcfox »

Jubatian wrote: Sun Apr 01, 2018 3:54 pm
> No, it turns CRC checking off on the SD card by sending it the appropriate command. The library is completely unaware of whether CRC was turned off or not. The effect of turning off CRC is on the SD card: If CRC is off, the card will no longer check the CRCs, so becomes unable to detect an error in transmission. It neither affects the speed of the SD card.

Okay, so the library will still calculate and send the proper CRC anyway (unnecessarily), and that will be the slowdown. Gotcha.
Jubatian wrote: Sun Apr 01, 2018 3:54 pm
> rcall is a short range call, so it is only likely to work within one assembly file (unless you got into messing with linker scripts to ensure that a group of files are within each other's reach).

Okay, good to know.
Jubatian wrote: Sun Apr 01, 2018 3:54 pm
> CunningFellow told a while ago that he will create a streaming library cooperating with the bootloader library. I really don't want to butcher around in the bootloader library for performance stuff as a proper streaming library (working with CMD18) would have much better performance than any screwing around with CMD17 loads for multiple blocks. The proper solution to this would be that as I think the bootloader library is already complex and delicate enough.

When you only need to read a single sector (especially with random access), CMD17 vastly outperforms CMD18, because you can queue up another random sector immediately. With CMD18 you have to issue a CMD12 to stop the transfer, and the recovery time before you can issue any other command to the card is huge. That is why I am specifically trying to use CMD17. I'll see how far I can get modifying the assembly language to achieve what I need, though.

I wrote a program that encodes my levels so that I only ever need to read a single sector to get the data needed for 8-way scrolling in any direction, and with 24 sprites being blitted I don't think I have many CPU cycles left per frame, which is why having this written in assembly language would be ideal.

Post by Jubatian »

Eh, if you need it so much, here is a solution in an attachment. Add it to the Makefile like any other assembly module (check the lines used for including kernel stuff), then you can use it as described in its header file.

Still, I wouldn't especially recommend this route. When designing the bootloader library, I found that some of my SD cards were really slow; so slow that even with normal, CRC-enabled reading, waiting for the data token occasionally took longer than the read itself. So realistically, if you want to support any SD card, you may still need to budget a full VBlank for a read (it may not always happen, but if it does, be prepared to handle it in a sensible manner).
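One way to handle a slow card sensibly is to never block on the data token: poll for a bounded number of bytes each frame and finish the read in a later frame if the card is still busy. A sketch with hypothetical names and a mock card; it assumes CMD17 was already sent and that the card can stay selected, with the SPI bus left alone between frames, which has not been checked against the kernel:

```c
#include <stdint.h>

typedef uint8_t (*spi_xfer_fn)(void *ctx);   /* hypothetical SPI byte transfer */

enum step_result { STEP_PENDING, STEP_DONE, STEP_ERROR };

/* Called once per frame: polls for the 0xFE data token for at most
   `poll_budget` bytes, and reads the block only once the token arrives. */
enum step_result sd_read_step(spi_xfer_fn xfer, void *ctx,
                              uint8_t *buf, uint16_t poll_budget)
{
    while (poll_budget--) {
        uint8_t b = xfer(ctx);
        if (b == 0xFF) continue;            /* card still busy */
        if (b != 0xFE) return STEP_ERROR;   /* error token */
        for (uint16_t i = 0; i < 512; i++) buf[i] = xfer(ctx);
        (void)xfer(ctx);                    /* CRC16, ignored */
        (void)xfer(ctx);
        return STEP_DONE;
    }
    return STEP_PENDING;                    /* try again next frame */
}

/* Mock card that stays busy for `busy` polls, then sends a block of 0xAB. */
struct slow_card { uint32_t busy; uint32_t sent; };

uint8_t slow_card_xfer(void *ctx)
{
    struct slow_card *c = ctx;
    if (c->busy) { c->busy--; return 0xFF; }
    return (c->sent++ == 0) ? 0xFE : 0xAB;
}
```

While the step returns STEP_PENDING, the game would keep running with the previous map data (or hold the scroll) until the block arrives.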

CunningFellow's streaming (Tempest background), on the other hand, may still work fine even on such slow cards. If you issue a CMD18 at the beginning of the VBlank and then leave the card to do whatever it needs to get the data ready, the data may be ready to read by the time the next frame starts, and cards are designed to be fast with sequential access.

EDIT: What about the write support I submitted as an experimental branch above? If it works okay, I would like to merge it in!
Attachments
fastread.tar.bz2
Fast SD read with no data CRC check
(2.42 KiB)