Recovery from a Soft Brick

aven · February 23, 2024, 6:16pm

I was building Gateware locally using the shipping capes as a starting point and it seems I have stumbled across an invalid configuration that locks up the boot process.

My serial debug output prints the following before stopping:

[    2.786184] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249)
[    2.794420] io scheduler mq-deadline registered
[    2.801748] GPIO line 510 (sd_card_cs): no hogging state specified, bailing out
[    2.810689] gpio-494 (ADC_IRQn): hogged as input
[    2.815841] gpio-497 (USB_OCn): hogged as input
[    2.822486] GPIO line 472 (vio_enable): no hogging state specified, bailing out
[    2.830578] gpio-473 (SD_DET): hogged as input
[    2.836410] gpio gpiochip3: (41200000.gpio): not an immutable chip, please consider fixing it!
[    2.846522] gpio gpiochip4: (41100000.gpio): not an immutable chip, please consider fixing it!
[    2.856049] gpio gpiochip4: (41100000.gpio): detected irqchip that is shared with multiple gpiochips: please fix the driver.
[    2.870313] microchip-pcie 3000000000.pcie: host bridge /fabric-pcie-bus@3000000000/pcie@3000000000 ranges:
[    2.881130] microchip-pcie 3000000000.pcie:      MEM 0x3009000000..0x3017ffffff -> 0x0009000000
[    2.890832] microchip-pcie 3000000000.pcie:       IO 0x3008000000..0x3008ffffff -> 0x0008000000
[    2.900467] microchip-pcie 3000000000.pcie:      MEM 0x3018000000..0x3087ffffff -> 0x0018000000
[    2.910114] microchip-pcie 3000000000.pcie:   IB MEM 0x0080000000..0x0083ffffff -> 0x0080000000
[    2.919757] microchip-pcie 3000000000.pcie:   IB MEM 0x00c4000000..0x00c9ffffff -> 0x0084000000
[    2.929413] microchip-pcie 3000000000.pcie:   IB MEM 0x008a000000..0x0091ffffff -> 0x008a000000
[    2.939054] microchip-pcie 3000000000.pcie:   IB MEM 0x1412000000..0x1421ffffff -> 0x0092000000
[    2.948670] microchip-pcie 3000000000.pcie:   IB MEM 0x1022000000..0x107fffffff -> 0x00a2000000

Then after about 5 minutes I get the following output every 60 seconds or so:

[  337.681383] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  337.687908] rcu: 	0-...0: (14 ticks this GP) idle=3e3c/1/0x4000000000000000 softirq=38/39 fqs=2625
[  337.697781] 	(detected by 2, t=5255 jiffies, g=-1151, q=2 ncpus=4)
[  337.704586] Task dump for CPU 0:
[  337.708127] task:swapper/0       state:R  running task     stack:0     pid:1     ppid:0      flags:0x00000008
[  337.719046] Call Trace:
[  337.721729] [<ffffffff80a67ba0>] __schedule+0x27c/0x834

I have tried to use the DirectC JTAG programmer without success. I am using the shipping Gateware image BVF-0.4.0-27-g7078de9. I issue the programming command as documented in the DirectC repository.
I have tried programming in both the HSS prompt stage and after the boot process appears to hang.
In both cases I get this error code:

Identifying device...
Looking for MPF device...
ActID = 0 ExpID = F8531CF
ERROR_CODE: 8004
Error return code  6
Elapsed time = 00:00:00 Done.

I double checked my wiring and it looks correct.

So my questions are:
Can I use the shipping *.dat files or do I need to compile my own as the DirectC documentation suggests?
When should I perform the programming action? During the HSS stage?
Do I need to toggle the eMMC multiplexer to USB mode or anything like that?
Are there DirectC commands I can use to verify my wiring? I have tried “device_info” and “read_idcode” but I get the same error.
Is signal integrity a significant concern? I am using a Rpi5 and have to use 3 inch jumpers from the Pi to the IDC header end of the TC2050-IDC.
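One way to read the DirectC output above is to compare the two IDs it prints. The following is a minimal sketch that only parses the log text quoted in this post. The function name `parse_id_check` is made up for illustration, and treating an all-zero ActID as "nothing was read back over JTAG" is an assumption, not something the DirectC documentation quoted here confirms.

```python
import re

def parse_id_check(log: str):
    """Return (actual_id, expected_id) from an 'ActID = ... ExpID = ...' line, or None."""
    match = re.search(
        r"ActID\s*=\s*([0-9A-Fa-f]+)\s+ExpID\s*=\s*([0-9A-Fa-f]+)", log
    )
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)

# Log text copied from the post above.
log = """Identifying device...
Looking for MPF device...
ActID = 0 ExpID = F8531CF
ERROR_CODE: 8004
"""

actual, expected = parse_id_check(log)
if actual == expected:
    verdict = "ID matches the expected device"
elif actual == 0:
    # Assumption: an all-zero ID suggests nothing was shifted back at all,
    # which would point at wiring, power, or signal quality first.
    verdict = "no ID read back (check the JTAG connection first)"
else:
    verdict = "a device answered, but with a different ID"

print(f"ActID={actual:#x} ExpID={expected:#x}: {verdict}")
```

If that reading is right, the failure happens before any *.dat content is used, which would make the wiring and signal integrity questions the ones to settle first.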

Thank you in advance!

Vauban · February 23, 2024, 9:04pm

I think what is happening here is that the PCIe block is somehow not included in your gateware, but the device tree overlay still causes Linux to look for it, which results in a bus transaction to the FPGA fabric that locks up.
I did this to myself a couple of times.
The way I recovered from it was to replace the version of U-Boot with one that did not merge the gateware content device tree overlay before passing it to the Linux kernel. This is a little bit of a nuclear option but it means it can be done with just using the USB-C cable to the board and the HSS.

Vauban · February 23, 2024, 9:14pm

I’m sure there is a more intelligent approach with the software stack we have now. My guess is to use the HSS usbdmsc command to get the board to show up as a USB mass storage device. This should let you retrieve the dtb and boot.scr. I think the key is to modify boot.scr to remove the check for the device tree overlay. @RobertCNelson what do you think? Does that make sense?

Vauban · February 23, 2024, 9:16pm

Yes, temporarily comment the following line in boot.scr:
run design_overlays;

RobertCNelson · February 23, 2024, 9:18pm

the fun part… boot.scr is built with u-boot mkimage…

setenv fdt_high 0xffffffffffffffff
setenv initrd_high 0xffffffffffffffff

load mmc 0:${distro_bootpart} ${scriptaddr} beaglev_fire.itb;
bootm start ${scriptaddr}#kernel_dtb;
bootm loados ${scriptaddr};
# Try to load a ramdisk if available inside fitImage
bootm ramdisk;
bootm prep;
fdt set /soc/ethernet@20112000 mac-address ${icicle_mac_addr0};
fdt set /soc/ethernet@20110000 mac-address ${icicle_mac_addr1};
run design_overlays;
bootm go;

You can cheat and stop it in the u-boot console and manually copy its instructions… skipping run design_overlays;
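The "manually copy its instructions" step can be sketched as a small filter over the script text quoted above: drop blank lines, comments, and the overlay step, and what remains is the list to type at the U-Boot prompt. This is only an illustration in Python; the helper name `manual_commands` is hypothetical, and the script text is copied from this post rather than read from a real boot.scr.

```python
# Script text as quoted above (not read from the board).
BOOT_CMD = """\
setenv fdt_high 0xffffffffffffffff
setenv initrd_high 0xffffffffffffffff

load mmc 0:${distro_bootpart} ${scriptaddr} beaglev_fire.itb;
bootm start ${scriptaddr}#kernel_dtb;
bootm loados ${scriptaddr};
# Try to load a ramdisk if available inside fitImage
bootm ramdisk;
bootm prep;
fdt set /soc/ethernet@20112000 mac-address ${icicle_mac_addr0};
fdt set /soc/ethernet@20110000 mac-address ${icicle_mac_addr1};
run design_overlays;
bootm go;
"""

def manual_commands(script, skip=("run design_overlays;",)):
    """Return the commands to type by hand, leaving out comments and skipped steps."""
    commands = []
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        if stripped in skip:
            continue  # the step that merges the gateware overlays
        commands.append(stripped)
    return commands

for command in manual_commands(BOOT_CMD):
    print(command)
```

The order matters: everything up to `bootm prep;` and the two `fdt set` lines still runs, and only the overlay merge is left out before `bootm go;`.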

aven · February 23, 2024, 10:25pm

These tips worked! I am up and running again.

Notes:
Commenting a line in the boot.scr file triggers a CRC check failure at boot.

## Executing script at 8e000000
Bad data crc
SCRIPT FAILED: continuing...
... lines removed for clarity ...
RISC-V #

This turned out to be convenient because I could directly enter the scripted commands line by line at the RISC-V # prompt. The script commands aren’t valid in HSS and I wasn’t sure how to exit HSS and get back into u-boot.

After I executed the script commands as Robert suggested, the boot proceeded normally and I was able to flash shipping Gateware back onto the FPGA. Then I un-commented the line in boot.scr so it would pass the CRC check again.
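The CRC failure follows from how mkimage wraps the script. Assuming boot.scr is a legacy-format U-Boot image (which the "Bad data crc" message suggests, but which is not confirmed in this thread), the 64-byte header stores one CRC32 over the header itself and one over the data, so editing the text in place invalidates the stored value. The sketch below builds a toy image and checks it; `build_image` and `check_image` are illustrative names, and the OS/arch/type bytes are left as zero placeholders.

```python
import struct
import zlib

# Legacy image header: 7 big-endian uint32 fields, 4 single bytes, 32-byte name.
HEADER_FORMAT = ">7I4B32s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 64 bytes
IH_MAGIC = 0x27051956

def build_image(data, name=b"boot script"):
    """Wrap data in a legacy-style header (toy example, placeholder type fields)."""
    def pack(header_crc):
        return struct.pack(
            HEADER_FORMAT,
            IH_MAGIC, header_crc, 0, len(data), 0, 0, zlib.crc32(data),
            0, 0, 0, 0, name,
        )
    # The header CRC is computed with its own field set to zero.
    return pack(zlib.crc32(pack(0))) + data

def check_image(image):
    """Return 'ok' or a short description of the first check that fails."""
    magic, header_crc, _time, size, _load, _entry, data_crc = struct.unpack(
        HEADER_FORMAT, image[:HEADER_SIZE]
    )[:7]
    if magic != IH_MAGIC:
        return "bad magic"
    zeroed = image[:4] + b"\x00" * 4 + image[8:HEADER_SIZE]
    if zlib.crc32(zeroed) != header_crc:
        return "bad header crc"
    if zlib.crc32(image[HEADER_SIZE:HEADER_SIZE + size]) != data_crc:
        return "bad data crc"
    return "ok"

image = build_image(b"run design_overlays;\nbootm go;\n")
# Comment the line out in place, as one would with a text editor.
edited = image.replace(b"run design_overlays;", b"#un design_overlays;")

print(check_image(image))
print(check_image(edited))
```

If the plain-text source of the script is at hand (called boot.cmd here purely for illustration), regenerating the image with something like `mkimage -A riscv -T script -C none -d boot.cmd boot.scr` would produce matching CRCs instead of relying on the failure.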

Thank you both for your help!

leoh · June 20, 2024, 12:15pm

Hi everyone,

I am facing a similar issue with a fresh image downloaded from here.

I get the same error line [ 2.856049] gpio gpiochip4: (41100000.gpio): detected irqchip that is shared with multiple gpiochips: please fix the driver.

Following the same steps as mentioned by @RobertCNelson and @Vauban, I commented out the run design_overlays; line and typed the commands mentioned by Robert into the U-Boot prompt to proceed with the boot. Once inside the OS, I flashed the default gateware (named ci-default, more on this below), and once the BVF rebooted, I interrupted the HSS to run usbdmsc and removed the comment from the line.

Even after removing the comment, I get the bad header crc error. I’m not sure what I am missing here. My end goal is to boot normally so I can flash more custom gateware ideas. How can I proceed?

Note: The default gateware is said to be in a folder named default in the documentation, but in fact it seems to be in the ci-default folder. This needs to be updated in the docs.

lranders · June 20, 2024, 3:24pm

Funny you should ask…

RobertCNelson · June 20, 2024, 3:29pm

So in the current shipping version of bbb.io-gateware, default is the default… Early on it was ci-default.

Whatever image you have installed, running:

sudo apt update
sudo apt-get dist-upgrade

should pull in the newer version of bbb.io-gateware.

Regards,

Vauban · June 20, 2024, 3:29pm

You beat me to it.

RobertCNelson · June 20, 2024, 3:30pm

I’ve been hacking so much on the docs, I even had a branch set up… ah crap, need to fix it… Whew, docs are okay…

Instead, i need to upload a new RootFS version!

leoh · June 21, 2024, 5:02am

Ah, a new bookmark. Thanks!

leoh · June 21, 2024, 5:04am

I see, I’ll make sure to do this. Maybe the image must be updated here?

wrc · August 20, 2024, 1:26am

I’m new to this forum, so perhaps this isn’t the right place to ask, but here goes: where do I find instructions to fully recover a BeagleV-Fire to “factory” condition (i.e. the way it was when I got it), assuming nothing about the BeagleV-Fire’s current state?

I need to know how to recover from any state for the following reason. I’ve gone through the “blinky.v” tutorial successfully, replacing blinky.v with my own Verilog, which implements my own custom processor in the fabric. I was also able to get this new (misc-polar1) project to work using a local Libero installed under Ubuntu/WSL. I have also copied the Libero project files from WSL to the Windows directory where I usually put my Libero projects, and was able to open and view the project with my Windows Libero installation. Now I would like to use the Windows Libero and a FlashPro5 to put my project into the BeagleV-Fire. But I may “soft-brick” the board, so I need to know how to recover in case that happens.

A related question: what is the meaning of the “digest” numbers associated with an FPGA design bitstream? I assume they are very long checksums of some sort. Some of the numbers differ between the bitstreams built using Ubuntu Libero and Windows Libero:

Ubuntu Libero:
BITS component bitstream digest: 743c84a51bb8f09c698540b2227ead42bbe263b008451c7bc9a18da54c0d78e5
Fabric component bitstream digest: e374db15e93dfbbe7698bff746bf8e3f046d443d14381607314a06f7be6b6b88
eNVM component bitstream digest: c39e1ccb403b5513e00b277aec340cab7f66a94203f017e684337540310d69fb
sNVM component bitstream digest: b0a9e2be29f47fa951a6a131a8dd9a9016e5ba87de5833c742b7f654428ea6e7
EOB component bitstream digest: 2be0f160919e03a75a253a00e27c6611ea996538b54883656894dfbffc4b2bd7
Entire bitstream digest: 11c0688ac333d5ed2e9fbf1a8c6a4ffdef8eac15523db26dc23caf7b2a5491ae

Windows Libero:
BITS component bitstream digest: 743c84a51bb8f09c698540b2227ead42bbe263b008451c7bc9a18da54c0d78e5
Fabric component bitstream digest: f1c895cb2117fd3c3e71d5f39cc19043aee339f7f1e4c3e78b679b67fbf77f2d
eNVM component bitstream digest: f0a528b252cd1e521594c30ace37ea2abbd30732645e61df637cb180410060dc
sNVM component bitstream digest: b0a9e2be29f47fa951a6a131a8dd9a9016e5ba87de5833c742b7f654428ea6e7
EOB component bitstream digest: fb386477ed2a277451baa1d290978d40ea5ec8ac603785831b6eaf3fe541ca2c
Entire bitstream digest: a1e894a33c0033f58a8c5b1353181844cfb73d7b7943b31fd62ce267744a7486

I can understand why the “fabric” digest would be different, since a different seed was likely used during the layout. But I don’t understand the other categories well enough to know if differences are to be expected. Any helpful comments would be greatly appreciated.
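To make the comparison concrete, the two digest lists above can be diffed component by component. This is just a small sketch over the values quoted in this post; the dictionary keys are shortened labels chosen for illustration.

```python
# Digests copied from the two lists above.
ubuntu = {
    "BITS":   "743c84a51bb8f09c698540b2227ead42bbe263b008451c7bc9a18da54c0d78e5",
    "Fabric": "e374db15e93dfbbe7698bff746bf8e3f046d443d14381607314a06f7be6b6b88",
    "eNVM":   "c39e1ccb403b5513e00b277aec340cab7f66a94203f017e684337540310d69fb",
    "sNVM":   "b0a9e2be29f47fa951a6a131a8dd9a9016e5ba87de5833c742b7f654428ea6e7",
    "EOB":    "2be0f160919e03a75a253a00e27c6611ea996538b54883656894dfbffc4b2bd7",
    "Entire": "11c0688ac333d5ed2e9fbf1a8c6a4ffdef8eac15523db26dc23caf7b2a5491ae",
}
windows = {
    "BITS":   "743c84a51bb8f09c698540b2227ead42bbe263b008451c7bc9a18da54c0d78e5",
    "Fabric": "f1c895cb2117fd3c3e71d5f39cc19043aee339f7f1e4c3e78b679b67fbf77f2d",
    "eNVM":   "f0a528b252cd1e521594c30ace37ea2abbd30732645e61df637cb180410060dc",
    "sNVM":   "b0a9e2be29f47fa951a6a131a8dd9a9016e5ba87de5833c742b7f654428ea6e7",
    "EOB":    "fb386477ed2a277451baa1d290978d40ea5ec8ac603785831b6eaf3fe541ca2c",
    "Entire": "a1e894a33c0033f58a8c5b1353181844cfb73d7b7943b31fd62ce267744a7486",
}

# Split the components into those that match and those that differ.
matching = [name for name in ubuntu if ubuntu[name] == windows[name]]
differing = [name for name in ubuntu if ubuntu[name] != windows[name]]

print("match: ", ", ".join(matching))
print("differ:", ", ".join(differing))
```

So only the BITS and sNVM components are identical between the two builds; Fabric, eNVM, EOB, and the digest of the entire bitstream all differ.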

silver2row · August 20, 2024, 2:07am

I am not sure if you saw this image but here goes it: BeagleV-Fire Ubuntu 2023-11-21 - BeagleBoard

Now, is that the factory image? I really do not know if that available image is the factory image.

I started a Fire-Gate repo and prematurely “soft bricked” the device. I say “soft bricked” because I need to update the time each and every time I update/upgrade via apt.

And my Fire-Gate repo is busted and fails. I know it is something that I did but I kept it because I think I can figure out what exactly it was I was doing when I goofed up. Remorseful Coding!

Seth

P.S. Just for reference, there are a number of people that can help. I am most likely not one of them for now. I am just learning…

Update

WSL or WSL2 stands no chance of working well or playing well, from my perspective. I use Ubuntu distros for the BeagleV-Fire.

It makes things a tad bit simpler (my opinion).

wrc · August 20, 2024, 2:31am

Thanks for your comments. Just to be clear, I’m not using WSL on the BeagleV-Fire, but on a Windows PC. I haven’t touched the Ubuntu installation on my BeagleV-Fire, apart from what’s required to update it and do the various tutorials. I used Ubuntu under WSL on my Windows PC to install Libero and run the bitstream generation scripts there, as in one of the later tutorials. After doing that, I would copy the bitstreams back to the board and use the “linux” programming method to put the project into the fabric. That all works just fine now (after much effort). What I’m now ready to try is putting the same design into the FPGA using a Windows Libero installation and a FlashPro5. I think I’m all set to give it a try, but it seems highly likely that I will “soft-brick” my BeagleV-Fire. So I need a quick and foolproof way to restore the BeagleV-Fire to factory state. I’m not yet skilled enough to know what to do with the Linux image you pointed me to. Presumably the installation of such an image would be one step in the overall process I’m looking for.
Thanks again, Walter Cook

lranders · August 20, 2024, 5:10am

If you have an FP5, it is virtually impossible to get stuck, unless you somehow fry the board.

You have nothing to fear once you have successfully tested out your FP5.

As for the SHA digests, yes, those are expected to change randomly.

If you copy out what is known as the default bitstream, you can always get back in working order.

Technically, I think Seeed uses a special test bitstream, so “factory default” might not be desirable.

lranders · August 20, 2024, 5:12am

It’s just us mere mortals w/o an FP5 that have to be a bit more apprehensive, as we need the BVF to be able to boot up far enough to allow Linux to burn another bitstream…