BeagleBone AI: Multiple Hardware Failures

Hello, everybody.

We’ve been using the BeagleBone AI to prototype a system derived from LinuxCNC using the PREEMPT_RT kernel.

The unit is mounted in a somewhat enclosed area, but it has continual airflow.

Suspecting a thermal issue, I placed temperature sensors in several places to try to corroborate this.

The CPU itself was reporting 60°C while the sensor placed next to the heatsink was reading 51°C.

Both of these seem to be within the operating range.
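For anyone who wants to log the on-chip readings alongside external sensors, here is a minimal sketch. It assumes the kernel exposes the standard Linux sysfs thermal interface under /sys/class/thermal, with temperatures reported in millidegrees Celsius. The function names are my own, and the zones that show up will depend on the kernel and device tree in use.

```python
from pathlib import Path


def millidegrees_to_celsius(raw):
    """Convert a sysfs 'temp' value (millidegrees Celsius, as text) to degrees."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"<zone dir> (<zone type>)": temperature in C} for each readable zone.

    Assumes the standard sysfs layout: <base>/thermal_zoneN/{type,temp}.
    Zones that cannot be read or parsed are skipped rather than treated as errors.
    """
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            temp_c = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            continue
        readings[f"{zone.name} ({zone_type})"] = temp_c
    return readings


if __name__ == "__main__":
    for name, temp_c in read_thermal_zones().items():
        print(f"{name}: {temp_c:.1f} C")
```

Running this periodically (for example from cron) and keeping the output would give a record of the on-chip temperatures leading up to a failure, which a single spot reading cannot provide.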

I’ve disabled HDMI and GPU through the device tree.

The first hardware failure we chalked up to bad luck. The second one took place in the same cabinet under the same conditions, so we may now have an actual issue rather than a one-off failure.

Connecting over serial, the device boots partially but then fails at different points in the process. The SD card has been verified to work in another BBAI. I also tried to use the vanilla Debian flashing image with no luck. The boot errors persist.
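Since the failure point moves between boots, it may help to capture each boot over serial to a file and compare where each one stops. The following is a rough sketch of that idea, not a definitive tool. It assumes the captures are plain-text files containing the usual `[  seconds.micros] message` kernel lines; the function name is hypothetical.

```python
import re
import sys

# Matches kernel log lines such as "[   26.608337] cpu cpu0: OPP table can't be empty".
KERNEL_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def last_kernel_message(log_text):
    """Return (timestamp_seconds, message) of the last timestamped kernel line.

    Returns None if the capture contains no timestamped kernel lines,
    e.g. when the boot stalled while still in U-Boot.
    """
    last = None
    for line in log_text.splitlines():
        match = KERNEL_LINE.match(line.strip())
        if match:
            last = (float(match.group(1)), match.group(2))
    return last


if __name__ == "__main__":
    # Usage: pass one captured log file per boot attempt on the command line.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, last_kernel_message(handle.read()))
```

If the last message differs from boot to boot, that would point away from one specific driver and more toward something systemic such as power or memory, though that is only a heuristic.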

Questions:

Has anybody else had similar problems?

Does anybody have any suggestions on how to investigate further? I’m perfectly happy to use specialty images or to attach a JTAG debugger for deeper introspection.

Thank you for your time, and thank you to the community!

Edit: Adding partial trace from boot messages.
Edit 2: Added full stack trace

This is the output from a freshly applied vanilla flasher image: am57xx-eMMC-flasher-debian-12.2-iot-armhf-2023-10-07-4gb.img

U-Boot 2022.04-ge0d31da5 (Aug 04 2023 - 18:49:43 +0000)

CPU  : DRA752-GP ES2.0
Model: BeagleBoard.org BeagleBone AI
Board: BeagleBone AI REV A
DRAM:  1 GiB
Core:  60 devices, 16 uclasses, devicetree: separate
MMC:   no pinctrl state for default mode
omap_hsmmc_init_setup: timedout waiting for cc2!
no pinctrl state for default mode
omap_hsmmc_init_setup: timedout waiting for cc2!
mmc@480d1000 - probe failed: -110
OMAP SD/MMC: 0, OMAP SD/MMC: 1no pinctrl state for default mode
omap_hsmmc_init_setup: timedout waiting for cc2!

Loading Environment from nowhere... OK
BeagleBone Cape EEPROM: no EEPROM at address: 0x54
BeagleBone Cape EEPROM: no EEPROM at address: 0x55
BeagleBone Cape EEPROM: no EEPROM at address: 0x56
BeagleBone Cape EEPROM: no EEPROM at address: 0x57
Net:   eth2: ethernet@48484000
Press SPACE to abort autoboot in 1 seconds
MMC: no card present
MMC: no card present
MMC: no card present
switch to partitions #0, OK
mmc1(part 0) is current device

Partition Map for MMC device 1  --   Partition Type: DOS

Part    Start Sector    Num Sectors     UUID            Type
  1     8192            30613504        3f0e9fb5-01     83 Boot
Scanning mmc device 1
Checking for: /uEnv.txt ...
Checking for: /boot/uEnv.txt ...
1368 bytes read in 1 ms (1.3 MiB/s)
Loaded environment from /boot/uEnv.txt
Checking if uname_r is set in /boot/uEnv.txt ...
debug: [uname_r=5.10.168-ti-r72] ...
loading /boot/vmlinuz-5.10.168-ti-r72 ...
11325952 bytes read in 137 ms (78.8 MiB/s)
loading /boot/dtbs/5.10.168-ti-r72/am5729-beagleboneai.dtb ...
344176 bytes read in 7 ms (46.9 MiB/s)
uboot_overlays: [fdt_buffer=0x60000] ...
loading /boot/initrd.img-5.10.168-ti-r72 ...
7844649 bytes read in 96 ms (77.9 MiB/s)
debug: [console=ttyS0,115200n8 root=/dev/mmcblk1p1 ro rootfstype=ext4.
debug: [bootz 0x82000000 0x88080000:77b329 0x88000000] ...
Kernel image @ 0x82000000 [ 0x000000 - 0xacd200 ]
## Flattened Device Tree blob at 88000000
   Booting using the fdt blob at 0x88000000
   Loading Ramdisk to 8f884000, end 8ffff329 ... OK
   Loading Device Tree to 8f82c000, end 8f88306f ... OK

Starting kernel ...

[    6.615753] reg-fixed-voltage fixedregulator-vtt: Failed to regist7
[   10.777954] sdhci-omap 4809c000.mmc: failed to set system capabilis
[   10.789825] sdhci-omap 480b4000.mmc: failed to set system capabilis
[   10.855560] omap_voltage_late_init: Voltage driver support not addd
[   10.917327] reg-fixed-voltage fixedregulator-vtt: Failed to regist7
[   11.048156] omapdss error: HDMI I2C Master Error
[   11.113037] OF: graph: no port node found in /ocp/interconnect@4a00
[   11.136810] sdhci-omap 4809c000.mmc: no pinctrl state for sdr104 me
[   11.144226] sdhci-omap 4809c000.mmc: no pinctrl state for ddr50 moe
[   11.151275] sdhci-omap 4809c000.mmc: no pinctrl state for sdr50 moe
[   11.157684] sdhci-omap 4809c000.mmc: no pinctrl state for sdr25 moe
[   11.157684] sdhci-omap 4809c000.mmc: no pinctrl state for sdr12 moe
[   11.157684] sdhci-omap 4809c000.mmc: no pinctrl state for ddr_1_8ve
[   11.157714] sdhci-omap 4809c000.mmc: no pinctrl state for ddr_3_3ve
[   11.170684] sdhci-omap 4809c000.mmc: no pinctrl state for hs mode
[   11.184112] sdhci-omap 4809c000.mmc: no pinctrl state for hs mode
[   11.196228] sdhci-omap 4809c000.mmc: no pinctrl state for hs200_1_e
[   12.119934] omapdss error: HDMI I2C Master Error
trap: EXIT: bad trap
[   20.037109] remoteproc remoteproc0: request_firmware failed: -2
[   20.046691] remoteproc remoteproc1: request_firmware failed: -2
[   20.052886] remoteproc remoteproc2: request_firmware failed: -2
[   20.059143] remoteproc remoteproc3: request_firmware failed: -2
[   23.491485] vpe 489d0000.vpe: couldn't get firmware
[   26.151977] cpu cpu0: _opp_add_static_v2: opp ked
[   26.226226] cpu cpu0: _of_add_opp_table_v2: Failed to add OPP, -19
[   26.290069] cpu cpu0: OPP table can't be empty
[   26.421997] cpu cpu0: _opp_add_static_v2: opp key field not found
[   26.501098] cpu cpu0: _of_add_opp_table_v2: Failed to add OPP, -19
[   26.608337] cpu cpu0: OPP table can't be empty

After this point, the system is non-responsive.