[Issue] BeagleBone Black Random Reboot

Antonio_Cebrian · November 27, 2013, 8:30am

I have soldered a 15 ohm power resistor between SYS_5V (P9.8) and DGND (P9.2). This increases the SYS_5V load by 333 mA. After three days of testing I can confirm that the random reboot frequency has not changed, so it seems that the current load is not the problem.
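
The 333 mA figure follows from Ohm's law. Here is a minimal sketch of that arithmetic, assuming SYS_5V sits at a nominal 5.0 V (the rail voltage was not measured in this post); the function names are illustrative only:

```python
def added_load_ma(rail_volts: float, resistor_ohms: float) -> float:
    """Current drawn by a resistor across the rail, in milliamps (I = V / R)."""
    return rail_volts / resistor_ohms * 1000.0


def dissipated_watts(rail_volts: float, resistor_ohms: float) -> float:
    """Power dissipated in the resistor, in watts (P = V^2 / R)."""
    return rail_volts ** 2 / resistor_ohms


if __name__ == "__main__":
    # Assumption: SYS_5V is exactly 5.0 V; a sagging rail draws less.
    print(f"extra load:  {added_load_ma(5.0, 15.0):.0f} mA")
    print(f"dissipation: {dissipated_watts(5.0, 15.0):.2f} W")
```

The dissipation is why a power resistor is needed here: an ordinary quarter-watt part would overheat.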

BTW: I have found an unexpected TPS65217C behavior related to USB power detection. I have posted this issue at http://e2e.ti.com/support/arm/sitara_arm/f/791/t/305879.aspx

Best regards.

Lei_Wang · December 3, 2013, 5:32pm

Following up on your finding about the unexpected TPS65217C behavior, I patched the tps65217 driver with IRQ handling. The 3.2 kernel does not handle the nNMI/PMIC_INT interrupt; the 3.8 kernel does. I placed a printk in the interrupt handler and got the same result as you. PMIC_INT was issued every 2 seconds, caused by the USBI flag in the TPS65217C interrupt register.

[ 217.095367] tps65217_irq: USB power status change
[ 217.259338] tps65217_irq: USB power status change
[ 219.096801] tps65217_irq: USB power status change
[ 219.256103] tps65217_irq: USB power status change
[ 221.094177] tps65217_irq: USB power status change
[ 221.262084] tps65217_irq: USB power status change
[ 223.095611] tps65217_irq: USB power status change
[ 223.259033] tps65217_irq: USB power status change
[ 225.097045] tps65217_irq: USB power status change
[ 225.255859] tps65217_irq: USB power status change
[ 227.094024] tps65217_irq: USB power status change
[ 227.262908] tps65217_irq: USB power status change
[ 229.095489] tps65217_irq: USB power status change

I have pasted part of the log above. The USBI status changes actually alternate between intervals of about 0.16 and 1.84 seconds. I lowered the CPU frequency from 1 GHz to 300 MHz and did not observe any change in this timing.
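
For anyone who wants to check the timing on their own board, here is a small sketch that extracts the printk timestamps and prints the gaps between them. The sample is the first five lines of the log above; the helper names are made up for illustration:

```python
import re

# First five lines of the log posted above.
LOG = """\
[ 217.095367] tps65217_irq: USB power status change
[ 217.259338] tps65217_irq: USB power status change
[ 219.096801] tps65217_irq: USB power status change
[ 219.256103] tps65217_irq: USB power status change
[ 221.094177] tps65217_irq: USB power status change
"""

# Matches the leading "[ seconds.microseconds]" printk timestamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def timestamps(log_text):
    """Return the printk timestamps (in seconds) found in the log text."""
    found = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            found.append(float(match.group(1)))
    return found


def intervals(times):
    """Return the gaps between consecutive timestamps, rounded to 1 ms."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    # Gaps alternate between roughly 0.16 s and 1.84 s (a 2 s cycle).
    for gap in intervals(timestamps(LOG)):
        print(f"{gap:.3f} s")
```

To use it on a live board, feed it the output of dmesg instead of the embedded sample.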

I also loaded Angstrom (3.8 kernel) on the BBB. Judging by the interrupt counts (cat /proc/interrupts), the 3.8 kernel does NOT seem to show the same behavior. I will check the PMIC configuration differences between the 3.2 and 3.8 kernels.
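
A rough way to compare the two kernels is to sample /proc/interrupts twice and see how fast the PMIC counter grows. The sketch below assumes the interrupt is registered under a name containing "tps65217", which may differ on your kernel; check the last column of /proc/interrupts first. The helper name and the 10 second window are arbitrary choices:

```python
import time


def irq_count(interrupts_text, name):
    """Sum the per-CPU counts of every /proc/interrupts line mentioning name."""
    total = 0
    for line in interrupts_text.splitlines():
        _label, sep, rest = line.partition(":")
        if not sep or name not in rest:
            continue
        # The count columns come first; stop at the first non-numeric field
        # (the interrupt controller name).
        for field in rest.split():
            if not field.isdigit():
                break
            total += int(field)
    return total


def read_interrupts(path="/proc/interrupts"):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    irq_name = "tps65217"  # assumption: adjust to the name shown on your board
    before = irq_count(read_interrupts(), irq_name)
    time.sleep(10)
    after = irq_count(read_interrupts(), irq_name)
    print(f"{after - before} PMIC interrupts in 10 s")
```

With the behavior described above (two status changes every 2 seconds), a 3.2 kernel with the patched driver would be expected to show roughly 10 interrupts per 10 seconds, and a kernel without the probing close to zero.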

Also, if USB is connected to the mini USB connector, this behavior goes away.

Thanks.

mentorel · December 3, 2013, 5:44pm

By the way, when I connected 5 VDC directly to the additional LDO IC, I also grounded Vusb, and the board worked without any reboot issue for 1 week. I was probably heading in the wrong direction in thinking that it depended entirely on current overload at the TPS65217. One of the ideas was about grounding USB, and I combined all the solutions at once.

I will test with only Vusb grounded to check your hypothesis.

dekrueg · December 3, 2013, 10:49pm

This appears to be related to USB OTG. The VBUS, ID, and D+ pins on USB0 are all generating 0.5 Hz signals. The signal on VBUS that the TPS65217C is detecting may be OTG probing.

This may not be the cause of the random reboots, but it’s worth some examination.

mentorel · December 4, 2013, 6:45pm

An excerpt from the TPS65217 datasheet that describes what is going on here:

The linear charger periodically applies a 10-mA current source to the BAT pin to check for the presence of a battery. This will cause the BAT terminal to float up to > 3 V which may interfere with AC removal detection and the ability to switch from AC to USB input. For this reason, it is not recommended to use both AC and USB inputs when the battery is absent.

dekrueg · December 4, 2013, 11:15pm

I wonder whether, when the BAT terminal drifts above 3 V, the PMIC behaves as if V_BAT > V_UVLO.

If so, I wonder what happens if AM335x USB-OTG probing drives VBUS > V_BAT + 190 mV. That would exceed V_IN(DT).
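
To make the hypothesis concrete, here is a toy model of the comparison being described. It uses only the 190 mV margin quoted in this post and has not been checked against the datasheet; the function name and the example voltages are invented for illustration and are not measurements:

```python
# Margin above V_BAT quoted in the post; treat as unverified.
V_IN_DT_MARGIN_VOLTS = 0.190


def input_detected(v_in: float, v_bat: float) -> bool:
    """True if the input voltage exceeds V_BAT by more than the quoted margin."""
    return v_in > v_bat + V_IN_DT_MARGIN_VOLTS


if __name__ == "__main__":
    floating_bat = 3.0  # hypothetical: BAT pin floated up by the battery check
    for vbus in (3.1, 3.2):  # hypothetical VBUS levels during OTG probing
        print(f"VBUS={vbus} V -> detected: {input_detected(vbus, floating_bat)}")
```

If the PMIC really behaves this way, a fairly small VBUS pulse would be enough to register as a USB power status change while BAT is floating.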

Lei_Wang · December 5, 2013, 5:10pm

I can confirm that the pulsing the PMIC detects on the USB_DC signal is the USB-OTG probing.

After I disabled USB-OTG in the kernel, the system has never rebooted. Btw, I also reloaded the Angstrom image (3.8 kernel) and Andrew’s Android image (with 3.8 kernel), and I did not observe USB-OTG probing pulses on VBUS. I believe USB-OTG has not been implemented/enabled in the 3.8 kernel. That might be the reason why the 3.8 kernel doesn’t seem to have the random reboot behavior.

Lei_Wang · December 5, 2013, 6:08pm

In case anyone wants to test it out, here is the change in the source code (NOTE: ignore the line and column numbers; just search for the struct “static struct omap_musb_board_data musb_board_data” ):
…

--- a/arch/arm/mach-omap2/board-am335xevm.c
+++ b/arch/arm/mach-omap2/board-am335xevm.c
…
@@ -3956,7 +4125,8 @@ static struct omap_musb_board_data musb_board_data = {
          * mode[4:7] = USB1PORT's mode
          * AM335X beta EVM has USB0 in OTG mode and USB1 in host mode.
          */
-        .mode = (MUSB_HOST << 4) | MUSB_OTG,
+//      .mode = (MUSB_HOST << 4) | MUSB_OTG,
+        .mode = (MUSB_HOST << 4) | MUSB_PERIPHERAL,
         .power = 500,
         .instances = 1,
 };
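
For readers unfamiliar with the .mode field, the following Python sketch mirrors how the value is packed. It assumes the enum values from the kernel's musb header (MUSB_HOST = 1, MUSB_PERIPHERAL = 2, MUSB_OTG = 3) and that the low nibble holds the USB0 mode, as the comment in the struct suggests; please verify both against your own tree. The helper names are invented:

```python
from enum import IntEnum


class MusbMode(IntEnum):
    # Assumed to match enum musb_mode in include/linux/usb/musb.h.
    UNDEFINED = 0
    HOST = 1
    PERIPHERAL = 2
    OTG = 3


def pack_mode(usb0: MusbMode, usb1: MusbMode) -> int:
    """Low nibble carries the USB0 mode, high nibble the USB1 mode."""
    return (usb1 << 4) | usb0


def unpack_mode(mode: int):
    """Split a packed mode value back into (usb0, usb1)."""
    return MusbMode(mode & 0xF), MusbMode((mode >> 4) & 0xF)


if __name__ == "__main__":
    original = pack_mode(MusbMode.OTG, MusbMode.HOST)
    patched = pack_mode(MusbMode.PERIPHERAL, MusbMode.HOST)
    print(hex(original), unpack_mode(original))
    print(hex(patched), unpack_mode(patched))
```

Under these assumptions the patch only changes the USB0 nibble (OTG to peripheral); USB1 stays in host mode.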

Please let me know the results.

Thanks.

Illutian_Kade · December 5, 2013, 6:13pm

Crap…now to figure out if this is doable (by me) for Debian Wheezy.
…if computers become sentient, it’s probably because I goofed something up.

Illutian_Kade · December 10, 2013, 3:45pm

Sigh, nope, Angstrom 3.8 (disabled OTG detection) didn’t fix the issue either.

As of late, the BBB actually powers down completely (power LED is off). Pressing the power button, reset button, or even the BOOT button does nothing. Hell, even unplugging the power and plugging it back in doesn’t do anything. If I wait several minutes I can get the thing to power on…for about 50 seconds, then it goes dead again.

Appears that it is, as others suggested, a physical issue…crap.

Gerald_Coley1 · December 10, 2013, 4:20pm

Please request an RMA so we can look at it. Make sure it is in the failed state and that you let the RMA team know how to get it in that state and recover.

Gerald

Thomas_J · February 10, 2014, 10:55am

Hi,
Any news on this issue? I have the same problem with my BeagleBoard-xM: even with Linux not running anything, it still reboots after 1 to 5 minutes.
If there is anything electrical I can do to make this work (add a capacitor in front of the 5V power supply, …), I can do it; I have lots of electronic tools at work.

Btw, I am logged in through tty and I see

Broadcast message from root@arm
        (unknown) at 11:40 ...

The system is going down for reboot NOW!
[  143.036193] Restarting system.

So it might not be a purely physical problem: the system knows about the reboot.
(Linux version 3.7.10-x9 on Ubuntu 12.10 and beagleboard xm)

Illutian_Kade · February 10, 2014, 7:04pm

Turns out it was a faulty PMIC. It’s been fixed and is happily blinking away

Jukka_Mykkanen · February 19, 2014, 9:49am

Hey,

I have two BBBs (A6) that reboot randomly with Ubuntu installed. How was the PMIC fixed? Did you do it yourself or send it back to the shop?

-Jukka

On Monday, February 10, 2014 at 21:04:51 UTC+2, Illutian Kade wrote:

Illutian_Kade · February 19, 2014, 2:09pm

I RMA-ed it using the Beagle Board website. It took about 20 days, round trip, to get the board back. Even if I had the skills to replace chips, I doubt I would have been able to find the fault…I have no idea how they diagnose issues on their boards.

But you may have a different issue, as mine eventually stopped booting back up after randomly rebooting for several days. It might be harder for them to find the issue: they told me to leave the board in a “failed state” and ship it to them, but if yours keeps rebooting, it will never be in a ‘failed state’.

Thomas_J · February 19, 2014, 2:59pm

The problem was fixed on my side with the new kernel
https://github.com/RobertCNelson/armv7-multiplatform

Damien1 · February 24, 2014, 4:19am

Hi Robert,
Do you know more details about the time jump issue in U-Boot? For example, the fix commit number, or some words used in the commit message?
I would like to find out exactly what the fixes are, but there are too many commits in U-Boot and I need something to search with.

Regards,
Damien

RobertCNelson · February 24, 2014, 4:23am

Probably… http://git.denx.de/?p=u-boot.git;a=commit;h=000820b5835c2b8b863af992b66dc973dc4bd202

Damien1 · February 26, 2014, 3:34am

Strange … I checked the commit and found it was already included in the v2013.04 U-Boot release … but my BB board still experiences this time jump issue once or twice per day. Could there be anything else?
Regards.
Damien

Lei_Wang · February 26, 2014, 2:50pm

Damien,
Check the link on TI e2e:
http://e2e.ti.com/support/embedded/android/f/509/t/308616.aspx
It basically switches the clock source from the 32K clock to the 24MHz clock. That eventually fixed my time jump problem.

Good luck!

Lei