[Issue] BeagleBone Black Random Reboot

Has anyone experienced their BBB rebooting at random?

It had been running solidly for two days straight, and this morning it just keeps rebooting. The last time, the power light even went out.

*I’ve checked, and all cables are secure. The entire setup runs off a PFC UPS, so power fluctuations shouldn’t be an issue.

*The CPU doesn’t seem to be under load (~20% utilization).

*I’ve done a full unplug, let it sit, and restarted it.

*From the logs I’ve looked through (messages and kern.log), the board does a full restart and even loses the date (it goes from Oct 5 to Aug 26).

*I’ve not made any changes other than unplugging the Syba C-Media USB sound card and moving it back to my main computer after I wake up (the BBB doubles as a media player).
-This routine has worked for a total uptime of 2 days and some odd hours with no issues.

…I really hope this doesn’t mean the board is defective :\

OS: Debian Wheezy (BBB-eMMC-flasher-debian-7.1-2013-08-26.img)

Power: AC Adapter (http://www.adafruit.com/products/276) [specifically recommended by Adafruit]

UPDATE

It appears the DC port on the board failed.

That would be a first! I suspect it may be a grounding issue.

Gerald

Ya, I figured it was something wrong with the port. The AC adapter works fine; it reads 5.1 V on a meter.

The board works fine when powered over USB through the Debug Port.

I wonder if I can use a 5 V, 2 A “Fast Charger” port, or if it’ll burn out the BBB’s Debug Port…

Good that you know the reason.
But it is always better to print the PMIC’s last boot status, or even the processor’s last reset reason, during the next boot, so that you can see why the board rebooted.
https://android.googlesource.com/kernel/omap.git/+/android-omap-tuna-3.0-ics-mr1/arch/arm/mach-omap2/resetreason.c
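The linked resetreason.c appears to target the OMAP4-based “tuna” Android kernel, so its registers don’t carry over to the BBB as-is. A minimal sketch of the same idea for the AM335x, assuming the reset status is in the PRM_DEVICE PRM_RSTST register (0x44E00F08) with the bit positions below, as I read them from the AM335x TRM; verify both against your TRM revision. It only decodes a raw value; reading the register (e.g. with a tool such as devmem2, or early in boot before anything clears it) is left out.

```python
# Sketch: decode an AM335x PRM_RSTST value into reset-reason names.
# ASSUMPTION: address and bit positions as read from the AM335x TRM;
# check them against your TRM revision before trusting the output.
PRM_RSTST_ADDR = 0x44E00F08  # PRM_DEVICE base 0x44E00F00 + offset 0x08 (assumed)

RESET_BITS = {
    0: "GLOBAL_COLD_RST",      # power-on / cold reset
    1: "GLOBAL_WARM_SW_RST",   # software-triggered warm reset
    4: "WDT1_RST",             # watchdog timer 1 expired
    5: "EXTERNAL_WARM_RST",    # external warm reset pin
    9: "ICEPICK_RST",          # reset requested through the debug (ICEPick) module
}


def decode_reset_reason(value: int) -> list[str]:
    """Return the names of all reset sources whose bit is set in `value`."""
    return [name for bit, name in sorted(RESET_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Example raw values, as they might be read from PRM_RSTST.
    for raw in (0x1, 0x10, 0x22):
        print(hex(raw), decode_reset_reason(raw) or ["UNKNOWN"])
```

If the PMIC really is cutting SYS_5V, the processor would most likely just report a cold (power-on) reset, which is why the PMIC’s own status is worth logging too.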

Kavitha

You will not burn the board, but nothing will change: when the board is powered from a USB port, the software limits the current to a certain level, and when that limit is exceeded the board reboots. So even if you supply 100 A to the USB port, the PMIC will limit it.

That’s a bit of a relief.

And now the board won’t power on from the Debug Port, but it will from the DC jack. At this point I’m about to say “fuck this shit”.

I have a similar issue. I have two revisions of the BBB (A5A and A5C). Both randomly reboot themselves while running the TI prebuilt BBB Android image (anywhere from a couple of hours to ten or fifteen hours in). When I plug in the USB cable for logging with logcat (with DC still powered), the reboot issue seems to disappear.

I don’t have the problem with the BBB Angstrom image (based on the 3.8 kernel), nor with Andrew Henderson’s Android image (also based on the 3.8 kernel).

Another issue is that when I run the TI BBB Android image, the clock randomly jumps forward 2^17 seconds. This happens on both of my BBB boards. The problem goes away when I run Angstrom or Andrew’s Android image.
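For scale, 2^17 s = 131,072 s, roughly 36.4 hours, so the jump stands out clearly in a series of epoch timestamps. A small sketch for flagging it, assuming you have already extracted the timestamps as a list of seconds (how you pull them from logcat or syslog is up to you); the 60 s tolerance is an arbitrary choice.

```python
# Sketch: flag forward clock jumps of about 2**17 seconds in a series of
# epoch timestamps. The default tolerance is an arbitrary assumption.
JUMP_SECONDS = 2 ** 17  # 131072 s, about 36.4 hours


def find_clock_jumps(timestamps, jump=JUMP_SECONDS, tolerance=60):
    """Return indices i where timestamps[i] - timestamps[i - 1] is within
    `tolerance` seconds of `jump`."""
    hits = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if abs(delta - jump) <= tolerance:
            hits.append(i)
    return hits


if __name__ == "__main__":
    log = [1000, 1010, 1020, 1020 + JUMP_SECONDS + 5, 1030 + JUMP_SECONDS]
    print(find_clock_jumps(log))  # -> [3]
```

A jump of exactly a power of two seconds hints at a single flipped bit in a counter rather than a drifting clock, which fits a bootloader or driver bug better than a hardware fault.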

I suspect it has something to do with the processor, the DDR3 (BBB: AM3359 1 GHz + 512 MB DDR3), the configuration, or the errata workarounds applied. We have an AM335x EVM kit (AM3359 720 MHz + 256 MB DDR2) on which I also loaded the TI prebuilt Android image. I have run it for several months and it is rock solid; I never had a problem with it.

Here are the links to my other posts regarding this issue.
https://groups.google.com/forum/#!category-topic/beagleboard/advanced/5qSJ4dQdar4

http://e2e.ti.com/support/embedded/android/f/509/t/297726.aspx

This time-jump ‘issue’ was fixed in u-boot sometime last year, so I’m guessing the TI image has an unpatched u-boot.

Regards,

I think Robert is correct. I fetched a snapshot of u-boot (2013.10), built it, and dropped it in. The time jump problem seems to be going away.

I also did some testing with TI’s kernel. I disabled the 1 GHz, 720 MHz, and 600 MHz OPPs, so the highest OPP is 500 MHz. The BBB seems to run much more robustly: I ran the system overnight without seeing any random reboots.
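A similar cap can probably be tried at runtime, without rebuilding the kernel, through the standard Linux cpufreq sysfs files. This is a swapped-in alternative to removing OPPs from the kernel as described above, not the same change: it assumes the running kernel has cpufreq enabled for cpu0 and that the script runs as root. Frequencies in these files are in kHz.

```python
# Sketch: cap the CPU clock at runtime through the Linux cpufreq sysfs
# interface, as an alternative to removing OPPs from the kernel source.
# ASSUMPTIONS: cpufreq is enabled for cpu0 and this runs as root.
# Values in these sysfs files are in kHz.
from pathlib import Path

CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def cap_cpu_freq(max_khz: int, cpufreq_dir: Path = CPUFREQ_DIR) -> int:
    """Write max_khz to scaling_max_freq and return the value read back.

    The kernel may clamp the request to the supported range, so check the
    returned value rather than assuming the request took effect."""
    target = cpufreq_dir / "scaling_max_freq"
    target.write_text(f"{max_khz}\n")
    return int(target.read_text().strip())


def available_freqs_khz(cpufreq_dir: Path = CPUFREQ_DIR) -> list[int]:
    """List the frequencies the driver advertises, or [] if it doesn't."""
    f = cpufreq_dir / "scaling_available_frequencies"
    return [int(x) for x in f.read_text().split()] if f.exists() else []
```

On the board, calling `cap_cpu_freq(500_000)` as root would request the same 500 MHz ceiling. Unlike a kernel rebuild, the setting does not persist across reboots, so it would have to be reapplied from a boot script.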

I wonder if someone has some insight into this random reboot problem.
Why does a slower clock seem to help?
Why does a USB connection also seem to help (it doesn’t seem to decrease the BogoMIPs)?
What is the main difference between the 3.8 and 3.2 kernels?

Thanks!

It could be a power ‘level’ issue. Most modern sound systems have a built-in kill switch: if the volume spikes, the speakers are turned off to prevent damage… The BBB probably has a similar mechanism that prevents it from receiving too much power.

It has to be something like that, because on a whim I tried running my BBB before going to bed. In the time it took to get ready for bed (~5 min), the BBB had powered down on its own (no power LED). Sometimes it runs for days with no issues…other times, as mentioned before, mere minutes.

And this happens on both USB and AC power (the adapter is the one recommended by Adafruit).

I confirm that I have the same issue with a BBB A5B using TI 3.2 kernel.

After reading the full thread, I guess the possible workarounds are:

  • Using Angstrom 3.8 kernel.
  • Powering the BBB from USB.
  • Limiting the CPU frequency.

Can anybody confirm it?

Best regards.

The Angstrom 3.8 kernel seems more robust, but I can’t confirm that it resolves the random reboot issue. I haven’t done long-term testing, and I remember seeing someone else report the problem with it too. It could just be that that kernel is more optimized and draws less current.

Limiting the CPU frequency only made the BBB reboot less often; I can confirm that. I believe it is also related to the current draw: the slower the clock, the less power is drawn from the PMIC.
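The usual first-order model of CMOS switching power is consistent with this. It covers only the dynamic part of the core’s consumption (leakage, DDR, HDMI, and USB loads are ignored), with α the activity factor, C the switched capacitance, V_DD the core voltage, and f the clock. On the AM335x, the lower OPPs also run at a lower core voltage, so dropping from 1 GHz to 500 MHz cuts dynamic core power by at least half, and by more through the squared voltage term; the actual voltages depend on the OPP table in the kernel.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
\frac{P_{500\,\text{MHz}}}{P_{1\,\text{GHz}}}
  = \frac{500\ \text{MHz}}{1000\ \text{MHz}}
    \left(\frac{V_{DD,500}}{V_{DD,1000}}\right)^{2}
  = \frac{1}{2}\left(\frac{V_{DD,500}}{V_{DD,1000}}\right)^{2}
```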

Plugging in both the DC power and the USB cable (mini connector) increases the PMIC’s current capacity. The PMIC has two power paths (AC->SYS and USB->SYS), which reduces the stress on a single path (AC->SYS). I suspect the random reboot is caused by the PMIC’s thermal shutdown. Please take a look at the other thread I mentioned in a previous post. Jakub suggested adding a MOSFET to bypass the VDD_5V-to-SYS_5V path after bootup.

BTW, could you tell me about your setup besides the TI prebuilt kernel? Are you driving an HDMI display or an LCD cape? What else do you connect to the BBB?

Thanks,

Lei

After reading Jakub’s thread, my new conclusion is that this seems to be a hardware problem (related to the PMIC). That may explain why changing the kernel or the CPU frequency doesn’t resolve the problem (it only minimizes it).

I will test Jakub’s hardware workaround (an external MOSFET) on my own system.

About your questions:

  • I’m using a BBB A5B with a custom cape that includes an LVDS converter for an LCD, a resistive touchscreen, a MAX3232 converter for an RS232 port, and a buffer for some push buttons.

  • I have recompiled the TI 3.2 kernel (from LINUXEZSDK-BONE v06.00) to include the proper LCD initialization.

  • The system runs a Qt application with CPU load below 10%. The CPU runs at the default 1 GHz frequency all the time.

  • I have two identical systems under test, and I get one reboot on each every 2 to 3 hours, but some days both systems run without reboots for 10 to 15 hours.

Best regards.

I have captured a BBB A5B random reboot with the oscilloscope (see attached image).

Ch1 → VDD_3V3B (P9.4)
Ch2 → VDD_5V (P9.6)
Ch3 → SYS_5V (P9.8)
Ch4 → SYS_RESETn (P9.10)

This confirms that the random reboot is caused by SYS_5V dropping out for 1 second.
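If the scope can export the channels as CSV, the length of each SYS_5V dropout can be measured rather than read off the screen, which also makes it easy to check whether every event lasts about 1 s. A minimal sketch, assuming the samples have already been parsed into (time in seconds, volts) pairs sorted by time; the 4.0 V threshold is an arbitrary choice, not a datasheet value.

```python
# Sketch: measure how long a rail stays below a threshold in a scope capture.
# ASSUMPTIONS: samples are (time_s, volts) pairs sorted by time; the 4.0 V
# default threshold is arbitrary.
def dropouts(samples, threshold=4.0):
    """Return (start_s, duration_s) for each interval where volts < threshold.

    Edges are taken at sample times, without interpolation."""
    events = []
    start = None
    for t, v in samples:
        if v < threshold and start is None:
            start = t                           # falling edge
        elif v >= threshold and start is not None:
            events.append((start, t - start))   # rising edge closes the interval
            start = None
    if start is not None:                       # still low at the end of the capture
        events.append((start, samples[-1][0] - start))
    return events


if __name__ == "__main__":
    trace = [(0.0, 5.0), (0.5, 5.0), (1.0, 0.2), (1.5, 0.1), (2.0, 5.0), (2.5, 5.0)]
    print(dropouts(trace))  # -> [(1.0, 1.0)]
```

Running the same function over the SYS_RESETn channel and comparing start times would also show which signal falls first.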

Best regards.

capture.png

Thanks for sharing the captured trace; it is very helpful. Could you zoom in further on the falling edge? I am really interested in which drops first, SYS_5V or SYS_RESETn.

I also noticed in the TPS65217 datasheet that if the nRESET pin is pulled low, there is a minimum 1 s delay before the PMIC returns to the Active state (page 15). The nRESET pin is not connected on the BBB. It should have an internal 100 kΩ pull-up (presumably to an always-on supply). TI’s engineers may have better insight on this.

I did some measurements of the current consumption of different kernels (shown below). As you can see, the 3.8 kernels do draw less current.

My guess is that some piece of the kernel is setting PWR_EN low (see pages 15 & 16 in the TPS65217 Datasheet). This would explain the 1 second duration of the off state.
The system could restart due to PWR_EN floating high.
It does not explain why connecting to the USB power prevents the random reset.

The PMIC_POWR_EN is driven by the RTC module in the AM335x (see section 20.3.3.8 in the AM335x TRM).
Is there a difference in the way the 3.2 and 3.8 kernels handle the RTC or sleep modes?

I doubt the issue is connected with the floating nRESET pin. If you look at “Figure 1. Global State Diagram” on page 16, the PMIC always waits for 1 second after a FAULT. Please read the following:

FAULT = UVLO || OTS || PGOOD low || PWR_EN pin not asserted within 5s of Wakeup event.
If no battery is present, OVP on AC input also leads to OFF mode. With battery present, device switches automatically from AC to BAT if AC is > 6.5V and back to AC when voltage recovers to < 6.5V.
Device will remain in RESET state for at least 1s.

UVLO = Under Voltage Lockout

OTS = Over Temperature Shutdown

OVP = Over Voltage Protection
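To make the quoted condition concrete, here is a small model of the FAULT expression and the 1 s minimum RESET hold. It encodes only the datasheet text quoted above, not the PMIC’s actual internal logic. The field names are hypothetical, and reading “not asserted within 5s” as “still not asserted after 5 s” is an assumption. The OVP clause is left out because the quote routes it to OFF mode rather than FAULT.

```python
# Sketch: the quoted TPS65217 FAULT condition as a predicate.
# This models the datasheet text only; field names are hypothetical and
# "not asserted within 5s" is read as "still not asserted after 5 s".
from dataclasses import dataclass

PWR_EN_TIMEOUT_S = 5.0   # PWR_EN must be asserted within 5 s of a wakeup event
MIN_RESET_HOLD_S = 1.0   # device stays in RESET for at least 1 s


@dataclass
class PmicState:
    uvlo: bool = False            # under-voltage lockout tripped
    ots: bool = False             # over-temperature shutdown tripped
    pgood: bool = True            # PGOOD is high
    pwr_en: bool = True           # PWR_EN pin asserted
    s_since_wakeup: float = 0.0   # seconds since the last wakeup event


def fault_causes(s: PmicState) -> list[str]:
    """Return which terms of FAULT = UVLO || OTS || PGOOD low || PWR_EN late are true."""
    causes = []
    if s.uvlo:
        causes.append("UVLO")
    if s.ots:
        causes.append("OTS")
    if not s.pgood:
        causes.append("PGOOD low")
    if not s.pwr_en and s.s_since_wakeup > PWR_EN_TIMEOUT_S:
        causes.append("PWR_EN not asserted within 5 s")
    return causes


def earliest_restart_s(fault_time_s: float) -> float:
    """Earliest time the PMIC can leave RESET after a fault at fault_time_s."""
    return fault_time_s + MIN_RESET_HOLD_S
```

Because every term funnels into the same 1 s RESET hold, a 1 s SYS_5V gap on the scope is consistent with UVLO, OTS, or a PGOOD or PWR_EN problem alike; the trace by itself does not say which one fired.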

I don’t think connecting a USB cable reduces the current load on the PMIC, because there are two separate switches in the power path for the AC and USB inputs. They can’t both be turned on at the same time, because the PMIC can’t be sure the two sources are at the same voltage; if both switches were on, current could flow backwards into the weaker source IMHO. Probably it’s really a grounding issue, as Gerald said before.