Python hangs while reading the ADC on a BeagleBone Black

I have a Python-based data collection program that runs forever as a service. It runs perfectly for about 3 to 4 weeks, then hangs. kill -9 has no effect, and ps shows it still running. If I attempt to manually start the program in a terminal, it hangs as well and also cannot be killed. So far the only solution is to reboot. My assumption is that the ADC is stuck and that the Python ADC library does not have a timeout failsafe. Is there a way to reset the ADC hardware via a direct register write as an experiment? If that experiment proves the root cause, I should be able to alter the ADC library to add a recovery method.

Sounds like a memory problem: it runs until it runs out of memory. Can you run a script, or just run top, to monitor what is going on?

I will watch the system, but it will take several days or weeks to see if there is a memory leak. All data is saved to an SD card every hour, so there shouldn’t be any memory buildup unless there is something wrong with Python or its libraries. Of note: the OS is still fully functional when this hang occurs, and other programs can be run as long as they don’t access the ADC.
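One way to watch for a leak over weeks is to log the process's resident memory at intervals. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names `parse_vmrss_kib` and `read_rss_kib` are made up for illustration, and the syntax is kept compatible with Python 3.5.

```python
def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      5120 kB"
            return int(line.split()[1])
    return None


def read_rss_kib(pid):
    """Read the resident set size of a running process, in KiB."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss_kib(f.read())
```

Calling `read_rss_kib(pid)` once an hour (for example from cron) and appending the result to a file would show whether memory use grows steadily before a hang.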

William Petty

It seems that you’re not using libpruio for ADC sampling …

Use Enable (bit 0) in register CTRL (offset 40h) to disable and re-enable the TSC subsystem.

Regards

I thought I had this conquered by fixing a bad line in the code, but it just happened again yesterday. It seems to be brought on by power disturbances that do not affect the overall system.

Problem

The Python program intermittently locks up and cannot be killed.
The Python program is spawned by a web interface program running as a root service (once, at boot time).

System

BeagleBone Black, BeagleBoard.org Debian image 2018-10-07
Uses only the internal ADC and stdout as IO.

import Adafruit_BBIO.ADC as ADC
import time
import datetime
import math
import sys

# Program is paced by polling for changes in the current time.

# Do this once every n seconds
if (CurrTime.tm_sec != LastSeconds) and ((CurrTime.tm_sec % SAMPLE_RATE) == 0):
    TankTemp = getTemp(TankPin)        # Read all temperatures
    InletTemp = getTemp(InletPin)
    OutletTemp = getTemp(OutletPin)
    CollectCur = ADC.read(CollectPin)  # Get status of all pumps
    HeatCur = ADC.read(HeatPin)
    RecircCur = ADC.read(RecircPin)

    LastSeconds = CurrTime.tm_sec
    time.sleep(SAMPLE_RATE - 0.3)
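As an aside, the same pacing can be done without polling by sleeping until the next sample boundary. This is only a sketch: `seconds_until_next_sample` is a hypothetical helper, the value of `SAMPLE_RATE` is made up, and it assumes `SAMPLE_RATE` divides 60 evenly so that epoch-based boundaries line up with `tm_sec`.

```python
import time

SAMPLE_RATE = 10  # seconds between samples (hypothetical value)


def seconds_until_next_sample(now, period=SAMPLE_RATE):
    """Seconds from `now` (epoch seconds) until the next multiple of `period`."""
    return period - (now % period)
```

The main loop would then call `time.sleep(seconds_until_next_sample(time.time()))` before each round of readings. This does not address the hang itself.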

Once hung (which may take many months to occur), the python process cannot be killed even by root.
Also, launching the program manually hangs immediately and this new process cannot be killed either.
It appears that the ADC library call never completes and is uninterruptible. My only recourse is to power cycle the system. Note that “sudo reboot now” results in an unstable system (no network, but the LEDs blink normally).
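Since the library call has no timeout, one possible failsafe is to do each read in a child process and give up waiting after a deadline. The sketch below assumes the read can be wrapped in a standalone command; `read_with_timeout` is a hypothetical helper, not part of any ADC library. It cannot unstick a hung child, because a process in uninterruptible sleep ignores signals, but the parent stays responsive and can log the failure.

```python
import subprocess


def read_with_timeout(cmd, timeout_s):
    """Run `cmd` in a child process; return its stdout text, or None on timeout.

    The child is deliberately not waited on after a timeout: if it is stuck
    in uninterruptible sleep, waiting for it would block the parent as well.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # best effort; has no effect on a child in D state
        return None
    return out.strip()
```

A caller might pass a one-shot reader script, for example `read_with_timeout(["python3", "read_adc_once.py"], 2.0)`, where `read_adc_once.py` is a made-up name for a script that prints a single sample.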

I have tried resetting the ADC manually, with no effect:

sudo busybox devmem 0x44E0d040 w 0x00000006 # Disable ADC (Verified that new register value is 0x00000006)
sudo busybox devmem 0x44E0d040 w 0x00000007 # Enable ADC (Verified that register value returned to 0x00000007)
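For reference, the address and the two values in those commands follow from the earlier suggestion (Enable is bit 0 of CTRL, at offset 40h). The sketch below is arithmetic only and performs no hardware access; the base address is inferred from the address used in the commands, and the helper names are made up.

```python
ADC_TSC_BASE = 0x44E0D000  # inferred from the devmem address above
CTRL_OFFSET = 0x40         # CTRL register offset
ENABLE_BIT = 1 << 0        # Enable is bit 0 of CTRL

ctrl_addr = ADC_TSC_BASE + CTRL_OFFSET


def with_enable_cleared(value):
    """Return the CTRL value with the Enable bit cleared."""
    return value & ~ENABLE_BIT


def with_enable_set(value):
    """Return the CTRL value with the Enable bit set."""
    return value | ENABLE_BIT
```

With the observed CTRL value of 0x7, clearing the bit gives 0x6 and setting it again gives 0x7, matching the two writes.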

The rest of the system seems unaffected and all unrelated programs run normally.

Typical ADC task looks like this:

# take N samples in a row
samples = 0
for i in range(0, NUMSAMPLES):
    samples = samples + (1 - ADC.read(Sensor))

# average all the samples out
average = samples / NUMSAMPLES
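The same averaging can be written as a function that takes the read call as a parameter, which makes it easy to swap ADC libraries or test without hardware. This is a sketch; `average_inverted` is a hypothetical name, and it assumes the read function returns a normalized value between 0.0 and 1.0, as `ADC.read` does.

```python
def average_inverted(read, pin, num_samples):
    """Average `num_samples` readings of (1 - read(pin))."""
    total = 0.0
    for _ in range(num_samples):
        total += 1.0 - read(pin)
    return total / num_samples
```

On the board this would be called as `average_inverted(ADC.read, Sensor, NUMSAMPLES)` after `ADC.setup()`.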

A similar discussion is here: linux - What is an uninterruptible process? - Stack Overflow
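To confirm that the hung process really is in uninterruptible sleep, its state letter can be read from /proc. A minimal sketch, assuming the standard Linux /proc/<pid>/stat layout; `process_state` is a hypothetical helper.

```python
def process_state(stat_text):
    """Return the one-letter state field from /proc/<pid>/stat contents."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]
```

A result of "D" means uninterruptible sleep, which is the state in which kill -9 has no effect. Reading /proc/<pid>/wchan for the same process may also show which kernel function it is blocked in.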

Is someone still supporting the Adafruit_BBIO.ADC library that can address this problem?

Try using libiio instead. Example:

import iio

iio_context = iio.Context()

class Am335xAdc:
    def __init__( self, vrefp=1.8, vrefm=0.0 ):
        self.vrefp = vrefp
        self.vrefm = vrefm
        self._device = iio_context.find_device('TI-am335x-adc.0.auto')
        if self._device is None:
            raise RuntimeError("ADC not enabled")
        self._channels = []
        self._raw = []
        for i in range(8):
            ch = self._device.find_channel( f'voltage{i}' )
            self._channels.append( ch )
            if ch is not None:
                ch = ch.attrs['raw']
            self._raw.append( ch )

    # output range 0 .. 4095
    def raw( self, ch ):
        return int( self._raw[ ch ].value )

    # output range 0.0 .. 1.0
    def value( self, ch ):
        return self.raw( ch ) / 4095

    # output range vrefm .. vrefp
    def voltage( self, ch ):
        return self.value( ch ) * ( self.vrefp - self.vrefm ) + self.vrefm

adc = Am335xAdc()

for ch in range(7):
    v = adc.voltage( ch )
    print( f'ain{ch} = {v:.2f}V' )

I just noticed how old your system is. What you’re describing sounds like a bug in the adc kernel driver, and there’s of course a good chance it’s been fixed already.

Regardless, using IIO instead of the sysfs interface could perhaps happen to avoid the problem, who knows. You may also want to look into updating your system, or at least the kernel.

You should never mess with a hardware device like that (unless you know exactly what you’re doing and how this will affect both the hardware and the kernel driver). It’s far more likely to just make things worse by creating further confusion for the driver by desynchronizing the hardware state from what the driver expects it to be.

Almost certainly it’s not the ADC itself that’s stuck, just the driver.

Thanks for the suggested code below. However, using the supplied example, I am getting the following syntax error:

debian@beaglebone:~/debug$ python3 iio_test.py
  File "iio_test.py", line 19
    ch = self._device.find_channel( f'voltage{i}' )
                                                ^

SyntaxError: invalid syntax

Another question: I notice that the class definition uses range(8) while the measurement loop uses range(7). Why the difference?

Oh right, f-strings were introduced in python 3.6 but your ancient debian 9 system still has python 3.5… you’ll need to replace the f-strings by some alternative way of formatting strings, e.g.:

change f'voltage{i}' to 'voltage%d' % i

change f'ain{ch} = {v:.2f}V' to 'ain%d = %.2fV' % (ch, v)

The ADC has 8 channels, but the beaglebone exposes only the first 7 of these as external analog inputs, hence the difference.

Hello,

I just saw this idea. Is libiio a Python built-in, or is it a library from someone making open-source software available?

Seth

The libiio python binding is part of the libiio source repository, it’s debian-packaged as python3-libiio and installed by default on current images.

Hello,

Thank you, sir.

Seth

From your suggested program I get this:
  File "iio_test.py", line 3, in <module>
    import iio
ImportError: No module named 'iio'

I have searched the file system and found libiio.so.0 in /lib/arm-linux-gnueabihf/ but no mention of it under the python sub directories. I also see this /usr/lib/modules/4.14.71-ti-r80/kernel/drivers/iio

This BB is running as a 24/7 controller, and I only have a few hours after midnight to experiment with fixing the hang issue. It is running a licensed SW package and I don’t want to risk killing it by changing too many variables at once.
You mentioned that I am running an ancient OS version. Does BeagleBoard support an in-place full upgrade? This system runs completely from the SD card (along with years’ worth of data). Flashing a new SD card with a newer OS version will require me to move and rebuild thousands of data files as well as the license keys. I am trying to avoid that.

Like I mentioned in my last comment it’s part of the python3-libiio package, which is installed by default on current images but evidently not on your old image so you’d need to install it using apt.

Upgrading a system is certainly a lot trickier than starting with a fresh image, and is not something I’d suggest doing in a production environment without having thoroughly tested it in a development/test environment… but it sounds like you don’t even have one of those.

If the system on this SD card is this precious, I do hope you’re regularly making backups?

Note that for your specific problem of the ADC hanging the most important part to upgrade would be the kernel, which is a lot safer since you can just install the new kernel and if it causes any problems just go back to the previous kernel by changing the uname_r variable in /boot/uEnv.txt (which selects the active kernel). If changing the kernel causes boot failure you can just stick the sd card into any linux system and use that to change the variable back.
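Before and after switching kernels, it can help to check which one /boot/uEnv.txt currently selects. A small sketch, assuming the file uses plain `uname_r=<version>` lines with `#` for comments; `active_kernel` is a hypothetical helper.

```python
def active_kernel(uenv_text):
    """Return the uname_r value from /boot/uEnv.txt contents, or None."""
    for line in uenv_text.splitlines():
        line = line.strip()
        # Commented-out entries start with '#', so they never match here.
        if line.startswith("uname_r="):
            return line.split("=", 1)[1]
    return None
```

For example, `active_kernel(open("/boot/uEnv.txt").read())` would return the version string that the board will boot next.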

The safest thing to try would be to upgrade to the latest patchlevel within the same kernel series you’re using currently (whichever that may be). If that doesn’t fix the problem you could try a slightly newer kernel series like 4.19-ti.

zmatt, I have set up a naked (board only) BB Black with the latest available version (debian 10.3 iot buster) and have been trying to get your sample program to run. I still can’t import iio. Where did you find the iio library? It doesn’t seem to be on the system, and apt can’t find it either.
At least now I can work on this any time and I can import all of the data and SW from the online system to create an updated SD locally. Once debugged, I can just swap SD cards.

Been away from this for a little while. Finally solved the libiio issue. It seems it doesn’t install automatically even on the most recent release, so I manually installed it and voila! I will start testing the new libiio ADC calls and see how it goes.

P.S. I still believe that the ADC or an internal bus pathway is hung, because when it hangs, even a reboot does not fix the problem until power is removed. Now, don’t get me wrong, it may be the kernel driver somehow triggering this problem, and the newly updated lib and kernel may well solve my issue.

P.P.S. From your earlier reply about twiddling register bits, I have spent the last 20 years designing and implementing 50-100 million gate disk storage network routers so I know my way around the internal guts of most SOC chips. Thanks for your help. Fingers crossed, and time will tell if this was it.

Current status:

  • New Beaglebone Black board.
  • New kernel (latest version).
  • New ADC library (switched from Adafruit to libiio).

It failed again after 58 days running. Same hanging Python process that cannot be killed.
I am fully convinced this is a problem in the AM335x chip because a reset or reboot will not clear the problem (which also rules out a memory leak). Only after a power cycle does the ADC work again.

One unlikely possibility is that the ADC logic is latching up due to some sort of voltage spike to the analog inputs which are all fed by shielded coaxial cables with a common ground.
I will be adding clamp diodes to ALL of the analog inputs as a last ditch effort before giving up on using the BBB.

If I am forced to add an external ADC hat board I might as well switch to a Raspberry Pi. I chose the BBB because it had an integrated ADC (Single board solution).

That’s actually pretty impressive… Any chance you can share your ADC reading application? I’d like to possibly add that to my am335x CI testing farm. :wink:

Regards,