Python hangs while reading ADC on BeagleBone Black

Here is the Python code. It is started once, as a service, by the web GUI program and runs forever.
Its output is consumed by the web server program that started it.

P.S. As a sanity check, the original BBB has been running this same service routine (without the web interface or sensors) for 61 days and has not hung… Yet.

#!/usr/bin/python3

import iio
import time
import datetime
import math
import sys

iio_context = iio.Context()

class Am335xAdc:
    def __init__( self, vrefp=1.8, vrefm=0.0 ):
        self.vrefp = vrefp
        self.vrefm = vrefm
        self._device = iio_context.find_device('TI-am335x-adc.0.auto')
        if self._device is None:
            raise RuntimeError("ADC not enabled")
        self._channels = []
        self._raw = []
        for i in range(8):
            ch = self._device.find_channel( 'voltage%d' % i )
            self._channels.append( ch )
            if ch is not None:
                ch = ch.attrs['raw']
            self._raw.append( ch )

    # output range 0 .. 4095
    def raw( self, ch ):
        return int( self._raw[ ch ].value )

    # output range 0.0 .. 1.0
    def value( self, ch ):
        return self.raw( ch ) / 4095

    # output range vrefm .. vrefp
    def voltage( self, ch ):
        return self.value( ch ) * ( self.vrefp - self.vrefm ) + self.vrefm

adc = Am335xAdc()

# Thermistor constants

# resistance at 25 degrees C
THERMISTORNOMINAL = 10000
# temp. for nominal resistance (almost always 25 C)
TEMPERATURENOMINAL = 25

# The beta coefficient of the thermistor (usually 3000-4000)
BCOEFFICIENT = 3435

# the value of the 'other' resistor
SERIESRESISTOR = 10000

# Calibration tweaks for each zone
TANK_CAL = 3.50
INLET_CAL = 3.30
OUTLET_CAL = 2.00

# BTU calculation constants
SAMPLE_RATE = 2 # Number of seconds per sample
GPM = 5.0 # Gallons per minute flow rate
INTERGRAL = 2.0 # Integration period in minutes
BTU_MULT = ((GPM/(60/SAMPLE_RATE)) * 8.33333)

# Define ADC channel numbers (0-7) and associated header pin
CollectPin = 4 # P9-33
HeatPin = 6 # P9-35
RecircPin = 2 # P9-37
TankPin = 1 # P9-40
InletPin = 3 # P9-38
OutletPin = 5 # P9-36

Target_ID = "thermal_1008"

def getTemp(Sensor):
    # how many samples to take and average, more takes longer but is more 'smooth'
    NUMSAMPLES = 5
    Temperature = 0

    # take N samples in a row
    samples = 0
    for i in range(0,NUMSAMPLES):
        samples=samples + (1-adc.value(Sensor))
        #print ("Sample = %d Sum = %5.0f" % (i, samples))

    # average all the samples out
    average = samples / NUMSAMPLES

    if average != 0:
        # convert the value to resistance
        average = 1 / average - 1
        average = SERIESRESISTOR / average

        # steinhart = average / THERMISTORNOMINAL         # (R/Ro)
        steinhart = THERMISTORNOMINAL / average           # (R/Ro)
        steinhart = math.log(steinhart)                   # ln(R/Ro)
        steinhart /= BCOEFFICIENT                         # 1/B * ln(R/Ro)
        steinhart += 1.0 / (TEMPERATURENOMINAL + 273.15)  # + (1/To)
        steinhart = 1.0 / steinhart                       # Invert
        steinhart -= 273.15                               # convert to C
        Temperature = ((steinhart * 9.0)/ 5.0) + 32.0     # Convert Celsius to Fahrenheit

        if Sensor == TankPin:
            Temperature = Temperature + TANK_CAL
        if Sensor == InletPin:
            Temperature = Temperature + INLET_CAL
        if Sensor == OutletPin:
            Temperature = Temperature + OUTLET_CAL

    return Temperature

# Set starting conditions

CurrTime = time.localtime()
CollectRun = False
HeatRun = False
RecircRun = False
LastSeconds = CurrTime.tm_sec
LastMinutes = CurrTime.tm_min
LastDay = CurrTime.tm_mday
Sum_BTU = 0
Sum_Inlet = 0
Sum_Outlet = 0
Sum_Delta = 0
BTU_24Hr = 0
Sum_BTU = 0
Delta_Temp = 0
SampleCount = 0

# Begin endless loop (Service routine)

while 1:

    CurrTime = time.localtime()

    #===============================================
    # Do this once every n seconds

    if (CurrTime.tm_sec != LastSeconds) and ((CurrTime.tm_sec % SAMPLE_RATE) == 0):
        TankTemp = getTemp(TankPin) # Read All Temperatures
        InletTemp = getTemp(InletPin)
        OutletTemp = getTemp(OutletPin)
        CollectCur = adc.value(CollectPin) # Get status of all pumps
        HeatCur = adc.value(HeatPin)
        RecircCur = adc.value(RecircPin)

        # Process current sensors
        if CollectCur > 0.35:
            CollectRun = 1
        else:
            CollectRun = 0

        if HeatCur > 0.15:
            HeatRun = 1
        else:
            HeatRun = 0

        if RecircCur > 0.15:
            RecircRun = 1
        else:
            RecircRun = 0

        # Average measurements
        Sum_Outlet += OutletTemp
        Sum_Inlet  += InletTemp
        SampleCount += 1

        # Process BTU calculations
        if CollectRun:
            DeltaTemp  = OutletTemp - InletTemp
            if DeltaTemp < 0:
                DeltaTemp = 0
            Sum_Delta  += DeltaTemp
            Sum_BTU    += (BTU_MULT * DeltaTemp)
        else:
            DeltaTemp = 0

        #print ("2-Sec BTU= %5.0f Sum_BTU = %5.0f BTU_MULT= %2.4f" % (Sum_BTU, (BTU_MULT * DeltaTemp), BTU_MULT))

        # Reset Seconds Trigger
        LastSeconds = CurrTime.tm_sec
        time.sleep((SAMPLE_RATE - 0.3))

    #===============================================
    # Do this once each INTERGRAL minutes

    if (CurrTime.tm_min != LastMinutes) and ((CurrTime.tm_min % INTERGRAL) == 0):
        BTU_24Hr = BTU_24Hr + Sum_BTU
        if SampleCount != 0:
            DeltaAvg = Sum_Delta / SampleCount
            InletAvg = Sum_Inlet / SampleCount
            OutletAvg = Sum_Outlet / SampleCount
        else:
            DeltaAvg = 0
            InletAvg = 0
            OutletAvg = 0
        timestamp = datetime.datetime.now().isoformat()

        print ("metric:id=%s,n=TankTemperature,vd=%0.1f,u=F, ts=%s"   % (Target_ID, TankTemp, timestamp))
        print ("metric:id=%s,n=InletTemperature,vd=%0.1f,u=F, ts=%s"  % (Target_ID, InletAvg, timestamp))
        print ("metric:id=%s,n=OutletTemperature,vd=%0.1f,u=F, ts=%s" % (Target_ID, OutletAvg, timestamp))
        print ("metric:id=%s,n=DeltaTemp,vd=%0.1f,u=F, ts=%s"         % (Target_ID, DeltaAvg, timestamp))
        print ("metric:id=%s,n=Sum_BTU,vd=%4.0f,u=BTU, ts=%s"         % (Target_ID, Sum_BTU, timestamp))
        print ("metric:id=%s,n=BTU_24Hr,vd=%6.0f,u=BTU, ts=%s"        % (Target_ID, BTU_24Hr, timestamp))
        print ("metric:id=%s,n=CollectCur,vd=%3.0f,u=W, ts=%s"        % (Target_ID, (CollectCur * 171), timestamp))
        print ("metric:id=%s,n=HeatCur,vd=%3.0f,u=W, ts=%s"           % (Target_ID, (HeatCur * 171), timestamp))
        print ("metric:id=%s,n=RecircCur,vd=%3.0f,u=W, ts=%s"         % (Target_ID, (RecircCur * 171), timestamp))
        sys.stdout.flush()

        # Reset integral accumulators
        Sum_BTU     = 0
        Sum_Delta   = 0
        Sum_Inlet   = 0
        Sum_Outlet  = 0
        SampleCount = 0

        # Reset Minutes Trigger
        LastMinutes = CurrTime.tm_min

    #===============================================
    # Do this once each 24 hours

    if (CurrTime.tm_mday != LastDay):
        BTU_24Hr = 0
        LastDay = CurrTime.tm_mday
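
For readers who want to sanity-check the arithmetic without the hardware, here is a minimal standalone sketch of the two calculations in the script above: the thermistor conversion from getTemp() (without the per-zone calibration offsets) and the per-sample BTU figure. The constants are copied from the script. The function names `reading_to_fahrenheit` and `btu_per_sample`, and the guard that returns None when the reading sits at either rail, are assumptions added for illustration and are not part of the original code.

```python
import math

# Constants copied from the script above.
THERMISTORNOMINAL = 10000
TEMPERATURENOMINAL = 25
BCOEFFICIENT = 3435
SERIESRESISTOR = 10000
SAMPLE_RATE = 2   # seconds per sample
GPM = 5.0         # gallons per minute


def reading_to_fahrenheit(value):
    """Convert a normalized ADC reading (0.0 .. 1.0) to degrees F.

    Follows the same steps as getTemp(), minus the calibration offsets.
    Returns None at the rails, where the original arithmetic would
    divide by zero (this guard is an addition, not in the original).
    """
    x = 1.0 - value
    if x <= 0.0 or x >= 1.0:
        return None
    r_calc = SERIESRESISTOR / (1.0 / x - 1.0)
    inv_t = math.log(THERMISTORNOMINAL / r_calc) / BCOEFFICIENT
    inv_t += 1.0 / (TEMPERATURENOMINAL + 273.15)
    celsius = 1.0 / inv_t - 273.15
    return celsius * 9.0 / 5.0 + 32.0


def btu_per_sample(delta_f):
    """BTU added during one sample period for a given outlet-inlet delta (F).

    Same formula as BTU_MULT * DeltaTemp, with negative deltas clamped to 0.
    """
    gallons_per_sample = GPM / (60 / SAMPLE_RATE)
    return gallons_per_sample * 8.33333 * max(delta_f, 0.0)


if __name__ == "__main__":
    print(reading_to_fahrenheit(0.5))  # mid-scale reading, about 77 F
    print(btu_per_sample(10.0))        # about 13.9 BTU per 2-second sample
```

A mid-scale reading corresponds to the nominal 25 C point, which is a quick way to confirm that the constants and wiring assumptions line up before adding the calibration offsets.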

I saw something like that happen a long time ago when I was experimenting with the ADC directly, iirc specifically when disabling the ADC without first making sure it had become idle. It seems unlikely to me that the kernel driver would make a similar mistake, but I suppose it is not completely impossible? This may be relevant since manually sampling individual channels requires the kernel driver to reconfigure the ADC for each sample, which I think (would need to double-check) requires disabling the ADC during reconfiguration.

If this is the cause, however, then the issue should be reproducible by other people, and I would have expected more people to be running into it.

Though if this does turn out to be the problem, that would explain why we haven’t seen it happen to our devices, since we configure the ADC once at boot and never reconfigure it. For reasons I don’t quite remember I ended up not using the kernel driver at all. Instead I wrote a small program that sets up DMA from the ADC to a small fixed buffer in memory and then enables the ADC in continuous operation. I made that buffer mmap()able by userspace processes so they can read the latest measurement whenever they want to. I do believe that IIO also supports continuous operation, but I’m not really familiar with it; I think back then IIO either didn’t exist yet or I just didn’t know about it.

It’s also not impossible that we do experience this but just don’t know about it since if the ADC state machine were to lock up then userspace probably wouldn’t even notice this in our particular setup. I’ll see if we can add some sort of check that logs an alert if the ADC values remain unchanged for a long time.
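
A check like that could look roughly like the following sketch. It assumes that a healthy ADC shows at least some change in its raw readings over the chosen timeout, which depends on input noise and would need to be confirmed on real hardware. The class name `StuckValueWatchdog` and the timeout value are hypothetical.

```python
import time


class StuckValueWatchdog:
    """Flags readings that have not changed for at least timeout_s seconds."""

    def __init__(self, timeout_s, now=time.monotonic):
        self.timeout_s = timeout_s
        self._now = now            # injectable clock, mainly for testing
        self._last_value = None
        self._last_change = None

    def update(self, value):
        """Record a reading; return True if it has been unchanged too long."""
        t = self._now()
        if self._last_change is None or value != self._last_value:
            # First reading, or the value moved: restart the timer.
            self._last_value = value
            self._last_change = t
            return False
        return (t - self._last_change) >= self.timeout_s
```

In use, `update()` would be called once per sample with, for example, a tuple of the raw values of all channels, and the caller would log an alert whenever it returns True.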

The root cause turned out to be internal latch-up of the ADC portion of the CPU chip, triggered by voltage spikes on the ADC inputs and/or the common ground signal. This occurred even with a single common ground point for the entire system. Adding a series of ferrite chokes to the entire wiring bundle solved the problem. Note that all ADC inputs were connected via shielded coaxial cables grounded only on the BeagleBoard side and electrically isolated at the thermistor sensing positions. Even with these precautions, noise spikes from the pump motors were still an issue.

Hi, I’m having the same problem of the ADC hanging due to voltage spikes on the ADC inputs. Is there a way to reset the ADC module, or any other solution that doesn’t require a power cycle?

Regards

After an entire year of debugging, I finally solved my issue. It turns out that the BeagleBoard design has an inductor between analog ground and digital ground. This was allowing spikes from the motor relays to latch up the internal ADC logic while the rest of the chip operated normally. I still had intermittent latch-ups even though the digital ground was tied to a common point where all external grounds (conduits, tank, pumps, etc.) were bonded with 12 ga wire, and the analog ground was tied only to the coaxial shields, which were also used as the return line of the thermistors.

I ended up using the digital ground for everything and NOT USING the AGND. It has not had any issues since. I hope this helps.

To answer your question, NO, this is a true SCR latchup within the CPU die and only removing power will reset it. I am an IC designer and know how latchup occurs very well… Too well…

Thanks for the info, I will try that solution.

I have tried resetting the ADC manually, with no effect:

sudo busybox devmem 0x44E0d040 w 0x00000006 # Disable ADC (Verified that new register value is 0x00000006)
sudo busybox devmem 0x44E0d040 w 0x00000007 # Enable ADC (Verified that register value returned to 0x00000007)
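
For anyone who wants to repeat this from a script, the following is a minimal sketch that simply wraps the same two busybox devmem writes. The address and the two register values are taken from the commands above; the function names and the `settle_s` delay between the writes are assumptions. It must run as root, it bypasses the kernel ADC driver, and reports in this thread differ on whether toggling the enable bit recovers a hung ADC.

```python
import subprocess
import time

# Address and values taken from the devmem commands in this post.
ADC_CTRL_ADDR = "0x44E0d040"
ADC_DISABLED = "0x00000006"
ADC_ENABLED = "0x00000007"


def devmem_write_cmd(value):
    """Build the argv for one 32-bit busybox devmem write to the ADC register."""
    return ["busybox", "devmem", ADC_CTRL_ADDR, "w", value]


def toggle_adc_enable(settle_s=0.1, run=subprocess.run):
    """Disable, wait briefly, then re-enable the ADC (requires root).

    settle_s is a guessed delay, not a documented requirement.
    """
    run(devmem_write_cmd(ADC_DISABLED), check=True)
    time.sleep(settle_s)
    run(devmem_write_cmd(ADC_ENABLED), check=True)
```

The `run` parameter exists only so the sequence can be exercised without touching hardware; on the board, calling `toggle_adc_enable()` with the defaults issues the same writes as the two commands above.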

Have you tried toggling bit 4, “Power_Down” (write 1 to power down the AFE; the tsc_adc_ss enable, bit 0, should also be set to off)?

Thanks
Regards

Yes, I tried bit banging all of the ADC register bits to no avail. Like you, I wanted a solution that did not involve power cycling. Not using AGND was the solution for applications that use the ADC across different physical pieces of equipment.

In my case, I was using 10K thermistors to sense temperature 4 to 8 feet away from the BBB. I used coaxial cables with the shield tied to AGND. The thermistors and coax were electrically isolated from all points being measured, and the latch-up was still triggered about once every couple of months until I switched to DGND.

We tried using DGND, but it wasn’t the solution for us so far. Interestingly, toggling the ADC enable bit as you posted fixes it, and I got the ADC running again. I’m still testing because sometimes I notice a slight drift in the measurements, but at least it’s not stuck.

Regards