Python hangs while reading ADC BB Black

Here is the Python code. It is started once, as a service, by the web gui program and runs forever.
Its output is consumed by the web server program that started it.

P.S. As a sanity check, the original BBB has been running this same service routine (without the web interface or sensors) for 61 days and has not hung… Yet.

#!/usr/bin/python3

import iio
import time
import datetime
import math
import sys

iio_context = iio.Context()

class Am335xAdc:
    def __init__( self, vrefp=1.8, vrefm=0.0 ):
        self.vrefp = vrefp
        self.vrefm = vrefm
        self._device = iio_context.find_device('TI-am335x-adc.0.auto')
        if self._device is None:
            raise RuntimeError("ADC not enabled")
        self._channels = []
        self._raw = []
        for i in range(8):
            ch = self._device.find_channel( 'voltage%d' % i )
            self._channels.append( ch )
            if ch is not None:
                ch = ch.attrs['raw']
            self._raw.append( ch )

    # output range 0 .. 4095
    def raw( self, ch ):
        return int( self._raw[ ch ].value )

    # output range 0.0 .. 1.0
    def value( self, ch ):
        return self.raw( ch ) / 4095

    # output range vrefm .. vrefp
    def voltage( self, ch ):
        return self.value( ch ) * ( self.vrefp - self.vrefm ) + self.vrefm

adc = Am335xAdc()

# Thermistor constants
# resistance at 25 degrees C
THERMISTORNOMINAL = 10000
# temp. for nominal resistance (almost always 25 C)
TEMPERATURENOMINAL = 25
# The beta coefficient of the thermistor (usually 3000-4000)
BCOEFFICIENT = 3435
# the value of the 'other' resistor
SERIESRESISTOR = 10000

# Calibration tweaks for each zone
TANK_CAL = 3.50
INLET_CAL = 3.30
OUTLET_CAL = 2.00

# BTU Calculation constants
SAMPLE_RATE = 2 # Number of seconds per sample
GPM = 5.0 # Gallons per Minute Flow Rate
INTERGRAL = 2.0 # Integration period in minutes
BTU_MULT = ((GPM/(60/SAMPLE_RATE)) * 8.33333)

# Define ADC channel numbers (0-7) and associated header pin
CollectPin = 4 # P9-33
HeatPin = 6 # P9-35
RecircPin = 2 # P9-37
TankPin = 1 # P9-40
InletPin = 3 # P9-38
OutletPin = 5 # P9-36

Target_ID = "thermal_1008"

def getTemp(Sensor):
    # how many samples to take and average, more takes longer but is more 'smooth'
    NUMSAMPLES = 5
    Temperature = 0

    # take N samples in a row
    samples = 0
    for i in range(0,NUMSAMPLES):
        samples=samples + (1-adc.value(Sensor))
        #print ("Sample = %d Sum = %5.0f" % (i, samples))

    # average all the samples out
    average = samples / NUMSAMPLES

    if average != 0:
        # convert the value to resistance
        average = 1 / average - 1
        average = SERIESRESISTOR / average

        # steinhart = average / THERMISTORNOMINAL         # (R/Ro)
        steinhart = THERMISTORNOMINAL / average           # (R/Ro)
        steinhart = math.log(steinhart)                   # ln(R/Ro)
        steinhart /= BCOEFFICIENT                         # 1/B * ln(R/Ro)
        steinhart += 1.0 / (TEMPERATURENOMINAL + 273.15)  # + (1/To)
        steinhart = 1.0 / steinhart                       # Invert
        steinhart -= 273.15                               # convert to C
        Temperature = ((steinhart * 9.0)/ 5.0) + 32.0     # Convert Celsius to Fahrenheit

        if Sensor == TankPin:
            Temperature = Temperature + TANK_CAL
        if Sensor == InletPin:
            Temperature = Temperature + INLET_CAL
        if Sensor == OutletPin:
            Temperature = Temperature + OUTLET_CAL

    return Temperature

# Set starting conditions

CurrTime = time.localtime()
CollectRun = False
HeatRun = False
RecircRun = False
LastSeconds = CurrTime.tm_sec
LastMinutes = CurrTime.tm_min
LastDay = CurrTime.tm_mday
Sum_BTU = 0
Sum_Inlet = 0
Sum_Outlet = 0
Sum_Delta = 0
BTU_24Hr = 0
DeltaTemp = 0
SampleCount = 0

# Begin endless loop (service routine)
while 1:

    CurrTime = time.localtime()

    #===============================================
    # Do this once every n seconds
    if (CurrTime.tm_sec != LastSeconds) and ((CurrTime.tm_sec % SAMPLE_RATE) == 0):
        TankTemp = getTemp(TankPin) # Read All Temperatures
        InletTemp = getTemp(InletPin)
        OutletTemp = getTemp(OutletPin)
        CollectCur = adc.value(CollectPin) # Get status of all pumps
        HeatCur = adc.value(HeatPin)
        RecircCur = adc.value(RecircPin)

        # Process current sensors
        if CollectCur > 0.35:
            CollectRun = 1
        else:
            CollectRun = 0

        if HeatCur > 0.15:
            HeatRun = 1
        else:
            HeatRun = 0

        if RecircCur > 0.15:
            RecircRun = 1
        else:
            RecircRun = 0

        # Average measurements
        Sum_Outlet += OutletTemp
        Sum_Inlet  += InletTemp
        SampleCount += 1

        # Process BTU calculations
        if CollectRun:
            DeltaTemp  = OutletTemp - InletTemp
            if DeltaTemp < 0:
                DeltaTemp = 0
            Sum_Delta  += DeltaTemp
            Sum_BTU    += (BTU_MULT * DeltaTemp)
        else:
            DeltaTemp = 0

        #print ("2-Sec BTU= %5.0f Sum_BTU = %5.0f BTU_MULT= %2.4f" % (Sum_BTU, (BTU_MULT * DeltaTemp), BTU_MULT))

        # Reset Seconds Trigger
        LastSeconds = CurrTime.tm_sec
        time.sleep((SAMPLE_RATE - 0.3))

    #===============================================
    # Do this once each INTERGRAL minutes
    if (CurrTime.tm_min != LastMinutes) and ((CurrTime.tm_min % INTERGRAL) == 0):
        BTU_24Hr = BTU_24Hr + Sum_BTU
        if SampleCount != 0:
            DeltaAvg = Sum_Delta / SampleCount
            InletAvg = Sum_Inlet / SampleCount
            OutletAvg = Sum_Outlet / SampleCount
        else:
            DeltaAvg = 0
            InletAvg = 0
            OutletAvg = 0
        timestamp = datetime.datetime.now().isoformat()

        print ("metric:id=%s,n=TankTemperature,vd=%0.1f,u=F, ts=%s"   % (Target_ID, TankTemp, timestamp))
        print ("metric:id=%s,n=InletTemperature,vd=%0.1f,u=F, ts=%s"  % (Target_ID, InletAvg, timestamp))
        print ("metric:id=%s,n=OutletTemperature,vd=%0.1f,u=F, ts=%s" % (Target_ID, OutletAvg, timestamp))
        print ("metric:id=%s,n=DeltaTemp,vd=%0.1f,u=F, ts=%s"         % (Target_ID, DeltaAvg, timestamp))
        print ("metric:id=%s,n=Sum_BTU,vd=%4.0f,u=BTU, ts=%s"         % (Target_ID, Sum_BTU, timestamp))
        print ("metric:id=%s,n=BTU_24Hr,vd=%6.0f,u=BTU, ts=%s"        % (Target_ID, BTU_24Hr, timestamp))
        print ("metric:id=%s,n=CollectCur,vd=%3.0f,u=W, ts=%s"        % (Target_ID, (CollectCur * 171), timestamp))
        print ("metric:id=%s,n=HeatCur,vd=%3.0f,u=W, ts=%s"           % (Target_ID, (HeatCur * 171), timestamp))
        print ("metric:id=%s,n=RecircCur,vd=%3.0f,u=W, ts=%s"         % (Target_ID, (RecircCur * 171), timestamp))
        sys.stdout.flush()

        # Reset integration accumulators
        Sum_BTU     = 0
        Sum_Delta   = 0
        Sum_Inlet   = 0
        Sum_Outlet  = 0
        SampleCount = 0

        # Reset Minutes Trigger
        LastMinutes = CurrTime.tm_min

    #===============================================
    # Do this once each 24 hours
    if (CurrTime.tm_mday != LastDay):
        BTU_24Hr = 0
        LastDay = CurrTime.tm_mday
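
The thermistor arithmetic in getTemp() can be checked on a desktop without the ADC. Below is a minimal sketch, assuming the same constants as the script; the helper name ratio_to_fahrenheit is hypothetical, and it leaves out the ADC reads and the per-zone calibration offsets. It also returns None at the two ends of the range, because the divider formula is undefined there: in getTemp() an averaged ratio of exactly 1 (every raw reading 0) makes `1 / average - 1` zero, so the following division raises ZeroDivisionError.

```python
import math

# Constants copied from the script above.
THERMISTORNOMINAL = 10000    # ohms at the nominal temperature
TEMPERATURENOMINAL = 25      # degrees C
BCOEFFICIENT = 3435
SERIESRESISTOR = 10000       # ohms

def ratio_to_fahrenheit(ratio):
    """Convert an averaged (1 - adc.value) ratio to degrees F.

    Hypothetical helper: same arithmetic as getTemp(), without the
    calibration offsets. Returns None for ratios outside the open
    interval (0, 1), where the divider formula is undefined.
    """
    if ratio <= 0.0 or ratio >= 1.0:
        return None
    resistance = SERIESRESISTOR / (1.0 / ratio - 1.0)
    inv_kelvin = math.log(THERMISTORNOMINAL / resistance) / BCOEFFICIENT
    inv_kelvin += 1.0 / (TEMPERATURENOMINAL + 273.15)
    celsius = 1.0 / inv_kelvin - 273.15
    return celsius * 9.0 / 5.0 + 32.0
```

A mid-scale ratio of 0.5 makes the computed resistance equal to THERMISTORNOMINAL, so the log term vanishes and the result is the nominal 25 C, i.e. 77 F before calibration.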

I have seen something like that happen a long time ago when I was experimenting with the ADC directly, iirc specifically when disabling the ADC without first making sure it had become idle. It seems unlikely to me that the kernel driver would make a similar mistake, but I suppose it is not completely impossible? This may be relevant since manually sampling individual channels requires the kernel driver to reconfigure the ADC for each sample, which I think (would need to double-check) requires disabling the ADC during reconfiguration.

If this is the cause, however, then the issue should be reproducible by other people, and I would have expected more people to be running into it.

Though if this does turn out to be the problem, then that explains why we haven’t seen it happen to our devices, since we configure the ADC once at boot and never reconfigure it. For reasons I don’t quite remember, I ended up not using the kernel driver at all. Instead I wrote a small program that set up DMA from the ADC to a small fixed buffer in memory and then enabled the ADC in continuous operation. I made that buffer mmap()able by userspace processes so they can read the latest measurement whenever they want to know. I do believe that IIO also supports continuous operation, but I’m not really familiar with it; I think back then IIO either didn’t exist yet or I just didn’t know about it.
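
To illustrate only the userspace side of that idea, here is a rough Python sketch of reading the latest sample from such a mapped buffer. It is not the actual program: the path passed in, the eight-channel count, and the layout (one little-endian unsigned 16-bit value per channel at offset 0) are all assumptions made for the example.

```python
import mmap
import struct

NUM_CHANNELS = 8                          # assumed: one slot per ADC input
SAMPLE_FORMAT = "<%dH" % NUM_CHANNELS     # assumed: little-endian uint16 each
BUFFER_SIZE = struct.calcsize(SAMPLE_FORMAT)

def read_latest(path):
    """Return the most recent value of every channel as a tuple of ints.

    `path` is a hypothetical file or device node that exposes the buffer.
    """
    with open(path, "rb") as f:
        # Map the buffer read-only and decode it in one go.
        with mmap.mmap(f.fileno(), BUFFER_SIZE, access=mmap.ACCESS_READ) as m:
            return struct.unpack(SAMPLE_FORMAT, m[:BUFFER_SIZE])
```

A long-running reader would more likely keep the mapping open instead of remapping on every call; the version here favours brevity.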

It’s also not impossible that we do experience this but just don’t know about it since if the ADC state machine were to lock up then userspace probably wouldn’t even notice this in our particular setup. I’ll see if we can add some sort of check that logs an alert if the ADC values remain unchanged for a long time.
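
Such a check could be as simple as the following sketch. The class name and the 60-second default are made up for the example, and it assumes that a healthy input has enough noise that the raw reading changes at least occasionally, which may not hold for a very quiet signal.

```python
import time

class StuckValueDetector:
    """Flags a channel whose raw reading has not changed for too long.

    Hypothetical helper; the 60-second default is an arbitrary assumption.
    """
    def __init__(self, max_unchanged_seconds=60.0, clock=time.monotonic):
        self.max_unchanged_seconds = max_unchanged_seconds
        self._clock = clock          # injectable so it can be tested
        self._last_value = None
        self._last_change = None

    def update(self, value):
        """Record a new reading; return True if it looks stuck."""
        now = self._clock()
        if self._last_change is None or value != self._last_value:
            # First reading, or the value moved: restart the timer.
            self._last_value = value
            self._last_change = now
            return False
        return (now - self._last_change) >= self.max_unchanged_seconds
```

One detector per channel would be fed the raw reading each time it is sampled, and the caller would log an alert whenever update() returns True.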

The root cause turned out to be internal latch-up of the ADC portion of the CPU chip, triggered by voltage spikes on the ADC inputs and/or the common ground signal. This was occurring even when using a single common ground point for the entire system. Adding a series of ferrite chokes to the entire wiring bundle solved the problem. Note that all ADC inputs were connected via shielded coaxial cables grounded only on the BeagleBoard side and were electrically isolated at the thermistor sensing positions. Even with these precautions taken, noise spikes from the pump motors were still an issue.