Repeatable bug where BeagleBone Black loses bytes from its serial connection.

I have spent about two days tracking down an issue in one of my machines. I am streaming pressure readings from an I2C pressure sensor to a PC at high speed. I have reduced the problem to a small test case. To replicate the fault I connect the BeagleBone Black to a PC, connect to it with serial over the micro USB port, log in and launch a command that streams data fast. On the PC I run a program that reads the streamed data more slowly than it is being produced.

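import serial  # pyserial

x = serial.Serial('COM15')  # change to your port name
# Ctrl-C (\x03) stops anything already running, then `yes` floods the port.
to_send = b'\x03yes HelloWorld\n'
x.write(to_send)
while True:
    received = x.readline()
    # Print only the lines that differ from the expected output.
    if received != b'HelloWorld\r\n':
        print(received)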

The python program above prints lines like:

b'ld\r\n'
b'HelloWoroWorld\r\n'
b'HelloWorldorld\r\n'
b'HelloWorldorld\r\n'
b'ld\r\n'
b'ld\r\n'
b'ld\r\n'
b'ld\r\n'
b'ld\r\n'
b'ld\r\n'
b'HelloWoroWorld\r\n'
b'ld\r\n'
b'HelloWlloWorld\r\n'
b'ld\r\n'
b'HelloWorldorld\r\n'

These corrupted lines appear at about 4 lines per second. If I set the program on the BeagleBone to stream slowly then the issue goes away. It appears to be an issue with a buffer filling up somewhere between the output of the yes command and the point where the bytes are read from the Windows API. The correct behaviour is that the yes command should be stopped when the buffer is full to prevent the loss of data, then allowed to continue when there is space in the buffer again. I observe that the streaming command (in this case yes) is stopped, but there is still data loss happening somewhere.
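
The behaviour described above (pause the producer when the buffer is full, resume when there is room, lose nothing) is ordinary back-pressure. A minimal sketch of the idea, using a bounded in-process queue as a stand-in for the serial buffer. The names `producer`, `consumer` and `run_demo` and the buffer size are made up for illustration, and nothing here touches the real driver:

```python
import queue
import threading
import time


def producer(buf, n_lines):
    # put() blocks while the queue is full, so the fast side is paused
    # instead of overwriting data that has not been read yet.
    for i in range(n_lines):
        buf.put(f"HelloWorld {i}\n".encode())
    buf.put(None)  # sentinel: no more data


def consumer(buf, delay):
    received = []
    while True:
        item = buf.get()
        if item is None:
            return received
        received.append(item)
        time.sleep(delay)  # simulate a reader slower than the writer


def run_demo(n_lines=50, buf_size=4, delay=0.001):
    buf = queue.Queue(maxsize=buf_size)
    t = threading.Thread(target=producer, args=(buf, n_lines))
    t.start()
    received = consumer(buf, delay)
    t.join()
    return received
```

With working back-pressure every line arrives intact and in order no matter how slow the consumer is, which is what I expected from the serial link.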

I wrote this in Python because it is the language I am most comfortable with, but if needed I could rewrite the example in some other language. To run the program shown above you will need pyserial, and you will need to change 'COM15' to whatever the port is called on your computer.

I should mention I am using:

  • Windows 10
  • Drivers from about 2 weeks ago
  • A beaglebone image from about 2 weeks ago

It looks like a serial output buffer overflow. I think you’re using the USB serial port, so it’s probably not lost in the USB link, but on the Windows side the USB serial driver is presumably overwriting the output buffer. Would you have a Linux system somewhere to cross-check it by using a different OS/serial driver combination?

Can you change the settings in the Windows serial port, asking for hardware flow control?

Hi,

You have to create a state machine: at startup there may already be something in the serial buffer, and you will lose receiver sync…

Java example:

This is expected behavior unless you are using hardware flow control
or something like XON/XOFF to throttle the Tx side.

Otherwise, if you transmit data faster than a receiver can process it,
you will lose data.

...looks like you need a faster PC! :wink:

What is the current speed you are using for the serial ports?

There is also a transmit buffer inside the serial port transmitter in the BBB. If you are putting streaming character data into the transmit buffer faster than
the serial port can send it, you can also lose data.

One way to fix this is to set the serial port to a higher speed, on both ends.

I think the BBB and most PCs are well behaved up to 256000 bits per second.

— Graham

I recall our local serial expert stating that the beaglebone's UART is
capable of up to at least 4Mbit.

I think Charles’ suggestion is probably the best way to go about dealing with this issue. Enable some kind of hardware flow control. XON / XOFF is probably a good bet.

On Fri, 21 Apr 2017 17:27:23 -0700, William Hermans
<yyrkoon@gmail.com> declaimed the following:

I think Charles' suggestion is probably the best way to go about dealing
with this issue. Enable some kind of hardware flow control. XON / XOFF is
probably a good bet.

  Technically, XON/XOFF is not hardware flow control -- it is software
flow control, relying on the receiving end sending a character to the
sending side which has to be read and interpreted (hopefully by the driver
and not the application) to start/stop transmission.

  RTS/CTS (ready to send/clear to send) is hardware flow control, since
it uses extra signal wires to actually affect the other side UART. DTR/DSR
(data terminal ready/data set [modem] ready) is the other hardware set as I
recall.

  Of course, once you go through a USB<>serial converter (as I recall,
the OP is using the USB connection, not the debug or other wired UART),
anything goes -- since USB doesn't have hardware serial lines running end
to end; the adapter has to convert hardware flow control to special USB
packets which are then converted back on the other side. It should probably
also be handling some of the buffering -- perhaps by setting RTS/CTS itself
when /it/ is nearing full before the other end polls the USB serial
connection, and the other end probably doesn't have a physical UART either,
just USB packet buffer.
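
To make the software flow control described above concrete, here is a minimal sketch of the sending side of XON/XOFF. It is only an illustration: the class `XonXoffSender` is hypothetical, and in a real system this logic normally lives in the tty driver rather than in application code. The control bytes are the standard ASCII DC1 (Ctrl-Q) and DC3 (Ctrl-S).

```python
XON = 0x11   # Ctrl-Q, "resume sending"
XOFF = 0x13  # Ctrl-S, "stop sending"


class XonXoffSender:
    """Sender side of software flow control (hypothetical helper)."""

    def __init__(self):
        self.paused = False
        self.sent = bytearray()

    def handle_incoming(self, data):
        """Process bytes from the receiver; return the non-control bytes."""
        payload = bytearray()
        for b in data:
            if b == XOFF:
                self.paused = True
            elif b == XON:
                self.paused = False
            else:
                payload.append(b)
        return bytes(payload)

    def try_send(self, chunk):
        """Record chunk as sent unless paused; return True if it was sent."""
        if self.paused:
            return False
        self.sent.extend(chunk)
        return True
```

Because the control characters travel in the same byte stream as the data, both ends have to agree to interpret them, and binary payloads containing 0x11 or 0x13 need escaping.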

Thanks for all the replies.

Reply to Przemek Klowoski

On Thu, 27 Apr 2017 09:54:42 -0700 (PDT), David Howlett
<david.howlett3@gmail.com> declaimed the
following:

x = serial.Serial('COM10', xonxoff=True)
x = serial.Serial('COM10', rtscts=True)
x = serial.Serial('COM10', dsrdtr=True)

  These will only be of use if both sides of the link are set for the
same protocol. The above look to be Windows COM port names -- you need to
have the BBB also set to use those protocols (I suspect the default is for
no flow control).

I have tried baud rates on the PC from 10Hz to 100GHz including common baud
rates like 9600. The serial connection works at all baud rates tested.
There appears to be no change in the data rate or the error rate. To rule
out a bug in the pyserial library I have also changed the baud rate with a
GUI terminal called "Termite 3.1". This also does not change the data rate.
I believe that the serial port is a virtual device and all commands to
change the baud rate are ignored by the driver.

  If you are going through the USB port, that would tend to be correct as
the driver is just moving USB packet data (which are polled at USB rates)
into a "serial" buffer for reading.

x = serial.Serial('COM10', baudrate=10)
x = serial.Serial('COM10', baudrate=9600)
x = serial.Serial('COM10', baudrate=100_000_000_000)

  And as mentioned above -- changing baudrate on the Windows side
wouldn't have any effect on baudrate on the BBB => though if one were using
real serial ports and not going through a USB translation (at both ends),
getting a mismatch in serial port speeds will result in garbage (seen at my
former employer when using USB->Serial dongles to connect to equipment with
physical serial ports; the serial end of the dongle has to match baudrate
of the physical equipment, otherwise garbage is seen and sent via USB).

Hi,

I use a similar algorithm at 76800 baud for the RS485 Honeywell C-bus token ring protocol and it works without problems from Java on the BBB and on a PC…
I also noticed that you never call flush when sending data; that could be the problem…

Test it with the state machine, it must work…
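
A minimal sketch of what such a state machine might look like in Python (this is not the Java code referred to above; the class `LineResync` is a made-up name). It throws away everything up to the first newline, since that may be a partial line left in the buffer from before the reader started, and only then starts returning complete lines:

```python
class LineResync:
    """Drop bytes until the first newline, then return complete lines."""

    def __init__(self):
        self.synced = False
        self.buf = bytearray()

    def feed(self, data):
        lines = []
        for b in data:
            if not self.synced:
                # Anything before the first '\n' may be a partial line
                # left over from before we started listening.
                if b == 0x0A:
                    self.synced = True
                continue
            self.buf.append(b)
            if b == 0x0A:
                lines.append(bytes(self.buf))
                self.buf.clear()
        return lines
```

Note that this only deals with garbage at startup. It does not by itself detect bytes dropped in the middle of a line later in the stream.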

Here is my reading algorithm from a C library on Linux; it detects a pause in the transfer:

timeout - read timeout in seconds
usReadPause - after no incoming data on the port for x microseconds, return from the read sequence… (return one data packet)

I couldn’t easily find documentation on how to enable flow control on the beaglebone side so I chose to add a CRC32 checksum to each line. The data now looks like:

102.04541 125 1186 3665238047
102.04588 125 1273 3232659341
102.04635 125 1273 3345150538
102.04708 125 1245 4074941927
102.04756 125 1245 943190513
102.04803 125 1303 526161833
102.04850 125 1274 3970916767
102.04898 125 1273 472341404
102.04970 125 1100 2928958186
102.05018 125 1244 2418030567
102.05066 125 1187 2253189159
102.05113 125 1157 1456340796
102.05161 125 1273 258216438
102.05234 125 1012 3972342546
102.05282 125 1214 2310087250
102.05329 125 1274 2579294272
102.05377 125 1215 1325398437

This means that corrupted lines are easy to detect. I then deal with data being missing later on in the analysis pipeline.
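
A minimal sketch of the checksum idea, using `zlib.crc32` from the standard library. The exact layout is an assumption for illustration: the CRC is taken over everything before the last space and appended as a decimal number, and the helper names `add_crc` and `check_line` are made up.

```python
import zlib


def add_crc(payload):
    """Return payload + b' ' + decimal CRC32 of payload (assumed layout)."""
    return payload + b" " + str(zlib.crc32(payload)).encode("ascii")


def check_line(line):
    """Return the payload if the trailing CRC32 matches, else None."""
    payload, sep, crc_text = line.rstrip(b"\r\n").rpartition(b" ")
    if not sep or not crc_text.isdigit():
        return None  # truncated line or damaged checksum field
    if zlib.crc32(payload) != int(crc_text):
        return None  # payload or checksum was corrupted in transit
    return payload
```

On the receiving side, any line for which `check_line` returns `None` is simply dropped and treated as missing data.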

Here is my reading algorithm from c lib on linux it detects a pause in the transfer
Unfortunately the code I am writing on the PC side is called too infrequently for this to help. The 12k buffer can occasionally fill up and lose data in the gap between two calls. I could have a separate thread regularly poll the Windows API for new data, but a checksum was simpler to implement as I am not familiar with threading.
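
For reference, the separate-thread approach could look roughly like the sketch below. It is untested against the real hardware, and `start_reader` is a hypothetical helper: it accepts any object with a blocking `readline()` that returns `b''` when nothing arrives, which is how a pyserial port opened with a timeout behaves.

```python
import queue
import threading


def start_reader(port, lines, stop):
    """Drain `port` in a background thread, pushing each line onto `lines`.

    `port` is anything with a readline() that returns b'' on timeout or
    end of data; `lines` is a queue.Queue; `stop` is a threading.Event.
    """
    def worker():
        while not stop.is_set():
            line = port.readline()
            if line:
                lines.put(line)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

With pyserial the port would be opened as something like `serial.Serial('COM15', timeout=0.1)`, and the infrequently called code would then take lines from the queue instead of reading the port directly. A timeout can make `readline()` return a partial line, so a checksum per line is still worth keeping.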

The problem is documentation. As in it exists, but one part of the equation
does not necessarily have anything to do with the other. As such, it's hard
to find information if you know nothing about what you're looking for.
Trust me, I've been using Debian nearly since its first release, and I
personally do not know how this is to be done from hands on experience.
However, I can shed some light on the subject which should allow you to
experiment on your own, and have something working fairly quickly if you
have a decent amount of experience with Linux in general.

First, you're going to need to learn how to use systemd, and specifically,
you're going to need to learn how to create, and use a startup service.

Second, you need a tool that is capable of working with serial devices at a
"low level". Such as:
https://manpages.debian.org/jessie/coreutils/stty.1.en.html and
specifically this setting:

[-]ixon    enable XON/XOFF flow control

Then it's just a matter of using this tool properly from within the
service, or perhaps calling a script from the service, that sets the serial
port exactly how you want. You'll have to experiment to get things exactly
how you want / need.
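
If the startup service calls a Python script rather than stty, the same flags can be set with the standard `termios` module (Unix only). This is a sketch under assumptions: the helper names are made up, and the device path in the usage note is a guess that should be checked on the actual board.

```python
import termios


def with_flow_flags(attrs, ixon, ixoff):
    """Return a copy of tcgetattr()-style attrs with IXON/IXOFF set or cleared."""
    new = list(attrs)
    iflag = new[0]  # index 0 holds the input-mode flags
    for flag, enable in ((termios.IXON, ixon), (termios.IXOFF, ixoff)):
        iflag = iflag | flag if enable else iflag & ~flag
    new[0] = iflag
    return new


def set_xonxoff(fd, ixon=True, ixoff=True):
    """Apply the flags to an open tty file descriptor, like `stty ixon ixoff`."""
    attrs = termios.tcgetattr(fd)
    termios.tcsetattr(fd, termios.TCSANOW, with_flow_flags(attrs, ixon, ixoff))
```

Usage would be along the lines of opening the tty with `os.open(path, os.O_RDWR | os.O_NOCTTY)` and passing the descriptor to `set_xonxoff`. For the USB gadget serial port the path is assumed to be something like `/dev/ttyGS0`.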

First, you’re going to need to learn how to use systemd, and specifically, you’re going to need to learn how to create, and use a startup service.

I happen to have used systemd startup services in a previous project so I can see how it would be useful for setting settings on startup.

Second, you need a tool that is capable of working with serial devices at a "low level". Such as: https://manpages.debian.org/jessie/coreutils/stty.1.en.html and specifically this setting:

This is the tool I was looking for and could not find, thank you. The default settings for the serial connection I am using are:

root@beaglebone:~/reusable# stty --all
speed 9600 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
-ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc -ixany imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke

I can now use the stty command to set ixon on or off.

root@beaglebone:~/reusable# stty -ixon
root@beaglebone:~/reusable# stty --all
speed 9600 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
-ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff
-iuclc -ixany imaxbel -iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke

Unfortunately the data loss is the same regardless of whether ixon is enabled or disabled. Changing ixoff does not affect the data loss either.
I tried setting crtscts and cdtrdsr but this is not permitted.

root@beaglebone:~/reusable# stty crtscts
stty: standard input: unable to perform all requested operations
root@beaglebone:~/reusable# stty cdtrdsr
stty: invalid argument `cdtrdsr'

There is flow control regardless of what I do

David:

I am confused as to what your actual hardware configuration is, with respect to the “serial communications.”

Are you using a COM port in Windows to talk to a USB to serial cable, which is talking to a hardware serial port in the BBB?

Or, are you doing something else?

If so, please describe.

I have never seen these kinds of problems when dealing with hardware serial ports.

— Graham

>> First, you're going to need to learn how to use systemd, and
specifically, you're going to need to learn how to create, and use a
startup service.

I happen to have used systemd startup services in a previous project so I
can see how it would be useful for setting settings on startup.

That's good, one less thing to get in the way of fixing your problem.

Conclusions

- The virtual serial port provided is a poor imitation of a real serial
port. Every command to change any property of either end of the connection
either fails or is ignored.

- There is undocumented flow control somewhere behind the scenes outside
my control. It is probably in a driver.

- The flow control mechanism loses data if the data is not a multiple of 4
bytes long.

So, you've actually delved further into using the UARTs on this platform
than I ever have. Even with other (bare metal) platforms, I only really
use UARTs for printf()-style serial debug output. With this platform, we
do not need it, unless you're troubleshooting the boot process.

Anyway, my first instinct wants to say baud rate is involved somehow. It's
very unlikely a baud rate difference between systems, because usually when
this occurs, your transmission will be garbled. At least a few characters
here and there will be different between send, and receive. So this leads
me to speculate that you're somehow exceeding your maximum baud rate.

Another thing I'm noticing from what your code is outputting is that your
"data loss" is not consistent. Which immediately makes me want to jump to
the conclusion that your code is somehow being preempted. Almost as if
your data is on the stack, something preempts your code, and by the time
that part of your code is given control back, the data on the stack is no
longer there. This sort of situation is what I refer to as "stepping all
over the stack", e.g. some of those routines you're using could be stack
unfriendly. Granted, I know very little about Python, or the libraries
Python uses. But I've personally experienced this effect first hand, when
using a third-party web server API while dealing with a lot of data fast:
CANBUS at 1Mbit, decoding PGNs in real time, and attempting to send
this data in real time using this third-party web server API.

I wonder whether you would see the same problem if you ran the failing
application with its output redirected to a file instead of to stdout.
Why don't you humor me, and give that a try? E.g. run your application
as such:

$ myapp > some_log_file.txt

Run it for a few hundred iterations and see if you get any blank
transmissions, and if you do, whether there are fewer of them. If this
stops your blank transmissions, or lessens them a fair bit, then your
problem could either be Python not being able to keep up (I'm kind of
doubting that), or Windows for whatever reason preempting your code mid
transmission. Or even something else not yet considered.

WARNING -- LONG POST FOLLOWS (Lots of cut&paste output)

On Thu, 4 May 2017 09:32:03 -0700 (PDT), David Howlett
<david.howlett3@gmail.com> declaimed the
following:

  I'll have to confess I'm still not sure which serial port is being used
here...

  The debug serial located on the 6-pin header midway between the
USB-host port and the 5V power jack (I don't have a 3.3V USB/Serial for
that -- though I could maybe use jumper wires with the RaspPi adapter I
have [the pin-out is different]); or
  The emulated serial port available when connecting to the USB-client
port (connecting a BBB via that port created two COM ports on my Win10
machine -- Gadget Serial/COM3 [triangle warning in device manager] and USB
Serial Device/COM4); or
  Some other port/UART via some other serial/USB converter.

This is the tool I was looking for and could not find, thank you. The
default settings for the serial connection I am using are:

root@beaglebone:~/reusable# stty --all
speed 9600 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt =
^R;
werase = ^W; lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
-ignbrk brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon

  If I understand the documentation, X-on/X-off flow control can be used
by the remote end to pause the local end.

-ixoff

  But the local end will NOT send X-on/X-off flow control to tell the
other end to pause.

Hi,

call flush after write command!!!

flush()

Flush of file like objects. In this case, wait until all data is written.

Arsi

I think it's important to note that the OP is not sending data from the
beaglebone, but receiving it, at least judging by the last code he pasted.
Usually you won't see "COM9" on a Linux system :wink:

At any rate, if receiving, you don't need to use flush at all, assuming
it's anything like fflush() in C.