BBB + PREEMPT_RT

I am trying to figure out how to create a kernel for the BBB that supports PREEMPT_RT. It’s kind of strange that the BBB’s default kernel does not even have PREEMPT activated. Such a board is a poor fit for many embedded applications where we need at least some kind of determinism. Even worse, nobody seems to care about this problem. In contrast, the Raspberry Pi’s standard kernel has had PREEMPT activated from the very beginning.

I have tested Robert Nelson’s kernel 3.8.13-rt9 (https://github.com/beagleboard/kernel/tree/3.8-rt). It does not have PREEMPT_RT activated by default, and when I activate it, the kernel does not boot. Activating plain PREEMPT does work. However, development of this branch stopped several months ago, and the official RT patch set for Linux 3.8.13 has evolved since then. There is now an rt17 patch set (https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/). Has anybody given this a try? Does it work with the BBB?

A long, long time ago (with the original BeagleBone) I tried this, but ran into problems with the NIC driver. There's probably a reason it's not enabled by default! Feel free to try though, maybe some problems have been fixed since then.

-- Bas

Surely the point of the Beaglebone, or rather its processor, is that you
do not need to put the time critical bits on the main processor, you put
them in the PRUSS processors.

David

Or just try Xenomai…

https://github.com/cdsteinkuehler/linux-dev/tree/3.8.13-bone39-xenomai

While this may be an answer to the original poster's question, it is of no
relevance to my point, which is that the Sitara processor encourages a
different way of solving the real-time problem: the PRUSS processors
in the SoC handle the real-time bits independently of the main processor.

David

Just a few thoughts …

It is not possible to have a fully deterministic real-time operating system on a processor that uses instruction/data caches, i.e. you have to turn off caching to achieve determinism and eliminate performance jitter (which then degrades the average performance).

From what I understand, PREEMPT_RT does not really improve the real-time performance of Linux if you stick to user-level applications. You have to start doing things at kernel level, which can get difficult and break many of the existing device drivers. Anyway, who said all embedded applications require deterministic real-time performance? Soft real-time performance is generally good enough for a lot of applications.

For real-time, the PRU co-processors are the way to go.

There are a number of papers on the web comparing the performance of normal Linux, PREEMPT_RT and Xenomai in real-world situations (use Google to find them). They make for interesting reading and caused me to reassess my approach to embedded Linux systems.

Regards …

Hi,

Yep, but that’s the easy part. How about pipelines and instruction reordering done by the compiler and the processor? How about interrupts? How about multiple cores? How about the drift of the crystal you use as the clock source of your CPU? You might be shocked now, but as you can see it’s impossible to have a hard real-time system with state-of-the-art (multi-core) processors. Or is it? I think that you need to come up with a realistic test suite to see if preempt-rt (with or without CPU isolation) is good enough, or if you need Xenomai (you will still see issues if Xenomai and Linux share the same caches), or some dedicated hardware like the PRU. There is also some interesting work by Jan Kiszka, not yet on ARM. [1]

Regards,

Robert

[1] Jailhouse: A Linux-based Partitioning Hypervisor [LWN.net]

Shameless self promotion:

[2] http://www.reliableembeddedsystems.com/pdfs/2010_03_04_rt_linux.pdf
[3] Getting real (time) about embedded GNU/Linux - Embedded.com

Hi,

It is not possible to have a fully deterministic real-time operating system on a processor that uses instruction/data caches, i.e. you have to turn off caching to achieve determinism and eliminate performance jitter (which then degrades the average performance).

That is correct (in theory), but you first need to figure out what kind of real-time requirements your system has in order to understand whether that is important or not. Real-time requirements are typically divided into three categories:

  • Hard: Missing a deadline is a total system failure.
  • Firm: Infrequent deadline misses are tolerable, but may degrade the system’s quality of service. The usefulness of a result is zero after its deadline.
  • Soft: The usefulness of a result degrades after its deadline, thereby degrading the system’s quality of service.

The above definitions are from the Wikipedia article “Real-time computing”.

If you have “Hard” real-time requirements you typically do not try to run that application in Linux user-space, not even with PREEMPT-RT. That is the task for dedicated real-time solutions such as the PRU co-processor or dedicated real-time OSes.

Firm real-time requirements can be met using Linux PREEMPT-RT though, and soft ones too of course. I did a quick benchmark on the BeagleBone some time ago with the 3.4.39-rt54 kernel and found bounded latencies in the 60 µs range. So if you have firm real-time requirements and can accept latencies in that range, PREEMPT-RT can be a solution.

From what I understand, PREEMPT_RT does not really improve the real-time performance of Linux if you stick to user-level applications. You have to start doing things at kernel level, which can get difficult and break many of the existing device drivers. Anyway, who said all embedded applications require deterministic real-time performance? Soft real-time performance is generally good enough for a lot of applications.

The point of PREEMPT-RT is to provide bounded latencies for user-space applications (SCHED_FIFO tasks). Without PREEMPT-RT you can’t count on bounded latencies in Linux, even for SCHED_FIFO tasks.

For real-time, the PRU co-processors are the way to go.

Agreed, but that is for hard real-time. And programming the PRU is not nearly as convenient as programming user-space applications in Linux against the POSIX interface. POSIX on top of Linux with PREEMPT-RT provides you with a preemptive programming model (if needed) and bounded latencies, though you need to be careful about which system calls you use.

Regards
Daniel
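
As a rough illustration of the SCHED_FIFO point in Daniel's post above, here is a minimal sketch of how a user-space process can ask for the SCHED_FIFO policy. It is written in Python only to keep it short; a real time-critical task would normally be written in C and would also lock its memory. The function name request_fifo and the default priority of 80 are assumptions made for this example, not something taken from the thread.

```python
import os


def request_fifo(priority: int = 80) -> bool:
    """Try to switch the calling process to SCHED_FIFO.

    Returns True on success and False if the platform lacks the call
    or the process is not allowed to change its scheduling policy.
    The name and the default priority are illustrative assumptions.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # scheduler API not available on this platform

    # Clamp the requested priority to the range the kernel accepts.
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    clamped = max(lowest, min(highest, priority))

    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(clamped))
    except OSError:
        # Typically EPERM: needs root or CAP_SYS_NICE / an rtprio limit.
        return False
    return True


if __name__ == "__main__":
    print("SCHED_FIFO enabled:", request_fifo())
```

Switching the policy only requests real-time scheduling. Whether the resulting latencies are actually bounded still depends on the kernel (PREEMPT_RT or not) and on the drivers involved, as discussed in this thread.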

Hi,

Just a few thoughts …

It is not possible to have a fully deterministic real-time operating system on a processor that uses instruction/data caches, i.e. you have to turn off caching to achieve determinism and eliminate performance jitter (which then degrades the average performance).

Yep, but that’s the easy part. How about pipelines and instruction reordering done by the compiler and the processor? How about interrupts? How about multiple cores? How about the drift of the crystal you use as the clock source of your CPU? You might be shocked now, but as you can see it’s impossible to have a hard real-time system with state-of-the-art (multi-core) processors. Or is it? I think that you need to come up with a realistic test suite to see if preempt-rt (with or without CPU isolation) is good enough, or if you need Xenomai (you will still see issues if Xenomai and Linux share the same caches), or some dedicated hardware like the PRU. There is also some interesting work by Jan Kiszka, not yet on ARM. [1]

I think there is some confusion about what real-time really means. It doesn’t mean fast or even consistent; it just means that the system will respond to some event within a required time. If your requirement is that something responds within 1 second, then the stock Linux kernel is good enough for real-time. If you want a response in less than 1 ms, then the Linux interrupt latency may not meet this requirement. Remember, latency tests are conducted while the processor is under load. Xenomai running on the BBB can achieve 50 µs interrupt latency, whereas preempt-rt is more like 200 µs.

Regards,
John
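
To make the idea of a latency test concrete, the following sketch shows the principle such tests are built on: wake up at fixed intervals and record how late each wake-up was. This is only an illustration under stated assumptions. The function name measure_wakeup_latency and its parameters are invented for this example, and the Python interpreter adds jitter of its own, so the numbers are not comparable to those from a dedicated tool such as cyclictest.

```python
import time


def measure_wakeup_latency(interval_us: int = 400, loops: int = 1000) -> dict:
    """Sleep until periodic deadlines and record how late each wake-up is.

    Returns the minimum, average and maximum lateness in microseconds.
    Illustrative only: name, parameters and result keys are assumptions.
    """
    interval_ns = interval_us * 1000
    next_wakeup = time.monotonic_ns() + interval_ns
    latencies_us = []

    for _ in range(loops):
        remaining_ns = next_wakeup - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        # Lateness = actual wake-up time minus the intended deadline.
        latencies_us.append((time.monotonic_ns() - next_wakeup) / 1000.0)
        next_wakeup += interval_ns  # absolute deadlines, so errors do not accumulate

    return {
        "count": len(latencies_us),
        "min": min(latencies_us),
        "avg": sum(latencies_us) / len(latencies_us),
        "max": max(latencies_us),
    }


if __name__ == "__main__":
    print(measure_wakeup_latency(interval_us=400, loops=1000))
```

For a real-time requirement the maximum is the figure that matters, not the average, and it is only meaningful when measured over a long run while the system is under load.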

Hi,

For those interested in what the BBB can do in terms of interrupt latencies with PREEMPT-RT applied, OSADL actually has one in their QA racks:

https://www.osadl.org/Profile-of-system-in-rack-7-slot-8.qa-profile-r7s8.0.html

Most recent latency plot (under load) is here:

https://www.osadl.org/Latency-plot-of-system-in-rack-7-slot.qa-latencyplot-r7s8.0.html

Regards
Daniel

Hi,

Very interesting. Looks like preempt-rt is getting better all the time. At 110 µs, it is only about double that of Xenomai, and for most applications that won’t matter.

Regards,
John

No, the PRUSS unit is not a solution because support for it is even worse. It’s not a solution to program in assembler. If we had PREEMPT_RT we could use the full Linux functionality. That’s the way to go.

Only PREEMPT_RT allows access to the full Linux functionality. Xenomai uses a dual-kernel concept, which is very limited. All custom device drivers need to be designed to fit into the Xenomai concept, which makes things even worse. The performance gain of Xenomai compared to PREEMPT_RT is negligible in most cases.

Be careful of applying x86 experience to the ARM. PREEMPT_RT requires
well written driver code that is "high-performance SMP friendly" in
order to run well. PREEMPT_RT on the x86 works so well because a *LOT*
of smart people have been working very hard to get maximum performance
out of the multi-core CPUs that ship in virtually every new system these
days.

ARM systems, on the other hand, are riddled with vendor supplied device
drivers that hopefully work well and if you're lucky weren't written by
the summer intern. The ARM situation _is_ getting better, but IMHO
PREEMPT_RT on the ARM is still hit-and-miss. It will work quite well in
some situations, and have horrible performance on similar but
not-quite-identical setups.

That said, while Xenomai offers better bounded performance figures,
PREEMPT_RT is perfectly fine for a large class of problems and as you
mentioned you get access to the full suite of Linux services.

Unless I missed it, you never said exactly _what_ you are trying to do,
you simply started off by complaining that the kernel wasn't configured
by default the way that you wanted it. So go compile a kernel, and if
you're asking for advice, please provide specifics on exactly what you
are trying to do and any limitations on your solution space.

Also, make sure you test any PREEMPT_ based kernel to determine your
worst-case performance. When I was looking into this some time ago, the
PREEMPT_RT patches wouldn't even apply to the BeagleBone kernel, and
using the built-in CONFIG_PREEMPT setting I was seeing latencies in the
hundreds of ms (yes, that is tenths of seconds!). I believe things have
gotten much better, but you'll need to test to know if your setup will
provide the performance you require.

No, the PRUSS unit is not a solution because support for it is even worse.
It's not a solution to program in assembler. If we had PREEMPT_RT we could
use the full Linux functionality. That's the way to go.

There is a C compiler which is in beta, and can be requested. There have
been references to it on this list.

David

While I agree with your definition, I’d like to know where you got these results, as mine under worst-case conditions have been somewhat different. Worst case here includes load plus running from the SD card, and the average in the latency test has been around 40 µs:

http://flic.kr/ps/2LwUC9

Therefore, I’m confident about the new latency measurements that I’m going to do with Charles’ Xenomai bone39 kernel running from the eMMC, as the latencies should be lower than those measured on the SD card.

Hi Charles,

the PREEMPT_RT patches can now be applied to the BBB kernel. As John3909 suggested, there is a ready-to-use patch script available from OSADL. Compiling a BBB kernel 3.12.10-rt15 is quick and easy. Unfortunately, it requires the use of kernel 3.12.x, which causes problems with the GPMC as documented at https://groups.google.com/forum/#!searchin/beagleboard/gpmc/beagleboard/KOHLJI1NUTA/8wrsV_ZodDUJ.

In the default BBB kernel there is no “built-in” PREEMPT option. Without it, Linux cannot be used for any time-critical application and latencies are very bad. In contrast, the Raspberry Pi’s default kernel has had PREEMPT activated from the very beginning and does provide some form of determinism out of the box.

What we need is a default kernel that has at least support for the simple PREEMPT option. Yes, there are custom drivers and they need to fit into the concept. But that is working perfectly for the Raspberry Pi, which definitely uses custom firmware and drivers. The BBB seems to be far behind that.

There's PREEMPT_RT, and there's PREEMPT. *ALL* Linux kernels have
PREEMPT available now. It sounds like you're just complaining that the
kernel was built with a different option than you want. Simply rebuild
the kernel and set CONFIG_PREEMPT instead of CONFIG_PREEMPT_VOLUNTARY:

https://github.com/RobertCNelson/linux-dev/blob/am33x-v3.8/patches/defconfig#L467

Or if you're requesting a change to the default kernel configuration,
you're going about it in kind of a round-about way.
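
To check which preemption model a given kernel configuration selects, as discussed in the post above, a small helper along the following lines could be used. This is a sketch under assumptions: the function name preemption_model and its return labels are made up for this example, and the option names other than CONFIG_PREEMPT and CONFIG_PREEMPT_VOLUNTARY (which appear in the post above) are my understanding of what RT-patched and mainline kernels use, not something confirmed in this thread.

```python
def preemption_model(config_text: str) -> str:
    """Classify the preemption model selected in a kernel .config text.

    Returns one of "PREEMPT_RT", "PREEMPT", "voluntary", "none", "unknown".
    Function name and labels are illustrative assumptions.
    """
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options look like "# CONFIG_FOO is not set" and are skipped.
        if line.startswith("CONFIG_PREEMPT") and line.endswith("=y"):
            enabled.add(line[: -len("=y")])

    # Check the strongest model first, since an RT kernel may also set
    # CONFIG_PREEMPT. The two RT option names are assumptions.
    ordered = (
        ("CONFIG_PREEMPT_RT_FULL", "PREEMPT_RT"),
        ("CONFIG_PREEMPT_RT", "PREEMPT_RT"),
        ("CONFIG_PREEMPT", "PREEMPT"),
        ("CONFIG_PREEMPT_VOLUNTARY", "voluntary"),
        ("CONFIG_PREEMPT_NONE", "none"),
    )
    for option, label in ordered:
        if option in enabled:
            return label
    return "unknown"


if __name__ == "__main__":
    sample = "CONFIG_PREEMPT_VOLUNTARY=y\n# CONFIG_PREEMPT is not set\n"
    print(preemption_model(sample))
```

The text to pass in would be the contents of the defconfig linked above or of the .config used for the build. Helper options such as CONFIG_PREEMPT_COUNT are deliberately ignored because they do not select a model.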

I have recently tested kernel 3.8.13-rt9 (https://github.com/beagleboard/kernel/tree/3.8-rt) using the rt-tests suite from git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git. I am using Ubuntu 12.04.4. The load was created using stress --cpu 1, which generates a CPU load of about 100%. I then used cyclictest:

root@ubuntu-armhf:/home/ubuntu/rt-tests# ./cyclictest -l1000000 -m -n -t1 -p99 -i400 -q

/dev/cpu_dma_latency set to 0us

T: 0 ( 770) P:99 I:400 C:1000000 Min: 14 Act: 19 Avg: 18 Max: 132

uname -a reports:

root@ubuntu-armhf:/home/ubuntu/rt-tests# uname -a

Linux ubuntu-armhf 3.8.13-rt9-00899-g160e771 #1 SMP PREEMPT Wed Jun 19 10:49:36 CEST 2013 armv7l armv7l armv7l GNU/Linux

I am quite surprised that the result looks this good.
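
When comparing several such runs, it can help to extract the numbers from the cyclictest summary line instead of reading them by eye. The following is a small sketch that parses a line in the format shown above. The function name parse_cyclictest_line and the dictionary keys are assumptions made for this example, and the regular expression is based only on the single output line quoted in this post.

```python
import re

# Matches a summary line such as:
# T: 0 ( 770) P:99 I:400 C:1000000 Min: 14 Act: 19 Avg: 18 Max: 132
_SUMMARY = re.compile(
    r"T:\s*(?P<thread>\d+)\s*\(\s*(?P<tid>\d+)\)\s*"
    r"P:\s*(?P<priority>\d+)\s*I:\s*(?P<interval_us>\d+)\s*"
    r"C:\s*(?P<cycles>\d+)\s*Min:\s*(?P<min_us>-?\d+)\s*"
    r"Act:\s*(?P<act_us>-?\d+)\s*Avg:\s*(?P<avg_us>-?\d+)\s*"
    r"Max:\s*(?P<max_us>-?\d+)"
)


def parse_cyclictest_line(line: str) -> dict:
    """Return the fields of one cyclictest summary line as integers.

    Raises ValueError if the line does not look like a summary line.
    Name and key names are illustrative assumptions.
    """
    match = _SUMMARY.search(line)
    if match is None:
        raise ValueError("not a cyclictest summary line: %r" % line)
    return {key: int(value) for key, value in match.groupdict().items()}


if __name__ == "__main__":
    line = "T: 0 ( 770) P:99 I:400 C:1000000 Min: 14 Act: 19 Avg: 18 Max: 132"
    result = parse_cyclictest_line(line)
    print("worst case:", result["max_us"], "us over", result["cycles"], "cycles")
```

For the run above, the field of interest is the maximum of 132 µs over one million cycles at priority 99 with a 400 µs interval.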

Of course every user can create his own kernel configuration or even modify the Linux kernel in any way he wants. But if there is any problem, he is left on his own. I think it would be much better to have a default kernel configuration which at least provides simple PREEMPT support. That might help people avoid the most basic latency problems.