BeagleBone Black: low memory throughput when reading

Hi, I’ve had a BeagleBone Black for quite some time now and have known about this issue for a while. However, I never asked anyone about it until now. When I run memory performance tests with a simple C program, I get the following results:

Write 200MiB of data using memcpy() to RAM - this gives 1230MiB/sec.
Read 200MiB of data using memcpy() from RAM - this gives 200MiB/sec.

The 200MiB memory region is allocated with malloc() and then every page is pre-faulted by writing one byte to it, so when I call memcpy() all physical memory pages are mapped to the process. The throughput measurements are started after all pages have been pre-faulted. So the code runs like this:

malloc() 200MiB region
pre-fault 200MiB region
time1 = get current time
do memcpy()
time2 = get current time
calculate throughput based on time1 and time2 difference
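
The steps above can be sketched in C roughly as follows. This is not the original test program, which was not posted, so the helper names (`now_sec`, `prefault`, `memcpy_mib_per_sec`) and the choice of `clock_gettime(CLOCK_MONOTONIC)` as the timer are assumptions. Note that a single memcpy() both reads the source buffer and writes the destination buffer, so the figure it gives reflects combined read and write traffic.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Monotonic wall-clock time in seconds (assumed timer, not from the post). */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Write one byte per page so every page is mapped before timing starts. */
static void prefault(unsigned char *buf, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t step = page > 0 ? (size_t)page : 4096;
    for (size_t off = 0; off < len; off += step)
        buf[off] = 1;
}

/* Hypothetical helper: time one memcpy() of len bytes.
 * Returns MiB/s, or -1.0 if allocation or timing fails. */
double memcpy_mib_per_sec(size_t len)
{
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    double result = -1.0;

    if (len > 0 && src != NULL && dst != NULL) {
        prefault(src, len);
        prefault(dst, len);

        double t1 = now_sec();
        memcpy(dst, src, len);
        double t2 = now_sec();

        /* Read one byte back so the copy has an observable result. */
        volatile unsigned char sink = dst[len - 1];
        (void)sink;

        if (t2 > t1)
            result = ((double)len / (1024.0 * 1024.0)) / (t2 - t1);
    }
    free(src);
    free(dst);
    return result;
}
```

Calling `memcpy_mib_per_sec((size_t)200 << 20)` from a `main()` and printing the result would correspond to the 200MiB test described here.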

The RAM read throughput is quite disappointing and contributes significantly to large latencies when running memory-intensive applications. Does anyone have any idea why this is the case? Is it a quirk of this specific hardware? Thanks.

BeagleBone Black DDR3L

5.3.1 512MB DDR3L
A single 256Mb x16 DDR3L 4Gb (512MB) memory device is used. The memory used is
the MT41K512M16HA-125 from Micron. It will operate at a clock frequency of
303MHz yielding an effective rate of 606MHZ on the DDR3L bus allowing for
1.32GB/S of DDR3L memory bandwidth.

tar -zxvf ramsmp-3.5.0.tar.gz
cd ramsmp-3.5.0/
export CFLAGS="-O3 -march=native $CFLAGS"
cc $CFLAGS -o ramsmp fltmark.c fltmem.c intmark.c intmem.c ramsmp.c
debian@bbb-pwr01-ser09:~/ramsmp-3.5.0$ ./ramsmp -b 1
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode, 2 processes

INTEGER & WRITING         1 Kb block: 1536.73 MB/s
INTEGER & WRITING         2 Kb block: 1523.58 MB/s
INTEGER & WRITING         4 Kb block: 1504.21 MB/s
INTEGER & WRITING         8 Kb block: 1493.01 MB/s
INTEGER & WRITING        16 Kb block: 1491.49 MB/s
INTEGER & WRITING        32 Kb block: 1494.19 MB/s
INTEGER & WRITING        64 Kb block: 1494.16 MB/s
INTEGER & WRITING       128 Kb block: 1495.49 MB/s
INTEGER & WRITING       256 Kb block: 1494.00 MB/s
INTEGER & WRITING       512 Kb block: 1495.48 MB/s
INTEGER & WRITING      1024 Kb block: 1495.73 MB/s
INTEGER & WRITING      2048 Kb block: 1494.60 MB/s
INTEGER & WRITING      4096 Kb block: 1493.46 MB/s
INTEGER & WRITING      8192 Kb block: 1479.68 MB/s
INTEGER & WRITING     16384 Kb block: 1484.87 MB/s
INTEGER & WRITING     32768 Kb block: 1476.65 MB/s
debian@bbb-pwr01-ser09:~/ramsmp-3.5.0$ ./ramsmp -b 2
RAMspeed/SMP (GENERIC) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode, 2 processes

INTEGER & READING         1 Kb block: 3778.31 MB/s
INTEGER & READING         2 Kb block: 3807.56 MB/s
INTEGER & READING         4 Kb block: 3770.33 MB/s
INTEGER & READING         8 Kb block: 3804.58 MB/s
INTEGER & READING        16 Kb block: 3808.98 MB/s
INTEGER & READING        32 Kb block: 3801.54 MB/s
INTEGER & READING        64 Kb block: 2788.45 MB/s
INTEGER & READING       128 Kb block: 1920.23 MB/s
INTEGER & READING       256 Kb block: 595.01 MB/s
INTEGER & READING       512 Kb block: 312.75 MB/s
INTEGER & READING      1024 Kb block: 278.06 MB/s
INTEGER & READING      2048 Kb block: 273.88 MB/s
INTEGER & READING      4096 Kb block: 241.53 MB/s
INTEGER & READING      8192 Kb block: 273.36 MB/s
INTEGER & READING     16384 Kb block: 273.31 MB/s
INTEGER & READING     32768 Kb block: 273.35 MB/s


Hi, it looks like you have demonstrated the same issue that I described. Once the block being read no longer fits entirely in the CPU caches, the throughput drops to around 200MiB/sec. But do you know the answer to my original question: why does this hardware have such low throughput when reading from RAM?
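
The drop-off with block size can also be reproduced without ramsmp using a read-only loop. The following is a minimal sketch under stated assumptions: `read_mib_per_sec` and `now_seconds` are hypothetical names, and the loop reads through a `volatile` pointer so the compiler cannot skip the loads. That also prevents vectorization, so absolute numbers will likely be lower than what ramsmp reports; only the trend across block sizes is meaningful.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (assumed timer). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: read about `total` bytes by making repeated passes
 * over a buffer of `block` bytes. Returns MiB/s, or -1.0 on failure. */
double read_mib_per_sec(size_t block, size_t total)
{
    size_t words = block / sizeof(uint32_t);
    if (words == 0)
        return -1.0;

    uint32_t *buf = malloc(words * sizeof(uint32_t));
    if (buf == NULL)
        return -1.0;

    /* Initialising the buffer also pre-faults every page. */
    for (size_t i = 0; i < words; i++)
        buf[i] = (uint32_t)i;

    size_t passes = total / block;
    if (passes == 0)
        passes = 1;

    /* volatile forces every 32-bit load to actually be issued. */
    const volatile uint32_t *p = buf;
    uint32_t sum = 0;

    double t1 = now_seconds();
    for (size_t pass = 0; pass < passes; pass++)
        for (size_t i = 0; i < words; i++)
            sum += p[i];
    double t2 = now_seconds();

    volatile uint32_t sink = sum;   /* keep the sum alive */
    (void)sink;
    free(buf);

    if (t2 <= t1)
        return -1.0;
    double bytes = (double)passes * (double)words * sizeof(uint32_t);
    return (bytes / (1024.0 * 1024.0)) / (t2 - t1);
}
```

Calling it with block sizes from 1 KiB up to 32 MiB, for example `read_mib_per_sec(32 * 1024, (size_t)64 << 20)`, and printing each result should show where the throughput falls off on a given board.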

Sorry, no idea, it’s been that way since launch… I ran the “ramsmp” benchmark just to compare, and the numbers still jibe with those from 10 years ago…

If you’re on IRC, I’d ping zmatt; he looked into this a few years ago:


OK, thanks for the info, although I don’t think the PRU has anything to do with it, i.e. it’s a completely separate subsystem. I’m not reading data from PRU RAM.