How can I improve processing time on BB?

I’m trying to use Beagle Board for real-time audio-processing.

I bought this one because of its clock speed (1GHz).

My algorithm's running time is about 0.6 ms per frame on my computer (Windows 7, 2.6GHz quad-core).

But when I run my algorithm on the BB (Angstrom, 1GHz), it takes about 40 ms per frame.

Isn’t it ridiculous? The clock speed difference is only 2 or at most 3 times, but the measured difference in processing speed is far larger.

What’s wrong with it?

Please let me know if I’ve overlooked something.

I have to run my algorithm in at most 6 ms per frame.

All the best.

For a start, your 2.6GHz Windows machine is _not_ an ARM architecture, so
comparing the two by clock speed is useless. Secondly, performance does not
scale linearly with clock speed: 1GHz is not always worse than 2GHz, and
something that runs in 10 seconds at 1GHz will not always run in 5 seconds at 2GHz.

I suggest you read up on processor architectures and understand some of the
basics of CPUs. As for speeding up your algorithm, try using the NEON
instructions available on this ARM chip; however, your code will then not
run on a Windows PC. I also assume you're using C, and not
Python/Java/JavaScript.
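
To give an idea of what the NEON suggestion looks like in C, here is a minimal sketch. The function name apply_gain and the choice of a simple gain stage are only illustrative assumptions, since the actual algorithm was not posted. The NEON path is guarded by the compiler's __ARM_NEON macro, so the same file still builds (using the plain loop) on a PC. On the ARM side you would typically also need a compiler option such as -mfpu=neon, depending on your toolchain.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical example routine: multiply every sample in buf by gain, in place. */
void apply_gain(float *buf, size_t n, float gain)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* NEON path: handle four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(buf + i);  /* load 4 samples */
        v = vmulq_n_f32(v, gain);            /* multiply each lane by gain */
        vst1q_f32(buf + i, v);               /* store 4 samples back */
    }
#endif
    /* Scalar path: leftover samples, or the whole buffer on non-NEON targets. */
    for (; i < n; ++i)
        buf[i] *= gain;
}
```

Keeping a scalar fallback like this means you can still develop and test on the Windows PC and only the inner loop differs on the board.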

To answer simply: MHz != performance (compare Pentium 4 vs Ivy Bridge).

A Cortex-A8 is different from the current mainline x86-64 architecture: different pipelines, a different memory/cache hierarchy, and a huge difference in memory bandwidth.

For memory bandwidth, the theoretical figure for the Beagle is 16 bit * 800MHz = 1.6GB/s (which also has to be shared with the display and so on); tested in real life it is about 300MiB/s.
For a PC with dual-channel PC12800: 1600MHz * 64 bit * 2 (channels) = 25.6GB/s; tested in real life it is about 20GiB/s.
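
The arithmetic behind those theoretical figures can be written as a small C helper. This is only a sketch of the calculation (bus width in bytes times transfers per second times channels); the function name is made up for illustration.

```c
#include <stdint.h>

/* Theoretical peak memory bandwidth in bytes per second:
 * (bus width in bytes) * (transfers per second) * (number of channels). */
uint64_t peak_bandwidth_bytes(uint32_t bus_bits, uint64_t transfers_per_s,
                              uint32_t channels)
{
    return (uint64_t)(bus_bits / 8) * transfers_per_s * channels;
}
```

Calling it with (16, 800000000, 1) gives 1.6 billion bytes per second for the Beagle, and (64, 1600000000, 2) gives 25.6 billion for the PC, a theoretical gap of 16 times. Going by the measured numbers quoted above, the real gap is closer to 65 to 70 times.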

Assuming that you can make a fair comparison of clock speed to performance (which you can’t), your difference is over 10 times, not 2 or 3. You're comparing a quad-core device with a single-core device. Right off the bat this will cause issues, since Linux is not a real-time operating system tuned for a specific set of tasks; it is a general-purpose operating system, so it is running multiple processes at the same time. On your quad-core device you have some buffer, since the running programs will be spread between the different cores. The Bone will be running everything on one core.

Firstly, did you make sure to adjust the priority of your process to give it a big share of the processor time? If you are running it from the command line as, for example, “runmyprogram”, instead run it as “sudo nice --adjustment=-20 runmyprogram”.
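
If you would rather set the priority from inside the program than through the nice command, something like the following sketch should do the same job with the POSIX setpriority() call. The wrapper name raise_priority is just an illustration, and negative nice values normally still require root.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Request the given nice value (-20 = highest priority, 19 = lowest)
 * for the calling process. Returns 0 on success, -1 on failure. */
int raise_priority(int nice_value)
{
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0) {
        fprintf(stderr, "setpriority(%d) failed: %s\n",
                nice_value, strerror(errno));
        return -1;
    }
    return 0;
}
```

Call raise_priority(-20) once at startup, before the audio loop begins, and check the return value so you know whether the request was actually granted.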

Secondly, you have to consider what you're doing. You're doing audio processing. What do musicians do for that? They use Linux with the real-time performance enhancements installed. This lets you give your process higher priority over almost everything else, including most of the operating system. So you need to recompile your kernel for that.

Thirdly, you're processing a large amount of data, correct? And you're trying to store it on the “hard drive”, I bet, which is a slow MMC card (as opposed to the fast SATA hard drive on your Windows box). The simple solution there is to sacrifice some memory and set up a RAM disk. Check your /etc/fstab file: you probably already have a small RAM disk set up there, mapped to the /tmp directory using the tmpfs file system. Just increase that to 128M and do all your file processing in the /tmp folder.

Fourthly, avoid disparaging-sounding comments like “isn’t it ridiculous” and instead phrase them in a way that takes responsibility for the failing, such as “I’m sure I am doing something wrong. What methods are there to increase performance?”

Speaking for myself personally, it makes the difference, when I answer, between taking the time to do a few Google searches and provide links to articles explaining how to implement something, and just giving the answers and leaving the other party to look up the implementation.

^^^

Fifthly, look into using the two PRUs to help offload some tasks. Also look into using specific external hardware to offload even more from the CPU.

Could also be that your routines need some tightening up. Have you done any cycle counting on your code? Checked the generated ASM?
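
As a first step before real cycle counting, you could time each frame against the 6 ms budget. The sketch below measures wall-clock time with clock_gettime(CLOCK_MONOTONIC), not CPU cycles, and dummy_frame is only a hypothetical stand-in for your own per-frame routine. On older glibc versions you may need to link with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical stand-in for the real per-frame processing routine. */
void dummy_frame(void)
{
    volatile float acc = 0.0f;
    for (int i = 0; i < 1024; ++i)
        acc += (float)i * 0.5f;
}

/* Run process_frame() `runs` times and return the worst-case
 * wall-clock time of a single call, in milliseconds. */
double worst_frame_ms(void (*process_frame)(void), int runs)
{
    double worst = 0.0;
    for (int r = 0; r < runs; ++r) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        process_frame();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (double)(t1.tv_sec - t0.tv_sec) * 1e3
                  + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
        if (ms > worst)
            worst = ms;
    }
    return worst;
}
```

For real-time audio the worst case matters more than the average, which is why this returns the maximum. Wrapping individual stages of the algorithm the same way should show where the 40 ms is actually going.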

I’m trying to use Beagle Board for real-time audio-processing.

Oh, I just ran across this link (again) when trying to fiddle with my desktop sound. It has no direct bearing for me, but I’d suggest you take a look at it.

If you're doing real-time audio processing, there is a group dedicated to doing just that, and they have produced a bunch of tools in an open source package for working on it. It’s called “Jack”: http://jackaudio.org/

Note that the consensus of the Jack community (which includes professionals in audio development) is that the real-time kernel options aren’t needed for most use cases: http://jackaudio.org/realtime_vs_realtime_kernel

"No. Realtime scheduling is available on all Linux systems no matter what kernel they use, and current versions of JACK use it by default. A kernel built with the realtime patches (an “RT kernel”) is needed only if:

  • You want to run JACK with very low latency settings that require realtime performance that can only be achieved with an RT kernel
  • Your hardware configuration triggers poor latency behaviour which might be improved with an RT kernel

Most users do not need an RT kernel in order to use JACK, and most will be happy using settings that are effective without an RT kernel."
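
For reference, asking for realtime scheduling on a stock kernel looks roughly like the sketch below. It uses the standard sched_setscheduler() and mlockall() calls; the wrapper name enable_realtime and the idea of passing a priority such as 10 are assumptions for illustration, and this is not necessarily how JACK does it internally. It normally needs root or a suitable rtprio limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Switch the calling process to SCHED_FIFO at the given priority and
 * lock its memory so page faults do not stall the audio loop.
 * Returns 0 on success, -1 on failure. */
int enable_realtime(int priority)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof sp);
    sp.sched_priority = priority;

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        fprintf(stderr, "sched_setscheduler failed: %s\n", strerror(errno));
        return -1;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

A SCHED_FIFO process that never blocks can starve everything else on a single-core board, so make sure the audio loop actually waits on the sound device between frames.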

Personally, I’m such an extremely lazy programmer that I’d START by implementing my application using Jack and a reasonably sized tmpfs drive to store files while working, and then ask my performance tuning questions on their mailing list, as 80% of the issues won’t be related to the hardware.