Faster memcpy


I am using Ubuntu 13.10 with kernel 3.13.6 on my BBB. I use memcpy heavily in my application. It takes ~18 ms to copy 3932160 bytes from src to dest. That's ~200 MB/s. Is there any way to make memcpy faster?
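For anyone who wants to reproduce the number, here is a minimal timing sketch. It is not the code from the application above: the helper names mb_per_sec and time_memcpy are made up for illustration, and it assumes a POSIX system with clock_gettime and a GCC-compatible compiler.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Convert a byte count and an elapsed time in seconds to MB/s (1 MB = 10^6 bytes). */
static double mb_per_sec(size_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}

/* Hypothetical helper: time `reps` copies of `n` bytes and return the
   average throughput in MB/s, or a negative value on failure. */
static double time_memcpy(size_t n, int reps)
{
    char *src = malloc(n);
    char *dst = malloc(n);
    if (!src || !dst) { free(src); free(dst); return -1.0; }
    memset(src, 0xA5, n);   /* touch both buffers so page faults */
    memset(dst, 0, n);      /* are not counted in the timing     */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++) {
        memcpy(dst, src, n);
        /* compiler barrier so the repeated copies are not removed */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    if (secs <= 0.0) return -1.0;   /* clock too coarse for this run */
    return mb_per_sec(n * (size_t)reps, secs);
}
```

Calling time_memcpy(3932160, 100) from main gives an average over 100 copies. As a sanity check on the figure above, mb_per_sec(3932160, 0.018) works out to about 218 MB/s.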

I cross-compile on my Windows machine with
gcc-Version 4.8.2 20130902 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2013.09 - Linaro GCC 2013.09)
The compiler switches are:
-march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ffast-math -O3

Any help is appreciated.


See here

The fastest way is PLD + NEON,

but it seems Linaro does not use NEON by default.
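To make "PLD + NEON" concrete, here is a rough sketch of the idea using intrinsics. It is not the routine from the link and not what any libc actually ships. The name copy_neon_pld and the 256-byte prefetch distance are assumptions for illustration. On ARM, GCC normally turns __builtin_prefetch into a PLD instruction. On targets without NEON the loop is compiled out and plain memcpy does all the work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: copy n bytes, 64 bytes per iteration through
   NEON registers, prefetching ahead of the read pointer. */
static void *copy_neon_pld(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    uint8_t *d = dst;
    const uint8_t *s = src;
#if HAVE_NEON
    while (n >= 64) {
        /* Prefetch hint only; 256 bytes ahead is an assumed distance
           that would need tuning on real hardware. */
        __builtin_prefetch(s + 256);
        uint8x16_t q0 = vld1q_u8(s);
        uint8x16_t q1 = vld1q_u8(s + 16);
        uint8x16_t q2 = vld1q_u8(s + 32);
        uint8x16_t q3 = vld1q_u8(s + 48);
        vst1q_u8(d, q0);
        vst1q_u8(d + 16, q1);
        vst1q_u8(d + 32, q2);
        vst1q_u8(d + 48, q3);
        s += 64;
        d += 64;
        n -= 64;
    }
#endif
    /* Remaining tail bytes, or the whole buffer when NEON is unavailable. */
    memcpy(d, s, n);
    return dst;
}
```

Whether this beats the library memcpy on a Cortex-A8 depends on alignment, buffer size and the prefetch distance, so it should be measured rather than assumed.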

I tested with this code:

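#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* The posted listing was cut off after the declarations; the body below is an assumed completion. */
int main ( void ) {
    size_t n = 3932160;          /* buffer size taken from the question */
    char *src = calloc(n, 1);
    char *dst = malloc(n);
    if (!src || !dst) return 1;
    memcpy(dst, src, n);
    printf("%d\n", dst[n - 1]);  /* use the result so the copy is not optimized away */
    return 0;
}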

arm-linux-gnueabihf-gcc test.c -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ffast-math -O3 -static

arm-linux-gnueabihf-objdump -S a.out > memcpy.dump

You can see it in memcpy.dump. But I don't know whether any version of GCC for ARM uses NEON + PLD by default. If anyone knows, please let me know.

Thanks and regards