Faster memcpy

Hi,

I am using Ubuntu 13.10 with kernel 3.13.6 on my BBB (BeagleBone Black). I use memcpy heavily in my application. Copying 3932160 bytes from src to dest takes ~18 ms. That's roughly 200 MB/s. Is there any way to make memcpy faster?

As a compiler I use
gcc version 4.8.2 20130902 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2013.09 - Linaro GCC 2013.09)
on my Windows machine (cross-compiling for the BBB). The compiler switches are:
-march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ffast-math -O3
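A minimal sketch for reproducing a throughput figure like the one above, assuming a POSIX system with clock_gettime (on glibc older than 2.17 this may also need -lrt). The helper names measure_memcpy_mb_s and throughput_mb_s are hypothetical. Both buffers are written once before timing so that first-touch page faults are not counted as copy time.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Current time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Throughput in MB/s (1 MB = 1,000,000 bytes) for `bytes` copied in `seconds`. */
double throughput_mb_s(size_t bytes, double seconds)
{
    return (double)bytes / seconds / 1e6;
}

/* Hypothetical helper: average memcpy throughput over `runs` copies of
   `bytes` bytes (both must be > 0). Returns -1.0 if allocation fails. */
double measure_memcpy_mb_s(size_t bytes, int runs)
{
    char *src = malloc(bytes);
    char *dst = malloc(bytes);
    double start, elapsed;
    volatile char sink;
    int i;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 1, bytes);  /* fault in the source pages before timing */
    memset(dst, 0, bytes);  /* fault in the destination pages too */

    start = now_seconds();
    for (i = 0; i < runs; i++)
        memcpy(dst, src, bytes);
    elapsed = now_seconds() - start;

    sink = dst[bytes - 1];  /* read the result so the copies stay observable */
    (void)sink;
    free(src);
    free(dst);
    return throughput_mb_s((size_t)runs * bytes, elapsed);
}
```

For the numbers in the post, throughput_mb_s(3932160, 0.018) is about 218 MB/s; since 3932160 bytes is exactly 3.75 MiB, that is about 208 MiB/s.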

Any help is appreciated.

Robert

See this ARM FAQ entry:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka13544.html

The fastest way is PLD + NEON.

But it seems that Linaro doesn't use NEON by default.
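A minimal sketch of the PLD + NEON idea in C, using GCC's NEON intrinsics from arm_neon.h and __builtin_prefetch (which GCC normally compiles to a PLD instruction on ARMv7) instead of the hand-written assembly discussed in the ARM FAQ. The function name neon_pld_memcpy and the 256-byte prefetch distance are assumptions for illustration only; whether it actually beats the libc memcpy on the BBB has to be measured. Like memcpy, it assumes the buffers do not overlap.

```c
#include <stddef.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Assumed prefetch distance in bytes; tune it on the target board. */
#define PREFETCH_AHEAD 256

/* Hypothetical helper, not a libc function: copies n bytes in 64-byte
   blocks while prefetching ahead of the source. Buffers must not overlap. */
void *neon_pld_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n >= 64) {
        /* Prefetch only addresses inside the source buffer. */
        if (n >= 64 + PREFETCH_AHEAD)
            __builtin_prefetch(s + PREFETCH_AHEAD);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
        {
            /* Four 128-bit NEON loads, then four 128-bit stores. */
            uint8x16_t q0 = vld1q_u8(s);
            uint8x16_t q1 = vld1q_u8(s + 16);
            uint8x16_t q2 = vld1q_u8(s + 32);
            uint8x16_t q3 = vld1q_u8(s + 48);
            vst1q_u8(d, q0);
            vst1q_u8(d + 16, q1);
            vst1q_u8(d + 32, q2);
            vst1q_u8(d + 48, q3);
        }
#else
        /* Portable fallback so the sketch also builds without NEON. */
        memcpy(d, s, 64);
#endif
        s += 64;
        d += 64;
        n -= 64;
    }
    memcpy(d, s, n);  /* copy the remaining 0..63 bytes */
    return dst;
}
```

With -mfpu=neon GCC defines __ARM_NEON__, so the NEON branch is compiled; without it the function falls back to plain 64-byte memcpy calls and only the prefetch remains.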

I tested with this code:

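#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void) {
    size_t len = 1 << 20;          /* 1 MiB */
    char *src = malloc(len);
    char *dst = malloc(len);
    if (src == NULL || dst == NULL)
        return 1;
    memset(src, 0x5a, len);        /* initialize the source buffer */
    memcpy(dst, src, len);
    printf("%d\n", dst[len - 1]);  /* use dst so the copy is not optimized away */
    free(src);
    free(dst);
    return 0;
}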

arm-linux-gnueabihf-gcc test.c -march=armv7-a -mtune=cortex-a8 -mfloat-abi=hard -mfpu=neon -ffast-math -O3 -static

arm-linux-gnueabihf-objdump -S a.out > memcpy.dump

In memcpy.dump you can see the disassembly of the memcpy that gets linked in. But I don't know whether any version of GCC for ARM uses NEON + PLD by default. If anyone knows, please let me know.

Thanks and regards