Cache Performance Issues

I have seen several posts about L2 cache performance, some from about a
year ago and some more recent. The issue I am seeing is that when I do a
series of memcpy() calls from one address to another, the L2 does not
appear to be functioning. When I repeat memcpy() calls on buffers small
enough to stay within the L1, I get less than ideal results, but when
the buffers grow into the L2 range there seems to be no performance
increase over system RAM. When the L2 is disabled, the L1 performs much
better than when the L2 is enabled. None of this makes much sense to me.
Have other people seen this issue?
Is this an OMAP problem?
Is there a workaround?
Thanks in advance for help.