Clear CPU caches for reading system timer on beaglebone

Hi,

I am using a TI AM335x BeagleBone with Linux kernel version 3.8.13. My aim is to get fast and reliable time values from user space using the DMTimer.

When I read (and print) the tick values of DMTimer2 through /dev/mem, I get the following:

3146348594
3146350438
3146352109
3146357959
3146360117
3146361773
3146363376
3146364986
3146370527
3146374221
3146376003
3146382901
3146384741
3146386379
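To compare consecutive readings like these, the difference between two raw counter values can be computed as below. This is only a sketch: `tick_delta` and `ticks_to_us` are hypothetical helper names, and the timer frequency is passed in by the caller because it depends on how the timer clock is configured (24 MHz would be one plausible value to try, but that is an assumption, not something measured here).

```cpp
#include <cstdint>

// Ticks elapsed between two raw 32-bit counter reads.
// Unsigned subtraction wraps modulo 2^32, so the result stays correct
// even if the counter overflowed once between the two reads.
inline std::uint32_t tick_delta(std::uint32_t earlier, std::uint32_t later) {
    return later - earlier;
}

// Convert a tick count to microseconds.
// timer_hz is supplied by the caller (assumption: the timer input clock is known).
inline double ticks_to_us(std::uint32_t ticks, double timer_hz) {
    return static_cast<double>(ticks) * 1e6 / timer_hz;
}
```

For the first two values above, `tick_delta(3146348594u, 3146350438u)` gives 1844 ticks.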

However, when I map the timer registers through my own device driver's mmap operation and then read them from user space, I see strange behavior. Tick values are repeated periodically:

2296380051
2296386006
2296386006
2296403883
2296412958
2296423574
2296423574
2296438010
2296438010
2296438010
2296457840
2296468548
2296468548
2296482669
2296482669
2296482669
2296482669
2296507701
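To put a number on the repetition, consecutive identical readings can be counted with a small helper such as the following. It is a minimal sketch, and `count_repeats` is a hypothetical name, not part of any driver or library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Count how many readings are identical to the reading immediately before them.
// A free-running counter read in a loop should normally give 0 here.
std::size_t count_repeats(const std::vector<std::uint32_t>& ticks) {
    std::size_t repeats = 0;
    for (std::size_t i = 1; i < ticks.size(); ++i) {
        if (ticks[i] == ticks[i - 1]) {
            ++repeats;
        }
    }
    return repeats;
}
```

Applied to the 18 values above, this counts 8 repeated readings.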

I am not sure, but it looks like the L1/L2 caches are overflowing. My question is what can cause this behaviour and, if it is the CPU caches, how they can be cleared.
I found the answer "How to clear CPU L1 and L2 cache", but I am not sure how to apply that method here.
Do you have any suggestions? I would appreciate any help.

In case it helps, here is a brief description of how I obtain these ticks through the driver.

  1. Remapping a specific region in kernel space (0x48040000 is the DMTIMER2 register start address):
static int simple_remap_mmap(struct file *filp, struct vm_area_struct *vma)
{       
   unsigned long off = vma->vm_pgoff << PAGE_SHIFT;    
   // generate the correct page frame number  
   unsigned long pfn = (0x48040000 + off) >> PAGE_SHIFT;

   // vsize is the requested size of virtual memory
   unsigned long vsize = vma->vm_end - vma->vm_start;

   // psize is the physical I/O size that is left after the offset has been specified
   unsigned long psize = 0x48040000 + 32 - off;

   // refuses to map addresses that extend beyond the allowed memory range
   if (vsize > psize)
   {   
      return -EINVAL;
   }
   if (remap_pfn_range(vma, vma->vm_start, pfn, vsize, vma->vm_page_prot))
   {      
      return -EAGAIN;
   }
   vma->vm_ops = &simple_remap_vm_ops;  
   simple_vma_open(vma);
   return 0;
}

  2. In user space, I do approximately the following:

  volatile unsigned char* dmt2_regs;
  dmt2_regs = (unsigned char*) mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  while (true)
  {
       uint32_t t0 = * (uint32_t*) ( dmt2_regs + 0x3c );
       std::cout << t0 << std::endl;                              
  }
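One detail in the loop above is that the cast to `uint32_t*` drops the `volatile` qualifier from `dmt2_regs`. A hedged sketch of a read helper that keeps it is shown below. `read_reg32` and `kCounterOffset` are hypothetical names, and the offset 0x3c is simply the one used in the loop above. Keeping `volatile` only stops the compiler from reusing an earlier load. It does not change how the mapping itself is cached.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offset of the tick counter within the mapped register block (taken from the loop above).
constexpr std::size_t kCounterOffset = 0x3c;

// Read one 32-bit register at `offset` bytes from `base`, keeping the access volatile
// so that every call performs a real load.
inline std::uint32_t read_reg32(const volatile void* base, std::size_t offset) {
    assert(offset % sizeof(std::uint32_t) == 0);  // registers are 32-bit aligned
    const volatile std::uint8_t* p =
        static_cast<const volatile std::uint8_t*>(base) + offset;
    return *reinterpret_cast<const volatile std::uint32_t*>(p);
}
```

With the mapping from the snippet above, the loop body would become `uint32_t t0 = read_reg32(dmt2_regs, kCounterOffset);`.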

https://groups.google.com/forum/#!searchin/beagleboard/$20Clear$20CPU$20caches/beagleboard/ypXbgkmkAek/GaCqAyAWREkJ