Omap3 Performance Counters

Hi folks.

Yesterday I tried to get access to the Omap3 performance counters.

To enable user-mode access, I wrote a minimal kernel module that sets
the performance counter user-mode access flag in the coprocessor. The
module worked out of the box.
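For reference, here is a minimal sketch of what such a module has to do. It assumes an ARMv7-A target and a GCC-style toolchain. The module sets bit 0 (EN) of PMUSERENR (CP15 c9, c14, 0), the same register that Yuli's readPMUSERENR() reads further down. The write must execute at PL1, for example from the module's init function. The names pmuserenr_set_user_access() and pmu_allow_user_access() are hypothetical and are not Nils's code. The bit manipulation is kept separate so that it can be checked on any machine.

```c
#include <stdint.h>

/* PMUSERENR (CP15 c9, c14, 0), bit 0 = EN: user-mode access to the PMU. */
#define PMUSERENR_EN (1u << 0)

/* Return the register value with user-mode access turned on or off. */
static inline uint32_t pmuserenr_set_user_access(uint32_t reg, int enable)
{
    return enable ? (reg | PMUSERENR_EN) : (reg & ~PMUSERENR_EN);
}

#if defined(__arm__) && defined(__ARM_ARCH_7A__)
/* Hypothetical helper: must execute at PL1, e.g. from a module init function. */
static inline void pmu_allow_user_access(int enable)
{
    uint32_t reg;
    __asm__ volatile("mrc p15, 0, %0, c9, c14, 0" : "=r"(reg));
    reg = pmuserenr_set_user_access(reg, enable);
    __asm__ volatile("mcr p15, 0, %0, c9, c14, 0" : : "r"(reg));
}
#endif
```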

I'm now able to read the cycle counter register. I can reset all
event counters, read and write them, set up events and so on without
problems. However, no matter what I program into the event registers,
the counter registers never increase. :frowning:

I found some old posts on this list regarding the same issue, so I
wonder if someone has solved the problem in the meantime.

Btw: I'm running my tests on a rev. B4 BeagleBoard, which as far as I
know has very early silicon. Is it possible that the performance
counters are simply not functional in this revision?

Cheers,
  Nils

Are you declaring your pointers as volatile? Maybe gcc is optimizing
out your reads from the counter register?

b.g.

Hi Bill,

Thanks for answering.

The registers are not memory mapped but reside inside the coprocessor.

I use an asm volatile inline-assembler block to read them, so the
compiler can't optimize them out. I also checked the disassembly to make
sure everything compiles as expected. Looks fine so far.
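Here is a sketch of such reads. It assumes GCC-style inline assembly on ARMv7-A and that user-mode access has already been enabled. The names read_pmccntr(), read_event_counter() and pmu_delta() are illustrative. An event counter is read indirectly: the counter index goes into PMSELR (c9, c12, 5), followed by an ISB, and the value then comes out of PMXEVCNTR (c9, c13, 2). The counters are 32 bits wide and wrap, so differences are taken with unsigned arithmetic. A delta of zero across a busy loop is exactly the symptom described in this thread.

```c
#include <stdint.h>

/* Elapsed count between two reads of a 32-bit PMU counter.
 * Unsigned subtraction is modulo 2^32, so one wrap-around is handled. */
static inline uint32_t pmu_delta(uint32_t start, uint32_t end)
{
    return end - start;
}

#if defined(__arm__) && defined(__ARM_ARCH_7A__)
/* PMCCNTR (CP15 c9, c13, 0). "asm volatile" keeps the compiler from
 * caching, merging or deleting the read; no C pointer is involved. */
static inline uint32_t read_pmccntr(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
}

/* PMXEVCNTR (c9, c13, 2) for the counter selected in PMSELR (c9, c12, 5). */
static inline uint32_t read_event_counter(uint32_t index)
{
    uint32_t value;
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" : : "r"(index));
    __asm__ volatile("isb");
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(value));
    return value;
}
#endif
```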

Cheers,
    Nils Pipenbrinck

deepakvr@gmail.com wrote:

Can this thread give some insight?

http://groups.google.com/group/beagleboard/browse_thread/thread/c47fc5c864576a7a/ba8421d3df1b35bc?hl=en&lnk=gst&q=performance+counters#ba8421d3df1b35bc

Thanks a lot.

I've tried the code, but it has exactly the same problem as mine. The
thread also shows that I'm not the first one who has these problems. I
can access the performance-counters (no exceptions generated), but they
never increase in value.

After some further digging I found out that two more bits control
access to the performance counters: NIDEN and DBGEN. After boot both
are zero; I checked that via one of the CP14 debug registers.
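As the thread understands it, code running in the Non-secure state only sees the event counters advance while non-invasive debug is permitted. That requires at least one of DBGEN or NIDEN to be high. The sketch below encodes that rule in a hypothetical helper. It is a simplified model that deliberately ignores the extra conditions that apply in the Secure state.

```c
#include <stdbool.h>

/* Hypothetical model, Non-secure state only: event counters count only
 * while non-invasive debug is permitted, i.e. DBGEN or NIDEN is high. */
struct debug_auth_signals {
    bool dbgen;  /* debug enable input (e.g. raised via JTAG) */
    bool niden;  /* non-invasive debug enable input */
};

static inline bool nonsecure_events_count(struct debug_auth_signals s)
{
    return s.dbgen || s.niden;
}
```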

Digging deeper:

DBGEN is not a bit in some register but an external signal. It seems
I need a JTAG device to set it.

NIDEN, on the other hand, is an ordinary register bit. I could set it in
the "Secure Debug Enable Register" of CP15. Unfortunately, I need secure
privileged mode to do so.

Twiddling this bit from a kernel module results in exceptions. I guess
I'm not secure or privileged enough at the moment.

Any idea how to execute code in this secure mode? I thought running
code in the kernel was the most privileged thing I could do.

Cheers,
  Nils Pipenbrinck

I'm running the code NOT under Linux but with Code Composer, and here is
what works for me:

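/* Routines implemented in assembly below */
unsigned int readPMUSERENR(void);
unsigned int readPMCR(void);
void writePMCR_Ebit(int e);
void writePMCR_Dbit(int d);
void enableCycleCounter(void);
void writePMCNTENSET_eventCounters(int eventCounters);
void setupEventCounter(int eventNumber, int eventType);
void resetAllCounters(void);
unsigned int readCycleCounter(void);

/* Counter indices and ARMv7 event numbers (assumed values; check the
   Cortex-A8 TRM event table) */
enum { Counter0 = 0, Counter1 = 1, Counter2 = 2, Counter3 = 3 };
enum {
  EventTypeICacheMisses = 0x01, /* L1 instruction cache refill */
  EventTypeDCacheMisses = 0x03, /* L1 data cache refill */
  EventTypeInstructions = 0x08, /* instruction architecturally executed */
  EventTypeCycleCount   = 0x11  /* CPU cycles */
};

int Counter_uSec = 0;

void counters_init( void )
{
  int pmuserenr=0;
  int pmcr=0;
  pmuserenr = readPMUSERENR();
  writePMCR_Ebit(1); /* enable the A8 counters */
  writePMCR_Dbit(0); /* no clock divider */
  pmcr = readPMCR();
  enableCycleCounter();
  writePMCNTENSET_eventCounters(0xf); /* enable all four event counters */
  setupEventCounter(Counter0, EventTypeInstructions);
  setupEventCounter(Counter1, EventTypeICacheMisses);
  setupEventCounter(Counter2, EventTypeDCacheMisses);
  setupEventCounter(Counter3, EventTypeCycleCount);
  Counter_uSec = 0;
  resetAllCounters();
}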

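/****************************************************************/
/* Returns the time since the last call in microseconds and restarts the counters */
int GetCounter( void )
{
    unsigned int c2 = readCycleCounter();
    resetAllCounters();
    c2 = c2 / 484; /* presumably CPU clock in MHz (cycles per us); 331 was an alternative */
    Counter_uSec += c2;
    return c2;
}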
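GetCounter() converts cycles to microseconds by dividing by 484, which is presumably the core clock in MHz. The commented-out 331 looks like an alternative clock setting. Because each interval is divided separately, the sub-microsecond remainder is dropped on every call, and Counter_uSec drifts low. Here is a sketch that carries the remainder forward instead. It assumes a fixed clock given in MHz, and the struct and function names are illustrative.

```c
#include <stdint.h>

/* Accumulates elapsed microseconds from raw cycle deltas without losing
 * the sub-microsecond remainder of each interval. */
struct us_accumulator {
    uint32_t mhz;             /* core clock in MHz = cycles per microsecond */
    uint64_t total_us;        /* whole microseconds accumulated so far */
    uint32_t leftover_cycles; /* cycles not yet making up a full microsecond */
};

/* Add one interval; returns the whole microseconds credited by this call. */
static inline uint32_t us_accumulate(struct us_accumulator *acc, uint32_t cycles)
{
    uint64_t pending = (uint64_t)acc->leftover_cycles + cycles;
    uint32_t us = (uint32_t)(pending / acc->mhz);
    acc->leftover_cycles = (uint32_t)(pending % acc->mhz);
    acc->total_us += us;
    return us;
}
```

With a 484 MHz clock, two 300-cycle intervals credit 0 and then 1 microsecond. Dividing each interval separately would credit 0 both times.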

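;****************************************************************
; Assembly routines (separate .asm file)
;****************************************************************
; void writePMCR_Ebit(int e)
; Write PMCR bits [1:0] (E = enable counters, P = reset event counters)
  .global _writePMCR_Ebit
_writePMCR_Ebit:
  mrc p15, #0, r1, c9, c12, #0
  and r0, r0, #0x3
  bic r1, r1, #0x3
  orr r1, r1, r0
  mcr p15, #0, r1, c9, c12, #0
  bx lr

;****************************************************************
; void writePMCR_Dbit(int d)
; Write PMCR bit 3 (D: 1 = cycle counter counts every 64th cycle)
; Assumed implementation: called by counters_init() but not in the post
  .global _writePMCR_Dbit
_writePMCR_Dbit:
  mrc p15, #0, r1, c9, c12, #0
  and r0, r0, #0x1
  bic r1, r1, #0x8
  orr r1, r1, r0, lsl #3
  mcr p15, #0, r1, c9, c12, #0
  bx lr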

;****************************************************************
  .global _readPMUSERENR
_readPMUSERENR:
  mrc p15, #0, r0, c9, c14, #0
  bx lr

;****************************************************************
; unsigned int readPMCR(void)
; Read the Performance Monitor Control Register
  .global _readPMCR
_readPMCR:
  mrc p15, #0, r0, c9, c12, #0
  bx lr

;****************************************************************
; void enableCycleCounter(void)
; enable the cycle counter register
  .global _enableCycleCounter
_enableCycleCounter:
  mov r1, #1
  mov r1, r1, lsl #31
  mrc p15, #0, r0, c9, c12, #1
  orr r0, r0, r1
  mcr p15, #0, r0, c9, c12, #1
  bx lr

;****************************************************************
; void writePMCNTENSET_eventCounters(int eventCounters)
; Write to Count Enable Set Register event counters (PMCNTENSET)
  .global _writePMCNTENSET_eventCounters
_writePMCNTENSET_eventCounters:
  ; also set bit 31 so the cycle counter is enabled with the event counters
  orr r0,r0, #0x80000000
  mcr p15, #0, r0, c9, c12, #1
  bx lr
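In PMCNTENSET, bits [n-1:0] enable event counters 0 to n-1, and bit 31 enables the cycle counter. That is why this routine ORs in 0x80000000: writePMCNTENSET_eventCounters(0xf) ends up writing 0x8000000F, which enables all four Cortex-A8 event counters plus the cycle counter. Writing 0 to a bit of this register has no effect. Here is an illustrative helper (the name is hypothetical) that builds the value:

```c
#include <stdint.h>

#define PMCNTEN_CYCLE_COUNTER (1u << 31) /* bit 31: enables PMCCNTR */

/* Value for PMCNTENSET that enables event counters 0..n_events-1 and,
 * optionally, the cycle counter. n_events must be in 0..31. */
static inline uint32_t pmcntenset_value(unsigned n_events, int with_cycle_counter)
{
    uint32_t mask = (1u << n_events) - 1u;
    return with_cycle_counter ? (mask | PMCNTEN_CYCLE_COUNTER) : mask;
}
```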

;****************************************************************
; void setupEventCounter(int eventNumber, int eventType)
; setup event number to type of event to count
  .global _setupEventCounter
_setupEventCounter:
  mcr p15, #0, r0, c9, c12, #5
  mcr p15, #0, r1, c9, c13, #1
  bx lr

;****************************************************************
; void resetAllCounters(void)
; Reset all counters including the cycle counter
; Write to the Performance Monitor Control Register P and C bits, bits[2:1]
  .global _resetAllCounters
_resetAllCounters:
  mrc p15, #0, r1, c9, c12, #0
  orr r1, r1, #0x6 ; set P (reset event counters) and C (reset cycle counter)
  mcr p15, #0, r1, c9, c12, #0 ; reset the counters
  bx lr
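For reference, these routines touch the following low PMCR bits:

- E (bit 0): enable counters.
- P (bit 1): reset the event counters.
- C (bit 2): reset the cycle counter.
- D (bit 3): the cycle counter ticks once every 64 cycles.

The 0x6 in resetAllCounters() therefore sets P and C. Here is a sketch with made-up helper names that shows the values involved, including the bit that a writePMCR_Dbit() call has to update:

```c
#include <stdint.h>

/* Low PMCR (CP15 c9, c12, 0) control bits. */
#define PMCR_E (1u << 0) /* enable counters */
#define PMCR_P (1u << 1) /* reset event counters (write-only action) */
#define PMCR_C (1u << 2) /* reset cycle counter (write-only action) */
#define PMCR_D (1u << 3) /* 1 = cycle counter ticks every 64 cycles */

/* PMCR value that resets every counter and keeps the other bits,
 * mirroring the orr with 0x6 in resetAllCounters(). */
static inline uint32_t pmcr_reset_all(uint32_t pmcr)
{
    return pmcr | PMCR_P | PMCR_C;
}

/* Update only the clock-divider bit. */
static inline uint32_t pmcr_set_divider(uint32_t pmcr, int every_64th_cycle)
{
    return every_64th_cycle ? (pmcr | PMCR_D) : (pmcr & ~PMCR_D);
}
```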

;****************************************************************
; unsigned int readCycleCounter(void)
; read cycle counter register
  .global _readCycleCounter
_readCycleCounter:
  mrc p15, #0, r0, c9, c13, #0
  bx lr

Yuli Kaplunovsky
yk@magniel.com
+1 408 884 5965

Nils,

We tried (in vain) to get the performance counters to work within the Linux environment on the beagle. Recently, we got hold of a JTAG device and connected it to the BeagleBoard. From within the CCS environment, the event counters then increment properly (as reported by Yuli above).

Yuli,

From within the CCS, we observed very high clock/event counts when we profiled the code. We realized this is because the default rts library does not initialize the MMU. For the beagleboard, did you initialize the MMU from within the CCS project? If yes, can you post the code for the initialization?

Thanks,
Krishna

Hi Yuli,

Thanks for your response. That's exactly what I expected. Great that it
works for you... bad luck for those without a JTAG device (okay, it's
cheap, but still).

I'm now sure that the CCS + JTAG combination raises one of the two
signals that gate the performance counters. In your case, that signal
has been raised by sending the right bitstream via JTAG. The other
signal, the one that would let us profile and optimize our code without
JTAG, is in theory programmable, but unfortunately only from code that
has "secure privileged" rights.

If anyone from the TI folks is listening (Hey!):

I have no idea what it takes to execute code in that "secure privileged
mode", but executing code from within the Linux kernel is not
privileged enough.

However, I bet MLO or x-loader could...

Would it be a big deal to give us a binary that enables the funky NIDEN
bit of CP15 "Secure Debug Enable Register"? As an option maybe? Giving
us a "Debug MLO" and "Release MLO" would be something everyone could
live with imho.

Cheers,
  Nils

Hi Nils,

You can compile your own MLO/x-loader if you want. The code is freely
available, and info about obtaining it can be found at:
http://elinux.org/BeagleBoard#X-Loader
That being said, I'm not sure that x-loader has any special privileges.

Looking at:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344j/ch12s08s05.html
it seems that CP15 SCR[0] (the NS bit) determines whether the CPU is in
Secure state. However, this is an area of the Cortex-A8 I know very little
about, so unfortunately I'm not able to help you further :slight_smile:
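To make the NS bit concrete: in the ARMv7 Security Extensions, SCR bit 0 = 0 means Secure state and 1 means Non-secure state (outside Monitor mode). The SCR can only be accessed from a Secure privileged mode, so a Non-secure kernel cannot flip the bit itself. Here is a trivial decode with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

#define SCR_NS (1u << 0) /* 0 = Secure, 1 = Non-secure (outside Monitor mode) */

/* True if an SCR value describes the Secure state. */
static inline bool scr_is_secure(uint32_t scr)
{
    return (scr & SCR_NS) == 0u;
}
```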

Good luck
  Søren

There are instructions in the OMAP35xx initialization guide on
entering secure mode; please see page 22 of sprufd6b.pdf. Using
r12=3, you can set the auxiliary control register appropriately to
enable secure-mode accesses to other pert. regs.

Jerry Johns