Samuel <samuel.xu.tech@gmail.com> writes:
Michael Zucchi <not...@gmail.com> writes:
>> The performance counters are too heavy for most uses; they are meant
>> for detailed performance tuning.
> Err? It is?
Of course not. The performance counters have no overhead at all.
> If you don't need detailed performance tuning use gettimeofday().
gettimeofday() is much more expensive. Reading the Cortex-A8 cycle
counters with an MRC instruction takes 50 cycles while a call to
gettimeofday() takes about 1000 cycles.
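To check the gettimeofday() figure on your own board, something like the
following should do. It is only a sketch: gettimeofday_cost_ns() is a
made-up helper name, it assumes a POSIX system with clock_gettime(), and
it reports nanoseconds per call rather than cycles.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: average cost of one gettimeofday() call in
   nanoseconds, measured over `iterations` back-to-back calls.
   `iterations` must be greater than zero. */
static inline double gettimeofday_cost_ns(long iterations)
{
    struct timespec t0, t1;
    struct timeval tv;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Total elapsed time in nanoseconds, then the per-call average. */
    double total_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec);
    return total_ns / (double)iterations;
}
```

Multiply the result by the core clock in GHz to get an approximate cycle
count. The loop overhead is included, so treat the number as an upper
bound.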
>> If there were a counter similar to x86's TSC (incremented on every
>> machine cycle, like a watch's tick, and globally readable), it would
>> be very convenient, since the user only needs to read the TSC register
>> directly. The difference between two TSC values is simply the number
>> of cycles elapsed between the two reads.
> Well, like I said in my reply, there is a simple cycle counter too.
>> Any more information?
> Try reading the manuals and using google.
The ARM ARM and Cortex-A8 TRM would be good reading.
Here is a patch to make the counters available directly from userspace:
http://git.mansr.com/?p=linux-omap;a=commitdiff;h=5170038
Thanks, Måns Rullgård and Michael Zucchi!
Yes, gettimeofday() takes many more cycles, and its granularity might be
too coarse.
Could you share more about how to use the MRC instruction to read a
cycle counter? E.g. which counter, and how? A 50-cycle overhead looks OK
to me.
BTW, I must clarify that by "heavy" I meant that the usage is not very
easy: since I must talk to the PMU through a kernel module, it needs
more code and debugging than reading a register directly from user
space. I know the overhead of the PMU itself is light.
I would prefer a counter that does not live inside the PMU, but if there
is no choice besides the PMU, I will try the PMU cycle counter. For the
patch that gives userspace access to the PMU, do I need to recompile the
kernel, or is it already upstream? Is there any usage example?
To use the cycle counter from userspace, apply the patch linked above,
enable the new config option and rebuild the kernel. In your app, use
these functions to access the counter:
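/* Enable the cycle counter: bit 31 of the count enable set register. */
static inline void ccnt_start(void)
{
    __asm__ volatile ("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u<<31));
}

/* Disable the cycle counter: bit 31 of the count enable clear register. */
static inline void ccnt_stop(void)
{
    __asm__ volatile ("mcr p15, 0, %0, c9, c12, 2" :: "r"(1u<<31));
}

/* Read the current value of the 32-bit cycle count register. */
static inline unsigned ccnt_read(void)
{
    unsigned cc;
    __asm__ volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r"(cc));
    return cc;
}

/* Stop the counter, then write the control register: bit 0 enables the
   counters, bit 2 resets the cycle counter to zero. */
static inline void ccnt_init(void)
{
    ccnt_stop();
    __asm__ volatile ("mcr p15, 0, %0, c9, c12, 0" :: "r"(5));
}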
If you don't stop the counter after you're done with it, oprofile will
be unhappy, should you wish to use it. Needless to say, using this
while oprofile is running is not a good idea.
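To time a piece of code with these functions, a pattern along the
following lines should work. This is a sketch, not part of the patch:
cycles_elapsed(), time_region() and the counter_fn type are hypothetical
names, and it assumes unsigned is 32 bits wide, as it is on the
Cortex-A8. The counter reader is passed in as a function pointer so the
helper compiles on any machine.

```c
/* Hypothetical type: any function that returns the current counter value. */
typedef unsigned (*counter_fn)(void);

/* Cycles between two readings of the 32-bit counter. Unsigned
   subtraction wraps modulo 2^32 (assuming a 32-bit unsigned), so one
   wraparound between the two reads still gives the right difference. */
static inline unsigned cycles_elapsed(unsigned start, unsigned end)
{
    return end - start;
}

/* Read the counter, run work(), read the counter again, and return
   the difference. */
static inline unsigned time_region(counter_fn read_counter, void (*work)(void))
{
    unsigned start = read_counter();
    work();
    unsigned end = read_counter();
    return cycles_elapsed(start, end);
}
```

On the target, call ccnt_init() and ccnt_start() first, pass ccnt_read
as read_counter, and call ccnt_stop() when you are done. The result
includes the cost of the counter reads and of the call to work(), so
subtract the result of timing an empty function if that matters.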