BeagleBoard RevC MPU speed - Bare Metal

I am trying to use a BeagleBoard Rev C in bare-metal mode.

A very simple test shows that the MPU is running much slower than expected. I expected it to run at the highest possible speed (600 MHz, if I am not mistaken).

volatile float sec_init;
volatile float sec_end;
volatile uint32_t cnt;
const uint32_t maxiter = 10000000;

cnt = maxiter;
printf("T\n");
sec_init = second();
while (cnt)
{
    cnt--;
    asm("");   /* empty asm statement, keeps the loop body from being optimized away */
}
sec_end = second();
printf("P\n");
printf("Time difference for %lu iterations: %1.5f secs\n", (unsigned long)maxiter, sec_end - sec_init);
printf("Number of iterations per second: %1.3f\n", (float)(maxiter) / (sec_end - sec_init));

second() is a function that simply reports the number of seconds elapsed since startup, and I know it works properly.

Running the above code gives:

T
P
Time difference for 10000000 iterations: 5.48938 secs
Number of iterations per second: 1821699.375

(The time of 5.48938 secs is correct; the timestamp feature of TeraTerm confirms it.)

I was expecting something in the range of hundreds of millions of iterations per second.

How can I configure the MPU on BeagleBoard Rev C to run at the highest possible frequency?

A dump of the relevant PRM and CM registers does not show anything remarkable. Indeed, if my understanding is correct, the MPU clock frequency should be 26 MHz / 2 x 500 / 12 = 541 MHz. But obviously, it is not.
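As a sanity check on that arithmetic, here is a minimal sketch that splits the CM_CLKSEL1_PLL_MPU value from the dump (0x0011F40C) into its fields and reproduces the estimate. The bit positions are chosen to match the decoded values in the dump. The function names are made up for illustration. The second helper rests on an assumption that should be checked against the OMAP35x TRM: that the divider field holds N and the hardware divides by N + 1. With a 13 MHz reference (the 26 MHz / 2 above) that variant gives exactly 500 MHz, which would match what Linux reports further down in this thread.

```c
#include <stdint.h>

/* Fields of CM_CLKSEL1_PLL_MPU; positions chosen to match the dump in this thread. */
typedef struct {
    uint32_t clk_src;  /* MPU_CLK_SRC,   bits 21:19 */
    uint32_t mult;     /* MPU_DPLL_MULT, bits 18:8  */
    uint32_t div;      /* MPU_DPLL_DIV,  bits 6:0   */
} mpu_clksel1_t;

/* Split the raw register value into its three fields. */
mpu_clksel1_t decode_clksel1_pll_mpu(uint32_t reg)
{
    mpu_clksel1_t f;
    f.clk_src = (reg >> 19) & 0x7u;
    f.mult    = (reg >> 8)  & 0x7FFu;
    f.div     =  reg        & 0x7Fu;
    return f;
}

/* The estimate used in the post: reference * M / N. */
double mpu_mhz_as_posted(double ref_mhz, mpu_clksel1_t f)
{
    return ref_mhz * (double)f.mult / (double)f.div;
}

/* ASSUMPTION (verify in the TRM): the hardware divides by N + 1, not N. */
double mpu_mhz_n_plus_1(double ref_mhz, mpu_clksel1_t f)
{
    return ref_mhz * (double)f.mult / (double)(f.div + 1u);
}
```

Calling decode_clksel1_pll_mpu(0x0011F40C) yields MPU_CLK_SRC = 2, MULT = 500 and DIV = 12, the same values the dump prints.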

U-Boot 2009.11 (Feb 23 2010 - 15:33:48)

OMAP3530-GP ES3.0, CPU-OPP2 L3-165MHz
OMAP3 Beagle board + LPDDR/NAND
I2C: ready
DRAM: 256 MB
NAND: 256 MiB
*** Warning - bad CRC or NAND, using default environment

In: serial
Out: serial
Err: serial
Board revision C1/C2/C3
Die ID #5760000300000000040323090d007008
Hit any key to stop autoboot: 0
OMAP3 beagleboard.org #

OMAP3 beagleboard.org #

OMAP3 beagleboard.org # loads

Ready for S-Record download …

First Load Addr = 0x80008000

Last Load Addr = 0x8001A65B

Total Size = 0x0001265C = 75356 Bytes

Start Addr = 0x80008000

OMAP3 beagleboard.org # go 80008000

Starting application at 0x80008000 …

[print_clock_info_clock_control] PRM_CLKSEL: 0x00000003
[print_clock_info_clock_control] > SYS_CLKIN_SEL = 0x3
[print_clock_info_clock_control] >> OSC_SYS_CLK is 26.0 MHz
[print_clock_info_clock_control] PRM_CLKSRC_CTRL: 0x00000080
[print_clock_info_clock_control] > CLKOUT_EN = 0x80
[print_clock_info_clock_control] >> sys_clkout1 is enabled
[print_clock_info_clock_pll] CM_CLKEN_PLL: 0x00370037
[print_clock_info_clock_pll] > PWRDN_EMU_PERIPH = 0x0
[print_clock_info_clock_pll] >> Power-up the DPLL4_M6X2_CLK HSDIVIDER path
[print_clock_info_clock_pll] > PWRDN_CAM = 0x0
[print_clock_info_clock_pll] >> Power-up the DPLL4_M5X2_CLK HSDIVIDER path
[print_clock_info_clock_pll] > PWRDN_DSS1 = 0x0
[print_clock_info_clock_pll] >> Power-up the DPLL4_M4X2_CLK HSDIVIDER path
[print_clock_info_clock_pll] > PWRDN_TV = 0x0
[print_clock_info_clock_pll] >> Power-up the DPLL4_M3X2_CLK HSDIVIDER path
[print_clock_info_clock_pll] > PWRDN_96M = 0x0
[print_clock_info_clock_pll] >> Power-up the DPLL4_M2X2_CLK path
[print_clock_info_clock_pll] > EN_PERIPH_DPLL_LPMODE = 0x0
[print_clock_info_clock_pll] >> Disables the DPLL4 LP mode to re-enter the normal mode at the following lock or re-lock sequence
[print_clock_info_clock_pll] > PERIPH_DPLL_FREQSEL = 0x3
[print_clock_info_clock_pll] >> Range of the DPLL4 internal frequency depending on the DPLL reference clock and the N divider: 0.75 MHz - 1.0 MHz
[print_clock_info_clock_pll] > EN_PERIPH_DPLL_DRIFTGUARD = 0x0
[print_clock_info_clock_pll] >> Disables the DPLL4 automatic recalibration mode
[print_clock_info_clock_pll] > EN_PERIPH_DPLL = 0x7
[print_clock_info_clock_pll] >> Enables the DPLL4 in lock mode
[print_clock_info_clock_pll] > PWRDN_EMU_CORE = 0x0
[print_clock_info_clock_pll] >> Power-up the DPLL3_M3X2 HSDIVIDER path
[print_clock_info_clock_pll] > EN_CORE_DPLL_LPMODE = 0x0
[print_clock_info_clock_pll] >> Disables the DPLL3 LP mode to re-enter the normal mode at the following lock or re-lock sequence
[print_clock_info_clock_pll] > CORE_DPLL_FREQSEL = 0x3
[print_clock_info_clock_pll] >> Range of the DPLL3 internal frequency depending on the DPLL reference clock and the N divider: 0.75 MHz - 1.0 MHz
[print_clock_info_clock_pll] > EN_CORE_DPLL_DRIFTGUARD = 0x0
[print_clock_info_clock_pll] >> Disables the DPLL3 automatic recalibration mode
[print_clock_info_clock_pll] > EN_CORE_DPLL = 0x7
[print_clock_info_clock_pll] >> Enables the DPLL3 in lock mode
[print_clock_info_clock_pll] CM_CLKEN2_PLL: 0x00000011
[print_clock_info_clock_pll] > EN_PERIPH2_DPLL_LPMODE = 0x0
[print_clock_info_clock_pll] >> Disables the DPLL5 LP mode to re-enter the normal mode at the following lock or re-lock sequenc
[print_clock_info_clock_pll] > PERIPH2_DPLL_FREQSEL = 0x1
[print_clock_info_clock_pll] >> Range of the second PERIPHERAL DPLL internal frequency depending on the DPLL reference clock and the N divider: Reserved
[print_clock_info_clock_pll] > EN_PERIPH2_DPLL_DRIFTGUARD = 0x0
[print_clock_info_clock_pll] >> Disables the DPLL5 automatic recalibration mode
[print_clock_info_clock_pll] > EN_PERIPH2_DPLL = 0x1
[print_clock_info_clock_pll] >> Put the second DPLL5 in low power stop mode
[print_clock_info_clock_pll] CM_IDLEST_CKGEN: 0x00001E3F
[print_clock_info_clock_pll] > ST_EMU_PERIPH_CLK = 0x0
[print_clock_info_clock_pll] >> EMU_PER_ALWON_CLK is not active
[print_clock_info_clock_pll] > ST_CAM_CLK = 0x1
[print_clock_info_clock_pll] >> CAM_MCLK is not active
[print_clock_info_clock_pll] > ST_DSS1_CLK = 0x1
[print_clock_info_clock_pll] >> DSS1_ALWON_FCLK is not active
[print_clock_info_clock_pll] > ST_TV_CLK = 0x1
[print_clock_info_clock_pll] >> DPLL4_M3X2_CLK is not active
[print_clock_info_clock_pll] > ST_FUNC96M_CLK = 0x1
[print_clock_info_clock_pll] >> DPLL4_M2X2_CLK is not active
[print_clock_info_clock_pll] > ST_EMU_CORE_CLK = 0x0
[print_clock_info_clock_pll] >> EMU_CORE_ALWON_CLK is not active
[print_clock_info_clock_pll] > ST_54M_CLK = 0x1
[print_clock_info_clock_pll] >> 54MHz is active
[print_clock_info_clock_pll] > ST_12M_CLK = 0x1
[print_clock_info_clock_pll] >> 12M_FCLK is active
[print_clock_info_clock_pll] > ST_48M_CLK = 0x1
[print_clock_info_clock_pll] >> 48M_FCLK is active
[print_clock_info_clock_pll] > ST_96M_CLK = 0x1
[print_clock_info_clock_pll] >> 96M_FCLK is active
[print_clock_info_clock_pll] > ST_PERIPH_CLK = 0x1
[print_clock_info_clock_pll] >> DPLL4 is locked
[print_clock_info_clock_pll] > ST_CORE_CLK = 0x1
[print_clock_info_clock_pll] >> DPLL3 is locked
[print_clock_info_clock_pll] CM_IDLEST2_CKGEN: 0x00000000
[print_clock_info_clock_pll] > ST_FUNC120M_CLK = 0x0
[print_clock_info_clock_pll] >> DPLL5_M2_CLK is not active
[print_clock_info_clock_pll] > ST_120M_CLK = 0x0
[print_clock_info_clock_pll] >> USB HOST 120M_FCLK is not active
[print_clock_info_clock_pll] > ST_PERIPH2_CLK = 0x0
[print_clock_info_clock_pll] >> DPLL5 is bypassed
[print_clock_info_clock_pll] CM_AUTOIDLE_PLL: 0x00000000
[print_clock_info_clock_pll] > AUTO_PERIPH_DPLL = 0x0
[print_clock_info_clock_pll] >> DPLL4 auto control disabled
[print_clock_info_clock_pll] > AUTO_PERIPH_DPLL = 0x0
[print_clock_info_clock_pll] >> DPLL3 auto control disabled
[print_clock_info_clock_pll] CM_AUTOIDLE2_PLL: 0x00000000
[print_clock_info_clock_pll] > AUTO_PERIPH2_DPLL = 0x0
[print_clock_info_clock_pll] >> DPLL5 auto control disabled
[print_clock_info_clock_pll] CM_CLKSEL1_PLL: 0x094C0C00
[print_clock_info_clock_pll] > CORE_DPLL_CLKOUT_DIV = 0x01
[print_clock_info_clock_pll] >> DPLL3 output clock is divided by 1
[print_clock_info_clock_pll] > CORE_DPLL_MULT = 0x14C
[print_clock_info_clock_pll] >> DPLL3 multiplier factor: 332
[print_clock_info_clock_pll] > CORE_DPLL_DIV = 0x0C
[print_clock_info_clock_pll] >> DPLL3 divider factor: 12
[print_clock_info_clock_pll] > SOURCE_96M = 0x0
[print_clock_info_clock_pll] >> Source for 96M_FCLK is CM_96M_FCLK
[print_clock_info_clock_pll] > SOURCE_54M = 0x0
[print_clock_info_clock_pll] >> Source for 54MHz clock is DPLL4_M3X2_CLK
[print_clock_info_clock_pll] > SOURCE_48M = 0x0
[print_clock_info_clock_pll] >> Source for Func_12M_clk and Func_48M_clk is CM_96M_FCLK
[print_clock_info_clock_pll] CM_CLKSEL2_PLL: 0x0001B00C
[print_clock_info_clock_pll] > PERIPH_DPLL_MULT = 0x1B0
[print_clock_info_clock_pll] >> DPLL4 multiplier factor: 432
[print_clock_info_clock_pll] > PERIPH_DPLL_DIV = 0x0C
[print_clock_info_clock_pll] >> DPLL4 divider factor: 12
[print_clock_info_clock_pll] CM_CLKSEL3_PLL: 0x00000009
[print_clock_info_clock_pll] > DIV_96M = 0x09
[print_clock_info_clock_pll] >> 96 MHz clock is DPLL4 clock divided by 9
[print_clock_info_clock_pll] CM_CLKSEL4_PLL: 0x00000000
[print_clock_info_clock_pll] > PERIPH2_DPLL_MULT = 0x000
[print_clock_info_clock_pll] >> DPLL5 multiplier factor: 0
[print_clock_info_clock_pll] > PERIPH2_DPLL_DIV = 0x00
[print_clock_info_clock_pll] >> DPLL5 divider factor: 0
[print_clock_info_clock_pll] CM_CLKSEL5_PLL: 0x00000001
[print_clock_info_clock_pll] > DIV_120M = 0x01
[print_clock_info_clock_pll] >> 120 MHz clock is DPLL5 clock divided by 1
[print_clock_info_clock_pll] CM_CLKOUT_CTRL: 0x00000003
[print_clock_info_clock_pll] > CLKOUT2_EN = 0x0
[print_clock_info_clock_pll] >> sys_clkout2 is disabled
[print_clock_info_clock_pll] > CLKOUT2_DIV = 0x0
[print_clock_info_clock_pll] >> sys_clkout2 / 1
[print_clock_info_clock_pll] > CLKOUT2_DIV = 0x3
[print_clock_info_clock_pll] >> sys_clkout2 source is 54 MHz clock
[print_clock_info_clk_mpu] CM_CLKEN_PLL_MPU: 0x00000037
[print_clock_info_clk_mpu] > EN_MPU_DPLL_LPMODE = 0x0
[print_clock_info_clk_mpu] >> Disables the MPU DPLL LP mode to re-enter the normal mode at the following lock or re-lock sequence
[print_clock_info_clk_mpu] > MPU_DPLL_FREQSEL = 0x3
[print_clock_info_clk_mpu] >> Range of the DPLL1 internal frequency depending on the DPLL reference clock and the N divider: 0.75 MHz - 1.0 MHz
[print_clock_info_clk_mpu] > EN_MPU_DPLL_DRIFTGUARD = 0x0
[print_clock_info_clk_mpu] >> Disables the DPLL1 automatic recalibration mode
[print_clock_info_clk_mpu] > EN_MPU_DPLL = 0x7
[print_clock_info_clk_mpu] >> Enables the DPLL1 in lock mode
[print_clock_info_clk_mpu] CM_IDLEST_MPU: 0x00000000
[print_clock_info_clk_mpu] > ST_MPU = 0x0
[print_clock_info_clk_mpu] >> MPU is active
[print_clock_info_clk_mpu] CM_IDLEST_PLL_MPU: 0x00000001
[print_clock_info_clk_mpu] > ST_MPU_CLK = 0x1
[print_clock_info_clk_mpu] >> DPLL1 is locked
[print_clock_info_clk_mpu] CM_AUTOIDLE_PLL_MPU: 0x00000000
[print_clock_info_clk_mpu] > AUTO_MPU_DPLL = 0x0
[print_clock_info_clk_mpu] >> DPLL1 auto control disabled
[print_clock_info_clk_mpu] CM_CLKSEL1_PLL_MPU: 0x0011F40C
[print_clock_info_clk_mpu] > MPU_CLK_SRC = 0x02
[print_clock_info_clk_mpu] >> DPLL1_FCLK is CORE_CLK divided by 2
[print_clock_info_clk_mpu] > MPU_DPLL_MULT = 0x1F4
[print_clock_info_clk_mpu] >> DPLL1 multiplier factor: 500
[print_clock_info_clk_mpu] > MPU_DPLL_DIV = 0x0C
[print_clock_info_clk_mpu] >> DPLL1 divider factor: 12
[print_clock_info_clk_mpu] CM_CLKSEL2_PLL_MPU: 0x00000001
[print_clock_info_clk_mpu] > MPU_DPLL_CLKOUT_DIV = 0x01
[print_clock_info_clk_mpu] >> DPLL1 CLKOUTX2 divided by 1
[print_clock_info_clk_mpu] CM_CLKSTCTRL_MPU: 0x00000000
[print_clock_info_clk_mpu] > CLKTRCTRL_MPU = 0x00
[print_clock_info_clk_mpu] >> Automatic clock state transition of the MPU domain is disabled
[print_clock_info_clk_mpu] CM_CLKSTST_MPU: 0x00000001
[print_clock_info_clk_mpu] > CLKACTIVITY_MPU = 0x1
[print_clock_info_clk_mpu] >> MPU domain clock is active

You need to enable both the instruction and data caches. I’m traveling, so I don’t have access to my system, but look at StarterWare for examples that enable the caches.
I don’t recall exactly, but from what I remember, toggling a GPIO with the caches disabled gave me a waveform of around 200 kHz, while with the caches enabled I saw a 4 MHz waveform.

Regards,
John

Hi John,

Thanks for the hint. But it must be something else:

  • I am already calling CacheEnable(CACHE_ALL); before jumping into my application
  • I get the same results whether or not I enable the caches with the above call, which suggests the caches were already enabled
  • If I call CacheDisable(CACHE_ALL); instead, performance drops, but only by a small margin; see below:

Time difference for 10000000 NOPs: 5.79538 secs
Number of iterations per second: 1725512.500

I checked: VDD1 is 1.2 V, which allows OPP3 (up to 500 MHz), so the voltage rail is not the limiting factor.

However, if I let it boot Ubuntu (from the SD card) and run the Linux version of the same source code:

Uncompressing Linux… done, booting the kernel.
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.2.0-23-omap (buildd@ishigaq) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu4) ) #36-Ubuntu Tue Apr 10 20:24:21 UTC 2012 (Ubuntu 3.2.0-23.36-omap 3.2.14)
[ 0.000000] CPU: ARMv7 Processor [411fc083] revision 3 (ARMv7), cr=10c53c7d
[ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[ 0.000000] Machine: OMAP3 Beagle Board
[ 0.000000] Reserving 12582912 bytes SDRAM for VRAM
[ 0.000000] Memory policy: ECC disabled, Data cache writeback
[ 0.000000] OMAP3430/3530 ES3.0 (l2cache iva sgx neon isp )
[ 0.000000] Clocking rate (Crystal/Core/MPU): 26.0/332/500 MHz

I get much better results:

Time difference for 10000000 NOPs: 0.28000 secs
Number of iterations per second: 35714284.000

This means that Linux is, of course, configuring it correctly. But how can I replicate what Linux does, only with respect to clock configuration / MPU speed, in a bare-metal environment?

There is a GEL file in StarterWare that sets up the clocks. GEL is very similar to C, and I converted the file to a C file that sets up everything for my application. I wish I had access to my desktop so I could send you my project. I also converted that GEL file to a Lauterbach Practice script so that I could debug with a Lauterbach JTAG debugger.

Regards,
John

I guess the caches were never enabled at all. Performance without caches must be dramatically lower (roughly the ratio between your bare-metal and Linux results). I don’t know StarterWare well enough to tell you what to do, but remember that the data cache only works when the MMU is on, so make sure the MMU is on. – Laurent
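One way to check this on the target is to read the CP15 System Control Register (SCTLR) and test the MMU and cache enable bits. This is a minimal sketch, not StarterWare code: the function names are invented here, and read_sctlr() only builds for ARM and must run in a privileged mode. The bit positions are those of the ARMv7-A SCTLR (M = bit 0, C = bit 2, Z = bit 11, I = bit 12).

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data/unified cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

#if defined(__arm__)
/* Read SCTLR (CP15 c1, c0, 0). Privileged modes only. */
uint32_t read_sctlr(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(value));
    return value;
}
#endif

/* True only if the MMU, the data cache and the instruction cache are all enabled. */
bool mmu_and_caches_on(uint32_t sctlr)
{
    const uint32_t need = SCTLR_M | SCTLR_C | SCTLR_I;
    return (sctlr & need) == need;
}
```

For comparison, the Linux boot log earlier in this thread prints cr=10c53c7d, and that value has M, C, Z and I all set. Printing read_sctlr() at the start of the bare-metal application would show directly whether U-Boot left the MMU and caches on.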

Hi Laurent,

Thanks for the hint. I’ve integrated the relevant code from StarterWareFree (https://sourceforge.net/projects/starterwarefree/)

From there I am using:

system_config/armv7a/cache.c
system_config/armv7a/mmu.c
system_config/armv7a/gcc/cp15.S

And the respective headers.

I based my app code on examples/beaglebone/cache_mmu/uartEdma_Cache.c

In there, the sequence is:

MMUConfigAndEnable();
CacheEnable(CACHE_ALL);

I modified MMUConfigAndEnable() to map only DDR (no OCMC, no Device regions); the rest of the code is left untouched.

If I call MMUConfigAndEnable(), my app gets lost: I see nothing happening. If I remove the call, the app works again, with the poor performance that I have reported so far.

Any further hints?

Things to keep in mind:

  • I am using the original BeagleBoard Rev C (OMAP3530 ES3.0), not a BeagleBone (White, Black, Green, or any other variant). In other words, no AM335x.

  • Anyway, both use the same Cortex-A8, and everything related to the MMU or caches should be the same (is it?), so it should not make any difference.

I also noticed that the constant below is used in MMUConfigAndEnable(). What is it? What is it for?

/* To configure as a section. Section Size is 1MB */
#define MMU_PGTYPE_SECTION (0xFFF04002)
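One way to read this value is against the ARMv7-A short-descriptor layout of a first-level section entry: bits 31:20 hold the 1 MB-aligned base address, bit 18 distinguishes a supersection from a section, bits 14:12 are TEX, and bits 1:0 equal to 0b10 mark a section. The sketch below only splits the constant along those fields. How StarterWare actually combines it with a region's start address is not shown in this thread, so the interpretation (an address mask in the top 12 bits plus the section type in the low bits) is an assumption, and the type and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a short-descriptor first-level section entry (ARMv7-A). */
typedef struct {
    uint32_t base_bits;     /* bits 31:20, section base address (or a mask for it) */
    uint32_t supersection;  /* bit 18, 0 = 1 MB section, 1 = 16 MB supersection */
    uint32_t tex;           /* bits 14:12, memory type extension */
    uint32_t type;          /* bits 1:0, 0b10 = section */
} section_fields_t;

/* Split a 32-bit word along the section-descriptor field boundaries. */
section_fields_t split_section_word(uint32_t word)
{
    section_fields_t f;
    f.base_bits    = word & 0xFFF00000u;
    f.supersection = (word >> 18) & 0x1u;
    f.tex          = (word >> 12) & 0x7u;
    f.type         = word & 0x3u;
    return f;
}
```

For 0xFFF04002 this gives base_bits = 0xFFF00000 (all twelve address bits set, which is why it looks like a mask), supersection = 0, tex = 4 and type = 2.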

Thanks and regards!!!

Hello,

One additional note: my environment is just composed of:

  • ARM GCC EABI toolchain & Eclipse (no IAR, no CCS, no JTAG)
  • BeagleBoard Rev C already running U-Boot
  • I do not let U-Boot autoboot anything; instead I stop at the U-Boot prompt
  • I load the .srec file generated by building my application (> loads)
  • Once loaded, I hand control over to my application (> go 80008000)

In other words, I take advantage of U-Boot having already initialized most of the SoC; I just need to get it running at full speed, with caches enabled, for my bare-metal application benchmark. It should be possible, shouldn’t it?

Thanks!

Hi,

I just want to report that I could fix it.

It was way simpler than everything I tried before: take the StarterWare-Free example as is, including the MMU configuration, not only for DDR (as I was doing before) but also for OCM and Device. It now works flawlessly.

I also added the clock configuration code from U-Boot to my project, and I can get up to 600 MHz.

All combined, I get:
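One plausible reason (not confirmed in this thread) why mapping only DDR made the application disappear is that every address outside DDR, including the UART and other peripherals, then has no valid translation and faults on first access. The sketch below is not the StarterWare code. It is a minimal flat (virtual = physical) first-level table using 1 MB sections, with DDR marked write-back cacheable and everything else mapped as device memory. The DDR base and size come from the U-Boot log above (load address 0x80008000, 256 MB). The attribute encodings assume TEX remap is disabled, and all names are illustrative.

```c
#include <stdint.h>

#define SECTION_SHIFT  20u           /* 1 MB sections */
#define NUM_SECTIONS   4096u         /* 4096 x 1 MB = 4 GB address space */
#define DESC_SECTION   0x2u          /* bits 1:0 = 0b10 */
#define DESC_B         (1u << 2)     /* bufferable */
#define DESC_C         (1u << 3)     /* cacheable */
#define DESC_XN        (1u << 4)     /* execute never */
#define DESC_AP_RW     (3u << 10)    /* AP[1:0] = 0b11, full access */

#define DDR_BASE       0x80000000u   /* from the U-Boot log in this thread */
#define DDR_SIZE       0x10000000u   /* 256 MB */

/* A first-level table must be 16 KB aligned before it is loaded into TTBR0. */
uint32_t page_table[NUM_SECTIONS] __attribute__((aligned(16384)));

/* Fill a flat mapping: DDR is cacheable normal memory, the rest is device memory. */
void build_flat_table(uint32_t *table)
{
    for (uint32_t i = 0; i < NUM_SECTIONS; i++) {
        uint32_t addr = i << SECTION_SHIFT;
        uint32_t attr = DESC_SECTION | DESC_AP_RW;   /* domain 0 */

        if (addr >= DDR_BASE && (addr - DDR_BASE) < DDR_SIZE)
            attr |= DESC_C | DESC_B;     /* write-back cacheable */
        else
            attr |= DESC_B | DESC_XN;    /* shareable device, never executed */

        table[i] = addr | attr;
    }
}
```

On the target, the table address would still have to be written to TTBR0 and domain 0 enabled in the DACR before the MMU is switched on; those steps are left to the existing MMU code.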

Time difference for 10000000 iterations: 0.10001 s

That is an improvement of roughly 55x (5.48938 s / 0.10001 s)!
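To put the three measurements in this thread on a common scale, it helps to convert them into CPU cycles per loop iteration. The helpers below are a small sketch with made-up names. The clock rates passed in are assumptions: 600 MHz for the final bare-metal run (as stated above) and 500 MHz for the Linux run (as printed in its boot log). The real MPU clock during the first bare-metal run is not known.

```c
/* Loop iterations completed per second. */
double iterations_per_second(double iterations, double seconds)
{
    return iterations / seconds;
}

/* Average CPU cycles spent per iteration at an assumed clock rate. */
double cycles_per_iteration(double clock_hz, double iterations, double seconds)
{
    return clock_hz * seconds / iterations;
}

/* Ratio of the elapsed times before and after a change. */
double speedup(double before_seconds, double after_seconds)
{
    return before_seconds / after_seconds;
}
```

With the numbers from this thread, cycles_per_iteration(600e6, 1e7, 0.10001) is about 6 cycles, cycles_per_iteration(500e6, 1e7, 0.28) is about 14 cycles, and speedup(5.48938, 0.10001) is about 54.9. If the first run had also been clocked at 500 MHz, it would have cost about 274 cycles per iteration, which is far more than a clock setting alone would explain and is consistent with the caches being off.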

Regards,

Jaime Arnguren