SGX performance under Angstrom with TI-supplied drivers

I'm writing some GL benchmark 'demos' to see how well the SGX
performs with the TI-supplied drivers on the BeagleBoard under
Angstrom. As a test mesh I'm using the monkey head object from
Blender (the version I'm using is composed of ~3800 triangles). I'm
not texture mapping the geometry; I'm just using vertex data and
surface normals.

I'm not seeing *anywhere* near the theoretical 10 million
triangles/second marketing figure published all over the OMAP 35xx
documentation. In fact, with lighting disabled (and just a constant
color applied to the mesh), the best performance I'm getting is ~1.2
million triangles per second. If I turn lighting on, I only get ~250k
triangles per second.
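
For anyone who wants to compare numbers, here is a minimal Python
sketch of the arithmetic behind these figures. It is not the demo
code itself; the function and constant names are made up for
illustration, and it assumes the mesh is drawn once per frame at
exactly 3800 triangles (the real count is only approximate). It
converts between frames rendered and triangles/second, and shows what
the measured rates imply: roughly 316 frames/s unlit and 66 frames/s
lit, or 12% and 2.5% of the 10 million figure.

```python
# Figures quoted in the post. The mesh size is approximate (~3800).
TRIS_PER_FRAME = 3800             # assumed: mesh drawn once per frame
CLAIMED_TRIS_PER_SEC = 10_000_000  # marketing figure for the SGX


def tris_per_second(frames, seconds, tris_per_frame=TRIS_PER_FRAME):
    """Throughput from a count of rendered frames and elapsed wall time."""
    return frames * tris_per_frame / seconds


def implied_fps(tris_per_sec, tris_per_frame=TRIS_PER_FRAME):
    """Frame rate that a given triangle throughput corresponds to."""
    return tris_per_sec / tris_per_frame


def percent_of_claim(tris_per_sec, claimed=CLAIMED_TRIS_PER_SEC):
    """Measured throughput as a percentage of the claimed peak."""
    return 100.0 * tris_per_sec / claimed


# Measured rates from the post: lighting off and lighting on.
for label, rate in (("unlit", 1_200_000), ("lit", 250_000)):
    print(f"{label}: {implied_fps(rate):.0f} frames/s, "
          f"{percent_of_claim(rate):.1f}% of claimed peak")
```

In a real benchmark the frame count and elapsed time would come from
the render loop, e.g. tris_per_second(frames_drawn, elapsed_seconds).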

In both cases the demo is only using ~3% of the ARM CPU according to
top output. Is anyone else trying to get benchmark numbers on the
BeagleBoard GPU? Has anyone been able to get close to the 10 million
tris/sec number?

Should have mentioned in initial post that these tests are being run