XBMC Weekly report 5

http://xbmc.org/topfs2/2010/06/28/weekly-report-5/

Status

 * Finally XBMC runs on Angstrom; it turns out it was the optical code
   paths deadlocking while loading. Committed a
   --disable-optical-drive option on the GSoC branch.

yay!

Plan

 * Since C4 runs at 10-15 fps and 100% CPU we need to triple
   performance, and since CPU seems to be the limiting factor here
   (unless the buffer flip does a busy wait) the next stop will be
   limiting the amount of processing. First up is event-based
   rendering, since this will allow a way to create a skin that's
   extremely light on processing resources.

What about video overlay? Even if it requires a custom skin/feature ...

IMO separating (the main) video from the GUI is the best chance for
mitigating CPU limitations: assuming the video can keep up, the GUI
needn't run at the same rate. Video framebuffers are the obvious way,
but not the only one.

 * Finding out which paths eat the most CPU time (it should be
   somewhere in the font code according to other tests) and trying to
   set up a proper plan for how to limit the CPU usage.

How do you plan to profile the code, and are you going to do it on a
beagle or pc (less accurate, but at the required granularity it might
suffice)? Is there work that has already been done in this area that
could save you the time?

 * Backport a few changes to trunk to allow building XBMC on
   Angstrom from trunk.

I don't think this is very important at this point. That's what the
branch is for.

Risks

 * Given that it looks like CPU might be a limiting factor, getting
   XBMC to lower its resource use to a third might be hard without
   limiting the skin.

I think this is a pretty reasonable solution - if it is required.
Nothing wrong with a bit of branding either ...

!Z

Michael Zucchi <notzed@gmail.com> writes:

 * Since C4 runs at 10-15 fps and 100% CPU we need to triple
   performance, and since CPU seems to be the limiting factor here
   (unless the buffer flip does a busy wait) the next stop will be
   limiting the amount of processing. First up is event-based
   rendering, since this will allow a way to create a skin that's
   extremely light on processing resources.

What about video overlay? Even if it requires a custom skin/feature ...

IMO separating (the main) video from the GUI is the best chance for
mitigating CPU limitations: assuming the video can keep up, the GUI
needn't run at the same rate. Video framebuffers are the obvious way,
but not the only one.

Using the video overlays has another distinct advantage: the scaled
and RGB-converted image doesn't need to be written back to memory
before being displayed, as is the case when the SGX pipeline is used
for this. The reduced memory bandwidth should allow the rest to run a
bit faster.

 * Finding out which paths eat the most CPU time (it should be
   somewhere in the font code according to other tests) and trying to
   set up a proper plan for how to limit the CPU usage.

How do you plan to profile the code, and are you going to do it on a
beagle or pc (less accurate, but at the required granularity it might
suffice)? Is there work that has already been done in this area that
could save you the time?

I would recommend using oprofile on the beagle.

From what I've seen at customers, painting the alpha channel with 0xff
is all that's needed to make the video shine through, and I suspect
taking Måns' omapfb code as a basis would work as well.
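
For illustration, a minimal GLES sketch of that punch-through idea.
Whether "let the video through" means writing 0xff or 0x00 depends on
how the DSS blending is configured, so treat the constant as an
assumption:

```cpp
// Hypothetical sketch: punch a "hole" in the GUI layer so the video
// overlay underneath shows through. Assumes the DSS graphics plane is
// blended on top of the video plane using the framebuffer's alpha channel.
void PunchVideoHole(int x, int y, int w, int h)
{
  glEnable(GL_SCISSOR_TEST);
  glScissor(x, y, w, h);

  // Only touch the alpha channel the display controller blends with.
  glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE);
  glClearColor(0.0f, 0.0f, 0.0f, 1.0f);   // assumption: alpha 0xff lets video through
  glClear(GL_COLOR_BUFFER_BIT);

  glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
  glDisable(GL_SCISSOR_TEST);
}
```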

regards,

Koen

On Tue, 2010-06-29 at 02:22 +0930, Michael Zucchi wrote:

> http://xbmc.org/topfs2/2010/06/28/weekly-report-5/

>
> Status
>
> * Finally XBMC runs on Angstrom; it turns out it was the optical code
> paths deadlocking while loading. Committed a
> --disable-optical-drive option on the GSoC branch.

yay!

> Plan
>
> * Since C4 runs at 10-15 fps and 100% CPU we need to triple
> performance, and since CPU seems to be the limiting factor here
> (unless the buffer flip does a busy wait) the next stop will be
> limiting the amount of processing. First up is event-based
> rendering, since this will allow a way to create a skin that's
> extremely light on processing resources.

What about video overlay? Even if it requires a custom skin/feature ...

IMO separating (the main) video from the GUI is the best chance for
mitigating CPU limitations: assuming the video can keep up, the GUI
needn't run at the same rate. Video framebuffers are the obvious way,
but not the only one.

Currently the player pushes it to a VideoRenderer which is specialized
for GLES, so AFAICT it seems like the perfect place to put the
overlay-specific code.

Basically, in VideoRenderer::Render we would (switch to the overlay?)
push the YUV data and then switch to the layer which can take alpha? I
still haven't read through the documentation enough, but I doubt it
will be any significant problem.
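
For reference, a very rough sketch of what "push the YUV data" could
look like at the framebuffer level, roughly in the spirit of Måns'
omapfb code. /dev/fb1 as the overlay device is an assumption, and the
plane setup (size, YUV format, scaling, via the omapfb-specific
ioctls) is left out:

```cpp
// Hypothetical sketch: write decoded frames straight into the OMAP video
// overlay framebuffer, bypassing the GLES pipeline entirely. Assumes
// /dev/fb1 is the video plane and has already been configured (size,
// YUV pixel format, scaling) through the omapfb ioctls.
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/fb.h>

static int            fb_fd   = -1;
static unsigned char *fb_mem  = 0;
static size_t         fb_size = 0;

bool OverlayOpen(const char *dev = "/dev/fb1")
{
  fb_fd = open(dev, O_RDWR);
  if (fb_fd < 0)
    return false;

  fb_fix_screeninfo fix;
  if (ioctl(fb_fd, FBIOGET_FSCREENINFO, &fix) < 0)
    return false;

  fb_size = fix.smem_len;
  fb_mem  = (unsigned char *)mmap(0, fb_size, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fb_fd, 0);
  return fb_mem != MAP_FAILED;
}

// Called from something like VideoRenderer::Render (or better, from the
// player thread itself): copy the frame and let the display controller
// scale and colour-convert it, no SGX involvement.
void OverlayPushFrame(const unsigned char *frame, size_t bytes)
{
  memcpy(fb_mem, frame, bytes < fb_size ? bytes : fb_size);
}
```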

> * Finding out which paths eat the most CPU time (it should be
> somewhere in the font code according to other tests) and trying to
> set up a proper plan for how to limit the CPU usage.

How do you plan to profile the code, and are you going to do it on a
beagle or pc (less accurate, but at the required granularity it might
suffice)? Is there work that has already been done in this area that
could save you the time?

> * Backport a few changes to trunk to allow building XBMC on
> Angstrom from trunk.

I don't think this is very important at this point. That's what the
branch is for.

True, I'll hold off on it; better not to waste time on it for now.

> Risks
>
> * Given that it looks like CPU might be a limiting factor, getting
> XBMC to lower its resource use to a third might be hard without
> limiting the skin.

I think this is a pretty reasonable solution - if it is required.
Nothing wrong with a bit of branding either ...

!Z

Indeed.

A bit of an update on this though: since I can finally run XBMC on
Angstrom I have tried limiting the rendered screen area with glScissor,
and if I limit it to 1/4 of the screen I get about double the framerate
(20 fps in 720p).
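
(For reference, the experiment is just a scissor clamp around the
normal frame render, something like the sketch below; screenWidth and
screenHeight are placeholders.)

```cpp
// Rough illustration of the experiment: clip rendering to a quarter of
// the screen area before drawing the GUI frame.
glEnable(GL_SCISSOR_TEST);
glScissor(0, 0, screenWidth / 2, screenHeight / 2); // 1/4 of the pixels
// ... normal XBMC GUI render here ...
glDisable(GL_SCISSOR_TEST);
```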

This is a bit odd, seeing as I had 100% CPU rendering both 480p and
720p and didn't expect to see that big a change since it looked like
the CPU was the limit. Still, it's good news: I would guess either we
have a stray thread (possibly due to disabling the optical drive) or
perhaps vsync does a busy wait?

XBMC's guilib does a lot of calculations per frame (lots and lots of
matrix multiplications :) ) which will be reduced significantly when we
have a proper event-based solution.
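
To illustrate what I mean by event based, a minimal dirty-flag sketch
(the class and member names are made up for the example, not existing
guilib code):

```cpp
// Hypothetical sketch of event-driven GUI rendering: a control marks
// itself dirty on input/animation events, and the render loop skips the
// per-frame transform and draw work when nothing has changed.
class CGUIControlSketch
{
public:
  void OnEvent() { m_dirty = true; }   // e.g. focus change, animation tick

  void Render()
  {
    if (!m_dirty)
      return;                          // no matrix math, no draw calls
    RecalculateTransform();            // the expensive per-frame work
    Draw();
    m_dirty = false;
  }

private:
  void RecalculateTransform() {}
  void Draw() {}
  bool m_dirty = true;                 // render at least once
};
```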

The SGX drivers implement silicon workarounds on the ARM, so when you
think you're using GLES, you're actually using software rendering for
some portions. The good news is that with newer SGX revisions this is
done less and less. I'm certain this is why the xM is so much faster,
since it does more on the SGX. Last I heard the SGX in ES3.x silicon
has around 70 fallbacks, and the one in ES5.x silicon around 10. I
haven't bothered to check that claim, though.

regards,

Koen

On Tue, 2010-06-29 at 12:49 +0200, Koen Kooi wrote:

A bit of an update on this though: since I can finally run XBMC on
Angstrom I have tried limiting the rendered screen area with glScissor,
and if I limit it to 1/4 of the screen I get about double the framerate
(20 fps in 720p).

This is a bit odd, seeing as I had 100% CPU rendering both 480p and
720p and didn't expect to see that big a change since it looked like
the CPU was the limit. Still, it's good news: I would guess either we
have a stray thread (possibly due to disabling the optical drive) or
perhaps vsync does a busy wait?

The SGX drivers implement silicon workarounds on the ARM, so when you
think you're using GLES, you're actually using software rendering for
some portions. The good news is that with newer SGX revisions this is
done less and less. I'm certain this is why the xM is so much faster,
since it does more on the SGX. Last I heard the SGX in ES3.x silicon
has around 70 fallbacks, and the one in ES5.x silicon around 10. I
haven't bothered to check that claim, though.

regards,

Koen

That explains a lot. Is there any way to upgrade to 5.x (or is that
just for the xM)?

Yes, 5.x silicon is AM37xx, DM37xx and OMAP36xx

Does any documentation exist regarding what has fallbacks? I read an
SGX optimization document which didn't say much about it; it basically
just listed general GLES optimizations.

AFAIK you can only check that by looking at the driver sources (NDA + $$$$$$) or changelogs (again, NDA, lots of $).

I just installed XBMC on my C4 and I do get all the textures, so this
could be a bug in the ES5 drivers. Oh the fun...

regards,

Koen

Tobias Arrskog wrote:

On Tue, 2010-06-29 at 12:49 +0200, Koen Kooi wrote:

> A bit of an update on this though: since I can finally run XBMC on
> Angstrom I have tried limiting the rendered screen area with glScissor,
> and if I limit it to 1/4 of the screen I get about double the framerate
> (20 fps in 720p).
>
> This is a bit odd, seeing as I had 100% CPU rendering both 480p and
> 720p and didn't expect to see that big a change since it looked like
> the CPU was the limit. Still, it's good news: I would guess either we
> have a stray thread (possibly due to disabling the optical drive) or
> perhaps vsync does a busy wait?

The SGX drivers implement silicon workarounds on the ARM, so when you
think you're using GLES, you're actually using software rendering for
some portions. The good news is that with newer SGX revisions this is
done less and less. I'm certain this is why the xM is so much faster,
since it does more on the SGX. Last I heard the SGX in ES3.x silicon
has around 70 fallbacks, and the one in ES5.x silicon around 10. I
haven't bothered to check that claim, though.

regards,

Koen

That explains a lot. Is there any way to upgrade to 5.x (or is that
just for the xM)?

You need a very steady hand :)

Not quite like that.

If you were using video overlays, the video rendering could and
probably would be completely separate from the GUI rendering.

The 'video plane' widget would still be in the tree - and be used to
set up the stencil/alpha, or simply no-op, or even be used to determine
which 'video screen' was the overlay and which was rendered (if you
wanted to support multiple video outputs). But the actual video data
would never go through it and should run on a completely independent
schedule. Setting up the video layer would also be done outside of
the Render call.
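
A rough sketch of what such a widget could reduce to, with the overlay
fed on its own schedule elsewhere (all names here are illustrative,
not existing XBMC classes):

```cpp
// Hypothetical sketch: the GUI-side video control only marks where the
// video should appear (stencil/alpha or no-op); the actual frames go to
// the overlay plane on their own schedule, outside Render().
class CGUIVideoPlaneSketch
{
public:
  void Render()
  {
    // Option A: do nothing, the overlay already occupies this region.
    // Option B: punch the alpha hole so the overlay shows through,
    // e.g. something like PunchVideoHole(m_x, m_y, m_width, m_height).
  }

  // Layout changes are pushed to the overlay configuration
  // asynchronously, not once per rendered GUI frame.
  void SetPosition(int x, int y, int w, int h)
  {
    m_x = x; m_y = y; m_width = w; m_height = h;
    // e.g. queue an overlay plane reconfiguration here
  }

private:
  int m_x = 0, m_y = 0, m_width = 0, m_height = 0;
};
```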