Improved MPlayer: At the performance level of Omapfbplay

Hello,

Here are four files that bring the performance of mplayer up to the level of
omapfbplay on the beagleboard platform. It's a first pass; the code
needs refinement for bugs and the known issues. Feedback is welcome,

Grégoire


mplayer_svn.bb.ph (1.75 KB)

yuv.S (4.52 KB)

vo_omapfb.c (16.5 KB)

omapfb.patch (946 Bytes)

Hi,

4) As the video is written to the upper plane, overlapping of window is
not working.
Currently, I disable the video in such case. Any suggestion of roadmap
would be appreciated.

Can you write the video in a slower way instead for this case?

DirectFB could be used though I have not understood if it's a
replacement of fbdev or an abstraction layer.

You could write an omapfb driver for DirectFB. DirectFB typically uses
the regular framebuffer device, but for each framebuffer driver it can
also use the mmio space to perform hardware-accelerated operations.

Sean

"Sean D'Epagnier" <geckosenator@gmail.com> writes:

Hi,

4) As the video is written to the upper plane, overlapping of
window is not working. Currently, I disable the video in such
case. Any suggestion of roadmap would be appreciated.

Can you write the video in a slower way instead for this case?

The hardware supports colour keying the overlay. The omapfb kernel
driver might even expose it. There's code there for it, but I haven't
tested it.

So, as suggested by Mans, the easiest path could be to use the color key.
I see the get/set color key calls in the omap driver, which is encouraging.
The problem is that the X11 events generated (Expose, Visibility...) are
the opposite of what we want: they basically report the areas that become
visible, not the ones that become hidden.

I'm sure such a problem has already been solved ten times. Any pointers?
Any known examples?

Another idea: would it work to swap the planes, fb0 for the video and fb1
for the desktop? Then just set the color key and draw it in the mplayer
X11 window. Would that make sense? How would I tell X to use fb1?

Grégoire

Gregoire Gentil <gregoire@gentil.com> writes:

4) As the video is written to the upper plane, overlapping of window is
not working.
Currently, I disable the video in such case. Any suggestion of roadmap
would be appreciated.

So as suggested by Mans, the easiest path could be to use the color key.
I see the get/set color key in the omap driver which is encouraging. The
problem is that the X11 events generated (Expose, Visibility...) are the
opposite of what we want. They are basically designed to give areas of
what becomes visible, not the opposite.

The overlay is only active where the graphics plane has the key
colour. Simply paint the entire video window with the colour key.
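For what it's worth, here is a minimal sketch of what that could look like
on the player side. It assumes the kernel headers expose
OMAPFB_SET_COLOR_KEY and struct omapfb_color_key (they do in recent omapfb
trees, but names may differ on older kernels); the window, GC and key
colour value are placeholders, and the key may need converting to the
graphics plane's pixel format (e.g. RGB565):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/omapfb.h>
#include <X11/Xlib.h>

/* Enable destination colour keying on the graphics plane and fill the
 * video window with the key colour, so the video overlay (fb1) shows
 * through exactly where the window is unobscured. */
static int enable_colour_key(Display *dpy, Window win, GC gc,
                             int w, int h, unsigned long key)
{
    struct omapfb_color_key ck;
    int fd = open("/dev/fb0", O_RDWR);       /* graphics plane */

    if (fd < 0)
        return -1;

    memset(&ck, 0, sizeof(ck));
    ck.key_type  = OMAPFB_COLOR_KEY_GFX_DST; /* key on graphics destination */
    ck.trans_key = key;                      /* pixel value that lets video through */
    if (ioctl(fd, OMAPFB_SET_COLOR_KEY, &ck) < 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* Paint the whole video window with the key colour. */
    XSetForeground(dpy, gc, key);
    XFillRectangle(dpy, win, gc, 0, 0, w, h);
    XFlush(dpy);
    return 0;
}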

So as suggested by Mans, the easiest path could be to use the color key.

I see the get/set color key in the omap driver which is encouraging. The
problem is that the X11 events generated (Expose, Visibility...) are the
opposite of what we want. They are basically designed to give areas of
what becomes visible, not the opposite.

I'm sure that such problem has already been solved 10 times. Any
pointer? Any known example?

Another idea: would it work to switch the planes? fb0 for the video and
fb1 for the desktop. Then just set the color key, and draw it in the
mplayer X11 window. Would it make sense? How to tell X to use fb1?

Kalle's XV driver already does that for you: http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=summary

and as installable binary:

http://www.angstrom-distribution.org/repo/?pkgname=xf86-video-omapfb

The only downside of that driver is that it's using a C based colour conversion implementation:

http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD

This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch) to work properly, and it has a small glitch when used with DSS2. Any volunteers for adding the NEON colour conversion to this driver?

regards,

Koen

So as suggested by Mans, the easiest path could be to use the color key.

I see the get/set color key in the omap driver which is encouraging. The
problem is that the X11 events generated (Expose, Visibility...) are the
opposite of what we want. They are basically designed to give areas of
what becomes visible, not the opposite.

I'm sure that such problem has already been solved 10 times. Any
pointer? Any known example?

Another idea: would it work to switch the planes? fb0 for the video and
fb1 for the desktop. Then just set the color key, and draw it in the
mplayer X11 window. Would it make sense? How to tell X to use fb1?

Kalle's XV driver already does that for you:
http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=summary

I'll gladly integrate patches implementing support for the
XV_COLORKEY attribute :wink:

It's already defined in the code and there are stubs for the set
and get calls:

http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/omapfb-xv.c;hb=HEAD#l62

So all that's needed is to implement the ioctl call, I suppose... But I
haven't really investigated it further.
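To illustrate what filling in that stub might look like, here is a rough
sketch against the standard xf86 XV adaptor hooks, assuming the usual
xf86xv.h and linux/omapfb.h includes. The port-private type, its fields
and the xvColorKey atom are made up for the example (the real driver
keeps its own state), and error handling is minimal:

/* Hypothetical XV_COLORKEY handler; OMAPPortPrivPtr, fb_fd and xvColorKey
 * are illustrative names, not the driver's actual ones. */
static int
OMAPFBXVSetPortAttribute(ScrnInfoPtr pScrn, Atom attribute,
                         INT32 value, pointer data)
{
    OMAPPortPrivPtr priv = (OMAPPortPrivPtr)data;

    if (attribute == xvColorKey) {
        struct omapfb_color_key ck;

        memset(&ck, 0, sizeof(ck));
        ck.key_type  = OMAPFB_COLOR_KEY_GFX_DST;
        ck.trans_key = value;
        if (ioctl(priv->fb_fd, OMAPFB_SET_COLOR_KEY, &ck) < 0)
            return BadValue;
        priv->colorkey = value;
        return Success;
    }
    return BadMatch;
}

The matching get call would presumably use OMAPFB_GET_COLOR_KEY and hand
back ck.trans_key.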

and as installable binary:

http://www.angstrom-distribution.org/repo/?pkgname=xf86-video-omapfb

The only downside of that driver is that it's using a C based colour
conversion implementation:

http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD

This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch) to
work properly, and it has a small glitch when used with DSS2. Any volunteers
for adding the NEON colour conversion to this driver?

Yes, please, that'd be awesome. :slight_smile:

The C version is just a placeholder to verify correctness; there is
some effort underway to get an optimized version in, but that's only for
ARMv6. NEON-enabled platforms would most certainly benefit from such a
conversion.

[...]

The only downside of that driver is that it's using a C based colour
conversion implementation:

http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD

This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch) to
work properly, and it has a small glitch when used with DSS2. Any volunteers
for adding the NEON colour conversion to this driver?

Yes, please, that'd be awesome. :slight_smile:

The C version is just a placeholder to verify correctness, there is
some effort done to get an optimized version in, but that's only for
ARMv6. NEON-enabled platforms would most certainly benefit from such
conversion.

Just out of curiosity, is this an effort to port the existing ARMv6
optimized color conversion code from Xomap, or is somebody going after
a completely new implementation?

Back to the subject. Having all the color conversion optimizations
implemented in Xv, and using it from MPlayer with direct rendering
enabled (the -dr option), might theoretically provide performance close to
that of direct framebuffer access. But of course there might be a lot
of issues to solve too.

Unfortunately, Tuomas' patch doesn't work on the Beagleboard, as the
author (Tuomas) himself reported, and I can unfortunately confirm that :frowning:

Grégoire

I was aware of the XV branch. But Siarhei has a point, and it's partly
the reason why I did some work on the fbdev front. The XV branch is a
great effort, but getting something stable and mature will take a long
time, while the framebuffer path has already been working for a while. And
you will never get the same memory consumption: X takes at least 10 to 20
MB more than Kdrive, which makes a difference on a 128 MB system. Not to
mention some missing features like rotation. On the other hand, it's true
that the main advantage of XV is that you can play multiple videos at the
same time,

Grégoire

Gregoire Gentil wrote:

I didn't see much difference (1 or 2 MiB) between Xorg and kdrive on the beagle. It all depends on how you build X and libx11 :slight_smile:

regards,

Koen

[...]

The only downside of that driver is that it's using a C based colour
conversion implementation:

http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD

This driver needs some fixes to omapfb (e.g. Tuomas' downscaling patch)
to
work properly, and it has a small glitch when used with DSS2. Any
volunteers
for adding the NEON colour conversion to this driver?

Yes, please, that'd be awesome. :slight_smile:

The C version is just a placeholder to verify correctness, there is
some effort done to get an optimized version in, but that's only for
ARMv6. NEON-enabled platforms would most certainly benefit from such
conversion.

Just out of curiosity, is it an effort to port existing ARMv6
optimized color conversion code from Xomap or somebody is going after
a completely new implementation?

It's something new, but I'm not sure what the status is and whether
it'll materialize any time soon... I gave the conversion routine in XOmap
(to the quirky YUV format of the Blizzard chip; this driver started out on
the N800...) a test, but couldn't get it to work. The whole thing sounded
so silly that I tried simply converting the planar formats to one of the
supported packed formats, and it didn't bog down performance completely,
even when written in C. And that's what happens on the beagle too. I don't
own one, nor do I have much free time at the office, so the non-N800 side
hasn't been that actively developed by me...

I got 512x288@24fps running smoothly on the N800 and didn't really miss
the extra performance of the 12-bit planar format or optimized color
conversion, so I left it at that for the time being... :slight_smile:

Back to the subject. Having all the color conversion optimizations
implemented in Xv and using it from MPlayer with direct rendering
enabled (-dr option) theoretically might provide performance close to
that of direct framebuffer access. But of course there might be a lot
of issues to solve too.

I was aware of the XV branch. But Siarhei has a point and it's partly
the reason why I made some work on this fbdev front. The XV branch is a
great effort but before getting something stable and mature will take a
long time, while the framebuffer is already working for a while. And you
will never get the same memory consumption: X takes at least 10 to 20
more MB than Kdrive, which makes a difference on a 128MB system. Without
mentioning some missing features like rotation. On the other side, it's
true that the main advantage of XV is that you will get multiple videos
at the same time,

I didn't see much difference (1 or 2 MiB) between Xorg and kdrive on the
beagle. It all depends on how you build X and libx11 :slight_smile:

Yeah, the "X.Org is bloated, use kdrive" argument has been moot for
some time now. This was recently discussed on the X.Org mailing list
and this comment should be convincing enough:

  DRI2 Heads up

And for the record, xf86-video-omapfb is not a branch of anything,
it's a whole new driver for the OMAP framebuffer kernel driver. The
idea is to support some tricks that the basic fbdev driver doesn't.

The advantage of XV is that you don't have to optimize your *client*
software for a specific board (which naturally yields the optimal
solution), instead you optimize the driver. Thus instead of one
program working nicely, you have N programs working nicely for the
same effort. Don't get me wrong, using the framebuffer directly is
fine and dandy for a number of use cases, but if X is going to be
running, XV is the only decent way to interact with it really.

The advantage of XV is that you don't have to optimize your *client*
software for a specific board (which naturally yields the optimal
solution), instead you optimize the driver. Thus instead of one
program working nicely, you have N programs working nicely for the
same effort. Don't get me wrong, using the framebuffer directly is
fine and dandy for a number of use cases, but if X is going to be
running, XV is the only decent way to interact with it really.

I'm not fully convinced on the performance issue, but I do agree that XV
is much more versatile and powerful than fbdev.

In the meantime, find attached a version that adds the color key and hence
fixes the overlap problem. Thanks to Mans for the hint!

The problem of the border conversion interpolation remains; it is
common to both fbdev and XV,

Grégoire

vo_omapfb.c (16.7 KB)

Judging from the code you attached, and assuming I'm not totally
wrong, the only part where XV "needs" to be inferior is the data
transfer between the client and the server. And when XSHM is used,
that overhead is bound to be dwarfed by the decoding and color
conversion to a point of not mattering any more.
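As a point of reference for the data path being discussed, the client side
of the Xv + MIT-SHM route looks roughly like this (a sketch only: error
handling is omitted, 0x32315659 is the planar YV12 FourCC, and the Xv port
is assumed to have been grabbed already):

#include <sys/ipc.h>
#include <sys/shm.h>
#include <X11/Xlib.h>
#include <X11/extensions/XShm.h>
#include <X11/extensions/Xvlib.h>

/* The image lives in a shared memory segment, so XvShmPutImage avoids
 * sending the pixels over the X socket. */
static XvImage *create_frame(Display *dpy, XvPortID port, int w, int h,
                             XShmSegmentInfo *shminfo)
{
    XvImage *img = XvShmCreateImage(dpy, port, 0x32315659 /* YV12 */,
                                    NULL, w, h, shminfo);

    shminfo->shmid    = shmget(IPC_PRIVATE, img->data_size, IPC_CREAT | 0777);
    shminfo->shmaddr  = img->data = shmat(shminfo->shmid, NULL, 0);
    shminfo->readOnly = False;
    XShmAttach(dpy, shminfo);
    XSync(dpy, False);
    return img;
}

static void show_frame(Display *dpy, XvPortID port, Drawable d, GC gc,
                       XvImage *img, int dst_w, int dst_h)
{
    /* The server-side driver does the colour conversion / scaling here. */
    XvShmPutImage(dpy, port, d, gc, img,
                  0, 0, img->width, img->height,   /* source rect */
                  0, 0, dst_w, dst_h,              /* dest rect   */
                  False);
    XSync(dpy, False);
}

The pixels never travel over the X socket; the driver reads them straight
out of the shared segment when it converts and writes them to the overlay.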

Still, XV is harder to use in a video player if you want good performance.
The problem is mostly related to OSD and subtitles.

With direct access to the framebuffer, the output path is very simple. The
client just does the color format conversion and can then easily draw
subtitles over the image in the framebuffer.

With XV everything gets more complex if we want to avoid redundant memcpy
operations to copy data around. The client needs to hand XV a ready frame,
in a planar format, with all the subtitles and OSD data already drawn over
it. But subtitles can only be applied to a frame that has already been
retired from the video decoding pipeline and is no longer used as a
reference frame for decoding the next frames. So if everything is
implemented right, the frame becomes available with some delay, which needs
to be compensated for and taken into account. As I mentioned before, this
stuff is implemented in MPlayer using the "direct rendering" method (the
-dr option); see also [1]. The problem is that the last time I checked
(admittedly long ago), direct rendering was not working well in MPlayer
(including not actually being used for some codec/configuration
combinations, and rendering bugs with subtitles). Theoretically, everything
should be fixable given enough effort. But in practice it may well be more
complex than just going with direct framebuffer rendering hacks :slight_smile:

1. http://www.mplayerhq.hu/DOCS/tech/dr-methods.txt

On 21 Nov 2008, at 11:37, Siarhei Siamashka wrote the following:

Siarhei Siamashka wrote:

>> The advantage of XV is that you don't have to optimize your *client*
>> software for a specific board (which naturally yields the optimal
>> solution), instead you optimize the driver. Thus instead of one
>> program working nicely, you have N programs working nicely for the
>> same effort. Don't get me wrong, using the framebuffer directly is
>> fine and dandy for a number of use cases, but if X is going to be
>> running, XV is the only decent way to interact with it really.
>
> I'm not fully convinced on the performance issue but I do agree that XV
> is much more polyvalent and powerful than fbdev.

Judging from the code you attached, and assuming I'm not totally
wrong, the only part where XV "needs" to be inferior is the data
transfer between the client and the server. And when XSHM is used,
that overhead is bound to be dwarfed by the decoding and color
conversion to a point of not mattering any more.

Still XV is harder to use in a video player to get good performance. The
problem is mostly related to OSD and subtitles.

If the hardware supports the native output format of the decoder, and
there is enough video memory for all delayed frames (3 frames for MPEG2,
16 for H.264), XV imposes an additional copy of each frame from the SHM
segment into the actual video memory. If there is insufficient video
memory or if pixel format conversion is required, there is no reason for
XV to be less efficient than the application accessing the framebuffer
directly.
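(As a rough illustration, a 720x576 frame in a 4:2:0 planar format is
720 * 576 * 1.5 bytes, i.e. about 600 KiB, so 16 delayed H.264 frames need
roughly 10 MB of video memory, while 3 MPEG2 frames fit in under 2 MB.)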

With direct access to framebuffer, video decoding is very simple. The client
just does color format conversion and then can easily draw subtitles over the
image in the framebuffer.

Unless alpha blending of subtitles with the video frame is required, one
can simply draw the subtitle text directly in the X window used for video
and enable colour keying for the overlay.

With XV everything gets more complex if we want to avoid any redundant memcpy
operations to copy data around. The client needs to provide a ready frame
with all the subtitles and OSD data drawn over it in a planar format to XV.
But subtitles can be applied only to the frame which is already retired from
video decoding pipeline and is not used as a reference frame for decoding next
frames anymore. So if everything is implemented right, the frame is available
with some delay which needs to be compensated and taken into account.

Any post-decode rendering into the video frames requires either an extra
copy or a delay. When the hardware supports the codec-native pixel format
and sufficient video memory is present, decoding directly to video memory
and rendering subtitles after a delay is the most efficient. XV does not
allow this, unfortunately.

If pixel format conversion is required, the player can simply render the
subtitles into the video frames after a safe delay before passing them
to XV, which will do the format conversion while writing the frame to
video memory. No extra copy needed. The delayed subtitle rendering
is trivial to implement.
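To make the "trivial" part concrete, the delay boils down to a small FIFO
of decoded frames, where the OSD is drawn only once the decoder can no
longer use a frame as a reference. A rough sketch (the names and the
draw_osd_and_subtitles() helper are made up; this is not MPlayer's actual
-dr code path):

#include <stdint.h>

#define MAX_DELAY 17                /* worst case: 16 reference frames + 1 */

typedef struct {
    uint8_t *planes[3];             /* e.g. YV12 planes, possibly in video memory */
    int64_t  pts;
} frame_t;

static void draw_osd_and_subtitles(frame_t *f);  /* hypothetical OSD renderer */

static frame_t queue[MAX_DELAY];
static int     q_head, q_len;

/* Called for every frame coming out of the decoder. codec_delay is the
 * number of frames the codec may still use as references. Returns the
 * frame that is safe to display (and to draw the OSD on), or NULL while
 * the pipeline is still filling up. */
static frame_t *push_decoded_frame(const frame_t *f, int codec_delay)
{
    frame_t *out = NULL;

    queue[(q_head + q_len) % MAX_DELAY] = *f;
    q_len++;

    if (q_len > codec_delay) {
        out = &queue[q_head];
        q_head = (q_head + 1) % MAX_DELAY;
        q_len--;
        draw_osd_and_subtitles(out);    /* safe: no longer a reference frame */
    }
    return out;
}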

As I mentioned before, this stuff is implemented in MPlayer using "direct
rendering" method (-dr option), also see [1]. The problem is that the last
time I checked it (admittedly long ago), direct rendering was not working well
in MPlayer (including not making use of direct rendering for some
codec/configuration combinations and rendering bugs with subtitles).
Theoretically, everything should be fixable given enough efforts. But in
practice it may be definitely more complex than just going with direct
framebuffer rendering hacks :slight_smile:

Am I misreading you, or is the above paragraph saying that XV is suboptimal
because mplayer has bugs when *not* using it?

>>>
>>> [...]
>>>
>>>>> The only downside of that driver is that it's using a C based colour
>>>>> conversion implementation:
>>>>>
>>>>>
>>>>> http://gitweb.pingu.fi/?p=xf86-video-omapfb.git;a=blob;f=src/image-format-conversions.c;h=5a82a3625be2962197ae58acd0772e7b27243f04;hb=HEAD
>>>>>
>>>>> This driver needs some fixes to omapfb (e.g. Tuomas' downscaling
>>>>> patch) to
>>>>> work properly, and it has a small glitch when used with DSS2. Any
>>>>> volunteers
>>>>> for adding the NEON colour conversion to this driver?
>>>>
>>>> Yes, please, that'd be awesome. :slight_smile:
>>>>
>>>> The C version is just a placeholder to verify correctness, there is
>>>> some effort done to get an optimized version in, but that's only for
>>>> ARMv6. NEON-enabled platforms would most certainly benefit from such
>>>> conversion.
>>>
>>> Just out of curiosity, is it an effort to port existing ARMv6
>>> optimized color conversion code from Xomap or somebody is going after
>>> a completely new implementation?

It's something new, but I'm not sure what the status is and whether
it'll realize any time soon... I gave the conversion routine in XOmap
(to the quirky YUV format of blizzard, this driver started out on
N800...) a test, but couldn't get it to work. The whole thing sounded
so silly I tried simply converting planar formats to one of the
supported packed formats

This quirky YUV format is more tightly packed than YUY2 (12 bits per pixel vs.
16 bits per pixel). On Nokia N800/N810 devices, each video frame needs to be
pushed to the external LCD controller. The link to this external LCD
controller (RFBI) is relatively slow and is one of the weak spots of these
devices. It is barely able to manage tear-free 800x480 screen updates for the
RGB565 and YUY2 color formats when running at top clock frequency. In any
case, RFBI bandwidth is a scarce resource, especially when using tearing
synchronization (the driver occasionally needs to wait for the right moment
to push a frame to the LCD controller, keeping RFBI idle and reducing the
overall efficiency of its limited bandwidth).

But even without considering the RFBI transfers, color format conversion to
the quirky YUV format is faster than conversion to YUY2 simply because it
needs to write less data. That's why we went through the trouble of fixing
support for this quirky color format in the Nokia 770 omapfb driver:
http://www.mail-archive.com/maemo-developers@maemo.org/msg09979.html
https://garage.maemo.org/tracker/index.php?func=detail&aid=881&group_id=164&atid=683

and it provided something like a 1-2% *overall* video playback performance
improvement in MPlayer, which is quite a good result (of course, the
improvement for the color conversion part alone is much more impressive).
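(To put rough numbers on it, taking 24 fps as an example: at 800x480, YUY2
at 16 bpp means 800 * 480 * 2 * 24 bytes, i.e. about 18 MB/s over RFBI,
while the 12-bit format needs about 13.8 MB/s - the conversion code simply
writes 25% fewer bytes per frame.)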

and it didn't bog down the performance
completely, even when written in C. And that's what happens on beagle
too. I don't own one, nor do I have too much free time at the office
so the non-N800 side hasn't been that actively developed by me...

I got 512x288@24fps running smoothly on N800 and didn't really miss
the extra performance of 12bit planar format or optimized color
conversion so I left it at that for the time being... :slight_smile:

The problem with a C implementation is heavy CPU usage. Even if it can
display static images at a decent framerate in a synthetic test, a video
player is a bit different: every CPU cycle spent in the color format
conversion code is stolen from the video decoder. Low overhead in XV is
important for any practical use of it in video players. An old discussion
about the color conversion overhead, with some benchmark numbers for the
N800, can be found here:
http://www.mail-archive.com/maemo-developers@maemo.org/msg09869.html
Nowadays the performance of the ARMv6-optimized color format conversion is a
lot more modest, because hit-under-miss is disabled to work around the
364296 ARM1136 r0pX erratum (one can grep for this workaround in the N800
kernel sources) and software prefetch no longer works as a result - PLD
instructions are now practically useless. Nevertheless, ARMv6 assembly still
outperforms C code quite significantly.

The same is of course true for the beagleboard (so the above is not
completely off-topic). XV really needs to get NEON-optimized color format
conversion code integrated. Otherwise it will remain completely
uncompetitive compared to media players that use direct framebuffer
access with all the necessary optimizations.
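For anyone picking this up: the hot inner loop for going from planar YUV to
a packed format is essentially a byte interleave, which NEON does very
cheaply. A minimal sketch with NEON intrinsics, converting one row to YUY2
(assuming the width is a multiple of 16 and that the caller reuses each
chroma row for two luma rows, as 4:2:0 requires; not tied to any particular
driver):

#include <arm_neon.h>
#include <stdint.h>

/* Convert one row of planar YUV (full-width Y, half-width U/V) into
 * packed YUY2 (Y0 U0 Y1 V0 ...). width must be a multiple of 16. */
static void yuv_row_to_yuy2_neon(uint8_t *dst, const uint8_t *y,
                                 const uint8_t *u, const uint8_t *v,
                                 int width)
{
    int x;

    for (x = 0; x < width; x += 16) {
        uint8x16_t   y16 = vld1q_u8(y + x);             /* 16 luma samples   */
        uint8x8x2_t  uv  = vzip_u8(vld1_u8(u + x / 2),
                                   vld1_u8(v + x / 2)); /* U0 V0 U1 V1 ...   */
        uint8x16_t   c   = vcombine_u8(uv.val[0], uv.val[1]);
        uint8x16x2_t out = vzipq_u8(y16, c);            /* Y0 U0 Y1 V0 ...   */

        vst1q_u8(dst + 2 * x,      out.val[0]);
        vst1q_u8(dst + 2 * x + 16, out.val[1]);
    }
}

An assembly version with proper scheduling and alignment handling would do
better still, but even the intrinsics form should already be a large
improvement over a per-byte C loop.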

Siarhei Siamashka wrote:
>> >> The advantage of XV is that you don't have to optimize your *client*
>> >> software for a specific board (which naturally yields the optimal
>> >> solution), instead you optimize the driver. Thus instead of one
>> >> program working nicely, you have N programs working nicely for the
>> >> same effort. Don't get me wrong, using the framebuffer directly is
>> >> fine and dandy for a number of use cases, but if X is going to be
>> >> running, XV is the only decent way to interact with it really.
>> >
>> > I'm not fully convinced on the performance issue but I do agree that
>> > XV is much more polyvalent and powerful than fbdev.
>>
>> Judging from the code you attached, and assuming I'm not totally
>> wrong, the only part where XV "needs" to be inferior is the data
>> transfer between the client and the server. And when XSHM is used,
>> that overhead is bound to be dwarfed by the decoding and color
>> conversion to a point of not mattering any more.
>
> Still XV is harder to use in a video player to get good performance. The
> problem is mostly related to OSD and subtitles.

If the hardware supports the native output format of the decoder, and
there is enough video memory for all delayed frames (3 frames for MPEG2,
16 for H.264), XV imposes an additional copy of each frame from the SHM
segment into the actual video memory. If there is insufficient video
memory or if pixel format conversion is required, there is no reason for
XV to be less efficient than the application accessing the framebuffer
directly.

> With direct access to framebuffer, video decoding is very simple. The
> client just does color format conversion and then can easily draw
> subtitles over the image in the framebuffer.

Unless alpha blending of subtitles with the video frame is required, one
can simply draw the subtitle text directly in the X window used for video
and enable colour keying for the overlay.

> With XV everything gets more complex if we want to avoid any redundant
> memcpy operations to copy data around. The client needs to provide a
> ready frame with all the subtitles and OSD data drawn over it in a planar
> format to XV. But subtitles can be applied only to the frame which is
> already retired from video decoding pipeline and is not used as a
> reference frame for decoding next frames anymore. So if everything is
> implemented right, the frame is available with some delay which needs to
> be compensated and taken into account.

Any post-decode rendering into the video frames requires either an extra
copy or a delay. When the hardware support the codec-native pixel format
and sufficient video memory is present, decoding directly to video memory
and rendering subtitles after a delay is the most efficient. XV does not
allow this, unfortunately.

If pixel format conversion is required, the player can simply render the
subtitles into the video frames after a safe delay before passing them
to XV, which will do the format conversion while writing the frame to
video memory. No extra copy needed. The delayed subtitle rendering
is trivial to implement.

The overall style of your reply seems a bit strange:

me: Implementing fast video output (on the beagleboard) using XV is somewhat
harder than just using direct framebuffer access, because you need to
implement delayed subtitle rendering (handled by "direct rendering" in
MPlayer) in order to avoid extra data copies.
you: The delayed subtitle rendering is trivial to implement (plus some
additional useful details about delayed subtitle rendering).

Are you trying to question something? You have also taken the easy way with
omapfbplay instead of investing effort in tweaking one of the full-fledged
media players to get it working well :wink:

Thanks anyway for the additional details. Gregoire and Kalle may find all this
information useful; they are the ones who are *actually* working
on "Improved MPlayer: At the performance level of Omapfbplay" and XV for
the beagleboard.

BTW, the comment about colour keying is quite interesting (though MPlayer
normally uses alpha blending for subtitles). It might be worth implementing
(as it is *really* trivial). Perhaps you should submit the idea to the
mplayer-dev-eng mailing list.

> As I mentioned before, this stuff is implemented in MPlayer using "direct
> rendering" method (-dr option), also see [1]. The problem is that the
> last time I checked it (admittedly long ago), direct rendering was not
> working well in MPlayer (including not making use of direct rendering for
> some codec/configuration combinations and rendering bugs with subtitles).
> Theoretically, everything should be fixable given enough efforts. But in
> practice it may be definitely more complex than just going with direct
> framebuffer rendering hacks :slight_smile:

Am I misreading you, or is the above paragraph saying that XV is suboptimal
because mplayer has bugs when *not* using it?

Yes, you are definitely misreading me.