Improved MPlayer: At the performance level of Omapfbplay

Siarhei Siamashka <siarhei.siamashka@gmail.com> writes:

Siarhei Siamashka wrote:
>> >> The advantage of XV is that you don't have to optimize your *client*
>> >> software for a specific board (which naturally yields the optimal
>> >> solution), instead you optimize the driver. Thus instead of one
>> >> program working nicely, you have N programs working nicely for the
>> >> same effort. Don't get me wrong, using the framebuffer directly is
>> >> fine and dandy for a number of use cases, but if X is going to be
>> >> running, XV is the only decent way to interact with it really.
>> >
>> > I'm not fully convinced on the performance issue but I do agree that
>> > XV is much more polyvalent and powerful than fbdev.
>>
>> Judging from the code you attached, and assuming I'm not totally
>> wrong, the only part where XV "needs" to be inferior is the data
>> transfer between the client and the server. And when XSHM is used,
>> that overhead is bound to be dwarfed by the decoding and color
>> conversion to a point of not mattering any more.
>
> Still XV is harder to use in a video player to get good performance. The
> problem is mostly related to OSD and subtitles.

If the hardware supports the native output format of the decoder, and
there is enough video memory for all delayed frames (3 frames for MPEG2,
16 for H.264), XV imposes an additional copy of each frame from the SHM
segment into the actual video memory. If there is insufficient video
memory or if pixel format conversion is required, there is no reason for
XV to be less efficient than the application accessing the framebuffer
directly.
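To put rough numbers on the video memory and copy cost discussed above, here is a small back-of-the-envelope sketch. The 720x480 resolution and 24 fps rate are assumptions chosen for illustration, and the frame is assumed to be planar YUV 4:2:0 (12 bits per pixel); only the frame counts (3 for MPEG2, 16 for H.264) come from the message itself.

```python
def yuv420_frame_bytes(width, height):
    """Bytes in one planar YUV 4:2:0 frame (12 bits per pixel)."""
    return width * height * 3 // 2


def pool_bytes(width, height, frames):
    """Video memory needed to hold `frames` delayed frames."""
    return yuv420_frame_bytes(width, height) * frames


# Assumed clip parameters, not taken from the thread.
WIDTH, HEIGHT, FPS = 720, 480, 24

frame = yuv420_frame_bytes(WIDTH, HEIGHT)
mpeg2_pool = pool_bytes(WIDTH, HEIGHT, 3)    # 3 delayed frames
h264_pool = pool_bytes(WIDTH, HEIGHT, 16)    # 16 delayed frames
extra_copy_per_second = frame * FPS          # SHM -> video memory copy

print("one frame:        ", frame, "bytes")
print("MPEG2 frame pool: ", mpeg2_pool, "bytes")
print("H.264 frame pool: ", h264_pool, "bytes")
print("extra copy traffic:", extra_copy_per_second, "bytes/s")
```

Under these assumptions the H.264 pool is roughly 8 MB, which shows why "enough video memory for all delayed frames" is a real constraint on a small board.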

> With direct access to framebuffer, video decoding is very simple. The
> client just does color format conversion and then can easily draw
> subtitles over the image in the framebuffer.

Unless alpha blending of subtitles with the video frame is required, one
can simply draw the subtitle text directly in the X window used for video
and enable colour keying for the overlay.
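As an illustration of the colour keying idea, the sketch below emulates in software what the overlay hardware would do: wherever the X window contains the key colour, the video shows through, and everywhere else (for example subtitle text) the window contents win. The key colour and the pixel representation are hypothetical; real hardware does this per pixel during scan-out.

```python
# Hypothetical key colour; a real driver would let the client choose it.
COLOUR_KEY = (0, 255, 0)


def composite(graphics_row, video_row, key=COLOUR_KEY):
    """Return the visible row: video where graphics equals the key colour."""
    return [v if g == key else g for g, v in zip(graphics_row, video_row)]


# One row of the X window: key colour everywhere except a white subtitle pixel.
graphics = [COLOUR_KEY, (255, 255, 255), COLOUR_KEY]
video = [(10, 10, 10), (20, 20, 20), (30, 30, 30)]

result = composite(graphics, video)
print(result)
```

Note that this gives hard-edged text only, which matches the caveat above about alpha blending.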

> With XV everything gets more complex if we want to avoid any redundant
> memcpy operations to copy data around. The client needs to provide a
> ready frame with all the subtitles and OSD data drawn over it in a planar
> format to XV. But subtitles can be applied only to the frame which is
> already retired from video decoding pipeline and is not used as a
> reference frame for decoding next frames anymore. So if everything is
> implemented right, the frame is available with some delay which needs to
> be compensated and taken into account.

Any post-decode rendering into the video frames requires either an extra
copy or a delay. When the hardware supports the codec-native pixel format
and sufficient video memory is present, decoding directly to video memory
and rendering subtitles after a delay is the most efficient. XV does not
allow this, unfortunately.

If pixel format conversion is required, the player can simply render the
subtitles into the video frames after a safe delay before passing them
to XV, which will do the format conversion while writing the frame to
video memory. No extra copy needed. The delayed subtitle rendering
is trivial to implement.
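A minimal sketch of delayed subtitle rendering, assuming a fixed "safe delay" expressed in frames. The class name, the dictionary standing in for a frame, and the `overlay` field standing in for actual drawing are all hypothetical; this is not code from MPlayer or omapfbplay.

```python
from collections import deque


class DelayedSubtitleQueue:
    """Hold decoded frames untouched until they are safe to draw on."""

    def __init__(self, delay_frames):
        self.delay_frames = delay_frames
        self.pending = deque()

    def push(self, frame, subtitle):
        """Queue a decoded frame; return a frame ready for output, or None.

        Frames still in the queue are not modified, so the decoder may keep
        using them as reference frames.
        """
        self.pending.append((frame, subtitle))
        if len(self.pending) <= self.delay_frames:
            return None
        ready, text = self.pending.popleft()
        if text is not None:
            ready["overlay"] = text  # stands in for drawing the subtitle
        return ready


queue = DelayedSubtitleQueue(delay_frames=2)
out = [
    queue.push({"id": i, "overlay": None}, "hello" if i == 0 else None)
    for i in range(4)
]
print(out)
```

The frame returned by `push` is the one that would then be handed to XV, which performs the format conversion while writing it to video memory.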

> The overall style of your reply seems a bit strange:
>
> me: Implementing fast video output (on beagleboard) using XV is somewhat
> harder than just using direct framebuffer access, because you need to
> implement delayed subtitle rendering (handled by "direct rendering" in
> MPlayer) in order to avoid extra data copies.
> you: The delayed subtitle rendering is trivial to implement (plus some
> additional useful details about delayed subtitle rendering).
>
> Are you trying to question something?

Yes, I am questioning your claim of XV being unsuitable for the Beagle
board because using it optimally would be difficult. XV *is*
inefficient on hardware supporting planar YUV, but not when a
conversion is necessary.

> You have also taken an easy way with omapfbplay instead of investing
> effort in tweaking one of the full-fledged media players to get it
> to work well :wink:

I wrote omapfbplay purely for demo purposes. Furthermore, it achieves
better performance than is possible with mplayer due to the aggressive
buffering of decoded frames.

> Thanks anyway for the additional details. Gregoire and Kalle may
> find all this information useful and they are the ones who are
> *actually* working on "Improved MPlayer: At the performance level of
> Omapfbplay" and XV for beagleboard.

Why do you emphasise the word "actually"? Are you implying that I
ought to be doing more? Let me remind you that everything I do for
FFmpeg or the Beagle board is done in my spare time. I have no
obligations towards you or anybody else.

> As I mentioned before, this stuff is implemented in MPlayer using "direct
> rendering" method (-dr option), also see [1]. The problem is that the
> last time I checked it (admittedly long ago), direct rendering was not
> working well in MPlayer (including not making use of direct rendering for
> some codec/configuration combinations and rendering bugs with subtitles).
> Theoretically, everything should be fixable given enough effort. But in
> practice it may well be more complex than just going with direct
> framebuffer rendering hacks :slight_smile:

>> Am I misreading you, or is the above paragraph saying that XV is suboptimal
>> because mplayer has bugs when *not* using it?
>
> Yes, you are definitely misreading me.

What *are* you saying?

It's something new, but I'm not sure what the status is and whether
it'll materialize any time soon... I gave the conversion routine in XOmap
(to the quirky YUV format of blizzard, this driver started out on
N800...) a test, but couldn't get it to work. The whole thing sounded
so silly I tried simply converting planar formats to one of the
supported packed formats

This quirky YUV format is more tightly packed than YUY2 (12 bits per pixel vs.
16 bits per pixel).
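For a sense of scale, the arithmetic behind the 12 vs. 16 bits per pixel comparison can be sketched as follows. The 512x288 size is the clip resolution mentioned elsewhere in this thread; the function name is just for illustration.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one video frame in bytes for a given pixel depth."""
    return width * height * bits_per_pixel // 8


WIDTH, HEIGHT = 512, 288

twelve_bit = frame_bytes(WIDTH, HEIGHT, 12)   # the 12-bit YUV format
yuy2 = frame_bytes(WIDTH, HEIGHT, 16)         # packed YUY2
saving = yuy2 - twelve_bit

print("12 bpp frame:", twelve_bit, "bytes")
print("YUY2 frame:  ", yuy2, "bytes")
print("saved:       ", saving, "bytes per frame")
```

In other words, the 12-bit format moves a quarter less data per frame than YUY2.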

[snip justification]

Yes, I do know and agree that the 12bit format is a definite win :slight_smile:

The thing I deemed silly was all the comments in Xomap about serious
problems using it and the tricks to avoid them, plus the endianness
issue which makes the conversion code tricky to implement.
As said, I tried putting the conversion routine from Xomap there, but
couldn't get it to show a meaningful picture. So I thought that instead
of spending time fixing that and subjecting the driver to all the
problems mentioned in Xomap, I'd just convert to the other supported
formats as a short-term solution.

and it didn't bog down the performance
completely, even when written in C. And that's what happens on beagle
too. I don't own one, nor do I have too much free time at the office
so the non-N800 side hasn't been that actively developed by me...

I got 512x288@24fps running smoothly on N800 and didn't really miss
the extra performance of 12bit planar format or optimized color
conversion so I left it at that for the time being... :slight_smile:
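The planar-to-packed conversion described above can be sketched as below. This is an illustrative pure-Python version, assuming the source is planar YUV 4:2:0 (I420: full-size Y plane, quarter-size U and V planes) and the target is YUY2 (bytes ordered Y0 U Y1 V); it is not the code from the driver, and a real implementation would be written in C or NEON assembly.

```python
def i420_to_yuy2(y_plane, u_plane, v_plane, width, height):
    """Convert one planar I420 frame to packed YUY2 (Y0 U Y1 V per pixel pair).

    Each chroma sample covers a 2x2 block of pixels in I420, so every chroma
    row is reused for two output rows.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    out = bytearray(width * height * 2)
    half_width = width // 2
    o = 0
    for row in range(height):
        y_off = row * width
        c_off = (row // 2) * half_width
        for col in range(0, width, 2):
            c = c_off + col // 2
            out[o] = y_plane[y_off + col]
            out[o + 1] = u_plane[c]
            out[o + 2] = y_plane[y_off + col + 1]
            out[o + 3] = v_plane[c]
            o += 4
    return bytes(out)


# Tiny 2x2 frame: four luma samples sharing one U and one V sample.
packed = i420_to_yuy2(bytes([1, 2, 3, 4]), bytes([100]), bytes([200]), 2, 2)
print(list(packed))
```

The inner loop touches every pixel of every frame, which is why the cost of an unoptimized version matters for playback.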

The problem with the C implementation is heavy CPU usage. Even if it is able
to display static images in a synthetic test at a decent framerate, a video
player is a bit different.

Indeed, and that's why I used gst-launch as a testing tool and
Elephants Dream as the source material. It actually worked out really
well since the first 60 seconds of the clip contain a slow pan and a
high velocity action bit (i.e. exposing problems with smoothness and
decoding speed pretty clearly).

Now, I must admit that my "smooth" might not be the "smooth" of a True
HiFist :wink: but it should be ok for mundane people like me.

The same is of course true for beagleboard (so the above part is not
completely off-topic). XV really needs to get NEON-optimized color format
conversion code integrated.

This I agree with, and at no point did I mean to suggest that the C
implementation would be anything but a correctness test. IIRC it
wasn't fast enough for the Big Buck Bunny clip (480p) on beagle.

Otherwise it would remain completely
noncompetitive when compared to the media players which are using
direct framebuffer access with all the necessary optimizations.

This I don't agree with, since the competition (for me) isn't only
about getting the absolute best framerate. It's also about integration
and code reusability. It all depends on what your goals are, of
course.

>> It's something new, but I'm not sure what the status is and whether
>> it'll materialize any time soon... I gave the conversion routine in XOmap
>> (to the quirky YUV format of blizzard, this driver started out on
>> N800...) a test, but couldn't get it to work. The whole thing sounded
>> so silly I tried simply converting planar formats to one of the
>> supported packed formats
>
> This quirky YUV format is more tightly packed than YUY2 (12 bits per pixel
> vs. 16 bits per pixel).

[snip justification]

Yes, I do know and agree that the 12bit format is a definite win :slight_smile:

The thing I deemed silly was all the comments in Xomap about serious
problems using it and the tricks to avoid them, plus the endianness
issue which makes the conversion code tricky to implement.
As said, I tried putting the conversion routine from Xomap there, but
couldn't get it to show a meaningful picture. So I thought that instead
of spending time fixing that and subjecting the driver to all the
problems mentioned in Xomap, I'd just convert to the other supported
formats as a short-term solution.

Yes, it's a bit tricky to get this quirky YUV format working right. The N800
has the OMAP display controller and an external LCD controller chained
together. Scaling and YUV->RGB conversion for video can be done on either of
them. But this quirky format is supported on the external LCD controller only,
which introduces some difficulties. Everything is fine while the video overlay
is unobscured. But in order to display anything over it (a battery status
notification, for example), the video overlay gets migrated to the OMAP display
controller and starts using the YUY2 format. You can have a close look at the
video overlay migration code in Xomap.

In any case, I can understand that you don't see this task as high priority.

[...]

> The same is of course true for beagleboard (so the above part is not
> completely off-topic). XV really needs to get NEON-optimized color format
> conversion code integrated.

This I agree with, and at no point did I mean to suggest that the C
implementation would be anything but a correctness test. IIRC it
wasn't fast enough for the Big Buck Bunny clip (480p) on beagle.

> Otherwise it would remain completely
> noncompetitive when compared to the media players which are using
> direct framebuffer access with all the necessary optimizations.

This I don't agree with, since the competition (for me) isn't only
about getting the absolute best framerate. It's also about integration
and code reusability. It all depends on what your goals are, of
course.

I don't see a disagreement here. Everybody understands that the color format
conversion optimization needs to be added to XV eventually. The only question
is about the priority of this task.

I just got the impression that you are somewhat underestimating the
importance of having this optimization. Surely it is quite reasonable to
get the code working correctly first and apply optimizations to it a
bit later. But this color format conversion code could be the last straw
preventing smooth playback of some heavy video. And the players using direct
framebuffer access (the MPlayer patch discussed in this thread) will have a
clear advantage over anything else using non-optimized XV. That's what I mean
by it being noncompetitive.

You can and probably should aim for both integration/reusability/whatever and
low CPU usage (and that's actually not difficult, as the NEON-optimized color
format conversion code already exists).

Siarhei Siamashka <siarhei.siamashka@gmail.com> writes:
>> Siarhei Siamashka wrote:

[...]

> The overall style of your reply seems a bit strange:
>
> me: Implementing fast video output (on beagleboard) using XV is somewhat
> harder than just using direct framebuffer access, because you need to
> implement delayed subtitle rendering (handled by "direct rendering" in
> MPlayer) in order to avoid extra data copies.
> you: The delayed subtitle rendering is trivial to implement (plus some
> additional useful details about delayed subtitle rendering).
>
> Are you trying to question something?

Yes, I am questioning your claim of XV being unsuitable for the Beagle
board because using it optimally would be difficult.

Which claim? Please provide a relevant quote (the one where I supposedly say
something about XV being "unsuitable", "impossible" or whatever) or stop
trolling.

I only mentioned that using XV efficiently (in the beagleboard port of MPlayer,
as implied by this topic) is more complex than just using the framebuffer, but
surely possible. And I'm well aware of what needs to be done in order to
achieve this. That's all.

XV *is* inefficient on hardware supporting planar YUV, but not when a
conversion is necessary.

That is certainly not news to me. Just for your information, it was me who
first mentioned this fact about XV performance in this thread:
http://groups.google.com/group/beagleboard/msg/571cae7993c95f6a?hl=en

> You have also taken an easy way with omapfbplay instead of investing
> effort in tweaking one of the full-fledged media players to get it
> to work well :wink:

I wrote omapfbplay purely for demo purposes.

That is what I call "taking an easy way". Integrating such stuff into a real
media player (MPlayer) is a more complex practical task, as one needs to work
with an arguably messy codebase, solve the technical issue itself, and get
through the real challenge of having to please the maintainers.

Furthermore, it achieves
better performance than is possible with mplayer due to the aggressive
buffering of decoded frames.

Strictly speaking, this is not quite correct. First, it does not provide
better performance on average. Whenever video decoding in MPlayer is late
(can't keep synchronized with audio), it tries to catch up by leaving no delay
between frames and fully utilizing the CPU. If video is too late and exceeds a
certain limit, framedropping comes into action. So what you get is a more
consistent framerate and better a/v sync, but not exactly better performance.
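The catch-up and framedropping behaviour described above can be sketched as a simple scheduling decision. This is only an illustration of the idea, not MPlayer's actual code: the function name, the return values and the 0.1 second drop threshold are all hypothetical.

```python
def schedule_frame(video_pts, audio_clock, drop_threshold=0.1):
    """Decide what to do with a decoded frame, given the audio clock.

    Returns a tuple (action, sleep_seconds):
      "wait"     - frame is early or on time; sleep until it is due
      "show_now" - frame is late; show it immediately, with no delay
      "drop"     - frame is too late; skip it to regain a/v sync
    All times are in seconds. The threshold is a hypothetical value.
    """
    lateness = audio_clock - video_pts
    if lateness <= 0:
        return ("wait", -lateness)
    if lateness > drop_threshold:
        return ("drop", 0.0)
    return ("show_now", 0.0)


print(schedule_frame(1.0, 0.75))    # video is ahead of audio
print(schedule_frame(1.0, 1.0625))  # slightly late: catch up
print(schedule_frame(1.0, 1.5))     # far too late: drop the frame
```

With a deep queue of already decoded frames, the "show_now" and "drop" cases occur less often, which is where the more consistent framerate comes from.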

Second, regarding this being impossible with MPlayer. See
http://mplayerxp.sourceforge.net/ (more specifically, you can look
into their FAQ, "Howto to improve quality of playback with MPlayerXP"
section). But because MPlayer core developers are generally as "friendly"
as the FFmpeg ones, none of these improvements got into the official MPlayer
tree.

I also considered trying to use this MPlayer fork and build a package based on
it for maemo long ago, but did not find it worth the effort in the end
(but maybe it was a good idea after all).

> Thanks anyway for the additional details. Gregoire and Kalle may
> find all this information useful and they are the ones who are
> *actually* working on "Improved MPlayer: At the performance level of
> Omapfbplay" and XV for beagleboard.

Why do you emphasise the word "actually"?

You see, Gregoire started this thread here. Probably he considered us both
as experts in the area of video and multimedia for OMAP devices and added
us to CC with the hope that we may add some useful comments.

You are apparently not happy for some reason and try really hard
to "misunderstand" me, though I don't see any contradiction regarding
"delayed subtitles rendering". You know, the core FFmpeg developers
are not the bearers of the sacred knowledge or something. We, mere
mortals, can understand MPlayer and FFmpeg code pretty well too. That
was sarcasm by the way.

Nevertheless, whatever you reply to me, trying to argue or flame is pointless,
because I'm not working on the beagleboard MPlayer or XV myself.

Are you implying that I ought to be doing more?

Not really.

Let me remind you that everything I do for FFmpeg or the Beagle board is
done in my spare time. I have no obligations towards you or anybody else.

So what? Whatever I post or contribute using my private gmail address is also
done in my spare time.

And it's not quite relevant to this discussion, but I dare remind you that you
actually *do* have some obligations now as the ARM port maintainer of FFmpeg.
That's the responsibility you voluntarily took upon yourself not so long
ago...

>> > As I mentioned before, this stuff is implemented in MPlayer using
>> > "direct rendering" method (-dr option), also see [1]. The problem is
>> > that the last time I checked it (admittedly long ago), direct
>> > rendering was not working well in MPlayer (including not making use of
>> > direct rendering for some codec/configuration combinations and
>> > rendering bugs with subtitles). Theoretically, everything should be
>> > fixable given enough effort. But in practice it may well be
>> > more complex than just going with direct framebuffer rendering hacks
>> > :slight_smile:
>>
>> Am I misreading you, or is the above paragraph saying that XV is
>> suboptimal because mplayer has bugs when *not* using it?
>
> Yes, you are definitely misreading me.

What *are* you saying?

Just try to read the following, paying a bit more attention:

1. MPlayer implements a "direct rendering" method, which is specifically used
to avoid excessive data copies and so improve performance. It is expected to
provide delayed subtitle rendering.
2. The implementation of "direct rendering" in MPlayer is not very good, and
it is not even enabled by default.
3. More specifically, the first problem is that sometimes "direct rendering"
is internally disabled and naturally does not provide expected speedup (when
used with H264 for example)
4. The other problem is that even when "direct rendering" works, it may
sometimes corrupt subtitles or OSD. I just checked and don't see it anymore in
the latest MPlayer (it was a lot worse 1 or 2 years ago). Anyway, this problem
is still mentioned in MPlayer man page.
5. It is definitely possible to fix "direct rendering" in MPlayer, but people
seem to prefer direct framebuffer access hacks as they are much easier
to implement.

Also don't forget the subject of this topic (that's why mplayer bugs are
relevant here). Anything else you want to know?

PS. If you really want to flame, it is better to continue this on IRC :slight_smile: