Experimental ffmpeg-vaapi plugin

w23

New Member
So I really wanted to stream Clustertruck in 1440p60, so I spent a whole weekend reading ffmpeg sources instead.
As a result, there's a very naive ffmpeg-vaapi plugin (basically a copy of ffmpeg-nvenc with VAAPI-specific hw frame upload added) in the obs-ffmpeg module in this branch: https://github.com/w23/obs-studio/tree/ffmpeg-vaapi
The exact commit adding it: https://github.com/w23/obs-studio/commit/9c70ee2347285c4d7e087106c565ba5b5bbe16a6

It is in a rather early stage:
  • no GUI controls
  • display/device name is hardcoded to ":0" (see the sketch after this list)
  • memory management/leaks were not considered at all
  • actual performance wasn't measured
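
About the hardcoded display: here's a minimal sketch of how I imagine making it configurable. This is not what the plugin does today, just an assumption built on FFmpeg's hwcontext API, so that either an X11 display name like ":0" or a DRM render node like "/dev/dri/renderD128" could be passed in:
Code:
#include <stdio.h>
#include <libavutil/error.h>
#include <libavutil/hwcontext.h>

/* Open a VAAPI device from a user-supplied string (hypothetical helper). */
static AVBufferRef *create_vaapi_device(const char *device)
{
    AVBufferRef *hw_device = NULL;
    int err = av_hwdevice_ctx_create(&hw_device, AV_HWDEVICE_TYPE_VAAPI,
                                     device, NULL, 0);
    if (err < 0) {
        char msg[AV_ERROR_MAX_STRING_SIZE];
        av_strerror(err, msg, sizeof(msg));
        fprintf(stderr, "failed to open VAAPI device '%s': %s\n", device, msg);
        return NULL;
    }
    /* The caller owns the reference; release with av_buffer_unref(). */
    return hw_device;
}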

I'd appreciate any kind of feedback if anyone is interested.

My experience playing with it for a few hours so far:
  1. FFmpeg@master, Mesa 13.0.3, VA-API 0.39 (libva 1.7.3), kernel 4.8.12 on an AMD Radeon R9 Fury X: there are weird issues with the video it produces. Basically, ffmpeg+h264_vaapi emits packets very rarely, only a couple per second. That could be fine by itself, but this packet rate equals the apparent framerate of the produced video (1440p2 is not what I wanted!). Also, I don't visually see any P-frames at all. Tuning gop_size or other parameters doesn't improve the packet rate. This hardware also requires b-frames to be set to zero and the VAAPI_DISABLE_INTERLACE=1 environment variable to be set.
  2. On other hardware (an Intel iGPU in a Dell XPS 13 from 2015) that I've tested very briefly, the stream is also low-fps and jittery, but not as badly (likely VAAPI is fine and the machine itself is just rather weak). And P-frames are clearly visible.

The thing is that this jitter is not OBS-specific. For example, running a screen capture with ffmpeg itself produces the same (if not worse) result:
Code:
ffmpeg \
    -loglevel debug \
    -f x11grab -video_size 2560x1440 -framerate 60 -x 1920 -i :0.0 \
    -vaapi_device ":0" \
    -vf 'format=nv12,hwupload' -map 0:0 -threads 8 -aspect 16:9 -y -f mp4 \
    -bf 0 -qp 42 -quality 8 \
    -vcodec h264_vaapi -profile 100 \
    test-vaapi.mp4

Testing instructions.
0. libva is obviously needed.
1. A rather fresh ffmpeg with h264_vaapi encoding support is required. I took the latest master (and I probably shouldn't have done that! It would be ironic if the issues above are due to a dev-unstable ffmpeg).
If you need to compile ffmpeg yourself, you'd need at least the following options for its ./configure:
Code:
--enable-shared --enable-pic --disable-static \
--enable-hwaccel=h264_vaapi \
--enable-filter=hwupload,scale \
--enable-encoder=h264_vaapi,aac \
--enable-muxer=h264,mp4,flv,md5 \
--enable-protocol=file,rtmp \
--enable-decoder=rawvideo

Also for cmdline ffmpeg testing:
Code:
--enable-indev=v4l2,x11grab_xcb,xcbgrab \
--enable-parser=mjpeg \
--enable-decoder=mjpeg

Also also: don't forget to set the envvar PKG_CONFIG_PATH=<where-you-installed-ffmpeg>/lib/pkgconfig before you run cmake on OBS.
2. Build OBS and run it as usual. Go to the advanced output settings and pick the VAAPI encoder.
 
Last edited:

Lain

Forum Admin
Lain
Forum Moderator
Developer
Just popping in to say that it's awesome that you wrote this. Unfortunately, my dedicated Linux machine is down at the moment, so I can't test, but I'm going to have some other people try this out in the meantime and see how it runs for them on different hardware.
 

w23

New Member
Thanks! I have spent a lot more time on this HW-encoding-on-Linux problem. Here's what I found.

TL;DR: I could get AMD GPU to encode h264 only using gstreamer-vaapi.

Mesa (as of 13.0.3) does support hardware h264 encoding on AMD GPUs. However, there are limitations: the vaDeriveImage() function always fails, and there is no support for B-frames or packed headers. These are pretty much hardcoded in the Mesa VAAPI state tracker, so the hardware is not even asked about its capabilities. I have no idea what the lack of packed headers implies (I haven't read the MPEG-4 AVC spec yet), and I'm also not sure about the implications of not having B-frames for streaming games. No vaDeriveImage is also not fatal, but it means that we can't directly map GPU memory, so there's a performance loss from yet another memcpy (and a more complicated codepath).
And another thing: there's a bug where the AMD driver treats everything as interlaced (despite what it's told via the libva API), so one should always have VAAPI_DISABLE_INTERLACE=1 in the environment.
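
For reference, this is roughly how a client like FFmpeg can ask the driver what it actually advertises through libva. It's a sketch I wrote for illustration (not code from the plugin); it assumes a VADisplay was already set up with vaGetDisplay()/vaInitialize(), and it reads the B-frame limit out of the upper half of the EncMaxRefFrames attribute:
Code:
#include <stdio.h>
#include <va/va.h>

static void probe_h264_encode_caps(VADisplay dpy)
{
    VAConfigAttrib attribs[2] = {
        { .type = VAConfigAttribEncPackedHeaders },
        { .type = VAConfigAttribEncMaxRefFrames },
    };

    /* Ask the driver to fill in .value for the H.264 encode entrypoint. */
    VAStatus st = vaGetConfigAttributes(dpy, VAProfileH264High,
                                        VAEntrypointEncSlice, attribs, 2);
    if (st != VA_STATUS_SUCCESS) {
        fprintf(stderr, "vaGetConfigAttributes: %s\n", vaErrorStr(st));
        return;
    }

    /* Packed headers: a bitmask of VA_ENC_PACKED_HEADER_* flags, 0 = none. */
    if (attribs[0].value == VA_ATTRIB_NOT_SUPPORTED || attribs[0].value == 0)
        printf("driver accepts no packed headers\n");

    /* Low 16 bits = max list0 refs, high 16 bits = list1 refs (B-frames). */
    if (attribs[1].value == VA_ATTRIB_NOT_SUPPORTED ||
        ((attribs[1].value >> 16) & 0xffff) == 0)
        printf("no B-frame support, keep b-frames at 0\n");
}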

I couldn't make any version of FFmpeg use VAAPI correctly. The release (3.2.2) version just crashes. Current master complains that packed headers aren't there and produces the low-framerate output I talked about above.

The official libva tests/samples also don't work, because they expect vaDeriveImage to work.

There was already another VAAPI plugin for OBS made a few years ago: https://github.com/reboot/obs-studio/tree/vaapi-h264/plugins/linux-vaapi. Making it compile and load with contemporary OBS is trivial, but I couldn't make it work on my driver: it expects h264 packed headers, which aren't there.

The only thing that does work, and seems to work sufficiently well, is gstreamer-vaapi. This command produces a valid 1440p60 video while using under 15% CPU:
Code:
gst-launch-1.0 -e ximagesrc display-name=:0 use-damage=0 startx=1920 starty=0 endx=$((1920+2560-1)) endy=1439 !\
    multiqueue ! video/x-raw,format=BGRx,framerate=60/1 ! videoconvert ! video/x-raw,format=I420,framerate=60/1 !\
    multiqueue ! vaapih264enc dct8x8=true ! h264parse ! multiqueue ! matroskamux name=muxer muxer. ! progressreport name=Rec_time !\
    filesink location=/tmp/gstreamer-video.mkv
However:
- I haven't tried to use it for longer than a few minutes.
- Capturing frames still interferes with games. E.g. Clustertruck (which triggered this whole endeavor!) still drops frames when capturing. This needs to be profiled.

VAAPI seems to have no way of accessing the hardware framebuffer. It can use a GLX context and texture for output, but not for input. Maybe it's possible to use lower-level DRI2 APIs to do something like that, but this is way beyond my immediate capabilities.

Or maybe it would be possible to write a special Xorg compositor that could capture frames at a lower level and more efficiently. I have no idea.

So, a conclusion:
1. The only way forward for me is to make yet another VAAPI plugin for OBS, this time based on gstreamer. From the looks of it, gstreamer seems to be documented and sane (yes, I am looking at you, FFmpeg), so maybe this or next weekend I will come up with something (a rough sketch of the idea follows this list).
2. I need to profile the hell out of all this if I ever want to share with my friends how bad I am at Clustertruck.
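
To make item 1 a bit more concrete, here's a very rough sketch of the direction I have in mind (nothing I've actually built or tested yet; the pipeline string, element names and frame size are just assumptions): push raw frames into vaapih264enc through appsrc and pull the encoded packets back out through appsink.
Code:
#include <gst/gst.h>
#include <gst/app/gstappsrc.h>
#include <gst/app/gstappsink.h>

int main(int argc, char **argv)
{
    gst_init(&argc, &argv);

    GError *err = NULL;
    GstElement *pipe = gst_parse_launch(
        "appsrc name=src is-live=true format=time "
        "caps=video/x-raw,format=I420,width=2560,height=1440,framerate=60/1 "
        "! vaapih264enc ! h264parse ! appsink name=sink", &err);
    if (!pipe) {
        g_printerr("pipeline error: %s\n", err->message);
        return 1;
    }

    GstElement *src  = gst_bin_get_by_name(GST_BIN(pipe), "src");
    GstElement *sink = gst_bin_get_by_name(GST_BIN(pipe), "sink");
    gst_element_set_state(pipe, GST_STATE_PLAYING);

    /* Push a single dummy I420 frame (2560*1440*3/2 bytes of zeros). */
    GstBuffer *buf = gst_buffer_new_allocate(NULL, 2560 * 1440 * 3 / 2, NULL);
    GST_BUFFER_PTS(buf) = 0;
    GST_BUFFER_DURATION(buf) = gst_util_uint64_scale(1, GST_SECOND, 60);
    gst_app_src_push_buffer(GST_APP_SRC(src), buf);
    gst_app_src_end_of_stream(GST_APP_SRC(src));

    /* Pull whatever the encoder produced until EOS. */
    GstSample *sample;
    while ((sample = gst_app_sink_pull_sample(GST_APP_SINK(sink)))) {
        GstBuffer *out = gst_sample_get_buffer(sample);
        g_print("encoded packet: %" G_GSIZE_FORMAT " bytes\n",
                gst_buffer_get_size(out));
        gst_sample_unref(sample);
    }

    gst_element_set_state(pipe, GST_STATE_NULL);
    gst_object_unref(src);
    gst_object_unref(sink);
    gst_object_unref(pipe);
    return 0;
}
In a real OBS encoder plugin the frames would of course come from OBS's raw video callbacks instead of a dummy buffer.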
 

ZombieMeat

New Member
You are the best! Just created an account to tell you that.

I've been able to test it out on a Dell XPS 13 (Kaby Lake), with a few extra lines added so the per-encoder options can be tweaked.

I would say my experience has been good, although I haven't used it for a long period. The only thing of note is that a higher value for "quality" means faster encoding at lower quality, per "ffmpeg -h encoder=h264_vaapi". And most values are not even supported on the Intel GPU: only 0 and 1 work, while other values cause crashes.
 

Attachments

  • vaapi-options.txt
    2.9 KB · Views: 405

beniwtv

New Member
Nice work! Going to try that out this weekend - I've been hoping someone would implement this!
I'm going to try with an AMD RX 480 8GB - Mesa 13.0.4.
 

ZombieMeat

New Member
Been playing around a bit. It seems like the most significant parameter is QP. Also, specifying a bitrate made FFmpeg behave a bit funky: it forces the bitrate to stay at the specified value even when it doesn't need to. Moreover, when the image changes drastically it needs more bits to encode, so the bitrate rises and the buffer overflows, which seems to degrade the encoding quality quite a bit.

I'm all new to this, so I'm just conjecturing, but instead of setting a bitrate, what if we only set the buffer size to reduce the chance of overflow? Right now the QP and buffer-size combination needs to be tested empirically.
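
If it helps anyone poking at the plugin code, this is roughly what that idea looks like against FFmpeg's API. Just a sketch of my conjecture, not something taken from the plugin; "qp" is h264_vaapi's private constant-QP option, and whether the driver still honours the buffer size in constant-QP mode is exactly the thing that needs testing:
Code:
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>

/* "enc" is the AVCodecContext allocated for the h264_vaapi encoder. */
static void configure_qp_only(AVCodecContext *enc, int qp, int vbv_bits)
{
    enc->bit_rate       = 0;        /* no target bitrate at all */
    enc->rc_buffer_size = vbv_bits; /* only cap the VBV buffer, in bits */
    enc->max_b_frames   = 0;        /* Mesa's encoder offers no B-frames */

    /* Constant QP via the encoder's private option. */
    av_opt_set_int(enc->priv_data, "qp", qp, 0);
}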

In case anyone is interested, I made a PKGBUILD (Arch Linux) for my test setup.
 

Attachments

  • obs-vaapi.zip
    6.3 KB · Views: 227

Xaymar

Active Member
The easiest way to explain that is to know that the VCE part responds to certain parameters. There are several which affect what values are picked on the hardware, but let's go with the absolute default one (Usage: Transcoding). Since I don't know what Mesa's VAAPI integration actually sets, this is mostly from actual usage on Windows, which should be identical since, in my experience, it maps almost directly to the hardware (plus some GPU transfer/conversion stuff).

There are three main Rate Control Methods that VCE has: Constant QP, Constant Bitrate and Variable Bitrate. Variable Bitrate has a Peak Constrained and a Latency Constrained version (the latter is great for recording with no impact, the former is great for actual quality). Constant Bitrate is the only one of these where VCE uses Filler Data, though normally only if enabled. If Constant Bitrate is used without Filler Data, it behaves like Peak Constrained Variable Bitrate, except that the Target Bitrate acts as the Peak Bitrate and the Peak Bitrate setting is ignored.

So in order to actually get Variable Bitrate behaviour, FFmpeg would need to be configured to use VBR mode. I'm not sure if Mesa's VAAPI exposes this.

As for the Buffer Size (VBV Buffer Size): if you want your bitrate to be matched closely, you'd want a value between 1/FPS*Bitrate and 8/FPS*Bitrate. The lower you go, the less space an individual packet can take up (which directly affects I-, P- and B-frame quality).
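
To make the numbers concrete, a quick example with made-up values (6 Mbit/s at 60 FPS; the 1 to 8 frame window is my rule of thumb, not a hard constant):
Code:
#include <stdio.h>

int main(void)
{
    const long long bitrate = 6000000; /* 6 Mbit/s target, example value */
    const int fps = 60;

    long long vbv_min = bitrate / fps;     /* ~1 frame's worth: 100000 bits */
    long long vbv_max = 8 * bitrate / fps; /* ~8 frames' worth: 800000 bits */

    printf("VBV buffer between %lld and %lld bits\n", vbv_min, vbv_max);
    return 0;
}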
 

beniwtv

New Member
So I got around to testing this now, and after spending the whole day on it I have to report success!

I used the PKGBUILD files that @ZombieMeat provided above - although adapted for Docker and Ubuntu.

With FFmpeg 3.2.2 it crashes - just like @w23 reported. It also complains about no B-frames and crashes even if you set them to 0.
With FFmpeg master from today (5 Feb 2017), it no longer complains about B-frames and does not crash.

Actually, it seems to encode just fine - and CPU usage does stay low - the same as if you're not encoding :)
I did notice some text artifacts - but I have not yet played around with the options provided (I just left them at their defaults).

So my final configuration was:
MESA 17.0.1-devel from Padoka ppa
Docker using Ubuntu 16.04 as base
AMD RX 480

I have attached my Docker files in case anyone wants to have a quick way of testing :)
NOTE: The start script is currently hard-coded for display :0 and UID 1000

EDIT: Made a video
https://www.youtube.com/watch?v=m8OBFLaNl5Q
 

Attachments

  • OBS.zip
    6.9 KB · Views: 179
Last edited:

w23

New Member
My apologies for the long absence and lack of progress on my plugin.
The thing is, I figured out that VAAPI doesn't help with the capture performance problems I have on my system; the actual bottleneck is somewhere else (likely XSHM). Therefore, I don't think I will be making any progress here soon. If anyone wants to take the plugin and make it production-ready, be my guest. I believe the only major thing left to do is to add proper GUI controls. The license is whatever license OBS is under.

I want to do thorough Linux screen-capture performance research (including things like custom compositors, Wayland and friends) in the coming months. But I cannot promise anything, as there is just too much stuff on my plate already.
 
Tried to compile, but it fails with the following error:
Code:
/home/user/Downloads/obs-studio-vaapi-h264/plugins/linux-vaapi/surface-queue.h:3:19: fatal error: va/va.h: No such file or directory
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

This is on Ubuntu 16.04 with ffmpeg 3.2.4.

Fixed: I forgot to install libva-dev.
 
Last edited:

Bleuzen

New Member
Works very well on Arch with an i7-7700 iGPU (Intel® HD Graphics 630). I just had to change the display/device name from ":0" to "/dev/dri/renderD128".
Thanks! Now I can record/stream without much CPU usage.
Please implement this in OBS (with a GUI) ... that would be great ;D
 
Going to try this out with Ubuntu 16.10 tonight. It has to be better than the horrible experience I had trying to get QSV support working. At least this doesn't require compiling a new kernel.
 
I managed to get the reboot/vaapi-h264 version working with the master branch of obs-studio. No changes were necessary to get it to compile besides what was in his base branch. It would be fantastic if somebody could get this incorporated into the main obs-studio code base. I can create a branch and pull request, but we really should contact Reboot to get permission to include his work in the main distribution.

My CPU usage went from 19-20 percent down to between 5% and 8% using the VAAPI-H264 encoder settings.

Update: I created a simple blog entry with a link to Reboot's code working with the current master branch (19.x), in case anybody is interested. No problems merging the code in and getting it to work. Here is the link.

https://wordpress.com/post/intellectualcramps.wordpress.com/1151
 
Last edited:

Arjen

New Member
I tried the merged git repo, and it all seems to go quite well until I press 'stop recording'. The terminal then shows the following error message:

Code:
info: [ffmpeg muxer: 'adv_file_output'] Writing file '/home/arjen/Videos/2017-06-16_16-09-32.mkv'...
error: [VAAPI encoder]: "vaEndPicture(q->display, q->context)": invalid parameter
error: [VAAPI encoder]: unable to encode frame
error: Error encoding with encoder 'streaming_h264'

My vainfo output is as follows:

Code:
vainfo: VA-API version: 0.40 (libva )
vainfo: Driver version: Intel i965 driver for Intel(R) Haswell Mobile - 1.8.2
vainfo: Supported profile and entrypoints
  VAProfileMPEG2Simple  :   VAEntrypointVLD
  VAProfileMPEG2Simple  :   VAEntrypointEncSlice
  VAProfileMPEG2Main  :   VAEntrypointVLD
  VAProfileMPEG2Main  :   VAEntrypointEncSlice
  VAProfileH264ConstrainedBaseline:   VAEntrypointVLD
  VAProfileH264ConstrainedBaseline:   VAEntrypointEncSlice
  VAProfileH264Main  :   VAEntrypointVLD
  VAProfileH264Main  :   VAEntrypointEncSlice
  VAProfileH264High  :   VAEntrypointVLD
  VAProfileH264High  :   VAEntrypointEncSlice
  VAProfileH264MultiviewHigh  :   VAEntrypointVLD
  VAProfileH264MultiviewHigh  :   VAEntrypointEncSlice
  VAProfileH264StereoHigh  :   VAEntrypointVLD
  VAProfileH264StereoHigh  :   VAEntrypointEncSlice
  VAProfileVC1Simple  :   VAEntrypointVLD
  VAProfileVC1Main  :   VAEntrypointVLD
  VAProfileVC1Advanced  :   VAEntrypointVLD
  VAProfileNone  :   VAEntrypointVideoProc
  VAProfileJPEGBaseline  :   VAEntrypointVLD

Any ideas?
 

cRaZy-bisCuiT

New Member
David Carver said:
Update: I created a simple blog entry with a link to Reboot's code working with the current master branch (19.x), in case anybody is interested. No problems merging the code in and getting it to work. Here is the link.

https://wordpress.com/post/intellectualcramps.wordpress.com/1151
Unfortunately I can't read that blog entry: Wordpress asks me for my login credentials. Could you check that address please?

Also, has someone else managed to merge the patch with the current master of obs? Is there a tutorial somewhere? Thanks to all girls & guys participating here! :)
 

RytoEX

Forum Admin
Forum Moderator
Developer
cRaZy-bisCuiT said:
Unfortunately I can't read that blog entry: Wordpress asks me for my login credentials. Could you check that address please?

Also, has someone else managed to merge the patch with the current master of obs? Is there a tutorial somewhere? Thanks to all girls & guys participating here! :)

I assume @David Carver meant this URL: https://intellectualcramps.wordpress.com/2017/06/08/obs-studio-and-hardware-encoding-for-linux/

There is a pull request on GitHub, which I was able to compile successfully in a VM, but I didn't test it extensively. As far as I know, it's currently waiting on some pretty substantial rewrites.
 