Vulkan video shenanigans – FFmpeg + RADV integration experiments – Maister’s Graphics Adventures

Vulkan video is finally here and it's a fierce battle to get things working fully. The leaders of the pack right now with the full release are RADV (Dave Airlie) and FFmpeg (Lynne).
In Granite, I've been wanting a solid GPU video decoding solution, and I figured I'd work on a Vulkan video implementation over the holidays to try to help iron out any kinks with real-world application integration. The goal was to achieve everything a 3D engine could possibly want out of video decode:
- Hardware accelerated
- GPU decode to RGB without a round-trip through system memory (with optional mip generation when placed in a 3D world)
- Audio decode
- A/V synchronization
This blog post is mostly here to demonstrate the progress in FFmpeg + RADV. I made a neat little sample app that fully uses Vulkan video to render a simple Sponza cinema. It supports A/V sync and seeking, which covers most of what a real media player would need. Ideally, this can be used as a test bench.
Place a video feed as a 3D object inside Sponza, why not?
Introduction blog post – read this first
This blog post by Lynne summarizes the state of Vulkan video at the time it was written. Note that none of this is merged upstream as of writing, and APIs are changing rapidly.
Building FFmpeg + RADV + Granite
FFmpeg
Make sure to install the very latest Vulkan headers. On Arch Linux, install vulkan-headers-git from AUR, for example.
Check out the branch in the blog post and build. Make sure to install it in some throwaway prefix, e.g.
./configure --disable-doc --disable-shared --enable-static --disable-ffplay --disable-ffprobe --enable-vulkan --prefix=$HOME/ffmpeg-vulkan
Mesa
Check out https://gitlab.freedesktop.org/airlied/mesa/-/commits/radv-vulkan-video-decode, then build with:
mkdir build
cd build
meson setup .. -Dvideo-codecs=h264dec,h265dec --buildtype release
ninja
Granite
git clone https://github.com/Themaister/Granite
cd Granite
git submodule update --init
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DGRANITE_FFMPEG=ON -DGRANITE_AUDIO=ON -DGRANITE_FFMPEG_VULKAN=ON -G Ninja -DCMAKE_PREFIX_PATH=$HOME/ffmpeg-vulkan
ninja video-player
Running the test app
Basic operation: a weird video player where the image is a flat 3D object floating in space. For fun, the video is also mip-mapped and the plane is anisotropically filtered, because why not.
RADV_PERFTEST=video_decode GRANITE_FFMPEG_VULKAN=1 ./tests/video-player /tmp/test.mkv
Controls
- WASD: move camera
- Arrow keys: rotate camera
- Space: Toggle pause
- HJKL: Vim style for seeking
If you have https://github.com/KhronosGroup/glTF-Sample-Models checked out you can add a glTF scene as well for fun. I hacked it together with Sponza in mind, so:
RADV_PERFTEST=video_decode GRANITE_FFMPEG_VULKAN=1 ./tests/video-player $HOME/git/glTF-Sample-Models/2.0/Sponza/glTF/Sponza.gltf /tmp/test.mkv
and then you get the screenshot above with whatever video you're using.
Integration API
The Granite implementation can be found in https://github.com/Themaister/Granite/blob/master/video/ffmpeg_decode.cpp. It will probably look different in the final upstreamed version, so beware. I'm not an FFmpeg developer either, FWIW, so take this implementation with a few grains of salt.
To integrate with Vulkan video, there are some steps we need to take. This assumes some familiarity with FFmpeg APIs. This is mostly interesting for non-FFmpeg developers. I had to figure this out with help from Lynne, by spelunking in mpv, and by looking over the hardware decode samples in FFmpeg upstream.
Creating shared device
Before opening the decode context with:
avcodec_open2(ctx, codec, nullptr)
we'll provide libavcodec with a hardware device context. Use
avcodec_get_hw_config(codec, index)
to scan through until you find a Vulkan configuration.
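As a minimal sketch, that scan could look like the following; hw.config is simply a member where the chosen configuration is stashed (it is referenced again later in get_format()):
// Minimal sketch: walk the codec's hardware configs until we hit Vulkan.
for (int i = 0; ; i++)
{
	const AVCodecHWConfig *config = avcodec_get_hw_config(codec, i);
	if (!config)
		break; // Codec has no Vulkan support.

	if (config->device_type == AV_HWDEVICE_TYPE_VULKAN &&
	    (config->methods & AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX) != 0)
	{
		hw.config = config; // Remember it for get_format() later.
		break;
	}
}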
AVBufferRef *hw_dev = av_hwdevice_ctx_alloc(config->device_type);
auto *hwctx = reinterpret_cast<AVHWDeviceContext *>(hw_dev->data);
auto *vk = static_cast<AVVulkanDeviceContext *>(hwctx->hwctx);
hwctx->user_opaque = this; // For callbacks later.
To interoperate with FFmpeg, we have to provide it our own Vulkan device and plenty of information about how we created the device.
vk->get_proc_addr = Vulkan::Context::get_instance_proc_addr();
vk->inst = device->get_instance();
vk->act_dev = device->get_device();
vk->phys_dev = device->get_physical_device();
vk->device_features = *device->get_device_features().pdf2;
vk->enabled_inst_extensions = device->get_device_features().instance_extensions;
vk->nb_enabled_inst_extensions = int(device->get_device_features().num_instance_extensions);
vk->enabled_dev_extensions = device->get_device_features().device_extensions;
vk->nb_enabled_dev_extensions = int(device->get_device_features().num_device_extensions);
Fortunately, I had most of this query scaffolding in place for Fossilize integration already. Vulkan 1.3 core is required here as well, so I had to bump that too when Vulkan video is enabled.
auto &q = device->get_queue_info();

vk->queue_family_index = int(q.family_indices[Vulkan::QUEUE_INDEX_GRAPHICS]);
vk->queue_family_comp_index = int(q.family_indices[Vulkan::QUEUE_INDEX_COMPUTE]);
vk->queue_family_tx_index = int(q.family_indices[Vulkan::QUEUE_INDEX_TRANSFER]);
vk->queue_family_decode_index = int(q.family_indices[Vulkan::QUEUE_INDEX_VIDEO_DECODE]);

vk->nb_graphics_queues = int(q.counts[Vulkan::QUEUE_INDEX_GRAPHICS]);
vk->nb_comp_queues = int(q.counts[Vulkan::QUEUE_INDEX_COMPUTE]);
vk->nb_tx_queues = int(q.counts[Vulkan::QUEUE_INDEX_TRANSFER]);
vk->nb_decode_queues = int(q.counts[Vulkan::QUEUE_INDEX_VIDEO_DECODE]);

vk->queue_family_encode_index = -1;
vk->nb_encode_queues = 0;
We need to let FFmpeg know how it can query queues. This is a close match with Granite, but I had to add some extra APIs to make it work.
We also need a way to lock Vulkan queues:
vk->lock_queue = [](AVHWDeviceContext *ctx, int, int) {
	auto *self = static_cast<Impl *>(ctx->user_opaque);
	self->device->external_queue_lock();
};

vk->unlock_queue = [](AVHWDeviceContext *ctx, int, int) {
	auto *self = static_cast<Impl *>(ctx->user_opaque);
	self->device->external_queue_unlock();
};
For integration purposes, not making vkQueueSubmit internally synchronized in Vulkan was a mistake, I think. Oh well.
Once we've created a hardware context, we can let the codec context borrow it:
av_hwdevice_ctx_init(hw_dev); // Check error.
hw.device = hw_dev; // Unref later.
ctx->hw_device_ctx = av_buffer_ref(hw.device);
We also have to override get_format() and return the hardware pixel format.
ctx->opaque = this;
ctx->get_format = [](AVCodecContext *ctx, const enum AVPixelFormat *pix_fmts) -> AVPixelFormat {
	auto *self = static_cast<Impl *>(ctx->opaque);
	while (*pix_fmts != AV_PIX_FMT_NONE)
	{
		if (*pix_fmts == self->hw.config->pix_fmt)
			return *pix_fmts;
		pix_fmts++;
	}
	return AV_PIX_FMT_NONE;
};
This will work, but we're also supposed to create a frames context before returning from get_format(). This also lets us configure how Vulkan images are created.
int ret = avcodec_get_hw_frames_parameters(
		ctx, ctx->hw_device_ctx,
		AV_PIX_FMT_VULKAN, &ctx->hw_frames_ctx);
// Check error.

auto *frames = reinterpret_cast<AVHWFramesContext *>(ctx->hw_frames_ctx->data);
auto *vk = static_cast<AVVulkanFramesContext *>(frames->hwctx);
vk->img_flags |= VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT;

ret = av_hwframe_ctx_init(ctx->hw_frames_ctx);
// Check error.
The primary motivation for overriding image creation was that I wanted to do YCbCr to RGB conversion in a more unified way, i.e. using individual planes. That is compatible with non-Vulkan video as well, but taking plane views of an image requires VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT.
Using per-plane views is important, as we'll see later. YCbCr samplers fall flat when dealing with practical video use cases.
Processing AVFrames
In FFmpeg, decoding works by sending AVPackets to a codec, which spits out AVFrame objects. If these frames are emitted by a software codec, we just poke at AVFrame::data[] directly, but with hardware decoders, AVFrame::pix_fmt is an opaque type.
There are two ways we can deal with this. For non-Vulkan hardware decoders, we just read back and upload the planes to a VkBuffer staging buffer later, ewwww.
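For context, the decode loop around all of this is the standard send/receive dance. A minimal sketch, with error handling trimmed and process_video_frame() being a hypothetical helper:
// Feed one packet, then drain any frames the codec produces.
int ret = avcodec_send_packet(ctx, pkt);
while (ret >= 0)
{
	AVFrame *av_frame = av_frame_alloc();
	ret = avcodec_receive_frame(ctx, av_frame);
	if (ret < 0)
	{
		// AVERROR(EAGAIN): need more packets; AVERROR_EOF: fully drained.
		av_frame_free(&av_frame);
		break;
	}
	// For hardware decode, av_frame->format is AV_PIX_FMT_VULKAN here.
	process_video_frame(av_frame);
}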
AVFrame *sw_frame = av_frame_alloc();
if (av_hwframe_transfer_data(sw_frame, av_frame, 0) < 0)
{
	LOGE("Failed to transfer HW frame.\n");
	av_frame_free(&sw_frame);
	av_frame_free(&av_frame);
}
else
{
	sw_frame->pts = av_frame->pts;
	av_frame_free(&av_frame);
	av_frame = sw_frame;
}
Each hardware pixel format lets you reinterpret AVFrame::data[] in a "magical" way if you're willing to poke into low-level data structures. For VAAPI, VDPAU and similar APIs there are ways to use buffer sharing somehow, but the details are extremely hairy and best left to experts. For Vulkan, we don't even need external memory!
First, we need to extract the decode format:
auto *frames = reinterpret_cast<AVHWFramesContext *>(ctx->hw_frames_ctx->data);
active_upload_pix_fmt = frames->sw_format;
Then we can query the VkFormat if we want to stay multi-plane.
auto *hwdev = reinterpret_cast<AVHWDeviceContext *>(hw.device->data);
const VkFormat *fmts = nullptr;
VkImageAspectFlags aspects;
VkImageUsageFlags usage;
int nb_images;

int ret = av_vkfmt_from_pixfmt2(hwdev, active_upload_pix_fmt,
                                VK_IMAGE_USAGE_SAMPLED_BIT,
                                &fmts, &nb_images, &aspects, &usage);
However, this has some pitfalls in practice. Video frames tend to be aligned to a macro-block size or similar, meaning that the VkImage dimensions might not equal the actual size we're supposed to display. Even 1080p falls in this category, since 1080 does not cleanly divide into 16×16 macro blocks; a 1920×1080 stream is typically backed by a 1920×1088 image, so naive normalized sampling would filter against 8 rows of garbage at the bottom. The only way to resolve this without extra copies is to view planes separately with VK_IMAGE_ASPECT_PLANE_n_BIT and do texture coordinate clamping manually. This way we avoid sampling garbage when converting to RGB. av_vkfmt_from_pixfmt can help here to deduce the per-plane Vulkan formats, but I just did it manually either way.
// Actual output size.
ubo.resolution = uvec2(video.av_ctx->width, video.av_ctx->height);

if (video.av_ctx->hw_frames_ctx && hw.config &&
    hw.config->device_type == AV_HWDEVICE_TYPE_VULKAN)
{
	// Frames (VkImages) may be padded.
	auto *frames = reinterpret_cast<AVHWFramesContext *>(video.av_ctx->hw_frames_ctx->data);
	ubo.inv_resolution = vec2(1.0f / float(frames->width),
	                          1.0f / float(frames->height));
}
else
{
	ubo.inv_resolution = vec2(1.0f / float(video.av_ctx->width),
	                          1.0f / float(video.av_ctx->height));
}

// Have to emulate CLAMP_TO_EDGE to avoid filtering against garbage.
ubo.chroma_clamp = (vec2(ubo.resolution) - 0.5f * float(1u << plane_subsample_log2[1])) * ubo.inv_resolution;
Processing the frame itself begins with magic casts:
auto *frames = reinterpret_cast<AVHWFramesContext *>(ctx->hw_frames_ctx->data);
auto *vk = static_cast<AVVulkanFramesContext *>(frames->hwctx);
auto *vk_frame = reinterpret_cast<AVVkFrame *>(av_frame->data[0]);
We have to lock the frame while accessing it; FFmpeg is threaded.
vk->lock_frame(frames, vk_frame);
// Do stuff.
vk->unlock_frame(frames, vk_frame);
Now, we have to wait on the timeline semaphore (note that Vulkan 1.3 is required, so this is guaranteed to be supported).
// Acquire the image from FFmpeg.
if (vk_frame->sem[0] != VK_NULL_HANDLE && vk_frame->sem_value[0])
{
	// vkQueueSubmit(wait = sem[0], value = sem_value[0])
}
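Granite wraps queue submission, so the comment above stands in for real code. With raw Vulkan, a sketch of that timeline wait could look roughly like this (queue and cmd_buffer being whatever you submit the conversion work on):
// Wait for FFmpeg's timeline semaphore before consuming the image.
VkTimelineSemaphoreSubmitInfo timeline = { VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO };
timeline.waitSemaphoreValueCount = 1;
timeline.pWaitSemaphoreValues = &vk_frame->sem_value[0];

VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;

VkSubmitInfo submit = { VK_STRUCTURE_TYPE_SUBMIT_INFO };
submit.pNext = &timeline;
submit.waitSemaphoreCount = 1;
submit.pWaitSemaphores = &vk_frame->sem[0];
submit.pWaitDstStageMask = &wait_stage;
submit.commandBufferCount = 1;
submit.pCommandBuffers = &cmd_buffer; // The YCbCr-to-RGB conversion work.
vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);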
Create a VkImageView from the provided image. Based on av_vkfmt_from_pixfmt2 or the per-plane formats from earlier, we know the appropriate Vulkan format to use when creating a view.
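As a rough illustration (not the actual Granite wrapper code, and vk_device is assumed to be your VkDevice), a view of the luma plane of an NV12 frame could be created like this; this is only legal because we set VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT earlier:
// Per-plane view: plane 0 of NV12 is viewed as plain R8_UNORM.
VkImageViewCreateInfo view_info = { VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO };
view_info.image = vk_frame->img[0];
view_info.viewType = VK_IMAGE_VIEW_TYPE_2D;
view_info.format = VK_FORMAT_R8_UNORM;
view_info.subresourceRange = { VK_IMAGE_ASPECT_PLANE_0_BIT, 0, 1, 0, 1 };

VkImageView luma_view = VK_NULL_HANDLE;
vkCreateImageView(vk_device, &view_info, nullptr, &luma_view);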
Queue family ownership transfer is not needed. FFmpeg uses CONCURRENT for the sake of our sanity.
Transition the layout:
cmd->image_barrier(*wrapped_image,
                   vk_frame->layout[0], VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
                   VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT /* sem wait stage */, 0,
                   VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                   VK_ACCESS_2_SHADER_SAMPLED_READ_BIT);
vk_frame->layout[0] = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
Now, we can convert this to RGB as we desire. I went with an async compute formulation. If this were a pure video player, we could probably blit this directly to screen with some fancy scaling filters.
When we're done, we have to "release" the image back to FFmpeg.
// Release the image back to FFmpeg.
if (vk_frame->sem[0] != VK_NULL_HANDLE)
{
	vk_frame->sem_value[0] += 1;
	// vkQueueSubmit(signal = sem[0], value = sem_value[0]);
}
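Again as a raw Vulkan sketch (Granite does this through its own submission path), the signal side mirrors the wait:
// Signal the incremented timeline value so FFmpeg can recycle the frame.
VkTimelineSemaphoreSubmitInfo timeline = { VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO };
timeline.signalSemaphoreValueCount = 1;
timeline.pSignalSemaphoreValues = &vk_frame->sem_value[0]; // Already incremented above.

VkSubmitInfo submit = { VK_STRUCTURE_TYPE_SUBMIT_INFO };
submit.pNext = &timeline;
submit.signalSemaphoreCount = 1;
submit.pSignalSemaphores = &vk_frame->sem[0];
vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);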
And that’s it!
Test results
I tried various codec configurations to see the state of things.
RADV
- H.264 – 8bit: Works
- H.264 – 10bit: Not supported by hardware
- H.265 – 8bit: Works
- H.265 – 10bit: Works
NVIDIA
- H.264: Broken
- H.265: Seems to work
ANV
There is a preliminary branch by Airlie again, but it doesn't seem to have been updated for the final spec yet.
Conclusion
Exciting times for Vulkan video. The API is ridiculously low-level and way too complicated for mere graphics programming mortals, which is why having first-class support in FFmpeg and friends will be so important to make the API usable.