Feature Request: HW Accelerated Filters
Adobe Premiere / MediaEncoder
Samwise Gamgee - March 19, 2021 at 11:12
It would be nice if we could get the full suite of nvdec/cuda filters. I am most interested in scale_npp myself, but there's a whole host of these: transpose_npp, scale_cuda, yadif_cuda, thumbnail_cuda, overlay_cuda...
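For context, this is roughly the kind of fully GPU-resident pipeline these filters enable on the ffmpeg command line (a sketch only: file names and the target resolution are placeholders, and scale_npp requires an ffmpeg build configured with --enable-libnpp and --enable-nonfree):

# Decode with NVDEC, scale with NPP, encode with NVENC; the frames stay
# in CUDA memory the whole time, so there are no CPU round trips.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
       -vf scale_npp=1280:720 -c:v h264_nvenc -c:a copy output.mp4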
Official Post
Yes, I see your point on this. I actually plan to add this to the Voukoder successor, and I would like to implement it in a smart way.
From what I understand, this is how it would work in the current Voukoder:
- Copy frame from CPU to GPU
- Apply CUDA filter
- Copy frame from GPU to CPU
- Copy frame from CPU to GPU
- Do GPU encoding
- Copy frame from GPU to CPU
- Write to disk
- Go to step 1
I'd love to do it in this way:
- Copy multiple frames from CPU to GPU (as many as fit in the GPUs memory)
- Run CUDA filters over all frames
- Do GPU encoding on all frames
- Copy frame from GPU to CPU
- Write to disk
- Go to step 1
I'm trying to find out whether the FFmpeg accelerated filters work the first way or the second way. Do you have any info on this?
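To illustrate the difference in FFmpeg filter-graph terms, here is a rough sketch of both variants as command lines (file names and resolution are placeholders; in Voukoder's case the input would of course be raw frames handed over by the NLE rather than a file):

# "Way 1": upload, filter on the GPU, download again; the encoder then
# has to copy the frame back to the GPU a second time internally.
ffmpeg -i input.mov -vf "hwupload_cuda,scale_cuda=1280:720,hwdownload,format=nv12" -c:v h264_nvenc out1.mp4

# "Way 2": upload once and keep the frames on the GPU through filtering
# and encoding; only the encoded bitstream comes back to the CPU.
ffmpeg -i input.mov -vf "hwupload_cuda,scale_cuda=1280:720" -c:v h264_nvenc out2.mp4

As far as I know, libavfilter itself already works the second way: as long as consecutive filters (and the encoder) accept CUDA frames, nothing forces a download in between.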
-
I am certain that this is possible; indeed, in current ffmpeg such behavior is fairly easy to do. I believe that the difficulty with Premiere/Vegas/etc. will be getting the frame to decode at all on the GPU more than anything else, but I could be wrong. My current frameserver method sadly has to decode frames on the CPU, while the rest is done entirely on the GPU, but with your method you might be able to do everything on the GPU.
https://docs.nvidia.com/video-technolo…cceleration.pdf
https://developer.nvidia.com/blog/nvidia-ff…nscoding-guide/
These seem to be the most immediately relevant.
Currently, the way I do it, using AviSynth+, 64-bit ffmpeg, and the 64-bit Debugmode FrameServer, is as follows:
frameserver.avs
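The script itself isn't reproduced here, but the general shape of the pipeline is something like the sketch below. It assumes frameserver.avs opens the Debugmode FrameServer's signpost AVI (e.g. via AviSource) and that ffmpeg was built with --enable-avisynth; the resolution and codecs are placeholders.

# The NLE and frameserver decode/serve frames on the CPU; ffmpeg uploads
# them to the GPU once and keeps filtering and encoding there.
ffmpeg -i frameserver.avs \
       -vf "hwupload_cuda,scale_cuda=1920:1080" \
       -c:v hevc_nvenc -c:a aac output.mkv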
-
Official Post
I believe that the difficulty with Premiere/Vegas/etc. will be getting the frame to decode at all on the GPU more than anything else, but I could be wrong
This is out of my control. At the beginning of it all I get a pointer to a frame buffer in CPU-addressable memory, so the NLE is doing all the decoding, rendering, etc.
Some months ago I was investigating whether CUDA could improve the speed of pixel format conversion in the high bit depth mode. I noticed that copying frames back and forth between CPU and GPU memory is quite expensive, so I should do it as little as possible.
I guess the "cuda" pixel format in FFmpeg contains a pointer to GPU memory. So I guess that with the FFmpeg command line above you're doing exactly what I was talking about as "way 2".
As I am using FFmpeg/libav at the C API level, I still have to figure it all out. But this will be included in Voukoder sooner or later.
-
If you would like to put a bounty on this, I would definitely be willing to fill it, for what it's worth.