Feature Request: HW Accelerated Filters

Samwise Gamgee

It would be nice if we could get the full suite of nvdec/cuda filters. I am most interested in scale_npp myself, but there's a whole host of these: transpose_npp, scale_cuda、yadif_cuda、thumbnail_cuda、overlay_cuda...

Vouk

Yes, I see your point on this. I plan to add this to the voukoder successor actually and I would like to implement this in a smart way.

From what I understand is how the would work in the current voukoder:

Copy frame from CPU to GPU
Apply CUDA filter
Copy frame from GPU to CPU
Copy frame from CPU to GPU
Do GPU encoding
Copy frame from GPU to CPU
Write to disk
Go to step 1

I'd love to do it in this way:

Copy multiple frames from CPU to GPU (as many as fit in the GPUs memory)
Run CUDA filters over all frames
Du GPU encoding on all frames
Copy frame from GPU to CPU
Write to disk
Go to step 1

I'm trying to find out if the FFmpeg accelerated filters work like the first way to the second way. Do you have any info on this?

Samwise Gamgee

I am certain that this is possible. Indeed, in ffmpeg currently such behavior is fairly easy to do. I believe that the difficulty with Premiere/Vegas/etc. will be getting the frame to decode at all on the GPU more than anything else, but I could be wrong. My current frameserver method has to decode frames on the CPU sadly, but the rest is done entirely on the GPU - but you might be able to do everything on the GPU with your method.

https://docs.nvidia.com/video-technolo…cceleration.pdf

https://developer.nvidia.com/blog/nvidia-ff…nscoding-guide/

These seem to be the most immediately relevant.

Currently, the way I do it, using avisynth+, 64-bit ffmpeg, and 64-bit debugmode frameserver, is as follows:

frameserver.avs

Code

AviSource("frameserver.avi")
ConvertToYUV444()

Code

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i frameserver.avs -filter:v "hwupload_cuda,scale_npp=w=2560:h=1440:interp_algo=lanczos" -c:v hevc_nvenc -rc vbr -cq 24 -qmin 24 -qmax 24 -level 6.2 -profile:v rext -y output.mkv

Vouk

Zitat von Samwise Gamgee

I believe that the difficulty with Premiere/Vegas/etc. will be getting the frame to decode at all on the GPU more than anything else, but I could be wrong

This is out of my control. At the beginning of it all I get a pointer to a frame buffer in the CPU addressable memory. So the NLE is doing all decoding, rendering, etc.

Some months ago I was investigating if CUDA could improve the speed of pixel format conversion in the high bit depth mode. I noticed coping frames back and forth between CPU and GPU memory is quite expensive and I should do it as less as possible.

I guess the "cuda" pixel format in FFmpeg contains a pointer to the GPU memory. So I guess with the FFmpeg command line above you're doing exactly what is was talking about as "way 2".

As I am using FFmpeg/libav on the C API level, I still have to figure it all out. But this will be included in voukoder sooner or later.

Samwise Gamgee

If you would like to put a bounty on this, I would definitely be willing to fill it, for what it's worth.

Feature Request: HW Accelerated Filters

Tags

Benutzer online in diesem Thema