So it is exporting the file successfully but it is rather slow?
Well, Voukoder only accelerates the encoding part of the export. Unfortunately, After Effects is rather slow at rendering the individual frames (and handing them over to Voukoder).
Please provide a log file.
I will try to reproduce this.
You might also try this: https://www.voukoder.org/forum/thread/239
AMD has a CUDA equivalent called ROCm, but it isn't that popular.
I'm not sure yet how CUDA support will finally be implemented. For now I'd like to get some feedback first. But yes, I won't force you to use slower functionality.
Yes, please use e.g. YUV 4:2:0 10 bit.
Kleinrotti Seems the high core count works pretty well with the AVX2 code.
@All Other experiences?
Please report if ...
At some point we'll have to use the latest SDK, at the latest when FFmpeg no longer allows us to use older SDKs. It's just a question of time, I guess.
In the Voukoder successor I plan to ship the FFmpeg binaries as DLLs, so they would be exchangeable. But I don't want to hack this into the existing Voukoder.
I need to perform one FMA operation per sub-pixel. This can be parallelized with CUDA.
I'm not sure how I can achieve this using tensor cores yet.
Test release: https://github.com/Vouk/voukoder-….0.msi?raw=true
(Use the latest NVIDIA driver version)
These presets are available when compiling FFmpeg with the NVIDIA SDK 10.0.
Voukoder currently uses SDK 9.0 in order to also support older GPUs and mobile GPUs (Bring back NVENC support for mobile Kepler-series GPUs to Voukoder).
This would mean to drop support for older GPUs.
Exporting a project at 2048 x 1152 with float32 color precision (per channel).
1 pixel per CUDA thread processing 4 floats per FMA in a 32 x 32 block
Unfortunately the cudaMemcpy calls take around 10-15 ms each. Processing more frames per call would speed this up even more.
[11:26:35] Frame #277: vRender: 39 us, vProcess: 41738 us, vEncoding: 19984 us, aRender: 70 us, aEncoding: 268 us, Latency: 64781 us
[11:26:35] Frame #278: vRender: 37 us, vProcess: 44431 us, vEncoding: 19885 us, aRender: 61 us, aEncoding: 12 us, Latency: 66833 us
[11:26:35] Frame #279: vRender: 33 us, vProcess: 41376 us, vEncoding: 18816 us, aRender: 62 us, aEncoding: 310 us, Latency: 62777 us
[11:26:35] Frame #280: vRender: 39 us, vProcess: 43909 us, vEncoding: 18867 us, aRender: 55 us, aEncoding: 218 us, Latency: 65696 us
[11:26:35] Frame #281: vRender: 35 us, vProcess: 43756 us, vEncoding: 20499 us, aRender: 53 us, aEncoding: 231 us, Latency: 66800 us
[11:26:35] Frame #282: vRender: 31 us, vProcess: 43390 us, vEncoding: 20808 us, aRender: 65 us, aEncoding: 315 us, Latency: 66789 us
[09:54:33] Frame #288: vRender: 40 us, vProcess: 18746 us, vEncoding: 22134 us, aRender: 68 us, aEncoding: 19 us, Latency: 44891 us
[09:54:33] Frame #289: vRender: 36 us, vProcess: 21211 us, vEncoding: 18160 us, aRender: 72 us, aEncoding: 322 us, Latency: 42256 us
[09:54:34] Frame #290: vRender: 36 us, vProcess: 18531 us, vEncoding: 20253 us, aRender: 59 us, aEncoding: 214 us, Latency: 41408 us
[09:54:34] Frame #291: vRender: 35 us, vProcess: 18369 us, vEncoding: 22336 us, aRender: 70 us, aEncoding: 327 us, Latency: 43288 us
[09:54:34] Frame #292: vRender: 40 us, vProcess: 17668 us, vEncoding: 18668 us, aRender: 63 us, aEncoding: 17 us, Latency: 38560 us
[09:54:34] Frame #293: vRender: 36 us, vProcess: 17704 us, vEncoding: 19705 us, aRender: 71 us, aEncoding: 327 us, Latency: 40145 us
[11:30:00] Frame #70: vRender: 31 us, vProcess: 36414 us, vEncoding: 16255 us, aRender: 1083 us, aEncoding: 10 us, Latency: 55197 us
[11:30:00] Frame #71: vRender: 83 us, vProcess: 40397 us, vEncoding: 15759 us, aRender: 577 us, aEncoding: 244 us, Latency: 59374 us
[11:30:00] Frame #72: vRender: 30 us, vProcess: 36319 us, vEncoding: 15735 us, aRender: 930 us, aEncoding: 357 us, Latency: 54855 us
[11:30:00] Frame #73: vRender: 1774 us, vProcess: 47668 us, vEncoding: 70102 us, aRender: 13 us, aEncoding: 245 us, Latency: 121703 us
[11:30:00] Frame #74: vRender: 34 us, vProcess: 40626 us, vEncoding: 15824 us, aRender: 610 us, aEncoding: 8 us, Latency: 58531 us
[11:30:00] Frame #75: vRender: 35 us, vProcess: 40386 us, vEncoding: 15860 us, aRender: 565 us, aEncoding: 234 us, Latency: 58775 us
[09:28:47] Frame #1720: vRender: 30 us, vProcess: 12659 us, vEncoding: 13886 us, aRender: 893 us, aEncoding: 332 us, Latency: 29435 us
[09:28:47] Frame #1721: vRender: 38 us, vProcess: 13909 us, vEncoding: 17943 us, aRender: 894 us, aEncoding: 421 us, Latency: 35310 us
[09:28:47] Frame #1722: vRender: 39 us, vProcess: 13063 us, vEncoding: 14418 us, aRender: 558 us, aEncoding: 8 us, Latency: 30184 us
[09:28:47] Frame #1723: vRender: 32 us, vProcess: 13319 us, vEncoding: 14304 us, aRender: 12 us, aEncoding: 343 us, Latency: 29725 us
[09:28:47] Frame #1724: vRender: 51 us, vProcess: 14712 us, vEncoding: 15048 us, aRender: 653 us, aEncoding: 244 us, Latency: 33087 us
[09:28:47] Frame #1725: vRender: 30 us, vProcess: 13147 us, vEncoding: 15400 us, aRender: 570 us, aEncoding: 7 us, Latency: 30813 us
Playing with CUDA right now and already got a speed increase of 69% when using bit depths greater than 8 bit on my development machine.
Still working on it.
The DirectX way doesn't work because there is no texture support for YUVA floats.
Trying CUDA ...
That might take some time.
I have in mind to create a DirectX texture from the frame and convert it using the GPU, yes. But this still needs to be done, and I have it planned for the Voukoder successor.
Voukoder's processing path for > 8 bit is not optimized yet. Premiere delivers floating-point data for high-bit-depth video, which needs to be converted to a pixel format that FFmpeg understands.