So it is exporting the file successfully but it is rather slow?
Well, Voukoder only accelerates the encoding part of the export. Unfortunately, After Effects is rather slow at rendering the individual frames (and handing them over to Voukoder).
Please provide a log file.
I will try to reproduce this.
You might also try this: https://www.voukoder.org/forum/thread/239
AMD has a CUDA equivalent called ROCm, but it isn't that popular.
I'm not sure yet how CUDA support will finally be implemented. For now I'd like to get some feedback first. But yes, I won't force you to use slower functionality.
Yes, please use e.g. YUV 4:2:0 10 bit.
Kleinrotti Seems the high core count works pretty well with the AVX2 code.
@All Other experiences?
Please report if ...
At some point we'll have to use the latest SDK, at the latest when FFmpeg no longer allows us to use older SDKs. It's just a question of time, I guess.
In the Voukoder successor I plan to ship the FFmpeg binaries as DLLs, so they would be exchangeable. But I don't want to hack this into the existing Voukoder.
I need to perform one FMA operation per sub-pixel. This can be parallelized with CUDA.
I'm not sure how I can achieve this using tensor cores yet.
Test release: https://github.com/Vouk/voukoder-….0.msi?raw=true
(Use the latest NVIDIA driver version)
These presets are available when compiling FFmpeg with the NVIDIA SDK 10.0.
Voukoder currently uses SDK 9.0 in order to also support older GPUs and mobile GPUs (Bring back NVENC support for mobile Kepler-series GPUs to Voukoder).
This would mean to drop support for older GPUs.
Exporting a project at 2048 x 1152 with float32 color precision (per channel).
1 pixel per CUDA thread processing 4 floats per FMA in a 32 x 32 block
Unfortunately the cudaMemcpy calls take around 10-15 ms each. Processing more frames per call would speed this up even more.
[11:26:35] Frame #277: vRender: 39 us, vProcess: 41738 us, vEncoding: 19984 us, aRender: 70 us, aEncoding: 268 us, Latency: 64781 us
[11:26:35] Frame #278: vRender: 37 us, vProcess: 44431 us, vEncoding: 19885 us, aRender: 61 us, aEncoding: 12 us, Latency: 66833 us
[11:26:35] Frame #279: vRender: 33 us, vProcess: 41376 us, vEncoding: 18816 us, aRender: 62 us, aEncoding: 310 us, Latency: 62777 us
[11:26:35] Frame #280: vRender: 39 us, vProcess: 43909 us, vEncoding: 18867 us, aRender: 55 us, aEncoding: 218 us, Latency: 65696 us
[11:26:35] Frame #281: vRender: 35 us, vProcess: 43756 us, vEncoding: 20499 us, aRender: 53 us, aEncoding: 231 us, Latency: 66800 us
[11:26:35] Frame #282: vRender: 31 us, vProcess: 43390 us, vEncoding: 20808 us, aRender: 65 us, aEncoding: 315 us, Latency: 66789 us
[09:54:33] Frame #288: vRender: 40 us, vProcess: 18746 us, vEncoding: 22134 us, aRender: 68 us, aEncoding: 19 us, Latency: 44891 us
[09:54:33] Frame #289: vRender: 36 us, vProcess: 21211 us, vEncoding: 18160 us, aRender: 72 us, aEncoding: 322 us, Latency: 42256 us
[09:54:34] Frame #290: vRender: 36 us, vProcess: 18531 us, vEncoding: 20253 us, aRender: 59 us, aEncoding: 214 us, Latency: 41408 us
[09:54:34] Frame #291: vRender: 35 us, vProcess: 18369 us, vEncoding: 22336 us, aRender: 70 us, aEncoding: 327 us, Latency: 43288 us
[09:54:34] Frame #292: vRender: 40 us, vProcess: 17668 us, vEncoding: 18668 us, aRender: 63 us, aEncoding: 17 us, Latency: 38560 us
[09:54:34] Frame #293: vRender: 36 us, vProcess: 17704 us, vEncoding: 19705 us, aRender: 71 us, aEncoding: 327 us, Latency: 40145 us
[11:30:00] Frame #70: vRender: 31 us, vProcess: 36414 us, vEncoding: 16255 us, aRender: 1083 us, aEncoding: 10 us, Latency: 55197 us
[11:30:00] Frame #71: vRender: 83 us, vProcess: 40397 us, vEncoding: 15759 us, aRender: 577 us, aEncoding: 244 us, Latency: 59374 us
[11:30:00] Frame #72: vRender: 30 us, vProcess: 36319 us, vEncoding: 15735 us, aRender: 930 us, aEncoding: 357 us, Latency: 54855 us
[11:30:00] Frame #73: vRender: 1774 us, vProcess: 47668 us, vEncoding: 70102 us, aRender: 13 us, aEncoding: 245 us, Latency: 121703 us
[11:30:00] Frame #74: vRender: 34 us, vProcess: 40626 us, vEncoding: 15824 us, aRender: 610 us, aEncoding: 8 us, Latency: 58531 us
[11:30:00] Frame #75: vRender: 35 us, vProcess: 40386 us, vEncoding: 15860 us, aRender: 565 us, aEncoding: 234 us, Latency: 58775 us
[09:28:47] Frame #1720: vRender: 30 us, vProcess: 12659 us, vEncoding: 13886 us, aRender: 893 us, aEncoding: 332 us, Latency: 29435 us
[09:28:47] Frame #1721: vRender: 38 us, vProcess: 13909 us, vEncoding: 17943 us, aRender: 894 us, aEncoding: 421 us, Latency: 35310 us
[09:28:47] Frame #1722: vRender: 39 us, vProcess: 13063 us, vEncoding: 14418 us, aRender: 558 us, aEncoding: 8 us, Latency: 30184 us
[09:28:47] Frame #1723: vRender: 32 us, vProcess: 13319 us, vEncoding: 14304 us, aRender: 12 us, aEncoding: 343 us, Latency: 29725 us
[09:28:47] Frame #1724: vRender: 51 us, vProcess: 14712 us, vEncoding: 15048 us, aRender: 653 us, aEncoding: 244 us, Latency: 33087 us
[09:28:47] Frame #1725: vRender: 30 us, vProcess: 13147 us, vEncoding: 15400 us, aRender: 570 us, aEncoding: 7 us, Latency: 30813 us
Playing with CUDA right now and already got a speed increase of 69% when using bit depths greater than 8 bit on my development machine.
Still working on it.
The DirectX way doesn't work because there is no texture support for YUVA floats.
Trying CUDA ...
That might take some time.
I have in mind to create a DirectX texture from the frame and convert it using the GPU, yes. But this still needs to be done, and I have it planned for the Voukoder successor.
Voukoder's processing path for > 8 bit is not optimized yet. Premiere delivers floating-point data for high-bit-depth video, which needs to be converted to a pixel format that FFmpeg understands.