​Voukoder Performance Analysis

    • Official post

    Analyzing the Performance

    This document describes how Voukoder processes data and how it measures the time needed to perform each step of an exporting process.

    Terms

    A host application ("Host") is the application the plugin runs in, for example Adobe Premiere or VEGAS Pro.

    (UNDER CREATION)

    First ...

    First we need to talk about some basic facts.

    • Voukoder does not necessarily make exports faster! It simply provides access to different encoders. These let you choose between three encoding priorities: encoding speed, video quality or file size. You can't have all of them together!
    • Voukoder does not speed up rendering! If the host application is slow at rendering / delivering frames, Voukoder can't make it go faster and is limited to that speed, no matter how fast the encoder / GPU is!

    It's not that easy ...

    It is quite complicated to analyze performance issues for various reasons ...

    • There are millions of different computer configurations and the slightest difference could have an impact on the measurements
    • Comparing encoders is complicated as hell! One cannot simply say "Voukoder is better than xyz!", because:
      • What do you compare against? Encoding speed? File size? Visual quality?
      • What is happening? FFmpeg will always be faster than Voukoder because a Voukoder export also includes the rendering done by the Host.
      • What settings or presets do you use? Are the settings optimized for the CPU / GPU?
      • ...

    Breaking it down

    In the end most of you are interested in the encoding speed: how many frames per second (fps) can be exported. Let's see how the average fps shown at the end of the encoding process is calculated. It is derived from the average time (latency) of all encoded frames. So let's take a look at a single frame, like this line from the log file:

    Code
    [12:31:35] Frame #920: vRender: 31 µs, vProcess: 9 µs, vEncoding: 13274 µs, aRender: 106 µs, aEncoding: 806 µs, Latency: 14378 µs
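A log line like this can be split into its metrics with a few lines of Python. This is only a sketch: the regular expression and the function name `parse_frame_line` are assumptions based on the log lines shown in this thread, not part of Voukoder itself.

```python
import re

# Assumed pattern: "<name>: <integer> µs" pairs. It also tolerates "us",
# the Greek letter mu, and a missing space before the unit.
METRIC_RE = re.compile(r"(\w+):\s*(\d+)\s*(?:[µμ]s|us)")


def parse_frame_line(line: str) -> dict[str, int]:
    """Return the timings of one log line in microseconds, keyed by metric name."""
    return {name: int(value) for name, value in METRIC_RE.findall(line)}


line = (
    "[12:31:35] Frame #920: vRender: 31 µs, vProcess: 9 µs, "
    "vEncoding: 13274 µs, aRender: 106 µs, aEncoding: 806 µs, "
    "Latency: 14378 µs"
)
metrics = parse_frame_line(line)
print(metrics["vEncoding"], metrics["Latency"])
```

The timestamp and the frame number are skipped because they are not followed by a unit.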


    The export process basically executes the tasks from that log line from left to right:

    Render > Process > Encoding

    These three steps can be broken down to "substeps":

    Rendering

    • Single, compressed frames are acquired from the demuxer
    • These frames get decompressed
    • Layers, effects and transitions get rendered to a single, uncompressed frame

    Processing

    • Depending on the host application, pixel format conversions need to be done
    • Images may need to be flipped vertically

    Encoding

    • Small, secondary pixel format conversions may be required
    • The filter chain gets processed
    • The final frame gets encoded and is sent to the muxer (and to disk)

    Each of these steps takes a certain amount of time. In the example above, rendering the frame takes 31 µs. That is pretty fast. So if one frame takes 31 µs, we could theoretically have 32258 frames per second (1 / time in seconds), right? Awesome, but so far the Host has only rendered the frame. It has not been processed, encoded or written to disk yet. The times of all steps add up, so the export of a single frame can never be faster than the sum of the steps up to that point.

    Step                           Time        Sum         Theoretical fps
    Render video frame             31 µs       31 µs       32258 fps
    Process video frame            9 µs        40 µs       25000 fps
    Encoding video frame           13274 µs    13314 µs    75 fps
    Render audio frame             106 µs      13420 µs    74 fps
    Encode audio frame             806 µs      14226 µs    70 fps
    Total frame-to-frame latency   -           14378 µs    69 fps
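The running sums and fps values in this table can be reproduced with a short Python sketch. The helper name `theoretical_fps` is made up for this illustration, and rounding down is an assumption that happens to match the numbers in the table.

```python
from itertools import accumulate

# Per-step times in microseconds, taken from the log line for frame #920.
steps = [
    ("Render video frame", 31),
    ("Process video frame", 9),
    ("Encoding video frame", 13274),
    ("Render audio frame", 106),
    ("Encode audio frame", 806),
]


def theoretical_fps(total_us: int) -> int:
    """Frames per second if one frame takes total_us microseconds (rounded down)."""
    return 1_000_000 // total_us


# Running total after each step.
running = list(accumulate(us for _, us in steps))

for (name, us), total in zip(steps, running):
    print(f"{name:<22} {us:>6} µs {total:>6} µs {theoretical_fps(total):>6} fps")

latency_us = 14378  # measured end-to-end value from the same log line
print(f"{'Total latency':<22} {'-':>9} {latency_us:>6} µs {theoretical_fps(latency_us):>6} fps")
```

The measured latency (14378 µs) is 152 µs higher than the sum of the five steps (14226 µs). That gap is time not attributed to any single step, such as glue code.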

    That's how you end up with 69 fps (if all frames of the project take about this long on average). But what if your project is UHD and has lots of filters and effects? Your log could look a bit different, because the Host has a lot of work to do rendering all the effects, text layers and images before handing the frame over to Voukoder:

    Code
    [22:14:01] Frame #515: vRender: 12891µs, vProcess: 110 µs, vEncoding: 2912 µs, aRender: 118 µs, aEncoding: 791 µs, Latency: 16972 µs

    For this frame, the table would look like this:

    Step                           Time        Sum         Theoretical fps
    Render video frame             12891 µs    12891 µs    77 fps
    Process video frame            110 µs      13001 µs    76 fps
    Encoding video frame           2912 µs     15913 µs    62 fps
    Render audio frame             118 µs      16031 µs    62 fps
    Encode audio frame             791 µs      16822 µs    59 fps
    Total frame-to-frame latency   -           16972 µs    58 fps

    In this example you see a slightly lower fps. But while you could improve the first example by adding hardware (GPU) encoding to accelerate the "Encoding video frame" step, this would not help in the second example, because most of the time is spent rendering in the Host. The GPU usage percentage would be very low. This means Voukoder is not able to accelerate this.
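To see which step dominates a frame, you can compare each step with the total latency. The following Python sketch does this for both example frames. The function name `bottleneck` is hypothetical, and the dictionaries simply repeat the values from the two log lines.

```python
def bottleneck(metrics: dict[str, int]) -> tuple[str, float]:
    """Return the slowest step and its share of the frame latency."""
    steps = {name: us for name, us in metrics.items() if name != "Latency"}
    slowest = max(steps, key=steps.get)
    return slowest, steps[slowest] / metrics["Latency"]


# Values in microseconds, copied from the two example log lines.
frame_920 = {"vRender": 31, "vProcess": 9, "vEncoding": 13274,
             "aRender": 106, "aEncoding": 806, "Latency": 14378}
frame_515 = {"vRender": 12891, "vProcess": 110, "vEncoding": 2912,
             "aRender": 118, "aEncoding": 791, "Latency": 16972}

for label, frame in (("Frame #920", frame_920), ("Frame #515", frame_515)):
    name, share = bottleneck(frame)
    print(f"{label}: {name} takes {share:.0%} of the latency")
```

For frame #920 the slowest step is vEncoding, so a faster encoder can help. For frame #515 it is vRender, which happens in the Host.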

    Video metrics

    These metrics are only available when video encoding is enabled.

    vRender

    The time the host application requires to render the frame. Voukoder has no impact on this (with a few exceptions) as all of this happens in the host application.

    • Decoding the source frame to uncompressed values
    • Applying filters, effects and all layers
    • Converting it to the requested pixel format

    CUDA

    CUDA can have a significant impact on vRender depending on your project structure. It will most likely limit your export speed to a certain value (you can test this with the VRPT-Tool), but it will also accelerate your effects and filters.

    As a rule of thumb: if your project makes use of lots of effects and filters, turn CUDA on. If not, turn it off. It can be changed in the project settings.

    Hardware decoding

    The Host can use the hardware decoding support of an integrated GPU on Intel systems. This sounds like it would be faster than CPU decoding, but that is not the case in general. On slower CPUs it could have a positive effect; on faster CPUs it might be better to disable it. Again, you can use the VRPT-Tool to test which option is faster on your system / project.

    vProcess

    This is the time Voukoder needs to prepare the frame data for use with FFmpeg/libav. With YUV 4:2:0 (8 bit) data this value should be pretty low, as no conversion is necessary. With other pixel formats the time needed for this task could increase drastically.

    vEncoding

    After rendering and processing the raw frame data, this is the final step. The value of vEncoding is the time the software or hardware encoder needs to compress the frame and write it to disk. This also includes the processing of all video filters.

    Audio metrics

    These metrics are only available when audio encoding is enabled.

    aRender

    Just like vRender, this is the time the host application needs to render the audio samples. It is also the combined value of several tasks:

    • Decoding the source audio to uncompressed values
    • Applying filters, effects and all layers

    Voukoder has no impact or acceleration possibilities on this value.

    aEncoding

    After rendering the raw sample data, this is the final step. The value of aEncoding is the time the encoder needs to compress the data and write it to disk. This also includes the processing of all audio filters.

    Latency

    The overall end-to-end time needed to export a frame. This includes all steps above as well as the required glue code. 1 / Latency (in seconds) equals the frames per second.
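As a rough illustration of how an average fps follows from per-frame latencies, here is a small Python sketch. It assumes the average is derived from the mean latency of all frames, as described above. The function name `average_fps` is made up, and the two latencies are just the example frames from this post.

```python
def average_fps(latencies_us: list[int]) -> float:
    """Average frames per second, derived from the mean per-frame latency."""
    mean_latency_s = sum(latencies_us) / len(latencies_us) / 1_000_000
    return 1 / mean_latency_s


single = average_fps([14378])        # one frame: simply 1 / Latency
both = average_fps([14378, 16972])   # two frames: 1 / mean latency
print(f"{single:.2f} fps, {both:.2f} fps")
```

Note that this averages the latencies first and converts to fps afterwards. Averaging the per-frame fps values instead would give a slightly different (higher) number.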

  • Thank you, it's interesting.

    About performance, maybe I can ask my question here. For a French-language site, I am looking for someone who can run a small encoding performance test with Voukoder and an Nvidia RTX or GTX 16x0 card. Ideally the person would speak French (or possibly English). We would like to know the performance gap between Nvidia's new and old video cards.

    You can contact me via the Voukoder website.

    Edited 2 times, last by MyPOV (June 8, 2019 at 23:12)

  • Just looked through my log file and pulled out this random frame. Is this a sign my encoding settings are excessive, or could it just be a complex video frame?

    Using 5800x3D.

    Frame #19818: vRender: 8434 us, vProcess: 0 us, vEncoding: 160016 us, aRenderEncode: 441 us, Latency: 168905 us

    • Official post

    vRender with 8.4 ms is quite high. Looks like either a complex video or a slow CPU.

    vProcess is 0 ms. I guess you're inputting yuv420 and passing it directly to the encoder.

    vEncoding with 160 ms is also high. Either a slow CPU or a slow GPU (depends on what encoder you're using).

    aRenderEncode with 0.44 ms is okay.

    All in all, if all frames are like this, it'd be an encoding speed of 5.92 fps.

  • Hmm ok. Am using x265 encoder.

    Total average fps was 11. CPU threads were completely saturated.

    I am also upscaling the video from 1440p to 4K, maybe that adds a lot of work?