Capturing video footage and playing games at 8K resolution and 60 frames per second (FPS) is now possible, thanks to advances in camera and display technologies. Leading multimedia companies, including RED Digital Cinema, Nikon, and Canon, have already introduced 8K60 cameras for both the consumer and professional markets.
On the display side, the HDMI 2.1 standard has made 8K60 widely available on both gaming monitors and smart TVs. While 8K60 provides stunning image quality and sharpness, it comes at the significant cost of more data to transfer and store.
Fast codecs are therefore paramount in bridging the gap between the sensors and the display. To make 8K60 widely available, the NVIDIA Ada Lovelace GPU architecture provides multiple NVENC engines that accelerate video encoding while maintaining high image quality. (The NVIDIA RTX 4090 and 4080 provide two NVENCs, and the NVIDIA RTX 6000 Ada Generation and NVIDIA L40 provide three.)
In practice, this can double or triple the encoding performance with a single GPU when compared to previous generations, enabling 8K60 video encoding and beyond.
This post shows how a split-frame encoding (SFE) technique uses the multiple NVENCs available in the NVIDIA Ada Lovelace architecture to achieve 8K60 video encoding performance. We explore how SFE works at 4K and 8K resolutions and how to enable it through the NVENCODE API. Finally, we present several benchmarks that show the performance benefits of this technique in practice.
SFE is a technique that exploits the multiple NVENCs in NVIDIA Ada Lovelace GPUs when encoding a single video sequence: each frame is split into partial frames, and each partial frame is encoded by a different NVENC engine, effectively spreading the encoding work across the available NVENCs (Figure 1). SFE was introduced in NVIDIA Video Codec SDK 12.0. In that release, however, SFE was enabled only implicitly, based on the encoding preset, tuning information, and resolution, to support 8K live encoding in HEVC or AV1. (Note that 8K is not supported on H.264.) To learn more, see Improving Video Quality and Performance with AV1 and NVIDIA Ada Lovelace Architecture.
With NVIDIA Video Codec SDK 12.1, you can explicitly enable or disable SFE. This means SFE can now use the two or three NVENCs in the NVIDIA RTX 4090 and the NVIDIA RTX 6000 Ada Generation, respectively, without restrictions on resolution, preset, or tuning information. By using two-way or three-way SFE, an application can approach double or triple the encoding performance on a single video sequence. This is especially important for 8K, a particularly demanding use case.
SFE at 4K and 8K resolution
How SFE is applied depends on the resolution and the selected video codec. With HEVC and SFE turned off, expect a single slice per frame. With two-way or three-way SFE, each frame is split horizontally into two or three slices, respectively. This applies at both 4K and 8K resolutions (Figure 2). The same applies to AV1 when encoding video up to 4K resolution, except that AV1 uses tiles instead of slices to create these independent frame partitions.
When encoding 8K video with AV1, note that the maximum tile resolution defined by the standard is 4096 x 2304 pixels. A 7680 x 4320 frame therefore does not fit in a single tile and is split into four tiles, each a quarter of the frame (3840 x 2160 pixels). With SFE, each of these tiles is further split horizontally into two or three parts to achieve the same performance benefits as with HEVC, giving eight tiles for two-way SFE and 12 tiles for three-way SFE (Figure 3).
Figure 3. Generated frame partitions when encoding AV1 at 8K for the several SFE configurations: AV1 8K with no SFE (left); AV1 8K with two-way SFE (center); AV1 8K with three-way SFE (right)
Table 1 summarizes the number and resolution of the partial frames to expect per codec and input video resolution.
| Codec | Video resolution | No SFE | Two-way SFE | Three-way SFE |
|-------|------------------|--------|-------------|---------------|
| HEVC | 4K | 1 × (3840 x 2160) | 2 × (3840 x 1080) | 3 × (3840 x 720) |
| HEVC | 8K | 1 × (7680 x 4320) | 2 × (7680 x 2160) | 3 × (7680 x 1440) |
| AV1 | 4K | 1 × (3840 x 2160) | 2 × (3840 x 1080) | 3 × (3840 x 720) |
| AV1 | 8K | 4 × (3840 x 2160) | 8 × (3840 x 1080) | 12 × (3840 x 720) |

Table 1. Number of partial frames × (resolution of each partial frame) per codec when encoding 4K and 8K videos
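The partitioning rules above can be captured in a small model that reproduces Table 1. This is a sketch under stated assumptions, not how the driver chooses partitions internally: it assumes AV1 uses the smallest tile grid whose tiles fit within the 4096 x 2304 limit quoted in the text, that SFE then splits every partition horizontally into two or three strips, and that the dimensions divide evenly (true for 4K and 8K). The names `sfe_partitions` and `Partition` are hypothetical.

```python
from dataclasses import dataclass
from math import ceil

# AV1 maximum tile size as quoted in the text (width x height, in pixels).
AV1_MAX_TILE_W = 4096
AV1_MAX_TILE_H = 2304


@dataclass(frozen=True)
class Partition:
    count: int   # number of partial frames (slices for HEVC, tiles for AV1)
    width: int   # width of each partial frame, in pixels
    height: int  # height of each partial frame, in pixels


def sfe_partitions(codec: str, width: int, height: int, ways: int) -> Partition:
    """Hypothetical model of the partial frames for a codec and SFE mode.

    ways: 1 = no SFE, 2 = two-way SFE, 3 = three-way SFE.
    Assumes width and height divide evenly, as they do for 4K and 8K.
    """
    if ways not in (1, 2, 3):
        raise ValueError("ways must be 1, 2, or 3")
    cols, rows = 1, 1
    if codec == "AV1":
        # Smallest tile grid whose tiles fit within the maximum tile size.
        cols = ceil(width / AV1_MAX_TILE_W)
        rows = ceil(height / AV1_MAX_TILE_H)
    elif codec != "HEVC":
        raise ValueError("this model covers HEVC and AV1 only")
    rows *= ways  # SFE splits each partition horizontally into `ways` strips
    return Partition(cols * rows, width // cols, height // rows)


if __name__ == "__main__":
    # Print one row per codec and resolution, mirroring Table 1.
    for codec in ("HEVC", "AV1"):
        for w, h in ((3840, 2160), (7680, 4320)):
            cells = [sfe_partitions(codec, w, h, n) for n in (1, 2, 3)]
            print(f"{codec:4} {w}x{h}: " +
                  "  ".join(f"{p.count} x {p.width}x{p.height}" for p in cells))
```

For 8K AV1, 7680 / 4096 and 4320 / 2304 both round up to 2, which yields the 2 x 2 grid of 3840 x 2160 tiles; multiplying the rows by the SFE factor gives the 8 and 12 tiles listed above.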
Enabling split-frame encoding
Video Codec SDK 12.1 adds NV_ENC_SPLIT_ENCODE_MODE to the NVENCODE API header. This enumeration controls SFE, as shown in Table 2, and makes it straightforward to configure SFE in either implicit or explicit mode. NV_ENC_SPLIT_AUTO_MODE and NV_ENC_SPLIT_AUTO_FORCED_MODE select the implicit SFE mode. To learn more, see Improving Video Quality and Performance with AV1 and NVIDIA Ada Lovelace Architecture.
The remaining options configure SFE explicitly: they force SFE to be disabled, two-way, or three-way. Forcing two-way or three-way SFE requires an NVIDIA GPU with at least two or three NVENC engines, respectively.
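| NV_ENC_SPLIT_ENCODE_MODE | SFE type | Description |
|--------------------------|----------|-------------|
| NV_ENC_SPLIT_AUTO_MODE (0) | Auto mode (default) | Two-way SFE is triggered implicitly based on the input video resolution and encoding parameters |
| NV_ENC_SPLIT_AUTO_FORCED_MODE (1) | Forced auto mode | Two-way SFE is triggered implicitly based on the input video resolution and encoding parameters |
| NV_ENC_SPLIT_TWO_FORCED_MODE (2) | Forced two-way SFE | The respective SFE configuration is used regardless of the input video and encoding parameters |
| NV_ENC_SPLIT_THREE_FORCED_MODE (3) | Forced three-way SFE | The respective SFE configuration is used regardless of the input video and encoding parameters |
| NV_ENC_SPLIT_DISABLE_MODE (15) | Forced no SFE | The respective SFE configuration is used regardless of the input video and encoding parameters |

Table 2. Selected SFE type and description per NV_ENC_SPLIT_ENCODE_MODE option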
The AppEncMultiInstance encoding sample in the latest Video Codec SDK also shows how to add explicit SFE control to an application.
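The following sketch shows one way an application might map a requested number of NVENCs to an NV_ENC_SPLIT_ENCODE_MODE value. It mirrors the enum values from Table 2 in a local `SplitEncodeMode` class so that it runs without the SDK; a real C or C++ application would use the enum from nvEncodeAPI.h directly. The helper `choose_split_mode` and its clamping policy are hypothetical, and the NVENC count is supplied by the caller (two for the RTX 4090 and 4080, three for the RTX 6000 Ada Generation and L40, per the text). In SDK 12.1 the chosen value appears to be set through the `splitEncodeMode` field of `NV_ENC_INITIALIZE_PARAMS`; confirm this against your copy of the header and the AppEncMultiInstance sample.

```python
from enum import IntEnum
from typing import Optional


class SplitEncodeMode(IntEnum):
    """Local mirror of the NV_ENC_SPLIT_ENCODE_MODE values in Table 2.

    Illustrative only: real applications use the enum from nvEncodeAPI.h.
    """
    AUTO = 0          # NV_ENC_SPLIT_AUTO_MODE (default, implicit)
    AUTO_FORCED = 1   # NV_ENC_SPLIT_AUTO_FORCED_MODE (implicit)
    TWO_FORCED = 2    # NV_ENC_SPLIT_TWO_FORCED_MODE
    THREE_FORCED = 3  # NV_ENC_SPLIT_THREE_FORCED_MODE
    DISABLE = 15      # NV_ENC_SPLIT_DISABLE_MODE


def choose_split_mode(requested_ways: Optional[int],
                      nvenc_count: int) -> SplitEncodeMode:
    """Hypothetical policy for picking an SFE mode.

    requested_ways=None leaves the decision to the driver (auto mode);
    1 disables SFE; 2 or 3 forces that many NVENCs, clamped to the number
    of NVENC engines the GPU actually has.
    """
    if nvenc_count < 1:
        raise ValueError("nvenc_count must be at least 1")
    if requested_ways is None:
        return SplitEncodeMode.AUTO
    if requested_ways not in (1, 2, 3):
        raise ValueError("requested_ways must be None, 1, 2, or 3")
    ways = min(requested_ways, nvenc_count)  # never force more NVENCs than exist
    return {
        1: SplitEncodeMode.DISABLE,
        2: SplitEncodeMode.TWO_FORCED,
        3: SplitEncodeMode.THREE_FORCED,
    }[ways]
```

For example, requesting three-way SFE on a GPU with two NVENCs yields `SplitEncodeMode.TWO_FORCED` (value 2) under this policy. Clamping is a design choice; an application that must guarantee a specific split could raise an error instead.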
Performance and compression efficiency benchmarking
Table 3 lists the configurations and 8K input videos that were tested.
| Parameter | Benchmarking configuration |
|-----------|----------------------------|
| GPU | NVIDIA RTX 6000 Ada Generation (3 NVENCs) |
| Input videos | 7 videos (4 gaming and 3 natural) |
| Encoders | HEVC and AV1 |
| Presets | P1 (fastest), P4 (medium), and P7 (slowest) |
| Tuning information | Low latency (LL) and high quality (HQ) |
| Bitrates | 15, 20, 60, 150, and 250 Mbps |

Table 3. Benchmarking configuration summary
Two types of benchmarks were performed:
Transcoding performance: Transcoding was used to minimize the influence of system bottlenecks (file I/O and memory copies between CPU and GPU). To prepare the transcoding input, the original 8K videos were pre-encoded at very high bitrates. During transcoding, NVDEC decodes the video, which is then encoded by one, two, or three NVENCs when no SFE, two-way SFE, or three-way SFE is used, respectively. Figures 4 and 5 show the performance results for HEVC and AV1, respectively.
Compression efficiency penalty: Splitting the encoding work across several NVENCs introduces a compression efficiency penalty, because each frame is encoded as several independent partitions. To measure this penalty, BD-RATE was computed across several benchmark configurations to compare no SFE, two-way SFE, and three-way SFE. This metric gives the average bitrate increase needed to reach the same objective quality, which was measured with PSNR in these benchmarks. Figures 6 and 7 show the compression efficiency penalty for HEVC and AV1, respectively.
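For readers who want to reproduce this kind of measurement, the following is a minimal sketch of the classic Bjøntegaard delta rate: fit a cubic polynomial of log(bitrate) as a function of PSNR to each rate-quality curve, integrate both fits over their overlapping PSNR range, and convert the average log difference into a percentage. The post does not state which BD-RATE implementation or variant was used, so treat this as an illustration of the metric rather than the benchmark's exact tooling. The function names are hypothetical, and the code uses only the Python standard library.

```python
import math
from typing import List, Sequence, Tuple


def _fit_cubic(x: Sequence[float], y: Sequence[float], x0: float) -> List[float]:
    """Least-squares cubic fit y ~ c0 + c1*u + c2*u^2 + c3*u^3, with u = x - x0.

    Solves the 4x4 normal equations by Gaussian elimination with partial
    pivoting; centering on x0 keeps them well conditioned.
    """
    if len(x) < 4:
        raise ValueError("need at least 4 rate-quality points")
    a = [[0.0] * 5 for _ in range(4)]  # augmented matrix [A | b]
    for xi, yi in zip(x, y):
        pw = [(xi - x0) ** k for k in range(7)]
        for r in range(4):
            for col in range(4):
                a[r][col] += pw[r + col]
            a[r][4] += pw[r] * yi
    for p in range(4):
        best = max(range(p, 4), key=lambda r: abs(a[r][p]))
        a[p], a[best] = a[best], a[p]
        for r in range(p + 1, 4):
            f = a[r][p] / a[p][p]
            for col in range(p, 5):
                a[r][col] -= f * a[p][col]
    c = [0.0] * 4
    for r in range(3, -1, -1):  # back substitution
        s = a[r][4] - sum(a[r][col] * c[col] for col in range(r + 1, 4))
        c[r] = s / a[r][r]
    return c


def _integrate(c: Sequence[float], x0: float, lo: float, hi: float) -> float:
    """Definite integral of the centered cubic from x = lo to x = hi."""
    return sum(ck * ((hi - x0) ** (k + 1) - (lo - x0) ** (k + 1)) / (k + 1)
               for k, ck in enumerate(c))


def bd_rate(anchor: Sequence[Tuple[float, float]],
            test: Sequence[Tuple[float, float]]) -> float:
    """Bjontegaard delta rate, in percent, of `test` relative to `anchor`.

    Each curve is a sequence of (bitrate, psnr) points. A positive result is
    the average extra bitrate `test` needs to reach the same PSNR.
    """
    fits = []
    for points in (anchor, test):
        rates = [r for r, _ in points]
        psnrs = [q for _, q in points]
        if min(rates) <= 0:
            raise ValueError("bitrates must be positive")
        x0 = sum(psnrs) / len(psnrs)
        c = _fit_cubic(psnrs, [math.log(r) for r in rates], x0)
        fits.append((c, x0, min(psnrs), max(psnrs)))
    (ca, xa, lo_a, hi_a), (ct, xt, lo_t, hi_t) = fits
    lo, hi = max(lo_a, lo_t), min(hi_a, hi_t)
    if hi <= lo:
        raise ValueError("curves do not overlap in PSNR")
    avg_log_diff = (_integrate(ct, xt, lo, hi) - _integrate(ca, xa, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0
```

In a setup like Table 3, each curve would hold one (bitrate, PSNR) point per tested bitrate, with no SFE as the anchor and two-way or three-way SFE as the test curve.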
Figure 4. Average performance benchmarking results for 8K transcoding using HEVC
Figure 5. Average performance benchmarking results for 8K transcoding using AV1
With two-way SFE, expect an average performance scaling of about 1.8x for both HEVC and AV1. Three-way SFE achieves a performance scaling of up to 2.95x for HEVC and 2.31x for AV1. In practice, this makes it possible to encode 8K60 video on the NVIDIA RTX 6000 Ada Generation with both HEVC and AV1, using either LL or HQ tuning information, at the medium preset (P4).
Because one to three NVENCs are fed by a single NVDEC, NVDEC can become the bottleneck when transcoding 8K. For this reason, with the fastest preset (P1), throughput plateaus at about 120 FPS on average, which is the average maximum throughput of a single NVDEC at 8K.
Scaling improves whenever NVDEC is not the bottleneck. This is the case for the slower presets, P4 and P7, which scale much better than P1.
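The NVDEC ceiling can be reasoned about with a simple bottleneck model: a decode-then-encode pipeline runs no faster than its slowest stage, so SFE speedups only show up end to end while the encoder is slower than NVDEC. The sketch below assumes exactly that simplification (it ignores other overheads) and uses the ~120 FPS single-NVDEC figure from the text as the default decode limit; the function names are hypothetical, and any single-NVENC FPS you pass in is your own measurement, not a value from the benchmarks.

```python
# Average single-NVDEC throughput at 8K, as reported in the text.
NVDEC_8K_FPS = 120.0


def transcode_fps(encode_fps: float, decode_fps: float = NVDEC_8K_FPS) -> float:
    """A decode -> encode pipeline runs no faster than its slowest stage."""
    if encode_fps <= 0 or decode_fps <= 0:
        raise ValueError("throughputs must be positive")
    return min(encode_fps, decode_fps)


def observed_scaling(single_nvenc_fps: float, sfe_scaling: float,
                     decode_fps: float = NVDEC_8K_FPS) -> float:
    """End-to-end speedup when SFE multiplies encoder throughput.

    sfe_scaling is the encoder-only speedup (for example, 1.8 for two-way SFE).
    """
    base = transcode_fps(single_nvenc_fps, decode_fps)
    boosted = transcode_fps(single_nvenc_fps * sfe_scaling, decode_fps)
    return boosted / base
```

Under this model, a hypothetical encoder that already reaches 60 FPS on one NVENC would see at most a 2x end-to-end gain from three-way SFE, because 120 FPS of decoding caps the pipeline. A slower preset starts further below the cap and can realize more of the encoder speedup, which matches the better scaling observed for P4 and P7.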
Figure 6. Average compression efficiency penalty results for 8K encoding using HEVC
Figure 7. Average compression efficiency penalty results for 8K encoding using AV1
In general, the compression efficiency penalty, measured as BD-RATE (PSNR), is not expected to exceed 2% for two-way SFE or 4% for three-way SFE. The penalty is more noticeable with HQ tuning information than with LL. In these benchmarks, it is also slightly larger with HEVC than with AV1.
Although this compression efficiency penalty is small relative to the performance gain, it is up to you to decide whether your use case benefits more from performance or from compression efficiency. Either way, the NVENCODE API provides full control over SFE not only at 8K but also at lower resolutions.
Split-frame encoding (SFE) is a breakthrough feature that unlocks video encoding at 8K60 and beyond. It lets users harness multiple NVENCs within NVIDIA Ada Lovelace architecture GPUs to encode a single video sequence. This post has explained the performance advantages of two-way SFE (using two NVENCs) and three-way SFE (using three NVENCs). The latest NVIDIA Video Codec SDK provides explicit control over SFE, so you can choose the configuration that best fits your use case.