Nvidia reveals their Turing graphics architecture – Doubling Down on Ray Tracing

At their SIGGRAPH 2018 keynote, Nvidia’s CEO, Jensen Huang, officially unveiled the company’s Turing GPU architecture, which will move beyond both Pascal and Volta to deliver enhanced performance levels and all-new hardware-level features. 

There is a lot to talk about when it comes to Turing, so we will split things up into a few sections, starting with Turing’s core design.

Turing – The Core

Looking at the diagram above, most of you will notice that Turing is about a lot more than Shader/Compute performance, with around half of Turing’s die size being dedicated to Tensor Cores and Nvidia’s new RT Core components. 

Tensor Cores are what Nvidia uses to accelerate AI performance, with Turing being the first time that Nvidia has offered the feature outside of their premium Volta GV100 core design. RT cores, as the name suggests, are designed explicitly for Ray Tracing workloads, with Nvidia touting performance levels of 10 Giga Rays per second when using their large 754 mm² Turing graphics chip. Nvidia says that their fastest Turing parts can offer up to a 25x Ray Tracing performance boost over unaccelerated Pascal GPUs, showcasing the benefits of specialised hardware.

Nvidia compares their 754 mm² die to their Pascal GP102 die, which was used to make the company’s Titan Xp and GTX 1080 Ti graphics cards. While the move from 13 TFLOPS to 16 TFLOPS is nice to see here, it is clear that Turing is designed to take advantage of its other hardware features, with most of its performance gains coming from changes outside of generalised shader compute.
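To put the shader compute figures in perspective, the short sketch below works out the relative uplift from the two TFLOPS numbers quoted above. The helper name `percent_gain` is our own and is purely illustrative; only the 13 and 16 TFLOPS inputs come from Nvidia's comparison.

```python
def percent_gain(old: float, new: float) -> float:
    """Relative increase from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


pascal_tflops = 13.0  # GP102 figure quoted in the comparison above
turing_tflops = 16.0  # large Turing die figure quoted above

gain = percent_gain(pascal_tflops, turing_tflops)
print(f"Shader compute uplift: {gain:.1f}%")
```

An uplift of roughly a quarter is modest next to the multiples Nvidia quotes for Ray Tracing, which is why the dedicated hardware gets most of the attention.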

Turing SM – Is it better than Pascal?

Nvidia lists several changes in their Turing SMs when compared to Pascal, confirming that Turing will support Variable Rate Shading, a feature that appears to be very similar to AMD’s Rapid Packed Math technology. 

In short, it looks like Turing will be able to conduct FP32 and FP16 (and perhaps lower precision) calculations at the same time, with FP16 calculations offering a 2x boost in throughput. Using lower precision math will allow developers to glean performance benefits, with Variable Rate Shading enabling more complicated, or less complicated, math to be utilised whenever required. 
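As a rough illustration of where a 2x figure for FP16 can come from, the sketch below shows that two half-precision values fit in the same 32 bits as one single-precision value. This only demonstrates the storage argument using Python's standard `struct` module; it is not a description of how Turing's hardware schedules work, and the variable names are our own.

```python
import struct

# One FP32 value occupies 4 bytes (32 bits).
one_fp32 = struct.pack("<f", 1.5)

# Two FP16 values ("e" is the half-precision format code) occupy the same 4 bytes.
two_fp16 = struct.pack("<2e", 1.5, -2.25)

print(len(one_fp32), len(two_fp16))

# Both halves can be recovered from the single 32-bit word.
first, second = struct.unpack("<2e", two_fp16)
print(first, second)
```

If a 32-bit lane can process both packed halves in one operation, twice as many FP16 results come out per cycle, at the cost of reduced precision and range.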

AMD uses Rapid Packed Math to offer increased performance levels in Far Cry 5 and Wolfenstein II: The New Colossus, showcasing the benefits of Variable Rate Shading. Nvidia jumping on this train will help them in both the gaming and workstation markets, especially after developers realise that both GPU makers support a form of Rapid Packed Math.

Nvidia also references a new Unified L1 Cache architecture, stating that the cache now offers increased levels of bandwidth, though at this time it is unknown how much of an impact this design tweak will have.

Display and Video

On the display side, Nvidia states that their new Quadro RTX Turing graphics cards will support the VirtualLink display output standard, which enables single-cable VR headset connectivity. VirtualLink combines USB data with four high-speed HBR3 DisplayPort lanes and offers up to 27W of power over a single USB Type-C cable. Turing is the first graphics architecture to support this standard.

Turing is also said to offer a 25% increase in bitrate while also adding support for 8K HEVC encoding. Nvidia has also made changes to their encoder, which reportedly allow the company to offer the same levels of quality as previous generation parts while using a bit rate that is 25% lower.

RT (Ray Tracing) Cores

Turing is Nvidia’s first graphics architecture to include RT cores, a feature which takes up a considerable amount of die space on Nvidia’s next-generation graphics cards. The cores are dedicated Ray Tracing hardware, designed to simulate light accurately and many times faster than running the same workloads on Nvidia’s CUDA cores.

GDDR6 memory

It is no secret that Turing graphics cards were built with GDDR6 memory in mind. After all, GDDR6 is made by third-party companies like Samsung, Micron and SK Hynix, all of which have been developing GDDR6 VRAM for quite some time.

GDDR6 memory can offer bandwidth levels of up to 16 Gbps per pin, two times that of most GDDR5 memory modules and 60% higher than the 10 Gbps GDDR5X memory that is used with most GTX 1080 graphics cards. This boost in memory bandwidth will allow Turing to be fed with enough data to complete modern and future gaming/development workloads, enabling higher resolution workflows.
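The per-pin figures above can be checked, and turned into a total bandwidth number, with a little arithmetic. In the sketch below, the 8 Gbps GDDR5 rate is simply what "two times" implies, and the 256-bit bus width is a hypothetical value chosen for illustration, not a confirmed Turing specification. The function names are our own.

```python
def speed_ratio(new_gbps: float, old_gbps: float) -> float:
    """How many times faster the new per-pin data rate is."""
    return new_gbps / old_gbps


def total_bandwidth_gb_s(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Total memory bandwidth in GB/s: per-pin rate times pin count, divided by 8 bits per byte."""
    return gbps_per_pin * bus_width_bits / 8


GDDR6_GBPS = 16.0
GDDR5X_GBPS = 10.0
GDDR5_GBPS = 8.0              # implied by the "two times" comparison above
HYPOTHETICAL_BUS_BITS = 256   # assumption for illustration only

vs_gddr5 = speed_ratio(GDDR6_GBPS, GDDR5_GBPS)
vs_gddr5x = speed_ratio(GDDR6_GBPS, GDDR5X_GBPS)
total = total_bandwidth_gb_s(GDDR6_GBPS, HYPOTHETICAL_BUS_BITS)

print(f"GDDR6 vs GDDR5:  {vs_gddr5:.1f}x")
print(f"GDDR6 vs GDDR5X: {vs_gddr5x:.1f}x")
print(f"Total on a {HYPOTHETICAL_BUS_BITS}-bit bus: {total:.0f} GB/s")
```

The same per-pin rate on a wider or narrower bus scales the total linearly, so final bandwidth figures will depend on the bus width Nvidia chooses for each card.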

Quadro RTX

So far, Nvidia has only announced professional-grade Turing graphics cards, the Quadro RTX series, focusing on Turing’s Ray Tracing capabilities. The company has announced their 10,000 USD Quadro RTX 8000, their 6,300 USD Quadro RTX 6000 and their 2,300 USD Quadro RTX 5000. Should we expect to see Turing-based gaming cards at Nvidia’s “GeForce Gaming Celebration”?

You can join the discussion on Nvidia’s Turing graphics architecture on the OC3D Forums.