Big changes are coming with AMD's RDNA 3 architecture, and it will give Navi 31 crazy TFLOPS

AMD's RDNA 3 flagship could be 23% more powerful than expected

Latest AMD rumours suggest crazy TFLOPS for their flagship Navi 31 RDNA 3 GPU

AMD's Radeon RX 7900XT could be a 92 TFLOPS beast

The AMD leaker @Greymon55 has some new information to share, and it appears to be good news for team Radeon. According to the leaker, AMD's Navi 31 GPU, the silicon behind AMD's rumoured Radeon RX 7900 XT, will offer users 92 TFLOPS of FP32 compute performance, a 22.7% boost over Greymon55's prior leak, which suggested 75 TFLOPS.

Why has @Greymon55 suggested that AMD's RDNA 3 flagship is getting such a huge performance increase? Simple: the graphics card's clock speeds are said to be much higher. While he previously expected clock speeds in the 2.4-2.5GHz range, he is now suggesting that the graphics card will clock at around 3.0GHz.
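The link between clock speed and TFLOPS can be checked with some quick arithmetic. The sketch below assumes the usual peak-throughput formula (stream processors x 2 FLOPs per clock x clock speed) and uses the rumoured 15,360 stream processor count for Navi 31. The function name and the exact clock values picked from the leaked ranges are our own assumptions, not confirmed specifications.

```python
# Peak FP32 throughput = stream processors x FLOPs per clock x clock speed.
# Assumption: each stream processor retires one fused multiply-add (2 FLOPs) per clock.
# The 15,360 stream processor figure is a rumour; the clocks are illustrative picks.

def fp32_tflops(stream_processors: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Return theoretical peak FP32 TFLOPS for the given shader count and clock."""
    return stream_processors * flops_per_clock * clock_ghz / 1000.0

old_estimate = fp32_tflops(15_360, 2.45)  # midpoint of the earlier 2.4-2.5GHz range
new_estimate = fp32_tflops(15_360, 3.0)   # the newer ~3.0GHz claim

print(f"{old_estimate:.1f} TFLOPS at 2.45GHz")
print(f"{new_estimate:.1f} TFLOPS at 3.00GHz")
print(f"uplift from clocks alone: {(new_estimate / old_estimate - 1) * 100:.1f}%")
```

Under these assumptions, the two clock ranges land close to the 75 TFLOPS and 92 TFLOPS figures from the leaks, which is consistent with clock speed being the main driver of the revised number.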

Greymon55 has speculated that AMD's RDNA 3 Navi 31 GPU will feature 120 Compute Units (CUs) across 60 Workgroup Processors (WGPs). Each WGP features two CUs, and rumour has it that AMD has made some big changes to their CUs with RDNA 3.

Right now, we can only speculate as to why Greymon55's information has changed regarding RDNA 3's expected FP32 TFLOPS. His prior guesses may simply have been inaccurate, AMD's Navi 31 silicon may be delivering higher clock speeds than expected, or AMD may be targeting a higher power limit than previously expected, enabling boosted clock rates. As always with hardware leaks, take everything you hear with a grain of salt.

Another leaker called @Kepler_L2 has looked at AMD's latest Linux drivers, which appear to reveal new information about AMD's RDNA 3 graphics architecture. The drivers suggest that AMD is increasing the number of SIMD32 units within each of their RDNA 3 CUs, doubling the SIMD32 resources that are available per CU/WGP.

It looks like AMD is dramatically increasing the resources that are available within a single RDNA 3 Compute Unit/Workgroup Processor, and it is likely that RDNA 3's per CU resources have been increased in other areas.

A tweet from @Greymon55 to Videocardz has stated that AMD's Radeon Navi 31 silicon features 120 Compute Units and 60 Workgroup Processors. Previously, he claimed that AMD's Navi 31 silicon featured 120 Workgroup Processors, which stemmed from the fact that his sources claimed that AMD's Navi 31 GPU featured 15,360 Stream Processors, and he assumed 64 Stream Processors per CU (like Vega, RDNA 1 and RDNA 2). 15,360 divided by 64 is 240. 240 Compute Units with two Compute Units per Workgroup Processor equals 120 Workgroup Processors.

The above calculations change when you assume that AMD has increased the number of Stream Processors in each RDNA 3 Compute Unit. If AMD has indeed doubled the resources within their RDNA 3 Compute Units, giving them 128 Stream Processors per CU, Navi 31's rumoured 15,360 Stream Processors will be divided into 120 Compute Units and 60 Workgroup Processors.
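The two layouts described above come from the same stream processor count divided in different ways. Here is a minimal sketch of that division, assuming two Compute Units per Workgroup Processor as stated in this article. The helper name `cu_layout` is hypothetical and used only for illustration.

```python
# Split a rumoured stream processor count into Compute Units (CUs) and
# Workgroup Processors (WGPs). Assumes 2 CUs per WGP, as described in the article.

def cu_layout(stream_processors: int, sp_per_cu: int, cus_per_wgp: int = 2) -> tuple[int, int]:
    """Return (compute_units, workgroup_processors) for a given CU width."""
    compute_units, remainder = divmod(stream_processors, sp_per_cu)
    if remainder:
        raise ValueError("stream processor count is not a multiple of the CU width")
    return compute_units, compute_units // cus_per_wgp

rdna2_style = cu_layout(15_360, 64)    # 64 Stream Processors per CU, like RDNA 2
doubled_cus = cu_layout(15_360, 128)   # rumoured 128 Stream Processors per CU

print("64 SPs per CU: ", rdna2_style)   # (240, 120)
print("128 SPs per CU:", doubled_cus)   # (120, 60)
```

The same 15,360 Stream Processors give 240 CUs and 120 WGPs with RDNA 2-style CUs, or 120 CUs and 60 WGPs if each CU is twice as wide, which matches the change in Greymon55's claims.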

What this means for RDNA 3

These changes explain why AMD's Navi 31 silicon reportedly delivers an insane 92 TFLOPS of FP32 compute performance, a near 4x increase in TFLOPS over AMD's Radeon RX 6900 XT. They also help to reconcile that huge TFLOPS figure with reports from other sources, which suggest that AMD's Navi 31 flagship will deliver a 90-130% rasterisation performance boost over today's RX 6900 XT.

With 120 alleged Compute Units, AMD's RDNA 3 Navi 31 flagship GPU will have 50% more Compute Units than today's Radeon RX 6900 XT (which has 80 CUs). If today's reports are correct, AMD is beefing up their individual CUs to deliver more performance than their last-generation counterparts and boosting their clock speeds to deliver additional performance benefits. Both of these changes are great news for AMD.
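To see where a near 4x TFLOPS increase could come from, the rumoured gains can be broken down into three factors: more CUs, wider CUs and higher clocks. The sketch below is a rough illustration only. The Navi 31 values are the rumoured figures from this article, while the RX 6900 XT's 2.25GHz boost clock comes from AMD's public specifications rather than from the leaks, and the dictionary layout is our own.

```python
# Break the rumoured TFLOPS gain into CU count, CU width and clock speed.
# Assumption: 2 FLOPs per stream processor per clock (one fused multiply-add).
RX_6900_XT = {"cus": 80, "sp_per_cu": 64, "clock_ghz": 2.25}       # boost clock from public specs
NAVI_31_RUMOUR = {"cus": 120, "sp_per_cu": 128, "clock_ghz": 3.0}  # unconfirmed leak figures

def fp32_tflops(gpu: dict) -> float:
    """Theoretical peak FP32 TFLOPS for a GPU described by CUs, CU width and clock."""
    return gpu["cus"] * gpu["sp_per_cu"] * 2 * gpu["clock_ghz"] / 1000.0

cu_gain = NAVI_31_RUMOUR["cus"] / RX_6900_XT["cus"]                 # more Compute Units
width_gain = NAVI_31_RUMOUR["sp_per_cu"] / RX_6900_XT["sp_per_cu"]  # wider Compute Units
clock_gain = NAVI_31_RUMOUR["clock_ghz"] / RX_6900_XT["clock_ghz"]  # higher clock speeds
total_gain = fp32_tflops(NAVI_31_RUMOUR) / fp32_tflops(RX_6900_XT)

print(f"CUs x{cu_gain:.2f}, width x{width_gain:.2f}, clock x{clock_gain:.2f}")
print(f"total x{total_gain:.2f}")
```

With these inputs, the 50% CU increase, the doubled CU width and the clock speed bump multiply together to roughly 4x, in line with the jump from around 23 TFLOPS to 92 TFLOPS.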

Does more TFLOPS mean more performance?

While a 4x increase in FP32 performance sounds insane, it is worth remembering that there is more to GPU design than TFLOPS. Just look at Nvidia's Ampere lineup.

Nvidia's Ampere GPU lineup increased the number of FP32 shading units within their Streaming Multiprocessors (SMs). With Turing, Nvidia's SMs had 64 FP32 shaders and 64 INT32 shaders per SM. With Ampere, Nvidia has 64 FP32 shaders and 64 mixed INT32/FP32 shaders per SM, giving each SM up to 128 FP32 shaders in total.

So what does that mean? It means that Nvidia lists their RTX 2070 as having 2304 FP32 shading units and their RTX 3060 as having 3584 FP32 shading units. Does the RTX 3060's on-paper shader increase of roughly 56% give Ampere a huge performance lead? No. In general, the RTX 3060 has similar gaming performance to an RTX 2070. Boosted FP32 performance does not guarantee boosted gaming performance, or at least the scaling isn't linear in this case.

AMD's alleged 4x increase in FP32 TFLOPS will be a big deal for some workloads, but not all workloads. Per Compute Unit (or in Nvidia's case, per SM) performance is what matters more here. Going back to the RTX 2070 and RTX 3060, which have 36 Turing SMs and 28 Ampere SMs respectively, we can see that Ampere is delivering a lot more per-SM performance than Turing. The same principle will apply to AMD's RDNA 3 architecture, if today's leaks are legitimate.
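The Turing and Ampere comparison can be put into numbers. The sketch below derives the SM counts from the listed shader counts, then estimates the per-SM gain under one simplifying assumption of ours: that the two cards deliver exactly equal gaming performance (the article only says "similar"). Treat the result as a ballpark figure, not a benchmark.

```python
# Compare RTX 2070 (Turing) and RTX 3060 (Ampere) per SM rather than per shader.
# Assumption: both cards have equal gaming performance, a simplification of "similar".
RTX_2070_SHADERS, TURING_FP32_PER_SM = 2304, 64
RTX_3060_SHADERS, AMPERE_FP32_PER_SM = 3584, 128

turing_sms = RTX_2070_SHADERS // TURING_FP32_PER_SM   # 36 SMs
ampere_sms = RTX_3060_SHADERS // AMPERE_FP32_PER_SM   # 28 SMs

# On-paper shader increase from the RTX 2070 to the RTX 3060.
shader_increase = RTX_3060_SHADERS / RTX_2070_SHADERS - 1

# If total performance is equal, each Ampere SM does the work of this many Turing SMs.
per_sm_gain = turing_sms / ampere_sms

print(f"SMs: {turing_sms} Turing vs {ampere_sms} Ampere")
print(f"on-paper shader increase: {shader_increase * 100:.1f}%")
print(f"estimated per-SM gain: x{per_sm_gain:.2f}")
```

Under that assumption, each Ampere SM delivers a little under 1.3x the performance of a Turing SM despite having twice the FP32 shaders, which shows why shader counts alone are a poor predictor of gaming performance.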

With today's reports in mind, we expect AMD to achieve much higher performance levels per Compute Unit than they did with RDNA 2. FP32 performance should see a huge jump, but that doesn't necessarily mean that gaming performance will increase by the same amount. AMD is making big changes to their RDNA architecture with RDNA 3, and that is great news for consumers. However, right now we can only speculate what the overall impact of these changes will be.

You can join the discussion on AMD's latest RDNA 3 rumours on the OC3D Forums.  

Most Recent Comments

30-04-2022, 20:08:27

Also hearing above 3GHz clock speeds on some of the RDNA3 chips.

30-04-2022, 20:29:44

IF this is true it'll be interesting to see how this TFLOP increase will translate to real world performance over RDNA2.

30-04-2022, 20:39:34

Originally Posted by Dicehunter:
"IF this is true it'll be interesting to see how this TFLOP increase will translate to real world performance over RDNA2."
It will be very workload dependent. Just like Ampere/Turing over Pascal. That said, this will be great for AMD in workloads outside of gaming, where the TFLOPS really matter.

With architectural changes like this, it is more important to think of per CU or per SM performance than shader counts and stream processor counts, as I highlighted in the article with Turing vs Ampere.

01-05-2022, 09:07:27

So the rumor is that this thing will draw 450W. That would be insanely power efficient. It sounds too good to be true, honestly.

01-05-2022, 19:42:50

Yeah for gaming TFLOPs don't really mean anything. It's really more about how efficient the hardware is at managing its resources so it's able to be as fast as possible.

Now for science and math workloads that's obviously an entirely different matter and TFLOPs actually mean something.

I'm excited for what they bring. I hope it releases alongside a bunch of new technologies like Nvidia has, but ones that are actually competitive or innovative.
