RPCS3 Developer explains why AVX-512 is important for PS3 emulation and more
AVX-512 is important, and not because of its increased Vector Width
Published: 17th June 2022 | Source: Whatcookie
No, AVX-512 is not a useless instruction set - Claims PS3 Emulation Guru
The AVX-512 instruction set has a bad reputation, and most of that reputation comes from Intel's spotty support for it across its processors. Beyond that, the instruction set's fragmented feature set has made it unappealing to many developers, as optimised AVX-512 code only runs on processors that support it, and not all x86 processors do.
One person who believes AVX-512 is useful is Whatcookie, one of the developers behind the popular RPCS3 emulator. The RPCS3 emulator has successfully utilised the AVX-512 instruction set to dramatically increase the speed of PlayStation 3 emulation, allowing PS3 classics to run at higher framerates on supported PCs.
Whatcookie wrote a detailed blog post covering the topic of AVX-512, and why it is important for PlayStation 3 emulation. This blog post is available to read here, but for the sake of simplicity, the advantages of AVX-512 are as follows. AVX-512 supports an increased register width, new instructions, and mask registers.
For RPCS3, the majority of AVX-512's benefits come from the instruction set's new instructions, which can be used to accelerate the emulator by allowing several aspects of PS3 emulation to be completed in fewer clock cycles and with less latency. Several AVX-512 instructions can be mapped to specific PS3 functions to accelerate specific workloads, and that can have a dramatic impact on the performance of some games.
(From left to right: SSE2, SSE4.1, AVX2/FMA, and Icelake tier AVX-512)
The image above showcases God of War 3 running through RPCS3 using SSE2, SSE4.1, AVX2/FMA, and AVX-512 instructions. Here, we can see how newer instruction sets enable higher performance levels with RPCS3. Moving from AVX2/FMA instructions to AVX-512 enables a 23% performance boost, which dramatically improves God of War's framerate.
While God of War's 200+ FPS framerates seem a little insane, many PlayStation 3 games run poorly using RPCS3. The benefits of AVX-512 are more impactful with these titles, as it makes framerates both higher and more stable. For modern AVX-512 CPUs, the instruction set also helps to decrease power draw by completing calculations more efficiently.
The performance when targeting SSE2 is absolutely terrible, likely due to the lack of the pshufb instruction from SSSE3. pshufb is invaluable for emulating the shufb instruction, and it’s also essential for byteswapping vectors, something that’s necessary since the PS3 is a big endian system, while x86 is little endian.
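To illustrate why a byte shuffle matters here, the following is a minimal scalar sketch of what the 128-bit pshufb instruction does, and how a fixed control vector turns it into a byteswap of four 32-bit words. The function names and the control constant are illustrative assumptions and are not taken from RPCS3's source.

```python
import struct


def pshufb(src: bytes, control: bytes) -> bytes:
    """Scalar model of 128-bit pshufb (illustrative, not RPCS3 code).

    Each output byte i is src[control[i] & 0x0F], or zero when bit 7 of
    control[i] is set.
    """
    if len(src) != 16 or len(control) != 16:
        raise ValueError("pshufb operates on 16-byte vectors")
    return bytes(0 if c & 0x80 else src[c & 0x0F] for c in control)


# Hypothetical control vector: reverse the bytes inside each 32-bit lane.
BSWAP32_CONTROL = bytes([3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12])


def byteswap_words(vec: bytes) -> bytes:
    """Convert four big-endian 32-bit words to little-endian with one shuffle."""
    return pshufb(vec, BSWAP32_CONTROL)


if __name__ == "__main__":
    big_endian = struct.pack(">4I", 1, 2, 3, 4)   # as a PS3 would store them
    swapped = byteswap_words(big_endian)
    print(struct.unpack("<4I", swapped))          # read back as little-endian
```

On real hardware the whole vector is handled by a single instruction, whereas an SSE2-only target has to build the same result from several shifts, masks and ORs.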
The SSE4.1 target achieves an average of 160 FPS, while the AVX2/FMA target achieves an average of 190 FPS. This is an 18% improvement over the SSE4.1 target. AVX2 doesn’t include many new instructions over SSE4.1, but it does include a new 3 operand form for instructions, which eliminates many register to register mov instructions. Crucially, all CPUs that support AVX2 also support FMA instructions. FMA instructions aren’t just faster than a chain of multiply + add instructions, but can also produce different results due to not rounding to single precision between the multiply and the add. Accurately emulating this without FMA instructions adds some overhead, and so native FMA operations help out quite a bit.
The Icelake tier AVX-512 target hits a ludicrous 235 FPS average, 23% faster than the AVX2/FMA target. The sheer number of new instructions added in AVX-512 is so large that quite a number of them end up being useful for RPCS3. Unlike AVX2, which was mostly a straightforward extension of existing SSE instructions to 256 bits, AVX-512 includes a huge number of new features which are very useful for SIMD programming, even at lower bit widths. However, since Intel chose to market AVX-512 with the -512 moniker, people who aren’t familiar with the instruction set usually fixate on the 512 bit vector aspect of the instruction set.
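The rounding difference described above can be reproduced with a small sketch. It models single-precision arithmetic in Python by rounding through a 4-byte float, and compares a multiply followed by an add against a fused operation that rounds only once. The helper names are hypothetical, and the fused model rounds through double precision first, so it is an approximation that can differ from a true FMA in rare cases; it is exact for the inputs shown.

```python
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """Multiply, round to single precision, then add and round again."""
    product = to_f32(a * b)       # intermediate rounding happens here
    return to_f32(product + c)


def mul_add_fused(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once (simplified FMA model)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return to_f32(float(exact))   # assumption: double rounding is ignored


if __name__ == "__main__":
    a = to_f32(1.0 + 2.0 ** -12)
    c = to_f32(-(1.0 + 2.0 ** -11))
    print(mul_add_unfused(a, a, c))   # the low bits of a*a were rounded away
    print(mul_add_fused(a, a, c))     # the low bits survive until the end
```

With these inputs the unfused version returns 0.0 while the fused version returns 2^-24, which is the kind of mismatch an emulator has to reproduce by other means when the host has no FMA instructions.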
AVX-512 could make Valve's next generation Steam Deck an excellent emulation machine
With AMD's Zen 4 processors, AVX-512 support will be coming to AMD's mainstream processors. This will dramatically increase the number of PCs that could benefit from AVX-512 acceleration. Rumour has it that AMD could be developing a next-generation Steam Deck SOC for Valve that could use Zen 4 CPU cores and an RDNA 3 graphics component. This SOC could support AVX-512, making Valve's next gaming system a portable PS3 emulation monster.
One aspect of AVX-512 support that is useful for gaming laptops is that AVX-512 allows higher framerates to be achieved with less CPU grunt. This can decrease laptop power consumption, increasing battery life or giving more usable power to other areas of the system. Below is what Whatcookie observed with a mobile Tiger Lake system.
The recently announced Zen 4 will support AVX-512 instructions as well. Since it’s likely that the successor to devices such as the Steam Deck will use a Zen 4 based CPU, it’s possible the number of people wanting to play games on a low end device that supports AVX-512 will increase significantly. Even when the target framerate is already achievable without AVX-512, enabling AVX-512 optimizations could improve battery life, or provide more TDP to the GPU which could enable gameplay at higher resolutions. I’ve personally already observed this phenomenon today on my Tigerlake based laptop. When targeting AVX-512 the CPU cores use 1W less, and the GPU uses 1W more, enabling higher framerates in RPCS3.
AVX-512 could improve other emulators
While AVX-512 has obvious advantages for PlayStation 3 emulation, it can also be used to accelerate the ARM recompiler used by emulators like Citra and Yuzu. On top of that, AVX-512 could be used to accelerate aspects of PlayStation 2 emulation with PCSX2. Now that AVX-512 support is becoming more commonplace, the feature has a chance to be utilised more routinely by developers, with emulation being one clear use case.
Outside of RPCS3, AVX-512 isn’t widely used by emulators. However, the Arm recompiler dynarmic can take advantage of many AVX-512 instructions as well. Dynarmic is used by the 3DS emulator Citra, the Nintendo Switch emulator Yuzu as well as the PS Vita emulator Vita3K. I’m not aware of any benchmarks comparing AVX2 targets vs AVX-512 for any of these emulators, but I would assume that the gap is smaller than it is with RPCS3, since Arm cores support both vector instructions as well as scalar instructions. Since the average game will spend more time executing scalar instructions than vector instructions, the potential gain from vector optimizations isn’t huge. For RPCS3 a large reason why these vector optimizations are so effective is because the SPUs only support operations on vector registers, and so any time spent emulating the SPUs is spent executing vector instructions.
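The reasoning above is an instance of Amdahl's law: the overall gain depends on how much of the run time is spent in the code that gets faster. The sketch below uses made-up numbers purely for illustration; the 1.5x vector speedup and the 90% and 20% vector fractions are assumptions, not measurements from any emulator.

```python
def overall_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Amdahl's law: total speedup when only the vector share gets faster.

    vector_fraction: share of run time spent in vector code (0.0 to 1.0).
    vector_speedup:  how much faster that share becomes.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if vector_speedup <= 0.0:
        raise ValueError("vector_speedup must be positive")
    scalar_time = 1.0 - vector_fraction
    vector_time = vector_fraction / vector_speedup
    return 1.0 / (scalar_time + vector_time)


if __name__ == "__main__":
    # Hypothetical workloads: one dominated by vector code, one mostly scalar.
    print(round(overall_speedup(0.9, 1.5), 2))   # vector-heavy, SPU-like case
    print(round(overall_speedup(0.2, 1.5), 2))   # mostly scalar Arm-like case
```

Under these assumed figures, the same 1.5x vector speedup yields roughly 1.43x overall for the vector-heavy workload but only about 1.07x for the mostly scalar one.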
One emulator that would likely benefit greatly from AVX-512 optimizations is PCSX2. Since the PS2’s VUs inspired much of the behavior and design for the SPUs, many of the optimizations which apply to RPCS3 should also apply to PCSX2. In particular, vrangeps should be helpful for improving their clamping code.
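As a rough illustration of why vrangeps suits clamping, the sketch below models one of its modes on a single value: take the smaller of the two magnitudes and keep the sign of the first operand. It assumes the clamp in question limits values to the largest finite single-precision magnitude, which is not stated in the source, and it ignores the instruction's NaN handling. The function names are hypothetical and are not taken from PCSX2 or RPCS3.

```python
import math

# Largest finite single-precision value, (2 - 2^-23) * 2^127.
F32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127


def range_abs_min_keep_sign(a: float, limit: float) -> float:
    """Simplified scalar model of vrangeps in 'absolute minimum' mode.

    Returns the smaller of |a| and |limit| with the sign of a. NaN inputs
    are not modelled.
    """
    return math.copysign(min(abs(a), abs(limit)), a)


def clamp_to_finite(x: float) -> float:
    """Clamp x into [-F32_MAX, +F32_MAX] while preserving its sign."""
    return range_abs_min_keep_sign(x, F32_MAX)


if __name__ == "__main__":
    for value in (1.5, -2.0, float("inf"), float("-inf")):
        print(value, "->", clamp_to_finite(value))
```

Without such an instruction, the same clamp typically takes a separate minimum and maximum, or explicit sign masking, per vector.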
While emulation alone will not make AVX-512 seem any less useless to its detractors, it remains a useful tool that can be used to accelerate a wide range of workloads. Sadly, Intel's AVX-512 support remains as choppy as ever, as Intel has been unable to get the instruction set working properly with their new hybrid architecture processors. Intel's 12th generation Alder Lake and 13th generation Raptor Lake processors both lack official support for AVX-512, a feature that was available with the company's 11th generation Rocket Lake processors, and mobile systems based on Ice Lake and Tiger Lake CPUs.
Like always, Intel is one of the main reasons why AVX-512 cannot be widely supported by developers, as their inability to bring the instruction set everywhere has consistently limited its appeal. Perhaps things will change now that AMD is on the scene with Zen 4.