Oak Ridge's AMD-powered Frontier Exascale Supercomputer is reportedly facing hardware difficulties

Computing at this scale is a difficult task

The AMD-powered Frontier supercomputer reportedly can't run for a day without failures

Supercomputers are always challenging to operate. They consume enormous amounts of power and require a huge number of processors and graphics cards to work in tandem without issue. Like most new supercomputers, Oak Ridge's AMD-powered Frontier supercomputer is facing early issues, with the system reportedly unable to run for a full day without a failure.

According to Inside HPC, Oak Ridge's Frontier has faced numerous issues as the system is prepared for "full user operations" in January 2023. Oak Ridge is currently optimistic about meeting this deadline, even with the huge hardware demands of exascale computing.

Early issues for Oak Ridge's Frontier system include problems with the HPE Cray Slingshot fabric that is used to interconnect the systems within Frontier. Other issues centre on the system's AMD Instinct GPU accelerators, which provide a large share of Frontier's computational horsepower.

Teething issues are expected with systems of this scale. Like all large projects, a lot of effort is needed to make sure that the entire system works together effectively. That said, Oak Ridge's efforts have been focused on improving Frontier's mean time to failure (MTTF), which is currently measured in hours and not days.
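To see why a machine of this size can fail every few hours even when each part is reliable, here is a minimal Python sketch of the arithmetic. It assumes independent components with constant failure rates, in which case the system's mean time to failure is roughly the per-component figure divided by the number of components. The function name and every number below are hypothetical and chosen for illustration only; they are not Frontier's real figures.

```python
def system_mttf_hours(component_mttf_hours: float, component_count: int) -> float:
    """Mean time to failure of a system that fails when any one component fails.

    Assumes independent components with constant (exponential) failure rates,
    so the individual failure rates simply add up.
    """
    if component_mttf_hours <= 0 or component_count <= 0:
        raise ValueError("both arguments must be positive")
    return component_mttf_hours / component_count


# Hypothetical figures for illustration only, not Frontier's real numbers.
HOURS_PER_YEAR = 24 * 365
component_mttf = 50 * HOURS_PER_YEAR  # each part fails once every 50 years on average
component_count = 50_000              # parts whose failure would interrupt a job

print(f"{system_mttf_hours(component_mttf, component_count):.2f} hours")  # prints "8.76 hours"
```

Under these made-up assumptions, parts that each last decades still add up to a system-wide failure every few hours, which is why early tuning work tends to focus on this number.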

Beyond hardware issues, Oak Ridge has been working to ensure that large jobs can make use of the entirety of Frontier's resources, which is a challenging task given the system's scale. Maximising performance matters for a system of this size, as more computational throughput lets researchers complete their simulations faster and increases the total amount of work that the system can deliver.

While a lot of people are pinning Frontier's early hardware issues on AMD's Instinct accelerators, Justin Whitt, the program director for the Oak Ridge Leadership Computing Facility (OLCF), stated that “The issues span lots of different categories, the GPUs are just one.”

Regarding the reliability of the AMD products used within Oak Ridge's Frontier system, Whitt commented that “I don’t think that at this point that we have a lot of concern over the AMD products. We’re dealing with a lot of the early-life kind of things we’ve seen with other machines that we’ve deployed, so it’s nothing too out of the ordinary.”

Oak Ridge's Frontier exascale supercomputer is due to enter service on January 1st 2023, and the team at Oak Ridge are "largely on track" to deliver that. The team are confident that they will meet their deadlines, despite the issues that they have faced bringing Frontier online.

You can join the discussion on Oak Ridge's Frontier Exascale computer facing teething issues on the OC3D Forums.
