Nvidia’s Tesla A100 has a whopping 6,912 CUDA cores – Specs Detailed
Right now, we know that Nvidia’s Tesla A100 features 6,912 CUDA cores, which can execute FP64 calculations at half the rate of FP32.
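That half-rate claim can be sanity-checked against the spec table below: peak FP64 throughput should be half of peak FP32, and the FP32 figure implies a boost clock of roughly 1.41GHz. A minimal sketch of that arithmetic, where the 2-FLOPs-per-clock FMA convention is our assumption rather than something the article states:

```python
# Back-of-the-envelope check of the half-rate FP64 claim, using the
# figures from the spec table below. Assumes each CUDA core retires
# one FMA (2 FLOPs) per clock for FP32 -- a common convention, not
# something the article states.
cuda_cores = 6912
fp32_tflops = 19.5
fp64_tflops = 9.7

# Boost clock implied by the FP32 figure.
implied_clock_ghz = fp32_tflops * 1e12 / (cuda_cores * 2) / 1e9
print(f"Implied boost clock: {implied_clock_ghz:.2f} GHz")   # ~1.41 GHz

# FP64 should come out at roughly half of FP32.
print(f"FP64/FP32 ratio: {fp64_tflops / fp32_tflops:.2f}")   # ~0.50
```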
With 7nm, Nvidia has delivered a greater than 2x increase in transistor count over the company’s Tesla V100 core design, a feat which allows Nvidia to deliver some incredible performance increases for its Tesla A100. Nvidia has combined this core design with 40GB of HBM2E memory, using five 8GB memory stacks. The image below shows that the Tesla A100 package can house up to six of these stacks, which would deliver 48GB of VRAM. Shipping with five active stacks was likely a design decision that will help Nvidia increase the production yields of such a large graphics chip.
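The capacity and bus-width figures line up with five active stacks. A short sketch of that arithmetic, where the 1,024-bit interface per HBM2 stack is the standard JEDEC width (and is also implied by the 5,120-bit bus in the table below):

```python
# HBM configuration arithmetic for the capacities quoted above.
# The 1,024-bit interface per HBM2 stack is the JEDEC standard width,
# an assumption on our part rather than a figure from the article.
stack_capacity_gb = 8
bits_per_stack = 1024

active_stacks = 5   # shipping A100 configuration
total_stacks = 6    # stacks physically present on the package

print(f"Active capacity: {active_stacks * stack_capacity_gb} GB")   # 40 GB
print(f"Full capacity:   {total_stacks * stack_capacity_gb} GB")    # 48 GB
print(f"Memory bus:      {active_stacks * bits_per_stack}-bit")     # 5120-bit

# Per-pin data rate implied by the 1,555 GB/s bandwidth figure.
bandwidth_gbs = 1555
pin_rate_gbps = bandwidth_gbs / (active_stacks * bits_per_stack / 8)
print(f"Implied pin rate: {pin_rate_gbps:.2f} Gbps")                 # ~2.43 Gbps
```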
Nvidia’s Tesla A100 chips use Nvidia’s SXM4 form factor and support PCIe 4.0 and third-generation NVLink. With third-generation NVLink, Nvidia can support up to 4.8 TB per second of bi-directional bandwidth and 600 GB per second of GPU-to-GPU bandwidth. This means that Nvidia can connect up to eight Tesla A100 graphics cards together, each with 600 GB per second of bandwidth. Nvidia has also combined its offerings with 200Gbps Mellanox interconnects to further increase platform scalability.
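Those two bandwidth figures are consistent with each other. A rough sketch of the breakdown, where the 12-links-at-25-GB/s-per-direction configuration is the commonly cited NVLink 3.0 layout rather than something the article states:

```python
# NVLink 3.0 bandwidth breakdown. The per-link figures are the commonly
# cited NVLink 3.0 numbers and are our assumption; the article only
# quotes the 600 GB/s and 4.8 TB/s aggregates.
links_per_gpu = 12
gb_per_sec_per_direction = 25  # per link

gpu_to_gpu = links_per_gpu * gb_per_sec_per_direction * 2  # bidirectional
print(f"GPU-to-GPU bandwidth: {gpu_to_gpu} GB/s")          # 600 GB/s

# Eight GPUs each sustaining 600 GB/s gives the 4.8 TB/s platform figure.
print(f"8-GPU aggregate: {8 * gpu_to_gpu / 1000} TB/s")    # 4.8 TB/s
```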
With the company’s new third-generation Tensor cores, Nvidia’s Tesla A100 is said to deliver a 20x increase in eight-bit integer (INT8) performance over the company’s older Tesla V100. Nvidia has also reported a 2.5x increase in double-precision floating-point (FP64) performance. While the Tesla A100 has fewer Tensor cores than the Tesla V100, these redesigned Tensor cores are much more powerful than before, more than making up for the decrease in Tensor core count.
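For context, the 2.5x FP64 claim lines up with Tensor-core FP64 throughput rather than the standard-rate figure in the table below. A rough sketch of how the two relate, where the 19.5 TFLOPS Tensor-core FP64 figure is Nvidia’s published number rather than one quoted in this article:

```python
# Reconciling the reported 2.5x FP64 uplift with the spec table.
# A100's Tensor cores can run FP64 matrix math at 19.5 TFLOPS --
# Nvidia's published figure, not one quoted in this article.
v100_fp64 = 7.8          # TFLOPS, standard rate
a100_fp64 = 9.7          # TFLOPS, standard rate
a100_fp64_tensor = 19.5  # TFLOPS, via third-gen Tensor cores

print(f"Standard-rate uplift: {a100_fp64 / v100_fp64:.2f}x")         # ~1.24x
print(f"Tensor-core uplift:   {a100_fp64_tensor / v100_fp64:.2f}x")  # 2.50x
```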
Nvidia is expected to reveal more information about its Ampere architecture later today.
| | Tesla A100 | Tesla V100 | Tesla P100 |
|---|---|---|---|
| GPU Architecture | Ampere | Volta | Pascal |
| Process Node | 7nm | 12nm | 16nm |
| Die Size | 826mm² | 815mm² | 610mm² |
| FP64 TFLOPS | 9.7 | 7.8 | 5.3 |
| FP32 TFLOPS | 19.5 | 15.7 | 10.6 |
| FP16 TFLOPS | 39.0 | 31.4 | 21.2 |
| Transistor Count | 54B | 21.1B | 15.3B |
| CUDA Core Count | 6,912 | 5,120 | 3,584 |
| Tensor Cores | 432 | 640 | N/A |
| VRAM Type | HBM2E | HBM2 | HBM2 |
| VRAM Capacity | 40GB | 32GB or 16GB | 16GB |
| Memory Bus Width | 5120-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 1,555 GB/s | 900 GB/s | 720 GB/s |
| Boost Clock Speed | ? | 1455MHz | 1480MHz |
| TDP | 400W | 300W | 300W |
(Image from Videocardz)
You can join the discussion on Nvidia’s Tesla A100 graphics/AI accelerator on the OC3D Forums.