Nvidia’s Tesla A100 has a whopping 6,912 CUDA cores – Specs Detailed

Nvidia has revealed its Tesla A100 graphics accelerator, and it is a monster. Thanks to CRN, we have detailed specifications for Nvidia’s Tesla A100 silicon, complete with CUDA core counts, die size and more. 

Right now, we know that Nvidia’s Tesla A100 features 6,912 CUDA cores, which can execute FP64 calculations at half the FP32 rate.
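To put the half-rate FP64 figure in context, here is a rough sketch of how theoretical peak throughput is usually estimated: cores multiplied by operations per clock multiplied by clock speed. It assumes one fused multiply-add per core per clock, counted as two operations, and it back-calculates a clock speed from the 19.5 TFLOPS FP32 figure in the spec table. Nvidia had not confirmed a boost clock at the time of writing, so treat the implied clock as an estimate, and note that the function names are purely illustrative.

```python
# Rough peak-throughput estimate: cores x FLOPs per clock x clock speed.
CUDA_CORES = 6912        # from the article
FLOPS_PER_CLOCK = 2      # assumption: one fused multiply-add = 2 FLOPs
FP32_TFLOPS = 19.5       # from the spec table


def peak_tflops(cores: int, clock_mhz: float, rate: float = 1.0) -> float:
    """Theoretical peak in TFLOPS; rate=0.5 models half-rate FP64."""
    return cores * FLOPS_PER_CLOCK * clock_mhz * 1e6 * rate / 1e12


def implied_clock_mhz(cores: int, tflops: float) -> float:
    """Clock speed (MHz) that would be needed to reach a given peak."""
    return tflops * 1e12 / (cores * FLOPS_PER_CLOCK) / 1e6


clock_mhz = implied_clock_mhz(CUDA_CORES, FP32_TFLOPS)
fp64_tflops = peak_tflops(CUDA_CORES, clock_mhz, rate=0.5)

print(f"Implied clock: {clock_mhz:.0f} MHz")
print(f"Half-rate FP64 peak: {fp64_tflops:.2f} TFLOPS")
```

Under these assumptions the half-rate FP64 result works out to half of the FP32 figure, which is in line with the 9.7 TFLOPS listed in the table.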

By moving to 7nm, Nvidia has delivered a roughly 2.5x increase in transistor count over the company’s Tesla V100 core design (54 billion versus 21.1 billion), a feat which allows Nvidia to deliver some incredible performance increases for its Tesla A100. Nvidia has paired this core design with 40GB of HBM memory, using five 8GB HBM memory modules. The image below shows that the Tesla A100 package has room for six of these modules, which would deliver 48GB of VRAM. Shipping with five active modules instead of six was likely a design decision which will help Nvidia increase the production yields of such a large graphics card.
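The memory figures can be checked with some quick arithmetic. The sketch below assumes each HBM stack has a 1024-bit interface (the standard HBM2 stack width, not something stated in this article) and uses the 1,555 GB/s bandwidth figure from the spec table to estimate the per-pin data rate. The helper name is illustrative only.

```python
# Memory configuration arithmetic for five versus six HBM stacks.
STACK_CAPACITY_GB = 8    # per-stack capacity, from the article
STACK_BUS_BITS = 1024    # assumption: standard HBM2 interface width per stack
BANDWIDTH_GB_S = 1555    # from the spec table


def memory_config(active_stacks: int) -> dict:
    """Total capacity and bus width for a given number of active stacks."""
    return {
        "capacity_gb": active_stacks * STACK_CAPACITY_GB,
        "bus_bits": active_stacks * STACK_BUS_BITS,
    }


shipping = memory_config(5)   # the configuration described in the article
full = memory_config(6)       # all six sites on the package populated

# Per-pin data rate implied by the quoted bandwidth and bus width.
pin_rate_gbit_s = BANDWIDTH_GB_S * 8 / shipping["bus_bits"]

print(shipping)
print(full)
print(f"Implied per-pin rate: {pin_rate_gbit_s:.2f} Gbit/s")
```

Five stacks give the 40GB and 5120-bit figures quoted for the Tesla A100, while six would give 48GB on a 6144-bit bus.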

Nvidia’s Tesla A100 chips use Nvidia’s SXM4 form factor and support PCIe 4.0 and third-generation NVLink. With third-generation NVLink, Nvidia can support up to 4.8 TB per second of bi-directional bandwidth and 600 GB per second of GPU-to-GPU bandwidth. This means that Nvidia can connect up to eight Tesla A100 graphics cards together, each with 600 GB per second of bandwidth. Nvidia has also combined its offerings with 200Gbps Mellanox interconnects to increase platform scalability further.
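The two NVLink numbers line up if the 4.8 TB per second figure is read as the sum of eight GPUs at 600 GB per second each. That reading is our assumption, as the source does not spell out how the total is derived. The short sketch below also converts the 200Gbps Mellanox link into bytes per second for comparison.

```python
# NVLink aggregate bandwidth and a bits-to-bytes comparison.
GPU_LINK_GB_S = 600        # per-GPU NVLink bandwidth, from the article
GPU_COUNT = 8              # maximum GPUs linked together, from the article
MELLANOX_GBIT_S = 200      # network interconnect speed, from the article

# Assumption: the 4.8 TB/s total is simply the per-GPU figure summed.
aggregate_tb_s = GPU_COUNT * GPU_LINK_GB_S / 1000

# 200 gigabits per second expressed in gigabytes per second.
mellanox_gb_s = MELLANOX_GBIT_S / 8

# How many times faster one GPU's NVLink is than one network link.
nvlink_vs_network = GPU_LINK_GB_S / mellanox_gb_s

print(f"Aggregate NVLink: {aggregate_tb_s} TB/s")
print(f"Mellanox link: {mellanox_gb_s} GB/s")
print(f"NVLink advantage: {nvlink_vs_network:.0f}x")
```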

With the company’s new third-generation Tensor cores, Nvidia’s Tesla A100 is said to deliver a 20x increase in eight-bit integer math (INT8) performance when compared to the company’s older Tesla V100. Nvidia has also reported a 2.5x increase in double-precision floating-point (FP64) performance. While there are fewer Tensor cores on the Tesla A100, these redesigned Tensor cores are much more powerful than before, more than making up for the lower Tensor core count.
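As a back-of-the-envelope illustration of how much more each Tensor core would need to do, the sketch below takes the claimed 20x INT8 uplift at face value and divides it by the change in Tensor core count. It ignores clock speed differences and any software or sparsity features behind Nvidia's figure, so the result is only an indicative number.

```python
# Per-Tensor-core uplift implied by a chip-level speedup claim.
A100_TENSOR_CORES = 432    # from the spec table
V100_TENSOR_CORES = 640    # from the spec table
CLAIMED_INT8_SPEEDUP = 20  # Nvidia's claim, as reported in the article

# A100 has this fraction of the V100's Tensor core count.
count_ratio = A100_TENSOR_CORES / V100_TENSOR_CORES

# Assumption: the whole speedup is attributed to per-core throughput.
per_core_uplift = CLAIMED_INT8_SPEEDUP / count_ratio

print(f"Tensor core count ratio: {count_ratio:.3f}")
print(f"Implied per-core uplift: {per_core_uplift:.1f}x")
```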

Nvidia is expected to reveal more information about its Ampere architecture later today. 

Specification     | Tesla A100 | Tesla V100   | Tesla P100
GPU Architecture  | Ampere     | Volta        | Pascal
Process Node      | 7nm        | 12nm         | 16nm
Die Size          | 826mm^2    | 815mm^2      | 610mm^2
FP64 TFLOPS       | 9.7        | 7.8          | 5.3
FP32 TFLOPS       | 19.5       | 15.7         | 10.6
FP16 TFLOPS       | 39.0       | 31.4         | 21.2
Transistor Count  | 54B        | 21.1B        | 15.3B
CUDA Core Count   | 6,912      | 5,120        | 3,584
Tensor Cores      | 432        | 640          | N/A
VRAM Type         | HBM2E      | HBM2         | HBM2
VRAM Capacity     | 40GB       | 32GB or 16GB | 16GB
Memory Bus Size   | 5120-bit   | 4096-bit     | 4096-bit
Memory Bandwidth  | 1,555 GB/s | 900 GB/s     | 720 GB/s
Boost Clock Speed | ?          | 1455MHz      | 1480MHz
TDP               | 400W       | 300W         | 300W
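For readers who want to compare the generations directly, the following sketch computes a few ratios from the table's figures. Note that the standard FP64 number rises by only around 1.24x from V100 to A100, so the 2.5x FP64 claim mentioned earlier presumably refers to a different measurement, possibly Tensor core accelerated FP64, though the source does not say. The dictionary layout and helper names are our own.

```python
# Generation-over-generation ratios using figures from the spec table.
SPECS = {
    "A100": {"fp64": 9.7, "fp32": 19.5, "bandwidth_gb_s": 1555,
             "transistors_b": 54.0, "tdp_w": 400},
    "V100": {"fp64": 7.8, "fp32": 15.7, "bandwidth_gb_s": 900,
             "transistors_b": 21.1, "tdp_w": 300},
    "P100": {"fp64": 5.3, "fp32": 10.6, "bandwidth_gb_s": 720,
             "transistors_b": 15.3, "tdp_w": 300},
}


def ratio(metric: str, new: str = "A100", old: str = "V100") -> float:
    """How many times larger a metric is on the newer card."""
    return SPECS[new][metric] / SPECS[old][metric]


def fp32_gflops_per_watt(card: str) -> float:
    """Peak FP32 throughput per watt of TDP, in GFLOPS/W."""
    return SPECS[card]["fp32"] * 1000 / SPECS[card]["tdp_w"]


for metric in ("fp64", "fp32", "bandwidth_gb_s", "transistors_b", "tdp_w"):
    print(f"{metric}: A100 is {ratio(metric):.2f}x V100")

for card in SPECS:
    print(f"{card}: {fp32_gflops_per_watt(card):.1f} FP32 GFLOPS per watt")
```

These are peak paper specifications only, so they say nothing about real-world performance in any given workload.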

Nvidia's Tesla A100 GPU has been pictured - The largest Ampere GPU (Image from Videocardz)

You can join the discussion on Nvidia’s Tesla A100 graphics/AI accelerator on the OC3D Forums.