NVIDIA RTX Blackwell architecture

New Features (key highlights from my perspective)

  • Streaming Multiprocessors (SMs): throughput per clock cycle for many integer arithmetic operations is doubled compared to NVIDIA Ada GPUs.
  • Tensor cores: 
    • new FP4 capabilities
    • new Second-Generation FP8 Transformer Engine
  • GDDR7 Memory 
    • ultra-low voltage 
    • PAM3 (three-level Pulse Amplitude Modulation) signaling technology
    • higher-speed memory subsystems

 Blackwell GB202 GPU

  • 12 Graphics Processing Clusters (GPCs)
  • 96 Texture Processing Clusters (TPCs)
  • 192 Streaming Multiprocessors (SMs)
  • a 512-bit memory interface with sixteen 32-bit memory controllers
  • 128 MB of total L2 cache on the full GB202 chip
  • 96 MB of total L2 cache on the RTX 5090
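
The memory figures above can be cross-checked with a little arithmetic. This is only a sketch: the controller count, controller width, and L2 sizes come from the list, while the 28 Gbit/s per-pin data rate is an assumed example value that is not stated in this post, and the function names are my own.

```python
# GB202 memory-subsystem figures taken from the list above.
MEMORY_CONTROLLERS = 16
BITS_PER_CONTROLLER = 32
L2_FULL_GB202_MB = 128
L2_RTX5090_MB = 96


def bus_width_bits(controllers=MEMORY_CONTROLLERS, bits=BITS_PER_CONTROLLER):
    """Total memory interface width in bits."""
    return controllers * bits


def peak_bandwidth_gb_s(bus_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_bits * data_rate_gbps / 8


print(bus_width_bits())                 # 512
print(L2_RTX5090_MB / L2_FULL_GB202_MB) # 0.75 of the full chip's L2 cache

# ASSUMPTION: 28 Gbit/s per pin is an example value, not a figure from this post.
print(peak_bandwidth_gb_s(bus_width_bits(), 28))  # 1792.0
```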

Fig. 1 [GB202 GPU block diagram (full chip)] From NVIDIA RTX Blackwell GPU Architecture, ver. 1.1, NVIDIA Corp., 2025, p. 8.

Fig. 2 [The Blackwell GPC] From NVIDIA RTX Blackwell GPU Architecture, ver. 1.1, NVIDIA Corp., 2025, p. 9. 

  • The full GB202 GPU is made up of 12 GPCs (as shown in Fig. 1)
  • Each GPC contains 8 TPCs (as shown in Fig. 2)
  • Each TPC contains 2 SMs (as shown in Fig. 2)
  • Each SM contains 128 CUDA cores (as shown in Fig. 2)
  • 12 × 8 × 2 × 128 = 24576 CUDA cores
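
The hierarchy in the bullets above multiplies out as follows. This is a minimal sketch using only the per-level counts from the list; the variable names are my own.

```python
# Per-level counts for the full GB202 chip, from the list above.
GPCS = 12
TPCS_PER_GPC = 8
SMS_PER_TPC = 2
CUDA_CORES_PER_SM = 128

tpcs = GPCS * TPCS_PER_GPC            # TPCs on the full chip
sms = tpcs * SMS_PER_TPC              # SMs on the full chip
cuda_cores = sms * CUDA_CORES_PER_SM  # CUDA cores on the full chip

print(tpcs)        # 96
print(sms)         # 192
print(cuda_cores)  # 24576
```

The intermediate values match the 96 TPCs and 192 SMs quoted for GB202 earlier.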
One interesting detail:
"The FP64 TFLOP rate is 1/64th the TFLOP rate of FP32 operations. The small number of FP64 Cores are included to ensure any programs with FP64 code operate correctly. Similarly, a very minimal number of FP64 Tensor Cores are included for program correctness." (NVIDIA Corporation, NVIDIA RTX Blackwell GPU Architecture, Version 1.1, p. 8)
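
To get a feel for what the 1/64 ratio means, here is a small sketch. The 100 TFLOPS FP32 input is a made-up round number for illustration, not a spec of any card, and the function name is hypothetical.

```python
# Ratio quoted from the whitepaper: FP64 rate is 1/64th of the FP32 rate.
FP64_TO_FP32_RATIO = 1 / 64


def fp64_tflops(fp32_tflops):
    """Estimate the peak FP64 rate from a given peak FP32 rate."""
    return fp32_tflops * FP64_TO_FP32_RATIO


# ASSUMPTION: 100 TFLOPS is a hypothetical FP32 figure, chosen for easy math.
print(fp64_tflops(100.0))  # 1.5625
```

In other words, double-precision code runs correctly but at a small fraction of the FP32 rate, which matches the "program correctness" wording in the quote.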

 

Fig. 3 [The Blackwell Streaming Multiprocessor (SM)] From NVIDIA RTX Blackwell GPU Architecture, ver. 1.1, NVIDIA Corp., 2025, p. 11. 

 SM Architecture 

    • 128 CUDA cores
    • a 256 KB Register File
    • 128 KB of L1/Shared Memory
    • 4 Blackwell Fifth-Generation Tensor Cores
    • 4 Texture Units 
    • 1 Blackwell Fourth-Generation RT Core
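
Scaling the per-SM resources above to the full GB202 chip (192 SMs) gives the chip-wide totals. This is a rough sketch derived only from the numbers in this post; the dictionary keys and variable names are my own.

```python
SMS_FULL_GB202 = 192  # SM count of the full GB202 chip

# Per-SM resources from the list above.
per_sm = {
    "cuda_cores": 128,
    "register_file_kb": 256,
    "l1_shared_kb": 128,
    "tensor_cores": 4,
    "texture_units": 4,
    "rt_cores": 1,
}

# Multiply every per-SM figure by the number of SMs.
per_chip = {name: count * SMS_FULL_GB202 for name, count in per_sm.items()}

print(per_chip["cuda_cores"])               # 24576
print(per_chip["register_file_kb"] // 1024) # 48 (MB of register file)
print(per_chip["l1_shared_kb"] // 1024)     # 24 (MB of L1/shared memory)
print(per_chip["tensor_cores"])             # 768
print(per_chip["texture_units"])            # 768
print(per_chip["rt_cores"])                 # 192
```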

 Blackwell Fifth-Generation Tensor Cores: supported data formats

  • FP4, FP6, FP8, FP16
  • BF16
  • TF32
  • INT8

 

[reference]

NVIDIA Corporation. NVIDIA RTX Blackwell GPU Architecture. Version 1.1, 2025, https://images.nvidia.com/aem-dam/Solutions/geforce/blackwell/nvidia-rtx-blackwell-gpu-architecture.pdf.

