NVIDIA’s DirectX 11 Architecture: GF100 (Fermi) In Detail
Article written by Mark Poppin and BFG10K, AlienBabelTech Senior Editors.
At its GPU Technology Conference (GTC) last September 30th, NVIDIA announced its next-generation graphics architecture, codenamed Fermi. We reported on it for you here, here and here in a three-part series. At the GTC, graphics performance was not the focus of Tesla Fermi. Rather, the conference presented NVIDIA’s new architecture as a revolutionary general-purpose processor that takes far greater advantage of the GPU’s superfast parallel processing than the current architecture does. NVIDIA’s goal is to dominate the professional market with its Tesla GPUs. Now that Fermi GF100 GPUs for NVIDIA’s new video cards are finally in mass production, we will be looking at how NVIDIA intends to dominate gaming.
To summarize the new architecture, Fermi boasts a brand new shader core organized into compute clusters called streaming multiprocessors (SMs). Each stream processor has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). Each SM can issue two independent instructions per clock, one to each of two different warps. Each instruction is run by a 16-way SIMD block that handles single-precision fused multiply-add (FMA) instructions. The Fermi memory hierarchy is also new, sporting a unified L2 cache that serves all of the SMs without partitions. In addition, a new unified memory space allows each SM to communicate not only with its own local registers and shared memory, but also with the L2 cache and beyond.
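To make the FMA point concrete: a fused multiply-add computes a*b + c with a single rounding step, whereas a separate multiply followed by an add rounds twice. The minimal Python sketch below emulates that difference in double precision using exact rational arithmetic. The function names and input values are our own illustrative choices; nothing here is NVIDIA code or a model of the GF100 hardware.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply and round, then add and round again (two rounding steps)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate an FMA: form a*b + c exactly, then round once to a double.

    Fraction arithmetic is exact, and converting the final result to
    float rounds it to the nearest representable double.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Hypothetical inputs, chosen so that the exact product 1 - 2**-60 does not
# fit in a double and is rounded to 1.0 by the separate multiply.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

two_step = unfused_multiply_add(a, b, c)  # 0.0: the small term is rounded away
one_step = fused_multiply_add(a, b, c)    # -2**-60: the small term survives
```

With these inputs the two-step version loses the result entirely, while the single-rounding version keeps it. That extra accuracy is why fused multiply-add matters for numerical work.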
The GF100 features a 768KB unified level-two cache as well as a rather complex cache hierarchy. In addition, many other areas of GPU-compute performance are improved over NVIDIA’s current Tesla architecture GPU, the GT200. The GF100 hardware can sustain peak single-precision (SP) and double-precision (DP) FMA instruction throughput. Atomic instruction throughput is increased over the current generation, and Fermi supports error-correcting code (ECC) memory protection, which is essential for GPU computing. This all comes together to support a new type of multi-threading technology that improves the efficiency of the 512 cores working together. The entire Fermi family is compatible with the DirectX 11, OpenGL 3.x and OpenCL 1.x application programming interfaces (APIs). The new chips are finally in mass production using 40nm process technology at TSMC.
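As a back-of-the-envelope sketch of what peak FMA throughput means for a 512-core part: an FMA is conventionally counted as two floating-point operations, so the single-precision peak is cores × 2 × clock. The helper names and the 1.0 GHz clock below are placeholders of ours, not GF100 specifications. The SM count assumes that each SM's two issue slots each drive one 16-way SIMD block, as described above.

```python
def sm_count(total_cores: int, simd_width: int = 16, issue_slots: int = 2) -> int:
    """Number of SMs, assuming each issue slot feeds one SIMD block."""
    lanes_per_sm = simd_width * issue_slots  # 32 lanes under this assumption
    return total_cores // lanes_per_sm


def peak_fma_gflops(total_cores: int, shader_clock_ghz: float) -> float:
    """Peak single-precision rate if every core retires one FMA per clock."""
    flops_per_fma = 2  # one multiply plus one add
    return total_cores * flops_per_fma * shader_clock_ghz


sms = sm_count(512)

# The 1.0 GHz clock is a placeholder for illustration only; substitute the
# shipping shader clock once it is known.
placeholder_gflops = peak_fma_gflops(512, shader_clock_ghz=1.0)
```

Under these assumptions the 512 cores divide into 16 SMs, and every 1 GHz of shader clock is worth 1024 GFLOPS of single-precision FMA throughput. The double-precision rate would be some fraction of that figure, which is not stated here.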
Let’s go ahead and see what is new and improved with GF100.