NVIDIA’s DirectX 11 Architecture: GF100 (Fermi) In Detail
GF100 Architecture
Let's look at the diagrams:
The first diagram, from NVIDIA's slides, shows the GF100 block diagram: the Host Interface, the GigaThread Engine, four GPCs, six memory controllers, six ROP partitions, and a 768 KB L2 cache. Each GPC contains four PolyMorph Engines, and the ROP partitions sit immediately adjacent to the L2 cache. The second image illustrates how GF100's graphics architecture is built from hardware blocks called Graphics Processing Clusters (GPCs); a GPC contains a Raster Engine and up to four SMs. The third image illustrates how it all works together.
First, CPU commands are read by the GPU via the Host Interface. The GigaThread Engine then fetches data from system memory and copies it to the framebuffer. GF100 implements six 64-bit GDDR5 memory controllers (384-bit in total), which provide high-bandwidth access to the framebuffer. The GigaThread Engine creates and dispatches thread blocks to the various SMs, and the individual SMs in turn schedule warps (groups of 32 threads) onto CUDA cores and the other execution units. The GigaThread Engine also redistributes work to the SMs when work expansion occurs in the graphics pipeline.
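To make the block-and-warp dispatch concrete, here is a minimal sketch of the arithmetic behind it: a thread block handed to an SM is carved into warps of 32 threads. The function name and block sizes are illustrative assumptions, not NVIDIA's internal scheduler.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs such as GF100

def warps_per_block(threads_per_block: int) -> int:
    """Number of warps an SM must schedule to cover one thread block."""
    return math.ceil(threads_per_block / WARP_SIZE)

# A 256-thread block is scheduled as 8 full warps of 32 threads.
print(warps_per_block(256))  # -> 8
# A 100-thread block still occupies 4 warps; the last warp is only partly full.
print(warps_per_block(100))  # -> 4
```

This is why block sizes that are multiples of 32 tend to waste the fewest execution slots: a partially filled warp still occupies a full scheduling slot on the SM.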
In the first image, the rectangular structures are SMs, or as NVIDIA calls them, streaming multiprocessors, of which Fermi has sixteen. NVIDIA calls the green squares inside each SM "CUDA cores". These CUDA cores comprise the chip's most fundamental execution resource, and their count helps determine the chip's total processing power and ultimately its performance. GT200 has 240; Fermi has 512.
The memory interfaces are 64-bit each, so Fermi's total path to memory is 384 bits wide. This is narrower than GT200's 512-bit pathway. However, Fermi compensates by delivering almost twice the bandwidth per pin thanks to its support for GDDR5 memory; GT200 used GDDR3.
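A back-of-the-envelope calculation shows why the narrower bus can still come out ahead: GDDR5 transfers four bits per pin per command clock versus GDDR3's two. The GF100 memory clock below is purely an assumption for illustration (NVIDIA had announced no clocks at the time); the GT200 figure matches the GTX 285's published 1242 MHz.

```python
def bandwidth_gbps(bus_width_bits: int, mem_clock_mhz: float,
                   transfers_per_clock: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    bits_per_second = bus_width_bits * mem_clock_mhz * 1e6 * transfers_per_clock
    return bits_per_second / 8 / 1e9

# GT200 (GTX 285): 512-bit GDDR3 at 1242 MHz, 2 transfers per clock
gt200 = bandwidth_gbps(512, 1242, 2)   # ~159 GB/s
# Hypothetical GF100: 384-bit GDDR5 at an assumed 1000 MHz, 4 transfers per clock
gf100 = bandwidth_gbps(384, 1000, 4)   # 192 GB/s
print(f"GT200: {gt200:.0f} GB/s, GF100 (assumed clock): {gf100:.0f} GB/s")
```

Even at a lower command clock than GT200's memory, the doubled per-pin transfer rate more than offsets the 25% narrower bus in this sketch.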
To summarize, Fermi GF100 has:
- 512 CUDA cores
- 16 Geometry Units
- 4 raster units
- 64 texture units
- 48 ROP units
- 384-bit GDDR5
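These unit counts all follow from one block structure: 4 GPCs of 4 SMs each, with 32 CUDA cores and 4 dedicated texture units per SM, one PolyMorph (geometry) engine per SM, and one Raster Engine per GPC. A quick sanity check of that arithmetic (the per-SM core count is derived here from 512 cores across 16 SMs):

```python
GPCS = 4
SMS_PER_GPC = 4
CORES_PER_SM = 32          # 512 CUDA cores / 16 SMs
TEX_UNITS_PER_SM = 4       # four dedicated texture units per SM

sms = GPCS * SMS_PER_GPC
assert sms == 16                      # streaming multiprocessors
assert sms * CORES_PER_SM == 512      # CUDA cores
assert sms == 16                      # geometry (PolyMorph) units, one per SM
assert GPCS == 4                      # raster units, one per GPC
assert sms * TEX_UNITS_PER_SM == 64   # texture units
print("GF100 unit counts are self-consistent")
```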
NVIDIA’s current-generation product, the GT200 – of which the GTX 285 is the single-GPU flagship – improved on the original G80 design as represented by the 8800 GTX. By refining G80’s architecture, NVIDIA made it more programmable, adding double precision (DP) support and atomic operations. GT200 held on to the single-GPU performance crown until nearly five months ago, when AMD/ATI’s Radeon HD 5870 launched. Their competitor’s chip is the first DX11 GPU, built with incremental changes over the previous generation that nevertheless deliver significant performance improvements over the HD 4800 series.
So now NVIDIA has announced Fermi GF100, its next-generation DX11 architecture, which aims for even greater performance while being more programmable and software-friendly. There is no “GT300”. Until now, NVIDIA has chosen primarily to discuss Fermi’s Tesla GPU-computing architecture, and not to disclose the microarchitecture – and especially not the game-related performance details – of GF100.
The biggest changes in the GF100 architecture are in the geometry pipeline, which has been significantly revamped, with improved performance in geometry shading, stream out, and culling. Fillrate has also been improved, which enables multiple displays to be driven simultaneously by GF100 SLI, much like AMD’s Eyefinity – but now additionally in 3D and at 120 Hz.
From studying the second image, we can see that the GPC is GF100’s dominant high-level hardware block. It features two key innovations—a scalable Raster Engine for triangle setup, rasterization, and z-cull, and a scalable PolyMorph Engine for vertex attribute fetch and tessellation. The Raster Engine resides in the GPC, whereas the PolyMorph Engine resides in the SM. On earlier NVIDIA GPUs, SMs and Texture Units were grouped together in hardware blocks called Texture Processing Clusters (TPCs). On GF100, each SM has four dedicated Texture Units.
As we look deeper, we can see that Fermi’s tessellation engine is impressive. It is not something just “tacked on” to GT200. NVIDIA saw early on that if they only made incremental changes to GT200, they would run into severe bottlenecks; simply adding tessellation to GT200 would lead to intolerable geometry bottlenecks. They tell us that this is what took them so long: they had to design a better-balanced new chip architecture, one with better sequential rendering semantics built into its engine.
benchmarks? none?
Benchmarks in a review of brand new GPU architecture!?!
– when have you seen that before?
We expect to have benchmarks vs. GTX 285 and vs. Radeon when we get the actual cards.
I noticed that you mentioned “time check”. These comments must be approved manually; sorry for any delay.
I thought you guys were gonna post something substantial, not this rehash. NDA my ass…
This is what they gave us, TTimmy. It is not as though other sites got something different and we got garbage.
However, I do understand that this is not what you all wanted to see. We also wanted to see some more definitive information such as benchmark numbers, clock speeds, release date and pricing but…ABT isn’t releasing a video card (yet), they are.
I wouldn’t call what we posted a “rehash”. What has happened with this progressive revelation always happens with a new architecture – no matter who releases it: AMD, NVIDIA, or Intel. First you get general information about the upcoming architecture, then more and more information is released until it goes into production.
Much of the information about Fermi’s GeForce in our article is brand new information about Fermi’s gaming capabilities. Much of what we wrote about was not disclosed anywhere previously. There is a lot more to add since we wrote about Fermi’s computing architecture last year.
As I understand it, only the devs would have engineering samples of GF100. That means NVIDIA’s partners would not have them nor would any tech review site. There are no fixed clocks, nothing about power consumption nor thermals – and certainly no solid performance benchmarks other than what NVIDIA did internally. Not yet.
I hope Nvidia releases a decent mid-range $300 Fermi GPU.