NVIDIA’s DirectX 11 Architecture: GF100 (Fermi) In Detail
Geometry
NVIDIA’s goal is for GF100 to enable film-like geometric realism for game characters and objects. Geometric realism is central to the GF100 architectural enhancements for graphics. In addition, PhysX simulations are faster and developers can utilize GPU computing features in games more easily and effectively.
While programmable shading has allowed PC games to mimic the cinema in per-pixel effects, geometric realism lags far behind. The most advanced modern PC games use one to two million polygons per frame, whereas a typical frame in a computer-generated film uses hundreds of millions of polygons. While the number of pixel shaders has grown from one to many hundreds, the triangle setup engine has remained a single unit. For example, the GeForce GTX 285 has more than 150 times the shading horsepower of the old GeForce FX, but less than 3 times the geometry processing rate. This means that pixels are shaded well but their geometric detail is weak.
Take a look at NVIDIA’s example from Far Cry 2. The holster has a heavily segmented strap. The corrugated roof is just a flat surface with a striped texture instead of curving properly. We also note that this character wears a hat to avoid the complexity of rendering hair.
On the other hand, the exquisitely detailed characters in CG films are made possible by tessellation and displacement mapping. Tessellation refines large triangles into collections of smaller triangles, while displacement mapping changes their relative position. To achieve these same goals, GF100’s entire graphics pipeline is designed to deliver higher performance in tessellation and geometry throughput.
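To make the two terms concrete, here is a minimal CPU-side sketch in Python of what tessellation and displacement mapping do to a single triangle. It is an illustration only, not GF100's hardware path or the DirectX 11 API: the function names, the midpoint subdivision scheme, and the ripple-shaped height function are all assumptions chosen for brevity.

```python
import math

def midpoint(a, b):
    """Return the point halfway between two 3D vertices."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))

def tessellate(triangle, levels):
    """Refine one triangle into 4**levels smaller triangles."""
    triangles = [triangle]
    for _ in range(levels):
        refined = []
        for a, b, c in triangles:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            # Three corner triangles plus the one in the middle.
            refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
        triangles = refined
    return triangles

def height(x, y):
    """Hypothetical displacement map: a gentle ripple (an assumption)."""
    return 0.1 * math.sin(4.0 * x) * math.cos(4.0 * y)

def displace(triangles):
    """Shift every vertex along +z, the normal of the flat input triangle."""
    return [tuple((x, y, z + height(x, y)) for x, y, z in tri)
            for tri in triangles]

flat = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
fine = tessellate(flat, 3)   # one large triangle becomes 64 small ones
bumpy = displace(fine)       # the small triangles no longer lie in a plane
print(len(fine))             # 64
```

Without the tessellation step there would be only three vertices to displace, so the ripple could not show up; the extra vertices are what give the displacement map something to move.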
GF100 replaces the traditional geometry processing architecture at the front end of the graphics pipeline with an entirely new distributed geometry processing architecture that is implemented using multiple “PolyMorph Engines”. Each of these engines includes a tessellation unit, an attribute setup unit, and other geometry processing units. Each SM has its own dedicated PolyMorph Engine, as shown in the three grouped diagrams above.
Newly generated primitives are converted to pixels by four Raster Engines that operate in parallel, compared to a single Raster Engine in GT200 and earlier GPUs. On-chip L1 and L2 caches now enable high-bandwidth transfer of primitive attributes between the SM and the tessellation unit as well as between different SMs. Tessellation and all its supporting stages are performed in parallel on GF100, improving geometry throughput. GF100’s ability to perform parallel geometry processing is possibly its single most important architectural improvement. The ability to deliver setup rates exceeding one primitive per clock while maintaining correct rendering order is a significant technical achievement.
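Why is keeping the rendering order hard once work is spread across parallel units? The Python sketch below shows one generic way to restore order: tag each primitive with a sequence number and hold finished results in a reorder buffer until every earlier one has arrived. NVIDIA has not described its mechanism at this level, so the `ReorderBuffer` class and the simulated completion order are purely illustrative assumptions.

```python
import heapq

class ReorderBuffer:
    """Release finished items strictly in submission order."""

    def __init__(self):
        self._heap = []   # finished items, keyed by sequence number
        self._next = 0    # sequence number that must be released next

    def push(self, seq, item):
        """Accept a finished item; return whatever can now be released."""
        heapq.heappush(self._heap, (seq, item))
        released = []
        while self._heap and self._heap[0][0] == self._next:
            released.append(heapq.heappop(self._heap)[1])
            self._next += 1
        return released

primitives = ["tri%d" % i for i in range(8)]

# Hypothetical completion order: parallel units finish out of sequence.
completion_order = [2, 0, 1, 3, 7, 5, 4, 6]

buffer = ReorderBuffer()
emitted = []
for seq in completion_order:
    emitted += buffer.push(seq, primitives[seq])

print(emitted == primitives)  # True
```

The units are free to finish in any order, yet the output stream matches the order in which the application submitted the primitives, which is what correct blending and depth testing depend on.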
Major compute features improved on GF100 that will be useful in games include faster context switching between graphics and PhysX, concurrent compute kernel execution, and an enhanced caching architecture that suits irregular algorithms such as ray tracing and AI. In addition, improved atomic operations performance allows threads to safely cooperate through work queues, accelerating novel rendering algorithms. For example, fast atomic operations allow transparent objects to be rendered without presorting (order-independent transparency), enabling developers to create levels with complex glass environments. GF100’s GigaThread engine reduces context switch time, making it possible to execute multiple compute and physics kernels for each frame.
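Order-independent transparency can be sketched in a few lines of Python. The idea illustrated here is a per-pixel fragment list: fragments are appended in whatever order they are drawn (on a GPU, an atomic counter would reserve each slot safely), and sorting happens once per pixel at resolve time. This is a simplified CPU illustration of the general technique, not NVIDIA's implementation; the function names, the depth convention (larger means farther), and the sample colors are assumptions.

```python
def blend_over(dst, src_color, alpha):
    """Standard 'over' blend of a translucent fragment onto dst."""
    return tuple(alpha * s + (1.0 - alpha) * d
                 for s, d in zip(src_color, dst))

def resolve_pixel(fragments, background):
    """Sort one pixel's fragments far to near, then blend them."""
    color = background
    ordered = sorted(fragments, key=lambda f: f[0], reverse=True)
    for depth, frag_color, alpha in ordered:
        color = blend_over(color, frag_color, alpha)
    return color

# Two panes of glass covering the same pixel: (depth, rgb, alpha).
red_far = (0.8, (1.0, 0.0, 0.0), 0.5)
blue_near = (0.3, (0.0, 0.0, 1.0), 0.5)
black = (0.0, 0.0, 0.0)

# Submission order differs, but the resolved color does not.
a = resolve_pixel([red_far, blue_near], black)
b = resolve_pixel([blue_near, red_far], black)
print(a == b)  # True
```

Because the sort is deferred to the resolve step, the application does not need to presort its transparent geometry before drawing, which is the property the paragraph above describes.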
ati status
[told] x
benchmarks? none?
Benchmarks in a review of brand new GPU architecture!?!
– when have you seen that before?
We expect to have benchmarks vs. GTX 285 and vs. Radeon when we get the actual cards.
I noticed that you mentioned “time check”. These comments must be approved manually; sorry for any delay.
I thought you guys were gonna post something substantial, not this rehash. NDA my ass…
This is what they gave us TTimmy. It is not like other sites got anything different and we got garbage.
However, I do understand that this is not what you all wanted to see. We also wanted to see some more definitive information such as benchmark numbers, clock speeds, release date and pricing but…ABT isn’t releasing a video card (yet), they are.
I wouldn’t call what we posted a “rehash”. What has happened with this progressive revelation always happens with a new architecture, no matter who releases it: AMD, NVIDIA or Intel. First you get the general information about the upcoming architecture, then more and more information is released until it goes into production.
Much of the information about Fermi’s GeForce in our article is brand new information about Fermi’s gaming capabilities. Much of what we wrote about was not disclosed anywhere previously. There is a lot more to add since we wrote about Fermi’s computing architecture last year.
As I understand it, only the devs would have engineering samples of GF100. That means NVIDIA’s partners would not have them nor would any tech review site. There are no fixed clocks, nothing about power consumption nor thermals – and certainly no solid performance benchmarks other than what NVIDIA did internally. Not yet.
I hope Nvidia release a decent mid range $300 Fermi GPU.