Nvidia’s GTX 680 arrives! “Faster, Smoother, Richer” – is it enough to take the Performance Crown?
Architecture and Features
We have covered Fermi’s GF100 architecture in a lot of detail previously. You can read our articles here, as well as our three-part coverage of NVIDIA’s GPU Technology Conference 2010 here, here and here. The new Kepler architecture builds on Fermi with some important improvements and refinements that we will briefly cover before we get into performance testing.
SMX architecture
As Nvidia’s slide indicates, the new streaming multiprocessor design is called SMX, and it emphasizes 2x the performance per watt of Fermi. The GPU’s multi-threaded front end distributes work across four Graphics Processing Clusters (GPCs), each containing a raster engine and two streaming multiprocessors.
The SM is now called the SMX cluster. Each SMX cluster includes a PolyMorph 2.0 engine, 192 CUDA cores, 16 texture units and a lot of high-level cache. To add it all up: four GPCs with two SMXs each make eight SMXs, and eight SMXs at 192 CUDA cores apiece equal 1,536 CUDA cores.
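The per-cluster arithmetic above can be checked in a few lines (the unit counts are taken from the article; nothing here is an assumption beyond rounding):

```python
# GTX 680 shader arithmetic: 4 GPCs, each holding 2 SMX clusters,
# each SMX carrying 192 CUDA cores and 16 texture units.
GPCS = 4
SMX_PER_GPC = 2
CORES_PER_SMX = 192
TEX_PER_SMX = 16

smx_total = GPCS * SMX_PER_GPC            # 8 SMX clusters
cuda_cores = smx_total * CORES_PER_SMX    # 1536 CUDA cores
texture_units = smx_total * TEX_PER_SMX   # 128 texture units

print(smx_total, cuda_cores, texture_units)  # 8 1536 128
```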
The back end comprises four raster engines, 128 texture units and 32 ROPs; each of the eight geometry (PolyMorph) units has its own tessellation unit, backed by more lower-level cache.
Nvidia has really improved the memory controller over the last generation: the 256-bit GDDR5 memory interface runs at a declared 6Gbps per pin.
The Kepler PolyMorph and Raster engines have been improved over Fermi.
We see bindless textures, which dramatically increase the number of textures available to the shaders, allowing for more detail, and the control logic has been improved.
We also see improvements in Kepler’s scheduling over Fermi, which naturally led to design decisions that favor efficiency.
This is a very brief overview of Kepler architecture as presented to the press at Kepler Editor’s Day in San Francisco two weeks ago. If we are able to attend the upcoming GTC, you can expect a lot more details about the architecture.
GPU Boost
GPU Boost was developed by Nvidia to improve efficiency by raising the GTX 680’s clocks automatically in response to dynamically changing power conditions. Until now, Nvidia engineers had to select clock speeds against a specific “worst case” power target – often a benchmark.
Unfortunately, not all apps are equal in their power requirements, and some applications are far more power-hungry than others. That means games with lower power demands end up running below the core frequency the chip could sustain, because clocks are held down by a single global power target.
With GPU Boost, there is real-time dynamic clocking, with the hardware polled every millisecond. In this way, clocks can be ramped up to meet the power target of each application – not held back by the most stressful app, which is usually a benchmark, not a game.
Fortunately, GPU Boost goes hand in hand with overclocking, delivering additional frequency on top of the clocks set by the end user. GPU Boost continues to work while overclocked, pushing to the maximum allowed by the ever-changing power envelope.
Raising the voltage also raises the frequency and the boost ceiling. In practice, if you monitor the frequencies, they constantly shift up and down; you can see this in the overclocking section using EVGA’s Precision utility.
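The behavior described above can be sketched as a simple control loop. This is purely illustrative – the step size, boost ceiling and decision logic are our assumptions, not Nvidia’s actual algorithm; only the 1006MHz base clock and 195W figure come from the published specifications:

```python
# Illustrative sketch of a power-target-driven clock controller polled
# once per millisecond, in the spirit of how GPU Boost is described.
BASE_CLOCK_MHZ = 1006   # GTX 680 base clock (from the specs)
MAX_BOOST_MHZ = 1110    # hypothetical boost ceiling for this sketch
POWER_TARGET_W = 195    # GTX 680 board power (from the specs)
STEP_MHZ = 13           # hypothetical clock step per polling interval

def next_clock(current_mhz, measured_power_w):
    """Raise the clock while under the power target; back off when over it."""
    if measured_power_w < POWER_TARGET_W and current_mhz + STEP_MHZ <= MAX_BOOST_MHZ:
        return current_mhz + STEP_MHZ
    if measured_power_w > POWER_TARGET_W and current_mhz - STEP_MHZ >= BASE_CLOCK_MHZ:
        return current_mhz - STEP_MHZ
    return current_mhz

# A light game drawing 150W ramps up; a stressful benchmark at 210W is held back.
print(next_clock(1006, 150))  # 1019 - headroom available, clock steps up
print(next_clock(1019, 210))  # 1006 - over the target, clock steps back down
```

The point of the sketch is the asymmetry GPU Boost exploits: light workloads climb toward the boost ceiling, while the worst-case app simply sits at base clock instead of dragging every other app down with it.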
Adaptive VSync
Traditional VSync is great for eliminating tearing until the frame rate drops below the target; then, if the renderer cannot hold exactly 60 fps, there is a severe drop, usually from 60 fps straight down to 30 fps. When that happens, there is a noticeable stutter.
Nvidia’s solution is to adjust VSync dynamically – turning it on and off instantaneously. In this way VSync continues to prevent tearing, but when the frame rate drops below 60 fps, VSync shuts off to reduce stuttering instead of drastically dropping from 60 to 30 fps or even lower. When the minimum target is met again, VSync kicks back in. In gaming, you never notice Adaptive VSync working; you just notice less stutter, especially in demanding games.
Adaptive VSync is a good solution that works well in practice. We did not spend a lot of time with it, as we rarely use VSync when we game and never when benching.
FXAA & TXAA
TXAA
There is a need for new kinds of anti-aliasing because many modern engines use deferred lighting, which suffers a heavy performance penalty when traditional MSAA is applied; the alternative – leaving the jaggies – is unacceptable. TXAA (Temporal Anti-Aliasing) mixes hardware multisampling with a custom high-quality AA resolve that uses a temporal component: samples gathered across successive frames are compared to produce a better AA result.
TXAA 1 exacts a performance cost similar to 2xMSAA while, under ideal circumstances, giving results similar to 8xMSAA. From what little time we have spent with it, it appears not quite as consistent as MSAA, but it works well in areas of high contrast. TXAA 2 is supposed to carry a performance penalty similar to 4xMSAA but with higher quality than 8xMSAA. We simply did not have time to investigate these claims, but we present Nvidia’s examples below.
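The temporal half of TXAA can be sketched as an exponential blend of each new frame’s color into an accumulated history. This is a generic temporal-resolve sketch, not Nvidia’s TXAA internals – the blend factor and structure are assumptions:

```python
# Illustrative temporal resolve: the current frame's (multisampled) color is
# blended into a per-pixel history accumulated over previous frames, so
# flickering edge pixels converge toward a stable, anti-aliased value.
def temporal_resolve(history_color, current_color, alpha=0.1):
    """Exponentially blend the new sample into the accumulated history."""
    return tuple(h * (1.0 - alpha) + c * alpha
                 for h, c in zip(history_color, current_color))

# An edge pixel flipping from black to white settles over several frames.
color = (0.0, 0.0, 0.0)
for frame in range(10):
    color = temporal_resolve(color, (1.0, 1.0, 1.0))
print(color)  # roughly (0.65, 0.65, 0.65) after ten frames
```

Real temporal AA also reprojects the history using motion vectors and rejects stale samples; the sketch only shows why samples "gathered over time" smooth an edge that single-frame sampling misses.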
How does TXAA compare with 8xMSAA? Pretty well, if these images are representative. Again, TXAA seems to work really well in areas of high-contrast lighting.
TXAA will be the subject of an IQ analysis in a forthcoming article. For now, it works and appears to be a great option for situations where MSAA doesn’t work efficiently.
FXAA
Nvidia has already implemented FXAA (Fast Approximate Anti-Aliasing). In practice, it works well in some games (Duke Nukem Forever, for example); in others it may look a bit blurry to some. First, a screenshot from the upcoming Samaritan demo with no AA – the jagged edges are intolerable.
Now 4xMSAA is applied to the entire screen, slowing performance considerably. Jaggies are visibly reduced at the cost of performance.
Here is the same scene with FXAA applied. In many cases, where there is strong contrast between light and dark areas, it works better than MSAA; in others it falls a bit short and blurs the entire scene. In this example, it works very well to eliminate aliasing.
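FXAA’s light performance cost comes from operating on the finished image rather than on extra samples. A heavily simplified sketch of its first step – finding edges by luminance contrast – with thresholds and names that are our assumptions (real FXAA also does edge-direction search and sub-pixel blending):

```python
# Illustrative sketch: FXAA-style edge detection on the final framebuffer.
# Pixels whose luminance differs sharply from a neighbor get blended; flat
# regions are left alone, which is why the cost is so low.
def luma(rgb):
    """Approximate perceptual luminance of an RGB color in [0, 1]."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def needs_aa(center, neighbors, threshold=0.25):
    """Flag a pixel whose luma contrasts sharply with any neighbor."""
    lc = luma(center)
    return any(abs(lc - luma(n)) > threshold for n in neighbors)

# A white pixel beside black neighbors is an edge; a flat gray region is not.
print(needs_aa((1, 1, 1), [(0, 0, 0), (1, 1, 1)]))      # True
print(needs_aa((0.5, 0.5, 0.5), [(0.5, 0.5, 0.5)]))     # False
```

Because the filter only sees final colors, it can soften legitimate texture detail along with aliasing, which is the blurriness the article notes in some games.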
Again, FXAA is a great option to have when MSAA kills performance. We plan to devote an entire evaluation to comparing IQ between the HD 7000 series, the GTX 600 series and older-series video cards.
Specifications
Here are the specifications for the GTX 680:
There were quite a few changes between the GTX 680 and the GTX 580.
Here are the specifications of the GTX 580
Right off we see the memory speed has taken a huge jump from 4Gbps to 6Gbps, while the memory interface has been narrowed from the GTX 580’s 384-bit to 256-bit; meanwhile, the vRAM has been increased from 1.5GB to 2GB! The core clock has gone up from 772MHz to 1006MHz while the power requirement has dropped from 244W down to 195W. We also notice that DisplayPort is now an option.
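The narrower bus and the faster memory largely cancel out. Using the rounded declared rates from the spec tables (the actual effective rates are slightly higher), peak bandwidth works out to roughly the same ~192 GB/s on both cards:

```python
# Peak memory bandwidth = (bus width in bytes) x (per-pin data rate).
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak GDDR5 bandwidth in GB/s from bus width and declared data rate."""
    return bus_width_bits / 8 * data_rate_gbps

gtx580 = bandwidth_gb_s(384, 4)  # 384-bit bus at 4Gbps
gtx680 = bandwidth_gb_s(256, 6)  # 256-bit bus at 6Gbps
print(gtx580, gtx680)  # 192.0 192.0 - the faster GDDR5 offsets the narrower bus
```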
The GeForce GTX 680 was designed from the ground up to deliver exceptional tessellation performance, which Nvidia claims is four times the HD 7970’s. Tessellation is a key component of Microsoft’s DirectX 11 development platform for PC games.
Tessellation allows game developers to take advantage of the GeForce GTX 680 GPU’s tessellation ability to increase the geometric complexity of models and characters, delivering far more realistic and visually rich gaming environments. Needless to say, the new GTX brings a lot of features to the table that Nvidia’s current customers will appreciate: improved CUDA and PhysX; 2D and 3D Surround, with the ability to drive up to three LCDs plus a fourth accessory display from a single GTX 680; superb tessellation capabilities; and a GPU that is really fast and power-efficient compared to the GTX 580.
Surround plus an Accessory display from a single card
One of the criticisms of Fermi that Kepler addresses is that two video cards in SLI were required to run 3-panel Surround or 3D Vision Surround. The GTX 680 can now run three displays plus an accessory display, and Nvidia has moved the taskbar from the left-side screen to the center screen. We didn’t find much difference with the taskbar in the center; it might be more convenient for some users.
One new option is in the bezel corrections. In the past, in-game menus would get occluded by the bezels, which was annoying if you used the correction. Now, with Bezel Peek, you can use hotkeys to instantly reveal the menus hidden by the bezel. However, this editor never uses bezel correction.
Nvidia claims a faster experience with custom resolutions because of faster center-display acceleration. Of course, we tested Surround’s 5760×1080 resolution and compared it to identical settings with Eyefinity driven off an HD 7970. Check out the results in the Performance Summary chart and the 3D Vision/Surround section.
Overclocking charts directly comparing the GTX 680 to the HD 7970 were just added to the performance summary.