Nvidia’s GTX 760 arrives to take direct aim at the HD 7950
Architecture and Features
We have covered Fermi’s GF100 architecture in a lot of detail previously. You can read our articles here, and also our three-part coverage of Nvidia’s GPU Technology Conference 2010 here, here and here. The Kepler architecture builds on the Fermi architecture with some important improvements and refinements.
SMX architecture
As Nvidia’s slide indicates, Kepler’s streaming multiprocessor design is called SMX, and it emphasizes 2x the performance per watt of Fermi. The multi-threaded engine handles all of the information using four graphics processing clusters (GPCs), each including a raster engine and two streaming multiprocessors.
The SM is now called the SMX cluster. Each SMX cluster includes a PolyMorph 2.0 engine, 192 CUDA cores, 16 texture units and a lot of high-level cache. To add it all up, 4 GPCs times 2 SMXs each, with each SMX including 192 CUDA cores, equals 1536 CUDA cores in the GPU used for the GTX 680 and the GTX 770.
The GTX 670 and the GTX 660 Ti each have 7 SMX clusters and 1344 CUDA Cores while the GTX 760 has 1152 CUDA Cores. The GTX 670 and the GTX 760 are built on a 256-bit memory interface while the GTX 660 Ti’s is 192-bit.
The GTX 760 keeps the full 32 ROPs, but the rest of the chip is cut down from the GTX 680/770: 96 texture units instead of 128, and 6 SMX units instead of 8, spread across 3 or 4 GPCs.
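The unit counts quoted above all follow from the per-SMX figures, and a few lines of Python make the arithmetic easy to check. This is only a sketch: the function name `unit_totals` is made up for illustration, and the GTX 760’s count of 6 SMX units is inferred here from 1152 / 192 rather than stated outright.

```python
# Per-SMX figures quoted in the article for Kepler.
CUDA_CORES_PER_SMX = 192
TEXTURE_UNITS_PER_SMX = 16


def unit_totals(smx_count):
    """Return (CUDA cores, texture units) for a GPU with smx_count active SMX units."""
    return smx_count * CUDA_CORES_PER_SMX, smx_count * TEXTURE_UNITS_PER_SMX


# 8 and 7 SMX units are quoted in the article; 6 for the GTX 760 is an
# inference from its 1152 CUDA cores (1152 / 192 = 6).
smx_counts = {"GTX 680/770": 8, "GTX 670/660 Ti": 7, "GTX 760": 6}

for name, smx in smx_counts.items():
    cores, tmus = unit_totals(smx)
    print(f"{name}: {smx} SMX -> {cores} CUDA cores, {tmus} texture units")
```

The results line up with the 1536, 1344 and 1152 CUDA cores and the 128 and 96 texture units mentioned in the text.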
Nvidia significantly improved its memory controller over the Fermi generation: the GTX 680 uses a 256-bit wide GDDR5 memory interface at a declared throughput of 6Gbps. The GTX 670 shares this interface and memory with the GTX 680, while the GTX 660 Ti pairs the same 6Gbps memory with its narrower 192-bit interface.
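To put the bus widths in perspective, peak memory bandwidth can be estimated as bus width times per-pin data rate, divided by 8 to convert bits to bytes. The sketch below assumes the 6Gbps figure applies to both bus widths; the helper name `peak_bandwidth_gb_s` is hypothetical, and the result is a theoretical peak, not a measured number.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits: width of the memory interface in bits (e.g. 256)
    data_rate_gbps: declared per-pin data rate in Gbps (e.g. 6)
    """
    return bus_width_bits * data_rate_gbps / 8  # 8 bits per byte


# Assumption: both interfaces run their GDDR5 at the declared 6Gbps.
for bus_width in (256, 192):
    print(f"{bus_width}-bit @ 6Gbps: {peak_bandwidth_gb_s(bus_width, 6)} GB/s")
```

On these assumptions the 256-bit interface works out to 192 GB/s and the 192-bit interface to 144 GB/s, which is why the narrower bus is the main handicap of the GTX 660 Ti.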
The above is a very brief overview of Kepler architecture as presented to the press at Kepler Editor’s Day in San Francisco last year and adapted for the 700 series.
Adaptive VSync
Traditional VSync is great for eliminating tearing until the frame rate drops below the target; then there is a severe drop, usually from 60 fps down to 30 fps, if the card cannot hold exactly 60. When that happens, there is a noticeable stutter.
Nvidia’s solution is to dynamically adjust VSync: to turn it on and off instantaneously. In this way VSync continues to prevent tearing, but when the frame rate drops below 60 fps, the driver shuts off VSync to reduce stuttering instead of drastically dropping frame rates from 60 to 30 fps or even lower. When the minimum target is met again, VSync kicks back in. In gaming, you never notice Adaptive VSync happening; you just notice less stutter (in demanding games, especially).
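The difference between the two modes can be sketched with a toy model. It assumes a double-buffered pipeline, a steady render rate and a 60 Hz display, and it is not a description of how Nvidia’s driver actually implements the feature; the function names are made up for illustration.

```python
import math


def vsync_fps(render_fps, refresh_hz=60):
    """Displayed frame rate with traditional VSync (toy model).

    Each frame must wait for a refresh boundary, so the displayed rate
    snaps to refresh_hz / n for a whole number n of refresh intervals.
    """
    if render_fps >= refresh_hz:
        return refresh_hz
    intervals = math.ceil(refresh_hz / render_fps)
    return refresh_hz / intervals


def adaptive_vsync_fps(render_fps, refresh_hz=60):
    """Displayed frame rate with Adaptive VSync (toy model).

    VSync stays on while the GPU keeps up with the display and is
    switched off as soon as the render rate falls below the refresh rate.
    """
    if render_fps >= refresh_hz:
        return refresh_hz  # VSync on: capped at the refresh rate, no tearing
    return render_fps      # VSync off: no snap down to 30 fps


for fps in (75, 59, 45, 25):
    print(fps, vsync_fps(fps), adaptive_vsync_fps(fps))
```

In this model a card rendering at 59 fps is shown at 30 fps with traditional VSync but at 59 fps with Adaptive VSync, which is the stutter reduction described above.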
Adaptive VSync is a good solution that works well in practice.
Specifications
Here are the specifications for the GTX 760:
Now let’s look at the GTX 670 specifications. Remember, however, that the GTX 670, formerly at $399, is being replaced by the GTX 770 (which is faster than the GTX 680), which also launched at $399.
Now let’s look at the $299 GTX 660 Ti’s specifications:
First, we notice that everything is identical between the GTX 670 and the GTX 660 Ti except for the GTX 670’s wider memory bus. The GTX 760 uses the GTX 670’s 256-bit bus, and its increased base core speed, up from the 915MHz of the GTX 670/660 Ti to 980MHz, allows it to nearly catch the GTX 670’s performance.
We note that the boost clock of the GTX 670 and GTX 660 Ti has also been increased, and the original Boost has been replaced with the improved Boost 2 of the 700 series. The TDP of the GTX 760 has gone up from the 150W of the GTX 660 Ti to match the 170W of the GTX 670; all of these cards require 6+6 pin PCIe power connectors.
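A rough way to see why the higher clock nearly closes the gap is to multiply CUDA cores by base clock. This is a crude proxy under stated assumptions: it ignores boost clocks, memory bandwidth and everything else that affects real game performance, and the helper name `shader_throughput_proxy` is hypothetical.

```python
def shader_throughput_proxy(cuda_cores, base_clock_mhz):
    """Crude shader throughput proxy: CUDA cores times base clock (MHz)."""
    return cuda_cores * base_clock_mhz


# Core counts and base clocks as quoted in the article.
gtx_760 = shader_throughput_proxy(1152, 980)
gtx_670 = shader_throughput_proxy(1344, 915)

clock_uplift = 980 / 915 - 1   # base clock increase of the GTX 760
ratio = gtx_760 / gtx_670      # GTX 760 relative to GTX 670

print(f"Base clock uplift: {clock_uplift:.1%}")
print(f"GTX 760 / GTX 670 proxy: {ratio:.1%}")
```

By this measure the GTX 760 has roughly 7% more base clock and lands at about 92% of the GTX 670, despite having one fewer SMX; actual benchmark results are what settle the question.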
Let’s take a closer look at the GTX 760.