The EVGA GTX 660 Ti Superclocked edition arrives
Architecture and Features
We have covered Kepler’s GK104 architecture in a lot of detail previously. You can read our GTX 680 introductory article and its follow-up. We also covered the launch of the GTX 690 and the launch of the GTX 670. The new Kepler architecture builds on Fermi with some important improvements and refinements that we will briefly cover here before we get into performance testing.
As Nvidia’s slide for the GTX 680 indicates, the new streaming multiprocessor design is called SMX, and it emphasizes twice the performance per watt of Fermi. The chip’s multi-threaded engine feeds four graphics processing clusters (GPCs), each containing a raster engine and two SMX units.
The SM is now called the SMX. Each SMX includes a PolyMorph 2.0 engine with its tessellation unit, 192 CUDA cores, 16 texture units and its share of high-level cache. In the full GTX 680 implementation, eight SMX units provide 1536 CUDA cores and 128 texture units, backed by 32 ROPs, four raster engines, and lower-level cache. The GTX 670 and the GTX 660 Ti keep all four GPCs but each has one SMX disabled (1344 CUDA cores); the GTX 660 Ti is further cut down to 24 ROPs.
The other main differentiation between the GTX 670/680 and the GTX 660 Ti is the memory bus, which is cut down from 256-bit to a much narrower 192-bit. Nvidia has nonetheless improved its memory controller over the last generation: the 192-bit GDDR5 interface runs at a declared 6 Gbps throughput.
The GeForce GTX 660 Ti’s memory runs at a 6008MHz data rate. The base clock of both the GeForce GTX 670 and the GTX 660 Ti is 915MHz, with a typical Boost Clock of 980MHz.
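Those two figures determine peak memory bandwidth, and they show exactly what the narrower bus costs. A quick back-of-the-envelope sketch, using only the numbers above:

```python
# Peak memory bandwidth = bus width (in bytes) x effective data rate.
def peak_bandwidth_gbps(bus_width_bits, data_rate_gtps):
    """Return peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gtps

gtx_660_ti = peak_bandwidth_gbps(192, 6.008)  # 192-bit bus, 6008MHz data rate
gtx_670    = peak_bandwidth_gbps(256, 6.008)  # 256-bit bus, same memory speed

print(f"GTX 660 Ti: {gtx_660_ti:.1f} GB/s")  # GTX 660 Ti: 144.2 GB/s
print(f"GTX 670:    {gtx_670:.1f} GB/s")     # GTX 670:    192.3 GB/s
```

In other words, the 660 Ti gives up a quarter of the GTX 670’s memory bandwidth while keeping the same shader throughput, which is worth remembering when we look at high-resolution results later.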
The GeForce GTX 680 reference board measures 11″ in length whereas the GTX 670 and GTX 660 Ti are 9.5″. Display outputs include two dual-link DVIs, HDMI and one mini-DisplayPort connector. Two 6-pin PCIe power connectors are required for both the GTX 660 Ti’s and the GTX 670’s operation.
This is a very brief overview of Kepler architecture as presented to the press at Kepler Editor’s Day in San Francisco a few months ago. We also attended Nvidia’s GPU Technology Conference (GTC) and you can find a lot more details about the architecture in our GTC 2012 report.
GPU Boost was invented by Nvidia to improve efficiency and to raise the GTX 660 Ti’s clocks automatically in response to dynamically changing power requirements. Until now, Nvidia engineers had to select clock speeds based on a specific “worst case” power target – often a benchmark.
Unfortunately, not all applications are equal in their power requirements; some are far more power-hungry than others. That means that in games with lower power requirements, the GPU cannot run at a higher core frequency because it is held back by a global power target.
With GPU Boost, there is real time dynamic clocking with polling every millisecond. In this way, clocks can be ramped up to meet the power target of each application – not held back by the most stressful application, which is usually a benchmark, not a game.
As we found with the GTX 680 and the GTX 670, GPU Boost goes hand-in-hand with overclocking and it delivers additional frequency in addition to the clocks set by the end user. GPU Boost continues to work with the GTX 660 Ti while overclocking to the maximum allowed by the ever-changing power envelope.
Raising the voltage also raises the attainable frequency and boost. In practice, if you monitor the frequencies, they constantly change up and down.
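Nvidia has not published the boost algorithm, but its behavior resembles a simple control loop around the power target. The sketch below is purely illustrative – the function name, the 150W target, and the 13MHz step are our own assumptions, not Nvidia’s implementation:

```python
# Hypothetical sketch of a GPU Boost-style control loop.
# All names and numbers here are illustrative assumptions;
# Nvidia's actual algorithm is proprietary.
BASE_CLOCK_MHZ = 915   # GTX 660 Ti base clock
MAX_BOOST_MHZ = 1058   # assumed top boost bin
STEP_MHZ = 13          # assumed clock adjustment step
POWER_TARGET_W = 150   # assumed board power target

def next_clock(current_mhz, measured_power_w):
    """Poll power draw and nudge the clock toward the power target."""
    if measured_power_w < POWER_TARGET_W and current_mhz < MAX_BOOST_MHZ:
        return current_mhz + STEP_MHZ   # headroom available: boost up
    if measured_power_w > POWER_TARGET_W and current_mhz > BASE_CLOCK_MHZ:
        return current_mhz - STEP_MHZ   # over target: back off
    return current_mhz                  # on target: hold

# A light workload leaves power headroom, so clocks climb,
# then settle as the power draw reaches the target:
clock = BASE_CLOCK_MHZ
for power_sample_w in [120, 125, 130, 150, 155]:
    clock = next_clock(clock, power_sample_w)
print(clock)  # -> 941
```

Polled every millisecond, a loop like this produces exactly the behavior described above: clocks that constantly float up and down with each application’s power draw.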
Traditional VSync is great for eliminating tearing until the frame rate drops below the target. With VSync on at 60Hz, a GPU that cannot sustain exactly 60 fps falls all the way down to 30 fps, and that sudden halving produces noticeable stutter.
Nvidia’s solution is to dynamically adjust VSync – to turn it on and off instantaneously. In this way VSync continues to prevent tearing but when it drops below 60 fps, it shuts off VSync to reduce stuttering instead of drastically dropping frame rates from 60 to 30 fps or even lower. When the minimum target is again met, VSync kicks back in. In gaming, you never notice Adaptive VSync is happening; you just notice less stutter (especially in demanding games).
Adaptive VSync is a good solution that works well in practice. We spent more time with Adaptive VSync by playing games and it is very helpful although we never use it when benching.
FXAA & TXAA
There is a need for new kinds of anti-aliasing because many modern engines use deferred lighting, which suffers a heavy performance penalty when traditional MSAA is applied. The alternative – jaggies – is unacceptable. TXAA (Temporal Anti-Aliasing) is a mix of hardware multisampling and a custom high-quality AA resolve that uses temporal components: samples gathered across frames are compared to give a better AA solution. Its main advantage is that it reduces shimmering and texture crawling when the camera is in motion.
TXAA 1 exacts a performance cost similar to 2xMSAA yet, under ideal circumstances, gives results similar to 8xMSAA. From what little time we have spent with it, it appears not quite as consistent as MSAA, but it works well in areas of high contrast. TXAA 2 is supposed to carry a performance penalty similar to 4xMSAA but with quality higher than 8xMSAA.
TXAA was the subject of our short IQ analysis of The Secret World – the first game to use it. So far, it appears to be a great option for situations where MSAA doesn’t work efficiently, and it almost completely eliminates shimmering and texture crawling when the camera is in motion. It works particularly well for The Secret World, as the slight blur gives the game a cinematic look.
Nvidia has already implemented FXAA – Fast Approximate Anti-Aliasing. In practice, it works well in some games (Duke Nukem Forever, Max Payne 3), while in others text or other visuals may be a bit blurry. FXAA is a great option to have when MSAA kills performance. We plan to devote an entire evaluation to comparing IQ between the HD 7000 series and the GTX 600 series, as well as comparisons with older-series video cards.
Here are Nvidia’s specifications for the reference GTX 660 Ti:
As discussed, everything is very similar to the GTX 670 but on a narrower bus. The GeForce GTX 660 Ti was also designed from the ground up to deliver exceptional tessellation performance, which Nvidia claims is several times the HD 7950’s. Tessellation is a key component of Microsoft’s DirectX 11 development platform for PC games.
Tessellation allows game developers to use the GTX 660 Ti’s tessellation capability to increase the geometric complexity of models and characters, delivering far more realistic and visually rich gaming environments. Needless to say, the new GTX 660 Ti brings a lot of features to the table that current Nvidia customers will appreciate: improved CUDA and PhysX support; 2D and 3D Surround, with the ability to drive up to three LCDs plus a fourth accessory display from a single GTX 660 Ti; superb tessellation capabilities; and a really fast, power-efficient GPU in comparison to the previous GTX 560 Ti.
Surround plus an Accessory display from a single card
One of the criticisms of Fermi that Kepler has addressed is that two video cards in SLI were required to run 3-panel Surround or 3D Vision Surround. The GTX 670, GTX 680, GTX 690 and now the GTX 660 Ti can each run three displays plus an accessory display from a single card. Interestingly, Nvidia has moved the Windows taskbar from the left-hand screen to the center screen. We now prefer the taskbar in the center; it may be more convenient for some users than clicking all the way over to the left for the Start menu, as with Eyefinity.
One thing we did notice: Surround and 3D Vision Surround are now just as easy to configure as AMD’s Eyefinity. And AMD has no real answer to 3D Vision or 3D Vision Surround – HD3D lacks basic support in comparison.
One new option with the GTX 660 Ti/670/680/690 is bezel correction. In the past, in-game menus would get occluded by the bezels, which was annoying if you used the correction. Now, with Bezel Peek, you can use hotkeys to instantly see the menus hidden by the bezel. However, this editor never uses bezel correction in gaming.
One thing we still note: Surround suffers from less tearing than Eyefinity, although AMD appears to be working on a solution in its latest drivers. The only true fix for tearing in Eyefinity is to use all native DisplayPort displays or to opt for the much more expensive active adapters. Most HD 7970s need two adapters to run Eyefinity, whereas you need only one for Surround with the GTX 660 Ti, GTX 670, or GTX 680.
Nvidia also claims a faster experience with custom resolutions thanks to faster center-display acceleration.
A look at the EVGA GTX 660 Ti Superclocked
The reference GTX 660 Ti sits on a short 9.5″ PCB, especially compared to the GTX 680. With the GeForce GTX 660 Ti, Nvidia’s board partners have the option to produce custom boards on launch day. Just as the GeForce GTX 670 was fitted into a smaller form factor, Nvidia made a number of adjustments to the 660 Ti reference board to save space, including moving the power supply closer to the GPU.
Display outputs include two dual-link DVIs, one HDMI, and one DisplayPort connector. Two 6-pin PCIe power connectors are required for operation. If a user fails to connect the power connectors properly, a brief message is displayed at boot-up instructing them to plug in the power connectors.
With the GTX 660 Ti’s power circuitry moved to the other side of the board, the area on the right side of the PCB was removed to save board space. The same cooling fan used on the GeForce GTX 670 is adapted for the GTX 660 Ti and it is fitted with acoustic dampening material to minimize unwanted tones in the fan noise. It is a pretty quiet card although not as quiet as the GTX 680.
The GTX 660 Ti’s blower fan exhausts hot air from the GPU outside the system chassis, helping to reduce temperatures inside the PC. This feature is particularly useful for small form factor PCs, including home theater PCs (HTPCs).
The GTX 660 Ti is set up for SLI or Tri-SLI using two or three GTX 660 Tis. We hope to bring you a follow-up evaluation comparing GTX 660 Ti SLI performance scaling against a single GTX 660 Ti. We received our second GTX 660 Ti too late to do any SLI benching, although we did install both cards into our case.
Super-Widescreen 5760x1080, Surround, 3D Vision Surround, and PhysX
The EVGA GTX 660 Ti is set up exactly the same way as the more expensive GTX 670 and GTX 680. Since the GTX 660 Ti is almost 15% slower than the GTX 670 overall, one can reasonably expect the performance delta to be about the same for super-widescreen resolutions as well as for Surround, 3D Vision Surround and PhysX, as in our last evaluation of the GTX 670 in May. The HD 7970 is a stronger performer than the GTX 670 in some games at the highest resolutions, and this will be even more evident against the GTX 660 Ti because of its narrower bus.
For 3D Vision and for Surround, several games need to have their settings reduced. Just remember that you are playing across three screens and, for 3D Vision, are also rendering each scene twice! Turning on PhysX affects the frame rate, but the GTX 660 Ti is still fast enough to play with fully maxed-out details and FXAA or AAA, unlike the GTX 560 Ti it replaces.
Our EVGA GTX 660 Ti Superclocked edition is already overclocked +65MHz over the Nvidia reference clocks. We were able to overclock a further +70MHz with complete stability, even though we adjusted neither the voltage nor our fan profile. We also managed +190MHz on the memory clocks, which is considerably lower than the +400MHz we managed on the GTX 670 and the +550MHz on the GTX 680 and GTX 690.
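The resulting core clocks are easy to tally. A quick sketch, assuming the two offsets simply stack on the 915MHz reference base clock:

```python
# Tallying the core clock offsets on our sample card
# (assumption: offsets stack directly on the reference base clock).
REFERENCE_BASE_MHZ = 915  # Nvidia reference base clock
FACTORY_OC_MHZ = 65       # EVGA Superclocked factory offset
MANUAL_OC_MHZ = 70        # our additional stable offset

factory_base = REFERENCE_BASE_MHZ + FACTORY_OC_MHZ
final_base = factory_base + MANUAL_OC_MHZ
print(factory_base, final_base)  # -> 980 1050
```

That puts our stable base clock at 1050MHz before GPU Boost adds its own headroom on top.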
Even with the further overclock, temperatures stayed below 80C and the fan rarely exceeded 30%. The EVGA GTX 660 Ti Superclocked is a quiet card, very similar to the reference GTX 670.
Check out the performance summary charts, and particularly the overclocking charts, to note how well the GTX 660 Ti scales. The specifications look extraordinary, with solid improvements over the Fermi-based GTX 560 Ti. Let’s check out performance after unboxing our EVGA GTX 660 Ti Superclocked. Head to the next page for the unboxing and then to the test configuration.