The GTX 670 Arrives – is it a game changer?
Architecture and Features
We have covered the Kepler GK104 architecture in a lot of detail previously. You can read our GTX 680 introductory article and its follow-up. The new Kepler architecture builds on Fermi with some important improvements and refinements that we will briefly cover before we get into performance testing.
SMX architecture
As Nvidia’s slide indicates, the new architecture is called SMX, and it emphasizes twice the performance per watt of Fermi. The GigaThread engine distributes work across four graphics processing clusters, each containing a raster engine and two streaming multiprocessors.
The SM is now called the SMX. Each SMX includes a PolyMorph 2.0 engine with its own tessellation unit, 192 CUDA cores, 16 texture units and dedicated low-level cache. The full GK104 chip combines eight of these SMX units (128 texture units in all) with four raster engines, 32 ROPs and shared high-level cache.
Nvidia has really improved their memory controller over the last generation: the GTX 670 gets a 256-bit wide GDDR5 memory interface with a declared throughput of 6Gbps.
The memory subsystem of the GeForce GTX 670 is identical to the GTX 680’s, consisting of four 64-bit memory controllers (256-bit in total) with 2GB of GDDR5 memory running at a 6008MHz data rate. The base clock speed of the GeForce GTX 670 is 915MHz and the typical Boost Clock speed is 980MHz.
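Those two numbers – the 256-bit bus and the 6008MHz effective data rate – fix the card’s peak memory bandwidth. A quick sanity check of the arithmetic (illustrative Python, nothing card-specific):

```python
# Peak bandwidth = effective data rate x bus width in bytes
data_rate_hz = 6008e6        # 6008MHz effective GDDR5 data rate
bus_width_bits = 256         # four 64-bit memory controllers
bandwidth_gb_s = data_rate_hz * (bus_width_bits / 8) / 1e9
print(f"{bandwidth_gb_s:.1f} GB/s")  # ~192.3 GB/s peak
```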
The GeForce GTX 680 reference board measures 10″ in length whereas the GTX 670 is 9.5″. Display outputs include two dual-link DVIs, one HDMI port and one mini-DisplayPort connector. Two 6-pin PCIe power connectors are required for operation.
This is a very brief overview of the Kepler architecture as presented to the press at Kepler Editor’s Day in San Francisco a few weeks ago. When we attend Nvidia’s upcoming GPU Technology Conference (GTC) in less than two weeks’ time, you can expect a lot more detail about the architecture.
GPU Boost
GPU Boost was invented by Nvidia to improve efficiency and to raise the GTX 670’s clocks automatically in response to dynamically changing power requirements. Until now, Nvidia engineers had to select clock speeds against a specific “worst case” power target – often a benchmark. Unfortunately, not all apps are equal in their power requirements, and some applications are far more power-hungry than others. That means that in games with lower power requirements, the GPU cannot run at a higher core frequency because it is held back by a global power target. With GPU Boost, there is real-time dynamic clocking with polling every millisecond. In this way, clocks can be ramped up to meet the power target of each application – not held back by the most stressful application, which is usually a benchmark, not a game.
As we found with the GTX 680, GPU Boost goes hand-in-hand with overclocking, delivering additional frequency on top of the clocks set by the end user. GPU Boost continues to work while overclocking, up to the maximum allowed by the ever-changing power envelope.
Moving the voltage higher also moves the frequency and boost higher. In practice, if you monitor the frequencies, they constantly change up and down.
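Conceptually, GPU Boost behaves like a simple control loop: sample board power at a fixed interval and step the core clock up or down toward the power target. Here is a minimal sketch of that idea in Python – hypothetical numbers and helper functions, not Nvidia’s actual firmware logic:

```python
import time

POWER_TARGET_W = 170      # hypothetical board power target
BASE_CLOCK_MHZ = 915      # GTX 670 base clock
MAX_BOOST_MHZ = 1084      # hypothetical top boost bin
STEP_MHZ = 13             # hypothetical boost bin size
POLL_INTERVAL_S = 0.001   # polling every millisecond, per Nvidia

def boost_loop(read_board_power_w, set_core_clock_mhz):
    """Toy GPU Boost governor: ramp the clock while under the power
    target, back off when the running application pushes past it."""
    clock_mhz = BASE_CLOCK_MHZ
    while True:
        power_w = read_board_power_w()  # sensor read (assumed helper)
        if power_w < POWER_TARGET_W and clock_mhz < MAX_BOOST_MHZ:
            clock_mhz += STEP_MHZ       # light load: ramp up one bin
        elif power_w > POWER_TARGET_W and clock_mhz > BASE_CLOCK_MHZ:
            clock_mhz -= STEP_MHZ       # heavy load: step back down
        set_core_clock_mhz(clock_mhz)   # apply (assumed helper)
        time.sleep(POLL_INTERVAL_S)
```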
Adaptive VSync
Traditional VSync is great for eliminating tearing until the frame rate drops below the target – then, if the GPU cannot hold exactly 60 fps, there is a severe drop, usually from 60 fps straight down to 30 fps. When that happens, there is a noticeable stutter.
Nvidia’s solution is to adjust VSync dynamically – turning it on and off instantaneously. In this way VSync continues to prevent tearing, but when the frame rate drops below 60 fps, the driver shuts VSync off to reduce stuttering instead of letting the frame rate drop drastically from 60 to 30 fps or even lower. When the minimum target is met again, VSync kicks back in. In gaming, you never notice Adaptive VSync working; you just notice less stutter (especially in demanding games).
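The per-frame decision is simple to state: keep VSync on while the renderer can sustain the refresh rate, and drop it the moment it cannot. A toy illustration of that logic (our sketch, not Nvidia’s driver code):

```python
REFRESH_HZ = 60

def vsync_enabled(last_frame_time_s: float) -> bool:
    """Keep VSync on only when the last frame rendered fast enough
    to hit the refresh rate; otherwise allow tearing rather than
    letting the frame rate collapse from 60 fps to 30 fps."""
    return (1.0 / last_frame_time_s) >= REFRESH_HZ

print(vsync_enabled(0.014))  # 14ms frame (~71 fps): True, VSync stays on
print(vsync_enabled(0.020))  # 20ms frame (50 fps): False, VSync drops out
```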
Adaptive VSync is a good solution that works well in practice. We spent more time playing games with Adaptive VSync and found it very helpful, although we never use it when benching.
FXAA & TXAA
TXAA
There is a need for new kinds of anti-aliasing as many modern engines use deferred lighting, which suffers a heavy performance penalty when traditional MSAA is applied. The alternative – living with jaggies – is unacceptable. TXAA (Temporal Anti-Aliasing) is a mix of hardware multi-sampling with a custom high-quality AA resolve that uses a temporal component: samples gathered across frames are compared to give a better AA solution.
TXAA 1 exacts a performance cost similar to 2xMSAA yet, under ideal circumstances, gives results similar to 8xMSAA. Of course, from what little time we have spent with it, it appears to be not quite as consistent as MSAA, but it works well in areas of high contrast. TXAA 2 is supposed to have a performance penalty similar to 4xMSAA but with higher quality than 8xMSAA.
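The temporal component boils down to blending the current frame’s samples with samples accumulated from previous frames, so edges that shimmer frame-to-frame average out. A bare-bones numpy sketch of a generic temporal resolve (the general technique, not Nvidia’s actual TXAA filter):

```python
import numpy as np

HISTORY_WEIGHT = 0.9  # how much accumulated history to keep each frame

def temporal_resolve(history: np.ndarray, current: np.ndarray) -> np.ndarray:
    """Exponentially blend the current frame into the history buffer;
    pixels that alias differently from frame to frame settle toward
    a stable, anti-aliased average."""
    return HISTORY_WEIGHT * history + (1.0 - HISTORY_WEIGHT) * current

# A flickering edge pixel that alternates between black and white...
history = np.array([0.0])
for frame in range(20):
    current = np.array([float(frame % 2)])  # aliased sample flips each frame
    history = temporal_resolve(history, current)
print(history)  # ...settles near 0.5, a stable anti-aliased value
```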
TXAA will be the subject of an IQ analysis in a forthcoming article, and we are told that we shall see games that support it natively this year. For now, it appears to be a great option for situations where MSAA doesn’t work efficiently.
FXAA
Nvidia has already implemented FXAA (Fast Approximate Anti-Aliasing). In practice, it works well in some games (Duke Nukem Forever), while in others text may be a bit blurry. FXAA is a great option to have when MSAA kills performance. We plan to devote an entire evaluation to comparing IQ between the HD 7000 series and the GTX 600 series, as well as comparisons with older-series video cards.
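FXAA is a cheap post-process: estimate luma per pixel, look for high local contrast, and blur only along those edges. A crude per-pixel illustration with numpy (capturing the idea, not the real FXAA shader – blurring in screen space is also why thin text can come out slightly soft):

```python
import numpy as np

EDGE_THRESHOLD = 0.125  # hypothetical local-contrast threshold

def luma(rgb: np.ndarray) -> np.ndarray:
    """Perceptual brightness estimate used for edge detection."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def fxaa_pixel(center, left, right, up, down):
    """Blend a pixel toward its neighbours only where the local luma
    contrast suggests an aliased edge; flat regions pass through."""
    neighbours = np.stack([left, right, up, down])
    contrast = luma(neighbours).max() - luma(neighbours).min()
    if contrast < EDGE_THRESHOLD:
        return center                               # no edge detected
    return 0.5 * center + 0.5 * neighbours.mean(axis=0)

# Example: a white pixel on a black/white boundary gets softened.
white, black = np.ones(3), np.zeros(3)
print(fxaa_pixel(white, black, white, white, black))  # [0.75 0.75 0.75]
```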
Specifications
Here are the specifications for the GTX 680:
Here are the specifications of the GTX 670:
We see everything is very similar to the GTX 680. The GeForce GTX 670 was designed from the ground up to deliver exceptional tessellation performance, which Nvidia claims is about eight times that of the HD 7950. Tessellation is a key component of Microsoft’s DirectX 11 development platform for PC games.
Tessellation allows game developers to take advantage of the GTX 670 GPU’s tessellation ability to increase the geometric complexity of models and characters and deliver far more realistic and visually rich gaming environments. Needless to say, the new GTX 670 brings a lot of features to the table that Nvidia’s current customers will appreciate: improved CUDA and PhysX support; 2D and 3D Surround, with the ability to drive up to three LCDs plus a fourth accessory display from a single GTX 670; superb tessellation capabilities; and a really fast and power-efficient GPU compared to the previous GTX 570.
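At its core, tessellation just subdivides coarse triangles into many finer ones on the GPU so that models can carry more geometric detail. A toy midpoint-subdivision sketch in Python (standing in for what the DirectX 11 tessellation stages do in hardware):

```python
def midpoint(a, b):
    return tuple((a[i] + b[i]) / 2 for i in range(3))

def subdivide(tri):
    """Split one triangle into four via its edge midpoints – one
    level of the kind of refinement the tessellator performs."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

# Each tessellation level quadruples the triangle count.
mesh = [((0, 0, 0), (1, 0, 0), (0, 1, 0))]
for _ in range(3):
    mesh = [small for tri in mesh for small in subdivide(tri)]
print(len(mesh))  # 64 triangles from one input triangle after 3 levels
```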
Surround plus an Accessory display from a single card
One of the criticisms of Fermi that Kepler has addressed is that two video cards in SLI were required to run 3-panel Surround or 3D Vision Surround. A single GTX 670, 680 or 690 can now run three displays plus an accessory display. Interestingly, Nvidia has moved the taskbar from the left-hand display to the center screen. We now prefer the taskbar in the center; it may be more convenient for some users than mousing all the way over to the left display for the Start menu, as with Eyefinity.
One thing that we did notice: Surround and 3D Vision Surround are now just as easy to configure as AMD’s Eyefinity. And AMD has no real answer to 3D Vision or 3D Vision Surround – HD3D lacks basic support in comparison.
One new option with the GTX 670/680/690 concerns bezel correction. In the past, in-game menus would get occluded by the bezels, which was annoying if you used the correction. Now, with Bezel Peek, you can use hotkeys to instantly see the menus hidden by the bezel. However, this editor never uses bezel correction in gaming.
One thing that we did note: Surround suffers from less tearing than Eyefinity. The only true solution to tearing in Eyefinity is to use all native DisplayPort displays or to opt for the much more expensive active adapters. And with most HD 7970s you will need two adapters to run Eyefinity, whereas you only need one for Surround on the GTX 670 and GTX 680.
Nvidia claims a faster experience with custom resolutions because of faster center-display acceleration. Of course, we tested Surround’s 5760×1080 resolution and even 3D Vision Surround. Check out the results in the Performance Summary chart and the 3D Vision/Surround section.
A look at the GTX 670
Nvidia has redesigned their GEFORCE logo, and the GTX 670 sits on a shorter PCB than the GTX 680. With the GeForce GTX 670, Nvidia’s board partners have the option to produce custom GTX 670 boards on launch day. To get the GeForce GTX 670 into smaller form factor chassis, Nvidia made a number of adjustments to the reference board, saving space by moving the GTX 670’s power supply much closer to the GPU.
With the GTX 670’s power circuitry relocated, the area on the right side of the PCB could be removed to save board space. The same cooling fan used on the GeForce GTX 680 is adapted for the GTX 670, fitted with acoustic dampening material to minimize unwanted tones in the fan noise. It is a pretty quiet card, although not as quiet as the GTX 680.
The GTX 670’s blower fan exhausts hot air from the GPU out of the system chassis, helping to reduce the temperature inside the PC. This is particularly useful for small form factor PCs, including home theater PC (HTPC) builds.
Here you can see the GTX 670 with its cover removed.
And here is the bare PCB.
SLI
The GTX 670 is set up for SLI by pairing two GTX 670s. We hope to bring you a follow-up evaluation comparing GTX 670 SLI performance to the GTX 690.
The specifications look extraordinary with solid improvements over the Fermi-based GTX 570. Let’s check out performance after we look at our test configuration on the next page.