The GTX 690 Arrives – Exotic Industrial Design takes the Performance Crown!
Architecture and Features
We have covered Kepler’s GK104 architecture in a lot of detail previously. You can read our GTX 680 introductory article and its follow-up. The new Kepler architecture builds on the Fermi architecture with some important improvements and refinements that we will briefly cover before we get into performance testing.
SMX architecture
As Nvidia’s slide indicates, the new streaming multiprocessor design is called SMX, and it emphasizes 2x the performance per Watt of Fermi. Their multi-threaded engine handles all of the incoming work using four graphics processing clusters (GPCs), each including a raster engine and two streaming multiprocessors.
The SM is now called the SMX. Each SMX includes a PolyMorph 2.0 engine, 192 CUDA cores, 16 texture units and a lot of high-level cache. Across the whole GPU, that adds up to four raster units, 128 texture units and eight geometry units, each with a tessellation unit, alongside 32 ROPs and more lower-level cache.
To sum it all up, a GTX 680 GPU consists of four GPCs with two SMXs each, for eight SMXs in total. With 192 CUDA cores per SMX, that equals 1536 CUDA cores for a GTX 680, and this is doubled to 3072 for the GTX 690 because two full GTX 680 GPUs are used.
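The core count is easy to double-check with a few lines of arithmetic. The sketch below assumes the GK104 layout of four GPCs with two SMX units each and 192 CUDA cores per SMX; the constant and function names are our own, purely for illustration.

```python
# Assumed GK104 layout: 4 GPCs x 2 SMX per GPC x 192 CUDA cores per SMX.
GPCS_PER_GPU = 4
SMX_PER_GPC = 2
CORES_PER_SMX = 192


def cuda_cores(gpus=1):
    """Return the total CUDA core count for a card with `gpus` full GK104 GPUs."""
    return gpus * GPCS_PER_GPU * SMX_PER_GPC * CORES_PER_SMX


print("GTX 680:", cuda_cores(1))  # single GPU
print("GTX 690:", cuda_cores(2))  # two full GPUs on one board
```

This gives 1536 cores for the single-GPU GTX 680 and 3072 for the dual-GPU GTX 690.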
Nvidia has really improved its memory controller over the last generation: there is a 256-bit wide GDDR5 memory interface with a declared throughput of 6Gbps for each of the two GPUs. An onboard PLX bridge chip provides independent PCI Express 3.0 x16 access to both GPUs for maximum multi-GPU throughput.
The memory subsystem of the GeForce GTX 690 is similar to the GTX 680, consisting of four 64-bit memory controllers (256-bit) with 2GB of GDDR5 memory per GPU (4GB total).
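For a rough sense of what these numbers mean, the peak memory bandwidth per GPU can be estimated from the bus width and the per-pin data rate. This is a back-of-the-envelope sketch using the nominal 6Gbps figure; the function name is our own and the result is a theoretical peak, not a measured value.

```python
def peak_bandwidth_gb_s(controllers, bits_per_controller, gbps_per_pin):
    """Theoretical peak bandwidth in GB/s: total bus width (bits) x data rate / 8."""
    bus_width_bits = controllers * bits_per_controller
    return bus_width_bits * gbps_per_pin / 8


# Four 64-bit controllers (256-bit) at a nominal 6 Gbps per pin, per GPU.
per_gpu = peak_bandwidth_gb_s(controllers=4, bits_per_controller=64, gbps_per_pin=6.0)
print(per_gpu)  # theoretical peak for one GPU, in GB/s
```

With these inputs the estimate works out to 192 GB/s per GPU. Each GPU has its own memory pool, so the two figures should not simply be added together when thinking about what a single frame can access.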
The base clock speed of the GeForce GTX 690 is 915MHz. The typical Boost Clock speed is 1019MHz. The Boost Clock speed is based on the average GeForce GTX 690 card running a wide variety of games and applications. Note that the actual Boost clock will vary from game-to-game depending on actual system conditions. GeForce GTX 690’s memory speed is 6008MHz data rate.
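To put the quoted clocks in perspective, here is a small sketch that works out how much headroom the typical Boost Clock adds over the base clock. The helper name is our own, and the result only describes the typical case, since actual boost varies by game.

```python
def boost_uplift_percent(base_mhz, boost_mhz):
    """Percentage increase of the boost clock over the base clock."""
    return (boost_mhz - base_mhz) / base_mhz * 100


uplift = boost_uplift_percent(base_mhz=915, boost_mhz=1019)
print(f"Typical boost uplift: {uplift:.1f}%")
```

That is a 104MHz step, or a little over 11 percent above the base clock.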
The GeForce GTX 690 reference board measures 11″ in length. Display outputs include three dual-link DVIs, and one mini-DisplayPort connector. Two 8-pin PCIe power connectors are required for its operation.
This is a very brief overview of the Kepler architecture as presented to the press at Kepler Editor’s Day in San Francisco a few weeks ago. When we attend Nvidia’s upcoming GPU Technology Conference (GTC) in less than two weeks’ time, you can expect a lot more details about the architecture.
GPU Boost
GPU Boost was invented by Nvidia to improve efficiency and to raise the GTX 690 clocks automatically in response to dynamically changing power requirements. Up until now, Nvidia engineers had to select clock speeds on a specific “worst case” power target – often a benchmark.
Unfortunately, not all apps are equal in their power requirements, and some applications are far more power-hungry than others. That means that in games with lower power requirements, the GPU cannot run at a higher core frequency because it is limited by a global power target. With GPU Boost, there is real-time dynamic clocking with polling every millisecond. In this way, clocks can be ramped up to meet the power target of each application and are not held back by the most stressful application, which is usually a benchmark, not a game.
As we found with the GTX 680, GPU Boost goes hand-in-hand with overclocking, and it delivers extra frequency on top of the clocks set by the end user. GPU Boost continues to work while overclocking, up to the maximum allowed by the ever-changing power envelope.
Moving the voltage higher also moves the frequency and boost higher. In practice, if you monitor the frequencies, they constantly change up and down.
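The general idea of power-targeted clocking can be illustrated with a toy control loop. This is only a sketch of the concept, not Nvidia's actual algorithm: the step size, the 300W power target, the power samples, and the use of the typical Boost Clock as a hard ceiling are all assumptions made for illustration.

```python
BASE_CLOCK_MHZ = 915
MAX_CLOCK_MHZ = 1019  # assumption: treat the typical Boost Clock as a ceiling
STEP_MHZ = 13         # hypothetical size of one clock step


def next_clock(current_mhz, measured_power_w, power_target_w):
    """Raise the clock when under the power target, lower it when over."""
    if measured_power_w < power_target_w:
        candidate = current_mhz + STEP_MHZ
    elif measured_power_w > power_target_w:
        candidate = current_mhz - STEP_MHZ
    else:
        candidate = current_mhz
    # Never drop below the base clock or exceed the assumed ceiling.
    return max(BASE_CLOCK_MHZ, min(MAX_CLOCK_MHZ, candidate))


def simulate(power_samples_w, power_target_w):
    """Run the controller over a series of power readings, one per poll."""
    clock = BASE_CLOCK_MHZ
    history = []
    for power in power_samples_w:
        clock = next_clock(clock, power, power_target_w)
        history.append(clock)
    return history


# Made-up power readings: a light scene, then a brief heavy spike.
print(simulate([200, 200, 310, 200], power_target_w=300))
```

With these made-up readings the clock steps up twice, backs off once during the spike, and then climbs again, which mirrors the up-and-down behavior you see when monitoring real frequencies.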
Adaptive VSync
Traditional VSync is great for eliminating tearing until the frame rate drops below the target. Then there is a severe drop, usually from 60 fps down to 30 fps, if the card cannot hold exactly 60. When that happens, there is a noticeable stutter.
Nvidia’s solution is to dynamically adjust VSync by turning it on and off instantaneously. In this way VSync continues to prevent tearing, but when the frame rate drops below 60 fps, the driver shuts off VSync to reduce stuttering instead of drastically dropping frame rates from 60 to 30 fps or even lower. When the minimum target is again met, VSync kicks back in. In gaming, you never notice Adaptive VSync is happening; you just notice less stutter (especially in demanding games).
Adaptive VSync is a good solution that works well in practice. We spent more time with Adaptive VSync while playing games and it is very helpful, although we never use it when benching.
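The 60-to-30 fps drop and the adaptive alternative can be modeled in a few lines. This is a simplified steady-state sketch that assumes classic double-buffered VSync, where a frame that misses a refresh has to wait for the next one; the function names are our own, and real drivers are more nuanced.

```python
import math


def vsync_fps(render_fps, refresh_hz=60):
    """Displayed frame rate with traditional (double-buffered) VSync.

    Each frame is held for a whole number of refresh intervals, so the
    displayed rate snaps to refresh_hz / 1, / 2, / 3, and so on.
    """
    intervals = math.ceil(refresh_hz / render_fps)
    return refresh_hz / intervals


def adaptive_vsync_fps(render_fps, refresh_hz=60):
    """Displayed frame rate with Adaptive VSync.

    VSync stays on while the GPU can keep up with the refresh rate and is
    switched off (allowing some tearing) when it cannot.
    """
    if render_fps >= refresh_hz:
        return float(refresh_hz)
    return float(render_fps)


for fps in (90, 55, 25):
    print(fps, vsync_fps(fps), adaptive_vsync_fps(fps))
```

In this model a GPU rendering at 55 fps is displayed at 30 fps with traditional VSync but at 55 fps with Adaptive VSync, while a GPU rendering at 90 fps is capped at 60 fps either way.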
FXAA & TXAA
TXAA
There is a need for new kinds of anti-aliasing, as many modern engines use deferred lighting, which suffers a heavy performance penalty when traditional MSAA is applied. The alternative, living with jaggies, is unacceptable. TXAA (Temporal Anti-Aliasing) is a mix of hardware multi-sampling with a custom high-quality AA resolve that uses temporal components (samples that are gathered over microseconds are compared to give a better AA solution).
There is TXAA 1, which exacts a performance cost similar to 2xMSAA and which under ideal circumstances gives results similar to 8xMSAA. Of course, from what little time we have spent with it, it appears to be not quite as consistent as MSAA, but it works well in areas of high contrast. TXAA 2 is supposed to have a similar performance penalty to 4xMSAA but with higher quality than 8xMSAA.
TXAA will be the subject of an IQ analysis in a forthcoming article and we are told that we shall see games that support it natively this year. For now, it appears to be a great option for the situations where MSAA doesn’t work efficiently.
FXAA
Nvidia has already implemented FXAA (Fast Approximate Anti-Aliasing). In practice, it works well in some games (Duke Nukem Forever), while in other games text may be a bit blurry for some. FXAA is a great option to have when MSAA kills performance. We plan to devote an entire evaluation to comparing IQ between the HD 7000 series and the GTX 600 series, as well as comparisons with the older series of video cards.
Specifications
Here are the specifications for the GTX 680:
Here are the specifications of the GTX 690; it is a near-doubling of the GTX 680’s specifications, and 8+8-pin PCIe power connectors are used.
We see everything is just about double when compared to the GTX 680. The only difference is that the base clock is set a bit lower, but the GPU Boost is set higher to nearly compensate. The GeForce GTX 690 was designed from the ground up to deliver exceptional tessellation performance, which Nvidia claims is about 8 times the HD 7970’s tessellation performance. Tessellation is the key component of Microsoft’s DirectX 11 development platform for PC games.
Tessellation allows game developers to take advantage of the tessellation ability of both of the GTX 690’s GPUs to increase the geometric complexity of models and characters, delivering far more realistic and visually rich gaming environments. Needless to say, the new GTX 690 brings a lot of features to the table that Nvidia’s current customers will appreciate. These include improved CUDA-based PhysX; 2D and 3D Surround, plus the ability to drive up to three LCDs and a fourth accessory display from a single GTX 690, with no adapters required for the most popular dual-link DVI displays; superb tessellation capabilities; and a really fast and power-efficient design in comparison to Nvidia’s previous dual-GPU flagship, the GTX 590.
Surround plus an Accessory display from a single card
One of the criticisms of Fermi that Kepler addressed was that two video cards in SLI were required to run 3-panel Surround or 3D Vision Surround. The GTX 680 and the GTX 690 can now run three displays plus an accessory display, and Nvidia has moved the taskbar from the left screen to the center screen. We didn’t find much difference with the taskbar in the center; it might be more convenient for some users.
One thing that we did notice: Surround and 3D Vision Surround are now just as easy to configure as AMD’s Eyefinity. AMD has no real answer to 3D Vision or 3D Vision Surround, as HD3D lacks basic support in comparison.
One new option with the GTX 680/690 is in the bezel corrections. In the past, the in-game menus would get occluded by the bezels, and it was annoying if you used the correction. Now with Bezel Peek, you can use hotkeys to instantly see the menus hidden by the bezel. However, this editor does not ever use bezel correction in gaming.
Nvidia claims a faster experience with the custom resolutions because of a faster center display acceleration. Of course, we tested Surround’s 5760×1080 resolution and even 3D Vision Surround. Check out the results in the Performance Summary chart and the 3D Vision/Surround section.
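As a quick illustration of why Surround is so demanding, the sketch below computes the combined resolution and pixel count of a three-panel setup. The function names are our own, and the bezel correction value in the second call is purely hypothetical, since the real value depends on the monitors used.

```python
def surround_resolution(panel_w, panel_h, panels=3, bezel_px=0):
    """Combined resolution of side-by-side panels.

    bezel_px is the number of hidden pixels added per gap between panels
    when bezel correction is enabled (0 means no correction).
    """
    width = panel_w * panels + bezel_px * (panels - 1)
    return width, panel_h


def pixel_count(resolution):
    """Total number of pixels rendered per frame."""
    width, height = resolution
    return width * height


plain = surround_resolution(1920, 1080)
corrected = surround_resolution(1920, 1080, bezel_px=100)  # hypothetical bezel value
print(plain, pixel_count(plain))
print(corrected, pixel_count(corrected))
```

Without bezel correction, three 1920×1080 panels give 5760×1080, which is exactly three times the pixels of a single 1080p display; enabling bezel correction adds hidden pixels on top of that.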
just WOW!!!!!
I added a section on Overclocking, Power Draw and Temperatures that compares the overclocked and overvolted HD 7970’s power draw to the overclocked GTX 690.
Also, added the charts that specifically focus on performance scaling that comes from overclocking the GTX 690. Ivy Bridge might be too slow for some games at 1920×1200!
This graphics card really looks amazing! I can’t believe the pure power it packs. The price is quite high, however – I guess Nvidia can justify this as some of the components are quite rare.