Nvidia’s Titan arrives to take the performance crown – the Preview
Last autumn, the U.S. Department of Energy's Oak Ridge National Laboratory (ORNL) launched a new era of scientific supercomputing with Titan, which took the title of fastest supercomputer in the world. Titan was made possible by Nvidia's new Tesla K20X accelerators, which handle roughly ninety percent of its computing load. That same GPU is now in the GTX Titan, and Nvidia wants you to be able to have your own personal supercomputer.
Using 18,688 Nvidia Tesla K20X GPU accelerators, the Titan supercomputer took the world's top spot with a performance record of 17.59 petaflops as measured by the LINPACK benchmark. Best of all, the Tesla K20X accelerator is energy efficient: Titan achieved 2,142.77 megaflops of performance per watt, which surpasses the energy efficiency of the number one system on the most recent Green500 list of the world's most energy-efficient supercomputers.
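As a quick sanity check, the Green500 metric and the LINPACK score together imply the machine's power draw during the run. Here is a minimal back-of-the-envelope sketch; the ~8.2 megawatt figure is derived, not an official ORNL number:

```python
# Back-of-the-envelope check on Titan's Green500 efficiency figure.
# The LINPACK score and MFLOPS/watt are from the article; the implied
# power draw is derived here, not an official ORNL number.

linpack_pflops = 17.59      # LINPACK (Rmax) result in petaflops
mflops_per_watt = 2142.77   # Green500 efficiency metric

linpack_mflops = linpack_pflops * 1e9            # 1 petaflop = 1e9 megaflops
implied_power_w = linpack_mflops / mflops_per_watt

print(f"Implied power draw: {implied_power_w / 1e6:.2f} megawatts")
# -> about 8.21 MW for the whole machine during the LINPACK run
```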
The GTX Titan uses the same GK110 GPU as Nvidia's Kepler-architecture K20 family of accelerators – the K20 and the K20X. The K20X provides the highest computing performance ever available in a single processor, surpassing all other processors on two common measures of computational performance – 3.95 teraflops single-precision (SP) and 1.31 teraflops double-precision (DP) peak floating point performance.
Titan is Double Precision Capable
The GeForce Titan can also run double-precision compute at one-third of its single-precision rate, giving over 1 teraflop of double-precision peak floating point performance. However, the DP rate is set to 1/24th of single-precision by default, since no games use double-precision calculations. The full 1/3rd ratio can be enabled via the control panel, yet doing so forces the GPU's clocks down. Nvidia is actually looking to grow its CUDA ecosystem by providing Titan GeForce cards to programmers on a budget, since the K20X costs over $3,000; of course, Titan will not provide the full feature set of the Tesla professional cards.
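To put numbers on that tradeoff, here is a minimal sketch of Titan's two DP modes, assuming the usual 2 FLOPs per CUDA core per clock (via fused multiply-add) and Titan's 2688 cores (detailed in the SMX breakdown below). Treat the second figure as a ceiling, since clocks in full-DP mode can drop below base:

```python
# Titan's two double-precision modes, as described above. Assumes
# 2 FLOPs per CUDA core per clock (FMA); 2688 cores per the SMX math below.

CORES = 2688

def dp_tflops(core_clock_mhz: float, dp_ratio: float) -> float:
    sp_tflops = CORES * 2 * core_clock_mhz / 1e6   # single-precision peak
    return sp_tflops * dp_ratio

print(f"{dp_tflops(876, 1/24):.2f} TF")  # default 1/24 rate at boost clock: ~0.20 TF
print(f"{dp_tflops(837, 1/3):.2f} TF")   # full 1/3 rate at base clock: ~1.50 TF
# In full-DP mode the clocks can fall below base, so the real-world peak
# lands somewhere above the 1 teraflop cited above.
```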
The new K20 family also includes the Tesla K20 accelerator, which provides 3.52 teraflops of single-precision and 1.17 teraflops of double-precision peak performance. The K20X runs a 732MHz core clock and a 5.2GHz memory clock, while the K20 is clocked slightly lower at 706MHz with the same memory clock. The GTX Titan is clocked at 837MHz on the core with a Boost clock of 876MHz, and its memory runs at 6GHz.
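Those peak figures fall straight out of the shader count and core clock. A minimal sketch, assuming 2 FLOPs per core per clock and the Tesla parts' 1/3 DP rate; the K20's 2496-core count is from Nvidia's published specs rather than the text above:

```python
# Reproduce Nvidia's peak-throughput figures from cores x clock.
# Assumes 2 FLOPs per CUDA core per clock (fused multiply-add) and the
# 1/3 DP rate of the Tesla Kepler parts. Core counts are Nvidia's
# published specs: 2688 (K20X) and 2496 (K20).

def peak_tflops(cores: int, core_clock_mhz: float) -> tuple[float, float]:
    sp = cores * 2 * core_clock_mhz / 1e6   # single-precision teraflops
    return sp, sp / 3                       # (SP, DP) peak

for name, cores, mhz in (("K20X", 2688, 732), ("K20", 2496, 706)):
    sp, dp = peak_tflops(cores, mhz)
    print(f"{name}: {sp:.2f} TF SP, {dp:.2f} TF DP")
# K20X: 3.94 TF SP, 1.31 TF DP   (Nvidia quotes 3.95 / 1.31)
# K20:  3.52 TF SP, 1.17 TF DP
```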
Pictured above is Nvidia's GK110 Kepler GPU, which at 7.1 billion transistors is among the most complex pieces of silicon ever produced. Although we are primarily gamers, we realize that the very same GPUs that power supercomputers as Tesla accelerators are used in our GeForce video cards. ABT has been following GK110 closely since we covered Nvidia's GTC 2012, where the new architecture was unveiled.
The GPU Technology Conference 2012 was not about gaming GPUs, and this layout is not spelled out in the whitepaper, but look very carefully at the die shot above: there are clearly 5 Graphics Processing Clusters (GPCs) with 3 SMX modules per GPC. A GPC is constructed like a "mini GPU" containing SMXs – two in the case of GK104 and three in the case of GK110.
A completely functional GK110 GPU would be made up of 15 SMXes for a total of 2880 CUDA cores (192 x 15). However, for yield purposes, and to differentiate faster, more complex, and more expensive processors from slower, simpler, and less expensive products, parts are often disabled and clock speeds lowered. In both the Titan and the K20X, 14 SMXes are enabled, for a total of 2688 CUDA cores.
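The SMX arithmetic is simple enough to spell out:

```python
# GK110 SMX math as described above: 192 CUDA cores per SMX.
CORES_PER_SMX = 192

full_gk110 = 15 * CORES_PER_SMX   # 2880 cores in a fully enabled die
titan_k20x = 14 * CORES_PER_SMX   # 2688 cores with one SMX disabled
print(full_gk110, titan_k20x)     # -> 2880 2688
```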
The Specifications
Here are the specifications as released in Nvidia’s chart.
Features
Compared to the GeForce GTX 680, the GeForce GTX Titan boasts 75% more cores and texture units for handling core graphics functions like pixel/vertex/geometry shading and texture filtering. Titan's 187.5 Gigatexels/sec fill rate is over 45% higher than the GeForce GTX 680's, and with a 384-bit memory interface and 6GB of memory running at a 6GHz effective memory clock, the GeForce GTX Titan boasts 50% more memory bandwidth than the GTX 680.
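Those percentages follow from the raw specs. A minimal sketch; the GTX 680's reference figures (1536 cores, 128 texture units at 1006MHz, 256-bit bus) are assumed from Nvidia's published specs rather than the chart above:

```python
# Check Titan's claimed advantages over the GTX 680 from raw specs.
# GTX 680 reference figures (1536 cores, 128 texture units @ 1006 MHz,
# 256-bit bus @ 6 GHz effective) are Nvidia's published specs.

def tex_fill_gtexels(tex_units: int, core_mhz: float) -> float:
    return tex_units * core_mhz / 1000.0    # gigatexels per second

def bandwidth_gbs(bus_bits: int, eff_clock_ghz: float) -> float:
    return (bus_bits / 8) * eff_clock_ghz   # bytes per transfer x GT/s

print(f"cores: {2688 / 1536 - 1:.0%} more")                      # 75%
titan_fill = tex_fill_gtexels(224, 837)                          # ~187.5 GT/s
gtx680_fill = tex_fill_gtexels(128, 1006)                        # ~128.8 GT/s
print(f"fill rate: {titan_fill / gtx680_fill - 1:.1%} higher")   # ~45.6%
titan_bw, gtx680_bw = bandwidth_gbs(384, 6.0), bandwidth_gbs(256, 6.0)
print(f"bandwidth: {titan_bw / gtx680_bw - 1:.0%} more")         # 50%
```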
There are also some new or reworked Kepler features in Titan over the GTX 600 series. The first is GPU Boost 2.0.
GPU Boost 2.0
The original GPU Boost was designed to reach the highest possible clock speed while remaining within a predefined power target; the GPU would boost to the maximum clock it could achieve while staying under a certain power level. In the case of the GTX 680, that power level was 170 watts. Nvidia noted that GPU power was unnecessarily limiting performance when GPU temperatures were low. Therefore, for Boost 2.0, they switched from boosting based on a GPU power target to a GPU temperature target of 80 degrees Celsius. As a result, the GeForce GTX Titan will automatically boost to the highest clock frequency it can achieve as long as the GPU temperature remains at or below 80C. The GPU constantly monitors its temperature, adjusting its clock and voltage on the fly to maintain this target. GPU Boost 2.0 can also deliver lower noise levels: since temperature is controlled to a tighter range around the user target, the fan speed stays stable, which reduces overall system noise.
In addition to switching from a power-based boost target to a temperature-based target, Nvidia also gave end users more controls for tweaking GPU Boost behavior. Using software tools provided by add-in card partners, Titan users can adjust the GPU temperature target. By adjusting the temperature target higher, the GPU will then boost to higher clock speeds until it reaches the new temperature target.
Due to the change in the way GPU Boost 2.0 functions, the power target setting no longer sets the typical board power; rather, it sets the board's maximum power. At the default power target setting of 100%, the max power setting is 250W. At the max slider setting of 106%, max board power is 265W. Note that typical board power will vary based on the ambient temperature. GPU Boost 2.0 is also designed to ensure maximum performance from water-cooled solutions, since temperature now essentially determines the clocks.
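Conceptually, the Boost 2.0 loop looks something like the sketch below. This is our illustration of the behavior described above, not Nvidia's actual (non-public) firmware algorithm, and the 13MHz step is just the Kepler boost-bin granularity used for illustration:

```python
# Conceptual sketch of the GPU Boost 2.0 behavior described above.
# This is an illustration, not Nvidia's actual algorithm.

TEMP_TARGET_C = 80.0        # default temperature target
MAX_POWER_W = 250.0 * 1.06  # 106% power slider -> 265 W hard cap

def next_clock(clock_mhz: float, temp_c: float, power_w: float,
               step_mhz: float = 13.0) -> float:
    """Step clocks up while under the temp target and power cap; else back off."""
    if temp_c <= TEMP_TARGET_C and power_w < MAX_POWER_W:
        return clock_mhz + step_mhz   # headroom left: boost one bin higher
    return clock_mhz - step_mhz       # too hot or over the cap: drop one bin
```

Raising the temperature slider in this model simply raises TEMP_TARGET_C, which is why a higher target yields higher sustained clocks.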
GPU Boost 2.0 – Overvolting
Because Titan's boost clock and voltage level are now tied to GPU temperature, Nvidia allows the GPU voltage to go higher than it allowed with the Kepler 600 series cards. Just as with the 600 series, voltages on the Titan are limited to a range fully qualified by Nvidia. This voltage range is designed to protect the silicon from long-term damage.
However, for those who want to push their GPUs to the limit by raising the maximum voltage further, GPU Boost 2.0 enables extra "overvoltaging" capability. This unlocking requires users to acknowledge the risk to their GPU's warranty by clicking through a warning, as overvoltaging is disabled by default. Each individual Titan manufacturer may limit the degree of overvoltaging supported by its cards; support for overvoltaging is optional and can be completely disabled in the VBIOS if the manufacturer chooses to do so.
GPU Boost 2.0 – Display Overclocking
Many gamers prefer playing their games with vertical sync (VSync) enabled. While this prevents tearing, enabling VSync caps the frame rate at the refresh rate of your monitor, which is 60Hz for most LCDs. As a result, the game is limited to a maximum of 60 frames per second, even though your GPU could be rendering the scene at a much higher frame rate.
With GPU Boost 2.0, Nvidia added a new feature that makes display overclocking possible to deliver a higher effective frame rate. Using tools provided by Titan manufacturers, you may be able to overclock the pixel clock of your display. However, not many displays support overclocking.
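The arithmetic behind display overclocking is straightforward: refresh rate is the pixel clock divided by the total pixels per frame, blanking included. A minimal sketch using the standard CEA-861 timings for 1080p; whether a given panel actually tolerates a higher pixel clock is the hard part, as noted above:

```python
# Refresh rate from pixel clock: refresh = pixel_clock / (h_total * v_total).
# Uses the standard CEA-861 totals for 1920x1080 (blanking included).

H_TOTAL, V_TOTAL = 2200, 1125          # active pixels plus blanking

def refresh_hz(pixel_clock_mhz: float) -> float:
    return pixel_clock_mhz * 1e6 / (H_TOTAL * V_TOTAL)

print(f"{refresh_hz(148.5):.1f} Hz")   # 60.0 Hz, the standard 1080p pixel clock
print(f"{refresh_hz(165.0):.1f} Hz")   # ~66.7 Hz, if the panel accepts it
```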
FXAA & TXAA
TXAA
There is a need for new kinds of anti-aliasing because many modern engines use deferred lighting, which suffers a heavy performance penalty when traditional MSAA is applied. The alternative – living with jaggies – is unacceptable. TXAA (Temporal Anti-Aliasing) is a mix of hardware multisampling with a custom high-quality AA resolve that uses a temporal component: samples gathered over time are compared to give a better AA solution.
TXAA 1 exacts a performance cost similar to 2xMSAA while, under ideal circumstances, giving results similar to 8xMSAA. Of course, from what little time we have spent with it, it appears not quite as consistent as MSAA, though it works well in areas of high contrast. TXAA 2 is supposed to have a performance penalty similar to 4xMSAA but with higher quality than 8xMSAA.
FXAA
Nvidia has already implemented FXAA (Fast Approximate Anti-Aliasing). In practice, it works well in some games (Duke Nukem Forever), while in other games text may look a bit blurry to some. FXAA is a great option to have when MSAA kills performance. We plan to devote an entire evaluation to comparing image quality between the HD 7000 series and the GTX 600 series, as well as comparisons with older video cards.
Surround plus an Accessory display from a single card
One of the criticisms of Fermi that Kepler addressed was the need for two video cards in SLI to run 3-panel Surround or 3D Vision Surround. The GTX 680 and the GTX 690 can now run three displays plus an accessory display, and Nvidia has moved the taskbar from the left side to the center screen. We didn't find much difference with the taskbar in the center; it might be more convenient for some users.
One thing we did notice: Surround and 3D Vision Surround are now just as easy to configure as AMD's Eyefinity. And AMD has no real answer to 3D Vision or 3D Vision Surround – HD3D lacks basic support in comparison.
One option, new to the GTX 600 series and shared by the GTX Titan, is in the bezel corrections. In the past, in-game menus would get occluded by the bezels, which was annoying if you used the correction. Now, with Bezel Peek, you can use hotkeys to instantly see the menus hidden by the bezel. However, this editor never uses bezel correction in gaming.
Nvidia claims a faster experience at custom resolutions because of faster center-display acceleration. Of course, we tested Surround's 5760×1080 resolution and even 3D Vision Surround. Check out the results in the Performance Summary chart and the 3D Vision/Surround section of part two.
Hi there, first of all, thank you for a great site. I love your extended benchmark suite; it's so great to see a site not use the same few games again and again. There are a few things, though, that I'm missing.
1) It would be nice if you could take a screenshot of the image settings in the Catalyst and Nvidia control panels. There are just so many settings, and I'm not quite sure how to interpret what you write.
2) Looking at your previous "The war of the WHQL" article, I was missing an easy way to compare overall performance. Something like a performance index with and without AA would be great, and perhaps highlight the highest score in the table.
3) I'm not sure what the most common clock speed of the 680 and 7970 is, but since few people buy reference cards, I think you should use a clock speed that more closely resembles what people get when they buy an Asus, MSI, or Sapphire card. The performance difference is quite large between the slowest and fastest Asus 670, for instance. The TOP model (which I think is the most popular one) has a GPU Boost clock of 1137MHz and a GPU base clock of 1058MHz, compared to a Boost clock of 980MHz and a base clock of 915MHz. In benchmarks the fastest model is a lot faster; there is really a big difference, but many don't seem to pay any attention to it. Also, your 7970 is listed as overclocked? I'm not sure what the most popular speed of a 7970 is, but it just caught my eye.
4) Don’t stop testing for smoothness
Thank-you for your comments.
1) As to the control panels, they are set to default ("use application settings"), except that Nvidia's has the power limitations removed, High Quality is used instead of Quality, and VSync is off. In the AMD control panel, High Quality is used instead of Quality, and surface and other optimizations, including tessellation, are off (application settings override the CP settings).
2) We almost never bench without AA. We do highlight the highest scores when we are only looking at two sets of drivers to see the performance changes; we don't usually do it when we are comparing four sets of drivers (two AMD and two Nvidia).
3) We use the reference clocks for the GTX 680 and the reference clocks for an HD 7970 at GHz Edition speeds (with the boost locked on).
4) We are resuming frame-time benching this week.
You might consider joining ABT forum. We’d love to have your input there!