The GTX 780 arrives – 25 Games benchmarked!
Last autumn, the U.S. Department of Energy’s Oak Ridge National Laboratory (ORNL) launched a new era of scientific supercomputing with Titan, which took the title of fastest supercomputer in the world. Titan was made possible by using Nvidia’s new Tesla K20X accelerators to carry ninety percent of the computing load. The same basic GK110 GPU is in the GTX Titan and now also in the GTX 780.
Using 18,688 Nvidia Tesla K20X GPU accelerators, the Titan supercomputer took the world’s top spot with a performance record of 17.59 petaflops as measured by the LINPACK benchmark. Best of all, the Tesla K20X accelerator is energy efficient: Titan achieved 2,142.77 megaflops of performance per watt, which surpasses the energy efficiency of the number one system on the most recent Green 500 list of the world’s most energy-efficient supercomputers.
The GeForce GTX Titan uses the same GK110 GPU that powers Nvidia’s new Kepler-architecture K20 family of accelerators – the K20 and the K20X. The K20X provides the highest computing performance ever available in a single processor, surpassing all other processors on two common measures of computational performance – 3.95 teraflops single-precision (SP) and 1.31 teraflops double-precision (DP) peak floating point performance.
Differences between the GTX 780 and Titan
The GeForce GTX Titan can also run double-precision compute at one-third of single-precision speed, giving over 1 teraflop of double-precision peak floating point performance. Nvidia is actually looking to grow its CUDA ecosystem by providing Titan GeForce cards to programmers on a budget, since the K20X costs over $3,000. In contrast, there is no significant double-precision capability available on the GTX 780 for compute; all of its resources are devoted to gaming performance.
The new K20 family also includes the Tesla K20 accelerator, which provides 3.52 teraflops of single-precision and 1.17 teraflops of double-precision peak performance. The K20X has a 732MHz core clock and a 5.2GHz memory clock, while the K20 is clocked slightly lower at 706MHz with the same memory clock. The GTX Titan GPU has an 837MHz core clock with a Boost to 876MHz, and its memory is clocked at 6GHz. In contrast, the GTX 780 is clocked higher at 863MHz, with a more aggressive boost backed up by a significantly higher TDP. The GTX 780 is a pure gaming card!
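The peak figures quoted above can be roughly reproduced from unit counts and clocks. The following is only a back-of-the-envelope sketch. It assumes that each CUDA core completes one fused multiply-add (counted as two operations) per clock, and that each SMX carries 64 double-precision units, which is where the one-third ratio comes from. Neither assumption is stated in this article, and the Titan line uses its base clock only.

```python
# Rough peak-throughput estimate for GK110 parts.
# Assumptions (not from the article): 2 FLOPs per unit per clock (one FMA)
# and 64 double-precision units per SMX.

CORES_PER_SMX = 192      # single-precision CUDA cores per SMX (from the article)
DP_UNITS_PER_SMX = 64    # assumption: one-third of the SP core count
FLOPS_PER_CLOCK = 2      # assumption: a fused multiply-add counts as two


def peak_tflops(smx_count, core_mhz, units_per_smx):
    """Return peak teraflops for the given SMX count, clock and unit type."""
    units = smx_count * units_per_smx
    return units * FLOPS_PER_CLOCK * core_mhz * 1e6 / 1e12


# (enabled SMX units, core clock in MHz) as quoted in the article
cards = {
    "Tesla K20X": (14, 732),
    "Tesla K20": (13, 706),
    "GTX Titan": (14, 837),
}

for name, (smx, mhz) in cards.items():
    sp = peak_tflops(smx, mhz, CORES_PER_SMX)
    dp = peak_tflops(smx, mhz, DP_UNITS_PER_SMX)
    print(f"{name}: {sp:.2f} TFLOPS SP, {dp:.2f} TFLOPS DP")
```

The K20 result matches Nvidia’s quoted numbers, and the K20X result lands within about one percent of them, which suggests the assumptions are reasonable even if the published figures are rounded differently.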
The new GTX 780 is slightly cut down from the K20 as it has 12 instead of 13 SMX units enabled – two fewer than Titan – but it runs at a higher clock and with a higher boost in an attempt to compensate.
Above is pictured Nvidia’s GK110 Kepler GPU which at 7.1 billion transistors is the most complex piece of silicon anywhere. Although we are primarily gamers, we realize that these very same GPUs that power supercomputers as Tesla are used in our GeForce video cards. ABT has been following GK110 closely since we covered Nvidia’s GTC 2012 where the new architecture was unveiled.
The GPU Technology Conference 2012 was not about gaming GPUs, and gaming wasn’t mentioned in the whitepaper, but look very carefully at the die shot above. It is obvious that there are 5 Graphics Processing Clusters (GPCs) and 3 SMX modules per GPC. A GPC is constructed like a “mini GPU” which contains SMXes; two in the case of GK104 and three in the case of GK110.
A completely functional GK110 GPU would be made up of 15 SMXes for a total of 2880 CUDA cores (192 x 15). However, for yield purposes and to differentiate faster, more complex, and more expensive processors from less expensive, slower and less complex products, parts are often disabled and clockspeeds lowered. Both Titan and the K20X have 14 SMXes enabled for a total of 2688 CUDA cores. The K20 has 13 SMXes enabled and the GTX 780 has 12 SMXes.
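The SMX arithmetic is easy to check for every GK110 product mentioned here. This is a minimal sketch using only the counts given above; the dictionary and function names are just illustrative.

```python
# CUDA core counts derived from the number of enabled SMX units.

GPCS = 5              # Graphics Processing Clusters on a full GK110
SMX_PER_GPC = 3       # SMX units per GPC on GK110
CORES_PER_SMX = 192   # CUDA cores per SMX

FULL_SMX = GPCS * SMX_PER_GPC  # 15 on a fully enabled die


def cuda_cores(smx_enabled):
    """Return the CUDA core count for a part with this many SMXes enabled."""
    return smx_enabled * CORES_PER_SMX


enabled = {"Full GK110": FULL_SMX, "GTX Titan": 14, "Tesla K20X": 14,
           "Tesla K20": 13, "GTX 780": 12}

for name, smx in enabled.items():
    share = smx / FULL_SMX
    print(f"{name}: {smx} SMX, {cuda_cores(smx)} cores ({share:.0%} of the die's SMXes)")
```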
The GeForce GTX 780 ships with 12 SMX units providing 2304 CUDA Cores. The memory subsystem of GeForce GTX 780 consists of six 64-bit memory controllers (384-bit) with 3GB of GDDR5 memory.
The base clock speed of the GeForce GTX 780 is 863MHz. The typical Boost Clock speed is 900MHz. The Boost Clock speed is based on the average GeForce GTX 780 card running a wide variety of games and applications. Note that the actual Boost clock will vary from game to game depending on actual system conditions. The GeForce GTX 780’s memory runs at a 6008MHz data rate.
The GeForce GTX 780 reference board measures 10.5” in length. Display outputs include two dual-link DVIs, one HDMI and one DisplayPort connector. One 8-pin PCIe power connector and one 6-pin PCIe power connector are required for operation. The GeForce GTX 780 will be taking the place of the GeForce GTX 680 in Nvidia’s lineup.
The Specifications
Here are the GTX 780 specifications as released in Nvidia’s chart.
Now compare to the GTX Titan and notice that although the GTX 780 is lacking in CUDA cores, it has a higher clockspeed, boost and TDP:
Features
Compared to the GeForce GTX 680, the GeForce GTX Titan boasts 75% more cores and texture units for handling core graphics functions like pixel/vertex/geometry shading and texture filtering. Titan’s 187.5 Gigatexels/sec fill-rate is over 45% higher than the GeForce GTX 680’s, and with a 384-bit memory interface and 6GB of memory running at a 6GHz effective memory clock, the GeForce GTX Titan boasts 50% more memory bandwidth than the GTX 680. Similarly, the GeForce GTX 780’s 384-bit memory interface provides up to 288.4GB/sec of peak memory bandwidth to the GPU using 3GB of memory at a 6GHz effective memory clock.
Compared to the GTX 680, the GTX 780 has 50% more cores and 50% more memory. There are also some new or reworked Kepler features for Titan that are also present in the GTX 780 but not in the GTX 600 series. The first is GPU Boost 2.0.
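Both headline numbers follow from simple formulas: bandwidth is bus width in bytes times the data rate, and texel fill-rate is texture units times core clock. The sketch below is only a sanity check. The GTX 680’s 256-bit bus and the figure of 16 texture units per SMX (224 on Titan) are assumptions not stated in this article.

```python
# Peak memory bandwidth and texel fill-rate from bus width, data rate and clocks.


def memory_bandwidth_gbs(bus_width_bits, data_rate_mhz):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_mhz * 1e6 / 1e9


def texel_rate_gts(texture_units, core_mhz):
    """Peak fill-rate in Gigatexels/sec: one texel per unit per clock."""
    return texture_units * core_mhz * 1e6 / 1e9


gtx780_bw = memory_bandwidth_gbs(384, 6008)   # 384-bit bus, 6008MHz data rate
gtx680_bw = memory_bandwidth_gbs(256, 6008)   # assumption: 256-bit bus on GTX 680

print(f"GTX 780 bandwidth: {gtx780_bw:.1f} GB/s")
print(f"Gain over a 256-bit bus at the same data rate: {gtx780_bw / gtx680_bw - 1:.0%}")

# Assumption: 14 SMXes x 16 texture units each = 224 units on Titan.
print(f"Titan fill-rate: {texel_rate_gts(14 * 16, 837):.1f} Gigatexels/sec")
```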
GPU Boost 2.0
The original GPU Boost was designed to reach the highest possible clock speed while remaining within a predefined power target, so that the GPU would boost to the maximum clock speed it could achieve while remaining under a certain power level. In the case of the GTX 680, that power level was 170 watts. Nvidia noted that GPU power was unnecessarily limiting performance when GPU temperatures were low. Therefore, for Boost 2.0, they switched from boosting based on a GPU power target to a GPU temperature target. This new temperature target is 80 degrees Celsius. As a result of Nvidia’s change, the GeForce GTX 780 GPU will automatically boost to the highest clock frequency it can achieve as long as the GPU temperature remains at or below 80C. The GPU constantly monitors its temperature, adjusting its clock and voltage on-the-fly to maintain this temperature. GPU Boost 2.0 can also deliver lower noise levels since temperature is controlled to a tighter range around the user target. This in turn helps keep the fan speed stable and reduces the overall system noise level.
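To make the idea of a temperature target concrete, here is a toy control loop. It is purely illustrative and is not Nvidia’s actual algorithm, which also involves voltage and power limits. The step size and the upper clock limit are hypothetical values chosen for the sketch, and only the 863MHz base clock and the 80C target come from this article.

```python
# Illustrative temperature-target boost loop; NOT Nvidia's actual algorithm.

BASE_MHZ = 863        # base clock quoted in the article
TARGET_C = 80         # default temperature target quoted in the article
STEP_MHZ = 13         # hypothetical size of one clock step
MAX_BOOST_MHZ = 1006  # hypothetical ceiling for this sketch


def next_clock(current_mhz, temp_c, target_c=TARGET_C):
    """Return the clock for the next interval given the current temperature."""
    if temp_c > target_c:
        # Too hot: back off one step, but never below the base clock here.
        return max(BASE_MHZ, current_mhz - STEP_MHZ)
    # At or below target: climb one step until the ceiling is reached.
    return min(MAX_BOOST_MHZ, current_mhz + STEP_MHZ)


clock = BASE_MHZ
for temp in (62, 68, 74, 79, 82, 81, 78):  # made-up temperature readings
    clock = next_clock(clock, temp)
    print(f"{temp}C -> {clock}MHz")
```

Passing a higher `target_c` mimics raising the temperature target: the same readings then allow the clock to keep climbing for longer before it backs off.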
In addition to switching from a power-based boost target to a temperature-based target, Nvidia also gave end users more controls for tweaking GPU Boost behavior. Using software tools provided by add-in card partners, users can adjust the GPU temperature target. When the temperature target is raised, the GPU will boost to higher clock speeds until it reaches the new temperature target.
Due to the change in the way GPU Boost 2.0 functions, the power target setting no longer sets the typical board power, but rather it sets the board’s max power. At the default power target setting of 100%, the max power setting is 250W. At the max slider setting of 106%, max board power is 265W. Note that typical board power will vary based on the ambient temperature. GPU Boost 2.0 is designed to ensure maximum performance from water cooled solutions since the temperature now basically determines the clocks.
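The power target slider is a straight percentage of the 250W default, so the 106% figure can be verified in a couple of lines. The function name below is just for illustration.

```python
# Max board power as a percentage of the default power target.

DEFAULT_MAX_POWER_W = 250  # max power at the 100% setting, from the article


def max_board_power(target_percent):
    """Return max board power in watts for a given power target setting."""
    return DEFAULT_MAX_POWER_W * target_percent / 100


for percent in (100, 106):
    print(f"{percent}% power target -> {max_board_power(percent):.0f}W max board power")
```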
GPU Boost 2.0 – Overvolting
Because the GTX 780’s boost clock and voltage level are now tied to GPU temperature, Nvidia allows the GPU voltage to go higher than it did with Kepler 600 series cards. Just as with the 600 series, voltages on the GTX 780 are limited to a range fully qualified by Nvidia. This voltage range is designed to protect the silicon from long term damage.
However, for those who want to push their GPUs to the limit by raising the maximum voltage further, GPU Boost 2.0 enables extra “overvoltaging” capability. Because overvoltaging is disabled by default, this “unlocking” requires users to acknowledge the risk to their GPU’s warranty by clicking through a warning.
GPU Boost 2.0 – Display Overclocking
Many gamers prefer playing their games with Vertical Sync (VSync) enabled. While this prevents tearing, enabling VSync caps the frame rate to the refresh rate of your monitor, which is 60 Hz for most LCDs. As a result, the game is limited to a maximum frame rate of 60 frames per second, even though your GPU could be rendering the scene at a much higher frame rate.
With GPU Boost 2.0, Nvidia added a new feature that makes display overclocking possible to deliver a higher effective frame rate.
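The effect of the VSync cap, and of raising the refresh rate, can be shown with a simplified model. This sketch treats the displayed frame rate as the lower of the render rate and the refresh rate, and it ignores the extra frame-rate drops that double-buffered VSync can cause when the GPU falls below the refresh rate. The 80 Hz overclock is a hypothetical value, since what a given panel can sustain varies.

```python
# Simplified VSync model: the display cannot show more frames than it refreshes.


def vsync_fps(render_fps, refresh_hz):
    """Return the effective frame rate with VSync on (simplified model)."""
    return min(render_fps, refresh_hz)


render_fps = 95  # made-up example: what the GPU could render without VSync

print(f"Stock 60 Hz display: {vsync_fps(render_fps, 60)} fps")
print(f"Display overclocked to 80 Hz (hypothetical): {vsync_fps(render_fps, 80)} fps")
```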
FXAA & TXAA
TXAA
There is a need for new kinds of anti-aliasing, as many modern engines use deferred lighting, which suffers a heavy performance penalty when traditional MSAA is applied. The alternative, living with jaggies, is unacceptable. TXAA – Temporal Anti-Aliasing – is a mix of hardware multi-sampling with a custom high quality AA resolve that uses temporal components (samples that are gathered over micro-seconds are compared to give a better AA solution).
TXAA 1 exacts a performance cost similar to 2xMSAA and, under ideal circumstances, gives results similar to 8xMSAA. Of course, from what little time we have spent with it, it appears to be not quite as consistent as MSAA but works well in areas of high contrast. TXAA 2 is supposed to have a similar performance penalty to 4xMSAA but with higher quality than 8xMSAA.
FXAA
Nvidia has already implemented FXAA – Fast Approximate Anti-Aliasing. In practice, it works well in some games and is very useful when MSAA kills performance or is not available in the game engine.
Surround plus an Accessory display from a single card
Unlike Fermi, which required SLI to do so, all Kepler cards including the GTX 780 can run three displays plus an accessory display from a single card. Surround and 3D Vision Surround are now just as easy to configure as AMD’s Eyefinity. Some will prefer Nvidia’s centered Windows taskbar to Eyefinity’s, which is on the far left.
Bezel corrections are available to all Kepler series cards supporting Surround including the GTX 780. In the past, the in-game menus would get occluded by the bezels. However, with Bezel Peek, you can use hotkeys to instantly see the menus hidden by the bezel.
FYI eyefinity has a centered taskbar as well
where the hell is 3dvision benchs ?