Announcing Nvidia’s GK110 Tesla K20 and K20X
Two weeks ago, the U.S. Department of Energy’s Oak Ridge National Laboratory (ORNL) launched a new era of scientific supercomputing with Titan. Today, Titan officially took the title of fastest supercomputer in the world. Titan was made possible by Nvidia’s new Tesla K20X accelerators, which handle roughly ninety percent of its computing load.
Using 18,688 Nvidia Tesla K20X GPU accelerators, the Titan supercomputer took the world’s top spot with a performance record of 17.59 petaflops as measured by the LINPACK benchmark. Best of all, the Tesla K20X accelerator is energy efficient: Titan achieved 2,142.77 megaflops of performance per watt, surpassing the energy efficiency of the number one system on the most recent Green500 list of the world’s most energy-efficient supercomputers.
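As a quick back-of-the-envelope check (our own arithmetic, not an official ORNL or Nvidia figure), those two numbers imply Titan’s total power draw during the LINPACK run:

$$\frac{17.59 \times 10^{15}\ \text{FLOPS}}{2{,}142.77 \times 10^{6}\ \text{FLOPS/W}} \approx 8.2\ \text{megawatts}$$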
To coincide with SC12 in Salt Lake City – where it was confirmed today that Titan is the world’s fastest supercomputer on the TOP500 list – Nvidia is announcing the Tesla K20 family of accelerators. Intel is also there to unveil its brand-new Xeon Phi accelerators, and although AMD is behind in supercomputing, it is launching its dual-GPU FirePro S10000 server card with 6GB of memory.
Today we are going to take a closer look at Nvidia’s new K20 family of accelerators – the K20 and the K20X. The K20X provides the highest computing performance ever available in a single processor, surpassing all other processors on two common measures of computational performance – 3.95 teraflops single-precision and 1.31 teraflops double-precision peak floating point performance.
The new K20 family also includes the Tesla K20 accelerator, which provides 3.52 teraflops of single-precision and 1.17 teraflops of double-precision peak performance. The K20X runs a 732MHz core clock and a 5.2GHz memory clock, while the K20 is clocked slightly lower at 706MHz with the same memory clock.

Pictured above is Nvidia’s new GK110 Kepler GPU, which at 7.1 billion transistors is one of the most complex pieces of silicon ever produced. Although we are primarily gamers, we realize that the very same GPUs that power supercomputers as Tesla parts are used in our GeForce video cards. ABT has been following GK110 closely since we covered Nvidia’s GTC 2012, where the new architecture was unveiled.
The GPU Technology Conference 2012 was not about gaming GPUs, and this layout is not mentioned in the whitepaper, but look very carefully at the die shot above. It is apparent that there are 5 Graphics Processing Clusters (GPCs) with 3 SMX modules per GPC. A GPC is constructed like a “mini GPU” that contains SMXs: two in the case of GK104 and three in the case of GK110.
A fully functional GK110 GPU would be made up of 15 SMXes for a total of 2880 CUDA cores (192 x 15). However, for yield reasons and to differentiate faster, more complex, and more expensive processors from slower, less expensive ones, some units are often disabled and clock speeds lowered.
In the case of the K20X, 14 SMXes are enabled for a total of 2688 CUDA cores; the K20 features 13 SMXes for a total of 2496 CUDA cores. Although Nvidia absolutely will not comment on unreleased products, we can probably expect gaming GPUs based on this same GK110 silicon: perhaps the K20X will correspond to a future GTX 780 and the K20 to a future GTX 770, just as the K10 corresponds to the GTX 690. So the specifications of these professional cards do interest gamers.
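Those peak-performance numbers also follow straightforwardly from the unit counts and clocks. Assuming the GK110 whitepaper’s figure of 64 double-precision units alongside the 192 single-precision CUDA cores in each SMX, and one fused multiply-add (two floating-point operations) per unit per cycle, a rough check of the K20X works out to:

$$2688 \times 0.732\,\text{GHz} \times 2 \approx 3.94\ \text{TFLOPS (SP)}$$
$$14 \times 64 \times 0.732\,\text{GHz} \times 2 \approx 1.31\ \text{TFLOPS (DP)}$$

which matches Nvidia’s quoted 3.95 and 1.31 teraflops within rounding.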
The Specifications
Here are the specifications as released in Nvidia’s chart comparing the new single-GPU K20/K20X to the dual-GPU K10. TDP isn’t listed, but both varieties of K20 fit within the 225W TDP specification.
Features
Here are the K20 features and benefits as released in Nvidia’s chart.
As we learned at GTC 2012 this spring, the new Kepler architecture is far more efficient than Fermi, and it also offers new features that speed up computing such as Dynamic Parallelism and Hyper-Q. Hyper-Q (below left) provides a significant speedup for legacy MPI codes, while Dynamic Parallelism (below right) enables the GPU to launch new work for itself instead of waiting on the CPU.
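To make Dynamic Parallelism a little more concrete, here is a minimal sketch of a parent kernel launching a child kernel entirely on the GPU. The kernel names are our own illustration rather than Nvidia sample code, and it assumes a compute-capability 3.5 (GK110-class) part compiled with relocatable device code:

```cuda
// Minimal Dynamic Parallelism sketch: a kernel launches another kernel
// on the GPU itself, with no round trip back to the CPU.
// Build (illustrative): nvcc -arch=sm_35 -rdc=true dynpar.cu -lcudadevrt
#include <cstdio>

__global__ void childKernel(int parentBlock)
{
    // Each child thread simply reports which parent block spawned it.
    printf("child thread %d launched by parent block %d\n",
           threadIdx.x, parentBlock);
}

__global__ void parentKernel()
{
    // One thread per block decides, on the GPU, to generate more work.
    if (threadIdx.x == 0)
        childKernel<<<1, 4>>>(blockIdx.x);
}

int main()
{
    parentKernel<<<2, 32>>>();   // launch the parent grid from the host
    cudaDeviceSynchronize();     // wait for parent and child grids to finish
    return 0;
}
```

On Fermi, every one of those child launches would have required the CPU to step in; on GK110 the GPU can keep itself fed with work.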
ABT has watched Nvidia’s official venture into supercomputing since NVISION 08, just four years ago. In that time, Nvidia has carved out and pioneered GP-GPU computing – first taking what was a simple graphics accelerator for PC gaming back in 1999 and making it programmable, allowing it to be used for calculations other than graphics. There is very little difference between a GPU making scientific calculations and one calculating geometry for a game.
To support GP-GPU computing, Nvidia released CUDA in 2007, and it has since gone through five iterations. CUDA is Nvidia’s own proprietary GPU computing language, which plays a role for its GPUs somewhat like the one x86 plays for CPUs. From humble beginnings, it has grown significantly: from 150,000 CUDA downloads and 1 supercomputer in 2008 to 1,500,000 CUDA downloads and 36 supercomputers today, and from 60 universities and 4,000 academic papers to 629 universities teaching CUDA and over 22,500 academic papers – all in four years!
GPGPU computing has exploded onto the supercomputing scene because it is highly disruptive to the CPU technology that has long dominated it. The GPU is much more efficient at certain tasks: many programs see a ten-times speedup or far more over the CPU alone, and the GPU’s energy efficiency is superior. What Jaguar could accomplish in 42 days, Titan can now do in less than ten, using about the same amount of energy per day!
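The kind of work where this pays off is massively data-parallel code. As a simple illustration (our own sketch, not code from any actual supercomputing application), the classic SAXPY operation, which a CPU core would walk through one element at a time, maps onto thousands of GPU threads, each handling a single element:

```cuda
// Illustrative data-parallel sketch: SAXPY computes y = a*x + y.
// A single CPU core processes the loop serially; on the GPU, thousands
// of threads each handle one element concurrently.
#include <vector>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);

    float *dx, *dy;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

Real HPC codes are far more involved, but this pattern of many independent elements processed in parallel is what lets the GPU deliver those speedups at better energy efficiency.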
K20 Availability
The Nvidia Tesla K20 family of GPU accelerators is shipping today and available for order from leading server manufacturers, including Appro, ASUS, Cray, Eurotech, Fujitsu, HP, IBM, Quanta Computer, SGI, Supermicro, T-Platforms and Tyan, as well as from Nvidia’s reseller partners.
The Future
To achieve exascale computing, GPUs have to become much faster as well as more energy efficient. Here is Nvidia’s latest roadmap, which shows Maxwell due to arrive in 2014 on the new 20nm process.
Nvidia’s Competition
Of course, Nvidia is competing against traditional CPU-based supercomputers from vendors such as IBM, and there are also Intel and AMD. AMD is brand new to GPU supercomputing and is working quickly to build its own ecosystem of partners around OpenCL as its GPU language. Intel is launching Xeon Phi at SC12 and is no doubt hoping that its x86 heritage will bring it success. With the slide below, Nvidia wishes to remind us that x86 compatibility confers no advantage whatsoever.
What gamers are most looking forward to are the new video cards that will be based on GK110. Since spring, ABT has been predicting that they will arrive once demand for GK110 Tesla and Quadro parts is filled and as yields on the 28nm process continue to improve.
Nvidia is over two months ahead of schedule in delivering the more than 18,000 Tesla K20X GK110 accelerators that fill Titan’s order. We are hoping that they are stockpiling GPUs for their next GeForce, and we will keep our readers up to date with the very latest news in graphics. Make sure you check out ABT’s forum, where some of the best tech discussions anywhere are taking place.
Happy Gaming Computing!