NVIDIA’s GTC 2014 – final report
Tuesday featuring Jensen’s Keynote
Tuesday, March 25, 2014 – after the keynote
- Applications of GPU Computing to Mission Design and Satellite Operations at NASA’s Goddard Space Flight Center
- OpenGL: 2014 and Beyond
- A GPU-Based Free-Viewpoint Video System for Surgical Training
- Practical Real-Time Voxel-Based Global Illumination for Current GPUs
- Smoke & Mirrors: Advanced Volumetric Effects for Games
Besides Jensen’s Keynote from 9 AM to 11 AM, this editor attended the sessions listed above. At each of the previous GTCs, we have followed NASA’s progress with the GPU, checked in on graphics programming, tracked the GPU in medical research, and paid particular attention to what FlameWorks and Pixar have been working on most recently. We were not disappointed: much progress has been made from year to year, and it has been 24 months since our last GTC. Before we look at each of these sessions, let’s check out the GTC’s Keynote address.
The GTC Keynote Address always sets the stage for each GTC. Nvidia’s CEO outlines the progress the company has been making in CUDA and with hardware since the last GTC, and this year there were several major announcements that we shall look at in detail. The other sessions that we attended were related to the keynote, and we have been following their progress since the first GTC.
We can look back on each of Jensen’s Keynotes since Nvision08 as a progress report on the state of CUDA and GPU computing. Each year, besides introducing new companies to GPU computing, we see the same companies invited back – including Pixar and VMware – and we see their incredible progress, which is tied to the GPU and its increases in performance. Each year, for example, we see Pixar delivering more and more incredible graphics, and each movie requires exponentially more processing power than the one before it. This is progress that is *impossible* using the CPU. And it is now almost taken for granted.
The Keynote speech delivered by Nvidia’s superstar CEO Jen-Hsun Huang (aka “Jensen”) highlighted the rapid growth of GPU computing from its humble beginnings. GPU computing is no longer in the background: the world’s ten “greenest”, most power-efficient supercomputers all run Nvidia GPU technology. Jensen made several major announcements during his two-hour keynote.
Major announcements during the keynote
- NVLink is a high-speed interconnect between the CPU and the GPU which allows processors to share data approximately 5 to 12 times faster than what is possible currently.
- Erista (son of Wolverine, aka Logan) has been inserted into the Tegra roadmap between Logan and Parker to bring the Maxwell GPU architecture to Tegra.
- Pascal is the next architecture on Nvidia’s GPU roadmap, newly inserted (since CES in January) between Maxwell and Volta. Pascal will utilize NVLink and a new 3D chip-stacking process to radically improve memory density and energy efficiency.
- Jetson TK1 was announced as an embedded SoC development board in a mobile form factor, priced at $192 to match its 192 CUDA cores.
- The Iray VCA is a graphics processing appliance for real-time photo-realistic rendering and real-world simulation, priced at $50,000; an unlimited number of them can be chained together.
- GTX TITAN Z is a dual-GPU GK110 graphics card that will become Nvidia’s $3,000 flagship hybrid gaming/programming card.
- Nvidia’s partnership with VMware to virtualize entire enterprises into the GRID cloud was one of the most important announcements made at the GTC.
Nvidia’s CEO delivers the Keynote that defines the GTC
Nvidia’s CEO Jen-Hsun Huang (aka “Jensen”) delivered the keynote to a packed hall and set the stage for the entire conference. This time, there was no general press conference, and the press had to question Nvidia staff afterward for further details. Jensen called the GTC “the Woodstock of computational mathematicians” and reaffirmed that CUDA and the invention of GPU computing are what launched the GTC.
As in previous keynotes, Jensen detailed the history and the incredible growth of GPU computing and how it is changing our lives for the better. From humble beginnings, GPU computing has grown significantly since the first Nvision08, just six years ago. CUDA is Nvidia’s proprietary GPU computing platform and programming model, which can be considered roughly analogous to x86 on the CPU side. There were over 600 presentations at the GTC; we attended about 18 sessions in two and one-half days.
NVLink
Jensen introduced NVLink as Nvidia’s solution to the communication bottleneck between the GPU and the CPU in both GPU computing and graphics. It uses the PCIe programming model with DMA+, and it will add Unified Memory and cache coherency in its 2.0 generation. It will significantly improve bandwidth, at least doubling the upcoming PCIe 4.0 specification and allowing up to 5X more bandwidth for multi-GPU (mGPU) scaling over the current standard.
Pascal Architecture – Needed for big data
NVLink will be introduced with the Pascal architecture, which is now scheduled after Maxwell and before Volta. Nvidia realizes that there needs to be a leap in performance from one generation to the next, and Pascal will feature stacked 3D memory interconnected by through-silicon vias. ABT has been following the progress of 3D memory since our beginning.
3D or stacked memory is a technology which enables multiple layers of DRAM components to be integrated vertically on the package along with the GPU. Compared to current GDDR5 implementations, stacked memory provides significantly greater bandwidth, doubled capacity, and increased energy efficiency.
Faster data movement plus Unified Memory will simplify GPU programming. Unified Memory allows the programmer to treat the CPU and GPU memories as one block of memory, without worrying about whether the data resides in the CPU’s or the GPU’s memory.
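To make the idea concrete, here is a minimal sketch of our own – not code shown at the keynote – using the Unified Memory API introduced in CUDA 6. cudaMallocManaged() returns a single pointer that both the CPU and the GPU can dereference, so the explicit cudaMemcpy() calls of older CUDA code disappear:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Scales every element of a managed buffer on the GPU.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;

    cudaMallocManaged(&data, n * sizeof(float)); // one allocation, visible to CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // the CPU writes the buffer directly

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();                     // wait for the GPU before the CPU reads

    printf("data[0] = %f\n", data[0]);           // prints 2.000000
    cudaFree(data);
    return 0;
}
```

Without Unified Memory, the same program would need separate host and device allocations plus explicit copies in each direction; with it, the runtime migrates the data as needed.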
In addition to continuing to use PCIe, NVLink technology will be used for connecting GPUs to NVLink-enabled CPUs as well as for providing high-bandwidth connections directly between multiple GPUs. The basic building block for NVLink is a high-speed, 8-lane, differential, dual simplex bidirectional link.
Nvidia has designed a module to house GPUs based on the Pascal architecture with NVLink that is one-third the size of the standard PCIe boards used for GPUs today. Connectors at the bottom of the Pascal module enable it to be plugged into the motherboard, improving system design and signal integrity. It is also more energy-efficient than PCIe, despite the higher bandwidth.
The large increase in GPU memory size and bandwidth provided by 3D memory and NVLink will enable GPU applications to access a much larger working set of data at higher bandwidth, improving efficiency and computational throughput and reducing the frequency of off-GPU transfers – allowing the GPU to handle Big Data.
The Google Brain and the need for Big Data
Jensen next talked about research done with the “Google Brain,” which consists of 1,000 CPU servers (16,000 cores), drew 600 kW of power over a five-day run, and cost $5 million. Unfortunately, it falls woefully short of what the human brain can do; it is more comparable to a honey bee’s brain, which is itself still more complex.
In contrast to the Google Brain, the human brain contains on average 100 billion neurons, each with roughly 1,000 connections – about 100 trillion connections in all. To simulate a single human brain on a CPU platform would take 5 million Google Brains, use more power than the sun outputs, and take 40,000 years to train.
In contrast, the Google Brain can be replicated with three GPU-accelerated servers containing 12 GPUs and 18K cores, delivering the same level of computation for 4 kW and roughly $33K, instead of $5 million. GPU computing allows even small-scale setups to pursue new research, including Artificial Intelligence (AI), and new applications, including the following:
- Training a neural network
- A universal translator becomes possible
- Machine learning
- Big Data
It isn’t yet exascale, but it is on the way.
The next generation of graphics
Jensen went on to talk about the TITAN, a GeForce hybrid product that is being used by researchers and design artists for CUDA as well as for gaming. It has been very successful for Nvidia; Jensen said it has been “selling like hotcakes”.
Nvidia also remains focused on gaming and graphics. Jensen introduced the TITAN Z for $3,000, reminding the audience that three of them fit into a desktop PC and just about equal the computing performance of the Google Brain – for $9,000, not $5 million.
Although the PC has become a million times more powerful over the past 30 years since the first computer-generated effects, it still takes 250 hours to render a single frame of the latest movies due to the incredible complexity of simulations. Jensen then put on an incredible demonstration in real time that showcased the latest state of computer graphics art. As he pointed out, much of the time is spent on the physics of the simulation.
Jensen went on to introduce what he called the world’s first physics solver that uses a sparse matrix and requires no grid. He showed a fire simulation split into 32 million volumetric pixels (voxels). What makes this physics solver special is the amount of work that it saves artists and developers: once the scene is written, it becomes completely automatic.
In the scene on the right, everything is completely interactive and automatic. This is a big step forward and a great tool for developers.
The presentation that we attended, “Smoke and Mirrors” went into greater detail on using advanced volumetric effects in gaming.
The goal is that these kinds of tools will save significant time in the creation of games. Jensen then transitioned to Unreal Engine 4, the first engine to integrate Nvidia’s GameWorks. GameWorks is a collection of technologies and tools that includes visual and physical simulation software, development kits, debuggers, algorithms, engines, and libraries.
Daylight, the upcoming UE4 game by Zombie Studios, is the first example of what is possible with Nvidia’s GameWorks and Unreal Engine 4. Daylight will contain a number of key GameWorks features, including environmental cloth; Turbulence technology for dynamic fog, particle, and smoke effects; PostWorks; and ShadowWorks (HBAO+).
Daylight is expected to be released for PC and PS4 on April 29 and we will bring you an ABT performance evaluation and perhaps a game review of it.
Epic’s Unreal Engine 4 and integrated GameWorks
Unreal Engine 4 can now be used to build games on more devices thanks to its support for the Android operating system. Nvidia and Epic have publicly shown UE4 PC demos running on Tegra K1 to demonstrate engine and hardware capabilities. Nvidia’s vision is for mobile and desktop to converge, so that the same content and the same engine run on Tegra K1 under Android using the OpenGL 4.3 API.
This same Unreal Engine 4, which includes the source code, is available to anyone who wants to create high-end, multi-platform game content for under $20 a month.
Jensen showed off a demo of the Unreal Engine 4 and pointed out that the on-the-fly physics calculations and deformable environment were being calculated in real time on today’s Nvidia GPUs.
Jensen called the future of graphics “simulation and visualization coming together,” and he said the first glimpse of it is UE4. Interactive graphics is foremost about performance while making the image as realistic as possible. Photorealism’s goal is different: its minimum bar is photo-real, and its secondary goal is to perform well.
Photorealism and Iray
Jensen went on to demonstrate that it has become impossible to tell the difference between a “real” photo and an image that is completely generated by computer graphics.
This kind of realism opens up all kinds of possibilities that were demonstrated again and again at the GTC. Each GTC brings successive improvements to photorealism and now we are looking at improving performance again.
The Nvidia Iray Visual Computing Appliance (VCA) combines hardware and software to greatly accelerate the work of the Iray photorealistic renderer, which is integrated into leading design tools like Dassault Systèmes’ CATIA and Autodesk’s 3ds Max. Because the appliance is scalable, multiple units can be linked, speeding up by hundreds of times or more the simulation of light bouncing off surfaces in the real world.
As a result, automobiles and other complex designs can be viewed seamlessly at high visual fidelity from all angles. This enables the viewer to move around a model while it’s still in the digital domain, as if it were a 3D physical prototype. The car on the right used 19 stacked Iray VCAs for 1 petaflop of computing.
The cost is $50,000 per unit – a real bargain for eight Kepler GPUs with 12 GB of RAM, considering that it takes about $300,000 worth of Quadro K5000 workstations to achieve similar results. It was also interesting to see that the background can be manipulated to place the car anywhere in any scene. The Iray VCA is called “an entire world simulator,” and we saw this demonstrated many times at the GTC.
GRID – GPU in the cloud – virtualizing an entire industry end-to-end with the VMware DaaS platform
The partnership of VMware with Nvidia’s GRID is potentially a huge announcement: there are about 30 million workstations worldwide, and they may now be moved into the GRID cloud, allowing for even more workstation growth. VMware is adding virtualized GPU support to bring Nvidia GRID technology to its Horizon DaaS (desktop as a service) platform.
Jensen said, “This is a big deal.” With GRID and virtualized GPUs, any enterprise VMware cloud will be able to deploy virtual GPUs into its systems anywhere. There are about 500 million commercial desktops that are potential customers for Nvidia and VMware, and the entire industry could be virtualized end-to-end by next year.
Mobile CUDA
Mobile CUDA requires efficiency. Since Kepler powers 10 of the top 10 greenest supercomputers, Nvidia was able to extend that efficiency for mobile CUDA.
Tegra K1 brings the Kepler GPU to Tegra, unifying the architecture across all of Nvidia’s GPUs for the first time. Mobile CUDA enables computer vision for assisted driving, which requires many 2D images to reconstruct a 3D scene. We attended a session later at the GTC which showed how GPU computing makes assisted – and ultimately autonomous – driving possible.
The Jetson TK1
We are not sure if Nvidia was influenced by the 1960s Hanna-Barbera cartoon “The Jetsons” and the family’s flying car, as they are aiming for a completely self-driving car; but they introduced the Jetson TK1 as the world’s first mobile supercomputer.
Nvidia is providing developers with the tools to create systems and applications that can enable robots to navigate automatically, doctors to perform mobile ultrasounds, drones to avoid moving objects, and cars to detect pedestrians. The Jetson TK1 developer kit runs Linux, has 192 CUDA cores, delivers 326 Gflops, and includes a full C/C++ toolchain plus the VisionWorks SDK, all for $192.
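As a flavor of what that toolchain supports, here is a small, hypothetical device-query sketch of our own (it is not part of the Jetson kit) that uses the standard CUDA runtime API to report the on-board Kepler GPU’s resources:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // device 0 is the Tegra K1's integrated Kepler GPU

    // Tegra K1 has a single Kepler SMX, i.e. 192 CUDA cores.
    printf("Device: %s\n", prop.name);
    printf("SMs: %d, compute capability %d.%d\n",
           prop.multiProcessorCount, prop.major, prop.minor);
    printf("Global memory: %zu MB\n", prop.totalGlobalMem >> 20);
    return 0;
}
```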
Erista
Jensen next teased the follow-up to Logan: Erista. He gave very few details, but we know that Erista’s purpose is to bring the Maxwell architecture to Tegra. Nvidia’s roadmap has not fundamentally changed, as Parker is still expected after Erista.
Jensen did say to expect more efficiency for Tegra with Erista than with Kepler, and of course, more performance. We expect to get more information, and we will update the ABT forum first.
Autonomous Cars
Jensen made a commitment to continued progress here, as cars literally need a supercomputer to pilot them, and Audi gave the next presentation.
A self-piloting Audi joined Jensen on stage, and a contrast was drawn with a year or two ago, when the trunk was completely filled with computer equipment; now a tiny TK1 module does even more.
The last surprise of the Keynote
Jensen showed off Portal, Valve’s classic game recently ported to Android, and gave every member of the audience a free SHIELD as a gift. SHIELD’s software and capabilities have recently been updated along with the GeForce Experience, and we shall give ABT readers a full report of its expanded capabilities.
After the Keynote session, we headed for an hour-long session before we broke for lunch in the main exhibit hall.
More Tuesday Sessions
There were many more sessions at the GTC that were quite technical and dealt with CUDA programming. From GTC to GTC, our goal has been to attend similar lectures and workshops to see the progress. One frequent stop is NASA, as their presentations are technical but interesting and well-presented. The following one-hour session demonstrated NASA’s need for GPU computing to advance one of their more exciting projects.
Applications of GPU Computing to Mission Design and Satellite Operations at NASA’s Goddard Space Flight Center
NASA’s goal is clear: they plan to study Magnetic Reconnection, and the only way to have even a chance of doing this is by using the Earth as a laboratory. NASA will launch satellites this year to study this phenomenon. The budget for this project is a billion dollars, and it has been underway for ten years.
One interesting application that may come from this research is finding Zero Energy Pathways that future spacecraft may be able to use to save fuel on inter-planetary and inter-stellar travel.
NASA plans to launch six satellites to study the magnetosphere and Magnetic Reconnection, and these satellites’ flight paths have to be coordinated perfectly through 120 maneuvers over several weeks, or they will crash into each other or be ineffective.
The CPU platform cannot be used at all: it would take two weeks to compute a single ten-day collision forecast, whereas the GPUs can do the same calculations in 20 minutes – a speedup of roughly a thousand times. We wish NASA success with their project.
We broke for a two hour lunch and we checked out the posters and the exhibits in the main exhibit hall. Then we headed back to the afternoon sessions which lasted until 6PM.
OpenGL: 2014 and Beyond
At every GTC, we make it a point to check up on the progress of OpenGL and DX rendering. This year, there is renewed interest in OpenGL as there have been major updates to it, and Valve’s new tool is getting some attention. Much of the discussion was quite technical, and it focused on the increasing options for debugging tools as well as very specific suggestions for Path Rendering and Subdivisions.
The future looks quite healthy for OpenGL, and the Tegra K1 was heralded as very important to Android gaming. All of the rendering tools – and there are many available for OpenGL – will run on the K1.
There were many more sessions dedicated to OpenGL including programming workshops and there were plenty of places to get questions answered at the GTC. Since we have been following medical research that is supported by the GPU, we headed to our next session.
A GPU-Based Free-Viewpoint Video System for Surgical Training
This lecture was rather technical and presented by a medical doctor who is a researcher. Fortunately, he has a gift for being able to explain the complex in less technical terms.
Basically, the need is to set up a multi-video system for surgeons and for training. Multi-perspective views are critical for surgeons involved in laparoscopy.
This is not just any kind of system as it has to get true pixel synchronized HD video from multiple cameras with the caveat that it must also be low-cost.
This kind of real-time processing requires the GPU; because of the heavy parallel-processing requirements, CPU-based systems are simply impractical.
Here are their conclusions and what we might expect next. Since we are particularly interested in graphics, we wanted to follow up on Global Illumination and advanced effects for gaming since the last GTC that we attended, two years ago.
Practical Real-Time Voxel-Based Global Illumination for Current GPUs
This was another very technical discussion, although it was aimed at all audiences. It was primarily about lighting and the progression from single-source direct lighting to added area lighting. Finally, with Global Illumination, the scene becomes lit more realistically, and HBAO+ adds even more. With voxel-based Global Illumination, the volumetric structure of the scene becomes more obvious and real.
The problem is that voxel-based GI is resource-intensive, and the rest of the session was devoted to using it practically without severely impacting performance.
At this point, tips were given that the programmers in the audience loved. They learned that 3D clipmaps are easier to build than sparse voxel octrees (SVOs), and heard techniques for voxelization downsampling, voxelization for emittance, multi-resolution voxelization, and implementing light-emittance algorithms.
The conclusion reached was that full Global Illumination is practical on a GTX 770 class of GPU, while voxel-based Ambient Occlusion works well even on a GTX 650 class of card and looks better than SSAO.
We see real progress. Full GI tools are planned to be integrated into Nvidia’s GameWorks.
And now we head to the next related session on our custom schedule.
Smoke & Mirrors: Advanced Volumetric Effects for Games
This presentation was almost a follow-up to the fire-rendering session co-presented by Pixar and Nvidia at the GTC in 2012. Back then, a “how to” was detailed and the evolution of fire rendering in movies and in games was discussed. Pixar gave their best tips, step by step, on how to render fire convincingly for games using CUDA calculations. They also showed how to use motion blur, heat, and embers to make fires look truly realistic in games; embers along with noise add to the realism.
This time, the discussion focused on volume rendering as the highest-quality option. Compositing was discussed, and the use of bounding boxes was demonstrated. Heat haze is another aspect of fire that adds realism, and they demonstrated how to calculate its effects.
Some alternative looks were dismissed as too compute-intensive, while others work well. Shadows and reflections are now emphasized in fire rendering, and the techniques to get there were demonstrated.
The session was very technical, but it was clear to anyone in attendance that incredible progress has been made in fire rendering that we will see in PC gaming. One reassuring note is that a GTX 770 has no trouble with these simulations even though they have not yet been fully optimized. These tools are planned to be part of Nvidia’s GameWorks, and there is still more work to be done. The presenters also talked about the possibility of implementing these effects server-side for multiplayer gaming.
After the sessions
There was a lot of note-taking, and of course these and all GTC sessions can be accessed on Nvidia’s web site by the end of April. Let’s hope we see even more realistic-looking fire in upcoming video games as a result. It was time for dinner, and we headed to the exhibit hall, bearing in mind that tomorrow’s keynote is by Pixar.
Each day after the sessions ended at 6 PM, there was a networking happy hour, and it was time to visit some of the exhibitors until 8 PM. On the way to the exhibitors’ hall, one passed the posters on display. The many uses of the GPU were illustrated at the GTC by the posters displayed for all to see. There are some impressive uses of the GPU that affect our lives.
Almost all of the posters are quite technical – some of them deal with extremes, from national security and tsunami modeling to visually programming the GPU and neural networks.
There were “hangout” areas where you could get answers to very specific questions. You could even vote for your top 5 favorite posters.
The best poster winner was announced at Thursday’s Keynote: “GPU Performance Auto-tuning Using Machine Learning” by Tianyi David Han.
What all of the posters have in common is their use of the massively parallel processing power of the GPU.
Of course, there is much more to the GTC, and Nvidia’s partners had many exhibits that this editor only got a glimpse of. More on that later. But prominently set up was Nvidia’s GeForce exhibit with 4K Surround gaming – and of course, it takes TITAN Black Quad-SLI to power it.
We have barely scratched the surface of GTC 2014. Tuesday was a full day, and there were still two days to go. It was time to head to the hotel room, where we answered our forum members’ questions about the GTC on the ABT forum.