Nvidia’s GTC 2012
This is the third time that ABT has been privileged to attend Nvidia’s GPU Technology Conference (GTC). GTC 2012 was held May 14-17, 2012 in San Jose, California. The very first article that this editor wrote for AlienBabelTech covered Nvision 08 – a combination of LAN party and the first GPU computing technology conference. The second time was the following year at GTC 2009, when it was held in the Fairmont Hotel, also in downtown San Jose.
Each attendee at the GTC will have their own unique account of it. The GTC is a combination trade show, networking, and educational event attended by 2,800 people, each of whom may have their own schedule and personal reasons for attending. All of them have in common their passion for GPU computing. This editor’s primary reason for attending this year was the minority report: an interest in Kepler GPU technology primarily for gaming, and he was not disappointed.
Nvidia used the GTC to debut a great array of new technology, since the new 28nm Kepler generation of GPUs is much more than a simple upgrade over Fermi. Kepler is both more powerful and more energy efficient, and Nvidia intends to use it to revolutionize GPU and cloud computing, including for gaming.
Well, things have certainly grown since the first GTC in 2009 and its precursor, Nvision08. Nvidia is again using the San Jose Convention Center – the original Nvision08 venue – just as it will for 2013. This time the conference was also much bigger, expanded to four days from the three-day GTC 2009 held in the Fairmont Hotel. Not only that, the schedule was far more packed, and this editor was forced to make some hard choices about which sessions to attend and which to skip.
The GTC at a Glance
Here is the GTC schedule “at a glance”. We took this image on Monday, a day mostly set aside for the hard-core GPU programmers and the press orientation. After that, there was always a crowd surrounding this sign. We will take a look at each day that we spent at the GTC.
Monday
We arrived Monday afternoon, flying from Palm Springs to San Francisco airport in just over an hour. The temperature in Palm Springs was 105F while San Francisco was 55F! San Jose, 35 miles inland, is more moderate; temperatures for the entire five days were perfect, with daytime highs in the 70s F.
After picking up the press pass, a 180-page GTC Program Guide and many pages of press-related material, this editor returned to the room so as not to look like a GPU tourist. It’s bad enough to feel like one is in college again. Nvidia treats their attendees and press well, and a nice insulated bag is included with the $900 all-event pass to the GTC; the press gets in free. A cool commemorative GPU Technology T-shirt was also included, and the bag is large enough to be useful for grocery shopping for years to come. These are practical items. This editor still has his Nvision08 T-shirts and the insulated stainless steel mug from the 2009 GTC.
Monday was set aside for the hardcore programmers, and the developer-focused sessions were mostly ‘advanced’. There was a poster reception from 4 to 6 PM where anyone could talk to or interview the exhibitors, mostly researchers from leading universities and organizations focused on GPU-enabled research. The press had an early evening reception from 6 to 9 PM at the St. Claire Hotel across the street, and this editor got to meet some of his friends from past events and also made new ones. Most of the smaller hardware sites like ABT did not attend the GTC. The GTC is also all about networking: there are those looking for capital to start or build businesses, those with venture capital to lend, and of course, advertising. Most of all, it was about sharing information to advance GPU programming.
There were dinners scheduled and tables reserved at some of San Jose’s finest restaurants for the purpose of getting like-minded individuals together. Some of the dinner discussions were devoted to programming, while others turned to business, or the diners just enjoyed the food.
ABT forum members and readers are particularly interested in the Kepler architecture as it relates to gaming, and we knew that very little would be said about the GK110 as a gaming chip. However, we were not disappointed, as Kepler is definitely oriented toward gaming, graphics and computing. And Jensen’s keynote on Tuesday reinforced Nvidia’s commitment to gaming and a new foray into cloud gaming, as Kepler is the world’s first virtualized GPU, built to compensate for the latency issues inherent in today’s online streaming games.
Tuesday featuring Jensen’s Keynote
The keynote speech delivered by Nvidia’s superstar CEO Jen-Hsun Huang (aka “Jensen”) highlighted the rapid growth of GPU computing from humble beginnings. What was really surprising is Nvidia’s foray into cloud computing – including gaming – where “convenience” is the new buzzword. Just as the 2009 GTC covered the then-upcoming 40nm GF100 Fermi architecture, the 2012 GTC was all about the new 28nm Kepler GPU. It is a huge advance over Fermi in performance per watt, and it will be the foundation for Nvidia’s new cloud initiative.
Kepler Hardware
K10 is the Tesla version of the GTX 690 – a dual-GPU, single-precision part directed at computing, with uses in oil and gas research, national defense (including signal and image processing), and industry. Here is one example out of many that shows the uses of GPU calculations for NASA. The K10 will be available next month; a single Tesla K10 accelerator board features two GK104 Kepler GPUs that deliver a combined 4.58 teraflops of peak single-precision floating point and 320 GB per second of memory bandwidth.
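As a quick sanity check on that 4.58 teraflop figure, peak single-precision throughput is simply cores times two FLOPs per clock (a fused multiply-add) times the clock speed, summed over both GPUs. Assuming the commonly cited core clock of roughly 745 MHz for the K10 (our assumption, not a figure given in the session), the arithmetic works out:

$$2 \times 1536\ \text{cores} \times 2\ \tfrac{\text{FLOPs}}{\text{clock}} \times 0.745\ \text{GHz} \approx 4.58\ \text{TFLOPS}$$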
Available later in Q4 this year will be Nvidia’s flagship GPU based on the GK110 Kepler. This GPU delivers three times the double-precision performance of Fermi architecture-based Tesla products, and it supports the Hyper-Q and dynamic parallelism capabilities. The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee as well as other supercomputers. Here is the chip itself – 7.1 billion transistors, the most complex piece of silicon anywhere!
Tesla’s K20 should have three times the double-precision capability of the Fermi-based Tesla. Here is the K20 as it should look when it is released in the November-December timeframe of Q4:
Of course, Nvidia won’t speak publicly about its gaming GPU based on GK110, but it is logical that it will be released after the professional market is satisfied, probably early next year. TSMC still has less 28nm capacity and production than Nvidia would like, as Nvidia appears to be easily selling out of every GTX 690, GTX 680 and GTX 670 that it can make.
Nvidia’s CEO delivers the Keynote that defines the GTC
Nvidia’s CEO Jen-Hsun Huang (aka “Jensen”) delivered the keynote to a packed hall and he set the stage for the entire conference. He is a superstar in his own right and the $100 replicas of his leather jacket that he wore on stage sold out at the Nvidia store within an hour of the keynote’s ending.
Here is one of Nvidia’s photos showing the press at the tables in front, with the rest of the audience also paying rapt attention. This editor is in the audience and in this picture.
Jensen began by showing the incredible growth of GPU computing and how it is changing our lives for the better. From humble beginnings, it has grown significantly since the first Nvision08 just four years ago. CUDA is Nvidia’s own proprietary GPU computing platform, which can be considered analogous to x86 for the CPU. From 150,000 CUDA downloads and 1 supercomputer in 2008 to 1,500,000 CUDA downloads and 35 supercomputers today; and from 60 universities and 4,000 academic papers to 560 universities teaching CUDA and 22,500 academic papers, in just 4 years!
While Nvidia offers support for OpenCL, they say that they are not seeing any shift to it, even though OpenCL gives developers a much more cross-platform approach. No one except AMD appears to be waiting for the OpenCL tools to evolve, or for Intel to get tools out for its own many-core MIC processor. Nvidia has created the tools, and programmers are excited to use them.
Convenience
One of the trends noted is that companies no longer supply notebooks or devices for employees, any more than they supply a “company car”. The trend is for employees to Bring Your Own Device (BYOD) to work. Of course, this means that all of the devices an employee uses have to be configured by the company’s IT department to work together securely.
Nvidia is going to be at the forefront of this with their new initiative with Kepler – the first GPU that can be “virtualized” to drive cloud computing. Cloud computing is simply convenient.
It is a great advantage to be able to use any device seamlessly, and the important thing is that excellent graphics can be delivered to any device – now with about the same amount of latency that a gaming console has. This has applications for business as well as for gaming.
Kepler is Nvidia’s first “virtualized GPU”, which lets end users work from any device, anywhere, with the same excellent graphics on all of them. Data centers can be driven by Kepler GPUs, with applications residing in the cloud rather than on individual devices. The PC itself then becomes an application.
Joining Jensen on stage at GTC’s Day One Keynote were executives representing Nvidia’s partners supporting these cloud technologies. They included: David Yen, general manager and senior vice president of the Data Center Group at Cisco; Brad Peterson, technology strategist, and Sumit Dhawan, group vice president and general manager, at Citrix; David Perry, CEO and co-founder of Gaikai; and Grady Cofer, visual effects supervisor at Industrial Light & Magic.
A practical way the virtualized GPU can be used for business was illustrated when Jensen invited Grady Cofer of Industrial Light & Magic on stage. The problem, as Grady explained, arises when he tries to demo movie clips for a director: no matter how many shots he loads onto his PC, there is never enough flexibility or storage on his local machine. However, by using Nvidia’s GRID, he is able to instantly access his server and do anything that he could do from his own office – both remotely and securely.
He demonstrated how he did this with the upcoming “Battleship” and also with “The Avengers”:
Of course, gaming can also benefit by having the application reside in the cloud. No longer do gamers have to wait to download anything. They just get connected and start playing – on any device – and with the same excellent graphics across all of the devices. Kepler has taken care of the latency issues.
The idea behind using the cloud is convenience: movies simply work on any device, and games should also. Jensen looked forward to the day when anyone with broadband can have a subscription to a gaming service similar to what Netflix provides for movies, and perhaps even at a similar monthly price. It is called the GeForce GRID, and it will be implemented by Nvidia’s partners in various forms.
Jensen invited Gaikai’s CEO onstage, and they explained that latency should not be an issue considering that it takes light only 100ms to circle the globe at the equator. Much more was revealed at the question and answer session with the press afterward, which we will cover later in this article. Even Nvidia’s Project Denver was mentioned.
Nvidia’s CEO Jensen, Chief Scientist Bill Dally, Jeff Brown, and Robert Sherbin took live questions from the press.
There were some good questions, including “who owns the GeForce GRID?” However, before we check out the Q&A session with the press, let’s look at the new initiatives that Kepler will support, as outlined in Jensen’s keynote presentation. We will check them out one at a time, beginning with the Kepler virtualized GPU, especially as it relates to industry, cloud gaming, and specifically the new GeForce GRID.
KEPLER as a VIRTUALIZED GPU
Jensen’s keynote revealed Nvidia’s VGX platform, which enables IT departments to deliver a virtualized desktop with the graphics and GPU computing performance of a PC or workstation to employees using any connected device. Using the Nvidia VGX platform in the data center, employees can now access a true cloud PC from any device regardless of its operating system, and enjoy a responsive experience for the full spectrum of applications previously reserved for the office PC. It even allows seamless outsourcing across continents, as a Citrix slide shows.
Nvidia’s VGX enables knowledge workers for the first time to remotely access a GPU-accelerated desktop similar to a traditional local PC. The platform’s manageability options and ultra-low-latency remote display capabilities extend this convenience to those using 3D design and simulation tools, which had previously been too expensive and bandwidth-hungry for a virtualized desktop, to say nothing of the latency issues.
Citrix had an interesting presentation. They offer many options for this kind of desktop virtualization for high-end graphic designers and users. They even have lossless imaging available for medical uses.
Kepler can only improve on Fermi, and Citrix is welcoming Nvidia’s new cloud initiative. Early tests show that with Fermi, 2MB/s of bandwidth was needed; now, with more efficient codecs and Kepler, only 1.5MB/s is required for the same results.
Integrating the VGX platform into the corporate network also enables enterprise IT departments to deliver a remote desktop to employees’ own devices, providing users the same access they have on their own desktop terminal. At the same time, it helps reduce overall IT costs, improve data security and minimize data center complexity.
Nvidia’s VGX is based on three key technology breakthroughs: (1) the VGX platform and boards, (2) the GPU Hypervisor, and (3) User Selectable Machines (USMs).
Nvidia’s VGX platform and boards
Nvidia’s VGX boards are the world’s first GPU boards designed for data centers, and they enable up to 100 users to be served from a single server powered by a single VGX board. This is impressive compared with traditional virtual desktop infrastructure (VDI) solutions, and it expands this kind of service to far more employees in a company in a much more cost-effective manner. It reduces the latency issues, sluggish interaction and limited application support associated with traditional VDI solutions.
Working together across continents securely and seamlessly is possible today, and it can only get better with Kepler architecture as latency is reduced further, as this Citrix presentation slide shows.
The initial VGX board features four GPUs, each with 192 CUDA architecture cores and 4 GB of frame buffer. This initial board is designed to be passively cooled and easily fits within existing server-based platforms.
NVIDIA VGX GPU Hypervisor
The Nvidia VGX GPU Hypervisor is a software layer that integrates into a commercial hypervisor, enabling access to virtualized GPU resources. It allows multiple users to share common hardware and ensures that virtual machines running on a single server have protected access to critical resources. As a result, a single server can now economically support a higher density of users, while providing native graphics and GPU computing performance.
This new technology is being integrated by leading virtualization companies, such as Citrix, to add full hardware graphics acceleration to their full range of VDI products. From the Citrix presentation:
NVIDIA User Selectable Machines (USMs)
USMs allow the VGX platform to deliver the advanced experience of professional GPUs to those requiring them across an enterprise. This enables IT departments to easily support multiple types of users from a single server.
USMs allow better utilization of hardware resources, with the flexibility to configure and deploy new users’ desktops based on changing enterprise needs. This is particularly valuable for companies providing infrastructure as a service, as they can repurpose GPU-accelerated servers to meet changing demand throughout the day, week or season.
Citrix, HP, Cisco and many other virtualization companies are on board with Nvidia for this project which is being deployed later this year and additional information is available at www.nvidia.com/object/vdi-desktop-virtualization.html.
Cloud gaming
Nvidia’s virtualization capabilities allow GPUs to be simultaneously shared by multiple users. Its ultra-fast streaming display capability eliminates lag, making a remote data center feel like it’s just next door. And its extreme energy efficiency and processing density lower data center costs. Just as Citrix uses branch repeaters to reduce latency in mission-critical industrial applications, similar things can be done for less-than-ideal latency connections.
The gaming implementation of Kepler cloud technologies, Nvidia’s GeForce GRID, powers cloud gaming services. Gaming-as-a-service providers will use it to remotely deliver excellent gaming experiences, with the potential to surpass those enjoyed on a console.
With the GeForce GRID platform, service providers can deliver the most advanced visuals with lower latency, while incurring lower operating costs, particularly related to energy usage. Gamers benefit from the ability to play the latest, most sophisticated games on any connected device, including TVs, smartphones and tablets running iOS and Android.
The key technologies powering the new platform are Nvidia’s new Kepler GRID GPUs with dedicated ultra-low-latency streaming technology and cloud graphics software. Together, they fundamentally change the economics and experience of cloud gaming, enabling gaming-as-a-service providers to operate scalable data centers at costs that are in line with those of movie-streaming services. Previously, one GPU was required for each player.
Kepler architecture-based GPUs enable providers to render highly complex games in the cloud and encode them on the GPU, rather than the CPU, allowing their servers to simultaneously run more game streams. Server power consumption per game stream is reduced to about one-half that of previous implementations, an important consideration for data centers. And two users can now play off of one Kepler GPU in the cloud, compared to the one-to-one ratio previously.
Fast streaming technology reduces server latency to as little as 10 milliseconds by capturing and encoding a game frame in a single pass. The GeForce GRID platform uses fast-frame capture, concurrent rendering and single-pass encoding to achieve ultra-fast game streaming.
The latency-reducing technology in GeForce GRID GPUs compensates for the distance in the network, so gamers may feel like they are playing on a high-end gaming PC located in the same room. Lightning-fast play is now possible, even when the gaming supercomputer is miles away.
Nvidia and Gaikai demonstrated a virtual game console, consisting of an LG Cinema 3D Smart TV running a Gaikai application connected to a GeForce GRID GPU in a server 10 miles away. Instant, lag-free play was enabled on HAWKEN, an upcoming Mech PC game, with only an Ethernet cable and wireless USB game pad connected to the TV. The GRID has captured the endorsement of several of these cloud providers.
For more information about GeForce GRID, please visit: http://www.nvidia.com/geforcegrid.
Q & A Session with the press
There was a long line of press waiting to ask questions about Jensen’s keynote and about Kepler.
Bill Dally, formerly chairman of Stanford University’s computer science department, was hired as Nvidia’s chief scientist and vice president at the beginning of 2009, before Fermi was released. He has a very strong academic background in GPU programming, and his impact on Nvidia’s architecture is going to be felt even more in the future.
The question and answer session was clearly unscripted, and it was made clear that Nvidia’s partners – the ones providing the cloud gaming service – actually “own” the GeForce GRID. Jensen indicated that Nvidia was not currently considering running its own gaming cloud, but ruled nothing out. Latency is going to become much less of an issue with Kepler, and even OnLive is evaluating Kepler for its own service. The possibility was even mentioned of a home server using GTX 680s to wirelessly stream great gaming graphics to any room of the house and to any device!
Pricing was not given for either the K10 or the upcoming K20, and we still have no word on it. Tegra 3 and 4 were mentioned in passing, and even Project Denver was given a quick nod, but nothing of real interest could be gleaned about them at the Q&A session other than that everything was progressing well.
There was also some humor involved when one programmer asked, “why is it taking so long?” And another asked, after seeing the simulation in the keynote, “what is Nvidia doing in view of our Milky Way galaxy colliding with Andromeda?” (in 3.5 billion years). Jensen answered, “I for one am busy making plans”.
A question was asked whether it was cheaper to buy the Kepler hardware or to use the cloud. Apparently the cloud is not a cheap alternative – it is about convenience, about using all sorts of devices, and about letting more employees access everything seamlessly. They also talked about the K20’s GK110 GPU as the most complex piece of silicon ever developed at 7.1 billion transistors, compared to the dual-GPU K10 with 6.8 billion transistors in total. Power usage was stressed: Kepler is much more efficient than Fermi, to say nothing of the roughly 20-to-1 power disadvantage of getting comparable calculations out of an all-CPU server.
What about the GTX “780”?
There were a lot of questions left unanswered, but this editor was able to get his question answered by an Nvidia official afterward. The question was naturally about the future GeForce GPU that would be based on the GK110 – the video card using a 7.1 billion transistor GPU. Of course, Nvidia doesn’t discuss unreleased products, but it was made clear that there would be such a card after the demand for Tesla and Quadro was met.
This conference is not about gaming GPUs and they won’t be mentioned in the whitepaper, but look very carefully at the die. It is obvious that there are 5 Graphics Processing Clusters (GPCs) and 3 SMX modules per GPC. A GPC is constructed like a “mini GPU” which contains SMXs: two in the case of GK104 and three in the case of GK110. Here is the Kepler GK110 architecture whitepaper, which may be downloaded as a .pdf:
http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
Since we know that the K20 will be released in the November/December time frame, it is logical that the new Nvidia flagship gaming GPU will come after that. So you can rest assured that the GTX 690/680 will be the fastest gaming GPUs for the next six to nine months. At any rate, we will take a deep dive into Kepler architecture in Wednesday’s session.
More Tuesday Sessions
There were many more sessions at the GTC that were quite technical and dealt with CUDA programming. Our earliest session before the keynote involved fire rendering and was co-presented by Pixar and Nvidia. A “how to” was described, and the evolution of fire rendering in movies and in gaming was discussed.
Pixar gave their best tips on how to render fire convincingly for games using CUDA calculations, and they presented it step by step.
They also showed how to use motion blur, heat and embers to make fires look truly realistic in games. Embers along with noise add to the realism.
And of course, their results were displayed.
There was a lot of note taking and of course this and all GTC sessions can be accessed on Nvidia’s web site. Let’s hope we see a lot more realistic-looking fire in upcoming video games as a result.
After the keynote and press Q&A, we had a quick lunch, then managed to catch “Graphics in the Cloud” by Nvidia’s Will Wade, where he discussed cloud visualization and the technology behind it. Afterward, we ran to a CUDA compute session where we learned about shared memory, how to use it efficiently, and that planning is the most critical part of any programming. The CUDA programmers were encouraged to know their limitations and to plan on a whiteboard if necessary, so as to be able to write CUDA code efficiently.
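For readers who did not attend, here is a minimal sketch of the kind of shared-memory technique that session covered; this is our own illustrative code, not the presenter’s, and it assumes a block size of 256 threads. A block stages a tile of the input in fast on-chip shared memory so that each neighboring value is loaded from global memory only once and then reused by the whole block:

```cuda
// Our illustrative sketch of shared-memory tiling (assumes blockDim.x == 256).
__global__ void blur1D(const float *in, float *out, int n)
{
    __shared__ float tile[256 + 2];                  // block size plus a 1-element halo on each side
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;                       // local index, offset past the left halo

    tile[lid] = (gid < n) ? in[gid] : 0.0f;          // stage this thread's element
    if (threadIdx.x == 0)                            // first thread loads the left halo
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)               // last thread loads the right halo
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;

    __syncthreads();                                 // make the whole tile visible to the block

    if (gid < n)                                     // 3-point average read entirely from shared memory
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}
// Launched as, for example:  blur1D<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
```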
Nsight for Visual Studio was also highlighted, though the details were unfortunately lost on this editor.
After the Citrix presentation described earlier, we headed to a large session that discussed the five years of CUDA and the progress that has been made. This got technical as usual, but it was easy to see the incredible interest and how disruptive GPU programming has been to the industry, changing the way we do things by speeding up calculations an order of magnitude faster than was previously workable on the CPU.
After these sessions ended at 6 PM, there was a networking happy hour and it was time to visit some of the exhibitors until 8 PM. On the way to the exhibitors hall, one passed the posters on display. Many uses of the GPU were illustrated at the GTC by the many posters displayed for all to see. There are some impressive uses of the GPU that affect our lives.
Almost all of them are quite technical – some of them deal with national security and this one involves using the massively parallel processing of the GPU for lunar research:
Of course there is much more to the GTC, and Nvidia’s partners had many exhibits that this editor just got a glimpse of (note the oxygen bar to the left). We got to interview Fusion-io’s CEO about some amazing products that put the SSD to shame for speed and storage, and that will be making it to a gamer’s PC this year. More on that later.
Some of the exhibits are quite whimsical, like Zoobe.
However, all of them use the GPU to accelerate applications – like Scalable, which spans huge multi-display setups seamlessly using projectors.
Even the upcoming $60,000 all-electric Tesla Model S sedan was highlighted. It features a 17-inch in-dash Nvidia-powered display:
We have barely scratched the surface of GTC 2012. Tuesday was a full day and there were still two days to go. It was time to head to the hotel room, where we began to write our first-day analysis and answered our forum members’ questions about the GTC on the ABT forum.
Wednesday
There were plenty of programming sessions Wednesday morning. At 10:30 AM, Professor Iain Couzin of Princeton University’s Department of Ecology and Evolutionary Biology kicked off the keynote address of Day 2 at GTC 2012. He is one of the first educators who had the foresight to realize the potential of GPU computing for his line of work, and he spent two years porting all of his research tools to CUDA long before it became popular to do so.
He believes it was the best investment in time because he is now able to do in minutes on the GPU what used to take weeks to accomplish on the CPU. He detailed his early years as a researcher beginning with a regular GeForce gaming video card and then migrating to Tesla when he had the money to do so.
What he is looking for are the patterns of collective behavior in nature. He was able to demonstrate the similarity of animal groups – think of huge flocks of birds in flight, or fish in huge schools that function as a sort of “collective mind”. There is no telepathy involved, although it was believed to be so not that long ago. Even invading cancer cells in a tumor – or colliding galaxies – and humans seem to exhibit similar collective behavior, in patterns that can be charted using the GPU.
What GPU computing has allowed Dr. Couzin to do is to simulate thousands of individuals in an experimental framework and track their collective behavior. Dr. Couzin was able to demonstrate how collective behavior and collective action emerge in a wide range of groups – from fish and birds to plague locusts and even to human crowds.
He explained how important it is for a group to align and yet not collide, as collisions can be fatal for birds. They also need to be able to avoid predators. These are the patterns in nature that serve as models.
One of the most interesting findings is that certain individuals that have information (about food, perhaps) influence the group that does not have this information. There is an interesting democratization going on that has implications for human behavior.
The experiment goes to show how the leaders (informed individuals) influence a group and how their influence is mitigated by uninformed individuals. He again stressed how important CUDA is, because they want to study thousands of individuals – not just a few. And of course, the actions of predators on the group – or, in the case of humans, actors trying to disrupt a group – are important to track, and only GPU computing can do it in real time.
This has implications for tech forums. Professor Couzin made a surprising discovery that counters the conventional wisdom that uninformed humans are more easily influenced by extremists. Instead, his findings suggest that the presence of those without strong views increases the odds that a group will go with the majority opinion. Uninformed individuals in a group are very important, as they dilute the influence of a minority with strongly held preferences and tend to support the majority.
He gave an interesting example of a person stating on a forum that he will buy a Radeon. A person with a strongly held opposite view might interact with this person, suggesting that it is a stupid decision. One person may give this individual pause but is unlikely to change his mind. However, when two or more individuals suggest it is a bad decision simultaneously, it may have a much stronger effect on the original purchasing decision.
Of course there is a lot of mathematics involved, but it may be expressed as a chart:
Below a critical density of uninformed individuals, the minority with strongly held opinions can easily win. However, when there is a higher density of uninformed individuals, not much happens until suddenly the situation flips and the majority wins. When there is a sufficient number of people with no bias, the minority cannot win no matter how intransigent they are, as demonstrated in the following model.
According to this research, uninformed individuals tend to promote democracy in animals, and of course this suggests further research with humans. Professor Couzin went on to suggest that human interaction is not as complex as we like to think.
There is much more that the Professor presented, and he also went on to explain why locusts migrate to become a plague.
You can catch his entire keynote here:
http://smooth-las-akam.istreamplanet.com/live/demo/nvid4/player.html
Then it was time for lunch and back to the networking/exhibit hall.
And of course, there were two-hour sessions each day for the exhibitors – Nvidia and their partners – to show off their products and GPU-related technology. Many big names and also very small new startups were represented. At 2 PM this editor headed for “Inside Kepler” to listen to co-presenters Stephen Jones and Lars Nyland of Nvidia. Unfortunately, ABT was unable to attend the Emerging Companies Summit Fireside Chat with Jensen because it was held at the same time.
Inside Kepler
This is the deep dive into the GK110 architecture that we touched on earlier. The differences between Fermi and Kepler architecture were highlighted and the advantages of Kepler were stressed. We are not going to spend a lot of time on it, as it is incredibly detailed.
Here is the link to the video of Inside Kepler which includes the Q&A session for a total of 90 minutes:
http://smooth-las-akam.istreamplanet.com/live/demo/nvid5/player.html
Nvidia said they had become performance-limited with Fermi, as they could not increase power any further. Their engineering goal was to make a more efficient chip that was even more fully featured for programming. Kepler is the result, and the performance of this 7.1 billion transistor chip is impressive.
It was stressed that Kepler was redesigned for much greater programmability than Fermi, and the clock speed was dropped to make it more efficient; the SMX architecture was redesigned to be far more complex, so the clock needed to be slower.
To take advantage of Kepler’s expanded programmability, there are new instruction sets.
Shuffle instructions let threads within a warp exchange data directly, and more efficiently on Kepler than was possible on Fermi. The more advanced sessions actually showed how to do this with examples. Atomic operations have also been significantly improved, with a performance factor of 2x-10x over Fermi.
So code needs to be relatively complex on Fermi to work around its slow atomics; now see the simplification of code and the time savings possible with Kepler’s high-speed atomics:
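To give a flavor of what that simplification looks like in practice, here is a minimal sketch of our own (not the presenters’ slide code): a sum reduction in which each warp combines its 32 values with shuffle instructions, and then a single atomicAdd per warp folds the partial sum into the global total, with no shared-memory staging and no multi-kernel reduction tree.

```cuda
// Our own minimal sketch of a Kepler-style reduction: warp shuffles combine
// 32 values without shared memory, then one atomicAdd per warp accumulates
// the partial sum into the global result.
__global__ void sumReduce(const float *in, float *result, int n)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (gid < n) ? in[gid] : 0.0f;

    // Down-sweep reduction within the warp using shuffle instructions.
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down(v, offset);

    // Lane 0 of each warp now holds the warp's sum; fold it in atomically.
    if ((threadIdx.x & 31) == 0)
        atomicAdd(result, v);
}
// *result must be zeroed before launch, e.g. cudaMemset(d_result, 0, sizeof(float));
```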
The examples kept coming of how Kepler was going to speed up and improve programming over Fermi. Next, they demonstrated the incredible number-crunching power of Kepler’s GK110, using all of the extra registers, the improved shuffle, and floating-point code.
This galaxy demo simulation was run in Jensen’s keynote and again in this session using real astrophysics code to show what will happen beginning 3.5 billion years in the future.
Next up was the topic of Kepler’s dynamic parallelism – in other words, the GPU’s ability to generate work for itself without waiting for the CPU.
This will change how programmers program.
Your program can now be adjusted dynamically and automatically, saving the programmer a lot of time as more work is done on the GPU with less dependence on the CPU. Nested parallelism thus becomes possible on the GPU, freeing up the CPU for other tasks. And of course, there were a lot of architectural improvements in Kepler over Fermi that make this possible.
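As a rough illustration of the concept (our sketch, not Nvidia’s sample code), a kernel running on a GK110-class GPU can size and launch its own child kernels, with no round trip to the CPU to decide how much work to spawn:

```cuda
// Our rough sketch of dynamic parallelism. Requires compute capability 3.5+
// and relocatable device code, e.g.:  nvcc -arch=sm_35 -rdc=true -lcudadevrt
// This assumes each region occupies a fixed 1024-element slot in 'data'.
__global__ void processRegion(const float *data, int offset, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        float v = data[offset + i];
        // ... real per-element work would go here ...
        (void)v;
    }
}

__global__ void parentKernel(const float *data, const int *regionSizes, int numRegions)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < numRegions) {
        int count = regionSizes[r];
        // The GPU itself decides the launch configuration and spawns a child
        // grid sized to this region, without any CPU involvement.
        if (count > 0)
            processRegion<<<(count + 255) / 256, 256>>>(data, r * 1024, count);
    }
}
```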
Instead of the straight feed-forward path of Fermi, there is now a feedback and management unit to keep track of what previously had to be handled by software. Also, with Fermi, processes could not share the GPU, so they had to be run one at a time. With Hyper-Q, all of the streams can launch at once instead of queueing up behind a single pipeline.
This is Fermi.
And this is Kepler. True multi-processing can occur in a single CUDA program.
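The host-side pattern is one programmers already use, independent CUDA streams; the difference is that on Kepler with Hyper-Q those streams reach the hardware through 32 separate work queues instead of being serialized behind one. Here is a minimal sketch of our own, not Nvidia’s demo code:

```cuda
#include <cuda_runtime.h>

__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;   // stand-in for real work
}

int main()
{
    const int numStreams = 8, n = 1 << 20;
    cudaStream_t streams[numStreams];
    float *buffers[numStreams];

    for (int s = 0; s < numStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], n * sizeof(float));
    }

    // Each stream gets its own independent kernel launch. On Fermi these would
    // funnel through a single hardware work queue and largely serialize; on
    // Kepler, Hyper-Q lets them run concurrently when resources allow.
    for (int s = 0; s < numStreams; ++s)
        work<<<(n + 255) / 256, 256, 0, streams[s]>>>(buffers[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < numStreams; ++s) {
        cudaFree(buffers[s]);
        cudaStreamDestroy(streams[s]);
    }
    return 0;
}
```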
Programming has been greatly enhanced with Kepler. The session then went to a half hour of questions and answers. Anyone interested in Kepler architecture should consult the whitepaper, which can be downloaded from Nvidia’s site as a .pdf:
http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
After that, it was time to head back to the hotel room to finish the news post on the GTC, then off to the networking exhibit halls again, and finally across the street to the GTC party, where jugglers provided the main entertainment – even tossing running chainsaws to each other! We didn’t stay too late, as Thursday would be our busiest day with a full session schedule.
Thursday
Our day started early as usual with a continental breakfast in the press lounge, then we hurried off to our first Thursday session, presented by Fusion-io Storage at 9 AM. The beginning of the session was aimed primarily at people in the movie industry, as it was pointed out that the upcoming “The Hobbit” is not only being filmed in 5K resolution, it is being filmed at 48 fps instead of the usual 24, plus it is in 3D! All of this requires a lot of fast storage capacity, at about 6GB/s.
Clearly, current storage, including SSDs, has latency and other issues which tend to throttle an otherwise fast GPU-based system. Fusion-io Storage uses PCIe-attached flash-based memory for image compositing, editing, video playback, 3D content creation and other data-intensive tasks, saving massive amounts of time. Here is an example using a small PC and two of these devices to concurrently present a dozen movies.
At the Fusion-io Storage oxygen bar ABT caught up with the presenter and got to look at the little machine powering this wonder.
It is based on a pair of add-in PCIe flash-based Fusion-io Storage solutions that were just released for $2,495 each, which are running in dual-SLI configuration. Last year, the prototypes cost $17,000, which means this technology may be coming to a gaming PC later this year at a much more reasonable price.
Fusion-io Storage is meant to completely eliminate the bottleneck in memory.
Here is the comparison with flash and the evolution of their device.
Fusion-io Storage has been around for years and they even partnered up with Samsung back in 2007. They have some pretty mature products for industry and some of them will be heading for the mainstream soon.
After that session, we barely had time to make it to SeeReal’s presentation. They are doing interesting research on sub-hologram processing, which – while still computationally intensive and dependent on the GPU – is far less intensive than traditional hologram generation. You would need a supercomputer to generate with traditional methods the same result that SeeReal can manage with a single GPU.
SeeReal’s method uses off-the-shelf graphics hardware and this pipeline:
SeeReal shows their pipeline
They even covered transparency in holograms, and all of this is possible because of GPU processing.
SeeReal revealed their new holographic prototype
And the outlook is good.
Next up was the last keynote, “Not your Grandfather’s Moon Landing”.
http://www.gputechconf.com/gtcnew/on-demand-gtc.php
The “Part-Time Scientists” are a group of about 100 engineers, researchers and scientists, including former members of the Apollo missions, who gathered to meet Google’s challenge to send a robot to the moon. “Not your grandfather’s mission to the moon” means that we can no longer depend on the dangerous task of sending humans into space to navigate and land for us. It must all be done remotely, and this requires extreme computing capabilities under the most demanding conditions and in the tightest of quarters.
Google’s incentive of a $30M grand prize requires the winner to send a robot to the moon with a “precise and soft landing”, and the robot must navigate and send HD data back from the moon. An additional prize of $5 million is awarded if the team can drive the robot 500 meters in the harshest of conditions on the moon’s surface while collecting and transmitting data. A soft landing means having the lander and robot survive a 10-foot drop onto the surface in a predetermined landing place.
These scientists are using a former Russian ICBM to deliver the lander and robot. Bandwidth becomes a factor with HD video, and so do miniaturization and extreme hardening of the hardware to protect it from the extremes of space. Asimov is the name of the GPU-driven lunar rover.
The team picked the Apollo 17 landing site, as it was a good choice for the original landing. The remains of the original Apollo rover are there, and it is just off the moon’s equator, an excellent place to use solar panels. Temperatures on the moon can swing plus or minus 160C, but at the Apollo site it is slightly less harsh, at plus or minus 125C. For one lunar day (14-1/2 earth days) the temperature is a constant +125C. For the hardware just to survive, it must use passive cooling with heatpipes for all of the electronics, including the miniature stereo 3D HD cameras.
The lander in orbit needs to use the GPU for simulations, since exact initial values cannot be correctly and accurately determined in advance, including calculating millions of non-differential equations. This must be done in real time, where dynamic adjustments cannot be made by humans as the Apollo astronauts were able to do; all of the calculations must be done on the GPU, as the CPU is simply too slow.
R0 is the team’s mini test vehicle, which is driven using an Android tablet, and it was actually given away to a lucky attendee at the GTC. There is a 3-second delay to the moon, and the team has to practice driving and steering this stand-in for Asimov on earth. The extra $5 million prize for going 500 meters is quite an incentive.
Since the team’s moon rover operates in incredibly harsh +125C conditions, it must drive itself without stopping, all the while avoiding obstacles. It thus becomes the first real-time autonomous moon rover running on GPUs. Consider that the Mars rover actually had to pause for 60 seconds to calculate where to go next – that pause would be fatal to Asimov; it needs to keep moving, and GPU calculations in real time are the only practical way to proceed.
This keynote session was an impressive demonstration of the need for super-fast massively parallel computing only the GPU can provide. Without it, this project simply could not get off the ground.
More GPU Computing Sessions
We weren’t finished. After lunch, a meeting with Fusion-io Storage, and the last networking/exhibit session of the event, we were off to find out about mixing graphics and compute with multi-GPU. This session was hosted by Nvidia.
GPU programming was stressed and helpful examples were given. And if one GPU is not enough, multiple GPUs can be used to speed processing up.
It was another programming session rated “Beginner”, but we found the information rather technical and, judging by the reaction of the audience, very useful for programmers.
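We cannot reproduce the session’s code, but the basic multi-GPU pattern it builds on is simple enough to sketch (our example, compute only, with a hypothetical kernel name): enumerate the devices and give each one its own buffer and its own slice of the work.

```cuda
#include <cuda_runtime.h>

__global__ void compute(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];       // stand-in for the real compute task
}

int main()
{
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount > 16) deviceCount = 16;

    const int nPerGpu = 1 << 20;
    float *buffers[16];                    // assumes at most 16 GPUs

    // Kernel launches are asynchronous, so this loop gets every GPU working at once.
    for (int d = 0; d < deviceCount; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&buffers[d], nPerGpu * sizeof(float));
        compute<<<(nPerGpu + 255) / 256, 256>>>(buffers[d], nPerGpu);
    }

    // Wait for every device to finish, then clean up.
    for (int d = 0; d < deviceCount; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaFree(buffers[d]);
    }
    return 0;
}
```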
GPGPU GAMES
Our final session at the GTC was an interesting one dealing with techniques for designing GPGPU games. Traditionally, games process most of their tasks on the CPU and use the video card only for graphics.
The CPU is responsible for the game logic, including the game physics and enemy NPC behavior.
For GPGPU games, the game logic is divided up.
The goal is to handle all of the game logic on the GPU.
Here is a specific example of AI.
The Neighborhood Gathering is a novel way to gather the neighbors of in-game entities.
A sorting mechanism has to be implemented.
Novel techniques are used to integrate AI behavior with physics.
Using this framework, they can simulate far more entities in real time than by using either the CPU or the GPU alone.
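As a very rough sketch of the general idea (ours, not the presenters’ framework, and with the neighbor lookup crudely simplified), the per-entity AI and physics become a kernel, and a sort by spatial cell is what makes neighbor gathering tractable on the GPU:

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>

struct Entity { float x, y, vx, vy; };

// Assign each entity to a grid cell so that a sort groups spatial neighbors
// together (assumes non-negative world coordinates).
__global__ void computeCells(const Entity *e, unsigned int *cell, int n,
                             float cellSize, int gridWidth)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        cell[i] = (unsigned int)(e[i].y / cellSize) * gridWidth
                + (unsigned int)(e[i].x / cellSize);
}

// Per-entity "AI + physics" step: steer toward the average position of a few
// nearby entities. The real framework would look up a proper cell range; here
// we crudely use the next few entries of the cell-sorted array as neighbors.
__global__ void updateEntities(Entity *e, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float cx = 0.0f, cy = 0.0f; int count = 0;
    for (int j = i + 1; j < n && j < i + 8; ++j) {
        cx += e[j].x; cy += e[j].y; ++count;
    }
    if (count > 0) {
        e[i].vx += (cx / count - e[i].x) * 0.01f;    // simple cohesion steering (AI)
        e[i].vy += (cy / count - e[i].y) * 0.01f;
    }
    e[i].x += e[i].vx * dt;                          // integration (physics)
    e[i].y += e[i].vy * dt;
}

void step(thrust::device_vector<Entity> &entities, float cellSize, int gridWidth, float dt)
{
    int n = (int)entities.size();
    thrust::device_vector<unsigned int> cells(n);

    computeCells<<<(n + 255) / 256, 256>>>(
        thrust::raw_pointer_cast(entities.data()),
        thrust::raw_pointer_cast(cells.data()), n, cellSize, gridWidth);

    // Sorting by cell index is the "neighborhood gathering" step.
    thrust::sort_by_key(cells.begin(), cells.end(), entities.begin());

    updateEntities<<<(n + 255) / 256, 256>>>(
        thrust::raw_pointer_cast(entities.data()), n, dt);
}
```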
They had a simple game to demonstrate.
And it turned out to be quite impressive.
There were many questions from the audience about applying these techniques to programming games, and especially whether they would work with complex games. The speakers were surrounded after the session and the questions went on for some time as we headed back to our hotel room to pack.
We will finish off with a look at some more of the networking sessions. At 6 PM, the GTC was over.
Networking/exhibits
In the pole position is Fusion-io Storage and their very popular oxygen bar.
Nvidia had a large section.
Gaming was featured and MainGear supplied the PCs.
Here is the Tesla Model S sedan with its 17″ Nvidia-powered display.
All of Nvidia’s major partners in GPU computing were represented, including ASUS ..
… and IBM.
Microway …
. . . and Sharp . . .
. . . And SuperMicro . . .
PNY was featuring their professional and consumer graphics, including their new SSD line-up. ABT will be posting PNY news and perhaps evaluating their hardware soon.
Everywhere we see networking. Appetizers were served and beer was available.
Raytrix and their 3D Light Field Video was highlighted.
But something is missing in the above photo. We have beautiful technology but it is cold. In contrast, look at the next picture.
We are reminded that the GTC is all about people. Helpful people. People with a passion for GPU computing and desire to share and learn. The GPU cannot yet replace face-to-face human contact.
Ubitus is a cloud-gaming service similar to Gaikai that is popular in the Far East. Their challenge is to break into the West and to deal with our comparatively sub-standard broadband.
Even Zoobe uses the GPU for their greeting cards, which synthesize a human voice. All of the above is possible because of a brand new GPU computing industry that began in 2009 with Nvidia’s Fermi. And now we see the incredible efficiency, promise and progress that Kepler brings to GPU computing and to gaming.
The final statistics on the GTC showed that 2,800 people from 48 countries attended, which is double the attendance of the 2009 GTC. Of course, no one could see all 340 sessions. There were 200 volunteers who made it possible. Our thanks to Nvidia for inviting ABT to the 2012 GTC.
Friday
Friday we packed up early and headed for the airport. It was the quickest TSA line ever, and there wasn’t even heavy traffic on the way to the San Francisco airport. We said good-bye to our adventure at the 2012 GTC, and we hope we can return next year. It was an amazing experience, and it is really an ongoing revolution that started as part of Nvidia’s vision relatively few years ago. It is Nvidia’s disruptive revolution to make the GPU “all purpose” and just as important as the CPU in computing. Over and over, their stated goal is to put the massively parallel processing capabilities of the GPU into the hands of smart people.
Here is our conclusion from the 2009 GTC:
Nvidia gets a 9/10 for this conference; a solid “A” for what it was and is becoming and I am looking forward to GTC 2010! As a plea to them, make next year’s conference schedule less hectic and definitely make it longer. Kudos for not dumping us into San Jose rush hour traffic at 3 PM, as last year. This editor sees the GPU computing revolution as real and we welcome it!
Well, GTC 2012 gets another ‘A’. Nvidia has made the conference longer but not any less hectic. Next year, attendees can look forward to another four-day conference, March 19-22, 2013, at the expanded San Jose Convention Center.
Our hope for future GTCs is that Nvidia can make it a “spectacular” like they did with Nvision08 – to bring the public awareness of GPU computing to the fore by again highlighting the video gaming side of what their GPUs can do.
This was a relatively short and very personal report on the GTC. We have literally a hundred gigabytes of untapped information in raw video and hundreds of pictures and sessions that did not make it into this wrap-up. However, we shall continue to reflect back on the GTC on ABT Forum until the next one.
Mark Poppin
ABT Editor-in-Chief
Please join us in our Forums
Become a Fan on Facebook
Follow us on Twitter
For the latest updates from ABT, please join our RSS News Feed
Join our Distributed Computing teams
- Folding@Home – Team AlienBabelTech – 164304
- SETI@Home – Team AlienBabelTech – 138705
- World Community Grid – Team AlienBabelTech