It is an exaggeration to say that deep learning requires the cloud, but the standard way to do the training part of deep learning has become the cloud, especially servers equipped with NVIDIA GPUs. EDA is involved in deep learning in several different ways. For a start, those NVIDIA GPUs have to be designed, and it turns out that NVIDIA is one of Cadence's customers for its Palladium emulators. But deep learning and AI have also made chip companies interesting again from an investment point of view, and there are a few hundred chip startups in the space (all of whom, I'm pleased to report, require EDA tools).

Summit

It's not exactly your typical cloud server, although it does have NVIDIA GPUs, but since I've not written about it before, I'll drop it in here. The fastest supercomputer in the world is, once again, in the US. For the last few years, the top supercomputers have been in China. The top three today are below, with the dates in parentheses showing the periods when they were #1:

IBM Summit (Oak Ridge National Laboratory, United States; June 2018 – present)
NRCPC Sunway TaihuLight (National Supercomputing Center in Wuxi, China; June 2016 – November 2017)
NUDT Tianhe-2A (National Supercomputing Center of Guangzhou, China; June 2013 – June 2016)

For all things to do with ranking supercomputers, there is a list, the TOP500. I wrote about that in my post Supercomputers last year (it mentions Summit as being under construction). As I said, this is not a typical cloud server; it is more like a highly specialized, standalone cloud datacenter all on its own. And it looks really cool, as you can see.

Each node in Summit has two 22-core IBM Power9 CPUs and six NVIDIA Tesla V100 accelerators. But there are 4,608 of these nodes (that number turns out to be 4,096 + 512, so it's not quite as weird as it looks to computer scientists who only count in powers of two). So I make that 202,752 Power9 cores and 27,648 NVIDIA Volta GPUs. There's also a little memory... 10 petabytes, along with 250 petabytes of storage (I'm assuming flash-based). Peak performance is 200 petaflops. The DoE is planning an exaflop-level machine for 2021.

Cloud and EDA

At the Design Automation Conference, we announced Cadence Cloud. There are various flavors of this, but they all allow Cadence's tools to scale to new levels using the leverage of the cloud. Some tools are "cloud-ready" in the sense that an individual tool can scale to hundreds or thousands of cores; for example, see my post Pegasus Flies to the Clouds, about Pegasus Physical Verification. Another approach is to use large numbers of cores to run a huge number of jobs in parallel and so get through the workload faster. Probably the best example of this is cell-library characterization, where with perhaps a hundred or more corners and a thousand or more cells, you are looking at literally hundreds of thousands of jobs.

When you have effectively an infinite number of cores, it is easy to waste them, but it is not cheap to waste them. It is all too easy to waste simulations or waste iterations. Where machine learning, the cloud, and EDA all come together is when the machine learning is put under the hood. One example that we announced at the same time as Cadence Cloud is Liberate Trio. See my post Liberate Trio: Characterization Suite in the Cloud. It uses machine learning to work out which runs actually need to be done and which can safely be skipped because they add no new information.
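To make that idea concrete, here is a toy sketch in Python of machine-learning-guided run selection. It is not how Liberate Trio works internally (Cadence doesn't publish that); the delay model, corner grid, and nearest-neighbour "uncertainty" measure are all invented for illustration. The general pattern it shows: simulate a seed set of corners, predict the rest from what has already been run, and only dispatch the corners whose prediction is still uncertain.

```python
# Toy sketch of ML-guided characterization run selection (illustrative only,
# NOT the Liberate Trio algorithm). Simulate a seed set of corners, predict the
# rest from completed runs, and only dispatch runs that are still uncertain.
import numpy as np

def spice_delay(voltage, temperature):
    """Stand-in for an expensive SPICE characterization run."""
    return 1.0 / voltage + 0.002 * temperature + 0.01 * np.sin(5 * voltage)

# Corner grid: 10 voltages x 10 temperatures = 100 corners for ONE cell/arc.
# A real library is ~100 corners x ~1,000 cells x several arcs per cell,
# which is how you end up with hundreds of thousands of jobs.
voltages = np.linspace(0.6, 1.1, 10)
temps = np.linspace(-40, 125, 10)
corners = np.array([(v, t) for v in voltages for t in temps])
scaled = (corners - corners.min(axis=0)) / np.ptp(corners, axis=0)  # normalize axes

simulated = {}                      # corner index -> measured delay

def run(idx):
    v, t = corners[idx]
    simulated[idx] = spice_delay(v, t)

for idx in range(0, len(corners), 7):   # seed: simulate a coarse subset first
    run(idx)

def predict_with_spread(idx, k=4):
    """Predict a corner from its k nearest simulated neighbours; the spread of
    those neighbours is a crude stand-in for model uncertainty."""
    done = np.array(list(simulated.keys()))
    dists = np.linalg.norm(scaled[done] - scaled[idx], axis=1)
    nearest = done[np.argsort(dists)[:k]]
    values = np.array([simulated[i] for i in nearest])
    return values.mean(), values.std()

TOL = 0.05   # acceptable prediction spread (arbitrary units for this toy)
while True:
    pending = [i for i in range(len(corners)) if i not in simulated]
    if not pending:
        break
    spreads = {i: predict_with_spread(i)[1] for i in pending}
    worst = max(spreads, key=spreads.get)
    if spreads[worst] < TOL:
        break    # every remaining corner is predictable: skip those simulations
    run(worst)

print(f"simulated {len(simulated)} of {len(corners)} corners, "
      f"skipped {len(corners) - len(simulated)}")
```

Multiply whatever fraction gets skipped by a thousand cells and several arcs per cell, and the simulations you avoid paying for add up quickly.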
At DAC a few years ago, on a panel session, Sifuei Ku of Microsemi memorably described the old way of doing things as "The British Museum Algorithm": you walk everywhere, but if you don't walk to just the right place, you miss something. I'm sure you've had that experience in a large museum such as the Met or the Louvre. Museums solve the problem by giving you an audio guide and taking you to the highlights of their collection. But a better solution would be if those audio guides learned what you were interested in and updated where they took you, and what they told you, appropriately. That's more like what Liberate Trio does: it learns what needs to be done and adjusts what it does accordingly.

The big challenge in library characterization is that the intuitive old rules don't work anymore. You characterize some parameter at slow-slow, typical, and fast-fast (at the same voltage and temperature) and you expect typical to lie between slow and fast. But it might not. The same is true if you hold process constant and characterize at different temperatures: the middle one might not lie between the high and the low. The obvious solution, just characterize absolutely everything absolutely everywhere, requires too much computation, or at the very least a lot more than is really needed. Machine learning can work out which simulation runs really need to be done. This is not something that can be done statically in advance; it requires the results of previous simulations to make the decisions.

No-Human-in-the-Loop Design Flows

Library characterization isn't the only place that machine learning can be brought to bear. A lot of what we do in EDA is to try and avoid iteration by making the flow more linear. But, let's face it, there is still a lot of iteration. For example, you run synthesis, you don't like the result, so you tweak some parameters and run it again. Of course, in a process like synthesis there is a huge amount of iteration going on under the hood (literally billions of iterations), but to the designer that is linear. It is only when the parameters have to be tweaked that the designer counts it as iteration. However, the designer has a lot of (non-machine) learning that the tool does not have: experience with previous runs of the tool, experience with the same IP in previous designs, a feel for how the tool is likely to respond to parameter tweaking.

A more subtle problem is that the optimization tradeoffs are hard to make explicit: "make the design as fast as possible, but not at the expense of too much area or power" is too vague. Furthermore, there are tradeoffs between blocks; for example, timing budgets can be traded off between adjacent blocks. On the other hand, the specification for the overall system might be clearer: the chip must run at 2.4 gigahertz, say.

The cloud scales to help: the designer can choose several values for each parameter, try all of them, and then pick the best. With the cloud, all the parameter sets can be tried in parallel. But that only partially solves the problem. We want to get away from the British Museum Algorithm here too, and take the human out of the loop, so that what today requires designer intervention and iteration is replaced with machine learning. Of course, no human in the loop is an ideal, where you simply start the job on tens of thousands of servers and, hopefully, a day or a week later it finishes with a solution that meets all the specifications.
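Here is a minimal sketch of that sweep-in-parallel-and-pick-the-best step, with a local process pool standing in for thousands of cloud servers. The run_synthesis() function, its parameters, and its quality numbers are hypothetical placeholders for illustration, not any real tool's interface.

```python
# Minimal sketch of a brute-force parameter sweep run "in the cloud": every
# combination is tried in parallel and the best passing result is kept. A local
# process pool stands in for cloud servers; run_synthesis() is a made-up model,
# not a real synthesis tool's interface.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def run_synthesis(effort, max_fanout, clock_ps):
    """Placeholder for one synthesis run; returns timing slack and area."""
    # Made-up behaviour: more effort and tighter fanout help timing but cost area.
    slack_ps = 100 - clock_ps / 10 + 20 * effort - max_fanout
    area = 1000 + 150 * effort + 2 * (64 - max_fanout)
    return {"effort": effort, "max_fanout": max_fanout,
            "slack_ps": slack_ps, "area": area}

def main():
    efforts = [1, 2, 3]          # the values the designer would otherwise have
    fanouts = [16, 32, 64]       # tried one at a time, run-and-tweak style
    clock_ps = 417               # ~2.4GHz clock period in picoseconds
    combos = list(product(efforts, fanouts))

    with ProcessPoolExecutor() as pool:   # stand-in for thousands of cloud cores
        results = list(pool.map(run_synthesis,
                                [e for e, _ in combos],
                                [f for _, f in combos],
                                [clock_ps] * len(combos)))

    # "Pick the best": meet timing first, then minimize area.
    passing = [r for r in results if r["slack_ps"] >= 0]
    best = min(passing or results, key=lambda r: r["area"])
    print("best run:", best)

if __name__ == "__main__":
    main()
```

The no-human-in-the-loop ideal goes further than even this explicit sweep: instead of the designer deciding which values to try and what counts as "best", the tool learns that itself.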
You would get out of the habit of looking under the hood, in just the same way that software programmers don't bother to look at the assembly code the compiler generates.

MAGESTIC

[Cadence] was selected by the Defense Advanced Research Projects Agency (DARPA) to support the Intelligent Design of Electronic Assets (IDEA) program, one of six new programs within DARPA's Electronics Resurgence Initiative (ERI) to use advanced machine learning techniques to develop a unified platform for a fully integrated, intelligent design flow for systems on chip (SoCs), systems in package (SiPs), and printed circuit boards (PCBs). The ERI investments are the next steps in creating a more automated electronics design capability that will benefit the aerospace/defense ecosystem and the electronic industry's commercial needs.

To fulfill the program charter over the four-year term of the contract, Cadence created the Machine learning-driven Automatic Generation of Electronic Systems Through Intelligent Collaboration (MAGESTIC) research and development program. This program will create a foundation for system design enablement by introducing greater autonomy within the design process and developing truly design-intent-driven products. The Cadence-led team includes Carnegie Mellon University and NVIDIA, two of the most renowned machine learning leaders in the world. For more details, see my post Cadence is MAGESTIC.

Cadence Cloud: the Movie

www.youtube.com/watch

NVIDIA Emulation

At DAC, we also announced Palladium Cloud. See my blog post Palladium Cloud for details. And if you want to see more Palladium emulators in one room than your company has... or indeed than you have ever seen before, then watch gaming blogger Blunty scoring a visit to the NVIDIA emulation lab and getting a tour from Narendra Konda, NVIDIA's Mr. Emulation:

https://youtu.be/650yVg9smfI

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.