It's going to be academic week here on Breakfast Bytes. There is an anniversary coming up, the Kaufman Award was announced recently, the Cadence Academic Network is having a birthday, and then there is the book that changed everything. So check in each day for something to go with your breakfast coffee. But we will start the week with Andrew Kahng, who came by Cadence recently to give a presentation about the challenges facing EDA now that semiconductor processes are not delivering "free" scaling. Andrew considers this challenge important enough that he says:

"A big part of the future of EDA is going to be about which of Cadence and Synopsys masters this first."

I think I first met Andrew in about 1999, soon after Cadence acquired Ambit. I ended up as the strategic marketing person for engineering (in those days Cadence was functionally organized) and, as a result, was on our Technology Advisory Board, or TAB. This was the descendant of an organization started back in SDA days, and consisted of a few Cadence employees, a few academics (including Andrew and Alberto), and a few representatives of industry. I think we met a couple of times per year. In those days, Andrew was at UCLA, but he has since moved to UCSD, where he is Professor of CSE and ECE. A couple of weeks ago, Andrew came to Cadence and gave an internal presentation titled Quality, Schedule, and Cost: Design Technology and the Last Semiconductor Scaling Levers.

The Scaling Crisis

What is scaling? The critical dimensions in a modern process are the contacted poly pitch (the pitch between adjacent gates in standard cells drawn the normal way) and the metal-x pitch (the pitch between tracks for the tightest metal, normally metal2). If both of these shrink by 0.7X between one node and the next, the area scales by 0.7 x 0.7 = 0.49, or near enough a doubling of density. If it is a couple of years between nodes, then the math works out to the nice round number of ~1% per week, or ~5% per month. Andrew didn't say this, but until recently the rule of thumb was that the cost per wafer would go up by about 15% or so between nodes, due to the increased complexity of the process, more expensive masks, more costly materials, and so on. Putting these two numbers together meant that the cost of any given functionality would decline by about 30-35% per node.

Of course, the most recent process nodes have had costs increase by much more than 15%, with FinFETs and multiple patterning. In the 20nm era, there were plenty of graphs around showing that cost per transistor was going up, but everything I hear these days is that second-generation 14/16nm processes are at least down a bit from then. But we are in a new era, where if you want to double the number of cores on a processor between one generation and the next (roughly doubling the number of transistors), it will cost you not much less than twice as much. I think it is interesting that Apple's latest application processor, the A11, apparently has a smaller die size than its predecessor; it would be too expensive to use all the silicon area on offer.
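As a back-of-the-envelope check on that arithmetic, here is a short Python sketch. It uses only the rule-of-thumb numbers from the paragraphs above (a 0.7X linear shrink, a couple of years per node, a ~15% wafer cost increase); the 1.8X "less-than-ideal" density gain at the end is my own illustrative assumption, not a figure from Andrew's talk.

```python
# Back-of-the-envelope node economics using the rule-of-thumb numbers above.
linear_shrink = 0.7                       # contacted poly pitch and metal pitch both shrink by 0.7X
area_scale = linear_shrink ** 2           # 0.49, so roughly 2X density
density_gain = 1 / area_scale             # ~2.04X

weeks_between_nodes = 104                 # "a couple of years" between nodes
weekly_gain = density_gain ** (1 / weeks_between_nodes) - 1
print(f"density gain per week: {weekly_gain:.1%}")   # ~0.7%/week, i.e. the ~1%-per-week rule of thumb

wafer_cost_increase = 1.15                # the historical ~15% wafer cost increase per node
cost_per_function = wafer_cost_increase * area_scale
print(f"cost per function vs previous node: {cost_per_function:.2f}")   # ~0.56 under ideal 2X scaling

# With a less-than-ideal density gain (say 1.8X, an illustrative assumption), the saving
# lands nearer the ~30-35% per-node cost reduction quoted above.
print(f"cost per function with 1.8X density: {wafer_cost_increase / 1.8:.2f}")   # ~0.64
```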
Another problem is margin stacking. It is a general rule in EDA that you can either have increased accuracy, or you can guard-band a parameter and use the "worst case". If you go back a couple of decades, this just meant that you did the design to the slow-slow corner, even though you knew that the chips would mostly be typical (by definition of typical). However, now we are in the era of lots of process corners, lots of different sets of resistance/capacitance corners, and lots of temperatures and voltages. This causes a lot of problems in getting design closure, but there is a more insidious problem. When you add up the margin that each tool and technology file contributes, this margin stacking eats up almost all the gain of moving from one node to the next, more advanced, node. So typical silicon gets something close to Moore's Law scaling, but worst case doesn't scale as much (due to the process), and then signoff with excessive margin stacking completely wipes out the gains. (A toy numerical illustration of this appears below, after the thought experiments.) It is worth pointing out that this only considers the chip. There are more stacks of margin at the system level, on the board and in the package. All of these need to be tightened up too, or they can eat up the gains from improving things at the chip level.

This has resulted in an IC design crisis, with many steps in a long design flow, all of which must be completed before you really find out what you end up with. Of course, the underlying problems are all intractable to solve exactly in any reasonable time (NP-hard is the technical computer science term), so they can only be attacked with heuristics, and the sheer size of designs makes any algorithm at all challenging. Iteration is expensive. Remember, one week is 1% of Moore's Law. Being suboptimal is very expensive, too: if the tools deliver 10% off the optimum for power or performance, that is about half the benefit of the new node. So margin stacking is very expensive.

Andrew talked a bit about the challenges of future semiconductor scaling. But that is the TD guys' job, so we'll just take it as given that it is getting more challenging. If you are really interested in what is in the funnel, then watch out for an upcoming post previewing this year's International Electron Devices Meeting (IEDM) in December. Beyond the chip, there are a number of More than Moore technologies, such as multi-die packages (2.5D and 3D packaging) and monolithic 3D ICs (where more than one layer of transistors is built up on the same wafer, or alternatively where two die are attached top-metal to top-metal). There are also rebooting-computing paradigms, such as approximate and stochastic computing. To read more about this, see my post The IRDS Panel at IRPS. As a software guy by background, I have to say that when we go away from the von Neumann architecture, we will miss it.

Thought Experiments

What if we had infinite dimensions? Then we could do netlist optimization with zero wire parasitics (just the transistor parasitics, which don't go away). The maximum power benefit is 36% for a Cortex-M0 and 20% for AES encryption. Or what if frequency didn't matter at all? We can get up to a 65% area difference, although usually more like 30%, between the minimum clock period and a relaxed clock period (this is for an AES block). See the graph above. And if we throw in free, magical, no-resistance, no-capacitance wires, the cycle time goes from 2.8ns to 2.25ns. Of course, these gains are not nothing. But they do make it clear just how hard scaling is becoming. That's before we get down to the nitty-gritty practical stuff, like whether there is a good material for liners in future vias, or whether we can build vertical gate-all-around transistors economically.
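Coming back to the margin-stacking point above, here is a toy calculation of how stacked pessimism can consume a node's nominal gain. Every number below (the 20% nominal speed gain and each derate factor) is invented purely for illustration; none of them come from Andrew's talk.

```python
# Toy illustration of margin stacking. All values are invented for illustration only.
node_speed_gain = 1.20        # suppose the new node is nominally 20% faster at typical

# Each tool, model, and signoff step in the flow adds its own pessimism on top:
derates = [
    ("slow-slow process corner",        0.88),
    ("worst-case RC extraction corner", 0.97),
    ("timing signoff margin",           0.96),
    ("IR-drop / voltage margin",        0.97),
    ("package and board margin",        0.97),
]

usable = node_speed_gain
for name, factor in derates:
    usable *= factor
    print(f"after {name:31s}: {usable:.3f}X of the old node's typical speed")

# The individually reasonable-looking margins multiply up, and the stack can consume
# most or all of the node-to-node gain -- exactly the "insidious problem" described above.
```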
The Last Levers for Scaling

What levers does that leave us with? First, improved design tools and reduced margins. "Owning a flow, like Cadence does," Andrew said, "you should be able to attack this margin lever." Second, cutting the schedule. Every improvement of one week in the schedule is a 1% gain. Can something that used to take one week be done in one hour? Can "doomed runs" be avoided? Third, cutting the cost of design, which means either or both of fewer humans and fewer months. Most obviously, it means using a lot more cheap machines instead of expensive humans (fewer humans) and hopefully getting automatic results sooner (fewer months).

Machine Learning

In 2017, it doesn't seem to matter what the question is, the answer is machine learning. Chris Rowen has a joke that an AI startup is any startup founded after 2015, since they all claim to be "AI-enabled." However, there are many high-value opportunities in and around EDA. But we have an HR challenge: those of us already here are all CS/EE people, and we are not trained in machine learning (and the people who are trained are all going to Facebook, Google, and Amazon). Still, there is huge potential. IC design tools are black boxes, with millions of lines of code and thousands of commands and options. Customers are applying machine learning around the tools, but a company like Cadence can bake it in.

Andrew went into a lot of detail on specific places where machine learning can be used. I will stick to the high level here (this post is long enough already!). EDA tools in general suffer from not doing automatically what human designers do: run the tool, inspect the results, decide what to change in the input, make the change, and run the tool again. With a big cloud of compute power available, this can be made more of a natural-selection process. Run the tool, inspect the results...but then try a hundred changes and run the tool a hundred times again, and throw away the ones that turned out badly. This is what Andrew calls the "multi-armed bandit", where you spin a hundred sets of wheels at once (a minimal sketch of the idea appears below, after the discussion of analysis accuracy). Since EDA tools pushed to the limit behave chaotically (in the mathematical sense), machine learning can be used to tame this. In fact, you can even use machine learning to optimize this process as a process: you have 300 licenses, 1,200 cores, and 90 days, so what do you do? UCSD is working towards a "no human in the loop" tool that does this: launch parallel tool runs, look at outcomes, rinse and repeat.

Analysis Accuracy

The next place to attack is analysis accuracy. All analysis is, at some level, simulating the laws of physics, and in the tight loop we need to trade accuracy for speed. But can we get accuracy back for free with machine learning? For example, can you get path-based-analysis accuracy (in timing) from the much cheaper graph-based analysis using machine learning? Can you run a few timing corners and predict the rest with machine learning? Or how about getting coupling-aware timing analysis from non-coupling-aware analysis? The above graphs show the process of improving timing correlation (reducing that margin stack) with machine learning. This reduces errors in path slack from 123ps to 31ps. (Andrew's slide calls this a 4X reduction...well, if you went from 31ps to 123ps it would be a 4X increase, but I don't think you can call it a 4X reduction the other way around; it is a 75% reduction. When I pointed that out, Andrew said that a 75% reduction still sounds pretty good.)
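To make the multi-armed-bandit idea a little more concrete, here is a minimal sketch of UCB-style selection over candidate tool configurations. The knobs (target_util, clock_uncertainty_ps) and the run_tool() function are stand-ins I made up; a real setup would launch actual place-and-route jobs on a compute farm and score their quality of results.

```python
import math
import random

# Each "arm" of the bandit is a candidate tool configuration; "pulling" an arm means
# launching one run and scoring it. The configs and scoring below are illustrative only.
CONFIGS = [
    {"target_util": u, "clock_uncertainty_ps": m}
    for u in (0.60, 0.65, 0.70, 0.75)
    for m in (20, 40, 60)
]

def run_tool(config):
    """Stand-in for launching a P&R run and measuring QoR (higher is better)."""
    base = 1.0 - abs(config["target_util"] - 0.68) - config["clock_uncertainty_ps"] / 400
    return base + random.gauss(0, 0.05)   # noisy, the way a tool pushed to its limits behaves

def ucb1(budget):
    """Classic UCB1 bandit: balance exploring configs against exploiting the best so far."""
    counts = [0] * len(CONFIGS)
    totals = [0.0] * len(CONFIGS)
    for t in range(1, budget + 1):
        if t <= len(CONFIGS):
            arm = t - 1                    # try every configuration once first
        else:
            arm = max(range(len(CONFIGS)),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = run_tool(CONFIGS[arm])    # in reality: a job on the farm or in the cloud
        counts[arm] += 1
        totals[arm] += reward
    best = max(range(len(CONFIGS)), key=lambda a: totals[a] / counts[a])
    return CONFIGS[best]

if __name__ == "__main__":
    print("best configuration found:", ucb1(budget=200))
```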
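And for the timing-correlation idea, here is a sketch of the kind of model one might fit: learn a correction from cheap graph-based results (plus a few path features) to the expensive path-based "golden" slack, then apply it where the expensive analysis was never run. The features and the synthetic data are entirely made up for illustration; a real flow would train on signoff data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Learn a correction from cheap graph-based timing to expensive path-based "golden" slack.
# Everything below (features, coefficients, noise) is synthetic, for illustration only.
rng = np.random.default_rng(0)
n_paths = 5000

gba_slack  = rng.normal(0.0, 0.3, n_paths)      # graph-based slack estimate, ns
depth      = rng.integers(5, 40, n_paths)       # logic depth of the path
max_fanout = rng.integers(2, 30, n_paths)       # worst fanout along the path
wirelength = rng.uniform(10, 500, n_paths)      # total route length, um
X = np.column_stack([gba_slack, depth, max_fanout, wirelength])

# Pretend golden PBA slack: GBA pessimism removed in a path-dependent way, plus noise.
pba_slack = gba_slack + 0.002 * depth + 0.0004 * wirelength + rng.normal(0, 0.01, n_paths)

train = np.arange(n_paths) < 4000               # paths where we "ran" the expensive analysis
test  = ~train                                  # paths where we only have the cheap analysis

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[train], pba_slack[train])
pred = model.predict(X[test])

print("mean |error| of raw GBA vs golden PBA:", np.mean(np.abs(gba_slack[test] - pba_slack[test])))
print("mean |error| of ML-corrected slack   :", np.mean(np.abs(pred - pba_slack[test])))
```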
C-DEN

Andrew also directs C-DEN, the Center for Design-Enabled Nanofabrication, which involves not just UCSD but also UCLA and Berkeley. Cadence is one of the sponsors. Their mission is to address fundamental aspects of process limits, process and device integration, heterogeneous integration, and design enablement. These areas are not even really separate any more. They have their own website.

Conclusion

Machine learning will deliver future scaling through better design quality of results, faster tool and design-flow convergence, and lower runtimes. Andrew's final remark:

Cloud and parallel search can compensate for many sins.

How do we get there? It will take the EDA companies, the academics, and the design teams all working together. Andrew's final sketch:

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.