Ever since Pat Gelsinger, then CTO of Intel, shocked everyone with his power graphs, the design world has been more concerned about power than almost anything else. Pat pointed out that if we kept increasing clock rates, the power density of an Intel microprocessor was going to approach that of a rocket nozzle. Even for a server in a datacenter that was not going to be feasible, and a processor in a smartphone can't get anywhere close to it. Nobody wants a rocket in their pocket (insert your own Mae West joke here). Of course, power matters not just for thermal reasons but also, for portable devices, for battery life.

The graph below shows the problem today in more detail. The scale is logarithmic, so the differences are much larger than they look. At the top is the ever-increasing number of transistors we can put on a chip. The blue line shows single-thread processor performance and the green line shows clock frequency; both have stalled out, since we can't push the clock frequency any higher without getting into rocket-nozzle territory. Power has to be capped because we can't get any more heat out of the package.

The solution has been to add more and more cores. But we are running into problems there too: the so-called "dark silicon" problem, where we can put more and more cores on a chip (we have plenty of transistors) but we can't power them all up at full speed at the same time. This is not just a theoretical problem. High-performance microprocessors and SoCs sense their temperature and modulate performance via dynamic voltage and frequency scaling (DVFS), reducing the voltage and slowing the clock. The picture below shows an eight-core design. From one to four cores, performance increases as expected. But with eight cores, the chip overheats and cuts back the frequency and voltage so drastically that eight cores run at the same speed as one.

It became clear that power-reduction techniques that treat different parts of the chip differently would become more important than they had been historically. In 2G cellphones this was simple: before smartphones, a phone was either in use (making a call, texting, gaming) or idle. In fact, a cellphone can never be completely off or it would never be able to receive a call, but under the hood everything except the real-time clock could be powered down, with the clock waking the receiver every second or so to listen to the paging channel for an incoming call or text.

Dividing the design into regions and having a power policy for each one gave a finer grain of control, at the cost of a big increase in complexity. Each region could be at a different voltage, could be powered on and off independently, and could even have a voltage that varies over time. The first problem was that design tools could not cope with this. Vdd and Vss were not explicit in the netlist, so there was no way to capture these decisions. This drove the creation of CPF and UPF (since unified into IEEE 1801) to capture power policy, so that tools could correctly create power networks, insert level shifters, add retention registers, and more. Having a file format to capture power policy was just the entry fee to the game, though. It did nothing to help decide what that power policy should be.
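To make the idea concrete, here is a minimal sketch of what a per-region power policy amounts to, written as a toy Python model rather than in CPF/UPF syntax; the domain names and power numbers are invented purely for illustration. It models the standby scenario described above: everything shut off except the real-time clock, with the receiver waking briefly each second to check the paging channel.

    # Toy model of per-region power policy -- not CPF/UPF syntax, just the concepts.
    # All domain names and power numbers are invented for illustration.

    from dataclasses import dataclass

    @dataclass
    class PowerDomain:
        name: str
        on_power_mw: float    # power drawn when the domain is powered up
        off_power_mw: float   # residual leakage when it is shut off
        duty_cycle: float     # fraction of time the domain is powered up

        def average_power_mw(self) -> float:
            return (self.duty_cycle * self.on_power_mw
                    + (1.0 - self.duty_cycle) * self.off_power_mw)

    # Standby scenario from the 2G phone example: only the real-time clock stays
    # up, waking the receiver for ~10 ms every second to check the paging channel.
    # Everything else is powered down.
    domains = [
        PowerDomain("rtc",      on_power_mw=0.01,  off_power_mw=0.0,  duty_cycle=1.0),
        PowerDomain("receiver", on_power_mw=50.0,  off_power_mw=0.05, duty_cycle=0.01),
        PowerDomain("cpu",      on_power_mw=200.0, off_power_mw=0.1,  duty_cycle=0.0),
        PowerDomain("display",  on_power_mw=300.0, off_power_mw=0.0,  duty_cycle=0.0),
    ]

    for d in domains:
        print(f"{d.name:10s} {d.average_power_mw():8.3f} mW")
    print(f"{'standby':10s} {sum(d.average_power_mw() for d in domains):8.3f} mW")

A real power-intent file captures far more than this (supply nets, isolation, level shifters, retention strategy), but the decision being encoded is the same: which regions can be shut off, when, and at what cost.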
Management of power suffers from a number of problems. First, actual power dissipation depends on what the chip is doing: making a cellphone call, playing a video game, or sitting idle in your pocket. As a result, "average power" is a poorly defined idea; it depends on the duty cycles of the various applications and thus, in turn, on which blocks of the chip are actually active. Second is the contrast between early and late in the design cycle. Early on, at the architectural level, the potential power reductions are the largest, but the ability to compare two candidate choices is the weakest. Late in the design cycle, at the physical layout level, the actual power numbers are known fairly accurately, but the impact of any change is comparatively small.

The sweet spot seems to be the RTL level. There it is still possible to make changes that have a large impact, while reasonably estimating what that impact will be. But this requires a very good estimate of how the RTL will look after physical design, without going to the expense of actually doing physical design. For example, in a typical chip perhaps 30-40% of the power is consumed in the clock tree, which doesn't exist yet at the RTL stage and so has to be estimated. The key technology to making this work is a fast, physically aware synthesis that doesn't give up too much accuracy. A blazingly fast synthesis that is only 50% accurate is the wrong tradeoff, but so is 95% accuracy that saves only 20% over doing the whole implementation. Once that is in place, vectors for the various modes (what the chip is doing) can be run and power dissipation numbers obtained. Those numbers then become the basis for making RTL-level changes to the design.

Cadence's Joules RTL Power Solution is such a tool, accurate to within 15%. It keeps performance high by parallelizing the analysis across multiple CPUs and by analyzing multiple stimulus files (the different modes the chip might be operating in) in parallel. The Joules solution is built on top of an ultra-fast prototype mode of the Genus Synthesis Solution released earlier in the year. To generate more extensive data, Joules can be linked to Palladium Dynamic Power Analysis (DPA), which uses emulation. This is especially attractive when the analysis requires running a realistically large software load, which in effect means literally billions of vectors. The result is the capability to do RTL power analysis with good accuracy, about 20X faster than any other approach. For example, a 10-million-instance design can be run overnight at the RTL level, producing power results within 15% of signoff.

Oh, and that picture. That is James Joule, whom Wikipedia describes as a physicist (I knew that) and brewer (who knew?). He is generally credited with discovering the conservation of energy. More relevant for this blog, though, he also discovered the relationship between the current through a resistor and the heat dissipated.
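For the record, that relationship is Joule's first law: the heat generated per unit time in a resistor is P = I²R, where I is the current flowing through it and R is its resistance.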