Channel: Cadence Blogs

All Models Are Wrong; Some Are Useful

"All models are wrong, some are useful.” This remark is attributed to the statistician George Box who used it as the section heading in a paper published in 1976. Anyway, George Box went on to clarify what he meant Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V, and temperature T of an "ideal" gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules. Models Depend on the Purpose My favorite example of different kinds of models being useful for different things are these two: The model on the left is the sort of plastic kit that I used to put together as a kid (you can still buy them apparently). If you wanted to, for example, measure the wingspan then this would be a good model. Trying to learn anything about aerodynamics, not so much. The model on the right is useless for pretty much anything other than learning about aerodynamics. It doesn’t even look much like a real plane. But it flies. Semiconductors I think the biggest model that we have in semiconductors is what I like to call the digital illusion. We have analog transistors and voltage levels, but we pretend they are digital gates which are 0 or 1 and have a delay that can be captured in a few parameters. When I started in EDA, we didn’t even model input and output slopes. We would characterize a gate by using SPICE and putting a step function on the input and seeing when the output reached some threshold—that was the delay. Then we went to adding slopes, so that we would measure the delay from when the input rose to 50% to when the output fell to 40% (or 60% if it was rising). 
It turned out some gates reached 40% on the output even before the input reached 50%, so they had negative delay. So we lifted up the corner of the rug, swept the dust underneath, and set that value to zero. Then we had to start to model interconnect resistance... But even today we still model gates as being digital with a delay function. When we do signoff, we get a bit more complex and start to model more of the analog effects to make sure our digital illusion hasn't broken down.

If we had to model a billion transistors as genuine analog devices using circuit simulation, then we would never be able to design a microcontroller, never mind a smartphone application processor or a server microprocessor. Of course, we also set bounds on the environment where we expect the model to work. If you set the power supply voltage to 0.1V, we don't expect the digital illusion to tell us how the silicon would actually behave. But in the normal operating range, luckily for us, the digital illusion seems to hold, and people who don't even know how to run SPICE can confidently write thousands of lines of SystemVerilog knowing that these will accurately be transformed into real analog transistors about which they know next to nothing. Now that's a model that is useful.

Scottish Deer

When I was doing my PhD (in Scotland), I had a friend, Andrew, who was an ecologist. He had a lot of data about forests in the Scottish Highlands and their deer populations. In that era, the Forestry Commission (roughly the equivalent of the US Forest Service) was in the mode of planting high-density evergreen forests that could be harvested to make pulp for paper, and to create lots of employment in forestry and paper mills.
It turned out that this was a terrible idea. There was very limited demand for newsprint and other low-quality paper. Once grown, evergreen forests block the sun so completely that the forest floor is completely dead, even the lower branches of the trees; they are horrible to look at or walk in, and so recreationally useless. And they created little employment except during the planting phase (which was highly desirable and well-compensated summer employment for students, who didn't mind moving around every couple of weeks). Eventually the whole strategy was abandoned, and the Forestry Commission decided its mission was "to protect and expand forests and woodlands and increase their value to society and the environment", not to grow commercial timber. Unfortunately, the Scottish Highlands are still blighted by huge evergreen forests of no commercial or recreational value.

Andrew's data tracked deer population over the years from when a new forest was planted and fertilized. He had data from dozens of forests. In that era, the Forestry Commission was very active, planting all over the Highlands (they had a lot of land—in fact, they were the largest landowner in Britain). In the early days of a new forest, the deer would move into the area, breed, and eat the other plants (and some of the trees) that grew using the fertilizer. Then, as the trees matured and closed off the canopy, there would be fewer plants, and eventually the deer couldn't even get between the trees. So the deer population would go from next to nothing, to a peak, and then back to zero.

One problem was that the data wasn't very good. Counting deer is not that easy or accurate. Which reminds me of an off-topic story of zoologists in Africa who were trying to do something with zebras, so they tranquilized them and tagged them electronically. But they couldn't distinguish one zebra from another and would lose visual track of the ones they were trying to follow, the ones they had tagged.
So they painted red dye on the rears of those zebras when they tagged them. Unfortunately, the lions killed all the tagged zebras. It turned out that zebras are striped not for camouflage against the savannah (a lion is much better camouflaged) but so that predators can't focus on one of them. It's why fish often go around in shoals, too; a single fish is too easy to pick off. But the red dye made focusing easy, and so those were the zebras that the lions (actually the lionesses) went after.

Anyway, back to deer. Andrew and I attempted to build a model that could predict the deer population from knowing when the forest was planted, how big it was, the species of tree, and some other stuff I forget. We started with a very simple model, since the years since planting was obviously one of the major variables. Different species of tree grew at different rates, so that obviously was going to affect how fast the cycle ran its course. The forest size seemed important: some of the forests were very small, meaning that most of the forest was near the edge and still had lots of light, was accessible to seeds from other plants, and so on.

We ended up with a reasonable model that seemed to have some predictive power. But it wasn't great, and lots of the data didn't match the model. Forests grow pretty slowly, so we weren't going to sit around and see whether the model predicted accurately over the next couple of decades. So we used a bit over half the forests to derive the parameters for the model, and then used the other forests to see how accurate it was. The more parameters we added, the better the model fitted the forests we used to train it, but often the worse it fitted the other forests that we used to test it. This is known as over-fitting (the same word is used in machine learning, which I suppose is what we were doing manually here).
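The procedure we followed by hand can be sketched in a few lines of Python. The forest data here is invented for illustration (the real data is long gone), and polynomials stand in for whatever model family we actually used: fit models of increasing parameter count to a training half of the forests and measure the error on the held-out half. Training error can only go down as parameters are added; test error typically bottoms out and then climbs.

```python
import math
import random

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations
    (fine for a low-degree demo; real tools use QR or SVD)."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        s = sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aty[i] - s) / ata[i][i]
    return coeffs

def rms_error(coeffs, data):
    return math.sqrt(sum(
        (sum(c * x ** k for k, c in enumerate(coeffs)) - y) ** 2
        for x, y in data) / len(data))

random.seed(1)
# Invented stand-in for the forest data: deer population rises after
# planting, peaks, then falls back to zero as the canopy closes.
# Forest age is scaled to [0, 1] to keep the fit well conditioned.
forests = [(age / 40.0, 40.0 * age * math.exp(-age / 8.0) + random.gauss(0, 5))
           for age in range(1, 41)]
random.shuffle(forests)
train, test = forests[:22], forests[22:]  # "a bit over half" to train

train_errs, test_errs = [], []
for degree in range(1, 8):
    xs, ys = zip(*train)
    coeffs = fit_poly(xs, ys, degree)
    train_errs.append(rms_error(coeffs, train))
    test_errs.append(rms_error(coeffs, test))
    print(degree, round(train_errs[-1], 2), round(test_errs[-1], 2))
```

Because each higher-degree model contains the lower-degree one as a special case, the training error is mathematically guaranteed never to increase; the held-out error has no such guarantee, which is the whole point of keeping some forests back.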
If we had used all the forests to derive the parameters, presumably it would have fitted really well, but then we wouldn't have had any forests left to test it on (nor a couple of decades to sit around and see). What I learned from this is something attributed to John von Neumann (yes, he of the von Neumann architecture in computers): "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."

Climate Change

I'm skeptical about the models used for global warming, based on this experience. The basic model is well understood: every doubling of CO2 produces a one-degree Fahrenheit increase in temperature. But that is a logarithmic scale, and is nowhere near dramatic enough to produce publishable papers, and isn't the answer the IPCC and the politicians wanted. So a strong positive feedback is added, to get a huge increase in temperature (despite there being no evidence for the sign of the feedback, let alone a value). But now the temperature increases too much in the early part of the 20th century, so another parameter (typically particulates from old power stations) is added so that the warming is muted until cleaning up industrial emissions reduced the particulates. Of course, given this model, we should be arguing for dirtying up power station emissions (China doesn't get nearly enough credit for this!).

Living in California, what's the biggest thing that affects the weather on a multi-year basis? Whether it is an El Niño year, a La Niña year, or neutral, what the oceanographers call ENSO, the El Niño Southern Oscillation. There is another, longer-timeframe one that is also important, the PDO, the Pacific Decadal Oscillation. These obviously figure prominently in the climate models. Nope, they are not even in the models. How have those models done over the last forty years? They do great in the first 20 or 25 years, the part that is used to train the models.
Then they run wildly hot compared to reality.

Summary

Climate is complicated, much more complicated than the US economy, which at some level has only about a third of a billion moving parts (aka people). How did those macroeconomic models do predicting the recent recession? Or the Great Depression? Or anything else? Nobody would even claim that they can model the economy accurately enough to have any predictive power; they just aim for a bit of explanatory power. A bit like the paper airplane I started with.

The digital illusion is one of the most powerful models in our industry, able to get very close to SPICE accuracy without the computational load. Don't forget SPICE itself is a model; there is another level underneath where TCAD models individual charges. Even that is a model, not quite how actual atoms behave in all their quantum weirdness. It's turtles all the way down.

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

