
EE Thermal 101 – Thermal basics for Electrical Engineers (Part 4 of 4)

In part 3 of this series, we used the concept of thermal resistors to develop a thermal equivalent network of a system and determined its equivalent junction-to-ambient thermal resistance. With this approach, we were able to link thermal resistances to the physical properties of the system and intuitively understand the dominant heat transfer mechanisms through the equivalent thermal resistor equations. In this blog, we'll take a look at several popular cooling techniques that are commonly employed in electronic systems and discuss how they work.

Heat Sinks

Heat sinks are passive heat transfer devices that transfer heat from an IC package to the ambient environment with a much smaller thermal resistance than the parallel thermal resistance from the package to the environment due to convection and radiation. In order for the heat sink to be effective, its equivalent thermal resistance must satisfy:

R_heatsink < (R_conv × R_rad) / (R_conv + R_rad)

where R_heatsink is the effective thermal resistance of the heat sink, R_conv is the thermal resistance of the package top due to convection, and R_rad is the thermal resistance of the package top due to radiation (the right-hand side being their parallel combination).

Figure 6. Thermal resistance model of an N-fin heat sink with a TIM connected to the top of a package.

Figure 6 shows a thermal resistance model of an N-fin heat sink (N is the number of fins) with a thermal interface material (TIM) connected to the top of a package. The TIM is needed to improve the contact between the package and the heat sink, so the effective thermal resistance of the heat sink needs to include the resistance of the TIM. Using the techniques we learned in the previous blog, we can find:

R_heatsink = R_TIM + R_base + (R_fin,1 ∥ R_fin,2 ∥ … ∥ R_fin,N)

which tells us the effective resistance is equal to the resistance of the TIM, plus some resistance from the base of the heat sink, plus the parallel combination of the N fin resistances. If we assume the fin resistances are equal, the equation simplifies further to:

R_heatsink = R_TIM + R_base + R_fin / N

The equivalent resistance of the heat sink reduces to the resistance of the TIM, plus some resistance from the base of the heat sink, plus the resistance of a single fin divided by N, the number of fins. Since the surface area of the heat sink fins can be larger than the top surface area of the package, their convection and radiation resistance can be smaller than that of the package top surface. Furthermore, this resistance is divided by the number of fins, leading to an N-times improvement. However, for a given heat sink base area, increasing the number of fins beyond a certain point will eventually increase the thermal resistance of each fin, since the fins start to approach each other and the effective heat transfer coefficient drops. It is also important to choose high-thermal-conductivity materials for the heat sink and TIM, since these thermal resistances add directly to the effective thermal resistance of the heat sink.

Heat Spreaders

Another technique to cool an electronic system is to spread more of the heat from the IC to the back side of the PCB using thermal vias and heat spreaders. Thermal vias placed under the IC can significantly reduce the thermal conduction resistance of the PCB and help guide the heat to heat spreader plates placed on the bottom side of the PCB. Heat spreaders are made from high-thermal-conductivity material like graphite and have large surface areas to improve heat dissipation.
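To put numbers on the fin-count trade-off discussed above, here is a minimal Python sketch of the heat-sink equations; the resistance values are made-up illustrative numbers, not from any real datasheet:

```python
# Effective heat-sink resistance: R_heatsink = R_TIM + R_base + R_fin / N
# Illustrative (made-up) values in K/W, not from any real datasheet.

R_TIM = 0.5    # thermal interface material
R_base = 0.3   # heat-sink base
R_fin = 60.0   # a single fin (convection + radiation to ambient)

for n_fins in (5, 10, 20, 40):
    r_hs = R_TIM + R_base + R_fin / n_fins
    print(f"{n_fins:2d} fins -> R_heatsink = {r_hs:.2f} K/W")

# Note the diminishing returns: as N grows, R_TIM + R_base dominates,
# and in practice R_fin itself rises once the fins crowd each other.
```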
Fans

Electronic fans are routinely used in consumer electronic systems such as desktop computers, laptops, and projectors when passive heat sinks and heat spreaders are not sufficient to remove the heat. A fan uses a motor and requires power to actively move air around the system to remove heat. It can also be a source of audible noise, so you need to consider noise generation as well as reliability issues when choosing a fan. Many fans today allow you to control the speed with a pulse-width-modulated (PWM) signal, so you can design a thermal management system that dynamically adjusts the fan speed as a function of your system temperature.

Heat Pipes

A heat pipe is a heat transfer device that uses the principles of thermal conductivity and phase change to transfer heat between solid components. Phase change for a heat pipe generally refers to a liquid changing to a gas once the liquid reaches its boiling point at the hot spot; the gas then propagates down the pipe and condenses back to a liquid when it reaches the lower-temperature interface. The liquid is then wicked back to the hot spot, usually through capillary action, and the process repeats, removing heat from the hot spot to the cooler interface. Heat pipes are also widely used in consumer electronic systems and can be found in computers, tablets, and even smartphones.

Dynamic Throttling

Finally, as electrical engineers, we do have the ability to control the power dissipation of our system with various power throttling techniques, but usually at the cost of degrading our system's performance. The goal here is to trade off performance as gracefully as possible, so our customers can appreciate that we have done everything possible to maintain the best user experience. Many electronic systems now employ thermal sensors throughout the PCB so that an onboard processor can monitor the temperature in the system and make dynamic throttling decisions as the temperature increases. As electrical engineers, we intimately understand the various power profiles of our system and can start to turn on fans, reduce features, disable different parts of the system, and/or throttle clock speeds as the system reaches different temperature thresholds.

Congratulations on making it through our EE Thermal 101 blog series. We hope you were able to learn some basics of heat transfer. With a foreseeable future of higher-power-density electronics, electrical engineers will play a critical role in the thermal management design of products. Cadence® offers Sigrity™ PowerDC™, a proven electrical and thermal technology that has been used in the design, analysis, and sign-off of real-world packages and PCBs for many years. PowerDC enables electrical engineers to extend power integrity with fast and accurate thermal analysis for IC packages and PCBs. It includes an integrated electrical/thermal co-simulation environment that considers the increase in electrical resistance that occurs at higher temperatures, to help you confirm the design meets specified DC voltage and temperature margins. Check out our PowerDC page for more information.

Read more blogs on thermal topics:
EE Thermal 101 – Thermal Basics for Electrical Engineers (Part 3 of 4)
EE Thermal 101 – Thermal Basics for Electrical Engineers (Part 2 of 4)
EE Thermal 101 – Thermal Basics for Electrical Engineers (Part 1 of 4)
Why is Power Integrity Hot (or is it Cool)?
Some Don't Like It Hot: Thermal Model Exchange

Sanjay Goes to Europe via Texas

While I was in Germany recently, I sat down with the head of Cadence EMEA (Europe, Middle East, and Africa), Sanjay Lall. Back when we were both a lot younger, we both worked at VLSI Technology. I was a software development engineer; he was an application engineer before moving into marketing. We both toured the solar system of EDA like lost planets before coming back together at Cadence a couple of years ago. Sanjay told me that the key turning point in his career was working at EPIC Design Technology for Sang Wang. EPIC had a family of superfast, high-capacity transistor simulators called TimeMill, PathMill, and PowerMill. He joined in August 1990, only to be told in December that they were running out of cash and had a very short runway left to operate. But they brought in Bernie Aronson as the new president, who raised money, and then they went public in 1995. Eventually, they were acquired by Synopsys in 1997. Bernie asked Sanjay to move to Texas from San Jose, so he went, initially just working out of his apartment before he set up the regional sales and support office in Austin, Texas. After Synopsys acquired EPIC, he parted ways and started his own rep firm covering the Texas market. He had an investment model where he would only rep companies if he was allowed to invest and/or receive stock. This avoids the rep's dilemma: if you don't sell anything you get fired, and if you are wildly successful, the company gets acquired and the new owner promptly terminates the rep agreement. The rep agreement still gets canceled, but at least you make money on the acquisition. While running his company, he was in discussions with one of his suppliers, ExtremeDA, in 2005, when the CEO, Mustafa Celik, and board member Lucio Lanza asked him to join the company instead of just representing it. They were competing against PrimeTime. A few years later, ExtremeDA got sued and was then acquired for a token amount. So he was out of a job and became a consultant for AtopTech, Ausdia, and a few other companies. Charlie Huang, who was running sales at the time (and emulation engineering, not quite as weird a combination as Mark Gogolewski being Denali's CTO and CFO), called him in 2013. Cadence had just launched Tempus to…you've guessed it…compete with PrimeTime. There was a bit of turmoil, and when the dust settled, Neil Zaman was running sales (WFO in Cadence-speak) and asked Sanjay to run emulation. Cadence had just launched Palladium Z1 (see my post Palladium Z1, an Enterprise Server Farm in a Rack) so, as Sanjay put it, "the rest is history." As you could tell from listening to earnings calls over the last couple of years, the product has been incredibly successful. We were even selling boxes with no evaluation, which is amazing given that emulators are…let's just say not cheap. At the end of 2016, Neil asked Sanjay to take over Europe and "get growth." So on April 1, 2017, he took over EMEA and moved himself and his family to the UK. His timing was great once again, since Europe is strong in automotive, machine learning, AR/VR, 5G, and IoT. The numbers for automotive semiconductors hide what is going on, since the growth is healthy but not exceptional. But on the design side, where Cadence makes its money, it is exploding. Those chips won't be in production until model year 2022 or 2023 (or even later), but they have to be designed now.
Sanjay realigned the organization, especially around this automotive vertical, which is strong in Europe, and around system design enablement to address the other growth markets. Automotive is especially strong in Germany (have you heard of Bosch, BMW, Mercedes, Audi, VW, Porsche…). As Sanjay put it: "Automotive is in my face 24/7 across the whole ecosystem…OEMs, Tier-1s…ADAS is going amazingly." Translations from automotive-speak: OEMs are car companies like Ford or BMW. Tier-1s are their primary electronic suppliers, like Bosch or Delphi. ADAS is "advanced driver assistance systems," which is another way of talking about the early phases of self-driving cars. Talking of BMW, I can't resist, at this point, showing you the picture to the right. It's in the BMW Museum (BMW is headquartered in Munich; the B stands for the German equivalent of Bavaria, of which Munich is the capital). Yes, it really is a BMW, the 1955 Isetta. I showed the picture to a journalist at the press dinner and he immediately knew what it was, since his uncle had owned one. It's obviously rubbish by modern standards, but it would be pretty cool to drive a car like that around Silicon Valley with a BMW logo on the front. Well, it would be fun to arrive; the driving, not so much. In automotive, there are lots of startups in lidar, infotainment, sensors, cybersecurity, and more. Some will fail and some will get acquired, but in the meantime they are all doing designs. The OEMs (car companies) are dabbling in potentially doing their own silicon, since future differentiation will not be in ICE anymore (that's "internal combustion engine" in automotive-speak). Israel is also on fire, with lots of well-funded startups coming out of the starting blocks with $5-10M funding and doing advanced-node work. There are lots of new analog companies in the UK, and the Nordic area (this is code for Nokia and Ericsson) is chasing 5G for base stations. Nokia has also re-entered the handset business after its non-compete with Microsoft expired (Microsoft had bought its previous handset business). The big semiconductor companies in Europe are doing well. Even NXP has leveraged its reliability and security technology into major positions in banking and automotive. Just look at ST's stock price over the last three years. Or the job openings at ST, Nokia, Ericsson, and more. So many engineers are being hired. With local labor laws, companies are cautious about hiring, but they are truly investing for the long haul. They are well aligned with us. The challenge is to deliver everything they need. Sanjay said that they even have over half a dozen 7nm startups. It costs $10M just for a 7nm mask set, so you don't even build a prototype without customers willing to buy in high volume. It takes $15-25M+ to develop the SoC: tape out in 2019, get parts into 2022 model cars. I asked Sanjay how the future looks: "I'm excited about the prospects for EMEA over the next 2-3 years. There is lots of innovation and energy in many of the verticals. Arm has created a lot of new angels who are silicon-literate. There's a tremendous push and investment in the 5G race in Scandinavia. And automotive everywhere." Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Tales From DAC: NetSpeed and the Cadence Interconnect Workbench Pair Up

Services like facial detection, efficient cloud server workload management, artificial intelligence, and image enhancement are all the rage these days, but creating a design to accommodate these needs can be incredibly taxing on your engineering resources. Luckily, NetSpeed Systems is here to help. The devices being designed nowadays are more complicated than ever, and the design requirements are more complex, too. Designers who want the highest performance, multi-core capabilities, mixed traffic requirements, and other features can no longer get them all from either CPUs or GPUs alone. Designers today need a mix of both, for each of their strengths; they need heterogeneous computing. Services like facial detection, efficient cloud server workload management, and image enhancement all call upon heterogeneous computing to meet their complex needs. Designers want to use GPUs for easily parallelized data and big data, while they also want CPUs for smaller data and highly structured, non-parallelizable data. There are also issues with efficient memory management and maintaining cache coherency without compromising system-level quality of service. This is where NetSpeed Gemini comes in. Gemini is highly configurable: you can easily set your cache hierarchies and ensure caching and coherency participation. Its distributed architecture makes for lower latency and allows for floorplan-aware configurable cache hierarchies. This is all pretty cool, but you still need software automation to help avoid deadlock in this complex computational platform. The burning question is: how do you know when you're done with the verification? NetSpeed's architectural design approach seeks to answer exactly that question. You begin with a specification of your architectural requirements; then, NetSpeed helps you weigh the different tradeoffs and explore the design space so you can find the best solution for your project, and you can get design feedback at any step of the process. This way, you can create and reach concrete goals. The NetSpeed platform has its own built-in simulator, called the Integrated Performance Simulator. This simulator, and its accompanying SystemC model, are great if you don't have a solid grasp of your traffic requirements yet and things are still a little abstract. But if you're looking for something more precise, you want Verilog simulation, and the best way to get that is through the Cadence Interconnect Workbench (IWB). Cadence IWB gets you cycle-accurate performance analysis, protocol checking via VIP, data consistency checking via IVD, and loads more, and it's easy to execute on Xcelium or Palladium XP II or Z1. You can get great graphs showing your workload-based traffic simulations, alongside other data analytics to help you identify and fix your performance bottlenecks. Next-generation applications are driving us to next-generation architectures. Devices have caches everywhere, and they're all snooping each other; how can you expect to keep coherency in that kind of chaos? With next-generation performance analysis, the kind brought to you by Cadence IWB and NetSpeed. For more information on NetSpeed Gemini, check here; for more information on the Cadence Interconnect Workbench, check here.

Virtuoso: The Next Overture: Prelude to Cadence Virtuoso "Symphony No. 18.1"

When the curtain rises on our new, novel, and innovative symphony, it will play in four movements:
• Hierarchical schematic-driven layout that combines the best of top-down and bottom-up design methodologies while avoiding the shortcomings of each
• Hierarchical visualization that allows you to easily view or hide details on any level, anywhere in your design, to only view what you need, when you need it
• Hierarchical and congestion-aware floorplanning and placement that provides automated and assisted productivity
• Hierarchical routing and congestion analysis that makes real routing and congestion analysis information available upfront
Sounds good? Stay tuned to hear the melody of Cadence Virtuoso "Symphony No. 18.1". Until then, we leave you with Tchaikovsky's "Symphony 18.12 Overture". For more information on the new Virtuoso design platform, contact team_virtuoso@cadence.com. To receive updates about new and exciting features being built into Virtuoso for our upcoming Advanced Nodes and Advanced Methodologies releases, type your email ID in the Subscriptions field at the top of the page and click SUBSCRIBE NOW.
Rishu Misri Jaggi

Evolution of DisplayPort

In 2006, the Video Electronics Standards Association (VESA) designed a new display interface to compete with HDMI: DisplayPort. Since then, DisplayPort has become more and more popular in the computer world. Let's take a look at the evolution of DisplayPort over the years. DisplayPort is the first display interface to rely on packetized data transmission, like Ethernet, USB, and PCIe. Unlike legacy standards that transmit a clock signal over a dedicated output, the DisplayPort protocol is based on small data packets known as micro packets, which embed the clock signal within the data stream. This allows higher resolution using fewer pins, and it also makes DisplayPort extensible, meaning that additional features can be added over time without significant changes to the physical interface. Since 2006, the DisplayPort specification has evolved through the following generations:

v1.0-1.1a (2006-2008): DisplayPort v1.0-1.1a provides a maximum bandwidth of 10.8 Gbit/s (8.64 Gbit/s data rate) over a standard 4-lane main link, with HDCP support.

eDP 1.0 (2008): Based on DisplayPort, eDP aims to define a standardized display panel interface for internal connections, e.g., graphics cards to notebook display panels.

v1.2-1.2a (2010-2013): The most significant improvements are the doubling of the effective bandwidth to 17.28 Gbit/s in High Bit Rate 2 (HBR2) mode and multiple independent video streams, called Multi-Stream Transport.

v1.3 (2014): DisplayPort 1.3 increases overall transmission bandwidth to 32.4 Gbit/s (25.92 Gbit/s data rate) with the new HBR3 mode featuring 8.1 Gbit/s per lane. This bandwidth is sufficient for a 4K UHD display, a 5K display, or even an 8K UHD display.

v1.4 (2016): Adds Display Stream Compression v1.2 (DSC), Forward Error Correction (FEC), and HDR10 metadata support. DSC is a "visually lossless" encoding technique with up to a 3:1 compression ratio. Using DSC with HBR3 transmission rates, DisplayPort v1.4 can support up to 8K UHD at 60 Hz with 30 bit/pixel RGB color and HDR.

v1.4a (2018): The latest DisplayPort spec, an incremental update to v1.4.

In 2018, VESA announced that work has begun on the next generation of DisplayPort, looking to double the available bandwidth versus the current HBR3 signaling standard. The goal is to publish the standards update by 2019. With the availability of the Cadence Verification IP for DisplayPort up to v1.4a, adopters can start working with these specifications immediately, ensuring compliance with the standard and achieving the fastest path to IP and SoC verification closure. The DisplayPort VIP provides a full-stack solution for Sink and Source devices with a comprehensive coverage model, protocol checkers, and an extensive test suite. More details are available on the DisplayPort Verification IP product page.
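The data-rate figures above follow directly from the lane rate and DisplayPort's 8b/10b line coding (10 bits on the wire carry 8 bits of data). Here is a quick back-of-the-envelope check in Python, ignoring blanking overhead; the arithmetic is mine, not from the VESA spec text:

```python
# DisplayPort bandwidth math from the generation summary above.
lanes = 4
hbr3_per_lane = 8.1                  # Gbit/s per lane in HBR3 mode
raw = lanes * hbr3_per_lane          # 32.4 Gbit/s total raw bandwidth
data = raw * 8 / 10                  # 25.92 Gbit/s after 8b/10b coding
print(f"HBR3: raw {raw:.1f} Gbit/s, usable {data:.2f} Gbit/s")

# Uncompressed 8K UHD @ 60 Hz, 30 bit/pixel RGB (ignoring blanking):
payload = 7680 * 4320 * 60 * 30 / 1e9    # ~59.7 Gbit/s -- too much
print(f"8K60 @ 30 bpp needs ~{payload:.1f} Gbit/s uncompressed")

# With DSC's up-to-3:1 "visually lossless" compression, it fits:
print(f"with 3:1 DSC: ~{payload / 3:.1f} Gbit/s < {data:.2f} Gbit/s")
```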

ImageNet: The Benchmark that Changed Everything

I like to date technical transitions from specific events, even though realistically they take place over an extended period. For example, I think the modern era of IC design and EDA started with the publication of "Mead & Conway", which I wrote about in The Book that Changed Everything. Today, the most important area of computer science, and of semiconductors too, is the huge advances being made on an almost daily basis in neural networks and artificial intelligence. A decade ago, this was a sleepy backwater that had been studied for 50 years. Now it is a new paradigm of "programming" where the system is trained rather than programmed algorithmically. The most visible area where this is being used is probably the drive(!) towards autonomous vehicles. However, neural networks are creeping into other, less visible areas, such as branch prediction in high-performance microprocessors, where they outperform traditional approaches. For me, the watershed moment was Yann LeCun's keynote in 2014 at the Embedded Vision Summit. He had a little handheld camera attached to his laptop on the podium, and he pointed it at things that he had up there, like the space bar, a pen, a cup of coffee, his shoe, and so on. The neural network he was running identified what the camera was looking at, using the NVIDIA GPU in his laptop to power things. I had never seen anything like it. Remember, this was not identifying static images; this was a low-quality camera, not being held steady, pointing at real-world objects, identifying them in real time. I've seen similar demonstrations since. Indeed, at any trade show where Cadence is focusing on its Tensilica product line, such as the Consumer Electronics Show, we have a similar demonstration running some standard visual recognition algorithms, such as ResNet or Inception (running on a Tensilica processor, of course). How did "standard vision algorithms" even come into existence?

The 2009 CVPR

The milestone event, in my mind, was a poster session at the 2009 CVPR, the Conference on Computer Vision and Pattern Recognition, by a number of people from the Princeton CS department. The undramatic paper was ImageNet: A Large-Scale Hierarchical Image Database by Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. I assume it was a poster session, rather than a full presentation at the conference, because it wasn't considered that important or ground-breaking. In some ways, it wasn't. But it changed everything. ImageNet was and is a collection of annotated images. The images are all over the internet (lots on Instagram) and don't truly form part of ImageNet itself (and the copyright on each image is owned by whoever put it on the net in the first place). ImageNet consists of the annotations and, in some cases, bounding boxes for the things of interest in the image. The identification in ImageNet was crowdsourced, much of it using Amazon's Mechanical Turk. Today there are over 14 million images. The annotations are basic, along the lines of "there is a cat in this image." There are over 20,000 different categories identified. One focus area is pictures of dogs, where the images are further identified by 120 different dog breeds ("there is a beagle in this image"). In fact, the classification is done using the WordNet hierarchy, which provides some of the knowledge.
So if a picture contains, say, a beagle, then ImageNet doesn't also need to explicitly identify that it contains a dog, since a beagle is already known to be a dog, and a dog is known to be a mammal.

ILSVRC

In 2010, the ILSVRC was launched: the ImageNet Large Scale Visual Recognition Challenge. Researchers competed to achieve the highest recognition accuracy on several tasks. It uses a subset of the whole database, with only 1,000 image categories, but including the dog breeds. In 2010, image recognition was algorithmically based, looking for features like eyes or whiskers. The algorithms were not very good, and a 25% or larger error rate was normal. Then suddenly the winning teams were all using convolutional neural networks, and the error rates started to drop dramatically. Everyone switched, and the rates fell to a few percent. The details of the ILSVRC change each year, but it has been run every year since, and continues today. The dog breeds turned out to be an area where the networks rapidly did better than humans. I can't track it down now, but I read about one researcher who actually trained himself to recognize the different breeds, but even so, the neural networks did even better. Having a huge dataset to use for driving algorithm development turned out to be the missing jigsaw piece that enabled the rapid and enormous advances in AI that have taken place, especially in the last 5 or 6 years. I've seen estimates that AI has advanced more in the last 3 years than in the decades since the ideas were first toyed with back in the 1950s.

Data

Test data is extraordinarily important. You've probably heard that "data is the new oil", but actually it is processed data that is valuable (like processed oil). There were already millions of pictures on the net, but classifying a few million of them suddenly made them useful. In the automated driving area, and specialized image recognition, there is the GTSDB, the German Traffic Sign Database. Cadence has (or maybe had, these things change fast) the leading network for identifying the signs, which performs better than humans. If you've not seen the database, you might wonder how a human would ever get anything wrong; traffic signs aren't that hard to identify. But some of them are in fog, at dusk, or covered with dirt, and so on. Yes, the clearest ones are like identifying signs in a driving handbook, but the most obscure ones are barely identifiable at all.

EDA Test Data

In EDA, test data is all-important. I took over as CEO of an EDA startup a few years ago, and our tool got great results on all the public-domain designs we could get our hands on. However, we were addressing power reduction, and those designs were done in an era when power was less important and the designers had not made a lot of effort to reduce it. So finding savings was easy. When we finally engaged with customers and got our hands on real leading-edge designs, our results were less impressive. Instead of saving 30-35% of power, we were saving 7-9%. Not nothing, but not compelling enough to get groups to introduce a new tool into the flow. In fact, one big advantage a company like Cadence has over smaller companies is that we are engaged with the foundries early, and with their leading customers too (leading, in the sense that theirs will be the first designs in the new node). There is a chicken-and-egg aspect to designing tools for a new node, since there are no test cases until a customer creates them, and they can't create them until we supply some (inevitably immature) tools.
So it is a bit like timing convergence, getting both EDA tools and the initial designs to the finish line at the same time. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

What's For Breakfast? Video Preview September 3rd to 7th 2018

https://youtu.be/ipUI2OaAWX4
Coming from the old Ambit Design Systems office on Augustine Drive (camera: Bill Deegan)
Monday: Labor Day Off-Topic: Almost Everyone Has More Than the Average Number of Legs
Tuesday: Ambit Design Systems
Wednesday: PCAST: President's Council of Advisors on Science and Technology
Thursday: DARPA's Electronics Resurgence Initiative
Friday: ERI: CHIPS and Chiplets
www.breakfastbytes.com
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Whiteboard Wednesdays - The Simplest Neural Network Explanation Ever - Part 2

In this week's Whiteboard Wednesday, Tom Hackett continues his explanation of neural network basics using an Excel spreadsheet as a learning vehicle. You can download the spreadsheet here: https://ip.cadence.com/uploads/1213/neural-network-calculator-xlxs-zip https://youtu.be/ow1ZuOPD72I

Carbon Nanotube Memory: Too Good to Be True?

One of the sessions at HOT CHIPS is on new technologies, and one of those presentations was by Bill Gervasi of Nantero on Architecture for Carbon Nanotube Based Memory (NRAM). The technology sounds too good to be true, as someone in the audience said during the time for questions afterwards. It is faster than DRAM. It is cheaper than DRAM. It has greater capacity than DRAM. It is persistent (which, apart from the persistence itself, also means it doesn't need to spend about 15% of its time on refresh, among other things). Unlike flash, it has no wear-out. Persistence seems to be infinite, or at least measured in hundreds of years. There doesn't seem to be any temperature sensitivity, at least below 300°C. And if space is your thing, it appears to be immune to alpha particles. I happened to talk to Bill during the social hour after the first tutorial day, before I'd seen the presentation. It sounded intriguing, and even more intriguing after seeing the presentation.

Who Is Nantero?

They are an IP company, so they don't build their own products; they license the technology, which they have been developing for quite some time. The one licensee he could talk about was Fujitsu, who had announced a few weeks before that they would take the technology to mass market starting next year.

What Is CNT Nonvolatile Memory?

The memory works using the Van der Waals effect, which keeps nanotubes that are apart apart, and keeps nanotubes that are together together. Van der Waals forces are attractive or repulsive depending on distance. It takes a pulse of energy to switch the tubes in both directions, which is where the non-volatility comes from. The energy requirement is 5 fJ/bit (compared to 5-7 fJ/bit for DRAM, so comparable) in the form of an electrostatic pulse: a voltage down the word line, with activation through an associated bit line. Some of Nantero's secrets are in how they prevent sneak currents to adjacent bit lines. One weird characteristic of the memory is that writes are faster than reads. The above diagram shows the CNTs near the bottom electrode pushing up towards the top and opening up a gap. The resistance between the two electrodes changes by 10X, so the difference between 0 and 1 is not hard to detect, and there is no need to calibrate across the wafer. There are between hundreds and thousands of CNTs per bit cell. The top tubes don't move and don't participate in the bit storage; they are there to prevent the metal sputtered on to make the top electrode from permeating all the way down and shorting out the cells. The cells either stay apart or stay together, driven by the Van der Waals forces. The manufacturing process is simple, at least the way Bill described it: build the die normally, spin-coat carbon nanotube slurry, bake it, sputter metal, etch, seal, done. Having said that, in the questions, Bill said: "Part of our secret sauce is the formula by which we create the CNT slurry. We build the machines for that, and licensees clone them." Each CNT cell has a certain diameter and a certain vertical height, and the CNT diameter and length have to be chosen for that process technology. If the tubes are too long, they won't move when switched, and if they are too short, they can end up standing vertically and not switching either.
Bill didn't say anything about it, but I know one problem with making CNTs is that some of them turn out to be metallic (conductors). In this stochastic configuration, where lots of tubes are used to store a bit, I assume this doesn't matter (it matters a lot if you try to build transistors using CNTs for the channel, since you can't turn off a metallic "CNT"). The NRAM is built on top of the top layer of the die, and so can be added on top of any technology. It doesn't even have to be silicon (or even logic).

Scaling NRAM

Initially, Nantero and their licensees are targeting drop-in DDR4 replacement, moving later to DDR5 replacement as that transition takes place in the DRAM market. The technology can be further scaled, though:
Add more layers of CNTs.
Use standard die stacking with TSVs (like the Hybrid Memory Cube).
Shrink the process: scaling is a function of the number of CNTs per bit and is well understood all the way down to 5nm and beyond.
Today there is one bit per cell, but multi-level cells can be built as a function of the pulse.

Persistence

There is increasing interest in persistent memory and in making software aware of it. Some of this is driven by 3D XPoint (which Intel calls Optane). Some is driven by NVDIMMs, which are actually complex DIMMs that have the DRAM, some NAND to hold the backup, and a battery (or big capacitor) to keep it all working over a power failure, since it might take several minutes to copy the DRAM to flash or back. With NRAM, there is no need for either the power source or the backup flash. 3D XPoint (aka Optane) is called "storage class memory." Bill likes to call NRAM "memory class storage."

Architecture

Above is the architecture of a DDR4 NRAM chip. The dark blue parts are standard DDR functions. The orange parts are the NRAM components. The pale green parts are added-value blocks that may or may not be present.

Summary

If this technology is really as good as it appears, I don't understand why all the DRAM manufacturers (or people who want to compete with them) haven't licensed this technology and built gigascale fabs to manufacture it. Advances in DRAM are getting really difficult and so are coming really slowly. In fact, we are scaling DRAM bit volume mostly by building more and more fabs, not by getting more and more bits per wafer. That is unsustainable. Lots of technologies, various flavors of MRAM, RRAM, spin technologies, and so on, have looked like the answer to "what comes after DRAM". So the big question for me is whether this is, as it looks on the slides, a technology that can take over after DRAM, or whether it has some fatal flaw nobody in the audience at HOT CHIPS seemed to spot. With that caveat, here is the summary:
Electrostatic effects set and reset each bit
Resistance delta of 10X allows reliable sensing
Dielectric-free cell shows no wear-out
DDR4 NRAM includes a DRAM-compatible front end
Defines a new category, "Memory Class Storage"
NRAM per-die capacity scales far beyond DRAM
Fully deterministic timing, better than DRAM
On-the-fly ECC incorporated for server-class reliability
Module-level NRAM products are plug-and-play compatible
Industry is ready for persistent main memory
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Virtuoso: The Next Overture: A New Interface for DRD

The new version of the Virtuoso platform (*ICADVM18.1) delivers breakthrough analysis capabilities and innovative simulation-driven layout, enabling more powerful and more efficient design on the most advanced process technologies. With this solution, we can significantly improve productivity through state-of-the-art methodologies and offer the industry's most comprehensive solution for interoperable chip, package, module, and board flows. *These capabilities are also available for established process nodes in IC6.1.8.

If you have been following Cadence news over the last few months, you will have heard about the relaunch of DRD. The design-rule-driven (DRD) tool's R&D team has long been working on a new software architecture and new interaction algorithms, aiming for smarter and more efficient in-design verification than ever before. They looked closely at the existing challenges and decided to rebuild DRD from the ground up. The relaunched DRD brings you a brand-new experience; let's take a closer look!

High performance and a smarter use model

You may be wondering how the new DRD improves your experience and why you should choose it. Let's walk through its advantages:

Improved performance: The new DRD delivers a significant performance boost, with impressive results; you can edit a layout with DRD turned on at the same speed as with DRD turned off. The main features behind the performance gains include:
Optimized region loading for checking
A sliding-window-based checking approach
Text-based checking when handling instances and object sets
Extensive benchmark data backs up the DRD performance gains.

A smarter use model with better visualization: The new DRD comes with a smart use model that improves the visualization of violations and simplifies debugging. The main benefits are:
Incremental display of design-rule violation information
Improved display of off-screen design-rule violations
An option to limit the number of halos

A unified interface: The new DRD toolbar provides quick, one-click access and a consolidated interface. The new interface combines the interactive and batch checking options, instead of two separate forms.

More user control: The new DRD gives users better control over hierarchy and over the constraint and layer filters used for design verification.

Related resources

For more information about the new DRD interface and its use model, visit the Cadence Support portal for these videos:
Introduction to the new DRD user interface
Using the DRD incremental violation display feature
Using the DRD sliding window

Can't wait to learn more about DRD? Stay tuned for the upcoming ICADVM18.1 and IC6.1.8 releases of the Virtuoso platform to experience the brand-new DRD.

Contact Us

For more information on Virtuoso ICADVM18.1, see What's New in Virtuoso. For Cadence custom IC/analog/RF design products and solutions, visit www.cadence.com. For the latest news about the Virtuoso design platform, or if you have any questions or feedback about the product features covered in this blog, contact team_virtuoso@cadence.com. To receive updates about the upcoming Virtuoso Advanced Nodes and Advanced Methodologies releases, type your email ID in the Subscriptions field at the top of the page and click SUBSCRIBE NOW.

Authors: Pallabi Roy and Navit Rana (Virtuoso team)

Don't Miss These 6 Things At CDNLive India 2018!

1. The Cadence keynotes: Lip-Bu Tan on Day 1 and Babu Mandava on Day 2
Apart from being part of the executive management team at Cadence, both Lip-Bu and Babu are veterans of the industry. They have a wealth of experience and insight, and they will be sharing their visions of where the industry is going and what technology trends to look out for in the coming years. We don't often get to hear from global leaders, so make sure to come in time to listen to Lip-Bu and Babu.
2. Networking
We have two tea breaks and one lunch break, which are excellent opportunities to network with your peers from the industry. The value of going to conferences is not just what you learn in the sessions, but also who you meet, the professional bonds that you form, and getting to know what your peers in the industry are working on.
3. Stop by the Designer Expo
We have three of our global sponsors - GLOBALFOUNDRIES, Doulos, and ClioSoft - exhibiting at the Designer Expo, as well as several of our partners. Each of them is going to be showcasing a unique offering, so stop by and see what they have to show.
4. Selfie booth
Since we started it three years ago, the selfie booth at CDNLive India has been hugely popular. Each year it's a slightly new design, and we're sure you'll love what we've done this year. Make sure you use the hashtag #CDNLiveIndia when you post online!
5. The Closing Ceremony
Not only do we announce the Best Paper award winners in each track during the closing ceremonies on both days, we also have a lucky draw for the Designer Expo Passport Game and the CDNLive app leaderboard. You can't win if you're not physically present in the room, so don't miss it!
6. The CDNLive app
I mentioned it in my last blog, and I'll say it again - download the app to be on top of whatever is happening at CDNLive India. It's got surveys, polls, posts...it's the one-stop destination to know what's happening in real time.
All of us in the CDNLive India team are super excited about the event, and we hope you are too! If for whatever reason you can't make it to the event, make sure to follow the CDNLive India Facebook page to keep up in real time.

Great Academic Networking in Taiwan - 2018 VLSI Design/CAD Symposium

Cadence Academic Network and Taiwan Marketing co-supported the annual VLSI Design/CAD Symposium in Taiwan for the 3rd consecutive year. This year, around 800 local industry experts, academic experts, and students attended the symposium, held in Tainan, Taiwan, on August 7-10, 2018. Taiwan is renowned as an IC design powerhouse. The VLSI Design/CAD Symposium, which plays an important role in stimulating local research, is aimed at providing an open forum for professors, industry engineers, and, most important of all, graduate students to exchange cutting-edge R&D knowledge and ideas in the fields of SoC/ASIC design and EDA research. At this event, Mr. Chen Liang-Gee, Minister of Science and Technology of Taiwan, gave a keynote on the topic of "Review IC 60 Years History and Start A New AI Innovation" on the first day. And Dr. Qi Wang, Vice President of the Cadence Intellectual Property Group (IPG) and CEO at Nanjing Kai Ding Electronics Technology Co. Ltd., delivered an industry keynote addressing the topic of "Machine Learning and Artificial Intelligence - Systems to Silicon: Faster and Smarter". It was the 2nd time he has given a keynote at this symposium. Dr. Qi's speech received positive feedback and stimulated interactions with professors about industry trends and Cadence technology. Cadence will continue to work with Taiwan academia to build a sustainable ecosystem.

Google's Titan: How They Stop You Slipping a Bogus Server into Their Datacenter

At the recent HOT CHIPS conference, Scott Johnson of Google talked about some challenges that Google has. There have been stories about hackers infiltrating malware into the supply chain. Given the stories about the NSA intercepting Cisco router shipments and adding trojan loggers, this is not pure paranoia. As Scott put it, "how do we even know it is our equipment?" The solution is to tag and verify every device. Cloud companies like Google have numbers of servers measured in the millions, so you can't just go round and check them all visually. The next problem is verifying the boot chain. When a server (or even a smartphone) is powered on, it first runs what is called the primary bootstrap, usually out of ROM (which can't be changed). Its function is to find the real bootstrap, sometimes called the secondary bootstrap or the bootloader. This checks various things and then finds the real code for the operating system and transfers control to it. Google worries about whether the bootloader is truly their code, and then whether the operating system code is truly Google's operating system. Remember, Google is not worried about some teenager in their basement; they are worried about national organizations and organized crime. The solution is to sign and verify all boot code. They rapidly came to the conclusion that they need a silicon root of trust; built on that, they can move up to the datacenter hardware, then to the software infrastructure (operating system, etc.), and then up to the cloud software. They wanted this to have four important properties:
Every element in the datacenter should be securely identifiable, what Scott calls "cryptographic attestation."
The first code executed should be cryptographically signed and verified firmware, live-monitored for protection.
All activities in the datacenter should be monitored and logged in a tamper-resistant manner.
Own and/or verify every piece of the stack, from transistors up to critical firmware.
So they decided to create a chip to do this. In turn, the above requirements led to a set of requirements for the chip itself:
On-chip verified boot.
Cryptographic identity and secure manufacturing.
Boot firmware check and monitor.
Silicon physical security.
Transparent development, full stack.

Titan

The chip they built is called Titan. It sits low down in the system hierarchy, as you can see from the above diagram. Titan is a secure low-power microcontroller designed with cloud security as a first-class consideration. But it is more than just a chip: it also involves a supporting system and security architecture, and a secure manufacturing flow. Their motivation for doing their own chip was partly that there wasn't anything existing they could use, but also that they wanted complete ownership and auditability, and to build up local expertise in the area rather than depend on third-party security experts. Also, new attack vectors arrive all the time, so they wanted agility and velocity; if it is their chip, they can respond faster. The above diagram shows the architecture of the chip. The blue boxes are the 32b microcontroller core and its memories: boot ROM, flash for instructions and data, SRAM scratchpad, and one-time-programmable fuses (more about these later). The green boxes contain cryptographic acceleration, key management and storage, and a (true) random number generator, along with the usual mix of peripherals. The red boxes are physical defenses, live status checking, and hardware security alert response. Let's take a look under the hood.
Verified Boot

The verified boot progresses as follows, with each stage verifying the next. There is duplicate flash code so that it can be updated live, and the system is still in good shape if it fails during the update. Code signing is taken seriously, and though it was beyond the scope of this talk, Scott said that there are multiple key holders, offline logs, and playbooks for who can do what, when. The boot works like this:
LBIST (logic built-in self-test) and MBIST (memory BIST) are run. If either fails, the system stays in reset. If all is OK, the system jumps to the boot ROM.
The boot ROM compares the two bootloader (BL) versions and chooses the most recent. The bootloader signature is verified. If that fails, it tries to verify the other one. If that fails too, freeze.
Next, the bootloader compares the two firmware (FW) versions and chooses the most recent. The FW signature is verified. If that fails, the other one is tried. If that fails too, freeze.
Execute the successfully verified FW.

Trusted Chip Identity

Trust is established at manufacturing. Each tested device is uniquely identified with an assigned serial number (unique but not secret), and it then generates its own cryptographically strong identity key. This is done using multiple silicon technologies (ROM, fuse, flash, logic), all of which need to be defeated to compromise the chip. This identity is registered in an off-site secure database. Parts are shipped and then put on datacenter devices for production. They are then available for attestation: proof that the servers are Google's. The boot ROM is locked down at tapeout, so it has to be small and bug-free, since there is no way to change it.

Life-Cycle Tracking

After manufacturing, there is a continuing need to guarantee authenticity. So Titan is in one of six states, and moves irreversibly from one to another by blowing OTP fuses. The six states are:
Raw: no features enabled; deters wafer theft.
Test: enables test features only, no production features.
Development: enables production features for lab bring-up.
Production: final production features, no testability, unique keys.
RMA: re-enables testability but disables production.
RIP: after RMA or manufacturing failure, permanently disables the device.
The above diagram shows the fuses used for each state. Note that, due to the choice of fuses, a given chip can only go from left to right, and a development chip (for playing with in the lab) can never be enabled for production.

Physical and Tamper-Resistant Security

Scott admitted that some of this is overkill for a datacenter that is already protected by armed guards. If you manage to get into a datacenter, you are probably not going to use lasers to attack the Titan chips, but they wanted to learn what it would take and, in the future, Titan or similar chips might be used in less secure environments like smartphones. The defenses include:
Attack detection (power supply glitch, laser, thermal, voltage).
Fuse, key storage, clock, and memory integrity checks. The clocks are generated on-chip, so you can't attack them directly.
Memory and bus scrambling and protection.
Register and memory range address protection and locking.
TRNG (true random number generator) entropy monitoring.
Boot-time and live status checks.
In the event tampering is detected, Titan responds with one of: an interrupt, a non-maskable interrupt, freezing the system, or a full system reset.
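As a rough illustration of the two-slot selection logic described above (the structure follows Scott's description, but the names and the stubbed-out signature check are mine, not Google's), it looks something like this in Python:

```python
# Hypothetical sketch of Titan-style two-slot boot selection.
# Each stage picks the newest image whose signature verifies,
# falls back to the older one, and freezes if both fail.

from dataclasses import dataclass

@dataclass
class Image:
    version: int
    payload: bytes
    signature: bytes

def verify(image: Image) -> bool:
    """Stub for cryptographic signature verification against a
    baked-in public key; a real root of trust would use something
    like RSA or ECDSA here."""
    return image.signature == b"valid"   # placeholder only

def select_image(slot_a: Image, slot_b: Image) -> Image:
    newest, fallback = sorted((slot_a, slot_b),
                              key=lambda i: i.version, reverse=True)
    for candidate in (newest, fallback):
        if verify(candidate):
            return candidate
    raise SystemExit("freeze: no verifiable image")  # stay frozen

# The boot ROM picks a bootloader this way; the chosen bootloader
# then picks firmware the same way, so each stage verifies the next.
bl = select_image(Image(2, b"bl-new", b"valid"),
                  Image(1, b"bl-old", b"valid"))
print("booting bootloader version", bl.version)
```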
Open Titan

Titan as described is proprietary to Google, but the basic security mechanisms and the digital implementation are commodities, and good candidates for open-sourcing. So Google is moving towards an open, transparent implementation of a secure root of trust, built around a RISC-V processor. It could be implemented in "any" technology, with standard cells, memories, I/Os, etc. provided either open source or by the foundry, along with foundry-specific blocks such as OTP and flash. Some of the blocks, such as the TRNG, require more than digital logic and would depend on an analog implementation (with a digital wrapper); those blocks are outlined with dotted red lines in the above diagram. In fact, Google has set up the Silicon Transparency Working Group, along with lowRISC and ETH Zurich, to drive this project. Eventually, this will be open to anyone (sometime next year, probably). Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Adding a Patch Just in Time! — Or Can You Really Allow Yourself to Waste So Much Time?

One animation video - Patch Like The Wind - is worth a thousand words :) If you don't use Specman, or don't use Specman correctly, you spend most of your time waiting for compilation to finish. One of the most frustrating (and common…) scenarios is when you know more or less what the fix should be (such as "wait an additional cycle before sending" or "the variable should be int and not uint") and the fix can be done in a matter of minutes. However, you are forced to spend hours waiting for the compilation to end in order to analyze the results and decide if you are satisfied with the fix. And while fixing the code, you usually do not write exactly the right code the first time. So you adjust your code – another matter of a few minutes – and then you have a few more hours of compilation to wait through. Horrifying. But not if you use Specman. With Specman, you can fix a file loaded on top of the compiled environment, so you don't need to wait for hours of compilation. You can create a small 'patch' file in which you implement the changes, and then load this patch file on top of the compiled environment. Once you are happy with the changes you made, you can move the fixed code to the relevant file(s) and compile. Loading one or two files instead of compiling the whole environment can save you hours, if not days, for every fix of your code. If you want to save some more time, don't run the test from the start: save the test before the interesting point (before sending the item, when calling the checker), and dynamically load the fixed file. The capability to modify the testbench with code loaded on top is also very helpful when it comes to getting fixes from other teams or companies. Instead of waiting for the VIP or any other tool provider to create a fixed version, they send you one e file which you can load on top. Even Specman itself, as it is written in e, can be extended with patches. No need to wait for an official hot fix: you can get a patch file from the Support team with the required fix. Yes, this is no news to Specman users. In meetings with Specman users, when we ask "What is your favorite feature?", one of the top answers is "the great patching capability". Next time you are asked "Are you still using Specman?", you can reply "Sure, and are you still compiling?"

Virtuoso: The Next Overture: An Introduction to the Design Intent Tool

The new version of the Virtuoso platform (*ICADVM18.1) delivers breakthrough analysis capabilities and innovative simulation-driven layout, enabling more powerful and more efficient design on the most advanced process technologies. With this solution, we can significantly improve productivity through state-of-the-art methodologies and offer the industry's most comprehensive solution for interoperable chip, package, module, and board flows. *These capabilities are also available for established process nodes in IC6.1.8.

You spoke, we listened. You may need a system that helps schematic and layout designers in different regions, countries, or time zones communicate: to define and discuss design goals, resolve implementation constraints, reach agreement, and record the decisions to prevent duplicated work during design reuse. For that, Cadence brings you…

Virtuoso Design Intent, a new capability that complements the Virtuoso Schematic Editor XL and Virtuoso Layout Suite XL applications.

Freeing the designers

Virtuoso Design Intent facilitates and captures the communication between the schematic designer, who specifies the design goals, and the layout designer, who implements and realizes those goals. Schematic designers can focus purely on capturing their design intent, without worrying about whether it can be turned into physical constraints during the design process. This does not mean Design Intent is unrelated to Virtuoso Unified Custom Constraints; rather, layout designers are now free to decide how to implement the physical constraints that realize the design intent. More about Virtuoso Design Intent follows; later, we will describe how it complements the constraint flow.

The Design Intent flow

The Virtuoso Design Intent flow is built into Schematics XL and Layout XL and is designed around interaction between schematic designers and layout designers. The schematic designer selects an object, or a group of objects, to add design intent to (for example: device matching requirements, noisy/sensitive nets, high current, voltage drop, or pin information). They capture their design goals in the Create Design Intent form, combining free-text notes with predefined property profiles. These profiles contain specific properties for commonly used design intents that shape the design goal, such as adding shielding or adding guard rings. Each design intent created is stored in the schematic and displayed on the canvas with color-coded annotations for easy identification. The created design intent can be synchronized to Layout XL, and from then on, any change to a design intent is synchronized across the design and is visible in both Schematics XL and Layout XL. Layout designers can clearly identify the objects tagged with design intent and start implementing each intent. Using the Edit Design Intent form, they can update the current implementation stage, add implementation notes, or send questions back to the schematic designer. Through regular synchronization, schematic designers stay up to date on the implementation progress of each design intent in their design. By editing the design intent form, they can respond to any questions and comments recorded by the layout designer, adjust their design intent when needed, and finally sign off on the implementation of their design intent. A summary report can be generated from Schematics XL or Layout XL to check the implementation progress of all the design intents in the design at any time.

So what about the existing Virtuoso Unified Custom Constraints flow?

If you are familiar with Virtuoso Unified Custom Constraints, you will know that it helps you achieve correct-by-construction design in the Schematics XL and Layout XL applications. Each design constraint is a user-defined physical or electrical rule that helps you achieve your design goals. Virtuoso Design Intent complements the existing constraint flow by capturing the schematic designer's requirements at a higher level and handing them to the layout designer without overlapping roles. With Virtuoso Design Intent capturing the design goals, constraints can focus purely on defining the specific rules needed to satisfy and realize the designer's original intent.

Sounds good? It's even better in practice… Stay tuned for the upcoming ICADVM18.1 and IC6.1.8 releases of the Virtuoso platform.

Contact Us

For more information on Virtuoso ICADVM18.1, see What's New in Virtuoso. For Cadence custom IC/analog/RF design products and solutions, visit www.cadence.com. For the latest news about the Virtuoso design platform, or if you have any questions or feedback about the product features covered in this blog, contact team_virtuoso@cadence.com. To receive updates about the upcoming Virtuoso Advanced Nodes and Advanced Methodologies releases, type your email ID in the Subscriptions field at the top of the page and click SUBSCRIBE NOW.

Authors: Sarah Finlayson, Gautam Kumar, and Mark Baker

Breakfast Buffet for August

https://youtu.be/elQgyXvkcjU
The three highlighted posts for August were:
Two posts on Shockley Labs: The Birthplace of Silicon Valley: 391 South San Antonio Road and The Brief but Spectacular History of Shockley Labs
SEMICON 5nm: 7nm Is Just a Dress-Rehearsal
Google's Titan: How They Stop You Slipping a Bogus Server into Their Datacenter
Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

HOT CHIPS Tutorial: On-Device Inference

The Sunday of the annual HOT CHIPS conference (the 30th!) is tutorial day. In the morning, it was blockchain, which I missed due to other commitments. In the afternoon, it was deep learning. This was divided into three parts:
Overview of Deep Learning and Computer Architectures for Accelerating DNNs
Accelerating Inference at the Edge
Accelerating Training in the Cloud
I am going to focus on the on-device inference section, since that is most relevant to applications for the higher-end entries in Cadence's Tensilica processor portfolio. I'll also pull in some information from the cloud-based segment on benchmarks. This was presented by Song Han of the MIT Han's Lab. Han's Lab is not just using his name: the H stands for High-performance, high-energy-efficient Hardware; A is for Architectures and Accelerators for Artificial intelligence; N is for Novel algorithms for Neural Networks; and the S is for Small Models (for inference), Scalable Systems (for training), and Specialized Silicon. Obviously, one way you can do a better job of on-device inference is to build a special on-device inference engine. Indeed, during the two main days of HOT CHIPS, several were presented, from Arm, NVIDIA, Xilinx, and DeePhi…except that a few weeks ago Xilinx acquired DeePhi, so that one was Xilinx too. But there's more: all the server processor presentations had optimizations for neural network programming. Even the next-generation Intel processor, which is called Cascade Lake SP for now, has new extensions to the ISA, adding a couple of instructions specifically for evaluating neural nets faster than is possible with the regular instructions. But that is a topic for another day (or two). Training a neural network almost always takes place in the cloud, using 32-bit floating point. There is a lot of research showing that you need to keep that precision during training, even if you eventually plan to run a reduced model. If you reduce too soon, you risk getting stuck in local minima, or ending up with something that does not converge. Usually, when you see a graph showing a surface representing the space that the training algorithm is exploring, it is a nice smooth saddle where nothing can go wrong. But the picture below is actually more representative:

Deep Model Compression

I first saw Song Han speak at Cadence. See my post The Second Neural Network Symposium for more details. Back then, Song was still doing his PhD on compressing neural networks. Somewhat to everyone's surprise, it turns out that you can compress neural networks a lot more than anyone expected.

Pruning

The first optimization is pruning. The network as it comes out of training has a lot of connections, and many of them can be removed without any loss of accuracy. Once they are removed, the network can be retrained with the reduced connectivity, and the accuracy is regained by retraining and recalculating all the weights. The process of pruning and retraining can be iterated until no further reduction is possible without too much loss of accuracy. It turns out that the human brain does pruning too. A newborn has 50 trillion synapses; this grows with the brain until there are 1,000 trillion synapses by the time a baby is one year old. But that gets halved back down to 500 trillion synapses by the time that baby is an adolescent. Pruning a neural network this way has a similar effect, and sometimes the pruned and retrained network is not just smaller than the original but has increased accuracy too.
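To illustrate the idea (this is a toy sketch, not Song Han's actual code), one iteration of magnitude pruning can be written in a few lines of numpy:

```python
# Illustrative magnitude pruning: zero out the smallest-magnitude
# weights, and keep a mask so retraining cannot resurrect them.
import numpy as np

def prune(weights: np.ndarray, fraction: float):
    """Zero the `fraction` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights), fraction)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # a toy layer
w_pruned, mask = prune(w, fraction=0.9)             # drop 90% of weights
print(f"nonzero weights remaining: {mask.mean():.1%}")

# In the real flow you would now retrain: apply gradient updates, then
# multiply by `mask` after each step so pruned weights stay zero, and
# iterate prune/retrain until accuracy starts to suffer.
```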
Using this approach on AlexNet, the convolutional layers can be reduced by 3X, and the fully connected layers by 10X.

Sparsity

The next technique is sparsity. There is an obvious optimization when a zero weight is fed into a multiplier, since anything times zero is zero. Not only do we not need to feed the zero into the multiplier, we can hold the other input at its old value, saving both a memory access and the power from toggling the bus. After training, a lot of weights barely participate in the inference and are close to zero. By setting them exactly to zero, the matrix becomes sparse and all sorts of optimizations are possible. The sparsity can be unstructured, or it can be structured, as in the diagram below. By using sparsity, a network that looks like it can deliver, say, 1 TOPS can deliver 3 TOPS (if you count all the operations involving zero that were never actually executed).

Quantization

Quantization in this context means reducing the width of the weights from 32-bit floating point to 16 bits, 8 bits, or even lower. It seems surprising that you would not lose a lot of accuracy by doing this, but deep compression really works. Song did a lot of the research in this area as part of his doctoral thesis, and found "you could be significantly more aggressive than anyone thought possible."

Putting It All Together

If you do all of this, you get compression ratios as high as 50X. If that number is surprising, then more surprising still is that every one of the benchmarks Song talked about had increased accuracy with the compressed networks. Compression is not a compromise; there is clearly a reason mother nature prunes our brains too. But wait, there's more: the pruned models accelerate image classification and object detection. This is because the limit on speed (the so-called "roofline") is hitting the memory bandwidth limit, not hitting the computational limit. By reducing memory accesses, the computation units can be kept busier. This is almost independent of what engine is being used to perform the inference. There really does seem to be only upside to compressing the network: smaller, faster, more accurate.

Designing Hardware

Based on the presentations of specialized neural network processors over the following couple of days, I would say that the lesson everyone has taken away from the work of Song Han (and others) is:

Train in the cloud at full precision.
Compress the network using the techniques above.
Optimize the inference hardware for sparse matrices, avoiding representing zeros.
Optimize for MAC operations where one input is zero: suppress the operation, and the access to the non-zero operand.
Reduce the precision to 8 bits (or maybe 16 bits) and build lots of 8-bit MACs.
Don't use caches; you are just wasting area.
Be smart about ordering the operations so that values fetched from memory are reused as much as possible, rather than moving on and coming back to reload the same value later (of course, you can't avoid this completely, but you can be smart, or rather your compiler can be).

A toy software model of the zero-skipping, 8-bit MAC idea appears below.

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.
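As promised above, here is a toy software model of the zero-skipping, low-precision MAC loop: weights are crudely pruned, quantized to int8, and multiplies against a zero weight are skipped entirely, along with the fetch of the matching activation. This is a sketch under illustrative assumptions (activations are taken to be pre-quantized integers with unit scale), not any vendor's actual inference engine.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric linear quantization of float32 weights to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def sparse_int8_matvec(q_w, scale, x):
    """MAC loop that skips zero weights, modeling the hardware optimization."""
    out = np.zeros(q_w.shape[0], dtype=np.int32)
    macs = 0
    for i, row in enumerate(q_w):
        nz = np.nonzero(row)[0]          # only fetch/multiply non-zero weights
        out[i] = np.dot(row[nz].astype(np.int32), x[nz].astype(np.int32))
        macs += nz.size
    return out.astype(np.float32) * scale, macs

w = np.random.randn(64, 64).astype(np.float32)
w[np.abs(w) < 1.0] = 0.0                 # crude pruning: ~68% of weights zeroed
q_w, scale = quantize_int8(w)
x = np.random.randint(-127, 128, size=64)  # "activations", assumed pre-quantized
y, macs = sparse_int8_matvec(q_w, scale, x)
print(f"executed {macs} MACs instead of {w.size}")
```

The gap between `macs` and `w.size` is exactly the "free" throughput that sparsity-aware hardware claims when it counts the zero operations it never executed.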

Why Power Integrity (PI) Is a "Hot" Topic: How to Perform Electrical/Thermal Co-Simulation

When designing a next-generation product, we all chase the same goals: faster, smaller, cheaper. When those goals collide with demands for longer battery life and lower power consumption, the design challenge becomes formidable. The only certainty is that the project schedule will not slip just because we have challenges to overcome.

Every electronic product designer needs tools that can analyze the power delivery network. Components can tolerate some fluctuation on their power and ground paths, but that tolerance is limited. Board layers perforated until they resemble Swiss cheese, and traces and vias pushed through plane fill areas to free up signal routing space, only aggravate voltage fluctuations. Yet under "faster, smaller, cheaper" pressure, these are the expedients we reach for.

DC power analysis (also called IR drop analysis) is usually the first tool an electronic product designer turns to when facing these challenges. Analysis at a fixed temperature, however, has a common problem: where return current passes through perforated planes and choked (bottleneck) regions of a layer, the current density drives those areas to higher temperatures than the rest of the PCB. Analyzing IR drop at a fixed ambient temperature therefore produces inaccurate IR drop predictions. The solution is to use a dedicated tool (shown below) that runs the IR drop analysis and the thermal analysis together, predicting DC IR drop accurately from the actual operating temperature across the PCB. (A toy numeric illustration of this coupling appears at the end of this post.) Beyond electrical/thermal co-simulation, multi-board configurations can also be analyzed: for a product with an attached memory card, the complete system power delivery network can be analyzed as a whole.

Below is an 11-minute technical demonstration video using Sigrity PowerDC technology: https://youtu.be/1WJL3f--uGM

If you design PCBs or IC packages with Allegro tools, you can even launch the electrical/thermal co-simulation tool, Sigrity PowerDC, directly from within the design. Accurate IR drop analysis can run in batch mode, and links in the report file take designers straight to the parts of the design that are out of spec. You no longer need to shuttle between tools to finish a design, improving productivity while shortening the design cycle.

We look forward to your comments.

Original content; please credit the source when reposting: https://community.cadence.com

Further reading: Thermal Analysis of Package/PCB Systems: Challenges and Countermeasures; Beware of Heat! Thermal Model Exchange

Subscribe to the "PCB and IC Packaging: Design and Simulation Analysis" blog column, or follow the "Cadence PCB and Package Design" WeChat official account for more. Contact us: spb_china@cadence.com
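As mentioned above, here is a toy numeric illustration of why the coupling matters: copper resistance rises with temperature, so a trace that heats up drops more voltage than a fixed-temperature analysis predicts. This is a minimal sketch, not Sigrity PowerDC; the resistance, current, and thermal-resistance numbers are all illustrative assumptions.

```python
R25 = 0.010    # trace resistance at 25 C (ohms), assumed
ALPHA = 0.0039 # temperature coefficient of copper (1/K)
I = 10.0       # load current (A), assumed
THETA = 40.0   # trace-to-ambient thermal resistance (K/W), assumed
T_AMB = 25.0   # ambient temperature (C)

temp = T_AMB
for _ in range(50):                      # fixed-point electro-thermal iteration
    r = R25 * (1 + ALPHA * (temp - 25))  # resistance at current temperature
    p = I * I * r                        # Joule heating in the trace (W)
    new_temp = T_AMB + THETA * p         # resulting trace temperature
    if abs(new_temp - temp) < 1e-6:      # converged?
        break
    temp = new_temp

print(f"fixed-T IR drop     : {I * R25 * 1000:.1f} mV")
print(f"co-simulated IR drop: {I * r * 1000:.1f} mV at {temp:.1f} C")
```

With these numbers the trace settles around 72 C, and the co-simulated drop is roughly 18% higher than the fixed-temperature prediction, which is exactly the kind of error the coupled analysis is designed to eliminate.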

Top 10 Reasons to Upgrade to Allegro 17.2-2016, Reason 1: Advanced Flex and Rigid-Flex Design Support

Why rigid-flex?

In almost every application, customers keep asking for smaller, lighter products at a better price. Competitive pressure also pushes design engineers to bring these new products to market at an ever-increasing pace. Designers can use flexible PCB materials (flex and rigid-flex) to meet miniaturization requirements and to replace connectors, improving product performance.

What new technology supports rigid-flex design?

Manufacturers are ready for more demanding designs, such as mounting components on flexible substrates and supporting multilayer flex, to shrink size and improve high-speed performance.

Succeed the first time

To save time and cost, you must engage your fabricator early to build a shared understanding of the performance, material, and documentation expectations for your rigid-flex PCB. The design standard IPC-2223C, "Sectional Design Standard for Flexible Printed Boards," provides guidance on adhesive material selection and on placement relative to plated through-holes and vias. To raise the odds of first-pass success and reduce iterations, design engineers should apply additional design rules that ensure a correct-by-design handoff to manufacturing and final finish. These include inter-layer checks between the conductive and insulating layers of flex and rigid-flex designs. New tools now automate these checks so they can be adopted earlier in the process.

Why more rules? With flex comes greater responsibility...

As always, working closely with mechanical design (MCAD) during product structure development minimizes surprises. With the new rules supporting MCAD-ECAD co-design in place, the team can relax and enjoy flex and rigid-flex design (pun intended). Because reliability is critical, the design rules generally focus on the design's transition zones and the flexible substrate. They include: minimum bend radius; no vias in bend or transition zones; no component pads too close to a bend area; and finally, no stiffeners placed where they would affect the bend radius or sit too close to vias and pins.

How Allegro 17.2-2016 improves your first-pass success with flex/rigid-flex designs

We redesigned the Allegro stackup (cross-section) editor to accommodate the new rigid-flex capabilities, with different stackups for different technologies. You can now define a complete stackup containing both conductive and insulating layers, such as soldermask, coverlay, stiffeners, and adhesive. You can create, edit, and manage placement-and-routing zones and assign any stackup to any zone, including constraint regions and rooms (non-bend areas of the flex where vias are allowed). As the design evolves, you can move part of a rigid zone into a flex zone, and Allegro's dynamic zone-aware placement automatically transfers components to the internal database layers representing the flex outer layers. Previously this required workarounds involving padstack editing or embedded-component techniques.

A typical rigid-flex design creates a variety of masks, bend areas, and stiffeners that require special clearances or overlaps between materials. We therefore introduced a new inter-layer check spreadsheet, a configuration table of custom DRC rules, to make sure you meet rigid-flex requirements. (A toy sketch of such a rule table appears at the end of this post.) The spreadsheet gives a true picture of what is being built, letting design engineers run more precise DRC checks, get better feedback, and hand better data to CAD/CAM tools for fabrication. Because PCB designers deal with many different materials, layers can be combined and specific rules defined for them. The flow is simple: select two layers, define the DRC type and value, and assign a special DRC error code that is easy to recognize in the Allegro environment.

Conquer complex routing paths

Flex circuits often have complex routing paths that exploit the particular capabilities of this technology. With Allegro arc-aware routing, design engineers can easily route buses along complex board outlines while adjusting traces to match ever-changing requirements.

Finally, don't forget: Allegro fully supports IPC-2581

Rigid-flex designers have special needs when handing design data off to manufacturing. The stacked materials that make up the final product must be unambiguously defined, especially for impedance control or complex flex/rigid-flex construction. To communicate construction intent clearly, PCB designers can now exchange stackup data electronically using IPC-2581, an open, intelligent, neutral data-exchange format backed by PCB design and manufacturing companies worldwide. IPC-2581 Revision B supports bidirectional exchange of stackup data, eliminating problems that would otherwise surface late in the design-handoff cycle.

We welcome your comments. Have you used rigid-flex in your designs? Do you have tips to share? Contact us at PCB_marketing_China@cadence.com; thank you for your interest and feedback.

Related video, a short look at Allegro rigid-flex design technology: https://youtu.be/SjawMC_oIU4

Original content; please credit the source when reposting: https://community.cadence.com

Related article: Top 10 Reasons to Upgrade to Allegro 17.2-2016: Allegro Rigid-Flex Design Support

Subscribe to the "PCB and IC Packaging: Design and Simulation Analysis" blog column, or follow the "Cadence PCB and Package Design" WeChat official account for more. Contact us: spb_china@cadence.com
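As noted above, here is a toy sketch of what an inter-layer rule table boils down to: each rule names two layers, a check type, a limit, and an error code, and every shape pair on those layers is tested against it. The layer names, rule values, and rectangle-based geometry model are all illustrative assumptions, not Allegro's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x1: float; y1: float; x2: float; y2: float

def clearance(a: Rect, b: Rect) -> float:
    """Smallest axis-aligned gap between two rectangles (0 if they overlap)."""
    dx = max(a.x1 - b.x2, b.x1 - a.x2, 0.0)
    dy = max(a.y1 - b.y2, b.y1 - a.y2, 0.0)
    return (dx * dx + dy * dy) ** 0.5

# (layer_a, layer_b, check, min_value_mm, error_code) -- hypothetical rules
rules = [
    ("COVERLAY_TOP", "STIFFENER_TOP", "SPACING", 0.25, "RF-101"),
    ("BEND_ZONE",    "VIA",           "SPACING", 1.00, "RF-102"),  # no vias near bends
]

shapes = {
    "COVERLAY_TOP":  [Rect(0, 0, 10, 10)],
    "STIFFENER_TOP": [Rect(10.1, 0, 15, 5)],
    "BEND_ZONE":     [Rect(20, 0, 30, 10)],
    "VIA":           [Rect(20.3, 1, 20.6, 1.3)],
}

for la, lb, check, limit, code in rules:
    for sa in shapes[la]:
        for sb in shapes[lb]:
            if check == "SPACING" and clearance(sa, sb) < limit:
                print(f"{code}: {la} to {lb} spacing below {limit} mm")
```

Running this flags both hypothetical violations, including a via inside a bend zone, which mirrors the flex reliability rules discussed earlier.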

Top 10 Reasons to Upgrade to Allegro 17.2-2016, Reason 2: New Real-Time Concurrent Team Design

Team design: the fast track to a fast design

PCB team design is, by any measure, the best shortcut to a fast design. With the new real-time concurrent team design capability in the Allegro 17.2-2016 release, you can meet the challenge of ever-shrinking design schedules by allocating resources dynamically. The product gives design engineers a quick way to share a common Allegro database and design collaboratively. Whether the team is formally organized or assembled on short notice, designers simply share their current design and invite other designers to pitch in; you can also bring in specialists to accelerate the project. Instead of copying databases around and cutting and pasting design updates into a master database, and with no added design-management overhead, the new Allegro PCB team collaborative design option provides a real-time cooperative environment that uses a single database, shareable across the design team within minutes.

Solution 1: think "24 x 7"

The ideal solution for splitting work across multiple geographies is our original "asynchronous" Allegro PCB Team Design option, which supports partitioning the work and integrating the results back into a master database. A workflow manager reports the status of each partition and provides multi-directional communication to help designers work together. Team members can view each other's work, conveniently handle shared areas between partitions, and work outside their partition using soft boundaries. In addition to vertical partitions, horizontal partitions are supported, so partitions can be created based on board layers. An integration wizard automatically handles schematic netlist changes, automates netlist integration and import, and resolves constraint conflicts.

Solution 2: think "three-person team" (updated in 17.2-2016)

The ideal solution for adding more team members at the same site is our new "synchronous" Allegro PCB team collaborative design option, which supports real-time concurrent design. By connecting users to a shared Allegro PCB database, design engineers can modify the same design simultaneously, and any change from any team member is visible to all members in real time. The initial release focused on reducing PCB design time in the routing phase, which can consume up to 80% of the design time on dense, complex designs. We now also support component placement, including module reuse and placement replication; dynamic copper shapes and interactive routing; automated interactive routing features such as automatic differential-pair phase compensation, automatic bus length matching, visual timing review, and custom glossing; and even silkscreen adjustment and assembly-data preparation. Each user's edits create temporary color-coded locks: every client locks the objects it is working on, while everyone else operates on unlocked objects, keeping the team's work flowing smoothly. (A toy sketch of this locking model appears at the end of this post.)

Save time, finish early

Design engineers today spend most of their time in the routing phase, where automated interactive features can accelerate routing and tuning. With the new collaborative team design option in the Allegro 17.2-2016 release, design engineers can cooperate on the same design to a much greater degree to complete routing, shortening routing time on dense, complex designs by roughly 80%.

We welcome your comments! What problems have you encountered when multiple PCB design engineers collaborate on one project? Do you have experience to share? Contact us at PCB_marketing_China@cadence.com; thank you for your interest and feedback.

Related video: https://youtu.be/wIv8-qOTgDw

Original content; please credit the source when reposting: https://community.cadence.com

Related articles: Top 10 Reasons to Upgrade to Allegro 17.2-2016; Cadence Allegro Symphony Team Design; Cadence Allegro TimingVision Technology

Subscribe to the "PCB and IC Packaging: Design and Simulation Analysis" blog column, or follow the "Cadence PCB and Package Design" WeChat official account for more. Contact us: spb_china@cadence.com
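As noted above, here is a toy sketch of the per-object locking model described in this post: each client transparently locks the objects it edits, and edits to objects locked by someone else are rejected until the lock is released. This is an illustrative assumption about the concurrency scheme, not Cadence's implementation; the object and client names are hypothetical.

```python
class LockManager:
    def __init__(self):
        self.locks = {}                             # object_id -> client_id

    def try_edit(self, client: str, obj: str) -> bool:
        """Lock the object on first touch; reject edits if another client holds it."""
        owner = self.locks.setdefault(obj, client)
        return owner == client

    def release(self, client: str, obj: str):
        """Release a lock, but only if this client actually holds it."""
        if self.locks.get(obj) == client:
            del self.locks[obj]

mgr = LockManager()
print(mgr.try_edit("alice", "net_DDR_DQ3"))   # True: alice locks the net
print(mgr.try_edit("bob",   "net_DDR_DQ3"))   # False: locked by alice
print(mgr.try_edit("bob",   "net_DDR_DQ5"))   # True: unlocked object
mgr.release("alice", "net_DDR_DQ3")
print(mgr.try_edit("bob",   "net_DDR_DQ3"))   # True: lock was released
```

The point of the design is that locks are fine-grained and temporary, so two designers rarely block each other unless they touch exactly the same object.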