Two breakthrough developments in the system-on-chip (SoC) design ecosystem were announced in early 2015: the ARM® Cortex®-A72 processor and the Cadence® Innovus™ Implementation System. As described at the recent CDNLive Silicon Valley conference, ARM used an early version of Innovus software to design the Cortex-A72. This allowed ARM to optimize power, performance, and area (PPA) while Cadence fine-tuned the Innovus tool to produce high-quality ARM-based SoCs at advanced process nodes.

The Cortex-A72 is ARM’s highest performance and most advanced processor. This 64-bit processor boasts frequencies of up to 2.5GHz in a 16nm FinFET process technology, 3.5X the performance of 2014 devices based on the ARM Cortex-A15, and a 75% reduction in energy consumption when matching the performance of 2014 devices.

Innovus Implementation System is a massively parallel IC implementation toolset that provides a new placement engine, enhanced power optimization, 10-20% better PPA than previous solutions, and a 10X full-flow speedup.

The CDNLive presentation was titled “Maximizing PPA on ARM Cortex-A72 Using Latest Cadence Implementation and Signoff Tools/Flow.” It was given by Paddy Mamtora, product engineering group director at Cadence, and Brent McKanna, senior principal design engineer at ARM.

Mamtora kicked off the presentation by noting that the ARM-Cadence collaboration goes back seven years and includes many ARM CPUs and GPUs along with ARM (Artisan) physical IP. “ARM internally has been using Cadence for some time, and that helps us get ahead of the curve,” Mamtora said. “Typically, we work with ARM on a core a year and a half before it’s announced.
We started on the Cortex-A72 two years ago.”

A Change in Plans

McKanna noted that ARM worked with Cadence throughout the entire development flow for the Cortex-A72, and that Cadence tools “are now very well optimized for building ARM processors.” Cadence understands ARM designs and can help SoC designers get the best use out of them, he said.

Innovus Implementation System actually upended some plans that McKanna had developed, but he’s not complaining. Eighteen months ago, he said, runtimes for designing ARM processors were growing too long. It looked like ARM would need to resort to hierarchical flows and break CPU designs up into multiple blocks. “As the manager of the CPU implementation team, that meant that I needed more people,” he said. “All of a sudden I had to deal with constraint management and physical pin management. It wasn’t very exciting to me but it was the reality of where we were headed.”

“Fortunately, Innovus came along and threw those plans out the window,” McKanna said. “There really is no value to me in investing all the resources required to build a hierarchical flow when I can build flat in two days now.”

That’s a point McKanna made several times during the presentation: thanks to the Innovus system, an ARM Cortex-A72 CPU can be built from RTL to post-route in about two days. With the Cortex-A57 CPU, before the Innovus Implementation System was available, it took 10 days.

McKanna provided a brief overview of the Cortex-A72. It offers higher performance and lower power than the ARM Cortex-A57, and will make its debut in next year’s highest performing smartphones.
Here are some of the more significant Cortex-A72 features:

- 3.5X performance of ARM Cortex-A15 processor in smartphone power envelope
- 75% less energy for the same workloads, enabling slimmer and cooler devices
- ARMv8-A for 64-bit performance and 32-bit app backward compatibility
- “Most reliable path” for migration to 16nm FinFET
- 2X power efficiency over 28nm implementations
- Enables ARM Cortex-A72 mobile implementations up to 2.5 GHz

Physical Implementation Challenges

McKanna noted that the Cortex-A72 has high target frequencies, resulting in high wire-to-device delay ratios. To achieve the best balance for power efficiency, ARM chose ultra-low threshold voltage (uLVT) C20 devices for a baseline. “Certainly you could go to a higher frequency with uLVT C16, and that’s what some of our partners will do,” he said. “We limit ourselves to the C20 to give our partners a reasonable starting point.”

The Cortex-A72 design flow uses multi-corner optimization. It also supports advanced on-chip variation (AOCV) analysis, which McKanna noted is required for 16nm. The flow applies both clock and data derating, and the data derates are brought in during pre-clock tree synthesis (CTS) optimization.

The ARM and Cadence teams had some challenges that customers won’t have. When the work started, 16nm FinFET was a relatively new process node, and some workarounds were required for routability issues. Those workarounds are no longer needed, McKanna said. Further, he observed, “we didn’t have the benefits of stable RTL. We were always running new versions of RTL, which means new floorplans, and new critical paths, every week.”

What the ARM-Cadence collaboration ended up building was an MP4 (quad-processor) physical implementation with a 2MByte L2 cache. The process was 16nm FinFET. The Cortex-A72 implementation had an 11-layer metal stack.
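To make the clock and data derating described above more concrete, here is a minimal Python sketch of how depth-based AOCV derates might enter a setup check. It is an illustration under stated assumptions, not the ARM or Cadence flow. The derate tables, the function names, and the simplification of one derate per path (chosen by the number of stages) are all hypothetical, and real AOCV tables can also depend on distance. The only general idea it encodes is that deeper paths receive a milder derate because random variation tends to average out across stages.

```python
from bisect import bisect_right

# Hypothetical depth-indexed derate tables (illustrative values only, not taken
# from any real 16nm library). Each entry is (minimum path depth, derate).
LATE_DERATE_BY_DEPTH = [(1, 1.12), (2, 1.10), (4, 1.08), (8, 1.06), (16, 1.05)]
EARLY_DERATE_BY_DEPTH = [(1, 0.88), (2, 0.90), (4, 0.92), (8, 0.94), (16, 0.95)]


def lookup_derate(table, depth):
    """Return the derate of the largest table depth that is <= the path depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    depths = [d for d, _ in table]
    index = bisect_right(depths, depth) - 1
    return table[index][1]


def derated_delay(stage_delays_ps, table):
    """Scale the summed stage delays by the derate for the path's depth."""
    derate = lookup_derate(table, len(stage_delays_ps))
    return sum(stage_delays_ps) * derate


def setup_slack_ps(period_ps, launch_clock, data, capture_clock, setup_ps):
    """Setup slack in picoseconds.

    Late derates slow the launch clock and data paths; early derates
    speed up the capture clock path, which is the pessimistic direction
    for a setup check.
    """
    arrival = (derated_delay(launch_clock, LATE_DERATE_BY_DEPTH)
               + derated_delay(data, LATE_DERATE_BY_DEPTH))
    required = (period_ps
                + derated_delay(capture_clock, EARLY_DERATE_BY_DEPTH)
                - setup_ps)
    return required - arrival


if __name__ == "__main__":
    # A 2.5 GHz clock has a 400 ps period; the stage delays are made up.
    slack = setup_slack_ps(
        period_ps=400.0,
        launch_clock=[20.0, 20.0],
        data=[40.0] * 8,
        capture_clock=[20.0, 20.0],
        setup_ps=10.0,
    )
    print(f"setup slack: {slack:.1f} ps")
```

Bringing the data derates in before clock tree synthesis, as the flow described here does, means early optimization already sees the derated data-path delay rather than discovering the shortfall after the clock tree is built.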
Cadence tools used included Encounter RTL Compiler, Innovus Implementation System, Tempus Timing Signoff Solution, Encounter Conformal, and Quantus QRC Extraction Solution. (ARM does not actually tape out chips, so there was no design rule checking.)

The basic Innovus flow is depicted below. McKanna noted that ARM has one designer who owns the flow from start to finish. “We’ll have multiple designers doing the design, but we don’t break it up into a front end and back end,” he said.

McKanna also described the Tempus timing analysis flow in some detail, noting that power recovery ECOs generated by the Tempus solution can provide a 40% static power reduction and a 5% dynamic power reduction.

Key Technologies in Innovus Implementation System

Mamtora discussed two differentiating technologies in the Innovus Implementation System: the GigaPlace placement engine and the enhanced GigaOpt power-driven optimization capability. The overall Innovus benefit, he said, is a significant turnaround time advantage and 10-20% better PPA. “I am pretty confident that on any ARM core, we can get you the best PPA in the industry,” he said.

GigaPlace is a new placement engine. It provides slack-driven placement and can optimize timing concurrently with routing congestion. It takes only one command to do the placement and optimization. GigaOpt, while previously announced, can now treat power as a native function and can optimize both static and dynamic power.

McKanna’s parting comment: “Between ARM and Cadence working together, we have the best knowledge and the best understanding of how to build the Cortex-A72, Cortex-A57, and some of the other processors that are going on within ARM. With the Cortex-A72 you can get a reference flow that you can use as a starting point, and you can work with Cadence to customize it to your needs.”

More information about the Innovus Implementation System can be found at this landing page.
Richard Goering

Related Blog Posts

- Anirudh Devgan Q&A: What’s Lacking and What’s Needed in Digital IC Implementation
- Anirudh Devgan at CDNLive 2015 – How Innovus Will Change IC Implementation
- CDNLive Silicon Valley 2015: A “New Era” in Digital Implementation