Virtuoso: The Next Overture - Congestion Analysis with a New Perspective

The new release of the Virtuoso platform (ICADVM18.1) offers groundbreaking analysis capabilities and an innovative new simulation-driven layout for more robust and efficient design implementation, as well as extending our support for the most advanced process technologies. With this solution, we are able to significantly improve productivity through advanced methodologies and provide the most comprehensive set of solutions in the industry, with an interoperable flow across chip, package, module, and board.

With each new technology node, routing becomes even more challenging. The number of design rules has increased significantly, as has the number of metal layers. Designs are becoming far more complex and require advanced constraints to drive the implementation of analog and mixed-signal flows. To overcome some of these challenges, designers require quick and accurate modeling of congestion for floorplanning, pin optimization, and die size reduction. In this blog, we introduce you to the new Congestion Analysis assistant and how it helps users to visualize, analyze, and plan nets in the design.

Introducing the Congestion Analysis Assistant
The new Congestion Analysis assistant lets users extract, display, and analyze routing congestion both visually and statistically. Furthermore, the assistant gives the designer tools to optimize routing paths for critical nets and net groups. Here are a few reasons why you should try out the new Congestion Analysis assistant.

Easy to Visualize Routing Congestion: Heatmap, Histogram, Statistics
The Heatmap graphically displays congestion hot spots in the layout window. This form of display is not new in EDA. However, in large or complex designs, a Heatmap by itself might hide important information. To complement the Heatmap, we are introducing the first ever "Congestion Histogram". It shows congestion in a novel yet easy to understand display that can be customized and filtered. In addition, we have made actual routing statistics available to aid you in the interpretation of the congestion results.

Easy to Analyze Routing Congestion: Congestion-Based Filterable Heatmap, Customizable Histogram, Net Probing and Path Display
With large, complex designs, it can be hard to extract important information quickly and efficiently. For this reason, we have incorporated many ways to quickly dive down and focus on congested areas of interest. We've made it possible for users to filter congestion data in various ways, customize the Histogram's "congestion buckets", and display the reduced data set onto the Heatmap. You can cross probe critical nets in the Navigator Assistant and see them displayed over the Heatmap.

Easy to Plan and Optimize: Global Bias Constraints, Automated Pin Optimization, Integrated Floorplanning with Design Planner
It is good to find out that a design has over-congested areas. However, how can you resolve it and trust that the design will converge? For this, we are introducing a new method to graphically plan nets and net groups in the design. This is a unique route planning feature that relies on a new type of constraint called "Global Bias". The Global Bias constraints allow the user to set preferred routing paths and areas in the design for routing a specified net or net group – think of it like planning a driving route on your favorite map application.
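To make the "congestion bucket" idea concrete, here is a minimal, hypothetical Python sketch of the general concept: per-region congestion computed as routing demand over capacity, then grouped into customizable buckets for a histogram. The function and threshold values are illustrative assumptions, not Virtuoso's API or the assistant's actual algorithm.

```python
from collections import Counter

def congestion_histogram(regions, buckets=(0.5, 0.7, 0.9, 1.0)):
    """Bucket per-region routing congestion (demand / capacity).

    regions: iterable of (demand, capacity) pairs, one per routing
             grid cell; capacity is the number of available tracks,
             demand the number of nets that want to use them.
    buckets: upper bounds of the congestion buckets; anything above
             the last bound counts as overflow (>100% utilization).
    """
    counts = Counter()
    for demand, capacity in regions:
        ratio = demand / capacity if capacity else float("inf")
        # Find the first bucket whose upper bound holds this ratio.
        for bound in buckets:
            if ratio <= bound:
                counts[f"<= {bound:.0%}"] += 1
                break
        else:
            counts["> 100% (overflow)"] += 1
    return counts

# Example: four grid cells, one of them over capacity.
print(congestion_histogram([(3, 10), (6, 10), (9, 10), (12, 10)]))
```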
Using this exclusive set of routing features along with the newly introduced Design Planner in ICADVM 18.1, users can now experiment with multiple floorplans and see how different placement strategies impact routing convergence. This can enable users to drive a tighter design and a smaller die size. It also gives the confidence that the design will converge while considering all routing constraints and requirements. Keep calm and happy routing! Watch out for our upcoming Virtuoso platform ICADVM 18.1 release, and then take a fresh new look at the new Congestion Analysis assistant.

Related Resources
Congestion Analysis and Global Biasing
Virtuoso: The Next Overture - Introducing Design Planner
For more information on Cadence circuit design products and services, visit www.cadence.com.

Contact Us
For more information on the new Virtuoso design platform, or if you have any questions or feedback on the features covered in this blog, please contact team_virtuoso@cadence.com. To receive similar updates about new and exciting capabilities being built into Virtuoso for our upcoming Advanced Nodes and Advanced Methodologies releases, type your email ID in the Subscriptions field at the top of the page and click SUBSCRIBE NOW.

Parul Agarwal, Michael Hunter, and Mark Rossman (Team Virtuoso)

Come Join Us for "Deep Dive into the UVM Register Layer" - A Webinar From Doulos

Join us on September 14th for a free one-hour webinar on the finer aspects of the UVM register layer. We'll be focusing on key aspects of the UVM Register Layer that can help you with your UVM modeling in ways you may not be aware of. We'll be covering the following topics:
How to use user-defined front doors and back doors to expand what the register layer can do
Understanding the role played by the predictor, and how to use it with the aforementioned user-defined front doors
Using register callbacks to help model quirky register behaviors, alongside the side-effects of register read/writes
What changes you can or can't make to UVM code while preserving the random stimulus generation
Combined, the information covered in these topics can make you a better user of the UVM register layer. Code examples shown during the webinar can all be run with our Xcelium Parallel Simulator. Come join in! For more information on this webinar, and for available times on September 14th, check out the link here.
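The register-callback idea in the third topic is not specific to UVM. As a rough, language-neutral sketch (in Python rather than SystemVerilog, and not the code that will be shown in the webinar), a "quirky" clear-on-read status register can be modeled by hanging a callback on the read path:

```python
class Register:
    """A tiny register model with post-read callbacks, loosely analogous
    to the UVM register layer's callback hooks (illustrative only)."""

    def __init__(self, value=0):
        self.value = value
        self.post_read_callbacks = []

    def add_post_read(self, callback):
        self.post_read_callbacks.append(callback)

    def read(self):
        data = self.value
        for cb in self.post_read_callbacks:
            cb(self)          # callbacks may model read side effects
        return data

    def write(self, data):
        self.value = data

# Quirky behavior: a status register whose bits clear when read.
status = Register(0b1010)
status.add_post_read(lambda reg: reg.write(0))

print(bin(status.read()))  # 0b1010 - value sampled before the side effect
print(bin(status.read()))  # 0b0    - cleared by the earlier read
```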

What's For Breakfast? Video Preview September 17th to 21st 2018

https://youtu.be/3drxzhMFGD8 Coming from PCB West (camera Sean) Monday: HOT CHIPS: Some HOT Deep Learning Processors Tuesday: Intel Cascade Lake Wednesday: Embargoed Announcement from Tegernsee Thursday: Samsung Galaxy S9 AP Friday: Jaswinder's Only Job Interview www.breakfastbytes.com Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Spectre/Meltdown & What It Means for Future Design 3

I gave an introduction to speculative execution and the vulnerabilities that have come to light this year in my post Spectre/Meltdown & What It Means for Future Design 1. Yesterday, I covered the first half of the keynote (John Hennessy, Paul Turner, and Jon Masters) in Spectre/Meltdown & What It Means for Future Design 2. Today, I wrap up with Mark Hill and the panel discussion that followed.

Mark Hill: Exploiting Modern μArchitectures: Hardware Implications
Last of the panelists was Mark Hill. He started with a bit of history:
Architecture 0.0 (pre-1964): each computer implementation was new, requiring all software to be rewritten (in assembly language, typically).
Architecture 1.0 (1964 on): the timing-independent functional behavior of a computer was captured in an ISA (which would be implemented by more than one design, such as the pioneering IBM/360 series), as it is in all microprocessors today.
Architecture 2.0: what we need next.
The flaws in implementation that Spectre and Meltdown have revealed are not bugs, in the sense that all the affected processors are faithfully implementing their ISA correctly. The flaw is in the 50-year-old timing-independent definition of Architecture 1.0. Since leaking protected information can't really be "correct", we need to do two things. First, manage micro-architecture problems like we manage crime, not completely fixing them, which would be too expensive. Second, we need to define Architecture 2.0 and change the way we do things. Some things to consider at the micro-architectural level:
Isolate branch predictors, BTBs, and TLBs per process and context switch them. Currently, weird as it seems, branch predictors are shared between all processes, meaning that sometimes the predictor gets the guess wrong due to a different process, which is a trivial problem, but also that one process can train the branch predictor to affect another one, which has turned out to be bad.
Partition caches among trusted processes (and flush on context switch?)
Reduce aliasing, such as with fully-associative caches (use all the bits)
Hardware protection within a single user address space, such as one browser tab treating another as an enemy
Undo some speculation where it has minimal performance impact
Is there a "happy knee" where we get good performance and good safety? Mark fears that there is not. There is a potential to bifurcate, and have cores (or modes) that are fast(er) or safe(r), where some speculation is disabled. This is an extension of what is being done for security, where hardware "enclaves" hold the keys, and perhaps the encryption algorithm implementation. This also plays well with dark silicon, where there is no point in just adding more and more identical cores if we can't turn them all on at once. But, as Mark pointed out, this is all very esoteric: I'd be just happy if I could stop my Dad executing downloaded code!
Mark's big point is that we need Architecture 2.0 since Architecture 1.0 is now known to be inadequate to protect information. We need to augment Architecture 1.0 with:
(Abstraction of) time-visible micro-architecture
Bandwidth of known timing channels
Enforced limits on user software behavior
But he admits that none of this seems good enough yet. Another fact of life is the growing use of specialized accelerators such as GPUs, DSPs, neural net processors, and FPGAs. This can actually reduce the need for speculation since the "main" processor is increasingly just doing housekeeping and not running the CPU-intensive algorithms.
However, they have timing channels that may be exploitable too. Nobody seems to have looked too hard yet. Security experts disdain "security by obscurity" in favor of many eyeballs on the code. Only the keys are kept secret. Open source software helps, but even lots of eyeballs on a bad implementation doesn't stop it being bad. Open source hardware is only really getting started, with RISC-V being the most well-known open-source hardware-like thing (it is an ISA, not a hardware implementation, so Architecture 1.0). But as John Hennessy's co-Turing-award honoree, Dave Patterson, said: Most future hardware security ideas will be tried with RISC-V first.

Discussion
(Note: John is John Hennessy. Jon is Jon Masters.)
Question: Who should bear the cost? Today, Intel, Red Hat and Google are paying.
John: Welcome to an industry where the warranty says nothing is guaranteed to work. We have to change how the industry works. As a community, we have opted for functionality over other properties that might be more important. Bill Gates complained to me back in the days when Word still had some competition that people would make checklists and the users would buy the one with the most checks, not the one that worked best. With a processor the first checkbox is how fast it is, not how secure it is. Until a year ago, nobody would have said that they would trade more security for less performance. To be fair, we never asked that question until now.
Mark: We are talking about how to get hardware and software to work in concert, and that will take the next 24 months.
Jon: I'm worried about fatigue. If we get 10 of these per week, we will need to decide which ones to fix, and people will get burned out.
John: This is important, but users accept much greater security issues. People don't create long passwords, different on every system, and change them every month.
Mark: Open source hardware is not a full solution. It is a way to try out security ideas and get more eyeballs on them.
Paul: There is so much value-add in the fabrication that it is always going to be secret sauce. It is worth too much money. But it is important to have a spec. The specs today don't address any of this.
John: It is good to have an open implementation. In theory you could have an open implementation of an existing ISA, but I don't see that happening for obvious reasons. But with RISC-V people can try things out. You can have a class and get people to implement Meltdown as a teaching tool.
Paul: We need greater isolation but it's a heavy hammer. We need a way to map abstractions at the high level down to abstractions at the low level. Mark the code that is in the sandbox separately from the code running the sandbox.
Question: Better late than never for the era of security. ISA 2.0 first principle could be simple: no access without authorization. It is a challenge for us educators to look at non-quantitative aspects like security.
John: For sure we need to do a better job, but this is not easy. I'm sure a number of you have worked on cache-coherence protocols, and that is really hard. Now think about verifying that you never leak information from a hardware structure. It will require a new set of tools.
Mark: I applaud the idea of a simple principle, but that is just what the original architects thought they were doing.
John: Don't dwell too much on caches as side-channels. There are tons of others.
Question: You guys talked about public clouds and paying extra for exclusivity?
Paul: Browsers and cloud providers and operating systems are going to have to find better ways to create more separation.
John: The tree has fallen in the forest and anyone can read it. Isn't the problem that we are not broadcasting who can reference the information?
Jon: The problem is that modern computers share a lot of stuff: the cache, the branch predictors, and so on. These share across boundaries.
Paul: As the user I can control how branches are taken, by training the predictor, but it is impossible for the hardware to know if it was tricked. All it knows is it went down a bad branch.
Question: What about accelerators? Today this has all been about the CPU.
John: Currently accelerators are single-user mode and we currently clear all the state, so that reduces the surface for attack and the rate at which you suck data out. But if these become more pervasive, we'll have to work out how to make them shared, and we'll be back to the problem of having boundaries.
Jon: We don't want to build Spectre accelerators, using FPGAs in the cloud to leak more data faster!
Paul: Northbridge is no longer a separate chip, and so more and more comes under the title of "the CPU".
John: Randomizing page placement, randomizing lots of other stuff, will reduce the bandwidth, but not to zero. But it's like crime, a temporary fix for now, but not really managing the problem.
On that happy note, the session wrapped up. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Virtuoso: The Next Overture - A Prelude to the Cadence Virtuoso "Symphony No. 18.1"

As the brand-new symphony opens, four beautiful movements take the stage:
Hierarchical schematic-driven design: combines the strengths of top-down and bottom-up design methodologies, avoiding the drawbacks of using either approach on its own.
Hierarchy visualization: users can easily show or hide design detail whenever they need to during the design process, so they see only what they need.
Hierarchy- and congestion-aware placement: provides automated and assisted productivity.
Hierarchical routing and congestion analysis: provides realistic routing and congestion information early.
Sounds good? Stay tuned for the beautiful melodies of the Cadence Virtuoso "Symphony No. 18.1"! Until then, keep an ear out for Tchaikovsky's Symphony 18.12. For more news about the Virtuoso design platform, please contact team_virtuoso@cadence.com . If you would like to receive the latest news about the upcoming Virtuoso Advanced Nodes and Advanced Methodologies releases, type your email ID in the Subscriptions field at the top of the page and click SUBSCRIBE NOW. Author: Rishu Misri Jaggi

Intel's Cascade Lake: Deep Learning, Spectre/Meltdown, Storage Class Memory

At the recent HOT CHIPS in Cupertino, Sujal Vora of Intel gave a look inside the Future Intel Xeon Scalable Processor (Codename: Cascade Lake-SP). To make sure we all stayed to the end, this was the last presentation in the conference. The three focus areas for this processor, in addition to various enhancements, are:
Special instructions for deep learning inference
Support for storage class memory (Optane in Intel-speak, 3DXpoint in everyone-else-speak)
Side channel mitigations (Spectre and Meltdown)

Tick-Tock
The above table shows where Cascade Lake fits in. Haswell was a new microarchitecture (tock) in 22nm, the first FinFET process, back when Intel was still calling it Trigate. This then became Broadwell when it was moved to 14nm (tick). Skylake was the next new microarchitecture, still in 14nm (tock). Normally, you would expect this microarchitecture to be moved to 10nm (tick), but Intel's 10nm is late and Cascade Lake is also in 14nm (although I'm assuming in what Intel has called 14nm++ in other contexts; Sujal just said "process tuning"). Apart from the three focus areas, the core is very similar to the first-generation Xeon Scalable Platform, with the same core count, cache size, and I/O speeds.

Neural Networks
New is VNNI, the Vector Neural Network Instructions. There is a new 8-bit instruction, VPDPBUSD, that fuses 3 instructions in the inner convolutional loop using 8-bit data and operates on all 3 in parallel (a small numeric sketch of what one lane of this computes appears at the end of this post). A similar instruction, VPDPWSSD, fuses two instructions working on 16-bit data, and does two operations in parallel. The above table shows the number of elements processed per cycle. There is no change for 32-bit floating point, but 16-bit operations go from 64 to 128 elements per cycle, and 8-bit from 85.33 (no, I don't know where that comes from either) to 256 elements per cycle.

Storage Class Memory
3DXpoint memory, which is close to DRAM speed and close to NAND capacity, is the first so-called storage class memory. So far it is the only one, although see my post Carbon Nanotube Memory: Too Good to Be True? about Nantero's technology, also presented at HOT CHIPS. 3DXpoint was co-developed by Intel and Micron, and the Intel version goes under the name Optane. The biggest challenge in the server memory hierarchy is the huge gap in performance between DRAM and SSD. DRAM is too expensive for truly huge data-intensive applications, but SSD performance is too slow. Between the two technologies, SSD latency is about 1000x that of DRAM, and the bandwidth is just 1/10 as big. On the other hand, DRAM is about 40 times more expensive. You can have fast, or you can have cheap, but not both. Until now, maybe. Storage class memory slips into the gap: big, affordable, persistent, DDR4 compatible.

However, to take advantage of storage class memory requires changes to the processor and to the operating system. With DRAM, if the power fails, the data will be lost, so there is not a lot of point in the processor taking a lot of care to note whether a write to memory completed or not before the power went out. That all changes with storage class memory. It is critical to know which writes completed, because when the system restarts, those values will be in memory. The writes that didn't complete will be lost along with the rest of the processor state. There are also changes required to the operating system and to other persistence-aware applications. When they restart, they can access data in memory without having to re-read it from disk.
Because of this capability, less needs to be written out to disk in the first place, since it can be recovered from the storage class memory after a reboot. Adding a layer of fast persistent memory to the hierarchy changes a lot. However, care needs to be taken to manage what is in cache, and what is in the persistence domain, which requires some special instructions to flush data and wait for writes to complete.

Defense Against Side-Channel Attacks
The last focus area is hardware mitigation for side-channel attacks. Intel seems to go out of its way to avoid mentioning the words Spectre and Meltdown, but that is what this is all about. The basic message is that Cascade Lake should provide higher-performance mitigations than software-only mitigations. The table above is pretty much all the detail given. In particular, "mitigation" can mean anything from "that approach is completely shut down" to "we catch the easy cases but a determined adversary can still get through."

Summary
The last slide of HOT CHIPS summarized Cascade Lake. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.
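As promised in the VNNI section above, here is a small Python sketch of what a single 32-bit lane of VPDPBUSD computes: four unsigned 8-bit values multiplied by four signed 8-bit values, summed, and added into the 32-bit accumulator in one step. This is a numeric illustration based on the instruction's documented behavior, not Intel code, and the operand values are made up.

```python
def vpdpbusd_lane(acc, a_bytes, b_bytes):
    """One 32-bit lane of AVX-512 VNNI VPDPBUSD (sketch).

    a_bytes: four unsigned 8-bit values (0..255)
    b_bytes: four signed 8-bit values (-128..127)
    The four 8x8-bit products are summed and added to the 32-bit
    accumulator, replacing a separate multiply / widen / add sequence.
    """
    assert len(a_bytes) == len(b_bytes) == 4
    dot = sum(a * b for a, b in zip(a_bytes, b_bytes))
    return (acc + dot) & 0xFFFFFFFF  # wrap like a 32-bit register

# Accumulate one group of four 8-bit activation/weight pairs.
# 10*1 - 20*2 + 30*3 - 40*4 = -100, represented modulo 2^32.
print(vpdpbusd_lane(0, [10, 20, 30, 40], [1, -2, 3, -4]))
```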

A New Tool for 3D Modeling and Electromagnetic Field Analysis: 3D Workbench

The newly released Cadence Sigrity 2018 version introduces a brand-new 3D modeling and electromagnetic field simulation tool: 3D Workbench. It offers a user interface (GUI) like that of today's mainstream 3D CAD products, and it uses the industry-proven PowerSI 3D-EM simulation engine together with a varied and efficient set of meshing options.

It bridges the gap between the mechanical and electrical simulation domains. By supporting the import of mechanical designs, 3D Workbench can merge mechanical parts (such as connectors, sockets, and interfaces) with PCBs and IC packages in a single simulation model, so the entire electrical connection system can be considered as a whole. It accurately analyzes how connection points and contacts, which other simulation tools ignore, affect the transmission characteristics of the whole signal path, so product development teams can now perform fast, accurate analysis of signals that cross multiple boards. As Cadence's first 3D electromagnetic tool with 3D CAD support, 3D Workbench exchanges data and integrates seamlessly with other Cadence simulation and design platforms such as Sigrity and Allegro. Compared with other 3D electromagnetic simulation tools on the market that rely on third-party data and model conversion, 3D Workbench offers a more efficient, less error-prone solution, greatly shortening the design-simulation cycle while reducing the risk of design mistakes.

Key features of 3D Workbench

1. A friendly modeling interface and convenient model import
3D Workbench has a modeling interface comparable to today's mainstream 3D CAD tools, with a 3D model view (3D coordinate axes and a modeling grid), a project tree, geometry-creation tools, and Boolean operation tools. As shown in the figure, the interface consists of the menu bar and toolbars at the top, a Tcl command window and message output window at the bottom, the "project tree" and "entity properties" panels on the left, and the 3D model view occupying the main area. Users can create models, or modify imported ones, in the 3D model view, and use the many convenient utilities provided by 3D Workbench to complete simulation model definition tasks such as material editing, boundary settings, and port settings. As a new member of the Cadence Sigrity family, 3D Workbench supports importing PCB/package design files from other Cadence tools (such as brd and sip), IC design files (such as gds), and model files (such as spd and psix). It also supports importing design files from third parties (such as PADS and Zuken). More importantly, 3D Workbench supports importing mechanical design files in sat, stp, and step formats, enabling cross-domain model construction and simulation.

2. Rich meshing options
3D Workbench provides two main classes of meshing algorithms, Sigma Mesh and D Mesh, which use tetrahedral mesh elements to solve the electromagnetic problem defined in the solution region. After the initial mesh is generated and solved, 3D Workbench refines the mesh in regions that do not meet the error targets, according to the preset convergence criteria, until the results satisfy those criteria. In addition to the two main algorithm classes, 3D Workbench offers three meshing sub-strategies, "Global", "Local Map", and "Coarse", along with sub-options such as "Signal net max edge length" and "Mesh Seeding", covering a wide range of electromagnetic simulation scenarios. Compared with other third-party 3D electromagnetic simulation software, the meshing algorithms in 3D Workbench are clear and transparent: users can set the coarseness of the initial mesh, down to the mesh size of a specific net or a specific solid. Different algorithms can be chosen for different design needs:
For a quick, coarse estimate of the whole solution region, choose the "Coarse" sub-option of the Sigma Mesh or D Mesh algorithm.
For detailed analysis of the signal nets in the solution region, choose the "Local Map" sub-option of the Sigma Mesh or D Mesh algorithm and set "Signal net max edge length" appropriately based on the signal trace dimensions, the solution wavelength, and similar parameters; the signal nets are then initially meshed with that value as the maximum edge length.
For detailed analysis of one or a few particular signal traces, choose the "Local Map" sub-option of the D Mesh algorithm and set the surface mesh Seeding Value of those traces appropriately based on their dimensions and the solution wavelength; those traces are then initially meshed with that value as the maximum edge length.
For accurate analysis of the whole solution region, choose the "Global" sub-option of the D Mesh algorithm; the initial mesh is then based on the smallest structure in the solution region combined with the solution wavelength, trading some compute resources and run time for simulation accuracy across the whole model.
A simple comparison of the different simulation scenarios lets users choose the most suitable approach for their needs.

3. Parameter sweep simulation and analysis
Through the built-in Sweeping Analyzer utility, users can sweep specific model parameters (including dimensional and positional parameters) over a given range. First, build a parameterized model of the structure to be swept and add a "Sweep" tag. Next, set up the sweep plan; sweeps can be defined over a single parameter, multiple parameters, or combinations of parameters. Then choose Start Sweeping Simulation: the sub-cases (Cases) in the sweep plan are generated in turn and automatically queued for simulation. Finally, once all sub-cases have finished, select "Result Summary" to view the S-parameter results of all swept cases at once in BNPViewer, which makes it easy to screen and compare results.

4. Full scripting (Tcl) recording and replay
Every operation performed in the 3D Workbench GUI can be recorded as a Tcl script, and that script can be edited by the user and replayed in 3D Workbench. Full scripting support makes building and rebuilding complex models much more convenient.

3D Workbench example
Here is the basic 3D Workbench workflow. This example covers the combined import and merging of a PCB design file and a mechanical design file, and shows the basic steps of model correction, material setup, port and boundary setup, and simulation setup.
1. Start 3D Workbench, create a new Project, and import the PCB model (.spd file) via Import.
2. Create a new "user coordinate system" for the SMA mechanical design model.
3. Import the SMA mechanical design model via Import.
4. Adjust the "user coordinate system" parameters so that the PCB model and the SMA model are physically "aligned" and "joined" at the desired points.
5. Set the "properties" of the solids in the SMA model (material, net name, net type, and so on).
6. Add ports on the PCB side and on the SMA side.
7. Set the simulation boundary conditions (by adjusting the boundary conditions that originally belonged to the PCB model).
8. Set the simulation options (frequency settings, solver settings, mesh settings, and so on).
9. Start the simulation and review the results when it completes.
(This post only shows the basic and most important setup steps in 3D Workbench. For the detailed usage flow and methods, please refer to the 3D Workbench User Guide and the related Tutorial documents, which can be found after installation under Help > Document in the menu bar or in the doc folder of the installation directory, or obtained by emailing PCB_marketing_China@cadence.com.)
* Original content; please credit the source when republishing: https://community.cadence.com
To learn more about how Cadence Sigrity PowerSI 3D EM technology can help you solve problems in IC package and PCB design, please visit: Sigrity PowerSI 3D EM Extraction Option
You are welcome to subscribe to the "PCB and IC Packaging: Design and Simulation Analysis" blog column, or scan the QR code to follow the "Cadence PCB and Package Design" WeChat official account for more content. Contact us: spb_china@cadence.com

CDNLive India: Asynchronous Design

Every few years the idea of doing completely clockless design gets proposed again. This is also known as locally asynchronous design (no clocks at all), as opposed to simply having lots of clock domains and having asynchronous communication from one domain to the next. There are all sorts of issues with this:
Engineers learn synchronous design in school and never work on anything else, so they are unfamiliar with the basic ideas.
Even in modern designs when asynchronous events happen, at the edge of clock-domain boundaries, it tends to be a major area of error, especially before clock-domain-crossing (CDC) tools existed.
The modern digital design flow is based around clocks, such as static timing analysis.
Even if you can get the system designed using whatever tool flow works, how is verification done?
Despite these difficulties, there are a lot of attractive aspects to locally asynchronous design. First, about 30% of the power and a large amount of the interconnect is "spent" on clock distribution. Synchronous designs have their frequency set to the worst-case silicon. But most silicon is "typical" by definition. So a lot of performance is being left on the table. Locally asynchronous design runs at the fastest speed possible given the actual silicon corner. Furthermore, when performance is data-dependent, synchronous design still runs worst case, whereas locally asynchronous design will run at the appropriate speed for the data values.

A Trip to Greece
Several years ago I did some consulting for a company called Nanochronous, who supported locally asynchronous design. They provided silicon structures to handle the scheduling of operations, along with software which read a "normal" netlist and produced an equivalent design that realized the same RTL operation sequence but without requiring an explicit clock. Their engineering organization was on the Greek island of Crete. The company was running low on its seed funding, so they didn't have a lot of money. But I got offered the deal of having my expenses paid for a trip to Crete, and I would do 3 days consulting, doing what was effectively an operations review. If the company got funded, they would pay my consulting fee. Otherwise, I would just get an all-expenses vacation in Crete. Since I'd not been to Crete since I was 19 on Inter-rail, this was quite attractive. Plus Greek food is wonderful. It was a great trip, and certainly more pleasant to fly to Heraklion from Athens rather than sleeping outside on the deck of a ferry, which had been my previous mode of arrival. I arrived on a Sunday and got taken up to a little village up in the mountains where they roasted a couple of whole lambs each morning, and after church (Greek Orthodox, of course) everyone went to the local taverna for lamb, pita, Greek salad, ouzo, and wine. It reminded me of some of the places up in the foothills of the Alps near where I lived in the south of France. One issue with the Nanochronous technology was, as always, verification. Their technology seemed to work well but it was hard to verify except using SPICE, which didn't scale. Despite significant interest from some large fabless semiconductor companies, eventually the company was unable to break through the verification wall. In the era when I went to Crete, formal verification was in its infancy, but it is not a promising technology for clockless design since clocks are fundamental to (most of) the algorithms used.
Texas Instruments
At CDNLive India recently, Texas Instruments' Sudhakar Surendran presented on Locally Asynchronous Design Verification. I don't think TI does any designs where the whole chip is locally asynchronous, but these techniques are widely used in delay-based speed optimization, power management circuits, and other applications. But these still require verification. In addition to the problems that a synchronous FSM might have, an asynchronous one has some new things that might occur, such as glitches. They can also have unstable states (for example, if, for a certain combination of inputs, state A goes to next state B, but the next state for B is state A, then this will oscillate; see the short sketch at the end of this post). TI used a trick with Xs as a ternary logic to model unstable states (formal tools have no concept of an unstable state). The biggest challenge in using JasperGold for verifying a design like this is that the formal tools have no concept of delay (they don't do timing, they just do functionality). The design won't even compile. So TI created a delay element that emulated the delays for the formal tool. Then they modified the source file to instantiate these elements at all delay points. The next issue was that combinational loops are not acceptable to formal tools. So any combinational loop was broken by adding a one-clock delay element (and adjusting all the other delay values appropriately). They could then use both formal verification and standard constrained random verification. In a bit more detail, TI would:
Discretize the timing delays
Model the discretized delays using an FV tool crank or internal clock
Create a TIMING_FV model for each delay format (SDF, …); the TIMING_FV model uses a basic template for each construct
Create an FV timing delay wrapper for each cell in the library
Create an FV-friendly design netlist for the design with timing delay, using the original netlist, the cell FV delay wrappers, the timing/SDF, and the TIMING_FV model
Sudhakar went into detail in each of these steps, but that is beyond the scope of this post. I think the important takeaway is that they have developed an approach that allows a formal verification tool like JasperGold to be faked out by using delay elements to generate a clock that doesn't really exist in reality. This approach allows formal verification to be used on asynchronous designs.

Summary
TI showed methods to use formal verification for asynchronous designs. A big advantage was to "shift left" by doing this very early in verification, once RTL was first available:
Model timing in formal tools, which enables reading in asynchronous designs
Detect hazards in asynchronous designs
Enable detection of 'unstable' states
The presentations are not yet available, but eventually all presentations will appear on the CDNLive India page. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.
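As promised above, here is a hedged Python sketch of the unstable-state problem described in this post (not TI's flow or JasperGold code): with the inputs held constant, follow the next-state function and flag any cycle that never settles.

```python
def settles(next_state, start, inputs, max_steps=16):
    """Return (True, stable_state) if the async FSM settles for this
    input combination, or (False, cycle) if it oscillates.

    next_state: function (state, inputs) -> state, evaluated with the
    inputs held constant, as in a fundamental-mode analysis.
    """
    seen = [start]
    state = start
    for _ in range(max_steps):
        state = next_state(state, inputs)
        if state == seen[-1]:          # next state equals current: stable
            return True, state
        if state in seen:              # revisited a state: oscillation
            return False, seen[seen.index(state):] + [state]
        seen.append(state)
    return False, seen                 # did not settle within the bound

# The example from the text: with these inputs, A -> B and B -> A.
table = {("A", 1): "B", ("B", 1): "A", ("A", 0): "A", ("B", 0): "B"}
print(settles(lambda s, i: table[(s, i)], "A", 1))  # (False, ['A', 'B', 'A'])
print(settles(lambda s, i: table[(s, i)], "A", 0))  # (True, 'A')
```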

HOT CHIPS: Some HOT Deep Learning Processors

If there was a theme running through the recent HOT CHIPS conference in Cupertino, it was deep learning. There were two sessions on machine learning, but also every processor described in the server processor session had something to handle deep learning training. I'm not going to attempt to write about all of them, but because of their ubiquity, I'll discuss the presentations by Arm and NVIDIA on their deep learning processors. On the subject of deep learning, I covered the Sunday tutorial in HOT CHIPS Tutorial: On-Device Inference. The Arm and NVIDIA chips are focused on this area, and also take into account a lot of the specific compression techniques discussed in the tutorial.

Arm
Ian Bratt presented Arm's First-Generation Machine Learning Processor. This is a brand-new processor optimized for machine learning. Like any specialized processor in this area, it is a big efficiency uplift from CPUs, GPUs, and DSPs. For now, at least, it seems to be called simply the Arm ML processor, although I expect it will get an Arm-ish name when it is officially released later this year (TechCon is in October, so if I were a betting man I'd go for Mike Muller's keynote). Ian gave the four key ingredients for a machine learning processor as: static scheduling, efficient convolutions, bandwidth reduction mechanisms, and programmability. The static scheduling is implemented by a mixture of compilation, which analyzes the NN (neural network) and produces a command stream, and a control unit that executes the command stream. There are no caches, and memory and DMA are managed directly by the compiler/processor. Convolutions are done efficiently, mapping different parts of the input and output feature maps among the 16 processors in the system. The MAC engine itself (on each processor) is capable of eight 16x16 8-bit dot products, so with 16 MAC engines, you get 4096 ops/cycle, making 4.1 TOPS at a 1GHz clock. There is full datapath gating for zeros, giving a 50% power reduction. See the tutorial post linked above for much more about handling zeros. There are also mechanisms for passing activations from one compute engine to another, which are broadcast on the network that links them all. The processor has a POP optimization kit for the MAC engines, tuned for 16nm and 7nm. This provides an impressive 40% area reduction and 10-20% power reduction versus just using the normal cells.

DRAM power can be nearly as high as the processor itself (yellow in the pie chart is the ML power, the rest is memory: blue for the weights, black for the activations), so compression to reduce this is important. The ML processor supports weight compression, activation compression, and tiling. This results in a saving of about 3X with no loss in accuracy (since it is lossless compression). As discussed in the tutorial linked above, pruning during the training phase increases the number of zeros, and clustering can snap the remaining non-zero weights to a small collection of possible non-zero values (easy to compress). The models are compressed offline during compilation. The weights, which dominate later layers of networks, remain compressed until read out of internal SRAM. Compiler-based scheduling is tuned to keep the working set in SRAM, and tiled or wide scheduling minimizes trips to DRAM. Multiple outputs can be calculated in parallel from the same input. This is all possible due to the static scheduling, which is set up at compile time, and executed in the processor.
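As a rough illustration of the compression scheme just described (pruning to create zeros, then snapping the surviving weights to a small set of cluster values), here is a hedged Python sketch. The threshold, cluster count, and quantile-based codebook are made-up illustrations of the general technique, not Arm's actual algorithm.

```python
import numpy as np

def prune_and_cluster(weights, prune_threshold=0.05, n_clusters=15):
    """Zero out small weights, then snap the survivors to a small
    codebook of values; with 15 codebook entries plus zero, each weight
    fits in a 4-bit code. Illustrative sketch only."""
    w = np.where(np.abs(weights) < prune_threshold, 0.0, weights)
    nonzero = w[w != 0]
    # Codebook: evenly spaced quantiles of the surviving weights.
    codebook = np.quantile(nonzero, np.linspace(0, 1, n_clusters))
    idx = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)
    w[w != 0] = codebook[idx]                      # snapped weights
    sparsity = 1.0 - nonzero.size / w.size
    return w, codebook, sparsity

rng = np.random.default_rng(0)
w, codebook, sparsity = prune_and_cluster(rng.normal(0, 0.1, size=10_000))
print(f"{sparsity:.0%} of weights pruned to zero; "
      f"survivors snapped to a {len(codebook)}-entry codebook")
```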
At the bottom of the block diagram above is the programmable layer engine. This is largely to future-proof the processor since the state-of-the-art in neural networks is evolving on almost a daily basis. Ian was deliberately vague about exactly what this processor is, but it "extends ARM CPU technology with vector and NN extensions targeted for non-convolutional operators". It handles the results of the MAC computations, and most of this is handled by a 16-lane vector engine. The basic design is very scalable, in the number of compute engines (16 in this implementation), MAC engine throughput (add more MACs), and in the overall number of ML processors. The summary of the new Arm ML processor is:
16 compute engines
~4 TOP/s of convolution throughput (at 1 GHz)
Targeting >3 TOP/W in 7nm and ~2.5mm^2
8-bit quantized integer support
1MB of SRAM
Support for Android NNAPI and ARMNN
To be released in 2018

NVIDIA
Frans Sijstermans of NVIDIA presented the NVIDIA Deep Learning Accelerator, NVDLA. It was originally developed as part of Xavier, NVIDIA's SoC for autonomous driving. It is optimized for convolutional neural networks (CNNs) and computer vision. NVIDIA decided to open source the architecture and the RTL. You can simply download it and use it without needing any special permission from NVIDIA. They have taken the view that they cannot cover all applications of deep learning. As Frans put it: The more people who do deep learning, the better it is for us. Obviously, for now anyway, the more people do inference at the edge, the more people need to do training in the cloud, and that means the more NVIDIA GPUs will be needed to do it. Of course, there are the usual advantages of open source in contributions from others in the community, and there is nothing but upside for NVIDIA if NVDLA becomes a de facto standard. The high-level architecture is shown in the above block diagram. The processor is scalable. Frans talked about two particular configurations. "Small" has an 8-bit datapath, 1 RAM interface, and none of the advanced features. The "large" configuration has 8-bit, 16-bit, and 16-bit floating-point datapaths, 2 RAM interfaces, an integrated controller, weight compression, and more. Below is some performance data for the large configuration. The processor is available at nvdla.org. Their summary paragraph there says: The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Why Power Integrity (PI) Is a "Hot" Topic: How to Perform Electrical/Thermal Co-Simulation

When designing a new generation of products, the goal we all pursue is "faster, smaller, cheaper." When that meets requirements for longer battery life and lower power consumption, it presents a formidable design challenge. The only certainty is that the project schedule will not slip just because we have challenges to overcome.

Every electronic product designer undoubtedly needs tools that can analyze the power delivery network. Components can tolerate some fluctuation on the power and ground paths, but that tolerance is limited. Board layers perforated until they look like Swiss cheese, and the practice of routing traces and punching vias through fill areas to make room for signal routing, only make voltage fluctuations worse. Yet under "faster, smaller, cheaper" pressure, these become our expedient choices.

DC power analysis (also known as IR drop analysis) is usually the first tool electronic product designers turn to when facing these design challenges. But analysis performed at a fixed temperature has a common problem: when current returns through the perforated planes and blocked (bottleneck) regions of the board, the current density raises the temperature of those areas above that of the rest of the PCB. Analyzing IR drop at a fixed ambient temperature therefore leads to inaccurate IR drop predictions.

The solution is to use a dedicated tool (shown in the figure below) that performs IR drop analysis and thermal analysis together, so the DC voltage drop is accurately predicted based on the operating temperature of each region of the product's PCB. In addition to electrical/thermal co-simulation, multi-board configurations can also be analyzed: for a product with an attached memory card, for example, the complete system power delivery network can be analyzed.

Below is an 11-minute detailed technical demonstration video; the tool used in the demonstration is Sigrity PowerDC technology. https://youtu.be/1WJL3f--uGM

If you are using the Allegro tools for PCB or IC package design, you can even invoke and access the electrical/thermal co-simulation tool Sigrity PowerDC directly from within your design. Accurate IR drop analysis can be run in batch mode; through links provided in the report file, designers can pinpoint exactly which parts of the design are out of specification. You no longer have to switch back and forth between tools to finish a design, improving productivity while shortening the design cycle. We look forward to your comments and feedback.

* Original content; please credit the source when republishing: https://community.cadence.com
Further reading: Thermal Analysis of Package/PCB Systems: Challenges and Countermeasures; Watch Out for Heat! - Thermal Model Exchange
You are welcome to subscribe to the "PCB and IC Packaging: Design and Simulation Analysis" blog column, or scan the QR code to follow the "Cadence PCB and Package Design" WeChat official account for more content. Contact us: spb_china@cadence.com
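To make the electrical/thermal coupling described above concrete, here is a small Python sketch of the feedback loop on a single lumped trace. The numbers (current, resistance, thermal impedance, temperature coefficient) are made-up illustrations and this is not the PowerDC solver, just the principle: copper resistance rises with temperature, which increases dissipation and temperature, and the two are iterated until they agree.

```python
def electrothermal_ir_drop(current_a=5.0, r_20c_ohm=0.010,
                           theta_c_per_w=40.0, t_ambient_c=20.0,
                           alpha_per_c=0.0039, tol=1e-6):
    """Iterate a lumped trace model until the electrical and thermal
    solutions agree. All numbers are illustrative, not from any tool.

    R(T) = R20 * (1 + alpha * (T - 20))     copper temperature coefficient
    T    = T_ambient + theta * I^2 * R(T)   self-heating of the trace
    """
    t = t_ambient_c
    while True:
        r = r_20c_ohm * (1.0 + alpha_per_c * (t - 20.0))
        t_new = t_ambient_c + theta_c_per_w * current_a**2 * r
        if abs(t_new - t) < tol:
            return current_a * r, t_new      # (IR drop in volts, temperature)
        t = t_new

drop_cold = 5.0 * 0.010                       # naive fixed-temperature estimate
drop_hot, t_final = electrothermal_ir_drop()
print(f"fixed-T estimate: {drop_cold*1000:.1f} mV, "
      f"co-simulated: {drop_hot*1000:.1f} mV at {t_final:.0f} C")
```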

Whiteboard Wednesdays - Standalone AI Processor: Tensilica DNA 100 Processor IP for On-Device AI

In this week's Whiteboard Wednesdays episode, Megha Daga describes the new Tensilica DNA 100 Processor IP for on-device AI. This AI processor delivers industry-leading performance and power efficiency across a full range of compute from 0.5 to 100s of TMACs and is well suited for on-device neural network inference applications. https://youtu.be/eT4f2CoBByo

The New Tensilica DNA 100 Deep Neural-network Accelerator

Today, at the beautiful Tegernsee resort outside Munich in Germany, Cadence announced their latest processor IP, the Tensilica DNA 100 Deep Neural-network Accelerator. This is a highly scalable processor with a range from 0.5 TMACS (tera multiply-accumulates per second) up to hundreds of TMACS.

Neural Network Development
I have heard it said that there has been more advance in deep learning and neural networks in the last 3 years than in all the years before. I rejoined Cadence 3 years ago. Coincidence? I think not. Joking aside, neural networks have become increasingly important and I have found myself writing about various aspects of them many times. At first, it was mostly about using 32-bit floating point in the cloud, probably with GPUs, too. It has been a hot area in universities for both undergraduates (instant hire) and research. Just as a datapoint, I happened to see a tweet from Yann LeCun about the most cited authors in the whole of computer science over the last 3 years: the top researchers are all neural network researchers. Note the units that these are measured in. This is not citations over the whole year, it is citations per day over the whole year. I don't actually know if this should be multiplied by 260 (weekdays) or 365 to get to annual numbers. Even if we use the lower number, in 2018 (annualized) Yoshua Bengio was cited over 34,000 times. Also look at the citation growth over the three years: the rates all basically doubled.

Once research had found effective ways to use the cloud and GPUs for training, a new important area for research was how to do on-device inference. There are lots of drivers for this, such as wanting more responsive systems than is possible sending everything up to the cloud and back. But the biggest is that some systems need to operate without permanent connectivity. Most obviously, an autonomous vehicle cannot depend on cellular connectivity being good before it decides if a traffic light is red or green. Another driver is the need for privacy: people are uncomfortable with, for example, their smart TV uploading all their conversations to the cloud to find the occasional command of relevance to the TV among the everyday conversation.

On-device inference means doing inference with limited resources. There are two big aspects to this: how to compress the network (weight data) without losing accuracy, and how to architect hardware that can handle on-device inference using the compressed weight data. At the recent HOT CHIPS conference in Cupertino, one of the tutorials was on how to do the compression. I won't cover that ground again here; you can read my post HOT CHIPS Tutorial: On-Device Inference. The bottom line is to reduce everything from 32-bit floating point to 8-bit, and to use techniques to make as many of the weight values zero as possible, and so make the matrices involved as sparse as possible. Surprisingly, instead of this being a difficult tradeoff of size and accuracy, the reduced networks seem to end up with slight increases in accuracy. The compression ratios can be as high as 50 times. Having made many of the weights zero, the next step is to build optimized hardware that delivers a huge number of MACS and deals with all those zeros specially. The reason optimizing the zeros is so important is that zero times anything is zero. So not only is it unnecessary to explicitly load zero into a register, nor do the multiply, but the other value in the calculation does not need to be loaded either.
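As a sketch of why zeros are so valuable in the MAC datapath, here is a minimal Python illustration (conceptual only, not the DNA 100 microarchitecture) of a dot product that skips zero weights, so neither the multiply nor the corresponding activation load happens:

```python
def sparse_dot(weights, activations):
    """Dot product that skips zero weights.

    Returns the result plus simple counters showing how much work the
    zeros saved: each skipped multiply also avoids fetching the matching
    activation value, which is where much of the energy goes.
    """
    total, multiplies, skipped = 0, 0, 0
    for i, w in enumerate(weights):
        if w == 0:
            skipped += 1          # no multiply, no activation fetch
            continue
        total += w * activations[i]
        multiplies += 1
    return total, multiplies, skipped

weights = [0, 3, 0, 0, -2, 0, 1, 0]      # a pruned (sparse) weight vector
activations = [5, 1, 7, 2, 4, 9, 6, 8]
print(sparse_dot(weights, activations))  # (1, 3, 5): 3 multiplies done, 5 skipped
```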
The values involved can also be compressed so that fewer bits need to be transferred to and from memory—every memory transfer uses power, and it is not hard to end up with the data transfer interface consuming more power than the calculations themselves, as happens with the Google TPU.

DNA Architecture
The computational requirements (and the power and silicon budgets to pay for them) vary a lot depending on the end market. For example:
IoT is less than 0.5 TMACS
Mobile is 0.5 to 2 TMACS
AR/VR is 1-4 TMACS
Smart surveillance is 2-10 TMACS
Autonomous vehicles are from 10s to 100s of TMACS
Every application of a processor like the DNA 100 is going to be different, but one high-end use case is perception and decision making in automotive, with cameras, radar, lidar, and ultrasound. A typical architecture is to have local pre-processing of the different types of data, and then bring it all together to analyze it (is that a pedestrian?) and act upon it (apply the brakes). Cadence has some application-specific Tensilica processors such as the Vision C5 suitable for handling the pre-processing, and the new DNA 100 is powerful enough to handle all the decision making. The DNA 100 processor architecture is shown in the block diagram above. The left-hand gray background block is a sparse compute engine with high MAC utilization. The block on the right is a tightly coupled Tensilica DSP that controls the flow of processing, and also future-proofs designs by providing programmability. You can think of these two blocks as the orchestra and the conductor. The DNA 100 architecture is scalable internally, mostly by how many MACs are included. It can easily scale from 0.5 to 12 TMACS. The next level of scaling is to put several DNA 100 cores on the same chip, communicating via some sort of network-on-chip (NoC). If that is not enough, multiple chips (or boards) can be grouped into a huge system. Autonomous driving has been described as requiring a super-computer in the trunk. This is how you build a supercomputer like that.

Performance
ResNet50 is a well-known network for image classification. A DNA 100 processor in a 4K MAC configuration, running at 1GHz, can handle 2550 frames per second. This high number is enabled by both sparse compute and high MAC utilization. It is also extremely power-efficient. In 16nm it delivers 3.4 TMACS/W (in a 4 TMACS configuration, with all the network pruning).

Software
A complex processor like the DNA 100 is not something where you would consider programming "on the bare metal." There are frameworks such as Caffe, TensorFlow, and TensorFlow Lite that are popular for creating the neural networks. Cadence has the Tensilica Neural Network Compiler that takes the output from these frameworks and maps it onto the DNA 100, performing all the sparseness optimization, and eventually generating the DNA 100 code. Another popular approach is the Android Neural Networks API, which handles some levels of mapping before passing on to the Tensilica IP Neural Network Driver that produces the DNA 100 code. And in a bit of late-breaking news from last Thursday: At Facebook's 2018 @Scale conference in San Jose, California today, the company announced broad industry backing for Glow, its machine learning compiler designed to accelerate the performance of deep learning frameworks. Cadence, Esperanto, Intel, Marvell, and Qualcomm committed to supporting Glow in future silicon products. The DNA 100 doesn't support Glow yet...but watch this space.
For higher-level functionality, Cadence partners with specialists such as ArcSoft for facial recognition, or MulticoreWare for face detection.

Summary
The Tensilica DNA 100:
Can run all neural network layers, including convolution, fully connected, LSTM, LRN, and pooling.
Can easily scale from 0.5 to 12 effective TMACS. Further, multiple DNA 100 processors can be stacked to achieve 100s of TMACS for use in the most compute-intensive on-device neural network applications.
Also incorporates a Tensilica DSP to accommodate any new neural network layer not currently supported by the hardware engines inside the DNA 100 processor.
Complete software compilation flow, including compression for sparsity.
Will be available to select customers in December 2018, with general availability in Q1 2019.
For more details, see the product page. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Seeing Sound

You may know something about deep learning and machine learning when it comes to visual applications. Using various filters (or convolutions) and other methods of interpreting visual data, the layers of processing that data can, for example, identify the subject of a picture.
An example of a convolutional neural net (CNN)
But what about audio data? Machine learning and audio processing obviously work, because we have Siri and Alexa and so on—so how do we apply what we know about processing images to processing sound for natural language processing (NLP) systems?

Wave Breakdown
Just as a convolution layer filters the relevant data out of a visual image, an audio signal is filtered into different segments before being fed into a neural network. You've seen 2D or 3D waveforms of music you're listening to, right? This is the kind of information that is fed into the neural network. From there, you have a "visual" image of sound, which can then be processed just as an image would be, whether you're identifying the sound, removing background noise, or processing speech.
Examples of waveforms, or audio waves
These images, or spectrograms, are "…images representing sequences of spectra with time along one axis, frequency along the other, and brightness or color representing the strength of a frequency component at each time frame. This representation is thus at least suggestive that some of the convolutional neural network architectures for images could be applied directly to sound." [L. Wyse] As an example of a spectrogram to look at, here is a one-second clip of a man's voice saying, "Nice work." This heat map shows a pattern in the voice which is above the x-axis. A neural network will be able to understand these kinds of patterns and classify sounds based on similar patterns recognized. To quote Daniel Rothmann in his article, "The promise of AI in audio processing": Essentially, we are taking audio, translating it to images and performing visual processing on that image before translating it back to audio. So, we are doing machine vision to do machine hearing. In other words, we are seeing sound.

Note that the heatmap above is for a short clip of audio, and the inherent problem is this: If you freeze a frame of, say, a movie, you get an image from which you can glean a lot of information, including being able to feed the image into a neural net. If you freeze a "frame" of audio, however, it means nothing. Audio is dependent on the "time" axis. Now imagine a sound clip with multiple voices, background noises, foreground noises, overlapping voices, and all kinds of sounds mixed together in one chunk of sound. It takes so much processing to distinguish each element, and then identify it. And then distinguishing what the voices are saying, and then not only what they're saying, but what they mean—imagine the processing power required for each segment of audio that is processed by the neural network. This is where the Cadence Tensilica HiFi DSPs for audio, voice, and speech come in, with more than 225 audio, voice, speech recognition, and voice enhancement software packages already ported to the HiFi DSP architecture. Audio applications present unique problems that must be addressed specifically by the DSP, and Cadence is on the leading edge of what is possible.
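To make the "seeing sound" idea concrete, here is a short, hedged Python/NumPy sketch of how a spectrogram is built from raw audio (a basic short-time Fourier transform with made-up parameters, not the HiFi DSP implementation): slice the waveform into overlapping frames, take the magnitude of each frame's FFT, and stack the columns into a time-frequency image that a CNN can consume.

```python
import numpy as np

def spectrogram(signal, frame_len=1024, hop=256):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))      # one column of the "image"
    return np.array(frames).T                           # shape: (freq_bins, time_frames)

# One second of a synthetic "voice": a 200 Hz tone that sweeps upward.
sr = 16_000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.sin(2 * np.pi * (200 + 300 * t) * t)
image = spectrogram(audio)
print(image.shape)   # (513, 59) - a 2D array ready for a CNN
```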
—Meera
What I read for this post:
10 Audio Processing Tasks to get you started with Deep Learning Applications
Audio spectrogram representations for processing with Convolutional Neural Networks
Getting Started with Audio Data Analysis Using Deep Learning
The Promise of AI in Audio Processing
Using Deep Learning for Sound Classification: An In-Depth Analysis

Samsung Galaxy S9's Application Processor

At this year's HOT CHIPS, Jeff Rupley of Samsung presented the application processor that goes in their Galaxy S9 and S9+ smartphones. Apple only ever gives cursory information about their Ax chips, and I don't remember seeing a lot of detail about the HiSilicon chips that go into Huawei's smartphones, so this was an opportunity to get a more detailed look under the hood at a state-of-the-art smartphone SoC. The chip is called M3, simply because it is the 3rd version. Jeff gave some insight into the development schedule:
Planning started Q2 2014
RTL started Q1 2015
Forked features for an incremental M2 in Q4 2015
Replanned for a bigger M3 push Q1 2016
First tapeout Q1 2017
Product launch (Galaxy S9) Q1 2018
The chip is an Arm v8.0 64-bit (with 32-bit compatibility) processor, manufactured in Samsung's 10nm LPP process. It runs at 2.8GHz. Jeff talked about the processor in parts: the front end (instruction decode and branch prediction), the middle machine (instruction reordering and dispatch), the FPU, and the load/store unit. There is not a lot of point in having an enormous 228-entry ROB in the middle machine (comparable to Intel server chips) unless your branch prediction is extraordinarily good. They did a lot of work on this, using machine learning in the branch predictor. The M2 MPKI (mispredictions per 1,000 instructions) was 3.92, so it is hard to get much better. But they got the M3 down to 3.29. I have no idea what numbers other processor manufacturers achieve since I don't recall anyone revealing their statistics before. Everything is bigger and wider in the middle machine:
Decode up to 6 instructions per cycle (vs 4 in the M2)
Rename, dispatch, retire up to 6 instructions per cycle (vs 4)
Up to 9 integer ops issued per cycle (vs 7) and a 4th ALU including a second multiplier
228-entry ROB (vs 100)
128-entry distributed integer scheduler (also >2X)
More ops done in 1 cycle, and some optimized to 0-cycle (no idea quite what that means, but I assume the ops somehow get overlapped with stuff in the front end)
The floating point unit is also a "beast". With the importance of a lot of machine learning around floating point MAC operations, they have added a lot: a 3rd dispatch and issue port, 3x 128b FP FMAC/FADD, a 62-entry FP scheduler (>2x), FMAC down from 5 cycles to 4, and FADD down from 3 cycles to 2. The overall pipeline is in the above diagram. The load and store unit has been beefed up, with 2 loads per cycle (up from 1) and 1 store per cycle. It can handle 12 outstanding misses (versus 8 before). The translation lookaside buffers (TLBs) have been expanded with a new mid-level DTLB and an L2 TLB with 4 times the capacity. The graph above shows the result across 4800 instruction traces. The IPC (instructions per cycle) has gone from 1.26 for the M2 to 2.01 (so cycles per instruction is below 0.5). No HOT CHIPS presentation is complete without a die picture. The above plot shows the chip layout. This is just one core, and there are 4 cores on the whole M3. The overall performance is impressive. The above chart compares the M3 to the M2 and also to the Arm A75 (presumably as it comes from Arm before all the modifications that Samsung made, but even running at a slightly higher clock rate). The graph for performance per Watt was equally impressive. To wrap up, Jeff said that they were on a roll and are doing a new processor every year. He didn't quite say it, but it seems clear that there will be an M4 in 2019.
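For reference, the two figures of merit quoted above are simple ratios. Here is a quick Python check using hypothetical counter values chosen to reproduce the quoted M3 numbers (illustrative arithmetic only, not Samsung's measurement methodology):

```python
def mpki(branch_mispredictions, instructions):
    """Branch mispredictions per 1,000 instructions."""
    return 1000.0 * branch_mispredictions / instructions

def ipc(instructions, cycles):
    """Instructions retired per clock cycle."""
    return instructions / cycles

print(round(mpki(3_290, 1_000_000), 2))     # 3.29 MPKI
print(round(ipc(2_010_000, 1_000_000), 2))  # 2.01 IPC
```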
The takeaway that I got from this presentation is that there is starting to be very little difference between mobile processors (at least at the very high end) and server processors. The servers have a higher clock rate and burn a lot more power as a result; mobile processors have to back that off a bit (but 2.8GHz is not backing off a lot). Servers have a lot more cores too. But the underlying architecture with speculative execution, large caches, very wide and deep out-of-order execution, great branch prediction, and more, makes for similar architectures.

HiSilicon
I wrote this soon after HOT CHIPS even though it is only appearing now. In the meantime, having said that I'd not seen anything about HiSilicon's processors, they announced their latest Kirin 980, which is the world's first 7nm mobile chip. It was announced at IFA, Europe's biggest tech show, which I wasn't at, so this is second-hand information. What does IFA stand for, I hear you ask? Internationale Funkausstellung. So just say IFA like everyone else. In case you don't know, HiSilicon is a wholly-owned fabless semiconductor arm of Huawei, based in Shenzhen, just over the river from Hong Kong. It has 6.9B transistors and it took 1000 engineers 36 months (3 years) to design. Venture Beat doesn't understand semiconductor design, since it says "it took more than 5,000 prototypes" to get it right. I can only guess that meant they compiled the RTL for verification 5,000 times, which is still about 5 times per day for 3 years. Power is down 40% from the prior 10nm chip. It also contains two (up from one) NPUs, neural processing units, presumably for doing all the MACs associated with neural net inference. All that and "Facebook opens 0.3s faster; Snapchat opens 0.2s faster." That seems underwhelming, maybe unless you are an impatient teenager. More impressive to me is LTE Cat 21 with download speeds of 1.4 Gbps, and WiFi even faster ("the world's fastest") at 1.7 Gbps. Despite its high-end specs, Huawei has announced that it will be used by Honor, its budget brand, in its Magic 2 smartphone, and not just in the high-end Mate 20 (not yet officially announced). Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

What's For Breakfast? Video Preview September 24th to 28th 2018

https://youtu.be/NYsYkQzZADo Coming from SAP Center, San Jose (camera Sean) Monday: EDPS: Design Process in Milpitas Tuesday: CDNLive India: Invecas and FD-SOI Wednesday: RF Design with Cadence and National Instruments Thursday: GlobalFoundries Technology Conference Friday: Figure-Skating Champion Wins Kaufman Award www.breakfastbytes.com Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Jaswinder's Only Job Interview

On Labor Day, I didn't get the day off since I was in Delhi. I had to labor, not celebrate it by eating barbecue. Instead, I ate chicken curry, naan, and fried okra at the lunch I had with Jaswinder Ahuja in a conference room. I knew he had just passed his 30-year anniversary at Cadence, which meant that he was already about 10 years in when we would run into each other regularly the last time I worked for Cadence in the late 1990s and early 2000s, and we both worked for Shane Robison, who ran engineering at the time (we were organized functionally in that era). The obvious first question is how he ended up in Cadence in the first place.

He told me he was born in Delhi and spent the first part of his life growing up there (or here, since I'm sitting in Delhi as I type this). Then he lived in Chennai for five years, still called Madras back then. He studied engineering in Varanasi and then went to grad school at Northeastern University in Boston. But he was always interested in coming back to India to be near his family. He told me he avoided what people called the +1 problem back then, of "I'll stay one more year and then go back to India." He was wondering what he might do to find a job when he happened to get given a copy of India Abroad by a friend, and it had an ad for a company called Gateway Design Automation who were looking for engineers in India. But they were headquartered in the Boston area, where he was already living. As Jaswinder put it to me: So I went for the one-and-only job interview of my life, and here I am still. That one interview was with Prabhu Goel (the CEO), Phil Moorby (the inventor of Verilog, see my post Phil Moorby and the History of Verilog for more about that story), Manoj Gandhi (now EVP of Verification for Synopsys), and a couple of other famous names. He got the job.

When he got to Noida (which is a suburb of New Delhi) there were already 6 people there. They were developing Verilog models of standard parts. Prabhu had realized that they wouldn't be able to sell the simulator without models, but the models would be a sort of loss leader—Gateway wouldn't be able to make money on the models. So he had made the strategic decision to create a modeling group in India. Jaswinder was engineer number 7 in the group. At the end of 1989, Cadence acquired Gateway (and the Verilog simulator) and the team in India became the CAE division. Prabhu ran the division and nothing much changed for a couple of years. The big change was when Cadence acquired Valid Logic. The two companies were a similar size with a number of overlapping products. As is typical in such mergers, much of the time was spent on trying to merge these products to rationalize the product line, and give guidance both internally to engineering, and externally to customers, as to what the combined roadmap would be. My experience at Cadence with the Ambit merger was similar—we spent all our time on timing engine issues since both Ambit and Cadence already had timing engines, and customers wanted to know which one would "win". After the Valid merger there was a similar dynamic. For example, it seemed "obvious" that Cadence didn't need two schematic capture tools, one for PCB (Allegro Capture from Valid) and one for mixed-signal (Composer from Cadence), and a lot of effort went into merging them. Their descendants are still separate today! The merger with Valid changed the structure of Cadence and put many new managers in place.
One effect of all the change was that Prabhu left Cadence in 1992 to found S&T (Software and Technologies, to give its full name). It would be a company doing subcontract development of software and models for EDA companies, with a mixture of development services and some common code that it would own and license, such as a Verilog parser. Jaswinder admitted to me that one little-known fact, lost in the fog of history, is that he followed Prabhu and was employee #1 at S&T. However, he came back to Cadence after six months (to the day: he left Cadence on February 1st, 1992, and rejoined on September 1st). He came back as the engineering manager for the organization, which had by then grown to 40 people. Cadence bridged his starting date, so if you want to quibble, Jaswinder has only worked at Cadence so far for 29½ years!
Cadence India started to work with divisions other than CAE, expanding to IC and PCB tools. They grew the group rapidly in the 1992-96 time frame. It was a challenging time, with growing pains of all sorts, but especially people leaving for other opportunities. By the mid-1990s, having an Indian development strategy had become a standard part of every EDA startup's business plan, and so people who already had many years of experience were suddenly in demand. Attrition was something like 30+% per year. This was also a period when any engineer with a few years of experience was in demand, and salaries were going up over 20% per year. The rupee was falling against the dollar, so this was still single-digit growth measured in dollars. My own experience as an engineering manager is that it is very difficult to handle the human resource issues when salaries in the marketplace are changing rapidly. New hires get hired at the market rate (sort of by definition, since they don't accept their offers otherwise), but that leads to new hires being paid more than the loyal engineers who have been there for a long time, a phenomenon known as "salary compression". It is very hard to get management to commit to fix it, especially after several years of not fixing it, since an across-the-board salary increase of 30% is probably impossible financially. I went through the process a couple of times in my career.
Shane Robison joined Cadence in 1996 (he would be my boss after the Ambit acquisition) and gave Jaswinder responsibility for the whole India center. Jaswinder is proud that they built up a good team in the mid-1990s, and there are many people still at Cadence from that era who have 20-25 years of experience. Today, Cadence India has four sites, totaling nearly 2,000 people. Noida is the biggest with 1,150 people in 4 buildings (but they are out of space and Jaswinder said they are looking to take a 5th building). Bengaluru (Bangalore) has 750, and they moved buildings a couple of years ago. There are 100 in Pune, a group inherited with the Tensilica acquisition. Finally, there are 50 people in Ahmedabad, doing verification IP (VIP), a group that we purchased from Sibridge.
Cadence India, on the business side, has grown very rapidly. As major Cadence customers such as Qualcomm, Samsung, Intel, Broadcom, TI, ST, NXP, Mediatek, IBM, Arm, and more have set up their own Indian engineering organizations, they have obviously needed design tools and local support. These groups are doing a lot of work with the most advanced tools; there are many 7nm designs being done in India. There is also some startup activity, but Jaswinder admitted that it's nothing compared to China or Israel.
He thinks it is partially cultural, but also that there are a lot of attractive opportunities to work for these large multinationals. It has turned out that a lot of the startup activity has been to start service companies, which then get acquired when someone wants a design team that is already in place, rather than having to build one up from a standing start. Even in deep learning, which has led to a lot of fabless semiconductor startups in the US and China, the Indian startups are all software-only. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

10 Reasons to Upgrade to Allegro 17.2-2016, Reason 5: How to Define Your Stackup

Here we are not talking about how your stackup compares with anyone else's, but about the stackup construction of the PCB you are designing, whether it is a rigid board, a flex board, a rigid-flex board, or one that uses inlay technology. Defining the stackup, and more to the point defining it accurately, is critical. The arrangement of the various materials affects the calculations and analyses required for impedance control and for reducing crosstalk on critical signals, and therefore the performance of the final product. ECAD/MCAD collaboration that captures the true material thicknesses (including mask layers) is essential for physical modeling of the system, especially for the smallest or thinnest devices where space must be used most efficiently. Most importantly, the exact stackup definition is what gets sent to the board fabricator so that the final product matches the design data. (A rough, tool-independent sketch of this kind of stackup data appears at the end of this post.)
Material inlay (click to view larger image) Rigid-flex board (click to view larger image)
For each of the design types mentioned above, Cadence® Allegro® PCB Designer 17.2-2016 makes accurately defining the stackup simple. The new Allegro PCB stackup editor can define the dielectric layers, the conductor layers, and the outer mask layers for rigid and flex designs. It makes it easy to define the different stackups that must be included to meet the conductor- and dielectric-layer requirements of rigid, rigid-flex, or inlay constructions. Materials can be added automatically by using a mask location file associated with the stackup editor. The mask location file is managed as a combination of a materials library and a library of standard (company-preferred) mask layer names. When defining a layer in the stackup, the user selects the appropriate layer name, and that layer and its associated material are added to the stackup.
Mask Location File
When defining the stackup, adding mask layers is very simple. As a layer is added, the user can read the mask location file, and the layer names and associated materials are presented. When selecting a layer type, the user sees the assigned IPC-2581 layer function and the associated material highlighted; if an attribute needs a design-specific change, simply selecting the new attribute from the list updates the existing assignment.
Creating a layer type (click to view larger image)
The mask location file is managed by the location file editor. This tool manages all the IPC-2581 layer function types and subclass names that might be used in a design, as well as the layer-type categories and materials for each stackup. Maintaining this data keeps designs consistent with each other and reduces the manufacturing errors that might otherwise occur. Once created, the mask location file is stored in MATERIALPATH, where all users can access it.
Mask layer location file editor (click to view larger image)
Easily Define Multiple Stackups in the Stackup Editor
Defining multiple stackups in the stackup editor starts with enabling multi-stackup mode (View -> MultiStackup mode). The user can then define the complete stackup, paying attention to the order of the layers and materials used in the design. The first column defines the default stackup of the design; checking the first-column box next to a layer identifies it as a layer of the master stackup. To create another stackup, add a stackup column (via Edit->AddStackup, or by selecting the "+" tab) and enter the name of the new stackup in the create-stackup form. Layers are assigned to a stackup with checkboxes. The stackup editor also updates the graphic to the right of the stackup table as a quick visual reference. Users can export a completed stackup definition to a stackup technology file so that the same construction can be reused in other designs.
Multi-stackup definition (click to view larger image)
Remember the comment about accurate documentation? Once the stackup has been defined in the stackup editor, one of the benefits is the ability to create a stackup table for documentation. The same data defined in the editor can be extracted into a table that the fabricator can use on the fabrication drawing. Now the fabricator has the material and construction information needed to build your design accurately.
Example stackup table (click to view larger image)
With the new Allegro 17.2-2016 release, the enhanced stackup editor enables accurate stackup definition, complete with mask layers, consistent layer names, and documentation detail. Now you ask, "Great, but how do I use these new capabilities in my design?" Stay tuned for the next post, which will cover how to use the multi-stackup capability in detail.
* Original content; please cite the source when reposting: https://community.cadence.com
Related reading: 10 Reasons to Upgrade to Allegro 17.2-2016; The Latest Allegro Technology
Subscribe to the "PCB and IC Packaging: Design and Simulation Analysis" blog column, or scan the QR code to follow the "Cadence PCB and Package Design" WeChat official account for more great content!
Contact us: spb_china@cadence.com
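As a footnote, here is a rough, tool-independent sketch of the kind of per-layer data the stackup editor manages (layer name, IPC-2581 layer function, material, and thickness) and how it rolls up into the overall board thickness that the ECAD/MCAD handoff and the documentation table depend on. This is not the Allegro data model or file format; the layer names, functions, materials, and thicknesses are illustrative assumptions only.

```python
# Illustrative only: NOT the Allegro stackup file format or data model.
# Layer names, IPC-2581 functions, materials, and thicknesses are assumed values.
RIGID_STACKUP = [
    # (layer name,         IPC-2581 function, material,      thickness in mm)
    ("SOLDERMASK_TOP",     "SOLDERMASK",      "LPI mask",     0.020),
    ("TOP",                "CONDUCTOR",       "Copper",       0.035),
    ("DIEL_1",             "DIELPREG",        "FR-4 prepreg", 0.200),
    ("L2_GND",             "PLANE",           "Copper",       0.035),
    ("DIEL_2",             "DIELCORE",        "FR-4 core",    0.710),
    ("L3_PWR",             "PLANE",           "Copper",       0.035),
    ("DIEL_3",             "DIELPREG",        "FR-4 prepreg", 0.200),
    ("BOTTOM",             "CONDUCTOR",       "Copper",       0.035),
    ("SOLDERMASK_BOTTOM",  "SOLDERMASK",      "LPI mask",     0.020),
]

def total_thickness_mm(stackup):
    """Board thickness including mask layers, as an MCAD model would need it."""
    return sum(thickness for _, _, _, thickness in stackup)

def print_stackup_table(stackup):
    """Emit a simple fabrication-drawing-style stackup table."""
    print(f"{'Layer':<20}{'Function':<12}{'Material':<14}{'Thk (mm)':>9}")
    for name, function, material, thickness in stackup:
        print(f"{name:<20}{function:<12}{material:<14}{thickness:>9.3f}")
    print(f"{'TOTAL':<46}{total_thickness_mm(stackup):>9.3f}")

print_stackup_table(RIGID_STACKUP)
```

The point, as in the post above, is that this data is entered once, kept consistent across designs, and then reused: for impedance and crosstalk analysis, for the MCAD model, and for the table on the fabrication drawing.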

EDPS: Design Process in Milpitas

For the second year, the Electronic Design Process Symposium (EDPS) took place in Milpitas, having been at Monterey for many years. This was apparently the 25th year EDPS has run. I find EDPS to be a fascinating conference, and I think it is a shame that more people don't attend. Over the years, things I've come across at EDPS include:
the first time I heard about RISC-V, long before anyone else seemed to have heard about it (see my post A Raven Has Landed: RISC-V and Chisel )
while I knew about differential power analysis as a side-channel attack on chips, it was at EDPS that I actually saw it done: the encryption key was read out of a chip using the technique (see my post EDPS Cyber Security Workshop: "Anything Beats Attacking the Crypto Directly" )
even the first time I heard in any detail about David White's work on machine learning in EDA, even though we both work for Cadence (see my post EDPS: the Remains of the Day )
In this post, I'll summarize what took place this year so that you get a real sense of the breadth of what you missed if you weren't there. I will cover a few of the presentations in their own posts over the coming weeks.
Chris Rowen
The conference opened with a keynote from Chris Rowen. He's slightly renamed his company to Babblelabs since people were clueless about how to pronounce the shorter version of its name. The introduction to Chris's talk was similar to what I covered in Rowen on Vision, Innovation, and the Deep Learning Explosion . One key message is that deep learning silicon is easy: compute is dominated by multiply-add (or MAC). The coefficients (weights) are read-only and heavily re-used. The memory pattern is static and regular, so caches are not required. Programmability means that the same fabric can be used for many applications, such as both image recognition and voice recognition (a tiny sketch just before the list of presenters below illustrates the pattern). On the other hand, deep learning silicon is also hard. There are impediments to efficiency such as mixed convolution sizes, non-unit strides, difficult parallelization, and exploiting sparsity. Memory bandwidth is a challenge: models are large (10+ megabytes), fully connected layers are hard to optimize since each coefficient is used only once, and inter-layer connectivity is complex. The whole chain, from the standard frameworks like Caffe and TensorFlow all the way down to the silicon, needs to be optimized. Chris thinks that silicon availability is getting ahead of deployable applications, leading to chips being a solution looking for a (valuable) problem. Chris wrapped up talking about deep learning startups, of which there are many. As he quipped, an AI startup is "any startup founded in the last three years". I think a deep learning silicon startup is any silicon startup founded over the same period. He had a graph (above) giving a taxonomy of which silicon sits where on two axes: edge inference to cloud acceleration, and general-purpose processor to deep-learning specific.
Patrick Groeneveld
The program proper opened with Cadence's Patrick Groeneveld talking about a course that he had run at Stanford (along with Antun Domic and Raúl Camposano, which is a lot of years of EDA experience between them) called EE292A Electronic Design Automation (EDA) and Machine Learning Hardware. I will cover that in its own post.
Deep Learning
The rest of the morning was taken up with various facets of deep learning.
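To make the "easy" half of Chris's argument concrete, here is a tiny, self-contained Python sketch of a convolution written as nested multiply-accumulate operations. It is illustrative only and has nothing to do with any of the silicon presented; it simply shows why the weights are read-only and heavily reused, and why the access pattern is static and regular.

```python
# Tiny pure-Python sketch, illustrative only: a 2D convolution written as nested
# multiply-accumulate (MAC) operations. The kernel weights are read-only and
# reused at every output position, and the access pattern is static and regular.
def conv2d(image, kernel):
    """Naive 2D convolution; every inner-loop operation is one MAC."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    macs = 0
    for y in range(ih - kh + 1):
        for x in range(iw - kw + 1):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]  # one MAC
                    macs += 1
            out[y][x] = acc
    return out, macs

image = [[float((r * 7 + c) % 5) for c in range(8)] for r in range(8)]   # dummy 8x8 input
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]            # 3x3 weights
_, macs = conv2d(image, kernel)
print(f"{macs} MACs for a single 3x3 convolution over an 8x8 image")
```

Even for this tiny 3x3 kernel over an 8x8 image there are 324 MACs, and each of the nine weights is reused 36 times without ever being written, which is exactly the pattern that deep learning accelerators exploit.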
The presenters were:
Balachandran Ranejendran of Dell EMC on Machine Learning in System Design and EDA
Rohit Sharma of Fiarpath on Exploring Machine Learning for EDA
Joonyoung Kim of NVXL on Design Flow for Machine Learning FPGA
Jai Kumar of Intel on Efficient HW/SW Co-Design of Complex Emerging Systems
Andrew Kahng
The afternoon kicked off with Andrew's keynote Driving, Driven, Along for the Ride: Evolutions of EDA, Manufacturing and Design. Even back in 2001, the ITRS roadmap called out "cost of design is the greatest threat to the continuation of the semiconductor roadmap." Andrew's presentation was about who is going to drive EDA going forward: EDA itself, or everyone else? Or is EDA just along for the ride, as in the title? I'll cover Andrew's talk in a separate post.
Smart Manufacturing, System Reliability
The first part of the afternoon was taken up with presentations on smart manufacturing:
Tom Salmon of SEMI (winner of the longest title) on Smart Manufacturing: Convergence, Co-Design, and Co-Optimization Improve Performance, Sustainability, and Yield Across Microelectronic Supply Chains
Willfried Bier of NextFlex on Flexible Hybrid Electronics: New Challenges, New Opportunities
Mark Knowles of Mentor (runner-up for the longest title) on Connecting Advanced Manufacturing Test to Design, Fab, and Final Product Yield for Complex FinFET Defect Challenges
Dave Armstrong of Advantest on Device Manufacturing in an Era of Neural Networks
The second half of the afternoon was presentations on system reliability for ADAS, 5G, AI, and photonics:
Di Liang of HP Labs on Integrated Photonic Interconnect Reliability for Datacom Applications
Amisha Sheth of Intel on 5G Validation Process and Challenges
Ritesh Tyagi of Infineon on Functional Safety Architecture Challenges to Achieve Failsafe Operation in ADAS and AD Applications
Norman Chang of ANSYS on Achieving 5G-ADAS-AI Reliability for Advanced FinFET Designs
ESD Alliance Hogan Evening
That evening was a "keynote" co-organized with the ESD Alliance in which Jim Hogan interviewed Amit Gupta on his keys for crossing the chasm and success in an EDA startup. I wasn't able to attend, although I've talked to Amit about this before, see my post Crossing the Chasm: Hogan Interviews Amit Gupta . I don't want to take anything away from Amit's success with two startups based in Saskatoon, but part of his recipe doesn't generalize that well: locate in a desolate part of Canada, get subsidies from the Canadian government, have basically zero turnover since there's nowhere else locally, and so on. If you are in Silicon Valley, this advice is a bit like that old joke where a driver asks a local farmer for directions (this is pre-GPS, obviously) and gets told "if I were you, I wouldn't start from here."
Confidential Cloud
What is this building? The new Apple Campus in Cupertino? Actually, it is GCHQ in Cheltenham, England, roughly the UK equivalent of the NSA. That is where the second day's keynote speaker, Simon Johnson, worked for many years. After 9/11, the US basically said that to continue to be part of the visa waiver program, where citizens don't need to go to the US embassy every time they want to visit the US, passports would have to be biometric and machine-readable. He worked on that biometric passport program "and some other stuff I can't talk about." Today, he is at Intel working in their security platform division.
He talked about Confidential Cloud, which is the idea of building applications in the cloud where even the cloud service provider does not need to be trusted. I will cover that in a separate post.
Security
The rest of the morning was taken up with various presentations on security:
Gong Qu of ISR on Polymorphic Gates and Their Applications in Hardware Security
Alessandra Nardi of Cadence on Functional Safety for Semiconductor Designs
Ujjwal Guin of Auburn University on Cybersecurity Solutions in Hardware
Blockchain
The last presentation of the morning session was by Naresh Sehgal of Intel on Introduction to Blockchain and Its Potential Applications in EDA. This was followed (after lunch) by a panel session on BlockChain: Will it Work for IoT Security? although most of the discussion was on other blockchain topics. The panelists were Jim Hogan (who has invested in some blockchain companies); Naresh, back to be on the panel, who has been digging into blockchain while on his sabbatical from Intel; and James Gambale, who is a chief engineer of Loomasoft Corp (and was, for many years, a patent attorney at Qualcomm). My experience with discussions on blockchain, and this was no different, is that people mix up some application (such as keeping track of IP usage throughout the design and manufacturing flow) with the underlying technology. Blockchain is sexy, and so that rubs off onto whatever you are talking about. I think VCs are irrationally keen on it since it doesn't have an elephant like Google or Facebook sitting there already, so if it does become something big, there is the possibility of being "the Google of blockchain". But mostly it still seems to be a solution chasing a problem. For example, for the problem of helping the DoD keep track of IP used in their devices over the multi-decade lifecycle of the typical defense product, why would you use an unwieldy, expensive technology like blockchain to hold a modestly sized, low-activity database that the DoD could perfectly well centralize on a single server (or in the cloud for more redundancy)?
Digital Marketing 2.0
In the same room at SEMI where EDPS was held will be the second ESD Alliance digital marketing workshop, with Nicolas Athanasopoulos of OneSpin. I wrote three posts about the previous workshop (if you want to read them, start with the first one, Digital Marketing in EDA...With No Hands on the Wheel ). It is $30 for anyone who works for an ESD Alliance member company (and that includes dinner). It is on October 3rd, with dinner at 6pm and the workshop proper starting at 7pm. Yes, it is also TSMC's OIP symposium that day, so not the best choice of evening since a lot of EDA marketing folk will be in the Santa Clara Convention Center all day. More details, including registration, are on the ESD Alliance website . I'll see you there. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

CDNLive India: Invecas and FD-SOI

Today it is GTC, the GlobalFoundries Technology Conference. I will be there and I will cover what was said later in the week. When I was at CDNLive India a couple of weeks ago, one highly relevant presentation was by Invecas, titled PPA Strategies Using FD-SOI Technology . You might not have heard of Invecas. They are an IP company with a portfolio of what is often called foundation IP: standard cells, memories, IO, interfaces, and so on.
GlobalFoundries licensed FD-SOI technology from ST Microelectronics. If you want to read the story of how FD-SOI came into existence, read my post Silicon on Nothing: the Origins of FD-SOI . GF licensed the technology at 28nm, but their customers told them that it wasn't differentiated enough from planar 28nm, and so they developed 22FDX, a 22nm version of the basic FD-SOI technology. But this left them with a problem as a foundry, namely the lack of IP. They didn't want to build up an in-house IP group, but on the other hand, as a new player in the market, they couldn't just rely on the "if we build it, they will come" approach and simply provide the process and wait for the IP suppliers to create a full portfolio opportunistically. It is easy to underestimate this chicken-and-egg problem. I well remember in the mid-1990s that VLSI Technology and other Arm licensees had to fund an RTOS club since companies like Green Hills and Wind River didn't see porting their RTOSes to Arm as a viable business opportunity unless their costs were already covered by Arm and its licensees. Of course, today, Arm could probably charge RTOS vendors for the privilege.
Invecas
GF could have paid any of the big foundation IP vendors such as Arm to develop their libraries, but a GF executive told me at the time that they were worried that they wouldn't get the focus they needed. Instead, they commissioned Invecas to create their libraries. Invecas was just getting started and didn't create libraries for anyone else, so GF would get the focused attention they needed. Invecas has been working with GF ever since. At CDNLive India, Surya Narayana Varma Uppalapati presented their work. He started with an overview of FD-SOI. I won't repeat that here. If you want a basic background, I covered it in my post Cadence Tool Suite Qualified for 22FDX Reference Flow .
From a design point of view, the big difference between FD-SOI and other processes (planar or FinFET) is that there is a backside gate under the channel. This back gate cannot be used to turn the transistor on and off, but it can be used to increase the performance of the transistor (forward body bias) at the cost of leakage, or to reduce leakage (reverse body bias) at the cost of performance. The back-gate voltage can be set statically, it can be adjusted when the chip is first powered up to adapt to the actual corner at which the silicon was manufactured, or it can be varied dynamically (but not fast) in the FD-SOI equivalent of dynamic voltage and frequency scaling (DVFS).
In the years since, Invecas has developed a wide portfolio of 22FDX standard cell libraries, with varying numbers of tracks and supply voltages. The above table summarizes the markets and the most appropriate foundation IP to consider for each. One thing that Invecas emphasized is that in FDX you don't just think of PVT as a process corner, you think PVTB, where the B stands for bias. Characterization is done at all the corners shown in the above table.
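To make the forward/reverse body bias trade-off a little more concrete, here is a small back-of-the-envelope sketch using textbook device models (a linear back-gate coupling factor, exponential subthreshold leakage, and an alpha-power-law drive model). Every numeric value is an illustrative assumption, not 22FDX or Invecas characterization data; real numbers come from the PVTB-characterized libraries described above.

```python
# Back-of-the-envelope model of FD-SOI body bias. Illustrative assumptions only,
# not 22FDX characterization data: back-gate coupling ~80 mV/V, subthreshold
# swing ~85 mV/decade, alpha-power-law drive model with alpha ~1.3.
VDD = 0.8        # supply voltage (V), assumed
VTH0 = 0.30      # zero-bias threshold voltage (V), assumed
K_BACK = 0.080   # Vth shift per volt of body bias (V/V), assumed
SS = 0.085       # subthreshold swing (V/decade), assumed
ALPHA = 1.3      # alpha-power-law exponent, assumed

def vth(v_bb):
    """Threshold voltage under body bias; forward bias (v_bb > 0) lowers Vth."""
    return VTH0 - K_BACK * v_bb

def relative_leakage(v_bb):
    """Subthreshold leakage relative to zero bias (exponential in the Vth shift)."""
    return 10 ** ((VTH0 - vth(v_bb)) / SS)

def relative_drive(v_bb):
    """Drive current (roughly speed) relative to zero bias, alpha-power-law model."""
    return ((VDD - vth(v_bb)) / (VDD - VTH0)) ** ALPHA

for v_bb in (-0.5, 0.0, 0.5, 1.0):   # reverse bias, zero bias, two forward-bias points
    print(f"Vbb = {v_bb:+.1f} V  Vth = {vth(v_bb) * 1000:.0f} mV  "
          f"drive x{relative_drive(v_bb):.2f}  leakage x{relative_leakage(v_bb):.2f}")
```

Directionally, forward bias buys drive strength at an exponential cost in leakage, and reverse bias does the opposite, which is why the libraries are characterized across the full set of bias corners rather than at a single point.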
Note that whatever bias you intend to use in your design, the zero-bias values are important since the design needs to be functional at power-on and reset, before there is time for any bias to be applied. Otherwise, there is a risk that the circuit would work fine once bias is established, but it can never get there.
Bias Control
There are a number of practical aspects to using body bias (BB). First, there need to be special TAP cells to allow the bias voltage to get down to the back gates (wells). These cells are taken care of by the Cadence Innovus Implementation System. The voltages involved in biasing can be higher than normal signals, so special high-voltage spacing rules are required, which Innovus honors. As with clocks and power supplies, the bias network needs to be planned as part of the overall design and floorplan, and which parts of the metal stack can be used depends on resistance and capacitance requirements. The bias voltages are generated by special cells called body-bias generators, BBGENP (for p-wells) and BBGENN (for n-wells). These are controlled in turn by a body-bias controller, which monitors the difference between unbiased and biased performance and, depending on control registers, drives the BBGEN cells to actually generate the voltages. The number of BBGENs required depends on the active area (actual standard cell area times utilization) and the biased memory areas. Due to interconnect resistance, the BBGENs are best distributed throughout the chip. In addition, it is good practice to have power switches for the biasing to allow support for external (off-chip) biasing, both as a backup option and for post-silicon design analysis. The same pads can also be used to monitor biasing when it is controlled on-chip.
Summary
Invecas has a broad portfolio of foundation and other IP available. I would be remiss not to point out that Cadence also has a range of more specialized IP available in FD-SOI processes (28nm, 22nm, and 12nm), although Cadence does not supply standard cell libraries. As to the EDA tool side, the single-sentence summary is that Innovus (and other tools) fully support everything that is required to do FD-SOI designs, including well taps, high-voltage rules, and more. Sign up for Sunday Brunch, the weekly Breakfast Bytes email.

Virtuoso - The Next Overture: Introducing Simulation Driven Routing

The new release of the Virtuoso® platform (ICADVM18.1) offers groundbreaking analysis capabilities and an innovational new simulation-driven layout for more robust and efficient design implementation as well as extending our support for the most advanced process technologies. With this solution, we are able to significantly improve productivity through advanced methodologies and provide the most comprehensive set of solutions in the industry with an interoperable flow across chip, package, module and board.
Simulation Driven Routing (SDR) is another buzzword you have probably been hearing for a while. Our interactive routing research and development team has been busy developing a feature that elevates Virtuoso® from an electrically aware environment to a simulation-driven environment. It is a significant step towards "correct-by-construction" routing. Virtuoso® Layout Suite EAD already improved the design cycle by 30%. SDR is going to provide another improvement on top of this one. The new feature has been well received by our early access partners.
About Simulation Driven Routing
Beyond all the cool stuff that Electrically Aware Design (EAD) offers, SDR addresses many of the electromigration (EM) and parasitic challenges of critical circuits and advanced-node designs. Featuring a unique in-design solution, interactive SDR gives a layout designer a predictable flow to meet current-density constraints, which in turn significantly reduces sign-off time and improves productivity and design reliability. SDR decreases the number of iterations and improves layout productivity by up to 50%.
The figure here illustrates the various design flows. The first block on the left illustrates the regular design flow without EAD, where the electrical impact is unknown until the layout is completed. The middle block illustrates the electrically aware design flow, in which parasitic extraction and EM checks are performed concurrently during layout implementation. The last block illustrates the simulation-driven design flow. With SDR, we use simulation-derived electrical information to drive the layout implementation. This provides a way to take EM constraints into account during interactive routing and be EM compliant. SDR, working in parallel with in-design EAD, provides both a correct-by-construction approach while working on a design and accurate extraction for checking and sign-off. Introducing simulation-derived information even earlier in the design cycle further minimizes the loop between layout and sign-off, elevating Virtuoso from an electrically aware environment to a simulation-driven environment. The key features are:
An easy way to visualize the net topology and current distribution per net before routing.
An interactive way to calculate the current into a wire, according to the topology.
Auto-sizing of wires and vias according to the estimated current (a rough sketch of this kind of calculation appears at the end of this post).
An easy and flexible way to connect devices, according to the estimated current, as you route, especially for multi-finger devices.
Related Resources
Simulation Driven Interactive Routing
For more information on Cadence circuit design products and services, visit www.cadence.com .
Contact Us
For more information on the New Virtuoso Design Platform, or if you have any questions or feedback on the features covered in this blog, please contact team_virtuoso@cadence.com .
To receive similar updates about new and exciting capabilities being built into Virtuoso for our upcoming Advanced Nodes and Advanced Methodologies releases, type your email ID in the Subscriptions field at the top of the page and click SUBSCRIBE NOW. Parul Agarwal and Alexandre Soyer (Team Virtuoso)
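As a closing illustration of the wire and via auto-sizing idea mentioned in the feature list above, here is a rough, tool-independent sketch of current-driven sizing. The current limits and minimum width are invented for illustration; they are not values from any PDK, and this is not how the Virtuoso feature is implemented internally. Real designs use the foundry's electromigration rules.

```python
import math

# Illustrative assumptions only (not from any PDK):
I_MAX_MA_PER_UM = 1.0   # allowed current per micron of wire width (mA/um)
I_PER_VIA_MA = 0.5      # allowed current per via cut (mA)
MIN_WIDTH_UM = 0.05     # minimum wire width (um)

def wire_width_um(current_ma):
    """Minimum wire width that keeps the current per unit width under the EM limit."""
    return max(current_ma / I_MAX_MA_PER_UM, MIN_WIDTH_UM)

def via_count(current_ma):
    """Number of via cuts needed so that no single cut exceeds its current limit."""
    return max(1, math.ceil(current_ma / I_PER_VIA_MA))

# Example: branches of a net whose currents were estimated from simulation
for i_ma in (0.2, 1.0, 3.2):
    print(f"I = {i_ma:.1f} mA -> width {wire_width_um(i_ma):.2f} um, {via_count(i_ma)} via(s)")
```

The useful part in SDR is that the per-segment current estimate comes from the simulated net topology rather than a hand calculation, so sizing like this can be applied as you route instead of after sign-off finds a violation.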