Make Sure Your Car Doesn't Break Too Often...When It Does, Make Sure You Catch It

We need our cars to be safe as the amount of electronics in them increases almost exponentially. One aspect of that is that component suppliers need to provide Automotive Safety Integrity Level (ASIL) certification, which includes a formal assessment of the development tools. Obviously it makes no sense for every user of Cadence tools to do that assessment independently; that would duplicate a lot of effort.

Today, Cadence announced an evaluation by TÜV SÜD that satisfies the documentation requirements component suppliers have to meet for their tools and flows. TÜV SÜD, an internationally accredited independent testing and conformity assessment company, completed an evaluation and confirmed the Tool Confidence Level 1 (TCL1) predetermination for our analog/mixed-signal tool chain and our digital front-end design and verification flows. An evaluation of the digital implementation and signoff flow is expected to be completed by the end of the year. I will write about this in more detail in a future post.

For more background on the topic, here is what I learned at a recent presentation by Alessandra Nardi, in which she dug into the details of Cadence's Functional Safety and Reliability Reference Flow for Automotive Applications. For some history, and a sense of just how fast everything is advancing in this domain, see Ten Years Ago Self-Driving Cars Couldn't Go Ten Miles.

Reliability and Functional Safety

The first thing to start with is the difference between reliability and functional safety:

- Reliability: the vehicle does not break too often
- Functional safety (FuSa): if something breaks, the vehicle recovers to a safe situation

In the past, this was not a big issue: there was a limited amount of electronics; it was all built in older, very well-characterized process nodes; and there was always a driver actively driving. None of that is going to remain true. The amount of electronics is increasing fast with the level of autonomous driving.
However, it is not just that there are more and more electronic control units (ECUs). The performance requirements of individual ECUs are increasing dramatically with the need for functionality like vision processing. That performance can only be achieved with advanced nodes, but reliability is more challenging on advanced nodes.

Reliability is measured in FIT (failures in time), the number of failures per billion hours of operation. For an entire vehicle, the budget might be 10 FIT. But a chip built on an advanced node has a minimum of 500 FIT. Since there is more than one critical chip in a vehicle, a single chip may only get 0.1 FIT of the overall 10 FIT budget, so you can see the problem. It is necessary to take active steps to reduce the failure rate; this is not something that can be achieved by tightening process manufacturing windows.

Functional Safety

Functional safety is covered by the ISO 26262 standard, which lays down the principles. It covers both random and systematic errors. How much each matters depends on the Automotive Safety Integrity Level (ASIL), the highest of which is ASIL-D, "likely potential for severely life-threatening or fatal injury."

A systematic error is something like a software bug: if the right (wrong?) circumstances arise, the issue will always be handled incorrectly. Systematic errors are addressed by FuSa management processes (planning, traceability, documentation, specs...) and by checking the software tools used for ISO compliance. The strictness of the processes depends on the ASIL.

Random failures may be permanent (a chip fails and will never recover) or transient (such as high-energy neutron effects or random noise). They are addressed by designing safety mechanisms to correct faults and by verifying the failure rates that still get through. Failure rates and diagnostic requirements depend on the ASIL.

At an abstract level, the FuSa flow is shown above. There is a front-end part of the process where FuSa analysis is done.
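Front-end FuSa analysis involves this kind of failure-rate budgeting. As a rough sketch using the FIT numbers quoted earlier (500 FIT for an advanced-node chip, a 0.1 FIT share of a 10 FIT vehicle budget), the Python below converts a FIT rate into a mean time to failure and estimates what fraction of failures a safety mechanism would have to catch. It assumes a constant failure rate and a deliberately simplified model in which the residual rate is raw FIT × (1 − coverage); a real ISO 26262 analysis also accounts for safe faults, latent faults, and per-ASIL metrics. The function names are hypothetical, not part of any Cadence tool.

```python
"""Back-of-the-envelope FIT budgeting (simplified, illustrative model only)."""

HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365


def mttf_hours(fit: float) -> float:
    """Mean time to failure in hours, assuming a constant failure rate in FIT."""
    if fit <= 0:
        raise ValueError("FIT rate must be positive")
    return HOURS_PER_FIT_UNIT / fit


def required_coverage(raw_fit: float, budget_fit: float) -> float:
    """Fraction of failures a safety mechanism must catch so that the
    uncaught (residual) rate fits within the budget.

    Assumed model (hypothetical simplification):
        residual_fit = raw_fit * (1 - coverage)
    """
    if budget_fit >= raw_fit:
        return 0.0  # already within budget, no extra coverage needed
    return 1.0 - budget_fit / raw_fit


if __name__ == "__main__":
    raw = 500.0    # advanced-node chip, figure from the text
    budget = 0.1   # this chip's share of a 10 FIT vehicle budget
    years = mttf_hours(raw) / HOURS_PER_YEAR
    print(f"MTTF at {raw:g} FIT: about {years:,.0f} years")
    print(f"Required coverage: {required_coverage(raw, budget):.2%}")
```

With these inputs, a 500 FIT part on its own still has an MTTF of roughly 228 years, but squeezing it into a 0.1 FIT share requires catching about 99.98% of its failures under this simplified model.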
Then it is put into practice with design and functional verification. After that, a separate FuSa verification step is done. This is not to verify that the chip works to spec, but rather that faults are correctly handled. Then, during physical implementation, there are steps to ensure that, for example, physical design does not give duplicated logic a single point of failure that affects both copies.

One requirement for many designs is that they can test themselves, at power-up as a minimum, but perhaps also regularly during operation. Several features can help address this:

- IEEE 1500, which allows analog to be isolated from digital so that in-system LBIST can be used
- Test point analysis/insertion to achieve ISO 26262/ASIL coverage goals
- Logic BIST (LBIST)
- Memory BIST (MBIST)
- Advanced fault modeling (exhaustive, bridging faults, cell-aware)

One important tool in the FuSa verification stage is Incisive Functional Safety Simulation. It allows faults (in the integrated-circuit design sense, such as a signal stuck at 1) to be injected, and the consequences and recovery (or not) to be verified. This is a complex process in practice, starting from the safety requirements and working down to safety goals, then to failure modes, and finally to the technical level where the faults can be estimated or verified.

FuSa is not over once the design reaches physical implementation. One approach to FuSa is to add redundancy, but that only helps if the redundant logic is truly independent. So physical design needs to be aware of this and avoid introducing common-cause failures by means of:

- Special placement of some registers, such as voting flops
- Logic isolation (safety islands)
- Power-domain routing with specific safety coloring
- Increasing reliability with, for example, 100% multi-cut via coverage

Reliability

Semiconductor components, like all of us, grow old and change.
Aging and self-heating occur in all components, but automotive has some factors that make them more severe:

- Device lifetime of 15-20 years: transistors age due to hot carrier injection, bias temperature instability, and time-dependent dielectric breakdown
- Electromigration (EM) can lead to shorts and opens over time, and to high resistance that causes voltage loss
- Power-train direct-mount junction temperature can be 175°C, which accelerates aging and EM
- FinFETs, at 16nm and below, have significant self-heating
- Actuators demand significant current and voltage; power ICs have high current, voltage, and noise

Reliability needs its own flow to address all the aging effects, especially in more advanced process nodes. This is one reason that automotive has historically used mature process nodes, where these effects were less severe and where 10 years of characterization data was available. The need for high performance means that advanced processes are a necessity today, despite their inherently worse reliability.

A final picture to finish with. We want high reliability, but we also need to accept that 100% reliability is not possible. So we also need functional safety to make sure that the car always recovers to a safe situation when something fails. Reliability feeds into safety.

Learn more about Cadence's functional safety products.

Also this week, Tesla announced that all its cars are equipped with the hardware needed for full autonomy. They are not yet equipped with the software. Presumably, as with Autopilot, at some point in the future Tesla will update the cars with the new software, and they will suddenly have much more powerful capabilities. Tesla also released a video of a self-driving Tesla driving itself, dropping off the passenger, and then driving off and finding its own parking place.