System-on-chip (SoC) designers are always optimizing what has become known as PPA, which stands for power, performance, and area. Almost always, the most severe constraint is on power. We can put a lot of functionality on a chip and we can clock it very fast, except that power limitations stop us from doing so: it is impossible to use all the transistors on the chip at the same time, and impossible to clock the design as fast as we would like. For many chips, the limitation is thermal, but an increasing number of systems are battery powered, and battery life is a key design parameter.

System Design Enablement (SDE) is a design approach in which optimization is done concurrently at all levels of the design, from software down to the silicon. There are certainly power gains to be made at the lower levels, such as power-aware synthesis, but they are typically reductions of at most 10%. The big gains come from optimizing at the architectural level, where they can be as big as 60%.

Furthermore, it is not possible to ignore the software level. The power dissipated by a CPU depends on what the software is doing (especially the cache-hit/miss ratio). Beyond that, the software typically controls the coarse-grained power management, such as powering an audio processor up or down depending on whether any audio activity is taking place.

TSMC has been working on a methodology, called System-PPA, to make it possible to do early power analysis and optimization at the system level or IP block level. They have been working with Cadence and Arteris on an early trial implementation focused on Tensilica processors, along with standard interface IP such as LPDDR.

At the system level, design is largely about assembling pre-existing IP blocks. But from a power point of view, these IP blocks may have many different modes of operation with very different power characteristics.
For example, the diagram below shows a simplified view of the major states of a Tensilica DSP. The power dissipated varies enormously between when the DSP is processing data and when it is clock gated or power gated.

The methodology assumes existing C (or SystemC) models for IP blocks. Users then create the power-state APIs for each IP block and incorporate them into a provided TLM 2.0 wrapper template. Then, using Voltus, power characterization is done to create power data lookup tables at specified PVT (process, voltage, temperature) conditions. These tables can then be used to do power analysis and optimization at the system level.

As part of this effort, TSMC has developed a baseline virtual platform (BVP) where IP vendors and system houses can plug in these system-level power models and perform the power analysis and optimization using TSMC's Virtual Platform Analyzer (VPA). The goals are to be able to do early system-level power and architectural exploration, especially hardware/software partitioning. The platform can then be used for bringing up bare-metal software and device drivers, and for booting operating systems such as Linux, Android, or a real-time operating system (RTOS). The approach is compatible with power policies captured in CPF or IEEE 1801 (UPF).

It is essential that the power and architectural analysis and optimization be done at the early system level for the biggest gains. This means running a virtual platform that incorporates software, IP, and power estimation, both to get a realistic estimate of power before embarking on detailed design and to make tradeoffs. The Tensilica processor family, for example, allows a lot of customization, such as adding instructions and interfaces, which has a major impact on all three of PPA but especially power. Modeling at the system level can get surprisingly good accuracy for major IP blocks such as CPUs, DSPs, NoCs, and memory interfaces.
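To make the power-state and lookup-table idea described above a little more concrete, here is a minimal sketch in Python. It is not the System-PPA API, the TLM 2.0 wrapper template, or the Voltus table format, none of which are detailed here. The class name `PowerModel`, the helper `replay`, the corner names, and every number in the table are hypothetical and chosen only for illustration. The sketch assumes each IP block reports its state changes with a timestamp, and that a characterized table gives one average power figure per state at each PVT corner.

```python
from dataclasses import dataclass

# Hypothetical lookup table: average power in milliwatts for each power state,
# at each PVT corner. The values are invented for illustration; real tables
# would come from characterizing the actual IP block.
POWER_TABLE_MW = {
    "typical": {"active": 100.0, "clock_gated": 5.0, "power_gated": 0.5},
    "worst_case": {"active": 130.0, "clock_gated": 8.0, "power_gated": 1.0},
}


@dataclass
class PowerModel:
    """Accumulates the energy an IP block uses as it moves between states."""

    corner: str
    state: str = "power_gated"   # assume the block starts powered down at t = 0
    last_change_us: float = 0.0
    energy_nj: float = 0.0

    def _accumulate(self, now_us: float) -> None:
        """Charge the time since the last update to the current state."""
        elapsed_us = now_us - self.last_change_us
        if elapsed_us < 0:
            raise ValueError("timestamps must not go backwards")
        # mW * us = nJ, so no unit conversion factor is needed here.
        self.energy_nj += POWER_TABLE_MW[self.corner][self.state] * elapsed_us
        self.last_change_us = now_us

    def set_state(self, new_state: str, now_us: float) -> None:
        """Power-state call a block model would make on each transition."""
        if new_state not in POWER_TABLE_MW[self.corner]:
            raise ValueError(f"unknown power state: {new_state}")
        self._accumulate(now_us)
        self.state = new_state

    def average_power_mw(self, now_us: float) -> float:
        """Average power from t = 0 up to now_us (nJ / us = mW)."""
        self._accumulate(now_us)
        return self.energy_nj / now_us if now_us > 0 else 0.0


def replay(corner: str, transitions, end_us: float) -> float:
    """Replay (time_us, state) transitions and return average power in mW."""
    model = PowerModel(corner)
    for time_us, state in transitions:
        model.set_state(state, time_us)
    return model.average_power_mw(end_us)
```

Calling `replay("typical", [(100.0, "active"), (300.0, "clock_gated")], 400.0)` replays a block that is power gated for 100 µs, active for 200 µs, and clock gated for the final 100 µs. Running the same trace against the `"worst_case"` table shows how one software scenario can be evaluated at several corners without touching the functional model.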
