RTL synthesis is not a simple pushbutton tool, especially at 28nm and below. In a recorded presentation at the Cadence web site, Ramesh Rajagopalan, chip lead for physical implementation of networking SoCs at Cisco Systems, shares some of his company's strategies for generating high-quality, predictable netlists from RTL synthesis.
Rajagopalan was a speaker at the Cadence DAC Theater at the Design Automation Conference (DAC 2014) in June, where over 40 speakers, mostly customers and partners, offered informal half-hour presentations. Audio recordings and slides are now available for most of those presentations, including Rajagopalan's, at the Cadence DAC microsite.
The mere fact that Rajagopalan gave this presentation shows how RTL synthesis has become more and more entwined with physical implementation. "I am basically a physical design engineer, so the very fact that there are physical designers talking about synthesis strategies shows how far physical constraints have moved up in the design space," he said.
Getting predictable
Rajagopalan works in the ASIC Implementation Team at the Cisco Data Center Group, which is responsible for networking chips. These ASICs use 28nm and below process nodes, contain up to 160M gates with up to 82 sub-chips, and run from 750MHz to 1GHz. Silicon robustness, low power, and time to market are key concerns. "The main thing we are focused on is bringing some predictability to physical implementation," he said.
Rajagopalan noted that a netlist that results in routing congestion will impact routing predictability. If area or gate utilization increases between synthesis and post-route optimization, area won't be predictable. "We need a quality netlist that can converge in physical design," he said. There are two figures of merit:
- The physical implementation must be congestion-free and routable
- Timing must converge with the lowest possible cost to area and power
To achieve these goals, Rajagopalan outlined some of Cisco's success strategies, as noted below.
Avoiding routing congestion
RTL code that has a lot of case statements is likely to result in routing congestion. The only fix is to go back to the RTL and look at how the netlist is structured.
Cisco uses the Cadence RTL Compiler Physical synthesis tool, which has a Physical Aware Structuring (PAS) feature that helps reduce congestion. In his presentation, Rajagopalan showed how PAS targets logic restructuring for congestion, and optimizes high-congestion structures such as crossbars, barrel shifters, and memory-connected multiplexer chains. PAS performs a mux selection for mapping and can restructure a large mux into a set of cascaded muxes.
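To make the cascaded-mux idea concrete, here is a minimal Python sketch of the general technique. It is not PAS itself and not Cadence code; the function names and the default fan-in of 4 are assumptions chosen for illustration. It shows that one wide mux can be rebuilt as stages of small muxes, each stage consuming a few select bits, with identical behavior.

```python
def flat_mux(inputs, select):
    """Flat N:1 mux: return the input chosen by the integer select."""
    return inputs[select]


def cascaded_mux(inputs, select, fan_in=4):
    """Same function as flat_mux, built from stages of small fan_in:1 muxes.

    Illustrative only: fan_in=4 is an assumed value, not a tool default.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not 0 <= select < len(inputs):
        raise IndexError("select out of range")

    stage = list(inputs)
    while len(stage) > 1:
        digit = select % fan_in   # select bits consumed by this stage
        select //= fan_in         # remaining bits drive the later stages
        groups = [stage[i:i + fan_in] for i in range(0, len(stage), fan_in)]
        # Every small mux in this stage shares the same select digit. A short
        # final group clamps its index; its output is never chosen downstream
        # when that happens.
        stage = [flat_mux(g, min(digit, len(g) - 1)) for g in groups]
    return stage[0]
```

In hardware terms, each small mux needs only local wiring to its own group of inputs, whereas the flat version pulls every input toward a single point, which is the kind of structure that tends to concentrate routing demand.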
Getting timing to converge
"You need a placement strategy to improve the timing convergence," Rajagopalan said. What's needed is better control logic optimization with resource sharing and operator merging, better datapath structuring and optimization, and more accuracy in the RC parasitic estimates used in synthesis.
Rajagopalan noted that the variation in layer RC is quite high in advanced nodes. This means the synthesis tool must have good RC information from the placed design. RTL Compiler can provide this information because internally it invokes the Cadence Encounter Digital Implementation System (Encounter DIS). Encounter DIS does the placement, gets the global route estimates, and provides the necessary information so that synthesis can annotate layer assignments.
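A rough sketch of why layer assignment matters for timing estimates: the Elmore delay of a distributed RC wire grows as 0.5 * r * c * L^2, so the same net can look fast or slow depending on which layer it is assumed to route on. The layer names and per-micron values below are hypothetical placeholders, not data from the presentation or from any process; real numbers come from the foundry technology files.

```python
# Hypothetical per-unit-length parasitics, for illustration only.
LAYERS = {
    "thin_lower_metal": {"r_ohm_per_um": 4.0, "c_ff_per_um": 0.20},
    "thick_upper_metal": {"r_ohm_per_um": 0.4, "c_ff_per_um": 0.22},
}


def elmore_wire_delay_ps(length_um, r_ohm_per_um, c_ff_per_um):
    """Elmore delay of a distributed RC line: 0.5 * r * c * L^2, in picoseconds."""
    # ohm * femtofarad = femtosecond
    delay_fs = 0.5 * r_ohm_per_um * c_ff_per_um * length_um ** 2
    return delay_fs / 1000.0


def delay_by_layer(length_um):
    """Estimate the wire delay of one net on each candidate layer."""
    return {
        name: elmore_wire_delay_ps(length_um, **params)
        for name, params in LAYERS.items()
    }
```

Calling `delay_by_layer(1000.0)` with these placeholder values gives estimates that differ by roughly an order of magnitude between the two layers, which suggests how far off a synthesis run could be if it assumed the wrong layer for a long net.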
Earlier optimization is better
The biggest impact on quality of results (QoR), Rajagopalan noted, comes from the earliest phases of the RTL synthesis flow, such as RTL optimization and global mapping. However, this is when RC information is at its least accurate. RTL Compiler Physical, he said, strikes a good balance. "They give the feedback back from the placement engine after the global mapping and optimization stage. That way, whatever customers use to do the global mapping is dependent on the actual floorplan and the actual RC estimates of long nets."
Obtaining better quality datapath optimization
First, Rajagopalan advised, elaborate the design and run generic RTL optimization in RTL Compiler Physical. This optimization performs arithmetic simplifications and resource sharing. Then write out a report, evaluate how many elaborated modules were transformed into carry-save arithmetic, and check whether the datapath area and slack are within acceptable ranges.
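For readers unfamiliar with carry-save arithmetic, the following Python sketch shows the basic idea on plain non-negative integers. It is a generic illustration of the technique, not a model of what RTL Compiler generates, and the function names are made up for this example. A 3:2 compressor reduces three operands to a sum word and a carry word without propagating carries, so only the final addition needs a full carry-propagate adder.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: return (sum_word, carry_word) with a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c                               # bitwise sum, no carries
    carry_word = ((a & b) | (a & c) | (b & c)) << 1    # carries, shifted into place
    return sum_word, carry_word


def sum_operands(operands):
    """Add non-negative integers using carry-save stages and one final add."""
    terms = list(operands)
    while len(terms) > 2:
        # Each stage replaces three terms with two and has no carry chain.
        s, c = carry_save_add(terms.pop(), terms.pop(), terms.pop())
        terms.extend([s, c])
    # Single carry-propagate addition at the very end.
    return sum(terms)
```

Merging a chain of additions this way is why the count of modules converted to carry-save form is a useful thing to look for in the report: each conversion removes intermediate carry chains from the datapath.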
Awareness of long interconnects in RTL Compiler Physical
The Physical Aware Mapping (PAM) feature in RTL Compiler Physical estimates long wires based on cluster placement. The initial map is purely logical with no long wire predictability. When cluster placement is added, the mapping can identify long wires, and automatically time and optimize the paths.
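The idea of spotting long wires from a coarse cluster placement can be sketched with a half-perimeter wirelength (HPWL) estimate. This is only an illustration of the concept; how PAM actually estimates wires is not described in the presentation, and the `Net` class, cluster names, and threshold here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    clusters: list  # names of the placement clusters this net connects


def hpwl_um(net, cluster_xy):
    """Half-perimeter of the bounding box around the net's cluster centers."""
    xs = [cluster_xy[c][0] for c in net.clusters]
    ys = [cluster_xy[c][1] for c in net.clusters]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def find_long_nets(nets, cluster_xy, threshold_um):
    """Return the names of nets whose estimated length exceeds the threshold."""
    return [n.name for n in nets if hpwl_um(n, cluster_xy) > threshold_um]
```

Before any placement exists, every net in this model has unknown length; once cluster coordinates are available, the nets returned by `find_long_nets` are the ones whose paths would deserve wire-aware timing and optimization during mapping.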
Maximizing the synthesis flow
Rajagopalan showed a detailed, high-performance synthesis flow including design elaboration, RTL optimization, Boolean network synthesis, global focus mapping, logic gate synthesis and incremental mapping. These steps are followed by placement. "So what we get out of this is fully physically aware," he concluded. "It knows exactly the floorplan, the physical constraints for the long nets, and the layer assignments." The end result, he said, is a higher quality netlist.
To listen to this presentation and see the slides, go to the Cadence DAC microsite and scroll down to 2:30 pm Wednesday, June 4. No registration is required.
Richard Goering