At CDNLive in Munich, Amjad Qureshi talked about High-Speed DDR and LPDDR Memory Sub-system Challenges. Amjad is Cadence's VP of R&D for memory interface IP, so he and his team have to live with these challenges every day.

Amjad started off with some of the trends. DDR4 ramps are driven by cloud datacenter buildout. It has a 50% bandwidth increase over DDR3 and uses 20% less energy. It is optimized for high capacity, such as 3D stacking and LRDIMM. In 2016 it is expected to be the dominant memory and also to extend from the cloud into other segments. Several of the ARM®-based micro-servers contain Cadence's DDR4 solution. LPDDR4 is being driven by mobile application processors, in turn driven by gaming and media apps. It has twice the bandwidth of LPDDR3 at half the energy. It started to appear in 2015 phones, and several tier-1 mobile suppliers use the Cadence LPDDR4 subsystem. One thing that I didn't know is that mobile memory now exceeds the speed of PC memory (it passed it in volume a long time ago).

So that is the market. What are the design challenges? Well, in the same way that the three most important things in real estate are location, location, and location, the three most important things in modern memory sub-systems are power, power, and power. This is not as trite as it sounds, since there are actually three different types of power to be concerned about:

- Maximum dynamic power under heavy load (needs clock gating and DVFS)
- Leakage when under light load (Vt selection and leakage recovery)
- Shutdown leakage when under standby conditions (power switch, voltage lowering), which, in some applications such as IoT, can be most of the time (a back-of-the-envelope sketch at the end of this section shows why this regime matters so much)

There are other constraints, too, such as high-performance DDR3200 operation, and there are also EM/IR targets that must be met.

One of the biggest issues is leveling, optimizing the communication paths between the PHY and the DRAM itself. The timing constraints are very tight, and each bit has different package pins and different board traces to get to the (off-chip) memory, so per-bit adjustment has become essential. To make things worse, as temperature changes and as devices age, the adjustment needs to vary, too (a toy sketch of such a per-bit training sweep appears at the end of this section).

Advanced process nodes present a new challenge. Providers of IP such as Cadence would like a one-size-fits-all product, so that they only need to design one supercombo for DDR3, DDR3L, DDR4, LPDDR3, LPDDR4, and LPDDR4X. Of course, the users would like a range of I/Os optimized for their different markets (mobile, server, IoT) and memory configurations (POP, memory down, DIMM). The reality is that there are some analog design challenges that rule out a single DDRIO solution optimized for everything. On the other hand, it is obviously economically infeasible to design several dozen interfaces optimized for every single combination.

The analog design challenges come from the fact that the DDR and LPDDR standards require a wide range of I/O supply voltages. The I/O devices themselves have low gain and high thresholds at 1.06V, and speeds of 4266Mbps cannot be achieved by I/O devices alone at the higher voltage, meaning that some of the core needs to run at the higher voltage, which in turn requires complex device topologies. The voltage range also stresses level-shifting between the SoC core voltage and the I/O voltage, since DDR3 is greater than 1:2 up and, at the other end of the scale, LPDDR4X is 2:1 down.
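To make the standby point concrete, here is a minimal back-of-the-envelope sketch in Python of a duty-cycle-weighted energy estimate. All the numbers (powers, duty cycles, the 10µW figure) are invented for illustration and are not from Amjad's talk; the point is simply that if a device spends almost all of its time in standby, the standby leakage can dominate the daily energy budget, which is why power switches and rail lowering matter so much.

```python
# Back-of-the-envelope daily energy split for a duty-cycled IoT-style device.
# Every number here is a hypothetical illustration, not data from the talk.

SECONDS_PER_DAY = 24 * 3600

# Hypothetical duty cycle: active 0.1% of the day, lightly loaded 0.9%, standby 99%
duty = {"active": 0.001, "idle": 0.009, "standby": 0.990}

# Hypothetical power per regime (watts)
power = {
    "active": 100e-3,   # heavy-load dynamic power (clock gating/DVFS attack this)
    "idle":   2e-3,     # light-load leakage (Vt selection, leakage recovery)
    "standby": 200e-6,  # shutdown leakage without aggressive power switching
}

energy = {s: power[s] * duty[s] * SECONDS_PER_DAY for s in duty}
total = sum(energy.values())
for state, joules in energy.items():
    print(f"{state:8s}: {joules:6.2f} J/day ({100 * joules / total:4.1f}%)")

# With power switches and lowered rails cutting standby leakage to, say, 10 uW,
# the standby share drops from roughly two-thirds of the budget to under 10%.
```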
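Coming back to leveling, the general idea behind per-bit training (in far more sophisticated form than this) is to sweep a delay setting for each DQ bit against the strobe, record which settings sample a known pattern correctly, and park the delay in the middle of the passing window, re-running periodically to track temperature and aging. The toy Python sketch below illustrates just the sweep-and-center step; the pass/fail model, tap counts, and per-bit skews are all invented for illustration and say nothing about how the actual Cadence PHY trains.

```python
# Toy model of per-bit read leveling: sweep a delay setting for each DQ bit,
# record which settings sample the data correctly, and centre the delay in the
# passing window. Tap size, window positions, and widths are all hypothetical.

def passes(bit: int, tap: int) -> bool:
    """Pretend pass/fail result for one delay tap on one DQ bit.

    In real training this comes from reading a known pattern back from the
    DRAM; here each bit simply has a different, hypothetical passing window
    to mimic different package pins and board trace lengths.
    """
    window_centre = 40 + 7 * bit      # per-bit skew (hypothetical)
    window_half_width = 18            # roughly half the eye, in taps (hypothetical)
    return abs(tap - window_centre) <= window_half_width

def train_bit(bit: int, num_taps: int = 128) -> int:
    """Return the centre tap of the widest contiguous passing window."""
    best_start, best_len, run_start, run_len = 0, 0, None, 0
    for tap in range(num_taps):
        if passes(bit, tap):
            run_start = tap if run_len == 0 else run_start
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start + best_len // 2

for bit in range(8):
    print(f"DQ{bit}: delay tap {train_bit(bit)}")
```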
Another analog design challenge is coping with the fact that different memory configurations (POP, memory down, and DIMMs) all have different needs for drive and impedance. For example, DDR3/4 DIMM applications, especially multi-DIMM, need high driver linearity and strength to handle board loss, whereas POP and memory-down LPDDR applications have short channels and don't need equalization.

The different standards make life difficult for the designer in many ways. One is that the different DDR standards are terminated differently:

- DDR4 is terminated to VDDQ (I/O voltage, high termination)
- LPDDR4 is terminated to VSS (low termination)
- DDR3 is terminated to VDDQ/2 (center termination)

All three of these configurations must be supported, which considerably restricts the output drive topologies. On the input side, the receivers must support rail-to-rail inputs and a variety of input signal common-mode voltages.

A further challenge is the shrinking read window. Even those of us who are not memory designers have seen those cute "eye diagrams." Well, the eyes are getting smaller. DDR3-1600 had 375ps, but for LPDDR4-4266 it is down to 140.6ps. A lot of the time of each bit is taken up in tolerance for static timing analysis, jitter, crosstalk, the channel model, and so on. As the data rates go up, these take a bigger and bigger piece of the pie, leaving a smaller and smaller slice for the controller (a rough worked example of this budget appears at the end of this post).

In summary, the big issues:

- Low power, especially leakage
- Multi-protocol I/Os with a wide voltage range
- The need for a system-level view, with a scalable architecture and training algorithms

TL;DR: don't try this at home, get your DDR IP from Cadence. Details here.
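To put the shrinking read window into numbers: a bit time (unit interval) is 625ps at 1600MT/s and about 234ps at 4266MT/s, so the 375ps and 140.6ps windows quoted above are both roughly 60% of the unit interval. The sketch below just runs that arithmetic and then subtracts a purely hypothetical, fixed uncertainty budget; real budget items do shrink with data rate, but more slowly than the bit time, which is exactly why the controller's slice gets so thin at LPDDR4 speeds.

```python
# Rough timing-budget illustration for the shrinking read window.
# The unit intervals follow from the data rates; every budget entry below
# is a hypothetical placeholder, not a number from the talk.

def unit_interval_ps(data_rate_mtps: float) -> float:
    """Bit time in picoseconds for a given data rate in MT/s."""
    return 1e6 / data_rate_mtps

# Hypothetical per-bit uncertainty budget (ps), held fixed across rates
# purely to make the trend visible.
budget_ps = {
    "STA / setup-hold margin": 40.0,
    "clock and DQS jitter":    35.0,
    "crosstalk":               25.0,
    "channel / package skew":  30.0,
}

for name, rate in (("DDR3-1600", 1600), ("LPDDR4-4266", 4266)):
    ui = unit_interval_ps(rate)
    window = ui * 0.6                 # the quoted windows are ~60% of the UI
    left_for_controller = window - sum(budget_ps.values())
    print(f"{name}: UI = {ui:.1f} ps, window = {window:.1f} ps, "
          f"left after (hypothetical) budget = {left_for_controller:.1f} ps")
```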
Previous: EDPS Cyber Security Workshop: "Anything Beats Attacking the Crypto Directly"