MemCon, the annual all-things-memory conference originally started by Denali and since continued under the Cadence umbrella, took place at the Santa Clara Convention Center this past Tuesday. The morning started with two keynotes, the first by Hugh Durdan, VP Design IP Marketing at Cadence, titled Diverging Products for Diverging Needs: Where Will the Memory Industry Be in 5 Years? The second, by Stephen S. Pawlowski, VP of Advanced Computing Solutions at Micron, was titled The New Focus on Memory: How Market Forces Are Driving Toward New Memory-Centric Architectures. I will cover the keynotes in a future post on Breakfast Bytes.

Adding SCM

One theme that ran through several of the sessions I attended in the afternoon was Storage Class Memory, or SCM. This is best explained by the diagrams below from Jim Handy's presentation "A Time of Change for Memories." On the left is the old hierarchy that has lasted for decades: at the left of that diagram is the processor, which may or may not have an on-chip cache; the main "memory" is DRAM, and the main "storage" is a hard disk drive. Toward the left, memory is faster and more expensive; toward the right, it is slower but cheaper.

The first change is that NAND flash has become cheap enough that SSDs have replaced (in laptops) or complement (in servers) the rotating-media HDD. That fills a big gap, since the difference in latency between accessing DRAM and accessing a hard disk is 100,000 or even 1,000,000 to 1.

The new development is SCM, which is expected to fit into the memory hierarchy between DRAM and flash/rotating storage media. The best-known SCM technology is Intel/Micron's 3DXpoint, but delays in bringing it to market are opening up a window for competing technologies.

I said "expected" above, since there is a chicken-and-egg problem. SCM only fits into the memory hierarchy if it is cheap enough, and it will only become cheap enough if it ships in volume. In the context of memory technologies, that means a dedicated fab producing a single product, running at full capacity. That is something like a $6B investment, and it is going to take time to get there. Jim's prediction is that for 3DXpoint, Intel will sell it at a loss and offset the loss with the profit on the associated processors. As volume goes up, the costs will come down until it is profitable, a sort of "fake it till you make it" pricing strategy.

SCM and flash are not just another layer in the memory hierarchy; they also reduce the need for DRAM. In fact, DRAM upgrades are rarely the best way to spend dollars to improve memory performance. It can be much better to add some (or more) flash, and when SCM becomes mainstream, that will be very attractive too. It is a well-known saying among IT professionals who configure server farms that "adding an SSD reduces DRAM requirements."

The diagram above shows some tests with various amounts of DRAM and SSD. The red triangle in the bottom left shows that with only 1GB of DRAM, performance suffers, but once you get to 2GB of DRAM there isn't a lot of change across the plot. Adding an SSD, however, makes a huge difference.

Jim's prediction is that DRAM upgrades will become a thing of the past, since you lose a lot of performance by having upgradable DIMMs rather than "soldering the memory down." In fact, Jim thinks that High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) will become standard, despite the high price.
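All of these predictions ultimately rest on the latency ratios between the levels of the hierarchy. As a rough illustration (the figures below are my own assumed order-of-magnitude numbers, not ones from Jim's slides), here is a quick sketch:

```python
# Assumed order-of-magnitude latencies for each level of the hierarchy.
# These are illustrative values, not figures from the MemCon presentations.
latency_ns = {
    "DRAM": 100,                       # ~100 ns
    "SCM (3DXpoint-class)": 1_000,     # ~1 us (assumed)
    "NAND flash SSD": 100_000,         # ~100 us
    "hard disk": 10_000_000,           # ~10 ms
}

dram = latency_ns["DRAM"]
for level, ns in latency_ns.items():
    # Show each level's latency and how many times slower than DRAM it is.
    print(f"{level:<22} {ns:>12,} ns   {ns / dram:>10,.1f}x DRAM")
```

With these assumed numbers, SCM comes in roughly ten times slower than DRAM but a hundred times faster than a NAND SSD, which is exactly the gap in the hierarchy that the argument above says is worth filling.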
Having a small amount of very high-performance DRAM, backed by SCM and SSD, will be an optimal way to spend your dollars. Since there is only a small amount of DRAM, the cost adder is less important than the performance increase. And it is a real cost increase: Jim's cost model is that adding through-silicon vias (TSVs) to a memory wafer adds about $500 to the roughly $1,500 wafer cost, so about a 30% increase, plus the logic chip (for HMC) and assembly. However, HBM is inherently not upgradable (it is in the package), and HMC is point-to-point, so not upgradable either. So systems will have one block of non-upgradable DRAM, and if more memory is required it will come as SCM or flash.

The other big positive about SCM, besides the lower cost, is that it is non-volatile. Today, software assumes that all contents of DRAM are lost during a power failure (because they are) and that the contents of SSD/HDD are not. If SCM is just treated as DRAM by the operating system, then the non-volatility is not taken advantage of. However, there is increasing demand for in-memory databases, not just keeping the indexes in memory and loading the data from disk on demand, and that only works if the data is not lost when the power goes out. Currently this is done using NVDIMMs, which are a mixture of DRAM and flash, with the DRAM backed up to flash on a power failure.

With this architecture, as in the first diagram I showed, data in SCM is only copied out to SSD when space is needed, in a similar way to how DRAM is paged by the operating system today. It is a requirement that the intermediate SCM is truly a separate level in the memory hierarchy: an SCM-based SSD can only be 6-7 times as fast as a NAND-flash-based SSD, no matter what the underlying memory technology, because the PCIe interface, controller, and software create a barrier to further performance increases. In fact, Jim thinks that the three levels will become so explicit that the processor will be aware of them and the controllers will move inside the processor package, with separate pins for DRAM, for SCM, and perhaps for SSD if it doesn't just stay on the peripheral bus. Everything will be under the control of the software, not a hidden storage hierarchy.
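To make the non-volatility point concrete: software only benefits if it knows the memory is persistent and makes updates durable explicitly, rather than treating SCM as ordinary DRAM. Below is a minimal sketch of that idea, using an ordinary memory-mapped file as a stand-in for a byte-addressable SCM region; the file name and the mapping setup are illustrative assumptions, not anything shown at MemCon.

```python
import mmap
import os

# "scm_region.dat" stands in for a byte-addressable SCM region exposed by the
# OS for direct mapping (e.g. a file on a DAX-mounted filesystem). The path
# and setup are assumptions for illustration only.
PATH = "scm_region.dat"
SIZE = 4096

fd = os.open(PATH, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)

with mmap.mmap(fd, SIZE) as region:
    # The application writes into the mapped region as if it were DRAM...
    region[0:16] = b"committed-record"
    # ...but flushes explicitly so the update is durable across a power
    # failure. An OS that treats SCM as plain DRAM never lets software
    # rely on that property.
    region.flush()

os.close(fd)
```

Real persistent-memory programming adds cache-line flush instructions and ordering guarantees on top of this, but the point is the same: the non-volatility only pays off when the software knows it is there.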
So the summary slide, with the history of how the memory hierarchy has evolved, and how Jim thinks it is going to evolve in the future, looks like this:

IBM: Emerging Memory, In-System Enablement

That perspective on SCM all "looks good on PowerPoint," but for building memory hierarchies for IBM servers, they want real numbers. That means putting a candidate memory (such as MRAM, RRAM, or PCM) into a server and seeing how the system performs with real workloads. It is too expensive to architect and build an entire new server around each potential memory technology, and the existing ASIC-based memory buffers do not have enough flexibility for this. The performance of the memory subsystem is high, so any memory buffer needs to have high performance too, or the measurements will not be representative of a real server using that technology.

Edgar Cordero of IBM talked about how they solved this conundrum. They needed to keep as much of the functionality of the existing memory buffers as possible, but gain the flexibility to meet the requirements of new memory technologies. The answer was an at-speed, full-function FPGA card on IBM's DMI bus, called ConTutto ("con tutto" is Italian for "with everything").

This card can be inserted in place of one of the eight memory buffers in the chassis of an IBM server (it is actually a much larger board, so it sticks out and the covers cannot be replaced completely). In fact, the server can operate with any combination of CDIMMs (normal DRAM boards) and ConTutto boards, not just one.

Above is the board; Edgar had one there, so this is an actual photo I took. It has two DDR3 DIMM connectors for whatever memory is being evaluated. The next generation of ConTutto will have DDR4, and hopefully a smaller board so that it fits properly into the box. ConTutto has a top-of-the-line Intel/Altera FPGA, which is the large black square (actually the FPGA is underneath; that is the heatsink). In the Q&A, someone asked about using such an expensive FPGA, and Edgar's response was "when you are only building a dozen boards, who cares what they cost? But you need the speed."

Below is a photo of the ConTutto board plugged into an IBM Power server.

Part of the reason to be at MemCon was to see whether there was interest from memory suppliers. Obviously ConTutto is of interest to IBM for assessing potential technologies, but it might well be of interest to memory engineering groups that want to assess technologies they are working on under real conditions. There are currently no plans to productize ConTutto, but that could change depending on the business case.

Here is a video of a (different) presentation about ConTutto from the OpenPOWER Foundation:

Previous: Cache Coherency Is the New Normal