SANTA CLARA, Calif.—The promise of flash memory storage systems—despite robust market success in recent years—has yet to be realized. And the unique technical challenges and changing market requirements in that segment require a new approach to storage processing architectures.
That was the word from Chris Rowen, CTO of the Cadence IP Group, who delivered a keynote address at the Flash Memory Summit here Thursday (Aug. 7).
Rowen described a unique design opportunity for flash storage systems at a time of increasing processor specialization: Over the past two decades, the need to offload specialty functions from the microprocessor has flowered to include graphics, audio, video, network processing, and more. It's now time to explore a fresh approach to storage processor architectures, he argued.
"There are enough unique characteristics about the kinds of computation needed in a flash-management system that justifies new structures, new instruction sets, new storage models, new ways for the processors to interact with these very high bandwidth interfaces," he said, citing PCIe, NVMe, SATA, DDR controllers, and the like.
Clearly, designers have certain subsystem needs, including the storage processor, interface controllers and PHYs, and software. They also require:
- Flexibility to support various algorithms
- Scalability in areas such as capacity, bandwidth, and transaction rate
- System robustness
- Data integrity
- Reasonable silicon cost
- Fast time to market
The architect also demands faster command queue processing, more memory bandwidth, and reduced latencies, particularly DDR latencies, he added.

The architect needs "to be able to take a tool that says 'I want to be able to select or describe all of the key features of my processor'...and from that, in minutes, generate the complete hardware design, RTL, test environment, EDA scripts for physical implementation, all without manual intervention," Rowen told the audience.
"This has been the key to the wide proliferation of these data plane processors because it becomes dramatically cheaper to make a data plane processor and to make a highly tuned processor for these environments," Rowen said.
Rowen, who studied at Stanford, worked on the RISC architecture, and helped found both MIPS and Tensilica (the latter of which Cadence bought in early 2013), showed the audience a reference platform development. As part of that effort, Cadence has created a storage processor instruction set based on the Xtensa processor family and implemented in Tensilica Instruction Extension (TIE) code. This storage processing unit (SPU) includes:
- 64-bit register file
- Nonblocking 64-bit input and output queues
- Local cache tags to do rapid table lookups
- Packing loads and packing stores
- 64-bit load/store including address update load/store
Rowen noted that some of the key tasks on storage data structures (creating table hashes, doing lookups in linked lists, inserting and deleting elements from linked lists, and parsing command and packet structures) are slow, often painfully sequential tasks that generally defy significant architectural speedup on CPUs.
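To see why such tasks resist speedup on a conventional CPU, consider a minimal C sketch of a chained hash table that maps logical block addresses to flash pages. It is an illustration only, not Cadence's code: the names `map_entry`, `map_table`, `bucket_of`, `map_insert`, `map_lookup`, and `map_remove`, the table size, and the hash constant are all hypothetical. Each step of a chain walk has to load the next pointer before it can do anything else, which is the kind of sequential dependency Rowen described.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BUCKETS 8u /* hypothetical table size; must be a power of two */

/* Hypothetical mapping entry: logical block address -> physical flash page. */
struct map_entry {
    uint64_t lba;
    uint64_t phys_page;
    struct map_entry *next; /* next entry in the same bucket's chain */
};

struct map_table {
    struct map_entry *buckets[NUM_BUCKETS];
};

/* Simple multiplicative hash; the constant is illustrative only. */
static unsigned bucket_of(uint64_t lba)
{
    return (unsigned)((lba * 0x9E3779B97F4A7C15ULL) >> 32) & (NUM_BUCKETS - 1u);
}

/* Insert at the head of the bucket's chain. */
void map_insert(struct map_table *t, struct map_entry *e)
{
    unsigned b = bucket_of(e->lba);
    e->next = t->buckets[b];
    t->buckets[b] = e;
}

/* Walk the chain: each iteration depends on the load of e->next. */
struct map_entry *map_lookup(const struct map_table *t, uint64_t lba)
{
    struct map_entry *e;
    for (e = t->buckets[bucket_of(lba)]; e != NULL; e = e->next) {
        if (e->lba == lba)
            return e;
    }
    return NULL;
}

/* Unlink and return the matching entry, or NULL if it is absent. */
struct map_entry *map_remove(struct map_table *t, uint64_t lba)
{
    struct map_entry **link = &t->buckets[bucket_of(lba)];
    while (*link != NULL) {
        struct map_entry *e = *link;
        if (e->lba == lba) {
            *link = e->next;
            e->next = NULL;
            return e;
        }
        link = &e->next;
    }
    return NULL;
}
```

Because the address of each load comes from the previous load, wider issue or deeper pipelines on a general-purpose core do little to shorten the chain walk.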
"This (Xtensa) architecture routinely shows a 3X to 4X performance advantage over a good RISC processor for these kinds of structures," Rowen said. Implementing the processor's core logic takes between just 0.1 and 0.2 mm2 of silicon, Rowen added.
He closed by saying:
"We really are just at the beginning of a kind of revolution of smart, cost-effective, but highly scalable flash storage systems. [This approach to storage processing architectures] unleashes the creative potential of the architect and unleashes more of the bandwidth and transaction rate potential of the flash devices."
Brian Fuller