The two obvious ways to implement a complex algorithm are to write a large amount of RTL and synthesize a specialized hardware block, or to write a large amount of C and run it on a microprocessor. The RTL approach has the potential to produce an optimal implementation, but it suffers from two major disadvantages. The first is that generating all the RTL is both expensive and time-consuming, and risks pushing out delivery of the overall system. As General Patton said, "A good plan, violently executed now, is better than a perfect plan next week." In the same way, a perfect block delivered too late for the product is not the best approach. But a worse problem is that the specification of the system is almost certain to change. When a system involves complex algorithms, it also involves unstable algorithms: vision processing, LTE modems, speech recognition, and so on. This means that the RTL doesn't just need to be written once, it needs to be written and re-written, which is even more time-consuming. And if the algorithm changes after the product ships, forget field upgradability. So there is a need for flexibility.

What could be more flexible than software? You write the algorithm in C or C++ (it is probably already described that way anyhow) and run it on an embedded microprocessor. ARM has some good ones at all sorts of power/performance points. But they almost certainly don't have the point that you need: the peak performance will be too low, or the power dissipation will be too high. Or both. Flexibility comes at too high a price.

So if one bowl of porridge is too hot and the other is too cold, what is the solution that is "just right"? The answer is to use a specialized processor optimized for the application domain, be it vision, radio, audio, or whatever. The programmability gives the flexibility that is required, and the domain-optimized architecture hits the power and performance points. The algorithms will change, but not so much that they enter a new domain: facial recognition doesn't suddenly transmogrify into speech recognition.

This was the Tensilica approach for many years, long before they became part of Cadence. The Xtensa environment has a lot of flexibility, too much for many designs. Instead, for many markets, Tensilica delivered specialized cores for different applications: the HiFi audio/voice DSP, optimized processors for LTE wireless baseband, and the IVP image and vision processor. These are optimized for their particular domain, but retain the extensibility of Xtensa: their capabilities can be extended through TIE, the Tensilica Instruction Extension language.

Most recently, Cadence announced a new core taking this "just right" approach to vision processing, the catchily named Cadence Tensilica Vision P5 digital signal processor. The Vision P5 DSP is built from the ground up for applications requiring ultra-high memory and operation parallelism to support complex vision processing at high resolution and high frame rates. It is ideal for off-loading vision and imaging functions from the main CPU (probably that ARM) to increase throughput and reduce power. Applications that can benefit from the Vision P5 DSP's capabilities include image and video enhancement, stereo and 3D imaging, depth map processing, robotic vision, face detection and authentication, augmented reality, object tracking, object avoidance, and advanced noise reduction.

Vision is becoming increasingly important.
For example, the latest upgrade to Tesla's Model S is now in beta testing with lane-following, traffic-aware cruise control (basically, following the car in front), and automatic emergency braking (not running into the car in front, even if you are driving the car manually). To make this possible, all Model S cars have shipped for nearly a year with a forward-facing camera, a forward-facing radar, and other sensors. So there is plenty of signal processing to be done, especially vision processing. This is not yet a fully autonomous vehicle, but it is more than is usually meant by advanced driver assistance systems (ADAS). On the highway, the car should largely be able to drive itself a lot of the time, but it clearly doesn't yet have enough sensors to drive on surface streets with pedestrians and complex junctions.

The Vision P5 DSP improves the ease of software development and porting, with comprehensive support for integer, fixed-point, and floating-point data types, and an advanced toolchain with a proven, auto-vectorizing C compiler. The software environment also features complete support for the standard OpenCV and OpenVX libraries, with over 800 library functions, for fast, high-level migration of existing imaging/vision applications (a sketch of what such a migration starts from follows the spec list below).

The specs are:

- Wide 1024-bit memory interface with SuperGather technology for maximum performance on the complex data patterns of vision processing
- Up to 4 vector ALU operations per cycle, each with up to 64-way data parallelism
- Up to 5 instructions issued per cycle from a 128-bit wide instruction, delivering increased operation parallelism
- Enhanced 8-, 16-, and 32-bit ISA tuned for vision/imaging applications
- Optional 16-way IEEE single-precision vector floating-point unit delivering a massive 32 GFLOPS at 1GHz (presumably counting a multiply-add as two operations: 16 lanes × 2 × 1GHz)
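To make that migration path concrete, here is a minimal sketch of the kind of existing OpenCV code it targets. The OpenCV calls are standard (cv::cvtColor, cv::GaussianBlur, cv::Canny); how any particular function maps onto the Vision P5's libraries is an assumption here, not something spelled out in the announcement.

```cpp
// A typical OpenCV pre-processing pipeline: grayscale conversion,
// noise suppression, then edge detection. Code like this is the
// migration target: the same high-level calls, re-pointed at a
// DSP-optimized OpenCV/OpenVX implementation instead of the CPU one.
#include <opencv2/imgproc.hpp>

cv::Mat detect_edges(const cv::Mat& bgr_frame) {
    cv::Mat gray, blurred, edges;
    cv::cvtColor(bgr_frame, gray, cv::COLOR_BGR2GRAY);     // 3 channels -> 1
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.4);  // 5x5 kernel, sigma 1.4
    cv::Canny(blurred, edges, 50.0, 150.0);                // hysteresis thresholds
    return edges;
}
```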
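And to show what the auto-vectorizing compiler and those 64-way vector ALUs have to work with, here is a sketch of an ordinary scalar pixel loop of the sort such compilers map onto wide SIMD hardware. The loop is plain portable C++; that it would vectorize 64 pixels at a time on this core is an inference from the specs above, not a measured result.

```cpp
// Saturating brightness adjustment over an 8-bit image. Every
// iteration is independent, so an auto-vectorizing compiler can
// process many pixels per instruction -- on a 64-way 8-bit vector
// ALU, up to 64 at a time -- with no changes to the source.
#include <cstdint>
#include <cstddef>

void brighten(const uint8_t* src, uint8_t* dst, size_t n, uint8_t delta) {
    for (size_t i = 0; i < n; ++i) {
        unsigned v = static_cast<unsigned>(src[i]) + delta;  // widen to avoid wrap
        dst[i] = static_cast<uint8_t>(v > 255 ? 255 : v);    // saturate at 255
    }
}
```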
Learn more about the Tensilica Vision P5 DSP.