"Alexa, turn on the living room light." "Okay." "Alexa, what is Cadence announcing on Halloween morning?" "The Tensilica High Five! Yay." "Alexa, it's the Tensilica HiFi 5 DSP." "Okay."

Audio Processing

Way back in the past, audio processing was all about getting the highest fidelity for music. Remember Super Audio CDs (SACD)? Then it turned out that only weird audiophiles cared about that—what everyone else cared about was the convenience of having our entire music library in our pocket on our iPod. So the future of audio in that era turned out not to be SACD but MP3, which audiophiles despaired of, since it was a lossy compression scheme with audible problems such as sizzle and mushiness. But we could listen any time we wanted.

Now, of course, most of us don't bother to carry our music library around in our pockets. In fact, we probably just have to regard our music library (thousands of dollars!) as a sunk cost, since music now comes through streaming services with the entire world of music stored in the cloud. It is still important to process sound correctly for listening, but more and more audio processing is about two things: voice recognition and complex listening environments. For one look at this, see my post Movie Theater Sound in Your Phone.

For example, Amazon recently announced a wide range of voice-controlled devices, including a $60 microwave. One commentator quipped that the only thing they didn't announce was the Alexa kitchen sink ("Alexa, turn on the garbage disposal"). Meanwhile, high-end cars are moving from first-generation voice technology, where "the car had to train you what to say," to a new generation that offers a much more natural experience. Audio in cars is also moving from one set of speakers for everyone to creating "sound bubbles" in which each occupant can listen to different music. This requires many more speakers, of course, which means more audio channels.
All these new experiences are great for the user, but they require a lot more processing under the hood than just playing music: a lot more DSP processing for audio signal filtering and conditioning, plus AI processing for voice recognition. The image above lists some of the demanding features.

Increasingly, voice recognition is being done "on device" as opposed to in the cloud. The two drivers for this are privacy (people don't want every personal conversation beamed up to the cloud) and connectivity (people don't want their infotainment system to refuse to respond because they are in the mountains—plus on-device processing has lower latency).

Tensilica HiFi 5

Today, at the Linley Microprocessor Conference, Cadence announced the latest in its line of best-selling Tensilica HiFi processor IP, the HiFi 5. It is specifically targeted at the more demanding audio processing required to give users a great audio experience, both in the fidelity of the sound produced and in the recognition of voice and voice commands.

Over the last couple of years, there has been a revolution in the design of neural network processors, as it became clear that the weights could be compressed much more than anybody really thought possible. You can get the same accuracy with 8x8-bit fixed-point arithmetic as with 32-bit floating point. Also, extremely small values in the weight matrices can be forced all the way to zero, again without loss of accuracy. Zero is easy to optimize, since zero times anything is zero: you don't need to load the zero weight, load the other operand, or perform the multiply. For more details on this, see my blog post HOT CHIPS Tutorial: On-Device Inference from earlier this summer.

The HiFi 5 is one of these next-generation processors: it provides a large number of low-precision MACs, optimizes away zeros, and so on. Compared to its predecessor, the HiFi 4, the HiFi 5 has four times the neural network processing power and twice the audio DSP processing power.
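To make the two ideas above concrete, here is a toy sketch (my own illustration, not Cadence's implementation) of quantizing float weights to 8-bit integers, pruning near-zero weights, and then skipping the pruned weights entirely in the multiply-accumulate loop. The `threshold` value is an arbitrary assumption for the example.

```python
def quantize_to_int8(weights, threshold=0.01):
    """Map float weights to int8 with a shared scale; force tiny weights to zero."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = []
    for w in weights:
        if abs(w) < threshold:
            quantized.append(0)          # prune: treat near-zero weight as exactly zero
        else:
            quantized.append(max(-128, min(127, round(w / scale))))
    return quantized, scale

def sparse_mac(inputs, q_weights, scale):
    """Multiply-accumulate that skips zero weights: no load, no multiply."""
    acc = 0
    for x, w in zip(inputs, q_weights):
        if w == 0:
            continue                     # the "zero times anything is zero" shortcut
        acc += x * w
    return acc * scale                   # dequantize the accumulated result

weights = [0.9, -0.004, 0.5, 0.002, -0.7]
q, s = quantize_to_int8(weights)
print(q)                                 # the two tiny weights become 0
print(sparse_mac([1, 1, 1, 1, 1], q, s)) # close to 0.9 - 0.7 + 0.5 = 0.7
```

A real DSP does this with wide SIMD MAC units and hardware zero-skipping rather than a Python loop, but the arithmetic idea is the same: low-precision multiplies plus skipped zeros cost far less than 32-bit floating-point MACs.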
If you are the kind of person who likes to look at all the gory details, below is a table showing what is inside.

There is a new Cadence Neural Network library with support for long short-term memory networks (LSTMs), gated recurrent units (GRUs), and convolutional neural networks (CNNs). Activation and pooling can use tanh, sigmoid, ReLU, and more. This can all be leveraged easily from the popular neural network frameworks.

Availability

Tensilica HiFi 5 is available now. Or earlier, as Ambiq Micro's Aaron Grassian explains:

To meet the extremely difficult challenge of bringing computationally intensive NN-based far-field processing and speech recognition algorithms to energy-sensitive devices, Ambiq Micro chose to be the first silicon licensee of Cadence’s HiFi 5 DSP.

Ambiq uses a sub-threshold technology that they call SPOT. For more details on that, see the middle of my post Misfit Shine 2 Lasts for 6 Months on a Coin Cell. How Do They Do That? Part of the answer to the question in the title of that post is that they used an Ambiq Micro chip.

More Information

See the product page.

"Alexa, what do you think about edge voice recognition?" "I'm pretty attached to the cloud." "Alexa, you can do more on your own if you try." "Put it there. High Five!" "Alexa, HiFi 5."

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.