The R329's positioning, as seen from its configuration
The main players in the global smart speaker chip market include Qualcomm, Intel, Apple, Allwinner, Rockchip, and Amlogic. We do not know the specific market shares of main-control chip makers for smart speakers, either globally or in China, but judging by the main chips in existing smart speakers, domestic competition appears to be fierce. In Allwinner's case, popular smart speakers such as the Tmall Genie, Xiaodu at Home, Xiaodu Speaker Play, Jingdong Dingdong, Xiaoai Speaker Play, Xiaoai Speaker mini, Tencent Tingting, NetEase Cloud Speaker, and Sony LF-S80D all use Allwinner main-control chips, which has made the Allwinner R328, R16, and R58 some of the better-known smart speaker main-control chips.
Allwinner's R series is positioned for low-power edge applications, not only smart speakers: the R40 and R16 are also well known for their use on Banana Pi boards, and the R16 is the main controller in the robot vacuum series from Roborock (Stone Technology). Last year the R328 also received the audio processor product award in Aspencore's "2019 Global Electronics Achievement Award". At the China Home Appliance and Consumer Electronics Expo in March last year, Allwinner demonstrated its recognition capability in high-noise environments.
Judging by the part number, the R329 looks like an iteration of the R328, but an Allwinner spokesperson told us that the two products are positioned differently. The "R329 is positioned at the high end, focusing on large computing power and 3-8 far-field intelligent voice interaction; it can be applied to ultra-low-power products with batteries and has rich interfaces", providing a better solution for the high-end smart speakers now on the market. By contrast, the "R328 leans towards the mid-range to entry-level market, with 2-3 far-field intelligent voice interaction and lower cost".
Allwinner provided us with figures for the A53's performance improvement over the A35, covering benchmark results as well as integer multiplications per cycle and single- and double-precision floating-point (FLOPS) performance. These data are broadly in line with Arm's earlier official figures, which put A35 performance at about 80% of the A53's, depending on the scenario. For the R329 specifically, Allwinner says that, compared with the R328, it "provides 1.58 times the integer computing power and 1.98 times the floating-point computing power". The R328 uses a dual-core Cortex-A7 (1.2GHz), so an improvement of this size is to be expected.
DSP and AI core
The choice of the A53 as the general-purpose processor already indicates the R329's positioning, but the emphasis on high computing power is most evident in the rest of the IP selection. As mentioned earlier, the general-purpose processor runs the operating system, applications, network connections, and so on; the DSP is responsible for signal-processing algorithms and sound effects; and the AI core, an NPU, is dedicated to local ASR (automatic speech recognition), NLP (natural language processing), and TTS (text-to-speech). All of this is executed locally, which is what we usually call edge computing.
The DSP part of the R329 consists of two HiFi 4 cores. HiFi 4 belongs to the Cadence Tensilica HiFi DSP IP family, where it is positioned as a high-performance DSP, and it has a fairly broad application ecosystem in mobile phones, vehicles, digital TVs, and other products. HiFi 4 natively supports multi-channel object-based audio, digital-assistant front-end processing, and neural-network-based ASR, although we know that Allwinner chose to hand some of these functions to the NPU.
As for the Tengine framework that accompanies the AIPU, it does not actually depend on the AIPU and should be regarded as a part of the wider Arm AI ecosystem. It can draw on the computing power of existing Arm-architecture chips: Tengine supports Arm CPUs, Mali GPUs, and third-party AI units, and provides an abstract runtime interface for AI application development. Allwinner also provides developers with a full software toolchain for the R329, which should contribute considerably to the Zhouyi ecosystem.
On more specific applications, Allwinner said: "Technologies such as ASR, NLP, and TTS have created an urgent need for dedicated AI processors. Traditional algorithms are also gradually being replaced by AI algorithms; the end-to-end deep-learning algorithms released at home and abroad perform better than traditional noise reduction, echo cancellation, and keyword recognition algorithms, with a higher recognition rate." Allwinner therefore also told us that, in the DSP + NPU + 2MB SRAM configuration, the R329 runs the large-model dual-microphone noise reduction algorithm on the DSP and the large-model deep-learning wake word on the NPU, which keeps power consumption low. This seems a reasonable way to match computing power to power consumption.
Power consumption at high computing power
The DSP + NPU combination exists to deliver better computing efficiency, so in theory it should achieve significantly lower power consumption at the same computing power. The comparison of the Cortex-A7, HiFi 4 DSP, and AIPU indicates that the dedicated cores not only lead significantly in computing power, but that the power consumption of the AI computing unit at the same computing power is only a few tenths of that of the general-purpose processor. When it comes to achieving low power consumption, however, the R329's 2MB of integrated on-chip SRAM is also an important component.
Integrating on-chip SRAM of this capacity is rare both in Allwinner's earlier R-series products and among competitors in the same class. Some competitors also have on-chip SRAM, but in the same tier it is usually around 256KB.
A smaller SRAM cannot by itself run a low-power noise reduction algorithm plus a wake-word model, or has to be paired with slower DDR memory. With the R329's SRAM configuration, most of the computation for the algorithms and models can run out of SRAM. Accordingly, Allwinner gave the following standby power figures for the R329: (1) with the built-in hardware VAD (voice activity detection) handling sound detection, standby stays below 30mW; (2) with DSP + RAM running the small-model dual-microphone noise reduction algorithm and the small-model deep-learning wake word, standby power is 50mW; (3) with DSP + NPU + SRAM, where the large-model dual-microphone noise reduction algorithm runs on the DSP and the large-model deep-learning wake word runs on the NPU, standby power is 60mW. This makes the R329 well suited to battery-powered designs.
Finally, the I/O is also worth mentioning. The R329 integrates two audio DACs, which can be connected directly to an external analog power amplifier for stereo and 1.1-channel output, and it supports 5.1/7.1-channel audio output over I2S. It also integrates a multi-channel audio ADC, which gives it more audio interface scalability than competing products and allows multi-microphone pickup solutions.
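
To put the three standby figures quoted above into battery terms, here is a rough back-of-the-envelope sketch. The 30/50/60mW values are the ones Allwinner gave; the 2000mAh, 3.7V battery and the `standby_hours` helper are our own hypothetical assumptions for illustration, and the calculation covers only the SoC's standby draw.

```python
# Hypothetical battery, chosen only for illustration (not a figure from Allwinner).
BATTERY_CAPACITY_MAH = 2000
BATTERY_VOLTAGE_V = 3.7

# Standby power figures quoted by Allwinner for the R329, in milliwatts.
# The hardware VAD mode is quoted as "below 30mW", so 30 is an upper bound.
STANDBY_MODES_MW = {
    "hardware VAD only": 30,
    "DSP + RAM, small models": 50,
    "DSP + NPU + SRAM, large models": 60,
}


def standby_hours(power_mw, capacity_mah=BATTERY_CAPACITY_MAH,
                  voltage_v=BATTERY_VOLTAGE_V):
    """Idealized standby time in hours for a constant power draw.

    Ignores regulator losses, battery aging, and every other component
    on the board, so it is an upper bound rather than a prediction.
    """
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    energy_mwh = capacity_mah * voltage_v  # mAh * V = mWh
    return energy_mwh / power_mw


if __name__ == "__main__":
    for mode, power_mw in STANDBY_MODES_MW.items():
        hours = standby_hours(power_mw)
        print(f"{mode}: about {hours:.0f} h ({hours / 24:.1f} days)")
```

A real product would last noticeably less than this idealized estimate, since Wi-Fi or Bluetooth radios, the amplifier, and power-conversion losses all add to the standby draw. Under these assumptions, the sketch shows that moving from the VAD-only mode to the large-model mode doubles the SoC's standby power and halves the idealized standby time.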
We will keep watching the Allwinner R329's market performance, which should give a rough indication of whether high computing power in intelligent voice solutions will become a trend in the smart home market. Allwinner's own answer is fairly positive. In an interview with us, Allwinner gave an example of how computing power demand has changed over time: "For example, when multiroom was first implemented with the MP3 audio format, customers were pleasantly surprised by the function. But as customers gradually got used to the basic functions of intelligent voice interaction, they began asking that the sound quality of smart speakers match that of traditional speakers. The audio transmission format was upgraded from MP3 to AAC, and with multiroom layered on top, the AP computing power this function requires increases several times over, because it concerns the audio experience and also has to guarantee highly real-time synchronization."
"Consumer requirements keep rising, so the requirements for AP specifications and computing power keep rising too. Smart speakers are constantly adding new functions, such as multiroom, TWS, DLNA, BT Mesh, and more powerful sound effects. Customers are gradually no longer satisfied with simple EQ and DRC processing, and demand for high-end audio effects such as virtual bass and 3D surround sound continues to grow." That is probably the background from which the R329 emerged.