Intelligent voice chips have begun to integrate dedicated AI cores
2021-07-08
Smart speakers and smart homes are currently hot topics in the AI field. This shows both in the shipments of such products and in how manufacturers market them. Interestingly, although the vendors of smart speaker "main control" chips constantly promote the AI capabilities of their parts, most of these chips still lack an AI core, that is, a dedicated neural network compute unit. Presumably the AI compute requirements of such edge devices can be met by the CPU or, where one is present, a GPU.
However, as demand for edge computing power grows, stronger compute in intelligent voice chips has become a trend in smart home and smart speaker development over the past two years. In the main SoCs for smart speakers, for example, CPU performance keeps rising. General-purpose processors are not very efficient for smart audio workloads, though, so manufacturers have started adding DSPs and AI cores (NPUs) to their chips.
A fairly typical example is the R329 chip and intelligent voice solution recently launched by Allwinner Technology. Through this chip and its solution we can roughly see the current thinking and direction of AI development in smart homes and smart speakers, and, along the way, what a high-compute AI intelligent voice chip looks like once it actually has an AI core.


From the configuration: the positioning of R329

The main players in the global smart speaker chip market include Qualcomm, Intel, Apple, Allwinner, Rockchip, Amlogic and others. We do not know the exact market share of "main control" chip vendors in smart speakers, either globally or in China, but judging from the main chips in existing smart speakers, domestic competition appears to be fierce. As for Allwinner, popular smart speakers such as Tmall Genie, Xiaodu at Home, Xiaodu Speaker Play, JD Dingdong, Xiaoai Speaker Play, Xiaoai Speaker mini, Tencent Listening, NetEase Cloud Speaker and Sony LF-S80D use Allwinner main control chips, which has made the Allwinner R328, R16 and R58 fairly well-known smart speaker main control chips. Allwinner's R series is positioned for low-power edge applications in general, not only smart speakers: the R40/R16 is also well known from the Banana Pi boards, and the R16 is the main controller of the Roborock (Stone Technology) robot vacuum series. The R328 also received Aspencore's "2019 Global Electronics Achievement Award" in the audio processor category last year. At the China Home Appliance and Consumer Electronics Expo in March last year, Allwinner demonstrated voice recognition in strong-noise environments.

Judging by the part number, R329 looks like an iteration of R328, but an Allwinner spokesperson told us the two products are positioned differently: "R329 is a high-end part, focusing on large computing power, 3-8 microphone far-field intelligent voice interaction, suitability for ultra-low-power battery-powered products, and rich interfaces", offering a better solution for high-end smart speakers in today's market, whereas "R328 leans toward the mid-range to entry-level market, with 2-3 microphone far-field intelligent voice interaction and lower cost".


Allwinner Technology said that the two main features of R329 are high computing power and low power consumption, with the high computing power coming from the DSP and NPU. We will look at the product from these two angles. First, an overview of the R329's configuration and features:
- CPU: dual-core Cortex-A53, 1.5 GHz;
- DSP: dual-core HiFi 4, 400 MHz;
- NPU: Zhouyi AIPU, 800 MHz, 0.256T;
- Memory: on-chip SRAM; built-in 128 MB DDR3;
- Expansion: integrated multi-channel audio ADC and DAC, 3 I2S channels and 8 DMIC channels, plus integrated LDOs.

For a complete smart speaker solution, Allwinner also provides companion WiFi and Bluetooth chips, audio ADC chips and other parts to meet different customer needs. This configuration shows that R329 is positioned for high computing power and is aimed mainly at screenless intelligent voice interaction products.

The general-purpose processor uses Arm's Cortex-A53 microarchitecture; in the overall system this part typically runs the operating system, applications, network connectivity and so on. Allwinner's earlier mainstream R-series solutions used the Cortex-A7, which is also the choice of many competitors, while some chose the A35. Both the A53 and the A7 are energy-efficient designs with much in common in the pipeline, such as an in-order 8-stage pipeline. The A53, however, delivers a significant performance gain at the same clock: it moves to the 64-bit Armv8-A instruction set architecture and its extensions and has fuller superscalar support, with more flexible dual issue and more accurate branch prediction. Integer, floating-point, Neon and memory performance are also improved.
The later A35 focuses more on efficiency. Its performance sits in the same class as the A7, and its overall microarchitecture resembles the A53, with some front-end changes: the instruction fetch unit was redesigned, fetch bandwidth was traded off for energy efficiency, and the instruction queue is smaller. The Neon/floating-point pipeline also differs, in favor of area efficiency.

Allwinner provided us with figures on the A53's performance improvement over the A35, covering benchmark scores as well as integer multiplications per cycle and single- and double-precision floating-point FLOPS. These figures are broadly in line with what Arm published earlier: depending on the scenario, A35 performance is about 80% of the A53's. For the R329 specifically, compared with the R328 it "provides 1.58 times the integer computing power and 1.98 times the floating-point computing power". Since the R328 uses a dual-core A7 (1.2 GHz), an improvement of this size is to be expected.

DSP and AI core

The choice of the A53 as the general-purpose processor already shows R329's positioning, but its high computing power comes mainly from the choice of IP. As mentioned earlier, the general-purpose processor runs the operating system, applications and network connectivity; the DSP handles signal processing algorithms and sound effects; and the AI core, the NPU, is dedicated to local ASR (Automatic Speech Recognition), NLP (Natural Language Processing) and TTS (Text-to-Speech). All of these run locally, which is what we usually call edge computing.

The DSP part of R329 consists of two HiFi 4 cores. HiFi 4 belongs to Cadence's Tensilica HiFi DSP IP family, sits at the high-performance end of that family, and has a fairly broad ecosystem in mobile phones, vehicles, digital TVs and other products. HiFi 4 natively supports multi-channel object-based audio, digital assistant front-end processing and neural-network-based ASR, although Allwinner chose to leave some of these functions to the NPU.
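The quoted 1.58x and 1.98x figures combine two effects: the higher clock (1.5 GHz versus 1.2 GHz) and the A53's per-clock gains over the A7. A rough way to separate them is to divide out the clock ratio. The Python sketch below does that arithmetic under stated assumptions: both chips are compared at two cores each, and throughput scales linearly with frequency, which real benchmarks only approximate. The function and constant names are illustrative, not from Allwinner.

```python
# Figures quoted in the article: R329 = dual-core Cortex-A53 @ 1.5 GHz,
# R328 = dual-core Cortex-A7 @ 1.2 GHz, claimed uplift 1.58x (integer)
# and 1.98x (floating point).
R329_CLOCK_GHZ = 1.5
R328_CLOCK_GHZ = 1.2
CLAIMED_UPLIFT = {"integer": 1.58, "floating_point": 1.98}


def per_clock_uplift(total_uplift: float, new_clock: float, old_clock: float) -> float:
    """Divide out the clock-frequency ratio.

    Assumes equal core counts and throughput that scales linearly with clock,
    so whatever remains is attributed to the microarchitecture.
    """
    clock_ratio = new_clock / old_clock
    return total_uplift / clock_ratio


if __name__ == "__main__":
    for kind, uplift in CLAIMED_UPLIFT.items():
        gain = per_clock_uplift(uplift, R329_CLOCK_GHZ, R328_CLOCK_GHZ)
        print(f"{kind}: {uplift:.2f}x total, ~{gain:.2f}x per clock")
```

Under those assumptions the clock accounts for a factor of 1.25, leaving roughly 1.26x (integer) and 1.58x (floating point) per clock for the A7-to-A53 change.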


We learned from Allwinner that one of the two HiFi 4 cores can be used for "audio pre-processing, such as noise reduction, echo cancellation and wake-word recognition", and the other for "audio post-processing, implementing audio decoding, sound enhancement, recording and so on". With the on-chip SRAM, "a low-power, small-model dual-microphone noise reduction algorithm and a small-model deep-learning wake word can be realized". This description shows that the R329's DSP also takes on typical light AI workloads. A dual-core DSP is rare in Allwinner's other R-series products; the design provides dedicated compute units for certain audio scenarios to achieve a better energy efficiency ratio, which ties into low power consumption, discussed later. Apparently, though, Allwinner felt that a general-purpose processor plus DSP (and on-chip SRAM) was still not enough for higher computing power, so R329 also includes a dedicated AI processor: the Zhouyi AIPU, an AI processor IP developed by Arm China. A dedicated AI core still appears to be relatively rare among competitors in the industry.


Arm China has previously described the advantages of the AIPU over a DSP. Many AI cores now also support programmability so they can adapt to different algorithms, and Arm China can use its own strengths to build an AI software ecosystem. Although DSPs can also do AI processing, no ecosystem of scale has formed across different DSP architectures, which is not very friendly to software development. In addition, an AI core has an instruction set optimized for AI and neural networks, so it performs better in both compute and efficiency on such specialized workloads.

The "Zhouyi" platform released by Arm China in November 2018 has two main parts: the AIPU and the Tengine framework. The AIPU's defining feature is its AI- and neural-network-optimized instruction set, which includes tensor instructions, AI-specific instructions for driving custom hardware acceleration units, and scalar instructions for AI computing; it also supports user-defined hardware implementations. Support for mainstream frameworks including TensorFlow is standard for contemporary AI processors, and Arm China's material says the AIPU "supports users to load algorithms with one click" and achieves programming flexibility through efficient and flexible tensor execution units.

On the AIPU's actual efficiency, Allwinner also provided a comparison of computing power and power consumption, as shown in the figure above. This level of efficiency is not surprising, since the AIPU is an AI core. Still, its 25-fold performance advantage over a 600 MHz HiFi 4 shows the value of contemporary AI-specific processors and the direction they are heading. Note that this comparison covers only single-core performance versus power consumption.
Reportedly, the Allwinner R329 is the first publicly released chip to use the Zhouyi AIPU, with strong support from Arm China, which indicates that both parties attach great importance to future NPU applications in smart speakers and other AI fields. It is therefore reasonable to expect R329 to hold a clear compute advantage over competing products.
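The spec list gives the AIPU a clock of 800 MHz and a peak of "0.256T". Reading that as 0.256 TOPS (an assumption; the article does not spell out the unit), a short calculation shows how much parallel hardware the core implies and how small a share of it an always-on model might use. The Python sketch below is illustrative only: the 2-ops-per-MAC convention and the wake-word model size are assumptions, not Allwinner or Arm China figures.

```python
# Spec figures from the article: AIPU at 800 MHz, peak "0.256T".
# Assumption: "0.256T" means 0.256 TOPS (tera-operations per second).
AIPU_PEAK_OPS_PER_S = 0.256e12
AIPU_CLOCK_HZ = 800e6
# Assumption: common convention that one multiply-accumulate counts as 2 ops.
OPS_PER_MAC = 2


def ops_per_cycle(peak_ops_per_s: float, clock_hz: float) -> float:
    """Peak operations the core must complete in every clock cycle."""
    return peak_ops_per_s / clock_hz


def utilization(macs_per_inference: float, inferences_per_s: float,
                peak_ops_per_s: float = AIPU_PEAK_OPS_PER_S) -> float:
    """Fraction of peak throughput a workload needs (ignores memory stalls)."""
    required_ops_per_s = macs_per_inference * OPS_PER_MAC * inferences_per_s
    return required_ops_per_s / peak_ops_per_s


if __name__ == "__main__":
    per_cycle = ops_per_cycle(AIPU_PEAK_OPS_PER_S, AIPU_CLOCK_HZ)
    print(f"{per_cycle:.0f} ops/cycle, {per_cycle / OPS_PER_MAC:.0f} MACs/cycle")
    # Hypothetical wake-word model: 5 million MACs per 10 ms audio frame.
    print(f"utilization: {utilization(5e6, 100):.2%}")
```

Under these assumptions the AIPU performs 320 operations (160 MACs) per cycle, and the hypothetical 5-million-MAC model run 100 times per second would use well under 1% of peak, which is why an NPU can afford large always-on models at low clock activity.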

The Tengine framework, the platform's other part, does not actually depend on the AIPU and should be regarded as an integral part of the wider Arm AI ecosystem. It is designed to make use of the compute in existing Arm-architecture chips, so Tengine also supports Arm CPUs, Mali GPUs and third-party AI units, providing an abstract runtime interface for AI application development. Allwinner also provides a full software toolchain for R329 developers, which should contribute substantially to the Zhouyi ecosystem.

On more concrete applications, Allwinner said: "ASR, NLP, TTS and other technologies have created an urgent need for dedicated AI processors. Traditional algorithms are also gradually being replaced by AI algorithms: end-to-end deep learning algorithms released in China and abroad outperform traditional noise reduction, echo cancellation and keyword recognition algorithms, with higher recognition rates." Allwinner therefore told us that when R329 uses DSP + NPU + 2 MB SRAM, running the large-model dual-microphone noise reduction algorithm on the DSP and the large-model deep-learning wake word on the NPU achieves low power consumption. This seems a reasonable way to match computing power with power consumption.

Power consumption at high computing power

The DSP + NPU combination exists to deliver more efficient computation, so in theory it should achieve significantly lower power consumption for the same compute. The comparison of the Cortex-A7, HiFi 4 DSP and AIPU mentioned above shows not only that the dedicated cores clearly lead in compute, but also that at equal compute the AI unit consumes only a small fraction of the general-purpose processor's power. Beyond that, the R329's integrated 2 MB on-chip SRAM is also key to achieving low power consumption.
Integrating this much SRAM on chip is also rare in Allwinner's earlier R-series products and in competing chips of the same class: some competitors also have on-chip SRAM, but chips in the same tier usually carry about 256 KB.
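Why the SRAM size matters can be shown with a simple capacity check: if the always-on models and their buffers fit in on-chip SRAM, standby listening does not have to keep external DDR busy. The Python sketch below compares 2 MB with the 256 KB tier mentioned above. The model and buffer sizes are hypothetical numbers chosen for illustration, not figures from Allwinner, and treating "2MB" as 2 MiB is an assumption.

```python
from typing import Mapping

KIB = 1024
MIB = 1024 * KIB

# Capacities from the article; treating "2MB" as 2 MiB is an assumption.
R329_SRAM_BYTES = 2 * MIB
TYPICAL_COMPETITOR_SRAM_BYTES = 256 * KIB

# Hypothetical always-on workload; these sizes are illustrative, not measured.
ALWAYS_ON_WORKLOAD = {
    "noise_reduction_model": 384 * KIB,
    "wake_word_model": 768 * KIB,
    "audio_and_activation_buffers": 256 * KIB,
}


def sram_headroom(workload: Mapping[str, int], sram_bytes: int) -> int:
    """Bytes left over; a negative value means the workload spills into DDR."""
    return sram_bytes - sum(workload.values())


def fits_in_sram(workload: Mapping[str, int], sram_bytes: int) -> bool:
    """True if every always-on component can stay resident in SRAM."""
    return sram_headroom(workload, sram_bytes) >= 0


if __name__ == "__main__":
    for name, size in [("R329, 2 MB", R329_SRAM_BYTES),
                       ("256 KB tier", TYPICAL_COMPETITOR_SRAM_BYTES)]:
        headroom = sram_headroom(ALWAYS_ON_WORKLOAD, size)
        verdict = "fits" if headroom >= 0 else "spills to DDR"
        print(f"{name}: {verdict} ({headroom // KIB} KiB headroom)")
```

With these illustrative sizes the workload totals 1408 KiB: it fits in 2 MiB with 640 KiB to spare but would overflow a 256 KB SRAM several times over.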

A smaller SRAM either cannot hold a low-power noise reduction algorithm plus a wake-word model at all, or has to be paired with slower DDR. With 2 MB of SRAM, most of the algorithm models' computation can run from SRAM. Allwinner accordingly gave the following R329 standby power figures:
(1) with the built-in hardware VAD (voice activity detection) performing sound detection, standby below 30 mW;
(2) with DSP + SRAM running the small-model dual-microphone noise reduction algorithm and the small-model deep-learning wake word, 50 mW standby;
(3) with DSP + NPU + SRAM, running the large-model dual-microphone noise reduction algorithm on the DSP and the large deep-learning wake-word model on the NPU, 60 mW standby.
This makes R329 well suited to battery-powered designs.

Finally, the I/O is also worth mentioning. R329 integrates 2 audio DACs that can connect directly to an external analog power amplifier for stereo or 1.1-channel output, with 5.1/7.1-channel audio output available through I2S. Its integrated multi-channel audio ADC gives it stronger audio interface scalability than competing products and supports multi-microphone pickup solutions.
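To put the standby figures in perspective, idealized standby time is simply battery energy divided by standby power. The Python sketch below applies that to the three modes Allwinner quoted. The 10 Wh battery is a hypothetical example, and the estimate ignores regulator efficiency, self-discharge and the extra power drawn once a wake word is detected, so real devices would fall short of it.

```python
# Standby figures quoted by Allwinner for the three R329 modes, in mW.
# The VAD figure is an upper bound ("below 30 mW").
STANDBY_MW = {
    "hardware VAD": 30.0,
    "DSP + SRAM, small models": 50.0,
    "DSP + NPU + SRAM, large models": 60.0,
}


def standby_hours(battery_wh: float, standby_mw: float) -> float:
    """Ideal standby time in hours: energy (Wh) divided by power (W).

    Ignores regulator losses, self-discharge and active-listening periods.
    """
    return battery_wh / (standby_mw / 1000.0)


if __name__ == "__main__":
    BATTERY_WH = 10.0  # hypothetical pack size, not from the article
    for mode, mw in STANDBY_MW.items():
        hours = standby_hours(BATTERY_WH, mw)
        print(f"{mode}: ~{hours:.0f} h (~{hours / 24:.1f} days)")
```

Under these assumptions even the 60 mW mode gives roughly a week of idle standby (about 167 hours), and the VAD-only mode roughly two weeks.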

Going forward, the market performance of the Allwinner R329 should give a rough indication of whether high-compute intelligent voice solutions will become a trend in the smart home market. Allwinner's own answer is fairly positive. In an interview with us, Allwinner gave an example of how compute demand changes over time: "For example, when multiroom was first implemented with the MP3 audio format, customers were very surprised by the feature. But as customers got used to basic intelligent voice interaction, they asked that the sound quality of smart speakers be benchmarked against traditional speakers, and the audio transmission format moved up from MP3 to AAC. With multiroom on top of that, the AP (application processor) compute required for this feature increases several-fold, because it is an audio experience and high real-time synchronization must also be guaranteed." "Consumer requirements keep rising, and with them the requirements on AP specifications and computing power. Smart speakers keep adding new features such as multiroom, TWS, DLNA, BT Mesh and more powerful sound effects; customers are no longer satisfied with simple EQ and DRC processing, and demand for high-end audio effects such as virtual bass and 3D surround sound keeps growing." That is probably how R329 came about.

According to a research report from Strategy Analytics, global smart speaker shipments totaled 125 million units in 2019, up 60% from 2018. Driven by Alibaba, Baidu, Xiaomi and others, China's smart speaker shipments rose from 21.9 million in 2018 to 52 million in 2019, a surge in growth. Allwinner Technology competes in the voice main-control chip market, and smart speakers are a key area for the company. In 2018 its R-series products for smart speakers made notable breakthroughs, and in 2019 Allwinner launched the R328 dedicated intelligent voice processor, which did well in the market. R329 is an upgrade based on R328, positioned as a dedicated AI voice chip with high computing power and low power consumption. An Allwinner spokesperson said that in 2020 Allwinner will launch several chips for smart speakers. Besides R329, Allwinner is planning a next-generation RTOS system chip with integrated WiFi/BT for screenless smart speakers, to meet the iteration needs of the low-cost market; for speakers with screens, it will soon launch the high-performance quad-core A53 chip R818.