Today Arm announces the newest addition to its Cortex-M series, the Cortex-M55. Alongside the new CPU microarchitecture, which brings several improvements, we also see the introduction of the Ethos-U55 NPU IP, which is meant to be integrated with the M55 core. Arm’s new IP is meant to advance the machine learning and inferencing capabilities of billions of low-power embedded devices over the next several years, and to expand its product portfolio into new use-cases.
We’ve seen machine learning become quite the buzzword over the past several years, but the ecosystem has now evolved to the point where it’s no longer just a novelty; it is quickly becoming a useful feature that is increasingly deployed in various systems and use-cases across the industry. Arm sees the endpoint AI market in particular as an area where we’ll be seeing explosive growth over the coming years, and this is the area that Arm wants to cover with the new IP releases.
The Cortex-M55 is a new-generation IP most closely related to the M33, but it brings a few architectural advances that promise large performance and flexibility improvements for machine learning as well as vector instructions.
The Ethos-U55 is a dedicated inference accelerator, a “microNPU”, that ties in with a Cortex-M class CPU and offers the performance and power efficiency that a dedicated NPU or MAC engine would usually bring to the table, all within a footprint similarly small to that of the M-class IPs.
Cortex-M55: First Helium and Custom Instruction capable CPU core
The new Cortex-M55 is important as it’s the first Arm CPU core to be announced with both Helium and Custom Instructions capabilities. Helium, whose technical name is actually MVE (for M-Profile Vector Extension), is the new vector extension, backed by dedicated vector execution units, in the M-class processor line-up, making this the first CPU in the range that is capable of SIMD instructions. The addition gives the new core up to a 5x increase in DSP performance, and the optimised instructions for ML workloads, in combination with MVE, add up to a 15x performance improvement compared to previous-generation M-cores.
In terms of overall microarchitecture, it’s a successor to the M33, and combined µarch and frequency improvements will see scalar performance increase by roughly 20%, depending on the vendor’s configuration. The core has been designed with a focus on bandwidth, to enable MVE and the new ML workloads that require it, so improvements have been made to the memory subsystem, such as four 32-bit interfaces to the TCM (Tightly Coupled Memory).
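To put the TCM figure in perspective, here is a rough sketch of the arithmetic, together with a chunked multiply-accumulate loop of the kind that vector extensions are built to speed up. The function names and the idea of sizing each chunk to one full-width TCM fetch are our own illustrative assumptions, not details Arm has disclosed, and the code models the data flow only, not real MVE instructions.

```python
# Illustrative sketch only: names and chunking policy are assumptions,
# not Arm-documented behaviour.

TCM_INTERFACES = 4    # from the article: four interfaces to the TCM
INTERFACE_BITS = 32   # from the article: each interface is 32 bits wide


def elements_per_fetch(element_bits):
    """Number of elements of the given width in one full-width TCM fetch."""
    return (TCM_INTERFACES * INTERFACE_BITS) // element_bits


def chunked_dot(a, b, lanes):
    """Dot product computed in fixed-size chunks, mimicking SIMD-style lanes.

    Each loop iteration stands in for one vector multiply-accumulate step;
    a scalar core would instead handle one element pair per step.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    acc = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        acc += sum(x * y for x, y in zip(chunk_a, chunk_b))
    return acc


if __name__ == "__main__":
    lanes = elements_per_fetch(8)  # 8-bit values, common in quantised ML
    weights = list(range(20))
    inputs = [1] * 20
    print(lanes, chunked_dot(weights, inputs, lanes))
```

With 8-bit data, the four interfaces together cover 16 values per access, which is why bandwidth matters as much as the vector units themselves for ML-style workloads.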
The Ethos-U55: Arm’s first microNPU
Arm was relatively late to the NPU scene, as most vendors had already employed their own first-party IP architectures in products, and most vendors today still use such implementations. The embedded market, however, is a bit different: there’s a need for something that is a lot smaller in area and lower in power than what you’re generally used to in “larger” implementations such as mobile SoCs, which are covered by Arm’s Ethos-N NPU IP.
The new U55 is a small-scale NPU that scales from 32 to 256 MACs, and requires coupling with a Cortex-M class CPU. Arm didn’t go into major specifics of the microarchitecture, but it’s a very lean design that focuses on area and power efficiency as well as small memory footprints, and it includes some features that we see in the N-series, such as weight decompression. We say the U55 needs to be coupled with an M-class CPU to serve as the controller, but this actually isn’t all that different from what the N-series does, as that IP already includes an M-class CPU. The architecture of the NPU itself is said to be different from, and unrelated to, that of its bigger brethren, having been designed specifically for low-power use-cases.
In terms of area, the smallest 32 MAC implementation of the U55 is said to be around 2x the size of an M55. We don’t have absolute figures to present here, but we’re essentially talking about fractions of a mm².
Systems that use the M55 and U55 together represent major step-function performance increases over past-generation solutions. Figures that Arm provides include up to a 50x performance uplift in comparison to a Cortex-M7 based system, all while improving energy efficiency by 25x.
The new IPs will be employed in a very wide variety of embedded systems. It’s important to understand here that the major volume of such systems will actually be subsystems of existing chips. If we were to take mobile as an example, you’d see subsystems using the IP inside the fingerprint sensors of a phone, the always-listening audio chip for voice assistant features, or even inside the RF systems, where they would optimise workloads such as antenna tuning. There are hundreds of M-class processors in today’s mobile devices that would benefit from ML capabilities, most of them completely transparent to the user.
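As a back-of-the-envelope illustration of what the 32 to 256 MAC range means, the sketch below computes theoretical peak throughput for the two ends of that range. The 500 MHz clock is a purely hypothetical placeholder, since Arm quoted no frequency, and we assume each MAC unit completes one multiply-accumulate per cycle; real sustained throughput would depend on memory bandwidth and utilisation.

```python
# Back-of-the-envelope sketch. The clock value is a hypothetical
# placeholder, and one MAC per unit per cycle is an assumption.

ASSUMED_CLOCK_HZ = 500_000_000  # hypothetical, not an Arm figure


def peak_macs_per_second(mac_units, clock_hz):
    """Theoretical peak, assuming every MAC unit is busy every cycle."""
    return mac_units * clock_hz


def scaling_factor(smallest_units, largest_units):
    """Peak-throughput ratio between two configurations at the same clock."""
    return largest_units / smallest_units


if __name__ == "__main__":
    for units in (32, 256):  # the two ends of the range given in the article
        peak = peak_macs_per_second(units, ASSUMED_CLOCK_HZ)
        print(f"{units} MACs: {peak / 1e9:.0f} GMAC/s peak")
    print(f"largest vs smallest: {scaling_factor(32, 256):.0f}x")
```

Whatever the actual clock turns out to be, the ratio between the largest and smallest configurations stays the same, which gives licensees a wide span to trade area against throughput.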
Arm has so far licensed the M55 and U55 to its lead partners, and will open up wider licensing to other customers in the coming months. As usual with IP, you should expect products using the new designs in around 2 years, if vendors ever publicly confirm whether they use the designs in their products.