Intel recently released a new version of its documentation for software developers, revealing additional details about its upcoming Xeon Scalable ‘Cooper Lake-SP’ processors. As it turns out, the new CPUs will support AVX512_BF16 instructions and therefore the bfloat16 format. The main intrigue is that, at this point, AVX512_BF16 appears to be supported only by the Cooper Lake-SP microarchitecture, and not by its direct successor, the Ice Lake-SP microarchitecture.
bfloat16 is a truncated 16-bit version of the 32-bit IEEE 754 single-precision floating-point format. It preserves all 8 exponent bits, but reduces the precision of the significand from 24 bits to 8 bits in order to save memory, bandwidth, and processing resources, while still retaining the same dynamic range. The format was designed primarily for machine learning and near-sensor computing applications, which need precision near zero but not so much at the maximum of the range. The number representation is supported by Intel’s upcoming FPGAs and Nervana neural network processors, as well as by Google’s TPUs. Given that Intel already supports the bfloat16 format across two of its product lines, it makes sense to support it elsewhere as well, which is what the company is going to do by adding AVX512_BF16 instruction support to its upcoming Xeon Scalable ‘Cooper Lake-SP’ platform.
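To make the truncation relationship concrete, here is a minimal C sketch (the function names are our own, for illustration): converting an FP32 value to bfloat16 amounts to keeping the upper 16 bits of its bit pattern, and widening back to FP32 is exact. Plain truncation is shown for simplicity; Intel’s conversion instructions round to nearest (even) instead.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Illustrative helper: convert an IEEE 754 float to bfloat16 by keeping the
 * upper 16 bits (sign + 8 exponent bits + 7 stored significand bits).
 * Truncation is used here for clarity; AVX512_BF16 rounds to nearest even. */
static uint16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Widening bfloat16 back to float is exact: restore the low 16 bits as zero. */
static float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void) {
    float x = 3.14159265f;
    uint16_t h = float_to_bf16(x);
    printf("%f -> 0x%04x -> %f\n", x, h, bf16_to_float(h));
    return 0;
}
```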
AVX-512 Support Propagation by Various Intel CPUs (a newer uArch supports the instructions of an older uArch)

| Xeon | General | Xeon Phi |
|------|---------|----------|
| Skylake-SP: AVX512BW, AVX512DQ, AVX512VL | AVX512F, AVX512CD | Knights Landing: AVX512ER, AVX512PF |
| Cannon Lake: AVX512VBMI, AVX512IFMA | | Knights Mill: AVX512_4FMAPS, AVX512_4VNNIW |
| Cascade Lake-SP: AVX512_VNNI | | |
| Cooper Lake: AVX512_BF16 | | |
| Ice Lake: AVX512_VNNI, AVX512_VBMI2, AVX512_BITALG, AVX512+VAES, AVX512+GFNI, AVX512+VPCLMULQDQ (not BF16) | AVX512_VPOPCNTDQ | |

Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (page 16)
The list of Intel’s AVX512_BF16 Vector Neural Network Instructions includes VCVTNE2PS2BF16, VCVTNEPS2BF16, and VDPBF16PS. All of them can be executed in 128-bit, 256-bit, or 512-bit mode, so software developers can pick one of nine versions in total based on their requirements.
Intel AVX512_BF16 Instructions

| Instruction | Description | Intel C/C++ Compiler Intrinsic Equivalents |
|-------------|-------------|--------------------------------------------|
| VCVTNE2PS2BF16 | Convert Two Packed Single Data to One Packed BF16 Data | _mm_cvtne2ps_pbh, _mm256_cvtne2ps_pbh, _mm512_cvtne2ps_pbh |
| VCVTNEPS2BF16 | Convert Packed Single Data to Packed BF16 Data | _mm_cvtneps_pbh, _mm256_cvtneps_pbh, _mm512_cvtneps_pbh |
| VDPBF16PS | Dot Product of BF16 Pairs Accumulated into Packed Single Precision | _mm_dpbf16_ps, _mm256_dpbf16_ps, _mm512_dpbf16_ps |

(The unmasked intrinsic forms are listed; masked variants exist as well.)
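As a rough illustration of how the three instructions compose, here is a sketch of a BF16 dot-product kernel using the 512-bit intrinsic forms. It assumes a compiler that exposes the AVX512_BF16 intrinsics (e.g. GCC or Clang built with -mavx512bf16) and a CPU that implements them; the loop structure and the function name are our own.

```c
#include <immintrin.h>
#include <stddef.h>

/* Sketch: dot product of two float arrays, converting the inputs to BF16 on
 * the fly and accumulating in FP32 via VDPBF16PS. Assumes n is a multiple of
 * 32; a real kernel would also handle the tail elements. */
float bf16_dot(const float *a, const float *b, size_t n) {
    __m512 acc = _mm512_setzero_ps();
    for (size_t i = 0; i < n; i += 32) {
        /* VCVTNE2PS2BF16: pack two 16-float vectors into one 32-element
         * BF16 vector (the first argument lands in the upper half). */
        __m512bh va = _mm512_cvtne2ps_pbh(_mm512_loadu_ps(a + i + 16),
                                          _mm512_loadu_ps(a + i));
        __m512bh vb = _mm512_cvtne2ps_pbh(_mm512_loadu_ps(b + i + 16),
                                          _mm512_loadu_ps(b + i));
        /* VDPBF16PS: multiply adjacent BF16 pairs and accumulate the results
         * into the 16 single-precision lanes of acc. */
        acc = _mm512_dpbf16_ps(acc, va, vb);
    }
    return _mm512_reduce_add_ps(acc);
}
```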
Only for Cooper Lake?
When Intel mentions an instruction in its Intel Architecture Instruction Set Extensions and Future Features Programming Reference, the company usually states the first microarchitecture to support it and indicates that its successors also support it (or are set to support it) by calling them ‘later’, omitting the word microarchitecture. For example, Intel’s original AVX is listed as supported by ‘Sandy Bridge and later’.
This is not the case with AVX512_BF16, which is said to be supported by ‘Future Cooper Lake’ only. Meanwhile, the Cooper Lake-SP platform is to be succeeded by the long-awaited 10nm Ice Lake-SP server platform, and it would be a bit odd for the latter not to support something its predecessor does. That said, this is not an entirely impossible scenario. Intel is keen on offering differentiated solutions these days, so tailoring Cooper Lake-SP for certain workloads while focusing Ice Lake-SP on others may well be the plan here.
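One practical consequence: since support may not carry over to successor parts, software cannot infer AVX512_BF16 from the microarchitecture family and should test for it at run time. The feature is enumerated in CPUID leaf 07H, sub-leaf 1, as bit 5 of EAX; a minimal GCC/Clang-style check might look like this (a sketch, using the compilers’ cpuid.h helper):

```c
#include <cpuid.h>

/* Returns nonzero if the CPU reports AVX512_BF16 support:
 * CPUID.(EAX=07H, ECX=1):EAX[bit 5]. */
static int has_avx512_bf16(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (eax >> 5) & 1;
}
```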
We have reached out to Intel for additional information and will update the story if we get some extra details on the matter.
Update: Intel has sent us the following:
At this time, Cooper Lake will add support for Bfloat16 to DL Boost. We’re not giving any more guidance beyond that in our roadmap.
Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (via InstLatX64/Twitter)