
At a time when traditional approaches such as Moore’s Law and process scaling are struggling to keep up with performance demands, XPUs have emerged as a viable candidate for artificial intelligence (AI) and high-performance computing (HPC) applications.
But what’s an XPU? The broad consensus describes it as the stitching together of CPU, GPU, and memory dies in a single package, where X stands for the application-specific units critical to AI infrastructure.
Figure 1 An XPU integrates CPU and GPU in a single package to better serve AI and HPC workloads. Source: Broadcom
An XPU comprises four layers: compute, memory, network I/O, and reliable packaging technology. Industry watchers have called the XPU the world’s largest processor, but it must be designed with the right ratio of accelerator, memory, and I/O bandwidth, and it carries the imperative of direct or indirect memory ownership.
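To illustrate why that ratio matters, here is a minimal, roofline-style balance check in Python. All figures are hypothetical placeholders chosen for illustration, not vendor specifications; the point is only how compute throughput and HBM bandwidth trade off for a given workload.

```python
# Roofline-style balance check for a hypothetical XPU configuration.
# All numbers below are illustrative assumptions, not vendor specifications.

def machine_balance(peak_flops_tf: float, mem_bw_tbs: float) -> float:
    """FLOPs available per byte of memory traffic (FLOP/byte)."""
    return (peak_flops_tf * 1e12) / (mem_bw_tbs * 1e12)

def bound_for(workload_intensity: float, balance: float) -> str:
    """A workload below the machine balance is memory-bound, above it compute-bound."""
    return "memory-bound" if workload_intensity < balance else "compute-bound"

# Hypothetical package: 8 accelerator chiplets at 100 TFLOPS each,
# 12 HBM stacks at 0.8 TB/s each.
peak_tf = 8 * 100.0   # TFLOPS
hbm_bw = 12 * 0.8     # TB/s

balance = machine_balance(peak_tf, hbm_bw)
print(f"Machine balance: {balance:.1f} FLOP/byte")

# Example: a workload with ~300 FLOP/byte arithmetic intensity.
print(bound_for(300.0, balance))
```

With these assumed numbers the package offers roughly 83 FLOP/byte, so a workload at 300 FLOP/byte would be compute-bound; skew the accelerator-to-HBM ratio the other way and the same workload starves on memory bandwidth.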
Below is an XPU case study that demonstrates sophisticated integration of compute, memory, and I/O capabilities.
What are 3.5D and F2F?
2.5D integration, which places multiple chiplets and high-bandwidth memory (HBM) modules on an interposer, initially served AI workloads well. However, increasingly complex large language models (LLMs) and their training demands call for 3D silicon stacking to build more powerful devices. Next, 3.5D integration, which combines 3D silicon stacking with 2.5D packaging, takes these devices to the next level with the advent of XPUs.
That’s what Broadcom’s XDSiP claims to achieve by integrating more than 6,000 mm² of silicon and up to 12 HBM stacks in a single package. It does so with a face-to-face (F2F) device that delivers significant improvements in interconnect density and power efficiency compared to the face-to-back (F2B) approach.
While F2B packaging is a 3D integration technique that connects the top metal of one die to the backside of another, an F2F connection joins two dies at their upper metal layers without a thinning step. In other words, F2F stacking directly connects the top metal layers of the top and bottom dies, providing a dense, reliable connection with minimal electrical interference and exceptional mechanical strength.
Figure 2 The F2F XPU integrates four compute dies with six HBM dies using 3D die stacking for power, clock, and signal interconnects. Source: Broadcom
Broadcom’s F2F 3.5D XPU integrates four compute dies, one I/O die, and six HBM modules while utilizing TSMC’s chip-on-wafer-on-substrate (CoWoS) advanced packaging technology. It claims to minimize latency between compute, memory, and I/O components within the 3D stack while achieving a 7x increase in signal density between stacked dies compared to F2B technology.
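To put the 6,000 mm² figure in perspective, the short sketch below totals the silicon in a hypothetical 3.5D assembly like the one described above. The individual die areas are placeholder assumptions; only the package budget and die counts come from the article.

```python
# Back-of-the-envelope silicon area budget for a hypothetical 3.5D XPU.
# Individual die areas are placeholder assumptions for illustration only.

compute_die_mm2 = 750    # per compute die (assumed)
io_die_mm2 = 500         # I/O die (assumed)
hbm_stack_mm2 = 110      # footprint per HBM stack (assumed)

num_compute = 4
num_hbm = 6

total = num_compute * compute_die_mm2 + io_die_mm2 + num_hbm * hbm_stack_mm2
print(f"Total silicon in package: {total} mm^2")  # 4*750 + 500 + 6*110 = 4160 mm^2

# Broadcom's XDSiP platform is described as supporting more than 6,000 mm^2 of
# silicon and up to 12 HBM stacks, so this hypothetical assembly fits the envelope.
budget_mm2 = 6000
print("Fits XDSiP budget" if total <= budget_mm2 else "Exceeds XDSiP budget")
```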
“Advanced packaging is critical for next-generation XPU clusters as we hit the limits of Moore’s Law,” said Frank Ostojic, senior VP and GM of the ASIC Products Division at Broadcom. “By stacking chip components vertically, Broadcom’s 3.5D platform enables chip designers to pair the right fabrication processes for each component while shrinking the interposer and package size, leading to significant improvements in performance, efficiency, and cost.”
The XPU nomenclature
Intel’s ambitious take on XPUs hasn’t gone very far, as its Falcon Shores platform is no longer proceeding. AMD’s CPU-GPU combo, on the other hand, has been making inroads during the past couple of years, though AMD calls it an accelerated processing unit, or APU. The difference partly stems from industry nomenclature, in which AI-specific XPUs are called custom AI accelerators; in other words, an XPU is the custom chip that provides the processing power to drive AI infrastructure.
Figure 3 MI300A integrates CPU and GPU cores on a single package to accelerate the training of the latest AI models. Source: AMD
AMD’s MI300A combines the company’s CDNA 3 GPU cores and x86-based Zen 4 CPU cores with 128 GB of HBM3 memory to accelerate HPC and AI workloads. El Capitan, a supercomputer housed at Lawrence Livermore National Laboratory, is powered by AMD’s MI300A APUs and is expected to deliver more than two exaflops of double-precision performance when fully deployed.
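As a rough sanity check on the exascale claim, the sketch below estimates how many APUs a two-exaflop FP64 system would need. The per-APU peak rate is an assumed illustrative figure, not AMD’s published specification, and real systems sustain well below peak.

```python
# Rough estimate of APU count needed for a 2-exaflop FP64 system.
# The per-APU peak below is an assumed illustrative figure, not AMD's spec.

target_exaflops = 2.0          # system target (from the article)
per_apu_fp64_tflops = 60.0     # assumed peak FP64 per APU

target_flops = target_exaflops * 1e18
per_apu_flops = per_apu_fp64_tflops * 1e12

apus_needed = target_flops / per_apu_flops
print(f"APUs needed at peak: {apus_needed:,.0f}")   # ~33,333 with these assumptions

# Sustained efficiency is always below peak; at 65% the count grows further,
# which is why such machines deploy tens of thousands of APUs.
print(f"At 65% efficiency: {apus_needed / 0.65:,.0f}")
```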
AI infrastructure increasingly demands specialized compute accelerators interconnected to form massive clusters. While GPUs have become the de facto hardware here, XPUs represent another viable approach to the heavy lifting in AI applications.
XPUs are here, and now it’s time for software to catch up and make effective use of this new processing avenue for AI workloads.
Related Content
- The role of cache in AI processor design
- Top 10 Processors for AI Acceleration at the Endpoint
- Server Processors in the AI Era: Can They Go Greener?
- Four tie-ups uncover the emerging AI chip design models
- Using edge AI processors to boost embedded AI performance