
At a time when traditional approaches such as Moore’s Law and process scaling are struggling to keep up with performance demands, XPUs have emerged as a viable candidate for artificial intelligence (AI) and high-performance computing (HPC) applications.
But what’s an XPU? The broad consensus describes it as the stitching together of CPU, GPU, and memory dies in a single package, where X stands for the application-specific units critical to AI infrastructure.
Figure 1 An XPU integrates CPU and GPU in a single package to better serve AI and HPC workloads. Source: Broadcom
An XPU comprises four layers: compute, memory, network I/O, and reliable packaging technology. Industry watchers have called the XPU the world’s largest processor, but it must be designed with the right ratio of accelerator, memory, and I/O bandwidth, and it carries the imperative of direct or indirect memory ownership.
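To illustrate why that ratio matters, here is a minimal, roofline-style balance check in Python. All figures are hypothetical placeholders chosen for illustration, not vendor specifications; the point is only how compute throughput and HBM bandwidth trade off for a given workload.

```python
# Roofline-style balance check for a hypothetical XPU configuration.
# All numbers below are illustrative assumptions, not vendor specifications.

def machine_balance(peak_flops_tf: float, mem_bw_tbs: float) -> float:
    """FLOPs available per byte of memory traffic (FLOP/byte)."""
    return (peak_flops_tf * 1e12) / (mem_bw_tbs * 1e12)

def bound_for(workload_intensity: float, balance: float) -> str:
    """A workload below the machine balance is memory-bound, above it compute-bound."""
    return "memory-bound" if workload_intensity < balance else "compute-bound"

# Hypothetical package: 8 accelerator chiplets at 100 TFLOPS each,
# 12 HBM stacks at 0.8 TB/s each.
peak_tf = 8 * 100.0   # TFLOPS
hbm_bw = 12 * 0.8     # TB/s

balance = machine_balance(peak_tf, hbm_bw)
print(f"Machine balance: {balance:.1f} FLOP/byte")

# Example: a workload with ~300 FLOP/byte arithmetic intensity.
print(bound_for(300.0, balance))
```

With these assumed numbers the package offers roughly 83 FLOP/byte, so a workload at 300 FLOP/byte would be compute-bound; skew the accelerator-to-HBM ratio the other way and the same workload starves on memory bandwidth.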
Below is an XPU case study that demonstrates sophisticated integration of compute, memory, and I/O capabilities.
What are 3.5D and F2F?
2.5D integration, which places multiple chiplets and high-bandwidth memory (HBM) modules on an interposer, initially served AI workloads well. However, increasingly complex large language models (LLMs) and their training demands call for 3D silicon stacking to build more powerful devices. Next, 3.5D integration, which combines 3D silicon stacking with 2.5D packaging, takes these devices to the next level with the advent of XPUs.
That’s what Broadcom’s XDSiP claims to achieve by integrating more than 6,000 mm² of silicon and up to 12 HBM stacks in a single package. It does so with a face-to-face (F2F) device that delivers significant improvements in interconnect density and power efficiency compared to the face-to-back (F2B) approach.
While F2B packaging is a 3D integration technique that connects the top metal of one die to the backside of another, an F2F connection joins two dies at their upper metal layers without a thinning step. In other words, F2F stacking directly connects the top metal layers of the top and bottom dies, providing a dense, reliable connection with minimal electrical interference and exceptional mechanical strength.
Figure 2 The F2F XPU integrates four compute dies with six HBM dies using 3D die stacking for power, clock, and signal interconnects. Source: Broadcom
Broadcom’s F2F 3.5D XPU integrates four compute dies, one I/O die, and six HBM modules while utilizing TSMC’s chip-on-wafer-on-substrate (CoWoS) advanced packaging technology. It claims to minimize latency between compute, memory, and I/O components within the 3D stack while achieving a 7x increase in signal density between stacked dies compared to F2B technology.
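To put the 6,000 mm² figure in perspective, the short sketch below totals the silicon in a hypothetical 3.5D assembly like the one described above. The individual die areas are placeholder assumptions; only the package budget and die counts come from the article.

```python
# Back-of-the-envelope silicon area budget for a hypothetical 3.5D XPU.
# Individual die areas are placeholder assumptions for illustration only.

compute_die_mm2 = 750    # per compute die (assumed)
io_die_mm2 = 500         # I/O die (assumed)
hbm_stack_mm2 = 110      # footprint per HBM stack (assumed)

num_compute = 4
num_hbm = 6

total = num_compute * compute_die_mm2 + io_die_mm2 + num_hbm * hbm_stack_mm2
print(f"Total silicon in package: {total} mm^2")  # 4*750 + 500 + 6*110 = 4160 mm^2

# Broadcom's XDSiP platform is described as supporting more than 6,000 mm^2 of
# silicon and up to 12 HBM stacks, so this hypothetical assembly fits the envelope.
budget_mm2 = 6000
print("Fits XDSiP budget" if total <= budget_mm2 else "Exceeds XDSiP budget")
```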
“Advanced packaging is critical for next-generation XPU clusters as we hit the limits of Moore’s Law,” said Frank Ostojic, senior VP and GM of the ASIC Products Division at Broadcom. “By stacking chip components vertically, Broadcom’s 3.5D platform enables chip designers to pair the right fabrication processes for each component while shrinking the interposer and package size, leading to significant improvements in performance, efficiency, and cost.”
The XPU nomenclature
Intel’s ambitious take on XPUs hasn’t gone very far, as its Falcon Shores platform is no longer proceeding. AMD’s CPU-GPU combo, on the other hand, has been making inroads during the past couple of years, though AMD calls it an accelerated processing unit, or APU. The difference partly stems from industry nomenclature, in which AI-specific XPUs are called custom AI accelerators; in other words, an XPU is the custom chip that provides the processing power to drive AI infrastructure.
Figure 3 MI300A integrates CPU and GPU cores on a single package to accelerate the training of the latest AI models. Source: AMD
AMD’s MI300A combines the company’s CDNA 3 GPU cores and x86-based Zen 4 CPU cores with 128 GB of HBM3 memory to accelerate HPC and AI workloads. El Capitan, a supercomputer housed at Lawrence Livermore National Laboratory, is powered by AMD’s MI300A APUs and is expected to deliver more than two exaflops of double-precision performance when fully deployed.
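As a rough sanity check on the exascale claim, the sketch below estimates how many APUs a two-exaflop FP64 system would need. The per-APU peak rate is an assumed illustrative figure, not AMD’s published specification, and real systems sustain well below peak.

```python
# Rough estimate of APU count needed for a 2-exaflop FP64 system.
# The per-APU peak below is an assumed illustrative figure, not AMD's spec.

target_exaflops = 2.0          # system target (from the article)
per_apu_fp64_tflops = 60.0     # assumed peak FP64 per APU

target_flops = target_exaflops * 1e18
per_apu_flops = per_apu_fp64_tflops * 1e12

apus_needed = target_flops / per_apu_flops
print(f"APUs needed at peak: {apus_needed:,.0f}")   # ~33,333 with these assumptions

# Sustained efficiency is always below peak; at 65% the count grows further,
# which is why such machines deploy tens of thousands of APUs.
print(f"At 65% efficiency: {apus_needed / 0.65:,.0f}")
```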
AI infrastructure increasingly demands specialized compute accelerators interconnected to form massive clusters. While GPUs have become the de facto hardware here, XPUs represent another viable approach to the heavy lifting in AI applications.
XPUs are here, and now it’s time for software to catch up and make effective use of this new processing avenue for AI workloads.
Related Content
- The role of cache in AI processor design
- Top 10 Processors for AI Acceleration at the Endpoint
- Server Processors in the AI Era: Can They Go Greener?
- Four tie-ups uncover the emerging AI chip design models
- Using edge AI processors to boost embedded AI performance