Introduction to the AMD MI300X Accelerator
Sep 04, 2024

We engineers often assume that everyone we talk to understands the jargon, three-letter acronyms (TLAs), and other language shortcuts we drop into conversations when we discuss our work. When you’re passionate about what you do to make the world a better place, it’s easy to forget that you’re speaking to a lay audience.
Here at TensorWave, we drop the term “MI300X” quite a bit. For the part of the world that we at TensorWave aim to improve—AI development—we have staked our success on a specific hardware product from chipmaker AMD. It’s the rival to the current incumbent, NVIDIA’s H100 series GPU. The AMD product we favor is the “AMD Instinct MI300X accelerator.”
So what is this thing that we’re so crazy about? What’s so great about it? And what is an “accelerator” anyway?
We’re glad you asked.
Accelerators Explained
Training an AI model involves immense numbers of mathematical operations performed on immense quantities of data. Any CPU, such as the one in your computer or smartphone, can perform these operations, but given the number of parameters that need to be calculated (and recalculated, and re-recalculated…) in a typical AI model, it would take forever. A CPU works through these low-level, repetitive math operations largely one at a time across a handful of cores—and it has plenty of other work to do besides.
Another application that requires this type of math processing is video graphics rendering, in particular for video games. For many years, computer graphics cards have incorporated a special-purpose chip called a graphics processing unit (GPU). The GPU is designed to perform thousands of these operations in parallel, much faster than the CPU can, so computer-generated video is smoother and more detailed than would otherwise be the case.
Hence, a GPU is known as a “hardware accelerator.”
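To make that concrete, here is a minimal sketch (assuming a PyTorch install, which ships builds for both NVIDIA’s CUDA and AMD’s ROCm) that times the same large matrix multiplication on the CPU and, if an accelerator is present, on the GPU:

```python
import time
import torch

# A large matrix multiplication: the core operation behind both
# graphics rendering and neural-network training.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

start = time.perf_counter()
c_cpu = a @ b
print(f"CPU: {time.perf_counter() - start:.3f} s")

# If an accelerator is present (ROCm and CUDA builds of PyTorch both
# expose it through the same torch.cuda API), run the same operation there.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    _ = a_gpu @ b_gpu              # warm-up run (first launch pays startup costs)
    torch.cuda.synchronize()
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()
    print(f"GPU: {time.perf_counter() - start:.3f} s")
```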
AI Accelerators
With the emergence of deep-learning artificial neural networks (ANNs), the type of AI most commonly used today, developers soon realized that GPUs could be repurposed to train their models. This was helpful, but GPU makers saw an opportunity to cater to the specific needs of AI developers and started selling products designed for AI acceleration. We still call these accelerators “GPUs,” and, in theory, they could still render graphics, but because they cost quite a bit more than regular GPUs, few of them are used that way.
Driven by the explosion in AI development in general and generative AI in particular, recent years have seen chipmakers release ever more powerful AI accelerators. At this writing, the market leaders are NVIDIA, AMD, and Intel, and NVIDIA owns the lion’s share of the market.
AMD’s Instinct MI300X
The MI300X is part of AMD’s Instinct MI300 product line. The MI300 products have a common architecture with multiple “chiplets” joined together with memory and intra-chip communications circuitry into one large device.
The two main MI300 products are the MI300A and MI300X. The MI300A incorporates six accelerator chiplets and three compute chiplets (multi-core CPUs). It is used where both CPU and accelerator tasks need to be performed in close cooperation. The MI300X that forms the basis of the TensorWave AI cloud platform has eight accelerator chiplets and no compute chiplets, an arrangement better suited for AI model training, which depends more on accelerator tasks than CPU tasks.
The eight accelerator chiplets of the MI300X are surrounded by 3D-stacked layers of high-bandwidth memory—192 GB, to be exact. The proximity between memory and the chiplets, coupled with super-fast communications circuitry, enables the MI300X to transfer data in and out of memory at 5.3 TB/s. This speed reduces the bottleneck (and energy consumption) caused by having to shuttle large amounts of data between the two.
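A quick back-of-the-envelope calculation shows why those two numbers matter. This is a rough sketch only: it assumes FP16 weights at 2 bytes per parameter and ignores activations, optimizer state, and the GB/GiB distinction.

```python
# Rough numbers for the MI300X memory system (assumptions: FP16 weights
# at 2 bytes per parameter; ignores activations, KV cache, optimizer state).
HBM_CAPACITY_BYTES = 192e9     # on-package high-bandwidth memory
HBM_BANDWIDTH_BYTES_S = 5.3e12 # peak memory bandwidth

params_that_fit = HBM_CAPACITY_BYTES / 2                       # ~96 billion FP16 parameters
full_sweep_seconds = HBM_CAPACITY_BYTES / HBM_BANDWIDTH_BYTES_S  # ~36 ms to read all of HBM once

print(f"FP16 parameters that fit on one card: ~{params_that_fit / 1e9:.0f}B")
print(f"Time to stream the full 192 GB once:  ~{full_sweep_seconds * 1e3:.0f} ms")
```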
Comparison with Other GPUs
Apples-to-apples comparisons of GPUs from different manufacturers are somewhat difficult to come by. In particular, performance claims from each manufacturer must be taken with a grain of salt because they are often based on ideal conditions that one might rarely encounter in real applications.
The main competitors to the MI300X are:
- NVIDIA’s H100, which has been that company’s flagship AI accelerator for the last couple of years.
- NVIDIA’s B200, which was recently announced and at this writing has not started shipping.
- Intel’s Gaudi 2 and recently announced Gaudi 3.
From an architecture standpoint, the MI300X differs from the H100 and Gaudi 2 in that those are each composed of a single chip. The B200 and Gaudi 3 more closely resemble the MI300X architecture in that they use chiplet strategies.
In terms of memory size, the B200 brings NVIDIA level with the 192 GB of the MI300X; the H100 has 80 GB, and Gaudi 3 will have 128 GB. The 5.3 TB/s memory bandwidth of the MI300X is faster than that of the H100 (3.5 TB/s) and the claimed value of 3.8 TB/s for Gaudi 3. The B200 has a claimed memory bandwidth near 8 TB/s.
Why We Chose the MI300X
Why did we choose the MI300X to build the TensorWave cloud-based AI development platform? The reasons are simple:
- Superior hardware specifications compared to currently available options
- Ready availability
- Scalability (multiple MI300X units can work together on a single AI training task)
- AMD’s ROCm software
In short, with the MI300X, we can offer superior performance today—without waiting for recently announced options to be offered for sale—and without waiting months for NVIDIA to fill an order for the H100 GPU. Furthermore, AMD’s ROCm software makes it easy to migrate your existing AI development projects and take advantage of the scalability of the MI300 platform.
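As an illustration of that last point, here is a minimal sketch of a single training step written in ordinary PyTorch. On the ROCm build of PyTorch, code like this typically runs on an MI300X without modification, because the familiar "cuda" device name maps to the AMD GPU. The model and sizes below are placeholders, not anything TensorWave-specific.

```python
import torch
import torch.nn as nn

# On ROCm builds of PyTorch, "cuda" refers to the AMD GPU, so existing
# CUDA-style training code generally needs no changes to migrate.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batch standing in for real training data.
x = torch.randn(32, 1024, device=device)
target = torch.randint(0, 10, (32,), device=device)

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
optimizer.step()
print(f"One training step ran on: {device}")
```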
About TensorWave
TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.