Published: Jul 18, 2024

Milestone Unlocked: FP8 Achieved on AMD MI300X

We are thrilled to announce that TensorWave now officially supports FP8 on the MI300X! Thanks to the incredible joint effort with our partners at MK1, the Flywheel Inference Engine now leverages FP8, delivering an impressive 1.6x performance boost over FP16 on Llama 3 70B.

[Benchmark charts: Llama3-70B and Llama3-8B throughput, FP8 vs. FP16; input/output token distribution (2048/128), batch size (128)]


Key Highlights:

  1. Performance Boost: The transition to FP8 offers a significant performance boost, delivering up to 1.6x the speed of FP16 on large language models like Llama 3 70B. This enhancement can greatly reduce latency and improve the efficiency of model inference.
  2. Innovative Quantization: MK1's approach to FP8 quantization achieves higher model fidelity than previous methods, producing outputs that more closely match those of the original FP16 model.
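To make the quantization idea concrete, here is a minimal, illustrative sketch of per-tensor FP8 (E4M3) quantization in pure Python. This is not MK1's method; the function names, the simplified mantissa-rounding model, and the omission of subnormals and per-channel scaling are all assumptions for demonstration only.

```python
import math

# Largest finite value representable in the FP8 E4M3 format.
E4M3_MAX = 448.0

def quantize_to_fp8_sim(values, mantissa_bits=3):
    """Simulate per-tensor symmetric FP8 quantization.

    Scales the tensor so its largest magnitude maps to E4M3_MAX,
    then rounds each scaled value to a 3-bit mantissa (a simplified
    model of E4M3 rounding; subnormals are ignored).
    Returns the quantized values and the scale factor.
    """
    amax = max(abs(v) for v in values) or 1.0
    scale = E4M3_MAX / amax
    out = []
    for v in values:
        x = v * scale
        if x == 0.0:
            out.append(0.0)
            continue
        # Spacing between representable values near x: 2^(exponent - mantissa_bits).
        e = math.floor(math.log2(abs(x)))
        step = 2.0 ** (e - mantissa_bits)
        out.append(round(x / step) * step)
    return out, scale

def dequantize(quantized, scale):
    """Map quantized values back to the original range."""
    return [v / scale for v in quantized]
```

With 3 mantissa bits, the worst-case relative rounding error is about 1/16 (6.25%), which is why a good choice of scale factor, and techniques beyond this simple per-tensor scheme, matter so much for preserving model fidelity.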

We invite you to explore these capabilities on the TensorWave Cloud and see how FP8 inference can raise the performance and efficiency of language models in your own workflows.



Stay updated on our progress by signing up for our email updates.


About TensorWave and MK1

TensorWave
TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.

MK1
Engines for the AI Economy. Visit us online at mk1.ai