Published: Apr 22, 2025
Scaling AI Inference on AMD: Insights from Chai, TensorWave, and MK1

At the Beyond CUDA Summit 2025, leaders from Chai, TensorWave, and MK1 took the stage to share real-world lessons on scaling AI inference to millions of users — and why AMD MI300X and MI325X cloud platforms are changing the game for cost, performance, and flexibility.
Here’s the full breakdown of this powerhouse panel with Will Beauchamp, Kyle Bell, and Paul Merolla.👇
🔗 Quick Backgrounds: Meet the Panelists
- Will Beauchamp, Founder of Chai
➔ Built one of the world’s biggest consumer AI platforms, generating 60 trillion tokens per month.
- Kyle Bell, VP of AI at TensorWave
➔ Leads AI infrastructure and MLOps on AMD’s MI300X and MI325X cloud.
- Paul Merolla, CEO of MK1
➔ Ex-Neuralink founding engineer, now building one of the fastest inference platforms in the world.
🌎 The Birth of Chai: Democratizing AI Creation
Will shared how Chai began — before ChatGPT went viral — as a mission to make AI accessible to everyone, not just coders.
Instead of gatekeeping AI behind APIs, Chai built an open social platform where users create and interact with AIs as easily as they upload videos to YouTube.
Today, Chai drives:
- 5M+ active users
- 60 trillion tokens processed monthly
- Teens spending over 90 minutes a day talking to AIs
⚡ Scaling Inference: Why AMD is Winning for Cost & Performance
As Chai scaled to massive traffic, they faced a critical decision: stick with expensive NVIDIA GPUs or find a more cost-efficient alternative.
After rigorous benchmarks:
- AMD MI300X outperformed H100 across key scenarios
- Performance per dollar nearly doubled
- Massive memory on AMD GPUs allowed multi-model hosting on a single chip (see the sketch below)
Result? Chai cut compute costs in half, saving $10M+ per year, without degrading user experience.
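To make the multi-model-hosting point concrete, here is a minimal sketch of serving two models from a single large-memory GPU. It assumes a ROCm build of PyTorch (AMD GPUs appear under the "cuda" device namespace) plus Hugging Face transformers, and the model names are placeholders rather than Chai's actual models.

```python
# Minimal sketch: two models co-resident on one large-memory GPU (e.g., a 192 GB MI300X).
# Assumes ROCm PyTorch and Hugging Face transformers; model names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda:0"  # a single MI300X under ROCm

MODEL_A = "your-org/chat-model-13b"   # placeholder
MODEL_B = "your-org/rerank-model-7b"  # placeholder

# Both models fit in one GPU's HBM, so requests can be routed between them
# without cross-device transfers or extra hosts.
tok_a = AutoTokenizer.from_pretrained(MODEL_A)
model_a = AutoModelForCausalLM.from_pretrained(MODEL_A, torch_dtype=torch.float16).to(device)

tok_b = AutoTokenizer.from_pretrained(MODEL_B)
model_b = AutoModelForCausalLM.from_pretrained(MODEL_B, torch_dtype=torch.float16).to(device)

def generate(model, tok, prompt, max_new_tokens=64):
    inputs = tok(prompt, return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

print(generate(model_a, tok_a, "Hello!"))
print(f"{torch.cuda.memory_allocated(device) / 1e9:.1f} GB allocated on the GPU")
```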
🛠️ How MK1 Optimized Inference Workloads on AMD
Paul Merolla explained how MK1 fine-tuned performance for Chai:
- Advanced quantization and cache optimizations (a quantization sketch follows below)
- Tailored vectorized operations for AMD’s architecture
- Continuous A/B testing to optimize user experience and retention
MK1’s stack delivered 2x gains over standard inference engines — proving AMD GPUs could not just match but beat legacy setups.
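As a rough illustration of the quantization idea, here is a minimal per-channel int8 weight quantization sketch in plain PyTorch. It is a generic example, not MK1's proprietary implementation, and the layer size is arbitrary.

```python
# Minimal sketch of per-channel int8 weight quantization, one of the broad
# techniques mentioned above; generic illustration, not MK1's stack.
import torch

def quantize_per_channel_int8(weight: torch.Tensor):
    """Quantize a [out_features, in_features] weight to int8 with one scale per output channel."""
    max_abs = weight.abs().amax(dim=1, keepdim=True)   # per-output-channel range
    scale = max_abs.clamp(min=1e-8) / 127.0             # map that range onto [-127, 127]
    q = torch.round(weight / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_per_channel_int8(w)
print("mean abs error:", (dequantize(q, scale) - w).abs().mean().item())
# int8 storage halves memory vs fp16, freeing HBM for more KV cache or extra models.
```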
🧠 Inside the MI300X: Why It's a Game-Changer for AI
Kyle Bell (TensorWave) highlighted why AMD's new chips dominate inference:
- More memory means larger models and concurrent workloads
- Lower total cost of ownership vs H100
- Chiplet architecture allows flexible GPU partitioning
- CAG (Cache Augmented Generation) techniques unlock faster, cheaper long-context handling (see the sketch below)
👉 More memory = more efficient RAG, longer contexts, persistent caching, and future-ready AI pipelines.
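Here is a minimal sketch of the cache-augmented generation idea: pay for the long shared context once, keep its KV cache in the GPU's large memory, and reuse it for many short queries. It uses Hugging Face transformers with a placeholder model name; production serving engines do this inside the scheduler rather than in user code.

```python
# Minimal sketch of cache-augmented generation: compute the KV cache for a long
# shared context once, then reuse it for many short queries. Illustrative only;
# the model name is a placeholder.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda:0"
MODEL = "your-org/chat-model-7b"  # placeholder
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16).to(device).eval()

long_context = "...many thousands of tokens of shared documents..."
ctx_ids = tok(long_context, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    ctx_cache = model(ctx_ids, use_cache=True).past_key_values  # paid once, kept in HBM

def answer(question, max_new_tokens=32):
    cache = copy.deepcopy(ctx_cache)  # keep the shared context cache pristine
    ids = tok(question, return_tensors="pt").input_ids.to(device)
    out_tokens = []
    with torch.no_grad():
        for _ in range(max_new_tokens):
            out = model(ids, past_key_values=cache, use_cache=True)
            cache = out.past_key_values
            ids = out.logits[:, -1:].argmax(dim=-1)  # greedy next token
            out_tokens.append(ids.item())
    return tok.decode(out_tokens, skip_special_tokens=True)

print(answer("What does the document say about pricing?"))
```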
🔍 Quant Trading Meets AI: Lessons in Fast Iteration
Will drew parallels between his early algorithmic trading days and Chai’s approach to AI:
- Focus on pipelines, not just models
- 100+ LLMs trained and evaluated daily
- Human preference A/B tests, rather than synthetic benchmarks, to optimize real user satisfaction (see the sketch below)
Small 1% improvements stacked over time, a mindset key to scaling AI at hypergrowth speed.
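As a toy illustration of reading out such a human-preference A/B test (not Chai's actual pipeline), here is a minimal sketch that checks whether a candidate model's head-to-head win rate is really above 50%; the counts are made-up placeholders.

```python
# Minimal sketch of a human-preference A/B readout: users pick the reply they
# prefer between two model variants, and a binomial test asks whether the
# candidate's win rate exceeds 50%. Counts below are placeholders.
from scipy.stats import binomtest

wins_candidate = 5320      # placeholder: votes preferring the candidate model
total_comparisons = 10000  # placeholder: total head-to-head votes

result = binomtest(wins_candidate, total_comparisons, p=0.5, alternative="greater")
win_rate = wins_candidate / total_comparisons

print(f"win rate: {win_rate:.1%}, p-value: {result.pvalue:.4f}")
# Ship the candidate only if the win rate clears a pre-registered threshold;
# small 1% wins like this compound over time.
```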
📢 MK1’s Big Announcement: New Open Source Optimization Library
Paul closed the panel by unveiling an exciting surprise:
MK1 has open-sourced a library that compresses and optimizes multi-GPU inference bandwidth, achieving up to 2x lower latency.
📺 Watch the Full Panel 👉 Scaling AI Inference: Chai, TensorWave & MK1 | Beyond CUDA Summit 2025
🚀 Run Efficient Models on AMD GPUs
Deploy your optimized models on TensorWave’s AMD-powered AI cloud—built for training, inference, and experimentation at scale on MI300X and MI325X GPUs.
About TensorWave
TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.