Published: Apr 22, 2025

Shoulda, Woulda, CUDA: Breaking the GPU Mold with Jay Dawani from Lemurian Labs

At Beyond CUDA Summit 2025, Jay Dawani, CEO of Lemurian Labs, didn’t just give a talk — he threw down a challenge to the entire AI industry:
What if CUDA wasn’t the only path forward?

In a bold, thought-provoking session, Jay shared his journey from OpenAI to founding Lemurian Labs — and why accelerated software, not just chips, is the future of AI.👇

🎓 Jay’s Journey: From Robotics to Revolution

Jay’s background reads like a roadmap for the future of AI:

  • Studied Applied Math, mentored by Geoffrey Hinton
  • Early engineer at OpenAI working on robotics
  • Helped develop self-driving cars, rocket landers, and robotic explorers

But Jay didn’t stop at applications — he saw a much bigger problem brewing deep in the compute stack.

The Harsh Lesson: Building Chips Is Hard — And Not Enough

Lemurian Labs originally set out to build their own chip.
Simulations showed orders of magnitude better performance... but reality hit hard:

  • Hardware alone wouldn’t solve the coming compute challenges
  • The free performance gains from each new chip generation were ending
  • Software had to evolve — fast
"We realized the era of accelerated computing was ending. Now it’s about accelerated software."

🧠 The Big Idea: Reinventing Compilers and Runtimes

Faced with an exploding world of AI models and compute needs, Jay and his team went back to the drawing board:

  • Designed a new compiler and runtime stack for a truly heterogeneous world
  • Built dynamic scheduling systems that optimize memory movement, not just ops (a toy sketch of the idea follows below)
  • Generated optimal kernels for any hardware, not just for CUDA or a single vendor

Their stack doesn’t just compete — it crushes benchmarks on AMD and Nvidia hardware alike.
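To make the "optimize memory, not just ops" idea concrete, here is a minimal, hypothetical sketch; it is not Lemurian Labs' actual compiler or runtime, and every name in it (`Op`, `memory_aware_schedule`, the toy graph) is an illustrative assumption. The sketch schedules a small op graph by greedily picking, among the ops that are ready to run, the one that leaves the lowest memory high-water mark, rather than simply taking ops in declaration order.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    inputs: list   # names of ops whose outputs this op consumes
    out_bytes: int # size of the output buffer this op produces

def memory_aware_schedule(ops):
    """Greedy topological schedule: among ready ops, pick the one that
    keeps resident memory smallest, instead of scheduling ops in order."""
    by_name = {op.name: op for op in ops}
    consumers = {op.name: set() for op in ops}
    for op in ops:
        for dep in op.inputs:
            consumers[dep].add(op.name)

    scheduled, live = [], {}   # live: buffer name -> bytes still resident
    remaining = set(by_name)

    def memory_after(op):
        # Resident memory if we run `op` next: its output becomes live,
        # and inputs whose last consumer is `op` can be freed afterwards.
        freed = sum(by_name[d].out_bytes for d in op.inputs
                    if consumers[d] == {op.name})
        return sum(live.values()) + op.out_bytes - freed

    while remaining:
        ready = [by_name[n] for n in remaining
                 if all(d not in remaining for d in by_name[n].inputs)]
        nxt = min(ready, key=memory_after)   # memory-first, not op-order-first
        scheduled.append(nxt.name)
        live[nxt.name] = nxt.out_bytes
        remaining.remove(nxt.name)
        for dep in nxt.inputs:
            consumers[dep].discard(nxt.name)
            if not consumers[dep]:
                live.pop(dep, None)          # last consumer ran: free the buffer
    return scheduled

# Toy graph: two branches feeding a final combine step.
graph = [
    Op("load_a", [], 512), Op("load_b", [], 512),
    Op("big_intermediate", ["load_a"], 4096),
    Op("small_intermediate", ["load_b"], 128),
    Op("combine", ["big_intermediate", "small_intermediate"], 256),
]
print(memory_aware_schedule(graph))
```

The point of the sketch is only the design choice it illustrates: when the scheduler's cost function tracks live memory rather than op count, the same graph can run with a smaller footprint on whatever hardware sits underneath.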

📊 Performance Highlights: AMD MI300X and Beyond

Testing Lemurian Labs' software on real-world hardware showed jaw-dropping results:

  • 2x faster than ROCm on AMD MI300X
  • 30–40% faster than standard CUDA pipelines on Nvidia GPUs

And they're just getting started.

🏛️ Why Breaking the CUDA Monopoly Matters

For Jay, this isn’t just about performance — it's about unlocking economic growth worldwide:

  • AI agents everywhere, for every person and every company
  • Democratized compute to drive the next Internet-scale boom
  • Tearing down the walls around closed ecosystems like CUDA
"If CUDA stays the moat, we don't get the economic revolution AI could bring."

Jay’s mission is simple yet radical: Open up compute for everyone — or risk losing the biggest opportunity in history.

💬 Final Thoughts: Building a Fairer AI Future

Jay’s story is a reminder that true innovation isn’t just about chasing margins — it’s about betting on bigger ideas:

  • Building systems that scale for billions of agents
  • Making AI affordable and accessible
  • Enabling a future where AI boosts prosperity worldwide

Lemurian Labs isn’t just chasing performance.
They’re chasing freedom, speed, and a new economic future.

📺 Watch the Full Talk 👉 Shoulda, Woulda, CUDA | Jay Dawani at Beyond CUDA Summit 2025

🚀 Run Efficient Models on AMD GPUs

Deploy your optimized models on TensorWave’s AMD-powered AI cloud—built for training, inference, and experimentation at scale on MI300X and MI325X GPUs.

About TensorWave

TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.

Ready to get started? Connect with a Sales Engineer.