Published: Apr 22, 2025
Shoulda, Woulda, CUDA: Breaking the GPU Mold with Jay Dawani from Lemurian Labs

At Beyond CUDA Summit 2025, Jay Dawani, CEO of Lemurian Labs, didn’t just give a talk — he threw down a challenge to the entire AI industry:
What if CUDA wasn’t the only path forward?
In a bold, thought-provoking session, Jay shared his journey from OpenAI to founding Lemurian Labs, and explained why accelerated software, not just chips, is the future of AI. 👇
🎓 Jay’s Journey: From Robotics to Revolution
Jay’s background reads like a roadmap for the future of AI:
- Studied Applied Math, mentored by Geoffrey Hinton
- Early engineer at OpenAI working on robotics
- Helped develop self-driving cars, rocket landers, and robotic explorers
But Jay didn’t stop at applications — he saw a much bigger problem brewing deep in the compute stack.
⚡ The Harsh Lesson: Building Chips Is Hard — And Not Enough
Lemurian Labs originally set out to build their own chip.
Simulations showed orders of magnitude better performance... but reality hit hard:
- Hardware alone wouldn’t solve the coming compute challenges
- The free performance gains that came with each new chip generation were ending
- Software had to evolve — fast
"We realized the era of accelerated computing was ending. Now it’s about accelerated software."
🧠 The Big Idea: Reinventing Compilers and Runtimes
Faced with an exploding world of AI models and compute needs, Jay and his team went back to the drawing board:
- Designed a new compiler and runtime stack for a truly heterogeneous world
- Built dynamic scheduling systems that optimize memory, not just ops
- Generated optimal kernels for any hardware, not just for CUDA or a single vendor
Their stack doesn’t just compete — it crushes benchmarks on AMD and Nvidia hardware alike.
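To make the idea concrete, here is a minimal, purely illustrative sketch in Python of what a hardware-agnostic, memory-aware dispatch layer can look like. Everything in it (the `Op` type, the `KERNELS` registry, the cost heuristic) is a hypothetical stand-in for explanation only, not Lemurian Labs' actual compiler or runtime.

```python
# Illustrative sketch only: a toy scheduler that orders work by estimated
# memory intensity and dispatches to whichever backend kernel is registered.
# All names here are hypothetical, not Lemurian Labs' real stack.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Op:
    name: str
    bytes_moved: int   # estimated memory traffic (bytes)
    flops: int         # estimated arithmetic work


# A toy "kernel registry": one implementation per (op, backend) pair.
# A real stack would generate these kernels; here they are just strings.
KERNELS: Dict[Tuple[str, str], Callable[[Op], str]] = {
    ("matmul", "rocm"):  lambda op: f"hip kernel for {op.name}",
    ("matmul", "cuda"):  lambda op: f"cuda kernel for {op.name}",
    ("softmax", "rocm"): lambda op: f"hip kernel for {op.name}",
    ("softmax", "cuda"): lambda op: f"cuda kernel for {op.name}",
}


def schedule(ops: List[Op], backend: str) -> List[str]:
    """Order ops by memory intensity (a stand-in for 'optimize memory,
    not just ops'), then dispatch each one to whichever backend kernel
    is registered for it."""
    ordered = sorted(ops, key=lambda op: op.bytes_moved / max(op.flops, 1),
                     reverse=True)
    plan = []
    for op in ordered:
        kernel = KERNELS.get((op.name, backend))
        if kernel is None:
            raise ValueError(f"no kernel for {op.name} on backend {backend}")
        plan.append(kernel(op))
    return plan


if __name__ == "__main__":
    graph = [Op("matmul", bytes_moved=8_000_000, flops=64_000_000),
             Op("softmax", bytes_moved=4_000_000, flops=1_000_000)]
    # The same graph runs on either backend without changing user code.
    print(schedule(graph, "rocm"))
    print(schedule(graph, "cuda"))
```

The point of the sketch is the shape of the interface: the model graph is described once, and the backend is a runtime argument rather than something baked into the kernels themselves.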
📊 Performance Highlights: AMD MI300X and Beyond
Testing Lemurian Labs' software on real-world hardware showed jaw-dropping results:
- 2x faster than ROCm on AMD MI300X
- 30–40% faster than standard CUDA pipelines on Nvidia GPUs
And they're just getting started.
🏛️ Why Breaking the CUDA Monopoly Matters
For Jay, this isn’t just about performance — it's about unlocking economic growth worldwide:
- AI agents everywhere, for every person and every company
- Democratized compute to drive the next Internet-scale boom
- Tearing down the walls around closed ecosystems like CUDA
"If CUDA stays the moat, we don't get the economic revolution AI could bring."
Jay’s mission is simple yet radical: Open up compute for everyone — or risk losing the biggest opportunity in history.
💬 Final Thoughts: Building a Fairer AI Future
Jay’s story is a reminder that true innovation isn’t just about chasing margins — it’s about betting on bigger ideas:
- Building systems that scale for billions of agents
- Making AI affordable and accessible
- Enabling a future where AI boosts prosperity worldwide
Lemurian Labs isn’t just chasing performance.
They’re chasing freedom, speed, and a new economic future.
📺 Watch the Full Talk 👉 Shoulda, Woulda, CUDA | Jay Dawani at Beyond CUDA Summit 2025
🚀 Run Efficient Models on AMD GPUs
Deploy your optimized models on TensorWave’s AMD-powered AI cloud—built for training, inference, and experimentation at scale on MI300X and MI325X GPUs.
About TensorWave
TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.
Ready to get started? Connect with a Sales Engineer.