Published: Apr 22, 2025
Shoulda, Woulda, CUDA: Breaking the GPU Mold with Jay Dawani from Lemurian Labs

At Beyond CUDA Summit 2025, Jay Dawani, CEO of Lemurian Labs, didn’t just give a talk — he threw down a challenge to the entire AI industry:
What if CUDA wasn’t the only path forward?
In a bold, thought-provoking session, Jay shared his journey from OpenAI to founding Lemurian Labs, and explained why accelerated software, not just chips, is the future of AI. 👇
🎓 Jay’s Journey: From Robotics to Revolution
Jay’s background reads like a roadmap for the future of AI:
- Studied Applied Math, mentored by Geoffrey Hinton
- Early engineer at OpenAI working on robotics
- Helped develop self-driving cars, rocket landers, and robotic explorers
But Jay didn’t stop at applications — he saw a much bigger problem brewing deep in the compute stack.
⚡ The Harsh Lesson: Building Chips Is Hard — And Not Enough
Lemurian Labs originally set out to build their own chip.
Simulations showed orders of magnitude better performance... but reality hit hard:
- Hardware alone wouldn’t solve the coming compute challenges
- The free performance gains that came with each new chip generation were ending
- Software had to evolve — fast
"We realized the era of accelerated computing was ending. Now it’s about accelerated software."
🧠 The Big Idea: Reinventing Compilers and Runtimes
Faced with an exploding world of AI models and compute needs, Jay and his team went back to the drawing board:
- Designed a new compiler and runtime stack for a truly heterogeneous world
- Built dynamic scheduling systems that optimize memory, not just ops
- Generated optimal kernels for any hardware, not just for CUDA or a single vendor
Their stack doesn’t just compete — it crushes benchmarks on AMD and Nvidia hardware alike.
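To make the idea concrete, here is a minimal, purely illustrative sketch in Python of what a hardware-agnostic, memory-aware dispatch layer can look like. Everything in it (the `Op` type, the `KERNELS` registry, the cost heuristic) is a hypothetical stand-in for explanation only, not Lemurian Labs' actual compiler or runtime.

```python
# Illustrative sketch only: a toy scheduler that orders work by estimated
# memory intensity and dispatches to whichever backend kernel is registered.
# All names here are hypothetical, not Lemurian Labs' real stack.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Op:
    name: str
    bytes_moved: int   # estimated memory traffic (bytes)
    flops: int         # estimated arithmetic work


# A toy "kernel registry": one implementation per (op, backend) pair.
# A real stack would generate these kernels; here they are just strings.
KERNELS: Dict[Tuple[str, str], Callable[[Op], str]] = {
    ("matmul", "rocm"):  lambda op: f"hip kernel for {op.name}",
    ("matmul", "cuda"):  lambda op: f"cuda kernel for {op.name}",
    ("softmax", "rocm"): lambda op: f"hip kernel for {op.name}",
    ("softmax", "cuda"): lambda op: f"cuda kernel for {op.name}",
}


def schedule(ops: List[Op], backend: str) -> List[str]:
    """Order ops by memory intensity (a stand-in for 'optimize memory,
    not just ops'), then dispatch each one to whichever backend kernel
    is registered for it."""
    ordered = sorted(ops, key=lambda op: op.bytes_moved / max(op.flops, 1),
                     reverse=True)
    plan = []
    for op in ordered:
        kernel = KERNELS.get((op.name, backend))
        if kernel is None:
            raise ValueError(f"no kernel for {op.name} on backend {backend}")
        plan.append(kernel(op))
    return plan


if __name__ == "__main__":
    graph = [Op("matmul", bytes_moved=8_000_000, flops=64_000_000),
             Op("softmax", bytes_moved=4_000_000, flops=1_000_000)]
    # The same graph runs on either backend without changing user code.
    print(schedule(graph, "rocm"))
    print(schedule(graph, "cuda"))
```

The point of the sketch is the shape of the interface: the model graph is described once, and the backend is a runtime argument rather than something baked into the kernels themselves.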
📊 Performance Highlights: AMD MI300X and Beyond
Testing Lemurian Labs' software on real-world hardware showed jaw-dropping results:
- 2x faster than ROCm on AMD MI300X
- 30–40% faster than standard CUDA pipelines on Nvidia GPUs
And they're just getting started.
🏛️ Why Breaking the CUDA Monopoly Matters
For Jay, this isn’t just about performance — it's about unlocking economic growth worldwide:
- AI agents everywhere, for every person and every company
- Democratized compute to drive the next Internet-scale boom
- Tearing down the walls around closed ecosystems like CUDA
"If CUDA stays the moat, we don't get the economic revolution AI could bring."
Jay’s mission is simple yet radical: Open up compute for everyone — or risk losing the biggest opportunity in history.
💬 Final Thoughts: Building a Fairer AI Future
Jay’s story is a reminder that true innovation isn’t just about chasing margins — it’s about betting on bigger ideas:
- Building systems that scale for billions of agents
- Making AI affordable and accessible
- Enabling a future where AI boosts prosperity worldwide
Lemurian Labs isn’t just chasing performance.
They’re chasing freedom, speed, and a new economic future.
📺 Watch the Full Talk 👉 Shoulda, Woulda, CUDA | Jay Dawani at Beyond CUDA Summit 2025
🚀 Run Efficient Models on AMD GPUs
Deploy your optimized models on TensorWave’s AMD-powered AI cloud—built for training, inference, and experimentation at scale on MI300X and MI325X GPUs.
About TensorWave
TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.
Ready to get started? Connect with a Sales Engineer.