Hot Takes on AI Compute: Industry Leaders Sound Off at Beyond CUDA 2025

Apr 04, 2025

What do you get when you put the loudest voices in AI hardware on one stage? A panel full of unfiltered opinions, sharp critiques, and honest optimism for what’s next.

At the Beyond CUDA 2025 Summit, a heavyweight crew — Dylan Patel (SemiAnalysis), Anush Elangovan (AMD), Darrick Horton (TensorWave), Eugene Cheah (Featherless AI), and Mark Saroufim (PyTorch) — sat down for a rapid-fire discussion on AI infrastructure, open-source ecosystems, ROCm vs. CUDA, and what the future of compute really looks like.

Here’s a no-fluff breakdown of the boldest takes, biggest tensions, and emerging consensus on the future of AI compute.

ROCm’s Growing Pains — and Rapid Recovery

Dylan Patel didn’t sugarcoat it: when SemiAnalysis first benchmarked the MI300X for training, the experience was rough.

“We filed the most issues AMD got from any PyTorch user for three straight months.”

The performance wasn’t there out of the box. Memory leaks. Broken layers. Missing optimizations. But Patel credits AMD’s engineering team — and especially Anush Elangovan — for responding fast, fixing fast, and shifting focus toward a better dev experience.

Anush owned it:

“We acknowledge where we are. That’s step one. And now we’re executing aggressively — measured in weeks, not years.”

Why TensorWave Went All-In on AMD

TensorWave CEO Darrick Horton addressed a question he gets all the time: why bet the company on AMD?

“When we launched, compute was scarce, pricing was insane, and the market was locked into a closed ecosystem.”

AMD, he said, was the only hardware vendor aligned with TensorWave’s ethos: open source, accessibility, and the desire to break Nvidia’s monopoly. AMD wasn’t perfect, but they were hungry — and credible.

“We don’t want to offer both Nvidia and AMD. We want to fix the problem.”

Open-Source AI: Movement or Mirage?

When it comes to open source, the panel had range.

Eugene Cheah brought the heat:

“There’s a tug-of-war between centralized superclusters and the open army of smaller models. That’s where AMD has the edge — sub-16 node workloads, not 100K GPU mega-clusters.”

Dylan Patel was more skeptical, pointing to ROCm libraries that mirror their CUDA counterparts (think NCCL and RCCL):

“A lot of AMD software is just copied from Nvidia… If you open source something, AMD might just replace the ‘N’ with an ‘R’.”

Anush, ever the diplomat, focused on building from first principles:

“We’re on the other side of the bridge now. We’re not shuttling code anymore — we’re camping out and building from the ground up.”

Beyond Attention… and Even Backprop?

The panel didn’t just stay in the weeds — they zoomed out, too.

What happens after attention? What replaces backprop? Will we even need GPUs as we know them?

Cheah noted:

“By the time we optimize for one architecture, it’s obsolete. R1 came along and broke everything again. We’re always chasing.”

The consensus: backpropagation might eventually get replaced, but no one’s found the breakthrough yet. And when that moment hits — as it did with quantization — the stack will need to adapt fast.

Why Exotic Hardware Isn’t Ready (Yet)

An audience member asked about non-GPU architectures. The answer was nearly unanimous: not yet.

Darrick made it plain:

“New architectures might be great for one paradigm. But what happens when that paradigm shifts? AI is moving too fast. GPUs are the only hardware flexible enough to survive the transition.”

Until someone builds a viable software ecosystem and gains adoption, GPUs — and increasingly, AMD GPUs — are the only game in town.

2025 → 2030: Who’s on Top?

The panel closed with a crystal-ball question: who are the top AI hardware players today, and who will lead in 2030?

Today’s top 3 were obvious:

  1. Nvidia
  2. Google TPU
  3. AMD Instinct

By 2030?

“Number one: AMD.
Number two: AMD.
Number three: AMD.”

Bold prediction. But if ROCm continues maturing, open-source momentum grows, and developer-first thinking stays center stage, it might not be as crazy as it sounds.

TL;DR

  • ROCm’s dev experience still has gaps, but AMD is catching up fast
  • TensorWave is all-in on AMD because of shared values: openness, accessibility, and disruption
  • Open source is where the real flexibility lies — and AMD thrives in sub-100 GPU workloads
  • GPUs remain the most viable hardware for future-proof AI development
  • The next two years? All about training at scale and software maturity

About TensorWave

TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.

Ready to get started? Connect with a Sales Engineer.