Simplifying AI Infrastructure: dstack’s Open Source Alternative to Kubernetes

Apr 22, 2025

At the Beyond CUDA Summit 2025, Andrey Cheptsov, CEO and Founder of dstack, unveiled a bold vision:
Simplify container orchestration for AI teams — without the complexity of Kubernetes or Slurm.

Here’s everything you need to know about this open-source innovation designed to accelerate AI development and deployment.👇

🔧 The Problem: Why Kubernetes and Slurm Fall Short for AI

While Kubernetes and Slurm are widely used to orchestrate workloads, they weren’t built with AI in mind:

  • Kubernetes ➔ Great for DevOps, but too low-level and manual for AI engineers
  • Slurm ➔ Built for HPC, not modern cloud-native AI workflows

Result?
AI teams waste valuable time building internal platforms instead of focusing on models, training, and data.

🛠️ The Solution: dstack — AI-Native Container Orchestration

dstack offers a simple, cloud-agnostic container orchestrator built specifically for AI.

Key features:

  • Works with any accelerator: NVIDIA, AMD, Google TPUs, Intel Gaudi
  • Supports any cloud: Hyperscalers, private clouds, and even on-prem clusters
  • Vendor agnostic: Total freedom over frameworks, data, and models
  • Integrated with TensorWave for high-performance AMD MI300X and MI325X cloud deployments

dstack abstracts away infrastructure complexity — letting AI teams focus only on building and shipping models.

Unified Interfaces for the Entire AI Workflow

dstack provides five simple interfaces to cover all AI team needs:

  • Dev Environments ➔ Spin up remote workspaces instantly from your desktop IDE
  • Tasks ➔ Launch training, fine-tuning, and batch jobs across clouds or on-prem
  • Services ➔ Deploy scalable inference endpoints (e.g., using vLLM, SGLang)
  • Fleets ➔ Manage distributed GPU clusters
  • Volumes ➔ Use persistent storage across runs for checkpoints, caching, and datasets

All controlled by a few YAML specs and a simple CLI:
dstack apply ➔ Done. ✅
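To make that concrete, here is a minimal sketch of the workflow, using a dev environment as the example. The field names follow dstack's documented YAML configuration style, but treat the specific values (file name, IDE, GPU size) as illustrative placeholders rather than a definitive spec:

    # .dstack.yml — minimal dev environment configuration (illustrative)
    type: dev-environment
    ide: vscode          # open the remote workspace from your desktop IDE
    resources:
      gpu: 24GB          # placeholder: any accelerator/memory spec dstack supports

    # Provision it with a single CLI call:
    dstack apply -f .dstack.yml

dstack takes care of finding capacity, provisioning the instance, and attaching your IDE.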

🧠 Real-World Examples: Development to Large-Scale Training

🔹 Dev Environments:
Spin up a remote GPU-powered coding environment from your laptop in minutes.

🔹 Training with Tasks:
Define distributed jobs using any framework (Megatron, DeepSpeed, HuggingFace Accelerate) and let dstack handle cluster provisioning.
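As a rough sketch of what such a task might look like (the config keys follow dstack's YAML format; the script name, node count, and GPU spec are placeholder assumptions):

    # train.dstack.yml — hypothetical multi-node training task
    type: task
    name: train-llm
    nodes: 2                     # dstack provisions and connects both nodes
    commands:
      - pip install -r requirements.txt                    # placeholder dependencies
      - torchrun --nnodes=2 --nproc-per-node=8 train.py    # placeholder entrypoint
    resources:
      gpu: 80GB:8                # eight 80GB GPUs per node (illustrative)

The same pattern applies whether the launcher is torchrun, DeepSpeed, or Accelerate; dstack handles the cluster, not the framework.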

🔹 Inference with Services:
Auto-scale your LLM inference endpoints based on demand — without worrying about infrastructure plumbing.
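A hedged sketch of a service definition: the type, port, and replica keys mirror dstack's documented service format, while the model name, image tag, and scaling targets are purely illustrative assumptions:

    # serve.dstack.yml — hypothetical vLLM inference service
    type: service
    name: llm-endpoint
    image: vllm/vllm-openai:latest              # official vLLM serving image
    commands:
      - vllm serve Qwen/Qwen2.5-7B-Instruct     # placeholder model
    port: 8000
    resources:
      gpu: 24GB                                 # illustrative GPU requirement
    replicas: 1..4                              # scale between 1 and 4 replicas
    scaling:
      metric: rps                               # scale on requests per second
      target: 10

dstack routes traffic to the replicas and scales them within the declared range, so the endpoint grows and shrinks with demand.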

🔹 Persistent Storage:
Cache models, save training checkpoints, and manage data across sessions — cloud and on-prem supported.
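For example, a network volume can be declared once and then mounted into any run. The outline below follows dstack's volume configuration style; the backend, region, and size are placeholder assumptions:

    # volume.dstack.yml — hypothetical persistent volume
    type: volume
    name: checkpoints-vol
    backend: aws                 # placeholder backend
    region: us-east-1            # placeholder region
    size: 200GB

    # Inside a task or service, mount it at a path (illustrative):
    #   volumes:
    #     - name: checkpoints-vol
    #       path: /checkpoints

Checkpoints written to the mounted path survive the run, so a later job can resume from them.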

Built for Flexibility: Cloud, On-Prem, and Hybrid

Whether you run on TensorWave's AMD AI Cloud, AWS, GCP, Azure, or your own GPU servers:

  • Cloud-native ➔ Native integrations with all major providers
  • On-prem friendly ➔ Just register your GPU hosts via SSH (see the fleet sketch below)
  • Hybrid-ready ➔ Combine cloud and on-prem seamlessly
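
A sketch of how on-prem registration might look: the fleet and ssh_config structure follows dstack's documented SSH fleet format, while the user, key path, and host IPs are placeholders:

    # fleet.dstack.yml — hypothetical on-prem GPU fleet
    type: fleet
    name: on-prem-fleet
    ssh_config:
      user: ubuntu                     # placeholder SSH user
      identity_file: ~/.ssh/id_rsa     # placeholder key
      hosts:
        - 10.0.0.1                     # placeholder GPU host IPs
        - 10.0.0.2

Once applied, those hosts show up as capacity that dev environments, tasks, and services can run on, alongside any cloud backends you have configured.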

You get full control — no lock-in, no compromises.

💬 Final Takeaway: Open Source Simplicity for AI Builders

Andrey closed the session by inviting everyone to try out dstack:

  • 100% Open Source
  • Fast-moving development
  • Designed to make AI infrastructure effortless

👉 Explore the dstack GitHub repo and start building smarter, not harder.

The future of AI infrastructure is open, simple, and accelerator-agnostic — and dstack is leading the way. 🚀

📺 Watch the Full Talk 👉 Simplifying Container Orchestration for AI | Beyond CUDA Summit 2025

🚀 Deploy AI Workloads on AMD MI300X and MI325X Cloud 👉 Explore TensorWave’s AI Cloud Solutions for training, inference, and scaling LLMs at cost-effective speeds.

About TensorWave

TensorWave is the AI and HPC cloud purpose-built for performance. Powered exclusively by AMD Instinct™ Series GPUs, we deliver high-bandwidth, memory-optimized infrastructure that scales with your most demanding models—training or inference.

Ready to get started? Connect with a Sales Engineer.