Building Limitless Worlds: TensorWave’s Hardware Meets Genies’ Vision for AI Interoperability
Mar 04, 2025

DeepSeek V3 and DeepSeek R1 have taken the world by storm, providing highly performant open-weight reasoning models to the global AI community. These large (671-billion-parameter) mixture-of-experts (MoE) models are an ideal match for the MI300X and MI325X GPUs hosted on TensorWave, and our customer Genies is putting them to work.
Genies is an AI avatar and games technology company powering the next generation of digital experiences through two core mantras: 1) Anyone can create anything. Genies’ UGC tools let users create AI avatars, fashion, props, behaviors, and experiences. 2) Everything works with everything. The Genies Avatar Framework ensures interoperability across all AI avatars and experiences. Together, these empower individuals to build limitless digital experiences and enable IP owners to create social gaming ecosystems, which Genies calls “Parties”.
Over the past few months, Genies has been working closely with TensorWave to optimize AI-driven experiences using the latest advancements in machine learning, fine-tuning large-scale models to enhance interactivity, automation, and data-driven insights.
Genies is familiar with the DeepSeek family of models, using DeepSeek Coder V2 to enhance human-AI interaction for unstructured data extraction and statistical analytics. This includes using Hugging Face TRL (Transformer Reinforcement Learning) and Microsoft DeepSpeed to conduct distributed fine-tuning of DeepSeek Coder V2 on their own data, all locally on TensorWave hardware. Using vLLM, this fine-tuned model can then be leveraged for query-language auto-generation and automatic statistical analysis to extract user insights and support data-driven decision-making.
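As a rough illustration of this kind of pipeline, the sketch below pairs TRL’s SFTTrainer with a DeepSpeed ZeRO config for multi-GPU fine-tuning. The dataset file, DeepSpeed config path, model revision, and hyperparameters are placeholders for illustration, not Genies’ actual setup.

```python
# A minimal sketch of distributed supervised fine-tuning with TRL + DeepSpeed,
# assuming a hypothetical JSON-lines dataset of instruction/response examples.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset of training examples.
train_dataset = load_dataset("json", data_files="analytics_sft.jsonl", split="train")

config = SFTConfig(
    output_dir="deepseek-coder-v2-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed="ds_zero3_config.json",  # DeepSpeed ZeRO-3 config (placeholder path)
)

trainer = SFTTrainer(
    model="deepseek-ai/DeepSeek-Coder-V2-Instruct",
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```

A run like this would typically be launched with `accelerate launch` or the `deepspeed` CLI so that each GPU in the node gets its own worker process.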
Now, with the release of DeepSeek V3 and DeepSeek R1, Genies is adopting these models for their improved performance and reasoning capabilities. The MI300X’s 192 GB of HBM3 memory per GPU allows these large models to run on a single node of eight GPUs, eliminating the bottlenecks that would arise from sharding one model across multiple nodes. Within the larger family of new DeepSeek models, Genies has also explored DeepSeek-R1-Distill-Llama-70B for conversational AI use cases. This has enabled Genies to integrate DeepSeek-R1 into their current pipelines while ensuring availability for critical workloads and addressing privacy concerns.
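To give a concrete flavor of single-node serving, here is a minimal vLLM sketch that shards a model across all eight GPUs with tensor parallelism. The model ID, prompt, and sampling settings are illustrative only.

```python
# A minimal sketch of single-node serving with vLLM's offline API.
# tensor_parallel_size=8 shards the model across the eight GPUs in one node.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    tensor_parallel_size=8,
)

params = SamplingParams(temperature=0.6, max_tokens=1024)
outputs = llm.generate(
    ["Walk through the reasoning: which grows faster, 2^n or n^10?"], params
)
print(outputs[0].outputs[0].text)
```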
Genies has also been exploring fine-tuning of R1-distilled models using Hugging Face’s Open R1 framework. Open R1 is an exciting tool for replicating DeepSeek’s R1 pipeline and fine-tuning any model into a reasoning model. Genies’ experiments reveal that training LLMs on high-reasoning tasks has become significantly easier. Labeling Chain-of-Thought (CoT) datasets has always been a major bottleneck in the LLM community. The GRPO algorithm (Group Relative Policy Optimization, one of the differentiators of DeepSeek R1) no longer requires labeled CoT datasets, as models can learn reasoning trajectories directly from final results. Their experiments showed that even LLMs as small as 7B parameters can learn decent reasoning trajectories for problems with deterministic rewards from as few as 10k samples.
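To make the “no labeled CoT” point concrete, below is a rough sketch of GRPO training with TRL’s GRPOTrainer, the trainer Open R1 builds on. The reward function scores only the final answer, so no reasoning traces need to be labeled. The dataset file, its “prompt”/“answer” fields, and the hyperparameters are assumptions for illustration, not Genies’ actual configuration.

```python
# A minimal sketch of GRPO fine-tuning with a deterministic reward: the model
# is rewarded only on its final answer, so no labeled CoT traces are required.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def accuracy_reward(completions, answer, **kwargs):
    # Reward 1.0 when the completion contains the reference answer, else 0.0.
    # "answer" is a hypothetical dataset column passed through by the trainer.
    return [1.0 if ref in completion else 0.0
            for completion, ref in zip(completions, answer)]

# Placeholder dataset with "prompt" and "answer" columns.
train_dataset = load_dataset("json", data_files="math_prompts.jsonl", split="train")

config = GRPOConfig(
    output_dir="r1-distill-grpo",
    num_generations=8,  # completions sampled per prompt for group-relative advantages
    per_device_train_batch_size=8,
    bf16=True,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    reward_funcs=accuracy_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```

Because GRPO compares each completion against the others sampled for the same prompt, a simple pass/fail reward like this is enough to steer the model toward useful reasoning trajectories.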
Recent improvements to AMD’s ROCm software stack ensure that even this new fine-tuning pipeline runs efficiently on TensorWave hardware.
TensorWave is excited to continue exploring new models and techniques with Genies, building on the latest and greatest AMD hardware!