Building Large Language Models with the Power of AMD Instinct GPUs and AMD EPYC CPUs
Aug 28, 2024

In the popular consciousness, ChatGPT seemed to come out of nowhere. All of a sudden in 2023, people were talking about large language models (LLMs), computer applications that “hallucinate,” and the important new skill set of “prompting.”
In point of fact, ChatGPT, which is built on OpenAI’s proprietary GPT series of LLMs (GPT-3.5 at launch, later GPT-4), didn’t “come out of nowhere.” It was the culmination of years of research and development at OpenAI to expand and improve on earlier LLMs.
ChatGPT may have begun as a curiosity, but LLMs are becoming increasingly important in research, education, and business as organizations imagine new ways to use these tools to automate content creation.
Training an LLM requires immense amounts of data and computing power. One interesting LLM effort, led by the TurkuNLP research group, is underway at the University of Turku in Finland. Researchers there are attempting to develop and extend open, non-proprietary LLMs to Finnish and other languages using the LUMI supercomputer.
Background: The Challenge of Building Large Language Models
The size and complexity of an LLM depends mainly on the number of parameters in its design. A small machine-learning model designed for a single narrow task might have a few thousand parameters. Because the purpose of an LLM is to use natural language processing (NLP) techniques to process text or spoken prompts and generate reasonable and “natural” responses, the parameter counts are much higher, reaching into the hundreds of millions or billions. GPT-3 has 175 billion parameters, and GPT-4 is rumored to have over 1 trillion.
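To make those counts concrete, a common back-of-envelope rule (not from this case study) says a decoder-only transformer holds roughly 12 × layers × d_model² weights in its attention and feed-forward blocks, plus a token-embedding matrix. The sketch below plugs in GPT-3’s published shape; it is an approximation, not an exact count.

```python
# Rough parameter estimate for a decoder-only transformer.
# The 12 * L * d^2 figure is a standard approximation for the
# attention + feed-forward weights; exact counts differ slightly.
layers, d_model, vocab = 96, 12288, 50257   # GPT-3's published architecture
block_params = 12 * layers * d_model ** 2   # transformer block weights
embed_params = vocab * d_model              # token-embedding matrix
total = block_params + embed_params
print(f"~{total / 1e9:.0f}B parameters")    # prints: ~175B parameters
```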
Compounding the resource requirements is the fact that an LLM has to be trained and then tested multiple times before it is released for production use, and even then, there might be further tweaks as issues are uncovered. All of this activity increases the demand for computing resources.
The LUMI Supercomputer Solution
The LUMI supercomputer, built by Hewlett Packard Enterprise (HPE) and owned by the EuroHPC Joint Undertaking, is hosted at the data center of CSC – IT Center for Science in Finland and is managed by the LUMI consortium. With 2,560 AMD EPYC CPUs and 10,240 AMD Instinct MI250X GPU accelerators, LUMI was just what TurkuNLP needed to power its LLM research.
The application of LUMI to TurkuNLP’s LLM projects brought together researchers and engineers from TurkuNLP, HPE, and AMD to configure LUMI with the right software to take full advantage of the computer’s high-performance CPU and GPU resources.
AMD's Technology: A Closer Look
The AMD CPUs and GPUs powering LUMI are ideal for high-performance computing tasks such as LLM development.
- AMD EPYC server processors, with up to 64 processing cores and 128 threads each, combine high speed and high energy efficiency—both important factors for LLM training.
- AMD Instinct MI250X GPUs feature high throughput, with 95.7 trillion floating-point operations per second (TFLOPS) at 64-bit precision and 383 TFLOPS at 16-bit precision. Because many LLM computational operations do not require high precision, TurkuNLP can take advantage of the higher throughput.
In LUMI, each EPYC CPU manages four GPUs. The LLM development and training software stack divides the computational work among all of these resources so that operations complete as quickly as possible.
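To illustrate the precision point above, here is a minimal mixed-precision training sketch, not TurkuNLP’s actual code: PyTorch’s autocast runs the heavy matrix multiplications in 16-bit precision, which is where the MI250X’s higher throughput pays off, while a gradient scaler protects the loss from underflow. Standard PyTorch runs on AMD GPUs via ROCm, which reuses the torch.cuda API.

```python
# Minimal mixed-precision training loop (illustrative sketch only).
# On ROCm builds of PyTorch, "cuda" devices and torch.cuda.amp map to AMD GPUs.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(1024, 1024).to(device)     # stand-in for a transformer layer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

for step in range(10):                             # toy loop over random batches
    x = torch.randn(32, 1024, device=device)
    optimizer.zero_grad()
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = model(x).pow(2).mean()              # dummy loss for illustration
    scaler.scale(loss).backward()                  # scaled backward pass avoids fp16 underflow
    scaler.step(optimizer)                         # unscales gradients, then steps
    scaler.update()
```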
Project Outcomes: Scaling AI Workloads with LUMI
TurkuNLP is working with BLOOM, currently the largest open-access LLM. With LUMI, the team was able to take the 176-billion-parameter BLOOM model and add Finnish to it, training on 40 billion tokens (the subword units, ranging from single characters to whole words, that LLMs process). By scaling to 192 of LUMI’s 2,560 nodes, the team trained the model in two weeks. This was exceptional performance for training an LLM, where training even a “small” 1-billion-parameter model can take months using conventional hardware.
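Scaling a run across that many nodes requires every process to know its place in the job. Below is a minimal sketch of how that wiring typically looks on a Slurm-managed system such as LUMI; it is an illustrative assumption, not TurkuNLP’s actual launch code.

```python
# Illustrative multi-node setup for a Slurm-managed cluster (not TurkuNLP's code).
# Each LUMI-G node pairs one EPYC CPU with four MI250X cards, and each MI250X
# exposes two GPU dies, so a node typically appears as eight devices.
import os
import torch
import torch.distributed as dist

def init_multinode():
    rank = int(os.environ["SLURM_PROCID"])        # global rank across the whole job
    world_size = int(os.environ["SLURM_NTASKS"])  # total processes, e.g. 192 nodes x 8
    local_rank = int(os.environ["SLURM_LOCALID"]) # this process's rank within its node
    torch.cuda.set_device(local_rank)             # bind the process to one GPU die
    # The "nccl" backend maps to RCCL on AMD GPUs; MASTER_ADDR and MASTER_PORT
    # must be exported by the job script for the default env:// rendezvous.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    return rank, world_size, local_rank
```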
Sampo Pyysalo, a University of Turku Research Fellow with TurkuNLP, is thankful that LUMI was available for this important research and for AMD’s involvement. “AMD did a great job importing the most important software in this area to their platform,” he says. “AMD technical staff also worked closely with us during the LUMI pilot period, helping us get over bottlenecks. For example, a communications overhead issue was resolved using a custom module with libfabric access. That fundamentally changed our ability to continue scaling to several hundred nodes.”
Väinö Hatanpää, a Machine Learning Specialist at CSC, emphasizes the contribution LUMI can make to other LLM research projects. “Large-scale experiments like this are providing really valuable information for us. The optimization of the libfabric connection, for example, gives valuable information for CSC that we can include in our guides, which then helps others to use our systems more efficiently. The computing capacity and the ability to scale further with LUMI enables our customers to push the boundaries of machine learning/AI.”
Many LLMs, whether proprietary or open-access, focus on English, so efforts to extend them to other languages have important implications for spreading the benefits of this technology around the world.
AMD's Role in Advancing AI Research
For several years, the GPU market has been dominated by NVIDIA’s line of GPU products. Recently, however, AMD has been making inroads in this area with compelling, high-performance GPUs of its own, reflecting its commitment to advancing the state of the art in AI research.
AMD understands the importance of collaboration. Without the involvement of AMD and HPE, TurkuNLP’s research would not have come as far or as fast. These partnerships help research teams such as TurkuNLP achieve their goals while giving AMD real-world feedback to inform its future generations of GPU products.
Conclusion
LLMs, with their potential to automate many areas of content creation using natural-language interfaces, can be an important part of organizations’ digital transformations, but much research remains to be done to make them useful and reliable in multiple languages. As the foregoing case study shows, research teams such as TurkuNLP are relying on advanced technology from AMD to “move the needle” on this research.
As the technology spreads and improves, look for further breakthroughs in LLMs powered by high-performance computing platforms such as LUMI.
Further Reading and Resources
- AMD’s case study summary of the TurkuNLP–LUMI project
- AMD’s EPYC server processors
- AMD’s Instinct GPU accelerators
About TensorWave
TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.