AI’s Green Problem

Aug 07, 2024

Let's talk about AI's green problem. Among the numerous other technical and ethical issues that require the attention of AI developers in general, and generative AI developers in particular, there's an AI elephant in the room, and he's hungry: it takes an immense amount of electricity to train and operate AI applications. Given the stated goal of countries around the world to reduce carbon dioxide (CO2) emissions from all sources, AI's thirst for electricity isn't helping.

AI Electricity Requirements By the Numbers

How much electricity does AI need? Hard numbers are difficult to come by, but recent research paints a distressing picture:

  • Data centers of all kinds account for 1 to 1.5% of the world's electricity use.
  • Cooling the servers in a data center can add as much as 50% to the facility's electricity consumption.
  • Researchers estimate that training OpenAI's GPT-3, the predecessor to the much larger GPT-4, consumed about 1.3 gigawatt-hours (1,287 MWh) of electricity and generated 552 metric tons of carbon dioxide equivalent (CO2e) in the process, roughly the annual emissions of more than 100 gas-powered cars (see the back-of-envelope check after this list).
  • In the near future, AI-related electricity consumption could increase by 85 to 134 terawatt-hours per year.
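
For readers who like to check the arithmetic, here is a quick back-of-envelope sketch in Python. The grid carbon intensity and per-car emissions figures are assumptions chosen to match commonly cited estimates, not measurements from the studies themselves.

```python
# Back-of-envelope check of the figures above. All inputs are assumptions
# or the article's own numbers, not new measurements.

TRAINING_ENERGY_MWH = 1_287          # reported estimate for GPT-3 training
GRID_INTENSITY_KG_PER_KWH = 0.429    # assumed average grid carbon intensity
CAR_EMISSIONS_T_PER_YEAR = 4.6       # commonly cited EPA estimate per passenger car

training_co2_tons = TRAINING_ENERGY_MWH * 1_000 * GRID_INTENSITY_KG_PER_KWH / 1_000
car_years = training_co2_tons / CAR_EMISSIONS_T_PER_YEAR

print(f"Estimated training emissions: {training_co2_tons:.0f} t CO2e")  # ~552 t
print(f"Equivalent cars driven for a year: {car_years:.0f}")            # on the order of 100

# Cooling overhead: if cooling adds up to 50% on top of the servers' own draw,
# total facility energy can reach 1.5x the computing energy alone.
print(f"With 50% cooling overhead: up to {TRAINING_ENERGY_MWH * 1.5:,.0f} MWh")
```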

Historically, the bulk of an AI system's energy use came from training the model. With today's large generative AI models, the balance has shifted: training is still expensive, but inference (using the model to do something useful) consumes the larger share over the model's lifetime, in some cases 10 times more. LLM inference also requires more electricity than conventional computing; a generative AI query can use four to five times more electricity than an equivalent search-engine query.
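
To make the training-versus-inference point concrete, here is a minimal sketch. The per-query energy and daily query volume are illustrative assumptions, not published measurements; the point is only how quickly inference can overtake a one-time training cost.

```python
# Illustrative only: when does cumulative inference energy overtake training energy?
# All numbers below are assumptions chosen for the sake of the arithmetic.

TRAINING_ENERGY_KWH = 1_287_000      # ~1.3 GWh, the GPT-3 training estimate above
ENERGY_PER_QUERY_WH = 3.0            # assumed energy for one generative AI query
QUERIES_PER_DAY = 10_000_000         # assumed daily query volume

daily_inference_kwh = QUERIES_PER_DAY * ENERGY_PER_QUERY_WH / 1_000
days_to_match_training = TRAINING_ENERGY_KWH / daily_inference_kwh

print(f"Inference energy per day: {daily_inference_kwh:,.0f} kWh")
print(f"Days until inference has used as much energy as training: "
      f"{days_to_match_training:.0f}")
# At these assumed volumes, inference passes training in about six weeks
# and reaches roughly 10x the training energy after a little over a year.
```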

And none of this includes the energy required to obtain raw materials and use them to manufacture and ship all the new GPUs that will be needed to support the expected explosion of AI models.

What Can Be Done?

What can society do to temper AI's appetite for electricity and reduce its carbon footprint? Quite a bit, it turns out. Efforts to make AI development, training, and inference more efficient fall into a few broad categories.

Data Center Cooling

As noted earlier, the cooling of servers accounts for a large chunk of the energy used in data centers. CPUs and GPUs emit large amounts of waste heat, especially when they work at full capacity. This is why every desktop and laptop computer has at least one cooling fan, and some high-end graphics cards have one or two fans mounted directly on the card.

Most data centers rely on forced-air cooling (air conditioning) to cool the servers. But newer hardware and data center designs incorporate liquid cooling, pumping a non-conducting oil through the servers, which is much more efficient. This approach requires new server hardware designs and new infrastructure in the data center, so it may be some time before it becomes common.

New Chip Designs

A wide range of approaches is being attempted to increase the energy efficiency of GPUs and CPUs. Among them:

  • Wafer-scale chips: Some new chips occupy an entire silicon wafer, which measures roughly 12 inches (300 mm) across. The newest Wafer-Scale Engine from Cerebras uses various tricks to cut down on the amount of computing needed for LLM inference, reducing energy requirements by 30% or more. However, these chips are intended for supercomputers and aren't likely to find their way into standard servers.
  • Photonic chips: Most microchips do their work by moving electrons around. Using photons, that is, light, instead of electrons could reduce the energy needed to move information around the chip. Some research groups have demonstrated the feasibility of this approach, but it's still in the research stage with no commercial products yet.
  • Neuromorphic chips: The artificial neural networks (ANNs) that power most AI applications mimic the behavior of real neurons by modeling them in software. The hardware that runs this software behaves nothing like a biological neural network, which can accomplish the same tasks as an ANN faster and with far less energy. Research on neuromorphic chips aims to mimic, in silicon, the behavior of biological neurons. Again, this work is at an early research stage.

Model Design

Another way to make AI more efficient is through better model design and training. LLM size (expressed as the number of parameters) has grown exponentially in recent years, with OpenAI's GPT-4 model rumored to have over 1 trillion parameters. More parameters mean more compute resources and more energy needed for training.

But recent research shows that "small language models" (SLMs), with parameter counts that top out in the low billions, can perform as well as their larger counterparts on many tasks. The key is to train them on better-curated data rather than on data scraped indiscriminately from the internet.
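
As an illustration of what "better-curated" can mean in practice, here is a minimal, hypothetical filtering pass. The heuristics and thresholds are invented for the example and are far simpler than what production curation pipelines actually do.

```python
import hashlib

def curate(documents, min_words=50, max_symbol_ratio=0.3):
    """Toy curation pass: drop exact duplicates, very short documents,
    and documents that are mostly non-alphabetic symbols.
    The thresholds here are illustrative, not recommendations."""
    seen = set()
    for text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue                      # exact duplicate
        seen.add(digest)

        if len(text.split()) < min_words:
            continue                      # too short to be useful training text

        alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
        if 1 - alpha / max(len(text), 1) > max_symbol_ratio:
            continue                      # mostly markup or boilerplate noise

        yield text

# Usage: curated = list(curate(raw_corpus))
```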

Another approach to improving model design uses less-precise floating-point numbers to represent the relationships between neurons in the ANN. Standard computers represent floating-point numbers (as opposed to integers) with 64 bits, but research has found that AI models work just as well—and more efficiently—with less precision (that is, fewer bits), and modern GPUs support eight-bit or even four-bit numbers.
Taking this to its logical extreme, some researchers propose one-bit LLMs. So far, however, no mainstream GPUs support this approach, so it’s not a short-term solution.
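
To show why fewer bits save memory (and with it, data movement and energy), here is a minimal NumPy sketch of simple absolute-maximum 8-bit quantization of a weight matrix. Real GPU formats (FP8, per-channel scales, and so on) are far more sophisticated, so treat this only as an illustration of the idea.

```python
import numpy as np

def quantize_absmax_int8(weights: np.ndarray):
    """Map float32 weights into int8 using a single absmax scale."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4096, 4096)).astype(np.float32)  # one layer's weights

q, scale = quantize_absmax_int8(w)
error = np.abs(w - dequantize(q, scale)).mean()

print(f"float32 size: {w.nbytes / 2**20:.0f} MiB")  # 64 MiB
print(f"int8 size:    {q.nbytes / 2**20:.0f} MiB")  # 16 MiB, a 4x reduction
print(f"mean absolute rounding error: {error:.4f}")
```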

GPU Architecture

GPU manufacturers are doing their best to maximize the efficiency of their products. The latest flagship GPUs from AMD, Intel, and NVIDIA minimize the distance between the compute modules and high-bandwidth memory and use other design tricks that reduce power requirements while they raise performance.

How TensorWave Does Its Part

Most of the solutions described above involve exotic or experimental approaches that can't be applied today. SLMs are an intriguing possibility, but gathering and vetting "good" data to train them on requires more effort than today's indiscriminate data-gathering approaches.

This leaves incremental improvements in GPU efficiency and performance. As the leading AI cloud platform provider using AMD's MI300X GPUs, TensorWave built its service from the ground up around that accelerator, standardizing on the MI300X for maximum efficiency and scalability. Coupled with AMD's ROCm AI development software, which helps developers optimize training and inference tasks on the MI300 platform, TensorWave offers the most efficient AI development platform available today.
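
As a practical note, ROCm builds of PyTorch expose AMD GPUs through the same device API that CUDA code uses, so most existing training scripts need little or no change. The snippet below is a minimal sketch of what that looks like; the model and tensor shapes are placeholders, not a recommended configuration.

```python
import torch

# On a ROCm build of PyTorch, AMD GPUs (such as the MI300X) appear through
# the familiar "cuda" device type, so existing code paths mostly carry over.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("ROCm/HIP version:", getattr(torch.version, "hip", None))
print("Device:", torch.cuda.get_device_name(0) if device.type == "cuda" else "cpu")

# Placeholder model and data, just to show that the code is unchanged.
model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(32, 1024, device=device)

# Mixed precision, one of the efficiency levers discussed above, also works
# the same way here as it does on CUDA hardware.
with torch.autocast(device_type=device.type):
    y = model(x)
print("Output shape:", tuple(y.shape))
```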

To learn more about how TensorWave can help you build efficient AI applications, contact TensorWave today.

About TensorWave

TensorWave is a cutting-edge cloud platform designed specifically for AI workloads. Offering AMD MI300X accelerators and a best-in-class inference engine, TensorWave is a top choice for training, fine-tuning, and inference. Visit tensorwave.com to learn more.