Do you need GPUs for generative AI systems?

Often, I catch on to trends by looking for common patterns in the questions reporters ask me. In many instances, they are much more in touch with the market than I am, and they are a good data point. Take the calls that I’ve been getting about what problems may arise if there is a graphics processing unit (GPU) shortage.

First, if a shortage does happen, it likely won’t last long. Second, other viable options should be considered. Of course, the angle is doom and gloom, with the fear that businesses won’t be able to take advantage of the generative AI revolution if they can’t get these processors, either for use with on-premises systems or in the cloud and on demand.

Fake problem?

I’m the first to admit that generative AI systems are complex and processor-intensive. Thus, the assumption is that they must rely on highly specialized hardware to perform tasks that were once the exclusive domain of human imagination. People figure that generative AI needs GPUs or even more specialized processing such as quantum computing.

Are those assumptions always right? Is this another specialized system where specialized components are needed at very specialized prices?

GPUs were initially developed for rendering graphics in video games but have become instrumental in AI due to their highly parallel structure. They can perform thousands of operations simultaneously. This aligns perfectly with the tasks required by neural networks, the critical technology in generative AI. That’s a technical fact that people designing and building generative AI systems (like yours truly) should carefully consider.
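To make the parallelism point concrete, here is a minimal sketch in plain Python (no GPU required) of a single fully connected layer. The function names and the toy weights are illustrative assumptions, not taken from any framework. The thing to notice is that each output value is an independent dot product, so the work can be done in any order or all at once, which is the property GPU cores exploit.

```python
from concurrent.futures import ThreadPoolExecutor


def neuron_output(weights_row, inputs):
    """One neuron: a dot product of its weights with the inputs."""
    return sum(w * x for w, x in zip(weights_row, inputs))


def dense_forward(weights, inputs):
    """Sequential version: compute one neuron after another."""
    return [neuron_output(row, inputs) for row in weights]


def dense_forward_parallel(weights, inputs, workers=4):
    """Same math, but neurons are handed to a pool of workers.

    No neuron depends on another neuron's result, so the order of
    execution does not matter. A GPU applies the same idea with
    thousands of hardware lanes instead of a handful of threads.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: neuron_output(row, inputs), weights))


# Toy layer: 3 neurons, 4 inputs each (made-up numbers).
weights = [
    [1.0, 0.0, -1.0, 2.0],
    [0.5, 0.5, 0.5, 0.5],
    [0.0, 1.0, 0.0, -1.0],
]
inputs = [1.0, 2.0, 3.0, 4.0]

sequential = dense_forward(weights, inputs)
parallel = dense_forward_parallel(weights, inputs)
```

The threads here only illustrate that the neurons are independent; they are not expected to make pure Python run faster. Both versions return the same list of three values.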

Tensor Processing Units (TPUs), on the other hand, are Google’s custom-developed application-specific integrated circuits (ASICs), originally designed for TensorFlow, an open-source machine-learning framework that’s been around for a while. TPUs speed up machine learning because they are tailored for forward and backward propagation, the processes used to train neural networks. I don’t view TPUs as being as much of an issue as GPUs when it comes to cost. However, the two are often lumped together, so TPUs are worth a mention here.

Those of you who build and deploy these systems know that no matter what AI framework you’re using, most of the processing and time is spent training the models from gobs and gobs of data. For instance, consider large language models such as OpenAI’s GPT-4 or Google’s BERT, whose parameter counts range from hundreds of millions into the billions. Training such models without specialized processors could take an impractical amount of time.
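A rough back-of-the-envelope calculation shows why. The sketch below uses the widely quoted rule of thumb that training a dense transformer takes about 6 floating-point operations per parameter per training token. The model size, token count, and sustained throughput figures are hypothetical placeholders chosen only for illustration; they are not measurements of any real model or chip.

```python
SECONDS_PER_DAY = 86_400


def training_days(parameters, tokens, sustained_flops_per_second):
    """Estimate wall-clock training time in days.

    Uses the rule-of-thumb approximation: total FLOPs ~ 6 * N * D,
    where N is the parameter count and D is the number of training tokens.
    """
    total_flops = 6 * parameters * tokens
    return total_flops / sustained_flops_per_second / SECONDS_PER_DAY


# Hypothetical model: 1 billion parameters trained on 20 billion tokens.
params = 1e9
tokens = 20e9

# Hypothetical sustained throughputs (assumptions, not benchmarks).
one_slow_device = 1e12      # 1 TFLOP/s sustained
one_fast_device = 100e12    # 100 TFLOP/s sustained

slow_days = training_days(params, tokens, one_slow_device)
fast_days = training_days(params, tokens, one_fast_device)

print(f"slow device: {slow_days:,.0f} days")
print(f"fast device: {fast_days:,.1f} days")
```

With these made-up inputs, the gap is a factor of 100: several years on the slow device versus about two weeks on the fast one. For a much smaller model, the same arithmetic can come out in hours on either device.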

Are specialized processors always needed?

GPUs greatly enhance performance, but they do so at a significant cost. Also, for those of you tracking carbon points, GPUs consume notable amounts of electricity and generate considerable heat. Do the performance gains justify the cost?

CPUs are the most common type of processors in computers. They are everywhere, including in whatever you’re using to read this article. CPUs have far fewer cores than GPUs.

However, they have sophisticated control units and can execute a wide range of instructions, so they can perform a wide variety of tasks. This versatility means they can handle many kinds of AI workloads, including generative AI.

CPUs can be used to prototype new neural network architectures or test algorithms. They can be adequate for running smaller or less complex models. Smaller models are what many businesses are building right now (and will be for some time), and CPUs are sufficient for the use cases I’m currently hearing about.

How much do you really need to pay?

CPUs are more cost-effective in terms of initial investment and power consumption for smaller organizations or individuals who have limited resources. However, even for enterprises with many resources, they still may be the more cost-effective choice.
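One way to frame the question is as simple arithmetic: a faster processor is only cheaper overall if its speedup exceeds its price premium. The sketch below assumes cost is just hours multiplied by an hourly rate. The rates and speedups are hypothetical, and real pricing would also include storage, networking, and idle time.

```python
def job_cost(hours, hourly_rate):
    """Total cost of one job: runtime multiplied by the hourly price."""
    return hours * hourly_rate


def cheaper_option(cpu_hours, cpu_rate, gpu_rate, gpu_speedup):
    """Return 'cpu', 'gpu', or 'tie' for the lower total job cost.

    gpu_speedup is how many times faster the GPU finishes the same job.
    """
    cpu_cost = job_cost(cpu_hours, cpu_rate)
    gpu_cost = job_cost(cpu_hours / gpu_speedup, gpu_rate)
    if cpu_cost < gpu_cost:
        return "cpu"
    if gpu_cost < cpu_cost:
        return "gpu"
    return "tie"


# Hypothetical prices: the GPU instance costs 8x the CPU instance per hour.
CPU_RATE = 0.50   # dollars per hour (assumed)
GPU_RATE = 4.00   # dollars per hour (assumed)

# Small model: the GPU is only 3x faster, so the CPU wins on cost.
small_model = cheaper_option(cpu_hours=10, cpu_rate=CPU_RATE,
                             gpu_rate=GPU_RATE, gpu_speedup=3)

# Large model: the GPU is 40x faster, so the GPU wins on cost.
large_model = cheaper_option(cpu_hours=10, cpu_rate=CPU_RATE,
                             gpu_rate=GPU_RATE, gpu_speedup=40)
```

In this simplified model the break-even point is where the speedup equals the price ratio (8x here). Below that, the cheaper processor also gives the cheaper job, as long as you can wait for the result.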

Also, AI is evolving. Recent advances in AI algorithms include SLIDE (Sub-Linear Deep Learning Engine). Its developers claim it can train deep neural nets faster on CPUs than on GPUs under certain conditions by using hashing techniques and reducing memory access costs.
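As a rough illustration of the hashing idea (not the SLIDE implementation itself), the sketch below uses random-hyperplane signatures, one common form of locality-sensitive hashing, to pick a small set of neurons whose weight vectors likely point in a similar direction to the input, and computes only those. All names, sizes, and the use of a single hash table are simplifying assumptions.

```python
import random
from collections import defaultdict


def signature(vector, hyperplanes):
    """Random-hyperplane hash: one bit per hyperplane (sign of the dot product)."""
    return tuple(
        sum(h * v for h, v in zip(plane, vector)) >= 0 for plane in hyperplanes
    )


def build_table(weights, hyperplanes):
    """Bucket neuron indices by the signature of their weight vectors."""
    table = defaultdict(list)
    for index, row in enumerate(weights):
        table[signature(row, hyperplanes)].append(index)
    return table


def sparse_forward(weights, table, hyperplanes, inputs):
    """Compute outputs only for neurons that share the input's bucket."""
    active = table.get(signature(inputs, hyperplanes), [])
    return {i: sum(w * x for w, x in zip(weights[i], inputs)) for i in active}


rng = random.Random(0)          # fixed seed so the sketch is repeatable
DIM, NEURONS, BITS = 8, 200, 4  # toy sizes (assumed)

hyperplanes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]
weights = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NEURONS)]
table = build_table(weights, hyperplanes)

# Use an input that matches neuron 0's direction, so neuron 0 must be selected.
inputs = [2.0 * w for w in weights[0]]
outputs = sparse_forward(weights, table, hyperplanes, inputs)

print(f"computed {len(outputs)} of {NEURONS} neurons")
```

The saving comes from skipping most of the dot products, which suits a CPU’s strengths better than brute-force dense math. A production system would typically use several hash tables and refresh them as the weights change during training; this sketch skips both.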

Then there are field-programmable gate arrays (FPGAs). These chips can be programmed after manufacturing to perform specific tasks, such as AI, much more efficiently. In addition, associative processing units (APUs) specialize in pattern recognition and can handle associative memory tasks, making certain types of neural network applications run faster.

There are many instances where non-GPU processors are much more cost-effective. So why is the answer always GPUs when it comes to generative AI or just AI in general? I’m not sure it needs to be.

I suspect enterprises will spend millions of dollars more than they need to because they feel that the performance gains justify the cost. This spending will cover GPU consumption within public clouds, on premises, and on some edge computers.

The call-out here is not to limit the use of GPUs but to consider what you really need for your specific use case. Most generative AI applications will be small tactical deployments that won’t warrant the cost and the carbon impact of GPUs.

The core job of systems architects, cloud architects, and now generative AI architects is to find the most cost-optimized solution. What configuration of technology will cost the least and provide the most business value at the same time? Perhaps generative AI is a new area of development where we can make better and more pragmatic choices. Don’t just follow the hype.
