Google unveils dual TPU strategy for training and inference

Google Cloud Next ’26 in progress in Las Vegas. Picture: Google

Mathures Paul
Published 24.04.26, 12:07 PM

Google is separating the tasks of training artificial intelligence and handling inference work into distinct processors. The move will allow the company to take on Nvidia in AI hardware.

Alphabet’s Google Cloud division said its eighth generation of custom-built AI chips, or tensor processing units (TPUs), will be split in two. TPU 8t is geared towards model training, while TPU 8i is designed to run AI services after they have been created, a stage known as inference. Demand for inference is gaining momentum as businesses embrace AI agents capable of writing software and performing other tasks. Both chips will become available later this year.
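To make the training-versus-inference distinction concrete, here is a minimal sketch in JAX, Google's open-source machine-learning framework that runs on TPUs. The toy model, its names and the mapping of steps to chips are purely illustrative, not taken from Google's announcement.

```python
# Illustrative sketch only: a toy JAX model showing why training and
# inference stress hardware differently. The model and names here are
# invented for this example.
import jax
import jax.numpy as jnp

def predict(params, x):
    # Inference: a single forward pass, the kind of work a
    # serving-oriented chip such as TPU 8i is aimed at.
    w, b = params
    return jnp.dot(x, w) + b

def loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

@jax.jit
def train_step(params, x, y, lr=0.01):
    # Training: forward pass plus gradient computation and a weight
    # update on every batch, the heavier, throughput-bound work a
    # training-oriented chip such as TPU 8t is aimed at.
    grads = jax.grad(loss)(params, x, y)
    w, b = params
    gw, gb = grads
    return (w - lr * gw, b - lr * gb)

key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (4, 1))
b = jnp.zeros((1,))
x = jax.random.normal(key, (8, 4))
y = jnp.ones((8, 1))

params = train_step((w, b), x, y)    # training-style step
preds = jax.jit(predict)(params, x)  # inference-style step
```

The contrast is the point: a training step pays for the forward pass, the gradients and a weight update on every batch, while serving only ever runs the forward pass, so a chip built for inference can prioritise latency and efficiency over raw training throughput.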

At the Google Cloud Next event, the company also announced a $750 million fund to help boost corporate AI adoption and showcased tools for building AI agents.

Google is one of the most successful makers of in-house AI chips in an industry dominated by Nvidia. TPUs have become wildly popular in Silicon Valley, and Google is trying to build on that momentum with the latest versions.

“With the rise of AI agents, we determined the community would benefit from chips individually specialised to the needs of training and serving,” Amin Vahdat, a Google senior vice-president and chief technologist for AI and infrastructure, said in a blog post.

Most leading technology companies are pursuing custom semiconductor development for AI to maximise efficiency, while also addressing specialised use cases. Apple, for example, has included Neural Engine AI components in its in-house iPhone chips for years. Microsoft announced a second-generation AI chip in January. A few days ago, Meta said it is working with Broadcom to develop multiple versions of AI processors.

Google has been early to the field. In 2015, the company began using processors it had designed for running AI models.

The training chip, TPU 8t, can be combined into groups of 9,600 semiconductors. Systems of this scale require massive amounts of power, so operators need more efficient chips to make the best use of limited electricity. TPU 8t delivers 124 per cent more performance per watt than the preceding generation, with TPU 8i providing a gain of 117 per cent.
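Taken at face value, those figures mean the new chips are a little more than twice as efficient as their predecessors. A quick back-of-the-envelope check, assuming the percentages are gains relative to the seventh-generation TPUs, normalised to 1x here:

```python
# Back-of-the-envelope reading of the quoted efficiency figures.
# Assumption: the percentages are gains over the preceding TPU
# generation, which is normalised to 1x.
prev_gen = 1.0

tpu_8t = prev_gen * (1 + 1.24)  # "124 per cent more" -> 2.24x
tpu_8i = prev_gen * (1 + 1.17)  # "117 per cent more" -> 2.17x

print(f"TPU 8t: {tpu_8t:.2f}x performance per watt vs last generation")
print(f"TPU 8i: {tpu_8i:.2f}x performance per watt vs last generation")
```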

The architecture is designed “to deliver the massive throughput and low latency needed to concurrently run millions of agents cost-effectively,” Sundar Pichai, chief executive of Google parent Alphabet, wrote in a blog post.
