Amazon Plans to Rival Nvidia With Its Own AI Chips

From Matt Day, Ian King and Dina Bass, published at Sun Nov 24 2024

In a bland north Austin neighborhood dominated by anonymous corporate office towers, Amazon.com Inc. engineers are toiling away on one of the tech industry’s most ambitious moonshots: loosening Nvidia Corp.’s grip on the $100-billion-plus market for artificial intelligence chips.

Amazon’s utilitarian engineering lab contains rows of long work benches overlooking the Texas capital’s mushrooming suburbs. The place is kind of a mess. Printed circuit boards, cooling fans, cables and networking gear are strewn around workstations in various states of assembly, some muddied with the thermal paste used to connect chips to the components that keep them from overheating. There’s a bootstrapping vibe you’d expect to see at a startup not a company with a market cap exceeding $2 trillion.

The engineers who work here think nothing of running to Home Depot for a drill press and are happy to learn subjects outside their area of expertise if doing so will speed things up. Years into a scramble to create machine learning chips from scratch, they have found themselves on the hook to roll out an Nvidia fighter as quickly as they can. This is not about raw horsepower. It’s about building a simple, reliable system that can quickly turn Amazon data centers into humongous AI machines.

Rami Sinno, a gregarious Lebanese-born engineer who has worked in the chip industry for decades, is in charge of chip design and testing. He helped create the first two generations of Amazon AI semiconductors and is now rushing to get the latest iteration, Trainium2, running reliably in data centers by the end of the year. “What keeps me up at night is, how do I get there as quickly as possible,” Sinno says.

In the past two years, Nvidia has transformed from a niche chipmaker to the main supplier of the hardware that enables generative AI, a distinction that has made the company the world’s largest by market value. Nvidia processors cost tens of thousands of dollars apiece and, thanks to overwhelming demand, are hard to get hold of. Last week, after reporting earnings, the chipmaker told investors that demand for its latest hardware will outstrip supply for several quarters — deepening the crunch.

Nvidia’s biggest customers — cloud providers like Amazon Web Services, Microsoft Corp.’s Azure and Alphabet Inc.’s Google Cloud Platform — are eager to reduce their reliance on, if not replace, Nvidia chips. All three are cooking up their own silicon, but Amazon, the largest seller of rented computing power, has deployed the most chips to date.

In many ways, Amazon is ideally situated to become a power in AI chips. Fifteen years ago, the company invented the cloud computing business and then, over time, started building the infrastructure that sustains it. Reducing its reliance on one incumbent after another, including Intel Corp., Amazon ripped out many of the servers and network switches in its data centers and replaced them with custom-built hardware. Then, a decade ago, James Hamilton, a senior vice president and distinguished engineer with an uncanny sense of timing, talked Jeff Bezos into making chips.

When OpenAI’s ChatGPT kicked off the generative AI age two years ago, Amazon was widely considered an also-ran, caught flat-footed and struggling to catch up. It has yet to produce its own large language model that is seen as competitive with the likes of ChatGPT or Claude, built by Anthropic, which Amazon has backed to the tune of $8 billion. But the cloud machinery Amazon has built — the custom servers, switches, chips — has positioned Chief Executive Officer Andy Jassy to open an AI supermarket, selling tools for businesses that want to use models built by other outfits and chips for companies that train their own AI services.

After almost four decades in the business, Hamilton knows taking Amazon’s chip ambitions to the next level won’t be easy. Designing reliable AI hardware is hard. Maybe even harder is writing software capable of making the chips useful to a wide range of customers. Nvidia gear can smoothly handle just about any artificial intelligence task. The company is shipping its next-generation chips to customers, including Amazon, and has started to talk up the products that will succeed them a year from now. Industry observers say Amazon isn’t likely to dislodge Nvidia anytime soon.

Still, time and again, Hamilton and Amazon’s teams of engineers have demonstrated their capacity to solve big technical problems on a tight budget. “Nvidia is a very, very competent company doing excellent work, and so they’re going to have a good solution for a lot of customers for a long time to come,” Hamilton says. “We’re strongly of the view that we can produce a part that competes with them toe to toe.”

Hamilton joined Amazon in 2009 after stints at International Business Machines Corp. and Microsoft. An industry icon who got his start repairing luxury cars in his native Canada and commuted to work from a 54-foot boat, Hamilton signed on at an auspicious time. Amazon Web Services had debuted three years earlier, singlehandedly creating an industry for what came to be known as cloud computing services. AWS would soon start throwing off gobs of cash, enabling Amazon to bankroll a number of big bets.

Back then, Amazon built its own data centers but equipped them with servers and network switches made by other companies. Hamilton spearheaded an effort to replace them with custom hardware, starting with servers. Since Amazon would be buying millions of them, Hamilton reckoned he could lower costs and improve efficiency by tailoring the devices for his growing fleet of data centers and leaving out features that AWS didn’t need.

The effort was successful enough that Jassy — then running AWS — asked what else the company might design in-house. Hamilton suggested chips, which were gobbling up more and more tasks that had previously been handled by other components. He also recommended that Amazon use the energy-efficient Arm architecture that powers smartphones, a bet that the technology’s ubiquity, and developers’ growing familiarity with it, could help Amazon displace the Intel chips that had long powered server rooms around the world.

“All paths lead to us having a semiconductor design team,” he wrote in a proposal presented to Bezos in August 2013. A month later, Hamilton, who likes to hang out with startups and customers in the late afternoon, met Nafea Bshara for a drink at Seattle’s Virginia Inn pub.  

An Israeli chip industry veteran who relocated to the San Francisco Bay area in the early 2000s, Bshara co-founded Annapurna Labs, which he named for the Nepalese peak. (Bshara and a co-founder had intended to summit the mountain before founding the startup. But investors were eager for them to get to work, and they never made the trip.) 

The stealthy startup set out to build chips for data centers at a time when most of the industry was fixated on mobile phones. Amazon commissioned processors from Annapurna and, two years later, acquired the startup for a reported $350 million. It was a prescient move. 

Bshara and Hamilton started small, a reflection of their shared appreciation for utilitarian engineering. Back then, each data center server reserved a portion of its horsepower to run control, security and networking features. Annapurna and Amazon engineers developed a card, called Nitro, that vacuumed those functions off the server entirely, giving customers access to its full power.

Later, Annapurna brought Hamilton’s Arm general-purpose processor to life. Called Graviton, the product operated more cheaply than rival Intel gear and made Amazon one of the 10 biggest customers of Taiwan Semiconductor Manufacturing Co., the titan that produces chips for much of the industry.  

Amazon brass had by then grown confident Annapurna could excel even in unfamiliar areas. “You’ll find a lot of companies are very good in CPU, or very good in networking,” Bshara says. “It’s very rare to find the teams that are good in two or three or four different domains.”

While Graviton was in development, Jassy asked Hamilton what other things Amazon might make itself. In late 2016, Annapurna deputized four engineers to explore making a machine learning chip. It was another timely bet: A few months later, a group of Google researchers published a seminal paper proposing a process that would make generative AI possible. 

The paper, titled “Attention is All You Need,” introduced transformers, a software design principle that helps artificial intelligence systems identify the most important pieces of training data. It became the foundational method behind systems that can make educated guesses at the relationships between words and create text from scratch.

At about this time, Rami Sinno was working for Arm Holdings Plc in Austin and coaching his school-age son through a robotics competition. The team built an app that used machine learning algorithms to pore over photos and detect the algae blooms that periodically foul Austin’s lakes in the summer. Impressed by what kids could do with little more than a laptop, Sinno realized a revolution was coming. He joined Amazon in 2019 to help lead its AI chipmaking efforts.  

The unit’s first chip was designed to power something called inference — when computers trained to recognize patterns in data make a prediction, such as whether a piece of email is spam. That component, called Inferentia, rolled out to Amazon’s data centers by December 2019, and was later used to help the Alexa voice assistant answer commands. Amazon’s second AI chip, Trainium1, was aimed at companies looking to train machine learning models. Engineers also repackaged the chip with components that made it a better fit for inference, as Inferentia2.

Demand for Amazon’s AI chips was slow at first, meaning customers could get access to them immediately rather than waiting weeks for big batches of Nvidia hardware. Japanese firms looking to quickly join the generative AI revolution took advantage of the situation. Electronics maker Ricoh Co., for example, got help converting large language models trained on English-language data to Japanese.

Demand has since picked up, according to Gadi Hutt, an early Annapurna employee who works with companies using Amazon chips. “I don’t have any excess capacity of Trainium sitting around waiting for customers,” he says. “It’s all being used.”

Trainium2 is the company’s third generation of artificial intelligence chip. By industry reckoning, this is a make-or-break moment. Either the third attempt sells in sufficient volume to make the investment worthwhile, or it flops and the company finds a new path. “I have literally never seen a product deviate from the three-generation rule,” says Naveen Rao, a chip industry veteran who oversees AI work at Databricks Inc., a purveyor of data and analytics software.

Databricks in October agreed to use Trainium as part of a broad agreement with AWS. At the moment, the company’s AI tools primarily run on Nvidia. The plan is to displace some of that work with Trainium, which Amazon has said can offer 30% better performance for the price, according to Rao. “It comes down to sheer economics and availability,” Rao says. “That’s where the battleground is.”

Trainium1 was comprised of eight chips, nestled side by side in a deep steel box that allows plenty of room for their heat to dissipate. The full package that AWS rents to its customers is made up of two of these arrays. Each case is filled with wires, neatly enclosed in mesh wrapping.  

For Trainium2, which Amazon says has four times the performance and three times the memory of the prior generation, engineers scrapped most of the cables, routing electrical signals instead via printed circuit boards. And Amazon cut the number of chips per box down to two, so that engineers performing maintenance on one unit take down fewer other components. Sinno has come to think of the data center as a giant computer, an approach Nvidia boss Jensen Huang has encouraged the rest of the industry to adopt. “Simplification is critical there, and it also allowed us to go faster for sure,” Sinno says.

Amazon didn’t wait for TSMC to produce a working version of Trainium2 before starting to test how the new design might work. Instead, engineers fixed two prior generation chips onto the board, giving them time to work on the control software and test for electrical interference. It was the semiconductor industry equivalent of building the plane while it’s flying. 

Amazon has started shipping Trainium2, which it aims to string together in clusters of up to 100,000 chips, to data centers in Ohio and elsewhere. A broader rollout is coming for Amazon’s main data center hubs.

The company aims to bring a new chip to market about every 18 months, in part by reducing the number of trips hardware has to make to outside vendors. Across the lab from the drill press sits a set of oscilloscopes Amazon uses to test cards and chips for bum connectors or design flaws. Sinno hints at the work already underway on future editions: In another lab, where earsplitting fans cool test units, four pairs of pipes dangle from the ceiling. They’re capped now but are ready for the day when future AWS chips produce too much heat to be cooled by fans alone.

Other companies are pushing the limits, too. Nvidia, which has characterized demand for its chips as “insane,” is pushing to bring a new chip to market every year, a cadence that caused production issues with its upcoming Blackwell product but will put more pressure on the rest of the industry to keep up. Meanwhile, Amazon’s two biggest cloud rivals are accelerating their own chip initiatives.

Google began building an AI chip about 10 years ago to speed up the machine learning work behind its search products. Later on, the company offered the product to cloud customers, including AI startups like Anthropic, Cohere and Midjourney. The latest edition of the chip is expected to be widely available next year. In April, Google introduced its first central processing unit, a product similar to Amazon’s Graviton. “General purpose compute is a really big opportunity,” says Amin Vahdat, a Google vice president who leads engineering teams working on chips and other infrastructure. The ultimate aim, he says, is getting the AI and general computing chips working together seamlessly.

Microsoft got into the data center chip game later than AWS and Google, announcing an AI accelerator called Maia and a CPU named Cobalt only late last year. Like Amazon, the company had realized it could offer customers better performance with hardware tailored to its data centers.

Rani Borkar, a vice president who spent almost three decades at Intel, leads the effort. Earlier this month, her team added two products to Microsoft’s portfolio: a security chip and a data processing unit that speeds up the flow of data between CPUs and graphics processing units, or GPUs. Nvidia sells a similar product. Microsoft has been testing the AI chip internally and just started using it alongside its fleet of Nvidia chips to run the service that lets customers create applications with OpenAI models.

While Microsoft’s efforts are considered a couple of generations behind Amazon’s, Borkar says the company is happy with the results so far and is working on updated versions of its chips. “It doesn’t matter where people started,” she says. “My focus is all about: What does the customer need? Because you could be ahead, but if you are building the wrong product that the customer doesn’t want, then the investments in silicon are so massive that I wouldn’t want to be a chapter in that book.”

Despite their competitive efforts, all three cloud giants sing Nvidia’s praises and jockey for position when new chips, like Blackwell, hit the market.

Amazon’s Trainium2 will likely be deemed a success if it can take on more of the company’s internal AI work, along with the occasional project from big AWS customers. That would help free up Amazon’s precious supply of high-end Nvidia chips for specialized AI outfits. For Trainium2 to become an unqualified hit, engineers will have to get the software right — no small feat. Nvidia derives much of its strength from the comprehensiveness of its suite of tools, which let customers get machine-learning projects online with little customization. Amazon’s software, called Neuron SDK, is in its infancy by comparison.

Even if companies can port their projects to Amazon without much trouble, checking that the switch-over didn’t break anything can eat up hundreds of hours of engineers’ time, according to an Amazon and chip industry veteran, who requested anonymity to speak freely. An executive at an AWS partner that helps customers with AI projects, who also requested anonymity, says that while Amazon had succeeded in making its general-purpose Graviton chips easy to use, prospective users of the AI hardware still face added complexity.  

“There’s a reason Nvidia dominates,” says Chirag Dekate, a vice president at Gartner Inc. who tracks artificial intelligence technologies. “You don’t have to worry about those details.”  

So Amazon has enlisted help — encouraging big customers and partners to use the chips when they strike up new or renewed deals with AWS. The idea is to get cutting-edge teams to run the silicon ragged and find areas for improvement.

One of those companies is Databricks, which anticipates spending weeks or months getting things up and running but is willing to put in the effort in the hopes that promised cost savings materialize. Anthropic, the AI startup and OpenAI rival, agreed to use Trainium chips for future development after accepting $4 billion of Amazon’s money last year, though it also uses Nvidia and Google products. On Friday, Anthropic announced another $4 billion infusion from Amazon and deepened the partnership.

“We’re particularly impressed by the price-performance of Amazon Trainium chips,” says Tom Brown, Anthropic’s chief compute officer. “We’ve been steadily expanding their use across an increasingly wide range of workloads.”

Hamilton says Anthropic is helping Amazon improve quickly. But he’s clear-eyed about the challenges, saying it’s “mandatory” to create great software that makes it easy for customers to use AWS chips. “If you don’t bridge the complexity gap,” he says, “you’re going to be unsuccessful.”