Amazon.com Inc. (NASDAQ:AMZN) has unveiled AWS Trainium, a custom chip designed for cost-effective training of machine learning (ML) models in the cloud. The announcement comes ahead of the availability of new Amazon Elastic Compute Cloud (EC2) instances for ML training powered by Habana Gaudi processors from Intel Corp. (NASDAQ:INTC).
Trainium to offer enhanced price performance on ML training in the cloud
The company indicated that AWS Trainium will offer better performance than rival offerings in the cloud, with support for PyTorch, TensorFlow, and MXNet. It will be available in Amazon EC2 instances and in Amazon SageMaker, and the company said that instances based on the next-generation custom chip would launch in 2021. According to AWS, the chip's advantages are speed and cost: throughput will be 30% higher and cost per inference 45% lower than on current AWS GPU instances. Amazon also claims that Trainium will deliver the most teraflops of any ML instance in the cloud, where one teraflop corresponds to one trillion floating-point operations per second.
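Percentage claims like these are easy to misread, so the short sketch below converts AWS's two quoted figures into multiples of the GPU baseline. It assumes only the numbers stated above; the function and constant names are illustrative, not part of any AWS tool. The key point is that a 45% lower cost per inference means a fixed budget buys 1 / (1 − 0.45) ≈ 1.82 times as many inferences, not 1.45 times.

```python
# Relative figures quoted by AWS, versus current AWS GPU instances.
THROUGHPUT_GAIN = 0.30   # "30% higher" throughput
COST_REDUCTION = 0.45    # "45% lower" cost per inference


def relative_throughput(gain: float) -> float:
    """Throughput expressed as a multiple of the GPU baseline."""
    return 1.0 + gain


def inferences_per_budget(reduction: float) -> float:
    """How many times more inferences a fixed budget buys when the
    cost of each inference falls by `reduction` (a fraction)."""
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    print(f"Throughput multiple: {relative_throughput(THROUGHPUT_GAIN):.2f}x")
    print(f"Inferences per fixed budget: {inferences_per_budget(COST_REDUCTION):.2f}x")
```

Run as a script, this prints multiples of 1.30x and 1.82x, respectively.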
Andy Jassy, AWS's CEO, said that the company wants to keep pushing the price performance of ML training, which is why it has to invest in its own chips. He added that AWS pairs an unmatched array of instances with chip innovation. Jassy was speaking at this year's Amazon re:Invent developer conference.
AWS Trainium to complement Inferentia
The new offering will complement AWS Inferentia, the custom chip for ML inference that Amazon launched last year. Notably, Trainium will use the same SDK as Inferentia.
AWS noted that although Inferentia addressed inference costs, which make up around 90% of ML infrastructure costs, many development teams are still constrained by fixed ML training budgets. This limits the scope and frequency of the training needed to improve models and applications. AWS says Trainium will address this challenge by offering higher performance and lower ML training costs in the cloud.