The Blackwell Delay: A Yield Snag or a Sneak Peek at the Future?

Sun Sep 08 2024
Nvidia's recent quarterly results brought both good news and disappointment for investors. Revenue and earnings exceeded guidance, but the company's fiscal Q3 outlook fell short of expectations because of the delay of its next-generation AI accelerator, Blackwell. Nvidia management clarified that the Blackwell issue was not a functional design flaw but a problem with the chip's mask that reduced production yield, and the CEO reiterated that the chip's functionality itself was sound. Developing the optimal mask for an advanced chip like Blackwell is an extremely complex process: even with TSMC's investments in computational lithography, the pattern a mask is predicted to produce does not always match what is actually printed on the wafer.
The notion that Nvidia's engineering challenges stemmed from the size of the Blackwell chips is a misinterpretation. Nvidia has been building its flagship GPU accelerators at TSMC's reticle limit for years. Blackwell combines two such dies, but the process used to make each die is essentially unchanged from Hopper. Nvidia's approach for Blackwell is certainly less complex than the one AMD uses for its flagship data center accelerator, the MI300X. AMD claims performance parity with Nvidia's Hopper H100 in AI training, but Nvidia has released results showing significant improvements in both training and inference performance with Blackwell.
The MI300 series chips are powerful accelerators, but they fall short of Nvidia in AI performance because they were not originally designed for AI workloads. AMD is expected to release the MI325X soon, which will offer more memory capacity and bandwidth than the MI300X, but it likely won't close the gap with Blackwell.
Some wafers had to be scrapped because of the Blackwell mask issue, contributing to a sequential decline in gross margin from 78.35% in fiscal Q1 to 75.15% in Q2, a drop of 3.2 percentage points. Even so, Nvidia's fiscal Q3 guidance still implied impressive year-over-year growth in both revenue and net income. However, expecting revenue growth to stay proportional to revenue, that is, to keep compounding at a constant percentage rate, is probably unrealistic. A better way to gauge Nvidia's Data Center revenue trajectory is to look at the sequential revenue difference, or delta, from one quarter to the next. Since fiscal Q2, Data Center revenue has risen along a line with a nearly constant slope, meaning the quarterly delta has stayed roughly steady, which points to sustainable growth.
Blackwell represents a quantum leap in AI performance and computational efficiency, which is crucial as the world revamps its data centers to keep up with growing demand for AI. This revamp will be driven not only by AI but also by the versatility of GPU accelerators across other workloads such as big data analytics, video streaming, game streaming, and metaverse applications. I expect Nvidia to remain a dominant player in the data center space over the next decade, despite competition from AMD, Google's TPUs, Amazon's in-house chips, and even Apple.
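The sequential-delta argument about Data Center revenue can be illustrated with a minimal sketch. The revenue series below is entirely hypothetical (it is not Nvidia's reported data); it assumes revenue rises by the same absolute amount every quarter. Under that assumption the delta stays constant while the percentage growth rate shrinks each quarter, which is why a falling growth rate alone does not signal slowing momentum.

```python
# Hypothetical quarterly revenue in billions of dollars (illustrative only, NOT Nvidia's figures).
# Each quarter adds the same absolute amount, i.e. a constant sequential delta.
revenue = [10.0, 14.0, 18.0, 22.0, 26.0]


def sequential_deltas(series):
    """Quarter-over-quarter change: R[n] - R[n-1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def sequential_growth_rates(series):
    """Quarter-over-quarter percentage growth: (R[n] - R[n-1]) / R[n-1] * 100."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(series, series[1:])]


deltas = sequential_deltas(revenue)
rates = sequential_growth_rates(revenue)

# Print each quarter-to-quarter step: the delta is flat, the growth rate declines.
for i, (delta, rate) in enumerate(zip(deltas, rates), start=1):
    print(f"quarter {i} -> {i + 1}: delta = {delta:.1f}B, growth = {rate:.1f}%")
```

With these made-up numbers the delta is 4.0B at every step while the growth rate falls from 40.0% toward roughly 18%, so a straight-line (constant-slope) revenue chart can coexist with steadily declining percentage growth.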
https://localnews.ai/article/the-blackwell-delay-a-yield-snag-or-a-sneak-peek-at-the-future-391dd0fe