It may just be a one-time occurrence, but if not, there is nothing good about the DeepSeek story for the Tech hardware supply chain.
The DeepSeek panic this past week could be the milestone event that marks the popping of the AI Capex bubble. On the other hand, DeepSeek could be just another incremental evolution in AI model development. As this consultancy has stated in previous posts, we have to approach AI forecasts with some humility. Much like the California gold rush that started one hundred and seventy-seven years ago last month (Jan 2025), no one knows how much of that precious metal is in the ground or how long the rush will last.
Many smart people have already dissected DeepSeek's impact on AI in news articles, blog posts, Substack newsletters, and social media. Thus far, the analysis runs the entire spectrum from dismissive to panicked. Indeed, on Monday, January 27th, Wall Street was in panic mode.
The main claim circulating that day was that DeepSeek could achieve performance roughly on par with the latest models from OpenAI and Google for only a fraction of the investment cost (~$5.6M). When fact-checked against what DeepSeek's researchers said in their paper, it became clear that the quoted figure covered only the final training run. It did not include all the other costs associated with "prior research and ablation experiments on architectures, algorithms, or data."
Moreover, it appears that DeepSeek used existing models from OpenAI and other model makers to "distill" (i.e., use as teachers to train) their new R1 reasoning model. Practically, this means that DeepSeek researchers sent prompts to the other models and then used the responses to (at least partially) train their own model. The irony of this teacher/student relationship has not been lost on observers: OpenAI spent years training its models on copyrighted data scraped from the internet, and now DeepSeek has returned the favor.
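For readers who want a concrete picture of response-based distillation, here is a minimal, hypothetical sketch: query a large "teacher" model for answers to a set of prompts, then fine-tune a smaller "student" model on those prompt/answer pairs as ordinary supervised data. The model names, the query_teacher helper, and the training loop are illustrative assumptions, not DeepSeek's actual pipeline.

```python
# Hypothetical sketch of response-based distillation: the teacher's answers
# become supervised fine-tuning data for a smaller student model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def query_teacher(prompt: str) -> str:
    # In practice this would be an API call to a large proprietary model;
    # a canned string keeps the sketch self-contained.
    return " The sky appears blue because air scatters short wavelengths more."

prompts = ["Explain briefly why the sky is blue."]
pairs = [(p, query_teacher(p)) for p in prompts]   # (prompt, teacher answer)

tok = AutoTokenizer.from_pretrained("gpt2")        # stand-in student model
student = AutoModelForCausalLM.from_pretrained("gpt2")
optim = torch.optim.AdamW(student.parameters(), lr=1e-5)

for prompt, answer in pairs:
    batch = tok(prompt + answer, return_tensors="pt")
    # Standard causal-LM loss on the teacher's text: the student learns to
    # reproduce the teacher's responses token by token.
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optim.step()
    optim.zero_grad()
```

In real pipelines the student would typically be trained to imitate only the answer tokens (masking out the prompt), with reinforcement learning layered on top, but the basic teacher-to-student data flow is the same.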
Regardless of these hacks, DeepSeek's design team expanded on and developed several architectural techniques that significantly improved model performance. They include:
Mixture of Experts
Multi-head latent attention
Lower precision parameter storage during model training
Multi-token prediction
Reinforcement learning of "chain-of-thought" reasoning, rather than supervised fine-tuning.
AI researchers are well aware of these techniques. However, DeepSeek combined them in its V3 and R1 models to achieve an impressive cumulative improvement. It did so out of necessity: the company was working with inferior processors due to export restrictions (although there are some questions on this point).
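To make the first technique on the list concrete, here is a minimal, hypothetical sketch of mixture-of-experts routing: a small gating network picks the top-k expert networks for each token, so only a fraction of the model's parameters are active for any given input. The layer sizes and the value of k are illustrative assumptions, not DeepSeek's actual configuration.

```python
# Minimal sketch of a mixture-of-experts (MoE) layer: a router selects the
# top-k experts per token, so only a fraction of parameters do work per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):              # route each token to its experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)                    # 10 token embeddings
print(TinyMoE()(tokens).shape)                  # torch.Size([10, 64])
```

The appeal is that total parameter count (and hence capacity) can grow much faster than the compute spent per token, which is exactly the kind of lever a team short on top-tier hardware would reach for.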
It's worth spending a few minutes identifying the connections between AI model innovations, hardware spending, and value creation. "Scaling" is the main pillar of the whole AI/LLM investment thesis: model performance improves predictably as the models, their training data, and the compute used to train them get bigger. The relationship is a power law, so performance plotted against compute (FLOPS) on log-log axes looks like a straight line. The original scaling laws were published in early 2020 by researchers at OpenAI, just before the world went into lockdown. Since then, the results have been retooled and republished repeatedly.
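As a rough illustration of the form these laws take (a sketch of the general shape, not a restatement of the published constants), the training loss L falls as a power law in training compute C:

$$ L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C} $$

where C_c and \alpha_C are empirically fitted constants. Taking logarithms of both sides turns this into the straight line on log-log axes that underpins the familiar scaling charts.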

This tight relationship between LLM performance and FLOPS offers a convenient transformation for folks designing hardware. It allows engineers to take the "valuable output," which is the benchmark performance of large language models, and express it as something related to hardware (FLOPS).
The AI hardware value equation boils down to, in layperson's terms:

The guts of the equation are available but not shown here. They include terms like model FLOP utilization and system FLOP utilization, none of which are essential for this discussion.
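The original post displays the equation as a graphic. In rough, hypothetical form (our paraphrase of the idea, not the exact equation), it reduces to something like:

$$ \text{Value} \;\propto\; \underbrace{\frac{\text{model performance}}{\text{FLOPs required}}}_{\text{model/architecture efficiency}} \times \underbrace{\frac{\text{FLOPs delivered}}{\text{dollars and watts}}}_{\text{hardware efficiency}} $$

Utilization terms (model FLOP utilization, system FLOP utilization) would sit between the two factors, discounting the FLOPs the hardware can theoretically deliver down to the FLOPs the model actually uses.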
Model innovations increase value without automatically incurring hardware or energy costs. As Anthropic CEO Dario Amodei has noted, architectural improvements shift the scaling curve. Until mid-2024, the conventional wisdom was that architectural improvements for training had slowed. To achieve better performance, frontier model makers had to ride up the scaling curve through costly innovations in hardware. Hardware had become the new gating factor. You can see all of these hardware innovations in the latest Blackwell server. We expect several more advancements along these vectors when Nvidia introduces its Rubin processor later in 2025.
However, a "paradigm shift" in AI happened in late 2024 when OpenAI introduced its o1 model, which was able to perform chain-of-thought reasoning during inference. This reasoning capability, combined with synthetic training data and reinforcement learning, offered new avenues for architectural innovation. Advancements in chain-of-thought reasoning through model development meant that performance was no longer gated by the hardware. Moreover, the scaling laws shifted from correlating model performance with training compute to correlating it with test-time compute (see here and here).
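Schematically (our shorthand, not a formula from the cited papers), the quantity being scaled changes from training compute alone to training compute plus test-time compute:

$$ \text{Performance} \approx f\!\left(C_{\text{train}}\right) \;\;\longrightarrow\;\; \text{Performance} \approx g\!\left(C_{\text{train}},\, C_{\text{test}}\right) $$

where C_test is the compute spent generating and searching over chains of thought at inference time, and g is increasing in both arguments. The practical consequence is that a lab can buy performance with cleverer inference-time strategies, not just bigger training clusters.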
Architectural innovations aren't actually free because you still have to pay the smart engineers. But they are much cheaper than the hardware costs of going to the latest generation fab node, improving thermal management, increasing interconnect bandwidth, or increasing power density. Architecture gains create more value because they allow frontier model makers to use hardware more efficiently.
Up until last week, it was widely thought that only a handful of well-funded frontier model makers with access to expensive hardware and huge capital spending budgets could develop better models. DeepSeek's R1 has shown that the shift to reasoning and the new avenues for model innovation have leveled the playing field.
The key question we don't know the answer to is: how much more architectural "gold" is left in those AI fields? If there is a robust path to further architectural innovations in reasoning and reinforcement learning, other innovative, scrappy companies like DeepSeek can come out of nowhere and neutralize the competitive advantage that an incumbent has accrued from the latest AI hardware. That capital-spending moat evaporates, forcing model makers into an arms race.
Most AI researchers suspected this would occur (see, e.g., the leaked Google memo from 2023). The rationale from a few years ago was that everyone had access to the same models and that open-source development would drive innovation faster than closed-source models.
The folks dismissing this new competitive environment are basically saying that the value created by AI will be so great that it won’t matter. As long as you can stay ahead of the game, no matter how much it costs, there will be a return sometime in the future, somewhere in the value chain. It is an unbounded demand view of the market.
In Mr. Amodei's words: "Because the value of having a more intelligent system is so high, this shifting of the curve typically causes companies to spend more, not less, on training models: the gains in cost efficiency end up entirely devoted to training smarter models, limited only by the company's financial resources. People are naturally attracted to the idea that 'first something is expensive, then it gets cheaper' — as if AI is a single thing of constant quality, and when it gets cheaper, we'll use fewer chips to train it. But what's important is the scaling curve: when it shifts, we simply traverse it faster, because the value of what's at the end of the curve is so high."
Mr. Amodei implies that there will be a willingness to pay for the best models because of the value that is unlocked downstream. Imagine how good the expensive models can get when they copy the DeepSeek innovations and improve on them. As long as frontier model makers can stay ahead of the game by improving with each new innovation, they will benefit from a premium market segment where the "winner takes all."
Innovations in the premium segments will then trickle down into the value (cost-sensitive) segments, rapidly commoditizing and broadening access to AI capability. Indeed, the ability to train smaller student models as derivatives of larger base-model teachers will accelerate this trend. There is a great graphic outlining this diffusion of AI capability from Lennart Heim and his team.

If the frontier model market develops the way they describe, the tidy correlation of model performance to FLOPS becomes meaningless as a measure of value: it prices AI's output in terms of its compute costs. A better approach would be to work backward and apportion the value created downstream for users (e.g., GDP growth, labor savings). But no one knows what the actual downstream value of AI is, and unfortunately, that is the real problem exposed by the DeepSeek panic.
We have a lot of reservations about thinking of AI in terms of unbounded demand. And if your organization is in the hardware part of the AI supply chain, you should also be skeptical. History has not been kind to these unbounded market opportunities, and there is a trail of incinerated capital from previous technology disruptions to prove it (e.g., railroads, telephones, lighting, automobiles, radio, PCs, the internet). See this book for a good historical review.
Much of this history shows that disruptive new industries attract a rush of new players looking for opportunity. That "gold rush" inevitably leads to cutthroat competition and rapid price erosion. The bigger the opportunity, the more players enter, and the stiffer the competition. Sooner or later, every "gold rush" reaches the point where there are more competitors than the market has room for.
Sure, rapidly falling costs are a net positive for AI adoption in the long run. They will mean more AI processors per capita and more AI for humanity. Whether that is good or bad is a different story. But it is a terrible investment if all the value creation gets quickly competed away into consumer surplus. DeepSeek showed that, at least for model makers, that is a real possibility. It is a low-cost disruption to an emerging high-Capex industry, and there is no alternate reality where Jevons paradox can rationalize a positive spin.
The closed-model makers are likely to add "distillation traps" to their models soon to prevent freeloading. There is no way the big-spending closed-model makers will invest billions in their developments and then allow scrappy startups to innovate off of them for free. We will find out quickly whether the DeepSeek folks are mostly brilliant or mostly freeloading.
It may even start a new “cat and mouse” game. New entrants figure out ways to circumvent these traps, leading to more secure traps, leading to more clever evasion tactics, etc. We are not software experts, but it sure sounds like a new AI software security industry is emerging.
Even if models do commoditize, there are moats in the application layer, the edge-device layer, or bundling with a subscriber base where organizations can capture value and monetize their hardware investments. That is why hyperscalers are likely to be just fine: they can leverage their subscriber bases (Microsoft, Google, Meta) to bundle AI functionality. Edge-device makers (Apple, Tesla) will also be fine because they own the channel and the platform for delivering AI functionality. Finally, application-layer suppliers have the deep expertise in their verticals to customize AI models so that they can perform valuable work (Salesforce, Oracle, SAS, and the hundreds of companies that operate in various vertical markets). Model makers will move to other parts of the value chain to develop new moats as well.
Unfortunately, reasoning models and lower entry barriers are a headwind for the Tech hardware industry and its supply chain. In our last client note in December, we expected AI Capex to slow in 2026 as the industry digests the hardware being installed in 2025. That slowdown may turn into a full-scale pullback in 2026 and 2027 if it turns out there is still a lot of room for improvement in chain-of-thought reasoning models. It could be a long AI hardware winter. There is no way to sugarcoat it.
Conversely, bigger Capex bets (like Stargate, if it happens) could point to more confidence that DeepSeek's innovations are a "one-off," i.e., there aren't many more avenues for architectural gains, and/or new entrants will be “knee-capped” once the distillation traps are installed.
Hardware technology development will follow the same cues as Capex. The advantage of massively parallel clusters would be less compelling if there is still a lot of room for reasoning-model improvements. Those designs will give way to lower-cost, smaller, and more distributed nodes. Power delivery, thermal management, interconnect, and memory bandwidth designs will become more conservative, crimping engineering investments in next-generation CPO, HBM, liquid cooling, and vertical power delivery technologies.
Even if Mr. Amodei's assessment of unlimited AI demand is accurate, the commoditization assumed in that scenario will quickly trickle down to the hardware. Yes, there will likely be many more AI processors per capita, but they won't cost $35k each. They would probably cost closer to $350 each, roughly 100x less. That massive cost drop implies that AI could move to edge devices much faster than previously thought. This scenario looks similar to the supercomputing/PC market dynamic of the 1980s and 1990s. The "big iron" analogy we discussed in a previous post looks more accurate in a post-DeepSeek world.
The bottom line: whether you believe emerging chain-of-thought reasoning architectures are a positive or a negative for AI, hardware roadmaps will get pushed out until all the juice has been squeezed out of reasoning architectures. The silver lining is that as the infrastructure Tech growth cycle wanes, a new personal electronics growth cycle could begin, opening up opportunities for a different set of suppliers and technology vectors.
In contrast, if DeepSeek is a one-off, it will likely signal continued aggressive investment in next-generation technologies such as co-packaged optics, co-packaged electrical interconnects, micro-channel liquid cooling, and vertical power delivery.
The next few months will be interesting. Watch Capex announcements carefully.
Despite some "hacks," DeepSeek's results are impressive. It has given every AI industry stakeholder a reason to reassess its positioning and ask: just how much gold is left in those architectural fields?
Note that nothing in this post should be construed as investment advice. If you find these posts insightful, subscribe above to receive them by email. If you want to learn more about the consulting practice and how we help organizations in the Tech hardware supply chain, contact us at info@rcdadvisors.com.