Folks who have read this blog or know me are already aware that I am a value investor at heart, and a contrarian at that! My passion lies in uncovering hidden gems: investment ideas that go against the grain of popular opinion. This approach hasn’t been easy in recent years, as the market has often favored growth and momentum strategies. However, the thrill of discovering undervalued assets or contrarian positions that I strongly believe in is irresistible to me. When I find an opportunity in which I have high conviction, it’s difficult to contain my excitement.
Before diving into my current excitement, let me highlight two recent instances that equally captivated my interest as a contrarian investor. The first occurred when Netflix’s stock plummeted below $200 following two consecutive quarters of missed earnings and subscriber growth projections (see here). This significant drop presented a potential value opportunity that went against prevailing market sentiment.
The second instance involved Google’s Bard AI mishap. During a demonstration, Bard incorrectly claimed that the James Webb Space Telescope took the first pictures of an exoplanet outside our solar system (see here). In reality, this milestone was achieved by the European Southern Observatory’s Very Large Telescope in 2004. This error led to a sharp decline in Google’s stock, with shares falling approximately 7% initially and an additional 5% the following day (see here). The incident highlighted potential limitations in Google’s AI capabilities and created a contrarian investment opportunity as the market reacted strongly to a single misstep in the rapidly evolving AI landscape.
In both instances, I was convinced that the market reaction was overblown. Regarding Netflix, I wrote a note to myself justifying a potential price recovery based on several key observations. First, Netflix was significantly overspending on their production budget compared to other studios, presenting an opportunity for cost optimization. Second, password sharing was rampant among Netflix users, undermining revenue potential. Third, growth opportunities in emerging markets, particularly in Asia, were largely untapped. Finally, Netflix had not yet explored an ad-supported business model to cater to price-sensitive customers.
These were all addressable issues, and I believed that as management tackled these problems, the stock would recover. My analysis proved prescient, as Netflix has since cracked down on password sharing, leading to substantial subscriber gains and increased revenue. The company has successfully expanded into emerging markets, with the Asia-Pacific region now accounting for a significant portion of its global subscriber base. Additionally, Netflix introduced a lower-priced ad-supported tier, expanding its market reach. As a result, Netflix’s stock has more than quadrupled from its 2022 lows and risen over 60% year-to-date in 2024, validating this contrarian perspective.
Given this context, I built a substantial (by my standards) long position in Netflix, dollar-cost averaging in at approximately $175 per share. My conviction was reinforced by the actions of renowned value investor Bill Ackman, who had established a significant long position in Netflix a quarter earlier. Although Ackman exited his position after Netflix’s share price dropped roughly 50% for the second time, I remained steadfast in my investment thesis. I recognized that while buy decisions are often based on specific rationales, sell decisions can be influenced by many other factors, which weakens their signaling power. As such, Ackman’s initial investment in Netflix, despite his subsequent exit, bolstered my confidence in maintaining a sizeable long position in the company.
The Bard fiasco presented a straightforward investment opportunity. As an AI scientist with extensive experience in deep learning models, I recognized that the market’s reaction was disproportionate to the actual issue. AI model inference errors are a well-known challenge in the field. Models often struggle to produce accurate answers in real-world settings, especially when faced with data that differs from their training environment. The incident highlighted the gap between lab-trained AI and real-world performance, a challenge that AI researchers and developers constantly work to address. However, the market’s severe reaction to a single inference error was, in my view, clearly overblown, offering me an excellent contrarian buying opportunity.
Returning to the focus of this blog post, I find myself once again excited about potential opportunities in the tech sector. However, this time my approach diverges from my typical investment style. Instead of seeking long positions, which has been my usual strategy, I’m now considering short positions. This shift is particularly noteworthy given the current market environment, where my view stands in contrast to the prevailing consensus. In today’s market, where tech stocks have seen significant rallies and AI-related optimism is high, adopting a short position is decidedly contrarian.
Specifically, what I am going to do now is build a short thesis on Nvidia, the darling of AI! My thesis is built on two main points:
- Macro-level concerns about Nvidia’s stock valuation and the target on its back.
- Recent news out of China, particularly the public release of a large language model called DeepSeek by a small Chinese AI firm.
I want to note that as I am writing this post, the markets have already reacted to the DeepSeek news, and Nvidia stock is currently trading at a significant discount. Based on the latest data, Nvidia shares closed at $118.63 today, representing a 17% drop from the previous trading day.
Let me begin with a high-level macro view that I’ve held for some time now: despite the hype surrounding AI, its massive total addressable market (TAM), and its promising future, why should Nvidia maintain such dominance? This question brings to mind several examples of groundbreaking technologies from the past, and the fallacy of assuming that their impact would translate into investment returns. Consider the following historical parallels:
- Railroads in the 19th century
- The Wright brothers at the dawn of the 20th century
- Radio in the mid-1920s
- Internet stocks during the dot-com era
In each of these cases, revolutionary technologies transformed industries and societies, yet the pioneering companies often failed to deliver sustained returns to investors. The initial market leaders frequently lost their dominance as competition intensified and the industry matured.
What is the secret sauce that lets Nvidia keep defying the historical odds? Is it GPUs? While Nvidia certainly leads in GPU technology, other firms are rapidly closing the gap. AMD’s RDNA 4 lineup, including the RX 8800 XT, is expected to be competitive with Nvidia’s RTX 4080 and 4080 Super offerings. In the data center space, AMD’s next-generation GPU, the MI325X, which began shipping in late 2024, offers significant performance improvements over Nvidia’s H200.
Specifically, AMD claims the MI325X delivers up to 40% faster throughput on the Mixtral 8x7B model, 30% lower latency on a 7-billion-parameter Mistral model, and 20% lower latency on the 70-billion-parameter Llama 3.1 model compared to Nvidia’s H200 (see here and here). The MI325X also boasts 256GB of HBM3E memory and 6 TB/s of memory bandwidth, surpassing Nvidia’s H200 in several key areas.
Even Intel, despite its challenges, is making strides with its upcoming Falcon Shores GPUs, which are expected to be competitive in the AI and HPC space. This increasing competition suggests that Nvidia’s dominance may face challenges in the near future, as the GPU market becomes more competitive and diverse.
As far as pricing goes, AMD’s offerings are significantly more cost-effective than Nvidia’s GPUs when measured by dollar cost per FLOP of compute. Recent data shows that AMD GPUs can be up to 50% cheaper in terms of price-to-performance ratio. For instance, the AMD Radeon RX 7900 XT 20GB offers FP16 performance at $6.80 per TFLOP, while Nvidia’s RTX 4090 24GB comes in at $5.31 per TFLOP (see here).
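For readers who want to play with that metric themselves, here is a tiny Python helper. The card names, prices, and FP16 throughput figures below are illustrative placeholders of my own, not quotes from the sources linked above.

```python
# Illustrative dollars-per-TFLOP calculator. The price and throughput inputs
# are placeholder assumptions for the sake of the example; swap in whatever
# street prices and spec-sheet TFLOPS you trust.
def dollars_per_tflop(price_usd: float, fp16_tflops: float) -> float:
    return price_usd / fp16_tflops

cards = {
    "hypothetical AMD card":    (700.0, 103.0),   # (price USD, FP16 TFLOPS) -- assumed
    "hypothetical Nvidia card": (1750.0, 330.0),  # assumed
}
for name, (price, tflops) in cards.items():
    print(f"{name}: ${dollars_per_tflop(price, tflops):.2f} per TFLOP")
```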
This pricing advantage doesn’t significantly impact trillion-dollar tech giants with vast capital expenditure budgets, who prioritize cutting-edge performance for building larger language models. However, the cost difference becomes crucial for smaller companies and researchers operating with limited budgets.
It’s worth noting that the performance gap between AMD and Nvidia is narrowing. AMD’s RDNA 4 architecture is delivering competitive performance, especially in rasterization and improved ray tracing capabilities. This evolving landscape suggests that Nvidia’s technological edge, particularly in AI applications, may be eroding as competitors catch up.
If it’s not GPUs, then what’s the special Nvidia sauce? For those familiar with building AI models using deep learning, it’s clear that Nvidia’s CUDA programming framework is the key differentiator. CUDA allows developers to write low-level code optimized for GPUs, enabling significant performance gains in AI and deep learning applications.
Open-source libraries like PyTorch are built to perform exceptionally well with Nvidia GPUs, leveraging CUDA for GPU acceleration. PyTorch 2.0, for instance, has shown significant performance improvements over previous versions, particularly when using CUDA-enabled Nvidia GPUs.
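To make the CUDA lock-in concrete, here is a minimal sketch assuming a recent PyTorch 2.x install: the same high-level tensor code dispatches to CUDA kernels whenever an Nvidia GPU is present, and `torch.compile` lowers models to fused GPU kernels on that hardware.

```python
import torch

# The same code runs on CPU or GPU; CUDA kernels are used automatically
# when an Nvidia device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(4096, 4096, device=device)
w = torch.randn(4096, 4096, device=device)
y = x @ w  # dispatched to cuBLAS/CUDA kernels on an Nvidia GPU
print(device, y.shape)

# PyTorch 2.x can additionally fuse and optimize models via torch.compile.
model = torch.nn.Sequential(torch.nn.Linear(4096, 1024), torch.nn.ReLU()).to(device)
compiled = torch.compile(model)
print(compiled(x).shape)
```

This is exactly the kind of frictionless path that keeps researchers on Nvidia hardware even when rival chips look good on paper.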
Another area of Nvidia’s superiority lies in their ability to make multiple GPUs work harmoniously and in synchronization, jointly leveraging the entire GPU stack to train large foundational models like ChatGPT, Claude, or Google’s Bard. This capability is made possible through interconnect technology, which Nvidia acquired through its strategic purchase of Mellanox Technologies, an Israeli company specializing in end-to-end interconnect solutions for data centers and high-performance computing (see here).
Nvidia’s NVLink technology, a key component of their interconnect solution, provides high-bandwidth, low-latency connections between GPUs, enabling them to act as a unified system. The latest generation of NVLink can achieve up to 600 GB/s of bandwidth, significantly improving data transfer rates and enhancing performance for computationally intensive applications (see here).
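To get a feel for why that bandwidth figure matters, here is a trivial back-of-the-envelope calculation; the payload size and the PCIe-class comparison figure are illustrative assumptions on my part, not vendor benchmarks.

```python
# Time to move a model's worth of data between GPUs at different link speeds.
def transfer_seconds(payload_gb: float, bandwidth_gb_per_s: float) -> float:
    return payload_gb / bandwidth_gb_per_s

payload = 140.0  # e.g. ~70B parameters at 2 bytes each, in GB (illustrative)
for name, bw in (("NVLink-class @ 600 GB/s", 600.0), ("PCIe-class @ ~32 GB/s", 32.0)):
    print(f"{name}: {transfer_seconds(payload, bw):.2f} s to move {payload:.0f} GB")
```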
It’s important to note that interconnects are particularly crucial for training large language models, where leveraging all available GPUs simultaneously is essential. For inference tasks, where pre-trained model weights need to be stored, VRAM capacity becomes more critical than interconnect speed.
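A rough sketch of that VRAM constraint: weight memory for inference scales with parameter count times bytes per parameter. The model sizes below are illustrative, and the calculation ignores KV-cache and activation memory.

```python
# Rule of thumb: weight memory (GB) ~ parameters (billions) x bytes per parameter.
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    # 2 bytes/param corresponds to FP16/BF16 weights; 1 byte for FP8/INT8.
    return params_billion * bytes_per_param

for size in (7, 70, 405):  # illustrative model sizes, in billions of parameters
    print(f"{size}B params @ FP16 needs ~{weight_memory_gb(size):.0f} GB just for weights")
```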
The flywheel effect of Nvidia’s ability to pump huge sums of dollars, handed to it by the market through its sky-high valuation, into R&D has allowed the company to further extend its lead in cutting-edge GPU performance.
By reinvesting the enormous profits and capital that its market position commands into research and development, Nvidia continues to distance itself from competitors. Its ability to spend billions on technological advancement creates a compounding advantage, making it increasingly difficult for other firms to catch up in this specialized sector.
The more successful Nvidia becomes, the more capital they can invest in future innovations, creating a self-reinforcing cycle of technological leadership and market dominance.
As soon as ChatGPT was released to the public in late 2022, the AI race intensified, with big tech companies investing heavily in Nvidia’s latest GPU offerings. With limited competition from other semiconductor players, Nvidia was poised for massive gains. This triggered a surge in AI R&D across major labs, attracting top talent with lucrative compensation packages. Nvidia emerged as the primary beneficiary over the past two years. For example,
Nvidia’s revenue for Q4 fiscal 2024 (ended January 28, 2024) reached $22.1 billion, up 265% from a year ago (see here). Such eye-popping growth is sure to attract competition; such is the brutality of capital markets.
As such, even in areas where Nvidia is clearly dominant – GPUs with efficient interconnects for training and CUDA as a software stack – competition is brewing.
Cerebras, a Silicon Valley startup, has built their flagship wafer-scale AI training chip, CS-3. This chip packs 4 trillion transistors and 900,000 AI-optimized cores, more than 50 times that of Nvidia’s H100. The CS-3 is designed to simplify AI workloads, allowing researchers to train trillion-parameter models without complex partitioning or refactoring (see here).
Another startup, Groq, has taken a fundamentally different approach to AI compute. They’ve built a chip architecture around “deterministic compute,” where operations execute in a completely predictable way every time, unlike on traditional GPUs (see here and here). Groq’s specialized Language Processing Unit (LPU), targeted at AI workloads like large language models, is claimed to be faster, cheaper, and more energy-efficient than Nvidia’s GPUs for inference tasks.
SambaNova Systems is another notable startup offering powerful AI accelerators for model training. Their SN40L chip uses a Reconfigurable Dataflow Architecture designed specifically for AI workloads, offering advantages over the instruction-set architecture of Nvidia’s GPUs, such as more efficient data movement and resource utilization when training large models (see here).
Besides these (relatively) small startups, Nvidia is facing severe headwinds from its own large-scale customers, the likes of Google, Amazon, and Meta.
For brevity, I won’t get into the competitive threat posed by these firms in this article. Instead, I want to focus on a piece of news from China last week that has me most excited and what I believe will be a pivotal moment in bursting the bubble that is Nvidia’s stock price!
For the nerds out there, it would be worth spending a weekend reading the DeepSeek papers: see the DeepSeek-V3 Technical Report, and more on their reinforcement learning approach, detailed here.
Let’s start with the chart from the first page of their reinforcement learning paper.
We see that:
- In reasoning tasks, R1 performs on par with or slightly behind o1, particularly in the ARC AGI benchmark
- In mathematics, R1 shows competitive performance, though o1 maintains a slight edge
- For coding tasks, R1 is competitive with o1, and its significantly lower cost makes it more attractive
- In creative writing, R1 shows exceptional capabilities, often matching or exceeding o1’s performance
While impressive in itself, what is truly mind-blowing is the table on their training cost, assuming the reported numbers (which are for DeepSeek-V3) are in the same ballpark as those required to train the R1 model with reinforcement learning.
For comparison, have a look at the following chart from VisualCapitalist.com:
Last year, GPT-4 reportedly cost ~$78M to train, a figure surpassed by Google’s Gemini Ultra, which cost close to $200M. At roughly $5.4M, the DeepSeek model was trained at a fraction of the cost of these large models. What gives?
(For a detailed breakdown of the costs involved for the DeepSeek and Meta Llama models, see here.)
How is it possible that a small Chinese startup can train a model on par with the best of the best, at a fraction of the cost it would take the big-tech Silicon Valley firms, whose brainpower, capital, and resources surpass DeepSeek’s many times over? Come to think of it, this is not the first time something like this has happened. Think of India, another developing country, which managed to land a spacecraft on the moon with a space program costing less than one year of NASA’s budget. To add salt to the wound, China was barred from access to the latest and greatest Nvidia GPUs, which were supposedly essential for training these large-scale foundational models.
The stark contrast between DeepSeek’s resource-constrained development and the massive investments of Silicon Valley tech giants underscores a potentially disruptive trend in technological innovation. By demonstrating comparable performance at a dramatically lower cost, DeepSeek challenges the conventional narrative that superior technology requires massive capital expenditure. This achievement mirrors India’s frugal space program, proving that innovation can emerge from unexpected sources, often constrained by limited resources but driven by intellectual ingenuity and efficient problem-solving.
In an ironic twist, I asked DeepSeek to summarize the key ideas behind their technical report based on my notes from reading the published papers, and here’s what it spat out (I have highlighted in bold the key ideas that I had penned down in my own review of the work):
DeepSeek has introduced several groundbreaking innovations to enhance the efficiency and performance of large language models. One major advancement is their mixed-precision training framework, which uses 8-bit floating point numbers (FP8) throughout the entire training process. Unlike traditional methods that rely on 32-bit precision (FP32), FP8 sacrifices some precision to significantly reduce memory usage and boost computational performance. DeepSeek’s approach cleverly breaks numbers into smaller tiles and blocks, strategically applying high-precision calculations only where necessary. This native FP8 training avoids the quality loss typically seen in post-training compression, drastically cutting GPU memory requirements and enabling more efficient scaling across thousands of GPUs. Additionally, they developed a multi-token prediction system that predicts multiple tokens simultaneously while maintaining high accuracy (85-90%), effectively doubling inference speed without compromising quality. This system preserves the causal chain of predictions, ensuring structured and contextually accurate outputs.
Another key innovation is their Multi-head Latent Attention (MLA) mechanism, which compresses the Key-Value (KV) cache, a critical component of the Transformer architecture, directly within the training pipeline. This compression reduces VRAM usage significantly while maintaining model performance, as it forces the model to focus on essential information rather than noise. DeepSeek also optimized GPU communication efficiency with their DualPipe algorithm, which intelligently overlaps computation and communication tasks, maximizing GPU utilization. Furthermore, they employ a Mixture-of-Experts (MoE) architecture with advanced load balancing, enabling efficient scaling of model parameters without requiring excessive hardware resources.
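To make two of these ideas a bit more concrete, here are two toy sketches. First, the "tiles and blocks" intuition behind fine-grained low-precision training: each block of a weight matrix gets its own scale factor, so one outlier doesn't wreck precision everywhere. This sketch simulates the low-precision cast with int8-style rounding rather than a true FP8 e4m3 format, so treat it as an illustration of per-block scaling, not DeepSeek's actual kernels.

```python
import numpy as np

def quantize_blockwise(w: np.ndarray, tile: int = 128, levels: int = 127):
    """Toy per-block quantization: each (tile x tile) block gets its own scale.
    Real FP8 training uses an e4m3 float format with fine-grained scaling; the
    int8-style rounding below is just a stand-in to show the blocking idea."""
    deq = np.empty_like(w)
    scales = {}
    for i in range(0, w.shape[0], tile):
        for j in range(0, w.shape[1], tile):
            block = w[i:i + tile, j:j + tile]
            scale = np.abs(block).max() / levels + 1e-12  # per-block scale factor
            q = np.round(block / scale)                   # "low-precision" integers
            deq[i:i + tile, j:j + tile] = q * scale       # dequantized values
            scales[(i, j)] = scale
    return deq, scales

w = np.random.randn(512, 512).astype(np.float32)
w_q, _ = quantize_blockwise(w)
print("mean abs error:", float(np.abs(w - w_q).mean()))
```

Second, a heavily simplified sketch of the latent-KV idea behind MLA in vanilla PyTorch: cache one small latent vector per token instead of full per-head keys and values, and up-project it at attention time. It omits RoPE, causal masking, and DeepSeek's exact decompositions, and the layer names and dimensions are my own illustrative choices.

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Toy latent-KV attention: only the compressed latent is cached."""
    def __init__(self, d_model: int = 1024, n_heads: int = 8, d_latent: int = 128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compressed; this is all we cache
        self.k_up = nn.Linear(d_latent, d_model)
        self.v_up = nn.Linear(d_latent, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, D = x.shape
        latent = self.kv_down(x)                               # (B, T, d_latent)
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)  # tiny KV cache
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, D)
        return self.out(y), latent                             # cache the latent only

layer = LatentKVAttention()
y, cache = layer(torch.randn(2, 16, 1024))
print(y.shape, cache.shape)
```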
The cost structure for using OpenAI’s o1 versus DeepSeek-R1 through API calls, listed below, offers another clue that DeepSeek’s input costs are indeed quite low. The cost differential is so large that even if we are somewhat skeptical of the performance numbers, the value for money is such that it should pretty much kill OpenAI for most consumer-grade applications.
Imagine the consternation of big-tech CEOs who have allocated huge sums as part of CapEx for model training when a small startup can produce a comparable model using minuscule resources and dollar amounts.
All of the above would be moot if Nvidia were trading at reasonable market multiples. Nvidia has had a stunning year by all measures. Revenue over the trailing twelve months nearly doubled to $113B, with gross profit soaring to $86B and free cash flow growing to $41.9B.
Before the DeepSeek development, consensus projections for 2025 estimated growth numbers to be upwards of 40%. As a result, Nvidia’s stock returned an impressive 183% in 2024, significantly outperforming the S&P 500’s stellar 25% gains. The stock is priced accordingly, with a trailing EV/EBIT of 48.7 and a price-to-sales ratio of 30 times trailing numbers. These valuation metrics reflect the market’s high expectations for Nvidia’s continued dominance in the AI chip market and its potential for future growth.
Ignoring all the threats to Nvidia’s dominance and the DeepSeek development, if we assume a 40% annual growth rate for Nvidia starting from now for the foreseeable future (5 years out), and use a base discount rate of 10%, the company would need to grow in perpetuity at approximately 7.15% to justify a $142 share price, which was the trading price last Friday.
Let me repeat: on top of 40% growth for the next five years, we expect Nvidia to grow in perpetuity at more than 7%. It is mind-boggling to me to see these valuation numbers! With DeepSeek in the picture, I doubt the 40% growth figure survives, let alone the 7% perpetual growth.
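For anyone who wants to sanity-check that arithmetic, here is a minimal reverse-DCF sketch in Python. The base free-cash-flow and share-count inputs are placeholder assumptions of my own (they are not pulled from Nvidia’s filings), so the solved rate will move around with whatever inputs you plug in; the point is the mechanics, not the third decimal.

```python
# Minimal reverse DCF: given a share price, solve (by bisection) for the
# perpetual growth rate implied after an explicit high-growth period.

def dcf_value(fcf0, high_growth, years, discount, terminal_growth):
    pv, fcf = 0.0, fcf0
    for t in range(1, years + 1):
        fcf *= 1 + high_growth
        pv += fcf / (1 + discount) ** t
    terminal = fcf * (1 + terminal_growth) / (discount - terminal_growth)
    return pv + terminal / (1 + discount) ** years

def implied_terminal_growth(price, shares_bn, fcf0, high_growth, years, discount):
    target = price * shares_bn                 # market cap in $B
    lo, hi = -0.05, discount - 1e-4            # perpetual growth must stay below r
    for _ in range(100):                       # bisection on the terminal growth rate
        mid = (lo + hi) / 2
        if dcf_value(fcf0, high_growth, years, discount, mid) < target:
            lo = mid
        else:
            hi = mid
    return mid

# Placeholder assumptions: ~$42B base FCF, ~24.5B shares outstanding,
# 40% growth for 5 years, 10% discount rate, $142 share price.
g = implied_terminal_growth(price=142, shares_bn=24.5, fcf0=42.0,
                            high_growth=0.40, years=5, discount=0.10)
print(f"implied perpetual growth: {g:.2%}")
```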
As such, I now have a very compelling reason to go short on Nvidia!