An interesting implication of this is that AI inference and training have a path to a ~3x hardware cost reduction (and maybe a ~2x total cost reduction) without any technical innovation whatsoever. We just need to wait for DRAM supply to meet demand, either through manufacturing scaling or by letting the current rate of manufacturing work through the demand spike.
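To make the arithmetic behind those two numbers explicit, here is a rough sketch. It assumes hardware is some fixed fraction of total cost and that nothing else changes; the 75% share is not from any data, it is simply the value at which a 3x hardware cut works out to 2x overall.

```python
def total_cost_reduction(hardware_share: float, hardware_reduction: float) -> float:
    """Factor by which total cost falls when only the hardware part gets cheaper.

    hardware_share: fraction of today's total cost that is hardware (an assumption).
    hardware_reduction: factor by which hardware cost falls (e.g. 3.0 for ~3x).
    """
    if not 0.0 <= hardware_share <= 1.0:
        raise ValueError("hardware_share must be between 0 and 1")
    if hardware_reduction <= 0.0:
        raise ValueError("hardware_reduction must be positive")
    # New total cost expressed as a fraction of the old total cost.
    new_total = (1.0 - hardware_share) + hardware_share / hardware_reduction
    return 1.0 / new_total


if __name__ == "__main__":
    # Hypothetical hardware shares, purely for illustration.
    for share in (0.5, 0.75, 0.9):
        factor = total_cost_reduction(share, 3.0)
        print(f"hardware share {share:.0%}: total cost falls {factor:.2f}x")
```

If hardware were only half of total cost, the same 3x hardware cut would give about 1.5x overall, so the "~2x" figure quietly assumes hardware dominates the bill.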
The memory makers will not expand supply drastically. It is in the nature of their business to keep the market under-supplied; otherwise the ensuing oversupply would kill them. Instead, supply is just rerouted from less profitable segments such as mobile and personal computing.
China doesn't have EUV fabs... They've pushed DUV impressively far... but until they get EUV working industrially (and reasonable timelines are at least 2-4 years for that) it shouldn't be possible for them to compete for that market.
The future is here now, it’s just not evenly distributed. China will mass produce something to the point that it is widely distributed. That is how China acts as a great equalizer on a global scale.
Another way China is a great equalizer is their willingness to do business with anyone that can pay.
No, most of the rest of the large developed economies have some standards (e.g. against buying conflict minerals) and sanctions against certain regimes. China is quite happy to ignore that if they can get away with it.
China is willing to do business while giving zero fucks about the environment they are destroying and the global warming they are causing. It really blows my mind that people support the China thing so much around here.
> but until they get EUV working industrially (and reasonable timelines are at least 2-4 years for that)
Does this not count as soon? How often do you buy new computers? That seems pretty soon. I remember a year or few ago being told it'd never happen so they're already infinity years ahead of schedule if we accept that as reasonable. The rate they're pulling ahead of expectations appears to be so sharp there is a risk they leapfrog EUV to go on to the next big thing.
How much does the node size matter for DRAM? My understanding was that it’s been marginal gains on SRAM since about TSMC 7nm. I would naively expect the capacitor size requirements not to shrink as well as logic. Does the smaller transistor make up for the lower capacitance, or do they have to run at higher frequencies and refresh more frequently?
Not an egalitarian society, but their companies have a honey-badger-like mentality from what I have read, where they ruthlessly reduce costs and margins down past the point where non-Chinese companies can compete.
But they do build infrastructure, usable infrastructure. They've already built that railway all the way to Tehran, and once the war between Ukraine and Russia is over, they will almost certainly build high speed rail all the way to Europe.
Would’ve been nice if the United States had built a rail system north to Alaska, or even a rail system south to Chile.
I guess doing things like that is hard when you’ve been busy fighting multiple wars since the early 1950s.
There are no egalitarian societies. Societies in the west favour the super rich and believing anything else is simply delusional. Sorry to burst the bubble.
If you think that South Africa is absolutely egalitarian, you're wrong.
If you think that Norway is absolutely egalitarian, you're also wrong.
But if you think that "South Africa is egalitarian" is as wrong as "Norway is egalitarian", then your views are more wrong than both of them combined.
To state that no country is absolutely egalitarian does not mean that "China is hardly egalitarian" has to be wrong. And even if some other country (say Norway) were to be as hierarchical as China, that would not disprove the claim that China is hardly egalitarian. It would just mean there exist other inegalitarian countries too.
The thing about China is that they will iterate and iterate some more until they get there, and once there, well, like BYD, they will disrupt the entire market. The cozy days of resting on their laurels are over for the American/Korean memory companies.
The big three memory makers are probably facing their last big payday. I hope they enjoy it, as China will dominate the global memory market in three to five years thanks to the incumbents' short-term greed.
Apple will likely bring memory in-house, like they did with CPUs and GPUs. Anyone questioning the time it took to replace Intel and Qualcomm should consider the Chinese expansion in the memory market, which makes it a long-term necessity.
Apple has the money, and while its competitors have spent/squandered $1 trillion on the AI data center fiasco, Apple made a decision to stay away from the blast crater.
Meanwhile Apple, which also has the expertise in engineering and chip design, can do what is necessary and bring memory in-house. Note: Nvidia and Broadcom have also been replaced by Apple along the way.
Who knows maybe Intel will condescend to do memory too?
There is a certain amount of capacity to produce memory. They are building new facilities but it takes a long time. They have been burned going down this route many times in the past (e.g., losing money, firms that are no longer in business).
The Korean memory makers are playing the same game as Micron and simply moving existing capacity up-market.
GP was referring to upstart Chinese memory manufacturers like ChangXin, who - if their yields manage to catch the wave - could not have asked for a more favorable market after the big 3 have abandoned the consumer segment. Consumers who would have otherwise turned up their noses at CXMT will not have the luxury.
Chinese manufacturers will probably take over the consumer RAM that most of us use as current manufacturing contracts expire and Samsung, SK Hynix and Micron move all their production to HBM for data centers. Corsair recently released DDR5 sticks based on Chinese chips.
I've so far been anti-Chinese memory in my recommendations, not because it's Chinese (I don't really trust any big organisation/govt other than the EU?) but because the manufacturers have been very new, and it's not worth risking PC stability to save $50.
However, with Corsair giving it their blessing, their technology having matured a bit (a lot?), and more reviews showing good stability (longevity, I suppose, is TBD), they're definitely worth recommending these days.
Right, so exactly two countries in the world control most of the memory market (the US and Korea). As the person above said, more global competition would be great.
The Chinese healthcare system is not so different from that in the US. It is primarily employer-sponsored insurance, with subsidized insurance programs for poor and rural folk.
Up until 2005, roughly 10% of the population couldn't even access healthcare, at which point the PRC built out more care centers and invested in training more doctors, but there's still a significant shortage, such that scalpers sell outpatient appointment tickets for 10-15x markup over the actual appointment cost.
There are plenty of ways the two countries are different, but healthcare seems like an odd choice to try to "one-up" the US on, even if its programs like Medicare, Medicaid, Social Security Disability and others still leave gaps.
First off, even per capita, the USA is in 8th place while China is 74th.
For sure China has a problem reaching all of its people, given its gigantic size and population, but its health care costs are nowhere near as high as in the USA. The USA is actually the country that spends the highest % of GDP on health care alone.
Just check out a YT video of a US American going to a normal Chinese hospital and then compare the bill.
And in parallel the USA is dismantling Medicare, Medicaid and co.
This is also directly reflected in life expectancy: US Americans are getting less old than Chinese people.
Rural China has a per capita GDP 1/10th of the US adjusting for purchasing power; if their health care wasn't cheaper only the very richest could afford anything at all. Even the wealthy coastal areas are 1/3 of the US.
China and the US have the same life expectancy of 79 years, which is a very recent phenomenon due to the 2005-2018 changes I mentioned earlier. Obesity, lack of exercise and other cultural factors weigh down the US life expectancy compared to all other Western nations. China's use of abortion during the one child policy era also prevented a lot of people who would have had chronic medical conditions and disabilities from being born.
It is not yet true, however, that Americans are getting "less old", though it may be soon, depending on how China manages its own growing obesity problem and tobacco use.
In a thread about China, I replied to a post about how American healthcare is bad.
My main argument, which is in my first comment, is that healthcare is a bad way to show the difference between China and the US, since they actually have a lot of similarities, especially in access at the lower end of the income spectrum.
There's literally no reason to bring other countries into the conversation other than to say "US is bad", which does nothing to change the reality of healthcare access in China.
Why would I, as an American engineer and user of tech hardware from China for quite some time now, need to immerse myself in the Chinese factories as if they are somehow worse than other ones throughout the world?
It's not that I cannot see them; it's that I don't care. And I certainly don't entertain sanctimonious "le China bad" nonsense like that. You live in a country where every 5 years has been substantially better than the previous 5 for a while now. You don't understand your own "privileges" if you can't understand what it's like to live in a country on a free fall that has foreclosed on the possibility of ever building anything again unless it's the latest investor ponzi scheme.
China can afford, and has the political will and power, to centrally plan whichever parts of the economy it feels like planning. Cars are the obvious example, and if DRAM is next, Western manufacturers should brace for impact.
They’ll also go out of business if they make a massive investment to increase supply, and then the “AI” bubble pops, cratering demand. It’s a tough spot.
Man, I keep thinking: why can't India get into stuff like this? Do their own Manhattan Project to build factories and tech for this, and bring in experts with high salaries.
During the Mao/Cultural Revolution era, for all the bad, two good things were a great focus on K-12 education and a reduction in religious fundamentalism (curbing religious powerhouses). The lack of both is now biting India, along with the standard problems of corruption that plague most poor democracies.
As an Indian, my quality of life would be improved more by the Indian state first figuring out how to make functional roads and garbage collection systems
India needs to first figure out the absolute basics
I am typing this on my return flight home from a business trip in Delhi. There are many other areas the Indian government needs to be focussing on first.
I had a similar view to you ~2 weeks ago. Spending some time there very quickly made me realize that there’s a lot of other things that are much more pressing.
Thanks to the Indian constitution and other laws, incompetence is heavily praised and promoted. Go to any public high/elementary school, go to any private college, it is rampant. It has taken three generations to produce these bad results. Of course, I am not saying that students are dumb. It is just that many smart kids in villages, if given an opportunity, will leave India in a heartbeat because they don't see a bright future for their future kids.
Indian bureaucracy thinks extremely short term. The state prioritizes extracting revenue over all else. So many baffling short-term decisions over everything from corporate tax breaks to incentives for global events like Formula 1.
You really can’t expect the same bureaucratic setup to think in terms of the decade+ it would take to be competent at something like chips
Indian bureaucracy exists to enrich itself for the next ten generations. The bureaucracy itself promotes and recruits incompetents. There is no way to reform such a bureaucracy.
The whole system needs to be dismantled while an alternative system gets built; given the nature of Indian politics (freebies/jobs/reservations to certain groups; monthly stipends to certain groups by borrowing money while at the same time looting public funds), it is impossible.
I suspect Chinese factories will get built first, but quality may take a few years to really nail down.
Basically:
China floods the market with cheaper but less QA'd parts, makes a gazillion dollars, is able to spend said money to fix yields / QA issues and streamline operations, by the time that happens Micron and maybe a few other existing players will have new memory production, and then we'll have a flood of cheap, reliable memory. 4yr, maybe?
How long would it take an aggressive company to expand production capacity? I always thought it takes a few years, at minimum, for even established players to stand up new fabs
As far as I can tell, Micron and SK Hynix are using EUV lithography and may be constrained by availability of the equipment, whereas CXMT does not have EUV machines. There were reports that EUV lithography is needed for high yields, but CXMT appears to be proving that wrong.
It is not a law of nature that Chinese products are lower quality (cf. electric cars) and I don't see why they would go for that. They can just bin what they produce like everyone else and sell their products for what they have been tested to deliver.
But it is a near law that first-to-market attempts will fully embrace the deeply ingrained culture of 差不多 (chàbuduō, "close enough"), until market forces beat it out of the product line.
The West absolutely loves enshittified products. So why not sell them what they want? If they wanted quality they would pay slightly more and do something about it.
Because we buy that stuff even without it. And if you make both good and crappy products, why sell the good stuff internationally?
The US did it when it was a bigger steel supplier, good steel was sold domestically, crappy steel was sold elsewhere. If you got crappy steel in Africa at the time you might have thought US steel was garbage with poor QA, but in reality US steel was great and they just shipped the crappy stuff because people still kept buying it.
China is a gigantic country, home to one in six humans, that produces, directly or indirectly, 70%+ of the world's goods.
It's quite difficult to make general statements at such a gargantuan scale encompassing every single sector.
China has an abundance of terrific QA in electronics and advanced technologies as much as it has an abundance of the opposite, just simply due to its sheer size.
Don't think they'll flood the market. Instead gov will subsidize entire vertical (gpu, memory and power) - you'll just buy deepseek tokens on the cheap, just like EVs, solar and batteries. In return you'll give away your data.
This is wrong. It is NOT in their nature to keep the market under-supplied -- eg, Samsung, the industry's largest company, was notorious for expanding their capacity during the industry downturn to gain market share while everyone else was cutting back to minimize loss.
I'm guessing you are also probably unfamiliar with the terms like "chicken game" which refers to the cutthroat, high-stakes price wars where dominant semiconductor manufacturers intentionally overproduce and slash prices. This is literally how the industry went from dozens to just three majors today since the 80's.
You're making the point for him. Undersupply in a boom, store cash to ramp up capacity in a downturn. Preserves capital and avoids overcapacity during the turn.
This sounds like a plan to sell less when prices are high and more when prices are low. That is one of the stupidest strategies a company could adopt. I assure you, the RAM makers are pumping out as much as they can and increasing capacity as fast as they think the market can handle.
I'm not sure what world we live in when the scheming capitalists are all hunched around their table working out how to dodge selling their products into an enormous price boom. Do they not like money all of a sudden?
Building new capacity takes years. The idea is that the market is reliably cyclical, so you should expand during a downturn, when costs are low and you can afford the short-term capacity hits that expansion causes (e.g. when you divide productive teams in two and fill both halves to full strength with new hires).
The industry is so naturally prone to oversupply that the only stable equilibrium is undersupply. Aggressive expansion kicks off a price war, which immediately undercuts the logic of the expansion.
This only changes with new entrants, which will come, especially from China. But it takes time to build fab capacity, so the medium-term modal outcome is consistent undersupply.
If the existing memory makers retain control of the market and don't defect from the equilibrium that is optimal for them in the long term, that's true. It just takes one player defecting for short-term gains, as we've seen in some past boom-and-bust cycles. Alternatively, it takes a sufficiently resourced player with enough incentive to enter the market themselves (NVidia, Google, Amazon, the PRC government through one of many companies...)
I struggle to think of a line of business as cyclical as DRAM, maybe like certain kinds of mining would be my only thought.
The DRAM fabs have been on a roundabout for 40 years going from getting accused of price fixing and cartel behavior, to struggling to keep the lights on.
And imo it's not really their fault, it's all the lead time of advanced semiconductors, combined with the commodity dynamics of oil.
And the goal is to match that supply to the demand of everything from consumer electronics to more datacenters than you can shake a stick at.
It's maddening to try and solve that, so at this point I really don't fault them for prioritizing survival.
> from getting accused of price fixing and cartel behavior
"Accused" makes it sound like these things may still be up in the air, when they very much are not. I would choose instead the much clearer "A number of those involved in DRAM production have a proven history of cartel behavior and price fixing."
For those who may not be familiar with some of the history in this area:
I said accused mainly because the big 3 won their last antitrust suit in the US, sort of "what have you caught me for, lately?" approach.
For all I know, maybe they are dumb enough to try and actually coordinate again, my hunch would be no, or they've tried something new and inventive.
Like Matt Levine talked about how so many landlords were using the same software to set prices, that one was pretty shady.
But it is interesting where it is popping up at the moment, like power transformers is another area. These companies have lived through these cycles before, and know there is no one to save them if they overleverage and get it wrong.
What you described only works if the manufacturers agree to price fix. Otherwise, in a free market, they'll race to increase their earnings by meeting the demand.
CXMT is scaling up incredibly fast. The South Koreans are on the clock; their monopoly will end relatively soon, although I'm guessing that the AI companies will crash before that anyway.
Supply and demand always balance out. There is no way manufacturers aren’t going to compete away these inflated margins, as long as they feel like this demand is sustainable.
Increasing the availability doesn't mean decreasing the price ... people think those are intrinsically related - not so much.
You can get a Prada shirt for $2,000 ... as many as you'd like, for $2,000 a piece. No problem. They'll make the factories go brrr all night long. Still $2,000.
There's a bunch of things like this. $100 bills for instance ...
a new entrant might yield a price drop, or, it might not.
> why not? i'm sure they can jump into the hustle.
Not so quick. The critical difference is the relationship between enterprises and the state. In China, the state owns the enterprise, in one way or another. High memory costs are a threat to the established Chinese electronics manufacturers. The Chinese state can optimize returns at a higher level than the one some petty chip manufacturer operates at, especially if doing so means it could gain coercive geopolitical strength, aka blackmail.
If it costs you $1B and five years to build out new supply and you think demand will not sustain for more than three years, it does not make sense to expand supply.
Instead you will maintain your margins currently and await demand to decrease back to your current supply.
This is pretty common and as others have pointed out is even more common in markets where competition is slow and lead times are long.
Ammunition is a great example over the last decade or so as political turnover caused relatively short lived demand spikes and manufacturers didn't expand supply because they knew once political winds shift, demand would decrease.
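The timing argument fits in a few lines. A rough sketch: the $1B cost, five-year build and three-year demand window come from the comment above, while the annual margin figure and the function name are made-up placeholders, and discounting is ignored entirely.

```python
def expansion_payoff(build_cost: float, build_years: float,
                     demand_years: float, annual_extra_margin: float) -> float:
    """Undiscounted payoff of adding capacity to chase a temporary demand spike.

    New capacity only earns the elevated margin for whatever part of the
    demand window is left once construction has finished.
    """
    years_captured = max(0.0, demand_years - build_years)
    return years_captured * annual_extra_margin - build_cost


if __name__ == "__main__":
    # Hypothetical $500M/year of extra margin while the spike lasts.
    print(expansion_payoff(1e9, build_years=5, demand_years=3, annual_extra_margin=5e8))
    print(expansion_payoff(1e9, build_years=5, demand_years=10, annual_extra_margin=5e8))
```

With a five-year build and a three-year spike, the new fab captures zero years of high prices and the whole investment is a loss, no matter how fat the margin is today. The decision only flips if you believe the demand outlasts the build by a wide margin.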
There's very few manufacturers, I believe 3 globally? And there's a large moat. Nobody can compete with them in the next 10 years. It's really not hard to coordinate action between 3 companies.
There used to be over 50 memory manufacturers in the US alone. Every time there was a bust (following a boom) there'd be bankruptcies. The lucky ones got bought out and consolidated. Empirically, attempting to capitalize on memory booms is a losing strategy.
There really aren't though. The reason there's only three is because memory is a commodity and margins are historically very low. It's not a very good business to be in, generally.
In the past when memory supply was short and then rebounded, many companies went out of business because making memory was no longer profitable.
And margins will continue to be low, otherwise they'll discover they don't have a moat. Commodity markets being competitive is a self fulfilling prophecy.
The companies have two choices. They either produce RAM cheaply and in large quantities, or they get replaced by someone who will produce RAM cheaply and in large quantities. Current incumbents are free to pick which of those two scenarios they prefer.
That’s not the Apple way, but they might fund a supplier to build out capacity in return for priority access.
The thing is they tend to only do that when they can get a technological competitive advantage. The priority access gives them a locked in competitive edge, for a while. It’s not clear there is an opportunity like that in memory.
If you factor in Nvidia’s profit margin due to the scarcity of the current bleeding-edge chips there is a path to a much larger cost reduction still.
There’s a lot to criticize Sam Altman for saying or popularizing culturally but I’ve come to think his “this is the worst it will ever be” is, in the long run, actually a very intriguing and underrated point.
In a decade, training LLMs to the current level of sophistication (which is in my opinion rather advanced, and probably has lots of additional upside just from constructing better RL training regimes independently of hardware advancement) will become just as much table stakes as running a database is now. I highly recommend everyone look into the Allen Institute’s projects on GitHub and HF, because they have open source training materials (including an LLM from scratch off Common Crawl, and some quite interesting tunes of Qwen) that give a taste of what will, in the near future, be afternoon projects or educational material. The future is going to be wild.
These crazy hardware price increases will probably delay everything by at least 2-5 years. Then add at least 5-10 years for all these refinements and optimizations to permeate universally.
Until everything matures, most likely the current iteration of OpenAI and Anthropic will be long gone, along with their current business models.
This line of thinking makes sense if we're talking about opex like power usage. This is capex though, and we'll be financing this overpayment for a long time after the hardware has "aged out". Not really sure there is an upside to it.
Also, inference cost predictions were made before this price jump, so we really haven't started paying for it yet. Inference will not be getting cheaper.
What demand? Can't shake the notion that it's fictive considering the amount of data centers being built and GPUs sitting in containers, where they will spend quite some time before even being integrated, and even more before being used...
For lifespan, AWS is still running a ton of T4 GPUs from 2018, that power a lot of computer vision models. A ton of these will have a long life, not all ML is about frontier LLMs.
While the 100× is, I think, rather hyperbolic, there is a real and large efficiency difference. But it's economically viable to run them because the supply of newer GPUs is insufficient to meet the demand for compute, so they can charge enough to cover costs for the old ones and a premium (relative to operating costs) for the newer ones.
It would be economically unviable to run the older ones if the supply of newer ones were unconstrained, but that’s not the world we live in.
As long as you have customers that are willing to pay more than it costs you, you are fine. And with AWS there are seemingly plenty of those. So the question isn't whether this is the most efficient way, but whether someone will pay a price that is above what new hardware could attain.
Going by the stats on wikipedia, T4 and B300 both do about one teraflop of half-precision math per watt? Where are the efficiency gains?
Edit: It looks like they replaced INT8 and INT4 with FP8 and FP4, with the same speedups of 2x and 4x relative to FP16. That's an improvement but not that big of an improvement.
I wonder if we will see an adoption of alternative floating point formats. IEEE floats are notoriously terrible at lower widths (<= 16 bits). Floating point formats such as posits do much better at 16 or 8 bits. If you could train at 16 bits per value instead of 32, and suffer a much smaller inaccuracy penalty than you would from IEEE32 to IEEE16...
Posits do a little better if your numbers are biased enough toward 1, but not much better. A 16 bit posit in a near-ideal situation matches an 18 bit IEEE float, and in a pretty wide range of situations loses to either fp16 or bf16.
Training anything at 8 bits is going to be tough, and it's hard to say if the flexible exponent is worth the precision tradeoffs.
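For anyone who wants to see the fp16 vs bf16 tradeoff concretely, here is a small sketch using only Python's standard library. It does not implement posits. The bf16 conversion is an assumption-laden shortcut: it just truncates a float32 to its top 16 bits, whereas real hardware typically rounds to nearest.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (5 exponent bits, 10 mantissa bits)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # Magnitude above the fp16 maximum of 65504: report it as infinity.
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Approximate bfloat16 (8 exponent bits, 7 mantissa bits).

    Assumption: truncates the low 16 bits of the float32 encoding instead of
    rounding to nearest, so errors here are slightly pessimistic.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


if __name__ == "__main__":
    for value in (1.001, 1e5):
        print(value, "fp16:", to_fp16(value), "bf16:", to_bf16(value))
```

Near 1.0, fp16 keeps more digits than bf16; at 1e5, fp16 has already overflowed while bf16 is merely coarse. That is the same range-versus-precision tension any 16-bit or 8-bit format has to pick a side on.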
For some reason I still haven't heard any predictions on when new fabs will come online to meet the current demand. This shouldn't be too hard to find out, since the building of fabs is a very predictable process.
The difficult question is more whether foreseeable memory demand will remain at the current level, grow even further, or shrink again.
Unless there's a new paradigm, scaling up is all they can do to improve performance. They've shrunk down all the way to 1-bit models and all the low-hanging fruit is gone. There's no way for them to get much smaller, so they have to get bigger and faster to meet expectations.
Is this based on an assumption that Opus 4.7 & co are equivalent to or smaller than Opus 4.5 & co? I highly doubt the advanced models (Opus, Pro, etc) aren't bigger than the standard ones (Sonnet, Flash, etc), and I'm fairly sure newer models are bigger than older ones.
this is just not true at all, there are massive leaps from algorithms, data, etc. every year. scale is one axis of many and you need to get them all correct.
Probably, but at some point we're very likely to run out of significant training improvements and it's not clear that we'll see that point coming from a long way out.
Likewise it's probably dwarfed by improvements in how we make DRAM - continuing the roughly exponential (maybe a bit less recently) scaling of chips - but not necessarily.
The 2x from returning to previous costs is interesting because it's practically guaranteed, and it's on top of everything else. We're just currently "overpaying" (relative to the stable market price) for the manufacture of DRAM because of a sudden increase in demand.
> this is just not true at all, there are massive leaps from algorithms, data, etc. every year. scale is one axis of many and you need to get them all correct.
Supply will not meet demand. What incentive do the handful of DRAM manufacturers have to end the party? This is what happens when legal monopolies finally win control. Don't worry. The patents will expire in a few decades. Our grandkids will see DDR5 get cheap again. The system functions as intended.
The up-front investment of a memory fab is measured in billions, and it takes years to construct and get running. The margin on the chips themselves is terrible, so without scale it's not worth even trying. DDR5 is an industry standard that takes some effort to conform to, but the license fees are a drop in the bucket compared to the cost of creating a fab.
The fabricators were cautious about increasing production, and slow to start planning. It takes further time to build up capacity, and if demand drops, they may end up producing DRAM at a loss when the market flips over to oversupply. The demand whiplash could kill any company that dared to bet on increasing production. See the "bullwhip effect" https://en.wikipedia.org/wiki/Bullwhip_effect which has killed semiconductor fabricators before.
There is a discussion to be had about how to maintain national semiconductor production in Europe and US as a strategic industry, but historic attempts have all failed.
Billions is nothing in this market - if the market is supply constrained in the medium term then the hyperscalers will purchase their own route to manufacture (e.g. through coinvestment).
Also that's not what the bullwhip effect is - although I know what you are saying. The bullwhip scenario is about the effect of communication and batching through various layers in the supply chain, this is more similar to the cobweb effect/theory.
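For reference, the textbook linear cobweb model is easy to simulate. A minimal sketch: the function name and every coefficient below are illustrative and are not calibrated to the DRAM market in any way.

```python
def cobweb_prices(a: float, b: float, c: float, d: float,
                  p0: float, periods: int) -> list[float]:
    """Simulate a linear cobweb model.

    Demand:  Q = a - b * P        (buyers react to the current price)
    Supply:  Q = c + d * P_prev   (producers commit capacity based on last period's price)
    Each period the price adjusts to clear whatever supply was already committed.
    """
    prices = [p0]
    for _ in range(periods):
        supply = c + d * prices[-1]      # capacity decided at the old price
        prices.append((a - supply) / b)  # price that clears that supply
    return prices


if __name__ == "__main__":
    # Supply less price-sensitive than demand (d < b): swings damp out.
    print(cobweb_prices(a=100, b=2, c=10, d=1, p0=40, periods=6))
    # Supply more price-sensitive than demand (d > b): swings grow.
    print(cobweb_prices(a=100, b=1, c=10, d=2, p0=31, periods=6))
```

In this toy version the price overshoots the equilibrium `(a - c) / (b + d)` in alternating directions, and whether the cycle dies out or blows up depends only on the ratio `d / b`. The long fab lead time plays the role of the one-period lag.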
I have a fairly simplistic view of the economics involved here. Could you explain why the ability to sell more chips wouldn't be sufficient incentive to increase supply?
Not the person you’re replying to, but RAM has historically been a boom-or-bust business, and companies that invest to meet demand during a boom cycle usually have that new capacity come online just in time for the bust.
If it was just variable costs and new capacity was available today they’d do it. But there are substantial fixed costs and delays to increasing capacity, and that uncertainty makes it risky.
That's such a nonsensical argument, it holds for every other business too and in this case it's just a lame excuse for monopolization. If you are that chicken and can't stomach competition you should not be in business anyway.
The current RAM manufacturers were convicted of conspiracy to manipulate prices back in the 2000s or thereabouts. Doing so is their modus operandi, but this time the government is participating in the racket.
There are other boom/bust businesses that have had waves of bankruptcies. The commodity sector is of particular note. You're seeing the same reluctance to spin up new oil rigs in the shale industry for similar reasons, despite record high energy prices.
Chip manufacturing has unusually long spin-up times, high capital costs and relatively thin margins for anything but the latest and greatest processes, compared to most industries.
Bringing on new fabs takes many years and billions of dollars. You're exposing yourself to a lot of risk if you build now and find that the gold rush is over by the time your new capacity is online.
Let's imagine you're drilling oil instead. You have to spend billions of dollars over years finding and developing a new oilfield to make any profit back. And once you have it, you have to continuously spend enormous amounts of money to keep producing it, which means your effective price floor is higher than the current stable price.
Now it's 2021 and someone gets a tanker stuck in the Suez, sending the price of oil sky-high. How long does the ship have to be stuck before you spend those billions of dollars on a bet that it'll recoup before someone gets the ship out?
Although on the flip side, let's pretend it's 2017 and you are Nvidia selling GPUs for Bitcoin - maybe demand will dry up at some point? Do you stop scaling production as this might be the max of the market, or do you follow the market and increase production?
It's always easier to see the right move in hindsight!
Nvidia doesn't own fabs though, TSMC does. By 2017, ASICs for Bitcoin were well underway. Ethereum hadn't switched to PoS, and wouldn't until 2022. For that specific question, the answer is yes, because the GTX 1080 Ti is/was a monster card, and the crypto miners have a somewhat predictable demand for them, so there's some modeling you can do based on demand for the 2016 generation of cards. The question is, of course, if you're Nvidia, what are you optimizing for? Let's say, without foresight that Ethereum would move to PoS in 2022 and that AI would replace that demand, how many 1080 Ti cards do you make, how many 1070s, how many mobile 1080s, how many Titans?
In order to answer that, someone at Nvidia would, for better or worse, have had to really get into cryptocurrency in order to understand that market. Because you, as Nvidia, know how much better the 1080 will be for mining Ethereum, certain predictions can be made about demand.
Question is, without hindsight, 2022 rolls around, Ethereum moves to PoS, do you sell NVDA?
TSMC doesn’t get to take the profit that currently accrues to Nvidia and Apple, even though they absolutely could from a business/leverage perspective, because they are an economic colony of the United States, and hiking their prices (which Apple and Nvidia would have almost no choice but to pay, but which would upset their benefactors) would jeopardize their national security/defense.
In a world where TSMC is functionally capable of the same level of production but not in such a complicated geopolitical situation regarding semiconductor manufacturing, things would be quite different.
TSMC builds new bleeding-edge fabs and then amortizes them across many different customers over a decade or more, starting with higher-margin customers (Apple, Nvidia, etc.) and working down as time goes on and the higher-margin customers move on to newer plants. Today's bleeding-edge fabs become tomorrow's mass-market fabs for lower-margin chips that go into cars/toasters/etc. The idea is that the early adopters pay for a decent chunk of the CAPEX and then it becomes a commodity play. It's the same way some auto manufacturers put new tech into their premium cars, then it trickles down to the mass-market cars over time.
It's the main reason outsourcing fabs is so much more economical. If NVIDIA built fabs just for itself, the fab's CAPEX would be amortized over fewer components than if a third party did, even if NVIDIA was the largest customer. It's also one of the main reasons Intel fell behind. So much of their cashflow went to building fabs that made an order of magnitude fewer chips than TSMC. Even worse, they had to write down the CAPEX for the fabs, which affected their financial statements.
Anyway, companies like Apple and Nvidia have very long-term horizons and contracts, which probably include right-of-first-refusal clauses on capacity, etc. In the short to medium term, Apple probably isn't paying much more for most components. If this memory shortage lasts decades, they'll eventually end up paying more.
I bought 192GB of DDR3 a year ago for literally $60 ($5 a stick). It's about $22 a stick now, so more like $350 today. What on earth is _anybody_ doing with DDR3?
Demand for DDR3 is up because people who want DDR5 or DDR4 but can't afford either any more are choosing DDR3 and old DDR3-compatible systems to put it in, instead of what they really want.
All memory products use many shared resources in the supply chain, so if there is high demand in one product line, others have to raise prices to compete for the resources or stop making those lines altogether.
That is to say, at least you were able to buy them at $350 today; on the current trajectory there will be no supply at all in a few months.
You could set up swap space on Intel Optane media, it'll be about the same performance as DDR3 and sells for ~$1/GB on the secondary market. Though it will be a lot more power hungry than Flash, let alone DRAM - so not suitable for all uses.
Optane is available in NVMe form factor that will work basically everywhere. There's also Optane persistent DIMMs that only work in highly specific systems.
Just decided to buy 8 drives for my NAS and was surprised to see nothing in stock anywhere, plus prices are 3-4x higher than half a year ago. Just wasted 2k EUR on 8x8TB; it should be plenty for my NAS, but I feel stupid having to waste so much money.
I forgot to add, I paid ~500 each, Samsung for the same drive is quoting $2k on their site (maybe a new sku). These were bought 2ish years ago.
Strange things are afoot at the Circle K.
This throws prior assumptions that tens of gigs of RAM are cheap out the window. It would likely make super-fast SSDs such as Optane way more valuable.
This is one of those things with consumers: they remember buying at the absolute lowest price point, when DRAM makers were bleeding money.
That is not normal pricing. Before the pricing collapse in early 2020, 96GB DDR5 would have cost about $450 to $500. And I will restate that the cost of DRAM hasn't really changed much in the past 20 years. Its price just goes up and down in cycles.
So in reality it is more like going from $500 to $1300. But consumers felt it was more like going from $200 to $1300.
Crucial is already selling DRAM made by CXMT, and China is already throwing money at it. I doubt the memory bubble will burst in the next 12-24 months, as in going back to money-losing DRAM pricing, since they will all pivot to HBM or other money-making products. But the bulk of lower-end consumer DDR5 or LPDDR5 will go to Chinese fabs, assuming they have figured out how to make it well. They have improved but are still far behind the industry leaders.
Normally memory makers would push the next DDR standard to market just to push out Chinese competitors; I am not sure it will work the same way this time around. DDR5 has plenty of other uses and demand.
Historically the price has always trended downward. When I first got into computing $200 could buy you 128 MB (yes M) of ram. Really nice systems had 512 MB.
That's obviously changed over the decades as process shrinks have led to higher memory density. We should generally expect that RAM will get cheaper up until the point where process shrinks stop happening. They've definitely slowed, but they haven't stopped.
>They've definitely slowed, but they haven't stopped.
Yes, if you span 40 years. But the floor for the DRAM spot price was ~$2/GB in 2008, and it touched that 2-3 times over the next 15 years. It wasn't until the early 2020s that it broke through that toward $1.
Process shrinks happen, but the majority of a DRAM part can't be shrunk by process any more.
Exactly. My first computer had 48k, yes K, of RAM :-). My first PC had 2MB and made all my friends jealous as they had 1MB. The Amiga 500 at the time had half.
I am keeping a piece of paper that came with my Tex Murphy game which stated that one could get 32MB of RAM for as little as $700 (1990s dollars) which would drastically improve the game!
My main computer has 64GB. I bought that one in late 2022 or so.
Looking at the current prices, even of the same RAM, is just insane. Those companies really need to pay us compensation here. The whole "free market" notion does not work when you have de-facto monopolies and mega-corporations abusing average Joe and average Jane.
Everything I read seems to suggest that RAM capacity is going to grow at 20-25% a year, which just doesn't seem good enough. Even in consumer use cases, phones and laptops would benefit greatly by double the amount of RAM. And then obviously, the AI need is gigantic.
I don't see it going away. I mean, it may not grow as fast as now, but I don't see it going away either. I get why the memory makers do not want to bankrupt themselves, but it feels like there's got to be some way to push that risk off onto model providers and other people in the ecosystem to allow us to grow RAM capacity more like 50% per year.
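For a sense of scale on the 20-25% vs 50% figures above, here is a minimal sketch of how annual capacity growth compounds. The function name and the three-year horizon are just illustrative choices on my part, not anything taken from an industry forecast.

```python
def cumulative_growth(annual_rate, years):
    """Capacity multiple after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


# Compare the growth rates mentioned in the comment above.
for rate in (0.20, 0.25, 0.50):
    row = ", ".join(
        f"year {y}: {cumulative_growth(rate, y):.2f}x" for y in (1, 2, 3)
    )
    print(f"{rate:.0%}/yr -> {row}")
```

Two years at 25% gives about 1.56x the starting capacity, while two years at 50% gives 2.25x, so the gap between the two scenarios widens quickly.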
The OpenAI deal would be absorbed by two years of that. And it would be inefficient for the RAM makers in a competitive market to leave buyers unsold-to.
I don't actually know what the rate of growth before October was, I'm sure someone round here will though.
I mean the biggest risk is Chinese CXMT benefiting and capturing the markets that others are leaving hanging, and then being able to compete and push out the others when costs start to normalize.
As for 20-25% growth not being enough, I think it's not that far off, if we assume data center build out plans hit a wall and slow down significantly, and the AI heat starts to cool off.
I think 20-25% may not be enough in the short term, but if the AI build-out stops within this year, we will have a massive oversupply instead of an undersupply.
Looking at the history of the memory industry the biggest risk is that a firm would over produce and go bankrupt. Maybe this time is different but so far no memory chip maker has gone under because their competition increased capacity.
I might be wrong but your second point can't be true if the first one is true.
Let me explain. Imagine CXMT grows massive and builds a lot of fabs, so much so that it becomes the leader in multiple segments, and then market demand cools off.
Then CXMT, the company that invested massively, has oversupply, so it undercuts every other memory company.
I.e., Samsung and SK Hynix are dead, and to protect Micron the US now has a 10000% tariff on memory imports.
No need to imagine, because that has happened before: if you don't play the boom-and-bust game, someone else will, because the market is very large during a boom, and the player scaling up the most generally isn't the one with margins to protect and generally has the ability to undercut others.
Asian memory chip giants were made by undercutting European and American companies; American companies adapted by moving manufacturing to Asia, and European ones got bought for pennies or dissolved.
Is there any indication research is being focused on reducing the memory footprint of inference for frontier-class models? Is the low-hanging fruit already gone there?
Low-hanging? How low-hanging are we talking? The basic stuff is gone. The big challenges around quantization were largely solved 2 years ago, and we have just been improving from there.
But can massive gains still be made? Definitely.
The entire AI hype is based on the paper "Attention Is All You Need", and attention basically means keeping a huge matrix over all the tokens in memory; how well you can optimize this attention layer is basically how most architectures are trying to solve for performance and memory usage.
The only one with significant gains in it is DeepSeek (or so I would like to believe, because others don't make their work open for folks like me, not in Big AI Labs, to read). Their MLA architecture reduced KV-cache memory requirements by up to 90%; of course, that's a purely architectural change.
With some quantization like TurboQuant from Google you could push it down to ~1/3 of that. So roughly 96% memory savings when talking about KV-cache.
But the models are close to being saturated for quantization based memory optimizations. We will have to see some architectural changes for a significant shift now.
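The KV-cache arithmetic above (up to 90% from MLA, then down to ~1/3 of the remainder with quantization) can be checked with a quick sketch. The helper name is made up for illustration, and this assumes the two savings stack multiplicatively, which is how the comment treats them.

```python
def remaining_fraction(*savings):
    """Fraction of memory left after applying each saving in turn.

    Each saving is the fraction removed by one step (0.90 means 90% removed).
    Assumes the steps are independent and stack multiplicatively.
    """
    left = 1.0
    for s in savings:
        left *= 1.0 - s
    return left


mla_saving = 0.90         # "up to 90%" from the architectural change
quant_saving = 2.0 / 3.0  # quantization leaves ~1/3 of what remains

left = remaining_fraction(mla_saving, quant_saving)
total_saving = 1.0 - left
print(f"KV-cache left: {left:.1%}, total saving: {total_saving:.1%}")
# prints: KV-cache left: 3.3%, total saving: 96.7%
```

So the "96%" figure is in the right ballpark under those assumptions; real savings would depend on the model and on how well quantization composes with the compressed cache.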
If they manage to make memory more efficient, they’ll just increase the context size and/or model size.
We just haven’t reached the diminishing return of gen AI capabilities yet.
Models will get more useful if you have higher context size or higher param size. Then people will just use the models even more, leading to even more memory demand.
There's no risk to businesses that are paying bonuses of $1 million, per worker, per year - like the RAM makers Samsung and SK Hynix.
They are drowning in money but they don't invest in new production in order to maintain high prices. By doing so, they form a virtual trust with monopoly control over pricing. What you call "risk" for them is our best hope, China can't enter the market soon enough.
Oops, the US government is blocking the Chinese chip industry in every way possible and thus becomes a factual member of the aforementioned anti-competitive and anti-consumer trust.
They announced they were closing Crucial, but until very recently they still sold stuff to consumers, and they still have a business entity to provide support and warranty in most countries.
I got my RAM RMA'd 6 months ago; yes, it was intense.
According to the recent article, HBM is 3x less efficient than LPDDR in terms of wafer area, but its bandwidth is more than triple.
What if it's in everyone's interest to buy computers at, say, 1/3rd the rate and switch everything over to HBM?
The discrepancy between compute and memory has been growing for ages; perhaps a painful switch to HBM is exactly what we need?
Would you rather have 3 intermediate computers with low memory bandwidth, or wait a little longer statistically so that we can all enjoy a new computer at 1/3rd the rate but much higher bandwidth than the area ratio?
I hear people are doing AI workloads on Apple hardware, which is LPDDR but with a wider memory bus (1024-bit). This requires the SoC to support it; from what I understand not many, if any, beyond Apple offer this. A wider memory bus may be all we need.
Multicore workloads do tend to hit RAM bandwidth limits before they hit power constraints. If you do the math, running at max frequency and core utilization would usually imply you could only access a byte or so per core clock cycle. Perhaps a mere handful of bytes for the highest-performance systems with in-package RAM.
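To make the "do the math" step concrete, here is a rough sketch of bytes available per core per clock cycle. The two systems are hypothetical: 89.6 GB/s is the theoretical peak of dual-channel DDR5-5600 (16 bytes x 5600 MT/s), and the 546 GB/s in-package figure, the core counts and the clocks are just assumed round numbers.

```python
def bytes_per_core_cycle(bandwidth_gb_s, cores, clock_ghz):
    """Peak memory bytes available per core per clock cycle.

    Assumes every core runs at `clock_ghz` and shares bandwidth evenly.
    """
    bytes_per_second = bandwidth_gb_s * 1e9
    core_cycles_per_second = cores * clock_ghz * 1e9
    return bytes_per_second / core_cycles_per_second


# Hypothetical desktop: 16 cores at 5 GHz on dual-channel DDR5-5600.
desktop = bytes_per_core_cycle(bandwidth_gb_s=89.6, cores=16, clock_ghz=5.0)

# Hypothetical in-package-RAM system: 16 cores at 4 GHz, 546 GB/s.
in_package = bytes_per_core_cycle(bandwidth_gb_s=546.0, cores=16, clock_ghz=4.0)

print(f"desktop:    {desktop:.2f} bytes per core-cycle")
print(f"in-package: {in_package:.2f} bytes per core-cycle")
```

With these assumed numbers the desktop lands at about 1.1 bytes per core-cycle and the in-package system at about 8.5, which lines up with "a byte or so" and "a mere handful of bytes".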
Historically most devices were serving antivirus and snooping. AI is the first time they are being used for actual computing again. They will be kept saturated.
It is like the most important performance figure. When I use an LLM that mostly fits on my GPU, the GPU will run at about 30% of its maximum power consumption - probably because the memory can't feed the ALUs fast enough. Similarly for the part that runs on the CPU, the CPU cores will show 100% utilization but not consume as much power as they usually do under full load. The GUI will also be choppier than usual under full load (noticeable, but not too annoying) presumably because pixel pushing also needs some nontrivial memory bandwidth which is hard to get.
Yes, and so we use HBM for AI (among other things), but that's an exception. For things like games or displaying webpages, it's not very important and we generally don't put HBM into things for that.
I'm not moving past my DDR4 build (and the 32 GB of DDR4 2133 MHz backup chips I still have around from way back, before I got the current 3200 MHz ones) until the prices go back to being at least partially sane. This also means that CPU manufacturers are not getting my money (since the 5800X is fine for now) and I have no reason to get a new GPU either (though admittedly the B580 isn't perfect).
As Yogi Berra famously said, "It's tough to make predictions, especially about the future." But based on historical tech industry trends, a price increase in one component that's this rapid and extreme, is likely to eventually regress somewhat toward the long-term trend line - even if that trend line experiences a longer-term shift upward.
As always, some interpret certain recent events as reason to conclude "but this time it's different." Occasionally they are correct. But that doesn't change the fact that it's reasonable to assume some of the recent extreme, rapid price inflation is due to shorter term market distortion. It's also pretty clear that some of the recent increase in demand represents a stable increase in the long-term trendline. The question is how much is long-term stable and how much is short-term distortion.
This is 100% going to kill the home built pc market. When I started building gaming pcs, the top top card was 750$ (NZD). Now they’re 10,000 just for the gpu and another 1-2000 for ram.
People used to get into gaming pcs as an affordable hobby, now it’s making general aviation look like plan B.
This has already happened. The home PC market is practically dead already due to memory, SSD and graphics card price inflation. Makers of components like PC cases and power supplies etc. are seeing demand down 30-40% year over year, and this is going to put many suppliers out of business. NVIDIA has stopped even listing gaming revenue on their earnings reports. Both NVIDIA and AMD are not seriously interested in supplying the consumer GPU market anymore either.
The only hope left is really Apple, but even Apple has conspicuously delayed the launch of the M5-gen Mac mini and Mac Studio. Mostly because even Apple can't source enough DRAM to fully supply all their product lines.
There's much more to gaming than triple-A video games running at 240 Hz on Ultra settings... a 200 USD laptop/computer has enough power to run hundreds of interesting indie games and AAA titles from the past.
My 2019 gaming PC is considered unusable ewaste by most pc gamers. The RX5700 XT GPU is super cheap second hand right now and I've been able to play every game I want including new releases like Kingdom Come Deliverance II on great settings with no noticeable issues.
You don't even have to drop down to old indie games. You just have to turn off the FPS counter and stop pixel peeping screenshots.
You can still play fantastic games with amazing gameplay, great storytelling, and even ones requiring quite a GPU. But you won't upgrade your GPU or RAM. If it breaks, people have already been getting their money back instead of a replacement (whether that is legal or not depends on your jurisdiction, and regardless: it is happening). So the demand for and adoption of, say, 240 Hz 4K OLED gaming is going to slow. I currently sport two 1440p IPS monitors capable of 144 Hz, with an AMD 6700 XT, 64 GB DDR4, and a 5700X3D. I'll hold off on upgrading that to a 4K rig.
What I will do is buy a Nintendo Switch 2 before the price increase hits. Why? Great gameplay for kids.
Prices haven't risen THAT much and are quite affordable. And if you look at the improved quality of upscalers (DLSS 4.5 for example), gaming is now more affordable than ever, despite the increased cost of components.
Of course, the 5090 prices are insane, as are for SOME memory models, but that's nothing new and represents a fairly small market share.
> When I started building gaming pcs, the top top card was 750$ (NZD)
When I started building gaming PCs, the top $700 cards didn't even provide comfortable performance or graphics. Back then, you were supposed to have several of these connected via SLI or something. And even then, it wasn't always reliable, and it resulted in stuttering, lags, and graphical artifacts (in cases when it worked). Today, even $700 graphics cards are a much better product from a user perspective than the high-end cards of that time (and that's not even taking into account that $700 cards back then were much more expensive).
Improved quality used to be the justification for buying new hardware at a similar price to the old hardware when it came out new. Now the 5060/70s are 4 figure cards.
As for how much the prices have actually risen, it’s not hard to see if this is true or not. If doubling of prices doesn’t raise your eyebrows, I’m not sure what will.
> When I started building gaming PC, the top $700 cards didn't even provide comfortable performance or graphics.
When would this have been? I cannot remember a time this was accurate for the games of the time, outside of a handful of meme titles like the original Crysis that made bad hardware bets. Most cards fulfilled the needs of the software and hardware of the time. I'd say the biggest issue was that for a time, software and hardware were advancing so rapidly that you wouldn't get very long out of your hardware, but that's just the reality of rapid development and not the fault or failure of any specific hardware release.
> Back then, you were supposed to have several of this connected SLI or somethin.
SLI was aimed squarely at enthusiasts, not at joe-average PC gamer and it was certainly never a requirement. It existed as a halo feature for people chasing maximum performance, benchmark scores, and bragging rights.
Why? Those servers still have to pay the same price for components plus a markup for the service. In theory you can serve more gamers per GPU, but these GPUs have to be physically located in your city to have a usable latency, and that means you'll have issues with peak utilization being most users gaming at the same time of day.
I just don't see the cost savings of sharing a GPU overcoming the extra expense + profit such a service would need.
The GPUs do not have to be "physically located in your city" to have usable latency.
Of course, less latency is always better, but running a traceroute between my IP and a major city (Sydney) 1,500 km away comes out at about 11ms latency with optimal routing. (Real-life test, traceroute via an ISP Looking Glass.)
1500km is still largely the same timezone though. To actually get consistent usage of the GPUs you'd want users on the other side of the planet using them while the current side is sleeping/etc.
> Those servers still have to pay the same price for components...
Not if Nvidia is running the service.
Seems quite possible to me that Nvidia sells to the public just enough graphics cards to keep any frisky antitrust investigators off its back and reserves the rest for GeForce NOW, its "pay monthly for limited access to a remote gaming PC" service. The cards for NOW are billed to the BU running NOW at or below cost, the few cards available to consumers and System Integrators naturally have a huge markup due to extremely constrained supply, and Nvidia uses the fact that they are the thing behind the LLM Boom to ensure that they have -what a System Integrator in 2022 would recognize as- a reasonable price for just enough RAM for the computers that NOW rents access to.
Downvoters: notice the speculative nature of the previous paragraph. I'm not claiming that this is happening right now. I'm claiming that it's quite possibly more profitable for Nvidia to bill monthly for limited remote access to computers with Nvidia graphics cards in them than it is to sell those cards at retail and to SIs.
These kinds of conspiracies require everyone to collude, which just about never happens since the reward to defect increases. If nVidia tries this, they would just lose the market to AMD, who would spam out as many GPUs to gamers as they could. If both AMD and nVidia teamed up, it would leave a gap that either Intel or some Chinese startup would jump on.
It's just far more likely that these GPUs actually do cost a ton to make right now.
> These kinds of conspiracies require everyone to collude...
No, only Nvidia makes and sells Nvidia GPUs. They're the sole supplier of the GPUs used in 95% of the graphics cards sold in the US.
> If both AMD and nVidia teamed up, it would leave a gap that either intel or some Chinese startup would jump on.
Fascinating.
a) Explain why the only even vaguely-recent cheap video cards were made by Intel, and why it looks like Intel has pretty much stopped making video cards? [0]
b) Tell me how that Chinese startup gets past USian Sinophobic/protectionist trade barriers?
c) Tell me how that Chinese startup convinces the big gaming development houses to ignore the advice of Nvidia's driver engineering team that just so happens to make their games work great on the hardware in NOW and really, really poorly on that unknown-to-US-customers Chinese startup?
> It's just far more likely that these GPUs actually do cost a ton to make...
You seem to have not been paying much attention to the reports of Nvidia, AMD, and major RAM and storage suppliers changing focus from the consumer market to the far more profitable datacenter (read as "LLM") market. Several such suppliers have exited the consumer space entirely. As any residential renter in San Francisco [1] can tell you, extremely limited supply drives price up to obscene levels.
[0] This shift in Intel's focus may or may not be related to Nvidia becoming the third- or fourth-largest Intel shareholder.
[1] ...or any other "hot" market with large, artificial barriers to entry...
Indeed, Gamers Nexus is doing interviews with PC component manufacturers, and some are hurting bad right now. The PC market is no longer in competition, but rather survival mode. =3
It's more likely to kill the AI market. They're overbuilding capacity and most of it is going unused. The upcoming haircut is going to kill a lot of the major players.
They've intentionally crafted an unsustainable business model in an effort to get users in the front door and raise their MAUs. We've seen this story before. We should know precisely where it's headed.
When you consider how much an employee costs, AI makes a ton of sense. Lots of businesses are stacked with staff doing basic data entry / shuffling. Even if it’s 1000 USD a month, AI is still a bargain.
I think it's the opposite. Sure in short term hobbyists are getting squeezed, but the amount of capital that they can put into pushing the edge is small compared to Fortune 500. Sooner or later hobbyists will benefit, especially if the market crashes.
It's impossible to kill gaming like this. Even if hardware was completely unaffordable, people would just use old stuff for longer and then upgrade after prices restore.
Why would it kill PC? There will always be hobbyists, e.g. I can't imagine pro e-sports players running on a Mac. Personally, half of the reason I moved away from Windows is Microsoft stalling/degrading Windows experience.
The price of PCs causes a collapse in demand, then mass bankruptcies of companies making PC components, so supply chains get demolished, and when prices come back down there’s no one left selling anything, so you can’t build a PC at any price.
I wonder why the hyperscalers aren't vertically integrating more and building their own fabs. Sure, a fab costs a billion dollars, but they're currently spending hundreds of billions of dollars purchasing chips from NVidia and others.
I'm not sure if they should vertically integrate, it would probably be a better idea to directly fund the expansion of capacity, much like Apple does when they scale up a new technology for iPhones.
However, that the hyperscalers and AI companies aren't doing this says a lot about their true beliefs about how much future demand AI will have.
AI companies claim they will need a ton of massive expansion, but are unwilling to take on the risk of the capital needed for that expansion.
I'm hearing a lot of sad whining from AI folks about how these chip makers are holding them back, but who actually has the money to finance the expansion easily? Chip makers have been at this game far longer. When Sam Altman went around claiming it was time for $7T of fabs, the AI companies made it clear that they were willing to make ridiculous claims, eliminating their credibility.
What's needed now is for them to funnel a tiny amount of their massive piles of cash into financing fabs directly.
Oracle is getting sold because of how much capex they're spending on new data centers in the middle of a high rates environment. It's not like they're stockpiling cash due to doubting AI.
Oracle had not entered into my thoughts at all; I know they do some cloud stuff but they are in a very different position than OpenAI or Anthropic or Google.
> [...] better idea to directly fund the expansion of capacity [...]
>
> However, that the hyperscalers and AI companies aren't doing this says a lot about their true beliefs about how much future demand AI will have.
With what money? They have to spend the money they get on hardware ASAP else they are left behind.
Another guy answered it ITT. Intel did that; it’s not great because fabs are expensive and risky, and it’s less risky to amortize the cost across multiple customers instead of just yourself.
Because fabs are about the most complex cutting edge technology out there: the "rocket science" of our day (or one of them). And merely having the money is not sufficient. It would be very easy to blow several billion dollars and end up with nothing to show for it.
Just look at how Intel has struggled to compete in recent years, and they have been in the business for decades.
Intel struggled because they bet the company that Moore's law was over back in ~2014, and instead of upgrading their fabs to EUV they sent the money back to shareholders.
They forgot Andy Grove's main lesson: only the paranoid survive. They thought they could coast, and it nearly killed them.
A fab takes years to build even when you have the necessary know-how. If you don't it'll take some additional experimenting before you can compete with the established manufacturers. By the time you can produce a usable chip the shortage might be over.
Fab margins are on average super thin compared to the margins of big tech companies, and come with a lot of risk because of that. It's not something they are likely to be keen to integrate.
Memory manufacturers sit on a war chest of IP. So even if someone has excess fab capacity and wants to get into memory manufacturing, they will have to fight an uphill battle against about a zillion patents.
Most memory companies have backroom deals to exchange tit-for-tat patent violations against each other.
Not sure how a new memory manufacturer comes into being without getting sunk by licensing costs?
Bought a second hand Dell server a week ago. The entire rig with a 12-core CPU and 32GB DDR4 ecc RAM cost as much as I'd pay to buy 64 GB of DDR RAM alone. I hope there's an end to this absurdity soon enough otherwise the pain will affect other markets too. I read the other day that PC case sales have collapsed by more than 40%.
I feel like by the time the AI bubble bursts the PC market will be irreparably damaged. Manufacturers who have been making "enterprise" parts aren't going to go back to making consumer parts because there will be no market for it. And with a glut of datacenters not making any money on slop, they are going to be repurposed for SaaS, stuff like OnShape but for every application.
Most users don't seem to care about storing everything they generate in cloud services and this could easily be sold as an alternative to owning "expensive" desktop or laptop hardware.
It's the reason I just built a new PC despite the insane prices: I'd rather overpay than have reasonable prices but no stock to buy. With any luck I'll get 8-10 years out of this one, and by then the PC landscape will be something else entirely.
If hyperscalers are using more RAM, and that RAM is not available for consumers, it means all the heavy stuff will happen in the cloud. Why would we want both the hyperscalers and consumers to have RAM simultaneously? Consumers would want more RAM to run local models but then hyperscalers capacity will be unused.
Because RAM isn’t in PCs only. It’s in tablets, phones, laptops, DIY computers like the Raspberry, mini PCs, watches, smart TVs, game consoles, cars, routers, cameras, all smart appliances from refrigerators to washing machines, fitness trackers, printers etc. Cloud services are irrelevant to most of these categories.
Is a fab that produces refrigerator RAM also capable of producing HBM3? Doesn't that require retooling? Won't the same problems surface as with establishing new fabs?
They do require retooling, and that's what's happening here. RAM manufacturers decided that it's way more lucrative to focus on HBM production than DDR4/5 production. Capacity is the issue, and that's capped unless you build new fabs, but they won't do it because there's no guarantee that demand will stay the same in the coming years.
I really don’t want to give anyone ideas, but doesn’t this make the Nvidia 5090 an unbelievably good deal right now?
The VRAM in the 5090 is only made by one country in the world.
The 50xx series is special, because its ram is so dependent on a single commodity. It’s not like a 4090 or a 3090; their VRAM chips have been around for years.
If there’s a shortage or interruption in GDDR7 VRAM, it seems like every GPU that requires it would explode in value.
I hope I don’t regret posting this because I’d really like to buy one myself…
With only 32GB of VRAM, you can only run small/quantized models, in which case what's the point? At $4000, that gets you 20 months of 10x Claude or ChatGPT subscriptions, which provide far better models. You'd need some use case where you can tolerate worse models, and use a steady supply of them. That doesn't match most people's usage patterns.
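For anyone who wants to redo the "20 months" arithmetic with their own numbers, here is a minimal sketch. It assumes the "10x" subscription costs $200/month (which is what $4000 over 20 months implies); the electricity figure in the second call is purely hypothetical and not from the thread.

```python
def breakeven_months(hardware_cost, monthly_subscription, monthly_power_cost=0.0):
    """Months of subscription fees needed to equal the hardware purchase.

    monthly_power_cost is a hypothetical running cost for the local card;
    it reduces the monthly saving and so pushes the break-even point out.
    """
    monthly_saving = monthly_subscription - monthly_power_cost
    if monthly_saving <= 0:
        # Running the card costs at least as much as subscribing.
        return float("inf")
    return hardware_cost / monthly_saving


# $4000 card vs an assumed $200/month subscription, ignoring power.
print(breakeven_months(4000, 200))      # 20.0
# Same comparison with a hypothetical $40/month electricity bill.
print(breakeven_months(4000, 200, 40))  # 25.0
```

This only compares cash outlay. It says nothing about model quality or resale value, which is what the rest of this subthread argues about.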
If you can do what you need with qwen3.6-27b, it starts to look really interesting. That model is crazy good for the size, but it's a pain tweaking the params to run it on a 4090 with decent context and decent token speed. A 5090 looks tasty from that point of view, and only more so if you think in terms of the probability of that model being roflstomped by something in the same weight class in the next couple of years. I reckon that probability is significantly non-zero, but fundamentally it's a guess.
>If you can do what you need with qwen3.6-27b, it starts to look really interesting.
What's the use case here? Churning out massive amounts of slop code through autonomous agents? Running openclaw 24/7? I think the proliferation of codex and claude code, compared to any of the cheaper open models suggests that at least for most software development, the 50-75% discount of open models isn't worth the hassle of the decreased intelligence.
I think there is a reasonable basis for taking a gamble that small models capable of fitting on a 32GB card will continue to advance over the next 5 years and eventually approach Gemini Flash 3.5 / Sonnet 4.6 levels of capabilities, which I would consider to be past the threshold of “probably worth the cost and hassle of running 24/7” if the upfront cost of the hardware was palatable.
My use case would primarily be in search, integration, and indexing other software projects with my own, as well as transcription/indexing of interesting video and audio content (eg Dwarkesh interviews) that I don’t have time to watch but want to easily search and apply to my projects, and search/indexing for useful information from things like Linux kernel and security mailing lists. Basically there is a lot of stuff that, if the cost were low enough, I would point a reasonably intelligent AI at to distill out useful information and apply it to my projects, or just cherry pick the interesting things out and surface them to me so I don’t have to wade through all the mundane stuff and man-made slop getting in the way.
>My use case would primarily be in search, integration, and indexing other software projects with my own, as well as transcription/indexing of interesting video and audio content (eg Dwarkesh interviews) that I don’t have time to watch but want to easily search and apply to my projects, and search/indexing for useful information from things like Linux kernel and security mailing lists. Basically there is a lot of stuff that, if the cost were low enough, I would point a reasonably intelligent AI at to distill out useful information and apply it to my projects, or just cherry pick the interesting things out and surface them to me so I don’t have to wade through all the mundane stuff and man-made slop getting in the way.
All of that feels like something that a $20 chatgpt pro subscription is for, maybe with slightly better tool use capabilities. There's no way that a $4000 purchase on a GPU would ever be worth it if all you're doing is running a handful of queries per day.
It would require much more than a couple of queries per day, I want to basically do bulk ingestion and search/evaluation/integration across tens of thousands of videos and software projects (if it were cheap enough and smart enough). It would basically be setting up and operating a pretty large data ingestion and coding agent pipeline, which I would want to itself be mostly automated.
It’s ok if you don’t want to do the same kind of thing but I find it weird how dismissive so many people get about wanting to use LLMs for large projects, or how anybody who says they’re using them for these kinds of things (I’m doing similar for other stuff) gets challenged on what they’re doing it for.
I don't have a 5090; I have a 395+ and I use it for GPU-assisted OCR, vector embeddings, speech-to-text, etc. I have the freedom to use a large library of various models, and I can fit a lot in 128GB.
I don't use it for coding, I have $20 Gemini, $20 codex, etc.
But then, I got the Framework board for $1700; now it's $2700.
My area has a net-metering plan available, so you can send any surplus out to the grid to offset energy pulled from the grid, essentially treating the grid like a large battery. That can extend the 8 hours into full 24-hour coverage with enough panels.
MoE is fine. You can put the shared weights on the 5090 (will fit handily even for the largest models) and expert weights on CPU, possibly with weights offload from storage.
That's really only "useless" if the only thing you care about is a quick real-time response. Contrary to common perception, MoE models do benefit from batching requests together even when run on a single node, you just have to ensure you have at least ~5 parallel requests in flight (and that's for the very sparsest models) to really see the aggregate benefit.
(Intuitively, that's because the issue of whether any active weights are being shared among requests - thus, any memory throughput is being reused - is a generalized birthday problem. That's why even having a few parallel requests is quite effective. Especially since the "random" choice of experts happens anew at any single layer, so there's a lot of independent samples.)
You don't need "very much" expert overlap to see aggregate gains at scale, you just need some of it; that's where the "birthday" framing becomes relevant. Memory for context is an issue, but recent models like DeepSeek V4 use very little of it even at relatively large contexts.
>You don't need "very much" expert overlap to see aggregate gains at scale, you just need some of it
I'm not sure what you are claiming. Decode is bottlenecked by memory bandwidth. To see a speedup of 2x, you have to ensure each expert weight memory fetch can be used by 2 parallel streams. What exactly is the average factor you are claiming for 5x parallel streams (due to "birthday paradox" factors)? The birthday paradox isn't really relevant here. It's about coverage, not parallelism.
> Memory for context is an issue, but recent models like DeepSeek V4 use very little of it even at relatively large contexts.
An aggregate speedup of 2x is a lot, we don't need that in a local context. Local hardware is heavily constrained by power and thermals, not just bandwidth; so all we really care about is raising compute intensity for decode a little bit to relax the memory bandwidth constraint. The average factor will depend on just how sparse the model is and how far you can push parallelism, there isn't just one single answer.
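Since the disagreement above is about the average reuse factor, here is a rough sketch of how one could estimate it. It assumes each request picks its `k` active experts uniformly and independently out of `E` per layer, which real routers do not do exactly, and the 256/8 configuration is a hypothetical example rather than any particular model.

```python
def expected_distinct_experts(num_experts, active_per_token, batch):
    """Expected number of distinct experts touched in one layer.

    Under uniform, independent routing, a given expert is skipped by one
    request with probability (1 - k/E), so it is skipped by all of them
    with probability (1 - k/E) ** batch.
    """
    p_skip_all = (1.0 - active_per_token / num_experts) ** batch
    return num_experts * (1.0 - p_skip_all)


def reuse_factor(num_experts, active_per_token, batch):
    """Expert activations served per expert actually fetched from memory."""
    fetched = expected_distinct_experts(num_experts, active_per_token, batch)
    return batch * active_per_token / fetched


# Hypothetical sparse layer: 256 routed experts, 8 active per token.
for batch in (1, 5, 32, 128):
    print(batch, round(reuse_factor(256, 8, batch), 3))
```

A factor of 1.0 means no sharing at all. The same function can be pointed at a denser or sparser configuration to see how quickly the factor moves with batch size, and shared (always-active) weights get reused by every request regardless.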
Which surely is the highest it'll ever be! You're suggesting that the price will go down in the future? Would love to hear more about your thought process!
Are you saying we're entering a period where tech increases in price instead of decreases? I guess it depends upon time horizon, but your statement isn't very specific.
There was only a very brief time it was selling for MSRP (last fall for $2000). Even if you use that as the previous data point, it's only 200% increased.
I find it deeply ironic that Iran has blocked helium supply while it relies on AI-created slopaganda to subvert its adversary. It's one of those afterwits of history.
In the long run cloud gaming is inevitable; it’s just more economically efficient for the cost of the hardware required to render graphics to be amortized across consumers, rather than sitting idle when unused, by colocating it with game assets in POPs.
Once enough gaming compute runs at the edge it also allows for more technically advanced games than would currently be economically feasible (but aren’t made mostly for lack of a market/adoption of cloud gaming and the resulting lack of technical know-how). So I think it will stick and probably end up winning over the holdouts, once the cost of rendering the games they want to play with consumer hardware becomes too large to stomach.
You could make the same economic argument for any SaaS, but the margins SaaS providers look for make it so that the only time it isn't cheaper to run your own software/hardware stack in place of SaaS is when the hardware requirements are very low, not high. SaaS makes sense economically when you take into account the admin, compliance, etc. costs... and the admin costs of a Nintendo Switch are pretty low.
Economic efficiency does not win the day because the free market is a myth. Cloud gaming is a technically worse solution because the latency floor is higher. It's a microeconomic disaster (rent vs buy, buy wins). The only reason it would become a thing is if the multinationals succeed in concentrating more wealth and power, which consumers aren't interested in supporting. It's a bad deal and consumers know it. They would have to be forced into it by having the consumer hardware market taken off the table (which is happening and the only possible avenue for a technical regression like cloud gaming to have a market).
Do we though? DLSS 5 changes that somewhat from a “we need powah” to “we need models”. I think the future consumer GPU market will be tuned for image and world inference while workstation cards will be tuned for image and video inference. The old way of thinking about this will come to an end when we stop looking at the render loop as the be-all-end-all…
From my point of view, I suppose we will enter a "Let AI generate entertainment" era. In which you just might rent everything, including games. No need for a beefy computer at home, you just need a slim endpoint:
"Order yours now, for just $99.99 per month, hardware included! Order today, and you will get three months of 'Office Suite' for free, with a small additional cost of $49.99 after month 4. On a tight budget? Switch to the yearly subscription, and pay comfortably in 18 installments."
If DLSS 5 becomes the norm it's possible that just makes things worse. The DLSS 5 demos required an entire separate card to run the model, though IIRC NVIDIA did claim it would eventually work on a single card. Given what the model is doing (yassifying the whole scene instead of just upscaling/reconstructing) it makes sense to me that it would increase compute demand instead of reduce it like previous versions of DLSS.
The demos did, but look how far we have come in just two years. Running local LLMs, running local diffusion models, running local world models (albeit, barely a scene at this point). I do believe that in 10 years' time, games will be producing latents and not events the way they do now. I also hope this means that VR can finally get the fidelity it needs to really take off.
It's still unclear to me: the shortage is semiconductor boules / wafers? or the shortage is semiconductor fab process step availability?
As long as the discussion seems focused on memory, I'd suspect the latter, but if it's really the semiconductor boules/wafers, then I'd expect the boule growers to profit, not the memory makers, who just pass on the cost.
It’s fab capacity. FWIW, DRAM is different enough that fabs are not transferable between DRAM and other uses. It’s nice to think ‘wow, if they made the current 10nm DRAM on the latest 2nm processes it’d be much faster’ but it doesn’t work that way. The specific size is needed for the capacitance. SRAM can be made in fabs that make other circuitry since it’s transistor-based rather than capacitor-based, but it is less dense.
I asked for evidence because different people keep feeding me opposite stories: one insists it's not fab capacity but wafer competition, with a recent article claiming HBM3E takes 3 times as much wafer area per bit as LPDDR5X. Others tell me the complete opposite: it's fab capacity, not wafer shortage.
Do we have citable references to ground either set of claims?
I believe those are two ways of describing the same thing. If you're able to book some fab capacity, that means you get to decide what the fab does with the next wafers in the queue.
From your sibling comment, I think you're interpreting the 3x HBM stat as contributing to making wafers scarce. It's more that the next wafer to be processed in a fab is especially precious, making the opportunity cost larger. The beach sand remains plentiful.
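To make the opportunity-cost point concrete, here is a back-of-the-envelope sketch. It takes the 3x wafer-area-per-bit figure from the article mentioned upthread at face value, and the 30% reallocation is a hypothetical number chosen only for illustration.

```python
def relative_bit_output(hbm_wafer_share, hbm_area_penalty=3.0):
    """Total bits shipped, relative to an all-conventional-DRAM baseline.

    hbm_wafer_share:  fraction of a fixed number of wafer starts moved to HBM.
    hbm_area_penalty: wafer area per bit for HBM relative to conventional
                      DRAM (3.0 is the figure claimed in the thread).
    """
    conventional_bits = 1.0 - hbm_wafer_share
    hbm_bits = hbm_wafer_share / hbm_area_penalty
    return conventional_bits + hbm_bits


# Hypothetical: 30% of wafer starts are switched to HBM.
share = 0.30
print(f"total bits vs baseline: {relative_bit_output(share):.2f}")
print(f"conventional DRAM bits vs baseline: {1.0 - share:.2f}")
```

Nothing here requires raw wafers to be scarce. With wafer starts held fixed, every wafer moved to HBM removes more conventional bits than it adds HBM bits, which is the sense in which the two descriptions above are the same thing.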
And that article is contradicting other voices. If that article were correctly identifying the bottleneck as a wafer shortage due to switching to HBM, why is everybody discussing the memory makers instead of the boule growers? Memory makers can expand operations all they want, but that makes no sense if wafer supply doesn't follow, and the article is suspiciously light on semiconductor boule/wafer manufacturers.
So which is the bottleneck: fabs or boule growing?
also consider how most solar panels are monocrystalline silicon, how credible is silicon wafer shortage ... really? there is so much disinformation in this market...
This covers it pretty well: https://news.ycombinator.com/item?id=48229319. TL;DR: memory for AI uses more wafers from the same production line as other memory and is more profitable, and building a new fab has historically been very risky for companies. The companies have cut production of other memory to favor memory for AI, and the market for memory for AI is still unfulfilled, so prices still go up for customers of every type.
Regardless of the specific mechanics of the bottleneck, we know what the proximate source of the problem is: OpenAI locking up 40% of Samsung and SK Hynix wafer capacity for the next few years. That's what triggered the madness.
Is there an understanding of what OpenAI intends to do with that memory?
Surely they need GPU capacity and would need memory for those GPUs but OpenAI doesn't build GPUs or any hardware, right? So did they pay to keep the supply locked up, or do they have the ability to put that ram into use?
I guess they could have a thousand GPUs each generate the next 20 microseconds in computer games, and play at 50 kHz frame rates, in order to truly eliminate motion blur regardless of what in-game object motion your eyes are tracking.
Good time to focus on more memory efficient means of training and inference.
SeedLM from Apple is an interesting approach for inference memory efficiency. I'd like to see someone try and build that into training so that it's not a post training compression step.
for the most part, unless soldered down, it has been hard to find higher than dual channel (maybe quad for a massive odm gaming laptop). each stick and platform having set maximum memory capacity has put a glass ceiling for those machines.
doesn't matter anyway when things are not reasonably priced. i am stuck at the same memory capacity in my personal system for the better part of two decades, partially due to the above and the current pricing today.
And the max storage in pre-built computers has stagnated at 2010 levels (~1TB). This was first due to the switch to the much more expensive and much faster charge trap flash. In the 2020s it finally started to approach 2010 sizes in pre-builts but then the corporate finance wars re: fab capacity happened.
Here’s the thing, what if memory manufacturers take this opportunity to collude and basically never reduce the price of memory below the current levels since it’s too hard for a new competitor to just rise up and undercut them? Everything I hear about is how hard and risky it is to spin up a new fab.
And by doing this, they ensure local LLMs never become feasible for the vast majority of people and AI companies solidify subscriptions forever.
Keeping prices at this level is precisely how one or more competitors will rise up. Making memory isn’t super hard. That’s why it is a commodity. The problem with the memory market is that up and down cycles have bankrupted the vast majority of players in the past. Now we only have 3 players left except for a few smaller ones in China.
The reason memory prices can stay high for years in this mega cycle is because the 3 players will be very cautious on overbuilding. They’d rather under build, make great profit (not maximum) and reduce the risk of going bust if this suddenly ends.
Same for TSMC in chips.
Great opportunity for Chinese companies though. This shortage is exactly what Chinese companies need to scale.
When Samsung had to sell memory at a loss after COVID, no one came to save them. They buffered their memory division using profits from their other businesses. That’s how Samsung survives memory downturns.
According to some stories, this is how Samsung convinced TSMC to not enter the memory business - that you need a nation or other lines of business to prevent bankruptcies.
You’re confusing two independent things. There are simple processes that are extremely capital intensive with long lead times and then there are complex processes that require lots of R&D and industry secrets. Memory is the former in the chip world.
Other examples from outside of tech of easy but capital intensive processes are power generation and railroads. Very easy to do, but easy to end up broken by overbuilding for demand that fails to materialize or stay stable for the duration of your financing.
If they collude to, say, make the price $1000 for a component that costs them $100 (including opportunity costs), then either a new company or a greedy company in the collusion can secretly make their price $900 and get massively more profit.
Right now their opportunity cost is too high.
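A toy version of that defection argument, as a sketch: the $1000 / $900 / $100 figures come from the comment above, while the unit volumes are hypothetical and only there to show the shape of the incentive.

```python
def profit(price, unit_cost, units_sold):
    """Gross profit for one seller at a given price and volume."""
    return (price - unit_cost) * units_sold


UNIT_COST = 100  # from the comment above, opportunity cost included

# Hypothetical volumes: 100 units at the cartel price, 200 units after
# quietly undercutting and taking share from the other members.
stay_in_cartel = profit(1000, UNIT_COST, 100)
secretly_undercut = profit(900, UNIT_COST, 200)

print(stay_in_cartel)     # 90000
print(secretly_undercut)  # 160000
```

The undercutter only comes out ahead if it can actually ship the extra units, so this works when there is spare capacity. When every wafer is already spoken for, there is nothing extra to sell at $900.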
> risky it is to spin up a new fab
You don't need a new fab. You can build memory in a 20-year-old fab.
They will respond when people are loud enough. If memory stays at $1200 for 128GB for years and investigative journalists say it could be collusion, enough people will make enough noise.
I’m sure Nvidia, Elon, Tim Cook, OpenAI, Anthropic are already whispering in Trump’s ears to do something.
Corrupt doesn’t mean “acts without incentives”. If you assume a corrupt system, then the inputs are going to be who has influence over the DOJ. If there is more money to be made by breaking a cartel, then they would absolutely do it.
How can I use this information to MY advantage? Do I start going into something to do with AI chip memory stuff? If so, how? But just on a software level, 'cause hardware is hard.
Nine years after Google's seminal paper lit the fuse on AI, a total lack of manufacturing foresight has trapped over a trillion dollars of incoming capital in a hardware bottleneck.
The entire sector is now facing a critical RAM starvation crisis where memory manufacturers are actively slow-rolling supply just to keep prices high and avoid running out entirely.
This has created an unprecedented supply-and-demand distortion where desperate companies are getting rejected even at a 5x markup, and mission-critical SKUs are skyrocketing to 10x and 20x their baseline value.
It is a macroeconomic squeeze at a staggering scale, and the massive venture scale opportunity lies in capturing the value created by this memory gatekeeper.
From the perspective of an armchair economist, the winners will be the investors who invest in RAM wisely. The losers will likely be cash-strapped SaaS companies. They’re almost completely dependent on a fleet of servers in the hyperscalers, and they’re leasing those servers and services. That leaves small SaaS companies exposed to incoming inflation in the cost of hosting.
Capex started exploding after COVID, with the chart going hockey stick at the end of '23/start of '24, almost 2.5 years ago.
A lot of capex is supposed to go into the datacentres. Didn't they know that datacentres need to be filled, among other stuff, with RAM? I wonder if at some point we will discover that there is a shortage of fibre optic cables or SFPs ...
PS: Obviously armchair economist here too ... but for me it doesn't seem too difficult to foresee the increase in demand.
Since memory is becoming an expensive commodity, I guess the old ways of being precious about the memory efficiency of your program (like running in the constrained 1MB of memory back then) are making a comeback.
I only feel sorrow for the Electron devs, they will have a hard time.
Since January, I've been lucky, picking up various used DDR4 memory sticks for cheap-ish. I got a total of 64 GB for $180. I feel like I hit the jackpot!
I think the companies that drive up the prices here need to pay an extra tax to all of us. I fail to see why I now have to pay more due to the AI monster companies ruining the economy.
it’s fun and ironic that “having a memory” is what AI appears to lack the most in practice while at the same time it demands more computer memory than anything to run
I heard Greg Brockman on a podcast saying they are limited by compute and memory. They have line of sight to solving many different kinds of problems. But they also have to survive in the meantime. Hence the focus on enterprise recently. They could just ask the government to fund them in other research areas.
You're probably thinking about Jevons paradox. But you slightly misstated it. It is the phenomenon that increasing the efficiency of resource consumption can end up increasing total consumption.
As you stated it, it would merely be a property of (nearly) all demand curves. Jevons paradox only happens sometimes. It isn't a law.
An example of where it stopped happening is with gasoline in developed countries. Cars having better fuel efficiency doesn’t make me drive further to the grocery store or work.
Generally when someone replaces their vehicle, the new one is more fuel efficient than the old one, even if they buy the same model.
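One way to see why Jevons paradox "only happens sometimes" is a small sketch under a constant-elasticity demand curve. That functional form is an assumption made here for illustration, and the elasticity values are hypothetical.

```python
def resource_use(efficiency, elasticity, resource_price=1.0, scale=1.0):
    """Resource consumed when demand for the *service* has constant elasticity.

    Cost per unit of service is resource_price / efficiency, demand for the
    service is scale * cost ** (-elasticity), and each unit of service
    burns 1 / efficiency units of the resource.
    """
    cost_per_service = resource_price / efficiency
    service_demand = scale * cost_per_service ** (-elasticity)
    return service_demand / efficiency


for elasticity in (0.5, 1.0, 1.5):
    before = resource_use(1.0, elasticity)
    after = resource_use(2.0, elasticity)  # efficiency doubles
    print(f"elasticity={elasticity}: resource use changes by a factor of {after / before:.2f}")
```

In this model resource use scales as efficiency ** (elasticity - 1), so total consumption only rises when demand for the service is elastic (elasticity above 1). The grocery-run example above is the inelastic case.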
An interesting implication of this is that AI inference and training has a path to a ~3x hardware cost reduction (and maybe ~2x total cost reduction) without any technical innovation whatsoever, we just need to wait for dram supply to meet demand (either by manufacturing scaling or just waiting for the current rate of manufacturing to fill the demand spike).
The memory makers will not expand supply drastically. It is in the nature of their business to keep the market under-supplied, otherwise the following oversupply will kill them. Instead, supply is just rerouted from less profitable segments such as mobile and personal computing.
China is about to flood the market and prove this notion wrong. If there is demand they want to meet it with supply.
But to your point, that is exactly how American companies like to play now. No one is stopping them from screwing over the consumer.
I have a Micron near me and they are building another chip facility but we are years away still so I suspect China will beat them to the punch.
Not just DRAM market, but the GPU market soon. China is the great equalizer of the world.
> Not just DRAM market, but the GPU market soon
China doesn't have EUV fabs... They've pushed DUV impressively far... but until they get EUV working industrially (and reasonable timelines are at least 2-4 years for that) it shouldn't be possible for them to compete for that market.
> China is the great equalizer of the world.
China is hardly an egalitarian society...
The future is here now, it’s just not evenly distributed. China will mass produce something to the point that it is widely distributed. That is how China acts as a great equalizer on a global scale.
Another way China is a great equalizer is their willingness to do business with anyone that can pay.
Isn't any other significant economy willing to do business with anyone that can pay?
No, most of the rest of the large developed economies have some standards (e.g. against buying conflict minerals) and sanctions against certain regimes. China is quite happy to ignore that if they can get away with it.
China is willing to do business while giving zero fucks about the environment they are destroying and the global warming they are causing. It really blows my mind people support the china thing so much around here.
China is deploying more renewables than most of the world, in some calculations outspending the rest of the world.
Chinese per-capita emissions have peaked at a lower level than the US's and are already falling.
According to whom?
Do you think that OpenAI or Google gives any fucks?
extremely hard to argue we give a fuck about the environment.
drop the bs
> but until they get EUV working industrially (and reasonable timelines are at least 2-4 years for that)
Does this not count as soon? How often do you buy new computers? That seems pretty soon. I remember a year or few ago being told it'd never happen so they're already infinity years ahead of schedule if we accept that as reasonable. The rate they're pulling ahead of expectations appears to be so sharp there is a risk they leapfrog EUV to go on to the next big thing.
How much does the node size matter for dram? My understanding was that it’s been marginal gains on sram since about 7nm TSMC. I would naively expect the capacitor size requirements not to shrink as well as logic, does the smaller transistor make up for the lower capacitance, or do they have to run at higher frequencies and refresh more frequently?
Not an egalitarian society, but their companies have a honey-badger-like mentality from what I have read, where they ruthlessly reduce costs and margins down past where non-Chinese companies can compete.
China is better than the USA in this regard though, and the only other superpower able to challenge the USA.
India is not even trying despite its size, and we as Germans do not push the EU as a union.
But they do build infrastructure, usable infrastructure. They have already built that railway all the way to Tehran, and once the war between Ukraine and Russia is over, they almost certainly will build high-speed rail all the way to Europe.
Wouldn’t it have been nice if the United States had built a rail system north to Alaska, or even a rail system south to Chile?
I guess doing things like that is hard when you’re busy fighting multiple wars since the early 1950s.
There are no egalitarian societies. Societies in the west favour the super rich and believing anything else is simply delusional. Sorry to burst the bubble.
To paraphrase Asimov:
If you think that South Africa is absolutely egalitarian, you're wrong.
If you think that Norway is absolutely egalitarian, you're also wrong.
But if you think that "South Africa is egalitarian" is as wrong as "Norway is egalitarian", then your views are more wrong than both of them combined.
To state that no country is absolutely egalitarian does not mean that "China is hardly egalitarian" has to be wrong. And even if some other country (say Norway) were to be as hierarchical as China, that would not disprove the claim that China is hardly egalitarian. It would just mean there exist other inegalitarian countries too.
No one says that China is an egalitarian society lmao. Where did you get the idea?
Not really sure if I'd describe what China has done to various manufacturing sectors as "equalizing".
The thing about China is that they will iterate and iterate some more until they get there, and once there, well, like BYD, they will disrupt the entire market. The cozy days of resting on their laurels are over for the American/Korean memory companies.
The big three memory makers will probably face their last big payday. I hope they enjoy it, as China will dominate the global memory market in three to five years thanks to the big three's short-term greed.
Apple will likely bring memory in-house, like they did with CPUs and GPUs. Anyone questioning the time it took to replace Intel and Qualcomm should consider the Chinese expansion in the memory market, which makes it a long-term necessity.
Apple has the money, and while its competitors have spent/squandered $1 trillion on the AI data center fiasco, Apple made a decision to stay away from the blast crater.
Meanwhile Apple, which also has the expertise in engineering and chip design, can do what is necessary and bring memory in-house. Note: Nvidia and Broadcom have also been replaced by Apple along the way.
Who knows maybe Intel will condescend to do memory too?
"short term greed"
There is a certain amount of capacity to produce memory. They are building new facilities but it takes a long time. They have been burned going down this route many times in the past (e.g., losing money, firms that are no longer in business).
What would you have them do instead?
Yeah, more global competition in DRAM would be great.
SK Hynix and Samsung are South Korean.
> SK Hynix and Samsung are South Korean.
The Korean memory makers are playing the same game as Micron and simply moving existing capacity up-market.
GP was referring to upstart Chinese memory manufacturers like ChangXin, who - if their yields manage to catch the wave - could not have asked for a more favorable market after the big 3 have abandoned the consumer segment. Consumers who would have otherwise turned up their noses at CXMT will not have the luxury.
Chinese manufacturers will probably take over the consumer RAM that most of us use as current manufacturing contracts expire and Samsung, SK Hynix and Micron move all their production to HBM for data centers. Corsair recently released DDR5 sticks based on Chinese chips.
> Corsair recently released DDR5 sticks based on Chinese chips.
hm interesting
https://www.tomshardware.com/pc-components/ddr5/chinese-memo...
I've so far been anti-chinese memory in my recommendations, not because it's Chinese (I don't really trust any big organisation/govt other than the EU?) but because they've been very new, and it's not worth PC stability for saving $50.
However with corsair giving it their blessing, and their technology having matured a bit (a lot?), and more reviews showing good stability (longevity I suppose, is TBD) they're definitely worth recommending these days.
Right, so exactly two countries in the world control most of the memory market (US and Korea). As the person above said, more global competition would be great.
Help us Xi Jinping, you're our only hope.
It’s a horrible thought. Really horrible. You should come to China and work in those factories and mines for some years by yourself.
You should work in the Central African Republic coltan mines if you think anyone has a leg to stand on.
I live in China, and I don't think you can just speak from imagination.
I manufacture/buy electronics in Shenzhen and have visited my factories. They seemed fine to me.
And are you going to enlighten us? What is life like working in a Chinese mine?
You should try to see how poor US Americans have to live.
Normal people who have to wait for an event where doctors provide free health care in a sports gym for a handful of days.
The Chinese healthcare system is not so different from that in the US. It is primarily employer-sponsored insurance, with subsidized insurance programs for poor and rural folk.
Up until 2005, roughly 10% of the population couldn't even access healthcare, at which point the PRC built out more care centers and invested in training more doctors, but there's still a significant shortage, such that scalpers sell outpatient appointment tickets for 10-15x markup over the actual appointment cost.
There's plenty of ways the two countries are different, but healthcare seems like an odd choice to try to "one-up" the US on, even if its programs like medicare, medicaid, social security disability and others still leave gaps.
That's not true though.
First off, even per capita, the USA is in 8th place while China is 74th.
For sure China has a problem, due to its gigantic size and population, in even being able to reach its people, but health care costs are nowhere near as high as in the USA. The USA is actually the country with the highest % of GDP spent on health care alone.
Just check out a YT video of a US American going to a normal Chinese hospital and then compare the bill.
And in parallel the USA is dismantling Medicare, Medicaid and co.
This is also directly reflected in life expectancy: US Americans are getting less old than Chinese people.
Rural China has a per capita GDP 1/10th of the US adjusting for purchasing power; if their health care wasn't cheaper only the very richest could afford anything at all. Even the wealthy coastal areas are 1/3 of the US.
China and the US have the same life expectancy of 79 years, which is a very recent phenomenon due to the 2005-2018 changes I mentioned earlier. Obesity, lack of exercise and other cultural factors weigh down the US life expectancy compared to all other Western nations. China's use of abortion during the one child policy era also prevented a lot of people who would have had chronic medical conditions and disabilities from being born.
It is not yet true, however, that Americans are getting "less old", though it may be soon, depending on how China manages its own growing obesity problem and tobacco use.
So your main argument is, that the US American health care system is good and has to be so expensive because people can afford to pay more?!
Feel free to use Europe then, or Germany. We pay less and more people are better off than US Americans.
US Americans pay 2.5 times more than the avg high income nation: https://www.commonwealthfund.org/international-health-policy...
Where is the benefit of paying that much more if you are not getting older than others?
In a thread about China, I replied to a post about how American healthcare is bad.
My main argument, which is in my first comment, is that healthcare is a bad way to show the difference between China and the US, since they actually have a lot of similarities, especially with access at the lower end of the income spectrum.
There's literally no reason to bring other countries into the conversation other than to say "US is bad", which does nothing to change the reality of healthcare access in China.
Why would I, as an American engineer and user of tech hardware from China for quite some time now, need to immerse myself in the Chinese factories as if they are somehow worse than other ones throughout the world?
Thanks, please give my regards to Kash Patel.
You just cannot see your privileges.
It's not that I cannot see them; it's that I don't care. And I certainly don't entertain sanctimonious "le China bad" nonsense like that. You live in a country where every 5 years has been substantially better than the previous 5 for a while now. You don't understand your own "privileges" if you can't understand what it's like to live in a country on a free fall that has foreclosed on the possibility of ever building anything again unless it's the latest investor ponzi scheme.
China can afford and has the political will and power to centrally plan parts of the economy it feels like planning. Cars are obvious examples and if dram is next, western manufacturers should brace for impact.
China wants a sovereign DRAM capacity. They're playing an entirely different game than the commercial suppliers in the West.
They also want other countries to not have sovereign DRAM capacity.
And those commercial suppliers are making it too easy and putting themselves out of business before 2030.
They’ll also go out of business if they make a massive investment to increase supply, and then the “AI” bubble pops, cratering demand. It’s a tough spot.
all that matters is next quarter.
They want both.
man i keep thinking. why cant india get into stuff like this. Do their own manhattan project to build factories and tech for this and immigrate experts with high salaries.
During the Mao/Cultural Revolution era, for all the bad, two good things were a great focus on K-12 education and a reduction in religious fundamentalism (curbing religious powerhouses). The lack of both is now biting India, along with the standard problems of corruption that plague most poor democracies.
Both issues also plague Brazil.
At least we have the CIA to blame on religious fundamentalism.
As an Indian, my quality of life would be improved more by the Indian state first figuring out how to make functional roads and garbage collection systems
India needs to first figure out the absolute basics
The way you pay for it is by producing goods that other countries want to buy.
Tata and ASML recently signed an agreement to build a fab in India: https://www.reuters.com/world/india/tata-electronics-asml-pa...
The Indian Government is heavily pushing for domestic capabilities.
To understand why India failed to replicate the Chinese or East Asian model, I recommend A Sixth of Humanity by Devesh Kapur and Arvind Subramanian.
India doesn't have the coordination China does
I am typing this on my return flight home from a business trip in Delhi. There are many other areas the Indian government needs to be focussing on first.
I had a similar view to you ~2 weeks ago. Spending some time there very quickly made me realize that there’s a lot of other things that are much more pressing.
such as ?????
I would argue that they are held back by their own culture. Hygiene, safety, etc.
The highly educated want to move away or distance themselves from the poor and the dirty, or there is simply caste separation.
If you have the feeling that certain things are not your problem, you are not rising.
Thanks to Indian constitution and other laws, incompetence is heavily praised and promoted. Go to any public high/elementary schools, go to any private colleges, it is rampant. It has taken three generations to produce these bad results. Of course, I am not saying that students are dumb. It is just that many smart kids in villages if given an opportunity, will leave India in a heart beat because they don't see a bright future for their future kids.
Indian bureaucracy thinks extremely short term. The state prioritizes extracting revenue over all else. So many baffling short-term decisions over everything from corporate tax breaks to incentives for global events like Formula 1.
You really can’t expect the same bureaucratic setup to think in terms of the decade+ it would take to be competent at something like chips.
Indian bureaucracy exists to enrich themselves for their next ten generations. The bureaucracy itself promotes, recruits incompetents. There is no way to reform such a bureaucracy.
The whole system needs to be dismantled while an alternative system gets built; given the nature of Indian politics (freebies/jobs/reservations to certain groups; monthly stipends to certain groups by borrowing money while at the same time looting public funds), it is impossible.
I suspect Chinese factories will get built first, but quality may take a few years to really nail down.
Basically:
China floods the market with cheaper but less QA'd parts, makes a gazillion dollars, is able to spend said money to fix yields / QA issues and streamline operations, by the time that happens Micron and maybe a few other existing players will have new memory production, and then we'll have a flood of cheap, reliable memory. 4yr, maybe?
They're doing decent enough already for consumer electronics. Corsair is selling 16GB 6000MT/s CL36 DDR5 sticks in China using memory from CXMT: https://www.tomshardware.com/pc-components/ddr5/chinese-memo...
How long would it take an aggressive company to expand production capacity? I always thought it takes a few years, at minimum, for even established players to stand up new fabs
As far as I can tell, Micron and SK Hynix are using EUV lithography and may be constrained by availability of the equipment, whereas CXMT does not have EUV machines. There were reports that EUV lithography is needed for high yields, but CXMT appears to be proving that wrong.
Micron was dragging its heels on EUV and only got it last year[1].
Seems it's mostly useful for LPDDR modules which are predominantly used in battery-powered devices and to improve margins.
[1]: https://www.tomshardware.com/pc-components/dram/micron-sampl...
LPDDR is used by the Nvidia Rubin platform. I can see AMD using LPDDR as well because it gives you denser memory at the same or reduced power budgets.
I guess we can't buy these yet. I've been thinking about upgrading the RAM in my laptop but it's like half the price of a new laptop lol
It is not a law of nature that Chinese products are lower quality (cf. electric cars) and I don't see why they would go for that. They can just bin what they produce like everyone else and sell their products for what they have been tested to deliver.
But it is a near law that first-to-market attempts will fully embrace the deeply ingrained culture of 差不多, until market forces beat it out of the product line.
That's no different from the Silicon Valley mindset of cashing out and jumping ship.
What is 差不多?
Chabuduo, basically "good enough" (but often not really). Classic essay on the topic:
https://aeon.co/essays/what-chinese-corner-cutting-reveals-a...
Basically means, “good enough” attitude.
Not "good enough", but rather "close enough". Very different connotations.
Close enough in which sense?
This has nothing to do with nationality; it has everything to do with building and running a brand new, highly technical, mass production facility.
The West absolutely loves enshittified products. So why not sell them what they want? If they wanted quality they would pay slightly more and do something about it.
It's pretty sad; it used to not be so bad in the US. This is what happens when ethics and morals are removed from the culture.
And historical record of the lack of QA coming from Chinese manufacturing
Because we buy that stuff even without it. And if you make both good and crappy products, why sell the good stuff internationally?
The US did it when it was a bigger steel supplier, good steel was sold domestically, crappy steel was sold elsewhere. If you got crappy steel in Africa at the time you might have thought US steel was garbage with poor QA, but in reality US steel was great and they just shipped the crappy stuff because people still kept buying it.
>And historical record of the lack of QA coming from Chinese manufacturing
My endlessly excellent Chinese gear (Dahua cameras, XikeStor switches, etc) doesn't know what you're referring to.
China is a gigantic country, home to one in six humans, that produces, directly or indirectly, 70%+ of the world's goods.
It's quite difficult to make general statements at such a gargantuan scale encompassing every single sector.
China has an abundance of terrific QA in electronics and advanced technologies as much as it has an abundance of the opposite, just simply due to its sheer size.
Mapping around defects in RAM has been a viable thing for a really long time.
I remember reading about it in Linux contexts decades ago, and these days it's something that Windows does automatically.
When can I expect this flood of cheaper RAM with less QA? I'd like to contribute to the gazillion dollar pile as soon as possible.
Don't think they'll flood the market. Instead gov will subsidize entire vertical (gpu, memory and power) - you'll just buy deepseek tokens on the cheap, just like EVs, solar and batteries. In return you'll give away your data.
>China is about to flood the market and prove this notion wrong.
China is very far away from flooding the DRAM market.
China does not have sufficiently leading edge fab capabilities to fill DRAM demand.
Yet...
This is wrong. It is NOT in their nature to keep the market under-supplied -- eg, Samsung, the industry's largest company, was notorious for expanding their capacity during the industry downturn to gain market share while everyone else was cutting back to minimize loss.
I'm guessing you are also probably unfamiliar with the terms like "chicken game" which refers to the cutthroat, high-stakes price wars where dominant semiconductor manufacturers intentionally overproduce and slash prices. This is literally how the industry went from dozens to just three majors today since the 80's.
You're making the point for him. Undersupply in a boom, store cash to ramp up capacity in a downturn. It preserves capital and avoids overcapacity at the turn of the cycle.
This sounds like a plan to sell less when prices are high and more when prices are low. That is one of the stupidest strategies a company could adopt. I assure you, the RAM makers are pumping out as much as they can and increasing capacity as fast as they think the market can handle.
I'm not sure what world we live in when the scheming capitalists are all hunched around their table working out how to dodge selling their products into an enormous price boom. Do they not like money all of a sudden?
Building new capacity takes years. The idea is that the market is reliably cyclical, so you should expand when there is a downturn, when costs are low and you can afford the short-term capacity hits that expansion causes (e.g. when you divide productive teams in two and fill both halves to full strength with new hires).
That works when there are a dozen suppliers. It does not when there are three.
Sure, but the key word here is "was"
The industry is so naturally prone to oversupply that the only stable equilibrium is undersupply. Aggressive expansion kicks off a price war, which immediately undercuts the logic of the expansion.
This only changes with new entrants, which will come, especially from China. But it takes time to build fab capacity, so the medium-term modal outcome is consistent undersupply.
If the existing memory makers retain control of the market and don't defect from the long-term equilibrium that is optimal for themselves, that's true. It just takes one player to defect for short-term gains, as we've seen with some past boom-and-bust cycles. Alternatively, it takes a sufficiently resourced player with enough incentive to enter the market themselves (NVidia, Google, Amazon, the PRC government through one of many companies...)
Relevant article posted on HN about this a few days ago: https://davidoks.blog/p/ai-is-killing-the-cheap-smartphone
I struggle to think of a line of business as cyclical as DRAM, maybe like certain kinds of mining would be my only thought.
The DRAM fabs have been on a roundabout for 40 years going from getting accused of price fixing and cartel behavior, to struggling to keep the lights on.
And imo it's not really their fault, it's all the lead time of advanced semiconductors, combined with the commodity dynamics of oil. And the goal is to match that supply to the demand of everything from consumer electronics to more datacenters than you can shake a stick at.
It's maddening to try and solve that, so at this point I really don't fault them for prioritizing survival.
> from getting accused of price fixing and cartel behavior
"Accused" makes it sound like these things may still be up in the air, when they very much are not. I would choose instead the much clearer "A number of those involved in DRAM production have a proven history of cartel behavior and price fixing."
For those who may not be familiar with some of the history in this area:
https://en.wikipedia.org/wiki/DRAM_price_fixing_scandal
I said accused mainly because the big 3 won their last antitrust suit in the US, sort of "what have you caught me for, lately?" approach.
For all I know, maybe they are dumb enough to try and actually coordinate again, my hunch would be no, or they've tried something new and inventive. Like Matt Levine talked about how so many landlords were using the same software to set prices, that one was pretty shady.
But it is interesting where it is popping up at the moment, like power transformers is another area. These companies have lived through these cycles before, and know there is no one to save them if they overleverage and get it wrong.
What you described only works if the manufacturers agree to price fix. Otherwise, in a free market, they'll race to increase their earnings by meeting the demand.
Reminds me of how Samsung is giving out $340,000 per person bonuses. Shows you how much of a stronghold they have in market.
They did that to avoid losing even more money in a strike, not because they wanted to.
No company ever wants to give out big bonuses, but it's only 10% of their profits. So it still shows the scale of the money they're making right now.
I think you're probably referring to SK Hynix. Samsung's situation was more about dealing with the fallout from the labor strike.
CXMT is scaling up incredibly fast. They (the South Koreans) are on a clock; their monopoly will end relatively soon, although I'm guessing that the AI companies will crash before that anyway.
> their monopoly will end relatively soon
Corsair DDR5 DIMM modules with CXMT RAM started appearing on Friday.
Supply and demand always balance out. There is no way manufacturers aren’t going to compete away these inflated margins, as long as they feel like this demand is sustainable.
You know there's other strategies? Companies can be more clever than naively undercutting each other...
Memory in particular ... https://en.wikipedia.org/wiki/DRAM_price_fixing_scandal
The entry-cost to getting into memory is on the order of $billions and years - you can do just about anything...
not if china gets into the picture
why not? i'm sure they can jump into the hustle.
Increasing the availability doesn't mean decreasing the price ... people think those are intrinsically related - not so much.
You can get a Prada shirt for $2,000 ... as many as you'd like, for $2,000 apiece. No problem. They'll make the factories go burr all night long. Still $2,000.
There's a bunch of things like this. $100 bills for instance ...
a new entrant might yield a price drop, or, it might not.
Only in the most naive sense.
If it costs you $1B and five years to build out new supply and you think demand will not sustain for more than three years, it does not make sense to expand supply.
Instead you will maintain your margins currently and await demand to decrease back to your current supply.
This is pretty common and as others have pointed out is even more common in markets where competition is slow and lead times are long.
Ammunition is a great example over the last decade or so as political turnover caused relatively short lived demand spikes and manufacturers didn't expand supply because they knew once political winds shift, demand would decrease.
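To put rough numbers on the "$1B and five years vs. a three-year boom" point above, here is a minimal sketch. Only the $1B capex, the five-year build and the three-year boom come from the comment; the per-year profit figures, the ten-year horizon and the function name `expansion_payoff` are made-up assumptions, and discounting is ignored.

```python
def expansion_payoff(capex, build_years, boom_years,
                     boom_profit_per_year, bust_profit_per_year,
                     horizon_years):
    """Undiscounted payoff of adding capacity, in the same units as capex.

    The new capacity earns nothing while it is being built. Once online it
    earns the boom profit while the boom lasts and the bust profit afterwards.
    All profit figures are hypothetical inputs, not industry data.
    """
    total = -capex
    for year in range(horizon_years):
        if year < build_years:
            continue  # still under construction, no output yet
        if year < boom_years:
            total += boom_profit_per_year
        else:
            total += bust_profit_per_year
    return total


# $1B capex (in $M), 5-year build, boom ends after 3 years: capacity arrives
# after the boom is over, so it only ever earns thin bust-era profits.
short_boom = expansion_payoff(1000, 5, 3, 600, 50, 10)

# Same plant, but the boom lasts 8 years: three boom years are captured.
long_boom = expansion_payoff(1000, 5, 8, 600, 50, 10)
```

With these assumed inputs the short-boom case never recovers its capex over the horizon, while the long-boom case does, which is the whole bet: the decision hinges on whether demand outlasts the build time, not on how high prices are today.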
...which is presumably why GP said "as long as they feel like this demand is sustainable."
There's very few manufacturers, I believe 3 globally? And there's a large moat. Nobody can compete with them in the next 10 years. It's really not hard to coordinate action between 3 companies.
There are trillions to be made. That moat won't be as insurmountable in hindsight.
There used to be over 50 memory manufacturers in the US alone. Every time there was a bust (following a boom) there'd be bankruptcies. The lucky ones got bought out and consolidated. Empirically, attempting to capitalize on memory booms is a losing strategy.
There really aren't though. The reason there's only three is because memory is a commodity and margins are historically very low. It's not a very good business to be in, generally.
In the past when memory supply was short and then rebounded, many companies went out of business because making memory was no longer profitable.
And margins will continue to be low, otherwise they'll discover they don't have a moat. Commodity markets being competitive is a self fulfilling prophecy.
The companies have two choices. They either produce RAM cheaply and in large quantities, or they get replaced by someone who will produce RAM cheaply and in large quantities. Current incumbents are free to pick which of those two scenarios they prefer.
Apple could always decide to build their own fab or some such thing.
That’s not the Apple way, but they might fund a supplier to build out capacity in return for priority access.
The thing is they tend to only do that when they can get a technological competitive advantage. The priority access gives them a locked in competitive edge, for a while. It’s not clear there is an opportunity like that in memory.
It wasn't their way to design CPUs until it was their way.
Apple doesn't want to enter low margin business
Designing and producing are separate
If you factor in Nvidia’s profit margin due to the scarcity of the current bleeding-edge chips there is a path to a much larger cost reduction still.
There’s a lot to criticize Sam Altman for saying or popularizing culturally but I’ve come to think his “this is the worst it will ever be” is, in the long run, actually a very intriguing and underrated point.
In a decade, training LLMs to the current level of sophistication, which is in my opinion rather advanced and probably has lots of additional upside just from constructing better RL training regimes independently of hardware advancement, will become just as much table stakes as running a database is now. I highly recommend everyone look into the Allen Institute’s projects on GitHub and HF, because they have open source training materials (including an LLM from scratch off Common Crawl, and some quite interesting tunes of Qwen) to get a taste for what will in the near future be afternoon projects or educational material. The future is going to be wild.
These crazy hardware price increases will probably delay everything by at least 2-5 years. Then add at least 5-10 years for all these refinements and optimizations to permeate universally.
Until everything matures, most likely the current iteration of OpenAI and Anthropic will be long gone, along with their current business models.
This line of thinking makes sense if we're talking about opex like power usage. This is capex though and we'll be financing this overpaying for a long time after the hardware has "aged out". Not really sure there is an upside to it.
Also, inference cost predictions were made before this price jump, so we really haven't started paying for it yet. Inference will not be getting cheaper.
It sure looks like Sam Altman's masterful gambit to corner the memory market has had unforeseen consequences.
Is any of this actually unforeseen? Buying the vast majority of the world’s supply of something does have mostly predictable consequences.
Yes, but that's not what they are insinuating.
“Unforeseen consequences” in the same way death of the target is when someone aims a loaded gun at their head and pulls the trigger.
What demand? Can't shake the notion that it's fictive, considering the number of data centers being built and GPUs sitting in containers, where they will spend quite some time before even being integrated, and even longer before being used...
Really wondering what this might mean for local LLMs when RAM costs plummet...
What’s the lifespan/refurbishability of the capex elements like the “GPU” modules or even the DRAM soldered into them?
For lifespan, AWS is still running a ton of T4 GPUs from 2018, that power a lot of computer vision models. A ton of these will have a long life, not all ML is about frontier LLMs.
How can it be economically viable to still run them?
You can get 100x the output with the same energy use.
While the 100× is, I think, rather hyperbolic, there is a real and large efficiency difference. But it's economically viable to run them because the supply of newer GPUs is insufficient to meet the demand for compute, so they can charge enough to cover costs for the old ones and a premium (relative to operating costs) for the newer ones.
It would be economically unviable to run the older ones if the supply of newer ones were unconstrained, but that’s not the world we live in.
As long as you have customers that are willing to pay more than it costs you, you are fine. And with AWS there are seemingly plenty of those. So the question isn't whether this is the most efficient way, but whether someone will pay a price above what new hardware could attain.
Going by the stats on wikipedia, T4 and B300 both do about one teraflop of half-precision math per watt? Where are the efficiency gains?
Edit: It looks like they replaced INT8 and INT4 with FP8 and FP4, with the same speedups of 2x and 4x relative to FP16. That's an improvement but not that big of an improvement.
Presumably people using AWS are paying more than they cost to run, and AWS has finite bandwidth to upgrade things due to personnel, etc.
Good question!
Maybe the capabilities of newer GPUs allow AWS to charge higher margins for them? I don't actually know.
There has not been a "100x" in efficiency in the past 6-8 years.
I wonder if we will see an adoption of alternative floating point formats. IEEE floats are notoriously terrible at lower widths (<= 16 bits). Floating point formats such as posits do much better at 16 or 8 bits. If you could train at 16 bits per value instead of 32, and suffer a much smaller inaccuracy penalty than you would from IEEE32 to IEEE16...
That's already the case with say bf16
Notoriously terrible?
Posits do a little better if your numbers are biased enough toward 1, but not much better. A 16 bit posit in a near-ideal situation matches an 18 bit IEEE float, and in a pretty wide range of situations loses to either fp16 or bf16.
Training anything at 8 bits is going to be tough, and it's hard to say if the flexible exponent is worth the precision tradeoffs.
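For anyone who wants to see the fp16 vs bf16 tradeoff mentioned above in concrete terms, here is a small sketch using only the Python standard library. It does not model posits at all. The helper names `to_fp16` and `to_bf16` are made up for this example, and the bf16 conversion is emulated by truncating a float32 to its top 16 bits (round toward zero), which is an assumption and not how every piece of hardware rounds.

```python
import math
import struct


def to_fp16(x):
    """Round-trip x through IEEE 754 binary16 (struct format 'e').

    binary16 has 5 exponent bits and 10 fraction bits. Values too large
    to represent are mapped to infinity here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def to_bf16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of a float32.

    bfloat16 has 8 exponent bits (same range as float32) and 7 fraction
    bits. Truncation is a simplification of real rounding behavior.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# Precision near 1.0: fp16 keeps the small offset, bf16 loses it.
fp16_small = to_fp16(1.001)    # 1.0009765625
bf16_small = to_bf16(1.001)    # 1.0

# Range: fp16 tops out around 65504, bf16 keeps float32's exponent range.
fp16_large = to_fp16(70000.0)  # inf
bf16_large = to_bf16(70000.0)  # 69632.0
```

So fp16 spends its bits on precision and bf16 spends them on range, which is why "which 16-bit format is better" depends on what the values being stored look like.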
This has been around for quite some time, to the point I had to read this a couple times to understand what you meant. Mighta predated LLMs even.
For some reason I still haven't heard any predictions on when new fabs will come online to meet the current demand. This shouldn't be too hard to find out, since the building time of fabs is a very predictable process.
The difficult question is more whether foreseeable memory demand will remain at the current level, grow even further, or shrink again.
It's very easy to find out when the new fabs come online. Try asking Claude or ChatGPT.
They say around 2028. See https://manufacturing.economictimes.indiatimes.com/news/hi-t...
Original source (paid): https://asia.nikkei.com/business/tech/semiconductors/memory-...
No new DRAM fabs are being built. That's why you don't see any predictions.
Seems unlikely. Increased demand usually causes increased investment in increasing supply.
Not necessarily in a notoriously cyclical industry where everyone has already been burned by doing exactly that multiple times
What would they be doing with their enormous profits instead?
Return the money to shareholders instead of incinerating it
2-3x is completely dwarfed by the remaining improvements in training which is still in its infancy relatively
Unless there's a new paradigm, scaling up is all they can do to improve performance. They've shrunk down all the way to 1-bit models and all the low-hanging fruit is gone. There's no way for them to get much smaller, so they have to get bigger and faster to meet expectations.
This hasn’t been true for the past 2 years
Is this based on an assumption that Opus 4.7 & co are equivalent to or smaller than Opus 4.5 & co? I highly doubt the advanced models (Opus, Pro, etc.) aren't bigger than the standard ones (Sonnet, Flash, etc.), and I'm fairly sure newer models are bigger than older ones.
this is just not true at all, there are massive leaps from algorithms, data, etc. every year. scale is one axis of many and you need to get them all correct.
What novel data hasn't already been used in training? What new algorithms are there? Can you post some links so we can read about them?
Probably, but at some point we're very likely to run out of significant training improvements and it's not clear that we'll see that point coming from a long way out.
Likewise it's probably dwarfed by improvements in how we make dram - continuing the roughly exponential (maybe a bit less recently) scaling of chips - but not necessarily.
The 2x from returning to previous costs is interesting because it's practically guaranteed, and it's on top of everything else. We're just currently "overpaying" (relative to the stable market price) for the manufacture of dram because of a sudden increase in demand.
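A quick way to sanity-check how a hardware price drop maps to a total cost drop (the ~3x hardware and ~2x total figures from the top of the thread) is to treat hardware as a fixed share of total cost. The shares used below are assumptions for illustration, not measured numbers, and `total_cost_reduction` is a made-up helper name.

```python
def total_cost_reduction(hardware_share, hardware_reduction):
    """Factor by which total cost falls when only hardware gets cheaper.

    hardware_share: fraction of today's total cost that is hardware (0..1).
    hardware_reduction: factor by which hardware cost falls (3.0 means 3x).
    Everything that is not hardware is assumed to stay the same.
    """
    new_total = hardware_share / hardware_reduction + (1.0 - hardware_share)
    return 1.0 / new_total


# If hardware is 75% of total cost, a 3x hardware drop gives exactly 2x total.
three_quarters = total_cost_reduction(0.75, 3.0)

# If hardware is only 50% of total cost, the same 3x drop gives 1.5x total.
half = total_cost_reduction(0.50, 3.0)
```

In other words, the "~3x hardware, ~2x total" pairing implicitly assumes hardware is roughly three quarters of total cost; a smaller hardware share shrinks the total saving.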
my reply from the other thread fits here too:
> this is just not true at all, there are massive leaps from algorithms, data, etc. every year. scale is one axis of many and you need to get them all correct.
> either by manufacturing scaling or just waiting for the current rate of manufacturing to fill the demand spike
Or the more likely scenario that the AI bubble bursts and the hyperscalers realize they have built too many data centers.
Well, no: manufacturers charge more than input price generally, here specifically, Nvidia wouldn’t lower prices because RAM went down.
Supply will not meet demand. What incentive do the handful of DRAM manufacturers have to end the party? This is what happens when legal monopolies finally win control. Don't worry. The patents will expire in a few decades. Our grandkids will see DDR5 get cheap again. The system functions as intended.
Patents are not the issue here. Not even close.
The up-front investment in a memory fab is measured in billions, and it takes years to construct and get running. The margin on the chips themselves is terrible, so without scale it's not worth even trying. DDR5 is an industry standard that takes some effort to conform to, but the license fees are a drop in the bucket compared to the cost of creating a fab.
The fabricators were cautious about increasing production, and slow to start planning. It takes further time to build up capacity, and if the demand drops down, they may end up producing dram at a loss when the market flips over to oversupply. The demand whiplash could kill any company that dared betting on increasing production. See the "bullwhip effect" https://en.wikipedia.org/wiki/Bullwhip_effect which has killed semiconductor fabricators before.
There is a discussion to be had about how to maintain national semiconductor production in Europe and US as a strategic industry, but historic attempts have all failed.
Billions is nothing in this market - if the market is supply constrained in the medium term then the hyperscalers will purchase their own route to manufacture (e.g. through coinvestment).
Also that's not what the bullwhip effect is - although I know what you are saying. The bullwhip scenario is about the effect of communication and batching through various layers in the supply chain, this is more similar to the cobweb effect/theory.
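For readers who haven't met the cobweb model mentioned above, here is a minimal sketch of the textbook linear version: producers decide how much to make based on last period's price, and the price then adjusts to clear whatever they made. The function name and all coefficients are illustrative assumptions and are not calibrated to DRAM in any way.

```python
def cobweb_prices(a, b, c, d, p0, periods):
    """Simulate prices in a linear cobweb model.

    Demand:  quantity = a - b * price          (this period's price)
    Supply:  quantity = c + d * previous_price (decided a period earlier)
    Each period the price moves to whatever clears the committed supply.
    Oscillations shrink when d < b and grow when d > b.
    """
    prices = [p0]
    for _ in range(periods):
        supply = c + d * prices[-1]      # capacity committed at the old price
        prices.append((a - supply) / b)  # price that sells exactly that supply
    return prices


# Supply reacts weakly to price (d < b): swings die out toward equilibrium.
damped = cobweb_prices(a=100, b=1, c=0, d=0.5, p0=80, periods=4)

# Supply reacts strongly to price (d > b): each swing is bigger than the last.
explosive = cobweb_prices(a=100, b=1, c=0, d=1.5, p0=50, periods=3)
```

The lag between the investment decision and the output is what produces the boom-and-bust pattern here, which matches the "capacity comes online just in time for the bust" complaint elsewhere in the thread.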
I have a fairly simplistic view of the economics involved here. Could you explain why the ability to sell more chips wouldn't be a sufficient incentive to increase supply?
Not the person you’re replying to, but RAM has historically been a boom-or-bust business, and companies that invest to meet demand during a boom cycle usually have that new capacity come online just in time for the bust.
If it was just variable costs and new capacity was available today they’d do it. But there are substantial fixed costs and delays to increasing capacity, and that uncertainty makes it risky.
That's such a nonsensical argument, it holds for every other business too and in this case it's just a lame excuse for monopolization. If you are that chicken and can't stomach competition you should not be in business anyway.
The current RAM manufacturers were convicted of conspiracy to manipulate prices back in the 2000s or thereabouts. Doing so is their modus operandi, but this time the government is participating in the racket.
There are other boom/bust businesses that have had waves of bankruptcies. The commodity sector is of particular note. You're seeing the same reluctance to spin up new oil rigs in the shale industry for similar reasons, despite record high energy prices.
Chip manufacturing has unusually long spin-up times, high capital costs and relatively thin margins for anything but the latest and greatest processes, compared to most industries.
Well, let's remove the sanctions from China then and we'll get a better idea about costs and spin-times.
BTW all RAM is severely overpriced, not only the one using the latest process nodes.
You seem more invested in hating RAM manufacturers than interested in the actual economics of the business.
Look up Qimonda, ProMOS, Elpida. They invested in capacity during booms.
Bringing on new fabs takes many years and billions of dollars. You're exposing yourself to a lot of risk if you build now and find that the gold rush is over by the time your new capacity is online.
Let's imagine you're drilling oil instead. You have to spend billions of dollars over years finding and developing a new oilfield to make any profit back. And once you have it, you have to continuously spend enormous amounts of money to keep producing it, which means your effective price floor is higher than the current stable price.
Now it's 2021 and someone gets a tanker stuck in the Suez, sending the price of oil sky-high. How long does the ship have to be stuck before you spend those billions of dollars on a bet that it'll recoup before someone gets the ship out?
Although on the flip side, let's pretend it's 2017 and you are Nvidia selling GPUs for Bitcoin - maybe demand will dry up at some point? Do you stop scaling production as this might be the max of the market, or do you follow the market and increase production?
It's always easier to see the right move in hindsight!
Nvidia doesn't own fabs though, TSMC does. By 2017, ASICs for Bitcoin were well underway. Ethereum hadn't switched to PoS, and wouldn't until 2022. For that specific question, the answer is yes, because the GTX 1080 Ti is/was a monster card, and the crypto miners have a somewhat predictable demand for them, so there's some modeling you can do based on demand for the 2016 generation of cards. The question is, of course, if you're Nvidia, what are you optimizing for? Let's say, without foresight that Ethereum would move to PoS in 2022 and that AI would replace that demand, how many 1080 Ti cards do you make, how many 1070s, how many mobile 1080s, how many Titans? To answer that, someone at Nvidia would, for better or worse, really have had to get into cryptocurrency in order to understand that market. Because you, as Nvidia, know how much better the 1080 will be for mining Ethereum, certain predictions can be made about demand.
Question is, without hindsight, 2022 rolls around, Ethereum moves to PoS, do you sell NVDA?
TSMC doesn’t get to take the profit that currently accrues to Nvidia and Apple, even though they absolutely could from a business/leverage perspective, because they are an economic colony of the United States, and hiking their prices (which Apple and Nvidia would have almost no choice but to pay, but which would upset their benefactors) would jeopardize their national security/defense.
In a world where TSMC is functionally capable of the same level of production but not in such a complicated geopolitical situation regarding semiconductor manufacturing, things would be quite different.
Got me curious, how does TSMC price their products? Why don't they optimize for their own profitability?
TSMC builds new bleeding edge fabs and then amortizes them for many different customers over a decade or more, starting with higher margin customers (apple, nvidia, etc) and working down as time goes on and the higher margin customers then move on to newer plants. Today's bleeding edge fabs become tomorrow's mass market fabs for lower margin chips that go into cars/toasters/etc. The idea is that the early adopters pay for a decent chunk of the CAPEX and then it becomes a commodity play. It's the same way some auto manufacturers put new tech into their premium cars, then it trickles down to the mass market cars over time.
It's the main reason outsourcing fabs is so much more economical. If NVIDIA built fabs just for itself, the fab's CAPEX would be amortized over fewer components than if a third party did, even if NVIDIA was the largest customer. It's also one of the main reasons Intel fell behind. So much of their cash flow went to building fabs that made an order of magnitude fewer chips than TSMC's. Even worse, they had to write down the CAPEX for the fabs, which affected their financial statements.
Anyway, companies like Apple and Nvidia have very long-term horizons and contracts, which probably include right-of-first-refusal clauses on capacity, etc. In the short to medium term, Apple probably isn't paying much more for most components. If this memory shortage lasts decades, they'll eventually end up paying more.
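To put rough numbers on the amortization argument, here is a minimal sketch. The fab cost, wafer volume, chips per wafer, and utilization figures are all hypothetical placeholders, not real TSMC, Intel, or NVIDIA data; the point is only that the same CAPEX spread over fewer chips costs more per chip.

```python
def capex_per_chip(fab_capex_usd, wafers_per_month, good_chips_per_wafer,
                   utilization, years):
    """Fab CAPEX spread evenly over every good chip produced in `years`."""
    chips = wafers_per_month * 12 * years * good_chips_per_wafer * utilization
    return fab_capex_usd / chips


# Hypothetical fab: $20B, 50k wafer starts/month, 300 good chips/wafer, 10 years.
# "shared" is kept busy by many customers; "captive" serves one company's demand.
shared = capex_per_chip(20e9, 50_000, 300, utilization=0.9, years=10)
captive = capex_per_chip(20e9, 50_000, 300, utilization=0.3, years=10)

print(f"shared fab:  ${shared:.2f} of CAPEX per chip")
print(f"captive fab: ${captive:.2f} of CAPEX per chip")
```

With these made-up inputs the captive fab carries three times the CAPEX per chip, simply because utilization is a third as high.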
Nvidia doesn't own fabs, but TSMC is doing a massive global fab expansion.
It's a lot easier to commit to spending billions of dollars in a hypothetical than in reality.
But it's also easier if you have a market cap of 2 trillion and are worried about your competitors scaling to meet demand.
> a path to a ~3x hardware cost reduction
Really?
How long do we have to wait until that ... cost reduction hits us?
For supply to meet demand. Depends very much on how aggressively producers scale and on how demand grows or shrinks.
Safe to say at least a year or two. It'd be shocking if it took a decade.
All the projections I've seen have said that the earliest we might see the curve flatten is 2030.
It just takes that long to get a fab up and running.
I bought 96GB of RAM a couple of years ago for ~$250. That same RAM now costs $1200!
I paid $279 for crucial 96gb DDR5 5600 MHz SO-DIMM ram October 22 of last year. Amazon has the same kit going for $1,048.90 right now.
CORSAIR Vengeance 96GB (2 x 48GB) SO-DIMM DDR5 5600 CMSX48GX5M1A5600C48
Bought an extra one by accident, paid $218.99 March 2025
Goes for $1400 now. I haven't gotten around to selling it.
Nice, you were lucky. =3
I bought 192GB of DDR3 a year ago for literally $60 ($5 a stick). It's about $22 a stick now, so more like $350 today. What on earth is _anybody_ doing with DDR3?
Demand for DDR3 is up because people who want DDR5 or DDR4 but can't afford either any more are choosing DDR3 and old DDR3-compatible systems to put it in, instead of what they really want.
At the rate we're going, soon we're going to draw from SIMM stock.
Just to be clear, this was to go into an ancient Dell T420 NUMA system. Well over 10 years old.
All memory products use many shared resources in the supply chain, so if there is high demand in one product line, others have to raise prices to compete for the resources or stop making those lines altogether.
That is to say, at least you were able to buy them at $350 today; on the current trajectory there will be no supply at all in a few months.
You could set up swap space on Intel Optane media, it'll be about the same performance as DDR3 and sells for ~$1/GB on the secondary market. Though it will be a lot more power hungry than Flash, let alone DRAM - so not suitable for all uses.
Doesn’t that require an Optane capable system?
Optane is a technology I’m still mad never became mainstream. It would be particularly useful today when trying to run local models.
Optane is available in NVMe form factor that will work basically everywhere. There's also Optane persistent DIMMs that only work in highly specific systems.
there's an economic term for this: substitute good. https://en.wikipedia.org/wiki/Substitute_good
Being desperate?
I’m so mad I didn’t max out my main server when I had the chance. Used enterprise sticks were dirt cheap on eBay.
Used enterprise HDD’s also jacked up now. It’s absurd lol
Just decided to buy 8 drives for my NAS and was surprised to see nothing in stock anywhere + prices are 3-4x higher than half a year ago. Just wasted 2k eur for 8x8tb, it should be plenty enough for my NAS but I feel stupid having to waste so much money.
Ridiculous prices indeed. I was grabbing refurbished/recertified 14tb helium drives for under $200/ea barely a year ago
Yep mad about that too. I was about half way through upgrading my 45 drives server when they started to go up.
Brutal. I only did a 6 drive bay and I was angry lol
He is talking about the company '45Drives', which produces solutions that aren't limited to 45 drive bays.
[0] - https://www.45drives.com
Appreciate the clarification!
People spend that much a month on restaurants
I just found two 4tb Samsung EVO drives - unused - while organizing my garage.
I forgot to add, I paid ~$500 each; Samsung is quoting $2k for the same drive on their site (maybe a new SKU). These were bought 2ish years ago. Strange things are afoot at the Circle K.
Lucky find! Just picked up one of those for a build, ohhhh boy was that a painful purchase. Thank god for my fortune to work in tech.
This throws prior assumptions that tens of gigs of RAM are cheap out the window. It would likely make super-fast SSDs such as Optane way more valuable.
The price of SSDs is similarly depressing.
I bought a couple of used computers with 256 GB of DDR 4 (total) a year ago. The ram is worth more than I paid for the whole machines now.
Someone was selling an Epyc machine with 512GB RAM @ 500 EUR last year. I regret not buying it now ...
paid a bit more than that just for a half-decent 16 gig stick recently :)
i compensate by never paying for AI
2x16gb for $105 total April of 2025. $600 for that now. Makes no sense.
This is one of the things with consumers: they remember having bought it at the absolute lowest price point, when DRAM makers were bleeding money.
That is not normal pricing. Before the pricing collapse in the early 2020s, 96GB of DDR5 would have cost about $450 to $500. And I need to restate that the cost of DRAM hasn't really changed much in the past 20 years. Its price just goes up and down in cycles.
So in reality it is more like going from $500 to $1300. But to consumers it feels more like going from $200 to $1300.
Crucial are already selling DRAM made by CXMT. And China is already throwing money at it. I doubt the memory bubble will burst in the next 12-24 months, as in going back to money-losing DRAM pricing, since they will all pivot to HBM or other money-making products. But the bulk of lower-end consumer DDR5 or LPDDR5 will go to Chinese foundries, assuming they have figured out how to do them well. They have improved but are still far behind the industry leaders.
Normally memory makers would push the next DDR standard to market just to push out Chinese competitors, but I am not sure it will work the same way this time around. DDR5 has plenty of other uses and demand.
> Its price just goes up and down in cycles.
Historically the price has always trended downward. When I first got into computing $200 could buy you 128 MB (yes M) of ram. Really nice systems had 512 MB.
That's obviously changed over the decades as process shrinks have led to higher memory density. We should generally expect that RAM will get cheaper up until the point where process shrinks stop happening. They've definitely slowed, but they haven't stopped.
>They've definitely slowed, but they haven't stopped.
Yes, if you span 40 years. But the floor for the DRAM spot price was ~$2/GB in 2008, and it touched that 2-3 times over the next 15 years. It wasn't until the early 2020s that it broke through that to $1.
Process shrinks happen, but the majority of a DRAM part can't be shrunk by process any more.
Exactly. My first computer had 48k, yes K of ram :-). My first PC had 2MB and made all my friends jealous as they had 1MB. Amiga 500 at the time had half.
I am keeping a piece of paper that came with my Tex Murphy game which stated that one could get 32MB of RAM for as little as $700 (1990s dollars) which would drastically improve the game!
> Crucial are already selling DRAM made by CXMT.
Crucial was disestablished this year.
He probably meant Corsair which is the DRAM brand selling CXMT produced chips.
Ah, the old decrucialisestablishmentarianism.
I found the phrasing weird myself, I quoted wikipedia
My main computer has 64GB. I bought that one in late 2022 or so.
Looking at the current prices, even of the same RAM, is just insane. Those companies really need to pay us compensation here. The whole "free market" notion does not work when you have de-facto monopolies and mega-corporations abuse the average Joe and average Jane.
Ramflation
yea, but people now have more money.
Everything I read seems to suggest that RAM capacity is going to grow at 20-25% a year, which just doesn't seem good enough. Even in consumer use cases, phones and laptops would benefit greatly from double the amount of RAM. And then obviously, the AI need is gigantic.
I don't see it going away. I mean, it may not grow as fast as now, but I don't see it going away either. I get why the memory makers do not want to bankrupt themselves, but it feels like there's got to be some way to push that risk off onto model providers and other people in the ecosystem to allow us to grow RAM capacity more like 50% per year.
The openai deal would be absorbed by two years of that. And it would be inefficient for the RAM makers in a competitive market to leave buyers unsold-to.
I don't actually know what the rate of growth before October was, I'm sure someone round here will though.
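For scale, here is how the 20-25% and 50% annual growth rates discussed above compound. This is plain compound-growth arithmetic; the function names are just illustrative, and it says nothing about how large any particular deal is relative to total capacity.

```python
import math


def capacity_multiple(annual_growth, years):
    """Total capacity relative to today after compounding for `years`."""
    return (1 + annual_growth) ** years


def years_to_reach(annual_growth, multiple):
    """Years of compounding needed to reach `multiple` times today's capacity."""
    return math.log(multiple) / math.log(1 + annual_growth)


for rate in (0.20, 0.25, 0.50):
    print(f"{rate:.0%}/yr: x{capacity_multiple(rate, 2):.2f} after 2 years, "
          f"doubles in {years_to_reach(rate, 2):.1f} years")
```

So two years at 20-25% adds roughly 44-56% more capacity, while 50% a year more than doubles it in the same time.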
In theory the new futures markets for chip components would help here, since it would allow DRAM suppliers to insulate themselves from that risk.
I mean, the biggest risk is that China's CXMT benefits, capturing markets that others are leaving hanging, and is then able to compete and push out the others when costs start to normalize.
As for 20-25% growth not being enough, I think it's not that far off, if we assume data center build out plans hit a wall and slow down significantly, and the AI heat starts to cool off.
20-25% may not be enough in the short term, but if the AI build-out stops within this year, we'll have a massive oversupply instead of an undersupply.
Looking at the history of the memory industry the biggest risk is that a firm would over produce and go bankrupt. Maybe this time is different but so far no memory chip maker has gone under because their competition increased capacity.
I might be wrong but your second point can't be true if the first one is true.
Let me explain: imagine CXMT grows massive and builds a lot of fabs, so much so that it becomes the leader in multiple segments, and then market demand cools off.
Then CXMT, the company that invested massively, has oversupply, so it undercuts every other memory company.
In other words, Samsung and SK Hynix are dead, and to protect Micron the US now has a 10000% tariff on memory.
Imagine. Because that has happened: if you don't play the boom-and-bust game, someone else will, because the market is very large during a boom, and the player scaling up the most generally isn't the one with margins to protect and generally has the ability to undercut others.
Asian memory chip giants were made by undercutting European and American companies; American companies adapted by moving manufacturing to Asia, and European ones got bought for pennies or dissolved.
Is there any indication research is being focused on reducing the memory footprint of inference for frontier-class models? Is the low hanging fruit already gone there?
Low hanging? How low hanging are we talking? The basic stuff is gone. The big challenges around quantization were largely solved 2 years ago, and we have just been improving from there.
But can massive gains still be made? Definitely.
The entire AI hype is based on the paper "Attention Is All You Need", and attention basically means loading a huge matrix of all the tokens into memory. How well you can optimize this attention layer is basically how most architectures try to solve for performance and memory usage.
The only one with significant gains in it is DeepSeek (or so I would like to believe, because others don't make their work open for folks like me who aren't in big AI labs to read). Their MLA architecture reduced KV-cache memory requirements by up to 90%; of course, that's a purely architectural change.
With some quantization like TurboQuant from Google you could push it down to ~1/3 of that. So roughly 96-97% memory savings when talking about the KV cache.
But the models are close to being saturated for quantization based memory optimizations. We will have to see some architectural changes for a significant shift now.
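A back-of-the-envelope sketch of how those KV-cache numbers combine. The sizing formula is the usual one for plain multi-head attention; the model shape is a hypothetical example rather than any specific released model, and the "90%" and "1/3" factors are simply the figures quoted above applied as multipliers, not a model of how MLA or any quantizer actually works.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2, batch=1):
    """KV-cache size for plain multi-head attention.

    The leading 2 counts one key tensor and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


# Hypothetical model shape with a 16-bit cache and a 32k-token context.
baseline = kv_cache_bytes(layers=60, kv_heads=64, head_dim=128, seq_len=32_768)

after_mla = baseline * (1 - 0.90)    # "up to 90%" reduction quoted above
after_quant = after_mla / 3          # "~1/3 of that" quoted above

gib = 1024 ** 3
print(f"baseline:      {baseline / gib:.1f} GiB")
print(f"after MLA:     {after_mla / gib:.1f} GiB")
print(f"after quant:   {after_quant / gib:.1f} GiB")
print(f"total savings: {1 - after_quant / baseline:.1%}")
```

The two reductions multiply: keeping 10% and then a third of that leaves about 3.3% of the original cache, whatever the starting size.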
The other side of this is how powerful small and medium parameter models are.
24b param models today are way more powerful than 24b param models 2 years ago.
If they manage to make memory more efficient, they’ll just increase the context size and/or model size.
We just haven’t reached the diminishing return of gen AI capabilities yet.
Models will get more useful if you have higher context size or higher param size. Then people will just use the models even more, leading to even more memory demand.
What is the risk? Competition is good for consumers.
The risk is to the business not the consumers
There's no risk to businesses that are paying bonuses of $1 million per worker, per year, like the RAM makers Samsung and SK Hynix.
They are drowning in money but they don't invest in new production in order to maintain high prices. By doing so, they form a virtual trust with monopoly control over pricing. What you call "risk" for them is our best hope, China can't enter the market soon enough.
Oops, the US government is blocking the Chinese chip industry in every way possible and thus becomes a factual member of the aforementioned anti-competitive and anti-consumer trust.
Micron is a US company, and US did the same against Japan in the past
> Micron is a US company
Micron doesn't make RAM for the consumer market, they serve corporations only. That's been the case for about 1.5 years now.
> and US did the same against Japan in the past
And the USSR self-isolated from China like 20-30 years before they... disappeared.
They announced they were closing Crucial, but until very recently they still sold stuff to consumers, and they still have a business entity to provide support and warranty in most countries.
I got my RAM rma'ed 6 months ago, yes it was intense
According to the recent article, HBM is 3x less efficient than LPDDR in terms of wafer area, but the bandwidth is more than triple.
What if it's in everyone's interest to buy computers at say 1/3rd the rate and switch everything over to HBM?
The discrepancy between compute and memory has been growing for ages; perhaps a painful switch to HBM is exactly what we need?
Would you rather have 3 intermediate computers with low memory bandwidth, or wait a little longer, statistically, so that we all get new computers at 1/3rd the rate but with a bandwidth gain that exceeds the area penalty?
These are fundamentally different points in the design space though: HBM doesn’t have a 10 mW idle draw like LPDDR does.
I hear people are doing AI workloads on Apple hardware, which is LPDDR but with a wider memory bus (1024-bit). This requires the SoC to support it; from what I understand not many, if any, beyond Apple offer this. A wider memory bus may be all we need.
Can’t put HBM in smartphones and laptops. The power drain is too great.
Not many workloads are RAM bandwidth limited. Power and latency are much more common bottlenecks, and HBM loses on both of those.
Multicore workloads do tend to hit RAM bandwidth limits before they hit power constraints. If you do the math, running at max frequency and core utilization would usually imply you could only access a byte or so per core clock cycle. Perhaps a mere handful of bytes for the highest-performance systems with in-package RAM.
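The "byte or so per core clock cycle" estimate can be reproduced with simple division. A minimal sketch, assuming a hypothetical desktop with dual-channel DDR5-5600 (64-bit channels) and 16 cores at 5 GHz, and using theoretical peak bandwidth, which real workloads rarely reach:

```python
def peak_bandwidth_bytes_per_s(transfers_per_s, bus_bytes, channels):
    """Theoretical peak DRAM bandwidth."""
    return transfers_per_s * bus_bytes * channels


def bytes_per_core_cycle(bandwidth_bytes_per_s, cores, clock_hz):
    """Memory bytes available to each core per clock cycle with all cores busy."""
    return bandwidth_bytes_per_s / (cores * clock_hz)


# Assumed system: 5600 MT/s, 8-byte-wide channels, 2 channels, 16 cores, 5 GHz.
bandwidth = peak_bandwidth_bytes_per_s(5600e6, 8, 2)
per_cycle = bytes_per_core_cycle(bandwidth, cores=16, clock_hz=5e9)

print(f"peak bandwidth: {bandwidth / 1e9:.1f} GB/s")
print(f"per core cycle: {per_cycle:.2f} bytes")
```

Under these assumptions each core gets only about one byte per cycle, which is why caches matter so much once every core is busy.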
What percent of the time do you think the average consumer computing device spends fully clocked up, let alone fully saturated on every core?
Historically most devices were serving antivirus and snooping. AI is the first time they are being used for actual computing again. They will be kept saturated.
Isn’t memory bandwidth super relevant for AI?
It is like the most important performance figure. When I use an LLM that mostly fits on my GPU, the GPU will run at about 30% of its maximum power consumption - probably because the memory can't feed the ALUs fast enough. Similarly for the part that runs on the CPU, the CPU cores will show 100% utilization but not consume as much power as they usually do under full load. The GUI will also be choppier than usual under full load (noticeable, but not too annoying) presumably because pixel pushing also needs some nontrivial memory bandwidth which is hard to get.
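A rough way to see why bandwidth dominates: when generating one token at a time with a dense model, essentially every weight has to be read from memory for each token, so bandwidth divided by model size caps the token rate. A minimal sketch with hypothetical hardware and model numbers; it ignores KV-cache reads, compute time, batching, and sparse (MoE) models, so treat the result as an upper bound only.

```python
def max_decode_tokens_per_s(bandwidth_bytes_per_s, params, bytes_per_param):
    """Upper bound on single-stream decode speed for a dense model:
    every weight is read from memory once per generated token."""
    return bandwidth_bytes_per_s / (params * bytes_per_param)


# Hypothetical 24B-parameter model quantized to 4 bits (0.5 bytes per weight).
gpu = max_decode_tokens_per_s(1000e9, params=24e9, bytes_per_param=0.5)  # ~1 TB/s GPU memory
cpu = max_decode_tokens_per_s(89.6e9, params=24e9, bytes_per_param=0.5)  # dual-channel DDR5-5600

print(f"GPU-class memory: <= {gpu:.0f} tokens/s")
print(f"desktop DDR5:     <= {cpu:.1f} tokens/s")
```

Neither bound depends on how many ALUs the chip has, which fits the observation that the GPU sits well below its maximum power draw during inference.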
Yes, and so we use HBM for AI (among other things), but that's an exception. For things like games or displaying webpages, it's not very important and we generally don't put HBM into things for that.
I'm not moving past my DDR4 build (and the 32 GB of DDR4 2133 MHz backup chips I still have around from way back, before I got the current 3200 MHz ones) until the prices go back to being at least partially sane. This also means that CPU manufacturers are not getting my money (since the 5800X is fine for now) and I have no reason to get a new GPU either (though admittedly the B580 isn't perfect).
What if this is the lowest that prices will ever be?
As Yogi Berra famously said, "It's tough to make predictions, especially about the future." But based on historical tech industry trends, a price increase in one component that's this rapid and extreme, is likely to eventually regress somewhat toward the long-term trend line - even if that trend line experiences a longer-term shift upward.
As always, some interpret certain recent events as reason to conclude "but this time it's different." Occasionally they are correct. But that doesn't change the fact that it's reasonable to assume some of the recent extreme, rapid price inflation is due to shorter term market distortion. It's also pretty clear that some of the recent increase in demand represents a stable increase in the long-term trendline. The question is how much is long-term stable and how much is short-term distortion.
Then I will make my build last as long as it can, in protest of that. I do expect at least a performative price drop in the coming years, though.
Then I better divert all of my investment into memory maker stocks.
Awful time for gamers and PC hobbyists not fully into AI.
This is 100% going to kill the home built pc market. When I started building gaming pcs, the top top card was 750$ (NZD). Now they’re 10,000 just for the gpu and another 1-2000 for ram.
People used to get into gaming pcs as an affordable hobby, now it’s making general aviation look like plan B.
This has already happened. Home PC market is practically dead already due to memory, SSD and graphics card price inflation. Makers of components like PC cases and power supplies etc. are seeing demand down 30-40% year over year and this is going to put many suppliers out of business. Nvidia has stopped even listing gaming revenue on their earnings reports. Both Nvidia and AMD are not seriously interested in supplying the consumer GPU market anymore either.
The only hope left is really Apple, but even Apple has conspicuously delayed the launch of M5-gen Mac minis and Mac Studio. Mostly because even Apple can't source enough DRAM to fully supply all their product lines.
there's much more than triple A video-games running at 240 Hz on Ultra settings... a 200 USD laptop/computer has enough power to run hundreds of interesting indie games and AAA from the past
My 2019 gaming PC is considered unusable ewaste by most pc gamers. The RX5700 XT GPU is super cheap second hand right now and I've been able to play every game I want including new releases like Kingdom Come Deliverance II on great settings with no noticeable issues.
You don't even have to drop down to old indie games. You just have to turn off the FPS counter and stop pixel peeping screenshots.
Yeah sure, but some folks were in it for the hot rods too.
Were in for the hot rods.
You can still play fantastic games with amazing gameplay, great storytelling, and even requiring quite a GPU. But you won't upgrade your GPU or RAM. If it gets broken, people have already gotten their money back instead of replacement (whether that is legal or not, depends on your jurisdiction, and regardless: it is happening). So the demand and adoption of say 240 Hz 4k OLED gaming is going to slow. I currently sport two 1440p IPS capable of 144 Hz, with an AMD 6700 XT, 64 GB DDR4, and a 5700X3D. I'll wait upgrading that to a 4k rig.
What I will do is buy a Nintendo Switch 2 before the price increase hits. Why? Great gameplay for kids.
I don't understand the threat to the PC market.
Prices haven't risen THAT much and are quite affordable. And if you look at the improved quality of upscalers (DLSS 4.5 for example), gaming is now more affordable than ever, despite the increased cost of components.
Of course, the 5090 prices are insane, as are for SOME memory models, but that's nothing new and represents a fairly small market share.
> When I started building gaming pcs, the top top card was 750$ (NZD)
When I started building gaming PC, the top $700 cards didn't even provide comfortable performance or graphics. Back then, you were supposed to have several of this connected SLI or somethin. And even then, it wasn't always reliable, and it resulted in stuttering, lags, and graphical artifacts (in cases when it worked). Today, even $700 graphics cards are a much better product from a user perspective than the high-end cards of that time (and that's not even taking into account that $700 cards back then were much more expensive).
Improved quality used to be the justification for buying new hardware at a similar price to the old hardware when it came out new. Now the 5060/70s are 4 figure cards.
As for how much the prices have actually risen, it’s not hard to see if this is true or not. If doubling of prices doesn’t raise your eyebrows, I’m not sure what will.
> When I started building gaming PC, the top $700 cards didn't even provide comfortable performance or graphics.
When would this have been? I cannot remember a time this was accurate for the games of the time, outside of a handful of meme titles like the original Crysis that made bad hardware bets. Most cards fulfilled the needs of the software and hardware of the time. I'd say the biggest issue was that for a time, software and hardware were advancing so rapidly that you wouldn't get very long out of your hardware, but that's just the reality of rapid development and not the fault or failure of any specific hardware release.
> Back then, you were supposed to have several of this connected SLI or somethin.
SLI was aimed squarely at enthusiasts, not at joe-average PC gamer and it was certainly never a requirement. It existed as a halo feature for people chasing maximum performance, benchmark scores, and bragging rights.
Yes, this will definitely renew interest in Stadia type products.
Why? Those servers still have to pay the same price for components plus a markup for the service. In theory you can serve more gamers per GPU, but these GPUs have to be physically located in your city to have a usable latency, and that means you'll have issues with peak utilization being most users gaming at the same time of day.
I just don't see the cost savings of sharing a GPU overcoming the extra expense + profit such a service would need.
The GPUs do not have to be "physically located in your city" to have usable latency.
Of course, less latency is always better, although running a traceroute between my IP and a major city (Sydney) 1,500 km away shows about 11ms latency with optimal routing. (Real-life test, traceroute via an ISP Looking Glass.)
1500km is still largely the same timezone though. To actually get consistent usage of the GPUs you'd want users on the other side of the planet using them while the current side is sleeping/etc.
> Those servers still have to pay the same price for components...
Not if Nvidia is running the service.
Seems quite possible to me that Nvidia sells to the public just enough graphics cards to keep any frisky antitrust investigators off its back and reserves the rest for GeForce NOW, its "pay monthly for limited access to a remote gaming PC" service. The cards for NOW are billed to the BU running NOW at or below cost, the few cards available to consumers and System Integrators naturally have a huge markup due to extremely constrained supply, and Nvidia uses the fact that they are the thing behind the LLM Boom to ensure that they have -what a System Integrator in 2022 would recognize as- a reasonable price for just enough RAM for the computers that NOW rents access to.
Downvoters: notice the speculative nature of the previous paragraph. I'm not claiming that this is happening right now. I'm claiming that it's quite possibly more profitable for Nvidia to bill monthly for limited remote access to computers with Nvidia graphics cards in them than it is to sell those cards at retail and to SIs.
These kinds of conspiracies require everyone to collude, which just about never happens since the reward to defect increases. If nVidia tries this, they would just lose the market to AMD who would spam out as many GPUs to gamers as they could. If both AMD and nVidia teamed up, it would leave a gap that either intel or some Chinese startup would jump on.
It's just far more likely that these GPUs actually do cost a ton to make right now.
> These kinds of conspiracies require everyone to collude...
No, only Nvidia makes and sells Nvidia GPUs. They're the sole supplier of the GPUs used in 95% of the graphics cards sold in the US.
> If both AMD and nVidia teamed up, it would leave a gap that either intel or some Chinese startup would jump on.
Fascinating.
a) Explain why the only even vaguely-recent cheap video cards were made by Intel, and why it looks like Intel has pretty much stopped making video cards? [0]
b) Tell me how that Chinese startup gets past USian Sinophobic/protectionist trade barriers?
c) Tell me how that Chinese startup convinces the big gaming development houses to ignore the advice of Nvidia's driver engineering team that just so happens to make their games work great on the hardware in NOW and really, really poorly on that unknown-to-US-customers Chinese startup?
> It's just far more likely that these GPUs actually do cost a ton to make...
You seem to have not been paying much attention to the reports of Nvidia, AMD, and major RAM and storage suppliers changing focus from the consumer market to the far more profitable datacenter (read as "LLM") market. Several such suppliers have exited the consumer space entirely. As any residential renter in San Francisco [1] can tell you, extremely limited supply drives price up to obscene levels.
[0] This shift in Intel's focus may or may not be related to Nvidia becoming the third- or fourth-largest Intel shareholder.
[1] ...or any other "hot" market with large, artificial barriers to entry...
Indeed, Gamers Nexus is doing interviews with PC component manufacturers, and some are hurting bad right now. The PC market is no longer in competition, but rather survival mode. =3
https://www.youtube.com/@GamersNexus/videos
Don’t you worry - Microsoft and Amazon will have you covered with cloud streaming.
Can’t afford a computer because they bought up all the supply? They’ll conveniently sell it back to you with a subscription!
You’ll own nothing and be happy.
It's more likely to kill the AI market. They're overbuilding capacity and most of it is going unused. The upcoming haircut is going to kill a lot of the major players.
They've intentionally crafted an unsustainable business model in an effort to get users in the front door and raise their MAUs. We've seen this story before. We should know precisely where it's headed.
When you consider how much an employee costs, AI makes a ton of sense. Lots of businesses are stacked with staff doing basic data entry / shuffling. Even if it’s 1000usd a month, AI is still a bargain.
> most of it is going unused
Sorry that “it is going unused”? From what I've read, most AI providers are capacity constrained.
I think it's the opposite. Sure in short term hobbyists are getting squeezed, but the amount of capital that they can put into pushing the edge is small compared to Fortune 500. Sooner or later hobbyists will benefit, especially if the market crashes.
I fully agree, the billion dollar question is when it will come.
If it crashes after it kills the PC we’ll be left with… nothing? Path matters as much as destination
It's impossible to kill gaming like this. Even if hardware was completely unaffordable, people would just use old stuff for longer and then upgrade after prices restore.
Why would it kill PC? There will always be hobbyists, e.g. I can't imagine pro e-sports players running on a Mac. Personally, half of the reason I moved away from Windows is Microsoft stalling/degrading Windows experience.
Price of PCs causing a collapse in demand, then mass bankruptcies of companies making PC components so supply chains get demolished and when prices come back down there’s no one left selling anything so you can’t build a pc at any price
Apple will survive, but it's like having a car with the hood welded shut and controlled from Apple headquarters. Not much fun for hobbyists.
Apple is very far from anything gaming related - tons of games just don't work. And top tier gaming is PC gaming.
also for ones fully into AI
I wonder why the hyperscalers aren't vertically integrating more and building their own fabs. Sure, a fab costs a billion dollars, but they're currently spending hundreds of billions of dollars purchasing chips from NVidia and others.
I'm not sure if they should vertically integrate, it would probably be a better idea to directly fund the expansion of capacity, much like Apple does when they scale up a new technology for iPhones.
However, that the hyperscalers and AI companies aren't doing this says a lot about their true beliefs about how much future demand AI will have.
AI companies claim they will need a ton of massive expansion, but are unwilling to take on the risk of the capital needed for that expansion.
I'm hearing a lot of sad whining from AI folks about how these chip makers are holding them back, but who actually has the money to finance the expansion easily? Chip makers have been through this game far longer, when Sam Altman went around claiming it was time for $7T of fabs the AI companies made it clear that they were willing to make ridiculous claims, eliminating credibility.
What's needed now is for them to funnel a tiny amount of their massive piles of cash into financing fabs directly.
Oracle is getting sold because of how much capex they're spending on new data centers in the middle of a high rates environment. It's not like they're stockpiling cash due to doubting AI.
Oracle had not entered into my thoughts at all; I know they do some cloud stuff but they are in a very different position than OpenAI or Anthropic or Google.
> [...] better idea to directly fund the expansion of capacity [...] > > However, that the hyperscalers and AI companies aren't doing this says a lot about their true beliefs about how much future demand AI will have.
With what money? They have to spend the money they get on hardware ASAP else they are left behind.
Another guy answered it ITT. Intel did that, it’s not great because fabs are expensive and risky and it’s less risky to amortize the cost across multiple customers instead of just yourself
Because fabs are about the most complex cutting edge technology out there: the "rocket science" of our day (or one of them). And merely having the money is not sufficient. It would be very easy to blow several billion dollars and end up with nothing to show for it.
Just look at how Intel has struggled to compete in recent years, and they have been in the business for decades.
Intel struggled because they bet the company that Moore's law was over back in ~2014, and instead of upgrading their fabs to EUV they sent the money back to shareholders.
They forgot Moore's main lesson: only the paranoid survive. They thought they could coast, and it nearly killed them.
That is not even close to correct.
> They forgot Moore's main lesson: only the paranoid survive.
"Only the Paranoid Survive" is rather a quote and book title by Andrew S. Grove.
A fab takes years to build even when you have the necessary know-how. If you don't it'll take some additional experimenting before you can compete with the established manufacturers. By the time you can produce a usable chip the shortage might be over.
A fab costs $15-20bn and it takes at least five years to build. Plus it requires expertise that none of these companies have.
Fab margins are on average super thin compared to the margins of big tech companies, and come with a lot of risk because of that. It's not something they are likely to be keen to integrate.
A fab costs a billion dollars (really a lot more) and 5 years. It doesn't do anything for anyone today.
Memory manufacturers sit on a war chest of IP. So even if someone has excess fab capacity and wants to get into memory manufacturing, they will have to fight an uphill battle against about a zillion patents.
Most memory companies have backroom deals to exchange tit-for-tat patent violations against each other.
Not sure how a new memory manufacturer comes into being without getting sunk by licensing costs?
china?
Bought a second-hand Dell server a week ago. The entire rig with a 12-core CPU and 32GB DDR4 ECC RAM cost as much as I'd pay to buy 64 GB of DDR RAM alone. I hope there's an end to this absurdity soon enough, otherwise the pain will affect other markets too. I read the other day that PC case sales have collapsed by more than 40%.
Poor people are already being priced out of cheap phones due to rise in RAM-related unit costs. https://www.cnet.com/tech/mobile/smartphone-sales-to-plummet...
It makes me sad for the Neo 2.0. More ram is the only thing stopping me switching to it from a Pro.
The Macbook Neo 2 (or likely just Macbook Neo A19) is likely to have 12GB of RAM, given the 12GB iPhone Pro Max.
I feel like by the time the AI bubble bursts the PC market will be irreparably damaged. Manufacturers who have been making "enterprise" parts aren't going to go back to making consumer parts because there will be no market for it. And with a glut of datacenters not making any money on slop, they are going to be repurposed for SaaS, stuff like OnShape but for every application.
Most users don't seem to care about storing everything they generate in cloud services and this could easily be sold as an alternative to owning "expensive" desktop or laptop hardware.
They’re going to pivot to you renting desktop cloud compute instead of owning anything.
Enjoy your HP laptop subscription, it's all the computer you're going to get moving forward.
It's the reason I just built a new PC despite the insane prices: I'd rather overpay than have reasonable prices but no stock to buy. With any luck I'll get 8-10 years out of this one and by then the PC landscape will be something else entirely.
“Bubble”
I have an alternative take.
If hyperscalers are using more RAM, and that RAM is not available for consumers, it means all the heavy stuff will happen in the cloud. Why would we want both the hyperscalers and consumers to have RAM simultaneously? Consumers would want more RAM to run local models but then hyperscalers capacity will be unused.
Because RAM isn’t in PCs only. It’s in tablets, phones, laptops, DIY computers like the Raspberry, mini PCs, watches, smart TVs, game consoles, cars, routers, cameras, all smart appliances from refrigerators to washing machines, fitness trackers, printers etc. Cloud services are irrelevant to most of these categories.
A fab that produces refrigerator RAM is also capable of producing HBM3? Doesn't that require retooling? Won't the same problems surface as those involved in establishing new fabs?
They do require retooling and that's what's happening here. RAM manufacturers decided that it's way more lucrative to focus on HBM production than DDR4/DDR5 production. Capacity is the issue, and that's capped unless you build new fabs, but they won't do it because there's no guarantee that demand will stay the same over the next few years.
I really don’t want to give anyone ideas, but doesn’t this make the Nvidia 5090 an unbelievably good deal right now?
The VRAM in the 5090 is only made by one country in the world.
The 50xx series is special, because its ram is so dependent on a single commodity. It’s not like a 4090 or a 3090; their VRAM chips have been around for years.
If there’s a shortage or interruption in GDDR7 VRAM, it seems like every GPU that requires it would explode in value.
I hope I don’t regret posting this because I’d really like to buy one myself…
An unbelievably good deal at $4000 plus?
Possibly the best deal there is
I really need to shut up, or bite the bullet and buy one.
If you graph the tokens per second on the 5090, your jaw will hit the floor at how cheap it is
With only 32GB of VRAM, you can only run small/quantized models, in which case what's the point? At $4000, that gets you 20 months of 10x Claude or ChatGPT subscriptions, which provide far better models. You'd need some use case where you can tolerate worse models, and use a steady supply of them. That doesn't match most people's usage patterns.
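For anyone who wants to plug in their own numbers, here is a minimal sketch of the break-even arithmetic behind "20 months". It assumes the $200/month subscription price implied by $4000 / 20 months; the function name and the $40/month electricity figure are hypothetical, added only for illustration.

```python
def breakeven_months(hardware_usd: float,
                     subscription_usd_per_month: float,
                     power_usd_per_month: float = 0.0) -> float:
    """Months until buying the card costs less than paying the subscription.

    power_usd_per_month is a hypothetical running cost for the local card.
    Returns infinity if the card never pays for itself.
    """
    monthly_saving = subscription_usd_per_month - power_usd_per_month
    if monthly_saving <= 0:
        return float("inf")
    return hardware_usd / monthly_saving


# $4000 card vs. a $200/month subscription, ignoring electricity.
print(breakeven_months(4000, 200))       # 20.0
# Same comparison with an assumed $40/month of electricity.
print(breakeven_months(4000, 200, 40))   # 25.0
```

This only compares cash outlay. It says nothing about the quality gap between a local 32GB model and a hosted frontier model, which is the real point of disagreement here.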
If you can do what you need with qwen3.6-27b, it starts to look really interesting. That model is crazy good for the size, but it's a pain tweaking the params to run it on a 4090 with decent context and decent token speed. A 5090 looks tasty from that point of view, and only more so if you think in terms of the probability of that model being roflstomped by something in the same weight class in the next couple of years. I reckon that probability is significantly non-zero, but fundamentally it's a guess.
>If you can do what you need with qwen3.6-27b, it starts to look really interesting.
What's the use case here? Churning out massive amounts of slop code through autonomous agents? Running openclaw 24/7? I think the proliferation of codex and claude code, compared to any of the cheaper open models suggests that at least for most software development, the 50-75% discount of open models isn't worth the hassle of the decreased intelligence.
I think there is a reasonable basis for taking a gamble that small models capable of fitting on a 32GB card will continue to advance over the next 5 years and eventually approach Gemini Flash 3.5 / Sonnet 4.6 levels of capabilities, which I would consider to be past the threshold of “probably worth the cost and hassle of running 24/7” if the upfront cost of the hardware was palatable.
My use case would primarily be in search, integration, and indexing other software projects with my own, as well as transcription/indexing of interesting video and audio content (eg Dwarkesh interviews) that I don’t have time to watch but want to easily search and apply to my projects, and search/indexing for useful information from things like Linux kernel and security mailing lists. Basically there is a lot of stuff that, if the cost were low enough, I would point a reasonably intelligent AI at to distill out useful information and apply it to my projects, or just cherry pick the interesting things out and surface them to me so I don’t have to wade through all the mundane stuff and man-made slop getting in the way.
>My use case would primarily be in search, integration, and indexing other software projects with my own, as well as transcription/indexing of interesting video and audio content (eg Dwarkesh interviews) that I don’t have time to watch but want to easily search and apply to my projects, and search/indexing for useful information from things like Linux kernel and security mailing lists. Basically there is a lot of stuff that, if the cost were low enough, I would point a reasonably intelligent AI at to distill out useful information and apply it to my projects, or just cherry pick the interesting things out and surface them to me so I don’t have to wade through all the mundane stuff and man-made slop getting in the way.
All of that feels like something that a $20 chatgpt pro subscription is for, maybe with slightly better tool use capabilities. There's no way that a $4000 purchase on a GPU would ever be worth it if all you're doing is running a handful of queries per day.
It would require much more than a couple of queries per day, I want to basically do bulk ingestion and search/evaluation/integration across tens of thousands of videos and software projects (if it were cheap enough and smart enough). It would basically be setting up and operating a pretty large data ingestion and coding agent pipeline, which I would want to itself be mostly automated.
It’s ok if you don’t want to do the same kind of thing but I find it weird how dismissive so many people get about wanting to use LLMs for large projects, or how anybody who says they’re using them for these kinds of things (I’m doing similar for other stuff) gets challenged on what they’re doing it for.
Or you want to process private data or don’t have reliable connectivity. There are a few more reasons for local models I think.
I don't have a 5090, I have a 395+ and I use it for GPU-assisted OCR, embedding vectors, speech to text, etc. I have the freedom of using a large library of various models and I can fit a lot in 128GB.
I don't use it for coding, I have $20 Gemini, $20 codex, etc.
But then I got the framework board for $1700, now it's $2700
Also, electricity isn't free.
With enough solar panels it is!
Not quite.
Free for approximately 8 hours (assuming perfect weather conditions) and excluding unit cost and maintenance cost.
It has a cost.
My area has a net-metering plan available, so you can send any surplus out to the grid to offset energy pulled from the grid, essentially treating the grid like a large battery. That can extend the 8 hours into full 24-hour coverage with enough panels.
The 5090 is crap for inference. Unless you like dummy models, sure they will run at light speed. All the rage is MoE with 500B-1T weights nowadays.
MoE is fine. You can put the shared weights on the 5090 (will fit handily even for the largest models) and expert weights on CPU, possibly with weights offload from storage.
Even if you could fit a 500B model's expert weights in very fast system RAM, it would run so slow as to be useless.
That's really only "useless" if the only thing you care about is a quick real-time response. Contrary to common perception, MoE models do benefit from batching requests together even when run on a single node, you just have to ensure you have at least ~5 parallel requests in flight (and that's for the very sparsest models) to really see the aggregate benefit.
(Intuitively, that's because the issue of whether any active weights are being shared among requests - thus, any memory throughput is being reused - is a generalized birthday problem. That's why even having a few parallel requests is quite effective. Especially since the "random" choice of experts happens anew at any single layer, so there's a lot of independent samples.)
This is just wishful thinking.
For prefill, it's really easy to batch MoE and get really good tk/s, even on a single stream.
For decode, you will run into the problem that:
1) you need more parallel requests which means more memory for context
2) 5 requests will not give you very much expert overlap on parallel requests
You don't need "very much" expert overlap to see aggregate gains at scale, you just need some of it; that's where the "birthday" framing becomes relevant. Memory for context is an issue, but recent models like DeepSeek V4 use very little of it even at relatively large contexts.
>You don't need "very much" expert overlap to see aggregate gains at scale, you just need some of it
I'm not sure what you are claiming. Decode is bottle-necked by memory bandwidth. To see a speed up of 2x, you have to ensure each expert weight memory fetch can be used by 2 parallel streams. What exactly is the average factor you are claiming for 5x parallel streams (due to "birthday paradox" factors)? The Birthday paradox isn't really relevant here. It's about coverage, not parallelism.
> Memory for context is an issue, but recent models like DeepSeek V4 use very little of it even at relatively large contexts.
This is not true.
An aggregate speedup of 2x is a lot, we don't need that in a local context. Local hardware is heavily constrained by power and thermals, not just bandwidth; so all we really care about is raising compute intensity for decode a little bit to relax the memory bandwidth constraint. The average factor will depend on just how sparse the model is and how far you can push parallelism, there isn't just one single answer.
But you won't see 2x expert re-use, the speedup with 5 streams will be tiny.
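The disagreement above is partly about a number, so here is a rough sketch of how it could be estimated. It assumes each stream independently picks `top_k` distinct experts uniformly at random at every layer, which real routers do not do (routing is correlated and shared experts are ignored). The 256-expert, top-8 configuration is a hypothetical example, not any specific model.

```python
def expected_unique_experts(num_experts: int, top_k: int, streams: int) -> float:
    """Expected number of distinct experts touched in one layer.

    Assumes uniform, independent routing: each stream misses a given
    expert with probability 1 - top_k / num_experts.
    """
    p_miss_one_stream = 1.0 - top_k / num_experts
    return num_experts * (1.0 - p_miss_one_stream ** streams)


def weight_reuse_factor(num_experts: int, top_k: int, streams: int) -> float:
    """Expert activations served per expert weight fetch (1.0 = no reuse)."""
    activations = top_k * streams
    return activations / expected_unique_experts(num_experts, top_k, streams)


# Hypothetical sparse model: 256 experts per layer, 8 active per token.
for streams in (1, 5, 32, 128):
    print(streams, round(weight_reuse_factor(256, 8, streams), 2))
```

Under these assumptions 5 streams gives a reuse factor of roughly 1.06, and it takes on the order of a hundred streams to get past 2x. A less sparse model, or routing that is correlated across requests, would shift those numbers up.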
It's gone up like 300% in cost in the last year.
Which surely is the highest it'll ever be! You're suggesting that the price will go down in the future? Would love to hear more about your thought process!
Are you saying we're entering a period where tech increases in price instead of decreases? I guess it depends upon time horizon, but your statement isn't very specific.
Yeah man, obviously. RTX 5090s will almost certainly increase in price over the next two years as memory shortages get worse.
I believe msrp is $2000 right?
There was only a very brief time it was selling for MSRP (last fall for $2000). Even if you use that as the previous data point, it's only 200% increased.
> it's only 200% increased.
If it's 4k instead of 2k msrp, that's a 100% increase.
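The percent-increase confusion is easy to check. A quick sketch using the $2000 MSRP and $4000 street price quoted above (the function names are just for illustration):

```python
def percent_increase(old_price: float, new_price: float) -> float:
    """Increase from old_price to new_price, as a percentage of old_price."""
    return (new_price - old_price) / old_price * 100.0


def price_after_increase(old_price: float, percent: float) -> float:
    """Price reached after raising old_price by the given percentage."""
    return old_price * (1.0 + percent / 100.0)


print(percent_increase(2000, 4000))      # 100.0
print(price_after_increase(2000, 300))   # 8000.0
```

So $2000 to $4000 is a 100% increase (2x the price), and a true 300% increase from $2000 would mean $8000.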
if you can buy one!
The RTX 5090 is faster than an H200. It just has less ram (32 GB vs 141 GB), doesn't have NVLink, and technically isn't allowed to be used in a datacenter.
The datacenter GPUs sell at an 80% margin. They're incredibly overpriced. But the laws of supply and demand are undefeated and so here we all are.
> The RTX 5090 is faster than an H200. It just has less ram
H200 has HBM and much more 64-bit compute
Let me try again.
RTX 5090 has more CUDA cores that run at a higher clock speed. H200 has more RAM and significantly more RAM bandwidth.
Which one is net faster depends on your use case. But you may be very surprised that many workflows are faster on an RTX 5090!
I recently built a system at insane ddr4 prices ($2000 for 256gb). But that’s only after seeing how ddr5 prices were 3-4x that!
Yeah I upgraded all of my systems to DDR5 last year, so now I have to buy for ddr5 memory upgrades.
Had to fork over almost $1k for a 64G DDR5 kit a few weeks back. At least AMD chips' large L3 cache allows folks to get away with lower-grade UDIMMs.
Also had to do an Intel build, and there was no way we were going cudimm at current prices. =3
I find it deeply ironic that Iran has blocked helium supply while it relies on AI-created slopaganda to subvert its adversary. It's one of those afterwits of history.
> iran...slopaganda
A US soldier I know commented that the Iranian AI slop is "scary and powerful".
With how things are going, I'm really wondering how we are gonna tackle the consumer market for things like gaming and machine learning.
No doubt Cloud Gaming is in the cards for the future, only purists like myself with an RTX 5090 will pay premium for offline gaming
In the long run cloud gaming is inevitable, it’s just more economically efficient for the cost of the hardware required to render graphics to be amortized across consumers and not sit idle when being unused by collocating them with game assets in POPs.
Once enough gaming compute runs at the edge it also allows for more technically advanced games than would currently be economically feasible (but aren’t made mostly for lack of a market/adoption of cloud gaming and the resulting lack of technical know-how). So I think it will stick and probably end up winning over the holdouts, once the cost of rendering the games they want to play with consumer hardware becomes too large to stomach.
You could make the same economic argument for any SaaS, but the margins SaaS providers look for make it so that the only time it isn't cheaper to run your own software/hardware stack in place of SaaS is when the hardware requirements are very low, not high. SaaS makes sense economically when you take into account the admin, compliance, etc. costs... and the admin costs of a Nintendo Switch are pretty low.
Economic efficiency does not win the day because the free market is a myth. Cloud gaming is a technically worse solution because the latency floor is higher. It's a microeconomic disaster (rent vs buy, buy wins). The only reason it would become a thing is if the multinationals succeed in concentrating more wealth and power, which consumers aren't interested in supporting. It's a bad deal and consumers know it. They would have to be forced into it by having the consumer hardware market taken off the table (which is happening and the only possible avenue for a technical regression like cloud gaming to have a market).
I assume that memory manufacturers don’t really care where the money is coming from, as long as the "numbers go up" game is working.
NVIDIA in their recent quarterly report stopped categorizing "Geforce" as a single category, and merged it into "Edge-Computing".
If you are a PC Gamer or PC Enthusiast as I am, then we have some dark times ahead.
Do we though? DLSS 5 changes that somewhat from a “we need powah” to “we need models”. I think the future consumer GPU market will be tuned for image and world inference while workstation cards will be tuned for image and video inference. The old way of thinking about this will come to an end when we stop looking at the render loop as the be-all-end-all…
Or, we could be fucked.
From my point of view, I suppose we will enter a "Let AI generate entertainment" era. In which you just might rent everything, including games. No need for a beefy computer at home, you just need a slim endpoint:
"Order yours now, for just $99.99 per month, hardware included! Order today, and you will get three months of 'Office Suite' for free, with a small additional cost of $49.99 after month 4. On a tight budget? Switch to the yearly subscription, and pay comfortably in 18 installments."
On your Klarna card…
If DLSS 5 becomes the norm it's possible that just makes things worse. The DLSS 5 demos required an entire separate card to run the model, though IIRC NVIDIA did claim it would eventually work on a single card. Given what the model is doing (yassifying the whole scene instead of just upscaling/reconstructing) it makes sense to me that it would increase compute demand instead of reduce it like previous versions of DLSS.
The demos did, but look how far we have come in just two years: running local LLMs, running local diffusion models, running local world models (albeit barely a scene at this point). I do believe that in 10 years' time, games will be producing latents and not events the way they do now. I also hope this means that VR can finally get the fidelity it needs to really take off.
It's still unclear to me: the shortage is semiconductor boules / wafers? or the shortage is semiconductor fab process step availability?
As long as the discussion seems focused on memory, I'd suspect the latter, but if it's really the semiconductor boules/wafers, then I'd expect the boule growers to profit, not the memory makers, who just pass on the cost.
So which is it?
It’s fab capacity. Fwiw dram is different enough that fabs are not transferable between dram memory and other usages. It’s nice to think ‘wow if they made the current 10nm dram on the latest 2nm processes it’d be much faster’ but it doesn’t work that way. The specific size is needed for the capacitance. Sram can be made on fabs that make other circuitry since it’s transistor not capacitor based but is less dense.
Dram is just extremely specialised.
I know the differences between SRAM, DRAM, ...
I asked for evidence because different people keep feeding me opposite stories: one insists it's not fab capacity but wafer competition, with a recent article claiming HBM3E takes 3 times as much wafer area per bit as LPDDR5X. Others tell me the complete opposite: it's fab capacity, not wafer shortage.
Do we have citable references to ground either set of claims?
I believe those are two ways of describing the same thing. If you're able to book some fab capacity, that means you get to decide what the fab does with the next wafers in the queue.
From your sibling comment, I think you're interpreting the 3x HBM stat as contributing to making wafers scarce. It's more that the next wafer to be processed in a fab is especially precious, making the opportunity cost larger. The beach sand remains plentiful.
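A back-of-the-envelope sketch of that opportunity cost, taking the 3x wafer-area-per-bit claim quoted above at face value. The 30% share of wafer starts moved to HBM is a made-up figure for illustration, as is the function name.

```python
def relative_bit_output(hbm_wafer_share: float,
                        area_per_bit_ratio: float = 3.0) -> float:
    """Total DRAM bits produced, relative to an all-commodity baseline of 1.0.

    hbm_wafer_share: fraction of a fixed number of wafer starts moved to HBM.
    area_per_bit_ratio: wafer area per HBM bit vs. per commodity bit
                        (3.0 is the claim quoted in this thread).
    """
    commodity_bits = 1.0 - hbm_wafer_share
    hbm_bits = hbm_wafer_share / area_per_bit_ratio
    return commodity_bits + hbm_bits


# Hypothetical: 30% of wafer starts shift from commodity DRAM to HBM.
print(round(relative_bit_output(0.30), 2))   # 0.8
```

In that scenario the same fab, with the same number of wafers, ships about 20% fewer total bits and 30% fewer commodity bits. That is why "fab capacity" and "wafer competition" can be two descriptions of the same squeeze.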
so the bottleneck is the fab, again
There is a good article (featured on HN a couple of days ago) that explains the issue: https://davidoks.blog/p/ai-is-killing-the-cheap-smartphone
And that article contradicts other voices. If that article were correctly identifying the bottleneck as a wafer shortage due to switching to HBM, why is everybody discussing the memory makers instead of the boule growers? Memory makers can expand operations all they want, which makes no sense if wafer supply doesn't follow, and the article is suspiciously light on semiconductor boule / wafer manufacturers.
So which is the bottleneck: fabs or boule growing?
also consider how most solar panels are monocrystalline silicon, how credible is silicon wafer shortage ... really? there is so much disinformation in this market...
This covers it pretty well: https://news.ycombinator.com/item?id=48229319 TL;DR: memory for AI uses more wafers from the same production lines as other memory and is more profitable, and building a new fab has historically been very risky for companies. The companies have cut production of other memory to favor memory for AI, and the market for memory for AI is still unfulfilled, so prices still go up for customers of every type.
Regardless of the specific mechanics of the bottleneck, we know what the proximate source of the problem is: openai locking up 40% of Samsung and SK Hynix wafer capacity for the next few years. That's what triggered the madness.
Is there an understanding of what OpenAI intends to do with that memory?
Surely they need GPU capacity and would need memory for those GPUs but OpenAI doesn't build GPUs or any hardware, right? So did they pay to keep the supply locked up, or do they have the ability to put that ram into use?
I guess they could have a thousand GPUs each generate the next 20 microseconds in computer games, and play at 50 kHz frame rates, in order to truly eliminate motion blur regardless of which in-game object's motion your eyes are tracking.
Good time to focus on more memory efficient means of training and inference.
SeedLM from Apple is an interesting approach for inference memory efficiency. I'd like to see someone try and build that into training so that it's not a post training compression step.
It seems to me the max memory you can buy in a laptop stagnated for the past 3 years or so.
for the most part, unless soldered down, it has been hard to find higher than dual channel (maybe quad for a massive odm gaming laptop). each stick and platform having set maximum memory capacity has put a glass ceiling for those machines.
doesn't matter anyway when things are not reasonably priced. i am stuck at the same memory capacity in my personal system for the better part of two decades, partially due to the above and the current pricing today.
I have always felt insulted that most laptops even offer a low 4 GB of RAM; I'd rather take 16 GB of previous-gen memory.
My several years old laptop has 128GB of RAM, is that not enough? I admit that it's a pretty heavy one.
And the max storage in pre-built computers has stagnated at 2010 levels (~1TB). This was first due to the switch to the much more expensive and much faster charge trap flash. In the 2020s it finally started to approach 2010 sizes in pre-builts but then the corporate finance wars re: fab capacity happened.
I wonder if it is reasonable to assume the propagation of shortages further. At first it was GPUs, then RAM, then what?
Fresh water?
And four-fifths the cost of a consumer PC build.
The cost of memory should continue to go up as we tend to want AI to hold more context and remember a lot more.
Time to let ASML sell to the Chinese memory producers … or not.
As models gain efficiency, will the need for ram cool?
They’ll just fill up the ram with bigger models. Demand will INCREASE, not decrease.
Every time we add capacity with almost anything, we find ways to saturate it.
Braess's paradox for roads. When we add capacity to road networks, traffic increases even more than the capacity.
https://en.wikipedia.org/wiki/Braess%27s_paradox
Jevons paradox is at play. Right now frontier AI is very expensive which heavily suppresses demand.
If you made it 10x cheaper right now you would see a truly unimaginable wave of bot slop.
Here’s the thing, what if memory manufacturers take this opportunity to collude and basically never reduce the price of memory below the current levels since it’s too hard for a new competitor to just rise up and undercut them? Everything I hear about is how hard and risky it is to spin up a new fab.
And by doing this, they ensure local LLMs never become feasible for the vast majority of people and AI companies solidify subscriptions forever.
Keeping prices at this level is precisely how one or more competitor will rise up. Making memory isn’t super hard. That’s why it is a commodity. The problem with the memory market is that up and down cycles have bankrupted the vast majority of players in the past. Now we only have 3 players left except for a few smaller ones in China.
The reason memory prices can stay high for years in this mega cycle is because the 3 players will be very cautious on overbuilding. They’d rather under build, make great profit (not maximum) and reduce the risk of going bust if this suddenly ends.
Same for TSMC in chips.
Great opportunity for Chinese companies though. This shortage is exactly what Chinese companies need to scale.
> Making memory isn’t super hard.
Then why do only 3 companies make it?
Bankruptcy risks.
When Samsung had to sell memory at a loss after COVID, no one came to save them. They buffered their memory division using profits from their other businesses. That’s how Samsung survives memory downturns.
According to some stories, this is how Samsung convinced TSMC to not enter the memory business - that you need a nation or other lines of business to prevent bankruptcies.
The market has stabilized to 3 players.
...And why do they go bankrupt?
Because it's an incredibly capital intensive process, involving billions of dollars of investment into manufacturing infrastructure.
That is to say, making memory is quite hard.
The technical process of making memory is relatively easy. Hence, it is a commodity.
I didn’t say owning a memory business is easy.
You’re confusing two independent things. There are simple processes that are extremely capital intensive with long lead times and then there are complex processes that require lots of R&D and industry secrets. Memory is the former in the chip world.
Other examples from outside of tech of easy but capital intensive processes are power generation and railroads. Very easy to do, but easy to end up broken by overbuilding for demand that fails to materialize or stay stable for the duration of your financing.
Making the memory can be much easier than predicting future demand.
Placing the bet isn't as hard as making an accurate prediction.
//Making memory isn’t super hard. That’s why it is a commodity.
These two aren't related.
Dram is a commodity because you can replace a chip from Hynix with a chip from Micron; they have the same behaviour.
And price-competitive Dram isn't easy to manufacture, or China would have made it already.
> up and down cycles have bankrupted the vast majority of players in the past
Exactly, so what’s the incentive for anyone to sink half a billy into building out more capacity?
The existing players get to rest on their laurels and succeed whether or not the AI bubble busts.
The incentive is that your 2 competitors will build more than you and gain market share on you if you are too conservative.
Samsung, SK Hynix, and Micron all have to balance between capex spending, making as much profit as possible, and risk of bankruptcy.
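To make the capex-versus-bankruptcy trade-off concrete, here is a toy sketch. Every number in it (probabilities, demand, prices, costs, the cash buffer) is invented for illustration and is not an estimate for any real memory maker; the function names are hypothetical too. It only shows how a bigger build can have the higher expected profit and still be the bet that kills you in a downturn.

```python
# Toy model of the capacity bet. All figures below are made-up assumptions.

SCENARIOS = {
    # name: (probability, units demanded, price per unit)
    "boom continues": (0.6, 150, 10.0),
    "demand collapses": (0.4, 60, 4.0),
}


def profit(capacity, demand, price, unit_cost=2.0, capex_per_unit=3.0):
    """Profit for one cycle: sell min(capacity, demand), pay capex on all capacity."""
    sold = min(capacity, demand)
    return sold * (price - unit_cost) - capacity * capex_per_unit


def evaluate(capacity):
    """Return (expected profit, worst-case profit) across the scenarios."""
    outcomes = {
        name: profit(capacity, demand, price)
        for name, (_, demand, price) in SCENARIOS.items()
    }
    expected = sum(SCENARIOS[name][0] * value for name, value in outcomes.items())
    return expected, min(outcomes.values())


def survives(capacity, cash_buffer):
    """True if the worst-case loss can be absorbed by the cash buffer."""
    _, worst = evaluate(capacity)
    return cash_buffer + worst >= 0


if __name__ == "__main__":
    for label, capacity in [("conservative", 100), ("aggressive", 150)]:
        expected, worst = evaluate(capacity)
        print(label, round(expected, 1), round(worst, 1), survives(capacity, 250))
```

With these assumed numbers the aggressive build has the higher expected profit, but only the conservative build survives the collapse scenario on a cash buffer of 250. A company with other profitable businesses has, in effect, a larger buffer.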
So are the new competitors currently in the process of starting up? Because it takes at least several years.
Only Chinese companies have a chance. Problem is that China can’t buy EUV machines and the most advanced memory chips now use EUV.
Heck, the US is now pressuring ASML to not sell even DUV machines to China, period.
Yeah, that is a challenge. DDR5 and LPDDR5X are both manufacturable with DUV. So let's hope they still get access to that...
When costs are high enough, you can recoup that, if you have an appetite for risking the downturn.
If they collude to, say, make the price $1000 for a component that costs them $100 (including opportunity costs), then either a new company or a greedy company in the collusion can secretly make their price $900 and get massively more profit.
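The arithmetic behind that defection argument can be sketched as follows. The $1000, $900, and $100 figures come from the comment; the market size, the number of firms, and the share the undercutter captures are assumptions picked purely for illustration, and the function names are hypothetical.

```python
# Cartel defection arithmetic. Prices and cost are from the comment above;
# market size, firm count, and captured share are illustrative assumptions.

def cartel_profit(price, unit_cost, market_units, n_firms):
    """Per-firm profit when every firm charges the same price and splits the market evenly."""
    return (price - unit_cost) * market_units / n_firms


def defector_profit(price, unit_cost, market_units, captured_share):
    """Profit for a firm that undercuts and ends up with `captured_share` of the market."""
    return (price - unit_cost) * market_units * captured_share


def break_even_share(cartel_price, undercut_price, unit_cost, n_firms):
    """Market share above which undercutting beats staying in the cartel."""
    return (cartel_price - unit_cost) / (n_firms * (undercut_price - unit_cost))


if __name__ == "__main__":
    stay = cartel_profit(1000, 100, market_units=300, n_firms=3)
    cheat = defector_profit(900, 100, market_units=300, captured_share=0.6)
    print(stay, cheat, break_even_share(1000, 900, 100, n_firms=3))
```

Under these assumptions the undercutter only needs to move from a one-third share to a bit over 37% for the $900 price to pay off. The sketch ignores capacity limits: a firm that is already sold out cannot serve the extra customers it wins.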
Right now their opportunity cost is too high.
> risky it is to spin up a new fab
You don't need a new fab. You can build memory in a 20-year-old fab.
Then that's a cartel and hopefully regulators will act.
They won’t.
They will. DOJ prosecuted memory makers in the late 90s and 2000s for collusion.
This boom is orders of magnitude bigger than before. The attention will be endless.
Current DOJ is corrupt as fuck, it will not happen. Get back to reality.
They will respond when people are loud enough. If memory stays at $1200 for 128GB for years and investigative journalists say it could be colluding, enough people will make enough noise.
I’m sure Nvidia, Elon, Tim Cook, OpenAI, Anthropic are already whispering in Trump’s ears to do something.
What journalists? People who type shit into ChatGPT and post the article as their own?
Journalism is dead. There will be no more investigative journalists like the type you describe.
Once the masses are disenfranchised network state serfs according to plan, loudness won't matter
> I’m sure Nvidia, Elon, Tim Cook, OpenAI, Anthropic are already whispering in Trump’s ears to do something
You can't expect me to believe that any of those would want any kind of antitrust action against anybody.
Sure they do. They all have money interests in this. They all want lower memory prices.
Memory prices and shortages directly impact all of their profit margins and revenue.
They have other ways of getting what they want.
What magical way do they have of getting cheaper RAM if there really is collusion?
Corrupt doesn’t mean “acts without incentives”. If you assume a corrupt system, then the inputs are going to be who has influence over the DOJ. If there is more money to be made by breaking a cartel, then they would absolutely do it.
That was a very different DOJ. They no longer work for us. They act as Trump's personal law firm.
Then China will come and eat their lunch. I for one will only buy Chinese RAM from now on, no matter the prices.
>I for one will only buy Chinese RAM from now on, no matter the prices.
Memory is a commodity, so I think you will be very lonely in your quest.
Memory makes computation universal.
I wish I had figured that out a year ago. MU up ~10x, SNDK up ~37x. My crystal ball is woefully underperforming.
How can I use this information to MY advantage? Do I start going into something to do with AI chip memory-stuff? If so, how? But just on a software level cause hardware is hard.
A commodity rapidly increasing in price. What could go wrong?
unified memory architectures are getting more interesting for inference workloads.
AI is choking the computing economy. Many companies will die. It's already a mass extinction event and will leave behind deserts.
Why did this happen so suddenly?
Why were tech-savvy investors unable to figure this out when the datacenter craze had already started?
How do we explain the lag between quickly rising demand for every other datacenter component and demand for memory?
RAM is a boom-and-bust industry, so memory manufacturers were reluctant to invest. Here's a good blog post on the economics:
https://davidoks.blog/p/ai-is-killing-the-cheap-smartphone
Maybe long-term purchase agreements from big buyers might have helped convince them it's okay to build, but apparently it didn't happen.
Nine years after Google's seminal paper lit the fuse on AI, a total lack of manufacturing foresight has trapped over a trillion dollars of incoming capital in a hardware bottleneck.
The entire sector is now facing a critical RAM starvation crisis where memory manufacturers are actively slow-rolling supply just to keep prices high and avoid running out entirely.
This has created an unprecedented supply-and-demand distortion where desperate companies are getting rejected even at a 5x markup, and mission-critical SKUs are skyrocketing to 10x and 20x their baseline value.
It is a macroeconomic squeeze at a staggering scale, and the massive venture scale opportunity lies in capturing the value created by this memory gatekeeper.
From the perspective of an armchair economist, the winners will be the investors who invest in RAM wisely. The losers will likely be cash-strapped SaaS companies. They’re almost completely dependent on a fleet of servers in the hyperscalers, and they’re leasing those servers and services. That leaves small SaaS companies exposed to incoming inflation in the cost of hosting.
"That leaves small SaaS companies exposed to incoming inflation in the cost of hosting."
Which they will pass on to their customers. If their product provides enough value the customers will pay...
Capex started exploding after COVID, with the chart going hockey stick at the end of '23/start of '24, almost 2.5 years ago.
A lot of capex is supposed to go into the datacentres. Didn't they know that datacentres need to be filled with RAM, among other stuff? I wonder if at some point we will discover that there is a shortage of fibre optic cables or SFPs ...
PS: Obviously armchair economist here too ... but it doesn't seem too difficult to foresee the increase in demand.
A lot of words to say that Sam Altman bought up the world's total supply of RAM chips for the next few years.
A dick move or just really prescient?
It's only prescient if it works out. But it's a dick move either way.
The same reason they didn’t all sell everything to buy NVIDIA the day chatGPT came out
Built a new machine with 64GB DDR5 and 5TB SSD in January 2025. It's sheer luck that I dodged that bullet.
Since memory is becoming an expensive commodity, I guess the old ways of being careful about your program's memory usage (like running within the constrained 1 MB of memory back then) are making a comeback.
I only feel sorrow for the Electron devs, they will have a hard time.
Since January, I've been lucky picking up various used DDR4 memory sticks for cheap-ish. I got a total of 64 GB for $180. I feel like I hit the jackpot!
I think the companies that drive up the prices here need to pay an extra tax to all of us. I fail to see why I now have to pay more due to the AI monster companies ruining the economy.
Anyone invested in Micron stock?
Up 700% in a year.
WallStreetBets has been disturbingly accurate in its predictions - basically anything related to AI.
It’s fun and ironic that “having a memory” is what AI appears to lack the most in practice, while at the same time it demands more computer memory than anything else to run.
The algorithm advances are going to crash this so hard.
Or will more efficient algorithms just mean we run even more AI models, increasing the demand for AI chips even more?
I heard Greg Brockman on a podcast saying they are limited by compute and memory. They have line of sight to solving many different kinds of problems. But they also have to survive in the meantime. Hence the focus on enterprise recently. They could just ask the government to fund them doing research in other areas.
Better algos = more demand
Memory squeeze will get worse before it gets better.
I mean, god willing, but it'll be just as likely that we'll blissfully consume 100 million token contexts in that case.
Isn't there a law for that? As things become cheaper you consume more?
You're probably thinking of Jevons paradox. But you slightly misstated it. It is the phenomenon that increasing the efficiency of resource consumption can end up increasing total consumption.
As you stated it, it would merely be a property of (nearly) all demand curves. Jevons paradox only happens sometimes. It isn't a law.
An example of where it stopped happening is with gasoline in developed countries. Cars having better fuel efficiency doesn’t make me drive further to the grocery store or work.
Generally, when someone replaces their vehicle, the new one is more fuel efficient than the old one, even if they buy the same model.
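The distinction drawn above (cheaper things get used more on nearly every demand curve, but total resource use only rises sometimes) can be sketched with a constant-elasticity demand curve. This is a textbook-style toy, not a model of the RAM or AI market; the functional form, the `scale` constant, and the elasticity values are all assumptions chosen for illustration.

```python
# Jevons paradox under an assumed constant-elasticity demand curve.
# "Service" is what people want (miles driven, tokens generated);
# "resource" is what gets consumed to provide it (gasoline, memory).

def service_demanded(efficiency, elasticity, resource_price=1.0, scale=100.0):
    """Units of service demanded when one unit of resource yields `efficiency` units of service."""
    cost_per_service = resource_price / efficiency
    return scale * cost_per_service ** (-elasticity)


def resource_consumption(efficiency, elasticity, resource_price=1.0, scale=100.0):
    """Total resource used to deliver the demanded service."""
    return service_demanded(efficiency, elasticity, resource_price, scale) / efficiency


if __name__ == "__main__":
    for label, elasticity in [("elastic demand", 1.5), ("inelastic demand", 0.3)]:
        before = resource_consumption(1.0, elasticity)
        after = resource_consumption(2.0, elasticity)
        print(label, round(before, 1), "->", round(after, 1))
```

In this toy model, doubling efficiency increases the amount of service demanded in both cases, but total resource use only goes up when the elasticity is above 1. The gasoline example corresponds to the inelastic case: better mileage does not make the commute longer.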
Jevons paradox: https://en.wikipedia.org/wiki/Jevons_paradox
Jevons paradox.
https://en.wikipedia.org/wiki/Jevons_paradox
jevons paradox
classic uneducated algo copium talk
if we survive the bubble bursting and there isn't a "too big to fail" bailout with public money manipulation by bought politicians
we are going to have amazing cheap used hardware for a decade