Keep in mind the significant cost savings, given that it doubles as a home heating solution.
Just keep in mind that a heat pump will often be 300-400% efficient at adding heat. This is 100% efficient and for once that's not actually very good.
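To put rough numbers on it (electricity price and COP values are assumed, purely illustrative):

    # Cost of 1 kWh of delivered heat under assumed numbers.
    price_per_kwh = 0.30      # assumed electricity price, $/kWh
    resistive_cop = 1.0       # a GPU turns electricity into heat 1:1
    heat_pump_cop = 3.5       # a typical heat pump, i.e. "350% efficient"
    print(price_per_kwh / resistive_cop)   # $0.30 per kWh of heat
    print(price_per_kwh / heat_pump_cop)   # ~$0.09 per kWh of heat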
Every room in my house with a desktop computer is consistently about 5°F warmer than the rest. There's some difference because of monitors and sun exposure, but yeah.
I actually did have to alter my HVAC automation to account for it, lol.
The more you buy, the more you save!
More to the point, I hope the electrical certification for computers is getting a going-over, as these GPUs make more heat than the smallest electric heaters on sale. Can't help but consider a whole different mounting option here: I keep toying with the idea of doing a solar-powered crypto mine that pre-heats water for a coin-op laundromat business. There are water-to-water heat pumps made for boosting the water temperature up to what's required for commercial hot water, while cooling the pre-heat tank back down to a good temperature for the GPUs.

Laundromats are one of those places that are truly timeless, with a particular and wide cross-section of society rubbing shoulders. Proper diners are similar, but are now mostly gone, which is unfortunate, as both enforce a strangely formal civility and even camaraderie on people who rarely otherwise share space.
We had a GPU cluster in an office building and were asked to vacate because the HVAC for the entire floor was overwhelmed. It smelled like a tropical beach dumpster most of the time, and the lights would dim when a new job was queued.
ML is so boring, =3
No price indicated. If you have to ask, you're not the market.
A blank check to Nvidia will get you a spot in their buying-invitation raffle.
Wasn't the Ada edition like $6k or so?
Good luck finding it for less than $9K, and it has half the VRAM and an older chip. I predict an MSRP of no less than $15K, with IRL prices above $20K.
Ouch, I was just asking for a friend.
Send the bill to my manager
It still has to compete with renting an actual professional card(s).
These are workstation cards and a lot of professionals need them to do other work than ML. The elitist attitude is pretty amusing, though.
It's not elitist. The Nvidia 'pro' cards (Quadro etc.) have always been a slightly unlocked, wildly more expensive version of the consumer cards. The V100, A100, and H100 are meaningful hardware upgrades over the consumer line.
> need them to do other work than ML
I need more frames in CS2.
Whatever your profession is, mate, I hate to break it to you, but the 6000 series is usually ~20% slower in gaming performance than the best consumer card of the generation.
vram mafia
What's the limitation that keeps memory limited to 96GB? Could one put 512GB of memory on a card? I'm curious about what is the limiting factor.
GDDR memory buses are so fast that the RAM chips have to be packed tightly around the GPU core to maintain signal integrity, so the limit is more or less how many chips they can physically fit multiplied by the biggest chip capacity their suppliers can provide.
But in theory the RAM chips don't need to be synchronous with each other. More than that, the data lanes on a chip don't need to be synchronous either: you could treat each lane as an independent serial channel. And GDDR latency is already high enough that longer lanes wouldn't change anything.
You can't quite treat each lane as an independent serial channel; DRAM chips are at least 8 bits wide, and this GPU needs to either connect 16 bits to each die or connect groups of 32 bits to two dies, and the DRAM die does want to get the whole word on the same clock cycle. There's no SERDES at either end like you get with PCIe or Ethernet. Just 512 PHYs doing 32+Gbps PAM3. If you want those to be long-reach PHYs, you're not going to have much die space or power budget left for compute.
Does that mean it's perfectly feasible to have more if one accepts a higher latency? Seems like there could be plenty of use cases where that's preferable.
Not exactly. The name of the game with GDDR memory is "speed on the cheap." To do this, it uses a parallel bus with data rates pushed to the max. Not much headroom for things that could compromise signal integrity like socketed parts, or even board traces longer than they absolutely need to be. That's why the DRAM modules are close to the GPU and they're always soldered down.
Also, the latency with GDDR7 is pretty terrible. It uses PAM3 signaling with a cursed packet encoding scheme. At least they were nice enough to add in a static data scrambler this time around! The lack of RLL was kind of a pain in GDDR6.
Like the GTX 970 with its 3.5 + 0.5 GB memory.
GDDR chips already have very high latency.
Also limited by heat dissipation.
Apple sells a Mac Studio with the M3 Ultra chip and 512GB VRAM (unified memory between CPU and GPU). It costs $9,500.
Their secret is that the memory is manufactured within the chip package.
LPDDR5X ~550GB/s vs GDDR7 which is ~1.8 TB/s
Why can't NVIDIA manufacture 512 Gb chips and put 16 of them on the board?
Apple's architecture comes with its own trade-offs: it gives them huge capacity and pretty good bandwidth, but not nearly as much bandwidth as Nvidia's architectures have. The M3 Ultra is 800GB/sec, the RTX 5090 is 1.8TB/sec, and the H200 is 4.8TB/s(!). Huge capacity with middling bandwidth is in vogue because it's a good fit for AI inference, but AI training and most other applications of GPUs need as much bandwidth as they can get.
Well, if you have 16 M3-equivalent chips you can multiply the bandwidth by 16, right? Also, as I understand it, ML is basically matrix multiplication, which does O(N³) operations on O(N²) numbers, so bandwidth might not be as important as the number of ALUs.
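A rough sketch of that arithmetic-intensity argument, with assumed ballpark GPU figures (not this card's specs):

    # FLOPs per byte for an N x N matmul vs. what a GPU needs to stay compute-bound.
    N = 8192
    flops = 2 * N**3                # multiply-accumulate operations
    bytes_moved = 3 * N * N * 2     # read A and B, write C, fp16, ignoring cache reuse
    print(flops / bytes_moved)      # ~2700 FLOPs per byte of traffic

    peak_flops = 200e12             # assumed dense fp16 throughput, FLOP/s
    bandwidth = 1.8e12              # assumed memory bandwidth, bytes/s
    print(peak_flops / bandwidth)   # ~111 FLOPs per byte needed to keep the ALUs fed

So for big enough matrices the ALUs, not the memory bus, become the limit; small batches tip the balance back toward bandwidth.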
This doesn’t really answer the parent’s question.
That the memory is on a PCB close to the CPU/GPU certainly helps with signal integrity, but that's not the key factor here. The Apple platform has high memory bandwidth compared to x86 PCs because the CPU has a wide memory bus. You can get similar memory bandwidth out of high-end Epyc and Xeon CPUs, which use standard DIMMs but with many more memory channels than a regular desktop computer.
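The back-of-the-envelope is just bus width times transfer rate; the configurations below are assumed, illustrative ones:

    # Peak bandwidth = (bus width in bytes) x (transfer rate).
    def peak_bw_gb_s(bus_bits, mega_transfers_per_s):
        return (bus_bits / 8) * mega_transfers_per_s / 1000   # GB/s

    print(peak_bw_gb_s(128, 5600))    # dual-channel desktop DDR5-5600: ~90 GB/s
    print(peak_bw_gb_s(768, 4800))    # 12-channel server DDR5-4800: ~460 GB/s
    print(peak_bw_gb_s(1024, 6400))   # 1024-bit LPDDR5-6400 (Ultra-class ballpark): ~819 GB/s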
It's actually not within the chip's package. It's soldered to the board. It's just regular, fairly high spec LPDDR5X IIRC, there are just a TON of memory channels.
It's not in the package? TIL... My misunderstanding seems to be common across the interwebs.
I remember Apple used to show slides depicting the M1 SoC as one unit containing a CPU, GPU, Neural Engine, cache, and DRAM all together. But slides shown at an Apple event definitely qualify for artistic license.
The memory is soldered on top of the package, here's an actual real life photo of the M1 package with two LPDDR packages sitting on it:
https://en.wikipedia.org/wiki/Apple_M1#/media/File:Mac_Mini_...
You could call that on-package, but it's not on-package in the same way that GPU HBM is, where the main die and the memory dies are packaged together on the same substrate. That's a much more difficult and expensive process; apparently packaging is the main bottleneck for H100/H200 production.
https://i.imgur.com/hFTPyjk.jpeg
For that H100 image, how do they get the top surface of all the chips on the same plane? Do they solder them "upside down", so the chips are sitting on a reference plane, and the PCB is allowed to "float"? Or is the soldering controlled enough that they can just heat and be done?
Nor is it a chiplet, in which case it would qualify as "in-SoC".
Wait till you learn that unified memory was in common use by PCs long before Apple "invented" it.
The statement is correct; it's not on the substrate like AMD's 3D V-Cache, and it doesn't use an interposer like HBM.
You can think of it as a small PCB with the CPU die and the memory soldered very close by (and, as mentioned, many more memory channels).
While there are definitely physical limits, the core limitation here is greed. They would sell fewer cards. Same reason their consumer cards are limited to ridiculously low amounts of VRAM (16GB on an RTX 5080, only 8 on the RTX 4060, etc.), so if you want to do any serious AI you have to buy their overpriced enterprise cards.
Not just fewer cards; by giving their cards less RAM, they are helping prop up cloud-based inference, which in turn generates revenue for their most expensive line. It's the reason you used to be able to get 12GB of RAM on a 3060, and now you have to move up three increasingly expensive models to get the same, restricted only by drivers and not capability. They made it clear this was all intentional because they didn't want consumer hardware in data centers, as it costs them profit.
Bandwidth, and GPU real estate in terms of board area. The biggest GDDR7 chips are 3GB with a 32-bit interface, and this card has a 512-bit-wide bus. And even this thing is going to moonlight as a space heater.
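Which is roughly where the 96GB figure falls out; a back-of-the-envelope sketch, assuming the usual clamshell layout with two chips per 32-bit channel:

    # Capacity limit from bus width and die size (assumed clamshell layout).
    bus_width_bits = 512
    chip_width_bits = 32                               # one 32-bit channel per GDDR7 chip
    chip_capacity_gb = 3                               # largest GDDR7 die shipping today
    channels = bus_width_bits // chip_width_bits       # 16 channels
    clamshell = 2                                      # chips on both sides of the board
    print(channels * chip_capacity_gb * clamshell)     # 96 GB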
Why can Apple pack something like 64 GB on the CPU chip when NVIDIA cannot?
Apple uses LPDDR4 which comes in densities of up to 16Gb AFAICT. So they can have ~5X as much memory using the same bus width.
Apple uses LPDDR5X, not LPDDR4, iirc
Of course it's ddr5 not 4.
From a PCB perspective, the LPDDR5(X) interface is quite different from regular DDR5. Same with DDR4 and LPDDR4. Source: I have designed a few boards with different memory interfaces.
LPDDR dies can stack but GDDR cannot?
Wiring, too; GDDR7 chips look like they have a lot of pads, and you need matching pads on the graphics chip for them as well.
What's the use case for 512 GB of memory that you cannot achieve through multiple GPUs? Maybe you can make it a bit cheaper since you don't need multiple chips, but I'd say that's only a maybe, because the chip is not the costliest part for Nvidia to manufacture; the memory is[1].
[1]: https://www.nextplatform.com/2024/02/27/he-who-can-pay-top-d...
One card with twice the memory would let you run the model on half as many cards, at half the speed.
Planned obsolescence, got to sell 7xxx cards somehow.
What a beast. Like past generations there is a variant with a blower-style cooler which is limited to 300W, but now they're also doing variants modelled after their gaming cards but with even higher TDPs. Triple the memory of the gaming flagship too, you used to only get double.
This looks like the same TDP as the gaming flagship of the same chip (5090, also GB202 based).
The 5090 reference design is 575W. Not a huge difference, but the workstation card is slightly more.
Will be fun to see if it has all the ROPs it should[1], or if NVIDIA has really gone all in on making this the worst product launch ever...
[1]: https://www.techpowerup.com/332884/nvidia-geforce-rtx-50-car...
I just hope whatever organ I’m selling to afford this GPU is one of the paired ones.
Too bad SLI isn't a thing anymore.
As far as I can tell, it uses the same infamous power connector as 5090. I wonder if there are any differences there, maybe some additional balancing/safety features?
Is this a real 600W, or 750W in burst mode? I'm too accustomed to the TDP lies from CPUs.
The T is for thermal; it's mostly about needing to dissipate that much heat on average, not peak (or transient) power.
> 600W
is this a sign that semiconductor scaling is completely dead now?
Not yet, but the NVIDIA RTX 5000 (Blackwell) series does not use a newer manufacturing process, so its energy efficiency is slightly worse than that of the RTX 4000 SUPER (Ada) series, which remain the most efficient GPUs (e.g. the RTX 4080 SUPER).
The RTX 5000 (Blackwell) series has increased performance only by using bigger chips and higher power consumption. The RTX Pro Blackwell series uses the same chips as the consumer series.
600W of power? Would you sell a car with "35l/100km of power"?
Because the people buying these things have power supplies that are also rated by their wattage.
If someone has a 750W PSU they'll need to replace it with something higher, e.g., 1000W, to run this card and the rest of the computer components.
Having power ratings in these simple terms is helpful.
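A rough sizing sketch, with the non-GPU figures being assumptions rather than measurements:

    # Ballpark PSU sizing for a system built around this card.
    gpu_w = 600       # card TDP from the article
    cpu_w = 200       # assumed high-end desktop CPU under load
    rest_w = 100      # assumed board, RAM, drives, fans
    margin = 1.2      # headroom for transients and PSU efficiency sweet spot
    print((gpu_w + cpu_w + rest_w) * margin)   # 1080 W, so a ~1000-1200 W unit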
what on earth is wrong with watts as a unit of power?
Would you rather have it say 50A @ 12V? The headline is written incorrectly (because the article writer didn’t write it), but the article says ‘needs 600W of power’.
Headlines are misleading, film at 11.