It feels like just yesterday that Chips and Cheese started publishing (I checked, and they started up in 2020, so not that long ago after all!), and now they've really become a mainstay in my silicon newsletter stack, up there with Semianalysis/Semiengineering/etc.
> Intel uses a software-managed scoreboard to handle dependencies for long latency instructions.
Interesting! I've seen this in compute accelerators before, but both AMD and Nvidia manage their long-latency dependency tracking in hardware, so it's notable to see a major GPU vendor take this approach. Looking into it more, the interface their `send`/`sendc` instructions expose seems to be basically the same one the PE would use to talk to the NOC: rather than having some high-level load instruction that hardware then translates into "send a read request to the dcache, and when it comes back, increment this scoreboard slot", the ISA lets (or makes) the compiler state all of that directly. That's good for fine-grained control of the hardware, bad if the compiler can't make inferences the hardware could (e.g. based on runtime data), but good again if you really want to minimize area and so wouldn't have that fancy logic in the pipeline anyway.
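To make the idea concrete, here's a toy model of what a compiler-side scoreboard pass might look like, as I understand the general technique: the compiler tracks which in-flight long-latency op will write which register, tags each such op with a scoreboard slot (a "token"), and inserts an explicit wait on that slot before any later instruction that reads or overwrites the register. If it runs out of slots, it waits on the oldest one to free it up. Everything here (the `Instr`/`assign_scoreboard` names, the 16-slot default, the opcodes) is made up for illustration; it's not Intel's actual encoding or compiler pass, and it only handles RAW and WAW hazards.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Instr:
    op: str
    dst: str
    srcs: list[str] = field(default_factory=list)
    long_latency: bool = False  # e.g. a memory `send` with unknown completion time


@dataclass
class Annotated:
    instr: Instr
    token: int | None = None                        # slot this op signals on completion
    waits: list[int] = field(default_factory=list)  # slots to wait on before issuing


def assign_scoreboard(program: list[Instr], num_slots: int = 16) -> list[Annotated]:
    """Toy compiler pass: assign scoreboard slots and insert waits (RAW/WAW only)."""
    pending: dict[str, int] = {}  # register -> slot of its in-flight producer
    in_flight: list[int] = []     # occupied slots, oldest first
    free = list(range(num_slots))

    def release(slot: int) -> None:
        # Once we've waited on a slot, its producer has retired and the slot is reusable.
        for reg, s in list(pending.items()):
            if s == slot:
                del pending[reg]
        in_flight.remove(slot)
        free.append(slot)

    out = []
    for ins in program:
        # Wait if we read (RAW) or overwrite (WAW) a register still being produced.
        waits = {pending[r] for r in [*ins.srcs, ins.dst] if r in pending}
        for s in waits:
            release(s)

        token = None
        if ins.long_latency:
            if not free:
                # Out of slots: stall on the oldest outstanding op to reclaim its slot.
                oldest = in_flight[0]
                waits.add(oldest)
                release(oldest)
            token = free.pop(0)
            pending[ins.dst] = token
            in_flight.append(token)

        out.append(Annotated(ins, token, sorted(waits)))
    return out


if __name__ == "__main__":
    prog = [
        Instr("send", "r1", ["a0"], long_latency=True),  # load into r1
        Instr("send", "r2", ["a1"], long_latency=True),  # load into r2
        Instr("add", "r3", ["r4", "r5"]),                # independent: no wait needed
        Instr("mul", "r6", ["r1", "r3"]),                # consumes r1: wait on its slot
        Instr("add", "r7", ["r2", "r6"]),                # consumes r2: wait on its slot
    ]
    for a in assign_scoreboard(prog):
        print(a.instr.op, a.instr.dst, "token:", a.token, "waits:", a.waits)
```

The point of the exercise: all the "when can this instruction issue?" bookkeeping lives in the compiler's output, so the hardware only has to honor explicit waits on slot numbers instead of matching register names against in-flight requests itself. The flip side, as above, is that the compiler has to be conservative wherever it can't know something the hardware would see at runtime.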
I'm also hoping that Intel puts out an Arc A770 class upgrade in their B-series line-up.
My workstation and my kids' playroom gaming computer both have A770s, and they've been really amazing for the price I paid ($269 and $190). My triple-screen racing sim has an RX 7900 GRE ($499), and surprisingly, the GRE has been the least stable of the three (e.g. driver timeouts, crashes).
Granted, I came into the new Intel GPU game after they'd gone through 2 solid years of driver quality hell, but I've been really pleased with Intel's uncharacteristic focus and pace of improvement in both the hardware and especially the software. I really hope they keep it up.
I love these breakdown writeups so much.