AI will go the way of ASICs, just like bitcoin.

glitchc · on Sept 10, 2024

Indeed, you can strip a whole host of things out of the GPU (the framebuffer, the Z-buffer, the transform and lighting engine) and instead fill it with more CUDA cores, a higher-bandwidth memory controller with a wider bus, etc.

And, as it happens, that's exactly what NVidia's done with the H100: https://developer.nvidia.com/blog/nvidia-hopper-architecture...

It still needs to be programmable though. Can't get away from that.

kolinko · on Sept 10, 2024

You can get away from that if you constrain it to a specific type of model (say, attention-based).
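For reference, this is roughly the core computation a chip constrained to attention-based models would bake in: standard scaled dot-product attention, softmax(q·k/√d) used to weight the values. It is a minimal pure-Python sketch for a single query, with no multi-head split, masking, or learned projections; the function names are illustrative, not any vendor's API.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    width = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]

print(attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 4.0], [4.0, 8.0]]))
```

With two identical keys the weights are equal, so the output is simply the mean of the two value vectors.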

adastra22 · on Sept 10, 2024

You don’t need general programmability for AI inference.

glitchc · on Sept 10, 2024

The money's in the training, not the inference.

If you look at Apple and Google, they already have their own hardware for inference in their smartphones. They don't need NVidia for that.

mupuff1234 · on Sept 10, 2024

Apple and Google use TPUs for training.

https://www.cnbc.com/2024/07/29/apple-says-its-ai-models-wer...

glitchc · on Sept 10, 2024

Hmmm, that's worse for NVidia.

adastra22 · on Sept 11, 2024

NVIDIA owns the interconnects that are used for this training. I’m sure they have their own competing AI accelerator they are working on too.

adastra22 · on Sept 10, 2024

You don’t need programmability for AI training either.

Rinzler89 · on Sept 10, 2024

It's already there. Have you seen the six-figure AI chips that Nvidia is selling to data center customers? Those chips aren't GPUs; they can't draw a single triangle or map a single texture. They're AI accelerators all the way. Do people still think Nvidia is selling gaming GPUs for AI workloads like it's 2018?

Google, Meta, et al. are working on their own AI chips, but those chips will have to beat Nvidia's at Performance and TCO, and Nvidia shows no signs of slowing down to let competitors catch up.

kolinko · on Sept 10, 2024

The chips are optimised for matmuls, but not for the transformer architecture per se. With dedicated ASICs and weights hardcoded (or stored in SRAM), we could theoretically get one token per clock cycle, so millions or billions of tokens per second, not hundreds.
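The arithmetic behind "hundreds vs. billions" can be sketched as two back-of-envelope bounds. The first is the comment's idealized case: a fully pipelined chip emitting one token per cycle, so throughput is just the clock rate. The second is a common simplified estimate for batch-1 decoding on an accelerator that must stream all weights from off-chip memory for every token. All numbers below (1 GHz clock, 3 TB/s bandwidth, a 70B-parameter model at 2 bytes per weight) are hypothetical round figures, not specs of any real chip.

```python
def pipelined_tokens_per_second(tokens_per_cycle: float, clock_hz: float) -> float:
    """Idealized fully pipelined ASIC: throughput = tokens per cycle x clock rate."""
    return tokens_per_cycle * clock_hz

def bandwidth_bound_tokens_per_second(mem_bandwidth_bytes_per_s: float,
                                      weight_bytes: float) -> float:
    """Rough per-sequence upper bound for batch-1 decoding when each generated
    token must read every weight from off-chip memory.
    Ignores on-chip caching, KV-cache traffic, and batching."""
    return mem_bandwidth_bytes_per_s / weight_bytes

# Hypothetical inputs for illustration only.
print(pipelined_tokens_per_second(1, 1e9))               # one token per cycle at 1 GHz
print(bandwidth_bound_tokens_per_second(3e12, 70e9 * 2))  # 3 TB/s, 140 GB of weights
```

Batching lets a GPU reuse each weight read across many sequences, so aggregate throughput is far above the per-sequence bound; keeping the weights on-chip removes the bound altogether, which is the point being made above.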

Etched, for example, claims they have a chip in the works reaching 500k tok/s, which is still far from the theoretical max with current technology.

A similar scenario played out with Bitcoin mining's GPU/FPGA/ASIC progression: current ASICs are millions of times faster than GPUs.

throwthrowuknow · on Sept 10, 2024

That’s fine if you never need to improve the model, which is valid in some use cases, but for chat style interaction or even code generation you’ll regularly have to update the weights.

kolinko · on Sept 10, 2024

Depends on the chip architecture: Etched claims 0.5M tok/s with weights that can be updated. The main constraint is on the model architecture, which needs to be a specific transformer-based model. But they claim the chip can run both Mixtral and Llama, so the constraints are not too stiff.

matwood · on Sept 10, 2024

> beat Nvidia's at Performance and TCO

TCO, yes. Raw performance, not necessarily. TCO will attack NVDA's margins. When Meta last wrote about their cluster, it was presented as power equivalent to X NVDA chips. They are already bringing their own chips into the mix.
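To make "TCO, yes; raw performance, not necessarily" concrete, here is a deliberately simplified sketch: TCO as purchase price plus electricity over the service life, and the metric that matters as work delivered per TCO dollar. The model ignores cooling, networking, floor space, staff, and financing, and every figure in it is hypothetical and illustrative, not a real price, power rating, or benchmark.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24

@dataclass
class Accelerator:
    name: str
    price_usd: float   # up-front cost (hypothetical)
    power_kw: float    # average power draw (hypothetical)
    throughput: float  # work units per second, e.g. tokens/s (hypothetical)

def tco_usd(acc: Accelerator, years: float, usd_per_kwh: float) -> float:
    """Purchase price plus electricity over the service life (simplified)."""
    energy_kwh = acc.power_kw * years * HOURS_PER_YEAR
    return acc.price_usd + energy_kwh * usd_per_kwh

def work_per_dollar(acc: Accelerator, years: float, usd_per_kwh: float) -> float:
    """Total work delivered over the service life divided by TCO."""
    total_work = acc.throughput * years * HOURS_PER_YEAR * 3600
    return total_work / tco_usd(acc, years, usd_per_kwh)

# Hypothetical figures only.
merchant_gpu = Accelerator("merchant GPU", price_usd=30_000, power_kw=0.7, throughput=1_000)
in_house_chip = Accelerator("in-house accelerator", price_usd=10_000, power_kw=0.4, throughput=600)

for acc in (merchant_gpu, in_house_chip):
    print(f"{acc.name}: {work_per_dollar(acc, years=4, usd_per_kwh=0.10):,.0f} units per dollar")
```

Under these made-up inputs the in-house chip is 40% slower yet delivers more work per dollar, which is the mechanism by which a cheaper chip can squeeze a vendor's margins without winning on raw performance.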

posix_compliant · on Sept 10, 2024

With Bitcoin I feel like it’s different, since the hashing algorithm would only ever change during a fork. This is rare in that it only ever happens every few years.

With AI, we’re constantly training different models, which can’t be trained using ASICs. If we ever get to the point where we no longer need to train new models, then yeah, it will go the way of bitcoin.

TacticalCoder · on Sept 10, 2024

> With Bitcoin I feel like it’s different, since the hashing algorithm would only ever change during a fork. This is rare in that it only ever happens every few years.

Wait what!? Did the Bitcoin hashing algorithm ever change?

adastra22 · on Sept 10, 2024

It’s never happened for Bitcoin.

qqqult · on Sept 10, 2024

kkielhofner · on Sept 10, 2024

> just like bitcoin

The problem with this comparison is Bitcoin has basically just been SHA256 for 15 years and likely will continue to be for some time.
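Why a fixed algorithm suits an ASIC so well: Bitcoin's proof of work is double SHA-256 over an 80-byte block header, checked against a target encoded in the header's compact "bits" field, and that has never changed. A minimal sketch using Python's standard `hashlib` and `struct`, applied to the well-known genesis block header; the helper names are illustrative.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def block_header(version: int, prev_hash_hex: str, merkle_root_hex: str,
                 timestamp: int, bits: int, nonce: int) -> bytes:
    """Serialize the 80-byte header. Integers are little-endian, and hashes are
    stored in the reverse byte order of their usual hex display."""
    return (struct.pack("<I", version)
            + bytes.fromhex(prev_hash_hex)[::-1]
            + bytes.fromhex(merkle_root_hex)[::-1]
            + struct.pack("<III", timestamp, bits, nonce))

def target_from_bits(bits: int) -> int:
    """Decode the compact target: mantissa * 256^(exponent - 3)."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    return mantissa << (8 * (exponent - 3))

def meets_target(header: bytes) -> bool:
    bits = struct.unpack("<I", header[72:76])[0]
    # The hash is compared as a little-endian 256-bit integer.
    return int.from_bytes(double_sha256(header), "little") <= target_from_bits(bits)

genesis = block_header(
    1, "00" * 32,
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    1231006505, 0x1D00FFFF, 2083236893)
print(double_sha256(genesis)[::-1].hex(), meets_target(genesis))
```

This should reproduce the genesis block hash as it is usually displayed. Mining is just retrying this exact function with different nonces, which is why it could be frozen into silicon.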

Transformers have been mostly dominant for at least several years, but there are still other archs (CNN, RNN, etc.) in various use cases, and we're already seeing nearly fundamental changes in Transformers and "emerging" approaches like Mamba, RWKV, hybrids, etc. Transformers have shown remarkable versatility and adaptability (that's their whole thing), but the architecture is already creaking and showing its age.
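One reason these alternatives matter for hardware: during decoding, a transformer keeps a key/value cache that grows with every token, while recurrent-style models carry a fixed-size state. The toy sketch below shows only that difference in state shape. The recurrence is a hypothetical diagonal linear update chosen for illustration; the real update rules in Mamba and RWKV are more elaborate (input-dependent or gated).

```python
from typing import List

Vector = List[float]

def attention_cache_after(tokens: List[Vector]) -> List[Vector]:
    """Toy stand-in for a transformer's KV cache during decoding:
    every past token's vector is retained, so state grows with sequence length."""
    cache: List[Vector] = []
    for x in tokens:
        cache.append(list(x))
    return cache

def recurrent_state_after(tokens: List[Vector], decay: float = 0.9) -> Vector:
    """Toy diagonal linear recurrence h_t = decay * h_{t-1} + x_t.
    The state stays the same size no matter how many tokens are processed."""
    h = [0.0] * len(tokens[0])
    for x in tokens:
        h = [decay * h_i + x_i for h_i, x_i in zip(h, x)]
    return h

sequence = [[1.0, 0.0]] * 1000
print(len(attention_cache_after(sequence)))  # one entry per token seen
print(len(recurrent_state_after(sequence)))  # fixed at the model width
```

Memory capacity, bandwidth, and dataflow requirements differ sharply between the two patterns, so silicon tuned tightly for one may map poorly onto the other.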

Startups building Transformer-specific silicon are playing a very risky game that is already somewhat problematic now and almost certainly won't end well.

AI is much newer, much more vast, and moving much more quickly. The ASIC cycle (design, tape-out, manufacturing, software ecosystem, actually getting to market, etc.) is fundamentally too long, and I suspect even the Transformer-specific silicon we see now will be viewed as a major blunder in the relatively near future:

"Oh yeah, remember those graveyard companies that did transformer silicon back in the first AI hype round?"

I cannot see how anything other than GPGPU, TPU, NPU, etc. (or similar "generic" approaches) will have legs.

brrrrrm · on Sept 10, 2024

It kinda already has, with fixed matrix multiplication units. But beyond that, no chance. Bitcoin is an unchanging hash algo, not evolving software.
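The "fixed matrix multiplication units" point can be sketched like this: the hardware primitive only multiplies small fixed-size tiles, and software builds arbitrary matmuls on top by padding and tiling. `TILE` and `tile_mma` are hypothetical stand-ins, not a real tensor-core API; actual tile shapes vary by hardware generation and data type.

```python
from typing import List

Matrix = List[List[float]]
TILE = 4  # hypothetical fixed tile size of the "hardware" unit

def tile_mma(a: Matrix, b: Matrix, c: Matrix) -> None:
    """Fixed-function primitive: C += A @ B for TILE x TILE blocks only."""
    for i in range(TILE):
        for j in range(TILE):
            acc = c[i][j]
            for k in range(TILE):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc

def _round_up(x: int) -> int:
    return (x + TILE - 1) // TILE * TILE

def _pad(m: Matrix, rows: int, cols: int) -> Matrix:
    out = [[0.0] * cols for _ in range(rows)]
    for i, row in enumerate(m):
        for j, val in enumerate(row):
            out[i][j] = float(val)
    return out

def _tile(m: Matrix, r: int, c: int) -> Matrix:
    # Slicing copies, so the caller must write results back explicitly.
    return [row[c:c + TILE] for row in m[r:r + TILE]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """General matmul built only from the fixed-size tile primitive."""
    n, inner, p = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    N, K, P = _round_up(n), _round_up(inner), _round_up(p)
    A, B = _pad(a, N, K), _pad(b, K, P)
    C = [[0.0] * P for _ in range(N)]
    for r in range(0, N, TILE):
        for c in range(0, P, TILE):
            c_tile = _tile(C, r, c)
            for k in range(0, K, TILE):
                tile_mma(_tile(A, r, k), _tile(B, k, c), c_tile)
            for i in range(TILE):
                C[r + i][c:c + TILE] = c_tile[i]
    # Strip the zero padding.
    return [row[:p] for row in C[:n]]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The fixed part is tiny and fast; everything model-specific stays in software, which is roughly the split the comment is describing.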

vidarh · on Sept 10, 2024

Groq is one example (NOT Musk's Grok), though currently focused only on inference, I think.

danielmarkbruce · on Sept 10, 2024

An H100 is already close to an ASIC. "GPU" is just a path-dependent historical name.