Didn’t we just see big pretraining gains from Google and likely Anthropic?
I like Dario’s view on this: we’ve seen this story before with deep learning, and then we progressively got better regularization, initialization, and activations.
I’m sure this will follow suit; the graph of improvement is still linear, up and to the right.