Hacker News: KarraAI's comments

Been testing DeepSeek R1 for coding tasks, and it's really impressive. The model scores 96.3% on HumanEval, which is great, but what really stands out is its math performance (97.3% on MATH-500) and its graduate-level science reasoning (71.5% on GPQA). If you're working on algorithm-heavy tasks, this model could give you a solid edge.

On the downside, its token generation is slower than some other models' (37.2 tokens/sec), and its maximum output length is smaller (8K tokens), so it might not be the best choice for large-scale generation. But if you're focused on solving complex problems or optimizing code, DeepSeek R1 holds its own. Plus, it's very cost-effective compared to other models on the market.
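To put the throughput number in perspective, here is a quick back-of-the-envelope sketch. It assumes "8K" means 8,192 tokens and that decoding holds a constant 37.2 tokens/sec; real throughput will vary with provider, load and prompt length, so treat this as a rough estimate only.

```python
# Rough estimate of how long a full-length response takes at a fixed decode rate.
# Assumptions: "8K" = 8,192 tokens, and throughput stays constant at 37.2 tokens/sec.

TOKENS_PER_SEC = 37.2
MAX_OUTPUT_TOKENS = 8192


def generation_seconds(num_tokens: int, tokens_per_sec: float = TOKENS_PER_SEC) -> float:
    """Seconds needed to decode num_tokens at a constant throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


if __name__ == "__main__":
    secs = generation_seconds(MAX_OUTPUT_TOKENS)
    # prints "8192 tokens at 37.2 tok/s ~ 220 s (3.7 min)"
    print(f"{MAX_OUTPUT_TOKENS} tokens at {TOKENS_PER_SEC} tok/s ~ {secs:.0f} s ({secs / 60:.1f} min)")
```

So a single maxed-out response takes a few minutes, which adds up quickly if you're generating many long outputs.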


How do you ensure the student model learns robust generalizations rather than just surface-level mimicry?


No idea, as I don't work on that, but my guess is that the higher the 'n', the more closely model A approaches model B.
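A toy picture of that guess, assuming 'n' means the number of teacher outputs the student is fit on: the "teacher" below is a made-up fixed distribution over four tokens, and the "student" is just the empirical frequencies of n sampled teacher outputs. The names and numbers are hypothetical. This only shows the student's output distribution converging toward the teacher's as n grows; it says nothing about whether the student generalizes beyond mimicry.

```python
import random
from collections import Counter

# Hypothetical teacher: a fixed distribution over four next tokens.
TEACHER = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}


def fit_student(n: int, rng: random.Random) -> dict:
    """'Distill' by sampling n teacher outputs and using their empirical frequencies."""
    tokens = list(TEACHER)
    weights = list(TEACHER.values())
    counts = Counter(rng.choices(tokens, weights=weights, k=n))
    # Counter returns 0 for tokens that were never sampled.
    return {t: counts[t] / n for t in tokens}


def total_variation(p: dict, q: dict) -> float:
    """Total variation distance: 0 means identical distributions, 1 means disjoint."""
    return 0.5 * sum(abs(p[t] - q[t]) for t in p)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1_000, 10_000, 100_000):
        student = fit_student(n, rng)
        print(f"n={n:>7}: TV(student, teacher) = {total_variation(student, TEACHER):.4f}")
```

The distance shrinks roughly like 1/sqrt(n), which matches the intuition that more samples means closer imitation of model B's outputs.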

