
What of it?

For me too, it was around that time last year, with GPT-5, Claude Sonnet 4.5 and then Gemini 3, that I started feeling that these models are clearly becoming great at reasoning. I'm not at all opposed to saying that they are around PhD-level in at least some domains.




I think there's a lot of difference between sounding like someone and being someone. The models are excellent at pretending indeed.

I don't think that sama was arguing that ChatGPT actually passed a PhD thesis defense. But arguably, it could make for an interesting benchmark.

Please do not get swayed by, or defend, the words vomited by a snake oil salesman.

Also, what benchmark? How will you design it?


Exactly. This is what the whole RL thing is optimizing for, even if that is not the intent.


