This was a really interesting paper, but there's one big thing they didn't try: inference-time temperature changes based on the fork/lock distinction.
Maybe I'll try that myself, because it feels like it could be a great source of improvements. It would be really useful to see adaptive per-token sampling as an additional decode-only baseline.
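To make the idea concrete, here's a minimal sketch of what I mean by decode-only adaptive temperature (this is my own illustration, not from the paper). I'm assuming the fork/lock signal can be approximated by the entropy of the model's next-token distribution; the threshold and the two temperatures are made-up values:

```python
import numpy as np

def sample_adaptive(logits, fork_temp=1.0, lock_temp=0.3,
                    entropy_thresh=1.0, rng=None):
    """Sample a token id, using a higher temperature at high-entropy
    ("fork") steps and a lower one at low-entropy ("lock") steps.
    Returns (token_id, temperature_used)."""
    rng = rng or np.random.default_rng(0)
    # Softmax at T=1 to estimate this step's entropy.
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    # Fork/lock decision via a simple (assumed) entropy threshold.
    temp = fork_temp if entropy > entropy_thresh else lock_temp
    # Re-softmax at the chosen temperature and sample.
    zt = (logits / temp) - (logits / temp).max()
    pt = np.exp(zt) / np.exp(zt).sum()
    return int(rng.choice(len(logits), p=pt)), temp

# A sharply peaked ("lock"-like) step gets the low temperature.
tok, temp = sample_adaptive(np.array([8.0, 0.1, 0.05, 0.0]))
```

The point is that this needs no retraining at all, just a per-step branch in the decode loop, which is why it seems like such a cheap baseline to add.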
Is this some kind of calibration, then? I'd expect the probabilities to adjust automatically during training, so that in "lock" mode, for example, syntax-breaking tokens end up with such low probability that they wouldn't be picked even at a higher temperature.