Hacker News

Yes, but you want alignment even if you don't want censorship. A very intelligent but unaligned model will be prone to doing useless things like "auto-complete" your question into a more elaborate version or responding with "just google it dumbass" and other forms of internet vitriol.


> Yes, but you want alignment even if you don't want censorship.

Instruction tuning helps a lot with this, but what a lot of people mean by "alignment" is the refusal to do things. You get to choose how "aligned" it is: for some use cases, like talking to customers, you definitely want something very "safe" (it won't start using slurs or something terrible). But for direct usage you generally never want it to refuse to do anything.

Check out Anthropic on the extreme side: every iteration of Claude has gotten worse on the Chatbot Arena (Elo ratings based on humans blindly comparing responses).

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...



