
declaredapple · on Jan 19, 2024

> Yes, but you want alignment even if you don't want censorship.

Instruction tuning helps a lot with this, but what a lot of people mean is the refusal to do things. You get to choose how "aligned" it is: for some use cases, like talking to customers, you definitely want something very "safe" (it won't start using slurs or something terrible). But for direct usage you generally never want it to refuse to do anything.

Check out Anthropic on the extreme side: every iteration of Claude has gotten worse on the Chatbot Arena (Elo ratings based on humans blindly comparing responses).

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...