Windows Update. To reboot a server, you need to take it out of production. With a TTL of 5 minutes, it can take an hour for (nearly) all users to stop using that server.
If things behaved nicely, yes. There's all sorts of weird DNS caching behaviour out there. It's not unusual to find folks with DNS servers / clients that cache records for 1 hour+, and then of course there are people running super old versions of Java that used to cache DNS forever by default (before JDK 6). There's a very clear set of users that seem to cache for 10-15 minutes, regardless of any DNS TTL.
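For the Java case specifically, the JVM's cache lifetime can be capped through the `networkaddress.cache.ttl` security property. Here's a minimal sketch, assuming you control the client JVM and can run this before its first DNS lookup; the class name, helper method, and the 60 / 10 second values are made up for illustration. Note this is a fixed cache lifetime, not the record's actual DNS TTL.

```java
import java.security.Security;

public class DnsCacheConfig {
    // Hypothetical helper: sets how long the JVM caches DNS lookups, in seconds.
    // Should be called before the first InetAddress lookup, since the JVM may
    // read the cache policy only once.
    public static void configure(int positiveTtlSeconds, int negativeTtlSeconds) {
        // Lifetime of successful lookups.
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        // Lifetime of failed lookups.
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }

    public static void main(String[] args) {
        // Illustrative values only: 60 s for successes, 10 s for failures.
        configure(60, 10);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

The same properties can also be set in the JVM's `java.security` file, which avoids touching application code.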
Sure. My general approach is to use lower TTL values (~5 minutes) and accept that if people do dumb things, they just have to put up with things randomly breaking unexpectedly.