Could someone enlighten me and explain *why* Heroku sends TERM signals to the ru...

ithkuil · on June 27, 2014

I cannot say why Heroku sends TERM signals, but here's why I would do it if I were designing a PaaS:

* You want to instill the right culture in your customers' code: everything can fail, often, and they have to build software with that in mind. Stuff will always fail, so your customers have to handle failures anyway; if they don't, they will blame you.

* It's easier and cheaper to manage your fleet if all you have to care about is that ninetysomething percent of your hosts are healthy.

* You can also detect broken machines more easily if you can remove them from the cluster as soon as you suspect them, knowing that no customers will be hurt. "Broken" can mean anything: some instances will just run slowly, have bad IO, a slow network, whatever. You don't care, because you know you can just kill and respawn the containers as long as the total number of dynos meets the requirements and you don't exceed some predetermined rate of churn that would affect the customer.

* You need to perform maintenance on the machines where you run your customers' containers/VMs. You can implement live migration, but it has a cost (implementation, management, storage, etc.), and that was even more true a few years ago.

* You need to perform maintenance within the customers' containers themselves; live migration won't help you with that. You don't want to bother your customers with maintenance windows.

* It's easy and cheap to "move" containers around across machines in order to balance load or spread an application across power domains.

ominous_prime · on June 27, 2014

TERM is the canonical way to signal a process to exit gracefully. The management application has no way to determine what other special handling your app may require.
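
To make "exit gracefully" concrete, here is a minimal Python sketch of a worker that treats TERM as a request to stop between jobs. The `Worker` class and its method names are made up for illustration; only `signal.signal` and `signal.SIGTERM` come from the standard library.

```python
import signal


class Worker:
    """Hypothetical worker: runs jobs one at a time, stops between jobs on TERM."""

    def __init__(self):
        self.stop_requested = False
        self.completed = []

    def handle_term(self, signum, frame):
        # Only record the request; the run loop decides when it is safe to stop.
        self.stop_requested = True

    def install(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_term)

    def run(self, jobs):
        """Run (name, callable) pairs in order; return the names left undone."""
        pending = list(jobs)
        while pending and not self.stop_requested:
            name, job = pending.pop(0)
            job()
            self.completed.append(name)
        return [name for name, _ in pending]
```

A real worker would presumably call `install()` once at startup and hand the names returned by `run()` back to its queue. Note that the job in flight still has to finish within whatever grace period the platform allows.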

davetron5000 · on June 27, 2014

Heroku sends them as a normal course of operations. Dynos get cycled daily. Why…not sure, but it happens, and they document it well. It's likely impossible to completely insulate against it, but if you design your jobs to be idempotent and safely retriable, rather than trying to trap their signals, your jobs will be a lot more bullet-proof.
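
A rough Python sketch of what "idempotent and safely retriable" can look like, under some assumptions: the dict `ledger` stands in for a durable store, `Interrupted` stands in for the dyno being killed mid-job, and every function name here is hypothetical.

```python
class Interrupted(Exception):
    """Stands in for the worker being killed partway through a job."""


def apply_credit(ledger, job_id, amount):
    """Idempotent write: keyed by job_id, so a rerun changes nothing."""
    if job_id in ledger:
        return False
    ledger[job_id] = amount
    return True


def run_job(ledger, job_id, amount, die_before_ack=False):
    apply_credit(ledger, job_id, amount)
    if die_before_ack:
        # The effect was written, but the queue never heard about it.
        raise Interrupted(job_id)
    return "acked"


def run_with_retry(ledger, job_id, amount, failures=0, max_attempts=3):
    """Retry the whole job; `failures` simulates that many kills in a row."""
    for attempt in range(max_attempts):
        try:
            return run_job(ledger, job_id, amount, die_before_ack=attempt < failures)
        except Interrupted:
            continue
    raise RuntimeError("gave up on " + job_id)
```

Because the write is keyed by the job ID, a job that dies after writing but before acknowledging can simply be run again, and the credit is still applied exactly once.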

zo1 · on June 28, 2014

"Heroku sends them as a normal course of operations." Wow, I didn't actually know that. Makes me glad I didn't pick Heroku recently for one of my mini-projects. It requires long-running processes.

I guess Heroku "dynos" are more suited for "worker" type jobs, then. In which case, sending the TERM signal to all processes isn't necessarily a really bad way of notifying the worker to shut down. Although, we are in 2014, and I don't see why they can't easily come up with a more robust solution, even if it's in the form of a "shut-down" process, or giving the worker more than 10s to shut down.