
Could someone enlighten me and explain why Heroku sends TERM signals to the running processes? Doesn't sound very healthy, that's for sure. Nor something I'd personally tolerate from someone I'm purchasing a "cloud hosting" service from.

Is it simply the case that this is how Heroku responds to being told to shut down an instance? If so, why isn't the managing app that sends the shutdown call to the instance also handling the graceful shutdown of the processes on that instance?



I cannot say why Heroku sends TERM signals, but here's why I would do it if I were designing a PaaS:

* You want to instill the right culture in your customers' code: everything can fail, often, and they have to build software with that in mind. Stuff will always fail, and your customers will have to handle failures anyway; otherwise they will blame you.

* It's easier and cheaper to manage your fleet if all you have to care about is that ninety-something percent of your hosts are healthy.

* You can also detect broken machines more easily if you can remove them from the cluster as soon as you suspect them, knowing that no customers will be hurt. "Broken" can mean anything: some instances will just run slowly, have bad IO, a slow network, whatever. You don't care, because you know you can kill and respawn the containers as long as the total number of dynos meets the requirements and you don't exceed some predetermined rate of churn that would affect the customer.

* You need to perform maintenance on the machines where you run your customers' containers/VMs. You can implement live migration, but it has a cost (implementation, management, storage, etc.), and that was even more true a few years ago.

* You need to perform maintenance within the customers' containers themselves; live migration won't help you with that. You don't want to bother your customers with maintenance windows.

* It's easy and cheap to "move" containers across machines in order to balance load or spread an application across power domains.


TERM is the canonical way to signal a process to exit gracefully. The management application has no way to determine what other special handling your app may require.
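
To make the "exit gracefully" part concrete, here is a minimal Python sketch of a worker that treats TERM as a request to finish the job in hand and then stop. The `Worker` class and its method names are made up for illustration; nothing here is Heroku-specific, and a real worker would also need to hand unfinished jobs back to its queue.

```python
import signal


class Worker:
    """Hypothetical worker loop that stops cleanly when SIGTERM arrives."""

    def __init__(self, jobs):
        self.pending = list(jobs)      # callables still waiting to run
        self.completed = 0
        self.stop_requested = False

    def handle_term(self, signum, frame):
        # Only set a flag here; the loop checks it between jobs,
        # so the job that is currently running gets to finish.
        self.stop_requested = True

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.handle_term)

    def run(self):
        while self.pending and not self.stop_requested:
            job = self.pending.pop(0)
            job()
            self.completed += 1
        # Whatever is left in self.pending was never started.
        return self.completed
```

Typical use would be `w = Worker(jobs); w.install(); w.run()`. The handler does as little as possible on purpose: cleanup happens in the normal control flow after the loop sees the flag.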


Heroku sends them as a normal course of operations. Dynos get cycled daily. Why, I'm not sure, but it happens, and they document it well. It's likely impossible to completely insulate against it, but if you design your jobs to be idempotent and safely retriable, rather than trying to trap their signals, your jobs will be a lot more bullet-proof.
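
A rough Python sketch of what "idempotent and safely retriable" can look like: key each job by an ID and record completion only after the work succeeds, so a job killed partway through simply runs again on retry. `JobStore` and `run_idempotent` are hypothetical names, and the dict stands in for a durable store such as a database table.

```python
class JobStore:
    """In-memory stand-in for a durable record of finished jobs (assumption)."""

    def __init__(self):
        self.results = {}

    def run_idempotent(self, job_id, work):
        # Already finished on an earlier attempt: return the stored
        # result instead of doing the work a second time.
        if job_id in self.results:
            return self.results[job_id]
        # If the process dies or work() raises here, nothing is
        # recorded, so a later retry starts the job from scratch.
        result = work()
        self.results[job_id] = result
        return result
```

This only covers the bookkeeping. The work itself still has to be safe to repeat after a partial run, for example by writing to a temporary location and renaming at the end.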


"Heroku sends them as a normal course of operations." Wow, I didn't actually know that. Makes me glad I didn't pick Heroku recently for one of my mini-projects. It requires long-running processes.

I guess Heroku "dynos" are more suited to "worker" type jobs, then. In which case, sending the TERM signal to all processes isn't necessarily a really bad way of notifying the worker to shut down. Although, we are in 2014, and I don't see why they can't easily come up with a more robust solution, even if it's in the form of a "shut-down" process, or giving the worker more than 10s to shut down.
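
For what it's worth, the usual shape of this on the supervisor side is "TERM, wait for a grace period, then KILL". A small Python sketch of that pattern, purely as an illustration and not a claim about how Heroku implements it; `stop_gracefully` and `demo` are made-up names, and the grace period is just a parameter:

```python
import subprocess
import sys


def stop_gracefully(proc, grace_seconds=10.0):
    """Ask a child process to exit; force-kill it if it overstays the grace period."""
    proc.terminate()                      # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL on POSIX, cannot be trapped
        return proc.wait()


def demo():
    # Hypothetical child that would otherwise sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    return stop_gracefully(child, grace_seconds=10.0)
```

A longer grace period is just a bigger `grace_seconds`; the trade-off is that the platform has to wait that long before it can reclaim the machine.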



