Hey, I've always wondered: why did you store the DB data on Docker's own filesystem (the container's writable layer) instead of an external volume? That would bypass all the storage-driver stability problems, make the data trivially recoverable, reduce CPU usage, etc...
I know there are posts from 2016 explaining how to do this, but the article made no mention of it?
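For concreteness, this is roughly the setup I mean: a named volume so the data lives outside the container's copy-on-write layer (volume/container names and the image tag here are just illustrative):

```sh
# Named volume: data lives outside the container's copy-on-write layer
docker volume create pgdata

docker run -d --name pg \
  -e POSTGRES_PASSWORD=example \
  -v pgdata:/var/lib/postgresql/data \
  postgres:13

# The volume survives container removal; a fresh container can reattach
docker rm -f pg
docker run -d --name pg \
  -e POSTGRES_PASSWORD=example \
  -v pgdata:/var/lib/postgresql/data \
  postgres:13
```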
That seems to be a common misconception. I don't know why anybody thought I wasn't aware of external volumes; I was.
It made little difference: the entire ecosystem was highly unstable. Containers could still fail, the Docker daemon could hang, and the host could kernel panic at any minute. For databases, that meant downtime and potential data corruption.
Besides, there is a major use case for running temporary databases in CI testing. I remember a lot of issues when running performance tests or seeding the database with initial data, basically anything that is performance-intensive. I think the unstable filesystem played a role, but it was far from the only root cause.
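For what it's worth, the kind of ephemeral CI database I mean looks roughly like this (container name, password, and image tag are illustrative); putting the data dir on tmpfs keeps the storage driver out of the picture entirely:

```sh
# Throwaway DB for a CI run: data on tmpfs, container auto-removed on stop
docker run --rm -d --name ci-pg \
  -e POSTGRES_PASSWORD=ci \
  --tmpfs /var/lib/postgresql/data \
  -p 5432:5432 \
  postgres:13

# ...run the test suite against localhost:5432...

docker stop ci-pg   # --rm removes the container afterwards
```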
Honestly, it's been 6 years now and I've never seen a company running any critical databases inside Docker (though I have seen it for testing). I've known a lot of companies that said they did, or that had plans to move existing databases, but it never materialized. At the end of the day there was no sysadmin/DBA/devops engineer who would do it; they understood it would come back to bite them (many had enough trouble just running ephemeral web services). Maybe it's the hardest part to grasp, but databases really are a different mindset from web development: you cannot risk losing customer data, since that would be an extinction-level event for your job and for the company.
We run almost all of our DBs - pg, CockroachDB, ClickHouse, and etcd/Kafka/Redis (if you can consider those databases) - inside docker/cri-o inside k8s. In production, under high load. Works really well. We've had more crashes of the DB itself than anything container/node related.
If you mean using Docker containers, well, those are pretty stable. There are hundreds of companies running ClickHouse on Kubernetes, which deploys using Docker containers. Some of them run very large K8s clusters. We've seen very few problems.
The one issue I've seen specific to Docker is that you can run into configuration errors that keep containers from coming up. That happens occasionally and it can be tricky to debug.
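When it happens, the usual starting points are the exit code, the logs, and (on Kubernetes) the pod events; the container/pod names below are placeholders:

```sh
# Docker: find the exit code and logs of a container that won't start
docker ps -a --filter name=my-db
docker inspect --format '{{.State.ExitCode}}' my-db
docker logs my-db

# Kubernetes: pod events, plus logs from the previous (crashed) attempt
kubectl describe pod my-db-0
kubectl logs my-db-0 --previous
```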
Disclaimer: My company Altinity wrote the ClickHouse Kubernetes Operator, which is also a Docker container.
The article is 5 years old; your operator is 3 years old. 5+ years ago Docker was different. I saw stability issues too, but they became less and less frequent over time until everything ran without any problems. In the early days, sometimes even a host reboot was needed because the kernel was misbehaving.
> Honestly it's been 6 years now and I've never seen a company running any critical databases inside docker (but I've seen it for testing)
Oh. Maybe I misread: the above point seemed to be referring to the present, which is what I was addressing. 6 years ago is a different matter entirely. Heck, operators didn't exist back then either. [1]