We used Zalando’s Postgres Operator in production before and recently switched to cloudnative-pg. We haven’t experienced any issues so far and I’m a big fan of their design choices (where you have a single user and a single database for each micro-service that requires one).
I used Zalando and surprisingly hit a bunch of config issues: backups won't run against an S3 endpoint that isn't AWS, and WAL-E now won't run because it has the SeaweedFS endpoint but no DB credentials.
now my one db-one user-one pod "cluster" is dead because it won't elect itself leader. I simply cannot kick it correctly to revive it.
> The Kubernetes way
to me, this translates into: a really complicated mess of manifests, layers upon layers of fun, hidden in an attempt to make it easy; but when it does work in its simplest forms, it works great.
if there's any lesson I've learned: let Kubernetes manage only the things that won't shoot you in the foot; you won't know which until you try :)
Your definition of Kubernetes is accurate for maybe 5 years ago. These days Kubernetes is perfectly capable of running stateful services and supports them as a first class citizen.
Whenever someone runs Kubernetes onprem I tell them to buy a TrueNAS or another cheap SAN. A cheap SAN costs as much as a DevOps expert setting up your Ceph infrastructure and a lot less when you actually run into issues with that software defined storage solution.
Once you do that Kubernetes is actually quite nice, because it gives you a base configuration of postgres that comes with automatic backup setup etc.
Kubernetes has nothing to offer to anyone that wants to work with storage (but there's a myriad of CSIs).
Here, let me give you an example of a system that offers storage in a similar way to how Kubernetes offers... well, it isn't really that good at offering compute, but, at least it kind of does it. So, Ceph -- that's something that makes sense to run PostgreSQL on (it's a storage provider). Kubernetes isn't a storage provider. It doesn't know how to manage it even...
I.e. if you think that you run PostgreSQL on Kubernetes -- you are mistaken. Something else does it. Kubernetes is a proxy there, at best (but probably is completely irrelevant).
Rook is about using CSI... it doesn't run Ceph on Kubernetes. That's impossible because Ceph relies on functionality that exists in drivers (kernel modules) to run. CSI is the component that communicates between eg. rbd driver and the user-space (eg. Kubernetes controllers), but it doesn't run Ceph.
It's in principle impossible to do anything about block devices in containers like those used by Kubernetes because those rely on Linux processes and associated namespaces. There isn't a Linux namespace for block devices, the closest you can get is the filesystem namespace. In other words, you cannot manage block devices purely in containers, you need some help from the host operating system. And this is why I mentioned CSIs in my previous post.
It does run Ceph on Kubernetes. How else would you describe deploying OSDs to linux servers via Kubernetes other than "Ceph on Kubernetes"?
> That's impossible because Ceph relies on functionality that exists in drivers (kernel modules) to run
This statement doesn't make sense. All linux applications require kernel functionality. Yes, to deploy Ceph, you must run Linux systems with the desired kernel modules. Turns out, Rook sets that up for ya! This statement exposes a somewhat deep misunderstanding of what Kubernetes is.
I run into you in every thread that mentions k8s and I sense extreme vitriol and a huge lack of experience / understanding. Don't mistake my future lack of replies for an unsaid "you've misunderstood".
Part of my job is to measure storage performance...
I can tell you at least this: there cannot be a meaningful "benchmark of Postgres on Ceph". Too many things will influence the benchmark way too much. You need to be a lot more specific when you talk about such benchmarks. Here are some things you will need to present:
* Are OSDs connected to the node being tested through network or are they closer (NVMe / SAS / SATA?). If network, what's the bandwidth? What's the latency? What if it's stuff like reliable Ethernet that's used for iSCSI / NVMe over IP or something like that?
* How much memory (relative to data at rest) does the node have?
* What is the layout of memory buffers in PostgreSQL?
* What is the setting used for synchronization in PostgreSQL?
* How much replication is going to happen (Ceph pool size)?
* Block sizes and frame sizes.
* Type of workload. Surprisingly, some queries can exploit parallelism in I/O while other queries cannot. Surprisingly some queries will need a lot of synchronization while others don't.
And there's more, it would be too tedious to try to give an exhaustive list of things to control for. And the problem is, at least those mentioned here can influence performance sometimes up to an order of magnitude, sometimes two orders of magnitude...
I also only run things short term.
I help customers design and implement the proper (Ceph and similar) environment for their workloads.
But I don't run their systems. I always hand over to the rest of the technical staff.
Bringing in a cheap SAN basically shifts that responsibility from people like me to the SAN vendor, and they could bring in hardware that might do the job.
Running on K8s, I feel you need two types of storage:
- Block storage, with proper fsync (fast and reliable)
- S3 storage
Both MUST be CloudNative.
I don't know if cheap SANs come with proper k8s CSI providers, but if they do, they could be up to the challenge.
Note that people like me can help customers with both (choosing the proper 'cheap SAN', but also designing a proper storage environment with Ceph or other software-defined storage solutions).
Why not dedicate some worker nodes using taints/tolerations/labels, even on bare metal, with locally attached storage? I wrote this many years ago now but that's the reason why we started CloudNativePG (OpenEBS might not be the answer today, but there are many storage engines now, including topolvm which brings LVM to the game): https://www.2ndquadrant.com/en/blog/local-persistent-volumes...
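As a rough illustration of that pattern, a CloudNativePG Cluster pinned to dedicated, locally attached nodes might look something like this sketch (the node label, taint key, and topolvm storage class are assumptions for the example, not from the post):

```yaml
# Sketch only: assumes nodes labeled workload=postgres, tainted with
# dedicated=postgres:NoSchedule, and a local-PV storage class (e.g. topolvm).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-local
spec:
  instances: 3
  storage:
    size: 100Gi
    storageClass: topolvm-provisioner   # assumed local storage class
  affinity:
    nodeSelector:
      workload: postgres                # assumed node label
    tolerations:
      - key: dedicated
        operator: Equal
        value: postgres
        effect: NoSchedule
```

Because the volumes are node-local, each Postgres instance is tied to its node; redundancy then comes from Postgres replication rather than the storage layer.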
It is ultimately your choice. I am a big fan of shared nothing architecture for the database. (I am a maintainer of CloudNativePG)
Yeah, and let Postgres take care of redundancy.
I agree that this is an interesting proposition.
AFAIK Portworx could do a similar thing, but then with storage redundancy.
Basically:
- storage is synced to 3 local storage devices spread across 3 different k8s nodes. This could be NVMe.
- pod is only scheduled next to one of the three
- reads are local, writes are local (for fsync) and synchronised to the other devices.
I would love to test with pg_tps_optimizer against Portworx
I both agree and disagree with your comments.
Benchmarks should be a comparison, and one can very well do a comparison between exactly the same deployment on exactly the same infrastructure with 2 different storage types without going so deep into the weeds. It is crucial to understand the environment of the actual benchmark, but many of the things you mention are less important unless you want to investigate what is actually going on under the hood (hoping to improve something).
Also note that to many people looking to run database workloads on K8s / Ceph, knowing that someone was able to run at 18k TPS without pulling rabbits out of their sleeves is very helpful, and people demanding all of these details basically makes people less willing to share, which is not helpful at all.
Be that as it may, as mentioned in another thread, we ran benchmarks on premise / OpenShift / Ceph, and I will try to answer as many of your questions as possible about these benchmarks. If you want more details, LMK...
* Stack is: OpenShift - RBD - network - Ceph node - VMware VMDK - SAN storage
* Network (AFAIK) is 10g. I haven't tested network latency or storage latency, but the roundtrip for a commit (which pg_bench and pg_tps_optimizer call latency) took about 30ms running 233 clients / 17k TPS.
* no fancy stuff like reliable Ethernet that's used for iSCSI / NVMe over IP or something like that
* I mostly ran with pg_tps_optimizer, and since it is designed to test storage performance (not performance from the app perspective), the way it works, things like shared buffer size are less important. But FYI, I ran with 2GB for cluster.spec.resources.limits.memory.
* What is the layout of memory buffers in PostgreSQL?
I don't understand what you are trying to get at. Running on K8s, you should trust the operator to deploy as smart as possible and not worry about stuff like this unless you are trying to actually investigate and fix problems. I ran with standard settings.
* I tested with many options, including single instance, async, and sync replication (with synchronous_commit set to remote_write, on, and remote_apply). These tests were run on an Azure VM, but I am fairly sure running on OpenShift/Ceph does not change that much. Biggest difference with 13 clients: 12/13k TPS with sync and 17/18k TPS with async. The difference is smaller with a higher number of clients. As the effect is larger with a smaller number of clients, the effect is probably less severe on OpenShift/Ceph.
* AFAIK we have Ceph set to keep 3 replicas. TBH, I don't see how this is of much importance. The Ceph RBD kernel driver writes to the replicas in parallel. Doing more in parallel has little impact on latency, and bandwidth is not the issue.
* I don't know the Block sizes and frame sizes for sure. I expect it is default settings (4096).
* Type of workload. Yeah, this is important stuff.
First of all, about pg_tps_optimizer: I get the most interesting information with pg_tps_optimizer. It basically runs update statements on a record in a table, and with 233 clients this is 233 tables. This really tests storage performance (we rule out things like semaphore locks). It might be compared to importing data with a separate client (which could run in parallel) for every table (or partition if you like).
With pg_bench (default workload) we see similar graphs, but we see limitations with pg_bench at higher numbers of clients. As all data is in the same table(s), with a higher number of clients they run into contention issues (probably semaphore softlocks). As this is not a limitation of storage, I personally find this less interesting.
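For what it's worth, the sync/async variants discussed above can be requested declaratively from CloudNativePG, roughly like this (the cluster name is made up; CNPG manages synchronous_standby_names itself when the sync-replica knobs are set, and omitting them leaves replication asynchronous):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-bench            # made-up name
spec:
  instances: 3
  minSyncReplicas: 1        # > 0 enables synchronous replication
  maxSyncReplicas: 1
  postgresql:
    parameters:
      synchronous_commit: remote_apply  # the tested values: remote_write, on, remote_apply
```

This is only a sketch of one way to express the tested configurations, not the exact manifests used in the benchmark.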
We have run benchmarks on our environment (CNPG, Openshift and CEPH in Dutch gov) and compared to Azure Postgres and (CNPG on) Azure AKS.
Pgbench and pg_tps_optimizer.
Ceph indeed is a 'high bandwidth / high latency' storage solution, and as such we could get to comparable TPS but required more clients.
With 2 vCPU the max TPS was about 17k/18k with AKS and also with OpenShift.
But with AKS we required 34 parallel clients and with OpenShift/CEPH we required 233 clients. More clients <=> more in parallel <=> more TPS on CEPH...
If you are interested I can share some graphs.
I run a stateful metadata cache on Kubernetes with a StatefulSet and EBS as block storage. It runs SQLite just fine.
As for the "setting up Kubernetes" comment, I think that could be true of Kubernetes years ago. Nowadays Platform Engineers generally build on its capabilities continually, and by the time a user is using it to schedule network, compute, and storage the setup for an application maybe takes a day or so without a template. Most of the platform engineering work I've done on Kubernetes had much more to do with lifecycle management than paining myself over initial provisioning.
We do the equivalent on GCP. A lot of the criticism about Kubernetes storage seems to come from people using it onprem.
In that context, I can well imagine that it's a PITA to set up well. As rjzzleep commented in this thread:
> Whenever someone runs Kubernetes onprem I tell them to buy a TrueNAS or another cheap SAN. A cheap SAN costs as much as a DevOps expert setting up your Ceph infrastructure and a lot less when you actually run into issues with that software defined storage solution. Once you do that Kubernetes is actually quite nice
Why do you use a StatefulSet? I assume every instance has its own volume with its own SQLite and that backs the cache. Why not just a Deployment? Failover would be easier in that case.
I scale it vertically. Cache refresh is suboptimal and takes a few hours. I could make it better by having the instances talk, but frankly the service may never actually need to scale. It can handle thousands of rps off that single container.
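A setup like that roughly corresponds to a single-replica StatefulSet with a volumeClaimTemplate, something like this sketch (all names and the gp3/EBS storage class are made up for illustration):

```yaml
# Single stateful pod with a PVC that survives pod restarts.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: metadata-cache
spec:
  serviceName: metadata-cache
  replicas: 1
  selector:
    matchLabels:
      app: metadata-cache
  template:
    metadata:
      labels:
        app: metadata-cache
    spec:
      containers:
        - name: cache
          image: example/metadata-cache:latest   # assumed image
          volumeMounts:
            - name: data
              mountPath: /var/lib/cache          # SQLite file lives here
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3                    # assumed EBS-backed class
        resources:
          requests:
            storage: 20Gi
```

The volumeClaimTemplate is the reason to prefer a StatefulSet here: the pod gets a stable identity and reattaches the same EBS volume after a restart, so the hours-long cache rebuild is avoided.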
We've started using CNPG cautiously in production and can say we're finally confident about running databases in kubernetes. (we had a bad run before with custom setups, and went back to using VMs again).
I am very grateful to maintainers for not only open sourcing the industrial-grade operator, but also sharing so much of their expertise. Unexpected side-effect of adopting CNPG for us was that we now have a good starting point for running postgres with high availability. (obvs there's still a lot to learn, but CNPG docs is a treasure trove of operational knowledge)
I never ran Postgres in prod on k8s, but I used cloudnative-pg for some light, non-critical loads, dev/stg stuff. It works and it's fast, and recovery just works (unlike Zalando, which refuses to operate with anything but AWS S3). I also reported a few minor bugs to the cnpg team and they fixed them in no time.
We're using the Zalando operator. We have our WALs and base backups shipped to Azure Blob Storage. Not sure where this S3-only thing comes from. Can you explain further?
The way I'm reading the comments here is 'AWS S3 as the only flavour of S3 that works', i.e. they're having issues backing up to services like DigitalOcean Object Storage via S3, rather than S3 being the only backup protocol that's supported.
I'd definitely take something over the insanity that is managed services and constant ticket filing with the cloud vendors, but my gut says I just need a Helm chart that makes it decently easy to, a.) bring up PG & some replicas that hopefully have a decent consistency story and b.) a backup job that can write to S3 et al.
I need a better "seduce to use" for a CRD for Postgres, I guess.
"It doesn’t rely on statefulsets [and manages disks itself]" and "The Kubernetes way" (and dark red on blue, yeesh) … just doesn't inspire me?
CRs that simply instantiate an instance aren't all that useful IMO. they're much more useful if you have something like Prometheus' case where they want to attach configuration to various Kubernetes resources and can't easily fit it in annotations.
i don't know that i would actually recommend Helm for much though, since dealing with templates beyond a basic "sub string into field" use case is pain and misery. once you start dealing with named helper templates that operate at different scopes and all but the simplest control flow, the lack of tooling makes debugging templates a nightmare. the operator ecosystem has its own problems (i still can't tell what the extra Red Hat stuff beyond kubebuilder is really helping with and can't stand updating it), but having an actual programming language and all the accompanying type checking and testing tools available is a major benefit.
if you don't need something complex, kustomize feels much less breakage-prone than Helm, though its patches aren't as intuitive to write.
Let's be clear: all of the YAML-cum-DSL K8s deployment options are terrible. It's still just templating with extra steps. But Kubernetes is, fundamentally, one big while(1) loop turning YAML into infrastructure, so, eventually, you have to target YAML. It's just unfortunate that we got to "spicy regexes with if statements" and then stopped, instead of using something more robust.
I don't understand this at all, a Kustomize base is far easier to read than a Helm Chart? It's a set of manifests with no templating. I find that it's easy to see what gets patched or replaced in the overlays and there's a lot more fail-fast rather than stringly-typed whitespace nonsense.
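For illustration, a typical overlay looks something like this (a made-up base with a Deployment named `api`); the patch is structured JSON-patch rather than text templating, so a malformed patch fails at build time instead of producing broken YAML:

```yaml
# overlays/prod/kustomization.yaml (hypothetical example)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: api
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 3
```

Running `kustomize build overlays/prod` (or `kubectl kustomize`) then renders the base manifests with the patch applied.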
I guess your kustomizations are relatively simple. Just wait until you have 4-5 layers of them, flipping between several files to figure out which file needs to change and how, just to change an attribute. With a chart, I just make the change and can work it out, all in the same file.
The place the operator looks to be most useful is updating the db. It'll do "the right thing" updating the secondary pods first, doing a switch over of the primary, then updating the old primary pod.
That sort of action is real hard to do with just a plain helm chart (especially if you didn't plan for it up front).
The operator also handles when a node fails or is taken down. An operator can handle replicating to a new replica on a new node and making sure everything stays consistent.
What is a reason someone would want to run Postgres on Kubernetes?
I associate Kubernetes with stateless services that can canary deployed, blue-green deployments, auto restarts upon memory crashes, auto scaling etc.
But, I cannot think of any practical reason one would want such events on a database that one relies on for keeping the state?!
What am I missing?
Are there any toy use cases where people are using Postgres for where they can afford (and want) Kubernetes to rolling-deploy and crash-restart Postgres instances?
In that case, why Postgres in the first place?
I did read the article you linked. It does touch a little bit on potential benefits of having databases managed the same way the services using them are, and integration tests that include the database.
However, I’m still scratching my head over how any of that is better than, or not possible with, a Postgres installation that is outside of Kubernetes. Let’s take out the management part of databases themselves - assume a managed database service like RDS Postgres. Why would one want to run Postgres on EKS over having their pods on EKS talking to RDS Postgres?
I feel like I’m missing some technical reason/advantage that makes all these people choose to run Postgres on Kubernetes with operators and whatnot.
What does Kubernetes bring to this table over a separate Postgres that doesn’t run the risk of Kubernetes interfering with reliability or operation?
The only reason is if you have a bunch of micro service apps that need their own small db that can be quickly spun up, and you already run everything else in kubernetes. A very niche use case that doesn’t apply to 95% of shops.
All the rest of replies just make my eyes roll, it’s like reading promotional pamphlets devoid of any context.
The reason could be the last 4 years of evolution in Kubernetes. Have you heard of DoK Community (Data on Kubernetes)? Might be a good place where to start.
I prefer to run PostgreSQL The Grumpy Greybeard Way.
Like, on a (virtual) machine.
With a config file.
I'm doing a web app with terabytes of healthcare data with intense computation and geospatial queries with an average 60ms response time including latency.
My current DR/HA is basically just hourly block level delta snapshots. It's super simple, cheap and easy to do if the business can tolerate up to an hour of potential data loss.
Running a database on a VM doesn't sound very Greybeard™ of you; don't you know that abstracting like that kills performance and takes you away from the hardware? (Only half joking; obviously you totally can make it run fine on a VM, but it also runs fine in a container.)
That is very cool and much needed, but I am _paranoid_ about hosting the data myself. I've lost too many databases over the years to the most random mistakes. Granted, technology has matured a lot over the last decade, but still... the cost (the nights spent debugging and troubleshooting) of maintaining the database yourself is just not worth it when managed solutions exist.
The whole clue of cloudnative-pg is that it makes supporting Postgres so much easier. My DevOps colleague and I had no experience with running PG at all, studied the backup and replication chapters of Simon Riggs book [0], read the excellent cnpg documentation and deployed Postgres clusters on Lab, Dev & Prod k8s clusters. Production started sept 2022. Not a huge use case, 5TB of data, constant stream of IoT type of data. Continuous backup to Minio on TrueNAS Core. Later we added both a hot-standby and a backup replica in a secondary site for disaster recovery. No production issues like you describe.
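The continuous-backup-to-MinIO part of such a setup is configured declaratively on the Cluster resource; a hedged sketch of what that section can look like (cluster, bucket, endpoint, and Secret names here are all invented):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: iot-pg                    # made-up name
spec:
  instances: 3
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://pg-backups/           # assumed bucket
      endpointURL: https://minio.example.lan:9000 # assumed MinIO endpoint
      s3Credentials:
        accessKeyId:
          name: minio-creds                        # assumed Secret name
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: minio-creds
          key: ACCESS_SECRET_KEY
```

With this in place, base backups and WAL archiving go continuously to the S3-compatible endpoint, which is what makes point-in-time recovery possible later.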
Many EDB & 2ndQuadrant engineers have 10+ years of experience as core Postgres committers. I had the pleasure of meeting some at KubeCon EU in Amsterdam, friendly Italian & UK engineers I felt I could trust to make good software. They saw the issues you describe and took a step forward by engineering a proper Kubernetes operator, introducing it to the public at KubeCon EU 2022 when it reached version 1.5, after they had been running it on their DBaaS production clusters for some time.
Highly recommended. IMHO running Postgres on-prem is in most use cases cheaper than a hosted version. Especially taking into account Schrems I, II & (upcoming) III [1].
Easier in the sense that you don't know what it's doing... It's not hard to deploy PostgreSQL and make it do something. Making it do what it needs to do well is a completely different thing. Tools like this one (or any other management tool that comes from outside) aren't helping you to make it easier. To make things easy you need to learn how to do them. They make it easy to waste a ton of resources on things you don't need for the fraction of performance you could get.
The OP was complaining it was difficult to maintain Postgres, with late nights. Expensive in human resource costs. You’re now switching to a different topic, from total cost of ownership to resource cost of ownership, not taking into account human operator costs.
EDB argues that the human cost of operating a highly available cluster with hot standby etc. using cnpg & the abstraction it offers is much lower than without a Kubernetes operator.
Of course a highly skilled and experienced Postgres guru with years of experience can run a highly available cluster on less compute resources. But what happens if that person is not available? On holiday leave, with pension? How many of those gurus would a company need to employ? How many of these self maintained pg clusters can a couple of these gurus maintain?
Not having DBAs, the people with the actual skills to run the databases you need, available when you need them, certainly sounds like a business staffing problem, and not an engineering problem. Why are inexperienced developers managing databases in the first place? Or is this something people think you can just wing and be okay?
Who says ‘inexperienced developers are managing databases’? That is not the use case EDB is advocating with cnpg, EDB offers a Postgres DBaaS for developers.
The README [0] explains that CloudNativePG has been designed by Postgres experts with Kubernetes administrators in mind. Put simply, it leverages Kubernetes by extending its controller and by defining, in a programmatic way, all the actions that a good DBA would normally do when managing a highly available PostgreSQL database cluster.
> Granted, technology has matured a lot over the last decade,
Running a PostgreSQL database by yourself was just as easy a decade ago as it is now.
Like, I get not wanting to, especially if ops is not your job, but compared to actually programming apps it's not that hard a job till you get to TB+ sizes. At least compared to writing the more complex apps using it.
So are compilers, virtual networks, and encryption, but I have tools for all those that don’t inspire the kind of nameless terror that databases do. The failure modes are different but the problems are not really easier intellectually speaking.
There are databases like CockroachDB that are more modern and a lot more approachable for high availability but for some reason everyone adores Postgres. I’m not sure why. It’s arcane and clunky and feels like 1980s Unix software.
PostgreSQL is the very best of the clunky 80s unix software. Its features and reliability are unmatched. The core contributors have earned the trust of the community.
You lose a lot of features and performance when you go from a single server database to a distributed system. Distributed systems are significantly more complex to set up, administer, and debug. For nearly all databases in use, the tradeoff isn't worth it.
It's really no wonder that postgresql is as popular as it is.
That's because they're not really that hard. Compilers are essentially pure functions, encryption is as well. State is another beast entirely.
If I had to put on my innovator cap and do a relatively weakly informed guess, I'd say its because querying capabilities and reliable storage are still too conflated. If we focused on reliable storage that only has great replication support to other querying systems, the problem might get easier.
Actually not quite. Backups (assuming you are doing single node) still need you to decide on your SLOs - RTO (how long it takes to restore) and RPO (how much data loss you can suffer). On the instant, snazzy end you have streaming backups and recovery, and on the other extreme you have backing up once every N hours/days, with a restore taking however long it takes (so you have customer outages you need to negotiate).
Now let us involve multiple nodes (both replication and partitioning into shards). As shards go up and down, ensuring data is in sync etc. is a hard consistency problem and needs man-years of operational excellence and bug fixing.
So when people think databases - they think of the cool stuff - the database engine that does relational algebra and handles SQL queries. That is (IMO) only 1% of a practical, performant, reliable database (offering).
Maybe if you are gigantic, but there is a long tail of people with <1TB database needs that don’t really need shards and can be well served by a fail over cluster with a master and one or two replicas that can become masters.
These days you don’t really need shards until you hit many terabytes or even more depending on your read and especially write load. NVMe storage is really fast and lots of RAM for caching has become cheap.
So my point was around all the things a managed offering gives you (e.g. sharding and replication). Even by the time I had to set up streaming replication and worry about WAL drift, it is easier to pay a managed provider, no?
Also, what about the customer that deleted an important thing 6 weeks ago and absolutely needs it recovered? BTW, it's just one tenant in that DB; the others shouldn't be recovered, naturally.
In that case, it’d probably be best to just handle deletions at the application layer (e.g., setting a “deleted_at” timestamp field with scheduled permanent deletions later).
And in terms of data compliance, it’s very important to make sure permanent deletions propagate through your backup systems within a reasonable amount of time - Google Cloud[1], for example, is ~180 days.
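A minimal sketch of that soft-delete pattern in SQL (the table, columns, and tenant scoping are hypothetical, not from the thread):

```sql
-- Mark-then-purge soft delete (all names made up for illustration)
ALTER TABLE documents ADD COLUMN deleted_at timestamptz;

-- "Delete" for one tenant just sets the marker:
UPDATE documents
   SET deleted_at = now()
 WHERE tenant_id = $1 AND id = $2;

-- Normal reads exclude soft-deleted rows:
SELECT * FROM documents
 WHERE tenant_id = $1 AND deleted_at IS NULL;

-- A scheduled job makes deletions permanent after a grace period:
DELETE FROM documents
 WHERE deleted_at < now() - interval '90 days';
```

Recovering the one tenant's rows is then a single UPDATE clearing `deleted_at`, with no restore of the shared database involved; the grace period is what has to be reconciled with the backup-propagation deadlines mentioned above.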
Backups? Do you want to share your idea about how you'd do backups? Especially to a distributed database?
Here are some of the questions you'll have to answer and some options you will have to consider before you go there:
Let's start with the heavy stuff: consistency groups. I.e. groups of bulk storage that underlie your entire infrastructure and ensure that your application and database(s) all recover to a shared state once they crash. To better explain this concept, consider this: you have an application that works with two databases, say a document database to store documents uploaded by users (which are later parsed by the application and transformed into records in a relational database). Now, each database provides its best consistency guarantees... but they can still fail independently and subsequently recover to different states, where, for example, the document database can be ahead of the relational one (and lose some data). Sharded databases face similar problems.
How geographically far are you going to send your backups? You see, the closer to the working server they are, the higher is the chance you'll lose them together. But, here's the problem: the further away the backups are, the lower is your ability to keep the backup up-to-date with the database, and, subsequently, more data to lose.
Well, backups inherently lose data (for the time between the last backup and the time of the crash). So, if you don't want to lose data at all, you probably want replication rather than backups. And you probably want online replication (but then the distance between the replicas is even more important than in the case with backups).
Also, backups are huge. If you want to ship them outside of the facilities of the storage vendor... that's going to be expensive.
Another point to consider: databases provide consistency guarantees, but does your database provide the consistency guarantees you want? Is every relation encoded by using foreign keys, or does the application have some knowledge of how to interpret pieces of data and stitch them together into relationships unknown to your database? Are you sure that every operation that requires atomicity is implemented in the database rather than the application (which doesn't enforce atomicity)? What if you stick a backup (recovery point) at a precise moment when your application was doing something that was meant to be atomic, but the application author didn't know how to express it in SQL (because in their fear of technology they chose to use Hibernate or SQLAlchemy etc.)? And if you do so, it spoils your backup...
I actually do not understand the point here. And maybe you are not very familiar with the concept of transactions. Backups can only account for committed transactions.
However, we are talking about Postgres, here, not a generic database. PostgreSQL natively provides continuous backup, streaming replication, including synchronous (controlled at transaction level), cascading, and logical. You can easily implement with Postgres, even in Kubernetes with CloudNativePG, architectures with RPO=0 (yes, zero data loss) and low RTO in the same Kubernetes cluster (normally a region), and RPO <= 5 minutes with low RTO across regions. Out of the box, with CloudNativePG, through replica clusters.
We are also now launching native declarative support for Kubernetes Volume Snapshot API in CloudNativePG with the possibility to use incremental/differential backup and recovery to reduce RTO in case of very large databases recovery (like ... dozens of seconds to restore 500GB databases).
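Declaratively, that looks roughly like the following sketch (the class and cluster names are assumptions, and the snapshot method also requires a CSI driver with VolumeSnapshot support):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: big-db                   # made-up name
spec:
  instances: 3
  storage:
    size: 500Gi
  backup:
    volumeSnapshot:
      className: csi-snapclass   # assumed VolumeSnapshotClass
---
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: big-db-snap
spec:
  cluster:
    name: big-db
  method: volumeSnapshot
```

The Backup resource requests a snapshot-based backup of the named cluster instead of a barman object-store one; restores can then start from the snapshot rather than replaying a full base backup.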
So maybe it is time to reconsider some assumptions.
We are currently moving to CNPG and have tried CrunchyData and Zalando in the process. The other two we abandoned while trying it out.
Zalando:
- Relies on WAL-E which is now obsolete
- Documentation all over the place
- Hacky setup that deviates from K8s standards (no easy way to set user through supplying secrets, for instance).
In general, it feels like an operator to be used internally at Zalando according to their conventions that they just open sourced. It doesn’t seem like they want (or get time) to support other conventions. I don’t think this is a bad thing, it’s already great Zalando open sourced this. Just important to know when you decide to use it.
CrunchyData:
- Incomplete documentation (Certain values settings are missing from their API specs)
- Hacky user setup.
- Doesn’t support running without backups enabled. (Obviously, you’d never want to run without backups set up in prod. But when testing, it’s nice not to need a perfect setup from the start. Without backups, it will let the database pods fill up their PVCs with WAL, even when not doing any writes. It fills up at about 10GB/day.)
- Backups seem to randomly fail.
It looks pretty OK otherwise.
CNPG:
- Adheres to K8S standards
- Seem to realise that an operator will (currently) not fully replace a DBA. Their kubectl plugin is great for interacting with the cluster.
Obviously we still need to test failovers and restoring from backups, but so far it’s been easy to set up.
It does suffer from what most operators suffer from: their CRD is a mess. The UID of the Postgres container is specified at the same indentation level as my switchoverDelay, superuserSecret and bootstrap spec. It would be nice if these followed a more logical grouping (pod spec, users, switchover).
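To illustrate the flat layout (these field names do exist in the CNPG Cluster spec; the values are made up), unrelated concerns all sit side by side under `spec`:

```yaml
spec:
  instances: 3
  postgresUID: 26           # container-level concern...
  switchoverDelay: 40       # ...next to failover tuning...
  superuserSecret:          # ...next to user management...
    name: superuser-secret
  bootstrap:                # ...next to provisioning
    initdb:
      database: app
      owner: app
```

A nested grouping (e.g. everything pod-related under one key) would make the spec easier to scan, at the cost of a breaking API change.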
I've never used it myself, but while doing research I noticed that it received a lot of praise from users.
One thing that did catch my attention is that it doesn't use statefulsets for the postgres pods. I mostly agree with their reasons, but I haven't taken the time to understand their implementation.
Personally I've only used Kubegres (https://www.kubegres.io/), which didn't even make the above list. It's ok for a personal project.
All k8s solutions for postgres take subtly different approaches. It seems that they've all converged on the Operator pattern. The basics are easy: run a database process which persists data to a cloud disk of your choice. The hard parts are how to update, migrate, backup, restore, monitor, failover, replicate, etc. These kubernetes "operators" promise to fulfill the role of a DBA but, just like hiring a DBA, it requires buy-in to their approach.
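For a sense of scale, the "easy basics" with an operator like CNPG really are small; a sketch of a three-instance cluster with object-store backups (image, bucket, and secret names are all assumptions):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-db
spec:
  instances: 3            # one primary, two replicas, automated failover
  storage:
    size: 20Gi
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://example-backups/example-db
      s3Credentials:
        accessKeyId:
          name: backup-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
```

It's everything beyond this manifest (upgrades, restores, tuning, debugging a failed failover) where the "buy-in to their approach" actually bites.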
From my experience with Strimzi, I think of it as "half-managed": it makes things like upgrades easy, but it's still just tooling that makes self-management easier.
I’ve been using it in prod for a while now, pretty happy with it. Solid, integrated pgbouncer, crd based, good license.
I do wish there was a simpler way to handle major version upgrades of pg.
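One workaround available today is a logical import into a fresh cluster running the new major version, which CNPG supports declaratively through `bootstrap.initdb.import` (backed by pg_dump/pg_restore). A sketch, with all cluster, host, and secret names assumed:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: app-db-pg16
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:16
  storage:
    size: 20Gi
  bootstrap:
    initdb:
      import:
        type: microservice        # single-database import
        databases:
          - app
        source:
          externalCluster: app-db-pg14
  externalClusters:
    - name: app-db-pg14
      connectionParameters:
        host: app-db-pg14-rw      # the old cluster's read-write service
        user: postgres
        dbname: app
      password:
        name: app-db-pg14-superuser
        key: password
```

It works, but it's a dump-and-restore, so downtime scales with database size; an in-place `pg_upgrade` path would still be nicer.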
When I looked at some alternatives, these were my thoughts (may be out of date by now)
- kubegres: maintained by one guy, lots of GitHub issues with no responses
- crunchy data pgo: licensing is not obvious, seems to require license in some cases
- stackgres: agpl, no thanks
- zalando: they know pg extremely well, but it’s not kubernetes native. Doesn’t include pgbouncer. Doesn’t handle automatic failover when a node dies, and during testing it often got confused when killing a node.
> CloudNativePG exclusively relies on the Kubernetes API server to maintain the state of a PostgreSQL cluster.
This is scary af. The Kubernetes API server is finicky and unreliable. It's probably the first component to fail in Kubernetes no matter what the problem is (e.g. I recently drove it into an unrecoverable state by accidentally starting 10K jobs instead of 100).
This is just an outright bad idea... but, really, "reliable" and Kubernetes don't belong together. So, if you wanted an unreliable PostgreSQL, with no idea how it's managed or how to recover it... well, that sounds like fun!
Are these extremely large shops in some sort of pocket universe? Because I've been around the block enough to regularly experience Kubernetes's issues over and over again, and I'd say that the GP comment you're critiquing is actually right on the money. In my experience, people downplaying Kubernetes's drawbacks have either never been bitten by them, or make a lot of money by getting people to use Kubernetes.
Complex systems are complex. If you don’t need it, don’t use it.
But you’re absolutely wrong to call it unreliable. I’ve also “been around the block”, and I’ve seen 50k lines of Bash fail in complex ways too.
A google search would show you plenty of extremely large shops using k8s, and dozens or hundreds of tech talks by their lead engineers saying how much better it made their lives.
No it's not. Kubernetes is not an equivalent of a car, it's an equivalent of Land Rover Discovery 5 in your scenario. There are plenty of cars which score differently on some pre-agreed reliability scale (and, specifically, this model of Land Rover scores very poorly).
Kubernetes is light years away from being a reliable system. It's a system to support Web sites, which don't need to do anything that would require a lot of investment into reliability. Nobody in their right mind would use Kubernetes to run highly-reliable systems that, eg. put human lives at risk.
PostgreSQL is in a completely different category from reliability standpoint: much more reliable. But, if you put it on Kubernetes footing, you give up that reliability.
Again, maybe your business is to make Web sites -- then nobody cares, it's good enough. Running a database of a bank on top of Kubernetes -- well, I hope nobody actually tries that. And size has nothing to do with that. Reliability is about all sorts of metrics like mean time to failure / recovery and guarantees that the system makes like once a particular condition is encountered, the system will halt and so on. Kubernetes never claimed anything in that category, and, in practice, it doesn't offer any satisfactory guarantees of its own performance.
Anecdotally, I've seen Kubernetes fail due to its internal problems more times than I can count. We are talking about at least hundreds of times. And by "fail" I mean that the system enters a state that cannot be salvaged by rebooting or replacing the master node(s). Since my interaction with it involves a lot of testing, though I'm not testing Kubernetes itself but something running on it, I'm all too used to deleting and deploying new clusters.
This does not sound like knowledgeable feedback. You should consider investigating and fixing the problem next time, at least to the point where you can explain to others what failed. A problem that continually affected you through many clusters sounds like a configuration issue. PEBKAC, etc.
I previously worked for a company that ran highly-reliable systems that put human lives at risk. We increased the reliability of deployments and rollbacks quite a lot by moving to K8s, and that was years ago, when K8s was much newer than it is today. A very good friend still works there and they have zero plans or desires to move off K8s, as it works fantastically for them. They maintain 5 nines uptime and are responsible for daily hospital operations.
Also - plenty of human lives rely on “websites”. Your car probably speaks HTTP. I don't understand this line of thinking at all.
You are probably viewing the dark colour scheme of the website. The website looks fine in light colour scheme. However in dark colour scheme, while it flips the background and text colour alright, it does not alter the accent colours sufficiently well.
Consider the colour of the text "Try Kubernetes Way" and the background colour. In light mode, we have foreground colour #DC2626 on background colour #FFFFFF. This has a contrast ratio of 4.82. WCAG 2.0 level AA recommends minimum contrast ratio of 3.0 for large text. So a contrast of 4.82 is decent enough. However in dark mode, we see #991B1B on #075985 which has a contrast ratio of only 1.09. This ratio is too low and fails all WCAG contrast requirements.
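Those numbers check out. For reference, the WCAG 2.0 relative luminance and contrast ratio can be computed in a few lines of plain Python (no external dependencies):

```python
def srgb_to_linear(c8):
    # WCAG 2.0: linearize one 8-bit sRGB channel
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(hex_color):
    # Relative luminance of a "#RRGGBB" string
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))

def contrast(fg, bg):
    # Contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), range 1..21
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)

print(round(contrast("#DC2626", "#FFFFFF"), 2))  # light mode: ~4.8, passes AA for large text
print(round(contrast("#991B1B", "#075985"), 2))  # dark mode: ~1.1, fails everything
```

Running both pairs through this reproduces the ratios above, which is why the red accent needs a lighter variant in dark mode rather than just a flipped background.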
It's been K - 8 letters - S long enough that everyone at work just calls it "Kates" now, which initially annoyed me to no end, but has kind of grown on me.
Nitpick, but I think it is from Ancient Greek. A cursory Google tells me the modern Greek descendant is κυβερνήτης/kyvernítis and has a slightly different meaning.
κυβερνήτης is the ancient word too. Maybe they were trying to play on "cube", which by the way also comes from κύβος (kyvos), not kuvos. Exactly the same meaning as far as I know; I wonder what the cursory Google came up with as the difference in meaning. It's also the root of "cybernetics"...