I have been advocating for the longest time that "repeatable read" is just a bad idea, even if implementations were perfect. Even when it works correctly in the database, it is still very tricky to reason about when dealing with complex queries.
I think the only two isolation levels that make sense are:
* read committed
* serializable
You either go all the way to a serializable setup, where there are no surprises. Or you go in the read committed direction, where it is obvious that if you want a consistent view of the data within a transaction, you have to lock the rows before you start reading them.
Read committed is very similar to just regular multi-threaded code and its memory management, so most engineers can get a decent intuitive sense for it.
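To make the analogy concrete, here is a toy Python sketch (hypothetical, not tied to any real database): reading two pieces of shared state without holding a lock across both reads can observe an inconsistent pair, just like two statements inside a read committed transaction; holding the lock across both reads is the analogue of locking the rows up front.

```python
import threading

# Shared "rows": the invariant is that a + b == 100 at all times.
state = {"a": 50, "b": 50}
lock = threading.Lock()

def transfer(amount):
    # Writer preserves the invariant: moves value from a to b atomically.
    with lock:
        state["a"] -= amount
        state["b"] += amount

def read_inconsistent():
    # Like two SELECTs under read committed: each read individually sees
    # committed data, but the pair may straddle a concurrent write.
    a = state["a"]
    # ...a writer thread could commit right here...
    b = state["b"]
    return a, b

def read_consistent():
    # Like locking the rows before reading: the pair is internally consistent.
    with lock:
        return state["a"], state["b"]

a, b = read_consistent()
assert a + b == 100
```

The intuition engineers already have about "grab the lock before reading related values" maps directly onto "lock the rows before reading them".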
Serializable is so strict that it is pretty hard to make very unexpected mistakes.
Anything in-between is a no man's land. And anything less consistent than Read Committed is no longer really a database.
I honestly don't think I've seen people reason about read committed well; especially as an application grows, it becomes very difficult to understand all the cases in which locks are grabbed and data is accessed. (I feel that way about multi-threaded code and locks too, but that's for another time.)
So I see serializable as the only sane isolation model for r/w transactions, and snapshot isolation as a good model for read-only transactions (basically you get a frozen-in-time snapshot of the database to work with). These also happen to be the only modes Spanner gives you: https://cloud.google.com/spanner/docs/transactions
Yeah, there are many use cases where you aren't really using transactions, or where hard accuracy isn't a requirement (say, you're sampling a percentage of millions of rows) but you need fast write performance. It's very nice and can help you put off transitioning to a "real" data warehouse or a cache for a while.
Here is my issue with this. We are assuming you need to read a "consistent snapshot" in some kind of real-time application. Because if it isn't real time, you can always do that snapshot-style querying on replicas, since that is a lot easier to implement correctly without sacrificing performance.
So assume you want to read a "consistent snapshot" in the context of a real-time transaction. If the data you want to read as a "consistent snapshot" is small, locking + reading is good enough in most cases.
If the data to read is too large (i.e. the query takes a long time to execute and pulls a lot of data), you are going to have a ton of scaling issues if you depend on something like "repeatable read". Long-running transactions, long-running queries, etc. are the bane of database scaling and performance.
So you really want to avoid that anyway; you would almost always be much better off changing your application logic to ensure much shorter, time-bounded transactions and queries, and setting up a better application-level consistency scheme. Otherwise you will at some point hit scaling/performance problems, and they will be an absolute nightmare to fix.
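For instance, a bulk update that would otherwise hold one long transaction open can usually be split into short, time-bounded batches. A sketch using Python's sqlite3 (the table, flag column, and batch size are all made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, processed INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1000)])
conn.commit()

BATCH = 100  # arbitrary; tune so each transaction stays short

def process_in_batches(conn):
    while True:
        # Each loop iteration is its own short transaction.
        rows = conn.execute(
            "SELECT id FROM items WHERE processed = 0 LIMIT ?", (BATCH,)
        ).fetchall()
        if not rows:
            break
        conn.executemany("UPDATE items SET processed = 1 WHERE id = ?", rows)
        conn.commit()  # release locks before starting the next batch

process_in_batches(conn)
```

Each batch commits quickly, so no transaction stays open long enough to block vacuuming, replication, or other writers the way one giant transaction would.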
Under MVCC this does not require locks, which is a non-trivial overhead saving for use cases that can tolerate slightly stale data but want that data to be internally consistent, i.e. no read skew (and write skew is irrelevant in read-only queries). This combination of strict serializable + read-only snapshots is quite common in recently developed databases.
That is sort of my point. I think "repeatable read" is fool's gold. You think you won't need to do locking, but it is too easy to make incorrect assumptions about what guarantees "repeatable read" provides, and you can make very subtle mistakes which lead to rare, extremely hard-to-diagnose correctness issues.
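The classic subtle mistake here is write skew: two transactions each read a shared constraint, each see it satisfied in their own snapshot, and each write to a *different* row, jointly breaking the constraint. A toy simulation of the textbook on-call example (purely illustrative, not real transaction code):

```python
# Invariant: at least one doctor must remain on call.
on_call = {"alice": True, "bob": True}

def take_self_off_call(me, snapshot):
    # Each "transaction" checks the invariant against its own snapshot...
    others_on_call = sum(v for k, v in snapshot.items() if k != me)
    if others_on_call >= 1:
        on_call[me] = False  # ...then writes only its own row.

# Both transactions start from the same consistent snapshot, so repeatable
# read / snapshot isolation raises no conflict: they touch disjoint rows.
snap = dict(on_call)
take_self_off_call("alice", snap)
take_self_off_call("bob", snap)

# Each check passed in isolation, yet together they broke the invariant.
assert sum(on_call.values()) == 0
```

Serializable isolation would abort one of the two; under repeatable read both commit cleanly, and the bug only surfaces on the rare occasions the two requests actually race.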
Repeatable-read-style setups also make it much easier to accidentally create much longer-running transactions, and long-running transactions, too many concurrent open transactions, etc. can create really unexpected, very hard-to-resolve performance issues in the long run for any database.
I agree that "read committed" makes it clearer what you're getting than "repeatable read" does. The latter can be convenient and workable if you accept that you will still need to do locking to avoid write skew, etc.
I've used both "read committed" and "repeatable read" with MySQL and learned to deal with each in their own way.
The problem I've seen is with large/long-lived transactions that impact performance, where the solution is to divide writes into smaller transactions in the design. "Read committed" does tend to encourage smaller transactions.