Great article on the fundamental problems associated with mutable state. The fundamental problem is that the idea of an object with a set of state that is the same to all observers violates pretty much the whole of information theory. It's not a problem that will ever be fixed without changing the fundamental laws of the universe.
Ditch the mutable data and you can stop asking questions like what do we do if 10 becomes 10.5 before it becomes 11 and start storing values which never change.
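A minimal sketch of "storing values which never change", assuming a single in-memory history (the `Versioned` class and its methods are hypothetical, not from any library). Each write produces a new version instead of overwriting the old one, so an observer holding an older version keeps seeing exactly what it saw.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Versioned:
    """Hypothetical append-only history; existing entries are never modified."""
    history: Tuple[float, ...] = ()

    def with_value(self, value: float) -> "Versioned":
        # Return a new Versioned instead of mutating this one.
        return Versioned(self.history + (value,))

    def as_of(self, version: int) -> float:
        # A given version number always maps to the same value.
        return self.history[version]


v0 = Versioned()
v1 = v0.with_value(10)
v2 = v1.with_value(10.5)
v3 = v2.with_value(11)

# An observer holding v1 still sees 10; later writes cannot change it.
print(v1.history[-1])  # 10
print(v3.as_of(1))     # 10.5
```

The question "what if 10 becomes 10.5 before it becomes 11" turns into "which version are you reading?", which has a definite answer.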
Information theory does not have a problem, only, as you say, our universe. Mathematicians and their various hangers-on like programming language researchers often prefer to deal with models that have no concept of time, in which the very concept of "observer" is extraneous since there isn't really anything like a "point of view". Everything just... is.
Also, I think the focus on immutability misses the more interesting discussion about lattices and how important they are to distributed programming. Check out this video, which will take you up from the beginning: http://vimeo.com/53904989
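For readers who haven't met lattices in this context: a common textbook example is a grow-only counter (G-Counter), whose states form a join-semilattice. The sketch below assumes string replica ids and is illustrative only. Merging is an element-wise max, which is commutative, associative and idempotent, so replicas converge regardless of the order or number of times states are exchanged.

```python
from typing import Dict

Counter = Dict[str, int]  # replica id -> count contributed by that replica


def increment(state: Counter, replica: str, n: int = 1) -> Counter:
    # Each replica only ever increases its own slot; returns a new dict.
    new = dict(state)
    new[replica] = new.get(replica, 0) + n
    return new


def join(a: Counter, b: Counter) -> Counter:
    # Least upper bound in the lattice: element-wise max.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(state: Counter) -> int:
    return sum(state.values())


a = increment({}, "A", 2)
b = increment({}, "B", 3)
merged = join(a, b)

assert merged == join(b, a)        # merge order doesn't matter
assert join(merged, a) == merged   # re-delivering a state is harmless
print(value(merged))  # 5
```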
(Seriously, if you do anything remotely distributed, this is required viewing. There's some serious stuff going on in the distributed world and this is a great intro.)
The talk gets interesting about 11 minutes in. But I can't find the slides anywhere, and the video was edited by someone who likes to show a slide for 5 seconds and the speaker pointing at it for 2 minutes.
The questions of consensus and mutability are, to some extent, separable; immutable data does not provide liveness without consensus guarantees. See, for instance, Datomic's design, where a strong coordinator is needed to serialize updates to otherwise immutable/replayable state.
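A rough, generic sketch of the single-coordinator idea, not Datomic's actual API: the `Transactor` class, the `Fact` tuple shape and `lookup` are hypothetical names. The facts themselves are immutable; the only thing the coordinator decides is the order in which new facts enter the log, and a lock stands in for whatever agreement mechanism a real distributed system would need.

```python
import threading
from typing import List, Optional, Tuple

# (tx_id, entity, attribute, value): a hypothetical fact shape for illustration
Fact = Tuple[int, str, str, object]


class Transactor:
    """Hypothetical single coordinator that serializes all writes."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._log: List[Fact] = []
        self._next_tx = 1

    def transact(self, entity: str, attribute: str, value: object) -> int:
        # Only one writer holds the lock at a time: this is the serialization point.
        with self._lock:
            tx = self._next_tx
            self._next_tx += 1
            self._log.append((tx, entity, attribute, value))
            return tx

    def as_of(self, tx: int) -> List[Fact]:
        # Copy the prefix of the log up to `tx`; that prefix never changes later.
        with self._lock:
            return [f for f in self._log if f[0] <= tx]


def lookup(facts: List[Fact], entity: str, attribute: str) -> Optional[object]:
    # The latest fact for (entity, attribute) within the given prefix wins.
    result = None
    for _, e, a, v in facts:
        if e == entity and a == attribute:
            result = v
    return result


t = Transactor()
t1 = t.transact("acct-1", "balance", 10)
t2 = t.transact("acct-1", "balance", 11)
print(lookup(t.as_of(t1), "acct-1", "balance"))  # 10
print(lookup(t.as_of(t2), "acct-1", "balance"))  # 11
```

Reads of a fixed prefix are stable forever, but without the serialization point two writers could claim the same transaction id, which is the liveness/consensus gap immutability alone doesn't close.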
Indeed. Although the use of confluent data structures where possible, combined with immutable data stores and 'live' functional reactive programming across device boundaries, -would- drastically simplify cases like these.
The argument here is basically to replace mutable state with event sourcing. It's an interesting idea and sometimes the right one, but if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding. This is also going to yield performance problems I'll be tempted to avert with caching, leading to fantastically bad performance or cascading failures whenever the caches fail or are restarted.
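To make the storage concern concrete, here is a minimal event-sourcing sketch, assuming spreadsheet-like "set cell" events (the `CellSet` event type is hypothetical). Current state is a left fold over the whole history, so every one-cell update adds a record that a naive design keeps forever.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, List


@dataclass(frozen=True)
class CellSet:
    """Hypothetical event: a user set one cell to a value."""
    cell: str
    value: float


def apply(state: Dict[str, float], event: CellSet) -> Dict[str, float]:
    # Pure function: produce a new state rather than mutating the old one.
    new = dict(state)
    new[event.cell] = event.value
    return new


def replay(events: List[CellSet]) -> Dict[str, float]:
    # Current state is a left fold over the full event history.
    return reduce(apply, events, {})


log = [CellSet("A1", 10), CellSet("A1", 10.5), CellSet("B2", 3), CellSet("A1", 11)]
print(replay(log))  # {'A1': 11, 'B2': 3}
```

Four events are stored to describe two cells, and every read replays all of them, which is where the temptation to cache comes from.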
I'm sure it's the right answer for some people, some of the time. In fact, I'm sure it's not applied as frequently as it should be. But it's definitely a specialized tool with special applications, where mutable state is, for better or worse, the hammer we can and ought to continue relying on.
> but if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding
I think there might be a balance here: you can always garbage-collect events that have already been merged into an updated value of the given object. Depending on the requirements, this GC can be done e.g. days or months after that merge...
Doesn't having an "updated value" imply mutable state? If I have some, why not have it all? Having both mutable state and a complex folding operation to maintain it is going to give me multiple sources for the same information. Which one will be authoritative? I'd expect a compromise to be worse than either extreme.
I assume you have 'events' that you will 'merge' together into a single state object (e.g. when you want to display something). So the operation is: fetch every related event, merge, display.
Now the 'folding' can be defined as snapshotting the 'merged state'. Instead of fetching 10 events, after the folding + GC you fetch e.g. 2 events plus the folded snapshot. You are saving some CPU and bandwidth over time, and that's it.
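A minimal sketch of that snapshot-plus-GC scheme, assuming events carry increasing sequence numbers and arrive in order (the `Store` class and event tuple shape are hypothetical). `compact` folds old events into the snapshot and drops them; `read` returns the same state before and after, it just touches fewer events.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Event = Tuple[int, str, float]  # (sequence number, cell, value); hypothetical shape


@dataclass
class Store:
    snapshot: Dict[str, float]  # folded ("merged") state
    snapshot_seq: int           # last sequence number folded into the snapshot
    events: List[Event]         # events not yet folded, in sequence order

    def read(self) -> Dict[str, float]:
        # Fetch the snapshot plus only the events after it, then merge.
        state = dict(self.snapshot)
        for seq, cell, value in self.events:
            if seq > self.snapshot_seq:
                state[cell] = value
        return state

    def compact(self, up_to_seq: int) -> None:
        # Fold events up to `up_to_seq` into the snapshot, then drop them (GC).
        for seq, cell, value in self.events:
            if self.snapshot_seq < seq <= up_to_seq:
                self.snapshot[cell] = value
        self.events = [e for e in self.events if e[0] > up_to_seq]
        self.snapshot_seq = max(self.snapshot_seq, up_to_seq)


store = Store({}, 0, [(i, "A1", float(i)) for i in range(1, 11)])
before = store.read()
store.compact(8)
assert store.read() == before  # same answer, fewer events to fetch
print(len(store.events))  # 2
```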
> The argument here is basically to replace mutable state with event sourcing. It's an interesting idea and sometimes the right one, but if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding.
Once an event is visible to all users, it can be merged into the base state of the system and no longer needs to be stored (or at least kept online) as a separate event. For the performance reasons you allude to, you probably want to do that as much as possible.
(This is still effectively an append-only series of immutable states, but losing, at least from regular on-line access, "older" states that no one can see anymore.)
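A sketch of "merge once visible to all", assuming a single totally ordered log and that each replica reports the highest sequence number it has acknowledged (the `acks` map and function names are hypothetical). Events at or below the minimum acknowledged position are folded into the base state; the rest stay online as separate events. Gathering those acknowledgements reliably is itself a coordination problem that this sketch simply assumes away.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, float]  # (sequence number, cell, value); hypothetical shape


def stable_cutoff(acks: Dict[str, int]) -> int:
    # An event is visible to everyone once every replica has acknowledged it.
    return min(acks.values())


def merge_stable(
    base: Dict[str, float], events: List[Event], acks: Dict[str, int]
) -> Tuple[Dict[str, float], List[Event]]:
    cutoff = stable_cutoff(acks)
    new_base = dict(base)
    remaining: List[Event] = []
    for seq, cell, value in sorted(events):
        if seq <= cutoff:
            new_base[cell] = value                # fold into the base state
        else:
            remaining.append((seq, cell, value))  # keep as a separate event
    return new_base, remaining


events = [(1, "A1", 10.0), (2, "A1", 10.5), (3, "A1", 11.0)]
acks = {"r1": 3, "r2": 2, "r3": 3}  # r2 hasn't seen event 3 yet
base, rest = merge_stable({}, events, acks)
print(base, rest)  # {'A1': 10.5} [(3, 'A1', 11.0)]
```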
There's the rub: any GC/compaction (and when you get down to it, any read) is actually a distributed consensus problem. If you're interested in this problem, you might take a look at the CRDT garbage collection literature for more details.
> if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding
Yes, I could see that getting prohibitively expensive when SSDs cost 70 cents/GB and hard drives 5 cents/GB. At those prices you should really throw out your historical data; it's probably not worth 5 cents per GB.
Personally, that's why I don't keep backups, files change all the time, and I was going broke making sure I had older copies of my data. I'd rather just rewrite all my code and retake all my pictures.
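For a rough sense of scale at the per-GB prices quoted above, here is a back-of-the-envelope estimate. The workload numbers (1 million events per day, 200 bytes per event, kept for 10 years) are made-up assumptions, and decimal gigabytes are used.

```python
EVENTS_PER_DAY = 1_000_000   # assumed workload
BYTES_PER_EVENT = 200        # assumed average event size
YEARS = 10                   # assumed retention

total_bytes = EVENTS_PER_DAY * BYTES_PER_EVENT * 365 * YEARS
gb = total_bytes // 10**9    # decimal gigabytes

hdd_cents = gb * 5           # 5 cents/GB
ssd_cents = gb * 70          # 70 cents/GB

print(gb)                        # 730
print(f"${hdd_cents / 100:.2f}") # $36.50
print(f"${ssd_cents / 100:.2f}") # $511.00
```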
Perhaps with 100s of TBs, but 100s of GBs is not a difficult problem to solve. Perhaps once you get past 100 TB you'd be beyond the capabilities of a single chassis.
Even a petabyte should fit in a rack or two.
I really fail to understand how a business could acquire that much data and not be able to sell it.
As the other commenters have pointed out, I'm pretty sure that you're supposed to compact old events once they have been determined to be eventually consistent amongst all (or almost all) of the nodes.