Great article on the fundamental problems associated with mutable state. The fun...

jerf · on Sept 27, 2013

Information theory does not have a problem, only, as you say, our universe. Mathematicians and their various hangers-on like programming language researchers often prefer to deal with models that have no concept of time, in which the very concept of "observer" is extraneous since there isn't really anything like a "point of view". Everything just... is.

Also, I think the focus on immutability misses the more interesting discussion about lattices and how important they are to distributed programming. Check out this video, which will take you up from the beginning: http://vimeo.com/53904989

(Seriously, if you do anything remotely distributed, this is required viewing. There's some serious stuff going on in the distributed world and this is a great intro.)
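
For anyone unfamiliar with the term, here is a minimal sketch of the lattice idea, using a grow-only counter as an assumed example (it is not taken from the video, and the names `join`, `increment`, and `value` are hypothetical). Each replica's state is a map from node to count, and merging two states takes the per-node maximum. Because that merge is commutative, associative, and idempotent, replicas can exchange states in any order, any number of times, and still converge.

```python
def increment(counter: dict, node: str) -> dict:
    """Return a new counter with this node's count bumped by one."""
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + 1
    return updated


def join(a: dict, b: dict) -> dict:
    """Least upper bound of two counters: the per-node maximum."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(counter: dict) -> int:
    """Total count observed across all nodes."""
    return sum(counter.values())


# Two replicas that have each seen different increments.
a = increment(increment({}, "n1"), "n1")   # {"n1": 2}
b = increment({}, "n2")                    # {"n2": 1}
merged = join(a, b)                        # same result as join(b, a)
```

Merging `merged` with `a` again changes nothing, which is what makes redelivered or reordered messages harmless.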

marshray · on Sept 27, 2013

The talk is getting interesting at 11 minutes in. But I can't find the slides anywhere and the video was edited by someone who likes to show the slide for 5 seconds and the person pointing at it for 2 minutes.

aphyr · on Sept 26, 2013

The questions of consensus and mutability are, to some extent, separable; immutable data does not provide liveness without consensus guarantees. See, for instance, Datomic's design, where a strong coordinator is needed to serialize updates to otherwise immutable/replayable state.

mtrimpe · on Sept 26, 2013

Indeed. Although the use of confluent data structures where possible, combined with immutable data stores and 'live' functional reactive programming across device boundaries, -would- drastically simplify cases like these.

fusiongyro · on Sept 26, 2013

The argument here is basically to replace mutable state with event sourcing. It's an interesting idea and sometimes the right one, but if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding. This is also going to yield performance problems I'll be tempted to avert with caching, leading to fantastically bad performance or cascading failures whenever the caches fail or are restarted.

I'm sure it's the right answer for some people, some of the time. In fact, I'm sure it's not applied as frequently as it should be. But it's definitely a specialized tool with special applications, where mutable state is, for better or worse, the hammer we can and ought to continue relying on.

agilord · on Sept 26, 2013

> but if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding

I think there might be a balance here: you can always garbage-collect events that are already merged in an updated value of the given object. Depending on the requirement, this GC can be done e.g. after days or after months of that merge...

fusiongyro · on Sept 26, 2013

Doesn't having an "updated value" imply mutable state? If I have some, why not have it all? Having both mutable state and a complex folding operation to maintain it is going to give me multiple sources for the same information. Which one will be authoritative? I'd expect a compromise to be worse than either extreme.

agilord · on Sept 27, 2013

I assume you have 'events' that you will 'merge' together in a single state object (in case you want to display something). So the operation is to fetch every related event, merge, display.

Now the 'folding' can be defined as snapshotting the 'merged state'. Instead of fetching 10 events, after the folding + GC, you will fetch e.g. 2 + the folded one. You are saving some CPU and bandwidth over time and that's it.
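
A minimal sketch of that fetch, merge, and fold cycle, assuming events are simple last-write-wins cell updates with a sequence number. The `Event` fields and the `merge` and `fold` helpers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int     # position in the append-only log
    cell: str    # which cell the user updated
    value: str   # the new value


def merge(state: dict, events: list) -> dict:
    """Apply events in log order on top of a base state, without mutating it."""
    merged = dict(state)
    for event in sorted(events, key=lambda e: e.seq):
        merged[event.cell] = event.value
    return merged


def fold(snapshot: dict, log: list, up_to_seq: int):
    """Fold events with seq <= up_to_seq into a new snapshot and drop them from the log."""
    old = [e for e in log if e.seq <= up_to_seq]
    remaining = [e for e in log if e.seq > up_to_seq]
    return merge(snapshot, old), remaining


log = [Event(1, "A1", "x"), Event(2, "B2", "y"), Event(3, "A1", "z"), Event(4, "C3", "w")]
full = merge({}, log)                     # fetch every event, merge, display
snapshot, tail = fold({}, log, up_to_seq=2)
same = merge(snapshot, tail)              # fetch the folded one plus 2 events
```

Reading from the snapshot plus the remaining tail gives the same view as replaying the whole log; the snapshot is derived from the events rather than being a second source of truth.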

dragonwriter · on Sept 26, 2013

> The argument here is basically to replace mutable state with event sourcing. It's an interesting idea and sometimes the right one, but if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding.

Once an event is visible to all users, it can be merged into the base state of the system and no longer needs to be stored (or at least kept online) as a separate event. For the performance reasons you allude to, you probably want to do that as much as possible.

(This is still effectively an append-only series of immutable states, but losing, at least from regular on-line access, "older" states that no one can see anymore.)
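
A rough sketch of that merge step, under the strong assumption that the set of replicas is fixed and that each one reliably reports the highest sequence number it has seen. The `acked` map and the function names are hypothetical; knowing that map accurately is the hard part in a real distributed system.

```python
def compaction_point(acked: dict) -> int:
    """Highest sequence number that every known replica has acknowledged."""
    return min(acked.values()) if acked else 0


def compact_acknowledged(base: dict, log: list, acked: dict):
    """Merge universally visible events into the base state; keep the rest as the log tail.

    `log` is a list of (seq, key, value) tuples.
    """
    cutoff = compaction_point(acked)
    new_base = dict(base)
    for seq, key, value in sorted(log):
        if seq <= cutoff:
            new_base[key] = value
    tail = [entry for entry in sorted(log) if entry[0] > cutoff]
    return new_base, tail


log = [(1, "a", "1"), (2, "b", "2"), (3, "a", "3")]
acked = {"r1": 3, "r2": 2, "r3": 2}       # r2 and r3 have not seen event 3 yet
new_base, tail = compact_acknowledged({}, log, acked)
```

Only events 1 and 2 are folded into the base state here; event 3 stays in the log until the slowest replica acknowledges it.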

aphyr · on Sept 27, 2013

> Once an event is visible to all users,

There's the rub: any GC/compaction (and when you get down to it, any read) is actually a distributed consensus problem. If you're interested in this problem, you might take a look at the CRDT garbage collection literature for more details.

pfraze · on Sept 27, 2013

But don't you buy yourself a stronger guarantee that all inputs have been collected by waiting to converge?

_3u10 · on Sept 27, 2013

> if each user action that triggered a one cell update becomes an event I have to keep forever, I see my database size exploding

Yes I could see that getting prohibitively expensive when SSDs cost 70 cents/GB and hard drives 5 cents/GB. You should really throw out your historical data at those kinds of costs, probably not worth 5 cents per GB.

Personally, that's why I don't keep backups, files change all the time, and I was going broke making sure I had older copies of my data. I'd rather just rewrite all my code and retake all my pictures.

buerkle · on Sept 27, 2013

Hardware cost is hardly the only cost. Managing a few GBs of data is quite a bit different than managing 100s of GBs or more of data.

_3u10 · on Sept 27, 2013

Perhaps 100s of TBs, but 100s of GBs is not a difficult problem to solve. Perhaps once you get to more than 100 TB you'd be beyond the capabilities of a single chassis.

Even a petabyte should fit in a rack or two.

I really fail to understand how a business could acquire that much data and not be able to sell it.

http://blog.backblaze.com/2013/02/20/180tb-of-good-vibration...

Here ya go, 180 TB for $10K in 4U, which means 10 to a rack which means 1.8 PB per rack. Who has 180 TB of database that isn't worth $10K?
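
The arithmetic behind those figures, as a quick check. It assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) and takes the 10-units-per-rack figure from the sentence above; the variable names are only for illustration.

```python
TB_PER_UNIT = 180          # capacity of one 4U unit
USD_PER_UNIT = 10_000      # quoted price of one unit
UNITS_PER_RACK = 10        # ten 4U units, i.e. 40U of rack space

tb_per_rack = TB_PER_UNIT * UNITS_PER_RACK           # 1800 TB
pb_per_rack = tb_per_rack / 1000                     # 1.8 PB
usd_per_gb = USD_PER_UNIT / (TB_PER_UNIT * 1000)     # roughly 0.056 USD per GB

print(f"{pb_per_rack} PB per rack, about {usd_per_gb * 100:.1f} cents/GB")
```

That works out to roughly 5.6 cents per GB of raw capacity, hardware only.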

coherentpony · on Sept 27, 2013

Hi, I work in computational fluid dynamics.

buerkle · on Sept 27, 2013

Again, you need to look at more than the cost of hardware; that's not the issue. More data requires more managing: performance, backup, failures, etc.

fusiongyro · on Sept 27, 2013

Hi, I work in astronomy.

oakwhiz · on Sept 27, 2013

As the other commenters have pointed out, I'm pretty sure that you're supposed to compact old events once they have been determined to be eventually consistent amongst all (or almost all) of the nodes.