Everything You Know (about Parallel Programming) Is Wrong

fmstephe · on March 27, 2012

My reaction to this is that I am sure he is right that embracing nondeterminism and relaxing synchronization will allow us to scale up to these many core PCs. However, this does not solve the problem of how 'the rest of us' program them effectively.

One of the biggest challenges I see practically are that concurrency and parallelism are the hardest thing to push down under a layer of abstraction. These effects leak out all the way through your application (and even into other applications).

It strikes me that this kind of non-deterministic approach will leak everywhere. And then who can program in this way? Who among us can reason effectively about just how far from the true answer we are. If this is the only way we can really harness the many-many-core PCUs of the future then we will have the change the way we make software. The IBM approach of armies of programmers will not work. As much as I hate this approach to software it is the way most software (at least in the enterprise) is made. Will we go back to the long beards? Or will we end up in even more aggressively restrictive frameworks that work harder to prevent us from doing this wrong.

If this approach to parallelism turns out to be important I hope that the problem of reasoning about these types of program is tractible. To push well understood software development even further away from the reach of most programmers. If we all end up having to learn a bit of statistics that wouldn't be a bad thing.

yummyfajitas · on March 27, 2012

When you write concurrent software, you typically build two kinds of components.

The first are the cogs. These are typically either immutable data structures, or mutable components designed to be accessed only from a single thread. The second is a concurrent abstraction which sits on top of the cogs.

For example, in my project Hobo, I've got an immutable, read-only index with procedural, deterministic methods. Then sitting in front of it I've got a TThreadPoolServer which handles multithreaded access.

Single threaded: https://github.com/stucchio/Hobo/blob/master/src/org/styloot...

Library handles multithreaded part: https://github.com/stucchio/Hobo/blob/master/src/org/styloot...

Another thing that will work for data where contention is rare is software transactional memory. Most programmers are already familiar with SQL transactions, I don't see a reason they can't use it in memory as well.

fmstephe · on March 27, 2012

"The first are the cogs. These are typically either immutable data structures, or mutable components designed to be accessed only from a single thread. The second is a concurrent abstraction which sits on top of the cogs."

This is a very good illustration of how concurrency potentially leaks out across an entire application. You can see here that your concerns for concurrency appears in the cogs. It is certainly true that you have provided a layer of abstraction but only for a certain concurrency model, and your cogs will reflect that in their implementation.

I would also note that I wouldn't claim that all concurrency problems pose these kinds of problem. Graphics pipelines being the easiest (and best abstracted) example. But in general for the day to day code we write, concurrency concerns have a habit of being very hard to tidy away.

For a really good treatment on the virtues and limitations of transactional memory there is a great conversation between Rich Hickey and Cliff Click on the clojure mailing list. I can't find the original, but here is a good cleaned up summary of the exchange.

http://blogs.azulsystems.com.sharedcopy.com/cliff/2008/05/97...

yummyfajitas · on March 27, 2012

Concurrency does leak out and influence design decisions, most notably the API of the rest of the app. This is a given and there is nothing to be done about it.

But the point I'm making is that once the API is fixed you get to program more or less normally. Look at the code I linked to - it's all standard procedural java. Some smart guys at Facebook developed the concurrency system. I did some ordinary programming subject to one additional constraint (immutability).

I imagine this is how the future will look - top programmers build APIs, and the army of programmers will build many immutable/transactional/etc subcomponents. With language support (such as what Haskell/Clojure already give you) this will be very easy to enforce - IBM's army of guys who can't handle concurrency will stick to writing pure functions.

vajrabum · on March 27, 2012

Yes, and the way I understood the article is that the problem--not for the next generation of chips but two generations down the line--is that cog architecture scales up to 10s or 100s of processors but not thousands due to synchronization delay. That I believe is what David Ungar is attempting to address in his research.

swah · on March 27, 2012

Thanks guys, this kind of discussion is why I open HN every day for the past few years.

jerhewet · on March 27, 2012

> The first are the cogs.

Would appreciate it if people around here (and elsewhere) would stop acting like the word "cogs" is universally recognized, like "ram" or "dos". I have no clue what "cogs" is / are, and the search engines aren't helping to make me any wiser (or at least informed).

lambda · on March 27, 2012

Cogs are the teeth on gears. Metaphorically, they are used to refer to one small part of a machine, which are not useful on their own, but in aggregate can get useful work done. The term is sometimes applied in a derogatory fashion to people, implying that they are nothing more than a replaceable cog in a large machine.

mooism2 · on March 27, 2012

> And then who can program in this way? Who among us can reason effectively about just how far from the true answer we are.

If floating point math is anything to go by, most programmers won't be able to cope with this. (Then again, most languages do nothing to help programmers keep track of how large the inaccuracies get.)

fmstephe · on March 27, 2012

An interesting point. I suppose if this becomes a really important technique we may see some very interesting language support. Fun times :)

riffraff · on March 27, 2012

Please, if you are going to be astonished and or irritated about the bit in TFA related to "non determinism", consider that, in Ungar's own words:

""" To the extent that an application remains acceptable to its end users with respect to their business needs, we aim to permit – or no longer attempt to prevent – inconsistency relating to the order in which concurrent updates and queries are received and processed """

This is something many people are already familiar with: you can giveup some correctness or timeliness, if it doesn't matter much. It's what's behind eventual consistency, yesterday's youtube architecture article, redis' nofsync settings etc.

[1] http://soft.vub.ac.be/~smarr/renaissance/ungar-kimelman-adam...

varelse · on March 27, 2012

While I absolutely agree that we have to give up on nondeterministic execution (did so years and years ago), I flat out disagree that this necessarily leads to nondeterministic output/results. That's just lazy and a nightmare to debug.

Speaking from my own experience, there's usually a way to order the results either by sorting, by accumulating them in a deterministic manner (extended fixed-point math or by ordered reduction), or by halting execution upon reaching a consistent degree of convergence.

And we already know one way to program such beasts: in a map-reduce like manner where chunks of computation are doled out to all the cores, each executed deterministically within them, but doled out in a nondeterministic order.

6ren · on March 27, 2012

more discussion on reddit: http://www.reddit.com/r/programming/comments/reo77/everythin...

interview with author (about the talk): http://channel9.msdn.com/Blogs/Charles/SPLASH-2011-David-Ung...

the project homepage: http://soft.vub.ac.be/~smarr/renaissance/

drostie · on March 27, 2012

This is very interesting -- is there any hope to get video of the talk?

mooism2 · on March 27, 2012

When is this from? It talks about "the end of the first decade of the new century" as if it's not in the past.

DanielBMarkham · on March 27, 2012

So quantum computing says that instead of one calculation, I perform multiple calculations simultaneously in different universes. Now this guy says that I have to give up the idea of computing as being deterministic and always getting the right answer?

It has to be true, because it makes no sense whatsoever. Web designers of the future will have a great reply to customers. "The site gives wrong answers? Well, you need to realize that asking for correct answers isn't what we do with computers. And besides that, in another universe the answers are correct. Boy, are you really happy over there!"

That was perhaps a too-long-winded snark with a point: it's not that I don't feel that we're headed in the right direction, it's that visionaries always seem to concentrate on the brain-exploding part of what they're saying instead of talking about how it all integrates together. While I think this might sell some website views, it doesn't do much for the practitioner looking for insight.

GeoffWozniak · on March 27, 2012

Taking into account the snark, it doesn't change the fact that you're presenting a false dichotomy.

Ungar isn't saying that we have to give up determinism. What he's saying is that to take advantage of massively parallel systems, determinism comes at a high cost.

Clearly, we can write programs on current architectures that are deterministic (well, perceived as such, at least) and we don't have to forgo that. But there is no reason to believe it has to be the only architecture. Ungar is looking at what happens when you try to do computation in a highly networked environment with low latency (like say, a brain).

Also, if you don't see it helping for the practitioner, perhaps practitioners aren't asking good (or enough?) questions.

hessenwolf · on March 27, 2012

Determinism comes at a high cost? So the more processors we add, the more our computers become like us, very smart but not entirely accurate?

brazzy · on March 27, 2012

More like: the more processors we add, the more performance can be gained by switching to nondeterministic algorithms for problems where getting a solution that is within 99% of the perfectly correct one and arrives in 1 second is better than getting a perfect solution in 2 hours.

azim · on March 27, 2012

I like the way you've phrased this, so let me give a contrived but quasi-realistic example: I have a network of nodes which calculates and returns some floating point value. I then accumulate those in the arbitrary order in which they come back from in the network. (According to Google) 1111.0000000000001 * 2 * 1111.00000000000001 = 2468642 BUT 1111.0000000000001 * 1111.00000000000001 * .2 = 246864.2. By using floating point we've already decided it's OK to give up some precision, however by applying the commutative property of multiplication now we've also given up determinism.

JoeAltmaier · on March 27, 2012

Ungar could have said what you said, but didn't. Clearly some gee-whiz marketing going on.

And no, practitioners shouldn't have to ask the question "what good it is?" Every research result begs that question.

mseebach · on March 27, 2012

As I understand it, the alternative to giving the correct answer isn't giving a wrong answer - it's giving an answer that's right to a high - but not 100% - degree of confidence.

There are plenty of domains where a very fast almost-correct answer is preferable to a perfectly correct, slower one.

Take search, for an example. First off - it's very hard to determine an exact definition of the right answer. If the provided answer was good enough, it was right.

Another domain is route planning, especially with multiple stops in arbitrary order (travelling salesman). There are plenty of applications where the 99% answer in one second is vastly preferable to the 100% answer in an hour.

bnegreve · on March 27, 2012

An algorithm can be indeterministic and still have a determined outcome. Actually many parallel algorithms already are very indeterministic since the computations when they perform depend on the thread scheduling.

dxbydt · on March 27, 2012

The family of problems where computing is non-deterministic and there is no single right answer is very large. Many gossiping algorithms belong to that family, as do several finance problems with an underlying stochastic model. Here's a very practical problem that I solve everyday that can benefit hugely from Ungar's model of nondeterministic concurrent computation : a bank makes n loans, with each loan having a different probability of default. The loans are all connected by a covariance matrix ie. given any two loans (p,q), we have a best guess of the chances of p defaulting if q defaults. At the end of 1 year, m of those n loans will default. But today, right now, the bank has to submit to the Fed its best guess of how much money it might lose at the end of next year! In finance, we call this "estimation of the conditional value at risk" aka cvar estimation. There is no right answer, m and n are in the millions, the covariance matrix is huge, the number of possible losses is scala.math.pow(2,m) which is essentially un-computable for even a very small m like 100 loans, so it is out of the question if m is millions of loans. Yet I perform this very calculation using a modest 100,000 simulations ( via Actors in Scala, 1 Monte Carlo simulation becomes 1 Actor iterating over a list of loans & computing their losses concurrently ). While I sort of converge to an answer, the computation is non-deterministic, there is no real right answer, and Ungar's model would be a clear win for this problem.

fmstephe · on March 27, 2012

I think it is fair to say that this approach only works in some domains. We won't be using it where giving the 'wrong' answer upsets the people with the money. But if you listen to the channel 9 interview put up by 6ren he clearly outlines a number of scenarios that are appropriate.

<not-snarky>It isn't clear what quantum computing has to do with this discussion.</not-snarky>

Edit: Mitigate snarkiness of final comment.

DanielBMarkham · on March 27, 2012

"...It isn't clear what quantum computing has to do with this discussion..."

My entire point was that the way we describe future tech emphasizes the differences and exotic nature of the concepts instead of how it's all going to eventually work together anyway. Quantum computing was my second example.

Apologies if I didn't make that clear enough.

fmstephe · on March 27, 2012

I agree with your sentiment that it isn't clear how this will fit into software development in general. Fair enough about the quantum comment. Also, my comments sounds snarkier that it was intented, so apologies for that :) (what a polite bunch we all are)

stavros · on March 27, 2012

Don't think of it as a quantum CPU. Think of it as a quantum coprocessor that the CPU uses to accelerate some operations.

moonchrome · on March 27, 2012

OK how is this different from distributed computing ?

robfig · on March 27, 2012

I was wondering the same thing. Clearly Google et al has some success with parallel computation without giving up determinism.

Thinking about how the cloud model (collaborating services via RPC) maps to a single machine -- I guess that looks like a actor / message passing model where each process gets its own core or something.

Plenty of programming languages make this sort of thing straightforward -- (I'm most familiar with Go, where it is absolutely trivial).

To me, it looks like the industry is actually very well poised to handle to multi-core onslaught, without requiring some kind of fundamental rethink like nondeterminism.

onemoreact · on March 27, 2012

Google does give up on determinism when they can. Search is a great example, ask 1,000 machines to give their top 10 results and if 5 don't respond in time your still 99.5% sure that the top result is still the 'best' one. And even if you did lose your best result changes are the rest of the list is still going to be good enough.

scott_w · on March 27, 2012

Because in distributed computing, the communication lag is more noticeable. It is also harder to synchronise distributed nodes. Multiple cores on a single machine will probably share read-only resources e.g. the clock.

moonchrome · on March 27, 2012

Sure, but that's actually a positive and allows special case optimizations of the existing algorithms. My point is this certainly doesn't mean that everything we know is wrong, in fact we have been dealing with this class of problem "in the wild" for last 5-10 years with cloud/cluster computing and GPGPU. Also I feel this "many core is just around the corner" is a lot of hype and little actual results considering that Intel was announcing 70 core CPU by 2011 back in 2007 and we barely hit the 16 cores with AMD. There are some exotic CPUs and GPUs but nothing mainstream, desktops are still at 4-8 core ceiling, mobile is the same. Hopefully SOC GPUs with OpenCL support become mainstream, that will allow interesting things in the mobile space with, but beyond that I don't see this trend hitting the industry any time soon, eg. in 5+ years.

alexchamberlain · on March 27, 2012

I disagree with a couple of points...

First of all, even handhelds will have thousands of CPUs.

Second of all, it's not necessary to scale to this level. Rather, we will run more concurrent, independent tasks. Servers with thousands of workers will be common at home.

Someone · on March 27, 2012

What would those thousands of workers do? If they are independent of each other, why put them all in a single server instead of in an ASIC? If they are dependent of each other, why will thousands of thread/processes be the architecturally best choice on at server?

alexchamberlain · on March 27, 2012

A web server, for example.

A server would be cheaper?

varelse · on March 27, 2012

It's only cheaper if the underlying memory system has enough bandwidth to handle the throughput of all those cores at once. Otherwise, a fraction of the cores end up as dead-weight, which is surprisingly common in x86 systems set up like this.

alexchamberlain · on March 27, 2012

Does that include I/O over a network?