Why Johnny Can’t Write Multithreaded Programs (smartbear.com)
82 points by Baustin on Dec 10, 2013 | 89 comments


> If you need a lock, then you probably have global mutable state that has to be protected against concurrent updates. The existence of global mutable state indicates a flaw in the application’s design, which you should review and change.

No, no, no. If you are writing multithreaded programs, you _will_ have locks, period. The question is whether you explicitly use them or if they are hidden somewhere in your libraries, frameworks, etc. Actually, the latter are much more difficult to debug. Most of us have probably seen situations where deadlocks or data corruption have occurred because of some extremely complicated interaction of message pumps - UI thread - callbacks - async operations - etc., without any explicit locking.

> Perhaps most importantly, modern programming languages and libraries make it easy to create a producer-consumer application.

Yes, that's great, until the deadlock/corruption/indeterminism problem comes back again at a different abstraction level.

There is _no way out_. If you are writing concurrent apps, you have to understand what's happening. By concurrent I mean concurrent at the level you are working on. Using a parallel 'map' operation is different, because there the concurrency is an implementation detail from your perspective. But if you are writing a massively concurrent quoting/trading engine, then you have to know what's happening. And if you do, then you're free to use locks, executors, async, whatever -- it's just a tool to solve the problem.

...

The more I look at this article the more ridiculous it seems...


> No, no, no. If you are writing multithreaded programs, you _will_ have locks, period. The question is whether you explicitly use them or if they are hidden somewhere in your libraries, frameworks, etc. Actually, the latter are much more difficult to debug.

You say that and then proceed to call the article ridiculous.

Well, I write and maintain large applications in Clojure. The code uses a multitude of threads for various purposes, with different lifetimes. And I've yet to see a single explicit lock. What's more important, though, is that I've yet to see a race condition, a deadlock, or data corruption, which I find pretty impressive.

I do agree that you need to understand what is happening — but I disagree with your assumption that the basic tools are all on the same level. They are not, which Clojure clearly demonstrates. I won't go into details, because Rich Hickey explains it much, much better than I ever will be able to — here's an example of his talk: http://www.youtube.com/watch?v=dGVqrGmwOAw


I'll definitely watch the video and I have to say I don't know Clojure very well.

I think we need to distinguish between in-app race conditions (a primitive example is when you're trying to modify data that cannot be written atomically), and situations where you have to maintain some order of operations because if you don't, then the program won't work correctly _from the end-user's standpoint_. In concurrent programming, safe operations are not composable, i.e. if you have more than one thread-safe operation, using those operations in parallel won't be thread-safe by itself.
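To make the non-composability point concrete, here's a toy sketch in Python (all names made up): a counter whose get() and set() are each individually thread-safe, yet whose composition is not.

```python
import threading

class SafeCounter:
    """get(), set() and add() are each individually thread-safe."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def get(self):
        with self._lock:
            return self._value

    def set(self, v):
        with self._lock:
            self._value = v

    def add(self, n):
        # The composed operation made atomic: read-modify-write under ONE lock.
        with self._lock:
            self._value += n

counter = SafeCounter()

def racy_increment():
    # Two individually thread-safe calls composed WITHOUT a lock around them:
    # another thread can run between get() and set(), so updates can be lost.
    counter.set(counter.get() + 1)

def safe_increment():
    counter.add(1)

def run(fn, n_threads=4, n_iter=5000):
    threads = [threading.Thread(target=lambda: [fn() for _ in range(n_iter)])
               for _ in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    return counter.get()

counter.set(0)
racy_total = run(racy_increment)   # often below 20000: increments were lost
counter.set(0)
safe_total = run(safe_increment)
print(safe_total)                  # always 20000
```

The racy version is built entirely out of thread-safe pieces, which is exactly why it's deceptive.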

What I tried to say is that as you increase the level of abstraction (i.e. build something useful using the great constructs of Clojure), if concurrency still exists at that higher abstraction level, then you need to think about synchronization again. And the problem you'll face is really similar to the low-level issues you hit when you try to protect a 64-bit value in C. That's why I encourage everybody to think about these simple toy problems, even if they never come up explicitly when you write programs using modern tools.


The reason threading is much easier in Clojure is due to immutability. Any changes made to data happen via revisions on existing data. This also means that any changes are inherently scoped to the appropriate code branch by default.

Shared mutable data is handled by the STM in Clojure. The STM, in turn, relies on immutable data structures to work. Since data is immutable you can have optimistic locking. Reading existing state is perfectly safe even when a new state is being generated.
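A toy sketch of that optimistic-update idea in Python (made-up names; real Clojure atoms and its STM do considerably more): readers never block, and writers retry their pure function against immutable values.

```python
import threading

class Atom:
    """Toy version of a Clojure atom: optimistic updates over immutable values."""
    def __init__(self, value):
        self._value = value               # always an immutable value, e.g. a tuple
        self._lock = threading.Lock()

    def deref(self):
        return self._value                # reads never block, never see partial state

    def _compare_and_set(self, old, new):
        with self._lock:
            if self._value is old:        # nobody beat us to it
                self._value = new
                return True
            return False                  # lost the race: caller retries

    def swap(self, fn):
        while True:                       # optimistic retry loop
            old = self.deref()
            new = fn(old)                 # pure function of the old value
            if self._compare_and_set(old, new):
                return new

a = Atom((0,))

def bump():
    for _ in range(1000):
        a.swap(lambda t: (t[0] + 1,))     # builds a new tuple; old value untouched

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(a.deref())                          # (4000,)
```

Because values are immutable, a reader holding the old tuple is always safe while a new one is being generated, which is the property the parent comment describes.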

This article http://blog.enfranchisedmind.com/2009/01/the-problem-with-st... does a great job explaining why it's notoriously difficult to make a working STM in an imperative language.


> I think we need to distinguish between in-app race conditions (a primitive example is when you're trying to modify data that cannot be written atomically), and situations where you have to maintain some order of operations because if you don't, then the program won't work correctly _from the end-user's standpoint_. In concurrent programming, safe operations are not composable, i.e. if you have more than one thread-safe operation, using those operations in parallel won't be thread-safe by itself.

That's true. I also agree that regardless of the language or libraries used one needs to be aware of concurrency issues and understand them. The world is concurrent and there is no way around that.

What I took issue with was the statement that you will have locks and even if they are in your libraries, they will come back to bite you. That is simply not true in case of complete languages/environments designed for concurrency. Clojure is one I know — and it isn't just a "library" that helps, it's the combination of language design, functional approach, data structures, immutable data and software transactional memory.

Incidentally, the words "thread-safe" rarely come up if you write in Clojure. I would imagine it is similar in Erlang.


I guess that points out another issue with the article: it talks about programming in general and doesn't take into account that it's way easier to write multi-threaded applications in languages explicitly created for developing multi-threaded applications than in languages that just happen to support threading.

So if you compare, for example, Clojure and C++ with regard to multi-threading, Clojure will of course win, since C++ only recently got official standardized threading and is still far away from really good concurrency support.


C++ will, of course, win the moment you let 3rd party libraries in the door. Oh, sorry... that's just another vague generalisation.


Do your large applications use databases, the filesystem, console I/O, etc?


Yes, very much so.


Perhaps you might appreciate that there is a wide spectrum of problems that might require parallelism, from dramatically simple enterprise apps to "massively concurrent quoting/trading engines".

This advice is written for the 90% of use-cases that can be solved with simple abstractions, where programmers are still spinning up threads and locking objects. Most application programmers don't do manual memory management, low-level network programming, graphics, and so on. Why should parallel programming be any different? That is the point I believe the author is making, and I happen to agree.


On the other hand, there is a certain undercurrent to this and many similar posts that I've seen, something along the lines of "Because I don't need to do it, no one needs to." Or, in this case, "Synchronization primitives are deep, dark, evil voodoo magic and you'll never be beardy enough to do it right." Such as:

"Locks and other synchronization primitives are systems level constructs. People who know a lot more about multithreading use those constructs to build concurrent data structures and higher level synchronization constructs that mere application programmers like you and I use in our programs."

...which is probably true, but there is nothing scary about "systems level" stuff; even the author could probably tool up to do it.

That becomes a problem in the 10% of use-cases (which I suspect are more frequent than that) where the simple abstractions are not satisfactory.


"If you are writing multithreaded programs, you _will_ have locks, period."

Not necessarily. In scientific computing, it is common to encounter "embarrassingly parallel" scenarios where each thread can solve equal-sized chunks of the problem, and you can wait till all threads have terminated to collect results.

edit: Okay, the final wait is usually implemented with a lock under the hood. Nevertheless, this type of program is straightforward to reason about.
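A minimal sketch of that embarrassingly parallel shape in Python (the workload is a made-up stand-in): each worker touches only its own chunk, and the only synchronization is the implicit wait for all results at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(chunk):
    # Each worker reads only its own chunk: no shared mutable state at all.
    return sum(chunk)

data = list(range(1000))
chunks = [data[i::4] for i in range(4)]   # four disjoint, equal-sized slices

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(chunk_sum, chunks))  # map plus an implicit join

total = sum(partials)   # collect results only after every worker has finished
print(total)            # 499500
```

The final wait is indeed a lock/condvar under the hood, but nothing in the user-visible code needs one, which is why this style is so easy to reason about.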


Here is where you need to be really careful. How do you know all the threads have terminated? That's communicated through a lock in the underlying implementation.

Certainly in some systems (such as Java on multiple CPUs with independent caches) that use thread-local working areas, you cannot rely on a task's completion meaning that you can actually use the data it has produced, until you actually make use of a lock. The lock ensures that the local copy is synchronised with the central copy.

There's plenty of opportunity for fun if you don't use locks in some form or other in multi-threaded systems.
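A sketch of that handoff in Python (the GIL hides the cache-coherency side that Java exposes, but the discipline is the same): the join() is the synchronization point that makes the worker's writes safe to read, exactly the "lock in some form" being described.

```python
import threading

result = []                # written by the worker, read by the main thread

def worker():
    result.append(42)      # produce data on the worker thread

t = threading.Thread(target=worker)
t.start()
# Reading `result` right here would be a race: the worker may not have run yet,
# and in a Java-like memory model its writes might not even be visible.
t.join()                   # the hidden lock: after join() returns, the worker's
                           # writes are guaranteed complete and visible
print(result[0])           # 42
```

Without the join (or some equivalent lock/barrier), "the task finished" and "I can safely read its output" are two different claims.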


And of course that 'wait' operation does not involve locks and/or condvars.

Oh wait.


Who cares how it's implemented? Manually using a lock is a smell in the same way that goto is a smell. Waiting for a task to finish is not a smell, in the same way that while is not a smell.


"Manually using a lock is a smell in the same way that goto is a smell."

The presence of a lock tells you nothing about whether a program is well-designed. In fact, the whole idea that you can argue about the correctness of a program from the presence or absence of "code smells" is absurdly simplistic. I have seen plenty of bad code littered with while statements.


Code smells (I don't like the name either) are just warning signs. They do not necessarily indicate an actual problem.

Incorrect use of abstraction levels, such as using tools that are too low-level and error-prone for the task at hand (as in the majority of business applications) IS one of those warning signs. It usually indicates a conceptual problem, and makes the code needlessly complex.


There isn't really that big of a difference between a lock and a wait, which is the point that was being made. Either way it will "lock" up the thread until condition X is fulfilled, and since it depends on other threads to finish, it's still very possible to get deadlocks and the like.

Besides that, depending on the task, it's very important how something is implemented.


That can just as well be implemented by message passing, which can happen through a socket or whatever.


And if both threads are doing a blocking read waiting for their message? Blocking locks, deadlocks, endless fun...

But wait, you don't use blocking-IO, you poll the sockets to see if there is data available, right? Do you realize what you are polling? (hint, it is global and mutable)


Surely the wait is a kind of a lock?


"Nevertheless, this type of program is straightforward to reason about."

This type of program is an uninteresting corner case that tells us nothing about the more general problem of reliably writing concurrent software.


"...No, no, no. If you are writing multithreaded programs, you _will_ have locks, period. The question is whether you explicitly use them or if they are hidden somewhere in your libraries, frameworks, etc...."

That is not true. Lock-free data structures exist and are applicable to many multi-threading use cases. For a super primer see https://en.wikipedia.org/wiki/Non-blocking_algorithm and if you are really interested, read about Transactional memory - http://dl.acm.org/citation.cfm?id=165164, and then "The Art of Multiprocessor Programming" (it is a text book) by Herlihy and Shavit.


I have been following transactional memory since 2004 or something like that. It's great, as it solves the locking problem at that specific abstraction level. It does not solve the general problem of synchronization, though. See my other comment in this thread. The locking problem != making sure that a 64-bit value won't be left in an inconsistent state.


You have followed it longer than I have then. My gut reaction is more to the universal nature of your claim that I quoted. It is a goal to have practical large systems that need no locks. Current operating systems in production may not be there yet, but it is not proven that it is impossible to create a system that is entirely lock free.


Reasoning about the correctness of lock-free concurrency is much more difficult than doing the same for sequential algorithms, and gives the lie to the original author's claim that successful concurrent programming is just a matter of avoiding mutable global state.


The quote seems to use "lock" to mean any locking provided by your primitives; he probably includes the atomic instructions too, since they have locking in hardware.


> If you need a lock, then you probably have global mutable state that has to be protected against concurrent updates. The existence of global mutable state indicates a flaw in the application’s design, which you should review and change.

There's a grain of truth in this one. A lot of people think that multithreaded programming is like single threaded programming with locks added. This is not the case.

In my experience, with almost every mutex there should be one or more condition variables that are used to signal various things. A single mutex is almost never enough to achieve anything meaningful, because for every state you mutate, there's probably another thread waiting for that state to be mutated.
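A minimal sketch of that mutex-plus-condition pairing in Python (a hypothetical one-slot mailbox): the mutex protects the state, and the condition variable is what lets the other thread wait for the state to be mutated.

```python
import threading

class Mailbox:
    """One-slot handoff: a mutex alone couldn't express 'wait until non-empty'."""
    def __init__(self):
        self._lock = threading.Lock()
        self._nonempty = threading.Condition(self._lock)  # condvar tied to the mutex
        self._item = None
        self._has_item = False

    def put(self, item):
        with self._lock:
            self._item, self._has_item = item, True
            self._nonempty.notify()          # signal: the state you wait on changed

    def take(self):
        with self._lock:
            while not self._has_item:        # always re-check the predicate in a loop
                self._nonempty.wait()        # atomically releases the mutex and sleeps
            self._has_item = False
            return self._item

box = Mailbox()
out = []
consumer = threading.Thread(target=lambda: out.append(box.take()))
consumer.start()
box.put("hello")
consumer.join()
print(out[0])  # hello
```

Note the while loop around wait(): condition variables permit spurious wakeups, so the predicate, not the wakeup, is the source of truth.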

If you're working with a language or a framework that has some kind of queues and channels (like Go or Haskell, for example), they are probably more useful for lots of tasks than locks and conditions. Of course, someone has implemented those queues using locks and conditions, or atomic operations.


I can only agree here, though I think the article would be much better if the author had clearly specified his universe of applications.

> We move data around, transform it, perhaps do some calculations from time to time, and finally store the results in a database or display them on the screen.

If it's just limited to this, then I could partially agree. But I can't agree at all to the following statement:

> There are difficult multithreading scenarios, just as there are difficult scenarios in the single-threaded world. But those are relatively rare.

Adding some "magic" patterns and guidelines won't make synchronization issues go away.


It's worse than that. On commodity hardware you can't prevent having global state, because the processor is making those choices for you. L2/L3/main memory are all mutable global state that your data gets put into. Of course, modern processors are pretty good at making sure they don't corrupt that global state, but drill down far enough and you have global mutable state that needs to be synchronized around.

That said, if you aren't at that level, you can almost certainly write lock-free code (and you should) by applying design principles.


Isn't this basically what TFA is saying? Who is talking about the hardware level?


> There is _no way out_. If you are writing concurrent apps, you have to understand what's happening.

Actually, there are a number of best practices that help you avoid problems. For example, it helps to have a strict ordering of locks, i.e. everyone who locks objects A and B must do so first on object A. As a corollary, you should not call a callback or notify a listener while holding a lock, because you don't know what it will do. These two simple rules together already help a lot, even without understanding every possible state.
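A sketch of the lock-ordering rule in Python (made-up example): both code paths acquire A before B, so the classic deadlock interleaving (one thread holds A and wants B while another holds B and wants A) cannot occur.

```python
import threading

lock_a = threading.Lock()   # the global rule: everyone takes A before B
lock_b = threading.Lock()
done = []

def path_one():
    with lock_a:
        with lock_b:
            pass            # touch both shared resources

def path_two():
    # This path conceptually cares about B first, but it STILL takes A first.
    # If it took B then A while path_one took A then B, the two threads could
    # deadlock; a strict global ordering makes that interleaving impossible.
    with lock_a:
        with lock_b:
            pass

def worker(fn):
    for _ in range(1000):
        fn()
    done.append(fn.__name__)

threads = [threading.Thread(target=worker, args=(fn,))
           for fn in (path_one, path_two) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()   # all four finish: no deadlock by construction
print(sorted(done))
```

The rule costs nothing at runtime; it's purely a discipline about the order of acquisition.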


> No, no, no. If you are writing multithreaded programs, you _will_ have locks, period

It's clear from the article the author means low-level locking primitives.


To me, saying "Multithreaded programs are not difficult, you just have to know how to do them" is extremely similar to "Multithreaded programs are difficult, and you have to know how to do them".

I've done a lot of multithreaded programs, and, yep, you have to use some good practices to avoid making the code explode. You know, having to learn them and stuff is why they are "difficult". Just because you know how to get something done does not make that "easy". That's what experience is for.


His argument for "if you use a lock you did something wrong" seems to presuppose that somebody else created handy concurrent data structures for you. In other words, he admits that multithreaded programming is hard, so you should make it Somebody Else's Problem. I don't think that's a very strong argument.


Half the world seems to operate on the principle that this sort of stuff is SEP these days, and then sneer at the people who are solving it for being so low-level and not at all modern...


A different take on this: we have a small team. We are solving in a hard problem. Other people have solved this other hard problem for us. Let's focus on doing our thing very well, and not spend our limited cycles on redundancy.

I like to drive the point home with this (slightly hyperbolized) statement: X already does this, and those folks need it to be good if they want to eat, so why the heck are we doing it ourselves?


Agreed, code re-use is a very valuable and great thing. Where there are well-tested and well-proven technologies that do what you're trying to do, use them (if the license fits, it's not too expensive, etc.).

But someone has to do the 'hard' parts, and do them well, and I don't see why competent tech folks shouldn't at least be interested in that.


Exactly. Things like this just deter people from becoming interested in "low level" things like compilers, runtime systems, databases, etc... and maybe becoming active researchers in these areas.


Structured programming is not hard - you just use someone else's implementation of loops and functions and classes. If everyone tried to build those themselves from low-level primitives, structured programming would be seen as just as hard as multithreaded programming currently is.


False analogy. For one thing, structured programming covers control flow and not just data structures. For another, things like for loops and functions have been in every language for decades. Processors are even designed to support them. Concurrent data structures are nowhere near that level of maturity or ubiquity. Many people still work in languages and environments where they can't reasonably be assumed. Even if the OP's argument is valid within some specific context, it's still being oversold as general advice.


Exactly.

Unless the language/framework/library has automatic serialization and copying of objects, or automatic pointer ownership transfer when put in these magic queues, the problem is still there.


On the subject of multithreading, I like this one by Ned Batchelder[1]:

Some people, when confronted with a problem, think, "I know, I'll use threads," and then two they hav erpoblesms.

[1]: http://nedbatchelder.com/blog/201204/two_problems.html


Personally I've found that learning a functional programming language, with strictly enforced 'no side effects' functions, has simplified multi-threaded programming for me. I just remember back to those restrictions and force myself to follow them, whatever the current language allows.

For reference, the language was Erlang, which I believe has the best handling of threading that I've encountered so far.


Agreed! My mind on concurrent programming is far away from the usual nightmare stuff because I just think at the level of immutability, MVars, OTP, STM, LVish. You have to simplify your computational semantic domain before it's easy to augment it with nice abstractions for concurrency.


Side-effect free functions work great for algorithms which can be processed in parallel. Such functions tend to indicate good software design. However, unlike parallel processing, concurrent programming by definition assumes that each process will need to share access to resources. I don't believe side-effect free functions alone will help much...


So you've basically experienced the benefits of actors (message-passing style, no shared state). But you can use the message-passing paradigm in imperative languages too, though it's rather cumbersome in non-extensible languages like C or Java.


I think (s)he strongly implied that. I'd simply flat-out say it: the best way to learn how to program in an imperative multi-threaded environment is to clock enough time in Erlang, Haskell, or perhaps Clojure (with as much immutability as possible) to get real work done (not merely finish a couple of tutorials), so that you are forced to learn how to program in a minimal-sharing style. Then if you go back to something like Go, it's really not that hard.

I think you can learn how to program multithreaded code well without that, but you'll learn a great deal faster if you use a language that kicks you in the face every time you do something stupid (or simply doesn't let you do it). As in, potentially an order of magnitude faster. Fast feedback is non-linearly better than slow feedback.

Though this does assume you get to greenfield the multithreaded code. Woe betide ye who have to retrofit existing poor multithreaded code. I don't even know what to tell you, except you have my pity.


That's correct, I love the actor system in Erlang, and I wish I had access to it in every other language I've used.

Building multi-threaded applications with functional programming forced me to write pure functions, or close enough; Erlang does have some side effects, but that's just to make it more usable than the mostly academic Haskell.

I've carried this approach into C# and Java, and typically end up with methods that fit a multi-threaded paradigm by default because of their lack of state. As a freebie, they can be effortlessly unit tested by mocking their direct parameters and testing the return values (no need to mock or test state).


I have written some multithreaded code and I can agree that it's not too difficult to write. What is difficult is testing and debugging multithreaded code.

There are lots of corner cases that you might overlook when writing multithreaded code, and some of them will only ever happen if the threads get scheduled in a particular manner.

Multithreaded software development is not easy, partially because in the past we've acquired practices that don't work very well in the multithreaded world, and partially because it's a whole new world that is not thoroughly explored and where established best practices do not exist.

But all in all, too many people are afraid of multithreaded code and this is holding us back. It's not scary or difficult, it just requires a more rigorous approach at times.


Writing a background thread that does a long-running calculation? easy.

Writing a background thread that can be cancelled immediately, that gives useful progress information, that doesn't leave other threads indefinitely blocked if cancelled, and that cleans up after itself if cancelled? hard.

For my single-threaded code I have great static analysis tools, IDE magic, and testing frameworks. For my multi-threaded code, I'm on my own...


> For my single-threaded code I have great static analysis tools, IDE magic, and testing frameworks. For my multi-threaded code, I'm on my own...

That's a big point. The tooling for parallel code is developing really slowly compared to the proliferation of multi-core processors. Most of it seems to be "we'll make the library multi-threaded but all your code will still be single-threaded" like we see in Node.


Whoa whoa whoa. Slow down. When did I become an Applications developer and not a systems developer? The kernel I work on uses TONS of threads and locks. And sure, it's fucking crazy to debug. But I don't think you'll be prying mutable state from the cold dead hands of kernel developers any time soon.


I think it's pretty clear if you're a systems developer (or someone writing a library with concurrent data structures), this article doesn't apply to you.

The majority of developers aren't going to be systems developers, though. Just as the majority of developers don't use assembly language either.


Exactly. Someone has to deal with concurrency so that, in many cases, applications programmers don't have to.


I think there's another simpler factor: some people are just naturally better wired to think in terms of concurrency.

There's a certain thinking style you bring to writing and understanding concurrent software. You need the ability to "visualize" the contexts, behavior, and data access. You need to see the possible race conditions and understand what resources need to be "protected" and how.

Reminds me of recursion. Back in my CS days, it just made some people's heads explode. They couldn't think in a nested fashion to understand what was going on at various recursion levels, nor what was required to terminate, what should be returned, etc.

Likewise with proofs in geometry. Some people just couldn't get there. But, for those who could, it came without too much effort.

You kind of either have it or you don't. You can practice and improve, but some people just naturally "get it". So, I don't think it's as simple as "they learned the wrong approach". In fact, if they truly understood the concepts and possessed the requisite thinking style, they would be able to identify the "wrong approach", and the right one.


> If a multithreaded program is unreliable it’s most likely due to the same reasons that single-threaded programs fail: The programmer didn’t follow basic, well known development practices.

Well, no. That's like saying MT programs are broken because the programmer didn't know how to write them. It's not saying much: everything with bugs is broken because the writer didn't know how to write it.

The reality is that multi-threaded programming is hard because it is hard to reason about and hard to get right. In one large application, it only takes one junior programmer to dip their fingers in and add that one module that will crash the system 1 out of 1000 runs, and only on the customer's hardware.

MT issues and problems are on a whole new level of difficulty, qualitatively different from what single-threaded programs experience. Sequential problems and bugs are still there, but now there is a host of completely new things happening.

> These programs very often fit nicely into a producer-consumer model with three threads:

* The input thread reads data and places it on the input queue.

Well, you also have to make sure the queue is thread safe. Understand what it means to get an item off of an empty queue. Now you have the queue-using-up-all-my-memory problem if you do stream processing with mismatched processing rates between parts of your program. What about the objects you put in the queue? If those have any pointers, or are pointers themselves, you are back to the original problem. So make sure objects are copies, are serialized/deserialized, or that there is a proper ownership handoff.
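The points above can be sketched in Python (made-up pipeline; the bounded size and the sentinel are one conventional way to handle the memory and shutdown problems): a bounded queue gives backpressure, a sentinel signals end-of-stream, and only plain values are handed over.

```python
import queue
import threading

q = queue.Queue(maxsize=8)   # bounded: a fast producer blocks instead of
                             # eating all memory when the consumer lags behind
DONE = object()              # sentinel: explicit "no more items" handoff

def producer():
    for i in range(100):
        q.put(i)             # blocks while the queue is full (backpressure)
    q.put(DONE)

results = []

def consumer():
    while True:
        item = q.get()       # blocks while the queue is empty
        if item is DONE:
            break
        results.append(item * 2)   # items are plain values, not shared pointers

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads: t.start()
for t in threads: t.join()
print(len(results))          # 100
```

If the items were pointers into mutable structures the producer keeps touching, the queue's own thread safety wouldn't save you, which is exactly the parent's point.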

In summary, is it possible to write correct highly concurrent programs in a shared memory model with threads/and-or/events -- yes. Is it easy? No, it is not easy.

The article correctly points out the common fallacy that an event/callback-based single-threaded approach is superior and avoids all the concurrency-related issues. It doesn't avoid all issues. Now the race is between callback/errback chains instead of threads. Granularity is much coarser (incrementing a shared 64-bit int is fine without a lock), but the need to protect shared data structures is still there. The danger is psychological, because one thinks "hey, this is safe, I have nothing to worry about".

That is why I believe it is better to use a language/system/framework that uses isolated heaps and rigidly enforces separation between concurrency units. Erlang has that, D has some of it, and Nimrod, Web Workers and Dart isolates do too. You incur a speed penalty for it. Don't be shy about using regular Unix OS processes: they are great for isolation and do it well, though the cost of IPC and the cost of creating them can also be high. You incur a penalty for copying data between concurrency units, but remember, there is a bigger penalty you incur when the program has crashed. The converse of that is, the biggest performance gain your application will ever achieve is when it goes from not working to working, and staying working.


"The reality is that multi-threaded is hard because it is hard to reason about "

I think that's key.

The primitives of multithreading aren't difficult, nor are the basic concepts. What is hard for most people is thinking about it in a meaningful way.

The mental model for single-threaded programming is challenging for many - you essentially run the program or subset of the program in your head (at least, if you're any good). This includes all branching and jumping, and effectively requires maintaining one or more stacks among other things.[1]

To fully conceptualize multi-threaded programming, programmers need to be able to expand their mental model of the executing application. Not holding one stack, but N. Awareness that each may (and likely will) differ slightly. Awareness of other actors. Awareness of exactly what is shared, and what paths lead to its modification. Being able to visualize (for lack of a better word) which others may interact with the shared data, and under what circumstances. All of this comes into play when you add just a second thread executing the same code - it gets considerably more complex from there. The closest analogy that comes to mind is the shift from thinking in three dimensions from two, or four from three.

It is not impossible to reason about, nor is it impossible to build such a mental model. But it is hard. And it does take practice.

[1] And even this basic ability seems unfortunately uncommon.


"Is it easy? No, it is not easy."

Correction: It's not easy in most programming languages, because most programming languages fail at implementing mutual exclusion in a way that is safe. It's basically the same problem as buffer overruns in C, where indices don't get checked (except that in the concurrent case, most languages don't require you to associate a lock with shared data and/or don't verify that you hold the lock before operating on that shared data).

Per Brinch Hansen's famous rant gives the gory details [1]. The basic technology had been figured out in the early 1970s.

[1] http://brinch-hansen.net/papers/1999b.pdf


Trapped behind a web filter so your link was filtered as "Personal Site". I believe [1] is the same document for others in the same situation. Also, thanks for the link, it's an interesting read.

[1] http://cse.unl.edu/~witty/class/csce351/howto/PBHansenParall...


I agree with "hard to reason about and get right" because of the state-space explosion. If you do five atomic things in a single-threaded program, you have six total states to deal with. If you do those same five things in five threads, you have over 200. That's true regardless of whether you're using a naive "locks and blocks" model or a sophisticated functional/CSP/actor model, and splattering that state across multiple stacks/continuations doesn't exactly make debugging easier. People who have done this for a while do apply good programming practices to reduce the number of states and the dependencies between them, but the fundamental problem still remains. Every new state represents a new potential for error, and code that has many error cases is simply harder to write/debug/verify than code that has few.


"That's true regardless of whether you're using a naive "locks and blocks" model or a sophisticated functional/CSP/actor model, and splattering that state across multiple stacks/continuations doesn't exactly make debugging easier."

Enormously greater complexity also has the side effect of making execution slower. Something to think about. Converting a fast-executing, quick-to-write/debug, reliable single-threaded app into an inherently slower, harder-to-debug, less reliable multithreaded app needs an extremely good reason beyond the tradeoffs listed above. Those situations do exist, but situations like "I thought it would be fun to put multi-threaded programmer on my resume" happen more often.

It's kind of like using "goto". If you don't really understand everything about the topic, it's probably a wise choice to always avoid it.


I know this is HN and therefore I'm supposed to disagree, but I don't see how I would. Yes, there are a lot of people writing multi-threaded programs who shouldn't be, either in the sense that they don't need to or that they're not qualified. The problem I see with the OP is that it's giving advice based on assumptions common in the tyros' environment but much rarer for people who actually do need to write multi-threaded programs. The claim that "it's not hard" doesn't hold water. Dabbling's never hard.


I think the article has some good points, but he should spend more time discussing what circumstances you can block when waiting for a result, and how to balance too much blocking against too many asynchronous states.

I think his focus on “global mutable state” is a bit of a red herring. It makes no difference whether the mutable state is global (protected by a lock) or owned by a single thread. You can refactor all your synchronized {...} blocks and replace them with executor.execute(futuretask); futuretask.get();, where executor is a single-threaded event loop and futuretask.get() blocks until the operation completes. This refactoring would make a single thread the owner of the state, but there would be exactly the same number of deadlocks. Conceptually, you can think of your critical sections as a separate thread on which you invoke methods and then block until completion.

After refactoring to use Executor and BlockingQueue, one still has to make sure there are no cycles in which threads are allowed to block waiting for which other threads.
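A minimal Python sketch of this refactoring, using a single-worker executor as the state's owning thread (names are illustrative, not taken from the article):

```python
from concurrent.futures import ThreadPoolExecutor

class Counter:
    """A single-worker executor serializes access to the state it owns,
    playing the role of the lock in the original synchronized block."""
    def __init__(self):
        self._value = 0
        self._owner = ThreadPoolExecutor(max_workers=1)

    def increment(self):
        # Equivalent to a synchronized increment: submit the mutation to
        # the owning thread and block until it completes.
        return self._owner.submit(self._unsynchronized_increment).result()

    def _unsynchronized_increment(self):
        self._value += 1
        return self._value

c = Counter()
with ThreadPoolExecutor(max_workers=4) as callers:
    list(callers.map(lambda _: c.increment(), range(100)))
print(c._value)  # 100: same guarantees as a lock, same deadlock potential
```

The blocking `.result()` call is exactly where the deadlock cycles mentioned above can reappear.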


IMO the problem with threads is the 'all-or-nothing' approach to shared data. If you want to make use of multiple CPUs, you either use threads, in which case all of your process data is shared, or you use multiple processes, where none of the data is shared. It is the over-sharing that trips people up with threads.

Ideally, we need an API that allows selective data sharing, so you greatly reduce the chance of one thread trampling over bits of another. Right now, the common way to do that is with multiple processes and shared memory regions, but it is cumbersome to set up and requires the programmer to treat the shared data very differently (e.g. one process can't just malloc data for a string, say, and then let the other process access it).

The other route is through message-passing, but again you have to write your code in a totally different way. Someone please invent multi-threading with selective sharing!
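As a sketch of what that message-passing discipline looks like with plain threads (Python, sharing only a queue by convention; nothing here is enforced by the language):

```python
import threading
import queue

# Threads share the whole address space, but by convention the only object
# these two touch in common is the queue: "selective sharing" by discipline.
def producer(out: queue.Queue):
    local_data = [i * i for i in range(10)]   # private to this thread
    for item in local_data:
        out.put(item)
    out.put(None)                             # sentinel: no more items

def consumer(inbox: queue.Queue, results: list):
    while (item := inbox.get()) is not None:
        results.append(item)

q = queue.Queue()
results = []
t1 = threading.Thread(target=producer, args=(q,))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(sum(results))  # 285
```

Nothing stops the producer from reaching into `results` directly, which is exactly the enforcement gap the comment is complaining about.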


"Perhaps you’ve accepted the common fallacy that “Multithreading is hard.” It’s not."

Bzzt full stop epic fail. Think about the other discussion today about the "naturalness" of zero or one based arrays.

If multithreading were a natural, inherent ability of all noobs, you'd teach programming beginning with multithreading, and maybe later on you'd have some weird module (which confused most of the class) explaining how some legacy languages and systems only allow one thread, which imposes certain unusual programming constraints that most of the class would screw up. Also, all automata theory classes would begin with the eNFA as the base case and later on tell the kids about DFAs being broken forms of an NFA and all that. I'm just not seeing it.

The article did fail at the architectural level. Most non-threaded programs can't be upgraded to threading because either it's not worth the effort, or the actual problem is impossible to solve without threads, in which case the postulated single-threaded solution never existed and there's nothing to upgrade. Why are you multi-threading?

Most likely, if you're adding multi-threading explicitly to a single-threaded app for an inherently multi-threaded problem, you're going to spend all your time figuring out how the single-threaded part exploited some other component of the entire system to create a pseudo-multi-threaded app. Perhaps it relies on the OS to spawn multiple clients in a client-server arch where all the processes live on the same CPU or something. The problem isn't just adding a thread so you can checkmark a marketing box that it's now a multi-threaded app so someone can collect an end-of-year bonus; the problem is you need to scrap the whole thing and start architecture work from first principles.

Another funny part was the natural assumption that all programs/projects will be huge enough to contain at least two completely non-interacting subsystems, so that no global state is the ideal. That's a dumb architecture design or business decision. Let's think of an example where you would have threads with no global mutable state. How about a tetris implementation in one thread and an enterprise rational DBMS system in another thread and an IRC client in another thread, all in the same program/project? That's just a dumb architectural idea; make totally separate projects. Make the smallest usable, debuggable tools and use them together, not the largest undebuggable swiss army knife the world has ever seen.


This is one of my largest weaknesses. I haven't come across any good resources for self-study on this topic. I haven't found any online course-type resources, and the books I've found are hard to follow. Any recommendations for how to learn this stuff would be very welcome.


Would people be interested in a getting started type tutorial using Zero MQ? As I replied to thewarrior below, the manual is very good and it made multithreading an absolute pleasure for me but it's not really an introduction to threads - I had to experiment quite a bit. I can do it for C, Python and Lua.

What would be useful examples? Multithreaded video decoding? Scaling out calculations to multiple cores?


That would be awesome. I just got on a new project a week ago where the PM sent me a link to Zero MQ and basically said "here, read this".


> The programmer didn’t follow basic, well known development practices. Multithreaded programs seem harder or more complex to write because two or more concurrent threads working incorrectly make a much bigger mess a whole lot faster than a single thread can.

Actually, I prefer to think that the reason multi-threaded programs are harder to write is that it's harder to compose components together. If I have a regular old function and another regular old function and compose them, I get a regular old function, and I know that it works how I expect it to. With two multi-threaded functions, I have no guarantees anymore. If I cannot take simple components and combine them into more complex ones, how am I supposed to write correct code? Sorry, but I'm just not good enough.
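To make the composition failure concrete, here's a deterministic Python replay of an interleaving two threads can produce (`SafeBox` is a hypothetical container whose get/set are each individually atomic):

```python
import threading

class SafeBox:
    """get() and set() are each atomic, guarded by an internal lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0
    def get(self):
        with self._lock:
            return self._value
    def set(self, v):
        with self._lock:
            self._value = v

box = SafeBox()

# Composing two correct operations is not itself correct: the gap between
# get() and set() is a race window, even though each call is safe.
# Deterministically replay the bad interleaving two threads can produce:
a = box.get()          # thread A reads 0
b = box.get()          # thread B reads 0 before A writes
box.set(a + 1)         # A writes 1
box.set(b + 1)         # B writes 1: A's update is lost
print(box.get())       # 1, not 2; the composition needed its own lock
```

This is exactly the "no guarantees under composition" problem: correctness of the parts does not transfer to the whole.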


This is where language choice and concurrency libraries become critical. In my experience, Erlang's processes are nearly as composable as functions. Golang's channels allow something similar. ZMQ seems to provide a similar mechanism in a language-agnostic fashion, though it seems to be about connecting multiple processes rather than threads (do people use ZMQ for interthread communication and synchronization?).


I'm in no position to claim I can write good multithreaded programs, but... damn, people are bad. I once worked on an embedded project that had sleeps in the UI thread, and then people started to use threads for 'optimizing'. It didn't go well.


Multithreaded code is more difficult to write and more complex to debug by any objective measure.

"There are difficult multithreading scenarios, just as there are difficult scenarios in the single-threaded world. But those are relatively rare."

Is this some sort of flame bait?


A bigger problem than people writing concurrency poorly is that they jump to concurrency too soon and think it is going save them from their performance problems. It almost certainly won't.

Contrary to the growing popularity of message passing concurrency, modern commodity hardware is not massively parallel and jamming massively parallel algorithms on top of it generally is a mismatch. You are almost certainly better off having a few well defined threads that are fast than unbounded threads that are slow.


I like Rust's approach (it's not new, but Rust does a good job of the total implementation). When I'm using Rust, I know that a data race in safe code is impossible.

(I know this isn't quite all there is to the problems of multithreading, but it's a very high percentage indeed in almost all applications.)


>> If you eliminate shared state, you have no reason to misuse those synchronization primitives.

A multi-threaded program that doesn't need to share state probably isn't the usual case, not from my experience anyway.

Locks are not evil, think of a railway station, there are sections of track that must be shared by multiple trains.
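The railway analogy translates almost literally into code; a minimal Python sketch, with the lock standing in for the shared section of track:

```python
import threading

# The shared section of track: only one train may occupy it at a time.
section = threading.Lock()
log = []

def train(name: str):
    with section:                 # acquire the section before entering
        log.append(f"{name} enters")
        log.append(f"{name} leaves")

trains = [threading.Thread(target=train, args=(f"train-{i}",))
          for i in range(3)]
for t in trains: t.start()
for t in trains: t.join()

# Whatever the scheduling, enters/leaves pairs are never interleaved:
for i in range(0, len(log), 2):
    assert log[i].split()[0] == log[i + 1].split()[0]
print(len(log))  # 6
```

The lock isn't evil here; it directly models a real constraint of the problem domain.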


Thanks for the article. When I have free time I will look into the various tools that modern languages provide to solve problems that are usually handled with threads.

I would have appreciated an article on the various "libaries" we should be using instead.


Is there a list of languages which use separate heaps for each thread? I'm really interested in learning more about that concurrency model, but haven't had much luck compiling a comprehensive list of which languages actually use it.


JavaScript with Web Workers


The author seems to think it is all about avoiding mutable global state, but it is more than that: it is about shared resources and temporal ordering, and these are not issues that can, in general, be avoided in the simple way he supposes.


It's interesting, but there is something fallacious here:

"Probably the most important lesson to be learned from the past 60 years of software development is that global mutable state is bad. Really bad."

This represents a misunderstanding about the badness of global variables and the goodness of encapsulation, in my mind. Mutable global state is an unavoidable necessity to ship your app: every single pixel on your screen is a bit of global mutable state. I won't dive too far in here, but I cringe every time I see these fancy-pants words applied, because it's pretentious /and/ trivially, measurably wrong (see everything on your monitor right now).


As a n00b programmer I must say that the author doesn't give any examples of how locks are commonly misused, or of exactly how to avoid these pitfalls, other than to say "Avoid global mutable state".


I am not a n00b programmer by any means but I've never needed to use threads until recently. Non-trivial threading - like decoding video in the background and coordinating with audio kind of non-trivial - is hard. Difficult to reason about, unbelievably frustrating to debug, errors seem random, races, deadlocks - you name it, I struggled with it.

That all went away when I discovered Zero MQ (http://zeromq.org/intro:read-the-manual). The manual claims:

"If you've spent years learning tricks to make your MT code work at all, let alone rapidly, with locks and semaphores and critical sections, you will be disgusted when you realize it was all for nothing."

It's not exaggerating. ZMQ's multithreading is marvelous. No mutexes, no locks, no semaphores, no critical sections, no load problems - just perfectly scalable threads that work with any language on any platform. In fact it's so cool that if you don't have a problem that needs multithreading, go out and find one.


ZMQ still relies on a message passing model, which is orthogonal to the threaded shared memory model all our hardware actually presents to us. The advantage is more being able to go from threads to processes, then to the network, easily. That advantage is just as useless to most applications as threading is to achieve concurrency (rather than parallelism).


"ZMQ still relies on a message passing model, which is orthogonal to the threaded shared memory model all our hardware actually presents to us."

So what? There are all sorts of successful language and computing models which are light years removed from threaded shared memory. Message passing without sharing state, except for some lock-free primitives, is bloody brilliant for very scalable concurrency - as Erlang has shown us for coming up on 30 years now.

"The advantage is more being able to go from threads to processes, then to the network, easily."

That's the raison d'etre for ZMQ to be sure. But very pleasurable multithreading is a massive advantage too. If it scales out to nodes as well, so much the better.

"That advantage is just as useless to most applications as threading is to achieve concurrency (rather than parallelism)."

Threading using ZMQ message passing is a superb way to achieve concurrency (given that this is the real world and we can't all rewrite our apps from scratch in Go or Clojure). I'm a threading beginner and I've got video decoding, graphics handling and networking for a real time app all in separate threads going great guns without falling over or messing each other around - on a very low spec machine too I might add. On a four core workstation, all threads get a whole core to themselves no problem.

Try it. You'll like it :)


Any locked (or lock-free) pointer queue with the right ownership semantics will give you what ZMQ does within a threaded program. A coroutine library will get you even more flexibility if you just want to set up pipelines. If you want greater scalability, there are lots of non-blocking I/O event libraries (libuv/libevent/boost asio). None of these solutions involve moving to a message passing model.

I'm not discrediting that operating between threads is a handy addition to ZMQ, but using it for communication between threads seems like a lot of work (moving to that message passing model) if you never intend to leave one process.

I do happen to like messaging, but when your data becomes complex it's a right dog...you're now in the world of protocol design.


"Any locked (or lock-free) pointer queue with the right ownership semantics will give you what ZMQ does within a threaded program."

OK, now I think you're trolling, perhaps by cutting and pasting from random technical manuals. What you said there makes no sense at all. Could you paste some example code please?

"A coroutine library will get you even more flexibility if you just want to setup pipelines."

Now I know you're trolling. Coroutines are completely useless for real concurrency like decoding audio and video in the background simultaneously, because only one coroutine can be executing at a time.

"If you want greater scalability, there are lots of non-blocking I/O event libraries (libuv/libevent/boost asio). None of these solutions involve moving to a message passing model."

What if I don't really care about I/O? What if I have a small program that needs to use multiple threads on multiple cores to efficiently search a large problem space while doing video decoding in the background and having a responsive UI? I must just link to Boost? Er, no thanks.

"I'm not discrediting that operating between threads is a handy addition to ZMQ, but using it for communication between threads seems like a lot of work (moving to that message passing model) if you never intend to leave one process."

Compared to getting synchronization right, optimizing critical sections, debugging random failures under load, it's trivial.

"I do happen to like messaging, but when your data becomes complex it's a right dog...you're now in the world of protocol design."

I'm going to assume that you're still in school or not actually a programmer. ZeroMQ is used by AT&T, Cisco, EA, Los Alamos Labs, NASA, Weta Digital, Zynga, Spotify, Samsung Electronics, Microsoft, and CERN (I grabbed this list from its front page). CERN just published a report in which it evaluated ZeroMQ for a custom middleware solution running 4000 server processes communicating with 80 000 devices, for a total of 2 000 000 I/O points that could be queried. It aced the requirements. Erlang has used message passing for decades in the telecoms industry, which has enormously complex data and unique realtime, distributed requirements.

Protocol design is easy. Main thread creates child. Child loads video and reports it's ready. Main thread sends START. Child thread receives START and starts. Child thread reports DONE when done. Was that so hard?
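That protocol can be sketched in a few lines of Python, using queues as stand-ins for ZMQ sockets (illustrative only):

```python
import threading
import queue

# The protocol above, with two queues as channels:
# child reports READY, main sends START, child works and reports DONE.
to_child, to_main = queue.Queue(), queue.Queue()

def child():
    # "load the video" (stand-in work), then report readiness
    to_main.put("READY")
    if to_child.get() == "START":
        to_main.put("DONE")

worker = threading.Thread(target=child)
worker.start()

trace = [to_main.get()]       # READY
to_child.put("START")
trace.append(to_main.get())   # DONE
worker.join()
print(trace)  # ['READY', 'DONE']
```

With real ZMQ the queues would be inproc sockets, but the shape of the protocol is the same.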


I've used Zero MQ a lot and really like it. It can make writing concurrent applications much simpler. That said, it is important to note that the library itself is using synchronization primitives.

Understanding when and how ZMQ does this is very important for optimizing throughput.


Yeah, but as an introduction to multithreading it's awesome. You will still learn how to create threads, how to communicate with them, how to synchronise them and how to avoid sharing state if you use Zero MQ. None of that is abstracted. But once you've got the principles down, then you can dig deeper or do it by hand with pthreads or your OS's library if you feel the need.

The main pain with learning threading is you have to wrap your head around the principles of it - which are fairly foreign - while at the same time battling with the very strange and frustrating errors because it's so easy to get wrong. ZMQ teaches you the principles without the errors after which you can go master the low level bits if you wish.



