BLAST is a very interesting example. BLAST became so common, but was so poorly understood, that if you wrote a replacement for BLAST that was faster, you had to ensure that the output was identical to the human readable output of BLAST, and bug-for-bug compatible, too.
BLAST wasn't really implemented as a library; it was based on a baroque toolkit, and the underlying code was extremely complex to analyze.
There was no officially supported file format (other than the human-readable output), so you had to write parsers, and the human-readable output changed between versions.
The one good thing about BLAST was the statistical model: you really could trust the calibrated E-values.
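For readers who haven't met them: BLAST's E-values come from Karlin-Altschul statistics, where the expected number of chance local alignments scoring at least S between a query of length m and a database of total length n is E = K·m·n·e^(-λS). K and λ depend on the scoring matrix and gap penalties. In this minimal sketch the numbers are made-up placeholders, and real BLAST also applies effective-length corrections that are skipped here.

```python
import math

def e_value(score: float, m: int, n: int, K: float, lam: float) -> float:
    """Karlin-Altschul expected number of chance hits scoring >= score."""
    return K * m * n * math.exp(-lam * score)

def bit_score(score: float, K: float, lam: float) -> float:
    """Normalized score in bits; E can then be written as m * n * 2**(-bits)."""
    return (lam * score - math.log(K)) / math.log(2)

# Placeholder statistics and lengths, for illustration only (not real BLAST parameters).
K, LAM = 0.1, 0.3
m, n = 300, 1_000_000

for s in (40, 60, 80):
    # Both columns should agree: the bit score is just a rescaled raw score.
    print(s, e_value(s, m, n, K, LAM), m * n * 2 ** -bit_score(s, K, LAM))
```

The calibration is what makes the E-value comparable across searches: doubling the database size doubles E for the same raw score.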
Not to be confused with BLAS, which is for Linear Algebra and likewise a neat example. BLAS and its co-conspirator LAPACK are the underpinning for many (most?) scientific libraries in use today. Packages like scipy will wrap BLAS and LAPACK to make them more ergonomic and present more uniform interfaces in the style of the host language.
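As a small example of that wrapping, assuming NumPy is installed: `np.linalg.solve` hands a dense system to LAPACK's `gesv` routine, and float64 matrix products go to whatever BLAS NumPy was built against (OpenBLAS, MKL, Accelerate, ...). The system below is arbitrary.

```python
import numpy as np

# A small dense system A x = b.
A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])

# Solved via LAPACK *gesv (LU factorization with partial pivoting).
x = np.linalg.solve(A, b)

# The matrix-vector product is dispatched to the linked BLAS.
residual = A @ x - b
print(x, np.abs(residual).max())

# Shows which BLAS/LAPACK this particular NumPy build links against.
np.show_config()
```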
Well, I worked with BLAS as well, but it was just something we linked in (i.e., it's a stable library with a stable interface and a limited set of functions). None of this should be surprising; in my opinion, computers mainly exist to do linear algebra efficiently.
> in my opinion, computers mainly exist to do linear algebra efficiently.
What about the discussion we're having? It's enabled by computers. But no linear algebra is happening there.
What about word processing? Computer games? The world wide web, and all the information contained in it? The authoring of music and graphics? All of those are enabled by computers.
Saying it's about doing linear algebra quickly is massively reductive.
PageRank was the algorithm. These days, while PageRank might still be in play somewhere, I suspect the results are largely dominated by machine-learned suggestions.
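Since PageRank came up: its core is a linear-algebra problem, finding the principal eigenvector of a damped link matrix, usually by power iteration. A minimal sketch of the textbook formulation (damping 0.85, dangling pages spread uniformly), not Google's production system; the toy link graph is invented.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Power iteration for PageRank.

    links maps each page to the pages it links to. Pages with no outlinks
    (dangling nodes) spread their rank uniformly over all pages.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Each sweep is one matrix-vector product in disguise, which is why it scales to huge sparse graphs.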
I agree it's going too far to say that computers are mostly about linear algebra, but computer graphics is one of the go-to examples of applied linear algebra. I believe linear algebra is also central to games' physics engines, but that's not something I know much about.
Computer graphics uses matrices to transform the points, but there isn't much else about it that uses linear algebra. 2D rasterization cleverly iterates over pixels. 3D rasterization does the same for triangles. The other parts of 3D are writing shaders (these days, usually modelling real-world materials), and rendering to textures for effects like reflections and shadows. Even though a main application of linear algebra is computer graphics, computer graphics uses very little linear algebra; the matrix part of your code is pretty small.
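To make "uses matrices to transform the points" concrete, here is a minimal 2D sketch in homogeneous coordinates; 3D pipelines do the same thing with 4x4 matrices. The helper names are made up for illustration.

```python
import math

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def translate(tx: float, ty: float) -> Matrix:
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotate(theta: float) -> Matrix:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scale(sx: float, sy: float) -> Matrix:
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def apply(m: Matrix, point: tuple[float, float]) -> tuple[float, float]:
    """Transform a 2D point via homogeneous coordinates (x, y, 1)."""
    v = (point[0], point[1], 1.0)
    x, y, w = (sum(m[i][k] * v[k] for k in range(3)) for i in range(3))
    return (x / w, y / w)

# Rightmost matrix acts first: scale by 2, rotate 90 degrees CCW, then shift right by 10.
model = matmul(translate(10.0, 0.0), matmul(rotate(math.pi / 2), scale(2.0, 2.0)))
print(apply(model, (1.0, 0.0)))  # approximately (10.0, 2.0)
```

Composing once and reusing the single matrix for every vertex is most of the "linear algebra" a typical renderer does.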
"Linear Algebra," truth be told, involves a lot of high-rent concepts that are not related to computer graphics. Even midlevel concepts like eigenvalues seem to make no appearance. It would be a little more representative to say that computer graphics involved vectors and matrices.
I've seen bounding box code that uses PCA and, by extension, eigenvectors. I think inverse kinematics code uses eigenvectors too. But yeah, transforms easily make up 80% of the use cases.
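A rough sketch of the PCA bounding-box idea, assuming NumPy: the eigenvectors of the point cloud's covariance matrix give the box axes, and the extents come from projecting the points onto them. This is the generic technique, not code from any particular engine, and PCA boxes are not guaranteed to be minimal.

```python
import numpy as np

def pca_obb(points: np.ndarray):
    """Oriented bounding box from the principal axes of a point cloud.

    points: (N, d) array. Returns (center, axes, half_extents), where the rows of
    `axes` are orthonormal box directions (covariance eigenvectors, smallest
    variance first) and half_extents[i] is the box half-width along axes[i].
    """
    mean = points.mean(axis=0)
    centered = points - mean
    cov = np.cov(centered, rowvar=False)
    # eigh is for symmetric matrices: eigenvalues ascending, eigenvectors as columns.
    _, eigvecs = np.linalg.eigh(cov)
    local = centered @ eigvecs                  # coordinates in the PCA frame
    lo, hi = local.min(axis=0), local.max(axis=0)
    center = mean + eigvecs @ ((lo + hi) / 2)   # box midpoint back in world coordinates
    return center, eigvecs.T, (hi - lo) / 2

# Toy data: a 10 x 2 rectangle of random points, rotated 45 degrees and shifted to (3, 4).
rng = np.random.default_rng(0)
local_pts = rng.uniform([-5.0, -1.0], [5.0, 1.0], size=(500, 2))
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
pts = local_pts @ R.T + np.array([3.0, 4.0])

center, axes, half = pca_obb(pts)
print(center, half)  # roughly [3, 4] and [1, 5]
```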
I would not have included Jupyter Notebooks in this list. However, I would not hesitate to include the Pandas library in such a list, as it has propelled the democratization of data analysis forward a great deal. I recall the days of having to either cobble together my own data analysis algorithms or depend on multiple libraries to get basic things done. Pandas, however, wouldn’t be Pandas without Numpy and ultimately the C/Fortran libraries it was built on.
> Pandas, however, wouldn’t be Pandas without Numpy and ultimately the C/Fortran libraries it was built on.
I think both numpy/numeric and scipy deserve to be on the list more than Jupyter (and Pandas), since I think they were the fundamental reason that Python took off in the physical and data sciences.
Jupyter notebooks brought documented code to a broad audience (I personally use org-mode, but I appreciate that Jupyter can appeal to non-Emacs users).
Pandas is based in concept on R's data frames and dplyr libraries which predate it by a long shot. If you were talking about implementation of data frames, then you would have to include Julia's as well.
From a practical perspective though, SQL has to be a big one - though not often used by people closest to the science.
Even more than that, Jupyter brought literate programming and REPL-style programming to the masses. Those ideas have been present since the dawn of computing (and have seen various proprietary incarnations, e.g., Matlab, Mathematica), but it was Jupyter that thrust them to wide audiences. I was glad to see Jupyter mentioned as an important advancement in the scientific computing arena. In addition, for me, the paper was a delight to read. I have met or even collaborated with a number of people interviewed in the article. (I am also an org-mode / babel user.)
I know that some people here on Hacker News don't have a good opinion of Jupyter Notebooks, but I expect them to be a very important step in how we publish scientific papers and experiment with ideas. There are probably other tools that are currently more influential, but in the longer term notebooks are going to replace traditional papers.
Well, I would have thought that "molecular simulation codes" such as QM, MD, or methods related to MD should have been on this list. They pretty much all arose from academic groups such as those of Karplus and Pople, and all profit from the staggering increase in compute power. They drive material design, drug design, ... all of which are significant technological problems of today.
EDIT: I think the Unix family tree should have been high on the list as it forms the basis of most of today's servers.
It's incorrect to say that molecular simulation codes have a real impact on drug design. That's a narrative that is stated, but it's not representative. Molecular simulation codes are either too inaccurate or undersample the space (we can't tell which, because of how much CPU is required to answer). I think most people have come to realize that modelling like this is mainly useful as a visualization technique to obtain intuition for how proteins behave, not as a quantitative way of probing a system computationally.
That's a great question! David Shaw disagrees with me about whether this is going to work (I used to be in his camp). Their money goes to a number of things- high personnel costs, presumably high hardware costs, I don't know what else.
I guess they figured Monte Carlo is more of a technique than something as definite as a code. Certainly the first implementations didn't do much worthwhile science.
Monte Carlo is a class of algorithms, just like FFT isn't a code or a specific algorithm but a class of algorithms.
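The simplest member of that class, for anyone who hasn't seen it: estimate π by sampling points uniformly in the unit square and counting how many land inside the quarter circle. The error shrinks roughly like 1/sqrt(N), which is the usual Monte Carlo trade-off. A toy sketch only.

```python
import random

def estimate_pi(samples: int, seed: int = 0) -> float:
    """Fraction of uniform points in [0, 1)^2 inside the quarter circle, times 4."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(samples) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / samples

for n in (1_000, 100_000, 1_000_000):
    print(n, estimate_pi(n))
```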
Is "worthwhile science" a jab at nuclear applications?
MC was developed by and used for the Manhattan Project - almost the entire computer architecture we are using to communicate resulted from military research related to strategy concerning weapons that could not have been made without MC, and this remade not only our relationship with technology but the geopolitics of the entire world.
I would say that, more broadly, the omission is the lack of any mention of the use of random sampling, including MC and random methods in machine learning.
Almost all Monte Carlo production code is MCMC of some form, and the idea of constructing a Markov chain with a certain stationary distribution by generating and accepting/rejecting candidate jumps is far from trivial.
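For the accept/reject construction described above, here is a minimal random-walk Metropolis sketch (the symmetric-proposal special case of Metropolis-Hastings). The target is an unnormalized standard normal, chosen only so the result is easy to check; the step size and chain length are arbitrary.

```python
import math
import random

def metropolis(log_target, x0: float, steps: int, step_size: float = 1.0, seed: int = 0):
    """Random-walk Metropolis.

    Builds a Markov chain whose stationary distribution is proportional to
    exp(log_target(x)); the normalizing constant is never needed.
    Returns the chain and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    chain = []
    accepted = 0
    for _ in range(steps):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = x + rng.gauss(0.0, step_size)
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); otherwise stay put.
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
            accepted += 1
        chain.append(x)  # a rejected move repeats the current state
    return chain, accepted / steps

# Unnormalized standard normal: log p(x) = -x^2 / 2 + const.
chain, acc_rate = metropolis(lambda x: -0.5 * x * x, x0=3.0, steps=200_000, step_size=2.4)
samples = chain[1_000:]  # drop burn-in
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance={acc_rate:.2f} mean={mean:.3f} var={var:.3f}")  # mean near 0, var near 1
```

The non-trivial part is exactly the acceptance rule: counting rejected moves as repeats of the current state is what makes the target distribution stationary.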
Also, you'd think computer algebra systems would make the list. Macsyma, Mathematica, etc.
Mathematica has had a notebook interface since before I could personally afford a machine that could run Mathematica, but somehow the newer and inferior IPython/Jupyter system makes the list?
Mathematica is not much liked in science. If you aren't careful or good at programming (most scientists aren't), it's very slow, and it's much too easy to create a fork bomb that crashes the UI with a trillion red errors.
Plus the language is an ugly lisp and looks weird to people who understand log tables and calculators and Fortran.
And it's expensive and the guy most associated with it (Stephen something) has a bad rep.
But Jupyter is hot at the moment (for good and bad reasons) and it's FOSS and is easy to install as long as you can get your grad student to install conda for you.
Somebody's buying enough licenses to keep them in business. You either find a computer algebra system useful for what you do or you don't. My needs are pretty simple and Wolfram Alpha works well for me when I need something like that. Better than GNU Maxima, which I also like.
Stephen Wolfram made his early reputation in part by being one of the first people to seriously use Macsyma for physics. No, I don't know any physicists that take A New Kind of Science that seriously, but that's a different story.
Jupyter is hot, but it's not novel. The Mathematica notebook interface was an innovation and is still best-in-class in my opinion.
I wasn't very good at solving integrals in grad school, so I bought Mathematica because I had the impression it could solve them. What I found was interesting. First, for all the integrals I dealt with, computers can't solve them using the rule systems that exist. Instead, you need a Physics Grad Student who knows how to solve integrals. Mathematica will just take your integral and spit out a more complicated one.
However, Mathematica is really good for symbolic algebra! I wrote a Python-Mathematica bridge many, many years ago, and learned that not only could it decompose numeric matrices, it could do it for matrices with symbols. Mind=blown.
That's the thing: there is essentially one FFT algorithm. If you want to include a big family of algorithms such as Monte Carlo, you'd better place it further down the list, after Euler integration or possibly just numerical integration in general, and then it starts to become a question whether the list should simply be "addition and multiplication, but with computers".
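For reference, the textbook radix-2 Cooley-Tukey recursion fits in a few lines. This sketch only handles power-of-two lengths and checks itself against a naive O(n²) DFT; production libraries use much more elaborate variants.

```python
import cmath

def fft(x: list[complex]) -> list[complex]:
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms (the "butterfly").
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft(x: list[complex]) -> list[complex]:
    """Direct O(n^2) definition, used only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

signal = [0, 1, 2, 3, 0, -1, -2, -3]
print(all(abs(a - b) < 1e-9 for a, b in zip(fft(signal), dft(signal))))  # True
```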
It is the primary training method for PyTorch and TensorFlow models and thus essential for training artificial neural nets (such as AlexNet; it's odd that they singled out one particular architecture). It is often used for non-negative matrix factorization and non-linear regression, and it is the basis for most modern machine learning algorithms.
The use of the term “code” in this way is universal among the US national laboratories and adjacent scientific computing fields.
It refers to a complete software package. So, a “hydrodynamics code” is a hydrodynamics simulation, like Gerris.
It seems peculiar but they’ve been using the term this way since the 1960’s and probably long before. Given how much of modern computer science arose from the US government scientific complex I’d say they’ve earned the right to define their own usage.
It's widely used on the right side of the Atlantic too, at least in technical circles. What's unfortunate is that -- especially in the wider world -- now everyone just talks about "coding", the relatively trivial part of programming.
It's pretty common in Computational Fluids as well. Two well-known codes among several are Code Aster[0], a structural mechanics, thermodynamics, and finite element analysis solver, and Code Saturne[1], a Navier-Stokes solver.
This was common when I was in school, long ago. Also, codes may have had names, but more frequently than not were named after their authors. "I used so-and-so's code for this."
The problem is compounded today, by a certain contingent (okay, physicists) who enjoy referring to things by the most archaic possible name.
It's not just the US, I think it's fairly common in the physical sciences in general. My masters project was "A Fast Monte Carlo Code for Brachytherapy" where "Code" really means software/program.
Yep. I did a tiny bit of Fortran for antenna design and modeling in college in the 2000s. I noticed then that the professor -- who had been at my university since the late 50s -- and surrounding material (textbook, etc.) referred to programs as "codes".
>The ‘discovery’ was actually a rediscovery — the German mathematician Carl Friedrich Gauss worked it out in 1805, but he never published it, says Nick Trefethen, a mathematician at the University of Oxford, UK.
I wonder what you could come up with if you went through the unpublished works of people like Gauss, Ramanujan, Euler, Erdos, etc. There is likely a bunch of stuff that they would have seen as merely an impractical curiosity back then, but that might actually have a lot of use now.
I'm not sure what to make of the article but, since it includes BLAS, it might be worth pointing out that scientific subroutine libraries are one of the most successful examples of reuse, particularly compared with more trendy things aiming at that.
I can't remember if I ever actually linked ESSL (Engineering and Scientific Subroutine Library) on OS/370, but I remember it in the mid(?) 80s, when I don't think it was very new. These days I have it on POWER9 (if doubtless rather different). I don't know if it's older than the NAG library.
I would have included symbolic algebra systems in the list. Many famous calculations in physics and mathematics might not have been attempted without them. Macsyma and Reduce paved the way for Mathematica and Maple. There are still many symbolic algebra algorithms waiting to be implemented. The love and care many open source symbolic systems are receiving is a reflection of their need and importance to science and technology.
Lotta people haranguing about the use of the phrase "computer codes", it's just a very old/anachronistic way of referring to a computer program. Most of the people I know who still use that language are Fortran devs.
That might explain it. I have mostly heard "computer codes" (pluralized with an 's' at the end, rather than treating 'computer code' as a non-granular catch-all for any amount of written code, much like a liquid: "water" is any amount of water, and talking about a single unit requires more definition, such as a drop or a molecule, while "waters" is an archaic way of referring to separate bodies of water, now more commonly replaced by 'seas' or 'oceans') from Indian programmers, and fairly frequently there. Maybe it's an academic influence from Britain that has since been dropped by the British programmers I've met, but that still shaped the schooling of the Indian ones.
In some sense a couple of them were, though not necessarily by their originators. The original FORTRAN compiler is long gone, but there were commercial FORTRAN compilers sold for numerous computers before the rise of free software. Likewise, BLAS and its ilk have been recompiled into the numerical packages that are provided in support of hardware sales, such as the Intel Math Kernel Library.
"Ten computer codes that transformed genomics" would be a more fitting title. It's a good selection of algorithms, but quite biased by the author's field of research.
The book "Working with Coders" [1] by Patrick Gleeson might answer that for you. On page 4 it says that the author is going to "assume that you don't know the first thing about computer code" and goes on to imply that the book will explain it.
I'm somewhat confused that you have to ask, though, since your HN "about" says "boffinism.com / patrickgleeson.com", and based on the content of those sites you appear to be the author of that book.
Did you not read it while you were writing it? :-)
Or was your comment a clever ploy, hoping someone would read your HN "about", see that you clearly know what "computer code" is, and call you on it, thereby generating some free publicity for your book?
I mean, I was really just facetiously pointing out that computer code is a mass noun, so we are all used to talking about _some_ computer code, but not so much _a_ computer code.
It's a boffinism! Some fields/sectors do use it like a mass noun as you point out. Often refers to a set of routines or a library. So something like "ocean model codes" means packages of routines or libraries for ocean weather simulation.