Computer codes that transformed science

dekhn · on Jan 20, 2021

BLAST is a very interesting example. BLAST became so common, but was so poorly understood, that if you wrote a replacement for BLAST that was faster, you had to ensure that the output was identical to the human readable output of BLAST, and bug-for-bug compatible, too.

BLAST wasn't really implemented as a library- it was based on a baroque toolkit and the underlying code was extremely complex to analyze.

There was no officially supported file format (other than the human readable output) so you had to write parsers. The human readable output changed between versions.

The one good thing about BLAST was the statistical model- you really could trust the calibrated E-values.

_ofdw · on Jan 20, 2021

>BLAST

Not to be confused with BLAS, which is for Linear Algebra and likewise a neat example. BLAS and its co-conspirator LAPACK are the underpinning for many (most?) scientific libraries in use today. Packages like scipy will wrap BLAS and LAPACK to make them more ergonomic and present more uniform interfaces in the style of the host language.

dekhn · on Jan 20, 2021

Well, I worked with BLAS as well, but it was just something we linked in (i.e., it's a stable library with a stable interface and a limited set of functions). None of this should be surprising- in my opinion, computers mainly exist to do linear algebra efficiently.

ogogmad · on Jan 20, 2021

> in my opinion, computers mainly exist to do linear algebra efficiently.

What about the discussion we're having? It's enabled by computers. But no linear algebra is happening there.

What about word processing? Computer games? The world wide web, and all the information contained in it? The authoring of music and graphics? All of those are enabled by computers.

Saying it's about doing linear algebra quickly is massively reductive.

adgjlsfhk1 · on Jan 20, 2021

Note also that PageRank is the algorithm that lets google rank results and is also just a massive linear algebra problem.
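To make the linear algebra connection concrete, here is a minimal sketch of PageRank as power iteration on a tiny made-up link graph. The graph, the damping factor of 0.85, and the function name `pagerank` are illustrative assumptions, not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for PageRank on a small graph.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link and that
    every link target is itself a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Each pass is one sparse matrix-vector product, written out by hand.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "b", "c" and "d" link to "a", and "a" links to "b".
toy_links = {"a": ["b"], "b": ["a"], "c": ["a"], "d": ["a"]}
ranks = pagerank(toy_links)
```

The ranks stay normalized to 1, and the heavily linked page "a" ends up with the largest share.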

messe · on Jan 21, 2021

PageRank was the algorithm. These days, while PageRank might still be in play somewhere, I suspect the results are largely dominated by machine-learned suggestions.

MaxBarraclough · on Jan 20, 2021

I agree it's going too far to say that computers are mostly about linear algebra, but computer graphics is one of the go-to examples of applied linear algebra. I believe linear algebra is also central to games' physics engines, but that's not something I know much about.

prewett · on Jan 20, 2021

Computer graphics uses matrices to transform the points, but there isn't much else about it that uses linear algebra. 2D rasterization cleverly iterates over pixels. 3D rasterization does the same for triangles. The other parts of 3D are writing shaders (these days, usually modelling real-world materials), and rendering to textures for effects like reflections and shadows. Even though a main application of linear algebra is computer graphics, computer graphics uses very little linear algebra; the matrix part of your code is pretty small.
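For a sense of how small the matrix part is, here is a rough sketch of transforming points with a 2D rotation matrix. The helper names `rotation_matrix` and `transform` are made up for illustration; real renderers typically use 4x4 homogeneous matrices and run this on the GPU.

```python
import math


def rotation_matrix(angle_radians):
    """Return the 2x2 matrix that rotates counter-clockwise by the given angle."""
    c, s = math.cos(angle_radians), math.sin(angle_radians)
    return [[c, -s], [s, c]]


def transform(matrix, point):
    """Multiply a 2x2 matrix by a 2D point (treated as a column vector)."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y,
        matrix[1][0] * x + matrix[1][1] * y,
    )


# Rotate the corners of a unit square by 90 degrees counter-clockwise.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
quarter_turn = rotation_matrix(math.pi / 2)
rotated = [transform(quarter_turn, corner) for corner in square]
```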

whatshisface · on Jan 20, 2021

"Linear Algebra," truth be told, involves a lot of high-rent concepts that are not related to computer graphics. Even midlevel concepts like eigenvalues seem to make no appearance. It would be a little more representative to say that computer graphics involved vectors and matrices.

jjoonathan · on Jan 21, 2021

I've seen bounding box code that uses PCA and, by extension, eigenvectors. I think inverse kinematics code uses eigenvectors too. But yeah, transforms easily make up 80% of the use cases.

a1369209993 · on Jan 21, 2021

> It would be a little more representative to say that computer graphics involved vectors and matrices.

"Linear Arithmetic", maybe?

dekhn · on Jan 20, 2021

All the things you listed above involve linear algebra, including word processors (a modern word processor is loaded with algorithms).

bernardv · on Jan 20, 2021

I would not have included Jupyter Notebooks in this list. However, I would not hesitate to include the Pandas library in such a list as it has propelled the democratization of data analysis forward a great deal. I recall the days of having to either cobble together my own data analysis algorithms or having to depend on multiple libraries to get basic things done. Pandas, however, wouldn’t be Pandas without Numpy and ultimately the C/Fortran libraries it was built on.

randlet · on Jan 20, 2021

> Pandas, however, wouldn’t be Pandas without Numpy and ultimately the C/Fortran libraries it was built on.

I think numpy/numeric and scipy deserve to be on the list more than Jupyter (and Pandas), since I think they were the fundamental reason that Python took off in physical / data sciences.

hatmatrix · on Jan 20, 2021

Jupyter notebooks brought documented code to a broad audience (I personally use org-mode, but I appreciate that Jupyter can appeal to non-Emacs users).

Pandas is based in concept on R's data frames and dplyr libraries which predate it by a long shot. If you were talking about implementation of data frames, then you would have to include Julia's as well.

From a practical perspective though, SQL has to be a big one - though not often used by people closest to the science.

julienchastang · on Jan 21, 2021

Even more than that, Jupyter brought literate programming and REPL-style programming to the masses. Those ideas have been present from the dawn of computing (and have seen various proprietary incarnations, e.g., Matlab, Mathematica) but it was Jupyter that thrust them to wide audiences. I was glad to see Jupyter mentioned as an important advancement in the scientific computing arena. In addition, for me, the paper was a delight to read. I have met or even collaborated with a number of people interviewed in the article. (I am also an org-mode / babel user.)

hatmatrix · on Jan 22, 2021

org-babel is brilliant but I do find that you have to do a lot of things manually that Jupyter does automatically (manage figures, etc.)

avrionov · on Jan 20, 2021

I know that some people here on hackernews don't have a good opinion of Jupyter Notebooks, but I expect them to be a very important step in how we publish scientific papers and experiment with ideas. There are probably other tools that are currently more influential, but in the longer term notebooks are going to replace traditional papers.

jimbokun · on Jan 20, 2021

Probably could have been folded into the IPython entry, as that's how many people encounter it.

jleyank · on Jan 20, 2021

Well, I would have thought that "molecular simulation codes" such as QM, MD or the related methods to MD should have been on this list. They pretty much all arose from academic groups such as those of Karplus and Pople, and all profit from the staggering increase in compute power. They drive material design, drug design, ... all of which are significant technological problems of today.

EDIT: I think the Unix family tree should have been high on the list as it forms the basis of most of today's servers.

dekhn · on Jan 20, 2021

it's incorrect to say that molecular simulation codes have a real impact on drug design. That's a narrative that is stated, but it's not representative. Molecular simulation codes are either too inaccurate, or undersample the space (we can't tell because of how much CPU is required to answer). I think most people have come to realize that modelling like this is mainly useful as a visualization technique to obtain intuition for how proteins behave, not a quantitative way of probing a system computationally.

selimthegrim · on Jan 21, 2021

What on earth is DESRES spending all that money on then

dekhn · on Jan 21, 2021

That's a great question! David Shaw disagrees with me about whether this is going to work (I used to be in his camp). Their money goes to a number of things- high personnel costs, presumably high hardware costs, I don't know what else.

hchz · on Jan 20, 2021

Monte Carlo is a big omission.

https://laws.lanl.gov/vhosts/mcnp.lanl.gov/pdf_files/la-ur-1...

SiempreViernes · on Jan 20, 2021

I guess they figured Monte Carlo is more of a technique than something as definite as a code. Certainly the first implementations didn't do much worthwhile science.

hchz · on Jan 20, 2021

Monte Carlo is a class of algorithms, just like FFT isn't a code or a specific algorithm but a class of algorithms.

Is "worthwhile science" a jab at nuclear applications?

MC was developed by and used for the Manhattan Project - almost the entire computer architecture we are using to communicate resulted from military research related to strategy concerning weapons that could not have been made without MC, and this remade not only our relationship with technology but the geopolitics of the entire world.

I would say that, more broadly, the omission is the lack of any mention of the use of random sampling, including MC and random methods in machine learning.

whatshisface · on Jan 20, 2021

It's subjective, but the idea behind monte carlo seems so obvious that I can hardly imagine it being called an invention.
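The basic idea really does fit in a few lines. As a rough sketch (the classic textbook toy, not anything from the Manhattan Project codes), here is a Monte Carlo estimate of pi from uniform random points; the function name `estimate_pi` and the fixed seed are illustrative choices.

```python
import random


def estimate_pi(samples, seed=0):
    """Estimate pi by sampling points uniformly in the unit square.

    The fraction that lands inside the quarter circle of radius 1
    approximates pi / 4.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


approx_pi = estimate_pi(100_000)
```

The error shrinks only like one over the square root of the sample count, which is why the compute requirements mattered so much historically.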

jjoonathan · on Jan 21, 2021

Almost all monte carlo production code is MCMC of some form and the idea of constructing a markov chain with a certain stationary distribution by generating and accepting/rejecting candidate jumps is far from trivial.
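For anyone who hasn't seen it, a minimal sketch of that accept/reject construction (random-walk Metropolis, the simplest member of the family) might look like the following. The target here is a standard normal chosen purely for illustration, and the function name `metropolis`, the step size, and the burn-in length are all assumptions of the sketch.

```python
import math
import random


def metropolis(log_density, start, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a 1D unnormalized log density."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(steps):
        # Propose a symmetric Gaussian jump from the current state.
        candidate = x + rng.gauss(0.0, step_size)
        log_accept = log_density(candidate) - log_density(x)
        # Accept with probability min(1, p(candidate) / p(x)).
        if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
            x = candidate
        samples.append(x)  # a rejected jump repeats the current state
    return samples


def log_standard_normal(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x


chain = metropolis(log_standard_normal, start=3.0, steps=50_000)
kept = chain[1_000:]  # drop an arbitrary burn-in period
mean = sum(kept) / len(kept)
variance = sum((value - mean) ** 2 for value in kept) / len(kept)
```

Note that the sampler only ever needs density ratios, so the normalizing constant never has to be computed, which is the part that is far from obvious.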

buescher · on Jan 20, 2021

They included the FFT.

Also, you'd think computer algebra systems would make the list. Macsyma, Mathematica, etc.

Mathematica has had a notebook interface since before I could personally afford a machine that could run Mathematica, but somehow the newer and inferior Ipython/Jupyter system makes the list?

Y_Y · on Jan 20, 2021

Mathematica is not much liked in science. If you aren't careful or good at programming (most scientists) it's very slow, and it's much too easy to create a forkbomb that crashes the UI with a trillion red errors.

Plus the language is an ugly lisp and looks weird to people who understand log tables and calculators and Fortran.

And it's expensive and the guy most associated with it (Stephen something) has a bad rep.

But Jupyter is hot at the moment (for good and bad reasons) and it's FOSS and is easy to install as long as you can get your grad student to install conda for you.

buescher · on Jan 20, 2021

Somebody's buying enough licenses to keep them in business. You either find a computer algebra system useful for what you do or you don't. My needs are pretty simple and Wolfram Alpha works well for me when I need something like that. Better than GNU Maxima, which I also like.

Stephen Wolfram made his early reputation in part by being one of the first people to seriously use Macsyma for physics. No, I don't know any physicists that take A New Kind of Science that seriously, but that's a different story.

Jupyter is hot, but it's not novel. The Mathematica notebook interface was an innovation and is still best-in-class in my opinion.

dekhn · on Jan 21, 2021

I wasn't very good at solving integrals in grad school, so I bought Mathematica because I had the impression it could solve them. What I found was interesting. First, for all the integrals I dealt with, computers can't solve them using the rule systems that exist. Instead, you need a Physics Grad Student who knows how to solve integrals. Mathematica will just take your integral and spit out a more complicated one.

However, Mathematica is really good for symbolic algebra! I wrote a Python-Mathematica bridge many, many years ago, and learned that not only could it decompose numeric matrices, it could do it for matrices with symbols. Mind=blown.

SiempreViernes · on Jan 26, 2021

That's the thing, there is essentially one FFT algorithm. If you want to include a big family of algorithms as Monte Carlo you better make place for it further down the line after Euler integration or possibly just numerical integration, and it's starting to become a question if maybe the list should simply be "addition and multiplication but with computers".
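The algorithm most people have in mind is the radix-2 Cooley-Tukey recursion. A minimal sketch, assuming the input length is a power of two (the function names `fft` and `naive_dft` are just for illustration, and production libraries are far more elaborate):

```python
import cmath


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; length must be a power of two."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(values[0::2])
    odd = fft(values[1::2])
    result = [0j] * n
    for k in range(n // 2):
        # Combine the two half-size transforms with a "twiddle factor".
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def naive_dft(values):
    """Direct O(n^2) discrete Fourier transform, used here only as a check."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
spectrum = fft(signal)
```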

leecarraher · on Jan 20, 2021

Stochastic Gradient Descent as well.

leecarraher · on Jan 20, 2021

It is the primary training method for Pytorch and Tensorflow models and thus essential for training artificial neural nets (such as AlexNet, weird they would randomly pick an architecture). It is often used for non-negative matrix factorization, non-linear regression, the basis for most modern machine learning algorithms.
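As a rough sketch of the core update (one sample at a time, step against the gradient), here is plain SGD fitting a line to toy data. This is not how Pytorch or Tensorflow implement it; the function name `sgd_linear_fit`, the learning rate, and the epoch count are illustrative assumptions.

```python
import random


def sgd_linear_fit(data, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = slope * x + intercept by stochastic gradient descent.

    Minimizes the squared error 0.5 * (prediction - y)**2, updating
    the parameters after every single sample.
    """
    rng = random.Random(seed)
    data = list(data)
    slope, intercept = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a fresh random order
        for x, y in data:
            error = (slope * x + intercept) - y
            slope -= learning_rate * error * x
            intercept -= learning_rate * error
    return slope, intercept


# Noise-free toy data drawn from y = 2x + 1 on [0, 1].
points = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(11)]
slope, intercept = sgd_linear_fit(points)
```

Neural net training replaces the hand-written gradient with backpropagation and usually averages over mini-batches, but the update rule has the same shape.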

avalys · on Jan 20, 2021

The use of the term “code” in this way is universal among the US national laboratories and adjacent scientific computing fields.

It refers to a complete software package. So, a “hydrodynamics code” is a hydrodynamics simulation, like Gerris.

It seems peculiar but they’ve been using the term this way since the 1960’s and probably long before. Given how much of modern computer science arose from the US government scientific complex I’d say they’ve earned the right to define their own usage.

lp251 · on Jan 20, 2021

Can confirm. My group leader is interested in building “a new hydrocode”, or discussing particulars of “a transport code”.

Also, a whole bunch of people at LANL call slide decks “viewgraphs”. That’s my favorite.

gnufx · on Jan 20, 2021

It's widely used on the right side of the Atlantic too, at least in technical circles. What's unfortunate is that -- especially in the wider world -- now everyone just talks about "coding", the relatively trivial part of programming.

_ofdw · on Jan 20, 2021

It's pretty common in Computational Fluids as well. Two well-known codes among several are Code Aster[0], a structural mechanics, thermodynamics, and finite element analysis solver, and Code Saturne[1], a Navier-Stokes solver.

[0] https://www.code-aster.org/

[1] https://www.code-saturne.org/cms/

analog31 · on Jan 20, 2021

This was common when I was in school, long ago. Also, codes may have had names, but more frequently than not were named after their authors. "I used so-and-so's code for this."

The problem is compounded today, by a certain contingent (okay, physicists) who enjoy referring to things by the most archaic possible name.

randlet · on Jan 20, 2021

It's not just the US, I think it's fairly common in the physical sciences in general. My masters project was "A Fast Monte Carlo Code for Brachytherapy" where "Code" really means software/program.

armadsen · on Jan 20, 2021

Yep. I did a tiny bit of Fortran for antenna design and modeling in college in the 2000s. I noticed then that the professor -- who had been at my university since the late 50s -- and surrounding material (textbook, etc.) referred to programs as "codes".

RcouF1uZ4gsC · on Jan 20, 2021

>The ‘discovery’ was actually a rediscovery — the German mathematician Carl Friedrich Gauss worked it out in 1805, but he never published it, says Nick Trefethen, a mathematician at the University of Oxford, UK.

I wonder what you could come up with if you went through unpublished works of people like Gauss, Ramanujan, Euler, Erdos, etc. There is likely a bunch of stuff that they would have seen as merely an impractical curiosity back then, but that actually might have a lot of use now.

jimbokun · on Jan 20, 2021

Impressive that Dayhoff appears on the list twice, for "biological databases" and BLAST. I think she's the only one.

dekhn · on Jan 21, 2021

Yes, Margaret Dayhoff was great. See this article for more: https://www.smithsonianmag.com/science-nature/how-margaret-d...

gnufx · on Jan 20, 2021

I'm not sure what to make of the article but, since it includes BLAS, it might be worth pointing out that scientific subroutine libraries are one of the most successful examples of reuse, particularly compared with more trendy things aiming at that.

I can't remember if I ever actually linked ESSL (Engineering and Scientific Subroutine Library) on OS/370 but I remember it in the mid(?) 80s, when I don't think it was very new. These days I have it on POWER9 (if doubtless rather different). I don't know if it's older than the NAG library.

rxm · on Jan 21, 2021

I would have included symbolic algebra systems to the list. Many famous calculations in physics and mathematics may not have been attempted without them. Macsyma and Reduce paved the way for Mathematica and Maple. There are still many symbolic algebra algorithms waiting to be implemented. The love and care many open source symbolic systems are receiving is a reflection of their need and importance to science and technology.

klelatti · on Jan 20, 2021

Alternate headline: Ten ways computing helped to transform science

LVB · on Jan 20, 2021

I expected to find: https://en.wikipedia.org/wiki/Nastran

dljsjr · on Jan 20, 2021

Lotta people haranguing about the use of the phrase "computer codes", it's just a very old/anachronistic way of referring to a computer program. Most of the people I know who still use that language are Fortran devs.

Just an old fashioned way of saying "programs".

itronitron · on Jan 20, 2021

Yeah, it's preferred terminology by people that share their work as source code and not as applications.

JJMcJ · on Jan 20, 2021

More in the scientific world, more in Britain than in the USA.

lostcolony · on Jan 20, 2021

That might explain it; I mostly have heard "computer codes" (pluralized with an 's' at the end, rather than treating 'computer code' as a non-granular catch all for any amount of written code, much like a liquid; "water" is any amount of water, since it requires greater definition to talk about the singular, is it a drop, a molecule, etc. And referencing "waters" is an archaic reference to separate bodies of water, more commonly replaced with 'seas' or 'oceans' or similar) from Indian programmers, and there fairly frequently. Maybe it's an academic influence from Britain, that has gotten dropped with the British programmers I've met, but still influenced the schooling of the Indian ones.

dljsjr · on Jan 20, 2021

Perhaps. I'm in the US, and I hear it from old-school devs sometimes.

DoingIsLearning · on Jan 20, 2021

I think it's also context/industry specific, it's still pretty common terminology in Aerospace/Avionics/Defense even among younger people.

amelius · on Jan 20, 2021

Were these codes successfully commercialized?

analog31 · on Jan 20, 2021

In some sense a couple of them were, though not necessarily by their originators. The original FORTRAN compiler is long gone, but there were commercial FORTRAN compilers sold for numerous computers before the rise of free software. Likewise BLAS and its ilk have been recompiled into the numerical packages that are provided in support of hardware sales, such as the Intel Math Kernel Library.

tpoacher · on Jan 21, 2021

"Codes"? "CODES"? In Nature of all places?

Someones shoulds gives theses guys somes advices abouts writings corrects grammars.

sieste · on Jan 20, 2021

"Ten computer codes that transformed genomics" would be a more fitting title. It's a good selection of algorithms, but quite biased by the author's field of research.

jimbokun · on Jan 20, 2021

I think BLAS, FFT and IPython are all quite a lot more general than just genomics.

hatmatrix · on Jan 20, 2021

They also talk about climate models though.

1vuio0pswjnm7 · on Jan 20, 2021

Are Jupyter notebooks slow?

boffinism · on Jan 20, 2021

What's a computer code?

tzs · on Jan 20, 2021

> What's a computer code?

The book "Working with Coders" [1] by Patrick Gleeson might answer that for you. On page 4 it says that the author is going to "assume that you don't know the first thing about computer code" and goes on to imply that the book will explain it.

I'm somewhat confused that you have to ask, though, since your HN "about" says "boffinism.com / patrickgleeson.com", and based on the content of those sites you appear to be the author of that book.

Did you not read it while you were writing it? :-)

Or was your comment a clever ploy, hoping someone would read your HN "about", see that you clearly know what "computer code" is, and call you on it, thereby generating some free publicity for your book?

If so, well played you magnificent bastard [2].

[1] https://www.amazon.co.uk/dp/148422700X/ref=rdr_ext_tmb

[2] https://tvtropes.org/pmwiki/pmwiki.php/Main/MagnificentBasta...

boffinism · on Jan 20, 2021

I mean, I was really just facetiously pointing out that computer code is a mass noun, so we are all used to talking about _some_ computer code, but not so much _a_ computer code.

But... Thanks for the bump!

pintxo · on Jan 20, 2021

Apparently it's about computer science / information technology contributions to science as a whole:

    Fortran compiler 
    Fast Fourier transform 
    Biological databases 
    General circulation model of the climate 
    BLAS 
    NIH Image / ImageJ / Fiji 
    BLAST 
    arXiv 
    IPython Notebook / Jupyter 
    AlexNet

Interesting that they do not list the internet itself.

imglorp · on Jan 20, 2021

I would like to nominate several newer software-heavy projects that have moved humanity forward.

    * CERN LHC
    * Event Horizon Telescope
    * LIGO

dekhn · on Jan 21, 2021

LIGO used my computer code (pyglobus) to move data around! I was very proud of that.

ralfn · on Jan 20, 2021

I think the OP was referring to the unfortunate phrasing.

For a publication like Nature to use baby-talk phrasings that are just semantic nonsense.

I'm not a native English speaker, but I would be embarrassed if I wrote that headline.

wyldfire · on Jan 20, 2021

For a scientist who uses computers but not to write software, it's probably forgivable. It seems even more forgivable for a journalist.

retrac · on Jan 21, 2021

It's a boffinism! Some fields/sectors do use it like a mass noun as you point out. Often refers to a set of routines or a library. So something like "ocean model codes" means packages of routines or libraries for ocean weather simulation.

mrtimuk · on Jan 20, 2021

Exactly. The title sounds like CPU instructions; but the article reads as "pieces of software".

ohgodhelpplease · on Jan 20, 2021

ESL-speak for many things related to programming.

unwind · on Jan 20, 2021

Aaurgh, the weird Anglo/scientific terminology where "a computer program" has been transformed to "a code".

So weird, ugly, misleading and strange.