As an aside - monoisotopic mass is a strange one to use. In the real world you a...

eesmith · on Feb 19, 2020

https://en.wikipedia.org/wiki/Monoisotopic_mass#Monoisotopic... points out:

> The monoisotopic mass is not used frequently in fields outside of mass spectrometry because other fields cannot distinguish molecules of different isotopic composition. For this reason, mostly the average molecular mass or even more commonly the molar mass is used. For most purposes such as weighing out bulk chemicals only the molar mass is relevant since what one is weighing is a statistical distribution of varying isotopic compositions.

> This concept is most helpful in mass spectrometry because individual molecules (or atoms, as in ICP-MS) are measured, and not their statistical average as a whole. Since mass spectrometry is often used for quantifying trace-level compounds, maximizing the sensitivity of the analysis is usually desired. By choosing to look for the most abundant isotopic version of a molecule, the analysis is likely to be most sensitive, which enables even smaller amounts of the target compounds to be quantified. Therefore, the concept is very useful to analysts looking for trace-level residues of organic molecules, such as pesticide residue in foods and agricultural products.

DrScientist · on Feb 21, 2020

However, for proteins - which, even if broken down to small peptides in the mass spec, have large numbers of C, N, O, H atoms - then monoisotopic makes no sense.

eesmith · on Feb 22, 2020

https://patents.google.com/patent/US20050267689A1/en disagrees with you, saying:

> Leftmost peaks in isotopic clusters correspond to molecules containing only the lowest-mass isotopes of all their atoms: all carbon atoms are C-12, all hydrogen atoms are H-1, all nitrogen atoms are N-14, and so on. These peaks are known to those skilled in the art as monoisotopic peaks. While each chemical species of molecule manifests itself in the mass spectrum as an isotopic cluster, it is characterized by only one monoisotopic peak, thus it became common practice to characterize molecules in the mass range of up to approximately 10 kDa by their monoisotopic masses. For example, it became common practice to use monoisotopic masses in protein identification methods based on comparing mass spectral data to databases of masses of protein fragments.

https://www.sciencedirect.com/topics/biochemistry-genetics-a... also disagrees, quoting from "Protein Identification by Peptide Mass Fingerprinting (PMF)", Nachimuthu Saraswathy, Ponnusamy Ramalingam, in Concepts and Techniques in Genomics and Proteomics, 2011

> 13.4 Data analysis and identification of protein

> The peak list is compared with a peak list generated from the database proteins. The commonly used computer search engines are MS-Fit, Mascot, Peptident, Profound, etc. The monoisotopic mass of the each peak, the protease used, the number of missed cleavages in order to account for the possible incomplete digestion are given as input.

That is, it appears that monoisotopic mass makes good sense for small peptides in mass spectrometry.

DrScientist · on Feb 24, 2020

Let's take a short tryptic peptide, LQGIVSWGSGCAQK. Its formula is C62 N18 O19 S1 H100.

Let's look at C: C12 has a natural abundance of around 98.93%, the N 99.6%, O 99.76%, H 99.98%. If we ignore the others for simplicity of calculation and look only at C, then the probability of getting a peptide with all C12 is 0.9893^62 ~ 0.51, i.e. only half of the sample will have the monoisotopic mass. Double the length of the peptide and it's down to a quarter; for a full-length protein you are looking at vanishingly small amounts.
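The arithmetic above can be checked with a small sketch. It uses only the abundances and atom counts quoted in the comment; the function name and the choice to leave sulfur out (as the comment does) are illustrative assumptions, not anything from the original problem.

```python
# Natural abundance of the lightest isotope, as quoted in the comment above.
LIGHTEST_ABUNDANCE = {"C": 0.9893, "N": 0.996, "O": 0.9976, "H": 0.9998}

# Atom counts for LQGIVSWGSGCAQK as given above (sulfur left out, as in the comment).
PEPTIDE = {"C": 62, "N": 18, "O": 19, "H": 100}


def monoisotopic_fraction(formula, elements=None):
    """Probability that every counted atom is its lightest isotope.

    If `elements` is given, only those elements are considered.
    """
    fraction = 1.0
    for element, count in formula.items():
        if elements is None or element in elements:
            fraction *= LIGHTEST_ABUNDANCE[element] ** count
    return fraction


carbon_only = monoisotopic_fraction(PEPTIDE, elements={"C"})
all_four = monoisotopic_fraction(PEPTIDE)
print(f"carbon only: {carbon_only:.3f}")
print(f"C, N, O, H:  {all_four:.3f}")
```

Including N, O and H pushes the fraction somewhat below the carbon-only figure, so the carbon-only estimate is if anything generous to the monoisotopic species.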

The original problem was to calculate masses of things up to 1000aa. Something of 1000aa, with the same carbon content per residue, would have a frequency of monoisotopic species of 2.04058E-21, i.e. on the order of a thousand molecules out of the 6x10^23 in a mole.
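A rough sketch of how that 1000aa figure can be reproduced. The key assumption (not stated in the comment, but consistent with its number) is that a long protein has the same carbon density as the 14-residue peptide, i.e. 62/14 carbons per residue. Only carbon is considered, and the constant names are made up for illustration.

```python
AVOGADRO = 6.022e23            # molecules per mole
C12_ABUNDANCE = 0.9893         # figure quoted earlier in the thread
CARBONS_PER_RESIDUE = 62 / 14  # assumption: same carbon density as LQGIVSWGSGCAQK


def all_c12_fraction(n_residues):
    """Fraction of molecules in which every carbon is C12 (carbon only)."""
    n_carbons = n_residues * CARBONS_PER_RESIDUE
    return C12_ABUNDANCE ** n_carbons


for n in (14, 28, 100, 1000):
    frac = all_c12_fraction(n)
    print(f"{n:>5} aa: fraction {frac:.3e}, about {frac * AVOGADRO:.3g} molecules per mole")
```

Under these assumptions the 1000aa case comes out near 2e-21, which works out to on the order of a thousand all-C12 molecules in a full mole.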

The value of monoisotopic values decreases as the size and complexity of the molecule goes up.

eesmith · on Feb 24, 2020

You wrote: "The value of monoisotopic values decreases as the size and complexity of the molecule goes up."

Yes, that is in agreement with the text I quoted earlier - "it became common practice to characterize molecules in the mass range of up to approximately 10 kDa by their monoisotopic masses".

Note (from another part of the second link I gave) "This eight amino acid peptide was named GmPep914 (DHPRGGNY), based on its monoisotopic mass."

So, there's plenty of clear evidence that people do use monoisotopic masses for mass spectra analysis of at least some peptides.

What is your point? That this question is poorly written? I think I started this thread to point that out.

DrScientist · on Feb 24, 2020

My point was that while for small molecules, the mono-isotopic mass makes perfect sense as it's the major species, for larger proteins it isn't and indeed becomes a vanishingly small proportion.

Note that for something around 10 kDa the difference between the average and mono-isotopic mass will be around 6 daltons - with the experimental accuracy around 1 dalton!
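A hedged sketch of where a figure like "~6 daltons at 10 kDa" can come from. It computes both masses for the LQGIVSWGSGCAQK formula given earlier and scales the gap linearly to 10 kDa. The atomic masses are standard reference values that do not appear in the thread, so treat them (and the linear scaling) as assumptions.

```python
# Mass of the lightest isotope and standard average atomic mass, in daltons.
# These are reference values supplied for illustration, not taken from the thread.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}

# Formula for LQGIVSWGSGCAQK as given earlier in the thread.
PEPTIDE = {"C": 62, "H": 100, "N": 18, "O": 19, "S": 1}


def formula_mass(formula, table):
    """Sum of atomic masses for an elemental formula."""
    return sum(table[element] * count for element, count in formula.items())


mono = formula_mass(PEPTIDE, MONOISOTOPIC)
avg = formula_mass(PEPTIDE, AVERAGE)
gap = avg - mono

# Assume the gap grows in proportion to mass for a similar composition.
gap_at_10kda = gap * 10_000 / avg

print(f"monoisotopic: {mono:.3f} Da, average: {avg:.3f} Da, gap: {gap:.3f} Da")
print(f"scaled to 10 kDa: about {gap_at_10kda:.1f} Da")
```

For this peptide the gap is a little under 1 Da at roughly 1.4 kDa, which scales to between 6 and 7 Da at 10 kDa.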

eesmith · on Feb 24, 2020

You write now that it makes perfect sense.

However, at https://news.ycombinator.com/item?id=22383816 you wrote "even if broken down to small peptides in the mass spec, ... then monoisotopic makes no sense".

It was that latter point that I contested, because the literature clearly indicates that for at least some small peptides it makes sense.

DrScientist · on Feb 25, 2020

Depends on your definition of small - but I accept the point that if the peptides are small enough it can be useful and I went too far there.

Remember the problem posed was to calculate the mass of proteins up to 1000aa, where the difference between mono-isotopic mass and real average mass would be many tens of daltons - much more than the missing water!

eesmith · on Feb 25, 2020

I think the problem is poorly worded, which lends itself to the confusion we experience, unless we know (presumably from the context of the goals of this project) that these physical details are beyond the scope of the project.

That is, I think "useful" here is meant as "useful in learning to program", not "useful in actual mass spectra analysis."

Eg, you write "calculate the mass of proteins up to 1000aa".

There are a few picky details.

1) The text says "total weight of [A protein string] P", where "The standard weight assigned to each member of the 20-symbol amino acid alphabet is the monoisotopic mass of the corresponding amino acid".

(I'm ignoring that "weight" != "mass" because in this context those are synonyms.)

It reads like the text defines "standard weight" in terms of the monoisotopic mass, and asks to compute that weight. So the problem is not asking to "calculate the mass of proteins up to 1000aa" but "calculate the monoisotopic mass of proteins up to 1000aa." It further says "all amino acid masses are assumed to be monoisotopic unless otherwise stated".

(Alas, the explanatory text goes on to say "There are two standard ways of computing the mass", which contradicts the assertion that there's a "the standard weight.")

2) It says "protein string" in the problem, not protein, and clarifies that "In the following several problems on applications of mass spectrometry, we avoid the complication of having to distinguish between residues and non-residues by only considering peptides excised from the middle of the protein."

That is, the problem posed was not to calculate the "[monoisotopic] mass of proteins" but something more like the "[monoisotopic] mass of peptides excised from the middle of the protein, represented as a protein string."

3) In trying to understand this @$%@#$%&%$ topic more, I found https://patentimages.storage.googleapis.com/42/6b/2b/ec3a694... which I believe says that for very accurate mass spectrometers, for large proteins, the most abundant mass may be more useful than the average mass.
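For concreteness, here is a minimal sketch of what points 1) and 2) describe: summing monoisotopic residue masses over a protein string, with no water added because the peptide is taken from the middle of a protein. The table holds commonly tabulated monoisotopic residue masses; it is not quoted in this thread, and the function name and the optional water term are illustrative assumptions.

```python
# Monoisotopic residue masses in daltons (commonly tabulated values,
# supplied here for illustration rather than taken from the thread).
RESIDUE_MASS = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}
WATER = 18.01056  # monoisotopic mass of H2O, for a free peptide


def protein_string_weight(protein, add_water=False):
    """Sum of monoisotopic residue masses for a protein string.

    With add_water=False this is the weight of an internal fragment;
    with add_water=True it approximates a free peptide.
    """
    total = sum(RESIDUE_MASS[residue] for residue in protein)
    return total + WATER if add_water else total


print(f"{protein_string_weight('SKADYEK'):.3f}")
```

The `add_water` switch makes the "residues vs non-residues" simplification explicit: leaving it off matches the internal-fragment reading of the problem text.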

DrScientist · on Feb 26, 2020

It's supposed to be an educational tool - bioinformatics is more than writing programs to add up numbers; it's about understanding the science behind it.

So I didn't like the question because it treated the problem as a simple 'write a program to add up a list of numbers, based on a lookup table', rather than addressing the real science issues around protein mass. (Here the real challenges actually come from post-translational modifications - which make mass matching a very hard problem indeed - with lots of challenges for anyone in computer science.)

eesmith · on Feb 26, 2020

In my top-level comment I asked: How (in)correct are the other answers? I-am-not-a-bioinformatics-programmer.

As a variation, how many other questions do you not like?

http://rosalind.info/problems/hamm/ computes the Hamming distance between two strings, using the simplification that only point mutations are important. This is of course not a true reflection of the science behind comparing two DNA strings.
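For reference, the simplified computation in question is tiny. A minimal sketch, assuming two equal-length strings and counting only position-by-position mismatches (so insertions and deletions are ignored entirely); the function name is just illustrative.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s, t))


print(hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))
```

The length check is where the simplification shows: any indel shifts the alignment, and this measure has no way to represent that.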

Do you therefore also not like that question? Which others don't you like, because of simplifications they don't explain?

As an educational tool, is it not useful to discard complexity in the process of bootstrapping towards the full details?

Ha! I had heard the phrase "lie-to-children" before, which seemed relevant to this thread. https://en.wikipedia.org/wiki/Lie-to-children says:

> A lie-to-children (plural lies-to-children) is a simplified explanation of technical or complex subjects as a teaching method for children and laypeople. The technique has been incorporated by academics within the fields of biology, evolution, bioinformatics and the social sciences.