- June 25, 2020 at 2:35 pm #13813Nicolas ChéronGuest
I recently discovered gypsum-dl, and I think it is a very nice new tool that is efficient and easy to use (so I thank you for it). I am trying to convert this SMILES string to 3D: "c12CNCCOCCNCc3ccc4c(cc(cc4)CNCCOCCNCc4ccc(cc1)c(c4)c2)c3". OpenBabel completely fails with it, and gypsum-dl works smoothly. I have a little issue though: this molecule has 4 amine groups, and I never get structures that are charged +4 (unless I set the pH to very low values). I read in the gypsum-dl paper that "It eliminates any additional ionization forms whose formal charges deviate from that baseline by 3 e or more." I do agree that being charged +4 may not be physiologically relevant when the ligand is alone in water, but when bound to a target it is more difficult to assess this statement.
Thus, I would like to allow the generation of highly charged ligands, aiming at seeing after the docking which ionized form seems to provide the best pose. What should I do to allow a charge of +4 ?
- June 27, 2020 at 8:30 pm #13934
Hi Nicolas. Much thanks for your interest in Gypsum-DL. Coincidentally, in the latest version (1.1.5) I changed the charge range to -4e/+4e (to accommodate ATP). So using the latest version, it should be possible to get the +4e form of your ligand.
But unless you use very permissive parameters, you’re unlikely to get the +4e form at physiological pH. To illustrate, let’s assume Gypsum-DL is only considering alternate protonation states and that there’s a 50/50 chance of protonating a given nitrogen. Since you have four such nitrogens, the chance of getting the +4e form would be 0.5 ^ 4, or one in sixteen. If you’re
--max_variants_per_compoundparameter is set to the default (5), you might not see the +4e form unless you run Gypsum-DL multiple times.
If you lower the pH, as you suggested, the chance of protonating increases, so you’re more likely to see +4e. (At low enough values, that’s the only form you’ll see.) Another option would be to ask Gypsum-DL top output more molecules. For examples, when I ran your molecule through the latest version using this command line (which is probably excessive):
python run_gypsum_dl.py --source tmp.smi --max_variants_per_compound 40 --thoroughness 1
I did see the +4e form.
Hope this makes sense! Thanks again for your interest in the software. Take care.
- June 29, 2020 at 5:22 pm #13975Nicolas ChéronGuest
Thank you for your answer.
From dimorphite-dl, the average pKa of amines is 8.16. Thus, at pH=7, [protonated]/[not-protonated]=10^(1.16)=14.5 for one amine, i.e. 94% of chance of being protonated. Thus, the chance of getting the +4e form is (0.94)^4=78%. Thus, I was expecting to see more often the +4e form and I don’t fully understand why it is not the case. Is it because pH is not strictly 7 but is between 6.4 and 8.4 by default ? or more because there is an error bar on the pKa ?
Regarding the likelyhood of being charged +4 in a real case, from the pKa of spermine given in Figure 1 of https://pubs.acs.org/doi/abs/10.1021/jm900187v, it will be mainly charged +4 at pH=7.
As you said, I could increase the max_variants_of_compound, however, if I need to dock a large library it would multiply my waiting time by 8 (if going from default of 5 to 40), that’s why I am trying to more deeply understand how gypsum-dl works.
- June 29, 2020 at 7:37 pm #13981Nicolas ChéronGuest
I can partially answer one of my own questions. I performed several tests with max_variants_of_compound=10, setting different values of min_ph and max_ph, and changing the error bar of amine pKa in site_substructures.smarts.
* Setting the error bar to 0 and –min_ph 6.5 –max_ph 7.5, all output structures are charged +4. Using the default for pH, max charge is +3 (and found only one).
* Setting the error bar to 1.25 (half of what is in the original file), max charge is always +2 (even with –min_ph 7 –max_ph 7).
Thus, both play a role, but the error bar on the pKa seems more important.
I have an additional question: I don’t fully understand what pka_precision does. Manual says "Size of pH substructure ranges", which seems to be what min_ph and max_ph are doing. So what is the difference between the two?
- June 29, 2020 at 8:07 pm #13983Nicolas ChéronGuest
I now understand what pka_precision does, it is clearly explained in the dimorphite publication.
- June 30, 2020 at 2:37 pm #13990Nicolas ChéronGuest
Sorry for all the messages. I am now working on this SMILES string "c1cc2CNCCN3CCNCc4cccc(CNCCN(CCNCc(c2)c1)CCNCc1cc(CNCC3)ccc1)c4" which is molecule 7 from this article http://dx.doi.org/10.1039/c4ob02618g. In Figure 19, we can see the concentration profiles (drawn from experimental pKa), and we see that at pH=7 there is a mixture of compounds charges +3 and +4. I would like to check if I can generate a mixture of structures charged +3 and +4
With –max_variants_per_compound 10 –min_ph 7 –max_ph 7 –pka_precision 0.45, all structures are charged +8. pKa of amines is 8.16+/62.52 and 8.16-0.45*2.52=7.03, thus there is no overlap between pH range and pKa range and it is the expected behavior that amines are all charged (the function remove_highly_charged_molecules doesn’t seem to work in this case, but I think it is the expected behavior since there are no other possible option of protonation).
With –pka_precision 0.5, since 8.16-0.50*2.52=6.9, there is now overlap between pH and pKa range. But 9 out of 10 are charges +2 and 1 is charged +3. If I add –thoroughness 6, I can see in the Contents of MolContainers that some states charged +3 or +4 are created, but at the end I only obtain structures charged +2. Since there are 9 possible protonated forms (charge from 0 to 8), the most likely is the +4 form and I don’t understand why I don’t see it at all.
Can you please explain me what is going on to select the final m compounds from the m*t variants? Is it only the ones with the lowest energies ? Does gypsum directly compare the energy of two molecules with a different ionization state?
Last question: is there a final test to check if we have generated twice the same structure (or two structures very close)?
Thank you for your help.
- July 12, 2020 at 2:31 am #14329
Hi Nicolas. No worries about all the messages. I should start by clarifying that Gypsum-DL was trained on monoprotic compounds. Accuracy is likely less when dealing with compounds that have multiple ionization sites, especially if those sites are near each other. It’s part of the compromise between speed and accuracy that we had to strike. QM-based methods are going to be more accurate, albeit computationally expensive. (I’m sure you’re already aware of all this, but I thought it would be good to mention it for others who visit this page.)
Very impressive how thoroughly you’ve delved into the algorithm! Re. how Gypsum-DL selects the final m compounds from the mt variants, it uses RDKit to calculate the energy of the mt 3D models and then keeps the top m with the best energies. In case you’re interested, some of the relevant code is here:
It could be that your protonation state of interest is more likely to be energetically unfavorable and so is being eliminated for that reason. I believe Gypsum does compare the energies of molecules with different ionization states, as long as they come from the same input SMILES string.
Gypsum does try to eliminate molecules with very similar conformational states, though in practice I have seen some geometrically similar conformations. https://git.durrantlab.pitt.edu/jdurrant/gypsum_dl/-/blob/1.1.6/gypsum_dl/MyMol.py#L596
Hope this answer helps. Take care.
- July 15, 2020 at 1:46 pm #14365Nicolas ChéronGuest
Thank you for your answer. I have now an hypotheses of why 9 out of 10 molecules are charged only +2: molecules charged +3 or +4 have probably a higher energy (computed with RDKit) because of coulombic repulsion between the positively charged sites.
- August 8, 2020 at 11:25 pm #14620
Hi Nicolas. Sorry for the delay in responding! I think you might be right about that. It would make sense that so charged a molecule would be energetically unfavorable. Take care.
- The topic ‘Maximum charge’ is closed to new replies.