GRANT: Machine-Learning Scoring Functions

Random Ideas Until You Get Organized

  1. Different scoring functions for full ligands vs. fragments. Even just getting fragments with high ligand efficiency and then searching ligand databases for ones that contain those fragments, followed by docking, could be effective.
  2. Benchmark training times. It’s not always good to train longer.
  3. You can visualize the importance of different parts of a molecule by removing them and rescoring. You can do it atom by atom, or you can fragment the molecule and remove one fragment at a time. David Koes thinks fragmentation might be better. He wonders whether the contributions are truly additive.
  4. Look up clustered cross-validation. Clustering on the receptor sequence alone should be sufficient, not the ligand, because even if you have an identical ligand binding to two very different receptors, that is essentially an independent data point.
  5. For David Koes, accuracy improved substantially when he created arbitrary rotations of his data. I believe this is only relevant to convolutional neural networks, however.
  6. Test different subsets rigorously. Test PDB resolution. AutoDock Vina atom types, or just elements as he does? Things like that.
  7. Using Gaussians for atomic positions rather than just the positions themselves is also an excellent idea. Combine them with a quadratic function so that they go to exactly zero after a certain distance, perhaps some factor of the van der Waals radius.
  8. Also separate out IC50 and Ki values to see if that helps.
  9. Understand this: “a maximum Pearson Correlation Coefficient of 0.72, 0.74, 0.66, 0.68 and 0.71 in regression mode and a maximum Matthew’s correlation coefficient 0.91, 0.93, 0.70, 0.89 and 0.71 respectively in classification mode during 10-fold cross-validation.”
  10. Interesting link:
  11. Investigate retraining:
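
Sketch for idea 7 (Gaussian plus quadratic tail). This is only one possible form, not necessarily what anyone actually uses: it assumes a Gaussian exp(-2 d^2 / r^2) out to the van der Waals radius r, then a quadratic chosen to match the Gaussian's value and slope at d = r and to reach exactly zero, with zero slope, at d = 1.5 r. The function name, the Gaussian width, and the 1.5 factor are all assumptions made for illustration.

```python
import math


def atom_density(d: float, r: float) -> float:
    """Smoothed occupancy at distance d from an atom with vdW radius r.

    Assumed piecewise form (illustrative only):
      d < r         : exp(-2 d^2 / r^2)
      r <= d < 1.5r : (4 d^2 / r^2 - 12 d / r + 9) / e^2
      d >= 1.5r     : 0
    The quadratic matches the Gaussian's value and slope at d = r,
    and has value 0 and slope 0 at d = 1.5 r.
    """
    if d < 0 or r <= 0:
        raise ValueError("d must be non-negative and r must be positive")
    if d < r:
        return math.exp(-2.0 * d * d / (r * r))
    if d < 1.5 * r:
        return (4.0 * d * d / (r * r) - 12.0 * d / r + 9.0) / math.exp(2.0)
    return 0.0
```

The point of the quadratic tail is that each atom only touches grid points within 1.5 r of its center, so there is no long Gaussian tail to truncate arbitrarily.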

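Sketch for idea 4 (clustered cross-validation). A minimal version, assuming each complex already carries a receptor cluster label from some sequence-clustering step; the labels below are made up. Whole clusters are assigned to folds, so no receptor cluster appears in both the training and the test split. The greedy balancing is just one simple choice.

```python
from collections import defaultdict


def clustered_folds(cluster_ids, n_folds):
    """Return a fold index per sample such that no cluster spans two folds.

    cluster_ids: one receptor-cluster label per sample (hypothetical input).
    Clusters are placed largest first into whichever fold is currently
    smallest, which keeps fold sizes roughly balanced.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    if n_folds < 1 or n_folds > len(members):
        raise ValueError("n_folds must be between 1 and the number of clusters")

    fold_of = [None] * len(cluster_ids)
    fold_sizes = [0] * n_folds
    for cid in sorted(members, key=lambda c: -len(members[c])):
        target = fold_sizes.index(min(fold_sizes))
        for idx in members[cid]:
            fold_of[idx] = target
        fold_sizes[target] += len(members[cid])
    return fold_of


# Made-up cluster labels for seven complexes.
clusters = ["kinase", "kinase", "protease", "gpcr", "gpcr", "gpcr", "protease"]
folds = clustered_folds(clusters, n_folds=2)
print(folds)  # [1, 1, 1, 0, 0, 0, 1]
```

If the same ligand shows up in two different clusters it can still land in different folds, which is consistent with treating it as an independent data point.
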
Olexander… UNC

  1. There are formulas to convert between IC50, Kd, and Ki. The conversions are accurate to within about one order of magnitude. Look into it.
  2. Look up extended-connectivity fingerprints (ECFP), which encode topological features of ligands.
  3. Look at both sensitivity and specificity.
  4. Describe the microenvironment by converting the radial distribution (radii and angles) to an input vector somehow.
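
Sketch for idea 1 (IC50 to Ki). The notes do not say which formulas were meant; the standard one for a competitive enzyme inhibitor is the Cheng-Prusoff equation, Ki = IC50 / (1 + [S]/Km), so that is assumed here. The function names are illustrative. Note that [S] and Km are assay-specific and often not recorded alongside the IC50, which is one reason a conversion across a mixed dataset can only be approximate.

```python
import math


def ki_from_ic50(ic50: float, substrate_conc: float, km: float) -> float:
    """Cheng-Prusoff conversion for competitive inhibition.

    ic50 and the returned Ki share the same units; substrate_conc and km
    must share units with each other.
    """
    if ic50 <= 0 or substrate_conc < 0 or km <= 0:
        raise ValueError("ic50 and km must be positive, substrate_conc non-negative")
    return ic50 / (1.0 + substrate_conc / km)


def p_value(molar: float) -> float:
    """Negative log10 of a concentration in mol/L (e.g. pKi or pIC50)."""
    if molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar)


# With [S] equal to Km, Ki is half the IC50.
ki_nM = ki_from_ic50(100.0, substrate_conc=10.0, km=10.0)
print(ki_nM)  # 50.0
```

When [S] is much smaller than Km the correction factor approaches 1 and Ki is roughly equal to IC50.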