To correctly score the binding of a protein-ligand complex, it’s critical to first predict the binding pose correctly. This project identifies which docking programs are best suited to predicting the ligand poses of compounds bound to different protein classes and makes that data available for all to access.
I imagine this project could be applied to this R01.
[project-management file=”Systematic-Docking-Database.xml”]
Additional Random Thoughts
How to divide proteins?
Could be by family, fold, etc. But that doesn’t seem right. Could have same fold but very different binding-pocket properties.
So, how about flooding the pocket with points, out to the convex hull defined by the alpha carbons as well as the bounds of the docking box? Count the numbers of different kinds of receptor atoms in that box. Then normalize? Use k-means clustering to get groups of similar proteins.
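A rough sketch of that idea, using only the Python standard library. It is a simplification under stated assumptions: the pocket is approximated by the axis-aligned docking box alone (the convex-hull step is skipped), atoms are given as hypothetical `(atom_type, (x, y, z))` tuples, and "normalize" is taken to mean converting counts to fractions. The function names are made up for illustration.

```python
import math
import random

def count_atom_types(atoms, box_min, box_max, atom_types):
    """Fraction of each receptor atom type inside an axis-aligned box.

    atoms: iterable of (atom_type, (x, y, z)) tuples (assumed input format).
    """
    counts = {t: 0 for t in atom_types}
    for atom_type, xyz in atoms:
        inside = all(lo <= c <= hi for c, lo, hi in zip(xyz, box_min, box_max))
        if inside and atom_type in counts:
            counts[atom_type] += 1
    total = sum(counts.values())
    # Normalize to fractions so large and small pockets are comparable.
    return [counts[t] / total if total else 0.0 for t in atom_types]

def kmeans(vectors, k, iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each descriptor vector to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j]))
                  for v in vectors]
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each protein would get one descriptor vector from `count_atom_types`, and `kmeans` would then group proteins whose pockets have a similar atom-type makeup. Whether fractions are the right normalization is exactly the open question above.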
How to train docking?
smina, made right here at Pitt. Lots of pocket measurements built in. You need to find the right weight combination for the terms.
Optimize one weight, find best one. Then keep that one, optimize next weight, etc. Keep going until accuracy does not improve significantly.
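The one-weight-at-a-time procedure could look roughly like this. It is a generic greedy search, not smina's API: `accuracy` is a hypothetical callable that takes a dict of term weights and returns pose-prediction accuracy on the training set, and `min_gain` is an assumed stand-in for "does not improve significantly".

```python
def greedy_weight_search(term_names, candidate_weights, accuracy, min_gain=0.01):
    """Fix one term weight per round, always taking the best single addition.

    accuracy: hypothetical callable, dict of {term: weight} -> float score.
    Stops when the best available addition improves accuracy by < min_gain.
    """
    weights = {}                       # terms whose weights are already fixed
    best_acc = accuracy(weights)
    remaining = list(term_names)
    while remaining:
        best_step = None
        # Try every remaining term at every candidate weight.
        for term in remaining:
            for w in candidate_weights:
                acc = accuracy({**weights, term: w})
                if best_step is None or acc > best_step[0]:
                    best_step = (acc, term, w)
        acc, term, w = best_step
        if acc - best_acc < min_gain:
            break                      # no significant improvement left
        weights[term] = w              # keep this weight and move on
        best_acc = acc
        remaining.remove(term)
    return weights, best_acc
```

In practice `accuracy` would wrap a docking run over the training complexes, so each call is expensive; that cost is what motivates a reduced set for the first pass.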
Don’t throw out other data. Just weight the specific receptor class you’re interested in more heavily during fitting.
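One way to express that weighting is a weighted success rate as the fitting objective. This is only a sketch: the `(receptor_class, pose_correct)` input format and the default `target_weight` of 5.0 are assumptions, not values from the project.

```python
def weighted_success_rate(results, target_class, target_weight=5.0):
    """Success rate where complexes from target_class count more.

    results: list of (receptor_class, pose_correct) pairs (assumed format).
    Complexes from other classes still contribute, with weight 1.0.
    """
    total = 0.0
    hits = 0.0
    for receptor_class, pose_correct in results:
        w = target_weight if receptor_class == target_class else 1.0
        total += w
        if pose_correct:
            hits += w
    return hits / total if total else 0.0
```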
You might have a reduced set for an initial pass, to throw out ones that are obviously bad, but do a final test on the whole set to make sure you’ve picked the right descriptor to optimize at each step.
CROSS VALIDATION IS CRITICAL!!!! Search PDB to get more examples of ones that aren’t co-crystallized with ligands.
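A minimal k-fold split, to make the cross-validation step concrete. It assumes each item is one complex identifier and that complexes are independent; if several complexes share a receptor, splitting by receptor instead would probably be safer, to avoid leakage between folds.

```python
import random

def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists so each item is in the test set exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal items round-robin into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Weights would be fit on each `train` list and accuracy reported on the matching `test` list, then averaged over the folds.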
Additional Notes
Pose prediction project. How does Vina’s accuracy vary by RMSD? That would be an interesting graph. Look into ReLU layers; that’s what Google uses. Consider throwing out crystal structures with high RMSD.
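For the accuracy-versus-RMSD graph, one reading is "fraction of poses within a given RMSD cutoff of the crystal pose". The sketch below takes that reading as an assumption. It also assumes both poses list the same atoms in the same order and already share a coordinate frame, so there is no alignment step and no symmetry correction; the cutoff values are illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))

def success_rate_by_cutoff(rmsds, cutoffs=(1.0, 2.0, 3.0)):
    """Fraction of docked poses whose RMSD is at or below each cutoff."""
    return {c: sum(r <= c for r in rmsds) / len(rmsds) for c in cutoffs}
```

Plotting the values from `success_rate_by_cutoff` against the cutoffs gives the curve in question.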