a few problems…

Viewing 3 reply threads
  • Author
    Posts
    • #15144
      Christian Le Gouill
      Guest

      Hello,
      I tried Gypsum-DL with three different libraries. The first, 5000 compounds from Chembridge, ran to completion, but these structures were not converted: CC1=CC=CN2N=CC(C(=O)NC34CC5CC(C3)CC(C5)(C4)N3C=NC=N3)=C12 ; CCC12CC3CC(O)(C1)CC(C3)(C2)C(=O)N1CCN(CC1)C1CCOCC1 ; CCC12CC3CC(O)(C1)CC(C3)(C2)C(=O)N1CCN(CC1)C1=NC=CN=C1 ; CCC12CC3CC(O)(C1)CC(C3)(C2)C(=O)N1CCC(CC1)NS(C)(=O)=O
      The second is a SPECS library of 37000 compounds, prepared in DataWarrior, with the .smi file generated by Instant JChem (ChemAxon). Gypsum hangs partway through the conversion of this library and never finishes, with no error message to indicate what the problem is.
      The third is a library of 250k compounds. When I try to convert it in one go, only one of the 12 available threads (6 cores with 2 threads each) is used, so I had to stop it, as it would have taken a month to finish. I then split it into 50k chunks, and all threads were used, but at some point I get this message:
      "Killed". Splitting it into 25k chunks gave the same problem.
      During the conversion of the 37k and 250k libraries, I also get these messages:
      Detected unusual substructure: C=C([O-])[OH]
      Detected unusual substructure: C(=[CH2])[OH]
      Detected unusual substructure: [C-]
      Here is the command line I use:
      python run_gypsum_dl.py --source Smi-Specs_Divsersity_35000-cmpds.smi --min_ph 7.4 --max_ph 7.4 --pka_precision 1 --job_manager multiprocessing --num_processors -1 --use_durrant_lab_filters

      Best, Christian

    • #15531
      Christian Le Gouill
      Guest

      Hello,

      I was not allocating enough RAM to Gypsum-DL. Now the program no longer quits suddenly with the message "Killed".
      I found two structures that were causing problems in one of my libraries (see other post). I removed them and everything is fine now.
      Best Regards,
      Christian

      • #15885
        Jacob Durrant
        Keymaster

        Hi Christian. Sorry for my delay in getting back to you, and many thanks for bringing these issues to my attention. Excellent that you were able to debug the memory issue. It seems likely that others will run into this problem too, so I added a line to the README.md file (to be included in future versions): https://git.durrantlab.pitt.edu/jdurrant/gypsum_dl/-/blob/master/README.md#memory-considerations

        Regarding the four Chembridge compounds, I’m not surprised that Gypsum struggled to process them. They all contain adamantane substructures. I think what’s happening is that RDKit often fails to generate acceptable 3D coordinates for these constrained-ring structures, so Gypsum ends up throwing them out. In some cases, none of the generated structures are acceptable. To test this theory, I processed the following SMILES strings:

         
        CC1=CC=CN2N=CC(C(=O)NC34CC5CC(C3)CC(C5)(C4)N3C=NC=N3)=C12
        CCC12CC3CC(O)(C1)CC(C3)(C2)C(=O)N1CCN(CC1)C1CCOCC1
        CCC12CC3CC(O)(C1)CC(C3)(C2)C(=O)N1CCN(CC1)C1=NC=CN=C1
        CCC12CC3CC(O)(C1)CC(C3)(C2)C(=O)N1CCC(CC1)NS(C)(=O)=O
        

        With these parameters:

         
        {
            "source": "t2.smi",
            "separate_output_files": true,
            "job_manager": "multiprocessing",
            "output_folder": "gypsum_dl_test_output_test2_mult/",
            "add_pdb_output": true,
            "num_processors": -1,
            "min_ph": 7.4,
            "max_ph": 7.4,
            "pka_precision": 1,
            "use_durrant_lab_filters": true,
            "thoroughness": 3,
            "max_variants_per_compound": 5
        }
        

        Three of the four compounds failed.

        But when I increased the thoroughness to 6, only one compound failed, even with max_variants_per_compound set to 1 to speed things up. Presumably, with these settings Gypsum had more tries to get it right. I added some notes here: https://git.durrantlab.pitt.edu/jdurrant/gypsum_dl/-/blob/master/README.md#highly-constrained-ring-systems
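
For anyone who wants to repeat this comparison, here is a minimal sketch that derives the higher-thoroughness settings from the parameters posted above and writes them to a JSON file. The helper names (`with_overrides`, `write_params`) and the output file name `params_thorough.json` are made up for illustration and are not part of Gypsum-DL; only the parameter keys and values come from this thread.

```python
import json

# Parameters copied from the post above (the run where three of four compounds failed).
base_params = {
    "source": "t2.smi",
    "separate_output_files": True,
    "job_manager": "multiprocessing",
    "output_folder": "gypsum_dl_test_output_test2_mult/",
    "add_pdb_output": True,
    "num_processors": -1,
    "min_ph": 7.4,
    "max_ph": 7.4,
    "pka_precision": 1,
    "use_durrant_lab_filters": True,
    "thoroughness": 3,
    "max_variants_per_compound": 5,
}


def with_overrides(params, **overrides):
    """Return a copy of params with the given keys replaced (hypothetical helper)."""
    updated = dict(params)
    updated.update(overrides)
    return updated


def write_params(params, path):
    """Write the parameter dictionary to a JSON file (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(params, handle, indent=4)


# Settings described in the post: more embedding attempts, fewer variants kept.
thorough_params = with_overrides(
    base_params, thoroughness=6, max_variants_per_compound=1
)

if __name__ == "__main__":
    # The file name is an assumption; any path works.
    write_params(thorough_params, "params_thorough.json")
```

The resulting file should be usable as a Gypsum-DL parameter file (to my knowledge via the `--json` option of `run_gypsum_dl.py`; check the README of your version to confirm).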

        Regarding the unusual substructures, these messages refer to MolVS-generated tautomers that Gypsum will discard. MolVS sometimes generates implausible tautomers, and throwing out inappropriate ones after the fact seemed like the best approach. Other, better-behaved tautomers generated from the parent compound could well be retained, though.
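
If you want to see how often each of these messages appears in a long run, a small sketch like the following can tally them. It assumes the console output was captured as text lines (for example, redirected to a log file); the function name is hypothetical, and the sample lines are the three messages quoted at the top of this thread.

```python
import re
from collections import Counter

# Matches the message format quoted in this thread and captures the substructure.
PATTERN = re.compile(r"Detected unusual substructure:\s*(\S+)")


def count_unusual_substructures(lines):
    """Count how many times each flagged substructure appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The three messages reported in the original post.
sample_lines = [
    "Detected unusual substructure: C=C([O-])[OH]",
    "Detected unusual substructure: C(=[CH2])[OH]",
    "Detected unusual substructure: [C-]",
]

counts = count_unusual_substructures(sample_lines)

if __name__ == "__main__":
    for substructure, n in counts.most_common():
        print(n, substructure)
```

To run it on a real log, pass an open file handle instead of `sample_lines`. Keep in mind that a message only means one tautomer was discarded, not that the parent compound was lost.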

        Thanks for all your help with this. It’s very helpful to have user feedback to improve the program.

        Take care,

        Jacob

    • #15949
      Christian Le Gouill
      Guest

      Hello,

      Increasing thoroughness works well with most of the rejected compounds. Thank you for your help and for creating such a useful tool.

      Best,
      Christian

    • #16247
      Jacob Durrant
      Keymaster

      Hi Christian. Re. the memory issue, you might want to try the latest version of Gypsum. I made some updates in version 1.1.7 that should reduce the amount of memory required. Take care.
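
For readers with similarly large libraries, the splitting step described in the first post (breaking a 250k library into smaller .smi files) can be sketched as below. This is only an illustration using the Python standard library; the function names and the `_partNNNN.smi` naming scheme are assumptions, not something Gypsum-DL provides.

```python
from pathlib import Path


def _write_chunk(lines, out_dir, stem, index):
    """Write one chunk to out_dir and return its path (hypothetical naming scheme)."""
    path = out_dir / f"{stem}_part{index:04d}.smi"
    path.write_text("".join(lines))
    return path


def split_smiles_file(source, out_dir, chunk_size):
    """Split a SMILES file into files of at most chunk_size non-empty lines each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    chunk = []
    with source.open() as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            # Make sure every record ends with a newline, including the last one.
            chunk.append(line if line.endswith("\n") else line + "\n")
            if len(chunk) == chunk_size:
                chunk_paths.append(
                    _write_chunk(chunk, out_dir, source.stem, len(chunk_paths) + 1)
                )
                chunk = []
    if chunk:
        # Write whatever is left over as a final, smaller chunk.
        chunk_paths.append(
            _write_chunk(chunk, out_dir, source.stem, len(chunk_paths) + 1)
        )
    return chunk_paths
```

Each chunk can then be processed with the same command line used earlier in the thread, pointing `--source` at one chunk at a time. Note that in this thread, splitting restored the use of all threads but did not by itself stop the "Killed" message; that was resolved by allocating more RAM.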

  • The topic ‘a few problems…’ is closed to new replies.