Dimorphite-DL adds hydrogen atoms to molecular representations, as appropriate for a user-specified pH range. It is a fast, accurate, accessible, and modular open-source program for enumerating small-molecule ionization states.

Tips for Grad School

I recently got an email from a thoughtful undergraduate who’s going to grad school. He asked for some general advice. Here are my thoughts, for what they’re worth.

Whose Lab Should I Work In?

  1. When picking a lab, pick based on the lab’s principal investigator (PI) more than the specific project. Projects change. Look for a PI who will provide the supportive environment you need to grow as a scientist.
  2. Different PIs have different managerial styles. Some are micromanagers. Others are so hands off you might not even see them that often. Similarly, some students thrive when micromanaged, and others thrive when given complete independence. There’s not just one right way to manage a lab. But do be sure to find the right kind of PI for you.
  3. Labs with more senior PIs (tenured, for example) often have well-established projects and hierarchies. A lot of students benefit from that. Senior PIs also tend to be better known in the community, which can be helpful when job-application time comes around.
  4. Labs with more junior PIs often (but not always!) afford students more opportunities to craft their own projects. Working for a younger professor often provides fun opportunities to help “grow the lab.” Younger PIs tend to work closer with grad students, too, though there are certainly exceptions. Excellent graduate students can also play important roles in helping a younger PI establish his or her lab. Letters of recommendation that cite concrete examples like that can be very powerful!

How Many Hours Should I Work as a Grad Student?

If you’re a graduate student, your career has begun! That means this is a full-time job. You should work at least 60 productive hours a week. Working more may lead to more career-advancing publications, but few can manage 80 hours a week. Don’t burn out!

There is a certain amount of random luck when it comes to getting a dream job after graduation. But there definitely is a correlation (even if it isn’t perfect) between how hard you work as a grad student and how well you do professionally afterwards. Even if your PI isn’t counting the minutes you’re in the lab, don’t cheat yourself by working less than you should.

Try to Get Your Own Funding

Even if your lab is well funded, try to get a fellowship of your own. Even if you don’t get it, it’s good writing practice. And, regardless of your career plans, it’s impressive to see that kind of thing on a graduate C.V.

Be Open to Jobs Outside of Academia

I worry that not enough graduate students consider careers outside of academia. I worry we don’t prepare them well enough for careers in industry. While I obviously love academia, there are many advantages to working elsewhere. Consider doing a summer internship with a biotech company during your PhD if your PI can spare you.

Work on Multiple Projects

Given that projects often fail, it’s good to work on multiple projects as a graduate student. Here’s some good advice from my graduate PI, J. Andrew McCammon:

Your main project at any given time should be one that is well-defined and that is solvable within 6 months. As you become established you may want to add a second project of medium difficulty, and then one of greater difficulty that would have a very great impact if completed… As soon as your main, straightforward project is underway, start on your second, slightly more ambitious project. If you run into problems with one project, you can set it aside for a day or two while you work on the other one. Then you can take a fresh look at the troublesome project.

At the same time, don’t take on so many projects that you can’t finish any of them. An unfinished project has the same career-advancing value as no project.

Spend Time Both Reading and Writing

It’s easy to focus so much on doing experiments that you don’t spend time reading about others’ science. Writing down your own research plan also keeps you organized and makes it easier to write manuscripts when the time comes. Don’t neglect your scientific reading and writing!

Manage Your Time Carefully

Again, from Dr. McCammon:

Set daily, weekly and long-term goals. Review your goals and progress frequently. If you are consistently falling short of your goals, try keeping track of how you actually spend your time for a few days. Use this inventory to help plan adjustments. Know when to be meticulous and when you can be sloppier. If you’re banging your head against an obstacle in your research, step back, rethink the problem and try to go around the obstacle instead of through it.

Don’t Get Discouraged!

Grad school can be very difficult. There certainly are days and weeks where you will feel like you’re not getting anything done. But if you’re persistent, things will move forward eventually. Don’t get discouraged!

I hope these ideas help!




BlendMol is a Blender plugin that can easily import VMD 'Visualization State' and PyMOL 'Session' files. BlendMol empowers scientific researchers and artists by marrying molecular visualization and industry-standard rendering techniques. The plugin works seamlessly with the popular analysis programs VMD and PyMOL. Users can import into Blender the very molecular representations they set up in VMD/PyMOL.

MD Simulations: Analysis and Ideas

The purpose of this document is to briefly describe how to analyze an MD simulation in layman’s terms. It also offers general simulation ideas.

Analysis Techniques

Root Mean Square Deviation

How similar is the shape of each simulated protein conformation to some reference conformation? The conformation of the first simulation frame is a good choice for the reference. The RMSD should start to stabilize after you’ve been simulating for a while, indicating that the system is properly equilibrated. (Link)
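Once each frame has been superposed onto the reference, the RMSD itself is just a per-atom average of squared displacements. Here is a minimal numpy sketch using a synthetic "trajectory" (real analyses would load and align coordinates with a tool like MDAnalysis, VMD, or cpptraj; the alignment step is assumed, not shown):

```python
import numpy as np

def rmsd(coords, ref):
    """Root-mean-square deviation between one frame and a reference.

    coords, ref: (n_atoms, 3) arrays of Cartesian coordinates.
    Assumes the frame is already superposed onto the reference.
    """
    diff = coords - ref
    return np.sqrt((diff * diff).sum() / len(coords))

# Toy trajectory: the reference plus frames with growing random noise.
rng = np.random.default_rng(0)
ref = rng.standard_normal((100, 3))
traj = [ref + 0.1 * i * rng.standard_normal((100, 3)) for i in range(5)]
rmsds = [rmsd(frame, ref) for frame in traj]
# The first frame IS the reference, so its RMSD is exactly zero,
# and later frames drift progressively farther away.
```

In a real analysis you would plot `rmsds` against simulation time and look for the plateau that suggests equilibration.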

Root Mean Square Fluctuation

How wiggly is each protein residue (e.g., amino acid) over the course of the simulation? (Link)


Clustering

By eliminating protein conformations that are very similar, clustering generates an ensemble of particularly distinct conformations. (Link)

FTMAP Hotspot Analysis

You can use FTMAP to identify druggable hotspots. One approach is to apply FTMAP to representative structures identified through clustering, to account for protein flexibility.

Relaxed-Complex Scheme

Docking small molecules into various conformations extracted from the simulation via clustering is also a good approach for identifying novel chemical probes. This kind of virtual screen is called the “relaxed complex scheme.”

Ensemble Electrostatics

The electrostatic potentials surrounding the protein can determine how some small molecules bind. You can also calculate ensemble-averaged versions of those potentials that may be more realistic. APBS and Delphi are two programs for calculating potentials.

Principal Component Analysis

Protein motions are very complex. PCA presents a simplified representation of these motions. Some of the minor motions are lost, but the larger-scale motions are still represented. You can project simulated conformations onto the first two principal components (2D graph of the complex motions), or you can morph a model of the protein itself according to the principal components. (Link)
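In numpy terms, PCA amounts to diagonalizing the covariance matrix of the mean-centered coordinates (equivalently, taking an SVD). The sketch below builds a synthetic "trajectory" dominated by two collective motions and recovers them; the sizes and amplitudes are illustrative, not from any real system:

```python
import numpy as np

# Each row is one "conformation": flattened (n_atoms * 3) coordinates.
rng = np.random.default_rng(2)
n_frames, n_coords = 300, 30
t = np.linspace(0, 4 * np.pi, n_frames)
mode1 = rng.standard_normal(n_coords)       # first collective motion
mode2 = rng.standard_normal(n_coords)       # second, weaker collective motion
X = (3.0 * np.sin(t)[:, None] * mode1
     + 1.0 * np.cos(2 * t)[:, None] * mode2
     + 0.05 * rng.standard_normal((n_frames, n_coords)))  # small-scale noise

# PCA: SVD of the mean-centered coordinate matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / (s ** 2).sum()         # variance captured by each PC
projection = Xc @ Vt[:2].T                  # (n_frames, 2): the "2D graph"
```

Because the synthetic trajectory was built from two dominant motions, the first two principal components capture nearly all of the variance; the small noise (the "minor motions") is discarded.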

Distance-Based Measurements

It can be helpful to measure the distance between two atoms over the course of a simulation. For example, you can monitor the distance between carboxylate and amine groups to see hydrogen bonds forming and breaking.
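Here is a numpy sketch of such a distance measurement, using a synthetic two-atom trajectory whose separation oscillates (the 3.5 Å hydrogen-bond cutoff below is a common rule of thumb, not a universal constant):

```python
import numpy as np

def distance_series(traj, i, j):
    """Distance between atoms i and j in every frame.

    traj: (n_frames, n_atoms, 3) coordinate array.
    Returns an (n_frames,) array of distances (same units as the coords).
    """
    return np.linalg.norm(traj[:, i, :] - traj[:, j, :], axis=1)

# Toy example: two atoms oscillating between ~2.8 A (H-bonded) and ~5 A (broken).
n_frames = 100
traj = np.zeros((n_frames, 2, 3))
t = np.linspace(0, 2 * np.pi, n_frames)
traj[:, 1, 0] = 3.9 + 1.1 * np.cos(t)       # x-separation oscillates 2.8..5.0

d = distance_series(traj, 0, 1)
hbond_frames = (d < 3.5).sum()              # frames where an H-bond is plausible
```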

Measuring Pocket Volumes

Using the POVME algorithm, you can measure the volume of a given pocket over the course of the trajectory. Sometimes the largest-volume pocket can be useful for drug-discovery projects. POVME is also good at identifying cryptic pockets.
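The core idea behind grid-based volume tools like POVME can be sketched in a few lines of numpy: fill an "inclusion region" with evenly spaced grid points, delete the points that clash with protein atoms, and equate the surviving points with volume. This is a deliberately simplified illustration (one shared atomic radius, a single spherical inclusion region), not POVME itself:

```python
import numpy as np

def pocket_volume(atom_coords, atom_radius, center, region_radius, spacing=0.5):
    """Grid-based pocket volume, in the spirit of POVME (simplified sketch)."""
    r = region_radius
    axis = np.arange(-r, r + spacing, spacing)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3) + center
    # Keep only grid points inside the spherical inclusion region.
    pts = pts[np.linalg.norm(pts - center, axis=1) <= r]
    # Remove points that fall inside any protein atom.
    dists = np.linalg.norm(pts[:, None, :] - atom_coords[None, :, :], axis=2)
    free = (dists > atom_radius).all(axis=1)
    return free.sum() * spacing ** 3        # each point "owns" spacing^3 of volume

# Empty region: volume should approximate (4/3) * pi * r^3 ~ 523.6.
no_atoms = np.empty((0, 3))
v_empty = pocket_volume(no_atoms, 1.5, np.zeros(3), 5.0)
# One atom in the middle of the pocket shrinks the measured volume.
v_blocked = pocket_volume(np.zeros((1, 3)), 1.5, np.zeros(3), 5.0)
```

Run per frame over a trajectory, this kind of measurement shows how a pocket opens and closes, which is exactly what makes it useful for spotting cryptic pockets.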

Analyze Hydrogen-Bond Networks

What somewhat-distant residues might influence the motions of the binding pocket through hydrogen-bond networks? HBonanza is a tool for measuring just that.

Pathways of Correlated Motions

You can also analyze the pathways of correlated motions that might connect distant residues. This can sometimes reveal allosteric mechanisms. WISP is the tool to use.

Kinds of MD

There are endless flavors of MD. I’m going to put a list here, in case you want to research further. (If you do, please feel free to send brief summaries of each method so I can paste them here!)

  • Regular (vanilla) MD.
  • Accelerated MD (McCammon group)
  • Coarse-grained MD (awesome, though I haven’t done much with it)
  • Metadynamics
  • Replica exchange MD
  • Umbrella sampling
  • Markov-State-Model guided MD (amazing)
  • WESTPA (would like to start using)
  • Implicit-solvent MD
  • Brownian dynamics (not really MD, but relevant)

Please send additional methods and/or descriptions if you get a chance. Thanks!

Reasons to Simulate

“Could there be cryptic binding pockets in my protein that aren’t evident in any crystal structure?”

“I’d like to use multiple protein conformations for drug discovery, but all the crystal structures look the same. I’ll simulate to get more conformations!”

“I want to use something better than a docking score to predict ligand binding. Why not use an alchemical method like free-energy perturbation or thermodynamic integration?”

“Could understanding the dynamics of a certain region of the protein reveal its molecular mechanism?”

“I’ve got two similar proteins, but they do different things. Maybe they are evolutionarily related, or one is a mutant of the other. Can dynamics reveal why they behave differently?”

“My protein has some crazy allosteric mechanism. Might MD simulations reveal the subtle shifts in correlated residue motions that transmit the allosteric signal?”

“I want to engineer a protein to do something new. Can I predict how mutations will change its function before I make the actual protein and test it experimentally?”

“I think I know how a small molecule binds to my protein, but I’m not sure. If I simulate it, will it slip in the binding pocket (probably the wrong pose!), or will its pose remain stable?”

Please send more ideas! I want this page to help with future brainstorming.

Reasons not to Simulate

“I want to see some large conformational changes.” It ain’t going to happen on MD timescales.

“I want to know something for certain.” You always need experimental validation. You can test something from the literature (already experimentally demonstrated), or you can get a collaborator for prospective validation.

“I want to simulate some huge system.” Probably not going to happen, unless you can get a PRAC.

Running Calculations on CRC Resources


This brief tutorial shows how to run calculations on Center for Research Computing (CRC) resources.

1. Copy Your Files to the CRC

CRC resources run on Unix. If you’re also running some form of Unix (e.g., Ubuntu, macOS), it’s easiest to use the rsync command. rsync copies files between computers intelligently. For example, it only copies files if they don’t already exist on the remote computer. Here are some examples of use:

rsync -vrz /local/directory/with/files my_user_name@h2p.crc.pitt.edu:/crc/destination/directory/

Let’s parse that command line.

  1. rsync: The Unix command.
  2. -vrz: Tell me everything you’re doing verbosely, recurse into subdirectories, and compress files before you transfer them. Note: If you’re transferring files over a fast network, compressing them may actually take more time than just transferring them uncompressed.
  3. /local/directory/with/files: The path to the directory on your computer that you want to copy to the CRC.
  4. my_user_name: Your username on the CRC system.
  5. h2p.crc.pitt.edu: The hostname (address) of the CRC login node.
  6. /crc/destination/directory/: The remote CRC directory where you want to copy your directory/files.

So the above command will copy the /local/directory/with/files directory and all of its contents to /crc/destination/directory/files/ on the CRC.

You can also copy single files. In that case, the -r flag isn’t needed:

rsync -vz /local/file.ext my_user_name@h2p.crc.pitt.edu:/crc/destination/directory/

You can also use the open-source GUI program Filezilla to copy files to the CRC. You’ll need to use SFTP (SSH File Transfer Protocol). But Filezilla doesn’t use rsync’s intelligent transfer strategy, so transfers will often take longer.

2. Log into the CRC

With your files copied, you are ready to log into the CRC. Logging in will allow you to use CRC’s Unix as if you were sitting at one of their terminals. We use the ssh command:

ssh my_user_name@h2p.crc.pitt.edu

Once you enter your password, you’ll be able to type Unix commands that will be run on CRC’s system, not your local computer.

3. Change to the CRC Directory Where You Will Run Your Calculations

It’s just Unix, so you’ve got this:

cd /crc/destination/directory/files/

4. Make a Submission Script

Supercomputers and computer clusters are organized differently than laptops and desktops. The CRC computer you log into is called the login node. It’s used for file copying/moving, limited file processing, and telling the CRC system how to run your heavy-duty calculations. But don’t actually run those calculations on the login node!

Other CRC users are logged into the same node, though you can’t see them. If you run heavy calculations on the login node, it will slow everyone down. They might think to themselves, “Who is the jerk hogging the login node?” You might even get an angry email from the system administrator. Of course this has never happened to me.

Fortunately, there are many compute nodes connected to the login node. That’s where you want to actually run your calculations. But you need to tell the login node how to farm your calculations out to the compute nodes. That’s what a submission script is for.

There are several programs that manage compute jobs on supercomputers. Common ones include SLURM and PBS. CRC happens to use SLURM. Let’s parse a SLURM submission script for running a NAMD molecular dynamics simulation, which we’ll call namd_submission.sh.

NAMD Submission Script

#!/bin/bash
#SBATCH --job-name=shroom2
#SBATCH --output=shroom2.out
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=28
#SBATCH --time=72:00:00
#SBATCH --cluster=mpi
#SBATCH --partition=opa

module purge
module load intel/2017.1.132 intel-mpi/2017.1.132
module load namd/2.12


mpirun -n $SLURM_NTASKS namd2 prod19.conf > prod19.conf.out

The SLURM Header

  1. #!/bin/bash: Tells CRC we’ll be using bash commands.
  2. #SBATCH --job-name=shroom2: The name of our job will be “shroom2”
  3. #SBATCH --output=shroom2.out: Write the job’s output, including any errors, to a file called shroom2.out
  4. #SBATCH --nodes=8: Ask for 8 compute nodes. Note that many jobs can only run on a single node (e.g., AutoDock Vina). MPI NAMD happens to be able to spread its calculations over multiple nodes, if it’s compiled as an MPI-compatible executable (details not so important).
  5. #SBATCH --ntasks-per-node=28: Ask for 28 processors on each node. It’s kind of like how our computer bob has 56 processors. Each compute node has multiple processors too.
  6. #SBATCH --time=72:00:00: Let the job run for 72 hours. The system won’t let you be a total compute hog. You can’t just let jobs run forever. Try to be as accurate as you can in guessing how long your calculation will take, and then add a little time. Shorter jobs start running faster, so there’s an advantage to accurately estimating how long your calculations will take.
  7. #SBATCH --cluster=mpi: CRC has several clusters, or “compute-node clubs.” Each cluster is made of compute nodes designed to accommodate different kinds of jobs. The mpi cluster is good for jobs that spread over multiple compute nodes, like MPI NAMD jobs.
  8. #SBATCH --partition=opa: Not sure… good to check CRC documentation.


Loading Modules

Most people don’t use NAMD, so it doesn’t make sense to load it for every user. CRC uses a module system so users can load only the programs they need.

  1. module purge: Get rid of any currently loaded modules.
  2. module load intel/2017.1.132 intel-mpi/2017.1.132: Load modules NAMD needs.
  3. module load namd/2.12: Load the NAMD module itself.

Running your Program

Now you type the commands you want to run on the compute node, to actually perform your calculation. The MPI NAMD command is a bit complicated:

mpirun -n $SLURM_NTASKS namd2 my_namd_conf_file.conf > my_namd_conf_file.conf.out

  1. mpirun: An executable program for running other executable programs such that they can take advantage of multiple nodes. NAMD happens to be able to do just that.
  2. -n $SLURM_NTASKS: The number of processors to use. SLURM itself provides this information via the $SLURM_NTASKS variable, which is set based on the parameters in the submission-script header.
  3. namd2: The NAMD executable, available because you loaded the namd/2.12 module.
  4. my_namd_conf_file.conf: The NAMD config file.
  5. >: A Unix redirection operator that says “put whatever you were going to print to the screen in a text file instead.”
  6. my_namd_conf_file.conf.out: The text file where the output will be written.

Other Submission-Script Examples

Here we’ll put examples of other submission scripts in the future.

Submitting the Job

You’ve created the submission script, so now tell SLURM to use it. The sbatch command does just that:

sbatch namd_submission.sh

Your job heads off to the SLURM scheduler, which figures out when to start running it based on the parameters in your header and the needs of other CRC users.

How Goes My Job?

You can check on the progress of your job using CRC’s crc-squeue.py command. It wraps around and formats SLURM’s own squeue command. Thanks, CRC uber nerds! Here’s the output of that command:

  JOBID PAR                                NAME ST         TIME  NODES CPUS     NODELIST(REASON)
  31629 opa                             shroom2  R         0:05      8  224         opa-n[60-67]
Take a look at the ST (state) column. “R” means our job is running on the compute nodes. I think “PD” means it’s pending (waiting to run), and “C” means it’s been canceled. If you don’t see your job listed, it must have already finished, either because it completed, it ran into an error, or it ran out of its allotted time.

What if I want to Cancel my Job?

SLURM’s scancel is good at that. The CRC uber nerds also made crc-scancel.py, which works even better on their system.

Copying Output Files Back to My Local Computer

Once your calculation is done, you can copy all the files back to your local computer using the rsync command. Let’s say you’re using your local computer’s Unix command line and want to copy the files from the CRC to your computer:

rsync -vrz my_user_name@h2p.crc.pitt.edu:/crc/destination/directory /local/directory/

FYI: In a perfect world, if you were using CRC’s login node, you could also copy files to your local computer:

rsync -vrz /crc/destination/directory my_user_name@url.of.my.computer:/local/directory/

But, if you’re using bob the computer, good luck getting past the firewall! It’s much easier to use bob’s command line to copy files from the CRC, rather than using the CRC’s command line to copy files to bob.

Reminder: You can also use Filezilla, but I don’t recommend it.


The Yeast-Beast Pipeline

We here in the Durrant Lab are excited to be adding an experimental component to our work! Graduate student Jennifer Walker, a.k.a. the “yeast whisperer,” has been setting up yeast-evolution experiments. Our goal is to force yeast to evolve resistance to anti-cancer and anti-parasitic drugs.

Resistance-conferring mutations often affect the drug-binding protein (i.e., the drug “target”). To identify the target, we’ll sequence the genomes of the resistant yeast to discover which protein has changed. Identifying a drug’s target in yeast will teach us about the same target in cancer or parasite cells.

We will leverage this information to improve our computational techniques. The goal is to better understand drug mechanisms of action and to identify new small molecules that are perhaps even more potent.

Jen took some neat pictures of her ongoing work. Hope you enjoy!





Pyrite is a Blender plugin that imports the atomic motions captured by molecular dynamics simulations. Using Pyrite, these motions can be rendered using Blender's state-of-the-art computer-graphics algorithms. All 3D protein representations (e.g., surface, ribbon, VDW spheres) are supported. Aside from advancing scientific and collaborative objectives, Pyrite renderings will also appeal to students and non-specialists.

Software Licenses to Use

Do you use a copylefted library? (e.g., GPL)

Then your program also needs to be GPL. It’s a viral license. Here are some examples:

Is the library GPLv2 without “GPLv2 or any later version” clause?

Consider finding a different library. This is bad… too restrictive.

Is the library GPLv3 or GPLv2 with the clause?

Ok, use GPLv3 for your software. Add this to the start of every file (per recommendations here):

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year>  <name of author>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

Also copy from this page into LICENSE.txt.

No copylefted libraries? (e.g., just permissive)

Then I recommend using a permissive license (Apache 2.0 License), so others can use your code in both permissive and copylefted derivative works.

Helpful article: “If you want the widest possible distribution and adoption, fewest restrictions on users, open and transparent source code, peer review, community contributions to the codebase, and easy incorporation of your code by others… then a permissive FOSS license such as the BSD/MIT, Apache, or ECL licenses may work well. Because of the few requirements on users, these licenses are amongst the easiest to apply and administer, and promote unfettered incorporation of your code into other software—including copyleft or commercial software. Despite their general permissiveness, they do assure continued author attribution in any and all redistributions or derivative works.”

Add this to the start of every file (per recommendations here):

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Also copy from this page into LICENSE.txt.

Here are some notes on permissive licenses that helped me settle on Apache2:

But you do want your permissive license to be compatible with GPL-licensed libraries.

Script License | Module is GPLv2 | Module is GPLv3 | Notes
Apache2 | No | Ok | For large programs; no patent treachery.
X11 (MIT) | Ok | Ok | For small programs.
BSD 3-Clause | Ok | Ok | No protection against patent treachery. [Apache2 preferable]

Let’s compare the features of these licenses. Taken from this source.

License | Linking (Library) | Distribution | Modification | Patent grant | Sublicensing | TM grant
X11 (MIT) | Yes | Yes | Yes | Manually | Yes | Manually
BSD 3-Cl | Yes | Yes | Yes | Manually | Yes | Manually

Compatibility Questions

GPLv2 vs. GPLv3 is but another complication. Some licenses are compatible with one but not the other (e.g., MDAnalysis is GPLv2).

Script License | Module is GPLv2 | Module is GPLv3 | Notes
Apache2 | No | Ok | For large programs; no patent treachery.
X11 | Ok | Ok | For small programs.
BSD 3-Clause | Ok | Ok | No protection against patent treachery. Apache2 preferable?

* Ok only if using a GPLv2 version that allows you to follow the terms of subsequent GPL versions (MDAnalysis is ok).


This page describes how to pick a software license when releasing new software.

Licenses to Avoid

David A. Wheeler makes a compelling case for why all open-source software should be GPL-compatible. This is especially applicable to Python scripts, since using a library that is GPL licensed (e.g., MDAnalysis) is only allowed if the script itself is also GPL licensed. So NEVER use these licenses for such scripts:

  • Original BSD license (4-clause BSD license) (BAD!!!)
  • Apache License (version 1.0 or 1.1) (BAD!!!)
  • Mozilla Public License (MPL) version 1.1 (BAD!!!)
  • MIT License. (You can integrate MIT-licensed code into a GPL-licensed project, but you can’t integrate GPL-licensed code into an MIT-licensed project. So a script that imports a GPL library can’t itself be MIT licensed.) (BAD!!!)

Compatibility Questions

GPLv2 vs. GPLv3 is but another complication. Some licenses are compatible with one but not the other (e.g., MDAnalysis is GPLv2).

Script License | Module is GPLv2 | Module is GPLv3 | Notes
LGPLv3 | Ok* | Ok | Mostly used for software libraries.
Apache2 | No | Ok | For large programs; no patent treachery.
X11 | Ok | Ok | For small programs.
BSD 3-Clause | Ok | Ok | No protection against patent treachery. Apache2 preferable?

* Ok only if using a GPLv2 version that allows you to follow the terms of subsequent GPL versions (MDAnalysis is ok).


Let’s compare the features of these licenses. Exclusionary criteria are shown in bold. Taken from this source.

License | Linking (Library) | Distribution | Modification | Patent grant | Sublicensing | TM grant
GPLv3 | GPLv3 only | Yes* | Yes* | Yes | No* | Yes
LGPLv3 | W/ restrictions | Yes* | Yes* | Yes | No* | Yes
BSD 3-Cl | Yes+ | Yes+ | Yes+ | Manually | Yes+ | Manually

* Because copylefted.
+ Because permissive.


  • Sublicensing means modified code may be licensed under a different license (for example, a proprietary one). In the case of permissive licenses, this means derivative works don’t even have to be open source.
  • You want companies to be able to use your software. All these licenses allow for private use. Users can modify the code for internal use, e.g., by a corporation, without sharing with the community. They just can’t include the code in their own distributed software packages.
  • You want this software to be accessible to others through a website (server side) without having to release the website code. These licenses shouldn’t prevent that. In fact, the AGPL license was created specifically to avoid this perceived “loophole” (so avoid that one!). (additional source)


GPLv3 is the way to go unless you’re using a GPLv2 library that DOES NOT have the “GPLv2 or any later version” clause. Note that MDAnalysis does have that clause.

If you’re using a GPLv2 library that doesn’t have that clause, I suggest looking for a different library. But I think you could use GPLv2 for your software, as long as you do include that clause.

Note that GPLv3 can be found here. That page recommends placing the notice at the top of each file: “It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the “copyright” line and a pointer to where the full notice is found.”

Here’s the text:

<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year>  <name of author>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.