Prepare a Protein for Simulation

This tutorial shows how to prepare a protein for simulation. There are expensive commercial programs like Schrodinger’s Maestro, but we here focus on free methods.

Download a Model of the Protein

There are several sources:

  1. Download an experimental structure from the PDB.
  2. Create a homology model using SWISS-MODEL. You can find the sequences of many proteins using uniprot.

Build Missing Loops

Use SL2‘s SuperLoop2. Here’s the manual.

Add Hydrogen Atoms

PDB2PQR can be used to add hydrogen atoms.

  1. Be sure to pick the output naming scheme that corresponds to the MD forcefield you’ll eventually use (probably AMBER).
  2. Select the right pH. For most proteins, 7 is fine.
  3. Download the PQR file.

Waters and Ions: CHARMM-GUI

Try CHARMM-GUI Quick MD Simulator.

  1. Select the PDB format.
  2. Under “PDB Manipulation Options,” select “Terminal group patching.”
  3. For the ion concentration, 150 mM is typically good. I prefer NaCl to KCl.
  4. Keep going through the steps. It takes a long time. In the end, download the .tgz file. It should contain what you need to run the MD simulation.

Parameterize for Amber

Unfortunately, the CHARMM-GUI output isn’t quite where it needs to be for an Amber simulation. We need to parameterize the system for amber. Start with the amber/step3_charmm2amber.pdb file. You can use tleap to parameterize this file. It’s part of AmberTools. On, you can access it by typing in module purge then module load amber/14-intel-2013-cuda-5.0. Here’s a sample input file (

source /opt/sam/rhel6/amber/amber14/dat/leap/cmd/leaprc.ff14SB
loadAmberParams /opt/sam/rhel6/amber/amber14/dat/leap/parm/frcmod.ionsjc_tip3p

mol = loadpdb step3_charmm2amber.pdb
setBox mol "vdw"

charge mol

saveamberparm mol mol.prmtop mol.prmcrd


The above won’t work if you try tleap -f, though, because the PDB file isn’t in AMBER format. You need to change some strings. This simple bash script should do the trick:

cat ${1} | sed "s/CLA CLA/Cl- Cl-/g" | sed "s/SOD SOD/Na+ Na+/g" | sed "s/TIP3/HOH /g" | sed "s/OH2 HOH/O   HOH/g" | sed "s/HSD/HID/g" | sed "s/HSE/HIE/g" | sed "s/CD  ILE/CD1 ILE/g" | sed "s/HA2 GLY/HA3 GLY/g" | sed "s/HA1 GLY/HA2 GLY/g" | sed "s/  HN  /  H   /g" | sed "s/HB2 ARG/HB3 ARG/g" | sed "s/HB2 ASN/HB3 ASN/g" | sed "s/HB2 ASP/HB3 ASP/g" | sed "s/HB2 CYS/HB3 CYS/g" | sed "s/HB2 GLN/HB3 GLN/g" | sed "s/HB2 GLU/HB3 GLU/g" | sed "s/HB2 HID/HB3 HID/g" | sed "s/HB2 HIE/HB3 HIE/g" | sed "s/HB2 LEU/HB3 LEU/g" | sed "s/HB2 LYS/HB3 LYS/g" | sed "s/HB2 MET/HB3 MET/g" | sed "s/HB2 PHE/HB3 PHE/g" | sed "s/HB2 PRO/HB3 PRO/g" | sed "s/HB2 SER/HB3 SER/g" | sed "s/HB2 TRP/HB3 TRP/g" | sed "s/HB2 TYR/HB3 TYR/g" | sed "s/HB1 ARG/HB2 ARG/g" | sed "s/HB1 ASN/HB2 ASN/g" | sed "s/HB1 ASP/HB2 ASP/g" | sed "s/HB1 CYS/HB2 CYS/g" | sed "s/HB1 GLN/HB2 GLN/g" | sed "s/HB1 GLU/HB2 GLU/g" | sed "s/HB1 HID/HB2 HID/g" | sed "s/HB1 HIE/HB2 HIE/g" | sed "s/HB1 LEU/HB2 LEU/g" | sed "s/HB1 LYS/HB2 LYS/g" | sed "s/HB1 MET/HB2 MET/g" | sed "s/HB1 PHE/HB2 PHE/g" | sed "s/HB1 PRO/HB2 PRO/g" | sed "s/HB1 SER/HB2 SER/g" | sed "s/HB1 TRP/HB2 TRP/g" | sed "s/HB1 TYR/HB2 TYR/g" | sed "s/HD2 ARG/HD3 ARG/g" | sed "s/HD2 LYS/HD3 LYS/g" | sed "s/HD2 PRO/HD3 PRO/g" | sed "s/HD1 ARG/HD2 ARG/g" | sed "s/HD1 LYS/HD2 LYS/g" | sed "s/HD1 PRO/HD2 PRO/g" | sed "s/ HD1 ILE /HD11 ILE /g" | sed "s/ HD2 ILE /HD12 ILE /g" | sed "s/ HD3 ILE /HD13 ILE /g" | sed "s/HE2 LYS/HE3 LYS/g" | sed "s/HE1 LYS/HE2 LYS/g" | sed "s/HG12 ILE/HG13 ILE/g" | sed "s/HG11 ILE/HG12 ILE/g" | sed "s/HG1 CYS/HG  CYS/g" | sed "s/HG1 SER/HG  SER/g" | sed "s/ HG2 / HG3 /g" | sed "s/ HG1 / HG2 /g" | sed "s/HG2 THR/HG1 THR/g" | sed "s/ OT1 / O   /g" | sed "s/ OT2 /OXT  /g" | sed "s/ HT1 / H1  /g" | sed "s/ HT2 / H2  /g" | sed "s/ HT3 / H3  /g"

Save that script as fix_names.bash and run bash fix_names.bash MESSED_UP_PDB.pdb > FIXED_PDB.pdb.

You still need to modify FIXED_PDB.pdb though. Find the place in the file where the HOH fields start, and add a TER before it.

Now run the tleap script above: tleap -f

Make sure the total charge of your system is 0.0! You should now have the files to use in Amber.

Run MD Simulation

If you modify the files from CHARMM-GUI (in the amber subdirectory) to read in your newly parameterized Amber files, it should run fine. The COMET supercomputer is a good choice. Here’s an example submission script:

#SBATCH --job-name="jdd_amber"
#SBATCH --output="jdd_amber.out"
#SBATCH --partition=gpu-shared
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --no-requeue
#SBATCH --gres=gpu:1
#SBATCH -t 48:00:00
#SBATCH -A csd529

module load cuda/6.5

# copy these files to luster system
rsync -vr /home/jdurrant/NA_amber /oasis/scratch/comet/$USER/temp_project/
cd /oasis/scratch/comet/$USER/temp_project/NA_amber/
rsync -vr * /home/jdurrant/NA_amber/

csh README.bash

#/share/apps/gpu/amber/pmemd.cuda -O -i mdin.GPU -o mdout.GPU.$SLURM_JOBID -x mdcrd.$SLURM_JOBID -inf mdinfo.$SLURM_JOBID -l mdlog.$SLURM_JOBID -p prmtop -c inpcrd