Article
pubs.acs.org/jcim
LEADS-PEP: A Benchmark Data Set for Assessment of Peptide
Docking Performance
Alexander Sebastian Hauser and Björn Windshügel*
Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Schnackenburgallee 114, 22525 Hamburg, Germany
Supporting Information
ABSTRACT: With increasing interest in peptide-based therapeutics, computational approaches such as peptide docking have also gained more and more attention. In order to assess the suitability of docking programs for peptide placement and to support the development of peptide-specific docking tools, an independently constructed benchmark data set is urgently needed. Here we present the LEADS-PEP benchmark data set for assessing peptide docking performance. Using a rational and unbiased workflow, 53 protein−peptide complexes with peptide lengths ranging from 3 to 12 residues were selected. The data set is publicly accessible at www.leads-x.org. In a second step, we evaluated several small molecule docking programs for their potential to reproduce peptide conformations as present in LEADS-PEP. While most tested programs were capable of generating native-like binding modes of small peptides, only Surflex-Dock and AutoDock Vina performed reasonably well for peptides consisting of more than five residues. Rescoring of docking poses with the scoring functions ChemPLP, ChemScore, and ASP further increased the number of top-ranked near-native conformations. Our results suggest that small molecule docking programs are a good and fast alternative to specialized peptide docking programs.

INTRODUCTION
Protein−peptide interactions are involved in numerous cellular processes and are estimated to account for up to 40% of all interactions within the cell.1 It is therefore not surprising that, in recent years, the development of peptide-based therapeutics has gained increased interest in the pharmaceutical industry, and this interest is expected to grow further.2−4 Between 2009 and 2013, peptides represented 10% of all drug approvals, and several of these therapeutics are first-in-class drugs, such as boceprevir and telaprevir, both targeting the hepatitis C virus.5 As of today, more than 100 peptide-based drugs have reached the pharmaceutical market, and many more are currently being investigated in clinical trials.3
Computational chemistry techniques have proven to be successful in supporting the drug discovery process for small molecules, for example, by virtual screening.6 The adaptation of molecular modeling and docking methods for the prediction of peptide binding modes is currently under intensive development and evaluation.7,8 Peptide docking in particular is challenging due to the large number of rotatable bonds and the resulting high flexibility of the molecule. On the other hand, peptides possess unique properties that can be exploited to improve protein−peptide docking strategies. These properties include structural hierarchy, physical restrictions, and simplicity.7
So far, only a few programs specifically designed for peptide docking have been developed. One example is Rosetta FlexPepDock. Several protocols are available for it that have shown good performance in reproducing peptide conformations of different protein−peptide X-ray crystal structures.9,10 Another approach is DynaDock, which has been shown to perform well across a data set of 15 protein−peptide complexes.11
© 2015 American Chemical Society
In addition to specialized tools, small molecule docking programs have also been tested for peptide docking. AutoDock has been shown to dock very short peptides (2−4 aa length) with reasonable accuracy.12 Very recently, a modified version of the docking program Glide performed as accurately as the Rosetta FlexPepDock ab initio protocol while being over 100 times faster.13
So far, all reports on peptide docking performance suffer from a lack of comparability, as the test sets used for assessment are not publicly accessible. It also cannot be excluded that these data sets are biased toward a specific tool.7 Therefore, an independently constructed and publicly available benchmark data set of protein−peptide complexes is urgently needed in order to compare available docking programs and to support their further development.
For small molecules, several benchmark data sets for docking and virtual screening exist. The Astex Diverse Set, comprising 85 high-quality protein−ligand X-ray crystal structures, can be used to evaluate the potential of docking programs to reproduce binding modes as determined by X-ray crystallography.14 For the evaluation of virtual screening performance, the Directory of Useful Decoys (DUD) is a popular benchmark data set.15,16 An alternative for virtual screening assessment is provided by the Demanding Evaluation Kits for Objective In silico Screening (DEKOIS).17
Received: April 24, 2015
Published: December 14, 2015
DOI: 10.1021/acs.jcim.5b00234
J. Chem. Inf. Model. 2016, 56, 188−200
Journal of Chemical Information and Modeling
In this study, we present LEADS-PEP, the first representative of the Lessons for Efficiency Assessment of Docking and Scoring (LEADS) collection. LEADS-PEP is a publicly available benchmark data set. It enables the evaluation of docking programs for their potential to reproduce peptide binding modes, and it allows different methods and parameters to be compared. The collection consists of 53 protein−peptide complexes that have been prepared using a rational and unbiased workflow. In a second step, we used the data set for a detailed evaluation of several popular small molecule docking programs and scoring functions, most of which have not been considered for peptide docking so far.

EXPERIMENTAL SECTION
Benchmark Data Set Generation. A selection process was developed to generate the LEADS-PEP data set (Figure S1). First, the Protein Data Bank (PDB)18 was queried for peptide-bound protein X-ray crystal structures with the following constraints: (i) the structure does not contain any DNA or RNA, (ii) it includes experimental data, (iii) the structure contains between two and four chains, (iv) at least one chain is between 2 and 15 amino acids long, (v) the resolution is < 2.0 Å, and (vi) the Rfree is < 0.3. The query returned 1376 PDB entries (as of 29/05/2015), which were downloaded. Each structure was split into its protein and peptide chains. Peptide chains were further filtered for structures that do not include any hetero atoms and are not covalently linked to the protein chain. Complexes containing hetero atoms within 4 Å of the protein−peptide interface were removed from the set. Subsequently, PROCHECK19 was used to analyze the residue-by-residue geometry and stereochemical quality of the complexes. Structures containing atoms in close distance […] (30% of the peptide residues have less than three van der Waals contacts to the protein) and/or crystallization artifacts were excluded. For most peptide lengths (3−12 residues), five or six complexes were chosen for the data set. We further attempted to include a broad set of peptides with different characteristics, such as acidic, basic, hydrophobic, hydrophilic, or aromatic entities. The final peptide docking benchmark data set contains 53 complexes.
Docking Programs and Scoring Functions. In this study, we evaluated the docking programs GOLD, Surflex-Dock, AutoDock, and AutoDock Vina for their potential to reproduce cocrystallized peptide binding modes. Surflex-Dock21 (version 2.706.13302) is included in SYBYL-X 2.1.1 (Certara L.P., St. Louis, MO, USA). GOLD22 (version 5.2.2) was licensed from the Cambridge Crystallographic Data Centre, Cambridge, UK. AutoDock23 (version 4.2.5.1) and AutoDock Vina24 (version 1.1.2) are Open Source software available from the Scripps Research Institute. In addition to GOLD’s default scoring function ChemPLP, all other implemented scoring functions (ASP, ChemScore, and GoldScore) were also investigated.
Peptide and Protein Preparation. In order to avoid any bias from coordinates present in the protein−peptide X-ray crystal structures, all peptides were generated in a linear conformation (backbone torsion angles of 180°) within SYBYL-X 2.1.1 with charged termini. They were then minimized with the Powell method using default settings. It was ensured that the coordinates of the linearized peptides do not overlap with those of the peptide binding site.
Protein structures were prepared using Protonate3D within MOE. All water molecules were removed.
Docking Tools. Surflex-Dock. The protomol file for each complex was built based on all residues within 5 Å of the cocrystallized peptide using a threshold of 0.01 and a bloat of 0. For docking with standard accuracy (SA), the search density (spindense) was set to 6.0, the number of spins per alignment (nspin) to 12, and the number of additional starting conformations per molecule (multistart) to 6. For high accuracy (HA) docking, the following settings were used: spindense 9.0, nspin 24, multistart 12. The Surflex-Dock “Total_Score” was used as the native scoring function.25
AutoDock. AutoDockTools within MGLTools (version 1.5.6) was used to generate PDBQT format files of the receptor and ligand. Grid maps were calculated with AutoGrid. The grid box was defined based on the cocrystallized ligand using a Python script within MGLTools. Grid dimensions were increased in all six directions by 13 points (4.9 Å). All dockings were performed using the Lamarckian genetic algorithm with the maximum number of energy evaluations set to 2 500 000 (SA) or 25 000 000 (HA). As AutoDock does not handle ligands with more than 32 torsion angles, a recompiled version allowing up to 64 torsion angles was used for larger peptides.
AutoDock Vina. Grid dimensions were adopted from the AutoDock preparation. The exhaustiveness was set to 8 (SA) or 100 (HA).
GOLD. The docking site was defined by all residues within 5 Å of the cocrystallized peptide. A separate docking was performed for each available scoring function (ChemScore,26 ChemPLP,27 ASP,28 and GoldScore29). The early termination option was switched off.
Pose Selection and RMSD Calculation. For all docking scenarios, the number of docking runs was set to 20. As a measure of peptide docking accuracy, the root-mean-square deviation (RMSD) of the backbone atoms (N, CA, C) was calculated using shell and SPL scripts. To evaluate external scoring functions, docking poses were rescored with the rescoring option implemented in GOLD. All four scoring functions (ASP, ChemPLP, ChemScore, GoldScore) were tested with default settings. Only the nonminimized poses were analyzed.
Figures with molecular representations were prepared using VMD30 and POV-Ray (www.povray.org).

RESULTS
Benchmark Data Set. We set up a workflow that produced an unbiased selection of protein−peptide complexes with great structural and functional diversity, based on all publicly available X-ray crystal structures. Several quality measures (e.g., stereochemical properties) were considered for both proteins and peptides during the selection. Complexes containing heteroatoms (e.g., buffer molecules) in close distance to the peptides and/or peptides with missing atoms were discarded. LEADS-PEP includes proteins of not more than 30% sequence identity. For the current release, we mainly concentrated on peptides adopting turn or coil conformations. Ten peptides contain secondary structures, and the percentage of residues with secondary structure in these peptides ranges between 33 and 82%. More detailed information on the workflow is shown in Figure S1. The selection procedure yielded a data set of 53 high-resolution protein−peptide complexes. The peptides are composed of 3 to 12 residues and have between 7 and 51 rotatable bonds. Table 1 provides an overview of the data set along with some molecular properties of the peptides. Only peptides possessing between 2 and 4 residues showed drug-like properties as defined by Lipinski’s “Rule-of-Five”.31 This rule is often used in drug discovery as a probability criterion to estimate oral bioavailability. All peptides possessing more than four residues featured several “Rule-of-Five” violations. To ensure a neutral starting structure, all peptides to be docked were generated as extended conformations (φ/ω torsion angles adopting 180°). It was also ensured that their atomic coordinates do not overlap with the binding sites.

Table 1. Overview of Peptides Included in the LEADS-PEP Benchmark Data Setᵃ
res  PDB   sequence      heavy atoms  rot bonds  ring count  H-bond acc  H-bond don  MW    log P
3    1B9J  KLK           27           15         0           9           11          390   0.2
3    2OY2  IAG           18           7          0           7           5           259   −0.4
3    3GQ1  WLF           34           11         3           8           6           465   3.5
3    3BS4  NIF           28           11         1           9           7           392   0.1
3    2OXW  IAG           18           7          0           7           5           259   −0.4
3    2B6N  APT           20           6          1           8           5           287   −1.5
4    1TW6  AVPI          28           9          1           9           5           399   1.0
4    3VQG  VTLV          30           12         0           10          7           431   1.0
4    1UOP  GFEP          32           11         2           11          5           447   −0.5
4    4C2C  AVPA          25           7          1           9           5           356   −0.4
4    4J44  AIAV          26           10         0           9           6           372   0.6
5    2HPL  DDLYG         41           18         1           16          8           580   −1.3
5    2V3S  GRFQV         43           19         1           16          14          607   −1.5
5    3NFK  GETRL         40           19         0           17          13          575   −2.3
5    1NVR  ASVSA         30           14         0           13          9           433   −3.3
5    4V3I  DLTRP         42           18         1           17          12          601   −1.9
5    3T6R  ARTKQ         42           22         0           18          18          605   −4.1
6    1SVZ  PQFSLW        56           21         4           17          11          777   0.9
6    3D1E  GQLGLF        45           20         1           15          10          634   −0.2
6    3IDG  ALDKWD        53           23         2           19          12          746   −0.5
6    3LNY  EQVSAV        44           21         0           18          11          631   −2.9
6    4NNM  YPTSII        49           21         2           16          10          693   0.2
6    4Q6H  VQDTRL        51           25         0           21          16          731   −2.8
7    3MMG  ETVRFQS       61           30         1           24          18          866   −3.7
7    3Q47  NPISDVD       53           23         1           22          11          757   −4.0
7    3UPV  PTVEEVD       55           24         1           22          9           785   −1.9
7    4QBR  ARTKQTA       54           28         0           23          21          777   −5.5
7    3NJG  PQIINRP       59           24         2           22          16          838   −2.0
8    1ELW  GPTIEEVD      60           27         1           24          10          856   −2.7
8    3CH8  PQPVDSWV      66           24         4           23          12          926   −1.2
8    4WLB  SLLKKLLD      65           35         0           22          17          930   0.7
8    1OU8  GAANDENY      60           27         1           26          15          851   −6.4
8    1N7F  ATVRTYSC      62           31         1           24          19          901   −3.6
9    3OBQ  PTPSAPVPL     62           20         4           21          9           878   −1.2
9    4BTB  PPPPPPPPP     64           8          9           19          2           892   0.0
9    2W0Z  APPPRPPKP     68           19         6           23          13          958   −1.8
9    4N7H  EAPPSYAEV     68           27         3           25          11          960   −2.3
9    2QAB  KILHRLLQD     80           40         1           29          22          1136  −0.9
10   1H6W  SLNYIIKVKE    85           44         1           29          22          1207  −0.4
10   3BRL  ATSAKATQTD    69           32         0           30          21          993   −8.6
10   1NTV  NFDNPVYRKT    89           40         3           33          25          1254  −4.6
10   4DS1  YAESGIQTDL    77           38         1           30          17          1094  −4.1
10   2O02  GLLDALDLAS    69           33         0           26          13          985   −1.3
11   1N12  SDVAFRGNLLD   85           40         1           33          21          1205  −3.9
11   2XFX  VGYPKVKEEML   90           44         2           30          19          1293  0.3
11   3BFW  DSTITIRGYVR   90           45         1           35          27          1281  −3.6
11   4EIK  SLARRPLPPLP   86           33         4           30          20          1219  −0.6
11   3DS1  ITFEDLLDYYGP  96           44         3           32          16          1345  1.1
12   4J8S  RRLPIFNRISVS  103          49         2           38          32          1461  −2.5
12   2W10  PPPRPTAPKPLL  91           31         6           30          17          1286  −0.4
12   3JZO  LTFEHYWAQLTS  107          48         5           36          22          1495  −1.7
12   4DGY  QLINTNGSWHIN  99           46         3           38          26          1397  −7.2
12   2B9H  RRNLKGLNLNLH  102          51         1           40          34          1451  −5.7
ᵃ LEADS-PEP benchmark data set sorted by peptide length (res, number of residues). For each peptide, several physicochemical properties (calculated within MOE) are listed (acc, acceptor; don, donor).

Evaluation of Small Molecule Docking Programs. In a second step, the LEADS-PEP benchmark data set was used for a detailed analysis of the peptide docking performance of several popular docking tools. These are AutoDock, AutoDock Vina (hereafter termed Vina), Surflex-Dock (hereafter termed Surflex), and GOLD. None of these programs has been specifically designed for handling peptides or other highly flexible ligands. The program settings were not significantly changed from those usually used for small molecule docking. In particular, the number of docking runs was kept at a limited 20.
First, we analyzed the CPU time required by each program (Figure 1 and Table S1). In general, the computing time increased with peptide length. For the same program, the difference between the shortest and the longest peptide reached up to 2 orders of magnitude. GOLD with ChemPLP (CP) was the fastest program, with a median CPU time of 5.6 min. It was closely followed by Vina using standard accuracy (SA) settings (5.9 min). However, it should be noted that Vina is parallelized and was run using 8 threads in this study. Computing times per peptide were slightly higher for GOLD with the scoring functions ChemScore (CS, 8.1 min) and ASP (8.2 min). GoldScore (GS) was the slowest GOLD scoring option (27.1 min). Using SurflexSA, the computing time for a peptide was approximately 13 min. With high accuracy (HA) settings, the Surflex computing time increased to 42.4 min per peptide. This is only slightly longer than for AutoDockSA (40.9 min). AutoDock with high accuracy settings required by far the most computing time per peptide (419.2 min).
Figure 1. Overview of CPU time required by tested docking approaches for each peptide. Calculations were performed on an Intel Xeon E5-2620 CPU at 2.00 GHz. Abbreviations: SA, standard accuracy settings; HA, high accuracy settings.
The standard measure for assessing redocking accuracy is the root-mean-square deviation (RMSD) between the docked pose and the experimentally determined conformation. Here, a docking pose was considered a near-native conformation if its backbone RMSD was ≤2.5 Å.7 First, we investigated the RMSD of top-scored docking poses. Table 2 gives an overview of the deviation from the experimentally determined peptide coordinates for all programs tested on the LEADS-PEP data set. We considered the median RMSD of the top-ranked pose over the whole benchmark data set. By this measure, GOLD with the GS scoring function was the most accurate docking approach (4.5 Å). It was closely followed by SurflexSA (4.8 Å), GOLD:CS (4.9 Å), and SurflexHA (5.0 Å). For all other tested docking approaches, the median RMSD was above 6 Å. Most programs were capable of reconstructing conformations of shorter peptides (3−4 residues) quite accurately, while longer peptides often caused problems (Table 2). For 4 peptides, all docking approaches correctly reproduced the experimentally determined binding modes (1B9J, 1TW6, 4C2C, 4J44). For 18 others, all programs failed to identify a native-like conformation. Using the number of near-native poses as the assessment criterion, SurflexSA performed best, with 38% of the 53 top-ranked docking poses adopting a near-native conformation. The program correctly placed drug-like short peptides (7 out of 11). It also reproduced near-native conformations of several longer peptides, including two peptides comprising 11 residues. Only three other approaches reached a success rate of 30% (SurflexHA, GOLD:CS, GOLD:GS). GOLD:CP was in third place with a 23% success rate. AutoDockSA and both Vina approaches correctly reproduced 19% of the 53 peptides. GOLD:ASP (15%) and AutoDockHA (17%) showed the worst performance. Only four approaches were capable of identifying native conformations of peptides containing 10 or more residues: VinaHA, Surflex with standard accuracy settings, Surflex with high accuracy settings, and GOLD:GS.
Table 2. Peptide Docking Performance as Measured by Best Scored Binding Modesᵃ
ᵃ Backbone RMSDs of the best scored poses are shown in a gradient color code. Highly accurate poses […] (10.0 Å) is highlighted in dark red. Abbreviations: res, residues; SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore.
Applying high accuracy settings for AutoDock, Vina, and Surflex did not improve overall performance. With AutoDockHA, the median RMSD over the whole benchmark data set was almost identical to that obtained with SA settings. The number of near-native conformations even dropped slightly. For VinaHA and SurflexHA, the median RMSD was marginally higher than with standard accuracy settings. The number of near-native poses was identical for both Vina settings. For Surflex, it declined by four when using SurflexHA instead of SurflexSA. The number of peptide conformations reproduced correctly with both SA and HA settings was 6 for AutoDock, 8 for Vina, and 13 for Surflex. For 2XFX, VinaHA and SurflexHA produced near-native docking poses. Both programs failed to produce accurate peptide conformations for 2XFX with SA settings. For 3UPV, 3NJG, and 4DS1, SurflexHA outperformed the same program with standard accuracy settings. For a number of protein−peptide complexes, the opposite occurred. The docking programs produced near-native poses with standard accuracy settings but failed to identify a correct pose among the top-ranked conformations with HA settings. This happened twice with Vina (2OY2, 3LNY) and four times with AutoDock (2OXW, 2HPL, 3D1E, 4BTB). It happened as many as seven times with Surflex (2HPL, 3NFK, 3IDG, 4NNM, 1OU8, 2W0Z, and 1H6W).
Figure 2 shows selected examples of near-native peptide conformations produced by different docking programs. VinaSA docked the largely solvent-exposed 3-mer peptide of 3BS4 correctly, with a backbone RMSD of just 0.6 Å from the X-ray crystal structure (Figure 2A). Only the N-terminus of the peptide was not correctly placed, which resulted in a larger deviation of the asparagine side chain. AutoDockSA docked the 2HPL pentapeptide with a reasonably low backbone RMSD (1.7 Å), but the N-terminal residue was placed less accurately (RMSD = 4.0 Å) (Figure 2B). Nevertheless, the docking pose completely reproduced the intermolecular hydrogen bond interaction pattern. Only the intramolecular hydrogen bond between the aspartate side chain and the glycine backbone was shifted toward the tyrosine backbone nitrogen (data not shown). The heptapeptide of 3MMG docked using VinaHA showed different orientations of both terminal residues. VinaHA placed four out of five central residues with high accuracy, so the overall backbone RMSD was 1.2 Å (Figure 2C). However, it failed to reproduce all hydrogen bonds between protein and peptide. SurflexSA positioned the nonamer peptide of 2W0Z very accurately except for both terminal residues (RMSD = 1.3 Å; Figure 2D). GOLD:GS reproduced the 1H6W peptide binding mode for seven of the ten residues with high accuracy (Figure 2E). Only the N-terminal amino acids deviated more strongly from the X-ray crystal structure, and the backbone RMSD for the whole peptide is 1.1 Å. The conformation of the 2XFX peptide generated by SurflexHA differed by only 1.4 Å from the X-ray crystal structure (Figure 2F). The coordinates of the N- and C-terminus matched the crystal structure well. However, some largely solvent-exposed residues deviated more strongly, which resulted in a completely different orientation of the lysine side chain. The binding modes of inaccurately docked poses differed from the X-ray crystal structure conformation in several ways (Figure 3, left panel). They showed shifts along the backbone, large reorientations of the N- and/or C-terminal region, or even completely inverted peptides.
Figure 2. Selected examples of accurately reproduced peptide binding modes. (A) 3BS4 (peptide length: 3 aa, method: VinaSA), (B) 2HPL (5 aa, AutoDockSA), (C) 3MMG (7 aa, VinaHA), (D) 2W0Z (9 aa, SurflexSA), (E) 1H6W (10 aa, GOLD:GoldScore), (F) 2XFX (11 aa, SurflexHA). For 2XFX, all side chain atoms except for lysine were removed for clarity. Proteins are shown as surface (carbon atoms in gray), peptides as capped sticks (cocrystallized carbon atoms in orange, docked carbon atoms in green).
Next, we investigated whether other peptide conformations within the set of 20 poses generated for each peptide agree better with the reference structure. For each docking approach, we therefore extracted the peptide pose with the lowest RMSD to the cocrystallized conformation. The median RMSD for this best pose set was significantly lower than for the top-ranked poses (Table 3). The drop in RMSD varied between 1.5 and 5.0 Å. Four approaches (VinaHA, SurflexSA, SurflexHA, and GOLD:GS) reached a median RMSD ≤ 2.5 Å. For all docking programs, the best pose set contained more near-native poses than the set of top-ranked poses (Table 3). SurflexSA performed best and identified 29 near-native poses. VinaHA, SurflexHA, and GOLD:GS performed almost equally well, with 28 correctly reproduced peptide conformations. VinaSA and GOLD:CS identified 20 near-native poses. GOLD with either ASP or CP placed 17 peptides correctly. Finally, the two AutoDock approaches yielded the lowest numbers of near-native poses (SA 11, HA 12).
Table 3. Peptide Docking Performance as Measured by Best Sampled Binding Modesᵃ
ᵃ Lowest backbone RMSD values from each docking run are shown in a gradient color code. Highly accurate poses […] (10.0 Å) is highlighted in dark red. Abbreviations: res, residues; SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore.
Furthermore, we investigated the actual number of near-native conformations within each set of 20 docking poses (Table 4). VinaHA, GOLD:GS, and Surflex (both settings) performed similarly in terms of median RMSD (best pose) and number of near-native occurrences over the whole benchmark data set. Their per-pose results, however, differed considerably. VinaHA produced 101 near-native poses (9.5% of the set of 1060 poses), and GOLD:GS produced 183 (17.3%). SurflexSA and SurflexHA generated 340 (32.1%) and 317 (29.9%) near-native conformations, respectively. SurflexSA and SurflexHA each reached the maximum number of near-native poses (20) for six peptides. For SurflexSA, these included short peptides (3−4 res; 1B9J, 3BS4, 4C2C) as well as long peptides (10−11 res; 1H6W, 1N12, 3BFW). SurflexHA generated 20 near-native poses for four short peptides (1B9J, 3BS4, 1TW6, 4C2C), one medium-sized peptide (3MMG), and one long peptide (1N12). GOLD:GS reached the maximum number of near-native poses only for drug-like peptides (1B9J, 2OY2, 4C2C, 4J44). For AutoDock, the maximum number of near-native poses per peptide did not exceed 17 (AutoDockHA, 2OY2). For Vina, it did not exceed 11 (VinaHA, 4J44).
In several cases, VinaHA, SurflexSA, SurflexHA, and GOLD:GS generated only a few (1−3) near-native conformations within the set of 20 docking poses (Table 4). For example, SurflexSA produced up to three near-native conformations for six peptides, one of which was also top-ranked (2W0Z). SurflexHA produced few near-native docking poses for seven peptides. For two of them (3NJG, 1ELW), SurflexHA top-ranked these conformations. For more than a third (19) of the 53 peptides, VinaHA produced three or fewer near-native poses among the 20 peptide conformations. In only four of these cases (3BS4, 4C2C, 4NNM, 2XFX) was the top-ranked conformation also a near-native pose. GOLD:GS produced up to 3 near-native conformations for 15 peptides. For 4 of these peptides, GoldScore top-ranked a near-native pose (3MMG, 4N7H, 1H6W, 3BRL).

Table 4. Number of Near-Native Poses Generated by Each Docking Approachᵃ
           AutoDock    Vina        Surflex     GOLD
res  PDB   SA    HA    SA    HA    SA    HA    ASP   CP    CS    GS
3    1B9J  6     14    3     9     20    20    20    20    20    20
3    2OY2  8     17    1     4     2     0     20    20    20    20
3    3GQ1  3     13    6     10    13    16    5     15    17    15
3    3BS4  0     2     3     3     20    20    6     11    14    9
3    2OXW  5     7     2     2     1     0     9     4     18    1
3    2B6N  0     0     1     0     0     0     2     0     3     3
4    1TW6  14    16    4     6     19    20    10    14    8     19
4    3VQG  0     0     5     6     10    19    1     0     1     4
4    1UOP  2     4     0     2     7     0     1     0     0     5
4    4C2C  5     13    3     3     20    20    8     10    13    20
4    4J44  10    8     6     11    11    4     14    17    16    20
5    2HPL  2     1     2     3     9     2     0     0     1     0
5    2V3S  0     0     4     2     0     0     3     6     4     6
5    3NFK  0     0     0     2     10    9     1     0     0     2
5    1NVR  0     0     0     1     0     2     0     0     0     0
5    4V3I  0     0     0     0     0     0     1     2     5     3
5    3T6R  0     1     0     0     0     0     1     6     4     7
6    1SVZ  0     0     0     3     3     3     0     0     0     0
6    3D1E  1     0     0     0     5     4     0     1     1     0
6    3IDG  0     0     0     1     17    0     0     0     0     0
6    3LNY  0     0     3     2     0     0     3     3     6     1
6    4NNM  0     0     1     3     19    18    1     1     2     8
6    4Q6H  0     0     0     0     0     10    0     0     2     0
7    3MMG  0     0     0     4     17    20    0     0     0     1
7    3Q47  0     0     0     3     0     0     0     0     1     1
7    3UPV  0     0     2     2     8     5     0     0     0     0
7    4QBR  0     0     2     6     18    13    0     0     0     7
7    3NJG  0     1     1     5     6     3     0     0     2     2
8    1ELW  0     0     0     0     12    1     0     0     0     3
8    3CH8  0     0     0     2     1     0     0     0     0     0
8    4WLB  0     0     0     0     0     0     0     0     0     0
8    1OU8  0     0     0     0     16    12    0     0     0     0
8    1N7F  0     0     0     0     0     0     0     0     0     0
9    3OBQ  0     0     0     0     14    13    0     2     0     1
9    4BTB  1     0     3     3     0     0     0     0     0     0
9    2W0Z  0     0     0     0     1     0     0     1     0     0
9    4N7H  0     0     0     0     0     0     0     0     0     1
9    2QAB  0     0     0     0     0     0     0     0     0     0
10   1H6W  0     0     0     0     20    16    0     0     0     1
10   3BRL  0     0     0     0     0     1     0     0     0     1
10   1NTV  0     0     0     0     0     0     0     0     0     0
10   4DS1  0     0     0     0     1     11    0     0     0     1
10   2O02  0     0     0     0     0     0     0     0     0     0
11   1N12  0     0     0     0     20    20    0     0     0     1
11   2XFX  0     0     0     1     0     15    0     0     0     0
11   3BFW  0     0     0     0     20    19    0     0     0     0
11   4EIK  0     0     1     1     0     1     0     0     0     0
11   3DS1  0     0     0     0     0     0     0     0     0     0
12   4J8S  0     0     0     0     0     0     0     0     0     0
12   2W10  0     0     1     0     0     0     0     1     0     0
12   3JZO  0     0     0     0     0     0     0     0     0     0
12   4DGY  0     0     0     1     0     0     0     0     0     0
12   2B9H  0     0     0     0     0     0     0     0     0     0
sum        57    97    54    101   340   317   106   134   158   183
ᵃ Abbreviations: res, residues; SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore.

Utilization of Rescoring for Improving Peptide Docking Results. For several programs, the number of top-ranked near-native poses was clearly lower than the number of low-RMSD conformations actually generated. This indicated a substantial potential for improving the outcome of peptide docking. One option is to re-evaluate the docking poses using other scoring functions. The docking program GOLD can rescore preexisting docking poses using its internal scoring functions. We tested whether this functionality narrows the RMSD gap between the top-ranked and best pose sets. To this end, all docking poses generated during our evaluation were rescored with CP, CS, GS, and ASP. Table 5 summarizes the best rescoring option over the whole data set. Most programs did not show any improvement of the overall median RMSD. For GOLD with the scoring functions ASP and CS, performance even declined. Only for VinaHA did the median RMSD drop significantly (3.5 Å). However, the ability to increase the number of near-native poses among the top-ranked rescored conformations matters much more than the overall RMSD improvement. Table 6 compares the near-native conformations among the top-scored, best, and top-rescored poses. Results for GOLD:CP did not improve with rescoring. For AutoDockSA, the number of near-native top-rescored poses declined compared to the top-ranked poses. For all other approaches, the number of near-native poses increased by between 6 and 110%. VinaHA showed the best overall improvement. Combined with ASP, CP, or CS, it yielded 21 near-native conformations, compared to only 10 for the top-ranked pose. With CP, the additional near-native poses comprised peptides of 4 (Figure 3A), 5, 6 (Figure 3B), 7, 8, 11, and 12 (Figure 3D) residues. The performance of SurflexHA combined with ASP rescoring improved by 38%, resulting in 22 near-native conformations. The additional peptides with near-native conformations are composed of five, six, eight (Figure 3C), and ten residues. SurflexSA had been identified as the best-performing tool for the best scored set. For SurflexSA, rescoring with either CP or CS increased the number of near-native conformations by 10% to 22. All other docking/rescoring approaches resulted in fewer than 20 near-native conformations (Table 6). Rescoring thus significantly improved the overall redocking performance for several programs. In a few cases, however, the top-rescored pose adopted a non-native conformation while the best scored docking pose fulfilled the 2.5 Å criterion. Such unwanted events occurred with the following combinations:
VinaHA with CP or CS rescoring (2XFX, top-scored pose: 2.0 Å, top-rescored pose: 3.8 Å)
SurflexSA with CS (1OU8, 1.7 vs 2.8 Å; 2W0Z, 1.3 vs 4.3 Å)
SurflexHA with ASP (4QBR, 1.2 vs 12.3 Å)
GOLD:GS with ASP (3BS4, 0.9 vs 5.4 Å; 4QBR, 1.9 vs 11.5 Å)
Table 5. Overview of Best Docking/Rescoring Combinationsᵃ
Figure 3. Identification of near-native docking poses using rescoring. (A) 1UOP (peptide length: 4 aa, method: VinaHA and ChemPLP). (B) 1SVZ (6 aa, VinaHA and ChemPLP). (C) 1OU8 (8 aa, SurflexHA and ASP). (D) 4DGY (12 aa, VinaHA and ChemPLP). Proteins are shown as surface (carbon atoms in gray), peptides as capped sticks (cocrystallized carbon atoms in orange, best-scored pose green, best rescored pose cyan).
[…] docking. We restricted our analysis to docking/rescoring approaches with at least a 33% success rate. The data set for analysis included VinaHA:CP, SurflexSA:CS, SurflexHA:ASP, and GOLD:GS:ASP. First, we evaluated whether an increasing number of rotatable bonds (see also Table 1) has negative effects on docking performance (Figure 4A). This was true for all programs. The performance of GOLD:GS:ASP in particular was highly dependent on the number of rotatable bonds. We also evaluated the impact of the bound-state peptide conformation on docking performance. For this purpose, we determined the ratio between the maximum Cα−Cα distance of the bound peptide and that of the linearized peptide. A low ratio indicates a more folded peptide, while a high ratio indicates a linear conformation. Within the LEADS-PEP data set, 4DGY (bound/linear ratio: 0.13), 4J8S (0.42), and 3DS1 (0.43) possess the lowest elongation. Several peptides are fully stretched when in complex with their target protein (4C2C, 1.02; 3OBQ, 1.09; 2W0Z, 1.26). We observed a correlation between elongation and RMSD for both Surflex approaches as well as GOLD, but not for VinaHA:CP (Figure 4B). These three approaches also showed a clear trend with the number of intramolecular hydrogen bonds in the peptides. Such hydrogen bonds are expected to occur in peptides with a more condensed conformation. Peptides in LEADS-PEP contain between zero and nine of them. As shown in Figure 4C, an increase in intramolecular hydrogen bonds correlated with a loss in docking accuracy for Surflex and GOLD. It has recently been suggested that the presence of free charged side chains strongly correlates with docking success.13 In our data set, the majority of peptides possess no free charged side chain (36) or a single one (14). Of the four best docking/scoring options, only VinaHA:ASP showed a strong dependence on the number of free charged side chains (Figure 4D).

DISCUSSION
Current docking programs and scoring functions have reached an acceptable performance for determining the binding modes of small molecules at their protein target.33,34 The same holds for identifying bioactive molecules among a set of active and inactive compounds.33,34 However, docking of highly flexible peptides remains a computational challenge, and only a few specific peptide docking programs have been developed so far. Standards for an intermethod comparison of docking and scoring performance have been established for small molecule docking and screening.14−16 No such standards exist yet for peptides. Developers of peptide docking programs have often evaluated their tools with self-constructed benchmark data sets. A biased selection therefore cannot be completely excluded, and the prepared structures are usually not available to other researchers. Reconstructing these data sets is error-prone, because the preparation procedure of the proteins may differ (e.g., protonation states, amide and histidine side chain corrections, inclusion of water molecules). We therefore created a benchmark data set for peptide docking with several advantages: (a) LEADS-PEP is publicly available at www.leads-x.org. (b) It is not biased toward any docking program. (c) It is ready to use, as fully prepared protein and peptide structures are provided. Our collection contains 53 high-resolution protein−peptide complexes with peptide lengths ranging from 3 to 12 residues.
The selection process was guided by quality and sequential diversity of the structures using an objective and reproducible workflow. Starting from a large set a Backbone RMSD of the best docking/scoring combinations are shown in a gradient color code. Highly accurate poses (10.0 Å) is highlighted in dark red. Abbreviations: res, residues; SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore. Molecular Properties Determining Peptide Docking Success. In a last step we intended to investigate molecular properties of the peptides influencing the success of peptide 196 DOI: 10.1021/acs.jcim.5b00234 J. Chem. Inf. Model. 2016, 56, 188−200 Article Journal of Chemical Information and Modeling Table 6. Overview of Near-Native Conformations Obtained with Different Approaches and Best Rescoring Optionsa AutoDock best score best pose rescoring best rescoring options Vina Surflex GOLD SA HA SA HA SA HA ASP CP CS GS 10 11 9 ASP CS 9 12 11 CS 10 20 12 CS CP GS 10 28 21 CP CS ASP 20 29 22 CS CP 16 28 22 ASP 8 17 12 CS 12 17 12 CP 16 20 17 GS 16 28 18 ASP CS a Number of near-native conformations obtained for top-ranked docking pose (best score), pose with lowest RMSD (best pose), and top-rescored pose (rescoring). Best rescoring options are sorted by overall median RMSD. Abbreviations: SA, standard accuracy; HA, high accuracy; ASP, Astex Statistical Potential; CP, ChemPLP; CS, ChemScore; GS, GoldScore. Figure 4. Evaluation of factors affecting performance for best performing docking/rescoring options. (A) Number of rotatable bonds (VinaHA:CP, Pearson’s correlation coefficient r = 0.31, two-tailored p value = 0.02*); SurflexSA:CS, r = 0.41, p = 0.003**; SurflexHA:ASP, r = 0.37, p = 0.007**; GOLD:GS:ASP, r = 0.51, p = 0.0001***). 
(B) Ratio between maximum Cα−Cα of bound and linear peptides (VinaHA:CP, r = 0.02, p = 0.89; SurflexSA:CS, r = 0.40, p = 0.003**; SurflexHA:ASP, r = 0.41, p = 0.002**; GOLd:GS:ASP, r = 0.40, p = 0.003**). (C) Number of intramolecular hydrogen bonds (VinaHA:CP, r = 0.10, p = 0.46; SurflexSA:CS, r = 0.48, p = 0.0003***; SurflexHA:ASP, r = 0.54, p = Purchase answer to see full attachment
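The elongation measure used in the analysis of molecular properties, the maximum Cα−Cα distance of the bound peptide divided by that of the linearized peptide, can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes Cα coordinates (in Å) for both conformations are already available as (x, y, z) tuples, the function names `max_ca_distance` and `elongation_ratio` are hypothetical, and how the linearized conformations were built is not described in this section.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # one C-alpha position in Å (assumed input format)


def max_ca_distance(ca_coords: Sequence[Coord]) -> float:
    """Largest pairwise Euclidean distance between C-alpha atoms (Å)."""
    if len(ca_coords) < 2:
        raise ValueError("need at least two C-alpha atoms")
    return max(math.dist(a, b) for a, b in combinations(ca_coords, 2))


def elongation_ratio(bound: Sequence[Coord], linear: Sequence[Coord]) -> float:
    """Bound/linear ratio of maximum C-alpha distances (hypothetical helper).

    Low values indicate a folded bound peptide; values near 1 indicate
    a stretched one.
    """
    return max_ca_distance(bound) / max_ca_distance(linear)


if __name__ == "__main__":
    # Toy three-residue peptide with 3.8 Å C-alpha spacing.
    linear = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    print(round(elongation_ratio(bent, linear), 3))
```

For this toy peptide, the fully extended chain spans 7.6 Å, while the chain bent at a right angle spans 3.8·√2 ≈ 5.37 Å, giving a ratio of about 0.71.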
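Tables 4 and 6 summarize pose quality as the number of near-native poses per complex and as three per-complex success criteria (top-ranked pose, lowest-RMSD pose, top-rescored pose). The bookkeeping behind such counts can be sketched in Python as below, under stated assumptions: the near-native RMSD cutoff is defined elsewhere in the paper and the 2.0 Å value here is only a placeholder; `Pose`, `near_native_pose_count`, and `count_solved` are hypothetical names; and whether a higher or lower rescoring value is better depends on the scoring function, so it is left as a parameter.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder assumption: substitute the near-native cutoff defined in the paper.
NEAR_NATIVE_CUTOFF = 2.0  # backbone RMSD in Å


@dataclass
class Pose:
    rmsd: float     # backbone RMSD to the crystallographic peptide (Å)
    rescore: float  # value assigned to this pose by the rescoring function


def near_native_pose_count(poses: List[Pose],
                           cutoff: float = NEAR_NATIVE_CUTOFF) -> int:
    """Number of near-native poses for one complex (one Table 4 cell)."""
    return sum(1 for p in poses if p.rmsd <= cutoff)


def count_solved(complexes: Dict[str, List[Pose]],
                 cutoff: float = NEAR_NATIVE_CUTOFF,
                 rescore_higher_is_better: bool = True) -> Dict[str, int]:
    """Count complexes solved under the three Table 6 criteria.

    Each pose list must already be ordered by the docking program's own
    ranking, so index 0 is the top-ranked pose.
    """
    pick = max if rescore_higher_is_better else min
    counts = {"best score": 0, "best pose": 0, "rescoring": 0}
    for poses in complexes.values():
        if not poses:
            continue  # no poses were generated for this complex
        if poses[0].rmsd <= cutoff:  # top-ranked docking pose
            counts["best score"] += 1
        if min(p.rmsd for p in poses) <= cutoff:  # any near-native pose at all
            counts["best pose"] += 1
        if pick(poses, key=lambda p: p.rescore).rmsd <= cutoff:  # top-rescored pose
            counts["rescoring"] += 1
    return counts
```

By construction, the "best pose" count is an upper bound for both other criteria, which matches the pattern in Table 6, where rescoring closes part of the gap between the best-score and best-pose columns.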
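Figure 4 reports Pearson's correlation coefficients with two-tailed p values. A dependency-free Python sketch of that test is shown below, using the standard statistic t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom and Simpson's rule to integrate Student's t density. It is an illustration only: the software the authors used is not stated here, taking n = 53 (one value per LEADS-PEP complex) is an assumption, and in practice a library routine such as SciPy's `scipy.stats.pearsonr` would normally be preferred.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (>= 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_pdf(t: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(r: float, n: int, steps: int = 2000) -> float:
    """Two-tailed p value for H0: rho = 0, via t = r * sqrt((n - 2) / (1 - r^2))."""
    df = n - 2
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the density on [0, t]; steps must be even.
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so both tails together hold 1 - 2 * area.
    return max(0.0, 1.0 - 2.0 * area)
```

With n = 53, `two_tailed_p(0.31, 53)` gives p ≈ 0.02, in line with the value reported for VinaHA:CP in Figure 4A.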



