Pocketome entry content

ENTRY header record

A special header record named ENTRY summarizes the information about the binding site and references to other databases. It is technically implemented as an ICM table with a header and no body (i.e. no rows or columns). Below is the list of fields that are currently implemented in the ENTRY header.

Entry name

ENTRY.name string

   NAME

Protein info

All headers are \n-separated lines of |-separated values.

ENTRY.protSwiss string

   A|NAME_ORGNSM|SWISSACC|from|to
   B|NAME_ORGNSM|SWISSACC

ENTRY.protName string

   A|Protein A name
   B|Protein B name

ENTRY.protFamily string

   A|Family of protein A
   B|Family of protein B

ENTRY.protDomain string

   Ai|Ri|from|to|region name|involved residues: (,:)-separated list of residue numbers
   Ai|Di|from|to|domain name|involved residues: (,:)-separated list of residue numbers
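Outside ICM, the same header strings can be split with ordinary string operations. The following is a minimal Python sketch, assuming the field values have been exported as plain strings; `parse_header` and the sample values are hypothetical and simply reuse the placeholder tokens from the format descriptions above.

```python
def parse_header(value):
    """Split a newline-separated, |-separated ENTRY field into rows of fields."""
    return [line.split("|") for line in value.split("\n") if line]

# Hypothetical field values built from the placeholder tokens shown above;
# real entries hold actual names, accessions and residue numbers.
prot_swiss = "A|NAME_ORGNSM|SWISSACC|1|100\nB|NAME_ORGNSM|SWISSACC"
prot_domain = "A1|D1|10|90|domain name|12,15:20"

swiss_rows = parse_header(prot_swiss)
# Map chain type -> (SwissProt name, accession); the from/to fields are optional.
swiss_by_chain = {row[0]: (row[1], row[2]) for row in swiss_rows}

domain_rows = parse_header(prot_domain)
# The sixth field (involved residues) is kept as a raw string here,
# because its exact grammar is not spelled out in this document.
```

Rows keep their original field order, so the optional from/to positions of ENTRY.protSwiss are present only when the entry provides them.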

PDB lists

All headers are ,-separated values.

ENTRY.PDBfull string
full list of PDBs (siteFinder input)

ENTRY.PDBunprocessed string
list of PDBs that were not processed by siteFinder

ENTRY.PDBprocessed string
list of PDBs that were processed by siteFinder

ENTRY.PDBredundant string
list of PDBs that were processed by siteFinder, but considered redundant and are not present in the entry

ENTRY.PDBnonredundant string
list of non-redundant PDBs present in the entry
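Read together, the field descriptions suggest that the full list splits into processed and unprocessed structures, and that the processed list splits into redundant and non-redundant ones. This relationship is an inference from the descriptions, not a documented guarantee. A minimal Python sketch that checks it on exported strings, with made-up PDB-like codes:

```python
def parse_pdb_list(value):
    """Turn a comma-separated ENTRY.PDB* string into a set of PDB codes."""
    return {code.strip() for code in value.split(",") if code.strip()}

# Made-up codes for illustration only.
pdb_full = parse_pdb_list("1abc,2def,3ghi,4jkl")
pdb_unprocessed = parse_pdb_list("4jkl")
pdb_processed = parse_pdb_list("1abc,2def,3ghi")
pdb_redundant = parse_pdb_list("3ghi")
pdb_nonredundant = parse_pdb_list("1abc,2def")

# Assumed relationships (inferred from the field descriptions):
# full = processed + unprocessed, with no overlap.
full_is_partitioned = (
    pdb_processed | pdb_unprocessed == pdb_full
    and not (pdb_processed & pdb_unprocessed)
)
# processed = redundant + non-redundant, with no overlap.
processed_is_partitioned = (
    pdb_redundant | pdb_nonredundant == pdb_processed
    and not (pdb_redundant & pdb_nonredundant)
)
```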

RMSD

ENTRY.rmsdBB real
median backbone RMSD between the pockets

ENTRY.rmsdTot real
median full-atom RMSD between the pockets

Collated set of tagged complex structures

A collated set of tagged pocket complex structures forms the heart of the Pocketome entry. The invariable and the variable parts of the crystallographic complexes within the entry are categorized as a binding site and a ligand, respectively. A binding site always contains amino-acid residues from one or more polypeptide chains, and may potentially contain non-peptidic cofactor molecules or metal ions. A ligand may be a single non-polymer molecule, a stand-alone amino-acid or nucleotide, or a part of a large polypeptide or nucleotide chain.

The polypeptide chains forming the binding site may be of one or two types, denoted as A and B. The number of chains of each type is not limited, e.g., in theory, sites formed by 3 copies of chain A and 6 copies of chain B are possible. The successive copies of chain A are denoted A1, A2, etc.; the successive copies of chain B are denoted B1, B2, etc. Chain designation is spatially consistent through all the objects in the Pocketome entry.
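The chain naming scheme can be illustrated with a short Python sketch that splits a label into its type and copy number and counts the copies in a comma-separated chain list. The helper names are hypothetical and not part of Pocketome or ICM.

```python
import re

def parse_chain_label(label):
    """Split a label such as 'A2' into (chain type, copy number)."""
    match = re.fullmatch(r"([AB])(\d+)", label)
    if match is None:
        raise ValueError(f"not a Pocketome chain label: {label!r}")
    return match.group(1), int(match.group(2))

def count_copies(chains_value):
    """Count chains of each type in a comma-separated list like 'A1,A2,B1'."""
    counts = {}
    for label in chains_value.split(","):
        chain_type, _ = parse_chain_label(label.strip())
        counts[chain_type] = counts.get(chain_type, 0) + 1
    return counts
```

For example, `count_copies("A1,A2,B1")` reports two chains of type A and one of type B.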

PDB structures vs Pocketome structures

The structures in the Pocketome entries may differ from the original PDB files by the following:

These and other changes have been made to the PDB files in the process of the Pocketome construction to bring them to a standard form and ensure that the only source of variation between the ensemble structures is the natural flexibility of the protein and the induced fit effect. Structures of identical composition are eliminated from the publicly available Pocketome entries.

The molecular objects in the Pocketome entries are served in the ICB (ICM binary) format. Molecular objects in ICM have a hierarchical architecture: each object consists of molecules, which in turn consist of residues, which consist of atoms. A subset of the ICM language is designed to address individual atoms, residues, molecules and objects in the workspace, or various subsets of these entities. Please see the ICM selection reference guide for more details on molecular selections.

Pre-defined molecular tags in the Pocketome entry objects

Pocketome uses the ICM mechanism of tagging molecular selections for entry component annotation. A tag can be set on a subset of atoms, residues, molecules or objects. A tag has a name and a value of any common scalar type. Given a tagged object, one can both extract tag values and build selections using a tag name and (optionally) a value. Please see the ICM reference guide for more details on molecular tags.

The following table contains all pre-defined molecular tags in the Pocketome entries. Every object in the entry is labeled with some or all of these tags. Tags that describe properties of the object in context of the entire ensemble are marked with (E):

Selection level | Tag name | Value | Description
atom | pocket | "A1"/"A2"/…/"B1"/… | heavy atoms of receptor chains in -1.2 vicinity[1] of the ligands of the object
atom (E) | site | "A1"/"A2"/…/"B1"/… | heavy atoms of receptor chains in -1.2 vicinity of the ligands of all objects
atom (E) | clash | 0.1/0.3/0.5/0.7/0.9 | score of clashes of site atoms with all the ligands; score x means that distance between atoms is less than sum of VdW radii multiplied by (1 - x)
residue (E) | freq | 0.0–1.0 | residue frequency in "site" selection
residue (E) | psite | "A1"/"A2"/…/"B1"/… | "site" selection propagated over all objects
residue | mut | residue_name[2] | mutation/insertion/addition of "psite" residue (compared with the SwissProt sequence)
residue | covlig | molecule_name:residue_name | "psite" residue covalently bonded with the ligand
residue | covmod | molecule_name:residue_name | "psite" residue covalently bonded with other residue (excluding its own chain and the ligand)
residue | domain | "AD1,AR1"/"BD1"/… | domain/region annotation index[3]
residue | ligand | molecule_index | bonded ligand molecules have the same molecule index
residue | bb | | residues of protein chains outside (>12 Å) the binding site; side-chains of these residues are deleted, only backbone atoms (c, o, n, ca) are retained[4]
molecule | ligsz | Nof(atoms) | size of the ligand part of the ligand molecule
molecule | lignm | residue_names[5] | list of residue names of the ligand (for amino chains gaps ≤ 2 aa are healed)
molecule | cofac | residue_names | list of residue names of the cofactor
molecule | metal | | metal ion
molecule | chain | "A1"/"A2"/…/"B1"/… | receptor chain
object | chains | "A1,A2,…,B1,…" | list of receptor chains
object | lig1 | Nof(atoms_of_ligand) | objects with "good" ligand (one hetero molecule)

Notes:

[1] -1.2 vicinity means distance cut-off of 1.2 * sum of van der Waals radii, see Sphere function.

[2] Natural amino acids and nucleotides are abbreviated with 1-letter code (upper case), other residue names are provided as in PDB (lower case).

[3] Detailed bioannotation of protein domains and regions is provided in the ENTRY record, see above.

[4] Do not convert objects with "bb" residues! Either delete these residues or modify them to glycine before conversion.

[5] Non-standard (e.g. hetero) residue names are delimited by a dot ("."). Gaps (e.g. in polypeptides) are represented by a dash ("-").
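The definition of the "clash" score in the table above can be written out as a small Python sketch. It assumes that the tag reports the highest level x for which the distance criterion holds; this reading, the function name and the radii used in the examples are assumptions rather than documented Pocketome behavior.

```python
CLASH_LEVELS = (0.1, 0.3, 0.5, 0.7, 0.9)

def clash_score(distance, radius_a, radius_b):
    """Return the highest level x with distance < (ra + rb) * (1 - x), or None.

    Assumption: the most severe satisfied level is the one that is reported.
    """
    radii_sum = radius_a + radius_b
    score = None
    for level in CLASH_LEVELS:
        if distance < radii_sum * (1 - level):
            score = level
    return score
```

With two atoms of radius 1.7 Å (an illustrative value), a distance of 3.0 Å falls under the 0.1 threshold of 3.06 Å but not under the 0.3 threshold of 2.38 Å, so the sketch returns 0.1.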

Two types of ligand/residue contact maps: binding pocket and binding site

Molecular interactions between the pocket elements and the bound ligands are summarized in the Pocketome entries in the form of contact maps. For the purpose of Pocketome entry annotation, two atoms are considered to be in contact if and only if the distance between their centers does not exceed 120% of the sum of their van der Waals radii. A residue is said to be in contact with the ligand if at least one non-hydrogen atom of the ligand contacts at least one non-hydrogen atom in that residue.
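The contact criterion can be sketched in a few lines of Python. The atom representation (element, coordinates, van der Waals radius) and the function names are hypothetical; the radii that ICM actually uses are not listed in this document.

```python
import math

def atoms_in_contact(xyz_a, radius_a, xyz_b, radius_b, factor=1.2):
    """True if the centre-to-centre distance is within factor * (ra + rb)."""
    return math.dist(xyz_a, xyz_b) <= factor * (radius_a + radius_b)

def heavy_atoms(atoms):
    """Drop hydrogens; each atom is an (element, xyz, vdw_radius) tuple."""
    return [atom for atom in atoms if atom[0] != "H"]

def residue_contacts_ligand(residue_atoms, ligand_atoms):
    """True if any heavy atom of the residue contacts any heavy ligand atom."""
    return any(
        atoms_in_contact(res[1], res[2], lig[1], lig[2])
        for res in heavy_atoms(residue_atoms)
        for lig in heavy_atoms(ligand_atoms)
    )
```

For two atoms with radii of 1.7 Å each, the cut-off is 1.2 × 3.4 = 4.08 Å, so a pair 4.0 Å apart counts as a contact while a pair 4.2 Å apart does not.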

Each individual ligand makes physical contacts with some or all of the pocket residues in its cognate structure, and possibly with cofactors or metal ions if they form a part of the binding site. These interactions are summarized in the pocket contact map. This map allows quick identification of conserved or ligand-specific interactions within the binding site.

The ensemble of spatially overlaid ligands makes virtual "contacts" with binding site residues in each of the structures. These "contacts" represent what would happen if other (non-cognate) ligands from the ensemble were inserted, in their crystallographic positions, into each individual pocket conformation. The virtual "contacts" are summarized in the site contact map. Because of natural protein plasticity and the induced fit effect, the site contact map may contain steric clashes, which are encoded as a specific type of ligand-residue contact. The site contact map provides a basis for the analysis of induced fit and steric compatibility of different pocket structures in the ensemble.

In the downloadable ICB files, both contact maps are stored in the ICM table format: pocketMap and siteMap. From the online entry page, the maps can also be downloaded separately in the TSV (Tab-Separated Values) format.
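A downloaded TSV map can be read with the Python standard library. The sketch below is a hedged illustration: it assumes that the TSV file has a header row and that its column names mirror those of the ICM tables (pdb_ch and lig_Nat are the names used in the ICM examples in this document). The sample rows are made up.

```python
import csv
import io

def read_contact_map(handle):
    """Read a tab-separated contact map into a list of dicts keyed by column."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Made-up two-row sample; a real file would be opened with open(path, newline="").
sample = "pdb_ch\tlig_Nat\n1abc_A\t25\n2def_A\t0\n"
rows = read_contact_map(io.StringIO(sample))

# Objects without ligands, assuming lig_Nat holds the ligand atom count.
without_ligand = [row["pdb_ch"] for row in rows if int(row["lig_Nat"]) == 0]
```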

Pairwise structure comparison matrices

If a Pocketome entry contains an ensemble of two or more structures, their pairwise comparison is performed by several criteria and the results are summarized in a matrix form.

The two most straightforward plots show the pairwise backbone and full-atom RMSD between the pockets.

Pockets are also compared in terms of their steric compatibility with various ligands in the ensemble. For that comparison, (i) each pocket was described by a vector of steric clashes that it makes with the ligand ensemble, (ii) pocket distances were calculated as normalized absolute difference between the vectors and (iii) all pockets within the entry were clustered according to the calculated distance to form sterically cross-compatible subsets. Cluster numbers are stored in the first field of each object.
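The three steps can be sketched in Python under explicit assumptions: the text does not specify the normalization or the clustering algorithm, so this sketch divides the summed absolute difference by the vector length and uses single-linkage clustering with a distance cut-off. Both choices, and all names, are illustrative only.

```python
def clash_distance(vec_a, vec_b):
    """Mean absolute difference between two clash vectors (assumed normalization)."""
    if len(vec_a) != len(vec_b):
        raise ValueError("clash vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(vec_a, vec_b)) / len(vec_a)

def cluster_pockets(vectors, cutoff):
    """Single-linkage clustering: pockets closer than cutoff share a cluster."""
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if clash_distance(vectors[i], vectors[j]) < cutoff:
                parent[find(j)] = find(i)

    # Number clusters 1, 2, ... in order of first appearance.
    numbers = {}
    return [numbers.setdefault(find(i), len(numbers) + 1) for i in range(n)]
```

Each clash vector holds one pocket's clash scores against every ligand of the ensemble; pockets with similar vectors end up with the same cluster number.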

In the Pocketome ICB files, all matrices are stored in ICM matrix format: M_compatibility, M_clash, M_rmsdBB, and M_rmsdTot. Object order (stored in the second object field) is the same in the ICB file, comparison matrices and the comparison page.
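Once a pairwise matrix such as M_rmsdBB has been exported as a nested list, a summary value can be derived from it. The Python sketch below takes the median over all distinct pairs; whether ENTRY.rmsdBB and ENTRY.rmsdTot are computed in exactly this way is an assumption, and the matrix values are made up.

```python
from statistics import median

def median_pairwise(matrix):
    """Median of the upper-triangle (i < j) values of a symmetric matrix."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return median(values)  # needs at least two structures

# Made-up symmetric RMSD matrix for three structures.
rmsd_matrix = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 4.0],
    [2.0, 4.0, 0.0],
]
summary = median_pairwise(rmsd_matrix)
```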

HTML pages

HTML pages inside each entry visualize entry components and contain clickable links that propagate actions to the 3D graphics display window. The set of pages includes:

Each HTML page, except for help, contains common elements:

Each of the pocket and site pages contains:

The pairwise comparison page contains the graphical representation of the pairwise structure comparison matrices described above.

A set of pre-programmed views for the molecular objects

Each Pocketome entry contains two pre-programmed views (also called interactive slides). All residues of the binding site, ensemble of all ligands, metal ions and cofactors in vicinity of the ligand ensemble (if any) are shown on the first slide. Residues of an individual pocket (typically with the largest ligand) are highlighted and the protein chains are shown in full length on the second slide.

In addition to these pre-defined views, several tools exist to change or customize the visualization:

The color scheme remains untouched by these tools. Every protein chain is assigned a specific color: the master protein chain A1 is light green, chain A2 of the same protein in multimeric complexes (if any) is light blue, etc. Ligands are always shown in magenta. Metal ions and cofactors (if any) are represented with light blue and light yellow colors, respectively. This scheme is consistent between 3D and HTML parts of the entry.

Pocketome entry manipulation

Manipulating downloaded entries with ICM family software

Opening a downloaded Pocketome entry in the free ICM Browser (or in the full ICM molecular modeling suite) gives access to full online functionality plus a wide range of ICM-specific manipulations. These manipulations are summarized with examples below.

ICM access to Pocketome ENTRY header

The ENTRY header can be made visible in the ICM workspace by

   set property show on ENTRY

The fields of the ENTRY header record are accessed via ENTRY.<fieldName> syntax. For example:

To retrieve the name of SwissProt sequence(s) for the proteins forming the site:

   Field(Split(ENTRY.protSwiss, "\n"), 2, "|")

To retrieve SwissProt accession number for the master sequence:

   Field(Split(ENTRY.protSwiss, "\n"), 3, "|")

To build a list of domains/regions involved in ligand binding:

   Split(ENTRY.protDomain "\n" ) ~ "*|*|*|*|*|*"

To extract domain/region annotations in the form of a table:

   read table input="ch|dom|fr|to|name|rsel\n" + ENTRY.protDomain separator="|" header name="T_out"

ICM access to Pocketome contact maps

The contact maps, in the ICM table format, can be made visible in the ICM workspace by

   set property show on pocketMap siteMap

Following that, the contact maps can be manipulated via the graphical interface. siteMap contains the four embedded pairwise comparison matrices described above. Using the cluster tree view, one can recluster objects according to a selected matrix: right-click on a table cell, select the 'Color By' menu item and choose the matrix in the drop-down list (e.g. 'Cluster 2' for the clash similarity matrix; the full list is shown in the cluster array in the Header tab), then click on the corresponding matrix tab on the right panel. To change the clustering cut-off, drag the vertical delimiter horizontally; this changes the colors (both in the table and in the tree view) and the values (cluster numbers) in the corresponding table column, e.g. cl_clash.

pocketMap and siteMap can also be manipulated using ICM syntax for tables. For example:

To get the list of objects without ligands:

   Sum(Replace( (siteMap.lig_Nat == 0).pdb_ch "." "") ",")

To obtain the list of site residues:

   Sum(Replace(Field(Name(siteMap) 2 ".") ~ "*_mut" "_mut" "") ",")

To transform a contact map for the first site residue into a separate ICM table:

   s = Replace(Field(Name(siteMap) 2 ".") ~ "*_mut" "_mut" "")[1]
   S = Name(siteMap) ~ "*." + s + "*"
   n = Nof(siteMap)
   group table T_out Replace(siteMap.pdb_ch "." "") "obNm" Sarray(n Field(s 1 "_") ) "chain" Sarray(n Replace(Field(s 2 "_") "n" "-") ) "residue" $S[1] "cont" $S[2] "mut" $S[3] "info"

ICM access to Pocketome molecular tags

Pocketome selection tags can be used to address the selection or extract its properties using Select() and Field() functions in ICM, respectively. Below is an incomplete list of ICM language expressions using the Pocketome tags.

To select objects where ligand is a single hetero molecule of 5–50 heavy atoms:

   Select(a_*. "lig1>4") & Select(a_*. "lig1<51")

To list all polypeptide chains forming the binding site:

   Sum(Unique(Split(Sum(Field(a_*. "chains") ",") ",") sort) ",")

To list all polypeptide chains directly contributing to the binding site interactions (this is frequently, but not always, the same as above):

   Sum(Unique(Split(Sum(Field(Mol(Select(a_*./ "psite") ) "chain") ",") ",") sort) ",")

To obtain a non-redundant list of PDB codes in the current Pocketome entry:

   Sum(Unique(Sarray(Name(a_*.) 1 4) sort) ",")
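The ICM expression above takes the first four characters of each object name as the PDB code. For readers working with an exported list of object names, an equivalent Python sketch is shown below; the object names are made up and only their four-character prefix matters here.

```python
def unique_pdb_codes(object_names):
    """Sorted, comma-joined unique four-character prefixes of object names."""
    return ",".join(sorted({name[:4] for name in object_names}))

# Made-up object names for illustration.
codes = unique_pdb_codes(["2def_a", "1abc_a", "1abc_b"])
```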

To obtain a non-redundant list of ligand names encoded by three-letter PDB HET codes (for non-polypeptide type ligands) and/or standard one-letter codes (for amino acids and nucleotides):

   Sum(Unique(Split(Replace(Sum(Field(Select(a_*.* "lignm") "lignm") ",") "[^a-z0-9]" "." regexp) ".") !~ "" sort) ",")

To print ligand selection names with the list of residues and number of non-hydrogen atoms for each:

   msel = Select(a_*.* "lignm")
   show column Name(msel full) Field(msel "lignm") Sarray(Field(msel "ligsz") )

To show covalent modifiers of site residues:

   Res(Next(Select(a_*./ "covmod" ) bond) & !Mol(Select(a_*./ "covmod" ) ) & !Select(a_*./ "ligand") )

To retrieve the name of master SwissProt sequence:

   Name(Select(a_*.* "chain==A1")[1] swiss)[1]

To retrieve the SwissProt accession number:

   s = Name(Select(a_*.* "chain==A1")[1] swiss)[1]
   Field(Namex($s) 1 " ")

To list binding site residue numbers on chain A1:

   l_showResCodeInSelection = no
   S_lst = Unique(Field(Sarray(Select(a_*./ "psite==A1") residue) 2 "/") sort)
   sort number S_lst
   Sum(S_lst ",")

To show site consensus:

   Select(a_*./ "freq>=0.5")

To select residues by the specified domain/region s_domain using information in the ENTRY header record:
use "SH2" to select SH2, SH2-like and extended SH2 domains/regions;
use "^SH2$" to select SH2 domain only

   S_out = Split(ENTRY.protDomain "\n")
   I_out = Index(Field(S_out 5 "|") Replace(s_domain "([[*(+.+)?]])" "\\\\\\1" regexp) regexp all)
   Field(S_out[I_out] 5 "|")
   Select(a_*./ "domain~(^|*,)(" + Sum(Unique(Replace(Field(S_out[I_out] 1 "|") "[0-9]" "") + Field(S_out[I_out] 2 "|") sort) "|") + ")(,*|$)")
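The difference between the two patterns is the usual one between an unanchored and an anchored regular expression. The Python sketch below illustrates it with the standard `re` module; ICM's own regular-expression dialect may differ in details, and the list of domain names is illustrative.

```python
import re

def matching_domains(domain_names, pattern):
    """Return the names in which the regular expression matches anywhere."""
    return [name for name in domain_names if re.search(pattern, name)]

# Illustrative names; "SH3" is included only as a non-matching contrast.
names = ["SH2", "SH2-like", "extended SH2", "SH3"]

loose = matching_domains(names, "SH2")     # substring match
strict = matching_domains(names, "^SH2$")  # whole-name match only
```

Here `loose` keeps the three SH2-related names, while `strict` keeps only the name that is exactly "SH2".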

To retrieve the domain/region names for the specified residue selection rsel:

   S_out = Split(ENTRY.protDomain "\n")
   Unique(Field(S_out[Index(S_out Replace(Sum(Replace(Unique(Split(Sum(Field(Select(Res(rsel) "domain") "domain") ",") ",") sort) "([AB]+)([RD]+.*$)" "^\\11\|\\2\|" regexp) "|") "^$" "-----" regexp) regexp all)] 5 "|") sort)

Exporting Pocketome entry components into non-ICM formats

ICM family products support export of data in a variety of conventional formats including but not limited to:

Reference

The Pocketome database is available online at http://pocketome.org. When using this server and/or this file, please cite:

Kufareva, I., Ilatovskiy, A.V. and Abagyan, R. (2012) Pocketome: an encyclopedia of small-molecule binding sites in 4D. Nucleic Acids Research, 40(1): D535–D540. PubMed 22080553.