A special header record named ENTRY summarizes the information about the binding site and references to other databases. It is technically implemented as an ICM table with a header and no body (i.e. no rows or columns). Below is the list of fields that are currently implemented in the ENTRY header.
All headers are \n-separated lines of |-separated values.
A|Protein A name
B|Protein B name

A|Family of protein A
B|Family of protein B

Ai|Ri|from|to|region name|involved residues: (,:)-separated list of residue numbers
Ai|Di|from|to|domain name|involved residues: (,:)-separated list of residue numbers
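For readers who process exported header text outside ICM, the six-column domain/region layout described above could be parsed roughly as follows. This is only a sketch: the sample string and the names `DomainRecord`, `expand_residues` and `parse_domains` are hypothetical, and reading `a:b` as an inclusive residue range is an assumption that the format description does not confirm.

```python
from typing import List, NamedTuple


class DomainRecord(NamedTuple):
    chain: str           # chain designation, e.g. "A1"
    ident: str           # region (R...) or domain (D...) identifier
    start: int           # "from" residue number
    end: int             # "to" residue number
    name: str            # region or domain name
    residues: List[int]  # residues involved in binding (may be empty)


def expand_residues(spec: str) -> List[int]:
    """Expand e.g. "12,15:17" into [12, 15, 16, 17].

    Assumption: ':' denotes an inclusive range of residue numbers.
    """
    out: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            lo, hi = part.split(":")
            out.extend(range(int(lo), int(hi) + 1))
        else:
            out.append(int(part))
    return out


def parse_domains(text: str) -> List[DomainRecord]:
    """Parse newline-separated lines of |-separated values."""
    records = []
    for line in text.split("\n"):
        if not line.strip():
            continue
        ch, ident, fr, to, name, rsel = line.split("|")
        records.append(
            DomainRecord(ch, ident, int(fr), int(to), name, expand_residues(rsel))
        )
    return records


# Hypothetical sample text, not taken from a real Pocketome entry.
example = "A1|D1|10|120|SH2|12,15:17\nA1|R1|130|140|hinge|"
records = parse_domains(example)
involved = [r.name for r in records if r.residues]
```

In this sample, only the first record lists involved residues, so `involved` keeps the domains or regions that take part in ligand binding.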
All headers are ,-separated values.
full list of PDBs (siteFinder input)
list of PDBs that were not processed by siteFinder
list of PDBs that were processed by siteFinder
list of PDBs that were processed by siteFinder, but considered redundant and are not present in the entry
list of non-redundant PDBs present in the entry
A collated set of tagged pocket complex structures forms the heart of the Pocketome entry. The invariable and the variable parts of the crystallographic complexes within the entry are categorized as a binding site and a ligand, respectively. A binding site always contains amino-acid residues from one or more polypeptide chains, and may potentially contain non-peptidic cofactor molecules or metal ions. A ligand may be a single non-polymer molecule, a stand-alone amino-acid or nucleotide, or a part of a large polypeptide or nucleotide chain.
The polypeptide chains forming the binding site may be of one or two types, denoted as A and B. The number of chains of each type is not limited, e.g., in theory, sites formed by 3 copies of chain A and 6 copies of chain B are possible. The successive copies of chain A are denoted A1, A2, etc.; the successive copies of chain B are denoted B1, B2, etc. Chain designation is spatially consistent through all the objects in the Pocketome entry.
The structures in the Pocketome entries may differ from the original PDB files by the following:
These and other changes have been made to the PDB files in the process of the Pocketome construction to bring them to a standard form and ensure that the only source of variation between the ensemble structures is the natural flexibility of the protein and the induced fit effect. Structures of identical composition are eliminated from the publicly available Pocketome entries.
The molecular objects in the Pocketome entries are served in the ICB (ICM binary) format. Molecular objects in ICM have a hierarchical architecture: each object consists of molecules, which in turn consist of residues, which consist of atoms. A subset of the ICM language is designed to address individual atoms, residues, molecules and objects in the workspace, or various subsets of these entities. Please see the ICM selection reference guide for more details on molecular selections.
Molecular interactions between the pocket elements and the bound ligands are summarized in the Pocketome entries in the form of contact maps. For the purpose of Pocketome entry annotation, two atoms are considered to be in contact if and only if the distance between their centers does not exceed 120% of the sum of their van der Waals radii. A residue is said to be in contact with the ligand if at least one non-hydrogen atom of the ligand contacts at least one non-hydrogen atom in that residue.
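The contact criterion above can be illustrated with a short Python sketch. The `Atom` type, the function names and the radii table are assumptions made for illustration: the radii are the commonly quoted Bondi values, and the radii actually used in Pocketome construction may differ.

```python
import math
from typing import Iterable, NamedTuple, Tuple


class Atom(NamedTuple):
    element: str
    xyz: Tuple[float, float, float]


# Bondi van der Waals radii in angstroms (illustrative; an assumption).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def atoms_in_contact(a: Atom, b: Atom, scale: float = 1.2) -> bool:
    """True if the center-to-center distance is within 120% of the radii sum."""
    limit = scale * (VDW_RADIUS[a.element] + VDW_RADIUS[b.element])
    return math.dist(a.xyz, b.xyz) <= limit


def residue_contacts_ligand(residue: Iterable[Atom], ligand: Iterable[Atom]) -> bool:
    """True if any non-hydrogen residue atom contacts any non-hydrogen ligand atom."""
    heavy_ligand = [l for l in ligand if l.element != "H"]
    return any(
        atoms_in_contact(r, l)
        for r in residue
        if r.element != "H"
        for l in heavy_ligand
    )


carbon = Atom("C", (0.0, 0.0, 0.0))
near_oxygen = Atom("O", (3.8, 0.0, 0.0))  # limit is 1.2 * (1.70 + 1.52) = 3.864
far_oxygen = Atom("O", (3.9, 0.0, 0.0))
```

With these values the carbon contacts `near_oxygen` but not `far_oxygen`, because 3.8 is below the 3.864 limit and 3.9 is above it.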
Each individual ligand makes physical contacts with some or all of the pocket residues in its cognate structure, and possibly with cofactors or metal ions if they form a part of the binding site. These interactions are summarized in the pocket contact map. This map allows quick identification of conserved or ligand-specific interactions within the binding site.
The ensemble of spatially overlaid ligands makes virtual "contacts" with binding site residues in each of the structures. These "contacts" represent what would happen if other (non-cognate) ligands from the ensemble were inserted, in their crystallographic positions, into each individual pocket conformation. The virtual "contacts" are summarized in the site contact map. Because of natural protein plasticity and the induced fit effect, the site contact map may contain steric clashes, which are encoded as a specific type of ligand-residue contact. The site contact map provides a basis for the analysis of induced fit and steric compatibility of different pocket structures in the ensemble.
In the downloadable ICB files, both contact maps are stored in the ICM table format: pocketMap and siteMap. From the online entry page, the maps can also be downloaded separately in the TSV (Tab-Separated Values) format.
If a Pocketome entry contains an ensemble of two or more structures, their pairwise comparison is performed by several criteria and the results are summarized in a matrix form.
The two most straightforward plots show the pairwise backbone and full-atom RMSD between the pockets.
Pockets are also compared in terms of their steric compatibility with various ligands in the ensemble. For that comparison, (i) each pocket was described by a vector of steric clashes that it makes with the ligand ensemble, (ii) pocket distances were calculated as normalized absolute difference between the vectors and (iii) all pockets within the entry were clustered according to the calculated distance to form sterically cross-compatible subsets. Cluster numbers are stored in the first field of each object.
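The three steps above can be sketched in Python as follows. The exact normalization and clustering algorithm are not specified here, so this sketch makes two assumptions: the distance is the sum of absolute differences divided by the vector length, and clustering is a simple single-linkage grouping at a fixed cut-off. All function names and the sample vectors are hypothetical.

```python
from typing import List, Sequence


def clash_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Normalized absolute difference between two clash vectors (assumed form)."""
    if len(u) != len(v):
        raise ValueError("clash vectors must have the same length")
    if not u:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def distance_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise distances between all pockets."""
    return [[clash_distance(u, v) for v in vectors] for u in vectors]


def threshold_clusters(dist: List[List[float]], cutoff: float) -> List[int]:
    """Single-linkage grouping: pockets closer than the cut-off share a cluster."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    # Renumber clusters 1, 2, ... in order of first appearance.
    numbering = {}
    return [numbering.setdefault(find(i), len(numbering) + 1) for i in range(n)]


# Hypothetical clash vectors: one row per pocket, one column per ligand
# (1 = the pocket clashes with that ligand, 0 = it does not).
vectors = [[0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
dist = distance_matrix(vectors)
clusters = threshold_clusters(dist, cutoff=0.3)
```

Here the first two pockets differ in one of four positions (distance 0.25) and fall into the same cluster at a 0.3 cut-off, while the third pocket forms its own cluster.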
In the Pocketome ICB files, all matrices are stored in ICM matrix format: M_compatibility, M_clash, M_rmsdBB, and M_rmsdTot. Object order (stored in the second object field) is the same in the ICB file, comparison matrices and the comparison page.
HTML pages inside each entry visualize entry components and contain clickable links that propagate actions to the 3D graphics display window. The set of pages includes:
Each HTML page, except for help, contains common elements:
Each of the pocket and site pages contains:
The pairwise comparison page contains the graphical representation of the pairwise structure comparison matrices described above.
Each Pocketome entry contains two pre-programmed views (also called interactive slides). The first slide shows all residues of the binding site, the ensemble of all ligands, and any metal ions and cofactors in the vicinity of the ligand ensemble. The second slide highlights the residues of an individual pocket (typically the one with the largest ligand) and shows the protein chains in full length.
In addition to these pre-defined views, several tools exist to change or customize the visualization:
The color scheme remains untouched by these tools. Every protein chain is assigned a specific color: the master protein chain A1 is light green, chain A2 of the same protein in multimeric complexes (if any) is light blue, etc. Ligands are always shown in magenta. Metal ions and cofactors (if any) are represented with light blue and light yellow colors, respectively. This scheme is consistent between 3D and HTML parts of the entry.
Opening a downloaded Pocketome entry in the free ICM Browser (or in the full ICM molecular modeling suite) gives access to full online functionality plus a wide range of ICM-specific manipulations. These manipulations are summarized with examples below.
The ENTRY header can be made visible in the ICM workspace by
set property show on ENTRY
The fields of the ENTRY header record are accessed via ENTRY.<fieldName> syntax. For example:
To retrieve the name of SwissProt sequence(s) for the proteins forming the site:
Field(Split(ENTRY.protSwiss, "\n"), 2, "|")
To retrieve SwissProt accession number for the master sequence:
Field(Split(ENTRY.protSwiss, "\n"), 3, "|")
To build a list of domains/regions involved in ligand binding:
Split(ENTRY.protDomain "\n" ) ~ "*|*|*|*|*|*"
To extract domain/region annotations in the form of a table:
read table input="ch|dom|fr|to|name|rsel\n" + ENTRY.protDomain separator="|" header name="T_out"
The contact maps, in the ICM table format, can be made visible in the ICM workspace by
set property show on pocketMap siteMap
Following that, the contact maps can be manipulated via the graphical interface. siteMap contains four embedded pairwise comparison matrices described above. Using the cluster tree view, one can recluster objects according to a selected matrix: right-click on a table cell, select the 'Color By' menu item and choose a matrix in the drop-down list (e.g. 'Cluster 2' for the clash similarity matrix; the full list is shown in the cluster array in the Header tab), then click on the corresponding matrix tab in the right panel. To change the clustering cut-off, drag the vertical delimiter horizontally; this changes the colors (both in the table and in the tree view) and the values (cluster numbers) in the corresponding table column, e.g. cl_clash.
pocketMap and siteMap can also be manipulated using the ICM syntax for tables. For example:
To get the list of objects without ligands:
Sum(Replace( (siteMap.lig_Nat == 0).pdb_ch "." "") ",")
To obtain the list of site residues:
Sum(Replace(Field(Name(siteMap) 2 ".") ~ "*_mut" "_mut" "") ",")
s = Replace(Field(Name(siteMap) 2 ".") ~ "*_mut" "_mut" "")
S = Name(siteMap) ~ "*." + s + "*"
n = Nof(siteMap)
group table T_out Replace(siteMap.pdb_ch "." "") "obNm" Sarray(n Field(s 1 "_") ) "chain" Sarray(n Replace(Field(s 2 "_") "n" "-") ) "residue" $S "cont" $S "mut" $S "info"
Pocketome selection tags can be used to address the selection or extract its properties using Select() and Field() functions in ICM, respectively. Below is an incomplete list of ICM language expressions using the Pocketome tags.
Select(a_*. "lig1>4") & Select(a_*. "lig1<51")
Sum(Unique(Split(Sum(Field(a_*. "chains") ",") ",") sort) ",")
Sum(Unique(Split(Sum(Field(Mol(Select(a_*./ "psite") ) "chain") ",") ",") sort) ",")
Sum(Unique(Sarray(Name(a_*.) 1 4) sort) ",")
Sum(Unique(Split(Replace(Sum(Field(Select(a_*.* "lignm") "lignm") ",") "[^a-z0-9]" "." regexp) ".") !~ "" sort) ",")
msel = Select(a_*.* "lignm")
show column Name(msel full) Field(msel "lignm") Sarray(Field(msel "ligsz") )
To show covalent modifiers of site residues:
Res(Next(Select(a_*./ "covmod" ) bond) & !Mol(Select(a_*./ "covmod" ) ) & !Select(a_*./ "ligand") )
To retrieve the name of master SwissProt sequence:
Name(Select(a_*.* "chain==A1") swiss)
To retrieve the SwissProt accession number:
s = Name(Select(a_*.* "chain==A1") swiss)
Field(Namex($s) 1 " ")
l_showResCodeInSelection = no
S_lst = Unique(Field(Sarray(Select(a_*./ "psite==A1") residue) 2 "/") sort)
sort number S_lst
Sum(S_lst ",")
To show site consensus:
To select residues by the specified domain/region s_domain using information in the ENTRY header record:
use "SH2" to select SH2, SH2-like and extended SH2 domains/regions;
use "^SH2$" to select SH2 domain only
S_out = Split(ENTRY.protDomain "\n")
I_out = Index(Field(S_out 5 "|") Replace(s_domain "([[*(+.+)?]])" "\\\\\\1" regexp) regexp all)
Field(S_out[I_out] 5 "|")
Select(a_*./ "domain~(^|*,)(" + Sum(Unique(Replace(Field(S_out[I_out] 1 "|") "[0-9]" "") + Field(S_out[I_out] 2 "|") sort) "|") + ")(,*|$)")
To retrieve the domain/region names for the specified residue selection rsel:
S_out = Split(ENTRY.protDomain "\n")
Unique(Field(S_out[Index(S_out Replace(Sum(Replace(Unique(Split(Sum(Field(Select(Res(rsel) "domain") "domain") ",") ",") sort) "([AB]+)([RD]+.*$)" "^\\11\|\\2\|" regexp) "|") "^$" "-----" regexp) regexp all)] 5 "|") sort)
ICM family products support export of data in a variety of conventional formats including but not limited to:
ligsel = Mol(Select(Res(Select(a_*. "lig1") ) "ligand") )
group table chem Chemical(ligsel) "mol" Field(ligsel "lignm") "lignm" Name(ligsel object) "obnm"
write table mol chem "chem.sdf"
write table separator="\t" header pocketMap "pocket.tsv"
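Once a contact map has been exported as TSV, it can be processed outside ICM. The Python sketch below repeats the "objects without ligands" filter shown earlier for siteMap. It assumes that siteMap was exported in the same way as pocketMap, that the export keeps the `pdb_ch` and `lig_Nat` column names, and that `lig_Nat` holds integers. The sample rows and the function name are hypothetical.

```python
import csv
import io
from typing import List


def objects_without_ligands(tsv_text: str) -> List[str]:
    """Return object names (pdb_ch with dots removed) whose lig_Nat is 0."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["pdb_ch"].replace(".", "")
        for row in reader
        if int(row["lig_Nat"]) == 0
    ]


# Hypothetical two-row export; real files contain many more columns.
sample = "pdb_ch\tlig_Nat\n1abc.a\t0\n2xyz.b\t23\n"
apo_objects = objects_without_ligands(sample)
```

To work with a real export, the text can be read with `open("siteMap.tsv").read()` (file name assumed) and passed to the same function.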
The Pocketome database is available online at http://pocketome.org. When using this server and/or this file, please cite:
Kufareva, I., Ilatovskiy, A.V. and Abagyan, R. (2012) Pocketome: an encyclopedia of small-molecule binding sites in 4D. Nucleic Acids Research, 2012 Jan; 40(1): D535–D540. PubMed 22080553.