The Pocketome is an encyclopedia of conformational ensembles of all druggable binding sites
that can be identified experimentally from co-crystal structures in the
Protein Data Bank.
The collection contains 2227 entries in total, of which at least 943 are from mammals.
Most represented protein families:
| Entries | Protein family |
|---|---|
| 187 | Protein Kinase Superfamily |
| 55 | Cytochrome P450 Family |
| 41 | Nuclear Hormone Receptor Family |
| 32 | Peptidase S1 Family |
| 27 | GST Superfamily |
| 25 | Calycin Superfamily |
| 22 | Class-II Aminoacyl-tRNA Synthetase Family |
| 22 | Short-chain Dehydrogenases/Reductases (SDR) Family |
| 19 | G-protein Coupled Receptor 1 Family |
| 16 | AB Hydrolase Superfamily |
[Figure panels: number of pockets per entry; median pairwise pocket RMSD]
What is and what is not covered by the Pocketome?
Each Pocketome entry corresponds to a small molecule binding site in a protein that
- has an entry in the reviewed part of the UniProt Knowledgebase,
- has been co-crystallized in complex with at least one drug-like small molecule,
- is represented in at least two PDB entries.
Binding sites that do not satisfy all three of the above requirements may not be represented in the Pocketome collection. This includes:
- singletons: sites with only one structure in the PDB, or several structures of identical composition
- sites with no co-crystallized drug-like molecules
- variable content/combinatorial sites (e.g. MHC class I and II, antibodies, etc.)
- sites in proteins within the unreviewed part of UniProt
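The three inclusion criteria and the resulting exclusions can be illustrated with a minimal Python sketch. It is not the Pocketome input generation utility: the `Structure` and `CandidateSite` records, the `exclusion_reason` function, and the placeholder identifiers are all hypothetical, and "identical composition" is approximated here as "same ligand name", which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Structure:
    """One PDB structure containing the site (hypothetical record)."""
    pdb_id: str
    ligand: Optional[str]   # ligand name, or None for a ligand-free structure
    druglike: bool = False  # whether the ligand counts as drug-like


@dataclass
class CandidateSite:
    """A candidate binding site with its supporting structures (hypothetical)."""
    name: str
    uniprot_reviewed: bool
    structures: List[Structure]


def exclusion_reason(site: CandidateSite) -> Optional[str]:
    """Return None if the site passes all three filters, else a short reason."""
    if not site.uniprot_reviewed:
        return "unreviewed UniProt entry"
    if not any(s.druglike for s in site.structures):
        return "no drug-like ligand"
    pdb_ids = {s.pdb_id for s in site.structures}
    # Assumption: composition is approximated by the ligand name (None = no ligand).
    compositions = {s.ligand for s in site.structures}
    if len(pdb_ids) < 2 or len(compositions) < 2:
        return "singleton"
    return None


# Placeholder data; the identifiers are made up for illustration.
candidates = [
    CandidateSite("site_ok", True,
                  [Structure("pdb_a", "LIG1", True), Structure("pdb_b", None)]),
    CandidateSite("site_single", True,
                  [Structure("pdb_a", "LIG1", True), Structure("pdb_b", "LIG1", True)]),
]
for site in candidates:
    print(site.name, exclusion_reason(site) or "included")
```

Returning a reason rather than a boolean keeps the excluded categories listed above visible for each rejected site.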
For selected interesting binding sites that do not pass the filters of the input generation utility, the Pocketome entries can be obtained by providing the siteFinder input manually.
For example, this allows Pocketome entries to be constructed when no structures with drug-like ligands exist, or when the protein is described only in the unreviewed part of UniProt.
Because both the PDB and UniProt databases are constantly expanding and developing, and because the Pocketome is updated automatically with their releases,
currently excluded entries may become part of the Pocketome encyclopedia in future versions.
How is the Pocketome built?
The Pocketome ensembles have been generated from
current releases of UniProt and the PDB using a unique algorithm that automatically collects, clusters,
and validates the binding pockets based on the consistency of their composition across multiple structures (to be published).
To speed up the molecular visualization, the Pocketome structures use a reduced atom representation for all parts of the protein
except the binding pocket. Structures of identical composition are eliminated from the online version of the entries, but not from the index.
The Pocketome entries can also differ from the original PDB files by the following:
- Spatial orientation: the Pocketome structure ensembles are re-oriented in 3D space so that multiple structures of the same binding site are superimposed.
- Residue numbering: Pocketome residues are numbered according to the reference SwissProt sequence with initiating Met and signal peptide present (where applicable).
- Ligand names: Pocketome molecules are sometimes renamed from their original PDB names to remove ambiguities, merge covalently linked parts of a single ligand, etc.
- Protein molecule names and number: Biological units are reconstructed in the Pocketome entries for those cases where the oligomeric partner forms an essential part of the binding pocket. Additionally, multiple parts of a single precursor molecule are merged together and renumbered according to the SwissProt sequence.
- Water molecules are removed in the current Pocketome release.
- Detergents, ions, and cofactors are removed except for cases where they form an essential part of the binding pocket.
These and other changes have been made to the PDB files in the process of the Pocketome construction to bring them to a standard form and ensure that the only source of variation between the ensemble structures is the natural flexibility of the protein and the induced fit effect.
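Because the ensemble structures are superimposed and consistently numbered, the conformational variability of a binding site can be summarized by a statistic such as the median pairwise pocket RMSD. The Python sketch below is a minimal illustration under stated assumptions: every conformation lists the same pocket atoms in the same order, the coordinates are already superimposed, and the function names `rmsd` and `median_pairwise_rmsd` are hypothetical. It is not necessarily how the Pocketome computes its own statistic.

```python
import math
from itertools import combinations
from statistics import median


def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("conformations must be non-empty and of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def median_pairwise_rmsd(ensemble):
    """Median of the RMSD values over all pairs of conformations."""
    values = [rmsd(a, b) for a, b in combinations(ensemble, 2)]
    if not values:
        raise ValueError("need at least two conformations")
    return median(values)


# Toy ensemble: three conformations of a two-atom pocket (made-up coordinates).
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 4.0)],
]
print(round(median_pairwise_rmsd(ensemble), 3))
```

Using the median rather than the mean keeps a single outlying conformation from dominating the summary.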
Pocketome entries are available for download and for interactive viewing using the versatile ActiveICM technology. The pocket contacts tables are available for download in the TSV format.
Pocketome: a term first introduced in Ref. 2 below to signify the entire set of macromolecular binding sites for small molecules, drugs, substrates, and metabolites.
In the context of this encyclopedia, we focus on binding sites for which multiple three-dimensional structures are available with different (or no) co-crystallized ligands;
this set can be called the experimental (or validated) pocketome. The theoretical pocketome (e.g. Ref. 3) is currently not included.
Site: the set of amino-acid residues that have been experimentally shown to participate in ligand binding in at least one of the co-crystal complexes.
The site is a superset of all pockets projected onto the amino-acid sequence of the protein.
Pocket: the set of atoms that are in direct contact with the co-crystallized ligand in a single experimentally determined complex structure.
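The relationship between the two definitions, atom-level contact sets in individual structures and a residue-level set on the sequence, can be sketched in a few lines of Python. The representation of an atom as a `(residue_number, atom_name)` pair and the function name `site_from_pockets` are assumptions made for illustration only.

```python
def site_from_pockets(pockets):
    """Project atom-level contact sets onto the sequence and merge them.

    Each input set holds (residue_number, atom_name) pairs for the atoms
    contacting the ligand in one structure. The result is the sorted list
    of residue numbers that contact a ligand in at least one structure.
    """
    site = set()
    for pocket in pockets:
        site.update(residue for residue, _atom in pocket)
    return sorted(site)


# Two made-up contact sets from two structures of the same protein.
pocket_a = {(45, "CA"), (45, "CB"), (83, "OG")}
pocket_b = {(83, "OG"), (120, "NE2")}
print(site_from_pockets([pocket_a, pocket_b]))  # [45, 83, 120]
```

This relies on all structures sharing one residue numbering, which is why a common reference sequence numbering matters.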
1. Kufareva I., Abagyan R. The flexible pocketome engine for structural chemogenomics. Methods Mol. Biol., Wiley, 2009, 575, 249-79
2. An J., Totrov M., Abagyan R. Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol. Cell. Proteomics, 2005, Jun, 4, 752-61
3. Nicola G., Smith C.A., Abagyan R. New method for the assessment of all drug-like pockets across a structural genome. J. Comput. Biol., 2008, Apr, 15(3), 231-240
This work is partially supported by NIH grants R01 GM071872, U01 GM094612, U54 GM094618, and RC2 LM010994.