INTERPARE The Protein Interfaceome Database
Background
1. PSIMAP and its Algorithm
PSIMAP, the Protein Structural Interactome Map,
is a database of all the structurally observed interactions among protein
domains of known three-dimensional structures in the
PDB. It can be constructed using
any reliable protein domain definition, where domains are defined as
evolutionarily conserved structural and functional protein units.
Here we use the domain definitions provided by
SCOP (Structural Classification of Proteins),
which uses structural and functional homology to manually define evolutionarily
distinct protein domain families and superfamilies. Alternatively,
other domain definitions (such as
CATH,
FSSP,
Pfam, etc.) can be used.
Domains from a multi-domain PDB
entry are empirically denoted as interacting with each other if at least 5
residue pairs are within a 5 Angstrom distance. Although the data in the
PDB is relatively limited in comparison
to the available sequence data, it is much more comprehensive when compared
to the available protein interaction data.
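The interaction criterion above (at least 5 residue pairs within 5 Angstroms) can be sketched in a few lines. This is only an illustrative sketch, not the PSIMAP implementation: it assumes that a residue pair counts as "within 5 Angstroms" when any atom of one residue lies within that distance of any atom of the other, and the function names `residues_in_contact` and `domains_interact` are hypothetical.

```python
import math
from itertools import product


def residues_in_contact(res_a, res_b, cutoff=5.0):
    """Return True if any atom of res_a is within `cutoff` of any atom of res_b.

    Each residue is a list of (x, y, z) atom coordinates in Angstroms.
    Assumption: the residue-pair distance is the closest atom-atom distance.
    """
    return any(math.dist(p, q) <= cutoff for p, q in product(res_a, res_b))


def domains_interact(domain_a, domain_b, cutoff=5.0, min_pairs=5):
    """Return True if at least `min_pairs` residue pairs are in contact.

    Each domain is a list of residues (see residues_in_contact).
    """
    pairs = 0
    for res_a in domain_a:
        for res_b in domain_b:
            if residues_in_contact(res_a, res_b, cutoff):
                pairs += 1
                if pairs >= min_pairs:
                    return True  # stop early once the threshold is reached
    return False
```

The brute-force double loop is quadratic in the number of residues; a spatial grid or k-d tree would be the usual choice for whole-PDB scans.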
PSIMAP provides an overview of all the observed
domain-domain interactions at the protein family or superfamily level.
It is important to consider protein interactions at this level with respect to
the stability of the network; while the number of
PDB entries is growing superlinearly,
the number of new folds is only increasing linearly. It is probable that there
are no more than 2,000 distinct protein topologies in nature.
Because of the slow growth in the number of new superfamilies and superfamily
interactions over time, PSIMAP represents the
first global overview of interactions at this level.
For example, the recent conservative superfamily assignment of 56 genomes
covered between 40% and 67% of the total detected genes in eukaryotes
and eubacteria (~100,000 genes) and between 31% and 54% of the total detected genes
in archaebacteria (~10,000 genes).
As a significant portion of the unassigned genes may represent trans-membrane
proteins not structurally determined due to experimental difficulties,
it is reasonable to suggest that the PDB
and PSIMAP cover many of the existing
globular superfamilies in nature.
Read about the concept and the assumptions of PSIMAP. Visit PSIMAP database or PSIbase.
2. Accessible Surface Area
The Accessible Surface Area (ASA) method detects protein
regions that become buried, and thus excluded from solvent, when a multimer
or a complex forms. When two or more subunits interact or aggregate with each other,
they lose some of the area that is accessible to solvent in the free
subunit or domain.
In the InterPare method, we define residues as interface
residues if they lose more than 1 Å² of solvent accessible
surface area (ASA) on aggregation or complexation (Jones et al., 1996, 1997, 2000).
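The interface-residue rule above reduces to a per-residue comparison of ASA in the free and complexed states. The sketch below assumes the per-residue ASA values have already been computed by some external program and are supplied as dictionaries; the function name `interface_residues` and the residue keys are hypothetical.

```python
def interface_residues(asa_free, asa_complex, min_loss=1.0):
    """Return residues that lose more than `min_loss` square Angstroms of ASA.

    asa_free:    dict mapping residue id -> absolute ASA in the free subunit
    asa_complex: dict mapping residue id -> absolute ASA in the complex
    Assumption: a residue missing from asa_complex is treated as unchanged.
    """
    return sorted(
        res
        for res, free in asa_free.items()
        if free - asa_complex.get(res, free) > min_loss
    )
```

For example, a residue whose ASA falls from 50.0 to 20.0 Å² on complexation is reported, while one that falls from 30.0 to 29.5 Å² is not.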
The ASA of protein molecules was calculated using a program called
NACCESS
(Hubbard S., http://wolf.bms.umist.ac.uk/naccess) which is an implementation
of the algorithm developed by Lee and Richards (Lee et al., 1971).
It calculates the absolute ASA and the relative ASA in terms of whole residues,
side chains, polar atoms, and non-polar atoms.
Relative accessibilities are calculated for each amino acid
in the protein by expressing the summed residue accessible surface
as a percentage of that observed in an ALA-X-ALA tri-peptide
(Hubbard et al., 1991).
Surface residues are defined as those residues that have a relative ASA of
more than 5% (Miller et al., 1987).
Interior residues are defined as those residues that have a relative ASA of
less than 5%.
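The relative accessibility and the surface/interior split described above can be sketched as follows. The reference ASA for each residue type (its accessibility in an ALA-X-ALA tri-peptide) is passed in as a parameter rather than hard-coded, and the function names are hypothetical. The text leaves the case of exactly 5% undefined; this sketch assigns it to the interior as an assumption.

```python
def relative_asa(absolute_asa, reference_asa):
    """Relative accessibility as a percentage of the tri-peptide reference."""
    return 100.0 * absolute_asa / reference_asa


def classify_residue(absolute_asa, reference_asa, threshold=5.0):
    """Label a residue 'surface' if its relative ASA exceeds `threshold` percent.

    Assumption: a residue at exactly the threshold is labelled 'interior'.
    """
    if relative_asa(absolute_asa, reference_asa) > threshold:
        return "surface"
    return "interior"
```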
The default van der Waals radii of atoms were taken from Chothia (1976).
We used water with a probe radius of 1.40 Å as the solvent.
3. Voronoi Diagram
The Voronoi diagram, also known as the Dirichlet tessellation, has been widely
used in the fields of science and engineering. It was
first applied to the study of protein structure
by Richards et al. (1974, 1977).
Varshney et al. (1995) reported a method for defining molecular interfaces
using the power diagram, a Voronoi diagram on a weighted point set.
We used the protocol suggested
by Varshney et al., except for two aspects.
First, we did not filter polygons by a distance threshold,
because we already had interfaces defined
by the distance threshold method.
Second, we calculated interfaces on domains instead of
calculating them on protein complexes. We first constructed
a three dimensional power-diagram P of the atoms.
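To make the notion of a power diagram concrete: in the standard definition, each weighted point (here an atom with a center and a radius) owns the region of space where its power distance, the squared distance to the center minus the squared radius, is smallest. The sketch below only illustrates that definition with a brute-force lookup; it does not construct the diagram, and the function names are hypothetical.

```python
def power_distance(point, center, radius):
    """Power distance of `point` to a sphere: |point - center|^2 - radius^2."""
    return sum((p - c) ** 2 for p, c in zip(point, center)) - radius ** 2


def owning_atom(point, atoms):
    """Return the index of the atom whose power cell contains `point`.

    atoms: list of (center, radius) tuples, with center an (x, y, z) tuple.
    This is a brute-force scan over all atoms, for illustration only.
    """
    return min(range(len(atoms)), key=lambda i: power_distance(point, *atoms[i]))
```

With equal radii the power diagram reduces to the ordinary Voronoi diagram; with unequal radii the dividing plane between two atoms shifts toward the smaller one.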
The construction of such a power diagram will have a time complexity of
(n is number of atoms in the protein) where the number of neighbors
for any given atom is bounded by a constant.
Each face of the power-diagram P is defined by two atoms (Figure 3).
For each face in P, if the two atoms defining the face belong to different
domains, we call that face an interface-face.
We define an interface-cell as a cell in the power-diagram P
that has at least one interface-face, and
interface-atoms as those atoms whose cells are interface-cells.
In the InterPare database, all the interface-atoms between each pair of domains
are stored in a PDB-style file format. The default van der Waals radii of atoms
were taken from Chothia (1976).
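Once the power diagram is available, the interface definitions above amount to a simple filter over its faces. The sketch below assumes the diagram has been computed elsewhere and is given as a list of faces, each identified by the pair of atom indices that define it; the function name `interface_atoms` and the input layout are hypothetical, not the InterPare code.

```python
def interface_atoms(faces, domain_of):
    """Find interface-faces and interface-atoms in a power diagram.

    faces:     iterable of (i, j) atom-index pairs, one per face of the diagram
    domain_of: mapping from atom index to its domain label
    Returns (interface_faces, atoms), where atoms is a sorted list of the
    atoms whose cells have at least one interface-face.
    """
    # A face is an interface-face when its two atoms lie in different domains.
    interface_faces = [(i, j) for i, j in faces if domain_of[i] != domain_of[j]]
    # An atom is an interface-atom when its cell has at least one such face.
    atoms = sorted({atom for face in interface_faces for atom in face})
    return interface_faces, atoms
```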
Figure 3. Power diagram of two different domains. Light blue circles (atoms) belong to domain A and green atoms belong to domain B.
Dotted lines denote Voronoi edges and the solid line represents the interface-faces between the two domains.
Any polygon that is adjacent to at least one interface-face is called an interface-cell.
If a cell is an interface-cell, then we call the atom in the cell an interface-atom.
Interface-atoms are drawn slightly darker than non-interface atoms. The InterPare database stores all the interface-atom information.