What is Proteome-pI 2.0 - Proteome Isoelectric Point Database?

Proteome Isoelectric Point Database 2.0 (Proteome-pI 2.0) is a database of isoelectric points and molecular weights for the proteomes of model organisms (over 20,000 organisms). The results are presented as virtual 2D-PAGE plots. Additionally, to facilitate bottom-up proteomics analysis, individual proteomes have been digested in silico using the five most commonly used proteases (Trypsin, Chymotrypsin, Trypsin+LysC, LysN, ArgC), and the theoretical isoelectric points and molecular weights of the resulting peptides are provided.
The database aims to make statistical comparisons of the various prediction methods (21 algorithms implemented) freely available to the scientific community, and to facilitate the biological investigation of the isoelectric point space at both the protein and the peptide level.
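In the virtual 2D-PAGE plots, each protein is a point given by its isoelectric point and its molecular weight. The molecular-weight half is a simple sum over residues, sketched below under stated assumptions: average residue masses from a standard table (assumed; the database may use a different table), an unmodified linear chain, and a hypothetical function name `molecular_weight`.

```python
# Average residue masses in daltons (amino acid minus one water).
# Standard values assumed here; the database may use a different table.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVERAGE = 18.01528  # added once for the chain's free N- and C-termini


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified linear protein chain."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_AVERAGE
```

For example, `molecular_weight("GG")` gives about 132.12 Da, the mass of glycylglycine. Non-standard residue codes raise an error rather than being silently ignored.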

What are the key improvements of Proteome-pI 2.0 over Proteome-pI 1.0?

The first version of the Proteome-pI database was developed in 2016 (for more details, read the paper). The update brings the following significant improvements:

  1. The number of proteomes analyzed increased from 5,029 to 20,115 (quantitative change)
  2. New pI estimation algorithms have been added, including the recently developed IPC 2.0. Currently, 21 isoelectric point prediction algorithms are supported (quantitative and qualitative change)
  3. pKa predictions for proteins are included
  4. Individual proteomes have been digested in silico using the five most commonly used proteases, and the theoretical isoelectric points and molecular weights of the resulting peptides are provided (qualitative change)

How are the digestion peptides for the proteomes generated?

To digest the proteins within each proteome in silico, we use the RPG software with the five most frequently used proteases (Trypsin, Chymotrypsin, Trypsin+LysC, LysN, ArgC) and a miscleavage rate of 1.4%. Because mass spectrometers operate within specific mass ranges, you can download filtered peptide fractions matching the instrument you use:
  1. LTQ Orbitrap (600~4000 Da)
  2. MALDI TOF/TOF (750~5500 Da)
  3. ESI Ion Trap (600~3500 Da)
  4. MSlow (narrow mass range) (800~3500 Da)
  5. MShigh (wide mass range) (600~5500 Da)
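The digestion-and-filtering step can be sketched as below. This is a minimal illustration, not the RPG implementation: it covers only a simplified trypsin rule (cleave after K or R unless the next residue is P), interprets the 1.4% miscleavage figure as a per-site probability of skipping a cut (an assumption), and filters by monoisotopic mass (also an assumption; the mass type used by the database is not stated here). The function names are hypothetical; the mass windows are the ones listed above.

```python
import random

# Monoisotopic residue masses in daltons (assumed; the database's
# mass table and mass type are not stated in this FAQ).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056  # added once for the peptide's free termini

# Mass windows in daltons, taken from the instrument list above.
INSTRUMENT_RANGES = {
    "LTQ Orbitrap": (600, 4000),
    "MALDI TOF/TOF": (750, 5500),
    "ESI Ion Trap": (600, 3500),
    "MSlow": (800, 3500),
    "MShigh": (600, 5500),
}


def trypsin_sites(seq):
    """Cut positions for a simplified trypsin rule: after K or R, not before P.

    A returned value i means the bond between seq[i-1] and seq[i] is cut.
    """
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] in "KR" and seq[i + 1] != "P"]


def digest(seq, miscleavage_rate=0.014, rng=None):
    """Digest seq, skipping each cut with probability miscleavage_rate.

    Treating 1.4% as a per-site probability is an assumption of this sketch.
    """
    seq = seq.upper()
    rng = rng if rng is not None else random.Random()
    peptides, start = [], 0
    for cut in trypsin_sites(seq):
        if rng.random() < miscleavage_rate:
            continue  # missed cleavage: the current peptide keeps growing
        peptides.append(seq[start:cut])
        start = cut
    peptides.append(seq[start:])
    return peptides


def mono_mass(peptide):
    """Monoisotopic mass (Da) of an unmodified peptide."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER_MONO


def filter_for_instrument(peptides, instrument):
    """Keep peptides whose mass falls inside the instrument's window."""
    low, high = INSTRUMENT_RANGES[instrument]
    return [p for p in peptides if low <= mono_mass(p) <= high]
```

For example, `filter_for_instrument(digest(seq, rng=random.Random(0)), "MSlow")` keeps only peptides between 800 and 3500 Da; passing a seeded `random.Random` makes the random miscleavages reproducible. The other proteases would each need their own cleavage rule in place of `trypsin_sites`.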

What is an isoelectric point?

The isoelectric point is the pH at which a particular molecule carries no net electrical charge.
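This definition can be turned into a calculation: sum the charges of a protein's ionizable groups at a given pH with the Henderson-Hasselbalch equation, then search for the pH at which that sum is zero. The sketch below does this by bisection, assuming a single illustrative pKa table (it matches one classical published set, but treat the values as an assumption). Many of the methods compared in Proteome-pI differ mainly in such constants, and newer predictors such as IPC 2.0 go beyond a fixed table, so this is not how any particular method in the database is implemented.

```python
# Illustrative pKa values (an assumption of this sketch).
PKA = {
    "Nterm": 8.6, "Cterm": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,           # +1 when protonated
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,  # -1 when deprotonated
}
POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")


def net_charge(seq, ph):
    """Net charge of seq at the given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in POSITIVE + NEGATIVE}
    # Fraction protonated for basic groups: 1 / (1 + 10^(pH - pKa)).
    positive = 1 / (1 + 10 ** (ph - PKA["Nterm"]))
    positive += sum(counts[aa] / (1 + 10 ** (ph - PKA[aa])) for aa in POSITIVE)
    # Fraction deprotonated for acidic groups: 1 / (1 + 10^(pKa - pH)).
    negative = 1 / (1 + 10 ** (PKA["Cterm"] - ph))
    negative += sum(counts[aa] / (1 + 10 ** (PKA[aa] - ph)) for aa in NEGATIVE)
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisect pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

For example, `isoelectric_point("G")` lands at about 6.1, halfway between the assumed N- and C-terminal pKa values, because glycine has no ionizable side chain.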


Why is the isoelectric point important?

The isoelectric point is an important parameter for many techniques in analytical biochemistry and proteomics, especially 2D gel electrophoresis (2D-PAGE), capillary isoelectric focusing (cIEF), and liquid chromatography-mass spectrometry (LC-MS).

Methods

Proteome-pI currently predicts isoelectric points using the following methods:
For the accuracy of the individual methods, see here.

Reference Proteomes

Isoelectric points were predicted for the proteomes provided by the UniProt database (Release 2021_03). The proteomes were selected both manually and algorithmically, and they cover well-studied model organisms as well as other organisms of interest for biomedical research and phylogeny.

Statistics (total number of species: 20,115). For more details, see here.

Other datasets

Apart from the model organism proteomes, pI predictions for high-throughput analysis are also available for the nr, UniProt, and PDB databases:

Databases with experimentally derived isoelectric points



Proteome-pI 2.0 is available under the Creative Commons Attribution-NoDerivs license; for more details, see here.

Reference: Kozlowski LP. Proteome-pI 2.0: Proteome Isoelectric Point Database Update. Nucleic Acids Res. 2021, doi: 10.1093/nar/gkab944
Contact: Lukasz P. Kozlowski