Proteome-pI 2.0 (Proteome Isoelectric Point Database) is a database of pre-computed isoelectric points for the proteomes of different model organisms (20,115 species).
A full list of organisms can be seen here.

General statistics for the analyzed proteomes

Kingdom                     Number of  Total number   Mean number of   Mean size of     Mean mw of
                            proteomes   of proteins    proteins ± SD  proteins ± SD  proteins ± SD
Viruses                        10,064       518,140      51 ±     85      237 ± 300    26.6 ± 33.2
Archaea                           331       767,951   2,320 ±  1,263      278 ± 211    30.6 ± 23.1
Bacteria                        8,108    30,290,647   3,736 ±  1,785      320 ± 246    35.1 ± 26.8
Eukaryote (all isoforms)        1,612    29,752,296  18,457 ± 16,804      467 ± 471    52.1 ± 52.4
Eukaryote (main isoform)        1,612    25,437,198  15,780 ± 11,138      438 ± 420    48.8 ± 46.7
Eukaryote (minor isoforms)        637     4,315,098   6,774 ± 14,244      638 ± 676    71.2 ± 75.4

size - protein length in amino acids (aa); mw - molecular weight in kDa; SD - standard deviation

Kingdom                      IDM number  Median number     IDM size  Median size       IDM mw    Median mw
                            of proteins    of proteins  of proteins  of proteins  of proteins  of proteins
Viruses                              35             17          182          145         20.4         16.4
Archaea                           2,265          2,164          249          234         27.4         25.8
Bacteria                          3,583          3,503          287          274         31.4         30.0
Eukaryote (all isoforms)         15,677         13,001          391          353         43.6         39.4
Eukaryote (main isoform)         14,121         12,791          370          336         41.2         37.5
Eukaryote (minor isoforms)        3,916          1,000          534          476         59.7         53.2

size - protein length in amino acids (aa); mw - molecular weight in kDa; IDM - interdecile mean
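The tables above report three summaries of the same distributions: mean ± SD, median, and interdecile mean (IDM). A minimal sketch of how these can be computed is shown below. It assumes the IDM is the mean of the values remaining after discarding the lowest and highest 10% of the sorted data (floor(n/10) values from each end); the exact percentile rule used by the database is not stated here, and the input sizes are hypothetical.

```python
import statistics


def interdecile_mean(values):
    """Mean of the values left after trimming the lowest and highest 10%.

    Assumption for this sketch: floor(n / 10) values are dropped from each
    end of the sorted data; the database may use a different percentile rule.
    """
    data = sorted(values)
    k = len(data) // 10                      # values trimmed from each tail
    return statistics.mean(data[k:len(data) - k])


def summarize(values):
    """Mean, sample SD, median and IDM of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),      # sample standard deviation
        "median": statistics.median(values),
        "idm": interdecile_mean(values),
    }


# Hypothetical proteome sizes (proteins per proteome); one very large
# proteome pulls the mean up far more than the median or the IDM.
sizes = [12, 15, 17, 20, 25, 30, 40, 60, 90, 1500]
stats = summarize(sizes)
print(stats)
```

For this toy input the median is 27.5, the IDM 37.125 and the mean 180.9. The ordering median < IDM < mean matches every row of the tables, as expected for right-skewed distributions, and shows why the IDM is less sensitive than the mean to a few very large proteomes.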

As one can see, Viruses have the smallest proteomes (usually encoding only a handful of proteins) and compact proteins. Archaea follow with relatively small proteomes (~2.3k proteins) and short proteins (234-278 aa). The next group, Bacteria, encodes larger proteomes (~3.7k proteins) and longer proteins (274-320 aa).

Eukaryotes, on the other hand, are the most complex, with the largest proteomes (~18k proteins) and the longest proteins (336-476 aa). Moreover, many of them have multiple splice isoforms, which in some cases substantially increase proteome complexity (e.g., in humans, 20,600 proteins vs. 79,500 additional isoforms).

General isoelectric point statistics



Molecular weight and isoelectric points across kingdoms
Eukaryota encode the largest proteins, while viral proteins are the most compact (left plot). The isoelectric point of proteins, in turn, is tightly controlled in Eukaryota, most likely owing to efficient homeostasis, while archaeal proteins span a wide range of pI, as in extreme conditions these organisms frequently change their intracellular pH to spend less energy on homeostasis (right plot). Data for 331 Archaea, 4,046 Viruses, 8,105 Bacteria, and 1,612 Eukaryota proteomes with at least 50 proteins.
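The left plot concerns protein molecular weight. A minimal sketch of estimating the average molecular weight of a protein from its sequence follows. It assumes commonly tabulated average residue masses (amino acid minus one water) plus one water for the chain termini; these are illustrative values, not necessarily the ones used by Proteome-pI 2.0, and unknown residues such as X are simply skipped.

```python
# Approximate average residue masses in daltons (free amino acid minus one
# water); illustrative values assumed for this sketch.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # average mass of the H2O restored at the termini


def molecular_weight_kda(sequence):
    """Approximate average molecular weight of a protein in kDa.

    Letters without a tabulated mass (e.g. 'X' for unknown residues)
    contribute nothing in this sketch.
    """
    total = sum(AVERAGE_RESIDUE_MASS.get(aa, 0.0) for aa in sequence.upper())
    return (total + WATER_MASS) / 1000.0


print(round(molecular_weight_kda("MKTAYIAKQR"), 3))
```

Because every residue adds roughly 110 Da, protein length and mw track each other closely. For example, the viral means in the general statistics table (237 aa, 26.6 kDa) imply about 112 Da per residue.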

Isoelectric points for different methods
Each prediction method has slightly different characteristics for predicting isoelectric points, yet all of them show differences in predicted pI across kingdoms. The figure was composed from a random selection of 100 proteomes from each kingdom.
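Many pKa-based pI predictors share the same core: compute the net charge of a sequence at a given pH from the pKa values of its ionizable groups, then find the pH at which that charge is zero. Such methods then differ mainly in the pKa values they assign. The sketch below uses the Henderson-Hasselbalch equation with bisection. The pKa values are illustrative assumptions for this example only and are not the parameters of any method compared in the figure.

```python
# Illustrative pKa values, assumed for this sketch only; they are not the
# parameters of any particular method compared in the figure.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    counts = {aa: sequence.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1    # one free N- and C-terminus
    positive = sum(counts[g] / (1.0 + 10.0 ** (ph - pka))
                   for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10.0 ** (pka - ph))
                   for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisection for the pH at which the net charge is zero.

    Net charge decreases monotonically with pH, so the root is unique.
    """
    sequence = sequence.upper()
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


for seq in ("DDEEGG", "GGGG", "KKRRGG"):
    print(seq, round(isoelectric_point(seq), 2))
```

Bisection is used because it needs only the sign of the net charge and is guaranteed to converge on a monotonic function. Swapping in another pKa table changes the predicted pI, which is one source of the method-to-method differences shown in the figure.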


Dissociation constant (pKa) predictions according to charge location
The plot is based on a random selection of 400 proteomes (100 each from Viruses, Archaea, Bacteria, and Eukaryota).
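The location of a charge matters because chain termini and side chains carry separate pKa values. Assuming the standard Henderson-Hasselbalch model (a generic illustration; individual methods may add corrections or different group sets), the net charge Q at a given pH and the isoelectric point can be written as follows, where n_i is the number of groups of type i (one for each terminus) and pK_{a,i} is the pKa assigned to that group:

```latex
Q(\mathrm{pH}) = \sum_{i \in \{\mathrm{N_{term}},\,\mathrm{K},\,\mathrm{R},\,\mathrm{H}\}}
  \frac{n_i}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_{a,i}}}
  \;-\; \sum_{j \in \{\mathrm{C_{term}},\,\mathrm{D},\,\mathrm{E},\,\mathrm{C},\,\mathrm{Y}\}}
  \frac{n_j}{1 + 10^{\,\mathrm{p}K_{a,j} - \mathrm{pH}}},
\qquad
Q(\mathrm{pI}) = 0
```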


Amino acid frequency across kingdoms (%)

Amino acid                Viruses         Archaea        Bacteria       Eukaryota             All
Ala                 7.806 ± 0.078   8.951 ± 0.136  10.638 ± 0.034   7.376 ± 0.045   8.715 ± 0.045
Cys                 1.287 ± 0.018   0.899 ± 0.020   0.902 ± 0.003   1.852 ± 0.012   1.455 ± 0.014
Asp                 6.204 ± 0.016   6.995 ± 0.100   5.668 ± 0.006   5.341 ± 0.011   5.493 ± 0.008
Glu                 6.457 ± 0.031   6.457 ± 0.031   6.056 ± 0.009   6.555 ± 0.017   6.364 ± 0.012
Phe                 3.910 ± 0.019   3.650 ± 0.031   3.764 ± 0.009   3.794 ± 0.012   3.781 ± 0.008
Gly                 6.721 ± 0.030   7.841 ± 0.060   8.008 ± 0.014   6.354 ± 0.021   7.039 ± 0.019
His                 1.962 ± 0.011   1.860 ± 0.018   2.080 ± 0.003   2.498 ± 0.006   2.321 ± 0.006
Ile                 6.048 ± 0.050   6.030 ± 0.135   5.524 ± 0.022   4.942 ± 0.024   5.193 ± 0.018
Lys                 6.237 ± 0.051   4.184 ± 0.156   4.225 ± 0.025   5.640 ± 0.024   5.055 ± 0.022
Leu                 8.281 ± 0.014   9.107 ± 0.036  10.117 ± 0.008   9.385 ± 0.015   9.674 ± 0.009
Met                 2.511 ± 0.008   2.138 ± 0.030   2.312 ± 0.005   2.268 ± 0.006   2.286 ± 0.004
Asn                 4.990 ± 0.050   3.358 ± 0.083   3.353 ± 0.017   4.134 ± 0.027   3.815 ± 0.017
Pro                 4.247 ± 0.028   4.363 ± 0.029   4.815 ± 0.012   5.561 ± 0.022   5.241 ± 0.015
Gln                 3.620 ± 0.014   2.477 ± 0.033   3.491 ± 0.010   4.274 ± 0.023   3.938 ± 0.016
Arg                 5.306 ± 0.042   5.833 ± 0.071   6.177 ± 0.020   5.705 ± 0.018   5.896 ± 0.014
Ser                 6.474 ± 0.017   6.118 ± 0.059   5.746 ± 0.008   8.454 ± 0.015   7.326 ± 0.026
Thr                 6.141 ± 0.015   5.842 ± 0.048   5.578 ± 0.006   5.556 ± 0.016   5.571 ± 0.010
Val                 6.664 ± 0.021   8.164 ± 0.071   7.422 ± 0.012   6.238 ± 0.013   6.738 ± 0.013
Trp                 1.416 ± 0.009   1.058 ± 0.009   1.313 ± 0.003   1.239 ± 0.005   1.268 ± 0.003
Tyr                 3.714 ± 0.004   3.183 ± 0.037   2.809 ± 0.010   2.806 ± 0.010   2.815 ± 0.007
Xaa                 0.004 ± 0.001   0.010 ± 0.004   0.001 ± 0.000   0.028 ± 0.003   0.017 ± 0.002
Total amino acids     122,870,810     213,285,886   9,693,905,784  13,901,635,566  23,931,698,046

Note: errors were estimated by bootstrapping (1,000 resamples) at the proteome level.
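A proteome-level bootstrap like the one mentioned in the note can be sketched as follows. Assumptions for this illustration: each proteome is a list of protein sequences, frequencies are pooled over all residues of a resample, and 'X' stands for Xaa. The database may aggregate differently (for example, by averaging per-proteome frequencies), and the toy proteomes are hypothetical.

```python
import random
import statistics
from collections import Counter

# 20 standard residues plus 'X' for unknown residues (Xaa in the table).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"


def pooled_frequencies(proteomes):
    """Percentage of each residue type over all residues of the given proteomes."""
    counts = Counter()
    for proteome in proteomes:           # a proteome is a list of sequences
        for protein in proteome:
            counts.update(protein.upper())
    total = sum(counts.values())
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def bootstrap_frequencies(proteomes, replicates=1000, seed=0):
    """Mean and SD of residue frequencies over proteome-level resamples.

    Each replicate draws len(proteomes) whole proteomes with replacement,
    so the spread reflects variation between proteomes, not between proteins.
    """
    rng = random.Random(seed)
    samples = {aa: [] for aa in AMINO_ACIDS}
    for _ in range(replicates):
        resample = rng.choices(proteomes, k=len(proteomes))
        for aa, freq in pooled_frequencies(resample).items():
            samples[aa].append(freq)
    return {aa: (statistics.mean(v), statistics.stdev(v))
            for aa, v in samples.items()}


# Toy input: three hypothetical proteomes (lists of protein sequences).
toy = [["MKTAYIAK", "GGSA"], ["MDDEEK"], ["MAAALLKX", "KKR"]]
result = bootstrap_frequencies(toy, replicates=200)
mean_ala, sd_ala = result["A"]
print(f"Ala: {mean_ala:.3f} ± {sd_ala:.3f}")
```

Resampling whole proteomes rather than individual proteins keeps proteins from the same organism together, so the reported spread reflects how much frequencies vary between organisms.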

More amino acid statistics


Dipeptides: Viruses, Archaea, Bacteria, Eukaryota, All kingdoms

Tripeptides: Viruses, Archaea, Bacteria, Eukaryota (All kingdoms: NA)
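How the dipeptide and tripeptide statistics were computed is not described here. One plausible minimal sketch counts overlapping k-mers within each protein (k=2 for dipeptides, k=3 for tripeptides) and reports them as percentages. Counting only inside proteins, never across protein boundaries, is an assumption of this example, and the input sequences are hypothetical.

```python
from collections import Counter


def peptide_frequencies(sequences, k=2):
    """Percentage of each overlapping k-mer (k=2: dipeptides, k=3: tripeptides).

    Assumption for this sketch: k-mers are counted inside each protein only
    (never across protein boundaries) and no residue types are filtered out.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:                       # every sequence shorter than k
        return {}
    return {kmer: 100.0 * n / total for kmer, n in counts.items()}


proteins = ["MKKL", "MKL"]               # hypothetical toy sequences
print(peptide_frequencies(proteins, k=2))
# {'MK': 40.0, 'KK': 20.0, 'KL': 40.0}
```

Going from dipeptides to tripeptides multiplies the number of possible k-mers by 20 (400 vs. 8,000 for the standard residues), so far more sequence data is needed for stable tripeptide frequencies.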



Proteome-pI 2.0 is available under the Creative Commons Attribution-NoDerivs license; for more details, see here.

Reference: Kozlowski LP. Proteome-pI 2.0: Proteome Isoelectric Point Database Update. Nucleic Acids Res. 2021, doi: 10.1093/nar/gkab944

Contact: Lukasz P. Kozlowski