=======================================================================
Instructions for downloading the PEPTIDE part of the "Proteome-pI 2.0" database
=======================================================================

The proteomes of all 20,115 organisms were digested in silico with the five most frequently used proteases (Trypsin, Chymotrypsin, Trypsin+LysC, LysN, ArgC), and the isoelectric point of the resulting peptides was then predicted. This resulted in ~300k files that can be downloaded one by one. Because the files can be very large, they are compressed with zip.

Caution: before you start, make sure that you have enough disk space (~15 TB) and resources (unzipping takes a lot of time and CPU).

To download the files, you will need to write a simple script (bash, Python, or any other language) that downloads the selection of the data you want (or all of it). The files are located in the following subdirectories:

http://isoelectricpointdb2.org/Archaea_rpg_zip/MSlow/
http://isoelectricpointdb2.org/Archaea_rpg_zip/MShigh/
http://isoelectricpointdb2.org/Archaea_rpg_zip/MALDI/
http://isoelectricpointdb2.org/Archaea_rpg_zip/ESI/
http://isoelectricpointdb2.org/Archaea_rpg_zip/LTQ/
http://isoelectricpointdb2.org/Viruses_rpg_zip/MSlow/
http://isoelectricpointdb2.org/Viruses_rpg_zip/MShigh/
http://isoelectricpointdb2.org/Viruses_rpg_zip/MALDI/
http://isoelectricpointdb2.org/Viruses_rpg_zip/ESI/
http://isoelectricpointdb2.org/Viruses_rpg_zip/LTQ/
http://isoelectricpointdb2.org/Bacteria_rpg_zip/MSlow/
http://isoelectricpointdb2.org/Bacteria_rpg_zip/MShigh/
http://isoelectricpointdb2.org/Bacteria_rpg_zip/MALDI/
http://isoelectricpointdb2.org/Bacteria_rpg_zip/ESI/
http://isoelectricpointdb2.org/Bacteria_rpg_zip/LTQ/
http://isoelectricpointdb2.org/Eukaryota_rpg_zip/MSlow/
http://isoelectricpointdb2.org/Eukaryota_rpg_zip/MShigh/
http://isoelectricpointdb2.org/Eukaryota_rpg_zip/MALDI/
http://isoelectricpointdb2.org/Eukaryota_rpg_zip/ESI/
http://isoelectricpointdb2.org/Eukaryota_rpg_zip/LTQ/

Each directory contains five files per proteome (one per enzyme), for instance:

http://isoelectricpointdb2.org/Archaea_rpg_zip/MSlow/UP000509750_2743090_all_ArgC.MSlow.fasta.pI.csv.zip
http://isoelectricpointdb2.org/Archaea_rpg_zip/MSlow/UP000509750_2743090_all_ChTry.MSlow.fasta.pI.csv.zip
http://isoelectricpointdb2.org/Archaea_rpg_zip/MSlow/UP000509750_2743090_all_LysN.MSlow.fasta.pI.csv.zip
http://isoelectricpointdb2.org/Archaea_rpg_zip/MSlow/UP000509750_2743090_all_TryLysC.MSlow.fasta.pI.csv.zip
http://isoelectricpointdb2.org/Archaea_rpg_zip/MSlow/UP000509750_2743090_all_Try.MSlow.fasta.pI.csv.zip

Thus the naming pattern is:

uniprotid_taxid_all_enzyme.fraction.fasta.pI.csv.zip

where enzyme is one of ArgC, ChTry (Chymotrypsin), LysN, TryLysC (Trypsin+LysC) or Try (Trypsin), and fraction is the name of the subdirectory (MSlow, MShigh, MALDI, ESI or LTQ).

For convenience, if you want to download all the data, you can use the lists of files for the above directories:

http://isoelectricpointdb2.org/ls_Archaea_rpg_zip_MSlow.txt
http://isoelectricpointdb2.org/ls_Archaea_rpg_zip_MShigh.txt
http://isoelectricpointdb2.org/ls_Archaea_rpg_zip_ESI.txt
http://isoelectricpointdb2.org/ls_Archaea_rpg_zip_LTQ.txt
http://isoelectricpointdb2.org/ls_Archaea_rpg_zip_MALDI.txt
http://isoelectricpointdb2.org/ls_Bacteria_rpg_zip_MSlow.txt
http://isoelectricpointdb2.org/ls_Bacteria_rpg_zip_MShigh.txt
http://isoelectricpointdb2.org/ls_Bacteria_rpg_zip_ESI.txt
http://isoelectricpointdb2.org/ls_Bacteria_rpg_zip_LTQ.txt
http://isoelectricpointdb2.org/ls_Bacteria_rpg_zip_MALDI.txt
http://isoelectricpointdb2.org/ls_Viruses_rpg_zip_MSlow.txt
http://isoelectricpointdb2.org/ls_Viruses_rpg_zip_MShigh.txt
http://isoelectricpointdb2.org/ls_Viruses_rpg_zip_ESI.txt
http://isoelectricpointdb2.org/ls_Viruses_rpg_zip_LTQ.txt
http://isoelectricpointdb2.org/ls_Viruses_rpg_zip_MALDI.txt
http://isoelectricpointdb2.org/ls_Eukaryota_rpg_zip_MSlow.txt
http://isoelectricpointdb2.org/ls_Eukaryota_rpg_zip_MShigh.txt
http://isoelectricpointdb2.org/ls_Eukaryota_rpg_zip_ESI.txt
http://isoelectricpointdb2.org/ls_Eukaryota_rpg_zip_LTQ.txt
http://isoelectricpointdb2.org/ls_Eukaryota_rpg_zip_MALDI.txt

You can download the files from two mirrors:
http://isoelectricpointdb2.org
http://isoelectricpointdb2.mimuw.edu.pl

=======================================================================
References:

Kozlowski LP (2022) Proteome-pI 2.0: proteome isoelectric point database update. Nucleic Acids Res. (Database Issue) 50 (D1): D1535-D1540. doi: 10.1093/nar/gkab944

Kozlowski LP (2021) IPC 2.0: prediction of isoelectric point and pKa dissociation constants. Nucleic Acids Res. 49 (W1): W285-W292. doi: 10.1093/nar/gkab295

======================================================================
__author__ = "Lukasz Pawel Kozlowski"
__email__ = "lukaszkozlowski.lpk@gmail.com"
__copyrights__ = "Lukasz Pawel Kozlowski"
__website_ = "http://isoelectricpointdb2.org"
__license__ = "http://isoelectricpointdb2.org/license.txt"
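As a starting point for the download script mentioned above, here is a minimal Python sketch (standard library only). It builds file URLs from the uniprotid_taxid_all_enzyme.fraction.fasta.pI.csv.zip pattern and downloads the files named in one ls_*.txt list. This is an illustration, not an official tool: all function names are hypothetical, and the format of the ls_*.txt files is an assumption (the sketch accepts either one file name per line or `ls -l`-style lines whose last field is the file name).

```python
"""Minimal sketch for bulk-downloading Proteome-pI 2.0 peptide files.

Assumptions (not from the database documentation):
- each line of an ls_*.txt list ends with a file name; for `ls -l`-style
  lines only the last whitespace-separated field ending in ".zip" is kept;
- the helper names below (build_file_url, parse_listing, ...) are hypothetical.
"""
import os
import urllib.request

MIRRORS = ["http://isoelectricpointdb2.org",
           "http://isoelectricpointdb2.mimuw.edu.pl"]
KINGDOMS = ["Archaea", "Bacteria", "Eukaryota", "Viruses"]
FRACTIONS = ["MSlow", "MShigh", "MALDI", "ESI", "LTQ"]
ENZYMES = ["ArgC", "ChTry", "LysN", "TryLysC", "Try"]


def build_file_url(kingdom, fraction, proteome_id, taxid, enzyme, mirror=MIRRORS[0]):
    """Build a URL following uniprotid_taxid_all_enzyme.fraction.fasta.pI.csv.zip."""
    name = f"{proteome_id}_{taxid}_all_{enzyme}.{fraction}.fasta.pI.csv.zip"
    return f"{mirror}/{kingdom}_rpg_zip/{fraction}/{name}"


def listing_url(kingdom, fraction, mirror=MIRRORS[0]):
    """URL of the ls_<kingdom>_rpg_zip_<fraction>.txt list of files."""
    return f"{mirror}/ls_{kingdom}_rpg_zip_{fraction}.txt"


def parse_listing(text):
    """Extract .zip file names from a list of files (assumed format, see above)."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[-1].endswith(".zip"):
            # keep only the base name in case the list stores paths
            names.append(fields[-1].rsplit("/", 1)[-1])
    return names


def download_fraction(kingdom, fraction, out_dir, mirror=MIRRORS[0], enzymes=None):
    """Download every file named in one list, skipping files already on disk."""
    with urllib.request.urlopen(listing_url(kingdom, fraction, mirror)) as resp:
        names = parse_listing(resp.read().decode("utf-8", errors="replace"))
    target = os.path.join(out_dir, f"{kingdom}_rpg_zip", fraction)
    os.makedirs(target, exist_ok=True)
    for name in names:
        # optional filter on the enzyme part of the name, e.g. ["Try"]
        enzyme = name.split("_all_", 1)[-1].split(".", 1)[0]
        if enzymes and enzyme not in enzymes:
            continue
        path = os.path.join(target, name)
        if os.path.exists(path):
            continue  # lets an interrupted run be restarted
        url = f"{mirror}/{kingdom}_rpg_zip/{fraction}/{name}"
        tmp = path + ".part"
        urllib.request.urlretrieve(url, tmp)
        os.replace(tmp, path)  # only complete downloads get the final name
```

For example, `download_fraction("Archaea", "MSlow", "proteome_pi_peptides", enzymes=["Try"])` would fetch only the Trypsin files of the archaeal MSlow fraction, and passing `mirror=MIRRORS[1]` uses the second mirror. Downloads run sequentially and are written to a temporary `.part` file first, so re-running the script after an interruption skips only files that finished completely.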