The SEquence Globally Unique IDentifier (SEGUID) Proteome Database is built around a unique protein sequence identifier based on the Secure Hash Algorithm (SHA-1) digest of the primary sequence. We developed it because our bioinformatics, analytical, and high-throughput proteomics pipelines suffered from changing and disappearing protein identifiers. A SEGUID is stable for the lifetime of a protein and is used as the central identifier, while all other aliases are treated as dynamic properties. Anyone can derive the same SEGUID from the sequence information alone, which allows easy data sharing. Using SEGUIDs ensures that proteomics data are resilient to changes in annotation databases and that the reports generated reflect the most recent annotations collected from sequence databases. Our SEGUID website provides a number of web applications and web services, which are described in this manuscript. The FTP site provides pre-calculated data, FASTA files, alias tables, and sample programs that describe the web services and show how other applications can consume them.
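
Because the identifier depends only on the sequence, it can be recomputed locally. The following is a minimal sketch rather than the database's own implementation: the text above states only that a SEGUID is based on the SHA-1 digest of the primary sequence, so the whitespace removal, upper-casing, Base64 encoding, and stripping of the trailing "=" padding are assumptions (chosen because they yield 27-character identifiers like those shown in the FASTA headers in this document). The function name `seguid` and the sample sequence are illustrative only.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Return a SEGUID-style identifier for a protein sequence.

    Assumed recipe (not confirmed by the text above): strip whitespace,
    upper-case, SHA-1, Base64-encode, drop the trailing '=' padding.
    """
    # Normalise so that line breaks and letter case do not change the result.
    normalized = "".join(sequence.split()).upper()
    # SHA-1 yields a 20-byte digest.
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    # 20 bytes encode to 28 Base64 characters, the last being '=' padding.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    # Made-up peptide, used only to show the call.
    print(seguid("MKTAYIAKQR"))
```

Two parties running the same function on the same sequence obtain the same string without consulting any registry, which is the property that makes the identifier suitable for data sharing.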

SEGUID is meant to replace the 64-bit Cyclic Redundancy Check (CRC64).


Below are two immunoglobulin fragments that are identical except where indicated with "*". The SEGUID is the third "|"-delimited field after the '>' sign.

>gnl|sha|BpBeDdcNUYNsdk46JoJdw7Pd3BI|immunoglobulin lambda light chain variable region [Homo sapiens]
>gnl|sha|X5XEaayob1nZLOc7eVT9qyczarY|immunoglobulin lambda light chain variable region [Homo sapiens]
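
The identifier can be pulled out of such a header line by splitting on "|". This is a small sketch that assumes every header follows the `>gnl|sha|<SEGUID>|<description>` layout of the two lines above; the function name `seguid_from_header` is hypothetical.

```python
def seguid_from_header(header: str) -> str:
    """Extract the SEGUID from a '>gnl|sha|<SEGUID>|<description>' header."""
    # Drop the leading '>' and split into '|'-delimited fields.
    fields = header.strip().lstrip(">").split("|")
    # Reject headers that do not match the assumed layout.
    if len(fields) < 3 or fields[0] != "gnl" or fields[1] != "sha":
        raise ValueError(f"not a gnl|sha header: {header!r}")
    # The SEGUID is the third field.
    return fields[2]


if __name__ == "__main__":
    header = (
        ">gnl|sha|BpBeDdcNUYNsdk46JoJdw7Pd3BI|"
        "immunoglobulin lambda light chain variable region [Homo sapiens]"
    )
    print(seguid_from_header(header))  # BpBeDdcNUYNsdk46JoJdw7Pd3BI
```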


