Each file contains a list of gene pairs from a single genome in version 13.1 of BioCyc. Each pair carries a label of "1" if it is considered to be functionally related and "-1" otherwise. A pair of genes is considered to be functionally related if the products of the two genes catalyze reactions in the same metabolic pathway, belong to the same protein complex, or take part in the same signaling pathway, as found in version 13.1 of BioCyc.
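As a rough illustration of reading one of these files, the sketch below parses labeled gene pairs from a gzip-compressed text file. The exact layout of the files is not described here, so the three whitespace-separated fields per line (gene, gene, label), the absence of a header line, and the function names `parse_pairs` and `read_pairs_gz` are all assumptions; adjust them to match the actual files.

```python
import gzip


def parse_pairs(lines):
    """Yield (gene_a, gene_b, label) tuples from an iterable of text lines.

    Assumption: each line holds three whitespace-separated fields,
    two gene IDs followed by a label of 1 or -1. The real files may differ.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and lines with an unexpected layout
        gene_a, gene_b, raw_label = fields
        label = int(raw_label)
        if label not in (1, -1):
            raise ValueError(f"unexpected label: {label}")
        yield gene_a, gene_b, label


def read_pairs_gz(path):
    """Yield labeled pairs from a gzip-compressed file at `path`."""
    with gzip.open(path, "rt") as handle:
        yield from parse_pairs(handle)
```

Because `read_pairs_gz` is a generator, the pairs can be streamed one at a time, which may matter for the full sets since they contain every pair of genes in a genome.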
The full set of samples contains all pairs of genes in the corresponding genome. This set is not used in the paper to report results because of its likely high ratio of mislabeled negative samples. The known-function set is designed to discard all genes for which no knowledge of the function is available. These genes would result in mislabeled negative examples: since their function is not known, a positive label will never be assigned to a pair involving them. The sm-enzyme set of samples contains pairs only for genes whose product is an enzyme in a small-molecule reaction.
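The idea behind the known-function set can be sketched as a filter over the full set. This is only an illustration of the principle, not the procedure used to build the files: the function name `keep_known_function_pairs` and the `known_genes` collection (a set of gene IDs with some annotated function) are hypothetical, and the sketch assumes a pair is kept only when both of its genes have a known function.

```python
def keep_known_function_pairs(pairs, known_genes):
    """Drop every pair that involves a gene without a known function.

    `pairs` is an iterable of (gene_a, gene_b, label) tuples and
    `known_genes` is a set of gene IDs (hypothetical input).
    """
    return [
        (gene_a, gene_b, label)
        for gene_a, gene_b, label in pairs
        if gene_a in known_genes and gene_b in known_genes
    ]
```

The same pattern would apply to the sm-enzyme set, with `known_genes` replaced by the set of genes whose product is an enzyme in a small-molecule reaction.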
The gene IDs used in these files are the BioCyc gene IDs. For Escherichia coli K-12, the UniProt IDs for the products of the genes are given in a separate file. This file lists the EcoCyc gene ID, the EcoCyc ID of its product, and the UniProt ID of this product (if any). A gene might have more than one product, in which case the file contains more than one line for that gene.
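A minimal sketch of loading the mapping file into a dictionary follows. The column order (gene ID, product ID, UniProt ID) comes from the description above, but the tab delimiter, the handling of a missing UniProt ID as an empty or absent third column, and the name `parse_mapping` are assumptions. Since a gene may appear on several lines, each gene maps to a list of products.

```python
from collections import defaultdict


def parse_mapping(lines):
    """Map each gene ID to a list of (product_id, uniprot_id) tuples.

    Assumption: fields are tab-separated; `uniprot_id` is None when the
    third column is empty or missing.
    """
    mapping = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        gene_id, product_id = fields[0], fields[1]
        uniprot_id = None
        if len(fields) > 2 and fields[2].strip():
            uniprot_id = fields[2].strip()
        # A gene with several products contributes one entry per line.
        mapping[gene_id].append((product_id, uniprot_id))
    return dict(mapping)
```

With the file opened in text mode, `parse_mapping(handle)` returns the whole mapping, and `mapping.get(gene_id, [])` gives the products for a gene from one of the pair files.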
The full database for each organism is available on the BioCyc download page.
Organism | Full Set | Known Function Set | Sm Enzyme Set | Mapping |
Escherichia coli K-12 substr. MG1655 | gz | gz | gz | txt |
Escherichia coli O157:H7 EDL933 | gz | gz | gz | |
Escherichia coli CFT073 | gz | gz | gz | |
Shigella flexneri 2a str. 2457T | gz | gz | gz | |
Vibrio cholerae O1 biovar El Tor str. N16961 | gz | gz | gz | |
Caulobacter crescentus CB15 | gz | gz | gz | |
Mycobacterium tuberculosis CDC1551 | gz | gz | gz | |
Mycobacterium tuberculosis H37Rv | gz | gz | gz | |
Francisella tularensis tularensis SCHU S4 | gz | gz | gz | |
Helicobacter pylori 26695 | gz | gz | gz | |