Ortholog DB Setup Instructions

Context

Starting with Pathway Tools release 9.5, the comparative genome browser can display orthologs between several organisms. For this to work, the orthologs must be precomputed and made accessible to the Pathway Tools software. Orthologs are also used for various other comparative queries between organisms.

The main method for storing and querying orthologs is a dedicated ortholog server backed by MySQL. Pathway Tools contains code for retrieving orthologs from the ortholog server.

Pathway Tools users working with their own proprietary organisms must first precompute the orthologs between those organisms, in the format described below, so that the orthologs are expressed in terms of the gene IDs used in those organisms' PGDBs.

Ortholog DB Server Setup Instructions

This section describes how the MySQL server is loaded with the ortholog data, which happens in two stages.

Stage 1. Creating Ortho-Tab Files

One or more files containing the ortholog data need to be created. These files will be loaded into MySQL in Stage 2. The file format, called Ortho-Tab, is simple: each row contains 5 Tab-delimited columns and describes an ortholog link between one gene in one PGDB and another gene in another PGDB. Each link needs to appear only once in the total set of files, because the retrieval code queries in both directions. This halves the number of ortholog rows that have to be written to files and stored in MySQL.

The 5 columns are: GeneID1, GeneID2, OrgID1, OrgID2, PValue. GeneID1 and GeneID2 are the frame IDs of gene frames in their respective PGDBs and need to be unique within their PGDBs. OrgID1 and OrgID2 are the unique IDs of their respective PGDBs. PValue is a double-precision floating-point number containing the PValue of the BLAST score. The PValue is used by Pathway Tools for propagating gene function information between some sets of PGDBs. An example line from an Ortho-Tab file looks like:

CC0008 CBU_0001 CAULO CBUR227377 .00000000000000000000000000000000000000000000000000000000000023
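
Hand-made Ortho-Tab files can be sanity-checked before they are loaded. The following is a minimal Python sketch, not part of Pathway Tools, that splits one row into the five columns described above. The names `OrthoLink` and `parse_ortho_tab_line` are hypothetical, and the sketch assumes the columns are separated by single Tab characters.

```python
from typing import NamedTuple


class OrthoLink(NamedTuple):
    """One Ortho-Tab row (hypothetical helper type, not a Pathway Tools class)."""
    gene_id1: str
    gene_id2: str
    org_id1: str
    org_id2: str
    p_value: float


def parse_ortho_tab_line(line: str) -> OrthoLink:
    """Split one Ortho-Tab row into its five Tab-delimited columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 Tab-delimited columns, got {len(fields)}")
    gene_id1, gene_id2, org_id1, org_id2, p_value = fields
    if not all((gene_id1, gene_id2, org_id1, org_id2)):
        raise ValueError("gene and organism IDs must be non-empty")
    # float() accepts the leading-dot notation used in the example row.
    return OrthoLink(gene_id1, gene_id2, org_id1, org_id2, float(p_value))


# The example row from above, with explicit Tab separators.
example = (
    "CC0008\tCBU_0001\tCAULO\tCBUR227377\t"
    ".00000000000000000000000000000000000000000000000000000000000023"
)
link = parse_ortho_tab_line(example)
print(link.gene_id1, link.gene_id2, link.org_id1, link.org_id2, link.p_value)
```

A row with a missing column or a non-numeric PValue raises `ValueError`, which makes it easy to locate a malformed line before Stage 2.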

Starting with Pathway Tools release 29.0, there is support for computing orthologs directly from PGDBs in a fairly simple manner, by utilizing the Diamond external program. This should work quite well for modest numbers of PGDBs, like one or two dozen.

First, it is necessary to install Diamond, which can be found here: https://github.com/bbuchfink/diamond . We have tested with Diamond version 2.0.14 so far.

Second, start Pathway Tools in order to get a Lisp prompt, i.e. execute ptlisp from a shell. Then, at the Lisp prompt, type the following (customized to your own directory location):

(setf ortho::*diamond-prepare-protseq-db-executable-dir* "/home/user/diamond/")

In other words, the global variable will be set to the directory in which the diamond executable resides.
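
Before setting the variable, it may help to confirm that the directory really contains a runnable Diamond binary. The following Python sketch is an illustration only: `find_diamond` is a hypothetical helper, and the executable name `diamond` is an assumption that should be adjusted if your installation names it differently.

```python
import os


def find_diamond(directory: str, name: str = "diamond") -> str:
    """Return the full path of the Diamond executable in `directory`, or raise."""
    candidate = os.path.join(directory, name)
    if not os.path.isfile(candidate):
        raise FileNotFoundError(f"no file named {name!r} in {directory}")
    if not os.access(candidate, os.X_OK):
        raise PermissionError(f"{candidate} is not executable")
    return candidate


# Example call (the path is the sample directory used above):
#   find_diamond("/home/user/diamond/")
```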

Third, invoke the command that prepares the data Diamond needs from each PGDB in the ortholog computation, computes the orthologs, and places the resulting Ortho-Tab files in a subdirectory. The command is as follows and should be customized accordingly:

(ortho::simple-ortho-tabs-between-orgids '("ECOLI" "BSUB" "ABC" "XYZ") "29.0" "/var/orthos/")

The first argument of the command is a list of orgids (= PGDB IDs). The second argument is the version string for these PGDBs. The third argument is the directory into which all results will be placed.

The resulting Ortho-Tab files should be found in the ortho-tab-files subdirectory of that directory.
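
Since each ortholog link needs to appear only once across all Ortho-Tab files, regardless of direction, files assembled by hand can be checked for repeats before loading. The Python sketch below is one possible check, not a Pathway Tools feature. `find_duplicate_links` is a hypothetical helper, and it assumes every regular file in the given directory is an Ortho-Tab file.

```python
import glob
import os


def find_duplicate_links(ortho_tab_dir: str) -> list[tuple[str, int]]:
    """Return (file name, line number) for every row whose link was already seen.

    A link is treated as unordered: A-B and B-A count as the same link.
    """
    seen = set()
    duplicates = []
    for path in sorted(glob.glob(os.path.join(ortho_tab_dir, "*"))):
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                fields = line.rstrip("\r\n").split("\t")
                if len(fields) != 5:
                    continue  # malformed rows are skipped here, not validated
                gene1, gene2, org1, org2, _p_value = fields
                # Unordered key, so the direction of the link does not matter.
                key = frozenset([(org1, gene1), (org2, gene2)])
                if key in seen:
                    duplicates.append((os.path.basename(path), line_number))
                else:
                    seen.add(key)
    return duplicates


# Example call (the path is the sample directory used above):
#   find_duplicate_links("/var/orthos/ortho-tab-files/")
```

An empty result means no link was listed twice in either direction.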

Stage 2. Populating MySQL from the Ortho-Tab Files

  1. Ensure that a MySQL server (version 8 or later) is running, and that you have permissions to create a table in a database and to load data (see MySQL server details below).
  2. Create the "orthologs" database schema in your database, with the mysql program:
    mysql> create database orthologs;
    NOTE: Whatever name you give the ortholog database,
          also set this value in the ptools-init.dat file via the Ortho-RDBMS-Database-Name 
          configuration directive.  
  3. Start up a Pathway Tools image.
  4. Ensure that the ec::*ortholog-link-host* variable is set correctly, pointing to the ortholog-link server. It is set by the parameter called Ortho-RDBMS-Server-Hostname in the ptools-init.dat file, along with 3 more related parameters. (See below.)
  5. The ortholog data is stored in one SQL table called Orthologs . If this table already exists and was used for a prior version of the data, then this table needs to be dropped, by running the following at the LISP prompt:
    (connect-to-ortholog-link-db-if-needed)
    (dbi.mysql:sql "DROP TABLE Orthologs" :db *ortholog-link-db*)
    
  6. Create the Orthologs table, populate from the Ortho-Tab files, and build the indices, by running the following at the LISP prompt:
    (init-ortho-link-db "/var/orthos/ortho-tab-files/")
    
    Replace the example path "/var/orthos/ortho-tab-files/" with the directory location of where the Ortho-Tab files are located.

    This could take several hours to run to completion.
    For the 189 PGDBs of the 9.5 release, this took 37 min., running on cumin. (kr:Nov-6-2005)
    For 400 11.5 PGDBs, it took over 7 hrs., running on baharat. (kr:Oct-1-2007).
  7. The ortholog server should now be ready to use.
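
Because each link is stored only once, a lookup has to match a gene in either of the two gene columns. The Python sketch below illustrates that idea with an in-memory SQLite database standing in for MySQL. The column names are an assumption that mirrors the Ortho-Tab columns; the actual table layout is whatever init-ortho-link-db creates, and `orthologs_of` is a hypothetical helper, not the Pathway Tools retrieval code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column names, mirroring the five Ortho-Tab columns.
conn.execute(
    "CREATE TABLE Orthologs ("
    "GeneID1 TEXT, GeneID2 TEXT, OrgID1 TEXT, OrgID2 TEXT, PValue REAL)"
)
# One stored row per link; the PValue here is an illustrative number.
conn.execute(
    "INSERT INTO Orthologs VALUES (?, ?, ?, ?, ?)",
    ("CC0008", "CBU_0001", "CAULO", "CBUR227377", 1e-30),
)


def orthologs_of(conn, org_id, gene_id):
    """Return (OrgID, GeneID) partners of a gene, looking in both columns."""
    rows = conn.execute(
        "SELECT OrgID2, GeneID2 FROM Orthologs WHERE OrgID1 = ? AND GeneID1 = ? "
        "UNION "
        "SELECT OrgID1, GeneID1 FROM Orthologs WHERE OrgID2 = ? AND GeneID2 = ?",
        (org_id, gene_id, org_id, gene_id),
    )
    return sorted(rows.fetchall())


# The single stored row is found from either side of the link.
print(orthologs_of(conn, "CAULO", "CC0008"))
print(orthologs_of(conn, "CBUR227377", "CBU_0001"))
```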

MySQL server details

  1. The mysql server usually runs as its own user (mysql). Ensure that:
    • Your ortholog data files AND directories are accessible by the mysql user/group.
    • Some Linux distros (in particular Debian-based ones) use a security feature such as AppArmor that limits which files and directories services like mysql can access. AppArmor profiles are usually stored in: /etc/apparmor.d/usr.sbin.mysqld
      Review or adjust the paths listed there so that mysql has access to your data files.
  2. Ensure you have enough free disk space for your ortholog data on the mysql server. One of our MySQL servers once ran out of disk space while the indices were being built. The problem is that it ended up hanging forever and never returned any error message about the cause. Running
    df
    should give an indication of whether a disk partition is used up 100%. Also, the MySQL logs, stored at /var/log/mysql/ are likely to contain a disk space error message. However, ordinary users do not have read permissions for these logs...
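  3. Ensure the user account you use to load the data has sufficient permissions to load data. Currently, our mysql interface only supports server-side loading of data files. On MySQL 8, the GRANT statement no longer accepts an IDENTIFIED BY clause, so create the account first (if it does not exist yet) and then grant it the FILE privilege:
    create user 'dbuserid'@'localhost' identified by 'dbpassword';
    grant file on *.* to 'dbuserid'@'localhost';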
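
Since a full disk made the index build hang without any error message, a free-space check before starting the load can save time. The following Python sketch does the same job as reading the df output, using only the standard library. `has_free_space` is a hypothetical helper, and both the data directory path and the size threshold in the example call are assumptions to replace with your own values.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Report whether the filesystem holding `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


# Example call, run on the MySQL server host. The path below is a common
# location for the MySQL data directory, but that is an assumption:
#   has_free_space("/var/lib/mysql", 20 * 1024**3)   # 20 GiB
```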

Pathway Tools configuration

In order to make use of the MySQL database for ortholog queries, you must modify a few Pathway Tools parameters stored in the ptools-init.dat configuration file. These are the parameters you need to configure:
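  • Ortho-RDBMS-Server-Hostname XXXXX (hostname of your ortholog MySQL server; see stage 2, step #4 above)
  • Ortho-RDBMS-Server-Port 3306 (default mysql port, ask your DBA if you're not sure)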
  • Ortho-RDBMS-Database-Name XXXXX (whatever name you called your ortholog database in stage 2, step #2 above).
  • Ortho-RDBMS-Username XXXXX (username to access your mysql DB)
  • Ortho-RDBMS-Password XXXXX (password you use to access your mysql DB)
  • Get-Orthologs-From-SRI N (If you're behind a firewall, you'll want to set this to "N", otherwise, each ortholog query will attempt to query SRI's public ortholog database also.)
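
A quick way to catch a forgotten setting is to scan the configuration text for the parameter names mentioned in this document. The Python sketch below assumes each setting sits on its own line as a name followed by whitespace and a value, as in the list above. That line format is an assumption about ptools-init.dat, and `missing_ortho_settings` is a hypothetical helper.

```python
# Parameter names taken from this document.
REQUIRED = (
    "Ortho-RDBMS-Server-Hostname",
    "Ortho-RDBMS-Server-Port",
    "Ortho-RDBMS-Database-Name",
    "Ortho-RDBMS-Username",
    "Ortho-RDBMS-Password",
    "Get-Orthologs-From-SRI",
)


def missing_ortho_settings(config_text: str) -> list[str]:
    """Return the required parameter names that have no value in the text."""
    found = {}
    for line in config_text.splitlines():
        # Assumed format: "<name><whitespace><value>" on one line.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in REQUIRED:
            found[parts[0]] = parts[1].strip()
    return [name for name in REQUIRED if not found.get(name)]


sample = """\
Ortho-RDBMS-Server-Hostname dbhost.example.org
Ortho-RDBMS-Server-Port 3306
Ortho-RDBMS-Database-Name orthologs
Ortho-RDBMS-Username dbuserid
Get-Orthologs-From-SRI N
"""
print(missing_ortho_settings(sample))
```

In this sample the password line is absent, so it is the only name reported.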

Please let us know if you run into trouble with any of this, and we will help guide you through it. Not many of our users have experimented with their own ortholog servers, so the setup is not very user-friendly yet.