Prerequisites

To build OpenCGA from source code you must first get the source code of OpenCGA from GitHub. Most of the dependencies, including OpenCB dependencies, will be fetched from Maven Central Repository; however, in some scenarios OpenCB dependencies will need to be built from GitHub source code. Compiling and building are carried out by Apache Maven. The following tools are required for a successful build:

  • Java JDK 1.8 (JDK 1.8.0_60+)
  • Apache Maven 3

You can learn how to install them in the Installation Guide > Server Configuration section.
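
Before building, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch and not part of OpenCGA: the helper name check_tool is an assumption made for illustration, and git is included only because the source code is cloned from GitHub.

```bash
#!/usr/bin/env bash
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper, not an OpenCGA script.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
        return 0
    fi
    echo "$1: MISSING"
    return 1
}

# Tools needed to fetch and build OpenCGA from source.
missing=0
for tool in git java mvn; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

When everything is found, `java -version` and `mvn -v` print the installed versions so they can be compared with the requirements listed above.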

Getting and Compiling Dependencies

Like any other software, OpenCGA has dependencies. Some come from other OpenCB projects, such as CellBase, while others are third-party dependencies, such as MongoDB. All OpenCGA stable releases are merged and tagged at the master branch (users are encouraged to use the latest stable release for production); you can find all releases at OpenCGA Releases. We guarantee that all the dependencies needed to build a stable release are deployed at Maven Central Repository, both OpenCB and third-party. Therefore, to build a stable release you only need to clone the OpenCGA repository itself, since all the dependencies will be fetched from Maven Central Repository.

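This is different for development branches. Active OpenCGA development is carried out at the develop branch. In this branch third-party dependencies are still fetched from Maven Central Repository, but OpenCB dependencies are not, since they are very likely still in development and therefore not deployed. Keep in mind that we only guarantee that develop compiles, and bugs are expected; use this branch for development or for testing new functionalities. So, to build the develop branch you may need to download and install the following OpenCB repositories in this order:

  • java-common-libs: https://github.com/opencb/java-common-libs (branch 'develop')
  • biodata: https://github.com/opencb/biodata (branch 'develop')
  • datastore: https://github.com/opencb/datastore (branch 'develop')
  • cellbase: https://github.com/opencb/cellbase (branch 'develop')
  • hpg-bigdata: https://github.com/opencb/hpg-bigdata (branch 'develop')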

As you can see, one of our rules is that the develop branch of all major applications, such as OpenCGA and CellBase, always depends on develop branches. So, if you really want to build develop, you can clone and build the dependencies by executing:

Code Block
languagebash
themeRDark
titleClone Dependencies
## Clone develop branch
git clone -b develop https://github.com/opencb/java-common-libs.git
git clone -b develop https://github.com/opencb/biodata.git
git clone -b develop https://github.com/opencb/datastore.git
git clone -b develop https://github.com/opencb/cellbase.git
git clone -b develop https://github.com/opencb/hpg-bigdata.git

## Now you can execute the following command in each of the folders, in the order specified above
mvn clean install -DskipTests
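
The clone-and-build steps for the repositories named in this section can also be scripted so that the order is kept and the process stops at the first failure. This is only a sketch under stated assumptions: the helper names run and build_deps and the DRY_RUN variable are hypothetical, and the script defaults to printing the commands rather than running them.

```bash
#!/usr/bin/env bash
# OpenCB dependencies, in the build order given in this section.
DEPS="java-common-libs biodata datastore cellbase hpg-bigdata"

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone and install each dependency in order, stopping at the first failure.
build_deps() {
    for dep in $DEPS; do
        run git clone -b develop "https://github.com/opencb/${dep}.git" || return 1
        if [ "$DRY_RUN" = "1" ]; then
            echo "+ (cd ${dep} && mvn clean install -DskipTests)"
        else
            (cd "$dep" && mvn clean install -DskipTests) || return 1
        fi
    done
}

build_deps
```

Running it with `DRY_RUN=0` performs the real clones and builds in the current directory.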


    Clone and Build OpenCGA

    You can clone OpenCGA from GitHub by executing:

    Code Block
    languagebash
    themeRDark
    titleShell
    ## Latest stable version
    git clone -b master https://github.com/opencb/opencga.git

    ## Develop branch, for this to work remember to clone and build OpenCB dependencies (see above)
    git clone -b develop https://github.com/opencb/opencga.git

    Building with Maven

    OpenCGA lets you customise many variables in its configuration files. To make building and configuring OpenCGA easier, we rely on Maven properties that can be defined in the file ~/.m2/settings.xml. During the build, these properties are injected automatically into the configuration files, so users do not have to change every configuration value by hand. This file is not required to compile OpenCGA, but it is required to run the tests, which read this information. Note that this is only possible when building OpenCGA from source code; if you download the binary version you will have to set up all configuration variables manually.

    An example of this file can be found in the README and after the property list. Each property is described below:

    • OPENCGA.CATALOG.DB.HOSTS: This property should be configured with the host and port of the MongoDB installation. By default, for development purposes, it is set to "localhost:27017".
    • OPENCGA.CATALOG.DB.DATABASE: This property indicates the name of the database that will be created to store the catalog information. Default: opencga_catalog.
    • OPENCGA.CATALOG.DB.USER: This property should only be set if MongoDB needs authentication. In that case, it contains the name of a user with permissions for the database. It can be left empty in any case, since the admin will be able to set these credentials using the command line.
    • OPENCGA.CATALOG.DB.PASSWORD: This property should only be set if MongoDB needs authentication. In that case, it contains the password of the user with permissions for the database. As with the user property, it can be left empty, since the admin will be able to set these credentials using the command line.

    • OPENCGA.INSTALLATION.DIR: This property is extremely important when using Tomcat to deploy the webservices. This property will have to point to the final OpenCGA installation directory after everything has been built. This property will be used by Tomcat to locate the configuration files. If this is not properly set, none of the webservices will work. Default: /opt/opencga.

    • OPENCGA.USER.WORKSPACE: In Catalog, users are allowed to build their own directory structure, upload their own files, run analyses, etc. This path should point to a physical location where Catalog will store those files and that directory structure. By default, we put it in a folder called "sessions" within the installation directory (file:///opt/opencga/sessions/). Note the "file://" prefix: in version 0.8 it is still necessary, but it will not be needed in future releases.

    • OPENCGA.STORAGE.VARIANT.DB.HOSTS:

    • OPENCGA.STORAGE.VARIANT.DB.USER:
    • OPENCGA.STORAGE.VARIANT.DB.PASSWORD:
    • OPENCGA.STORAGE.ALIGNMENT.DB.HOSTS:
    • OPENCGA.STORAGE.ALIGNMENT.DB.USER:
    • OPENCGA.STORAGE.ALIGNMENT.DB.PASSWORD:
    • OPENCGA.ANALYSIS.EXECUTION.MANAGER: OpenCGA Catalog allows users to run jobs. This property indicates how the jobs will be launched. At the moment we only support two types: LOCAL to run the jobs locally in a thread or SGE to run the jobs using Sun Grid Engine. More queuing systems will be supported soon.

    • OPENCGA.CLIENT.HOST: This property should be pointing to the URL where the webservices will be available. For development purposes, the default is http://localhost:8080/opencga/. This property is read by the command line opencga.sh in order to communicate with the webservices.

    • OPENCGA.CELLBASE.REST.HOST:

      OPENCGA.CELLBASE.VERSION:

      URL to be used for Variant Annotation.


    You can copy this example from the main pom.xml to ~/.m2/settings.xml:

    Note
    titleUpdate

    The next XML code has been updated; please make sure you update your settings.xml. Sorry for any inconvenience caused.


    Code Block
    languagexml
        <?xml version="1.0" encoding="UTF-8"?>
        <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
            <profiles>
                <profile>
                    <id>custom-config</id>
                    <activation>
                        <activeByDefault>true</activeByDefault>
                    </activation>
                    <properties>
                        <opencga.war.name>opencga-${opencga.version}</opencga.war.name>

                        <!-- General -->
                        <OPENCGA.INSTALLATION.DIR>/opt/opencga</OPENCGA.INSTALLATION.DIR>
                        <OPENCGA.USER.WORKSPACE>file:///opt/opencga/sessions/</OPENCGA.USER.WORKSPACE>
                        <OPENCGA.JOBS.DIR>${OPENCGA.USER.WORKSPACE}/jobs/</OPENCGA.JOBS.DIR>
                        <OPENCGA.DB.PREFIX>opencga</OPENCGA.DB.PREFIX>
                        <OPENCGA.EXECUTION.MODE>LOCAL</OPENCGA.EXECUTION.MODE>

                        <!-- Client -->
                        <OPENCGA.CLIENT.REST.HOST>http://localhost:8080/${opencga.war.name}</OPENCGA.CLIENT.REST.HOST>
                        <OPENCGA.CLIENT.GRPC.HOST>http://localhost:9091</OPENCGA.CLIENT.GRPC.HOST>
                        <OPENCGA.CLIENT.ORGANISM.SCIENTIFIC_NAME>Homo sapiens</OPENCGA.CLIENT.ORGANISM.SCIENTIFIC_NAME>
                        <OPENCGA.CLIENT.ORGANISM.COMMON_NAME>human</OPENCGA.CLIENT.ORGANISM.COMMON_NAME>
                        <OPENCGA.CLIENT.ORGANISM.TAXONOMY_CODE>9606</OPENCGA.CLIENT.ORGANISM.TAXONOMY_CODE>
                        <OPENCGA.CLIENT.ORGANISM.ASSEMBLY></OPENCGA.CLIENT.ORGANISM.ASSEMBLY>

                        <OPENCGA.SERVER.REST.PORT>9090</OPENCGA.SERVER.REST.PORT>
                        <OPENCGA.SERVER.GRPC.PORT>9091</OPENCGA.SERVER.GRPC.PORT>
                        <OPENCGA.MONITOR.PORT>9092</OPENCGA.MONITOR.PORT>

                        <!-- Catalog -->
                        <OPENCGA.CATALOG.DB.HOSTS>localhost:27017</OPENCGA.CATALOG.DB.HOSTS>
                        <OPENCGA.CATALOG.DB.USER></OPENCGA.CATALOG.DB.USER>
                        <OPENCGA.CATALOG.DB.PASSWORD></OPENCGA.CATALOG.DB.PASSWORD>
                        <OPENCGA.CATALOG.DB.AUTHENTICATION_DATABASE></OPENCGA.CATALOG.DB.AUTHENTICATION_DATABASE>
                        <OPENCGA.CATALOG.DB.CONNECTIONS_PER_HOST>20</OPENCGA.CATALOG.DB.CONNECTIONS_PER_HOST>

                        <!-- Storage -->
                        <OPENCGA.STORAGE.DEFAULT_ENGINE>mongodb</OPENCGA.STORAGE.DEFAULT_ENGINE>
                        <OPENCGA.STORAGE.CACHE.HOST>localhost:6379</OPENCGA.STORAGE.CACHE.HOST>
                        <OPENCGA.STORAGE.SEARCH.HOST>http://localhost:8983/solr/</OPENCGA.STORAGE.SEARCH.HOST>
                        <OPENCGA.STORAGE.STUDY_METADATA_MANAGER></OPENCGA.STORAGE.STUDY_METADATA_MANAGER>

                        <!-- Storage Variants general -->
                        <OPENCGA.STORAGE.VARIANT.DB.HOSTS>localhost:27017</OPENCGA.STORAGE.VARIANT.DB.HOSTS>
                        <OPENCGA.STORAGE.VARIANT.DB.USER></OPENCGA.STORAGE.VARIANT.DB.USER>
                        <OPENCGA.STORAGE.VARIANT.DB.PASSWORD></OPENCGA.STORAGE.VARIANT.DB.PASSWORD>

                        <!-- Storage Alignments general -->
                        <OPENCGA.STORAGE.ALIGNMENT.DB.HOSTS>localhost:27017</OPENCGA.STORAGE.ALIGNMENT.DB.HOSTS>
                        <OPENCGA.STORAGE.ALIGNMENT.DB.USER></OPENCGA.STORAGE.ALIGNMENT.DB.USER>
                        <OPENCGA.STORAGE.ALIGNMENT.DB.PASSWORD></OPENCGA.STORAGE.ALIGNMENT.DB.PASSWORD>

                        <!-- Storage-mongodb -->
                        <OPENCGA.STORAGE.MONGODB.VARIANT.DB.AUTHENTICATION_DATABASE></OPENCGA.STORAGE.MONGODB.VARIANT.DB.AUTHENTICATION_DATABASE>
                        <OPENCGA.STORAGE.MONGODB.VARIANT.DB.CONNECTIONS_PER_HOST>20</OPENCGA.STORAGE.MONGODB.VARIANT.DB.CONNECTIONS_PER_HOST>

                        <!-- Storage-hadoop -->
                        <!--If empty, will use the ZOOKEEPER_QUORUM read from the hbase configuration files-->
                        <OPENCGA.STORAGE.HADOOP.VARIANT.DB.HOSTS></OPENCGA.STORAGE.HADOOP.VARIANT.DB.HOSTS>
                        <OPENCGA.STORAGE.HADOOP.VARIANT.DB.USER></OPENCGA.STORAGE.HADOOP.VARIANT.DB.USER>
                        <OPENCGA.STORAGE.HADOOP.VARIANT.DB.PASSWORD></OPENCGA.STORAGE.HADOOP.VARIANT.DB.PASSWORD>
                        <OPENCGA.STORAGE.HADOOP.VARIANT.HBASE.NAMESPACE></OPENCGA.STORAGE.HADOOP.VARIANT.HBASE.NAMESPACE>
                        <OPENCGA.STORAGE.HADOOP.VARIANT.ARCHIVE.TABLE.PREFIX>${OPENCGA.DB.PREFIX}_study</OPENCGA.STORAGE.HADOOP.VARIANT.ARCHIVE.TABLE.PREFIX>

                        <!-- Email server -->
                        <OPENCGA.MAIL.HOST></OPENCGA.MAIL.HOST>
                        <OPENCGA.MAIL.PORT></OPENCGA.MAIL.PORT>
                        <OPENCGA.MAIL.USER></OPENCGA.MAIL.USER>
                        <OPENCGA.MAIL.PASSWORD></OPENCGA.MAIL.PASSWORD>

                        <!-- cellbase -->
                        <OPENCGA.CELLBASE.REST.HOST>http://bioinfo.hpc.cam.ac.uk/cellbase/</OPENCGA.CELLBASE.REST.HOST>
                        <OPENCGA.CELLBASE.VERSION>v4</OPENCGA.CELLBASE.VERSION>
                        <OPENCGA.CELLBASE.DB.HOST>localhost:27017</OPENCGA.CELLBASE.DB.HOST>
                        <OPENCGA.CELLBASE.DB.USER></OPENCGA.CELLBASE.DB.USER>
                        <OPENCGA.CELLBASE.DB.PASSWORD></OPENCGA.CELLBASE.DB.PASSWORD>
                        <OPENCGA.CELLBASE.DB.AUTHENTICATION_DATABASE></OPENCGA.CELLBASE.DB.AUTHENTICATION_DATABASE>
                        <OPENCGA.CELLBASE.DB.READ_PREFERENCE>secondaryPreferred</OPENCGA.CELLBASE.DB.READ_PREFERENCE>
                    </properties>
                </profile>
            </profiles>
        </settings>
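
If a ~/.m2/settings.xml already exists, copying the example over it would discard your current Maven settings. The sketch below shows one cautious way to put the file in place while keeping a backup. The helper name install_settings and the ".bak" suffix are assumptions made for illustration; they are not part of OpenCGA or Maven.

```bash
#!/usr/bin/env bash
# Copy a settings file into place, keeping a backup of any existing one.
# install_settings is a hypothetical helper; both paths are passed as arguments.
install_settings() {
    src="$1"
    dest="$2"
    [ -f "$src" ] || { echo "source not found: $src" >&2; return 1; }
    mkdir -p "$(dirname "$dest")" || return 1
    if [ -f "$dest" ]; then
        cp "$dest" "${dest}.bak" || return 1
        echo "backed up existing file to ${dest}.bak"
    fi
    cp "$src" "$dest" && echo "installed $dest"
}

# Example (not executed here):
#   install_settings ./settings.xml "$HOME/.m2/settings.xml"
```

Afterwards, running `mvn help:active-profiles` from the OpenCGA folder should list the profiles Maven considers active, which is a quick way to confirm that custom-config was picked up.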
    

    After creating and configuring the default profile, you can build OpenCGA by executing the following command from the root of the cloned repository:

    Code Block
    languagetext
    themeRDark
    $ mvn clean install -DskipTests

    The first time, this command can take some minutes since it has to fetch and store all the dependencies locally; subsequent builds will be much faster. After a successful build, you should find the following file structure under a build folder:

    Code Block
    languagetext
    themeRDark
    titleShell
    build
    ├── bin
    │   ├── obsolete
    │   ├── opencga-admin.sh
    │   ├── opencga-analysis.sh
    │   ├── opencga-env.sh
    │   └── opencga.sh
    ├── conf
    │   ├── client-configuration.yml
    │   ├── configuration.yml
    │   ├── log4j.properties
    │   └── storage-configuration.yml
    ├── examples
    │   ├── 1k.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz
    │   ├── 20130606_g1k.ped
    │   ├── opencga-index.sh
    │   └── opencga-storage-fetch.sh
    ├── libs
    │   ├── activation-1.1.jar
    │   ├── .....
    │   └── zookeeper-3.4.6.jar
    ├── LICENSE
    ├── opencga-1.0.0-rc4.war
    ├── README.md
    ├── test
    │   ├── bin
    │   ├── dependencies
    │   ├── fitnesse
    │   └── README.md
    └── tools
        ├── affy-expression-normalization
        ├── .....
        └── variant
    
    42 directories, 181 files
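
A quick sanity check of the result can be scripted. The sketch below only tests for a few of the top-level directories shown in the listing above; the helper name check_build_layout is hypothetical and the choice of directories to check is an assumption, not an official OpenCGA verification step.

```bash
#!/usr/bin/env bash
# Verify that a build folder contains some of the expected top-level directories.
# check_build_layout is a hypothetical helper; the names come from the listing above.
check_build_layout() {
    build_dir="${1:-build}"
    status=0
    for sub in bin conf libs tools; do
        if [ -d "$build_dir/$sub" ]; then
            echo "ok: $build_dir/$sub"
        else
            echo "missing: $build_dir/$sub"
            status=1
        fi
    done
    return "$status"
}

# Example (not executed here), run from the root of the cloned repository:
#   check_build_layout build
```

The function returns a non-zero status when any directory is missing, so it can be chained after the Maven command in a script.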

