Installation

Installing and configuring CellBase involves several steps. As described on this page, you must first make sure that the server(s) have all dependencies installed; you can then configure and complete the CellBase build.

You do not need to install CellBase to run queries. See Using CellBase for more information on how to use CellBase.

Overview

Building a CellBase instance has three stages:

Stage	Description
Download **	Downloads the data files for the specified data sets
Build **	Parses the downloaded data files, generates JSON objects, e.g. gene.json
Load	Loads the generated JSON objects into the Mongo database

This document shows you how to create a CellBase instance. First, you download a set of raw files from several data sources; these raw files contain the core data that will populate the CellBase knowledgebase. Then, you build the JSON documents from those files. Finally, you load the JSON documents into the CellBase knowledgebase. These three stages are described in detail below.

** We have already downloaded and processed these data, and the resulting JSON documents are available through our FTP server. Users who wish to skip these two stages can download the JSON documents directly from http://bioinfo.hpc.cam.ac.uk/downloads/cellbase/v4/homo_sapiens_grch37/mongodb/ and jump to the [[Load Data Models]] section.

Step 1 - Configuring the Server

Before you can start building CellBase, you must first install all required software dependencies.

Hardware

The hardware you need depends on how much data you load, the query load, etc. A full CellBase instance is 1 TB of data, but loading only genomic data is XXX GB. Loading and querying data are also very resource intensive; we recommend at least XXX GB of RAM.

Software Dependencies

Below are the software dependencies required by CellBase.

Software	Version	Purpose
Java	8
MongoDB	3.6	Database
Tomcat	8.5x
Docker	18	Building Ensembl

Java - we recommend you use the OpenJDK.
MongoDB - put your mongo credentials in settings.xml ???
Tomcat - put your tomcat credentials in settings.xml ???
Docker - CellBase uses Docker to manage the Perl modules required to query Ensembl's Perl API.
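
Before moving on, it can help to confirm that the command-line tools are on the PATH. The following is a minimal sketch and not part of CellBase: the function name missing_tools is hypothetical, and the executable names java, mongod and docker are assumptions about a typical installation. Tomcat is left out because it has no single standard executable name.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CellBase): print each named tool
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example (assumed executable names for Java, MongoDB and Docker):
#   missing_tools java mongod docker
```

An empty result means every named tool was found. This only checks presence; the installed versions still need to be compared against the table above by hand.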

Step 2 - Downloading the data

Run this command to download the data; this example downloads the gene data for hsapiens:

./build/bin/cellbase-admin.sh download -d gene -s hsapiens

See Download Sources for the details on all the data that's available to download.

Step 3 - Building the data

Run this command to build the data:

./build/bin/cellbase-admin.sh build -d gene -s hsapiens

See Building the CellBase database for the details on how to build.

Step 4 - Loading the data

Run this command to load the data:

./build/bin/cellbase-admin.sh load -d gene -s hsapiens

See Load Data for the details on how to load the data.
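
The download, build and load commands from Steps 2 to 4 can be chained so that a later stage only runs if the earlier one succeeded. This is a minimal sketch, assuming that cellbase-admin.sh returns a non-zero exit status when a stage fails (not confirmed here). The function name run_stages is hypothetical; the subcommands and the -d and -s options are the ones shown in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of CellBase): run the three stages in order
# for one data set and one species, stopping at the first failing stage.
# Assumes the admin script exits with a non-zero status when a stage fails.
run_stages() {
    admin_script="$1"   # e.g. ./build/bin/cellbase-admin.sh
    data_set="$2"       # value passed to -d, e.g. gene
    species="$3"        # value passed to -s, e.g. hsapiens
    for stage in download build load; do
        echo "Running stage: $stage" >&2
        if ! "$admin_script" "$stage" -d "$data_set" -s "$species"; then
            echo "Stage failed: $stage" >&2
            return 1
        fi
    done
}

# Example, using the values from the commands above:
#   run_stages ./build/bin/cellbase-admin.sh gene hsapiens
```

Progress messages go to standard error so that they do not mix with whatever the admin script itself prints.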

Now that you have your own installation of CellBase, see Using CellBase for information on how to run queries.
