This tutorial first guides you through downloading a set of raw files from several data sources. These raw files contain the core data that populates the CellBase knowledgebase. The tutorial then shows you how to build the JSON documents that are loaded into the CellBase knowledgebase. We have already processed all of these data, and the resulting JSON documents are available through our FTP server for users who wish to skip those two sections. In that case, download the JSON documents directly from http://bioinfo.hpc.cam.ac.uk/downloads/cellbase/v4/homo_sapiens_grch37/mongodb/ and jump to the [[Load Data Models]] tutorial.
You do not need to install CellBase to run queries. See Using CellBase for information on how to run queries.
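For readers who take the shortcut, the download of a pre-built document can be scripted. The following is a minimal sketch using only the Python standard library. The base URL is the one given above; the helper names `document_url` and `download` are hypothetical, and the file names actually present on the server are not listed in this tutorial, so any file name passed in is an assumption to be checked against the directory listing.

```python
import shutil
import urllib.request
from pathlib import Path

# Base URL taken from the tutorial text.
BASE_URL = "http://bioinfo.hpc.cam.ac.uk/downloads/cellbase/v4/homo_sapiens_grch37/mongodb/"


def document_url(file_name: str, base_url: str = BASE_URL) -> str:
    """Return the full URL of one file under the download directory."""
    return base_url.rstrip("/") + "/" + file_name.lstrip("/")


def download(file_name: str, target_dir: str = ".") -> Path:
    """Stream one file into target_dir and return the local path.

    file_name is an assumption: check the server's directory listing
    for the names that actually exist.
    """
    target = Path(target_dir) / Path(file_name).name
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(document_url(file_name)) as response, open(target, "wb") as out:
        # Copy in chunks so large files are not held in memory.
        shutil.copyfileobj(response, out)
    return target
```

Nothing is downloaded until `download(...)` is called with a file name that exists on the server.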
Stage | Description |
---|---|
Download | Downloads the data files for the specified data sets |
Build | Parses the downloaded data files and generates JSON objects, e.g. gene.json |
Load | Loads the generated JSON objects into the MongoDB database |
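To make the Load stage more concrete, here is a rough sketch of how generated JSON objects could be read and grouped into batches before insertion. It assumes that a built file such as gene.json holds one JSON object per line, which this tutorial does not state, and the function name `batched_documents` is hypothetical. It is not the CellBase loader itself.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def batched_documents(
    lines: Iterable[str], batch_size: int = 1000
) -> Iterator[List[Dict[str, Any]]]:
    """Parse one JSON object per line (an assumption) and yield them in batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

Each yielded batch could then be handed to a MongoDB driver's bulk insert call, for example `insert_many` in pymongo. Batching keeps memory use bounded when the input file is large.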
The hardware you need depends on how much data you load, the expected query load, and similar factors. A full CellBase instance holds 1 TB of data, whereas loading only the genomic data takes XXX GB. Loading and querying data are also very resource intensive, so we recommend at least XXX GB of RAM.
Below are the software dependencies required by CellBase.
Software | Version | Purpose |
---|---|---|
Java | 8 | |
MongoDB | 3.6 | Database |
Tomcat | 8.5.x | |
Docker | 18 | Building Ensembl |
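A quick way to check an existing machine against this table is to compare the version strings reported by each tool with the listed versions. The sketch below treats the table entries as minimum versions, which is an assumption; the table does not say whether newer releases are supported. The names `REQUIRED`, `parse_version` and `meets_requirement` are hypothetical.

```python
import re
from typing import Tuple

# Versions from the table above, treated here as minimums (an assumption).
REQUIRED = {"java": "8", "mongodb": "3.6", "tomcat": "8.5", "docker": "18"}


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted number in text as a tuple of integers."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_requirement(name: str, found: str) -> bool:
    """Return True if the found version is at least the listed version."""
    required = parse_version(REQUIRED[name])
    actual = parse_version(found)
    # Java 8 and earlier report themselves as 1.x, e.g. 1.8.0_292.
    if name == "java" and len(actual) > 1 and actual[0] == 1:
        actual = actual[1:]
    # Pad with zeros so that, for example, 3.6 and 3.6.0 compare as equal.
    width = max(len(required), len(actual))
    actual += (0,) * (width - len(actual))
    required += (0,) * (width - len(required))
    return actual >= required
```

For example, `meets_requirement("java", "1.8.0_292")` returns `True`, while `meets_requirement("mongodb", "3.4.24")` returns `False`.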
Now that you have your own installation of CellBase, see Using CellBase for information on how to run queries.