We have already processed all of this data, and the JSON documents are available through our FTP server for users who wish to skip this section. Download the JSON from here:
http://bioinfo.hpc.cam.ac.uk/downloads/cellbase/v4/homo_sapiens_grch37/mongodb/
http://bioinfo.hpc.cam.ac.uk/downloads/cellbase/v4/homo_sapiens_grch38/mongodb/
Then follow the instructions here: Load Data
The first step in creating a CellBase instance is to download the data files. This can be done through the CellBase CLI.
cellbase/build/bin$ ./cellbase.sh download --data genome,gene
The --data argument is required and takes a comma-separated list of data types to download. See below for the full list.
Type | Data sources |
---|---|
genome | |
gene | |
variation ** | |
variation_functional_score | |
regulation | |
protein | |
conservation ** | |
clinical_variants ** | |
repeats | |
svs | |
all ** | Downloads all of the above |
See Download Sources for details on versions and available organisms.
** Please note that many files are very large and can take several hours to download.
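Because `--data` takes a single comma-separated value, it can be convenient to assemble it from a list of type names. The following is a minimal sketch, not part of CellBase itself: `join_types` is a hypothetical helper, the type names are taken from the table above, and the command is only printed, not executed.

```shell
# Hypothetical helper: join all arguments with commas.
# The subshell (parentheses) keeps the IFS change from leaking out.
join_types() {
    ( IFS=,; echo "$*" )
}

# Pick the data types to download (names from the table above).
data_arg="$(join_types genome gene regulation)"

# Print the command rather than running it, so it can be checked first.
echo "./cellbase.sh download --data ${data_arg}"
# prints: ./cellbase.sh download --data genome,gene,regulation
```

Printing the command first is a cheap way to catch a misspelled type before starting a download that may take hours.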
For example, to download all human (GRCh37) data from all sources and save it into the `/tmp/data/cellbase/v4/` directory, run:
cellbase/build/bin$ ./cellbase-admin.sh download -a GRCh37 --common /tmp/data/cellbase/v4/common/ -d all -o /tmp/data/cellbase/v4/ -s hsapiens
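When scripting this step, it can help to stop early if the download fails. The sketch below assumes the download script signals failure with a non-zero exit status, which this page does not confirm; `run_step` is a hypothetical helper, not a CellBase command.

```shell
# Hypothetical helper: run a command and report whether it succeeded.
# Assumption: the command exits with a non-zero status on failure.
run_step() {
    if "$@"; then
        echo "OK: $1"
        return 0
    else
        status=$?
        echo "FAILED (exit ${status}): $1" >&2
        return "${status}"
    fi
}

# Usage (shown as a comment; not executed here):
# run_step ./cellbase-admin.sh download -a GRCh37 \
#     --common /tmp/data/cellbase/v4/common/ -d all \
#     -o /tmp/data/cellbase/v4/ -s hsapiens \
#   && echo "Download finished; ready to build."
```

A zero exit status alone does not guarantee that every file is complete, so it is still worth inspecting the output directory before building.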
If the download was successful, you can proceed to building the JSON objects to be loaded into the corresponding database: Building the CellBase database