
...

CellBase uses its integrated data to implement a rich, high-performance variant annotator. The variant annotation tool is part of the CellBase code and can be accessed in the following ways:

CellBase clients: a number of client libraries are provided which make intensive use of the CellBase RESTful API. They provide fast programmatic access for genome-scale data analysis, removing the need for massive downloads of data to local computers. Currently supported languages include Python, R, Java and JavaScript. All of them follow a similar design in order to facilitate their use, external contributions and maintenance, and all of them cover the whole CellBase RESTful API. Please refer to the corresponding Tutorials for details on how to download, install and configure the libraries.
Using remote RESTful web services: both GET and POST annotation web services are available (see http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/). By avoiding a local installation of the knowledge base, users do not need to store hundreds of gigabytes of data (about 900 GB in the current release, v4) and always have access to up-to-date data. The best way to use the RESTful web services is through the client libraries implemented for different programming languages. Nevertheless, under certain circumstances it may be necessary to access the RESTful API directly. Annotation results from the web services are returned as JSON objects.
Using the Java command line: the current Java CLI can either connect to the remote web services or efficiently fetch annotation data directly from a custom installation of the database. Even when connecting to remote web services, the annotation CLI provides a lightweight, efficient, multi-threaded implementation which outperforms other local variant annotators (see _Benchmark_ results below).
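As a rough illustration of direct access to the GET annotation web service, the sketch below builds a request URL and parses the JSON response using only the Python standard library. The `/rest/{version}/{species}/genomic/variant/{ids}/annotation` path layout, the `chrom:pos:ref:alt` variant notation and the helper names `annotation_url` and `fetch_annotation` are assumptions made for this example; check the web services page linked above for the authoritative endpoint definitions.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host taken from the text above; the "/rest" suffix is an assumption.
BASE_URL = "http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/rest"


def annotation_url(variants, species="hsapiens", version="v4", base_url=BASE_URL):
    """Build a GET URL for a list of variants written as 'chrom:pos:ref:alt'.

    The path layout is assumed, not taken from the official documentation.
    """
    # Keep ':' unescaped so the variant identifiers stay readable in the URL.
    ids = ",".join(quote(v, safe=":") for v in variants)
    return f"{base_url}/{version}/{species}/genomic/variant/{ids}/annotation"


def fetch_annotation(variants, timeout=30, **url_options):
    """Call the web service and return the decoded JSON document."""
    url = annotation_url(variants, **url_options)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Example variant identifier used only to show the URL shape.
    print(annotation_url(["19:45411941:T:C"]))
```

Only the GET form is sketched here. For more than a handful of variants, the client libraries or the POST web service are likely to be a better fit than hand-built URLs.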

...

Data provided by the variant annotator is the result of integrating most of the annotations available in the CellBase knowledge base: Ensembl's core transcript annotation, such as location, id, strand, biotype, etc.; protein annotation provided by UniProt, InterPro, SIFT and PolyPhen; population frequencies provided by the European Variation Archive for The 1000 Genomes Project Phase 3, the NHLBI Exome Sequencing Project (ESP), The Exome Aggregation Consortium v3 (ExAC), gnomAD exomes, gnomAD genomes and The Genome of the Netherlands (GoNL); sequence conservation from PhastCons and PhyloP; gene expression values from the Gene Expression Atlas and The Genotype-Tissue Expression project (GTEx); gene-drug interaction data from The Drug Gene Interaction Database (DGIdb) and the Human Phenotype Ontology database (HPO); and clinical variant annotation from ClinVar. The predicted sequence effect is also calculated on the fly and described using Sequence Ontology (SO) terms. We are constantly working to integrate new data sources into the knowledge base.
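To give an idea of how these annotation blocks might be consumed, the sketch below extracts the SO term names and the population frequencies from a single annotation object. The field names (`consequenceTypes`, `sequenceOntologyTerms`, `populationFrequencies`, `study`, `population`, `altAlleleFreq`) are assumptions about the JSON layout and should be checked against a real response. The `example` record is a hand-written placeholder with invented values, not actual CellBase output.

```python
def summarize_annotation(annotation):
    """Collect SO term names and alternate allele frequencies from one annotation dict.

    All keys read here are assumed field names; missing keys are tolerated.
    """
    so_terms = sorted({
        term["name"]
        for consequence in annotation.get("consequenceTypes", [])
        for term in consequence.get("sequenceOntologyTerms", [])
    })
    frequencies = {
        (freq["study"], freq["population"]): freq["altAlleleFreq"]
        for freq in annotation.get("populationFrequencies", [])
    }
    return {"so_terms": so_terms, "frequencies": frequencies}


# Placeholder record for illustration only: gene, study and frequency are invented.
example = {
    "consequenceTypes": [
        {"geneName": "GENE1",
         "sequenceOntologyTerms": [{"accession": "SO:0001583", "name": "missense_variant"}]},
        {"geneName": "GENE1",
         "sequenceOntologyTerms": [{"accession": "SO:0001632", "name": "downstream_gene_variant"}]},
    ],
    "populationFrequencies": [
        {"study": "STUDY_A", "population": "ALL", "altAlleleFreq": 0.25},
    ],
}

if __name__ == "__main__":
    print(summarize_annotation(example))
```

Reading optional blocks with `.get(..., [])` keeps the helper usable when a variant lacks a given annotation type, which is likely to be common for data such as population frequencies.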

...
