Understanding the URL
The general structure of a CellBase RESTful call is:
Where HOST_URL is
Sections in braces are parameters, so they can be treated as variables.
As is explained later in this documentation, this RESTful WS will get all the information available for the gene BRCA2 in human using the v4 version.
The available parameters are:
Indicate the CellBase version to retrieve information from. Versions are numbered as v1, v2, v3, v4, etc. At this moment the latest stable version is v4.
Species to get information from. A list of the species available for version v4 can be found at Data sources and species. Use the Id to indicate the species. For example, hsapiens for Homo sapiens or mmusculus for Mus musculus.
Category and subcategory
These parameters must be specified depending on the nature of your input data. For example, if we want to query CellBase by a genomic region (e.g. 13:32889575-32889763) we should use the genomic category and region subcategory. There are 4 main categories:
|Genomic category makes reference to genomic coordinates
|Position, Variant, Region
|Feature category involve all elements which have a defined location on the genome and provides an easy way to retrieve cross references for an ID
|Gene, Variation, Transcript, Protein, Xref, ...
|Regulatory category refers to all regulatory interactions involving transcription factors and microRNAs
|Meta category provides resources to access CellBase meta-data, including available code versions, species or functioality
What the user wants to retrieve from the id. This is the query parameter, it is the feature or term about we want to retrieve the information (resource). Its type must correspond with the subcategory.
NOTE: In order to improve performance, ID lists can be passed together in only one REST call separated by commas. Only 200 IDs are allowed. For larger queries user the CellBase clients.
bioinfo.hpc.cam.ac.uk/cellbase/webservices/rest/v4/hsapiens/feature/gene/BRCA2,BCL2/snp?include=chromosome,start,end,id (returned data limited to start, end and rs id to reduce response time in this last query)
Each Category and Subcategory can have different resources and actions allowed. They specify the type of result we want to obtain from the ID.
NOTE: Resources must always be written in singular.
Filters and extra options
Filters and extra options can be added at the end of the REST query followed by
?. They are optional and can be combined using the
& sign. E.g:
These are the available filters and options:
output format: coded as *of**, the only allowed value now is json (default), others such as protobuf are being developed. E.g.:
exclude: name of the fields to be excluded in the output. E.g:
include: name of the fields to be included in the output, the rest will be excluded. E.g:
limit: maximum number of results to be returned. By default, all results are returned. E.g:
skip: number of results to be skipped. By default, no result is omitted. E.g:
count: get the number of results obtained. By default, false. E.g:
A list of all available web services end points is available at:
powered by the Swagger project.
Python client. PyCellBase
- This Python package makes use of the exhaustive RESTful Web service API that has been implemented for the CellBase database.
- It enables to query and obtain a wealth of biological information from a single database, saving a lot of time.
- As all information is integrated, queries about different biological topics can be easily and all this information can be linked together.
- Currently Homo sapiens, Mus musculus and a total of 32 species are available and many others will be included soon.
- More info about this package in the Python client tutorial of the CellBase Wiki
- PyCellBase is compatible with both Python 2 and 3.
- This package makes use of multithreading to improve performance when the number of queries exceed a specific limit.
- It is distributed:
- With the rest of the CellBase code at
- Via PyPI - the Python Package Index (not available yet)
- With the rest of the CellBase code at
PyCellBase code can be accessed at
The CellBaseClient class provides access to the different clients of the data we want to query (e.g. gene, transcript, variation, protein, genomic region, variant).
Each of these clients provide a set of methods to ask for the resources we want to retrieve. Most of these methods will need to be provided with comma-separated IDs or list of IDs. Optional filters and extra options can be added as key-value parameters.
Responses are retrieved as JSON formatted data. Therefore, fields can be queried by key.
If there is an available resource, but there is not an available method in this python package, the CellBaseClient class can be used to create the URL of interest. This class is able to access the RESTful Web Services through the get method it implements. In this case, this method needs to be provided with those parameters which are required by the URL: category (e.g. feature), subcategory (e.g. gene), ID to search for (e.g. BRCA1) and method to query (e.g. search).
Configuration data as host, API version, or species is stored in a ConfigClient object. A custom configuration can be passed to CellBaseClient with a ConfigClient object provided with a JSON or YML config file. If you want to change the configuration on the fly you can directly modify the ConfigClient object.
Please, find more details on how to use the python library at: Python client tutorial
PyCellBase can be cloned in your local machine by executing in your terminal:
$ git clone https://github.com/opencb/cellbase.git
Once you have downloaded the project you can install the library:
$ cd cellbase/clients/python
$ python setup.py install
(not available yet)
Note: R client library available from version 4 onwards
R library package is implemented and maintained for the latest R release and distributed through Bioconductor (https://bioconductor.org/packages/release/bioc/html/cellbaseR.html). For a quick install, please enter the R terminal and type:
The R code is also distributed together with the rest of the CellBase code. R code can be found at:
The R code provides a library for programmatic access to CellBase data. No CLI is implemented in R.
Comprehensive description of the library is provided by the Bioconductor documentation:
Table of Contents:
- No labels