Overview

OpenCGA implements an R REST client library called opencgaR to execute any query or operation through the REST web services API. The client offers programmatic access to the implemented REST web services, making it easier to access and analyse data stored in OpenCGA. From version 2.0.0, data is returned in a new RestResponse object, which contains metadata and the results. The client also implements some convenience methods to extract information from this object.


opencgaR has been implemented by Marta Bleda. The code is open-source and can be found at https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/R. It can be installed easily by downloading the pre-built package. More details on how to use the R library can be found at Using the R client.

Installation

The R client requires at least R version 3.4, although most of the code is fully compatible with earlier versions. The pre-built R package of the R client can be downloaded from the OpenCGA v2.0.0 GitHub Release at https://github.com/opencb/opencga/releases and installed using the install.packages function in R. install.packages can also install a source package from a remote `.tar.gz` file when given the URL of that file instead of a local path (code below).

Code Block
languagebash
themeRDark
## Install opencgaR from the downloaded package file (the URL of the .tar.gz file can be given instead)
install.packages("opencgaR_2.0.0.tar.gz", repos=NULL, type="source")
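Since the client expects R 3.4 or later, it can be useful to check the running R version before installing. The following is a minimal sketch using base R only; the local file name is the one used above and is assumed to sit in the working directory.

```r
# Minimum R version stated for the client.
min_r_version <- "3.4.0"

# TRUE when the running interpreter is recent enough.
r_ok <- getRversion() >= min_r_version

if (!r_ok) {
  warning("R ", as.character(getRversion()), " is older than ", min_r_version,
          "; some opencgaR features may not work.")
}

# Assumed local path: point this at the .tar.gz file you downloaded.
pkg_file <- "opencgaR_2.0.0.tar.gz"
if (file.exists(pkg_file)) {
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Package file not found: ", pkg_file)
}
```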

Getting started

Connection and authentication into an OpenCGA instance

A set of methods has been implemented to handle connectivity and login to the REST host. Connecting to the host is done in two steps: initOpencgaR defines the connection details and opencgaLogin performs the login.

The initOpencgaR function accepts either host and version information or a configuration (as a list(), or as a file in YAML or JSON format). The opencgaLogin function establishes the connection with the host. It requires an opencgaR object (created using the initOpencgaR function) and the login details: user and password. The user and password can optionally be entered interactively through a popup window using interactive=TRUE, which avoids typing user credentials in the R script or a config file.

The code below shows three different ways to initialise the OpenCGA connection with the REST server.

Code Block
languagebash
themeRDark
## Initialise connection specifying host and REST version
con <- initOpencgaR(host = "http://bioinfo.hpc.cam.ac.uk/opencga-prod/", version = "v2")

## Initialise connection using a configuration in R list
conf <- list(version="v2", rest=list(host="http://bioinfo.hpc.cam.ac.uk/opencga-prod/"))
con <- initOpencgaR(opencgaConfig=conf)

## Initialise connection using a configuration file (in YAML or JSON format)
conf <- "/path/to/conf/client-configuration.yml"
con <- initOpencgaR(opencgaConfig=conf)
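Before passing a configuration list to initOpencgaR, it may help to check that it has the shape used in the example above. The helper below is a hypothetical sketch, not part of opencgaR, and it assumes that only the `version` and `rest$host` fields shown above are needed; the real client may accept or require other fields.

```r
# Hypothetical helper (not part of opencgaR): checks that a configuration
# list has a single version string and a single http(s) REST host.
is_valid_opencga_conf <- function(conf) {
  is.list(conf) &&
    is.character(conf$version) && length(conf$version) == 1 &&
    is.list(conf$rest) &&
    is.character(conf$rest$host) && length(conf$rest$host) == 1 &&
    grepl("^https?://", conf$rest$host)
}

conf <- list(version = "v2",
             rest = list(host = "http://bioinfo.hpc.cam.ac.uk/opencga-prod/"))
conf_ok <- is_valid_opencga_conf(conf)

# A list without the REST host fails the check.
incomplete_ok <- is_valid_opencga_conf(list(version = "v2"))
```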

Once the connection has been initialised, users can log in by specifying their OpenCGA user ID and password.

Code Block
languagebash
themeRDark
## Log in
con <- opencgaLogin(opencga = con, userid = "demouser", passwd = "demouser")
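For non-interactive scripts, one alternative to hard-coding credentials is to read them from environment variables. This is a sketch under assumptions: the variable names OPENCGA_USER and OPENCGA_PASSWORD and both helper functions are hypothetical and are not defined by opencgaR.

```r
# Read a required environment variable, failing early if it is unset or empty.
read_required_env <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# OPENCGA_USER and OPENCGA_PASSWORD are hypothetical names chosen for this sketch.
get_credentials <- function() {
  list(userid = read_required_env("OPENCGA_USER"),
       passwd = read_required_env("OPENCGA_PASSWORD"))
}

# Usage (needs a reachable OpenCGA server, so it is left commented out):
# creds <- get_credentials()
# con <- opencgaLogin(opencga = con, userid = creds$userid, passwd = creds$passwd)
```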
Client Library


Design Principles

The R package can be downloaded from opencgaR_1.4.0.tar.gz and the source code can be found at https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/R. The methods and classes have been designed following the S4 interface, as recommended by Bioconductor. Class definitions are stored in R/AllClasses.R. Currently, only one class is defined; it contains the OpenCGA connection details and is used extensively by all methods in the package. A set of methods has been implemented to handle connectivity and login to the REST host. These methods are stored in R/OpencgaR-methods.R. Connecting to the host is done in two steps: initOpencgaR defines the connection details and opencgaLogin performs the login.

Code Block
languagebash
themeRDark
titleInitialise the OpenCGA connection and login
# Initialise connection
con <- initOpencgaR(host = "http://localhost:8080/opencga/", version = "v1")
# Log in
con <- opencgaLogin(opencga = con, userid = "user", passwd = "pass")

API

The package implements at least one function for each available resource (user, project, study, etc.). These functions are defined in R/AllGeneric.R. Currently, the following functions are available:

Every method belonging to a resource takes the mandatory parameters individually and calls the corresponding web service using the correct HTTP method (GET or POST). All functions require an OpencgaR connection object (as described in the initialisation and login step). The action parameter expects a character string specifying what information we want to obtain (the endpoint, as described here). Any additional query and body parameters should be passed as a list() in the params argument. For example, to get the information about two samples ("sample1" and "sample2") while excluding the attributes and stats fields, we could do the following:

Code Block
themeRDark
# Create params list
params <- list(exclude="attributes,stats")


# Get information
sampleClient(OpencgaR=con, 
			 sample=c("sample1","sample2"), 
			 action="info",
			 params=params)
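When several values are needed for one field, the params list can be built from character vectors rather than by typing the comma-separated string by hand. The helper below is a hypothetical sketch, not part of opencgaR, and it assumes that multi-valued fields are passed as comma-separated strings, as in the exclude example above.

```r
# Hypothetical helper (not part of opencgaR): drops NULL arguments and
# collapses each remaining value into a single comma-separated string.
collapse_params <- function(...) {
  args <- list(...)
  args <- args[!vapply(args, is.null, logical(1))]
  lapply(args, function(x) paste(x, collapse = ","))
}

# Same params list as above; the NULL argument is simply omitted.
params <- collapse_params(exclude = c("attributes", "stats"), include = NULL)
str(params)
```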

In addition to the individual resource clients, we have created a general method that gives the user more flexibility to construct REST queries. This is the fetchOpenCGA() function which accepts multiple parameters:

Code Block
themeRDark
titlefetchOpenCGA function
fetchOpenCGA(object, category, categoryId, 
			 subcategory, subcategoryId, action, 
			 params, httpMethod, as.queryParam)

In the above function, and following the terminology here, object is the OpencgaR connection object, category is the resource, categoryId is the resource ID we want to query and action is the endpoint, as specified earlier. More complex web services will require subcategory and subcategoryId. As with the other methods, additional query and body parameters are handled internally, so the user should just pass a list() to the params parameter. By default, the parameters are NULL, so there is no need to set them if unused. The HTTP method (GET or POST) should be specified using httpMethod. Please check the Swagger documentation to find the correct method for the function you want to use.
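To illustrate how these arguments relate to a web service path, the sketch below assembles a path of the form `/{apiVersion}/category/categoryId/subcategory/subcategoryId/action`. The function build_rest_path is hypothetical and is only meant to show the mapping; it is an assumption about the URL layout and does not describe how fetchOpenCGA is implemented.

```r
# Hypothetical helper (not part of opencgaR): joins the non-NULL pieces
# into a REST path. Multiple IDs are collapsed with commas.
build_rest_path <- function(version, category, categoryId = NULL,
                            subcategory = NULL, subcategoryId = NULL,
                            action) {
  ids <- if (!is.null(categoryId)) paste(categoryId, collapse = ",") else NULL
  sub_ids <- if (!is.null(subcategoryId)) paste(subcategoryId, collapse = ",") else NULL
  # c() silently drops NULL elements, so unused pieces disappear.
  parts <- c(version, category, ids, subcategory, sub_ids, action)
  paste0("/", paste(parts, collapse = "/"))
}

# A resource queried by ID.
sample_path <- build_rest_path("v2", "samples",
                               categoryId = c("sample1", "sample2"),
                               action = "info")

# A service that uses a subcategory and no IDs.
audit_path <- build_rest_path("v2", "admin", subcategory = "audit",
                              action = "groupBy")
```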

Help

A help function (opencgaHelp()) has also been implemented to provide easy access to information about the available web services.

Code Block
themeRDark
titleopencgaHelp()
# List all services available
opencgaHelp(opencga=con)


# List all services available for a particular resource
opencgaHelp(opencga=con, client="studyClient")


# List all parameters available for a particular resource endpoint
opencgaHelp(opencga=con, client="studyClient", action="search")



Markdown
| endpointName | Endpoint WS | parameters accepted |
| -- | :-- | --: |
| groupByAudit | /{apiVersion}/admin/audit/groupBy | count, limit, fields[*], entity[*], action, before, after, date |
| indexStatsCatalog | /{apiVersion}/admin/catalog/indexStats |  |
| installCatalog | /{apiVersion}/admin/catalog/install | body[*] |
| jwtCatalog | /{apiVersion}/admin/catalog/jwt | body[*] |
| createUsers | /{apiVersion}/admin/users/create | body[*] |
| importUsers | /{apiVersion}/admin/users/import | body[*] |
| syncUsers | /{apiVersion}/admin/users/sync | body[*] |

