OpenCGA provides a Python REST client library called PyOpenCGA to execute any query or operation through the REST web services API. PyOpenCGA gives programmatic access to all the implemented REST web services, offering an easy, lightweight, fast and intuitive way to access OpenCGA data. The library offers the convenience of an object-oriented scripting language and makes it possible to integrate the results into other Python applications.
Some of the main features include:
PyOpenCGA has been implemented by Daniel Perez, Pablo Marin and David Gomez, and it is based on a previous library called pyCGA implemented by Antonio Rueda and Daniel Perez from Genomics England. The code is open-source and can be found at https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/pyOpenCGA. It can be installed from PyPI. Please find more details on how to use the Python library at Using the Python client.
The Python client requires at least Python 3.x, although most of the code is fully compatible with Python 2.7. You can install PyOpenCGA either from the PyPI repository or from the source code.
The PyOpenCGA client is available in the PyPI repository at https://pypi.org/project/pyopencga/. Installation is as simple as running the following command:
## Latest stable version
pip install pyopencga
From OpenCGA v2.0.0, the Python client source code can be found in the GitHub releases at https://github.com/opencb/opencga/releases. You can easily install pyOpenCGA using the setup.py file.
## Get the latest stable version from https://github.com/opencb/opencga/releases. You can use wget from the terminal
wget https://github.com/opencb/opencga/releases/download/v2.0.0/opencga-2.0.0.tar.gz

## Decompress
tar -zxvf opencga-2.0.0.tar.gz

## Move to the pyOpenCGA client folder
cd opencga-2.0.0/clients/python

## Install the library
python setup.py install
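After either installation route, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is hypothetical and is not part of pyOpenCGA.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only queries local package metadata.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyopencga")
if version is None:
    print("pyopencga is not installed in this environment")
else:
    print("pyopencga " + version + " is installed")
```

If the check reports that the package is missing, the most common cause is that pip or setup.py ran against a different Python interpreter or virtual environment.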
Configuration is handled by the ClientConfiguration class. You can create a ClientConfiguration using either the conf/client-configuration.yml file or a dictionary.
## Import ClientConfiguration class
from pyopencga.opencga_config import ClientConfiguration

## You can create a ClientConfiguration by using the path to the client-configuration.yml file (it can also accept a JSON file)
config = ClientConfiguration('opencga-2.0.0/conf/client-configuration.yml')

## Additionally, you can pass a dictionary using the same structure as the client-configuration.yml (the only required parameter is REST host)
config = ClientConfiguration({"rest": {"host": "http://bioinfo.hpc.cam.ac.uk/opencga-prod"}})
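Since the only required parameter is the REST host, the configuration dictionary can be built and saved with a few lines of standard-library code. The sketch below is illustrative: build_config_dict and write_config_json are hypothetical helpers, and the check that the host starts with a URL scheme is an assumption of this example, not a documented requirement of ClientConfiguration.

```python
import json
import os
import tempfile


def build_config_dict(host):
    """Build a dictionary with the same shape as client-configuration.yml.

    Hypothetical helper; the scheme check is an assumption of this sketch.
    """
    if not host.startswith(("http://", "https://")):
        raise ValueError("REST host should include the scheme: " + host)
    return {"rest": {"host": host.rstrip("/")}}


def write_config_json(config_dict, path):
    """Save the configuration as JSON so it can be reused across scripts."""
    with open(path, "w") as handle:
        json.dump(config_dict, handle, indent=2)
    return path


config_dict = build_config_dict("http://bioinfo.hpc.cam.ac.uk/opencga-prod/")
config_path = write_config_json(
    config_dict, os.path.join(tempfile.mkdtemp(), "client-configuration.json")
)
print(config_dict)
```

Either config_dict or config_path could then be passed to ClientConfiguration, as described above.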
OpencgaClient is the main class in pyOpenCGA. It manages login/logout authentication and REST client initialisation, and provides a set of other utilities.
To create an OpencgaClient instance, a ClientConfiguration instance must be passed as an argument. You can authenticate in two different ways. First, you can log in by providing the user and, optionally, the password. Second, you can provide a valid token when creating the OpencgaClient. Remember that tokens are only valid for a limited period of time.
## Import ClientConfiguration and OpencgaClient class
from pyopencga.opencga_config import ClientConfiguration
from pyopencga.opencga_client import OpencgaClient

## Create an instance of OpencgaClient passing the configuration
config = ClientConfiguration('opencga-2.0.0/conf/client-configuration.yml')
oc = OpencgaClient(config)

## Two authentication options:
## Option 1. If the user has a valid token, it can be passed to start doing calls as an authenticated user
oc = OpencgaClient(config, token='TOKEN')

## Option 2. If no token is provided, the user must login with valid credentials. Password is optional (if it is not passed to the login method, it will be prompted to the user)
oc.login(user='USER')  ## The password will be asked
# or
oc.login(user='USER', password='PASSWORD')

## You can logout by executing the following command, the token will be deleted.
oc.logout()
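In scripts it is often preferable to keep credentials out of the source code. The following sketch shows one possible way to choose between the two authentication options from environment variables. The variable names OPENCGA_TOKEN, OPENCGA_USER and OPENCGA_PASSWORD and the helper choose_authentication are assumptions made for this example; pyOpenCGA does not define them.

```python
import os


def choose_authentication(env=None):
    """Decide between token and user/password authentication.

    Hypothetical helper: the environment variable names are assumptions
    of this sketch, not something pyOpenCGA reads by itself.
    """
    env = os.environ if env is None else env
    token = env.get("OPENCGA_TOKEN")
    if token:
        return {"method": "token", "token": token}
    user = env.get("OPENCGA_USER")
    if not user:
        raise ValueError("Set OPENCGA_TOKEN or OPENCGA_USER")
    # The password may be None, in which case it would be requested interactively
    return {"method": "login", "user": user, "password": env.get("OPENCGA_PASSWORD")}


# Intended use with the client described above (not executed here):
# auth = choose_authentication()
# if auth["method"] == "token":
#     oc = OpencgaClient(config, token=auth["token"])
# else:
#     oc = OpencgaClient(config)
#     oc.login(user=auth["user"], password=auth["password"])
print(choose_authentication({"OPENCGA_USER": "demouser"}))
```

Preferring the token when both are set avoids an unnecessary login call, but since tokens expire, a script may still need to fall back to the user and password.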
The OpencgaClient class works as a client factory containing all the different clients, one per REST resource, that are necessary to call any REST web service. Below is a list of available clients:
## Create main clients
users = oc.users
projects = oc.projects
studies = oc.studies
files = oc.files
jobs = oc.jobs
families = oc.families
individuals = oc.individuals
samples = oc.samples
cohorts = oc.cohorts
panels = oc.panels

## Create analysis clients
alignments = oc.alignment
variants = oc.variants
clinical = oc.clinical
ga4gh = oc.ga4gh

## Create administrative clients
admin = oc.admin
meta = oc.meta
variant_operations = oc.variant_operations
Clients implement all available REST API endpoints: one method has been implemented for each REST web service. The list of available actions that can be performed with these clients can be checked in Swagger as explained in RESTful Web Services#Swagger. Each particular client has a method defined for each available web service implemented for the resource. For instance, the whole list of actions available for the Sample resource is shown below.
For all those actions, there is a method available in the sample client. For instance, to search for samples using the /search web service, you need to execute:
## Look for the first 5 sample IDs of the study "study"
sample_result = oc.samples.search(study='study', limit=5, include='id')
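When more results are needed than a single limit can comfortably return, one option is to fetch them page by page. The sketch below is generic and runs against a stand-in function instead of a server. It assumes the search web service also accepts a skip parameter alongside limit, which is not stated on this page; iterate_pages and fake_search are hypothetical names.

```python
def iterate_pages(search, page_size=100, max_results=None):
    """Yield results page by page from a callable that accepts limit and skip."""
    fetched = 0
    while True:
        limit = page_size
        if max_results is not None:
            limit = min(page_size, max_results - fetched)
            if limit <= 0:
                return
        page = search(limit=limit, skip=fetched)
        for item in page:
            yield item
        fetched += len(page)
        # A short page means there is nothing left to fetch
        if len(page) < limit:
            return


# Stand-in for a real query: 12 fake samples held in memory
FAKE_SAMPLES = [{"id": "sample_%d" % i} for i in range(12)]


def fake_search(limit, skip):
    return FAKE_SAMPLES[skip:skip + limit]


all_ids = [sample["id"] for sample in iterate_pages(fake_search, page_size=5)]
first_seven = [sample["id"] for sample in iterate_pages(fake_search, page_size=5, max_results=7)]
print(len(all_ids), len(first_seven))
```

With a real client, the callable might look like lambda limit, skip: oc.samples.search(study='study', include='id', limit=limit, skip=skip).get_results(), assuming skip is supported by the installed version.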
As described in RESTful Web Services#RESTResponse, all REST web services return a RestResponse object containing some metadata and a list of OpenCGAResults. Each of these OpenCGAResults contains some other metadata and the actual data results.
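To make this nesting concrete, the sketch below builds a small dictionary shaped like such a response and navigates it by hand. The field names (responses, results, numResults, numMatches, events) are assumptions for illustration and should be checked against RESTful Web Services#RESTResponse; the data is invented.

```python
# Illustrative payload only: field names are assumed, values are made up
rest_response = {
    "events": [],
    "responses": [
        {
            "events": [{"type": "WARNING", "message": "example event"}],
            "numResults": 2,
            "numMatches": 40,
            "results": [{"id": "sample_1"}, {"id": "sample_2"}],
        }
    ],
}

# Top level: metadata plus a list of results objects
first = rest_response["responses"][0]

# Each results object: its own metadata plus the actual data
sample_ids = [result["id"] for result in first["results"]]
warnings = [event["message"] for event in first["events"] if event["type"] == "WARNING"]

print(sample_ids, first["numMatches"], warnings)
```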
To make these REST responses easier to work with, a RestResponse class has been implemented in pyOpenCGA to wrap the RestResponse object returned by the web service and to offer some useful methods to process the results. For instance, the sample_result variable from the example above is a RestResponse instance. This object defines several methods to navigate through the data.
The implemented RestResponse methods are:
## "sample_response" is a RestResponse instance, such as the "sample_result" returned by the search above

## Returns the list of results for the response in position "response_pos" (response_pos=0 by default)
sample_response.get_results(response_pos)

## Returns the result in position "result_pos" for the response in position "response_pos" (response_pos=0 by default)
sample_response.get_result(result_pos, response_pos)

## Returns the list of responses
sample_response.get_responses()

## Returns the response in position "response_pos" (response_pos=0 by default)
sample_response.get_response(response_pos)

## Returns all results from the response in position "response_pos" as an iterator (response_pos=None returns all results for all QueryResponses)
sample_response.result_iterator(response_pos)

## Returns all response events by type "event_type" ('INFO', 'WARNING' or 'ERROR') (event_type=None returns all types of event)
sample_response.get_response_events(event_type)

## Returns all result events by type "event_type" ('INFO', 'WARNING' or 'ERROR') for the response in position "response_pos" (event_type=None returns all types of event; response_pos=0 by default)
sample_response.get_result_events(event_type, response_pos)

## Returns the number of matches for the response in position "response_pos" (response_pos=None returns the number for all QueryResponses)
sample_response.get_num_matches(response_pos)

## Returns the number of results for the response in position "response_pos" (response_pos=None returns the number for all QueryResponses)
sample_response.get_num_results(response_pos)

## Returns the number of insertions for the response in position "response_pos" (response_pos=None returns the number for all QueryResponses)
sample_response.get_num_inserted(response_pos)

## Returns the number of updates for the response in position "response_pos" (response_pos=None returns the number for all QueryResponses)
sample_response.get_num_updated(response_pos)

## Returns the number of deletions for the response in position "response_pos" (response_pos=None returns the number for all QueryResponses)
sample_response.get_num_deleted(response_pos)
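The difference between response_pos=0 and response_pos=None is easiest to see on a small stand-in. The class below is not the pyOpenCGA implementation; it is a simplified sketch (MiniRestResponse is a hypothetical name) that mimics the behaviour described for a few of the methods, operating on a plain dictionary whose field names are assumed.

```python
class MiniRestResponse:
    """Simplified stand-in that mimics a few of the documented methods."""

    def __init__(self, payload):
        self.responses = payload.get("responses", [])

    def _positions(self, response_pos):
        # None means "all responses"; otherwise a single position
        if response_pos is None:
            return range(len(self.responses))
        return [response_pos]

    def get_results(self, response_pos=0):
        return self.responses[response_pos]["results"]

    def get_result(self, result_pos, response_pos=0):
        return self.get_results(response_pos)[result_pos]

    def result_iterator(self, response_pos=None):
        for pos in self._positions(response_pos):
            for result in self.responses[pos]["results"]:
                yield result

    def get_num_results(self, response_pos=None):
        return sum(len(self.responses[pos]["results"]) for pos in self._positions(response_pos))


payload = {
    "responses": [
        {"results": [{"id": "a"}, {"id": "b"}]},
        {"results": [{"id": "c"}]},
    ]
}
mini = MiniRestResponse(payload)
print(mini.get_results())                                    # first response only
print([result["id"] for result in mini.result_iterator()])   # every response
print(mini.get_num_results(), mini.get_num_results(1))
```

In this sketch, methods that return a list default to the first response, while the iterator and the counters default to spanning every response, matching the defaults listed above.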
To explore the data in an easier way, a method named print_results has also been implemented to show the response in a more human-readable format.
## Print results of the query for the response in position "response_pos" (response_pos=None returns the results for all QueryResponses)
sample_response.print_results(fields='id', response_pos=0, limit=5, separator='\t', metadata=True, outfile='path/to/output.tsv')
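If you need a tabular export with a layout you control, a list of results can also be turned into delimited text with the standard library. The sketch below is not how print_results is implemented; results_to_tsv is a hypothetical helper, and the rows are invented.

```python
import csv
import io


def results_to_tsv(results, fields, limit=None, separator="\t"):
    """Render selected fields of each result as delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator="\n")
    writer.writerow(fields)
    for result in results[:limit]:
        # Missing fields are written as empty cells
        writer.writerow([result.get(field, "") for field in fields])
    return buffer.getvalue()


rows = [
    {"id": "s1", "description": "first"},
    {"id": "s2"},
    {"id": "s3", "description": "third"},
]
table = results_to_tsv(rows, ["id", "description"], limit=2)
print(table)
```

The list passed in could be the output of get_results(), for example results_to_tsv(sample_response.get_results(), ['id']).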
# First, we need to import both the ClientConfiguration and the OpencgaClient
from pyopencga.opencga_config import ClientConfiguration
from pyopencga.opencga_client import OpencgaClient

# Second, we need to set up the configuration
# The main client-configuration.yml file has a "host" section to point to the REST OpenCGA endpoints
# We need to either pass the path to the configuration file or a dictionary with the same structure of the file
config = ClientConfiguration({'rest': {'host': 'http://bioinfo.hpc.cam.ac.uk/opencga-prod'}})

# Third, we create an instance of the OpencgaClient passing the configuration
oc = OpencgaClient(config)

# Finally, we need to authenticate.
oc.login(user='demouser', password='demouser')

# Additionally, we can check that we've logged in successfully by printing the obtained token
print(oc.token)
# We can get the ID of all the available projects in this OpenCGA installation
for project in oc.projects.search().get_results():
    print(project['id'])

# We can get the ID of all the available studies in the project
for study in oc.studies.search(project='family').get_results():
    print(study['id'])

# We can get the ID for all the available families in the study
for family in oc.families.search(study='corpasome').get_results():
    print(family['id'])

# We can get the ID for all the available samples in the study
for sample in oc.samples.search(study='corpasome').get_results():
    print(sample['id'])
# We are interested in looking for all the individuals containing a particular disorder: "OMIM:611597"
individuals_query_response = oc.individuals.search(
    study='corpasome',  # name of the study where the families are stored
    disorders='OMIM:611597',  # id of the disorders of interest
    include='id'  # retrieve only these fields from the results
)

# If we want to know exactly the number of individuals obtained, we can run:
print(individuals_query_response.get_num_results())

# Now we fetch all the variants falling in the "BFSP2" gene for those individuals
# In this case, we will limit the variant query to a maximum of 10 results
# We also exclude sample information (includeSample='none') as it can be huge and would make this query much slower
for individual in individuals_query_response.get_results():
    print('Individual: ' + individual['id'])
    samples = ','.join([sample['id'] for sample in individual['samples']])
    variant_response = oc.variants.query(study='corpasome', sample=samples, gene='BFSP2', includeSample='none', limit=10)
    if variant_response.get_num_results() > 0:
        for variant in variant_response.get_results():
            print('{}:{}-{}\t{}'.format(variant['chromosome'], str(variant['start']), str(variant['end']), variant['type']))
    else:
        print('No variant results found')
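The two data-handling steps in this example, joining the sample IDs of an individual and formatting a variant, can be tried without a server. The sketch below uses invented records with the same keys as the example; sample_ids_of and format_variant are hypothetical helpers. Reading the samples with .get guards against an individual that comes back without a samples field, which may depend on what the include parameter returns.

```python
def sample_ids_of(individual):
    """Join the sample IDs of an individual into the comma-separated form used in queries."""
    return ",".join(sample["id"] for sample in individual.get("samples", []))


def format_variant(variant):
    """Format a variant as chromosome:start-end followed by a tab and its type."""
    return "{}:{}-{}\t{}".format(
        variant["chromosome"], variant["start"], variant["end"], variant["type"]
    )


# Invented records for illustration only
individual = {"id": "ind_1", "samples": [{"id": "sample_a"}, {"id": "sample_b"}]}
variant = {"chromosome": "3", "start": 100, "end": 100, "type": "SNV"}

print(sample_ids_of(individual))
print(format_variant(variant))
```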
# Now we are interested in getting the rs IDs for the first 10 variants for a particular sample
for variant in oc.variants.query(sample='ISDBM322015', study='corpasome', limit=10).get_results():
    print(variant['names'])

# We can also get rs IDs for multiple samples
for variant in oc.variants.query(sample='ISDBM322015,ISDBM322016,ISDBM322017,ISDBM322018', study='corpasome', limit=10).get_results():
    print(variant['names'])
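The names field printed above is a list, and it may hold identifiers other than rs IDs or be empty. A small filter, sketched below with invented records, keeps only the entries that look like rs IDs; rs_ids is a hypothetical helper and the "starts with rs" rule is an assumption of this example.

```python
def rs_ids(variant):
    """Return the entries of the 'names' list that start with 'rs'."""
    names = variant.get("names") or []
    return [name for name in names if name.startswith("rs")]


# Invented records for illustration only
variants = [
    {"id": "1:1000:A:T", "names": ["rs123", "other_id"]},
    {"id": "1:2000:G:C", "names": []},
    {"id": "1:3000:C:G"},
]
for variant in variants:
    print(variant["id"], rs_ids(variant))
```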
# If we have an rs ID for a variant, we can obtain its ID in OpenCGA (chromosome:position:reference:alternate)
variant_id = oc.variants.query(study='corpasome', xref='rs1851943').get_result(0)['id']

# Now we are interested in getting all the samples that have that particular variant
for variant in oc.variants.query_sample(study='corpasome', variant=variant_id, debug=True).get_results():
    for study in variant['studies']:
        for sample in study['samples']:
            print(sample['sampleId'])
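The variant ID format mentioned above (chromosome:position:reference:alternate) is simple to split into its parts when you need them separately. The sketch below handles only that four-field form; parse_variant_id is a hypothetical helper, the example ID is invented, and other ID shapes are rejected.

```python
def parse_variant_id(variant_id):
    """Split a chromosome:position:reference:alternate ID into a dictionary."""
    parts = variant_id.split(":")
    if len(parts) != 4:
        raise ValueError("Expected chromosome:position:reference:alternate, got: " + variant_id)
    chromosome, position, reference, alternate = parts
    return {
        "chromosome": chromosome,
        "position": int(position),
        "reference": reference,
        "alternate": alternate,
    }


print(parse_variant_id("3:133191419:A:G"))
```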
Additionally, there are several notebooks with more real examples at https://github.com/opencb/opencga/tree/develop/opencga-client/src/main/python/notebooks.