The first step is to import the module and initialize the CellBaseClient:
>>> from pycellbase.cbclient import CellBaseClient
>>> cbc = CellBaseClient()
The second step is to create the specific client for the data we want to query (in this example we want to obtain information for a gene):
>>> gc = cbc.get_gene_client()
Now you can start querying the CellBase RESTful service by providing a query ID:
>>> tfbs_responses = gc.get_tfbs('BRCA1')  # Obtaining TFBSs for BRCA1 gene
Responses are retrieved as JSON-formatted data, so fields can be queried by key:
>>> tfbs_responses = gc.get_tfbs('BRCA1')
>>> tfbs_responses[0]['result'][0]['tfName']
'E2F4'
>>> transcript_responses = gc.get_transcript('BRCA1')
>>> 'Number of transcripts: %d' % (len(transcript_responses[0]['result']))
'Number of transcripts: 27'
>>> for tfbs_response in gc.get_tfbs('BRCA1,BRCA2,LDLR'):
...     print('Number of TFBS for "%s": %d' % (tfbs_response['id'], len(tfbs_response['result'])))
...
Number of TFBS for "BRCA1": 175
Number of TFBS for "BRCA2": 43
Number of TFBS for "LDLR": 141
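To make the response layout concrete, here is a minimal offline sketch. The `mock_responses` list is hand-built to mirror the nesting seen above (one entry per queried ID, each with an `id` and a `result` list). Apart from the `tfName` value quoted above, it is not real CellBase output, and both the `NO_HITS` ID and the `first_tf_names` helper are hypothetical.

```python
# Hand-built stand-in for a get_tfbs() response. Only the nesting mirrors the
# examples above; 'NO_HITS' is a made-up ID used to show the empty case.
mock_responses = [
    {'id': 'BRCA1', 'result': [{'tfName': 'E2F4'}]},
    {'id': 'NO_HITS', 'result': []},
]


def first_tf_names(responses):
    """Map each queried ID to the tfName of its first result (None if empty)."""
    names = {}
    for response in responses:
        results = response.get('result', [])
        names[response['id']] = results[0].get('tfName') if results else None
    return names


summary = first_tf_names(mock_responses)
print(summary)
```

Checking for an empty `result` list before indexing avoids an IndexError when an ID returns nothing.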
Data can be accessed by specifying either comma-separated IDs or a list of IDs:
>>> tfbs_responses = gc.get_tfbs('BRCA1')
>>> len(tfbs_responses)
1
>>> tfbs_responses = gc.get_tfbs('BRCA1,BRCA2')
>>> len(tfbs_responses)
2
>>> tfbs_responses = gc.get_tfbs(['BRCA1', 'BRCA2'])
>>> len(tfbs_responses)
2
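The equivalence of the two input forms can be pictured with a small sketch. `normalize_ids` is a hypothetical helper, not part of PyCellBase; it only illustrates one way a string and a list could be reduced to the same comma-separated query.

```python
def normalize_ids(query_ids):
    """Return IDs as one comma-separated string, accepting a string or a list."""
    if isinstance(query_ids, str):
        parts = query_ids.split(',')
    else:
        parts = list(query_ids)
    # Drop surrounding whitespace and empty entries such as a trailing comma.
    return ','.join(part.strip() for part in parts if part.strip())


as_string = normalize_ids('BRCA1,BRCA2')
as_list = normalize_ids(['BRCA1', 'BRCA2'])
print(as_string == as_list)
```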
If a resource is available but this Python package has no method for it, the CellBaseClient can be used to build the URL of interest and query the RESTful service directly:
>>> tfbs_responses = cbc.get(category='feature', subcategory='gene', query_id='BRCA1', resource='tfbs')
>>> tfbs_responses[0]['result'][0]['tfName']
'E2F4'
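As a rough illustration of what "creating the URL" involves, the sketch below assembles a REST URL from the same pieces passed to `cbc.get`. The path layout (`webservices/rest/<version>/<species>/<category>/<subcategory>/<id>/<resource>`) is an assumption about the CellBase service, and `build_url` is a hypothetical function, not the package's own implementation.

```python
from urllib.parse import quote, urlencode


def build_url(host, version, species, category, subcategory,
              query_id, resource, **options):
    """Assemble a REST URL; the path layout is assumed, not taken from PyCellBase."""
    if not host.startswith(('http://', 'https://')):
        host = 'http://' + host
    url = '/'.join([
        host.rstrip('/'), 'webservices', 'rest', version, species,
        category, subcategory, quote(query_id, safe=','), resource,
    ])
    if options:
        # Lists become comma-separated strings, matching the two accepted forms.
        params = {
            key: ','.join(value) if isinstance(value, (list, tuple)) else value
            for key, value in options.items()
        }
        url += '?' + urlencode(params, safe=',')
    return url


url = build_url('bioinfo.hpc.cam.ac.uk:80/cellbase', 'v4', 'hsapiens',
                'feature', 'gene', 'BRCA1', 'tfbs', include=['name', 'id'])
print(url)
```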
Optional filters and extra options can be added as key-value parameters (the value can be a comma-separated string or a list):
>>> tfbs_responses = gc.get_tfbs('BRCA1')
>>> len(tfbs_responses[0]['result'])
175
>>> tfbs_responses = gc.get_tfbs('BRCA1', include='name,id')
>>> len(tfbs_responses[0]['result'])
175
>>> tfbs_responses = gc.get_tfbs('BRCA1', include=['name', 'id'])
>>> len(tfbs_responses[0]['result'])
175
>>> tfbs_responses = gc.get_tfbs('BRCA1', limit=100)
>>> len(tfbs_responses[0]['result'])
100
>>> tfbs_responses = gc.get_tfbs('BRCA1', skip=100)
>>> len(tfbs_responses[0]['result'])
75
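The counts above are consistent with `skip` dropping leading results and `limit` capping how many are returned. The following sketch reproduces that arithmetic on a local placeholder list. It assumes this is how the server applies the two options, and `apply_paging` is a hypothetical helper.

```python
def apply_paging(results, skip=0, limit=None):
    """Drop the first `skip` items, then keep at most `limit` of the rest."""
    remaining = results[skip:]
    return remaining if limit is None else remaining[:limit]


# Placeholder items standing in for the 175 TFBS results reported for BRCA1.
results = list(range(175))

limited = apply_paging(results, limit=100)
skipped = apply_paging(results, skip=100)
print(len(limited), len(skipped))
```

Under the same assumption, combining the two options would let a caller page through a large result set in fixed-size chunks.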
The best way to find out which data each client can retrieve is to check either the RESTful web services section of the CellBase Wiki or the CellBase web services.
If we do not know which method is the most suitable for our task, we can get helpful information for each data-specific client:
>>> cbc.get_region_client().get_help()
RegionClient
    - get_clinical: Retrieves all the clinical variants
    - get_conservation: Retrieves all the conservation scores
    - get_gene: Retrieves all the gene objects for the regions. If query param histogram=true, frequency values per genomic interval will be returned instead.
    - get_model: Get JSON specification of Variant data model
    - get_regulatory: Retrieves all regulatory elements in a region
    - get_repeat: Retrieves all repeats for the regions
    - get_sequence: Retrieves genomic sequence
    - get_tfbs: Retrieves all transcription factor binding site objects for the regions. If query param histogram=true, frequency values per genomic interval will be returned instead.
    - get_transcript: Retrieves all transcript objects for the regions
    - get_variation: Retrieves all the variant objects for the regions. If query param histogram=true, frequency values per genomic interval will be returned instead.
We can get the accepted parameters and filters for a specific method of interest by using the get_help method:
>>> cbc.get_region_client().get_help('get_gene', show_params=True)
Configuration stores the REST services host, API version and species.
Getting the default configuration:
>>> from pycellbase.cbconfig import ConfigClient
>>> ConfigClient().get_default_configuration()
{'version': 'v4', 'species': 'hsapiens', 'rest': {'hosts': ['http://bioinfo.hpc.cam.ac.uk:80/cellbase']}}
Showing the configuration parameters being used at the moment:
>>> cbc.show_configuration()
{'host': 'bioinfo.hpc.cam.ac.uk:80/cellbase', 'version': 'v4', 'species': 'hsapiens'}
A custom configuration can be passed to CellBaseClient with a ConfigClient object. JSON and YML files are supported:
>>> from pycellbase.cbconfig import ConfigClient
>>> from pycellbase.cbclient import CellBaseClient
>>> cc = ConfigClient('config.json')
>>> cbc = CellBaseClient(cc)
A custom configuration can also be passed as a dictionary:
>>> from pycellbase.cbconfig import ConfigClient
>>> from pycellbase.cbclient import CellBaseClient
>>> custom_config = {'rest': {'hosts': ['bioinfo.hpc.cam.ac.uk:80/cellbase']}, 'version': 'v4', 'species': 'hsapiens'}
>>> cc = ConfigClient(custom_config)
>>> cbc = CellBaseClient(cc)
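For the file-based route, a configuration file can be produced from the same dictionary. This sketch assumes that the JSON file uses the same keys as the dictionary form shown above, which the source does not spell out. It writes the file to a temporary directory with the standard library and reads it back, without contacting CellBase.

```python
import json
import os
import tempfile

# Same keys and values as the dictionary example above.
custom_config = {
    'rest': {'hosts': ['bioinfo.hpc.cam.ac.uk:80/cellbase']},
    'version': 'v4',
    'species': 'hsapiens',
}

# Write the configuration to a temporary config.json.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, 'config.json')
with open(config_path, 'w') as handle:
    json.dump(custom_config, handle, indent=2)

# Read it back to confirm the round trip preserves the structure.
with open(config_path) as handle:
    loaded_config = json.load(handle)
print(loaded_config == custom_config)
```

If the assumption about the file layout holds, the resulting path could then be passed to ConfigClient in place of 'config.json'.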
If you want to change the configuration on the fly, you can directly modify the ConfigClient object:
>>> cc = ConfigClient()
>>> cbc = CellBaseClient(cc)
>>> cbc.get_config()['version']
'v4'
>>> cc.version = 'v3'
>>> cbc.get_config()['version']
'v3'
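One plausible reading of this behaviour is that the client keeps a reference to the configuration object rather than a copy, so later changes are visible through it. The sketch below illustrates that idea with two hypothetical stand-in classes, `SketchConfig` and `SketchClient`. It is not how PyCellBase is known to be implemented.

```python
class SketchConfig:
    """Hypothetical stand-in for a configuration object."""

    def __init__(self, version='v4'):
        self.version = version


class SketchClient:
    """Hypothetical stand-in for a client that shares its configuration."""

    def __init__(self, config):
        self._config = config  # keep a reference, not a copy

    def get_config(self):
        # Read the value at call time so external changes are reflected.
        return {'version': self._config.version}


config = SketchConfig()
client = SketchClient(config)
before = client.get_config()['version']
config.version = 'v3'
after = client.get_config()['version']
print(before, after)
```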
# Loading CellBase and configuration clients
from pycellbase.cbconfig import ConfigClient
from pycellbase.cbclient import CellBaseClient

# Initializing CellBaseClient
cc = ConfigClient("/path/to/config.json")
cbc = CellBaseClient(cc)

# Initializing gene client
gc = cbc.get_gene_client()

# Retrieving transcription factor binding sites (TFBS) for a gene list
gene_list = ['BRCA1', 'BRCA2', 'LDLR']
tfbs_responses = gc.get_tfbs(gene_list, include='id')

# Printing the number of TFBS found for each gene
for response in tfbs_responses:
    print('Number of TFBS for "%s": %d' % (response['id'], len(response['result'])))
A use case where PyCellBase is used to obtain multiple kinds of data from different sources can be found in this Jupyter Notebook.
You can navigate from GitHub examples: