Using the Python client

An explanation about how to install and use the Python client can be found here: Python
However, in this document we will create an example tutorial to learn how to work with the Python client in order to get the most out of it.
script1.py
# First, we need to import both the ClientConfiguration and the OpenCGAClient
from pyopencga.opencga_config import ClientConfiguration
from pyopencga.opencga_client import OpenCGAClient

# The main client-configuration.yml file has a 'host' section to point to the Rest OpenCGA endpoints.
# We need to either pass the path to the configuration file or a dictionary with the format of the file.
config = ClientConfiguration('/opt/opencga/conf/client-configuration.yml')
config = ClientConfiguration({
        "rest": {
                "host": "http://bioinfo.hpc.cam.ac.uk/web-apps/opencga-demo"
        }
})

# And finally create an instance of the OpenCGAClient passing the configuration
oc = OpenCGAClient(config)

# Now we need to authenticate.
oc.login('myUser')               # If done this way, password will be prompted to the user so it is not displayed but...
oc.login('myUser', 'myPassword') # ... it is also possible to pass the password directly as an additional parameter

# Let's assume our installation already has been populated and we are interested in looking for 
# all the families containing a concrete disorder: 'Rod-cone dystrophy'. To fetch this data, we will need to:
family_query_response = oc.families.search(study="study1", limit=10, disorders="Rod-cone dystrophy", include="id,members.id")

# Running oc.families.search(disorders="Rod-cone dystrophy") with only the 'disorders' field would only work
# if only one project and one study has been defined. However, we expect that most of the OpenCGA installations
# will have more than one study, so we need to specify the families of which study we are looking for.

# Additionally, we are passing limit = 10 to limit the number of family results we want to fetch. Because this
# is an example, we are simply limiting the number of results to 10. 

# Finally, if we don't specify anything else, all the values from the Family will be fetched. When writing 
# scripts, we are normally interested in just a few fields of a whole entry, so adding the include/exclude fields
# will definitely help us getting the results faster as we will avoid sending data we are going to discard through
# the network. In this particular case, we are only interested in getting the Family id and the id of the members
# of the family. To know what fields you can include/exclude, please follow the data models we have defined.

# family_query_response is an instance of the QueryResponse class defined in the Python library. To read the fields,
# we could do the following:
family_query_response.time         # Get the time spent with the REST call
family_query_response.apiVersion   # Get the API version of the REST
family_query_response.queryOptions # Get the QueryOptions of the call (include/exclude, limit, skip, count...)
family_query_response.warning      # Get warning messages
family_query_response.error        # Get error messages
family_query_response.responses    # Get the responses (Array of QueryResults containing the data queried)

# We can iterate over all the results to print all the id's using the 'results()' method such as in the example below:
for family in family_query_response.results():
	print (family['id'])

# We could have this same behaviour if we run the following script, which is why 'results()' is that useful.
for query_result in family_query_response.responses:
	for family in query_result['results']:
		print (family['id'])

# If we want to know exactly the amount of results obtained, we can run:
family_query_response.num_results()


# Or let's say that instead of querying the data, we only wanted to get the number of families in the study with that disorder. In that case, we could:
family_count_response = oc.families.search(study="study1", disorders="Rod-cone dystrophy", count=True)


# And then get the number of matches by calling to the num_matches method:
family_count_response.num_matches()


# Now that we know how to work with the OpenCGA QueryResponse object, we will write a script to fetch all the variants 
# falling in the 'BMPR2' gene found in any member of the family. In this case, we will limit the variant query to a maximum
# of 10 results excluding the sample information (sample information can be huge and would make this query much slower).
for family in oc.families.search(study="study1",limit=10,disorders="Rod-cone dystrophy",include="id").results():
    print ("Family: " + family['id'])
    variant_response = oc.variant.query(family=family['id'], gene= 'BMPR2', study='study1', includeSamples=None, limit=10)
    if variant_response.num_total_results() > 0:
        for variant in variant_response.results():
            print (variant['chromosome'] + ":" + str(variant['start']) + "-" + str(variant['end']) + '\t' + variant['type'])
    else:
        print ("No variant results found")
    print()
Page tree

Using the Python client