Once the CLI code has been downloaded and properly configured (please refer to the Command Line Interface (CLI) section for further information), it is ready to start querying HGVA. We will focus on the commands, subcommands and parameters of the CLI that are of most interest for HGVA users, giving examples of their use and pinpointing certain peculiarities of the parameters for HGVA. Data is hierarchically organised in Projects, Studies and Cohorts; please have a look at the Datasets and Studies section to understand how data is organised. For details on the query parameters, please refer to the Swagger documentation at:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices

Further details on the API specification can be found at the RESTful Web Services and Clients section.


Installation

As described in the documentation, the HGVA backend is powered by the OpenCGA project, and the CLI is distributed with the rest of the OpenCGA code. Clone the OpenCGA repository by running the following in your terminal, then check out a release. The example below checks out tag v1.3.6; to reproduce the 1.1.0 output shown further down this page, check out the release-1.1.0 branch instead:

```shell
$ git clone https://github.com/opencb/opencga.git
$ cd opencga
$ git checkout v1.3.6
```

Alternatively, you can download tar.gz files with the code for the latest tags/releases of OpenCGA from:

https://github.com/opencb/opencga/releases

Once you have downloaded the code, follow the instructions at the How to Build section of the OpenCGA repository:

https://github.com/opencb/opencga

The CLI is accessible through the opencga.sh script, located in the build/bin directory:


```text
$ cd opencga/build/bin
opencga/build/bin$ ./opencga.sh

Program:     OpenCGA (OpenCB)
Version:     1.1.0
Git commit:  f2dace56fcdf491efee8ebb0cb43f981e31c320e
Description: Big Data platform for processing and analysing NGS data

Usage:       opencga.sh [-h|--help] [--version] <command> [options]

Catalog commands:
         users  User commands
      projects  Project commands
       studies  Study commands
         files  File commands
          jobs  Jobs commands
   individuals  Individual commands
      families  Family commands
       samples  Samples commands
     variables  Variable set commands
       cohorts  Cohorts commands

Analysis commands:
    alignments  Implement several tools for the genomic alignment analysis
       variant  Variant commands
```

The CLI provides commands, subcommands and parameters to access its functionality. The commands of most interest for HGVA users are projects, studies, cohorts and samples, together with variant for querying the variants themselves; examples of each are given below. Further documentation on the OpenCGA CLI can be found in the Command Lines section of the OpenCGA documentation.

Configuration

As previously mentioned, the CLI makes intensive use of the RESTful API. Thus, the only configuration needed for the CLI to work is the URL where the Web Services API is hosted, which is set in the client-configuration.yml file. You will find a template of this file in the build/conf directory:

```shell
$ ls -l opencga/build/conf/client-configuration.yml
-rw-r--r-- 1 fjlopez fjlopez 290 Oct 24 17:49 opencga/build/conf/client-configuration.yml
```

Edit this file with any text editor and set the rest → host attribute to "http://bioinfo.hpc.cam.ac.uk/hgva":

Configuration file client-configuration.yml:

```yaml
---
## number of seconds that a session remains open
sessionDuration: 12000

## REST client configuration options
rest:
  host: "http://bioinfo.hpc.cam.ac.uk/hgva"
  batchQuerySize: 200
  timeout: 30000
  defaultLimit: 2000

## gRPC configuration options
grpc:
  host: "localhost:9091"
```
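If you prefer to script this change, the following is a minimal sketch using sed. It assumes the layout shown above, in which a host key appears both under rest and under grpc, so it only rewrites the host line between rest: and the next top-level key. The function name set_rest_host is hypothetical, and the new host value must not contain | or &, which sed would interpret in the replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set rest -> host in client-configuration.yml.
# Assumes the YAML layout shown above (top-level "rest:" and "grpc:" keys).
set_rest_host() {
  local conf_file="$1"   # path to client-configuration.yml
  local new_host="$2"    # new REST host URL (must not contain '|' or '&')
  local tmp_file
  tmp_file="$(mktemp)" || return 1
  # Only lines from "rest:" up to the next top-level key (a line starting with
  # neither whitespace nor '#') are candidates, so grpc -> host is untouched.
  sed "/^rest:/,/^[^[:space:]#]/ s|^\([[:space:]]*host:\).*|\1 \"${new_host}\"|" \
    "$conf_file" > "$tmp_file" && cat "$tmp_file" > "$conf_file"
  local status=$?
  rm -f "$tmp_file"
  return "$status"
}
```

For example: set_rest_host opencga/build/conf/client-configuration.yml "http://bioinfo.hpc.cam.ac.uk/hgva". Writing the result back with cat rather than mv keeps the original file permissions.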


Examples

Getting information about variants

You can query variants using the variant command and its query subcommand. An extensive list of filtering parameters allows great flexibility in the queries (see the Swagger documentation linked above), and the inline help provided by opencga.sh gives further details. For example, the following queries get variants in the TTN gene from the Genome of the Netherlands (GONL) study, which belongs to the reference_grch37 project, restrict the returned study data to GONL, and limit the number of results to 3:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/analysis/variant/query?gene=TTN&studies=GONL&limit=3
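The REST example above can also be scripted. The sketch below assumes bash and curl are available and that the server used in the examples on this page (http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1) is reachable; the helper names are hypothetical. It builds the query string from the gene, studies and limit parameters used above without URL-encoding them, which is fine for plain identifiers such as TTN and GONL.

```shell
#!/usr/bin/env bash
# Base URL taken from the examples on this page; adjust it if the server moves.
HGVA_REST="http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1"

# Hypothetical helper: print the variant query URL for a gene, a study and a limit.
hgva_variant_query_url() {
  local gene="$1" study="$2" limit="$3"
  printf '%s/analysis/variant/query?gene=%s&studies=%s&limit=%s\n' \
    "$HGVA_REST" "$gene" "$study" "$limit"
}

# Hypothetical helper: run the query with curl and print the JSON response.
# --fail makes curl exit with a non-zero status on HTTP errors.
hgva_variant_query() {
  curl --silent --show-error --fail "$(hgva_variant_query_url "$@")"
}
```

For example, hgva_variant_query TTN GONL 3 should return the same JSON as the URL above.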

```shell
./opencga.sh variant query --gene TTN --study GONL --limit 3 --of json --output-study GONL
```

Getting information about projects

You can use the projects command to query project data.

To get all metadata for a particular project, use the info subcommand. The corresponding REST endpoint is:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/projects/{projects}/info

For example, the following REST call returns all metadata for the reference_grch37 project:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/projects/reference_grch37/info

The equivalent CLI call, here for the cancer_grch37 project, is:

```shell
./opencga.sh projects info --project cancer_grch37
```
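To retrieve metadata for several projects in one go, a loop over the projects/{projects}/info endpoint is enough. This is a sketch assuming bash, curl and the base URL used in the examples on this page; project_info_url and save_project_info are hypothetical names, and each response is written to <project>.json in the current directory.

```shell
#!/usr/bin/env bash
# Base URL taken from the examples on this page.
HGVA_REST="http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1"

# Hypothetical helper: print the info URL for one project.
project_info_url() {
  printf '%s/projects/%s/info\n' "$HGVA_REST" "$1"
}

# Hypothetical helper: save the metadata of each project given as an argument
# to <project>.json, reporting (but not stopping on) failed requests.
save_project_info() {
  local project
  for project in "$@"; do
    curl --silent --show-error --fail --output "${project}.json" \
      "$(project_info_url "$project")" || echo "request failed for ${project}" >&2
  done
}
```

For example: save_project_info reference_grch37 cancer_grch37.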

To get all metadata for all studies associated with a particular project, use the studies subcommand. The corresponding REST endpoint is:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/projects/{projects}/studies

For example, to get all studies and their metadata for the cancer_grch37 project:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/projects/cancer_grch37/studies

```shell
./opencga.sh projects studies --project cancer_grch37
```

Getting information about studies

You can use the studies command to query study data.

To get all available studies and their metadata, use the search subcommand. Of special interest here is the alias field, which contains the study identifier to use as input whenever a study must be passed as a parameter. For example, to get all metadata for all available studies:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/studies/search

```shell
./opencga.sh studies search
```
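Since the alias field is what you pass as a study parameter, it can be handy to list just the aliases from the search response. The sketch below uses only grep and sed; it does not parse JSON, so it assumes each alias is a plain string without escaped quotes, and it will also pick up any other object in the response that carries an alias field. The function name extract_aliases is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of every "alias" field found in the JSON
# read from stdin, one per line. Tolerates whitespace around the colon.
extract_aliases() {
  grep -o '"alias"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}
```

For example: curl -sS http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/studies/search | extract_aliases. For anything beyond a quick look, a real JSON parser is the safer choice.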

To get summary data for a particular study, use the summary subcommand. The corresponding REST endpoint is:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/studies/{project}:{study}/summary

For example, to get summary data for study 1kG_phase3, which belongs to project reference_grch37 (i.e. {project}:{study} is reference_grch37:1kG_phase3):

```shell
./opencga.sh studies summary --study reference_grch37:1kG_phase3
```
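The summary, info and samples endpoints in this section share the studies/{project}:{study}/<action> pattern, so a single helper can build all three URLs. This is a sketch assuming bash and the base URL from the examples on this page; study_url is a hypothetical name.

```shell
#!/usr/bin/env bash
# Base URL taken from the examples on this page.
HGVA_REST="http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1"

# Hypothetical helper: build a studies/{project}:{study}/<action> URL,
# where action is e.g. summary, info or samples.
study_url() {
  local project="$1" study="$2" action="$3"
  if [ -z "$project" ] || [ -z "$study" ] || [ -z "$action" ]; then
    echo "usage: study_url <project> <study> <action>" >&2
    return 1
  fi
  printf '%s/studies/%s:%s/%s\n' "$HGVA_REST" "$project" "$study" "$action"
}
```

For example: curl -sS --fail "$(study_url reference_grch37 1kG_phase3 summary)".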
 

To get all available metadata for a particular study, use the info subcommand. The corresponding REST endpoint is:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/studies/{project}:{study}/info

For example, to get all metadata for study GONL, which belongs to project reference_grch37:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/studies/reference_grch37:GONL/info

```shell
./opencga.sh studies info --study GONL
```

To get metadata for all samples in a given study, use the samples subcommand. The corresponding REST endpoint is:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/studies/{project}:{study}/samples

For example, to get metadata for all samples in study 1kG_phase3, which belongs to project reference_grch37:

```shell
./opencga.sh studies samples --study reference_grch37:1kG_phase3
```

Please note that not all studies contain sample data: GONL and ExAC, among others, only provide variant lists and aggregated frequencies, i.e. no sample genotypes.

Getting information about samples

You can use the samples command to query sample data.

To get all metadata for a particular sample, use the info subcommand. The corresponding REST endpoint is:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/samples/{sample}/info?study={project}:{study}

For example, to get all metadata for sample HG00096 of the 1kG_phase3 study, which belongs to the reference_grch37 project:

```shell
./opencga.sh samples info --sample HG00096 --study reference_grch37:1kG_phase3
```
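Unlike the study endpoints, the sample info endpoint takes the study as a query parameter. The sketch below shows two ways of passing it, assuming bash, curl and the base URL from the examples on this page: printing the URL directly from the template above, or letting curl build and URL-encode the query string with -G and --data-urlencode (which encodes the colon as %3A). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Base URL taken from the examples on this page.
HGVA_REST="http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1"

# Hypothetical helper: print the samples/{sample}/info?study={project}:{study} URL.
sample_info_url() {
  local sample="$1" project="$2" study="$3"
  printf '%s/samples/%s/info?study=%s:%s\n' "$HGVA_REST" "$sample" "$project" "$study"
}

# Hypothetical helper: an equivalent request where curl appends the query
# parameter; -G sends the --data-urlencode value as a GET query string.
sample_info() {
  local sample="$1" project="$2" study="$3"
  curl --silent --show-error --fail -G \
    --data-urlencode "study=${project}:${study}" \
    "${HGVA_REST}/samples/${sample}/info"
}
```

For example: sample_info HG00096 reference_grch37 1kG_phase3.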

Getting information about cohorts

You can use the cohorts command to query cohort data.

To get metadata for all samples in a given cohort, use the samples subcommand. The corresponding REST endpoint is:

http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1/cohorts/{cohort}/samples?study={project}:{study}

For example, to get metadata for all samples in cohort GBR of study 1kG_phase3, which belongs to project reference_grch37:

```shell
./opencga.sh cohorts samples --study reference_grch37:1kG_phase3 --cohort GBR
```
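When comparing several cohorts of the same study, it can be convenient to generate one cohorts/{cohort}/samples URL per cohort and fetch them in turn. This sketch assumes bash, curl and the base URL from the examples on this page; cohort_samples_urls is a hypothetical name, and which cohorts exist depends on the study.

```shell
#!/usr/bin/env bash
# Base URL taken from the examples on this page.
HGVA_REST="http://bioinfodev.hpc.cam.ac.uk/hgva-1.0/webservices/rest/v1"

# Hypothetical helper: usage is cohort_samples_urls <project> <study> <cohort>...
# Prints one cohorts/{cohort}/samples?study={project}:{study} URL per cohort.
cohort_samples_urls() {
  local project="$1" study="$2"
  shift 2
  local cohort
  for cohort in "$@"; do
    printf '%s/cohorts/%s/samples?study=%s:%s\n' \
      "$HGVA_REST" "$cohort" "$project" "$study"
  done
}
```

For example: cohort_samples_urls reference_grch37 1kG_phase3 GBR | while read -r url; do curl -sS --fail "$url"; echo; done.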

