OpenCGA is an open-source platform that aims to provide a full-stack solution for big data analysis and visualisation of genomic data. It has been designed to be secure, high-performance and scalable. OpenCGA covers all aspects of genomic analysis: metadata database, authentication and security, variant normalisation and aggregation, variant annotation, highly scalable NoSQL variant storage, alignment and coverage, big data variant analysis, and visualisation.

OpenCGA is developed and maintained at the University of Cambridge and is currently used by several big data projects such as GEL.

Main Features

In this section you will find a summary of the main features of OpenCGA.

Catalog Metadata

Catalog Data Models and Annotations 


Catalog Database


Security

Authentication and Permissions

  • OpenCGA comes with a built-in authentication system. Other systems such as LDAP or Microsoft Azure AD (under development) are also supported. Authentication tokens use the JWT standard, which facilitates the creation of federated systems.
  • An advanced and efficient resource permission system is implemented. You can define different permissions such as VIEW, WRITE or DELETE at study level or on any specific document. This allows you to share data with other users. More information at Sharing and Permissions.
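The two permission levels described above can be illustrated with a minimal sketch. This is not OpenCGA's implementation: the `Study` class, its `study_acl` and `document_acl` fields, and the rule that a document-level entry overrides the study-level one are all assumptions made for illustration. Only the permission names VIEW, WRITE and DELETE come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum


class Permission(Enum):
    VIEW = "VIEW"
    WRITE = "WRITE"
    DELETE = "DELETE"


@dataclass
class Study:
    # Hypothetical structure: user -> permissions granted on the whole study.
    study_acl: dict = field(default_factory=dict)
    # Hypothetical structure: (document id, user) -> permissions on one document.
    document_acl: dict = field(default_factory=dict)

    def is_allowed(self, user, permission, document_id=None):
        """Check a permission; a document-level entry, when present,
        overrides the study-level one (an assumption of this sketch)."""
        key = (document_id, user)
        if document_id is not None and key in self.document_acl:
            return permission in self.document_acl[key]
        return permission in self.study_acl.get(user, set())


study = Study()
# alice can view and write everything in the study.
study.study_acl["alice"] = {Permission.VIEW, Permission.WRITE}
# bob can only view a single shared document.
study.document_acl[("sample-1", "bob")] = {Permission.VIEW}
```

With this layout, sharing one document with another user is a single entry in `document_acl` and does not expose the rest of the study.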

Alignment Storage


Variant Storage

OpenCGA provides a framework for implementing big data variant storage engines that support real-time queries, interactive complex data aggregations, full-text search and variant analysis, among other features. The framework takes care of several common operations such as variant normalisation, sample genotype aggregation, variant stats calculation, variant annotation, secondary indexing and in-memory caching. Two different engines are implemented using NoSQL databases: MongoDB and HBase. A secondary index using Solr is integrated with both implementations. Implementing the variant storage engines on NoSQL databases ensures fast response times and high concurrency.

Data Management

  • An advanced variant normalisation tool is implemented.
  • Rich and efficient data models are implemented. The variant data model supports different studies, file information, sample information and a rich variant annotation. Sample genotypes are efficiently stored to allow 
  • Dynamic variant storage: you can add or remove samples dynamically from the variant storage. This allows you to add hundreds of samples a day and remove incorrect data at any point.
  • Variant Stats
  • Variant Annotation
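As a rough illustration of what variant normalisation involves, the sketch below trims bases shared by the reference and alternate alleles so that equivalent representations of a variant collapse to one form. It is not OpenCGA's normalisation tool: the function name `trim_alleles` is hypothetical, the sketch follows the VCF-style convention of keeping at least one base per allele (OpenCGA may use a different convention), and it omits left-alignment, which needs the reference genome sequence.

```python
def trim_alleles(position, reference, alternate):
    """Trim bases shared by both alleles, keeping at least one base in each.

    position is the 1-based coordinate of the first reference base.
    Returns the adjusted (position, reference, alternate) tuple.
    """
    # Trim the shared suffix first; this does not move the start position.
    while len(reference) > 1 and len(alternate) > 1 and reference[-1] == alternate[-1]:
        reference = reference[:-1]
        alternate = alternate[:-1]
    # Then trim the shared prefix, advancing the position for each base removed.
    while len(reference) > 1 and len(alternate) > 1 and reference[0] == alternate[0]:
        reference = reference[1:]
        alternate = alternate[1:]
        position += 1
    return position, reference, alternate


# A padded SNV and a padded deletion reduced to their minimal forms.
snv = trim_alleles(100, "GCAT", "GTAT")
deletion = trim_alleles(100, "CTCC", "CCC")
```

Here `snv` becomes position 101 with alleles C and T, and `deletion` stays at position 100 with alleles CT and C.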

Query Engine


Aggregation and Stats


Big Data Analysis


Performance and scalability

  • Storage engines have been implemented to provide real-time queries and interactive (faceted) aggregations even with thousands of whole genomes.
  • Data mod

Clinical Analysis

OpenCGA aims to provide a full solution for clinical genomics analysis. This covers patient clinical data, interpretation algorithms and a pathogenic variant database.

Clinical Data

  • Catalog is designed to store any clinical data model.

Clinical Interpretation Analysis

  • Open a patient case study by creating a clinical analysis. This contains the patient and family data, the disease or phenotype to be analysed and the files, among other information.
  • Complete disease panel management is implemented: create, update and delete disease panels. You can also import them automatically from PanelApp (GEL). Updated panels are versioned to keep track of existing analyses.
  • Several rare disease interpretation analyses are implemented, such as TEAM or Tiering, which is based on the GEL RD Tiering tool (cancer interpretation analysis coming soon). You can use one or more disease panels in the interpretation analysis.
  • You can save more than one interpretation analysis result in the clinical analysis to create one or more clinical reports.

Pathogenic Variant Database


RESTful Web Services

OpenCGA implements more than 150 RESTful web services that allow users to manipulate and query Catalog metadata and data such as alignments, variants and pathogenic variants. The REST web services are documented using Swagger; you can see the OpenCGA Swagger documentation at http://bioinfo.hpc.cam.ac.uk/hgva/webservices/. To facilitate the use of all these web services we have implemented different client libraries and a command line (see below in Usability).

REST web services can be grouped in different categories: Catalog, Alignment, Variant, Clinical and Admin. 
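To give a feel for how a client might address services grouped by category, here is a minimal sketch that only builds a request URL. Everything specific in it is an assumption: the base URL `https://example.org/opencga/rest`, the `category/action` path layout, the `query` action and the `region` and `limit` parameter names are hypothetical placeholders, not OpenCGA's documented endpoints. Consult the Swagger documentation for the real paths and parameters.

```python
from urllib.parse import urlencode


def build_query_url(base_url, category, action, **filters):
    """Build a URL of the (hypothetical) form base/category/action?filters."""
    url = f"{base_url.rstrip('/')}/{category}/{action}"
    # Sort the filters so the same query always yields the same URL.
    query = urlencode(sorted(filters.items()))
    return f"{url}?{query}" if query else url


# Hypothetical base URL, action and filter names, for illustration only.
url = build_query_url(
    "https://example.org/opencga/rest",
    "variant",
    "query",
    region="1:1000-2000",
    limit=10,
)
```

Building URLs in one place like this is roughly what a client library does for you, which is why using the provided clients is usually preferable to hand-written requests.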

Catalog

Alignment

  • You can index BAM files to query reads and calculate coverage in BigWig format.
  • A query method fetches alignments in GA4GH format. Several filters are implemented: region, mapping quality, number of mismatches, number of hits, properly paired, ...
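The filters listed above can be pictured with a small in-memory sketch. It does not use OpenCGA or read BAM files: the `Read` class, its field names and the `filter_reads` function are hypothetical, and only three of the listed filters (region, mapping quality, properly paired) are shown.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical, simplified alignment record; coordinates are 1-based inclusive.
    chromosome: str
    start: int
    end: int
    mapping_quality: int
    properly_paired: bool


def filter_reads(reads, chromosome, start, end,
                 min_mapping_quality=0, properly_paired_only=False):
    """Return the reads overlapping a region that pass the optional filters."""
    selected = []
    for read in reads:
        # Two intervals overlap when each one starts before the other ends.
        overlaps = (read.chromosome == chromosome
                    and read.start <= end and read.end >= start)
        if not overlaps:
            continue
        if read.mapping_quality < min_mapping_quality:
            continue
        if properly_paired_only and not read.properly_paired:
            continue
        selected.append(read)
    return selected


reads = [
    Read("1", 100, 199, 60, True),
    Read("1", 150, 249, 10, True),   # low mapping quality
    Read("1", 180, 279, 60, False),  # not properly paired
    Read("2", 100, 199, 60, True),   # different chromosome
]
hits = filter_reads(reads, "1", 190, 300,
                    min_mapping_quality=30, properly_paired_only=True)
```

In this example only the first read passes all three filters.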

Variant


Clinical


Admin


Usability

REST Clients


Command-line Interface (CLI)


Visualisation

OpenCGA web catalog


IVA

Genome Browser

