OpenCGA is an open-source platform that aims to provide a full-stack solution for big data analysis and visualisation of genomic data. It has been designed to be secure, high-performance and scalable. OpenCGA covers all aspects of genomic analysis: metadata database, authentication and security, variant normalisation and aggregation, variant annotation, highly scalable NoSQL variant storage, alignment and coverage, big data variant analysis, and visualisation.
OpenCGA is developed and maintained at the University of Cambridge, and it is currently used by several big data projects such as GEL.
In this section you will find a summary of the main features of OpenCGA.
OpenCGA provides a framework for implementing big data variant storage engines that support real-time queries, interactive complex data aggregations, full-text search and variant analysis, among other features. The framework takes care of several common operations such as variant normalisation, sample genotype aggregation, variant stats calculation, variant annotation, secondary indexing and in-memory caching. Two engines are implemented for different use cases: MongoDB and HBase. A secondary index using Solr is integrated with both implementations.
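To make two of the operations named above more concrete, here is a minimal Python sketch of variant normalisation (trimming bases shared by the reference and alternate alleles) and of a simple variant statistic (alternate allele frequency computed from sample genotypes). This is an illustration under stated assumptions, not OpenCGA's actual implementation: the function names `trim_variant` and `alt_allele_frequency` are hypothetical, genotypes are assumed to be VCF-style strings such as `0/1`, and left-alignment against a reference sequence is deliberately left out.

```python
from typing import Iterable, Tuple


def trim_variant(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim bases shared by ref and alt, keeping at least one base in each.

    Hypothetical helper for illustration; it does not left-align indels.
    """
    # Drop the common suffix first; the position does not change.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop the common prefix, shifting the position by one per base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def alt_allele_frequency(genotypes: Iterable[str]) -> float:
    """Fraction of called alleles equal to the first alternate allele ("1").

    Assumes VCF-style genotype strings; missing alleles (".") are ignored.
    """
    alleles = [
        allele
        for gt in genotypes
        for allele in gt.replace("|", "/").split("/")
        if allele != "."
    ]
    return alleles.count("1") / len(alleles) if alleles else 0.0
```

For example, `trim_variant(100, "GCAT", "GTAT")` reduces a padded substitution to the single-base change at position 101, and `alt_allele_frequency(["0/1", "1/1", "0/0", "./."])` counts only the six called alleles. A production storage engine would also need to handle multi-allelic sites and reference-aware left-alignment.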
OpenCGA aims to provide a full solution for clinical genomics analysis, covering patient clinical data, interpretation algorithms and a pathogenic variant database.
OpenCGA implements more than 150 RESTful web services that allow users to manipulate and query Catalog metadata and data such as alignments, variants and pathogenic variants. The REST web services are documented using Swagger; you can see the OpenCGA Swagger documentation at http://bioinfo.hpc.cam.ac.uk/hgva/webservices/. To facilitate the use of all these web services, we have implemented different client libraries and a command-line interface (see below in Usability).
REST web services can be grouped in different categories: Catalog, Alignment, Variant, Clinical and Admin.
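As a rough illustration of how a client might address services grouped by category, the sketch below builds a request URL from a base address, a category and query parameters. Everything in it is an assumption made for the example: the `build_url` helper, the `{base}/{category}/{action}` path layout, the `search` action, the parameter names and the `example.org` host are all hypothetical. The real endpoint paths and parameters are those listed in the Swagger documentation.

```python
from urllib.parse import urlencode

# The five categories named in the text, lower-cased for use in a path.
CATEGORIES = ("catalog", "alignment", "variant", "clinical", "admin")


def build_url(base_url: str, category: str, action: str, **params) -> str:
    """Build a query URL for a hypothetical {base}/{category}/{action} layout."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    url = f"{base_url.rstrip('/')}/{category}/{action}"
    if params:
        # Sort the parameters so the same query always yields the same URL.
        url += "?" + urlencode(sorted(params.items()))
    return url


if __name__ == "__main__":
    # Hypothetical host, action and parameter names.
    print(build_url("https://example.org/rest", "variant", "search",
                    region="1:1000-2000", limit=10))
```

Validating the category up front means a misspelt group fails locally with a clear error instead of producing a request to a non-existent endpoint.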