OpenCGA is an open-source project that implements a high-performance, scalable and secure platform for genomic data analysis and visualisation.

OpenCGA provides a secure Big Data storage engine and analysis framework for genomic-scale data analysis of hundreds of terabytes.

Main Features

Authenticated and secure platform to query and visualise data
A metadata database to keep track of registered users, projects, studies, files, samples, families, jobs, ...
A clinical database for samples, patients and families
A full clinical analysis platform: you can create cases and run different clinical interpretation algorithms from your scripts or from a web application
Comprehensive RESTful web service API with more than 150 endpoints to query and manage all metadata and clinical data
Four client libraries, implemented in Java, Python, R and JavaScript
Interactive web-based data mining tool based on IVA

Metadata and Security

Metadata Database

OpenCGA Catalog implements a high-performance metadata database to track the metadata of all files, samples, families, ...

Security

OpenCGA implements authentication to control what data each user can see. Data such as files, samples and families can be shared in different ways.
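As a rough illustration of per-member sharing, the sketch below models a permission store in plain Python. It is not OpenCGA's implementation: the class `AccessControl`, its methods, and the permission names `VIEW` and `WRITE` are all hypothetical and exist only to show the idea of granting a member specific permissions on a specific entity.

```python
from collections import defaultdict


class AccessControl:
    """Toy permission store mapping (entity, member) to a set of permissions.

    Hypothetical model for illustration; not the OpenCGA Catalog API.
    """

    def __init__(self):
        self._acl = defaultdict(set)

    def share(self, entity_id, member, permissions):
        """Grant the given permissions on one entity to one member."""
        self._acl[(entity_id, member)].update(permissions)

    def can(self, entity_id, member, permission):
        """Return True only if the member was granted this permission."""
        # .get avoids creating empty entries when checking unknown pairs.
        return permission in self._acl.get((entity_id, member), set())


acl = AccessControl()
acl.share("sample:NA12878", "alice", {"VIEW"})
print(acl.can("sample:NA12878", "alice", "VIEW"))
print(acl.can("sample:NA12878", "bob", "VIEW"))
```

The default here is deny: a member sees nothing on an entity until it has been shared explicitly.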

Variant and Alignment Storage

Variant Database

OpenCGA implements a high-performance, scalable NoSQL variant database to store and index thousands of whole-genome VCF files. Observed performance shows more than 2,000 whole genomes indexed per day.

Many variant operations have been implemented, such as variant aggregation, stats calculation, variant annotation and export.

We have implemented an advanced query engine and aggregation framework to query variants.
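To make "query" and "aggregation" concrete, here is a minimal in-memory sketch in Python. It does not use or reproduce the OpenCGA query engine; the `Variant` record, its field names and the two helper functions are assumptions chosen for illustration. It filters variants by optional criteria and then counts the matches per consequence type, which is the kind of facet an aggregation framework would return.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical, simplified record; real variant models carry far more.
    chromosome: str
    position: int
    consequence: str
    allele_frequency: float


def query(variants, chromosome=None, max_frequency=None):
    """Return the variants that match every filter that was supplied."""
    result = []
    for v in variants:
        if chromosome is not None and v.chromosome != chromosome:
            continue
        if max_frequency is not None and v.allele_frequency > max_frequency:
            continue
        result.append(v)
    return result


def aggregate_by_consequence(variants):
    """Count variants per consequence type (a simple facet)."""
    return Counter(v.consequence for v in variants)


variants = [
    Variant("1", 1000, "missense_variant", 0.001),
    Variant("1", 2000, "synonymous_variant", 0.20),
    Variant("2", 1500, "missense_variant", 0.004),
    Variant("2", 3000, "stop_gained", 0.0005),
]
rare = query(variants, max_frequency=0.01)
facets = aggregate_by_consequence(rare)
print(dict(facets))
```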

Alignment Storage

Indexing BAM files and calculating coverage is supported. You can efficiently query all these data through REST web services.
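For readers unfamiliar with coverage, the following Python sketch shows one simple way to compute mean depth per fixed-size window from read intervals. It is an assumption-laden illustration, not how OpenCGA computes coverage: reads are given as hypothetical `(start, end)` pairs with 1-based inclusive coordinates, and no BAM parsing is involved.

```python
def window_coverage(reads, region_start, region_end, window_size):
    """Mean depth per window; reads are (start, end), 1-based inclusive."""
    length = region_end - region_start + 1
    depth = [0] * length
    for start, end in reads:
        # Clip each read to the region; reads outside it contribute nothing.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi + 1):
            depth[pos - region_start] += 1
    means = []
    for i in range(0, length, window_size):
        window = depth[i:i + window_size]
        means.append(sum(window) / len(window))
    return means


reads = [(1, 10), (6, 15), (11, 20)]
print(window_coverage(reads, 1, 20, 5))
```

The per-base loop keeps the sketch easy to follow; a production implementation would normally use a more efficient approach for large regions.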

Easy to Use

REST API and Clients

We have implemented a comprehensive REST API to work with Catalog and to query variant and alignment data in a secure way. To make the REST API easier to use, we have developed four client libraries, in Java, Python, R and JavaScript.
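The sketch below shows, using only the Python standard library, what an authenticated REST query might look like on the wire. The host name, the endpoint path, the query parameter names and the bearer-token header are all assumptions made for illustration; consult the REST API documentation of your OpenCGA installation for the actual endpoints. The request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values: not real OpenCGA settings.
BASE_URL = "https://opencga.example.org/webservices/rest"
VARIANT_QUERY_PATH = "/v2/analysis/variant/query"  # assumed path


def build_query_request(token, study, region, limit=10):
    """Build (but do not send) an authenticated GET request for variants."""
    params = urlencode({"study": study, "region": region, "limit": limit})
    url = f"{BASE_URL}{VARIANT_QUERY_PATH}?{params}"
    headers = {"Authorization": f"Bearer {token}"}
    return Request(url, headers=headers, method="GET")


request = build_query_request("my-token", "demo-study", "1:1000-2000")
print(request.full_url)
```

In practice the client libraries wrap this kind of call, so you would rarely build URLs by hand.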

Command Line Interface

OpenCGA implements two command-line interfaces, one for users and one for administrators. Users can fully operate OpenCGA from the command line.

Clinical Analysis

Clinical Data and Disease Panels

You can store all your clinical data in Catalog using our free data model. You can define your own clinical variables and annotate files, samples, individuals, families or cohorts. Clinical data is indexed automatically to provide real-time queries and aggregation analysis.
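The idea of user-defined clinical variables can be sketched as a small schema plus a validation step. The Python below is a simplified illustration under stated assumptions: `VARIABLE_SET`, its variable names and the `validate_annotation` helper are hypothetical and do not reflect the Catalog data model or its API.

```python
# Hypothetical variable set: names and types are illustrative only.
VARIABLE_SET = {
    "age_at_diagnosis": int,
    "smoker": bool,
    "phenotype": str,
}


def validate_annotation(annotation, variable_set=VARIABLE_SET):
    """Return a list of problems; an empty list means the annotation is valid."""
    problems = []
    for name, value in annotation.items():
        expected = variable_set.get(name)
        if expected is None:
            problems.append(f"unknown variable: {name}")
        elif type(value) is not expected:
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in variable_set:
        if name not in annotation:
            problems.append(f"missing variable: {name}")
    return problems


good = {"age_at_diagnosis": 42, "smoker": False, "phenotype": "asthma"}
bad = {"age_at_diagnosis": "42", "smoker": True}
print(validate_annotation(good))
print(validate_annotation(bad))
```

Validating annotations against a declared variable set is what makes them safe to index and query consistently.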

Disease Panels are fully supported and versioned.

Clinical Interpretation Analysis

You can define different types of Clinical Analysis. We have implemented some automatic clinical interpretation algorithms for Rare Diseases (families) and Cancer. A Decision Support System has also been implemented in IVA.

Big Data Analysis

Rich Data Models

OpenCGA takes advantage of the rich data models developed in OpenCB. We make extensive use of the Variant and VariantAnnotation data models.

Spark Analysis

OpenCGA implements several analyses on top of the Variant storage. These analyses can use different programming models, such as MapReduce, or different technologies, such as Spark.

A Spark-based library has been developed to provide extra analysis capabilities.
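To illustrate the MapReduce programming model mentioned above, here is a minimal sketch in plain Python (not Spark, and not OpenCGA code). It assumes genotypes are given as VCF-style strings such as `0/1`; the function names are hypothetical. The map step turns each genotype into allele counts, and the reduce step merges those counts, from which an alternate allele frequency follows.

```python
from collections import Counter
from functools import reduce


def map_genotype(genotype):
    """Map one genotype string such as '0/1' or '0|1' to allele counts."""
    alleles = genotype.replace("|", "/").split("/")
    # Missing alleles ('.') are skipped rather than counted.
    return Counter(a for a in alleles if a != ".")


def reduce_counts(left, right):
    """Merge two partial allele counts."""
    return left + right


genotypes = ["0/1", "1/1", "0/0", "./.", "0|1"]
counts = reduce(reduce_counts, map(map_genotype, genotypes), Counter())
alt_frequency = counts["1"] / sum(counts.values())
print(dict(counts), alt_frequency)
```

Because the reduce step is associative, partial counts can be computed independently per partition and merged afterwards, which is what lets this pattern scale out on a cluster.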

Visualisation

Source Code

Web based on IVA project at https://github.com/opencb/iva/tree/app/hgva

Server based on OpenCGA at https://github.com/opencb/opencga

Contributing

IVA is a collaborative project that aims to integrate as many reference human studies as possible. You can contact us with feature requests. If you want to contribute to the code, you are more than welcome to contribute to IVA and OpenCGA.

Development

Contributors

Ignacio Medina (HPCS, University of Cambridge)
