Different NoSQL databases for storage. Users can choose the database that best fits their current infrastructure and data size
Apache Hadoop for big data processing and storage
High-performance Computing (HPC) for computation-intensive analysis
HTML5 and RESTful web services for information retrieval and data visualization

...

Platform Overview

The image below shows a global view of the infrastructure used by OpenCGA. When a file is uploaded to the system, it is stored in:

A filesystem for archiving purposes. This filesystem could be UNIX-based or Hadoop-based.
A database for interactive queries. We plan to support MongoDB and HBase databases.
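The dual storage model above (an archive copy on a filesystem plus a queryable copy in a database) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenCGA's code: it assumes a simplified tab-separated input with `chrom`, `pos`, `ref` and `alt` columns (not a full VCF), uses a local directory as the archive filesystem and SQLite as a stand-in for MongoDB or HBase, and `register_file` is a hypothetical name.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def register_file(src: Path, archive_dir: Path, db: sqlite3.Connection) -> Path:
    """Hypothetical upload step: archive the raw file, then index its records.

    Assumes a simplified tab-separated input (chrom, pos, ref, alt), not VCF.
    """
    # 1) Keep an unmodified copy on the filesystem for archiving.
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / src.name
    shutil.copy2(src, archived)

    # 2) Load the records into a database table for interactive queries.
    #    SQLite stands in here for a MongoDB or HBase backend.
    db.execute(
        "CREATE TABLE IF NOT EXISTS variants "
        "(file TEXT, chrom TEXT, pos INTEGER, ref TEXT, alt TEXT)"
    )
    rows = []
    with src.open() as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and header/comment lines
            chrom, pos, ref, alt = line.rstrip("\n").split("\t")
            rows.append((src.name, chrom, int(pos), ref, alt))
    db.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
    return archived


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        src = workdir / "sample.tsv"
        src.write_text("#chrom\tpos\tref\talt\n1\t100\tA\tG\n2\t200\tC\tT\n")
        db = sqlite3.connect(":memory:")
        register_file(src, workdir / "archive", db)
        print(db.execute("SELECT chrom, pos FROM variants WHERE chrom = '2'").fetchall())
        # [('2', 200)]
        db.close()
```

One benefit of this split is that the queryable copy can be rebuilt, or moved to a different database backend, from the archived original files.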

...

Technical Documentation Overview

In this section you can find useful links and information for researchers and software developers who plan to deploy OpenCGA and/or integrate its services with their own applications and tools. These are working documents:

Data models : Describes the data models used to represent variant and alignment data.
Architecture : Describes the technologies and architecture of OpenCGA and some other implementation details.
Storage implementation : Describes how the data models are mapped to the different database backends (MongoDB and HBase).
Releases and Roadmap : Do you want to know what's coming next?
Download and install : Please have a look at the [README file](https://github.com/opencb/opencga/blob/develop/README.md) in the repository.

Getting Involved

...

OpenCGA is a full-stack solution for big data analysis and visualisation of genomic data, designed to be secure, high-performance and scalable.

OpenCGA implements a complete solution that covers all aspects of genomic analysis: metadata database, authentication and security, variant normalisation and aggregation, variant storage and annotation, a highly scalable NoSQL variant storage engine, alignment and coverage, big data variant analysis, RESTful web services and visualisation.
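Variant normalisation, listed above, generally means giving each variant a single canonical representation so that the same change written in different ways is stored only once. The sketch below shows one part of that process, trimming bases shared by REF and ALT. It is not OpenCGA's implementation: it omits left-alignment against a reference genome and the splitting of multi-allelic records, and both the name `trim_alleles` and the convention of using an empty string for the missing allele of an insertion or deletion are assumptions.

```python
from __future__ import annotations


def trim_alleles(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Trim bases shared by REF and ALT (suffix first, then prefix).

    Returns a (position, ref, alt) triple with 1-based positions; an empty
    string marks the missing side of an insertion or deletion (assumed convention).
    """
    # Remove shared trailing bases; the start position does not change.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Remove shared leading bases; each removed base moves the start right by one.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    # A deletion of "C" written with padding bases on both sides.
    print(trim_alleles(100, "ACG", "AG"))  # (101, 'C', '')
```

Trimming the suffix before the prefix keeps the leftmost representation that is possible within the given allele strings; shifting an indel further left would require the reference sequence.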

OpenCGA is developed and maintained at the University of Cambridge and is currently used by several big data projects such as Genomics England (GEL).

Main Features

OpenCGA provides a complete solution for genomics data analysis:

Authenticated and secure platform to query and visualise data, with an advanced permission system
A metadata database to keep track of registered users, projects, studies, files, samples, families, jobs, ...
Clinical data can be stored for samples, patients or families
Alignment storage to index BAM/CRAM files and to query alignments and coverage
A high-performance and scalable variant storage solution that can normalise, load, index and aggregate thousands of whole genomes per day
Genomic analysis implemented on top of the variant and alignment storage layers using technologies such as Apache Spark
A full clinical analysis platform: you can create cases and run different clinical interpretation algorithms from your scripts or from a web application
Comprehensive RESTful web service API with more than 150 endpoints to fully query and manage all metadata and clinical data
Four client libraries implemented in Java, Python, R and JavaScript
Interactive web-based application for the analysis and visualisation of variants and reads
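The variant storage feature above mentions aggregating thousands of genomes. One common form of aggregation is computing per-variant allele counts and frequencies across samples; the sketch below does this for a single biallelic variant. It is an illustration only, not OpenCGA's statistics code: the function name `allele_stats` is hypothetical, and allele indices above 1 (multi-allelic calls) are counted nowhere.

```python
from __future__ import annotations

from collections import Counter


def allele_stats(genotypes: list[str]) -> dict[str, float]:
    """Aggregate genotype calls such as "0/0", "0/1", "1|1" or "./." for one variant."""
    counts: Counter[str] = Counter()
    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            counts[allele] += 1
    ref_count, alt_count = counts["0"], counts["1"]
    called = ref_count + alt_count
    return {
        "ref_count": ref_count,
        "alt_count": alt_count,
        "missing_alleles": counts["."],
        # Frequency over called alleles only; 0.0 if nothing was called.
        "alt_freq": alt_count / called if called else 0.0,
    }


if __name__ == "__main__":
    print(allele_stats(["0/0", "0/1", "1|1", "./."]))
    # {'ref_count': 3, 'alt_count': 3, 'missing_alleles': 2, 'alt_freq': 0.5}
```

Excluding missing calls from the denominator keeps the frequency comparable between samples sequenced at different depths.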

Projects

OpenCGA is used by several projects, the most important being Genomics England (NHS).
