Different NoSQL databases for storage. Users can choose the database that best fits their current infrastructure and data size
Apache Hadoop for big data processing and storage
High-performance Computing (HPC) for computation-intensive analysis
HTML5 and RESTful web services for information retrieval and data visualization

...

Platform Overview

The image below shows a global view of the infrastructure used by OpenCGA. When a file is uploaded to the system, it is stored in:

A filesystem for archiving purposes. This can be a UNIX or a Hadoop-based filesystem
A database for interactive queries. We plan to support MongoDB and HBase databases
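The dual-storage flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not OpenCGA's implementation. A local directory stands in for the UNIX or Hadoop-based archive filesystem, and the standard-library `sqlite3` module stands in for MongoDB or HBase. The input is assumed to be a tab-separated file whose first two columns are chromosome and position, and the class `DualStore` is hypothetical.

```python
import json
import shutil
import sqlite3
import tempfile
from pathlib import Path


class DualStore:
    """Hypothetical sketch: archive the raw file and index its records for queries.

    sqlite3 stands in for the MongoDB/HBase backend; a local directory stands in
    for the UNIX or Hadoop-based archive filesystem.
    """

    def __init__(self, archive_dir: Path, db_path: str = ":memory:") -> None:
        self.archive_dir = archive_dir
        self.archive_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(file TEXT, chrom TEXT, pos INTEGER, data TEXT)"
        )

    def upload(self, path: Path) -> Path:
        # 1) Keep an untouched copy of the file for archiving purposes.
        archived = self.archive_dir / path.name
        shutil.copy2(path, archived)
        # 2) Load its records into the database for interactive queries.
        rows = []
        for line in path.read_text().splitlines():
            if not line or line.startswith("#"):
                continue  # skip blank and header lines
            chrom, pos, *rest = line.split("\t")
            rows.append((path.name, chrom, int(pos), json.dumps(rest)))
        self.db.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
        self.db.commit()
        return archived

    def query(self, chrom: str, start: int, end: int) -> list:
        """Return (chrom, pos) pairs inside [start, end], sorted by position."""
        cur = self.db.execute(
            "SELECT chrom, pos FROM records "
            "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
            (chrom, start, end),
        )
        return cur.fetchall()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.tsv"
        src.write_text("#chrom\tpos\tref\talt\n1\t100\tA\tG\n1\t250\tC\tT\n")
        store = DualStore(Path(tmp) / "archive")
        print(store.upload(src).exists())  # True
        print(store.query("1", 1, 200))    # [('1', 100)]
```

Keeping the archive copy untouched means the database can be rebuilt or migrated to another backend from the original files.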

...

Technical Documentation Overview

In this section you can find useful links and information for researchers and software developers who are planning to deploy OpenCGA services and/or integrate them with their own software applications and tools. These are working documents:

Data models: Describes the data models used to represent variant and alignment data
Architecture: Describes the technologies and architecture of OpenCGA and some other implementation details
Storage implementation: Describes how the data models are mapped to the different database backends (MongoDB and HBase)
Releases and Roadmap: Do you want to know what's coming next?
Download and install: Please have a look at the [README file](https://github.com/opencb/opencga/blob/develop/README.md) in the repository

Getting Involved

...

OpenCGA is a full-stack solution for big data analysis and visualisation of genomic data. It has been designed to provide a secure, high-performance and scalable solution, and it covers all aspects of genomic analysis: metadata database, authentication and security, variant normalisation and aggregation, variant annotation, highly scalable NoSQL variant storage, alignment and coverage, big data variant analysis, and visualisation.

OpenCGA is developed and maintained at the University of Cambridge and is currently used by several big data projects such as Genomics England (GEL).

Main Features

In this section you will find a summary of the main features of OpenCGA.

Catalog Metadata

Catalog Data Models and Annotations

Catalog Database

Security

Authentication

Permissions

Alignment Storage

Variant Storage

Performance and scalability

Data Management

Query Engine

Aggregation and Stats

Big Data Analysis

Clinical Analysis

OpenCGA aims to provide a complete solution for clinical genomics analysis, covering patient clinical data, interpretation algorithms and a pathogenic variant database.

Clinical Data

Catalog is designed to store any clinical data model.
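The page does not describe how Catalog stores arbitrary clinical data models. One common way to support them is to let users define a schema and validate free-form annotations against it; the Python sketch below illustrates that general idea only. The classes `Variable` and `VariableSet` and the method `validate` are hypothetical names, not OpenCGA's API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Variable:
    """One field of a hypothetical user-defined clinical data model."""
    name: str
    kind: type             # expected Python type of the value
    required: bool = False
    allowed: tuple = ()    # optional whitelist of accepted values


@dataclass
class VariableSet:
    """Hypothetical schema against which clinical annotations are checked."""
    name: str
    variables: list = field(default_factory=list)

    def validate(self, annotation: dict) -> list:
        """Return a list of problems; an empty list means the annotation is valid."""
        errors = []
        known = {v.name for v in self.variables}
        for var in self.variables:
            if var.name not in annotation:
                if var.required:
                    errors.append(f"missing required field '{var.name}'")
                continue
            value: Any = annotation[var.name]
            if not isinstance(value, var.kind):
                errors.append(f"'{var.name}' should be {var.kind.__name__}")
            elif var.allowed and value not in var.allowed:
                errors.append(f"'{var.name}' must be one of {var.allowed}")
        for key in annotation:
            if key not in known:
                errors.append(f"unknown field '{key}'")
        return errors


if __name__ == "__main__":
    rare_disease = VariableSet("rare_disease", [
        Variable("age_of_onset", int, required=True),
        Variable("sex", str, allowed=("female", "male", "unknown")),
    ])
    print(rare_disease.validate({"age_of_onset": 4, "sex": "female"}))  # []
    print(rare_disease.validate({"sex": "other"}))
```

Because the schema is data rather than code, a new clinical data model can be added without changing the storage layer.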

Clinical Interpretation Analysis

Open a patient case study by creating a clinical analysis, which contains the patient and family data, the disease or phenotype to be analysed and the files, among other information.
Complete disease panel management: create, update and delete disease panels. Panels can also be imported automatically from PanelApp (GEL). Updated panels are versioned to keep track of existing analyses.
Several rare disease interpretation analyses are implemented, such as TEAM and Tiering, the latter based on the GEL rare disease (RD) Tiering tool; cancer interpretation analysis is coming soon. One or more disease panels can be used in an interpretation analysis.
You can save more than one interpretation analysis result in the clinical analysis to create one or more clinical reports.
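Panel versioning, as described above, can be illustrated with a small append-only registry: an update never overwrites a panel but adds a new immutable version, so an existing analysis can keep referring to the exact version it used. This is a minimal Python sketch under that assumption; `PanelVersion`, `PanelRegistry` and `genes_in_scope` are hypothetical names, and the sketch does not model PanelApp import, deletion or the interpretation algorithms themselves.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PanelVersion:
    """An immutable snapshot of a disease panel."""
    panel_id: str
    version: int
    genes: frozenset


class PanelRegistry:
    """Hypothetical store: updating a panel appends a new version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[PanelVersion]] = {}

    def create(self, panel_id: str, genes: set) -> PanelVersion:
        if panel_id in self._versions:
            raise ValueError(f"panel {panel_id!r} already exists")
        pv = PanelVersion(panel_id, 1, frozenset(genes))
        self._versions[panel_id] = [pv]
        return pv

    def update(self, panel_id: str, genes: set) -> PanelVersion:
        history = self._versions[panel_id]
        pv = PanelVersion(panel_id, history[-1].version + 1, frozenset(genes))
        history.append(pv)  # older versions are kept, never overwritten
        return pv

    def get(self, panel_id: str, version: Optional[int] = None) -> PanelVersion:
        """Latest version by default, or a specific (1-based) version."""
        history = self._versions[panel_id]
        return history[-1] if version is None else history[version - 1]


def genes_in_scope(panels: List[PanelVersion]) -> frozenset:
    """Union of genes when an interpretation uses one or more panels."""
    return frozenset().union(*(p.genes for p in panels))


if __name__ == "__main__":
    registry = PanelRegistry()
    v1 = registry.create("example_panel", {"BRCA1", "BRCA2"})
    registry.update("example_panel", {"BRCA1", "BRCA2", "PALB2"})
    print(registry.get("example_panel").version)  # 2
    print(sorted(v1.genes))  # ['BRCA1', 'BRCA2']: an old analysis still sees v1
```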

Pathogenic Variant Database

RESTful Web Services

OpenCGA implements more than 150 RESTful web services that allow users to manipulate and query Catalog metadata as well as data such as alignments, variants and pathogenic variants. The REST web services are documented using Swagger; the OpenCGA Swagger documentation is available at http://bioinfo.hpc.cam.ac.uk/hgva/webservices/. To make all these web services easier to use, we have implemented several client libraries and a command-line interface (see Usability below).

REST web services can be grouped in different categories: Catalog, Alignment, Variant, Clinical and Admin.
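A thin client for these categories mostly comes down to building query URLs and decoding JSON responses. The Python sketch below uses only the standard library. The URL layout `{base_url}/{category}/{path}`, the comma-joining of list parameters and the example base URL are assumptions made for illustration; the real endpoint paths and parameter names should be taken from the Swagger documentation. `build_url` and `get_json` are hypothetical helpers.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# The five categories listed above.
CATEGORIES = {"catalog", "alignment", "variant", "clinical", "admin"}


def build_url(base_url: str, category: str, path: str, **params) -> str:
    """Assemble a query URL; the path layout is a hypothetical placeholder."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    # Drop unset parameters and join list values with commas (an assumption).
    clean = {
        k: ",".join(map(str, v)) if isinstance(v, (list, tuple)) else v
        for k, v in params.items()
        if v is not None
    }
    url = f"{base_url.rstrip('/')}/{category}/{quote(path.strip('/'))}"
    return f"{url}?{urlencode(clean)}" if clean else url


def get_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode its JSON body (performs a network call)."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical server and endpoint, shown only to illustrate the URL shape.
    print(build_url("https://opencga.example.org/rest", "variant", "query",
                    region=["1:1000-2000", "2:500-900"], limit=10, study=None))
```

Keeping URL construction separate from the network call makes the client easy to test offline.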

Catalog

Alignment

You can index BAM files to query reads and calculate coverage in BigWig format.
A query method fetches alignments in GA4GH format, with several filters implemented: region, mapping quality, number of mismatches, number of hits, properly paired, ...
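The filters and the coverage calculation listed above can be sketched over an in-memory list of reads. This Python example is illustrative only: `Read` is a simplified stand-in rather than the GA4GH schema, `query` and `binned_coverage` are hypothetical names, and the real service works on indexed BAM files instead of scanning a list. The coverage function computes mean depth per fixed-size bin, similar in spirit to a BigWig track, using a difference array.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Read:
    """Minimal stand-in for an aligned read (not the GA4GH schema)."""
    chrom: str
    start: int         # 0-based, inclusive
    end: int           # 0-based, exclusive
    mapq: int          # mapping quality
    mismatches: int    # number of mismatches against the reference
    hits: int          # number of hits, e.g. the SAM NH tag
    proper_pair: bool


def query(reads: Iterable[Read], chrom: str, start: int, end: int,
          min_mapq: int = 0, max_mismatches: Optional[int] = None,
          max_hits: Optional[int] = None,
          proper_pair: Optional[bool] = None) -> Iterator[Read]:
    """Yield reads overlapping [start, end) that pass every filter that is set."""
    for r in reads:
        if r.chrom != chrom or r.end <= start or r.start >= end:
            continue  # outside the requested region
        if r.mapq < min_mapq:
            continue
        if max_mismatches is not None and r.mismatches > max_mismatches:
            continue
        if max_hits is not None and r.hits > max_hits:
            continue
        if proper_pair is not None and r.proper_pair != proper_pair:
            continue
        yield r


def binned_coverage(reads: Iterable[Read], chrom: str, length: int,
                    bin_size: int) -> List[float]:
    """Mean per-base depth over [0, length) in bins of bin_size bases."""
    diff = [0] * (length + 1)
    for r in reads:
        s, e = min(max(r.start, 0), length), min(r.end, length)
        if r.chrom == chrom and s < e:
            diff[s] += 1  # depth rises where a read starts
            diff[e] -= 1  # and falls where it ends
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return [sum(depth[i:i + bin_size]) / len(depth[i:i + bin_size])
            for i in range(0, length, bin_size)]


if __name__ == "__main__":
    reads = [Read("1", 0, 10, 60, 0, 1, True), Read("1", 5, 15, 10, 3, 2, False)]
    print(len(list(query(reads, "1", 0, 20, min_mapq=30))))  # 1
    print(binned_coverage(reads, "1", 20, 5))  # [1.0, 2.0, 1.0, 0.0]
```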

Variant

Clinical

Admin

Usability

REST Clients

Command-line Interface (CLI)

Visualisation

OpenCGA web catalog

IVA

Genome Browser
