Data organisation
OpenCGA uses a hierarchical structure to organize datasets. Briefly, Projects, Studies and Cohorts are used to organize HGVA metadata:
- Projects can contain one or more Studies. A project specifies a species and assembly; all studies from the same project are stored in the same database and share variant annotation.
- A Study, in turn, represents a particular data set, with or without sample metadata and cohorts, together with its genomic variation data. For example, The 1000 Genomes Project is defined as a study in OpenCGA. Likewise, The Genome of the Netherlands and the Exome Aggregation Consortium are two further studies, and so on.
- Finally, a cohort is simply a set of samples defined within a study. For example, populations and super-populations within The 1000 Genomes Project are defined as cohorts. Thus, EUR, AMR or GBR are examples of cohorts.
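The hierarchy described above can be sketched as a small set of nested containers. This is only an illustrative model, not the OpenCGA Catalog data model: the class names, field names, the study alias `1kg_phase3` and the sample IDs are all assumptions made for the example. Only the project alias `reference_grch37` and the cohort names come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Cohort:
    """A named set of sample IDs defined within a study."""
    name: str
    samples: Set[str] = field(default_factory=set)


@dataclass
class Study:
    """A data set that owns its cohorts (hypothetical, simplified)."""
    alias: str
    cohorts: Dict[str, Cohort] = field(default_factory=dict)

    def add_cohort(self, name: str, samples: Iterable[str]) -> Cohort:
        cohort = Cohort(name, set(samples))
        self.cohorts[name] = cohort
        return cohort


@dataclass
class Project:
    """Groups studies that share one species and assembly."""
    alias: str
    species: str
    assembly: str
    studies: Dict[str, Study] = field(default_factory=dict)

    def add_study(self, alias: str) -> Study:
        study = Study(alias)
        self.studies[alias] = study
        return study


# Project alias taken from this page; everything else is a placeholder.
project = Project("reference_grch37", "Homo sapiens", "GRCh37")
study = project.add_study("1kg_phase3")  # hypothetical study alias

# Placeholder sample IDs; GBR is a population inside the EUR super-population.
study.add_cohort("GBR", ["sample_1", "sample_2"])
study.add_cohort("EUR", ["sample_1", "sample_2", "sample_3"])
study.add_cohort("AMR", ["sample_4"])
```

Because a cohort is just a set of samples, the same sample can belong to several cohorts of one study, as with a population and its super-population here.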
More information is available in the OpenCGA Catalog data models documentation. Projects and Studies each have a unique alias to ease their usage. The full list of datasets currently available in HGVA (loaded as studies), and how they are organised into projects, is given below.
Studies
Project name (alias) | Study | HGVA v1 (Dec. 2016) | HGVA v2 (June 2017) |
---|---|---|---|
Reference GRCh37 (reference_grch37) | 1000 Genomes Project GRCh37 | Phase 3 2016-05 | Phase 3 2016-05 |
Reference GRCh37 (reference_grch37) | Exome Sequencing Project (ESP6500) | 2016-05 | 2016-05 |
Reference GRCh37 (reference_grch37) | Exome Aggregation Consortium (ExAC) | 0.3.1 2016-05 | 0.3.1 2016-05 |
Reference GRCh37 (reference_grch37) | Genome of the Netherlands (GoNL) | Release 5 2016-05 | Release 5 2016-05 |
Reference GRCh37 (reference_grch37) | UK10K Project | 2016-05 | 2016-05 |
Reference GRCh37 (reference_grch37) | Spanish Medical Genome Project (MGP) | 2016-12 | 2016-12 |
Reference GRCh37 (reference_grch37) | gnomAD Exome and Genome | - | 2017-05 |
Reference GRCh38 (reference_grch38) | 1000 Genomes Project GRCh38 | Phase 3 2016-10 | Phase 3 2016-10 |
Reference GRCh38 (reference_grch38) | ESP6500 | - | |
Cancer GRCh37 (cancer_grch37) | QIMR Berghofer Melanoma | 2016-12 | 2016-12 |
Cancer GRCh37 (cancer_grch37) | Chronic Myeloid Leukemia - Russian Academy of Medical Sciences | 2016-12 | 2016-12 |
Platinum (platinum) | Illumina Platinum | 2015-08 | 2015-08 |
Variant annotation was carried out by the CellBase project. Please check the CellBase documentation for details on additional data sources: Data sources and species