Data organisation
OpenCGA organizes datasets hierarchically. HGVA metadata are arranged into three levels: projects, studies and cohorts.
- A project contains one or more studies.
- A study represents a particular dataset: its genomic variation data, optionally accompanied by sample metadata and cohorts. For example, the 1000 Genomes Project is defined as a study in OpenCGA. Likewise, the Genome of the Netherlands and the Exome Aggregation Consortium are two further studies, and so on.
- Finally, a cohort is simply a set of samples defined within a study. For example, the populations and super-populations of the 1000 Genomes Project are defined as cohorts, so EUR, AMR and GBR are all examples of cohorts.
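The project → study → cohort hierarchy described above can be sketched as a small data model. This is a minimal illustration, not the OpenCGA API: the class names `Project`, `Study` and `Cohort`, the `add_study`/`add_cohort` methods and the sample IDs are all hypothetical. The one rule it enforces follows from the definition above: a cohort may only contain samples that belong to its own study.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cohort:
    """A named set of samples defined within a study (e.g. EUR or GBR)."""
    name: str
    samples: set[str] = field(default_factory=set)


@dataclass
class Study:
    """A dataset of genomic variation; sample metadata and cohorts are optional."""
    name: str
    samples: set[str] = field(default_factory=set)
    cohorts: dict[str, Cohort] = field(default_factory=dict)

    def add_cohort(self, name: str, samples: set[str]) -> Cohort:
        # A cohort is a subset of this study's samples, so reject unknown IDs.
        samples = set(samples)
        unknown = samples - self.samples
        if unknown:
            raise ValueError(f"samples not in study {self.name}: {sorted(unknown)}")
        cohort = Cohort(name, samples)
        self.cohorts[name] = cohort
        return cohort


@dataclass
class Project:
    """A container for one or more studies."""
    name: str
    studies: dict[str, Study] = field(default_factory=dict)

    def add_study(self, study: Study) -> Study:
        self.studies[study.name] = study
        return study


# Usage with hypothetical sample IDs: a population (GBR) nested in a super-population (EUR).
project = Project("Reference GRCh37")
study = project.add_study(Study("1000 Genomes Project", samples={"s1", "s2", "s3", "s4"}))
gbr = study.add_cohort("GBR", {"s1", "s2"})
eur = study.add_cohort("EUR", {"s1", "s2", "s3"})
print(gbr.samples <= eur.samples)  # True
```

Because cohorts are plain sample sets scoped to one study, a population and its super-population are simply overlapping cohorts; the sketch does not model any explicit parent/child link between them.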
A full list of the datasets (studies) currently available in HGVA, and of how they are organized into projects, is available at http://bioinfo.hpc.cam.ac.uk/hgva-1.0/...
Studies
HGVA releases: v1 (Dec. 2016) and v2 (Jul. 2017).

| Project | Study | Version / date |
|---|---|---|
| Reference GRCh37 | 1000 Genomes Project GRCh37 | Phase 3 (2016-05) |
| | 1000 Genomes Project GRCh38 | Phase 3 (2016-10) |
| | Exome Sequencing Project | 2016-05 |
| | Exome Aggregation Consortium | 0.3.1 (2016-05) |
| | Genome of the Netherlands | Release 5 (2016-05) |
| | UK10K Project | 2016-05 |
| | Spanish Medical Genome Project | 2016-12 |
| | QIMR Berghofer Melanoma | 2016-12 |
| | Chronic Myeloid Leukemia - Russian Academy of Medical Sciences | 2016-12 |
Variants were annotated using CellBase. For details of the additional data sources used, see the "Data sources and species" page of the CellBase documentation.