Data organisation

OpenCGA uses a hierarchical structure to organize datasets. Briefly, Projects, Studies and Cohorts are used to organize HGVA metadata:

Projects are entities which contain one or more Studies.
A Study, in turn, represents a particular data set, which includes genomic variation data and may also include sample metadata and cohorts. For example, the 1000 Genomes Project is defined as a study in OpenCGA. Likewise, the Genome of the Netherlands and the Exome Aggregation Consortium are two further studies, and so on.
Finally, a cohort is simply a set of samples defined within a study. For example, populations and super-populations within The 1000 Genomes Project are defined as cohorts. Thus, EUR, AMR or GBR are examples of cohorts.
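The hierarchy described above can be sketched as nested containers. This is only an illustrative model, not the OpenCGA data model or client API: the class names, fields and helper methods are assumptions made for this example, and the sample identifiers are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Cohort:
    """A named set of samples defined within a study (hypothetical model)."""
    name: str
    samples: Set[str] = field(default_factory=set)


@dataclass
class Study:
    """A data set: variation data plus optional sample metadata and cohorts."""
    name: str
    cohorts: Dict[str, Cohort] = field(default_factory=dict)

    def add_cohort(self, cohort: Cohort) -> None:
        # Cohorts are indexed by name within their study.
        self.cohorts[cohort.name] = cohort


@dataclass
class Project:
    """A container for one or more studies."""
    name: str
    alias: str
    studies: Dict[str, Study] = field(default_factory=dict)

    def add_study(self, study: Study) -> None:
        # Studies are indexed by name within their project.
        self.studies[study.name] = study


# Sample identifiers are placeholders, not real 1000 Genomes samples.
project = Project(name="Reference GRCh37", alias="reference_grch37")
study = Study(name="1000 genomes project GRCh37")
study.add_cohort(Cohort("GBR", {"sample_1", "sample_2"}))
study.add_cohort(Cohort("EUR", {"sample_1", "sample_2", "sample_3"}))
project.add_study(study)

# List the cohorts defined in one study of the project.
cohort_names = sorted(project.studies["1000 genomes project GRCh37"].cohorts)
print(cohort_names)  # ['EUR', 'GBR']
```

In this sketch a population cohort (GBR) and a super-population cohort (EUR) live side by side in the same study and may share samples, which mirrors how cohorts are simply sets of samples rather than a further level of nesting.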

See below for the full list of datasets (studies) currently available in HGVA and how they are organized into projects.

Studies

Project name (alias)	Studies	Version/Date: HGVA v1 (Dec. 2016)	Version/Date: HGVA v2 (Jul. 2017)
Reference GRCh37 (reference_grch37)	1000 genomes project GRCh37	Phase 3 2016-05	To be decided
	Exome Sequencing Project	2016-05	To be decided
	Exome Aggregation Consortium	0.3.1 2016-05	To be decided
	Genome of the Netherlands	Release 5 2016-05	To be decided
	UK10K project	2016-05	To be decided
	Spanish Medical Genome Project	2016-12	To be decided
Reference GRCh38 (reference_grch38)	1000 genomes project GRCh38	Phase 3 2016-10	To be decided
Platinum (platinum)	Illumina Platinum	2015-08	To be decided
Cancer GRCh37 (cancer_grch37)	QIMR Berghofer Melanoma	2016-12	To be decided
Cancer GRCh37 (cancer_grch37)	Chronic Myeloid Leukemia - Russian Academy of Medical Sciences	2016-12	To be decided

Variant annotation was carried out by the CellBase project. See the CellBase documentation for details on additional data sources: Data sources and species
