Data organisation

OpenCGA uses a hierarchical structure to organize datasets. Briefly, Projects, Studies and Cohorts are used to organize HGVA metadata:

  • Projects are entities which contain one or more Studies. 
  • A Study, in turn, represents a particular dataset, with or without sample metadata and cohorts, together with its genomic variation data. For example, The 1000 Genomes Project is defined as a study in OpenCGA. Likewise, The Genome of the Netherlands and the Exome Aggregation Consortium are two further studies, and so on.
  • Finally, a cohort is simply a set of samples defined within a study. For example, populations and super-populations within The 1000 Genomes Project are defined as cohorts. Thus, EUR, AMR or GBR are examples of cohorts.
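
The hierarchy described above can be pictured with a small data-structure sketch. This is only an illustration of the Project, Study and Cohort nesting, not OpenCGA's actual data model or API: the class names, fields and helper methods are hypothetical, and the sample identifiers are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Cohort:
    """A named set of sample identifiers defined within one study."""
    name: str
    samples: Set[str] = field(default_factory=set)


@dataclass
class Study:
    """A dataset; it may define any number of cohorts (possibly none)."""
    name: str
    cohorts: Dict[str, Cohort] = field(default_factory=dict)

    def add_cohort(self, name: str, samples: Iterable[str]) -> Cohort:
        cohort = Cohort(name, set(samples))
        self.cohorts[name] = cohort
        return cohort


@dataclass
class Project:
    """A container for one or more studies."""
    name: str
    studies: Dict[str, Study] = field(default_factory=dict)

    def add_study(self, name: str) -> Study:
        study = Study(name)
        self.studies[name] = study
        return study


# Hypothetical example mirroring the text: one project, two studies,
# and two cohorts inside the 1000 Genomes study.
project = Project("Reference GRCh37")
study = project.add_study("1000 genomes project GRCh37")
# Placeholder sample identifiers, not real sample IDs.
study.add_cohort("GBR", ["sample_1", "sample_2"])
study.add_cohort("EUR", ["sample_1", "sample_2", "sample_3"])
project.add_study("Genome of the Netherlands")
```

In this sketch a population cohort (GBR) is simply a subset of the samples in its super-population cohort (EUR); both live side by side within the same study.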

The full list of datasets (studies) currently available in HGVA, and how they are organized into projects, is given below.

Studies


Version/date is given per HGVA release.

Project          | Study                                                          | HGVA v1 (Dec. 2016) | HGVA v2 (Jul. 2017)
-----------------|----------------------------------------------------------------|---------------------|--------------------
Reference GRCh37 | 1000 genomes project GRCh37                                    | Phase 3, 2016-05    | To be decided
                 | Exome Sequencing Project                                       | 2016-05             | To be decided
                 | Exome Aggregation Consortium                                   | 0.3.1, 2016-05      | To be decided
                 | Genome of the Netherlands                                      | Release 5, 2016-05  | To be decided
                 | UK10K project                                                  | 2016-05             | To be decided
                 | Spanish Medical Genome Project                                 | 2016-12             | To be decided
Reference GRCh38 | 1000 genomes project GRCh38                                    | Phase 3, 2016-10    | To be decided
Cancer GRCh38    | QIMR Berghofer Melanoma                                        | 2016-12             | To be decided
                 | Chronic Myeloid Leukemia - Russian Academy of Medical Sciences | 2016-12             | To be decided

Variant annotation was carried out by the CellBase project. Please check the CellBase documentation for details on additional data sources: Data sources and species
