Data organisation

OpenCGA uses a hierarchical two-level structure to organise datasets. Briefly, these levels are Projects and Studies, and they are used to organise HGVA data and metadata:

  • Project is the top-level entity and can contain one or more studies. Projects are specific to one species and assembly. All studies in a project are stored and indexed together in the same database and therefore share the variant annotation.
  • Study, in turn, represents a particular dataset, which can contain sample metadata and cohorts as well as all the variants. For example, the 1000 Genomes Project is defined as a study in OpenCGA and belongs to the Reference GRCh37 project. Likewise, the Genome of the Netherlands and the Exome Aggregation Consortium are two different studies. You can also define cohorts in a study; a cohort is simply a set of samples defined within that study. For example, populations and super-populations within the 1000 Genomes Project are defined as cohorts, so EUR, AMR and GBR are examples of cohorts.
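The hierarchy described above can be pictured with a small data model. This is only an illustrative sketch: the class names, fields and sample identifiers are hypothetical and are not OpenCGA's actual schema or client API. The aliases `reference_grch37` and `1kG_phase3` and the cohort names come from this page.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Cohort:
    """A named set of samples defined within a study."""
    name: str
    samples: Set[str] = field(default_factory=set)


@dataclass
class Study:
    """A dataset that holds sample metadata, cohorts and variants."""
    alias: str
    cohorts: Dict[str, Cohort] = field(default_factory=dict)

    def add_cohort(self, name: str, samples: Set[str]) -> Cohort:
        cohort = Cohort(name, set(samples))
        self.cohorts[name] = cohort
        return cohort


@dataclass
class Project:
    """Top-level entity: one species and assembly, one or more studies."""
    alias: str
    species: str
    assembly: str
    studies: Dict[str, Study] = field(default_factory=dict)

    def add_study(self, alias: str) -> Study:
        study = Study(alias)
        self.studies[alias] = study
        return study


# Hypothetical sample IDs, used only to show that a super-population
# cohort (EUR) can contain the samples of a population cohort (GBR).
project = Project("reference_grch37", species="Homo sapiens", assembly="GRCh37")
study = project.add_study("1kG_phase3")
study.add_cohort("GBR", {"sample_A", "sample_B"})
study.add_cohort("EUR", {"sample_A", "sample_B", "sample_C"})
```

Because species and assembly live on the project, every study added to it inherits them implicitly, which mirrors why studies in one project can share a database and variant annotation.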

You can get more information about data organisation at OpenCGA Catalog Data Management. Projects and Studies each have a unique alias to ease their usage from the command line and the REST API; you can find more information about how to query data programmatically at RESTful Web Services and Clients. Please see the next section for the full list and organisation of the Projects and Studies (datasets) currently available in HGVA.

Datasets

In this section you can find all datasets loaded in HGVA, organised in Projects and Studies (see previous section).

| Project name (alias) | Study name | Study alias | HGVA v1 (Dec. 2016) | HGVA v2 (Jan. 2018) |
|---|---|---|---|---|
| Reference GRCh37 (reference_grch37) | 1000 Genomes Project GRCh37 | 1kG_phase3 | Phase 3 2016-05 | Phase 3 2016-05 |
| | Exome Sequencing Project (ESP6500) | ESP6500 | 2016-05 | 2016-05 |
| | Exome Aggregation Consortium (ExAC) | EXAC | 0.3.1 2016-05 | 0.3.1 2016-05 |
| | Genome of the Netherlands (GoNL) | GONL | Release 5 2016-05 | Release 5 2016-05 |
| | UK10K Project | UK10k | 2016-05 | 2016-05 |
| | DiscovEHR | DISCOVEHR | - | |
| | Genome Aggregation Database (gnomAD Exomes) | GNOMAD_EXOMES | - | |
| | Genome Aggregation Database (gnomAD Genomes) | GNOMAD_GENOMES | - | |
| | Spanish Medical Genome Project (MGP) | MGP | 2016-12 | 2016-12 |
| Reference GRCh38 (reference_grch38) | 1000 Genomes Project GRCh38 | 1kG_phase3 | Phase 3 2016-10 | Phase 3 2016-10 |
| | ESP6500 | ESP6500 | - | |
| | UK10K Project (*) | UK10K | - | |
| | DiscovEHR (*) | DISCOVEHR | - | |
| | Genome Aggregation Database (gnomAD Exomes) (*) | GNOMAD_EXOMES | - | |
| | Genome Aggregation Database (gnomAD Genomes) (*) | GNOMAD_GENOMES | - | |
| Cancer GRCh37 (cancer_grch37) | QIMR Berghofer Melanoma | QIMR_Berghofer_Melanoma | 2016-12 | 2016-12 |
| | Chronic Myeloid Leukemia - Russian Academy of Medical Sciences | RAMS_CML | 2016-12 | 2016-12 |
| Platinum (platinum) | Illumina Platinum | illumina_platinum | 2015-08 | 2015-08 |

(*) Liftover carried out by Genomics England (GEL)
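
Since some study aliases appear in more than one project, a study usually has to be identified together with its project. The sketch below transcribes the project and study aliases listed above into a plain dictionary and looks up which projects contain a given study. The dictionary and the helper `projects_containing` are hypothetical illustrations, not part of OpenCGA or its clients, and the case-insensitive matching is an assumption made here only because the UK10K alias is written with different capitalisation in the two reference projects.

```python
# Project alias -> study aliases, transcribed from the dataset list above.
HGVA_DATASETS = {
    "reference_grch37": [
        "1kG_phase3", "ESP6500", "EXAC", "GONL", "UK10k",
        "DISCOVEHR", "GNOMAD_EXOMES", "GNOMAD_GENOMES", "MGP",
    ],
    "reference_grch38": [
        "1kG_phase3", "ESP6500", "UK10K",
        "DISCOVEHR", "GNOMAD_EXOMES", "GNOMAD_GENOMES",
    ],
    "cancer_grch37": ["QIMR_Berghofer_Melanoma", "RAMS_CML"],
    "platinum": ["illumina_platinum"],
}


def projects_containing(study_alias):
    """Return the project aliases that hold a study with this alias.

    Matching ignores case (an assumption of this sketch, not a
    documented OpenCGA behaviour).
    """
    wanted = study_alias.lower()
    return [
        project
        for project, studies in HGVA_DATASETS.items()
        if any(study.lower() == wanted for study in studies)
    ]


print(projects_containing("1kG_phase3"))  # found in both reference projects
print(projects_containing("RAMS_CML"))    # found only in cancer_grch37
```

A lookup like this makes the ambiguity explicit: `1kG_phase3` alone does not say whether the GRCh37 or the GRCh38 data is meant, so the project alias is needed as well.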


Variant Annotation

Variant annotation was carried out by the CellBase project. Please check the CellBase documentation for details on additional data sources: Data sources and species
