The Human Genomic Variation Archive (HGVA) is an open-access genomic variation resource that integrates variants from the main human reference projects. HGVA also adds variant annotation, including consequence types, population frequencies, protein effect predictions and variant-associated phenotypes.

Overview

HGVA aims to provide a high-performance, scalable resource to store, query and visualise variants from the main open-access human datasets. We have put special emphasis on keeping HGVA responsive even with complex queries, and on making the data available to researchers and bioinformaticians in three ways: a rich web interface based on OpenCB IVA, client libraries (Python, Java and JavaScript) and a command-line interface. The different datasets are also normalised and annotated using OpenCB CellBase.
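As a rough illustration of what normalising a variant can involve, the sketch below trims bases shared by the reference and alternate alleles so that the same change is always written the same way. It is a simplified, VCF-style example written for this page: the function name `trim_alleles` is hypothetical, it is not the code used by OpenCGA or CellBase, and it skips left-alignment against the reference genome.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by ref and alt, keeping at least one base in each.

    Hypothetical helper for illustration only; real normalisation also
    left-aligns indels against the reference sequence.
    """
    # Remove shared trailing bases first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then remove shared leading bases, moving the position forward.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A substitution padded with flanking bases collapses to a single-base change.
print(trim_alleles(10, "GCAT", "GTAT"))  # (11, 'C', 'T')
```

With this kind of trimming, the same variant reported by two studies with different amounts of padding ends up with identical coordinates and alleles, which is what makes integrating the studies in one database possible.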

HGVA does not intend to replace archives such as NCBI dbSNP or EMBL-EBI EVA, or to provide full archiving services. Those projects provide excellent submission and accessioning services and play a crucial role in allowing scientists to submit variation data for many different species. Instead, HGVA focuses on human data, and only the most relevant datasets are selected and indexed. HGVA provides several high-performance user interfaces that allow researchers to query and visualise human datasets or to use the data in genomic pipelines.

We would like to thank the authors of the different projects, such as the 1000 Genomes Project and ExAC (see below for a complete list), for making these invaluable data open and freely accessible to the biomedical community. We hope HGVA will help make these data even more accessible to them.

HGVA was started in 2016 in response to the need to have the most relevant human datasets centralised, normalised and annotated for use in different analysis pipelines. It is currently developed and maintained by researchers at the University of Cambridge and Genomics England, and it is freely available at http://hgva.opencb.org

Main Features

  • The most important high-quality variant studies, normalised and integrated in a single server database
  • High-performance complex variant queries. Faceted search is also implemented
  • Datasets are organised in four main projects: Reference Studies GRCh37, Reference Studies GRCh38, Cancer GRCh37 and Platinum (see below)
  • Rich variant annotation performed using OpenCB CellBase, including HPO terms, consequence types, substitution effect prediction scores, Gene Ontology terms, etc.
  • Population frequencies calculated, including populations and super-populations
  • Data is indexed in the server using OpenCB OpenCGA. This provides a high-performance and scalable variant storage solution for big data analysis and visualisation
  • Rich interactive web-based data mining tool based on OpenCB IVA
  • Clients in Python, Java and JavaScript for fast programmatic access
  • Command-line interface
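The population and super-population frequencies listed above can be pictured with a minimal sketch. The allele counts are made up for illustration, and `SUPER_POPULATION` and `allele_frequencies` are hypothetical names rather than part of any OpenCB client. The population codes follow the 1000 Genomes convention, where GBR and FIN belong to EUR and YRI and LWK belong to AFR.

```python
# Population -> super-population, following 1000 Genomes codes.
SUPER_POPULATION = {"GBR": "EUR", "FIN": "EUR", "YRI": "AFR", "LWK": "AFR"}


def allele_frequencies(counts):
    """Return (population frequencies, super-population frequencies).

    counts maps a population code to (alternate allele count, total alleles).
    Super-population frequencies pool the counts of their populations
    instead of averaging the per-population frequencies.
    """
    pop_freq = {}
    pooled = {}
    for pop, (alt_count, allele_number) in counts.items():
        pop_freq[pop] = alt_count / allele_number if allele_number else 0.0
        sup = SUPER_POPULATION[pop]
        pooled_alt, pooled_total = pooled.get(sup, (0, 0))
        pooled[sup] = (pooled_alt + alt_count, pooled_total + allele_number)
    super_freq = {
        sup: (alt / total if total else 0.0) for sup, (alt, total) in pooled.items()
    }
    return pop_freq, super_freq


# Made-up counts for a single variant.
example = {"GBR": (10, 200), "FIN": (30, 200), "YRI": (0, 200), "LWK": (20, 200)}
populations, super_populations = allele_frequencies(example)
print(populations)
print(super_populations)
```

Pooling the counts weights each population by the number of alleles actually observed, so a small cohort does not distort the super-population value.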



Current Projects and Studies

Reference Studies GRCh37

  • 1000 Genomes Phase 3
  • Exome Aggregation Consortium (ExAC)
  • Exome Sequencing Project (ESP6500)
  • Genome of the Netherlands (GoNL)
  • UK10K project
  • Spanish Medical Genome Project (MGP)

Reference Studies GRCh38

  • 1000 Genomes Phase 3

Cancer GRCh37

  • QIMR Berghofer Melanoma
  • Chronic Myeloid Leukemia - Russian Academy of Medical Sciences

Platinum

  • Illumina Platinum
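The project and study layout listed above can be written down as a simple mapping, which is handy when a pipeline needs to decide where to look for a dataset. This is only a sketch of the organisation described on this page: the names `PROJECTS` and `projects_containing` are hypothetical, and the strings are display names, not identifiers accepted by the HGVA server.

```python
# Projects and their studies, copied from the lists on this page.
PROJECTS = {
    "Reference Studies GRCh37": [
        "1000 Genomes Phase 3",
        "Exome Aggregation Consortium (ExAC)",
        "Exome Sequencing Project (ESP6500)",
        "Genome of the Netherlands (GoNL)",
        "UK10K project",
        "Spanish Medical Genome Project (MGP)",
    ],
    "Reference Studies GRCh38": ["1000 Genomes Phase 3"],
    "Cancer GRCh37": [
        "QIMR Berghofer Melanoma",
        "Chronic Myeloid Leukemia - Russian Academy of Medical Sciences",
    ],
    "Platinum": ["Illumina Platinum"],
}


def projects_containing(study):
    """Return the projects that include the given study name."""
    return [project for project, studies in PROJECTS.items() if study in studies]


# 1000 Genomes Phase 3 is available on both assemblies.
print(projects_containing("1000 Genomes Phase 3"))
```

Because the same study can appear under more than one assembly, a query generally has to name the project as well as the study.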

Statistics

More than 250M variants reported in total, of which about 120M are unique



Study            Num. variants
1kG Phase3       85170328
ExAC             63000000
ESP6500          1997952
GoNL             20708427
UK10K ALSPAC     46618311
UK10K TWINSUK    46618311
MGP              711005
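As a quick check of the headline figure, the per-study counts from the statistics table can be added up. The sketch below only sums the studies listed on this page. The number of unique variants cannot be derived from these counts, because it depends on how many variants are shared between studies.

```python
# Variant counts per study, copied from the statistics table.
VARIANTS_PER_STUDY = {
    "1kG Phase3": 85_170_328,
    "ExAC": 63_000_000,
    "ESP6500": 1_997_952,
    "GoNL": 20_708_427,
    "UK10K ALSPAC": 46_618_311,
    "UK10K TWINSUK": 46_618_311,
    "MGP": 711_005,
}

total_reported = sum(VARIANTS_PER_STUDY.values())
print(f"{total_reported:,}")  # 264,824,334
```

The total of roughly 265M is consistent with the statement that more than 250M variants are reported.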



Developers

Source Code

The web application is based on the IVA project: https://github.com/opencb/iva/tree/app/hgva

The server is based on OpenCGA: https://github.com/opencb/opencga

Contributing

HGVA is a collaborative project that aims to integrate as many reference human studies as possible. You can contact us with feature requests. If you want to contribute code, you are more than welcome to contribute to IVA and OpenCGA.

Contact


