Here you can find a full report about loading 62,000 samples into the Genomics England Research environment.

Platform

A 30-node Hadoop cluster ...

Data

62,000 genomes organised in...

* GRCh37 Germline - LOADING
  - RD37 (~5,000 multi-sample VCF files)
* GRCh38 Germline - LOADED, STATS and ANNOTATED
  - RD38 (~16,000 multi-sample VCF files)
  - CG38 (~10,000 VCF files)
* GRCh38 Somatic - LOADED, STATS and ANNOTATED
  - CS38 (~10,000 VCF files)


Loading Data


Query Performance

Common Queries


Clinical Queries


