Here you can find a full report about loading 62,000 samples for the Genomics England Research environment.
A 30-node Hadoop cluster ...
62,000 genomes organised in...
* GRCh38 Germline - LOADED, STATS and ANNOTATED
- RD38 (~16,000 multisample VCF files)
- CG38 (~10,000 VCF files)
* GRCh38 Somatic - LOADED, STATS and ANNOTATED
- CS38 (~10,000 VCF files)
Table of Contents: