Here you can find a full report on loading about 62,000 samples into the Genomics England Research environment.

Platform

The platform used for this case study consists of a Hadoop cluster of 35 nodes (5 master + 30 worker nodes) and an LSF queue system:


Node                      # nodes  Cores  Memory (GB)
LSF queue node for load        10     12          364
Hadoop master nodes             5     28          216
Hadoop worker nodes            30     28          216
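The short Python sketch below is a quick check that adds up the Hadoop part of the platform. It assumes the Cores and Memory (GB) columns are per-node values. It leaves out the LSF queue node because that node sits outside the Hadoop cluster. The names `HADOOP_NODES` and `cluster_totals` are illustrative only.

```python
# Hadoop rows of the platform table (the LSF queue node is not part of the cluster).
# Assumption: "Cores" and "Memory (GB)" are per-node figures.
HADOOP_NODES = {
    # role: (number of nodes, cores per node, memory per node in GB)
    "master": (5, 28, 216),
    "worker": (30, 28, 216),
}


def cluster_totals(rows):
    """Return (total nodes, total cores, total memory in GB) for the given rows."""
    nodes = sum(n for n, _, _ in rows.values())
    cores = sum(n * c for n, c, _ in rows.values())
    memory_gb = sum(n * m for n, _, m in rows.values())
    return nodes, cores, memory_gb


nodes, cores, memory_gb = cluster_totals(HADOOP_NODES)
print(f"{nodes} nodes, {cores} cores, {memory_gb} GB RAM")
```

Under that assumption the cluster adds up to 35 nodes, 980 cores and 7,560 GB of RAM. The 30 worker nodes account for 840 cores and 6,480 GB of that total.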

Data

The data for this case study contains a total of 64,078 samples, divided into 4 datasets:


Dataset                 Alias    Files  Samples  Samples per file       Variants
Rare Disease GRCh38     RD38    16,591   33,180              2.00    437,740,498
Cancer Germline GRCh38  CG38     9,167    9,167              1.00    286,136,051
Cancer Somatic GRCh38   CS38     9,589    9,589              1.00    398,402,166
Rare Disease GRCh37     RD37     5,329   12,142              2.28    298,763,059
Total                           40,676   64,078                    1,421,041,774
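The Samples per file and Total columns are derived from the other columns of the dataset table. The minimal Python sketch below recomputes them from the per-dataset counts copied from that table. The names `DATASETS` and `samples_per_file` are illustrative only.

```python
# Per-dataset counts from the table: (alias, files, samples, variants).
DATASETS = [
    ("RD38", 16_591, 33_180, 437_740_498),
    ("CG38", 9_167, 9_167, 286_136_051),
    ("CS38", 9_589, 9_589, 398_402_166),
    ("RD37", 5_329, 12_142, 298_763_059),
]


def samples_per_file(files, samples):
    """Average number of samples per input file, rounded to 2 decimal places."""
    return round(samples / files, 2)


for alias, files, samples, _ in DATASETS:
    print(alias, samples_per_file(files, samples))

# The Total row is the column-wise sum over all datasets.
total_files = sum(files for _, files, _, _ in DATASETS)
total_samples = sum(samples for _, _, samples, _ in DATASETS)
total_variants = sum(variants for _, _, _, variants in DATASETS)
print(total_files, total_samples, total_variants)
```

Samples per file is simply samples divided by files. The two GRCh38 cancer datasets therefore hold one sample per file, while RD37 averages 2.28. The variant total is a plain sum of the per-dataset counts, so a variant present in more than one dataset is counted once per dataset.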


Loading Data


Query Performance

Common Queries


Clinical Queries


