
Here you can find a full report on loading about 62,000 samples into the Genomics England Research environment.

Platform

The platform used for this case study consists of a Hadoop cluster of 35 nodes (5 master + 30 worker nodes) and an LSF queue system:


| Node | # nodes | Cores | Memory (GB) |
|---|---|---|---|
| LSF queue nodes for loading | 10 | 12 | 364 |
| Hadoop master nodes | 5 | 28 | 216 |
| Hadoop worker nodes | 30 | 28 | 216 |
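The aggregate capacity of the Hadoop cluster follows from the Hadoop rows of the table. The sketch below assumes the Cores and Memory columns are per-node figures and that every node in a role is identical; the `NodeGroup` class and `cluster_totals` helper are hypothetical names used only for illustration. The LSF row is left out because it is a separate queue system, not part of the 35-node Hadoop cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One row of the platform table (assumed per-node cores and memory)."""
    role: str
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def total_memory_gb(self) -> int:
        return self.nodes * self.memory_gb_per_node


# Hadoop rows copied from the platform table above.
HADOOP = [
    NodeGroup("Hadoop master nodes", 5, 28, 216),
    NodeGroup("Hadoop worker nodes", 30, 28, 216),
]


def cluster_totals(groups):
    """Return (nodes, cores, memory in GB) summed over all node groups."""
    return (
        sum(g.nodes for g in groups),
        sum(g.total_cores for g in groups),
        sum(g.total_memory_gb for g in groups),
    )


if __name__ == "__main__":
    nodes, cores, mem = cluster_totals(HADOOP)
    print(f"{nodes} nodes, {cores} cores, {mem} GB")  # 35 nodes, 980 cores, 7560 GB
```

Under these assumptions the cluster offers 980 cores and 7,560 GB of memory, of which the 30 worker nodes account for 840 cores and 6,480 GB.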

Data

The data for this case study contains a total of 64,078 samples divided into 4 datasets.


| Dataset | Alias | Files | Samples | Samples per file | Variants |
|---|---|---|---|---|---|
| Rare Disease GRCh38 | RD38 | 16,591 | 33,180 | 2.00 | 437,740,498 |
| Cancer Germline GRCh38 | CG38 | 9,167 | 9,167 | 1.00 | 286,136,051 |
| Cancer Somatic GRCh38 | CS38 | 9,589 | 9,589 | 1.00 | 398,402,166 |
| Rare Disease GRCh37 | RD37 | 5,329 | 12,142 | 2.28 | 298,763,059 |
| Total | | 40,676 | 64,078 | | 1,421,041,774 |
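The derived columns of the dataset table can be checked with a few lines of arithmetic: "Samples per file" is samples divided by files, and the Total row appears to be a plain sum of each column. This is a minimal sketch; the `Dataset` tuple and the `samples_per_file` and `totals` helpers are hypothetical names, and the variant total is treated as a simple sum because the table does not say whether variants are deduplicated across datasets.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    alias: str
    files: int
    samples: int
    variants: int


# Rows copied from the dataset table above.
DATASETS = [
    Dataset("RD38", 16_591, 33_180, 437_740_498),
    Dataset("CG38", 9_167, 9_167, 286_136_051),
    Dataset("CS38", 9_589, 9_589, 398_402_166),
    Dataset("RD37", 5_329, 12_142, 298_763_059),
]


def samples_per_file(d: Dataset) -> float:
    """Average number of samples per input file (e.g. multi-sample files)."""
    return d.samples / d.files


def totals(datasets):
    """Column sums for files, samples and variants (plain sum, no dedup)."""
    return (
        sum(d.files for d in datasets),
        sum(d.samples for d in datasets),
        sum(d.variants for d in datasets),
    )


if __name__ == "__main__":
    for d in DATASETS:
        print(f"{d.alias}: {samples_per_file(d):.2f} samples/file")
    print(totals(DATASETS))  # (40676, 64078, 1421041774)
```

The recomputed values match the table: RD38 and RD37 average 2.00 and 2.28 samples per file, the cancer datasets hold one sample per file, and the column sums reproduce the Total row.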


Loading Data


Query Performance

Common Queries


Clinical Queries


