A Genome-Wide Association Study (GWAS) is a hypothesis-free method for identifying associations between genetic regions and traits. GWAS analyses are typically used to identify genes involved in human disease. Applying a GWAS analysis to variant data makes it possible to identify a variant (or a set of variants) associated with a given phenotype or disorder. Based on a statistical test, the analysis provides a level of significance (p-value) for each variant. OpenCGA implements GWAS analysis with two statistical tests: chi-square and Fisher.
The OpenCGA GWAS analysis extends the Oskar GWAS implementation. It is implemented using Hadoop MapReduce over HBase.
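To make the per-variant test concrete, the following is a minimal Python sketch of an allelic case-control test on a 2x2 table of allele counts. It is not OpenCGA or Oskar code: the function names are hypothetical, and it assumes biallelic diploid genotypes written as '0/1', a Pearson chi-square with one degree of freedom and no continuity correction, and a two-sided Fisher exact test. The actual implementation may differ in any of these choices.

```python
import math


def allele_counts(genotypes):
    """Count (ref, alt) alleles in diploid genotype strings such as '0/1' or '1|1'."""
    ref = alt = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1
            # Any other value (e.g. '.' for a missing call) is skipped.
    return ref, alt


def chi_square_allelic(case_ref, case_alt, ctrl_ref, ctrl_alt):
    """Return (statistic, p-value) of a Pearson chi-square test on a 2x2 allele table."""
    n = case_ref + case_alt + ctrl_ref + ctrl_alt
    denom = ((case_ref + case_alt) * (ctrl_ref + ctrl_alt)
             * (case_ref + ctrl_ref) * (case_alt + ctrl_alt))
    if denom == 0:
        # A row or column is empty, so the test is undefined.
        return float("nan"), float("nan")
    stat = n * (case_ref * ctrl_alt - case_alt * ctrl_ref) ** 2 / denom
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Toy data for a single variant: genotypes of case and control samples.
cases = ["0/1", "1/1", "0/1", "1/1"]
controls = ["0/0", "0/1", "0/0", "0/0"]
case_ref, case_alt = allele_counts(cases)
ctrl_ref, ctrl_alt = allele_counts(controls)
print(chi_square_allelic(case_ref, case_alt, ctrl_ref, ctrl_alt))
print(fisher_exact_two_sided(case_ref, case_alt, ctrl_ref, ctrl_alt))
```

In a genome-wide run the same test is repeated for every variant, which is why the work distributes naturally over a MapReduce job.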
OpenCGA supports the following input parameters:
Variant data with sample genotypes
Two lists of samples (case-control study)
Statistical test: chi-square or Fisher
Instead of providing two lists of samples, users can provide:
A phenotype, and
A pedigree for those variant samples
The OpenCGA implementation supports different configuration variables. These can be set up in the OpenCGA installation folder or specified during execution.
The GWAS analysis result includes a text file with the scores and a plot image (WIP).
The text file consists of a header line (starting with #), followed by one line per variant with the following 12-13 columns:
Start base-pair coordinate
End base-pair coordinate
List of consequence types
Allelic test chi-square statistic. Not present with the 'fisher' test.
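As an illustration of how such a result file could be consumed, here is a small Python sketch that reads the header line and returns one dictionary per variant. It assumes the file is tab-delimited, which is not stated above, and it takes the column names from the file's own header instead of hard-coding them. The function name read_gwas_results and the column names in the sample text are hypothetical.

```python
import io


def read_gwas_results(handle, delimiter="\t"):
    """Yield one dict per variant line, keyed by the names in the '#' header line."""
    header = None
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # The first '#' line is treated as the header; later ones are ignored.
            if header is None:
                header = [name.strip() for name in line.lstrip("#").split(delimiter)]
            continue
        if header is None:
            raise ValueError("no header line found before the first variant line")
        yield dict(zip(header, line.split(delimiter)))


# Hypothetical three-column excerpt; a real file has 12-13 columns.
sample = "#chromosome\tstart\tend\n1\t1000\t1000\n2\t2500\t2501\n"
for record in read_gwas_results(io.StringIO(sample)):
    print(record)
```

Because the keys come from the header, the same reader works whether or not the chi-square statistic column is present.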