A Genome-Wide Association Study (GWAS) is a hypothesis-free method for identifying associations between genetic regions and traits. GWAS analyses are typically used to identify genes involved in human disease. Applying GWAS analysis to variant data makes it possible to identify a variant (or a set of variants) involved in a given phenotype or disorder. Based on a statistical test, GWAS analysis provides a level of significance (p-value) for each variant. OpenCGA implements GWAS analysis based on two statistical tests: chi-square and Fisher.
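To make the two tests concrete, the following is a minimal sketch of an allelic association test on a 2x2 table of allele counts (cases vs. controls, allele 1 vs. allele 2). It is not the OpenCGA or Oskar code. The function names are illustrative, the chi-square is computed without continuity correction, and the Fisher test uses the common two-sided rule of summing all tables no more probable than the observed one; whether OpenCGA makes the same choices is an assumption.

```python
import math


def allelic_chi_square(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Pearson chi-square (1 d.f., no continuity correction) and its p-value."""
    table = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    row_sums = [case_a1 + case_a2, ctrl_a1 + ctrl_a2]
    col_sums = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    total = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        return math.nan, math.nan  # undefined when a row or column is empty
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def fisher_exact_two_sided(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    row1, row2 = case_a1 + case_a2, ctrl_a1 + ctrl_a2
    col1 = case_a1 + ctrl_a1
    denom = math.comb(row1 + row2, col1)

    def prob(x):
        # Probability of x copies of allele 1 among cases, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_observed = prob(case_a1)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p = sum(prob(x) for x in range(low, high + 1)
            if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p)


def odds_ratio(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """odds(allele 1 | case) / odds(allele 1 | control)."""
    numerator = case_a1 * ctrl_a2
    denominator = case_a2 * ctrl_a1
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


chi2, p_value = allelic_chi_square(10, 10, 5, 15)
print(chi2, p_value, odds_ratio(10, 10, 5, 15))
print(fisher_exact_two_sided(3, 1, 1, 3))
```

An odds ratio of infinity or zero arises naturally when one cell of the table is empty, which is consistent with values such as Infinity and 0.0 appearing in result files.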
Implementation
OpenCGA GWAS analysis extends the Oskar GWAS implementation. GWAS is implemented using Hadoop MapReduce over HBase.
Input Parameters
OpenCGA supports the following input parameters:
- Variant data with sample genotypes
- Two lists of samples (case-control study)
- Statistical test: chi-square or Fisher.
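As an illustration of how these inputs fit together, the sketch below turns per-sample genotypes and two sample lists into the 2x2 allele-count table that an allelic test operates on. It assumes VCF-style genotype strings ("0/1", "1|1", "./.") for a biallelic variant, with 0 as the reference and 1 as the alternate allele. The function and variable names are hypothetical and are not part of the OpenCGA API.

```python
import re


def count_alleles(genotypes, samples):
    """Count reference and alternate alleles over the given samples.

    genotypes: dict mapping sample name to a VCF-style genotype string.
    Missing alleles (".") and samples without a genotype are skipped.
    """
    ref_count = alt_count = 0
    for sample in samples:
        genotype = genotypes.get(sample)
        if genotype is None:
            continue
        for allele in re.split(r"[/|]", genotype):
            if allele == "0":
                ref_count += 1
            elif allele == "1":
                alt_count += 1
    return ref_count, alt_count


def allele_count_table(genotypes, cases, controls):
    """Return [[case_ref, case_alt], [control_ref, control_alt]]."""
    return [list(count_alleles(genotypes, cases)),
            list(count_alleles(genotypes, controls))]


# Hypothetical genotypes for one variant.
genotypes = {"s1": "0/1", "s2": "1/1", "s3": "0/0", "s4": "./.", "s5": "0|1"}
table = allele_count_table(genotypes, cases=["s1", "s2"],
                           controls=["s3", "s4", "s5"])
print(table)
```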
Instead of providing two lists of samples, users can provide:
- A phenotype, and
- A pedigree for those variant samples
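One way to picture this alternative is that the case and control lists are derived from the pedigree: members annotated with the phenotype become cases, and the rest become controls. The sketch below assumes a deliberately simplified, hypothetical pedigree representation (a list of dictionaries with `sample` and `phenotypes` keys). It does not reflect the actual OpenCGA data model or how OpenCGA performs the split.

```python
def split_by_phenotype(pedigree_members, phenotype):
    """Partition sample names into (cases, controls) for one phenotype."""
    cases, controls = [], []
    for member in pedigree_members:
        # A member is a case if it is annotated with the phenotype.
        if phenotype in member["phenotypes"]:
            cases.append(member["sample"])
        else:
            controls.append(member["sample"])
    return cases, controls


# Hypothetical pedigree: a trio in which the father and child are affected.
pedigree = [
    {"sample": "father", "phenotypes": {"disorder-x"}},
    {"sample": "mother", "phenotypes": set()},
    {"sample": "child", "phenotypes": {"disorder-x", "other-trait"}},
]
cases, controls = split_by_phenotype(pedigree, "disorder-x")
print(cases, controls)
```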
Configuration
The OpenCGA implementation supports different configuration variables. These can be set up in the OpenCGA installation folder or specified during execution.
Output
- A text file consisting of a header line (starting with #) followed by one line per variant, with the following columns (13 with the chi-square test, 12 with the Fisher test, which omits the chi square column):
chromosome | Chromosome code |
start | Start base-pair coordinate |
end | End base-pair coordinate |
strand | Strand |
reference | Reference allele |
alternate | Alternate allele |
dbSNP | Variant identifier |
gene | Gene name |
biotype | Biotype |
conseq. types | List of consequence types |
chi square | Allelic test chi-square statistic. Not present when the Fisher test is used. |
p-value | Allelic test p-value |
odd ratio | Odds ratio: odds(allele 1 | case) / odds(allele 1 | control) |
The following are the first lines of a result file produced by a GWAS analysis using the chi-square test:
#chromosome start end strand reference alternate dbSNP gene biotype conseq. types chi square p-value odd ratio
22 16054454 16054454 + C T rs373998521 intergenic_variant 2.4727272727272727 0.11583677431831574 0.0
22 16065809 16065809 + T C ENSG00000233866 lincRNA downstream_gene_variant 0.053968253968253915 0.8162967146689325 0.8
22 16065809 16065809 + T C regulatory_region_variant 0.053968253968253915 0.8162967146689325 0.8
22 16077310 16077310 + T A ENSG00000229286 unprocessed_pseudogene 2KB_upstream_variant 0.9714285714285711 0.3243241555798487 3.0
22 16077310 16077310 + T A regulatory_region_variant 0.9714285714285711 0.3243241555798487 3.0
22 16080499 16080499 + A G rs200119791 ENSG00000229286 unprocessed_pseudogene upstream_gene_variant 1.8888888888888886 0.16932729721206297 Infinity
22 16080499 16080499 + A G rs200119791 ENSG00000235265 unprocessed_pseudogene downstream_gene_variant 1.8888888888888886 0.16932729721206297 Infinity
22 16084621 16084621 + T C ENSG00000235265 unprocessed_pseudogene non_coding_transcript_exon_variant,non_coding_transcript_variant 2.4425287356321843 0.11808572685033702 Infinity
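A result file of this shape can be post-processed with a few lines of code. The sketch below parses the text into dictionaries keyed by the header names and keeps variants below a p-value threshold. It assumes the columns are tab-separated and that missing values (for example, no dbSNP identifier or gene) appear as empty fields; the delimiter is not stated above, so treat both points as assumptions. The function names and the threshold are illustrative.

```python
import math

NUMERIC_COLUMNS = ("chi square", "p-value", "odd ratio")


def parse_gwas_results(text):
    """Parse GWAS result text into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")  # assumption: tab-separated columns
        if header is None:
            fields[0] = fields[0].lstrip("#")  # header line starts with '#'
            header = fields
            continue
        row = dict(zip(header, fields))
        for name in NUMERIC_COLUMNS:
            # 'chi square' is absent when the Fisher test was used.
            if row.get(name):
                row[name] = float(row[name])  # float() accepts "Infinity"
        rows.append(row)
    return rows


def significant(rows, threshold):
    """Keep rows whose p-value is strictly below the threshold."""
    return [row for row in rows if row["p-value"] < threshold]


sample = "\n".join([
    "#chromosome\tstart\tend\tstrand\treference\talternate\tdbSNP\tgene"
    "\tbiotype\tconseq. types\tchi square\tp-value\todd ratio",
    "22\t16054454\t16054454\t+\tC\tT\trs373998521\t\t\tintergenic_variant"
    "\t2.4727272727272727\t0.11583677431831574\t0.0",
    "22\t16084621\t16084621\t+\tT\tC\t\tENSG00000235265"
    "\tunprocessed_pseudogene\tnon_coding_transcript_variant"
    "\t2.4425287356321843\t0.11808572685033702\tInfinity",
])
rows = parse_gwas_results(sample)
hits = significant(rows, threshold=0.117)
print(len(rows), [row["dbSNP"] for row in hits])
```

Because the same variant can appear on several lines with different consequence types, a real analysis would likely de-duplicate by chromosome, start, reference and alternate before counting hits.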
Useful Links