Variant Stats contain basic information for each variant in different cohorts.
Variant Stats is implemented using Hadoop MapReduce over HBase.
OpenCGA supports different input parameters:
- Variant Query
- Sample list, cohort or query
If the stats are not indexed, the analysis produces a Variant Stats file in JSON format with the following model schema:
Variant Stats Data Model
Unique cohort identifier within the study.
Count of samples with non-missing genotypes in this variant from the cohort.
Count of files with samples from the cohort that reported this variant.
Total number of alleles in called genotypes. It does not include missing alleles.
Number of reference alleles found in this variant.
Reference allele frequency calculated from refAlleleCount and alleleCount, in the range [0,1]
Number of main alternate alleles found in this variant. It does not include secondary alternates.
Alternate allele frequency calculated from altAlleleCount and alleleCount, in the range [0,1]
Number of missing alleles.
Number of genotypes with all alleles missing (e.g. ./.). It does not count partially missing genotypes like "./0" or "./1".
Number of occurrences for each genotype.
Genotype frequency for each genotype found calculated from the genotypeCount and samplesCount, in the range [0,1]
Minor allele frequency. Frequency of the less common allele between the reference and the main alternate alleles.
Allele with minor frequency.
Minor genotype frequency. Frequency of the less common genotype seen in this variant.
Genotype with minor frequency.
The number of occurrences for each FILTER value in files from samples in this cohort reporting this variant.
Frequency of each filter calculated from the filterCount and filesCount, in the range [0,1]
The number of files from samples in this cohort reporting this variant with valid QUAL values.
The average Quality value for files with valid QUAL values from samples in this cohort reporting this variant.
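The allele and genotype fields described above can be illustrated with a small Python sketch. This is not the OpenCGA implementation (which runs as MapReduce over HBase); it is a minimal in-memory calculation under stated assumptions. The function name `cohort_stats` and the snake_case keys are hypothetical and are not the schema's field names. It assumes VCF-style genotype strings where `0` is the reference allele and `1` the main alternate, that partially missing genotypes such as `./1` count as called samples, and that the minor genotype is chosen only among genotypes actually seen.

```python
from collections import Counter

MISSING = "."  # VCF code for a missing allele


def cohort_stats(genotypes):
    """Compute allele and genotype stats for one variant in one cohort.

    `genotypes` holds one VCF-style genotype string per sample, e.g. "0/1".
    Allele "0" is the reference and "1" the main alternate; any other code
    is treated as a secondary alternate.
    """
    allele_tally = Counter()
    genotype_count = Counter()
    missing_genotype_count = 0

    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        allele_tally.update(alleles)
        if all(a == MISSING for a in alleles):
            missing_genotype_count += 1  # "./." counts, "./1" does not
        else:
            genotype_count[gt] += 1

    missing_allele_count = allele_tally.pop(MISSING, 0)
    allele_count = sum(allele_tally.values())  # called alleles only
    # Assumption: a partially missing genotype still counts as a called sample.
    sample_count = sum(genotype_count.values())

    def ratio(n, total):
        return n / total if total else 0.0

    ref_freq = ratio(allele_tally["0"], allele_count)
    alt_freq = ratio(allele_tally["1"], allele_count)
    genotype_freq = {gt: ratio(n, sample_count) for gt, n in genotype_count.items()}

    # Minor allele: the less frequent of reference and main alternate.
    maf_allele, maf = min((("ref", ref_freq), ("alt", alt_freq)), key=lambda p: p[1])
    # Minor genotype: the least frequent genotype seen (assumption, see lead-in).
    mgf_genotype, mgf = min(
        genotype_freq.items(), key=lambda p: p[1], default=(None, 0.0)
    )

    return {
        "sample_count": sample_count,
        "allele_count": allele_count,
        "ref_allele_count": allele_tally["0"],
        "ref_allele_freq": ref_freq,
        "alt_allele_count": allele_tally["1"],
        "alt_allele_freq": alt_freq,
        "missing_allele_count": missing_allele_count,
        "missing_genotype_count": missing_genotype_count,
        "genotype_count": dict(genotype_count),
        "genotype_freq": genotype_freq,
        "maf": maf,
        "maf_allele": maf_allele,
        "mgf": mgf,
        "mgf_genotype": mgf_genotype,
    }


if __name__ == "__main__":
    cohort = ["0/0", "0/0", "0/0", "0/1", "0/1", "1/1", "./."]
    for key, value in cohort_stats(cohort).items():
        print(f"{key}: {value}")
```

Note that the allele frequencies are divided by the number of called alleles, while the genotype frequencies are divided by the number of samples with a called genotype, so the fully missing `./.` sample affects neither denominator.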
Pre-computed stats are useful for filtering variants. These stats are intra-study, calculated within a given cohort.