
Variant Stats

Pre-calculated stats are useful for filtering variants. These stats are intra-study, calculated within a given cohort.

Cohorts

A cohort is an arbitrary group of samples. Cohorts can be defined in Catalog, either by selecting samples one by one or by selecting all samples that share some attribute, such as population or phenotype.

If a cohort is modified after its statistics have been calculated, the existing statistics become INVALID.

By default, each study defines the cohort ALL, which contains all the samples loaded in the study. Every time new samples are loaded into the study, this cohort is modified and its statistics have to be recomputed.
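The invalidation rule can be illustrated with a minimal sketch. The `Cohort` class, its methods, and the NONE and READY status labels are hypothetical names introduced for this example; only the INVALID state and the ALL cohort come from the description above, and this is not the Catalog data model.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    """Hypothetical cohort record, for illustration only."""
    name: str
    samples: set = field(default_factory=set)
    stats_status: str = "NONE"  # assumed labels: NONE, READY, INVALID

    def mark_stats_calculated(self):
        self.stats_status = "READY"

    def add_samples(self, new_samples):
        before = len(self.samples)
        self.samples |= set(new_samples)
        # A change in membership makes previously calculated stats stale.
        if len(self.samples) != before and self.stats_status == "READY":
            self.stats_status = "INVALID"


all_cohort = Cohort("ALL", {"s1", "s2"})
all_cohort.mark_stats_calculated()
all_cohort.add_samples(["s3"])  # loading new samples modifies ALL
print(all_cohort.stats_status)
```

In this sketch, adding samples that are already members leaves the status untouched, since the cohort itself did not change.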

Stats models

There are two types of statistics: per-variant statistics and global statistics. Variant statistics are stored in the variants database, within the StudyEntry. Global statistics are stored in Catalog.

  • Variant Stats (intra variant)
    These stats are calculated for each variant, and for a set of samples (cohort).

    Result
    VariantStats
    	// Total number of alleles in called genotypes. Does not include missing alleles
    	int alleleCount
    	// Number of reference alleles found in this variant
    	int refAlleleCount
    	// Number of main alternate alleles found in this variant. Does not include secondary alternates
    	int altAlleleCount
    	// Reference allele frequency calculated from refAlleleCount and alleleCount, in the range [0,1]
    	float refAlleleFreq
    	// Alternate allele frequency calculated from altAlleleCount and alleleCount, in the range [0,1]
    	float altAlleleFreq
    	// Count for each genotype found
    	map<int> genotypeCount
    	// Genotype frequency for each genotype found
    	map<float> genotypeFreq
    	// Number of missing alleles
    	int missingAlleleCount
    	// Number of missing genotypes
    	int missingGenotypeCount
    	// Minor allele frequency
    	float maf
    	// Minor genotype frequency
    	float mgf
    	// Allele with minor frequency
    	string mafAllele
    	// Genotype with minor frequency
    	string mgfGenotype
  • Variant Global Stats (inter variant)
    PENDING
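As a rough illustration of how the VariantStats fields relate to each other, the following sketch derives them from the genotypes of one variant for the samples in a cohort. It is not the OpenCGA implementation. The function name, the input layout, the handling of phased and partially missing genotypes, and the reporting of `mafAllele` as an allele index are all assumptions made for the example; `mgf` and `mgfGenotype` are left out.

```python
from collections import Counter


def variant_stats(genotypes, cohort):
    """Compute per-variant stats for one cohort (illustrative sketch).

    genotypes: dict of sample name -> genotype string ("0/1", "1|1", "./.").
    cohort: iterable of sample names to include.
    """
    genotype_count = Counter()
    allele_tally = Counter()
    missing_alleles = 0
    missing_genotypes = 0

    for sample in cohort:
        # Assumption: phase is ignored, so "1|0" is counted as "0/1".
        alleles = sorted(genotypes[sample].replace("|", "/").split("/"))
        called = [a for a in alleles if a != "."]
        missing_alleles += len(alleles) - len(called)
        if not called:
            # Assumption: a genotype is missing only if every allele is missing.
            missing_genotypes += 1
            continue
        genotype_count["/".join(alleles)] += 1
        allele_tally.update(called)

    allele_count = sum(allele_tally.values())  # called alleles only
    ref_count = allele_tally["0"]
    alt_count = allele_tally["1"]  # main alternate; secondary alternates excluded
    called_genotypes = sum(genotype_count.values())

    def freq(count, total):
        return count / total if total else 0.0

    ref_freq = freq(ref_count, allele_count)
    alt_freq = freq(alt_count, allele_count)
    return {
        "alleleCount": allele_count,
        "refAlleleCount": ref_count,
        "altAlleleCount": alt_count,
        "refAlleleFreq": ref_freq,
        "altAlleleFreq": alt_freq,
        "genotypeCount": dict(genotype_count),
        "genotypeFreq": {g: freq(n, called_genotypes)
                         for g, n in genotype_count.items()},
        "missingAlleleCount": missing_alleles,
        "missingGenotypeCount": missing_genotypes,
        # Assumption: biallelic view, the rarer of reference and main alternate.
        "maf": min(ref_freq, alt_freq),
        "mafAllele": "0" if ref_freq < alt_freq else "1",
    }


genotypes = {"s1": "0/0", "s2": "0/1", "s3": "0/0", "s4": "./.", "s5": "1/1"}
stats = variant_stats(genotypes, cohort=["s1", "s2", "s3", "s4"])
print(stats["alleleCount"], stats["refAlleleCount"], stats["altAlleleCount"])
```

Sample s5 is outside the cohort, so its genotype does not contribute. The missing genotype of s4 adds two missing alleles and is excluded from `alleleCount`, which is why the frequencies are computed over six alleles rather than eight.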


Aggregated statistics

Public studies usually do not provide sample data. In these situations it is not possible to calculate the statistics from genotypes. Instead, the statistics can be extracted from the INFO column of the VCF. Unfortunately, there is no standard way of defining multi-cohort statistics in the VCF format, so OpenCGA recognizes three different formats for representing them:

  • BASIC mode
  • EVS mode
  • EXAC mode
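To show the general idea of extracting statistics from the INFO column, here is a minimal sketch that reads the `AN` and `AC` keys, which the VCF specification reserves for the total number of called alleles and the per-alternate allele counts. Which INFO keys each of the three modes actually reads is not described here, so the sketch does not claim to reproduce any of them. The function names are hypothetical, and it handles a single cohort only.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def stats_from_info(info):
    """Derive allele counts and frequencies from AN and AC (sketch)."""
    fields = parse_info(info)
    allele_count = int(fields["AN"])                   # called alleles
    alt_counts = [int(c) for c in fields["AC"].split(",")]
    alt_count = alt_counts[0]                          # main alternate only
    ref_count = allele_count - sum(alt_counts)         # the rest are reference
    return {
        "alleleCount": allele_count,
        "refAlleleCount": ref_count,
        "altAlleleCount": alt_count,
        "refAlleleFreq": ref_count / allele_count if allele_count else 0.0,
        "altAlleleFreq": alt_count / allele_count if allele_count else 0.0,
    }


example = stats_from_info("AN=200;AC=30,10;DB")
print(example)
```

Genotype-level fields such as `genotypeCount` cannot be recovered from `AN` and `AC` alone, so an aggregated source has to supply them through other INFO keys or leave them empty.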
