Variant Stats contain basic statistics for each variant, calculated within a given cohort.

Implementation

Variant Stats is implemented using Hadoop MapReduce over HBase. 

Input

Parameters

OpenCGA supports the following input parameters:

  • Variant Query
  • Sample list, cohort or query

Output

Files

A variant stats file that includes the following values for each variant:

  • The total number of alleles (it does not include missing alleles)
  • The number of reference alleles found in this variant
  • The number of main alternate alleles found in this variant (it does not include secondary alternates)
  • The reference allele frequency, i.e., the quotient of the number of reference alleles divided by the total number of alleles.
  • The alternate allele frequency, i.e., the quotient of the number of alternate alleles divided by the total number of alleles.
  • The number of occurrences for each genotype
  • The frequency for each genotype
  • The number of missing alleles
  • The number of missing genotypes
  • The minor allele frequency (maf)
  • The minor genotype frequency (mgf)
  • The allele with the minor frequency
  • The genotype with the minor frequency
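
The values listed above can be illustrated with a minimal Python sketch. This is not the OpenCGA implementation (which runs as Hadoop MapReduce over HBase); the function name `variant_stats`, the output keys, and the VCF-style genotype strings are hypothetical. It also assumes that a genotype counts as missing only when all of its alleles are missing, that the minor allele is the rarer of the reference and the main alternate, and that genotype frequencies are taken over non-missing genotypes.

```python
from collections import Counter

MISSING = "."  # assumption: VCF-style marker for a missing allele


def variant_stats(genotypes):
    """Compute basic stats for one variant from genotype strings such as "0/1"."""
    allele_counts = Counter()
    genotype_counts = Counter()
    missing_alleles = 0
    missing_genotypes = 0

    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        called = [a for a in alleles if a != MISSING]
        missing_alleles += len(alleles) - len(called)
        if not called:
            # Assumption: a genotype is missing only if every allele is missing.
            missing_genotypes += 1
            continue
        allele_counts.update(called)
        # Ignore phase and allele order so "1|0" and "0/1" are counted together.
        genotype_counts["/".join(sorted(alleles))] += 1

    total_alleles = sum(allele_counts.values())      # excludes missing alleles
    total_genotypes = sum(genotype_counts.values())  # excludes missing genotypes
    ref_count = allele_counts["0"]                   # reference allele
    alt_count = allele_counts["1"]                   # main alternate only
    ref_freq = ref_count / total_alleles if total_alleles else 0.0
    alt_freq = alt_count / total_alleles if total_alleles else 0.0
    genotype_freqs = {g: n / total_genotypes for g, n in genotype_counts.items()}

    # Assumption: the minor allele is the rarer of reference and main alternate.
    maf_allele, maf = min((("0", ref_freq), ("1", alt_freq)), key=lambda p: p[1])
    # Assumption: the minor genotype is the rarest observed, non-missing genotype.
    if genotype_freqs:
        mgf_genotype, mgf = min(genotype_freqs.items(), key=lambda p: p[1])
    else:
        mgf_genotype, mgf = None, 0.0

    return {
        "allele_count": total_alleles,
        "ref_allele_count": ref_count,
        "alt_allele_count": alt_count,
        "ref_allele_freq": ref_freq,
        "alt_allele_freq": alt_freq,
        "genotype_count": dict(genotype_counts),
        "genotype_freq": genotype_freqs,
        "missing_allele_count": missing_alleles,
        "missing_genotype_count": missing_genotypes,
        "maf": maf,
        "mgf": mgf,
        "maf_allele": maf_allele,
        "mgf_genotype": mgf_genotype,
    }
```

For example, `variant_stats(["0/0", "0/1", "1|1", "./."])` returns one dictionary for a variant observed in four samples, one of which has no call. Secondary alternates (alleles `2` and above) contribute to the total allele count but not to the main alternate count, matching the description above.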

If the stats are not indexed, the analysis produces a variant stats file in JSON format that follows the Variant Stats Data Model schema (see the page "Variant Stats Data Model Schema" in the opencb space).

Index

Pre-computed stats are useful for filtering variants. These stats are intra-study, calculated within a given cohort.
