OpenCGA provides a set of analyses to compute basic statistics for a given variant dataset. To get richer statistics, the variant data should include annotation and pedigree information (samples, phenotypes, ...).

OpenCGA computes three types of statistics: summary stats, variant stats and sample stats.

The next sections describe each of these three types.

Summary stats

Summary or global stats provide significant information about the variant dataset. They include:

  • The total number of variants.
  • The total number of samples.
  • The number of variants per chromosome.
  • The number of variants per consequence type.
  • The number of variants per biotype.
  • The number of variants per type (SNV, INDEL,...)
  • The number of variants per genotype.
  • The transition-to-transversion ratio (Ts/Tv ratio).
  • A heterozygosity score.
  • A missingness score.
  • A list of the most affected genes.
  • The number of variants per indel length.
  • A list of HPO and genes for loss of function (LoF) variants.
  • A list of the most frequent variant traits.
  • The number of Mendelian errors per type of error.

  • Relatedness scores (IBD/IBS scores).

Summary statistics are stored in a JSON format file.
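
As an illustration of how one of the values above can be derived, the following is a minimal Python sketch of the transition-to-transversion ratio. It is not OpenCGA's implementation: the function name `ts_tv_ratio` and the input format (a list of reference/alternate allele pairs) are assumptions made for this example only.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)
# substitutions; every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def ts_tv_ratio(snvs):
    """Return the Ts/Tv ratio for an iterable of (ref, alt) allele pairs.

    Hypothetical helper: pairs that are not single-base substitutions
    (indels, symbolic alleles, identical alleles) are ignored.
    Returns None when no transversion is found, to avoid dividing by zero.
    """
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not an SNV
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else None
```

Returning `None` when there are no transversions is a design choice of this sketch; a real tool may report the value differently.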

Variant stats

Variant stats are calculated for each variant. Optionally, you may specify a set of samples (a cohort) so that only those samples are taken into account.

Variant stats include the following values:

    • The total number of alleles (it does not include missing alleles)
    • The number of reference alleles found in this variant
    • The number of main alternate alleles found in this variant (it does not include secondary alternates)
    • The reference allele frequency, i.e., the quotient of the number of reference alleles divided by the total number of alleles.
    • The alternate allele frequency, i.e., the quotient of the number of alternate alleles divided by the total number of alleles.
    • The number of occurrences for each genotype
    • The frequency for each genotype
    • The number of missing alleles
    • The number of missing genotypes
    • The minor allele frequency (maf)
    • The minor genotype frequency (mgf)
    • The allele with the minor frequency
    • The genotype with the minor frequency
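
The values listed above can be sketched for a single variant as follows. This is a minimal illustration, not OpenCGA's code. It assumes that the cohort's genotypes are given as VCF-style strings (`0/1`, `1|1`, `./.`), that phasing is ignored, that a genotype counts as missing only when all of its alleles are missing, and that genotype frequencies are taken over the non-missing genotypes. The function name `variant_stats` and the keys of the returned dictionary are hypothetical.

```python
import re
from collections import Counter


def variant_stats(genotypes):
    """Compute basic allele and genotype stats for one variant (sketch)."""
    allele_counts = Counter()
    genotype_counts = Counter()
    missing_alleles = 0
    missing_genotypes = 0

    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)
        n_missing = alleles.count(".")
        missing_alleles += n_missing
        if n_missing == len(alleles):
            missing_genotypes += 1  # assumption: fully missing genotype
            continue
        allele_counts.update(a for a in alleles if a != ".")
        # Sort the alleles so that 0|1 and 1|0 are counted as the same genotype.
        genotype_counts["/".join(sorted(alleles))] += 1

    total = sum(allele_counts.values())   # excludes missing alleles
    ref = allele_counts["0"]              # reference alleles
    alt = allele_counts["1"]              # main alternate only
    called = sum(genotype_counts.values())
    ref_freq = ref / total if total else 0.0
    alt_freq = alt / total if total else 0.0
    # Minor allele: whichever of reference / main alternate is less frequent.
    maf_allele, maf = min((("ref", ref_freq), ("alt", alt_freq)),
                          key=lambda pair: pair[1])

    return {
        "alleleCount": total,
        "refAlleleCount": ref,
        "altAlleleCount": alt,
        "refAlleleFreq": ref_freq,
        "altAlleleFreq": alt_freq,
        "genotypeCount": dict(genotype_counts),
        "genotypeFreq": {g: c / called for g, c in genotype_counts.items()},
        "missingAlleleCount": missing_alleles,
        "missingGenotypeCount": missing_genotypes,
        "maf": maf,
        "mafAllele": maf_allele,
    }
```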

Pre-calculated stats are useful for filtering variants. These stats are intra-study: they are calculated within a given cohort.

Sample stats

Sample stats are calculated for each sample. They include the following information:

    • The total number of variants.
    • The number of variants per chromosome.
    • The number of variants per consequence type.
    • The number of variants per biotype.
    • The number of variants per type (SNV, INDEL,...)
    • The number of variants per genotype.
    • The transition-to-transversion ratio (Ts/Tv ratio).
    • A heterozygosity score.
    • A missingness score.
    • A list of the most affected genes.
    • The number of variants per indel length
    • A list of HPO and genes for loss of function (LoF) variants.
    • A list of the most frequent variant traits.

Summary statistics are stored in a JSON format file.
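
To make the per-sample scores more concrete, here is a minimal Python sketch of a heterozygosity score and a missingness score for one sample. The exact definitions used by OpenCGA are not given here, so the ones below are assumptions: heterozygosity is taken as the fraction of called genotypes that are heterozygous, and missingness as the fraction of genotypes with at least one missing allele. The function name `sample_scores` is hypothetical.

```python
import re


def sample_scores(genotypes):
    """Return assumed heterozygosity and missingness scores for one sample.

    genotypes: VCF-style genotype strings of the sample, one per variant.
    """
    het = called = missing = 0
    for gt in genotypes:
        alleles = re.split(r"[/|]", gt)
        if "." in alleles:
            missing += 1  # assumption: any missing allele makes it missing
            continue
        called += 1
        if len(set(alleles)) > 1:
            het += 1  # two different alleles: heterozygous genotype
    total = called + missing
    return {
        "heterozygosity": het / called if called else 0.0,
        "missingness": missing / total if total else 0.0,
    }
```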
