Variant Stats contains basic information about each variant in each of the different cohorts.

Implementation

Variant Stats is implemented using Hadoop MapReduce over HBase.

Input

Parameters

OpenCGA supports the following input parameters:

Variant Query
Sample list, cohort or query

Output

Files

Variant stats file, including the following values:

The total number of alleles (it does not include missing alleles)

The number of reference alleles found in this variant

The number of main alternate alleles found in this variant (it does not include secondary alternates)

The reference allele frequency, i.e., the quotient of the number of reference alleles divided by the total number of alleles.

The alternate allele frequency, i.e., the quotient of the number of alternate alleles divided by the total number of alleles.

The number of occurrences for each genotype

The frequency for each genotype

The number of missing alleles

The number of missing genotypes

The minor allele frequency (maf)

The minor genotype frequency (mgf)

The allele with the minor frequency

The genotype with the minor frequency
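The values listed above can be illustrated with a minimal Python sketch. It is not the OpenCGA implementation (which runs as Hadoop MapReduce over HBase); the function name `variant_stats`, the dictionary keys, and the genotype encoding ("0" for reference, "1" for the main alternate, "2" and above for secondary alternates, "." for missing) are assumptions made for illustration. The sketch also assumes that a genotype counts as missing only when all of its alleles are missing, that genotype frequencies are taken over non-missing genotypes, and that the minor allele frequency is the smaller of the reference and alternate frequencies.

```python
from collections import Counter


def variant_stats(genotypes):
    """Compute basic stats for one variant over the genotypes of one cohort.

    Illustrative only: names and conventions here are assumptions,
    not the OpenCGA data model.
    """
    genotype_count = Counter()
    allele_counts = Counter()
    missing_alleles = 0
    missing_genotypes = 0

    for gt in genotypes:
        # Treat phased ("|") and unphased ("/") separators the same way.
        alleles = gt.replace("|", "/").split("/")
        missing_here = alleles.count(".")
        missing_alleles += missing_here
        if missing_here == len(alleles):
            # Assumption: a genotype is missing only if every allele is missing.
            missing_genotypes += 1
        else:
            genotype_count[gt] += 1
        allele_counts.update(a for a in alleles if a != ".")

    # Total number of alleles, excluding missing alleles.
    allele_count = sum(allele_counts.values())
    ref_count = allele_counts["0"]
    alt_count = allele_counts["1"]  # main alternate only
    ref_freq = ref_count / allele_count if allele_count else 0.0
    alt_freq = alt_count / allele_count if allele_count else 0.0

    called = sum(genotype_count.values())
    genotype_freq = {gt: n / called for gt, n in genotype_count.items()}

    # Assumption: MAF is the smaller of the reference and alternate frequencies.
    maf_allele = "0" if ref_freq < alt_freq else "1"
    mgf_genotype = min(genotype_freq, key=genotype_freq.get) if genotype_freq else None

    return {
        "allele_count": allele_count,
        "ref_allele_count": ref_count,
        "alt_allele_count": alt_count,
        "ref_allele_freq": ref_freq,
        "alt_allele_freq": alt_freq,
        "genotype_count": dict(genotype_count),
        "genotype_freq": genotype_freq,
        "missing_allele_count": missing_alleles,
        "missing_genotype_count": missing_genotypes,
        "maf": min(ref_freq, alt_freq),
        "mgf": genotype_freq[mgf_genotype] if mgf_genotype else None,
        "maf_allele": maf_allele,
        "mgf_genotype": mgf_genotype,
    }


# Five samples in a hypothetical cohort; one sample has no call.
stats = variant_stats(["0/0", "0/0", "0/1", "1/1", "./."])
print(stats["allele_count"], stats["ref_allele_freq"], stats["maf"])
```

In this example the missing genotype contributes two missing alleles and is excluded from the total allele count, so the frequencies are computed over eight alleles.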

If the stats are not indexed, the analysis produces a variant stats file in JSON format with the following model schema:

Variant Stats Data Model

Included page: opencb:Variant Stats Data Model Schema

Index

Pre-computed stats are useful for filtering variants. These stats are intra-study and are calculated within a given cohort.
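As a rough illustration of filtering on pre-computed stats, the following Python sketch selects variants whose minor allele frequency in a given cohort does not exceed a threshold. The record layout, the cohort names, the helper `filter_by_maf`, and the values are all hypothetical and made up for the example; they do not reflect the OpenCGA query API or real data.

```python
def filter_by_maf(variants, cohort, max_maf):
    """Return the ids of variants whose MAF in `cohort` is at most `max_maf`.

    Variants without stats for the requested cohort are skipped.
    """
    selected = []
    for variant in variants:
        stats = variant["stats"].get(cohort)
        if stats is not None and stats["maf"] <= max_maf:
            selected.append(variant["id"])
    return selected


# Hypothetical records: per-cohort stats keyed by cohort name.
variants = [
    {"id": "v1", "stats": {"cohortA": {"maf": 0.375}, "cohortB": {"maf": 0.01}}},
    {"id": "v2", "stats": {"cohortA": {"maf": 0.002}}},
    {"id": "v3", "stats": {"cohortB": {"maf": 0.2}}},
]

rare_in_a = filter_by_maf(variants, "cohortA", 0.01)
print(rare_in_a)
```

Because the stats are stored per cohort, the same variant can pass the filter in one cohort and fail it in another.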

Useful Links
