id

String

Unique variant ID, composed of the chromosome, position, reference allele and alternate allele in the format chrom:pos:ref:alt
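As a rough illustration of the chrom:pos:ref:alt format, the ID can be assembled and split with two small helpers. The function names are hypothetical and not part of the data model. The sketch assumes the simple four-field form only, so it does not handle symbolic or breakend alternates that may themselves contain a colon.

```python
from typing import Tuple


def build_variant_id(chromosome: str, position: int, reference: str, alternate: str) -> str:
    """Join the four fields into a chrom:pos:ref:alt string."""
    return f"{chromosome}:{position}:{reference}:{alternate}"


def parse_variant_id(variant_id: str) -> Tuple[str, int, str, str]:
    """Split a chrom:pos:ref:alt string back into its four fields."""
    # Split from the right so that a contig name containing ':' stays intact
    # (an assumption of this sketch, not something the data model states).
    chromosome, position, reference, alternate = variant_id.rsplit(":", 3)
    return chromosome, int(position), reference, alternate
```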

names

List<String>

Other IDs found for this genomic variant across all VCF files indexed

chromosome

String

The chromosome where the genomic variant is located

start

int

The 1-based position where the genomic variant starts. For variants coming from VCF files this position is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

end

int

The 1-based position where the genomic variant ends. For variants coming from VCF files this position is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

reference

String

Reference allele. For variants coming from VCF files this allele is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

alternate

String

Alternate allele. For variants coming from VCF files this allele is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

strand

String

Reference strand for this variant. By default, all variants are represented on the positive strand

length

int

Length of the genomic variation, which depends on the variant type

type

VariantType

Type of variant, the accepted types and Sequence Ontology (SO) terms are:

SNV	SO:0001483
SNP	SO:0000694
MNV	SO:0002007
MNP	SO:0001013
INDEL	SO:1000032
INSERTION	SO:0000667
DELETION	SO:0000159
TRANSLOCATION	SO:0000199
INVERSION	SO:1000036
CNV	SO:0001019
DUPLICATION	SO:1000035
BREAKEND	NA
SYMBOLIC	NA
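For client code that needs to translate between variant types and SO terms, the table above can be kept as a plain lookup. This is only a sketch: the dictionary and function names are hypothetical, and the values are copied from the table, with None standing in for the types listed as NA.

```python
from typing import Optional

# Values copied from the table above; None marks types listed as NA.
VARIANT_TYPE_SO_TERMS = {
    "SNV": "SO:0001483",
    "SNP": "SO:0000694",
    "MNV": "SO:0002007",
    "MNP": "SO:0001013",
    "INDEL": "SO:1000032",
    "INSERTION": "SO:0000667",
    "DELETION": "SO:0000159",
    "TRANSLOCATION": "SO:0000199",
    "INVERSION": "SO:1000036",
    "CNV": "SO:0001019",
    "DUPLICATION": "SO:1000035",
    "BREAKEND": None,
    "SYMBOLIC": None,
}


def so_term(variant_type: str) -> Optional[str]:
    """Return the SO term for a variant type, or None if the table lists NA."""
    # Raises KeyError for a type that is not in the table.
    return VARIANT_TYPE_SO_TERMS[variant_type.upper()]
```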

sv

StructuralVariation

Specific information for Structural Variants

ciStartLeft

int

The confidence interval around START for imprecise variants - left

ciStartRight

int

The confidence interval around START for imprecise variants - right

ciEndLeft

int

The confidence interval around END for imprecise variants - left

ciEndRight

int

The confidence interval around END for imprecise variants - right

copyNumber

int

Number of copies for CNV variants

leftSvInsSeq

String

Left inserted sequence for long INSERTIONS

rightSvInsSeq

String

Right inserted sequence for long INSERTIONS

type

StructuralVariantType

Structural variant types and SO terms are:

COPY_NUMBER_GAIN	SO:0001742
COPY_NUMBER_LOSS	SO:0001743
TANDEM_DUPLICATION	SO:1000173

breakend

Breakend

mate

BreakendMate

chromosome	The chromosome of the mate variant
position	The position of the mate variant
ciPositionLeft	The confidence interval around BREAKEND position - left
ciPositionRight	The confidence interval around BREAKEND position - right

orientation

BreakendOrientation

SE	Start - End, VCF notation t[p[: the piece extending to the right of p is joined after t
SS	Start - Start, VCF notation t]p]: the reverse-complemented piece extending to the left of p is joined after t
ES	End - Start, VCF notation ]p]t: the piece extending to the left of p is joined before t
EE	End - End, VCF notation [p[t: the reverse-complemented piece extending to the right of p is joined before t
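The mapping between the VCF breakend notation and the four orientation values can be sketched as below. This is an illustrative parser, not the one used when files are indexed: the function name and regular expression are assumptions, and it expects the mate to be written as chrom:pos.

```python
import re
from typing import Tuple

# t[p[ , t]p] , ]p]t or [p[t : optional bases, a bracket, the mate, the same bracket, optional bases.
_BREAKEND_ALT = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<bracket>[\[\]])(?P<mate>[^\[\]]+)(?P=bracket)(?P<trail>[^\[\]]*)$"
)


def parse_breakend_alt(alt: str) -> Tuple[str, str, int]:
    """Return (orientation, mate chromosome, mate position) for a VCF breakend ALT."""
    match = _BREAKEND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend alternate: {alt!r}")
    bracket = match["bracket"]
    if match["lead"]:
        # Bases come first: t[p[ is SE, t]p] is SS.
        orientation = "SE" if bracket == "[" else "SS"
    else:
        # Bracket comes first: ]p]t is ES, [p[t is EE.
        orientation = "ES" if bracket == "]" else "EE"
    mate_chromosome, mate_position = match["mate"].rsplit(":", 1)
    return orientation, mate_chromosome, int(mate_position)
```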

insSeq

String

Sequence inserted between the two breakends

studies

List<StudyEntry>

Information specific to each study the variant was read from, such as samples or statistics

studyId

String

Unique ID for the study

secondaryAlternates

List<AlternateCoordinate>

All alternate alleles that have been indexed along with a variant alternate

chromosome String	The chromosome where the genomic variation occurred
start int	1-based start position of the alternate
end int	1-based end position of the alternate
reference String	Reference allele
alternate String	Alternate allele
type VariantType	Type of variant

files

List<FileEntry>

List of files from the study where the variant was present

fileId

String

Unique ID of the indexed file

call

OriginalCall

Original call in the VCF file. This is filled in when the variant has been normalised

variantId	Original call position for the variant, if the file was normalised
alleleIndex	Alternate allele index of the original multi-allelic variant call

data

Map<String, String>

File-related data, which depends on the format of the file the variant was initially read from

sampleDataKeys

List<String>

Keys for the values stored in the data list of each sample (see below). The first key is always the genotype (GT)

samples

List<SampleEntry>

Sample-related data. Each element holds the information specific to one sample

sampleId

String

Unique sample ID

fileIndex

int

The index position, in the files list, of the file this sample was loaded from

data

List<String>

Sample data. The GT field always comes first. The order and length must match the sampleDataKeys field
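The way sampleDataKeys, samples.data and fileIndex fit together can be shown with a small sketch that works on a study entry already decoded into Python dictionaries. The function name and the example values are hypothetical. Only the field names come from the model above.

```python
from typing import Dict, Tuple


def resolve_sample(study: dict, sample: dict) -> Tuple[str, Dict[str, str]]:
    """Return the file ID a sample was loaded from and its data labelled by key."""
    keys = study["sampleDataKeys"]
    values = sample["data"]
    # The model requires both lists to have the same order and length.
    if len(keys) != len(values):
        raise ValueError("sample data does not match sampleDataKeys")
    # fileIndex points into the study's files list.
    file_id = study["files"][sample["fileIndex"]]["fileId"]
    return file_id, dict(zip(keys, values))


# Hypothetical study entry, for illustration only.
example_study = {
    "sampleDataKeys": ["GT", "DP"],
    "files": [{"fileId": "a.vcf.gz"}, {"fileId": "b.vcf.gz"}],
    "samples": [{"sampleId": "S1", "fileIndex": 1, "data": ["0/1", "32"]}],
}
```

Calling `resolve_sample(example_study, example_study["samples"][0])` pairs the sample with the second file and labels its values as GT and DP.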

stats

List<VariantStats>

Variant stats for this variant in each cohort. Each entry contains the following fields:

cohortId String	Unique cohort ID
alleleCount int	Total number of alleles in called genotypes. Does not include missing alleles
refAlleleCount int	Number of reference alleles found in this variant
altAlleleCount int	Number of main alternate alleles found in this variant, excluding secondary alternates
refAlleleFreq float	Reference allele frequency calculated from refAlleleCount and alleleCount, in the range [0,1]
altAlleleFreq float	Alternate allele frequency calculated from altAlleleCount and alleleCount, in the range [0,1]
missingAlleleCount int	Number of missing alleles
missingGenotypeCount int	Number of missing genotypes
genotypeCount Map<String, int>	Count for each genotype found
genotypeFreq Map<String, float>	Genotype frequency for each genotype found
filterCount Map<String, int>	Number of samples with non-missing genotype with that specific filter
filterFreq Map<String, float>	Frequency of each filter. Count divided by the number of non-missing samples
qualityAvg float	The weighted average of the Quality computed only for non-missing samples
maf float	Minor allele frequency
mgf float	Minor genotype frequency
mafAllele String	The allele with minor frequency
mgfGenotype String	Genotype with minor frequency
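To make the relationship between the counts and frequencies concrete, the sketch below derives several of these fields from a list of genotype strings. It is an approximation under stated assumptions, not the actual stats calculator: allele 0 is the reference, allele 1 is the main alternate, a genotype counts as missing only when all of its alleles are ".", genotype frequencies are divided by the number of non-missing genotypes, and maf is taken as the smaller of the reference and main alternate frequencies. The function name is hypothetical.

```python
import re
from collections import Counter
from typing import List


def cohort_stats(genotypes: List[str], reference: str, alternate: str) -> dict:
    """Compute a subset of the stats fields from genotype strings such as '0/1'."""
    genotype_count = Counter()
    alleles = Counter()
    missing_genotypes = 0
    for genotype in genotypes:
        calls = re.split(r"[/|]", genotype)  # accept unphased and phased genotypes
        alleles.update(calls)
        if all(call == "." for call in calls):
            missing_genotypes += 1
        else:
            genotype_count[genotype] += 1
    missing_alleles = alleles.pop(".", 0)
    allele_count = sum(alleles.values())  # called alleles only
    ref_count, alt_count = alleles["0"], alleles["1"]
    ref_freq = ref_count / allele_count if allele_count else 0.0
    alt_freq = alt_count / allele_count if allele_count else 0.0
    called = sum(genotype_count.values())
    return {
        "alleleCount": allele_count,
        "refAlleleCount": ref_count,
        "altAlleleCount": alt_count,
        "refAlleleFreq": ref_freq,
        "altAlleleFreq": alt_freq,
        "missingAlleleCount": missing_alleles,
        "missingGenotypeCount": missing_genotypes,
        "genotypeCount": dict(genotype_count),
        "genotypeFreq": {gt: n / called for gt, n in genotype_count.items()},
        # Assumption: secondary alternates are ignored when picking the minor allele.
        "maf": min(ref_freq, alt_freq),
        "mafAllele": reference if ref_freq < alt_freq else alternate,
    }
```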

scores

List<VariantScore>

Precomputed and indexed analysis scores, such as GWAS

id String	Variant score ID
cohort1 String	The main cohort used for calculating this score
cohort2 String	The optional secondary cohort used for calculating the score
score float	Score value
pValue float	Score p-value

issues

List<IssueType>

Issues found in this variant for a specific sample in this study

type

IssueType

Issues can have one of these types:

DUPLICATION

DISCREPANCY

MENDELIAN_ERROR

DE_NOVO

sample

SampleEntry

The sample information containing sampleId, fileIndex and data (see above)

annotation

Variant Annotation object. This is a large data model and is documented separately
