

id

String

Unique variant ID. It consists of the chromosome, position, reference and alternate alleles in the format chrom:pos:ref:alt
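As an illustration of the ID format, here is a minimal Python sketch that builds and splits an ID of the form chrom:pos:ref:alt. The helper names build_variant_id and parse_variant_id are hypothetical and do not belong to any library. The sketch assumes that the chromosome and reference allele contain no colon; the alternate allele is allowed to contain colons, as breakend alleles may.

```python
def build_variant_id(chrom, pos, ref, alt):
    """Join the four components into a chrom:pos:ref:alt string."""
    return f"{chrom}:{pos}:{ref}:{alt}"


def parse_variant_id(variant_id):
    """Split a chrom:pos:ref:alt string into its four components.

    Only the first three colons are used as separators (assumption), so
    any colon inside the alternate allele is preserved.
    """
    chrom, pos, ref, alt = variant_id.split(":", 3)
    return chrom, int(pos), ref, alt


variant_id = build_variant_id("1", 1000, "A", "T")
chrom, pos, ref, alt = parse_variant_id(variant_id)
```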

names

List<String>

Other IDs found for this genomic variant across all VCF files indexed

chromosome

String

The chromosome where the genomic variant is located

start

int

The 1-based position where the genomic variant starts. For variants coming from VCF files, this position is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

end

int

The 1-based position where the genomic variant ends. For variants coming from VCF files, this position is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

reference

String

Reference allele. For variants coming from VCF files, this allele is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

alternate

String

Alternate allele. For variants coming from VCF files, this allele is likely to be normalised; in that case, the original call in the file is stored in studies.files.call (see below)

strand

String

Reference strand for this variant. By default, all variants are represented on the positive strand

length

int

Length of the genomic variation, which depends on the variant type

type

VariantType

Type of variant. The accepted types and their Sequence Ontology (SO) terms are:

SNV (SO:0001483)
SNP (SO:0000694)
MNV (SO:0002007)
MNP (SO:0001013)
INDEL (SO:1000032)
INSERTION (SO:0000667)
DELETION (SO:0000159)
TRANSLOCATION (SO:0000199)
INVERSION (SO:1000036)
CNV (SO:0001019)
DUPLICATION (SO:1000035)
BREAKEND (NA)
SYMBOLIC (NA)

sv

StructuralVariation

Specific information for Structural Variants

ciStartLeft

int

The confidence interval around START for imprecise variants - left

ciStartRight

int

The confidence interval around START for imprecise variants - right

ciEndLeft

int

The confidence interval around END for imprecise variants - left

ciEndRight

int

The confidence interval around END for imprecise variants - right

copyNumber

int

Number of copies for CNV variants

leftSvInsSeq

String

Left inserted sequence for long INSERTIONS

rightSvInsSeq

String

Right inserted sequence for long INSERTIONS

type

StructuralVariantType

Structural variant types and SO terms are:

COPY_NUMBER_GAIN (SO:0001742)
COPY_NUMBER_LOSS (SO:0001743)
TANDEM_DUPLICATION (SO:1000173)

breakend

Breakend

mate

BreakendMate

chromosome: The chromosome of the mate variant
position: The position of the mate variant
ciPositionLeft: The confidence interval around BREAKEND position - left
ciPositionRight: The confidence interval around BREAKEND position - right

orientation

BreakendOrientation

SE

Start - End

t[p[  piece extending to the right of p is joined after t

SS

Start - Start

t]p]  reverse comp piece extending left of p is joined after t

ES

End - Start

]p]t  piece extending to the left of p is joined before t

EE

End - End

[p[t  reverse comp piece extending to the right of p is joined before t
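To make the four bracket notations above concrete, the following Python sketch maps a breakend alternate allele string to one of the orientation codes SE, SS, ES or EE. The function name breakend_orientation and the regular expressions are illustrative assumptions written for this page, not the parser used by any particular library; t stands for the local sequence and p for the mate position.

```python
import re

# One pattern per notation listed above. "[^\[\]]+" matches a run of
# characters that contains no square bracket (either t or p).
_ORIENTATION_PATTERNS = [
    ("SE", re.compile(r"^[^\[\]]+\[[^\[\]]+\[$")),  # t[p[
    ("SS", re.compile(r"^[^\[\]]+\][^\[\]]+\]$")),  # t]p]
    ("ES", re.compile(r"^\][^\[\]]+\][^\[\]]+$")),  # ]p]t
    ("EE", re.compile(r"^\[[^\[\]]+\[[^\[\]]+$")),  # [p[t
]


def breakend_orientation(alt):
    """Return the orientation code for a breakend ALT string, or None."""
    for code, pattern in _ORIENTATION_PATTERNS:
        if pattern.match(alt):
            return code
    return None


orientation = breakend_orientation("G[17:198982[")  # matches the t[p[ form
```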

insSeq

String

Sequence inserted between the two breakends

studies

List<StudyEntry>

Information specific to each study the variant was read from, such as samples or statistics

studyId

String

Unique ID for the study

secondaryAlternates

List<AlternateCoordinate>

All alternate alleles that have been indexed along with a variant alternate

chromosome

String

The chromosome where the genomic variation occurred

start

int

1-based start position of the alternate

end

int

1-based end position of the alternate

reference

String

Reference allele

alternate

String

Alternate allele

type

VariantType

Type of variant

files

List<FileEntry>

List of files from the study where the variant was present

fileId

String

Unique ID of the indexed file

call

OriginalCall

Original call in the VCF file. This is filled in when the variant has been normalised

variantId

Original call position for the variant, if the file was normalised

alleleIndex

Alternate allele index of the original multi-allelic variant call

data

Map<String, String>

File related data that depend on the format of the file the variant was initially read from

sampleDataKeys

List<String>

Specifies the sample data keys for each sample data (see below). The first key is always genotype (GT).

samples

List<SampleEntry>

Sample-related data. Each element contains the specific information for one sample

sampleId

String

Unique sample ID

fileIndex

int

The relative index position in the files list where this sample was loaded

data

List<String>

Sample data, field GT is always the first one. The order and length must match sampleDataKeys field
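The relationship between sampleDataKeys, samples.data and samples.fileIndex can be sketched in Python as follows. The study dictionary and all of its values (file names, sample IDs, genotypes, depths) are invented for illustration, and the helper sample_data_as_dict is hypothetical; only the field names come from the model described above.

```python
def sample_data_as_dict(sample_data_keys, sample):
    """Pair each sample data value with its key, e.g. {"GT": "0/1"}."""
    data = sample["data"]
    # The order and length of data must match sampleDataKeys.
    if len(data) != len(sample_data_keys):
        raise ValueError("sample data does not match sampleDataKeys")
    return dict(zip(sample_data_keys, data))


# Invented example study entry, reduced to the fields used here.
study = {
    "files": [{"fileId": "file1.vcf"}, {"fileId": "file2.vcf"}],
    "sampleDataKeys": ["GT", "DP"],
    "samples": [
        {"sampleId": "s1", "fileIndex": 0, "data": ["0/1", "30"]},
        {"sampleId": "s2", "fileIndex": 1, "data": ["1/1", "12"]},
    ],
}

first = sample_data_as_dict(study["sampleDataKeys"], study["samples"][0])
# fileIndex points into the files list of the same study.
second_file = study["files"][study["samples"][1]["fileIndex"]]["fileId"]
```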

stats

List<VariantStats>

Variant stats for each variant in the different cohorts. It contains the following fields:


cohortId
String

Unique cohort identifier within the study.

sampleCount
int

Count of samples with non-missing genotypes in this variant from the cohort.
This value is used as denominator for genotypeFreq.

fileCount
int

Count of files with samples from the cohort that reported this variant.
This value is used as denominator for filterFreq.

alleleCount
int

Total number of alleles in called genotypes. It does not include missing alleles.
This value is used as denominator for refAlleleFreq and altAlleleFreq.

refAlleleCount
int

Number of reference alleles found in this variant.

refAlleleFreq
float

Reference allele frequency calculated from refAlleleCount and alleleCount, in the range [0,1]

altAlleleCount
int

Number of main alternate alleles found in this variant. It does not include secondary alternates.

altAlleleFreq
float

Alternate allele frequency calculated from altAlleleCount and alleleCount, in the range [0,1]

missingAlleleCount
int

Number of missing alleles.

missingGenotypeCount
int

Number of genotypes with all alleles missing (e.g. ./.). It does not count partially missing genotypes like "./0" or "./1".

genotypeCount
Map<String, int>

Number of occurrences for each genotype.
This does not include genotype with all alleles missing (e.g. ./.), but it includes partially missing genotypes like "./0" or "./1".
Total sum of counts should be equal to the count of samples.

genotypeFreq
Map<String, float>

Genotype frequency for each genotype found, calculated from genotypeCount and sampleCount, in the range [0,1]
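The genotype-related fields can be illustrated with a short Python sketch. It is a simplified reading of the definitions above, not the actual implementation: the function names are hypothetical, genotypes are assumed to be plain strings with alleles separated by "/" or "|", and sampleCount is assumed to equal the number of genotypes that are not fully missing.

```python
from collections import Counter


def is_fully_missing(genotype):
    """True when every allele is missing, e.g. "./." but not "./1"."""
    alleles = genotype.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


def genotype_stats(genotypes):
    """Compute genotype counts and frequencies for one cohort."""
    # Partially missing genotypes such as "./1" are kept.
    called = [gt for gt in genotypes if not is_fully_missing(gt)]
    sample_count = len(called)
    genotype_count = dict(Counter(called))
    genotype_freq = {
        gt: count / sample_count for gt, count in genotype_count.items()
    }
    return {
        "sampleCount": sample_count,
        "missingGenotypeCount": len(genotypes) - sample_count,
        "genotypeCount": genotype_count,
        "genotypeFreq": genotype_freq,
    }


stats = genotype_stats(["0/0", "0/1", "0/1", "./.", "./1"])
```

With this invented input, the fully missing "./." is excluded from genotypeCount, while "./1" is counted as a genotype of its own.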

maf
float

Minor allele frequency. Frequency of the less common allele between the reference and the main alternate alleles.
This value does not take into account secondary alternates.

mafAllele
String

Allele with minor frequency.
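The allele frequency fields follow directly from the counts. The Python sketch below shows one way to derive refAlleleFreq, altAlleleFreq, maf and mafAllele; the function name allele_stats is hypothetical, and the tie-breaking rule (the alternate allele is reported when both frequencies are equal) is an assumption made for this example.

```python
def allele_stats(ref_allele_count, alt_allele_count, allele_count, ref, alt):
    """Derive allele frequencies and the minor allele from raw counts."""
    if allele_count == 0:
        return None  # no called alleles, frequencies are undefined
    ref_freq = ref_allele_count / allele_count
    alt_freq = alt_allele_count / allele_count
    # Only the reference and the main alternate are compared; secondary
    # alternates contribute to allele_count but are otherwise ignored.
    if ref_freq < alt_freq:
        maf, maf_allele = ref_freq, ref
    else:
        maf, maf_allele = alt_freq, alt  # ties go to alt (assumption)
    return {
        "refAlleleFreq": ref_freq,
        "altAlleleFreq": alt_freq,
        "maf": maf,
        "mafAllele": maf_allele,
    }


result = allele_stats(6, 2, 8, "A", "T")
```

Because alleleCount also covers secondary alternates, refAlleleFreq and altAlleleFreq do not necessarily add up to 1.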

mgf
float

Minor genotype frequency. Frequency of the less common genotype seen in this variant.
This value takes into account all values from the genotypeFreq map.

mgfGenotype
String

Genotype with minor frequency.
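As a small illustration, mgf and mgfGenotype can be read off a genotypeFreq map by taking its smallest entry. The function name minor_genotype is hypothetical, and returning the first genotype in case of a tie is an assumption of this sketch.

```python
def minor_genotype(genotype_freq):
    """Return (mgf, mgfGenotype) from a genotype -> frequency map."""
    if not genotype_freq:
        return None, None
    # min() keeps the first key encountered when frequencies are tied.
    genotype = min(genotype_freq, key=genotype_freq.get)
    return genotype_freq[genotype], genotype


mgf, mgf_genotype = minor_genotype({"0/0": 0.7, "0/1": 0.2, "1/1": 0.1})
```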

filterCount
Map<String, int>

The number of occurrences for each FILTER value in files from samples in this cohort reporting this variant.
As each file can contain more than one filter value (usually separated by ';'), the total sum of counts could be greater than the count of files.

filterFreq
Map<String, float>

Frequency of each filter, calculated from filterCount and fileCount, in the range [0,1]
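The following Python sketch shows how filterCount and filterFreq relate to fileCount. It assumes one FILTER string per file reporting the variant, with multiple values separated by ';'; the function name filter_stats and the example filter values are invented for illustration.

```python
from collections import Counter


def filter_stats(file_filters):
    """Count FILTER values across files and divide by the file count."""
    file_count = len(file_filters)
    filter_count = Counter()
    for value in file_filters:
        # A single file may carry several filters, e.g. "LowQual;LowDP".
        filter_count.update(value.split(";"))
    filter_freq = {
        name: count / file_count for name, count in filter_count.items()
    }
    return file_count, dict(filter_count), filter_freq


file_count, filter_count, filter_freq = filter_stats(
    ["PASS", "LowQual;LowDP", "PASS", "LowQual"]
)
```

In this example the counts add up to 5 although only 4 files are involved, which is the situation described for filterCount.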

qualityCount
int

The number of files from samples in this cohort reporting this variant with valid QUAL values.
This value is used as denominator to obtain the qualityAvg

qualityAvg
float

The average Quality value for files with valid QUAL values from samples in this cohort reporting this variant.
Some files may not have defined the QUAL value, so the number of values averaged could be less than fileCount.
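A minimal Python sketch of how qualityCount and qualityAvg relate: only files with a valid QUAL value enter the average. The function name quality_stats is hypothetical, and treating None or "." as a missing QUAL value is an assumption based on the VCF convention for missing values.

```python
def quality_stats(quals):
    """Return (qualityCount, qualityAvg) for one QUAL value per file."""
    # Skip files where QUAL is undefined (None or the VCF "." marker).
    valid = [float(q) for q in quals if q not in (None, ".")]
    quality_count = len(valid)
    quality_avg = sum(valid) / quality_count if quality_count else None
    return quality_count, quality_avg


quality_count, quality_avg = quality_stats(["30", ".", "50", None])
```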

scores

List<VariantScore>

Precomputed and indexed analysis scores, such as GWAS

id

String

Variant score ID

cohort1

String

The main cohort used for calculating this score

cohort2

String

The optional secondary cohort used for calculating the score

score

float

Score value

pValue

float

Score p-value

issues

List<IssueType>

Issues found in this variant for a specific sample in this study

type

IssueType

Issues can have one of these types:

DUPLICATION
DISCREPANCY
MENDELIAN_ERROR
DE_NOVO

sample

SampleEntry

The sample information containing sampleId, fileIndex and data (see above)

annotation

Variant Annotation object. This is a large data model and is documented independently