Consider this multi-sample VCF input record at chromosome 1, position 100. It lists four samples whose genotypes are homozygous reference [AA/AA], heterozygous SNP [AA/AT], heterozygous insertion [AT/AAC] and heterozygous deletion [AA/A]:
#CHROM POS REF ALT FORMAT SAMPLE1 SAMPLE2 SAMPLE3 SAMPLE4 |
The OpenCB Variant Normalization process first splits the record into four individual variants:
#CHROM POS REF ALT |
Each variant is then allele-trimmed and its position updated:
#CHROM POS REF ALT |
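The trimming step can be illustrated with a minimal Python sketch. It is not the OpenCB implementation: the function name `trim_alleles` is hypothetical, and two behaviours are assumptions made for illustration only, namely that shared trailing bases are removed before shared leading bases, and that an allele may be trimmed down to an empty string. The example alleles are taken from the genotypes listed at the top of this section, with `AA` as the reference.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT and shift the 1-based position.

    Assumptions (not confirmed by the OpenCB documentation):
    - shared trailing bases are removed first, then shared leading bases;
    - an allele may become empty (pure insertion or deletion).
    """
    # Strip the common suffix; the start position is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Strip the common prefix; the start moves right once per removed base.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# REF "AA" at position 100 against each alternate allele from the example.
for alt in ("AT", "AAC", "A"):
    print(("AA", alt), "->", trim_alleles(100, "AA", alt))
```

Under these assumptions the SNP becomes `A` to `T` at position 101, the insertion becomes an empty reference with `C` inserted at position 102, and the deletion becomes `A` to an empty allele at position 100. A normalizer that trims leading bases first, or that always keeps one anchoring base as plain VCF does, would report different coordinates for the indels.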
The final JSON representation of the Variant objects as stored in the OpenCGA database is as follows:
{ |