- Created by Nacho Medina, last modified by Jacobo Coll on Mar 23, 2020
Overview
The genomic variant data model plays a crucial role not only in OpenCGA but across the OpenCB suite. It provides a generic way of representing any variant together with any other information of interest associated with it. OpenCGA uses the model heavily when loading VCF files and when exporting query results. The model is implemented in the OpenCB Biodata project, which allows other OpenCB projects such as CellBase to use it.
Goals
The main goals of the variant data model are:
- To be able to represent any type of variant (SNV, INDEL) or structural variant (INSERTION, DELETION, CNV, TRANSLOCATION, ...), including phased variants and non-diploid organisms.
- To provide a file-format-agnostic way of storing genomic variant data from VCF, gVCF, microarrays, ...
- To allow bioinformaticians to add valuable and rich annotations for researchers and clinicians
Main Features
Some of the main features of the variant data model include:
Design
A high-level representation of the variant looks like this:
Field | Type | Description
id | String | The variant ID
names | List<String> | Other names used for this genomic variation
chromosome | String | Chromosome where the genomic variation occurred
start | int | Normalized position where the genomic variation starts
end | int | Normalized position where the genomic variation ends
length | int | Length of the genomic variation, which depends on the variation type
type | VariantType | Type of variation: single nucleotide, indel or structural variation
strand | String | Reference strand for this variant
reference | String | Reference allele
alternate | String | Alternate allele
studies | List<StudyEntry> | Information specific to each study the variant was read from, such as samples or statistics
annotation | |
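As a rough illustration of the fields listed above, the following Python sketch mirrors the table with dataclasses. The names `Variant` and `StudyEntry` follow the types in the table, but the Python layout, the simplified `StudyEntry` fields and the `id` format (chromosome:start:reference:alternate, inferred from the example variant on this page) are assumptions for illustration and not the Biodata implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StudyEntry:
    """Per-study information (hypothetical, heavily simplified)."""
    study_id: str
    sample_data_keys: List[str] = field(default_factory=list)
    samples: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class Variant:
    """Illustrative mirror of the table above; not the Biodata class."""
    chromosome: str
    start: int
    end: int
    reference: str
    alternate: str
    length: int = 1          # stored as given; the real rule depends on the type
    type: str = "SNV"
    strand: str = "+"
    names: List[str] = field(default_factory=list)
    studies: List[StudyEntry] = field(default_factory=list)
    annotation: Optional[Dict[str, Any]] = None

    @property
    def id(self) -> str:
        # Assumed id layout: chromosome:start:reference:alternate
        return f"{self.chromosome}:{self.start}:{self.reference}:{self.alternate}"


snv = Variant(
    chromosome="1",
    start=69511,
    end=69511,
    reference="A",
    alternate="G",
    names=["rs75062661"],
    studies=[StudyEntry(study_id="demo@family:corpasome")],
)
print(snv.id, snv.type, snv.length)
```

Keeping `studies` and `annotation` as separate nested members reflects the split in the table: study-specific data travels with each study entry, while annotation is attached once per variant.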
The following JSON shows an example variant:
{
  "id": "1:69511:A:G",
  "names": ["rs75062661"],
  "chromosome": "1",
  "start": 69511,
  "end": 69511,
  "strand": "+",
  "length": 1,
  "type": "SNV",
  "reference": "A",
  "alternate": "G",
  "studies": [
    {
      "studyId": "demo@family:corpasome",
      "files": [
        {
          "fileId": "quartet.variants.annotated.vcf.gz",
          "call": {},
          "data": {
            "ABHom": "0.982", "AC": "8", "AF": "1.00", "AN": "8", "BaseQRankSum": "2.089", "DB": "true", "DP": "331", "Dels": "0.00",
            "EFF": "NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Aca/Gca|T141A|305|OR4F5||CODING|NM_001005484.1|1|1)",
            "FILTER": "VQSRTrancheSNP99.90to100.00", "FS": "8.817", "HaplotypeScore": "2.4399", "MLEAC": "8", "MLEAF": "1.00",
            "MQ": "15.47", "MQ0": "145", "MQRankSum": "-0.047", "OND": "0.018", "QD": "12.97", "QUAL": "4293.01",
            "ReadPosRankSum": "1.662", "SB": "-1.450e+03", "VCF_ID": "rs75062661", "VQSLOD": "-14.4975", "culprit": "MQ", "set": "FilteredInAll"
          }
        }
      ],
      "secondaryAlternates": [],
      "sampleDataKeys": ["GT", "AD", "DP", "GQ", "PL"],
      "samples": [
        {"sampleId": "", "fileIndex": 0, "data": ["1/1", "2,171", "173", "99", "2218,228,0"]},
        {"sampleId": "", "fileIndex": 0, "data": ["1/1", "0,33", "34", "60", "508,60,0"]},
        {"sampleId": "", "fileIndex": 0, "data": ["1/1", "0,61", "63", "93", "777,93,0"]},
        {"sampleId": "", "fileIndex": 0, "data": ["1/1", "0,61", "61", "96", "790,96,0"]}
      ],
      "issues": [],
      "scores": [],
      "stats": {
        "ALL": {
          "alleleCount": 8, "altAlleleCount": 8, "altAlleleFreq": 1.0,
          "filterCount": {"PASS": 0, "VQSRTrancheSNP99.90to100.00": 1},
          "filterFreq": {"PASS": 0.0, "VQSRTrancheSNP99.90to100.00": 1.0},
          "genotypeCount": {"0/0": 0, "0/1": 0, "1/1": 4},
          "genotypeFreq": {"0/0": 0.0, "0/1": 0.0, "1/1": 1.0},
          "maf": 0.0, "mafAllele": "A", "mgf": 0.0, "mgfGenotype": "0/0",
          "missingAlleleCount": 0, "missingGenotypeCount": 0, "qualityAvg": 4293.01,
          "refAlleleCount": 0, "refAlleleFreq": 0.0
        }
      }
    }
  ],
  "annotation": {
    "additionalAttributes": {"opencga": {"attribute": {"annotationId": "CURRENT", "release": "1"}}},
    "alternate": "G",
    "chromosome": "1",
    "consequenceTypes": [
      {
        "biotype": "protein_coding",
        "cdnaPosition": 421,
        "cdsPosition": 421,
        "codon": "Aca/Gca",
        "ensemblGeneId": "ENSG00000186092",
        "ensemblTranscriptId": "ENST00000335137",
        "exonOverlap": [{"number": "1/1", "percentage": 0.108932465}],
        "geneName": "OR4F5",
        "proteinVariantAnnotation": {
          "alternate": "ALA",
          "features": [
            {"description": "GPCR, rhodopsin-like, 7TM", "end": 280, "id": "IPR017452", "start": 34},
            {"end": 182, "start": 90, "type": "disulfide bond"},
            {"description": "Helical; Name=4", "end": 151, "start": 133, "type": "transmembrane region"},
            {"description": "Olfactory receptor 4F5", "end": 305, "id": "PRO_0000150547", "start": 1, "type": "chain"}
          ],
          "keywords": ["Cell membrane", "Complete proteome", "Disulfide bond", "G-protein coupled receptor", "Membrane", "Olfaction", "Receptor", "Reference proteome", "Sensory transduction", "Transducer", "Transmembrane", "Transmembrane helix"],
          "position": 141,
          "reference": "THR",
          "substitutionScores": [
            {"description": "tolerated", "score": 0.63, "source": "sift"},
            {"description": "benign", "score": 0.003, "source": "polyphen"}
          ],
          "uniprotAccession": "Q8NH21"
        },
        "sequenceOntologyTerms": [{"accession": "SO:0001583", "name": "missense_variant"}],
        "strand": "+",
        "transcriptAnnotationFlags": ["CCDS", "basic"]
      },
      {"sequenceOntologyTerms": [{"accession": "SO:0001566", "name": "regulatory_region_variant"}]}
    ],
    "conservation": [
      {"score": 1.149999976158142, "source": "gerp"},
      {"score": 0.1289999932050705, "source": "phastCons"},
      {"score": -0.527999997138977, "source": "phylop"}
    ],
    "cytoband": [{"chromosome": "1", "end": 2300000, "name": "p36.33", "stain": "gneg", "start": 1}],
    "displayConsequenceType": "missense_variant",
    "functionalScore": [
      {"score": -0.7899999618530273, "source": "cadd_raw"},
      {"score": 0.03999999910593033, "source": "cadd_scaled"}
    ],
    "geneDrugInteraction": [],
    "geneTraitAssociation": [],
    "hgvs": ["ENST00000335137(ENSG00000186092):c.421A>G"],
    "id": "rs2691305",
    "populationFrequencies": [
      {"altAllele": "G", "altAlleleFreq": 0.95061594, "altHomGenotypeFreq": 0.93263996, "hetGenotypeFreq": 0.03595196, "population": "ALL", "refAllele": "A", "refAlleleFreq": 0.049384065, "refHomGenotypeFreq": 0.031408086, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.9499386, "altHomGenotypeFreq": 0.92997545, "hetGenotypeFreq": 0.03992629, "population": "OTH", "refAllele": "A", "refAlleleFreq": 0.050061423, "refHomGenotypeFreq": 0.03009828, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.999461, "altHomGenotypeFreq": 0.99892205, "hetGenotypeFreq": 0.0010779734, "population": "EAS", "refAllele": "A", "refAlleleFreq": 0.0005389867, "refHomGenotypeFreq": 0.0, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.95083994, "altHomGenotypeFreq": 0.9305369, "hetGenotypeFreq": 0.040606, "population": "AMR", "refAllele": "A", "refAlleleFreq": 0.049160052, "refHomGenotypeFreq": 0.028857054, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.97795016, "altHomGenotypeFreq": 0.9710086, "hetGenotypeFreq": 0.013883217, "population": "ASJ", "refAllele": "A", "refAlleleFreq": 0.022049816, "refHomGenotypeFreq": 0.015108207, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.99145377, "altHomGenotypeFreq": 0.98848504, "hetGenotypeFreq": 0.0059373877, "population": "FIN", "refAllele": "A", "refAlleleFreq": 0.00854624, "refHomGenotypeFreq": 0.005577546, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.9727796, "altHomGenotypeFreq": 0.96255124, "hetGenotypeFreq": 0.020456737, "population": "NFE", "refAllele": "A", "refAlleleFreq": 0.027220415, "refHomGenotypeFreq": 0.016992046, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.6074365, "altHomGenotypeFreq": 0.47664425, "hetGenotypeFreq": 0.26158446, "population": "AFR", "refAllele": "A", "refAlleleFreq": 0.39256352, "refHomGenotypeFreq": 0.2617713, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.95853204, "altHomGenotypeFreq": 0.94338477, "hetGenotypeFreq": 0.03029453, "population": "MALE", "refAllele": "A", "refAlleleFreq": 0.041467976, "refHomGenotypeFreq": 0.02632071, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.94091445, "altHomGenotypeFreq": 0.91947174, "hetGenotypeFreq": 0.04288538, "population": "FEMALE", "refAllele": "A", "refAlleleFreq": 0.05908557, "refHomGenotypeFreq": 0.03764288, "study": "GNOMAD_EXOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.84222084, "altHomGenotypeFreq": 0.77478045, "hetGenotypeFreq": 0.1348808, "population": "ALL", "refAllele": "A", "refAlleleFreq": 0.15777917, "refHomGenotypeFreq": 0.090338774, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.9404255, "altHomGenotypeFreq": 0.9191489, "hetGenotypeFreq": 0.04255319, "population": "OTH", "refAllele": "A", "refAlleleFreq": 0.05957447, "refHomGenotypeFreq": 0.038297873, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 1.0, "altHomGenotypeFreq": 1.0, "hetGenotypeFreq": 0.0, "population": "EAS", "refAllele": "A", "refAlleleFreq": 0.0, "refHomGenotypeFreq": 0.0, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.9410377, "altHomGenotypeFreq": 0.9103774, "hetGenotypeFreq": 0.061320756, "population": "AMR", "refAllele": "A", "refAlleleFreq": 0.058962263, "refHomGenotypeFreq": 0.028301887, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.9672131, "altHomGenotypeFreq": 0.9508197, "hetGenotypeFreq": 0.032786883, "population": "ASJ", "refAllele": "A", "refAlleleFreq": 0.032786883, "refHomGenotypeFreq": 0.016393442, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.9918478, "altHomGenotypeFreq": 0.98913044, "hetGenotypeFreq": 0.0054347827, "population": "FIN", "refAllele": "A", "refAlleleFreq": 0.008152174, "refHomGenotypeFreq": 0.0054347827, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.9637507, "altHomGenotypeFreq": 0.94847214, "hetGenotypeFreq": 0.03055722, "population": "NFE", "refAllele": "A", "refAlleleFreq": 0.03624925, "refHomGenotypeFreq": 0.02097064, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.5886525, "altHomGenotypeFreq": 0.41246733, "hetGenotypeFreq": 0.3523703, "population": "AFR", "refAllele": "A", "refAlleleFreq": 0.4113475, "refHomGenotypeFreq": 0.23516238, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.8381471, "altHomGenotypeFreq": 0.7682737, "hetGenotypeFreq": 0.13974673, "population": "MALE", "refAllele": "A", "refAlleleFreq": 0.16185293, "refHomGenotypeFreq": 0.09197956, "study": "GNOMAD_GENOMES"},
      {"altAllele": "G", "altAlleleFreq": 0.84750646, "altHomGenotypeFreq": 0.78322285, "hetGenotypeFreq": 0.12856731, "population": "FEMALE", "refAllele": "A", "refAlleleFreq": 0.1524935, "refHomGenotypeFreq": 0.08820986, "study": "GNOMAD_GENOMES"}
    ],
    "reference": "A",
    "repeat": [
      {"chromosome": "1", "copyNumber": 2.0, "end": 87112, "id": "9119", "percentageMatch": 0.992904, "source": "genomicSuperDup", "start": 10001},
      {"chromosome": "1", "copyNumber": 2.0, "end": 87112, "id": "14903", "percentageMatch": 0.995437, "source": "genomicSuperDup", "start": 18393}
    ],
    "start": 69511,
    "traitAssociation": [
      {
        "additionalProperties": [{"name": "mutationSomaticStatus_in_source_file", "value": "Confirmed somatic variant"}],
        "alleleOrigin": [],
        "bibliography": [],
        "ethnicity": "Z",
        "genomicFeatures": [
          {"featureType": "gene", "xrefs": {"symbol": "OR4F5"}},
          {"featureType": "gene", "xrefs": {"symbol": "8301"}}
        ],
        "heritableTraits": [],
        "id": "COSM4144171",
        "somaticInformation": {"histologySubtype": "neoplasm", "primaryHistology": "other", "primarySite": "thyroid", "sampleSource": "", "tumourOrigin": ""},
        "source": {"name": "cosmic"},
        "submissions": []
      }
    ],
    "variantTraitAssociation": {
      "clinvar": [],
      "cosmic": [
        {"geneName": "OR4F5", "histologySubtype": "neoplasm", "mutationId": "COSM4144171", "mutationSomaticStatus": "Confirmed somatic variant", "primaryHistology": "other", "primarySite": "thyroid", "sampleSource": "", "siteSubtype": "", "tumourOrigin": ""}
      ]
    }
  }
}
Implementation
The variant data model is implemented in the OpenCB Biodata project, which allows the rest of the OpenCB projects, such as CellBase and Oskar, to use it.