Overview

The genomic variant data model plays a crucial role not only in OpenCGA but across the OpenCB suite. It provides a generic way to represent any variant together with the information associated with it, and OpenCGA relies on it heavily when loading VCF files and when exporting query results. The model is implemented in the OpenCB Biodata project, which allows the rest of the OpenCB projects, such as CellBase, to use it.

Goals

The main goals of the variant data model are:

  • To represent any type of variant, both small variants (SNV, INDEL) and structural variants (INSERTION, DELETION, CNV, TRANSLOCATION, ...), including phased variants and non-diploid organisms.
  • To provide a file-format-agnostic way of storing genomic variant data from VCF, gVCF, microarrays, ...
  • To allow bioinformaticians to add valuable and rich annotations for researchers and clinicians.

Main Features

Some of the main features of the variant data model include:

Design

A high-level representation of a variant looks like this:


{
    "id": "1:69511:A:G",
    "names": ["rs75062661"],
    "chromosome": "1",
    "start": 69511,
    "end": 69511,
    "strand": "+",
    "length": 1,
    "type": "SNV",
    "reference": "A",
    "alternate": "G",
    "studies": [
        {
            "studyId": "demo@family:corpasome",
            "files": [
                {
                    "fileId": "quartet.variants.annotated.vcf.gz",
                    "call": {},
                    "data": {
                        "ABHom": "0.982",
                        "AC": "8",
                        "AF": "1.00",
                        "AN": "8",
                        "BaseQRankSum": "2.089",
                        "DB": "true",
                        "DP": "331",
                        "Dels": "0.00",
                        "EFF": "NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Aca/Gca|T141A|305|OR4F5||CODING|NM_001005484.1|1|1)",
                        "FILTER": "VQSRTrancheSNP99.90to100.00",
                        "FS": "8.817",
                        "HaplotypeScore": "2.4399",
                        "MLEAC": "8",
                        "MLEAF": "1.00",
                        "MQ": "15.47",
                        "MQ0": "145",
                        "MQRankSum": "-0.047",
                        "OND": "0.018",
                        "QD": "12.97",
                        "QUAL": "4293.01",
                        "ReadPosRankSum": "1.662",
                        "SB": "-1.450e+03",
                        "VCF_ID": "rs75062661",
                        "VQSLOD": "-14.4975",
                        "culprit": "MQ",
                        "set": "FilteredInAll"
                    }
                }
            ],
            "secondaryAlternates": [],
            "sampleDataKeys": ["GT", "AD", "DP", "GQ", "PL"],
            "samples": [
                {
                    "sampleId": "",
                    "fileIndex": 1,
                    "data": ["1/1", "2,171", "173", "99", "2218,228,0"]
                },
                {
                    "sampleId": "",
                    "fileIndex": 1,
                    "data": ["1/1", "0,33", "34", "60", "508,60,0"]
                },
                {
                    "sampleId": "",
                    "fileIndex": 1,
                    "data": ["1/1", "0,61", "63", "93", "777,93,0"]
                },
                {
                    "sampleId": "",
                    "fileIndex": 1,
                    "data": ["1/1", "0,61", "61", "96", "790,96,0"]
                }
            ],
            "issues": [],
            "scores": [],
            "stats": {
                "ALL": {
                    "alleleCount": 8,
                    "altAlleleCount": 8,
                    "altAlleleFreq": 1.0,
                    "filterCount": {
                        "PASS": 0,
                        "VQSRTrancheSNP99.90to100.00": 1
                    },
                    "filterFreq": {
                        "PASS": 0.0,
                        "VQSRTrancheSNP99.90to100.00": 1.0
                    },
                    "genotypeCount": {
                        "0/0": 0,
                        "0/1": 0,
                        "1/1": 4
                    },
                    "genotypeFreq": {
                        "0/0": 0.0,
                        "0/1": 0.0,
                        "1/1": 1.0
                    },
                    "maf": 0.0,
                    "mafAllele": "A",
                    "mgf": 0.0,
                    "mgfGenotype": "0/0",
                    "missingAlleleCount": 0,
                    "missingGenotypeCount": 0,
                    "qualityAvg": 4293.01,
                    "refAlleleCount": 0,
                    "refAlleleFreq": 0.0
                }
            }
        }
    ],

              "annotation": {
                "additionalAttributes": {
                    "opencga": {
                        "attribute": {
                            "annotationId": "CURRENT",
                            "release": "1"
                        }
                    }
                },
                "alternate": "G",
                "chromosome": "1",
                "consequenceTypes": [
                    {
                        "biotype": "protein_coding",
                        "cdnaPosition": 421,
                        "cdsPosition": 421,
                        "codon": "Aca/Gca",
                        "ensemblGeneId": "ENSG00000186092",
                        "ensemblTranscriptId": "ENST00000335137",
                        "exonOverlap": [{"number": "1/1", "percentage": 0.108932465}],
                        "geneName": "OR4F5",
                        "proteinVariantAnnotation": {
                            "alternate": "ALA",
                            "features": [
                                {"description": "GPCR, rhodopsin-like, 7TM", "end": 280, "id": "IPR017452", "start": 34},
                                {"end": 182, "start": 90, "type": "disulfide bond"},
                                {"description": "Helical; Name=4", "end": 151, "start": 133, "type": "transmembrane region"},
                                {"description": "Olfactory receptor 4F5", "end": 305, "id": "PRO_0000150547", "start": 1, "type": "chain"}
                            ],
                            "keywords": [
                                "Cell membrane", "Complete proteome", "Disulfide bond",
                                "G-protein coupled receptor", "Membrane", "Olfaction", "Receptor",
                                "Reference proteome", "Sensory transduction", "Transducer",
                                "Transmembrane", "Transmembrane helix"
                            ],
                            "position": 141,
                            "reference": "THR",
                            "substitutionScores": [
                                {"description": "tolerated", "score": 0.63, "source": "sift"},
                                {"description": "benign", "score": 0.003, "source": "polyphen"}
                            ],
                            "uniprotAccession": "Q8NH21"
                        },
                        "sequenceOntologyTerms": [{"accession": "SO:0001583", "name": "missense_variant"}],
                        "strand": "+",
                        "transcriptAnnotationFlags": ["CCDS", "basic"]
                    },
                    {
                        "sequenceOntologyTerms": [{"accession": "SO:0001566", "name": "regulatory_region_variant"}]
                    }
                ],
                             "conservation": [{"score": 1.149999976158142,
                                               "source": "gerp"},
                                              {"score": 0.1289999932050705,
                                               "source": "phastCons"},
                                              {"score": -0.527999997138977,
                                               "source": "phylop"}],
                             "cytoband": [{"chromosome": "1",
                                           "end": 2300000,
                                           "name": "p36.33",
                                           "stain": "gneg",
                                           "start": 1}],
                             "displayConsequenceType": "missense_variant",
                             "functionalScore": [{"score": -0.7899999618530273,
                                                  "source": "cadd_raw"},
                                                 {"score": 0.03999999910593033,
                                                  "source": "cadd_scaled"}],
                             "geneDrugInteraction": [],
                             "geneTraitAssociation": [],
                             "hgvs": ["ENST00000335137(ENSG00000186092):c.421A>G"],
                             "id": "rs2691305",
                             "populationFrequencies": [{"altAllele": "G",
                                                        "altAlleleFreq": 0.95061594,
                                                        "altHomGenotypeFreq": 0.93263996,
                                                        "hetGenotypeFreq": 0.03595196,
                                                        "population": "ALL",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.049384065,
                                                        "refHomGenotypeFreq": 0.031408086,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.9499386,
                                                        "altHomGenotypeFreq": 0.92997545,
                                                        "hetGenotypeFreq": 0.03992629,
                                                        "population": "OTH",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.050061423,
                                                        "refHomGenotypeFreq": 0.03009828,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.999461,
                                                        "altHomGenotypeFreq": 0.99892205,
                                                        "hetGenotypeFreq": 0.0010779734,
                                                        "population": "EAS",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.0005389867,
                                                        "refHomGenotypeFreq": 0.0,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.95083994,
                                                        "altHomGenotypeFreq": 0.9305369,
                                                        "hetGenotypeFreq": 0.040606,
                                                        "population": "AMR",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.049160052,
                                                        "refHomGenotypeFreq": 0.028857054,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.97795016,
                                                        "altHomGenotypeFreq": 0.9710086,
                                                        "hetGenotypeFreq": 0.013883217,
                                                        "population": "ASJ",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.022049816,
                                                        "refHomGenotypeFreq": 0.015108207,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.99145377,
                                                        "altHomGenotypeFreq": 0.98848504,
                                                        "hetGenotypeFreq": 0.0059373877,
                                                        "population": "FIN",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.00854624,
                                                        "refHomGenotypeFreq": 0.005577546,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.9727796,
                                                        "altHomGenotypeFreq": 0.96255124,
                                                        "hetGenotypeFreq": 0.020456737,
                                                        "population": "NFE",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.027220415,
                                                        "refHomGenotypeFreq": 0.016992046,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.6074365,
                                                        "altHomGenotypeFreq": 0.47664425,
                                                        "hetGenotypeFreq": 0.26158446,
                                                        "population": "AFR",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.39256352,
                                                        "refHomGenotypeFreq": 0.2617713,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.95853204,
                                                        "altHomGenotypeFreq": 0.94338477,
                                                        "hetGenotypeFreq": 0.03029453,
                                                        "population": "MALE",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.041467976,
                                                        "refHomGenotypeFreq": 0.02632071,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.94091445,
                                                        "altHomGenotypeFreq": 0.91947174,
                                                        "hetGenotypeFreq": 0.04288538,
                                                        "population": "FEMALE",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.05908557,
                                                        "refHomGenotypeFreq": 0.03764288,
                                                        "study": "GNOMAD_EXOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.84222084,
                                                        "altHomGenotypeFreq": 0.77478045,
                                                        "hetGenotypeFreq": 0.1348808,
                                                        "population": "ALL",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.15777917,
                                                        "refHomGenotypeFreq": 0.090338774,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.9404255,
                                                        "altHomGenotypeFreq": 0.9191489,
                                                        "hetGenotypeFreq": 0.04255319,
                                                        "population": "OTH",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.05957447,
                                                        "refHomGenotypeFreq": 0.038297873,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 1.0,
                                                        "altHomGenotypeFreq": 1.0,
                                                        "hetGenotypeFreq": 0.0,
                                                        "population": "EAS",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.0,
                                                        "refHomGenotypeFreq": 0.0,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.9410377,
                                                        "altHomGenotypeFreq": 0.9103774,
                                                        "hetGenotypeFreq": 0.061320756,
                                                        "population": "AMR",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.058962263,
                                                        "refHomGenotypeFreq": 0.028301887,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.9672131,
                                                        "altHomGenotypeFreq": 0.9508197,
                                                        "hetGenotypeFreq": 0.032786883,
                                                        "population": "ASJ",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.032786883,
                                                        "refHomGenotypeFreq": 0.016393442,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.9918478,
                                                        "altHomGenotypeFreq": 0.98913044,
                                                        "hetGenotypeFreq": 0.0054347827,
                                                        "population": "FIN",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.008152174,
                                                        "refHomGenotypeFreq": 0.0054347827,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.9637507,
                                                        "altHomGenotypeFreq": 0.94847214,
                                                        "hetGenotypeFreq": 0.03055722,
                                                        "population": "NFE",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.03624925,
                                                        "refHomGenotypeFreq": 0.02097064,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.5886525,
                                                        "altHomGenotypeFreq": 0.41246733,
                                                        "hetGenotypeFreq": 0.3523703,
                                                        "population": "AFR",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.4113475,
                                                        "refHomGenotypeFreq": 0.23516238,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.8381471,
                                                        "altHomGenotypeFreq": 0.7682737,
                                                        "hetGenotypeFreq": 0.13974673,
                                                        "population": "MALE",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.16185293,
                                                        "refHomGenotypeFreq": 0.09197956,
                                                        "study": "GNOMAD_GENOMES"},
                                                       {"altAllele": "G",
                                                        "altAlleleFreq": 0.84750646,
                                                        "altHomGenotypeFreq": 0.78322285,
                                                        "hetGenotypeFreq": 0.12856731,
                                                        "population": "FEMALE",
                                                        "refAllele": "A",
                                                        "refAlleleFreq": 0.1524935,
                                                        "refHomGenotypeFreq": 0.08820986,
                                                        "study": "GNOMAD_GENOMES"}],
                             "reference": "A",
                             "repeat": [{"chromosome": "1",
                                         "copyNumber": 2.0,
                                         "end": 87112,
                                         "id": "9119",
                                         "percentageMatch": 0.992904,
                                         "source": "genomicSuperDup",
                                         "start": 10001},
                                        {"chromosome": "1",
                                         "copyNumber": 2.0,
                                         "end": 87112,
                                         "id": "14903",
                                         "percentageMatch": 0.995437,
                                         "source": "genomicSuperDup",
                                         "start": 18393}],
                             "start": 69511,
                             "traitAssociation": [{"additionalProperties": [{"name": "mutationSomaticStatus_in_source_file",
                                                                             "value": "Confirmed somatic variant"}],
                                                   "alleleOrigin": [],
                                                   "bibliography": [],
                                                   "ethnicity": "Z",
                                                   "genomicFeatures": [{"featureType": "gene",
                                                                        "xrefs": {"symbol": "OR4F5"}},
                                                                       {"featureType": "gene",
                                                                        "xrefs": {"symbol": "8301"}}],
                                                   "heritableTraits": [],
                                                   "id": "COSM4144171",
                                                   "somaticInformation": {"histologySubtype": "neoplasm",
                                                                          "primaryHistology": "other",
                                                                          "primarySite": "thyroid",
                                                                          "sampleSource": "",
                                                                          "tumourOrigin": ""},
                                                   "source": {"name": "cosmic"},
                                                   "submissions": []}],
                             "variantTraitAssociation": {"clinvar": [],
                                                         "cosmic": [{"geneName": "OR4F5",
                                                                     "histologySubtype": "neoplasm",
                                                                     "mutationId": "COSM4144171",
                                                                     "mutationSomaticStatus": "Confirmed somatic variant",
                                                                     "primaryHistology": "other",
                                                                     "primarySite": "thyroid",
                                                                     "sampleSource": "",
                                                                     "siteSubtype": "",
                                                                     "tumourOrigin": ""}]}}}

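The `samples` entries in the example above store their values positionally, so each `data` array appears to be read against the study's `sampleDataKeys`. The following Python sketch, which is not part of Biodata itself, pairs the two and rebuilds the `id` shown in the example from `chromosome`, `start`, `reference` and `alternate`. The helper names `variant_id` and `sample_records` are hypothetical, and the `chromosome:start:reference:alternate` pattern is only inferred from this one SNV; other variant types may use a different identifier.

```python
import json

# Trimmed copy of the example variant above; only the fields used here are kept.
VARIANT_JSON = """
{
  "chromosome": "1", "start": 69511, "reference": "A", "alternate": "G",
  "studies": [{
    "studyId": "demo@family:corpasome",
    "sampleDataKeys": ["GT", "AD", "DP", "GQ", "PL"],
    "samples": [
      {"sampleId": "", "fileIndex": 1, "data": ["1/1", "2,171", "173", "99", "2218,228,0"]},
      {"sampleId": "", "fileIndex": 1, "data": ["1/1", "0,33", "34", "60", "508,60,0"]}
    ]
  }]
}
"""


def variant_id(variant):
    """Build a 'chromosome:start:reference:alternate' identifier (assumed pattern)."""
    return "{chromosome}:{start}:{reference}:{alternate}".format(**variant)


def sample_records(study):
    """Pair each positional value in 'data' with its key from 'sampleDataKeys'."""
    keys = study["sampleDataKeys"]
    return [dict(zip(keys, sample["data"])) for sample in study["samples"]]


variant = json.loads(VARIANT_JSON)
records = sample_records(variant["studies"][0])

print(variant_id(variant))
print(records[0]["GT"], records[0]["DP"])
```

Keeping the keys once per study rather than once per sample avoids repeating the same field names for every sample, which matters when a study holds thousands of samples.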


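The `stats` object in the example can be cross-checked against the sample genotypes. Below is a minimal Python sketch, assuming that every non-zero allele index counts as an alternate allele and that missing alleles (`.`) are excluded from the totals. The function name `genotype_stats` is hypothetical, and Biodata's own calculation may differ, for instance when secondary alternates are present.

```python
import re
from collections import Counter


def genotype_stats(genotypes):
    """Recompute a few 'stats' fields from a list of VCF-style genotype strings."""
    genotype_count = Counter(genotypes)
    # Split on the unphased (/) and phased (|) separators; works for any ploidy.
    alleles = [a for gt in genotypes for a in re.split(r"[/|]", gt)]
    called = [a for a in alleles if a != "."]
    total = len(called)
    alt = sum(1 for a in called if a != "0")
    return {
        "alleleCount": total,
        "altAlleleCount": alt,
        "refAlleleCount": total - alt,
        "altAlleleFreq": alt / total if total else 0.0,
        "refAlleleFreq": (total - alt) / total if total else 0.0,
        "missingAlleleCount": len(alleles) - total,
        "genotypeCount": dict(genotype_count),
    }


# Genotypes (GT) of the four samples in the example above.
stats = genotype_stats(["1/1", "1/1", "1/1", "1/1"])
print(stats["alleleCount"], stats["altAlleleCount"], stats["altAlleleFreq"])
```

With four homozygous-alternate samples this gives 8 alleles, all alternate, which agrees with the `alleleCount`, `altAlleleCount` and `altAlleleFreq` values listed under `ALL` in the example.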
Implementation

The variant data model is implemented in the OpenCB Biodata project, which allows the rest of the OpenCB projects, such as CellBase and Oskar, to use it.

