Annotate

As part of the enrichment step, extra information can be added to the variants database as annotations. This VariantAnnotation can be fetched from CellBase or read from a local file provided by the user. The variant annotation model is defined in the Biodata project, in variantAnnotation.avdl.

Annotators

The Variant Storage Engine can use different annotators to produce the annotation for the variants.

The annotator can be selected at the annotation step. The default value is defined in the storage-configuration.yml file:

  • annotator: "cellbase_rest"

WARN Before version v1.3.0, the parameter "annotationSource" should be used instead of "annotator". See OpenCGA#747.

CellBase Annotator

CellBase Variant Annotation

PENDING

CellBase REST Annotator

This is the default annotator for OpenCGA. It connects to a CellBase installation through the REST API.

This is an example of a CellBase annotation request using a REST call:

http://bioinfo.hpc.cam.ac.uk/cellbase/webservices/rest/v4/hsapiens/genomic/variant/19:44934489:G:A/annotation?exclude=expression
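
The URL above can be broken into its parts: host, API version, species, variant ID and the list of excluded fields. The following is a minimal Python sketch that only assembles such a URL, without sending any request. The function name build_annotation_url and its parameters are hypothetical and not part of OpenCGA or CellBase. Joining several excluded fields with commas is an assumption based on the format of the "annotator.cellbase.exclude" value.

```python
from urllib.parse import quote, urlencode


def build_annotation_url(host, version, species, variant_id, exclude=()):
    """Assemble a CellBase variant annotation URL (hypothetical helper)."""
    # Keep ':' unescaped so the variant ID stays in chrom:pos:ref:alt form.
    path = "/".join([
        host.rstrip("/"), "webservices", "rest", version, species,
        "genomic", "variant", quote(variant_id, safe=":"), "annotation",
    ])
    if exclude:
        # Assumption: several excluded fields travel as one comma-separated value.
        return path + "?" + urlencode({"exclude": ",".join(exclude)}, safe=",")
    return path


url = build_annotation_url(
    "http://bioinfo.hpc.cam.ac.uk/cellbase", "v4", "hsapiens",
    "19:44934489:G:A", exclude=["expression"],
)
print(url)
```

With these arguments the sketch reproduces the example URL shown above.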

CellBase Direct Annotator

The CellBaseDirectAnnotator connects directly to the CellBase database. This requires a local installation of CellBase, which consumes some resources, but it speeds up the annotation step by removing the network time.

Configuration

PENDING

  • annotator.cellbase.exclude: "expression,hgvs,repeats,cytoband"
  • annotator.cellbase.use_cache: true
  • annotator.cellbase.imprecise_variants: false  # Imprecise variants supported by cellbase (REST only)

Custom annotator

PENDING

Custom annotation

The VariantAnnotation model includes a field for adding extra annotation attributes. This field is intended to contain custom annotation provided by the end user.

Additional attributes can be grouped by source. Each source will contain a set of key-value attributes creating this structure:

Result
VariantAnnotation = {
  // ... 
  "additionalAttributes" : {
    "<source1>" : {
      "attribute" : {
        "<key1>":"<value>",
        "<key2>":"<value>",
        "<key3>":"<value>"
      }
    },
    "<source2>" : {
      "attribute" : {
        "<key1>":"<value>",
        "<key2>":"<value>",
        "<key3>":"<value>"
      }
    }
  }
}

OpenCGA Storage can load this custom annotation from three different formats: GFF, BED and VCF. When loading the new annotation data, the user has to provide a name for the new custom annotation. Because the structure of these file formats is slightly different, the information loaded differs for each format.

GFF and BED files describe features within a region, providing a chromosome, start and end. All the variants between the start and end will be annotated with the information.
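
The region-based behaviour can be pictured as a containment check between each variant position and each feature. The following is a minimal Python sketch of that idea, not the OpenCGA implementation. The Region and Variant types and the attributes_for function are hypothetical, and treating both start and end as inclusive is an assumption (GFF and BED use different coordinate conventions, which this sketch ignores).

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    attributes: Dict[str, str]


class Variant(NamedTuple):
    chrom: str
    pos: int


def attributes_for(variant: Variant, regions: List[Region]) -> List[Dict[str, str]]:
    """Return the attributes of every region that contains the variant."""
    # Assumption: start and end are both inclusive.
    return [
        region.attributes
        for region in regions
        if region.chrom == variant.chrom
        and region.start <= variant.pos <= region.end
    ]


regions = [Region("chr22", 16053659, 16063659, {"feature": "enhancer"})]
inside = attributes_for(Variant("chr22", 16060000), regions)
outside = attributes_for(Variant("chr22", 16050075), regions)
print(inside)   # [{'feature': 'enhancer'}]
print(outside)  # []
```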

  • GFF : From this file format, only the third column, which contains the feature, is extracted and loaded with the key "feature".
    The following GFF line generates these additionalAttributes:

    GFF
    chr22 TeleGene enhancer 16053659 16063659 500 + . touch1
    Result
    VariantAnnotation = {
      // ... 
      "additionalAttributes" : {
        "myGff" : {
          "attribute" : {
            "feature" : "enhancer"
          }
        }
      }
    }
    
    
  • BED : From the bed format, columns name (4th), score (5th) and strand (6th) will be loaded.

    The following BED line generates these additionalAttributes:

    BED
    chr22 16053659 16063659 Pos1 353 + 127471196 127472363 255,0,0 0 A A
    Result
    VariantAnnotation = {
      // ... 
      "additionalAttributes" : {
        "myBed" : {
          "attribute" : {
            "name":"Pos1",
            "score":"353",
            "strand":"+"
          }
        }
      }
    }
    
  • VCF : This format is not region-based, so each line modifies a single variant. All the fields of the INFO column will be loaded as additional attributes.

    The following VCF generates these additionalAttributes:

    VCF
    ##fileformat=VCFv4.2
    ##FILTER=<ID=PASS,Description="All filters passed">
    ##INFO=<ID=FEATURE,Number=1,Type=String,Description="Feature type">
    ##INFO=<ID=SCORE,Number=1,Type=Integer,Description="Score value">
    ##INFO=<ID=STRAND,Number=1,Type=String,Description="Strand">
    #CHROM POS    ID REF    ALT    QUAL   FILTER INFO
    chr22 16050075 . A G 100 PASS FEATURE=specific;SCORE=300;STRAND=+
    Result
    VariantAnnotation = {
      // ... 
      "additionalAttributes" : {
        "myVcf" : {
          "attribute" : {
            "FEATURE":"specific",
            "SCORE":"300",
            "STRAND":"+"
          }
        }
      }
    }
    
  • Example with multiple sources: When custom annotations are loaded from more than one source, each source appears as a separate entry in the additionalAttributes field:
    Result
    VariantAnnotation = {
      // ... 
      "additionalAttributes" : {
        "myVcf" : {
          "attribute" : {
            "FEATURE":"specific",
            "SCORE":"300",
            "STRAND":"+"
          }
        },
        "myBed" : {
          "attribute" : {
            "name":"Pos1",
            "score":"353",
            "strand":"+"
          }
        }
      }
    }
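
The column mappings described above, and the way several sources end up side by side, can be sketched in Python as follows. This is only an illustration of the resulting structure, not the OpenCGA loader. All function names are hypothetical. Splitting on any whitespace is an assumption that matches the examples as printed here (real files are normally tab-separated), and storing INFO flags without a value as an empty string is a choice of this sketch.

```python
import json
from typing import Dict


def gff_attributes(line: str) -> Dict[str, str]:
    """Keep only the third GFF column (the feature type)."""
    return {"feature": line.split()[2]}


def bed_attributes(line: str) -> Dict[str, str]:
    """Keep the BED name (4th), score (5th) and strand (6th) columns."""
    columns = line.split()
    return {"name": columns[3], "score": columns[4], "strand": columns[5]}


def vcf_info_attributes(line: str) -> Dict[str, str]:
    """Turn every entry of the INFO column (8th) into a string attribute."""
    attributes = {}
    for entry in line.split()[7].split(";"):
        # Entries without '=' (flags) get an empty value in this sketch.
        key, _, value = entry.partition("=")
        attributes[key] = value
    return attributes


def add_source(annotation: dict, source: str, attributes: Dict[str, str]) -> dict:
    """Store the attributes under additionalAttributes.<source>.attribute."""
    additional = annotation.setdefault("additionalAttributes", {})
    additional[source] = {"attribute": dict(attributes)}
    return annotation


annotation = {}
add_source(annotation, "myVcf", vcf_info_attributes(
    "chr22 16050075 . A G 100 PASS FEATURE=specific;SCORE=300;STRAND=+"))
add_source(annotation, "myBed", bed_attributes(
    "chr22 16053659 16063659 Pos1 353 + 127471196 127472363 255,0,0"))
print(json.dumps(annotation, indent=2))
```

Each call to add_source with a new name adds one more entry, which gives the same nesting as the multiple-source example.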
