Overview

CellBase uses its integrated data to implement a rich, high-performance variant annotator. The variant annotation tool is built into the CellBase code and can be accessed in several different ways, including the command-line interface (CLI).

The typical input for the CellBase variant annotator is a VCF file, although the CLI also accepts a short list of variants passed explicitly as an argument for fast annotation. The annotator can currently generate two output formats: a .json file with a list of VariantAnnotation objects (see the Variant and VariantAnnotation models at https://github.com/opencb/biodata/tree/develop/biodata-models/src/main/resources/avro), or a tab-separated values file with VEP-formatted output.
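As a rough illustration of consuming the .json output, the sketch below tallies Sequence Ontology term names across annotation records. It rests on assumptions not stated above: that the file holds one JSON object per line, and that each object exposes a consequenceTypes list whose entries carry sequenceOntologyTerms items with a name field. Check these field names against the Avro models linked above before relying on them. The sample records are hypothetical.

```python
import json
from collections import Counter


def count_so_terms(lines):
    """Count SO term names across annotation records.

    Assumption: one JSON object per line, each with a 'consequenceTypes'
    list whose items hold 'sequenceOntologyTerms' entries with a 'name'.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing or null fields are treated as empty lists.
        for consequence in record.get("consequenceTypes") or []:
            for term in consequence.get("sequenceOntologyTerms") or []:
                name = term.get("name")
                if name:
                    counts[name] += 1
    return counts


# Hypothetical records, for illustration only.
sample = [
    '{"chromosome": "1", "start": 12345, "reference": "A", "alternate": "G",'
    ' "consequenceTypes": [{"sequenceOntologyTerms": [{"name": "missense_variant"}]},'
    ' {"sequenceOntologyTerms": [{"name": "intron_variant"}]}]}',
    '{"chromosome": "2", "start": 67890, "reference": "C", "alternate": "T",'
    ' "consequenceTypes": [{"sequenceOntologyTerms": [{"name": "intron_variant"}]}]}',
    '{"chromosome": "3", "start": 111, "reference": "G", "alternate": "A"}',
]

so_counts = count_so_terms(sample)
for name, n in sorted(so_counts.items()):
    print(name, n)
```

To run this against a real output file, pass an open file handle instead of the sample list, since the function only needs an iterable of lines.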

Data sources

Data provided by the variant annotator is the result of integrating most of the annotations available in the CellBase knowledge base: Ensembl's core transcript annotation, such as location, id, strand, biotype, etc.; protein annotation provided by UniProt, InterPro, SIFT and PolyPhen; population frequencies provided by the European Variation Archive for the 1000 Genomes Project Phase 3, the NHLBI Exome Sequencing Project (ESP, served by the Exome Variant Server, EVS), the Exome Aggregation Consortium v3 (ExAC), gnomAD exomes, gnomAD genomes and the Genome of the Netherlands (GoNL); sequence conservation from PhastCons and PhyloP; gene expression values from the Gene Expression Atlas; gene-drug interaction data from the Drug Gene Interaction Database (DGIdb); gene-phenotype data from the Human Phenotype Ontology (HPO); and clinical variant annotation from ClinVar. Sequence effect predictions are also calculated on the fly and described by Sequence Ontology (SO) terms. We are constantly working to integrate new data sources into the knowledge base.

Benchmark

An exhaustive comparison of sequence effect predictions was made against VEP (83) results for the whole 1000 Genomes Phase 3 variant set (83 million variants, 346 million effect predictions), yielding 99.999% concordance with Ensembl VEP consequence types.

Custom annotations

CellBase variant annotations can be complemented with custom annotations provided by the user. The variant annotation CLI allows users to supply a VCF file that carries the custom annotation in its INFO column.
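To make the shape of such a file concrete, the sketch below parses the INFO column of a VCF data line into a dictionary and picks out the fields of interest. It follows the general VCF convention (semicolon-separated entries, key=value pairs with comma-separated values, bare keys as flags) and does not reproduce how CellBase itself reads the file. The function names and the INFO keys MY_SCORE and MY_FLAG are hypothetical.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict.

    key=value entries map to a list of string values (split on commas);
    bare keys (flags) map to True. An empty or '.' INFO yields {}.
    """
    result = {}
    if info in ("", "."):
        return result
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        result[key] = value.split(",") if sep else True
    return result


def custom_annotation(vcf_line, wanted):
    """Return ((chrom, pos, ref, alt), {key: value}) for the wanted INFO keys.

    Assumes a tab-separated VCF data line with at least the 8 fixed columns.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    parsed = parse_info(info)
    selected = {key: parsed[key] for key in wanted if key in parsed}
    return (chrom, int(pos), ref, alt), selected


# Hypothetical data line; MY_SCORE and MY_FLAG are made-up INFO keys.
line = "1\t12345\t.\tA\tG\t.\tPASS\tMY_SCORE=0.87;MY_FLAG;DP=100\n"
variant, annotation = custom_annotation(line, ["MY_SCORE", "MY_FLAG"])
print(variant, annotation)
```

Values are kept as strings here because the types of INFO fields are declared in the VCF header, which this sketch does not read.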

How to annotate variants

Please refer to the VCF and Variant Annotation tutorial.