A gene is considered knocked out in a sample when a set of variants disables every copy of that gene.
This analysis obtains the list of knocked-out genes for each input sample.
Whether a variant disables a gene depends on the biotype of the gene and on the variant's annotated consequence type. In protein_coding genes, the consequence type must be one of the loss-of-function Sequence Ontology terms listed below. In genes with other biotypes, the consequence type is not checked. The variants must also pass the other quality filter criteria.
Loss-of-function consequence types:
frameshift_variant
inframe_deletion
inframe_insertion
start_lost
stop_gained
stop_lost
splice_acceptor_variant
splice_donor_variant
transcript_ablation
transcript_amplification
initiator_codon_variant
splice_region_variant
incomplete_terminal_codon_variant
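The rule described above can be sketched in a few lines of Python. This is only an illustration of the biotype and consequence-type check, not the OpenCGA implementation: the function name `disables_gene` and its signature are hypothetical, and the additional quality filters are deliberately left out.

```python
from typing import Iterable

# Loss-of-function Sequence Ontology terms, copied from the list above.
LOF_TERMS = frozenset({
    "frameshift_variant",
    "inframe_deletion",
    "inframe_insertion",
    "start_lost",
    "stop_gained",
    "stop_lost",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "transcript_ablation",
    "transcript_amplification",
    "initiator_codon_variant",
    "splice_region_variant",
    "incomplete_terminal_codon_variant",
})


def disables_gene(biotype: str, consequence_types: Iterable[str]) -> bool:
    """Hypothetical helper: apply the biotype / consequence-type rule.

    Quality filters mentioned in the text are NOT modelled here.
    """
    if biotype != "protein_coding":
        # For other biotypes the consequence type is not checked.
        return True
    # protein_coding: at least one annotated term must be loss of function.
    return any(term in LOF_TERMS for term in consequence_types)
```

For example, `disables_gene("protein_coding", ["missense_variant"])` is false under this sketch, while the same call with `["stop_gained"]` is true.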
There are multiple scenarios in which a set of variants is known to affect all copies of the gene, so that the gene is knocked out.
Implemented at opencga#1455.
Parameters can be grouped in three categories:
The analysis produces one JSON file per sample listing all knocked-out genes for that sample, one JSON file per gene listing all samples in which that gene is knocked out, and one summary JSON file with aggregated information.
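The per-gene files are conceptually the per-sample results turned around. A minimal sketch of that pivot follows, assuming the per-sample results are already available as a mapping from sample id to gene ids. The function name `pivot_by_gene` and the ids in the usage note are hypothetical, and the actual file names and layout written by the analysis are not modelled.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def pivot_by_gene(knockouts_by_sample: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Turn {sample id: knocked-out gene ids} into {gene id: sample ids}."""
    by_gene = defaultdict(set)
    for sample_id, gene_ids in knockouts_by_sample.items():
        for gene_id in gene_ids:
            by_gene[gene_id].add(sample_id)
    # Sort genes and samples so the result is deterministic.
    return {gene_id: sorted(samples) for gene_id, samples in sorted(by_gene.items())}
```

With hypothetical input `{"S1": ["GENE_A", "GENE_B"], "S2": ["GENE_A"]}`, the result maps `GENE_A` to both samples and `GENE_B` to `S1` only.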
....
gene
  id
  name
  chromosome
  start
  end
  biotypes
  transcripts[]
    id
    chromosome
    start
    end
  ....
stats
  numIndividuals
  numSamples
individuals[]
  id
  disorders
  phenotypes
  samples[]
    id
    transcripts[] : KnockoutTranscript
      id
      chromosome
      start
      end
      biotype
      variants[] : KnockoutVariant
        id
        genotype
        filter
        qual
        knockoutType : [HOM_ALT, COMP_HET, MULTI_ALLELIC, DELETION_OVERLAP]
        sequenceOntologyTerms[]
individual
  id
  ....
sample
  id
  phenotypes
  ...
stats
  numGenes
  numTranscripts
genes[] : KnockoutGene
  id
  name
  biotype
  status
  chromosome
  start
  end
  strand
  source
  description
  transcripts[] : KnockoutTranscript
    id
    chromosome
    start
    end
    biotype
    variants[] : KnockoutVariant
      id
      genotype
      filter
      qual
      knockoutType : [HOM_ALT, COMP_HET, MULTI_ALLELIC, DELETION_OVERLAP]
      sequenceOntologyTerms[]
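The transcript and variant part of this model can be mirrored with Python dataclasses, which is sometimes convenient when post-processing the output. This is a sketch only: the field names come from the listing above, but the Python types are assumptions, and the genotype, filter, qual, and coordinate values in the usage lines are placeholders rather than real data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class KnockoutType(Enum):
    HOM_ALT = "HOM_ALT"
    COMP_HET = "COMP_HET"
    MULTI_ALLELIC = "MULTI_ALLELIC"
    DELETION_OVERLAP = "DELETION_OVERLAP"


@dataclass
class KnockoutVariant:
    id: str
    genotype: str   # assumed to be a string such as "1/1"
    filter: str     # assumed to be the VCF FILTER value
    qual: str       # assumed to be kept as text
    knockoutType: KnockoutType
    sequenceOntologyTerms: List[str] = field(default_factory=list)


@dataclass
class KnockoutTranscript:
    id: str
    chromosome: str
    start: int
    end: int
    biotype: str
    variants: List[KnockoutVariant] = field(default_factory=list)

    def knockout_types(self) -> Set[KnockoutType]:
        """Distinct knockout types seen among this transcript's variants."""
        return {variant.knockoutType for variant in self.variants}


# Usage with placeholder values (coordinates and genotype are not real data).
variant = KnockoutVariant(
    id="6:26370605:T:C", genotype="1/1", filter="PASS", qual="100",
    knockoutType=KnockoutType.HOM_ALT, sequenceOntologyTerms=["stop_gained"],
)
transcript = KnockoutTranscript(
    id="ENST00000377708", chromosome="6", start=1, end=2,
    biotype="protein_coding", variants=[variant],
)
```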
{
  "sample" : "NA19600",
  "genesCount" : 5,
  "transcriptsCount" : 15,
  "countByType" : {
    "homAltCount" : 15,
    "multiAllelicCount" : 2,
    "compHetCount" : 0,
    "deletionOverlapCount" : 0
  },
  "genes" : [ {
    "id" : "ENSG00000186470",
    "name" : "BTN3A2",
    "transcripts" : [
      { "id" : "ENST00000377708", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000508906", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000356386", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000527422", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000396948", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000527639", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000396934", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }
    ]
  }, {
    "id" : "ENSG00000198919",
    "name" : "DZIP3",
    "transcripts" : [
      { "id" : "ENST00000361582", "biotype" : "protein_coding", "homAltVariants" : [ "3:108634973:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000463306", "biotype" : "protein_coding", "homAltVariants" : [ "3:108634973:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000479138", "biotype" : "protein_coding", "homAltVariants" : [ "3:108634973:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }
    ]
  }, {
    "id" : "ENSG00000215182",
    "name" : "MUC5AC",
    "transcripts" : [
      { "id" : "ENST00000621226", "biotype" : "protein_coding", "homAltVariants" : [ ], "multiAllelicVariants" : [ "11:1158073:T:C", "11:1158073:T:-" ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }
    ]
  }, {
    "id" : "ENSG00000147874",
    "name" : "HAUS6",
    "transcripts" : [
      { "id" : "ENST00000380496", "biotype" : "protein_coding", "homAltVariants" : [ "9:19058483:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000380502", "biotype" : "protein_coding", "homAltVariants" : [ "9:19058483:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }
    ]
  }, {
    "id" : "ENSG00000099937",
    "name" : "SERPIND1",
    "transcripts" : [
      { "id" : "ENST00000215727", "biotype" : "protein_coding", "homAltVariants" : [ "22:20780030:-:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] },
      { "id" : "ENST00000406799", "biotype" : "protein_coding", "homAltVariants" : [ "22:20780030:-:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }
    ]
  } ]
}
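A per-sample file shaped like the NA19600 example can be condensed to one entry per gene, since the same variant is usually repeated across several transcripts. The sketch below uses a shortened excerpt of that example (two genes only) and relies only on the key names that appear in it. The helper name `variants_by_gene` is hypothetical, and other versions of the output may use different keys.

```python
import json

# Shortened excerpt of the NA19600 example (two of the five genes).
EXCERPT = """
{ "sample": "NA19600",
  "genes": [
    { "id": "ENSG00000215182", "name": "MUC5AC", "transcripts": [
      { "id": "ENST00000621226", "biotype": "protein_coding",
        "homAltVariants": [],
        "multiAllelicVariants": ["11:1158073:T:C", "11:1158073:T:-"],
        "compHetVariants": [], "deletionOverlapVariants": [] } ] },
    { "id": "ENSG00000147874", "name": "HAUS6", "transcripts": [
      { "id": "ENST00000380496", "biotype": "protein_coding",
        "homAltVariants": ["9:19058483:C:A"], "multiAllelicVariants": [],
        "compHetVariants": [], "deletionOverlapVariants": [] },
      { "id": "ENST00000380502", "biotype": "protein_coding",
        "homAltVariants": ["9:19058483:C:A"], "multiAllelicVariants": [],
        "compHetVariants": [], "deletionOverlapVariants": [] } ] } ] }
"""

VARIANT_KEYS = (
    "homAltVariants",
    "multiAllelicVariants",
    "compHetVariants",
    "deletionOverlapVariants",
)


def variants_by_gene(report):
    """Collect the distinct variants per gene name, grouped by variant key."""
    result = {}
    for gene in report["genes"]:
        per_key = {}
        for transcript in gene["transcripts"]:
            for key in VARIANT_KEYS:
                for variant_id in transcript.get(key, []):
                    per_key.setdefault(key, set()).add(variant_id)
        # Empty categories are omitted; variant ids are sorted for stability.
        result[gene["name"]] = {key: sorted(ids) for key, ids in per_key.items()}
    return result


summary = variants_by_gene(json.loads(EXCERPT))
```

Here `summary["HAUS6"]` holds a single `homAltVariants` entry even though the variant is reported on two transcripts.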