Versions Compared
Key
- This line was added.
- This line was removed.
- Formatting was changed.
A gene is considered to be knocked out for a sample when there is a set of variants that disable each copy of a certain gene.
This analysis obtains the list of knocked out genes for each input sample.
A variant is considered to disable a gene depending on the biotype of the gene, and its annotated consequence type. In protein_coding genes, the consequence type must be any from the list of loss of function sequence ontology terms listed below. In genes with other biotypes, the consequence type is not checked. The variants must also match other filter quality criteria.
Loss of function consequence type:
frameshift_variant
inframe_deletion
inframe_insertion
start_lost
stop_gained
stop_lost
splice_acceptor_variant
splice_donor_variant
transcript_ablation
transcript_amplification
initiator_codon_variant
splice_region_variant
incomplete_terminal_codon_variant
There are multiple scenarios where we can ensure that a set of variants are affecting all copies of the gene, therefore, the gene is knocked out.
- Homozygous Alternate : The sample presents the same knock-out variant in both copies of the chromosome.
- Multi-allelic : The sample presents two different knock-out variants in the same position.
- Compound Heterozygous : The sample presents two different variants at a particular gene, one on each chromosome of a pair, each of them inherited from different parents.
- Structural Variation overlap : The sample presents a large structural deletion overlapping with a knock out variant.
Implementation
Implemented at opencga#1455.
Input
Parameters
Parameters can be grouped in three categories:
- Samples selection
- sample : List of samples to analyse. The analysis will produce a file for each sample.
- Genes selection
If no parameter is provided to select genes, the analysis will consider all protein_coding genes.- gene : List of genes of interest. This parameter can be combined with panel, but not with biotypeIn combination with parameter panel, all genes will be used.
- panel : List of panels of interest. This parameter can be combined with gene, but not with biotype In combination with parameter gene, all genes will be used.
- biotype : List of biotypes. This parameter can not be combined with gene or panelUsed to filter by transcripts biotype.
- Variants filter
- consequenceType : Consequence type as a list of SequenceOntology terms. This filter is only applied on protein_coding genes. By default filters by lodd loss of function consequence types.
- filter : List of accepted terms from the FILTER column from the VCF file.
- qual : Filter by QUAL column from the files.
Output
The analysis will produce one JSON file per sample with all knocked-out genes for that sample, one JSON file per gene with all samples with that gene knocked-out, and one summary JSON file with aggregated information.
Files
knockout
_genes.summary.json
See GeneKnockoutSummary KnockoutByGene.java
....
knockout
_genes.gene.{gene}.json
gene
id
name
chromosome
start
end
biotypes
transcripts[]
id
chromosome
start
end
....
stats
numIndividuals
numSamples
- byType : Map<KnockoutType, long>
individuals[]
id
disorders
phenotypes
samples[]
id
transcripts[] : KnockoutTranscript
id
chromosome
start
end
biotype
variants[] : KnockoutVariant
id
genotype
filter
qual
knockoutType : [HOM_ALT, COMP_HET, MULTI_ALLELIC, DELETION_OVERLAP]
sequenceOntologyTerms[]
knockout.sample.{individual}.{sample}.json
See GeneKnockoutBySample KnockoutBySample.java
individual
id
....
sample
id
phenotypes
...
stats
multiAllelicCountnumGenes
compHetCount
deletionOverlapCountnumTranscripts
- byType : Map<KnockoutType, long>
genes[] : KnockoutGene
id
name
biotype
status
chromosome
start
end
strand
source
description
transcripts[] : KnockoutTranscript
id
homAltVariants[]
multiAllelicVariants[]
compHetVariants[]
deletionOverlapVariants
chromosome
start
end
biotype
variants[] : KnockoutVariant
id
genotype
filter
qual
knockoutType : [HOM_ALT, COMP_HET, MULTI_ALLELIC, DELETION_OVERLAP]
sequenceOntologyTerms[]
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
{ "sample" : "NA19600", "genesCount" : 5, "transcriptsCount" : 15, "countByType" : { "homAltCount" : 15, "multiAllelicCount" : 2, "compHetCount" : 0, "deletionOverlapCount" : 0 }, "genes" : [ { "id" : "ENSG00000186470", "name" : "BTN3A2", "biotypetranscripts" : "protein_coding",[ { "transcriptsid" : [ {"ENST00000377708", "idbiotype" : "ENST00000377708protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000508906", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000356386", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000527422", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000396948", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000527639", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000396934", "biotype" : "protein_coding", "homAltVariants" : [ "6:26370605:T:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] } ] }, { "id" : "ENSG00000198919", "name" : "DZIP3", "biotypetranscripts" : "protein_coding",[ { "transcriptsid" : [ {"ENST00000361582", "idbiotype" : "ENST00000361582protein_coding", "homAltVariants" : [ "3:108634973:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000463306", "biotype" : "protein_coding", "homAltVariants" : [ "3:108634973:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000479138", "biotype" : "protein_coding", "homAltVariants" : [ "3:108634973:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] } ] }, { "id" : "ENSG00000215182", "name" : "MUC5AC", "biotypetranscripts" : "protein_coding", [ { "transcriptsid" : ["ENST00000621226", { "idbiotype" : "ENST00000621226protein_coding", "homAltVariants" : [ ], "multiAllelicVariants" : [ "11:1158073:T:C", "11:1158073:T:-" ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] } ] }, { "id" : "ENSG00000147874", "name" : "HAUS6", "biotypetranscripts" : "protein_coding", [ { "transcriptsid" : ["ENST00000380496", { "idbiotype" : "ENST00000380496protein_coding", "homAltVariants" : [ "9:19058483:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000380502", "biotype" : "protein_coding", "homAltVariants" : [ "9:19058483:C:A" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] } ] }, { "id" : "ENSG00000099937", "name" : "SERPIND1", "biotypetranscripts" : "protein_coding", [ { "transcriptsid" : ["ENST00000215727", { "idbiotype" : "ENST00000215727protein_coding", "homAltVariants" : [ "22:20780030:-:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] }, { "id" : "ENST00000406799", "biotype" : "protein_coding", "homAltVariants" : [ "22:20780030:-:C" ], "multiAllelicVariants" : [ ], "compHetVariants" : [ ], "deletionOverlapVariants" : [ ] } ] } ] } |
Table of Contents:
Table of Contents | ||
---|---|---|
|