Because OpenCGA wraps the Samtools and GATK packages, users can call variants directly from their alignment files, i.e., BAM files. This tutorial explains how to use the Samtools and GATK wrappers from the command line.
Prerequisites
A working OpenCGA installation is required. To set up a testing environment, please follow the steps in the installation guide.
In addition, you need to download the following data files:
Preparing the reference genome: index and dictionary files
To use the ref.fasta file, first link it to the OpenCGA catalog:
$ ./opencga.sh files link -i ~/ref.fasta --path call/ --parents
Once the FASTA file is linked, index it by running the samtools wrapper with the command faidx. The FASTA index file (ref.fasta.fai) is created in the same folder as the input FASTA file (ref.fasta):
You also need to create the sequence dictionary file for that FASTA file by running the samtools wrapper again, this time with the command dict. The sequence dictionary file (ref.dict) is created in the same folder as the input FASTA file (ref.fasta):
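The faidx and dict invocations are not reproduced here. By analogy with the index command shown later in this tutorial, they would presumably take the form sketched below. The helper name samtools_wrapper_cmd is hypothetical, and it is an assumption that faidx and dict accept the same --command and --input-file options as index; further options (for example an output file name) may be required, so check the wrapper's built-in help before running anything. The sketch only prints the command lines.

```shell
# Hypothetical helper: builds (and only prints) a samtools-wrapper command line.
# Assumption: faidx and dict take the same --command/--input-file options as
# the index example in this tutorial; additional options may be needed.
samtools_wrapper_cmd() {
    printf './opencga.sh alignments samtools --command %s --input-file %s\n' "$1" "$2"
}

samtools_wrapper_cmd faidx ref.fasta   # step that should produce ref.fasta.fai
samtools_wrapper_cmd dict ref.fasta    # step that should produce ref.dict
```

Printing rather than executing keeps the sketch safe to try; once the options are confirmed, the printed line can be pasted after the $ prompt.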
Once the BAM file is sorted, you can index it by running the samtools wrapper with the command index. The BAM index file (mother.sorted.bam.bai) is created:
$ ./opencga.sh alignments samtools --command index --input-file mother.sorted.bam
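The output names mentioned so far follow a fixed pattern: the FASTA index and the BAM index append a suffix to the input name, while the dictionary replaces the FASTA extension. A minimal sketch of deriving those names in a script, for instance to check that each step produced its file, could look as follows. The three helper names are hypothetical and not part of OpenCGA.

```shell
# Hypothetical helpers: derive the output file names described in this tutorial.
fasta_index_name() { printf '%s.fai\n' "$1"; }         # ref.fasta -> ref.fasta.fai
fasta_dict_name()  { printf '%s.dict\n' "${1%.*}"; }   # ref.fasta -> ref.dict
bam_index_name()   { printf '%s.bai\n' "$1"; }         # mother.sorted.bam -> mother.sorted.bam.bai

fasta_index_name ref.fasta
fasta_dict_name ref.fasta
bam_index_name mother.sorted.bam
```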
Variant calling
You can call variants by running the GATK wrapper with the command HaplotypeCaller:
The variant calls are saved in the VCF file mother.vcf, which can be downloaded from the OpenCGA catalog to the local directory /tmp by using the following command: