Because OpenCGA wraps the Samtools and GATK packages, users can call variants directly from their alignment (BAM) files. This tutorial shows how to run the Samtools and GATK wrappers from the command line.
A working OpenCGA setup is required. To set up a testing environment, please follow the steps in the installation guide.
In addition, you need to download the following data files:
Before calling variants, you need to prepare your reference genome (FASTA) file and your input alignment (BAM) file. For the FASTA file, this means creating an index and a sequence dictionary file; for the BAM file, it means creating an index. The following sections show the OpenCGA command lines for each step.
Preparing the reference genome: index and dictionary files
To use the ref.fasta file, you first have to link it to the OpenCGA catalog:
$ ./opencga.sh files link -i ~/ref.fasta --path call/ --parents
Once the FASTA file is linked, you need to index it by running the Samtools wrapper with the command faidx:
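As a minimal sketch, the faidx call can be modeled on the samtools-run index call used for the BAM file in this tutorial. The flags --command and --input-file are taken from that call; that faidx accepts them in the same way is an assumption, not something confirmed here. The snippet only prints the command so it can be reviewed first:

```shell
# Assumption: faidx takes the same flags (--command, --input-file)
# as the samtools-run index call shown for the BAM file in this tutorial.
faidx_cmd="./opencga.sh alignments samtools-run --command faidx --input-file ref.fasta"

# Dry run: print the command instead of executing it.
echo "$faidx_cmd"
```

To run it against a real OpenCGA installation, execute the printed line from the OpenCGA bin directory.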
The FASTA index file (ref.fasta.fai) will be created in the folder of the input FASTA file (ref.fasta). You can retrieve information about the created FASTA index file by running the following command:
$ ./opencga.sh files info --file ref.fasta.fai
You also need to create the sequence dictionary file for that FASTA file by running the Samtools wrapper again, this time with the command dict. The sequence dictionary output file (ref.dict) will be created in the folder of the input FASTA file (ref.fasta):
Once the BAM file is sorted, you can index it by running the Samtools wrapper with the command index. The index file (mother.sorted.bam.bai) will be created in the folder of the input BAM file (mother.sorted.bam):
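A hedged sketch of the dict call, again modeled on the samtools-run index call shown in this tutorial. Only the flags visible in that call are used; the real wrapper may need a further option to name the output file (ref.dict), which is not shown because it is not confirmed by this tutorial. The snippet prints the command rather than running it:

```shell
# Assumption: dict takes --input-file like the index call in this tutorial.
# Any option for naming the output file (ref.dict) is deliberately left out.
dict_cmd="./opencga.sh alignments samtools-run --command dict --input-file ref.fasta"

# Dry run: print the command instead of executing it.
echo "$dict_cmd"
```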
$ ./opencga.sh alignments samtools-run --command index --input-file mother.sorted.bam
You can call variants by running the GATK wrapper with the command HaplotypeCaller: