This tutorial details how to use the OpenCGA alignment command line to run the steps of the alignment/mapping pipeline. The pipeline takes raw sequence data in FastQ files and produces alignments in BAM files. BAM files can be used for further analysis, such as alignment statistics, coverage computation or variant calling.
Prerequisites
A working OpenCGA installation is required. To set up a testing environment, please follow the steps in the installation guide.
In addition, you need to download the following data files:
- Raw sequence data file: input.fastq
- Reference genome: reference.fasta
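Before linking the files, it can help to confirm that the FastQ file is structurally plausible. The following is a minimal sketch and not part of OpenCGA: the function name check_fastq is hypothetical, and it assumes an uncompressed file in the common four-lines-per-record layout (header starting with @, sequence, + separator, qualities), with no multi-line sequences.

```shell
# check_fastq FILE: hypothetical helper, not an OpenCGA command.
# Assumes uncompressed FastQ with exactly four lines per record.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }   # header line
        NR % 4 == 2 { seqlen = length($0) }                  # sequence line
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }   # separator line
        NR % 4 == 0 && length($0) != seqlen { bad = 1 }      # qualities must match sequence length
        END { if (NR == 0 || NR % 4 != 0 || bad) exit 1 }
    ' "$1"
}
```

For example, check_fastq ~/input.fastq && echo "looks like FastQ" prints the message only when every record passes these checks.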
The alignment pipeline
Quality control for raw sequence data: FastQC subcommand
In order to use the input.fastq file, it has to be linked to the OpenCGA catalog:
Code Block | ||||
---|---|---|---|---|
| ||||
$ ./opencga.sh files link -i ~/input.fastq --path test/ --parents |
Once the FastQ file is linked, you can run the FastQC command:
Code Block | ||||
---|---|---|---|---|
| ||||
$ ./opencga.sh alignments fastqc --file input.fastq |
For the input.fastq file, the FastQC command creates a report file called input_fastqc.html that can be downloaded from the OpenCGA catalog to the local directory /tmp by using the following command:
Code Block | ||||
---|---|---|---|---|
| ||||
$ ./opencga.sh files download --file input_fastqc.html --to /tmp |
Here is the FastQC report file: input_fastqc.html.
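When scripting this step for several FastQ files, the report name has to be derived from the input name. The sketch below mirrors the naming seen in this example (input.fastq gives input_fastqc.html). The helper fastqc_report_name is hypothetical, and the set of suffixes it strips is an assumption that may not cover every input FastQC accepts.

```shell
# fastqc_report_name FILE: hypothetical helper, not an OpenCGA command.
# Derives the report name as in the example: input.fastq -> input_fastqc.html.
# Assumption: only the .gz, .fastq and .fq suffixes are stripped.
fastqc_report_name() (
    name=$(basename "$1")
    name=${name%.gz}
    case "$name" in
        *.fastq) name=${name%.fastq} ;;
        *.fq)    name=${name%.fq} ;;
    esac
    printf '%s_fastqc.html\n' "$name"
)
```

The result can then be passed to the download command shown above, for instance with --file "$(fastqc_report_name input.fastq)".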
Mapping raw sequences: BWA subcommand
First, link the reference.fasta file to the OpenCGA catalog:
Code Block | ||||
---|---|---|---|---|
| ||||
$ ./opencga.sh files link -i ~/reference.fasta --path test/ --parents |
Then, you can run the bwa index command to index database sequences in the FASTA format:
Code Block | ||||
---|---|---|---|---|
| ||||
$ ./opencga.sh alignments bwa --command index --fasta-file reference.fasta |
Internally, the index that the bwa index command creates for the reference.fasta file consists of the following files:
Code Block | ||||
---|---|---|---|---|
| ||||
reference.fasta.bwt reference.fasta.pac reference.fasta.ann reference.fasta.amb reference.fasta.sa |
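If you have file-system access to the directory where the index was written, a quick check that all five files are present can save a failed mapping run. This is a minimal sketch: bwa_index_complete is a hypothetical helper, not an OpenCGA or BWA command, and it only tests for file existence, not for index validity.

```shell
# bwa_index_complete BASE: hypothetical helper.
# Succeeds only when all five index files listed above exist for BASE.
bwa_index_complete() {
    for ext in bwt pac ann amb sa; do
        [ -f "$1.$ext" ] || { echo "missing: $1.$ext" >&2; return 1; }
    done
}
```

For example, bwa_index_complete /path/to/reference.fasta returns a non-zero status and names the first missing file on standard error.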
Once the index is created, you can map the FastQ file by using the bwa mem command:
Code Block | ||||
---|---|---|---|---|
| ||||
$ ./opencga.sh alignments bwa --command mem --index-base-file reference.fasta --fastq1-file input.fastq --sam-file output.sam |
In the previous command, the resulting alignments are saved in SAM format in the output.sam file.
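As a sanity check on the mapping result, you can count mapped and unmapped records in a local copy of the SAM file. The sketch below assumes output.sam has already been downloaded from the catalog. The helper sam_mapped_counts is hypothetical and relies only on the SAM format itself: header lines start with @, and bit 0x4 of the FLAG column (the second field) marks an unmapped read. It counts alignment records, so a read with supplementary or secondary alignments is counted more than once.

```shell
# sam_mapped_counts FILE: hypothetical helper, prints "mapped unmapped".
# Skips header lines (starting with "@") and tests bit 0x4 of the FLAG field.
sam_mapped_counts() {
    awk -F '\t' '
        /^@/ { next }                              # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }  # bit 0x4 set: unmapped
        { mapped++ }
        END { printf "%d %d\n", mapped, unmapped }
    ' "$1"
}
```

A very high unmapped count usually suggests that the reference or the index does not match the sequenced sample.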
Converting to and sorting BAM files
....
Indexing and querying BAM files
....
Computing and querying BAM coverage
....
Computing and querying BAM statistics
....