The OpenCGA Alignment Engine provides a solution for storing and processing sequence alignment data from Next-Generation Sequencing (NGS) projects. The Alignment Engine supports the most common alignment file formats, namely SAM, BAM and CRAM, and takes the alignment data model specification from GA4GH and its implementation from OpenCB GA4GH. See the full description at Alignment Data Model.
We do not define or endorse any dedicated unaligned sequence data format. Instead, we recommend storing such data in one of the alignment formats (SAM, BAM or CRAM) with the unmapped flag set.
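To make the unmapped flag concrete: in the SAM specification the FLAG field is a bitwise integer, and bit 0x4 marks a read as unmapped. The following is a minimal sketch that tests that bit with shell arithmetic. The function name is_unmapped is hypothetical and is not part of OpenCGA.

```shell
#!/bin/sh
# is_unmapped FLAG: succeed (exit status 0) if the SAM FLAG value has
# bit 0x4 (segment unmapped) set.
# Hypothetical helper for illustration only; not part of OpenCGA.
is_unmapped() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# Sample FLAG values:
#   4  = unmapped
#   77 = paired (1) + unmapped (4) + mate unmapped (8) + first in pair (64)
#   99 = paired (1) + proper pair (2) + mate reverse (32) + first in pair (64)
for flag in 4 77 99; do
    if is_unmapped "$flag"; then
        echo "$flag: unmapped"
    else
        echo "$flag: mapped"
    fi
done
```

The same bit can be used to filter files directly; for instance, samtools view -f 4 keeps only reads whose FLAG has the unmapped bit set.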
The Alignment Engine supports the following operations on alignment files:
index
query
coverage
statistics
In addition, OpenCGA provides wrappers for the following third-party alignment software packages:
FastQC: a quality control tool for high throughput sequence data.
BWA: a software package for mapping low-divergent sequences against a large reference genome.
Samtools: a program for interacting with high-throughput sequencing data in SAM, BAM and CRAM formats.
deepTools: a suite of Python tools developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.
OpenCGA Alignment User Interfaces
OpenCGA provides two interfaces that let users run the alignment tools and analyses:
Command line interface
RESTful web services interface
OpenCGA command line interface
The OpenCGA command line interface for managing alignment data is available through the opencga.sh script, using the alignments command:
$ ./opencga.sh alignments
Usage: opencga.sh alignments <subcommand> [options]
Subcommands:
    index           Index alignment file
    query           Search over indexed alignments
    stats-run       Compute stats for a given alignment file
    stats-info      Retrieve stats for a given alignment file
    stats-query     Fetch alignment files according to their stats
    coverage-run    Compute coverage for a given alignment file
    coverage-query  Query the coverage of an alignment file for regions or genes
    coverage-ratio  Compute coverage ratio from file #1 vs file #2, (e.g. somatic vs germline)
    bwa             BWA is a software package for mapping low-divergent sequences against a large reference genome.
    samtools        Samtools is a program for interacting with high-throughput sequencing data in SAM, BAM and CRAM formats.
    deeptools       Deeptools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq.
    fastqc          A quality control tool for high throughput sequence data.
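When scripting around this interface, it can help to check the subcommand name before delegating to opencga.sh. The following is a minimal sketch under stated assumptions: the subcommand list is copied from the usage output above, while the function names and the OPENCGA_SH variable are hypothetical and not part of OpenCGA. No subcommand options are shown because they are not listed in this section.

```shell
#!/bin/sh
# Subcommand names copied from the usage output above.
ALIGNMENT_SUBCOMMANDS="index query stats-run stats-info stats-query coverage-run coverage-query coverage-ratio bwa samtools deeptools fastqc"

# is_alignment_subcommand NAME: succeed if NAME is one of the listed subcommands.
# Hypothetical helper, not part of OpenCGA.
is_alignment_subcommand() {
    for known in $ALIGNMENT_SUBCOMMANDS; do
        [ "$known" = "$1" ] && return 0
    done
    return 1
}

# run_alignment SUBCOMMAND [ARGS...]: validate the subcommand name, then
# pass all arguments through unchanged to the opencga.sh script.
# OPENCGA_SH is an assumed override for the script location; it defaults
# to ./opencga.sh as in the example above.
run_alignment() {
    if ! is_alignment_subcommand "$1"; then
        echo "unknown alignments subcommand: $1" >&2
        return 2
    fi
    "${OPENCGA_SH:-./opencga.sh}" alignments "$@"
}
```

For example, run_alignment index forwards to ./opencga.sh alignments index, and any further arguments are passed through as given. A misspelled name such as coverage is rejected before the script is invoked.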