To describe alignments together with additional information about the fragment and the read, we adopt the read alignment schema defined by the Global Alliance for Genomics and Health (GA4GH). A ReadAlignment object is equivalent to one alignment line in a SAM file. The header lines of a SAM file (i.e., the lines carrying program and chromosome information) are stored separately in a plain text file.
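As a rough illustration of that split, the sketch below separates header lines from alignment lines. It relies on the SAM convention that header lines start with `@`. The function name `split_sam` and the sample lines are hypothetical, chosen only for this example.

```python
from typing import Iterable, List, Tuple


def split_sam(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate SAM header lines (starting with '@') from alignment lines."""
    header: List[str] = []
    alignments: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            header.append(line)      # would go to the plain text header file
        else:
            alignments.append(line)  # each one maps to a ReadAlignment
    return header, alignments


# Illustrative input only, not taken from a real file.
sample = [
    "@SQ\tSN:1\tLN:249250621\n",
    "@PG\tID:bwa\tPN:bwa\n",
    "read1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\t????\n",
]
header, alignments = split_sam(sample)
```

Here `header` would be written to the text file, while each entry of `alignments` would be converted to one ReadAlignment object.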

Data serialization systems such as Avro, Thrift and Google Protocol Buffers can use the ReadAlignment schema to read and write alignments in a compact, fast binary format. In addition, ReadAlignment objects can be stored in the Parquet format and then processed by a number of different systems, such as Spark, Hive and Impala.

Example

This section shows an alignment in SAM format and the equivalent ReadAlignment.

Example of alignment in SAM format:

1_229454865_229455276_0:0:0_0:0:0_0    83    1    229455177    60    100M    =    229454865    -412    AGTGCTATTTGGATTCATCCCATATGGGCCCCATCTTGTGGTCTGAGGCCTGACAGGGCTCACCTGCAAGCTCGGTTCTCTGCTGTCTTTGATATGGACT    ????????????????????????????????????????????????????????????????????????????????????????????????????    NM:i:0    AS:i:100    XS:i:0
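
To make the correspondence between SAM columns and ReadAlignment fields more concrete, here is a minimal sketch that decodes the line above with the Python standard library. It is not the converter used to produce the object on this page; `parse_sam_line` and the keys of the returned dictionary are hypothetical names that only loosely follow the schema. It assumes Phred+33 qualities and maps only the `M` CIGAR operation, since that is the only one this page shows.

```python
import re
from typing import Any, Dict

# Only the mapping visible on this page; other CIGAR letters are kept as-is.
CIGAR_OPS = {"M": "ALIGNMENT_MATCH"}


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Decode the mandatory SAM columns and optional tags of one line."""
    # Real SAM files are tab-separated; the page rendering shows spaces.
    fields = line.split("\t") if "\t" in line else line.split()
    qname, flag, rname, pos, mapq, cigar, _rnext, _pnext, tlen, seq, qual = fields[:11]
    flag_bits = int(flag)
    return {
        "name": qname,
        "referenceName": rname,
        "position": int(pos) - 1,  # SAM POS is 1-based; store 0-based
        "strand": "NEG_STRAND" if flag_bits & 0x10 else "POS_STRAND",
        "mappingQuality": int(mapq),
        "cigar": [
            (CIGAR_OPS.get(op, op), int(length))
            for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        ],
        "fragmentLength": int(tlen),
        "alignedSequence": seq,
        "alignedQuality": [ord(c) - 33 for c in qual],  # Phred+33 assumed
        # "NM:i:0" becomes {"NM": ["i", "0"]}
        "info": {t.split(":", 2)[0]: t.split(":", 2)[1:] for t in fields[11:]},
    }


sam_line = "\t".join([
    "1_229454865_229455276_0:0:0_0:0:0_0", "83", "1", "229455177", "60",
    "100M", "=", "229454865", "-412",
    "AGTGCTATTTGGATTCATCCCATATGGGCCCCATCTTGTGGTCTGAGGCCTGACAGGGCTCACCTGCAAGCTCGGTTCTCTGCTGTCTTTGATATGGACT",
    "?" * 100,
    "NM:i:0", "AS:i:100", "XS:i:0",
])
parsed = parse_sam_line(sam_line)
```

With this input, the 1-based POS 229455177 becomes 229455176, flag bit 0x10 yields `NEG_STRAND`, and each `?` quality character decodes to 30.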

ReadAlignment object for the previous alignment (JSON format):

{
  "id" : {
    "string" : "1_229454865_229455276_0:0:0_0:0:0_0"
  },
  "readGroupId" : "no-group",
  "fragmentName" : "1",
  "improperPlacement" : {
    "boolean" : true
  },
  "duplicateFragment" : {
    "boolean" : false
  },
  "numberReads" : {
    "int" : 2
  },
  "fragmentLength" : {
    "int" : -412
  },
  "readNumber" : {
    "int" : 0
  },
  "failedVendorQualityChecks" : {
    "boolean" : false
  },
  "alignment" : {
    "org.ga4gh.models.LinearAlignment" : {
      "position" : {
        "referenceName" : "1",
        "position" : 229455176,
        "strand" : "NEG_STRAND"
      },
      "mappingQuality" : {
        "int" : 60
      },
      "cigar" : [ {
        "operation" : "ALIGNMENT_MATCH",
        "operationLength" : 100,
        "referenceSequence" : null
      } ]
    }
  },
  "secondaryAlignment" : {
    "boolean" : false
  },
  "supplementaryAlignment" : {
    "boolean" : false
  },
  "alignedSequence" : {
    "string" : "AGTGCTATTTGGATTCATCCCATATGGGCCCCATCTTGTGGTCTGAGGCCTGACAGGGCTCACCTGCAAGCTCGGTTCTCTGCTGTCTTTGATATGGACT"
  },
  "alignedQuality" : [ 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30 ],
  "nextMatePosition" : {
    "org.ga4gh.models.Position" : {
      "referenceName" : "1",
      "position" : 229454865,
      "strand" : "POS_STRAND"
    }
  },
  "info" : {
    "AS" : [ "i", "100" ],
    "XS" : [ "i", "0" ],
    "NM" : [ "i", "0" ]
  }
}
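
The wrappers such as `{"string": ...}`, `{"int": ...}` and `{"org.ga4gh.models.LinearAlignment": ...}` in the object above look like Avro's JSON encoding of union-typed fields, where a non-null value is wrapped in a single-key object naming its type. For downstream processing it can be convenient to strip them. The sketch below is one possible way to do so, under that assumption. `unwrap_unions` is a hypothetical helper, and its rule for spotting a wrapper is a heuristic that would misfire on a map whose only key happens to be a type name.

```python
import json
from typing import Any

# Avro primitive type names that can label a union branch.
PRIMITIVES = {"boolean", "int", "long", "float", "double", "bytes", "string"}


def unwrap_unions(value: Any) -> Any:
    """Recursively strip single-key union wrappers such as {"int": 2}.

    Heuristic: a one-key object counts as a wrapper when its key is an
    Avro primitive name or a dotted (namespaced) record name.
    """
    if isinstance(value, list):
        return [unwrap_unions(v) for v in value]
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if key in PRIMITIVES or "." in key:
                return unwrap_unions(inner)
        return {k: unwrap_unions(v) for k, v in value.items()}
    return value


# A trimmed copy of the object above, for illustration.
record = json.loads("""
{
  "id": {"string": "1_229454865_229455276_0:0:0_0:0:0_0"},
  "numberReads": {"int": 2},
  "alignment": {
    "org.ga4gh.models.LinearAlignment": {
      "position": {"referenceName": "1", "position": 229455176, "strand": "NEG_STRAND"},
      "mappingQuality": {"int": 60}
    }
  },
  "info": {"NM": ["i", "0"]}
}
""")
flat = unwrap_unions(record)
```

After unwrapping, `flat["numberReads"]` is the plain integer 2 and `flat["alignment"]["mappingQuality"]` is 60, while the `info` map keeps its structure because `NM` is not a type name.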

More information and references

See the Read Alignment page in the oskar space.


