Data serialization systems such as Avro, Thrift and Google Protocol Buffers can use the ReadAlignment schema to read and write alignments in a compact, fast, binary format. In addition, ReadAlignment objects can be stored in the Parquet columnar format and then processed by a number of different systems, such as Spark, Hive and Impala.
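The following sketch hints at why the binary encoding is compact. It hand-encodes the fragmentLength value from the example in the next section (-412) using the rules of the Avro specification: an int or long is zigzag-mapped and written as a variable-length integer, and a union value is written as its zero-based branch index followed by the value. It assumes that fragmentLength is declared as the union ["null", "int"], so the int branch has index 1. The helper names are made up for this illustration; real applications would use an Avro library rather than code like this.

```python
from typing import Optional


def zigzag(n: int) -> int:
    """Zigzag-map a signed value (64-bit rule, also correct for 32-bit ints)
    so that values of small magnitude get small unsigned codes."""
    return (n << 1) ^ (n >> 63)


def encode_varint(u: int) -> bytes:
    """Write an unsigned value 7 bits per byte, low-order group first.
    The high bit of each byte says whether another byte follows."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_long(n: int) -> bytes:
    """Avro int/long binary encoding: zigzag, then varint."""
    return encode_varint(zigzag(n))


def encode_nullable_int(value: Optional[int]) -> bytes:
    """Union ["null", "int"] (assumed branch order): index, then payload."""
    if value is None:
        return encode_long(0)                   # branch 0 = null, no payload
    return encode_long(1) + encode_long(value)  # branch 1 = int


if __name__ == "__main__":
    print(encode_nullable_int(-412).hex())  # 02b706
    print(len(encode_long(229455176)))      # 5
```

No field names are written: a reader needs the writer's schema to decode the bytes, which is why three bytes are enough to stand in for "fragmentLength" : { "int" : -412 }.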
Example
This section shows an alignment in SAM format and the equivalent ReadAlignment.
Example of alignment in SAM format:
```
1_229454865_229455276_0:0:0_0:0:0_0 83 1 229455177 60 100M = 229454865 -412 AGTGCTATTTGGATTCATCCCATATGGGCCCCATCTTGTGGTCTGAGGCCTGACAGGGCTCACCTGCAAGCTCGGTTCTCTGCTGTCTTTGATATGGACT ???????????????????????????????????????????????????????????????????????????????????????????????????? NM:i:0 AS:i:100 XS:i:0
```
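Several ReadAlignment fields are derived from the SAM FLAG column (83 in this alignment, i.e. bits 0x1, 0x2, 0x10 and 0x40). The sketch below decodes the FLAG bits defined in the SAM specification and maps them to the corresponding boolean and strand fields, following the GA4GH field definitions (for example, improperPlacement is the inverse of bit 0x2). The function names are hypothetical, the sketch assumes at most two reads per fragment, and the key mateStrand is an invented stand-in for nextMatePosition.strand.

```python
# FLAG bit meanings from the SAM specification
SAM_FLAG_BITS = {
    0x1: "paired", 0x2: "proper_pair", 0x4: "unmapped", 0x8: "mate_unmapped",
    0x10: "reverse", 0x20: "mate_reverse", 0x40: "first_in_pair",
    0x80: "last_in_pair", 0x100: "secondary", 0x200: "qc_fail",
    0x400: "duplicate", 0x800: "supplementary",
}


def set_bits(flag: int) -> list:
    """Names of the FLAG bits that are set, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def flag_to_fields(flag: int) -> dict:
    """Derive ReadAlignment-style fields from a SAM FLAG value."""
    def strand(bit: int) -> str:
        return "NEG_STRAND" if flag & bit else "POS_STRAND"

    return {
        "numberReads": 2 if flag & 0x1 else 1,       # assumes <= 2 reads
        "readNumber": 1 if flag & 0x80 else 0,       # 0-based read number
        "improperPlacement": not (flag & 0x2),       # inverse of 0x2
        "duplicateFragment": bool(flag & 0x400),
        "failedVendorQualityChecks": bool(flag & 0x200),
        "secondaryAlignment": bool(flag & 0x100),
        "supplementaryAlignment": bool(flag & 0x800),
        "strand": strand(0x10),
        "mateStrand": strand(0x20),                  # hypothetical key
    }


if __name__ == "__main__":
    print(set_bits(83))  # ['paired', 'proper_pair', 'reverse', 'first_in_pair']
    print(flag_to_fields(83))
```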
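ReadAlignment object for the same alignment, in Avro's JSON encoding (the value of each nullable field is wrapped in an object that names its type, such as "int" or "string"). Note that GA4GH positions are 0-based, whereas SAM positions are 1-based: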
```json
{
  "id" : {
    "string" : "1_229454865_229455276_0:0:0_0:0:0_0"
  },
  "readGroupId" : "no-group",
  "fragmentName" : "1",
  "improperPlacement" : {
    "boolean" : false
  },
  "duplicateFragment" : {
    "boolean" : false
  },
  "numberReads" : {
    "int" : 2
  },
  "fragmentLength" : {
    "int" : -412
  },
  "readNumber" : {
    "int" : 0
  },
  "failedVendorQualityChecks" : {
    "boolean" : false
  },
  "alignment" : {
    "org.ga4gh.models.LinearAlignment" : {
      "position" : {
        "referenceName" : "1",
        "position" : 229455176,
        "strand" : "NEG_STRAND"
      },
      "mappingQuality" : {
        "int" : 60
      },
      "cigar" : [ {
        "operation" : "ALIGNMENT_MATCH",
        "operationLength" : 100,
        "referenceSequence" : null
      } ]
    }
  },
  "secondaryAlignment" : {
    "boolean" : false
  },
  "supplementaryAlignment" : {
    "boolean" : false
  },
  "alignedSequence" : {
    "string" : "AGTGCTATTTGGATTCATCCCATATGGGCCCCATCTTGTGGTCTGAGGCCTGACAGGGCTCACCTGCAAGCTCGGTTCTCTGCTGTCTTTGATATGGACT"
  },
  "alignedQuality" : [
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30
  ],
  "nextMatePosition" : {
    "org.ga4gh.models.Position" : {
      "referenceName" : "1",
      "position" : 229454864,
      "strand" : "POS_STRAND"
    }
  },
  "info" : {
    "AS" : [ "i", "100" ],
    "XS" : [ "i", "0" ],
    "NM" : [ "i", "0" ]
  }
}
```
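Most of the remaining fields follow from the other SAM columns with a few conversions. SAM columns are tab-separated (they appear as spaces in the rendering above). POS and PNEXT are 1-based, while GA4GH positions are 0-based. An RNEXT of "=" means the mate lies on the same reference. CIGAR operators map onto the GA4GH cigar operation names, and each QUAL character is a Phred score plus 33, so "?" (ASCII 63) becomes 30. Optional TAG:TYPE:VALUE fields go into the info map as [type, value] pairs. The sketch below applies these rules to a single mapped record. The function names are hypothetical. It returns plain Python values without the Avro union wrappers, and it does not handle unmapped reads, missing ("*") values or the FLAG-derived fields.

```python
import re

# SAM CIGAR operator -> GA4GH cigar operation name
CIGAR_OPS = {
    "M": "ALIGNMENT_MATCH", "I": "INSERT", "D": "DELETE", "N": "SKIP",
    "S": "CLIP_SOFT", "H": "CLIP_HARD", "P": "PAD",
    "=": "SEQUENCE_MATCH", "X": "SEQUENCE_MISMATCH",
}


def parse_cigar(cigar: str) -> list:
    """Split a CIGAR string such as '100M' into cigar-unit dicts."""
    return [
        {"operation": CIGAR_OPS[op], "operationLength": int(length),
         "referenceSequence": None}
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ]


def parse_optional_fields(fields: list) -> dict:
    """'AS:i:100' -> {'AS': ['i', '100']}; split at most twice so that
    a ':' inside the value survives."""
    info = {}
    for field in fields:
        tag, type_code, value = field.split(":", 2)
        info[tag] = [type_code, value]
    return info


def sam_to_read_alignment(sam_line: str) -> dict:
    """Map one mapped SAM record to ReadAlignment-like fields."""
    cols = sam_line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = cols[:11]
    flag = int(flag)

    def strand(bit: int) -> str:
        return "NEG_STRAND" if flag & bit else "POS_STRAND"

    return {
        "id": qname,
        "fragmentLength": int(tlen),
        "alignment": {
            "position": {"referenceName": rname,
                         "position": int(pos) - 1,  # 1-based -> 0-based
                         "strand": strand(0x10)},
            "mappingQuality": int(mapq),
            "cigar": parse_cigar(cigar),
        },
        "alignedSequence": seq,
        "alignedQuality": [ord(c) - 33 for c in qual],  # Phred+33
        "nextMatePosition": {"referenceName": rname if rnext == "=" else rnext,
                             "position": int(pnext) - 1,  # 1-based -> 0-based
                             "strand": strand(0x20)},
        "info": parse_optional_fields(cols[11:]),
    }


if __name__ == "__main__":
    record = "\t".join([
        "1_229454865_229455276_0:0:0_0:0:0_0", "83", "1", "229455177", "60",
        "100M", "=", "229454865", "-412", "A" * 100, "?" * 100,
        "NM:i:0", "AS:i:100", "XS:i:0",
    ])
    print(sam_to_read_alignment(record)["alignment"])
```

Running it on the example record reproduces the alignment position, CIGAR, quality and info values of the object above; the sequence is replaced by a placeholder in the usage code only to keep the line short.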
More information and references
- GA4GH's schema documentation: http://ga4gh-schemas.readthedocs.io/en/latest/schemas/reads.proto.html
- Sequence Alignment/Map Format Specification: https://samtools.github.io/hts-specs/SAMv1.pdf
- Sequence Alignment/Map Optional Fields Specification: https://samtools.github.io/hts-specs/SAMtags.pdf