Bracing against the wind  

Thursday, August 16, 2012

Bwa, Mosaik and Bwasw Notes

Although bwa finds decent alignments for whole-genome sequencing data, we do a lot of targeted capture. In one study, where the capture process was already well-vetted, bwa reported 20% of the reads as aligning "off target", even after our standard quality filtering. More than half of those didn't even map to the correct chromosome.

As an experiment, we compared the results to Mosaik. Mosaik can, in some ways, be used to model "truth" for alignment algorithms, since it's a fast Smith-Waterman "complete" aligner.

Mosaik, when aligning to the whole genome, found that only 1% of our reads were off target, and none (zero) were off-chromosome.
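For anyone who wants to run the same kind of comparison, here is a minimal sketch of how the off-target and off-chromosome fractions could be tallied. It is not the script we used. The function names, the tuple layout, and the half-open coordinate convention are all assumptions made for illustration, and it expects alignments that have already been parsed into (chromosome, start, end) tuples.

```python
from collections import Counter

def classify_alignment(chrom, start, end, targets):
    """Classify one alignment against the capture targets.

    Hypothetical helper: `targets` is a list of (chrom, start, end)
    tuples, and all intervals are assumed to be half-open [start, end).
    """
    target_chroms = {t_chrom for t_chrom, _, _ in targets}
    if chrom not in target_chroms:
        # Not even on a chromosome that carries a target.
        return "off_chromosome"
    for t_chrom, t_start, t_end in targets:
        # Any overlap with a target on the same chromosome counts as on target.
        if chrom == t_chrom and start < t_end and end > t_start:
            return "on_target"
    return "off_target_same_chrom"

def summarize(alignments, targets):
    """Return the fraction of alignments falling in each category."""
    counts = Counter(
        classify_alignment(chrom, start, end, targets)
        for chrom, start, end in alignments
    )
    total = sum(counts.values())
    categories = ("on_target", "off_target_same_chrom", "off_chromosome")
    if total == 0:
        return {name: 0.0 for name in categories}
    return {name: counts[name] / total for name in categories}
```

The total "off target" figure in the sense used above is the sum of the last two categories, since a read on the wrong chromosome is also off target. A linear scan over the targets is fine for a small capture panel. A large target set would want an interval index instead.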

That's a pretty scary result, although we knew there were paralogs out there and expected some ambiguity, and that ambiguity exacerbated bwa's issues.

The only other aligner we tested that performed "correctly" on that data set was bwasw, and we had to write our own "paired-end mismatch discarder" because it doesn't handle paired-end reads. It's remarkably accurate, though.
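To give a rough idea of what a "paired-end mismatch discarder" might look like, here is a minimal sketch. It is not our actual tool. It assumes the two mates were aligned independently as single-end reads and already parsed into dictionaries keyed by read name, with each value a (chromosome, position, strand) tuple. The function names, the 1000 bp default distance, and the opposite-strand requirement are illustrative assumptions, not settings from our pipeline.

```python
def is_concordant(mate1, mate2, max_distance=1000):
    """Decide whether two independently aligned mates agree with each other.

    Hypothetical rule set: both mates must be aligned, on the same
    chromosome, on opposite strands, and within `max_distance` bases.
    """
    if mate1 is None or mate2 is None:
        return False  # one mate failed to align
    chrom1, pos1, strand1 = mate1
    chrom2, pos2, strand2 = mate2
    if chrom1 != chrom2:
        return False  # mates landed on different chromosomes
    if strand1 == strand2:
        return False  # a proper pair is expected on opposite strands
    return abs(pos1 - pos2) <= max_distance

def discard_mismatched_pairs(mates1, mates2, max_distance=1000):
    """Keep only the read pairs whose two mates are concordant."""
    kept = {}
    for name, mate1 in mates1.items():
        mate2 = mates2.get(name)  # None if the second mate is missing
        if is_concordant(mate1, mate2, max_distance):
            kept[name] = (mate1, mate2)
    return kept
```

The distance threshold would need to be tuned to the library's insert size, and a real version would read the alignments from SAM/BAM rather than from in-memory dictionaries.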

