Bug fixed in QIIME OTU picker

1 03 2010

We’ve fixed a bug in pick_otus.py that affected svn versions 596 (02/09/10) through 751 (02/23/10).
This bug does NOT affect the stable QIIME 0.9 release, since it was introduced later.

Short explanation of the bug:

Inadvertently, the cd-hit prefix filter was turned ON by default with a prefix length of 50. The prefix prefilter collapses all reads that share the same 50 nucleotides at their 5′ end before sending them to cd-hit. In most cases, the resulting number of OTUs is smaller than it would be without the filter. The prefilter is intended for huge data sets, where cd-hit doesn’t scale very well; for that use, however, we recommend setting the prefix length to 100.
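
To illustrate why the prefilter tends to reduce the OTU count, here is a minimal sketch of prefix-based collapsing. It is not the code used in pick_otus.py; the function name collapse_by_prefix and the toy reads are made up for this example, and a short prefix length is used so the effect is visible on short sequences.

```python
from collections import defaultdict


def collapse_by_prefix(seqs, prefix_length=50):
    """Group read IDs by the first `prefix_length` nucleotides of each read.

    Hypothetical helper for illustration only, not QIIME's implementation.
    `seqs` is an iterable of (read_id, sequence) pairs.
    """
    groups = defaultdict(list)
    for read_id, seq in seqs:
        # Reads with an identical prefix end up in the same group, so only
        # one representative per group would be passed on to the clusterer.
        groups[seq[:prefix_length]].append(read_id)
    return dict(groups)


# Toy reads: read1 and read2 differ only after position 8.
reads = [
    ("read1", "ACGTACGTAA"),
    ("read2", "ACGTACGTCC"),
    ("read3", "TTGTACGTAA"),
]

short_prefix = collapse_by_prefix(reads, prefix_length=8)
long_prefix = collapse_by_prefix(reads, prefix_length=10)
print(len(short_prefix), len(long_prefix))
```

With the shorter prefix, read1 and read2 are merged before clustering even though they differ further downstream; with the longer prefix they stay separate. The same effect is why a prefix length of 50 yields fewer OTUs than running without the filter.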

How to tell if you were affected:

If you used pick_otus.py or the workflow script pick_otus_through_otu_table.py with default values and did an svn update since 2/9/2010, you are most likely affected. The log file written by pick_otus.py tells you whether you were actually affected. If you used the workflow script, the log file will be in YOUR_QIIME_OUTDIR/cdhit_picked_otus/*_otus.log.

With prefix prefiltering disabled, the log file will contain the line:

“No prefix-based prefiltering.”

With prefix prefiltering enabled, it will contain lines similar to these:

“Prefix-based prefiltering, prefix length: 50
Prefix-based prefiltering, post-filter num seqs: 45”
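
If you have many log files to check, a small script can do the lookup for you. The following is a sketch only: it assumes the log lines look exactly like the examples above, and the function names prefilter_status and check_logs are made up for this example. Replace YOUR_QIIME_OUTDIR with your actual output directory.

```python
import glob


def prefilter_status(log_text):
    """Return (enabled, prefix_length) based on the text of an OTU picker log.

    Assumes the log wording shown in this post. Returns (None, None) if
    neither line is found, so an unexpected log format is not misreported.
    """
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("No prefix-based prefiltering"):
            return (False, None)
        if line.startswith("Prefix-based prefiltering, prefix length:"):
            # Everything after the first colon is the prefix length.
            return (True, int(line.split(":", 1)[1]))
    return (None, None)


def check_logs(pattern):
    """Print the prefilter status of every log file matching `pattern`."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            enabled, length = prefilter_status(handle.read())
        print(path, "enabled:", enabled, "prefix length:", length)


# Example call (path taken from the workflow script layout described above):
# check_logs("YOUR_QIIME_OUTDIR/cdhit_picked_otus/*_otus.log")
```

A result of enabled: True with prefix length 50 means the run was affected by the bug.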

What to do if you are affected:

If you were running just a single QIIME study and the results look right, you might just leave it as is. However, if you want to compare your results to other QIIME studies, it is important that the same OTU picking method is used. In that case we advise you to re-run the workflow after doing an svn up.



