ISME 15 QIIME Workshop

11 03 2014

Hello QIIME users,
The QIIME Developers will be teaching a workshop after ISME 15 on August 30th, 2014 in Seoul, South Korea. The workshop falls on the Saturday following ISME and will be held at Seoul National University, near the ISME venue. You can find details on the workshop, including a link to the application, at:

http://qiime.org/workshops/isme15-qiime.pdf

Hope to see you at ISME 15!

- Greg, on behalf of the QIIME development group





QIIME 1.8.0 is live!

12 12 2013

Hello QIIME users,
Today we’re very excited to announce the release of QIIME 1.8.0, which is packed with new features that expand the functionality and usability of QIIME. The QIIME 1.8.0 VirtualBox and EC2 images will be ready later this week. Here are some of the highlights in 1.8.0:

First, one of the most frequent comments that we get about QIIME (which we completely agree with) is that it is difficult to install. To address this, we’ve defined a QIIME “base install” package, which is installable with pip and so greatly reduces the complexity of QIIME installation. The QIIME base install package supports running the most commonly used QIIME commands with default parameters, and in most cases will be sufficient for your entire QIIME analysis. You can find discussion of this in the updated QIIME Installation Guide.

Next, our PCoA plots are no longer based on KiNG, but on the new Emperor 3D plotting package. This enables more advanced analysis of PCoA plots in the context of sample metadata, and is one step toward our larger goal of improving QIIME’s interactive visualization capabilities.

We’ve added support for assembling Illumina paired-end reads in the new join_paired_ends.py script, which wraps ea-utils and SeqPrep, and for working with alternative barcoding schemes used on the Illumina platform with the new extract_barcodes.py script. A new tutorial has been added for working with alternative barcoding schemes.

We’ve updated QIIME’s default taxonomy assigner to be the new uclust-based consensus taxonomy assigner. This was shown to be more accurate and faster than the existing methods (Bokulich, Rideout et al. (submitted)). Important: the RDP Classifier is no longer the default taxonomy assigner used in QIIME.
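To make the idea of a consensus assignment concrete, here is a simplified shell and awk sketch. It assumes the assigner has already collected the semicolon-delimited taxonomy strings of a query’s top few UCLUST database hits, and that it keeps each taxonomic rank only while the most common lineage at that rank is shared by at least a minimum fraction of those hits. The function name consensus_taxonomy, the "Unassigned" label, and the 0.51 threshold used below are illustrative assumptions; QIIME’s actual implementation, parameters, and defaults may differ.

```shell
#!/usr/bin/env bash
# consensus_taxonomy MIN_FRACTION  (hypothetical helper, not QIIME's code)
# stdin:  one semicolon-delimited taxonomy string per database hit.
# stdout: the longest lineage whose most common prefix at every rank is
#         shared by at least MIN_FRACTION of the hits, or "Unassigned".
consensus_taxonomy() {
  awk -F';' -v minfrac="$1" '
    {
      hits++
      if (NF > maxr) maxr = NF
      lineage = ""
      for (r = 1; r <= NF; r++) {
        lineage = (r == 1) ? $r : (lineage ";" $r)
        count[r, lineage]++            # number of hits sharing this prefix
      }
    }
    END {
      # For each rank, find the most frequent lineage prefix.
      for (key in count) {
        split(key, parts, SUBSEP)
        r = parts[1]
        if (count[key] > best[r]) { best[r] = count[key]; top[r] = parts[2] }
      }
      # Walk down the ranks, stopping at the first rank without consensus.
      result = "Unassigned"
      for (r = 1; r <= maxr; r++) {
        if (best[r] / hits < minfrac + 0) break
        result = top[r]
      }
      print result
    }'
}
```

For example, given three hits that all share k__Bacteria, two of which share p__Firmicutes but name different classes, consensus_taxonomy 0.51 prints k__Bacteria;p__Firmicutes: the assignment stops at the deepest rank where a majority of hits still agree.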

otu_category_significance.py has been removed in favor of group_significance.py, which supports additional types of tests, and is more maintainable and extensible than otu_category_significance.py.

core_diversity_analyses.py has a new parameter, --recover_from_failure, that allows the user to re-run the script on an existing output directory; only analyses that haven’t already been run are re-run. This supports rapid recovery from failed runs, and also allows the user to add categories to a previous run, a common task that previously required a full re-run.

We’ve added a new script, estimate_observation_richness.py, which implements some of the interpolation and extrapolation richness estimators in Colwell et al. (2012). Important: This script should be considered beta software; it is currently an experimental feature in QIIME.

Finally, we’ve updated our documentation on contributing to QIIME. If you’re interested in helping us make QIIME better there are lots of ways that you can get involved. See our new Contributing to QIIME document.

And this is all just scratching the surface. There are many more features in QIIME 1.8.0 – for more details you should review the full ChangeLog.

Enjoy!

Greg (on behalf of the expanding list of QIIME developers)





PyNAST 1.2.1 release is live, and future PyNAST announcements will be posted to the QIIME Blog

16 11 2013

Hi all,

I’m happy to announce the PyNAST 1.2.1 release, available for download here. This is primarily a bug fix release, allowing PyNAST to make smarter decisions about where to store temporary files (QIIME issues #999 and #1114). This had been an issue for some PyNAST users working in cluster environments. If this hasn’t been an issue for you, it is not important to upgrade from PyNAST 1.2.0.

Also, all future news and announcements related to PyNAST will be posted here on the QIIME blog. Merging these blogs will help us reduce our administrative burden. The PyNAST blog has been very low traffic (four posts in nearly four years) so we don’t expect this to increase the number of posts here noticeably.

Thanks to the QIIME users who helped us track down this PyNAST issue!

Greg





Bug in compare_categories.py, supervised_learning.py, and detrend.py

30 10 2013
QIIME users,
 
A bug was recently discovered in QIIME 1.7.0 (and previous QIIME versions) that may affect some of QIIME’s scripts that wrap R functionality. These scripts include compare_categories.py, supervised_learning.py, and detrend.py.
 
The bug involves a discrepancy in the way that QIIME parses metadata mapping files: QIIME has two different parsers, one written in Python and one in R. The Python parser strips leading/trailing whitespace from mapping file data fields, while the R parser does not. This can lead to the exclusion of samples, and/or incorrect grouping of samples based on a mapping file category, when the R parser is used. This bug may affect the results of compare_categories.py (only adonis, db-RDA, Moran’s I, MRPP, and PERMDISP; the other methods are fine), supervised_learning.py, and detrend.py. Mapping files that successfully passed check_id_map.py’s validation tests may still be affected by this issue.
 
IMPORTANT: If your mapping file has leading/trailing whitespace (with or without double quotes) in any of the mapping file fields and you used any of the aforementioned scripts, *your results may be incorrect*. It is very easy to accidentally add leading/trailing whitespace to your mapping file, especially if you are editing one by hand in a spreadsheet program such as Excel, so this bug may affect you even if you’re sure that your mapping file fields don’t have this issue.
 
Unfortunately, we cannot predict how your results may have changed due to this bug (e.g., we cannot predict whether statistical tests will be overly conservative, or how far off a test statistic might be from the correct value), since samples with leading/trailing whitespace will be dropped from analyses, and incorrect groupings of samples based on a categorical variable may be created based on whitespace. For example, a Treatment category may have some sample groups labeled as Control and Fast, but if a couple of the samples were labeled ‘ Fast   ’ (without quotes), these samples would be artificially grouped together instead of being grouped with the rest of the Fast samples. Because which samples are excluded or artificially grouped depends entirely on where stray whitespace happens to occur in a given mapping file, we cannot predict how your results might be affected.
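The grouping problem can be reproduced with a small, self-contained shell sketch. It builds a hypothetical two-column mapping file in which two samples carry a padded ‘ Fast   ’ label, then counts the samples in each Treatment group twice: once using the raw field values (mimicking a parser that keeps whitespace, as the R parser did) and once after trimming spaces (mimicking the Python parser). The sample IDs and the group_sizes helper are illustrative only and are not part of QIIME.

```shell
#!/usr/bin/env bash
# Hypothetical tab-separated mapping file; S4 and S5 carry a padded " Fast   " label.
map=$(printf '#SampleID\tTreatment\nS1\tControl\nS2\tFast\nS3\tFast\nS4\t Fast   \nS5\t Fast   \n')

# group_sizes TRIM: read a mapping file on stdin and print "[label] count" for
# each distinct value in column 2 (the brackets make stray spaces visible).
# TRIM=1 strips leading/trailing spaces first; TRIM=0 keeps the raw value.
group_sizes() {
  awk -F'\t' -v trim="$1" '
    NR > 1 {
      v = $2
      if (trim == 1) gsub(/^ +| +$/, "", v)
      n["[" v "]"]++
    }
    END { for (g in n) print g, n[g] }' | sort
}

echo "Raw values (whitespace kept):"
printf '%s\n' "$map" | group_sizes 0
echo "Trimmed values:"
printf '%s\n' "$map" | group_sizes 1
```

With the raw values the Fast samples split into two groups of two (‘Fast’ and ‘ Fast   ’); after trimming, all four fall into a single Fast group, which is what the mapping file’s author intended.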
 
This parsing bug has been fixed, along with check_id_map.py’s validation tests and documentation, in the latest development version of QIIME 1.7.0-dev (fixed on October 3, 2013, commit 5669b5891c26c9631c465243c046cdc33d0f8ba7). To avoid serious issues like this in the future, we are working on migrating the R parsing code into its own CRAN package (with extensive unit tests), as well as formally defining a metadata mapping file format so that other tools can consistently implement and validate this format.
 
If you are using a release version (or an older development version) of QIIME, we have put together a workaround that you can use until the next QIIME release. There is a script called strip_mapping_file_fields.py (hosted as a Gist on GitHub) that takes a mapping file as input, strips any double quotes ("), then strips any leading/trailing whitespace, and writes the resulting data to a new mapping file. This script requires that QIIME 1.7.0 is installed. If you are a MacQIIME user, you will need to first activate your MacQIIME environment by running the ‘macqiime’ command.
 
We highly recommend that you first use check_id_map.py to validate your mapping file and *ensure that there are no errors or warnings*. Next, use this script to “cleanse” your mapping file before using any of the aforementioned scripts (compare_categories.py, supervised_learning.py, and detrend.py).
 
Here’s how you can download and use the script. Assuming your mapping file is named map.txt, run:
 
check_id_map.py -m map.txt -o check_id_map_output
# Fix any issues with your mapping file until there are no errors or warnings.
git clone https://gist.github.com/7009545.git strip_mapping_file_fields
python strip_mapping_file_fields/strip_mapping_file_fields.py -m map.txt -o map_fixed.txt
# Continue with your analyses, using map_fixed.txt.
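The two cleansing steps that strip_mapping_file_fields.py performs (remove double quotes, then trim leading/trailing whitespace from each field) can be illustrated with a short shell and awk sketch. This is a hypothetical stand-in written only to show what the cleansing does; it is not the Gist’s code, it assumes a tab-separated mapping file, and it trims spaces only, since tabs are the field delimiter and cannot occur inside a field.

```shell
#!/usr/bin/env bash
# clean_mapping_file IN OUT  (hypothetical helper, not part of QIIME):
# for every tab-separated field, (1) delete double quotes, then
# (2) strip leading/trailing spaces, and write the result to OUT.
clean_mapping_file() {
  awk -F'\t' -v OFS='\t' '{
    for (i = 1; i <= NF; i++) {
      gsub(/"/, "", $i)          # step 1: remove double quotes
      gsub(/^ +| +$/, "", $i)    # step 2: trim leading/trailing spaces
    }
    print
  }' "$1" > "$2"
}
```

Usage, assuming the mapping file is named map.txt as above: clean_mapping_file map.txt map_fixed.txt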
 
We apologize for any inconvenience this may cause you, and we will continue striving to prevent bugs of this nature as QIIME development progresses. As always, please get in touch with us on the QIIME forum if you have any issues or questions.
 
–Jai




Announcing the release of a QIIME-formatted version of the SILVA 111 reference database.

25 06 2013

We’re pleased to announce the availability of a QIIME-formatted version of the SILVA 111 reference database. The aligned SILVA 111 SSU Ref fasta export (SSURef_111_tax_silva_full_align_trunc.fasta.tgz) was downloaded from SILVA (Quast et al. 2013; http://www.arb-silva.de/). This file was filtered and then clustered into OTUs at 99, 97, 94, and 90 percent similarity within QIIME using UCLUST. Included are OTU maps, taxonomy mapping files, and aligned and unaligned reference sequences. Trees are included only for eukaryotes (18S), as many people may choose to use a different resource, such as the Greengenes reference OTUs, for bacteria and archaea (16S). For the same reason, representative set and taxonomy files that contain only eukaryotes (18S) are also included in this release. For more detail on reference file generation, please see the included notes file (SILVA_111_QIIME_format_notes.txt).

You can download this reference collection from here: QIIME-Silva-111.

Laura Wegener Parfrey





Greengenes 13_5

20 05 2013

The Greengenes Consortium is pleased to announce the release of Greengenes 13_5, expanding our coverage of the Archaea and Bacteria to 1,262,986 sequences with 203,452 99% OTUs and 99,322 97% OTUs. You can find the files for this release on the official Greengenes 13_5 database page.

This new release comes with a few backend changes:

* The ARB database (to be released a few days after this official release) will include sequences aligned by PyNAST v1.1 (Caporaso et al. 2010), which are safe for probe design. Thanks to Les Dethlefsen for contacting us about this.

* Alignments from SSU-Align (Nawrocki 2009) and PyNAST (Caporaso et al. 2010) are now provided. The de novo tree used by Greengenes is based on the SSU-Align alignment, but we are now also providing the PyNAST-aligned sequences for completeness.

* Chimera checking is now performed using only UCHIME (Edgar et al. 2011). As with the last release, the reference database used consisted of the consensus sequences from the 94% OTUs of the previous release (GG 12_10 in this case). Previously flagged chimeras are still excluded.

* Mappings to the Integrated Microbial Genomes database are now included.

* The inference for determining whether a record is a named isolate or clone has been revamped.

* An outline of the taxonomy changes is included that describes the taxonomic name change, the number of affected tips, and an example identifier that can be traced back to the tree.

The top 5 major group changes are:

[Screenshot: table of the top 5 major group changes]

Suggestions and improvements to the taxonomy were made by a collection of people. We’d like to thank Alex Probst, Bing Ma, Francesca DeFilippis, Kyle Bittinger, Niels Larsen and Cathy Lozupone for their feedback. Greengenes is a living project, and the feedback we receive is vital for improving the utility of this resource.

The Greengenes Consortium is composed of the University of Colorado, Second Genome, and the University of Queensland.

We look forward to hearing your feedback!

- Daniel McDonald (on behalf of the Consortium)





QIIME 1.7.0 is live!

15 05 2013

We’re very excited to announce the QIIME 1.7.0 release. This includes tons of new features and documentation updates, so there’s lots of new stuff to play with!

The biggest highlights are listed below, but for the adventurous you can view this awesome list of all of the QIIME commits.

Changes in QIIME 1.7.0 include:

* Preliminary (beta-level) support for integration with the Galaxy Framework. Using the new QIIME-Galaxy integration utility, you will be able to run QIIME within Galaxy, taking advantage of the Galaxy GUI.

* The core_qiime_analyses.py script has been replaced with the new core_diversity_analyses.py. This is a complete refactoring to support only “downstream” analyses (i.e., starting with a BIOM table). We’ve found that this makes the script more widely applicable as it’s now general to any BIOM data and/or different OTU picking strategies. See issues #477 and #688 for details.

* The three main OTU picking workflows, pick_otus_through_otu_table.py, pick_reference_otus_through_otu_table.py, and pick_subsampled_reference_otus_through_otu_table.py have been renamed for clarity to pick_de_novo_otus.py, pick_closed_reference_otus.py, and pick_open_reference_otus.py, respectively. We have additionally created a new OTU picking document that describes the differences between these OTU picking protocols, and illustrates how to use each one. See issue #708 for more details on this. These updates include support for usearch 6, which is not yet publicly available but is due out soon.

* New documentation on working with BIOM tables in QIIME, using state strings to select certain samples that should be included in analyses, and tracking the source of microbial communities using SourceTracker.

* Under the hood, we’ve completely refactored the QIIME workflow framework to support easier development of QIIME workflow scripts in the future. We’ve also merged the qiime_test_data repository into QIIME, which facilitates our testing efforts.

* The per_library_stats.py script has been removed in favor of biom-format’s print_biom_table_summary.py, which provides additional information beyond what per_library_stats.py reported.

* summarize_taxa.py now outputs taxa summary tables in both classic (TSV) and BIOM formats by default. This will allow taxa summary tables to be used with other QIIME scripts that expect BIOM files as input, including core_diversity_analyses.py and many of the scripts discussed here.

* We found and fixed a bug in the creation of biplots using make_3d_plots.py. This bug would change the placement of taxonomic groups based on how many taxa were included in the biplot analysis (the default was 10). Based on several tests that we ran, this did not affect the biological conclusions that would be drawn from the data, but we suggest rerunning to verify that this is the case in your biplots. For more details and a more in-depth explanation see issue #677.

Finally, due to the ever-increasing number of samples and amount of metadata in studies of microbial ecology, we’re very excited to announce a new software package, Emperor, which we plan to use as a replacement for KiNG for viewing 3D plots in QIIME 1.8.0. This is a new WebGL-based, non-Java, Chrome-specific visualization tool that scales much better than KiNG. It is now available as beta software if you’d like to try it out. You can see details on the features in its ChangeLog. We encourage QIIME users to test this new tool and add suggestions and report bugs in the Emperor issue tracker.

Enjoy QIIME 1.7.0! This is only a partial list of the changes – you can view the full ChangeLog with further detail here. As always, get in touch on the QIIME Forum with any questions, and submit Pull Requests with any contributions.

We hope to see you at ASM next week!

Greg (with help from Antonio, Jose, Yoshiki and Jai), on behalf of the ever-growing QIIME development group







