Greengenes 13_5

20 05 2013

The Greengenes Consortium is pleased to announce the release of Greengenes 13_5, expanding our coverage of the Archaea and Bacteria to 1,262,986 sequences with 203,452 99% OTUs and 99,322 97% OTUs. You can find the files for this release on the official Greengenes 13_5 database page.

This new release comes with a few backend changes:

* The ARB database (to be released a few days after this official release) will include sequences as aligned by PyNAST v1.1 (Caporaso et al. 2010), which are safe for probe design. Thanks to Les Dethlefsen for contacting us about this.

* Alignments from SSU-Align (Nawrocki 2009) and PyNAST (Caporaso et al. 2010) are now provided. The de novo tree used by Greengenes is based on the SSU-Align alignment, but we are now providing the PyNAST sequences for completeness.

* Chimera checking is now performed using only UCHIME (Edgar et al. 2011). As with the last release, the reference database consisted of the consensus sequences from the 94% OTUs of the previous release (GG 12_10 in this case). Previously flagged chimeras are still excluded.

* Mappings to the Integrated Microbial Genomes database are now included.

* The inference for determining whether a record is a named isolate or clone has been revamped.

* An outline of the taxonomy changes is included that describes the taxonomic name change, the number of affected tips, and an example identifier that can be traced back to the tree.

The top 5 major group changes are:

[Image: screenshot of the top 5 major group changes]

Suggestions and improvements to the taxonomy were made by a collection of people. We’d like to thank Alex Probst, Bing Ma, Francesca DeFilippis, Kyle Bittinger, Niels Larsen and Cathy Lozupone for their feedback. Greengenes is a living project, and the feedback we receive is vital for improving the utility of this resource.

The Greengenes Consortium is composed of Phil Hugenholtz, Todd DeSantis, Rob Knight and Daniel McDonald. We would particularly like to thank Phil for his heroic curation efforts, a monumental task that makes Greengenes possible.

We look forward to hearing your feedback!

- Daniel McDonald (on behalf of the Consortium)





QIIME 1.7.0 is live!

15 05 2013

We’re very excited to announce the QIIME 1.7.0 release. This includes tons of new features and documentation updates, so lots of new stuff to play with!

The biggest highlights are listed below, but for the adventurous you can view this awesome list of all of the QIIME commits.

Changes in QIIME 1.7.0 include:

* Preliminary (beta-level) support for integration with the Galaxy Framework. Using the new QIIME-Galaxy integration utility, you will be able to run QIIME within Galaxy, taking advantage of the Galaxy GUI.

* The core_qiime_analyses.py script has been replaced with the new core_diversity_analyses.py. This is a complete refactoring to support only “downstream” analyses (i.e., starting with a BIOM table). We’ve found that this makes the script more widely applicable as it’s now general to any BIOM data and/or different OTU picking strategies. See issues #477 and #688 for details.

* The three main OTU picking workflows, pick_otus_through_otu_table.py, pick_reference_otus_through_otu_table.py, and pick_subsampled_reference_otus_through_otu_table.py have been renamed for clarity to pick_de_novo_otus.py, pick_closed_reference_otus.py, and pick_open_reference_otus.py, respectively. We have additionally created a new OTU picking document that describes the differences between these OTU picking protocols, and illustrates how to use each one. See issue #708 for more details on this. These updates include support for usearch 6, which is not available publicly yet, but due out soon.

* New documentation on working with BIOM tables in QIIME, using state strings to select certain samples that should be included in analyses, and tracking the source of microbial communities using SourceTracker.

* Under the hood, we’ve completely refactored the QIIME workflow framework to support easier development of QIIME workflow scripts in the future. We’ve also merged the qiime_test_data repository into QIIME, which facilitates our testing efforts.

* The per_library_stats.py script has been removed in favor of biom-format’s print_biom_table_summary.py, which provides additional information on top of per_library_stats.py.

* summarize_taxa.py now outputs taxa summary tables in both classic (TSV) and BIOM formats by default. This will allow taxa summary tables to be used with other QIIME scripts that expect BIOM files as input, including core_diversity_analyses.py and many of the scripts discussed here.

* We found and fixed a bug in the creation of biplots using make_3d_plots.py. This bug would change the placement of taxonomic groups based on how many taxa were included in the biplot analysis (the default was 10). Based on several tests that we ran, this did not affect biological conclusions that would be drawn from the data, but we suggest rerunning to verify that this is the case in your biplots. For more details and a more in-depth explanation see issue #677.
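
If you have pipelines or notes that still refer to the old workflow names, the three renames listed above can be written down as a simple lookup. This is only an illustrative Python sketch: the RENAMED_WORKFLOWS dictionary and the current_script_name helper are hypothetical and are not part of QIIME.

```python
# Old-to-new names for the three OTU picking workflows renamed in QIIME 1.7.0.
RENAMED_WORKFLOWS = {
    "pick_otus_through_otu_table.py": "pick_de_novo_otus.py",
    "pick_reference_otus_through_otu_table.py": "pick_closed_reference_otus.py",
    "pick_subsampled_reference_otus_through_otu_table.py": "pick_open_reference_otus.py",
}


def current_script_name(script_name):
    """Return the 1.7.0 name for a workflow script (hypothetical helper).

    Names that were not renamed are returned unchanged.
    """
    return RENAMED_WORKFLOWS.get(script_name, script_name)


if __name__ == "__main__":
    for old_name in sorted(RENAMED_WORKFLOWS):
        print(old_name, "->", current_script_name(old_name))
```

Scripts that were replaced rather than renamed (such as core_qiime_analyses.py) are deliberately left out, since the replacement is not a drop-in equivalent.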

Finally, due to the ever-increasing number of samples and amount of metadata in studies of microbial ecology, we’re very excited to announce a new software package, Emperor, which we plan to use as a replacement for KiNG for viewing 3D plots in QIIME 1.8.0. This is a new WebGL, non-Java, Chrome-specific visualization tool that scales much better than KiNG. It is now available as beta software, if you’d like to try it out. You can see details on the features in its ChangeLog. We encourage QIIME users to test this new tool, add suggestions, and report bugs in the Emperor issue tracker.

Enjoy QIIME 1.7.0! This is only a partial list of the changes – you can view the full ChangeLog with further detail here. As always, get in touch on the QIIME Forum with any questions, and submit Pull Requests with any contributions.

We hope to see you at ASM next week!

Greg (with help from Antonio, Jose, Yoshiki and Jai), on behalf of the ever-growing QIIME development group





QIIME 1.6.0 is live!

19 12 2012

We’re very excited to announce the QIIME 1.6.0 release, which you can download from here. As usual, we’ll have the updated AWS and VirtualBox images available tomorrow.

Some of the highlights in QIIME 1.6.0 are:

* We’ve updated RDP Classifier re-training support to allow any number of ranks in training files, as long as the number of ranks is uniform. This removes the need for special RDP training files in reference OTU collections. We’ve also posted some tips for defining your own training files.

* assign_taxonomy.py now supports assignment with tax2tree version 1.0 and mothur version 1.25.0.

* Added a new script, compute_core_microbiome.py, which identifies the core OTUs (i.e., those present in some user-defined percentage of the samples).

* Added a new script, compare_taxa_summaries.py to allow for the graphical and statistical comparison of the taxonomic composition of samples.

* Added a new script, filter_taxa_from_otu_table.py, which allows users to filter OTUs with (or without) specific taxonomy assignments from an OTU table.

* make_distance_boxplots.py and make_distance_comparison_plots.py can now perform Student’s two-sample t-tests to determine whether a pair of boxplots/distributions are significantly different.

* compare_alpha_diversity.py now supports both parametric and nonparametric two sample t-tests (nonparametric is the default).

* Detrending of quadratic curvature in ordination coordinates can now be performed with the new detrend.py script. This is the approach that was used in Figure 3 of Harris JK, et al. “Phylogenetic stratigraphy in the Guerrero Negro hypersaline microbial mat.” The ISME Journal (2012).

* Added experimental support for translated database mapping through map_reads_to_reference.py and parallel_map_reads_to_reference.py. This is analogous to closed-reference OTU picking, but translates queries to search against a protein database, so is useful for mapping metagenomic or metatranscriptomic data against databases of functional genes (e.g., IMG). Currently BLAT and usearch are supported for translated searching.

* Added new script load_remote_mapping_file.py and accompanying tutorial to allow exporting and downloading of mapping files stored as Google Spreadsheets. This should be considered experimental (i.e., beta) software at this stage!

* The parallel framework in QIIME has been completely re-written to allow for easier development of new parallel scripts and to facilitate changes to the underlying parallel functionality.

* Building from the recent development of the qiime_test_data repository (as discussed here), which contains example input and output for most QIIME scripts, the script documentation was updated so usage examples correspond to the example input and output files in qiime_test_data. Our script interface testing was greatly improved by this addition as well, which makes the QIIME scripts and documentation more reliable. 

* We replaced add_taxa.py with add_metadata.py to support addition of more general metadata to BIOM files (e.g., OTU tables). add_metadata.py is a script in the biom-format package (a core QIIME dependency). You can find help with add_metadata.py here.
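
To make the idea behind compute_core_microbiome.py (listed above) a little more concrete, here is a minimal Python sketch of a core OTU calculation. It is not the script's actual implementation: the core_otus function, the dictionary layout of the table, and the rule that an OTU counts as present in a sample when its count is greater than zero are all assumptions made for illustration.

```python
def core_otus(otu_table, min_fraction):
    """Return the sorted ids of OTUs present in at least min_fraction of samples.

    otu_table maps an OTU id to a list of per-sample counts (hypothetical
    layout; every list is assumed to have one entry per sample).
    """
    core = []
    for otu_id, counts in otu_table.items():
        # Assumption: an OTU is "present" in a sample if its count is > 0.
        present = sum(1 for count in counts if count > 0)
        if present / float(len(counts)) >= min_fraction:
            core.append(otu_id)
    return sorted(core)


if __name__ == "__main__":
    # Toy table with three OTUs observed across four samples.
    table = {
        "otu1": [5, 3, 1, 2],
        "otu2": [0, 0, 4, 1],
        "otu3": [0, 0, 0, 9],
    }
    print(core_otus(table, 0.5))
    print(core_otus(table, 1.0))
```

In the toy table, otu1 and otu2 are present in at least half of the samples, while only otu1 is present in all of them.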

We’ve also re-organized our tutorials page, and added a lot of new tutorials. These include tutorials for predicting mislabeled samples, analyzing fungal ITS data with QIIME, comparing taxonomic summaries, loading mapping files from Google Spreadsheets, and analyzing shotgun sequencing data with QIIME (an experimental feature). We’ve also added a new overview tutorial based on analyzing Illumina data which we developed for the ISME 14 QIIME/MG-RAST workshop, rewrote our documentation on parallel QIIME and QIIME parameter files, and expanded our notes on how you can get involved in QIIME development.

So as you can tell, there is a lot packed into this release. You can see a more complete list of changes in our ChangeLog.

Enjoy!

Greg, on behalf of the expanding QIIME development group





UNITE/QIIME 12_11 ITS reference OTUs now available (alpha release!)

27 11 2012

Hi all,

I’m pleased to announce that as part of a collaborative effort we have released the 12_11 alpha version of the UNITE/QIIME ITS reference OTUs. These are now linked from the QIIME resources page. These can be used for open or closed reference OTU picking, and for assigning taxonomy to ITS reads. Taxonomy can be assigned using BLAST in QIIME 1.5.0, or BLAST and RDP in QIIME 1.5.0-dev (and in the coming 1.6.0 release). See the acknowledgements list for the list of those involved.

The 12_11 release should be considered an alpha release, meaning that it is a very early stage release and errors are certain to exist. You should interpret results with care. We are releasing this now to provide tools to better support ITS analysis in QIIME, and we’ll be improving these OTUs over time. These are derived from the UNITE database (UNITE_public_24.09.12 release). You can find details on how these were created in the README.md file packaged with the release. Efforts have been made by UNITE to improve the taxonomic information associated with some of the sequences in their database. The QIIME reference sequence sets linked here have not been subject to any other form of curation (manual or automated) and certainly include incorrectly identified taxonomy, chimeras, and other problematic sequences. Improved fungal rDNA ITS reference sets based on the semi-curated centroids for sequence clusters in the UNITE Global Key Annotations module will soon be available from the QIIME resources page.

Please direct any feedback (either issues that you notice or feature requests) to the its-reference-otus issue tracker on GitHub, and direct any questions to the QIIME Forum. All files are additionally available in the its-reference-otus GitHub repository.

You can use these files in the same way as you would use the Greengenes reference OTUs, just substituting files from this data set as the reference sequences and taxonomy.

This collaboration arose from the Sloan Foundation’s Fungal ITS Workshop held on 19–20 October, 2012 in Boulder, Colorado, USA.

Enjoy!

Greg

 

 





QIIME is now hosted on GitHub, and bug in compare_alpha_diversity.py

26 10 2012

Hello QIIME users,
As you may or may not have noticed, as of 16 October QIIME is now hosted on GitHub rather than Sourceforge. This includes our source code revision control (which is now in git rather than svn), as well as our website. As always, you should still access the QIIME website through www.qiime.org, which now points to our site on GitHub.

We also ported all feature requests and bug reports to GitHub Issues, and have additionally moved our trac system from a locally-hosted, controlled access system to a GitHub-hosted, publicly accessible system (that same GitHub Issues tracker), so users can now check in to see what we’re working on. All new feature requests and bug reports should be submitted through GitHub Issues.

The switch to GitHub was primarily motivated by its Pull Request system, which facilitates a more open development environment than we think is possible on Sourceforge. Developers outside of the QIIME group can now fork the repository and make changes (e.g., add a new feature, fix a bug, or update some documentation). When your changes are ready, you can then issue a pull request and the developers will review those changes and either merge the pull request or get back to you with requests for modifications. As you can see here and here, we’ve already had a few pull requests from developers outside of the core developer group. Thanks for these initial submissions, and keep those pull requests coming! If you want to find out if we’re interested in incorporating a new feature that you’d like to work on, just ask on the QIIME Forum.

We have also recently expanded the qiime_test_data repository, which contains example input and output for nearly all of the QIIME scripts (and the remaining few will have test data added shortly). This provides an additional source of documentation for users, since you can see what valid input looks like for a script (for example, see here for the pick_otus_through_otu_table.py script), and is also used to help us keep our usage examples up-to-date. You’ll notice now that when you call a script with the -h parameter, the input and output data correspond to data in the qiime_test_data repository. We test these usage examples using qiime_test_data on a nightly basis as part of our automated testing system, so we now know within 24 hours if we’ve made a change that broke one of our usage examples.

Thanks to all of the QIIME developers who participated in the 15-16 October code sprint where we implemented these changes and additions!

Finally, we recently noticed a bug in the compare_alpha_diversity.py code that would result in an incorrect t-statistic and p-value being generated. This bug made the test more conservative (i.e., a non-significant p-value was returned when it should have been significant). This has been fixed, but the bug affects all versions of QIIME prior to commit d223376 (22 October 2012). Sorry for the inconvenience! Please get in touch on the QIIME forum if you have any questions about this.
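
If you would like to sanity-check a result by hand, the following Python sketch computes a two-sample t-statistic and a permutation-based p-value. It is a generic textbook version and not the code in compare_alpha_diversity.py: the function names, the pooled-variance (Student's) form of the statistic, the 999 permutations, and the fixed seed are all assumptions made for illustration.

```python
import math
import random
import statistics


def two_sample_t(a, b):
    """Student's two-sample t-statistic using a pooled variance estimate.

    Assumes each group has at least two values and that the pooled
    variance is not zero.
    """
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * statistics.variance(a) +
                  (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def permutation_p_value(a, b, permutations=999, seed=0):
    """Two-sided p-value from randomly reassigning values to the two groups."""
    rng = random.Random(seed)
    observed = abs(two_sample_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        permuted = abs(two_sample_t(pooled[:len(a)], pooled[len(a):]))
        if permuted >= observed:
            hits += 1
    # Add one to both counts so the observed data is included as a permutation.
    return (hits + 1) / float(permutations + 1)


if __name__ == "__main__":
    # Made-up alpha diversity values for two groups of samples.
    group_1 = [1, 2, 3, 4, 5]
    group_2 = [2, 3, 4, 5, 6]
    print(two_sample_t(group_1, group_2))
    print(permutation_p_value(group_1, group_2))
```

The values in the usage example are made up; substitute the alpha diversity values for your own sample groups.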

Greg





Greengenes 12_10 is released!

16 10 2012

We’re pleased to announce that the Greengenes 12_10 release is live. This release takes Greengenes from 408k sequences to over 1 million. At the OTU level, Greengenes has grown from 35k 97% OTUs to 85k 97% OTUs.

The large increase in the number of reference sequences is particularly exciting for researchers working on less-well-characterized environments. For example, when performing reference-based OTU picking for the “88 soils” (Lauber et al, 2009) data, the number of reads assigned to the reference database rose from 53.7k when picking OTUs against the 4feb2011 OTUs to 66.4k when picking OTUs against the 12_10 OTUs. This means that we have reliable, detailed taxonomic information for many more of the OTUs, as well as a more reliable tree relating these OTUs (relative to a de novo constructed tree). We see a similar increase in the number of assigned OTUs, albeit less dramatic, in the Moving Pictures of the Human Microbiome (Caporaso et al, 2011) study, going from 18.9k with 4feb2011 to 20.0k with 12_10. It is likely that we will continue to see large gains in the number of OTUs that match Greengenes in less well-characterized environments with subsequent releases.
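
To put the figures quoted above on a common scale, the short Python sketch below converts each before/after pair into a relative increase. The percent_increase helper is hypothetical, and the inputs are simply the rounded counts (in thousands) from this post.

```python
def percent_increase(before, after):
    """Relative change from before to after, expressed as a percentage."""
    return (after - before) / before * 100.0


# Counts in thousands, as quoted in this post (4feb2011 -> 12_10).
soils = percent_increase(53.7, 66.4)            # "88 soils" data
moving_pictures = percent_increase(18.9, 20.0)  # Moving Pictures data

print("88 soils: %.1f%% increase" % soils)
print("Moving Pictures: %.1f%% increase" % moving_pictures)
```

This works out to an increase of roughly 24% for the 88 soils data and roughly 6% for the Moving Pictures data.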

We are also excited to announce that Greengenes is now protected under the Creative Commons Share-Alike license, permanently placing the resource in the public domain. In addition, a guiding body, the Greengenes Consortium, has been formed to direct the resource into the future. The Consortium represents academic and biotech interests with members from the Australian Centre for Ecogenomics at the University of Queensland, the BioFrontiers Institute at the University of Colorado, and Second Genome, Inc.

The release can be obtained from http://greengenes.secondgenome.com.

Due to the substantial increase in the size of Greengenes, we performed several comparisons of the OTUs and taxonomy against the prior Greengenes release to confirm that the results match what we expect. Specifically, we performed closed- and open-reference OTU picking on the Moving Pictures of the Human Microbiome (Caporaso et al, 2011) data and on the “88 soils” (Lauber et al, 2009) data. Procrustes analysis (performed with transform_coordinate_matrices.py and compare_3d_plots.py) shows a very strong concordance between the UniFrac PCoA plots resulting from both methods of OTU picking with both the 4feb2011 and 12_10 Greengenes OTUs. For example, comparing closed-reference OTU picking against the 4feb2011 Greengenes OTUs and the 12_10 Greengenes OTUs on the 88 soils data yielded a highly significant Procrustes result (M2=0.028, p<0.01; samples colored by soil pH):

Taxonomy summaries of the Moving Pictures samples are highly correlated as well (p<0.001; performed with compare_taxa_summaries.py), where the largest differences result from the reclassification of some Tenericutes sequences from 4feb2011 (top panel) as Firmicutes in 12_10 (bottom panel):

You can use the 12_10 Greengenes OTUs with QIIME in the same way as the 4feb2011 release. You can pass representative set fasta files for reference-based OTU picking (open-reference OTU picking discussed here and closed-reference OTU picking discussed here), or use the sequences and taxonomy files to retrain the RDP classifier as described here.

Enjoy!

Greg and Daniel





A repository for QIIME example input and output

26 07 2012

Hello QIIME users,

We’ve recently begun assembling a repository of example input and output files for use with QIIME. You can find this in our GitHub repository here:

https://github.com/qiime-dev/qiime_test_data

The repository has been moved to: https://github.com/qiime/qiime_test_data.

This repository currently contains example input and output for around 30% of QIIME’s scripts, and we’ll be continuing to add more example input and output over the next few months. You can find the example input and output for a script by looking up the script name in the repository at the link above. For example, the example input and output for filter_samples_from_otu_table.py are in the filter_samples_from_otu_table directory in the qiime_test_data repository. Note that the example input and output files correspond to the usage examples that you see when you call a script with its -h parameter or on the script’s documentation page (e.g., filter_samples_from_otu_table.py), so you can see exactly how to use the script and try it out for yourself.

This repository is designed to serve a few purposes. First, it’s an additional source of documentation. For example, if you’re trying to assemble data for use with QIIME you can review the corresponding example input to see what your files should look like. You can also use it for debugging: if a QIIME script is failing, you can test the script with the example data from the qiime_test_data repository to confirm that the script is working correctly with input that you know is valid. Next, the QIIME development group uses this for automated testing. In the repository you’ll see a script_usage_tests.py file. We run this (call it with the -h parameter to see how) to confirm that all of the script interfaces are currently working with example input. This has historically been hard for us to do in QIIME, and this new script testing framework will help keep QIIME more stable. This can also be run via Qiime/tests/all_tests.py by adding the qiime_test_data_dir to your .qiime_config file: in that case, every time you run all_tests.py, the interfaces for the scripts will be tested in addition to the full unit test suite.
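
As a rough illustration of the .qiime_config step, the Python sketch below reads key/value lines and pulls out qiime_test_data_dir. It assumes that each setting is written as a key followed by whitespace and a value on a single line; the parse_config_lines helper and the example path are hypothetical, so please check the QIIME configuration documentation for the authoritative format.

```python
def parse_config_lines(lines):
    """Parse key/value config lines into a dict (hypothetical helper).

    Assumption: each setting is "key<whitespace>value"; blank lines and
    lines starting with "#" are ignored; a key with no value maps to None.
    """
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        settings[fields[0]] = " ".join(fields[1:]) or None
    return settings


if __name__ == "__main__":
    # Example config content; the path below is a placeholder.
    example_config = [
        "# where the qiime_test_data repository was cloned",
        "qiime_test_data_dir /home/user/qiime_test_data",
    ]
    settings = parse_config_lines(example_config)
    print(settings.get("qiime_test_data_dir"))
```

If the key is missing or has no value, settings.get returns None, which is a convenient way to check whether the script interface tests would have a data directory to run against.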

Get in touch on the QIIME forum with any questions.

Greg







