Oral Presentation Australian Microbial Ecology 2019

Community profiling in the age of genomes (#48)

Ben J Woodcroft 1, Rhys Newell 1, Gene W Tyson 1
  1. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia

Microbial community profiling has long been a primary focus in microbial ecology, and is often performed by amplifying the 16S rRNA gene and analysing the amplicons with well-established bioinformatic routines. The increasing use of shotgun metagenomics to characterise communities presents particular challenges to researchers, because reads cannot be easily grouped together to form operational taxonomic units (OTUs) as amplicon sequences can be. However, shotgun metagenomics is a much richer source of information for community profiling than traditional amplicon studies. Here we present three separate but related research projects that concentrate on different aspects of this difficult class of bioinformatic problems.

CoverM is a user-friendly tool for calculating community profiles from metagenomes in the context of a reference set of genomes. It builds on established read mapping techniques and uses an efficient algorithm to determine several coverage statistics, e.g. the mean coverage of a genome, or the fraction of a genome covered by at least one read. Borrowing concepts from RNA-seq bioinformatics, we then explore how methods based on de Bruijn graphs can be used for profiling as an alternative to direct read mapping. These techniques can, theoretically at least, calculate community profiles with a resolution that is able to separate two genomes that differ by a single base pair. Finally, we present a scalable technique for community profiling in the absence of reference genomes, one that generates an OTU table similar in many ways to amplicon-based profiles. This software, SingleM, has been applied to ~10,000 publicly available environmental metagenomes and will be made available as a website where sequence types recovered from a researcher’s sample can be related to the increasingly vast set of public data.
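
The two coverage statistics named above (mean coverage of a genome, and the fraction of a genome covered by at least one read) can be illustrated with a minimal sketch. This is not CoverM's implementation or interface: the function names are hypothetical, read alignments are assumed to be given as 0-based half-open (start, end) intervals on a single genome, and a simple difference array stands in for whatever algorithm CoverM actually uses.

```python
from typing import Iterable, List, Tuple


def per_base_depth(genome_length: int,
                   alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Return the read depth at every position of one genome.

    Hypothetical helper: each alignment is a 0-based, half-open
    (start, end) interval. A difference array keeps the work
    proportional to genome length plus the number of alignments.
    """
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        # Clip alignments that overhang the genome ends.
        start = max(start, 0)
        end = min(end, genome_length)
        if start >= end:
            continue
        diff[start] += 1
        diff[end] -= 1

    depth = []
    running = 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_statistics(depth: List[int]) -> Tuple[float, float]:
    """Return (mean coverage, fraction of positions with depth >= 1)."""
    if not depth:
        raise ValueError("genome must have at least one position")
    mean_coverage = sum(depth) / len(depth)
    covered_fraction = sum(1 for d in depth if d > 0) / len(depth)
    return mean_coverage, covered_fraction


if __name__ == "__main__":
    # Toy genome of 10 bp with three aligned reads (assumed data).
    depth = per_base_depth(10, [(0, 4), (2, 6), (8, 10)])
    mean_coverage, covered_fraction = coverage_statistics(depth)
    print(depth)
    print(mean_coverage, covered_fraction)
```

In this toy case the two statistics disagree in a useful way: the mean coverage is 1.0, yet only 80% of positions are covered, because two reads overlap while positions 6 and 7 receive no reads at all.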

CoverM and SingleM are available at:

http://github.com/wwood/CoverM

http://github.com/wwood/SingleM