Recent advances in culture-independent techniques, such as whole genome shotgun (WGS) metagenomics and 16S rRNA amplicon sequencing, have dramatically changed the way we can examine microbial communities. But does the hype around the microbiome outweigh our potential to understand this ‘second genome’? There are many hurdles to tackle before we are able to identify and compare the bacteria driving changes in their ecosystem. In addition to the bioinformatics challenges, current statistical methods are limited in their ability to make sense of these complex data, which are inherently sparse, compositional and multivariate.
I will discuss some of the topical challenges in 16S and WGS data analysis, including the presence of confounding variables and batch effects, as well as some experimental design considerations. I will present our latest methodological developments for identifying multivariate microbial signatures using Projection to Latent Structures (PLS) dimension reduction methods, and our recent advances in data integration for microbiome data, including longitudinal data. Our methods are implemented in mixOmics, our R toolkit dedicated to the integration of biological (omics) data. I will illustrate these challenges and some proposed solutions on several microbial community studies from our network of collaborators.