Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Nice phylogenetic trees with ggtree

1 minute read

Published:

This is a simple tutorial on how to make a good-looking tree in R using the ggtree package. As an example, I will show how to draw the Archaeal Tree of Life, as calculated by the GTDB.

portfolio

publications

Operational Gene Clusters and intrinsic uncertainty in pangenome analyses

Published in BioRxiv, 2022

A key step for comparative genomics is to group open reading frames into functionally and evolutionarily meaningful gene clusters. Such clustering is complicated by intraspecific gene duplications and horizontal gene transfers, which are frequent in prokaryotes. In consequence, gene clustering methods must deal with a trade-off between identifying vertically transmitted representatives of multi-copy gene families (recognizable by synteny conservation) and retrieving complete sets of species-level orthologs. We studied the conceptual and practical implications of adopting homology, orthology, or synteny conservation as formal criteria for gene clustering by performing comparative analyses of 125 prokaryotic pangenomes. We found that clustering criteria affect pangenome functional characterization, core genome inference, and reconstruction of ancestral gene content to different extents. Pangenome and core genome sizes are affected by the same multiplicative factor in all species, which allows for robust across-species comparisons regardless of the clustering criterion. However, across-species comparisons of pangenome diversity, gene fluxes, and functional profiles are substantially affected by inconsistencies among clustering criteria. Such inconsistencies are driven not only by mobile genetic elements, but also by genes involved in defense, secondary metabolism, and other functions overrepresented in the accessory genome. In the case of pangenome diversity, the variability that can be attributed to methodological inconsistencies can exceed the effect sizes of ecological and phylogenetic variables. Our results emphasize the need to consider conceptual adequacy, not just computational performance, when designing workflows for pangenome analysis. We also provide a benchmarking dataset to assess the robustness and across-method reproducibility of future comparative studies.

Recommended citation: Manzano-Morales, Saioa, Yang Liu, Jaime Huerta-Cepas, and Jaime Iranzo. 2022. “Operational Gene Clusters and Intrinsic Uncertainty in Pangenome Analyses.” BioRxiv 2022.09.25.509376 https://www.biorxiv.org/content/10.1101/2022.09.25.509376v1.full.pdf

talks

teaching