A pipeline for assembling low copy nuclear markers from plant genome skimming data for phylogenetic use

Reginato, Marcelo

dc.contributor.author	Reginato, Marcelo	pt_BR
dc.date.accessioned	2024-04-12T06:19:44Z	pt_BR
dc.date.issued	2022	pt_BR
dc.identifier.issn	2167-8359	pt_BR
dc.identifier.uri	http://hdl.handle.net/10183/274659	pt_BR
dc.description.abstract	Background: Genome skimming is a popular method in plant phylogenomics that does not include a biased enrichment step, relying instead on random shallow sequencing of total genomic DNA. From these data the plastome is usually readily assembled and constitutes the bulk of the phylogenetic information generated in these studies. Despite a few attempts to use genome skims to recover low copy nuclear loci for direct phylogenetic use, this approach remains neglected. Causes might include the trade-off between libraries with few reads and species with large genomes (i.e., missing data caused by low coverage), but might also relate to the lack of pipelines for data assembly. Methods: A pipeline and its companion R package, designed to automate the recovery of low copy nuclear markers from genome skimming libraries, are presented. Additionally, a series of analyses evaluating the impact of key assembly parameters, reference selection and missing data is presented. Results: A substantial number of putative low copy nuclear loci were assembled and proved useful as a basis for phylogenetic inference across the libraries tested (4 to 11 times more data than previously assembled plastomes from the same libraries). Discussion: Critical aspects of assembling low copy nuclear markers from genome skims include the minimum coverage and depth required for a sequence to be used. More stringent values of these parameters reduce the amount of assembled data and increase the relative amount of missing data, which can compromise phylogenetic inference; in turn, relaxing the same parameters might increase sequence error. These issues are discussed in the text, and parameter tuning through multiple comparisons that track their effects on support and congruence is highly recommended when using this pipeline. The skimmingLoci pipeline (https://github.com/mreginato/skimmingLoci) might stimulate the use of genome skims to recover nuclear loci for direct phylogenetic use, increasing the power of genome skimming data to resolve phylogenetic relationships while reducing the amount of sequenced DNA that is commonly wasted.	en
dc.format.mimetype	application/pdf	pt_BR
dc.language.iso	eng	pt_BR
dc.relation.ispartof	PeerJ. Corte Madera. Vol. 10 (Dec. 2022), e14525, 25 p.	pt_BR
dc.rights	Open Access	en
dc.subject	Genome skimming	en
dc.subject	Phylogenetics	pt_BR
dc.subject	Low copy	en
dc.subject	Mapping reads	en
dc.subject	High-throughput sequencing	en
dc.subject	R package	en
dc.title	A pipeline for assembling low copy nuclear markers from plant genome skimming data for phylogenetic use	pt_BR
dc.type	Journal article	pt_BR
dc.identifier.nrb	001189352	pt_BR
dc.type.origin	Foreign	pt_BR
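
The trade-off described in the abstract, where stricter minimum coverage and depth thresholds retain less data and leave more missing cells, can be illustrated with a minimal Python sketch. This is not the skimmingLoci implementation (which is an R package). The class, function names, threshold values and toy numbers below are all hypothetical, and "coverage" is assumed here to mean the fraction of the reference covered by reads.

```python
from dataclasses import dataclass


@dataclass
class LocusAssembly:
    """One assembled locus in one library (hypothetical structure)."""
    name: str
    coverage: float  # assumed: fraction of the reference covered by reads (0-1)
    depth: float     # assumed: mean read depth over the covered positions


def filter_loci(loci, min_coverage, min_depth):
    """Keep only loci that meet both thresholds."""
    return [l for l in loci if l.coverage >= min_coverage and l.depth >= min_depth]


def summarize(samples, min_coverage, min_depth):
    """Return (loci in matrix, filled cells, fraction of missing cells)."""
    kept = {
        sample: {l.name for l in filter_loci(loci, min_coverage, min_depth)}
        for sample, loci in samples.items()
    }
    all_loci = set().union(*kept.values())
    total_cells = len(all_loci) * len(samples)
    filled = sum(len(names) for names in kept.values())
    missing = 1 - filled / total_cells if total_cells else 1.0
    return len(all_loci), filled, missing


# Toy data: two libraries, three candidate loci each (invented values).
samples = {
    "sp_A": [
        LocusAssembly("locus1", 0.95, 12.0),
        LocusAssembly("locus2", 0.60, 4.0),
        LocusAssembly("locus3", 0.80, 8.0),
    ],
    "sp_B": [
        LocusAssembly("locus1", 0.90, 10.0),
        LocusAssembly("locus2", 0.85, 9.0),
        LocusAssembly("locus3", 0.40, 2.0),
    ],
}

relaxed = summarize(samples, min_coverage=0.5, min_depth=3.0)
strict = summarize(samples, min_coverage=0.8, min_depth=8.0)
print("relaxed:", relaxed)
print("strict: ", strict)
```

With these toy values, the stricter thresholds fill fewer cells of the sample-by-locus matrix and raise the missing fraction. The opposite risk, that relaxed thresholds admit more sequence error, is not modeled here.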

Files in this item

Name: 001189352.pdf
Size: 3.837Mb
Format: PDF
Description: Full text (English)

This item is licensed under a Creative Commons License
