PREreview of “Phylogenomics, Biogeography, and a New Family-level Classification of Silversides, Rainbowfishes, and Allies (Teleostei: Atheriniformes)”

by Fabricius Domingos, Felipe Paiva, Rhamon de Castro Malheiro, Fernanda S. Caron, Amanda Varago, Matheus Salles and 1 other author of the DEVL Ecology & Evolution Club

Published: June 18, 2026
DOI: 10.5281/zenodo.20753016
License: CC BY 4.0

Overall assessment

This manuscript presents a substantial phylogenomic and biogeographic investigation of Atheriniformes, integrating a large exon-capture dataset with extensive taxonomic sampling from GenBank and providing one of the most comprehensive evolutionary frameworks currently available for the group. The study addresses long-standing questions regarding interfamilial relationships, diversification history, and historical biogeography. Particularly noteworthy are the broad taxonomic coverage, the explicit consideration of long-branch attraction, and the attempt to incorporate geographically explicit biogeographic models. The resulting phylogenetic hypotheses and proposed taxonomic revisions will undoubtedly stimulate further research on the evolution of Atheriniformes.

We view this as a valuable and timely contribution that substantially advances our understanding of a challenging and historically contentious group. We are confident that the manuscript will attract considerable interest from the systematic and evolutionary biology communities and will find an appropriate publication venue after revision. Our comments are intended to strengthen the manuscript and improve the transparency of several analytical decisions that are central to the authors’ conclusions.

Minor comments

Writing and presentation

The manuscript is generally well written, logically organized, and easy to follow. However, we recommend carefully revising verb tenses throughout the Results section, where occasional shifts between tenses reduce consistency.
The title accurately reflects the content of the manuscript and clearly communicates its scope. Nevertheless, it is somewhat descriptive and does not fully capture some of the most compelling aspects of the study, particularly the biogeographic findings and the proposed taxonomic revision, which are highlighted more effectively in the abstract.

Introduction

The manuscript is not structured around a single explicit hypothesis or prediction, but rather around a series of systematic, biogeographic, and taxonomic objectives. This is entirely reasonable given the scope of the study, although stating these objectives more explicitly may help readers understand the rationale behind the analyses.
The biogeographic component represents one of the major strengths of the manuscript. However, the Introduction would benefit from a more detailed explanation of the conceptual basis of the DEC model and why geographically explicit models are expected to outperform traditional binary marine/freshwater coding schemes in this system.
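To illustrate the conceptual basis we have in mind, the anagenetic part of DEC can be written as a continuous-time rate matrix over geographic ranges, with range expansion governed by a dispersal rate d and range contraction by a local extinction rate e. The minimal Python sketch below is an illustration under stated assumptions, not BioGeoBEARS code: area codes and rates are hypothetical, the null range is excluded, the cladogenetic (range-inheritance) component of DEC is omitted, and, following our reading of the original DEC formulation, the rate of adding area j is summed over the occupied source areas (the optional `multiplier` argument is a hypothetical stand-in for dispersal multipliers).

```python
from itertools import combinations


def ranges(areas, max_size):
    """All non-empty ranges of up to max_size areas, as frozensets."""
    out = []
    for k in range(1, max_size + 1):
        out.extend(frozenset(c) for c in combinations(areas, k))
    return out


def dec_rate_matrix(areas, d, e, max_size=None, multiplier=None):
    """Anagenetic DEC-style rate matrix over non-empty ranges.

    The null range is omitted for simplicity, so single-area ranges cannot
    go extinct here. Expansion r -> r + {j} occurs at d times the sum of
    multiplier[i][j] over source areas i in r (default multiplier 1.0);
    contraction r -> r - {j} occurs at rate e for ranges of two or more areas.
    """
    max_size = max_size or len(areas)
    states = ranges(areas, max_size)
    index = {s: n for n, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for r in states:
        for j in areas:
            if j not in r and len(r) < max_size:
                w = sum((multiplier or {}).get(i, {}).get(j, 1.0) for i in r)
                q[index[r]][index[r | {j}]] += d * w
            if j in r and len(r) > 1:
                q[index[r]][index[r - {j}]] += e
        q[index[r]][index[r]] = -sum(q[index[r]])  # rows of a rate matrix sum to zero
    return states, q


if __name__ == "__main__":
    # Hypothetical areas: M = marine, F = freshwater; rates are illustrative.
    states, q = dec_rate_matrix(["M", "F"], d=0.1, e=0.05)
    for s, row in zip(states, q):
        print("".join(sorted(s)), row)
```

Setting a multiplier to 0 for a pair of areas removes direct dispersal between them, which is one way geographic constraints enter such models.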
Given the complex taxonomic history of Atheriniformes and the proposed changes to family-level classification, an additional figure summarizing previous classifications and major phylogenetic hypotheses could substantially improve accessibility for non-specialist readers (e.g., Crawford et al. 2012).

Methods

The overall methodological framework is appropriate and well aligned with the objectives of the study.
The section describing the BioGeoBEARS analyses would benefit from additional detail. In particular, it is unclear whether alternative biogeographic models were formally compared before selecting DEC as the preferred framework.
The lineage-through-time plot in the Results is a valuable addition, but neither the corresponding methodology nor its purpose is described in the Methods section. Please consider updating the Methods accordingly.
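As an illustration of what the Methods could specify (which dated tree was used and how lineages were counted), the counts underlying a lineage-through-time plot can be derived from the node ages of an ultrametric, fully bifurcating tree. The Python sketch below assumes node ages in Ma and uses hypothetical values; ties and polytomies are not treated specially.

```python
def ltt_steps(node_ages):
    """(age, lineage count) pairs for an ultrametric, fully bifurcating tree.

    node_ages: ages (Ma) of all internal nodes, root included. Moving from the
    root toward the present, each node splits one lineage into two, so the
    count just after the k-th oldest node is k + 1.
    """
    ordered = sorted(node_ages, reverse=True)
    return [(age, k + 1) for k, age in enumerate(ordered, start=1)]


def lineages_at(age, node_ages):
    """Number of lineages alive at a given age (younger than the root)."""
    return 1 + sum(node_age > age for node_age in node_ages)


if __name__ == "__main__":
    hypothetical_node_ages = [95.0, 70.2, 64.8, 41.5, 12.3]  # Ma, illustrative only
    for age, n in ltt_steps(hypothetical_node_ages):
        print(f"{age:6.1f} Ma: {n} lineages")
```

Plotting the logarithm of the lineage count against age gives the usual semi-log LTT curve; stating whether the plot summarizes a single tree or a posterior sample of trees would also help readers interpret it.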
Some methodological details currently occupying substantial space could potentially be moved to supplementary materials (e.g., details on calibration points for divergence time estimation), allowing greater emphasis on the biogeographic analyses and model-selection procedures.

Results and figures

Several figures could be reorganized to improve readability. In particular, some phylogenetic trees currently presented in the main text may be better suited for supplementary materials.
- The rationale for selecting specific phylogenetic trees for downstream analyses should be made more explicit.
- The interpretation of the lineage-through-time analysis in Figure 3B would benefit from further explanation.
- The organization of Figures 3 and 4 is somewhat confusing, particularly because portions of one figure appear to overlap conceptually with the other.
- Figure legends should more clearly indicate which topology was selected for each analysis and why that particular topology was preferred.
Whenever geological periods or eras are discussed, approximate numerical ages should also be provided to improve accessibility for readers outside historical biogeography and palaeontology.

Discussion and taxonomic conclusions

The Discussion generally places the findings in an appropriate evolutionary context and acknowledges several limitations of the phylogenetic analyses.
However, the manuscript lacks a dedicated conclusion section summarizing the principal findings and broader implications of the study.
Because the proposed family-level reclassification is one of the most significant outcomes of the manuscript, the rationale supporting the taxonomic revision could be synthesized more explicitly. Briefly summarizing the major changes, and the evidence supporting each, at the beginning of the taxonomic section would help readers.
The status of Bleheratherina as incertae sedis should be further explained and justified, as readers may not immediately understand why this taxon could not be confidently assigned within the proposed classification.

Major comments

BioGeoBEARS analyses and model justification

The biogeographic component is one of the most innovative and potentially impactful aspects of the manuscript. The explicit incorporation of multiple marine and freshwater regions represents a substantial improvement over traditional binary habitat coding and provides an elegant framework for investigating the evolutionary history of habitat transitions.

However, the methodological justification for the choice of DEC requires further development. The manuscript states that DEC was selected as the best-fitting model, but it remains unclear whether alternative BioGeoBEARS models were formally evaluated and compared. Because model choice can strongly influence ancestral-range reconstructions and inferred numbers of habitat transitions, readers would benefit from a clearer description of the model-selection procedure and the criteria used to justify DEC over competing alternatives.
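A minimal sketch of the kind of comparison we have in mind, assuming the maximized log-likelihood and number of free parameters of each fitted model are available: compute AICc for every model and convert the scores into Akaike weights. The log-likelihoods below are hypothetical placeholders, not results from the manuscript, and taking the sample size n as the number of tips is our assumption.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k free parameters."""
    return -2.0 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Convert AIC/AICc scores (dict: model -> score) into relative weights."""
    best = min(scores.values())
    rel = {model: math.exp(-0.5 * (s - best)) for model, s in scores.items()}
    total = sum(rel.values())
    return {model: r / total for model, r in rel.items()}


if __name__ == "__main__":
    n_tips = 200  # hypothetical
    # model: (maximized log-likelihood, number of free parameters); placeholders
    fits = {"DEC": (-500.0, 2), "DIVALIKE": (-510.0, 2), "BAYAREALIKE": (-540.0, 2)}
    scores = {m: aicc(lnl, k, n_tips) for m, (lnl, k) in fits.items()}
    for model, w in sorted(akaike_weights(scores).items(), key=lambda kv: -kv[1]):
        print(f"{model:12s} AICc={scores[model]:.2f} weight={w:.3f}")
```

Reporting such a table (log-likelihood, number of parameters, AICc, and weight for every model fitted) would let readers judge how decisively DEC is preferred.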

More broadly, given that the central biogeographic conclusions depend heavily on these reconstructions, greater methodological detail regarding area definitions, dispersal constraints, and model comparison would substantially strengthen the manuscript. We encourage the authors to expand this section and provide sufficient information to allow readers to evaluate the robustness of the biogeographic inferences.
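Area definitions and the maximum range size also determine the size of the state space that DEC-type models must handle, which is one reason they deserve explicit reporting. A short Python sketch of that count, assuming every combination of up to `max_range_size` areas is allowed (that is, before any adjacency or dispersal constraints remove ranges) and optionally counting the null range; the area counts are hypothetical:

```python
from math import comb


def n_ranges(n_areas, max_range_size, include_null=True):
    """Number of geographic ranges (model states) for n_areas areas."""
    states = sum(comb(n_areas, k) for k in range(1, max_range_size + 1))
    return states + (1 if include_null else 0)


if __name__ == "__main__":
    # Hypothetical: 8 marine and freshwater areas under different range-size caps.
    for cap in (2, 3, 8):
        print(cap, n_ranges(8, cap))
```

With eight areas, allowing all ranges gives 256 states (null range included), compared with 93 when ranges are capped at three areas; choices of this kind affect both run time and which ancestral ranges can be reconstructed at all.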

Divergence-time estimation: methodological choices require stronger justification

We encourage the authors to provide a more detailed justification for the divergence-time analyses.

First, it is unclear why only twelve loci were selected for dating and why loci with the lowest proportion of missing data were considered the most appropriate choice. Missing-data levels alone are not necessarily informative about a locus's suitability for molecular dating. Loci with fewer missing sites are not inherently more clock-like, more informative, or better fit by relaxed-clock models. Likewise, a lower proportion of missing data does not imply a greater number of informative sites (e.g., variable sites, parsimony-informative sites, or other measures of phylogenetic information content). A potentially more defensible strategy would be to select loci based on clock-likeness metrics, using approaches such as SortaDate or similar methods specifically designed for divergence-time analyses.

Second, the dataset appears to have been reduced to alleviate computational burden. While this is a reasonable practical consideration, the current literature generally suggests that the number and quality of loci matter more for divergence-time estimation than the number of sampled taxa. An alternative strategy involving fewer representatives per major lineage but substantially more loci may therefore have provided a more robust temporal framework.
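The distinction drawn above between missing data, information content, and clock-likeness can be made concrete with a minimal Python sketch. It assumes aligned, equal-length nucleotide sequences (dict: taxon -> sequence), treats '-', '?' and 'N' as missing, and takes root-to-tip distances from an already rooted gene tree as input. Root-to-tip variance is, to our understanding, one of the quantities SortaDate uses to rank loci, but this is not SortaDate's code, and the toy loci are hypothetical.

```python
from statistics import pvariance

MISSING = set("-?N")


def missing_fraction(alignment):
    """Proportion of missing cells in an alignment (dict: taxon -> sequence)."""
    cells = "".join(alignment.values()).upper()
    return sum(c in MISSING for c in cells) / len(cells)


def parsimony_informative_sites(alignment):
    """Columns with at least two states, each present in at least two taxa."""
    count = 0
    for column in zip(*(s.upper() for s in alignment.values())):
        states = [c for c in column if c not in MISSING]
        if sum(states.count(s) >= 2 for s in set(states)) >= 2:
            count += 1
    return count


def root_to_tip_variance(root_to_tip):
    """Variance of root-to-tip distances (dict: tip -> distance); lower is more clock-like."""
    return pvariance(list(root_to_tip.values()))


if __name__ == "__main__":
    # Hypothetical loci: locus_a is complete but nearly invariant,
    # locus_b has gaps but carries more phylogenetic signal.
    locus_a = {"t1": "ACGTACGT", "t2": "ACGTACGT", "t3": "ACGTACGT", "t4": "ACGTACGA"}
    locus_b = {"t1": "ACGTAC--", "t2": "ACGTAC--", "t3": "TTGTACGT", "t4": "TTGTACGT"}
    for name, aln in (("locus_a", locus_a), ("locus_b", locus_b)):
        print(name, missing_fraction(aln), parsimony_informative_sites(aln))
```

In this toy case the locus with more missing data carries more parsimony-informative sites, which is exactly the situation a missing-data filter alone would overlook.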

Related to this issue, the relationship between the selected loci and the constrained species-tree topologies deserves further consideration. If the gene trees associated with these twelve loci differ substantially from the species-tree topology used to constrain the analyses, this discrepancy could potentially affect convergence and parameter estimation. Given that only a small subset of loci was retained for dating, readers would benefit from a clearer assessment of how representative these loci are relative to the phylogenomic dataset as a whole and whether alternative selection criteria were considered.
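One simple way to quantify how far each dating locus departs from the constraint topology is a Robinson-Foulds-type distance. The sketch below computes the rooted version (the number of clusters present in one tree but not the other), assuming trees written as nested Python tuples over identical taxon sets; in practice, gene trees would first be pruned to shared taxa, and an established phylogenetics library would normally be used instead. The trees shown are hypothetical.

```python
def clusters(tree):
    """Non-trivial clusters (taxon sets below internal nodes) of a rooted tree.

    Trees are nested tuples of taxon names, e.g. (("A", "B"), ("C", "D")).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(walk(child) for child in node))
        found.add(taxa)
        return taxa

    found.discard(walk(tree))  # the root cluster is shared by every tree
    return found


def rooted_rf(tree1, tree2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clusters."""
    return len(clusters(tree1) ^ clusters(tree2))


if __name__ == "__main__":
    species_tree = ((("A", "B"), "C"), ("D", "E"))  # hypothetical constraint topology
    gene_tree = ((("A", "C"), "B"), ("D", "E"))     # hypothetical dating locus
    print(rooted_rf(species_tree, gene_tree))
```

Reporting this distance, or a normalized form of it, for each of the twelve loci would show how representative they are of the species-tree topology.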

The use of RelTime in the dating analyses is also not entirely clear from the current description. The manuscript states that RelTime was used to generate ultrametric starting trees satisfying the fossil calibrations, but it is difficult to understand precisely how these trees were incorporated into the subsequent BEAST analyses and why this approach was preferred over alternative strategies. Additional methodological detail would improve reproducibility and help readers understand the role of RelTime within the analytical pipeline. More broadly, if RelTime was used solely to generate starting trees compatible with the calibration scheme, the authors should state this explicitly and explain the rationale behind this decision. If the procedure was adopted to facilitate convergence, it would be useful to clarify whether alternative starting conditions were explored and whether convergence diagnostics suggested any sensitivity to the initial tree. There is also a risk that starting trees similar to the imposed topological constraint (the species tree) might limit the ability of the MCMC chains to explore the posterior distribution and result in suboptimal estimates; this possibility should be addressed.
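Sensitivity to the starting tree could be checked by running independent chains from contrasting starting trees (for example, the RelTime-derived tree and a random or differently derived tree) and comparing them parameter by parameter. A minimal Python sketch of the basic, uncorrected form of the Gelman-Rubin potential scale reduction factor is given below; it assumes that post-burn-in samples of one parameter, such as the age of a focal node, have already been extracted from each chain, and this input format is our assumption.

```python
from statistics import mean, variance


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post-burn-in samples, one per chain.
    Values close to 1 are consistent with the chains sampling the same distribution.
    """
    if len(chains) < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    within = mean(variance(c) for c in chains)         # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])  # B: between-chain variance
    pooled = (n - 1) / n * within + between / n        # pooled variance estimate
    return (pooled / within) ** 0.5
```

Node ages whose PSRF stays well above 1 across chains started from different trees would indicate precisely the sensitivity to the initial tree discussed above; effective sample sizes, as reported by tools such as Tracer, are a complementary check.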

While these concerns do not necessarily invalidate the dating results, we believe that a more detailed justification of the analytical choices is necessary to allow readers to properly evaluate the robustness of the temporal framework presented in the manuscript.

Tree selection, phylogenetic uncertainty, and analytical decision-making

One of the strengths of this study is the extensive exploration of phylogenetic uncertainty through nineteen separate analyses. However, the criteria used to determine which topologies should be preferred for downstream analyses remain unclear.

A fundamental difficulty is that these analyses were conducted using different datasets and analytical configurations, making direct statistical comparisons difficult or impossible. Ideally, alternative topologies would be evaluated within a common statistical framework, but this is not feasible when the underlying datasets differ substantially. As a result, it remains unclear how the authors determined which trees should be preferred for biogeographic reconstruction, divergence-time estimation, and taxonomic interpretation. Based on Figure 2 and the subsequent analyses, preference for particular relationships appears to be influenced largely by the frequency with which they are recovered across analyses. While congruence among analyses is certainly informative, a majority-vote approach is not necessarily equivalent to statistical support. More explicit criteria for topology selection would therefore improve transparency and reproducibility.
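One way to make the selection criteria explicit is to report, for each contested relationship, both how often it is recovered across the nineteen analyses and how strongly it is supported where it is recovered. The Python sketch below assumes a hypothetical input format (analysis name mapped to clade-to-support values) and an illustrative support threshold of 95; neither is taken from the manuscript.

```python
def summarize_clade(analyses, clade, threshold=95.0):
    """Recovery frequency, mean support, and count of well-supported recoveries.

    analyses: dict analysis_name -> dict mapping frozenset(clade) -> support.
    threshold: illustrative cutoff for "strong" support (an assumption).
    """
    recovered = [clades[clade] for clades in analyses.values() if clade in clades]
    frequency = len(recovered) / len(analyses)
    mean_support = sum(recovered) / len(recovered) if recovered else 0.0
    strong = sum(support >= threshold for support in recovered)
    return frequency, mean_support, strong


if __name__ == "__main__":
    x_y = frozenset({"X", "Y"})  # hypothetical taxa
    analyses = {
        "analysis_1": {x_y: 60.0},
        "analysis_2": {x_y: 70.0},
        "analysis_3": {frozenset({"X", "Z"}): 100.0},
    }
    print(summarize_clade(analyses, x_y))
```

In this toy case, X+Y wins the "vote" (two of three analyses) without ever being strongly supported, which illustrates why congruence and statistical support should be reported separately.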

More importantly, the results suggest that extremely short internal branches near the base of the tree are the primary source of phylogenetic uncertainty. The authors attempted to address this issue through the exclusion of long-branched taxa. While this is a reasonable exploratory strategy from a systematic perspective, it is less clear that it directly addresses the underlying causes of conflict from a phylogenetic inference standpoint. In particular, the manuscript would benefit from a more explicit discussion of why long-branch exclusion was preferred over alternative approaches designed to identify and mitigate sources of topological instability. For example, identifying rogue taxa using dedicated algorithms such as RogueNaRok may help determine whether a small number of problematic terminals are disproportionately influencing topology estimation. Such approaches directly target instability in tree inference and may provide a more informative framework for evaluating conflicting relationships.
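RogueNaRok itself optimizes support in consensus trees, and that procedure is not reproduced here. As a simpler, complementary illustration of how per-taxon instability can be quantified across alternative topologies, the Python sketch below scores each taxon by how much its path lengths to all other taxa change between trees, loosely modeled on the taxon instability index available in Mesquite. It assumes rooted trees written as nested tuples over identical taxon sets, measures distances in edges on the trees as written, and uses hypothetical trees.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf to its sequence of child indices from the root."""
    if isinstance(tree, str):
        return {tree: path}
    out = {}
    for i, child in enumerate(tree):
        out.update(leaf_paths(child, path + (i,)))
    return out


def path_length(p, q):
    """Number of edges between two leaves, given their root-to-leaf index paths."""
    shared = 0
    for a, b in zip(p, q):
        if a != b:
            break
        shared += 1
    return (len(p) - shared) + (len(q) - shared)


def instability(trees):
    """Instability score per taxon across trees sharing the same taxa."""
    paths = [leaf_paths(t) for t in trees]
    taxa = sorted(paths[0])
    score = {x: 0.0 for x in taxa}
    for p1, p2 in combinations(paths, 2):
        for x in taxa:
            for y in taxa:
                if y == x:
                    continue
                d1, d2 = path_length(p1[x], p1[y]), path_length(p2[x], p2[y])
                score[x] += abs(d1 - d2) / (d1 + d2) ** 2
    return score


if __name__ == "__main__":
    # Hypothetical trees in which taxon R changes position.
    trees = [((("A", "B"), "R"), ("C", "D")), (("A", "B"), (("C", "R"), "D"))]
    print(sorted(instability(trees).items(), key=lambda kv: -kv[1]))
```

Taxa with markedly higher scores would be natural candidates for the kind of targeted pruning experiments that dedicated tools such as RogueNaRok formalize.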

Similarly, evaluating substitutional saturation across the complete dataset (particularly at third codon positions) could provide important insights into the source of the observed instability. If evidence of saturation is detected, excluding third codon positions from the analyses may improve recovery of deep relationships and help alleviate the difficulties associated with short internal branches. This approach has been successfully applied in large-scale phylogenomic studies facing similar challenges, such as the avian phylogeny of Jarvis et al. (2014), where alternative coding schemes and the exclusion of potentially saturated signals contributed to resolving deep divergences. Because the deepest nodes in the present study appear to be characterized by short internodes and conflicting signals, exploring saturation explicitly may prove particularly informative.
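A crude first look at saturation compares uncorrected p-distances with model-corrected distances separately for each codon position: as substitutions accumulate, p-distances level off while corrected distances keep increasing, so the slope of p on the corrected distance falls below 1. The Python sketch below uses the Jukes-Cantor (JC69) correction purely for simplicity (distances under the substitution model used in the analyses would be more appropriate in practice), assumes aligned, in-frame coding sequences, and is not a formal saturation test such as Xia's index of substitution saturation.

```python
import math
from itertools import combinations


def p_distance(s1, s2):
    """Uncorrected proportion of differing sites, skipping gaps and ambiguities."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs) if pairs else math.nan


def jc69_distance(p):
    """Jukes-Cantor corrected distance; infinite once p reaches 0.75."""
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def codon_position(seq, pos):
    """Sites at codon position pos (1, 2 or 3) of an in-frame coding sequence."""
    return seq[pos - 1::3]


def saturation_slope(alignment, pos):
    """Least-squares slope (through the origin) of p-distance on JC69 distance."""
    xs, ys = [], []
    for s1, s2 in combinations(alignment.values(), 2):
        p = p_distance(codon_position(s1, pos), codon_position(s2, pos))
        d = jc69_distance(p)
        if d > 0 and math.isfinite(d):  # skip identical, undefined, or saturated pairs
            xs.append(d)
            ys.append(p)
    if not xs:
        return math.nan
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

A markedly lower slope at third positions than at first and second positions, particularly among the most divergent pairs, would support exploring analyses that exclude or recode third positions.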

More generally, these approaches may help identify an analytical strategy that better fits the underlying properties of the dataset. Such a strategy could potentially improve the fit between model and data, provide a clearer basis for selecting among competing topologies, and reduce reliance on comparisons among numerous alternative analyses that cannot be formally discriminated using a common statistical framework.

This issue becomes particularly important because the proposed family-level classification depends on relationships inferred from precisely those regions of the tree where support is weakest and topological instability is greatest. Therefore, a more extensive discussion of short deep branches, phylogenetic uncertainty, and the criteria used to select preferred topologies would substantially strengthen confidence in the taxonomic conclusions.

This review was developed collaboratively by members of the DEVL (Diversity and Evolution Laboratory) PREREVIEW Club during a group peer-review exercise intended to train graduate research students. Reviewers: Amanda Varago, Fabricius Domingos, Felipe José Batista, Fernanda Caron, Júnior Nadaline, Matheus Salles, and Rhamon Malheiro.

Competing interests

The authors declare that they have no competing interests.

Use of Artificial Intelligence (AI)

The authors declare that they did not use generative AI to come up with new ideas for their review.
