
Write a PREreview

Fine-Grained Taxonomy with Vision Models: A Benchmark on Long-Tailed and Domain-Adaptive Classification

Posted
Server: Preprints.org
DOI: 10.20944/preprints202507.1714.v1

Ground beetles are a highly sensitive and speciose biological indicator, critical for biodiversity monitoring, yet their taxonomic classification remains underutilized because differentiating species by subtle morphological variations requires substantial manual effort. In this paper, we present a benchmark for fine-grained taxonomic classification, evaluating 12 vision models across four diverse, long-tailed datasets spanning over 230 genera and 1769 species. These datasets include both controlled laboratory images and challenging field-collected (in-situ) photographs. We investigate two key real-world challenges: sample efficiency and domain adaptation. Our results show that 1) a Vision and Language Transformer with an MLP head achieves the best performance, with 97% genus-level and 94% species-level accuracy; 2) efficient subsampling allows the training data to be cut in half with minimal performance degradation; 3) model performance drops significantly under domain shift from lab to in-situ settings, highlighting a critical domain gap. Overall, our study lays a foundation for scalable, fine-grained taxonomic classification of beetles and supports broader applications in sample-efficient and domain-adaptive learning for ecological computer vision.

You can write a PREreview of Fine-Grained Taxonomy with Vision Models: A Benchmark on Long-Tailed and Domain-Adaptive Classification. A PREreview is a review of a preprint and can vary from a few sentences to a lengthy report, similar to a journal-organized peer-review report.

Before you start

We will ask you to log in with your ORCID iD. If you don’t have an iD, you can create one.

What is an ORCID iD?

An ORCID iD is a unique identifier that distinguishes you from everyone with the same or similar name.
