Write a PREreview

Rice Grain Classification Using Vision Transformer (ViT) Architecture

by Shubham Singh, Sudeep Marwah, Rahul Neware, and Akash Hosamani

Posted: October 28, 2025
Server: Preprints.org
DOI: 10.20944/preprints202510.1976.v1

Global food security, precise market pricing, and efficient quality control are all hampered by the subjective, labor-intensive, and error-prone nature of traditional manual classification techniques for rice, a staple commodity. The growing need for automated, non-destructive, and effective methods for rice variety identification has prompted much research into machine vision and deep learning technologies. Despite their notable achievements in this field, Convolutional Neural Networks (CNNs) frequently struggle to capture long-range relationships and to generalize well across rice types, some of which are visually similar and others distinct; this remains a persistent problem. This research proposes a novel method for automated rice type detection based on Vision Transformer (ViT) models. Compared with conventional CNN architectures, ViTs are well regarded for their capacity to model global dependencies, and they consistently deliver competitive, and frequently better, performance in challenging image classification tasks. The proposed ViT-based approach is intended to overcome the inherent difficulty of distinguishing fine details, such as specific morphological and color traits, among different rice varieties. Through its self-attention mechanism, the model is set up for effective feature extraction and reliable pattern learning directly from image data, without the need for extensive pre-processing of raw images. The goal of this research is to create a reliable and highly accurate classification system for a variety of rice types, taking inspiration from prior works that report high classification accuracies, such as RiceSeedNet, which achieved 97% for 13 rice seed variants and 99% for 8 rice grain varieties. The successful implementation of this Vision Transformer model is anticipated to significantly enhance precision agriculture by providing a more reliable, consistent, and scalable solution for the identification of rice seeds and grains, thereby supporting farmers and the broader agricultural industry in ensuring product quality and contributing to global food security.
