Multimodal and Multilingual Fake News Detection using MuRIL and Vision Transformers with Explainable AI
- Posted on Preprints.org
- DOI: 10.20944/preprints202602.0333.v1
The exponential rise of digital media has democratized information access but has concurrently fostered an "infodemic" of fake news, particularly in linguistically diverse and rapidly digitizing regions such as India. Traditional fake news detection systems predominantly target high-resource languages such as English and often operate as uninterpretable "black boxes," failing to address the nuances of code-mixed regional content and multimodal (text and image) disparate information. This paper proposes a robust Multimodal and Multilingual Fake News Detection System that uniquely integrates the MuRIL (Multilingual Representations for Indian Languages) transformer for context-aware text analysis with Vision Transformers (ViT) for granular image feature extraction. Unlike conventional approaches that use CNNs for visual data, our system leverages ViT to capture global dependencies in images. We implement a novel cross-attention fusion mechanism to dynamically align textual and visual features. Furthermore, to enhance trust and transparency, we integrate Explainable AI (XAI) modules: SHAP for textual attributions and Grad-CAM for visual saliency. Experimental evaluations on a comprehensive dataset of Indian news demonstrate that our architecture achieves an accuracy of approximately 82%, significantly outperforming unimodal baselines, while providing actionable, human-readable explanations for its decisions.
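The cross-attention fusion described above can be illustrated with a minimal sketch. This is not the authors' implementation: the projection matrices here are random stand-ins for learned weights, and the `text_feats`/`image_feats` arrays simulate MuRIL token embeddings and ViT patch embeddings. The sketch only shows the core operation of text queries attending over image keys and values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_feats, image_feats, d_k=64, seed=0):
    """Fuse text and image features via cross-attention.

    text_feats:  (T, d) token embeddings (stand-in for MuRIL output)
    image_feats: (P, d) patch embeddings (stand-in for ViT output)
    Returns (T, d_k) image-conditioned text representations.
    """
    rng = np.random.default_rng(seed)
    d = text_feats.shape[1]
    # Random projections standing in for learned W_q, W_k, W_v.
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q = text_feats @ W_q      # queries come from the text tokens
    K = image_feats @ W_k     # keys come from the image patches
    V = image_feats @ W_v
    # (T, P) alignment scores: how much each token attends to each patch.
    attn = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    return attn @ V
```

In a full model, the fused output would typically be pooled and passed to a classification head; here the point is only that the attention map gives a per-token, per-patch alignment that can also be inspected for explainability.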