The emergence of 6G wireless networks promises to revolutionize vehicular communications by enabling ultra-reliable, low-latency, high-capacity data exchange. In this context, collaborative perception techniques, in which multiple vehicles or infrastructure nodes cooperate to jointly receive and decode transmitted signals, aim to enhance reliability and spectral efficiency for Connected Autonomous Vehicle (CAV) applications. In this paper, we propose an end-to-end wireless neural receiver based on a Differential Transformer architecture, tailored for 6G vehicle-to-everything (V2X) communication and focused on enabling collaborative perception among CAVs. Our model integrates key components of the 6G physical layer and is designed to boost performance in dynamic and challenging autonomous driving environments. We validate the proposed system across a range of scenarios, including the 3GPP-defined Urban Macro (UMa) channel. To assess the model's real-world applicability, we evaluate its robustness within a V2X framework: in a collaborative perception scenario, our system processes heterogeneous LiDAR and camera data from four connected vehicles in dynamic cooperative vehicular networks. The results show significant improvements over state-of-the-art methods, with an average precision of 0.84, and highlight the potential of the proposed approach to enable robust, intelligent, and adaptive wireless cooperation for next-generation CAVs.