Write a PREreview

Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management

by Tsuyoshi Okita

Published: May 12, 2026
Server: Preprints.org
DOI: 10.20944/preprints202605.0736.v1

Multimodal reinforcement learning agents must fuse signals with vastly different noise profiles—yet existing architectures, whether monolithic (π0, DreamerV3) or modular (MSDP, VTDexManip), allow noise from unreliable modalities to contaminate reliable ones at the point of fusion. We propose filter-before-mixing: each modality’s representation is independently refined by a per-modality Flow Matching module before spectral-domain fusion via a Fourier Neural Operator (FNO), with a residual gate ensuring that refinement is never harmful. The resulting architecture, FreamerV1 (Filter-before-mixing dreamer), has 93M parameters (0.4M trainable). On MiniGrid, FreamerV1 reaches 100% success at 5000 episodes, surpassing the 94% encoder-only baseline which degrades to 78% due to catastrophic forgetting. On Crafter (no language modality), it scores 16.0%, exceeding DreamerV3 (14.5%). On PAMAP2 wearable sensors—where no pre-trained encoder exists—the foundation encoder achieves 2.4× higher reward and 16× lower variance than a vanilla MLP, confirming that the filter-before-mixing advantage grows with encoder noise.
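
For readers preparing a review, the pipeline described in the abstract (per-modality refinement, a residual gate, then spectral-domain fusion) can be pictured with the minimal sketch below. It is not the author's implementation: `refine`, `gated_residual`, and `spectral_fuse` are hypothetical names, the toy velocity field stands in for a learned Flow Matching module, and plain averaging of low Fourier modes stands in for a learned Fourier Neural Operator layer. The gate values are fixed by hand here, whereas a real model would presumably learn them.

```python
import numpy as np

def refine(x, steps=4):
    """Stand-in for a per-modality refiner: Euler-integrate a toy velocity
    field that pulls each feature toward the vector's mean (an assumption,
    not the paper's learned Flow Matching module)."""
    z = np.array(x, dtype=float)
    for _ in range(steps):
        velocity = z.mean(axis=-1, keepdims=True) - z  # hypothetical field
        z = z + velocity / steps
    return z

def gated_residual(x, refined, gate):
    """Blend original and refined features; gate=0 returns x unchanged,
    gate=1 returns the refined features."""
    g = np.clip(gate, 0.0, 1.0)
    return x + g * (refined - x)

def spectral_fuse(features, modes=4):
    """Toy spectral fusion: keep the lowest `modes` Fourier modes of each
    modality and average them (a real FNO would apply learned weights)."""
    stacked = np.stack(features)               # (modalities, dim)
    spec = np.fft.rfft(stacked, axis=-1)       # (modalities, dim // 2 + 1)
    spec[:, modes:] = 0                        # discard high-frequency modes
    fused = spec.mean(axis=0)                  # mix modalities per mode
    return np.fft.irfft(fused, n=stacked.shape[-1])

# Hypothetical inputs: a clean "vision" vector and a noisier "sensor" vector.
rng = np.random.default_rng(0)
inputs = {"vision": rng.normal(size=16), "sensor": 5.0 * rng.normal(size=16)}
gates = {"vision": 0.0, "sensor": 1.0}         # hand-set for illustration

# Filter each modality independently, then mix.
filtered = {name: gated_residual(x, refine(x), gates[name])
            for name, x in inputs.items()}
fused = spectral_fuse(list(filtered.values()))
```

With a gate of 0 the modality passes through untouched, which is one simple reading of "refinement is never harmful"; whether the paper's gate works this way is something a review could ask the author to clarify.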

You can write a PREreview of Filter Before Mixing: Per-Modality Denoising for Multimodal RL with Application to Health Management. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer-review report organized by a journal.

Before you start

We will ask you to log in with your ORCID iD. If you don't have an iD, you can create one.