
Write a PREreview

MDL-AE: Investigating the Trade-Off Between Compressive Fidelity and Discriminative Utility in Self-Supervised Learning

Published
Server: Preprints.org
DOI: 10.20944/preprints202511.2123.v1

Current paradigms in Self-Supervised Learning (SSL) achieve state-of-the-art results through complex, heuristic-driven pretext tasks such as contrastive learning or masked image modeling. This work departs from these heuristics by reframing SSL around the principle of Minimum Description Length (MDL). We introduce the MDL-Autoencoder (MDL-AE), a framework that learns visual representations by optimizing a VQ-VAE-based objective to find the most efficient discrete compression of visual data. Experiments on CIFAR-10 demonstrate that this compression-driven objective learns a rich vocabulary of local visual concepts. However, our investigation also uncovers a non-obvious architectural insight: a more powerful tokenizer, despite learning a visibly higher-fidelity vocabulary of visual concepts, fails to improve downstream performance, revealing that the nature of the learned representation dictates the optimal downstream architecture. The MDL-AE learns a vocabulary of holistic object parts rather than generic, composable primitives; consequently, a sophisticated Vision Transformer (ViT) head, a state-of-the-art tool for modeling token relationships, consistently fails to outperform a simple linear probe on the flattened feature map. This architectural mismatch shows that the most powerful downstream aggregator is not always the most effective. To validate this, we demonstrate that a dedicated self-supervised alignment task, based on masked autoencoding of the discrete tokens, resolves the mismatch and dramatically improves performance, bridging the gap between generative fidelity and discriminative utility. Our work provides an end-to-end case study in co-designing objectives and their downstream architectures, showing that token-specific pre-training is crucial for unlocking the potential of powerful aggregators.
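To make the abstract's core idea concrete for prospective reviewers: a VQ-style tokenizer maps each patch embedding to its nearest codebook entry, and the "description length" of an image can be crudely approximated by the bits needed to transmit those discrete indices. The sketch below is purely illustrative; the codebook size, embedding dimension, and uniform-code cost model are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 16, 8  # codebook size and embedding dimension (illustrative values)
codebook = rng.normal(size=(K, D))  # hypothetical learned code vectors

def quantize(patches: np.ndarray) -> np.ndarray:
    """Map each patch embedding to the index of its nearest codebook vector."""
    # Squared distances between every patch and every code, shape (N, K).
    d2 = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def description_length_bits(indices: np.ndarray, k: int = K) -> float:
    """Naive MDL proxy: cost of sending the token indices under a uniform code."""
    return len(indices) * np.log2(k)

# A hypothetical 8x8 grid of patch embeddings for one image.
patches = rng.normal(size=(64, D))
tokens = quantize(patches)
bits = description_length_bits(tokens)
print(tokens.shape, bits)  # 64 tokens at log2(16) = 4 bits each -> 256.0 bits
```

A real MDL-AE would additionally train the encoder and codebook so that this code cost, plus the reconstruction residual, is minimized; the snippet only shows the discrete-token bookkeeping that the downstream linear probe or ViT head would consume.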

You can write a PREreview of MDL-AE: Investigating the Trade-Off Between Compressive Fidelity and Discriminative Utility in Self-Supervised Learning. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer-review report organized by a journal.

Before you begin

We will ask you to log in with your ORCID iD. If you do not have an iD, you can create one.

What is an ORCID iD?

An ORCID iD is a unique identifier that distinguishes you from others with the same or a similar name.

Start now