What Matters: Datasets or Robust Frameworks in Modern Robot Learning?

by Md Selim Sarowar

Published: June 17, 2026
Server: Preprints.org
DOI: 10.20944/preprints202606.1149.v1

Recent progress in robot learning has relied on two investments: larger datasets and more capable models. Vision-language-action (VLA) policies now report success rates above 90% on standard benchmarks, yet perturbation studies show the same policies collapsing to near 0% when object positions, instructions, or scene layouts shift, exposing memorization where competence was claimed. This survey asks whether progress comes mainly from data, from models, or from an interaction that current evaluations often obscure. We review more than 200 papers spanning VLA architectures, world models, reinforcement-learning post-training, robot manipulation datasets, data generation pipelines, scaling studies, and perturbation benchmarks, including a structured analysis of a 100-paper survey set centered on the ICLR 2026 world-model literature. We catalogue every major public manipulation dataset with size, embodiment coverage, collection method, and known weaknesses; we reconstruct the evidence on data scaling laws and data quality; and we trace the evaluation crisis from benchmark inflation through memorization diagnoses to factor-level robustness decompositions. Our synthesis is that the question is ill-posed as a dichotomy: data diversity dominates in-distribution gains, model class and training objective dominate out-of-distribution retention, and current benchmarks confound the two because train and test conditions coincide. We state the conditions under which each answer holds, identify bottlenecks per subfield, and propose falsifiable research directions, including counterfactually structured datasets, world-model-regularized policies, and factor-controlled evaluation protocols.
