What Matters: Datasets or Robust Frameworks in Modern Robot Learning?
- Published
- Server: Preprints.org
- DOI: 10.20944/preprints202606.1149.v1
Recent progress in robot learning has relied on two investments: larger datasets and more capable models. Vision-language-action (VLA) policies now report success rates above 90% on standard benchmarks, yet perturbation studies show the same policies collapsing to near 0% when object positions, instructions, or scene layouts shift, exposing memorization where competence was claimed. This survey asks whether progress comes mainly from data, from models, or from an interaction that current evaluations often obscure. We review more than 200 papers spanning VLA architectures, world models, reinforcement-learning post-training, robot manipulation datasets, data generation pipelines, scaling studies, and perturbation benchmarks, including a structured analysis of a 100-paper survey set centered on the ICLR 2026 world-model literature. We catalogue every major public manipulation dataset with size, embodiment coverage, collection method, and known weaknesses; we reconstruct the evidence on data scaling laws and data quality; and we trace the evaluation crisis from benchmark inflation through memorization diagnoses to factor-level robustness decompositions. Our synthesis is that the question is ill-posed as a dichotomy: data diversity dominates in-distribution gains, model class and training objective dominate out-of-distribution retention, and current benchmarks confound the two because train and test conditions coincide. We state the conditions under which each answer holds, identify bottlenecks per subfield, and propose falsifiable research directions, including counterfactually structured datasets, world-model-regularized policies, and factor-controlled evaluation protocols.