Write a PREreview

Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization

by Dong Qiu, Duo Xu, and Limengxi Yue

Published: December 31, 2025
Server: arXiv
DOI: 10.48550/arxiv.2512.24609

Large Language Models (LLMs) perform well in language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforcement learning-augmented LLM agent framework that formulates cooperation as a decentralized partially observable Markov decision process (Dec-POMDP) and adopts centralized training with decentralized execution (CTDE). We introduce Group Relative Policy Optimization (GRPO) to jointly optimize agent policies with access to global signals during training, together with a simplified joint reward that balances task quality, speed, and coordination cost. On collaborative writing and coding benchmarks, our framework delivers a 3x increase in task processing speed over single-agent baselines, 98.7% structural/style consistency in writing, and a 74.6% test pass rate in coding. The approach consistently outperforms strong multi-agent LLM baselines and provides a practical path toward reliable collaboration in complex workflows.

You can write a PREreview of Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization. A PREreview is a review of a preprint and can range from a few sentences to an extensive report, similar to a peer-review report produced by journals.

Before you start

We will ask you to log in with your ORCID iD. If you don't have an iD, you can create one.