Write a PREreview

Web Agent Agentic Reinforcement Learning Decision Model Under Multi-Cost and Failure Risk Constraints

by Qianli Ma, Limengxi Yue, Shuyang Xu, Yanpei Shi, and Hongrui Liu

Published: February 2, 2026
Server: Preprints.org
DOI: 10.20944/preprints202602.0095.v1

Intelligent agent interactions in real-world web environments are commonly constrained by request budgets, time delays, anti-crawling restrictions, and operational failure risks. Strategies that optimize task success rate alone often prove unusable in practice, exhibiting either "high success but high cost" or "low risk but conservative failure." This paper proposes a constrained agentic reinforcement learning model for Web Agents that brings page access, search requests, and external API calls into a single long-term decision-making framework with associated costs. It incorporates both cost budget constraints and tail risk control into the optimization objective: a multidimensional cost vector comprising cumulative request count, total latency, and failure penalties is constructed, and budget compliance is achieved via Lagrange dual updates, while a CVaR risk term suppresses excessive exploration of high-failure-probability paths, thereby achieving an adaptive balance among "completion rate, cost, and risk." Experiments were conducted across 30–70 site/page templates and 800–1,500 end-to-end web tasks (including information extraction, price comparison, form submission, and cross-page navigation). Interaction sequences spanned 20–120 steps with tool scales of 30–200. Performance was benchmarked against unconstrained RL, budget-constrained RL, and rule-based/scripted web agents, quantifying task completion rates, cost-per-success (C-PS), failure rates, and policy stability. Results demonstrate that at equivalent completion rates, our method reduces C-PS by 22%–31% and lowers failure rates by 18%–26% under high failure penalties. Under fixed budgets, task completion rates increase by 10%–16%, highlighting the necessity and effectiveness of constraint modeling for practical Web Agent deployment.
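For reviewers who want a concrete picture of the two mechanisms the abstract names, the following is a minimal Python sketch of a generic Lagrange dual update and an empirical CVaR estimate. It is not taken from the preprint: the function names (`cvar`, `dual_update`, `penalized_return`), the learning rate, the risk weight, and the exact form of the penalized objective are all illustrative assumptions, since the abstract gives no formulas.

```python
import math
from typing import List, Sequence


def cvar(losses: Sequence[float], alpha: float = 0.9) -> float:
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses.

    Hypothetical helper; the preprint's exact CVaR estimator is not stated.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(losses, reverse=True)  # largest (worst) losses first
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    tail = ordered[:k]
    return sum(tail) / len(tail)


def dual_update(
    lambdas: Sequence[float],
    avg_costs: Sequence[float],
    budgets: Sequence[float],
    lr: float = 0.1,
) -> List[float]:
    """Projected gradient ascent on the Lagrange multipliers.

    Each multiplier grows when its average cost exceeds the budget and
    shrinks otherwise, and is clipped at zero. The cost dimensions
    (e.g. request count, latency, failure penalty) are assumed here.
    """
    return [
        max(0.0, lam + lr * (cost - budget))
        for lam, cost, budget in zip(lambdas, avg_costs, budgets)
    ]


def penalized_return(
    task_reward: float,
    costs: Sequence[float],
    lambdas: Sequence[float],
    risk_weight: float,
    tail_risk: float,
) -> float:
    """Assumed scalar objective: reward minus weighted costs minus a CVaR term."""
    cost_penalty = sum(lam * c for lam, c in zip(lambdas, costs))
    return task_reward - cost_penalty - risk_weight * tail_risk
```

In a training loop of this kind, the policy would typically be updated to maximize `penalized_return` while `dual_update` is called periodically on the measured average costs; whether the preprint follows this schedule is something a review could ask the authors to clarify.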

You can write a PREreview of Web Agent Agentic Reinforcement Learning Decision Model Under Multi-Cost and Failure Risk Constraints. A PREreview is a review of a preprint and can range from a few sentences to an extensive report, similar to a peer-review report organized by a journal.

Before you begin

We will ask you to log in with your ORCID iD. If you don't have an iD, you can create one.