
Write a PREreview

Web Agent Agentic Reinforcement Learning Decision Model Under Multi-Cost and Failure Risk Constraints

Published
Server: Preprints.org
DOI: 10.20944/preprints202602.0095.v1

Intelligent-agent interactions in real-world web environments are commonly constrained by request budgets, time delays, anti-crawling restrictions, and operational failure risks. Strategies that optimize task success rate alone often exhibit impractical behavior such as "high success but high cost" or "low risk but conservative failure." This paper proposes a constrained Agentic reinforcement learning model for Web Agents that integrates page access, search requests, and external API calls into a single long-horizon decision-making framework in which each action carries an associated cost. It incorporates both cost-budget constraints and tail-risk control into the optimization objective. A multidimensional cost vector comprising cumulative request count, total latency, and failure penalties is constructed, and budget compliance is achieved via Lagrangian dual updates. Meanwhile, a CVaR (conditional value-at-risk) term suppresses excessive exploration of paths with high failure probability, yielding an adaptive balance among completion rate, cost, and risk. Experiments were conducted across 30–70 site/page templates and 800–1,500 end-to-end web tasks (including information extraction, price comparison, form submission, and cross-page navigation), with interaction sequences spanning 20–120 steps and tool sets of 30–200 tools. Performance was benchmarked against unconstrained RL, budget-constrained RL, and rule-based/scripted web agents, quantifying task completion rate, cost-per-success (C-PS), failure rate, and policy stability. Results demonstrate that at equivalent completion rates, our method reduces C-PS by 22%–31% and lowers failure rates by 18%–26% under high failure penalties. Under fixed budgets, task completion rates increase by 10%–16%, highlighting the necessity and effectiveness of constraint modeling for practical Web Agent deployment.
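The abstract names two mechanisms, Lagrangian dual updates for the budget constraints and a CVaR risk term, without giving their update rules. The following is a minimal sketch of the standard textbook forms of both, assuming projected dual ascent on one non-negative multiplier per cost component and the Rockafellar-Uryasev estimator of empirical CVaR. It is not the authors' implementation: the function names (`empirical_cvar`, `dual_update`, `lagrangian_objective`), the step size `lr`, the confidence level `alpha`, the risk weight `beta`, and all numbers in the demo are hypothetical illustrations.

```python
import math
from typing import List, Sequence


def empirical_cvar(costs: Sequence[float], alpha: float) -> float:
    """Empirical CVaR_alpha: expected cost in the worst (1 - alpha) tail.

    Uses the Rockafellar-Uryasev form CVaR = VaR + E[(C - VaR)^+] / (1 - alpha),
    with VaR taken as the lower empirical alpha-quantile. Requires 0 <= alpha < 1.
    """
    if not costs or not 0.0 <= alpha < 1.0:
        raise ValueError("need a non-empty sample and 0 <= alpha < 1")
    xs = sorted(costs)
    n = len(xs)
    # VaR_alpha: smallest sample value whose empirical CDF reaches alpha.
    k = max(math.ceil(alpha * n) - 1, 0)
    var = xs[k]
    # Average excess of each sample over VaR (zero for samples below it).
    excess = sum(max(c - var, 0.0) for c in xs) / n
    return var + excess / (1.0 - alpha)


def dual_update(lams: Sequence[float], avg_costs: Sequence[float],
                budgets: Sequence[float], lr: float) -> List[float]:
    """Projected dual ascent: lambda_i <- max(0, lambda_i + lr * (J_Ci - d_i)).

    A multiplier grows while its cost component exceeds its budget d_i and
    shrinks (never below zero) once the constraint is satisfied.
    """
    return [max(0.0, lam + lr * (c - d))
            for lam, c, d in zip(lams, avg_costs, budgets)]


def lagrangian_objective(mean_reward: float, avg_costs: Sequence[float],
                         budgets: Sequence[float], lams: Sequence[float],
                         failure_costs: Sequence[float], alpha: float,
                         beta: float) -> float:
    """Scalar the policy would maximize for fixed multipliers (hypothetical form):
    J_R - sum_i lambda_i * (J_Ci - d_i) - beta * CVaR_alpha(failure cost).
    """
    violation = sum(lam * (c - d) for lam, c, d in zip(lams, avg_costs, budgets))
    return mean_reward - violation - beta * empirical_cvar(failure_costs, alpha)


if __name__ == "__main__":
    # Hypothetical cost vector: (request count, latency in s, failure penalty).
    budgets = [40.0, 60.0, 5.0]
    lams = [0.0, 0.0, 0.0]
    # Illustrative per-episode averages: requests and failures exceed budget.
    # In a real training loop these would be re-estimated from fresh rollouts
    # after every policy update; here they are held fixed for illustration.
    avg_costs = [48.0, 55.0, 7.0]
    for _ in range(3):
        lams = dual_update(lams, avg_costs, budgets, lr=0.1)
    print("multipliers:", [round(x, 3) for x in lams])

    failure_costs = [0, 0, 0, 0, 0, 0, 0, 0, 10, 20]
    print("CVaR_0.8 of failure cost:", round(empirical_cvar(failure_costs, 0.8), 3))
```

In this sketch the latency multiplier stays at zero because its budget is met, while the request and failure multipliers rise, which increasingly penalizes those cost components in the objective. The CVaR term depends only on the worst tail of failure costs, so it targets rare but severe failures rather than the average case.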

You can write a PREreview of Web Agent Agentic Reinforcement Learning Decision Model Under Multi-Cost and Failure Risk Constraints. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer-review report organized by a journal.

Before you start

We will ask you to log in with your ORCID iD. If you don't have an iD, you can create one.

What is an ORCID iD?

An ORCID iD is a unique identifier that distinguishes you from others with the same or a similar name.
