Ir para o conteúdo principal

Escrever uma avaliação PREreview

AgentVerify: Compositional Formal Verification of AI Agent Safety Properties via LTL Model Checking

Publicado
Servidor
Preprints.org
DOI
10.20944/preprints202604.1029.v1

Autonomous AI agents operating in high-stakes domains—financial trading, medical diagnostics, autonomous code execution—lack formal safety guarantees for their core operational loops, including memory management, tool invocations, and human interactions. Current verification approaches either fail to scale to neural components or ignore the structured control flow of agentic systems entirely. We introduce AgentVerify (Compositional Formal Verification of AI Agent Safety Properties via LTL Model Checking), a model checking framework that specifies and verifies safety properties for agent architectures using temporal logic. AgentVerify defines compositional specifications for memory integrity, tool call pro tocols, MCP/skill invocations, and human-in-the-loop boundaries, enabling rigorous runtime monitoring and post-hoc behavioral analysis. In an empirical evaluation across 15 diverse agent scenarios (low- and high-difficulty), our post-hoc behavioral analysis component achieved a verification accuracy of 86.67% (mean over 3 seeds, σ=0.00), outperforming a monolithic contract verification baseline (80.00%) and a runtime monitoring baseline without temporal logic (46.67%). A monolithic neural verifier, which attempts to verify the LLM outputs directly, performed poorly at 13.33%, confirming that end-to-end neural verification is currently intractable for production-scale agents. These results demonstrate that formal methods applied to the agent’s observable control flow provide a tractable and effective path to safety assurance, complementing rather than replacing neural-centric efforts to align large language models.

Você pode escrever uma avaliação PREreview de AgentVerify: Compositional Formal Verification of AI Agent Safety Properties via LTL Model Checking. Uma avaliação PREreview é uma avaliação de um preprint e pode variar de algumas frases a um parecer extenso, semelhante a um parecer de revisão por pares realizado por periódicos.

Antes de começar

Vamos pedir que você faça login com seu ORCID iD. Se você não tiver um iD, pode criar um.

O que é um ORCID iD?

Um ORCID iD é um identificador único que diferencia você de outras pessoas com o mesmo nome ou nome semelhante.

Começar agora