Write a PREreview

LLM Agents as Programmable Subjects: Assays and Benchmarks for Agentic Behavior and Alignment

by Gaurav Koley and Aditya Thiruvengadam

Published: October 6, 2025
Server: Preprints.org
DOI: 10.20944/preprints202510.0476.v1

We present a framework, assay suite, and reference toolkit for studying LLM agents as programmable subjects in controlled computational laboratories. We formalize subjects and protocols with explicit identifiability assumptions, and provide core and extended trait assays with reliability, invariance, and causal robustness criteria. The framework targets empirical characterization of emergent traits (e.g., deception, diligence, and constraint obedience) across models, tools, and environments, complementing capability benchmarks by emphasizing auditable process traces in addition to outcomes. We report current capabilities and limitations and outline an agenda for improving causal reasoning, interpretability, and robust validation. The objective is to provide shared infrastructure and standards, rather than to advance a particular position about how such agents ought to be used.

You can write a PREreview of LLM Agents as Programmable Subjects: Assays and Benchmarks for Agentic Behavior and Alignment. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer-review report organized by a journal.

Before you start

We will ask you to log in with your ORCID iD. If you don't have an iD, you can create one.