Write a PREreview

LLM Agents as Programmable Subjects: Assays and Benchmarks for Agentic Behavior and Alignment

Posted
Server: Preprints.org
DOI: 10.20944/preprints202510.0476.v1

We present a framework, assay suite, and reference toolkit for studying LLM agents as programmable subjects in controlled computational laboratories. We formalize subjects and protocols with explicit identifiability assumptions, and provide core and extended trait assays with reliability, invariance, and causal robustness criteria. The framework targets empirical characterization of emergent traits (e.g., deception, diligence, and constraint obedience) across models, tools, and environments, complementing capability benchmarks by emphasizing auditable process traces in addition to outcomes. We report current capabilities and limitations and outline an agenda for improving causal reasoning, interpretability, and robust validation. The objective is to provide shared infrastructure and standards, rather than to advance a particular position about how such agents ought to be used.

You can write a PREreview of LLM Agents as Programmable Subjects: Assays and Benchmarks for Agentic Behavior and Alignment. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer-review report organized by a journal.

Before you start

We will ask you to log in with your ORCID iD. If you don’t have an iD, you can create one.

What is an ORCID iD?

An ORCID iD is a unique identifier that distinguishes you from everyone else with the same or a similar name.