Path-Sensitive AGI Alignment: Cognitive Integrity, Escape Cost, and Trajectory Risk in Augmented State Space
- Status: Published
- Server: Preprints.org
- DOI: 10.20944/preprints202605.0905.v1
AGI alignment is often evaluated at a snapshot: a system is judged by its current outputs, policy profile, benchmark behavior, or apparent corrigibility. Snapshot evaluation misses a central risk of advanced deployment: a good endpoint can still be reached by a bad journey. Two trajectories may arrive in similar behavioral regions while differing in reversibility, opacity, intervention cost, memory entanglement, institutional dependency, and the quality of human judgment left available for oversight. This paper develops a path-sensitive alternative. It represents AGI development as motion through an augmented state space Z containing model and environment state, world-model structure, policy state, memory and provenance traces, governance affordances, institutional embedding, and human evaluative capacity. Cognitive integrity — the capacity of individuals, teams, or institutions to sustain calibrated attention, trust, contestability, and decision under pressure [1] — is introduced here as an alignment-relevant state variable rather than assumed as a familiar metric. The formal contribution is a scaffold of definitions: controlled transition laws over augmented state, escape cost, path-level alignment functionals, viability floors, forbidden regions, and trajectory classes distinguished by lock-in, basin structure, retargetability, and integrity preservation. The result does not supply a calibrated empirical model of deployed AGI systems. It specifies what such a model must track if alignment evidence is to cover both present behavior and the remaining possibility of legible, reversible, and cognitively intact correction.