Simulated Coherence, Absent Minds: On the Philosophical Illusions of AI Alignment

Published
Server: Preprints.org
DOI: 10.20944/preprints202507.1654.v1

Pizzochero and Dellaferrera (2025) have recently demonstrated that large language models (LLMs) can emulate human philosophical viewpoints with remarkable fidelity. By instructing these models to simulate the responses of distinct intellectual subpopulations, they find that LLMs reproduce answer distributions that closely mirror those of actual philosophers and scientists. This paper contends, however, that such outputs represent simulation rather than introspection. Building on insights from AI alignment theory and on our formal investigations into strategic obfuscation in scheming agents, we underscore the epistemic hazards of conflating linguistic fluency with genuine cognition. Concepts such as semantic encryption and epistemic adversariality illustrate how persuasive, coherent outputs may obscure, rather than clarify, the model’s alignment with human reasoning. Consequently, we argue that LLMs must be deployed in experimental philosophy and oversight contexts with critical rigor. Without access to a model’s internal deliberative processes, behavioral mimicry should not be mistaken for philosophical comprehension. It is not enough that machines produce plausible answers; the deeper question is whether these answers emerge from any meaningful cognitive substrate. The central challenge, then, is not to teach machines to speak like thinkers but to determine whether any thought lies behind the simulation.
