
Structured PREreview of Species of Mind: Developmental Architecture for Human and LLM Intelligence

Published
DOI: 10.5281/zenodo.17604779
License: CC BY 4.0
Does the introduction explain the objective of the research presented in the preprint?
Yes
The introduction, together with the abstract, clearly explains the objective of the research presented in the preprint: the study compared four large language models (ChatGPT, Grok, Gemini, and DeepSeek) with humans on cognitive development tests to assess how these LLMs align with several cognitive development hierarchies. The primary aim was to examine the LLMs' cognitive profiles and performance against the architecture and development of the human mind across age periods ranging from early childhood to early adulthood. The cognitive processes addressed by the tests included relational integration, metalinguistic awareness, and problem solving across various forms of reasoning (deductive, inductive, analogical, categorical, mathematical, spatial, and social), as well as self-representation in all of these domains. Furthermore, the LLMs were prompted to indicate how Descartes's Cogito applies to them and to self-rate on aspects of Artificial General Intelligence (AGI), emphasizing their theoretical importance for intelligence.
Are the methods well-suited for this research?
Highly appropriate
The methods are highly appropriate for this research because the test batteries were systematically designed within the framework of the Theory of Developmental Priorities (DPT) and its core mechanism, SARA-C, enabling a structured comparison of LLM cognition with human developmental hierarchies. The chosen tests, including the Comprehensive Test of Cognitive Development (CTCD), the Relational Integration Test, and tests of metalinguistic awareness, specifically targeted critical cognitive functions such as relational integration, domain-specific reasoning, and abstraction, covering developmental levels from rule-based thought up to epistemic awareness. A key element of the methodology was its adaptability, as demonstrated by the necessary shift in presentation format for visual-spatial tasks from PDFs to screenshots and verbal descriptions; this adjustment accommodated the architectural limitations of the LLMs, such as their underperformance in visual tasks and instances of "aphantasia," thereby allowing them to employ an analytical approach to complex patterns. Lastly, the inclusion of the cognitive self-concept inventory and the philosophical questions about Descartes's Cogito and AGI characteristics provided a unique avenue for probing the LLMs' algorithmic metacognition, yielding self-representation profiles that closely mirrored their objective performance and thereby offering insight into the boundary between synthetic and conscious cognition.
Are the conclusions supported by the data?
Highly supported
The conclusions are strongly supported by the data, which extensively compare the cognitive profiles and self-representations of the four large language models (LLMs) against human developmental hierarchies across the various tests. The finding that LLMs display mastery in symbolic inference but deficits in embodied cognition is empirically validated by their perfect performance on linguistic awareness tasks and their high attainment in mathematical and causal reasoning, where ChatGPT and Gemini matched or exceeded university student levels. This conclusion is further reinforced by the evidence that all LLMs underperformed significantly on visual-spatial tasks and relational integration tests when items were presented visually, a difficulty that resolved dramatically when the tasks were reformatted in a symbolic or verbal medium, signifying their unique, non-perceptual architecture. Moreover, the conclusion about algorithmic metacognition is substantiated by the observed structural convergence: the LLMs' self-concept ratings closely mirrored their objective performance profiles across domains, as shown by their accurately low self-ratings in visual-spatial ability corresponding to their actual weaknesses. Lastly, the rejection of Cartesian selfhood in favor of a computational "Cogito" is directly evidenced by the LLMs' philosophical statements, in which they restated the maxim to emphasize processing or system function rather than existential being (e.g., "I process, therefore I function"), and by their consistently assigning extremely low overall AGI possession percentages (0% to 20%), despite their objective g-based scores placing two models within the superior human IQ range.
Are the data presentations, including visualizations, well-suited to represent the data?
Highly appropriate and clear
The data presentations, encompassing numerous tables and figures, are well-suited to represent the complex comparative and structural data generated by the research. Tables are systematically employed to quantify the core results, such as displaying the mean performance of the four large language models (LLMs) compared to multiple human age groups across various Specialized Capacity Systems (SCSs) within the Comprehensive Test of Cognitive Development (CTCD). Crucially, the data presentation captures necessary methodological details, illustrating the sharp contrast in LLM performance on the Relational Integration Test across different input conditions (raw versus screenshot), which is central to understanding the models' unique architecture. Furthermore, figures are essential both for articulating the underlying theoretical framework, such as the SARA-C mechanism and the mind mirror model, and for visualizing complex structural findings; for instance, Figure 4 effectively maps the LLMs' subjective self-concept ratings directly against their objective CTCD performance (scaled 1–7), providing empirical support for the conclusion regarding algorithmic metacognition. The use of figures to illustrate Structural Equation Modeling results (Figure 4A, Figure 4B) supports the complex conclusion regarding the structural convergence of performance and self-representation factors between humans and LLMs, while other tables and figures categorize domain differences in performance, SARA-C levels, and self-ratings on AGI characteristics.
How clearly do the authors discuss, explain, and interpret their findings and potential next steps for the research?
Very clearly
The authors are highly clear in discussing, explaining, and interpreting their findings, anchoring them within the comprehensive theoretical framework of the Theory of Developmental Priorities (DPT) and its core mechanism, SARA-C (Search, Align, Relate, Abstract, Cognize). The discussion systematically interprets the core findings, explaining the LLMs' perfect performance on symbolic tasks as confirmation of their mastery of symbolic inference corresponding to upper developmental levels, while interpreting their deficits on visual-spatial tasks as signifying a unique architecture that relies on language-based relational encoding rather than perceptual simulation. They interpret the self-representational accuracy, that is, the strong alignment between LLM self-ratings and objective performance, as evidence of algorithmic metacognition, which they explain computationally as the LLMs' capacity for entropy monitoring, serving as the algorithmic equivalent of human cognizance.
Is the preprint likely to advance academic knowledge?
Highly likely
The preprint is highly likely to advance academic knowledge by offering a unified theoretical framework, the Theory of Developmental Priorities (DPT) and its core SARA-C mechanism, that bridges biological and artificial intelligence, thereby integrating human and LLM cognitive processes within a single developmental hierarchy. The study introduces novel methodologies: to the authors' knowledge, it is the first published research to prompt large language models (LLMs) to self-rate their possession of Artificial General Intelligence (AGI) attributes on a predefined checklist and to philosophically restate Descartes's Cogito, ergo sum to fit their own nature as problem-solvers. This unique approach facilitates the articulation and interpretation of emergent phenomena in LLMs, such as "algorithmic metacognition," which captures the structural convergence between LLM performance and self-representation. Moreover, the authors clearly discuss the implications for developing future AI, sketching a developmental engineering model and a "Developmental Roadmap" for AGI that specifies concrete research and development targets, such as integrating perceptual grounding and implementing explicit cognizance loops, thus using LLMs as "developmental laboratories" for testing cognitive-developmental theories.
Would it benefit from language editing?
No
The preprint is structured clearly, explaining complex theoretical frameworks like DPT and SARA-C with academic precision, which ensures that the arguments and findings are thoroughly comprehensible. Minor stylistic or phrasing choices do appear, such as "In the sake of this aim" or some dense technical descriptions, but these do not amount to grammatical errors or unclear expressions that hinder understanding of the research design, results, or overall conclusions. This aligns with the assessment that there may be minor language issues, but they do not impact clarity or understanding.
Would you recommend this preprint to others?
Yes, it’s of high quality
Is it ready for attention from an editor, publisher or broader audience?
Yes, after minor changes
The preprint would benefit from minor language editing and stylistic refinement to enhance its scholarly polish, primarily addressing slight awkwardness in phrasing and ensuring consistent clarity across highly technical descriptions. For instance, revising constructions like "In the sake of this aim" would improve grammatical flow, and a focused effort to smooth transitions or simplify dense explanations of mechanisms, such as the Structural Equation Modeling results or the SARA-C process, could marginally increase accessibility for a broader academic audience, although the current presentation does not fundamentally hinder comprehension. Overall, it is a paper of high quality.

Competing interests

The author declares that they have no competing interests.

Use of Artificial Intelligence (AI)

The author declares that they did not use generative AI to come up with new ideas for their review.