A Non-Turing Computer Architecture for Artificial Intelligence with Dynamic Rule Learning and Generalization Abilities Using Images or Texts
- Posted
- Server: Preprints.org
- DOI: 10.20944/preprints202412.2432.v5
Since the beginning of modern computer history, the Turing machine has been the dominant architecture for most computational devices. It consists of three essential components: an infinite tape for input, a read/write head, and a finite control. In this structure, what the head reads (i.e., bits) is identical to what it has written. This differs from the way humans think or conduct thought/tool experiments: what humans imagine or write on paper are images or texts, not the abstract concepts those images or texts represent in the brain. The Turing machine neglects this difference, yet it plays an important role in abstraction, analogy, and generalization, which are crucial for artificial intelligence. In contrast, the proposed architecture uses two different types of heads and tapes: one for traditional abstract bit inputs/outputs, and the other for specific visual ones (more like a screen, or a workspace observed by a camera). The mapping rules between the abstract bits and the specific images/texts can be realized with high accuracy by neural networks such as Convolutional Neural Networks, YOLO, and Large Language Models. As an example, this paper presents how the new computer architecture (called the ``Ren machine'' here for simplicity) autonomously learns the distributive property/rule of multiplication in the specific domain, and then uses the rule to generate a general method, mixing the abstract domain and the specific domain, for computing the product of any positive integers based on images/texts. The machine's strong reasoning ability is further corroborated by proving a theorem in plane geometry. Moreover, a robotic architecture based on the Ren machine is proposed to address the challenges that Vision-Language-Action (VLA) models face in unsound reasoning and high computational cost.
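The dual-tape idea described above can be illustrated with a minimal sketch. This is not the paper's implementation: in the actual machine the mappings between the abstract and specific domains are realized by neural networks (CNNs, YOLO, LLMs), whereas here they are stood in for by exact functions, and all class and method names (`RenMachineSketch`, `render`, `perceive`, `multiply`) are hypothetical. The specific domain is modeled as a text tape of tally marks, and multiplication is carried out in that domain via repeated concatenation, in the spirit of the distributive rule a*(1+1+...+1) = a+a+...+a.

```python
class RenMachineSketch:
    """Toy sketch of a machine with two tapes: an abstract bit tape
    (here, Python integers) and a 'specific' visual/text tape (here,
    tally-mark strings). The render/perceive mappings stand in for the
    neural generators/recognizers described in the abstract."""

    def render(self, n: int) -> str:
        # Abstract -> specific: write n as tally marks on the visual tape.
        return "|" * n

    def perceive(self, s: str) -> int:
        # Specific -> abstract: read the tally marks back as an integer.
        return s.count("|")

    def multiply(self, a: int, b: int) -> int:
        # Apply the distributive rule in the specific domain:
        # concatenate the rendering of a, b times, then perceive the result.
        tape = ""
        for _ in range(b):
            tape += self.render(a)
        return self.perceive(tape)


m = RenMachineSketch()
print(m.multiply(6, 7))  # computed entirely via the specific (tally) domain
```

The point of the sketch is that the arithmetic never happens on the abstract tape directly; the machine operates on the specific representation and only the perceive step returns an abstract result, mirroring the mixed abstract/specific method the abstract describes.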