
Write a PREreview

A Purely Distributional Embedding Algorithm

Published
Server: Preprints.org
DOI: 10.20944/preprints202602.0581.v1

This paper introduces the Distributional Embedding Algorithm (DEA), a purely deterministic framework that generates word embeddings through Iterative Structural Saliency Extraction, a procedure based on a natural Galois correspondence. Unlike stochastic "black-box" machine learning models, DEA grounds semantic representation in the topological structure of a corpus, mapping how semantic mass is redistributed across identifiable structural nuclei. We apply the model to a controlled dataset of 300 propositions from David Bohm's Wholeness and the Implicate Order and identify four primary semantic basins that account for 74% of the text's logical flow. By tracking the iterative expansion of these clusters, we demonstrate a "topological collapse" in which shared lexical pivots connect distant propositions. Validation with cosine distance confirms high structural orthogonality between core conceptual terms and extrinsic category noise (e.g., intelligence vs. desk, $d = 0.99$). We conclude that DEA offers a computationally efficient, transparent, and structurally aware alternative that can be integrated with existing neural architectures to improve interpretability in semantic modeling.

DEA also rests on a Logarithmic Hypothesis, which relates the dimension of the embedding vectors to the number of propositions in the corpus. Whereas modern AI architectures require thousands of embedding components to process $10^{13}$ propositions, the DEA approach suggests a structural collapse of complexity in which the global semantic manifold can be distilled into $L \approx \log_{10}(10^{13}) = 13$ features. Even at a hyper-refined resolution of $L \approx 30$, the model offers a deterministic, "white-box" alternative to current neural networks, providing a thousand-fold increase in computational efficiency without sacrificing logical precision.
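The abstract names DEA's ingredients without giving its update rules. As a rough illustration only, the Python sketch below assumes that the "natural Galois correspondence" is the standard pair of derivation maps between sets of words and sets of propositions (as in formal concept analysis), that each word is represented by a binary incidence vector over propositions, and that the Logarithmic Hypothesis reads L ≈ log10(N) for N propositions. The toy corpus and all function names (propositions_containing, words_shared_by, closure, incidence_vector, cosine_distance, log_dimension) are hypothetical; this is not the authors' implementation of DEA.

```python
import math

# Hypothetical toy corpus: each proposition is a set of words (not the Bohm dataset).
PROPOSITIONS = {
    "p1": {"wholeness", "order", "implicate"},
    "p2": {"order", "movement", "implicate"},
    "p3": {"thought", "movement", "intelligence"},
    "p4": {"desk", "chair"},
}
ALL_WORDS = set().union(*PROPOSITIONS.values())
PROP_IDS = sorted(PROPOSITIONS)  # fixed coordinate order for word vectors


def propositions_containing(words):
    """Derivation map words -> propositions: ids of propositions containing every word."""
    return {p for p, ws in PROPOSITIONS.items() if words <= ws}


def words_shared_by(props):
    """Dual map propositions -> words: words common to every given proposition."""
    if not props:
        return set(ALL_WORDS)  # intersection over an empty family is the full vocabulary
    return set.intersection(*(PROPOSITIONS[p] for p in props))


def closure(words):
    """Composing the two maps of the Galois connection yields a closure operator."""
    return words_shared_by(propositions_containing(words))


def incidence_vector(word):
    """Assumed embedding: 1.0 if the word occurs in a proposition, else 0.0."""
    return [1.0 if word in PROPOSITIONS[p] else 0.0 for p in PROP_IDS]


def cosine_distance(u, v):
    """Cosine distance d = 1 - (u . v) / (|u| |v|); assumes neither vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def log_dimension(n_propositions):
    """Assumed reading of the Logarithmic Hypothesis: L ~ log10(N)."""
    return math.log10(n_propositions)


if __name__ == "__main__":
    print(sorted(closure({"order"})))  # ['implicate', 'order']
    print(cosine_distance(incidence_vector("intelligence"), incidence_vector("desk")))  # 1.0
    print(log_dimension(10**13))
```

In this toy setting, closure({"order"}) also pulls in implicate because every proposition containing order contains implicate as well, while intelligence and desk never co-occur, so their incidence vectors are orthogonal and their cosine distance is 1.0.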

You can write a PREreview of A Purely Distributional Embedding Algorithm. A PREreview is a review of a preprint and can range from a few sentences to a lengthy report, similar to a peer review conducted by a journal.

Before you start

We will ask you to log in with your ORCID iD. If you don't have an iD, you can create one.

What is an ORCID iD?

An ORCID iD is a unique identifier that distinguishes you from other people with the same or a similar name.
