Word2Vec: Um algoritmo saussuriano

Server: SciELO Preprints
DOI: 10.1590/scielopreprints.11678

This article proposes an interpretation of how Word2Vec, an algorithm for generating word embeddings, functions in light of Ferdinand de Saussure’s Theory of Value (TdV). In recent years, Word2Vec has proven highly useful for various NLP tasks, such as text classification, sentiment analysis, and word occurrence probability estimation, owing to its handling of high-dimensional vectors. I argue that this language model allows us to recognize that certain theoretical notions from Saussurean linguistics (namely system, sign, and value) remain productive for reflecting on the theoretical and epistemological aspects of meaning determination in natural languages, as well as on how such notions appear to be emulated by modern NLP techniques such as Word2Vec. The study begins with a critique of the limitations of TF-IDF, proceeds through the influence of Distributional Semantics and the Distributional Hypothesis on modern vector-based language models, and ultimately suggests that Word2Vec shows signs of being able to operationalize, at the level of computational semantics, what Saussure had already formulated conceptually in the early twentieth century: that the meaning of a word is neither fixed nor individual, but relational and determined by the similar and dissimilar values that surround it. The Saussurean sources used to define the conceptual framework are the Course in General Linguistics; the manuscript collection Notes pour le 3e Cours; and the notebook of Émile Constantin, a student in Saussure’s Third Course in General Linguistics, taught in Geneva between 1910 and 1911. My objective, then, is to propose that the Saussurean notions of similia and dissimilia can be identified within the theoretical underpinnings of Word2Vec, promoting a convergence between Saussurean theory and contemporary NLP.
The central hypothesis of this work is that Word2Vec can be read as a Saussurean algorithm, as it computationally applies the dynamics of linguistic values to emulate the way meanings are determined through the relations between words, as foreseen by the Genevan master more than a century ago.
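
As a rough illustration of the relational view of meaning described above, the following minimal sketch builds count-based context vectors from a toy corpus and compares them with cosine similarity. It is not Word2Vec itself, which learns dense vectors with a neural model; it is only a simple instance of the Distributional Hypothesis. The corpus, the window size, and the function names (context_vectors, cosine) are assumptions made for this example and do not come from the article.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "the car needs fuel",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Neighbours on both sides, excluding the word itself.
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = context_vectors(corpus)

# "cat" and "dog" share neighbours (the, drinks, chases); "cat" and "fuel" share none.
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_fuel = cosine(vectors["cat"], vectors["fuel"])
print(f"cat~dog: {sim_cat_dog:.3f}  cat~fuel: {sim_cat_fuel:.3f}")
```

In this sketch no word has a value on its own: each vector is defined entirely by the other words that occur around it, so "cat" ends up close to "dog" and far from "fuel" purely through shared and unshared contexts.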
