PREreview of Decoding semantics from natural speech using human intracranial EEG
- Published
- DOI: 10.5281/zenodo.15165844
- License: CC BY 4.0
This review is the result of a virtual, live-streamed preprint journal club organized and hosted by PREreview and the Society for the Neurobiology of Language (SNL) within the collaborative SNL Semantics club (https://prereview.org/clubs/snl-semantics). This was the first edition of the club, and a total of three people joined the discussion. We thank all participants who contributed to the discussion and helped make it possible for us to provide feedback on this preprint. We look forward to welcoming more participants as the club continues to grow.
Summary
This study explores the potential for decoding lexical semantic information from human intracranial EEG recordings during natural speech. The Authors used multivariate pattern analysis to decode word-level semantic features from sEEG data obtained during spontaneous conversations in 14 participants. They achieved above-chance decoding accuracy, identified a left-lateralized cortical network involved in semantic processing, and found that lower-frequency oscillations contributed more strongly to semantic decoding. The Authors conclude that it is feasible to extract word meanings from neural activity during natural speech, which has implications for the development of more effective Brain-Computer Interfaces (BCIs) for restoring language in individuals with speech impairments.
Main Strengths:
The study addresses an ambitious research question for the field of neurobiology of language.
The study uses a naturalistic experimental design, recording neural activity during spontaneous conversations, which increases the ecological validity of the findings.
The Authors employ multivariate pattern analysis and appropriate statistical controls (e.g., permutation tests) to ensure the reliability of their decoding results.
Below we list major and minor concerns that were discussed by participants of the live review, and, where possible, we provide suggestions on how to address those issues.
Major concerns and feedback:
The Authors performed spectral clustering on the word2vec embeddings, grouping them into 10 categories. However, the input dataset was composed of content words, non-content words, and interjections. Content words (e.g., dog, run, beautiful, hardly) include nouns, verbs, adjectives, and adverbs. Non-content words or function words (e.g., she, and, my, is) include pronouns, prepositions, determiners, conjunctions, and auxiliary verbs. Interjections (e.g., yeah, ouch, ah, ehm) are words or expressions that convey spontaneous reactions or serve as conversational fillers. Only content words carry lexical meaning, i.e., the semantic content relevant to the research goal. Therefore, including non-content words and interjections is misaligned with the main purpose of the study, namely extracting word meaning from neural activity, and might confound the semantic clustering results, making it difficult to interpret the findings in the context of word meaning. More specifically, in clusters 7 and 10, action verbs appear alongside tense modifiers and modals (e.g., coming, going, got, seen, has, been, might, should, gonna, would). This grouping seems to reflect syntactic patterns – capturing tense, aspect, or modality – rather than coherent semantic categories. Similarly, cluster 3 appears thematically mixed, including people, actions, locations, times, and adjectives (e.g., mouse, sharks, pool, swollen, seal, at, next, room…). The lack of a clear conceptual focus in this cluster may complicate the interpretation of results in terms of word meaning.
The use of a stimulus dataset composed of content words, non-content words, and interjections led to 10 clusters that do not match any definition of semantic clusters within current theoretical frameworks (e.g., taxonomic or thematic). It is evident that the spectral clustering analysis relies on feature patterns other than semantics. The stimulus dataset also contains inflected forms of nouns and verbs, including multiple inflections of the same word (e.g., sister, sisters; college, colleges). We suggest that the Authors substantially improve the stimulus dataset by preprocessing the textual data: first applying a morphological normalizer (collapsing, e.g., go, goes, going, gone), followed by Part-Of-Speech (POS) tagging and lemmatization using the spaCy or NLTK libraries (see the illustrative sketch below). Moreover, it should be explicitly stated throughout the text that the study did not aim to decode word meanings or semantic features, but rather focused on decoding cluster identity: sentences such as “This approach allowed us to test whether patterns of neural activity contained sufficient information to predict the semantic feature of each word” could be misleading.
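To make this suggestion concrete, below is a minimal sketch of such a preprocessing pipeline using spaCy; the model name (en_core_web_sm), the POS classes treated as content words, and the example input are our own illustrative assumptions, not the Authors' pipeline.

```python
# Minimal preprocessing sketch (our assumptions, not the Authors' pipeline):
# POS-tag the transcript, keep only content words, and collapse inflected
# forms onto their lemmas before any embedding/clustering step.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed English transcripts

CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}  # content-word POS classes

def to_content_lemmas(transcript: str) -> list[str]:
    """Return lowercase lemmas of content words only.

    Function words (PRON, DET, ADP, AUX, ...) and interjections (INTJ)
    are dropped; 'sisters' -> 'sister', 'going' -> 'go'.
    """
    doc = nlp(transcript)
    return [tok.lemma_.lower()
            for tok in doc
            if tok.pos_ in CONTENT_POS and tok.is_alpha]

# Hypothetical conversational fragment:
print(to_content_lemmas("Yeah, she is going to see her sisters at college."))
# expected (approximately): ['go', 'see', 'sister', 'college']
```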
The Authors adopted word2vec to generate embeddings and further assessed the generalizability of their findings using FastText and GloVe. However, the choice of these embeddings is not well justified: the Authors should elaborate on why word2vec, FastText, and GloVe were chosen over other models, outlining the specific advantages these embeddings offer for their analysis as well as any known limitations. Notably, the selected embeddings do not allow straightforward interpretation of the resulting clusters. Alternative embeddings – such as those grounded in human ratings – would provide more interpretable dimensions (see, for instance, https://pubmed.ncbi.nlm.nih.gov/31832879/, https://pubmed.ncbi.nlm.nih.gov/27310469/, https://www.nature.com/articles/s41597-023-01995-6).
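One inexpensive way to probe how embedding-dependent the 10 clusters are would be to compare cluster assignments across embedding spaces. The sketch below (using gensim's pretrained vectors and scikit-learn, with an illustrative vocabulary and cluster count of our own choosing, not the study's word list) shows one such check.

```python
# Sketch (our assumptions): spectrally cluster the same vocabulary in two
# embedding spaces and compare the partitions; a low adjusted Rand index
# would suggest that cluster identity is driven by the embedding choice.
import gensim.downloader as api
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import cosine_similarity

def spectral_labels(model, words, k):
    X = np.stack([model[w] for w in words])
    affinity = (cosine_similarity(X) + 1.0) / 2.0  # map cosine to [0, 1]
    return SpectralClustering(n_clusters=k, affinity="precomputed",
                              random_state=0).fit_predict(affinity)

# Illustrative vocabulary; a real check would use the study's word list.
words = ["dog", "cat", "shark", "run", "walk", "swim",
         "college", "school", "pool", "room"]

w2v = api.load("word2vec-google-news-300")
glove = api.load("glove-wiki-gigaword-300")
common = [w for w in words if w in w2v and w in glove]

agreement = adjusted_rand_score(spectral_labels(w2v, common, 3),
                                spectral_labels(glove, common, 3))
print(f"word2vec vs GloVe cluster agreement (ARI): {agreement:.2f}")
```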
Given the nature of sEEG data, one would expect a detailed analysis of temporal dynamics. While the time window analysis (ranging from 100 to 900 milliseconds) suggests stable semantic decoding, the study does not investigate how this temporal consistency may vary across different brain regions.
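A region-resolved extension of the time-window analysis is straightforward to sketch. In the (hypothetical) code below, epochs, labels, and region_masks are assumed data structures, not variables from the Authors' pipeline; the classifier and window parameters are likewise illustrative.

```python
# Sketch (our assumptions): sliding-window decoding computed separately per
# brain region, yielding one decoding time course per region so that the
# stability of semantic decoding can be compared across areas.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def regionwise_time_decoding(epochs, labels, region_masks, sfreq,
                             win_s=0.1, step_s=0.05):
    """epochs: (n_words, n_channels, n_times); labels: cluster IDs;
    region_masks: {region_name: boolean channel mask}."""
    win, step = int(win_s * sfreq), int(step_s * sfreq)
    n_times = epochs.shape[-1]
    scores = {}
    for region, mask in region_masks.items():
        region_scores = []
        for start in range(0, n_times - win + 1, step):
            # Flatten the channels-by-time window into one feature vector.
            X = epochs[:, mask, start:start + win].reshape(len(epochs), -1)
            clf = LogisticRegression(max_iter=1000)
            region_scores.append(cross_val_score(clf, X, labels, cv=5).mean())
        scores[region] = np.array(region_scores)
    return scores  # one decoding time course per region
```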
The lack of neuropsychological data on the patients is particularly problematic, especially given that no language profile is provided – not even basic information such as participants’ native language. This is concerning in light of the highly variable output: “[...] participants engaged in natural conversations [...] lasted between 16 to 92 minutes [...] each participant spoke an average of 2,696 ± 1,786 (Mean ± s.t.d) words, including 621 ± 278 unique, unrepeated words.” It remains unclear whether any patients suffer from language production (or comprehension) impairments. More broadly, the study involves a specific population – patients undergoing epilepsy monitoring – which raises questions about the generalizability of the findings to healthy individuals or to other populations with language impairments. These limitations should be explicitly acknowledged and discussed.
Relying on the preprint entitled “Natural language processing models reveal neural dynamics of human conversation” (bioRxiv) for methodological details is not ideal, as it forces readers to go on a scavenger hunt to piece together critical information about the neural data processing. Even the supplementary materials of that preprint do not provide sufficient detail to fully understand all the preprocessing and analysis steps involved. These details should be clearly and comprehensively presented within the current preprint manuscript.
Minor concerns and feedback:
Figure 2A employs transparency to indicate prediction correctness, a visual encoding that significantly limits the panel’s interpretability. The lack of clear visual differentiation prevents an unambiguous reading of the model’s performance at the individual token level and thus reduces the overall comprehensibility of the figure.
The Authors use the terms '128-256 channels' and '1908 bipolar channels', but it is unclear whether the latter refers to the total number of contact points across all channels and patients. We assume the Authors mean 128-256 channels per participant, and that after excluding channels with IEDs, noise, and bad signal quality, 1,908 bipolar channels remained overall (1,908 / 14 patients ≈ 136 bipolar channels per patient). The Authors should clarify whether this is correct.
Concluding remarks
We thank the Authors of the preprint for posting their work openly for feedback. We also thank all participants of the review call for their time and for engaging in the lively discussion that generated this review.
Competing interests
The authors declare that they have no competing interests.