PREreview of A sparse code for natural sound context in auditory cortex

CC BY 4.0

This review is the result of a virtual, live-streamed preprint journal club organized and hosted by PREreview and the journal Current Research in Neurobiology (CRNEUR) as part of a community-based review pilot (you can read more about the pilot here). The discussion was joined by 12 people in total, including a preprint author, 2 journal editors and 2 call facilitators. We thank all participants who contributed to the discussion and made it possible for us to provide feedback to this preprint.


In this paper, the authors investigate the neural code underlying the representation of auditory context, in particular how distributed this code is across primary and secondary auditory cortical areas of ferrets and how inhibitory and excitatory cells each participate in it. They perform extracellular recordings from primary and secondary auditory cortices in passively listening ferrets while presenting probe stimuli preceded by different auditory contexts (i.e., the sounds preceding a probe stimulus), which allows them to test how previous sound context integrates with current auditory inputs. Furthermore, they identify inhibitory neurons through optogenetic photo-tagging to describe the specific role of these neurons in this process.

Through a combination of general linear modeling (GLM), decoding, and measurements of changes in PSTHs following probe sound presentation in different contexts, the authors demonstrate that context is sparsely represented across populations in the auditory cortex. At the single-neuron level, the authors found that different neurons represented distinct contexts, and that at the population level these neurons together could represent almost the entire context landscape tested. This suggests that the sound context preceding a stimulus is represented by a distributed code. The preceding sound context could be represented for almost a second, with the secondary auditory cortical region, the dorsal peri-ectosylvian gyrus (dPEG), showing a longer-lasting representation of context than primary auditory cortex. Finally, a GLM-STRF model with adaptation dynamics and contributions from the simultaneously recorded population of neurons (but not a model with only the STRF) was able to simulate the effect of context on probe stimulus representations, suggesting that the context is embedded in a population code.

All in all, the consensus among the PREreview participants was that this is a well-written paper that provides a comprehensive investigation of how the representation of sound is influenced by preceding sound context across populations of neurons in auditory cortical areas of ferrets.

Below we list major and minor concerns that were discussed by participants of the journal club, and, where possible, we provide suggestions on how to address those issues.

List of major concerns and feedback:

  • The amplitude measure and the duration measure of context effects are likely dependent measures, given that the authors compute the integral over the significant bins - units with more significant bins would naturally have higher amplitude values (i.e., more numbers to add). It would be good to run a control showing how strongly the two measures would correlate by chance.

  • In relation to the principal component analysis: the fact that PC1 can capture almost all of the context pairs seems to be an argument for a low-dimensional, and therefore not sparse, representation, since a sparse code would be expected to be high-dimensional. Though the other pieces of evidence favor an interpretation of sparse representation, this finding seems at odds with that. Reporting how much variance is explained by PC1 and by the additional PCs is needed to make this clear.

  • Results from the narrow-spiking population and the opto-tagged population are at times at odds with each other (e.g., Fig. 5E,F). I think it is possible that the discrepancy has arisen from a lack of subsampling the populations to match the numbers of neurons for comparison. Please rerun these analyses with subsampling so that the number of neurons is matched in each group. Otherwise, interpretation of these results is difficult.

  • In order to justify that the STRF cannot capture the context over a full 1-second timescale, it would seem justifiable to create STRFs with 1-second history.

  • Are the STRF-based predictions predicated on assumptions of linearity (typically Shihab and Stephen assume linearity, right?)? Would a non-linear bottom-up/acoustically driven model better fit the data? (Ex:

  • The discussion dives into the distinct feedforward and recurrent roles of excitatory and inhibitory neurons. This can be expanded on by investigating the temporal profile of the coupling filters from the GLM - particularly whether the coupling filters' temporal profile differs between excitatory and inhibitory neurons.
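
As an illustration of the chance-level control suggested in the first major concern, here is a minimal sketch in Python/NumPy. It is not the authors' analysis: the function names, the 50 ms bin width, and the definition of duration as the end of the last significant bin are all assumptions made for the example. The idea is to keep each unit's significance mask (and hence its duration) fixed, shuffle the magnitudes of the significant bins across the whole population, and recompute the amplitude-duration correlation. The resulting null distribution estimates how much correlation arises purely because amplitude is an integral over significant bins.

```python
import numpy as np

def amplitude_and_duration(diff, sig, bin_s=0.05):
    """Hypothetical amplitude/duration measures (assumed definitions).

    diff : (n_units, n_bins) context-driven response difference per time bin
    sig  : (n_units, n_bins) boolean mask of significant bins
    """
    # Amplitude: integral of the absolute difference over significant bins.
    amplitude = np.sum(np.abs(diff) * sig, axis=1) * bin_s
    # Duration: end time of the last significant bin (0 if none is significant).
    n_bins = diff.shape[1]
    first_true_from_end = np.argmax(sig[:, ::-1], axis=1)
    last_bin_end = np.where(sig.any(axis=1), n_bins - first_true_from_end, 0)
    duration = last_bin_end * bin_s
    return amplitude, duration

def chance_correlation(diff, sig, n_shuffles=1000, bin_s=0.05, seed=0):
    """Null distribution of the amplitude-duration correlation.

    Significance masks are kept fixed; magnitudes of significant bins are
    permuted across all units, so only the structural dependence remains.
    """
    rng = np.random.default_rng(seed)
    _, duration = amplitude_and_duration(diff, sig, bin_s)
    pooled = np.abs(diff[sig])  # all significant-bin magnitudes, pooled
    null_r = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shuffled = np.zeros_like(diff, dtype=float)
        shuffled[sig] = rng.permutation(pooled)
        amplitude, _ = amplitude_and_duration(shuffled, sig, bin_s)
        null_r[i] = np.corrcoef(amplitude, duration)[0, 1]
    return null_r
```

The observed correlation could then be compared against `null_r`, for example by reporting where it falls relative to the 95th percentile of the null distribution. If the observed value lies well within the null, the amplitude-duration relationship would be explained by the way the measures are constructed.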

List of minor concerns and feedback:

  • Beyond calling them natural sounds, there does not seem to be much description of what the sounds were. Particularly it would be good to know how much the sounds are dynamically changing over the 1-second sound clip. The sound of water streaming is quite different from vocalization in terms of dynamics over the course of the sound stimulus. Please provide a fuller description of what the sounds used were.

  • The authors should consider adding some remarks on how the effects of context may have differed across laminae and, in relation to this, mention in the methods section the angle of approach to cortex and the degree to which the recording contacts span cortical laminae.

  • In all figures error bars are currently SEM. It may be a matter of taste, but I find 95% confidence intervals easier to use for visual statistical inference of the figures - as that is what significance tests are testing for.

  • In figure 3: the y-axis in panel A has context pairs. It would be good to know what each pair is, especially since it looks like they are clustered in some sensible way.

  • The reference to panels Fig. 1G-K in line 105 is incorrect.

  • It would be good to show how well the encoding model actually predicts the neural activity - It is currently only shown as a relative measure to the best model prediction. It would just be good to know how well the model performs in general - to better understand the limits of interpretation.

  • The behavioral state of the animals seems like a crucial factor in long timescale effects. The pupil responses being reported are a great addition to your data, and suggest the animals were alert, but given that others have reported the pupil diameter as being quite sensitive to temporal sound patterns (Barczak et al. 2018, PNAS; many others) you may want to compare your effects to others’.

  • We could not find a specific section for data and code availability. There was a mention of software that was developed for the lab, but there was no link to it.
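
To make the point about error bars concrete, the following is a minimal sketch comparing the SEM with a percentile-bootstrap 95% confidence interval of the mean. The function names and parameters are illustrative assumptions, and the percentile bootstrap is only one of several ways to obtain such an interval; it is not taken from the preprint.

```python
import numpy as np

def sem(x):
    """Standard error of the mean, using the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / np.sqrt(x.size)

def bootstrap_ci(x, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    means = x[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(means, [alpha, 1.0 - alpha])
    return lo, hi
```

For roughly normal data the 95% interval spans about twice the width of a mean ± SEM bar, which is why non-overlapping SEM bars do not by themselves indicate a significant difference.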

Concluding remarks

We thank the authors of the preprint for posting their work as such and for agreeing to participate in this pilot. We also thank all participants of the live-preprint journal club for their time and for engaging in the lively discussion that generated this review.

Competing interests

Daniela Saderi, one of the call facilitators, conducted her PhD in the David lab. Daniela did not contribute to the review.