Maybe neuroscience is sort of like an engineering project where we build bridges from brain to mind. On our side of the river lie the nuts and bolts of the brain, and on the other lies our psychology. Building the bridge is about explaining how the former realizes the latter.
In the case of language comprehension, over at the other riverbank we see rich hierarchical representations of meaning and syntax. The exciting project for us is to figure out what it is about the brain that can transform squiggly air into such linguistic structures. At our pier we have neurons, networks, and anatomical regions, but also dynamical phenomena.
Squiggly problems inspire squiggly solutions. Neural oscillations are widely pursued as a potential mechanism for language comprehension. In this post I want to explore the scope and limits of this idea. Let’s first spell out the basic picture. It’s quite elegant.
We begin with the problem statement at hand. If you look at a speech waveform, you can see the syllables and gaps of silence by eye. Although the gaps carry information (Morse code disintegrates without breaks), they don’t require content decoding. In contrast, high-amplitude moments demand all the neural resources you can spare, because the number of possible states is high. How might the brain home in on these segments and extract the content they carry?
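To make the problem statement concrete, here is a minimal sketch in Python (using a synthetic stand-in for a speech envelope, not real audio) of what those bursts and gaps look like, and of how crudely thresholding the envelope separates the high-amplitude segments from the silences. The syllable timings and the threshold are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for a speech amplitude envelope: syllable-like bursts of
# energy separated by brief silences. The onsets are slightly irregular on
# purpose; real speech is only quasi-rhythmic.
fs = 1000                         # samples per second
t = np.arange(0, 2.0, 1 / fs)     # two seconds of signal

envelope = np.zeros_like(t)
syllable_onsets = [0.10, 0.35, 0.62, 0.85, 1.15, 1.40, 1.70]
for onset in syllable_onsets:
    # each "syllable" is a smooth bump of acoustic energy roughly 150 ms wide
    envelope += np.exp(-((t - onset - 0.075) ** 2) / (2 * 0.03 ** 2))

# Crude segmentation: samples above threshold count as high-amplitude,
# information-dense moments; everything else is a gap.
is_loud = envelope > 0.3
transitions = np.flatnonzero(np.diff(is_loud.astype(int)))  # burst onsets/offsets
for start, end in zip(transitions[::2], transitions[1::2]):
    print(f"high-amplitude segment: {t[start]:.2f}s to {t[end]:.2f}s")
```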
Comprehension by passive synchronization
One idea is that oscillations align phases of high firing to the high-information segments of speech through entrainment. Imagine two church bells hung side by side with a bit of space between them. Tap one with a hammer and the other will ring too. In the same vein, networks in the brain wired to oscillate at roughly the rate of speech have a propensity to synchronize to incoming sentences, with excitable peaks aligning to sounds and inhibitory troughs aligning to silences. What you get is a homunculus-free process in which the brain organically samples the relevant waveform segments.
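As a toy model of that bell-like synchronization, consider a single phase oscillator whose intrinsic rate sits near the syllable rate and whose phase is pulled toward the phase of a rhythmic input. This is a generic driven phase-oscillator sketch of my own, not a model taken from the papers discussed here, and the rates and coupling strength are arbitrary choices.

```python
import numpy as np

# A generic driven phase oscillator (Kuramoto-style forcing): the oscillator's
# intrinsic rate is close to, but not equal to, the input's "syllable" rate.
# A modest coupling is enough to pull its excitable phase into register with
# the incoming bursts. All parameters are illustrative assumptions.
fs = 1000
t = np.arange(0, 4.0, 1 / fs)

stim_rate = 4.0                              # "syllable" rate of the input (Hz)
stim_phase = 2 * np.pi * stim_rate * t
stimulus = 0.5 * (1 + np.cos(stim_phase))    # bursts (near 1) alternating with gaps (near 0)

natural_rate = 4.5                           # oscillator's intrinsic frequency (Hz)
coupling = 6.0                               # strength of the pull toward the input's phase
phase = np.zeros(len(t))
for i in range(1, len(t)):
    # drift at the natural rate, plus a pull toward the current stimulus phase
    dphi = 2 * np.pi * natural_rate + coupling * np.sin(stim_phase[i] - phase[i - 1])
    phase[i] = phase[i - 1] + dphi / fs

excitability = np.cos(phase)                 # high at the excitable peak of each cycle

# Once entrained, high-excitability moments coincide with the input's bursts.
late = t > 2.0                               # skip the initial settling period
r = np.corrcoef(stimulus[late], excitability[late])[0, 1]
print(f"burst/excitability correlation after settling: {r:.2f}")
```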
One of the rewarding aspects of science is the glimpses of beauty you see along the way, and this has been one of them for me. Through simple principles of physics and self-organization at the neural level, you get computational enrichment at the psychological level.
But the flip side of elegance is oversimplification. Recall the task description at hand, which is to explain how the brain extracts linguistic structure from waveforms. With its graceful simplicity, our homunculus-free process is also quite dull: it can only tailor oscillatory networks to what is actually rhythmic in the waveform. Well, what is rhythmic in the waveform?
If we take a step back to peer across the river at the connecting structure psycholinguists have prepared for us, we appear to be off the mark. Not much of what linguists are talking about is linearly decodable from waveforms. Words and syllables might be — they occur every 200 milliseconds or so — but language comprehension is not a simple matter of concatenating words. As we hear a string of words and parse them over time, our minds impose highly nonlinear tree structures onto them, which requires much more than merely tacking one word onto the next.
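To see how much more a parse is than a left-to-right concatenation, here is a toy illustration in Python, with nested tuples standing in for syntactic trees. The sentence and its two bracketings are a textbook ambiguity example, and the labels are simplifications of my own; the point is only that one flat word string is compatible with more than one hierarchy.

```python
# One flat word string, two different hierarchies: the structure is imposed by
# the comprehender, not read linearly off the signal. Nested tuples stand in
# for trees; element 0 of each tuple is a category label, the rest are children.
words = ["I", "saw", "the", "man", "with", "the", "telescope"]

# Reading A: the telescope is the instrument of seeing
parse_a = ("S", ("NP", "I"),
                ("VP", ("V", "saw"),
                       ("NP", "the", "man"),
                       ("PP", "with", ("NP", "the", "telescope"))))

# Reading B: the man is the one holding the telescope
parse_b = ("S", ("NP", "I"),
                ("VP", ("V", "saw"),
                       ("NP", ("NP", "the", "man"),
                              ("PP", "with", ("NP", "the", "telescope")))))

def leaves(tree):
    """Flatten a tree back into its left-to-right word string."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:       # skip the category label at position 0
        out.extend(leaves(child))
    return out

# Both hierarchies linearize to the identical waveform-level word sequence.
assert leaves(parse_a) == leaves(parse_b) == words
```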
Insofar as passively entrained oscillations are at play in language comprehension, syntactical processing appears to be outside their explanatory scope. This is because passively synchronizing networks are like a train driving across the waveform, building new rails in front of it as it goes. Basically, the core idea of cell ensembles passively following along with rhythmic speech precludes nonlinear processing. And if there’s no shipping around of linguistic elements, we’re not really talking about computation at all.
As described in an incisive and timely paper by Kazanina & Tavano (2022) [1], which inspired this blog post, we might call the passive synchronization idea a chunking view, because the syntactic units are assumed to be constructed segment by segment. How might the oscillatory view be expanded to accommodate the winding paths of linguistic form?
Comprehension by active integration
The brain is not a radio antenna that fixedly processes whatever you throw at it. Instead, the brain proactively predicts and sends top-down information to alter processing. Under this active view, we might think twice about leaving behind that homunculus.
From this perspective, oscillations hierarchically structure syntactical elements in the input stream regardless of whether those elements were adjacently presented or not — a view that Kazanina and Tavano call the integration view. As they emphasize, what sets apart this class of models is that they don’t cast oscillations as faithful recapitulators of the input stream. Rather, the content held online by oscillations might unfold faster than it unfolded in the spoken sentence, or things might be shuffled around to unlock computation [2]. Even the oscillatory dynamics themselves might be complex, with faster waves nesting inside slower waves to package smaller units in an overarching structure [3].
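One way to picture that nesting is phase-amplitude coupling, where the phase of a slow rhythm gates the amplitude of a faster one, so several fast cycles ride inside each slow cycle. The sketch below is a generic illustration of that arrangement with made-up frequencies, not a simulation of any model from the cited papers.

```python
import numpy as np

# "Faster waves nesting inside slower waves" as phase-amplitude coupling: the
# amplitude of a fast (gamma-like) rhythm is gated by the phase of a slow
# (theta-like) rhythm, packaging several fast cycles into each slow cycle.
# Frequencies are made-up illustrative values.
fs = 1000
t = np.arange(0, 1.0, 1 / fs)

slow_freq, fast_freq = 5.0, 40.0                   # Hz
slow_phase = 2 * np.pi * slow_freq * t
slow_wave = np.cos(slow_phase)

gamma_envelope = 0.5 * (1 + np.cos(slow_phase))    # maximal at the slow wave's peak
fast_wave = gamma_envelope * np.cos(2 * np.pi * fast_freq * t)

nested_signal = slow_wave + fast_wave              # the combined, nested rhythm

# Rough check of the nesting: fast-wave power is far larger near slow-wave
# peaks than near slow-wave troughs.
near_peak = slow_wave > 0.7
near_trough = slow_wave < -0.7
print("fast power near slow peaks:  ", round(float(np.mean(fast_wave[near_peak] ** 2)), 3))
print("fast power near slow troughs:", round(float(np.mean(fast_wave[near_trough] ** 2)), 3))
```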
Paradoxically, active models tend to be more modest about the computational scope of oscillations in language comprehension. In my reading, this is because as we model how oscillations are coordinated top-down to carry linguistic information in non-trivial formats, we tend to offshore computational responsibility to other processes. Specifically, we implicitly or explicitly invoke stored representations of grammatical rules and semantics as doing some of the work — displacing part of our scientific explanation from temporal to spatial codes.
Keeping track of what we’re trying to explain
Let’s take stock of our bridge-building efforts. Oscillatory models that explain linguistic processing through passive mechanisms fail to account for the non-linearity of language. Active models get around this by having the brain shuffle units around, which is necessary for parsing speech. But for the brain to be able to do that, it needs a m̶y̶s̶t̶e̶r̶i̶o̶u̶s̶ ̶h̶o̶m̶u̶n̶c̶u̶l̶u̶s̶ spatially coded rulebook on how to process linguistic units. In one sense, then, active models kick the conundrum down the road.
The limits of passive oscillatory views
P1: Minds represent language using tree structures, which is a core aspect of syntax
P2: Streams of speech do not have such structure embedded linearly in the signal
P3: Passive oscillations process speech linearly
C: Ergo, passively driven oscillations don’t explain much about syntactical processing
The limits of active oscillatory views
P4: Rather than passively synchronize, oscillations are systematically coordinated to structure linguistic units
P5: The computational machinery that coordinates such reorganization resides elsewhere
C: Ergo, actively driven oscillations don’t explain much about syntactical processing
It seems that the question of how we process syntax arrives at the same puzzling destination as the question of how we process semantics: how the heck is linguistic information stored in the brain [4], and how is such information brought to bear on the units extracted from the stream? If we get a handle on how the algorithms of language are encoded and wielded, I think we’ll proceed more efficiently in our inquiry into how oscillations come to do what they do, including their sophisticated acts of gymnastics where bands interweave to embed content.
Put differently, maybe both passive chunking and active integration approaches risk putting the cart before the horse by focusing on oscillations over the locus of linguistic algorithms and symbols [5]. What are the bones of our computational machinery? How are the primitive rules and symbols of language coded physically? These sorts of questions keep biting us in the butt, and working on them might re-invigorate work on oscillations rather than lay the field to rest.
More broadly, I think the case study of language comprehension illustrates why science benefits from moving from mind to brain rather than from brain to mind [6]. It might be tempting, in adopting a bottom-up approach, to throw tree structures out of the window; indeed, that would be a surefire way to get a bridge off the ground from our side of the river. But not all bridges are sound; some can’t explain human language.
In conclusion, from one point of view, the constraints set by psycholinguistics are annoying at best and theory-stopping at worst. From another, concepts like tree structures are a useful map to trace our research by. In this light, the computational specifications we might be tempted to break down into brain-based terminology are best seen as a yardstick that tells us how far along our bridge is.