
Predictive Coding, Music, and the Vector Space of Conversation

2026-04-05

Starting from predictive coding theory, this post explains the neural mechanism of musical pleasure, models conversation as directional choices in a vector space, and shows that the two share an information-theoretic foundation.


Predictive Coding: A Unified Explanation of Pleasure

The cerebral cortex (especially the auditory cortex and prefrontal cortex) continuously predicts the next note. When a prediction is violated "just right" — say, an unexpected but harmonically reasonable key change — the prefrontal-striatal circuit generates a positive prediction error signal. This is the direct source of pleasure.

A landmark 2011 study in Nature Neuroscience by the Zatorre team at McGill University revealed the dopamine mechanism behind this: when music induces "chills," dopamine release in the ventral striatum occurs in two phases — the anticipation phase (caudate nucleus activation) and the climax phase (nucleus accumbens activation). This is the same reward circuit used for food, sex, and other primary rewards.

This leads to a core principle:

Too predictable → boredom. Completely unpredictable → noise aversion. Maximum pleasure occurs in the medium information-entropy range.

This is the Wundt curve (an inverted U). From this perspective, music is in essence a temporal art that manipulates neural prediction systems through the establishment and disruption of acoustic patterns.

This model doesn't only apply to music. It can explain any pleasure experience based on pattern expectancy — including conversation.

The Wundt curve — pleasure peaks at medium prediction error.
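To make the inverted U concrete, here is a minimal sketch that models pleasure as a Gaussian bump over a prediction error normalized to [0, 1]. The peak location and width are illustrative assumptions, not fitted values.

```python
import numpy as np

def wundt_pleasure(prediction_error: float, peak: float = 0.5, width: float = 0.2) -> float:
    """Inverted-U (Wundt) curve: pleasure as a Gaussian bump over a
    normalized prediction error. peak and width are illustrative only."""
    return float(np.exp(-((prediction_error - peak) ** 2) / (2 * width**2)))

# Boredom at zero error, aversion at maximal error, peak pleasure in between:
for err in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"error={err:.2f} -> pleasure={wundt_pleasure(err):.2f}")
```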


A Vector Space Model of Conversation

Conversation can be modeled as directional choices in vector space. At the start of an exchange, initiator A proposes a direction vector x. Each subsequent response from B is a choice of a new vector in this space. Different choices lead to vastly different conversation quality.
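Before looking at the strategies, a minimal sketch of the setup. The embed() function below is a hypothetical stand-in for any sentence-embedding model; it just hashes text into a deterministic pseudo-random unit vector so the example is self-contained.

```python
import hashlib
import numpy as np

def embed(utterance: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for a sentence-embedding model: hash the text
    into a deterministic pseudo-random unit vector."""
    seed = int.from_bytes(hashlib.md5(utterance.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def angle_deg(u: np.ndarray, v: np.ndarray) -> float:
    """Angle between two conversational directions: ~0 = collinear, ~90 = orthogonal."""
    return float(np.degrees(np.arccos(np.clip(u @ v, -1.0, 1.0))))

x = embed("A proposes a direction")    # initiator A's vector x
y = embed("B responds with a choice")  # B's new vector
print(f"{angle_deg(x, y):.1f} degrees between the two turns")
```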

Strategy 1: Collinear — Prediction Error ≈ 0

B chooses a direction aligned with or at a very small angle to x.

This manifests in two forms:

  • Collinear reinforcement: continuing the logical reasoning along A's line of thought — shallow deepening.
  • Neighborhood search: retrieving shallowly related content and paraphrasing it — a slightly different direction but similar magnitude.

Both are extremely energy-efficient for B's brain. But A wants to yawn — because A's internal model can fully predict B's output, yielding zero information gain. The musical analogue: endlessly repeating the same chord progression. Safe, but boring. The nucleus accumbens won't release dopamine.

Collinear response — almost no deviation. Boring.
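A toy illustration with synthetic vectors. As an assumption of this sketch, "information gain" is approximated by the norm of the component of B's response that x cannot already account for:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
x /= np.linalg.norm(x)                       # A's (unit) direction vector

b_reinforce = 1.3 * x                        # collinear reinforcement: same line
b_neighbor = x + 0.05 * rng.normal(size=8)   # neighborhood search: tiny angle
b_neighbor /= np.linalg.norm(b_neighbor)

def novelty(response: np.ndarray, x: np.ndarray) -> float:
    """Norm of the part of the response that x cannot account for."""
    return float(np.linalg.norm(response - (response @ x) * x))

print(novelty(b_reinforce, x))  # ~0    -> fully predictable, zero gain
print(novelty(b_neighbor, x))   # small -> almost nothing new for A's model
```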

Strategy 2: Orthogonal — Significant but Integrable Prediction Error

B proposes a direction orthogonal to x, with a critical constraint: the new vector must originate from a point on the line spanned by x, so that B branches off from A's direction rather than starting from nowhere.

This means that although B's contribution surprises A's predictions, A can absorb the new information by expanding dimensionality. The musical analogue: a "just-right violation of expectation" — an unexpected key change that can be harmonically rationalized after the fact.

The effect? The conversation is elevated by one dimension. A's internal representation space goes from n to n+1 dimensions; all subsequent predictions occur in a higher-dimensional space, and the explorable state space grows exponentially. Prediction error triggers dopamine release, and successful model updating produces a second wave of reward.

This requires substantial knowledge reserves from B, but the payoff is enormous: conversations that feature orthogonal exchanges at the right frequency are the ones that feel genuinely deep and of very high quality.

Orthogonal response — the dimension expands. The key insight.
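Here is a sketch of the n → n+1 move as a single Gram-Schmidt step: whatever part of B's response the current basis cannot express becomes a new basis direction. The vectors are synthetic stand-ins; a real conversation would have to supply them from the actual exchange.

```python
import numpy as np

def integrate(basis: list[np.ndarray], response: np.ndarray, tol: float = 1e-8) -> float:
    """One Gram-Schmidt step: strip the predictable part of the response;
    if a significant orthogonal remainder survives, adopt it as a new
    basis direction (n -> n+1). Returns the size of the surprise."""
    residual = response.astype(float)
    for b in basis:
        residual -= (residual @ b) * b
    surprise = float(np.linalg.norm(residual))
    if surprise > tol:
        basis.append(residual / surprise)
    return surprise

rng = np.random.default_rng(1)
x = rng.normal(size=8)
basis = [x / np.linalg.norm(x)]        # A's model of the topic: 1-D so far

response = rng.normal(size=8)          # B's move, carrying an orthogonal part
surprise = integrate(basis, response)
print(len(basis), round(surprise, 2))  # 2 dimensions now: the space expanded
```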

Strategy 3: Random — Very Large and Hard-to-Integrate Prediction Error

B proposes a vector y completely unrelated to x in both direction and origin. The musical analogue: suddenly inserting a passage of atonal noise. This is generally perceived as absurd, producing confusion or aversion, yet it is sometimes surprisingly effective.

The key insight: the randomness is only surface-level. Underneath, it falls into two types:

| | Memorized | True Random |
| --- | --- | --- |
| Mechanism | Retrieving a seemingly random vector from a pre-memorized corpus | Generating a high-information-content new structure in real time within the current context |
| Cognitive cost | Minimal — pure retrieval | Extreme — the generator is also expanding their own representation space in real time |
| Musical analogue | Quoting an unfamiliar melody from another culture — sounds like noise, but is actually deterministic copying | Coltrane leaving the theme in A Love Supreme to enter free improvisation — genuine creative mutation |

Random response — chaotic, disconnected jumps.


An Information-Theoretic Measure of Conversation Quality

The above framework has a powerful implied corollary: conversation quality can be quantified.

Conversation quality ≈ Δdim(both parties' internal representation spaces)
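A toy operationalization, under the assumption that each turn has already been reduced to a vector: stack the turns into a matrix and count the rank gained beyond the opening direction. The synthetic "conversations" below are purely illustrative.

```python
import numpy as np

def quality(turns: list[np.ndarray]) -> int:
    """Delta-dim proxy: rank of the stacked turn vectors, minus the one
    dimension the opening move contributes by itself."""
    return int(np.linalg.matrix_rank(np.vstack(turns))) - 1

rng = np.random.default_rng(2)
x = rng.normal(size=8)

collinear_chat = [x, 1.1 * x, 0.9 * x, 1.2 * x]  # strategy 1 only
mixed_chat = [x, rng.normal(size=8),             # orthogonal move
              1.1 * x,                           # collinear consolidation
              rng.normal(size=8)]                # another expansion

print(quality(collinear_chat))  # 0 -> no new dimensions opened
print(quality(mixed_chat))      # 2 -> two dimensions gained over the exchange
```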

The optimal conversation strategy:

  1. Expand dimensions in orthogonal directions at a certain frequency — the core driver
  2. Intersperse collinear deepening to consolidate new dimensions — otherwise the structure collapses
  3. Occasionally introduce one high-quality random input to test boundaries — exploring the unknown

This is essentially the core principle from Schmidhuber's "Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes": an agent's intrinsic reward comes not from low entropy (boredom), nor from high entropy (noise), but from the rate of entropy decrease — the moment when the model is successfully compressing information that was previously incompressible. Orthogonal exchange precisely maximizes this compression progress.
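As a deliberately tiny illustration of compression progress, the sketch below uses a unigram frequency model as the "compressor", a crude stand-in for an agent's world model. The intrinsic reward is how many bits the already-seen record saves once the model integrates a new input; all strings and constants are illustrative.

```python
import math
from collections import Counter

class TinyCompressor:
    """Unigram model with Laplace smoothing, standing in for the agent's
    compressor. Code length of a string = -sum(log2 p(char))."""
    ALPHABET = 128

    def __init__(self):
        self.counts: Counter = Counter()
        self.total = 0

    def update(self, text: str) -> None:
        self.counts.update(text)
        self.total += len(text)

    def code_length(self, text: str) -> float:
        return sum(
            -math.log2((self.counts[c] + 1) / (self.total + self.ALPHABET))
            for c in text
        )

def progress(model: TinyCompressor, record: str, new_input: str) -> float:
    """Compression progress: bits saved on the *already-seen* record once
    the model has integrated the new input."""
    before = model.code_length(record)
    model.update(new_input)
    return before - model.code_length(record)

model = TinyCompressor()
record = "ab" * 5
model.update(record)

print(progress(model, record, "ab" * 20))           # pattern still being learned: large gain
print(progress(model, record, "ab" * 20))           # pattern mastered: gain shrinks toward boredom
print(progress(model, record, "q7#xM!z0@rLp&cV2"))  # noise dilutes the model: gain goes negative
```

The same predictable input yields a large reward while the pattern is still being learned and almost none once it is mastered, matching the boredom end of the Wundt curve; noise yields no progress at all.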


Conversation as Coupled Dynamics

The above framework is unidirectional (B's response to A), but real conversation is bidirectional. If A and B are both simultaneously updating their internal models, conversation becomes the coupled dynamics of two predictive coding systems.

A good conversation may correspond to the two systems reaching a kind of resonance — each party's orthogonal inputs happen to land in the direction most amenable to expansion for the other's model.

This might explain why some people just "click" in conversation: not because their knowledge bases are similar, but because their models' expandable directions happen to be complementary.
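As a closing sketch of this coupling, reusing the same one-step integration as in the orthogonal example: two toy agents with partially overlapping subspaces take turns proposing vectors from their own span while the listener absorbs whatever is new. The complementary starting spaces are an assumption chosen to make the resonance visible.

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 12

def integrate(basis: list[np.ndarray], v: np.ndarray, tol: float = 1e-8) -> None:
    """Absorb whatever part of v the listener's basis cannot yet express."""
    r = v.astype(float)
    for b in basis:
        r -= (r @ b) * b
    n = np.linalg.norm(r)
    if n > tol:
        basis.append(r / n)

eye = np.eye(DIM)
a_basis = [eye[i] for i in range(3)]      # A spans dimensions 0-2
b_basis = [eye[i] for i in range(2, 5)]   # B spans dimensions 2-4 (overlap at 2)

# Each turn the speaker proposes a thought expressible in *their* space;
# the listener expands toward it. Complementary spaces keep both growing.
for turn in range(6):
    speaker, listener = (a_basis, b_basis) if turn % 2 == 0 else (b_basis, a_basis)
    proposal = sum(rng.normal() * b for b in speaker)
    integrate(listener, proposal)

print(len(a_basis), len(b_basis))  # 5 5 -> both converged on the joint span
```

Swap in identical starting spaces and the loop adds nothing, which is the point: complementarity, not similarity, is what drives mutual expansion.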