Predictive Coding, Music, and the Vector Space of Conversation

2026-04-05

Starting from predictive coding theory, this post explains the neural mechanisms of musical pleasure and models conversation as directional choices in vector space, revealing the information-theoretic foundation the two share.

Predictive Coding: A Unified Explanation of Pleasure

While we listen to music, the cerebral cortex (especially the auditory and prefrontal cortices) continuously predicts the next note. When a prediction is violated "just right", say by an unexpected but harmonically reasonable key change, the prefrontal-striatal circuit generates a positive prediction error signal. On this account, that signal is the direct source of pleasure.

A landmark 2011 study in Nature Neuroscience by the Zatorre team at McGill University revealed the dopamine mechanism behind this: when music induces "chills," dopamine release in the striatum occurs in two phases, an anticipation phase (in the caudate nucleus, part of the dorsal striatum) and a peak phase (in the nucleus accumbens, part of the ventral striatum). This is the same reward circuit used for food, sex, and other primary rewards.

This leads to a core principle:

Too predictable → boredom. Completely unpredictable → noise aversion. Maximum pleasure occurs in the medium information-entropy range.

This is the Wundt curve (inverted U-shape). From this perspective, the essence of music is a temporal art that manipulates neural prediction systems through the establishment and disruption of acoustic patterns.

This model doesn't apply only to music. It can explain any pleasurable experience that rests on pattern expectancy, including conversation.

The Wundt curve: pleasure peaks at medium prediction error.
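To make the inverted U concrete, here is a minimal sketch of a toy pleasure curve. The functional form 4e(1 - e), the normalization of prediction error to the range [0, 1], and the name `wundt_pleasure` are all assumptions made for illustration; nothing here is fitted to the neuroscience data discussed above.

```python
def wundt_pleasure(prediction_error: float) -> float:
    """Toy inverted-U curve: 0 at both extremes, 1 at the midpoint.

    prediction_error is assumed to be normalized to [0, 1], where 0 means
    fully predictable and 1 means completely unpredictable.
    """
    if not 0.0 <= prediction_error <= 1.0:
        raise ValueError("prediction_error must lie in [0, 1]")
    # Hypothetical shape: any smooth single-peaked function would do here.
    return 4.0 * prediction_error * (1.0 - prediction_error)


if __name__ == "__main__":
    for e in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"error={e:.2f}  pleasure={wundt_pleasure(e):.2f}")
```

The two endpoints (boredom and noise) both score 0 and the midpoint scores 1, which is all the Wundt curve claims qualitatively.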

A Vector Space Model of Conversation

Conversation can be modeled as directional choices in vector space. At the start of an exchange, initiator A proposes a direction vector x. Each subsequent response from B is a choice of a new vector in this space. Different choices lead to vastly different conversation quality.

Strategy 1: Collinear — Prediction Error ≈ 0

B chooses a direction aligned with or at a very small angle to x.

This manifests in two forms:

Collinear reinforcement: continuing the logical reasoning along A's line of thought — shallow deepening.
Neighborhood search: finding shallowly related content for synonymous transformation — slightly different direction but similar magnitude.

Both are extremely energy-efficient for B's brain. But A wants to yawn — because A's internal model can fully predict B's output, yielding zero information gain. The musical analogue: endlessly repeating the same chord progression. Safe, but boring. The nucleus accumbens won't release dopamine.

Collinear response: almost no deviation. Boring.
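If A's proposal and B's response are represented as numeric vectors (for instance sentence embeddings, which is an assumption and not something this post specifies), "collinear" can be checked by measuring the angle between them. The sketch below uses a hypothetical 15-degree cutoff; the threshold and the function names are illustrative only.

```python
import math


def angle_degrees(x, y):
    """Angle between two non-zero vectors of equal length, in degrees."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.degrees(math.acos(cosine))


def is_collinear(x, y, max_angle=15.0):
    """True when y points almost the same way as x (hypothetical 15-degree cutoff)."""
    return angle_degrees(x, y) <= max_angle


if __name__ == "__main__":
    x = [1.0, 0.0]
    print(is_collinear(x, [2.0, 0.1]))  # small angle to x
    print(is_collinear(x, [0.0, 1.0]))  # right angle to x
```

A response that passes this check carries little that A could not have predicted, which is the "prediction error ≈ 0" case described above.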

Strategy 2: Orthogonal — Significant but Integrable Prediction Error

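B proposes a direction orthogonal to x, with a critical constraint: the new vector's starting point (its origin) must lie within the span of x. In other words, it must branch off from a point on A's own line of thought.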

This means that although B's contribution surprises A's predictions, A can absorb the new information by expanding dimensionality. The musical analogue: a "just-right violation of expectation" — an unexpected key change that can be harmonically rationalized after the fact.

The effect? The conversation is elevated by one dimension. A's internal representation space goes from n to n+1 dimensions, all subsequent predictions occur in that higher-dimensional space, and the explorable state space is multiplied with each added dimension, so it grows exponentially in the number of dimensions. Prediction error triggers dopamine release, and successful model updating produces a second wave of reward.

This requires substantial knowledge reserves from B, but the payoff is enormous. Deep conversations in which orthogonal exchanges recur regularly, without crowding out everything else, tend to be of very high quality.

Orthogonal response: the dimension expands. The key insight.
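The step from n to n+1 dimensions can be sketched with a single Gram-Schmidt step: split B's response into the part A's current basis already spans and a residual, and add the residual as a new axis only if it is non-trivial. This is a minimal sketch under the assumption that A's representation space is given as an orthonormal basis inside a fixed ambient space; the names `split_response` and `absorb` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def split_response(basis, y):
    """Split y into the part spanned by an orthonormal basis and the rest."""
    in_span = [0.0] * len(y)
    for b in basis:
        coeff = dot(y, b)
        in_span = [s + coeff * bi for s, bi in zip(in_span, b)]
    novel = [yi - si for yi, si in zip(y, in_span)]
    return in_span, novel


def absorb(basis, y, tol=1e-9):
    """Return A's basis after hearing y; it gains one axis if y adds a new direction."""
    _, novel = split_response(basis, y)
    norm = math.sqrt(dot(novel, novel))
    if norm <= tol:
        return list(basis)  # nothing new: y is already spanned by the basis
    return list(basis) + [[c / norm for c in novel]]


if __name__ == "__main__":
    a_basis = [[1.0, 0.0, 0.0]]                      # A's space is one-dimensional
    print(len(absorb(a_basis, [2.0, 0.0, 0.0])))     # collinear response
    print(len(absorb(a_basis, [0.0, 3.0, 0.0])))     # orthogonal response
```

A collinear response leaves the basis unchanged, while an orthogonal one adds exactly one axis, which is the "elevated by one dimension" effect in miniature.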

Strategy 3: Random — Very Large and Hard-to-Integrate Prediction Error

B proposes a vector y completely unrelated to x in both direction and origin. The musical analogue: suddenly inserting a passage of atonal noise. Generally perceived as absurdist, producing confusion or aversion. But sometimes surprisingly effective.

The key insight: randomness is surface-level. The underlying nature falls into two types:

	Memorized	True Random
Mechanism	Retrieving a seemingly random vector from pre-memorized corpus	Generating a high-information-content new structure in real-time within the current context
Cognitive cost	Minimal — pure retrieval	Extreme — the generator is also expanding their own representation space in real-time
Musical analogue	Quoting an unfamiliar melody from another culture — sounds like noise, but is actually deterministic copying	Coltrane leaving the theme in A Love Supreme to enter free improvisation — genuine creative mutation

Random response: chaotic, disconnected jumps.

An Information-Theoretic Measure of Conversation Quality

The above framework suggests a strong corollary: conversation quality can, at least in principle, be quantified.

Conversation quality ≈ Δdim(both parties' internal representation spaces)
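One possible reading of this measure, sketched below, treats each party's internal representation as a list of vectors and takes dim to be the number of linearly independent vectors among them. Summing the gains of both parties is an assumption about what "both parties" means here, and `rank` and `conversation_quality` are hypothetical helper names.

```python
def rank(vectors, tol=1e-9):
    """Number of linearly independent vectors, counted via Gram-Schmidt."""
    basis = []
    for v in vectors:
        residual = list(v)
        for b in basis:
            coeff = sum(r * bi for r, bi in zip(residual, b))
            residual = [r - coeff * bi for r, bi in zip(residual, b)]
        norm = sum(r * r for r in residual) ** 0.5
        if norm > tol:
            basis.append([r / norm for r in residual])
    return len(basis)


def conversation_quality(before_a, after_a, before_b, after_b):
    """Sum of dimension gains for both parties (one reading of the Δdim measure)."""
    return (rank(after_a) - rank(before_a)) + (rank(after_b) - rank(before_b))


if __name__ == "__main__":
    before_a = [[1.0, 0.0, 0.0]]
    after_a = before_a + [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # one new direction
    before_b = [[0.0, 1.0, 0.0]]
    after_b = before_b + [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # two new directions
    print(conversation_quality(before_a, after_a, before_b, after_b))  # 3
```

Collinear additions such as `[2.0, 0.0, 0.0]` contribute nothing to the score, matching the claim that they yield zero information gain.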

The optimal conversation strategy:

Expand dimensions in orthogonal directions at a certain frequency — the core driver
Intersperse collinear deepening to consolidate new dimensions — otherwise the structure collapses
Occasionally introduce one high-quality random input to test boundaries — exploring the unknown

This is essentially the core principle of Schmidhuber's "Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes": an agent's intrinsic reward comes not from low entropy (boredom), nor from high entropy (noise), but from the rate of entropy decrease, that is, from the moment when the model succeeds in compressing information that was previously incompressible. On this reading, orthogonal exchange is the strategy that maximizes compression progress.
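Compression progress can be illustrated with a deliberately tiny model. The sketch below assumes the "model" is just an add-one-smoothed symbol frequency table and measures how many bits are saved on a new observation once the model has learned from it. This is a simplification made for illustration, not Schmidhuber's formulation, and the function names are hypothetical.

```python
import math
from collections import Counter


def code_length_bits(sequence, counts, alphabet):
    """Bits needed to encode sequence under an add-one-smoothed frequency model."""
    total = sum(counts[s] for s in alphabet) + len(alphabet)
    return sum(-math.log2((counts[s] + 1) / total) for s in sequence)


def compression_progress(history, observed, alphabet):
    """Bits saved on `observed` after the model has also learned from it."""
    old_counts = Counter(history)
    new_counts = old_counts + Counter(observed)
    before = code_length_bits(observed, old_counts, alphabet)
    after = code_length_bits(observed, new_counts, alphabet)
    return before - after


if __name__ == "__main__":
    alphabet = "ab"
    boring = compression_progress("a" * 100, "aaaa", alphabet)   # already predicted
    novel = compression_progress("a" * 100, "bbbb", alphabet)    # surprising but learnable
    noise = compression_progress("abababab", "abab", alphabet)   # nothing left to learn
    print(f"boring={boring:.3f}  novel={novel:.3f}  noise={noise:.3f}")
```

In this toy setting the already-predicted input yields almost no progress, the balanced "noise" input yields none, and the surprising but learnable input yields the most, which mirrors the argument above.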

Conversation as Coupled Dynamics

The above framework is unidirectional (B's response to A), but real conversation is bidirectional. If A and B are both simultaneously updating their internal models, conversation becomes the coupled dynamics of two predictive coding systems.

A good conversation may correspond to the two systems reaching a kind of resonance — each party's orthogonal inputs happen to land in the direction most amenable to expansion for the other's model.

This might explain why some people just "click" in conversation: not because their knowledge bases are similar, but because their models' expandable directions happen to be complementary.
