§ 0 · essay · written 2026.04.19, night · 6 min read

From navigation engineering to mech interp

On the surprisingly short walk from a Kalman filter to a residual stream.

One day in the first semester of my freshman year, I was sketching a Kalman filter in navigation engineering class. The teacher wrote the recurrence on the board: predict → observe → update, the state vector marching along time step by step, each step the previous one plus a correction. I stared at the arrow and thought, not of airplanes, but of transformers.

Residual streams work the same way: x_{ℓ+1} = x_ℓ + f(x_ℓ). Each layer adds a little something to the same vector. The only difference: the Kalman correction is closed-form, something we write down; the residual correction is learned, something we read out.
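
To make the parallel concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not code from the class or from any real model: the function names `kalman_step` and `residual_step`, the matrix names, and the toy ReLU MLP standing in for f are all assumptions made for this example.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict -> observe -> update step of a linear Kalman filter."""
    # Predict: push the state and its covariance through the dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Observe: the innovation is the part of the measurement we did not predict.
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    # Update: the gain K is closed-form, so the correction is something we write.
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innovation  # predicted state plus a correction
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def residual_step(x, W_in, W_out):
    """One toy residual layer: x_{l+1} = x_l + f(x_l)."""
    # f is a small ReLU MLP (hypothetical); in a trained model W_in and W_out
    # are learned, so the correction is something we can only read.
    f_x = W_out @ np.maximum(W_in @ x, 0.0)
    return x + f_x
```

Both functions end on the same shape of line, old vector plus correction. In `kalman_step` every term of the correction is derived by hand; in `residual_step` it is whatever the weights happen to compute.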

Writing it versus reading it: that, to me, is the difference between engineering and interpretability. One specifies how the state evolves; the other watches a trajectory that has already evolved and asks "what is it doing?" I spent a semester on the Kalman filter and can write out every matrix in closed form. I've read residual-stream papers for three months and still can't tell you which layer is doing what.

"Engineering teaches you how to make a thing run. Science teaches you how to see what it is running. Interp stitches the two together: first you make it run, then you pretend you don't know what it's running."

Another gift from navigation engineering: noise. We spent the whole semester dealing with it: sensor noise, process noise, covariance matrices. That training left me with an instinctive suspicion about "signal vs. noise." Later, reading mech interp papers and seeing numbers like 0.83 / 0.04, my first question was "what's the baseline for these two data points?" That habit is what navigation engineering left me.

So this pivot isn't 180°; it's more like a 30° angle. The first time I fully understood the residual-stream diagram, I was drawing it in the margin of my scratch paper in navigation class.