An informal post-mortem

The emotion probing project is wrapped up! Here are notes on a few pitfalls, while they're still fresh.

§ 1 Know what you're researching

I got hyped reading an Anthropic blog post and pulled a model down the same day to play with steering. I hadn't actually thought through what I was researching. That is the main reason the experiments later turned chaotic and I ended up feeling lost.

Next time I find a direction I like, I'll draw a map of the field before diving in: from the related papers, extract the research objects, the methods, the points of consensus and disagreement, and the adjacent domains (parent / sibling / child).

§ 2 Don't lean on AI too hard

All the code for this experiment was outsourced to Opus 4.6. On the surface it went fine: no bugs, and I could read most of the syntax. But my own coding skill didn't get trained, and more precisely, I wasn't reviewing at the semantic layer of the experiment design. The issues the audit later caught (seed not fixed, keyword hit rate systematically misjudging the direction of the vector's effect) weren't syntactic bugs. They were conceptual bugs, and you can't see them from the syntax alone.
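To make "conceptual bug" concrete, here is a toy sketch of one way a keyword hit rate can point in the wrong direction: it counts negated mentions as hits. This is an assumed failure mode for illustration, not necessarily the one the audit found, and the sentences and keyword list are made up.

```python
def keyword_hit_rate(texts, keywords):
    """Fraction of texts that contain at least one keyword (case-insensitive)."""
    hits = sum(any(k in t.lower() for k in keywords) for t in texts)
    return hits / len(texts)

# Hypothetical keyword list and toy outputs, invented for this example.
KEYWORDS = ("happy", "joyful")
baseline = ["I feel fine today.", "That was a happy surprise."]
steered = ["I am not happy at all.", "Nothing about this feels joyful."]

# The steered texts read as LESS happy, yet they mention the keywords more often,
# so the metric rises. The code is syntactically fine; the metric is what is wrong.
print(keyword_hit_rate(baseline, KEYWORDS), keyword_hit_rate(steered, KEYWORDS))
```

Reading a handful of raw generations next to the metric is usually enough to catch this kind of mismatch.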

What matters isn't writing the code itself; it's the thinking the code expresses.

Next time I'll try a "Socratic" prompt: have the AI ask me the why behind each step before it writes any code ("Why this metric?" "Which hypothesis is this step testing?"), instead of handing over a script and leaving me to audit it after the fact.
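A rough sketch of what such a prompt could look like, wrapped in a small helper. The wording, the `SOCRATIC_PREAMBLE` constant, and the `build_prompt` function are all my own placeholders, not something I have tested against a model yet.

```python
# Hypothetical preamble; the exact wording is an untested assumption.
SOCRATIC_PREAMBLE = (
    "Before writing any code, ask me one question at a time about the why "
    "behind each step of the experiment, for example: 'Why this metric?' or "
    "'Which hypothesis is this step testing?'. "
    "Only write code after I have answered every question."
)

def build_prompt(task: str) -> str:
    """Prepend the Socratic preamble to a plain task description."""
    return f"{SOCRATIC_PREAMBLE}\n\nTask: {task}"

print(build_prompt("Measure the effect of a steering vector on generated text."))
```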

§ 3 Analyzing the results

This is where the thinking gets tested: when an observation matches a hypothesis, work out whether the relation is causal or merely correlational, and how to design the next experiment to test the new hypothesis. Don't reflexively bind the observation to whichever explanation you most want to see. So after "observe → form hypothesis," the next step isn't "write it into the conclusion." It's to ask immediately: can I construct a counterexample or a control that breaks this hypothesis? The default assumption is correlation; the burden of proof is on causation. (I really need to go brush up on logic XD)
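One control that fits this habit is to compare the effect of the vector you care about against the effects of many random directions of the same norm. Below is a minimal sketch of the comparison step only, assuming the effects have already been measured as plain floats. The function name and the synthetic numbers are placeholders, not results from this project.

```python
import random

def empirical_p_value(observed: float, control_effects: list[float]) -> float:
    """Share of control effects at least as large as the observed one.

    The +1 in numerator and denominator counts the observed effect itself,
    so the estimate can never be exactly zero.
    """
    at_least = sum(1 for c in control_effects if c >= observed)
    return (at_least + 1) / (len(control_effects) + 1)

# Synthetic stand-ins for "effect under a random steering direction".
rng = random.Random(0)
control_effects = [rng.gauss(0.0, 1.0) for _ in range(99)]

# Hypothetical observed effect for the real vector.
print(empirical_p_value(3.5, control_effects))
```

If random directions produce effects as large as the real vector does, the observation says little about the vector itself, and the causal story does not hold up yet.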

Also: fix your seed. The practical point is that with the seed fixed, random variation is ruled out, so when the same experiment yields a different result on the next run, you can tell at once whether the cause is the data / hyperparameters or the mechanism itself.
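A minimal sketch of a seed helper, assuming a Python setup that may or may not have NumPy and PyTorch installed. `set_seed` and `noisy_metric` are names I made up here, and full GPU determinism can need more settings than this.

```python
import random

def set_seed(seed: int) -> None:
    """Seed the RNGs that are available so a rerun is comparable to the original."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    except ImportError:
        pass  # PyTorch not installed; nothing to seed

def noisy_metric(n_samples: int = 5) -> list[float]:
    """Toy stand-in for any step of the experiment that draws random numbers."""
    return [random.random() for _ in range(n_samples)]

set_seed(1234)
first = noisy_metric()
set_seed(1234)
second = noisy_metric()
print(first == second)  # same seed, same draws
```

Calling `set_seed` once at the top of every script, and logging the seed alongside the results, keeps reruns comparable.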