
I am dreading our LLM-written incident report future

Summary

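The author distinguishes between using an LLM to help gather the material for an incident report and having an LLM write the whole report: the former reduces the toil of assembling data, while the latter replaces the critical thinking and verification. Citing the view that writing forces you to clarify your thinking, the article stresses that writing in your own words exposes the fuzzy spots in your understanding, and that handing the text over to an LLM skips this step. The author worries that an LLM-generated incident report may present a superficially coherent explanation while fabricating causal relationships between systems or missing critical interactions, and that these errors will be hard to catch because no human did the deep synthesis and checking. Coding and SRE diagnostic tasks can be validated through tests or outcomes, but an incident report has no immediate ground-truth check, so it is more prone to content that looks right and is actually wrong. The article concludes that the lure of efficiency may make this way of writing widespread, and that it will ultimately turn incident records into simulacra that lack genuine insight, weakening an organization's ability to learn from incidents.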

Why read this

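In settings such as incident reviews, the act of writing is the key step that forces engineers to check their explanation for consistency against the evidence. Generated text therefore cannot simply stand in for that reasoning process. Otherwise it is easy to reach conclusions that are superficially self-consistent but actually wrong, without ever checking them against the system evidence, which in turn undermines later judgments about the system and what is learned from it.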

Original text

The other day, Reginald Braithwaite posted the following toot. For posterity, I’ve also included my own response to it:

Braithwaite’s post is dripping with sarcasm, but make no mistake: incident reports written entirely by LLMs are coming. And I am not looking forward to this future.

Before I dive in here, I want to note that there is a lot of toil you need to do in order to gather the data you need to write a good incident report, and LLMs can help significantly reduce that toil. I’ve got no issues there. But there’s a world of difference between using LLMs to help you assemble the ingredients involved in writing an incident report, and using an LLM to actually write the report itself.

Braithwaite’s post is horrifying to me precisely because of the seduction of the LLM as a tool for generating an incident report. After all, you can just ask it to write the report, and it’ll do it. And that’s exactly what scares me.

There’s a famous quote by the cartoonist Dick Guindon: “Writing is Nature’s way of showing you how sloppy your thinking is.” You might think you understand a concept, but it’s only when you put metaphorical pen to paper, when you actually try to explain the concept in written words to a potential reader, that you realize how fuzzy your understanding actually is. Writing in your own words forces you to confront how much you actually understand about your subject. Or, as Leslie Lamport put it, “If you’re thinking without writing, you only think you’re thinking.”

Having an LLM generate the text of an incident write-up bypasses this thinking step. Now there’s no human in the loop of the writing process who has to confront whether the explanation is actually consistent with the evidence that they’ve gathered. Instead, what you get is an explanation of what happened that sounds plausible to someone who is not intimately familiar with the details. They might read, nod along, and think, “yes, that makes sense.” But the LLM may have invented couplings between systems that aren’t there, and may have missed critical interactions that were actually part of the incident, and because nobody did the hard work of actually synthesizing the data to do the write-up, nobody will notice. After all, if you’re trying to reduce the writing effort, how much effort are you really going to put into checking the LLM’s work?

In my view, LLM-generated incident write-ups are more dangerous than using LLMs for coding or for AI SRE-style tasks. For coding tasks, there’s always a testing step to check that the code exhibits the desired behavior, even if nobody looks at the code itself for meaningful details. For AI SRE tasks, either the LLM output helps you resolve the incident, or it doesn’t. In both cases, Nature is the ultimate arbiter of the LLM output.

But incident write-ups aren’t like that. The consequences of a poor report aren’t immediately apparent the way incorrect code or an incorrect operational diagnosis are in the moment. Instead, we get incident reports that have the superficially correct form, but are actually incorrect, with no obvious test for correctness.

And, because incident reports are time-consuming to write, the temptation to use AI tools to generate them will be overwhelming. But these LLMs will not go around talking to the people who were involved in the incident. These reports will be simulacra; they will have the right form, but they will not provide readers with genuine insights into the nature of the system. The amount of learning will be significantly curtailed.

And, yes, people will probably use AI to summarize them as well.

It’s not a future I’m looking forward to.

