Stop babysitting your agents

Good afternoon everybody. My name is Sid Budhiraja. I'm one of the founding engineers of Claude Code. And today I'm excited to be here to talk to you guys about how you can stop babysitting your agents.

As models have been getting smarter, I've noticed that we're spending a larger and larger percentage of our time staring at the screen waiting for Claude to finish its work, or just acting as a glorified QA tester for Claude. This can be quite unsatisfying, and it's also just an inefficient use of your time. My goal for this talk is to give you strategies that help you take back some of this time so that you can manage your agents better.

You can also think of this as a more advanced Claude Code talk, a Claude Code 301 type university class. Because of that, we have some prerequisites: some table stakes that everyone here should have at least heard about, if not implemented for your own projects.

Starting with a very high-quality CLAUDE.md file. This is the single highest-leverage thing that you can do to improve your Claude Code experience. So if you haven't done this yet, I highly encourage you to try it out.

Number two is connecting your tools to Claude Code. A good rule of thumb is that if a tool is useful for you in your day-to-day life, it will also be useful for Claude. So things like Slack, Asana, Linear, Datadog, BigQuery: all of these things help Claude stitch together a much richer context for itself.
And it's able to perform much better if you give it access to these tools.

And finally, setting up your remote environment on Claude Code Web. This makes it so that the compute that's running your Claude Code is decoupled from your laptop. So you can close your laptop, your laptop could die, you could spill some water on your laptop, and your Claude Code sessions will still continue because they're running in the cloud.

I'd love to see a show of hands here. How many people use Claude Code every day? Okay, that's almost everyone. How many people have completed the first two things here? So a high-quality CLAUDE.md, and you've connected your tools. Okay, it's about 50%, I'd say. And then how many people have done all three? Okay. If you haven't raised your hand at all, don't worry, you'll still get some value out of this talk, but I would encourage you to start with these three things first.

Okay, so why does your tooling need to change? Most software tooling so far was built with humans in mind. Whether it's linters, IDEs, Prettier, type checkers, even compilers, they were mostly written with the goal of making humans and human teams faster. But the problem now is that humans aren't writing most of our code anymore. It's agents. So we have to take a step back, zoom out, and reconsider our tooling. And when you do that, there's some good news and then there's some bad news.
The good news is that a lot of these tools that we've built for ourselves translate over pretty well for agents too. So things like Prettier, linters, and symbol servers: Claude and agents can end up using these quite effectively, and they serve them pretty well. But the bad news is that we also have blind spots. As human beings, we make some assumptions about our tooling and our toolchain that Claude doesn't have. For that reason, it's important to ask the question: what does an agent need from your codebase that a human takes for granted? I'd love for you guys to keep that question in mind as we continue through the rest of the talk, because it frames the goal of not babysitting your agents as much in a much clearer way.

So this is our roadmap for today. We'll be talking about three distinct things that build on top of each other. When you take all three together, they become incredibly powerful and give you a set of tools that can help you work in a way that we just haven't worked before as human beings.

First, we'll be talking about verification, which is how to teach Claude to check its own work. Once Claude can check its own work and be more reliable, we can run many Claudes at the same time and be confident that they'll be doing the right thing. So we'll be talking about strategies for multi-Clauding, or parallelizing your work. And then finally, we'll end with background loops.
Background loops are a way for you to completely take your keyboard out of the hot path. So your keyboard is not the bottleneck anymore, and Claude just keeps running in the background in a loop, doing useful work for you.

I'd like to start the verification section with a brainstorm for a minute or so. I'd like everyone here to think about the last software project or feature that you worked on. While you were working on that feature, how did you check your own work? And I don't just mean how did you check the final output of your work; I also mean how did you iterate on your work in a way that gave you confidence that you would end up where you were expecting to go. So let's take 30 seconds. If you have a pen and paper in front of you, feel free to jot this down. If you have a laptop and you want to put this in your notes, let's take 30 seconds together and just come up with your last project and how you verified your work there.

Okay, I see some typing slowing down. So hopefully you've had a chance to think about it a little bit. It's okay if you haven't completely, but I've found that most software engineering tasks can be broken down into the series of steps that you see on the screen. Some combination or sequence or subset of these things enables you to check your own work and build software. So you start with designing and writing code. You then usually end up building your code, running your compilers, type checkers, etc.
If they fail, you go back and change your code again and run it, and you do that in a loop. Then you might run your executable, whether that's a Docker container or a CLI application or a web server. And then you might check for side effects. So if you're running a web server, you might spin up your browser and see if the UI elements are showing up in the correct place. You might even look at your logs to see if a specific log line that you're looking for is present. Or you might check the database to see what the state is and whether it has been manipulated correctly. And then hopefully you'll run unit tests to make sure that you haven't introduced any regressions and your feature hasn't broken some other feature. And hopefully you also added new unit tests for the thing that you're working on. And then finally, you deploy to staging. Or if you're really brave, you go straight to prod.

That's usually how humans verify their work and build software. And what's interesting is that this exact same playbook can be used by Claude quite effectively to verify its own work and build software. So as we go through the rest of this presentation, it's helpful to think about teaching Claude how to do things in a similar way to how you would do them. And the only thing that's required is giving Claude the right tools and instruction set to make this possible.
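
As a rough illustration (not something shown in the talk), the playbook above can be written down as an ordered list of checks that stops at the first failure, which is the point where a human or an agent would go back and change the code. The `Step` type, the `first_failure` helper, and the stage names are all hypothetical, and the lambdas are stand-ins for a project's real build, run, side-effect, and test commands.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One stage of the playbook; `check` returns True when the stage passes."""
    name: str
    check: Callable[[], bool]


def first_failure(steps: List[Step]) -> Optional[str]:
    """Run the stages in order and return the name of the first one that fails.

    Later stages are not run once one fails. Returns None when every stage passes.
    """
    for step in steps:
        if not step.check():
            return step.name
    return None


# Hypothetical stand-ins: a real project would wrap its own build command,
# launcher, browser/log/database checks, and test runner here.
playbook = [
    Step("build_and_type_check", lambda: True),
    Step("run_executable", lambda: True),
    Step("check_side_effects", lambda: False),  # e.g. an expected log line is missing
    Step("run_unit_tests", lambda: True),
]

failed_at = first_failure(playbook)
print(failed_at or "all stages passed")
```

Returning the name of the failing stage, rather than a bare pass/fail, gives whoever is iterating a concrete place to start debugging.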
Okay, so we've talked about verification, how humans do verification and how Claude should theoretically do verification, but loops are really what makes the whole thing go round. This is arguably the most important slide in this presentation. So if you haven't been paying attention yet, this is a good time to get started.

A loop essentially is an autonomous circuit that you can complete for Claude. It allows Claude to hill climb on a given task or a given success criterion. You can think about it as giving Claude access to tools to verify its own work and to write code. What Claude will do is write some code and check if there's a failure. If there's a failure, it will debug that failure and write some more code. And it keeps doing that in a loop again and again until it gets to a success state. And when it finally gets to a success state, you can be confident that the PR that it's sending you is higher quality and will actually work.

So in this image that you see on the screen, I faced an issue recently where on my personal website, the sign-up button stopped working. What I told Claude was to make the sign-up button work. And this is kind of what it did. There are more steps here too, but for brevity's sake, it basically started writing some code, and it built my app. It opened up a browser, clicked my sign-up button, and saw that clicking the sign-up button isn't really doing anything.
It doesn't take you anywhere. So then it decided to read some logs, and it found out what the problem was. It fixed the code, reloaded the app, and kept doing that until it got to a successful state. And finally, what it came up with was a PR that indeed worked.

So the most important thing to take away from this slide is that, wherever possible, our goal now is to get Claude into a loop by giving it the tools and instructions that it needs to work effectively.

So verification comes in many flavors, right?
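
Before getting into those flavors, here is a minimal sketch of the write, check, debug loop from the sign-up button example. It is a toy model, not how Claude Code is implemented: `run_loop`, `write_code`, `verify`, and the bug descriptions are hypothetical names, and the `max_iterations` budget is an added assumption (the talk only says the loop runs until a success state is reached).

```python
from typing import Callable, Optional, Tuple

# None means "verification passed"; otherwise a description of the failure.
Feedback = Optional[str]


def run_loop(
    write_code: Callable[[Feedback], None],
    verify: Callable[[], Feedback],
    max_iterations: int = 10,  # assumed safety budget, not from the talk
) -> Tuple[bool, int]:
    """Alternate writing and verifying until verification passes.

    Returns (succeeded, iterations_used).
    """
    feedback: Feedback = None
    for iteration in range(1, max_iterations + 1):
        write_code(feedback)   # act on the latest failure, if any
        feedback = verify()    # check the result of that change
        if feedback is None:
            return True, iteration
    return False, max_iterations


# Toy stand-in for a broken app: a list of outstanding bugs (made-up names).
open_bugs = ["click handler not bound", "wrong redirect URL"]


def write_code(feedback: Feedback) -> None:
    # The first pass has no feedback; later passes fix the reported bug.
    if feedback in open_bugs:
        open_bugs.remove(feedback)


def verify() -> Feedback:
    # Report the first remaining failure, or None when nothing is broken.
    return open_bugs[0] if open_bugs else None


succeeded, iterations = run_loop(write_code, verify)
print(succeeded, iterations)
```

The important design point is that `verify` returns actionable feedback rather than a bare failure, since that feedback is what lets each pass improve on the previous one.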