Stop babysitting your agents
[Music starts]
Good afternoon, everybody. My name is Sid Budhiraja. I'm one of the founding engineers of Claude Code, and today I'm excited to be here to talk to you guys about how you can stop babysitting your agents.
As models have been getting smarter, I've noticed that we're spending a larger and larger share of our time staring at the screen waiting for Claude to finish its work, or just acting as a glorified QA tester for Claude.
That can be quite unsatisfying, and it's also just an inefficient use of your time. My goal for this talk is to give you strategies that help you take back some of that time so that you can manage your agents better.
You can also think of this as a more advanced Claude Code talk, a Claude Code 301 type of university class. Because of that, we have some prerequisites.
These are table stakes that everyone here should have at least heard about, if not implemented for your own projects. It starts with a very high-quality CLAUDE.md file. This is the single highest-leverage thing you can do to improve your Claude Code experience, so if you haven't done this yet, I highly encourage you to try it out.
Number two is connecting your tools to Claude Code. A good rule of thumb is that if a tool is useful for you in your day-to-day work, it will also be useful for Claude. So things like Slack, Asana, Linear, Datadog, and BigQuery all help Claude stitch together a much richer context for itself.
It's able to perform much better if you give it access to these tools. And finally, number three is setting up your remote environment on Claude Code Web.
This decouples the compute that's running your Claude Code from your laptop. You can close your laptop, your laptop could die, you could spill some water on it, and your Claude Code sessions will still continue because they're running in the cloud.
I'd love to see a show of hands here. How many people use Claude Code every day?
Okay, that's almost everyone. How many people have completed the first two things here? So you have a high-quality CLAUDE.md and you've connected your tools.
Okay, it's about 50%, I'd say. And then how many people have done all three?
Okay. If you haven't raised your hand at all, don't worry, you'll still get some value out of this talk, but I would encourage you to start with these three things first.
Okay, so why does your tooling need to change? Most software tooling so far was built with humans in mind.
Whether it's linters, IDEs, Prettier, type checkers, or even compilers, they were mostly written with the goal of making humans and human teams faster.
But the problem now is that humans aren't writing most of our code anymore. Agents are. So we have to take a step back, zoom out, and reconsider our tooling.
And when you do that, there's some good news and some bad news. The good news is that a lot of the tools we've built for ourselves translate over pretty well to agents.
So things like Prettier, linters, and symbol servers: Claude and other agents can end up using these quite effectively, and the tools serve them pretty well.
But the bad news is that we also have blind spots. As human beings, we make some assumptions about our tooling and our toolchain that Claude doesn't have.
For that reason, it's important to ask the question: what does an agent need from your codebase that a human takes for granted?
I'd love for you guys to keep that question in mind as we continue through the rest of the talk, because it frames the goal of not babysitting your agents in a much clearer way.
So this is our roadmap for today. We'll be talking about three distinct things that build on top of each other.
When you take all three together, they become incredibly powerful and give you a set of tools that can help you work in a way that we just haven't worked before as human beings.
First, we'll be talking about verification, which is how to teach Claude to check its own work. Once Claude can check its own work and be more reliable, we can run many Claudes at the same time and be confident that they'll be doing the right thing.
So second, we'll be talking about strategies for multi-Clauding, or parallelizing your work. And then finally, we'll end with background loops. Background loops are a way for you to completely take your keyboard out of the hot path.
Your keyboard is not the bottleneck anymore, and Claude just keeps running in the background in a loop doing useful work for you.
I'd like to start the verification section with a brainstorm for a minute or so. I'd like everyone here to think about the last software project or feature that you worked on.
While you were working on that feature, how did you check your own work? And I don't just mean how you checked the final output. I also mean how you iterated on your work in a way that gave you confidence that you would end up where you were expecting to go.
So let's take 30 seconds. If you have a pen and paper in front of you, feel free to jot this down. If you have a laptop and you want to put this in your notes, go ahead. Let's take 30 seconds together to come up with your last project and how you verified your work there.
[Silence during brainstorm]
Okay, I see some typing slowing down, so hopefully you've had a chance to think about it a little bit. It's okay if you haven't completely. I've found that most software engineering tasks can be broken down into the series of steps that you see on the screen.
Some combination, sequence, or subset of these things enables you to check your own work and build software.
You start with designing and writing code. You then usually end up building your code, running your compilers, type checkers, and so on. If they fail, you go back, change your code, run it again, and do that in a loop.
Then you might run your executable, whether that's a Docker container, a CLI application, or a web server. And then you might check for side effects.
So if you're running a web server, you might spin up your browser and see if the UI elements are showing up in the correct place. You might even look at your logs to see if a specific log line that you're looking for is present.
Or you might check the database to see what the state is and whether that state has been manipulated correctly. And then hopefully you'll run unit tests to make sure that you haven't introduced any regressions and your feature hasn't broken some other feature.
And hopefully you also added new unit tests for the thing that you're working on. And then finally, you deploy to staging. Or, if you're really brave, you go straight to prod.
That's usually how humans verify their work and build software. And what's interesting is that this exact same playbook can be used by Claude quite effectively to verify its own work and build software.
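As an editorial aside, the playbook above can be sketched as an ordered list of checks that stops at the first failure. This is only an illustration, not something shown in the talk: the `Stage` type, the `run_playbook` function, and the stage names are all hypothetical, and the lambdas stand in for real commands such as a compiler, a browser check, or a test runner.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    """One step of the playbook (hypothetical type for this sketch)."""
    name: str
    check: Callable[[], bool]  # returns True when the stage passes


def run_playbook(stages: List[Stage]) -> Optional[str]:
    """Run stages in order; return the first failing stage's name, or None."""
    for stage in stages:
        if not stage.check():
            return stage.name
    return None


# Stand-in checks; real ones would shell out to a build, a browser, a test run.
stages = [
    Stage("build", lambda: True),
    Stage("run", lambda: True),
    Stage("side_effects", lambda: False),  # e.g. an expected log line is missing
    Stage("unit_tests", lambda: True),
]
first_failure = run_playbook(stages)
```

Returning the name of the failing stage, rather than a bare boolean, gives whoever is iterating (a person or an agent) a concrete place to start debugging.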
So as we go through the rest of this presentation, it's helpful to think about teaching Claude how to do things in a similar way to how you would do them.
The only thing that's required is giving Claude the right tools and instructions to make this possible.
Okay, so we've talked about verification, how humans do verification, and how Claude should theoretically do verification, but loops are really what makes the whole thing go round.
This is arguably the most important slide in this presentation. So if you haven't been paying attention yet, this is a good time to start.
A loop is essentially an autonomous circuit that you can complete for Claude. It allows Claude to hill climb on a given task or a given set of success criteria.
You can think about it as giving Claude access to tools to write code and to verify its own work. What Claude will do is write some code and then check whether there's a failure.
If there's a failure, it will debug that failure and write some more code. And it keeps doing that in a loop, again and again, until it gets to a success state.
And when it finally gets to a success state, you can be confident that the PR it's sending you is higher quality and will actually work.
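As an editorial aside, the loop just described (write, check, debug, repeat until success) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Claude Code is implemented: `hill_climb`, `attempt`, and `verify` are hypothetical names, and the toy functions below only mimic a broken and then fixed sign-up handler.

```python
from typing import Callable, Optional, Tuple


def hill_climb(
    attempt: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_iterations: int = 10,
) -> Tuple[Optional[str], int]:
    """Repeat attempt/verify until verify reports no failure.

    attempt(feedback) produces a candidate, given the last failure (or None).
    verify(candidate) returns a failure message, or None on success.
    Returns (candidate, iterations used), or (None, max_iterations) on give-up.
    """
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        candidate = attempt(feedback)
        feedback = verify(candidate)
        if feedback is None:
            return candidate, iteration
    return None, max_iterations


def toy_attempt(feedback: Optional[str]) -> str:
    # The first try is broken; once there is feedback, produce a fixed version.
    return "handler: noop" if feedback is None else "handler: navigate('/signup')"


def toy_verify(candidate: str) -> Optional[str]:
    return None if "navigate" in candidate else "clicking the button did nothing"


result, iterations = hill_climb(toy_attempt, toy_verify)
```

The iteration cap is a design choice for the sketch: a loop with no success signal and no budget would never hand control back.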
So in this image that you see on the screen: I faced an issue recently where the sign-up button on my personal website stopped working.
What I told Claude was to make the sign-up button work, and this is roughly what it did. There are more steps here too, but for brevity's sake, it basically started by writing some code, and then it built my app.
It opened up a browser, clicked my sign-up button, and saw that clicking the sign-up button wasn't really doing anything. It didn't take you anywhere.
So then it decided to read some logs, and it found out what the problem was. It fixed the code, reloaded the app, and kept doing that until it got to a successful state.
And finally, what it came up with was a PR that did indeed work. So the most important thing to take away from this slide is that, wherever possible, our goal now is to get Claude into a loop by giving it the tools and instructions it needs to work effectively.
Verification comes in many flavors, right? We talked about UX verification, but you can have back-end verification, and you may want to verify your entire app end-to-end, including infra.
The core concept remains the same. You want to give Claude the tools and the instructions to get it into a loop.
And once you figure that piece out, all three of these flavors merge into one, right? You don't have to be very specific about what you ask Claude to verify. As long as it has all the right tools and instructions, it'll be able to verify all of these things.
We've talked a lot about theory, hypotheticals, and jargon, but I wanted this slide to be a little bit more concrete. So what does it actually mean to give Claude the instructions and the tools to make it go in a loop?
It usually boils down to four things. I'll go through the front-end or UX section of this slide.
The first thing is to run your application. For a front-end application or a front-end verification loop, this might correspond to running your dev server.
So running `npm run start`, or whatever your dev server command might be, just spins up a dev server. Once the dev server is up, you want Claude to actually use the web server.
The way it does that is by opening up a browser. My personal MCP tool of choice for this is the Claude in Chrome MCP tool.
You can access this with `/chrome` if you're using Claude Code. You can also use Playwright, or there are a bunch of other browser control MCPs that you can use to do that.
Once Claude can drive your browser, the next step is to prove that something works. So if it's working on a fix, you want it to take a screenshot before the fix and after the fix and make sure that the app is in the right state, right?
And finally, there's unblocking it. If you've ever tried to create a verification loop in a production app, you'll very quickly find that there are some blockers you run into.
Some of the common blockers are, for example, auth and state, right? Auth basically means you want to give Claude an identity that it can use to log in to your web application so it can actually start to use your app.
And state means you may want to pre-configure some state. For example, if you have an e-commerce store, you may want to populate the inventory for that store so that Claude can use your app meaningfully.
This isn't very novel. In fact, in traditional software engineering too, when you write end-to-end tests, writing these state setup scripts is quite common.
The only difference here is that you want to give Claude access to these scripts, and you want to make them dynamic. You don't want to be too prescriptive about what these scripts do, and that allows Claude to do a much wider variety of things than it could with static scripts.
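As an editorial aside, one way to read "dynamic" here is a setup helper that takes the state it should create as arguments instead of hard-coding one fixture. The sketch below is an assumption-laden illustration, not the speaker's script: `FakeStore` is an in-memory stand-in for a real database, and `seed_inventory` and the SKU names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeStore:
    """In-memory stand-in for the app's database (hypothetical)."""
    inventory: Dict[str, int] = field(default_factory=dict)


def seed_inventory(
    store: FakeStore, items: Dict[str, int], reset: bool = False
) -> FakeStore:
    """Populate inventory from caller-supplied items instead of a fixed fixture."""
    # Validate everything first so a bad entry never leaves partial state behind.
    for sku, quantity in items.items():
        if quantity < 0:
            raise ValueError(f"negative quantity for {sku!r}: {quantity}")
    if reset:
        store.inventory.clear()
    store.inventory.update(items)
    return store


# The caller picks the scenario, for example one in-stock and one sold-out item.
store = seed_inventory(FakeStore(), {"mug": 3, "tshirt": 0})
```

Because the scenario is an argument, the same helper can set up a sold-out page, an empty store, or a large catalog without a new script for each case.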
Okay, so now we know what a verification loop is, and we know how to write one. How do you package it? How do you distribute it to your colleagues, to your coworkers, even to your future self?
One of the best ways of doing this is by using a skill. You can think of a skill as just a way to store some arbitrary context about a specific topic.
In this case, that topic happens to be a verification loop. The interesting thing about skills is that you can also make them self-improving.
If you put instructions into your skill about improving the skill every time Claude hits a blocker, you will end up creating a self-documenting, self-improving skill which everyone on your team can contribute to, not just you.
And this makes it really powerful. This is actually how we do verification on the Claude Code team as well. We have a single verification skill, and the skill is explicitly told to keep documenting itself.
So every time someone runs into a blocker, the skill will go back in and edit itself so that the next time you or your colleague run into the same issue, it's not a problem.
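As an editorial aside, the self-documenting behavior can be pictured as appending each newly discovered blocker and its workaround to the skill file. In practice the talk describes Claude editing the skill itself when instructed to; the sketch below only illustrates the resulting effect on the file. The `record_blocker` helper, the "Known blockers" heading, and the file name are all assumptions made for this example.

```python
import tempfile
from pathlib import Path

BLOCKERS_HEADING = "## Known blockers"  # hypothetical section name


def record_blocker(skill_file: Path, blocker: str, fix: str) -> None:
    """Append a blocker and its fix to the skill file, skipping duplicates.

    Assumes the blockers section, once created, is the last section in the file.
    """
    text = skill_file.read_text(encoding="utf-8") if skill_file.exists() else ""
    entry = f"- {blocker}: {fix}"
    if entry in text.splitlines():
        return  # already documented
    if BLOCKERS_HEADING not in text:
        separator = "\n\n" if text.strip() else ""
        text = text.rstrip("\n") + separator + BLOCKERS_HEADING + "\n"
    text = text.rstrip("\n") + "\n" + entry + "\n"
    skill_file.parent.mkdir(parents=True, exist_ok=True)
    skill_file.write_text(text, encoding="utf-8")


# Demo against a throwaway directory so nothing real is modified.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "verification-skill.md"  # hypothetical file name
    skill.write_text("# Verification\n\nStart the dev server first.\n", encoding="utf-8")
    record_blocker(skill, "Login wall", "use the seeded test account")
    record_blocker(skill, "Login wall", "use the seeded test account")  # no-op
    updated = skill.read_text(encoding="utf-8")
```

Skipping exact duplicates keeps the file from growing every time the same blocker is hit by a different teammate.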
Okay, so we're going to jump into a demo next. But before the demo, I want to talk about the application that I'm going to be using.
There is a typing test application called MonkeyType. How many of you have heard of MonkeyType? Okay, I thought so. It's a niche community.
It's basically a typing test where it shows you a bunch of words, as you can see, and you have to type those words as accurately and as fast as possible. And the application just tracks your stats for you.
I like this as a demo app because it is representative of a real-world full-stack app. It's written in TypeScript with an Express back-end and with MongoDB and Redis as persistence layers.
And it's open source. So you guys can go to monkeytype.com right now, and you can even check out the source code if you want. What we'll be doing in this demo is creating a verification loop live.
So we'll tell Claude to spin up a new dev server, and we'll tell it to go and use the Chrome MCP to check some of its work.
And then once we create the verification skill, we'll also create a new feature and ask Claude to use the verification skill to verify itself. So let's get started with the demo. We can switch over to my laptop screen.
Okay, so this is a brand new Claude Code session. I've already done the homework of setting up MonkeyType locally.
I've also installed some dependencies and curated a CLAUDE.md, because I didn't want to do that in front of you guys and waste your time. So let's tell Claude to spin up the dev server.
[Typing sounds]
Okay, so it says the dev server is already running, and that's right, because I started it right before our talk. Let's go and check out what's on the front-end.
So if we go here, MonkeyType opens up. I can start typing, and there's a little timer that shows up.
I'm not very good at typing, so there are a lot of typos here, but it's essentially what I would expect. Let's also check out the back-end link.
This just returns some JSON, which basically means that the back-end is up and running, which is good. The next thing I'm going to do is make sure that my Chrome MCP is enabled.
The way you do that is just `/chrome`. As you can see here, it says status enabled, extension installed, which is exactly what we're looking for.
If you don't have it installed, it'll take you through a little setup guide and you can install it for yourself. And now I'm going to say: "Use the Chrome MCP to make sure that the front-end is working. Make it quick, please."
[Typing sounds]
Okay, and what we should see now is that this is the tab that Claude is using, and it should call the Chrome MCP tool. So if we go back here, we can see two Chrome MCP tool calls.
I can press Control-O and see exactly what it did. It navigated to `localhost:3000`, and then it's looking at the contents of the tab, which is great.
But we want to do something more exciting than just looking at a static web page. That isn't very helpful. Before I do that, I'm going to resize these windows so you guys can see what's happening in the background.
So let's say: "Can you try typing and make sure everything works?"
Okay, so Claude apparently is also not very good at typing. But it typed in something, and it says that typing works. That's great. Let's do one more thing. Let's say: "Can you also use the settings and change something?"
Okay, so it navigated to the settings page, and it's changing the difficulty to expert. Not a good idea, based on how it performed.
Okay, and it claims that the setting is persisted and that it was able to verify that. So that's great. What we did so far is just hold Claude's hand and tell it exactly what to do.
We said: spin up the dev server, then go and do these two or three things that we care about. And that's basically verification, right?
What I can do next is tell Claude to take all the learnings from this session and put them into a skill file.
So I can say: "Take everything we learned and put it into a skill file in `.claude/demo-verification`."
I didn't have to give it the full path, but I chose to anyway. Okay, let's see. It wants to create a new directory.