Terug naar podcastsClaude
Ship your first Managed Agent
All right.
好。
Hello everyone.
大家好。
It's great to see you all here today for our session on shipping your first manage agent.
很高兴今天在这里和大家一起参加「发布你的第一个托管智能体」专场。
Let's go ahead and get started.
我们开始吧。
My name is Isabella He.
我叫 Isabella He。
I'm a member of technical staff at Anthropic on the Applied AI team.
我是 Anthropic Applied AI 团队的技术成员。
The Applied AI team at Anthropic sits at the intersection of products, research, and our customers, which means that I get to contribute internally to products at Anthropic like Claude code and our Claude harnesses, as well as work externally with our customers that are building on top of Claude and on top of our harnesses.
Applied AI 团队处于产品、研究与客户的交汇点,这让我既能参与 Anthropic 内部产品的研发,比如 Claude Code 和我们的 Claude 执行框架,也能与在 Claude 和我们框架上构建应用的外部客户合作。
So, my goal today is to get you all hands-on with actually building on top of manage agents, understanding how the harness works under the hood, and getting you ready to actually ship your first incident response management.
今天我的目标是让大家亲手在托管智能体上实际构建,理解执行框架底层的运作原理,并为发布第一个事件响应管理智能体做好准备。
So, the quick overview of today's agenda.
先来快速过一下今天的议程。
We're going to cover first a quick refresher of Claude manage agents.
首先简要回顾 Claude 托管智能体。
I want to talk you through a little bit about how this harness works under the hood and what makes it so special.
我想带大家了解一下这个框架底层是如何运作的,以及它有何独特之处。
Our team put a lot of thought into the architectural design of Claude manage agents to make sure that it runs ready and reliably for production-ready agents.
我们团队在 Claude 托管智能体的架构设计上投入了大量心思,确保它能为生产环境的智能体提供稳定可靠的运行环境。
So, I want to talk you through a little bit of how that works so that then when we transition to the second portion here, which is the hands-on workshop, you'll actually understand what each of the primitives you're building actually mean for your agents under the hood.
我想先带大家了解一下它的工作原理,这样当我们进入第二部分的实践工坊时,你们才能真正理解所构建的每个原语对底层智能体意味着什么。
So, for the majority of today's session, I want you all to actually have your laptops open, building alongside me, actually working inside of a repository, and getting you ready to actually spin up a working incident response agent.
今天大部分时间,我希望大家打开笔记本电脑,跟着我一起动手,在代码仓库里实际操作,完成一个可运行的事件响应智能体。
Lastly, we'll talk a little bit about beyond the basics.
最后,我们会简单聊聊进阶内容。
Today's session is the first session of a couple of other ones that will build on top of this on Claude manage agents.
今天是 Claude 托管智能体系列中的第一场,后续还有其他专场会在此基础上深入展开。
Specifically, right after this one, I think there's another session on dreaming, which is one of my favorite new features with Claude manage agents for self-improving agents and memory built into the harness.
就在这场之后,我记得还有一场关于 dreaming 的专场,这是 Claude 托管智能体里我最喜欢的新特性之一,专门面向自我改进型智能体和框架内置的记忆功能。
So, encourage everyone to dive in a little bit deeper into what else is in the box after we set you all up for success today with a quick introduction.
鼓励大家在今天帮你们打好基础、完成快速入门之后,继续深入探索这个框架还有什么。
So, let's first touch a little bit about how we got here with Claude manage agents.
先来简单回顾一下我们是如何一步步走到 Claude 托管智能体的。
When we first released the very first Claude back in 2023, we released a messages API alongside access to Claude.
2023 年我们发布第一个 Claude 版本时,同步发布了 Messages API。
This provided raw model access to all Claude models.
这让所有人可以直接访问所有 Claude 模型。
This became the very first way that people could programmatically build on top of Claude and essentially gave a way for people to access tokens in and tokens out via our Claude models.
这是人们第一次能以编程方式在 Claude 上构建应用,本质上是通过 Claude 模型实现 token 的输入与输出。
This also meant that for everyone building on top of Claude models, they had to implement all the various primitives themselves.
这也意味着,所有在 Claude 模型上构建的开发者,都需要自己实现各种原语。
Things like context management, the actual agent loop, compaction, etc.
比如上下文管理、实际的智能体循环、压缩等等。
All the primitives that come alongside making the agent work.
让智能体运转所需的所有原语。
When models were less intelligent back in the early days of let's say 2023, some of these primitives were much simpler because agents could simply do less.
早期大约 2023 年那会儿,模型智能程度有限,这些原语相对简单,因为智能体能做的事情本来就少。
But, as we evolved into now with higher model intelligence and as agents are able to take on more complex tasks and actually take actions within environments and come to actually do entire tasks for humans, the primitives that come alongside context management and managing an agent's ability to execute API calls and tool calls becomes much more complex.
但随着模型智能程度的提升,智能体越来越能承担复杂任务、在环境中采取行动、真正为人类完成完整任务,与上下文管理和管理智能体执行 API 调用、工具调用相关的原语也变得复杂得多。
So, that's when we moved to the agent SDK, which became a harness that allows you to programmatically call Claude code, one of our agents at Anthropic.
于是我们转向了 Agent SDK,它成为一个执行框架,让你能以编程方式调用 Claude Code,也就是我们 Anthropic 的一个智能体。
So, Claude code is something that an agent has access to a computer and takes actions within file system.
Claude Code 是一个能访问计算机并在文件系统内采取行动的智能体。
So, the agent SDK became a way for you to make Claude much more powerful by leveraging the power of Claude code within a harness.
Agent SDK 因此成为一种途径,让你通过在框架内发挥 Claude Code 的能力,使 Claude 更加强大。
The main thing here though is that with the agent SDK, developers still had to manage hosting and scaling on their own and making sure that the agent SDK would be safe to run within their containers.
不过关键在于,使用 Agent SDK 时,开发者仍然需要自己管理托管和扩缩容,并确保 Agent SDK 能在自己的容器内安全运行。
That's only then evolved into Claude managed agents, which is the first harness to be able to handle scaling and production ready components for you by Anthropic, providing things like a purpose-built harness, sandboxing, observability, tool runtime, all within a managed infrastructure system.
这之后才演进出了 Claude 托管智能体,这是第一个由 Anthropic 为你处理扩缩容和生产就绪组件的执行框架,提供专用框架、沙箱、可观测性、工具运行时,所有这些都在托管基础设施体系内。
This means that developers can focus on task and agent configuration, custom tool logic, the things that actually matter for bringing domain expertise and customizability to your agents, where you're handing off the rest of all the primitives and core compute and primitives of essentially managing the basics of agent running to Anthropic.
这意味着开发者可以专注于任务和智能体配置、自定义工具逻辑,这些才是为智能体带来领域专业能力和可定制性的核心,而将其他所有原语和基础计算,也就是管理智能体运行基础的工作,全部交给 Anthropic 来处理。
So, that brings me to managed agents as the fastest way to build production ready agents on Claude.
这就是托管智能体作为在 Claude 上构建生产就绪智能体最快途径的由来。
We've seen people build 10 to 15 times faster to production with Claude managed agents by leveraging our purpose-built harness.
我们已经看到很多团队借助 Claude 托管智能体,实现了 10 到 15 倍的生产效率提升。
Part of the reason why we built Claude managed agents is because is because harnesses should evolve alongside your agents.
我们构建 Claude 托管智能体的部分原因,是因为执行框架应该随着智能体一起演进。
For example, back when we were building ourselves on top of models like Sonnet 4.5, we noticed that Sonnet 4.5 emitted a particular behavior called context anxiety.
比如,当我们早期在 Sonnet 4.5 这类模型上自行构建时,发现 Sonnet 4.5 会产生一种叫做上下文焦虑的行为。
This meant that with Sonnet 4.5, Claude started wrapping up tasks early even when it still had room to spare in its context window.
这意味着 Sonnet 4.5 版本的 Claude 会在上下文窗口还有余量的情况下提前收尾任务。
To manage that in our harness, we then added some mitigations to combat against this early stopping behavior.
为了在框架里应对这个问题,我们专门加入了一些缓解措施来对抗这种提前停止的行为。
But, when Opus 4.5 then came out, we actually saw this behavior go away, making all that work we had done inside of the harness essentially obsolete because Claude had evolved beyond that behavior that we had built into the harness to manage.
但 Opus 4.5 发布后,我们发现这个行为消失了,使得我们在框架内做的那些工作基本上成了多余,因为 Claude 已经超越了我们当初构建进框架里要应对的那个行为。
So, the takeaway there is that it's a lot of work to maintain harnesses and make sure that they actually evolve alongside your agents, which is why with Claude managed agents, we want to make it really easy for Claude and Anthropic to handle all the complexities that come with compaction, caching, things like context anxiety, all these various primitives that come with actually making agent production ready and getting the most out of Claude.
所以,关键启示是:维护执行框架并确保它真正随着智能体演进,需要大量工作,这正是 Claude 托管智能体希望解决的问题,让 Claude 和 Anthropic 来处理压缩、缓存、上下文焦虑等所有复杂性,以及在生产环境中真正让智能体跑起来、充分发挥 Claude 潜力所涉及的各种原语。
So again, you can focus on the tasks, tools, and things that actually matter for building agents on Claude.
这样你就可以专注于任务、工具以及在 Claude 上构建智能体真正重要的事情。
So three primary resources go into building on Claude managed agents.
构建 Claude 托管智能体涉及三种核心资源。
First is the agent's endpoint, which is the persona and capabilities.
第一是 Agent 端点,也就是智能体的人格和能力定义。
This is the core system prompt that powers your agent.
这是驱动你的智能体的核心系统提示。
Essentially here, you're defining the model, the MCP servers, the skills, the various components that your agent can actually leverage when it's able to run in that agent loop.
在这里,你定义模型、MCP 服务器、技能以及智能体在进入智能体循环后实际能够调用的各种组件。
The next is the environments.
其次是环境。
You can think of this as the hands of the agent, where the previous one is the brain of the agent where the agent is thinking through what to execute, and then it's using an environment to actually have a space and a container to actually take action on your behalf.
可以把它理解为智能体的双手,前者是智能体的大脑,负责思考要执行什么,而环境则是一个空间和容器,让智能体可以代表你真正采取行动。
Sessions are next the way to tie together agents and environments.
接下来是会话,用于将智能体和环境绑定在一起。
A single session has a spun up on an agent instance within an environment.
单个会话会在某个环境内启动一个智能体实例。
So you can connect the two together and actually stream events back to your user and start to take action on behalf of your humans as part of a Claude powered agent.
你可以将二者连接起来,将事件实时流式传输给用户,并作为 Claude 驱动的智能体开始代表用户采取行动。
A key thing here, as I alluded to briefly before, Claude managed agent has the agent loop run server side.
这里有一个关键点,正如我之前简要提到的,Claude 托管智能体的智能体循环在服务端运行。
This means that a lot of the complexities that come with managing hosting and scaling are abstracted away.
这意味着大量与管理托管和扩缩容相关的复杂性都被抽象掉了。
And when you close your laptop or you hit hard refresh on your agent that you're building on Claude managed agents, everything is maintained and you don't have to worry about durability, reliability, all these various aspects that usually come to bite you when you're trying to turn your agent from a prototype into production.
当你关上笔记本,或者对正在 Claude 托管智能体上构建的智能体强制刷新时,一切都保持不变,你不必担心持久性、可靠性以及其他各种在将智能体从原型推向生产时会困扰你的问题。
And lastly here, before we dive into the hands-on portion, is I want to talk you through a key design decision that went into Claude managed agents.
最后,在进入实践环节之前,我想带大家了解一个 Claude 托管智能体设计时的关键决策。
Previously, with a lot of agent harnesses, we saw the agent loop coupled tightly with tool execution.
以前很多智能体框架中,智能体循环与工具执行是紧耦合的。
This design pattern made sense and still makes sense for some agents because you want to give the agent powerful abilities to actually take action within the environment.
这种设计模式在某些智能体中有其合理性,因为你希望赋予智能体在环境中真正采取行动的强大能力。
For instance, with Claude Code, we want the agent to be able to access various files on your computer, take action within a file system, and therefore it makes sense for the agent to have access to all those tools spun up on every container.
比如 Claude Code,我们希望智能体能访问你计算机上的各种文件、在文件系统内操作,因此让智能体在每个容器上都能访问所有工具是合理的。
But, we also realized there are some constraints for this, especially with some agents where you essentially want to be able to decouple the hands from the brains of the agents.
但我们也意识到这有一些局限,尤其对于某些智能体,你实际上希望能将智能体的双手与大脑解耦。
For instance, credentials and uh credentials and security became a huge concern.
比如,凭证管理和安全成为了一个重大问题。
With the ability to have the agent access your file system, you can actually add very distinct sandboxing by decoupling these two components, where the agent is no longer able to access the actual credentials without encryption by decoupling the hands from the sandbox of the agent.
通过让智能体能访问你的文件系统,你可以通过解耦这两个组件实现非常精细的沙箱隔离,智能体在与沙箱解耦之后,便无法在没有加密的情况下直接访问实际凭证。
The other aspect here is actually you can see huge benefits by doing these decoupling on things like time to first token and latency.
另一个方面是,这种解耦在首 token 时间和延迟方面也能带来巨大收益。
Previously, with the agent loop into execution in the same box, you had to spin up containers for every single session that you're spinning up in the agent, which contributed to additional latency from a time to first time to first token perspective.
以前,智能体循环与执行在同一个容器内,每次启动一个会话都要为智能体重新创建容器,这从首 token 时间的角度带来了额外的延迟。
But, with this now decoupled, our teams actually saw reductions in time to first token along the lines of over 90% reduction in TTFT for our P95 metrics on latency.
而解耦之后,我们团队实际上看到首 token 时间大幅下降,P95 延迟指标的 TTFT 降幅超过了 90%。
So, here you can start to see the power of this design decision coming through from the perspective of safety, reliability, latency, and everything else that you care about when it comes to building production-ready agents.
从这里可以开始看出这个设计决策的力量,它在安全性、可靠性、延迟以及你在构建生产就绪智能体时关心的方方面面都带来了价值。
All right, so now it's time for the exciting part of today's session, which is where I want you all to open up your laptops and go to this URL here to actually clone a repository, and let's start to actually feel the magic of everything that I just talked through.
好,现在到了今天最精彩的部分:我希望大家打开笔记本,访问屏幕上的这个链接,克隆代码仓库,然后我们一起感受我刚才介绍的这一切的魅力。
So, I'm going to give everyone a second to just go over to that URL there and just spin up the repository that we have ready for you.
我给大家一点时间,去那个链接,把我们准备好的代码仓库克隆下来。
All right, so here are some additional commands that I want you all to run to make sure this is all set up on your computers.
好,这里还有一些额外命令,我希望大家都运行一下,确保在你们的电脑上都配置好。
So, the first step many of you might have done already, but just take that repository, hit the URL, get clone it, and then I want you to CD into the specific repository for the session, which is ship your first manage agent.
第一步很多人可能已经做过了,但还是按步骤来:访问链接、克隆仓库,然后 cd 进入本次专场的具体仓库目录,也就是 ship your first manage agent。
And then, if you're on Mac, you'll see those two commands on the side, the Python and the source.
如果你用 Mac,可以看到旁边有两条命令,Python 那条和 source 那条。
Um, there's a command there for Windows as well.
嗯,Windows 也有对应的命令。
And you'll just do the rest there where you want to install the requirements, copy over the environment key into your .env file.
然后按步骤安装依赖,把环境变量复制到 .env 文件。
Um, here you'll put in the Anthropic API key that hopefully all of you also received from the QR code for free credits earlier.
嗯,在这里填入 Anthropic API 密钥,希望大家都通过之前的二维码拿到了免费额度。
And lastly, we'll just run the app.
最后,运行应用就好了。
All right, let's go ahead and dive in, but as I mentioned before, let me just show everyone where these instructions are.
好,开始吧,不过正如我之前提到的,先让大家看看这些说明在哪里。
If you go into the repository in the link and then go to ship your first manage agents, you scroll down on the read me, you'll see all the setup instructions here.
进入链接里的仓库,然后打开 ship your first manage agents 目录,滚动到 README 下方,可以看到所有的安装说明。
So, feel free to do this, um, as we go along or even in your own time later today and continue playing around with it, but as I mentioned before, everything will be also shown on the screen to follow along with.
可以跟着我们同步操作,也可以之后自己再回来折腾,但正如我之前说的,所有步骤都会在屏幕上展示,跟着看就好。
So, do not worry if you did not have time to fully get it set up on your laptop.
如果没来得及在笔记本上全部配置好也不用担心。
Without further ado, let's go ahead and dive in.
废话不多说,开始吧。
So, once you run streamlit run app.py, you should be able to see a URL that looks like this and a page that looks like this.
运行 streamlit run app.py 之后,应该能看到一个类似这样的链接,以及类似这样的页面。
What we're doing here is we're going to be simulating an agent, um, interaction here where we have an incident that's going to come up.
我们接下来要做的是模拟一个智能体交互场景,会有一个事件出现。
A lot of you who might be software engineers in the room will be intimately familiar with the pain that comes alongside incident response.
在座很多可能是软件工程师的朋友,对事件响应的痛苦一定深有体会。
If you are software engineer, you might be woken up at, let's say, 3:00 a.m. in the morning, 2:00 a.m. in the morning when you're out around on on vacation as you're on call, and this is usually a very painful portion of a software engineer's life, uh, because when you're on call, it means that if a server goes down or a service goes down, you have to be immediately the one there to respond and tackle the incident.
作为软件工程师,你可能会在凌晨 3 点、凌晨 2 点被叫醒,在度假途中被 on call 打断,这通常是软件工程师生涯中非常痛苦的一段经历,因为当你 on call 时,一旦服务器或某个服务挂掉,你必须立刻出现、处理故障。
Usually for a human, this means diving into metrics and logs and deployments.
对人类来说,这通常意味着要钻进指标、日志和部署记录里。
You can actually investigate what's going on.
你得亲自排查到底出了什么问题。
And so, what we're going to do is we're going to now have an agent run on Claude manage agents to do all this for us.
所以,我们接下来要做的,就是让一个运行在 Claude 托管智能体上的智能体来替我们完成这一切。
So, that when we get woken up by 3:00 a.m., we can hand it off to an agent, or maybe we don't even get woken up at all if Claude is able to do everything for us.
这样当凌晨 3 点被叫醒时,我们可以把任务交给智能体,甚至可能根本不需要被叫醒,如果 Claude 能替我们搞定一切的话。
Okay.
好。
So, let's now go ahead and dive into the code here.
那我们来看代码。
What we're going to open up here is we have the agent.py file on the left and the agent complete on the right.
这里打开的是左边的 agent.py 文件和右边的 agent_complete 文件。
If you want to challenge yourself, you can of course try to implement everything yourself here or with Claude.
如果想挑战自己,当然也可以自己或者用 Claude 来实现所有内容。
Um but, what we're going to do just for simplicity's sake is just copy over various elements from the completed file onto the incomplete file one by one.
不过为了简单起见,我们就逐一把完整文件里的各个部分复制到不完整的文件中。
So, you can see how these primitives compose our agent one piece at a time.
这样你就能看到这些原语是如何一步步组成我们的智能体的。
So, let's go ahead and start off with this very first part, which is the agent.
好,先从第一个部分开始,也就是 Agent。
We mentioned before that the agent is the one that defines the persona and the capabilities of the agent here.
我们之前提到,Agent 定义了智能体的人格和能力。
So, that's model, the system prompts, and the tools in our case for our agent here.
在我们的例子里,就是模型、系统提示和工具。
So, let me go ahead and copy over what we see there on the screen.
我来把屏幕上看到的内容复制过来。
And you can see here that we're defining the SRE agent.
可以看到,这里我们在定义 SRE agent。
We're going to use Claude Opus 4.7 here.
我们将使用 Claude Opus 4.7。
And I've preconfigured a system prompt and tools for the agent.
我已经预先配置好了系统提示和工具。