How to get to production faster with Claude Managed Agents
Hi everybody.
I hope everybody's having a good time today.
I am Michael.
I'm a member of technical staff here at Anthropic, working on Claude Managed Agents.
What's up, everybody?
My name is Harrison, and I'm also a member of technical staff working on Claude Managed Agents.
A lot of members of technical staff.
Yeah.
Yeah.
Okay.
So today we want to talk to you about Claude Managed Agents.
But before we do that, we want to do a quick recap of the last couple of years and the exponential that I think everybody in this room has been experiencing.
After that, we'll talk a little bit about the motivations behind why we built Claude Managed Agents, followed by a deep dive into some of the primitives that we offer with Claude Managed Agents.
Then we'll bring out some of the partners we've been working with on some of the new features that we announced today.
And then we'll wrap up with a little bit of getting started.
Cool.
So, AI capabilities over the last couple of years have been on an absolute rocket ship of an exponential.
Like I said, I think everybody here has been experiencing that.
We started with the Claude 3 family of models.
Even back then, you were starting to see the semblance of really capable things starting to happen.
But you could only really get very simple, short things going.
Then with Opus 4, we went on an absolute tear.
And things like Claude Code started becoming really prominent.
And these days, with some of the newer model families that we have, we're seeing that the bottleneck to increasing capabilities is really the infrastructure around these models, and not so much their intelligence.
So, like I said, with Opus 3 you could maybe have Claude generate a test function for you.
You would steer it a lot throughout, and you were approving every single tool call it made.
Then, with Opus 4 and Claude Code coming around, you were able to maybe have it drive an entire feature.
It could maybe put up a PR for you, but you were still steering it a lot along the way.
And then with Opus 4.7, the newest model that we have, like Boris mentioned earlier, people are clearing their entire backlogs and waking up to a bunch of merge-ready PRs, which is amazing to see.
Who doesn't love waking up in the morning to a bunch of PRs that you have to review?
And where we think things are going in the future is entire quarters' worth of work getting accomplished within a couple of hours.
So you can imagine a full M&A pipeline being done end to end by a swarm of agent teams.
When these agents work for a couple of hours, things like prompt plus tool use are okay, but where we really need to get going is towards task completion and overall agent infrastructure pipelines.
But in order for your agents to be able to accomplish more, they need access to more.
And that's where Claude Managed Agents is here to help you manage some of the complexity.
You can imagine that if you have an entire team running an M&A deal, they need access to secure credentials and internal systems.
If you're making code changes, you need access to your private GitHub repositories and the credentials that allow that kind of access.
Additionally, you need identity and auth for your agents.
This is essentially an identifier for who they are.
You know, I as an engineer have access to Slack, my email, and a bunch of internal tools like that.
Our agents are going to need access to those systems as well.
Additionally, we're seeing more and more different ways of interacting with our agents.
The first is probably the most familiar to a lot of folks: you send the agent text, and it gives you a response conversationally.
But we're seeing more of a transition towards outcome-oriented agentic activity.
So this is, again, giving the M&A deal that needs to happen to the agent, or the set of agents, and having them just go off and accomplish the task, coming back to you only when they feel relatively confident that the entire activity is complete.
Additionally, as an agent platform, we would be remiss not to support other ways of interacting with your agents, like starting an agent and then picking it up later on, maybe weeks or months in the future, when you want the agent to pick back up right where it left off.
So it was very clear that we're going to start expecting a lot out of these agents, and our developers will as well.
When we were doing research as we started to develop something like Claude Managed Agents, we saw a lot of key sticking points around infrastructure and primitive development that really stood out.
The first was figuring out things like context management and memory.
These are things that work really, really well when you get them right, but if you get them wrong, it can completely destroy how well your agents work.
Infrastructure concerns were another big sticking point.
It was actually the number one thing cited as preventing people from being able to ride the exponential and really benefit from these improved model intelligences.
You need things like reliability, scalability, and security; even latency starts mattering when you have these things running in prod.
And then finally, none of this really matters if you don't have observability into what these things are doing.
If you can't tell whether or not your agent is succeeding, how can you even assess that the thing is good?
So with Claude Managed Agents, we did all of that platform work so that you don't have to.
You can pick and choose the composable primitives that we have available out of the box, around infrastructure, agent primitives, and observability, all available on the platform, and build your product on top of them.
Cool.
So that's a lot.
How do you actually get started building with Claude Managed Agents?
The first step is just to define an agent.
This is essentially a bundle of configuration that identifies who your agent is and what it can do.
It's a system prompt, model, skills, tools, permissions, and generally just the identity of the thing that's actually taking the action.
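To make the "bundle of configuration" idea concrete, here is a minimal sketch in plain Python. It is not the Claude Managed Agents API: the class name `AgentDefinition`, its field names, the policy strings, and the model identifier are all hypothetical, chosen only to mirror the pieces listed in the talk (system prompt, model, skills, tools, permissions).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical bundle of agent configuration (not the real API schema)."""
    name: str
    model: str
    system_prompt: str
    skills: tuple[str, ...] = ()
    tools: tuple[str, ...] = ()
    # Maps a tool name to an assumed policy string: "allow" or "ask".
    permissions: dict[str, str] = field(default_factory=dict)

    def permission_for(self, tool: str) -> str:
        """Return the policy for a tool; tools the agent does not have are denied."""
        if tool not in self.tools:
            return "deny"
        # A tool with no explicit policy falls back to asking a human.
        return self.permissions.get(tool, "ask")


pascal = AgentDefinition(
    name="pascal",
    model="example-model-id",  # placeholder, not a real model identifier
    system_prompt="You analyze grocery shopping habits.",
    tools=("query_orders", "bash"),
    permissions={"query_orders": "allow"},
)

for tool in ("query_orders", "bash", "send_email"):
    print(tool, "->", pascal.permission_for(tool))
```

The point of the sketch is only that identity and capability live in one declarative object, so the same definition can be reused across many sessions.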
Second, you need an environment in which the agent will actually run.
It really helps to give Claude access to a computer.
In this case, your agent needs a sandboxed environment, where you can configure the network allow list and the pre-installed packages.
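As a rough illustration of what a network allow list plus pre-installed packages might look like as data, here is a small Python sketch. The `EnvironmentConfig` class, its fields, and the subdomain-matching rule are assumptions made for this example; the talk does not specify how the platform represents or enforces these settings.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class EnvironmentConfig:
    """Hypothetical sandbox settings: reachable hosts and pre-installed packages."""
    allowed_hosts: frozenset[str]
    packages: tuple[str, ...] = ()

    def allows(self, url: str) -> bool:
        """True if the URL's host is on the allow list or is a subdomain of an entry."""
        host = urlparse(url).hostname or ""
        return any(host == h or host.endswith("." + h) for h in self.allowed_hosts)


env = EnvironmentConfig(
    allowed_hosts=frozenset({"api.github.com", "pypi.org"}),
    packages=("pandas", "requests"),
)

print(env.allows("https://api.github.com/repos"))  # host is listed exactly
print(env.allows("https://evil-pypi.org/"))        # look-alike host, not a subdomain
```

Matching on the parsed hostname, instead of a substring of the URL, is what keeps a look-alike domain from slipping through in this sketch.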
When all that's ready to go, you can actually kick off the session.
Ask your agent to go and complete some piece of work, and then come back to you when it's ready to rock.
And through it all, if you want to observe the agent as it's doing its thing, you can just listen to the event stream to understand what the agent is doing and why it's doing it, and generally interact with it in whatever way you see fit.
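The "kick off a session, then listen to the event stream" flow can be sketched with a toy in-memory session. Nothing here calls a real service: the `Session` and `Event` classes, the dotted event type names, and the scripted stand-in for the agent are all hypothetical, and the real platform would deliver events over its API.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Event:
    type: str      # hypothetical names such as "user.message" or "agent.tool_use"
    payload: str


@dataclass
class Session:
    """Toy session: an append-only event log plus a scripted stand-in agent."""
    task: str
    log: list[Event] = field(default_factory=list)

    def stream(self) -> Iterator[Event]:
        """Record each scripted event in the log, then yield it to the listener."""
        script = [
            Event("user.message", self.task),
            Event("session.status", "running"),
            Event("agent.tool_use", "query_orders"),
            Event("agent.message", "Bananas are the most purchased item."),
            Event("session.status", "idle"),
        ]
        for event in script:
            self.log.append(event)
            yield event


session = Session(task="Analyze grocery shopping habits")
seen = [f"{e.type}: {e.payload}" for e in session.stream()]
print("\n".join(seen))
```

Because the listener consumes events one at a time, it could stop, inspect, or react between any two of them, which is the property that makes steering possible.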
So, let's demystify what we mean when we talk about this event stream.
Every session that you start in Claude Managed Agents is effectively a log of events, where you or your end users are interacting with Claude and Claude is responding.
We split up the events within the platform into domains, so that it's easier for you to understand what each event means.
The first of these is user events.
These are things that your own end users, or maybe your platform, are sending to Claude Managed Agents sessions.
These could include text messages, images, and documents.
You can interrupt your agent if you see that it's going off course and you want to steer it back on track.
There are also tool results for custom tools that you implement and execute on your end, and even confirmations for human-in-the-loop controls for any tools that are executed on Anthropic servers.
And then finally, we have outcome definitions, which we'll go into in a little more detail later.
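Two of the user events described above, interrupts and human-in-the-loop confirmations, both act on work the agent has paused. A minimal sketch of that decision, assuming made-up event shapes (`"user.interrupt"`, `"user.tool_confirmation"`, an `approved` flag) that are not taken from the actual API:

```python
from dataclasses import dataclass


@dataclass
class PendingToolCall:
    """A tool call the agent wants to run but that is waiting on a human."""
    tool: str
    arguments: dict


def resolve(call: PendingToolCall, user_event: dict) -> str:
    """Decide what happens to a paused tool call given the next user event."""
    kind = user_event.get("type")
    if kind == "user.interrupt":
        return "cancelled"
    if kind == "user.tool_confirmation":
        return "run" if user_event.get("approved") else "rejected"
    return "waiting"  # any other event leaves the call paused


call = PendingToolCall(tool="bash", arguments={"command": "ls"})
print(resolve(call, {"type": "user.tool_confirmation", "approved": True}))
print(resolve(call, {"type": "user.interrupt"}))
```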
Next, we have agent events.
Agent events are anything that Claude does on its side.
This could be responding to the user with a message, executing tools on its end, or coordinating with other agents, which we'll go into in a little more detail later.
Next, we have session events.
These cover the overall life cycle of the session itself: descriptions of the session's status changing from idle to running, error recovery and information about the sorts of errors that Claude is running into, and outcome processing.
And then finally, we have span events, which make it really easy to understand when certain things are starting and ending, like Claude starting to write a really long response.
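The four event domains (user, agent, session, span) can be pictured as a simple classification over the session log. The sketch below assumes events carry a dotted type string whose prefix names the domain; that naming scheme and every event name in `example_log` are hypothetical, used only to show how a consumer might group events.

```python
from collections import Counter
from enum import Enum


class Domain(Enum):
    USER = "user"
    AGENT = "agent"
    SESSION = "session"
    SPAN = "span"


def domain_of(event_type: str) -> Domain:
    """Map a dotted event type like 'user.interrupt' to its domain."""
    prefix = event_type.split(".", 1)[0]
    return Domain(prefix)  # raises ValueError for an unknown prefix


def summarize(event_types: list[str]) -> dict[str, int]:
    """Count how many events in a log fall into each domain."""
    return dict(Counter(domain_of(t).value for t in event_types))


example_log = [
    "user.message",
    "session.status_running",
    "span.start",
    "agent.tool_use",
    "agent.message",
    "span.end",
    "user.interrupt",
    "session.status_idle",
]
summary = summarize(example_log)
print(summary)
```

A dashboard like the one in the demo could use a grouping of this kind to render tool runs, status changes, and messages in separate lanes.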
So we know that's a ton of information.
Let's make it concrete with a quick demo of Pascal, a fictitious agent that's responsible for understanding a little bit more about the grocery shopping habits of our users.
If we jump into the demo, we're going to start by showing our dashboard, which is integrated with Managed Agents, and we're kicking off an analysis run by clicking this Analyze button in the top right.
Jumping back to the console, we can see everything that the agent is doing in real time.
We can see the list of events coming through the event stream, tool runs, agent events, generally understanding what's happening in real time.
On the right side, you can see our agent definition.
This includes the system prompt, the model, and all of the MCP tool configuration that I was talking about earlier.
And as we click into the environment, we can also see our networking configuration, as well as the packages that we've installed into our container.
Jumping back to our application, we can see all of this shown on our own surface, because all of this is exposed via an API.
And what's that?
Claude came back and found some findings for us.
Looks like bananas are super popular, I guess.
I already knew that.
And also, jumping forward, if you want to avoid the crowds, it turns out that Sunday is not the right time to go shopping for groceries.
But that's not enough for us.