How to get to production faster with Claude Managed Agents
Hi everybody.
I hope everybody's having a good time today.
I am Michael.
I'm a member of technical staff here at Anthropic working on Claude Managed Agents.
What's up everybody?
My name is Harrison and I'm also a member of technical staff working on Claude Managed Agents.
A lot of members of technical staff.
Yeah.
Yeah.
Okay.
So today we want to talk to you about Claude Managed Agents.
But before we do that, we wanted to do a quick recap of the last couple of years and the exponential that I think everybody in this room has been experiencing.
After that, we'll talk a little bit about the motivations behind why we built Claude Managed Agents, followed by a deep dive into some of the primitives that we offer with it.
Then we'll bring out some of the partners that we've been working with on some of the new features that we announced today.
And then we'll wrap it up with a little bit of getting started.
Cool.
So, AI capabilities over the last couple of years have been on an absolute rocket ship of an exponential.
Like I said, I think everybody here has been experiencing that.
We started with the Claude 3 family of models.
Even back then, you were starting to see the semblance of really capable things starting to happen.
But really, you could only get very simple, short things going.
Then with Opus 4, we went on an absolute tear.
Things like Claude Code started becoming really prominent.
And these days, with some of the newer model families that we have, we're seeing that the bottleneck to increasing capabilities is really the infrastructure around these models and not so much their intelligence.
So yeah, like I said, with Opus 3 you could maybe have Claude generate a test function for you.
Maybe you would steer it a lot throughout, and you were approving every single tool it was using.
And then with Opus 4 and Claude Code coming around, you were able to maybe have it drive an entire feature.
It could maybe put up a PR for you, but you were still steering it a lot along the way.
And then with Opus 4.7, the newest model that we have, like Boris mentioned earlier, people are clearing their entire backlogs and waking up to a bunch of merge-ready PRs, which is amazing to see.
Who doesn't love waking up in the morning to a bunch of PRs that you have to review?
And where we think things are going in the future is entire quarters' worth of work getting accomplished within a couple of hours.
So you can imagine a full M&A pipeline being done end to end with a swarm of agent teams. When these agents work for a couple of hours, things like prompt plus tool use are okay, but where we really need to get going is toward task completion and overall agent infrastructure pipelines.
But in order for your agents to be able to accomplish more, they need access to more.
And that's where Claude Managed Agents is here to help you manage some of the complexity.
You can imagine that if you have an entire team running an M&A deal, they need access to secure credentials, internal systems.
If you're making code changes, you need access to your private GitHub repositories and the credentials that allow that kind of access.
And additionally, you need identity and auth for your agents.
This is essentially an identifier for who they are.
Like, you know, I as an engineer have access to Slack and my email and a bunch of tools internally like that.
Our agents are going to need access to those systems as well.
But additionally, we're seeing more and more different conversational methodologies for interacting with our agents.
The first is probably the most familiar to a lot of folks, which is you send the agent text and it gives you a response conversationally.
But we're seeing more of a transition toward outcome-oriented agentic activity.
So this is, again, giving the M&A deal that needs to happen to the agent or set of agents and having them just go off and accomplish the task, coming back to you only when they feel relatively confident that the entire activity is complete.
Additionally, as an agent platform, we would be remiss to not support other methodologies of interacting with your agents like starting an agent and then picking it up later on, maybe weeks or months in the future when you want the agent to pick back up right where it left off.
So it was very clear that we're going to start expecting a lot out of these agents, and our developers will as well.
When we were doing a bunch of research as we were starting to develop something like Claude Managed Agents, we saw a lot of key sticking points around infrastructure and primitive development that really stood out.
The first of which was figuring out things like context management and memory.
These are things that work really, really well when you get them right, but if you get them wrong, they can completely destroy how well your agents work.
Infrastructure concerns were another big sticking point.
It was actually the number one thing cited as preventing people from being able to skate the exponential and really benefit from these improved model intelligences.
You need things like reliability, scalability, and security; even latency starts mattering when you're having these things run in prod.
And then finally, none of this really matters if you don't have observability into what these things are doing.
If you can't tell whether or not your agent is succeeding, how can you even assess that the thing is good?
So with Claude Managed Agents, we did all of that platform work so that you don't have to. You can pick and choose from the composable primitives that we have available out of the box around infrastructure, agent primitives, and observability, all available on the Claude platform, and build your product on top of them.
Cool.
So that's a lot.
How do you actually get started building with Claude Managed Agents?
The first step is just to define an agent.
This is essentially a bundle of configuration that identifies who your agent is and what it can do.
It's a system prompt, model, skills, tools, permissions, and generally just the identity of the thing that's actually taking the action.
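As a rough illustration of what "a bundle of configuration" could look like, here is a minimal sketch in plain Python. It is not the Claude Managed Agents API: the class name `AgentDefinition`, every field name, and the allow/deny permission scheme are assumptions made up for this example, mirroring only the items listed in the talk (system prompt, model, skills, tools, permissions).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical agent configuration bundle; all field names are illustrative."""
    name: str
    model: str
    system_prompt: str
    skills: tuple[str, ...] = ()
    tools: tuple[str, ...] = ()
    # Assumed scheme: maps a tool name to "allow" or "deny".
    permissions: dict[str, str] = field(default_factory=dict)

    def can_use(self, tool: str) -> bool:
        """A tool is usable only if it is listed and not explicitly denied."""
        return tool in self.tools and self.permissions.get(tool, "allow") != "deny"


agent = AgentDefinition(
    name="example-agent",
    model="example-model",  # placeholder, not a real model identifier
    system_prompt="You are a careful analyst.",
    tools=("run_query", "send_email"),
    permissions={"send_email": "deny"},
)
print(agent.can_use("run_query"), agent.can_use("send_email"))
```

Keeping the definition immutable (`frozen=True`) reflects the idea that the agent's identity is fixed configuration, separate from any one run.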
Second, you need an environment in which the agent will actually run.
It really helps to give Claude access to a computer.
In this case, your agent needs a sandboxing environment where you can configure the network allow list and pre-installed packages within that environment.
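To make the network allow list and pre-installed packages concrete, here is a small sketch of how such an environment configuration might be modeled. The `EnvironmentConfig` class, its fields, and the host-matching rule are assumptions for illustration only; the talk does not specify how the platform represents or enforces these settings.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class EnvironmentConfig:
    """Hypothetical sandbox settings: allowed hosts and pre-installed packages."""
    allowed_hosts: frozenset[str]
    packages: tuple[str, ...] = ()

    def is_allowed(self, url: str) -> bool:
        """Permit a request only when its hostname is exactly on the allow list."""
        host = urlparse(url).hostname
        return host is not None and host in self.allowed_hosts


env = EnvironmentConfig(
    allowed_hosts=frozenset({"pypi.org", "api.github.com"}),
    packages=("pandas", "requests"),
)
print(env.is_allowed("https://pypi.org/simple/"))
print(env.is_allowed("https://example.com/"))
```

Exact-match hostnames are the simplest possible policy; a real allow list would likely also need to consider subdomains and ports.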
When all that's ready to go, you can actually kick off the session.
Ask your agent to go and complete some piece of work and then come back to you when it's ready to rock.
And through it all, if you want to observe the agent as it's doing its thing, cooking, you can just listen to the event stream and understand what the agent is doing, why it's doing it, and generally interact with it in whatever way you see fit.
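The idea of listening to a session's event stream can be sketched with an in-memory, append-only log. This is only a toy model, not the real service: the `Session` class, the `emit` and `stream` methods, and the event type strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Session:
    """Toy session: an append-only list of events (hypothetical structure)."""
    events: list[dict] = field(default_factory=list)

    def emit(self, event_type: str, **payload) -> None:
        """Append one event to the end of the log."""
        self.events.append({"type": event_type, **payload})

    def stream(self, start: int = 0) -> Iterator[tuple[int, dict]]:
        """Yield (index, event) pairs from `start`, so a listener can resume later."""
        for index in range(start, len(self.events)):
            yield index, self.events[index]


session = Session()
session.emit("user.message", text="Analyze last week's orders")
session.emit("agent.tool_use", tool="run_query")
session.emit("agent.message", text="Here is what I found.")

for index, event in session.stream():
    print(index, event["type"])
```

Because the log is ordered and indexed, a listener that remembers the last index it saw can reconnect and pick up where it left off by passing that index plus one as `start`.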
So, let's demystify what we mean when we're talking about this event stream.
Every session that you start in Claude Managed Agents is effectively a log of events where you or your end users are interacting with Claude and Claude is responding.
So we split up the domains of events that we have within the platform so that it's easier for you to understand what each event means.
The first of which is user events.
These are things that your own end users, or maybe your platform, are sending to Claude Managed Agents sessions.
These could include text messages, images, and documents.
You can interrupt your agent if you see that it's going off course and you want to steer it back on track.
There are also tool results for custom tools that you implement and execute on your end, and even confirmations for human-in-the-loop controls for any tools that are executed on Anthropic servers.
And then finally we have outcome definitions, which we'll go into in a little more detail later.
Next we have agent events.
Agent events are anything that Claude does on its side.
So this could be responding to the user with a message, executing tools on its end, or coordinating with other agents, which we'll go into in a little more detail later.
Next we have the session events.
These are just the overall life cycle of the session itself.
So any descriptions of the status of the session changing, like from idle to running.
Error recovery and information about the sorts of errors that Claude is running into, and outcome processing.
And then finally we have span events, which make it really easy to understand when certain things are starting and ending, like Claude starting to write a really, really long response.
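The four event domains described above (user, agent, session, span) can be sketched as a simple classifier. The dotted `domain.name` naming convention and the specific event type strings below are assumptions for illustration; the talk names the domains but not the wire format.

```python
from collections import Counter
from enum import Enum


class EventDomain(Enum):
    """The four event domains named in the talk."""
    USER = "user"
    AGENT = "agent"
    SESSION = "session"
    SPAN = "span"


def domain_of(event_type: str) -> EventDomain:
    """Classify an event by its prefix, assuming a hypothetical 'domain.name' format.

    Raises ValueError if the prefix is not one of the four domains.
    """
    prefix = event_type.split(".", 1)[0]
    return EventDomain(prefix)


def count_by_domain(event_types: list[str]) -> Counter:
    """Tally how many events fall into each domain."""
    return Counter(domain_of(t) for t in event_types)


log = [
    "user.message",
    "session.status_running",
    "span.start",
    "agent.tool_use",
    "agent.message",
    "span.end",
    "session.status_idle",
]
for domain, count in count_by_domain(log).items():
    print(domain.value, count)
```

Splitting events by domain like this makes it easy for a consumer to, say, render only agent messages in a chat view while routing session and span events to monitoring.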
So we know that's a ton of information.
So, let's make it concrete by doing a quick demo of Pascal, a fictitious agent that's responsible for understanding a little bit more about grocery shopping habits of our users.
So, if we jump into the demo, we're going to start by showing our dashboard that's integrated with Managed Agents, and we're kicking off an analysis run by clicking this analyze button in the top right.
Jumping back to the console, we can see everything that the agent is doing in real time.
We can see the list of events that are coming through the event stream, tool runs, agent events, generally understanding what's happening in real time.
On the right side, you can see our agent definition.
This includes the system prompt, model, and all of the MCP tool configuration that I was talking about earlier.
And as we click into the environment, we can also see our networking configuration as well as the packages that we've installed into our container.
Jumping back to our application, we can see all of this shown on our surface because all of this is exposed via an API.
And what's that?
Claude came back, found some bits for us.
Looks like bananas are super popular, I guess.
I already know.
And also, jumping forward, if you want to avoid the crowds, it turns out that Sunday is not the right time to go shopping for groceries.
But then that's not enough for us.