Devin’s 80% Moment: Background Agents, 7x PRs, & End of Hand-Held Coding — Walden Yan & Cole Murray
When people think about an AI's ability to run your app and test it, I think they overindex on the computer use part, because computer use, in my mind, is the literal part: there's a button you want to click, so can you emit the right coordinates to click it?
I think testing is actually a really interesting problem-solving challenge for these AIs. Suppose you want to do arbitrary testing: imagine you make a change that spans the front end and the back end. To actually test that change, we have to reason through how to first run these applications so they orchestrate with each other on the right version of the code, and then how to trigger the feature, how to make the thing actually happen. That is where we spend most of our time.
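As a rough illustration of the reasoning described above, a test run can be treated as an ordered plan in which each step must succeed before the next one starts. This is only a sketch: `run_plan` and the step names are hypothetical, and the stub actions stand in for real commands that would start services and drive the app.

```python
from typing import Callable, List, Tuple

# Each step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_plan(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"step failed: {name} (completed: {completed})")
        completed.append(name)
    return completed


# Hypothetical plan for a change that spans the front end and the back end.
# The lambdas are stubs; real actions would launch processes and poll them.
plan: List[Step] = [
    ("check out the branch under test", lambda: True),
    ("start the back end", lambda: True),
    ("start the front end against that back end", lambda: True),
    ("trigger the changed feature", lambda: True),
    ("verify the observable result", lambda: True),
]

print(run_plan(plan))
```

In this framing, clicking the button is only the "trigger" step; the earlier steps are the orchestration work that the speaker says takes most of the time.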
Before we get into today's episode, I just have a small message for listeners.
Thank you.
We would not be able to bring you the AI engineering, science, and entertainment content that you so clearly want if you didn't choose to click in and tune in to our content.
We've been approached by sponsors on an almost daily basis.
But fortunately, enough of you actually subscribe to us to keep all this sustainable without ads, and we want to keep it that way.
But I just have one favor to ask all of you.
The single most powerful, completely free thing you can do is to click that subscribe button.
It's the only thing I'll ever ask of you.
And it means absolutely everything to me and my team, who work so hard to bring Latent Space to you each and every week.
If you do it, I promise you we'll never stop working to make the show even better.
Now, let's get into it.
All right, we're in the studio with Walden Yan, co-founder, CPO.
Yeah.
Which is a cool title.
Yes.
And one of the coiners of context engineering.
Yes, yes.
Although I think there were many people who used the term in various ways beforehand.
But I did find that people, both internally and externally, enjoyed the upgrade from prompt engineering, or, you know, model wrapping, into maybe a more thoughtful way to build agents.
Yeah.
For those who haven't caught up on that, I have on screen the "Don't Build Multi-Agents" post, which you should read and which we might refer to.
And Cole Murray, who created OpenInspect.
Great to be here.
Okay, so let's talk about it.
Everyone is building their own Devins.
What's going on?
Yeah, so I think the engineering world is kind of waking up to this idea of background agents, cloud agents, whatever you'd like to call it.
And I think we saw a shift around December 2025, when the models, Opus 4.5 and GPT-5.2, reached a capability level where we moved away from hand-holding the model and were able to drive it more or less autonomously.
What I mean by that is that we could pretty much go from a specification to a completed pull request with very little friction, assuming the spec was good enough.
And that paradigm alone, I think, changed a lot of how we interact with agents and opened up this world where background agents became more practical.
I think everyone experienced this in December, but I feel like there was just this increasing ramp, right? There was the moment, which I think was Sonnet 3.7, where you guys rewrote Devin in one night or something.
Yes.
Yeah.
Yeah.
So describe 2025, or, you know, how it felt from your side.
In retrospect, we always thought it was ramping up, but even now, over the last three or four months, it's been ramping up even faster.
So it's almost funny to be talking about how big of a leap Sonnet 3.7 was. Honestly, a lot of it was stripping out parts of Devin that were no longer needed with that jump in intelligence.
But I also think that with a lot of the recent leaps, especially if you look at models like Opus and the latest GPT models, they are reaching levels of autonomy where people are actually finding that they can just be hands off. For people who were once debating,
"Oh, do I need to be in the weeds with my model in the IDE?"
"Can I just completely move it off into the cloud?"
that's a more serious conversation now. And we've seen that in all of our growth charts. Internally, there's this funny graph where our merged PRs have grown 7x since, I forget when.
I think Dave maybe tweeted that.
Yeah.
Yes.
It grew like 7x over the last, I think it was two months, three months, something like that.
And then you look at our engineering headcount growth.
It's gone up by like 10% or something.
We were afraid to release this.
So this is Devin's commit percentage on all Devin repos: it was 16% in January and is now 80% in March.
Yeah, it's a big shift right now.
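To put the figures quoted above side by side, here is a back-of-the-envelope calculation. The inputs are the speakers' approximate numbers (about 7x merged PRs, about 10% headcount growth, commit share moving from 16% to 80%). The function names are illustrative only, and the second function assumes that every commit not made by Devin was made by a human.

```python
def per_engineer_multiplier(pr_growth: float, headcount_growth: float) -> float:
    """Growth in merged PRs per engineer, given total PR growth and headcount growth."""
    return pr_growth / (1.0 + headcount_growth)


def agent_commits_per_human_commit(agent_share: float) -> float:
    """Convert an agent commit share (0..1) into agent commits per human commit."""
    return agent_share / (1.0 - agent_share)


# Roughly 7x merged PRs with roughly 10% more engineers.
print(f"{per_engineer_multiplier(7.0, 0.10):.2f}x PRs per engineer")

# Commit share of 16% in January versus 80% in March.
january = agent_commits_per_human_commit(0.16)
march = agent_commits_per_human_commit(0.80)
print(f"{january:.2f} -> {march:.2f} agent commits per human commit")
```

Under those assumptions, the quoted numbers work out to roughly 6.4x more merged PRs per engineer, and a move from about one agent commit for every five human commits to about four agent commits for every human commit.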
And so it makes sense that a lot of people are now thinking about buying Devin, but also maybe trying to build their own. I have a lot of fun building Devin, so I can see why other people would want to build their own cloud agents as well.
Yeah.
Well, maybe it's good to hear what initially inspired you to try to build OpenInspect.
Yeah, OpenInspect came about primarily through observing how my clients were using tools like Claude web and OpenAI's Codex at the time, and seeing some of the friction they were having.
Primarily, Claude web was being used through Slack, and a big issue they ran into was that the sessions launched were specific to whoever invoked them via Slack. So if a PM was the one who invoked the session and then went to pass context to engineering,
engineering couldn't see the session. That in itself was kind of a dealbreaker, because the PM says, "Hey, engineering, can you jump in?" but there's nothing to jump in on unless they're copy-pasting out the single response that came back.
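One way to picture the fix for the handoff problem described above is to key each agent session by the conversation thread rather than by the person who invoked it. The sketch below is an assumption made for illustration, not a description of how OpenInspect or any Slack integration is implemented; `Session`, `SharedSessionStore`, and the thread IDs are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Session:
    thread_id: str
    started_by: str
    # (author, text) pairs in the order they were posted.
    messages: List[Tuple[str, str]] = field(default_factory=list)


class SharedSessionStore:
    """Keys sessions by conversation thread, not by the invoking user."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def post(self, thread_id: str, author: str, text: str) -> Session:
        # The first poster creates the session; later posters join the same one.
        session = self._sessions.setdefault(
            thread_id, Session(thread_id=thread_id, started_by=author)
        )
        session.messages.append((author, text))
        return session


store = SharedSessionStore()
store.post("thread-1", "pm", "Please add CSV export to the reports page.")
session = store.post("thread-1", "engineer", "Picking this up, using the existing exporter.")

# The engineer sees the PM's original request because the session is shared.
for author, text in session.messages:
    print(f"{author}: {text}")
```

Because the session belongs to the thread, anyone who joins it gets the full history instead of a single pasted response.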
And so, seeing some of these problems, I had built a similar kind of architecture internally, just to experiment and test out different ideas as this trend of moving off of localhost was starting to take shape. So when Ramp released their blog post, I had a lot of the pieces for this already in place.
And I just thought it would be kind of funny to see what Claude could do purely from the blog post. On my X account there's actually a thread where I live-tweeted going through this.
Oh wow.
Comparing GPT and Claude as both of them are going through it.
Like on the announcement thing, or something else?
Right after it got released.
Okay.
We can put it in the show notes.
Yeah, it was helpful that I already kind of knew how to verify the system.
I knew what I was looking for.
I think Ramp did a great job of really illustrating the technical aspects of how to build something.
It was much more than just, "Hey, we built a great system."
It was, "And here's how you can build it too."
And so I resonated a lot with that, given the problems that I was already seeing.
And looking around, I didn't really see anything in the open source community that met this type of system.
I think there are a lot that run on localhost, like Superset, Conductor, and many others, but nothing that was actually running in the cloud. So I built it, and I thought it would be interesting to just open source it and give anyone a foundation that they can mix and match on top of.
So literally after Devin was launched, there was OpenDevin, which became All Hands.
I don't know if you tried that or
Yeah.
Well, I was going to say, one of the things that interested me a lot about OpenInspect was that you didn't then try to make it something you monetize.
I think a lot of these open source projects would then go and really try to raise VC.
And how did you think about that?
I thought that was very interesting.
What I had seen across my clients was that having a background agent system is going to become critical infrastructure within their company.
And because of that, I wanted to open source it so that they could fork it and put in whatever customization they wanted.
To that question, though, I get asked all the time, "Oh, are you going to raise? Are you going to turn this into a service?"
I'm sure you've gotten offers.
But primarily I don't want to do that, for a few reasons.
One, I don't want to compete for like $20 a seat.
I think that is just a really difficult business.
I think it's very easy to copy the main pieces of it.
I mean, again, I built this fairly quickly. And because you don't own the entire stack, it's hard to monetize.
You have money being made at the sandbox layer with Daytona, E2B, and many other players.
You have money being made at the model layer.
And you kind of sit in this weird in-between gray area where, what are you actually selling?
You're selling, I guess, the infrastructure.
You're selling the integrations, maybe.
Let's ask the guy: what are you selling?