Coding is no longer the constraint: Scaling devex to teams and agents at Spotify
[Music starts]
[Music ends]
Hey everyone. I'm Niklas. It was very surprising to see my face on screen earlier, because I had completely forgotten that Boris was going to mention Spotify in the keynote. I'm here to give you a rundown of how we're approaching the AI transition at Spotify.
Let me start with a brief introduction to Spotify. Anyone in here a Spotify user? Oh, lots of hands. Good. We're a fairly sizeable engineering org at this point, close to 3,000 engineers.
We've spent many years trying to optimize our developer experience and how we build products. We try to make sure that it's as easy as possible to deploy and ship changes to our users. One way to illustrate that is that we do around 4,500 deployments every day to our production environment.
We run on a mix of repositories; I'll come back to this later. Some are very large monorepos: our backend lives in a monorepo of 40 million lines of code. And then we have lots and lots of smaller polyrepos, thousands of them.
The AI transition for us has been a journey of very rapid adoption curves. We roll out tools internally all the time to make our developers more productive. But we have never seen the rate of adoption that we've seen rolling out AI coding tools.
And you can see in particular how Claude Code, orange in this diagram, completely exploded. It's a little hard to see because of the holiday break, but it really happened around the Opus 4.5 release in November last year.
And since then, usage of Claude in particular, and of AI tools in general, has gone completely bananas. Today more than 99% of our engineers use AI coding tools every week.
We run a recurring survey of all our engineers, and in the latest one, which just came in last week, 94% of our engineers report that AI tooling has helped them become more productive. And that's alongside a record-high self-assessed productivity.
We can also look at productivity in other ways. One way is to look at PR frequency as a proxy for how fast and how much we're able to ship. Today we're seeing a 76% increase in PR frequency.
As I was working on these slides over the last two weeks, I had to change this number because it keeps growing all the time. And by now, the vast majority of the PRs we ship are authored by an AI agent together with a developer.
One thing you can see in this curve, although it's actually hard to see here, is that the number of PRs had been growing very slowly over a longer period of time, and then you can see that jump, again around the Opus 4.5 release. That was when this took off for us.
This of course means that we also have explosive growth in our code. Luckily, that's something we came prepared for. We had seen this for a long time, even prior to AI.
In fact, a few years ago we noticed that our production codebase was growing seven times faster than the number of engineers. That meant engineers would spend more and more of their time maintaining our existing codebase rather than building new features and value for our users.
We realized that we needed to fix this, so we started an effort to automate as much of that maintenance as possible. A lot of it comes down to pretty dull things that we just need to do: migrate from this version to that version, deprecate this API, fix this security vulnerability, those types of things.
But that took a lot of our developers' time. The way we typically did those migrations back then was to send out a migration path to hundreds of teams saying, "Hey, you need to upgrade your components from this Java version to that Java version."
The teams would go ahead and do that, and it would typically take us months to complete one of those upgrades across many thousands of components. That was not fun for anyone.
In that same engineering survey back then, migrations were the top thing our developers were frustrated about. So we asked: instead of doing this component by component and fairly manually, can we imagine a way to mutate our entire fleet of components at once?
We figured out a way to do that and built out the infrastructure for it. We call this fleet management, and the underlying system is called FleetShift.
As of today, we've merged two and a half million of those automated maintenance PRs. That's work our developers did not have to do.
The vast majority of those, the green part of this graph, have been automerged, so there's no human in the loop. Automation creates the PR to begin with, automation validates that the PR is safe to merge, and then it goes ahead and merges it without any developer needing to care about that change. This happens every day. We ship thousands of these every day.
So, that was all pre-AI. One thing we noted pretty quickly was that this works really well for simple changes: changes to configuration, bumping a dependency in your build file, those types of things. Works great.
But once you get into a little bit more complex changes, like replacing API calls, those types of things, the scripts that we used to run these shifts across our fleet became incredibly complicated.
Code, as it turns out, has a very, very wide API surface. There are many, many ways to achieve the same thing, even if it's just calling a method. And when you write that script and run it across millions of lines of code and thousands of components, you are going to find every corner case.
And you need to deal with that in your migration script. There's even a term for this: Hyrum's Law, named after an engineer at Google who described it many years before we ran into it.
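To make the corner-case problem concrete, here is a minimal sketch of a deterministic rewrite rule meeting several ways of calling the same method. The `client.fetch` to `client.get` rename and all four snippets are hypothetical, chosen only for illustration; this is not one of Spotify's migration scripts.

```python
import re

# Hypothetical migration: rename calls of `client.fetch(...)` to `client.get(...)`.
# Both names are made up for this example.
NAIVE_RULE = re.compile(r"\bclient\.fetch\(")


def naive_migrate(source: str) -> str:
    """Rewrite only the one call shape the script author anticipated."""
    return NAIVE_RULE.sub("client.get(", source)


# Four snippets that all end up invoking the same method.
VARIANTS = {
    "direct call": "result = client.fetch(url)",
    "aliased receiver": "c = client\nresult = c.fetch(url)",
    "bound method": "fetch = client.fetch\nresult = fetch(url)",
    "dynamic lookup": 'result = getattr(client, "fetch")(url)',
}


def unmigrated(variants: dict[str, str]) -> list[str]:
    """Return the names of the variants the naive rule left untouched."""
    return [name for name, src in variants.items() if naive_migrate(src) == src]


if __name__ == "__main__":
    # Only the direct call is rewritten; the other three shapes slip through.
    print(unmigrated(VARIANTS))
```

Each missed shape needs its own rule, which is how a script that starts as one regular expression grows into something much harder to maintain once it runs across thousands of components.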
So, pretty early on as LLMs came about, we figured: instead of writing these deterministic scripts to do these code modifications, can we use an LLM for this? We started iterating on this very early, prior to Claude and similar tools.
It was challenging initially. The models were just too stupid. The way we were trying to do it was just too stupid. But over many iterations we started to figure out the patterns for it, and the models got better.
Out of this came a tool that we now call Honk. Boris mentioned it this morning. It has a silly name and a silly icon, but it's a very useful tool as it turns out. Honk is really the result of all of these iterations of us trying different ways of automating these still relatively simple code changes, applied over many, many variants of code.
It started out very differently, but today Honk has Claude under the hood, using the Agent SDK. It wraps the Agent SDK inside our own harness, inside a Kubernetes pod, so we can schedule many of these running in our cloud environment.
And we give it access to a set of trusted tools. The chart here just shows the verification tools, but it actually has more tools available to it.
For verification, it's able to run builds of the code in our CI environment. One thing that's important to us is that we can run our builds across multiple operating systems, because our clients run on many different operating systems. So Honk has tools it can use to verify that its changes are correct.
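The general shape of an agent paired with verification tools can be sketched as a propose, verify, retry loop. This is an illustration under stated assumptions, not Honk's implementation: `propose_change` stands in for the agent call and `verify` for a build or test tool, and both names, along with the stub functions below, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    log: str


def run_with_verification(
    propose_change: Callable[[str, str], str],  # (task, feedback) -> change
    verify: Callable[[str], CheckResult],       # change -> build/test result
    task: str,
    max_attempts: int = 3,
) -> tuple[Optional[str], int]:
    """Ask the agent for a change, verify it, and feed failures back.

    Returns (accepted_change, attempts_used); the change is None if no
    attempt passed verification.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        change = propose_change(task, feedback)
        result = verify(change)
        if result.passed:
            return change, attempt
        feedback = result.log  # the next attempt sees the failure output
    return None, max_attempts


# Stubs standing in for a real agent and a real CI build.
def fake_agent(task: str, feedback: str) -> str:
    return "patch-v2" if "error" in feedback else "patch-v1"


def fake_build(change: str) -> CheckResult:
    if change == "patch-v2":
        return CheckResult(True, "build ok")
    return CheckResult(False, "error: missing import")


if __name__ == "__main__":
    print(run_with_verification(fake_agent, fake_build, "upgrade dependency"))
```

The point of the loop is that the verification output becomes input to the next attempt, so a failed build is something the agent can act on rather than a dead end.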
And again, we run many of these every day. We then integrate this into the fleet management tooling I mentioned before. We use FleetShift, the tool whose graph I showed earlier, to schedule and orchestrate these changes across our thousands of repositories.
And Honk sits in the middle doing the actual code changes.
And it might look something like this. In this case, it's a fairly small migration targeting 39 repositories. The team that owns it can go in and see the status of this particular shift today: how many PRs have been created, how many have been merged, how many failed in CI and need a look, those types of things.
And as Boris mentioned this morning, we're seeing pretty significant time savings from this. What I described before, hundreds of teams doing migrations for their components over weeks or months, can now be done by a single engineer in a few days.
We run our backend mostly on Java, on the JVM, and the latest Java migration we did took three days using these tools.
And we're now making this available. We have a commercial offering for other companies through our Backstage developer portal, and we're making this available as a product in that packaging. So if this is relevant for your company, you can take a look there.
But as it turns out, developers are very resourceful and innovative. Pretty quickly, folks figured out: hey, this Honk thing that we run for all these migrations, how about I figure out how to call it over Slack and have it do things for me that way?
So similar to how you might invoke Claude or other tools over Slack, you can do that with Honk at Spotify as well. It's become very common for people to have a Slack conversation about something and then just at-mention Honk. Honk goes off, works on it, and comes back with a PR.
So we're seeing more and more of these patterns evolve around Honk. In fact, yesterday we released Honk V2. The V2 versioning is a little bit off, because I think it's actually something like the eighth version of Honk. I don't know what we did with the versioning, but it doesn't matter too much.
This week we have Hack Week at Spotify, and we released the alpha of Honk V2, which adds a pretty significant set of features to Honk. It really builds towards a world where developers use it more interactively.
We've integrated it with our agent orchestration tool, which we call Chirp. This is similar to what you can do with Claude agents, the Agent SDK, or similar tools, but it has a few more features and it's integrated into our infrastructure.
It's the way you can run many, many agent sessions at the same time and coordinate them. Honk is built into that, so you can use Chirp to schedule Honk jobs, for example.
You can also collaborate with other developers in shared sessions. So instead of it being just you in front of your agent, you're now sharing that agent session with more people, and you can collaborate on it and give feedback and ideas. Basically, imagine Google Docs or something similar, but for Claude.
And that then groups up into larger efforts. Imagine you're working with a team on a completely new feature or product: you can have a shared project, and in it you can have many sessions with Honk where you collaborate toward whatever that goal is.
This is also available on any device, so users can use it from wherever they are. And there are lots more features that we're rolling out going forward. We're very excited about Honk V2, and personally I'm particularly excited about these multiplayer features, imagining how agents actually collaborate with multiple developers and teams.
All right, let me switch gears a little bit. I also want to talk about how we try to optimize our codebase to make agents as effective as possible in our code. For many, many years we've held the belief that the fewer technologies we use, the faster we'll be able to go. I've been at Spotify for 15 years and this predates my arrival, so I don't actually know exactly how old it is.
This basically comes down to a few different aspects. One, if we have a set of technologies that we know really, really well, where we're really deep experts, we'll be able to build better things on top of them.
We can also eliminate a lot of small decisions for teams. Instead of having to pick the technology for everything you're building, there's a ready set of technologies available to you that hopefully solves your problem.
It also means that it's much easier to collaborate. If you're working with some other team on their components and their components look roughly the same as yours, it's gonna be easier for you to contribute to those.
And similarly, if we need to move components around, or developers move to a different team, things look roughly the same over there. If you look at typical backend services at Spotify, they all look very similar: same technology stack, roughly the same design patterns, and so on.
And we think this makes sense for agents as well. So for many years now, we've been driving towards a more and more standardized stack, or at least less unnecessary variance. We do want some level of variance.
We want to experiment with new technologies and evaluate new things that could be good for us, but we don't want to do that willy-nilly. We want to be intentional about it. And we see that this leads to more effective teams at Spotify.
And we believe it also leads to more effective agents. Simply put, if Claude has a lot of other code to look at and that code looks roughly consistent, Claude will do a better job. That's what we're seeing. We do have codebases that are more fragmented, and we can actually see Claude perform worse in those.
And the starting point for this is Backstage, which I mentioned before. Backstage is our developer portal. It started out providing a single pane of glass for us developers. Prior to Backstage, we had, I think, roughly a hundred different tools within Spotify that you as a developer would go to.
There was one tool to check your deployments, one to look at CI, one to look at A/B tests, and whatnot. It was very, very confusing. All of those tools were kind of shit as well; they weren't particularly good.
So we thought there was an opportunity to consolidate this and provide a better experience for our developers. It really started with the notion of a catalog of all our software. I mentioned before that we have thousands of components in production, and Backstage came about simply as a way to know who owns each of those components.
Let's say we have an incident and I need to page someone on the team that owns that component. Before Backstage, I couldn't even figure out who that owner was. So it started as a way to just have a catalog for that.
Over the years, it's grown to include lots and lots of tools around those components as well. So today, as a human developer, whenever I need to take an action on one of our software components, I do that in Backstage.
And as it turns out, that's equally useful for agents. We expose all of these as MCPs or command-line tools for our agents, so Claude can look up who owns something and ping that team on Slack if it needs to ask questions about it, for example.
This has turned out to be incredibly useful for us and, in particular, as we've scaled up, it allows us to keep track of everything we have going on. It is also a way for us to drive our standardization.
I mentioned this before: we have strong recommendations for which technologies to use for a particular problem, and we describe these in a few different ways. We have a technology radar, as many companies do, that lists all the technologies that are available and what state they're in: this one we recommend using, this one we don't, and so on.
We also have what we call Golden State. Essentially, for a particular type of component, say this type of backend service or this type of iOS view, these are the technologies and practices that we recommend you use.
And we have a UI in Backstage, which we call Soundcheck, where you as a team can go in and self-assess this. This is an example of such a view. You can see a component here, and it has a requirement to define a valid owner. That's what I was talking about before.
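A "valid owner" requirement of this kind can be sketched as a small check over a catalog entry. The entry below follows the `metadata.name` and `spec.owner` layout of open-source Backstage component descriptors; the check itself, the team names, and the result messages are assumptions for illustration and do not describe how Soundcheck is implemented.

```python
from typing import Any


def check_valid_owner(
    entity: dict[str, Any], known_teams: set[str]
) -> tuple[bool, str]:
    """Return (passed, message) for a hypothetical 'valid owner' requirement."""
    name = entity.get("metadata", {}).get("name", "<unnamed>")
    owner = entity.get("spec", {}).get("owner")
    if not isinstance(owner, str) or not owner.strip():
        return False, f"{name}: no owner defined"
    if owner not in known_teams:
        return False, f"{name}: owner '{owner}' is not a known team"
    return True, f"{name}: owned by {owner}"


if __name__ == "__main__":
    teams = {"team-playback", "team-search"}  # hypothetical team registry
    component = {
        "kind": "Component",
        "metadata": {"name": "playlist-service"},
        "spec": {"type": "service", "owner": "team-playback"},
    }
    print(check_valid_owner(component, teams))
```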
This allows us to make our codebase much, much more consistent, and it's something we've been driving for several years. It's been very, very powerful, and it has set us up well for where we are now with AI.
We then also combine that with static analysis and linting. These things are implemented in our codebases as checks, so that when Claude works in our codebase, it gets immediate feedback on whether it's using the right set of technologies and the right set of design patterns.
So if Claude comes up with a way to, I don't know, call gRPC in a way that we know is not optimal for our infrastructure, Claude will get feedback from our lint system to correct that. We think this is super useful both for our developers and for our agents.
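A lint check of this kind can be sketched with Python's standard `ast` module: walk the syntax tree, find calls to a disallowed function, and report a message that a developer or an agent can act on. The rule shown (flagging direct `grpc.insecure_channel` calls) and the suggested "shared channel factory" are hypothetical examples, not Spotify's actual lint rules.

```python
import ast
from typing import Optional

# Hypothetical house rule: dotted call names to flag, with the advice to show.
DISALLOWED_CALLS = {
    "grpc.insecure_channel": "use the shared channel factory instead",
}


def dotted_name(node: ast.AST) -> Optional[str]:
    """Return 'a.b.c' for attribute access rooted at a plain name, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def lint(source: str) -> list[str]:
    """Return one finding per call to a disallowed function."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DISALLOWED_CALLS:
                findings.append(
                    f"line {node.lineno}: {name}: {DISALLOWED_CALLS[name]}"
                )
    return findings


if __name__ == "__main__":
    sample = 'import grpc\nchannel = grpc.insecure_channel("localhost:50051")\n'
    for finding in lint(sample):
        print(finding)
```

Because the finding names the line and the preferred alternative, it works as feedback in an agent loop just as well as in a human's editor.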
We see this all the time: when I work with Claude in our codebase, I see Claude run into these lint checks and correct itself. It's an awesome way to drive this type of standardization.
All right, I'll try to sum this up. So, first, hopefully this came through, but the need for strong engineering practices has not gone away with agents. It remains as important as it was before.
Boris mentioned verification this morning. We fully agree with that. Having your code well tested, and having your agents able to invoke those tests, whether that's Claude running locally or Honk with the verification tools I showed before, is the way to make your agents much more autonomous and come up with better solutions in your code.
Similarly, what I just talked about, making sure that your codebase is consistent and that it's well defined what developers and agents are supposed to do, turns out to make agents work much, much better, at least in our case.
We're also very careful about trying to measure every aspect of our developer experience. We instrument all our infrastructure, we instrument all our PRs, and so on, and we can collect that and measure how we're doing. Some of the numbers I've been showing here today come from that instrumentation, and we have tons and tons of metrics that we're tracking.
We believe that human judgment matters just as much as it did before or even more now that we're able to move faster. We need to figure out where to apply that human judgment though.
I mentioned the increase in PR frequency. The flip side is that we now have 76% more PRs to review. One of the most frequent pieces of feedback from developers at the moment is that there are just too many freaking PRs to review.
So we need to figure out where to apply human review, where it matters the most. That won't be all PRs. We're already auto-approving some PRs that we think are safe enough to merge without a human review, and we try to focus human review where it really matters.
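One way to picture this kind of triage is a small routing policy that sends low-risk PRs to auto-approval and everything else to a human. The criteria below (file suffixes, a size threshold, passing CI) are hypothetical values picked for illustration; the talk does not say which signals Spotify actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    changed_paths: tuple[str, ...]
    lines_changed: int
    ci_passed: bool


# Hypothetical policy values, chosen only for this example.
LOW_RISK_SUFFIXES = (".md", ".lock", ".toml")
MAX_AUTO_APPROVE_LINES = 50


def route(pr: PullRequest) -> str:
    """Return 'auto-approve' or 'human-review' for a pull request."""
    if not pr.ci_passed or not pr.changed_paths:
        return "human-review"
    if pr.lines_changed > MAX_AUTO_APPROVE_LINES:
        return "human-review"
    if all(path.endswith(LOW_RISK_SUFFIXES) for path in pr.changed_paths):
        return "auto-approve"
    return "human-review"


if __name__ == "__main__":
    print(route(PullRequest(("README.md",), 10, True)))
    print(route(PullRequest(("src/Main.java",), 10, True)))
```

The policy defaults to human review whenever a signal is missing or ambiguous, so widening auto-approval is always an explicit decision.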
And I think this will be recurring. We'll figure out over time where we need the human judgment to be applied. And that's gonna be both, I think, prior to invoking the agent and post-invoking the agent.
And lastly, as we're moving faster, we're seeing that coding is much less of a bottleneck now. It used to be that if you looked at our product development life cycle, we were mostly waiting on developers building out and implementing features.
That might have been early on, when we need to validate something, or it might have been building it out for production. In both cases, that was one of the main bottlenecks we had as a company.
And that is now starting to loosen up. I won't say it's completely eliminated, but it's starting to be reduced. Take that early validation, for example. Spotify is a company with way too many ideas about what we could do for our users, more than we've ever had the capacity to build.
And having that many ideas means we need to validate which of them make sense. One way to do that is to prototype. Prototyping used to be a fairly expensive thing for us to do.
You had to convince a bunch of developers to build something for you so you could then show it to other people. One thing Claude and agents allow us to do is let anyone prototype in our actual production codebase.
So now at Spotify, you can open up Claude in our client monorepo and through a set of skills and some infrastructure that we've built, you can prompt Claude to build out any feature that you want to try out and imagine.
Claude will build that for you. You get an app back that you can install and test on your device and share with other people within Spotify, to actually get a sense of what it feels like to use that idea you had.
This has taken prototyping from something that could take days or weeks to literally minutes. So anyone, including, as it turns out, even some of our CEOs, is now building these prototypes for the ideas they have.
So that's prototyping, and the same is true for building things out in production. But what we're seeing is that this is moving the constraints around. Where coding used to be the bottleneck, we're now seeing more and more of those constraints and bottlenecks move to other aspects of how we build products.
In particular, they move to where we have human decision-making in the loop: things like deciding what we're going to ship to our users, or which ideas we want to explore. We didn't use to have to make that many of those decisions because, again, we were constrained by how fast we could build things.
But as that constraint lifts, we need to figure out better and more effective ways of making those decisions. We're seeing this now, and we're trying to shift how we plan the work we do and how we make those decisions.
It's still very much ongoing learning and a set of experiments that we're running, but I think in six months or so we'll have a very, very different way of building products compared to what it has looked like previously.
That was it. Again, if you want to try out FleetShift and Honk, that's where you can take a look. And thanks for having me.
[Applause]
[Music starts]
[Music ends]