
Coding is no longer the constraint: Scaling devex to teams and agents at Spotify

Hey everyone. Yeah, so I'm Niklas. It was very surprising to see my face on screen earlier, because I had completely forgotten that Boris was going to mention Spotify as part of the keynote. So I'm here to give you a bit of a rundown on how we're approaching the AI transition at Spotify.

Let me start with a little introduction to Spotify. Anyone in here who's a Spotify user? Oh, lots of hands. Good. We're a fairly sizeable engineering org at this point, close to 3,000 engineers. We've spent many years trying to optimize our developer experience and how we build products, and we try to make it as easy as possible to deploy and ship changes to our users. One way to illustrate that: we do around 4,500 deployments to our production environment every day.

We run on a mix of repositories; I'll come back to this later. Some are very large monorepos. Our backend lives in a monorepo of 40 million lines of code, and then we have lots and lots of smaller polyrepos, thousands of them.

The AI transition for us has been a journey of very rapid adoption curves. We roll out tools internally all the time to make our developers more productive, but we have never seen the rate of adoption we saw when rolling out AI coding tools. You can see in particular how Claude Code, orange in this diagram, completely exploded. It's a little hard to see because of the holiday break, but it really happened around the Opus 4.5 release in November last year. Since then, growth and usage of Claude in particular, and AI tools in general, has gone completely bananas. Today more than 99% of our engineers use AI coding tools every week. We run a recurring survey of all our engineers, and in the latest one, which came in just last week, 94% of our engineers report that using AI tooling has helped them become more productive. That comes with record-high self-assessed productivity.

We can also look at productivity in other ways. One is to use PR frequency as a proxy for how fast and how much we're able to ship. Today we're seeing a 76% increase in PR frequency. As I was working on these slides over the last two weeks, I had to keep changing this number because it keeps growing. And by now, by far most of the PRs we ship are authored by an AI agent together with a developer. It's actually hard to see in this curve, but the number of PRs had been growing very slowly over a long period, and then you can see a jump again around the Opus 4.5 release. That was when this took off for us.

This of course also means we have explosive growth in our code. Luckily, that's something we came prepared for, because we had seen this trend for a long time, even before AI. In fact, a few years ago we noticed that our production codebase was growing seven times faster than the number of engineers.
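The talk doesn't give absolute growth rates, so here is a small calculation under a hypothetical 5% annual headcount growth, reading "seven times faster" as code growing at 7x the engineers' growth rate. It only illustrates how quickly the amount of code per engineer compounds under that assumption.

```python
def code_per_engineer_multiplier(engineer_growth: float, years: int,
                                 factor: float = 7.0) -> float:
    """How much code per engineer grows after `years`.

    Assumes code grows at `factor` times the annual headcount growth rate,
    e.g. 5% more engineers per year means 35% more code per year at factor 7.
    """
    code_growth = factor * engineer_growth
    return ((1 + code_growth) / (1 + engineer_growth)) ** years

# Hypothetical 5% headcount growth; not a figure from the talk.
for years in (1, 3, 5):
    multiplier = code_per_engineer_multiplier(engineer_growth=0.05, years=years)
    print(f"after {years} year(s): {multiplier:.2f}x code per engineer")
```

Under these made-up numbers, each engineer ends up responsible for more than twice as much code after three years, which is the maintenance pressure described next.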
That meant engineers would spend more and more of their time maintaining our existing codebase instead of building new features and value for our users. We realized we needed to fix this, so we started an effort to automate as much of that maintenance as possible.

A lot of that maintenance comes down to pretty dull things that just need doing: migrate from this version to that version, deprecate this API, fix this security vulnerability, those types of things. But it took a lot of our developers' time. The way we typically did those migrations back then was to send a migration path out to hundreds of teams saying, "Hey, you need to upgrade your components from this Java version to that Java version." The teams would go ahead and do that, and one of those upgrades would typically take us months to complete across many thousands of components. That was not fun for anyone. In that same engineering survey back then, migrations were the top thing our developers were frustrated about.

So instead of doing this component by component, fairly manually, we asked: could we mutate our entire fleet of components at once, and how would we do that? We built out the infrastructure for this, something we call fleet management, and the underlying system we use is called FleetShift.
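A minimal sketch of the idea of mutating the whole fleet at once: one transform is applied to every component's build file, a PR is opened only where something actually changed, and PRs are merged automatically when their checks pass, in the spirit of the automerge flow described in the talk. Everything here (repo names, the `java.release` property, the `PullRequest` type) is hypothetical; FleetShift's real interfaces aren't described in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class PullRequest:
    repo: str
    new_content: str
    ci_passed: Optional[bool] = None  # None until CI reports back
    merged: bool = False

def run_shift(fleet: Dict[str, str],
              transform: Callable[[str], str]) -> List[PullRequest]:
    """Apply `transform` to each repo's build file; open a PR only if it changed."""
    prs = []
    for repo, build_file in fleet.items():
        updated = transform(build_file)
        if updated != build_file:  # repos already on the target version get no PR
            prs.append(PullRequest(repo=repo, new_content=updated))
    return prs

def automerge(prs: List[PullRequest]) -> None:
    """Merge PRs whose checks passed; leave the rest for a human to look at."""
    for pr in prs:
        if pr.ci_passed:
            pr.merged = True

def bump_java(build_file: str) -> str:
    # Hypothetical deterministic transform: bump a Java release property.
    return build_file.replace("java.release=17", "java.release=21")

fleet = {
    "playlist-service": "java.release=17\n",
    "search-service": "java.release=21\n",   # already migrated: no PR
    "podcast-service": "java.release=17\n",
}
prs = run_shift(fleet, bump_java)
for pr in prs:
    pr.ci_passed = pr.repo != "podcast-service"  # pretend one build fails
automerge(prs)
```

The key design point is that no owning team has to act: the transform, the validation, and the merge are all automated, and only failures surface to a human.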
Up until today, we've merged two and a half million of those automated maintenance PRs: work our developers did not have to do. The vast majority of them, the green part of this graph, have been automerged, so there's no human in the loop. Automation creates the PR to begin with, automation validates that the PR is safe to merge, and then it goes ahead and merges it without any developer needing to care about that change. This happens every day; we ship thousands of these daily.

That was all pre-AI. One thing we noticed pretty quickly was that this works really well for simple changes. That might be a configuration change, or bumping some dependency in your build file, those types of things. Works great. But once you get into slightly more complex changes, like replacing API calls, the scripts we used to run these shifts across our fleet became incredibly complicated. Code, as it turns out, has a very, very wide API surface: there are many, many ways to achieve the same thing, even if it's just calling a method. When you write that script and run it across millions of lines of code and thousands of components, you are going to find every corner case, and you need to handle each of them in your migration script. There's even a term for this, Hyrum's Law, named after an engineer at Google who described it many years before we ran into it.
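To see why those scripts got complicated, here is a hypothetical migration that renames `client.fetchUser(...)` to `client.loadUser(...)` using a regex written for the obvious call shape. The Java-like snippets and method names are made up, but they show how the same call can be written in ways a simple pattern silently misses.

```python
import re

# Hypothetical migration: rewrite calls to a deprecated fetchUser(...) as loadUser(...).
NAIVE = re.compile(r"\bclient\.fetchUser\(")

def naive_migrate(source: str) -> str:
    """Rewrite only the call shape the script author had in mind."""
    return NAIVE.sub("client.loadUser(", source)

variants = [
    "client.fetchUser(id)",                        # the expected shape: rewritten
    "client\n    .fetchUser(id)",                  # chained call split across lines
    "Function<Id, User> f = client::fetchUser;",   # method reference
    "c.fetchUser(id)",                             # different receiver name
]

# Every variant that still mentions fetchUser after migration is a missed corner case.
missed = [v for v in variants if "fetchUser" in naive_migrate(v)]
```

Each missed variant needs another special case in the script, and across thousands of components those special cases pile up, which is the motivation for trying an LLM instead.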
So pretty early on, as LLMs came about, we figured: instead of writing these deterministic scripts to do the code modifications, can we use an LLM? We started iterating on this very early, before Claude and similar tools. It was challenging initially. The models were just too stupid, and the way we were trying to do it was also too stupid. But over many iterations we started figuring out the patterns, and the models got better.

Out of this came a tool we now call Honk. Boris mentioned it this morning. It has a silly name and a silly icon, but it turns out to be a very useful tool. Honk is really the result of all those iterations of trying different ways to automate these still relatively simple code changes, applied over many, many variants of code. It started out very differently, but today Honk has Claude under the hood, using the Agent SDK. It wraps the Agent SDK inside our own harness, inside a Kubernetes pod, so we can schedule many of these to run in our cloud environment. We give it access to a set of trusted tools. The chart here just says verification tools, but it actually has more tools available. For verification, it's able to run builds of the code in our CI environment.
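A minimal sketch of the harness pattern described above, assuming two rules: the agent may only call allowlisted ("trusted") tools, and a change counts as done only once a verification tool passes. This does not use the real Agent SDK; `propose_edit` is a stand-in for the model, and every name here is hypothetical.

```python
from typing import Callable, Dict, Optional

TRUSTED_TOOLS: Dict[str, Callable[[str], bool]] = {}

def trusted_tool(name: str):
    """Decorator that registers a function on the tool allowlist."""
    def register(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        TRUSTED_TOOLS[name] = fn
        return fn
    return register

@trusted_tool("run_build")
def run_build(code: str) -> bool:
    # Stand-in for a real CI build: "passes" when no TODO marker is left.
    return "TODO" not in code

def call_tool(name: str, arg: str) -> bool:
    """Refuse any tool that is not on the allowlist."""
    if name not in TRUSTED_TOOLS:
        raise PermissionError(f"tool {name!r} is not allowlisted")
    return TRUSTED_TOOLS[name](arg)

def run_agent(code: str, propose_edit: Callable[[str, int], str],
              max_attempts: int = 3) -> Optional[str]:
    """Let the model edit until the build passes or the attempt budget runs out."""
    for attempt in range(max_attempts):
        code = propose_edit(code, attempt)
        if call_tool("run_build", code):
            return code   # verified: ready to be turned into a PR
    return None           # unverified: report failure rather than open a PR

# Usage with a fake "model" that only fixes the code on its second attempt.
fixed = run_agent("x = 1  # TODO",
                  lambda code, attempt: code.replace("TODO", "done") if attempt else code)
```

Returning `None` instead of an unverified change keeps broken edits out of the PR stream, and the attempt budget bounds how long any one run can occupy its pod.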
One thing that's important to us is being able to run our builds across multiple operating systems, for example, because our clients run on many different operating systems. So Honk has tools available to verify that its changes are correct, and again, we run many of these every day.

We then integrate this into the fleet management tooling I mentioned before. We use FleetShift, the tool whose graph I showed earlier, to schedule and orchestrate these changes across our thousands of repositories, and Honk sits in the middle doing the actual code changes. It might look something like this. In this case it's a fairly small migration targeting 39 repositories. The team that owns it can go in and see the status of this particular shift today: how many PRs have been created, how many have been merged, how many failed in CI and need a look, those types of things.

As Boris mentioned this morning, we're seeing pretty significant time savings from this. What I described before, hundreds of teams doing migrations for their components over weeks or months, can now be done by a single engineer in a few days. We run our backend mostly on Java, on the JVM, and the latest Java migration we did took three days using these tools.

And we're now making this available outside Spotify too.
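The per-shift status view described above boils down to counting where each targeted repo stands. A minimal sketch, assuming four hypothetical states (these are not FleetShift's actual data model) and a made-up split for a 39-repo shift:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Assumed states a targeted repo can be in during a shift.
STATES = ("not_started", "pr_open", "ci_failed", "merged")

@dataclass
class RepoStatus:
    repo: str
    state: str

def summarize(statuses: List[RepoStatus]) -> Dict[str, int]:
    """Count repos per state, plus how many PRs exist at all."""
    counts = Counter(s.state for s in statuses)
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    summary = {state: counts[state] for state in STATES}
    summary["prs_created"] = len(statuses) - summary["not_started"]
    return summary

def needs_attention(statuses: List[RepoStatus]) -> List[str]:
    """Repos whose PR failed CI and should be looked at by the shift owner."""
    return [s.repo for s in statuses if s.state == "ci_failed"]

# Hypothetical 39-repo shift; the split between states is made up.
shift = (
    [RepoStatus(f"repo-{i:02d}", "merged") for i in range(30)]
    + [RepoStatus(f"repo-{i:02d}", "pr_open") for i in range(30, 34)]
    + [RepoStatus(f"repo-{i:02d}", "ci_failed") for i in range(34, 37)]
    + [RepoStatus(f"repo-{i:02d}", "not_started") for i in range(37, 39)]
)
summary = summarize(shift)
```

Surfacing `needs_attention` separately reflects the workflow in the talk: the owning engineer mostly looks at the CI failures, not at the repos that merged cleanly.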
We have a commercial offering for other companies through our Backstage developer portal, and we're making this available as a product in that packaging. So if this is relevant for your company, you can take a look there.

But as it turns out, developers are very resourceful and innovative. Pretty quickly, folks figured: hey, this Honk thing we run for all these migrations, what if I could call it over Slack and have it do things for me that way? So, similar to how you might invoke Claude or other tools over Slack, you can do that with Honk at Spotify as well. It's become very common for people to have a Slack conversation about something and then just at-mention Honk. Honk goes off, works on it, and comes back with a PR. We're seeing more and more of these patterns evolve around Honk.
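A minimal sketch of how an at-mention might be turned into a task for an agent like Honk. Slack encodes user mentions in message text as `<@USER_ID>`, and message events carry `channel`, `ts`, and, inside threads, `thread_ts`. The bot ID, the `Task` type, and replying in-thread are assumptions for illustration, not how Spotify actually wires Honk to Slack.

```python
import re
from dataclasses import dataclass
from typing import Optional

BOT_USER_ID = "U0HONK000"  # made-up bot user ID
MENTION = re.compile(re.escape(f"<@{BOT_USER_ID}>"))

@dataclass
class Task:
    channel: str
    thread_ts: str   # where to post the resulting PR link
    instruction: str

def task_from_message(event: dict) -> Optional[Task]:
    """Return a Task if the message mentions the bot with a non-empty request."""
    text = event.get("text", "")
    if not MENTION.search(text):
        return None
    instruction = MENTION.sub("", text).strip()
    if not instruction:
        return None
    # Reply in the existing thread, or start one under the triggering message.
    return Task(channel=event["channel"],
                thread_ts=event.get("thread_ts", event["ts"]),
                instruction=instruction)

task = task_from_message({
    "channel": "C0TEAM",
    "ts": "1717000000.000100",
    "text": "<@U0HONK000> bump the Guava version in playlist-service",
})
```

Keeping the thread timestamp on the task is what lets the agent come back into the same conversation with its PR.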