Sequoia Capital

Andrej Karpathy: From Vibe Coding to Agentic Engineering

We're so excited for our very first special guest. He has helped build modern AI, then explain modern AI, and then occasionally rename modern AI. He helped co-found OpenAI right inside this office, and he was the one who actually got Autopilot working at Tesla back in the day. He has a rare gift for making the most complex technical shifts feel both accessible and inevitable. You all know him for having coined the term vibe coding last year, but just in the last few months he said something even more startling: that he's never felt more behind as a programmer. That's where we're starting today. Thank you, Andrej, for joining us.

Yeah, hello. I'm excited to be here and to kick us off.

Okay, so just a couple of months ago you said that you've never felt more behind as a programmer. That's startling to hear from you of all people. Can you help us unpack that? Was that feeling exhilarating or unsettling?

A mixture of both, for sure. First of all, I guess like many of you, I've been using agentic tools, Claude Code and adjacent things, for a while, maybe over the last year as they came out. They were very good at chunks of code. Sometimes they would mess up and you'd have to edit them, and it was kind of helpful. And then I would say December was this clear point for me. I was on a break, so I had a bit more time. I think many other people were similar.
And I just started to notice that with the latest models the chunks just came out fine. Then I kept asking for more, and it just came out fine. I can't remember the last time I corrected it. I just trusted the system more and more, and then I was vibe coding. [laughter] So I do think it was a very stark transition. I tried to stress this on Twitter, or X, because I think a lot of people experienced AI last year as a ChatGPT-adjacent thing. But you really had to look again, and you had to look as of December, because things have changed fundamentally, especially this agentic, coherent workflow that really started to actually work. It was that realization that had me go down the whole rabbit hole of infinite side projects. My side projects folder is extremely full of random things, and I've been coding all the time. So that kind of happened in December, I would say, and I've been looking at the repercussions of that since.

You've talked a lot about this idea of LLMs as a new computer: that it isn't just better software, it's a whole new computing paradigm. Software 1.0 was explicit rules, Software 2.0 was learned weights, Software 3.0 is this. If that's actually true, what does a team build differently the day they actually believe this?

Right, exactly. In Software 1.0 I'm writing code. In Software 2.0 I'm programming by creating datasets and training neural networks, so the programming is kind of like arranging datasets, and maybe some objectives and neural network architectures. Then what happened is that if you train one of these GPT models or LLMs on a sufficiently large set of tasks (implicitly, because by training on the internet you have to multitask everything that's in the dataset), they actually become kind of like a programmable computer in a certain sense. So Software 3.0 is about your programming now turning into prompting. What's in the context window is your lever over the interpreter, which is the LLM: it interprets your context and performs computation in the digital information space. That's kind of the transition, and I think there are a few examples that really drove it home for me and that might be instructive.

For example, when Claude Code came out and you wanted to install Claude Code, you would normally expect a bash script, a shell script. So you run the shell script to install OpenClaw. But in order to target lots of different platforms and lots of different types of computers you might run OpenClaw on, these shell scripts usually balloon and become extremely complex.
But the thing is, you're still stuck in a Software 1.0 universe of wanting to write the code. Actually, the OpenClaw installation is a copy-paste of a bunch of text that you're supposed to give to your agent. It's a little skill: copy-paste this, give it to your agent, and it will install OpenClaw. The reason this is a lot more powerful is that you're now working in the Software 3.0 paradigm, where you don't have to precisely spell out all the individual details of that setup. The agent has its own intelligence packaged up: it follows the instructions, looks at your environment and your computer, performs intelligent actions to make things work, and debugs things in the loop. It's just so much more powerful, right? So I think that's a very different way of thinking about it. What is the piece of text to copy-paste to your agent? That's the programming paradigm now.

One more example that comes to mind, even more extreme than that, is from when I was building MenuGen. MenuGen is this idea where you come to a restaurant and they give you a menu. There are usually no pictures, so I don't know what any of these things are. Usually, for 30% of the things I have no idea what they are, maybe 50%. So I wanted to take a photo of the restaurant menu and get pictures of what those things might look like in a generic sense.
So I built this app that lets you upload a photo. It runs on Vercel, and it re-renders the menu: it OCRs all the different titles, uses an image generator to get pictures of them, and then shows you all the items with their pictures.

Then I saw the Software 3.0 version of this, which blew my mind: you literally just take your photo, give it to Gemini, and say, "Use Nano Banana to overlay the things onto the menu." And Nano Banana returned an image that is exactly the picture of the menu that I took, but with the different items rendered right into the pixels. This blew my mind because all of my MenuGen is spurious. It's working in the old paradigm; that app shouldn't exist. The Software 3.0 paradigm is a lot more raw. Your neural network is doing more and more of the work: your prompt or context is just the image, the output is an image, and there's no need for any of the app in between. So I think people have to reframe: not to work in the existing paradigm of what already existed and think of this only as a speed-up of what exists. New things are actually available now.
And going back to your programming question, I think that's also an example of working in the old mindset, because it's not just about programming becoming faster. This is more general information processing that is automatable now. It's not even just about code. Previous code worked over structured data, right? You write code over structured data. But take, for example, my LLM knowledge bases project, where you get LLMs to create wikis for your organization or for you personally. This is not even a program. It's not something that could exist before, because there was no code that would create a knowledge base from a bunch of facts. But now you can take these documents, recompile them in a different way, reorder them, and create something new and interesting as a reframing of the data. These are new things that weren't possible. This is something I keep trying to get back to: not only what can we do that already existed and is faster now, but what new opportunities there are for things that couldn't be possible before. I almost think that's more exciting.

I love the MenuGen progression and dichotomy that you laid out, and I'm sure many folks here followed your own progression in programming from last October to early January and February this year.
If you extrapolate that further, what is the 2026 equivalent of building websites in the '90s, building mobile apps in the 2010s, or building SaaS in the last cloud era? What will look completely obvious in hindsight that is still mostly unbuilt?

[clears throat] Well, going with the example of MenuGen, I guess: a lot of this code shouldn't exist, and it's just neural networks doing most of the work. I do think the extrapolation looks very weird, because you could imagine completely neural computers in a certain sense. Imagine a device that takes raw video or audio into what is basically a neural net and uses diffusion to render a UI that is, in a certain sense, unique to that moment. I kind of feel like in the early days of computing people were a little confused as to whether computers would look like calculators or like neural nets. In the '50s and '60s it was not really obvious which way it would go. Of course, we went down the calculator path and ended up building classical computing, and neural nets are currently running virtualized on existing computers. But I think a lot of this will flip, and the neural net becomes kind of like the host process. And the CPUs become kind of like the co-processor.
We saw the diagram showing that compute for intelligence, for neural networks, is going to take over and become the dominant spend of FLOPs. So you could imagine something really weird and foreign, where neural nets are doing most of the heavy lifting and using tool use as just a historical appendage for some kinds of deterministic tasks. But what's really running the show is these neural nets, networked in a certain way. So you can imagine something extremely foreign as the extrapolation, but I think we're probably going to get there piece by piece. And I don't