Podcast: Latent Space

⚡️ Using "Taste" to Make DeepSeek V4 Beat Opus 4.7, with @AhmadAwais of CommandCode.ai

Okay, we are here in the remote studio with Ahmad Awais. How are you?

I'm doing great. Thanks a lot for having me.

Yeah, you and I have known each other since before AI. You were active in the WordPress community. I don't know how I first came across you beyond that. I think just general web stuff.

Maybe DevRel. I think it was before you had joined Netlify.

Yeah. Were you ever a professional DevRel?

Yeah, I was VP of DevRel at RapidAPI for a while.

Oh, that's right. That's right. Yes, yes.

That is as professional as one gets, right?

No, no, 'cause I always see you as an independent creator type of person.

No, I worked with a bunch of different places, with Google, with Airbnb. Mostly I've been this open source guy; I have published 300-plus open source projects.

Yeah.

Everything is semantics after that. My open source work took me places, and I create a lot of content, so: DevRel.

Yeah, okay, so tell us about the path into Command Code, and then we're going to highlight some of the work you did recently.

I think the story kind of starts at COVID. I basically did a Corona CLI thing that went viral. COVID was at its peak, I was traveling a lot, being in DevRel and whatnot, and I think Greg Brockman and Sam Altman ended up giving me access to GPT-3 early.
So since then I've been an AI engineer, right? And the first thing I did (in fact, I was just looking at it) was in July 2020. Greg sent me a message asking, "What is the use case? What are you going to use this API for?" And I told him I was going to suggest the next line of code, like a code snippet. This was more than a year before GitHub Copilot was a thing, right?

Mhm.

So I started building this thing called CLAI. I've always been a big fan of building CLIs, so it was a silly side project: CLAI, right? And I think that has eventually become Command Code. Our short history is that we ended up building an AI cloud. Now everything is an AI cloud. It was called LangBase. It grew quite big: 1.2 billion agent runs a month. We ended up building a memory infrastructure and whatnot. And now we've sort of pivoted into this feeling that there is only one type of agent, and that is a coding agent. It can do it all, right? So why hide that capability behind some memory system or some primitive or whatnot? That six-year-old codebase has eventually turned into this thing we call Command Code. And Command Code actually started with the realization that I was personally using it a lot more than other coding agents. Then a couple of team members started adopting it.
And we ended up building this meta-neuro-symbolic model called Taste One. The idea is this: I have a lot of experience with code. I think I've been writing code for 27 years or something. After publishing 300-plus open source projects, you get to have a lot of opinions on things. And I mostly find myself working on things that are super cutting edge, so there are no docs that an AI agent can go read. At that point I feel like my opinions matter more than what an LLM can actually find, or what you can do with RAG or whatever. So I ended up encoding this behavior in a meta-neuro-symbolic architecture where, if you learn something from me, you document it for me like a skill. We started calling it Taste. Say you see me prefer pnpm a lot, but I publish or link my local CLI with npm global link. It would learn that I prefer pnpm for installing packages and almost everything else, but when I'm linking my CLI locally, I'm using npm. These kinds of learnings eventually ended up becoming taste files, which are very similar to skill files. You can think of it like this: Command Code automatically learns from you on a per-repository basis, so from your team too. And it builds a library of skills that is far less verbose. Not everything is in there.
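To make the pnpm versus npm example concrete, here is a minimal sketch of how repeated, context-specific choices could be turned into a short list of preferences. It is an illustration only and not how Command Code or Taste One is implemented: the `Observation` and `TasteRule` types, the `learnTaste` function, the context labels, and the `minRepeats` threshold are all hypothetical names invented for this example.

```typescript
// Hypothetical sketch: none of these names come from Command Code itself.

// One observed developer action, e.g. a command run inside a repository.
interface Observation {
  context: string; // what the developer was doing, e.g. "install-package"
  choice: string; // which tool they used, e.g. "pnpm"
}

// A learned preference, emitted only when a choice is repeated often enough.
interface TasteRule {
  context: string;
  prefer: string;
  seen: number;
}

// Count choices per context and keep the most frequent one if it appears at
// least `minRepeats` times. Contexts without a repeated choice are dropped,
// which keeps the resulting list short.
function learnTaste(observations: Observation[], minRepeats = 3): TasteRule[] {
  const counts: Record<string, Record<string, number>> = {};
  for (const { context, choice } of observations) {
    const perContext = counts[context] ?? (counts[context] = {});
    perContext[choice] = (perContext[choice] ?? 0) + 1;
  }

  const rules: TasteRule[] = [];
  for (const context of Object.keys(counts)) {
    const perContext = counts[context];
    let best: TasteRule | undefined;
    for (const choice of Object.keys(perContext)) {
      const seen = perContext[choice];
      if (best === undefined || seen > best.seen) {
        best = { context, prefer: choice, seen };
      }
    }
    if (best !== undefined && best.seen >= minRepeats) {
      rules.push(best);
    }
  }
  return rules;
}

// Made-up observations mirroring the example from the conversation.
const observed: Observation[] = [
  { context: "install-package", choice: "pnpm" },
  { context: "install-package", choice: "pnpm" },
  { context: "install-package", choice: "yarn" }, // one-off, outvoted by pnpm
  { context: "install-package", choice: "pnpm" },
  { context: "link-local-cli", choice: "npm" },
  { context: "link-local-cli", choice: "npm" },
  { context: "link-local-cli", choice: "npm" },
  { context: "run-tests", choice: "vitest" }, // seen once, below the threshold
];

const rules = learnTaste(observed);
console.log(rules);
```

The point of the sketch is the separation by context: the same developer ends up with "pnpm" for installing packages and "npm" for linking a local CLI, while one-off actions never become rules.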
It's the things it sees as repeated preferences and patterns across your work. And those could be coming from many different coding agents. When you merge something into main, that is when we can trigger a review of what your accepts, edits, and rejects were overall. But that's not what we are here to talk about. You know the silly thing about anything in tech: the thing you think will go viral or get adopted never does. And the thing you basically ignore off the cuff, like "Who cares about open models?", becomes a thing, right? So, it is the 25th of May today. Literally about 25 days ago, everybody was talking about how DeepSeek is so good. And there were a lot of people talking about how DeepSeek is so bad and asking why people were saying it is so good. So I was in the camp of: I need to make a decision on whether DeepSeek V4 Pro is actually as good as Opus. We were doing something like a couple of billion tokens a day at that time. We are at about 100 times that now. I ended up discovering this thing that I call tool confusion, which is very odd. We have been able to figure out how to deterministically fix tool calling for open source models, or open models. It's been such an amazing thing that a lot of people have been trying it out through Command Code. And I also made a bunch of that completely open. You can go and implement this in any coding harness.
Because, me being me, this is not that important, right? So, yeah, let me know how deep you want me to go into that topic. I'm very excited to talk about it.

Yeah, I think we can dive right into it. I think you have a viral post that you want to share. And also, I think, frame the problem for people who maybe are only used to OpenAI, right? They don't know what tool calling is.

Think of it like this: our whole vibe is Command Code, right?