Unsupervised Learning: With Jacob Effron

Yann LeCun on the Road Beyond LLMs 杨立昆谈 LLM 之后的路

You're one of the godfathers of AI. 你是AI教父之一。 What's your view of the path of progress here? 你怎么看待这里的进展路径？ 5 years, complete world domination. 5年，征服全世界。 The best way to get breakthrough research is you hire the best people. 做出突破性研究的最好方式是招到最优秀的人。 You get the […] out of the way. 然后让他们自由发挥。 Pardon my French. 请原谅我的直白。 You shared the Turing Award with two others. 你和另外两人共享图灵奖。 When did your views start diverging? 你们的观点是什么时候开始出现分歧的？ In 2023. 2023年。 How do you know it was time to leave Meta? 你怎么知道是时候离开Meta了？ It sounds like you were thinking through some of these things over a period of time. 听起来你是在一段时间里慢慢想清楚的。 There's a big misconception about my role, my relation to Alex, and how AI was run at Meta. 外界对我的角色、我和Alex的关系，以及AI在Meta是如何运作的，存在很大误解。 What's one thing you've changed your mind on in the last year? 过去一年里，你改变主意的一件事是什么？ I mean, the whole idea of… 我是说，整个想法的…… Yann LeCun is one of the godfathers of AI. Yann LeCun是AI教父之一。 He's an absolute legend in the field. 他是这个领域的绝对传奇。 He's someone I've admired for a long time. 我仰慕他已经很久了。 So it was such a treat to get him on Unsupervised Learning. 能请到他来Unsupervised Learning，真的是太难得了。 He's been a noted skeptic of LLMs in many ways. 他一直是LLM的知名质疑者。 So we dug into what LLMs can do, what they can't do, some of the limitations he sees, and why he ultimately decided to pursue a different architecture. 我们深入探讨了LLM能做什么、不能做什么，他看到的局限，以及他最终为什么选择走一条不同的架构路线。 We also talked about his time at Meta.
我们也聊了他在Meta的那段时光。 We talked about the things he's proud of in setting up FAIR, how the last few years proceeded, and what ultimately led him to spin out and start his own company, AMI. I think it's fascinating to get Yann's thoughts on everything happening in the AI ecosystem today: the tension between basic research and pushing LLMs forward, how that's playing out in a bunch of organizations today, and where the whole space is headed. He's an absolute giant in the field, and when I started this podcast I hoped we'd get guests like him, so it is such a treat. I think folks will really enjoy hearing the conversation we had. 他为创立FAIR感到自豪的事情，过去几年的进展，以及是什么最终让他出来创办自己的公司AMI。我觉得从Yann这里听到对当今AI生态系统方方面面的看法真的很迷人：基础研究与推进LLM之间的张力，这种张力如何在一批机构里上演，还有他对整个领域走向的判断。他在这个领域是绝对的巨人，当我创办这档播客的时候，我就希望有一天能请到像他这样的嘉宾，所以这真的太难得了。我相信大家会非常享受我们这次对话。 Without further ado, here's Yann. 废话不多说，有请Yann。 Yann, this is such a pleasure. Yann，非常荣幸。 You're one of the godfathers of AI. 你是AI教父之一。 I feel like when I started doing this podcast years ago, I was really hoping we might one day get someone like you on. 我做这档播客好几年了，心里一直希望有朝一日能请到像你这样的人。 You know, I don't like that term because I live in New Jersey. 你知道，我不喜欢这个说法，因为我住在新泽西。 When you're a godfather in New Jersey, 在新泽西当教父， it doesn't mean the same thing. 意思可不一样。 Very fair. 说得有道理。 Very fair. 说得有道理。 Obviously, your bet on neural nets when everyone doubted them is legendary. 你当年押注神经网络，在所有人都不信的时候，这已经成了传奇。 And I feel like today you're making a similar bet in many ways, against LLMs and the predominant generative architectures that so many believe in. 我感觉今天你在很多方面做出了类似的押注，反对LLM以及那些如此多人相信的主流生成架构。 You've recently started a new company around this theme. 你最近在这个方向上创办了一家新公司。 So our goal today is to leave our listeners with a lot more information about AMI, what you're doing there, and some of your work at Tapestry.
所以今天我们的目标是让听众更深入了解AMI，你在那里做什么，以及你在Tapestry方面的工作。 Also why you think the rest of the field is pointed in the wrong direction with some of these generative models, and your reflections on the way the field has unfolded, your time at Meta, and all that. 还有你为什么认为领域里的其他人在这些生成模型上走错了方向，以及你对整个领域发展方式、在Meta时光的回顾。 So, modest goals for a single podcast episode. 所以，对一期播客来说目标还算谦虚。 I figured it'd be great to start with the meat, because the company feels like the clearest statement of your technical thesis going forward. 我觉得不如直接从核心开始，因为这家公司感觉是你技术论点最清晰的表达。 You recently launched the company. 你最近创立了公司。 It's focused on world models and scaling the JEPA architecture, which you obviously pioneered at Meta. 专注于世界模型，以及扩展JEPA架构，这显然是你在Meta主导开创的。 I'm wondering if you could talk a little bit about the origins of that architecture and the extent to which you drew inspiration from the human brain and the way it works. 我想请你聊聊这个架构的起源，以及你从人脑的工作方式中汲取了多少灵感。 First of all, I want to say there's nothing wrong with LLMs, in the sense that LLMs are the basis for a lot of very useful AI products that all of us use, including me. 首先我想说，LLM本身没有什么问题，LLM是大量非常有用的AI产品的基础，我们所有人都在用，包括我。 They're great for what they do, okay? 就它们能做的事情而言，很棒。 They're just not a path towards human-level or human-like intelligence, or even animal-like intelligence. 只是它们不是通向人类水平或类人智能，乃至类动物智能的路径。 So that's my claim, okay? 这是我的论点。 I'm not saying they're useless, right? 我不是说它们没用。 I'm just saying they're not a path towards that. 我只是说它们不是通向那个方向的路径。 I mean, 我是说， you helped build some of the first major open-source ones, 你还帮助构建了最早的一批主要开源模型， right? 对吧？ Absolutely. 当然。 So, what is AMI? 那AMI是什么？ AMI stands for Advanced Machine Intelligence, and the subtitle, the motto if you want, is "AI for the real world."
AMI代表高级机器智能，副标题是面向真实世界的AI。 Basically, a lot of the AI techniques that people know about today are good for language manipulation, whether human language, computer code, mathematics, or legalese, which barely qualifies as human language. 很多人熟知的AI技术擅长语言操作，无论是自然语言、计算机代码、数学，还是勉强算得上人类语言的法律文书。 Unfortunately, a lot of human language is used for it. 很不幸，很多人类语言都是用来搞这个的。 Right, sadly. Language is very special in a way, and it's particularly well suited to the type of architectures that have been so successful recently: large language models, GPT-style architectures. But what about the real world? What about understanding the physical world? It turns out reality is way more complicated than language, because 是啊，语言确实很特殊，特别适合近年来大获成功的那类架构，比如LLM、GPT系列，但真实世界呢？理解物理世界呢？现实比语言复杂得多，因为 It's high-dimensional. 它是高维的。 It's continuous. 它是连续的。 It's noisy. 它充满噪声。 It's messy. 它很混乱。 And training a system to understand the real world is much, much harder. 训练一个理解真实世界的系统要难得多。 So that's really what we're after. 这才是我们真正追求的目标。 That's what I've been after for most of my career. 这也是我职业生涯大部分时间所追求的。 I've been working on it in an accelerated fashion over the last five or six years, and we've made significant progress over the last two years. 过去五六年里我一直在加速推进，过去两年取得了重大进展。 So it made sense to do a startup around it and go into high gear in pushing it. 所以围绕这个方向做一家初创公司是顺理成章的，真的要踩上油门去推。 And it became clear by the end of last year that Meta was really not the right place for that. 去年年底逐渐清楚，Meta真的不是做这件事的合适地方。 That's why I left and started AMI Labs. 所以我才离开，创立了AMI Labs。 I think it's an interesting trend that we're seeing across the board, where many folks are spinning out of large companies or research labs to pursue a particular research direction they're excited about.
我觉得这是一个有趣的趋势，从各大公司或研究实验室出来的人越来越多，他们对某个特定研究方向充满热情，想独立去做。 And you have such an interesting vantage point on this from your time at FAIR. 你在FAIR的这段经历给了你非常独特的视角。 There's almost a tension in these companies between pursuing as many different research directions as possible and saying, hey, something's really working. 这种张力几乎是内在的，在这些公司里，一方面是尽可能探索多种研究方向，另一方面是某个方向真的在起效，就专注推那个方向。 This is the thing we're going to sell for the next 6 to 12 months, so go focus on that. 这是我们接下来6到12个月要卖的东西，去专注吧。 I'm curious about your thoughts on that and what you've seen in the industry at large. 我很好奇你对这件事的看法，以及你在整个行业里观察到了什么。 Well, it's a strange trade-off. 这是一个奇怪的权衡。 There are really two modes of R&D, right? 研发真的有两种模式。 There's a lot of exploratory research, a lot of different research directions, right? 有大量探索性研究，很多不同的研究方向。 And sometimes something seems to work and you need to push it further, and it's not research anymore. 有时候某个方向似乎在奏效，你需要继续深挖，到了这一步就不再是研究了。 I mean, the people working on it are researchers, or at least they're called researchers in the press, but really it's becoming more engineering and pushing for products, right? 我是说，做这件事的人是研究员，至少在媒体上被叫做研究员，但实际上已经越来越工程化，越来越在为产品发力了。 That happened a number of times at Meta with things that were started at FAIR. 这在Meta发生过好几次，源头都是FAIR开始的事情。 It happened in early 2023, essentially, when Llama 1, which was developed at FAIR, was very promising, and Meta created a whole organization, GenAI, to turn it into something real and a series of products. It produced Llama 2, Llama 3, and Llama 4, which was a bit of a disappointment, and because Mark Zuckerberg was disappointed by it, he kind of rebooted the entire organization, reorganized it, hired new people, etc.
2023年初就发生了这样的事，当时在FAIR开发的Llama 1非常有前景，Meta成立了GenAI整个部门，把它变成真正的东西和一系列产品，产出了Llama 2、Llama 3、Llama 4，但Llama 4有些令人失望，Mark Zuckerberg也对此失望，他基本上重启了整个组织，重新架构，换了新人。 But what also happened over the last year is that Meta realized it had fallen behind a little bit, and that refocused the strategy on trying to catch up with the industry. 过去一年还发生的是，Meta意识到自己落后了一点，于是战略重心转向追赶行业。 The sad side effect is that a lot of the exploratory research was basically no longer given high priority. 悲哀的副作用是，大量探索性研究基本上不再被列为高优先级。 I mean, it didn't concern the stuff I was working on, all the JEPA and world models, because Mark himself, Boz Bosworth the CTO, and a bunch of other people in the company were really interested in that project and really believed in its long-term impact. But the rest of the company was totally, entirely focused on LLMs, and that made it clear to me that Meta was really not the right place to push that project anymore. Then we started to have good results, so it was clear that we had to make the transition from research to actually developing the technology, scaling it up, and building products out of it. 我是说，这不影响我在做的事情。所有JEPA和世界模型的工作，因为Mark本人、CTO Boz Bosworth以及公司里的一批人真的对这个项目感兴趣，真的相信它的长期影响，但公司其他人完全彻底地专注在LLM上，这让我很清楚地意识到Meta已经不再是推进这个项目的合适地方，而且我们开始出现好的结果，所以很明显我们必须完成那个从研究到实际开发技术、扩大规模、构建产品的转变。 We also realized that most of the applications were probably in areas Meta was not particularly interested in. 我们也意识到，大多数应用场景可能是Meta没有特别兴趣的领域。 A lot of the applications of the kind of stuff we've been working on are in industry, like manufacturing and so on. 我们一直在做的这类工作，很多应用在工业领域，比如制造业之类的。 Obviously, you're pursuing world models in that broader world. 你显然在追求世界模型，以及那个更广泛的方向。 And I think there are other people who have come at the world model space from a more generative approach.
我觉得还有一些人从更偏生成式的角度切入世界模型这个问题。 You've got the Google folks with Genie and the video models. 比如Google有Genie和视频模型。 You've got folks building VLAs on the robotics side. 还有人在机器人侧做VLA。 You've got Fei-Fei and the 3D spatial models. 还有李飞飞（Fei-Fei Li）团队的3D空间模型这类东西。 As you think about the body of evidence that got you excited about JEPA models, and how you compare them to what the generative folks have done, where do you think we are today in comparing these architectures and approaches? 当你想到让你对JEPA模型感到兴奋的那些证据，以及你怎么把它们和生成式路线做的东西相比，你觉得今天这些架构和方法的比较到底在哪个位置？ Okay, so "world model" is quickly becoming a buzzword. 好，世界模型正在迅速变成一个流行词。 Yeah. 对。 Right now, certainly in research, but also in industry to some extent. 目前至少在研究界是这样，在产业界也有一定程度上。 And there are two factions, if you want. 有两个阵营。 I'm not going to talk about VLA, because VLA is now clearly seen as not going anywhere; it's really not working. 我不说VLA了，因为VLA现在已经被普遍认为走不通了，真的没什么进展。 VLA means vision-language-action models, right? VLA就是视觉-语言-动作模型。 Basically, you use LLM technology to train a system to produce actions, for controlling a robot or something like that, right? 基本上是用LLM技术训练系统来产生动作，比如控制机器人之类的。 So you have vision in, language in, action out, maybe language out too. 所以是视觉输入、语言输入，动作输出，可能还有语言输出。 And that's pretty much now seen as a failure. 现在基本上被认为是失败了。