AI Research Legend's Honest Assessment of Where We Are
Is reasoning enough to get to generalization, or is another method needed?
It does feel like there is something else that could possibly generalize much better.
Why do you think Anthropic was the first to be really successful on the coding side?
Anthropic made this very good decision to focus on coding.
OpenAI was like, we're doing ChatGPT.
The gap we'll see between closed-source and open-source models, and whether it widens or shrinks in the next few years.
I think it's a fair question, but...
Lukasz Kaiser is one of the authors of the Transformer paper and has had amazing roles at both Google and OpenAI.
On Unsupervised Learning, I got to ask him all the top-of-mind questions about what's happening in AI today.
Of course, we had to talk about the transformer: how he thinks about its persistence, whether it will remain the dominant architecture, and what its shortcomings are.
We also got his thoughts on what changed in the fall to make coding models so much better, and why Anthropic was really the first to succeed at coding.
We talked about the future research directions he's really excited about, and we also hit on a bunch of things around how he thinks the ecosystem will evolve, from open- versus closed-source models to application companies.
I think folks will really enjoy this episode with a top researcher whose research really set off a lot in the space.
Without further ado, here's Lukasz.
It's a pleasure to have a Transformer paper co-author on the podcast.
I feel like you've been at the forefront of so many major changes in the AI world.
And our goal is really to get your thoughts on all the questions around the AI frontier today.
So I really appreciate you coming on the podcast.
Thank you very much.
Thank you for having me.
I can think of no better place to start than generalization, right?
It feels like that's the question in the air right now.
I think in November I heard you pose this big question: is reasoning enough to get to generalization, or is another method needed?
You said that maybe six months ago, which is dog years in the AI world, so effectively years ago.
How has your thinking on that question evolved since then?
If we take the current transformers with reasoning, right?
And agents, with access to a shell and so on, they can do amazing things, right?
It's incredible how far we've gotten.
Even two years ago, not to mention before transformers, I would never have believed that you could take this next-word predictor, give it chain of thought, do RL on it, add tools, and it would work. I now spend hours every day talking to Codex, in my case (others use other models), and it works. You talk to it about hard problems at work, it makes sense, and it implements things. So that's incredible. On the other hand, there is this feeling that it is not quite like us, right?
That it's not quite at the edge of what it could be; we all feel that it possibly should be even better, right?
That we can generalize from less data, somehow make bigger leaps, and get these concepts from way less.
I have this saying lately. People say Americans will do the right thing after exhausting all other options, and LLMs are similar: they will learn a concept, they will learn it, but only after exhausting all other options. You need these trillions of tokens, you need to learn all the surface-level things, and only when those don't explain something will they finally learn the concept. That's not how we learn. We just get concepts; sometimes we make them up and they're not great. So it does feel like there is something else that could possibly generalize much better, that could have a somewhat different form of understanding, more long-term.
But it's a feeling, right?
And every time we try to put our thumb on it, it seems to evaporate. Or maybe it doesn't even evaporate; it's more that the transformer just catches up, right?
So over this time both sides have grown: transformers have gotten even better, but the case for something else has also gotten even better.
I would say there are now a number of labs that pursue post-transformer architectures, and people see interesting results.
There are certainly interesting things out there. So who wins? To be honest, I still don't know. I think there are good arguments for both sides, and it will be extremely interesting to see how this goes.
I think it'll be interesting for our listeners. More recently, at a talk at Nearcon, you alluded to this whiff in the air, right?
That something is happening with progress that has inspired these neolabs and other folks to spin out and work on things that may be alternatives to the dominant architectures being worked on within the labs.
What is that feeling?
Is it seeing some of these early results, or is it just researcher intuition? Maybe make it a little more concrete for our listeners.
I think a lot of this is intuition, and you need to be aware of that, because a lot of this happens at parties in San Francisco, where people talk to each other, or on podcasts. So it may be self-fueled to some extent. But I think there is a part of it that is very fundamental.
I mean, Yann LeCun has been saying something like this for years, way before now.
Our models have a long, long history. They're called neural networks because they were meant to imitate our brain, but they don't really.
They're quite different, even if they may have some similarities, right?
And if you look at how humans learn and what we can do, it is quite hard not to say that from much less data we can do much more than our current models.
So it feels like there is this fundamental ability that we as learning machines have and our models currently don't.
So fundamentally there should be something there, not just a vibe.
Now, you can say as a counterargument that these models always had a trillion tokens to train on, and people never do.
So we just didn't optimize them for training with less, and if you had the same amount of compute but limited data, you could tweak transformers to do much better than they do today.
And some people say, why would you, right? We have the data, and it's now a big enterprise. But it does feel, even when we try to push with as little data as people...
Well, it's also that we get a lot of data from visual input, from moving in the world, from taking actions, so it's a very different kind of data. It's not truly comparable, and that's why it's hard to make a very firm scientific statement about it.
But there is this feeling that we have not exploited all that is there in machine learning.
And obviously the exciting feeling is that if we find out what's out there, it could make what we have even more amazing.
Maybe not.
Maybe it vanishes when you have that much data.
Who knows, right?
But it's definitely extremely interesting to me as a researcher, and I think to many people. I mean, transformers were fascinating, right?
They're great, and reasoning... I mean, it can solve research math problems.
I'm sure you've heard about the recent Erdős problems news.
And of course, I was a mathematician earlier in my life, so this is extremely exciting.
I never thought that in this time frame a computer would talk to me about mathematics at a high level, like a real researcher. That exists now, and this is insane.
But then as an ML researcher, I'm like, okay, but we haven't really figured out this learning.
There is this feeling, right, that it certainly learns, but it needs so much data.
It needs so much compute.
It feels like we're not quite there yet.
Now, is this only a feeling?
Is this a vibe?
It seems to be reality to some extent, right?
But we'll need to see.
The research appeal of figuring that out makes a ton of sense.
And I think other folks might look at it and say, well, so what if it's not like people? We have the data, we have a method that works.
Obviously there are going to be some areas where there is limited data, like medicine or drug development, where learning from more limited data would be really helpful, but so many problems that exist in the world actually aren't that data-constrained, right?
Sometimes I feel like these sides almost talk past each other, right? People at the labs will roll their eyes at Yann LeCun or something like that.
I think this is fair to say. But on the other hand, given how quickly things move and all the investment in AI, the problems that are not data-limited get solved very rapidly.
So very soon all the bottlenecks that remain will be quite data-limited, or already are becoming so. And in particular, it does feel that to work well in the physical world you need to solve at least some part of it, because in the physical world, if you train on one robot's hardware, the data doesn't quite scale the way it does in the virtual, text, or internet worlds.
So the physical world is a sizable chunk of...
So people are certainly trying, right, with simulation data and egocentric video data, cheaper sources.
Yeah.
I mean, I'm a huge fan of Waymo, right?
I always have this joke: people say, "Where are my self-driving cars?"
Well, I ride in them.
They're here.
But then they just canceled the highway driving, right?
Because they couldn't deal with some construction zone again.
And it feels almost like... they have had these construction-zone issues for years, and I'm sure there are millions of miles in simulation and quite a lot of real driving, and it still can't generalize to a construction zone on a highway. That just feels off, right?
I mean, I don't know exactly what didn't work there, but I certainly know that no teenager has this problem.
Yeah, or no human, right?
We have many other problems, right?
But not that we can drive in a construction zone in the city but not on the highway. A construction zone is a construction zone.
And do you think some of this stuff could be solved within transformers? And I guess, what are you looking for?