Terug naar podcastsLatent Space
⚡️ Google's Open AI Strategy — Omar Sanseviero, Google DeepMind
We got so much Gemma 4, Gemma 3 1, Gemma scope med Gemma.
Gemma 4、Gemma 3N、Gemma Scope,我们聊了好多 Gemma。
Give us the TLDR.
给我们说个 TLDR 吧。
Yeah, so yeah, Gemma 4 is just out.
对,Gemma 4 刚发布。
This is the most capable open model we've released so far.
这是我们迄今发布过最强的开源模型。
We really tried to compact as much intelligence per parameter as we could.
我们竭力把每个参数的智能密度压到最高。
Bring all of these multimodal capabilities.
集成了全套多模态能力。
So yeah, that's Gemma 4.
这就是 Gemma 4。
So one interesting thing, you have this thing with effective parameters, not active parameters.
有个有意思的点,你们提的是有效参数,不是激活参数。
Can you explain what it is?
能解释一下是什么意思吗?
Yeah, so pretty much in the traditional transformer architecture you have like this big embedding layer, right?
传统 Transformer 架构有一个很大的嵌入层,对吧?
And this new architecture is is more of a small change in the transformer architecture, in the transformer block.
新架构对 Transformer block 做了一个小改动。
Pretty much we add a per layer embedding.
简单说就是加了逐层嵌入。
So at every layer we add an embedding table.
每一层都加一个嵌入表。
What is exciting is that you don't need to do like the full matrix multiplication.
妙处在于不需要做完整的矩阵乘法。
This is pretty much a lookup table.
本质上就是一个查找表。
So the Gemma 4 model is a E2B.
Gemma 4 这个模型是 E2B 的。
That means that it effectively has 2 billion parameters loaded into the GPU.
也就是说实际加载进 GPU 的有效参数只有 20 亿。
It actually has almost 5 billion parameters, but those 3 billion parameters can be in the CPU, they can be in the disk, which means that you can do inference extremely quickly.
它实际上接近 50 亿参数,但那 30 亿可以放在 CPU 里,甚至在磁盘上,这样推理速度就能极快。
This is just a lookup table.
这就是查找表。
And what's the con?
那缺点是什么?
Why don't we
为什么不
Why don't we always do this?
为什么我们不一直这么做?
Can it scale?
它能规模化吗?
Is it open research?
这是公开研究吗?
Like you know, it seems very
感觉这个思路很
Okay, if I can just offload half the parameters to CPUs.
好像只要把一半参数卸载到 CPU 就行了。
Yeah, so pretty much here we did lots of quality experimentation and this is really optimized and designed for like on device.
我们做了大量质量实验,专门针对端侧场景做了优化和设计。
And when I say on device I mean like running in a phone, Android, Raspberry Pi, and so on, right?
我说端侧是指在手机、Android、Raspberry Pi 之类的设备上运行。
When you go larger you usually want to compact more
模型越大,通常反而要压缩得更紧。
You want to have more like dense architectures or MOEs.
更大的模型要用密集架构或者 MoE。
So this this research
所以这项研究
This research decisions were very helpful for these small small use cases.
这些研究决策对小型端侧场景非常有帮助。
Yeah, something I learned from the run that you organized this morning.
对,这是我今早参加你组织的跑步活动学到的东西。
For for our listeners, I think it's the first ever like official run club at AIE 6:30 a.m.
对我们的听众说一句,这应该是 AIE 史上第一个官方跑步俱乐部,早上 6:30 开始。
Very rough, but at least I woke up for it.
挺累的,但至少我爬起来了。
I met Cormac and he was telling me that I apparently in China the super apps are shipping models in the app bundle.
我碰到了 Cormac,他告诉我,中国的超级 App 都在把模型打包进 App。
For inference and just like use among all their super app.
用于推理,所有超级 App 都在用。
Assistants.
助手。
Yeah.
对。
And I don't know is is is that like a target use case for you guys?
这是你们的目标使用场景吗?
Yeah, so actually if you install like if you buy a pixel phone or a high end Samsung, they come from with a Gemini Nano and Gemini Nano is baked into the operating system and Gemini Nano is really built on top of Gemma.
你买 Pixel 手机或者高端三星,出厂就带 Gemini Nano,Gemini Nano 烧进操作系统,底层就是建在 Gemma 上的。
So last year we released Gemma 3N which was this architecture really designed for phone use cases and they use a Gemma 3N with some additional training, some additional adaptations to make the model good for like traditional on device use cases, right?
去年我们发布了 Gemma 3N,专门为手机场景设计,他们用 Gemma 3N 加上额外训练和适配,针对传统端侧场景做了优化。
So pretty much when you buy like these high end phones, you can already use a Gemini out of the box.
买了这些高端手机,开箱就能用 Gemini。
Yeah, we actually covered the 3N paper in our paper club and this like idea of like sort of parameter offloading or like download on demand is like very cool.
我们论文俱乐部讲过 3N 那篇论文,参数卸载、按需下载,这个思路特别酷。
Is it exactly the same in the Gemma 4 stuff?
Gemma 4 里也是同样的机制吗?
Yep.
对。
Okay.
好的。
For the smaller models.
针对较小的模型。
Yeah.
对。
Yeah.
对。
Yeah.
对。
And does it does it scale?
那它能规模化吗?
Is there a potential
有没有可能
So for reference, Gemma 4 is a 29B and a 31B ones and only one's dense, but have you scaled it?
Gemma 4 有 29B 和 31B 两个版本,只有一个是密集的,你们有没有推到更大规模?
Have you pushed it up?
有没有往上推?
Is it
它
We are doing lots of experiments.
我们在做大量实验。
Experiments.
实验。
Yeah, yeah.
对,对。
Stay tuned.
敬请期待。
Yeah.
对。
What goes into shipping a mean line model like this?
发布这样一条产品线的模型需要做哪些工作?
Like
比如
Yeah.
对。
What what's the behind the scenes?
幕后是什么样的?
It's complex.
挺复杂的。
The Gemma team is actually relatively small.
Gemma 团队其实规模不大。
We have like two or three PMs, we have one marketing person and then there is our like engineers and researchers working on shipping this.
两三个 PM,一个市场人员,然后是工程师和研究员负责交付。
Of course there's like the full training part, we how do we do the post training, distillation, post training techniques and so on.
当然有完整的训练流程,后训练怎么做,蒸馏、后训练技术等等。
What is quite exciting is that once we have the model, then we collaborate with a bunch of open source partners, right?
特别棒的是,模型出来后我们会和一批开源合作伙伴协作。
So for example, we work with a Lama CPP, Olama, MLX, Hugging Face, vLLM, Nvidia, AMD.
比如 llama.cpp、Ollama、MLX、Hugging Face、vLLM、Nvidia、AMD。
So we have almost 50 external partners for every well for the Gemma for lunch, which has been the most complex launch.
Gemma 4 发布有近 50 家外部合作伙伴,是最复杂的一次发布。
And also internally, we collaborate with a bunch of different teams.
内部也要和很多团队协同。
So, think of Google Cloud, Vertex, Vertex models models as a service, ADK, uh and then Android as well, right?
Google Cloud、Vertex、Vertex 模型即服务、ADK,还有 Android。
So, we work, for example, with Android team and uh with the launch of Gemma 4, we released an integration with Android Studio.
比如和 Android 团队合作,Gemma 4 发布时我们发布了与 Android Studio 的集成。
So, in Android Studio, there is this agent mode where you can have a a model helping you write code and do things within Android Studio.
Android Studio 里有个 Agent 模式,可以有模型帮你写代码、在 Android Studio 里做各种操作。
And they ship this integration with offline models using llama.cpp or vLLM or any open AI compatible endpoint.
他们用 llama.cpp、vLLM 或任何 OpenAI 兼容端点做离线模型集成。
So, now you can use Gemma 4 to also write code Android applications in Android Studio.
现在你可以用 Gemma 4 在 Android Studio 里编写 Android 应用。
What's the difference?
有什么区别?
When would someone want to do that versus just using Gemini?
什么场景下用这个而不是直接用 Gemini?
Outside of course Outside of the obvious, you're offline or you want the privacy.
除了离线或者需要隐私这类明显原因。
planes a lot or something.
经常坐飞机之类的。
I did.
我就是。
Okay, I will say, on my long 10-hour flight to London, I did use Gemini as
我在飞伦敦的 10 小时航班上用了 Gemini 作为
Yeah, I I was on Gemma 4 though.
对,我用的是 Gemma 4。
Sorry, Gemma Gemma.
不对,Gemma,Gemma。
Yeah, yeah, it's mostly offline use cases.
对,主要就是离线场景。
Right or if you
或者如果你
Yeah.
对。
Offline or privacy, like if you want to have all of your development set up locally and you don't want to send any code to to any API, you would use that.
离线或隐私需求,如果你想把开发环境全放本地,不想把代码发给任何 API,就用这个。
Do you see a future where, you know, small models get good enough?
你觉得将来小模型够用了之后会怎样?
Like, does it cannibalize?
会不会互相蚕食?
It's an interesting position.
这个位置挺有意思的。
Like, you have big Gemini, you have Gemma, both get exponentially better over time.
大 Gemini、Gemma,两个都在指数级变强。
Like, current Gemma is much better than what we had closed source a few years ago, right?
现在的 Gemma 已经远超几年前的闭源水平了,对吧?
Yeah, for me, it's quite exciting.
对我来说挺令人兴奋的。
I mean, if you look at Gemma, you compare to how we were 1 year ago, I would say Gemma uh 4 is matching state-of-the-art from 1 1 and 1/2 years ago for most things.
和一年前对比,Gemma 4 在大多数任务上能匹配一年半前的最优水平。
With local models or models that you can run in your own hardware, you can get capabilities, so you can get agentic agentic capabilities, function calling, system instructions, like conversational and that kind of stuff.
用本地模型或者自己硬件跑的模型,能获得 Agent 能力、函数调用、系统指令、对话能力之类的。
Knowledge is much trickier, so for knowledge, you do need a larger model, right?
知识这块更难,需要知识就得用更大的模型。
That's why if you compare Gemini to Gemma, Gemini
所以 Gemini 和 Gemma 相比,Gemini