Zurück zu Podcasts Latent Space

⚡️ Googles offene KI-Strategie — Omar Sanseviero, Google DeepMind

We got so much Gemma 4, Gemma 3 1, Gemma scope med Gemma. Wir haben so viel Gemma 4, Gemma 3.1, Gemma Scope, Med Gemma. Give us the TLDR. Gib uns den TLDR. Yeah, so yeah, Gemma 4 is just out. Ja, also Gemma 4 ist gerade rausgekommen. This is the most capable open model we've released so far. Das ist das leistungsstärkste offene Modell, das wir bisher veröffentlicht haben. We really tried to compact as much intelligence per parameter as we could. Wir haben wirklich versucht, so viel Intelligenz pro Parameter wie möglich reinzupacken. Bring all of these multimodal capabilities. Und all diese multimodalen Fähigkeiten einzubringen. So yeah, that's Gemma 4. Also ja, das ist Gemma 4. So one interesting thing, you have this thing with effective parameters, not active parameters. Eine interessante Sache: ihr habt diese Geschichte mit effektiven Parametern, nicht aktiven Parametern. Can you explain what it is? Kannst du erklären, was das ist? Yeah, so pretty much in the traditional transformer architecture you have like this big embedding layer, right? Ja, also in der traditionellen Transformer-Architektur hat man so eine große Embedding-Schicht, oder? And this new architecture is is more of a small change in the transformer architecture, in the transformer block. Und diese neue Architektur ist eher eine kleine Änderung in der Transformer-Architektur, im Transformer-Block. Pretty much we add a per layer embedding. Im Wesentlichen fügen wir ein Embedding pro Schicht hinzu. So at every layer we add an embedding table. Also fügen wir auf jeder Schicht eine Embedding-Tabelle ein. What is exciting is that you don't need to do like the full matrix multiplication. Das Spannende ist, dass man keine vollständige Matrixmultiplikation machen muss. This is pretty much a lookup table. Das ist im Wesentlichen eine Lookup-Tabelle. So the Gemma 4 model is a E2B. Das Gemma 4 Modell ist also ein E2B. That means that it effectively has 2 billion parameters loaded into the GPU. Das bedeutet, dass es effektiv 2 Milliarden Parameter in die GPU lädt. It actually has almost 5 billion parameters, but those 3 billion parameters can be in the CPU, they can be in the disk, which means that you can do inference extremely quickly. Es hat eigentlich fast 5 Milliarden Parameter, aber diese 3 Milliarden Parameter können in der CPU oder auf der Festplatte liegen, was bedeutet, dass man extrem schnell Inferenz betreiben kann. This is just a lookup table. Das ist einfach eine Lookup-Tabelle. And what's the con? Und was ist der Nachteil? Why don't we Warum machen wir das nicht Why don't we always do this? Warum machen wir das nicht immer so? Can it scale? Ist das skalierbar? Is it open research? Ist das offene Forschung? Like you know, it seems very Es scheint sehr Okay, if I can just offload half the parameters to CPUs. Okay, wenn ich einfach die Hälfte der Parameter auf CPUs auslagern kann. Yeah, so pretty much here we did lots of quality experimentation and this is really optimized and designed for like on device. Ja, also hier haben wir viele Qualitätsexperimente gemacht und das ist wirklich für On-Device optimiert und ausgelegt. And when I say on device I mean like running in a phone, Android, Raspberry Pi, and so on, right? Und wenn ich On-Device sage, meine ich auf einem Telefon laufen, Android, Raspberry Pi und so weiter. When you go larger you usually want to compact more Wenn man größer wird, will man normalerweise mehr komprimieren You want to have more like dense architectures or MOEs. Man möchte eher dichte Architekturen oder MoEs haben. So this this research Also diese Forschung This research decisions were very helpful for these small small use cases. Diese Forschungsentscheidungen waren sehr hilfreich für diese kleinen Use Cases. Yeah, something I learned from the run that you organized this morning. Ja, etwas das ich von dem Lauf heute Morgen gelernt habe, den du organisiert hast. For for our listeners, I think it's the first ever like official run club at AIE 6:30 a.m. Für unsere Zuhörer: ich glaube das ist der erste offizielle Run Club bei AIE, um 6:30 Uhr. Very rough, but at least I woke up for it. Sehr hart, aber zumindest bin ich dafür aufgestanden. I met Cormac and he was telling me that I apparently in China the super apps are shipping models in the app bundle. Ich habe Cormac getroffen und er hat mir erzählt, dass in China die Super-Apps anscheinend Modelle in das App-Bundle packen. For inference and just like use among all their super app. Für Inferenz und einfach zur Nutzung in ihrer gesamten Super-App. Assistants. Assistenten. Yeah. Ja. And I don't know is is is that like a target use case for you guys? Und ich weiß nicht, ist das so ein Ziel-Use-Case für euch? Yeah, so actually if you install like if you buy a pixel phone or a high end Samsung, they come from with a Gemini Nano and Gemini Nano is baked into the operating system and Gemini Nano is really built on top of Gemma. Ja, also wenn man ein Pixel-Telefon oder ein High-End-Samsung kauft, kommen die mit einem Gemini Nano vorinstalliert, und Gemini Nano ist ins Betriebssystem eingebaut und ist wirklich auf Gemma aufgebaut. So last year we released Gemma 3N which was this architecture really designed for phone use cases and they use a Gemma 3N with some additional training, some additional adaptations to make the model good for like traditional on device use cases, right? Letztes Jahr haben wir Gemma 3N herausgebracht, das war eine Architektur wirklich für Telefon-Use-Cases entworfen, und sie verwenden Gemma 3N mit etwas zusätzlichem Training und Anpassungen, damit das Modell gut für traditionelle On-Device-Use-Cases ist. So pretty much when you buy like these high end phones, you can already use a Gemini out of the box. Also wenn man so ein High-End-Telefon kauft, kann man Gemini schon out of the box nutzen. Yeah, we actually covered the 3N paper in our paper club and this like idea of like sort of parameter offloading or like download on demand is like very cool. Ja, wir haben das 3N-Paper in unserem Paper Club besprochen, und die Idee von Parameter-Offloading oder Download-on-Demand ist wirklich cool. Is it exactly the same in the Gemma 4 stuff? Ist das bei Gemma 4 genauso? Yep. Genau. Okay. Okay. For the smaller models. Für die kleineren Modelle. Yeah. Ja. Yeah. Ja. Yeah. Ja. And does it does it scale? Und skaliert es? Is there a potential Gibt es ein Potenzial So for reference, Gemma 4 is a 29B and a 31B ones and only one's dense, but have you scaled it? Also zum Vergleich: Gemma 4 ist ein 29B und ein 31B, und nur eines ist dicht, aber habt ihr es skaliert? Have you pushed it up? Habt ihr es nach oben gedrückt? Is it Ist es We are doing lots of experiments. Wir machen viele Experimente. Experiments. Experimente. Yeah, yeah. Ja, ja. Stay tuned. Bleibt dran. Yeah. Ja. What goes into shipping a mean line model like this? Was steckt dahinter, so ein State-of-the-Art-Modell zu shippen? Like Also Yeah. Ja. What what's the behind the scenes? Was passiert hinter den Kulissen? It's complex. Das ist komplex. The Gemma team is actually relatively small. Das Gemma-Team ist eigentlich relativ klein. We have like two or three PMs, we have one marketing person and then there is our like engineers and researchers working on shipping this. Wir haben ungefähr zwei oder drei PMs, eine Marketingperson, und dann gibt es unsere Ingenieure und Forscher, die daran arbeiten, das zu shippen. Of course there's like the full training part, we how do we do the post training, distillation, post training techniques and so on. Natürlich gibt es den ganzen Trainingsteil, wie wir das Post-Training machen, Distillation, Post-Training-Techniken und so weiter. What is quite exciting is that once we have the model, then we collaborate with a bunch of open source partners, right? Was ziemlich spannend ist: sobald wir das Modell haben, arbeiten wir mit einer Reihe von Open-Source-Partnern zusammen. So for example, we work with a Lama CPP, Olama, MLX, Hugging Face, vLLM, Nvidia, AMD. Wir arbeiten zum Beispiel mit llama.cpp, Ollama, MLX, Hugging Face, vLLM, Nvidia, AMD. So we have almost 50 external partners for every well for the Gemma for lunch, which has been the most complex launch. Wir haben fast 50 externe Partner für den Gemma-4-Launch gehabt, was der komplexeste Launch bisher war. And also internally, we collaborate with a bunch of different teams. Und intern arbeiten wir auch mit einer Reihe verschiedener Teams zusammen. So, think of Google Cloud, Vertex, Vertex models models as a service, ADK, uh and then Android as well, right? Denkt an Google Cloud, Vertex, Vertex Models-as-a-Service, ADK, und dann auch Android. So, we work, for example, with Android team and uh with the launch of Gemma 4, we released an integration with Android Studio. Wir arbeiten zum Beispiel mit dem Android-Team zusammen und beim Launch von Gemma 4 haben wir eine Integration mit Android Studio veröffentlicht. So, in Android Studio, there is this agent mode where you can have a a model helping you write code and do things within Android Studio. In Android Studio gibt es diesen Agentenmodus, in dem ein Modell helfen kann, Code zu schreiben und Dinge in Android Studio zu erledigen. And they ship this integration with offline models using llama.cpp or vLLM or any open AI compatible endpoint. Und die haben diese Integration mit Offline-Modellen über llama.cpp, vLLM oder jeden OpenAI-kompatiblen Endpunkt geshippt. So, now you can use Gemma 4 to also write code Android applications in Android Studio. Jetzt kann man also Gemma 4 auch verwenden, um Android-Anwendungen in Android Studio zu schreiben. What's the difference? Was ist der Unterschied? When would someone want to do that versus just using Gemini? Wann würde jemand das machen wollen, anstatt einfach Gemini zu nutzen? Outside of course Outside of the obvious, you're offline or you want the privacy. Abgesehen natürlich vom Offensichtlichen: man ist offline oder möchte Datenschutz. planes a lot or something. Fliegt viel oder so. I did. Ich schon. Okay, I will say, on my long 10-hour flight to London, I did use Gemini as Okay, ich muss sagen, auf meinem langen 10-Stunden-Flug nach London habe ich Gemini als Yeah, I I was on Gemma 4 though. Ja, ich war auf Gemma 4. Sorry, Gemma Gemma. Entschuldigung, Gemma, Gemma. Yeah, yeah, it's mostly offline use cases. Ja, ja, es sind meistens Offline-Use-Cases. Right or if you Oder wenn man Yeah. Ja. Offline or privacy, like if you want to have all of your development set up locally and you don't want to send any code to to any API, you would use that. Offline oder Datenschutz, also wenn man seine gesamte Entwicklungsumgebung lokal haben will und keinen Code an irgendein API senden möchte, dann würde man das nutzen. Do you see a future where, you know, small models get good enough? Siehst du eine Zukunft, in der kleine Modelle gut genug werden? Like, does it cannibalize? Kannibalisiert das? It's an interesting position. Das ist eine interessante Position. Like, you have big Gemini, you have Gemma, both get exponentially better over time. Man hat das große Gemini, man hat Gemma, beide werden exponentiell besser. Like, current Gemma is much better than what we had closed source a few years ago, right? Das aktuelle Gemma ist viel besser als das, was wir vor ein paar Jahren als Closed-Source hatten. Yeah, for me, it's quite exciting. Ja, für mich ist das ziemlich aufregend. I mean, if you look at Gemma, you compare to how we were 1 year ago, I would say Gemma uh 4 is matching state-of-the-art from 1 1 and 1/2 years ago for most things. Ich meine, wenn man Gemma anschaut und vergleicht, wie wir vor einem Jahr standen, würde ich sagen, Gemma 4 entspricht dem State-of-the-Art von vor eineinhalb Jahren für die meisten Dinge. With local models or models that you can run in your own hardware, you can get capabilities, so you can get agentic agentic capabilities, function calling, system instructions, like conversational and that kind of stuff. Mit lokalen Modellen oder Modellen, die man auf eigener Hardware betreiben kann, bekommt man Fähigkeiten wie agentische Fähigkeiten, Function Calling, System-Instructions, Konversationsfähigkeiten und so etwas. Knowledge is much trickier, so for knowledge, you do need a larger model, right? Wissen ist viel kniffliger; dafür braucht man ein größeres Modell. That's why if you compare Gemini to Gemma, Gemini Deshalb, wenn man Gemini mit Gemma vergleicht, Gemini