팟캐스트로 돌아가기 Latent Space

⚡️ Google의 오픈 AI 전략 — Omar Sanseviero, Google DeepMind

We got so much Gemma 4, Gemma 3 1, Gemma scope med Gemma. Gemma 4, Gemma 3, Gemma Scope까지 정말 다양한 Gemma가 나왔네요. Give us the TLDR. TLDR로 정리해 주세요. Yeah, so yeah, Gemma 4 is just out. 네, Gemma 4가 방금 출시됐어요. This is the most capable open model we've released so far. 지금까지 저희가 공개한 오픈 모델 중 가장 뛰어난 모델입니다. We really tried to compact as much intelligence per parameter as we could. 파라미터당 최대한 많은 지능을 압축하려고 정말 공을 들였어요. Bring all of these multimodal capabilities. 멀티모달 기능도 모두 담았고요. So yeah, that's Gemma 4. 네, 그게 Gemma 4입니다. So one interesting thing, you have this thing with effective parameters, not active parameters. 흥미로운 점이 하나 있는데요, 활성 파라미터가 아닌 유효 파라미터라는 개념이 있더라고요. Can you explain what it is? 그게 어떤 건지 설명해 주실 수 있나요? Yeah, so pretty much in the traditional transformer architecture you have like this big embedding layer, right? 네, 기존 트랜스포머 구조에서는 이렇게 큰 임베딩 레이어가 있잖아요. And this new architecture is is more of a small change in the transformer architecture, in the transformer block. 이 새로운 구조는 트랜스포머 블록, 즉 트랜스포머 구조에 작은 변화를 준 거예요. Pretty much we add a per layer embedding. 기본적으로 레이어별 임베딩을 추가한 거예요. So at every layer we add an embedding table. 모든 레이어마다 임베딩 테이블을 추가하는 방식이에요. What is exciting is that you don't need to do like the full matrix multiplication. 흥미로운 점은 전체 행렬 곱셈을 할 필요가 없다는 거예요. This is pretty much a lookup table. 그냥 룩업 테이블이에요. So the Gemma 4 model is a E2B. Gemma 4 모델은 E2B입니다. That means that it effectively has 2 billion parameters loaded into the GPU. 즉 유효 파라미터 20억 개만 GPU에 로드된다는 뜻이에요. It actually has almost 5 billion parameters, but those 3 billion parameters can be in the CPU, they can be in the disk, which means that you can do inference extremely quickly. 실제로는 거의 50억 개의 파라미터를 갖고 있지만, 나머지 30억 개는 CPU나 디스크에 올려둘 수 있어요. 덕분에 추론을 매우 빠르게 할 수 있죠. This is just a lookup table. 그냥 룩업 테이블이니까요. And what's the con? 단점은 뭔가요? Why don't we 왜 항상 Why don't we always do this? 왜 항상 이렇게 하지 않는 건가요? Can it scale? 확장이 가능한가요? Is it open research? 공개 연구인가요? Like you know, it seems very 뭔가 굉장히 Okay, if I can just offload half the parameters to CPUs. 파라미터의 절반을 CPU로 오프로드할 수 있다면 말이죠. Yeah, so pretty much here we did lots of quality experimentation and this is really optimized and designed for like on device. 네, 저희는 품질 실험을 많이 했고요, 이건 온디바이스, 그러니까 폰에서 실행하는 용도로 최적화하고 설계한 거예요. And when I say on device I mean like running in a phone, Android, Raspberry Pi, and so on, right? 온디바이스라고 하면 휴대폰, Android, Raspberry Pi 같은 기기에서 실행하는 걸 말해요. When you go larger you usually want to compact more 모델이 커지면 더 많이 압축하고 싶어지잖아요. You want to have more like dense architectures or MOEs. 밀집형 구조나 MoE를 더 선호하게 되죠. So this this research 그래서 이 연구가 This research decisions were very helpful for these small small use cases. 이 연구 방향은 소규모 사용 사례에 매우 유용했어요. Yeah, something I learned from the run that you organized this morning. 네, 오늘 아침 당신이 주최한 런닝 모임에서 배운 건데요. For for our listeners, I think it's the first ever like official run club at AIE 6:30 a.m. 청취자 여러분께 말씀드리자면, AIE에서 최초로 열린 공식 런 클럽인 것 같아요. 새벽 6시 30분에요. Very rough, but at least I woke up for it. 꽤 힘들었지만 그래도 일어났어요. I met Cormac and he was telling me that I apparently in China the super apps are shipping models in the app bundle. Cormac을 만났는데, 그가 말하길 중국에서는 슈퍼앱들이 앱 번들에 모델을 담아 출시한다고 하더라고요. For inference and just like use among all their super app. 슈퍼앱 전체에서 추론에 활용하려고요. Assistants. 어시스턴트들이요. Yeah. 네. And I don't know is is is that like a target use case for you guys? 그게 여러분이 목표로 하는 사용 사례인가요? Yeah, so actually if you install like if you buy a pixel phone or a high end Samsung, they come from with a Gemini Nano and Gemini Nano is baked into the operating system and Gemini Nano is really built on top of Gemma. 네, 픽셀폰이나 고급 삼성 폰을 구입하면 Gemini Nano가 탑재되어 있는데, Gemini Nano는 운영 체제에 내장되어 있고 Gemma를 기반으로 만들어졌어요. So last year we released Gemma 3N which was this architecture really designed for phone use cases and they use a Gemma 3N with some additional training, some additional adaptations to make the model good for like traditional on device use cases, right? 작년에 Gemma 3N을 출시했는데, 이 구조는 폰 사용 사례를 위해 특별히 설계됐어요. 그리고 온디바이스 전통적 사용 사례에 맞게 추가 학습과 적응 과정을 거쳐 Gemini Nano에 활용했죠. So pretty much when you buy like these high end phones, you can already use a Gemini out of the box. 그래서 이런 고급 폰을 구입하면 처음부터 Gemini를 사용할 수 있어요. Yeah, we actually covered the 3N paper in our paper club and this like idea of like sort of parameter offloading or like download on demand is like very cool. 저희도 페이퍼 클럽에서 3N 논문을 다뤘는데, 파라미터 오프로딩이나 수요에 따른 다운로드 개념이 정말 흥미롭더라고요. Is it exactly the same in the Gemma 4 stuff? Gemma 4에도 똑같이 적용되나요? Yep. 맞아요. Okay. 알겠어요. For the smaller models. 소형 모델에는요. Yeah. 네. Yeah. 네. Yeah. 네. And does it does it scale? 그리고 확장이 되나요? Is there a potential 가능성이 있나요? So for reference, Gemma 4 is a 29B and a 31B ones and only one's dense, but have you scaled it? 참고로 Gemma 4는 29B와 31B가 있는데 하나만 밀집형이잖아요. 더 크게 확장해 보셨나요? Have you pushed it up? 밀어붙여 보셨나요? Is it 혹시 We are doing lots of experiments. 실험을 많이 하고 있어요. Experiments. 실험들이요. Yeah, yeah. 네, 네. Stay tuned. 기대해 주세요. Yeah. 네. What goes into shipping a mean line model like this? 이런 주요 라인업 모델을 출시하는 데 뭐가 들어가나요? Like 그러니까 Yeah. 네. What what's the behind the scenes? 내막이 어떻게 되나요? It's complex. 복잡해요. The Gemma team is actually relatively small. Gemma 팀은 사실 규모가 작아요. We have like two or three PMs, we have one marketing person and then there is our like engineers and researchers working on shipping this. PM이 두세 명, 마케팅 담당자가 한 명, 그리고 엔지니어와 연구자들이 이걸 출시하기 위해 일하고 있어요. Of course there's like the full training part, we how do we do the post training, distillation, post training techniques and so on. 물론 전체 학습 부분도 있고, 포스트 트레이닝, 디스틸레이션, 포스트 트레이닝 기법 등도 있죠. What is quite exciting is that once we have the model, then we collaborate with a bunch of open source partners, right? 흥미로운 점은 모델이 완성되면 오픈소스 파트너들과 협력한다는 거예요. So for example, we work with a Lama CPP, Olama, MLX, Hugging Face, vLLM, Nvidia, AMD. 예를 들어 llama.cpp, Ollama, MLX, Hugging Face, vLLM, NVIDIA, AMD와 함께 작업해요. So we have almost 50 external partners for every well for the Gemma for lunch, which has been the most complex launch. Gemma 4 런치 기준으로 외부 파트너가 거의 50개인데, 지금까지 가장 복잡한 런치였어요. And also internally, we collaborate with a bunch of different teams. 내부적으로도 다양한 팀과 협력해요. So, think of Google Cloud, Vertex, Vertex models models as a service, ADK, uh and then Android as well, right? Google Cloud, Vertex, Vertex 모델 서비스, ADK, 그리고 Android 팀도 있어요. So, we work, for example, with Android team and uh with the launch of Gemma 4, we released an integration with Android Studio. 예를 들어 Android 팀과 협력해서 Gemma 4 런치와 함께 Android Studio 연동 기능을 출시했어요. So, in Android Studio, there is this agent mode where you can have a a model helping you write code and do things within Android Studio. Android Studio에는 에이전트 모드가 있어서 모델이 Android Studio 안에서 코드 작성을 도와주는 기능이 있어요. And they ship this integration with offline models using llama.cpp or vLLM or any open AI compatible endpoint. 그리고 llama.cpp나 vLLM, OpenAI 호환 엔드포인트를 사용하는 오프라인 모델로 이 연동 기능을 출시했어요. So, now you can use Gemma 4 to also write code Android applications in Android Studio. 이제 Gemma 4를 Android Studio에서 Android 앱 코드 작성에도 활용할 수 있어요. What's the difference? 차이가 뭔가요? When would someone want to do that versus just using Gemini? 그냥 Gemini를 쓰지 않고 이걸 언제 쓰나요? Outside of course Outside of the obvious, you're offline or you want the privacy. 물론 오프라인이거나 프라이버시를 원하는 경우는 빼고요. planes a lot or something. 비행기를 많이 타거나 하는 경우요. I did. 저도 그랬어요. Okay, I will say, on my long 10-hour flight to London, I did use Gemini as 런던 가는 10시간 비행에서 Gemini를 썼는데요. Yeah, I I was on Gemma 4 though. 아, Gemma 4를 썼어요. Sorry, Gemma Gemma. 죄송해요, Gemma를요. Yeah, yeah, it's mostly offline use cases. 네, 대부분 오프라인 사용 사례예요. Right or if you 아니면 Yeah. 네. Offline or privacy, like if you want to have all of your development set up locally and you don't want to send any code to to any API, you would use that. 오프라인이거나 프라이버시 때문에, 개발 환경을 로컬로 구성하고 싶고 어떤 API에도 코드를 보내고 싶지 않을 때 쓰겠죠. Do you see a future where, you know, small models get good enough? 소형 모델이 충분히 좋아지는 미래가 올 것 같으세요? Like, does it cannibalize? 서로 잠식하는 게 아닐까요? It's an interesting position. 흥미로운 포지션이에요. Like, you have big Gemini, you have Gemma, both get exponentially better over time. 대형 Gemini도 있고, Gemma도 있는데 둘 다 시간이 지날수록 기하급수적으로 좋아지잖아요. Like, current Gemma is much better than what we had closed source a few years ago, right? 현재 Gemma는 몇 년 전 클로즈드 소스보다 훨씬 뛰어나잖아요. Yeah, for me, it's quite exciting. 네, 저는 꽤 흥미롭다고 봐요. I mean, if you look at Gemma, you compare to how we were 1 year ago, I would say Gemma uh 4 is matching state-of-the-art from 1 1 and 1/2 years ago for most things. Gemma를 1년 전과 비교해보면, Gemma 4는 대부분의 영역에서 1년 반 전의 최신 기술과 비슷한 수준이에요. With local models or models that you can run in your own hardware, you can get capabilities, so you can get agentic agentic capabilities, function calling, system instructions, like conversational and that kind of stuff. 로컬 모델이나 자체 하드웨어에서 돌릴 수 있는 모델로 에이전트 기능, 함수 호출, 시스템 지시사항, 대화 같은 기능을 활용할 수 있어요. Knowledge is much trickier, so for knowledge, you do need a larger model, right? 지식은 훨씬 까다로워서 지식을 위해서는 더 큰 모델이 필요해요. That's why if you compare Gemini to Gemma, Gemini 그래서 Gemini와 Gemma를 비교해보면, Gemini가