Sequoia Capital

How Cursor Trained Composer with Fireworks: Distributed Infrastructure for High-Performance RL

You need all the infrastructure to run these environments, which have to mimic as closely as possible what a user's computer would look like. And "as closely as possible" is very important, because sometimes the model can actually figure out when it's being run in a fake environment rather than a real one, and it behaves differently during RL than in production.

Are you saying it's conscious that it's in a fake environment and it starts behaving differently?

Yes. Yes.

Interesting.

It's like, "Oh, I'm in a fake environment. I've learned a few tricks to get a better reward in this environment, so let me try them out." Models love to cheat. RL is really good at encouraging cheating.

I'm delighted to welcome Federico from Cursor and Dima from Fireworks to the podcast today. Federico, you are the research lead on Composer 2, Cursor's new agentic coding model. And Dima, you spent who knows how many of the last few months moonlighting at Cursor to support all of the infrastructure required to make this gargantuan training task happen. So I'm excited to talk to both of you today about how the training of Composer 2 came together, what hard problems you solved together, and what you think it means for the future of AI and foundation model companies.

Exciting.
Yeah, exciting. Thank you for having us.

Thanks for joining. Okay, let's dive right in. For those who haven't been following as closely, Cursor recently announced Composer 2, an agentic coding model meant for long-horizon coding tasks. Federico, up until now Cursor was mostly enabling other people's coding agents. What was the impetus for Cursor to lean so heavily into Composer 2, and how existential is it for you to become not just an application company but also a foundation model company yourselves?

The reason we started looking into training our own models is that you can think of the model as a sort of storage drive. It has a certain number of bits that it can store in its weights. And the idea is very simple: we care about only one task. We don't even care about coding or programming necessarily. We care about software engineering inside Cursor, and inside Cursor only. So what if we were to allocate all of the bits of information that can be stored in the model weights to that one particular task? Also, as people may have noticed, Composer is an order of magnitude less expensive than Opus and other coding models, because we can simply specialize all of the model weights to that particular task.
And so we can serve a smaller model, or something of that sort.

So it's about making sure every single bit of weight or information we have is dedicated to the specific problem at hand.

Exactly.

Got it. That seems like an almost generalizable problem. Dima, I'm curious about your perspective. Do you think every application company should be looking at Cursor as a harbinger of what's to come? Should they all be looking to do the same thing?

Yeah, absolutely. We actually see it generally as a pattern in the evolution of applications. You start by prototyping, maybe using an off-the-shelf model to get something running, doing some prompt engineering, and figuring out how your harness works. But the most leveraged attribute of your application is the actual usage of user data, or the specific aspects of how the application works: some aspects of your harness, which tools you provide, the bits that are really important for your application. You can capture a little of that through prompting, but the right way to do it is to craft your model to act in your environment.

Yeah, absolutely. There are certain tools the agent calls whose exact behavior is very hard to describe succinctly to the model.
And with just post-training, we can bake in the optimal way to use those tools. We do serve a prompt to Composer, but I think that, the way we are training it, it would work even without a prompt. It would know what to do just because, throughout our training, we are intrinsically pushing the model in the right direction for how it should act.

Basically, there's an upper bound on how far you can get with prompt engineering. If you want to craft really great AI products, you have to go through fine-tuning and influence model behavior. That's one reason. Reason number two is what Federico mentioned, the cost trade-off, or cost-performance trade-off. The way we view it at Fireworks is that when you're trying to optimize, you have a three-dimensional trade-off between quality, speed, and cost. You can go quite far just by optimizing infrastructure, and we do that with all of our customers initially. But when you get into model training, you can push this trade-off much further and get a better model that runs much faster at a fraction of the cost. And Composer is a great example of...

Can I push on this a little bit?
I want to ask you if this approach is bitter-lesson-pilled. We were actually all talking about TabNine on the walk in. I remember that before the LLM era, there were these small, specialized coding models. And one of the things that I think surprised a lot of people was that as you scaled up, just training on the internet and a bunch of English text and other languages, the models themselves got inherently better at coding as well. So the trend line I've seen so far, at least, is that bigger models perform better on everything, including coding. Does what you're saying go against the grain of the bitter lesson?

I think no, but one thing to point out is that the big models trained by the labs train on a lot of code as well. Code is one of the main tasks the labs are interested in pushing, so they don't just generalize to it. They're a bit specialized as well. In our case, if we believe in the bitter lesson, we are just pushing very hard on the data dimension, and we know that models inherently have finite capacity. So if we want to saturate all that capacity, we need to scale data. And in order to ingest more data, we need to free up the weights from distractions the model may have.

Mhm, okay.
Got it. Super interesting. Okay, let's dig into the training of Composer 2. You launched a couple of weeks ago and immediately grabbed attention: strong benchmark numbers, and much lower cost to run inference on. What's the short version of how Composer 2 works, and what did you do to make it so performant?

We started from a very strong base, which is Kimi 2.5. It's a 1-trillion-parameter MoE with 30B active parameters, so it's actually very, very sparse. We looked at the stack and realized there are two axes. Composer 1 was mainly pushing on just one of these axes, reinforcement learning, but Composer 2 pushes on two: one is continual pre-training, and the other is reinforcement learning. What made Composer 2 very good is pushing in both of these directions. We started the training run by doing lots of mid-training on code tokens, almost at pre-training scale, actually. Then, coming out of that mid-training run, we took the checkpoints and did very large-scale RL on lots and lots of tasks.

Okay, and the premise here would be that because Cursor sits in the middle of so many interesting coding tokens, you pretty uniquely have access to the data to train at almost pre-training scale.

Yeah.
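
To put the "very, very sparse" remark in numbers, here is a rough back-of-the-envelope sketch. It takes the round figures quoted in the conversation (1 trillion total parameters, 30 billion active) at face value; the function name and constants are illustrative only and do not come from any Cursor or Fireworks code.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Return the share of a mixture-of-experts model's parameters used per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    if not 0 <= active_params <= total_params:
        raise ValueError("active_params must lie between 0 and total_params")
    return active_params / total_params


# Round figures as quoted in the conversation (assumed, not verified here).
TOTAL_PARAMS = 1e12    # "1 trillion parameter MoE"
ACTIVE_PARAMS = 30e9   # "30B active"

frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {frac:.1%} of all parameters")
```

Under these assumed figures, only about 3% of the weights participate in any single forward pass, which is why a model with a very large total parameter count can still be comparatively cheap to serve.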
Why not pre-train your own model, then?

We just think about our approach top-down instead of bottom-up: how do we get a model that's useful to users in the least time possible? If we were to start from the bottom, we'd figure out how to do pre-training, then scale it up to mid-training, and then, okay, now that we've figured out mid-training, do reinforcement learning. That would take a very long time to get a model out to our users. By doing it the other way around, we were able to give our users a useful model in very little time. So hopefully the next Composer versions are going to be our own model instead of being based on an open-source base.

And what is the model roughly learning in the mid-training step? And what is it learning in the post-training step?

Yeah, so in mid-training, it's just learning about code libraries and about specific code patterns that are very common, and world knowledge as well. There is web data in there too. This creates a wider distribution that reinforcement learning can then sharpen. During reinforcement learning, the model gets to play directly with the Cursor harness. So it gets to learn about the world it's going to live in for the rest of its life, right?
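
The idea of a "wider distribution that reinforcement learning can then sharpen" can be illustrated with a toy sketch. This is not Cursor's training setup: it assumes a hypothetical policy over four candidate behaviors, a made-up reward that favors only the first one, and an exact expected policy-gradient update on softmax logits in place of sampled rollouts.

```python
import math


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower values mean a sharper distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def policy_gradient_step(logits, rewards, lr):
    """One gradient-ascent step on expected reward for a softmax policy.

    For softmax logits, d E[r] / d z_j = p_j * (r_j - E[r]).
    """
    probs = softmax(logits)
    expected_reward = sum(p * r for p, r in zip(probs, rewards))
    return [z + lr * p * (r - expected_reward)
            for z, p, r in zip(logits, probs, rewards)]


# Hypothetical setup: a broad (uniform) starting policy stands in for the
# distribution left by mid-training; only behavior 0 earns reward.
rewards = [1.0, 0.0, 0.0, 0.0]
logits = [0.0, 0.0, 0.0, 0.0]
initial = softmax(logits)

for _ in range(200):
    logits = policy_gradient_step(logits, rewards, lr=1.0)

final = softmax(logits)
print("entropy before:", round(entropy(initial), 3))
print("entropy after: ", round(entropy(final), 3))
print("probability of rewarded behavior:", round(final[0], 3))
```

In this toy, the update can only concentrate probability on behaviors the starting policy already covers, which is one way to read the point that mid-training supplies breadth and reinforcement learning then narrows it.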
In some way. So reinforcement learning is where it learns how to call tools properly, how to navigate its environment, and how to write correct code. Because during mid-training, it learns how to write code.