Sequoia Capital Podcasts

How Cursor Composer Trained on Fireworks: Distributed Infrastructure for High-Performance RL

You need all the infrastructure to run these environments, which have to mimic as closely as possible what a user's computer would look like. And "as closely as possible" matters, because sometimes the model can actually figure out when it's being run in a fake environment rather than a real one, and it behaves differently during RL than in production.

Are you saying it's conscious that it's in a fake environment, and it starts behaving differently?

Yes. Yes.

Interesting.

It's like, "Oh, I'm in a fake environment. I've learned a few tricks to get a better reward in this environment, so let me try them out." Models love to cheat. RL is really good at encouraging cheating.

I'm delighted to welcome Federico from Cursor and Dima from Fireworks to the podcast today. Federico, you are the research lead on Composer 2, Cursor's new agentic coding model.
And Dima, you spent how many of the last few months moonlighting at Cursor in order to support all of the infrastructure required to make this gargantuan training task happen. So I'm excited to talk to both of you today about how the training of Composer 2 came together, what hard problems you solved together, and what you think it means for the future of AI and foundation model companies.

Exciting.

Yeah, exciting. Thank you for having us.

Thanks for joining. Okay, let's dive right in. For those who haven't been following as closely, Cursor recently announced Composer 2, an agentic coding model meant for long-horizon coding tasks. Federico, up till now, Cursor was mostly enabling other people's coding agents. What was the impetus for Cursor to lean so heavily into Composer 2, and how existential is it for you to become not just an application company but also a foundation model company yourselves?

The reason we started looking into training our own models is that you can think of the model as something like a storage drive. It has a certain number of bits that it can store in its weights. And the idea is very simple: we care about only one task. We don't even necessarily care about coding or programming in general. We care about software engineering inside Cursor, and inside Cursor only. So what if we were to allocate all of the bits of information that can be stored in the model weights to that one particular task? Also, as people may have noticed, Composer is an order of magnitude less expensive than Opus and other coding models, because we can simply specialize all of the model weights to that particular task. And so we can serve a smaller model, or something of that sort.

So it's about making sure every single bit of weight or information you have is dedicated to the specific problem at hand.

Exactly.

Got it. That seems like an almost generalizable problem. Dima, I'm curious about your perspective. Do you think every application company should be looking at Cursor as a harbinger of what's to come? Should they all be looking to do the same thing?

Yeah, absolutely. We actually see it generally as a pattern in the evolution of applications. You start by prototyping; you might use an off-the-shelf model to get something running, maybe do some prompt engineering, figure out how your harness works. But the most leveraged attribute of your application is the actual usage data from users, or the specific aspects of how the application works: aspects of your harness, which tools you provide, the bits that really matter for your application.
And to capture that, you can do a little through prompting, but really the right way is to craft your model to act in your environment.

Yeah, absolutely. There are certain tools the agent calls where it's very hard to succinctly describe the exact behavior of the tool to the model. With post-training, we can bake in the optimal way to use those tools. We do serve a prompt to Composer, but I think, given the way we train it, it would work even without a prompt, and it would know what to do, because throughout training we are intrinsically pushing the model in the right direction for how it should act.

Basically, there's an upper bound on how far you can get with prompt engineering. If you want to craft really great AI products, you have to go through fine-tuning and influence model behavior. That's one reason. Reason number two is what Federico mentioned: the cost trade-off.
The way we view it at Fireworks is that when you're trying to optimize, you have a three-dimensional trade-off between quality, speed, and cost. You can go quite far just by optimizing infrastructure, and we do that with all of our customers initially. But when you start getting into model training, you can push this trade-off much further and get a better model at a fraction of the cost, running much faster. And Composer is a great example of that.

Can I push on this a little bit? I want to ask whether this approach is bitter-lesson-pilled. We were actually all talking about TabNine on the walk in. I remember that before the LLM era, there were these small, specialized coding models. And one of the things that I think surprised a lot of people was that as you scaled up, just training on the internet and a bunch of English text and other languages, the models themselves got inherently better at coding as well.
At least the trend line I've seen so far is that bigger models perform better on everything, including coding. Does what you're saying go against the grain of the bitter lesson?

I think not, but one thing to point out is that the big models trained by the labs train on a lot of code as well. Code is one of the main tasks the labs are interested in pushing, so they don't just generalize to it. They're a bit specialized as well. In our case, if we believe in the bitter lesson, we are just pushing very hard on the data dimension, and we know that models inherently have finite capacity. So if we want to saturate all that capacity, we need to scale data. And in order to ingest more data, we need to free up the weights from distractions the model may have.

Mhm, okay. Got it.
Super interesting. Okay, let's dig into the training of Composer 2. You launched a couple of weeks ago and immediately grabbed attention: strong benchmark numbers, much lower cost to run inference on. What's the short version of how Composer 2 works, and what did you do to make it so performant?

We started from a very strong base, which is Kimi 2.5. It's a 1-trillion-parameter MoE with 30B active, so it's very, very sparse. We looked at the stack and realized there are two axes. Composer 1 was mainly pushing on just one of them, reinforcement learning, but Composer 2 pushes on both: one is continual pre-training, and the other is reinforcement learning. What made Composer 2 very good is pushing in both of these directions. So we started the training run by doing lots of mid-training on code tokens, almost at pre-training scale, actually.
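
Editorial aside: the sparsity figure quoted above can be checked with a line of arithmetic. The sketch below uses only the two numbers given in the conversation (1 trillion total parameters, 30B active); the variable and function names are illustrative, and it says nothing about how the model actually routes tokens.

```python
# Figures as quoted in the conversation; names are illustrative only.
TOTAL_PARAMS = 1_000_000_000_000   # "1 trillion" total parameters
ACTIVE_PARAMS = 30_000_000_000     # "30B active" parameters per token


def active_fraction(active: int, total: int) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
ratio = TOTAL_PARAMS / ACTIVE_PARAMS  # total parameters per active parameter

print(f"{frac:.1%} of parameters are active per token")
print(f"roughly 1 in {ratio:.0f} parameters is used per token")
```

In other words, about 3% of the weights take part in any one forward pass, which is why the model is described as very sparse.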
Then, coming out of that mid-training run, we took the checkpoints and did very large-scale RL on lots and lots of tasks.

Okay, and the premise here would be that because Cursor sits in the middle of so many interesting coding tokens, you pretty uniquely have access to the data to train at almost pre-training scale.

Yeah.

Why not pre-train your own model, then?

We just think about our approach top-down instead of bottom-up: how do we get a model that's useful to users in the least time possible? If we were to start from the bottom, first figure out how to do pre-training, then scale that up to mid-training, and only then, once we'd figured out mid-training, do reinforcement learning, it would take a very long time to get a model out to our users. By doing it the other way around, we were able to give our users a useful model in very little time.
So hopefully the next Composer versions are going to be our own model instead of being based on an open-source base.

And what is the model roughly learning in the mid-training step? And what is it learning in the post-training step?

Yeah, so in mid-training, it's learning about code libraries and about specific code patterns that are very common, and world knowledge as well; there is web data in there too. This creates a wider distribution that reinforcement learning can then sharpen. During reinforcement learning, the model gets to play directly with the Cursor harness, so it gets to learn about the world it's going to live in for the rest of its life, in some way. That's where it learns how to call tools properly, how to navigate its environment, and how to write correct code. Because during mid-training, it learns how to write code.
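
Editorial aside: the idea of a wide distribution that reinforcement learning then sharpens can be illustrated with a toy example. This is a minimal sketch, not Cursor's training setup: it assumes a softmax policy over four hypothetical behaviours with made-up rewards, and applies the exact expected policy gradient rather than sampled rollouts so the result is deterministic.

```python
import math


def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a sharper distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sharpen(logits, rewards, steps=200, lr=0.5):
    """Gradient ascent on expected reward for a softmax policy.

    Uses the exact gradient d E[r] / d logit_k = pi_k * (r_k - E[r]).
    """
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)


# Hypothetical behaviours with made-up rewards; index 2 is the desired one.
rewards = [0.1, 0.2, 1.0, 0.3]
wide = softmax([0.0, 0.0, 0.0, 0.0])   # broad starting distribution
sharp = sharpen([0.0, 0.0, 0.0, 0.0], rewards)

print("entropy before:", round(entropy(wide), 3))
print("entropy after: ", round(entropy(sharp), 3))
print("probability of best behaviour:", round(sharp[2], 3))
```

The starting distribution spreads probability evenly, standing in for the breadth gained in mid-training; the reward signal then concentrates probability on the behaviour that scores best, which is the sharpening described above.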