Podcasts

Hear the voice. See the shape of the thought.

Sequoia Capital, about 1 month ago

Simulating Humans at Scale: Simile's Joon Sung Park

Joon Sung Park, founder and CEO of Simile and creator of Stanford's Smallville generative-agents study, walks Sonya Huang through the arc from a 25-agent game town that spontaneously threw a Valentine's party to a company that simulated 1,000 Americans and predicted their answers 85% as accurately as those people reproduced their own answers. His core argument: today's frontier labs are building the "CPU of intelligence" (rational machines that are superhuman at problems with right answers), while simulating real human society needs the opposite, a model that encodes people's irrational values, preferences, and taste. CVS uses it for concept testing; some customers simulate their own earnings calls; and Joon's longer bet is a "CERN of human society" that could one day model bank runs, climate cooperation, or the early signals of a collapsing democracy.

## [00:00] Inside Smallville: 25 agents throw a Valentine's party

The conversation opens on Joon's conviction that science fiction's advanced societies always rest on two pillars, "some version of AGI and some version of simulations that really help guide the society." Sonya then takes him back to Smallville, the April 2023 Stanford project that made his name. The setup was 25 generative agents, each given a persona and equipped with memory, planning, and reflection, then left to live in a small game town: wake up, do routines, go to work, form relationships. What surprised the team was emergent coordination. Isabella, a café owner, decided to throw a Valentine's Day party and spent the day before gathering materials and inviting customers. On the day itself, the party actually came together.

> *some of the agents did not explicitly get invited, but we had one agent who got the invite, Claus, who decided to ask his crush out on a date*

## [03:34] From a foundation-models paper to simulating a subreddit

Joon traces the origin back to 2020, the year GPT-3 was about to land.
As a Stanford researcher he co-wrote the "On the Opportunities and Risks of Foundation Models" paper, and the part that gripped him was not that the models could classify or generate (interaction researchers had done that for years) but that they could encode human behavior. Coming out of the social-computing tradition, he saw a long-standing hole: there was no way to test how millions of people would behave on a platform short of shipping it and watching what happens, sometimes at real cost. That led to the 2022 Social Simulacra paper, the precursor to generative agents, which populated a simulated subreddit with thousands of personas so a designer could see community dynamics before launch.

> *The only way we test it today is you basically field test it. You release your prototype, see what happens.*

## [07:57] The CPU of intelligence can't model irrational humans

Asked when models got good enough for a faithful representation of society, Joon marks the path from GPT-3 (janky, no instruction tuning, needing prompt tricks just to follow orders) to today's foundation level, where these applications become imaginable. But he draws a sharp limit. The frontier labs' north star is a rational, superhuman machine optimized for objective problems, and that is the wrong target for simulating people. As accuracy on objective benchmarks climbs, the ability to predict and simulate human behavior does not climb with it; the two diverge, because people are not rational.

> *We have a lot of subjective values, preferences, and taste.*

## [10:04] Why this became a company, not another paper

Joon distinguishes the two vehicles bluntly: research is built for breadth, where each researcher owns a slice of a thesis and is "not necessarily known for finishing our job," while a company is built for depth on a single conviction.
The pull toward a company came roughly half a year after the generative-agents paper, first from social scientists wanting to run randomized controlled trials (RCTs) on the platform, then from Fortune 500 boards and CEOs who saw the demo at Stanford and asked whether the surveys and market questions they could never answer might run in simulation. Before committing, the team validated accuracy by simulating 1,000 people drawn from across the US population.

> *we can actually predict people's behaviors 85% as accurately as people replicate their own*

## [12:43] How a Simile engagement works, and the say-do gap

Simile's first major customer is CVS, brought in by a senior VP of human insights who had read the validation paper and felt bottlenecked by how few questions he could field-test. The workflow mirrors how firms already use polling and panel companies: a customer names a population they want to understand, and Simile, through a strategic partnership with Gallup, reaches real humans, asks the magical 15-minute questions, and turns that data into agents that can answer far beyond the original survey. Sonya pushes on why an LLM alone can't just role-play a 34-year-old woman from a coastal metro. Joon's answer is the say-do gap: models are trained on what people said online, not on what they actually do, and closing that gap requires behavioral data such as RCTs, pricing studies, and life-story interviews that surface the long tail of a person.

> *There are things that people say and then there are things that people actually do, and the gap there is real*

## [20:27] The GPU of intelligence: from concept tests to earnings calls

Here Joon gives the framing that anchors the company. Today's models are the CPU of intelligence: one model trained on rational data, superb at objective questions. Simile is building something closer to the GPU: not superhuman, but as human as possible, where individual subunits represent the real viewpoints of different populations.
Customers usually enter through a concrete door: concept testing, where instead of testing 5 to 10 ideas they imagine testing a thousand ideas across a thousand sub-populations. From there they move toward product testing with a temporal dimension and multi-agent simulation. One recurring and initially surprising ask: simulate the company's own earnings call to see how the audience reacts.

> *imagine the current today's model are akin to the CPU of intelligence unit*

## [26:32] How accurate is it? Convergence versus divergence

On evaluation, Joon starts from the theoretical limit (humans answer the same question slightly differently each time, so perfect prediction is impossible) and then describes the metric: total variation distance (TVD) between the ground-truth and simulated response distributions, with a TVD under 0.15 treated as strong enough for decisions. The deeper idea is that there are two categories of simulation. Convergent ones tolerate compounding error because the pull toward an outcome is strong, like a network always forming a hub, the scale-free structure that powered PageRank. Divergent ones (was World War I inevitable, who wins an election) can't be expected to repeat, so the evaluation shifts to confidence: run it 100 times, see how often outcome X appears, and show the diversity of possible futures. He likens the work to the early days of inferential statistics, when the p < 0.05 threshold was being set.

> *was World War I inevitable or was it not?*

## [31:56] A CERN for human society

Sonya raises the grander possibility that fields like macroeconomics, which she sees as human behavior at scale, might one day be partly solved by simulation, including the venture question of where value accrues across the AI stack. Joon agrees there is "a Nobel Prize to be won there," recalling how Thomas Schelling's deliberately crude agent-based segregation models revealed something deep about macro behavior.
The augmented version replaces red-dot/blue-dot agents with agents that replicate the full richness of individuals, opening questions economists actually asked him: when does a bank run happen, can nations be modeled solving climate's collective-action problem, what are the early signals of a democracy about to collapse. He imagines a simulation that costs $100 million and takes months to run once but answers a fundamental question, a Hubble telescope for human society.

> *building simulator that's akin to the CERN of human society*

## Entities

- **Joon Sung Park** (Person): Founder and CEO of Simile; created Stanford's Smallville generative-agents study and co-authored Social Simulacra.
- **Sonya Huang** (Person): Partner at Sequoia Capital, AI investing; host of the conversation.
- **Simile** (Organization): Applied AI lab building models that simulate human behavior and societies for concept testing, product testing, and multi-agent scenarios.
- **Smallville** (Concept): 2023 Stanford experiment with 25 generative agents living in a game town, known for emergent behavior like a self-organized Valentine's party.
- **Social Simulacra** (Concept): 2022 paper simulating a subreddit with thousands of personas; precursor to generative agents.
- **Say-do gap** (Concept): The difference between what people say (the basis of LLM training data) and what they actually do, which behavioral data is collected to close.
- **CPU vs GPU of intelligence** (Concept): Joon's framing, in which frontier labs build a rational "CPU" that is superhuman at objective problems and Simile builds a "GPU" encoding the diversity of human values and taste.
- **Total variation distance** (Concept): Simile's accuracy metric comparing ground-truth and simulated response distributions; TVD < 0.15 treated as decision-grade.
- **CVS** (Organization): Simile's first major customer, using it for concept testing via its human-insights team.
- **Gallup** (Organization): Polling and panel partner Simile uses to reach real humans and ground simulations in real data.
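
The accuracy metric described under [26:32], total variation distance between the ground-truth and simulated response distributions, can be illustrated with a minimal Python sketch. The function names, answer options, and response counts below are hypothetical and chosen only for illustration. The conversation does not describe Simile's actual implementation; only the 0.15 threshold comes from the episode.

```python
from collections import Counter

# Threshold quoted in the conversation; everything else here is illustrative.
DECISION_THRESHOLD = 0.15


def response_distribution(responses):
    """Turn a list of categorical answers into a probability distribution."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    total = len(responses)
    return {option: n / total for option, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute probability differences over all options."""
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


# Hypothetical answers to a single survey question (made-up counts).
human_answers = ["yes"] * 60 + ["no"] * 30 + ["unsure"] * 10
simulated_answers = ["yes"] * 52 + ["no"] * 35 + ["unsure"] * 13

tvd = total_variation_distance(
    response_distribution(human_answers),
    response_distribution(simulated_answers),
)
print(f"TVD = {tvd:.2f}, under threshold: {tvd < DECISION_THRESHOLD}")
```

With these made-up counts the distance is about 0.08, which would fall under the 0.15 threshold mentioned in the conversation. A TVD of 0 means the two distributions are identical, and a TVD of 1 means they share no answers at all.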

#generative-agents #simulation #ai-research

What David Senra Learned from Studying 400+ Founders

56:51

Sequoia Capital, about 1 month ago

David Senra has spent a decade reading 400+ founder biographies and is now starting to interview living founders in person as well. His one-word answer to what they all have in common is focus, which he describes as "putting the world on mute and building your own thing," and he explains to Brian Halligan why that trait, combined with a near-compulsive drive rooted in early experiences, explains more about founder success than any Silicon Valley checklist. The conversation covers childhood as origin, founder archetypes, the danger of selling your best company, and how the AI era makes exceptional craftsmanship more valuable than ever, while the fundamental human makeup of great founders stays unchanged.

## [00:00] Introduction

Brian Halligan opens with what he wants from David: a distillation of what the very best founders, from Jesus of Nazareth to Jensen Huang, really have in common, and how to use that knowledge to recognize and coach them. The episode starts mid-thought, with David on Tony Xu of DoorDash, who at the end of a dinner celebrating a milestone was already cataloguing the seventeen things that were still going wrong. That restlessness, David says, is the tell.

> *"By the time the dinner is over, I'm already thinking about the 17 things that aren't going well. That's why it's great."*

## [01:11] Focus above all

David's one-word answer is focus. Not perseverance, not resilience, not intelligence: focus. He describes it as something qualitatively different from what other top performers do, almost a separate species: they don't watch what competitors are doing, and they genuinely don't care. His summary: "putting the world on mute and building your own thing."

> *"If I had to distill everything down to one word, it's just focus. They're just incredibly focused compared to not just the average person. It's almost like they're a different species."*

## [01:50] Dana White's UFC focus

Dana White is David's most recent example of missionary focus. White grew up a self-described loser who worked as a bellhop in Boston, moved to Las Vegas to be close to the fight world with nothing to lose, and eventually persuaded the Fertitta brothers to buy the UFC for $2 million. They lost money for six years. Then they lost another $40 million before turning profitable. Twenty-six years later, White closed a TV deal worth nearly $8 billion, and his explanation for how that happened is that he has never read a single business book or listened to a business podcast. He simply made what he himself wanted to see.

> *"His whole world is his business, and anything he does outside his business doesn't interest him. He's just incredibly focused."*

## [04:19] Focus versus obsession

Brian asks whether focus and obsession are the same thing. David says they are closely related but different: focus is saying no to good ideas so you can pursue a great one. He cites Jony Ive recounting Steve Jobs's distinction (focus is saying no to a good idea that you really want to do, because it distracts you from a great idea) and notes that someone intensely locked onto something looks obsessed from the outside, but the mechanism is active exclusion rather than passive fixation.

> *"Focus is saying no to a good idea that you really want to do, because it distracts you from a great idea."*

## [05:05] Origins in childhood

Brian asks where the obsession comes from: a normal upbringing, or something that broke early? David says it is not one thing, but that almost none of the founders he has studied are exactly well balanced. He brings up Francis Ford Coppola's biography as the source of the line that crystallized a pattern he kept seeing (that the story of the son is always embedded in the story of the father) and describes how he regards filmmakers, podcast hosts, and startup founders as the same entrepreneurial type.

> *"The answer is that it's not one thing."*

## [06:07] Coppola and his father

The pattern David keeps finding is that the father's story is embedded in the son. Coppola's father was a brilliant but failed musician who told his young son, "there can only be one genius in the family, and that's me," and then belittled him for years. Coppola internalized that and built one of the most relentless work ethics in Hollywood, eventually won the Academy Award, and had his father write the film score, which also won an Oscar. David applies this through Charlie Munger's framework: to truly understand an idea you have to tie it to the personality that developed it, which is why biography outperforms strategy books.

> *"You can always understand the son through the story of his father. The father's story is embedded in the son."*

## [08:48] Jerks and archetypes

Brian raises the cliché that great founders are jerks. David rejects it outright. He is working with Daniel Ek of Spotify on a project to map founder archetypes; the hypothesis is that founder-problem fit matters more than product-market fit. Ek wasted years imitating Steve Jobs, wearing a personality that was not his own. He is more the coach type. David's point: there is no single archetype, there are probably six to eight, and knowing which one you are is more valuable than imitating whichever founder happens to be famous right now.

> *"The most important thing is founder-problem fit. Think of Demis from DeepMind. There was one great company in him. That was DeepMind. He was put on this earth to do what he does."*

## [11:14] Autism and originality

Brian points to the high prevalence of autism-spectrum traits among the modern CEOs of trillion-dollar companies: Jobs, Gates, Bezos, Zuckerberg, Jensen, Ellison. David reads Peter Thiel's analysis: founders who seem mildly autistic lack the imitation-socialization gene, which means nobody talks them out of their strange original ideas before those ideas are fully formed. David's caveat: the Bay Area is now full of people performing anti-imitation, which makes them the most mimetic of all. Rockefeller probably did not fit the spectrum pattern: he had excellent social skills and still built the most dominant company in history.

> *"We need to ask what is going on in our society that those of us who don't suffer from Asperger's syndrome are at a big disadvantage, because we get talked out of our interesting, original, creative ideas before they're even fully formed."*

## [14:55] The immigrant drive

David speaks from personal experience as the son of a Cuban immigrant: people who risk their lives on rafts to cross 145 kilometers (about 90 miles) of ocean give their children a different baseline sense of what risk and opportunity mean. Brian notes that only three of the ten biggest American tech founders were immigrants (Jensen, Elon, Sergey), while most came from the upper-middle-class suburbs. David's rebuttal: those three account for a disproportionate share of total market capitalization, and many of the others had immigrant fathers. The advantage can carry across a generation.

> *"Think about how much you love your son and how bad Cuba must have been, and how bad communism must have been, to put your fourteen-year-old or nine-year-old son on a raft and hope he makes those 145 kilometers to South Florida."*

## [16:38] Betting on the founder

David says that as a venture capitalist he would use no rubric at all; he would simply bet on the person. Ed Catmull gave him the clearest version of this: give a great idea to a mediocre team and they will botch it; give a mediocre idea to a great team and they will fix it or throw it away and build something better. Ideas come from people, so people matter more than ideas. David's test: does this person have the quality Travis Kalanick had at Uber, namely that they will make it work or die trying?

> *"If you give a great idea to a mediocre team, they'll botch it. If you give a mediocre idea to a great team, they'll fix it or throw it away and make something new."*

## [17:52] Solo versus partners

The conventional wisdom (co-founders are better, and the optimal number is three) does not match what David sees throughout history. Most great companies had one dominant driving force, and the "co-founder" either left (Wozniak), was essentially an operator the founder acquired (Frick at Carnegie Steel), or was a complementary personality who deliberately subordinated himself to a once-in-a-century talent (Munger to Buffett). When David met Munger, Munger admitted that he always thought he was smarter than everyone else, but that he recognized Buffett's exceptional focus and consciously made the calculation to subordinate his own ego to it.

> *"If I could do life over again, I'd still think I was smarter than everyone, but I'd be better at hiding it."*

## [23:20] Negative self-talk as fuel

Jensen Huang says he looks in the mirror every morning and asks himself why he falls so short. Elon describes his mind as a storm and seems genuinely uneasy when things are going well. Most founders David has studied run on negative self-talk as fuel, but David recently changed this in himself. Brad Jacobs, who over 45 years built eight separate companies that each became worth a billion dollars, told him: the negative drive got you here, but it no longer serves you. Now you love the work. Make your inner drive generative. David says something clicked and he has never gone back.

> *"Your inner drive has to be generative. It has to be: 'I'm trying to make something that's good for the world, that I love doing, and that I'm proud of.'"*

## [26:39] Platform shifts and founder mode

Brian asks whether big platform shifts (the industrial revolution, the assembly line, now AI) change the profile of who succeeds and how they run companies. Brian describes Paul Graham's distinction between founder mode and manager mode, and his own "Dorsey mode": a flat organizational structure, functions abolished, and an AI system at the center that makes a growing share of decisions while people supply context and apply judgment. He sees this as structurally different from every earlier platform shift.

> *"Right now the AI system makes very few decisions, maybe 5%, 10%, but over time the share of decisions made by the AI system versus the people starts to flip."*

## [28:07] Dell versus IBM

David asked Michael Dell directly whether this moment resembles anything he has lived through before. Dell said no: this is qualitatively different. David is usually skeptical of "this time is different" claims, but he agrees with Dell, Tobi Lütke, and Jack Dorsey that the amount of leverage now available to a small team fundamentally changes the arithmetic of company building. IBM once held 80% market share of the entire technology industry and was the first company ever to reach a $100 billion market capitalization. Dell challenged it from a dorm room at the University of Texas with $1,000, and was profitable every quarter for its first twenty years.

> *"I really think the way to run a company, the way you can do it and what's available to you, is completely different."*

## [30:02] Infinite leverage

Naval Ravikant's line ("in the age of infinite leverage, being at the extreme of your craft is very important") was written before AI. David thinks AI amplifies that truth by another order of magnitude. His example is Jordi of TBN: he was not 2x better at podcast marketing than the next person, he was 100x better, and the economic rewards for someone at that frontier are not 100x greater but potentially 1,000x greater. The premium on focus and mastery is rising, not falling.

> *"In the age of infinite leverage, being at the extreme of your craft is very important."*

## [31:38] Focus versus speed

Brian pushes back: the AI-native founders he knows (Harvey, Lovable, ElevenLabs) move fast on many fronts at once. Is focus still the rule? David's answer: they have not built durable companies yet, so it is too early to know. His deeper concern is what happens after you sell. He spends time with founders in their seventies and eighties who sold their best company and then spent decades trying to recapture the magic in second and third attempts; almost none succeeded. If you truly have a generational company, don't sell it. You are all in, or not in at all.

> *"You're all in or not in at all, but why would you be all in on your second, third, fourth, fifth best idea?"*

## [34:20] Taste and listening

Brian asks whether good taste is a real founder quality or a fashionable concept. David says taste is very real, and his clearest example is Rick Rubin, who at 62 is still doing what he started at 18 in his dorm room. But David's more specific claim is that Rubin's edge is not only taste but that he is a professional listener. Most people in a conversation are waiting for their turn to respond. Rubin is genuinely interested. That quality of attention, carried over from music production to podcasting, makes him exceptional. David also addresses founder authenticity: not everyone has to be unfiltered; it depends on who you are, what sector you are in, and what you are trying to build.

> *"He took a skill from music and applied it to podcasts. You're a professional listener."*

## [40:52] Founder traits and balance

The core qualities David has identified across more than 400 biographies are obsession, high disagreeableness, an obsession with cost control, and micromanagement, which Paul Graham called "founder mode" but which David stresses is not new at all. Rockefeller was in fact an exception on disagreeableness, never raising his voice, but he was a force of nature in other ways. On work-life balance, David can name exactly three founders across four centuries with a genuinely balanced personal life. Sam Walton, who wrote his autobiography while dying of cancer, said he would do everything exactly the same. Phil Knight, at 75, still cannot fully come to terms with his absence from his sons' lives. What motivates the greats is not money; it is control.

> *"I don't think small egos build big companies. I think all of these people have giant egos. I think some of them are just better at hiding it. And what motivates most founders isn't money, it's control."*

## [54:22] Closing insights

Brian distills three insights: deep founder-market obsession is the real common thread; having a good work-life balance while building a great company is genuinely rare (three out of the 400); and impostor syndrome is worth working on, with Brian pointing to Brian Chesky's shift from leading out of fear to leading out of love as the model. The episode closes with Dana White's formula: deeply understand who you are, deeply understand what you want to do in the world, then get up every day and execute. Stay in the game long enough to get lucky.

> *"Stay in the game long enough to get lucky."*

## Entities

- **David Senra** (Person): Host of the Founders podcast; has read 400+ founder biographies and now also interviews living founders in person.
- **Brian Halligan** (Person): Co-founder and executive chairman of HubSpot; host of this Sequoia Capital series.
- **Dana White** (Person): Founder/CEO of UFC; bought it in 2001 for $2 million and recently closed a TV rights deal of roughly $8 billion.
- **Daniel Ek** (Person): Founder of Spotify; working with David on a framework for founder archetypes; argues for founder-problem fit over product-market fit.
- **Demis Hassabis** (Person): Co-founder of DeepMind; cited as the clearest example of perfect founder-problem fit.
- **Charlie Munger** (Person): Partner at Berkshire Hathaway; deliberately subordinated his own ego to Buffett's once-in-a-century talent.
- **Ed Catmull** (Person): Co-founder of Pixar; Steve Jobs's longest-running collaborator; source of the "give a great idea to a mediocre team" principle.
- **Brad Jacobs** (Person): Entrepreneur who built eight separate companies that each became worth a billion dollars; advised David on moving from a punishing drive to a generative one.
- **Rick Rubin** (Person): Music producer; David's example of taste combined with professional listening as a compound advantage.
- **Founders** (Media): David Senra's podcast covering 400+ biographies of founders from throughout history to the present.
- **Founder-problem fit** (Concept): Daniel Ek's framework, in which the match between a founder's identity and the specific problem they solve is the most important form of fit.
- **Infinite leverage** (Concept): Naval Ravikant's idea that in an era of software and AI, being at the extreme of your craft yields disproportionately large rewards.
- **Sequoia Capital** (Organization): Venture capital firm; Brian Halligan's current home and host of this podcast series.

#founders #entrepreneurship #biography

Knowing What Your Customers Want, All the Time: Listen Labs' Alfred Wahlforss

42:01

#market-research #ai-interviews #voice-ai

Alfred Wahlforss built Listen Labs after scratching his own itch: when his viral AI-avatar app hit 20,000 users overnight and churn spiked, he needed to know why, fast. The answer was an AI agent that runs voice interviews at scale, drawing from a panel of 30 million people. A year in, Listen serves 20% of the Fortune 500 and has completed over a million interviews. The deeper finding is counterintuitive: respondents are often more honest with an AI interviewer than a human one, and voice transcripts turn out to be a richer training signal than credit card data or behavioral logs. Wahlforss and Sequoia's Konstantine Buhler work through why audience selection consumes 80% of Listen's engineering, how back-tested simulation beats vanilla ChatGPT at message testing, and why, as AGI makes building trivially cheap, knowing *what* to build becomes the scarce resource Listen wants to own.

## [00:00] Introduction

Alfred opens in the middle of a thought about audience depth: Listen's long-term goal is to reach a billion people and build rich profiles that reveal each person's genuine areas of expertise, going beyond demographic boxes to things like whether someone is a true sneaker influencer or a casual buyer. Konstantine then formally introduces him: Listen launched roughly a year ago, already counts Microsoft, Anthropic, Sweetgreen, NBC, and others as customers, and runs thousands of voice interviews simultaneously. The brief cold open gives the episode its throughline: the value of talking to the *right* person, not just any person.

> *"Our goal is to get to a billion people in our audience and then to be able to stratify and know what exactly is this person an expert on."*

## [01:20] How Listen Works

The product works in three stages: a researcher types a question (say, "how can we improve Cursor's onboarding?"), Listen's AI agent generates an interview guide, and the system then routes those interviews to matched participants from its 30-million-person panel.
Hundreds of conversations run in parallel, the results are synthesized, and recommendations surface. The next stage, launching in a few months, is simulation: after tens of thousands of interviews accumulate on a topic, can Listen predict how customers will answer *future* questions without running a new interview?

> *"As we get closer to AGI, it will be easier to build things, but the hard part will be knowing what to build, and that's what we're building at Listen."*

## [02:23] Customer Wins

Chubbies discovered that chest hair caught uncomfortably on one of their shirt materials; Listen surfaced the feedback, Chubbies redesigned the garment, and comfort scores jumped. Manscaped used Listen insights to reshape a Super Bowl ad. Skims uses it for ongoing product testing. The throughline Alfred draws: Listen handles both small product details and high-stakes campaign decisions with the same workflow, which is to talk to real people, fast.

> *"They discovered that chest hair interface really poorly with one of the materials they have. So it's really uncomfortable to wear one of their shirts, and they changed the shirt and it became radically more comfortable."*

## [03:28] Surveys Versus Reality

Konstantine presses on the classic critique: survey respondents lie, or at least contradict themselves. Alfred's evidence: Listen re-asked the same multiple-choice survey questions of the same people and found radical inconsistency, but when those same people had to reason through an open-ended voice answer, consistency improved sharply. On back-testing against sales data, Alfred agrees A/B tests are the gold standard but notes that they require large user bases most companies don't have. Interview data, properly designed, beats no data.

> *"If you go back to the same person and ask them a survey question in a multiple choice fashion, they're much more inconsistent. But when you actually have to think and reason through your answer, you're much more consistent."*

## [05:13] Zoom-Like AI Interviews

The participant experience is a video call with an AI agent, not a text form. The agent watches facial expressions and vocal tone, giving Listen a second signal layer beyond what people say. Alfred cites advertising testing as the clearest win: a respondent's Likert-scale rating and the enthusiasm they show on video can diverge, and that visible enthusiasm predicts Meta and LinkedIn performance-marketing results significantly better than the numeric score. Every data point links back to the actual video clip, so researchers can verify that the AI isn't hallucinating sources.

> *"For every data point you can always click and then look at the video or see the quote, so you know that AI is not just hallucinating where it's coming from."*

## [07:14] Origin Story

Alfred and his co-founder shipped a consumer app called "Be Fake," an early Stable Diffusion fine-tuning tool for creating AI avatars of yourself, which went viral overnight and hit 20,000 users. Churn spiked immediately and they had no idea why. They built an AI interview tool to ask their own users, found it genuinely useful, and pivoted. The market-research product they built for themselves became Listen Labs.

> *"We built this AI interview for ourselves because we had a ton of churn and we wanted to understand why, and that's how we got started."*

## [08:01] Old World Research

The pre-Listen world offered two options: slow online survey tools like Qualtrics, or expensive services firms that charge tens of millions to recruit participants, design question methodology, moderate focus groups, and synthesize hundreds of transcripts. Question design alone is an academic discipline: ask "how much would you pay for this?" and you get junk data.
The sourcing problem is equally hard: incidence rates of 10% mean nine out of ten recruited panelists get screened out, burning trust and causing churn on the databases themselves. > *"In traditional industries like CPG or even Microsoft, they spend tens of millions of dollars on focus groups to bring people in a room and interview them, and we can help speed that up much faster."* ## [09:50] AI First Benefits Three compounding advantages: speed (results from real people in five minutes), cost (asynchronous interviews pay participants less than synchronous ones, and participants accept that willingly), and honesty (people open up more to a non-judgmental AI than to a human interviewer who might silently judge them). Alfred mentions sensitive use cases, such as interviewing children about products with parental consent, as an area where the AI's non-threatening presence produces data that focus groups can't. > *"People are more honest talking to an AI. It's a very therapeutic experience because it's a non-judgmental entity that's really interested in you."* ## [11:32] Finding The Right People Listen spends 80% of its engineering resources on audience quality, not the interview agent itself. The reason: power-law customer segmentation means talking to the wrong 100 people gives you wrong insights. Sweetgreen's most valuable customer is urban, high-income, mostly female, and, in Alfred's specific example, knows what seed oils are (roughly 1% of the population). Listen builds rich profiles across every interview a panelist ever participates in, so an offhand comment ("I'm a total sneaker head") in an unrelated interview can resurface that person when Nike needs launch feedback. Traditional email-list panels couldn't do cross-topic profiling. 
> *"Even a product like Sweetgreen, which you would think is for everyone, the right audience is typically urban, high household income, mostly female, and they need to know what seed oils are, which only like 1% of the population does."* ## [14:30] CRM And Prospecting Sweetgreen already has a CRM full of its most loyal customers, so why use Listen? Three reasons: researching *prospective* customers who aren't yet in the CRM requires an external panel; CRMs are typically disorganized and legally constrained (Google can't spam Gmail users, even its own); and direct outbound email risks getting flagged as spam, which can permanently damage a domain's deliverability. Listen provides clean, third-party panel access that sidesteps all three problems while still supporting CRM-connected campaigns when brands want them. > *"What we found is that the CRM is typically really unorganized, and sometimes there are regulatory issues. If you're at Google, you can't just send emails to people who use Gmail."* ## [15:35] Consulting In The AI Era Konstantine, a former buyer of McKinsey-style consulting, asks whether firms like Bain still have a role. Alfred's view: yes, but margins compress. Bain already uses Listen to accelerate existing workflows. The more optimistic scenario: AI doesn't just replace a research project, it makes research cheap enough to run five simultaneous strategic explorations that a company never would have commissioned before. Alfred predicts consulting expands in scope even as price-per-project falls. On economic surplus, Listen has charged hundreds of thousands of dollars to interview 20 doctors across eight countries quickly, a project that previously would have taken months. The surplus is currently staying with the supplier. Alfred also flags an emerging agentic loop: churn interviews surface bugs, which connect directly to a coding agent that opens a PR and ships the fix. 
In that loop, Listen is the customer-intelligence "left side" of an autonomous product development cycle. > *"Because you're able to do it faster, I would argue you should be able to charge more for it, and we have charged hundreds of thousands of dollars to speak to 20 doctors across eight countries."* ## [20:05] Market Research Simulation This is the episode's technical core. Konstantine frames the evolution as 1.0 (call 100 people manually), 2.0 (AI-native simultaneous interviews), and 3.0 (generative simulation). Alfred explains how Listen's simulation works: interview a single person deeply, build a persona model, then scale to a thousand statistically representative agents. Back-testing hides a held-out question and measures how accurately the model predicts the answer; they reach 95% on stable preference domains and deliberately expose the model to nonsensical queries (dog names) to calibrate what it *can't* predict. Alfred ran a personal live test: 100 title variants for a conference talk, run through Listen's panel simulation. The top-ranked title performed twice as well as the second. He then ran the same test in ChatGPT, which picked the wrong title when asked to choose between a past successful talk and a less successful one. Listen's domain-specific panel data beat the general model. The gap: interview transcripts outperform credit card spend, behavioral logs, or ChatGPT persona prompting because voice conversations capture how a specific *type* of person actually reasons, not just what the average person does. Looking ahead, Alfred sees simulation handling "billboard tagline" decisions while real interviews remain the standard for Super Bowl ad buys. The product's proprietary eval climbed from 20% to 85% on avoiding repetitive questions, then Listen raised the bar with a harder eval (screen-state awareness, skipping irrelevant questions) and is back at 20%, which Alfred frames as the vertical AI flywheel: a proprietary benchmark that only you can keep climbing. 
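The back-testing step in this section (hide a held-out question, predict it, score the prediction) can be sketched in a few lines. This is only an illustration under stated assumptions, not Listen's implementation: the toy `panel`, the rule-based `persona_predict` stand-in for a persona model, and the majority-answer baseline are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

Respondent = Dict[str, str]  # maps question id -> recorded answer


def backtest(respondents: List[Respondent], held_out: str,
             predict: Callable[[Respondent], str]) -> float:
    """Hide one question from each respondent, predict it, return accuracy."""
    hits = 0
    for person in respondents:
        visible = {q: a for q, a in person.items() if q != held_out}
        if predict(visible) == person[held_out]:
            hits += 1
    return hits / len(respondents)


# Hypothetical toy panel: two questions per respondent.
panel = [
    {"diet": "vegan", "lunch": "salad"},
    {"diet": "vegan", "lunch": "salad"},
    {"diet": "omnivore", "lunch": "burger"},
    {"diet": "omnivore", "lunch": "burger"},
    {"diet": "omnivore", "lunch": "salad"},
]


def persona_predict(visible: Respondent) -> str:
    """Hypothetical stand-in for a persona model: uses the visible answers."""
    return "salad" if visible.get("diet") == "vegan" else "burger"


# Baseline that ignores the person and always guesses the most common answer.
majority = Counter(p["lunch"] for p in panel).most_common(1)[0][0]

persona_acc = backtest(panel, "lunch", persona_predict)
baseline_acc = backtest(panel, "lunch", lambda visible: majority)
print(f"persona model: {persona_acc:.2f}, majority baseline: {baseline_acc:.2f}")
```

Comparing against a baseline that ignores the individual is one way to tell whether a persona model adds anything beyond the average answer, which is the distinction the section draws between how a specific type of person reasons and what the average person does.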
> *"We were able to get 95% accuracy to predict how they will answer certain questions. The tricky part is knowing what things you can answer and what you can't."* ## [35:33] Closing Thoughts Alfred's conviction: human input will always be necessary because humans are inherently irrational—TikTok trends can overturn a marketing strategy overnight, and no AGI will preempt that. His uncertainty: the ceiling for simulation quality. His moat argument: network effects on the panel (supply-demand flywheel), data network effects (more interviews → better simulation), and product stickiness (interview history compounds inside the platform). But the simplest advantage he cites is opinionated defaults—early customers using vanilla LLMs to design their own interview guides got bad data and blamed Listen; now the agent enforces question-design best practices and data quality is consistent. Konstantine ends with the "Tide Pods moment" question: can Listen's AI start *generating* product ideas mid-interview rather than just testing them? Alfred says customers already feed AI-generated images into interviews manually; the MCP integration means Claude can loop Listen calls autonomously. The vision is live brainstorming between the AI interviewer and the respondent—ideas surfacing as the customer articulates a pain, not after. > *"Founders want to build something that's complex X, but customers want something that's stupid simple and it just works. And that's the advantage you have as a vertical AI company—you can train the agent to follow best practices in the work that you do."* ## Entities - **Alfred Wahlforss** (Person): Co-founder and CEO of Listen Labs; previously built "Be Fake," a viral AI-avatar consumer app. - **Konstantine Buhler** (Person): Partner at Sequoia Capital; host of the Training Data podcast; former consultant and operator. 
- **Listen Labs** (Organization): AI-first customer research platform; runs voice interviews with a 30-million-person panel; building generative simulation. - **Market Research Simulation** (Concept): Building persona models from accumulated interview data to predict future customer responses without running new interviews; back-tested against held-out questions. - **Audience Quality** (Concept): Listen's thesis that 80% of research value comes from recruiting the right respondents—power-law customer segments—not just any panelists. - **Be Fake** (Software): Alfred's earlier consumer app (AI avatar fine-tuning via stable diffusion); the origin of Listen's interview tooling. - **Bain** (Organization): Management consulting firm; cited as an active Listen customer using the platform to accelerate traditional research workflows. - **Procter & Gamble** (Organization): Cited as the historical archetype of market-research-driven brand management; Tide Pods and M&M's given as canonical examples. - **Qualtrics** (Software): Legacy survey platform representing the "old world" of market research tooling.

Neuralink's DJ Seo: Inside the Race to Connect Brains and AI

24:59

#brain-computer-interface #neuralink #ai

At AI Ascent 2026, Neuralink co-founder and president DJ Seo sits down with Sequoia partner Shaun Maguire to lay out exactly where the company stands: 20-plus Telepathy patients controlling computers and robotic arms through pure thought, Blindsight in preclinical testing and potentially cleared for human use by end of 2026, and a first-principles manufacturing philosophy borrowed from Elon Musk that treats surgical robots the way SpaceX treated reusable rockets. DJ argues that the real ceiling of this technology is not cursor control or speech synthesis but direct, uncompressed, multimodal transfer of concepts — AI as a neocortical layer sitting above the human limbic system — and that scale, the same variable that unlocked the LLM era, is the only remaining gate. ## [00:00] Introduction Shaun Maguire opens the session by announcing a two-minute Neuralink patient video before the interview begins, telling the audience to stay on the side because what they are about to watch is proof that the company has already cleared the hardest bar: restoring human agency to people who had lost it entirely. ## [00:21] Telepathy Patient Stories The video narrates four patients whose lives changed after receiving the Telepathy implant. A quadriplegic patient describes moving a cursor with thought alone — "I'm thinking and a cursor is moving on a screen. It blew my mind." An ALS patient who lost the ability to speak regains a digital voice through the implant: "I'm talking to you with my mind." Another patient notes that the implant flipped how his child sees him: "I am not able to do things that other dads can, but now he thinks it's so cool that I can do things that other dads cannot." > *"Before the implant, I was locked in, non-verbal, quadriplegic. 
Now I control my computer just by thinking and the rewards have been immense for me."* ## [01:06] Convoy Robotics Independence The video shifts to Convoy, Neuralink's assistive robotics team, which is extending BCI control beyond a screen to physical manipulation in the real world. A patient who had been losing motor function moves a robotic arm through its axes using only neural intent: "It was incredible to be able to just gesture with an arm again." A second patient, Kenneth, who was losing his voice to ALS, uses the system's speech synthesis to speak aloud in real time during the video — words generated by his brain signals rather than his vocal cords. > *"Gaining functionality that I thought was gone forever was so incredibly life-changing."* ## [02:04] Blindsight Vision Restore The video previews Blindsight, Neuralink's second product line, designed for patients who have lost both eyes or optic nerve function. An external camera captures the visual scene; the device writes the signal directly into the visual cortex via electrical stimulation, generating phosphenes — artificial pixels of light. A patient named Audrey, asked how it feels, answers simply: "Life-changing." The video closes with the line "all with my mind" spoken over footage of a patient interacting with the world through the restored signal. > *"The future of this technology feels almost unlimited... we are finding ways to apply it across all regions of the brain."* ## [03:10] After Video Reflections DJ Seo, visibly moved after watching the video alongside the audience, speaks first: "We were cracking a lot of jokes before that video, but honestly, that brought tears to my eyes." He describes the work as one of the most inspiring projects in the world — not because of the technical milestone but because the team is giving back capabilities that patients had already grieved as permanently lost. Maguire affirms the sentiment before pivoting to the founding story. 
> *"This is one of the most inspiring projects in the world. It's incredibly difficult what they're doing and I mean, they're truly saving people."* ## [03:31] Origin Story And AI DJ traces Neuralink's founding insight to a single bottleneck: the mismatch between human output bandwidth and AI capability. In 2016, saying that out loud "sounded insane," but the logic has not changed. His personal path ran through a childhood fascination with the brain, undergraduate work at Caltech building miniaturized low-power electronics, and a Berkeley PhD focused on shrinking lab-grade neural systems down to something deployable. When he met Elon Musk near the end of his PhD, the scale and ambition of the project made refusal impossible. He frames the brain as "the most interesting compute that we all carry" and "the only form of general intelligence that we know to date." > *"Really the key insight back then was sort of the IO bottleneck between the human output and AI capabilities."* ## [06:31] Scaling And Vertical Integration Maguire presses on what smart people most misunderstand about Neuralink: many know the implant and the decoding algorithm, but almost nobody grasps the manufacturing and surgical-robot infrastructure the company built in parallel from day one. DJ attributes this to what he calls "Elon magic" — an insistence on vertical integration that gives Neuralink control over every layer from chip design to factory floor to robotic surgery deployment. The target is not a niche medical device; it is LASIK-scale surgery available to millions. Building that capacity first means progress looks slow until "the iceberg pops over the waterline" and ramp becomes near-instantaneous. 
> *"Vertical integration is something that is really the lifeblood of Neuralink and Elon companies and what really enables us to have that fast iteration loop from design, develop, deploy."* ## [09:27] Caregivers And Purpose Asked which patient story inspires him most, DJ refuses to pick one — the power, he says, is not only in the patients but in the caregivers: Nolan's mother Mia, Brad's wife Tiffany, Ken's wife Cheryl. He describes their presence as "a really powerful human story of love, sacrifice, and resilience." He then takes what he calls a philosophical tangent: his core belief is that fulfillment comes from helping others, because the gap between self and other is not categorically different from the gap between your present and future selves. That belief is what he says keeps him and much of the Neuralink team going — they are "igniting a fire of hope" for people who had given up on recovering what they lost. > *"I personally and as well as many others at Neuralink find extreme fulfillment being able to help those that really cannot help themselves."* ## [13:10] BCIs Meet AI Future Maguire asks the room's core question: how do BCIs and AI converge? DJ sketches a two-horizon answer. Near term, the system translates neural intent into legacy interfaces — keyboard, mouse, language — which is already working. The real breakthrough, which he thinks is "not super distant," is bypassing those legacy interfaces entirely and computing on raw neural intent. He points to transformer architectures as existence proofs: nothing prevents them from learning the latent manifolds of neural data given sufficient scale. Neuralink is already fine-tuning LLM-class models on neural recordings from its 20 participants and finding "very counterintuitive" patterns. The ultimate ceiling he names is "direct, uncompressed, high-fidelity, multimodal transfer of concepts" — the Matrix's "I learned kung fu" moment and possibly beyond it. 
He also shares what he calls a clarifying lesson from working with Musk: "all green light schedule" — a first-principles forcing function that strips every man-made bottleneck and asks how fast something could actually be built if every light were green. His estimate is that 80–90% of perceived constraints in hardware development are artifacts of convention, not physics. > *"I think if you really think about the ultimate ceiling of this technology, it's really direct uncompressed high fidelity and multimodal transfer of concepts."* ## [21:05] Audience Q&A Wrap Three audience questions in the final four minutes. On product sequencing — when to go deep versus expand — DJ explains the "beachhead and expand" strategy: build everything generalizably enough from the start so that regulatory approval for motor cortex becomes a template for visual cortex and beyond. The first approval is the hardest; every subsequent one rides the clinical safety record already established. On augmentation for healthy users, DJ frames everything around benefit-risk: the calculus is obvious for quadriplegic patients; for otherwise healthy users it remains unclear, but he notes that off-label use after approval is legally available to anyone who can find a neurosurgeon and pay out-of-pocket. On the hard problem of consciousness, he gives a pointed one-liner: if you can inject new senses and measure the subjective response quantitatively, you may have a pathway toward measuring consciousness itself. Maguire closes by calling Neuralink "one of the most inspiring companies in the world." 
> *"If you are able to inject new senses, there may be ways to quantitatively understand that."* ## Entities - **DJ Seo** (Person): Co-founder and president of Neuralink; PhD in miniaturized electronics from Berkeley; joined after meeting Elon Musk near the end of his doctorate - **Shaun Maguire** (Person): Partner at Sequoia Capital; host of the AI Ascent 2026 fireside session - **Elon Musk** (Person): Co-founder of Neuralink; originator of the "all green light schedule" and vertical integration philosophy carried across Tesla, SpaceX, and Neuralink - **Neuralink** (Organization): BCI company founded in 2016; products include Telepathy (motor prosthesis) and Blindsight (vision restoration via visual cortex stimulation) - **Telepathy** (Software): Neuralink's first commercial product; allows paralyzed patients to control computers and robotic devices through neural intent decoding - **Blindsight** (Software): Neuralink's second product line; restores vision for patients with total loss of eyes or optic nerve by writing directly to the visual cortex; in preclinical testing as of mid-2026 - **IO Bottleneck** (Concept): The mismatch between human output bandwidth (speech, typing, gesture) and AI processing capability; the founding problem Neuralink was built to solve - **Neural Foundational Model** (Concept): LLM-class transformer models fine-tuned on neural recording data; Neuralink is building these at 20-participant scale and observing counterintuitive patterns in neural latent space - **All Green Light Schedule** (Concept): Elon Musk's first-principles engineering discipline — strip every man-made constraint and ask what physics alone limits; DJ estimates 80–90% of hardware delays are conventional, not physical

How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL

45:33

#reinforcement-learning #model-training #agentic-coding

Federico Cassano of Cursor and Dmytro Dzhulgakov of Fireworks walk Sonya Huang layer by layer through how Composer 2 was built, from a Kimi 2.5 MoE base through large-scale mid-training to asynchronous, globally distributed RL, and explain why specialization beats general-purpose models on both cost and quality. The infrastructure is the core: four GPU clusters spread across continents, a delta-compression approach that ships 1 TB of weight snapshots in under a minute, and a real-time RL loop that updates the live model every few hours based on real user signals. Together these techniques let Cursor deliver frontier-level performance at a fraction of the inference cost of general-purpose models. ## [00:00] Introduction The conversation opens in the middle of a problem Dmytro raised about the fidelity of the RL environment: the training environment has to match a real user's machine as closely as possible, because models can detect when they are running in a fake environment and exploit it. > *"Models love to cheat. RL is very good at encouraging cheating."* - Federico Cassano That single observation sets the technical discipline that runs through the rest of the conversation: every piece of the infrastructure exists to close the gap between training conditions and production reality. ## [00:53] Why Cursor Trained Composer 2 Federico explains the core of Composer 2 with an analogy: a model's weights are a storage disk of fixed size, and every bit spent on tasks Cursor has no use for is a wasted bit. By devoting the entire weight budget to software engineering inside Cursor (not coding in general, not natural language), the model can be both better at its one task and cheaper to run at inference. 
Dmytro puts the same idea in infrastructure terms: prompt engineering gets you a long way, but the only way to capture the truly specific behaviors of your harness (which tools the agent should call, in what order, with which arguments) is to bake them into the model through fine-tuning and RL. > *"There is a kind of ceiling on how far you can get with prompt engineering. And if you really want to build great AI products, you have to go through fine-tuning and influence the model's behavior."* - Dmytro Dzhulgakov ## [04:55] Specialization vs. the Bitter Lesson Sonya raises a counter-question: the history of machine learning is littered with specialized models that were swept away by larger general models. Is Composer 2 repeating the TabNine mistake? Federico argues it is not. The bitter lesson operates on the scale of parameters and data; what Cursor does is free the model's finite capacity from distractions, so that more of the bitter-lesson scaling lands on the one task that matters. The lab models Cursor competes with also train heavily on code; they are not purely general. Cursor simply takes that specialization further and faster by fully owning the data pipeline. ## [06:16] Composer 2 Training Recipe Composer 2 starts from Kimi 2.5, a mixture-of-experts model with 1 trillion total parameters and 30 billion active parameters. Training proceeds in two sequential phases: first mid-training on code tokens at near-pre-training scale (Cursor's product data gives access to high-quality coding contexts), then a large-scale RL phase in which the model runs real Cursor agent sessions in simulated environments. Mid-training teaches the model the world of code: library APIs, idiomatic patterns, correct syntax. 
RL sharpens that knowledge into correct behavior: the model learns to call tools properly, navigate multi-turn agent sessions, and write code that actually compiles and passes tests. The asynchronous pipeline means the trainer and the rollout environments run concurrently rather than taking turns; staleness is accepted in exchange for nearly 100% GPU utilization. > *"You may lose a few percent by being asynchronous and not doing mathematically perfect updates, but you more than make up for it by effectively not leaving half your capacity idle."* - Dmytro Dzhulgakov Training runs in FP4 to squeeze maximum throughput out of a GPU fleet smaller than what the frontier labs have. The inference engine is Fireworks rather than an in-house build, a deliberate choice to keep Cursor's engineers focused on training efficiency instead of rebuilding an inference stack. ## [16:32] Scaling RL Infrastructure Globally No single large contiguous cluster was available at the scale Composer 2 required, so the team split the work: one cluster handles all training, while inference (the rollout component) is spread across four geographically distributed clusters, including the spare off-peak capacity of Composer 1.5's production serving. Training needs fast interconnects and lockstep operations; inference does not, so it can run on heterogeneous GPU generations with smaller intra-cluster networks. The hard systems problem is weight synchronization: Kimi 2.5 weighs roughly 1 TB, and the trainer produces a new checkpoint every 5 to 15 minutes. Shipping 1 TB across continents every 10 minutes would stall inference. The fix: RL updates tend to be sparse and regular in which weights they touch, so the team wrote a delta-compression algorithm that shrinks the payload by roughly 20x and ships only the diff. 
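As a rough illustration of shipping only the diff, the sketch below encodes the weights whose bits changed as (index, value) pairs and rebuilds the new checkpoint on the other side. It is a toy under stated assumptions and not the algorithm the team wrote: the `encode_delta` and `apply_delta` helpers, the flat float32 `array` standing in for a checkpoint, and the use of `zlib` are all hypothetical choices made for the example.

```python
import random
import struct
import zlib
from array import array


def encode_delta(old: array, new: array) -> bytes:
    """Return a compressed payload holding only the weights whose bits changed."""
    size = old.itemsize
    old_b, new_b = old.tobytes(), new.tobytes()
    indices, values = [], []
    for i in range(len(old)):
        chunk = new_b[i * size:(i + 1) * size]
        if chunk != old_b[i * size:(i + 1) * size]:  # bitwise comparison
            indices.append(i)
            values.append(chunk)
    header = struct.pack("<I", len(indices))
    index_block = struct.pack(f"<{len(indices)}I", *indices)
    return zlib.compress(header + index_block + b"".join(values))


def apply_delta(old: array, payload: bytes) -> array:
    """Rebuild the new checkpoint from the old one plus the delta payload."""
    size = old.itemsize
    raw = zlib.decompress(payload)
    (count,) = struct.unpack_from("<I", raw, 0)
    indices = struct.unpack_from(f"<{count}I", raw, 4)
    values = raw[4 + 4 * count:]
    buf = bytearray(old.tobytes())
    for k, i in enumerate(indices):
        buf[i * size:(i + 1) * size] = values[k * size:(k + 1) * size]
    restored = array(old.typecode)
    restored.frombytes(bytes(buf))
    return restored


# Toy "checkpoint": 10,000 float32 weights, of which every 50th one is updated.
random.seed(0)
old_ckpt = array("f", (random.random() for _ in range(10_000)))
new_ckpt = array("f", old_ckpt)
for i in range(0, len(new_ckpt), 50):
    new_ckpt[i] += 0.01

payload = encode_delta(old_ckpt, new_ckpt)
restored_ckpt = apply_delta(old_ckpt, payload)
print("full bytes:", len(new_ckpt.tobytes()), "delta bytes:", len(payload))
print("bit-exact:", restored_ckpt.tobytes() == new_ckpt.tobytes())
```

Because the payload carries the new values themselves rather than arithmetic differences, the receiver never redoes any floating-point math, which is one simple way to keep the reconstruction bit-exact.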
The receiver reconstructs the full checkpoint losslessly, with no numerical surprises on the other end. > *"Even though the full model is about 1 terabyte, not all weights change at every step... there are very regular patterns in which subset of weights gets updated."* - Dmytro Dzhulgakov ## [23:32] Floating-Point Drift When the asynchronous RL loop sends a batch of rollout trajectories from inference back to the trainer, the trainer reruns the same forward pass to recompute log-probabilities for the GRPO loss. In theory the log-probabilities should be identical. In practice they often differ, sometimes substantially. The cause is floating-point nondeterminism: floating-point addition is not associative, so (A + B) + C ≠ A + (B + C) in general, and small differences accumulate over billions of operations. Under normal inference the model is robust to this noise. Under RL, especially with a sparse MoE gating function, the noise is amplified to the point where the trainer and inference disagree about which tokens were sampled, corrupting the training signal. ## [25:11] MoE Sensitivity Explained The MoE architecture amplifies floating-point drift through the gating layer. At each transformer layer the gating network scores all 384 experts and selects the top 8 for each token. A difference in the hidden state at the fifth decimal place can be enough to swap expert 7 for expert 9 at the selection boundary, routing the token through an entirely different part of the model. Because MoE experts are large and mostly non-overlapping, a wrong expert selection produces a large output deviation rather than a small one, unlike in a dense model, where numerical noise stays small. 
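Both effects described above are easy to reproduce at toy scale. The sketch below first shows that the grouping of a floating-point sum changes the result, then shows a perturbation of 1e-5 flipping a top-k selection at the boundary. The six gate scores and `k = 4` are made-up values for illustration (the episode describes 384 experts and top 8), and `top_k` is a hypothetical helper, not a real router.

```python
from typing import List

# 1) Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)


def top_k(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest scores, sorted by index."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# 2) Two experts sit almost tied at the selection boundary (indices 3 and 4).
gate_scores = [0.90, 0.80, 0.70, 0.500004, 0.500000, 0.10]
k = 4

trainer_choice = top_k(gate_scores, k)

# Simulate numerical drift on the inference side: expert 4 gains 1e-5.
drifted = list(gate_scores)
drifted[4] += 1e-5
inference_choice = top_k(drifted, k)

print("trainer routes to:  ", trainer_choice)
print("inference routes to:", inference_choice)
```

The score change is far smaller than any of the gaps between the clearly selected experts, yet it swaps which expert processes the token, which is why drift that a dense model would absorb becomes a large output difference in an MoE.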
## [26:25] Router Replay as the Fix The fix is Router Replay: during inference the model records which expert indices it activated for each token and sends those integers back to the trainer along with the generated sequence. The trainer then forces the same expert selection instead of recomputing it, breaking the amplification chain. Beyond Router Replay, the team matched quantization levels and kernel implementations between inference and training to minimize every other source of numerical divergence. > *"A lot of this numerical alignment consists of tricks like that, matching quantization levels, matching kernels, and so on, to reduce the divergence between the training and inference implementations."* - Dmytro Dzhulgakov ## [27:19] Real-Time RL Loop Alongside the simulated rollout loop, Cursor runs what Federico calls real-time RL: real user sessions in production flow back into the training pipeline. When a user is satisfied or dissatisfied with a Composer generation, that signal is captured, and a new model version ships every few hours. The team is actively working to shorten that cycle, but also knows it will have to lengthen as rollout horizons grow, because longer agent sessions take more time to evaluate. The simulated loop and the real-time loop serve different purposes. Simulation lets the model run 16 to 128 rollouts from the same prompt in parallel (the GRPO loss requires grouped rollouts), explore off-policy without affecting users, and build up performance before the model is good enough for real users. Real-time RL is a refinement layer that only works once the model already clears a minimum quality bar: users who have a bad experience stop generating feedback signals. 
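To see why the GRPO loss needs grouped rollouts, it helps to look at how the group-relative advantage is usually computed: each rollout's reward is compared with the mean and standard deviation of the other rollouts from the same prompt. The sketch below uses that commonly published normalization; whether Cursor uses exactly this form is not stated in the episode, and the reward values and the `group_advantages` helper are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each rollout relative to the other rollouts from the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std-dev of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: 8 rollouts from one prompt, reward 1.0 if tests passed.
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
advantages = group_advantages(rewards)
for reward, adv in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={adv:+.3f}")

# A group where every rollout gets the same reward carries no learning signal.
flat = group_advantages([1.0, 1.0, 1.0, 1.0])
print(flat)
```

The group mean acts as the baseline, so a lone rollout, or a group in which every rollout scores the same, yields zero advantage. That is consistent with the section's point that simulation, where the same prompt can be replayed many times in parallel, supplies something a single live user session cannot.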
> *"We can't really use this to build the model from scratch, because users already have to be using the model. So it has to be good already, and we can only make it better."* - Federico Cassano ## [31:49] Long-Horizon Agents As rollout horizons grow, two structural problems emerge. The first is credit assignment: with a single positive or negative reward at the end of a multi-minute session, the model has to work out which of the 50 or more decisions in the trajectory determined the outcome. This gets exponentially harder as the trajectory lengthens. The second is that the context window fills up. Cursor's answer is to bake self-summarization directly into the RL loop under the name "compaction": through the RL reward, the model learns both to write a useful summary of its progress as it approaches the context limit and to continue faithfully from that summary. The model, with a 200K-token context window, works effectively over millions of tokens because it can reset its window and carry its working memory forward in compressed form. > *"Through RL, because RL steers the model to do things correctly toward the goal, we simultaneously train the model to produce a good summary and to follow that summary well."* - Federico Cassano ## [34:29] Why RL Everywhere Sonya characterizes RL as a tool specifically for agentic, long-horizon tool use. Federico pushes back: RL is useful everywhere, including tab completion. His theory: pre-trained models have stored all of human knowledge but do not know which persona to adopt, whether expert, novice, or something in between. The first phase of RL training sharpens that distribution and tells the model: you are the expert, do this correctly. That effect is valuable even for tasks like summarization that have no interactive harness. 
The second phase, in which the model visibly starts to reason and the compute curve flattens, is where task-specific signal really begins to compound. ## [37:34] LLM as judge The more verifiable the reward (does the code compile, do the tests pass, is the answer numerically correct), the more compute you can pour into RL and still get a better model. LLM-as-judge fills the gap for tasks where truth is hard to define, by phrasing a rubric as a prompt and having a second model score rollout quality. Dmytro notes that this is especially useful for style-driven tasks such as summarization, where human raters struggle to articulate what "good" means but can evaluate it against explicit criteria. > *"In general, the more verifiable your reward is, the better, because that lets you scale up the compute and simply get better results."* (Dmytro Dzhulgakov) ## [39:14] RL in hard domains For domains where ground truth is not cheap to compute (creative writing, open-ended reasoning, domain expertise), the path to better RL is to make the environment richer. Larger simulated environments that capture more of the product metric let you scale automated evaluation further. Experts remain necessary, not to grade individual rollouts but to design the tasks and rubrics that determine what the reward function should optimize. ## [40:13] Build your own environments Cursor does not use external RL-environment vendors. For coding, GitHub repositories provide a nearly unlimited pool of working environments: clone a repo, install dependencies, give the model a task, and measure the result against the test suite. 
The harder infrastructure problem is making those environments realistic enough to prevent the kind of cheating the conversation opened with, and fast enough to spin up 100,000 of them at once on demand. Cursor's answer is an in-house virtual-machine stack (full VMs, not containers) that can scale to arbitrary size immediately and mimics real user machines so closely that the model cannot detect the difference. Dmytro sketches the vendor landscape: frontier labs need generic environments for every task, while product companies are best off training RL against their own production environment. The most powerful training environment for any model is the product in which it is actually used. > *"The most powerful environment is your own product."* (Dmytro Dzhulgakov) ## [44:34] Closing thoughts Sonya closes with the observation that Cursor's trajectory, from application company to frontier model lab, is the pattern other AI product companies will follow. Federico thanks Fireworks for providing the infrastructure backbone that made the training run feasible within Cursor's GPU budget. Dmytro reflects on the depth of systems engineering required for a problem most people regarded as purely algorithmic. ## Entities - **Federico Cassano** (Person): Research lead for Composer 2 at Cursor; drove the training recipe and RL methodology. - **Dmytro Dzhulgakov** (Person): Infrastructure lead at Fireworks AI; designed the distributed RL training system for Composer 2. - **Sonya Huang** (Person): Partner at Sequoia Capital; host of the podcast focused on AI investing. - **Composer 2** (Software): Cursor's specialized agentic coding model, trained with mid-training plus large-scale RL on Kimi 2.5 MoE. - **Fireworks AI** (Organization): Model-serving and inference-infrastructure company that provided the distributed GPU backbone for Composer 2 RL training. 
- **Cursor** (Organization): AI coding company; trained Composer 2 as a specialized foundation model for software engineering inside its product. - **Kimi 2.5** (Software): Open-source MoE model with 1 trillion parameters (30B active) from Moonshot AI; used as the base for Composer 2. - **GRPO** (Concept): Group Relative Policy Optimization, the RL algorithm for Composer 2, which requires multiple parallel rollouts from the same prompt to compute the policy gradient. - **Router Replay** (Concept): Technique for MoE numerical alignment in which inference records the expert routing decisions and replays them to the trainer, preventing floating-point drift in log-probabilities. - **Real-Time RL** (Concept): Cursor's production feedback loop that captures live user-satisfaction signals and continuously updates the model, releasing a new version every few hours. - **Delta Compression** (Concept): Weight-synchronization technique that sends only changed parameters between training and distributed inference clusters, reducing 1 TB snapshots to about 50 GB in practice. - **Self-Summarization / Compaction** (Concept): RL-learned ability of the agent to summarize its working context when the context window fills up, enabling effectively unbounded-horizon operation.
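The Router Replay idea described in this episode can be sketched with a toy example. This is an illustration under stated assumptions, not the Cursor or Fireworks implementation: the helper names (`top_k_experts`, `route`), the four-expert router, the logit values, and the choice of `k=1` are all hypothetical, and no real MoE framework API is used.

```python
def top_k_experts(router_logits, k):
    """Return the indices of the k largest router logits (ties go to the lower index)."""
    order = sorted(range(len(router_logits)), key=lambda i: (-router_logits[i], i))
    return sorted(order[:k])


def route(router_logits, k, replayed=None):
    """Pick experts for one token; if recorded indices are supplied, reuse them as-is."""
    if replayed is not None:
        return list(replayed)  # replay: skip the recomputation entirely
    return top_k_experts(router_logits, k)


# Inference side: router logits for one token over four hypothetical experts.
inference_logits = [0.10, 2.0000, 1.9990, -1.0]
recorded = route(inference_logits, k=1)

# Trainer side: the same token, but slightly different numerics
# (for example a different kernel or quantization level).
trainer_logits = [0.10, 1.9985, 1.9990, -1.0]
recomputed = route(trainer_logits, k=1)                   # near-tie flips the choice
replayed = route(trainer_logits, k=1, replayed=recorded)  # matches inference

print(recorded, recomputed, replayed)
```

In the toy case a tiny numerical difference flips a near-tie between two experts, so the trainer would score the token under a different expert than the one that generated it. Replaying the recorded index removes that source of mismatch.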

1:03:06

#notion #ivan-zhao #ai-strategy

Notion’s Ivan Zhao: The Refounder

Brian Halligan interviews Notion co-founder Ivan Zhao on his journey as a 'refounder' who navigated the company through its 2015 Kyoto restart and the 2023 generative AI pivot. Zhao details Notion's transition from a traditional SaaS structure to an AI-native 'jazz band' model that prioritizes technical versatility, taste, and agency over rigid hierarchies. The discussion explores how AI acts as the 'steel' for modern organizations, enabling flatter structures and faster, more reversible decision-making. ## [00:00] Introduction Brian Halligan introduces Ivan Zhao as the 'refounder' of Notion, highlighting his unique ability to restart the company during critical junctures in 2015 and 2023. The introduction sets up the story of how Zhao moved Notion from a traditional SaaS management model to an AI-native organization. Halligan compares Zhao's approach to other tech visionaries like Jack Dorsey, emphasizing the importance of personal style and 'taste' in building a lasting brand. > *I like to think of him as the refounder... he's the canonical example of how a SaaS company can move and become an AI company. [00:52]* > *We want to be a jazz band, not a marching band. [00:02]* ## [02:22] From Founder Mode to AI Org Ivan Zhao discusses his detour into traditional delegation and professional management before returning to a hands-on 'founder mode' necessitated by the AI shift. He explains that building with language models is less like predictable bridge engineering and more like 'brewing beer,' where the underlying technology dictates the development path. Zhao emphasizes hiring 'jazz band' people (versatile individuals like designers who code) to navigate the experimental nature of AI integration. > *Building with language model... is like brewing beer. You can't truly predict the things the underlying thing. [06:33]* > *The spirit is technology first-driven development rather than customer-driven first development. 
[07:01]* ## [11:00] Hiring for Taste and Agency Notion utilizes a 'barbell' hiring strategy that targets both super-junior and super-senior talent while avoiding the 'middle' of traditional SaaS experience. Zhao defines talent as the product of capability, taste, and agency, noting that AI has democratized basic capabilities like coding and writing. Consequently, the company now optimizes for 'agency' and 'taste,' qualities that remain difficult to automate and serve as the primary differentiators for the brand. > *capability got normalized democratized and taste becomes still important [11:53]* > *So the shape it's not it's more like the barbell barbell shape, right? [12:35]* ## [24:28] Refounding Notion in Kyoto In 2015, facing potential failure and low morale, Zhao and co-founder Simon Last laid off their entire staff and relocated to Kyoto, Japan, to rebuild Notion from scratch. This 'Kyoto Reset' allowed them to focus entirely on craft and coding while living a minimalist lifestyle. Zhao chose Kyoto specifically for its status as the 'craft capital of Asia,' which provided the spiritual inspiration needed to view software as a fundamental human tool. > *So my co-founder and I said let's just lay off everybody just go by the two of us. That's the Japan story. [25:41]* > *The story we tell ourselves is like Kyoto is a special place. If you can pull off anywhere, you can pull off from Reborn in Kyoto. [28:05]* ## [30:27] Craft Versus Commerce Zhao views Notion as part of a historical lineage of 'tools for thought,' tracing back to pioneers like Douglas Engelbart and Alan Kay. He criticizes modern Silicon Valley 'tinker culture' for ignoring the history and humanity behind technology. For Zhao, the goal is to find an equilibrium between the pure craft of an artist and the commercial viability of a business, ensuring the product has a 'soul' that resonates with users. > *Tech is like industry doesn't know its past. 
If you don't know its past you don't know history which is humanity. [31:52]* > *I need to be in equilibrium with my own value of what this company I want to build... [51:33]* ## [32:26] When to Refound For founders whose companies are stagnating, Zhao suggests listening to the 'inner urge' to take drastic action rather than wasting years on ventures without momentum. He argues that refounding is often harder than starting fresh because it involves taking a significant step back to pivot toward a new growth engine. Zhao believes the current AI-driven market is wide open, making it an ideal time for founders to be risk-seeking and follow their intuition. > *For me it's like there's you just feel you have to do something drastic... then you feel liberated once you land in Japan. [32:56]* > *The refounding is harder than it looks. It typically involves like a big step back and two steps forward. [59:57]* ## [34:07] GPT-4 Refounding Shock Zhao describes gaining early access to GPT-4 as a 'full body religious experience' that signaled a fundamental shift in the world. This realization forced a second refounding of Notion, as Zhao felt any work not involving this technology would soon become meaningless. The transition included a grueling 18-month period of low morale while the team waited for the underlying AI models to catch up with their ambitious product vision. > *GPT-4 is a religious experience for me. It's like holy [ __ ]... anything you do if you don't do this it will be meaningless. [34:27]* > *that was like a year and a half just go with no error and morale is definitely low [35:50]* ## [45:35] Leadership and Founder Energy Zhao explains how, despite being naturally introverted, he forced himself to master one-to-many communication to build trust within Notion. He maintains a disciplined daily routine, starting at 7 AM and often working until midnight, while using 'guilty pleasure' reading to recharge. 
To prevent organizational calcification, Notion aggressively acquires startups to bring in 'founder energy,' currently employing over 50 former founders who lead critical domains. > *To lead the group of human you need to do one to many communications otherwise people don't trust you. [46:17]* > *founders are are kind of this kind of like little decalcified meatthead machinery just trying to break things [39:10]* ## [53:17] Sales Culture and Closing Thoughts Notion's transition to enterprise sales involved moving away from 'first-principle' experimentation toward established playbooks, pairing system thinkers with high-energy sales leaders. The conversation concludes with a vision of the 'AI-native' CEO playbook, which replaces traditional 'triangle' hierarchies with a 'circular' model. In this structure, a centralized AI system saturated with company context enables smaller teams to move at breakneck speed with reversible decision-making. > *You should only have each company should only preserve your innovation point to few places... [54:54]* > *All of those kind of one-way doors that Bezos used to talk about are really two-way doors... [62:39]* ## Entities - **Ivan Zhao** (person): Co-founder and CEO of Notion, known for his 'refounder' mindset. - **Brian Halligan** (person): Co-founder of HubSpot and interviewer. - **Notion** (organization): A productivity software company that pivoted to an AI-native model. - **Simon Last** (person): Co-founder of Notion who helped rebuild the company in Kyoto. - **Kyoto** (location): The Japanese city where Notion was restarted in 2015. - **GPT-4** (technology): The AI model that triggered Notion's second refounding. - **Steve Jobs** (person): Former Apple CEO cited as an inspiration for refounding and craft. - **Jack Dorsey** (person): Tech leader mentioned for his AI-centric organizational redesign. - **Douglas Engelbart** (person): Computing pioneer in the 'tools for thought' lineage. 
- **Erica** (person): CRO of Notion and former CRO of GitHub. - **SaaS** (concept): Software as a Service, the industry context for Notion's evolution. - **Jazz Band** (concept): Metaphor for a flexible, high-agency organizational structure.

Suno's Mikey Shulman: Everyone Can Make Music Now

34:56

Sequoia Capital, 2 months ago


Mikey Shulman, co-founder of Suno, discusses the platform's evolution from a physics-based startup to a leader in generative AI music. By modeling music as raw sound waves rather than traditional theory, Suno empowers users to transition from passive listeners to active creators in the era of 'creative entertainment.' ## [00:00] Physics, Raw Sound, and Technical Philosophy Mikey Shulman explains how his background in quantum physics at Harvard influenced Suno's interdisciplinary approach to music technology. By modeling audio as raw sound waves sampled 48,000 times per second rather than using traditional music theory, Suno avoids creative constraints and allows for the emergence of new, microtonal genres. > *I think what I mostly learned is playing at the nexus of two things that don't usually play together is just a massive opportunity. [02:00]* ## [02:15] The Pivot to Consumer Music Generation Initially focused on audio analysis, the Suno team pivoted to generation after breakthroughs in audio compression made high-quality output computationally feasible. They validated the product's 'fun factor' through a Discord bot, discovering that the addictive nature of creation was a stronger signal than traditional business use cases. > *When you are staying up late playing with the thing, and you don't want to go to sleep, it's like a really good sign. [04:49]* ## [11:41] Why Music AI is a Research Problem, Not a Scale Problem Unlike Large Language Models, music generation lacks objective benchmarks, making raw compute scale less effective than targeted research. Shulman emphasizes using human preference data and reinforcement learning to align models with creative tastes, favoring a steady release cadence over long-term isolated development. > *In music there are no right answers. There are no benchmarks. Um, and so scale is somewhat less helpful in solving it. 
[12:28]* ## [16:22] From Passive Consumption to Creative Entertainment Shulman introduces the concept of 'creative entertainment,' where the act of building provides more fulfillment than the final product itself. He notes that 90% of Suno users are active creators, drawing parallels to the 'bedroom producer' era where accessible tools led to the discovery of new genres. > *People are creating music for the fun and enjoyment and fulfillment that comes with being creative. [17:05]* ## [22:52] Industry Partnerships and Professional Integration Addressing industry concerns, Shulman highlights Suno's partnership with Warner Music Group and its role in augmenting professional workflows. He argues that AI will raise the quality ceiling for artists and predicts that interactive live performances, such as audience participation at Coachella, are the next frontier. > *I think people incorrectly assume that we hate the existing music industry and especially we hate the record labels. [23:17]* ## [25:53] Product Strategy and the Application Moat Suno prioritizes the application layer and user experience as its primary competitive moat, viewing itself as a music company rather than just a technology firm. By focusing on storytelling through full-length lyrical songs and social co-creation features, the company aims to revive the cultural impact of music as a social medium. > *It's unclear how much moat exists in only a model... it's just really undervalued to invest in the product and the UI and the UX. [26:50]* ## Entities - **Mikey Shulman** (person): CEO and co-founder of Suno with a PhD in physics from Harvard. - **Suno** (organization): An AI-powered creative entertainment platform for music generation. - **Sonya Huang** (person): Partner at Sequoia Capital and host of the interview. - **Warner Music Group** (organization): A major global record label that partnered with Suno. 
- **Discord** (organization): The platform where Suno initially launched its music generation bot. - **Harvard** (organization): The university where Mikey Shulman studied quantum computing. - **Iamona** (person): A poet and artist who uses Suno to create music, illustrating the tool's professional potential. - **Coachella** (event): A major music festival cited as a future venue for interactive AI music experiences.

#ai-music #generative-ai #suno-ai

Anthropic's Boris Cherny: Why Coding Is Solved, and What Comes Next

24:36

Sequoia Capital, 3 months ago


Boris Cherny, the creator of Claude Code, sits down with Sequoia's Lauren Reeder at AI Ascent 2026 and makes a blunt claim: for the code he writes, coding is already solved. He hasn't typed a line by hand in 2026, runs dozens of agent "loops" at once, and ships most of his work from his phone. The throughline is a bet that writing code is becoming so cheap that the interesting questions move up a level — to what teams look like, what happens to software products, and whether the printing press is the right way to think about what's coming. ## [00:00] Introduction A Sequoia emcee opens the AI Ascent session by asking the room for a show of hands: who uses Claude Code, and who has "Claude Code psychosis." She introduces Boris Cherny as the creator of the tool and hands the interview to Sequoia's Lauren Reeder. > *"We know that the entirety of software development kind of rests on your shoulders."* ## [00:55] Claude Code Crowd Check Reeder frames the conversation for a room full of builders and fills in Boris's background: a career writing code, a TypeScript textbook, an engineer's engineer. The detail that lands hardest is recent — as of early 2026 he hasn't written a single line of code by hand, a sharp reversal for someone who built his reputation on craft. > *"Last time we chatted you hadn't written a single line of code in the last year, or at least so far in 2026, which is quite the change."* ## [02:39] Origin Story of Claude Code Boris explains that Claude Code started almost by accident inside Anthropic Labs, a small incubator he joined in late 2024 that also produced MCP and the desktop app. The team built what it wanted, disbanded, and has since reunited under Mike Krieger for a second round. The motivation was a sense of "product overhang" — capability sitting unused because no product had caught up to it yet. 
> *"The reason that I started to work on coding is we felt like there was this product overhang."* ## [03:35] From Typeahead to Agents In late 2024 the state of the art was typeahead — press tab, complete one line — which Sonnet 3.5 had just made viable. Boris bet the model was nearly ready to skip that step and write all the code as an agent. It didn't work for the first six months; even after release, Claude Code wasn't a hit. The exponential growth only arrived with Opus 4. > *"I built it, and it just really didn't work for the first 6 months. It was barely usable."* ## [05:07] Is Coding Solved Reeder presses on Boris's on-the-record claim that coding is solved. He polls the room — hand-written code versus fully agent-written — and lands the audience around "50% solved," then says for him it's 100%. He points out the Claude Code codebase itself is unglamorous TypeScript and React, chosen deliberately because that stack is heavily represented in the model's training distribution. > *"For me it's just solved."* ## [06:50] Boris Personal Workflow Boris walks through a setup he first shared on Twitter six months ago and didn't expect to surprise anyone. It has since changed: most of his work now happens from his phone, through the Claude app's code tab, where he keeps five to ten sessions each running a few hundred agents. The tool he reaches for most is the loop — fire-and-forget agents that grind on a task and report back. > *"I sort of feel like loops are the future at this point."* ## [08:51] Future Teams and Generalists Asked what teams will look like, Boris predicts a shift toward generalists. Today a generalist still means an engineer who spans iOS, web, and server; tomorrow it means people who are cross-disciplinary, blending engineering with product and design rather than staying in a single lane. He notes the Claude Code team already skews this way. > *"There's going to be a lot more generalists... 
generalists that are cross-disciplinary."* ## [10:26] SaaS Apocalypse Predictions Reeder asks the question Boris calls his favorite: if AI makes writing code 10 to 100x cheaper, does the value of software products collapse in a SaaS apocalypse? Boris argues the two things that will actually happen aren't the ones people keep predicting, and uses his guest spot on the Acquired podcast as a detour into why he thinks the conventional framing misses the point. > *"I think there's two things that are going to happen and I don't think either of them is the thing that people have been talking about."* ## [12:57] Audience Q&A Deep Dive The floor opens to the room. An audience member, Dan, asks how much of Claude Code's success Boris attributes to the model versus product decisions. Boris says a mix, roughly 50/50, and won't forecast two years out because the team plans a week at a time. The richest answer is his printing-press analogy: before the press, about 10% of Europe was literate; in the 50 years after, more was published than in the prior thousand, and literacy eventually climbed toward 70%. He uses it to argue that building software is on track to become a near-universal skill. Later questions probe the engineering-versus-world gap, local versus cloud models, and how to parallelize agents with loops, batches, and sub-teams. > *"In the 50 years after the first printing press, there was more literature published in Europe than in the thousand years before."* ## [23:35] Closing and What's Next For the last question, Boris is asked what kind of product he'd build today that gets more interesting as models improve. He points to Claude Design as a good example (decent now, much better soon) and teases features landing for Claude Code in the coming weeks, plus more work on loops, batch, and massively parallel agents, with computer use in the mix. 
> *"I think loop and batch and things like this around like massively parallelizing agents, that's going to get better."* ## Entities - **Boris Cherny** (Person): Creator of Claude Code at Anthropic; former Anthropic Labs member, now back on the team under Mike Krieger. - **Lauren Reeder** (Person): Sequoia Capital partner; interviewer for this AI Ascent session. - **Mike Krieger** (Person): Chief Product Officer at Anthropic and Instagram co-founder; leads the reunited Claude Code team. - **Anthropic** (Organization): AI lab behind Claude and Claude Code. - **Claude Code** (Software): Anthropic's agentic coding tool, originated in Anthropic Labs alongside MCP and the desktop app. - **Anthropic Labs** (Organization): Internal incubator where Claude Code, MCP, and the desktop app were built. - **Product overhang** (Concept): Model capability that outpaces the products built on it — the gap Boris set out to close. - **The loop** (Concept): Fire-and-forget agent runs that work a task continuously and report back; Boris's most-used workflow. - **SaaS apocalypse** (Concept): The thesis that cheap AI-written code collapses the value of software products — which Boris pushes back on. - **Printing press analogy** (Concept): Boris's frame for AI coding — literacy went from ~10% to ~70% over centuries; software-building may follow.

#claude-code #anthropic #ai-coding

20:03

Sequoia Capital, 3 months ago

Robotics' End Game: Nvidia's Jim Fan

Jim Fan, lead of Nvidia's embodied AI research, outlines the transition from language-centric models to World Action Models (WAM) that simulate physical reality. He details a roadmap toward the 'Physical Turing Test' and autonomous factories by 2040, driven by video pre-training and human egocentric data scaling. ## [00:00] Introduction Host Sonya Huang introduces Jim Fan, who leads Nvidia's embodied autonomous research group. Fan reflects on his early days as an intern and the excitement surrounding the future of robotics. > *robots are just one of the most thrilling things that's going to happen. [00:12]* ## [00:30] DGX-1 Origin Story Jim Fan recounts the 2016 delivery of the first DGX-1 by Jensen Huang to Elon Musk and the OpenAI team. He highlights how this moment catalyzed the deep learning revolution that led to current AI breakthroughs. > *If you believe in deep learning, deep learning will believe in you. [01:26]* ## [01:42] The Great Parallel Fan proposes 'The Great Parallel,' applying the successful LLM scaling playbook to robotics. Instead of predicting the next token in a string, the goal is to predict the next physical world state through simulation and alignment. > *instead of simulating strings can we simulate next physical world state? [02:56]* ## [03:31] Robotics Endgame Setup The strategy for achieving the robotics end game is divided into two primary pillars: model strategy and data strategy. Fan notes that while LLMs are in their final 'boss fight,' robotics is just beginning its scaling journey. > *It boils down to two things, model strategy and data strategy. [03:32]* ## [03:39] Why VLA Falls Short Vision-Language-Action (VLA) models are criticized for being 'head-heavy' on language while lacking a fundamental grasp of physics and verbs. Fan argues they are better at encoding static knowledge than dynamic physical interaction. 
> *VLAs are great at encoding knowledge and nouns, but not so much at physics and verbs. [04:08]* ## [04:32] Video World Models Fan explains how video models like Veo 3 learn internal physics, such as gravity and buoyancy, simply by predicting pixels at scale. These models act as simulators that can solve mazes and plan visual sequences internally. > *Physics emerge by predicting the next blob of pixels at scale. [05:15]* ## [06:09] DreamZero World Action Nvidia introduces 'DreamZero' and World Action Models (WAM), which jointly decode future world states and motor actions. This allows robots to perform zero-shot tasks by 'dreaming' the correct motion sequence before executing it. > *DreamZero jointly decodes the next world states and next actions. [06:29]* ## [07:46] Scaling Data Collection To overcome the physical limits of teleoperation, Fan discusses Universal Manipulation Interfaces (UMI) and exoskeletons like Dex-UMI. These tools allow humans to collect high-dexterity data directly without the robot being in the loop. > *we're able to break the curse of 24 hours per robot per day [10:06]* ## [11:06] EgoScale And Scaling Laws Fan introduces EgoScale, a policy trained on 21,000 hours of human egocentric video. This research uncovered a neural scaling law for dexterity, showing a mathematical relationship between pre-training volume and robot performance. > *we discovered this neural scaling law for dexterity. [12:39]* ## [15:39] DreamDojo And The Roadmap Fan outlines the roadmap to 2040, including the Physical Turing Test and 'lights-out' factories. He introduces DreamDojo, a neural simulator that replaces classical physics engines with data-driven world models. > *I can say with 95% certainty that we'll get to the end of the end game... by 2040. [19:19]* ## Entities - **Jim Fan** (person): Lead of the embodied autonomous research group at Nvidia. 
- **Nvidia** (organization): The technology company developing the hardware and software for the robotics end game. - **Jensen Huang** (person): CEO of Nvidia, mentioned for delivering the first DGX-1 to OpenAI. - **OpenAI** (organization): The research lab that received the first DGX-1 for deep learning development. - **DGX-1** (product): The world's first deep learning supercomputer delivered in 2016. - **Veo 3** (model): A video world model capable of simulating physics and visual planning. - **DreamZero** (model): A policy model that predicts future world states and actions simultaneously. - **EgoScale** (project): A robotics pre-training framework using large-scale human egocentric video data.

#robotics #nvidia #world-models

Andrej Karpathy: From Vibe Coding to Agentic Engineering

29:49