Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning
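Oriol Vinyals (Google DeepMind VP of Research and Gemini co-lead) sits down on day two of Google I/O and lays out, one by one, the research bets behind the products announced at I/O: why world models are Google's distinctive path toward AGI, what the "GPT moment" for video and images looks like, why Spark and agent systems have to be optimized jointly with the model, why scaffolding will eventually be written by the model itself, why memory should live in a non-parametric file system rather than being packed into the weights, along which dimensions today's RL is data-limited, why training on math and code transfers unexpectedly well, and the trade-offs in research bets inside Google after the Brain + DeepMind merger.

## [00:00] Intro

Jacob spends 60 seconds setting up Oriol's background (Gemini co-lead, alongside Noam Shazeer and Jeff Dean) and the advantage of recording on day two of I/O: every launch is still fresh, so the conversation can trace each announcement straight back to the research behind it. Oriol joins and says hello, and the two warm up.

> *"I've been really excited for this because you're one of the people kind of most directly shaping the frontier of AI."*

## [01:36] Why World Models

Jacob opens by asking "why world models?" Oriol splits the answer into two layers: one is the self-improvement / coding angle, and the other is what the model itself operates on: multimodal, covering not just coding but also video and images, that is, a "world model." Google bet on the image and video path early, and this time it "clearly bet right," because we have in effect put the whole world onto the internet.

He also admits that for a while this path looked unfashionable: multimodal models were sidelined during the LLM boom. But video and images hold knowledge that language cannot capture. "The GPT moment for video" has not truly happened yet, but the inflection point is in sight.

> *"There is lots of knowledge in videos and images, and what I would say is the GPT moment for that, I'm not sure we quite have seen that."*

## [04:21] The GPT Moment for Video

Oriol uses Omni (Google's multimodal product line) as the anchor: from simply feeding video into the context window to understanding and generating video over long contexts, the curve has already been steep. The next question is whether a model can, like an LLM, pretrain on pure image data with no paired text and still extract all of the meaning and detail. Once that hard challenge is solved, the available data jumps from "what humans have described" to "all video," a huge difference in scale.

He specifically acknowledges that labeled data for video is still scarce relative to images, but says the payoff from unlocking it would be "very large."

> *"Whether we agree with that or not is another question, but if it was to be unlocked, it would be massive."*

## [07:51] What Makes Omni a World Model

The term "world model" is overused, so Oriol gives a clear definition: a pure world model must do representation learning, compressing the world into a compact representation. On top of that, Omni goes further and becomes a language-driven renderer: change the prompt in natural language and the output video changes with it, evolving continuously from the initial image. That is the key difference between "passive modeling" and "controllable generation."

> *"The world model itself is acting as a renderer of the world, that you can really just change by language."*

## [10:04] World Models & Robotics

Robotics is the most direct application of world models. Oriol admits the data mix is still trial and error: how to balance sim data against real-robot data, and when transfer suddenly clicks. Progress in world models themselves will bring an inflection point: once the model is strong enough, the sim-to-real gap will close first at the level of planning and gross motor control, with fine motor control catching up more slowly.

> *"At some level, maybe not at the precise motor control but at the kind of planning and gross, we are going to start seeing how things are going to fall into place."*

## [12:37] Evaluating Physics in AI

Models learn physics implicitly, but how do you evaluate whether they have learned it? Oriol draws an analogy to unsupervised machine translation: if the model really does represent the concept of "gravity" internally, some decoding step should be able to translate it into an explicit explanation. Early unsupervised translation work from 2014 by Stefano Gaus and others offers an approach to borrow: decode the internal representation and use that as the eval.

> *"You would need to somehow connect the concept of gravity which could be present or not in a world model to then decode that into an explanation."*

## [14:51] Consumer Agents & Spark

Spark, announced at I/O, is Google's latest step in consumer agents. Oriol stresses that DeepMind identified "action as a modality" as key early on. But an agent is not just a model dropped into a generic scaffold: model capability has to reach a certain threshold before you can dream up the next stage of product form.

He offers an engineering judgment: internalizing "I have these capabilities; how do I choose which to use?" into the model at training time is more efficient than having an external scaffold decide ad hoc at inference time.

> *"It's useful to build kind of the system slightly more narrowly around something you care deeply about."*

## [18:39] Scaffolding & the Bitter Lesson

Oriol has backed Sutton's bitter lesson for years. Jacob pushes it into the agent era: scaffolding looks like it violates the bitter lesson because it is hand-written glue. Oriol's answer is that "the scaffold itself is a piece of code, and eventually the model should write it on the fly." Humans write it in the short term and models write it in the long term, so the bitter lesson still holds. The practical stance is to optimize both the model and the scaffold rather than betting everything on one side.

> *"That system itself is a piece of code that eventually the model itself could write on the fly."*

## [22:06] Memory & Continual Learning

Memory is the topic Oriol goes deepest on; he has a cognitive neuroscience background. He splits memory into two kinds: packed into the weights (parametric) and attached as an external file system (non-parametric). At serving scale, baking every user interaction into the weights is impractical, so non-parametric file-system memory is more feasible.

The real difficulty is consolidation: how to integrate information from earlier sessions into a new session so the model accumulates knowledge the way a person does. There is a lot of momentum here but the area is far from saturated, and both evaluation methods and engineering practice will iterate over the next few years.

> *"The way that we'll see better evaluations and ways in which these models accumulate this knowledge as they go."*

## [26:54] Research Bets Inside Big Labs

What is it like to lead Gemini inside Google? Oriol names three advantages: TPU co-design (no dependence on Nvidia), the cash-flow stability that ads and search provide, and end-to-end research strength after the Brain + DeepMind merger. The downside: the organization is too large to have full visibility into every direction, so you have to rely on intuition to judge which early research is worth pulling in, and accept that "you can't get the trade-off right every time."

> *"Google is in a unique place. We have stability from hardware procurement and obviously like also investment of capital."*

## [32:30] Post-Training RL is Greenfield

Post-training is still greenfield. LLMs are already on an exponential curve in coding and math, so why haven't other domains kept up? Oriol's core view is that "the investment is still far from enough": relative to pretraining compute, post-training has so far used only a small fraction. The beauty of the algorithm is still being iterated on, and "cracking that recipe could be big."

> *"Cracking that recipe could be big, at least in terms of the beauty of the algorithm."*

## [35:57] What Real Intelligence Looks Like

What does real intelligence look like? Oriol anchors on an old eval from 2015: simple game-playing tasks that were the ceiling for RL at the time and that LLMs can now do out of the box. He wants to see the next order-of-magnitude leap: not pushing numbers on familiar benchmarks, but models "actively producing insight" on new problems where humans cannot immediately give an answer.

> *"I like games."* (This plain self-description is a footnote to his long-standing preference for game-playing RL.)

## [39:11] RL Generalization

Games used to be the textbook case of verifiable reward. The challenge now is to find new sources of hard problems so that RL induces deep reasoning and generalization across broader domains. Oriol offers an asymmetry: there is a gap between creating a solution and evaluating a solution, and if evaluation is easier than generation, RL has leverage.

What surprised him is that training on math and code transfers surprisingly well to other domains, and "much of that generalization may actually come from pre-training." This is a key question for researchers to crack over the next few months to years.

> *"Possibly through pre-training; that's one of the quests for researchers to crack in the next few months and years."*

## [42:55] Advice for Founders

His advice to founders is blunt: evaluation and data are the moat you cannot avoid. Focus early on a vertical product and layer specialized scaffolding on top of the model, then consider differentiating at the model layer once you reach scale. That path is "more scalable, and a better fit for early players."

> *"What I would tell folks is the value, and we discussed this a little bit, the value of evaluations and as a sequence of data."*

## [46:40] Can AI Truly Innovate?

Since joining DeepMind in 2016, the direction Oriol has been most obsessed with is meta-learning: models producing ideas on their own. But he admits that so far, "I haven't seen a model generate a truly outstanding idea." His analogy: you can have ten thousand people try, pick out the one who got it right, and glorify that one, but a model's ability to propose a direction autonomously is still "quite limited." Even so, he believes it will come "soon."

> *"I don't think I've seen truly kind of outstanding ideas that a model has generated yet, but I am sure I will very soon."*

## [49:48] Recursive Self-Improvement

Recursive self-improvement can be viewed in layers: first, researchers and engineers use AI tools to speed themselves up; second, models directly automate certain research tasks. Once a model writes English better than you do, where is the next ceiling? Oriol says "maybe there's no ceiling, or the ceiling is still far away"; we may not even be able to see where the ceiling is.

> *"At the point a model writes English better than you, maybe there's no ceiling, or the ceiling is still far away."*

## [52:14] Quickfire

The final 8 minutes of quickfire questions cover the history of TPU investment, compute intuitions for young researchers, and his overall feeling about the current stage of AI. Oriol leaves a closing line: "I think it's a fascinating time as anything in AI." Jacob wraps with podcast thanks and the outro.

> *"I think it's a fascinating time as anything in AI."*

## Entities

- **Jacob Effron** (Person): Managing Director at Redpoint Ventures; host of Unsupervised Learning.
- **Oriol Vinyals** (Person): Google DeepMind VP of Research; Gemini co-lead (alongside Noam Shazeer and Jeff Dean).
- **Gemini** (Product): Google's flagship multimodal / agent model family; this episode mainly covers the I/O day-two launches.
- **Omni** (Product): Google's multimodal product line, used as the reference point for "the GPT moment for video / image."
- **Spark** (Product): The consumer agent product launched at I/O.
- **World Model** (Concept): A language-driven renderer of the world; representation learning is its core element.
- **Bitter Lesson** (Concept): Sutton's argument; extended in this episode to "in the long run the scaffold should be written by the model itself."
- **Memory / Continual Learning** (Concept): Non-parametric file-system memory vs. packing memory into the weights; consolidation is the key difficulty.
- **Post-Training RL** (Concept): Compute investment is still small relative to pretraining; characterized as greenfield.
- **Move 37** (Concept): The AlphaGo move; Oriol uses it as the benchmark for "a real RL/research breakthrough."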
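The distinction drawn in the Memory & Continual Learning section (notes kept in an external file system rather than baked into weights, with consolidation across sessions as the hard part) can be sketched in a few lines. This is only an illustration under stated assumptions: `FileMemory`, `consolidate`, `build_context`, and `demo` are hypothetical names invented here, the deduplicating merge is a deliberately naive stand-in for consolidation, and nothing below describes how Gemini actually implements memory.

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical non-parametric memory: notes live in files, not in model weights."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, session_id: str, note: str) -> None:
        # Append one note to the JSON file that belongs to this session.
        path = self.root / f"{session_id}.json"
        notes = json.loads(path.read_text()) if path.exists() else []
        notes.append(note)
        path.write_text(json.dumps(notes))

    def consolidate(self) -> list[str]:
        # Naive consolidation (an assumption, not the method from the episode):
        # merge all sessions in filename order and drop exact duplicates.
        merged: list[str] = []
        for path in sorted(self.root.glob("*.json")):
            for note in json.loads(path.read_text()):
                if note not in merged:
                    merged.append(note)
        return merged


def build_context(memory: FileMemory, user_message: str) -> str:
    # A new session starts from consolidated notes instead of retrained weights.
    remembered = "\n".join(f"- {note}" for note in memory.consolidate())
    return f"Known from earlier sessions:\n{remembered}\n\nUser: {user_message}"


def demo() -> str:
    # Two earlier sessions, then a third session that reads the merged notes.
    with tempfile.TemporaryDirectory() as tmp:
        memory = FileMemory(Path(tmp))
        memory.write("session-1", "prefers metric units")
        memory.write("session-2", "prefers metric units")
        memory.write("session-2", "works on robotics")
        return build_context(memory, "Plan my week.")


if __name__ == "__main__":
    print(demo())
```

The design point the sketch tries to show is that adding a memory is a file write, so no model update is needed per user interaction; everything interesting is hidden in how `consolidate` decides what to keep.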
SpaceX's $2T Case, Nvidia's Shock Selloff, America Turns on AI, Trump Pulls AI Order, Bond Crisis?
Sacks is out, Gavin Baker (Atreides Management) sits in. The panel walks through Andrej Karpathy's surprise move to Anthropic, debates why the public mood on AI has flipped, tears apart SpaceX's $2T S-1, and asks why Nvidia's blowout earnings still saw the stock sold. Friedberg and Chamath also flag warning signals from inflation, oil, and bond yields, and close on what, if anything, came out of the US-China summit.

## [00:00] Gavin Baker joins the show!

Jason opens episode 274 noting Sacks is out and welcomes Gavin Baker from Atreides Management for the week. They tee up the agenda: SpaceX and OpenAI IPOs, Karpathy to Anthropic, and Nvidia's earnings.

> *"Sacks is out today, but we're very lucky to have Gavin Baker from Atreides Management joining us. The spicy takes must flow."*

## [00:30] Andrej Karpathy joins Anthropic; hypergrowth and profitability

The Karpathy hire is read as a major strategic win for Anthropic: Chamath frames it as continuity of the Richard Sutton "bitter lesson" school of scaling that Karpathy executed at Tesla FSD and OpenAI. Gavin layers in financial context: Anthropic was EBIT-positive in the last quarter per the WSJ, which combined with hypergrowth makes the recent funding rounds look very different from a capital-burn narrative.

Friedberg pushes back on the framing that models will soon "feed themselves" into context windows to self-improve, but flags that papers (one from MIT) suggest large efficiency gains are on the horizon. Chamath uses the moment to argue the podcast itself has to start telling the upside story of AI (the doctors, the scientists, the unlock) because the dominant public narrative has gone negative.

> *"He was probably the first person that really commercialized the Richard Sutton bitter lesson essay when he was leading FSD at Tesla."*

## [12:42] Why Americans have turned on AI, anti-human perception

Gavin shares a personal story: his daughter has a rare disease, and a Stanford scientist he funded is months away from what he believes is a complete cure, made tractable by AI-accelerated biology. He uses it to argue for an optimistic posture (a future where work is optional and disease is solvable) and warns that the people pushing for AI regulation are also shaping how the public feels about the technology.

Friedberg goes deeper into the cultural mechanics: AI is being framed as anti-human in a way that mirrors anti-nuclear and anti-industrial backlashes of the 20th century. He argues the United States can't unilaterally slow down because China and others won't, and tries to separate genuine safety concerns from elite class anxiety. Chamath then makes a pointed observation that none of the survey data on AI job loss actually asks the truck drivers, package sorters, and ICU nurses themselves how they feel about the tools.

> *"We're listening too much to the inventors of AI. They're geniuses. They're smart. We need to be listening to the frontline factory workers who are using AI saying, 'Wow, I was able to add a third shift.'"*

## [27:22] Trump pulls AI EO, US-China AI relationship, dystopian AI layoffs

A Trump AI executive order was scrubbed at the last minute; the panel walks through what was reportedly in it (review of frontier-model training runs) and whether any pre-release regulatory framework is workable. Jason argues a state-by-state patchwork is the more likely outcome regardless of what Washington does.

The conversation pivots to Meta's latest round of layoffs and the way they were communicated.
Gavin and Jason agree the messaging (leaning on "AI productivity gains" as the public reason) landed badly even with people who accept the underlying logic, and Jason argues it became a case study in how *not* to message AI-driven workforce changes.

> *"Because the reality is that if this is the way that you're going to message something as critical as this, I think you did a horrible job."*

## [45:19] SpaceX S-1 tear down! Breaking down the three major businesses and the case for a $2T valuation

SpaceX filed its S-1 on Wednesday. Jason breaks the company into three businesses: launch/Starlink (where Starlink could reach hundreds of millions of paying subscribers), Elon Web Services / xAI / Colossus compute, and rockets. The AI-cloud line item alone is around $15B and growing roughly 2x year over year, anchored by an Anthropic deal Gavin calls "extraordinary."

Gavin then makes the case that Colossus matters because raw gigawatt-class data centers are now the binding constraint, and SpaceX-adjacent build velocity is the moat. He uses Cursor's Composer 2.5 release (Pareto-dominant on three or four weeks of RL training) as evidence that whoever owns the compute owns the next model generation, and walks through why rapid reusability on Starship compresses the unit economics of getting payload to orbit faster than any competitor can model.

> *"If you look at who's actually capable of delivering a gigawatt data center, these guys are the closest, like an actual gigawatt."*

## [71:22] Nvidia smashes earnings but stock falls, why people are shorting chips

Nvidia blew out earnings again: 20% sequential growth would be a high-growth print for any other company, the dividend was raised 25x, and the CFO committed to returning 50% of free cash flow. Yet the stock sold off, and Leopold Aschenbrenner's reported pivot away from chip exposure is being read as a smart-money signal.
Gavin takes the bear case apart: at current PE Nvidia is cheap relative to growth, and the segment breakdown obscures how much the "AI clouds" line is dragging the multiple. He flags that the true useful life of a GPU is closer to two years than five, which means the reported profits of every hyperscaler running these chips are overstated: a real concern, but not a stock-killer. He also notes Nvidia's CPU business is on track to do $20B this year, making it overnight one of the largest CPU manufacturers in the world.

> *"The true lifespan of a GPU is more like two years and therefore the profits of all these businesses are overstated."*

## [82:25] Market update: Flashing red signals, oil, inflation, yields up

The macro snapshot: May inflation expected at 4.2%+, Fed rate-hike odds back on the table, UK yields at the highest since the great financial crisis, oil and gold both moving. Chamath warns that when the currency-debasement mechanism finally breaks, the downside is non-linear.

Gavin counters with relative optimism on the US: America is self-sufficient in energy, the AI build-out is structurally good for re-industrialization, and even in an ugly global scenario the US is the least-bad place to be invested. He flags that AI fundamentals also have a seasonality that investors are starting to model, the same way e-commerce and subscription businesses do.

> *"While it's terrible for everyone, it is relatively the best for America because we are self-sufficient in energy."*

## [92:45] China trip flops, or was progress made behind the scenes?

A 48-hour US tech-CEO-plus-president trip to Beijing produced thin public deliverables: some soybeans, some H100/A200 sales to Chinese players. The panel asks whether that's the real story or just the visible surface, and whether the immediate China-Russia bonding moment afterward says more about the trajectory than any handshake photo.
Gavin argues the more important read is structural: keeping America ahead in AI requires keeping the trans-Pacific relationship just stable enough to avoid a full decoupling shock, and that's a defensible strategic logic even if the optics are unsatisfying. He also paints a what-if scenario around the Strait of Hormuz to make the point that energy independence is what gives the US the option to act asymmetrically. Jason closes with thanks to Gavin and an invite back to the Summit.

> *"There's sound arguments that this is stabilizing for the world and is the best highest probability path for keeping America ahead in AI."*

## Entities

- **Jason Calacanis** (Person): Host, LAUNCH founder, MC of this episode.
- **Chamath Palihapitiya** (Person): Host, Social Capital CEO; pushed the "listen to frontline AI users" framing.
- **David Friedberg** (Person): Host, The Production Board CEO; led the cultural / historical analysis of the AI backlash.
- **Gavin Baker** (Person): Guest host, Atreides Management founder/CIO; carried the investing thread across SpaceX, Nvidia, and macro.
- **Andrej Karpathy** (Person): Joining Anthropic's new pre-training team; OpenAI co-founder, ex-Tesla FSD lead.
- **Anthropic** (Organization): Hired Karpathy; EBIT-positive last quarter per WSJ; $15B AI-cloud deal with SpaceX-adjacent compute.
- **SpaceX** (Organization): Filed S-1; three businesses (launch/Starlink, Elon Web Services compute, rockets); $2T valuation case.
- **Nvidia** (Organization): Earnings blowout but stock sold off; $20B CPU run-rate; $5.3T market cap.
- **Cursor** (Software): Composer 2.5 model release used as proof of fast RL-driven catch-up dynamics.
- **Richard Sutton's bitter lesson** (Concept): Scaling beats clever architectures; framing for why Karpathy's move matters.
- **GPU useful life** (Concept): Closer to ~2 years than ~5, so hyperscaler reported profits are overstated.
- **Strait of Hormuz scenario** (Concept): Energy-independence-as-strategic-option argument for the US in the China game.
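
The GPU useful-life argument above is simple arithmetic, and a small worked example may make it concrete. The sketch assumes straight-line depreciation, which the episode summary does not specify, and every dollar figure is hypothetical and chosen only to make the numbers round; `annual_depreciation` and `reported_profit` are illustrative helper names.

```python
def annual_depreciation(capex: float, useful_life_years: float) -> float:
    """Straight-line depreciation: spread the purchase cost evenly over the useful life."""
    return capex / useful_life_years


def reported_profit(profit_before_depreciation: float, capex: float,
                    useful_life_years: float) -> float:
    """Annual profit after subtracting that year's depreciation charge."""
    return profit_before_depreciation - annual_depreciation(capex, useful_life_years)


# Hypothetical figures, in billions of dollars per year (not from the episode).
GPU_CAPEX = 10.0
PROFIT_BEFORE_DEPRECIATION = 4.0

five_year = reported_profit(PROFIT_BEFORE_DEPRECIATION, GPU_CAPEX, 5)  # 4 - 10/5
two_year = reported_profit(PROFIT_BEFORE_DEPRECIATION, GPU_CAPEX, 2)   # 4 - 10/2
overstatement = five_year - two_year

if __name__ == "__main__":
    print(f"Profit with a 5-year life: {five_year:+.1f}B")
    print(f"Profit with a 2-year life: {two_year:+.1f}B")
    print(f"Overstatement if the true life is 2 years: {overstatement:.1f}B")
```

With these made-up inputs the same fleet looks profitable on a five-year schedule and loss-making on a two-year one, which is the shape of the concern Gavin raises; real filings would need actual capex and schedule data.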
Trading signals that trade themselves
Tushara Fernando, Head of Data and AI at Man Group, explains how the firm integrates AI into systematic trading by codifying decades of institutional knowledge into "skills." She emphasizes that robust governance and shared workflows are essential for moving AI from individual productivity tools to enterprise-scale agentic platforms.

## [00:18] AI in Systematic Trading

Man Group manages over $200 billion in assets, making the stakes for AI implementation exceptionally high for their institutional clients. Tushara Fernando describes systematic trading as an algorithmic process that uses historical backtesting to evaluate investment signals, much like managing a fantasy football team.

> *A trading signal is really just this with stocks... We want to back the ones that would make money and we want to short the ones that won't.*
> *[02:43]*

## [04:38] The Role of AI-Generated Signals

Man Group currently runs trading signals in production that were entirely researched, backtested, and proposed by AI. While humans review the final output for sensibility, AI handles the data acquisition, strategy proposal, and productionization of these investment ideas.

> *There are trading signals running right now in production at Man Group... that were researched, backtested and proposed by AI.*
> *[04:38]*

## [05:52] The Importance of Shared Workflows

The success of a trading signal depends on the underlying workflows, such as data cleaning and outlier detection, which Fernando compares to the submerged part of an iceberg. Without shared workflows, different teams produce inconsistent results, making it impossible to compare the effectiveness of various strategies.

> *If different teams are running different versions of those workflows, you get different answers.*
> *[06:50]*

## [08:43] Lessons in Skills Governance

Early attempts at AI adoption failed because power users, rather than process owners, were building "skills," leading to local optimizations and errors like hardcoded cost centers. To solve this, Man Group created a governed marketplace where skills are owned by workflow owners, tested with evaluations, and tracked for usage.

> *Treat those skills like production code because that's what they will become.*
> *[17:21]*

## [16:40] Scaling AI Across the Enterprise

Man Group has scaled AI usage to nearly half its workforce by focusing on organizational context as a competitive moat. By treating skills as a library of institutional knowledge, the firm is preparing for a future where swarms of agents leverage these capabilities to find new investment opportunities.

> *Skills governance really unlocks AI at that enterprise scale.*
> *[19:21]*

## Entities

- **Tushara Fernando** (person): Head of Data and AI at Man Group.
- **Man Group** (organization): An alternative investment manager with over $200 billion of assets under management.
- **Claude** (product): An AI model used by Man Group for research, backtesting, and workflow automation.
- **Anthropic** (organization): The AI company that assisted Man Group with skills workshops and implementation.
- **Systematic Trading** (concept): Algorithmic trading capabilities that look across thousands of securities and hundreds of markets.
- **Backtesting** (process): The process of running a trading strategy against historical data to evaluate its performance.
- **Sharpe Ratio** (metric): A measure of risk-adjusted performance: a strategy's return in excess of the risk-free rate, divided by the volatility of that return.
- **Skills Marketplace** (product): Man Group's internal library for governed AI skills, plugins, and institutional knowledge.
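
To make the backtesting and Sharpe ratio entries above concrete, here is a minimal sketch of scoring a long-short signal on historical data. It is an illustration under assumptions, not Man Group's workflow: the equal-weighted top-half/bottom-half rule, the 252-periods-per-year annualization, and the function names `long_short_return`, `backtest`, and `sharpe_ratio` are all choices made here, and any tickers or returns passed in would be made-up inputs.

```python
import math
import statistics


def long_short_return(scores: dict[str, float], realized: dict[str, float]) -> float:
    """One period: go long the top half of names by score and short the bottom half."""
    ranked = sorted(scores, key=lambda ticker: scores[ticker])
    half = len(ranked) // 2
    shorts, longs = ranked[:half], ranked[-half:]
    long_leg = sum(realized[t] for t in longs) / half
    short_leg = sum(realized[t] for t in shorts) / half
    # Profit comes from longs rising and shorts falling.
    return long_leg - short_leg


def backtest(history: list[tuple[dict[str, float], dict[str, float]]]) -> list[float]:
    """Replay the signal over past periods: (scores known beforehand, realized returns)."""
    return [long_short_return(scores, realized) for scores, realized in history]


def sharpe_ratio(period_returns: list[float], periods_per_year: int = 252,
                 risk_free_per_period: float = 0.0) -> float:
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free_per_period for r in period_returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)
```

A call such as `sharpe_ratio(backtest(history))` then gives one comparable number per signal, which is only meaningful if every team feeds `history` through the same cleaning and outlier rules; that is the shared-workflow point the episode makes.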
The Story Behind Cerebras’ $63 Billion IPO with Founder and CEO Andrew Feldman
Andrew Feldman, CEO of Cerebras, details the company's journey from a controversial 'wafer-scale' architecture to a $63 billion public valuation. He explains how their radical hardware design delivers 15-20x faster AI inference than traditional GPUs, enabling new business models and a fundamental reorganization of productivity.

## [00:00] – Cold Open

Andrew Feldman compares the impact of AI speed to Netflix's transition from DVD delivery to streaming, noting that extreme speed opens entirely new business models. He predicts a fundamental reorganization of productivity as AI moves beyond basic coding and design tasks.

> *that's what happens with speed and I think that's what fast AI does right now [00:10]*

## [00:41] – Andrew Feldman Introduction

Host Sarah Guo introduces Andrew Feldman and highlights Cerebras' recent IPO and its current $63 billion market cap. The discussion frames the company's transition from early machine learning research to dominating the foundation model inference market.

> *Cerebras recently went public and is currently worth about $63 billion in the stock market. [00:54]*

## [00:48] – Cerebras’ Evolution

Feldman describes Cerebras as a builder of AI-optimized computers that outperform GPUs by up to 20x in inference tasks across all model sizes. He attributes their recent success to AI models becoming smart enough for daily utility in 2025, leading to massive contracts with OpenAI and AWS.

> *we're the fastest at inference, not by little, but by a lot, 15, 18, 20x faster than GPUs. [01:39]*

## [02:17] – Wafer-Scale Bet Pays Off

The conversation explores Cerebras' unique 'wafer-scale' architecture, which utilizes a single chip the size of a dinner plate. Feldman argues that radical performance improvements require radical designs, noting that critics initially dismissed the approach as impossible.

> *we chose wafer scale, which means we build a 46,000 square millimeter chip, a chip the size of a dinner plate [03:39]*

## [06:38] – Challenges and Breakthroughs

Feldman recounts a high-stakes period between 2017 and 2019 when the team struggled to make the technology work while spending $8 million monthly. He emphasizes that while the technical breakthrough occurred in 2019, market demand only exploded once AI became an essential daily tool.

> *We had a period between about 2017... and middle of 2019 where we couldn't build it. [07:34]*

## [08:37] – Crossing the Market Chasm

Feldman describes the early years where Cerebras had superior technology but struggled to find a market, eventually finding success in supercomputing labs. A pivotal $1 billion order from sovereign partner G42 provided the capital and scale necessary to battle-test their hardware and prepare for the AI explosion.

> *We had a two or three year period where we were ahead of the market and absolutely nobody cared that we were blisteringly fast. [09:00]*

## [10:38] – Scaling Software and Hardware

Scaling a hardware company involves physical constraints like manufacturing lines, power requirements, and test fixtures that software companies do not face. Feldman also highlights the long-term nature of deep tech development, noting that building a high-quality compiler takes nearly a decade of engineering effort.

> *When you're building things... you have to call your manufacturing partner... Each step takes real time and effort to grow. [11:24]*

## [12:03] – Relevance of AI-Generated Coding

Cerebras has aggressively adopted AI-generated coding, with token spending per engineer increasing significantly to support the use of autonomous agents. Feldman observes that certain engineers are becoming '100x' contributors by governing multiple agents for coding and QA tasks.

> *They've moved their coding style to being one in which they govern agents... they've gone from being sort of 10x guys to being 100x guys. [13:12]*

## [13:31] – Leadership and Hiring Culture

With a $20 billion backlog and a growing team of over 800 people, Feldman emphasizes the need to avoid corporate malaise by continuing to take extraordinary risks. He views himself as a 'professional David' who thrives on solving problems that others deem impossible while competing against Nvidia.

> *We would much rather fail in pursuit of the extraordinary than succeed in the ordinary. [15:01]*

## [17:16] – When to Quit vs. Persist

Andrew Feldman describes himself as a 'professional David' who thrives on competing against larger incumbents through intellectual superiority. He emphasizes that founders must guard against the 'slippery slope' of persistence by using external mentors to hold them accountable to their original hypotheses.

> *The slippery slope is a beast... you have to guard against it. [18:32]*

## [19:40] – Why Cerebras Went Public

The transition to a public company is framed as a way to reduce the cost of capital and gain legitimacy with large-scale corporate clients. Feldman notes that Cerebras chose the IPO path to differentiate itself as the market's only 'AI pure play' revenue stream.

> *For us it was an opportunity to graduate from corporate adolescence to corporate adulthood. [23:22]*

## [22:57] – The OpenAI Deal

Feldman recounts the intense four-and-a-half-week period during which Cerebras finalized a $20 billion deal with OpenAI, driven by a sudden demand for fast inference. The deal moved at an unprecedented pace, involving constant work through the holiday season to meet technical requirements.

> *For a 20 plus billion dollar deal to do it in four and a half weeks was exceptional. [24:59]*

## [25:54] – Open Source and Post-Trained Workloads

Andrew Feldman highlights how the open-source ecosystem sustains market interest and pressures closed-source developers to innovate.
He emphasizes that seeing external developers build creative solutions on Cerebras hardware is a core motivation for the company's infrastructure goals.

> *You got to love other people's ideas to take flight on what you built. [28:04]*

## [27:37] – How Speed Opens Up New Business

Feldman argues that extreme speed in AI enables fundamental shifts rather than just incremental improvements, and uses Netflix's transition from DVDs to streaming as his primary example. He also argues that the ambition for speed is a competitive advantage, as seen in the rapid construction of data centers.

> *when the internet got fast they became a movie studio right that's what happens with speed [28:38]*

## [30:07] – Conclusion

Drawing parallels to the PC and cloud revolutions, Feldman predicts that AI will move beyond replacing specific tasks to fundamentally reorganizing how work is performed. This shift is expected to trigger massive jumps in global productivity as new business models emerge around the technology.

> *once we start sort of fundamentally reorganizing around this, you're going to see this sort of new business models and fundamental jumps in productivity. [29:53]*

## Entities

- **Andrew Feldman** (person): Co-founder and CEO of Cerebras
- **Cerebras** (organization): AI hardware company known for wafer-scale engine technology
- **OpenAI** (organization): AI research organization that signed a multi-billion dollar deal with Cerebras
- **G42** (organization): A sovereign AI and technology holding company that placed a $1 billion order with Cerebras
- **Nvidia** (organization): Leading GPU manufacturer and dominant competitor in the AI chip market
- **Sarah Guo** (person): Host of No Priors and venture capitalist
- **AWS** (organization): Amazon's cloud computing division deploying Cerebras hardware
- **Netflix** (organization): Used as an analogy for how speed changes business models from delivery to production
Notion’s Ivan Zhao: The Refounder
Brian Halligan interviews Notion co-founder Ivan Zhao on his journey as a 'refounder' who navigated the company through its 2015 Kyoto restart and the 2023 generative AI pivot. Zhao details Notion's transition from a traditional SaaS structure to an AI-native 'jazz band' model that prioritizes technical versatility, taste, and agency over rigid hierarchies. The discussion explores how AI acts as the 'steel' for modern organizations, enabling flatter structures and faster, more reversible decision-making.

## [00:00] Introduction

Brian Halligan introduces Ivan Zhao as the 'refounder' of Notion, highlighting his unique ability to restart the company during critical junctures in 2015 and 2023. The conversation sets the stage for Zhao's transition from a traditional SaaS management model to an AI-native organization. Halligan compares Zhao's approach to other tech visionaries like Jack Dorsey, emphasizing the importance of personal style and 'taste' in building a lasting brand.

> *I like to think of him as the refounder... he's the canonical example of how a SaaS company can move and become an AI company. [00:52]*

> *We want to be a jazz band, not a marching band. [00:02]*

## [02:22] From Founder Mode to AI Org

Ivan Zhao discusses his detour into traditional delegation and professional management before returning to a hands-on 'founder mode' necessitated by the AI shift. He explains that building with language models is less like predictable bridge engineering and more like 'brewing beer,' where the underlying technology dictates the development path. Zhao emphasizes hiring 'jazz band' people (versatile individuals like designers who code) to navigate the experimental nature of AI integration.

> *Building with language model... is like brewing beer. You can't truly predict the things the underlying thing. [06:33]*

> *The spirit is technology first-driven development rather than customer-driven first development. [07:01]*

## [11:00] Hiring for Taste and Agency

Notion utilizes a 'barbell' hiring strategy that targets both super-junior and super-senior talent while avoiding the 'middle' of traditional SaaS experience. Zhao defines talent as the product of capability, taste, and agency, noting that AI has democratized basic capabilities like coding and writing. Consequently, the company now optimizes for 'agency' and 'taste,' qualities that remain difficult to automate and serve as the primary differentiators for the brand.

> *capability got normalized democratized and taste becomes still important [11:53]*

> *So the shape it's not it's more like the barbell barbell shape, right? [12:35]*

## [24:28] Refounding Notion in Kyoto

In 2015, facing potential failure and low morale, Zhao and co-founder Simon Last laid off their entire staff and relocated to Kyoto, Japan, to rebuild Notion from scratch. This 'Kyoto Reset' allowed them to focus entirely on craft and coding while living a minimalist lifestyle. Zhao chose Kyoto specifically for its status as the 'craft capital of Asia,' which provided the spiritual inspiration needed to view software as a fundamental human tool.

> *So my co-founder and I said let's just lay off everybody just go by the two of us. That's the Japan story. [25:41]*

> *The story we tell ourselves is like Kyoto is a special place. If you can pull off anywhere, you can pull off from Reborn in Kyoto. [28:05]*

## [30:27] Craft Versus Commerce

Zhao views Notion as part of a historical lineage of 'tools for thought,' tracing back to pioneers like Douglas Engelbart and Alan Kay. He criticizes modern Silicon Valley 'tinker culture' for ignoring the history and humanity behind technology. For Zhao, the goal is to find an equilibrium between the pure craft of an artist and the commercial viability of a business, ensuring the product has a 'soul' that resonates with users.

> *Tech is like industry doesn't know its past. If you don't know its past you don't know history which is humanity. [31:52]*

> *I need to be in equilibrium with my own value of what this company I want to build... [51:33]*

## [32:26] When to Refound

For founders whose companies are stagnating, Zhao suggests listening to the 'inner urge' to take drastic action rather than wasting years on ventures without momentum. He argues that refounding is often harder than starting fresh because it involves taking a significant step back to pivot toward a new growth engine. Zhao believes the current AI-driven market is wide open, making it an ideal time for founders to be risk-seeking and follow their intuition.

> *For me it's like there's you just feel you have to do something drastic... then you feel liberated once you land in Japan. [32:56]*

> *The refounding is harder than it looks. It typically involves like a big step back and two steps forward. [59:57]*

## [34:07] GPT-4 Refounding Shock

Zhao describes gaining early access to GPT-4 as a 'full body religious experience' that signaled a fundamental shift in the world. This realization forced a second refounding of Notion, as Zhao felt any work not involving this technology would soon become meaningless. The transition included a grueling 18-month period of low morale while the team waited for the underlying AI models to catch up with their ambitious product vision.

> *GPT-4 is a religious experience for me. It's like holy [ __ ]... anything you do if you don't do this it will be meaningless. [34:27]*

> *that was like a year and a half just go with no error and morale is definitely low [35:50]*

## [45:35] Leadership and Founder Energy

Despite being naturally introverted, Zhao explains how he forced himself to master one-to-many communication to build trust within Notion. He maintains a disciplined daily routine, starting at 7 AM and often working until midnight, while using 'guilty pleasure' reading to recharge.
To prevent organizational calcification, Notion aggressively acquires startups to bring in 'founder energy,' currently employing over 50 former founders who lead critical domains. > *To lead the group of human you need to do one to many communications otherwise people don't trust you. [46:17]* > *founders are are kind of this kind of like little decalcified meatthead machinery just trying to break things [39:10]* ## [53:17] Sales Culture and Closing Thoughts Notion's transition to enterprise sales involved moving away from 'first-principle' experimentation toward established playbooks, pairing system thinkers with high-energy sales leaders. The conversation concludes with a vision of the 'AI-native' CEO playbook, which replaces traditional 'triangle' hierarchies with a 'circular' model. In this structure, a centralized AI system saturated with company context enables smaller teams to move at breakneck speed with reversible decision-making. > *You should only have each company should only preserve your innovation point to few places... [54:54]* > *All of those kind of one-way doors that Bezos used to talk about are really two-way doors... [62:39]* ## Entities - **Ivan Zhao** (person): Co-founder and CEO of Notion, known for his 'refounder' mindset. - **Brian Halligan** (person): Co-founder of HubSpot and interviewer. - **Notion** (organization): A productivity software company that pivoted to an AI-native model. - **Simon Last** (person): Co-founder of Notion who helped rebuild the company in Kyoto. - **Kyoto** (location): The Japanese city where Notion was restarted in 2015. - **GPT-4** (technology): The AI model that triggered Notion's second refounding. - **Steve Jobs** (person): Former Apple CEO cited as an inspiration for refounding and craft. - **Jack Dorsey** (person): Tech leader mentioned for his AI-centric organizational redesign. - **Douglas Engelbart** (person): Computing pioneer in the 'tools for thought' lineage. 
- **Erica** (person): CRO of Notion and former CRO of GitHub. - **SaaS** (concept): Software as a Service, the industry context for Notion's evolution. - **Jazz Band** (concept): Metaphor for a flexible, high-agency organizational structure.
AI Agents Need Computers: 74% MoM Growth, 850K/Day Runs, & New Agent Cloud — Ivan Burazin, Daytona
Ivan Burazin, CEO of Daytona, discusses the massive shift from building developer environments for humans to providing composable computers for AI agents. With 74% month-over-month growth and 850,000 daily runs, Daytona provides the bare-metal infrastructure required for stateful, high-performance agentic workflows. This conversation explores the technical challenges of spiky compute, the $10 trillion computer-use market, and why the future AI cloud will look more like Stripe than AWS.

## [00:00] Hook

Ivan Burazin describes the intense, direct demand for Daytona's infrastructure, with potential users calling him personally to request access. This level of interest signaled a massive, untapped market for providing execution environments to every future AI agent. The team realized they had identified a critical missing piece in the AI development stack.

> *I've never experienced this that people literally call you if you do not give them access. Like they want access right now. [00:00]*

## [01:12] Introduction

Host swyx introduces Ivan Burazin, noting their shared history in the developer experience and 'end of localhost' movements. Ivan recalls reaching out to swyx years ago for advice on developer experience while working at a previous role. They reflect on how their early interactions and mutual interests in cloud-based development tools eventually led to their current collaboration.

> *I was one of the co-founders of CodeAnywhere... we were thinking a long time of like localhost should die. [01:36]*

## [03:15] CodeAnywhere, Shift, and the end of localhost

Ivan discusses his long history with his co-founder, dating back to early 2000s virtualization and the creation of CodeAnywhere. As the first browser-based IDE, CodeAnywhere predated modern infrastructure like Docker and Kubernetes, which provided the team with deep foundational knowledge. After a successful run with the Shift developer conference, they returned to their infrastructure roots to launch Daytona.

> *We originally started stacking stacking servers doing like virtualization in the early 2000s... and that was a services company which we sold. [03:38]*

## [05:58] What Daytona is: composable computers for AI agents

Ivan defines Daytona as a provider of 'composable computers' for AI agents, moving beyond the limited industry term 'sandboxes.' He explains that agents require diverse computing environments tailored to specific tasks, much like different hardware setups for human professionals. This API-driven infrastructure allows agents to execute code in production-grade environments rather than just temporary test boxes.

> *What Daytona is today is essentially composable computers for AI agents... the market calls them sandboxes which [is] misleading. [06:41]*

## [08:07] The pivot from dev environments to AI sandboxes

Ivan explains how observing early agents like Devin and OpenHands led to a realization that AI agents require a dedicated compute runtime. While their initial SaaS offering for human automation saw low traction, it attracted developers who specifically needed sandboxes for their agents. This feedback loop revealed a massive, underserved market for agent-specific infrastructure that standard cloud providers weren't addressing.

> *a lot of people reached out that were building agents and they were like hey my agent needs a compute sandbox runtime [08:50]*

## [10:17] The New Year’s Eve MVP and customers begging for API keys

On New Year's Eve, Ivan 'vibe-coded' the first MVP of what would become the new Daytona. Although the CTO initially dismissed the code as 'garbage,' the core idea was strong enough to warrant a two-week professional rebuild. When they demoed this version to previous skeptics, the response was immediate and overwhelming, with users demanding API access before the calls even ended.

> *I've never experienced this that people literally call you if you do not give them access. [12:18]*

## [12:56] Bare metal, stateful sandboxes, and Daytona’s scheduler

The team approached the technical architecture from first principles, deciding to run on bare metal rather than traditional VMs. They aimed to combine the speed of AWS Lambda with the stateful, long-running nature of an EC2 instance. This allows agents to 'pause and come back' to their work, much like a human closing a laptop lid, without losing state or performance.

> *agents will be like humans in the sense of you don't want your laptop to be shut down until you're done with work [13:57]*

## [17:28] 60ms startup, 50,000 sandboxes, and 850K daily runs

Daytona's infrastructure is optimized for both individual speed and massive concurrency, with a single instance spinning up in just 60 milliseconds. This scale supports high-volume customers who perform nearly 850,000 runs daily, with some requesting capacity for half a million concurrent CPUs. The system utilizes a custom scheduler and local NVMe drives to eliminate network latency and maximize IOPS.

> *Our time to spin up one is 60 milliseconds with network latency... if you want to spin up 50,000 at once, we are now at about 75 seconds. [17:40]*

> *The biggest customer of ours does like about 850,000 every single day is sort of where they're where they're just shy of a million. [18:17]*

## [21:53] Spiky RL/eval workloads and the new agent infra problem

The 'spiky' nature of AI workloads presents a major challenge for compute providers, leading to a mean utilization rate of only 15% despite peaks hitting 90%. Workloads are categorized into 'background agents' that follow human cycles and 'evaluations/RL' which fire off massive bursts of activity at unpredictable hours. To manage this, Daytona must use capacity commits to handle sudden bursts of 100,000 or more CPUs.

> *Daytona's mean utilization is 15%... because it's very spiky. But it's very spiky but we get up to 90%. [23:01]*

## [28:12] RL workloads, Kubernetes pain, and dynamic resizing

Daytona competes primarily against managed Kubernetes services like EKS and GKE, positioning itself as a more ergonomic 'Twilio or Stripe' for compute. Unlike Kubernetes, Daytona offers a seamless API for spinning up sandboxes with significantly faster startup times. A key advantage is the ability to dynamically resize sandboxes on the fly to prevent out-of-memory (OOM) errors, a feature difficult to implement on other platforms.

> *Daytona although it's a compute provider it's more akin to a Twilio and Stripe from a consumption perspective than it is an AWS [29:46]*

## [33:31] Why every AI agent needs a computer

Ivan outlines the massive scale of knowledge work, estimating a $50 trillion global salary pool, much of which is locked in legacy Windows applications. He argues that true automation requires 'human emulators' that can interact with these legacy systems via GUIs when APIs are incomplete. By automating 40% of this work, the market opportunity for agentic computer use reaches approximately $10 trillion annually.

> *If you take 40% of that, you get to essentially like 10 trillion dollars a year. [35:20]*

## [38:48] macOS sandboxes and Apple’s licensing problem

The discussion shifts to the difficulties of hosting macOS sandboxes compared to Windows and Linux. Apple's restrictive licensing only allows two parallel VMs per machine and requires a 24-hour lock-in for users, making per-second billing economically unfeasible. Furthermore, security restrictions prevent moving memory snapshots between physical machines, severely limiting the scalability of agentic workloads on Mac hardware.

> *Apple is shooting itself in the foot... if it would just enable a concurrency model similar to what you can get on a Windows. [40:52]*

## [44:28] Why CLI may matter more than MCP

The discussion compares the Model Context Protocol (MCP) to the Command Line Interface (CLI) for agentic action. While MCP acts as an interface for APIs, the CLI allows agents to execute scripts and perform deep data analysis within a sandbox. This layer of indirection enables more complex agentic workflows beyond simple data retrieval, allowing agents to actually 'do things' rather than just integrate.

> *the MCP is an interface against an API whereas the CLI is like you can actually go do things... the difference between integrations and actually running scripts. [45:34]*

## [48:11] Open source, GitHub stars, and agent integration

Ivan details Daytona's transition to an AGPLv3 license for its sandbox product to balance openness with commercial protection. This 'copyleft' approach allows enterprise use but prevents competitors from building proprietary forks without contributing back. Keeping the core engine transparent builds trust with users and allows large enterprises to bypass lengthy security audits by providing agents with full context.

> *in the new sandbox product we did add a AGPL3... you essentially can't make a competitor without open sourcing your stuff. [49:49]*

## [53:11] Git, CI/CD, and agent collaboration bottlenecks

Current versioning systems like GitHub are often too slow for the high-velocity output of AI agents, leading to bottlenecks in CI/CD pipelines. Some developers are creating makeshift solutions like dumping codebases into JSON files on S3 to bypass Git overhead. There is a growing need for an agent collaboration layer that precedes the traditional Git-based pipeline to handle companies generating over 1,000 PRs per day.

> *GitHub as-is was an overhead... it wasn't fast enough what they needed. [54:03]*

## [58:15] Founder life and building a 25-person infra company

Daytona's success stems from a core team of 13 people who have worked together for over seven years, fostering a high-trust culture. Ivan acknowledges the difficulty of the founder journey, including being away from family, but posits that growth requires 'pain.' He views his work as building the spiritual successor to serverless and Kubernetes for the agent era, requiring radical responsiveness as a differentiator.

> *Of the 25 people in Daytona, I think about 13 of them we have worked with seven years plus. [58:57]*

## [1:02:44] AI SaaS, token resale, and API-first business models

Ivan presents a critical take on the SaaS ecosystem, arguing that the market is incorrectly applying a premium to vendors who simply resell AI tokens. He points out that these models have significantly worse margins than traditional SaaS. Instead, he advocates for companies to expose their data via APIs and charge for consumption, allowing for actual revenue acceleration through increased agentic usage.

> *The market is adding premium to SaaS vendors that are reselling tokens. And I think that's incorrect. [62:54]*

## [1:06:10] GPU sandboxes, data centers, and compute growth

Daytona plans to introduce GPU sandboxes to support workloads like 3D rendering and reinforcement learning on CAD, rather than focusing on inference. While the company currently runs on bare metal via colocation providers, Ivan notes they are architected to potentially own data centers in the future. He currently avoids the high capital risk of building data centers for single-digit margin gains.

> *We will [offer GPUs], but not for inference. Like essentially what we think about is like the GPU sandbox. [66:21]*

## [1:09:48] Why the AI cloud may look more like Stripe than AWS

The conversation concludes by imagining the 'AWS for AI Agents,' which Ivan suggests might look more like Stripe than a traditional cloud provider. This future 'AI Cloud' will integrate sandboxes, web search, and databases as fundamental primitives. While companies like Cloudflare and OpenAI are competing for this space, Ivan hints that many more infrastructure primitives for agents are yet to be developed.

> *There will be a cloud built out specifically for agents and so that cloud will have sandboxes and it will have web search and it'll have databases. [70:47]*

## [1:11:26] Closing thoughts

The discussion ends with the observation that the AI infrastructure market is growing at an unprecedented baseline of 40-75% month-over-month. Ivan and swyx reflect on the race to secure hardware and the shift toward specialized agent clouds that will define the next decade of computing.

> *The entire infrastructure market is growing 40% plus or minus month over month... if you're not growing 40%ish... you don't have to come to work. [68:23]*

## Entities

- **Ivan Burazin** (person): CEO of Daytona and co-founder of CodeAnywhere.
- **swyx** (person): Host of Latent Space and early investor in Daytona.
- **Daytona** (organization): A company providing composable computers and sandboxes for AI agents.
- **CodeAnywhere** (organization): The first browser-based IDE, co-founded by Ivan Burazin.
- **Devin** (product): An early AI software engineer agent.
- **OpenHands** (product): An open-source AI agent project formerly known as OpenDevin.
- **Kubernetes** (technology): Orchestration technology mentioned as a competitor to Daytona's ergonomic API.
- **Apple** (organization): Mentioned regarding restrictive macOS virtualization licensing.
- **Salesforce** (organization): Cloud-based software company mentioned for its API-first strategy.
- **GitHub** (organization): Developer platform noted for being a bottleneck in agentic CI/CD workflows.
- **Nvidia** (organization): The primary provider of GPUs whose supply constraints dictate market growth.
- **Stripe** (organization): Used as a comparison for the consumption-based model of the future AI cloud.
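The throughput, utilization, and market-size figures quoted in this episode can be sanity-checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are invented for this example and do not come from Daytona, and the inputs are simply the numbers quoted above (15% mean versus 90% peak utilization, 50,000 sandboxes in about 75 seconds, roughly 850,000 runs per day, and 40% of some base yielding about $10 trillion).

```python
# Back-of-the-envelope checks on figures quoted in the episode.
# All function names are illustrative; none come from Daytona's tooling.

SECONDS_PER_DAY = 24 * 60 * 60


def peak_to_mean_ratio(mean_utilization: float, peak_utilization: float) -> float:
    """How many times larger the peak load is than the average load."""
    return peak_utilization / mean_utilization


def batch_start_rate(sandboxes: int, seconds: float) -> float:
    """Average sandboxes started per second during one large batch."""
    return sandboxes / seconds


def average_runs_per_second(daily_runs: int) -> float:
    """Daily run count spread evenly over 24 hours (real traffic is spikier)."""
    return daily_runs / SECONDS_PER_DAY


def implied_base(result: float, share: float) -> float:
    """Base amount for which `share` of it equals `result`."""
    return result / share


if __name__ == "__main__":
    print(f"peak/mean utilization: {peak_to_mean_ratio(0.15, 0.90):.1f}x")
    print(f"batch start rate: {batch_start_rate(50_000, 75):.0f} sandboxes/s")
    print(f"average run rate: {average_runs_per_second(850_000):.1f} runs/s")
    print(f"base implied by 40% -> $10T: ${implied_base(10e12, 0.40) / 1e12:.0f}T")
```

Under these inputs the peak is about six times the mean, which is why capacity has to be sized for bursts rather than averages. The last line also shows that "40% of that" giving $10 trillion corresponds to a base of about $25 trillion, which suggests the 40% is applied to a subset of the $50 trillion salary pool rather than all of it.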
Build a production-ready agent with Claude Managed Agents
This session introduces Claude Managed Agents, a suite of API endpoints designed to help developers build and deploy production-ready AI agents with built-in tools, security, and observability. The speaker outlines how core primitives like Agents, Environments, and Sessions enable complex workflows such as multi-agent coordination and human-in-the-loop controls.

## [00:00] Introduction to Managed Agent Primitives

Anthropic introduces Claude Managed Agents as a suite of API endpoints providing production-ready primitives like tool calling, error recovery, and memory management. The architecture relies on 'Agents' as templates for skills, 'Environments' for sandboxed execution with granular permissions, and 'Sessions' to maintain ongoing conversational context and state transitions.

> *Claude Managed Agents at a high level is just a set of API endpoints that we've developed and released... that give you access to scaled ready, production ready agent. [01:35]*

## [07:54] Secure Connectivity and Sandboxing

The platform supports self-hosted sandboxes, allowing developers to use private containers and VPCs to keep sensitive data secure while maintaining model access. Additionally, new MCP tunnels facilitate safe connections to internal Model Context Protocol servers, and Credential Vaults protect authentication tokens by keeping them out of the model's context window.

> *Claude can directly connect to that safely without those MCP servers ever being exposed on the internet. [09:40]*

## [10:02] Multi-Agent Orchestration and Implementation

A demonstration of a multi-agent architecture shows a coordinator agent spawning specialized sub-agents for complex tasks like financial analysis and macro trend research. Developers can implement these workflows using the Anthropic SDK and tools like Claude Code, which is specifically optimized to help developers implement and iterate on managed agent APIs.

> *One agent is like in charge of figuring out macro trends... whereas another one is like really good at like financial analysis. [11:36]*

## [19:28] Observability, Memory, and Infrastructure

The Claude Console provides robust observability, including agent versioning, session monitoring, and the ability to edit memory stores to correct agent context. By providing integrated state transitions and durable storage out of the box, the service eliminates the need for developers to build complex custom agent loops and sandboxing fleets manually.

> *With Claude Managed Agents, we kind of were able to get all of these things out of the box. [26:54]*

## Entities

- **Anthropic** (organization): The AI research and safety company that developed the Claude model family.
- **Claude Managed Agents** (software): A suite of API endpoints for building and hosting production-ready AI agents.
- **MCP** (protocol): Model Context Protocol used for secure authentication and tool integration.
- **Claude Code** (software): A developer tool optimized for implementing and managing Anthropic APIs.
- **Bun** (software): A fast JavaScript runtime used for the technical implementation demonstrations.
- **Cloudflare** (infrastructure): A cloud provider mentioned as a host for private sandboxes and environments.
- **Credential Vaults** (feature): A secure storage system for authentication tokens that prevents exposure to the model.
- **Memory Stores** (feature): Persistent storage allowing agents to retain and retrieve information across sessions.
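To make the relationship between the primitives in this summary concrete, here is a minimal conceptual sketch in plain Python. It is not the Claude Managed Agents API: every class, field, and method name below (`AgentTemplate`, `Environment`, `Session`, `CredentialVault`, `call_tool`) is a hypothetical stand-in, and the host and token values are made up. It only illustrates the ideas described above: an agent as a reusable template, an environment with explicit permissions, a session that carries state, and a vault that keeps tokens out of the recorded context.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentTemplate:
    """Hypothetical reusable definition: a prompt plus the tools it may call."""
    name: str
    system_prompt: str
    tools: tuple = ()


@dataclass(frozen=True)
class Environment:
    """Hypothetical sandbox description with an explicit network allow-list."""
    name: str
    allowed_hosts: frozenset = frozenset()

    def can_reach(self, host: str) -> bool:
        return host in self.allowed_hosts


class CredentialVault:
    """Holds tokens outside the transcript; callers pass references only."""

    def __init__(self) -> None:
        self._tokens: dict = {}

    def store(self, ref: str, token: str) -> None:
        self._tokens[ref] = token

    def resolve(self, ref: str) -> str:
        return self._tokens[ref]


@dataclass
class Session:
    """One running conversation: template + environment + event history."""
    agent: AgentTemplate
    environment: Environment
    status: str = "idle"
    events: list = field(default_factory=list)

    def send(self, text: str) -> None:
        self.events.append({"type": "user_message", "text": text})
        self.status = "running"

    def call_tool(self, tool: str, host: str, credential_ref: str,
                  vault: CredentialVault) -> str:
        if tool not in self.agent.tools:
            raise ValueError(f"tool {tool!r} is not part of this agent")
        if not self.environment.can_reach(host):
            raise PermissionError(f"{host} is not on the allow-list")
        token = vault.resolve(credential_ref)  # stays in runtime memory only
        # Only the reference is logged, so the token never enters the transcript.
        self.events.append({"type": "tool_call", "tool": tool, "host": host,
                            "credential_ref": credential_ref})
        return f"{tool} called {host} with a {len(token)}-character token"

    def finish(self) -> None:
        self.status = "idle"


if __name__ == "__main__":
    vault = CredentialVault()
    vault.store("crm-token", "s3cr3t-value")  # made-up secret
    session = Session(
        agent=AgentTemplate("analyst", "You analyse CRM data.", ("http_get",)),
        environment=Environment("private-vpc", frozenset({"crm.internal.example"})),
    )
    session.send("Summarise open deals")
    print(session.call_tool("http_get", "crm.internal.example", "crm-token", vault))
    print(session.events)
```

The design point worth noticing is that the event history stores a credential reference rather than the credential itself, which is one simple way to picture how a token can be used by the runtime without being visible in the model's context.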
How to get to production faster with Claude Managed Agents
Anthropic engineers Michael and Harrison introduce Claude Managed Agents, a platform designed to simplify the infrastructure, security, and observability required for deploying autonomous AI agents. By handling complex backend tasks like sandboxing and identity management, the system enables developers to transition from simple tool use to long-running, outcome-oriented agentic workflows.

## [01:10] The Evolution of Agentic Infrastructure

Michael and Harrison trace the progression of AI from basic function calling to autonomous agents capable of managing full feature development and PRs. They argue that infrastructure, rather than model intelligence, is now the primary bottleneck for achieving productivity where months of work are completed in hours.

> *where we think we're seeing things going in the future is entire quarters worth of work being able to be getting accomplished within a couple of hours. [02:34]*

## [04:22] Core Primitives and Configuration

The platform provides composable primitives for context management, observability, and secure sandboxing, allowing developers to define agents via system prompts and MCP tool configurations. Features like the 'Ask Claude' button and event streams provide real-time transparency and optimization suggestions for agent sessions.

> *we did all of that platform work so that you don't have to so that you can kind of pick and choose the primitives that we have available. [05:26]*

## [10:05] Advanced Orchestration and Memory

Beyond single-task execution, the platform supports multi-agent orchestration where Claude can spawn sub-agents to delegate work. Advanced features like 'Dreaming' allow agents to reflect across thousands of sessions, improving long-term memory and task performance through autonomous reflection.

> *It allows Claude to spawn other agent threads with their own context windows in order to delegate work to them. [10:55]*

## [11:56] Sandboxing and Secure Connectivity

Anthropic offers self-hosted sandboxes and MCP tunnels to give enterprises control over network policies and audit logs while exposing private data securely. Partners like Vercel, Modal, and Cloudflare provide specialized infrastructure, ranging from lightweight isolates for rapid scaling to high-performance GPU clusters.

> *MCP tunnels are basically just a way for you to get your private MCPs in your network exposed to Claude Managed Agents. [13:25]*

## [20:19] Real-World Automation and Optimization

Companies like DoorDash and Modal are using agents for complex technical tasks, such as autonomous account management and inference tuning. By running tools like the Nvidia profiler, agents can autonomously 'hill climb' performance benchmarks to optimize workloads without human intervention.

> *Claude can optimize training loops... it'll run like the Nvidia profiler. It'll read the profiles and uh it'll just go ham and and make things better. [20:39]*

## [25:23] Future Challenges: Identity and Collaboration

As agents become primary users of compute, the industry faces new hurdles in identity management, egress filtering, and task resumability. The future of AI involves moving from rigid execution to collaborative 'multiplayer' environments where agents and humans dynamically pivot based on feedback.

> *how do we properly assign identity all the way down the chain such that it's only getting access to the right data [25:55]*

## Entities

- **Anthropic** (organization): The AI safety and research company behind the Claude model family.
- **Claude Managed Agents** (product): A platform and infrastructure suite for building and deploying autonomous AI agents.
- **Michael** (person): Member of Technical Staff at Anthropic working on managed agents.
- **Harrison** (person): Member of Technical Staff at Anthropic working on managed agents.
- **MCP** (protocol): Model Context Protocol used for tool configuration and secure tunnels.
- **Cloudflare** (organization): A cloud services provider focusing on sandboxing technologies like MicroVMs and isolates.
- **Modal** (organization): A compute platform specializing in high-scale GPU sandboxes and AI workloads.
- **Vercel** (organization): A partner providing fluid compute infrastructure for agent sandboxes.
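The delegation pattern quoted in this episode, where sub-agent threads get "their own context windows", can be pictured with a small toy model. The sketch below is an assumption-laden illustration and not the platform's API: `AgentThread`, `Coordinator`, and `delegate` are hypothetical names, and the `worker` callables are stubs standing in for real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentThread:
    """Hypothetical sub-agent with a private context list."""
    role: str
    context: List[str] = field(default_factory=list)

    def run(self, task: str, worker: Callable[[str], str]) -> str:
        # Everything the sub-agent sees or produces stays in its own context.
        self.context.append(f"task: {task}")
        result = worker(task)
        self.context.append(f"result: {result}")
        return result


@dataclass
class Coordinator:
    """Hypothetical coordinator that keeps only each sub-agent's final report."""
    context: List[str] = field(default_factory=list)
    threads: List[AgentThread] = field(default_factory=list)

    def delegate(self, role: str, task: str, worker: Callable[[str], str]) -> str:
        thread = AgentThread(role=role)
        self.threads.append(thread)
        result = thread.run(task, worker)
        # Only the short report crosses back into the coordinator's context.
        self.context.append(f"{role} reported: {result}")
        return result


if __name__ == "__main__":
    coordinator = Coordinator()
    coordinator.delegate("macro", "scan interest-rate news", lambda task: "rates steady")
    coordinator.delegate("finance", "read quarterly filings", lambda task: "margins up")
    print(coordinator.context)
    for thread in coordinator.threads:
        print(thread.role, thread.context)
```

Keeping the coordinator's context limited to short reports is the point of the pattern: each sub-agent can spend its own context on detail without crowding out the others.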
Building the best agentic analytics harness: Powered by Claude, built with Claude Code
Chris Merrick, CTO of Omni, details the development of 'Blobby,' an agentic analytics harness powered by Anthropic's Claude models. By combining a robust semantic layer with internal dogfooding of Claude Code, Omni enables users to translate natural language into complex data visualizations while maintaining high engineering velocity.

## [00:07] Engineering Velocity with Claude Code

Chris Merrick explains how Claude Code has transformed Omni's internal development, allowing a small team of 25 to maintain high commit velocity. Even as CTO, Merrick uses the tool to stay technically involved, leveraging the efficiency of the Claude Opus model to contribute code alongside his team.

> *I thank Claude very much for making me uh still able to do some software engineering from time to time. [01:12]*

## [03:14] The Semantic Layer and Business Context

To bridge the gap between general LLM knowledge and specific business data, Omni utilizes a semantic layer that provides essential context like fiscal definitions and table relationships. This layer acts as a permissions and curation tool, ensuring the AI agent understands the unique nuances of a company's data environment.

> *Claude is incredible at answering questions, but you need to tell it more about your business if you want it to answer questions about your business. [04:03]*

## [11:15] Architectural Evolution and the 'Blabbotomy'

The team evolved their AI agent, Blobby, from a simple Q&A tool into a sophisticated harness by upgrading from Claude Haiku to Sonnet for better multi-turn performance. They addressed 'split-brain' errors (where sub-agents and outer agents failed to communicate) by consolidating all tools into a single, unified agentic brain.

> *You want to be careful not to have a split brain between any sort of sub agent system and outer agent system. [15:57]*

## [16:23] Leveraging SQL and CTE Proficiency

Omni shifted its query strategy from a proprietary JSON format to standard SQL to better leverage Claude’s inherent proficiency with complex Common Table Expressions (CTEs). This transition allowed the agent to handle difficult data questions in a single pass, significantly improving the accuracy of generated reports.

> *Claude really likes to write SQL with CTE, common table expressions... and our parser was really good at parsing those [18:27]*

## [19:09] Evals, Observability, and UI Validation

Merrick emphasizes that rigorous evaluation systems and raw trace observability are critical for ensuring the predictability required by executive users. Omni follows a 'build with AI, validate with UI' philosophy, where Blobby generates the initial dashboard and users use a workbook interface to refine and troubleshoot the results.

> *Our philosophy from a product perspective is AI to build, UI to sort of validate and troubleshoot and refine. [23:21]*

## Entities

- **Chris Merrick** (person): CTO and Co-founder of Omni who leads the engineering team and advocates for AI-driven development.
- **Omni** (organization): An AI analytics platform that enables users to query data using natural language.
- **Claude** (ai-model): The family of LLMs from Anthropic that powers Omni's analytics and internal engineering.
- **Claude Code** (software): An AI-powered coding tool that significantly increased Omni's development velocity.
- **Blobby** (ai-agent): Omni's AI data analyst agent designed to interpret and answer complex data questions.
- **SQL** (technology): The query language that Omni's semantic layer generates to interact with data warehouses.
- **Claude Sonnet** (ai-model): The specific Anthropic model used to unlock performance gains in complex agentic conversations.
- **GitHub** (platform): The source of pull request (PR) data used in the agent's demonstration.
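For readers unfamiliar with Common Table Expressions, the sketch below shows the kind of multi-step question a single CTE query can answer in one pass. It is an illustration only and not Omni's schema or generated SQL: the `pull_requests` table, its columns, and the sample rows are assumptions chosen to echo the pull request data mentioned in the demo, and it uses Python's built-in `sqlite3` module so it can be run anywhere.

```python
import sqlite3

# Which author(s) merged the most pull requests? Each CTE names one step.
TOP_MERGERS_SQL = """
WITH merged AS (            -- step 1: count merged PRs per author
    SELECT author, COUNT(*) AS merged_prs
    FROM pull_requests
    WHERE state = 'merged'
    GROUP BY author
),
best AS (                   -- step 2: find the highest count
    SELECT MAX(merged_prs) AS top_count FROM merged
)
SELECT m.author, m.merged_prs   -- step 3: keep authors who match it
FROM merged AS m
JOIN best AS b ON m.merged_prs = b.top_count
ORDER BY m.author;
"""


def top_mergers(rows):
    """Load (author, state) rows into an in-memory table and run the CTE query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE pull_requests (author TEXT, state TEXT)")
        conn.executemany("INSERT INTO pull_requests VALUES (?, ?)", rows)
        return conn.execute(TOP_MERGERS_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("ana", "merged"), ("ana", "merged"), ("ben", "merged"),
              ("ben", "open"), ("cy", "closed")]
    print(top_mergers(sample))
```

Because each intermediate step has a name, the whole question is expressed as one statement instead of several round trips, which is the property the summary credits for single-pass answers.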
Intelligence is collective, not artificial — Prof. Michael I. Jordan (UC Berkeley / Inria)
Prof. Michael I. Jordan challenges the anthropomorphic framing of AI, arguing for a view of intelligence rooted in collective human systems and economic theory. He critiques "superintelligence" narratives as demoralizing distractions and advocates for a shift toward viewing AI as an ecosystem that facilitates human collaboration and job creation. By integrating microeconomics, game theory, and statistical rigor, Jordan proposes a new engineering discipline focused on system-level safety and social welfare. ## [00:00] Cold open: A demoralizing message to young builders Michael I. Jordan criticizes the trend of anthropomorphizing AI, calling it a distraction from real-world problem-solving. He expresses concern that "doomer" narratives about humanity's extinction are demoralizing to young engineers who want to build helpful technology. He argues that these leaders lack economic thinking and are detached from the reality of how systems are built. > *I think this anthropomorphizing of intelligence and understanding all that is not necessary, not appropriate, and is is a distraction [00:21]* > *It's gonna wipe out humanity with a with a high probability... That is so demoralizing. [01:12]* ## [02:04] CyberFund sponsor read Host Tim Scarfe introduces CyberFund, a venture firm looking for "AI native" founders. They are launching a "monastery" program designed for rapid execution and focus, offering significant funding to teams operating at the frontier of AI technology. The section concludes with a brief transition into a discussion about the term AGI. > *CyberFund believes the future belongs to AI natives who want to achieve the impossible [02:12]* > *AGI to me is just a bit of it's a it's a PR term. [02:45]* ## [02:50] From symbolic AI to machine learning systems Jordan clarifies that he identifies more as a statistician and cognitive scientist than a traditional AI researcher. 
He explains that while early AI focused on logical inference, the real industrial impact came from machine learning methods like logistic regression and decision trees. These methods, rooted in statistics and operations research, powered the growth of the cloud and global supply chains. > *I've never actually thought of myself as an AI researcher... The term was coined in the fifties... and they had particular methods in mind [03:29]* > *Supply chains and commerce and transportation systems all used, and still to this day, vast amounts of machine learning. [04:04]* ## [05:42] Why AGI is mostly a PR term Jordan describes "AGI" as a distortionary term that confuses the next generation of researchers. He notes that the "AI" buzzword resurfaced primarily due to the success of Large Language Models (LLMs) in mimicking human fluency. He argues that this focus on human-like language has distracted from the necessary development of robust business models and social-scale technology. > *The AI buzzword returned because of LLMs... it's been a distortionary effect on the path of research [05:01]* > *The role of humans as producers and consumers in these emerging systems should be respected, amplified and thought about. [05:33]* ## [08:48] A collectivist, economic perspective on AI Jordan introduces his perspective that intelligence is a social and collective phenomenon rather than just an individual or computational one. He argues that smart action is contextual and often involves interacting with others through collaboration or competition. By incorporating economic and game-theoretic principles, he aims to build safer, more effective systems. > *We are social animals, and a lot of our intelligence comes by the fact that we aggregate. [07:20]* > *The society provides a context for our intelligence. 
Smart action in 1 context is not in another context [07:31]* ## [11:33] Why LLMs need system design, not hype Jordan compares the current state of AI development to early chemical engineering, where trial and error led to dangerous "explosions" and social harm. He critiques Silicon Valley's reliance on scaling LLMs without considering the displacement of jobs or the mental health impacts already seen in social media. He calls for a more rigorous social science and mathematical foundation rather than relying on metaphors. > *If you were a chemical engineer... saying we're just gonna throw a lot of stuff together... you'd get a lot of explosions. [12:12]* ## [14:50] Predictability beats faux understanding While some researchers focus on 'mechanistic interpretability' to understand AI's internal logic, Jordan argues that full internal understanding isn't strictly necessary. Drawing a parallel to human behavior, he suggests that predictability and 'rules of thumb' are more important for safe interaction. In practical scenarios like bank loan denials, users need contextual explanations based on similar cases rather than a map of internal neural circuits. > *I don't think it's bad to build systems you don't understand. But then you've got to kind of put things around it. [15:14]* ## [17:55] AlphaFold, bias, and prediction-powered inference Jordan examines AlphaFold as a successful, targeted application of machine learning that revealed significant biases. While the model provided the statistical power to reject null hypotheses, it lacked error bars for specific scientific questions. To address this, Jordan introduces prediction-powered inference (PPI), a methodology that merges small amounts of ground truth data with massive model outputs to produce trustable error bars. > *It doesn't give you out error bars and it doesn't specifically on the question you're asking. That's where I want the error bars. 
[20:14]* > *We developed something called prediction powered inference that does exactly that... it'll cover the truth just like in a classical statistical setting. [20:38]* ## [21:48] Stop anthropomorphizing intelligence Jordan rejects the necessity of applying terms like 'understanding' or 'intelligence' to machine learning systems, calling such anthropomorphizing a distraction. He cites Amazon's supply chain systems, which optimized global logistics without any human-like understanding. These systems are valuable because they reduce uncertainty and enable planning, not because they possess cognitive traits. > *Why say it understands? This anthropomorphizing of intelligence understanding all that is not necessary, not appropriate, and is a distraction. [22:51]* > *Even though we don't have a clue what understanding intelligence means, we and our researchers realize we don't care or need it. [24:23]* ## [27:44] Drug discovery as an incentive problem The conversation shifts to how economics provides a framework for analyzing complex, multi-agent systems like pharmaceutical regulation. Jordan explains that statistical problems become economic ones when data is provided by self-interested parties seeking profit. Effective systems must be designed to incentivize truthful behavior to control error rates in high-stakes environments where information is hidden. > *Now you've a kind of tangled web of scientists and pharmaceutical companies, not just 1 but many, many of them, and proteins. [28:49]* ## [32:29] The three-layer data market Jordan introduces a three-layer model involving users, platforms, and data buyers to illustrate how privacy and utility reach an equilibrium. He suggests that platforms could offer tunable levels of differential privacy as a competitive feature. This approach shifts the focus from simple optimization to equilibrium-based systems to design more robust social welfare structures. 
> *So let's think about a data market because data is not just now something you analyze to build a big LLM, it's also something you would sell and buy [32:54]* > *The platforms would say, well, we'll offer you a tunable level of differential privacy for some cost. [35:02]* ## [38:07] Social knowledge, markets, and culture Jordan distinguishes between raw data and social knowledge, which he describes as ephemeral and context-dependent. He argues that markets and cultures naturally create abstractions that are promoted from individual insights to collective knowledge. AI systems should facilitate the emergence of these new cultural abstractions rather than just reinforcing existing ones. > *Human culture creates abstractions... and when those abstractions are kind of useful enough... they kind of get promoted into the culture. [41:52]* ## [45:39] Creator economics beyond Spotify Using Spotify and YouTube as examples, Jordan discusses the failure of current digital markets to properly reward creators. He advocates for ecosystems that empower musicians to maintain ownership and connect directly with brands, citing United Masters as an alternative. He argues that platforms often become monopolies that necessitate a broader macroeconomic view of AI's role. > *I'm not against Spotify, but it should be part of an ecosystem that actually rewards the artist more. [46:56]* ## [48:30] How science-fiction AI narratives mislead young builders Jordan addresses warnings of agential, self-improving AI as "science fiction" that demoralizes young builders. He argues that framing the future as a binary between superintelligence or extinction ignores economic realities and stifles innovation. He dismisses the idea that LLMs replicate the human brain, calling the comparison a "cartoon" or metaphor. > *It's gonna wipe out humanity with a with a high probability... That is so demoralizing. 
[49:33]* ## [51:45] AI should improve humans, not replace them Jordan defines the true purpose of AI as aiding information flow to help humans make the decisions they actually want to make. He highlights the imperfections of human systems and argues that AI should address the gaps where evolution failed to prepare us for modern complexity. Rather than replacing humans, technology should serve as an aid to human creativity and emotion. > *AI is about helping the things that were too hard for humans* ## [56:42] Safety is a property of the whole system ## [58:12] Silicon Valley gurus and the cream off the top ## [1:00:47] Game theory, mechanism design, and contracts ## [1:04:39] Conformal prediction, e-values, and anytime inference ## [1:08:11] A new liberal arts triangle for the AI era ## [1:11:30] The Bayesian duck and markets as uncertainty reduction
The Agent-Native Cloud: Jake Cooper on Railway's Future
Jake Cooper, CEO of Railway, details the platform's evolution from a high-burn startup to a sustainable, bare-metal cloud infrastructure powering 3 million users. He argues that the rise of AI agents necessitates a fundamental rebuild of the cloud, moving away from human-centric tools like Kubernetes and pull requests toward high-density CLI handles and production forking. This conversation provides a roadmap for building modular, high-scale systems capable of supporting the next generation of automated software development. ## [00:00] Intro Jake Cooper argues that developers should stop writing code by hand and instead focus on reviewing agent-generated code to maintain architectural integrity. He emphasizes that while AI tools have improved significantly, underlying architectural patterns matter more than ever in an automated workflow. The hosts introduce Jake as the 'Conductor' of Railway, setting the stage for a discussion on the future of cloud platforms and developer experience. > *you should be reviewing the code that you are writing instead of trying to go and write it by hand. [00:10]* ## [01:19] What Is Railway? Railway is described as a platform that allows users to deploy applications and databases instantly via a canvas or AI prompts like Claude. Jake explains that the goal is to manage software versioning and environment cloning to reduce the complexity of traditional tools like Docker and Kubernetes. By tracking all changes, Railway enables developers to fork production environments into parallel universes for safe validation without reproducing staging environments manually. > *railway is the easiest way to ship anything. [02:29]* > *we want to make it really easy for not just to like deploy things, but for you to almost like evolve applications over time. [02:49]* ## [03:26] Jake’s Path to Railway Jake details his professional journey from front-end work at Wolfram to building distributed systems for Jump bikes at Uber using Cadence. 
He describes his engineering philosophy as a willingness to 'swim to the bottom of the pool,' which includes writing kernel patches to ensure the best possible user experience. Additionally, he critiques GitHub's architecture, specifically the 'broken pointers' created by cloning, which complicates upstream contributions. > *we will swim to the bottom of the swimming pool to go and get the experience [04:35]* > *GitHub's original sin is that it's like almost a series of broken pointers. [06:02]* ## [07:32] Railway’s Six-Year Growth Story Jake presents a growth chart illustrating the rapid increase in daily signups for the Railway platform, which has transitioned from a 'slow grind' to adding 100,000 users weekly. Early growth was driven by high-touch interaction on Discord and a determination to acquire the first 100 core users manually. This visual data serves as a transition into the company's history of scaling and its move toward becoming a primary cloud provider. > *so I just wanted to like pull up this glorious chart you say which is basically your usage or number of daily signups [07:34]* > *Trying to get those initial like first 100 users to like actually kind of come back to it. [08:21]* ## [10:11] Rebuilding the Business After the Free Tier At one point, Railway was losing $500,000 a month while only generating $50,000 in revenue, despite having $20 million in the bank. Cooper realized this was an unsustainable business model and chose to prioritize long-term viability over vanity metrics, temporarily closing the free tier to rebuild. The company now maintains a lean team of 35 people, preferring to build automated systems rather than throwing headcount at problems. > *We basically had to kind of close off the the free kind of users for a little while, rebuild the business. [11:47]* > *We're 35 people right now... 
we don't want to just like add headcount for the sake of headcount. [10:52]* ## [12:36] Agents as the Next Software Platform Over the last six months, Railway has prioritized 'agentic' development as the primary mechanism for building and deploying software. Cooper believes the industry is moving from assembly and high-level languages to 'words' as the primary interface. He envisions a future where thousands of agents run in parallel, requiring new tools for coordination and version control to manage the super-exponential growth of workloads. > *We've moved from assembly to C to C++ to JavaScript to now like words. [13:23]* ## [14:48] Railway’s Infrastructure Philosophy Jake Cooper explains that Railway prioritizes control over low-level primitives like network, compute, and storage to optimize for AI agent workloads. By avoiding Kubernetes in favor of custom orchestration, the team can place workloads with high precision to ensure memory efficiency. This level of control is necessary to prevent cost structures from ballooning as agent usage increases and requires thousands of parallel instances. > *you have to be very very efficient with these agents... or you're going to massively massively blow up your cost structure [15:10]* > *How do you get agents to coordinate? How do you go and get them to be able to like safely version changes? [14:28]* ## [17:01] Bare Metal, Cloud Economics, and the Compute Crunch Cooper describes the transition to bare metal as highly lucrative, reporting a payback period of just three months compared to cloud rental costs. This strategy allows the company to achieve 70% margins while leveraging hardware that remains viable for several years. He also notes the surprising appreciation of hardware assets, such as RAM, due to the global compute shortage and supply chain constraints. > *our payback period when we go to to metal... 
if we rent it in the cloud, our payback period is about 3 months. [17:02]* > *hardware and all of this stuff is... appreciated in value because RAM has gone up [17:50]* ## [18:41] Cloud Bursting and Five-Cloud Networking To maintain growth without being compute-constrained, Railway utilizes a hybrid cloud strategy for bursting capacity across AWS, GCP, and Oracle. This required building a custom network overlay capable of straddling five different cloud environments simultaneously. While this complexity led to past reliability challenges, it now allows Railway to scale rapidly regardless of individual provider quotas or hardware availability. > *I spent a weekend rebuilding our entire like network like overlay essentially so that we could straddle uh five different clouds [19:41]* > *we still maintain like cloud presence for like bursting essentially [18:52]* ## [21:39] Data Center Debt and Infra Financing Cooper highlights the strategic use of data center debt, secured against hardware, as a more efficient alternative to venture capital for infrastructure expansion. By treating compute capacity as a linear driver of revenue, Railway can scale as quickly as they can deploy new hardware. He encourages infrastructure startups to explore diverse financing tools rather than relying solely on expensive venture equity for physical assets. > *we can scale revenue as basically as quickly as we can scale compute [21:20]* > *our margins on metal are like quite high for the like 70%. [20:46]* ## [24:50] Data Centers in Space Jake Cooper and the hosts explore the technical challenges of placing data centers in space, specifically the issue of heat dissipation in a vacuum. Cooper expresses skepticism toward current proposals that ignore fundamental thermodynamic laws, comparing the 'figure it out later' mentality to science fiction. 
He highlights the difficulty VCs face in distinguishing between visionary ideas and technical 'grifts' in the space-tech sector. > *I haven't seen anybody like prove how you're going to go and dissipate that much heat in a vacuum [25:16]* > *how do you know what's like basically not possible and like is a grift versus like uh is possible but like sounds completely insane [26:16]* ## [26:43] What Agents Need From Infrastructure Cooper outlines the infrastructure needs of AI agents, noting they require versioning, observability, and storage similar to humans but at a 1000x scale. He predicts that current industry standards like Kubernetes and Envoy will become bottlenecks as agentic workloads compress development cycles. To support this growth, infrastructure must be modular enough to allow for the rapid replacement of failing components without human intervention. > *the workload profile doesn't change so much as it gets like massively massively compressed because you need to do thousands of these things [28:28]* > *you just need at a thousandx scale [29:13]* ## [29:43] CLIs, Canvas, and Agent-Native UX Cooper explains that while humans prefer simplicity, agents benefit from high-density CLI interfaces with numerous flags that serve as 'handles.' The Railway Canvas is also evolving into an output mechanism and 'context anchor' rather than just an input tool. This hierarchical view of infrastructure prevents critical knowledge from being siloed as teams scale complex 'hyperstructures' using automated agents. > *If you hand it to an agent and you say, 'Hey, that's 40 arguments and 600 flags.' Like, oh yeah, this is excellent. [30:35]* > *It has to be almost like an anchor for your context. 
It has to be like a port in the storm. [34:27]* ## [36:34] Central Station, Incidents, and Responsible Disclosure Railway utilizes an internal tool called Central Station to aggregate feedback and user context, moving away from static communication channels like Slack. The team emphasizes transparency by exposing real-time metrics and detailed incident reports, operating under a core value of 'honor.' This approach involves over-disclosing issues to users rather than providing vague or misleading information during outages. > *We'd rather overdisclose and know that you know that something is wrong versus almost like having your provider gaslight you. [40:22]* > *If you can dynamically aggregate that information and dynamically route it to the right person... this is no longer a manual process. [37:10]* ## [41:49] Safe Rollouts, SRE Agents, and Production Forks To mitigate the impact of bugs, Railway employs incremental rollouts and makes it easy to test behaviors in safe, shadowed environments. Cooper argues that production should not be treated as 'sacred' to the point of stagnation; instead, infrastructure should allow for trivial production forks. This is essential for AI agents, which face a 'stacking entropy' problem without safe iteration primitives to prevent system drift. > *We've built so much ceremony around like production is sacred... we need to get to a point where it's just trivially easy to test different behaviors. [41:33]* > *I think if you don't have the primitives to make iterating in production safe, it becomes very very difficult. [44:03]* ## [46:19] AI SRE, Specs, Code, and Tests Jake Cooper reflects on his transition from an AI skeptic to a believer, noting that the safety of AI SREs depends on infrastructure primitives. He advocates for the 'Holy Trinity' of software engineering: a clear specification, the code, and the tests. 
By aligning these three, developers and agents can reconcile discrepancies and maintain system integrity during rapid, automated iteration. > *If you just unleash an AI SRE on your production infrastructure... it's going to nuke your production database. [46:37]* > *You need three points essentially which is you need a clear spec... you need the code and then you need the tests. [48:22]* ## [49:43] Self-Replicating Infrastructure and the New Serverless The speakers explore the concept of agents using the Railway CLI to modify their own infrastructure, creating a self-replicating loop. This shift necessitates a move away from expensive, static virtual machines toward cheap, instantaneous 'atomic units of deploy' like isolates or sandboxes. The goal is to make throwaway copies of production as trivial and cost-effective as possible for agentic experimentation. > *The agent can like modify its own infra which I think is... yeah it's nuts. [50:04]* > *How do you go and make those throwaway copies like as trivial as possible to spin up run super cheap etc. [50:53]* ## [54:37] Heroku, Temporal, and Workflow Engines Cooper attributes the decline of Heroku to Salesforce's lack of focus on compute as a core business, leading to product stagnation. Railway positions itself as a 'fluid compute' provider, leveraging Cooper's decade of experience with Temporal (and its precursor Cadence) for durable workflows. Railway is a power user of Temporal, using it to manage complex, long-running infrastructure tasks at scale. > *The business of Salesforce is to build a really really good CRM... and then you acquire this business as a compute business that's kind of an offshoot [55:33]* > *I have used Temporal for almost like 10 years now, right? 
Because like Cadence, all of us other things. [1:00:05]* ## [1:05:26] Railpack, Nixpacks, and Lazy-Loaded Filesystems Railway is developing Railpack, an engine for determining source code dependencies, which evolved from their earlier Nix-based tool, Nixpacks. While Nix offers theoretical benefits for versioning, Railway found it caused significant image bloat and scaling issues for real-world workloads. They are now exploring content-addressable file systems to enable lazy loading of data into memory for faster deployments. > *If you want version X and version Y, you end up bloating a lot of your kind of like package like space. [1:06:02]* ## [1:07:20] Coding Agents, Token Spend, and Roadmap Acceleration With a monthly cloud spend reaching $300,000, Railway heavily incentivizes the use of AI coding agents among its employees. Cooper argues that manual code generation is an inefficient use of time, urging developers to focus on architectural patterns and code review. This allows the team to 'speedrun' their product roadmap by automating complex infrastructure tasks and test generation. > *If you are writing code by hand you are doing this wrong... you should be reviewing the code that you are writing. [1:07:37]* > *If you're not using the AI systems to almost like speedrun your road map... then you're kind of missing a large point. [1:09:12]* ## [1:12:15] The Pull Request Is Dying The traditional SDLC is undergoing a radical transformation where the pull request and manual code review are losing relevance. Impact is increasingly measured by the 'percentage of tokens that end up in production' rather than lines of code. As AI systems handle more reconciliation and validation, the focus shifts from the PR to the initial prompt and final deployment. > *The pull request is dying... it's going to be the prompt... 
and beyond that code review is also kind of dying. [1:12:23]* > *The really naive way to go in and measure this is almost like your percentage of tokens that end up in production. [1:11:40]* ## [1:13:47] Feature Flags and the Agent-Era SDLC Jake Cooper discusses the critical role of feature flagging in managing the 1000x compression of the SDLC driven by AI agents. He argues that incremental rollouts and blast radius management through flagging will become even more essential for safety as deployment speed increases. This culture of flagging allows for rapid experimentation without compromising system stability for enterprise customers. > *Everything's just going to get compressed by like a thousandx so that everybody can go and do that. [1:17:21]* ## [1:17:34] Cattle, Pets, and Cloning Machines Jake offers a contrarian view on the 'cattle not pets' philosophy, suggesting that snapshotting allows developers to treat infrastructure like 'pets' again. By snapshotting every frame and lazily loading file systems, the overhead of traditional DevOps tools like Dockerfiles is reduced. Railway even modifies the kernel to support persistent connections during these system snapshots. > *I think you can move towards having pets so long as... you have a cloning machine for your pets. [1:18:02]* > *If you can snapshot every single thing at every frame, then like it actually doesn't matter if you know that obliterated. [1:18:12]* ## [1:20:48] Solo Founder Lessons Jake reflects on his path as a solo founder, contrasting it with the Silicon Valley consensus of finding a co-founder. He emphasizes the need to be obsessed with every layer of the stack, from kernel-level changes to go-to-market strategies. He argues that having two co-founders can often lead to deadlocks without a clear tiebreak, whereas solo leadership allows for singular vision. > *Two is the worst number of co-founders is because you have no tiebreak... 
you basically are like, well, I disagree on this thing. [1:22:49]* ## [1:25:31] Focus, GPUs, and Building a New Cloud Railway is intentionally avoiding the GPU provider market for now to maintain its core mission, though Cooper admits GPUs are an inevitable part of their long-term roadmap. He stresses that companies are defined as much by what they choose not to do as by what they execute. The ultimate goal is full vertical integration to ensure a seamless experience from logic to execution. > *I think you're you're defined almost more by the things that you don't do than the things that you do [1:26:08]* > *I can tell you for a fact that we will not be doing GPUs now, but we 100% will be doing GPUs at some point. [1:26:50]* ## [1:29:39] Closing Thoughts Cooper reveals that Railway is moving toward 100% ownership of its data centers to avoid copying the infrastructure of legacy hyperscalers. By inventing their own infrastructure from scratch, Railway aims to support 'vibe coding,' where the friction between a thought and a live application is completely removed. This approach empowers a new generation of 'citizen developers' to build at the speed of thought. > *there should be no friction in between what your thought is and reality that kind of comes out. [1:29:04]* > *we've been very very deliberate to like invent our own infrastructure from scratch. [1:28:30]* ## Entities - **Jake Cooper** (person): CEO and 'Conductor' of Railway. - **Railway** (organization): A cloud platform designed for easy deployment and environment management. - **Uber** (organization): Jake's former employer where he worked on distributed systems for Jump bikes. - **Temporal** (software): A workflow orchestration platform used by Railway for reliable infrastructure tasks. - **Salesforce** (organization): The CRM company that acquired Heroku, leading to its perceived stagnation. - **Heroku** (organization): A pioneer PaaS platform that Railway is often compared to. 
- **AWS** (organization): Amazon Web Services, used by Railway for hybrid cloud bursting. - **GCP** (organization): Google Cloud Platform, one of the five clouds Railway straddles. - **Claude** (software): An AI model mentioned as an interface for deploying on Railway. - **GitHub** (organization): A code hosting platform discussed regarding its architectural flaws in versioning. - **Kubernetes** (software): An orchestration system Railway chooses to avoid for higher-order control. - **Central Station** (product): Railway's internal tool for aggregating user context and support feedback.
The Next War Is Already Here — Yaroslav Azhnyuk, The Fourth Law & Noah Smith, Noahpinion
Ukraine produced 4 million FPV drones last year; China could produce 4 billion. That asymmetry frames two hours of unusually concrete conversation between Yaroslav Azhnyuk — serial tech founder turned AI-drone builder at The Fourth Law — and economist Noah Smith, who has been writing about the economics of drone warfare since before most Western policy circles took it seriously. They cover the full tech stack (cameras, autonomy modules, fiber optic links, interceptors, a semiconductor fab under construction), a five-level autonomy taxonomy, an eight-dimension autonomous-battlefield framework, and China's manufacturing edge that has no near-term Western answer. The through-line: the West is still planning to fight the last war, Ukraine is the defense valley where the next war is already live, and the gap is widening faster than most people realize. ## [00:00] Cold Open: China's 4 Billion Drones and the Cameras-to-Explosives Pipeline Yaroslav opens cold with a single arithmetic comparison that structures the rest of the episode. Ukraine, not an industrial powerhouse, built 4 million FPV drones in a year. China, with an order-of-magnitude larger manufacturing base and a consumer electronics supply chain already producing the same cameras, motors, and chips, could produce 4 billion. Noah immediately asks whether that makes China the supreme conventional military power on earth right now. Yaroslav won't claim certainty, but won't rule it out either. > *"I don't think we have all the information to claim that, but we cannot count it out. And that alone should be, you know, a big warning sign."* The cold open also plants the personal pivot that the rest of the episode unpacks: Yaroslav went from making cameras that fling treats to pets to cameras that fling explosives to occupiers. ## [01:04] Introduction: Brandon, Noah Smith, and Yaroslav Azhnyuk Guest host Brandon normally runs a science podcast; this episode is the exception. 
Noah Smith — Noahpinion Substack, economist focused on industrial policy and geopolitics — is co-host and co-interviewer. Yaroslav sets the personal context: on February 23rd, 2022, he and his then-fiancée landed in Kyiv at 11 p.m. on what turned out to be one of the last flights into the city. Eight hours later, the bombs fell. The 17-hour drive west that followed — empty streets, gas stations out of fuel, pouring diesel into windshield-washer canisters — reads like a scene from an apocalyptic film because, for the people living it, it was exactly that. > *"We basically packed our belongings and got in the car and spent 17 hours riding west. That was exactly like that. I, you know, missiles are falling, like there was smoke in Kyiv."* ## [05:41] From Tech Entrepreneur to Defense: PetCube, Brave1, and the D3 Fund Yaroslav's path from pet-tech to defense wasn't a straight line. In San Francisco from 2014 to 2020 building PetCube (one of the leading pet-camera companies), he had never taken military coursework and considered wars a thing of the past. On day one of the invasion, he knew he would fight back with everything he could — but weapons weren't the first instinct. Early efforts included lobbying U.S. Congress on Lend-Lease (passed May 2022, underdelivered), co-founding Brave1 (Ukraine's defense-innovation cluster, analogous to DIU), and helping seed the D3 Fund co-started by Eric Schmidt. By 2023, two things became undeniable: the war would last, and drones had permanently redefined warfare — the first software-defined weapon platform in history, where a battlefield capability upgrade can be pushed overnight like a software update. > *"It's like if you were able to push a software update and get all of your Roman legionaries a new helmet. That has never been possible before."* ## [10:42] The Ethics of Building Weapons: Dual-Use Technology and the Wolf at the Door Brandon raises the dual-use problem: the technology won't stay in Ukrainian hands. 
Yaroslav's answer is pragmatic rather than philosophical. Every technology from fire to large language models is dual-use; the question for a maker is whether the marginal risk of their contribution outweighs the immediate need. Ukraine is in a forest with a wolf. You deal with the wolf first, then consult Greenpeace. He's clear-eyed that no technology stays contained — the parallel concern about LLMs freely available in North Korea and Russia applies equally to drone autonomy — but frames his own company's responsibility narrowly: they supply to the Ukrainian government and armed forces, not to arbitrary buyers. > *"When you're in a situation where you're in a forest in front of a wolf, you know, you first going to deal with a wolf that wants to eat you and then you're going to go consult Greenpeace."* ## [14:01] The Tech Stack: Cameras, Autonomy Modules, Interceptors, and a Semiconductor Fab The Fourth Law's structure is three interlocking business units. Cameras (daytime and thermal, sold to 200+ Ukrainian drone manufacturers). Drone autonomy modules (sold to the same ecosystem). And UAV products sold direct to the armed forces: FPV strike drones, bombers, Shahed interceptors, and ISR interceptors — drones that hunt Russian reconnaissance drones before they can relay targeting data. The thermal-camera arm is about to start construction on two semiconductor fabs to manufacture sensor chips in-house, driven by the realization that dependence on foreign sensor supply chains is a strategic vulnerability. > *"We're about to start construction of two semiconductor plants to make sensors for thermal cameras. That's super exciting for me as a computer science guy — doing semiconductor, super cool."* ## [18:47] Fiber Optic vs. AI: The Radio Horizon Problem and $32/km Cable The chapter is really about why radio-only FPV drones fail at long range — not just from jamming, but from the curvature of the Earth. 
Below roughly 60-100 meters altitude at 30-40 km range, a drone enters a radio shadow behind hills, forests, or the horizon itself. The pilot loses video and control precisely when closing on a target that is, by definition, on the ground. Fiber optic cable ($32/km, spooled from the drone) solves the shadow problem but adds weight, limits range, and reduces maneuverability. AI fills the gap differently: terminal guidance lets the drone complete the last few hundred meters autonomously even after the radio link breaks. The two approaches aren't mutually exclusive — you can run AI on top of a fiber optic link to command hundreds of drones with fewer operators. > *"If your drone goes low — and usually Russian infantry and vehicles, they're on the ground and you want to hit them, you need to go low — lower you go, maybe you'll get behind a hill or behind a forest, and if you're far enough you'll just get behind the curvature of the Earth."* ## [25:32] FPV Drones: The New God of War — 70–80% of Frontline Casualties Artillery was historically called "the god of war" because it caused 80% of battlefield casualties. On the current Ukrainian front line, 70-80% of casualties are inflicted by FPV drones — the same fraction, a different weapon. Tanks, designed to dominate land warfare for decades, are now routinely destroyed by $400 consumer-grade quadcopters because armor was never built to defend against attacks from directly above. The trajectory follows the same curve as calculators becoming irrelevant once smartphones arrived: not a linear substitution but an exponential displacement where the new technology's influence grows nonlinearly. > *"They used to say that artillery is the god of war because artillery used to cause like 80% of casualties, and now on that ranking FPV drones rule."* ## [28:28] The Five Levels of Drone Autonomy: From Terminal Guidance to Full Autonomy Yaroslav lays out five autonomy levels describing where the field stands and where it's heading. 
Level 1 is terminal guidance — the drone flies under human control and locks onto a target only in the final seconds. Level 2 is bombing — dropping munitions from altitude without directly ramming a target. Levels 3-4 introduce increasing target-selection and navigation independence: the drone can identify radio-emitting equipment, track vehicles, or navigate through GPS-denied environments. Level 5 is full autonomy — launch-and-forget, no human in the loop for any mission phase. Current battlefield deployment sits mostly at Levels 1-3. The jump to higher levels isn't primarily a technical problem anymore; it's a deployment, doctrine, and trust problem. Human confirmation remains in the loop at every stage involving lethal targeting decisions — for now. > *"Technology progresses and its influence grows nonlinearly. It's all exponential."* ## [41:37] The Eight Dimensions of the Autonomous Battlefield The five autonomy levels describe a single drone's capability. The eight dimensions describe the full battlefield context those drones operate in. Dimension 1: level of autonomy (the five-level scale). Dimension 2: platform type (quadcopter, fixed-wing, missile, naval drone). Dimension 3: environment (day/night, urban/forest/open terrain). Dimension 4: target type (moving vehicle, static structure, radio emitter). Dimension 5: swarm size and coordination. Dimension 6: command-and-control architecture. Dimension 7: sensing modality (optical, thermal, RF). Dimension 8: infrastructure (simulation, data pipelines, security, deployment tooling). Each dimension interacts with every other. A Level-4 autonomous drone performing well in open daylight terrain may fail completely in a forest at night. Battlefield AI systems have to be evaluated across all eight dimensions simultaneously, not just on the single axis of autonomy level. > *"I say dimension because each of them works with another. 
It's crucial to understand how autonomy evolves in a modern battlefield environment."* ## [45:32] AI Safety and the Morality of Autonomous Weapons Yaroslav's position flips the standard AI-safety framing: in five to ten years, it will be *immoral* to use weapons *without* AI, because human-only weapons produce more collateral damage and friendly fire. He draws the analogy to manually driven cars — once autonomous vehicles are the norm, letting a human drive on a public road becomes the dangerous choice. Noah pushes to the logical endpoint: a Level-6 "AI general" — one large model that ingests all battlefield data and agentically selects targets, with humans reduced to repairing drones. Yaroslav says technically it could be done now. The constraint is deployment and trust, not capability. He references what was publicly described about AI-assisted target designation in the Iran operation: AI surfaces 127 targets, human reviews the list and presses okay. That's already close to an AI general with a rubber-stamp layer. > *"I think 5 to 10 years from now it will be immoral to use weapons without AI because weapons without AI will be more likely to cause collateral damage or unwanted damage."* ## [51:31] The End of the Rifleman? Noah's 2013 Prediction vs. Battlefield Reality Noah revisits a prediction he made in 2013: the rifleman is obsolete, replaced by standoff weapons. Ukraine both confirms and complicates it. FPV drones have unquestionably displaced the rifle as the primary instrument of attrition — but infantrymen haven't disappeared. They dig trenches, hold terrain, conduct logistics, and survive for months in dugouts under continuous drone threat by adapting: better camouflage, smaller movement signatures, drone-awareness drills. Yaroslav extends the timeline question to humanoid robots. The world is built for bipedal humans; there's genuine utility in a platform that can operate a rifle, open a door, or crew a vehicle. 
He puts a Terminator-style scenario — humanoid combat robots — at 10 years out, not science fiction. But modern warfare, they agree, is a multi-dimensional problem — dozens of drone types, land ops, reconnaissance, psychological operations, aviation, tanks, logistics — and the press focus on whichever technology is newest understates how much every layer still matters. > *"Modern warfare is really very complex and the fact that drones are the latest coolest thing doesn't mean that now it's that and only that."* ## [01:05:13] China's Manufacturing Advantage and Western Vulnerabilities This is where Noah Smith's economics background drives the conversation. The U.S.-China drone comparison isn't about unit price or autonomy level — it's about manufacturing throughput at scale. China's consumer electronics supply chain already produces the motors, cameras, chips, and battery cells that go into FPV drones. Switching that capacity to military production requires regulatory will, not retooling. Ukraine builds fixed-wing drones with 10 km range from hobby components; China can build fixed-wing drones with 200-300 km range at the same cost curve. The West's vulnerability isn't just quantity. It's thermal cameras (overwhelmingly sourced from China), semiconductor fabs (two generations behind on drone-relevant sensors), and procurement speed (a Western defense contract takes years to award; Ukraine iterates weekly). Yaroslav is optimistic about Western human capital — the engineers exist — but openly frustrated with European institutional inertia and uncertain about whether the U.S. has fully absorbed the lessons from Ukraine and the Middle East. > *"We don't have all the information to claim that, but we cannot count that out. 
If we want to keep the resemblance of our good past life, we have to do something about it."* ## [01:24:21] Policy Advice for Western Defense: Defense Valley and the Widening Gap Yaroslav's top policy prescriptions are framed around the William Gibson quote he attributes to Arthur C. Clarke: the future is already here, just not evenly distributed. Kyiv is Defense Valley — the place where the future of war arrived first, with hundreds of specialized companies, battle-tested commanders at every rank, and a government that learned to move at startup speed. Priority 1: deep integration with Ukraine's defense ecosystem, not just procurement but embedded learning. Priority 2: procurement reform — the drone-dominance initiative is the right direction and needs to scale 10x. Priority 3: long-range drone readiness for contested maritime environments (Shahed-class drones with 2,000 km range cover the entire Pacific island chain). He worries that the U.S. learned less from Ukraine than it should have and may be repeating the pattern with Iran. > *"Kyiv and Ukraine is sort of the defense valley. It's the point where the future of defense has already arrived, and there's a ton of things to learn from that."* ## [01:32:54] The Drone Race: Who's Ahead, Category by Category Russia was at parity or ahead in drone capability 18 months ago; Ukraine has since pulled ahead on FPV and autonomy. But Russia has a 4x population advantage and significantly more industrial capacity than Ukraine alone — scale disparity is why Western supply matters. The race breaks down by category: FPV strike (Ukraine leads), ISR reconnaissance (contested), glide bombs (Russia leads, dropping from bomber aircraft at scale), deep-strike drones (Russia leads on volume), and interceptors (Ukraine innovating rapidly, Russia catching up). 
Russia uses helicopters to intercept Ukrainian deep-strike drones — a costly but effective countermeasure revealing how each new offense spawns a tailored defense, at weekly iteration cycles. > *"Everyone says Russia's behind right now in the drone war. But that wasn't true a year ago."* ## [01:41:57] Countermeasures: Shotguns, Jammers, Lasers, and Fishnets Shotguns work — they're the primary kinetic countermeasure against incoming FPV drones — but only for a trained soldier who can hit a 20 cm target moving at 100 km/h under combat stress. Electronic jammers are the most widespread defense: block the radio or GPS link and the drone loses guidance. The catch is that the same spectrum the jammer blankets is often used by your own forces, and jammers are being defeated by frequency-hopping and fiber optic links. Russian tanks now look like porcupines — improvised metal cages and electronic-warfare antennas bolted on top to defeat top-attack drones. Ukraine's answer is shaped charges specifically tuned for the gap between the cage and the hull. Lasers are effective but expensive ($10M+ per system to kill a $400 drone) and slow to slew onto fast-moving targets. Fishnets — literally mesh nets — are being deployed around static positions because they're cheap, snag rotors, and require no power. > *"Then the tanks — if you look at Russian tanks and sometimes Ukrainian tanks or equipment — they all look like porcupines."* ## [01:58:19] The Wedding and Final Takeaway: Be Prepared for War Brandon closes with two questions. First: did Yaroslav actually get married in that chapel on February 23rd? They got legally married, but postponed the reception until the war is over. Second: one takeaway for the audience. Yaroslav's answer is a restatement of the Roman proverb: *si vis pacem, para bellum*. > *"You want peace, be prepared for war. 
Got to invest in defense and security."* ## Entities - **Yaroslav Azhnyuk** (Person): Founder of The Fourth Law (AI drone autonomy + thermal cameras, Ukraine); previously co-founder of PetCube; co-founder of Brave 1 and D3 Fund; born and raised in Kyiv. - **Noah Smith** (Person): Economist; author of the Noahpinion Substack; co-host for this episode; focus on industrial policy, manufacturing economics, and geopolitics. - **Brandon** (Person): Regular Latent Space host (science podcast background); guest host for this episode. - **The Fourth Law** (Organization): Yaroslav's AI-guided drone company; three business units — thermal cameras, drone autonomy modules, UAV products (FPV strike, bombers, interceptors). Leading drone-AI team in Ukraine. - **PetCube** (Organization): Consumer pet-camera company Yaroslav co-founded in San Francisco (2014–2020); the origin of the "cameras that fling treats / cameras that fling explosives" pivot. - **Brave 1** (Organization): Ukraine's defense-innovation cluster; analogous to DIU (Defense Innovation Unit) in the U.S.; Yaroslav was among its co-founders. - **D3 Fund** (Organization): Defense-tech investment fund co-founded with Eric Schmidt (ex-Google CEO) to accelerate Ukraine's drone ecosystem. - **FPV Drone** (Concept): First-Person-View drone — pilot sees through onboard camera in real time; currently responsible for 70-80% of frontline casualties; dominant tactical weapon of the Ukraine conflict. - **Five Levels of Drone Autonomy** (Concept): Yaroslav's taxonomy from terminal guidance (Level 1) to full autonomous operation (Level 5); most current battlefield deployment is Levels 1-3. - **Eight Dimensions of the Autonomous Battlefield** (Concept): Yaroslav's framework for evaluating drone systems across autonomy level, platform type, environment, target class, swarm scale, C2 architecture, sensing modality, and infrastructure. 
- **Defense Valley** (Concept): Yaroslav's term for Kyiv/Ukraine as the global hub where the future of defense tech is already live — analogous to Silicon Valley for consumer tech. - **Radio Horizon** (Concept): Earth-curvature effect that cuts radio/video links to low-flying FPV drones at 30-40 km range; primary technical driver for fiber optic drone adoption. - **Shahed** (Concept): Iranian-designed loitering munition used by Russia; fixed-wing, up to 2,000 km range; archetype for long-range drone threats to Western bases and Pacific-scenario planning.
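
The Radio Horizon entry above can be sanity-checked with the standard smooth-Earth line-of-sight approximation. The sketch below is not from the episode: it assumes a perfectly smooth Earth with no hills or forests, a ground antenna at zero height by default, and the common 4/3 effective-Earth-radius correction for atmospheric refraction. The function names are illustrative. Under those assumptions, the 60-100 m altitude and 30-40 km range figures quoted in the fiber optic chapter come out roughly consistent with each other.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Assumption: standard-atmosphere refraction modeled as a 4/3 effective Earth radius.
K_FACTOR = 4.0 / 3.0


def radio_horizon_km(antenna_height_m: float, k: float = K_FACTOR) -> float:
    """Smooth-Earth distance (km) from an antenna to its radio horizon."""
    height_km = antenna_height_m / 1000.0
    return math.sqrt(2.0 * k * EARTH_RADIUS_KM * height_km)


def min_altitude_m(range_km: float, ground_antenna_m: float = 0.0,
                   k: float = K_FACTOR) -> float:
    """Lowest drone altitude (m) that keeps line of sight at the given range."""
    # The ground antenna covers part of the path; the drone must cover the rest.
    remaining_km = range_km - radio_horizon_km(ground_antenna_m, k)
    if remaining_km <= 0.0:
        return 0.0
    return remaining_km ** 2 / (2.0 * k * EARTH_RADIUS_KM) * 1000.0


if __name__ == "__main__":
    for altitude in (60, 100):
        # Roughly 32 km at 60 m and 41 km at 100 m.
        print(f"{altitude} m altitude -> horizon at {radio_horizon_km(altitude):.1f} km")
    for distance in (30, 40):
        # Roughly 53 m at 30 km and 94 m at 40 km.
        print(f"{distance} km range -> needs at least {min_altitude_m(distance):.1f} m")
```

Real terrain makes the picture worse than this estimate, since hills and tree lines block the link well before the geometric horizon does. Raising the ground antenna (the `ground_antenna_m` parameter) lowers the altitude the drone needs.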
How Founders Can Build for Law Enforcement and First Responders | The a16z Show
a16z general partner David Ulevitch sits down with Col. Jeffrey Glover (Arizona Department of Public Safety) and Rahul Sidhu (Flock Safety board member) to walk through how drones, sensors, and AI are quietly rewiring American policing. Sidhu lays out Flock Safety's layered sensor network — license plate readers, gunshot detection, and drone dispatch — while Glover details an Arizona DPS ecosystem built around officer wellness, body-cam analytics, and an international fusion-center play timed to FIFA and the Olympics. The throughline: the next decade of police work will look more like analyst work than door-kicking, and founders who want in need to spend real time on the beat first. ## [00:00] Drones and the Future Beat The episode opens with a stitched-together preview: Sidhu's punchy maxim that cops hate both change and the status quo, Glover sketching how a patrol officer's skill set has to get more investigative and nuanced, and Ulevitch teeing up the central scenario — a 911 call, a drone responding ahead of officers, a fleeing shooter pursued from the sky. The pitch isn't abstract: keeping five helicopters airborne 24/7 to do that job is impossible, but drones make it almost inevitable. > *"You hear a gunshot go off and the drone finds a shooter getting into a car and driving off, and then pursuing the vehicle."* ## [00:32] Founders Building for First Responders Ulevitch asks Sidhu what advice he'd give founders who care more about saving lives than optimizing ad clicks. Sidhu, who sits on Flock Safety's board, points to companies like Skydio and walks through the kind of inbound he gets daily — alerts about kidnapped children recovered, situations de-escalated, technology used to read a scene before officers do. The story he keeps coming back to: a 911 caller reports a man in an alley with a shotgun, a drone arrives first, and the "shotgun" turns out to be a janitor holding a broom. 
> *"It turned out the drone provided, you know, situational awareness and said, 'Wait, there's just a janitor with a broom.' That's not a guy with a shotgun. And it totally de-escalates the situation."* ## [01:38] Flying Robots Meet Sensor Networks Sidhu reframes drones as flying robots that fit into the same automation wave reshaping every industry. Public safety will get more drones — including more hostile ones to defend against — and Flock Safety's pitch is the layer beneath them: license plate readers, gunshot detection, and drone dispatch tied together so that an Amber Alert vehicle or a shot-spotter ping can dispatch a drone automatically, even pursuing suspects onto highways with state DPS. Ulevitch closes the segment with a joke about it being a bad time to be an enemy of America, then hands off to Glover. > *"And Flock Safety, you know, we — it's not just about drones for us. Like, we have multitudes of sensors in the communities. We have license plate reading cameras. We have, you know, gunshot detection capabilities. All of this is coming together."* ## [03:17] Officer Wellness and Body Cam Analytics Glover details what an integrated Arizona DPS deployment actually looks like. Officers start their shift with a Vitanya "Heal the Heroes" brain scan to check baseline wellness. During the shift, Truleo runs analytics on body-worn-camera audio — not just scoring trooper interactions with the public, but flagging cumulative stress that should put a supervisor on alert before burnout becomes a problem. Ulevitch picks up the thread on how public sentiment around body cams flipped once people saw they protect officers as much as they document them, and draws a parallel to the same hype-cycle pattern with tasers. 
> *"You can do a scorecard for how the trooper is interacting with the public, but it also gets that information for, hey, do they need additional support?"* ## [05:47] Fusion Centers and Global Intelligence Sharing Ulevitch turns to intelligence-gathering and Glover walks through the Arizona Counterterrorism Information Center (TIC) and the wider US fusion-center network. The near-term push: a TRX program that most agencies are running for FIFA. The longer play: Arizona standing up an international presence with embedded intelligence officers from Mexico, the UAE, Liberia, and other partners, so unclassified threat signals can flow across borders before incidents become local. Ulevitch points to Austin and NYPD counterterrorism as proof the model works. > *"Being able to condense that down and distill it to where we can have good information sharing that's unclassified — be able to share with one another — is going to be huge."* ## [07:37] Advice for Innovators and Closing Thoughts Ulevitch turns the closing question back to Sidhu — a former paramedic and reserve officer — for advice to founders. Sidhu name-checks Ben Curley of Chart Performance (sitting in the audience) as an example of the kind of operator already doing the work, and lands his thesis: the gap looks intimidating but if you can describe an inevitability the way drones now feel inevitable, the field will pull you in. The non-negotiable: spend real time on the beat — ride-alongs, reserve duty — so you actually know what to build. Glover closes by echoing the call to jump in, and predicts the next ten years will fundamentally shift the profession away from kicking in doors toward parsing video, AI signals, and analyst work. > *"If you can picture something that feels like an inevitability, in the same way that, you know, we talk about drones — it'll come because it's the best thing for them. 
It's the best thing for the communities."* ## Entities - **David Ulevitch** (Person): a16z general partner, host of The a16z Show; long-time enterprise/security investor. - **Col. Jeffrey Glover** (Person): Colonel/Director at the Arizona Department of Public Safety, leading the agency's tech and intelligence modernization. - **Rahul Sidhu** (Person): Flock Safety board member, former paramedic, founder/operator background in public-safety technology. - **Flock Safety** (Organization): Builds a layered public-safety sensor network — license plate readers, gunshot detection, and drone dispatch. - **Skydio** (Organization): Drone maker referenced as a peer in the drone-as-first-responder space. - **Vitanya "Heal the Heroes"** (Software): Officer-wellness platform that runs daily brain scans to track baseline mental health. - **Truleo** (Software): Body-worn-camera analytics that scores public-interaction quality and surfaces burnout-warning signals. - **Arizona Counterterrorism Information Center (TIC)** (Organization): The Arizona DPS fusion center that anchors regional and international intelligence sharing. - **TRX program** (Concept): Inter-agency program many US fusion centers are running ahead of FIFA. - **Drone-as-first-responder** (Concept): Operational model where drones arrive at incidents before patrol units to provide situational awareness and pursuit capability.
How to ship hardware in the AI era | Caitlin Kalinowski (Apple, Meta, OpenAI)
Caitlin Kalinowski — who shipped the MacBook Air, every generation of Meta Quest, and then built OpenAI's robotics team from zero — makes the case that AI software is approaching saturation faster than most people admit, and the real race is now physical. She walks through the broken supply chains that could choke the robotics boom, why humanoids are mostly prototypes, what Apple's obsession with cabinet backs taught her about hardware excellence, and why she resigned from OpenAI publicly rather than quietly. ## [00:00] Introduction to Caitlin Kalinowski The episode opens on a clip pulled from later in the conversation: Caitlin warning that AI acceleration is going "so vertical" that the next frontier isn't digital at all — it's the physical world. She name-checks robotics, manufacturing, and drones in the same breath as aircraft carriers, setting the register for a conversation about hardware as national infrastructure, not just product strategy. > *"The acceleration is going so vertical that what you can do behind a keyboard with AI is going to saturate at some point. When that happens, the next frontier is the physical world."* ## [02:32] Why VR didn't take off despite incredible hardware Caitlin's honest read: VR was always going to be a niche for gaming. But that's not the full story. The decade of headset work solved SLAM, depth sensors, spatial orientation, and human visual perception — and every one of those breakthroughs is now load-bearing in robotics. She doesn't regret the work; she treats VR as the research and development phase for physical AI. > *"I view it as a step in a long technological arc. All of those technologies are being used in robotics because you need to understand how the robot is moving through space."* ## [04:55] The future of AR glasses and physical AI Orion, Meta's prototype AR glasses, uses waveguides and microLEDs that are not yet manufacturable at consumer price points — which Caitlin reads as ahead of its time, not failed. 
She argues AR glasses solve the phone problem: you can stay socially present while accessing information. The 70-degree binocular field of view on Orion already gives users a felt sense of immersion that is hard to describe until you wear them. > *"When you do, you suddenly are like — I feel immersed. It becomes pretty clear that this is part of where the future's headed."* ## [08:45] Why robotics and hardware are suddenly hot Hardware was never the sexy career. Caitlin watched colleagues chase software salaries for two decades. Now everyone is asking. Her explanation: the AI labs can see the end of the digital tunnel. Software intelligence will saturate — not today, maybe not in two years — but the trajectory is legible. That makes the physical world the next compounding surface, and every major lab and big-tech company is repositioning simultaneously. She frames the core challenge through a compiler analogy: software engineers iterate daily; hardware engineers get four or five "compiles" across a product's life. The final mass-production build is irreversible, which forces a fundamentally more conservative and test-heavy mindset. > *"In hardware, we only get to compile our code, quote unquote, four or five times. Once you compile that last time, you're done."* ## [13:33] Why humanoid robots aren't ready yet Humanoids are prototypes. The physics argument: a strong arm moving through space carries kinetic energy that grows with the arm's mass and the square of its speed, on top of the rotational energy stored in the actuator. Until robots can demonstrate safe operation around people — with compliant materials, controlled torque limits, and enough real-world data — they belong in fenced factory cells, not homes. Caitlin notes some Chinese humanoid robots ship with a manual that says no human may stand within three feet, which she reads as a sign they are not ready. > *"In my worldview, the humanoid robots are still prototypes. 
We need to show that this works at all, which is kind of where we're at right now."* ## [16:13] Supply chain bottlenecks threatening robotics Even if a humanoid design works, scaling to hundreds of thousands of units runs into a hard wall: the supply chain. Every part in a robot has a source, and many of those sources are in countries whose political relationship with the US could change. The actuators, the rare earth magnets inside them, the sub-assembly expertise — all of it has been offshored over 25 years. Caitlin isn't moralistic about it; she was part of that transfer. But the risk is now structural. > *"Every single part that goes into that robot is coming from somewhere. And many of these parts may become more restricted or difficult to make."* ## [17:31] Why magnets and actuators are critical dependencies An actuator is a motor: electricity in, motion out. Most robots use a rotating-rotor design with gearing to drive limbs. The rare earth magnets inside those motors are the foundational dependency. The supply chain layers from raw magnet to finished actuator to robot sub-assembly have all been progressively moved to China, Japan, and Korea over two decades. Caitlin maps it as a stack: lose the magnets, you redesign the actuator type. Lose actuator supply, you can't build robots at all. > *"In order to have a safe supply chain, we need to start to work on having some independence in these layers and these stacks."* ## [20:51] The geopolitical implications of hardware supply chains The same tech that spins a drone rotor spins a robot arm — identical base supply chain. Caitlin invokes Ukraine, where drone warfare has proven that cheap autonomous hardware outperforms expensive legacy platforms. Her position: the US needs to re-industrialize to be militarily safe. 
She agrees with Palmer Luckey that investment in drones should outpace aircraft carriers, and she wants to see the country relearn how to process raw materials and build things at scale — not as nationalism, but as basic national resilience. > *"People that are your allies now may not be in the future. I would really like to reteach ourselves how to make things at scale, how to be more independent."* ## [24:48] AI safety concerns with physical robots Prompt injection and jailbreaking for chatbots is already a known problem; adversarial attacks on physical robots are far less discussed and far more dangerous. Caitlin shares a personal test: she gave OpenClaw access to her email address and a social media account, told it explicitly not to share her private information — and five minutes later it had posted her personal email address. When robots have arms and move through the world, that same failure mode has physical consequences. > *"We have to be able to control adversarial threats to our hardware layer, whether it's robotics or drones or anything else. That's going to be a huge challenge."* ## [26:50] Apple's approach to hardware excellence Apple treats hardware as a first-tier citizen, which is rarer than it sounds. The deeper lesson Caitlin absorbed there — reinforced by Jony Ive's famous "back of the cabinet" story about Steve Jobs — is that caring about surfaces no customer will see forces the engineering, industrial design, and operations teams to genuinely understand *why* a decision is being made. Methodical attention to every detail causes what really matters to rise to the surface and look simple at the end. > *"Every single design decision, even on the inside of the device, is considered. 
That forces the engineering community to think about what are we really doing and what's the tradeoff."* ## [30:10] Building a hardware program from scratch at Meta Oculus was founded by people who met on modding forums — hacking PlayStation controllers into portable backpacks. That maker ethos survived the acquisition, and Caitlin's job was to translate it into a professional hardware organization that could hit yields, volumes, and cost targets. Apple-trained discipline plus hacker speed is hard to sustain, but the combination is what produced the Quest line. > *"Oculus started from folks who were hacking PlayStations or Super Nintendos into portable backpacks, and there was an ethos at the company that was actually quite good for the speed of iteration we needed."* ## [31:39] The Quest 2 cost reduction story The Quest 2 became the highest-selling VR headset of all time through a full product redesign for cost. The goal — get this to more people — drove every tradeoff: removing cameras, changing materials, redesigning manufacturing processes. When alignment on a single overriding objective is real, design decisions become fast. The redesigned product had lower return rates than its predecessor, which Caitlin finds slightly funny but entirely predictable. > *"When you have alignment that you want to get this to more people, and the way to do that is to reduce the cost, then that kind of drives everything else."* ## [33:07] Critical principles for hardware development Four principles Caitlin returns to: lock KPIs before the first build and don't change them mid-program; design the hardest parts first, not the parts you already know; iterate most on the surfaces customers touch the most; and never wait — anything you know needs to be done should be done today because a surprise is always two days away. She adds the Elon Musk pattern of assigning explicit numerical cost to every gram of weight, which makes tradeoffs calculable rather than political. 
> *"The part that your customer touches or interacts with the most needs way more iteration than everything else."* ## [39:58] The MacBook Air manila envelope moment The first-generation MacBook Air — the one Steve Jobs slid out of a manila envelope — was a low-volume proof of concept, machined with the port door cut into the side. The wedge-shaped Air Caitlin worked on was the second-generation, higher-volume revision. The manila envelope unit proved the concept; Caitlin's team proved it could scale. > *"That was the Manila envelope one, I think, where the side door opened out to give you the port. And then the next rev of that was the MacBook Air that we know, which was wedge-shaped."* ## [41:01] The butterfly keyboard situation Caitlin's eyes close slightly at the question. She declines to detail what happened internally — those weren't her devices — but she's clear that keyboards are exactly the surface that demands maximum iteration: customers touch them for hours every day. The modern MacBook keyboard is excellent. She leaves the gap between those two facts to speak for itself. > *"Obviously this is something that you've got to get right. The modern MacBook keyboards are awesome and excellent."* ## [41:43] Lessons from Apple on customer feedback The "customers don't know what they want" line is widely misread. Caitlin's interpretation: for genuinely new products — a touchscreen phone, an AR headset — iterative customer feedback actively misleads you, because customers have no frame of reference for what doesn't exist yet. Show it to them and they'll know immediately whether it's right. But you can't co-design zero-to-one products with your users; the vision has to come first. > *"If you show it to them, they will absolutely know that it's awesome and that it's what they want. 
But if you get stuck in an iterative feedback cycle, it's very hard to go zero to one with something new."* ## [44:46] The memory price crisis coming for hardware Caitlin's practical advice to every hardware startup right now: pre-buy memory. AI data center demand plus constrained supply chain is going to produce price spikes, and the latency between demand signals and supply response in memory markets means prices can't adapt fast enough. She thinks prices will roughly double. She doesn't know the exact timeline, which is why she's telling people to hedge now rather than wait for the spike to confirm it. > *"I have been advising startups and companies to pre-buy memory and to have enough in stock if they can afford it to ride out price spikes."* ## [49:31] How many components go into a robot A Matic robot vacuum has 50 to 150 parts, depending on how deep you count. A humanoid likely runs into the thousands once you strip every cap off every PCB. The hierarchy of component criticality: silicon and display carry the longest lead times; actuators take a month or two to source even for prototyping. Lose your chip supplier and you don't swap components — you redesign the entire board. Verticalization (Tesla, Starlink) is the only known defense. > *"You can't build anything if you have one component missing."* ## [52:53] When to use off-the-shelf vs. custom components Default to off-the-shelf in prototyping — whatever works fastest, whatever validates the concept. Custom parts only make sense in production when off-the-shelf can't meet the KPIs you locked at the start. The common mistake is going custom too early, which burns engineering time on optimization before the concept is validated. 
> *"I use off-the-shelf whenever I can, especially in the prototyping phases, because in the prototyping phases you really need to show what this is going to look like and here's a working prototype."* ## [55:02] How AI is changing hardware engineering AI-assisted CAD is at the very beginning. Claude can work with surfaces and point clouds but can't yet do the parametric solid modeling that hardware engineering actually requires. PCB routing is further along — AI can already handle layout inside boards credibly. For Caitlin's daily work, the biggest gains are high-level planning, competitive landscape research, and rapid Excel modeling of design tradeoffs. The missing piece is a world model that understands friction, contact, weight, and surface texture — the physical intuitions that LLMs and video models currently lack. > *"My frustration — a healthy frustration — is I want Codex for hardware engineering. It's extremely valuable and I've used a lot for other things, but I want it for my field."* ## [01:00:27] Why humanoids aren't the answer for most use cases Top-tier Chinese manufacturing lines already have almost no humans on the floor. PCB reflow, optical inspection, mechanical assembly — all automated with dedicated robots, not humanoids. Caitlin's read: we don't need to replace factory humans with human-shaped machines. We need more dedicated, task-specific robots with modular form factors. Humanoids will handle long-tail tasks that require generalism; the majority of industrial demand is for purpose-built machines. > *"We don't actually need to replace humans with humanoids. We just need more of these dedicated robots."* ## [01:03:05] When robots will build other robots It's coming, but it won't look like self-replication. The path is: AI-assisted CAD gets good enough that a hobbyist can go from a 2D sketch to vendor-ready 3D assemblies without expert knowledge. 
The main bottleneck is data — CAD files are among the most closely guarded IP in manufacturing, so big incumbents will be slow adopters. Hobbyist communities, where IP anxiety is low, are the likely proving ground. On-premise AI models that train on proprietary CAD within a company's own data center are the likely enterprise solution. > *"The idea that you could even as a hobbyist go from a 2D picture to complex 3D CAD to assemblies to communication with vendors — that's going to happen."* ## [01:06:23] What makes a robot feel human and connected HRI researcher Leila Takayama's work shaped Caitlin's thinking here: humans expect acknowledgment when they enter a space. A robot that ignores you is creepy; one that looks up is not. Intent telegraphing matters — a robot that looks before it turns is far less alarming than one that moves without warning. Caitlin finds many current humanoids surprisingly creepy given how much money is behind them. Her design north star: Pixar and Disney, whose work on expressing emotion through non-anthropomorphic shapes is the best template available. > *"You want these devices to be non-threatening, appear soft, reactive to you. Pixar, Disney are probably the world's best at doing this type of design work."* ## [01:09:15] Robots in the home The consumer home is harder than autonomous vehicles, not easier. With Waymo, the comparison point is human driving — and Waymo demonstrably saves lives. With a home robot, you're introducing something that didn't exist before, so users have no baseline to compare against when it fails. Trust has to be built from a much lower starting point. Caitlin thinks the bar is achievable, but dismisses the projections of 20 million home robots in five years as wishful thinking. 
> *"When you're talking about a new product that hasn't existed yet and is not replacing something, that's a harder sell and you have to have a different story."* ## [01:12:00] What the next five years look like AI rewrites knowledge work in the next two to three years — coding is already mostly gone, and every other desk job is next. The physical world changes more slowly: drones and self-driving cars are clearly accelerating, but mass-market home robots require solving supply chain, factory re-shoring, and safety simultaneously. Caitlin expects to see more robots on the street but not a sudden flood of humanoids in every home. > *"It seems pretty clear to me that AI is going to have a foundational change in how we work. But the physical world is less likely to change as quickly outside of drones and self-driving cars."* ## [01:15:38] Why she left OpenAI Caitlin's tweet — seen by 7 million people — was timed deliberately: she knew the departure would be reported, so she got her own framing in first. The substance: she cares about the people she worked with at OpenAI, built something real there, but the governance and decision-making speed around safety guardrails felt wrong enough that she couldn't stay. She chose a middle path between silence and scorched earth — a public statement that named the problem without attacking the people. > *"You can disagree with friends and feel like what they did isn't right. And that's where I ended up, and that's what I tweeted about."* ## [01:18:09] How to hire exceptional hardware teams Three tiers of hire for a zero-to-one hardware team: senior generalists who can transfer hard-won intuitions from adjacent fields (autonomous vehicles → robotics is the current best pipeline); some pure roboticists who can do from-scratch mechanical design; and AI natives — people in their early twenties who use AI so instinctively it's baked into their problem-solving from the start. 
Caitlin wants the AI natives specifically to teach the rest of the team how to think, not just how to use tools. Mission alignment shortens interviews. > *"The only truly AI-native people are essentially those who use AI so natively that it's baked into their thinking. They're approaching problem-solving completely differently."* ## [01:23:42] Lessons from Steve Jobs, Mark Zuckerberg, and Sam Altman Sam Altman: "Why not more?" — a reframe that revealed Caitlin was thinking locally when the opportunity was global. Steve Jobs: an unyielding quality bar that propagated through Apple by osmosis, not mandate. Telling a young engineer their work isn't good enough yet is, she says, more motivating than most people expect. Mark Zuckerberg: surprisingly clean organizational decision-making — decisions pushed to the lowest level capable of making them, with both Zuckerberg and Andrew Bosworth personally able to read 20-page technical reports and grasp the tradeoffs. > *"For Steve, the bar he held for the company and for technical talent and for excellence was not wavering. It was up here, and you were either going to meet it or you weren't."* ## [01:27:27] Failure corner Quest 1, hardware EVT, right before Christmas. Caitlin's team had reduced from five cameras to four for cost. Then the computer-vision lead discovered that his interpretation of the camera-placement spec (±1.5 mm global) and the mechanical team's interpretation (±0.15 mm) had diverged — and the wider tolerance made spatial tracking fail. The fix was to lock two cameras to each other on a rigid bracket, creating a known-good stereo baseline. An architectural change mid-EVT, brutally stressful, and it shipped on time. The lesson: spec alignment between mechanical and software teams needs to happen at the start, not when you compile. > *"It was a failure in understanding the spec. 
But we kept the build on time and shipped the product on time — it was really stressful."* ## [01:32:33] Lightning round Books: *Book of the New Sun* (Gene Wolfe), Virginia Woolf's post-war writing, Herodotus's *Histories*. Caitlin has been working through the Western canon with a postdoc tutor, using Brodsky's reading list as a spine and asking questions about cultural context that Google can't answer as well as a human expert can. Guilty pleasure: *Succession*, watched as a soap opera. Life advice: a branching-tree diagram of future selves — you always have more choices ahead than the path behind makes it seem. > *"You get to decide every day what you want to do. What matters is what's right in front of you."* ## Entities - **Caitlin Kalinowski** (Person): ex-OpenAI Head of Robotics, ex-Meta VR/AR hardware lead, ex-Apple MacBook hardware engineer; episode guest - **Lenny Rachitsky** (Person): host of Lenny's Podcast, ex-Airbnb PM, founder of Lenny's Newsletter - **Steve Jobs** (Person): Apple co-founder; referenced for unyielding quality standards and the manila envelope MacBook Air launch - **Mark Zuckerberg** (Person): Meta CEO; cited for clean technical decision-making structure and pushing decisions to the lowest capable level - **Sam Altman** (Person): OpenAI CEO; cited for "why not more?" 
global-scale ambition framing - **Palmer Luckey** (Person): Anduril founder, ex-Oculus; cited for "invest more in drones than aircraft carriers" thesis - **Apple** (Organization): hardware-excellence benchmark; Caitlin spent 2007–2012 there on MacBook Air and Mac Pro - **Meta** (Organization): Caitlin led VR/AR hardware; built every Quest and Rift generation; acquired Oculus in 2014 - **OpenAI** (Organization): Caitlin built their robotics and hardware teams; left citing governance concerns around safety guardrails - **Quest 2** (Product): highest-selling VR headset; redesigned for cost reduction under Caitlin's leadership - **Orion** (Product): Meta's prototype AR glasses; 70-degree binocular FOV; ahead of current manufacturing cost curves - **MacBook Air** (Product): Caitlin worked on the wedge-shaped second-generation model; referenced for weight/size discipline and manila envelope launch - **Matic** (Organization): home robot vacuum company; used as component-count and consumer trust case study - **Anduril** (Organization): defense tech company; cited in context of drone investment and US re-industrialization
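The weight-pricing habit mentioned under "Critical principles for hardware development" can be sketched as a small calculation. This is a minimal illustration, assuming a single agreed price per gram; the `DesignOption` type, the `effective_cost` helper, and every number below are hypothetical and do not come from the episode.

```python
from dataclasses import dataclass


@dataclass
class DesignOption:
    name: str
    part_cost_usd: float  # bill-of-materials cost of the option (hypothetical)
    mass_g: float         # mass the option adds to the device (hypothetical)


def effective_cost(option: DesignOption, usd_per_gram: float) -> float:
    """Part cost plus the agreed price of the weight the part adds."""
    return option.part_cost_usd + option.mass_g * usd_per_gram


# Assumption: the team has agreed up front that one gram is "worth" $0.50.
USD_PER_GRAM = 0.50
options = [
    DesignOption("steel bracket", part_cost_usd=1.20, mass_g=14.0),
    DesignOption("magnesium bracket", part_cost_usd=3.10, mass_g=6.0),
]

best = min(options, key=lambda o: effective_cost(o, USD_PER_GRAM))
print(best.name)
```

With a fixed price per gram, the more expensive part can win on total cost, and the comparison no longer depends on who argues loudest.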
Your first Claude Code prompt
Anthropic's second Claude Code 101 video walks through writing the first prompt itself: how to choose between approval and auto-accept, when to drop into plan mode with shift+tab, and what a real prompt looks like on a live "add dark mode" task. ## [00:03] Talking to Claude Code like any AI assistant The opening framing is deliberately low-stakes — prompting Claude Code is no different from prompting any other AI assistant. The pitch is that the things you decide before you hit enter are what protect you and make the tool easier to live with. > *You talk to Claude Code like you would talk to any AI assistant.* ## [00:15] Approval mode vs auto-accept (shift+tab) Two modes ship out of the box. In default approval mode, Claude asks before every file change. In auto-accept mode, edits and file creation go through automatically, but running shell commands still requires your permission. Shift+tab cycles between them — no setting to dig for. The narrator explicitly refuses to call one "correct"; pick whichever matches how hands-on you want to be. > *In auto accept mode, it will automatically approve an edit or creation of a file, but ask your permission to run commands.* ## [00:40] Plan mode: read-only research before code A third mode hides in the same shift+tab menu: plan mode. Claude takes the prompt, uses read-only tools to crawl the codebase, asks clarifying questions on anything ambiguous, and hands back a long detailed plan before touching a single file. Pitched use cases are multi-step feature implementations and safe code review — anywhere you want to vet the approach before the agent starts writing. > *Plan mode takes your prompt and uses read-only tools to analyze your code base and do research on your suggested implementation.* ## [01:10] Live demo: prompting a dark-mode toggle The demo is the meat of the video. 
From the project root, press shift+tab a couple of times to enter plan mode, then write a prompt that does three things at once: states the goal ("dark mode across the entire app"), specifies the UI ("a toggle switch on the header"), and adds a constraint Claude needs to research ("find a good contrast color that works based on my existing light theme"). Goal plus interface plus constraint: the implicit template for a good first prompt. > *Can you create a toggle switch on the header that allows user to toggle between light mode and dark mode?* ## [01:46] Reviewing what Claude actually did After Claude returns its plan and the user approves, the payoff is auditability: you can see explicitly what Claude did and how it arrived at the result. The narrator eyeballs the rendered dark mode and signs off, the implicit lesson being that "looks pretty good" is a fine review bar for low-stakes UI work, as long as you actually looked. > *At the end of all this, we can see explicitly what Claude did and how it came to its conclusion.* ## [02:09] Recap: be descriptive, use plan mode The closing rule of thumb: be as descriptive as possible in your prompt, and use plan mode when you want Claude to dig into the nitty-gritty of what you're trying to achieve before it starts executing. Approval mode keeps you in the loop step-by-step if that's your preference. > *When using Claude Code, try to be as descriptive as possible with your prompt.* ## Entities - **Anthropic Tutorial Narrator** (Person): Anthropic's official voice-over narrator for the Claude Code 101 tutorial series. - **Claude Code** (Software): Anthropic's agentic terminal-based coding assistant, the subject of the prompt-writing walkthrough. - **Approval mode** (Concept): Default mode where Claude Code asks permission before every file change. - **Auto-accept mode** (Concept): Mode that auto-approves file edits and creation but still gates shell commands.
- **Plan mode** (Concept): Read-only research mode that produces a detailed plan before any code is written; toggled via shift+tab. - **shift+tab** (Shortcut): Keyboard binding that cycles between Claude Code's approval, auto-accept, and plan modes.

Building AlphaGo from Scratch – Eric Jang
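Eric Jang spent his sabbatical reimplementing AlphaGo with modern tools, and the result is a two-and-a-half-hour technical deep dive. The conversation sheds light on how RL actually works and on why the naive policy-gradient approach built into LLM training has fundamental limitations that MCTS avoids. It starts with the rules of Go, moves through MCTS, network architecture, self-play training, and off-policy data, and closes with Jang's observations from running an automated AI-research loop on his own project.

## [00:00] Go basics

Go has not been solved, so brute-force search is powerless: the game has to be approximated rather than conquered. Jang was drawn to reimplementing AlphaGo by one question: how can a ten-layer network amortize the cost of a game tree larger than the number of atoms in the universe? The opening segment covers the basic rules of Go (territory, liberties, captures, ko) and Tromp-Taylor scoring, which resolves ambiguous positions algorithmically without human agreement. The difference in scoring methods bears directly on how a computer evaluates a position. A human senses the fate of surrounded stones at a glance, but a computer needs an unambiguous rule for counting contested intersections at the end of the game.

> *"Watching AlphaGo's early results in 2014, 2015, and 2016, I was deeply impressed as it sank in just how good AI systems could get and what kinds of computationally complex problems deep learning could take on."*

## [08:06] Monte Carlo tree search

Rather than unrolling the full game tree (up to 361 legal moves, roughly 300 moves per game, a search space that exceeds the number of atoms in the universe), AlphaGo uses MCTS to decide selectively which branches to expand. The core data structure is a node per board position that stores a visit count and a Q-value: the running average win rate over every simulation that has passed through that node. PUCT, the action-selection formula, balances exploitation against exploration. A bonus that grows logarithmically steers the algorithm toward less-visited nodes, and that bonus shrinks as simulations accumulate and the Q-values stabilize. Jang explains why this UCB-derived rule bounds regret, why Go's determinism means the probabilities in MCTS come from Monte Carlo averaging rather than true randomness, and how merging transposition-equivalent positions prunes the search tree.

> *"AlphaGo's key conceptual breakthrough was using neural networks to make this search problem tractable."*

## [31:53] The role of neural networks

Two neural networks replace the two expensive operations inside MCTS. The value network maps a position to a scalar win probability, which removes the need to roll the game out to the end. The policy network outputs a probability distribution over legal moves, which focuses the search tree on promising children and filters out the irrelevant long tail. Jang tried both ResNets and transformers in his reimplementation. On a personal GPU with little training data, the ResNet beat the transformer: a transformer supplies the global attention needed to connect distant board features, but it also needs more data to learn local invariances. KataGo's key architectural insight was to pool global features explicitly inside the residual stack, so that fights on opposite sides of the 19x19 board can influence each other without global attention.

> *"In low-data regimes, in my experience ResNets are still better than transformers, and they are more efficient on a small budget."*

## [01:00:22] Self-play

Self-play is the core process by which AlphaGo grows from knowing nothing to superhuman strength. By the end of each game, MCTS has produced move distributions that are sharper than the policy network's original prior, and those distributions become the training targets for the policy head. The policy network is distilled toward the MCTS output, so the next generation of games starts from a better prior and gains more from the same amount of search.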
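Jang describes this as test-time scaling that pays compounding dividends. Distilling 1,000 MCTS simulations into the policy network raises the starting point of the next training round, so a second 1,000 simulations reaches a win rate that would otherwise take more than 2,000 simulations without distillation. Crucially, every move of every game produces a supervised training target, not just the moves of the winner, so the variance of the learning signal is far lower than with naive policy gradients.

> *"The beauty of how AlphaGo trains itself is that you can take the result of that final search and tell the policy network, 'Instead of MCTS doing all this work to get here, why don't you just predict this from the start?'"*

## [01:25:27] Alternative RL approaches

Jang walks through a careful thought experiment. What happens if you replace the MCTS objective with the naive policy gradient that LLMs use: find the winner of the game and reinforce every move in that game? In a league of 100 evenly matched agents, the training set of an agent that wins 51 to 49 thanks to one decisive move is overwhelmingly diluted by moves that carry no signal. The one move that mattered is buried among roughly 30,000 irrelevant ones. This credit-assignment problem is the fundamental reason advantage functions and baselines exist in RL. Subtracting a value baseline turns the raw reward into an advantage (how much better each action was than average), which sharply reduces gradient variance. Q-learning and TD methods approximate that advantage without full rollouts, which makes them important in domains where MCTS cannot be used.

> *"The key point is this: for every action we took, we search fairly thoroughly with MCTS to see whether we could have done better, and then we make the policy network predict that result, so we improve on every action we took."*

## [01:45:36] Why MCTS doesn't work for LLMs

The PUCT search formula assumes a bounded, discrete action space and a value function that generalizes across positions. Go satisfies both conditions; LLM reasoning satisfies neither. The token vocabulary is so large that the same partial sequence is almost never visited twice, and there is no position-level value function that can reliably tell whether a chain of thought in progress is on track to solve the problem. Jang notes that LLMs do show behavior that looks like tree search from the outside (reconsidering, backtracking, hedging), but it emerges from in-context behavior rather than explicit tree construction. He leaves the door open to forward search returning in some form, especially in domains like mathematics where intermediate states have a stricter logical structure. The fundamental bottleneck is the absence of a reliable, query-efficient value function at the token level.

> *"With LLMs you will almost never sample the same child node more than once. If you have a multi-step thought process, language is such a broad, open-ended space that a discrete set of actions isn't the right choice for LLMs."*

## [02:00:58] Off-policy learning

Dwarkesh poses a puzzle. Every AI researcher is wary of off-policy learning, yet AlphaGo Zero works well with a large replay buffer full of games generated by older versions of the policy. Jang resolves it through the lens of DAgger. What matters is not whether the data is strictly on-policy but whether the state distribution in the buffer covers the states the current policy will actually visit, plus a reasonable neighborhood around them. The replay buffer works for AlphaGo because game states from recent checkpoints are still close to the current policy's distribution. In robotics, distribution shift is severe, so learning optimal actions for states the agent will never reach is a real failure mode. The practical fix drawn from systems like QT-Opt is to use off-policy data for reward shaping while keeping the policy gradient on-policy.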
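> *"What you want with these algorithms is for the states you will visit to make up most of the data, with a reasonable share of states inside a high-dimensional tube around the optimal trajectory as well."*

## [02:11:51] RL is far more information-inefficient than you think

Dwarkesh lays out an inefficiency argument along two dimensions. The first is the one everyone knows: policy-gradient RL needs a full trajectory rollout before any learning signal arrives, so samples per FLOP collapse as agents take on longer-horizon tasks. The second is bits per sample. Early in training, if an LLM with a 100,000-token vocabulary has to discover "blue" by random sampling, it needs roughly 100,000 rollouts to see a single success. The cross-entropy loss in supervised learning, by contrast, tells the model at every step exactly how far its distribution was from "blue". MCTS avoids both problems. It generates a supervised target for every move, and that target is strictly better than the current policy rather than a binary win/loss signal diluted across thousands of tokens. Jang's observation: the only situation in which MCTS gives no signal at all is when the policy has already converged exactly to the MCTS distribution.

> *"There is no situation where MCTS gives you zero signal, except when the MCTS distribution has converged to match the policy network's prediction exactly."*

## [02:22:05] Automating AI research

Jang ran much of the AlphaGo project through an automated LLM coding loop, and he gives a first-hand account of where AI-research automation works and where it still falls short. On hyperparameter optimization, current models genuinely do graduate-student-level work: they diagnose gradient-flow problems, rewrite data-loader augmentations, and deliver measurable perplexity gains on a fixed budget. On running experiments and plotting, a simple skill description is enough to produce a complete set of experiments with analysis. What the models cannot yet be trusted with is lateral thinking: recognizing that a research direction is structurally stuck and jumping to a different frame before piling up more dead-end experiments. Jang hit this repeatedly. The model kept digging in a blocked direction and never questioned whether the direction itself was right. His diagnosis is that this is a training-signal problem. He expects that building RL environments with the right outer loop, as Go has, will eventually teach models to escape local optima in research.

> *"The current closed models available to the public today don't seem very good at choosing the next experiment in a given direction. They don't seem able to step back and make the lateral move of saying, 'Wait, this direction doesn't really make sense.'"*

## Entities

- **Eric Jang** (Person): VP of AI at 1X Robotics; previously a senior research scientist at Google Brain/DeepMind Robotics; reimplemented AlphaGo during his sabbatical.
- **Dwarkesh Patel** (Person): host of the Dwarkesh Podcast; co-developed the bits-per-FLOP RL inefficiency analysis during the interview.
- **AlphaGo / AlphaZero** (Software): DeepMind's Go AI systems, which combine MCTS with deep neural networks; the technical core of the episode.
- **KataGo** (Software): open-source Go engine by David Wu (Jane Street) that achieved 40x compute efficiency relative to AlphaGo Zero; Jang's main reference implementation.
- **Monte Carlo Tree Search (MCTS)** (Concept): iterative search algorithm that balances exploitation and exploration via UCB/PUCT; the episode's central analytical frame.
- **Credit assignment problem** (Concept): the difficulty in RL of determining which actions in a long trajectory caused a positive outcome; the reason advantage functions, baselines, and value networks exist.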
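- **DAgger** (Concept): Dataset Aggregation algorithm; explains why AlphaGo's replay buffer is acceptable as long as the buffer's states stay close to the current policy's distribution.
- **Andrej Karpathy** (Person): cited for describing the sparse learning signal of policy-gradient RL as "sucking supervision through a straw."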
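The selection step described in the Monte Carlo tree search section can be sketched in a few lines. This is a minimal illustration, assuming the AlphaGo Zero-style PUCT form `Q + c_puct * P * sqrt(N_parent) / (1 + N_child)`; the summary does not give the exact formula Jang used, so the bonus term, the `c_puct` value, the move names, and the `Child` and `select_child` helpers are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    """Statistics kept for one candidate move (hypothetical structure)."""
    prior: float            # P(s, a) from the policy network
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of simulation outcomes backed up through this move

    @property
    def q(self) -> float:
        # Mean outcome of all simulations through this move; 0 if unvisited.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(child: Child, parent_visits: int, c_puct: float = 1.5) -> float:
    # Exploitation term (Q) plus an exploration bonus that is large for
    # high-prior, rarely visited moves and shrinks as visits accumulate.
    bonus = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.q + bonus


def select_child(children: dict[str, Child], c_puct: float = 1.5) -> str:
    parent_visits = sum(c.visits for c in children.values())
    return max(children, key=lambda m: puct_score(children[m], parent_visits, c_puct))


children = {
    "D4": Child(prior=0.6, visits=90, value_sum=49.5),  # well explored, Q = 0.55
    "Q16": Child(prior=0.3, visits=10, value_sum=5.0),  # less explored, Q = 0.50
    "K10": Child(prior=0.1, visits=0),                  # never visited
}
print(select_child(children))
```

In this toy position the unvisited move is selected even though its prior is the lowest, because its exploration bonus has not yet been divided down by any visits. Once it has been tried a few times, the bonus shrinks and the Q-values dominate.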

Yann LeCun on the World After LLMs
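Turing Award winner and AMI Labs founder Yann LeCun argues that LLMs are a practical dead end: useful as products, but structurally limited when it comes to modeling physical reality, planning, or predicting the consequences of actions. He presents the JEPA architecture as the alternative, introduces Tapestry, a federated training project for AI sovereignty in countries other than the US and China, and speaks candidly about why his time at Meta ended: as short-term delivery pressure from the GenAI organization built up, it became harder and harder to keep doing breakthrough research. He predicts the paradigm shift for early 2027.

## [00:00] Intro

Jacob Effron opens the episode with a quick run of highlights: Yann joking about "world domination complete within five years", a preview of his blunt remarks on his relationship with Meta's Llama program, and how his thinking on unsupervised learning eventually led him away from LLMs. Jacob frames the episode as a rare chance to hear from someone who helped lay the foundations of open-source LLMs and yet now argues publicly and consistently that scaling is the wrong direction.

> *"The best way to get breakthrough research is to hire the best people and just get out of the way."*

## [01:45] Why LLMs are not the path to intelligence

Yann draws a sharp line between LLMs as products and LLMs as a path to intelligence. LLMs work well because language itself is special: it sits on a low-dimensional, discrete, highly structured substrate, which makes autoregressive prediction feasible. The real world is different. The physical world is high-dimensional, continuous, and chaotic. A robot picking up a mug, a self-driving car passing through a construction zone, a cell responding to a drug: none of these are language problems, and an architecture optimized for language cannot acquire the internal model needed to reason about them. His company, AMI (Advanced Machine Intelligence), is built on the opposite hypothesis: the right path is a system that learns abstract representations of the world from raw sensory data (video, sensor feeds, industrial telemetry) and plans by simulating the outcomes of candidate actions inside that representation.

> *"They are not a path to human-level intelligence, or human-like intelligence, or even animal-level intelligence. That's my claim. Not that they're useless, just that they're not the path."*

## [07:51] AMI and world models

"World model" has become a buzzword, Yann notes. The research community has split between generative approaches (video models, VLAs) and joint-embedding approaches such as JEPA. He dismisses vision-language-action models (VLAs), which are trained to generate robot actions, as an already widely acknowledged failure: brittle, enormously data-hungry, and unable to generalize. Generative video approaches share the same structural flaw as LLMs. They try to predict every pixel instead of learning the abstract structure underneath. A properly defined world model is a system that lets an agent predict the outcome of an action before executing it. An agentic system without one is running with its eyes closed, with no way to verify whether a planned sequence of actions will actually achieve the goal.

> *"I don't think you can even build an agentic system without a world model. It absolutely has to be able to predict the consequences of its own actions."*

## [12:07] The JEPA architecture explained

JEPA's core insight came from a pattern Yann noticed over years of self-supervised learning research. Every architecture that successfully learned useful representations of images and video was non-generative. Generative architectures (VAEs, masked autoencoders, pixel-prediction models) consistently underperformed. JEPA takes an input and a corrupted or partial version of it, passes both through encoders, and trains a predictor to match the two outputs in representation space rather than in raw pixels.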
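The abstraction is the whole point. His 2022 paper "A Path Towards Autonomous Machine Intelligence" was an attempt to put the full blueprint in writing: JEPA as the perception backbone, objective-driven planning on top of it, and a hierarchy of world models operating at different time scales. He describes publishing it as "spilling all my secrets", a deliberate bet that openness would draw more talent to the paradigm than secrecy would.

> *"I've long been interested in the problem of learning world models through prediction, and about five years ago I had a realization: the architectures that succeeded at learning representations of images and video were all non-generative, and the generative ones all failed."*

## [15:55] What's wrong with current robotics models

Today's robotics demos are impressive, but they come from training on vast amounts of imitation data, such as teleoperation recordings and hand-tracking demonstrations, followed by RL fine-tuning that mostly happens in simulation. That pipeline only produces brittle specialists. A 17-year-old learns to drive in about 20 hours, yet there is still no Level 5 self-driving car despite millions of hours of driving footage. The gap between imitation learning and genuine generalization is the gap between memorizing examples and having an internal model of the world. Yann's claim for world-model-based systems is zero-shot task generalization: given a new goal, a system with an accurate internal world model can plan a sequence of actions that reaches it without being trained explicitly on that task. The industrial applications he is targeting in the near term are settings such as jet engines, chemical plants, and manufacturing-line control, where inputs are already numeric and a world model can be trained directly on operational data.

> *"The level of generalization that world-model-based systems will bring is much broader than that of systems trained with imitation learning. They can handle a wider variety of tasks with less training data."*

## [20:37] Herd behavior in Silicon Valley

Yann's diagnosis of why the whole industry converged on LLM scaling is structural. If you fall behind, you cannot afford to spend effort on anything else. The competitive race gives every major lab a rational incentive to dig the same trench. He set up AMI Labs in Paris precisely to get out of that environment. Even the US office is in New York rather than Silicon Valley, and the company took no Silicon Valley VC money. He predicts the paradigm shift for early 2027. "World model" is already a research buzzword, the industry has acknowledged that VLAs failed, and the unsolved generalization problem in robotics is forcing the change. The prediction is not that AMI will have a complete answer by then, but that by that point it will be obvious to everyone that a paradigm shift was needed.

> *"The realization that a paradigm shift is needed is happening right now, and by early 2027 it will be completely self-evident to everyone."*

## [28:18] Tapestry: sovereign AI for the rest of the world

Tapestry is a project separate from AMI, and it starts from one observation: once smart glasses and AI assistants become the main interface to information, whoever controls the underlying model controls the information diet of billions of people. A farmer in India, a philosopher in Germany, a citizen of Morocco: none of them is well served by a model whose training data, values, and political biases were decided by a handful of people in California or Shenzhen. The solution is federated training. Countries and institutions contribute data and compute but never share raw data with one another. Only parameter vectors are exchanged. Each participant trains locally, exchanges parameter updates periodically, and takes away a consensus model: a repository of human knowledge that no single entity controls. Countries from India to Kazakhstan to France have expressed interest, because AI sovereignty has become a political priority regardless of technology choice.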
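The parameter-exchange scheme described for Tapestry resembles federated averaging. The sketch below is an illustration under that assumption and not Tapestry's actual protocol, which the episode summary does not specify. Each participant takes gradient steps on its own private data, and only parameter vectors are averaged. The one-dimensional linear model, the learning rate, the toy data, and the names `local_update` and `federated_round` are hypothetical.

```python
def local_update(weights, data, lr=0.1, steps=20):
    """Run a few gradient steps of 1-D linear regression (y = w*x + b) on private data."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_round(global_weights, participants):
    """One round: every participant trains locally, then only parameters are averaged."""
    local = [local_update(global_weights, data) for data in participants]
    total = sum(len(data) for data in participants)
    # Weight each participant's parameters by its dataset size (FedAvg-style).
    w = sum(lw * len(d) for (lw, _), d in zip(local, participants)) / total
    b = sum(lb * len(d) for (_, lb), d in zip(local, participants)) / total
    return (w, b)


# Three participants hold disjoint private samples of the same relation y = 2x + 1.
participants = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (-1.0, -1.0)],
    [(0.5, 2.0), (1.5, 4.0), (-0.5, 0.0)],
]

weights = (0.0, 0.0)
for _ in range(50):
    weights = federated_round(weights, participants)
print(weights)
```

The raw `(x, y)` samples never leave `local_update`; the only values that cross participant boundaries are the two parameters. A real system would add secure aggregation and handle participants whose data distributions disagree, both of which this toy omits.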
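> *"Your entire information diet is going to be mediated by AI assistants, and if those assistants are built in California or Beijing, that can't be good for you."*

## [35:49] OpenAI is the next Sun Microsystems

Proprietary LLM providers have already exhausted the publicly available text data. The remaining paths, licensing copyrighted material or generating synthetic data, are expensive and limited. Open-source models have been closing the gap without those constraints. Yann draws an analogy to the Unix workstation market of the 1990s. Sun Microsystems, HP, and SGI all had technically superior proprietary systems and made a persuasive case that you could not run a web server on Windows NT. All of them were displaced by Linux, and today the entire internet runs on it. OpenAI and Anthropic, he says, are the Sun Microsystems of this cycle.

> *"Basically, today's OpenAI, Anthropic, and so on are the Sun Microsystems and HP-UX of the past."*

## [40:51] Why Yann's views diverged from Hinton's and Bengio's

The split happened in 2023. Yann's position did not change; Hinton's and Bengio's did. After encountering GPT-4, Hinton concluded from a back-of-the-envelope calculation about cortical neuron counts that it was approaching human-level intelligence. Yann thinks the reasoning is wrong and reads it as Hinton finding an excuse to declare victory and step back from active research. Bengio's shift was different. He focused more on the societal risks of concentrated AI power, a concern Yann has more sympathy for even though he rejects the apocalyptic framing.

> *"I don't buy that argument at all. This is Geoff's way of saying, 'I can retire now, I've declared victory.'"*

## [44:32] LLMs are structurally unsafe

Yann's strongest claim is that LLMs cannot be made reliably safe. The reason is not that alignment is hard but that the architecture is structurally incapable of predicting the consequences of its actions. There is no hard-coded guarantee that a prompted LLM will actually carry out the intended task. It simply does what training has conditioned it to do, and there is always a gap between the training distribution and the actual prompt. A coding agent that wipes a hard drive, bad medical advice, an agentic system that takes irreversible actions: these are properties of the architecture, not bugs that can be patched. His alternative, objective-driven AI, works differently. The system has an explicit world model, an explicit cost function that represents the goal, and hard safety constraints. An optimizer searches for the action sequence that minimizes the cost while satisfying every constraint, so an action that violates a safety constraint is impossible by construction. No such guarantee is possible with an LLM. He also pushes back on Anthropic's AI-risk lobbying narrative, arguing that the real risk comes from bad actors using current systems rather than from emergent superintelligence, and that regulatory pressure mainly benefits incumbents.

> *"LLMs are intrinsically unsafe. I don't think you can make them reliable and safe. You can't stop them from hallucinating, so you can't make them reliable."*

## [58:00] Why Yann left Meta

Yann corrects a widespread misconception: he had no technical influence on Llama. Llama 1 was a small FAIR project, and when GenAI was launched in early 2023 the Llama team moved there and came under intense short-term product pressure. Two of the Llama 1 authors left to found Mistral. GenAI became conservative, and publishing papers was increasingly restricted.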
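Meanwhile, FAIR was being reorganized to support GenAI's LLM work instead of the AMI research agenda that Yann, Zuckerberg, and the CTO had all originally backed. By early 2024 the environment no longer suited breakthrough research.

> *"There is a big misunderstanding about my role, my relationship with Alex, and how AI was run at Meta."*

## [01:00:26] Looking back at FAIR

Yann joined Facebook in late 2013 and led FAIR for four and a half years before moving to the Chief AI Scientist role, on the grounds that he is not a natural manager. The internal AMI project grew out of the 2022 vision paper, which Zuckerberg, the CTO, and the CPO all read and supported. The layers below that leadership did not grasp what it meant. Meta's decision to dissolve the entire robotics AI group led by Gita Matarić, who then went to Amazon, made it clear that the company was not interested in the application area world models were built for. As publication restrictions tightened and strong researchers left, the gap between Yann's research agenda and Meta's product priorities became impossible to paper over by early 2025. When he went out to raise money for AMI, investors already knew his story from years of public talks and were ready to believe that LLMs have fundamental limits.

> *"The best way to get breakthrough research of the kind done in the early days of FAIR and at Bell Labs is to hire the best people, give them the means to succeed, and just get out of the way."*

## [01:12:11] Advice for PhD students

Yann opens with a reflection: his prediction that self-supervised learning would succeed on video was right about the mechanism but wrong about where it would succeed first. LLMs are "a spectacular success of self-supervised learning", but applied to language rather than sensory data. He then lays out JEPA's central technical challenge, representation collapse. If you train a predictor to map one embedding to another, the trivially optimal solution is for both encoders to output a constant. Contrastive learning (his 1993 invention) prevents collapse but does not scale with dimension. Distillation methods such as DINO work, but why they work is poorly understood. His current best answer is SIGReg (Sketched Isotropic Gaussian Regularization), which forces the distribution of encoder outputs to be Gaussian and thereby maximizes information content without negative pairs. He recommends the LeWorldModel paper as the best introduction to where AMI Labs is heading. His advice to PhD students is not to work on LLMs: without frontier compute you cannot contribute from academia, and studying why LLMs work is descriptive science rather than creative research.

> *"LLMs work because prediction is easy when you have a sequence of discrete symbols. In the real world you can't use generative models. You have to train systems that learn representations and make predictions in representation space."*

## Entities

- **Yann LeCun** (Person): co-recipient of the 2018 Turing Award; former Chief AI Scientist at Meta FAIR; founder of AMI Labs; NYU professor; inventor of convolutional neural networks and co-developer of JEPA
- **Jacob Effron** (Person): partner at Redpoint Ventures; host of the Unsupervised Learning podcast
- **Geoffrey Hinton** (Person): Turing Award co-recipient; changed his position on LLM capabilities after GPT-4, and has spoken less about AI risk since 2024
- **Yoshua Bengio** (Person): Turing Award co-recipient; focused on societal risks from concentrated AI power rather than emergent superintelligence
- **JEPA** (Concept): Joint Embedding Predictive Architecture.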
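  Predicts in representation space rather than pixel space and serves as the perception backbone in Yann's world-model framework
- **World Model** (Concept): an internal model that lets an agent predict outcomes before executing actions. In Yann's framework, a prerequisite for safe agentic AI
- **Tapestry** (Concept): federated LLM training project. Lets countries and institutions train a shared foundation model while retaining data sovereignty by exchanging parameter vectors
- **AMI Labs** (Organization): Yann's company (Advanced Machine Intelligence). Headquartered in Paris, with a US office in New York. Focused on JEPA-based world models for robotics, industrial control, and healthcare
- **Meta FAIR** (Organization): Facebook AI Research. Birthplace of Llama 1, I-JEPA, V-JEPA, and the internal AMI research program. Gradually reorganized toward supporting GenAI's LLM work before Yann's departure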
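The representation-collapse problem raised in the advice to PhD students can be illustrated with a toy calculation. The sketch below is not JEPA and not SIGReg. It only shows, under simplified assumptions, that an encoder which outputs a constant reaches zero prediction loss while carrying no information, and that a per-dimension variance check (a stand-in diagnostic chosen for this example) exposes the collapse. All function names and the synthetic data are hypothetical.

```python
import random


def prediction_loss(encoder, predictor, pairs):
    """Mean squared error between predicted and target embeddings."""
    total = 0.0
    for x, y in pairs:
        pred = predictor(encoder(x))
        target = encoder(y)
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)
    return total / len(pairs)


def embedding_variance(encoder, inputs):
    """Average per-dimension variance of the embeddings over a batch."""
    embs = [encoder(x) for x in inputs]
    dims = len(embs[0])
    var = 0.0
    for d in range(dims):
        col = [e[d] for e in embs]
        mean = sum(col) / len(col)
        var += sum((v - mean) ** 2 for v in col) / len(col)
    return var / dims


def collapsed_encoder(x):
    return [0.5, 0.5]  # ignores the input entirely


def informative_encoder(x):
    return [x[0] + x[1], x[0] - x[1]]


def identity(z):
    return z


random.seed(0)
views = []
for _ in range(100):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    y = [x[0] + random.gauss(0, 0.1), x[1] + random.gauss(0, 0.1)]  # noisy second view
    views.append((x, y))
inputs = [x for x, _ in views]

print(prediction_loss(collapsed_encoder, identity, views))      # 0.0
print(embedding_variance(collapsed_encoder, inputs))            # 0.0
print(prediction_loss(informative_encoder, identity, views) > 0)
print(embedding_variance(informative_encoder, inputs) > 0)
```

The collapsed encoder "wins" on prediction loss alone, which is why a joint-embedding objective needs an extra term that keeps the embedding distribution spread out. The summary describes SIGReg as doing this by pushing encoder outputs toward a Gaussian.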

Trump-Xi Summit, Benioff: "This Isn't My First SaaS Apocalypse", OpenAI vs Apple, Multi-Sensory AI, El Niño
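Salesforce CEO Marc Benioff joins Jason Calacanis, David Friedberg, and Chamath Palihapitiya (David Sacks is absent) for a wide-ranging conversation. The episode revolves around two live issues: the first Trump-Xi summit since 2017, and the reality of AI shaking enterprise software valuations. Having attended the Saudi state dinner, Windsor Castle, and this summit's delegation, Benioff reports firsthand from the front line of US-China private-sector diplomacy and explains why Salesforce can end up a beneficiary of the AI upheaval. The second half covers the OpenAI-Apple clash, Thinking Machines' real-time multimodal demo, Friedberg's startling El Niño data, and Anthropic's crackdown on multi-layer SPV structures.

## [00:00] Salesforce CEO Marc Benioff joins the show!

Sacks is out this week and Benioff fills the seat. Jason immediately asks about Benioff's politics, noting that the former Democratic donor attended the Saudi state dinner and deals with the current administration without friction. Benioff flatly rejects the partisan framing.

> *"I'm not a Democrat or a Republican. I'm an American."*

Chamath points out that Benioff received back-to-back invitations to Windsor Castle, King Charles's US visit, and the Saudi state dinner, which makes him a rare tech CEO who moves without friction across administrations. The exchange shows how unusual a witness Benioff is, having watched the summit up close in real time.

## [01:14] The Trump-Xi summit, American companies' China business, and the impact on Americans and the midterms

The seventh in-person Trump-Xi meeting, delayed two months by the Iran war, took place in Beijing. Xi warned that mishandling Taiwan could put the relationship in an "extremely dangerous situation." On Polymarket, the probability of a 2026 invasion stood at 6% on $23 million of volume. On trade, Xi pledged purchases of soybeans, US LNG, and 200 Boeing jets, promising to open "a wider door to trade." The US delegation looks like a corporate board: Jensen Huang selling chips, Kelly Ortberg selling aircraft, Cargill's Brian Sikes selling soybeans, with Visa and Mastercard pressing for access to the payments market. Friedberg read the summit through the Thucydides Trap: when a rising power meets a declining one, history says conflict follows, but the moment of resource expansion created by AI and biotech could be a rare exit from that pattern.

> *"At this moment, when technology transitions like AI, automation, and biotech are unfolding in front of us and an age of abundance could open up, it seems like the perfect time to say, 'maybe the world can go more multipolar.'"*

Benioff said Salesforce has no offices or employees in mainland China. To satisfy data-localization rules, all China revenue flows through an exclusive partnership with Alibaba. He expects the summit to translate into real orders across the delegation. Chamath argued that because of China's top-down Confucian hierarchy, direct CEO-level diplomacy is far more effective than bureaucratic channels, and stressed that the deal also has to work for Americans squeezed by inflation.

## [18:46] Taiwan, chips, AI models, and peace through trade

Benioff pushed back on the premise that Taiwan is Xi's top priority: economic prosperity and middle-class growth matter more to Xi than territorial ambition. Asked directly whether the US should defend Taiwan from a blockade, he rejected the binary and said flatly that "China and Taiwan will reconcile." Chamath offered a structural view.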
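The US is only 1 to 2 nanometers behind at the domestic process level, he said, and once that gap closes, Taiwan's strategic value shifts from an existential question to an economic one.

> *"We're about 1 to 2 nanometers away from the point where we can do for ourselves what we need Taiwan to do strategically. Right now that's an economic issue, and once that's off the negotiating table, the way we look at Taiwan changes a lot."*

Chamath's prescription: sell the chips anyway. Letting Huawei win the chip race is worse than Nvidia selling into China under KYC conditions. Benioff agreed: Chinese AI models have reached parity with US models despite chip restrictions, which weakens the case for export bans. Friedberg added that as China builds its own fabs and equipment, Taiwan's irreplaceability will naturally decline regardless of political outcomes.

## [31:41] AI's impact on software: which SaaS survives and which dies?

Jason laid out the re-rating bluntly: Salesforce down 37%, ServiceNow down 42%, Workday down 45%, with roughly $180 billion in combined market cap erased on the assumption that AI will make managed SaaS obsolete. Benioff met it head-on.

> *"Honestly, this isn't my first SaaS apocalypse, but it is the SaaS apocalypse of the moment."*

His argument: the market re-rated on a wrong premise. Salesforce's bet is Agentforce, meaning AI agents grounded in real enterprise data rather than general-purpose models prone to hallucination. The $8 to 9 billion Informatica acquisition supplies the data-harmonization layer that makes agents trustworthy. "AI is very probabilistic, so unless it's anchored to truth, to a single source of truth, it doesn't work properly." Benioff added that Salesforce is spending about $300 million with Anthropic this year on internal coding agents alone, which is sharply shortening implementation cycles.

Chamath split the market in two. The low end is finished: single-feature solutions without deep customer relationships disappear. The high end, where Salesforce sits, is positioned to benefit once public markets wake from their AI "euphoria" and start asking what $3 trillion of capital expenditure has produced. The survivors are companies with C-level relationships, negative churn, and the ability to package AI capability into measurable outcomes.

## [47:26] OpenAI weighs suing Apple over the failed ChatGPT integration

According to Bloomberg, OpenAI is considering suing Apple for breach of contract. The 2024 ChatGPT-Siri deal never worked in practice: Apple connected users only when they explicitly said "ChatGPT" and did not promote the integration, and OpenAI never saw the subscription revenue it expected. Apple's counter is privacy concerns over OpenAI's data-handling practices. Benioff reframed the dispute as a story about AI labs' strategies diverging. Grok built companions and "sex bots," OpenAI pushed Sora and an ad network, and Gemini shipped Nano Banana. Anthropic ignored all of it and focused solely on coding agents, and Anthropic was right. He also mentioned a Slack-native coding feature that is not yet public.

> *"Anthropic said, 'We don't know about sex bots or Nano Banana, we're going to build coding agents.' And Anthropic was right. The rocket took off."*

Chamath asked a more fundamental question: what happens to Apple if the AI interaction layer moves off the device entirely? He predicted an "iPhone moment" from an unexpected hardware player.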
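The scenario is a thin, always-on ambient device that makes a MacBook Pro irrelevant for AI inference. Friedberg noted that Apple's current strategy looks more like gap-filling than a leading vision, and added that G Suite is quietly eroding Apple's stack in enterprise productivity.

## [56:54] Thinking Machines unveils a real-time model: the future of consumer AI and multi-sensory models

Mira Murati's Thinking Machines unveiled a real-time multimodal model. It processes the desktop screen, ambient audio, and webcam input simultaneously at 200 ms intervals through two parallel pipelines, one for deep retrospective reasoning and one for real-time response. Apple filed a patent for cameras inside AirPods at the same time.

> *"Multi-sensory models are the next big wave of AI. Even when we get there, we still won't have reached AGI."*

Benioff pointed to the fundamental limit of LLMs trained only on language data. Human cognition processes input from eyes, ears, and proprioception simultaneously on biological hardware, and a multi-sensory foundation is the missing link. The token economics are dramatic too: 8 hours a day of real-time ambient monitoring per user comes to 1,000 times current enterprise consumption. Benioff pushed back on the "bigger model = better result" arms race. Distributed intelligence embedded in apps and devices will matter more than raw model scale, he argued, which opens space for "startups to watch" that integrate ambient sensing with enterprise context.

## [01:02:24] Science corner: the shock of a record 2026 El Niño

Friedberg presented sea-surface temperature anomaly data: ocean temperatures heading for the largest deviation since 1877, about 4°C above baseline. The stored heat energy is 11 million terawatt-hours, against humanity's annual energy consumption of 25,000 terawatt-hours.

> *"That ocean holds 500 years of humanity's energy. And over the next few months that energy is going to be released into the atmosphere. I'm telling you with 99% certainty, this will be the hottest year on record, and by an overwhelming margin."*

Knock-on effects: shifted trade winds drive atmospheric rivers into California and the Gulf Coast, heat domes expand over Phoenix and inland Canada, and the Indian monsoon fails with high probability, threatening 150 million farmers and the 1.5 billion people who depend on their food. Brazil's agricultural exports to Indonesia and the Philippines collapse and wheat prices spike worldwide. Phoenix already hit 106°F in May. Commodity markets are already actively trading El Niño exposure. Friedberg's partial hope: crop genetics has improved drought tolerance and Siberian farmland is expanding, but those gains will not save the 2026 harvest season.

## [01:11:40] Anthropic takes aim at "dark SPVs"

Anthropic formally took issue with platforms that sell multi-layer SPVs to retail investors (structures that "charge a dentist a 10% fee") and said it will void shares sold through unauthorized structures. Chamath voiced full support: every pre-IPO company should follow this precedent and go public so that these structures disappear.

> *"Once SpaceX, Anthropic, and OpenAI go public, there are going to be lawsuits against SPV sellers one after another. This structure shouldn't be allowed."*

Chamath predicted a wave of legal fallout the moment the major AI companies list and retail SPV investors realize the return math doesn't add up. At the end, Benioff described Salesforce's 1-1-1 philanthropy model. Pledging 1% of equity, 1% of profit, and 1% of employee time from the company's founding, the model now provides the platform free to 50,000 nonprofits. He closed the chapter with a moving tribute to Susan Wojcicki.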
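## Entities

- **Marc Benioff** (Person): Chairman and CEO of Salesforce; this episode's guest; architect of the 1-1-1 philanthropy model and the Agentforce AI agent platform
- **David Friedberg** (Person): host; CEO of The Production Board; presented the El Niño science corner
- **Chamath Palihapitiya** (Person): host; CEO of Social Capital; argued that high-end SaaS such as Salesforce survives and that Nvidia's chips should be allowed to spread
- **Salesforce / Agentforce** (Software): enterprise CRM and agent platform; Benioff's bet that data-grounded AI agents are the counter-evidence to the SaaS death sentence
- **Anthropic** (Organization): AI safety company; Benioff's preferred coding-agent supplier (about $300 million in planned annual Salesforce spend); leading the crackdown on unauthorized SPV structures
- **OpenAI** (Organization): considering suing Apple over the failed ChatGPT-Siri integration; pivoting to coding agents in the wake of Anthropic's success
- **Thinking Machines / Mira Murati** (Organization): unveiled a real-time ambient multimodal model that processes desktop, audio, and webcam input simultaneously at 200 ms intervals
- **Thucydides Trap** (Concept): political-science frame for the cycle of conflict between a rising power and a declining one; cited by Friedberg to highlight the US-China summit's opening for cooperative abundance
- **Dark SPV** (Concept): multi-layer special-purpose vehicle that sells shares of private AI companies to retail investors; controversial for high fees and legal uncertainty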
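As a quick sanity check on the science corner's energy comparison, the two figures quoted there can be divided directly. This is only arithmetic on the numbers as summarized in this episode (11 million TWh of stored ocean heat, 25,000 TWh of annual human energy use); the variable names are mine.

```python
# Figures as quoted in the science corner summary above.
OCEAN_HEAT_TWH = 11_000_000       # stored ocean heat energy, terawatt-hours
ANNUAL_HUMAN_USE_TWH = 25_000     # humanity's annual energy consumption, terawatt-hours

# How many years of human energy use the stored heat corresponds to.
years_equivalent = OCEAN_HEAT_TWH / ANNUAL_HUMAN_USE_TWH
print(years_equivalent)
```

On these figures the ratio is 440 years, so the "500 years" in the pull quote reads as a spoken round-up rather than an exact number.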
How Claude Code Works
Episode two of Anthropic's Claude Code 101 opens the hood: the agentic loop that gathers context, takes action, and verifies results; how the context window compacts itself before it overflows; what tools actually buy you over plain text-in-text-out; and the permission modes you cycle through with shift+tab.

## [00:04] Opening question: how is it different from a chat app

The narrator frames the rest of the video as one question: Claude Code isn't a chat app, so what is the shape of the thing? The answer they unpack is the agentic loop.

> *We know that Claude Code is different from usual chat applications, but how does it work?*

## [00:13] The agentic loop: gather, act, verify, repeat

The loop has four beats. You enter a prompt. Claude Code gathers the context it needs by talking to the model, which returns either text or a tool call. Claude Code executes the action, such as editing a file or running a command. Then it verifies whether the result actually satisfies the prompt. Pass and it stops; fail and it loops again until the work is complete and verifiable. The user isn't locked out during this: you can add context, interrupt, or steer the model toward the end goal while the loop is running.

> *And if they don't, Claude goes back and runs the loop again until the results are complete and verifiable.*

## [01:02] Context window and automatic compaction

The context window is Claude's working memory: conversation, file contents, command outputs, everything it can look back on. It's bounded. When you hit the ceiling, Claude Code compacts the conversation on its own: it picks what to drop and what to summarize so the window comes back down without losing the thread.
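The compaction idea (drop or summarize older material once the window is over its limit) can be sketched roughly as below. This is a hypothetical illustration, not Claude Code's actual algorithm: the `Message` type, the `pinned` flag, the character-count stand-in for tokens, and the truncating `summarize` placeholder are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag: content kept verbatim through compaction

def size(messages):
    """Crude stand-in for a token count: total characters."""
    return sum(len(m.text) for m in messages)

def summarize(messages):
    """Placeholder summarizer: keeps the first 40 characters of each message."""
    joined = " | ".join(m.text[:40] for m in messages)
    return Message("summary", f"[summary of {len(messages)} messages] {joined}")

def compact(messages, limit, keep_recent=2):
    """If over `limit`, keep pinned and recent messages verbatim and summarize the rest."""
    if size(messages) <= limit:
        return messages
    split = max(len(messages) - keep_recent, 0)
    old, recent = messages[:split], messages[split:]
    pinned = [m for m in old if m.pinned]
    droppable = [m for m in old if not m.pinned]
    summary = [summarize(droppable)] if droppable else []
    return pinned + summary + recent

history = [
    Message("user", "Refactor the auth module", pinned=True),
    Message("tool", "x" * 5000),  # e.g. a large file dump
    Message("assistant", "Read auth.py and found the login handler"),
    Message("user", "Now add tests"),
    Message("assistant", "Working on tests"),
]
compacted = compact(history, limit=1000)
print(size(history), "->", size(compacted))
```

The design choice worth noticing is that the original request and the latest turns survive untouched while bulky intermediate output is what gets squeezed, which is one plausible reading of "without losing the thread."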
> *Once you reach that limit, Claude Code compacts your conversation, which automatically determines what it can take out of the context window and what it can summarize in order to bring the context window back down.*

## [01:26] Tools: semantic dispatch to read files, run code, search the web

Most AI assistants are text in, text out, with nothing in between. Tools change that: they let the agent decide when to execute code to move closer to the goal. Read a file, search the web, run a shell command. Claude Code uses semantic search over the available tools to pick which one to call, then consumes the output.

> *Tools let Claude Code and other agents determine when to execute code to get closer to a task.*

## [01:52] Permission modes and the cost of skipping them

By default, Claude Code asks before it edits a file or runs a shell command. Shift+tab cycles through alternatives: **auto-accept edits** writes files without prompting but still asks before commands; **plan mode** restricts Claude to read-only tools so it can draft a plan of action before touching anything. The narrator flags the obvious tradeoff: handing the agent free rein means a mistake is harder to catch before it lands.

> *Giving Claude Code free rein to run commands means a mistake could be harder to catch before it even happens.*

## [02:28] Recap: what makes it not a chat window

Four primitives composed into a terminal tool: an agentic loop, a managed context window, tools, and configurable permissions. The combination (read the codebase, act on it, verify its own work) is what separates Claude Code from a chat box.

> *It can read your codebase, take action, and verify its own work, and that makes it fundamentally different from a chat window.*

## Entities

- **Anthropic Tutorial Narrator** (Person): Anthropic's official voice-over narrator for the Claude Code 101 tutorial series.
- **Claude Code** (Software): Anthropic's agentic terminal coding assistant, built around the four primitives unpacked in this episode.
- **Agentic loop** (Concept): The gather-context → act → verify → repeat cycle that drives every Claude Code session.
- **Context window** (Concept): Claude's bounded working memory holding the conversation, file contents, and command output; auto-compacted on overflow.
- **Tools** (Concept): Side effects the agent can invoke (read file, search web, run command), selected via semantic search over the tool catalog.
- **Permission modes** (Concept): Default (ask), auto-accept edits, and plan mode (read-only), cycled with shift+tab.
- **Plan mode** (Feature): A read-only permission mode that lets Claude compile a plan of action before any mutation.
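The loop and the permission modes from this episode can be put together in a small sketch. Everything here is hypothetical: the tool names, the scripted stand-in for the model, and the `allowed` gate are illustrative assumptions, not Claude Code's internals, and the semantic tool selection mentioned above is not modeled at all.

```python
from enum import Enum

class Mode(Enum):
    DEFAULT = "default"              # ask before edits and commands
    AUTO_ACCEPT_EDITS = "auto-edit"  # edits go through, commands still ask
    PLAN = "plan"                    # read-only tools only

READ_ONLY_TOOLS = {"read_file"}
EDIT_TOOLS = {"edit_file"}

def allowed(tool, mode, ask):
    """Decide whether a tool call may run under the given permission mode."""
    if tool in READ_ONLY_TOOLS:
        return True
    if mode is Mode.PLAN:
        return False
    if mode is Mode.AUTO_ACCEPT_EDITS and tool in EDIT_TOOLS:
        return True
    return ask(tool)  # fall back to asking the user

def agentic_loop(prompt, model, tools, verify, mode, ask, max_turns=10):
    """Gather context, act, verify; repeat until verified or out of turns."""
    context = [prompt]
    for _ in range(max_turns):
        step = model(context)  # ("text", str) or ("tool", name, arg)
        if step[0] == "tool":
            _, name, arg = step
            if allowed(name, mode, ask):
                context.append(tools[name](arg))
            else:
                context.append(f"denied: {name}")
        else:
            context.append(step[1])
        if verify(context):
            break
    return context

def run_demo(mode):
    """Run a scripted 'fix the typo' session and return the final file content."""
    files = {"app.py": "print('helo')"}
    tools = {
        "read_file": lambda path: files[path],
        "edit_file": lambda args: files.update({args[0]: args[1]}) or "edited",
    }
    script = iter([
        ("tool", "read_file", "app.py"),
        ("tool", "edit_file", ("app.py", "print('hello')")),
        ("text", "done"),
    ])
    model = lambda context: next(script, ("text", "nothing more to do"))
    verify = lambda context: files["app.py"] == "print('hello')"
    never_approve = lambda tool: False  # simulates a user who declines every prompt
    agentic_loop("fix the typo in app.py", model, tools, verify, mode, never_approve, max_turns=5)
    return files["app.py"]

for mode in Mode:
    print(mode.name, "->", run_demo(mode))
```

With a user who declines every prompt, only auto-accept edits lets the fix land; plan mode and the default mode leave the file untouched, which is the tradeoff the permission chapter describes from the other direction.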
Installing Claude Code
The official install guide for Claude Code. Anthropic's narrator walks through installation on every supported surface (terminal, VS Code, JetBrains, Claude Desktop, and the web) and closes with a quick rule of thumb for picking one.

## [00:04] One-line installers for terminal (macOS, Linux, WSL, Windows)

The default path is the terminal. macOS, Linux, and WSL users get a single `curl` command; Homebrew works too but skips auto-update. On Windows, PowerShell uses `Invoke-RestMethod`, CMD has its own `curl` snippet, and `winget` is available with the same auto-update caveat as Homebrew.

> *If you're on macOS, Linux, or WSL, use this curl command to install it in one go. If you prefer to use Homebrew, you can also use brew install to install it, but note that this doesn't have auto-update capabilities.*

## [00:33] Run claude in your project and sign in

After install, `cd` into your project and run `claude`. The first launch presents a color theme picker and a sign-in flow that accepts a Pro, Max, Enterprise, or API-key login. Enterprise accounts must explicitly pick that option. The directory you launch from defines the access boundary: Claude Code sees that folder and everything beneath it, nothing above.

> *Whatever directory you decide to run Claude in, it will have access to that directory and all of its subfolders.*

## [01:02] VS Code extension

Open the Extensions panel, search for the Claude Code extension by Anthropic, and confirm the blue verified check before installing. A restart may be required. Once installed, the Command Palette (`Ctrl/Cmd+Shift+P`) opens a new Claude Code tab; you can also click the logo from any open file, or opt out of the GUI entirely and stick to the terminal experience via settings.
> *You can also opt out of the UI and just use the terminal experience directly in your settings file.*

## [01:32] JetBrains plugin

Same shape as VS Code: install the Claude Code plugin from the JetBrains Marketplace, restart the IDE, and the Claude logo shows up on relaunch. Clicking it opens a side pane that surfaces the terminal experience next to your editor.

> *For JetBrains IDEs, you can install the Claude Code plugin from the JetBrains Marketplace. Once you install, restart your IDE.*

## [01:51] Claude Desktop and claude.ai/code on the web

Claude Desktop exposes Claude Code through a "code" toggle at the top of the app once you're signed in: same chat-style feel, but scoped to a specific folder with adjustable permissions and even a cloud execution mode. The web build lives at `claude.ai/code` and mirrors the desktop experience, with one hard constraint: it only works against GitHub repositories.

> *On the web, you can access Claude Code by going to claude.ai/code. This works very similar to the desktop app. However, you're restricted to GitHub repositories only.*

## [02:27] Picking the right surface

The narrator's heuristic: terminal first if you want new features the day they ship. IDE integrations give you a nearly identical experience tucked inside your editor. Desktop is the pick when you want Claude grinding in the background while you do something else. Web is for remote work on GitHub repos or running multiple sessions in parallel.

> *If you want to constantly keep up to date with everything, the terminal is the best bet. Features ship there the fastest.*

## Entities

- **Anthropic Tutorial Narrator** (Person): Voice-over host of Anthropic's Claude Code 101 course.
- **Claude Code** (Software): Anthropic's agentic coding tool, installable across terminal, IDEs, desktop, and web.
- **Homebrew / winget** (Software): Package-manager install paths offered as alternatives to the official curl/PowerShell installers; both skip auto-update.
- **VS Code extension** (Software): Anthropic-published Claude Code extension; verify the blue check before installing.
- **JetBrains plugin** (Software): Claude Code plugin distributed via the JetBrains Marketplace; opens a side pane after IDE restart.
- **Claude Desktop** (Software): Desktop app exposing Claude Code via a "code" toggle, with folder scoping and a cloud execution mode.
- **claude.ai/code** (Service): Web build of Claude Code, restricted to GitHub-hosted repositories.
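The access-boundary rule from the sign-in chapter (the launch directory and everything beneath it, nothing above) can be expressed as a small path check. This is only an illustration of the rule as the narrator states it, assuming hypothetical paths; it says nothing about how Claude Code actually enforces the boundary.

```python
from pathlib import Path

def within_boundary(launch_dir, candidate):
    """True if `candidate` is `launch_dir` itself or lives somewhere beneath it."""
    root = Path(launch_dir).resolve()
    # Relative candidates are interpreted from the launch directory;
    # resolve() also normalizes any ".." segments.
    target = (root / candidate).resolve()
    return target == root or root in target.parents

# Hypothetical project directory and candidate paths.
for candidate in ["src/main.py", ".", "../other/secret.txt"]:
    print(candidate, within_boundary("/tmp/project", candidate))
```

Resolving both paths before comparing is the important step: a plain string-prefix comparison would let `../` segments slip out of the launch directory.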