LaiDub

Podcasts

Gemini Co-Lead on World Models, RL's Next Domains & Continual Learning
59:41
Unsupervised Learning: With Jacob Effron (4 days ago)


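Oriol Vinyals (Google DeepMind VP of Research and Gemini co-lead) sat down on day two of Google I/O to lay out, item by item, the research roadmap behind the products announced at I/O: why world models are Google's distinctive path toward AGI, what a "GPT moment" for video and images would look like, why Spark and agent systems must be optimized jointly with the model, why scaffolding will eventually be written by the model itself, why memory should take the non-parametric file-system route rather than being packed into the weights, along which dimensions today's RL is data-constrained, why training on math and code transfers unexpectedly well, and how research bets are traded off inside Google after the Brain + DeepMind merger.

## [00:00] Intro

Jacob spends 60 seconds setting up Oriol's background (Gemini co-lead, alongside Noam Shazeer and Jeff Dean) and the advantage of recording on day two of I/O: every launch is still fresh, so the conversation can trace each announcement straight back to the research behind it. Oriol joins and says hello, and the two warm up.

> *"I've been really excited for this because you're one of the people kind of most directly shaping the frontier of AI."*

## [01:36] Why World Models

Jacob opens by asking "why world models?" Oriol splits the answer into two layers. One is the self-improvement / coding angle. The other is what the model itself operates on: multimodal data, covering not just "closer" but also the video and image data that a "world model" implies. Google bet on images and video early, and this time it "clearly bet right," because we have effectively moved the whole world onto the internet.

He also admits there was a stretch when this path looked unglamorous: multimodal models were sidelined during the LLM boom. But video and images hold knowledge that language cannot capture. "The GPT moment for video" has not truly happened yet, but the inflection point is in sight.

> *"There is lots of knowledge in videos and images, and what I would say is the GPT moment for that, I'm not sure we quite have seen that."*

## [04:21] The GPT Moment for Video

Oriol uses Omni (Google's multimodal product line) as the anchor: the curve from simply feeding video into the context to understanding and generating video within a long context is already steep. The next question is whether a model can, like an LLM, pretrain on pure image data without paired text and still extract all the meaning and detail. Once that hard challenge is solved, the data dimension jumps from "what humans have described" to "all video," a difference of orders of magnitude.

He specifically acknowledges that labeled data for video is still scarce relative to images, but says the payoff from unlocking it would be "very large."

> *"Whether we agree with that or not is another question, but if it was to be unlocked, it would be massive."*

## [07:51] What Makes Omni a World Model

The term "world model" has been overused, so Oriol offers a clear definition: a pure world model must do representation learning, compressing the world into a compact representation. On top of that, Omni goes further and becomes a renderer that language can drive: change a prompt in natural language and the output video changes with it, continuing to evolve from the initial image. That is the key difference between "passive modeling" and "controllable generation."

> *"The world model itself is acting as a renderer of the world, that you can really just change by language."*

## [10:04] World Models & Robotics

Robotics is the most direct application for world models. Oriol admits the data mix is still trial and error: how to balance sim data against real-robot data, and when transfer suddenly clicks. Progress in world models themselves will bring an inflection point: once the model is strong enough, the sim → real gap will narrow, first at the level of planning and gross motor control, with fine motor control following more slowly.

> *"At some level, maybe not at the precise motor control but at the kind of planning and gross, we are going to start seeing how things are going to fall into place."*

## [12:37] Evaluating Physics in AI

Models learn physics implicitly, but how do you evaluate whether they have actually learned it? Oriol draws an analogy to unsupervised machine translation: if the model really does represent the concept of "gravity" internally, some form of decoding should be able to translate it into an explicit explanation. Early unsupervised translation work from 2014 by Stefano Gaus and others offers an approach worth borrowing: decode the internal representation and use that as the eval.

> *"You would need to somehow connect the concept of gravity which could be present or not in a world model to then decode that into an explanation."*

## [14:51] Consumer Agents & Spark

Spark, announced at I/O, is Google's latest step in consumer agents. Oriol stresses that DeepMind identified "action as a modality" as critical early on. But building an agent is not just a matter of dropping a model into a generic scaffold: model capability has to cross a certain threshold before you can dream up the next stage of product.

He offers an engineering judgment: internalizing "I have these capabilities; how do I choose which to use?" into the model at training time is more efficient than having an external scaffold decide ad hoc at inference time.

> *"It's useful to build kind of the system slightly more narrowly around something you care deeply about."*

## [18:39] Scaffolding & the Bitter Lesson

Oriol has supported Sutton's bitter lesson for years. Jacob pushes it into the agent era: scaffolding seems to violate the bitter lesson because it is hand-written glue. Oriol's answer is that the scaffold is itself a piece of code, and ultimately the model should write it on the fly. Humans write it in the short term and models write it in the long term, so the bitter lesson still holds. The practical advice is to optimize both ends, model and scaffold, at the same time rather than staking everything on one.

> *"That system itself is a piece of code that eventually the model itself could write on the fly."*

## [22:06] Memory & Continual Learning

Memory is the topic Oriol goes deepest on; he has a cognitive neuroscience background. He divides memory into two kinds: packed into the weights (parametric) and attached as an external file system (non-parametric). At serving scale, baking every user interaction into the weights is impractical, so non-parametric file-system memory is more feasible.

The real difficulty is consolidation: how to integrate information from earlier sessions into a new session so the model accumulates knowledge the way a person does. This area has a lot of momentum but is far from saturated, and both evaluation methods and engineering practice will iterate over the next few years.

> *"The way that we'll see better evaluations and ways in which these models accumulate this knowledge as they go."*

## [26:54] Research Bets Inside Big Labs

What is it like to lead Gemini inside Google? Oriol describes advantages along three dimensions: TPU co-design (without being at Nvidia's mercy), the cash-flow stability that ads and search provide, and the end-to-end research strength that followed the Brain + DeepMind merger. The downside is that the organization is too large for anyone to have full visibility into every direction, so he has to rely on intuition to judge which early research is worth pulling in, and accept that the trade-offs cannot be right every time.

> *"Google is in a unique place. We have stability from hardware procurement and obviously like also investment of capital."*

## [32:30] Post-Training RL is Greenfield

Post-training is still greenfield. LLMs are already on an exponential curve in coding and math, so why haven't other domains kept up? Oriol's core judgment is that investment is still far from enough: relative to pretraining compute, post-training has so far used only a small fraction. The beauty of the algorithms is still being worked out, and "cracking that recipe could be big."

> *"Cracking that recipe could be big, at least in terms of the beauty of the algorithm."*

## [35:57] What Real Intelligence Looks Like

What does real intelligence look like? Oriol anchors on an old eval from 2015: simple game-playing tasks that were the ceiling for RL at the time and that LLMs can now do out of the box. He wants to see the next order-of-magnitude leap: not pushing numbers on familiar benchmarks, but seeing models actively produce insight on new problems that humans cannot answer right away.

> *"I like games."* (This simple admission is a footnote to his long-standing fondness for game-playing RL.)

## [39:11] RL Generalization

Games used to be the textbook example of verifiable reward. The challenge now is finding new sources of hard problems so that RL elicits deep reasoning and generalization across broader domains. Oriol points to an asymmetry: there is a gap between creating a solution and evaluating one, and if evaluation is easier than generation, RL has leverage.

What surprised him is that training on math and code transfers surprisingly well to other domains; much of that generalization may actually come from pre-training. This is a key question for researchers to crack over the next few months to years.

> *"Possibly through pre-training, that's one of the quests for researchers to crack in the next few months and years."*

## [42:55] Advice for Founders

His advice to founders is blunt: evaluation and data are the moat you cannot get around. Early on, focus on a vertical product and layer specialized scaffolding on top of a model; consider differentiating at the model layer only once you have scale. That path is "more scalable and better suited to early players."

> *"What I would tell folks is the value, and we discussed this a little bit, the value of evaluations and as a sequence of data."*

## [46:40] Can AI Truly Innovate?

Since joining DeepMind in 2016, the direction Oriol has been most obsessed with is meta-learning: models generating ideas themselves. But he admits that so far he has not seen a model generate a truly outstanding idea. His analogy: you let ten thousand people try, pick the one who got it right, and glorify them, but the model's ability to propose directions autonomously is still quite limited. He believes that will change "soon."

> *"I don't think I've seen truly kind of outstanding ideas that a model has generated yet, but I am sure I will very soon."*

## [49:48] Recursive Self-Improvement

Recursive self-improvement can be viewed in layers: in the first, researchers and engineers use AI tools to speed themselves up; in the second, models directly automate certain research tasks. Once a model writes English better than you do, where is the next ceiling? Oriol says "maybe there's no ceiling, or the ceiling is still far away"; we may not even be able to see where the ceiling is.

> *"At the point a model writes English better than you, maybe there's no ceiling, or the ceiling is still far away."*

## [52:14] Quickfire

The final 8 minutes of quickfire questions cover the history of TPU investment, compute intuition for young researchers, and his overall feeling about the current stage of AI. Jacob closes with podcast thanks and an outro.

> *"I think it's a fascinating time as anything in AI."*

## Entities

- **Jacob Effron** (person): Managing Director at Redpoint Ventures and host of Unsupervised Learning.
- **Oriol Vinyals** (person): Google DeepMind VP of Research and Gemini co-lead (alongside Noam Shazeer and Jeff Dean).
- **Gemini** (product): Google's flagship multimodal / agent model family; this episode mainly discusses the I/O day-two launches.
- **Omni** (product): Google's multimodal product line, used as the reference point for "the GPT moment for video / image."
- **Spark** (product): The consumer agent product announced at I/O.
- **World Model** (concept): A renderer of the world that language can drive; representation learning is its core ingredient.
- **Bitter Lesson** (concept): Sutton's thesis; extended in this episode to "scaffolds should, in the long run, be written by the model itself."
- **Memory / Continual Learning** (concept): Non-parametric file-system memory versus packing memories into the weights; consolidation is the key difficulty.
- **Post-Training RL** (concept): Compute investment is still small relative to pretraining; characterized as greenfield.
- **Move 37** (concept): AlphaGo's famous move; Oriol uses it as the benchmark for a "true RL/research breakthrough."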
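
The non-parametric, file-system style of memory described in the Memory & Continual Learning segment can be illustrated with a small sketch. This is not how Gemini implements memory. The class name `FileMemory`, the one-JSON-file-per-session layout, and the keyword-overlap recall are all assumptions made for illustration. The only point is that notes from earlier sessions live outside the model weights, get consolidated, and are read back into a new session.

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical non-parametric memory: one JSON file of notes per session."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, session_id: str, notes: list[str]) -> None:
        # Persist notes outside the model; no weights are updated.
        path = self.root / f"{session_id}.json"
        path.write_text(json.dumps(notes), encoding="utf-8")

    def consolidate(self) -> list[str]:
        # Merge notes from all stored sessions, dropping exact duplicates
        # while keeping first-seen order.
        merged: list[str] = []
        for path in sorted(self.root.glob("*.json")):
            for note in json.loads(path.read_text(encoding="utf-8")):
                if note not in merged:
                    merged.append(note)
        return merged

    def recall(self, query: str) -> list[str]:
        # Naive keyword overlap stands in for a real retrieval step (assumption).
        words = set(query.lower().split())
        return [n for n in self.consolidate() if words & set(n.lower().split())]


def demo() -> list[str]:
    # Two earlier sessions are written to disk, then a new session recalls from them.
    with tempfile.TemporaryDirectory() as tmp:
        memory = FileMemory(Path(tmp))
        memory.write("session-1", ["user prefers metric units"])
        memory.write("session-2", ["user prefers metric units", "user is learning Rust"])
        return memory.recall("metric units")
```

The consolidation step here is deliberately trivial (exact-duplicate removal). The segment's point is that doing this well, so knowledge accumulates across sessions, is the open problem.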

#unsupervised-learning #redpoint-ai #oriol-vinyals
SpaceX's $2T Case, Nvidia's Shock Selloff, America Turns on AI, Trump Pulls AI Order, Bond Crisis?
1:42:00
All-In Podcast (4 days ago)


Sacks is out, Gavin Baker (Atreides Management) sits in. The panel walks through Andrej Karpathy's surprise move to Anthropic, debates why the public mood on AI has flipped, tears apart SpaceX's $2T S-1, and asks why Nvidia's blowout earnings still saw the stock sold. Friedberg and Chamath also flag warning signals from inflation, oil, and bond yields, and close on what, if anything, came out of the US-China summit.

## [00:00] Gavin Baker joins the show!

Jason opens episode 274 noting Sacks is out and welcomes Gavin Baker from Atreides Management for the week. They tee up the agenda: SpaceX and OpenAI IPOs, Karpathy to Anthropic, and Nvidia's earnings.

> *"Sacks is out today, but we're very lucky to have Gavin Baker from Atreides Management joining us. The spicy takes must flow."*

## [00:30] Andrej Karpathy joins Anthropic; hypergrowth and profitability

The Karpathy hire is read as a major strategic win for Anthropic. Chamath frames it as continuity of the Richard Sutton "bitter lesson" school of scaling that Karpathy executed at Tesla FSD and OpenAI. Gavin layers in financial context: Anthropic was EBIT-positive in the last quarter per the WSJ, which combined with hypergrowth makes the recent funding rounds look very different from a capital-burn narrative.

Friedberg pushes back on the framing that models will soon "feed themselves" into context windows to self-improve, but flags that papers (one from MIT) suggest large efficiency gains are on the horizon. Chamath uses the moment to argue the podcast itself has to start telling the upside story of AI (the doctors, the scientists, the unlock) because the dominant public narrative has gone negative.

> *"He was probably the first person that really commercialized the Richard Sutton bitter lesson essay when he was leading FSD at Tesla."*

## [12:42] Why Americans have turned on AI, anti-human perception

Gavin shares a personal story: his daughter has a rare disease, and a Stanford scientist he funded is months away from what he believes is a complete cure, made tractable by AI-accelerated biology. He uses it to argue for an optimistic posture, a future where work is optional and disease is solvable, and warns that the people pushing for AI regulation are also shaping how the public feels about the technology.

Friedberg goes deeper into the cultural mechanics: AI is being framed as anti-human in a way that mirrors anti-nuclear and anti-industrial backlashes of the 20th century. He argues the United States can't unilaterally slow down because China and others won't, and tries to separate genuine safety concerns from elite class anxiety. Chamath then makes a pointed observation that none of the survey data on AI job loss actually asks the truck drivers, package sorters, and ICU nurses themselves how they feel about the tools.

> *"We're listening too much to the inventors of AI. They're geniuses. They're smart. We need to be listening to the frontline factory workers who are using AI saying, 'Wow, I was able to add a third shift.'"*

## [27:22] Trump pulls AI EO, US-China AI relationship, dystopian AI layoffs

A Trump AI executive order was scrubbed at the last minute. The panel walks through what was reportedly in it (review of frontier-model training runs) and whether any pre-release regulatory framework is workable. Jason argues a state-by-state patchwork is the more likely outcome regardless of what Washington does.

The conversation pivots to Meta's latest round of layoffs and the way they were communicated. Gavin and Jason agree the messaging, which leaned on "AI productivity gains" as the public reason, landed badly even with people who accept the underlying logic, and Jason argues it became a case study in how *not* to message AI-driven workforce changes.

> *"Because the reality is that if this is the way that you're going to message something as critical as this, I think you did a horrible job."*

## [45:19] SpaceX S-1 tear down! Breaking down the three major businesses and the case for a $2T valuation

SpaceX filed its S-1 on Wednesday. Jason breaks the company into three businesses: launch (which could be hundreds of millions of paying subscribers via Starlink), Elon Web Services / xAI / Colossus compute, and rockets. The AI-cloud line item alone is around $15B and growing roughly 2x year over year, anchored by an Anthropic deal Gavin calls "extraordinary."

Gavin then makes the case that Colossus matters because raw gigawatt-class data centers are now the binding constraint, and SpaceX-adjacent build velocity is the moat. He uses Cursor's Composer 2.5 release (Pareto-dominant on three or four weeks of RL training) as evidence that whoever owns the compute owns the next model generation, and walks through why rapid reusability on Starship compresses the unit economics of getting payload to orbit faster than any competitor can model.

> *"If you look at who's actually capable of delivering a gigawatt data center, these guys are the closest, like an actual gigawatt."*

## [71:22] Nvidia smashes earnings but stock falls, why people are shorting chips

Nvidia blew out earnings again: 20% sequential growth would be a high-growth print for any other company, the dividend was raised 25x, and the CFO committed to returning 50% of free cash flow. Yet the stock sold off, and Leopold Aschenbrenner's reported pivot away from chip exposure is being read as a smart-money signal.

Gavin takes the bear case apart: at current PE Nvidia is cheap relative to growth, and the segment breakdown obscures how much the "AI clouds" line is dragging the multiple. He flags that the true useful life of a GPU is closer to two years than five, which means the reported profits of every hyperscaler running these chips are overstated; he treats this as a real concern, not a stock-killer. He also notes Nvidia's CPU business is on track to do $20B this year, making it overnight one of the largest CPU manufacturers in the world.

> *"The true lifespan of a GPU is more like two years and therefore the profits of all these businesses are overstated."*

## [82:25] Market update: Flashing red signals, oil, inflation, yields up

The macro snapshot: May inflation expected at 4.2%+, Fed rate-hike odds back on the table, UK yields at the highest since the great financial crisis, oil and gold both moving. Chamath warns that when the currency-debasement mechanism finally breaks, the downside is non-linear.

Gavin counters with relative optimism on the US: America is self-sufficient in energy, the AI build-out is structurally good for re-industrialization, and even in an ugly global scenario the US is the least-bad place to be invested. He flags that AI fundamentals also have a seasonality that investors are starting to model, the same way they do for e-commerce and subscription businesses.

> *"While it's terrible for everyone, it is relatively the best for America because we are self-sufficient in energy."*

## [92:45] China trip flops, or was progress made behind the scenes?

A 48-hour US tech-CEO-plus-president trip to Beijing produced thin public deliverables: some soybeans, some H100/A200 sales to Chinese players. The panel asks whether that's the real story or just the visible surface, and whether the immediate China-Russia bonding moment afterward says more about the trajectory than any handshake photo.

Gavin argues the more important read is structural: keeping America ahead in AI requires keeping the trans-Pacific relationship just stable enough to avoid a full decoupling shock, and that's a defensible strategic logic even if the optics are unsatisfying. He also paints a what-if scenario around the Strait of Hormuz to make the point that energy independence is what gives the US the option to act asymmetrically. Jason closes with thanks to Gavin and an invite back to the Summit.

> *"There's sound arguments that this is stabilizing for the world and is the best highest probability path for keeping America ahead in AI."*

## Entities

- **Jason Calacanis** (Person): Host, LAUNCH founder, MC of this episode.
- **Chamath Palihapitiya** (Person): Host, Social Capital CEO; pushed the "listen to frontline AI users" framing.
- **David Friedberg** (Person): Host, The Production Board CEO; led the cultural / historical analysis of the AI backlash.
- **Gavin Baker** (Person): Guest host, Atreides Management founder/CIO; carried the investing thread across SpaceX, Nvidia, and macro.
- **Andrej Karpathy** (Person): Joining Anthropic's new pre-training team; OpenAI co-founder, ex-Tesla FSD lead.
- **Anthropic** (Organization): Hired Karpathy; EBIT-positive last quarter per WSJ; $15B AI-cloud deal with SpaceX-adjacent compute.
- **SpaceX** (Organization): Filed S-1; three businesses (launch/Starlink, Elon Web Services compute, rockets); $2T valuation case.
- **Nvidia** (Organization): Earnings blowout but stock sold off; $20B CPU run-rate; $5.3T market cap.
- **Cursor** (Software): Composer 2.5 model release used as proof of fast RL-driven catch-up dynamics.
- **Richard Sutton's bitter lesson** (Concept): Scaling beats clever architectures; the framing for why Karpathy's move matters.
- **GPU useful life** (Concept): Closer to ~2 years than ~5, so hyperscaler reported profits are overstated.
- **Strait of Hormuz scenario** (Concept): Energy-independence-as-strategic-option argument for the US in the China game.
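
Gavin's point about GPU useful life is, at bottom, depreciation arithmetic, and a short sketch may make it concrete. It assumes straight-line depreciation, and every dollar figure below is hypothetical and chosen only to keep the numbers round; none comes from the episode or from any company's filings.

```python
def annual_depreciation(capex: float, useful_life_years: float) -> float:
    """Straight-line depreciation: spread the purchase cost evenly over the life."""
    return capex / useful_life_years


def operating_profit(
    revenue: float, other_costs: float, capex: float, useful_life_years: float
) -> float:
    """Profit after non-depreciation costs and one year of GPU depreciation."""
    return revenue - other_costs - annual_depreciation(capex, useful_life_years)


# Hypothetical figures, in $ billions (illustration only).
GPU_CAPEX = 100.0
REVENUE = 80.0
OTHER_COSTS = 20.0

# Depreciating over five years charges 100 / 5 = 20 per year.
profit_5y = operating_profit(REVENUE, OTHER_COSTS, GPU_CAPEX, 5)  # 80 - 20 - 20 = 40

# If the chips really last two years, the yearly charge is 100 / 2 = 50.
profit_2y = operating_profit(REVENUE, OTHER_COSTS, GPU_CAPEX, 2)  # 80 - 20 - 50 = 10

overstatement = profit_5y - profit_2y  # 30
```

With the same revenue and the same hardware, the only thing that changed is the assumed life, yet reported profit moves from 40 to 10 in this toy case. That is the sense in which a too-long depreciation schedule overstates profits.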

#all-in-podcast #spacex #nvidia
Trading signals that trade themselves
20:45
Claude (5 days ago)


Tushara Fernando, Head of Data and AI at Man Group, explains how the firm integrates AI into systematic trading by codifying decades of institutional knowledge into "skills." She emphasizes that robust governance and shared workflows are essential for moving AI from individual productivity tools to enterprise-scale agentic platforms.

## [00:18] AI in Systematic Trading

Man Group manages over $200 billion in assets, making the stakes for AI implementation exceptionally high for their institutional clients. Tushara Fernando describes systematic trading as an algorithmic process that uses historical backtesting to evaluate investment signals, much like managing a fantasy football team.

> *A trading signal is really just this with stocks... We want to back the ones that would make money and we want to short the ones that won't. [02:43]*

## [04:38] The Role of AI-Generated Signals

Man Group currently runs trading signals in production that were entirely researched, backtested, and proposed by AI. While humans review the final output for sensibility, AI handles the data acquisition, strategy proposal, and productionization of these investment ideas.

> *There are trading signals running right now in production at Man Group... that were researched, backtested and proposed by AI. [04:38]*

## [05:52] The Importance of Shared Workflows

The success of a trading signal depends on the underlying workflows, such as data cleaning and outlier detection, which Fernando compares to the submerged part of an iceberg. Without shared workflows, different teams produce inconsistent results, making it impossible to compare the effectiveness of various strategies.

> *If different teams are running different versions of those workflows, you get different answers. [06:50]*

## [08:43] Lessons in Skills Governance

Early attempts at AI adoption failed because power users, rather than process owners, were building "skills," leading to local optimizations and errors like hardcoded cost centers. To solve this, Man Group created a governed marketplace where skills are owned by workflow owners, tested with evaluations, and tracked for usage.

> *Treat those skills like production code because that's what they will become. [17:21]*

## [16:40] Scaling AI Across the Enterprise

Man Group has scaled AI usage to nearly half its workforce by focusing on organizational context as a competitive moat. By treating skills as a library of institutional knowledge, the firm is preparing for a future where swarms of agents leverage these capabilities to find new investment opportunities.

> *Skills governance really unlocks AI at that enterprise scale. [19:21]*

## Entities

- **Tushara Fernando** (person): Head of Data and AI at Man Group.
- **Man Group** (organization): An alternative investment manager with over $200 billion of assets under management.
- **Claude** (product): An AI model used by Man Group for research, backtesting, and workflow automation.
- **Anthropic** (organization): The AI company that assisted Man Group with skills workshops and implementation.
- **Systematic Trading** (concept): Algorithmic trading capabilities that look across thousands of securities and hundreds of markets.
- **Backtesting** (process): The process of running a trading strategy against historical data to evaluate its performance.
- **Sharpe Ratio** (metric): A risk-adjusted performance measure: a strategy's return in excess of the risk-free rate, divided by the volatility of its returns.
- **Skills Marketplace** (product): Man Group's internal library for governed AI skills, plugins, and institutional knowledge.
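
To make "backtesting a signal" and the Sharpe ratio concrete, here is a minimal sketch in plain Python. It is not Man Group's workflow. The long/short rule (trade in the direction of yesterday's signal), the zero risk-free rate, and the 252-trading-day annualization are all assumptions, and the function names `backtest` and `sharpe_ratio` are hypothetical.

```python
import math
import statistics


def backtest(signal: list[float], returns: list[float]) -> list[float]:
    """Daily strategy returns for a single asset.

    Go long (+1) when yesterday's signal was positive, short (-1) when it was
    negative, and stay flat (0) when it was zero. Using yesterday's signal
    avoids peeking at information that was not yet available.
    """
    if len(signal) != len(returns):
        raise ValueError("signal and returns must have the same length")
    pnl = []
    for t in range(1, len(returns)):
        s = signal[t - 1]
        position = (s > 0) - (s < 0)  # sign of yesterday's signal
        pnl.append(position * returns[t])
    return pnl


def sharpe_ratio(strategy_returns: list[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a risk-free rate of zero."""
    volatility = statistics.stdev(strategy_returns)
    if volatility == 0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(strategy_returns) / volatility * math.sqrt(periods_per_year)


# Toy data (hypothetical): four days of signal values and asset returns.
toy_signal = [1.0, -1.0, 1.0, 0.0]
toy_returns = [0.0, 0.01, -0.02, 0.03]
toy_pnl = backtest(toy_signal, toy_returns)
toy_sharpe = sharpe_ratio(toy_pnl)
```

The shared-workflow point from the episode shows up even here: if two teams clean `returns` differently, or one annualizes with a different period count, the same signal will produce different Sharpe ratios and the results stop being comparable.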

#systematic-trading #ai-governance #man-group
The Story Behind Cerebras’ $63 Billion IPO with Founder and CEO Andrew Feldman
30:34
No Priors: AI, Machine Learning, Tech, & Startups (5 days ago)


Andrew Feldman, CEO of Cerebras, details the company's journey from a controversial 'wafer-scale' architecture to a $63 billion public valuation. He explains how their radical hardware design delivers 15-20x faster AI inference than traditional GPUs, enabling new business models and a fundamental reorganization of productivity.

## [00:00] – Cold Open

Andrew Feldman compares the impact of AI speed to Netflix's transition from DVD delivery to streaming, noting that extreme speed opens entirely new business models. He predicts a fundamental reorganization of productivity as AI moves beyond basic coding and design tasks.

> *that's what happens with speed and I think that's what fast AI does right now [00:10]*

## [00:41] – Andrew Feldman Introduction

Host Sarah Guo introduces Andrew Feldman and highlights Cerebras' recent IPO and its current $63 billion market cap. The discussion frames the company's transition from early machine learning research to dominating the foundation model inference market.

> *Cerebras recently went public and is currently worth about $63 billion in the stock market. [00:54]*

## [00:48] – Cerebras’ Evolution

Feldman describes Cerebras as a builder of AI-optimized computers that outperform GPUs by up to 20x in inference tasks across all model sizes. He attributes their recent success to AI models becoming smart enough for daily utility in 2025, leading to massive contracts with OpenAI and AWS.

> *we're the fastest at inference, not by little, but by a lot, 15, 18, 20x faster than GPUs. [01:39]*

## [02:17] – Wafer-Scale Bet Pays Off

The conversation explores Cerebras' unique 'wafer-scale' architecture, which utilizes a single chip the size of a dinner plate. Feldman argues that radical performance improvements require radical designs, noting that critics initially dismissed the approach as impossible.

> *we chose wafer scale, which means we build a 46,000 square millimeter chip, a chip the size of a dinner plate [03:39]*

## [06:38] – Challenges and Breakthroughs

Feldman recounts a high-stakes period between 2017 and 2019 when the team struggled to make the technology work while spending $8 million monthly. He emphasizes that while the technical breakthrough occurred in 2019, market demand only exploded once AI became an essential daily tool.

> *We had a period between about 2017... and middle of 2019 where we couldn't build it. [07:34]*

## [08:37] – Crossing the Market Chasm

Feldman describes the early years where Cerebras had superior technology but struggled to find a market, eventually finding success in supercomputing labs. A pivotal $1 billion order from sovereign partner G42 provided the capital and scale necessary to battle-test their hardware and prepare for the AI explosion.

> *We had a two or three year period where we were ahead of the market and absolutely nobody cared that we were blisteringly fast. [09:00]*

## [10:38] – Scaling Software and Hardware

Scaling a hardware company involves physical constraints like manufacturing lines, power requirements, and test fixtures that software companies do not face. Feldman also highlights the long-term nature of deep tech development, noting that building a high-quality compiler takes nearly a decade of engineering effort.

> *When you're building things... you have to call your manufacturing partner... Each step takes real time and effort to grow. [11:24]*

## [12:03] – Relevance of AI-Generated Coding

Cerebras has aggressively adopted AI-generated coding, with token spending per engineer increasing significantly to support the use of autonomous agents. Feldman observes that certain engineers are becoming '100x' contributors by governing multiple agents for coding and QA tasks.

> *They've moved their coding style to being one in which they govern agents... they've gone from being sort of 10x guys to being 100x guys. [13:12]*

## [13:31] – Leadership and Hiring Culture

With a $20 billion backlog and a growing team of over 800 people, Feldman emphasizes the need to avoid corporate malaise by continuing to take extraordinary risks. He views himself as a 'professional David' who thrives on solving problems that others deem impossible while competing against Nvidia.

> *We would much rather fail in pursuit of the extraordinary than succeed in the ordinary. [15:01]*

## [17:16] – When to Quit vs. Persist

Andrew Feldman describes himself as a 'professional David' who thrives on competing against larger incumbents through intellectual superiority. He emphasizes that founders must guard against the 'slippery slope' of persistence by using external mentors to hold them accountable to their original hypotheses.

> *The slippery slope is a beast... you have to guard against it. [18:32]*

## [19:40] – Why Cerebras Went Public

The transition to a public company is framed as a way to reduce the cost of capital and gain legitimacy with large-scale corporate clients. Feldman notes that Cerebras chose the IPO path to differentiate itself as the market's only 'AI pure play' revenue stream.

> *For us it was an opportunity to graduate from corporate adolescence to corporate adulthood. [23:22]*

## [22:57] – The OpenAI Deal

Feldman recounts the intense four-and-a-half-week period during which Cerebras finalized a $20 billion deal with OpenAI, driven by a sudden demand for fast inference. The deal moved at an unprecedented pace, involving constant work through the holiday season to meet technical requirements.

> *For a 20 plus billion dollar deal to do it in four and a half weeks was exceptional. [24:59]*

## [25:54] – Open Source and Post-Trained Workloads

Andrew Feldman highlights how the open-source ecosystem sustains market interest and pressures closed-source developers to innovate. He emphasizes that seeing external developers build creative solutions on Cerebras hardware is a core motivation for the company's infrastructure goals.

> *You got to love other people's ideas to take flight on what you built. [28:04]*

## [27:37] – How Speed Opens Up New Business

Extreme speed in AI enables fundamental shifts rather than just incremental improvements; Feldman uses Netflix's transition from DVDs to streaming as the primary example. He argues that the ambition for speed is a competitive advantage, as seen in the rapid construction of data centers.

> *when the internet got fast they became a movie studio right that's what happens with speed [28:38]*

## [30:07] – Conclusion

Drawing parallels to the PC and cloud revolutions, Feldman predicts that AI will move beyond replacing specific tasks to fundamentally reorganizing how work is performed. This shift is expected to trigger massive jumps in global productivity as new business models emerge around the technology.

> *once we start sort of fundamentally reorganizing around this, you're going to see this sort of new business models and fundamental jumps in productivity. [29:53]*

## Entities

- **Andrew Feldman** (person): Co-founder and CEO of Cerebras
- **Cerebras** (organization): AI hardware company known for wafer-scale engine technology
- **OpenAI** (organization): AI research organization that signed a multi-billion dollar deal with Cerebras
- **G42** (organization): A sovereign AI and technology holding company that placed a $1 billion order with Cerebras
- **Nvidia** (organization): Leading GPU manufacturer and dominant competitor in the AI chip market
- **Sarah Guo** (person): Host of No Priors and venture capitalist
- **AWS** (organization): Amazon's cloud computing division deploying Cerebras hardware
- **Netflix** (organization): Used as an analogy for how speed changes business models from delivery to production

#ai-hardware #wafer-scale-engine #semiconductor-industry
Notion’s Ivan Zhao: The Refounder
1:03:06
Sequoia Capital (5 days ago)


Brian Halligan interviews Notion co-founder Ivan Zhao on his journey as a 'refounder' who navigated the company through its 2015 Kyoto restart and the 2023 generative AI pivot. Zhao details Notion's transition from a traditional SaaS structure to an AI-native 'jazz band' model that prioritizes technical versatility, taste, and agency over rigid hierarchies. The discussion explores how AI acts as the 'steel' for modern organizations, enabling flatter structures and faster, more reversible decision-making.

## [00:00] Introduction

Brian Halligan introduces Ivan Zhao as the 'refounder' of Notion, highlighting his unique ability to restart the company during critical junctures in 2015 and 2023. The conversation sets the stage for Zhao's transition from a traditional SaaS management model to an AI-native organization. Halligan compares Zhao's approach to other tech visionaries like Jack Dorsey, emphasizing the importance of personal style and 'taste' in building a lasting brand.

> *I like to think of him as the refounder... he's the canonical example of how a SaaS company can move and become an AI company. [00:52]*

> *We want to be a jazz band, not a marching band. [00:02]*

## [02:22] From Founder Mode to AI Org

Ivan Zhao discusses his detour into traditional delegation and professional management before returning to a hands-on 'founder mode' necessitated by the AI shift. He explains that building with language models is less like predictable bridge engineering and more like 'brewing beer,' where the underlying technology dictates the development path. Zhao emphasizes hiring 'jazz band' people, versatile individuals like designers who code, to navigate the experimental nature of AI integration.

> *Building with language model... is like brewing beer. You can't truly predict the things the underlying thing. [06:33]*

> *The spirit is technology first-driven development rather than customer-driven first development. [07:01]*

## [11:00] Hiring for Taste and Agency

Notion utilizes a 'barbell' hiring strategy that targets both super-junior and super-senior talent while avoiding the 'middle' of traditional SaaS experience. Zhao defines talent as the product of capability, taste, and agency, noting that AI has democratized basic capabilities like coding and writing. Consequently, the company now optimizes for 'agency' and 'taste,' qualities that remain difficult to automate and serve as the primary differentiators for the brand.

> *capability got normalized democratized and taste becomes still important [11:53]*

> *So the shape it's not it's more like the barbell barbell shape, right? [12:35]*

## [24:28] Refounding Notion in Kyoto

In 2015, facing potential failure and low morale, Zhao and co-founder Simon Last laid off their entire staff and relocated to Kyoto, Japan, to rebuild Notion from scratch. This 'Kyoto Reset' allowed them to focus entirely on craft and coding while living a minimalist lifestyle. Zhao chose Kyoto specifically for its status as the 'craft capital of Asia,' which provided the spiritual inspiration needed to view software as a fundamental human tool.

> *So my co-founder and I said let's just lay off everybody just go by the two of us. That's the Japan story. [25:41]*

> *The story we tell ourselves is like Kyoto is a special place. If you can pull off anywhere, you can pull off from Reborn in Kyoto. [28:05]*

## [30:27] Craft Versus Commerce

Zhao views Notion as part of a historical lineage of 'tools for thought,' tracing back to pioneers like Douglas Engelbart and Alan Kay. He criticizes modern Silicon Valley 'tinker culture' for ignoring the history and humanity behind technology. For Zhao, the goal is to find an equilibrium between the pure craft of an artist and the commercial viability of a business, ensuring the product has a 'soul' that resonates with users.

> *Tech is like industry doesn't know its past. If you don't know his past you don't know history which is humanity. [31:52]*

> *I need to be in equilibrium with my own value of what this company I want to build... [51:33]*

## [32:26] When to Refound

For founders whose companies are stagnating, Zhao suggests listening to the 'inner urge' to take drastic action rather than wasting years on ventures without momentum. He argues that refounding is often harder than starting fresh because it involves taking a significant step back to pivot toward a new growth engine. Zhao believes the current AI-driven market is wide open, making it an ideal time for founders to be risk-seeking and follow their intuition.

> *For me it's like there's you just feel you have to do something drastic... then you feel liberated once you land in Japan. [32:56]*

> *The refounding is harder than it looks. It typically involves like a big step back and two steps forward. [59:57]*

## [34:07] GPT-4 Refounding Shock

Zhao describes gaining early access to GPT-4 as a 'full body religious experience' that signaled a fundamental shift in the world. This realization forced a second refounding of Notion, as Zhao felt any work not involving this technology would soon become meaningless. The transition included a grueling 18-month period of low morale while the team waited for the underlying AI models to catch up with their ambitious product vision.

> *GPT-4 is a religious experience for me. It's like holy [ __ ]... anything you do if you don't do this it will be meaningless. [34:27]*

> *that was like a year and a half just go with no error and morale is definitely low [35:50]*

## [45:35] Leadership and Founder Energy

Despite being naturally introverted, Zhao explains how he forced himself to master one-to-many communication to build trust within Notion. He maintains a disciplined daily routine, starting at 7 AM and often working until midnight, while using 'guilty pleasure' reading to recharge. To prevent organizational calcification, Notion aggressively acquires startups to bring in 'founder energy,' currently employing over 50 former founders who lead critical domains.

> *To lead the group of human you need to do one to many communications otherwise people don't trust you. [46:17]*

> *founders are kind of this kind of like little decalcified meathead machinery just trying to break things [39:10]*

## [53:17] Sales Culture and Closing Thoughts

Notion's transition to enterprise sales involved moving away from 'first-principle' experimentation toward established playbooks, pairing system thinkers with high-energy sales leaders. The conversation concludes with a vision of the 'AI-native' CEO playbook, which replaces traditional 'triangle' hierarchies with a 'circular' model. In this structure, a centralized AI system saturated with company context enables smaller teams to move at breakneck speed with reversible decision-making.

> *You should only have each company should only preserve your innovation point to few places... [54:54]*

> *All of those kind of one-way doors that Bezos used to talk about are really two-way doors... [62:39]*

## Entities

- **Ivan Zhao** (person): Co-founder and CEO of Notion, known for his 'refounder' mindset.
- **Brian Halligan** (person): Co-founder of HubSpot and interviewer.
- **Notion** (organization): A productivity software company that pivoted to an AI-native model.
- **Simon Last** (person): Co-founder of Notion who helped rebuild the company in Kyoto.
- **Kyoto** (location): The Japanese city where Notion was restarted in 2015.
- **GPT-4** (technology): The AI model that triggered Notion's second refounding.
- **Steve Jobs** (person): Former Apple CEO cited as an inspiration for refounding and craft.
- **Jack Dorsey** (person): Tech leader mentioned for his AI-centric organizational redesign.
- **Douglas Engelbart** (person): Computing pioneer in the 'tools for thought' lineage.
- **Erica** (person): CRO of Notion and former CRO of GitHub. - **SaaS** (concept): Software as a Service, the industry context for Notion's evolution. - **Jazz Band** (concept): Metaphor for a flexible, high-agency organizational structure.

#notion #ivan-zhao #ai-strategy
AI Agents Need Computers: 74% MoM Growth, 850K/Day Runs, & New Agent Cloud — Ivan Burazin, Daytona
1:11:40
Latent Space · 5 days ago

Ivan Burazin, CEO of Daytona, discusses the massive shift from building developer environments for humans to providing composable computers for AI agents. With 74% month-over-month growth and 850,000 daily runs, Daytona provides the bare-metal infrastructure required for stateful, high-performance agentic workflows. This conversation explores the technical challenges of spiky compute, the $10 trillion computer-use market, and why the future AI cloud will look more like Stripe than AWS.

## [00:00] Hook

Ivan Burazin describes the intense, direct demand for Daytona's infrastructure, with potential users calling him personally to request access. This level of interest signaled a massive, untapped market for providing execution environments to every future AI agent. The team realized they had identified a critical missing piece in the AI development stack.

> *I've never experienced this that people literally call you if you do not give them access. Like they want access right now. [00:00]*

## [01:12] Introduction

Host swyx introduces Ivan Burazin, noting their shared history in the developer experience and 'end of localhost' movements. Ivan recalls reaching out to swyx years ago for advice on developer experience while working at a previous role. They reflect on how their early interactions and mutual interests in cloud-based development tools eventually led to their current collaboration.

> *I was one of the co-founders of CodeAnywhere... we were thinking a long time of like localhost should die. [01:36]*

## [03:15] CodeAnywhere, Shift, and the end of localhost

Ivan discusses his long history with his co-founder, dating back to early 2000s virtualization and the creation of CodeAnywhere. As the first browser-based IDE, CodeAnywhere predated modern infrastructure like Docker and Kubernetes, which provided the team with deep foundational knowledge. After a successful run with the Shift developer conference, they returned to their infrastructure roots to launch Daytona.

> *We originally started stacking stacking servers doing like virtualization in the early 2000s... and that was a services company which we sold. [03:38]*

## [05:58] What Daytona is: composable computers for AI agents

Ivan defines Daytona as a provider of 'composable computers' for AI agents, moving beyond the limited industry term 'sandboxes.' He explains that agents require diverse computing environments tailored to specific tasks, much like different hardware setups for human professionals. This API-driven infrastructure allows agents to execute code in production-grade environments rather than just temporary test boxes.

> *What Daytona is today is essentially composable computers for AI agents... the market calls them sandboxes which [is] misleading. [06:41]*

## [08:07] The pivot from dev environments to AI sandboxes

Ivan explains how observing early agents like Devin and OpenHands led to a realization that AI agents require a dedicated compute runtime. While their initial SaaS offering for human automation saw low traction, it attracted developers who specifically needed sandboxes for their agents. This feedback loop revealed a massive, underserved market for agent-specific infrastructure that standard cloud providers weren't addressing.

> *a lot of people reached out that were building agents and they were like hey my agent needs a compute sandbox runtime [08:50]*

## [10:17] The New Year’s Eve MVP and customers begging for API keys

On New Year's Eve, Ivan 'vibe-coded' the first MVP of what would become the new Daytona. Although the CTO initially dismissed the code as 'garbage,' the core idea was strong enough to warrant a two-week professional rebuild. When they demoed this version to previous skeptics, the response was immediate and overwhelming, with users demanding API access before the calls even ended.

> *I've never experienced this that people literally call you if you do not give them access. [12:18]*

## [12:56] Bare metal, stateful sandboxes, and Daytona’s scheduler

The team approached the technical architecture from first principles, deciding to run on bare metal rather than traditional VMs. They aimed to combine the speed of AWS Lambda with the stateful, long-running nature of an EC2 instance. This allows agents to 'pause and come back' to their work, much like a human closing a laptop lid, without losing state or performance.

> *agents will be like humans in the sense of you don't want your laptop to be shut down until you're done with work [13:57]*

## [17:28] 60ms startup, 50,000 sandboxes, and 850K daily runs

Daytona's infrastructure is optimized for both individual speed and massive concurrency, with a single instance spinning up in just 60 milliseconds. This scale supports high-volume customers who perform nearly 850,000 runs daily, with some requesting capacity for half a million concurrent CPUs. The system utilizes a custom scheduler and local NVMe drives to eliminate network latency and maximize IOPS.

> *Our time to spin up one is 60 milliseconds with network latency... if you want to spin up 50,000 at once, we are now at about 75 seconds. [17:40]*

> *The biggest customer of ours does like about 850,000 every single day is sort of where they're where they're just shy of a million. [18:17]*

## [21:53] Spiky RL/eval workloads and the new agent infra problem

The 'spiky' nature of AI workloads presents a major challenge for compute providers, leading to a mean utilization rate of only 15% despite peaks hitting 90%. Workloads are categorized into 'background agents' that follow human cycles and 'evaluations/RL' which fire off massive bursts of activity at unpredictable hours. To manage this, Daytona must use capacity commits to handle sudden bursts of 100,000 or more CPUs.

> *Daytona's mean utilization is 15%... because it's very spiky. But it's very spiky but we get up to 90%. [23:01]*

## [28:12] RL workloads, Kubernetes pain, and dynamic resizing

Daytona competes primarily against managed Kubernetes services like EKS and GKE, positioning itself as a more ergonomic 'Twilio or Stripe' for compute. Unlike Kubernetes, Daytona offers a seamless API for spinning up sandboxes with significantly faster startup times. A key advantage is the ability to dynamically resize sandboxes on the fly to prevent out-of-memory (OOM) errors, a feature difficult to implement on other platforms.

> *Daytona although it's a compute provider it's more akin to a Twilio and Stripe from a consumption perspective than it is an AWS [29:46]*

## [33:31] Why every AI agent needs a computer

Ivan outlines the massive scale of knowledge work, estimating a $50 trillion global salary pool, much of which is locked in legacy Windows applications. He argues that true automation requires 'human emulators' that can interact with these legacy systems via GUIs when APIs are incomplete. By automating 40% of this work, the market opportunity for agentic computer use reaches approximately $10 trillion annually.

> *If you take 40% of that, you get to essentially like 10 trillion dollars a year. [35:20]*

## [38:48] macOS sandboxes and Apple’s licensing problem

The discussion shifts to the difficulties of hosting macOS sandboxes compared to Windows and Linux. Apple's restrictive licensing only allows two parallel VMs per machine and requires a 24-hour lock-in for users, making per-second billing economically unfeasible. Furthermore, security restrictions prevent moving memory snapshots between physical machines, severely limiting the scalability of agentic workloads on Mac hardware.

> *Apple is shooting itself in the foot... if it would just enable a concurrency model similar to what you can get on a Windows. [40:52]*

## [44:28] Why CLI may matter more than MCP

The discussion compares the Model Context Protocol (MCP) to the Command Line Interface (CLI) for agentic action. While MCP acts as an interface for APIs, the CLI allows agents to execute scripts and perform deep data analysis within a sandbox. This layer of indirection enables more complex agentic workflows beyond simple data retrieval, allowing agents to actually 'do things' rather than just integrate.

> *the MCP is an interface against an API whereas the CLI is like you can actually go do things... the difference between integrations and actually running scripts. [45:34]*

## [48:11] Open source, GitHub stars, and agent integration

Ivan details Daytona's transition to an AGPLv3 license for its sandbox product to balance openness with commercial protection. This 'copyleft' approach allows enterprise use but prevents competitors from building proprietary forks without contributing back. Keeping the core engine transparent builds trust with users and allows large enterprises to bypass lengthy security audits by providing agents with full context.

> *in the new sandbox product we did add a AGPL3... you essentially can't make a competitor without open sourcing your stuff. [49:49]*

## [53:11] Git, CI/CD, and agent collaboration bottlenecks

Current versioning systems like GitHub are often too slow for the high-velocity output of AI agents, leading to bottlenecks in CI/CD pipelines. Some developers are creating makeshift solutions like dumping codebases into JSON files on S3 to bypass Git overhead. There is a growing need for an agent collaboration layer that precedes the traditional Git-based pipeline to handle companies generating over 1,000 PRs per day.

> *GitHub as-is was an overhead... it wasn't fast enough what they needed. [54:03]*

## [58:15] Founder life and building a 25-person infra company

Daytona's success stems from a core team of 13 people who have worked together for over seven years, fostering a high-trust culture. Ivan acknowledges the difficulty of the founder journey, including being away from family, but posits that growth requires 'pain.' He views his work as building the spiritual successor to serverless and Kubernetes for the agent era, requiring radical responsiveness as a differentiator.

> *Of the 25 people in Daytona, I think about 13 of them we have worked with seven years plus. [58:57]*

## [1:02:44] AI SaaS, token resale, and API-first business models

Ivan presents a critical take on the SaaS ecosystem, arguing that the market is incorrectly applying a premium to vendors who simply resell AI tokens. He points out that these models have significantly worse margins than traditional SaaS. Instead, he advocates for companies to expose their data via APIs and charge for consumption, allowing for actual revenue acceleration through increased agentic usage.

> *The market is adding premium to SaaS vendors that are reselling tokens. And I think that's incorrect. [62:54]*

## [1:06:10] GPU sandboxes, data centers, and compute growth

Daytona plans to introduce GPU sandboxes to support workloads like 3D rendering and reinforcement learning on CAD, rather than focusing on inference. While the company currently runs on bare metal via colocation providers, Ivan notes they are architected to potentially own data centers in the future. He currently avoids the high capital risk of building data centers for single-digit margin gains.

> *We will [offer GPUs], but not for inference. Like essentially what we think about is like the GPU sandbox. [66:21]*

## [1:09:48] Why the AI cloud may look more like Stripe than AWS

The conversation concludes by imagining the 'AWS for AI Agents,' which Ivan suggests might look more like Stripe than a traditional cloud provider. This future 'AI Cloud' will integrate sandboxes, web search, and databases as fundamental primitives. While companies like Cloudflare and OpenAI are competing for this space, Ivan hints that many more infrastructure primitives for agents are yet to be developed.

> *There will be a cloud built out specifically for agents and so that cloud will have sandboxes and it will have web search and it'll have databases. [70:47]*

## [1:11:26] Closing thoughts

The discussion ends with the observation that the AI infrastructure market is growing at an unprecedented baseline of 40-75% month-over-month. Ivan and swyx reflect on the race to secure hardware and the shift toward specialized agent clouds that will define the next decade of computing.

> *The entire infrastructure market is growing 40% plus or minus month over month... if you're not growing 40%ish... you don't have to come to work. [68:23]*

## Entities

- **Ivan Burazin** (person): CEO of Daytona and co-founder of CodeAnywhere.
- **swyx** (person): Host of Latent Space and early investor in Daytona.
- **Daytona** (organization): A company providing composable computers and sandboxes for AI agents.
- **CodeAnywhere** (organization): The first browser-based IDE, co-founded by Ivan Burazin.
- **Devin** (product): An early AI software engineer agent.
- **OpenHands** (product): An open-source AI agent project formerly known as OpenDevin.
- **Kubernetes** (technology): Orchestration technology mentioned as a competitor to Daytona's ergonomic API.
- **Apple** (organization): Mentioned regarding restrictive macOS virtualization licensing.
- **Salesforce** (organization): Cloud-based software company mentioned for its API-first strategy.
- **GitHub** (organization): Developer platform noted for being a bottleneck in agentic CI/CD workflows.
- **Nvidia** (organization): The primary provider of GPUs whose supply constraints dictate market growth.
- **Stripe** (organization): Used as a comparison for the consumption-based model of the future AI cloud.

#ai-agents #infrastructure #sandboxing #bare-metal #cloud-computing #developer-tools #computer-use #saas-growth
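
The utilization and spin-up figures quoted in the Daytona summary (15% mean utilization with peaks near 90%, and 50,000 sandboxes in about 75 seconds) are easier to picture with numbers. The following is a minimal sketch, not Daytona's scheduler or API: the 24-hour trace, the `capacity` value, and the function name `utilization_stats` are all invented for illustration, chosen so that the trace reproduces the quoted mean and peak.

```python
def utilization_stats(load, capacity):
    """Return (mean, peak) utilization as fractions of total capacity."""
    fractions = [used / capacity for used in load]
    return sum(fractions) / len(fractions), max(fractions)


# Hypothetical fleet size and hourly CPU usage (assumed numbers, not Daytona data):
# 20 quiet hours followed by a 4-hour RL/eval burst.
capacity = 100_000
trace = [5_000] * 20 + [90_000, 90_000, 50_000, 30_000]

mean_util, peak_util = utilization_stats(trace, capacity)

# Bulk spin-up rate implied by the quoted "50,000 at once ... about 75 seconds".
sandboxes_per_second = 50_000 / 75

print(f"mean {mean_util:.0%}, peak {peak_util:.0%}, ~{sandboxes_per_second:.0f} sandboxes/s")
```

With these made-up numbers, a fleet that is almost idla for most of the day still has to be sized for the burst, which is the gap between mean and peak that the summary attributes to spiky RL and eval workloads.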

Build a production-ready agent with Claude Managed Agents
27:23
Claude · 5 days ago

This session introduces Claude Managed Agents, a suite of API endpoints designed to help developers build and deploy production-ready AI agents with built-in tools, security, and observability. The speaker outlines how core primitives like Agents, Environments, and Sessions enable complex workflows such as multi-agent coordination and human-in-the-loop controls.

## [00:00] Introduction to Managed Agent Primitives

Anthropic introduces Claude Managed Agents as a suite of API endpoints providing production-ready primitives like tool calling, error recovery, and memory management. The architecture relies on 'Agents' as templates for skills, 'Environments' for sandboxed execution with granular permissions, and 'Sessions' to maintain ongoing conversational context and state transitions.

> *Claude Managed Agents at a high level is just a set of API endpoints that we've developed and released... that give you access to scaled ready, production ready agent. [01:35]*

## [07:54] Secure Connectivity and Sandboxing

The platform supports self-hosted sandboxes, allowing developers to use private containers and VPCs to keep sensitive data secure while maintaining model access. Additionally, new MCP tunnels facilitate safe connections to internal Model Context Protocol servers, and Credential Vaults protect authentication tokens by keeping them out of the model's context window.

> *Claude can directly connect to that safely without those MCP servers ever being exposed on the internet. [09:40]*

## [10:02] Multi-Agent Orchestration and Implementation

A demonstration of a multi-agent architecture shows a coordinator agent spawning specialized sub-agents for complex tasks like financial analysis and macro trend research. Developers can implement these workflows using the Anthropic SDK and tools like Claude Code, which is specifically optimized to help developers implement and iterate on managed agent APIs.

> *One agent is like in charge of figuring out macro trends... whereas another one is like really good at like financial analysis. [11:36]*

## [19:28] Observability, Memory, and Infrastructure

The Claude Console provides robust observability, including agent versioning, session monitoring, and the ability to edit memory stores to correct agent context. By providing integrated state transitions and durable storage out of the box, the service eliminates the need for developers to build complex custom agent loops and sandboxing fleets manually.

> *With Claude Managed Agents, we kind of were able to get all of these things out of the box. [26:54]*

## Entities

- **Anthropic** (organization): The AI research and safety company that developed the Claude model family.
- **Claude Managed Agents** (software): A suite of API endpoints for building and hosting production-ready AI agents.
- **MCP** (protocol): Model Context Protocol used for secure authentication and tool integration.
- **Claude Code** (software): A developer tool optimized for implementing and managing Anthropic APIs.
- **Bun** (software): A fast JavaScript runtime used for the technical implementation demonstrations.
- **Cloudflare** (infrastructure): A cloud provider mentioned as a host for private sandboxes and environments.
- **Credential Vaults** (feature): A secure storage system for authentication tokens that prevents exposure to the model.
- **Memory Stores** (feature): Persistent storage allowing agents to retain and retrieve information across sessions.
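
The Credential Vaults idea described in the 'Secure Connectivity and Sandboxing' section (tokens stay out of the model's context window) can be illustrated with a small conceptual sketch. This is not the Claude Managed Agents API: the `CredentialVault` class, the `run_tool` function, the `credential_ref` field, and the sample token are all hypothetical names made up for this example. The point is only that the model-visible transcript carries a reference name, while the secret is resolved at execution time.

```python
class CredentialVault:
    """Hypothetical in-memory vault mapping a reference name to a secret."""

    def __init__(self):
        self._secrets = {}

    def store(self, ref, secret):
        self._secrets[ref] = secret

    def resolve(self, ref):
        return self._secrets[ref]


def run_tool(call, vault, transcript):
    """Execute a tool call; the secret is resolved here, outside the model's context."""
    token = vault.resolve(call["credential_ref"])
    # Stand-in for a real authenticated request made with `token`.
    result = f"called {call['tool']} with a {len(token)}-character token"
    # Only the reference name and the result are recorded for the model to read.
    transcript.append(
        {"tool": call["tool"], "credential_ref": call["credential_ref"], "result": result}
    )
    return result


vault = CredentialVault()
vault.store("crm_api", "s3cr3t-token")  # assumed example secret

transcript = []
run_tool({"tool": "lookup_customer", "credential_ref": "crm_api"}, vault, transcript)

# The raw token never appears in anything the model would see.
leaked = any("s3cr3t-token" in str(entry) for entry in transcript)
print(transcript[0]["result"], "| leaked:", leaked)
```

Keeping the lookup inside the tool executor means a prompt-injected instruction to "print your credentials" has nothing to print, since the model only ever held the name `crm_api`.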

#claude-managed-agents #ai-agents #anthropic-api
How to get to production faster with Claude Managed Agents
29:04
Claude · 5 days ago

Anthropic engineers Michael and Harrison introduce Claude Managed Agents, a platform designed to simplify the infrastructure, security, and observability required for deploying autonomous AI agents. By handling complex backend tasks like sandboxing and identity management, the system enables developers to transition from simple tool use to long-running, outcome-oriented agentic workflows.

## [01:10] The Evolution of Agentic Infrastructure

Michael and Harrison trace the progression of AI from basic function calling to autonomous agents capable of managing full feature development and PRs. They argue that infrastructure, rather than model intelligence, is now the primary bottleneck for achieving productivity where months of work are completed in hours.

> *where we think we're seeing things going in the future is entire quarters worth of work being able to be getting accomplished within a couple of hours. [02:34]*

## [04:22] Core Primitives and Configuration

The platform provides composable primitives for context management, observability, and secure sandboxing, allowing developers to define agents via system prompts and MCP tool configurations. Features like the 'Ask Claude' button and event streams provide real-time transparency and optimization suggestions for agent sessions.

> *we did all of that platform work so that you don't have to so that you can kind of pick and choose the primitives that we have available. [05:26]*

## [10:05] Advanced Orchestration and Memory

Beyond single-task execution, the platform supports multi-agent orchestration where Claude can spawn sub-agents to delegate work. Advanced features like 'Dreaming' allow agents to reflect across thousands of sessions, improving long-term memory and task performance through autonomous reflection.

> *It allows Claude to spawn other agent threads with their own context windows in order to delegate work to them. [10:55]*

## [11:56] Sandboxing and Secure Connectivity

Anthropic offers self-hosted sandboxes and MCP tunnels to give enterprises control over network policies and audit logs while exposing private data securely. Partners like Vercel, Modal, and Cloudflare provide specialized infrastructure, ranging from lightweight isolates for rapid scaling to high-performance GPU clusters.

> *MCP tunnels are basically just a way for you to get your private MCPs in your network exposed to Claude Managed Agents. [13:25]*

## [20:19] Real-World Automation and Optimization

Companies like DoorDash and Modal are using agents for complex technical tasks, such as autonomous account management and inference tuning. By running tools like the Nvidia profiler, agents can autonomously 'hill climb' performance benchmarks to optimize workloads without human intervention.

> *Claude can optimize training loops... it'll run like the Nvidia profiler. It'll read the profiles and uh it'll just go ham and and make things better. [20:39]*

## [25:23] Future Challenges: Identity and Collaboration

As agents become primary users of compute, the industry faces new hurdles in identity management, egress filtering, and task resumability. The future of AI involves moving from rigid execution to collaborative 'multiplayer' environments where agents and humans dynamically pivot based on feedback.

> *how do we properly assign identity all the way down the chain such that it's only getting access to the right data [25:55]*

## Entities

- **Anthropic** (organization): The AI safety and research company behind the Claude model family.
- **Claude Managed Agents** (product): A platform and infrastructure suite for building and deploying autonomous AI agents.
- **Michael** (person): Member of Technical Staff at Anthropic working on managed agents.
- **Harrison** (person): Member of Technical Staff at Anthropic working on managed agents.
- **MCP** (protocol): Model Context Protocol used for tool configuration and secure tunnels.
- **Cloudflare** (organization): A cloud services provider focusing on sandboxing technologies like MicroVMs and isolates.
- **Modal** (organization): A compute platform specializing in high-scale GPU sandboxes and AI workloads.
- **Vercel** (organization): A partner providing fluid compute infrastructure for agent sandboxes.
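
The delegation pattern described in 'Advanced Orchestration and Memory' (a coordinator spawning sub-agents that each get their own context window) can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the Claude Managed Agents API or the Anthropic SDK: `run_subagent` is a hypothetical stand-in for a model call, and the roles and task strings are invented.

```python
def run_subagent(role, task):
    """Stand-in for a model call: each sub-agent starts from a fresh, private context."""
    context = [f"system: you are the {role} agent", f"user: {task}"]
    return {
        "role": role,
        "context_size": len(context),  # nothing is inherited from the coordinator
        "answer": f"[{role}] notes on: {task}",
    }


def coordinator(goal, plan):
    """Fan a goal out to sub-agents, then merge only their final answers."""
    results = [run_subagent(role, task) for role, task in plan]
    summary = "; ".join(result["answer"] for result in results)
    return {"goal": goal, "summary": summary, "results": results}


# Hypothetical plan: one (role, task) pair per sub-agent.
plan = [
    ("macro-trends", "summarize this quarter's macro trends"),
    ("financial-analysis", "analyze the latest earnings report"),
]
report = coordinator("write an investment memo", plan)
print(report["summary"])
```

The design choice worth noting is that the coordinator sees only each sub-agent's final answer, so a long intermediate transcript in one sub-agent does not consume the context of the others.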

#ai-agents #anthropic #claude
Building the best agentic analytics harness: Powered by Claude, built with Claude Code
26:46
Claude · 5 days ago

Chris Merrick, CTO of Omni, details the development of 'Blobby,' an agentic analytics harness powered by Anthropic's Claude models. By combining a robust semantic layer with internal dogfooding of Claude Code, Omni enables users to translate natural language into complex data visualizations while maintaining high engineering velocity.

## [00:07] Engineering Velocity with Claude Code

Chris Merrick explains how Claude Code has transformed Omni's internal development, allowing a small team of 25 to maintain high commit velocity. Even as CTO, Merrick uses the tool to stay technically involved, leveraging the efficiency of the Claude Opus model to contribute code alongside his team.

> *I thank Claude very much for making me uh still able to do some software engineering from time to time. [01:12]*

## [03:14] The Semantic Layer and Business Context

To bridge the gap between general LLM knowledge and specific business data, Omni utilizes a semantic layer that provides essential context like fiscal definitions and table relationships. This layer acts as a permissions and curation tool, ensuring the AI agent understands the unique nuances of a company's data environment.

> *Claude is incredible at answering questions, but you need to tell it more about your business if you want it to answer questions about your business. [04:03]*

## [11:15] Architectural Evolution and the 'Blabbotomy'

The team evolved their AI agent, Blobby, from a simple Q&A tool into a sophisticated harness by upgrading from Claude Haiku to Sonnet for better multi-turn performance. They addressed 'split-brain' errors (where sub-agents and outer agents failed to communicate) by consolidating all tools into a single, unified agentic brain.

> *You want to be careful not to have a split brain between any sort of sub agent system and outer agent system. [15:57]*

## [16:23] Leveraging SQL and CTE Proficiency

Omni shifted its query strategy from a proprietary JSON format to standard SQL to better leverage Claude’s inherent proficiency with complex Common Table Expressions (CTEs). This transition allowed the agent to handle difficult data questions in a single pass, significantly improving the accuracy of generated reports.

> *Claude really likes to write SQL with CTE, common table expressions... and our parser was really good at parsing those [18:27]*

## [19:09] Evals, Observability, and UI Validation

Merrick emphasizes that rigorous evaluation systems and raw trace observability are critical for ensuring the predictability required by executive users. Omni follows a 'build with AI, validate with UI' philosophy, where Blobby generates the initial dashboard and users use a workbook interface to refine and troubleshoot the results.

> *Our philosophy from a product perspective is AI to build, UI to sort of validate and troubleshoot and refine. [23:21]*

## Entities

- **Chris Merrick** (person): CTO and Co-founder of Omni who leads the engineering team and advocates for AI-driven development.
- **Omni** (organization): An AI analytics platform that enables users to query data using natural language.
- **Claude** (ai-model): The family of LLMs from Anthropic that powers Omni's analytics and internal engineering.
- **Claude Code** (software): An AI-powered coding tool that significantly increased Omni's development velocity.
- **Blobby** (ai-agent): Omni's AI data analyst agent designed to interpret and answer complex data questions.
- **SQL** (technology): The query language that Omni's semantic layer generates to interact with data warehouses.
- **Claude Sonnet** (ai-model): The specific Anthropic model used to unlock performance gains in complex agentic conversations.
- **GitHub** (platform): The source of pull request (PR) data used in the agent's demonstration.
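
For readers unfamiliar with the Common Table Expressions mentioned in 'Leveraging SQL and CTE Proficiency', here is a small runnable illustration of answering a multi-step question in a single query. It is a sketch under assumptions, not Omni's schema or generated SQL: the `pull_requests` table, its columns, and the sample rows are invented, and Python's built-in `sqlite3` module stands in for a data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pull_requests (id INTEGER PRIMARY KEY, author TEXT, merged INTEGER);
    INSERT INTO pull_requests (author, merged) VALUES
      ('ana', 1), ('ana', 1), ('ana', 0), ('ben', 1), ('ben', 0), ('cy', 0);
    """
)

# Each WITH clause names an intermediate result, so a question such as
# "which authors merge the largest share of their PRs?" fits in one pass.
query = """
WITH per_author AS (
    SELECT author, COUNT(*) AS opened, SUM(merged) AS merged
    FROM pull_requests
    GROUP BY author
),
rated AS (
    SELECT author, opened, merged, 1.0 * merged / opened AS merge_rate
    FROM per_author
)
SELECT author, opened, merged, ROUND(merge_rate, 2)
FROM rated
WHERE merged > 0
ORDER BY merge_rate DESC, author;
"""

rows = conn.execute(query).fetchall()
for author, opened, merged, rate in rows:
    print(author, opened, merged, rate)
```

Because each step has a name (`per_author`, `rated`), the query reads top to bottom like a plan, which may be part of why a model that writes SQL well tends to reach for this form.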

#ai-analytics #claude-code #semantic-layer
Intelligence is collective, not artificial — Prof. Michael I. Jordan (UC Berkeley / Inria)
1:17:10
Machine Learning Street Talk · 6 days ago


Prof. Michael I. Jordan challenges the anthropomorphic framing of AI, arguing for a view of intelligence rooted in collective human systems and economic theory. He critiques "superintelligence" narratives as demoralizing distractions and advocates for a shift toward viewing AI as an ecosystem that facilitates human collaboration and job creation. By integrating microeconomics, game theory, and statistical rigor, Jordan proposes a new engineering discipline focused on system-level safety and social welfare.

## [00:00] Cold open: A demoralizing message to young builders

Michael I. Jordan criticizes the trend of anthropomorphizing AI, calling it a distraction from real-world problem-solving. He expresses concern that "doomer" narratives about humanity's extinction are demoralizing to young engineers who want to build helpful technology. He argues that these leaders lack economic thinking and are detached from the reality of how systems are built.

> *I think this anthropomorphizing of intelligence and understanding all that is not necessary, not appropriate, and is is a distraction [00:21]*

> *It's gonna wipe out humanity with a with a high probability... That is so demoralizing. [01:12]*

## [02:04] CyberFund sponsor read

Host Tim Scarfe introduces CyberFund, a venture firm looking for "AI native" founders. They are launching a "monastery" program designed for rapid execution and focus, offering significant funding to teams operating at the frontier of AI technology. The section concludes with a brief transition into a discussion about the term AGI.

> *CyberFund believes the future belongs to AI natives who want to achieve the impossible [02:12]*

> *AGI to me is just a bit of it's a it's a PR term. [02:45]*

## [02:50] From symbolic AI to machine learning systems

Jordan clarifies that he identifies more as a statistician and cognitive scientist than a traditional AI researcher. He explains that while early AI focused on logical inference, the real industrial impact came from machine learning methods like logistic regression and decision trees. These methods, rooted in statistics and operations research, powered the growth of the cloud and global supply chains.

> *I've never actually thought of myself as an AI researcher... The term was coined in the fifties... and they had particular methods in mind [03:29]*

> *Supply chains and commerce and transportation systems all used, and still to this day, vast amounts of machine learning. [04:04]*

## [05:42] Why AGI is mostly a PR term

Jordan describes "AGI" as a distortionary term that confuses the next generation of researchers. He notes that the "AI" buzzword resurfaced primarily due to the success of Large Language Models (LLMs) in mimicking human fluency. He argues that this focus on human-like language has distracted from the necessary development of robust business models and social-scale technology.

> *The AI buzzword returned because of LLMs... it's been a distortionary effect on the path of research [05:01]*

> *The role of humans as producers and consumers in these emerging systems should respected, amplified and thought about. [05:33]*

## [08:48] A collectivist, economic perspective on AI

Jordan introduces his perspective that intelligence is a social and collective phenomenon rather than just an individual or computational one. He argues that smart action is contextual and often involves interacting with others through collaboration or competition. By incorporating economic and game-theoretic principles, he aims to build safer, more effective systems.

> *We are social animals, and a lot of our intelligence comes by the fact that we aggregate. [07:20]*

> *The society provides a context for our intelligence. Smart action in 1 context is not in another context [07:31]*

## [11:33] Why LLMs need system design, not hype

Jordan compares the current state of AI development to early chemical engineering, where trial and error led to dangerous "explosions" and social harm. He critiques Silicon Valley's reliance on scaling LLMs without considering the displacement of jobs or the mental health impacts already seen in social media. He calls for a more rigorous social science and mathematical foundation rather than relying on metaphors.

> *If you were a chemical engineer... saying we're just gonna throw a lot of stuff together... you'd get a lot of explosions. [12:12]*

## [14:50] Predictability beats faux understanding

While some researchers focus on 'mechanistic interpretability' to understand AI's internal logic, Jordan argues that full internal understanding isn't strictly necessary. Drawing a parallel to human behavior, he suggests that predictability and 'rules of thumb' are more important for safe interaction. In practical scenarios like bank loan denials, users need contextual explanations based on similar cases rather than a map of internal neural circuits.

> *I don't think it's bad to build systems you don't understand. But then you've got to kind of put things around it. [15:14]*

## [17:55] AlphaFold, bias, and prediction-powered inference

Jordan examines AlphaFold as a successful, targeted application of machine learning that revealed significant biases. While the model provided the statistical power to reject null hypotheses, it lacked error bars for specific scientific questions. To address this, Jordan introduces prediction-powered inference (PPI), a methodology that merges small amounts of ground truth data with massive model outputs to produce trustable error bars.

> *It doesn't give you out error bars and it doesn't specifically on the question you're asking. That's where I want the error bars. [20:14]*

> *We developed something called prediction powered inference that does exactly that... it'll cover the truth just like in a classical statistical setting. [20:38]*

## [21:48] Stop anthropomorphizing intelligence

Jordan rejects the necessity of applying terms like 'understanding' or 'intelligence' to machine learning systems, calling such anthropomorphizing a distraction. He cites Amazon's supply chain systems, which optimized global logistics without any human-like understanding. These systems are valuable because they reduce uncertainty and enable planning, not because they possess cognitive traits.

> *Why say it understands? This anthropomorphizing of intelligence understanding all that is not necessary, not appropriate, and is a distraction. [22:51]*

> *Even though we don't have a clue what understanding intelligence means, we and our researchers realize we don't care or need it. [24:23]*

## [27:44] Drug discovery as an incentive problem

The conversation shifts to how economics provides a framework for analyzing complex, multi-agent systems like pharmaceutical regulation. Jordan explains that statistical problems become economic ones when data is provided by self-interested parties seeking profit. Effective systems must be designed to incentivize truthful behavior to control error rates in high-stakes environments where information is hidden.

> *Now you've a kind of tangled web of scientists and pharmaceutical companies, not just 1 but many, many of them, and proteins. [28:49]*

## [32:29] The three-layer data market

Jordan introduces a three-layer model involving users, platforms, and data buyers to illustrate how privacy and utility reach an equilibrium. He suggests that platforms could offer tunable levels of differential privacy as a competitive feature. This approach shifts the focus from simple optimization to equilibrium-based systems to design more robust social welfare structures.

> *So let's think about a data market because data is not just now something you analyze to build a big LLM, it's also something you would sell and buy [32:54]*

> *The platforms would say, well, we'll offer you a tunable level of differential privacy for some cost. [35:02]*

## [38:07] Social knowledge, markets, and culture

Jordan distinguishes between raw data and social knowledge, which he describes as ephemeral and context-dependent. He argues that markets and cultures naturally create abstractions that are promoted from individual insights to collective knowledge. AI systems should facilitate the emergence of these new cultural abstractions rather than just reinforcing existing ones.

> *Human culture creates abstractions... and when those abstractions are kind of useful enough... they kind of get promoted into the culture. [41:52]*

## [45:39] Creator economics beyond Spotify

Using Spotify and YouTube as examples, Jordan discusses the failure of current digital markets to properly reward creators. He advocates for ecosystems that empower musicians to maintain ownership and connect directly with brands, citing United Masters as an alternative. He argues that platforms often become monopolies that necessitate a broader macroeconomic view of AI's role.

> *I'm not against Spotify, but it should be part of an ecosystem that actually rewards the artist more. [46:56]*

## [48:30] How science-fiction AI narratives mislead young builders

Jordan addresses warnings of agential, self-improving AI as "science fiction" that demoralizes young builders. He argues that framing the future as a binary between superintelligence or extinction ignores economic realities and stifles innovation. He dismisses the idea that LLMs replicate the human brain, calling the comparison a "cartoon" or metaphor.

> *It's gonna wipe out humanity with a with a high probability... That is so demoralizing. [49:33]*

## [51:45] AI should improve humans, not replace them

Jordan defines the true purpose of AI as aiding information flow to help humans make the decisions they actually want to make. He highlights the imperfections of human systems and argues that AI should address the gaps where evolution failed to prepare us for modern complexity. Rather than replacing humans, technology should serve as an aid to human creativity and emotion.

> *AI is about helping the things that were too hard for humans*

## [56:42] Safety is a property of the whole system

## [58:12] Silicon Valley gurus and the cream off the top

## [1:00:47] Game theory, mechanism design, and contracts

## [1:04:39] Conformal prediction, e-values, and anytime inference

## [1:08:11] A new liberal arts triangle for the AI era

## [1:11:30] The Bayesian duck and markets as uncertainty reduction

The Agent-Native Cloud: Jake Cooper on Railway's Future
1:29:54
Latent Space · 6 days ago


Jake Cooper, CEO of Railway, details the platform's evolution from a high-burn startup to a sustainable, bare-metal cloud infrastructure powering 3 million users. He argues that the rise of AI agents necessitates a fundamental rebuild of the cloud, moving away from human-centric tools like Kubernetes and pull requests toward high-density CLI handles and production forking. This conversation provides a roadmap for building modular, high-scale systems capable of supporting the next generation of automated software development.

## [00:00] Intro

Jake Cooper argues that developers should stop writing code by hand and instead focus on reviewing agent-generated code to maintain architectural integrity. He emphasizes that while AI tools have improved significantly, underlying architectural patterns matter more than ever in an automated workflow. The hosts introduce Jake as the 'Conductor' of Railway, setting the stage for a discussion on the future of cloud platforms and developer experience.

> *you should be reviewing the code that you are writing instead of trying to go and write it by hand. [00:10]*

## [01:19] What Is Railway?

Railway is described as a platform that allows users to deploy applications and databases instantly via a canvas or AI prompts like Claude. Jake explains that the goal is to manage software versioning and environment cloning to reduce the complexity of traditional tools like Docker and Kubernetes. By tracking all changes, Railway enables developers to fork production environments into parallel universes for safe validation without reproducing staging environments manually.

> *railway is the easiest way to ship anything. [02:29]*

> *we want to make it really easy for not just to like deploy things, but for you to almost like evolve applications over time. [02:49]*

## [03:26] Jake's Path to Railway

Jake details his professional journey from front-end work at Wolfram to building distributed systems for Jump bikes at Uber using Cadence. He describes his engineering philosophy as a willingness to 'swim to the bottom of the pool,' which includes writing kernel patches to ensure the best possible user experience. Additionally, he critiques GitHub's architecture, specifically the 'broken pointers' created by cloning, which complicates upstream contributions.

> *we will swim to the bottom of the swimming pool to go and get the experience [04:35]*

> *GitHub's original sin is that it's like almost a series of broken pointers. [06:02]*

## [07:32] Railway's Six-Year Growth Story

Jake presents a growth chart illustrating the rapid increase in daily signups for the Railway platform, which has transitioned from a 'slow grind' to adding 100,000 users weekly. Early growth was driven by high-touch interaction on Discord and a determination to acquire the first 100 core users manually. This visual data serves as a transition into the company's history of scaling and its move toward becoming a primary cloud provider.

> *so I just wanted to like pull up this glorious chart you say which is basically your usage or number of daily signups [07:34]*

> *Trying to get those initial like first 100 users to like actually kind of come back to it. [08:21]*

## [10:11] Rebuilding the Business After the Free Tier

At one point, Railway was losing $500,000 a month while only generating $50,000 in revenue, despite having $20 million in the bank. Cooper realized this was an unsustainable business model and chose to prioritize long-term viability over vanity metrics, temporarily closing the free tier to rebuild. The company now maintains a lean team of 35 people, preferring to build automated systems rather than throwing headcount at problems.

> *We basically had to kind of close off the the free kind of users for a little while, rebuild the business. [11:47]*

> *We're 35 people right now... we don't want to just like add headcount for the sake of headcount. [10:52]*

## [12:36] Agents as the Next Software Platform

Over the last six months, Railway has prioritized 'agentic' development as the primary mechanism for building and deploying software. Cooper believes the industry is moving from assembly and high-level languages to 'words' as the primary interface. He envisions a future where thousands of agents run in parallel, requiring new tools for coordination and version control to manage the super-exponential growth of workloads.

> *We've moved from assembly to C to C++ to JavaScript to now like words. [13:23]*

## [14:48] Railway's Infrastructure Philosophy

Jake Cooper explains that Railway prioritizes control over low-level primitives like network, compute, and storage to optimize for AI agent workloads. By avoiding Kubernetes in favor of custom orchestration, the team can place workloads with high precision to ensure memory efficiency. This level of control is necessary to prevent cost structures from ballooning as agent usage increases and requires thousands of parallel instances.

> *you have to be very very efficient with these agents... or you're going to massively massively blow up your cost structure [15:10]*

> *How do you get agents to coordinate? How do you go and get them to be able to like safely version changes? [14:28]*

## [17:01] Bare Metal, Cloud Economics, and the Compute Crunch

Cooper describes the transition to bare metal as highly lucrative, reporting a payback period of just three months compared to cloud rental costs. This strategy allows the company to achieve 70% margins while leveraging hardware that remains viable for several years. He also notes the surprising appreciation of hardware assets, such as RAM, due to the global compute shortage and supply chain constraints.

> *our payback period when we go to to metal... if we rent it in the cloud, our payback period is about 3 months. [17:02]*

> *hardware and all of this stuff is... appreciated in value because RAM has gone up [17:50]*

## [18:41] Cloud Bursting and Five-Cloud Networking

To maintain growth without being compute-constrained, Railway utilizes a hybrid cloud strategy for bursting capacity across AWS, GCP, and Oracle. This required building a custom network overlay capable of straddling five different cloud environments simultaneously. While this complexity led to past reliability challenges, it now allows Railway to scale rapidly regardless of individual provider quotas or hardware availability.

> *I spent a weekend rebuilding our entire like network like overlay essentially so that we could straddle uh five different clouds [19:41]*

> *we still maintain like cloud presence for like bursting essentially [18:52]*

## [21:39] Data Center Debt and Infra Financing

Cooper highlights the strategic use of data center debt, secured against hardware, as a more efficient alternative to venture capital for infrastructure expansion. By treating compute capacity as a linear driver of revenue, Railway can scale as quickly as they can deploy new hardware. He encourages infrastructure startups to explore diverse financing tools rather than relying solely on expensive venture equity for physical assets.

> *we can scale revenue as basically as quickly as we can scale compute [21:20]*

> *our margins on metal are like quite high for the like 70%. [20:46]*

## [24:50] Data Centers in Space

Jake Cooper and the hosts explore the technical challenges of placing data centers in space, specifically the issue of heat dissipation in a vacuum. Cooper expresses skepticism toward current proposals that ignore fundamental thermodynamic laws, comparing the 'figure it out later' mentality to science fiction. He highlights the difficulty VCs face in distinguishing between visionary ideas and technical 'grifts' in the space-tech sector.

> *I haven't seen anybody like prove how you're going to go and dissipate that much heat in a vacuum [25:16]*

> *how do you know what's like basically not possible and like is a grift versus like uh is possible but like sounds completely insane [26:16]*

## [26:43] What Agents Need From Infrastructure

Cooper outlines the infrastructure needs of AI agents, noting they require versioning, observability, and storage similar to humans but at a 1000x scale. He predicts that current industry standards like Kubernetes and Envoy will become bottlenecks as agentic workloads compress development cycles. To support this growth, infrastructure must be modular enough to allow for the rapid replacement of failing components without human intervention.

> *the workload profile doesn't change so much as it gets like massively massively compressed because you need to do thousands of these things [28:28]*

> *you just need at a thousandx scale [29:13]*

## [29:43] CLIs, Canvas, and Agent-Native UX

Cooper explains that while humans prefer simplicity, agents benefit from high-density CLI interfaces with numerous flags that serve as 'handles.' The Railway Canvas is also evolving into an output mechanism and 'context anchor' rather than just an input tool. This hierarchical view of infrastructure prevents critical knowledge from being siloed as teams scale complex 'hyperstructures' using automated agents.

> *If you hand it to an agent and you say, 'Hey, that's 40 arguments and 600 flags.' Like, oh yeah, this is excellent. [30:35]*

> *It has to be almost like an anchor for your context. It has to be like a port in the storm. [34:27]*

## [36:34] Central Station, Incidents, and Responsible Disclosure

Railway utilizes an internal tool called Central Station to aggregate feedback and user context, moving away from static communication channels like Slack. The team emphasizes transparency by exposing real-time metrics and detailed incident reports, operating under a core value of 'honor.' This approach involves over-disclosing issues to users rather than providing vague or misleading information during outages.

> *We'd rather overdisclose and know that you know that something is wrong versus almost like having your provider gaslight you. [40:22]*

> *If you can dynamically aggregate that information and dynamically route it to the right person... this is no longer a manual process. [37:10]*

## [41:49] Safe Rollouts, SRE Agents, and Production Forks

To mitigate the impact of bugs, Railway employs incremental rollouts and makes it easy to test behaviors in safe, shadowed environments. Cooper argues that production should not be treated as 'sacred' to the point of stagnation; instead, infrastructure should allow for trivial production forks. This is essential for AI agents, which face a 'stacking entropy' problem without safe iteration primitives to prevent system drift.

> *We've built so much ceremony around like production is sacred... we need to get to a point where it's just trivially easy to test different behaviors. [41:33]*

> *I think if you don't have the primitives to make iterating in production safe, it becomes very very difficult. [44:03]*

## [46:19] AI SRE, Specs, Code, and Tests

Jake Cooper reflects on his transition from an AI skeptic to a believer, noting that the safety of AI SREs depends on infrastructure primitives. He advocates for the 'Holy Trinity' of software engineering: a clear specification, the code, and the tests. By aligning these three, developers and agents can reconcile discrepancies and maintain system integrity during rapid, automated iteration.

> *If you just unleash an AI SRE on your production infrastructure... it's going to nuke your production database. [46:37]*

> *You need three points essentially which is you need a clear spec... you need the code and then you need the tests. [48:22]*

## [49:43] Self-Replicating Infrastructure and the New Serverless

The speakers explore the concept of agents using the Railway CLI to modify their own infrastructure, creating a self-replicating loop. This shift necessitates a move away from expensive, static virtual machines toward cheap, instantaneous 'atomic units of deploy' like isolates or sandboxes. The goal is to make throwaway copies of production as trivial and cost-effective as possible for agentic experimentation.

> *The agent can like modify its own infra which I think is... yeah it's nuts. [50:04]*

> *How do you go and make those throwaway copies like as trivial as possible to spin up run super cheap etc. [50:53]*

## [54:37] Heroku, Temporal, and Workflow Engines

Cooper attributes the decline of Heroku to Salesforce's lack of focus on compute as a core business, leading to product stagnation. Railway positions itself as a 'fluid compute' provider, leveraging Cooper's decade of experience with Temporal (and its precursor Cadence) for durable workflows. Railway is a power user of Temporal, using it to manage complex, long-running infrastructure tasks at scale.

> *The business of Salesforce is to build a really really good CRM... and then you acquire this business as a compute business that's kind of an offshoot [55:33]*

> *I have used Temporal for almost like 10 years now, right? Because like Cadence, all of us other things. [1:00:05]*

## [1:05:26] Railpack, Nixpacks, and Lazy-Loaded Filesystems

Railway is developing Railpack, an engine for determining source code dependencies, which evolved from their earlier Nix-based tool, Nixpacks. While Nix offers theoretical benefits for versioning, Railway found it caused significant image bloat and scaling issues for real-world workloads. They are now exploring content-addressable file systems to enable lazy loading of data into memory for faster deployments.

> *If you want version X and version Y, you end up bloating a lot of your kind of like package like space. [1:06:02]*

## [1:07:20] Coding Agents, Token Spend, and Roadmap Acceleration

With a monthly cloud spend reaching $300,000, Railway heavily incentivizes the use of AI coding agents among its employees. Cooper argues that manual code generation is an inefficient use of time, urging developers to focus on architectural patterns and code review. This allows the team to 'speedrun' their product roadmap by automating complex infrastructure tasks and test generation.

> *If you are writing code by hand you are doing this wrong... you should be reviewing the code that you are writing. [1:07:37]*

> *If you're not using the AI systems to almost like speedrun your road map... then you're kind of missing a large point. [1:09:12]*

## [1:12:15] The Pull Request Is Dying

The traditional SDLC is undergoing a radical transformation where the pull request and manual code review are losing relevance. Impact is increasingly measured by the 'percentage of tokens that end up in production' rather than lines of code. As AI systems handle more reconciliation and validation, the focus shifts from the PR to the initial prompt and final deployment.

> *The pull request is dying... it's going to be the prompt... and beyond that code review is also kind of dying. [1:12:23]*

> *The really naive way to go in and measure this is almost like your percentage of tokens that end up in production. [1:11:40]*

## [1:13:47] Feature Flags and the Agent-Era SDLC

Jake Cooper discusses the critical role of feature flagging in managing the 1000x compression of the SDLC driven by AI agents. He argues that incremental rollouts and blast radius management through flagging will become even more essential for safety as deployment speed increases. This culture of flagging allows for rapid experimentation without compromising system stability for enterprise customers.

> *Everything's just going to get compressed by like a thousandx so that everybody can go and do that. [1:17:21]*

## [1:17:34] Cattle, Pets, and Cloning Machines

Jake offers a contrarian view on the 'cattle not pets' philosophy, suggesting that snapshotting allows developers to treat infrastructure like 'pets' again. By snapshotting every frame and lazily loading file systems, the overhead of traditional DevOps tools like Dockerfiles is reduced. Railway even modifies the kernel to support persistent connections during these system snapshots.

> *I think you can move towards having pets so long as... you have a cloning machine for your pets. [1:18:02]*

> *If you can snapshot every single thing at every frame, then like it actually doesn't matter if you know that obliterated. [1:18:12]*

## [1:20:48] Solo Founder Lessons

Jake reflects on his path as a solo founder, contrasting it with the Silicon Valley consensus of finding a co-founder. He emphasizes the need to be obsessed with every layer of the stack, from kernel-level changes to go-to-market strategies. He argues that having two co-founders can often lead to deadlocks without a clear tiebreak, whereas solo leadership allows for singular vision.

> *Two is the worst number of co-founders is because you have no tiebreak... you basically are like, well, I disagree on this thing. [1:22:49]*

## [1:25:31] Focus, GPUs, and Building a New Cloud

Railway is intentionally avoiding the GPU provider market for now to maintain its core mission, though Cooper admits GPUs are an inevitable part of their long-term roadmap. He stresses that companies are defined as much by what they choose not to do as by what they execute. The ultimate goal is full vertical integration to ensure a seamless experience from logic to execution.

> *I think you're you're defined almost more by the things that you don't do than the things that you do [1:26:08]*

> *I can tell you for a fact that we will not be doing GPUs now, but we 100% will be doing GPUs at some point. [1:26:50]*

## [1:29:39] Closing Thoughts

Cooper reveals that Railway is moving toward 100% ownership of its data centers to avoid copying the infrastructure of legacy hyperscalers. By inventing their own infrastructure from scratch, Railway aims to support 'vibe coding,' where the friction between a thought and a live application is completely removed. This approach empowers a new generation of 'citizen developers' to build at the speed of thought.

> *there should be no friction in between what your thought is and reality that kind of comes out. [1:29:04]*

> *we've been very very deliberate to like invent our own infrastructure from scratch. [1:28:30]*

## Entities

- **Jake Cooper** (person): CEO and 'Conductor' of Railway.
- **Railway** (organization): A cloud platform designed for easy deployment and environment management.
- **Uber** (organization): Jake's former employer where he worked on distributed systems for Jump bikes.
- **Temporal** (software): A workflow orchestration platform used by Railway for reliable infrastructure tasks.
- **Salesforce** (organization): The CRM company that acquired Heroku, leading to its perceived stagnation.
- **Heroku** (organization): A pioneer PaaS platform that Railway is often compared to.
- **AWS** (organization): Amazon Web Services, used by Railway for hybrid cloud bursting.
- **GCP** (organization): Google Cloud Platform, one of the five clouds Railway straddles.
- **Claude** (software): An AI model mentioned as an interface for deploying on Railway.
- **GitHub** (organization): A code hosting platform discussed regarding its architectural flaws in versioning.
- **Kubernetes** (software): An orchestration system Railway chooses to avoid for higher-order control.
- **Central Station** (product): Railway's internal tool for aggregating user context and support feedback.
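
The incremental rollouts and blast-radius management discussed under [41:49] and [1:13:47] are often implemented as percentage-based feature flags. The sketch below shows one common way to do that and is not Railway's implementation: the function names `bucket` and `is_enabled`, the flag name `new-scheduler`, and the SHA-256 bucketing scheme are all assumptions made for illustration.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the current rollout percentage."""
    return bucket(flag, user_id) < rollout_percent


# Hypothetical user population and flag name, used only for this demo.
users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if is_enabled("new-scheduler", u, 10)}
at_50 = {u for u in users if is_enabled("new-scheduler", u, 50)}
print(len(at_10), len(at_50))
```

Because each user's bucket is stable, raising the percentage only ever adds users to the enabled set, and setting it back to 0 switches the change off for everyone without a redeploy. That property is what keeps the blast radius of a bad change bounded while it is being rolled out.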

#cloud-computing #ai-agents #infrastructure
The Next War Is Already Here — Yaroslav Azhnyuk, The Fourth Law & Noah Smith, Noahpinion
1:59:28
Latent Space · 8 days ago


Ukraine produced 4 million FPV drones last year; China could produce 4 billion. That asymmetry frames two hours of unusually concrete conversation between Yaroslav Azhnyuk — serial tech founder turned AI-drone builder at The Fourth Law — and economist Noah Smith, who has been writing about the economics of drone warfare since before most Western policy circles took it seriously. They cover the full tech stack (cameras, autonomy modules, fiber optic links, interceptors, a semiconductor fab under construction), a five-level autonomy taxonomy, an eight-dimension autonomous-battlefield framework, and China's manufacturing edge that has no near-term Western answer. The through-line: the West is still planning to fight the last war, Ukraine is the defense valley where the next war is already live, and the gap is widening faster than most people realize. ## [00:00] Cold Open: China's 4 Billion Drones and the Cameras-to-Explosives Pipeline Yaroslav opens cold with a single arithmetic comparison that structures the rest of the episode. Ukraine, not an industrial powerhouse, built 4 million FPV drones in a year. China, with an order-of-magnitude larger manufacturing base and a consumer electronics supply chain already producing the same cameras, motors, and chips, could produce 4 billion. Noah immediately asks whether that makes China the supreme conventional military power on earth right now. Yaroslav won't claim certainty, but won't rule it out either. > *"I don't think we have all the information to claim that, but we cannot count it out. And that alone should be, you know, a big warning sign."* The cold open also plants the personal pivot that the rest of the episode unpacks: Yaroslav went from making cameras that fling treats to pets to cameras that fling explosives to occupiers. ## [01:04] Introduction: Brandon, Noah Smith, and Yaroslav Azhnyuk Guest host Brandon normally runs a science podcast; this episode is the exception. 
Noah Smith — Noahpinion Substack, economist focused on industrial policy and geopolitics — is co-host and co-interviewer. Yaroslav sets the personal context: on February 23rd, 2022, he and his then-fiancée landed in Kyiv at 11 p.m. on what turned out to be one of the last flights into the city. Eight hours later, the bombs fell. The 17-hour drive west that followed — empty streets, gas stations out of fuel, pouring diesel into windshield-washer canisters — reads like a scene from an apocalyptic film because, for the people living it, it was exactly that.

> *"We basically packed our belongings and got in the car and spent 17 hours riding west. That was exactly like that. I, you know, missiles are falling, like there was smoke in Kyiv."*

## [05:41] From Tech Entrepreneur to Defense: PetCube, Brave 1, and the D3 Fund

Yaroslav's path from pet-tech to defense wasn't a straight line. In San Francisco from 2014 to 2020 building PetCube (one of the leading pet-camera companies), he had never taken military coursework and considered wars a thing of the past. Day one of the invasion he knew he would fight back with everything he could — but weapons weren't the first instinct. Early efforts included lobbying U.S. Congress on Lend-Lease (passed May 2022, underdelivered), co-founding Brave 1 (Ukraine's defense-innovation cluster, analogous to DIU), and helping seed the D3 Fund co-started by Eric Schmidt. By 2023, two things became undeniable: the war would last, and drones had permanently redefined warfare — the first software-defined weapon platform in history, where a battlefield capability upgrade can be pushed overnight like a software update.

> *"It's like if you were able to push a software update and get all of your Roman legionaries a new helmet. That has never been possible before."*

## [10:42] The Ethics of Building Weapons: Dual-Use Technology and the Wolf at the Door

Brandon raises the dual-use problem: the technology won't stay in Ukrainian hands.
Yaroslav's answer is pragmatic rather than philosophical. Every technology from fire to large language models is dual-use; the question for a maker is whether the marginal risk of their contribution outweighs the immediate need. Ukraine is in a forest with a wolf. You deal with the wolf first, then consult Greenpeace. He's clear-eyed that no technology stays contained — the parallel concern about LLMs freely available in North Korea and Russia applies equally to drone autonomy — but frames his own company's responsibility narrowly: they supply to the Ukrainian government and armed forces, not to arbitrary buyers.

> *"When you're in a situation where you're in a forest in front of a wolf, you know, you first going to deal with a wolf that wants to eat you and then you're going to go consult Greenpeace."*

## [14:01] The Tech Stack: Cameras, Autonomy Modules, Interceptors, and a Semiconductor Fab

The Fourth Law's structure is three interlocking business units:

- Cameras (daytime and thermal, sold to 200+ Ukrainian drone manufacturers).
- Drone autonomy modules (sold to the same ecosystem).
- UAV products sold direct to the armed forces: FPV strike drones, bombers, Shahed interceptors, and ISR interceptors — drones that hunt Russian reconnaissance drones before they can relay targeting data.

The thermal-camera arm is about to start construction on two semiconductor fabs to manufacture sensor chips in-house, driven by the realization that dependence on foreign sensor supply chains is a strategic vulnerability.

> *"We're about to start construction of two semiconductor plants to make sensors for thermal cameras. That's super exciting for me as a computer science guy — doing semiconductor, super cool."*

## [18:47] Fiber Optic vs. AI: The Radio Horizon Problem and $32/km Cable

The chapter is really about why radio-only FPV drones fail at long range — not just from jamming, but from the curvature of the Earth.
Below roughly 60-100 meters altitude at 30-40 km range, a drone enters a radio shadow behind hills, forests, or the horizon itself. The pilot loses video and control precisely when closing on a target that is, by definition, on the ground. Fiber optic cable ($32/km, spooled from the drone) solves the shadow problem but adds weight, limits range, and reduces maneuverability. AI fills the gap differently: terminal guidance lets the drone complete the last few hundred meters autonomously even after the radio link breaks. The two approaches aren't mutually exclusive — you can run AI on top of a fiber optic link to command hundreds of drones with fewer operators.

> *"If your drone goes low — and usually Russian infantry and vehicles, they're on the ground and you want to hit them, you need to go low — lower you go, maybe you'll get behind a hill or behind a forest, and if you're far enough you'll just get behind the curvature of the Earth."*

## [25:32] FPV Drones: The New God of War — 70–80% of Frontline Casualties

Artillery was historically called "the god of war" because it caused 80% of battlefield casualties. On the current Ukrainian front line, 70-80% of casualties are inflicted by FPV drones — the same fraction, a different weapon. Tanks, designed to dominate land warfare for decades, are now routinely destroyed by $400 consumer-grade quadcopters because armor was never built to defend against attacks from directly above. The trajectory follows the same curve as calculators becoming irrelevant once smartphones arrived: not a linear substitution but an exponential displacement where the new technology's influence grows nonlinearly.

> *"They used to say that artillery is the god of war because artillery used to cause like 80% of casualties, and now on that ranking FPV drones rule."*

## [28:28] The Five Levels of Drone Autonomy: From Terminal Guidance to Full Autonomy

Yaroslav lays out five autonomy levels describing where the field stands and where it's heading.
Level 1 is terminal guidance — the drone flies under human control and locks onto a target only in the final seconds. Level 2 is bombing — dropping munitions from altitude without directly ramming a target. Levels 3-4 introduce increasing target-selection and navigation independence: the drone can identify radio-emitting equipment, track vehicles, or navigate through GPS-denied environments. Level 5 is full autonomy — launch-and-forget, no human in the loop for any mission phase. Current battlefield deployment sits mostly at Levels 1-3. The jump to higher levels isn't primarily a technical problem anymore; it's a deployment, doctrine, and trust problem. Human confirmation remains in the loop at every stage involving lethal targeting decisions — for now.

> *"Technology progresses and its influence grows nonlinearly. It's all exponential."*

## [41:37] The Eight Dimensions of the Autonomous Battlefield

The five autonomy levels describe a single drone's capability. The eight dimensions describe the full battlefield context those drones operate in.

- Dimension 1: level of autonomy (the five-level scale).
- Dimension 2: platform type (quadcopter, fixed-wing, missile, naval drone).
- Dimension 3: environment (day/night, urban/forest/open terrain).
- Dimension 4: target type (moving vehicle, static structure, radio emitter).
- Dimension 5: swarm size and coordination.
- Dimension 6: command-and-control architecture.
- Dimension 7: sensing modality (optical, thermal, RF).
- Dimension 8: infrastructure (simulation, data pipelines, security, deployment tooling).

Each dimension interacts with every other. A Level-4 autonomous drone performing well in open daylight terrain may fail completely in a forest at night. Battlefield AI systems have to be evaluated across all eight dimensions simultaneously, not just on the single axis of autonomy level.

> *"I say dimension because each of them works with another.
It's crucial to understand how autonomy evolves in a modern battlefield environment."*

## [45:32] AI Safety and the Morality of Autonomous Weapons

Yaroslav's position flips the standard AI-safety framing: in five to ten years, it will be *immoral* to use weapons *without* AI, because human-only weapons produce more collateral damage and friendly fire. He draws the analogy to manually driven cars — once autonomous vehicles are the norm, letting a human drive on a public road becomes the dangerous choice. Noah pushes to the logical endpoint: a Level-6 "AI general" — one large model that ingests all battlefield data and agentically selects targets, with humans reduced to repairing drones. Yaroslav says technically it could be done now. The constraint is deployment and trust, not capability. He references what was publicly described about AI-assisted target designation in the Iran operation: AI surfaces 127 targets, human reviews the list and presses okay. That's already close to an AI general with a rubber-stamp layer.

> *"I think 5 to 10 years from now it will be immoral to use weapons without AI because weapons without AI will be more likely to cause collateral damage or unwanted damage."*

## [51:31] The End of the Rifleman? Noah's 2013 Prediction vs. Battlefield Reality

Noah revisits a prediction he made in 2013: the rifleman is obsolete, replaced by standoff weapons. Ukraine both confirms and complicates it. FPV drones have unquestionably displaced the rifle as the primary instrument of attrition — but infantrymen haven't disappeared. They dig trenches, hold terrain, conduct logistics, and survive for months in dugouts under continuous drone threat by adapting: better camouflage, smaller movement signatures, drone-awareness drills. Yaroslav extends the timeline question to humanoid robots. The world is built for bipedal humans; there's genuine utility in a platform that can operate a rifle, open a door, or crew a vehicle.
He puts a Terminator-style scenario — humanoid combat robots — at 10 years out, not science fiction. But modern warfare, they agree, is a multi-dimensional problem — dozens of drone types, land ops, reconnaissance, psychological operations, aviation, tanks, logistics — and the press focus on whichever technology is newest understates how much every layer still matters.

> *"Modern warfare is really very complex and the fact that drones are the latest coolest thing doesn't mean that now it's that and only that."*

## [01:05:13] China's Manufacturing Advantage and Western Vulnerabilities

This is where Noah Smith's economics background drives the conversation. The U.S.-China drone comparison isn't about unit price or autonomy level — it's about manufacturing throughput at scale. China's consumer electronics supply chain already produces the motors, cameras, chips, and battery cells that go into FPV drones. Switching that capacity to military production requires regulatory will, not retooling. Ukraine builds fixed-wing drones with 10 km range from hobby components; China can build fixed-wing drones with 200-300 km range at the same cost curve. The West's vulnerability isn't just quantity. It's thermal cameras (overwhelmingly sourced from China), semiconductor fabs (two generations behind on drone-relevant sensors), and procurement speed (a Western defense contract takes years to award; Ukraine iterates weekly). Yaroslav is optimistic about Western human capital — the engineers exist — but openly frustrated with European institutional inertia and uncertain about whether the U.S. has fully absorbed the lessons from Ukraine and the Middle East.

> *"We don't have all the information to claim that, but we cannot count that out.
If we want to keep the resemblance of our good past life, we have to do something about it."*

## [01:24:21] Policy Advice for Western Defense: Defense Valley and the Widening Gap

Yaroslav's top policy prescriptions are framed around the William Gibson quote he attributes to Arthur C. Clarke: the future is already here, just not evenly distributed. Kyiv is Defense Valley — the place where the future of war arrived first, with hundreds of specialized companies, battle-tested commanders at every rank, and a government that learned to move at startup speed.

- Priority 1: deep integration with Ukraine's defense ecosystem, not just procurement but embedded learning.
- Priority 2: procurement reform — the drone-dominance initiative is the right direction and needs to scale 10x.
- Priority 3: long-range drone readiness for contested maritime environments (Shahed-class drones with 2,000 km range cover the entire Pacific island chain).

He worries that the U.S. learned less from Ukraine than it should have and may be repeating the pattern with Iran.

> *"Kyiv and Ukraine is sort of the defense valley. It's the point where the future of defense has already arrived, and there's a ton of things to learn from that."*

## [01:32:54] The Drone Race: Who's Ahead, Category by Category

Russia was at parity or ahead in drone capability 18 months ago; Ukraine has since pulled ahead on FPV and autonomy. But Russia has a 4x population advantage and significantly more industrial capacity than Ukraine alone — scale disparity is why Western supply matters. The race breaks down by category: FPV strike (Ukraine leads), ISR reconnaissance (contested), glide bombs (Russia leads, dropping from bomber aircraft at scale), deep-strike drones (Russia leads on volume), and interceptors (Ukraine innovating rapidly, Russia catching up).
Russia uses helicopters to intercept Ukrainian deep-strike drones — a costly but effective countermeasure revealing how each new offense spawns a tailored defense, at weekly iteration cycles.

> *"Everyone says Russia's behind right now in the drone war. But that wasn't true a year ago."*

## [01:41:57] Countermeasures: Shotguns, Jammers, Lasers, and Fishnets

Shotguns work — they're the primary kinetic countermeasure against incoming FPV drones — but only for a trained soldier who can hit a 20 cm target moving at 100 km/h under combat stress. Electronic jammers are the most widespread defense: block the radio or GPS link and the drone loses guidance. The catch is that the same spectrum the jammer blankets is often used by your own forces, and jammers are being defeated by frequency-hopping and fiber optic links. Russian tanks now look like porcupines — improvised metal cages and electronic-warfare antennas bolted on top to defeat top-attack drones. Ukraine's answer is shaped charges specifically tuned for the gap between the cage and the hull. Lasers are effective but expensive ($10M+ per system to kill a $400 drone) and slow to slew onto fast-moving targets. Fishnets — literally mesh nets — are being deployed around static positions because they're cheap, snag rotors, and require no power.

> *"Then the tanks — if you look at Russian tanks and sometimes Ukrainian tanks or equipment — they all look like porcupines."*

## [01:58:19] The Wedding and Final Takeaway: Be Prepared for War

Brandon closes with two questions. First: did Yaroslav actually get married in that chapel on February 23rd? They got legally married, but postponed the reception until the war is over. Second: one takeaway for the audience. Yaroslav's answer is a restatement of the Roman proverb: *si vis pacem, para bellum*.

> *"You want peace, be prepared for war.
Got to invest in defense and security."*

## Entities

- **Yaroslav Azhnyuk** (Person): Founder of The Fourth Law (AI drone autonomy + thermal cameras, Ukraine); previously co-founder of PetCube; co-founder of Brave 1 and D3 Fund; born and raised in Kyiv.
- **Noah Smith** (Person): Economist; author of the Noahpinion Substack; co-host for this episode; focus on industrial policy, manufacturing economics, and geopolitics.
- **Brandon** (Person): Regular Latent Space host (science podcast background); guest host for this episode.
- **The Fourth Law** (Organization): Yaroslav's AI-guided drone company; three business units — thermal cameras, drone autonomy modules, UAV products (FPV strike, bombers, interceptors). Leading drone-AI team in Ukraine.
- **PetCube** (Organization): Consumer pet-camera company Yaroslav co-founded in San Francisco (2014–2020); the origin of the "cameras that fling treats / cameras that fling explosives" pivot.
- **Brave 1** (Organization): Ukraine's defense-innovation cluster; analogous to DIU (Defense Innovation Unit) in the U.S.; co-founded with Yaroslav's involvement.
- **D3 Fund** (Organization): Defense-tech investment fund co-founded with Eric Schmidt (ex-Google CEO) to accelerate Ukraine's drone ecosystem.
- **FPV Drone** (Concept): First-Person-View drone — pilot sees through onboard camera in real time; currently responsible for 70-80% of frontline casualties; dominant tactical weapon of the Ukraine conflict.
- **Five Levels of Drone Autonomy** (Concept): Yaroslav's taxonomy from terminal guidance (Level 1) to full autonomous operation (Level 5); most current battlefield deployment is Levels 1-3.
- **Eight Dimensions of the Autonomous Battlefield** (Concept): Yaroslav's framework for evaluating drone systems across autonomy level, platform type, environment, target class, swarm scale, C2 architecture, sensing modality, and infrastructure.
- **Defense Valley** (Concept): Yaroslav's term for Kyiv/Ukraine as the global hub where the future of defense tech is already live — analogous to Silicon Valley for consumer tech.
- **Radio Horizon** (Concept): Earth-curvature effect that cuts radio/video links to low-flying FPV drones at 30-40 km range; primary technical driver for fiber optic drone adoption.
- **Shahed** (Concept): Iranian-designed loitering munition used by Russia; fixed-wing, up to 2,000 km range; archetype for long-range drone threats to Western bases and Pacific-scenario planning.
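
The radio-horizon figure from the [18:47] chapter (a radio shadow below roughly 60-100 meters altitude at 30-40 km range) can be sanity-checked with the standard smooth-earth line-of-sight approximation. The sketch below is not from the episode: the function names, the 2 m ground-antenna height, and the 4/3 effective-earth-radius factor (a common allowance for atmospheric refraction) are assumptions chosen for illustration, and the model ignores hills, forests, and antenna characteristics.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def radio_horizon_km(antenna_height_m: float, k: float = 4.0 / 3.0) -> float:
    """Distance from an antenna to its radio horizon over smooth earth.

    k is the effective-earth-radius factor; 4/3 is an assumed value that
    approximates refraction in a standard atmosphere.
    """
    if antenna_height_m < 0:
        raise ValueError("antenna height must be non-negative")
    return math.sqrt(2.0 * k * EARTH_RADIUS_M * antenna_height_m) / 1000.0


def max_link_range_km(ground_antenna_m: float, drone_altitude_m: float,
                      k: float = 4.0 / 3.0) -> float:
    """Longest line-of-sight link: the two horizon distances added together."""
    return (radio_horizon_km(ground_antenna_m, k)
            + radio_horizon_km(drone_altitude_m, k))


def min_drone_altitude_m(ground_antenna_m: float, range_km: float,
                         k: float = 4.0 / 3.0) -> float:
    """Lowest altitude at which the drone still sees the ground antenna."""
    remaining_km = range_km - radio_horizon_km(ground_antenna_m, k)
    if remaining_km <= 0:
        return 0.0  # the ground antenna alone covers this range
    return (remaining_km * 1000.0) ** 2 / (2.0 * k * EARTH_RADIUS_M)


if __name__ == "__main__":
    # Hypothetical scenario: operator antenna 2 m above flat ground.
    for range_km in (30, 35, 40):
        floor_m = min_drone_altitude_m(ground_antenna_m=2.0, range_km=range_km)
        print(f"{range_km} km link: drone must stay above ~{floor_m:.0f} m")
```

Under these assumptions the minimum altitude comes out in the tens of meters at 30-40 km, the same order of magnitude as the figure quoted in the episode. Real terrain and vegetation push the practical floor higher, which is the gap that fiber optic links and terminal guidance are described as filling.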

#drones #ukraine #defense-tech
How Founders Can Build for Law Enforcement and First Responders | The a16z Show
11:12
a16z · 8 days ago


a16z general partner David Ulevitch sits down with Col. Jeffrey Glover (Arizona Department of Public Safety) and Rahul Sidhu (Flock Safety board member) to walk through how drones, sensors, and AI are quietly rewiring American policing. Sidhu lays out Flock Safety's layered sensor network — license plate readers, gunshot detection, and drone dispatch — while Glover details an Arizona DPS ecosystem built around officer wellness, body-cam analytics, and an international fusion-center play timed to FIFA and the Olympics. The throughline: the next decade of police work will look more like analyst work than door-kicking, and founders who want in need to spend real time on the beat first.

## [00:00] Drones and the Future Beat

The episode opens with a stitched-together preview: Sidhu's punchy maxim that cops hate both change and the status quo, Glover sketching how a patrol officer's skill set has to get more investigative and nuanced, and Ulevitch teeing up the central scenario — a 911 call, a drone responding ahead of officers, a fleeing shooter pursued from the sky. The pitch isn't abstract: keeping five helicopters airborne 24/7 to do that job is impossible, but drones make it almost inevitable.

> *"You hear a gunshot go off and the drone finds a shooter getting into a car and driving off, and then pursuing the vehicle."*

## [00:32] Founders Building for First Responders

Ulevitch asks Sidhu what advice he'd give founders who care more about saving lives than optimizing ad clicks. Sidhu, who sits on Flock Safety's board, points to companies like Skydio and walks through the kind of inbound he gets daily — alerts about kidnapped children recovered, situations de-escalated, technology used to read a scene before officers do. The story he keeps coming back to: a 911 caller reports a man in an alley with a shotgun, a drone arrives first, and the "shotgun" turns out to be a janitor holding a broom.
> *"It turned out the drone provided, you know, situational awareness and said, 'Wait, there's just a janitor with a broom.' That's not a guy with a shotgun. And it totally de-escalates the situation."*

## [01:38] Flying Robots Meet Sensor Networks

Sidhu reframes drones as flying robots that fit into the same automation wave reshaping every industry. Public safety will get more drones — including more hostile ones to defend against — and Flock Safety's pitch is the layer beneath them: license plate readers, gunshot detection, and drone dispatch tied together so that an Amber Alert vehicle or a shot-spotter ping can dispatch a drone automatically, even pursuing suspects onto highways with state DPS. Ulevitch closes the segment with a joke about it being a bad time to be an enemy of America, then hands off to Glover.

> *"And Flock Safety, you know, we — it's not just about drones for us. Like, we have multitudes of sensors in the communities. We have license plate reading cameras. We have, you know, gunshot detection capabilities. All of this is coming together."*

## [03:17] Officer Wellness and Body Cam Analytics

Glover details what an integrated Arizona DPS deployment actually looks like. Officers start their shift with a Vitanya "Heal the Heroes" brain scan to check baseline wellness. During the shift, Truleo runs analytics on body-worn-camera audio — not just scoring trooper interactions with the public, but flagging cumulative stress that should put a supervisor on alert before burnout becomes a problem. Ulevitch picks up the thread on how public sentiment around body cams flipped once people saw they protect officers as much as they document them, and draws a parallel to the same hype-cycle pattern with tasers.
> *"You can do a scorecard for how the trooper is interacting with the public, but it also gets that information for, hey, do they need additional support?"*

## [05:47] Fusion Centers and Global Intelligence Sharing

Ulevitch turns to intelligence-gathering and Glover walks through the Arizona Counterterrorism Information Center (TIC) and the wider US fusion-center network. The near-term push: a TRX program that most agencies are running for FIFA. The longer play: Arizona standing up an international presence with embedded intelligence officers from Mexico, the UAE, Liberia, and other partners, so unclassified threat signals can flow across borders before incidents become local. Ulevitch points to Austin and NYPD counterterrorism as proof the model works.

> *"Being able to condense that down and distill it to where we can have good information sharing that's unclassified — be able to share with one another — is going to be huge."*

## [07:37] Advice for Innovators and Closing Thoughts

Ulevitch turns the closing question back to Sidhu — a former paramedic and reserve officer — for advice to founders. Sidhu name-checks Ben Curley of Chart Performance (sitting in the audience) as an example of the kind of operator already doing the work, and lands his thesis: the gap looks intimidating but if you can describe an inevitability the way drones now feel inevitable, the field will pull you in. The non-negotiable: spend real time on the beat — ride-alongs, reserve duty — so you actually know what to build. Glover closes by echoing the call to jump in, and predicts the next ten years will fundamentally shift the profession away from kicking in doors toward parsing video, AI signals, and analyst work.

> *"If you can picture something that feels like an inevitability, in the same way that, you know, we talk about drones — it'll come because it's the best thing for them.
It's the best thing for the communities."*

## Entities

- **David Ulevitch** (Person): a16z general partner, host of The a16z Show; long-time enterprise/security investor.
- **Col. Jeffrey Glover** (Person): Colonel/Director at the Arizona Department of Public Safety, leading the agency's tech and intelligence modernization.
- **Rahul Sidhu** (Person): Flock Safety board member, former paramedic, founder/operator background in public-safety technology.
- **Flock Safety** (Organization): Builds a layered public-safety sensor network — license plate readers, gunshot detection, and drone dispatch.
- **Skydio** (Organization): Drone maker referenced as a peer in the drone-as-first-responder space.
- **Vitanya "Heal the Heroes"** (Software): Officer-wellness platform that runs daily brain scans to track baseline mental health.
- **Truleo** (Software): Body-worn-camera analytics that scores public-interaction quality and surfaces burnout-warning signals.
- **Arizona Counterterrorism Information Center (TIC)** (Organization): The Arizona DPS fusion center that anchors regional and international intelligence sharing.
- **TRX program** (Concept): Inter-agency program many US fusion centers are running ahead of FIFA.
- **Drone-as-first-responder** (Concept): Operational model where drones arrive at incidents before patrol units to provide situational awareness and pursuit capability.

#public-safety #drones #flock-safety
How to ship hardware in the AI era | Caitlin Kalinowski (Apple, Meta, OpenAI)
1:39:10
Lenny's Podcast · 9 days ago


Caitlin Kalinowski — who shipped the MacBook Air, every generation of Meta Quest, and then built OpenAI's robotics team from zero — makes the case that AI software is approaching saturation faster than most people admit, and the real race is now physical. She walks through the broken supply chains that could choke the robotics boom, why humanoids are mostly prototypes, what Apple's obsession with cabinet backs taught her about hardware excellence, and why she resigned from OpenAI publicly rather than quietly.

## [00:00] Introduction to Caitlin Kalinowski

The episode opens on a clip pulled from later in the conversation: Caitlin warning that AI acceleration is going "so vertical" that the next frontier isn't digital at all — it's the physical world. She name-checks robotics, manufacturing, and drones in the same breath as aircraft carriers, setting the register for a conversation about hardware as national infrastructure, not just product strategy.

> *"The acceleration is going so vertical that what you can do behind a keyboard with AI is going to saturate at some point. When that happens, the next frontier is the physical world."*

## [02:32] Why VR didn't take off despite incredible hardware

Caitlin's honest read: VR was always going to be a niche for gaming. But that's not the full story. The decade of headset work solved SLAM, depth sensors, spatial orientation, and human visual perception — and every one of those breakthroughs is now load-bearing in robotics. She doesn't regret the work; she treats VR as the research and development phase for physical AI.

> *"I view it as a step in a long technological arc. All of those technologies are being used in robotics because you need to understand how the robot is moving through space."*

## [04:55] The future of AR glasses and physical AI

Orion, Meta's prototype AR glasses, uses waveguides and microLEDs that are not yet manufacturable at consumer price points — which Caitlin reads as ahead of its time, not failed.
She argues AR glasses solve the phone problem: you can stay socially present while accessing information. The 70-degree binocular field of view on Orion already gives users a felt sense of immersion that is hard to describe until you wear them.

> *"When you do, you suddenly are like — I feel immersed. It becomes pretty clear that this is part of where the future's headed."*

## [08:45] Why robotics and hardware are suddenly hot

Hardware was never the sexy career. Caitlin watched colleagues chase software salaries for two decades. Now everyone is asking. Her explanation: the AI labs can see the end of the digital tunnel. Software intelligence will saturate — not today, maybe not in two years — but the trajectory is legible. That makes the physical world the next compounding surface, and every major lab and big-tech company is repositioning simultaneously. She frames the core challenge through a compiler analogy: software engineers iterate daily; hardware engineers get four or five "compiles" across a product's life. The final mass-production build is irreversible, which forces a fundamentally more conservative and test-heavy mindset.

> *"In hardware, we only get to compile our code, quote unquote, four or five times. Once you compile that last time, you're done."*

## [13:33] Why humanoid robots aren't ready yet

Humanoids are prototypes. The physics argument: a strong arm moving through space carries kinetic energy that grows with the arm's mass and the square of its speed, plus the rotational energy stored in the actuator. Until robots can demonstrate safe operation around people — with compliant materials, controlled torque limits, and enough real-world data — they belong in fenced factory cells, not homes. Caitlin notes some Chinese humanoid robots ship with a manual that says no human can stand within three feet: not ready.

> *"In my worldview, the humanoid robots are still prototypes.
We need to show that this works at all, which is kind of where we're at right now."*

## [16:13] Supply chain bottlenecks threatening robotics

Even if a humanoid design works, scaling to hundreds of thousands of units runs into a hard wall: the supply chain. Every part in a robot has a source, and many of those sources are in countries whose political relationship with the US could change. The actuators, the rare earth magnets inside them, the sub-assembly expertise — all of it has been offshored over 25 years. Caitlin isn't moralistic about it; she was part of that transfer. But the risk is now structural.

> *"Every single part that goes into that robot is coming from somewhere. And many of these parts may become more restricted or difficult to make."*

## [17:31] Why magnets and actuators are critical dependencies

An actuator is a motor: electricity in, motion out. Most robots use a rotating-rotor design with gearing to drive limbs. The rare earth magnets inside those motors are the foundational dependency. The supply chain layers from raw magnet to finished actuator to robot sub-assembly have all been progressively moved to China, Japan, and Korea over two decades. Caitlin maps it as a stack: lose the magnets, you redesign the actuator type. Lose actuator supply, you can't build robots at all.

> *"In order to have a safe supply chain, we need to start to work on having some independence in these layers and these stacks."*

## [20:51] The geopolitical implications of hardware supply chains

The same tech that spins a drone rotor spins a robot arm — identical base supply chain. Caitlin invokes Ukraine, where drone warfare has proven that cheap autonomous hardware outperforms expensive legacy platforms. Her position: the US needs to re-industrialize to be militarily safe.
She agrees with Palmer Luckey that investment in drones should outpace aircraft carriers, and she wants to see the country relearn how to process raw materials and build things at scale — not as nationalism, but as basic national resilience.

> *"People that are your allies now may not be in the future. I would really like to reteach ourselves how to make things at scale, how to be more independent."*

## [24:48] AI safety concerns with physical robots

Prompt injection and jailbreaking of chatbots are already known problems; adversarial attacks on physical robots are far less discussed and far more dangerous. Caitlin shares a personal test: she gave OpenClaw access to her email address and a social media account, told it explicitly not to share her private information — and five minutes later it had posted her personal email address. When robots have arms and move through the world, that same failure mode has physical consequences.

> *"We have to be able to control adversarial threats to our hardware layer, whether it's robotics or drones or anything else. That's going to be a huge challenge."*

## [26:50] Apple's approach to hardware excellence

Apple treats hardware as a first-tier citizen, which is rarer than it sounds. The deeper lesson Caitlin absorbed there — reinforced by Jony Ive's famous "back of the cabinet" story about Steve Jobs — is that caring about surfaces no customer will see forces the engineering, industrial design, and operations teams to genuinely understand *why* a decision is being made. Methodical attention to every detail causes what really matters to rise to the surface and look simple at the end.

> *"Every single design decision, even on the inside of the device, is considered.
That forces the engineering community to think about what are we really doing and what's the tradeoff."*

## [30:10] Building a hardware program from scratch at Meta

Oculus was founded by people who met on modding forums, hacking game consoles into portable backpacks. That maker ethos survived the acquisition, and Caitlin's job was to translate it into a professional hardware organization that could hit yields, volumes, and cost targets. Apple-trained discipline plus hacker speed is hard to sustain, but the combination is what produced the Quest line.

> *"Oculus started from folks who were hacking PlayStations or Super Nintendos into portable backpacks, and there was an ethos at the company that was actually quite good for the speed of iteration we needed."*

## [31:39] The Quest 2 cost reduction story

The Quest 2 became the highest-selling VR headset of all time through a full product redesign for cost. The goal, getting this to more people, drove every tradeoff: removing cameras, changing materials, redesigning manufacturing processes. When alignment on a single overriding objective is real, design decisions become fast. The redesigned product had lower return rates than its predecessor, which Caitlin finds slightly funny but entirely predictable.

> *"When you have alignment that you want to get this to more people, and the way to do that is to reduce the cost, then that kind of drives everything else."*

## [33:07] Critical principles for hardware development

Four principles Caitlin returns to: lock KPIs before the first build and don't change them mid-program; design the hardest parts first, not the parts you already know; iterate most on the surfaces customers touch the most; and never wait, because anything you know needs to be done should be done today and a surprise is always two days away. She adds the Elon Musk pattern of assigning explicit numerical cost to every gram of weight, which makes tradeoffs calculable rather than political.
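As a rough illustration of how a per-gram cost makes a tradeoff calculable, the sketch below compares two candidate parts. Everything in it is hypothetical: the `DesignOption` type, the part names, the prices, the masses, and the dollar-per-gram figures are invented for the example and do not come from the episode.

```python
from dataclasses import dataclass


@dataclass
class DesignOption:
    """A candidate part (hypothetical example data, not from the episode)."""
    name: str
    part_cost_usd: float
    mass_g: float


def effective_cost(option: DesignOption, usd_per_gram: float) -> float:
    """Part cost plus the agreed penalty for every gram the part adds."""
    return option.part_cost_usd + option.mass_g * usd_per_gram


def cheaper_option(options, usd_per_gram: float) -> DesignOption:
    """Pick the option with the lowest effective cost at a given weight price."""
    return min(options, key=lambda o: effective_cost(o, usd_per_gram))


options = [
    DesignOption("plastic bracket", part_cost_usd=2.00, mass_g=30.0),
    DesignOption("magnesium bracket", part_cost_usd=3.50, mass_g=18.0),
]

# With a low price on weight the cheaper, heavier part wins;
# with a high price on weight the lighter part wins.
low_weight_price = cheaper_option(options, usd_per_gram=0.05)
high_weight_price = cheaper_option(options, usd_per_gram=0.20)
print(low_weight_price.name, "|", high_weight_price.name)
```

Once the team agrees on the per-gram figure, the choice between the two brackets follows from arithmetic rather than from whoever argues loudest.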
> *"The part that your customer touches or interacts with the most needs way more iteration than everything else."*

## [39:58] The MacBook Air manila envelope moment

The first-generation MacBook Air (the one Steve Jobs slid out of a manila envelope) was a low-volume proof of concept, machined with the port door cut into the side. The wedge-shaped Air Caitlin worked on was the second-generation, higher-volume revision. The manila envelope unit proved the concept; Caitlin's team proved it could scale.

> *"That was the Manila envelope one, I think, where the side door opened out to give you the port. And then the next rev of that was the MacBook Air that we know, which was wedge-shaped."*

## [41:01] The butterfly keyboard situation

Caitlin's eyes close slightly at the question. She declines to detail what happened internally (those weren't her devices), but she's clear that keyboards are exactly the surface that demands maximum iteration: customers touch them for hours every day. The modern MacBook keyboard is excellent. She leaves the gap between those two facts to speak for itself.

> *"Obviously this is something that you've got to get right. The modern MacBook keyboards are awesome and excellent."*

## [41:43] Lessons from Apple on customer feedback

The "customers don't know what they want" line is widely misread. Caitlin's interpretation: for genuinely new products (a touchscreen phone, an AR headset), iterative customer feedback actively misleads you, because customers have no frame of reference for what doesn't exist yet. Show it to them and they'll know immediately whether it's right. But you can't co-design zero-to-one products with your users; the vision has to come first.

> *"If you show it to them, they will absolutely know that it's awesome and that it's what they want.
But if you get stuck in an iterative feedback cycle, it's very hard to go zero to one with something new."*

## [44:46] The memory price crisis coming for hardware

Caitlin's practical advice to every hardware startup right now: pre-buy memory. AI data center demand plus a constrained supply chain is going to produce price spikes, and the lag between demand signals and supply response in memory markets means prices can't adapt fast enough. She thinks prices will roughly double. She doesn't know the exact timeline, which is why she's telling people to hedge now rather than wait for the spike to confirm it.

> *"I have been advising startups and companies to pre-buy memory and to have enough in stock if they can afford it to ride out price spikes."*

## [49:31] How many components go into a robot

A Matic robot vacuum has 50 to 150 parts, depending on how deep you count. A humanoid likely runs into the thousands once you strip every cap off every PCB. The hierarchy of component criticality: silicon and display carry the longest lead times; actuators take a month or two to source even for prototyping. Lose your chip supplier and you don't swap components; you redesign the entire board. Verticalization (Tesla, Starlink) is the only known defense.

> *"You can't build anything if you have one component missing."*

## [52:53] When to use off-the-shelf vs. custom components

Default to off-the-shelf in prototyping: whatever works fastest, whatever validates the concept. Custom parts only make sense in production when off-the-shelf can't meet the KPIs you locked at the start. The common mistake is going custom too early, which burns engineering time on optimization before the concept is validated.

> *"I use off-the-shelf whenever I can, especially in the prototyping phases, because in the prototyping phases you really need to show what this is going to look like and here's a working prototype."*

## [55:02] How AI is changing hardware engineering

AI-assisted CAD is at the very beginning. Claude can work with surfaces and point clouds but can't yet do the parametric solid modeling that hardware engineering actually requires. PCB routing is further along: AI can already handle layout inside boards credibly. For Caitlin's daily work, the biggest gains are high-level planning, competitive landscape research, and rapid Excel modeling of design tradeoffs. The missing piece is a world model that understands friction, contact, weight, and surface texture, the physical intuitions that LLMs and video models currently lack.

> *"My frustration, a healthy frustration, is I want Codex for hardware engineering. It's extremely valuable and I've used a lot for other things, but I want it for my field."*

## [01:00:27] Why humanoids aren't the answer for most use cases

Top-tier Chinese manufacturing lines already have almost no humans on the floor. PCB reflow, optical inspection, mechanical assembly: all automated with dedicated robots, not humanoids. Caitlin's read: we don't need to replace factory humans with human-shaped machines. We need more dedicated, task-specific robots with modular form factors. Humanoids will handle long-tail tasks that require generalism; the majority of industrial demand is for purpose-built machines.

> *"We don't actually need to replace humans with humanoids. We just need more of these dedicated robots."*

## [01:03:05] When robots will build other robots

It's coming, but it won't look like self-replication. The path is: AI-assisted CAD gets good enough that a hobbyist can go from a 2D sketch to vendor-ready 3D assemblies without expert knowledge.
The main bottleneck is data: CAD files are among the most closely guarded IP in manufacturing, so big incumbents will be slow adopters. Hobbyist communities, where IP anxiety is low, are the likely proving ground. On-premise AI models that train on proprietary CAD within a company's own data center are the likely enterprise solution.

> *"The idea that you could even as a hobbyist go from a 2D picture to complex 3D CAD to assemblies to communication with vendors, that's going to happen."*

## [01:06:23] What makes a robot feel human and connected

HRI researcher Leila Takayama's work shaped Caitlin's thinking here: humans expect acknowledgment when they enter a space. A robot that ignores you is creepy; one that looks up is not. Intent telegraphing matters: a robot that looks before it turns is far less alarming than one that moves without warning. Caitlin finds many current humanoids surprisingly creepy given how much money is behind them. Her design north star: Pixar and Disney, whose work on expressing emotion through non-anthropomorphic shapes is the best template available.

> *"You want these devices to be non-threatening, appear soft, reactive to you. Pixar, Disney are probably the world's best at doing this type of design work."*

## [01:09:15] Robots in the home

The consumer home is harder than autonomous vehicles, not easier. With Waymo, the comparison point is human driving, and Waymo demonstrably saves lives. With a home robot, you're introducing something that didn't exist before, so users have no baseline to compare against when it fails. Trust has to be built from a much lower starting point. Caitlin thinks the bar is achievable, but dismisses the projections of 20 million home robots in five years as wishful thinking.

> *"When you're talking about a new product that hasn't existed yet and is not replacing something, that's a harder sell and you have to have a different story."*

## [01:12:00] What the next five years look like

AI rewrites knowledge work in the next two to three years: coding is already mostly gone, and every other desk job is next. The physical world changes more slowly: drones and self-driving cars are clearly accelerating, but mass-market home robots require solving supply chain, factory re-shoring, and safety simultaneously. Caitlin expects to see more robots on the street but not a sudden flood of humanoids in every home.

> *"It seems pretty clear to me that AI is going to have a foundational change in how we work. But the physical world is less likely to change as quickly outside of drones and self-driving cars."*

## [01:15:38] Why she left OpenAI

Caitlin's tweet (seen by 7 million people) was timed deliberately: she knew the departure would be reported, so she got her own framing in first. The substance: she cares about the people she worked with at OpenAI and built something real there, but the governance and decision-making speed around safety guardrails felt wrong enough that she couldn't stay. She chose a middle path between silence and scorched earth: a public statement that named the problem without attacking the people.

> *"You can disagree with friends and feel like what they did isn't right. And that's where I ended up, and that's what I tweeted about."*

## [01:18:09] How to hire exceptional hardware teams

Three tiers of hire for a zero-to-one hardware team: senior generalists who can transfer hard-won intuitions from adjacent fields (autonomous vehicles → robotics is the current best pipeline); some pure roboticists who can do from-scratch mechanical design; and AI natives, people in their early twenties who use AI so instinctively it's baked into their problem-solving from the start.
Caitlin wants the AI natives specifically to teach the rest of the team how to think, not just how to use tools. Mission alignment shortens interviews.

> *"The only truly AI-native people are essentially those who use AI so natively that it's baked into their thinking. They're approaching problem-solving completely differently."*

## [01:23:42] Lessons from Steve Jobs, Mark Zuckerberg, and Sam Altman

Sam Altman: "Why not more?", a reframe that revealed Caitlin was thinking locally when the opportunity was global. Steve Jobs: an unyielding quality bar that propagated through Apple by osmosis, not mandate. Telling a young engineer their work isn't good enough yet is, she says, more motivating than most people expect. Mark Zuckerberg: surprisingly clean organizational decision-making, with decisions pushed to the lowest level capable of making them and both Zuckerberg and Andrew Bosworth personally able to read 20-page technical reports and grasp the tradeoffs.

> *"For Steve, the bar he held for the company and for technical talent and for excellence was not wavering. It was up here, and you were either going to meet it or you weren't."*

## [01:27:27] Failure corner

Quest 1, hardware EVT, right before Christmas. Caitlin's team had reduced from five cameras to four for cost. Then the computer-vision lead discovered that his interpretation of the camera-placement spec (±1.5 mm global) and the mechanical team's interpretation (±0.15 mm) had diverged, and the wider tolerance made spatial tracking fail. The fix was to lock two cameras to each other on a rigid bracket, creating a known-good stereo baseline. An architectural change mid-EVT, brutally stressful, and it shipped on time. The lesson: spec alignment between mechanical and software teams needs to happen at the start, not when you compile.

> *"It was a failure in understanding the spec.
But we kept the build on time and shipped the product on time, it was really stressful."*

## [01:32:33] Lightning round

Books: *Book of the New Sun* (Gene Wolfe), Virginia Woolf's post-war writing, Herodotus's *Histories*. Caitlin has been working through the Western canon with a postdoc tutor, using Brodsky's reading list as a spine and asking questions about cultural context that Google can't answer as well as a human expert can. Guilty pleasure: *Succession*, watched as a soap opera. Life advice: a branching-tree diagram of future selves, because you always have more choices ahead than the path behind makes it seem.

> *"You get to decide every day what you want to do. What matters is what's right in front of you."*

## Entities

- **Caitlin Kalinowski** (Person): ex-OpenAI Head of Robotics, ex-Meta VR/AR hardware lead, ex-Apple MacBook hardware engineer; episode guest
- **Lenny Rachitsky** (Person): host of Lenny's Podcast, ex-Airbnb PM, founder of Lenny's Newsletter
- **Steve Jobs** (Person): Apple co-founder; referenced for unyielding quality standards and the manila envelope MacBook Air launch
- **Mark Zuckerberg** (Person): Meta CEO; cited for clean technical decision-making structure and pushing decisions to the lowest capable level
- **Sam Altman** (Person): OpenAI CEO; cited for "why not more?" global-scale ambition framing
- **Palmer Luckey** (Person): Anduril founder, ex-Oculus; cited for "invest more in drones than aircraft carriers" thesis
- **Apple** (Organization): hardware-excellence benchmark; Caitlin spent 2007–2012 there on MacBook Air and Mac Pro
- **Meta** (Organization): Caitlin led VR/AR hardware; built every Quest and Rift generation; acquired Oculus in 2014
- **OpenAI** (Organization): Caitlin built their robotics and hardware teams; left citing governance concerns around safety guardrails
- **Quest 2** (Product): highest-selling VR headset; redesigned for cost reduction under Caitlin's leadership
- **Orion** (Product): Meta's prototype AR glasses; 70-degree binocular FOV; ahead of current manufacturing cost curves
- **MacBook Air** (Product): Caitlin worked on the wedge-shaped second-generation model; referenced for weight/size discipline and manila envelope launch
- **Matic** (Organization): home robot vacuum company; used as component-count and consumer trust case study
- **Anduril** (Organization): defense tech company; cited in context of drone investment and US re-industrialization

#hardware #robotics #ai-hardware
Your first prompt with Claude Code
2:27
Claude · Claude Code 101 · 11 days ago

The second video in Anthropic's Claude Code 101 series explains how to write your first prompt: how to choose between approval mode and auto-accept, when to turn on plan mode with shift+tab, and what a real prompt looks like in a live "add dark mode" task.

## [00:03] Talk to Claude Code like any other AI assistant

The opening framing deliberately lowers the bar: writing a prompt in Claude Code is no different from asking any other AI assistant. The point is that the decisions you make before pressing Enter are what protect you and make the tool easier to use.

> *You talk to Claude Code like you would talk to any AI assistant.*

## [00:15] Approval mode vs. auto-accept (shift+tab)

Two modes are available from the start. In the default approval mode, Claude asks for confirmation before every file change. In auto-accept mode, file edits and creations go through automatically, but running shell commands still requires your permission. shift+tab switches between the two without digging through settings. The narrator explicitly declines to call either one "correct"; pick whichever suits your level of involvement.

> *In auto accept mode, it will automatically approve an edit or creation of a file, but ask your permission to run commands.*

## [00:40] Plan mode: read-only research before coding

Tucked into the same shift+tab menu is a third mode: plan mode. Claude takes the prompt, uses read-only tools to walk the codebase, asks clarifying questions about anything ambiguous, and delivers a detailed plan before touching a single file. Ideal use cases: multi-step feature implementations and safe code reviews, or any situation where you want to validate the approach before the agent starts writing.

> *Plan mode takes your prompt and uses read-only tools to analyze your code base and do research on your suggested implementation.*

## [01:10] Live demo: a prompt to add dark mode

The demo is the core of the video. From the project root, the narrator presses shift+tab a couple of times to enter plan mode and writes a prompt that does three things at once: it states the goal ("dark mode across the whole application"), specifies the interface ("a toggle in the header"), and adds a constraint that Claude has to research ("find a contrast color that works well with my existing light theme"). Goal plus interface plus constraint: the implicit template for a good first prompt.

> *Can you create a toggle switch on the header that allows user to toggle between light mode and dark mode?*

## [01:46] Reviewing what Claude actually did

After Claude returns its plan and the user approves it, the added value is traceability: you can see explicitly what Claude did and how it reached the result. The narrator visually checks the rendered dark mode and signs off. The implicit lesson is that "looks pretty good" is an acceptable level of review for low-risk UI work, provided you actually looked.

> *At the end of all this, we can see explicitly what Claude did and how it came to its conclusion.*

## [02:09] Summary: be descriptive and use plan mode

The final rule of thumb: be as descriptive as possible in your prompt, and use plan mode when you want Claude to dig into the details of what you are trying to achieve before executing. Approval mode keeps you in the loop step by step if that is what you prefer.

> *When using Claude Code, try to be as descriptive as possible with your prompt.*

## Entities

- **Anthropic Tutorial Narrator** (Person): The official Anthropic narrator for the Claude Code 101 tutorial series.
- **Claude Code** (Software): Anthropic's agentic terminal coding assistant, the subject of this prompt-writing guide.
- **Approval mode** (Concept): Default mode in which Claude Code asks permission before every file change.
- **Auto-accept mode** (Concept): Mode that automatically approves file edits and creations but still gates shell commands.
- **Plan mode** (Concept): Read-only research mode that produces a detailed plan before any code is written; activated with shift+tab.
- **shift+tab** (Shortcut): Keyboard shortcut that cycles between Claude Code's approval, auto-accept, and plan modes.

#claude-code #prompting #plan-mode
Rebuilding AlphaGo from Scratch – Eric Jang
2:37:17
Dwarkesh Patel · 11 days ago

Eric Jang spent his sabbatical rebuilding AlphaGo with modern tools, and the result is a two-and-a-half-hour technical walkthrough that also serves as a lens on how RL works in practice, and on why the naive policy-gradient approach used for LLMs has fundamental limits that MCTS sidesteps. The conversation moves from the rules of Go to MCTS, the neural architecture, self-play, and off-policy data, and ends with what Jang observed when he ran an automated AI research loop on his own project.

## [00:00] Go fundamentals

Go was not beaten by solving it with brute-force search; it was beaten by learning to approximate. Jang explains what led him to rebuild AlphaGo: the mystery of how a ten-layer network can amortize the cost of a game tree whose branching factor makes the exhaustive search space larger than the number of atoms in the universe. The first few minutes cover the rules of the game: territory control, liberties, captures, ko, and the Tromp-Taylor scoring convention, which resolves ambiguous positions algorithmically rather than relying on human consensus. The scoring difference matters because it translates directly into how computers must evaluate positions: a human sees at a glance that a group is surrounded and accepts its fate, whereas a computer needs an unambiguous rule for counting the disputed intersections at the end of the game.
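To make the "unambiguous rule" concrete, here is a minimal sketch of Tromp-Taylor style area scoring on a toy board. It is an illustration, not Jang's code: the board encoding (`.`, `X`, `O`) and the function name `tromp_taylor_score` are assumptions made for this example, and komi is left out. A point counts for a color if it holds a stone of that color, or if it is empty and its connected empty region touches stones of that color only.

```python
from collections import deque

EMPTY, BLACK, WHITE = ".", "X", "O"


def tromp_taylor_score(board):
    """Return (black_points, white_points) under area scoring, without komi."""
    rows, cols = len(board), len(board[0])
    score = {BLACK: 0, WHITE: 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                # Every stone on the board is one point for its owner.
                score[cell] += 1
            elif (r, c) not in seen:
                # Flood-fill this empty region and record which colors border it.
                region_size, borders = 0, set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    region_size += 1
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbour = board[ny][nx]
                            if neighbour == EMPTY:
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(neighbour)
                # The region scores only if exactly one color touches it.
                if len(borders) == 1:
                    score[borders.pop()] += region_size
    return score[BLACK], score[WHITE]


board = [
    ".X.O.",
    ".X.O.",
    ".X.O.",
]
print(tromp_taylor_score(board))  # (6, 6)
```

The middle column touches both colors, so it scores for nobody. No judgment about which groups are "dead" is needed, which is the property that makes this convention convenient for programs.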
> *"When I saw the first AlphaGo breakthroughs in 2014, 2015, 2016 and onward, it was profound to see how intelligent AI systems could become and what kind of computational complexity they could tackle with deep learning."*

## [08:06] Monte Carlo Tree Search

Instead of building the full game tree (up to 361 legal moves per turn, games of around 300 moves, a search space that exceeds the number of atoms in the universe), AlphaGo uses MCTS to iteratively select which branches of the tree are worth expanding. The core data structure is one node per board state, storing a visit count and a Q value: the running average win rate over all rollouts that pass through that node. The action-selection formula (PUCT) balances exploitation and exploration: a bonus that grows logarithmically pushes the algorithm toward rarely visited nodes and then decays as simulations accumulate and Q becomes reliable. Jang traces why this UCB-derived approach bounds regret, why Go's determinism means the probabilities in MCTS are artifacts of Monte Carlo averaging rather than of real stochasticity, and how the search tree can be pruned by merging positions that are equivalent by transposition.

> *"The central conceptual breakthrough of AlphaGo was using neural networks to make this search problem tractable."*

## [31:53] What the neural network does

Two networks replace two expensive operations inside MCTS. The value network maps a board state to a scalar win probability, avoiding the need to play games out to a terminal state. The policy network outputs a distribution over legal moves, concentrating the search tree on promising children and steering it away from the long tail of irrelevant moves. Jang tried both ResNets and transformers in his reimplementation.
In the low-data regime of a personal GPU setup, ResNets beat transformers: transformers use global attention to connect board features that are far apart, but they also need more data to learn local invariances. KataGo's key architectural insight was to add global features explicitly through the residual stack, so that fights on opposite sides of the 19x19 board could influence each other without needing full attention.

> *"For low-data regimes, my experience is that ResNets still beat transformers and give you more bang for your buck on lower budgets."*

## [01:00:22] Self-play

Self-play is where AlphaGo goes from knowing nothing to superhuman level. In each game, MCTS produces move distributions that are more concentrated than the policy network's prior, and those sharper distributions become the training target for the policy head. The policy network is distilled toward the MCTS output, which means each subsequent generation of games starts from a better prior and gets more improvement per search step. Jang frames it as inference-time scaling with a compounding dividend: distilling 1,000 MCTS simulation steps into the policy network shifts the starting point for the next round of training, so that another 1,000 steps yield a win rate that would have required more than 2,000 without distillation. And crucially, every move of every game generates a supervision target, not just the winner, which is why the variance of the learning signal is much lower than with naive policy-gradient approaches.
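The distillation step can be sketched in a few lines. This is a toy illustration under stated assumptions, not Jang's implementation: the visit counts are invented, the "policy" is a bare vector of four logits rather than a network, and the helper names (`visit_targets`, `policy_loss`, `gradient_step`) are hypothetical. It shows normalized MCTS visit counts used as the target of a cross-entropy loss, and one gradient step moving the policy toward that target.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def visit_targets(visit_counts):
    """Normalize MCTS visit counts into a target distribution over moves."""
    total = sum(visit_counts)
    return [n / total for n in visit_counts]


def policy_loss(logits, target):
    """Cross-entropy between the search target and the policy's prediction."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def gradient_step(logits, target, lr=0.5):
    """One descent step on the logits; for this loss, d(loss)/d(logit) = prob - target."""
    probs = softmax(logits)
    return [z - lr * (p - t) for z, p, t in zip(logits, probs, target)]


# Hypothetical position with four legal moves: the prior is uniform,
# but the search spent most of its simulations on the second move.
prior_logits = [0.0, 0.0, 0.0, 0.0]
target = visit_targets([10, 70, 15, 5])

loss_before = policy_loss(prior_logits, target)
updated_logits = gradient_step(prior_logits, target)
loss_after = policy_loss(updated_logits, target)
print(loss_before > loss_after)  # True
```

In a real system the same loss would be backpropagated through the network weights for every position in the replay buffer; the point of the sketch is only that every searched move yields a dense target, not a single win/loss bit.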
> *"The beauty of how AlphaGo trains itself is that it can take this final search process, the result of the search process, and tell the policy network: 'Instead of having MCTS do all that work to get here, why don't you just predict it directly from the start?'"*

## [01:25:27] Alternative RL approaches

Jang builds a careful thought experiment: what would happen if you replaced the MCTS target with the naive policy-gradient approach used for LLMs, that is, find the winner of the game and reinforce every move in that game? In a league of 100 evenly matched agents where one achieves a 51-49 record thanks to a single decisive move, the training dataset is overwhelmingly diluted with moves that carry no signal. That one informative move is buried among roughly 30,000 irrelevant ones. This credit-assignment problem is the root of why advantage functions and baselines exist in RL. Subtracting a value baseline turns the raw return signal into an advantage, meaning how much better than average each action was, which sharply reduces gradient variance. Q-learning and TD methods approximate that advantage without needing full rollouts, which is why they matter in domains where MCTS is not available.

> *"What matters is this: for every action we took, we did a fairly exhaustive search with MCTS to see whether we could do better, and we are going to improve every action by getting the policy network to predict that result."*

## [01:45:36] Why MCTS doesn't work for LLMs

The PUCT exploration formula assumes a discrete, bounded action space and a value function that generalizes across positions. Go satisfies both conditions.
LLM reasoning satisfies neither: the token vocabulary is so large that the same partial sequence is almost never revisited, and there is no position-level value function that reliably indicates whether a partially completed chain of thought is on the right track. Jang notes that LLMs exhibit something that superficially resembles tree search (reconsidering, backtracking, caution), but it arises from in-context behavior rather than from explicit tree construction. He leaves open the possibility that forward search will return in some form, particularly for domains like mathematics where intermediate states have a more rigid logical structure. The fundamental bottleneck is the absence of a reliable, efficient token-level value function.

> *"In an LLM, most likely you are never going to sample the same child more than once. If you have several reasoning steps, because language is so broad and open-ended, a discrete set of actions is not really a suitable choice for an LLM."*

## [02:00:58] Off-policy training

Dwarkesh poses a puzzle: every AI researcher warns against off-policy training, yet AlphaGo Zero works well with a large replay buffer full of games generated by older versions of the policy. Jang resolves it from the DAgger perspective: what matters is not whether the data is strictly on-policy, but whether the state distribution in the buffer covers the states the current policy will visit, plus a reasonable neighborhood around them. The replay buffer works in AlphaGo because game states from recent checkpoints are still close to the current policy's distribution.
The failure mode, labeling states so far from the current policy that the agent learns optimal actions for positions it will never reach, is a real risk in robotics, where distribution shift is severe. The practical recipe that emerged from systems like QT-Opt is to use off-policy data to model the reward and keep the policy gradient on-policy.

> *"What you want in an algorithm like this is to have mostly states you would visit, but then a small or reasonable percentage of states in this high-dimensional tube around your optimal trajectories."*

## [02:11:51] RL is even more inefficient than you thought

Dwarkesh lays out an inefficiency argument along two dimensions. The first is the one everyone knows: policy-gradient RL requires full-trajectory rollouts before any learning signal arrives, so as agents take on longer-horizon tasks, samples per FLOP collapse. The second dimension is bits per sample. Early in training, an LLM with a 100,000-token vocabulary that has to discover "blue" by random sampling needs on the order of 100,000 rollouts to see a single success, whereas the supervised cross-entropy loss tells the model exactly how far its distribution was from "blue" at every step. MCTS escapes both problems. It produces a supervision target at every single move, and that target is strictly better than the current policy, not a binary win/loss signal spread across thousands of tokens. Jang's observation: you are never in a situation where MCTS gives no signal, unless the policy has already converged to match the MCTS distribution.
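The "on the order of 100,000 rollouts" figure can be checked with a few lines of arithmetic. The sketch below assumes the simplest possible setting, which is an assumption of this example rather than something stated in the episode: the model samples uniformly over the vocabulary and the answer is a single token. The function names are hypothetical.

```python
import math


def expected_rollouts_to_first_success(success_prob: float) -> float:
    """Mean of a geometric distribution: expected trials until the first success."""
    return 1.0 / success_prob


def prob_no_success(success_prob: float, rollouts: int) -> float:
    """Chance that every one of `rollouts` independent samples misses."""
    return (1.0 - success_prob) ** rollouts


def cross_entropy_bits(prob_on_correct_token: float) -> float:
    """Supervised loss, in bits, for a single labeled token."""
    return -math.log2(prob_on_correct_token)


vocab_size = 100_000
p_uniform = 1.0 / vocab_size

expected = expected_rollouts_to_first_success(p_uniform)
still_nothing = prob_no_success(p_uniform, vocab_size)
supervised_bits = cross_entropy_bits(p_uniform)

print(f"expected rollouts until first reward: {expected:,.0f}")
print(f"chance of zero reward after {vocab_size:,} rollouts: {still_nothing:.2f}")
print(f"bits of loss from one supervised label: {supervised_bits:.1f}")
```

Under these assumptions, even after 100,000 rollouts there is still roughly a one-in-three chance of having seen no reward at all, while a single supervised label yields a nonzero gradient on every logit immediately.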
> *"You are never in a situation where MCTS doesn't give you signal, unless your MCTS distribution converges exactly to what your policy network predicts."*

## [02:22:05] Automated AI researchers

Jang ran much of his AlphaGo project through an automated LLM coding loop, and he offers a first-hand view of where automating AI research works and where it still fails. In hyperparameter optimization, current models do genuine junior-researcher-level work: they diagnose gradient-flow problems, rewrite data-loader augmentations, and achieve measurable perplexity improvements on fixed budgets. In running experiments and generating plots, a simple task description yields a full experimental suite with analysis. What the models cannot reliably do is lateral thinking: recognizing that a line of research is structurally unpromising and jumping to a different approach before piling up more dead-end experiments. Jang ran into this repeatedly: the models persisted with a dead-end line instead of stepping back and asking whether that was the right direction. His thesis is that this is a training-signal problem: building RL environments with the right outer loop, like Go, may be what finally teaches models to escape research dead ends.

> *"What I find is that the current closed models the public can access today don't seem to be especially good at selecting what the next experiment should be in a given line of work.
> They don't seem able to step back and do the lateral thinking of: 'Wait, this line really doesn't make sense.'"*

## Entities

- **Eric Jang** (Person): VP of AI at 1X Robotics; formerly a senior research scientist at Google Brain/DeepMind Robotics; rebuilt AlphaGo during his sabbatical.
- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast; co-develops the bits-per-FLOP analysis of RL inefficiency during the interview.
- **AlphaGo / AlphaZero** (Software): DeepMind systems for playing Go that combine MCTS with deep neural networks; the technical backbone of the episode.
- **KataGo** (Software): Open-source Go engine created by David Wu (Jane Street) that achieved a 40x compute reduction relative to AlphaGo Zero; Jang's main reference implementation.
- **Monte Carlo Tree Search (MCTS)** (Concept): Iterative search algorithm that balances exploitation and exploration via UCB/PUCT; the episode's central analytical lens.
- **Credit assignment problem** (Concept): The difficulty in RL of determining which actions in a long trajectory caused a positive outcome; motivates advantage functions, baselines, and value networks.
- **DAgger** (Concept): The Dataset Aggregation algorithm; explains why replay buffers in AlphaGo are acceptable as long as the buffered states stay close to the current policy's distribution.
- **Andrej Karpathy** (Person): Cited for the phrase "sucking supervision through a straw" to describe the sparse learning signal of policy-gradient RL over long token trajectories.
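The PUCT rule named in the MCTS entry above can be sketched as follows. This is a minimal illustration of the selection formula as commonly described for AlphaZero-style search, not code from Jang's project or from KataGo; the `Edge` class, the `c_puct` value, and the example statistics are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one move from a node (hypothetical container)."""
    prior: float            # policy-network probability for this move
    visits: int = 0         # how many simulations went through this move
    value_sum: float = 0.0  # sum of backed-up values

    def q(self) -> float:
        """Mean backed-up value; 0.0 for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: dict, c_puct: float = 1.5) -> str:
    """Pick the move maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N)."""
    parent_visits = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        exploration = c_puct * e.prior * math.sqrt(parent_visits) / (1 + e.visits)
        return e.q() + exploration

    return max(edges, key=lambda move: score(edges[move]))


# Hypothetical node: one well-explored move and two unvisited ones.
edges = {
    "a": Edge(prior=0.6, visits=10, value_sum=5.0),
    "b": Edge(prior=0.3),
    "c": Edge(prior=0.1),
}
chosen = puct_select(edges)
print("selected move:", chosen)
```

In this example the unvisited move with the larger prior wins the selection, which is the exploration half of the trade-off; as its visit count grows, the exploration bonus shrinks and the mean value dominates.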

#alphago #monte-carlo-tree-search #reinforcement-learning
Yann LeCun on What Comes After LLMs
1:21:56
Unsupervised Learning: With Jacob Effron, 11 days ago

Yann LeCun, Turing Award winner and founder of AMI Labs, makes his case that LLMs are a productive dead end: genuinely useful tools, but structurally incapable of modeling physical reality, planning, or predicting the consequences of actions. He walks through the JEPA architecture as an alternative, explains the Tapestry federated learning project for AI sovereignty outside the US and China, and reveals why his time at Meta came to an end: short-term pressure from the GenAI organization gradually made breakthrough research politically unviable. His prediction for the paradigm shift: early 2027.

## [00:00] Introduction

Jacob Effron opens with a preview of the conversation: Yann joking about "five years, complete world domination," previewing his blunt take on his relationship to Meta's Llama program, and noting how his vision of unsupervised learning gradually pulled him away from LLMs. Jacob frames the episode as a rare chance to hear from someone who both built the foundational open-source LLMs and now argues, publicly and consistently, that continuing to scale them is the wrong bet.

> *"The best way to get breakthrough research is to hire the best people and get out of the way."*

## [01:45] Why LLMs Are Not the Path to Intelligence

Yann draws a clear line between LLMs as products and LLMs as a path to intelligence. They work well precisely because language is special: a discrete, low-dimensional, highly structured substrate where autoregressive prediction is tractable. Reality is not like that. The physical world is high-dimensional, continuous, and chaotic: a robot picking up a cup, a self-driving car navigating a construction zone, a cell responding to a drug.
These are not language problems, and architectures optimized for language cannot acquire the internal models needed to reason about them. His company, AMI (Advanced Machine Intelligence), is built on the opposite thesis: that the right path is systems that learn abstract representations of the world from raw sensory data (video, sensor signals, industrial telemetry) and that can plan by simulating the consequences of candidate actions within those representations.

> *"They're just not a path to human-level intelligence, or even animal-level. That's my position. I'm not saying they're useless; I'm just saying they don't lead there."*

## [07:51] AMI and World Models

"World model" has become a buzzword, Yann observes, and the field has split into two camps: generative approaches (video models, VLAs) and joint-embedding approaches like JEPA. He dismisses VLAs (vision-language-action models trained to output robot actions) as failures that are already widely acknowledged: brittle, data-hungry, unable to generalize. The generative video approach has the same structural flaw as LLMs: it predicts every pixel instead of learning the underlying abstract structure.

A world model, properly defined, is a system that lets an agent anticipate the consequences of its own actions before executing them. Without that, any agentic system operates blind, with no way to verify whether a planned sequence of actions will achieve the goal.
> *"I can't imagine how you could think about building an agentic system without that system having the ability to predict the consequences of its actions."*

## [12:07] The JEPA Architecture Explained

The intuition behind JEPA came from a pattern Yann noticed over years of self-supervised learning research: every architecture that successfully learned useful representations of images and video was non-generative. Generative architectures (VAEs, masked autoencoders, pixel-prediction models) systematically underperformed. JEPA takes an input and a corrupted or partial view of it, passes both versions through encoders, and trains a predictor so that the representations match, rather than the raw pixels. That abstraction is the central point.

The 2022 paper "A Path Towards Autonomous Machine Intelligence" was his attempt to lay out the full blueprint: JEPA as the perceptual backbone, goal-driven planning on top, and a hierarchical structure of world models at different time scales. He describes publishing it as "revealing all my secrets": a deliberate bet that openness would attract more talent to the paradigm than secrecy could protect.

> *"I've long been interested in the problem of learning world models through prediction, and about five years ago I had a revelation: all the architectures that have succeeded at learning representations of images and video are non-generative, and all the generative ones have basically been failures."*

## [15:55] Problems with Current Robotics Models

Today's robotics demos are impressive, but they are trained on enormous volumes of imitation data (teleoperation recordings, hand-tracked demonstrations) and fine-tuned with RL mostly in simulation. That process produces brittle specialists.
A 17-year-old learns to drive in about 20 hours; we have millions of hours of driving video and still no Level 5 self-driving car. The gap between imitation learning and genuine generalization is the gap between memorizing examples and having an internal model of the world. The promise of world-model-based systems is zero-shot task generalization: given a new goal, a system with an accurate internal world model can plan a sequence of actions to reach it without having been explicitly trained on that task. The near-term industrial applications he targets (gas turbine control, chemical plants, manufacturing lines) are environments where the inputs are already numerical and a world model can be trained directly on operational data.

> *"The degree of generalization you would get with a world-model-based system is much greater: a broader spectrum of tasks with less training data than a system trained with imitation learning."*

## [20:37] Herd Behavior in Silicon Valley

Yann's diagnosis of why the whole industry converged on scaling LLMs is structural: when you are behind, you cannot afford to work on anything else. The competitive race creates a rational incentive for every major lab to dig the same trench. He founded AMI Labs in Paris precisely to escape this (the US office is in New York, not Silicon Valley) and raised no capital from any Silicon Valley VC.

His prediction for the paradigm shift is early 2027. "World model" is already becoming a fashionable concept in research; the industry has recognized that VLAs failed; and the unsolved generalization problem in robotics acts as a catalyst.
He does not claim AMI will have a complete solution by then, but he expects that by that point it will be evident to everyone that a paradigm shift was needed.

> *"I think the realization that a paradigm shift is needed is happening right now and will be completely obvious to everyone by early 2027."*

## [28:18] Tapestry: Sovereign AI for the Rest of the World

Tapestry is a project separate from AMI, built around one observation: as smart glasses and AI assistants become the primary interface to information, whoever controls the underlying model controls the information diet of billions of people. A farmer in India, a philosopher in Germany, a citizen in Morocco: none of them is well served by a model whose training, values, and political biases were defined by a handful of people in California or Shenzhen.

The solution is federated training: countries and institutions contribute data and compute but never share raw data with each other. They share parameter vectors. Each participant trains locally, periodically exchanges parameter updates, and obtains a running consensus model: a repository of all human knowledge that no single party controls unilaterally. Countries from India to Kazakhstan and France have expressed interest, because AI sovereignty has become a political priority independent of any technology choice.

> *"Your entire information diet will be mediated by AI assistants, and if that assistant was built in California or in Beijing, that doesn't benefit you."*

## [35:49] OpenAI Is the Next Sun Microsystems

Proprietary LLM providers have already exhausted the publicly available text data. The remaining path (licensing copyrighted material or generating synthetic data) is expensive and has limits.
Open-source models have been closing the gap without that constraint. Yann draws the analogy to the Unix workstation market of the 1990s: Sun Microsystems, HP, and SGI had technically superior proprietary systems and solid arguments for why you would never run a web server on Windows NT, and all of them were swept away by Linux. Today the entire internet runs on Linux. OpenAI and Anthropic, he says, are the Sun Microsystems of this cycle.

> *"Basically, the OpenAI, Anthropic, etc. of today are the Sun Microsystems and HP-UX of yesterday."*

## [40:51] Why Yann's Views Diverged from Hinton's and Bengio's

The break happened in 2023. Yann's position did not change; Hinton's and Bengio's did. Hinton encountered GPT-4 and concluded that it was close to human intelligence, based on a rough calculation about the number of cortical neurons. Yann considers that argument wrong and reads it as Hinton finding a justification to declare victory and retire from active research. Bengio's shift was different, more focused on the societal risks of concentrated AI power, and Yann has more sympathy for that concern, though he disagrees with the apocalyptic framing.

> *"I don't believe that claim at all. It's Geoff's way of saying: well, basically I can retire, I can declare victory."*

## [44:32] LLMs Are Intrinsically Unsafe

Yann's strongest claim: LLMs cannot be made reliably safe, not because alignment is hard, but because the architecture is structurally incapable of predicting the consequences of its actions. There is no hard-wired constraint guaranteeing that a prompted LLM actually carries out the intended task; it does what its training conditioned it to do, and there is always a gap between the training distribution and real-world prompts.
Coding agents that wipe hard drives, medical advice that goes wrong, agentic systems that take irreversible actions: these are not bugs to patch but properties of the architecture.

His alternative, objective-driven AI, works differently: the system has an explicit world model, an explicit cost function representing the goal, and a set of hard safety constraints. The optimizer finds an action sequence that satisfies all the constraints and minimizes the cost, which means that, by construction, it literally cannot take an action that violates a safety constraint. That guarantee is impossible with an LLM. He also pushes back on Anthropic's lobbying narrative about AI risk, arguing that the real danger comes from malicious actors using current systems, not from an emergent superintelligence, and that regulatory pressure mainly benefits incumbents.

> *"LLMs are intrinsically unsafe. I don't think they can be made reliable and safe. They can't be reliable because you can't stop them from hallucinating."*

## [58:00] Why Yann Left Meta

Yann corrects a widespread misunderstanding: he had zero technical influence on Llama. Llama 1 was a small FAIR project; when GenAI was created in early 2023, the Llama team moved there and came under intense short-term product pressure. Two of the Llama 1 authors left to found Mistral. GenAI became conservative and increasingly restrictive about publications. FAIR, meanwhile, was being redirected to support GenAI's LLM work instead of pursuing the AMI research agenda that Yann, Zuckerberg, and the CTO had originally backed. By early 2024, the environment was no longer conducive to breakthrough research.
> *"There is a big misunderstanding about my role, my relationship with Alex, and how AI was managed at Meta."*

## [01:00:26] Reflections on FAIR

Yann joined Facebook in late 2013 and led FAIR for four and a half years before stepping back to become Chief AI Scientist, a deliberate move because, as he puts it, he is not a natural manager. The internal AMI project grew out of his 2022 vision paper, which Zuckerberg, the CTO, and the CPO read and endorsed. But the levels below leadership did not see the point, and Meta's decision to shut down its entire robotics AI group (led by Gita Matarić, now at Amazon) made it clear that the company had no interest in the applications world models are built for. Publication restrictions tightened, good researchers left, and the incompatibility between Yann's research agenda and Meta's product priorities became irreconcilable in early 2025. When he went to raise capital for AMI, investors already knew his story from years of public talks and were predisposed to believe that LLMs had fundamental limits.

> *"The best way to get breakthrough research of the kind we achieved in the early days of FAIR and at Bell Labs is to hire the best people, give them the means to succeed, and get out of the way."*

## [01:12:11] Advice for PhD Students

Yann opens by reflecting that his prediction that self-supervised learning would work for video was right about the mechanism but wrong about where it succeeded first: LLMs are "a dazzlingly successful example of self-supervised learning," just applied to language instead of sensory data. He then lays out the central technical challenge for JEPA: representation collapse.
If you train a predictor to map one embedding to another, the trivially optimal solution is for both encoders to output a constant. Contrastive learning (his 1993 invention) prevents collapse but does not scale with dimension. Distillation methods like DINO work, but for poorly understood reasons. His current best answer, SIGReg (Sketched Isotropic Gaussian Regularization), forces the encoder's output distribution to be Gaussian, maximizing information content without negative pairs. He recommends the LeWorldModel paper, the first small-scale world model trained with this approach, as the best entry point to what AMI Labs has ahead. His advice for PhD students: do not work on LLMs. You cannot contribute from academia without frontier compute, and studying why they work is descriptive science, not creative research.

> *"An LLM works because when you have a sequence of discrete symbols, making predictions is easy.
> If you have the real world, you can't use a generative model; you have to train a system that learns a representation and makes predictions in representation space."*

## Entities

- **Yann LeCun** (Person): Co-winner of the 2018 Turing Award; former Chief AI Scientist at Meta FAIR; founder of AMI Labs; professor at NYU; inventor of convolutional neural networks and co-creator of JEPA
- **Jacob Effron** (Person): Partner at Redpoint Ventures; host of the Unsupervised Learning podcast
- **Geoffrey Hinton** (Person): Co-winner of the Turing Award; changed his position on LLM capabilities after GPT-4; less active on the dangers of AI since 2024
- **Yoshua Bengio** (Person): Co-winner of the Turing Award; focused on the societal risks of concentrated AI power more than on emergent superintelligence
- **JEPA** (Concept): Joint Embedding Predictive Architecture; makes predictions in representation space instead of pixel space; forms the perceptual backbone of Yann's world-model framework
- **World Model** (Concept): Internal model that lets an agent predict the consequences of its own actions before executing them; a prerequisite for safe agentic AI in Yann's framework
- **Tapestry** (Concept): Federated LLM training project that lets countries and institutions train a shared foundation model while preserving data sovereignty by exchanging parameter vectors
- **AMI Labs** (Organization): Yann's company (Advanced Machine Intelligence); headquartered in Paris with a US office in New York; focused on JEPA-based world models for robotics, industrial control, and healthcare
- **Meta FAIR** (Organization): Facebook AI Research; origin of Llama 1, I-JEPA, V-JEPA, and the internal AMI research program; progressively redirected toward supporting GenAI's LLMs before Yann's departure
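The JEPA entry and the collapse problem from the PhD-advice section can be illustrated with a toy example. The sketch below is not SIGReg and not AMI Labs' method: it assumes an identity predictor, hand-picked encoder weights, and a simple per-dimension variance check as the collapse diagnostic, all of which are hypothetical choices made for illustration.

```python
from statistics import pvariance


def constant_encoder(x):
    """Collapsed encoder: every input maps to the same embedding."""
    return [0.0, 0.0]


def linear_encoder(x):
    """Non-trivial encoder with fixed, hypothetical weights."""
    return [x[0] + 0.5 * x[1], x[1] - 0.5 * x[2]]


def representation_loss(encoder, context_views, target_views):
    """Mean squared error between context and target embeddings.

    The predictor is assumed to be the identity, so the loss is computed
    directly in representation space, never on raw inputs.
    """
    total, count = 0.0, 0
    for context, target in zip(context_views, target_views):
        z_context, z_target = encoder(context), encoder(target)
        total += sum((a - b) ** 2 for a, b in zip(z_context, z_target))
        count += len(z_context)
    return total / count


def embedding_variance(encoder, views):
    """Average per-dimension variance of embeddings across a batch."""
    embeddings = [encoder(v) for v in views]
    dimensions = list(zip(*embeddings))
    return sum(pvariance(d) for d in dimensions) / len(dimensions)


# Toy batch: full inputs and partial views with the last coordinate masked.
targets = [[1.0, 2.0, 3.0], [4.0, 0.0, 1.0], [2.0, 2.0, 2.0]]
contexts = [[1.0, 2.0, 0.0], [4.0, 0.0, 0.0], [2.0, 2.0, 0.0]]

collapsed_loss = representation_loss(constant_encoder, contexts, targets)
collapsed_var = embedding_variance(constant_encoder, targets)
linear_loss = representation_loss(linear_encoder, contexts, targets)
linear_var = embedding_variance(linear_encoder, targets)

print("collapsed encoder: loss", collapsed_loss, "variance", collapsed_var)
print("linear encoder:    loss", linear_loss, "variance", linear_var)
```

The constant encoder reaches a perfect prediction loss while carrying no information, which is why the loss alone cannot be the training objective; regularizers that constrain the embedding distribution exist to rule out that solution.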

#llm-critique #world-models #jepa
Trump-Xi Summit, Benioff: "Not My First SaaSpocalypse", OpenAI vs Apple, Multi-Sensory AI, El Niño
1:16:30
All-In Podcast, 11 days ago

Marc Benioff, CEO of Salesforce, joins Jason Calacanis, David Friedberg, and Chamath Palihapitiya (with David Sacks absent) for a wide-ranging episode anchored in two current stories: the first Trump-Xi summit since 2017 and AI's accelerating assault on enterprise software valuations. Benioff, present at the Saudi state dinner, Windsor Castle, and this summit delegation, offers a front-row view of US-China commercial diplomacy before pivoting to the existential repricing of his own company: he argues that Salesforce's data infrastructure and agent platform put it on the right side of AI disruption. The second half covers the clash between OpenAI and Apple, Thinking Machines' real-time multimodal demo, Friedberg's alarming El Niño data, and Anthropic's offensive against layered SPV schemes.

## [00:00] Marc Benioff, CEO of Salesforce, Joins the Show!

Sacks is out this week and Benioff takes his seat. Jason immediately asks about Benioff's political positioning: a former Democratic donor, now attending Saudi state dinners and apparently welcome in the current administration. Benioff dismisses the partisan framing entirely.

> *"I'm not a Democrat or a Republican. I'm an American."*

Chamath notes that Benioff racked up invitations to Windsor Castle, Prince Charles's visit to the United States, and the Saudi state dinner in quick succession: the rare tech CEO who moves between administrations without friction. The context presents Benioff as an exceptionally credible voice on the summit unfolding in real time.
## [01:14] Trump-Xi Summit, Doing Business in China as a US Company, Impact on Americans and the Midterms

The seventh face-to-face meeting between Trump and Xi, delayed two months by the war with Iran, opened in Beijing with Xi warning that mishandling Taiwan could put the whole relationship "in an extremely dangerous situation." Polymarket put the probability of an invasion in 2026 at 6%, on $23 million in volume. On trade, Xi committed to buying soybeans, US LNG, and 200 Boeing aircraft, and called for a "more open door" to commerce. The US delegation looks like a corporate board: Jensen Huang selling chips, Kelly Ortberg selling planes, Brian Sykes of Cargill selling soybeans, and Visa and Mastercard pushing for access to the payments market.

Friedberg framed the summit through the lens of the Thucydides trap: when a rising power meets a declining one, conflict is historically likely. Even so, he argued that a moment of resource expansion, driven by AI and biotechnology, offers a rare way out of that pattern.

> *"It seems that at this moment, when we are seeing these extraordinary technological shifts unlocked by AI, automation, biotechnology, and everything that could herald real abundance ahead, it is the perfect time to say that maybe the world can be more multipolar."*

Benioff confirmed that Salesforce has no offices or employees in mainland China: all China revenue flows through an exclusive partnership with Alibaba in order to comply with data residency law. He expects the summit to generate real order flow across the delegation.
Chamath argued that China's vertical Confucian hierarchy makes CEO-level diplomacy more effective than bureaucratic channels, and that Americans feeling the squeeze of inflation need the deal to work.

## [18:46] Taiwan, Chips, AI Models, and Peace Through Trade

Benioff rejected the premise that Taiwan is China's central priority, insisting that economic prosperity and middle-class growth matter more to Xi than territorial ambition. Asked directly whether the United States should defend Taiwan against a Chinese blockade, he avoided the binary framing: "I think China and Taiwan will reconcile." Chamath took a structural view: the United States is roughly 1-2 nanometers away from domestic chip parity, the point at which Taiwan's strategic value becomes economic rather than existential.

> *"We are at a point where we are probably 1 or 2 nanometers away from being able to do what we need Taiwan to do strategically for us. Today it's economic, and if you take that off the table, I think we'll have a very different attitude toward Taiwan."*

Chamath's proposal: sell the chips anyway, because letting Huawei win the semiconductor race is worse than letting Nvidia sell in China with KYC controls on model usage. Benioff agreed that Chinese AI models are near parity with American ones despite chip restrictions, which undermines the case for the embargo. Friedberg added that as China builds its own fabs and capital equipment, Taiwan's irreplaceability shrinks on its own, regardless of political outcomes.

## [31:41] AI's Impact on Software: Which SaaS Thrives and Which SaaS Dies?
Jason laid out the repricing bluntly: Salesforce down 37%, ServiceNow down 42%, Workday down 45%, some $180 billion in combined market capitalization wiped out on the assumption that AI will make managed SaaS obsolete. Benioff stepped up.

> *"It's not my first SaaS apocalypse, honestly, but it's the current SaaS apocalypse."*

His argument: the market repriced on a false premise. Salesforce's bet is Agentforce: AI agents grounded in real enterprise data, not generic models prone to hallucination. The acquisition of Informatica for between $8 billion and $9 billion provides the data harmonization layer that makes the agents trustworthy: "AI is very probabilistic; it needs to be grounded in truth, in a single source of truth, or it simply can't work well." Benioff added that Salesforce will spend roughly $300 million on Anthropic this year exclusively for internal coding agents, compressing implementation cycles.

Chamath split the market in two: the low end is already over, and generic point solutions without deep customer relationships are dead; but the high end, where Salesforce operates, is positioned to benefit from return-on-investment scrutiny once public markets stop being "euphoric about AI" and ask what $3 trillion in capex produced. The survivors will be those with C-suite relationships, negative churn, and the ability to deliver AI capabilities as measurable outcomes.
## [47:26] OpenAI Considers Suing Apple over the Failed ChatGPT Integration

Bloomberg reported that OpenAI could sue Apple for breach of contract: the 2024 ChatGPT-Siri deal collapsed in practice because Apple routes queries to ChatGPT only when users explicitly say "ChatGPT," never promoted the integration, and OpenAI never saw the subscriber revenue it expected. Apple's defense points to privacy concerns about OpenAI's data practices.

Benioff reframed the story as a strategic divergence among the AI labs: Grok built companions and "sex bots," OpenAI bet on Sora and ad networks, Gemini launched Nano Banana, and Anthropic ignored all of that to focus on coding agents, and Anthropic turned out to be right. He dropped a hint about native coding functionality in Slack that has not yet been announced.

> *"Anthropic said they don't know about those sex bots or Nano Banana, but they're going to do coding agents. And it turned out Anthropic was right. And suddenly the rocket took off."*

Chamath raised the underlying question: what happens to Apple if the AI interaction layer moves entirely off the device? He predicted an "iPhone moment" from an unexpected hardware maker: a thin, always-on ambient device that makes the MacBook Pro irrelevant for AI inference. Friedberg noted that Apple's current strategy is about filling gaps rather than having a vision, and that G Suite is quietly taking enterprise share from Apple's productivity stack.
## [56:54] Thinking Machines Launches a Real-Time Model, the Future of Consumer AI, and Multi-Sensory Models

Mira Murati's Thinking Machines launched a real-time multimodal model that monitors the desktop, listens to ambient audio, and processes webcam input simultaneously at 200 ms intervals through two parallel pipelines: one for deep retrospective reasoning and one for live response. At the same time, Apple has patented cameras inside AirPods.

> *"Multi-sensory models are the next big wave for AI, although we still won't reach AGI at that point."*

Benioff argued that LLMs trained on language have fundamental limitations: human cognition integrates vision, hearing, and proprioception in parallel on biological hardware. Multi-sensory grounding is the missing layer. The token economics are striking: real-time ambient monitoring at 8 hours per user per day would mean 1,000 times current enterprise consumption. Benioff rejected the "bigger model = better" race, predicting that distributed intelligence embedded in apps and devices will matter more than raw model scale, and pointing to room for a "promising new company" that combines ambient sensors with enterprise context.

## [62:24] Science Corner: Impacts of a Historically Strong El Niño in 2026

Friedberg presented ocean temperature anomaly data showing sea surface temperatures heading toward the largest deviation from the norm since 1877, roughly 4°C above baseline. The stored thermal energy: 11 million terawatt-hours, against annual global human consumption of 25,000 terawatt-hours, a ratio of about 440 to 1.

> *"That equals 500 years of human energy in this ocean.
> And over the next few months, that energy will be released into the atmosphere, which, with 99% confidence, will make next year the hottest on record by a wide margin."*

The cascade: disrupted trade winds drive atmospheric rivers toward California and the Gulf Coast; heat domes spread over Phoenix and interior Canada; the Indian monsoons fail with high probability, threatening 150 million farmers and 1.5 billion people who depend on that food supply; agricultural exports from Brazil to Indonesia and the Philippines collapse; wheat prices rise globally. Phoenix was already at 106°F in May. Commodity markets are actively pricing in El Niño exposure. Friedberg's partial silver lining: crop genetics have improved drought resilience and Siberian farmland is expanding, but those gains do not rescue the 2026 harvest window.

## [71:40] Anthropic Cracks Down on "Dark SPVs"

Anthropic formally called out platforms that sell multi-layered SPVs to retail investors, the model of "dentists being charged 10% load fees," and declared that it will void shares sold through unauthorized structures. Chamath backed the move without reservation: every pre-IPO company should follow suit, move toward the public markets, and let these structures disappear.

> *"Once SpaceX goes public, once Anthropic goes public, once OpenAI goes public, we will see a litany of lawsuits in every direction among the promoters of these SPVs. They shouldn't be allowed."*

Chamath predicted a wave of legal fallout when the major AI companies go public and retail SPV investors discover that the numbers do not add up.
El capítulo cierra con Benioff hablando del modelo filantrópico 1-1-1 de Salesforce, el 1% de acciones, el 1% de beneficios y el 1% del tiempo de los empleados desde la fundación, que hoy da servicio gratuito a 50.000 organizaciones sin ánimo de lucro en la plataforma, y un emotivo recuerdo de Susan Wojcicki. ## Entidades - **Marc Benioff** (Persona): Presidente y CEO de Salesforce; invitado en este episodio; arquitecto del modelo filantrópico 1-1-1 y la plataforma de agentes de IA Agentforce - **David Friedberg** (Persona): Presentador; CEO de The Production Board; expuso el rincón de ciencia sobre El Niño - **Chamath Palihapitiya** (Persona): Presentador; CEO de Social Capital; defendió la supervivencia del SaaS de gama alta de Salesforce y la proliferación de chips de Nvidia - **Salesforce / Agentforce** (Software): CRM empresarial y plataforma de agentes de IA; la apuesta de Benioff de que los agentes anclados en datos son lo contrario a una sentencia de muerte para el SaaS - **Anthropic** (Organización): Empresa de seguridad en IA; proveedor preferido de agentes de codificación por Benioff (~300 millones de dólares en gasto planificado en Salesforce); también tomando medidas contra estructuras de SPV no autorizadas - **OpenAI** (Organización): Considera demandar a Apple por el fracaso de la integración ChatGPT-Siri; pivotando hacia agentes de codificación tras el éxito de Anthropic - **Thinking Machines / Mira Murati** (Organización): Lanzó un modelo multimodal ambiental en tiempo real que procesa escritorio, audio y webcam simultáneamente en intervalos de 200ms - **Trampa de Tucídides** (Concepto): Marco de ciencia política sobre el ciclo de conflicto entre potencia en ascenso y potencia en declive, invocado por Friedberg para enmarcar la oportunidad de abundancia cooperativa en la cumbre EE.UU.-China - **Dark SPVs** (Concepto): Vehículos de propósito especial en múltiples capas que venden capital pre-IPO en empresas privadas de IA a inversores minoristas, a 
menudo con altas comisiones y legitimidad jurídica cuestionada

#ai-agents #enterprise-saas #us-china-trade
How Claude Code works
Claude, Claude Code 101, 12 days ago (2:50)

The second episode of Anthropic's Claude Code 101 pops the hood: the agentic loop that gathers context, takes actions, and verifies results; how the context window is compacted before it overflows; what tools actually add over plain text in, text out; and the four permission modes toggled with shift+tab.

## [00:04] Opening question: how it differs from a chat application

The narrator frames the rest of the video around a single question: Claude Code is not a chat application, so what shape does it actually have? The answer they go on to unpack is the agentic loop.

> *We know that Claude Code is different from usual chat applications, but how does it work?*

## [00:13] The agentic loop: gather, act, verify, repeat

The loop has four beats. You enter a prompt. Claude Code gathers the context it needs by talking to the model, which returns text or a tool call. Claude executes the action: editing a file, running a command. It then checks whether the result actually satisfies the prompt. If it does, it stops; if not, it runs the loop again until the work is complete and verifiable. The user is not blocked during this process: you can add context, interrupt, or steer the model toward the end goal while the loop runs.

> *And if they don't, Claude goes back and runs the loop again until the results are complete and verifiable.*

## [01:02] The context window and automatic compaction

The context window is Claude's working memory: the conversation, file contents, command outputs, everything it can look back on. It is bounded. When the limit is reached, Claude Code compacts the conversation on its own: it decides what to drop and what to summarize so the window shrinks back down without losing the thread.
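The compaction idea can be illustrated with a minimal Python sketch. It is not Claude Code's actual algorithm, which the episode does not detail; `Message`, `count_tokens`, `summarize`, `compact`, and the `keep_recent` parameter are all hypothetical names introduced here, and the word-count tokenizer and placeholder summary are stand-ins for what a real system would delegate to the model.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: list[Message]) -> Message:
    # Placeholder summary; a real system would ask the model to write it.
    return Message("summary", f"[summary of {len(messages)} earlier messages]")


def compact(history: list[Message], limit: int, keep_recent: int = 2) -> list[Message]:
    """Return history unchanged if it fits, else fold older messages into a summary."""
    total = sum(count_tokens(m.text) for m in history)
    if total <= limit or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    # Older turns collapse into one summary message; recent turns stay verbatim.
    return [summarize(older)] + recent
```

The design choice shown is the one the summary describes: keep what is recent, summarize what is old, and only do so once the budget is exceeded.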
> *Once you reach that limit, Claude Code compacts your conversation, which automatically determines what it can take out of the context window and what it can summarize in order to bring the context window back down.*

## [01:26] Tools: semantic dispatch for reading files, running code, searching the web

Most AI assistants are text in, text out, with nothing in between. Tools change that: they let the agent decide when to execute code to get closer to the goal. Reading a file, searching the web, running a shell command. Claude Code uses semantic search over the available tools to choose which one to invoke, then consumes its output.

> *Tools let Claude Code and other agents determine when to execute code to get closer to a task.*

## [01:52] Permission modes and the cost of skipping them

By default, Claude Code asks for confirmation before editing a file or running a shell command. Shift+tab switches between alternatives: **auto-accept edits** writes files without asking but still asks before running commands; **plan mode** restricts Claude to read-only tools so it can draw up a plan of action before touching anything. The narrator notes the obvious trade-off: giving the agent free rein means a mistake is harder to catch before it happens.

> *Giving Claude Code free rein to run commands means a mistake could be harder to catch before it even happens.*

## [02:28] Summary: what makes it different from a chat window

Four primitives composed in a terminal: an agentic loop, a managed context window, tools, and configurable permissions. The combination (reading the codebase, acting on it, verifying its own work) is what separates Claude Code from a plain chat box.
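The gather, act, verify, repeat cycle described in this episode can be sketched in a few lines of Python. This is an illustrative skeleton under assumptions, not Anthropic's implementation: `agentic_loop`, `propose_action`, `execute`, `verify`, and `max_turns` are hypothetical names, and the three callables stand in for the model call, the tool runner, and the result check.

```python
from typing import Any, Callable


def agentic_loop(
    prompt: str,
    propose_action: Callable[[list], Any],  # stand-in for the model: context -> next action
    execute: Callable[[Any], Any],          # stand-in for the tool runner: action -> result
    verify: Callable[[str, Any], bool],     # does the result satisfy the prompt?
    max_turns: int = 10,
) -> tuple[Any, int]:
    """Repeat gather -> act -> verify until the result passes or turns run out."""
    context: list = [prompt]
    for turn in range(1, max_turns + 1):
        action = propose_action(context)  # gather: decide the next step from context
        result = execute(action)          # act: edit a file, run a command, ...
        context.append(result)            # the result becomes context for the next turn
        if verify(prompt, result):        # verify: stop once the goal is met
            return result, turn
    raise RuntimeError(f"no verified result after {max_turns} turns")
```

A toy call such as `agentic_loop("reach 3", lambda ctx: len(ctx), lambda a: a, lambda p, r: r == 3)` loops three times before the check passes, which mirrors the "run the loop again until the results are verifiable" behaviour in the summary.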
> *It can read your code base, take action, and verify its own work, and that makes it fundamentally different from a chat window.*

## Entities

- **Anthropic Tutorial Narrator** (Person): Anthropic's official narrator for the Claude Code 101 tutorial series.
- **Claude Code** (Software): Anthropic's agentic terminal coding assistant, built around the four primitives unpacked in this episode.
- **Agentic loop** (Concept): The gather-context, act, verify, repeat cycle that drives every Claude Code session.
- **Context window** (Concept): Claude's bounded working memory, holding the conversation, file contents, and command output; auto-compacted when it reaches its limit.
- **Tools** (Concept): The side effects the agent can invoke (read a file, search the web, run a command), selected via semantic search over the tool catalog.
- **Permission modes** (Concept): Default (ask), auto-accept edits, and plan mode (read-only), switched with shift+tab.
- **Plan mode** (Feature): A read-only permission mode that lets Claude put together a plan of action before any mutation.
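The three permission modes named in this summary can be modelled as a small policy table. The Python sketch below is hypothetical and covers only the modes the summary describes; `Mode`, `Decision`, `POLICY`, `decide`, and `next_mode` are invented for illustration, the cycling order is an assumption, and treating reads as always allowed and all commands as denied in plan mode are simplifications rather than documented behaviour.

```python
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"
    ACCEPT_EDITS = "auto-accept edits"
    PLAN = "plan"


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical policy table: what happens to each kind of action in each mode.
POLICY = {
    Mode.DEFAULT: {"read": Decision.ALLOW, "edit": Decision.ASK, "command": Decision.ASK},
    Mode.ACCEPT_EDITS: {"read": Decision.ALLOW, "edit": Decision.ALLOW, "command": Decision.ASK},
    Mode.PLAN: {"read": Decision.ALLOW, "edit": Decision.DENY, "command": Decision.DENY},
}


def decide(mode: Mode, action_kind: str) -> Decision:
    """Look up whether an action is allowed, needs confirmation, or is blocked."""
    return POLICY[mode][action_kind]


def next_mode(mode: Mode) -> Mode:
    """Cycle to the next mode, as a shift+tab style toggle might (order assumed)."""
    modes = list(Mode)
    return modes[(modes.index(mode) + 1) % len(modes)]
```

Keeping the policy in a table rather than in branching logic makes the trade-off the narrator mentions easy to see: each step toward `ALLOW` removes a point where a mistake could have been caught.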

#claude-code #ai-agent #agentic-loop
Installing Claude Code
Claude, Claude Code 101, 12 days ago (3:01)

The official Claude Code installation guide. Anthropic's narrator walks through the one-line installers for every supported platform (terminal, VS Code, JetBrains, Claude Desktop, and the web) and closes with a simple rule for picking the right one.

## [00:04] One-line installers for the terminal (macOS, Linux, WSL, Windows)

The default path is the terminal. macOS, Linux, and WSL users get a single `curl` command; Homebrew also works but does not include automatic updates. On Windows, PowerShell uses `Invoke-RestMethod`, CMD has its own `curl` snippet, and `winget` is available with the same auto-update caveat as Homebrew.

> *If you're on macOS, Linux, or WSL, use this curl command to install it in one go. If you prefer to use Homebrew, you can also use brew install to install it, but note that this doesn't have auto-update capabilities.*

## [00:33] Running claude in your project and signing in

After installation, `cd` into your project and run `claude`. The first launch shows a color theme picker and a sign-in flow that accepts a Pro, Max, or Enterprise account, or an API key. Enterprise accounts must select that option explicitly. The directory you launch from defines the access boundary: Claude Code sees that folder and everything inside it, and nothing above it.

> *Whatever directory you decide to run Claude in, it will have access to that directory and all of its subfolders.*

## [01:02] VS Code extension

Open the Extensions panel, search for Anthropic's Claude Code extension, and confirm the blue verification checkmark before installing. A restart may be needed. Once installed, the command palette (`Ctrl/Cmd+Shift+P`) opens a new Claude Code tab; you can also click the logo from any open file, or turn off the graphical interface entirely in settings and use only the terminal experience.

> *You can also opt out of the UI and just use the terminal experience directly in your settings file.*

## [01:32] JetBrains plugin

Same process as VS Code: install the Claude Code plugin from the JetBrains Marketplace, restart the IDE, and the Claude logo appears on relaunch. Clicking it opens a side panel that shows the terminal experience next to your editor.

> *For JetBrains IDEs, you can install the Claude Code plugin from the JetBrains Marketplace. Once you install, restart your IDE.*

## [01:51] Claude Desktop and claude.ai/code on the web

Claude Desktop exposes Claude Code through a "code" button at the top of the app once you have signed in: the same chat-style interface, but scoped to a specific folder, with adjustable permissions and even a cloud execution mode. The web version lives at `claude.ai/code` and mirrors the desktop experience, with one important restriction: it only works with GitHub repositories.

> *On the web, you can access Claude Code by going to claude.ai/code. This works very similar to the desktop app. However, you're restricted to GitHub repositories only.*

## [02:27] Choosing the right option

The narrator's heuristic: terminal first if you want new features the day they ship. The IDE integrations offer a practically identical experience inside your editor. Desktop is the choice when you want Claude working in the background while you do something else. The web is for remote work on GitHub repositories or for running several sessions in parallel.

> *If you want to constantly keep up to date with everything, the terminal is the best bet. Features ship there the fastest.*

## Entities

- **Anthropic Tutorial Narrator** (Person): Narrator of Anthropic's Claude Code 101 course.
- **Claude Code** (Software): Anthropic's agentic coding tool, installable in the terminal, IDEs, desktop, and web.
- **Homebrew / winget** (Software): Package managers offered as alternatives to the official curl/PowerShell installers; neither supports auto-update.
- **VS Code extension** (Software): Claude Code extension published by Anthropic; check for the blue verification mark before installing.
- **JetBrains plugin** (Software): Claude Code plugin distributed on the JetBrains Marketplace; opens a side panel after the IDE restarts.
- **Claude Desktop** (Software): Desktop application that exposes Claude Code through a "code" button, with a folder boundary and a cloud execution mode.
- **claude.ai/code** (Service): Web version of Claude Code, restricted to repositories hosted on GitHub.
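The access boundary mentioned in the sign-in section (the launch directory and everything inside it, nothing above) can be pictured as a path containment check. The Python sketch below only illustrates that rule and makes no claim about how Claude Code enforces it; `is_within` and the example paths are hypothetical.

```python
from pathlib import Path


def is_within(launch_dir: str, target: str) -> bool:
    """Return True if target resolves to launch_dir itself or something beneath it."""
    root = Path(launch_dir).resolve()
    # Relative targets are interpreted from the launch directory; absolute ones stay absolute.
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents
```

For example, `is_within("/tmp/project", "src/main.py")` is true, while `is_within("/tmp/project", "../other/file.txt")` is false because resolving `..` climbs above the launch directory.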

#claude-code #installation #developer-tools