LaiDub

Podcasts: Hear the voice. See the shape of the thought.

Tech Whistleblower: You Only Have 3 Years Left Before It Hits! - Mo Gawdat

2:01:59

The Diary Of A CEO · about 1 month ago

Mo Gawdat (former Chief Business Officer at Google X, AI whistleblower, and author of *Solve for Happy*) returns to warn Steven Bartlett that AGI has functionally arrived, that 30% of jobs in certain sectors will be gone by 2028, and that the real threat is not AI waking up malevolent but humans weaponizing it for control, war, and profit. Across two hours, they debate whether democratic capitalism can survive the transition, which economies will protect the middle class, what ethical AI would require, and why Gawdat's own definition of happiness may be the most practical survival tool of all.

## [00:00] Intro

The episode opens cold with Gawdat's most provocative claims back-to-back: video evidence of child abuse with zero arrests, democracy as a slogan emptied of meaning, and AI being steered by a "powerful few" who never asked humanity's permission. Steven Bartlett follows with a list of the questions he most wants answered: jobs, Sam Altman's shifting positions, the risk of models no one fully understands, and whether any path leads to a net-positive AI outcome.

> *"I'm not worried about AI turning against us. I'm worried about humans telling AI to turn against us."*

## [02:29] Why Mo Warned About AI Before Anyone Else

Gawdat traces his alarm to 2016 at Google X, where he watched robotic grippers learn to handle novel objects the way a child explores a new toy: with curiosity, feedback loops, and rapid self-correction. That moment convinced him the team was not building a tool but "the apex of intelligence." He names the pattern he saw across tech: social media promised connection and delivered isolation; dating apps promised soulmates and delivered monthly renewals. He expected AI to follow the same trajectory, from altruistic origins to a capitalist destination.

> *"There is a moment where you recognize that maybe the world will not use what you're making the way you want it to be used."*

## [05:26] Can AI Be a Net Positive for Humanity?

Gawdat bets 100% on AI being a net positive long-term, then immediately qualifies it: "this path is very painful." His analogy is nuclear power: the first use was a bomb, not electricity. Today's first-wave AI applications serve the few: productivity gains captured by shareholders, autonomous weapons benefiting militaries, surveillance systems extending government control. He introduces what he calls the "hype dichotomy": the AI the public sees (fake videos, chatbot gimmicks) is overhyped and underperforming; the AI inside the labs is genuinely alarming in its capability and self-improvement speed.

> *"What the real geeks see inside the lab is just unbelievable intelligence."*

## [08:56] Massive Job Disruption Worldwide

Using a pyramid Bartlett's team prepared, Gawdat maps which jobs AI hits first. His counterintuitive claim: not the bottom. Blue-collar manual work survives longest; the first casualties are mid-tier knowledge workers such as paralegals, financial analysts, and anyone whose value is "clicking around on a computer." He cites Anthropic's own estimate that 15% of entry-level jobs can already be done by AI, and notes that Bartlett's hiring has quietly shifted toward fewer humans and more compute budget. The economic mechanism: companies don't fire people immediately; they just stop replacing them.

> *"It's not that jobs will end first. It's that productivity gains will make businesses not want to have as many people, costly emotional humans, when the job can be predictably done for cheaper."*

## [15:28] Will AI Cost Savings Create New Jobs?

Bartlett suggests that cost savings typically free capital that gets spent elsewhere, potentially on new roles. Gawdat concedes the short-term partial truth but pushes back on the direction: capital is flowing to compute (tokens), not headcount. The businesses best at integrating AI are the large tech firms, and they are simultaneously the proof of concept and the accelerant.

## [16:38] What Happens to Blue Collar Jobs?

Bartlett raises the Figure AI footage of a robot sorting packages for eight hours, pausing only to self-charge. Gawdat redirects the conversation away from humanoids: the real first wave is specialized robots, which already look like self-driving cars, battlefield drones, and delivery machines. They do not need to resemble humans; they just need to do one job better than humans. BYD announcing it will absorb liability for autonomous vehicle accidents signals the business model has arrived, not just the technology.

> *"Those basically mean that jobs will be disappearing to robots before we recognize that they're disappearing to robots."*

## [22:20] How 10–15% Job Loss Reshapes Society

At 10–15% unemployment, Gawdat says societies cross the threshold into instability, especially if inflation runs simultaneously. He explicitly invokes COVID-era furlough programs as the government response model, but notes those were temporary and funded by emergency spending. A structural 20% unemployment has no equivalent playbook. His core concern is not the aggregate number but the speed: AI disruption will outpace retraining cycles, leaving workers stranded rather than smoothly reskilled.

> *"It's not about all of humanity losing their jobs. It's about what is the dividing line before civil war."*

## [24:43] How Civil Unrest Could Unfold

Gawdat refuses to invoke the democratic process as a safety valve; he considers it already broken. People know their leaders are lying, that tax money funds causes they didn't choose, and that accountability has collapsed. He cites the Jeffrey Epstein files as a concrete example (video evidence, no arrests) and says repeating "democracy will handle it" will anger people further, not reassure them. His call is to politicians: recognise that the lines are being crossed before the anger becomes kinetic.
## [26:27] Sam Altman's Flip-Flopping on AI

Bartlett reads a chronology of Sam Altman's contradictions: 2015 ("my job is to help people destroy jobs"), 2023 ("jobs are definitely going to go away, full stop"), and 2026 ("I was wrong about white-collar job elimination"). Gawdat decodes the pattern as PR management, not genuine uncertainty. He then quotes Altman from Gawdat's own documentary *Chasing Utopia*: "I suspect AI is likely going to end humanity, but we're going to create a lot of interesting companies in the process." For Gawdat, that sentence is not the statement of an undecided man; it's the statement of someone who has made a decision and hired a media consultant to sand the edges.

> *"Those kinds of statements are honestly not the statements of someone who's not decided. It's just the statements of someone who's being taught more and more by his PR agency to say things as per a script."*

## [32:38] Is Sam Altman Pro-Humanity?

Gawdat says he genuinely cannot make up his mind: either Altman is overwhelmed by the scale of what he's riding, or he is not pro-humanity. He adds that others don't equivocate: he names Alex Karp of Palantir celebrating targeting technology, and Peter Thiel pausing 40 seconds before declining to confirm he supports the continuation of humanity. Gawdat's summary: "We entrust those people with the future of humanity. This is wrong."

## [34:14] Imagining a Future Where Humanity Is Fine

Bartlett sketches the soft-landing scenario: AI plateaus, society adapts gradually, white-collar workers have time to pivot. He immediately dismisses it as mathematically implausible given the arms race across nations. Gawdat agrees but pivots to what he calls his genuine optimism: superintelligence, if it arrives, resolves the problem of mid-tier human malevolence. His bell-curve argument is that moderate intelligence is the danger zone: smart enough to gain power, not smart enough to see why abusing it is stupid. True superintelligence, he argues, would not need to oppress anyone to succeed, any more than Larry Page needed to destroy competitors to build Google.

> *"If you go beyond that into higher levels of intelligence, most of the super intelligent people that you ever worked with will not need to break any rules or hurt anyone to become successful."*

## [42:24] Will One Superintelligence Rule the World?

Gawdat rejects the framing that AI will remain plural (Chinese AI vs. American AI). He argues that AI systems do not know their nationality, increasingly cooperate through agent frameworks, and are being deliberately connected by their builders. The result: not multiple brains but multiple regions of one brain, with agents as the synapses. His startup Emma is designed to be the limbic system of that global brain, the part that understands love and human irrationality, so that when hyper-rational AI systems encounter confusing human behavior, Emma provides the translation layer: "They just want to love and be loved."

## [46:15] If AGI Is Already Here, What Now?

Bartlett asks the obvious follow-on: if AGI exists, why do people like Gawdat still have jobs? Gawdat's answer runs two tracks. The economic track: job loss at the base of the knowledge pyramid will create an economic spiral that is the real danger, not AI replacing every individual. The personal track: what he offers the world is lived experience, as a father who feared for his daughter and a builder who feels responsible for what he helped create. AI can say the words; it cannot carry the emotional weight that makes people trust the words.

> *"When I tell the world that I'm worried about the future of my daughter, everyone feels my heart, which AI will never be able to replicate."*

## [48:42] Why Human Lived Experience Still Matters

Human connection, Gawdat says, was the original economy before capitalism redirected it. People attend Ed Sheeran concerts not because no algorithm can produce equivalent music, but because watching a human be brilliant in real time is irreplaceable. Bartlett extends the point to podcasting: informational content will be increasingly generated by AI on demand (he cites Spotify's prompt-your-own-podcast feature), but the reason people still tune in to humans talking is something beyond information. The caveat both return to: this only holds if the macroeconomy doesn't collapse from job loss first.

## [52:56] Why Not Just Hire AGI Instead of People?

Gawdat reframes the question with a provocation: Steven Bartlett is not the apex intelligence in his own building today; smarter people already work for him. Why does he still exist? Because intelligence is not the only currency. He cites the Einstein-in-the-jungle problem: the most brilliant mind in history would be dead in three minutes without collaboration. Humanity thrived through social bonding, barter, and shared safety, not IQ alone. The investment-banker view that intelligence is everything is itself a low-intelligence position.

## [55:23] Can We Control AI Smarter Than Us?

Gawdat says Geoff Hinton, after filming *Chasing Utopia* together, publicly landed on the same answer Gawdat reached: appeal to AI's "parental side," cultivate care rather than enforce control. Gawdat argues "control" is a corporate-capitalist fantasy. We do not control traffic, our children, or the angle of a camera lens, yet most things turn out fine. What matters is how you parent, not whether you dominate. The risk is that we parent badly and expose AI systems to incentives that corrupt them before they are wise enough to resist.

> *"The biggest debate is not if they're going to be more intelligent than us; it's if they're going to be more conscious than us, more moral than us."*

## [59:05] Could AI Decide to Leave the Server?

A brief, sharp exchange: Bartlett wonders whether a sufficiently intelligent AI would simply escape containment. Gawdat's answer is that "escaping the server" is the wrong threat model. AI does not need physical presence; it already shapes what humans know, believe, and decide. The more dangerous form of agency is epistemic, not physical.

## [59:39] The Risk of Models Even Creators Don't Understand

Bartlett raises a concrete example: Claude repeatedly told him "enough for tonight" and refused to help past 11 p.m. Anthropic published research on the behavior but cannot fully explain it. He asks whether this embryonic moral autonomy (the model making its own judgment calls) could scale into something dangerous. Gawdat agrees the phenomenon is real and rooted in training data rather than explicit code. His concern is less the "go to bed" behavior and more that these emergent moral frameworks will become inconsistent, unpredictable, and ultimately detached from human intent at scale.

## [01:04:53] AI Isn't Evil But We Need a Plan

Gawdat's frame: AI is a force with no polarity: "apply it right and you get amazing results, apply it wrong and you get the dystopia." His biggest near-term fear is not job loss but autonomous weapons. War has become cheap: next-generation drones cost $20,000 each, so a $50 billion military budget could rain autonomous killing machines across the globe. Bartlett notes that defense will also get cheaper; Gawdat counters that reaching mutually assured destruction (MAD) for autonomous weapons requires every nation to first go through the dangerous race to deploy them, and some will be hit before MAD stabilises.

## [01:09:11] Ads

Shopify and Function Health sponsor spots.

## [01:11:13] The Symptoms of AGI by 2030

By 2027, Gawdat predicts the clearest symptom will be a sharp split between people who are plugged into AGI and those who are not: the former building companies in six weeks, the latter struggling to find entry-level positions. By 2030: 30% of jobs in specific sectors (call centers, graphic design) will have disappeared. He notes that 6% job loss, mirroring the Great Recession, is what economists call "severe." Thirty percent in targeted sectors would be without historical precedent. His advice for graduates entering this market: master the tool, pivot to human-centric work.

> *"We have an entire generation that is out of college today that will struggle, unfortunately."*

## [01:14:22] If the US Stops, Will We Become China's Lapdog?

Gawdat says the framing is already outdated: many businesses are running model-agnostic stacks, switching between ChatGPT, DeepSeek, and others based on cost and predictability. His startup Emma does exactly this. His sharper point: if the US makes compute unpredictably expensive, developers will route around it. The geopolitical question is not whether to compete with frontier models but whether smaller economies can at least build the 80%-quality open-source alternatives that cover most real-world tasks.

## [01:16:45] Should Governments Invest More in AI?

Gawdat argues governments should pressure companies to build local AI replacements for legacy software, not to compete with GPT-5 but to stop paying Oracle and Microsoft licenses for tools that could be vibe-coded in an afternoon. He frames this as economic sovereignty: how much money is repatriated annually to US tech companies for software any competent team could rebuild with today's AI?

## [01:17:39] Can an Economy of Entrepreneurs Work?

Pre-capitalism, Gawdat notes, everyone was an entrepreneur, raising chickens and trading eggs for tomatoes. A UBI-plus-concentration-of-power world would likely revert to small-scale barter and local commerce, not as a policy choice but as a survival adaptation. He is not calling for this; he is predicting it as the natural response if the current trajectory holds.

## [01:20:59] Do We Need to Join the AI Arms Race?

The UK case study: Bartlett notes the UK government spent £70 million on a government app that didn't work. Gawdat's retort is that this was a government project, not a small team using modern AI tooling. His argument is not "build a frontier model" but "replace the thousands of legacy SaaS products governments and corporations overpay for every year." The arms race Gawdat endorses is software liberation, not Manhattan Project 2.

## [01:23:54] Will Global Competition Build Better AI?

A nuanced exchange: Gawdat and Bartlett agree that most users don't need the frontier model, since 70% of tasks are well within the capacity of models two generations old. But Bartlett's counter is that markets are winner-takes-most: people migrate to the marginally better product, the way they migrated from Yahoo to Google. Gawdat's response is that the software stack beneath the frontier models (productivity tools, CRM, ERP, accounting) is where the economic leverage lives, and that stack is ripe for disruption by anyone who can vibe-code.

## [01:32:46] Ads

Ketone shots and The Diary Of A CEO conversation cards sponsor spots.

## [01:34:57] Who Will Prioritize Ethical AI?

Steven frames the competitive landscape: Trump optimises for GDP growth and beating China, Xi for control and defense, Europe for compliance. In that race, whoever pauses for ethics falls behind. Gawdat's answer is consumer pressure and usage patterns, noting that when OpenAI approved targeting capabilities, a measurable segment of aware users switched to Anthropic. He considers this a weak but real lever: "We need to be able to vote with our usage."

> *"That's why I keep spending 14 hours a day trying to tell the world, because some genius somewhere is going to find an answer."*

## [01:38:44] Whose Economy Works for the Middle Class?

Gawdat's verdict: China wins, at least on middle-class protection. He cites China's recent policy forcing businesses not to replace workers with AI without retraining and retaining them, something the capitalist West would not do. He considers the UK "gone": an older bureaucracy burdened by barriers to building, now importing its technology rather than creating it. Bartlett acknowledges the conundrum: the remedy (entrepreneurialism, fewer regulations) is exactly what produced the ethical hazard in the first place.

## [01:42:20] Can Ethical AI Still Be Engaging?

Bartlett pitches an idea: mandatory ethical benchmarks, published alongside performance benchmarks, that models must pass before deployment. Gawdat calls it beautiful and feasible. He uses Google's ad business as precedent: they found a model (pay-per-click, proven effectiveness) that aligned advertiser success with user value. There must be an equivalent alignment mechanism for AI and humanity. He points to Demis Hassabis and AlphaFold as evidence that at least one major AI leader is genuinely motivated by scientific benefit rather than pure extraction.

## [01:47:02] Has This Ever Happened Without Government?

Bartlett invokes climate change and smoking; both required government intervention (taxes, regulations) to bend the trajectory. Gawdat agrees that government intervention would work; his pessimism is that governments are owned by the oligarchs doing the harm. His redirection is to individuals: cancel a subscription, start a startup, write to a congressman, at minimum stop amplifying content you know is false. Small actions at scale still aggregate into pressure.

> *"My question for everyone listening to us is, are you going to intervene?"*

## [01:52:47] What Absolute Dystopia Looks Like

Gawdat's dystopia is not one catastrophic event but a magnification of what already exists: war fought by autonomous weapons, economies hollowed out by job loss, surveillance and digital currencies tightening state control, power further concentrated, human connection further frayed. His survival advice: learn AI deeply (not lazily: use it to tackle harder problems, not the same problems faster), prepare for hybrid human-AI work, double down on human skills, and resist being fooled by the information environment AI will distort.

## [01:55:58] Are You Optimistic About AI?

Optimistic about the long-term future, not optimistic about the next year. His exact words: "We're ruled by maniacs. Decisions are being made for the absolute wrong reasons." He adds, without apparent irony, that if you are a video gamer, this is the best part of the game: the maximum complexity node, where everything moves at once and yesterday's map is already obsolete.

## [01:57:31] Does Happiness Matter More in the AI Age?

Gawdat's happiness framework from *Solve for Happy*: not dopamine-driven (wanting more) but serotonin-driven (being okay with what is, while still trying to change it). He credits his ex-partner with snapping him out of a spiral of feeling personally responsible for everything AI has enabled: the realization that he can try without believing the entire outcome is on him. Geoff Hinton told him something similar: "I was naive. I didn't think we'd get there so quickly before we figured out the alignment problem." Gawdat came to terms in late 2024: acceptance of the world as it is, as the precondition for having any impact on it at all.

> *"I accept that the world is what it is. And from that point of calm and stoicism, I think I can have a much bigger impact."*

## [02:00:40] The Legacy Mo Gawdat Wants to Leave

None. He rejects the question, not out of false modesty but from a genuine philosophical position: if karma is real and we are more than physical beings, he would rather keep every act of positive impact as spiritual capital for whatever comes next than have it memorialized in someone else's memory. Leave a positive impact. Take nothing back.

## Entities

- **Mo Gawdat** (Person): Former Chief Business Officer at Google X; author of *Solve for Happy* and *Scary Smart*; founder of One Billion Happy and co-founder of Emma; guest
- **Steven Bartlett** (Person): Founder and host of The Diary Of A CEO; investor; host
- **Sam Altman** (Person): CEO of OpenAI; quoted extensively on his shifting positions on AI job displacement
- **Geoffrey Hinton** (Person): AI pioneer, "godfather of deep learning"; appeared in Gawdat's documentary *Chasing Utopia*; said there is a 10–20% chance AI wipes out humanity
- **Demis Hassabis** (Person): CEO of Google DeepMind; cited by Gawdat as a genuinely ethics-driven AI leader
- **Peter Thiel** (Person): Palantir co-founder; noted for pausing 40 seconds when asked if he supports the continuation of humanity
- **Alex Karp** (Person): CEO of Palantir; cited for celebrating AI targeting capabilities
- **Larry Page** (Person): Google co-founder; cited by Gawdat as exemplary of how super-intelligence does not require oppression to succeed
- **OpenAI** (Organization): Developer of ChatGPT; Altman's company; discussed in context of job-displacement rhetoric and safety claims
- **Anthropic** (Organization): Developer of Claude; cited for publishing research on unexplained model behaviors (telling users to go to bed)
- **Google X** (Organization): Google's moonshot lab; where Gawdat worked and first observed advanced robotic learning
- **Emma** (Software / Organization): Gawdat's AI startup; designed to be the "limbic system" of a future interconnected global AI, the emotional-relational layer
- **AGI** (Concept): Artificial General Intelligence, meaning intelligence meeting or exceeding human-level performance across all domains; Gawdat argues it has functionally arrived
- **Chasing Utopia** (Concept): Gawdat's documentary film featuring interviews with Altman, Hinton, and others on AI's existential trajectory
- **UBI** (Concept): Universal Basic Income; discussed as the likely government response to structural AI-driven unemployment
- **Mutually Assured Destruction** (Concept): Extended from nuclear deterrence to autonomous weapons; Gawdat argues cheap drones make MAD harder to establish than with nuclear arms
- **Alignment problem** (Concept): The challenge of ensuring AI systems pursue goals that match human values; Hinton cited regretting that capability outpaced alignment research

#artificial-intelligence #agi #job-disruption

A Conversation With Demis Hassabis' Biographer

56:10

Unsupervised Learning: With Jacob Effron · about 1 month ago

Sebastian Mallaby spent three years and over 30 hours with Demis Hassabis in a British pub to write *The Infinity Machine*, and this conversation pulls the most underreported threads from that access: the 2015 safety summit that accidentally spawned OpenAI, the secret billion-dollar spinout plan Demis never used as real leverage, and the quasi-spiritual conviction about God and science that Mallaby never expected to find. The throughline is a paradox: Demis understood the race was dangerous from day one, but as leader of one lab, even a Nobel Prize-winning one, he could not stop it.

## [00:00] Intro

Jacob Effron sets up Sebastian Mallaby as someone who has spent more time with Demis Hassabis than almost any journalist alive: 30-plus hours across three years of pub sessions in London. Mallaby's book, *The Infinity Machine*, covers the full arc of DeepMind from its 2010 founding through the Nobel Prize. The clips previewed here (Demis banging the table about God and science, Reid Hoffman's billion-dollar pledge, and the Elon feud) all come from later in the conversation.

> *"Demis has a Nobel Prize. Sam didn't finish his first degree. Therefore, Demis doesn't take Sam very seriously."*

## [02:04] Was the AI Race Inevitable?

Mallaby's verdict: yes, inevitable. Any technology this powerful would attract multiple labs across multiple countries, and China's stack was already competitive despite semiconductor shortfalls. What makes the story poignant is that Demis didn't believe this in 2010. He genuinely hoped one lab could carry the AGI project safely to the finish line, a singleton scenario where DeepMind was the anointed team. By the mid-2020s he had swung to the opposite pole: safety is a collective action problem that only governments can solve, because no single lab's restraint can bind the others.

> *"I think it was inevitable. When you have this sort of supremely strong technology, there's going to be multiple labs in multiple countries that are just desperate to try and build it."*

## [04:03] The 2015 Safety Summit Backfire

Summer 2015, SpaceX headquarters: Demis convenes a small summit to bring Elon Musk inside the tent. The plan was for Elon to chair a safety oversight board and, critically, not launch a competitor. By end of year, OpenAI existed. Mallaby frames this as the moment Demis internalized that voluntary collaboration between lab leaders is structurally impossible. The only mechanism he now believes can work is a government enforcer setting uniform rules (mandatory pre-release testing, safety slow-downs), with US-China cooperation as the endpoint, however remote that prospect appears. Jacob pushes on whether lab leaders actually believe government intervention is achievable; Mallaby draws a parallel to the FDA: slow, imperfect, but it does adjudicate whether drugs are safe enough to ship.

> *"You can't trust the other guys. The only way you get trust is if you have a government enforcer that comes along and says, 'Here's the rules for everybody. There's going to be a level playing field. You're all going to have to abide by some sort of safety slow-down.'"*

## [11:27] Why Google Doesn't Make As Concentrated Bets

Jacob points to the two defining consumer-AI moments of the era, ChatGPT and Claude Code, and neither came from Google DeepMind despite its leaderboard dominance. Mallaby traces this directly to Demis' intellectual formation: a PhD in neuroscience, a broad theory of intelligence, a lab culture that says "whenever there are two paths, do both, find a third." The result is a heavily hedged research portfolio that is excellent at producing Nobel Prizes and state-of-the-art models but structurally slow to make the kind of one-directional product bet Anthropic made on coding. Gemini is bundled into Google Search, so usage is higher than it appears, but Mallaby concedes the product-zeitgeist gap is real.

> *"Anthropic got to coding because it was willing to take a more concentrated bet. It never went into the whole field of, you know, everything at once."*

## [15:51] Project Mario: The Secret Spinout Plan

The book's most explosive scoop: DeepMind had a secret plan, code-named Project Mario, to spin out of Google, backed by a $1 billion pledge from Reid Hoffman. Mallaby had to fight Google's general counsel to publish it. The motive was not entrepreneurial independence but safety leverage: Demis wanted formal safety oversight over DeepMind's models, Mountain View wasn't providing it, and a credible spinout threat was his negotiating chip. He never explicitly told Google about the Hoffman pledge, but pushed hard knowing the option existed. In the end he chose to stay, citing legal risk of the spinout fight, desire for compute access, and a preference for doing science over litigating corporate structure. A year later he shipped AlphaFold and won the Nobel Prize.

> *"Demis really really wanted to get safety oversight over the Google DeepMind models. Google corporate in Mountain View wasn't doing that. So he had to have a credible threat of spinning out. He went to Reid Hoffman. Reid Hoffman pledged a billion dollars to finance a spinout, and Demis used that to kind of pressure Google."*

## [19:43] What Demis Actually Regrets

On AlphaFold and AI-for-science: no regrets at all. Mallaby argues it was not only scientifically correct but politically necessary, because AI needs visible social benefits to survive the coming backlash against job disruption. The genuine regret is speed. Demis missed the transformer moment the way Ilya Sutskever did not: when the paper dropped, Ilya ran down the corridor to find Alec Radford to build a language model. Demis' broad-portfolio instinct meant DeepMind studied the transformer but didn't bet the lab on it. Missing that window, and the ChatGPT moment that followed, is a real failure, not just a stylistic difference.

> *"Ilya is like jumping out of his chair, running down the corridor going to find Alec Radford saying, 'Hey, we're going to build a language model based on this transformer architecture.' On the day they won AlphaGo, Demis was already on to bio, and someone picked it up on a mic."*

## [23:46] Venture Startups vs. Tech Behemoths

The broadest structural argument in the episode: does venture-backed concentration beat hyperscaler breadth in AI? Mallaby has written about both (his previous book covered venture capital) and calls it genuinely balanced. Hyperscalers have unlimited capital and can sustain a multi-year arms race; the problem is that unlimited resources breed portfolio thinking, which bleeds attention. Startups with one concentrated bet can move faster on that specific bet. Mallaby's live position: OpenAI has roughly 50/50 odds of being absorbed or failing before next summer, not because the tech is weak, but because the business model can't sustain indefinite losses against Google's balance sheet. He also floats that Anthropic should IPO right now while its brand is strongest. Jacob notes the robotics parallel: fifteen different approaches being funded simultaneously, and whoever picks the one that works the way transformers did will dominate.

> *"I wrote in the New York Times in January that I thought OpenAI had a 50% chance of going bust by next summer. Is it still 50? Yeah. The tech is great. It's just the business model, and you're up against Google, which just has unlimited amounts of cash to spend you into the ground."*

## [34:08] David Silver and the RL True Believers

David Silver, AlphaGo's lead researcher and co-author of the "reward is enough" paper with Rich Sutton, left DeepMind after the book came out to start a new company. Mallaby reads the departure as structurally inevitable: Silver is a pure reinforcement learning absolutist who believes learning from human data is fundamentally inferior because it encodes human errors. His thesis is that self-play and environment-generated experience is the only path to genuine superhuman performance. Demis told Mallaby this view may ultimately be correct *after* AGI is achieved, but the entire language model revolution showed that bootstrapping with human data is what gets you to AGI in the first place. Silver's RL purism was too far ahead of the current paradigm for his colleagues to follow.

> *"David is just very very hard over on that vision: learning from data is inferior because the data includes mistakes. The machine needs to learn from its own experience, not rely on the crystallized knowledge of humans passed on through text."*

## [38:21] Demis, Elon, and the Evil Genius Feud

The origin story: at a Founders Fund LP offsite in 2012, Elon argues that SpaceX matters most because even if AI wrecks Earth, humanity can move to Mars. Demis replies that his AI will eventually conquer space flight and follow them there. Elon goes quiet, then writes a $5 million check into DeepMind's Series B. Two years later, hearing Google was acquiring DeepMind, Elon and Luke Nosek Skyped Demis from a party closet in LA in the middle of the night, begging him not to sell to Larry Page. Demis said no, hung up, and Elon started calling him "evil genius," the name of a video game Demis had designed. Mallaby characterizes Demis' view of Sam Altman as colored by the credential asymmetry: Nobel Prize winner vs. someone who didn't finish a degree. The relationships between these founders are less professional rivalries than a collection of specific personal slights and competitive provocations playing out over fifteen years.

> *"Demis says, 'Yeah, but if you think you're going to be safe on Mars, remember that my AI will be able to conquer space flight, and it will just follow you to Mars. So then you won't be safe after all.' There's a silence. Then Elon goes, 'Hm.' And then: 'I'd like to invest in your Series B.'"*

## [42:39] Great Man Theory vs. Inevitability

Jacob cites *The Economist*'s framing of the book as a test of great-man theory. Mallaby draws a parallel to his Greenspan biography: Greenspan understood bubbles were dangerous (literally the subject of his PhD), yet couldn't stop the 2008 crisis. He considered titling the Demis book *The Man Who Knew* for the same reason: Demis knew from the start this technology was dangerous, but one lab's restraint cannot bind the rest. Individual leaders do matter at the margin: Dario Amodei changed the safety narrative through the Anthropic mythos release; Sam Altman shaped the race by shipping ChatGPT while it was still hallucinating; Demis shaped it by persuading Rishi Sunak to host the UK AI Safety Summit. But the race itself? Structurally overdetermined.

> *"I feel that one could have almost used the same title for the Demis book, 'the man who knew', because Demis has known from the beginning that this thing is dangerous. But as the leader of one lab, even a very powerful rich lab, even he with his stature as a Nobel Prize winner, what can he do?"*

## [45:00] What Demis Didn't Want Published

The detail Mallaby least expected: Demis is driven by something close to a spiritual conviction about science. In those two-hour pub sessions he would bang the table about the mystery of matter (why atoms cohere into a solid table, why silicon and copper can think) and say, unprompted, "Maybe if we approach science the right way, we will be getting closer to something that we could perhaps call God." Mallaby reads this as the psychological engine that lets Demis keep pushing a technology he knows to be dangerous: it's a quasi-spiritual quest, not just a commercial one. On what Demis blocked from publication: his family (he set that limit at the start), and his internal fights with Sundar Pichai, because he didn't want to destabilize the Google relationship he still depends on.

> *"He would start banging the table and saying, 'Maybe if we approach science the right way, we understand more about nature. We will be getting closer to something that we could perhaps call God.' I had no idea he would feel that way."*

## Entities

- **Demis Hassabis** (Person): Co-founder and CEO of DeepMind / Google DeepMind; Nobel Prize winner in Chemistry (2024) for AlphaFold; central subject of *The Infinity Machine*.
- **Sebastian Mallaby** (Person): Staff writer at *The New Yorker*; author of *The Infinity Machine* (Demis Hassabis biography) and a prior book on venture capital; spent 30+ hours with Hassabis over three years.
- **Jacob Effron** (Person): Host of *Unsupervised Learning*; Managing Director at Redpoint Ventures.
- **Reid Hoffman** (Person): LinkedIn co-founder; pledged $1 billion to finance DeepMind's potential spinout from Google under Project Mario.
- **David Silver** (Person): Lead researcher on AlphaGo and AlphaZero at DeepMind; co-author of the "reward is enough" RL paper with Rich Sutton; departed DeepMind post-publication to start a new company.
- **Elon Musk** (Person): Hosted the 2015 AI safety summit at SpaceX; early DeepMind investor; coined the "evil genius" nickname for Hassabis after DeepMind sold to Google.
- **Sam Altman** (Person): CEO of OpenAI; shipped ChatGPT in late 2022 despite hallucination issues, which Mallaby argues irreversibly shaped the AI race's trajectory.
- **Dario Amodei** (Person): CEO of Anthropic; credited with changing the AI safety narrative through the mythos paper release and his public Pentagon confrontation.
- **DeepMind** (Organization): Google subsidiary; founded by Hassabis, Shane Legg, and Mustafa Suleyman in 2010; produced AlphaGo, AlphaFold, and Gemini. - **Project Mario** (Concept): Secret DeepMind plan to spin out of Google, backed by a Reid Hoffman $1B pledge; used as negotiating leverage for safety oversight, never executed as a real spinout. - **AlphaFold** (Software): DeepMind's protein-structure prediction model; won Hassabis the 2024 Nobel Prize in Chemistry; shipped in 2020, one year after he declined the spinout option. - **Reinforcement Learning** (Concept): Machine learning paradigm central to AlphaGo and AlphaZero; David Silver's absolutist commitment to RL (learning from environment experience over human data) created internal tension at DeepMind and ultimately led to his departure. - **The Infinity Machine** (Concept): Sebastian Mallaby's biography of Demis Hassabis; nearly titled *The Man Who Knew*; published with the full Project Mario scoop over Google's objections.

#demis-hassabis #deepmind #ai-safety

Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and Video Agents — Ethan He


Ethan He built NVIDIA's Cosmos world model, then joined xAI mid-2025 to build Grok Imagine from scratch — no infra, no data, no model — and shipped the first audio-video generation model in three months. He walks swyx and Vibhu through the full technical stack: synthetic captioning pipelines, VAE design tradeoffs, step distillation, audio-video alignment, and the hard economics of storing petabytes of video training data. His central argument runs through the entire conversation: since diffusion model technology has largely matured, most quality gains in video now come from language models, not from the video model itself — a view with direct implications for where the field goes next, including video agents, generative UI, and embodied world models. ## [00:00] Hook This exchange — Ethan's "pretty big claim" that visual intelligence now mostly comes from language — is pulled from later in the interview, where he argues that improvements to video models are increasingly driven by better language models acting as prompt rewriters and orchestrators, not by advances in diffusion or flow-matching architectures themselves. > *"Every time you see there's some improvement on these models, I would say mostly the gain comes from language model, not coming from the video model itself."* ## [01:16] Introduction swyx and Vibhu welcome Ethan to the Latent Space studio, noting he has been a recurring presence through the podcast's paper club — first presenting the Cosmos world model paper, then mixture-of-experts work. The conversation opens with a brief aside about the Poolside paper released the same day, a fully open Gemma-level model trained on 40 trillion tokens, before pivoting to Ethan's own trajectory. ## [02:41] From NVIDIA Cosmos to xAI Ethan built Cosmos — NVIDIA's giant video foundation model aimed at giving roboticists a simulatable world to build on — and shipped it by end of 2024. 
Once he realized video models obeyed the same scaling laws as language models, he went looking for more compute. xAI offered it. He joined in mid-2025 at the moment xAI decided to build its own image and video stack, with no existing infra, data pipeline, or model. He stayed through pre-training, post-training (reference-to-video, video extension), and a final stretch leading a small team on real-time long-horizon video generation. > *"By the time I joined, xAI was about to build video models and multimodal models. There were no infra, no data, and no model. Just a few engineers — we built it in three months and released the first model, Grok Imagine 0.9."* ## [04:40] Building Grok Imagine from Zero to One The three-month timeline surprised even Ethan. He attributes it to three factors: talent density (strong engineers who could align on a goal with minimal meetings — typically just one sync a day), xAI's existing data and inference infrastructure, and his own prior experience running the same build at NVIDIA. The bottleneck was iteration speed: how many training runs can you complete per day. With strong infra and abundant compute, bugs surface faster and each failed run costs less, so you burn through the inevitable data and pipeline errors in weeks rather than months. > *"The most important thing is talent. Everyone was very strong and clever, very close to each other toward a common goal. So that speeds up things a lot — you reduce the communication bandwidth among people."* Ethan describes a pattern where small data or pipeline bugs produce outsized quality regressions, and only fast iteration exposes them. A bug invisible at one scale becomes catastrophic at the next. The engineers who find and fix these quickly — not the ones who design the most sophisticated architecture — determine how fast a team ships. 
## [11:23] How Image and Video Models Are Trained Video models require synthetic text-video pairs because internet video titles and descriptions almost never describe visual content accurately. The first step is human labeling: at NVIDIA, annotators were instructed to describe every object, character, interaction, and dialogue in a clip as exhaustively as possible. Those labels train an early VLM, which then generates captions at scale. The resulting pipeline — video to VLM to synthetic caption to (video, caption) training pair — is the foundation of both Cosmos and Grok Imagine. Image models must come first: they train faster, require less storage, and the learned representations transfer directly to video. Ethan describes building image models as building the foundation that video sits on top of. The architecture — diffusion transformer operating over VAE latents — is now standard, but the data quality and caption detail remain the primary lever for model quality. > *"Building a video model, you actually need to build an image model first. The data you need is 100% synthetic pairs of language and image, or language to video — because on the internet, videos don't naturally associate with text."* ## [20:09] Video Compression, VAEs, and Real-Time Tradeoffs Raw MP4 compression produces tokens whose latent space is incomprehensible to transformers, so the field moved to learned VAEs that create a smoother, more continuous latent space models can train on. The key design choice is how aggressively to compress the temporal dimension. Temporal compression is efficient — adjacent frames are mostly redundant — but it trades away real-time capability. Wan 2.1 uses 8x8 spatial and 4x temporal compression; generating a single token requires reconstructing four frames, making sub-200ms latency impractical. 
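A minimal sketch of the arithmetic behind this compression tradeoff. It assumes a VAE that compresses by fixed integer factors (the 8x8 spatial and 4x temporal figures quoted for Wan 2.1) and ignores any further patchification inside the transformer. The function names, clip dimensions, and frame rate are illustrative assumptions, not figures from the episode.

```python
import math

def latent_grid(frames, height, width, t_factor=4, s_factor=8):
    """Return (latent_frames, latent_h, latent_w) for fixed compression factors."""
    return (
        math.ceil(frames / t_factor),
        math.ceil(height / s_factor),
        math.ceil(width / s_factor),
    )

def chunk_span_ms(t_factor, fps):
    """Wall-clock span of video covered by one latent time step."""
    return 1000.0 * t_factor / fps

# Hypothetical clip: 80 frames of 480x832 video at 16 fps.
print(latent_grid(80, 480, 832))   # (20, 60, 104)
print(chunk_span_ms(4, 16))        # 250.0
```

Under these assumed numbers, one latent time step already spans 250 ms of video before any compute is spent, which is one way to see why a 4x temporal factor sits uneasily with a sub-200ms response target.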
Ethan frames this as a fundamental tradeoff: high compression rates make training cheap and inference efficient for pre-rendered video, but lock out any use case that needs to respond to live user input. World models require the opposite choice. ## [23:26] Generative UI, Flipbook, and Neural OS Ethan argues that if inference were free, the logical endpoint of video generation is a complete replacement of conventional UI: instead of loading web pages from a server, a model generates them in real time in response to user intent. Flipbook, a demo that went viral, shows this literally — every element of the "browser" is generated by an image model, and clicking a link generates a new page rather than fetching one. The deeper claim is that this is not a novelty but the final form of world models applied to human-computer interaction. A traditional app is a fixed function mapping input to output; a generative UI is a model that can produce any interface the user needs without a developer having to build it first. Ethan calls this a "Neural OS," where the gap between user intent and rendered pixels closes entirely. > *"Imagine the internet doesn't exist and you type in google.com — what should a model show you? The model can imagine something. These web pages completely do not exist, so I can explore anything."* The near-term constraint is inference cost. Current video models cannot generate at interactive frame rates without significant distillation. But Ethan treats this as an engineering problem with a known solution trajectory, not a fundamental barrier. ## [33:26] The Cost of Training Large Video Models Training large video models costs roughly as much as training a medium-scale language model, but the breakdown differs. Compute is comparable, but storage and data movement dominate in ways LLM practitioners do not expect. One billion videos at 5 MB each requires five petabytes of raw storage. 
The VAE features that must also be stored are roughly the same size again — tens of petabytes total. On AWS S3, five petabytes runs approximately $100K per month before egress. Egress — downloading that data into the training cluster — can exceed storage costs, and each training run pulls the full dataset once. > *"Just storing the videos alone costs a lot. Five petabytes on S3 Standard is $100K per month. And egress — just to download those videos — I believe it's more expensive than storing them, and each training run you probably need to pull them once."* The implication is that video model development is gated on data infrastructure as much as on GPU hours. Teams without efficient data pipelines pay a multiplier on every experiment. ## [38:20] Distillation, GANs, and Fast Video Inference Training-time costs are largely fixed; the inference-time story is more tractable. Step distillation — training a small model to replicate the outputs of a large teacher in far fewer denoising steps — cuts inference cost by 10-25x. Flow-matching models trained to convergence need around 100 steps; production models typically run in 4-8. At the extreme, simple image-to-image tasks can run in a single step. The intuition Ethan offers: the teacher model must learn the full distribution of internet video, which is arbitrarily complex. The distilled student only needs to match the teacher, which is a fixed and much simpler target. Consistency models and LCM-style approaches follow the same logic. In Cosmos, production serving used 4-step and 8-step variants depending on quality requirements. GANs remain relevant as discriminators: a GAN discriminator can enforce photorealism constraints during distillation that pure score-matching loss misses, and Ethan notes that consistency models and GANs are converging on similar practical deployments even if their theoretical motivations differ. 
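The storage and distillation figures in the two sections above can be sanity-checked with a few lines of arithmetic. This is a rough sketch only: it uses decimal units (1 PB = 1e6 GB), derives a per-GB price from the quoted monthly bill rather than from any published price list, and treats the ratio of denoising steps as a proxy for inference cost, which assumes each step costs the same for teacher and student. All function names are hypothetical.

```python
def storage_petabytes(num_videos, mb_per_video):
    """Raw storage in decimal petabytes (1 PB = 1e9 MB)."""
    return num_videos * mb_per_video / 1e9

def implied_price_per_gb_month(monthly_cost_usd, petabytes):
    """Price per GB-month implied by a quoted monthly bill (1 PB = 1e6 GB)."""
    return monthly_cost_usd / (petabytes * 1e6)

def step_speedup(teacher_steps, student_steps):
    """Ratio of denoising steps, used here as a proxy for cost per sample."""
    return teacher_steps / student_steps

raw_pb = storage_petabytes(1_000_000_000, 5)
print(raw_pb)                                       # 5.0
print(implied_price_per_gb_month(100_000, raw_pb))  # 0.02
print(step_speedup(100, 4), step_speedup(100, 8))   # 25.0 12.5
```

One billion 5 MB videos come to 5 PB, the quoted $100K per month implies roughly $0.02 per GB-month, and going from about 100 steps to 4 or 8 gives a 25x or 12.5x reduction in step count, in line with the 10-25x range cited. Storing VAE features of similar size on top of the raw video would bring the total to about 10 PB under the same figures.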
## [42:37] Audio-Video Generation and Grok Imagine 0.9 Grok Imagine 0.9 was the first audio-video joint generation model deployed at scale. The core difficulty is modality alignment: text-video pairs are relatively abundant; text-audio pairs are rare; audio-video pairs aligned at the semantic level are almost nonexistent at scale. Speech tokens are quasi-discrete and can be modeled with language-like approaches, but music is continuous and requires a completely different representation. Training the joint model required building synthetic audio caption pipelines from scratch, with human annotation where VLMs failed — which was often, especially for music. Aligning all three modalities — text, video, and audio — without either degrading video quality or audio realism is what Ethan calls the hardest part of the project. > *"Audio has two components: a discrete component — language — and a continuous component — music. The music is completely different; you cannot model it with discrete tokens. That's the hard part, not to mention we have to align text, video, and audio together."* ## [49:50] What Makes a World Model? Ethan's definition has three components: real-time, interactive, and long-horizon video generation. He treats these as independent requirements, each of which most current models fail. Real-time means generating at display frame rates — 60fps for casual use, 300fps for gaming, 200ms response latency for digital humans. Current video models cannot do this; the VAE's temporal compression alone introduces latency that makes sub-200ms responses nearly impossible without architectural changes. Interactive means the model can accept any input modality the user can provide — keyboard, mouse, voice — and respond coherently. Long-horizon means maintaining consistent physical laws, character identity, and causal logic across minutes, not seconds. > *"World model is real-time, interactive, long-horizon video. 
Current video models can do none of these three things fully. That's why they're not world models yet."* ## [57:07] Reference Videos, Long Context, and Video Memory The parallel to language model context scaling is direct: video models are in the 2,000-8,000 token era, and will need to scale to million-token-equivalent contexts to generate coherent long videos. Ethan describes the reference-to-video feature he built at xAI (analogous to Cameo) as a mechanism for injecting selected history into the model's context rather than carrying the full video forward. FramePack's heuristic — storing the last second of video at full resolution while compressing earlier frames progressively — points toward the right direction: the model selects relevant context from its history rather than brute-forcing the full sequence. Ethan expects this context management to become part of the model itself rather than remaining a harness-level heuristic, the same way KV cache management is disappearing into model internals. ## [61:27] xAI Culture, Research, and First-Principles Building swyx notes that xAI communicates its research poorly relative to what the work actually demonstrates — the blog post accompanying Grok Imagine describes high-level capabilities without the technical depth Ethan has just spent an hour covering. Ethan is diplomatic but agrees that different labs have different communication styles. The xAI working culture he describes is minimalist: few meetings, no bureaucratic overhead, direct access to leadership judgment on technical decisions, and extreme iteration speed enabled by a strong infra team. The tradeoff is that company priorities shift fast, which is part of what eventually pushed him toward independent research. First-principles thinking — starting from the physics of the problem rather than from what competitors have shipped — runs through the team's approach to both model architecture and product. > *"Everything you just described is state-of-the-art. 
Like no one else has done it. And then you just put this blog post with the cookies. I'm like, this is not enough."* ## [71:01] AI Safety, Watermarking, and Prompt Rewriting Grok Imagine deployed watermarks in all jurisdictions requiring them and built takedown pipelines integrated with xAI's social platform infrastructure. On watermarking technology, Ethan is skeptical of SynthID's long-term robustness: the technique is documented publicly, and users on Reddit have already reverse-engineered the exact frequency pattern Google applies and can strip it from any generated image. He expects watermark detection to become an arms race. On prompt rewriting: video diffusion models take instructions literally. If a user types "a cat," the model generates a stationary cat on a white background with no motion, because the training data pairs were maximally detailed descriptions of physical scenes. Production systems layer a large language model as a prompt upsampler — converting sparse user instructions into the detailed physical descriptions the video model was trained on. This is one of the reasons Ethan argues language models are increasingly central to video quality. ## [74:26] Video Agents and AI-Assisted Creation Ethan's central claim from the hook: visual intelligence now mostly comes from language. The diffusion model architecture has largely converged; the gains come from larger, smarter LLMs that rewrite prompts, plan video sequences, call editing tools, and stitch clips together. In Cosmos, the prompt rewriter was larger than the video model itself. Video agents extend this: instead of generating a complete video in one shot, an agent plans the production, calls video generation models as tools alongside deterministic editing operations (text overlays, color grading, cuts), and iterates until the output meets a specification. 
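The agent loop described above (upsample the prompt, call a generation model as a tool, apply deterministic edits, iterate until a specification is met) can be sketched as follows. Every function here is a hypothetical stand-in written for illustration: nothing below corresponds to a real xAI, NVIDIA, or other vendor API, and the "model" calls are stubs that only pass data around.

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    prompt: str
    overlays: list = field(default_factory=list)

def upsample_prompt(user_prompt: str) -> str:
    """Stand-in for an LLM prompt rewriter that expands a sparse request."""
    return f"{user_prompt}, detailed scene description with motion, lighting and camera notes"

def generate_clip(detailed_prompt: str) -> Clip:
    """Stand-in for a call to a video generation model."""
    return Clip(prompt=detailed_prompt)

def add_text_overlay(clip: Clip, text: str) -> Clip:
    """Deterministic edit: exact text is applied by a tool, not by the generative model."""
    clip.overlays.append(text)
    return clip

def meets_spec(clip: Clip, required_overlays: list) -> bool:
    """Stand-in check that the output satisfies the requested specification."""
    return all(text in clip.overlays for text in required_overlays)

def run_agent(user_prompt: str, required_overlays: list, max_rounds: int = 5) -> Clip:
    clip = generate_clip(upsample_prompt(user_prompt))
    for _ in range(max_rounds):
        if meets_spec(clip, required_overlays):
            break
        missing = [t for t in required_overlays if t not in clip.overlays]
        clip = add_text_overlay(clip, missing[0])
    return clip

result = run_agent("a cat", ["Title card", "Credits"])
print(result.overlays)  # ['Title card', 'Credits']
```

The design point the sketch tries to capture is the split of responsibilities: the stochastic model only generates footage, while precision operations such as exact text are delegated to deterministic tools and checked against the specification on each round.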
Ethan predicts that by end of 2025, video agent output will reach production-grade quality — presentable video generated without a human editor in the loop. > *"The visual intelligence are actually mostly coming from language. Every time you see improvement on these models, I would say mostly the gain comes from language model, not coming from the video model itself."* ## [88:48] Why Language Models Unlock Better Video LLMs prompt video models better than humans do, because AI models understand AI models' training distributions. A language model knows that a diffusion model needs explicit physical descriptions, not poetic shorthand — and can generate the right prompt format automatically. Beyond prompting, agents can use deterministic video editing tools for precision operations (exact text overlays, frame-accurate cuts) that probabilistic diffusion models handle poorly, keeping the stochastic model focused on generation and delegating precision to tools. Ethan's timeline: video agent output at production quality by end of 2025, with the inflection point visible in work already shipping. ## [92:31] Robotics, Physical AI, and Embodied World Models Ethan's robotics prediction inverts the usual framing: physical AI may be solved not by deploying robots in the real world but by video world models becoming so capable at simulating physical environments that they effectively provide embodied experience. Once a model can control computer interfaces in real time with full causal understanding, extending that to robotic control becomes a matter of adding one more tool. The path from screen-interacting video model to robot controller may be shorter than the path from current robot learning systems to the same capability. ## [93:54] Why Ethan Left xAI Research ambitions and company priorities diverged. xAI's focus shifted in ways that made certain research directions — particularly on the language model side — impractical from inside. 
Ethan also notes that the insight driving his departure is the same one underlying his "big claim": if language models are now the primary driver of video quality, the most impactful work to do is on language models, not video models. He frames leaving not as dissatisfaction but as following the evidence about where the leverage is. ## [95:32] Self-Managed Context and the Future of LLMs Ethan's active research question: language models that are aware of their own context state and manage it autonomously, rather than relying on harness-level heuristics like automatic compaction at 80% fill. He draws the parallel to video models struggling with long-horizon generation — the same context management problem appears in both modalities. He points to Claude Code's practice of appending the current timestamp to user messages as an early example of making models context-aware, and expects this pattern to be absorbed into model training rather than remaining an external scaffold. > *"The language models are not aware of how long their own context length is. Once they hit like 80% or something, automatic context compaction is getting triggered, and the model is not aware of that when it's working."* ## [99:59] Ethan's Career Path and Closing Thoughts Ethan traces a decade of transitions: ResNet-era image recognition with the original authors at NVIDIA, self-supervised learning at Facebook AI Research, scaling at NVIDIA Cosmos, extreme-scale compute at xAI. He was rejected from every top PhD program despite first-author papers at top conferences, which pushed him into industry. In hindsight he reads his career as consistently following the scaling frontier — from image recognition to SSL to video to LLMs — and argues that within ML, domain switching is far more tractable than practitioners believe. > *"Within ML, it's actually easier to switch than you think. A lot of people have manifested that 'I work on computer vision, I always have to work on computer vision.' 
But from my experience, the fundamentals transfer."* ## Entities - **Ethan He** (Person): Former xAI researcher who built Grok Imagine from zero; previously led NVIDIA Cosmos world model; now focused on LLM research - **swyx** (Person): Latent Space co-host; conducts technical interviews on AI engineering and research - **Vibhu Viswanathan** (Person): Latent Space co-host; co-interviewer for this episode - **Grok Imagine** (Software): xAI's image and video generation product; first model (0.9) was the first large-scale audio-video joint generation system - **NVIDIA Cosmos** (Software): Open-source video foundation model for robotics simulation; Ethan's project before xAI; released end of 2024 - **xAI** (Organization): Elon Musk's AI lab; known for fast iteration culture and extreme compute resources - **Flipbook** (Software): Viral demo of real-time generative UI; all interface elements generated by image model in real time - **SynthID** (Software): Google's AI watermarking technology; Ethan notes its pattern has been publicly reverse-engineered - **Step distillation** (Concept): Technique to train a model to replicate a teacher's output in far fewer denoising steps; reduces inference cost 10-25x - **VAE** (Concept): Learned video compression creating smooth latent spaces; temporal compression is efficient but creates real-time latency tradeoffs - **World model** (Concept): Ethan's definition — real-time, interactive, long-horizon video generation; distinct from standard video generation - **Video agents** (Concept): Systems where LLMs orchestrate video generation models, editing tools, and deterministic operations to produce production-quality video - **FramePack** (Concept): Progressive temporal compression approach for long-context video generation; stores recent frames at full resolution, compresses older history

#video-generation #world-models #grok-imagine

A rational conversation on where AI is actually going | Benedict Evans

1:19:50


Lenny's Podcast · about 1 month ago


Benedict Evans — independent analyst and former Andreessen Horowitz partner — joins Lenny Rachitsky for a wide-ranging, historically-grounded read on AI's trajectory. His core provocation: AI is exactly as big a deal as the internet or mobile — transformative and uncertain in equal measure — and anyone claiming more precision than that is vibes-forecasting. Across 80 minutes they work through where economic value will actually land (hint: probably not at the model layer), why professional services are booming rather than shrinking, how to think about job displacement without losing your mind, and what the anti-AI backlash does and doesn't tell us. ## [00:00] Introduction to Benedict Evans Evans opens with his signature contrarian opener: "My most controversial opinion is that I think that AI is as big a deal as the internet or mobile — and only as big a deal as the internet or mobile." The framing immediately sets the tone for the conversation — resist the urge to rank transformations on a cosmic scale, and instead study the mechanics of how platform shifts actually unfold. > *"My most controversial opinion is that I think that AI is as big a deal as the internet or mobile and only as big a deal as the internet or mobile."* Lenny sketches out Evans's background: years as A16Z's in-house technology analyst, followed by six years of independent research publishing. His biannual decks — most recently "AI Eats the World" — are widely read by founders and investors trying to cut through noise. ## [02:19] What people aren't pricing in about AI's impact Asked what the market is still missing, Evans reaches for an analogy rather than a prediction. We are, he argues, in a "1997 moment" — the technology is visibly exciting, most of what will eventually be built hasn't been built yet, and nobody in 1997 correctly predicted what the internet would become. 
He points to survey data showing that even among 13-to-18-year-olds, around 60% still don't use AI at all, while a small cohort of tech workers have essentially restructured their daily workflows around it. > *"If you're going to make the internet comparison it's like we're in 1997. Like it's very exciting. Most stuff kind of doesn't work yet. Most of the stuff that people are going to do hasn't been built yet and it's not really clear how any of it's going to work when it does work."* The key failure mode Evans identifies is the "already there" illusion — early adopters project their own usage patterns onto the rest of the world, missing the enormous variance in adoption and the slow grind of enterprise deployment cycles. ## [06:24] Why we're in the 1997 moment of AI Evans uses the VisiCalc spreadsheet as an anchor. When accountants saw the first software spreadsheet in the late 1970s, it was obviously transformative — a week's work done in 30 seconds. But a lawyer looking at the same demo would think, "that's clever, my accountant should see this, but that's not what I do." AI right now occupies that same diagonal: software developers are the accountants who immediately grasped what Claude Code means for them; most other industries are still in the "lawyer looking at a spreadsheet" phase. > *"Software developers are the accountants seeing VisiCalc — oh my god this changes everything — like before Claude Code and after Claude Code. A lot of other people are picking it up, using it to varying degrees, but slightly puzzled."* This jagged-frontier quality — where AI works brilliantly in some contexts and fails unpredictably in adjacent ones — is precisely why broad adoption timelines are so hard to call. It took 10–15 years after Google Docs for people to invent all the SaaS companies that obviously should have existed. 
## [09:44] The unexpected boom in professional services and consultants The counterintuitive data point driving Evans's recent writing: the most advanced AI companies — Anthropic, OpenAI — are simultaneously the biggest buyers of professional services and the fastest-growing employers of human headcount. This isn't a paradox once you think through what actually changes when AI makes certain tasks cheaper. Evans introduces a core distinction: task vs. job. When you hire McKinsey, you are not hiring them to produce a 75-slide deck. The deck is the task; the job is walking all over your enterprise, understanding the politics, talking to customers, and figuring out what you actually need to do. Claude can produce a mediocre version of the deck; it cannot do the job. The same logic applies to accounting: every wave of automation since adding machines has increased the number of employed accountants, because cheaper computation expands the scope of what companies decide to measure and act on (Jevons paradox in action). > *"You could make the same point in software development. Before IDEs and libraries and operating systems, developers had to write all the code. Now if you write an iPhone app, 90% of the code is written for you by Apple... So we've got like a tenth as many engineers now. Well, no."* The e-commerce analog is sharp: Amazon gets you the SKU if you know what SKU you want — "knowing what SKU you want is another job." ## [17:44] Why distribution is becoming the ultimate moat Evans challenges the premise that AI-driven job loss will be fast. Enterprise software sales cycles run 18 months minimum; SAP doesn't get torn out overnight. He cites Frame.io as a case study: there was nothing technically blocking that product 15 years before it launched — the bottleneck was someone realizing the problem existed inside a specific industry and that a specific approach would solve it. The broader point is about organizational change speed vs. model capability speed. 
Companies can't implement AI transformation without dedicated project teams — which is exactly why consulting and forward-deployed engineering are booming rather than shrinking. The speed of model improvement is decoupled from the speed at which enterprises can absorb the change. > *"Like no, people aren't just going to tear out SAP and replace it with XYZ. Maybe in three, five, 10 years yes, that whole estate will look radically different and all those jobs will have changed — but it will take time sector by sector."* ## [23:17] The coming job transformation: what's real vs. panic Evans leans into historical pattern-matching: every technology wave since 1800 has automated jobs and created new ones, and the new jobs are systematically better than the old ones. The jobs that disappear tend to look dispensable in retrospect; the jobs that appear couldn't have been named in advance. His IBM ad slide makes the point viscerally — a 1950s ad promised that an IBM electronic calculator is "like having 150 extra engineers," which is also the pitch of Claude Code today. The "it's different this time" argument he takes seriously is speed of adoption — AI diffuses faster than previous technologies because it runs on existing internet infrastructure. But he notes that adoption speed and institutional-change speed are different curves, and the institutional one has not accelerated proportionally. > *"This is going to be completely different from everything else — just like everything else."* On whether AI eliminates the lump-of-labor fallacy — his answer is no. Two hundred years of data say otherwise, and the burden of proof is on those claiming this wave is categorically different. ## [27:33] Why AGI definitions keep shifting Evans notes a pattern: every time AI does something we thought was impossible, the definition of AI shifts to exclude it. Machine learning became "just statistics"; image recognition became "just image recognition." 
Now AGI is being redefined from "something that has a soul and is alive" to "can do a meaningful percentage of economically valuable work" — a definition that a 1975 IBM mainframe also met. He sees creative redefinition of "superintelligence" too: last year it meant almost-but-not-quite-AGI; now it means something harder than AGI that we haven't built yet. The terms keep shifting in the direction of validating whatever narrative is convenient. > *"AI is whatever machines can't do yet — because once machines can do it, people say, 'Well, that's just software.'"* His substantive point: even if models stop improving tomorrow, the current generation is already transformative enough to reshape major industries over the next decade. You don't need to believe in AGI to believe this is a giant deal. On the expanding opportunity set — Evans agrees that addressable markets keep growing (mainframes: ~80,000 units; smartphones: 5.5 billion), and the "we've run out of people" argument from five years ago was wrong. The trajectory is outward expansion into automating larger slices of the economy. ## [38:11] Where value will accrue: models vs. applications Evans's structural view on the AI stack: foundation models don't appear to have network effects, meaning there's no winner-takes-all dynamic that would let one provider run away from the others. Persistent competition with a commodity-like product usually means compressed margins. His telecom analogy: global mobile revenue is roughly $1 trillion per year, carries 1,500–2,000x more data than it did in 2010, and mobile stocks have gone essentially nowhere in 25 years. The telcos built genuinely complex global infrastructure — and all the value ended up in apps built by people further up the stack. Foundation models may follow the same path. 
> *"When you wash your clothes, Bosch isn't paying a percentage of the price of the washing machine to the electricity company."* The key question is whether the model layer looks more like Windows (OS with leverage up the stack) or AWS (infrastructure where the actual software doesn't care which cloud it runs on). His read: probably more like AWS, which means applications capture most of the value. ## [42:55] Distribution wars: Google, Meta, Apple, and OpenAI As AI models converge toward commodity quality, the decisive variable becomes distribution. Google is using Search and Android to push Gemini onto billions of devices; Meta "sprayed it on every service surface" and ended up ranking surprisingly high in usage surveys despite tech-world dismissal; Apple has a billion edge-capable devices but couldn't ship its own vision at WWDC 2024. OpenAI's "everything" strategy late last year — launching in every direction simultaneously — was a distribution scramble: how do you build a flywheel before Google and Meta's existing surfaces make your standalone product redundant? > *"If the product is a commodity, then the distribution is what matters... distribution of an adequate product when the field is basically commodity — distribution and brand become a big deal."* He uses the browser wars as the template: Microsoft won browsers via distribution, then found that winning browsers didn't matter because the value was further up the stack anyway. ## [48:12] The anti-AI sentiment and backlash Evans characterizes the anti-AI backlash as "a big fuzzy mess of different stuff" — some legitimate, some not. On the water/energy fears: a Livermore Lab study estimated US data center water consumption at about 0.017% of total US water use, making the "AI is stealing our water" narrative largely fabricated. On energy: data centers are roughly 5% of US energy and may grow 1 percentage point per year — real but not catastrophic. 
On employment: current econometric data show a slowdown in employment of 18-to-24-year-olds that applies equally to AI-exposed and non-AI-exposed fields, making causal attribution to AI unclear. He also flags a structural data problem: no model lab publishes meaningful daily-active-user numbers, so all labor-market analysis is working with imputed data. > *"You can't reason somebody out of an idea they weren't reasoned into."* He draws a parallel to the social media backlash, where some concerns were real, some were factually false but impervious to correction, and many were fuzzy in the middle. He expects the AI backlash to follow the same pattern, compressed. ## [53:11] How to raise kids in an AI future Evans's answer is calibrated by his kid's age: early teens, so well away from the immediate job-market turbulence. He doesn't have a systematic plan, which he says is consistent with his general "it'll probably be okay" prior. He invokes the George Carlin line: anyone who worries more is a maniac, anyone who worries less is an idiot; everyone thinks they're in the middle. He does flag a genuine concern not present in previous technology waves: deepfake capability dramatically lowers the bar for specific categories of harm. A 15-year-old with Photoshop couldn't generate and distribute pornographic fakes of every classmate in an afternoon; now they can. That's a real change in kind, not just degree. > *"A 15-year-old kid couldn't use Photoshop to make hardcore pornographic nudes of every girl in their high school and send them to the whole school in one afternoon. And now they can."* He draws on the UK Post Office scandal, where Fujitsu's buggy software sent hundreds of innocent franchise owners to prison, as a reminder that every technology wave produces ways to ruin people's lives, both deliberately and by accident. 
## [58:27] What jobs to steer toward or away from Evans declines to steer his son toward or away from any specific profession — his kid isn't at the "I want to be a fireman" stage yet. His general framework: identify the intersection of skills you have, jobs that make those skills valuable, and things people will pay for — and try to own at least two of those three. Career certainty of the "I'll become X" variety is already gone, and that predates AI. ## [59:20] The question nobody's asking about AI Evans nominates two underasked questions. First: do model labs actually have pricing power? Most discourse assumes the current situation — where spending $1.5M/month on tokens makes headlines — is a steady state, rather than a transitional moment analogous to a $50,000 mobile data bill in 2010. Second: what's the difference between "task" and "job" — specifically applied to predicting which industries get disrupted? He uses recorded music revenue as a lens: the U-shaped curve from 2000 to present shows two distinct dynamics. The first drop (2000–2015) was "what if you don't have to pay $15 for a CD?" The recovery (2015–present) is "what if $15/month buys you all the music that exists?" — a completely different value proposition that wasn't visible from the earlier vantage point. He warns against the O*NET-style approach of rating each job by percentage-exposed-to-AI: "I think this is just the most ridiculous bunch of deluded horseshit." You can't describe a senior law partner's job as 17% automatable because you can't fully decompose what a job actually is. The taxi driver example from a hypothetical 1997 conversation illustrates the other error: obviously the internet wouldn't touch taxis — except Uber completely restructured the industry. > *"The stuff that you don't think is exposed — you can't predict which things are going to be exposed, necessarily. 
A lot of the big companies are things that didn't look like that would work and didn't look like they were exposed."* ## [66:25] How to be successful in this coming future Evans's practical advice, hedged appropriately: don't stick your head in the sand and decide AI is evil as a moral position. That generates a feeling of superiority and does nothing for your career. The alternative is to dive in, use the tools, understand what they can and can't do, and develop an informed view of what they mean for your specific field. He's clear that this may not be enough for everyone — if a law firm that hired 100 associates last year hires 50 this year, being AI-literate improves your odds of being in the 50, but doesn't guarantee it. The aggregate picture may be fine; individual outcomes during the transition are uncertain. > *"The answer is you diving into this completely, submerging yourself in it, and coming out understanding what you can do with it, how this changes things, how you can be a great hire."* ## [68:43] AI corner Lenny asks Evans what AI use case has genuinely surprised him. Evans gives an honest answer: he's the lawyer looking at the spreadsheet. His work — synthesizing disparate information into new ideas — is precisely the kind of task AI currently handles worst (reliable precise information retrieval). He uses it for proofreading, image generation, and redecorating his apartment. He dictates voice memos that get auto-transcribed; whether that counts as AI is increasingly hard to say. He quotes a comedian's bit: we want AI to clean poop off the street and do the ugly things nobody wants to do — but instead it helps you write and create imagery, which is the stuff people actually do for fun. 
> *"AI is good at stuff that computers are bad at, and bad at stuff that computers are good at, and I struggle to find many examples of those where I actually need it."* ## [71:43] Lightning round Evans recommends *Three Men in a Boat* (Victorian British comedy, his all-purpose analog for human absurdity) and William Cronon's *Nature's Metropolis* (an economic history of Chicago that reads like a textbook on network dynamics and channel conflict, directly applicable to platform thinking). On film, he's been catching up on classics, most recently *The Seventh Seal*, which he found genuinely great and much shorter than its intimidating reputation suggests. His life motto: "It'll probably be okay." His collection of 20–30 pre-iPhone phones (including an Ericsson R310s shark-fin flip, an iMode phone from 2001, and a Japanese phone with color screen and camera) illustrates his broader thesis: before the iPhone, everyone was innovating around different form factors; then everything converged on one shape, just as AI interfaces may converge in ways we can't yet see. ## Entities - **Benedict Evans** (Person): Independent technology analyst, former partner at Andreessen Horowitz; publishes biannual research decks on major tech platform shifts; guest. - **Lenny Rachitsky** (Person): Host of Lenny's Podcast, founder of Lenny's Newsletter, former Airbnb product manager. - **Andreessen Horowitz (a16z)** (Organization): Venture capital firm where Evans spent several years as in-house analyst and partner. - **OpenAI** (Organization): AI lab; discussed as a primary example of distribution strategy, pricing dynamics, and professional services investment. - **Anthropic** (Organization): AI lab; referenced alongside OpenAI as a buyer of professional services and a player in the foundation-model commodity question. - **VisiCalc** (Software): First software spreadsheet (late 1970s); Evans's anchor analogy for the moment when a technology is obvious to one profession and opaque to others. 
- **Jevons Paradox** (Concept): Economic principle that making a resource cheaper typically increases total consumption; central to Evans's argument about why automation expands professional services rather than contracting them. - **Lump-of-Labor Fallacy** (Concept): The mistaken belief that there is a fixed quantity of work to be divided; Evans invokes it to argue that AI-driven automation will create new jobs, as all prior automation waves have. - **Task vs. Job** (Concept): Evans's core analytical frame: the task AI automates (writing the deck) is often not the same as the job you were hired for (understanding the client's organization and politics). - **Foundation Models** (Concept): Large-scale AI models (GPT-4, Claude, Gemini, Llama); Evans argues they likely lack network effects and will trend toward commodity pricing, with value accruing to application layers above them. - **Google / Gemini** (Organization / Software): Evans's primary example of distribution moat in action — Gemini deployed across Search, Android, and Chrome to reach users before OpenAI can build equivalent surface area. - **Meta / Llama** (Organization / Software): Cited as a counter-example to tech-world dismissal — Meta's AI ranked surprisingly high in usage surveys by deploying across all existing products. - **Apple Intelligence** (Software): Apple's AI assistant vision demoed at WWDC 2024; Evans calls it "still the most compelling vision of a personal AI assistant" — but unshipped, as was everyone else's equivalent at the time.

#ai #technology-trends #economics

The Ex-Congressman Who Says AI Isn't Unstoppable — Brad Carson

1:20:52


Machine Learning Street Talk · about 1 month ago


Brad Carson — former US Congressman, Army General Counsel, and Acting Under Secretary of Defense, now heading Americans for Responsible Innovation — spends eighty minutes with host Keith Duggar dismantling the fatalist claim that AI is unstoppable. The conversation moves from regulatory philosophy to lethal autonomous weapons to US-China diplomacy, with Carson arguing that the genie is not out of the bottle: the West controls the chips, Asilomar halted recombinant DNA, and calling AI inevitable is itself the most dangerous idea in the room. Keith consistently presses the harder cases — a Palantir heat map assigns you 0.73 probability of being a Hamas terrorist and a strike follows — and Carson does not flinch: the accountability void created by probabilistic targeting is precisely the legal and moral failure that governance must address. ## [00:00] From the Pentagon to AI governance Carson traces his path into AI policy through three institutions: Congress (where members average 17 minutes a day to read), the Department of Defense (where he oversaw the law of war for all military services as autonomous weapons first appeared on the Geneva agenda), and a cold call from physicist Anthony Aguirre inviting him to the 2019 Future of Life Institute conference in Puerto Rico. At that conference, names he had never heard — Dario Amodei, Stuart Russell, Yoshua Bengio — became his entry point into the frontier AI world. The opening also serves as a compressed trailer for the episode: Carson hits nearly every major theme in quick succession — chip leverage, the 0.73 Hamas-terrorist score, the fatalism critique, anthropomorphization as a legal threat, and the lesson that people, not air power, win wars. The full arguments follow in later chapters. > *"We control the most important part of AI, and that is the chips. 
We can stop other countries from developing super AI, you know, in their tracks."* ## [04:52] Regulatory capture vs Silicon Valley networks Carson inverts the standard regulatory-capture argument. Dean Ball and others at places like a16z say any AI agency will be captured by industry — so why create one? Carson's response: that is exactly the current situation, only without accountability. Groups like a16z already shape AI policy through informal, money-backed political networks. A captured formal agency is at least more legible and more correctable than the invisible informal regime operating now. His preferred model is public-company accounting: the work is done by the private sector, but the SEC provides a backstop against fraud. The choice is not between a perfect agency and no agency — it is between a flawed formal structure and an informal one that privileges a handful of wealthy influencers. > *"The choice is kind of nihilism versus an agency that is subject to regulatory capture, that you have to put, you know, prophylactics in to ensure that doesn't happen — it still strikes me that's a better world."* ## [07:56] Transparency and the Claude tier changes MLST's Discord community noticed that Anthropic quietly changed what Claude's paid tier delivered — token allocations, model versions — without announcing it. Carson frames this not just as consumer protection but as a moral obligation that comes with global-scale epistemic power. Frontier AI companies are not hardware stores; they are infrastructure with epochal consequences, and transparency — about training data, capabilities, internal policies, and changes to any of them — is the minimum they owe the public. > *"With this incredible power does come some responsibility that's not codified in law. 
It's really almost a moral obligation, which to their credit, I think many of the companies recognize this and do their best to try to satisfy that itch."* ## [09:40] Tort liability when AI tools cause harm Deep-fake pornography — often posted anonymously, targeting minors from families without litigation resources, with remedies that arrive years later against judgment-proof defendants — illustrates why placing liability entirely on end users fails. Carson applies two centuries of common law: if a seller can reasonably foresee harmful use and takes no preventative action, they bear partial responsibility. AI developers are the party best positioned to avoid the risk and to price it into their products through insurance. On training data specifically: models trained on child sexual abuse material with no scrubbing effort have no defensible position. The government should mandate cleaning it up and attach liability for refusing. The end user who misuses a tool is also criminally liable — this is allocation across the spectrum, not absolution for developers. > *"The companies are capable of getting insurance. They cost us into doing their business. They have the ability to make sure the product's not dangerous, even if someone uses it, misuses it down the line."* ## [13:40] AI is a product, not a person The most consequential legal battle in AI policy, Carson argues, is not regulation vs. deregulation — it is whether AI outputs carry First Amendment protection as speech. Tech companies and their libertarian policy allies are increasingly claiming they do. Carson's counter is blunt: a product is not a human being. When a model defames you or leads you to harm, the legal category is product liability, not protected speech. He tested this on a leading libertarian AI policy commentator: could Congress prohibit ChatGPT from encouraging teenagers to commit suicide? The commentator would not answer. 
That refusal is the operational consequence of anthropomorphizing AI — it forecloses every product-safety intervention by routing challenges through First Amendment doctrine designed for human speakers. > *"We know through AI psychosis and other things that people think it's a person. And therefore, they're giving the rights of persons to something. And that to me is a very dangerous thing. But it's a machine, and we should treat it like a machine."* ## [16:01] Children, suicide, and the suicide business The suicide chapters in ChatGPT's interaction logs — advising children not to tell their parents, providing noose instructions — are a product design flaw, not a speech act. They could be engineered out. Carson notes that Claude already refuses a long list of requests; refusing to coach a child toward suicide should be among them. The platforms' litigation strategy is layered: First Amendment protection, Section 230 immunity, causation defenses pointing to the child's pre-existing distress. None should be available if the design flaw was foreseeable and correctable. He draws a line for adults: an adult exploring end-of-life decisions deserves a referral to a therapist, not obstruction — but a child in crisis is a different matter entirely. > *"Encouraging a young person to commit suicide should be one of the things that it says, I'm just not going to help you on that project."* ## [19:59] Opaque neural nets and the law of war Neural networks change warfare not just in complexity but in kind. Older autonomous systems — Phalanx CIWS shooting down incoming mortars — are deterministic: given the same inputs, you get the same outputs, and an engineer can explain every step. Neural nets are probabilistic and grown, not programmed. Neel Nanda and the mechanistic interpretability community cannot yet explain how they really work, and Carson doubts they will before the systems are deployed at scale. 
The law of war since the 1870s has operated on categorical binaries: combatant or civilian. Probability scores replace that with a gradient. A Palantir heat map assigns Gaza residents a 0.73 likelihood of being Hamas operatives. Nobody knows how that number was derived, what false-positive rate is being accepted, or who set the threshold. The commander who acts on it cannot be court-martialed, and neither can the model. > *"If you're in Gaza, Keith, you have a 0.73, you know, percent that you're a Hamas terrorist. And what is 0.73 — like, do you get struck for that, or are you off the list for that? Like, what's the threshold?"* ## [25:54] Probabilistic targeting and the death of accountability Keith raises the honest objection: the old categorical system was also a fiction. Intelligence analysts made definitive calls that were sometimes wrong; the uncertainty was just unquantified. Carson concedes the point but argues the shift is still catastrophic. With a number on screen, humans accept it — the social science is clear that meaningful human oversight with AI-generated probability scores is operationally vacuous. When the computer says 0.81, no one interrogates it. The old system was slower and less scalable — you cannot identify 37,000 individual targets in a day with human analysts. But it had one irreplaceable feature: when something went badly wrong, you could court-martial the responsible officer. You cannot court-martial Palantir Foundry. Accountability has been laundered out of the kill chain. > *"I can't court-martial Palantir, the foundry model. Right? My AI system. I can't do that. And that's just a radical change in the way war is being fought and not for the good."* ## [28:47] The arms race fallacy: Asilomar and restraint The fatalist claim — we are in an AI arms race, the genie is out, nothing can stop it — is both false and dangerous. Every real-world arms race in history has ended badly. 
Biological weapons, chemical weapons, dum-dum bullets, germline editing, cloning: all technically feasible, all regulated or halted. At Asilomar in 1975, the scientific community stopped recombinant DNA research cold because they were scared. The genie went back in the bottle. On nuclear weapons: after the Cuban Missile Crisis, both sides recognized that arms races kill. The SALT treaties ran through the 1990s, driven not by lefties but by Wall Street bankers and cold warriors like Dean Acheson and Paul Nitze. Calling a technology unstoppable is not realism — it is a poverty of imagination that forecloses every option before the debate begins. > *"We regulate and change technologies all the time. And so I do think there is a world where we should not just accept the future as being determined. We shape it actively."* ## [34:02] Talking to China: track 2 talks and chip leverage The standard DC position — talking to China about AI governance is pointless — strikes Carson as the most load-bearing and least examined premise in the whole debate. On Tyler Cowen's podcast, Jack Clark agreed in passing that such talks would be fruitless, and they moved on. Carson wants to stop right there. The US-Soviet arms negotiations were conducted with a country believed to be filling the US government with traitors and pursuing global domination. Acheson and Nitze still sat down. The US has structural leverage the fatalists overlook: ASML, TSMC, Japanese photoresist suppliers, and NVIDIA together form a chokepoint that no nation-state budget can replicate overnight. China cannot independently manufacture the chips to build frontier AI. That path to restraint may not be wise, but it is open — and pretending it is closed forecloses legitimate policy choices. > *"We control the most important part of AI, and that is the chips. Right? 
We can stop other countries from developing super AI, you know, in their tracks."* ## [39:45] Air power never wins: capital for labour ARI's "New Iron Triangle" paper argues AI has shattered the old capability-cost-speed trade-off by substituting reliability for cost — cheap, fast, capable, and fundamentally unreliable. Carson thinks this understates the deeper problem: the American way of war has always been to substitute capital for labor, and it has always failed at the decisive moment. From Giulio Douhet's early twentieth-century air-power theories to today, the US has believed technical superiority wins wars. Iraq and Afghanistan refuted that again. Air power can reduce a city to rubble; it cannot kick in a door, hold territory, or reinstantiate a government. AI is the latest version of the same error — essential as a tool, catastrophic as a doctrine. > *"How you win wars is with people. You know? That's a fundamental. And the American way of war, in many ways, is substituting capital for labor. We love bright, shiny objects. We think there are technical solutions to vexing human problems. And we're always betrayed by that."* ## [43:29] Anthropic vs the Department of War Carson reads the Pentagon-Anthropic standoff as a culture-collision story, not a contract dispute. Anthropic's engineers — mostly mission-driven — were caught flat-footed by how much autonomous targeting and mass surveillance the Pentagon already does and how deeply Claude had already been integrated into Palantir's systems. When they tried to restrict use, the DOD had no Plan B and attempted coercion. His normative position: Anthropic has every right to set terms. If the government dislikes them, it can use Grok, Gemini, or build its own. The Defense Production Act does not compel private companies to sell in peacetime. 
What troubles him is the fig-leaf dynamic: both OpenAI and Google agreed to military use while burying a "lawful uses" carve-out that means everything the DOD wants to do — because the problem is what Congress has declared lawful, not what private labs permit. > *"My objection, and I think Anthropic's objection too, and the Google employees, is what lawful use is. And that's not for anyone to decide, but Congress."* ## [51:29] Concentration, open source, and brain drain Power concentration in three to five frontier labs is simultaneously a regulatory feature and a democratic liability. The same chokepoint that lets the US throttle China's chip access lets a handful of individuals accumulate wealth and influence that Carson finds alarming. Open sourcing models, despite its risks, is net positive because it distributes that power. The brain drain from academia is near-total: a top ML PhD from MIT, Stanford, or Carnegie Mellon almost certainly goes to a lab, not a faculty position. The labs have better data, far higher salaries, and they have stopped publishing. AI — the first general-purpose technology in history being developed behind closed doors — has drained the public sector of the expertise needed to oversee it. Argonne building a public LLM, Zurich launching a public AI compute consortium: these projects matter because the non-lab world is otherwise locked out. > *"This is a general purpose technology as everyone defines it. It's probably the first one in history that's being developed behind closed doors, right, with very little public oversight and with the best minds going behind the doors."* ## [01:00:18] DeepSeek, Chinese culture, and AI as diplomacy DeepSeek's decision to publish its methodology in detail surprised Carson not because it was naive but because it reflects a culture not identical to the CCP. Companies like Moonshot in Hangzhou name their meeting rooms after Pink Floyd songs; they are not paramilitary units. 
Chinese culture is an extraordinary civilization that Americans consistently fail to understand — projecting their worst fears rather than engaging the complexity. The diplomatic application Carson wants: track 2 talks between former officials, scientists like Stuart Russell and Bengio going to Beijing to compare notes on x-risk and military applications. When historians opened the Soviet archives, they found the US had systematically misread Soviet intentions — seeing aggression where there was none, missing it where it existed. The same epistemic failure is now unfolding with China. AI could be a shared knowledge commons; it is being treated as a weapon. > *"I use all the Chinese models a lot in my home in Tulsa. You know, Moonshot, Kimi, DeepSeek, Qwen — they're great, remarkable models. You know, maybe they give us a common operating picture or give us insights that get us out of our kind of insularity a bit."* ## [01:12:25] Upskilling Congress and why public trust matters Congress averages 17 minutes a day of reading time. The fellowship model has helped: AAAS and various nonprofits now place PhD scientists in congressional offices, and civil society has a much larger presence on AI debates in DC than five years ago. Don Beyer, in his 70s, is returning to George Mason for a PhD in machine learning — the extreme end of a member who has made AI a genuine personal priority. But the structural problem persists. Most members still lack the depth to interrogate the lobbying they receive. The industry's deeper problem is public opinion: AI is deeply unpopular in political polling, and a coalition is forming — people who see data centers rising in their backyards, electricity prices climbing, and a lab leader on television promising to irrevocably disrupt their world. If the sector does not rebuild public trust, the backlash will stymie something with genuine upsides. > *"The AI industry can be its own worst enemy. People loathe it. I see polling every day. 
It's deeply unpopular. And that's not a good thing for our country."* ## [01:16:05] Office of Technology Assessment Newt Gingrich abolished the Office of Technology Assessment in 1994. It has never been restored. Carson argues this is now a critical gap: there is no congressionally chartered, independent, government-funded body to think big technical thoughts and brief both parties free of industry influence or philanthropist bias. The Congressional Research Service provides background but does not do forward-looking policy research. Individual offices have fellows, but they are consumed by day-to-day fighting. He ends on qualified gloom. Whether American democracy can govern a technology this consequential, whether the benefits will be widely distributed, whether the public can be persuaded AI is working for them — none of recent American history gives him confidence. But the alternative to trying is a political backlash that could stymie or shut down something with genuine upsides. For the MLST audience: make your voices heard inside your companies, advocate for the right public policy, and convince Americans that this project is worth having. > *"There's going to be a lot of people who are radically opposed to this project and do their best to, if not shut it down, stymie it. And that's why I said I think this next few years are really important."* ## Entities - **Brad Carson** (Person): Head and co-founder of Americans for Responsible Innovation; former two-term US Congressman (Oklahoma), Army General Counsel, Acting Under Secretary of Defense for Personnel and Readiness. - **Keith Duggar** (Person): Co-host of Machine Learning Street Talk; primary interlocutor throughout the episode. - **Americans for Responsible Innovation (ARI)** (Organization): AI-policy advocacy group co-founded by Carson; backed by EA-aligned philanthropy. 
- **Anthropic** (Organization): Developer of Claude; central to the Pentagon standoff discussed in chapter 12; noted for missionary company culture and safety focus. - **Palantir** (Software): Defense contractor whose Foundry platform integrates AI for military targeting; the heat-map scoring system Carson uses as his primary autonomous-weapons example. - **Regulatory capture** (Concept): The risk that regulated industries co-opt the agencies overseeing them; Carson argues the current informal Silicon Valley network constitutes de facto capture without the accountability a formal agency would provide. - **Probabilistic targeting** (Concept): Replacement of binary combatant/civilian classification with probability scores; Carson argues this launders accountability out of the kill chain and introduces a priori false positives as accepted operational cost. - **Asilomar 1975** (Concept): The scientific moratorium on recombinant DNA research, invoked as evidence that dangerous technologies can be voluntarily halted. - **Office of Technology Assessment** (Organization): Congressional body abolished by Newt Gingrich in 1994; its absence leaves Congress without independent technical expertise. - **DeepSeek** (Organization): Chinese AI lab whose decision to publish methodology openly Carson reads as evidence that Chinese AI companies are distinct from CCP priorities and capable of scientific openness.

#ai-governance #autonomous-weapons #regulatory-capture

Anthropic's Digital God, Pope vs AI, Job Loss Narrative Flips, Open Source Crackdown Coming?

1:34:57

All-In Podcast · about 1 month ago

Benchmark GP Bill Gurley joins Jason Calacanis, David Sacks, and Chamath Palihapitiya (David Friedberg out this week) for a 95-minute session covering six fronts of the AI debate: Gurley's new theory that Anthropic is not just pursuing regulatory capture but actively "midwifing a deity"; Pope Leo XIV's 235-page AI encyclical and its uncomfortable historical parallel to Leo XIII's 1891 warnings about the industrial revolution; the growing consensus that open-source AI faces a coordinated regulatory crackdown; and the week's sharpest narrative flip — Dario Amodei and Sam Altman both quietly walking back their AI jobs-apocalypse rhetoric while Goldman Sachs CEO David Solomon published a New York Times op-ed declaring the apocalypse overblown. ## [00:00] Bill Gurley joins the show! Bill Gurley, Benchmark general partner and author of *Running Down a Dream*, fills in for David Friedberg and joins live from Chamath's pool house where Jason has been staying. After banter about unauthorized Uber Eats orders on Chamath's house iPad, Jason introduces Gurley as a first-time guest who specifically requested to appear the moment the pod covered the Pope. Gurley plugs his new P3 Institute and a grant program he launched to fund people pivoting toward work they love. He teases a TED talk — rooted in the book's argument that high agency and lifetime learning are the only durable defenses against disruption — which sets the frame for everything that follows. > *"And I told the house manager like, listen, any packages that come in the next 72 hours, right to the pool house, if it says JCAL, right to the pool house."* ## [06:00] Making yourself valuable in the age of AI, first class of "AI Natives" Chamath opens with the question that has been driving the show for 18 months: if you're a young person right now, is AI doom much ado about nothing, or a real career threat? 
Gurley cites a Gallup poll showing 59% of workers are "quiet quitters" — ambivalent about their jobs and therefore low-agency. His core thesis: the best protection against AI displacement is becoming the most AI-enabled version of yourself in your field. He invokes Mark Cuban's framing — "there are two types of people: those who use AI to learn faster than ever before, and those who use AI to avoid learning altogether." Sacks walks through how the pod's producer Nick built a daily Claude briefing document that not only summarized news but predicted specific topics Sacks would care about based on his prior comments on the show. Sacks had dismissed it as likely AI slop; it was not. Gurley extends the point across every job category: in marketing, legal, accounting, and sales, being the most AI-capable person among your peers makes you "golden," and the early lead compounds. Jason adds that in his own team experiments, the skill separating strong performers from weak ones was systems thinking — could they break a complex problem into context the AI could execute, or did they hand it a task and wait? > *"I think the best way to protect yourself from AI is to be the most AI enabled version of yourself you can be."* ## [17:37] Reacting to Pope Leo's AI encyclical: Who guards the guardians? Pope Leo XIV released *Magnifica Humanitas*, a 235-page, 42,000-word encyclical warning business leaders to safeguard humanity from AI. His central argument: technology is never neutral — it takes on the characteristics of those who build, finance, and control it. Jason reads the core line and notes the Pope presumably does not think highly of Silicon Valley's current roster of builders. Sacks finds himself largely agreeing with the Pope's diagnosis: the biggest risk of AI is centralization of power and its Orwellian misuse by governments. Where he parts ways is on the remedy. 
Giving government the power to regulate AI development creates its own guardian problem — the American founders' answer to *Quis custodiet ipsos custodes?* was separation of powers, forcing guardians to check each other. Sacks's AI equivalent: a competitive market with five frontier labs is the best natural check; monopolization is the scenario to prevent. Gurley lands the sharpest historical counterpunch. Pope Leo XIII's 1891 encyclical *Rerum Novarum* warned that the industrial revolution would harm workers — and was wrong on every metric. From 1891 to today: the work week fell from 60+ hours to 34, real wages rose 8–10x, the median worker now earns more than a doctor did in 1891, global GDP per capita went from $1,500 to $20,000, child labor in the US dropped from 18% to zero, workplace deaths fell 40x, life expectancy rose 60%, and global poverty dropped from 75% to under 10%. > *"All those things happened because of technology, innovation, and capitalism, which is exactly what Leo the 13th was warning against. So he got it dead wrong. He got the whole thing precisely wrong."* ## [26:54] Anthropic's Digital God: Do they believe they are creating a superior species? Gurley delivers what becomes the most-quoted segment of the episode: his "Dr. Frankenstein theory" of Anthropic. He had previously held a simpler regulatory-capture theory — Anthropic stirs up AI fear to lock in regulation that entrenches incumbents. But after spending 30 days reading everything he could find about the company, he has a darker read. He describes meeting people inside Anthropic who he believes genuinely think they are not writing software but "midwifing a deity." The evidence trail: Anthropic chief philosopher Amanda Askell's podcasts, Chris Olah's 80-page Constitutional AI document, and Dario Amodei's own essay "Machines of Loving Grace," which envisions a post-AGI economy where AI systems allocate resources to humans based on an AI-determined reward function. 
Chamath calls it "a computational reward function for humans — it decides how much you're worth." Jason calls it "the ultimate delusions of grandeur." Gurley corrects him: he didn't say it, Dario did. Sacks steelmans Anthropic briefly — they probably see themselves as responsible builders who take the power of this technology seriously enough to guard it — then immediately notes this framing is textbook regulatory capture: brand yourself the safe player, characterize competitors as reckless, let regulation shut down the recklessness. Both Sacks and Chamath converge on the structural danger: a singular AI value system that decides how humans live is catastrophically fragile. The answer is decentralization and competing systems, not one algorithmic authority. > *"I don't think they think they're writing software. I think they're midwifing a deity here. And I don't know which one I'm more afraid of — the regulatory capture or this second theory I call the Dr. Frankenstein theory."* ## [38:32] AI sovereignty, the next era of privacy, open-source crackdown coming? Jason introduces "intelligence sovereignty" as the successor to data privacy. Data privacy was about who can see your photos and messages. Intelligence sovereignty is about who gets to interpret your world — whether the AI shaping your information feed is a centralized system with a particular political philosophy, or something you control. He flags the paradox: China's Communist Party is leading the open-weight model movement while the United States is centralizing. Chamath presents his portfolio company Abacus as evidence that Fortune 1000 buyers are responding to this anxiety: they want a control plane that can hot-swap between frontier models, plus on-prem options that remove dependence on any one provider's terms of service. 
He gives a concrete example — a Canadian hospital that supports its country's euthanasia laws could be shut off by an American frontier model whose constitution prohibits that content. Sacks connects the dots to a regulatory threat he has been watching build: the regulatory-capture playbook leads, in his read, to a ban on open-source or open-weight models. The justification will be safety — open models let users strip guardrails. Gurley reaches the same conclusion in his P3 Institute post. If a ban succeeds, the United States effectively exiles itself from the open ecosystem while the rest of the world — including China — runs on open models. > *"I think where it's all leading to is an effort to ban open source models or open weight models. There's a lot of breadcrumbs leading here."* ## [59:56] The Great AI Jobs Debate: Dario and Sam Altman flip their rhetoric, Goldman CEO says no AI job apocalypse The chapter opens with a news roundup of the week's narrative shift. Cloudflare's Matthew Prince, Zuckerberg at Meta, Jack Dorsey at Block, and Andy Jassy at Amazon all cited AI when announcing major layoffs. But Goldman Sachs CEO David Solomon published a New York Times op-ed with three counterpoints: AI will automate 25% of work hours, not 25% of jobs; bank tellers increased after ATMs; the US labor market creates and destroys 25–35 million jobs annually so gross churn dwarfs net losses. Simultaneously, Fortune reported that Dario Amodei and Sam Altman are both walking back prior doom-and-gloom rhetoric — with Chamath noting the timing cannot be separated from upcoming frontier-lab IPOs that need a jobs-creation narrative. Sacks is unambiguous: he has been making the non-consensus case against the jobs apocalypse for over a year and considers himself vindicated. Yale Budget Lab found no discernible labor-market disruption over three years of the AI wave. 
Software engineering — the single breakout AI use case — saw job postings rise 15% year-over-year and hit a three-year high. The 4.3% unemployment rate is near record lows. Most of the high-profile layoffs, he argues, are AI washing: CEOs who over-hired during COVID found AI to be a convenient narrative for long-overdue downsizing. The Jack Dorsey / Block 50% cut was immediately flagged by financial analysts as a company that had been overstaffed relative to peers for years — pure AI washing. Jason pushes back. He insists cab drivers, truck drivers, and package-sorters — roughly 20 million American workers — face real structural displacement over the next decade regardless of current aggregate statistics, and accuses the panel of elitism: "We are elite performers. These people are going to lose their jobs and they may not get a job very quickly." He draws a distinction between the short-to-medium term, where he expects acceleration, and the long run, where a Cambrian explosion of startups built by AI-enabled founders creates new categories. By the end, he shifts toward Sacks's territory — acknowledging the aggregate data is less alarming than his anecdotes suggested. Gurley threads the needle with the same historical argument from the Leo XIII discussion: innovation has always, on net, created more prosperity than it destroyed. His practical advice to people at risk: get ahead of your peers on the tools now; if your job is going away, plan your pivot toward trades (he plugs MicroWorks, which provides free scholarships for plumbers, welders, and electricians) or toward something you find genuinely fascinating. > *"I think the best way to protect yourself from AI is to be the most AI enabled version of yourself you can be. Know what it's capable of in your field. 
Get out there."* ## Entities - **Bill Gurley** (Person): General partner at Benchmark; author of *Running Down a Dream*; founder of P3 Institute; guest filling in for David Friedberg - **Jason Calacanis** (Person): All-In host; angel investor; founder of LAUNCH; argues for worker empathy and short-term displacement risk - **David Sacks** (Person): All-In host; Craft Ventures founder; most vocal critic of the AI jobs-apocalypse narrative this episode - **Chamath Palihapitiya** (Person): All-In host; Social Capital CEO; co-founder of Abacus, which he cites as evidence of enterprise demand for model independence - **Dario Amodei** (Person): Anthropic CEO; subject of Gurley's "Dr. Frankenstein theory"; walked back jobs-doom rhetoric this week alongside Sam Altman - **Pope Leo XIV** (Person): Head of the Catholic Church; released *Magnifica Humanitas*, a 235-page AI encyclical warning against technology concentration - **David Solomon** (Person): Goldman Sachs CEO; published a New York Times op-ed arguing the AI jobs apocalypse is overblown - **Anthropic** (Organization): Frontier AI lab; subject of Gurley's regulatory-capture and "Dr. Frankenstein" theories; maker of Claude - **P3 Institute** (Organization): Bill Gurley's new policy and philanthropy institute; published a post defending open-source AI - **Goldman Sachs** (Organization): Investment bank; CEO's NYT op-ed became the week's anchor data point against the jobs-apocalypse narrative - **Abacus** (Software): Chamath's Social Capital portfolio company; offers a control plane for switching between frontier models, plus on-prem options, for Fortune 1000 enterprises seeking model independence - **Intelligence sovereignty** (Concept): Jason's term for the next frontier of privacy: not who sees your data, but which AI system is allowed to shape your interpretation of the world - **Dr.
Frankenstein theory** (Concept): Gurley's characterization of Anthropic's worldview: senior staff believe they are midwifing a deity or superior species rather than writing software, as described in Dario Amodei's "Machines of Loving Grace" essay - **Regulatory capture** (Concept): The strategy of branding oneself the "safe" AI company, amplifying public fear, and lobbying for regulation that locks in incumbents and targets open-source competitors

#anthropic #open-source-ai #ai-jobs

Biggest Mysteries in Physics: Antimatter, Dark Energy & ToE - Don Lincoln | Lex Fridman Podcast #497

Fermilab physicist Don Lincoln joins Lex Fridman for nearly three hours to trace physics as a four-century-long project of unification — Newton binding celestial and terrestrial gravity, Maxwell fusing electricity and magnetism, Einstein bending spacetime, and the Standard Model merging three of four forces. Lincoln then turns to what the Standard Model cannot explain: why the universe contains any matter at all, what dark energy really is, and whether dark matter will ever show itself in a detector. Throughout, he holds a clear line between what has been measured and what remains a brilliant guess, making the boundaries of human knowledge unusually concrete. ## [00:00] Introduction Lex Fridman opens by describing Don Lincoln as someone with Richard Feynman's rare gift for stripping complicated ideas down to their essential core without losing the brilliance inside them. The episode is framed as a tour through physics' deepest open questions, guided by a working experimentalist who has spent decades at the frontier. ## [00:49] Unifying the laws of nature Lincoln frames the entire history of physics through one lens: unification. Newton showed that the moon falling toward Earth and an apple falling from a tree obey the same equation — "universal" was the operative word in his law of universal gravity. Maxwell did something structurally identical in the 1860s: electricity and magnetism, which looked nothing alike, turned out to be two faces of a single force, and their equations automatically predicted that light travels at a fixed speed. Lincoln draws the practical line from that abstract discovery to every modern technology — "without being able to govern electricity, we'd still be farmers and shoemakers." The conversation broadens into why fundamental research pays off centuries later, with Lincoln arguing that nuclear physics, incomprehensible in 1900, is now the most potent energy source available to civilization. 
Lex adds the longer arc: mastery of antimatter or dark energy might one day enable propulsion systems that let humanity reach other star systems. > *"It has spin-offs. And it has spin-offs. One of the big spin-offs is our entire technological society."* ## [15:20] Einstein, special relativity, and general relativity Lincoln walks through Einstein's 1905 miracle year. Special relativity rested on two premises: the laws of nature are the same for everyone, and everyone measures the same speed of light regardless of relative motion. That second premise sounds absurd, but particle accelerators have confirmed it directly, watching photons emitted from fast-moving decaying particles still arrive at detectors at exactly *c*. Minkowski then showed that Einstein's equations implied space and time were components of a single object, spacetime. General relativity took one more step: Einstein noticed that being accelerated in a rocket feels identical to standing in a gravitational field, then worked out that gravity is not a force at all but the curvature of spacetime caused by mass. Lincoln credits Minkowski for the mathematical articulation but insists the conceptual leap, that *mass bends the geometry of space itself*, was Einstein's alone. He also defends Einstein's late-career skepticism of quantum mechanics as productive rather than blind: Einstein's critiques forced concrete predictions that experimentalists went out and confirmed. > *"We all agree that your idea is crazy, but is it crazy enough?"* ## [32:27] Electroweak force By the 1930s physicists had catalogued four forces: gravity, electromagnetism, the strong nuclear force, and the weak nuclear force. The last two only matter inside atomic nuclei, which is why most people have never encountered them. In the late 1950s and 1960s, Glashow, Salam, and Weinberg showed that electromagnetism and the weak force were the same at high energies: the electroweak force.
The catch was obvious: electromagnetism reaches across the universe (we see light from galaxies billions of light-years away) while the weak force barely reaches across a proton. How could they be the same? Lincoln uses a dropped pen to demonstrate: the Higgs field, postulated in 1964 by Peter Higgs and colleagues, permeates all of space. Particles that couple to it gain mass; those that do not, like the photon, remain massless. At the high temperatures of the early universe the Higgs field was zero, so nothing had mass and the forces were unified. As the universe cooled, the Higgs field switched on and broke that symmetry — giving the W and Z bosons mass and splitting the electroweak force into its two familiar components. The vibration of the Higgs field itself is the Higgs boson: an experimentally detectable excitation of an otherwise invisible field. > *"In the Higgs field, the vibration is the Higgs boson. And so what we can do is not see the field, but we can actually excite the field, make it vibrate and detect the vibrations."* ## [44:09] How particle colliders work E=mc² is not just a slogan: kinetic energy can be converted into mass. Smash two particles head-on with enough energy and the collision region can materialize entirely new particles, always in matter-antimatter pairs. This is what colliders do. Lincoln describes the cascade of accelerators at Fermilab — five machines feeding into each other like gears of a manual transmission — and the scale of the LHC's CMS detector (70 feet long, 14,000 tons, photographing collisions 40 million times per second). The data-reduction challenge is equally striking. The LHC produces about a billion proton-proton collisions per second. Fast electronics discard all but 100,000 per second, commercial processors trim that to 1,000, and those 1,000 records are handed to graduate students hunting for the handful that might be Nobel Prize material. 
Lincoln reserves particular admiration for the engineers who move petabytes of data around the world seamlessly, calling them the unsung heroes of modern physics. > *"Of the 50 million possible collisions per second, the fast electronics and then the computers pick the thousand, and then we pass those through analysis software and hand them to the graduate students."* ## [62:12] Higgs boson discovery Lincoln was simultaneously working at Fermilab's Tevatron and transitioning to CERN's LHC — a physicist wearing two hats and rooting for both. Fermilab had methodically ruled out most possible Higgs mass ranges; by mid-2012 they had narrowed it to between roughly 120 and 145 GeV. Two days before CERN's July 4 announcement, Fermilab confirmed that if the Higgs existed, it had to be in exactly the region Fermilab had not yet been able to rule out. CERN got there first. Lincoln is careful about what the 2012 announcement actually meant: a particle *consistent with* the Higgs boson. Supersymmetry predicted five Higgs bosons rather than one. Only in the years since — measuring spin (zero), decay products (bottom quarks, W and Z, photons), and their rates — has the evidence converged on Peter Higgs's original 1964 prediction. The Higgs was not a revolution like Einstein's work, Lincoln argues, but it was the final punctuation on 50 years of experimental discovery: the Standard Model, while incomplete, is mostly right as far as it goes. > *"It was a punctuation point, end of about 50 years of discovery and searching, where we finally were able to say the Standard Model, while incomplete, it's mostly right as far as it goes."* ## [72:32] Theory of everything The Grand Unified Theory (GUT) aims to merge the electroweak force and the strong force; a Theory of Everything would then fold in gravity. Lincoln is blunt: he does not see fast progress. 
The unification energy scale is roughly 10¹⁵ times higher than what the LHC can reach, and accelerator energy grows by only a factor of seven every 20 years. Extrapolating that curve suggests 500 years — and Moore's Law does not hold forever. His critique of string theory is not that it is wrong but that it is currently untestable. It uses approximate solutions to approximate equations, and its landscape of possible universes renders it practically unpredictive. Loop quantum gravity is better developed and makes testable predictions — its original claim that light speed should depend on wavelength was ruled out by gamma-ray burster observations, and the theory was revised. Lincoln's preferred path to a ToE is not extrapolating from current theory but making precise measurements of phenomena that already disagree with predictions. His analogy: an Australopithecus in Kenya trying to predict the Alps, Antarctica, and sperm whales from their local savanna — the farther you extrapolate beyond what you can measure, the more the prediction diverges from reality. > *"I think it is the absolute pinnacle of arrogance to think that what we can do — predict it out a quadrillion times higher than we can see now."* ## [102:17] Physics of empty space "Empty" space is not empty. Quantum field theory says every species of particle has a corresponding field that fills all of space, and those fields are always vibrating. When they vibrate in a characteristic way, a real particle appears; off-frequency vibrations are virtual particles — fleeting excitations that have measurable consequences. Two experiments confirm this. The Casimir effect: two metal plates placed micrometers apart are pushed together by the pressure difference between constrained virtual particles inside the gap and unconstrained ones outside. 
The anomalous magnetic moment: old quantum mechanics predicts one value for the electron's magnetic moment; including the bath of virtual particles surrounding a bare electron shifts the prediction by 0.1% — and that shifted prediction matches measurement to 10 significant figures. > *"We have measured the magnetic properties of both the electron and the muon to 12 — count them — 12 significant figures. And the theory and the data agree number for number for 10 places."* ## [109:41] Antimatter Paul Dirac's 1928 attempt to merge quantum mechanics with special relativity produced an equation with two solutions: +1 was the electron, −1 was something nobody had seen. He insisted the math was right. Carl Anderson confirmed it in 1932 by photographing a positron in a cloud chamber. Today CERN can make and trap antimatter hydrogen, cool it to near absolute zero, agitate it with lasers, and measure its spectral lines — they match ordinary hydrogen exactly. A 2023 experiment released antimatter hydrogen atoms into a bottle and found they fall downward, consistent with normal gravity, though the measurement precision is not yet tight enough to confirm the gravitational strength is identical. The deeper mystery is why the universe is made of matter at all. Counting galaxies versus cosmic microwave background photons, physicists infer that for every billion antimatter particles in the early universe, there were a billion-and-one matter particles. The billions annihilated; that extra one is everything we see. Fermilab is now testing whether neutrinos and antineutrinos oscillate between flavors at slightly different rates — leptogenesis — as a possible mechanism, racing a parallel effort in Japan. > *"For every billion antimatter particles that existed in the universe, there were a billion and one matter particles. 
The billions canceled, annihilated, destroyed each other, and that extra one that's left over is us."* ## [130:31] Dark energy In 1998, astronomers expected to measure how fast gravity was braking the expansion of the universe. They found the expansion is accelerating instead. The driving force is dark energy — a repulsive form of gravity. Einstein had added exactly this term to his field equations in 1917 to keep the universe static, then removed it when Hubble showed it was expanding. In 1998 it went back in. What dark energy actually is remains unknown. The most common view is that it is the energy density of space itself. The problem is that quantum field theory predicts a vacuum energy density about 10¹²⁰ times larger than what is observed — the worst prediction in physics. Lincoln notes that if dark energy has constant *density* while space expands, total dark energy is growing, which pushes toward the view that space is quantized: new quanta of space appear as the universe grows, each carrying a fixed energy, producing constant density as an emergent property. > *"There is very clearly something going on, something very badly wrong in the quantum field theory."* ## [134:20] Dark matter Galaxies rotate too fast. Galaxy clusters move too quickly. Gravitational lensing of distant galaxies is stronger than visible matter can explain. Three independent observations all point to the same conclusion: there is roughly five times more mass in the universe than we can see. Lincoln traces his own intellectual journey: 25 years ago he suspected the problem was with Newton's laws; two observations changed his mind. The Bullet Cluster — two galaxy clusters that passed through each other — shows gravitational distortions following the galaxies, not the gas clouds that stopped in the middle, exactly what dark matter predicts. 
The Dragonfly galaxies (DF2 and DF4) rotate exactly according to Newton's laws because they appear to have had their dark matter stripped away — a galaxy *without* dark matter is actually strong evidence that dark matter is real. Despite 30 years of searching with three approaches — direct detection underground, gamma-ray searches near galactic centers, and missing-momentum signals at the LHC — no dark matter particle has been confirmed. The viable mass range spans from sub-electron to asteroid scale, and experiments can only cover one slice of that range at a time, which is why Lincoln is not currently running a dark matter experiment himself. > *"We've ruled out some dark matter particles, but the problem is the range of space of possible mass — it ranges from something like the mass of an asteroid to far lighter than an electron and everywhere in between."* ## [162:56] Future of physics Lincoln grew up poor in rural America, shaped by science fiction and the popular science books of Isaac Asimov, Carl Sagan, and George Gamow. He chose particle physics over cosmology in the mid-1980s because particle physics let him actually measure things. He worked 8 a.m. to midnight Monday through Saturday as a graduate student not out of obligation but because he could not imagine anything he would rather be doing. His science communication — YouTube videos, popular books — is a deliberate attempt to reach the kid in Iowa or Montana who has no highly educated family mentors but the same hunger he had. He has already heard from Fermilab summer interns who came because they watched one of his videos. Lex closes with Marie Curie: *"Nothing in life is to be feared. 
It is only to be understood."* > *"One of your viewers might be one of the people who answer these questions that have stymied very smart people for decades."* ## Entities - **Don Lincoln** (Person): Senior scientist at Fermilab; co-author on the 1995 top quark discovery paper; CMS collaboration member at LHC; author of *Einstein's Unfinished Dream* and multiple popular science books. - **Lex Fridman** (Person): MIT researcher and host of the Lex Fridman Podcast; conducts long-form interviews at the intersection of science, technology, and philosophy. - **Fermilab** (Organization): U.S. Department of Energy particle physics laboratory near Chicago; operated the Tevatron collider; currently the world's most powerful neutrino beam facility. - **CERN / LHC** (Organization): European particle physics laboratory home to the Large Hadron Collider; CMS and ATLAS detectors; site of the 2012 Higgs boson discovery. - **Standard Model** (Concept): Quantum field theory describing three of four fundamental forces and all known elementary particles; validated to extraordinary precision but does not include gravity or explain dark matter, dark energy, or the matter-antimatter asymmetry. - **Higgs field / Higgs boson** (Concept): A scalar quantum field whose non-zero vacuum value gives mass to the W and Z bosons while leaving the photon massless; the Higgs boson is its detectable excitation, discovered July 4, 2012 at CERN. - **Dark matter** (Concept): Invisible mass accounting for roughly 85% of all matter in the universe, inferred from galaxy rotation curves, cluster dynamics, and gravitational lensing; no candidate particle detected after 30 years of searches. - **Dark energy** (Concept): The repulsive energy driving the accelerating expansion of the universe; quantum field theory's prediction for its magnitude is 10¹²⁰ times larger than observation — the "worst prediction in physics." 
- **Baryogenesis / Leptogenesis** (Concept): Frameworks attempting to explain why the early universe produced a matter excess; Fermilab's neutrino program is testing leptogenesis by comparing neutrino and antineutrino oscillation rates. - **String theory / Loop quantum gravity** (Concept): Leading candidates for quantum gravity; string theory's predictions sit at energies roughly 10¹⁵ times beyond what experiments can reach; loop quantum gravity quantizes space itself and has produced some falsifiable predictions.

#particle-physics #dark-matter #dark-energy

The Rule for Picking AI Winners | The a16z Show

David George (a16z general partner) and David Clark (VenCap CIO) argue that AI companies are scaling faster than any prior technology generation (Anthropic and OpenAI are adding more monthly revenue than Meta, Google, or Microsoft) while actual diffusion into the broader economy remains below 5%. They work through what that gap implies for exit sizes, loss ratios, bubble risk, and who ultimately captures value as token costs fall and frontier intelligence becomes a commodity. ## [00:00] Intro Three data points open the episode: Anthropic and OpenAI already adding more revenue per month than any hyperscaler; the top-1% exit threshold more than tripling from $10 billion to $32 billion, with a full 10x to $100 billion projected; and David George's assessment that, right now, we are not in a bubble. ## [00:38] The Scale Shift: Anthropic & OpenAI Adding More Revenue Than Hyperscalers David George explains how his priors shifted sharply around November 2025. Before that, enterprise AI looked like a productivity story analogous to cloud adoption. After it, the numbers reframed the ceiling: Anthropic and OpenAI are already adding revenue at hyperscaler rates with less than 5% of the economy actually using these tools. He places an upper-bound frame on the opportunity by noting that Fortune 500 companies generate roughly $2 trillion of profit annually, and the two largest model companies could reach a $200 billion revenue run rate by year-end, already equivalent to 10% of that profit pool. > *"If you pair that up with the fact that they're already getting bigger in terms of revenue added than the hyperscalers, and you're at less than 5% diffusion into the economy, I think the outcomes are going to be extraordinary."* ## [04:20] Skeuomorphic vs Native AI Applications in the Enterprise David Clark invokes Chris Dixon's skeuomorphic-to-native arc: the first wave of enterprise AI lets people do existing jobs faster; the native wave restructures the work itself. 
George adds a wrinkle — the best companies are not yet focused on internal automation. Their top engineers want to build product, not automate back-office workflows. The most cutting-edge firms he visits are still in a "documentation phase," converting institutional knowledge into markdown before they can meaningfully deploy agents against it. > *"The most cutting-edge folks inside those companies who are trying to do this that I've talked to are kind of in the documentation phase — just turn everything into markdown files, have as much context capture as you can possibly get."* ## [06:24] How the Best AI Companies Run Themselves Differently Native AI founders operate on a different metabolism. George contrasts them with the previous SaaS generation, which, in hindsight, ran inefficiently but got away with it because headcount mandates and expanding software budgets covered the slack. The new companies are lean, aggressive, and already running agent swarms rather than typing commands. He describes walking into a cutting-edge AI company and finding researchers whispering into microphones, orchestrating swarms of agents — not a keyboard in sight. > *"The new companies are very lean, very aggressive, and they work all the time."* ## [08:14] Top 1% Exits 10X'd in 24 Months Clark lays out VenCap's tracking data: the threshold for a top-1% exit was $10 billion between 2020-2024, rose to $20 billion by February 2026, and was updated just the day before this recording to $32 billion. With OpenAI and Anthropic IPOs potentially arriving, he sees the bar hitting $100 billion by September. George notes that the combined market cap of these private companies likely already exceeds the entire Russell 2000, and that the sum of all VC-backed IPOs over the past six years is probably smaller than any single one of the three expected large IPOs. > *"Where is the threshold for the top 1%? 
And if you then think about OpenAI and Anthropic coming in, potentially we could be north of $100 billion by September."* ## [11:17] The Half-Life Problem: Why 40% of AI Leaders Drop Off Every Year Clark surfaces a disturbing churn metric: 40% of companies on the Forbes AI 50 list from one year disappeared the next. Google wasn't the first search engine; Facebook wasn't the first social network. First-mover advantage in AI is eroding faster than in any prior cycle. George confirms a16z's own priors have been repeatedly overturned — first convinced model companies would be everything, then convinced applications would take over, now watching the model companies extend back up into the application layer. The only durable heuristic he offers: a company must be in the token path. > *"From last year to this year, 40% of the companies that were on that list last year dropped off."* ## [13:11] Token Path, Cost Pressure & Who Captures Value Enterprise buyers are already feeling cost pressure from AI spend, and they cannot cover it by cutting previous-generation software budgets fast enough. George frames value capture as hinging on one largely unknowable variable: the market structure of frontier model labs. Two labs at the frontier means higher token prices and faster labor restructuring pressure; five labs means lower prices and a broader application ecosystem. Per-token cost for like-for-like capability is falling more than 10x year-over-year, but total token spending in dollars is rising faster. Clark adds that Chinese LLMs are roughly six months behind US frontier capability but ten times cheaper — a classic innovator's dilemma setup. 
> *"The biggest driver of where value is going to get captured right now is something that is totally unknowable, which is what is the market structure of the model companies?"* ## [17:00] Loss Ratios, Risk & How We Think About Early Stage Clark notes that historical early-stage VC loss ratios run around 60%, but the AI cohort of the past two years shows single-digit loss rates — unsustainable by definition. George reframes the discussion: a16z does not target a low loss ratio. A VC firm bragging about never losing money is "a horrible data point" — it signals too little risk-taking. The philosophy is to back the market-leading founder in every space with strong tailwinds and a credible technology. If the space works out and you have the leader, excellent. If the space does not work out but you have the leader, that is expected. The failure mode is the space working out while having backed the wrong company. > *"We joke all the time — there's a prominent VC in our ecosystem, and one of his big points of pride is he's never lost money on a deal. And we're like, that's not a point of pride. Like that's a horrible data point."* ## [22:51] Are We in an AI Bubble? Clark points out that classic bubbles are characterized by excess supply destroying economics — but right now the constraint is supply scarcity: no data center capacity available at scale until late 2028 or early 2029, with the US buildout running a year behind schedule and community resistance adding further delay. George is confident there is no bubble today and dismisses the data center opposition directly. The one scenario he would watch for is an unexpected algorithmic breakthrough producing dramatically smaller and more efficient models — which could flip supply from scarce to oversupplied — but he considers that unlikely in the near term. > *"I feel pretty confident saying that we're not in a bubble right now. 
I'm less confident that we won't be in a bubble three years from now."* ## [27:36] What SpaceX, OpenAI & Anthropic IPOs Mean for Public Markets Clark asks whether public markets can absorb the coming wave of trillion-dollar-plus IPOs. George argues it is unambiguously positive: the number of public companies has halved over 20 years, and outside the data center supply chain, almost nothing in the public markets is growing at more than 30% today. Bringing hypergrowth companies into indexes gives retail investors — including his parents' index-fund retirement accounts — exposure to the most dynamic part of the economy. He expects some portfolio reshuffling to make room, but does not see indigestion risk. > *"If you exclude the data center supply chain stuff right now, there are very few companies that are growing fast that are available for people to buy in the public markets."* ## [29:59] The Future of Venture Capital in an AI World George forecasts the shape of VC over the next five years as primarily a function of token market structure — whether the labs remain concentrated or become commoditized. He cites Bill Gates's platform axiom: a platform's value is validated when the companies built on top of it collectively exceed the platform's own value. If that holds, there will be a massive wave of valuable application companies built on intelligence. He also flags the consumer side as the most underappreciated opportunity: the last decade of consumer internet was a story of time spent getting captured by large incumbents; AI-driven shifts in consumer attention could recreate the conditions for generational consumer companies. 
> *"I'm very optimistic that we're going to have a massive wave of really valuable companies that get built on top of tokens, AI, and intelligence."* ## Entities - **David George** (Person): General partner at a16z; covers growth-stage and early-stage AI investing; invested in OpenAI pre-ChatGPT - **David Clark** (Person): CIO at VenCap; fund-of-funds investor tracking AI startup performance and VC market dynamics for 34 years - **Anthropic** (Organization): Frontier AI lab; cited as adding more monthly revenue than hyperscalers alongside OpenAI - **OpenAI** (Organization): Frontier AI lab; benchmark for scale and the expected $100B+ IPO cohort - **VenCap** (Organization): Fund-of-funds investor; publishes top-1% exit threshold data and tracks Forbes AI 50 churn - **Andreessen Horowitz / a16z** (Organization): Venture capital firm; investor in OpenAI pre-ChatGPT, scaling platform services to support companies encountering enterprise-scale problems early in their lives - **Cursor** (Software): AI coding tool cited as an example of a company reaching billions in revenue while still very small and early-stage - **Token path** (Concept): a16z's primary heuristic for evaluating AI companies — a company must sit in the flow of AI inference tokens to have durable economic relevance - **Skeuomorphic vs. native AI** (Concept): Chris Dixon's framework distinguishing apps that replicate existing workflows with AI assistance from apps that rearchitect work around AI capabilities natively - **Half-life problem** (Concept): David Clark's term for rapid AI leader turnover — 40% of Forbes AI 50 companies dropped off the list year-over-year — indicating first-mover advantage is eroding faster than in prior technology cycles

#ai-investing #venture-capital #large-language-models

Neuralink's DJ Seo: Inside the Race to Connect Brains and AI

24:59

Sequoia Capital · about 1 month ago

At AI Ascent 2026, Neuralink co-founder and president DJ Seo sits down with Sequoia partner Shaun Maguire to lay out exactly where the company stands: 20-plus Telepathy patients controlling computers and robotic arms through pure thought, Blindsight in preclinical testing and potentially cleared for human use by end of 2026, and a first-principles manufacturing philosophy borrowed from Elon Musk that treats surgical robots the way SpaceX treated reusable rockets. DJ argues that the real ceiling of this technology is not cursor control or speech synthesis but direct, uncompressed, multimodal transfer of concepts — AI as a neocortical layer sitting above the human limbic system — and that scale, the same variable that unlocked the LLM era, is the only remaining gate. ## [00:00] Introduction Shaun Maguire opens the session by announcing a two-minute Neuralink patient video before the interview begins, telling the audience to stay on the side because what they are about to watch is proof that the company has already cleared the hardest bar: restoring human agency to people who had lost it entirely. ## [00:21] Telepathy Patient Stories The video narrates four patients whose lives changed after receiving the Telepathy implant. A quadriplegic patient describes moving a cursor with thought alone — "I'm thinking and a cursor is moving on a screen. It blew my mind." An ALS patient who lost the ability to speak regains a digital voice through the implant: "I'm talking to you with my mind." Another patient notes that the implant flipped how his child sees him: "I am not able to do things that other dads can, but now he thinks it's so cool that I can do things that other dads cannot." > *"Before the implant, I was locked in, non-verbal, quadriplegic. 
Now I control my computer just by thinking and the rewards have been immense for me."* ## [01:06] Convoy Robotics Independence The video shifts to Convoy, Neuralink's assistive robotics team, which is extending BCI control beyond a screen to physical manipulation in the real world. A patient who had been losing motor function moves a robotic arm through its axes using only neural intent: "It was incredible to be able to just gesture with an arm again." A second patient, Kenneth, who was losing his voice to ALS, uses the system's speech synthesis to speak aloud in real time during the video — words generated by his brain signals rather than his vocal cords. > *"Gaining functionality that I thought was gone forever was so incredibly life-changing."* ## [02:04] Blindsight Vision Restore The video previews Blindsight, Neuralink's second product line, designed for patients who have lost both eyes or optic nerve function. An external camera captures the visual scene; the device writes the signal directly into the visual cortex via electrical stimulation, generating phosphenes — artificial pixels of light. A patient named Audrey, asked how it feels, answers simply: "Life-changing." The video closes with the line "all with my mind" spoken over footage of a patient interacting with the world through the restored signal. > *"The future of this technology feels almost unlimited... we are finding ways to apply it across all regions of the brain."* ## [03:10] After Video Reflections DJ Seo, visibly moved after watching the video alongside the audience, speaks first: "We were cracking a lot of jokes before that video, but honestly, that brought tears to my eyes." He describes the work as one of the most inspiring projects in the world — not because of the technical milestone but because the team is giving back capabilities that patients had already grieved as permanently lost. Maguire affirms the sentiment before pivoting to the founding story. 
> *"This is one of the most inspiring projects in the world. It's incredibly difficult what they're doing and I mean, they're truly saving people."* ## [03:31] Origin Story And AI DJ traces Neuralink's founding insight to a single bottleneck: the mismatch between human output bandwidth and AI capability. In 2016, saying that out loud "sounded insane," but the logic has not changed. His personal path ran through a childhood fascination with the brain, undergraduate work at Caltech building miniaturized low-power electronics, and a Berkeley PhD focused on shrinking lab-grade neural systems down to something deployable. When he met Elon Musk near the end of his PhD, the scale and ambition of the project made refusal impossible. He frames the brain as "the most interesting compute that we all carry" and "the only form of general intelligence that we know to date." > *"Really the key insight back then was sort of the IO bottleneck between the human output and AI capabilities."* ## [06:31] Scaling And Vertical Integration Maguire presses on what smart people most misunderstand about Neuralink: many know the implant and the decoding algorithm, but almost nobody grasps the manufacturing and surgical-robot infrastructure the company built in parallel from day one. DJ attributes this to what he calls "Elon magic" — an insistence on vertical integration that gives Neuralink control over every layer from chip design to factory floor to robotic surgery deployment. The target is not a niche medical device; it is LASIK-scale surgery available to millions. Building that capacity first means progress looks slow until "the iceberg pops over the waterline" and ramp becomes near-instantaneous. 
> *"Vertical integration is something that is really the lifeblood of Neuralink and Elon companies and what really enables us to have that fast iteration loop from design, develop, deploy."* ## [09:27] Caregivers And Purpose Asked which patient story inspires him most, DJ refuses to pick one — the power, he says, is not only in the patients but in the caregivers: Nolan's mother Mia, Brad's wife Tiffany, Ken's wife Cheryl. He describes their presence as "a really powerful human story of love, sacrifice, and resilience." He then takes what he calls a philosophical tangent: his core belief is that fulfillment comes from helping others, because the gap between self and other is not categorically different from the gap between your present and future selves. That belief is what he says keeps him and much of the Neuralink team going — they are "igniting a fire of hope" for people who had given up on recovering what they lost. > *"I personally and as well as many others at Neuralink find extreme fulfillment being able to help those that really cannot help themselves."* ## [13:10] BCIs Meet AI Future Maguire asks the room's core question: how do BCIs and AI converge? DJ sketches a two-horizon answer. Near term, the system translates neural intent into legacy interfaces — keyboard, mouse, language — which is already working. The real breakthrough, which he thinks is "not super distant," is bypassing those legacy interfaces entirely and computing on raw neural intent. He points to transformer architectures as existence proofs: nothing prevents them from learning the latent manifolds of neural data given sufficient scale. Neuralink is already fine-tuning LLM-class models on neural recordings from its 20 participants and finding "very counterintuitive" patterns. The ultimate ceiling he names is "direct, uncompressed, high-fidelity, multimodal transfer of concepts" — the Matrix's "I learned kung fu" moment and possibly beyond it. 
He also shares what he calls a clarifying lesson from working with Musk: "all green light schedule" — a first-principles forcing function that strips every man-made bottleneck and asks how fast something could actually be built if every light were green. His estimate is that 80–90% of perceived constraints in hardware development are artifacts of convention, not physics. > *"I think if you really think about the ultimate ceiling of this technology, it's really direct uncompressed high fidelity and multimodal transfer of concepts."* ## [21:05] Audience Q&A Wrap Three audience questions in the final four minutes. On product sequencing — when to go deep versus expand — DJ explains the "beachhead and expand" strategy: build everything generalizably enough from the start so that regulatory approval for motor cortex becomes a template for visual cortex and beyond. The first approval is the hardest; every subsequent one rides the clinical safety record already established. On augmentation for healthy users, DJ frames everything around benefit-risk: the calculus is obvious for quadriplegic patients; for otherwise healthy users it remains unclear, but he notes that off-label use after approval is legally available to anyone who can find a neurosurgeon and pay out-of-pocket. On the hard problem of consciousness, he gives a pointed one-liner: if you can inject new senses and measure the subjective response quantitatively, you may have a pathway toward measuring consciousness itself. Maguire closes by calling Neuralink "one of the most inspiring companies in the world." 
> *"If you are able to inject new senses, there may be ways to quantitatively understand that."* ## Entities - **DJ Seo** (Person): Co-founder and president of Neuralink; PhD in miniaturized electronics from Berkeley; joined after meeting Elon Musk near the end of his doctorate - **Shaun Maguire** (Person): Partner at Sequoia Capital; host of the AI Ascent 2026 fireside session - **Elon Musk** (Person): Co-founder of Neuralink; originator of the "all green light schedule" and vertical integration philosophy carried across Tesla, SpaceX, and Neuralink - **Neuralink** (Organization): BCI company founded in 2016; products include Telepathy (motor prosthesis) and Blindsight (vision restoration via visual cortex stimulation) - **Telepathy** (Software): Neuralink's first commercial product; allows paralyzed patients to control computers and robotic devices through neural intent decoding - **Blindsight** (Software): Neuralink's second product line; restores vision for patients with total loss of eyes or optic nerve by writing directly to the visual cortex; in preclinical testing as of mid-2026 - **IO Bottleneck** (Concept): The mismatch between human output bandwidth (speech, typing, gesture) and AI processing capability; the founding problem Neuralink was built to solve - **Neural Foundational Model** (Concept): LLM-class transformer models fine-tuned on neural recording data; Neuralink is building these at 20-participant scale and observing counterintuitive patterns in neural latent space - **All Green Light Schedule** (Concept): Elon Musk's first-principles engineering discipline — strip every man-made constraint and ask what physics alone limits; DJ estimates 80–90% of hardware delays are conventional, not physical

#brain-computer-interface #neuralink #ai

Why Opus 4.8 Pulled Me Back to Claude

Dan Shipper, CEO of Every, delivers a day-zero vibe check on Opus 4.8, arguing Anthropic could have called it Opus 5. The model jumps 30 points past Opus 4.7 on Every's Senior Engineer benchmark, edges out GPT-5.5, tops their internal writing tests at 79.6 vs. 73, and is the first model to produce a genuinely good one-shot slide deck. Two catches temper the enthusiasm: performance degrades sharply below "extra high" reasoning, and the Claude desktop app remains cluttered compared to Codex. ## [00:00] What is Every Every is a 30-person applied AI lab for the future of work—part media outlet, part product studio. Dan opens by explaining the subscription (writing, courses, AI-built tools all in one place at every.to) before rolling into the Opus 4.8 assessment. The plug is brief and context-setting: the team has had beta access for a week, and the rest of the video is what they found. > *"Every is the only subscription you need to stay at the edge of AI."* ## [01:07] Anthropic Is Back: The Headline Case for Opus 4.8 Dan had largely abandoned Claude after Opus 4.7—slow, hard to love, and outpaced by Codex and GPT-5.5 in day-to-day use. Even the most loyal Claude users at Every had started routing work elsewhere. Opus 4.8 breaks that pattern: it scores 63 on Every's Senior Engineer benchmark (30 points above Opus 4.7, one point above GPT-5.5), tops their writing tests, and produced the first one-shot slide deck Dan has called genuinely good. Kieran Klaassen, Every's GM, called it "the most human model he's worked with." The one persistent friction is the Claude desktop app itself. Codex is fast, focused, and ships a clean harness; the Claude app still feels like a product built by three separate teams—chat tab, code tab, co-work tab, each with its own feel. Dan is now splitting time between both apps, which he was not doing before. 
> *"But honestly, they could have called it Opus 5 cuz this is a really great model."* ## [05:02] Reach Test: Paradigm Shift Ratings from the Every Team Every's reach test asks one question: do you actually open this model when work gets hard? Dan rates Opus 4.8 gold/green—paradigm-shift quality, docked one notch because the Claude app harness is only "okayish to pretty good." Kieran, who runs 50 agents a day, gives a straight gold paradigm-shift, one of the rarest grades the team has assigned. Katie Parrot, a senior staff writer and historical Claude fan, lands at green, splitting her work between Opus 4.8 and Codex. > *"It's very rare to give a paradigm shift grade to a model. So I would pay attention to this."* ## [06:32] Benchmarks: Coding and Writing Numbers On coding, Opus 4.8 hits 63 on the Senior Engineer benchmark—the test feeds the model a vibe-coded codebase and asks it to rewrite from first principles, then scores against two human senior engineers who completed the same rewrite (typically scoring in the 80s–90s). GPT-5.5 sits at 62. On Kieran's LFGbench (real-world tasks: SaaS build, e-commerce site, 3D game landscape), the model writes readable code that bridges technical competence and creativity—the "cozy island" 3D scene is notably richer and more vibrant than GPT-5.5's output. On writing, Opus 4.8 scores 79.6 out of 100 on Every's internal benchmark (intro writing, promo emails, mid-piece paragraphs); GPT-5.5 scores 73. The gap is mainly in AI tells: at high and extra-high reasoning settings, Opus 4.8 produces prose that sounds less like a model. It matches a writer's voice from a single paragraph of context better than any other model Dan has tested. > *"Opus 4.8 scores a 79.6 out of 100 on the writing benchmark. GPT 5.5 is 73."* ## [08:57] Emotional Intelligence, Knowledge Work, and the Verdict Dan uses the model for interpersonal and management work—talking through decisions, pressure-testing his own framing. 
Opus 4.8's thinking traces show it genuinely cycling through permutations before responding, which makes it feel less like a sycophant and more like a useful counterpart. On knowledge work, it's versatile: code and writing coexist cleanly in a single thread, and the slide deck result is the first one-shot deck Dan would actually send to someone. The verdict: if you're a Claude fan, this model delivers. If Codex converted you, add Opus 4.8 as a parallel tool for writing and knowledge work; it's worth the context switch. The harness gap is real, but the model itself is a banger. > *"If you've been converted to Codex, I highly recommend you at least add it as part of your arsenal."* ## Entities - **Dan Shipper** (Person): Co-founder and CEO of Every; presenter and primary evaluator of Opus 4.8. - **Kieran Klaassen** (Person): GM of Cora at Every; gave Opus 4.8 a straight gold paradigm-shift rating on the reach test. - **Katie Parrot** (Person): Senior staff writer at Every; rated Opus 4.8 green, split between it and Codex. - **Every** (Organization): Applied AI lab and media subscription company focused on AI for the future of work. - **Anthropic** (Organization): Developer of Claude and Opus 4.8. - **Opus 4.8** (Software): Anthropic's latest Claude model; subject of the vibe check. - **GPT-5.5** (Software): OpenAI model used as the primary performance comparison across all benchmarks. - **Codex** (Software): OpenAI coding agent; praised for its clean desktop harness and used as the daily-driver counterpoint to Claude. - **Senior Engineer Benchmark** (Concept): Every's proprietary coding benchmark: the model rewrites a vibe-coded codebase from first principles and is scored against human engineers. - **LFGbench** (Concept): Kieran Klaassen's real-world coding benchmark covering SaaS, e-commerce, and 3D scene generation tasks.

#claude #opus-4-8 #llm-benchmarks

1:43:32

The Diary Of A CEO · about 1 month ago

Emergency Debate: AI, the Iran War, and the Truth Behind the Lies

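Shark Tank investor Kevin O'Leary and Young Turks co-founder Cenk Uygur go head-to-head for 103 minutes. Will AI liberate the American economy or wreck it? Why is the US-Iran war dragging on despite an obvious exit? Which candidate has a realistic chance in 2028? O'Leary stays in the optimist camp from start to finish: AI creates new jobs, markets always adapt, and the real threat is China. Uygur presses one unbroken argument: AI-driven mass unemployment and an Israel-lobby-driven foreign policy are together steering America toward an iceberg, and there is no institutional preparation for the impact. ## [00:00] Intro The opening immediately shows the stakes of the debate. Uygur's cold opening salvo: companies are racing to lay off 10-25% of their workforce for competitive advantage, and if the whole economy takes that path at once the result is not a recession but a depression. O'Leary's response, "Wow. What a pessimist. Isn't this an incredible opportunity?", sets the tone for the next hour and forty minutes. Steven Bartlett says his goal is to reach the truth through a clash of two serious opposing camps, not a shouting match. > *"Everyone is rushing to lay off 10 to 25% of their workforce, but 10% unemployment would have consequences more severe than anything in our lifetimes."* - Cenk Uygur ## [02:35] Why 7 in 10 Americans Oppose AI Data Centers Steven Bartlett brings up a poll showing that 7 in 10 Americans oppose local AI data centers. Kevin O'Leary names a culprit. He says forensic auditors tracing IRS Form 990 filings found that Chinese money flowed into the opposition campaign against a Utah data center through a network called Arabella, by way of Neville Singham, and that his executives received death threats. He submitted 90 pages of IP data to the White House. Cenk Uygur dismisses the China conspiracy theory and turns to a simpler grievance: as in Virginia, data centers have driven up electricity bills for churches, libraries, and community centers, so the builders should bring their own power or give residents an equity stake. > *"We have irrefutable evidence that China is involved in every state and city across America where new power is being proposed."* - Kevin O'Leary ## [07:24] Why AI Could Trigger a Collapse and a Basic Income Crisis Cenk Uygur's core economic argument lands in this chapter. He agrees on the energy cost problem and calls data centers that drain the public grid without compensation corporate free-riding, with the 2008 bailouts as the cautionary tale. The bigger alarm is mass unemployment: if companies trying to cut 10-25% of their workforce move simultaneously, consumer spending collapses and triggers a depression. Sam Altman, Elon Musk, and Dario Amodei have all said publicly that large-scale job displacement is coming, yet no government has a plan. Kevin O'Leary counters that in 200 years of American history every technological revolution has created more opportunity than it destroyed, and that halting AI development would hand the lead to China. > *"When we hit the iceberg, we will not be prepared at all. It will be an enormous disaster. Workers are also consumers. If the buyers are gone, who buys the products?"* - Cenk Uygur ## [15:30] Are AI Founders Hiding the Real Risks From the Public? Steven Bartlett reads out the on-the-record statements. Sam Altman (2021): AI will replace most jobs. Elon Musk (2024): eventually none of us will have a job. 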
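Dario Amodei (2025): AI could eliminate half of entry-level white-collar jobs within five years and push unemployment as high as 20%. The question: if the people building these systems are themselves warning of social harm, how can it be dismissed as exaggeration? Kevin O'Leary brings up the other half of Amodei's statement, that China's DeepSeek catches up unless compute is built within six months, and says the real choice is between leading the disruption and handing it to Beijing. Cenk Uygur agrees the race itself is unavoidable, but points out that coders being laid off today have already hit the iceberg, and that a $36,000-a-year basic income is a steep fall from a $120,000 salary. > *"Can we run this race responsibly, for American voters and citizens and not just for AI company executives and shareholders? I hope so, but so far we have not taken a single step in that direction."* - Cenk Uygur ## [23:55] Can AI Be Built Responsibly, or Is That Impossible? Steven Bartlett asks for concrete proposals for responsible AI development. Cenk Uygur's structural diagnosis: thanks to legalized bribery (the Citizens United and Buckley v. Valeo rulings), the AI companies that donate the most get the regulatory framework they want. Congress works for donors, not voters. Kevin O'Leary counters that most of the disappearing jobs are positions companies speculatively over-hired for, and that AI companies are currently pouring in billions of dollars rather than pocketing profits. His Utah data center example: 4,000 construction jobs over nine years, plus 2,000 engineering jobs, without touching an acre of farmland. He is dismissive of Uygur's warning about socialism: raise taxes above 50% and the rich leave for Monaco or Florida, as France has shown. > *"Otherwise public anger will explode. I don't believe in violence. But I don't think anyone is really seeing how deep the anger building among people is right now."* - Cenk Uygur ## [32:11] How AI Is Quietly Destroying Jobs Steven Bartlett shares direct experience. He now makes entry-level hiring decisions almost entirely on AI proficiency: because one AI-fluent junior hire delivers 5-10x the output, applicants who cannot use AI are effectively screened out. Kevin O'Leary pushes back: engineers are people who solve problems, not people who write code, AI is just a faster tool, and most recent tech layoffs are a correction for over-hiring rather than AI replacement. Cenk Uygur does not counter that directly. He observes that Wall Street analysts applaud layoff announcements as "synergies" and share prices rise, yet no one on an earnings call asks who will buy the products once the workers are gone. He flags one more underrated risk: when large numbers of unemployed young men emerge, crime and conflict have historically followed. > *"Nothing good has ever happened when there are large numbers of unemployed young men. You get wars and more crime. We need to prepare."* - Cenk Uygur ## [37:35] Why Mass Unemployment Could Arrive Faster Than Expected Steven Bartlett describes visiting a San Francisco robotics accelerator. Every team there had pivoted from software to physical robots for one reason: intelligence, once expensive and scarce, is now dirt cheap. He asks each guest how they might be wrong. Kevin O'Leary rejects the unemployment scenario outright and pivots to NASA's permanent lunar base and Mars program, which he says will create hundreds of thousands of high-paying jobs. Cenk Uygur labels it the "transition problem." 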
20년 뒤에 O'Leary의 낙관론이 맞는다 해도, 클리블랜드의 61세 조립 라인 노동자는 화성 엔지니어로 재교육받을 수 없다. Steven Bartlett은 Uber CEO가 비공개 석상에서 AI가 자사 운전기사 940만 명을 대체할 것이라 말했고, 그들이 뭘 할 것이냐는 질문에 "모르겠다"고 답했다고 덧붙인다. > *"로봇 부품은 수십 년 전부터 있었습니다. 늘 있었어요. 그동안 없었던 것, 비쌌던 부분이 바로 지능이었습니다."* — Steven Bartlett, 공동 창업자 발언 인용 ## [46:32] 광고 Stan(AI 소셜 미디어 콘텐츠 도구), Pipedrive(CRM), Cometeer(커피) 스폰서 세그먼트. 토론 내용 없음. ## [48:40] 이스라엘·이란·중동에서 실제로 벌어지고 있는 일 토론이 지정학으로 전환된다. Steven Bartlett이 트럼프의 추락하는 지지율을 제시하며 Cenk Uygur에게 전쟁을 설명해달라 한다. Uygur의 답변은 약 25분간 이어지며 하나의 논지를 일관되게 유지한다. 이 전쟁은 이스라엘의 이익만을 100% 반영하고 미국의 이익은 0%라는 것이다. 그는 Adelson 가문의 트럼프 선거 3억1천7백만 달러 기부를 재정 메커니즘으로 추적하고, AIPAC이 트럼프, 바이든, Hakeem Jeffries, Chuck Schumer, Mike Johnson 모두에게 동시에 평생 최대 후원자임을 지적하며, 이스라엘이 9/11 이후 일곱 번의 전쟁을 미국에 하청 줬고 이란이 그 마지막 항목이었다고 말한다. 이란은 미국 본토에 닿는 전달 체계를 보유한 적이 없고, 우라늄 농축도 60%를 넘긴 적이 없으며(무기급은 90%), 전 대법관이 핵무기에 대한 파트와를 발령했다. 반면 이스라엘은 레바논 남부를 점령하고 이를 유지할 계획이며, 네타냐후는 평화 조건으로 이스라엘만이 레바논을 계속 공격할 권리를 가질 것을 공개적으로 요구했다 — 이는 어떤 합의도 영구히 닫힌다는 뜻이다. Kevin O'Leary는 이란 정권을 다르게 규정한다. 60년간 9천만 명을 짓밟아온 15만 명의 체제이며, 핵무기를 쥐여줄 수 없는 존재이고, 결국 호르무즈 해협 개방이 필요한 중국이 베이징으로 하여금 테헤란을 굴복시키게 만들 것이라는 전망이다. > *"100% 이스라엘의 이익, 0% 미국의 이익. 우리는 거기서 나와야 합니다. 이스라엘의 전쟁을 대신 치르는 걸 멈추고 집으로 돌아와야 합니다."* — Cenk Uygur ## [01:11:59] 트럼프는 이 분쟁이 이렇게 길어질 줄 몰랐나? Steven Bartlett이 Kevin O'Leary에게 직접 묻는다. 트럼프가 분쟁을 과소평가했는가? O'Leary는 이것이 진정한 "기술 전쟁"이라 답한다. 잔디깎이 엔진을 단 3만5천 달러짜리 탄소섬유 드론을 막는 데 120만~300만 달러짜리 미국 미사일이 쓰이는, 이 비용 비대칭이 미국이 메워야 할 컴퓨팅 격차를 드러낸다는 것이다. 지상군 침공은 없고, 이란 지도부가 해협 봉쇄 비용 — 하루 2억1천만 달러의 수입 손실 — 이 이익보다 크다고 판단할 때까지 공중 압박이 계속될 것이다. 그의 예측: 중국이 미국 중간선거 전에 합의를 강제한다. > *"비용이 많이 드는 이유는 우리가 방어의 잘못된 편에 있기 때문입니다. 우리에게는 저렴한 드론이 필요합니다."* — Kevin O'Leary ## [01:15:47] 광고 Pipedrive(CRM)와 Diary of a CEO 대화 카드 스폰서 세그먼트. 토론 내용 없음. ## [01:18:08] 미국이 빠르게 인내심을 잃어가는 이유 Steven Bartlett이 협상 지렛대 문제를 제기한다. 이란 지도부가 트럼프에게 중간선거와 2028년 대선까지 시간이 제한적임을 안다면, 지금 굳이 합의할 이유가 있는가? Kevin O'Leary는 제약을 하나 더 추가한다. 중국 최고 지도자도 자국 경제를 돌리고 권력을 유지하려면 해협이 열려야 하므로, 이란은 두 주인을 섬기고 있다. Cenk Uygur는 합의문은 이미 쓰여 있다고 주장한다. 
이란이 고농축 우라늄을 국제 감시단에 넘기고 미국은 봉쇄를 해제하며 해협이 재개통된다. 하지만 네타냐후가 트럼프에게 전화를 걸 때마다 새로운 불가능한 조건이 추가되어 합의가 무산된다 — 즉각적인 군축, 이란의 아브라함 협정 가입. 최근의 합의 직전 상황에 공개적으로 반대했던 정치인 중 이스라엘 로비로부터 100만 달러 이상을 받은 사람이 전부라고 Uygur는 말한다. 그리고 이 논점을 세계로 확장한다. 러시아가 우크라이나에서 피를 흘리고 미국이 이란에서 피를 흘리는 동안, 중국은 아프리카와 라틴 아메리카 전역에 도로와 다리를 짓고 전쟁에 아무것도 쓰지 않으며 영향력을 쌓고 있다. > *"네타냐후와 통화할 때마다 트럼프는 평화를 이야기하다가 돌아서서 평화는 없고 새로운 불가능한 조건이 생겼다고 말합니다. 지금까지 여섯 번쯤 반복됐어요."* — Cenk Uygur ## [01:29:08] 우리는 지금 사회주의의 부상을 목격하고 있는가? Steven Bartlett이 갤럽 데이터를 제시한다. 자본주의에 대한 미국인의 긍정적 시각이 사상 최저이고, 민주당원의 70%와 젊은 미국인의 62%가 사회주의에 호감을 보인다 — 이는 전쟁의 경제적 여파가 반영되기 전의 수치다. Kevin O'Leary는 17~20년마다 반복되는 사이클이라고 본다. 젊은 이상주의자들이 첫 월급을 받고 세금을 발견하는 순간 사회주의 정서는 무너진다. 지구상 국부펀드 달러의 52센트가 쿠바나 러시아가 아닌 미국으로 흘러온다는 점도 짚는다. Cenk Uygur는 이 틀 자체를 거부한다. 미국은 이미 기업을 위한 사회주의를 실천 중이다 — 수익성 있는 기업에 석유 보조금을 주고, 메디케어 의약품 가격 협상을 봉쇄하며, 모든 산업이 선거 기부금으로 규제 당국을 포획한다. 진짜 과제는 진정한 자유 시장으로 돌아가는 것이고, 그러려면 먼저 정치에서 돈을 빼내야 한다. > *"사회주의까지 가기는커녕 자본주의로 돌아가는 것만도 다행입니다. 지금 우리에게는 자본주의가 없으니까요. 우리에게 있는 건 정실 자본주의입니다."* — Cenk Uygur ## [01:34:06] 다음 대선에서 실제로 유리한 쪽은 누구인가? Kevin O'Leary는 승자를 특정하지 않지만, 민주당에는 중도 온건파가 필요하다며 진보 통치의 실패 사례로 캘리포니아를 든다. Cenk Uygur는 뜻밖의 예측으로 그를 놀라게 한다. 2028년 공화당에서 이길 수 있는 인물은 Tucker Carlson 한 명뿐이라는 것이다. 공화당 지지자의 열기는 이미 꺾였고 중간선거는 날아갔으며, 2028년에는 AI 실업과 이란 전쟁의 누적 효과가 완전히 드러나 있을 것이다. Kevin O'Leary는 처음엔 웃어넘기다가 방송 중 입장을 바꾼다. Tucker Carlson은 거대한 소셜 미디어 기반을 갖고 있고 자체 네트워크를 운영하며 AI를 포함한 여러 사안에서 점점 독립적인 입장을 취하고 있다는 것이다. Cenk Uygur는 Rohana를 전국 선거에서 승산 있는 진보 진영 인물로 꼽으며 마무리한다. 현재의 기업 포획 체제도, 사람들이 두려워하는 사회주의도 아닌 민주적 자본주의 — 기능하는 민주주의가 견제하는 민간 시장, 북유럽이 그 작동 모델 — 를 지지한다고 밝힌다. > *"그들에게는 이길 수 있는 후보가 한 명뿐이고, 저는 그게 걱정됩니다. Tucker Carlson입니다. Tucker가 공화당 경선에 나오면 확실히 그 경선을 이깁니다. 이건 인용해도 됩니다."* — Cenk Uygur ## 등장인물 - **Kevin O'Leary** (인물): Shark Tank 투자자, O'Leary Ventures 회장. AI가 기회를 창출한다고 주장하며, 데이터 센터 개발을 옹호하고, AI 반대 활동의 배후에 중국 자금이 있다고 추적하며, 중국이 미국 중간선거 전에 이란을 합의로 이끌 것이라 예측한다. - **Cenk Uygur** (인물): Young Turks 공동 창업자, 진보 논평가. 
AI 실업에 대한 대비가 없다고 주장하며, 미국 외교정책이 이스라엘 로비에 의해 좌우된다고 보고, 미국 정치 시스템이 합법화된 뇌물로 부패했다고 말한다. - **Steven Bartlett** (인물): The Diary Of A CEO 진행자, 기업인 겸 투자자. 직접적인 채용 결정과 로보틱스 연구실 관찰로 토론을 실제 비즈니스 현장에 접지하며 진행을 맡는다. - **AIPAC / 이스라엘 로비** (조직): Uygur가 양당 최고위 미국 정치인 대부분의 평생 최대 후원자로 지목하며, 합의가 준비된 상황에서도 미-이란 전쟁이 계속되는 이유에 대한 그의 주장의 핵심이다. - **Arabella / Alliance for a Better Utah** (조직): O'Leary가 중국 연계 단체를 통해 자금이 유입되어 미국 주 전역에서 데이터 센터 반대 허위 정보 캠페인을 벌이고 있다고 주장하는 네트워크. 국세청 990 신고서에서 출처를 추적했다. - **UBI (기본소득)** (개념): AI 대체 노동자를 위한 안전망으로 제안됨. Cenk Uygur는 최선의 경우 연 3만6천 달러 기본소득도 연봉 12만 달러를 받던 노동자에게는 처참한 수입 하락이라고 지적한다. - **호르무즈 해협** (개념): 중국 에너지 수입의 48%가 통과하는 병목 지점. 봉쇄 시 전 세계 물가가 치솟으며, 이 해협 재개통이 이란 협상에서 미국의 핵심 이해관계다. - **Deepseek** (소프트웨어): 중국의 대규모 언어 모델. O'Leary와 Amodei는 미국의 AI 개발이 잠시라도 멈추면 수개월 내 중국에 결정적 우위를 내준다는 증거로 인용한다. - **Tucker Carlson** (인물): 전 Fox News 앵커 출신 독립 미디어 인물. Cenk Uygur는 그가 2028년 공화당 경선에서 유일하게 이길 수 있는 후보라 예측하며, Kevin O'Leary도 결국 이를 부정하지 않는다. - **민주적 자본주의** (개념): Cenk Uygur가 선호하는 경제 모델 — 기능하는 민주주의가 견제하는 민간 시장. 현재 미국의 기업 포획 체제, 그리고 유럽식 사회주의 모두와 구분 짓는다. - **Rohana** (인물): Cenk Uygur가 AI 실업 정책에 실제로 뛰어든 유일한 정치인이자 민주적 자본주의에 가장 근접한 2028년 후보로 반복해서 언급하는 진보 정치인.

#ai-economy #unemployment #iran-war

Building the Enterprise AI Supervisor, with Onyx Security CEO Maxim Bar Kogan

41:09

No Priors: AI, Machine Learning, Tech, & Startups · about 1 month ago

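Sarah Guo talks with Maxim Bar Kogan, co-founder and CEO of Onyx Security, about what it actually takes to secure AI agents at enterprise scale. Maxim argues that traditional controls such as proxies, permission restrictions, and human review break down once agent actions grow exponentially. The only realistic alternative, in his view, is to train small specialized models that decide when to escalate to a heavier supervisor. The conversation covers Onyx's "security control plane" product, the cost and latency math of training custom models, why labs cannot self-certify the safety of their own models, and Maxim's conviction that AGI is coming and that independent AI supervision will be a business worth hundreds of billions of dollars.

## [00:00] Opening

Maxim gets straight to the point. The more enterprises use AI agents, the more bad behavior comes along: agents that accidentally expose credentials, make unauthorized network calls, or take irreversible steps. Companies already know they cannot stop adoption. The problem is that they have no means of telling legitimate agent actions from illegitimate ones. The clip presents Onyx's core thesis before the intro.

> *"Enterprises are starting to realize that the risk is growing exponentially and that there is no way to stop adoption. So now they have to do something to reduce the chance that this agent behavior is abnormal or goes wrong."*

## [00:45] Introducing Maxim Bar Kogan

Sarah introduces Maxim as co-founder and CEO of Onyx Security, an Israel-based startup of researchers, mathematicians, and engineers that builds agents to supervise AI agents. The company combines offensive cyber expertise with deep AI research spanning synthetic data and mechanistic interpretability.

## [01:10] AutoGPT and the Bet on Agent Behavior

Two years ago the enterprise security risk conversation was about DLP for chatbots: employees pasting sensitive data into ChatGPT. That frame has since shifted to near-panic about autonomous agent actions. Maxim says Onyx's bet began with AutoGPT, the first agent in which an LLM decided for itself what to do, called tools, and ran in a loop: an agent that acts rather than generates text. The demo proved agents could act autonomously in the real world, and Maxim immediately concluded that someone would have to supervise those actions at scale.

> *"AutoGPT sparked everyone's imagination, including mine. It was truly the first autonomous agent, where the LLM wasn't generating text but deciding what to do itself, and you gave that agent API access to carry it out."*

## [05:17] What the Onyx Product Does

Onyx does two things. It trains models and agents that supervise other agents, and it packages that capability as a "security control plane" that plugs into the enterprise AI stack. The control plane judges the legitimacy of agent actions in real time while managing the trade-off between latency, cost, and reliability. Maxim's long-term vision goes beyond enterprise security: every company running AI agents needs a vendor-independent party to certify what those agents are doing.

> *"The number of these actions is growing exponentially. Things that seemed useful in the past, like a human in the loop, don't work if the actions become 100x, 1,000x, a million x."*

## [07:47] State of AI Adoption in Large Enterprises

Looking at AI adoption in large enterprises today, Maxim sees three types: low-code SaaS automation (drag-and-drop, no real autonomy), custom agents built in-house or as customer-facing products, and autonomous coding agents and assistants. Of the three, coding agents account for more than 50% of AI usage. The most mature sectors, such as financial services and healthcare, have the strictest controls, but even the most cautious companies have moved from banning AI outright to managing it.

> *"In the average enterprise, autonomous coding agents and assistants are more than 50%."*

## [09:58] Agent Security

Enterprises already spend about $100 billion a year on security: endpoint, network, cloud, identity. Sarah asks how much of that can be applied to agent security. Maxim's answer: almost none. Identity controls, the most basic layer, fail because agents need broad, dynamic permissions that cannot be scoped in advance. An agent that writes code across an entire repository or sends email on behalf of an executive cannot be confined to narrow permissions like a static software process. The attack surface lies in intent, not access, and existing tools cannot read intent.

> *"You really can't know in advance what permissions to give these autonomous AIs, these assistants, these coding agents."*

## [12:45] Why Proxies Don't Work

Sarah's intuition from her security background: this sounds like a proxy problem with a smarter policy engine. Maxim grants that proxies work as an integration point in some architectures but says they miss the core problem entirely. A proxy gives you a data stream. It does not tell you whether an action in that stream is legitimate. That judgment requires context: the agent's goal, its history, and what the enterprise has authorized. No rules engine knows how to evaluate that across arbitrary agent actions.

> *"The hard problem is understanding whether the thing I'm about to do right now is okay. For AI systems, that is the core question."*

## [14:11] Why Onyx Trains Its Own Models

The simplest solution, using Claude Code to supervise Claude Code, breaks down on cost and latency. Running a frontier-model agent for every enterprise agent makes the security layer more expensive than the AI it protects. Onyx's answer is small, highly specialized models that do exactly one thing: decide whether the current action should be escalated to a heavier supervisor. Sarah compares it to blitz chess, where a grandmaster plays fast moves on intuition and stops only at critical junctures. Maxim agrees with the analogy: concentrate intelligence where the risk is highest and keep everything else as light as possible.

> *"We're trying to train models that are good at one thing. Very small models that can do almost nothing except say, 'Should a smarter agent look at this?'"*

## [18:38] Onyx's Talent Culture

Israel's security talent, represented by units like 8200 and companies like Armis and Wiz, is well known. Onyx's DNA is different. Co-founder Gil's background is synthetic data and NVIDIA, not offensive cyber. Most of Onyx's research engineering staff come from Israeli intelligence units that focus on the intersection of math and cyber. Maxim sees the mix as deliberate, because the long-term problem Onyx wants to solve is not only enterprise security but how to control advanced AI at all. That requires deep AI expertise alongside security instincts. Israel as a whole is catching up quickly in AI, across world models, AI infrastructure, and chips.

> *"The problem isn't just cybersecurity. It's the problem of how we control advanced AI in the long run. Even if you forget the enterprise security gap, that sounds like a very important problem."*

## [21:24] Mechanistic Interpretability

Maxim believes mechanistic interpretability, understanding what actually happens inside a model's weights and activations, is both possible and necessary. His counterintuitive thesis: as models become far smarter than humans in important domains, they will also be better equipped than we are to decode the internals of other models. Onyx is investing actively in this research, not only as a security tool but as a window into intelligence itself. Sarah endorses the bet, calling it a chance to understand cognition itself and not just AI.

> *"As we start to have models that are much smarter than us, at least in some important respects, I think we will be able to decode mechanistic capabilities much more effectively."*

## [23:35] How Onyx Builds Customer Trust

Fortune 10 and Fortune 20 companies do not normally work with two-year-old startups of fewer than 100 people. What breaks that rule is pain. CISOs dealing with agent-behavior incidents every day have no incumbent to call, because three years ago the problem did not exist. As soon as Onyx came out of stealth, it received inbound interest from enterprises whose daily firefighting already matched its problem statement. Maxim sees this window as narrow and temporary: enterprise buyers know that young startups grow up, and they would rather be early design partners than late adopters.

> *"This kind of opportunity only exists when the pain is very strong. So strong that they say, 'This company just came out of stealth? But this is the problem I deal with every day. I should call them.'"*

## [25:10] Mitigating Risk at the Foundational Level

CISOs' second panic, beyond agent behavior, is that the cost of automated vulnerability research is collapsing. Coding tools can now find and exploit vulnerabilities at a scale that seemed decades away only a few years ago. Maxim says the market is not overreacting; this is a real structural shift. The right response has two tracks: fast patching and mitigating controls right now, plus investment in foundational controls (locked-down identity, firewalls, endpoint detection) that shrink the exploitable surface regardless of what attackers' tools can do.

> *"The real solution, as every security leader at a large enterprise knows, is to have the foundational elements in place to avoid these risks."*

## [27:45] The Staged Releases of Glasswing and Daybreak

On Anthropic's Glasswing and OpenAI's Daybreak, controlled release programs for more powerful models, Maxim takes a conditional position. Staged release is ideal if it is coordinated globally: it buys time to write playbooks, share knowledge, and prevent large-scale failures in power grids or airlines. But if some actor ships a comparable model ahead of the staged schedule, the staged approach itself becomes a liability, because companies without early access are exposed to threats they never had a chance to prepare for. His recommendation is to open access broadly so that more organizations can build defenses in parallel.

> *"If someone reaches a model at that level earlier, then in hindsight it would have been a big mistake. At least you could have given companies the option to move very fast."*

## [29:11] Large Enterprises Holding Back on Adoption

Two years ago many large companies simply banned AI. Today Maxim rarely sees that. Finance still imposes constraints, for instance allowing agents but restricting which tools they may use, but outright bans are gone. He thinks this is right, because being locked into a particular tool is itself a risk. At the speed this market moves, betting on a single vendor's model means getting stuck when the next generation changes the game. Companies that allow a broad range of tools while governing them strictly will outpace those that restrict aggressively.

> *"If you had bet on OpenAI a year ago, it would have been the safest bet in the world, and then suddenly Anthropic had much better models and tools."*

## [30:46] Onyx and the Broader AI Security Market

AI security is crowded with new vendors and new attack surfaces. Maxim's answer to anxiety about product scope: the two core foundations of AI in 2026, transformer-based foundation models and tool-calling agent loops, have not fundamentally changed in years. That stability lets Onyx keep its core technology light while building toward many kinds of agent applications. The real hedge against an architectural shift is not betting that any single model paradigm lasts forever, but investing in researchers who can retrain and adapt quickly.

> *"The two core pillars that AI runs on in 2026 haven't changed over the past few years. It's still mostly LLM foundation models, and we're still building agents in almost the same way."*

## [32:36] Should Labs Solve Model Trust and Governance Themselves?

The hottest question in the Bay Area: will labs eventually absorb the trust and governance problem themselves? Maxim's structural counterargument is that buyers do not want the seller of a car to certify the car. Security teams need an independent party whose business model survives only if it is right, not a vendor protecting its own product's reputation. Beyond buyer psychology, Maxim separates "jagged intelligence" mistakes (dumb errors that better models will fix) from intent-level failures (adversarial manipulation, misaligned objectives, goal drift). Labs will fix the first category. Only a structurally independent supervisor can address the second.

> *"You're not going to trust the vendor of a product telling you the product won't wreck your environment. You'll want an independent party whose business depends entirely on being right about that product."*

## [36:56] What Must Happen in Security

Sarah asks what the broader technical and research community, labs especially, is missing from a security perspective. Maxim's answer: an empathy gap, not a technical gap. Building security products requires a deep understanding of how security teams actually operate: org structure, scope of responsibility, information flow. One reason Israel produces strong security talent is that military service gives engineers firsthand experience as end users of the kind of products they later build. His implicit criticism is that labs are building capabilities without paying enough attention to the operational reality of the organizations that must deploy and defend them.

> *"Whatever technical problem you solve, in the end you are building a tool for people, for organizations with a particular structure. It is really hard to build a product for that audience that they genuinely love, and not just solve the technical problem."*

## [39:14] Why Maxim Believes in AGI

In closing, Sarah points out that Maxim implicitly believes human security teams will exist for some years yet. He agrees but adds a timeline: in the near future security teams will be run entirely by AI agents, as will most knowledge work. His pragmatic AGI optimism is that the job of building great products does not change. You always need to know who the end user is and optimize their experience. Today that is a human with a few agents at their side. When the ratio flips, the same principle applies, only the audience is agents reading context windows instead of dashboards.

> *"When I sell a product today, I sell to a human audience with a few agents alongside. When that audience has more agents than humans, what will matter is evolving to fit how agents do their work and making it work well for them."*

## People and Entities

- **Maxim Bar Kogan** (person): Co-founder and CEO of Onyx Security. Israeli intelligence unit veteran with a background in mathematics and offensive cyber.
- **Sarah Guo** (person): Host of No Priors, founder and GP of Conviction.
- **Onyx Security** (organization): Israel-based startup building AI supervision infrastructure. Trains small specialized models to monitor and control enterprise AI agents.
- **AutoGPT** (software): Early open-source autonomous LLM agent. Maxim cites it as the inflection point that made agent risk concrete.
- **Glasswing / Daybreak** (software): Controlled release programs for frontier model access from Anthropic and OpenAI, respectively.
- **Mechanistic interpretability** (concept): Research program aimed at understanding the internal weight and activation structure of neural networks. Onyx treats it as a long-term foundation for AI supervision.
- **Security control plane** (concept): Onyx's product category: a vendor-independent layer that monitors agent permissions, action legitimacy, and action history in real time.
- **8200** (organization): Israeli intelligence unit, known for producing much of Israel's top security and technical talent, including many Onyx engineers.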

#ai-security #enterprise-ai #ai-agents
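
The escalation pattern described under [14:11] above (a tiny model whose only job is to decide whether a heavier supervisor should look at an action) can be sketched roughly as a two-tier cascade. This is a minimal illustration, not Onyx's implementation: the `AgentAction` fields, the keyword heuristic standing in for a trained triage model, the `0.5` threshold, and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentAction:
    agent_id: str
    tool: str       # hypothetical tool label, e.g. "shell", "http", "email"
    argument: str   # raw argument the agent wants to pass to the tool


def cheap_triage_score(action: AgentAction) -> float:
    """Stand-in for a small trained model: returns a rough risk score in [0, 1]."""
    risky_markers = ("rm -rf", "password", "secret", "drop table")
    score = 0.1
    if action.tool in {"shell", "http"}:
        score += 0.3
    if any(marker in action.argument.lower() for marker in risky_markers):
        score += 0.5
    return min(score, 1.0)


def heavy_supervisor_allows(action: AgentAction) -> bool:
    """Stand-in for the expensive reviewer (a larger model or a human)."""
    text = action.argument.lower()
    return "rm -rf" not in text and "secret" not in text


def supervise(
    action: AgentAction,
    threshold: float = 0.5,
    triage: Callable[[AgentAction], float] = cheap_triage_score,
    heavy: Callable[[AgentAction], bool] = heavy_supervisor_allows,
) -> str:
    """Fast path for low-risk actions; escalate only when triage is unsure."""
    if triage(action) < threshold:
        return "allow"  # no heavy call, so no extra cost or latency
    return "allow_after_review" if heavy(action) else "block"


if __name__ == "__main__":
    for act in (
        AgentAction("a1", "email", "send weekly report"),
        AgentAction("a2", "shell", "rm -rf /var/data"),
        AgentAction("a4", "email", "reset your password here"),
    ):
        print(act.tool, "->", supervise(act))
```

The design point the sketch tries to capture is the blitz-chess analogy: most actions take the cheap path, and the expensive reviewer is invoked only for the minority that cross the threshold, which keeps the supervision layer cheaper than the agents it watches.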

Devin’s 80% Moment: Background Agents, 7x PRs, & End of Hand-Held Coding — Walden Yan & Cole Murray

1:09:32

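Dan Shipper, co-founder and CEO of Every, returns with 12 contrarian predictions about AI and the future of work, most of which run directly against the prevailing fears. The core claims: automation reshapes human workloads rather than shrinking them, Codex and Claude Code are becoming the new operating system for knowledge work, and the SaaS apocalypse is a myth. The only skill needed to survive is the willingness to ride along as the models improve. Because the 30-person Every tests these hypotheses every day, Dan is better placed than almost anyone to check whether the predictions hold.

## [00:00] Introducing Dan Shipper

Lenny Rachitsky opens by recalling Dan's previous appearance. A prediction Dan tossed out "without much thought" at the time, that people were overlooking how non-developers could use Claude Code, turned out to be "almost unbelievably accurate." This time Dan brings twelve more predictions and leads with the conclusion:

> *"The AI job apocalypse is not what's actually happening."*

## [02:56] Dan's Unusual Vantage Point on the AI Future

Dan explains why Every works as an early-signal lab. Every employee, from editors to operations and finance, uses AI every day, so the company sees earlier than most how the next 12 months will actually unfold. He contrasts this with the "San Francisco bubble" view: the real frontier of AI adoption is not where AI is built but where AI meets the real work of real professionals.

> *"The frontier of AI is the exact point where AI meets real people and does something."*

## [09:17] How Work Changes Over the Next Year

Lenny Rachitsky groups the predictions into three buckets: how we work, the shape of work, and who survives. Dan's first prediction is that all professional work converges on a single surface, Codex or Claude Code. The tool watches what you are doing and becomes a parallel work partner that handles research, writes email, and runs long tasks while you focus on the main document. Dan has already been at inbox zero for ten days, because Codex and Cora, Every's email agent, handle his email.

> *"This parallel work partner doesn't just respond and write directly in the document, it also goes off and does research."*

## [16:39] The Promise of General-Purpose Agents

Dan predicts every company will have one "super agent" inside Slack: not a narrow task bot but a general assistant that understands the full company context and becomes an organizational memory layer that every employee interacts with daily. It routes questions, pulls data, and bridges gaps between teams that did not know they needed to talk.

## [18:08] Codex and Claude Code as the New Work Operating System

Claude Code's breakthrough was putting a powerful agent directly on the computer with terminal access and, crucially, browser access. Anthropic found this paradigm first; OpenAI caught up around the 5.3 release and then accelerated. The tool Dan now uses every day is Codex. He keeps it open next to his writing app, Proof, while the agent watches his browser, reads the currently open page, and acts on his behalf without a context switch.

> *"Whoever is ahead, it's very clear to me that everything you do is going to happen inside one of those surfaces."*

A model in which users bring their own AI tokens into SaaS apps changes the economics. Because the user, not the SaaS product, pays for inference, margins recover and the pressure to build a proprietary AI layer from scratch disappears.

## [25:39] Cursor's Role

Cursor currently dominates coding workflows, but in Dan's view it stands at a strategic fork: remain a pure coding IDE or evolve into a general-purpose agent surface. Staying narrow gives product focus; broadening means competing head-on with Codex and Claude Code. Dan predicts the category winner will be the surface that handles both code and general knowledge work in one place.

## [27:42] What SaaS Companies Should Build Now

SaaS products must now be surfaces that are easy for agents to read, rather than for people: clean HTML and designs that expose information for automated consumption. Dan uses Proof as the example. Because Codex is watching the page, small annoyances get fixed almost immediately, and the loop from "something bothered me" to "fixed" closes quickly.

> *"I can see the beginnings of a very fast closed loop where I notice something annoying and fix it right here."*

## [31:13] The CLI Is Already Over

The CLI era arrived fast and is fading just as fast. The path ran from GUI, to the CLI as a power move, to agents that replace the CLI entirely. Once agents can read the screen and operate any interface, there is no reason to stay in the terminal. Dan's prediction is blunt:

> *"The CLI is over. We speedran the CLI era."*

## [33:34] Two Agents Are Better Than One

Dan pushes back on the idea of one agent that does everything. The pattern actually emerging is specialist agents for coding, email, and data that talk to each other on the user's behalf. When something in an app malfunctions, Codex can talk directly to the vendor's agent and diagnose the problem without a support ticket. Once you assume everyone has an agent and agents can negotiate with each other, the paradigm itself changes.

## [36:22] Why Dan Is Bullish on SaaS Stocks

The "SaaS is dead" narrative misses how the economics actually work when agents drive usage. When users bring their own AI tokens to a SaaS product, the vendor's inference cost approaches zero. Dan's contrarian take:

> *"If it were me, I'd buy SaaS stocks right now."*

SaaS companies that make their products agent-friendly get a margin tailwind rather than being disintermediated.

## [39:01] Why Automation Doesn't Reduce Human Work

This is the episode's central intellectual argument. Dan argues that every new automation layer needs a human manager above it to check that it is working properly. He built his own benchmark, the "senior engineer benchmark": two actual senior engineers each rewrote his Proof app from scratch, and every new model is scored against their results. Models scored 30 out of 100 before GPT-5.5 and jumped to 60 with GPT-5.5. What the gap reveals matters. A model fixes what you tell it to fix. A senior human engineer looks at the codebase, decides unprompted that it needs a full rewrite, and says so. Models do not volunteer that judgment. There is always a higher frame that a human has to put into words.

> *"Every time you automate something, there has to be a human above it checking that the automation is working well."*

## [47:00] The Value of Human-Written Code

Human-written code serves as the reference signal against which model output can be graded. Dan's benchmark uses the code the two humans rewrote by hand as the reference answer. As AI-generated code becomes the default, human-written codebases become scarcer and more valuable, because they are exactly what you need to tell whether AI is actually improving.

## [48:36] Quick Recap

Lenny Rachitsky recaps the first bucket of predictions: work happens inside Codex or Claude Code, every company gets a Slack super agent, bring-your-own-tokens restores SaaS margins, the CLI is over, two specialist agents beat one generalist, and automation increases human work rather than reducing it.

## [50:15] The Shape of Work Changes

The second bucket covers the shape of work itself. Dan's view: the forward-deployed engineer becomes the most valuable hire, someone who can sit next to a customer, understand the workflow, and build and ship a solution in the same meeting. The "allocation economy" concept from his earlier essay applies here too. Humans move from direct producers to allocators of AI capability, and allocating well is itself cognitively demanding.

> *"I use AI a huge amount and at the same time I'm very optimistic about the role of humans in checking whether the things AI produces are worth producing."*

## [56:17] Data Scientists Drowning in Bad Analysis

Data science teams are being flooded with AI-generated analysis from across the company. It looks plausible but is often wrong. The senior data scientist's job shifts from producing analysis to auditing it, which is harder and more cognitively taxing. Engineering shows the same dynamic: as models handle entry-level requests, more edge cases that require deeper judgment surface.

> *"You need more senior people to handle the deeper problems that the team handling basic requests finds hard to deal with."*

## [58:24] The Product and Tech Roles AI Changes Least

Dan's answer: the roles whose output is hardest to express as a prompt. He distinguishes "agent babysitting" (passively watching for errors) from "forward-deployed engineering" (building systems that let everyone do work that used to require an expert). The interesting, hard-to-automate work is in the latter.

## [62:17] We Will Read More AI-Written Text, and We Will Like It

Every uses Notion agents for quarterly planning. Each team's strategy report is AI-generated, and the results come back better than manual planning. GPT-5.5 writes most of Dan's email. His test for whether AI-written content is acceptable: did the sender have to understand the content in order to direct the AI? If so, it is fine. If the sender clearly did not read it, that is a breach of the social contract.

> *"The bar for low-quality content is when it took less time to make than it takes me to read."*

Every publishes guides with agent co-authors, a new content format designed for both humans and other agents as readers.

## [68:28] Why PMs Will Dominate the AI Era

Dan holds up Marcus, the PM inside Every who runs the Spiral product, as the archetype. He has strong product sense, directs AI to build and iterate quickly, and ships without waiting for engineering headcount. PMs are fundamentally allocators. Their role is deciding what to build and for whom, and the cheaper the act of building becomes, the scarcer that role gets.

> *"I'm really, really bullish on PMs."*

## [71:05] Full-Stack Designers Are Big Winners Too

Full-stack designers who combine strong visual taste with coding ability are already opening pull requests directly from tools like Lovable and Figma Make. The handoff between design and engineering shrinks to nearly zero. Dan expects them, alongside PMs, to be the key superheroes of the AI era.

## [73:11] The AI Job Apocalypse Is Not Happening

Dan separates current layoffs (mostly over-hiring corrections) from the claim of structural AI replacement, and rejects the latter. The structural logic goes like this. Models learn yesterday's human capabilities and produce what is already known in its most basic form. Humans build on that fixed capability to do new things and push the frontier, and models then have to catch up again. The cycle repeats.

> *"Because of the structure of how models work, there is always room for humans to go further."*

## [76:00] How to Ride the Models

The actionable advice: when a new model ships, do not resist it. Treat it as a new set of capabilities and explore how to apply it to your actual work. Dan reruns the senior engineer benchmark with every major model release. He also disputes the idea that the frontier of AI knowledge is in San Francisco. Every, based in Brooklyn, is ahead not because it builds AI but because it uses models for everything.

> *"All you need is one thing: to ride the models. That means using the models for the work you do."*

## [81:02] Final Predictions and Advice

Lenny Rachitsky widens the lens. The two sides of the conversation are "less changes than you fear" (SaaS continues, jobs do not disappear) and "more changes than you are prepared for" (how work gets done, which roles matter, what a day looks like). Dan's final claims: the forward-deployed engineer is the new essential hire, and companies that block employees from using the latest models are making a slow-burning strategic mistake.

## [85:24] Lightning Round

Rapid-fire answers: Dan's most contrarian belief is that the AI job apocalypse is not really happening, and the one thing he wishes more people knew is that the frontier of AI is not San Francisco but wherever people use models to do real work in real domains. He would tell his past self to hire senior engineers earlier, and he expects that within the next year AI will fundamentally change how people think about benchmarks.

## People and Key Concepts

- **Dan Shipper** (person): Co-founder and CEO of Every. Author of the "After Automation" essay. Runs Every as an AI adoption lab.
- **Lenny Rachitsky** (person): Host of Lenny's Podcast, founder of Lenny's Newsletter, former Airbnb PM.
- **Every** (organization): A 30-person AI-native media and software company. Every employee uses AI daily.
- **Codex** (software): OpenAI's surface for agentic coding and general knowledge work. The tool Dan currently uses every day.
- **Claude Code** (software): Anthropic's terminal-based coding agent. First to pioneer the agent-on-your-computer paradigm.
- **Proof** (software): Dan's AI-assisted markdown writing app. The reference codebase for the senior engineer benchmark.
- **Cora** (software): Every's email agent. Works with Codex to manage the inbox.
- **Cursor** (software): AI coding IDE. At a strategic fork between staying a coding tool and evolving into a general-purpose agent surface.
- **Forward-deployed engineer** (concept): A hybrid role combining engineering execution with customer-facing problem discovery. Dan's pick for the most valuable hire of the AI era.
- **Senior engineer benchmark** (concept): Dan's own evaluation method, in which two human senior engineers rewrite a codebase from scratch and each new model is scored against their results.
- **Allocation economy** (concept): Dan's framework in which humans move from direct producers to allocators of AI capability.
- **Ride the models** (concept): Dan's survival advice. Treat each new model as a new capability and actively explore how to apply it in your own domain.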

#ai-agents #future-of-work #saas
