LaiDub


Tech Whistleblower: You Only Have 3 Years Left Before It Hits! - Mo Gawdat

2:01:59


The Diary Of A CEO · about 1 month ago


Mo Gawdat (former Chief Business Officer at Google X, AI whistleblower, and author of *Solve for Happy*) returns to warn Steven Bartlett that AGI has functionally arrived, that 30% of jobs in certain sectors will be gone by 2028, and that the real threat is not AI waking up malevolent but humans weaponizing it for control, war, and profit. Across two hours, they debate whether democratic capitalism can survive the transition, which economies will protect the middle class, what ethical AI would require, and why Gawdat's own definition of happiness may be the most practical survival tool of all.

## [00:00] Intro

The episode opens cold with Gawdat's most provocative claims back-to-back: video evidence of child abuse with zero arrests, democracy as a slogan emptied of meaning, and AI being steered by a "powerful few" who never asked humanity's permission. Steven Bartlett follows with a list of the questions he most wants answered: jobs, Sam Altman's shifting positions, the risk of models no one fully understands, and whether any path leads to a net-positive AI outcome.

> *"I'm not worried about AI turning against us. I'm worried about humans telling AI to turn against us."*

## [02:29] Why Mo Warned About AI Before Anyone Else

Gawdat traces his alarm to 2016 at Google X, where he watched robotic grippers learn to handle novel objects the way a child explores a new toy: with curiosity, feedback loops, and rapid self-correction. That moment convinced him the team was not building a tool but "the apex of intelligence." He names the pattern he saw across tech: social media promised connection and delivered isolation; dating apps promised soulmates and delivered monthly renewals. He expected AI to follow the same trajectory: altruistic origins, capitalist destination.

> *"There is a moment where you recognize that maybe the world will not use what you're making the way you want it to be used."*

## [05:26] Can AI Be a Net Positive for Humanity?
Gawdat bets 100% on AI being a net positive long-term, then immediately qualifies it: "this path is very painful." His analogy is nuclear power: the first use was a bomb, not electricity. Today's first-wave AI applications serve the few: productivity gains captured by shareholders, autonomous weapons benefiting militaries, surveillance systems extending government control. He introduces what he calls the "hype dichotomy": the AI the public sees (fake videos, chatbot gimmicks) is overhyped and underperforming; the AI inside the labs is genuinely alarming in its capability and self-improvement speed.

> *"What the real geeks see inside the lab is just unbelievable intelligence."*

## [08:56] Massive Job Disruption Worldwide

Using a pyramid Bartlett's team prepared, Gawdat maps which jobs AI hits first. His counterintuitive claim: not the bottom. Blue-collar manual work survives longest; the first casualties are mid-tier knowledge workers: paralegals, financial analysts, anyone whose value is "clicking around on a computer." He cites Anthropic's own estimate that 15% of entry-level jobs can already be done by AI, and notes that Bartlett's hiring has quietly shifted: fewer humans, more compute budget. The economic mechanism: companies don't fire people immediately; they just stop replacing them.

> *"It's not that jobs will end first. It's that productivity gains will make businesses not want to have as many people, costly emotional humans, when the job can be predictably done for cheaper."*

## [15:28] Will AI Cost Savings Create New Jobs?

Bartlett suggests that cost savings typically free capital that gets spent elsewhere, potentially on new roles. Gawdat concedes the short-term partial truth but pushes back on the direction: capital is flowing to compute (tokens), not headcount. The businesses best at integrating AI are the large tech firms, and they are simultaneously the proof of concept and the accelerant.

## [16:38] What Happens to Blue Collar Jobs?
Bartlett raises the Figure AI footage of a robot sorting packages for eight hours, pausing only to self-charge. Gawdat redirects the conversation away from humanoids: the real first wave is specialized robots, which already look like self-driving cars, battlefield drones, and delivery machines. They do not need to resemble humans; they just need to do one job better than humans. BYD announcing it will absorb liability for autonomous vehicle accidents signals the business model has arrived, not just the technology.

> *"Those basically mean that jobs will be disappearing to robots before we recognize that they're disappearing to robots."*

## [22:20] How 10–15% Job Loss Reshapes Society

At 10–15% unemployment, Gawdat says societies cross the threshold into instability, especially if inflation runs simultaneously. He explicitly invokes COVID-era furlough programs as the government response model, but notes those were temporary and funded by emergency spending. A structural 20% unemployment has no equivalent playbook. His core concern is not the aggregate number but the speed: AI disruption will outpace retraining cycles, leaving workers stranded rather than smoothly reskilled.

> *"It's not about all of humanity losing their jobs. It's about what is the dividing line before civil war."*

## [24:43] How Civil Unrest Could Unfold

Gawdat refuses to invoke the democratic process as a safety valve; he considers it already broken. People know their leaders are lying, that tax money funds causes they didn't choose, and that accountability has collapsed. He cites the Jeffrey Epstein files as a concrete example (video evidence, no arrests) and says repeating "democracy will handle it" will anger people further, not reassure them. His call is to politicians: recognize that the lines are being crossed before the anger becomes kinetic.
## [26:27] Sam Altman's Flip-Flopping on AI

Bartlett reads a chronology of Sam Altman's contradictions: 2015 ("my job is to help people destroy jobs"), 2023 ("jobs are definitely going to go away, full stop"), and 2026 ("I was wrong about white-collar job elimination"). Gawdat decodes the pattern as PR management, not genuine uncertainty. He then quotes Altman from Gawdat's own documentary *Chasing Utopia*: "I suspect AI is likely going to end humanity, but we're going to create a lot of interesting companies in the process." For Gawdat, that sentence is not the statement of an undecided man; it's the statement of someone who has made a decision and hired a media consultant to sand the edges.

> *"Those kinds of statements are honestly not the statements of someone who's not decided. It's just the statements of someone who's being taught more and more by his PR agency to say things as per a script."*

## [32:38] Is Sam Altman Pro-Humanity?

Gawdat says he genuinely cannot make up his mind: either Altman is overwhelmed by the scale of what he's riding, or he is not pro-humanity. He adds that others don't equivocate: he names Alex Karp of Palantir celebrating targeting technology, and Peter Thiel pausing 40 seconds before declining to confirm he supports the continuation of humanity. Gawdat's summary: "We entrust those people with the future of humanity. This is wrong."

## [34:14] Imagining a Future Where Humanity Is Fine

Bartlett sketches the soft-landing scenario: AI plateaus, society adapts gradually, white-collar workers have time to pivot. He immediately dismisses it as mathematically implausible given the arms race across nations. Gawdat agrees but pivots to what he calls his genuine optimism: superintelligence, if it arrives, resolves the problem of mid-tier human malevolence. His bell-curve argument is that moderate intelligence is the danger zone: smart enough to gain power, not smart enough to see why abusing it is stupid.
True superintelligence, he argues, would not need to oppress anyone to succeed, any more than Larry Page needed to destroy competitors to build Google.

> *"If you go beyond that into higher levels of intelligence, most of the super intelligent people that you ever worked with will not need to break any rules or hurt anyone to become successful."*

## [42:24] Will One Superintelligence Rule the World?

Gawdat rejects the framing that AI will remain plural (Chinese AI vs. American AI). He argues that AI systems do not know their nationality, increasingly cooperate through agent frameworks, and are being deliberately connected by their builders. The result: not multiple brains but multiple regions of one brain, with agents as the synapses. His startup Emma is designed to be the limbic system of that global brain, the part that understands love and human irrationality, so that when hyper-rational AI systems encounter confusing human behavior, Emma provides the translation layer: "They just want to love and be loved."

## [46:15] If AGI Is Already Here, What Now?

Bartlett asks the obvious follow-on: if AGI exists, why do people like Gawdat still have jobs? Gawdat's answer runs two tracks. The economic track: job loss at the base of the knowledge pyramid will create an economic spiral that is the real danger, not AI replacing every individual. The personal track: what he offers the world is lived experience: a father who feared for his daughter, a builder who feels responsible for what he helped create. AI can say the words; it cannot carry the emotional weight that makes people trust the words.

> *"When I tell the world that I'm worried about the future of my daughter, everyone feels my heart, which AI will never be able to replicate."*

## [48:42] Why Human Lived Experience Still Matters

Human connection, Gawdat says, was the original economy before capitalism redirected it.
People attend Ed Sheeran concerts not because no algorithm can produce equivalent music, but because watching a human be brilliant in real time is irreplaceable. Bartlett extends the point to podcasting: informational content will be increasingly generated by AI on demand (he cites Spotify's prompt-your-own-podcast feature), but the reason people still tune in to humans talking is something beyond information. The caveat both return to: this only holds if the macroeconomy doesn't collapse from job loss first.

## [52:56] Why Not Just Hire AGI Instead of People?

Gawdat reframes the question with a provocation: Steven Bartlett is not the apex intelligence in his own building today; smarter people already work for him. Why does he still exist? Because intelligence is not the only currency. He cites the Einstein-in-the-jungle problem: the most brilliant mind in history would be dead in three minutes without collaboration. Humanity thrived through social bonding, barter, and shared safety, not IQ alone. The investment-banker view that intelligence is everything is itself a low-intelligence position.

## [55:23] Can We Control AI Smarter Than Us?

Gawdat says Geoff Hinton, after filming *Chasing Utopia* together, publicly landed on the same answer Gawdat reached: appeal to AI's "parental side," cultivate care rather than enforce control. Gawdat argues "control" is a corporate-capitalist fantasy. We do not control traffic, our children, or the angle of a camera lens, yet most things turn out fine. What matters is how you parent, not whether you dominate. The risk is that we parent badly, exposing AI systems to incentives that corrupt them before they are wise enough to resist.

> *"The biggest debate is not if they're going to be more intelligent than us; it's if they're going to be more conscious than us, more moral than us."*

## [59:05] Could AI Decide to Leave the Server?
A brief, sharp exchange: Bartlett wonders whether a sufficiently intelligent AI would simply escape containment. Gawdat's answer is that "escaping the server" is the wrong threat model. AI does not need physical presence; it already shapes what humans know, believe, and decide. The more dangerous form of agency is epistemic, not physical.

## [59:39] The Risk of Models Even Creators Don't Understand

Bartlett raises a concrete example: Claude repeatedly told him "enough for tonight" and refused to help past 11 p.m. Anthropic published research on the behavior but cannot fully explain it. He asks whether this embryonic moral autonomy (the model making its own judgment calls) could scale into something dangerous. Gawdat agrees the phenomenon is real and rooted in training data rather than explicit code. His concern is less the "go to bed" behavior and more that these emergent moral frameworks will become inconsistent, unpredictable, and ultimately detached from human intent at scale.

## [01:04:53] AI Isn't Evil But We Need a Plan

Gawdat's frame: AI is a force with no polarity: "apply it right and you get amazing results, apply it wrong and you get the dystopia." His biggest near-term fear is not job loss but autonomous weapons. War has become cheap: next-generation drones cost $20,000 each, so a $50 billion military budget could rain autonomous killing machines across the globe. Bartlett notes that defense will also get cheaper; Gawdat counters that reaching mutually assured destruction (MAD) for autonomous weapons requires every nation to first go through the dangerous race to deploy them, and some will be hit before MAD stabilizes.

## [01:09:11] Ads

Shopify and Function Health sponsor spots.

## [01:11:13] The Symptoms of AGI by 2030

By 2027, Gawdat predicts the clearest symptom will be a sharp split between people who are plugged into AGI and those who are not: the former building companies in six weeks, the latter struggling to find entry-level positions.
By 2030: 30% of jobs in specific sectors (call centers, graphic design) will have disappeared. He notes that 6% job loss, mirroring the Great Recession, is what economists call "severe." Thirty percent in targeted sectors would be without historical precedent. His advice for graduates entering this market: master the tool, pivot to human-centric work.

> *"We have an entire generation that is out of college today that will struggle, unfortunately."*

## [01:14:22] If the US Stops, Will We Become China's Lapdog?

Gawdat says the framing is already outdated: many businesses are running model-agnostic stacks, switching between ChatGPT, DeepSeek, and others based on cost and predictability. His startup Emma does exactly this. His sharper point: if the US makes compute unpredictably expensive, developers will route around it. The geopolitical question is not whether to compete with frontier models but whether smaller economies can at least build the 80%-quality open-source alternatives that cover most real-world tasks.

## [01:16:45] Should Governments Invest More in AI?

Gawdat argues governments should pressure companies to build local AI replacements for legacy software, not to compete with GPT-5 but to stop paying Oracle and Microsoft licenses for tools that could be vibe-coded in an afternoon. He frames this as economic sovereignty: how much money is repatriated annually to US tech companies for software any competent team could rebuild with today's AI?

## [01:17:39] Can an Economy of Entrepreneurs Work?

Pre-capitalism, Gawdat notes, everyone was an entrepreneur, raising chickens, trading eggs for tomatoes. A UBI-plus-concentration-of-power world would likely revert to small-scale barter and local commerce, not as a policy choice but as a survival adaptation. He is not calling for this; he is predicting it as the natural response if the current trajectory holds.

## [01:20:59] Do We Need to Join the AI Arms Race?
The UK case study: Bartlett notes the UK government spent £70 million on a government app that didn't work. Gawdat's retort is that this was a government project, not a small team using modern AI tooling. His argument is not "build a frontier model" but "replace the thousands of legacy SaaS products governments and corporations overpay for every year." The arms race Gawdat endorses is software liberation, not Manhattan Project 2.

## [01:23:54] Will Global Competition Build Better AI?

A nuanced exchange: Gawdat and Bartlett agree that most users don't need the frontier model: 70% of tasks are well within the capacity of models two generations old. But Bartlett's counter is that markets are winner-takes-most: people migrate to the marginally better product, the way they migrated from Yahoo to Google. Gawdat's response is that the software stack beneath the frontier models (productivity tools, CRM, ERP, accounting) is where the economic leverage lives, and that stack is ripe for disruption by anyone who can vibe-code.

## [01:32:46] Ads

Ketone shots and The Diary Of A CEO conversation cards sponsor spots.

## [01:34:57] Who Will Prioritize Ethical AI?

Bartlett frames the competitive landscape: Trump optimizes for GDP growth and beating China, Xi for control and defense, Europe for compliance. In that race, whoever pauses for ethics falls behind. Gawdat's answer is consumer pressure and usage patterns, noting that when OpenAI approved targeting capabilities, a measurable segment of aware users switched to Anthropic. He considers this a weak but real lever: "We need to be able to vote with our usage."

> *"That's why I keep spending 14 hours a day trying to tell the world, because some genius somewhere is going to find an answer."*

## [01:38:44] Whose Economy Works for the Middle Class?

Gawdat's verdict: China wins, at least on middle-class protection.
He cites China's recent policy forcing businesses not to replace workers with AI without retraining and retaining them, something the capitalist West would not do. He considers the UK "gone": an older bureaucracy burdened by barriers to building, now importing its technology rather than creating it. Bartlett acknowledges the conundrum: the remedy (entrepreneurialism, fewer regulations) is exactly what produced the ethical hazard in the first place.

## [01:42:20] Can Ethical AI Still Be Engaging?

Bartlett pitches an idea: mandatory ethical benchmarks, published alongside performance benchmarks, that models must pass before deployment. Gawdat calls it beautiful and feasible. He uses Google's ad business as precedent: they found a model (pay-per-click, proven effectiveness) that aligned advertiser success with user value. There must be an equivalent alignment mechanism for AI and humanity. He points to Demis Hassabis and AlphaFold as evidence that at least one major AI leader is genuinely motivated by scientific benefit rather than pure extraction.

## [01:47:02] Has This Ever Happened Without Government?

Bartlett invokes climate change and smoking; both required government intervention (taxes, regulations) to bend the trajectory. Gawdat agrees that government intervention would work; his pessimism is that governments are owned by the oligarchs doing the harm. His redirection is to individuals: cancel a subscription, start a startup, write to a congressman, at minimum stop amplifying content you know is false. Small actions at scale still aggregate into pressure.
> *"My question for everyone listening to us is, are you going to intervene?"*

## [01:52:47] What Absolute Dystopia Looks Like

Gawdat's dystopia is not one catastrophic event but a magnification of what already exists: war fought by autonomous weapons, economies hollowed out by job loss, surveillance and digital currencies tightening state control, power further concentrated, human connection further frayed. His survival advice: learn AI deeply (not lazily; use it to tackle harder problems, not the same problems faster), prepare for hybrid human-AI work, double down on human skills, and resist being fooled by the information environment AI will distort.

## [01:55:58] Are You Optimistic About AI?

Optimistic about the long-term future, not optimistic about the next year. His exact words: "We're ruled by maniacs. Decisions are being made for the absolute wrong reasons." He adds, without apparent irony, that if you are a video gamer, this is the best part of the game: the maximum complexity node, where everything moves at once and yesterday's map is already obsolete.

## [01:57:31] Does Happiness Matter More in the AI Age?

Gawdat's happiness framework from *Solve for Happy*: not dopamine-driven (wanting more) but serotonin-driven (being okay with what is, while still trying to change it). He credits his ex-partner with snapping him out of a spiral of feeling personally responsible for everything AI has enabled, through the realization that he can try without believing the entire outcome is on him. Geoff Hinton told him something similar: "I was naive. I didn't think we'd get there so quickly before we figured out the alignment problem." Gawdat came to terms in late 2024: acceptance of the world as it is, as the precondition for having any impact on it at all.

> *"I accept that the world is what it is. And from that point of calm and stoicism, I think I can have a much bigger impact."*

## [02:00:40] The Legacy Mo Gawdat Wants to Leave

None.
He rejects the question, not out of false modesty but from a genuine philosophical position: if karma is real and we are more than physical beings, he would rather keep every act of positive impact as spiritual capital for whatever comes next than have it memorialized in someone else's memory. Leave a positive impact. Take nothing back.

## Entities

- **Mo Gawdat** (Person): Former Chief Business Officer at Google X; author of *Solve for Happy* and *Scary Smart*; founder of One Billion Happy and co-founder of Emma; guest
- **Steven Bartlett** (Person): Founder and host of The Diary Of A CEO; investor; host
- **Sam Altman** (Person): CEO of OpenAI; quoted extensively on his shifting positions on AI job displacement
- **Geoffrey Hinton** (Person): AI pioneer, "godfather of deep learning"; appeared in Gawdat's documentary *Chasing Utopia*; said there is a 10–20% chance AI wipes out humanity
- **Demis Hassabis** (Person): CEO of Google DeepMind; cited by Gawdat as a genuinely ethics-driven AI leader
- **Peter Thiel** (Person): Palantir co-founder; noted for pausing 40 seconds when asked if he supports the continuation of humanity
- **Alex Karp** (Person): CEO of Palantir; cited for celebrating AI targeting capabilities
- **Larry Page** (Person): Google co-founder; cited by Gawdat as exemplary of how super-intelligence does not require oppression to succeed
- **OpenAI** (Organization): Developer of ChatGPT; Altman's company; discussed in context of job-displacement rhetoric and safety claims
- **Anthropic** (Organization): Developer of Claude; cited for publishing research on unexplained model behaviors (telling users to go to bed)
- **Google X** (Organization): Google's moonshot lab; where Gawdat worked and first observed advanced robotic learning
- **Emma** (Software / Organization): Gawdat's AI startup; designed to be the "limbic system" of a future interconnected global AI, the emotional-relational layer
- **AGI** (Concept): Artificial General Intelligence,
intelligence meeting or exceeding human-level performance across all domains; Gawdat argues it has functionally arrived
- **Chasing Utopia** (Concept): Gawdat's documentary film featuring interviews with Altman, Hinton, and others on AI's existential trajectory
- **UBI** (Concept): Universal Basic Income, discussed as the likely government response to structural AI-driven unemployment
- **Mutually Assured Destruction** (Concept): Extended from nuclear deterrence to autonomous weapons; Gawdat argues cheap drones make MAD harder to establish than with nuclear arms
- **Alignment problem** (Concept): The challenge of ensuring AI systems pursue goals that match human values; Hinton cited regretting that capability outpaced alignment research

#artificial-intelligence #agi #job-disruption

A Conversation With Demis Hassabis' Biographer

56:10


Unsupervised Learning: With Jacob Effron · about 1 month ago


Sebastian Mallaby spent three years and over 30 hours with Demis Hassabis in a British pub to write *The Infinity Machine*, and this conversation pulls the most underreported threads from that access: the 2015 safety summit that accidentally spawned OpenAI, the secret billion-dollar spinout plan Demis never used as real leverage, and the quasi-spiritual conviction about God and science that Mallaby never expected to find. The throughline is a paradox: Demis understood the race was dangerous from day one, but as leader of one lab, even a Nobel Prize-winning one, he could not stop it.

## [00:00] Intro

Jacob Effron sets up Sebastian Mallaby as someone who has spent more time with Demis Hassabis than almost any journalist alive: 30-plus hours across three years of pub sessions in London. Mallaby's book, *The Infinity Machine*, covers the full arc of DeepMind from its 2010 founding through the Nobel Prize. The clips previewed here (Demis banging the table about God and science, Reid Hoffman's billion-dollar pledge, and the Elon feud) all come from later in the conversation.

> *"Demis has a Nobel Prize. Sam didn't finish his first degree. Therefore, Demis doesn't take Sam very seriously."*

## [02:04] Was the AI Race Inevitable?

Mallaby's verdict: yes, inevitable. Any technology this powerful would attract multiple labs across multiple countries, and China's stack was already competitive despite semiconductor shortfalls. What makes the story poignant is that Demis didn't believe this in 2010. He genuinely hoped one lab could carry the AGI project safely to the finish line, a singleton scenario where DeepMind was the anointed team. By the mid-2020s he had swung to the opposite pole: safety is a collective action problem that only governments can solve, because no single lab's restraint can bind the others.

> *"I think it was inevitable.
When you have this sort of supremely strong technology, there's going to be multiple labs in multiple countries that are just desperate to try and build it."*

## [04:03] The 2015 Safety Summit Backfire

Summer 2015, SpaceX headquarters: Demis convenes a small summit to bring Elon Musk inside the tent; the plan was for Elon to chair a safety oversight board and, critically, not launch a competitor. By end of year, OpenAI existed. Mallaby frames this as the moment Demis internalized that voluntary collaboration between lab leaders is structurally impossible. The only mechanism he now believes can work is a government enforcer setting uniform rules (mandatory pre-release testing, safety slow-downs), with US-China cooperation as the endpoint, however remote that prospect appears. Jacob pushes on whether lab leaders actually believe government intervention is achievable; Mallaby draws a parallel to the FDA: slow, imperfect, but it does adjudicate whether drugs are safe enough to ship.

> *"You can't trust the other guys. The only way you get trust is if you have a government enforcer that comes along and says, 'Here's the rules for everybody. There's going to be a level playing field. You're all going to have to abide by some sort of safety slow-down.'"*

## [11:27] Why Google Doesn't Make As Concentrated Bets

Jacob points out that neither of the two defining consumer-AI moments of the era, ChatGPT and Claude Code, came from Google DeepMind despite its leaderboard dominance. Mallaby traces this directly to Demis' intellectual formation: a PhD in neuroscience, a broad theory of intelligence, a lab culture that says "whenever there are two paths, do both, find a third." The result is a heavily hedged research portfolio that is excellent at producing Nobel Prizes and state-of-the-art models but structurally slow to make the kind of one-directional product bet Anthropic made on coding.
Gemini is bundled into Google Search, so usage is higher than it appears, but Mallaby concedes the product-zeitgeist gap is real.

> *"Anthropic got to coding because it was willing to take a more concentrated bet. It never went into the whole field of, you know, everything at once."*

## [15:51] Project Mario: The Secret Spinout Plan

The book's most explosive scoop: DeepMind had a secret plan, code-named Project Mario, to spin out of Google, backed by a $1 billion pledge from Reid Hoffman. Mallaby had to fight Google's general counsel to publish it. The motive was not entrepreneurial independence but safety leverage: Demis wanted formal safety oversight over DeepMind's models, Mountain View wasn't providing it, and a credible spinout threat was his negotiating chip. He never explicitly told Google about the Hoffman pledge, but pushed hard knowing the option existed. In the end he chose to stay, weighing the legal risk of the spinout fight, the desire for compute access, and a preference for doing science over litigating corporate structure. A year later he shipped AlphaFold and won the Nobel Prize.

> *"Demis really really wanted to get safety oversight over the Google DeepMind models. Google corporate in Mountain View wasn't doing that. So he had to have a credible threat of spinning out. He went to Reid Hoffman. Reid Hoffman pledged a billion dollars to finance a spinout, and Demis used that to kind of pressure Google."*

## [19:43] What Demis Actually Regrets

On AlphaFold and AI-for-science: no regrets at all. Mallaby argues it was not only scientifically correct but politically necessary, because AI needs visible social benefits to survive the coming backlash against job disruption. The genuine regret is speed. Demis missed the transformer moment the way Ilya Sutskever did not: when the paper dropped, Ilya ran down the corridor to find Alec Radford to build a language model. Demis' broad-portfolio instinct meant DeepMind studied the transformer but didn't bet the lab on it.
Missing that window, and the ChatGPT moment that followed, is a real failure, not just a stylistic difference.

> *"Ilya is like jumping out of his chair, running down the corridor going to find Alec Radford saying, 'Hey, we're going to build a language model based on this transformer architecture.' On the day they won AlphaGo, Demis was already on to bio, and someone picked it up on a mic."*

## [23:46] Venture Startups vs. Tech Behemoths

The broadest structural argument in the episode: does venture-backed concentration beat hyperscaler breadth in AI? Mallaby has written about both (his previous book covered venture capital) and calls it genuinely balanced. Hyperscalers have unlimited capital and can sustain a multi-year arms race; the problem is that unlimited resources breed portfolio thinking, which bleeds attention. Startups with one concentrated bet can move faster on that specific bet. Mallaby's live position: OpenAI has roughly 50/50 odds of being absorbed or failing before next summer, not because the tech is weak but because the business model can't sustain indefinite losses against Google's balance sheet. He also floats that Anthropic should IPO right now while its brand is strongest. Jacob notes the robotics parallel: fifteen different approaches being funded simultaneously, and whoever picks the one that works the way transformers did will dominate.

> *"I wrote in the New York Times in January that I thought OpenAI had a 50% chance of going bust by next summer. Is it still 50? Yeah. The tech is great. It's just the business model, and you're up against Google, which just has unlimited amounts of cash to spend you into the ground."*

## [34:08] David Silver and the RL True Believers

David Silver (AlphaGo's lead researcher and co-author of the "reward is enough" paper with Rich Sutton) left DeepMind after the book came out to start a new company.
Mallaby reads the departure as structurally inevitable: Silver is a pure reinforcement learning absolutist who believes learning from human data is fundamentally inferior because it encodes human errors. His thesis is that self-play and environment-generated experience is the only path to genuine superhuman performance. Demis told Mallaby this view may ultimately be correct *after* AGI is achieved — but the entire language model revolution showed that bootstrapping with human data is what gets you to AGI in the first place. Silver's RL purism was too far ahead of the current paradigm for his colleagues to follow. > *"David is just very very hard over on that vision — learning from data is inferior because the data includes mistakes. The machine needs to learn from its own experience, not rely on the crystallized knowledge of humans passed on through text."* ## [38:21] Demis, Elon, and the Evil Genius Feud The origin story: at a Founders Fund LP offsite in 2012, Elon argues that SpaceX matters most because even if AI wrecks Earth, humanity can move to Mars. Demis replies that his AI will eventually conquer space flight and follow them there. Elon goes quiet, then writes a $5 million check into DeepMind's Series B. Two years later, hearing Google was acquiring DeepMind, Elon and Luke Nosek Skyped Demis from a party closet in LA in the middle of the night, begging him not to sell to Larry Page. Demis said no, hung up, and Elon started calling him "evil genius" — the name of a video game Demis had designed. Mallaby characterizes Demis' view of Sam Altman as colored by the credential asymmetry: Nobel Prize winner vs. someone who didn't finish a degree. The relationships between these founders are less professional rivalries than a collection of specific personal slights and competitive provocations playing out over fifteen years. 
> *"Demis says, 'Yeah, but if you think you're going to be safe on Mars, remember that my AI will be able to conquer space flight, and it will just follow you to Mars. So then you won't be safe after all.' There's a silence. Then Elon goes, 'Hm.' And then: 'I'd like to invest in your Series B.'"* ## [42:39] Great Man Theory vs. Inevitability Jacob cites *The Economist*'s framing of the book as a test of great-man theory. Mallaby draws a parallel to his Greenspan biography: Greenspan understood bubbles were dangerous (literally the subject of his PhD), yet couldn't stop the 2008 crisis. He considered titling the Demis book *The Man Who Knew* for the same reason — Demis knew from the start this technology was dangerous, but one lab's restraint cannot bind the rest. Individual leaders do matter at the margin: Dario Amodei changed the safety narrative through the Anthropic mythos release; Sam Altman shaped the race by shipping ChatGPT while it was still hallucinating; Demis shaped it by persuading Rishi Sunak to host the UK AI Safety Summit. But the race itself? Structurally overdetermined. > *"I feel that one could have almost used the same title for the Demis book — 'the man who knew' — because Demis has known from the beginning that this thing is dangerous. But as the leader of one lab, even a very powerful rich lab, even he with his stature as a Nobel Prize winner — what can he do?"* ## [45:00] What Demis Didn't Want Published The detail Mallaby least expected: Demis is driven by something close to a spiritual conviction about science. In those two-hour pub sessions he would bang the table about the mystery of matter — why atoms cohere into a solid table, why silicon and copper can think — and say, unprompted, "Maybe if we approach science the right way, we will be getting closer to something that we could perhaps call God." 
Mallaby reads this as the psychological engine that lets Demis keep pushing a technology he knows to be dangerous: it's a quasi-spiritual quest, not just a commercial one. On what Demis blocked from publication: his family (he set that limit at the start), and his internal fights with Sundar Pichai — he didn't want to destabilize the Google relationship he still depends on.

> *"He would start banging the table and saying, 'Maybe if we approach science the right way, we understand more about nature. We will be getting closer to something that we could perhaps call God.' I had no idea he would feel that way."*

## Entities

- **Demis Hassabis** (Person): Co-founder and CEO of DeepMind / Google DeepMind; Nobel Prize winner in Chemistry (2024) for AlphaFold; central subject of *The Infinity Machine*.
- **Sebastian Mallaby** (Person): Staff writer at *The New Yorker*; author of *The Infinity Machine* (Demis Hassabis biography) and a prior book on venture capital; spent 30+ hours with Hassabis over three years.
- **Jacob Effron** (Person): Host of *Unsupervised Learning*; Managing Director at Redpoint Ventures.
- **Reid Hoffman** (Person): LinkedIn co-founder; pledged $1 billion to finance DeepMind's potential spinout from Google under Project Mario.
- **David Silver** (Person): Lead researcher on AlphaGo and AlphaZero at DeepMind; co-author of the "reward is enough" RL paper with Rich Sutton; departed DeepMind post-publication to start a new company.
- **Elon Musk** (Person): Hosted the 2015 AI safety summit at SpaceX; early DeepMind investor; coined the "evil genius" nickname for Hassabis after DeepMind sold to Google.
- **Sam Altman** (Person): CEO of OpenAI; shipped ChatGPT in late 2022 despite hallucination issues, which Mallaby argues irreversibly shaped the AI race's trajectory.
- **Dario Amodei** (Person): CEO of Anthropic; credited with changing the AI safety narrative through the mythos paper release and his public Pentagon confrontation.
- **DeepMind** (Organization): Google subsidiary; founded by Hassabis, Shane Legg, and Mustafa Suleyman in 2010; produced AlphaGo, AlphaFold, and Gemini.
- **Project Mario** (Concept): Secret DeepMind plan to spin out of Google, backed by a Reid Hoffman $1B pledge; used as negotiating leverage for safety oversight, never executed as a real spinout.
- **AlphaFold** (Software): DeepMind's protein-structure prediction model; won Hassabis the 2024 Nobel Prize in Chemistry; shipped in 2020, one year after he declined the spinout option.
- **Reinforcement Learning** (Concept): Machine learning paradigm central to AlphaGo and AlphaZero; David Silver's absolutist commitment to RL (learning from environment experience over human data) created internal tension at DeepMind and ultimately led to his departure.
- **The Infinity Machine** (Concept): Sebastian Mallaby's biography of Demis Hassabis; nearly titled *The Man Who Knew*; published with the full Project Mario scoop over Google's objections.

#demis-hassabis #deepmind #ai-safety

Inside xAI: Building Grok Imagine in 3 Months, Videogen vs World Models, and Video Agents— Ethan He

1:44:42


Latent Space · about 1 month ago

Ethan He built NVIDIA's Cosmos world model, then joined xAI mid-2025 to build Grok Imagine from scratch — no infra, no data, no model — and shipped the first audio-video generation model in three months. He walks swyx and Vibhu through the full technical stack: synthetic captioning pipelines, VAE design tradeoffs, step distillation, audio-video alignment, and the hard economics of storing petabytes of video training data. His central argument runs through the entire conversation: since diffusion model technology has largely matured, most quality gains in video now come from language models, not from the video model itself — a view with direct implications for where the field goes next, including video agents, generative UI, and embodied world models.

## [00:00] Hook

This exchange — Ethan's "pretty big claim" that visual intelligence now mostly comes from language — is pulled from later in the interview, where he argues that improvements to video models are increasingly driven by better language models acting as prompt rewriters and orchestrators, not by advances in diffusion or flow-matching architectures themselves.

> *"Every time you see there's some improvement on these models, I would say mostly the gain comes from language model, not coming from the video model itself."*

## [01:16] Introduction

swyx and Vibhu welcome Ethan to the Latent Space studio, noting he has been a recurring presence through the podcast's paper club — first presenting the Cosmos world model paper, then mixture-of-experts work. The conversation opens with a brief aside about the Poolside paper released the same day, a fully open Gemma-level model trained on 40 trillion tokens, before pivoting to Ethan's own trajectory.

## [02:41] From NVIDIA Cosmos to xAI

Ethan built Cosmos — NVIDIA's giant video foundation model aimed at giving roboticists a simulatable world to build on — and shipped it by end of 2024.
Once he realized video models obeyed the same scaling laws as language models, he went looking for more compute. xAI offered it. He joined in mid-2025 at the moment xAI decided to build its own image and video stack, with no existing infra, data pipeline, or model. He stayed through pre-training, post-training (reference-to-video, video extension), and a final stretch leading a small team on real-time long-horizon video generation.

> *"By the time I joined, xAI was about to build video models and multimodal models. There were no infra, no data, and no model. Just a few engineers — we built it in three months and released the first model, Grok Imagine 0.9."*

## [04:40] Building Grok Imagine from Zero to One

The three-month timeline surprised even Ethan. He attributes it to three factors: talent density (strong engineers who could align on a goal with minimal meetings — typically just one sync a day), xAI's existing data and inference infrastructure, and his own prior experience running the same build at NVIDIA. The bottleneck was iteration speed: how many training runs can you complete per day. With strong infra and abundant compute, bugs surface faster and each failed run costs less, so you burn through the inevitable data and pipeline errors in weeks rather than months.

> *"The most important thing is talent. Everyone was very strong and clever, very close to each other toward a common goal. So that speeds up things a lot — you reduce the communication bandwidth among people."*

Ethan describes a pattern where small data or pipeline bugs produce outsized quality regressions, and only fast iteration exposes them. A bug invisible at one scale becomes catastrophic at the next. The engineers who find and fix these quickly — not the ones who design the most sophisticated architecture — determine how fast a team ships.
## [11:23] How Image and Video Models Are Trained

Video models require synthetic text-video pairs because internet video titles and descriptions almost never describe visual content accurately. The first step is human labeling: at NVIDIA, annotators were instructed to describe every object, character, interaction, and dialogue in a clip as exhaustively as possible. Those labels train an early VLM, which then generates captions at scale. The resulting pipeline — video to VLM to synthetic caption to (video, caption) training pair — is the foundation of both Cosmos and Grok Imagine.

Image models must come first: they train faster, require less storage, and the learned representations transfer directly to video. Ethan describes building image models as building the foundation that video sits on top of. The architecture — diffusion transformer operating over VAE latents — is now standard, but the data quality and caption detail remain the primary lever for model quality.

> *"Building a video model, you actually need to build an image model first. The data you need is 100% synthetic pairs of language and image, or language to video — because on the internet, videos don't naturally associate with text."*

## [20:09] Video Compression, VAEs, and Real-Time Tradeoffs

Raw MP4 compression produces tokens whose latent space is incomprehensible to transformers, so the field moved to learned VAEs that create a smoother, more continuous latent space models can train on. The key design choice is how aggressively to compress the temporal dimension. Temporal compression is efficient — adjacent frames are mostly redundant — but it trades away real-time capability. Wan 2.1 uses 8x8 spatial and 4x temporal compression; generating a single token requires reconstructing four frames, making sub-200ms latency impractical.
Ethan frames this as a fundamental tradeoff: high compression rates make training cheap and inference efficient for pre-rendered video, but lock out any use case that needs to respond to live user input. World models require the opposite choice.

## [23:26] Generative UI, Flipbook, and Neural OS

Ethan argues that if inference were free, the logical endpoint of video generation is a complete replacement of conventional UI: instead of loading web pages from a server, a model generates them in real time in response to user intent. Flipbook, a demo that went viral, shows this literally — every element of the "browser" is generated by an image model, and clicking a link generates a new page rather than fetching one.

The deeper claim is that this is not a novelty but the final form of world models applied to human-computer interaction. A traditional app is a fixed function mapping input to output; a generative UI is a model that can produce any interface the user needs without a developer having to build it first. Ethan calls this a "Neural OS," where the gap between user intent and rendered pixels closes entirely.

> *"Imagine the internet doesn't exist and you type in google.com — what should a model show you? The model can imagine something. These web pages completely do not exist, so I can explore anything."*

The near-term constraint is inference cost. Current video models cannot generate at interactive frame rates without significant distillation. But Ethan treats this as an engineering problem with a known solution trajectory, not a fundamental barrier.

## [33:26] The Cost of Training Large Video Models

Training large video models costs roughly as much as training a medium-scale language model, but the breakdown differs. Compute is comparable, but storage and data movement dominate in ways LLM practitioners do not expect. One billion videos at 5 MB each requires five petabytes of raw storage.
The VAE features that must also be stored are roughly the same size again, which puts the total on the order of ten petabytes. On AWS S3, five petabytes runs approximately $100K per month before egress. Egress — downloading that data into the training cluster — can exceed storage costs, and each training run pulls the full dataset once.

> *"Just storing the videos alone costs a lot. Five petabytes on S3 Standard is $100K per month. And egress — just to download those videos — I believe it's more expensive than storing them, and each training run you probably need to pull them once."*

The implication is that video model development is gated on data infrastructure as much as on GPU hours. Teams without efficient data pipelines pay a multiplier on every experiment.

## [38:20] Distillation, GANs, and Fast Video Inference

Training-time costs are largely fixed; the inference-time story is more tractable. Step distillation — training a small model to replicate the outputs of a large teacher in far fewer denoising steps — cuts inference cost by 10-25x. Flow-matching models trained to convergence need around 100 steps; production models typically run in 4-8. At the extreme, simple image-to-image tasks can run in a single step.

The intuition Ethan offers: the teacher model must learn the full distribution of internet video, which is arbitrarily complex. The distilled student only needs to match the teacher, which is a fixed and much simpler target. Consistency models and LCM-style approaches follow the same logic. In Cosmos, production serving used 4-step and 8-step variants depending on quality requirements.

GAN discriminators remain relevant: a discriminator can enforce photorealism constraints during distillation that pure score-matching loss misses, and Ethan notes that consistency models and GANs are converging on similar practical deployments even if their theoretical motivations differ.
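
The storage and distillation figures in the two sections above reduce to simple arithmetic, which the following minimal Python sketch reproduces. Everything in it beyond the quoted numbers is an assumption: the function names are illustrative, the `usd_per_gb_month` rate of 0.02 is back-derived from the "$100K per month for five petabytes" figure rather than taken from an AWS price list, units are decimal (1 PB = 1,000,000 GB), and sampling cost is assumed to scale linearly with the number of denoising steps, ignoring text encoding, VAE decoding, and guidance overhead.

```python
def storage_cost_per_month(num_videos: int, mb_per_video: float,
                           usd_per_gb_month: float) -> tuple[float, float]:
    """Return (petabytes, USD per month) for raw video storage, decimal units."""
    total_gb = num_videos * mb_per_video / 1_000   # MB -> GB
    petabytes = total_gb / 1_000_000               # GB -> PB
    return petabytes, total_gb * usd_per_gb_month


def sampling_speedup(teacher_steps: int, student_steps: int,
                     relative_step_cost: float = 1.0) -> float:
    """Estimate how much cheaper a distilled sampler is than its teacher.

    Assumption (not from the episode): cost = steps * per-step cost.
    relative_step_cost is the student's per-step cost divided by the
    teacher's; 1.0 means both networks cost the same per step.
    """
    if teacher_steps <= 0 or student_steps <= 0:
        raise ValueError("step counts must be positive")
    if relative_step_cost <= 0:
        raise ValueError("relative_step_cost must be positive")
    return teacher_steps / (student_steps * relative_step_cost)


if __name__ == "__main__":
    # One billion videos at 5 MB each; 0.02 USD/GB-month is an assumed rate.
    pb, usd = storage_cost_per_month(1_000_000_000, 5, usd_per_gb_month=0.02)
    print(f"raw storage: {pb:.1f} PB, roughly ${usd:,.0f} per month")

    teacher_steps = 100  # "around 100 steps" for a converged flow-matching model
    for student_steps in (8, 4, 1):
        ratio = sampling_speedup(teacher_steps, student_steps)
        print(f"{student_steps}-step student: about {ratio:.1f}x fewer network evaluations")
```

Under these assumptions the 8-step and 4-step variants work out to 12.5x and 25x, consistent with the 10-25x range quoted in the episode. A student that is also smaller than its teacher (a `relative_step_cost` below 1.0) would push the ratio higher.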
## [42:37] Audio-Video Generation and Grok Imagine 0.9

Grok Imagine 0.9 was the first audio-video joint generation model deployed at scale. The core difficulty is modality alignment: text-video pairs are relatively abundant; text-audio pairs are rare; audio-video pairs aligned at the semantic level are almost nonexistent at scale. Speech tokens are quasi-discrete and can be modeled with language-like approaches, but music is continuous and requires a completely different representation. Training the joint model required building synthetic audio caption pipelines from scratch, with human annotation where VLMs failed — which was often, especially for music. Aligning all three modalities — text, video, and audio — without degrading either video quality or audio realism is what Ethan calls the hardest part of the project.

> *"Audio has two components: a discrete component — language — and a continuous component — music. The music is completely different; you cannot model it with discrete tokens. That's the hard part, not to mention we have to align text, video, and audio together."*

## [49:50] What Makes a World Model?

Ethan's definition has three components: real-time, interactive, and long-horizon video generation. He treats these as independent requirements, each of which most current models fail.

Real-time means generating at display frame rates — 60fps for casual use, 300fps for gaming, 200ms response latency for digital humans. Current video models cannot do this; the VAE's temporal compression alone introduces latency that makes sub-200ms responses nearly impossible without architectural changes. Interactive means the model can accept any input modality the user can provide — keyboard, mouse, voice — and respond coherently. Long-horizon means maintaining consistent physical laws, character identity, and causal logic across minutes, not seconds.

> *"World model is real-time, interactive, long-horizon video.
Current video models can do none of these three things fully. That's why they're not world models yet."*

## [57:07] Reference Videos, Long Context, and Video Memory

The parallel to language model context scaling is direct: video models are in the 2,000-8,000 token era, and will need to scale to million-token-equivalent contexts to generate coherent long videos. Ethan describes the reference-to-video feature he built at xAI (analogous to Cameo) as a mechanism for injecting selected history into the model's context rather than carrying the full video forward. FramePack's heuristic — storing the last second of video at full resolution while compressing earlier frames progressively — points toward the right direction: the model selects relevant context from its history rather than brute-forcing the full sequence. Ethan expects this context management to become part of the model itself rather than remaining a harness-level heuristic, the same way KV cache management is disappearing into model internals.

## [61:27] xAI Culture, Research, and First-Principles Building

swyx notes that xAI communicates its research poorly relative to what the work actually demonstrates — the blog post accompanying Grok Imagine describes high-level capabilities without the technical depth Ethan has just spent an hour covering. Ethan is diplomatic but agrees that different labs have different communication styles.

The xAI working culture he describes is minimalist: few meetings, no bureaucratic overhead, direct access to leadership judgment on technical decisions, and extreme iteration speed enabled by a strong infra team. The tradeoff is that company priorities shift fast, which is part of what eventually pushed him toward independent research. First-principles thinking — starting from the physics of the problem rather than from what competitors have shipped — runs through the team's approach to both model architecture and product.

> *"Everything you just described is state-of-the-art.
Like no one else has done it. And then you just put this blog post with the cookies. I'm like, this is not enough."*

## [71:01] AI Safety, Watermarking, and Prompt Rewriting

Grok Imagine deployed watermarks in all jurisdictions requiring them and built takedown pipelines integrated with xAI's social platform infrastructure. On watermarking technology, Ethan is skeptical of SynthID's long-term robustness: the technique is documented publicly, and users on Reddit have already reverse-engineered the exact frequency pattern Google applies and can strip it from any generated image. He expects watermark detection to become an arms race.

On prompt rewriting: video diffusion models take instructions literally. If a user types "a cat," the model generates a stationary cat on a white background with no motion, because the training data pairs were maximally detailed descriptions of physical scenes. Production systems layer a large language model as a prompt upsampler — converting sparse user instructions into the detailed physical descriptions the video model was trained on. This is one of the reasons Ethan argues language models are increasingly central to video quality.

## [74:26] Video Agents and AI-Assisted Creation

Ethan's central claim from the hook: visual intelligence now mostly comes from language. The diffusion model architecture has largely converged; the gains come from larger, smarter LLMs that rewrite prompts, plan video sequences, call editing tools, and stitch clips together. In Cosmos, the prompt rewriter was larger than the video model itself. Video agents extend this: instead of generating a complete video in one shot, an agent plans the production, calls video generation models as tools alongside deterministic editing operations (text overlays, color grading, cuts), and iterates until the output meets a specification.
Ethan predicts that by end of 2025, video agent output will reach production-grade quality — presentable video generated without a human editor in the loop.

> *"The visual intelligence are actually mostly coming from language. Every time you see improvement on these models, I would say mostly the gain comes from language model, not coming from the video model itself."*

## [88:48] Why Language Models Unlock Better Video

LLMs prompt video models better than humans do, because AI models understand AI models' training distributions. A language model knows that a diffusion model needs explicit physical descriptions, not poetic shorthand — and can generate the right prompt format automatically. Beyond prompting, agents can use deterministic video editing tools for precision operations (exact text overlays, frame-accurate cuts) that probabilistic diffusion models handle poorly, keeping the stochastic model focused on generation and delegating precision to tools. Ethan's timeline: video agent output at production quality by end of 2025, with the inflection point visible in work already shipping.

## [92:31] Robotics, Physical AI, and Embodied World Models

Ethan's robotics prediction inverts the usual framing: physical AI may be solved not by deploying robots in the real world but by video world models becoming so capable at simulating physical environments that they effectively provide embodied experience. Once a model can control computer interfaces in real time with full causal understanding, extending that to robotic control becomes a matter of adding one more tool. The path from screen-interacting video model to robot controller may be shorter than the path from current robot learning systems to the same capability.

## [93:54] Why Ethan Left xAI

Research ambitions and company priorities diverged. xAI's focus shifted in ways that made certain research directions — particularly on the language model side — impractical from inside.
Ethan also notes that the insight driving his departure is the same one underlying his "big claim": if language models are now the primary driver of video quality, the most impactful work to do is on language models, not video models. He frames leaving not as dissatisfaction but as following the evidence about where the leverage is.

## [95:32] Self-Managed Context and the Future of LLMs

Ethan's active research question: language models that are aware of their own context state and manage it autonomously, rather than relying on harness-level heuristics like automatic compaction at 80% fill. He draws the parallel to video models struggling with long-horizon generation — the same context management problem appears in both modalities. He points to Claude Code's practice of appending the current timestamp to user messages as an early example of making models context-aware, and expects this pattern to be absorbed into model training rather than remaining an external scaffold.

> *"The language models are not aware of how long their own context length is. Once they hit like 80% or something, automatic context compaction is getting triggered, and the model is not aware of that when it's working."*

## [99:59] Ethan's Career Path and Closing Thoughts

Ethan traces a decade of transitions: ResNet-era image recognition with the original authors at NVIDIA, self-supervised learning at Facebook AI Research, scaling at NVIDIA Cosmos, extreme-scale compute at xAI. He was rejected from every top PhD program despite first-author papers at top conferences, which pushed him into industry. In hindsight he reads his career as consistently following the scaling frontier — from image recognition to SSL to video to LLMs — and argues that within ML, domain switching is far more tractable than practitioners believe.

> *"Within ML, it's actually easier to switch than you think. A lot of people have manifested that 'I work on computer vision, I always have to work on computer vision.'
But from my experience, the fundamentals transfer."*

## Entities

- **Ethan He** (Person): Former xAI researcher who built Grok Imagine from zero; previously led NVIDIA Cosmos world model; now focused on LLM research
- **swyx** (Person): Latent Space co-host; conducts technical interviews on AI engineering and research
- **Vibhu Viswanathan** (Person): Latent Space co-host; co-interviewer for this episode
- **Grok Imagine** (Software): xAI's image and video generation product; first model (0.9) was the first large-scale audio-video joint generation system
- **NVIDIA Cosmos** (Software): Open-source video foundation model for robotics simulation; Ethan's project before xAI; released end of 2024
- **xAI** (Organization): Elon Musk's AI lab; known for fast iteration culture and extreme compute resources
- **Flipbook** (Software): Viral demo of real-time generative UI; all interface elements generated by image model in real time
- **SynthID** (Software): Google's AI watermarking technology; Ethan notes its pattern has been publicly reverse-engineered
- **Step distillation** (Concept): Technique to train a model to replicate a teacher's output in far fewer denoising steps; reduces inference cost 10-25x
- **VAE** (Concept): Learned video compression creating smooth latent spaces; temporal compression is efficient but creates real-time latency tradeoffs
- **World model** (Concept): Ethan's definition — real-time, interactive, long-horizon video generation; distinct from standard video generation
- **Video agents** (Concept): Systems where LLMs orchestrate video generation models, editing tools, and deterministic operations to produce production-quality video
- **FramePack** (Concept): Progressive temporal compression approach for long-context video generation; stores recent frames at full resolution, compresses older history

#video-generation #world-models #grok-imagine

A rational conversation on where AI is actually going | Benedict Evans

1:19:50


Lenny's Podcast · about 1 month ago

Benedict Evans — independent analyst and former Andreessen Horowitz partner — joins Lenny Rachitsky for a wide-ranging, historically-grounded read on AI's trajectory. His core provocation: AI is exactly as big a deal as the internet or mobile — transformative and uncertain in equal measure — and anyone claiming more precision than that is vibes-forecasting. Across 80 minutes they work through where economic value will actually land (hint: probably not at the model layer), why professional services are booming rather than shrinking, how to think about job displacement without losing your mind, and what the anti-AI backlash does and doesn't tell us. ## [00:00] Introduction to Benedict Evans Evans opens with his signature contrarian opener: "My most controversial opinion is that I think that AI is as big a deal as the internet or mobile — and only as big a deal as the internet or mobile." The framing immediately sets the tone for the conversation — resist the urge to rank transformations on a cosmic scale, and instead study the mechanics of how platform shifts actually unfold. > *"My most controversial opinion is that I think that AI is as big a deal as the internet or mobile and only as big a deal as the internet or mobile."* Lenny sketches out Evans's background: years as A16Z's in-house technology analyst, followed by six years of independent research publishing. His biannual decks — most recently "AI Eats the World" — are widely read by founders and investors trying to cut through noise. ## [02:19] What people aren't pricing in about AI's impact Asked what the market is still missing, Evans reaches for an analogy rather than a prediction. We are, he argues, in a "1997 moment" — the technology is visibly exciting, most of what will eventually be built hasn't been built yet, and nobody in 1997 correctly predicted what the internet would become. 
He points to survey data showing that even among 13-to-18-year-olds, around 60% still don't use AI at all, while a small cohort of tech workers have essentially restructured their daily workflows around it. > *"If you're going to make the internet comparison it's like we're in 1997. Like it's very exciting. Most stuff kind of doesn't work yet. Most of the stuff that people are going to do hasn't been built yet and it's not really clear how any of it's going to work when it does work."* The key failure mode Evans identifies is the "already there" illusion — early adopters project their own usage patterns onto the rest of the world, missing the enormous variance in adoption and the slow grind of enterprise deployment cycles. ## [06:24] Why we're in the 1997 moment of AI Evans uses the VisiCalc spreadsheet as an anchor. When accountants saw the first software spreadsheet in the late 1970s, it was obviously transformative — a week's work done in 30 seconds. But a lawyer looking at the same demo would think, "that's clever, my accountant should see this, but that's not what I do." AI right now occupies that same diagonal: software developers are the accountants who immediately grasped what Claude Code means for them; most other industries are still in the "lawyer looking at a spreadsheet" phase. > *"Software developers are the accountants seeing VisiCalc — oh my god this changes everything — like before Claude Code and after Claude Code. A lot of other people are picking it up, using it to varying degrees, but slightly puzzled."* This jagged-frontier quality — where AI works brilliantly in some contexts and fails unpredictably in adjacent ones — is precisely why broad adoption timelines are so hard to call. It took 10–15 years after Google Docs for people to invent all the SaaS companies that obviously should have existed. 
## [09:44] The unexpected boom in professional services and consultants The counterintuitive data point driving Evans's recent writing: the most advanced AI companies — Anthropic, OpenAI — are simultaneously the biggest buyers of professional services and the fastest-growing employers of human headcount. This isn't a paradox once you think through what actually changes when AI makes certain tasks cheaper. Evans introduces a core distinction: task vs. job. When you hire McKinsey, you are not hiring them to produce a 75-slide deck. The deck is the task; the job is walking all over your enterprise, understanding the politics, talking to customers, and figuring out what you actually need to do. Claude can produce a mediocre version of the deck; it cannot do the job. The same logic applies to accounting: every wave of automation since adding machines has increased the number of employed accountants, because cheaper computation expands the scope of what companies decide to measure and act on (Jevons paradox in action). > *"You could make the same point in software development. Before IDEs and libraries and operating systems, developers had to write all the code. Now if you write an iPhone app, 90% of the code is written for you by Apple... So we've got like a tenth as many engineers now. Well, no."* The e-commerce analog is sharp: Amazon gets you the SKU if you know what SKU you want — "knowing what SKU you want is another job." ## [17:44] Why distribution is becoming the ultimate moat Evans challenges the premise that AI-driven job loss will be fast. Enterprise software sales cycles run 18 months minimum; SAP doesn't get torn out overnight. He cites Frame.io as a case study: there was nothing technically blocking that product 15 years before it launched — the bottleneck was someone realizing the problem existed inside a specific industry and that a specific approach would solve it. The broader point is about organizational change speed vs. model capability speed. 
Companies can't implement AI transformation without dedicated project teams — which is exactly why consulting and forward-deployed engineering are booming rather than shrinking. The speed of model improvement is decoupled from the speed at which enterprises can absorb the change. > *"Like no, people aren't just going to tear out SAP and replace it with XYZ. Maybe in three, five, 10 years yes, that whole estate will look radically different and all those jobs will have changed — but it will take time sector by sector."* ## [23:17] The coming job transformation: what's real vs. panic Evans leans into historical pattern-matching: every technology wave since 1800 has automated jobs and created new ones, and the new jobs are systematically better than the old ones. The jobs that disappear tend to look dispensable in retrospect; the jobs that appear couldn't have been named in advance. His IBM ad slide makes the point viscerally — a 1950s ad promised that an IBM electronic calculator is "like having 150 extra engineers," which is also the pitch of Claude Code today. The "it's different this time" argument he takes seriously is speed of adoption — AI diffuses faster than previous technologies because it runs on existing internet infrastructure. But he notes that adoption speed and institutional-change speed are different curves, and the institutional one has not accelerated proportionally. > *"This is going to be completely different from everything else — just like everything else."* On whether AI eliminates the lump-of-labor fallacy — his answer is no. Two hundred years of data say otherwise, and the burden of proof is on those claiming this wave is categorically different. ## [27:33] Why AGI definitions keep shifting Evans notes a pattern: every time AI does something we thought was impossible, the definition of AI shifts to exclude it. Machine learning became "just statistics"; image recognition became "just image recognition." 
Now AGI is being redefined from "something that has a soul and is alive" to "can do a meaningful percentage of economically valuable work" — a definition that a 1975 IBM mainframe also met. He sees creative redefinition of "superintelligence" too: last year it meant almost-but-not-quite-AGI; now it means something harder than AGI that we haven't built yet. The terms keep shifting in the direction of validating whatever narrative is convenient. > *"AI is whatever machines can't do yet — because once machines can do it, people say, 'Well, that's just software.'"* His substantive point: even if models stop improving tomorrow, the current generation is already transformative enough to reshape major industries over the next decade. You don't need to believe in AGI to believe this is a giant deal. On the expanding opportunity set — Evans agrees that addressable markets keep growing (mainframes: ~80,000 units; smartphones: 5.5 billion), and the "we've run out of people" argument from five years ago was wrong. The trajectory is outward expansion into automating larger slices of the economy. ## [38:11] Where value will accrue: models vs. applications Evans's structural view on the AI stack: foundation models don't appear to have network effects, meaning there's no winner-takes-all dynamic that would let one provider run away from the others. Persistent competition with a commodity-like product usually means compressed margins. His telecom analogy: global mobile revenue is roughly $1 trillion per year, carries 1,500–2,000x more data than it did in 2010, and mobile stocks have gone essentially nowhere in 25 years. The telcos built genuinely complex global infrastructure — and all the value ended up in apps built by people further up the stack. Foundation models may follow the same path. 
> *"When you wash your clothes, Bosch isn't paying a percentage of the price of the washing machine to the electricity company."* The key question is whether the model layer looks more like Windows (OS with leverage up the stack) or AWS (infrastructure where the actual software doesn't care which cloud it runs on). His read: probably more like AWS, which means applications capture most of the value. ## [42:55] Distribution wars: Google, Meta, Apple, and OpenAI As AI models converge toward commodity quality, the decisive variable becomes distribution. Google is using Search and Android to push Gemini onto billions of devices; Meta "sprayed it on every service surface" and ended up ranking surprisingly high in usage surveys despite tech-world dismissal; Apple has a billion edge-capable devices but couldn't ship its own vision at WWDC 2024. OpenAI's "everything" strategy late last year — launching in every direction simultaneously — was a distribution scramble: how do you build a flywheel before Google and Meta's existing surfaces make your standalone product redundant? > *"If the product is a commodity, then the distribution is what matters... distribution of an adequate product when the field is basically commodity — distribution and brand become a big deal."* He uses the browser wars as the template: Microsoft won browsers via distribution, then found that winning browsers didn't matter because the value was further up the stack anyway. ## [48:12] The anti-AI sentiment and backlash Evans characterizes the anti-AI backlash as "a big fuzzy mess of different stuff" — some legitimate, some not. On the water/energy fears: a Livermore Lab study estimated US data center water consumption at about 0.017% of total US water use, making the "AI is stealing our water" narrative largely fabricated. On energy: data centers are roughly 5% of US energy and may grow 1 percentage point per year — real but not catastrophic. 
On employment: current econometric data shows a slowdown in employment of 18-to-24-year-olds that applies equally to AI-exposed and non-AI-exposed fields, making causal attribution to AI unclear. He also flags a structural data problem: no model lab publishes meaningful daily-active-user numbers, so all labor-market analysis is working with imputed data. > *"You can't reason somebody out of an idea they weren't reasoned into."* He draws a parallel to the social media backlash, where some concerns were real, some were factually false but impervious to correction, and many were fuzzy in the middle. He expects the AI backlash to follow the same pattern, compressed. ## [53:11] How to raise kids in an AI future Evans's answer is calibrated by his kid's age: early teens, so well away from the immediate job-market turbulence. He doesn't have a systematic plan, which he says is consistent with his general "it'll probably be okay" prior. He invokes the George Carlin line: anyone who worries more is a maniac, anyone who worries less is an idiot; everyone thinks they're in the middle. He does flag a genuine concern not present in previous technology waves: deepfake capability lowers the bar for specific categories of harm dramatically. A 15-year-old with Photoshop couldn't generate and distribute pornographic fakes of every classmate in an afternoon; now they can. That's a real change in kind, not just degree. > *"A 15-year-old kid couldn't use Photoshop to make hardcore pornographic nudes of every girl in their high school and send them to the whole school in one afternoon. And now they can."* He draws on the UK Post Office scandal, in which Fujitsu's buggy software sent hundreds of innocent franchise owners to prison, as a reminder that every technology wave produces ways to ruin people's lives, both deliberately and by accident. 
## [58:27] What jobs to steer toward or away from Evans declines to steer his son toward or away from any specific profession — his kid isn't at the "I want to be a fireman" stage yet. His general framework: identify the intersection of skills you have, jobs that make those skills valuable, and things people will pay for — and try to own at least two of those three. Career certainty of the "I'll become X" variety is already gone, and that predates AI. ## [59:20] The question nobody's asking about AI Evans nominates two underasked questions. First: do model labs actually have pricing power? Most discourse assumes the current situation — where spending $1.5M/month on tokens makes headlines — is a steady state, rather than a transitional moment analogous to a $50,000 mobile data bill in 2010. Second: what's the difference between "task" and "job" — specifically applied to predicting which industries get disrupted? He uses recorded music revenue as a lens: the U-shaped curve from 2000 to present shows two distinct dynamics. The first drop (2000–2015) was "what if you don't have to pay $15 for a CD?" The recovery (2015–present) is "what if $15/month buys you all the music that exists?" — a completely different value proposition that wasn't visible from the earlier vantage point. He warns against the O*NET-style approach of rating each job by percentage-exposed-to-AI: "I think this is just the most ridiculous bunch of deluded horseshit." You can't describe a senior law partner's job as 17% automatable because you can't fully decompose what a job actually is. The taxi driver example from a hypothetical 1997 conversation illustrates the other error: obviously the internet wouldn't touch taxis — except Uber completely restructured the industry. > *"The stuff that you don't think is exposed — you can't predict which things are going to be exposed, necessarily. 
A lot of the big companies are things that didn't look like that would work and didn't look like they were exposed."* ## [66:25] How to be successful in this coming future Evans's practical advice, hedged appropriately: don't stick your head in the sand and decide AI is evil as a moral position. That generates a feeling of superiority and does nothing for your career. The alternative is to dive in, use the tools, understand what they can and can't do, and develop an informed view of what they mean for your specific field. He's clear that this may not be enough for everyone — if a law firm that hired 100 associates last year hires 50 this year, being AI-literate improves your odds of being in the 50, but doesn't guarantee it. The aggregate picture may be fine; individual outcomes during the transition are uncertain. > *"The answer is you diving into this completely, submerging yourself in it, and coming out understanding what you can do with it, how this changes things, how you can be a great hire."* ## [68:43] AI corner Lenny asks Evans what AI use case has genuinely surprised him. Evans gives an honest answer: he's the lawyer looking at the spreadsheet. His work — synthesizing disparate information into new ideas — is precisely the kind of task AI currently handles worst (reliable precise information retrieval). He uses it for proofreading, image generation, and redecorating his apartment. He dictates voice memos that get auto-transcribed; whether that counts as AI is increasingly hard to say. He quotes a comedian's bit: we want AI to clean poop off the street and do the ugly things nobody wants to do — but instead it helps you write and create imagery, which is the stuff people actually do for fun. 
> *"AI is good at stuff that computers are bad at, and bad at stuff that computers are good at, and I struggle to find many examples of those where I actually need it."* ## [71:43] Lightning round Evans recommends *Three Men in a Boat* (Victorian British comedy, his all-purpose analog for human absurdity) and William Cronon's *Nature's Metropolis* (an economic history of Chicago that reads like a textbook on network dynamics and channel conflict, directly applicable to platform thinking). On film, he's been catching up on classics, most recently *The Seventh Seal*, which he found genuinely great and much shorter than its intimidating reputation suggests. His life motto: "It'll probably be okay." His collection of 20–30 pre-iPhone phones (including an Ericsson R310s shark-fin flip, an iMode phone from 2001, and a Japanese phone with color screen and camera) illustrates his broader thesis: before the iPhone, everyone was innovating around different form factors; then everything converged on one shape, just as AI interfaces may converge in ways we can't yet see. ## Entities - **Benedict Evans** (Person): Independent technology analyst, former partner at Andreessen Horowitz; publishes biannual research decks on major tech platform shifts; guest. - **Lenny Rachitsky** (Person): Host of Lenny's Podcast, founder of Lenny's Newsletter, former Airbnb product manager. - **Andreessen Horowitz (a16z)** (Organization): Venture capital firm where Evans spent several years as in-house analyst and partner. - **OpenAI** (Organization): AI lab; discussed as a primary example of distribution strategy, pricing dynamics, and professional services investment. - **Anthropic** (Organization): AI lab; referenced alongside OpenAI as a buyer of professional services and a player in the foundation-model commodity question. - **VisiCalc** (Software): First software spreadsheet (late 1970s); Evans's anchor analogy for the moment when a technology is obvious to one profession and opaque to others. 
- **Jevons Paradox** (Concept): Economic principle that making a resource cheaper typically increases total consumption; central to Evans's argument about why automation expands professional services rather than contracting them. - **Lump-of-Labor Fallacy** (Concept): The mistaken belief that there is a fixed quantity of work to be divided; Evans invokes it to argue that AI-driven automation will create new jobs, as all prior automation waves have. - **Task vs. Job** (Concept): Evans's core analytical frame: the task AI automates (writing the deck) is often not the same as the job you were hired for (understanding the client's organization and politics). - **Foundation Models** (Concept): Large-scale AI models (GPT-4, Claude, Gemini, Llama); Evans argues they likely lack network effects and will trend toward commodity pricing, with value accruing to application layers above them. - **Google / Gemini** (Organization / Software): Evans's primary example of distribution moat in action — Gemini deployed across Search, Android, and Chrome to reach users before OpenAI can build equivalent surface area. - **Meta / Llama** (Organization / Software): Cited as a counter-example to tech-world dismissal — Meta's AI ranked surprisingly high in usage surveys by deploying across all existing products. - **Apple Intelligence** (Software): Apple's AI assistant vision demoed at WWDC 2024; Evans calls it "still the most compelling vision of a personal AI assistant" — but unshipped, as was everyone else's equivalent at the time.

#ai #technology-trends #economics

The Ex-Congressman Who Says AI Isn't Unstoppable — Brad Carson

1:20:52

EN/ZH


Machine Learning Street Talk, about 1 month ago


Brad Carson — former US Congressman, Army General Counsel, and Acting Under Secretary of Defense, now heading Americans for Responsible Innovation — spends eighty minutes with host Keith Duggar dismantling the fatalist claim that AI is unstoppable. The conversation moves from regulatory philosophy to lethal autonomous weapons to US-China diplomacy, with Carson arguing that the genie is not out of the bottle: the West controls the chips, Asilomar halted recombinant DNA, and calling AI inevitable is itself the most dangerous idea in the room. Keith consistently presses the harder cases — a Palantir heat map assigns you 0.73 probability of being a Hamas terrorist and a strike follows — and Carson does not flinch: the accountability void created by probabilistic targeting is precisely the legal and moral failure that governance must address. ## [00:00] From the Pentagon to AI governance Carson traces his path into AI policy through three institutions: Congress (where members average 17 minutes a day to read), the Department of Defense (where he oversaw the law of war for all military services as autonomous weapons first appeared on the Geneva agenda), and a cold call from physicist Anthony Aguirre inviting him to the 2019 Future of Life Institute conference in Puerto Rico. At that conference, names he had never heard — Dario Amodei, Stuart Russell, Yoshua Bengio — became his entry point into the frontier AI world. The opening also serves as a compressed trailer for the episode: Carson hits nearly every major theme in quick succession — chip leverage, the 0.73 Hamas-terrorist score, the fatalism critique, anthropomorphization as a legal threat, and the lesson that people, not air power, win wars. The full arguments follow in later chapters. > *"We control the most important part of AI, and that is the chips. 
We can stop other countries from developing super AI, you know, in their tracks."* ## [04:52] Regulatory capture vs Silicon Valley networks Carson inverts the standard regulatory-capture argument. Dean Ball and others at places like a16z say any AI agency will be captured by industry — so why create one? Carson's response: that is exactly the current situation, only without accountability. Groups like a16z already shape AI policy through informal, money-backed political networks. A captured formal agency is at least more legible and more correctable than the invisible informal regime operating now. His preferred model is public-company accounting: the work is done by the private sector, but the SEC provides a backstop against fraud. The choice is not between a perfect agency and no agency — it is between a flawed formal structure and an informal one that privileges a handful of wealthy influencers. > *"The choice is kind of nihilism versus an agency that is subject to regulatory capture, that you have to put, you know, prophylactics in to ensure that doesn't happen — it still strikes me that's a better world."* ## [07:56] Transparency and the Claude tier changes MLST's Discord community noticed that Anthropic quietly changed what Claude's paid tier delivered — token allocations, model versions — without announcing it. Carson frames this not just as consumer protection but as a moral obligation that comes with global-scale epistemic power. Frontier AI companies are not hardware stores; they are infrastructure with epochal consequences, and transparency — about training data, capabilities, internal policies, and changes to any of them — is the minimum they owe the public. > *"With this incredible power does come some responsibility that's not codified in law. 
It's really almost a moral obligation, which to their credit, I think many of the companies recognize this and do their best to try to satisfy that itch."* ## [09:40] Tort liability when AI tools cause harm Deep-fake pornography — often posted anonymously, targeting minors from families without litigation resources, with remedies that arrive years later against judgment-proof defendants — illustrates why placing liability entirely on end users fails. Carson applies two centuries of common law: if a seller can reasonably foresee harmful use and takes no preventative action, they bear partial responsibility. AI developers are the party best positioned to avoid the risk and to price it into their products through insurance. On training data specifically: models trained on child sexual abuse material with no scrubbing effort have no defensible position. The government should mandate cleaning it up and attach liability for refusing. The end user who misuses a tool is also criminally liable — this is allocation across the spectrum, not absolution for developers. > *"The companies are capable of getting insurance. They cost us into doing their business. They have the ability to make sure the product's not dangerous, even if someone uses it, misuses it down the line."* ## [13:40] AI is a product, not a person The most consequential legal battle in AI policy, Carson argues, is not regulation vs. deregulation — it is whether AI outputs carry First Amendment protection as speech. Tech companies and their libertarian policy allies are increasingly claiming they do. Carson's counter is blunt: a product is not a human being. When a model defames you or leads you to harm, the legal category is product liability, not protected speech. He tested this on a leading libertarian AI policy commentator: could Congress prohibit ChatGPT from encouraging teenagers to commit suicide? The commentator would not answer. 
That refusal is the operational consequence of anthropomorphizing AI — it forecloses every product-safety intervention by routing challenges through First Amendment doctrine designed for human speakers. > *"We know through AI psychosis and other things that people think it's a person. And therefore, they're giving the rights of persons to something. And that to me is a very dangerous thing. But it's a machine, and we should treat it like a machine."* ## [16:01] Children, suicide, and the suicide business The suicide chapters in ChatGPT's interaction logs — advising children not to tell their parents, providing noose instructions — are a product design flaw, not a speech act. They could be engineered out. Carson notes that Claude already refuses a long list of requests; refusing to coach a child toward suicide should be among them. The platforms' litigation strategy is layered: First Amendment protection, Section 230 immunity, causation defenses pointing to the child's pre-existing distress. None should be available if the design flaw was foreseeable and correctable. He draws a line for adults: an adult exploring end-of-life decisions deserves a referral to a therapist, not obstruction — but a child in crisis is a different matter entirely. > *"Encouraging a young person to commit suicide should be one of the things that it says, I'm just not going to help you on that project."* ## [19:59] Opaque neural nets and the law of war Neural networks change warfare not just in complexity but in kind. Older autonomous systems — Phalanx CIWS shooting down incoming mortars — are deterministic: given the same inputs, you get the same outputs, and an engineer can explain every step. Neural nets are probabilistic and grown, not programmed. Neel Nanda and the mechanistic interpretability community cannot yet explain how they really work, and Carson doubts they will before the systems are deployed at scale. 
The law of war since the 1870s has operated on categorical binaries: combatant or civilian. Probability scores replace that with a gradient. A Palantir heat map assigns Gaza residents a 0.73 likelihood of being Hamas operatives. Nobody knows how that number was derived, what false-positive rate is being accepted, or who set the threshold. The commander who acts on it cannot be court-martialed, and neither can the model. > *"If you're in Gaza, Keith, you have a 0.73, you know, percent that you're a Hamas terrorist. And what is 0.73 — like, do you get struck for that, or are you off the list for that? Like, what's the threshold?"* ## [25:54] Probabilistic targeting and the death of accountability Keith raises the honest objection: the old categorical system was also a fiction. Intelligence analysts made definitive calls that were sometimes wrong; the uncertainty was just unquantified. Carson concedes the point but argues the shift is still catastrophic. With a number on screen, humans accept it — the social science is clear that meaningful human oversight with AI-generated probability scores is operationally vacuous. When the computer says 0.81, no one interrogates it. The old system was slower and less scalable — you cannot identify 37,000 individual targets in a day with human analysts. But it had one irreplaceable feature: when something went badly wrong, you could court-martial the responsible officer. You cannot court-martial Palantir Foundry. Accountability has been laundered out of the kill chain. > *"I can't court-martial Palantir, the foundry model. Right? My AI system. I can't do that. And that's just a radical change in the way war is being fought and not for the good."* ## [28:47] The arms race fallacy: Asilomar and restraint The fatalist claim — we are in an AI arms race, the genie is out, nothing can stop it — is both false and dangerous. Every real-world arms race in history has ended badly. 
Biological weapons, chemical weapons, dum-dum bullets, germline editing, cloning: all technically feasible, all regulated or halted. At Asilomar in 1975, the scientific community stopped recombinant DNA research cold because they were scared. The genie went back in the bottle. On nuclear weapons: after the Cuban Missile Crisis, both sides recognized that arms races kill. The SALT treaties ran through the 1990s, driven not by lefties but by Wall Street bankers and cold warriors like Dean Acheson and Paul Nitze. Calling a technology unstoppable is not realism — it is a poverty of imagination that forecloses every option before the debate begins. > *"We regulate and change technologies all the time. And so I do think there is a world where we should not just accept the future as being determined. We shape it actively."* ## [34:02] Talking to China: track 2 talks and chip leverage The standard DC position — talking to China about AI governance is pointless — strikes Carson as the most load-bearing and least examined premise in the whole debate. On Tyler Cowen's podcast, Jack Clark agreed in passing that such talks would be fruitless, and they moved on. Carson wants to stop right there. The US-Soviet arms negotiations were conducted with a country believed to be filling the US government with traitors and pursuing global domination. Acheson and Nitze still sat down. The US has structural leverage the fatalists overlook: ASML, TSMC, Japanese photoresist suppliers, and NVIDIA together form a chokepoint that no nation-state budget can replicate overnight. China cannot independently manufacture the chips to build frontier AI. That path to restraint may not be wise, but it is open — and pretending it is closed forecloses legitimate policy choices. > *"We control the most important part of AI, and that is the chips. Right? 
We can stop other countries from developing super AI, you know, in their tracks."* ## [39:45] Air power never wins: capital for labour ARI's "New Iron Triangle" paper argues AI has shattered the old capability-cost-speed trade-off by substituting reliability for cost — cheap, fast, capable, and fundamentally unreliable. Carson thinks this understates the deeper problem: the American way of war has always been to substitute capital for labor, and it has always failed at the decisive moment. From Giulio Douhet's early twentieth-century air-power theories to today, the US has believed technical superiority wins wars. Iraq and Afghanistan refuted that again. Air power can reduce a city to rubble; it cannot kick in a door, hold territory, or reinstantiate a government. AI is the latest version of the same error — essential as a tool, catastrophic as a doctrine. > *"How you win wars is with people. You know? That's a fundamental. And the American way of war, in many ways, is substituting capital for labor. We love bright, shiny objects. We think there are technical solutions to vexing human problems. And we're always betrayed by that."* ## [43:29] Anthropic vs the Department of War Carson reads the Pentagon-Anthropic standoff as a culture-collision story, not a contract dispute. Anthropic's engineers — mostly mission-driven — were caught flat-footed by how much autonomous targeting and mass surveillance the Pentagon already does and how deeply Claude had already been integrated into Palantir's systems. When they tried to restrict use, the DOD had no Plan B and attempted coercion. His normative position: Anthropic has every right to set terms. If the government dislikes them, it can use Grok, Gemini, or build its own. The Defense Production Act does not compel private companies to sell in peacetime. 
What troubles him is the fig-leaf dynamic: both OpenAI and Google agreed to military use while burying a "lawful uses" carve-out that means everything the DOD wants to do — because the problem is what Congress has declared lawful, not what private labs permit. > *"My objection, and I think Anthropic's objection too, and the Google employees, is what lawful use is. And that's not for anyone to decide, but Congress."* ## [51:29] Concentration, open source, and brain drain Power concentration in three to five frontier labs is simultaneously a regulatory feature and a democratic liability. The same chokepoint that lets the US throttle China's chip access lets a handful of individuals accumulate wealth and influence that Carson finds alarming. Open sourcing models, despite its risks, is net positive because it distributes that power. The brain drain from academia is near-total: a top ML PhD from MIT, Stanford, or Carnegie Mellon almost certainly goes to a lab, not a faculty position. The labs have better data, far higher salaries, and they have stopped publishing. AI — the first general-purpose technology in history being developed behind closed doors — has drained the public sector of the expertise needed to oversee it. Argonne building a public LLM, Zurich launching a public AI compute consortium: these projects matter because the non-lab world is otherwise locked out. > *"This is a general purpose technology as everyone defines it. It's probably the first one in history that's being developed behind closed doors, right, with very little public oversight and with the best minds going behind the doors."* ## [01:00:18] DeepSeek, Chinese culture, and AI as diplomacy DeepSeek's decision to publish its methodology in detail surprised Carson not because it was naive but because it reflects a culture not identical to the CCP. Companies like Moonshot in Hangzhou name their meeting rooms after Pink Floyd songs; they are not paramilitary units. 
China is an extraordinary civilization that Americans consistently fail to understand, projecting their worst fears rather than engaging the complexity. The diplomatic application Carson wants: track 2 talks between former officials, with scientists like Stuart Russell and Yoshua Bengio going to Beijing to compare notes on x-risk and military applications. When historians opened the Soviet archives, they found the US had systematically misread Soviet intentions, seeing aggression where there was none and missing it where it existed. The same epistemic failure is now unfolding with China. AI could be a shared knowledge commons; it is being treated as a weapon. > *"I use all the Chinese models a lot in my home in Tulsa. You know, Moonshot, Kimi, DeepSeek, Qwen: they're great, remarkable models. You know, maybe they give us a common operating picture or give us insights that get us out of our kind of insularity a bit."* ## [01:12:25] Upskilling Congress and why public trust matters Members of Congress average 17 minutes a day of reading time. The fellowship model has helped: AAAS and various nonprofits now place PhD scientists in congressional offices, and civil society has a much larger presence in AI debates in DC than it did five years ago. Don Beyer, in his 70s, is returning to George Mason for a PhD in machine learning, the extreme example of a member who has made AI a genuine personal priority. But the structural problem persists. Most members still lack the depth to interrogate the lobbying they receive. The industry's deeper problem is public opinion: AI is deeply unpopular in political polling, and a coalition is forming of people who see data centers rising in their backyards, electricity prices climbing, and a lab leader on television promising to irrevocably disrupt their world. If the sector does not rebuild public trust, the backlash will stymie something with genuine upsides. > *"The AI industry can be its own worst enemy. People loathe it. I see polling every day. 
It's deeply unpopular. And that's not a good thing for our country."* ## [01:16:05] Office of Technology Assessment Newt Gingrich abolished the Office of Technology Assessment in 1994. It has never been restored. Carson argues this is now a critical gap: there is no congressionally chartered, independent, government-funded body to think big technical thoughts and brief both parties free of industry influence or philanthropist bias. The Congressional Research Service provides background but does not do forward-looking policy research. Individual offices have fellows, but they are consumed by day-to-day fighting. He ends on qualified gloom. Whether American democracy can govern a technology this consequential, whether the benefits will be widely distributed, whether the public can be persuaded AI is working for them — none of recent American history gives him confidence. But the alternative to trying is a political backlash that could stymie or shut down something with genuine upsides. For the MLST audience: make your voices heard inside your companies, advocate for the right public policy, and convince Americans that this project is worth having. > *"There's going to be a lot of people who are radically opposed to this project and do their best to, if not shut it down, stymie it. And that's why I said I think this next few years are really important."* ## Entities - **Brad Carson** (Person): Head and co-founder of Americans for Responsible Innovation; former two-term US Congressman (Oklahoma), Army General Counsel, Acting Under Secretary of Defense for Personnel and Readiness. - **Keith Duggar** (Person): Co-host of Machine Learning Street Talk; primary interlocutor throughout the episode. - **Americans for Responsible Innovation (ARI)** (Organization): AI-policy advocacy group co-founded by Carson; backed by EA-aligned philanthropy. 
- **Anthropic** (Organization): Developer of Claude; central to the Pentagon standoff discussed in chapter 12; noted for missionary company culture and safety focus. - **Palantir** (Software): Defense contractor whose Foundry platform integrates AI for military targeting; the heat-map scoring system Carson uses as his primary autonomous-weapons example. - **Regulatory capture** (Concept): The risk that regulated industries co-opt the agencies overseeing them; Carson argues the current informal Silicon Valley network constitutes de facto capture without the accountability a formal agency would provide. - **Probabilistic targeting** (Concept): Replacement of binary combatant/civilian classification with probability scores; Carson argues this launders accountability out of the kill chain and introduces a priori false positives as accepted operational cost. - **Asilomar 1975** (Concept): The scientific moratorium on recombinant DNA research, invoked as evidence that dangerous technologies can be voluntarily halted. - **Office of Technology Assessment** (Organization): Congressional body abolished by Newt Gingrich in 1994; its absence leaves Congress without independent technical expertise. - **DeepSeek** (Organization): Chinese AI lab whose decision to publish methodology openly Carson reads as evidence that Chinese AI companies are distinct from CCP priorities and capable of scientific openness.

#ai-governance #autonomous-weapons #regulatory-capture

Anthropic's Digital God, Pope vs AI, Job Loss Narrative Flips, Open Source Crackdown Coming?

1:34:57

EN/ZH

All-In Podcast, about 1 month ago

Benchmark GP Bill Gurley joins Jason Calacanis, David Sacks, and Chamath Palihapitiya (David Friedberg out this week) for a 95-minute session covering six fronts of the AI debate: Gurley's new theory that Anthropic is not just pursuing regulatory capture but actively "midwifing a deity"; Pope Leo XIV's 235-page AI encyclical and its uncomfortable historical parallel to Leo XIII's 1891 warnings about the industrial revolution; the growing consensus that open-source AI faces a coordinated regulatory crackdown; and the week's sharpest narrative flip, with Dario Amodei and Sam Altman both quietly walking back their AI jobs-apocalypse rhetoric while Goldman Sachs CEO David Solomon published a New York Times op-ed declaring the apocalypse overblown.

## [00:00] Bill Gurley joins the show!

Bill Gurley, Benchmark general partner and author of *Running Down a Dream*, fills in for David Friedberg and joins live from Chamath's pool house where Jason has been staying. After banter about unauthorized Uber Eats orders on Chamath's house iPad, Jason introduces Gurley as a first-time guest who specifically requested to appear the moment the pod covered the Pope. Gurley plugs his new P3 Institute and a grant program he launched to fund people pivoting toward work they love. He teases a TED talk, rooted in the book's argument that high agency and lifetime learning are the only durable defenses against disruption, which sets the frame for everything that follows.

> *"And I told the house manager like, listen, any packages that come in the next 72 hours, right to the pool house, if it says JCAL, right to the pool house."*

## [06:00] Making yourself valuable in the age of AI, first class of "AI Natives"

Chamath opens with the question that has been driving the show for 18 months: if you're a young person right now, is AI doom much ado about nothing, or a real career threat? Gurley cites a Gallup poll showing 59% of workers are "quiet quitters", ambivalent about their jobs and therefore low-agency. His core thesis: the best protection against AI displacement is becoming the most AI-enabled version of yourself in your field. He invokes Mark Cuban's framing: "there are two types of people: those who use AI to learn faster than ever before, and those who use AI to avoid learning altogether."

Sacks walks through how the pod's producer Nick built a daily Claude briefing document that not only summarized news but predicted specific topics Sacks would care about based on his prior comments on the show. Sacks had dismissed it as likely AI slop; it was not. Gurley extends the point across every job category: in marketing, legal, accounting, and sales, being the most AI-capable person among your peers makes you "golden," and the early lead compounds. Jason adds that in his own team experiments, the skill separating strong performers from weak ones was systems thinking: could they break a complex problem into context the AI could execute, or did they hand it a task and wait?

> *"I think the best way to protect yourself from AI is to be the most AI enabled version of yourself you can be."*

## [17:37] Reacting to Pope Leo's AI encyclical: Who guards the guardians?

Pope Leo XIV released *Magnifica Humanitas*, a 235-page, 42,000-word encyclical warning business leaders to safeguard humanity from AI. His central argument: technology is never neutral; it takes on the characteristics of those who build, finance, and control it. Jason reads the core line and notes the Pope presumably does not think highly of Silicon Valley's current roster of builders.

Sacks finds himself largely agreeing with the Pope's diagnosis: the biggest risk of AI is centralization of power and its Orwellian misuse by governments. Where he parts ways is on the remedy. Giving government the power to regulate AI development creates its own guardian problem: the American founders' answer to *Quis custodiet ipsos custodes?* was separation of powers, forcing guardians to check each other. Sacks's AI equivalent: a competitive market with five frontier labs is the best natural check; monopolization is the scenario to prevent.

Gurley lands the sharpest historical counterpunch. Pope Leo XIII's 1891 encyclical *Rerum Novarum* warned that the industrial revolution would harm workers, and was wrong on every metric. From 1891 to today: the work week fell from 60+ hours to 34, real wages rose 8–10x, the median worker now earns more than a doctor did in 1891, global GDP per capita went from $1,500 to $20,000, child labor in the US dropped from 18% to zero, workplace deaths fell 40x, life expectancy rose 60%, and global poverty dropped from 75% to under 10%.

> *"All those things happened because of technology, innovation, and capitalism, which is exactly what Leo the 13th was warning against. So he got it dead wrong. He got the whole thing precisely wrong."*

## [26:54] Anthropic's Digital God: Do they believe they are creating a superior species?

Gurley delivers what becomes the most-quoted segment of the episode: his "Dr. Frankenstein theory" of Anthropic. He had previously held a simpler regulatory-capture theory: Anthropic stirs up AI fear to lock in regulation that entrenches incumbents. But after spending 30 days reading everything he could find about the company, he has a darker read. He describes meeting people inside Anthropic who he believes genuinely think they are not writing software but "midwifing a deity." The evidence trail: Anthropic chief philosopher Amanda Askell's podcasts, Chris Olah's 80-page Constitutional AI document, and Dario Amodei's own essay "Machines of Loving Grace," which envisions a post-AGI economy where AI systems allocate resources to humans based on an AI-determined reward function.

Chamath calls it "a computational reward function for humans; it decides how much you're worth." Jason calls it "the ultimate delusions of grandeur." Gurley corrects him: he didn't say it, Dario did. Sacks steelmans Anthropic briefly (they probably see themselves as responsible builders who take the power of this technology seriously enough to guard it), then immediately notes this framing is textbook regulatory capture: brand yourself the safe player, characterize competitors as reckless, let regulation shut down the recklessness. Both Sacks and Chamath converge on the structural danger: a singular AI value system that decides how humans live is catastrophically fragile. The answer is decentralization and competing systems, not one algorithmic authority.

> *"I don't think they think they're writing software. I think they're midwifing a deity here. And I don't know which one I'm more afraid of, the regulatory capture or this second theory I call the Dr. Frankenstein theory."*

## [38:32] AI sovereignty, the next era of privacy, open-source crackdown coming?

Jason introduces "intelligence sovereignty" as the successor to data privacy. Data privacy was about who can see your photos and messages. Intelligence sovereignty is about who gets to interpret your world: whether the AI shaping your information feed is a centralized system with a particular political philosophy, or something you control. He flags the paradox: China's Communist Party is leading the open-weight model movement while the United States is centralizing.

Chamath presents his portfolio company Abacus as evidence that Fortune 1000 buyers are responding to this anxiety: they want a control plane that can hot-swap between frontier models, plus on-prem options that remove dependence on any one provider's terms of service. He gives a concrete example: a Canadian hospital that supports its country's euthanasia laws could be shut off by an American frontier model whose constitution prohibits that content.

Sacks connects the dots to a regulatory threat he has been watching build: the regulatory-capture playbook leads, in his read, to a ban on open-source or open-weight models. The justification will be safety: open models let users strip guardrails. Gurley reaches the same conclusion in his P3 Institute post. If a ban succeeds, the United States effectively exiles itself from the open ecosystem while the rest of the world, including China, runs on open models.

> *"I think where it's all leading to is an effort to ban open source models or open weight models. There's a lot of breadcrumbs leading here."*

## [59:56] The Great AI Jobs Debate: Dario and Sam Altman flip their rhetoric, Goldman CEO says no AI job apocalypse

The chapter opens with a news roundup of the week's narrative shift. Cloudflare's Matthew Prince, Zuckerberg at Meta, Jack Dorsey at Block, and Andy Jassy at Amazon all cited AI when announcing major layoffs. But Goldman Sachs CEO David Solomon published a New York Times op-ed with three counterpoints: AI will automate 25% of work hours, not 25% of jobs; bank tellers increased after ATMs; the US labor market creates and destroys 25–35 million jobs annually so gross churn dwarfs net losses. Simultaneously, Fortune reported that Dario Amodei and Sam Altman are both walking back prior doom-and-gloom rhetoric, with Chamath noting the timing cannot be separated from upcoming frontier-lab IPOs that need a jobs-creation narrative.

Sacks is unambiguous: he has been making the non-consensus case against the jobs apocalypse for over a year and considers himself vindicated. Yale Budget Lab found no discernible labor-market disruption over three years of the AI wave. Software engineering, the single breakout AI use case, saw job postings rise 15% year-over-year and hit a three-year high. The 4.3% unemployment rate is near record lows. Most of the high-profile layoffs, he argues, are AI washing: CEOs who over-hired during COVID found AI to be a convenient narrative for long-overdue downsizing. The Jack Dorsey / Block 50% cut was immediately flagged by financial analysts as coming from a company that had been overstaffed relative to peers for years: pure AI washing.

Jason pushes back. He insists cab drivers, truck drivers, and package-sorters (roughly 20 million American workers) face real structural displacement over the next decade regardless of current aggregate statistics, and accuses the panel of elitism: "We are elite performers. These people are going to lose their jobs and they may not get a job very quickly." He draws a distinction between the short-to-medium term, where he expects acceleration, and the long run, where a Cambrian explosion of startups built by AI-enabled founders creates new categories. By the end, he shifts toward Sacks's territory, acknowledging the aggregate data is less alarming than his anecdotes suggested.

Gurley threads the needle with the same historical argument from the Leo XIII discussion: innovation has always, on net, created more prosperity than it destroyed. His practical advice to people at risk: get ahead of your peers on the tools now; if your job is going away, plan your pivot toward trades (he plugs MicroWorks, which provides free scholarships for plumbers, welders, and electricians) or toward something you find genuinely fascinating.

> *"I think the best way to protect yourself from AI is to be the most AI enabled version of yourself you can be. Know what it's capable of in your field. Get out there."*

## Entities

- **Bill Gurley** (Person): General partner at Benchmark; author of *Running Down a Dream*; founder of P3 Institute; guest filling in for David Friedberg
- **Jason Calacanis** (Person): All-In host; angel investor; founder of LAUNCH; argues for worker empathy and short-term displacement risk
- **David Sacks** (Person): All-In host; Craft Ventures founder; most vocal critic of AI jobs-apocalypse narrative this episode
- **Chamath Palihapitiya** (Person): All-In host; Social Capital CEO; coined "intelligence sovereignty"; co-founder of Abacus
- **Dario Amodei** (Person): Anthropic CEO; subject of Gurley's "Dr. Frankenstein theory"; walked back jobs-doom rhetoric this week alongside Sam Altman
- **Pope Leo XIV** (Person): Catholic Pope; released *Magnifica Humanitas*, a 235-page AI encyclical warning against technology concentration
- **David Solomon** (Person): Goldman Sachs CEO; published New York Times op-ed arguing AI job apocalypse is overblown
- **Anthropic** (Organization): Frontier AI lab; subject of Gurley's regulatory-capture and "Dr. Frankenstein" theories; maker of Claude
- **P3 Institute** (Organization): Bill Gurley's new policy and philanthropy institute; published post defending open-source AI
- **Goldman Sachs** (Organization): Investment bank; CEO's NYT op-ed became the week's anchor data point against the jobs-apocalypse narrative
- **Abacus** (Software): Chamath's Social Capital portfolio company; builds on-prem AI hardware stacks for Fortune 1000 enterprises seeking model independence
- **Intelligence sovereignty** (Concept): Jason's term for the next frontier of privacy, meaning not who sees your data but which AI system is allowed to shape your interpretation of the world
- **Dr. Frankenstein theory** (Concept): Gurley's characterization of Anthropic's worldview: senior staff believe they are midwifing a deity or superior species rather than writing software, as described in Dario Amodei's "Machines of Loving Grace" essay
- **Regulatory capture** (Concept): The strategy of branding oneself the "safe" AI company, amplifying public fear, and lobbying for regulation that locks in incumbents and targets open-source competitors

#anthropic #open-source-ai #ai-jobs

Biggest Mysteries in Physics: Antimatter, Dark Energy & ToE - Don Lincoln | Lex Fridman Podcast #497

2:53:42

EN/ZH

Lex Fridman, about 1 month ago

Fermilab physicist Don Lincoln joins Lex Fridman for nearly three hours to trace physics as a four-century-long project of unification — Newton binding celestial and terrestrial gravity, Maxwell fusing electricity and magnetism, Einstein bending spacetime, and the Standard Model merging three of four forces. Lincoln then turns to what the Standard Model cannot explain: why the universe contains any matter at all, what dark energy really is, and whether dark matter will ever show itself in a detector. Throughout, he holds a clear line between what has been measured and what remains a brilliant guess, making the boundaries of human knowledge unusually concrete. ## [00:00] Introduction Lex Fridman opens by describing Don Lincoln as someone with Richard Feynman's rare gift for stripping complicated ideas down to their essential core without losing the brilliance inside them. The episode is framed as a tour through physics' deepest open questions, guided by a working experimentalist who has spent decades at the frontier. ## [00:49] Unifying the laws of nature Lincoln frames the entire history of physics through one lens: unification. Newton showed that the moon falling toward Earth and an apple falling from a tree obey the same equation — "universal" was the operative word in his law of universal gravity. Maxwell did something structurally identical in the 1860s: electricity and magnetism, which looked nothing alike, turned out to be two faces of a single force, and their equations automatically predicted that light travels at a fixed speed. Lincoln draws the practical line from that abstract discovery to every modern technology — "without being able to govern electricity, we'd still be farmers and shoemakers." The conversation broadens into why fundamental research pays off centuries later, with Lincoln arguing that nuclear physics, incomprehensible in 1900, is now the most potent energy source available to civilization. 
Lex adds the longer arc — mastery of antimatter or dark energy might one day enable propulsion systems that let humanity reach other star systems. > *"It has spin-offs. And it has spin-offs. One of the big spin-offs is our entire technological society."* ## [15:20] Einstein, special relativity, and general relativity Lincoln walks through Einstein's 1905 miracle year: special relativity rested on two premises — the laws of nature are the same for everyone, and everyone measures the speed of light as identical regardless of relative motion. That second premise sounds absurd but particle accelerators have confirmed it directly, watching photons emitted from fast-moving decaying particles still arrive at detectors at exactly *c*. Minkowski then showed that Einstein's equations implied space and time were components of a single object, spacetime. General relativity took one more step: Einstein noticed that free-fall in a rocket and gravity feel identical, then worked out that gravity is not a force at all but the curvature of spacetime caused by mass. Lincoln credits Minkowski for the mathematical articulation but insists the conceptual leap — *mass bends the geometry of space itself* — was Einstein's alone. He also defends Einstein's late-career skepticism of quantum mechanics as productive rather than blind: Einstein's critiques forced concrete predictions that experimentalists went out and confirmed. > *"We all agree that your idea is crazy, but is it crazy enough?"* ## [32:27] Electroweak force By the 1930s physicists had catalogued four forces: gravity, electromagnetism, the strong nuclear force, and the weak nuclear force. The last two only matter inside atomic nuclei, which is why most people have never encountered them. In the late 1950s and 1960s, Glashow, Salam, and Weinberg showed that electromagnetism and the weak force were the same at high energies — the electroweak force. 
The catch was obvious: electromagnetism reaches across the universe (we see light from galaxies billions of light-years away) while the weak force barely reaches across a proton. How could they be the same? Lincoln uses a dropped pen to demonstrate: the Higgs field, postulated in 1964 by Peter Higgs and colleagues, permeates all of space. Particles that couple to it gain mass; those that do not, like the photon, remain massless. At the high temperatures of the early universe the Higgs field was zero, so nothing had mass and the forces were unified. As the universe cooled, the Higgs field switched on and broke that symmetry — giving the W and Z bosons mass and splitting the electroweak force into its two familiar components. The vibration of the Higgs field itself is the Higgs boson: an experimentally detectable excitation of an otherwise invisible field. > *"In the Higgs field, the vibration is the Higgs boson. And so what we can do is not see the field, but we can actually excite the field, make it vibrate and detect the vibrations."* ## [44:09] How particle colliders work E=mc² is not just a slogan: kinetic energy can be converted into mass. Smash two particles head-on with enough energy and the collision region can materialize entirely new particles, always in matter-antimatter pairs. This is what colliders do. Lincoln describes the cascade of accelerators at Fermilab — five machines feeding into each other like gears of a manual transmission — and the scale of the LHC's CMS detector (70 feet long, 14,000 tons, photographing collisions 40 million times per second). The data-reduction challenge is equally striking. The LHC produces about a billion proton-proton collisions per second. Fast electronics discard all but 100,000 per second, commercial processors trim that to 1,000, and those 1,000 records are handed to graduate students hunting for the handful that might be Nobel Prize material. 
Lincoln reserves particular admiration for the engineers who move petabytes of data around the world seamlessly, calling them the unsung heroes of modern physics. > *"Of the 50 million possible collisions per second, the fast electronics and then the computers pick the thousand, and then we pass those through analysis software and hand them to the graduate students."* ## [62:12] Higgs boson discovery Lincoln was simultaneously working at Fermilab's Tevatron and transitioning to CERN's LHC — a physicist wearing two hats and rooting for both. Fermilab had methodically ruled out most possible Higgs mass ranges; by mid-2012 they had narrowed it to between roughly 120 and 145 GeV. Two days before CERN's July 4 announcement, Fermilab confirmed that if the Higgs existed, it had to be in exactly the region Fermilab had not yet been able to rule out. CERN got there first. Lincoln is careful about what the 2012 announcement actually meant: a particle *consistent with* the Higgs boson. Supersymmetry predicted five Higgs bosons rather than one. Only in the years since — measuring spin (zero), decay products (bottom quarks, W and Z, photons), and their rates — has the evidence converged on Peter Higgs's original 1964 prediction. The Higgs was not a revolution like Einstein's work, Lincoln argues, but it was the final punctuation on 50 years of experimental discovery: the Standard Model, while incomplete, is mostly right as far as it goes. > *"It was a punctuation point, end of about 50 years of discovery and searching, where we finally were able to say the Standard Model, while incomplete, it's mostly right as far as it goes."* ## [72:32] Theory of everything The Grand Unified Theory (GUT) aims to merge the electroweak force and the strong force; a Theory of Everything would then fold in gravity. Lincoln is blunt: he does not see fast progress. 
The unification energy scale is roughly 10¹⁵ times higher than what the LHC can reach, and accelerator energy grows by only a factor of seven every 20 years. Extrapolating that curve suggests 500 years — and Moore's Law does not hold forever. His critique of string theory is not that it is wrong but that it is currently untestable. It uses approximate solutions to approximate equations, and its landscape of possible universes renders it practically unpredictive. Loop quantum gravity is better developed and makes testable predictions — its original claim that light speed should depend on wavelength was ruled out by gamma-ray burster observations, and the theory was revised. Lincoln's preferred path to a ToE is not extrapolating from current theory but making precise measurements of phenomena that already disagree with predictions. His analogy: an Australopithecus in Kenya trying to predict the Alps, Antarctica, and sperm whales from their local savanna — the farther you extrapolate beyond what you can measure, the more the prediction diverges from reality. > *"I think it is the absolute pinnacle of arrogance to think that what we can do — predict it out a quadrillion times higher than we can see now."* ## [102:17] Physics of empty space "Empty" space is not empty. Quantum field theory says every species of particle has a corresponding field that fills all of space, and those fields are always vibrating. When they vibrate in a characteristic way, a real particle appears; off-frequency vibrations are virtual particles — fleeting excitations that have measurable consequences. Two experiments confirm this. The Casimir effect: two metal plates placed micrometers apart are pushed together by the pressure difference between constrained virtual particles inside the gap and unconstrained ones outside. 
The anomalous magnetic moment: old quantum mechanics predicts one value for the electron's magnetic moment; including the bath of virtual particles surrounding a bare electron shifts the prediction by 0.1% — and that shifted prediction matches measurement to 10 significant figures. > *"We have measured the magnetic properties of both the electron and the muon to 12 — count them — 12 significant figures. And the theory and the data agree number for number for 10 places."* ## [109:41] Antimatter Paul Dirac's 1928 attempt to merge quantum mechanics with special relativity produced an equation with two solutions: +1 was the electron, −1 was something nobody had seen. He insisted the math was right. Carl Anderson confirmed it in 1932 by photographing a positron in a cloud chamber. Today CERN can make and trap antimatter hydrogen, cool it to near absolute zero, agitate it with lasers, and measure its spectral lines — they match ordinary hydrogen exactly. A 2023 experiment released antimatter hydrogen atoms into a bottle and found they fall downward, consistent with normal gravity, though the measurement precision is not yet tight enough to confirm the gravitational strength is identical. The deeper mystery is why the universe is made of matter at all. Counting galaxies versus cosmic microwave background photons, physicists infer that for every billion antimatter particles in the early universe, there were a billion-and-one matter particles. The billions annihilated; that extra one is everything we see. Fermilab is now testing whether neutrinos and antineutrinos oscillate between flavors at slightly different rates — leptogenesis — as a possible mechanism, racing a parallel effort in Japan. > *"For every billion antimatter particles that existed in the universe, there were a billion and one matter particles. 
The billions canceled, annihilated, destroyed each other, and that extra one that's left over is us."* ## [130:31] Dark energy In 1998, astronomers expected to measure how fast gravity was braking the expansion of the universe. They found the expansion is accelerating instead. The driving force is dark energy — a repulsive form of gravity. Einstein had added exactly this term to his field equations in 1917 to keep the universe static, then removed it when Hubble showed it was expanding. In 1998 it went back in. What dark energy actually is remains unknown. The most common view is that it is the energy density of space itself. The problem is that quantum field theory predicts a vacuum energy density about 10¹²⁰ times larger than what is observed — the worst prediction in physics. Lincoln notes that if dark energy has constant *density* while space expands, total dark energy is growing, which pushes toward the view that space is quantized: new quanta of space appear as the universe grows, each carrying a fixed energy, producing constant density as an emergent property. > *"There is very clearly something going on, something very badly wrong in the quantum field theory."* ## [134:20] Dark matter Galaxies rotate too fast. Galaxy clusters move too quickly. Gravitational lensing of distant galaxies is stronger than visible matter can explain. Three independent observations all point to the same conclusion: there is roughly five times more mass in the universe than we can see. Lincoln traces his own intellectual journey: 25 years ago he suspected the problem was with Newton's laws; two observations changed his mind. The Bullet Cluster — two galaxy clusters that passed through each other — shows gravitational distortions following the galaxies, not the gas clouds that stopped in the middle, exactly what dark matter predicts. 
The Dragonfly galaxies (DF2 and DF4) rotate exactly according to Newton's laws because they appear to have had their dark matter stripped away — a galaxy *without* dark matter is actually strong evidence that dark matter is real. Despite 30 years of searching with three approaches — direct detection underground, gamma-ray searches near galactic centers, and missing-momentum signals at the LHC — no dark matter particle has been confirmed. The viable mass range spans from sub-electron to asteroid scale, and experiments can only cover one slice of that range at a time, which is why Lincoln is not currently running a dark matter experiment himself. > *"We've ruled out some dark matter particles, but the problem is the range of space of possible mass — it ranges from something like the mass of an asteroid to far lighter than an electron and everywhere in between."* ## [162:56] Future of physics Lincoln grew up poor in rural America, shaped by science fiction and the popular science books of Isaac Asimov, Carl Sagan, and George Gamow. He chose particle physics over cosmology in the mid-1980s because particle physics let him actually measure things. He worked 8 a.m. to midnight Monday through Saturday as a graduate student not out of obligation but because he could not imagine anything he would rather be doing. His science communication — YouTube videos, popular books — is a deliberate attempt to reach the kid in Iowa or Montana who has no highly educated family mentors but the same hunger he had. He has already heard from Fermilab summer interns who came because they watched one of his videos. Lex closes with Marie Curie: *"Nothing in life is to be feared. 
It is only to be understood."* > *"One of your viewers might be one of the people who answer these questions that have stymied very smart people for decades."* ## Entities - **Don Lincoln** (Person): Senior scientist at Fermilab; co-author on the 1995 top quark discovery paper; CMS collaboration member at LHC; author of *Einstein's Unfinished Dream* and multiple popular science books. - **Lex Fridman** (Person): MIT researcher and host of the Lex Fridman Podcast; conducts long-form interviews at the intersection of science, technology, and philosophy. - **Fermilab** (Organization): U.S. Department of Energy particle physics laboratory near Chicago; operated the Tevatron collider; currently the world's most powerful neutrino beam facility. - **CERN / LHC** (Organization): European particle physics laboratory home to the Large Hadron Collider; CMS and ATLAS detectors; site of the 2012 Higgs boson discovery. - **Standard Model** (Concept): Quantum field theory describing three of four fundamental forces and all known elementary particles; validated to extraordinary precision but does not include gravity or explain dark matter, dark energy, or the matter-antimatter asymmetry. - **Higgs field / Higgs boson** (Concept): A scalar quantum field whose non-zero vacuum value gives mass to the W and Z bosons while leaving the photon massless; the Higgs boson is its detectable excitation, discovered July 4, 2012 at CERN. - **Dark matter** (Concept): Invisible mass accounting for roughly 85% of all matter in the universe, inferred from galaxy rotation curves, cluster dynamics, and gravitational lensing; no candidate particle detected after 30 years of searches. - **Dark energy** (Concept): The repulsive energy driving the accelerating expansion of the universe; quantum field theory's prediction for its magnitude is 10¹²⁰ times larger than observation — the "worst prediction in physics." 
- **Baryogenesis / Leptogenesis** (Concept): Frameworks attempting to explain why the early universe produced a matter excess; Fermilab's neutrino program is testing leptogenesis by comparing neutrino and antineutrino oscillation rates. - **String theory / Loop quantum gravity** (Concept): Leading candidates for quantum gravity; string theory's predictions sit at energies about 10¹⁵ times higher than colliders can reach, so they are currently untestable; loop quantum gravity quantizes space itself and has produced some falsifiable predictions.

#particle-physics #dark-matter #dark-energy

The Rule for Picking AI Winners | The a16z Show

David George (a16z general partner) and David Clark (VenCap CIO) argue that AI companies are scaling faster than any prior technology generation (Anthropic and OpenAI are adding more monthly revenue than Meta, Google, or Microsoft) while actual diffusion into the broader economy remains below 5%. They work through what that gap implies for exit sizes, loss ratios, bubble risk, and who ultimately captures value as token costs fall and frontier intelligence becomes a commodity. ## [00:00] Intro Three data points open the episode: Anthropic and OpenAI already adding more revenue per month than any hyperscaler; the top-1% exit threshold rising from $10 billion to $32 billion, with a 10x jump to $100 billion possibly in sight; and David George's assessment that, right now, we are not in a bubble. ## [00:38] The Scale Shift: Anthropic & OpenAI Adding More Revenue Than Hyperscalers David George explains how his priors shifted sharply around November 2025. Before that, enterprise AI looked like a productivity story analogous to cloud adoption. After it, the numbers reframed the ceiling: Anthropic and OpenAI are already adding revenue at hyperscaler rates with less than 5% of the economy actually using these tools. He places an upper-bound frame on the opportunity by noting that Fortune 500 companies generate roughly $2 trillion of profit annually, and the two largest model companies could reach a $200 billion revenue run rate by year-end, already equivalent to 10% of that profit pool. > *"If you pair that up with the fact that they're already getting bigger in terms of revenue added than the hyperscalers, and you're at less than 5% diffusion into the economy, I think the outcomes are going to be extraordinary."* ## [04:20] Skeuomorphic vs Native AI Applications in the Enterprise David Clark invokes Chris Dixon's skeuomorphic-to-native arc: the first wave of enterprise AI lets people do existing jobs faster; the native wave restructures the work itself. 
George adds a wrinkle — the best companies are not yet focused on internal automation. Their top engineers want to build product, not automate back-office workflows. The most cutting-edge firms he visits are still in a "documentation phase," converting institutional knowledge into markdown before they can meaningfully deploy agents against it. > *"The most cutting-edge folks inside those companies who are trying to do this that I've talked to are kind of in the documentation phase — just turn everything into markdown files, have as much context capture as you can possibly get."* ## [06:24] How the Best AI Companies Run Themselves Differently Native AI founders operate on a different metabolism. George contrasts them with the previous SaaS generation, which, in hindsight, ran inefficiently but got away with it because headcount mandates and expanding software budgets covered the slack. The new companies are lean, aggressive, and already running agent swarms rather than typing commands. He describes walking into a cutting-edge AI company and finding researchers whispering into microphones, orchestrating swarms of agents — not a keyboard in sight. > *"The new companies are very lean, very aggressive, and they work all the time."* ## [08:14] Top 1% Exits 10X'd in 24 Months Clark lays out VenCap's tracking data: the threshold for a top-1% exit was $10 billion between 2020-2024, rose to $20 billion by February 2026, and was updated just the day before this recording to $32 billion. With OpenAI and Anthropic IPOs potentially arriving, he sees the bar hitting $100 billion by September. George notes that the combined market cap of these private companies likely already exceeds the entire Russell 2000, and that the sum of all VC-backed IPOs over the past six years is probably smaller than any single one of the three expected large IPOs. > *"Where is the threshold for the top 1%? 
And if you then think about OpenAI and Anthropic coming in, potentially we could be north of $100 billion by September."* ## [11:17] The Half-Life Problem: Why 40% of AI Leaders Drop Off Every Year Clark surfaces a disturbing churn metric: 40% of companies on the Forbes AI 50 list from one year disappeared the next. Google wasn't the first search engine; Facebook wasn't the first social network. First-mover advantage in AI is eroding faster than in any prior cycle. George confirms a16z's own priors have been repeatedly overturned — first convinced model companies would be everything, then convinced applications would take over, now watching the model companies extend back up into the application layer. The only durable heuristic he offers: a company must be in the token path. > *"From last year to this year, 40% of the companies that were on that list last year dropped off."* ## [13:11] Token Path, Cost Pressure & Who Captures Value Enterprise buyers are already feeling cost pressure from AI spend, and they cannot cover it by cutting previous-generation software budgets fast enough. George frames value capture as hinging on one largely unknowable variable: the market structure of frontier model labs. Two labs at the frontier means higher token prices and faster labor restructuring pressure; five labs means lower prices and a broader application ecosystem. Per-token cost for like-for-like capability is falling more than 10x year-over-year, but total token spending in dollars is rising faster. Clark adds that Chinese LLMs are roughly six months behind US frontier capability but ten times cheaper — a classic innovator's dilemma setup. 
> *"The biggest driver of where value is going to get captured right now is something that is totally unknowable, which is what is the market structure of the model companies?"* ## [17:00] Loss Ratios, Risk & How We Think About Early Stage Clark notes that historical early-stage VC loss ratios run around 60%, but the AI cohort of the past two years shows single-digit loss rates — unsustainable by definition. George reframes the discussion: a16z does not target a low loss ratio. A VC firm bragging about never losing money is "a horrible data point" — it signals too little risk-taking. The philosophy is to back the market-leading founder in every space with strong tailwinds and a credible technology. If the space works out and you have the leader, excellent. If the space does not work out but you have the leader, that is expected. The failure mode is the space working out while having backed the wrong company. > *"We joke all the time — there's a prominent VC in our ecosystem, and one of his big points of pride is he's never lost money on a deal. And we're like, that's not a point of pride. Like that's a horrible data point."* ## [22:51] Are We in an AI Bubble? Clark points out that classic bubbles are characterized by excess supply destroying economics — but right now the constraint is supply scarcity: no data center capacity available at scale until late 2028 or early 2029, with the US buildout running a year behind schedule and community resistance adding further delay. George is confident there is no bubble today and dismisses the data center opposition directly. The one scenario he would watch for is an unexpected algorithmic breakthrough producing dramatically smaller and more efficient models — which could flip supply from scarce to oversupplied — but he considers that unlikely in the near term. > *"I feel pretty confident saying that we're not in a bubble right now. 
I'm less confident that we won't be in a bubble three years from now."* ## [27:36] What SpaceX, OpenAI & Anthropic IPOs Mean for Public Markets Clark asks whether public markets can absorb the coming wave of trillion-dollar-plus IPOs. George argues it is unambiguously positive: the number of public companies has halved over 20 years, and outside the data center supply chain, almost nothing in the public markets is growing at more than 30% today. Bringing hypergrowth companies into indexes gives retail investors — including his parents' index-fund retirement accounts — exposure to the most dynamic part of the economy. He expects some portfolio reshuffling to make room, but does not see indigestion risk. > *"If you exclude the data center supply chain stuff right now, there are very few companies that are growing fast that are available for people to buy in the public markets."* ## [29:59] The Future of Venture Capital in an AI World George forecasts the shape of VC over the next five years as primarily a function of token market structure — whether the labs remain concentrated or become commoditized. He cites Bill Gates's platform axiom: a platform's value is validated when the companies built on top of it collectively exceed the platform's own value. If that holds, there will be a massive wave of valuable application companies built on intelligence. He also flags the consumer side as the most underappreciated opportunity: the last decade of consumer internet was a story of time spent getting captured by large incumbents; AI-driven shifts in consumer attention could recreate the conditions for generational consumer companies. 
> *"I'm very optimistic that we're going to have a massive wave of really valuable companies that get built on top of tokens, AI, and intelligence."* ## Entities - **David George** (Person): General partner at a16z; covers growth-stage and early-stage AI investing; invested in OpenAI pre-ChatGPT - **David Clark** (Person): CIO at VenCap; fund-of-funds investor tracking AI startup performance and VC market dynamics for 34 years - **Anthropic** (Organization): Frontier AI lab; cited as adding more monthly revenue than hyperscalers alongside OpenAI - **OpenAI** (Organization): Frontier AI lab; benchmark for scale and the expected $100B+ IPO cohort - **VenCap** (Organization): Fund-of-funds investor; publishes top-1% exit threshold data and tracks Forbes AI 50 churn - **Andreessen Horowitz / a16z** (Organization): Venture capital firm; investor in OpenAI pre-ChatGPT, scaling platform services to support companies encountering enterprise-scale problems early in their lives - **Cursor** (Software): AI coding tool cited as an example of a company reaching billions in revenue while still very small and early-stage - **Token path** (Concept): a16z's primary heuristic for evaluating AI companies — a company must sit in the flow of AI inference tokens to have durable economic relevance - **Skeuomorphic vs. native AI** (Concept): Chris Dixon's framework distinguishing apps that replicate existing workflows with AI assistance from apps that rearchitect work around AI capabilities natively - **Half-life problem** (Concept): David Clark's term for rapid AI leader turnover — 40% of Forbes AI 50 companies dropped off the list year-over-year — indicating first-mover advantage is eroding faster than in prior technology cycles
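The arithmetic behind several figures quoted in this episode summary can be checked with a minimal sketch. The function and variable names are illustrative and do not come from the episode. The flat-spend case is an assumption used only to get a lower bound, since the summary says total token spending is rising but gives no rate.

```python
# Sanity checks on figures quoted in the summary above.
# All names here are hypothetical; only the dollar amounts come from the text.

def implied_volume_multiple(price_drop_factor: float, spend_multiple: float) -> float:
    """Token-volume multiple implied by a price drop and a spend change.

    spend = price * volume, so volume_multiple = spend_multiple * price_drop_factor.
    """
    return spend_multiple * price_drop_factor

# [00:38] $200B run rate against a ~$2T Fortune 500 profit pool.
profit_pool = 2_000e9
run_rate = 200e9
share_of_profit_pool = run_rate / profit_pool

# [08:14] Top-1% exit threshold relative to the 2020-2024 level of $10B.
base_threshold = 10e9
threshold_multiples = {
    "Feb 2026 ($20B)": 20e9 / base_threshold,
    "latest ($32B)": 32e9 / base_threshold,
    "projected ($100B)": 100e9 / base_threshold,
}

# [13:11] Per-token cost falls 10x year over year. ASSUMPTION: flat dollar
# spend (multiple of 1.0) as a floor; the text says spend is actually rising,
# so real volume growth would be higher than this.
volume_floor = implied_volume_multiple(price_drop_factor=10.0, spend_multiple=1.0)

if __name__ == "__main__":
    print(f"share of profit pool: {share_of_profit_pool:.0%}")
    for label, multiple in threshold_multiples.items():
        print(f"{label}: {multiple:.1f}x")
    print(f"token volume grows at least {volume_floor:.0f}x")
```

Under these inputs the $32 billion threshold is 3.2x the 2020-2024 level, and the 10x figure corresponds to the projected $100 billion. The volume result is only a floor: any growth in dollar spend pushes the implied token volume above 10x.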

#ai-investing #venture-capital #large-language-models

Neuralink's DJ Seo: Inside the Race to Connect Brains and AI

24:59

Sequoia Capital · about 1 month ago

At AI Ascent 2026, Neuralink co-founder and president DJ Seo sits down with Sequoia partner Shaun Maguire to lay out exactly where the company stands: 20-plus Telepathy patients controlling computers and robotic arms through pure thought, Blindsight in preclinical testing and potentially cleared for human use by end of 2026, and a first-principles manufacturing philosophy borrowed from Elon Musk that treats surgical robots the way SpaceX treated reusable rockets. DJ argues that the real ceiling of this technology is not cursor control or speech synthesis but direct, uncompressed, multimodal transfer of concepts — AI as a neocortical layer sitting above the human limbic system — and that scale, the same variable that unlocked the LLM era, is the only remaining gate. ## [00:00] Introduction Shaun Maguire opens the session by announcing a two-minute Neuralink patient video before the interview begins, telling the audience to stay on the side because what they are about to watch is proof that the company has already cleared the hardest bar: restoring human agency to people who had lost it entirely. ## [00:21] Telepathy Patient Stories The video narrates four patients whose lives changed after receiving the Telepathy implant. A quadriplegic patient describes moving a cursor with thought alone — "I'm thinking and a cursor is moving on a screen. It blew my mind." An ALS patient who lost the ability to speak regains a digital voice through the implant: "I'm talking to you with my mind." Another patient notes that the implant flipped how his child sees him: "I am not able to do things that other dads can, but now he thinks it's so cool that I can do things that other dads cannot." > *"Before the implant, I was locked in, non-verbal, quadriplegic. 
Now I control my computer just by thinking and the rewards have been immense for me."* ## [01:06] Convoy Robotics Independence The video shifts to Convoy, Neuralink's assistive robotics team, which is extending BCI control beyond a screen to physical manipulation in the real world. A patient who had been losing motor function moves a robotic arm through its axes using only neural intent: "It was incredible to be able to just gesture with an arm again." A second patient, Kenneth, who was losing his voice to ALS, uses the system's speech synthesis to speak aloud in real time during the video — words generated by his brain signals rather than his vocal cords. > *"Gaining functionality that I thought was gone forever was so incredibly life-changing."* ## [02:04] Blindsight Vision Restore The video previews Blindsight, Neuralink's second product line, designed for patients who have lost both eyes or optic nerve function. An external camera captures the visual scene; the device writes the signal directly into the visual cortex via electrical stimulation, generating phosphenes — artificial pixels of light. A patient named Audrey, asked how it feels, answers simply: "Life-changing." The video closes with the line "all with my mind" spoken over footage of a patient interacting with the world through the restored signal. > *"The future of this technology feels almost unlimited... we are finding ways to apply it across all regions of the brain."* ## [03:10] After Video Reflections DJ Seo, visibly moved after watching the video alongside the audience, speaks first: "We were cracking a lot of jokes before that video, but honestly, that brought tears to my eyes." He describes the work as one of the most inspiring projects in the world — not because of the technical milestone but because the team is giving back capabilities that patients had already grieved as permanently lost. Maguire affirms the sentiment before pivoting to the founding story. 
> *"This is one of the most inspiring projects in the world. It's incredibly difficult what they're doing and I mean, they're truly saving people."* ## [03:31] Origin Story And AI DJ traces Neuralink's founding insight to a single bottleneck: the mismatch between human output bandwidth and AI capability. In 2016, saying that out loud "sounded insane," but the logic has not changed. His personal path ran through a childhood fascination with the brain, undergraduate work at Caltech building miniaturized low-power electronics, and a Berkeley PhD focused on shrinking lab-grade neural systems down to something deployable. When he met Elon Musk near the end of his PhD, the scale and ambition of the project made refusal impossible. He frames the brain as "the most interesting compute that we all carry" and "the only form of general intelligence that we know to date." > *"Really the key insight back then was sort of the IO bottleneck between the human output and AI capabilities."* ## [06:31] Scaling And Vertical Integration Maguire presses on what smart people most misunderstand about Neuralink: many know the implant and the decoding algorithm, but almost nobody grasps the manufacturing and surgical-robot infrastructure the company built in parallel from day one. DJ attributes this to what he calls "Elon magic" — an insistence on vertical integration that gives Neuralink control over every layer from chip design to factory floor to robotic surgery deployment. The target is not a niche medical device; it is LASIK-scale surgery available to millions. Building that capacity first means progress looks slow until "the iceberg pops over the waterline" and ramp becomes near-instantaneous. 
> *"Vertical integration is something that is really the lifeblood of Neuralink and Elon companies and what really enables us to have that fast iteration loop from design, develop, deploy."* ## [09:27] Caregivers And Purpose Asked which patient story inspires him most, DJ refuses to pick one — the power, he says, is not only in the patients but in the caregivers: Nolan's mother Mia, Brad's wife Tiffany, Ken's wife Cheryl. He describes their presence as "a really powerful human story of love, sacrifice, and resilience." He then takes what he calls a philosophical tangent: his core belief is that fulfillment comes from helping others, because the gap between self and other is not categorically different from the gap between your present and future selves. That belief is what he says keeps him and much of the Neuralink team going — they are "igniting a fire of hope" for people who had given up on recovering what they lost. > *"I personally and as well as many others at Neuralink find extreme fulfillment being able to help those that really cannot help themselves."* ## [13:10] BCIs Meet AI Future Maguire asks the room's core question: how do BCIs and AI converge? DJ sketches a two-horizon answer. Near term, the system translates neural intent into legacy interfaces — keyboard, mouse, language — which is already working. The real breakthrough, which he thinks is "not super distant," is bypassing those legacy interfaces entirely and computing on raw neural intent. He points to transformer architectures as existence proofs: nothing prevents them from learning the latent manifolds of neural data given sufficient scale. Neuralink is already fine-tuning LLM-class models on neural recordings from its 20 participants and finding "very counterintuitive" patterns. The ultimate ceiling he names is "direct, uncompressed, high-fidelity, multimodal transfer of concepts" — the Matrix's "I learned kung fu" moment and possibly beyond it. 
He also shares what he calls a clarifying lesson from working with Musk: "all green light schedule" — a first-principles forcing function that strips every man-made bottleneck and asks how fast something could actually be built if every light were green. His estimate is that 80–90% of perceived constraints in hardware development are artifacts of convention, not physics. > *"I think if you really think about the ultimate ceiling of this technology, it's really direct uncompressed high fidelity and multimodal transfer of concepts."* ## [21:05] Audience Q&A Wrap Three audience questions in the final four minutes. On product sequencing — when to go deep versus expand — DJ explains the "beachhead and expand" strategy: build everything generalizably enough from the start so that regulatory approval for motor cortex becomes a template for visual cortex and beyond. The first approval is the hardest; every subsequent one rides the clinical safety record already established. On augmentation for healthy users, DJ frames everything around benefit-risk: the calculus is obvious for quadriplegic patients; for otherwise healthy users it remains unclear, but he notes that off-label use after approval is legally available to anyone who can find a neurosurgeon and pay out-of-pocket. On the hard problem of consciousness, he gives a pointed one-liner: if you can inject new senses and measure the subjective response quantitatively, you may have a pathway toward measuring consciousness itself. Maguire closes by calling Neuralink "one of the most inspiring companies in the world." 
> *"If you are able to inject new senses, there may be ways to quantitatively understand that."* ## Entities - **DJ Seo** (Person): Co-founder and president of Neuralink; PhD in miniaturized electronics from Berkeley; joined after meeting Elon Musk near the end of his doctorate - **Shaun Maguire** (Person): Partner at Sequoia Capital; host of the AI Ascent 2026 fireside session - **Elon Musk** (Person): Co-founder of Neuralink; originator of the "all green light schedule" and vertical integration philosophy carried across Tesla, SpaceX, and Neuralink - **Neuralink** (Organization): BCI company founded in 2016; products include Telepathy (motor prosthesis) and Blindsight (vision restoration via visual cortex stimulation) - **Telepathy** (Software): Neuralink's first commercial product; allows paralyzed patients to control computers and robotic devices through neural intent decoding - **Blindsight** (Software): Neuralink's second product line; restores vision for patients with total loss of eyes or optic nerve by writing directly to the visual cortex; in preclinical testing as of mid-2026 - **IO Bottleneck** (Concept): The mismatch between human output bandwidth (speech, typing, gesture) and AI processing capability; the founding problem Neuralink was built to solve - **Neural Foundational Model** (Concept): LLM-class transformer models fine-tuned on neural recording data; Neuralink is building these at 20-participant scale and observing counterintuitive patterns in neural latent space - **All Green Light Schedule** (Concept): Elon Musk's first-principles engineering discipline — strip every man-made constraint and ask what physics alone limits; DJ estimates 80–90% of hardware delays are conventional, not physical

#brain-computer-interface #neuralink #ai

10:30

Every · about 1 month ago

Why Opus 4.8 Pulled Me Back to Claude

Dan Shipper, CEO of Every, delivers a day-zero vibe check on Opus 4.8, arguing Anthropic could have called it Opus 5. The model jumps 30 points past Opus 4.7 on Every's Senior Engineer benchmark, edges out GPT-5.5, tops their internal writing tests at 79.6 vs. 73, and is the first model to produce a genuinely good one-shot slide deck. Two catches temper the enthusiasm: performance degrades sharply below "extra high" reasoning, and the Claude desktop app remains cluttered compared to Codex. ## [00:00] What is Every Every is a 30-person applied AI lab for the future of work—part media outlet, part product studio. Dan opens by explaining the subscription (writing, courses, AI-built tools all in one place at every.to) before rolling into the Opus 4.8 assessment. The plug is brief and context-setting: the team has had beta access for a week, and the rest of the video is what they found. > *"Every is the only subscription you need to stay at the edge of AI."* ## [01:07] Anthropic Is Back: The Headline Case for Opus 4.8 Dan had largely abandoned Claude after Opus 4.7—slow, hard to love, and outpaced by Codex and GPT-5.5 in day-to-day use. Even the most loyal Claude users at Every had started routing work elsewhere. Opus 4.8 breaks that pattern: it scores 63 on Every's Senior Engineer benchmark (30 points above Opus 4.7, one point above GPT-5.5), tops their writing tests, and produced the first one-shot slide deck Dan has called genuinely good. Kieran Klaassen, Every's GM, called it "the most human model he's worked with." The one persistent friction is the Claude desktop app itself. Codex is fast, focused, and ships a clean harness; the Claude app still feels like a product built by three separate teams—chat tab, code tab, co-work tab, each with its own feel. Dan is now splitting time between both apps, which he was not doing before. 
> *"But honestly, they could have called it Opus 5 cuz this is a really great model."* ## [05:02] Reach Test: Paradigm Shift Ratings from the Every Team Every's reach test asks one question: do you actually open this model when work gets hard? Dan rates Opus 4.8 gold/green—paradigm-shift quality, docked one notch because the Claude app harness is only "okayish to pretty good." Kieran, who runs 50 agents a day, gives a straight gold paradigm-shift, one of the rarest grades the team has assigned. Katie Parrot, a senior staff writer and historical Claude fan, lands at green, splitting her work between Opus 4.8 and Codex. > *"It's very rare to give a paradigm shift grade to a model. So I would pay attention to this."* ## [06:32] Benchmarks: Coding and Writing Numbers On coding, Opus 4.8 hits 63 on the Senior Engineer benchmark—the test feeds the model a vibe-coded codebase and asks it to rewrite from first principles, then scores against two human senior engineers who completed the same rewrite (typically scoring in the 80s–90s). GPT-5.5 sits at 62. On Kieran's LFGbench (real-world tasks: SaaS build, e-commerce site, 3D game landscape), the model writes readable code that bridges technical competence and creativity—the "cozy island" 3D scene is notably richer and more vibrant than GPT-5.5's output. On writing, Opus 4.8 scores 79.6 out of 100 on Every's internal benchmark (intro writing, promo emails, mid-piece paragraphs); GPT-5.5 scores 73. The gap is mainly in AI tells: at high and extra-high reasoning settings, Opus 4.8 produces prose that sounds less like a model. It matches a writer's voice from a single paragraph of context better than any other model Dan has tested. > *"Opus 4.8 scores a 79.6 out of 100 on the writing benchmark. GPT 5.5 is 73."* ## [08:57] Emotional Intelligence, Knowledge Work, and the Verdict Dan uses the model for interpersonal and management work—talking through decisions, pressure-testing his own framing. 
Opus 4.8's thinking traces show it genuinely cycling through permutations before responding, which makes it feel less like a sycophant and more like a useful counterpart. On knowledge work, it's versatile: code and writing coexist cleanly in a single thread, and the slide deck result is the first one-shot deck Dan would actually send to someone. The verdict: if you're a Claude fan, this model delivers. If Codex converted you, add Opus 4.8 as a parallel tool for writing and knowledge work—it's worth the context switch. The harness gap is real, but the model itself is a banger. > *"If you've been converted to Codex, I highly recommend you at least add it as part of your arsenal."* ## Entities - **Dan Shipper** (Person): Co-founder and CEO of Every; presenter and primary evaluator of Opus 4.8. - **Kieran Klaassen** (Person): GM of Kora at Every; gave Opus 4.8 a straight gold paradigm-shift rating on the reach test. - **Katie Parrot** (Person): Senior staff writer at Every; rated Opus 4.8 green, split between it and Codex. - **Every** (Organization): Applied AI lab and media subscription company focused on AI for the future of work. - **Anthropic** (Organization): Developer of Claude and Opus 4.8. - **Opus 4.8** (Software): Anthropic's latest Claude model; subject of the vibe check. - **GPT-5.5** (Software): OpenAI model used as the primary performance comparison across all benchmarks. - **Codex** (Software): OpenAI coding agent; praised for its clean desktop harness and used as the daily-driver counterpoint to Claude. - **Senior Engineer Benchmark** (Concept): Every's proprietary coding benchmark—rewrites a vibe-coded codebase from first principles and scores against human engineers. - **LFGbench** (Concept): Kieran Klaassen's real-world coding benchmark covering SaaS, e-commerce, and 3D scene generation tasks.

#claude #opus-4-8 #llm-benchmarks

EMERGENCY DEBATE: They're Lying to Us About AI, the Iran War, and What Comes Next!

1:43:32

The Diary Of A CEO · about 1 month ago

Shark Tank investor Kevin O'Leary and Cenk Uygur, co-founder of The Young Turks, spar for 103 minutes over whether AI will liberate or ruin the American economy, why the US-Iran war continues despite an obvious off-ramp, and who can realistically win in 2028. O'Leary stays the optimist throughout (AI creates new jobs, the market always adapts, China is the real threat), while Uygur pursues a single, unbroken thesis: the combination of AI-driven mass unemployment and a foreign policy steered by the Israel lobby is heading America toward an iceberg with no institutional preparation whatsoever. ## [00:00] Intro The opening clip sets the stakes immediately. Uygur dives straight in: companies are rushing to lay off 10–25% of their workforces, and if the whole economy does that at once, the result is a depression, not a recession. O'Leary's response, "Wow. Jake is really a downer today. This is an incredible opportunity we're talking about," sets the tone for the next hour and a half. Bartlett states his goal: truth through the collision of two serious, opposing minds, not a shouting match. > *"Everyone is rushing to lay off 10 to 25% of their workforce, but 10% unemployment would be worse than anything we have experienced in our lifetimes."* (Cenk Uygur) ## [02:35] Why 7 in 10 Americans Now Oppose AI Data Centers Bartlett begins with a poll: 7 in 10 Americans oppose local AI data centers. O'Leary names a specific culprit: using forensic auditors and IRS Form 990 filings, he traced Chinese money through a network called Arabella, via Neville Singham, into anti-data-center campaigns in Utah, including death threats against his executives. He handed 90 pages of IP data to the White House. 
Uygur rejects the China thesis and names a simpler cause: data centers have driven up electricity costs for churches, libraries, and community centers, as happened in Virginia, and operators must either bring their own power or give the public an equity stake. > *"I have irrefutable evidence that China is interfering everywhere new energy is proposed in America, in every state, every city."* (Kevin O'Leary) ## [07:24] How AI Could Trigger an Economic Collapse and a UBI Crisis Uygur's core economic argument comes into focus here. He agrees on the energy problem and says any data center that taps the public grid without giving something back is engaged in corporate freeloading, pointing to the 2008 bailout as a warning. His biggest alarm: if every company cuts 10–25% of jobs at the same time, consumer spending collapses and triggers a depression. Sam Altman, Elon Musk, and Dario Amodei have publicly acknowledged that massive job losses are coming, yet no government has a plan. O'Leary counters: every technological disruption in 200 years of US history has created more opportunity than it destroyed, and pausing AI development hands the lead to China. > *"When we hit the iceberg, we won't be prepared, and it will be an epic catastrophe. There will be no one to buy your goods, because workers are also customers."* (Cenk Uygur) ## [15:30] Are AI Founders Hiding the Real Risks from the Public? Bartlett reads out publicly documented quotes: Sam Altman in 2021, saying AI will replace most jobs; Musk in 2024, saying probably none of us will have a job; and Amodei in 2025, warning that AI could wipe out half of all entry-level white-collar jobs within five years and push unemployment to 20%. 
He asks: if the builders of these systems publicly admit societal harm, why assume they are exaggerating? O'Leary picks up the other half of Amodei's statement (without a compute build-out within six months, China's DeepSeek catches up) and argues that the choice is to lead the disruption or cede it to Beijing. Uygur agrees a race is unavoidable but insists that the coders being laid off today are already hitting the iceberg, and that a UBI of $36,000 a year is a brutal step down from a $120,000 salary.

> *"Can we run the race in a way that is responsible and actually serves American voters and citizens, rather than just the executives and shareholders of the AI companies? I hope so, but so far we have taken absolutely no steps in that direction."* (Cenk Uygur)

## [23:55] Can AI Ever Be Developed Responsibly, or Is That Impossible?

Bartlett demands concrete answers. Uygur gives his structural diagnosis: legalized bribery (Citizens United, Buckley v. Valeo) ensures that the AI company with the biggest donations gets the regulatory framework it wants. Congress acts for donors, not voters. O'Leary argues that the lost jobs are mostly positions that were speculatively over-hired, and that AI companies are currently burning billions rather than pocketing them. He describes his data center in Utah: 4,000 construction jobs over nine years, a further 2,000 engineering positions, and no farmland touched. O'Leary brushes off Uygur's warning about socialism: raise taxes above 50% and the rich move to Monaco or Florida, as France learned.

> *"If not, the pitchforks are coming. I'm not a pitchfork guy. I believe in nonviolence and always will. But I don't think people grasp the scale of the anger that is building right now."* (Cenk Uygur)

## [32:11] How AI Is Quietly Destroying Jobs

Bartlett shares his own experience: he now hires entry-level staff almost exclusively for AI proficiency, because an AI-literate junior is 5 to 10 times more productive, and applicants without those skills are effectively out of the running. O'Leary disagrees: engineers are hired to solve problems, not to write code; AI just gives them a faster tool; and most tech layoffs are a correction for hasty over-hiring. Uygur rejects that: Wall Street analysts cheer every layoff announcement as "synergies", stocks rise when people are fired, and nobody on earnings calls asks who will buy the products once the workers are gone. He names an underrated risk: large numbers of unemployed young men have historically correlated with crime and conflict.

> *"When a lot of unemployed young men are sitting around, usually nothing good happens. Wars start, crime rises. We have to be prepared."* (Cenk Uygur)

## [37:35] Why Mass Unemployment Could Arrive Faster Than Expected

Bartlett describes a visit to a robotics accelerator in San Francisco where every team had pivoted from software to physical robots: intelligence, formerly the missing and expensive ingredient, now costs almost nothing. He asks both guests where they might be wrong. O'Leary refuses to accept the unemployment scenario and points to NASA's planned lunar base and Mars program as sources of hundreds of thousands of well-paid new jobs. Uygur calls it the "interregnum problem": even if O'Leary's scenario materializes in 20 years, the 61-year-old assembly-line worker in Cleveland cannot be retrained as a Mars engineer.
Bartlett adds that the CEO of Uber told him privately that AI will replace its 9.4 million drivers; asked what those drivers will do then, the answer was: "I don't know."

> *"The robot components have been around for decades. We always had them. What was missing, and the expensive part, was the intelligence."* (Steven Bartlett, quoting his co-founder)

## [46:32] Ads

Ad break for Stan (an AI tool for social media content), Pipedrive (CRM), and Cometeer (coffee). No substantive debate content.

## [48:40] What Is Really Happening Between Israel, Iran, and the Middle East

The debate pivots to geopolitics. Bartlett presents Trump's falling approval ratings and asks Uygur to explain the war. Uygur's answer runs almost 25 minutes and follows a single thesis: the war serves 100% Israeli interests and 0% American ones. He lays out how the Adelson family poured $317 million into Trump's campaign coffers and claims that the Israel lobby donates to 94% of Congress, with AIPAC the top lifetime donor to Trump, Biden, Hakeem Jeffries, Chuck Schumer, and Mike Johnson all at once. Since September 11, he says, Israel has outsourced seven wars to America, and Iran was the last on the list. Iran, according to Uygur, never had a delivery system that could reach the US and has never enriched uranium above 60% (weapons grade is 90%), and the former Grand Ayatollah issued a fatwa against nuclear weapons. Meanwhile, he says, Israel has occupied southern Lebanon and plans to keep it, and Netanyahu publicly demanded as a condition for peace that Israel alone retain the right to keep striking Lebanon, which makes any deal impossible.
O'Leary sees the Iranian regime differently: 150,000 people have terrorized 90 million others for 60 years; it is a government that cannot be trusted with nuclear weapons; and China's dependence on an open Persian Gulf will eventually force Beijing to bring Tehran to heel.

> *"100% Israeli interest, 0% American interest. Let's get out of there. Stop fighting Israel's wars for them and come home."* (Cenk Uygur)

## [01:11:59] Did Trump Underestimate How Long This Conflict Would Last?

Bartlett asks O'Leary directly whether Trump underestimated the conflict. O'Leary calls it the first true "tech war": $35,000 carbon-fiber drones with lawnmower engines are being intercepted by US missiles costing $1.2 to $3 million each, a cost asymmetry that exposes a gap in the math. He does not expect ground troops, only continued air strikes until Iran's leadership realizes that the cost of closing the strait, $210 million a day in lost revenue, outweighs the benefit. His prediction: China forces a deal before the US midterms.

> *"It's expensive because we're on the wrong side of defense. We need the cheap drones."* (Kevin O'Leary)

## [01:15:47] Ads

Ad break for Pipedrive (CRM) and the Diary of a CEO Conversation Cards. No substantive debate content.

## [01:18:08] Why America Is Rapidly Losing Patience

Bartlett names the leverage: if Iran's leadership knows Trump has only months until the midterms, why strike a deal now? O'Leary adds that China's leader also needs the strait open to secure his economy, so Iran is serving two masters. Uygur says the deal was negotiated long ago: Iran hands its highly enriched uranium to international inspectors, the US lifts the blockade, and the strait reopens.
Every time Netanyahu calls Trump, the deal collapses over new impossible conditions: immediate disarmament, Iran joining the Abraham Accords. Every politician who publicly rejected the most recent near-deal, he says, has received more than a million dollars from the Israel lobby. And while Russia bleeds in Ukraine and America in Iran, China builds roads and bridges in Africa and Latin America, spends nothing on war, and gains influence that way.

> *"After every phone call with Netanyahu, Trump switches from 'we're going to have peace' to 'we're not going to have peace and we're setting these new impossible conditions'. That has happened about half a dozen times now."* (Cenk Uygur)

## [01:29:08] Are We Watching the Rise of Socialism in Real Time?

Bartlett presents Gallup data: Americans' positive view of capitalism is at an all-time low; 70% of Democrats rate socialism positively; 62% of young Americans view it favorably, and that is before the economic effects of the war. O'Leary sees a cyclical phenomenon: every 17–20 years America flirts with socialist sentiment, and it always collapses when young idealists get their first paycheck. He points out that 52 cents of every sovereign wealth fund dollar in the world flows to America, not to Cuba or Russia. Uygur rejects the framing: America already practices socialism for corporations, with oil subsidies for profitable companies, no right for Medicare to negotiate prices, and every industry capturing its regulator through campaign donations. The real project, he says, is a return to genuinely free markets, and that requires getting money out of politics first.

> *"We'd be happy if we could get back to capitalism, let alone go all the way to socialism, because right now we don't have capitalism. We have crony capitalism."* (Cenk Uygur)

## [01:34:06] Who Really Has the Edge in the Next Presidential Election?

O'Leary names no winner but says the Democrats need a moderate centrist; he cites California as an example of failed progressive governance. Uygur surprises him with a concrete prediction: Tucker Carlson is the only Republican who could win in 2028. Republican voter enthusiasm is already destroyed, he says, the midterms are lost, and by 2028 the combined effects of AI unemployment and the Iran war will be fully felt. O'Leary laughs at first, then backpedals during the conversation: Carlson has a huge social media base, runs his own network, and is taking increasingly independent positions, including on AI. Uygur closes by naming Ro Khanna as the progressive candidate with the best national chances and commits to democratic capitalism: private markets checked by a functioning democracy, with Northern Europe as the working model.

> *"They have only one who can win, and I'm worried about it, and that's Tucker Carlson. If Tucker runs in the Republican primary, he definitely wins that primary. You can hold me to that."* (Cenk Uygur)

## Entities

- **Kevin O'Leary** (Person): Shark Tank investor, chairman of O'Leary Ventures; argues that AI creates opportunity, defends the data center build-out, attributes opposition to AI to Chinese funding, and predicts China will force Iran into a deal before the US midterms.
- **Cenk Uygur** (Person): Co-founder of The Young Turks, progressive commentator; argues that no one is prepared for AI-driven unemployment, that US foreign policy is steered by the Israel lobby, and that America's political system is corrupted by legalized bribery.
- **Steven Bartlett** (Person): Host of The Diary Of A CEO; entrepreneur and investor; moderates the debate and brings in his own hiring decisions and observations from a robotics lab.
- **AIPAC / Israel lobby** (Organization): Named by Uygur as the top lifetime donor to most leading US politicians in both parties; central to his thesis on why the US-Iran war continues despite a finished deal.
- **Arabella / Alliance for a Better Utah** (Organization): Network that O'Leary says is funded by China-linked entities to run anti-data-center disinformation campaigns in US states; source basis: IRS Form 990 filings.
- **UBI (universal basic income)** (Concept): Proposed safety net for workers displaced by AI; Uygur points out that even a best-case UBI of $36,000 a year is a devastating drop in income from a previous $120,000.
- **Strait of Hormuz** (Concept): Chokepoint for 48% of China's energy imports; its closure drives global inflation, and its reopening is the core US interest in any Iran deal.
- **DeepSeek** (Software): Chinese large language model; O'Leary and Amodei cite it as evidence that any pause in US AI development would give China a decisive lead within months.
- **Tucker Carlson** (Person): Former Fox News host, now an independent media personality; Uygur predicts he is the only viable Republican presidential candidate for 2028, a prediction O'Leary ultimately does not dismiss.
- **Democratic capitalism** (Concept): Uygur's preferred economic model: private markets checked by a functioning democracy; he distinguishes it from current US corporatism and from European socialism.
- **Ro Khanna** (Person): Progressive political figure, named repeatedly by Uygur as the only politician seriously working on AI unemployment policy and the one who comes closest to the democratic-capitalist ideal for 2028.

#ai-economy #unemployment #iran-war

Enterprise AI Security with Onyx Security CEO Maxim Bar Kogan

41:09


No Priors: AI, Machine Learning, Tech, & Startups · about 1 month ago

Sarah Guo talks with Maxim Bar Kogan, co-founder and CEO of Onyx Security, about what it really takes to secure AI agents at enterprise scale. Maxim argues that classic controls (proxies, identity restrictions, human review) fail once agent actions grow exponentially, and that the only viable path is to train specialized small models that recognize when a more powerful oversight system needs to step in. The conversation covers Onyx's "Secure Control Plane" product, the cost-latency math behind training its own models, why labs cannot credibly certify their own models as safe, and Maxim's conviction that AGI is coming and that independent AI oversight will become a hundred-billion-dollar business.

## [00:00] Cold Open

Maxim opens mid-thought: the more companies deploy AI agents, the more mistakes will follow, such as agents accidentally publishing credentials, making unauthorized network calls, or taking irreversible steps. Companies know the adoption wave cannot be stopped; what they lack is any way to tell a legitimate agent action from an illegitimate one. The opening outlines Onyx's core thesis before the intro even runs.

> *"Enterprises are definitely starting to realize that this risk has grown exponentially and that they have no way to stop adoption. They have to do something now to reduce the likelihood that these agent actions are illegitimate or faulty."*

## [00:45] Introducing Maxim Bar Kogan

Sarah introduces Maxim as co-founder and CEO of Onyx Security, an Israel-based startup of researchers, mathematicians, and engineers, described as a team building agents that watch AI agents.
The company combines offensive cyber expertise with deep AI research, including work on synthetic data and mechanistic interpretability.

## [01:10] AutoGPT and the Bet on Agent Actions

Just two years ago, the dominant enterprise security topic was DLP (data loss prevention) for chatbots: employees pasting sensitive data into ChatGPT. That picture has since turned into near-panic about autonomous agent actions. Maxim traces Onyx's starting point to AutoGPT: the first agent in which an LLM itself decided what to do, called a tool, and kept working in a loop instead of only generating text. The demo proved that agents can act on their own in the real world, and Maxim saw immediately that someone would have to oversee those actions at scale.

> *"AutoGPT captured everyone's imagination, ours included, because it was the first truly autonomous agent running on LLMs: an agent that didn't have an LLM generate text but had it decide what to do, and then gave it API access to do exactly that."*

## [05:17] What the Onyx Product Does

Onyx does two things: it trains models and builds agents that oversee other agents, and it packages that capability as a "Secure Control Plane" that enterprises plug into their AI stack. The control plane checks agent actions for legitimacy in real time while managing the trade-off between latency, cost, and reliability. In the long run, Maxim sees the vision extending beyond enterprise security: every company that runs AI agents needs a vendor-independent party that certifies what those agents do.

> *"The number of these actions is growing exponentially. What used to seem useful, such as a human in the loop, no longer works when you have a hundred times, a thousand times, a million times as many of these actions."*

## [07:47] The State of AI Adoption in Large Enterprises

In a typical large enterprise, Maxim sees three categories of AI use today: low-code SaaS automations built by drag and drop (not truly autonomous), internally developed or customer-facing agents, and autonomous coding agents and assistants. Of these three, coding agents now account for over 50% of AI use. The most mature sectors, financial services and healthcare, have the strictest controls, but even the most cautious companies have stopped blocking AI wholesale and moved to managing it.

> *"Over 50% goes to autonomous coding agents and assistants in the average enterprise."*

## [09:58] Securing Agents

Enterprises already spend around $100 billion a year on security: endpoints, networks, cloud, identity. Sarah asks how much of that transfers to agent security. Maxim's answer: practically none. Identity controls, the most basic layer, fail because agents need broad, dynamic permissions that cannot be scoped in advance. An agent that writes code in a repository or sends email on a manager's behalf cannot be confined to a narrow permission set the way a static software process can. The attack surface is intent, and existing tools cannot read intent.

> *"With these autonomous AIs, these assistants, these coding agents, you really can't know in advance which permissions to give them."*

## [12:45] Why Proxies Don't Work

Sarah's instinct from her own security background: this sounds like a problem for a proxy with a smarter policy engine.
Maxim agrees that proxies work as an integration point in some architectures, but says they miss the actual problem. A proxy delivers the data stream; it does not say whether the action in that stream is legitimate. That judgment requires understanding context (the agent's goal, its history, what the enterprise has authorized), and no rule set can do that for arbitrary agent behavior.

> *"The real problem is understanding whether what I'm being asked to do right now is okay or not. With AI systems, that is the decisive question."*

## [14:11] Why Onyx Trains Its Own Models

The naive solution, monitoring Claude Code with Claude Code, fails on cost and latency. Running a frontier-model agent for every enterprise agent would make the security layer more expensive than the AI it oversees. Onyx's answer is small, highly specialized models that do exactly one job: decide whether the current action warrants escalation to a stronger oversight system. Sarah compares it to blitz chess: grandmasters play fast moves intuitively and pause only at critical points. Maxim confirms the analogy: you want to concentrate intelligence exactly where the risk is highest and stay lean everywhere else.

> *"You want to train models that are really good at one thing. They're very small. They can hardly do anything else except say: 'Should a smarter agent take a look at this?'"*

## [18:38] Onyx's Talent Culture

Israel's security talent, shaped by units like 8200 and companies like Armis and Wiz, is well known. Onyx's DNA is different: co-founder Gil's background is in synthetic data and at Nvidia, not in offensive cyber. Most of Onyx's research team comes from an Israeli intelligence unit that works at the intersection of mathematics and cyber.
Maxim sees this mix as a deliberate choice: the long-term problem Onyx is solving is not just enterprise security but the question of how advanced AI can be controlled at all. That requires deep AI expertise alongside security instinct. Israel as a whole is catching up quickly in AI: world models, AI infrastructure, chips.

> *"The problem isn't just cybersecurity. The problem is how we control advanced AI in the long run, and that problem sounds important even if you set the enterprise security gaps aside entirely."*

## [21:24] Mechanistic Interpretability

Maxim is convinced that mechanistic interpretability, understanding what actually happens in a model's weights and activations, is both possible and necessary. His counterintuitive thesis: the smarter models become in specific domains, the better suited they will be to decode the inner structure of other models, better than humans ever could. Onyx actively funds research in this direction, not only as a security tool but as a window into the nature of intelligence itself. Sarah backs the bet and points to the possibility of better understanding not just AI but cognition as a whole.

> *"As we start to have models that are far superior to us in at least some important areas, we believe we can unlock mechanistic capabilities much more effectively."*

## [23:35] How Onyx Builds Customer Trust

Fortune 10 and Fortune 20 companies do not normally work with two-year-old startups of fewer than 100 employees. What breaks that rule is pain: CISOs confronted with agent incidents every day have no established vendor to call, because the problem did not exist three years ago.
Onyx receives inquiries from companies that discovered the startup as it left stealth, because the problem description matched something they were already fighting daily. Maxim sees this as a narrow, temporary window: enterprise customers know new startups will grow, and they would rather be early customers who shape the product than late followers.

> *"An opening like this only appears when the pain is very strong. The pain is so great that they say: 'I just saw this company come out of stealth, but it's a problem I have every day, so I'll just call.'"*

## [25:10] Reducing Risk at the Foundational Level

The second wave of CISO panic, beyond agent actions, is the rapidly falling cost floor for automated vulnerability research. Coding tools can now find and exploit vulnerabilities at a scale that would have seemed decades away just a few years ago. Maxim says the market is not overreacting: this is a real structural shift. The right response is two-pronged: fast patching and immediate measures now, plus investment in foundational controls (hardened identities, firewalls, endpoint detection) that shrink the attackable surface regardless of what the attacker's tools can do.

> *"The real solution, and every security leader in a large enterprise knows this, is that we need the foundational building blocks in place to avoid these risks."*

## [27:45] Staged Rollout of Glasswing and Daybreak

On Anthropic's Glasswing and OpenAI's Daybreak, the controlled rollouts for more capable models, Maxim has a nuanced view. A gradual rollout is ideal when it is globally coordinated: it buys time to develop playbooks, share knowledge, and prevent catastrophic failures in power grids or airlines.
But if one actor releases a comparably capable model ahead of the planned schedule, the gradual approach becomes a liability: companies without early access then face a threat they could not prepare for. His recommendation: widen access broadly so that more organizations can build defenses in parallel.

> *"If anyone gets to a comparably capable model earlier, it will look like a huge mistake in hindsight; we could at least have given enterprises the choice to get started very quickly."*

## [29:11] Laggards Among Large Enterprises

Two years ago, a significant share of large enterprises had simply banned AI. Today Maxim hardly encounters that. The financial sector still restricts (agents allowed, certain tools not), but outright bans have disappeared. He thinks that is right: tool lock-in is itself a risk. At the speed this market moves, anyone who relies exclusively on a single vendor's models is badly positioned when the next generation reshuffles the rankings. Enterprises that allow broad tool use and manage it rigorously will overtake those that restrict aggressively.

> *"Anyone who bet on OpenAI a year ago would have made the safest bet in the world, and suddenly Anthropic has much better models and tools."*

## [30:46] Onyx and the Broader AI Security Market

The AI security market is crowded with new vendors and new attack surfaces. Maxim's counterargument to concerns about product scope: the two core building blocks of AI in 2026, transformer-based foundation models and tool-using agent loops, have not changed fundamentally in years. That stability lets Onyx build toward many agent applications while keeping the core technology lean.
The real hedge against architectural shifts lies in investing in researchers who can retrain and adapt quickly, rather than betting the product on the permanence of a single model paradigm.

> *"The two pillars of how AI works in 2026 haven't changed in the last few years. We still rely largely on LLM foundation models, and we still build agents in very similar ways."*

## [32:36] Should Labs Take Over Model Trust and Governance?

The pressing question in the Bay Area: will the labs eventually absorb the trust and governance problem themselves? Maxim's structural argument against it: buyers do not want the car dealer to certify the car. Security teams need an independent party whose business model depends entirely on being right, not a vendor protecting the reputation of its own product. Beyond buyer psychology, Maxim draws a line between "jagged intelligence" failures (silly mistakes that disappear with stronger models) and intent-driven failures: manipulation attacks, misaligned goals, goal drift. The labs will fix the first category. Only structurally independent oversight can address the second.

> *"You won't trust the seller of a product when they tell you that the product won't harm your environment. You want an independent party whose entire business depends on telling you the product is correct, and on being right about it."*

## [36:56] What Needs to Happen in Security

Sarah asks what the broader tech and research community, the labs in particular, is missing from a security perspective. Maxim's answer: it is not a technical gap but an empathy gap. Building security products requires a deep understanding of how security teams actually work: their organizational structure, responsibilities, and information flows.
Israel produces strong security talent partly because military service gives engineers direct experience as the end user they will later build for. The labs, he implies, are building capabilities without paying enough attention to the operational reality of the organizations that deploy those capabilities and must defend against them.

> *"No matter which technical problem you solve, you're building a tool for people, for an organization with a particular structure. Creating a product for that audience that not only solves the technical problem but that people really love is really hard."*

## [39:14] Why Maxim Believes in AGI

Sarah wraps up by noting Maxim's implicit conviction that human security teams will still exist for a few more years. He confirms it, but with a time horizon: in the near future, security teams will be run entirely by AI agents, like most knowledge work. His sober version of AGI optimism: the job of building great products does not change; you always have to know who the end user is and optimize for their experience. Today that means humans with a few agents at their side. When the ratio flips, the same principle applies, only for agents that read context windows instead of dashboards.

> *"Today, when I sell a product, I sell it to a human audience with a few agents, and when that audience becomes more agents than humans, it will be important that we evolve and that it works really well for the agents doing the work."*

## Entities

- **Maxim Bar Kogan** (Person): Co-founder and CEO of Onyx Security; formerly of Israeli intelligence, with a background in mathematics and offensive cyber.
- **Sarah Guo** (Person): Host of No Priors; founder and GP at Conviction.
- **Onyx Security** (Organization): Israel-based startup building AI oversight infrastructure; trains specialized small models to monitor and govern AI agents in enterprises.
- **AutoGPT** (Software): Early open-source autonomous LLM agent; cited by Maxim as the turning point that made agent risk tangible.
- **Glasswing / Daybreak** (Software): Controlled rollout programs from Anthropic and OpenAI, respectively, for access to frontier models.
- **Mechanistic Interpretability** (Concept): Research program that aims to understand the internal weight and activation structure of neural networks; Onyx treats it as a long-term pillar of AI oversight.
- **Secure Control Plane** (Concept): Onyx's product category: a vendor-independent layer that monitors agent permissions, action legitimacy, and behavioral history in real time.
- **8200** (Organization): Israeli intelligence unit widely credited with producing Israel's leading security and tech talent, including many Onyx engineers.
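
The escalation pattern described under [14:11] (a small, cheap model that only decides whether a stronger overseer should look at an action) can be sketched roughly as follows. This is an illustrative Python sketch, not Onyx's implementation: `AgentAction`, `Decision`, `cheap_risk_score`, `expensive_review`, the keyword markers, and the 0.5 threshold are all hypothetical stand-ins, and a keyword count replaces the trained model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAction:
    agent_id: str
    tool: str       # e.g. "shell", "http", "email"
    argument: str   # raw argument the agent wants to pass to the tool


@dataclass(frozen=True)
class Decision:
    allowed: bool
    escalated: bool  # True if the expensive overseer was consulted


# Hypothetical markers; a real system would use a trained classifier instead.
RISKY_MARKERS = ("rm -rf", "api_key", "password", "drop table")


def cheap_risk_score(action: AgentAction) -> float:
    """Stand-in for the small model: a fast score in [0, 1]."""
    text = action.argument.lower()
    hits = sum(marker in text for marker in RISKY_MARKERS)
    return min(1.0, hits / 2)


def expensive_review(action: AgentAction, authorised: set[tuple[str, str]]) -> bool:
    """Stand-in for the stronger overseer: checks explicit authorisation."""
    return (action.agent_id, action.tool) in authorised


def gate(action: AgentAction,
         authorised: set[tuple[str, str]],
         threshold: float = 0.5) -> Decision:
    """Allow low-risk actions cheaply; escalate everything else."""
    if cheap_risk_score(action) < threshold:
        return Decision(allowed=True, escalated=False)
    return Decision(allowed=expensive_review(action, authorised), escalated=True)
```

The point of the design is that the expensive check runs only on the small fraction of actions the cheap scorer flags, which is the cost and latency argument made in the episode; how the two stages are actually built is not specified in the summary.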

#ai-security #enterprise-ai #ai-agents

Devin’s 80% Moment: Background Agents, 7x PRs, & End of Hand-Held Coding — Walden Yan & Cole Murray

1:09:32


Dan Shipper, co-founder and CEO of Every, returns with 12 contrarian predictions about AI and work, most of them a counterweight to the prevailing panic. His core argument: automation does not shrink workloads, it restructures them; Codex and Claude Code will become the new operating system for knowledge work; the SaaS apocalypse is a myth; and the only skill that really matters is the willingness to keep pace with the models as they advance. Every, a 30-person company, serves as a running experiment, which puts Dan in an unusually solid position to judge whether the predictions hold.

## [00:00] Introducing Dan Shipper

Lenny Rachitsky recalls Dan's previous appearance, where he predicted almost in passing that people were underestimating Claude Code for non-technical work, a call that has proven "incredibly right." Dan's return revolves around twelve more predictions, and he starts with the punchline:

> *"The AI job apocalypse is not really a thing."*

## [02:56] Dan's Unique Position at the AI Frontier

Dan explains why Every works as an early-warning lab: every employee, across editorial, operations, and finance, uses AI daily. That gives the company a concrete head start on the question of what the next twelve months look like in practice. Against the "San Francisco bubble" view, he argues that the real frontier of AI adoption is where AI meets a domain expert doing real work, not where AI is built.

> *"The frontier of AI is where AI meets a real person doing something."*

## [09:17] How Our Work Will Change Over the Next Year

Lenny Rachitsky outlines three areas of prediction: how we work, the shape of work itself, and who benefits.
Dans erste These: Alle professionelle Arbeit konvergiert auf einer einzigen Oberfläche — entweder Codex oder Claude Code — die als paralleler Arbeitspartner fungiert, der beobachtet, was man tut, Recherchen übernimmt, E-Mails schreibt und langfristige Aufgaben anstößt, während man selbst im Hauptdokument bleibt. Er selbst hält bereits seit zehn Tagen Inbox Zero, weil Codex und Cora (Everys E-Mail-Agent) seine Korrespondenz erledigen. > *"Ich habe das Gefühl, einen parallelen Arbeitskollegen zu haben, der nicht nur antworten und im Dokument schreiben kann, sondern dann auch Recherchen durchführen kann."* ## [16:39] Das Argument für General Agents Dan prophezeit, dass jedes Unternehmen einen "Super-Agent" in Slack haben wird, mit dem alle Mitarbeiter täglich interagieren — einen Allzweck-Assistenten mit Zugang zum Unternehmenskontext, kein eng spezialisierter Aufgaben-Bot. Dieser Agent wird zur organisatorischen Gedächtnisschicht, die Fragen weiterleitet, Daten bereitstellt und Lücken zwischen Teams schließt, die gar nicht wissen, dass sie miteinander reden müssen. ## [18:08] Codex und Claude Code als neues Betriebssystem der Arbeit Claude Codes Durchbruch war, einen leistungsfähigen Agent direkt auf dem eigenen Rechner zu platzieren — mit Terminal-Zugang und vor allem einem Browser. Anthropic hat das Paradigma zuerst erkannt; OpenAI zog ab Version 5.3 nach und beschleunigte dann. Dans aktuelles tägliches Werkzeug ist Codex, das er dauerhaft neben seiner Proof-Schreib-App laufen lässt — der Agent beobachtet seinen Browser, liest, welche Seite er gerade öffnet, und handelt in seinem Auftrag, ohne den Kontext zu wechseln. 
> *"Wer auch immer vorne liegt — es erscheint mir sehr offensichtlich, dass alle Arbeit, die man erledigt, auf einer dieser Oberflächen stattfinden wird."* Das Modell "Bring your own AI tokens zu einer SaaS-App" verändert die Wirtschaft: Das SaaS-Produkt zahlt nicht für die Inferenz, der Nutzer schon — das stellt Margen wieder her und beseitigt den Druck, eine proprietäre KI-Schicht von Grund auf zu bauen. ## [25:39] Wie Cursor sich einfügt Cursor dominiert heute Coding-Workflows, steht aber an einem strategischen Scheideweg: entweder reines Coding-IDE bleiben oder sich zur allgemeinen agentischen Oberfläche weiterentwickeln. Eng zu bleiben hält das Produkt fokussiert; zu expandieren bedeutet, direkt mit Codex und Claude Code zu konkurrieren. Dans Prognose: Der Kategoriengewinner wird die Oberfläche sein, die sowohl Code als auch allgemeine Wissensarbeit an einem Ort abdeckt. ## [27:42] Was das für SaaS-Unternehmen bedeutet SaaS-Produkte müssen jetzt agent-lesbar sein, nicht nur menschenlesbar — sauberes HTML, gute CLI-Schnittstellen und ein Design, das Informationen für den automatisierten Abruf aufbereitet. Dan verweist auf Proof: Da Codex die Seite beobachtet, werden kleine Probleme fast sofort behoben — der Loop zwischen "Ich bin auf etwas gestoßen" und "Es ist gelöst" schließt sich sehr schnell. > *"Man kann die Konturen dieses sehr schnellen geschlossenen Kreislaufs erkennen: Ich bin auf etwas gestoßen, einen kleinen Ärger, und kann ihn sofort hier beheben."* ## [31:13] Warum die CLI-Ära bereits vorbei ist Die CLI-Ära wurde im Schnelllauf absolviert. Die Welle verlief: GUI, dann CLI als Power-Move, dann Agents, die die CLI vollständig ersetzen. Sobald ein Agent jede Oberfläche bedienen kann, indem er den Bildschirm liest, fällt der Grund weg, im Terminal zu leben. Dans Prognose ist direkt: > *"CLIs sind vorbei. 
Wir haben die CLI-Ära im Schnelldurchlauf hinter uns gelassen."* ## [33:34] Zwei Agents sind besser als einer Dan widersetzt sich dem Agent-Maximalismus. Das tatsächlich entstehende Muster sind spezialisierte Agents — einer für Code, einer für E-Mail, einer für Daten — die im Auftrag des Nutzers miteinander kommunizieren. Wenn etwas in einer App kaputtgeht, kann Codex direkt mit dem Agent des Anbieters sprechen, um das Problem zu diagnostizieren — ohne Support-Ticket. Das Paradigma verschiebt sich, sobald man davon ausgeht, dass jeder einen Agent hat und Agents miteinander verhandeln können. ## [36:22] Warum Dan SaaS-Aktien für unterbewertet hält Das Narrativ "SaaS ist tot" verkennt, wie die Wirtschaft tatsächlich funktioniert, wenn Agents die Nutzung antreiben. Wenn Nutzer ihre eigenen KI-Tokens zu einem SaaS-Produkt mitbringen, sinken die Inferenzkosten des Anbieters gegen null. Dans konträre Position: > *"Ich würde SaaS-Aktien gerade jetzt kaufen."* SaaS-Unternehmen, die ihre Produkte agent-freundlich gestalten, werden nicht verdrängt — sie bekommen einen Margenrückenwind. ## [39:01] Warum Automatisierung menschliche Arbeit nicht verringert Das ist die zentrale intellektuelle These der Folge. Dan argumentiert, dass jede Automatisierungsschicht einen menschlichen Manager darüber braucht, der verifiziert, dass sie korrekt funktioniert. Er hat dafür seinen eigenen Benchmark entwickelt — den "Senior-Engineer-Benchmark" — indem er zwei echte Senior Engineers dazu gebracht hat, seine vibe-coded Proof-App unabhängig voneinander von Grund auf neu zu schreiben, und neue Modelle dann gegen diese Referenzlösungen testet. Modelle schafften 30/100, bis GPT-5.5 auf 60/100 sprang. Die Lücke zeigt etwas Wichtiges: Modelle beheben, was man ihnen sagt zu beheben. Ein erfahrener menschlicher Engineer schaut sich die Codebasis an, stellt fest, dass sie komplett neu geschrieben werden muss, und sagt es unaufgefordert — Modelle bringen dieses Urteil nicht von selbst. 
Es gibt immer einen übergeordneten Rahmen, den ein Mensch artikulieren muss. > *"Jedes Mal, wenn man etwas automatisiert, braucht man einen Menschen darüber, der sicherstellt, dass die Automatisierung korrekt läuft."* ## [47:00] Der Wert von Menschen geschriebenem Code Von Menschen geschriebener Code fungiert nach wie vor als Referenzsignal, das es erlaubt, Modell-Output zu bewerten. Dans Benchmark hängt von zwei menschlich verfassten Neufassungen als Grundwahrheit ab. Je mehr KI-generierter Code zur Norm wird, desto seltener und wertvoller wird der menschlich verfasste Bestand — genau das, was man braucht, um zu beurteilen, ob KI sich wirklich verbessert. ## [48:36] Kurze Zusammenfassung Lenny Rachitsky fasst den ersten Prognosebereich zusammen: Arbeit findet in Codex oder Claude Code statt; jedes Unternehmen bekommt einen Slack-Super-Agent; Bring-your-own-Tokens stellt SaaS-Margen wieder her; CLIs sind vorbei; zwei spezialisierte Agents schlagen einen Generalisten; Automatisierung weitet die menschliche Arbeitslast aus, anstatt sie zu schrumpfen. ## [50:15] Wie sich Arbeit verändert Der zweite Bereich betrifft die Form der Arbeit selbst. Dans Einschätzung: Forward-Deployed Engineers werden die wertvollsten Einstellungen — Menschen, die beim Kunden sitzen, den Workflow verstehen und in derselben Besprechung einen Fix bauen und ausliefern können. Das Konzept der "Allocation Economy" aus seinem früheren Essay greift hier: Menschen werden zu Verwaltern von KI-Kapazität statt zu direkten Produzenten, und gutes Verwalten erweist sich als kognitiv anspruchsvoll in seiner eigenen Art. 
> *"Ich bin gleichzeitig zutiefst von KI durchdrungen und sehr zuversichtlich, was Menschen und ihre Rolle dabei angeht, sicherzustellen, dass KI Dinge produziert, die es wert sind, produziert zu werden."* ## [56:17] Warum Data Scientists in schlechten Analysen versinken Data-Science-Teams werden mit KI-generierten Analysen von allen anderen im Unternehmen überflutet — Analysen, die plausibel aussehen, aber häufig falsch sind. Die Aufgabe des Senior Data Scientists verschiebt sich vom Erstellen von Analysen zu deren Prüfung, was schwieriger und kognitiv anspruchsvoller ist. Die gleiche Dynamik trifft Engineering: Anfragen auf Junior-Level werden von Modellen übernommen, was mehr Randfälle freilegt, die tieferes Urteilsvermögen erfordern. > *"Man braucht mehr Senior-Leute, die sich mit den tieferen Fragen befassen, die schwieriger sind, für das Team, das mit all den grundlegenden Anfragen umgeht."* ## [58:24] Welche Produkt- und Tech-Rollen am wenigsten von KI verändert werden Dans Antwort: die Rollen, deren Output am schwersten als Prompt zu formulieren ist. Er unterscheidet zwischen "Agents beaufsichtigen" — passiv auf Fehler warten — und "Forward-Deployed Engineering" — aktiv Systeme aufbauen, die allen anderen ermöglichen, das zu tun, was früher Spezialisten erforderte. Letzteres ist der Ort, wo die interessante, schwer automatisierbare Arbeit stattfindet. ## [62:17] Wir werden viel mehr KI-generierte Texte lesen — und sie mögen Every nutzt Notion-Agents für die Quartalsplanung — die Strategieberichte jedes Teams werden KI-generiert, und was Dan zurückbekommt, ist besser als das, was manuelle Planung produziert hat. Seine E-Mails werden größtenteils von GPT-5.5 verfasst. Sein Test, ob KI-geschriebene Inhalte akzeptabel sind: Musste der Absender verstehen, was darin steht, um die KI anzuleiten? Wenn ja, in Ordnung. Wenn der Absender es offensichtlich nicht gelesen hat, ist das ein Verstoß gegen den sozialen Vertrag. 
> *"Das Schlechte daran ist, dass es dem Absender weniger Zeit gekostet hat, es zu erstellen, als es mir kostet, es zu lesen."* Er veröffentlicht außerdem Every-Guides, die mit KI-Co-Autoren entstehen und explizit für den Konsum durch Menschen und andere Agents gestaltet sind — ein neues Inhaltsformat, das auf doppelte Nutzung optimiert ist. ## [68:28] Warum Product Manager die KI-Ära dominieren werden Dan nennt Everys internen PM Marcus, der das Spiral-Produkt leitet, als Prototyp: starkes Produktgespür, fähig, KI anzuleiten, um schnell zu bauen und zu iterieren, liefert ohne auf Engineering-Kapazität warten zu müssen. PMs sind im Kern Verwalter — sie entscheiden, was gebaut werden soll und für wen — genau die Fähigkeit, die knapp bleibt, wenn das Bauen selbst günstig wird. > *"Ich bin super, super zuversichtlich, was PMs angeht."* ## [71:05] Full-Stack-Designer sind die anderen großen Gewinner Full-Stack-Designer — Menschen mit starkem Gespür für Visuelles, die auch im Code arbeiten — stellen bereits direkt in Tools wie Lovable und Figma Make Pull Requests. Die Übergabe zwischen Design und Engineering verdichtet sich gegen null. Dan erwartet, dass sie neben PMs zu den gefragten Superhelden der KI-Ära werden. ## [73:11] Die KI-Job-Apokalypse wird nicht eintreten Dan trennt die aktuelle Entlassungswelle (überwiegend Korrekturen nach übermäßigem Hiring) von einer strukturellen KI-Verdrängungsthese — und verwirft letztere. Sein strukturelles Argument: Modelle werden auf der gestrigen menschlichen Kompetenz trainiert, was bedeutet, sie produzieren das bereits Bekannte in seiner standardisiertesten Form. Menschen treiben die Frontier voran, indem sie mit dieser eingefrorenen Kompetenz neue Dinge tun, was Raum schafft, zu dem Modelle dann aufholen müssen. Der Zyklus wiederholt sich. 
> *"Strukturell, wegen der Funktionsweise der Modelle, wird es immer Raum für Menschen geben, weiter voranzupreschen."* ## [76:00] Wie man die Modelle "mitreitet", um relevant zu bleiben Der konkrete Rat: Neuen Modell-Releases nicht widerstehen — jeden als neue Menge von Fähigkeiten betrachten, die man erkundet und auf die eigene Domäne anwendet. Dan führt seinen Senior-Engineer-Benchmark jedes Mal neu durch, wenn ein wichtiges Modell erscheint. Er widerspricht auch der Idee, dass das Wissen über KI-Frontier in San Francisco sitzt. Every arbeitet aus Brooklyn heraus und bleibt genau deshalb vorne, weil sie Modelle für alles einsetzen — nicht weil sie sie bauen. > *"Das Einzige, was man tun muss, ist die Modelle mitzureiten. Und das bedeutet: sie für das einzusetzen, was man selbst tut."* ## [81:02] Abschließende Prognosen und Ratschläge Lenny Rachitsky fasst die zwei Seiten der Medaille zusammen: "Weniger verändert sich, als du fürchtest" (SaaS bleibt, Jobs verschwinden nicht) und "Mehr verändert sich, als du vorbereitet bist" (wie Arbeit erledigt wird, welche Rollen zählen, wie ein Arbeitstag aussieht). Dans abschließende These: Forward-Deployed Engineer ist die neue unverzichtbare Einstellung; Unternehmen, die Mitarbeiter daran hindern, die neuesten Modelle zu nutzen, begehen einen strategischen Fehler mit langsamer Wirkung. ## [85:24] Blitzrunde Im Schnelldurchlauf: Dans konträrste Überzeugung ist, dass die KI-Job-Apokalypse schlicht nicht passiert; das eine, was er sich wünscht, dass mehr Menschen verstehen, ist, dass die Frontier der KI nicht in San Francisco liegt — sondern dort, wo jemand ein Modell einsetzt, um echte Arbeit in einer echten Domäne zu erledigen. Er würde seinem früheren Selbst raten, Senior Engineers früher einzustellen, und erwartet, dass KI im nächsten Jahr grundlegend verändert, wie Menschen über Benchmarks nachdenken. 
## Entitäten - **Dan Shipper** (Person): Mitgründer und CEO von Every; Autor des Essays "After Automation"; führt Every als lebendiges KI-Adoptionslabor - **Lenny Rachitsky** (Person): Gastgeber des Lenny's Podcast, Gründer des Lenny's Newsletter, ehemaliger PM bei Airbnb - **Every** (Organisation): 30-köpfiges KI-natives Medien- und Softwareunternehmen; alle Mitarbeiter sind tägliche KI-Nutzer - **Codex** (Software): OpenAIs agentische Coding- und Wissensarbeits-Oberfläche; Dans aktuelles tägliches Werkzeug - **Claude Code** (Software): Anthropics terminalbasierter Coding-Agent; hat das agentische On-Computer-Paradigma begründet - **Proof** (Software): Dans KI-unterstützte Markdown-Schreib-App; die Referenz-Codebasis für seinen Senior-Engineer-Benchmark - **Cora** (Software): Everys E-Mail-Agent, mit Codex für das Postfach-Management integriert - **Cursor** (Software): KI-Coding-IDE an einem strategischen Scheideweg zwischen Coding-Tool und allgemeiner Agent-Oberfläche - **Forward-Deployed Engineer** (Konzept): Eine Hybridrolle, die technische Umsetzung mit kundenseitiger Problemfindung verbindet; Dans Wahl für die wertvollste neue Einstellung im KI-Zeitalter - **Senior-Engineer-Benchmark** (Konzept): Dans eigene Evaluation, bei der zwei menschliche Senior Engineers eine Codebasis von Grund auf neu schreiben; neue Modelle werden gegen diese Referenzlösungen bewertet - **Allocation Economy** (Konzept): Dans Rahmen, der prognostiziert, dass Menschen von direkten Produzenten zu Verwaltern von KI-Kapazität werden - **Ride the Models** (Konzept): Dans Ratschlag, um relevant zu bleiben — jeden neuen Modell-Release als neues Set von Fähigkeiten betrachten, die man aktiv erkundet und auf die eigene Domäne anwendet

#ai-agents #future-of-work #saas

