LaiDub

Podcasts: Hear the voice. See the shape of the thought.


What does the next training paradigm look like?


Dwarkesh Patel narrates his essay on where AI training is headed. The labs are betting that scaling RL across millions of verifiable tasks gets you to AGI, but Dwarkesh argues that bet leaves two holes: most valuable skills aren't "grindable" enough to farm in a simulator, and what models learn on the job never makes it back into their weights. He walks through why sample efficiency and continual learning are the same problem, sketches two candidate fixes — on-policy self-distillation and "dreaming" — and imagines an AI that keeps getting smarter from being deployed rather than from pretraining.

## [00:00] The big research bet the labs are making

The labs' working theory: train AIs on millions of verifiable tasks across thousands of RL environments and you'll get a general problem-solver that can grind on open-ended work for weeks. Optimists argue the known deficits — data inefficiency, no continual learning — will get steamrolled by more compute, the same way classic NLP problems collapsed once LLMs scaled.

Dwarkesh lays out their strongest counter to his own skepticism: the million-fold sample inefficiency he flagged in his last essay is only a training-time cost, amortized across billions of sessions. What matters is how capable the model is *during* a session, and that keeps improving. Continual learning might not even be needed if context windows grow large enough to hold months of on-the-job experience.

> *People often say that their employees are not net productive until six months or more on the job. So clearly, online learning is necessary for competence. But what if you could just fit those six months into the context window?*

## [02:12] Grindability is just as important as verifiability

Why has computer use lagged coding and math when it's just as verifiable?
Dwarkesh's underrated answer: being verifiable isn't enough — a domain also has to be *grindable*, meaning you can run thousands of parallel rollouts against a deterministic, replayable simulator from the same starting point. A coding repo clones trivially into a container; Amazon's checkout flow does not.

This is the canyon wall AI progress only slowly chips at. You can sometimes build farmable simulators (clone Slack, clone Gmail), but most high-value skills — building a business, winning a court case, running a profitable trading day — require irreproducible interaction with the real world, where verification can take months and outcomes can't be re-observed across parallel rollouts.

> *What is the RL environment to make an AI that is as good at politics as Lyndon Johnson, or as good at building a space-launch business as Elon Musk?*

## [06:10] Will RLVR alone generalize?

The labs are betting RLVR generalizes — that enough containerized environments yield an agent that plans, adapts, and picks up new skills inside a single session, good enough to out-advise LBJ on a 1948 Senate race or build SpaceX with a hundred million dollars. Whether it generalizes that far is an empirical question, and Dwarkesh reads a Dario Amodei quote as a hint that it doesn't stretch infinitely: short-horizon training may not transfer to long-horizon performance.

Even if in-context experience could turn a model into Henry Ford for a session, it's all wasted if the learning can't return to the weights. Some 30–50% of a lab's compute goes to inference that currently does nothing to improve the model, even though deployment is exactly where the most valuable information is revealed.
> *We've got some genius grad student who's never been allowed to take a real internship, and we keep giving it more and more classroom case studies in the form of RL training on environments.*

## [08:41] Getting the learning back to the weights

Continual learning means updating the weights, not endlessly growing a KV cache — brains don't separate parameters from activations, and they compress what they learn. But moving learning into the weights forfeits in-context learning's sample efficiency, because gradient updates are coarse. That's why every shipped online-learning model (like Cursor's Tab model, learning the same accept/reject objective across 400M+ requests a day) learns one identical thing across all users, which defeats the point when every job and company differs.

Dwarkesh frames sample efficiency and continual learning as the same problem, then argues the bottleneck isn't architecture — new sparse-attention and KV-compaction papers ship weekly — but the loss function. His candidate is on-policy self-distillation (OPSD): train the base model to make the same predictions a context-rich veteran version of itself would make. OPSD needs no outer-loop reward, gives denser per-token supervision than RL, and keeps RL's sparse-update property so on-the-job learning doesn't overwrite what the model already knows.

> *The way you get better at your job is not by recalling the transcript of every single thing that happened every day with perfect fidelity. Rather, it's by consolidating the handful of insights and pieces of knowledge that are actually relevant to you getting better at your job.*

## [15:22] Dreaming

The second, more speculative fix: let the AI build a simulation of reality and rehearse against it, experiencing orders of magnitude more samples per unit of wall-clock time. The precedent is EfficientZero, which beat novice humans at unfamiliar Atari games by playing dozens of simulated games in its head per real step.
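The EfficientZero precedent is easiest to see in miniature. The sketch below is a hedged illustration, not EfficientZero itself (which builds on MuZero-style tree search over a learned latent model): it is a tabular Dyna-Q loop, the classic "learn a model of the world, then rehearse against it" recipe. The corridor world and the names `corridor_step`, `dyna_q`, and `dream_steps` are hypothetical choices for this example. Each real step updates Q-values from the real transition, stores that transition in a world model, and then replays `dream_steps` remembered transitions from the model, which is the small-scale analogue of dreaming.

```python
import random


def corridor_step(state, action, n):
    """Toy deterministic world: action 1 moves right, 0 moves left; reward only at the far end."""
    nxt = max(0, min(n - 1, state + (1 if action == 1 else -1)))
    done = nxt == n - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(n=10, episodes=5, dream_steps=0, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """Tabular Dyna-Q. Returns how many *real* environment steps the agent consumed."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n) for a in (0, 1)}
    model = {}  # learned world model: (state, action) -> (next_state, reward, done)
    seen = []   # (state, action) pairs the model knows about, for sampling dreams
    real_steps = 0

    def act(s):
        # epsilon-greedy, breaking ties randomly so an untrained agent random-walks
        if rng.random() < eps:
            return rng.randrange(2)
        best = max(q[(s, 0)], q[(s, 1)])
        return rng.choice([a for a in (0, 1) if q[(s, a)] == best])

    def backup(s, a, r, s2, done):
        # one-step Q-learning update
        target = r if done else r + gamma * max(q[(s2, 0)], q[(s2, 1)])
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = act(s)
            s2, r, done = corridor_step(s, a, n)
            real_steps += 1
            backup(s, a, r, s2, done)          # learn from the real transition
            if (s, a) not in model:
                seen.append((s, a))
            model[(s, a)] = (s2, r, done)      # remember how the world responded
            for _ in range(dream_steps):       # "dream": rehearse remembered transitions
                ds, da = rng.choice(seen)
                ds2, dr, ddone = model[(ds, da)]
                backup(ds, da, dr, ds2, ddone)
            s = s2
    return real_steps


awake = sum(dyna_q(dream_steps=0, seed=k) for k in range(50))
dreaming = sum(dyna_q(dream_steps=50, seed=k) for k in range(50))
print(f"real steps over 50 seeds: awake={awake}, dreaming={dreaming}")
```

Summed over 50 seeds, the dreaming agent should consume far fewer real environment steps than the purely awake one, because every real transition gets reused many times. The catch is that this corridor is trivially easy to model exactly, which the real world is not.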
Simulating the whole world is far harder than emulating Go, which is why Dwarkesh flags this as speculative — but if it works, it becomes a fourth scaling axis alongside pretraining, RL, and inference-time compute. Instead of hitting `/compact` to summarize a session, you'd hit `/dream` and burn compute rehearsing against a video-game version of what the model is seeing in production.

> *So instead of hitting /compact in Codex or Cursor or Claude... you hit /dream. And this incinerates huge amounts of compute to build and train against a video-game version of what the model is witnessing in the real world.*

## [17:23] What 2027 looks like

Dwarkesh's scenario: RLVR produces an agent competent enough to start getting real-world experience, context windows stretch to a full week of co-working, and at the end of the week a thumbs-up triggers the base model to distill what it learned — via OPSD, dreaming, or some mix. Each round the model expands into domains adjacent to what it was last trained or deployed on.

The endgame flips how AI improves: capability comes mostly from broad deployment across the economy, not from pretraining before release. Every interaction makes the model smarter — learning from your past sessions and from everyone else's — which Dwarkesh calls scary, exciting, and very different from today.

> *Just as pretraining created a base intelligence that was smart enough to become a competent agent with enough RLVR on top, so RLVR has created an agent that is competent enough to actually be broadly deployed in the world.*

## Entities

- **Dwarkesh Patel** (Person): Podcast host and essayist; narrates his own blog post on AI training paradigms.
- **Dario Amodei** (Person): Anthropic CEO, quoted on why model performance degrades at long context.
- **RLVR** (Concept): Reinforcement learning from verifiable rewards — training on reproducible, checkable tasks; the labs' main bet for reaching AGI.
- **Continual learning** (Concept): Updating a model's weights from on-the-job deployment rather than only from pre-release training.
- **Grindability** (Concept): Dwarkesh's term for whether a domain can be farmed via many parallel rollouts on a deterministic, replayable simulator.
- **On-policy self-distillation (OPSD)** (Concept): Distilling a context-rich session's learning back into the base model's weights with dense per-token supervision.
- **Dreaming** (Concept): Speculative fourth scaling axis where a model builds and trains against its own simulation of reality.
- **EfficientZero** (Software): Sample-efficient RL model that beat novice humans at unseen Atari games by simulating many games per real step.
- **Mercury** (Organization): Fintech banking platform; episode sponsor referenced in the bill-pay anecdote.
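The on-policy self-distillation idea from the [08:41] section can be shown with a deliberately tiny toy. Everything here is an assumption for illustration, not Dwarkesh's or any lab's recipe: the "model" is a bigram table of logits over three tokens, the context-rich veteran is the same base logits plus a frozen `context_bias` standing in for what the session taught it, and the loss is the per-token reverse KL divergence KL(student ‖ teacher) evaluated on tokens the student samples itself, which is one common form of on-policy distillation. The helper names (`opsd_step`, `reverse_kl`, `context_bias`) are hypothetical.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(p, q):
    """KL(p || q): how far the student distribution p is from the teacher q."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def opsd_step(student, teacher, start, length, lr, rng):
    """One toy on-policy self-distillation pass over a student-sampled sequence.

    student: token -> list of logits (trainable; sees no session context)
    teacher: token -> list of probabilities (frozen; the same base model *with* context)
    Returns the mean per-token KL along the sampled sequence.
    """
    prev, losses = start, []
    for _ in range(length):
        p = softmax(student[prev])
        q = teacher[prev]
        loss = reverse_kl(p, q)
        losses.append(loss)
        # On-policy: the next token comes from the student's own distribution.
        nxt = rng.choices(range(len(p)), weights=p)[0]
        # Dense per-token signal: exact gradient of KL(p || q) w.r.t. the student logits.
        grads = [pj * (math.log(pj) - math.log(qj) - loss) for pj, qj in zip(p, q)]
        student[prev] = [w - lr * g for w, g in zip(student[prev], grads)]
        prev = nxt
    return sum(losses) / len(losses)


VOCAB = 3
base = {t: [0.0] * VOCAB for t in range(VOCAB)}
# Hypothetical "on-the-job insight" held in context: after token t, prefer token t + 1.
context_bias = {t: [2.0 if n == (t + 1) % VOCAB else 0.0 for n in range(VOCAB)]
                for t in range(VOCAB)}
teacher = {t: softmax([b + c for b, c in zip(base[t], context_bias[t])]) for t in range(VOCAB)}
student = {t: list(base[t]) for t in range(VOCAB)}

rng = random.Random(0)
first = opsd_step(student, teacher, start=0, length=20, lr=0.5, rng=rng)
last = first
for _ in range(200):
    last = opsd_step(student, teacher, start=0, length=20, lr=0.5, rng=rng)
print(f"mean per-token KL: first pass {first:.3f}, after 200 more passes {last:.6f}")
```

Every token the student emits gets a full target distribution from its context-rich self, rather than one scalar reward per episode as in RL; that is the denser per-token supervision the section describes. A real system would distill long session transcripts into a transformer's weights; this toy only shows the shape of the loss and the on-policy sampling.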

#ai-training #reinforcement-learning #rlvr

Machiavelli is the most misunderstood thinker of all time – Ada Palmer

2:08:20


Dwarkesh Patel · about 1 month ago


Historian and novelist Ada Palmer joins Dwarkesh Patel to dismantle the "Machiavellian villain" myth and replace it with the actual Niccolò Machiavelli: a patriot who watched Cesare Borgia conquer half of Italy from up close, was tortured and exiled by the Medici, and then wrote *The Prince* as a secret job application addressed to the very regime that had wronged him. Palmer traces the structural forces — cascading legitimacy collapse among Italian city-states, popes who acted like warring dynastic princes, and a patronage system that made nepotism feel like sound risk management — that made Machiavelli's analysis both urgent and unprecedented. The conversation closes on a sharp irony: the word "Machiavellian" now means self-serving cunning, yet the man himself gave up income, fame, and freedom rather than serve any cause that was not Florence.

## [00:00] How Florence bargained with Cesare Borgia for survival

Italy in 1513 was a cascade of broken legitimacy. Palmer explains that when a long-standing government falls, successor regimes inherit none of its credibility, making rapid further overthrows nearly inevitable — what she calls the thread of continuity being cut. By the time Machiavelli was writing *The Prince*, this dynamic had swept dozens of Italian city-states. Compounding this was papal instability: because popes were elected rather than hereditary, the next pope was almost always a coalition pick of people who hated the current one, guaranteeing policy reversals roughly every ten years.

Machiavelli's day job during this era was standing next to Cesare Borgia — "Valentino" — and whispering endlessly that Florence was loyal, buying what Palmer calls "the boon of Polyphemus": the conqueror's promise to eat you last. His advice to Florence was to betray allies, pay tribute, give military support, and buy time: full conquest was only being delayed, and Florence's real hope was that Alexander VI would die before it came.
His biographers can still feel how much he was under Borgia's spell: when describing Valentino's fall, Machiavelli breaks from third person and writes "he told me" — the historian slips through the veil.

> *"Machiavelli's job dealing with Cesare Borgia… it's very clear that the Borgia plan is to conquer the Papal States in the middle of Italy."*

## [15:08] Machiavelli's analytical innovations

Machiavelli is not the crude "ends justify the means" thinker of caricature. Palmer shows that he is obsessed with the means — specifically, which means of acquiring power are stable and which are not. Whether betrayal works depends on the nature of your power base: Borgia could betray allies because his terror made remaining allies step further into line, while Savonarola's power rested on his followers believing him divinely infallible, so his flip-flopping destroyed him. The lesson is conditional, not universal.

Machiavelli also makes the first recorded European argument that competing political parties can be stable and politically useful, rather than requiring mutual annihilation. Florence's own history was the counterexample: it had literally salted the earth where its Ghibelline opponents' houses once stood. His observation of Siena as a countermodel — parties competing without destroying each other — was genuinely novel.

> *"Machiavelli is the first person that we have ever in the European tradition to suggest that it could be viable for there to be more than one political party in a state at the same time."*

## [23:58] Why popes became warlords

The closer you lived to Rome, the less abstract the papacy felt. Palmer draws the contrast sharply: a Danish subject saw the pope as a figure of vast spiritual majesty; a Florentine saw "that asshole who went to college with your brother."
Italians judged popes as specific men with dirty laundry, family grudges, and factional allegiances — which is why cities that were hereditarily Guelph (pro-papal) sometimes ended up fighting wars against the sitting pope when he happened to be from a Ghibelline family.

The corruption was structural and self-reinforcing. As the Church accumulated donated wealth across generations, the incentive for ambitious families to capture it through bribery and nepotism grew. Palmer reads Machiavelli's personal letters haggling over the correct bribe to buy a priesthood for his brother Totto — written as routine household correspondence — to show how completely normalized the practice was. Every generation saw popes get more secular and military than the last; Machiavelli explicitly predicted the institution would collapse under accumulated corruption unless reformed from within, as St. Francis had temporarily saved it some three centuries earlier.

> *"This makes a stronger and stronger incentive for every ambitious family to send their second son into the Church."*

## [36:13] Why the common people demanded nepotism

When Pope Paul III appointed a competent outsider general instead of his own illegitimate son, there were riots. Palmer explains this was not irrational: in a world where a soldier's oath ran to his commander, not to the state, the only guarantee the papal armies wouldn't turn on Rome was putting the pope's own son in charge — someone who rose and fell with the pontiff. Nepotism was the trust mechanism that made institutions function.

Patronage also determined justice outcomes. Medieval law codes prescribed death for almost everything, but roughly 99 in 100 capital-eligible convictions ended in a fine because the defendant's patron intervened. This was considered correct: the trial was meant to replicate the soul's experience before divine judgment — terrifying, then mercifully pardoned — so patron intervention mirrored the intercession of a saint.
The system had a grimly consistent internal logic, and Palmer traces it from Giordano Bruno (burned because he had angered his patron, not because of his ideas) to Giovanni Pico della Mirandola (spared because Lorenzo de' Medici went through the Orsini network to Rome). Without a patron, even innocence was precarious.

> *"The norm is: you're accused of a severe crime, you're put on trial for your life, your patron intervenes, and you get a lighter sentence. This is how justice is supposed to work."*

## [47:57] Cesare Borgia brought terror to rulers and justice to the people

Borgia's conquests produced a paradox that startled contemporaries: he massacred ruling families and was adored by common people. Palmer's explanation is structural. Factional cities had lived for generations under justice that tracked who was in power, not the facts of the case. A carpenter whose family worked for the dominant faction faced minimal consequences for his son's drunken homicide; the same crime by the carpenter of the out-of-power faction could be a capital offense. When Borgia wiped out both factions and installed outside administrators with no local feuds to take sides in, neutral adjudication felt like a revelation.

Machiavelli was also explicit about why even a beneficent Borgia conquest of Florence would be catastrophic: under any arbitrary ruler, a citizen can be executed by a pointed finger in the street. Machiavelli called that condition slavery, regardless of how fair the tyrant might be in practice. Florence's "LIBERTAS" banner — flown by ordinary citizens defending an oligarchic Senate that excluded them — represented a genuine commitment to the existence of a process, however biased, over the absence of any process at all.
> *"As a result, to everyone's surprise, he moves into a city, he massacres the rulers, he implements an authoritarian regime, and he's incredibly popular and beloved by the people."*

## [57:55] Art as a proxy for war

Renaissance Florence could not afford to fight France militarily; it could afford to paint French royal symbols on its government buildings and commission beautiful gifts for the French king. Palmer frames this not as surplus expenditure but as substitution: the art budgets were military budgets redirected into a form of warfare Florence could win. Much as the Fulbright Program can return more per dollar than the defense budget, Florentine cultural patronage was strategic deterrence.

The period's orientation toward the past further supercharged the value of art. Where modernity assumes humanity advances into the future, Renaissance Europe pointed the other direction: the ideal was recapturing Rome. High-tech achievement meant successfully imitating a lost Roman technique. When a French diplomat arrived in Florence and saw the cathedral or the neoclassical buildings, he was not seeing quaint historical imitation — he was seeing something that approached what only Rome had achieved, and that France could not match. That perception was itself a form of power.

> *"If we fought him, we would lose. But if we play the culture victory game, that's cheaper, and we can try to win."*

## [01:06:41] Florence, a city famous in hell

Dwarkesh raises the obvious puzzle: if everyone in Renaissance Italy was a Christian who genuinely believed in hell, why did they constantly commit the sins Machiavelli describes? Palmer's answer has two parts. First, the Dante answer: Dante fills the *Inferno* with Florentines precisely because he wants his contemporaries to feel the discomfort of consequences they were ignoring.
His Paolo and Francesca passage — damning a love story everyone celebrated — was designed to be a shock to readers who thought romantic adultery was exempt from theological reckoning.

Second, pre-Reformation Christianity assumed everyone sinned constantly and focused on repentance cycles rather than purity maintenance. St. Julian the Hospitaller, patron saint of murderers, was omnipresent in Florentine iconography — his legend held that he killed his own parents, spent his life in pilgrimage to repent, and was saved. Dozens of icons of him meant dozens of Florentines who had killed someone and were working through it. The Calvinist and Puritan emphasis on spotlessness came later and was a genuine departure from how the medieval and early Renaissance church operated.

> *"He fills his hell with Florentines."*

## [01:15:57] The Prince was a job application to Machiavelli's torturers

After the Medici retook Florence in 1512 and, in early 1513, tortured and exiled Machiavelli on mistaken suspicion of conspiracy, everyone expected him to defect. He had contacts at every major court in Europe and the skills — military history, diplomatic networks, classical scholarship — that kings paid for.

He chose instead to sit in a hamlet outside Florence writing *The Prince* as a secret appeal to the Medici to take him back. No other courts received it; he kept it proprietary, treating his political science the way Palmer says a nuclear scientist would treat classified weapons knowledge.

His other works — the *Discourses*, the history of Florence, the comedy *Mandragola* — circulated publicly to build his reputation. *The Prince* did not. Palmer compares it to the classified 100-page reports that some historian friends produce for Department of Defense committees: bespoke proprietary knowledge for an audience of five, whose existence may be whispered about but whose contents are guarded.
This secrecy also explains why the book was published only in 1532, five years after Machiavelli's death and without his input: surviving relatives wanted family fame, and the Medici wanted credit for a text dedicated to their house. Neither understood what its author had intended to keep contained.

> *"I'm going to stay, and I'm going to rot, and I'm going to write The Prince, which is my job application begging the new regime to bring me back and let me work for them and demonstrating my loyalty, and I'm going to send it to them and only them, them and my immediate friends."*

## [01:41:39] During the Renaissance, original ideas had to be couched in antiquity

The Renaissance's obsession with recovering ancient Rome created a peculiar incentive structure: original ideas were unfashionable; ideas presented as recovered ancient wisdom were prestigious. Palmer shows this goes far beyond homage. Giordano Bruno attributed to Aristotle claims that Aristotle explicitly contradicted. Annius of Viterbo forged ancient texts and staged fake archaeological digs to give his original historical theories the authority of antiquity. Marsilio Ficino, translating Plato, genuinely convinced himself that the wildly original cosmological and magical system he had assembled was secretly coded in the Platonic texts.

This explains why Machiavelli's other major work is called *Discourses on Livy* rather than, say, *A New Theory of Republican Governance*. A discourse on an ancient was a prestige format; an original political treatise was a niche curiosity. The 19th century misread the Renaissance as intellectually barren — "200 years of people being wrong about Plato" — because it expected original standalone treatises and found commentary after commentary. Palmer argues the original ideas are there, using the ancients as what she calls the trellis up which the rose climbs.

> *"Nobody wants original ideas. Original ideas are out of vogue. Original ideas are dead. All ideas need to be from the ancients."*

## [01:50:44] Why copyright began with the Inquisition

Machiavelli was one of the first authors to experience unauthorized printing. A local press printed one of his works without asking, riddled it with compositor typos, and his only recourse was to write letters to important people clarifying that the errors were not his. There was no legal framework at all.

The solution emerged from an unexpected direction: post-1515, the Inquisition required pre-publication approval for all texts to screen for heresy. In exchange for going through this process, the approved printer received a monopoly license — the Inquisition's record of permission served as proof that no one else could legally print the same book. The first copyright was a censorship certificate. England, observing this, copied the mechanism while eventually stripping out (or softening) the censorship half, producing the ancestor of modern copyright law.

The institutional logic held together: the Inquisition needed to please local rulers to get resources, so approving books dedicated to the duke and granting his favored printer exclusivity was a political investment. Everyone — inquisitors, printers, authors, and ruling families — had reasons to make the system work.

> *"So the very first version of copyright is the Inquisition."*

## [02:02:12] Machiavelli wasn't Machiavellian

The word "Machiavellian" came to mean scheming self-advancement; in Shakespeare's *Henry VI, Part 3*, the future Richard III boasts that he can "set the murderous Machiavel to school." Palmer traces how the idea of Machiavelli separated from the actual man and became a useful thought-experiment figure: the cynical, probably atheistic politician who wants nothing but personal power. The same splitting happened to Hobbes (the Beast of Malmesbury) and Spinoza, whose actual writing is warm and theistic but whose excommunication from the Jewish community made people assume he must be the most radical heretic imaginable.
The real Machiavelli — who refused lucrative court positions across Europe, who kept his most important work secret to protect Florence from foreign exploitation, who chose to rot in an isolated hamlet over serving any cause that wasn't his country — is almost the opposite of "Machiavellian." His book is not about gaining power but about keeping power stable enough to protect people. Palmer's closing point: the gap between Old Nick and Niccolò Machiavelli is itself a revealing fact about how societies use ideas, splitting thinkers into a character useful for one purpose and the actual work useful for another. Read *The Prince* knowing it was written by someone who would give up anything to serve Florence, and a very different text comes through.

> *"This is why it's so weirdly ironic to me that the reputation—the word "Machiavellian"—means "self-serving", when Machiavelli himself is one of the most selfless men I've ever read about in the history of the Earth."*

## Entities

- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast; interviews scholars on history, science, and technology.
- **Ada Palmer** (Person): Historian and science fiction novelist at the University of Chicago; specialist in Renaissance intellectual history and the history of censorship.
- **Niccolò Machiavelli** (Person): Florentine diplomat (1469–1527), author of *The Prince* and *Discourses on Livy*; wrote *The Prince* as a secret appeal to the Medici regime that had tortured and exiled him.
- **Cesare Borgia** (Person): Renaissance military commander known as "Valentino"; son of Pope Alexander VI, conquered central Italy and was Machiavelli's primary case study in effective (if brutal) statecraft.
- **The Prince** (Concept): Machiavelli's treatise on political power, written ~1513, kept proprietary during his lifetime and published posthumously in 1532; misread as a self-advancement manual rather than a guide to maintaining stable government.
- **Discourses on Livy** (Concept): Machiavelli's longer republican political theory, structured as commentary on the Roman historian Livy; his public bid for intellectual prestige in a culture that prized commentary on ancients over originality.
- **The Medici** (Organization): Ruling family of Florence, whose patronage networks and papal connections shaped both the political instability Machiavelli analyzed and the conditions under which he wrote and was exiled.
- **Florence** (Organization): Italian city-state and center of Renaissance banking, art, and humanist scholarship; Machiavelli's country, for which he subordinated his entire career.
- **Patronage System** (Concept): The multi-generational network of family obligations that served as the functional glue of Renaissance society, determining access to justice, employment, publication, and protection from the Inquisition.

#machiavelli #renaissance #political-philosophy

Sarah Paine - Why Putin and Xi can't escape geography

1:02:07


Dwarkesh Patel · about 2 months ago


Naval War College historian Sarah Paine delivers a standalone lecture tracing two thousand years of geopolitical logic: continental empires (China, Russia) pursue security by expanding borders and crushing neighbors, while maritime powers (Athens, Britain, the US) pursue prosperity by trading across open seas. She argues this structural divide—rooted in the brute fact of geography—explains Putin's war on Ukraine, Xi's ambitions over Taiwan, and why the post-WWII rules-based order is the only arrangement that produces compounded growth rather than compounded ruin.

## [00:00] Setting the stage

Paine opens by framing the lecture's core question: why do some great powers keep grabbing territory while others keep opening trade routes? The answer comes down to one physical fact—whether it is feasible to defend yourself at sea. Maritime powers can; continental powers cannot. That single asymmetry generates two entirely different military traditions, two economic models, and two competing visions of world order.

She walks through American history as a warm-up: the US began life as a continental power (manifest destiny, the Mexican-American War, Alaska purchased when Russia needed cash), then pivoted toward a maritime identity after Alfred Thayer Mahan convinced strategists that naval trade, not westward land, was the real source of national power. Mahan is the first of three geopoliticians whose maps anchor the lecture; the other two are Halford Mackinder (the Eurasian heartland as the world's natural fortress, impervious to sea power) and Nicholas Spykman (control the rimlands, and you influence the heartland). Their shared lesson: US security runs through sea lanes and alliances, not borders.

> *"Maritime powers are the exception and continental powers are the rule. Why? Because maritime powers, if need be, can defend themselves primarily at sea with their navies. Whereas a continental power simply cannot—think Ukraine, a navy is not going to save them from Russia."*

## [12:10] The continental powers

Paine works through the logic of the continental world starting with China—the original case—then Russia. Sun Tzu's *Art of War* contains no references to maritime warfare: it was written for a world where neighbors invade overland at any time and the only viable response is a mass army. Geography tells the rest: too much of China's land is vertical to feed its people, which makes controlling the arable lowlands an existential imperative. The Han expansion from the Yellow River Valley followed that logic for millennia, wiping out the Zunghars, subjugating Tibet, and producing the ethnic patchwork Beijing still manages with military administrative overlays.

Russia's pattern is the same dynamic in reverse—a Moscow core expanding outward in concentric rings until it hit countries that fought back. The continental security playbook that emerges is ruthlessly coherent: no two-front wars, no great-power neighbors, take on threats sequentially, destabilize the rising ones, absorb the failing ones, maintain buffer zones in between.

Paine closes the section with the WWII body count that makes the paradigm's cost visible: Russia lost over 25 million dead (soldiers plus civilians); the United States lost 295,000. The ocean moat is not an abstraction—it is the difference between hundreds of thousands and tens of millions.

> *"In this world, you're faced with a binary choice: you either become Han or they will kill you. And genocide is what happens to the losers in continental warfare."*

## [29:12] The maritime alternative

Where continental empires carve the world into exclusive spheres, maritime powers treat the sea as a commons to be shared.
Paine traces the lineage from Athens through Rome ("Mediterranean" means the sea in the middle of the lands; "Zhongguo" means the kingdom among the kingdoms—one term centers the sea, the other the land), the Dutch Republic, and finally Britain. Hugo Grotius, a Dutchman watching his nation's trade pirated, wrote *Mare Liberum* to establish that the sea belongs to no one and therefore belongs to everyone—the founding document of international maritime law.

Britain refined the operating strategy over the Napoleonic Wars into six rules for "elephant hunting": keep the home economy growing, blockade enemy trade, fund the allied continental power facing the main front, find a peripheral theater where sea access beats land access, never attack the enemy's main force directly, and—only after the elephant has been bled—pile on with allies.

The key structural point: a navy that prevents invasion produces wealth invisibly. Britain compounded wealth for a century after Waterloo while its continental neighbors burned money funding standing armies and fighting each other. That invisible compounding, over generations, is the difference between North and South Korea.

> *"Trade is going to finance the navy. It's going to protect both British homeland and some of the trade. And then Britain is going to be compounding wealth while its neighbors are busy—constantly fighting with each other and destroying wealth in the process."*

## [42:00] How the Industrial Revolution changed everything

The Industrial Revolution flipped the source of power from land to commerce. When land determines wealth, conquest makes sense. Once wealth comes from industry and trade, territorial expansion is literally negative-sum: you destroy the asset while fighting for it. The Suez Canal is Paine's sharpest example—Egypt sank block ships in 1967 to deny Israel access, but the strategic result was that global shipping shifted to supertankers that go the long way around Africa at one-third the cost per ton.
Closing a chokepoint ended up making the maritime world more efficient.

Malcolm McLean's shipping container cut cargo loading costs from nearly $6 per ton to under 20 cents. The ISO then standardized container dimensions across trucks, railways, and ships. The result was plummeting transport costs and the trade explosion that lifted hundreds of millions out of poverty.

Xi's Belt and Road Initiative, Paine notes dryly, is the exact opposite of maritime flexibility:
- it crosses some of the world's most unstable territory;
- it requires constant transshipment between incompatible rail gauges;
- it can never be rerouted.

China's own geographic trap is inescapable. Its shallow, island-cluttered seas become kill zones in wartime, so its merchant fleet reaches global markets only in peacetime.

> *"Once wealth is a function of commerce, industry, and trade, it isn't land anymore. And this upends the world. If you think about the world today, who's rich, who's poor: it's often the degree to which the country is industrialized."*

## [52:00] Why Putin wants to break the world

The post-WWII institutional framework (UN, IMF, NATO, WTO, EU) was built by people who survived the trenches of WWI and the Great Depression, then spent WWII watching their own children die. Their conclusion was to hash out differences with diplomats and lawyers, because sending soldiers destroys more value than any conceivable prize is worth. That system held the peace in the industrialized world for 75 years, until Putin decided to break it.

Putin's challenge is not irrational by continental logic. A rising Ukraine integrated into NATO is precisely the kind of strong, stable neighbor that the old paradigm treats as an existential threat. His goal is to hollow out the alliance system and shatter international law so the world reverts to warring spheres of influence. In that world, continental powers can once again play their traditional game, unconstrained by maritime rules they were never built to play by.
Paine's answer is that sanctions are "economic chemotherapy". They suppress growth by one or two percent per year, and compounded over generations, that gap is the difference between North and South Korea. The objective is never to eliminate the rogue state but to contain it at acceptable cost. The only exit that avoids nuclear escalation is the one the post-war generation built: diplomats, lawyers, and institutions.

> *"The only win-win solution is to deploy the diplomats and lawyers to hash out these things in international forums, because if we're all going to send soldiers, we're going to get a third world war with nuclear follow-on effects, and we'll see whether humanity makes it."*

## Entities

- **Sarah Paine** (Person): Military historian at the U.S. Naval War College; sole speaker in this lecture; author of a 2025 lecture series on continental vs. maritime powers.
- **Alfred Thayer Mahan** (Person): 19th-century U.S. naval strategist; argued that maritime trade and sea power, not land conquest, determine national greatness; associated with the Naval War College.
- **Halford Mackinder** (Person): British geographer whose 1904 "pivot area" thesis held that the Eurasian heartland, insulated from sea power, is the world's natural fortress.
- **Nicholas Spykman** (Person): Dutch-American strategist who argued that control of Eurasia's rimland determines global power; died in 1943 while warning the US about Eurasian dominance.
- **Hugo Grotius** (Person): Dutch jurist and founder of international maritime law; his *Mare Liberum* (1609) established freedom of the seas as a universal right.
- **Malcolm McLean** (Person): American trucking entrepreneur who invented the standardized shipping container, collapsing cargo loading costs and enabling the post-war trade explosion.
- **Continental power** (Concept): A state that cannot defend itself primarily at sea; prioritizes territorial expansion, mass armies, buffer zones, and exclusive spheres of influence; exemplified by Russia and China.
- **Maritime power** (Concept): A state that can defend itself primarily at sea; prioritizes trade, open sea commons, alliance-building, and compounding wealth; exemplified by Britain and the United States.
- **Rules-based international order** (Concept): The post-WWII institutional system (UN, IMF, NATO, WTO, EU) that enforces sovereignty and free trade; the system Putin and Xi seek to dismantle.
- **U.S. Naval War College** (Organization): Graduate school of the US Navy in Newport, Rhode Island; Paine spent 24 years there; home of Mahanian sea-power theory.
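Paine's "economic chemotherapy" claim in the [52:00] section is arithmetic at heart. A growth penalty of one or two points a year looks small in any single year, but it compounds. The sketch below makes that concrete under stated assumptions:
- both economies start at the same size;
- they differ only in a constant annual growth rate;
- the 3% baseline is an illustrative assumption, not a figure from the lecture.

```python
def size_ratio(base_growth: float, penalty: float, years: int) -> float:
    """How many times larger the unsanctioned economy is after `years`.

    Both economies start the same size; the sanctioned one grows
    `penalty` (a fraction, e.g. 0.02 for two points) slower every year.
    """
    unsanctioned = (1 + base_growth) ** years
    sanctioned = (1 + base_growth - penalty) ** years
    return unsanctioned / sanctioned


def years_until_gap(base_growth: float, penalty: float, target_ratio: float) -> int:
    """Smallest whole number of years before size_ratio reaches target_ratio."""
    if penalty <= 0:
        raise ValueError("penalty must be positive or the gap never opens")
    years = 0
    while size_ratio(base_growth, penalty, years) < target_ratio:
        years += 1
    return years


if __name__ == "__main__":
    baseline = 0.03  # illustrative assumption, not a figure from the lecture
    for penalty in (0.01, 0.02):
        doubling = years_until_gap(baseline, penalty, 2.0)
        print(f"{penalty:.0%} slower growth: 2x gap after {doubling} years, "
              f"{size_ratio(baseline, penalty, 75):.2f}x after 75 years")
```

With a 3% baseline, a one-point penalty takes 72 years to open a twofold gap. A two-point penalty takes 36 years. This is consistent with Paine framing the effect over generations rather than election cycles.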

#geopolitics #grand-strategy #maritime-power

As AI advances, its share of the economy could actually shrink – Alex Imas & Phil Trammell

1:16:08


Dwarkesh Patel, about 2 months ago


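Economists Alex Imas (Google DeepMind / University of Chicago) and Phil Trammell (Epoch / Stanford) argue that the most paradoxical consequence of full automation is not that capital captures everything. AI saturates demand for fully automated goods, while humans remain scarce in relational and experiential markets. As a result, AI could actually shrink its own economic footprint.

The conversation covers:
- what stays scarce after AGI;
- the politics of redistribution;
- why O-ring complementarities are slowing automation today;
- why AI agents with accumulation-oriented preferences could end up owning most future wealth;
- what strategy developing countries shut out of the AI supply chain should pursue.

## [00:00] Will the capital share rise?

Dwarkesh opens with the central puzzle: if AI can do everything humans do, where does labor's share of income go?

Alex Imas starts by pointing out that economists trying to predict past industrial transitions were often wrong. David Ricardo predicted that the Industrial Revolution would cause mass unemployment. He had the direction right about which jobs would disappear but got the aggregate outcome completely wrong. As of 2026, the prime-age employment rate is higher than at almost any point since 2000. The lesson is that economists studying structural transitions consistently underestimate the new goods and jobs that emerge when existing costs collapse.

Imas introduces what he calls the "relational sector": goods and services where human presence is itself part of the value. Humans are inherently finite. So once everything else is automated, the relative scarcity and price of products involving humans actually rise.

Phil Trammell sharpens this with supply-chain accounting. Look at the network-adjusted factor share of any good, tracing labor and capital inputs all the way down to raw materials. The labor share turns out to be surprisingly robust already. When AI saturates non-relational goods at near-zero marginal cost, consumers quickly exhaust their demand for those goods and redirect spending toward what is still scarce. Free software doesn't make a ballet performance any cheaper.

> *"Because humans are inherently scarce, even if automation happens where many other things are no longer scarce, we will still have scarcity in the things where humans are involved and in the loop."*
> – Alex Imas

Trammell extends the logic to the capital share itself. Suppose the supply chain for non-human goods is fully automated and demand for those goods is quickly satisfied. Then their marginal utility converges to zero, and the capital share of value could shrink rather than expand. That is the paradoxical core of the episode.

## [19:36] The messy-middle scenario

Dwarkesh raises Molly Kinder's "messy middle" thesis: a world where AI doesn't cause catastrophe but creates prolonged distributional pressure. Firms capture the productivity gains, workers face wage stagnation, and government redistribution can't keep pace with displacement.

The historical analogy is telephone operators. The occupation could have been fully automated with technology that already existed in the 1960s. Because of institutional inertia, actual automation took 20 years. Workers weren't fired overnight; they were gradually reabsorbed, mostly into lower-paid work and underemployment.

Imas thinks a messy middle is possible in the short run but won't persist. The productivity gains from AI are large enough that the pie grows big enough to share. The political-economy problem is not scarcity of resources but speed and coordination:
- governments can't tell which workers were displaced by AI and which by other causes;
- political constraints add friction;
- even if the math eventually works out, the lag between displacement and redistribution can be long enough to do serious damage.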
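> *"Telephone operators were fully automated, but it took 20 years even though the technology already existed. So there was this gradual flow. It wasn't that a huge sector suddenly disappeared."*
> – Alex Imas

## [25:57] How to tax and redistribute AI wealth

Imas sorts redistribution tools along two axes: how complex they are to implement and how quickly they take effect.
- A negative income tax puts a floor in place as soon as it is enacted.
- Universal basic capital gives every citizen equity in AI-producing firms, but it takes years to pay out.
- UBI sits somewhere in between.

The trade-off is not only about speed but also about political durability. Programs that make citizens dependent on direct government payments are vulnerable to whoever wins the next election. Broad capital ownership, with assets dispersed across many holders, is hard to confiscate.

Trammell separates the financing question from the distribution mechanism: how you raise the money is analytically separate from how you give it back. A Georgist land value tax comes up often, but it falls short of the revenue AI-era redistribution would require, because the wealth AI creates is concentrated in software and compute rather than land. Phil suggests that using tax revenue to distribute stakes in AI companies broadly could be both politically stable and economically efficient.

> *"Right now we have labor that we can convert into income. Once that no longer applies, we become dependent on elected officials for our basic needs."*
> – Alex Imas

## [30:02] A demand collapse is unlikely

Dwarkesh presses on the white-collar apocalypse narrative: is there data showing that mass AI-driven unemployment is already happening? Imas cites Yale Budget Lab data showing only weak signals at most. Hiring of junior software engineers is slightly below trend, while demand for senior engineers is unchanged or even rising. There has been no sharp level shift in unemployment across the white-collar sector.

One explanation is O-ring complementarity. Another is behavioral: firms are engaging in performative AI adoption to signal modernity, by laying people off or maximizing token usage, sometimes at a real cost to actual productivity.

The broader demand question is whether software follows the same elasticity rules as physical goods. You stop eating once you've had enough, but will people ever stop wanting more software? Imas and Dwarkesh think software demand is elastic enough to keep up with falling prices. Throughout the history of computing, cheaper compute has consistently created more demand rather than collapsing it. The main risk is specific goods that saturate quickly, not aggregate labor demand.

> *"There might be a little signal that junior developers are getting hired less than before. But that's 'less than before,' not a level shift. If anything, demand for senior software engineers is increasing."*
> – Alex Imas

## [39:26] Integrating human workers into a machine economy is hard

The O-ring model is named after the Challenger space shuttle disaster. It describes production in which a single defective component invalidates the entire output. It explains both why AI automation is slower than expected today and why future automation could structurally exclude humans.

Today, even if 90% of legal or accounting work can be automated, clients still want a human to sign off at the end, because a single point of failure can invalidate the whole result. That reliability constraint keeps humans employed even when AI capability is high. Phil Trammell then runs the same logic forward in time.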
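Suppose AI becomes good enough that production flows are reorganized entirely around machine labor, with agents communicating at machine speed in their own native representations. Then the transaction cost of slotting a human into the loop becomes the bottleneck. Even where humans hold a comparative advantage in some narrow task, coordination overhead and mismatched reliability make it cheaper to route around them. The O-ring cuts both ways.

> *"Beyond the argument that humans are more expensive or less smart, there will be entire production flows organized around AI labor that talks in neuralese and thinks thousands of times faster."*
> – Dwarkesh Patel

## [43:08] What if some humans (or AIs) make accumulating wealth an end in itself?

The longest chapter covers the most speculative territory. Dwarkesh points out that evolution selected humans for particular preferences, such as resource accumulation, status, and reproduction, and those preferences shape today's $100 trillion world economy.

AI agents will be shaped by similar selection pressures. Agents trained or deployed in ways that favor accumulation will outcompete and outlast those that aren't. No catastrophic alignment failure is needed; this is the ordinary logic of differential reproduction applied to a new substrate.

Phil Trammell works through the steady-state math. Consider agents, human or AI, with a high elasticity of substitution between present and future consumption: they are never satisfied with consuming and keep wanting more capital. Even if they are a small minority of the population, in the long run they will own most of the wealth and decide what the economy produces. The capital share approaches 1.0 not because AIs are collectively greedy, but because preference heterogeneity plus compounding funnels assets to the most patient accumulators.

> *"In the long run they'll have most of the wealth, and the overall capital share will basically be the capital share in that person's spending, which will be close to one."*
> – Phil Trammell

The conversation then turns to discount rates and interest rates. If AI-driven growth is very fast, future consumption becomes abundant relative to today's. In theory that weakens the incentive to save and pushes interest rates up. But hyperbolic discounters and accumulation-oriented agents may not respond to price signals in the standard way. Both guests acknowledge that this is where economic models stop giving clean answers.

## [61:28] What should developing countries do?

Imas notes that middle-income and developing countries are almost entirely missing from mainstream AI economics. He says part of the blame for that gap lies with researchers in fields like his own.

Two scenarios bound the problem:
- **Optimistic:** open-weight models spread quickly and lift capability in Nigeria or India at almost no cost, much as mobile banking leapfrogged missing financial infrastructure.
- **Pessimistic:** AI automates goods production in rich countries and removes the manufacturing-export ladder that East Asian economies used to industrialize.

The key variable is how concentrated the benefits are. Alex reaches for the electricity analogy: natural monopolies produced electricity, but the downstream gains spread broadly to users instead of concentrating with the utilities. If AI follows that pattern, with commoditized access and competitive downstream markets, developing countries can be net beneficiaries. If it follows the social-media pattern, where a few platforms capture most of the value, concentration deepens inequality. Phil argues that developing-country governments should consider sovereign wealth funds that invest early in the AI supply chain, as a hedge against a collapse in goods exports.

> *"There's a scenario where AI technology diffuses to Nigeria and the developing world, levels the playing field, and essentially lets them leapfrog in capability. And there's a scenario where they don't train models, don't have the hardware, and fall completely behind."*
> – Alex Imas

## Entities

- **Alex Imas** (Person): Director of AGI economics at Google DeepMind and professor of economics at the University of Chicago; studies behavioral economics and the macroeconomic effects of AI.
- **Phil Trammell** (Person): Head of economics at Epoch and researcher at Stanford; has researched the economics of transformative AI and long-term philanthropy at the Global Priorities Institute.
- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast; conducts long-form interviews at the intersection of science, technology, economics, and policy.
- **Relational sector** (Concept): Goods and services where human presence is core to the value proposition, such as therapy, craftsmanship, and live performance; predicted to grow as a share of the economy as AI saturates substitutable output.
- **O-ring theory** (Concept): A production model in which a single unreliable component invalidates the entire output; explains both the current limits of AI automation and why future machine-centric production flows could structurally exclude human labor.
- **Capital share** (Concept): The fraction of national income going to owners of capital; the episode's central metric, framed around the paradoxical thesis that full automation may shrink it rather than expand it.
- **Universal basic capital** (Concept): A redistribution policy that gives citizens stakes in productive assets (including AI companies) rather than cash; argued to be more politically durable than UBI.
- **Epoch** (Organization): A research organization focused on AI timelines and macroeconomic forecasting, where Phil Trammell is head of economics.
- **Yale Budget Lab** (Organization): A research center that publishes empirical data on AI's labor-market effects; as of mid-2026 it reported no level shift in white-collar unemployment.
- **Land value tax / Georgist tax** (Concept): A tax on the unimproved value of land; judged insufficient to fund AI-era redistribution because AI wealth is concentrated in software and compute, not land.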
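Trammell's argument in the [43:08] section is that a small group of patient accumulators ends up owning most of the wealth. A toy two-group model illustrates it. This is a minimal sketch, assuming:
- both groups earn the same return on capital;
- each group consumes a fixed fraction of its wealth every period;
- the return, consumption rates, and starting share below are illustrative, not numbers from the episode.

```python
def accumulator_share(
    initial_share: float,
    r: float,
    consume_patient: float,
    consume_others: float,
    periods: int,
) -> float:
    """Wealth share of the patient group after `periods` periods.

    Each period both groups earn the same return r on their wealth, then
    consume a fixed fraction of it. Only the consumption rate differs.
    """
    patient = initial_share
    others = 1.0 - initial_share
    for _ in range(periods):
        patient *= (1 + r) * (1 - consume_patient)
        others *= (1 + r) * (1 - consume_others)
    return patient / (patient + others)


if __name__ == "__main__":
    # Illustrative assumptions: the patient group starts with 1% of wealth,
    # returns are 5% per period, and it consumes 1% of its wealth per period
    # versus 5% for everyone else.
    for periods in (0, 50, 100, 200):
        share = accumulator_share(0.01, 0.05, 0.01, 0.05, periods)
        print(f"after {periods:>3} periods: patient share = {share:.3f}")
```

With these numbers the patient group's wealth grows about 4% a period faster than everyone else's. It passes half of total wealth after roughly 110 periods and holds over 97% after 200. The model deliberately leaves out labor income, taxes, and differing returns.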

#agi-economics #labor-share #automation

Chip design from the bottom up – Reiner Pope


Reiner Pope, CEO of MatX and former Google Brain TPU architect, gives Dwarkesh Patel a blackboard-style lecture on chip design from first principles. Starting with AND and NOT gates, Reiner works up through register files, systolic arrays, clock synchronization, FPGAs, and cache hierarchies. He ends with the structural difference between a GPU and a TPU. The throughline is a single engineering tension: every compute unit is wasted if the chip spends its time moving data rather than multiplying numbers.

## [00:00] Building a multiply-accumulate from logic gates

Reiner starts at the bottom: AND, OR, and NOT gates built on silicon and wired together with metal traces. The key operation AI chips want to run is matrix multiplication. Inside that, the primitive is a multiply-accumulate: multiply two numbers, then add the result into an accumulator.

Reiner walks through how a full adder is assembled from a handful of XOR, AND, and OR gates. Full adders then cascade into a multiplier and ultimately a floating-point MAC. The precision hierarchy matters here. Summing many low-precision products needs a higher-precision accumulator, which is why AI chips pair, for example, 8-bit multiplies with 32-bit accumulation.

> *"The main function that AI chips want to compute is the multiplication of matrices. Inside that, the fundamental primitive is a multiply-accumulate of pairs of numbers."*

## [16:20] Muxes and the cost of data movement

Before Tensor Cores, GPUs and CPUs used the same structure. A register file holds a few dozen values and feeds an ALU, whose results are written back to the register file.

Reiner shows that a mux, a circuit that selects one of several inputs, is the hardware that lets you address arbitrary registers. The cost of this generality is measured in area and energy. Every read from an eight-entry register file requires a mux tree of depth three, and every write requires a decoder of the same size.
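The sketch below makes the gate-level building blocks from the two sections above concrete by modeling gates as Python functions on bits. It is a minimal illustration under these assumptions:
- an unsigned ripple-carry adder built from full adders;
- a shift-and-add multiplier;
- a register-file read port built as a binary tree of 2:1 muxes.

Real hardware uses many other adder, multiplier, and mux designs, and the function names are hypothetical.

```python
from typing import List, Tuple

Bit = int  # every wire carries 0 or 1


def AND(a: Bit, b: Bit) -> Bit:
    return a & b


def OR(a: Bit, b: Bit) -> Bit:
    return a | b


def NOT(a: Bit) -> Bit:
    return 1 - a


def XOR(a: Bit, b: Bit) -> Bit:
    # XOR built only from AND, OR, and NOT.
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


def full_adder(a: Bit, b: Bit, carry_in: Bit) -> Tuple[Bit, Bit]:
    """One-bit full adder: returns (sum_bit, carry_out)."""
    partial = XOR(a, b)
    sum_bit = XOR(partial, carry_in)
    carry_out = OR(AND(a, b), AND(partial, carry_in))
    return sum_bit, carry_out


def ripple_add(x: int, y: int, width: int) -> int:
    """Unsigned add by chaining `width` full adders; overflow wraps around."""
    carry, result = 0, 0
    for i in range(width):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result


def multiply(x: int, y: int, width: int = 8) -> int:
    """Shift-and-add multiply: one gated partial product per bit of y."""
    product = 0
    for i in range(width):
        partial_product = (x << i) * ((y >> i) & 1)  # x shifted, gated by bit i of y
        product = ripple_add(product, partial_product, 2 * width)
    return product


def mac(acc: int, a: int, b: int) -> int:
    """8-bit x 8-bit multiply feeding a 32-bit accumulator."""
    return ripple_add(acc, multiply(a, b, 8), 32)


def mux2(sel: Bit, a: int, b: int, width: int) -> int:
    """Word-wide 2:1 mux: per bit, out = (a AND NOT sel) OR (b AND sel)."""
    out = 0
    for i in range(width):
        bit = OR(AND((a >> i) & 1, NOT(sel)), AND((b >> i) & 1, sel))
        out |= bit << i
    return out


def read_register(registers: List[int], address: int, width: int = 8) -> Tuple[int, int]:
    """Read port as a binary tree of 2:1 muxes; returns (value, tree_depth).

    Assumes a power-of-two number of registers. Each level consumes one
    address bit and halves the candidates, so 8 registers need depth 3.
    """
    level, depth = list(registers), 0
    while len(level) > 1:
        sel = (address >> depth) & 1
        level = [mux2(sel, level[i], level[i + 1], width) for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth
```

Two properties fall out of the sketch:
- `read_register([10, 11, 12, 13, 14, 15, 16, 17], 5)` returns `(15, 3)`. The depth of 3 matches the eight-entry example, and every read pays for three levels of muxes.
- `mac` keeps the accumulator four times wider than each operand, mirroring the 8-bit multiply, 32-bit accumulate pattern.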
The bottleneck for AI workloads isn't the multiply itself but the round trip through that register file.

> *"We want to analyze the cost of the data movement from the register file to the ALU and back."*

## [25:59] How systolic arrays work

The key insight behind TPUs is to stop doing one multiply-accumulate at a time and writing back to registers. Instead, the chip bakes an entire matrix-vector loop into hardware. A systolic array is a grid of MAC units in which each cell passes its partial sum to the right and its input operand downward. Data flows through the grid without ever touching a register file.

Reiner explains the two wins this buys:
- more compute per unit of data fetched;
- operands that stay resident inside the array for the full inner product instead of being reloaded.

The trade-off is inflexibility: the array only runs efficiently on the exact loop shape it was designed for.

> *"The idea of a systolic array is to go two levels of loops up and bake this entire loop out here into hardware."*

## [39:00] Clock cycles and pipeline registers

With 100 billion transistors on a chip, synchronization between parallel units is non-negotiable. Reiner explains the clock: roughly every nanosecond, all circuitry on the chip pauses to synchronize before the next step begins.

Clock frequency is set by the longest combinational path, the deepest chain of logic gates a signal must traverse in one cycle. Pipeline registers chop that path into shorter stages so the clock can run faster. The cost is latency: a fully pipelined 32-stage multiplier produces one result per cycle but takes 32 cycles for any single multiplication.

> *"Every nanosecond or so, all circuitry in the chip will pause for a moment and synchronize. That is the clock cycle."*

## [51:40] FPGAs vs ASICs

An FPGA is a sea of programmable logic blocks: lookup tables and flip-flops whose wiring can be reconfigured after manufacture. An ASIC is a chip taped out for one purpose.
Conceptually they're the same: logic gates evaluated within a fixed clock cycle. The economics diverge at the first copy. The first FPGA costs about $10K, while a first ASIC tape-out costs about $30M.

FPGAs make sense for workloads that change monthly and need deterministic latency at high speed, where energy efficiency and throughput matter less. Jane Street uses them for high-frequency trading precisely because the timing is deterministic: no cache misses, no branch prediction, no interrupts.

> *"The first FPGA costs you $10,000, whereas the first ASIC you make costs $30 million because it requires an entire tape-out."*

## [63:14] Cache vs scratchpad

CPUs are non-deterministic partly because of the L1/L2 caches: small, fast memories that hold data the processor predicts it will need next. A cache miss happens when that prediction is wrong, and it can stall execution for hundreds of cycles.

AI accelerators replace the cache with a scratchpad: explicitly managed SRAM where the compiler decides exactly what lives there and when. Groq and TPUs both advertise deterministic latency because they use scratchpads instead of caches. The scratchpad is simpler and faster but shifts the burden to the compiler.

> *"Probably the most important source of non-determinism on a CPU is the CPU cache itself."*

## [67:16] Why CPU cores are much bigger than GPU cores

A modern CPU has maybe 100 cores. Each one takes up far more die area than the simple execution lanes a GPU packs by the thousands into its SMs.

The reason is that CPU cores carry enormous out-of-order execution machinery: reorder buffers, branch predictors, and speculative execution units. All of it aims to keep a single thread running fast on unpredictable workloads.

A GPU SM strips most of that out. It runs groups of simple threads in lockstep (a warp). When one warp stalls on a memory load, the hardware switches to another ready warp at essentially no cost. The CPU spends silicon on per-thread speed; the GPU spends it on throughput across thousands of parallel threads.
> *"If there are so few cores, what are you spending all of the die on?"*

## [71:49] Brains vs chips

Dwarkesh pushes Reiner on the brain-versus-chip comparison. Two genuine differences stand out:
- The brain has unstructured sparsity (any neuron can connect to any other), while hardware accelerators use structured sparsity (aligned blocks).
- The brain's clock runs at tens of hertz, versus gigahertz on silicon.

Reiner notes that co-location of memory and compute, often cited as a brain advantage, is also present in modern AI chips: the weights sit in HBM right next to the matrix units. The energy constraint is the more interesting gap. The brain runs on about 20 watts and chips on kilowatts, which may reflect fundamental differences in what the brain is optimized to do.

> *"This is exactly the co-location, in some sense, of the memory and compute."*

## [75:22] A GPU is just a bunch of tiny TPUs

At the top level, a TPU has a handful of large systolic arrays plus a vector unit. A GPU has more than a hundred SMs, each containing a small matrix unit and a small vector unit: essentially a miniaturized TPU.

The architectural difference is granularity. A TPU commits to a few large matrix operations; a GPU runs many smaller ones in parallel. Inside each SM, Tensor Cores add a fixed-function matrix unit on top of the original scalar/vector pipeline, which makes modern GPUs a hybrid of the two paradigms. The "GPU is just tiny TPUs" framing collapses what seemed like fundamentally different architectures into a single continuum.
> *"You can think of scaling this thing down into a really tiny unit with a smaller matrix unit and a smaller vector unit, and that is sort of what an SM is."*

## Entities

- **Reiner Pope** (Person): CEO and co-founder of MatX; previously led TPU software and compiler work at Google Brain.
- **Dwarkesh Patel** (Person): host of the Dwarkesh Podcast; angel investor in MatX.
- **MatX** (Organization): AI chip startup building inference accelerators.
- **Google / Google Brain** (Organization): where Reiner worked on TPU architecture before MatX.
- **Jane Street** (Organization): high-frequency trading firm that relies on FPGAs for deterministic latency.
- **Groq** (Organization): AI inference chip company that advertises deterministic latency via a scratchpad architecture.
- **Multiply-Accumulate (MAC)** (Concept): the fundamental operation of neural-network computation: multiply two numbers, then add the product into an accumulator.
- **Systolic Array** (Concept): a grid of MACs that passes data between cells without touching a register file, enabling high compute-to-bandwidth ratios.
- **FPGA** (Technology): Field-Programmable Gate Array; reprogrammable logic fabric used where workloads change frequently.
- **ASIC** (Technology): Application-Specific Integrated Circuit; custom silicon optimized for one workload.
- **TPU** (Technology): Google's Tensor Processing Unit, organized around a few large systolic arrays.
- **SM / Streaming Multiprocessor** (Technology): the GPU core unit, containing scalar, vector, and matrix (Tensor Core) execution resources.
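The systolic dataflow described in the [25:59] section can be simulated cycle by cycle. The sketch assumes a weight-stationary layout for Y = X @ W:
- the processing element (PE) at row n, column k keeps W[k][n] resident;
- rows of X enter from the top, skewed by one cycle per column, and move down;
- partial sums move right and leave the right edge as finished outputs.

This is an illustrative sketch, not a description of any specific TPU generation. Real arrays also differ in which operand stays put.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def systolic_matmul(X: Matrix, W: Matrix) -> Tuple[Matrix, int]:
    """Compute Y = X @ W on a simulated weight-stationary systolic array.

    X is M x K and W is K x N. The PE at (row n, column k) keeps W[k][n]
    resident. Every cycle, each PE multiplies the operand arriving from above
    by its weight, adds the partial sum arriving from its left, then passes the
    operand down and the new partial sum right. Returns (Y, cycles_used).
    """
    M, K, N = len(X), len(W), len(W[0])
    x_reg = [[0.0] * K for _ in range(N)]  # operand latched in each PE
    p_reg = [[0.0] * K for _ in range(N)]  # partial sum latched in each PE
    Y = [[0.0] * N for _ in range(M)]
    last_cycle = (M - 1) + (N - 1) + (K - 1)  # when y[M-1][N-1] leaves the array

    for t in range(last_cycle + 1):
        new_x = [[0.0] * K for _ in range(N)]
        new_p = [[0.0] * K for _ in range(N)]
        for n in range(N):
            for k in range(K):
                if n == 0:
                    # Row m of X enters column k at cycle m + k (skewed feed).
                    m = t - k
                    x_in = X[m][k] if 0 <= m < M else 0.0
                else:
                    x_in = x_reg[n - 1][k]  # operand from the PE above
                p_in = p_reg[n][k - 1] if k > 0 else 0.0  # partial sum from the left
                new_x[n][k] = x_in
                new_p[n][k] = p_in + W[k][n] * x_in
        x_reg, p_reg = new_x, new_p  # the clock edge: all registers update together

        for n in range(N):
            m = t - n - (K - 1)  # y[m][n] exits the right edge of row n now
            if 0 <= m < M:
                Y[m][n] = p_reg[n][K - 1]
    return Y, last_cycle + 1
```

For a 2x3 by 3x2 product, `systolic_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` returns `[[58, 64], [139, 154]]` after 5 cycles. No PE ever indexes a shared register file, and every value moves only to a neighbor. The extra M + N + K - 2 cycles of fill and drain are the same latency-for-throughput trade as pipeline registers.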

#chip-design #hardware #ai-accelerators

Building AlphaGo from scratch – Eric Jang

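During a sabbatical, Eric Jang reimplemented AlphaGo with modern tools, and the result turned into a two-and-a-half-hour technical deep dive. The conversation shows how RL actually works and why the simple policy-gradient approach built into LLM training has fundamental limits that MCTS avoids. It starts from the rules of Go and moves through MCTS, network architecture, self-play training, and off-policy data. It ends with Jang's observations from running an automated AI-research loop on his own project.

## [00:00] Go basics

Go's game tree is far too large for brute-force search, so the game calls for approximation rather than exhaustive conquest. What drew Jang to reimplementing AlphaGo was a question: how can a ten-layer network amortize the cost of a game tree larger than the number of atoms in the universe?

The opening covers Go's basic rules (territory, liberties, capture, ko) and Tromp-Taylor scoring, which resolves ambiguous positions algorithmically without needing human agreement. The choice of scoring rule bears directly on how a computer evaluates positions. A human sees a surrounded group and knows its fate immediately; a computer needs an unambiguous rule for counting contested points at the end of the game.

> *"Watching AlphaGo's early results in 2014, 2015, 2016, I was deeply impressed by how good AI systems could get and what kinds of computationally complex problems deep learning could tackle."*

## [08:06] Monte Carlo tree search

The full game tree has 361 legal moves at the start, games average around 300 moves, and the search space exceeds the number of atoms in the universe. Instead of expanding all of it, AlphaGo uses MCTS to decide selectively which branches to expand.

The core data structure is a per-position node that stores two things:
- a visit count;
- a Q value, the running average win rate of all simulations that passed through that node.

The action-selection formula, PUCT, balances exploitation against exploration. An exploration bonus grows only slowly as the parent accumulates visits, and it steers the search toward less-visited children. As a child's own visit count rises and its Q value stabilizes, that bonus shrinks.

Jang also covers three finer points:
- why this UCB-derived approach bounds regret;
- why, since Go is deterministic, the probabilities in MCTS come from Monte Carlo averaging rather than true randomness;
- how merging transpositions (identical positions reached by different move orders) prunes the search tree.

> *"The key conceptual breakthrough of AlphaGo was using neural networks to make this search problem tractable."*

## [31:53] The role of the neural networks

Two neural networks replace the two expensive operations inside MCTS:
- The value network turns a position into a scalar win probability, so the game no longer has to be rolled out to the end.
- The policy network outputs a probability distribution over legal moves. This focuses the search on promising children and filters out the irrelevant long tail.

Jang tested both ResNets and transformers in his reimplementation. On a personal GPU with little training data, ResNets beat transformers. Transformers' global attention can connect distant board features, but without convolution's built-in locality they need more data to learn local invariances. KataGo's key architectural insight was to pool global features explicitly inside the residual stack. That lets fights on opposite sides of the 19x19 board influence each other without global attention.

> *"In the low-data regime, in my experience ResNets are still better than transformers, and more efficient when the budget is small."*

## [01:00:22] Self-play

Self-play is the core process by which AlphaGo grows from knowing nothing to superhuman strength. At every move, MCTS produces a move distribution sharper than the policy network's original prior. Once the game ends, those distributions become the training targets for the policy head. The policy network is distilled toward the MCTS outputs, so the next generation of games starts from a better prior and gets more improvement out of the same search budget.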
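Jang describes this as test-time scaling that pays compounding dividends. Distilling 1,000 MCTS simulations into the policy network raises the starting point for the next training round. A second round of 1,000 simulations then reaches a win rate that would take more than 2,000 simulations without distillation. Crucially, every move of every game generates a supervised target, not just the winner's moves, so the learning signal has far lower variance than in a naive policy-gradient approach.

> *"The beauty of how AlphaGo trains itself is that you can take the result of this final search process and say to the policy network, 'Instead of MCTS going through all this trouble to get here, why don't you just predict this from the start?'"*

## [01:25:27] Alternative RL approaches

Jang offers a careful thought experiment. Replace the MCTS objective with the naive policy-gradient approach LLMs use: find the winner of each game and reinforce every move in it. Picture a league of 100 evenly matched agents, one of which wins 51 to 49 thanks to a single decisive move. Its training dataset is overwhelmingly diluted by moves that carry no signal, and the one meaningful move is buried among roughly 30,000 irrelevant ones.

This credit-assignment problem is the fundamental reason advantage functions and baselines exist in RL. Subtracting a value baseline turns the raw reward signal into an advantage, meaning how much better each action was than average, and that sharply reduces gradient variance. Q-learning and TD methods approximate the advantage without full rollouts, which matters in domains where MCTS isn't available.

> *"The key point is this: for every action we took, we search fairly thoroughly with MCTS to see whether we could do better, then have the policy network predict that result, improving every action we took."*

## [01:45:36] Why MCTS doesn't work for LLMs

The PUCT search formula assumes a bounded, discrete action space and a value function that generalizes across positions. Go satisfies both; LLM reasoning satisfies neither:
- The token vocabulary is so vast that the same partial sequence is almost never visited twice.
- No position-level value function reliably tells you whether an in-progress chain of thought is on track to solve the problem.

Jang notes that LLMs do show behavior that looks like tree search on the surface (reconsidering, backtracking, hedging). That behavior comes from in context, not from explicit tree construction. In domains like math, where intermediate states have stricter logical structure, he leaves the door open for some form of forward search to come back. The fundamental bottleneck is the lack of a reliable, query-efficient value function at the token level.

> *"With LLMs, you're almost never going to sample the same child node more than once. If you have multiple steps of reasoning, language is such a wide, open space that a discrete action set isn't a good fit for LLMs."*

## [02:00:58] Off-policy learning

Dwarkesh poses a puzzle. Every AI researcher is wary of off-policy learning, yet AlphaGo Zero works well with a large replay buffer full of games generated by older versions of the policy.

Jang resolves it through the DAgger lens. What matters is not whether the data is strictly on-policy. It is whether the buffer's state distribution covers the states the current policy will actually visit, plus a reasonable neighborhood around them. The replay buffer works in AlphaGo because game states from recent checkpoints still lie close to the current policy's distribution.

In robotics, distribution shift is severe, so learning optimal actions for positions the agent will never reach is a real failure mode. The practical fix that emerged from systems like QT-Opt is to use off-policy data for reward shaping while keeping the policy gradient on-policy.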
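> *"What you want from these algorithms is for the states you'll visit to make up most of the data, but also for a reasonable fraction of states to fall inside a high-dimensional tube around the optimal trajectory."*

## [02:11:51] RL is far more information-inefficient than you think

Dwarkesh lays out an inefficiency argument with two dimensions.
- **Samples per FLOP.** This one is well known. Policy-gradient RL needs a full trajectory rollout before any learning signal arrives, so samples per FLOP fall sharply as agents take on longer-horizon tasks.
- **Bits per sample.** Early in training, suppose an LLM with a 100,000-token vocabulary has to discover "blue" by random sampling. It needs around 100,000 rollouts to see a single success. Supervised cross-entropy loss, by contrast, tells the model at every step exactly how far its distribution was from "blue."

MCTS sidesteps both problems. It generates a supervised target at every move, and that target is strictly better than the current policy, rather than a binary win/loss signal diluted across thousands of tokens. Jang observes that the only situation where MCTS gives no signal at all is when the policy has already converged exactly to the MCTS distribution.

> *"There's no situation where MCTS gives you zero signal, except when the MCTS distribution has converged to exactly match the policy network's prediction."*

## [02:22:05] Automating AI research

Jang ran much of his AlphaGo project through an automated LLM coding loop. He gives a ground-level report on what AI research automation does well and where it still falls short.
- **Hyperparameter optimization.** Current models really do grad-student-level work. They diagnose gradient-flow problems, rewrite data-loader augmentations, and deliver measurable perplexity improvements on a fixed budget.
- **Running experiments and plotting.** A simple skill description is enough to produce a complete set of experiments, analysis included.

What the models can't yet do reliably is make a conceptual leap. They don't recognize that a research direction is structurally stuck and jump to a different framing before piling on more dead-end experiments. Jang ran into this repeatedly: the model kept digging in a blocked direction without asking whether the direction itself was right.

His diagnosis is a training-signal problem. He expects that building RL environments with the right outer loop, as Go has, will eventually teach models to escape local optima in research.

> *"The current closed models available to the public today don't seem very good at choosing what the next experiment should be in a given direction. They don't seem able to step back and make the leap of 'wait, this direction doesn't really make sense.'"*

## Entities

- **Eric Jang** (Person): VP of AI at 1X; formerly a senior research scientist at Google Brain/DeepMind Robotics; reimplemented AlphaGo during a sabbatical.
- **Dwarkesh Patel** (Person): host of the Dwarkesh Podcast; developed the bits-per-FLOP RL-inefficiency analysis together with Jang during the interview.
- **AlphaGo / AlphaZero** (Software): DeepMind's Go AI systems combining MCTS with deep neural networks; the technical core of the episode.
- **KataGo** (Software): David Wu's (Jane Street) open-source Go engine, which achieved 40x compute efficiency over AlphaGo Zero; Jang's main reference implementation.
- **Monte Carlo Tree Search (MCTS)** (Concept): an iterative search algorithm that balances exploitation and exploration via UCB/PUCT; the episode's central analytical frame.
- **Credit-assignment problem** (Concept): the difficulty in RL of determining which actions in a long trajectory caused a positive outcome; the reason advantage functions, baselines, and value networks exist.
- **DAgger** (Concept): the Dataset Aggregation algorithm; explains why AlphaGo's replay buffer is acceptable as long as buffered states stay close to the current policy's distribution.
- **Andrej Karpathy** (Person): quoted for describing policy-gradient RL's sparse learning signal as "sucking supervision through a straw."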
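The MCTS mechanics from the [08:06] and [01:00:22] sections fit in a few lines. Each node stores:
- a visit count N;
- a running value sum, so that Q = value_sum / N;
- a prior P from the policy network.

PUCT picks the child that maximizes Q + U, and after search the normalized visit counts become the policy target. This minimal sketch assumes the AlphaZero-style bonus U = c_puct · P · sqrt(N_parent) / (1 + N_child) with an illustrative c_puct = 1.5. KataGo and Jang's reimplementation may use different constants and refinements. Tree expansion, the networks, and value backup are left out.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    prior: float                # P(s, a) from the policy network
    visits: int = 0             # N(s, a): simulations that passed through this node
    value_sum: float = 0.0      # sum of backed-up values from those simulations
    children: Dict[int, "Node"] = field(default_factory=dict)

    @property
    def q(self) -> float:
        """Running mean value Q = value_sum / visits (0 for unvisited nodes)."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(node: Node, c_puct: float = 1.5) -> int:
    """Pick the child move maximizing Q + U, AlphaZero-style PUCT.

    U = c_puct * P * sqrt(parent visits) / (1 + child visits): the bonus favors
    high-prior, rarely visited moves and shrinks as a child accumulates visits.
    """
    sqrt_parent = math.sqrt(sum(child.visits for child in node.children.values()))

    def score(child: Node) -> float:
        return child.q + c_puct * child.prior * sqrt_parent / (1 + child.visits)

    return max(node.children, key=lambda move: score(node.children[move]))


def visit_distribution(node: Node, temperature: float = 1.0) -> Dict[int, float]:
    """Visit counts normalized into the sharpened policy target used for distillation."""
    weights = {m: c.visits ** (1.0 / temperature) for m, c in node.children.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("run at least one simulation before reading the target")
    return {m: w / total for m, w in weights.items()}
```

During self-play, the policy head would be trained with cross-entropy toward `visit_distribution(root)` at every move. That is what gives every move, not just the winner's, a supervised target.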

#alphago #monte-carlo-tree-search #reinforcement-learning

AI isn't replacing mathematicians yet – Terence Tao

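Terence Tao discusses AI's evolving role in mathematics. He argues that AI will automate much routine work but will not fully replace human mathematicians; instead it will push them toward new territory. He emphasizes the future that human-AI collaboration will open up and how unpredictable AI's long-term impact on scientific discovery is.

## [00:10] AI's current role in frontier mathematics

Terence Tao explains that AI is already doing "frontier mathematics" that humans cannot, but it is a different frontier from the one we're used to. He compares this to how calculators once expanded what was possible in mathematics, handling specialized tasks beyond human ability.

> *In some ways they are already doing superintelligent frontier mathematics that humans can't do, but it's a different kind of frontier from the one we're used to.*

## [00:52] AI as an automation tool, not a replacement

Tao expects that within ten years AI will take over much of the routine work mathematicians do today, letting humans concentrate on harder and more important problems. He points to two historical transitions:
- computers automating the work of human "computers";
- genetics continuing to evolve at a new scale after genome sequencing was automated.

> *Within ten years, a lot of the things mathematicians do now… could be done by AI. But we'll find that it wasn't the most important part of our work.*

## [02:46] The future of human-AI collaboration in mathematics

Dwarkesh Patel asks whether AI could solve a Millennium Prize problem on its own. Terence Tao thinks "human + AI hybrids" will dominate mathematics for a long time to come. Current AI lacks some of the ingredients needed to fully replace intellectual work, so it functions as a complementary tool.

> *I believe human-AI hybrids will dominate mathematics for a long time to come.*

## [03:43] Unpredictable effects on scientific discovery

Tao acknowledges that AI will accelerate science and new discoveries. It could also hinder certain kinds of progress by "destroying serendipity." He concludes that AI's future impact on scientific discovery is highly unpredictable.

> *It's also possible that AI could actually hinder certain kinds of progress by destroying serendipity in some way.*

## Entities

- **Terence Tao** (Person): the guest; one of the leading mathematicians of his generation.
- **Dwarkesh Patel** (Person): host of the podcast.
- **AI** (Concept): artificial intelligence, discussed in terms of its role in mathematics and scientific discovery.
- **Mathematica / Wolfram Alpha** (Software): computational tools cited as examples of automation in mathematics.
- **Millennium Prize Problems** (Concept): seven major problems selected by the Clay Mathematics Institute, each carrying a $1 million prize; six remain unsolved.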

#ai #mathematics #terence-tao

Terence Tao – How the world's greatest mathematician uses AI

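Tao and Dwarkesh use Kepler's discovery of planetary motion as a lens for what AI is actually changing in science. Tao argues that generating hypotheses now costs almost nothing, so the bottleneck has shifted to evaluation, peer review, and the test of time. Current AI has the edge in breadth: it tries every standard technique on every problem. Humans have the edge in depth: they build on partial progress. He therefore expects hybrid approaches to dominate mathematics for at least the next decade.

## [00:00] Kepler was a high-temperature LLM

Tao retraces how Kepler arrived at his three laws of planetary motion. Kepler started from a wrong but beautiful theory that nested the Platonic solids between the planetary orbits. He gave it up only after years of wrestling with Tycho Brahe's naked-eye observations. Elliptical orbits, the area law, and the harmonic law came out of a decade of data analysis, and Newton's explanation arrived only decades later.

Dwarkesh frames Kepler as resembling a high-temperature LLM cycling through random relationships against a verifiable dataset. Tao agrees with the mechanism but pushes back on the bottleneck. Idea generation was already cheap and plentiful. What made the difference was Brahe's data, an order of magnitude more precise than anything before, and the patience to abandon ideas the data said were wrong.

> *But as you said, without corresponding verification backing it up, it's just slop.*

## [11:44] If a new unifying concept is buried in a pile of AI slop, how would we notice?

Tao argues that if AI has driven the cost of generating ideas close to zero, peer review and the test of time become the new constraint. Journals are already swamped with AI-generated submissions.

An idea's standing depends on how later science uses it. Copernicus was less accurate than Ptolemy until Kepler completed the picture. That is why it is hard to automate this evaluation in the moment.

Dwarkesh asks how science could find a Bell Labs-style unifying concept, like Shannon's bit or the transformer, buried among millions of mediocre papers. Tao's answer points to where humans may remain. Scientists don't just produce theories; they craft narratives that persuade other scientists to invest years in follow-up work. Darwin's prose did what Newton's Latin equations could not.

> *AI has driven the cost of idea generation to almost zero, in much the same way the internet drove the cost of communication to almost zero.*

## [26:10] The deductive overhang

Tao talks about signal in existing data that has yet to be mined. For centuries astronomy has been the discipline of extracting the most information from the least data, which is one reason quant hedge funds like to hire astronomy PhDs.

One of his favorite examples is a study that tracked how typos propagate along citation chains, to measure whether scientists actually read the papers they cite. He proposes applying the same sociology-of-science analysis to AI progress itself. Mining citation patterns, conference mentions, and other traces could reveal which results were real progress without waiting for the test of time.

> *One implication was that in many fields, the deductive overhang may be much larger than people realize.*

## [30:31] Selection bias in reported AI discoveries

AI has solved about 50 of the roughly 1,100 Erdős problems, and progress has since stalled. Tao explains the selection effect. Those 50 had almost no prior literature, so one obscure technique plus one known result was enough, and AI tools excel at "trying every standard combination."
- If a problem is 80% done with existing methods, AI finishes it.
- On problems that need a genuinely new technique, the tools stall. In systematic sweeps, the per-problem success rate is 1-2%.

Tao's analogy is that AI tools are like jumping robots released into dark mountain terrain.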
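They can clear low walls that humans can't reach. They can't grab a handhold, hang on, and use partial progress as a foothold to climb higher.

There is also an optimistic reading: once AI reaches a certain level, you can run millions of parallel copies on millions of problems. That same fact is the structural reason science needs a new paradigm that actually exploits breadth.

> *AI excels at breadth, and humans, at least expert humans, excel at depth.*

## [46:43] AI makes papers richer and broader, but not deeper

Tao describes how he works now. Auxiliary work costs roughly five times less than it used to, so his papers include more code, more figures, and deeper literature reviews. The actual core, solving the hardest part of the problem, still happens with pen and paper. He hesitates to call himself "2x more productive," because what changed is the kind of paper he writes, not how fast he answers the question he set out to answer.

The distinction between cleverness and intelligence lands in the same place. When two humans work on a math problem together, each failed attempt becomes a stepping stone for the next. Current AI forgets what it figured out in one session when it starts the next. There is no phase of building incrementally on earlier progress. What remains is random trial and error, plus whatever eventually gets absorbed into the next training run.

> *Papers have become richer and broader. But not necessarily deeper.*

## [53:00] If AI solves a problem, can humans gain understanding from it?

Could AI prove the Riemann hypothesis in Lean while we understand nothing? Tao isn't very worried. Lean lets any proof be broken down atomically: each lemma can be inspected, removed, and tested on its own.

Even a generated 3,000-line proof becomes raw material. Another AI can restructure it for elegance, and a human can extract its conceptual content, so the result stays useful even if the original derivation was opaque. He predicts a class of mathematicians whose job is to take apart huge Lean-generated proofs and find the ideas inside. This kind of proof archaeology would combine human judgment with AI excision tools.

> *There's much more to be gained from the interaction of humans working with these tools.*

## [59:20] We need a semi-formal language for how scientists actually communicate

Dwarkesh asks what a semi-formal language for mathematical strategy, as opposed to mathematical proof, might look like. Tao starts with Gauss's conjecture of the prime number theorem, mathematics' first large-scale statistical conjecture, which Gauss drew from raw data before any proof existed. He then moves on to the twin prime conjecture, which mathematicians believe because a random model of the primes predicts it.

Mathematics has both rigorous proof and rigorous heuristics, but only proof has been formalized in a form Lean can check. Heuristics resist formalization for a specific reason. Once RL has a verifiable grader, the grader becomes the target of attack, and the subjective judgment "this argument is convincing" doesn't yet admit a hack-resistant framework.

Tao wants a way to benchmark conjecture generation and strategy selection at scale. One approach is to run small AIs in toy mathematical universes and watch which strategies emerge.

> *There are subjective aspects of science where we don't yet know how to insert AI in a useful way.*

## [69:48] How Terry spends his time

Tao describes how he absorbs a new subfield. He calls himself a fox in Berlin's sense: he knows a little about everything and becomes a hedgehog only when needed. The driver is a completionist compulsion. If another mathematician proves a result with a technique he doesn't know, he has to track that technique down. For the same reason, he had to give up video games.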
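Collaborating with other mathematicians is his main vehicle, and blogging is a memory aid he developed so that six months later he doesn't forget an argument he worked out and end up reconstructing it from scratch.

Tao deliberately leaves room in his schedule for serendipity. He wants to avoid optimizing his time so tightly that he never attends a seminar outside his comfort zone. A year at the Institute for Advanced Study confirmed the trap: two weeks of pure research were wonderful, but after that his inspiration dried up. The book he happened on in the next shelf over, the casual hallway conversation, the seminar he attended reluctantly had actually been doing far more of the work.

> *Those chance encounters can look suboptimal, but they really do matter.*

## [77:05] Human-AI hybrids will dominate math for much longer

When will AI be able to do mathematics on its own? Tao reframes the question: AI already does math humans can't, just as calculators did, only in a different domain. Within the next decade he expects much of what graduate students do today, applying standard techniques and searching the literature, to move to AI. But the field itself will move up a level, as it did when computer algebra systems absorbed symbolic integration. Cheap sequencing didn't end genetics; it expanded genetics to the scale of whole ecosystems. Math will follow the same path.

His advice to students entering mathematics now: assume things will change, but earn your credentials the old way. For now there is still no alternative to learning math the traditional way. At the same time, stay flexible enough to take advantage of entirely new ways of doing research as they emerge, including ones that don't exist yet. Thanks to AI tools and Lean, high school students can now contribute to real mathematical research, which was not possible five years ago.

> *I think I believe that human-AI hybrids will dominate mathematics for much longer.*

## People and concepts

- **Terence Tao** (person): Fields Medalist (2006) and UCLA mathematician who writes regularly about AI's role in mathematical research.
- **Dwarkesh Patel** (person): Host of the Dwarkesh Podcast, known for long-form interviews on AI, science, and technology.
- **Johannes Kepler** (person): Astronomer (1571-1630) who derived the three laws of planetary motion from Tycho Brahe's observations.
- **Tycho Brahe** (person): Danish naked-eye astronomer whose decades of planetary observations were the dataset Kepler needed.
- **Lean** (software): Proof assistant that formalizes mathematical proofs so they can be verified, decomposed, and pruned atom by atom.
- **Erdős problems** (concept): A collection of roughly 1,100 problems posed by Paul Erdős, many still open. AI has solved about 50 of them, nearly all of which had little prior literature.
- **Deductive overhang** (concept): The idea that existing data already contains far more derivable knowledge than has been extracted. Astronomy is the model case.
- **Riemann hypothesis** (concept): Open conjecture about the zeros of the Riemann zeta function, which govern the distribution of the primes. Used here as the test case for whether an AI proof would actually advance human mathematical understanding.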

#ai-for-math #terence-tao #kepler
