LaiDub

Hear the voices of the podcast world; see the traces of thought.


What does the next training paradigm look like?

Dwarkesh Patel narrates his essay on where AI training is headed. The labs are betting that scaling RL across millions of verifiable tasks gets you to AGI, but Dwarkesh argues that bet leaves two holes: most valuable skills aren't "grindable" enough to farm in a simulator, and the learning models pick up on the job never makes it back into their weights. He walks through why sample efficiency and continual learning are the same problem, sketches two candidate fixes — on-policy self-distillation and "dreaming" — and imagines an AI that keeps getting smarter from being deployed rather than from pretraining.

## [00:00] The big research bet the labs are making

The labs' working theory: train AIs on millions of verifiable tasks across thousands of RL environments and you'll get a general problem-solver that can grind on open-ended work for weeks. Optimists argue the known deficits — data inefficiency, no continual learning — will get steamrolled by more compute, the same way classic NLP problems collapsed once LLMs scaled.

Dwarkesh lays out their strongest counter to his own skepticism: the million-fold sample-inefficiency he flagged in his last essay is only a training-time cost, amortized across billions of sessions. What matters is how capable the model is *during* a session, and that keeps improving. Continual learning might not even be needed if context windows grow large enough to hold months of on-the-job experience.

> *People often say that their employees are not net productive until six months or more on the job. So clearly, online learning is necessary for competence. But what if you could just fit those six months into the context window?*

## [02:12] Grindability is just as important as verifiability

Why has computer use lagged coding and math when it's just as verifiable?
Dwarkesh's underrated answer: being verifiable isn't enough — a domain also has to be *grindable*, meaning you can run thousands of parallel rollouts against a deterministic, replayable simulator from the same starting point. A coding repo clones trivially into a container; Amazon's checkout flow does not.

This is the canyon wall AI progress only slowly chips at. You can sometimes build farmable simulators (clone Slack, clone Gmail), but most high-value skills — building a business, winning a court case, running a profitable trading day — require irreproducible interaction with the real world, where verification takes months and can't be re-observed across parallel rollouts.

> *What is the RL environment to make an AI that is as good at politics as Lyndon Johnson, or as good at building a space-launch business as Elon Musk?*

## [06:10] Will RLVR alone generalize?

The labs are betting RLVR generalizes — that enough containerized environments yield an agent that plans, adapts, and picks up new skills inside a single session, good enough to out-advise LBJ on a 1948 Senate race or build SpaceX with a hundred million dollars. Whether it generalizes that far is an empirical question, and Dwarkesh reads a Dario Amodei quote as a hint that it doesn't stretch infinitely: short-horizon training may not transfer to long-horizon performance.

Even if in-context experience could turn a model into Henry Ford for a session, it's all wasted if the learning can't return to the weights. 30–50% of a lab's compute goes to inference that currently does nothing to improve the model — even though deployment is exactly where the most valuable information is revealed.

> *We've got some genius grad student who's never been allowed to take a real internship, and we keep giving it more and more classroom case studies in the form of RL training on environments.*

## [08:41] Getting the learning back to the weights

Continual learning means updating the weights, not endlessly growing a KV cache — brains don't separate parameters from activations, and they compress what they learn. But moving into the weights forfeits in-context learning's sample efficiency, because gradient updates are coarse. That's why every shipped online-learning model (like Cursor's Tab model, learning the same accept/reject objective across 400M+ requests a day) learns one identical thing across all users, which defeats the point when every job and company differs.

Dwarkesh frames sample efficiency and continual learning as the same problem, then argues the bottleneck isn't architecture — new sparse-attention and KV-compaction papers ship weekly — but the loss function. His candidate is on-policy self-distillation: train the base model to make the same predictions a context-rich veteran version of itself would make. OPSD needs no outer-loop reward, gives denser per-token supervision than RL, and keeps RL's sparse-update property so on-the-job learning doesn't overwrite what the model already knows.

> *The way you get better at your job is not by recalling the transcript of every single thing that happened every day with perfect fidelity. Rather, it's by consolidating the handful of insights and pieces of knowledge that are actually relevant to you getting better at your job.*

## [15:22] Dreaming

The second, more speculative fix: let the AI build a simulation of reality and rehearse against it, experiencing orders of magnitude more samples per unit of wall-clock time. The precedent is EfficientZero, which beat novice humans at unfamiliar Atari games by playing dozens of simulated games in its head per real step.
Simulating the whole world is far harder than emulating Go, which is why Dwarkesh flags this as speculative — but if it works, it becomes a fourth scaling axis alongside pretraining, RL, and inference-time compute. Instead of hitting `/compact` to summarize a session, you'd hit `/dream` and burn compute rehearsing against a video-game version of what the model is seeing in production.

> *So instead of hitting /compact in Codex or Cursor or Claude... you hit /dream. And this incinerates huge amounts of compute to build and train against a video-game version of what the model is witnessing in the real world.*

## [17:23] What 2027 looks like

Dwarkesh's scenario: RLVR produces an agent competent enough to start getting real-world experience, context windows stretch to a full week of co-working, and at the end of the week a thumbs-up triggers the base model to distill what it learned — via OPSD, dreaming, or some mix. Each round the model expands into domains adjacent to what it was last trained or deployed on.

The endgame flips how AI improves: capability comes mostly from broad deployment across the economy, not from pretraining before release. Every interaction makes the model smarter — learning from your past sessions and from everyone else's — which Dwarkesh calls scary, exciting, and very different from today.

> *Just as pretraining created a base intelligence that was smart enough to become a competent agent with enough RLVR on top, so RLVR has created an agent that is competent enough to actually be broadly deployed in the world.*

## Entities

- **Dwarkesh Patel** (Person): Podcast host and essayist; narrates his own blog post on AI training paradigms.
- **Dario Amodei** (Person): Anthropic CEO, quoted on why model performance degrades at long context.
- **RLVR** (Concept): Reinforcement learning from verifiable rewards — training on reproducible, checkable tasks; the labs' main bet for reaching AGI.
- **Continual learning** (Concept): Updating a model's weights from on-the-job deployment rather than only from pre-release training.
- **Grindability** (Concept): Dwarkesh's term for whether a domain can be farmed via many parallel rollouts on a deterministic, replayable simulator.
- **On-policy self-distillation (OPSD)** (Concept): Distilling a context-rich session's learning back into the base model's weights with dense per-token supervision.
- **Dreaming** (Concept): Speculative fourth scaling axis where a model builds and trains against its own simulation of reality.
- **EfficientZero** (Software): Sample-efficient RL model that beat novice humans at unseen Atari games by simulating many games per real step.
- **Mercury** (Organization): Fintech banking platform; episode sponsor referenced in the bill-pay anecdote.
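
The on-policy self-distillation idea in this episode (train the base model to match the predictions of a context-rich version of itself, with per-token supervision and no outer-loop reward) can be illustrated with a toy loss. This is a minimal sketch under stated assumptions, not the essay's or any lab's implementation: the function names, the choice of a reverse KL divergence, and the hand-written logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(student_logits, teacher_logits):
    """Mean per-token KL(student || teacher) along one sampled sequence.

    Assumptions (not from the source):
    - student_logits[i] are the base model's logits at token i WITHOUT the
      long working context; teacher_logits[i] are the same model's logits
      at token i WITH that context.
    - the sequence was sampled from the student, which is what would make
      the scheme "on-policy".
    - reverse KL is used as the matching objective.
    """
    per_token = [
        kl_divergence(softmax(s), softmax(t))
        for s, t in zip(student_logits, teacher_logits)
    ]
    return sum(per_token) / len(per_token), per_token


if __name__ == "__main__":
    # Two token positions over a three-word toy vocabulary.
    student = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]  # no context: unsure at position 0
    teacher = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]  # with context: confident at both
    loss, per_token = self_distillation_loss(student, teacher)
    print("per-token KL:", per_token)
    print("mean loss:", loss)
```

Every position yields its own loss term, which is the sense in which the supervision is denser than a single end-of-episode reward. Positions where the context-free model already agrees with the context-rich one contribute nothing.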

#ai-training #reinforcement-learning #rlvr

Machiavelli is the most misunderstood thinker of all time – Ada Palmer

Historian and novelist Ada Palmer joins Dwarkesh Patel to dismantle the "Machiavellian villain" myth and replace it with the actual Niccolò Machiavelli: a patriot who watched Cesare Borgia conquer half of Italy from up close, was tortured and exiled by the Medici, and then wrote *The Prince* as a secret job application addressed to the very regime that had wronged him. Palmer traces the structural forces — cascading legitimacy collapse among Italian city-states, popes who functioned as warring hereditary princes, and a patronage system that made nepotism feel like sound risk management — that made Machiavelli's analysis both urgent and unprecedented. The conversation closes on a sharp irony: the word "Machiavellian" now means self-serving cunning, yet the man himself gave up income, fame, and freedom rather than serve any cause that was not Florence.

## [00:00] How Florence bargained with Cesare Borgia for survival

Italy in 1513 was a cascade of broken legitimacy. Palmer explains that when a long-standing government falls, successor regimes inherit none of its credibility, making rapid further overthrows nearly inevitable — what she calls the thread of continuity being cut. By the time Machiavelli was writing *The Prince*, this dynamic had swept dozens of Italian city-states. Compounding this was papal instability: because popes were elected rather than hereditary, the next pope was almost always a coalition pick of people who hated the current one, guaranteeing policy reversals every ten years.

Machiavelli's day job during this era was standing next to Cesare Borgia — "Valentino" — and whispering endlessly that Florence was loyal, buying what Palmer calls "the boon of Polyphemus": the conqueror's promise to eat you last. His advice to Florence was to betray allies, pay tribute, give military support, and buy time, knowing full conquest was only delayed by Alexander VI's mortality.
His biographers can still feel how much he was under Borgia's spell: when describing Valentino's fall, Machiavelli breaks from third person and writes "he told me" — the historian slips through the veil.

> *"Machiavelli's job dealing with Cesare Borgia… it's very clear that the Borgia plan is to conquer the Papal States in the middle of Italy."*

## [15:08] Machiavelli's analytical innovations

Machiavelli is not the crude "ends justify the means" thinker of caricature. Palmer shows that he is obsessed with the means — specifically, which means of acquiring power are stable and which are not. Whether betrayal works depends on the nature of your power base: Borgia could betray allies because his terror made remaining allies step further into line, while Savonarola's power rested on his followers believing him divinely infallible, so his flip-flopping destroyed him. The lesson is conditional, not universal.

Machiavelli also makes the first recorded European argument that competing political parties can be stable and politically useful, rather than requiring mutual annihilation. Florence's own history was the counterexample: it had literally salted the earth where its Ghibelline opponents' houses once stood. His observation of Siena as a countermodel — parties competing without destroying each other — was genuinely novel.

> *"Machiavelli is the first person that we have ever in the European tradition to suggest that it could be viable for there to be more than one political party in a state at the same time."*

## [23:58] Why popes became warlords

The closer you lived to Rome, the less abstract the papacy felt. Palmer draws the contrast sharply: a Danish subject saw the pope as a figure of vast spiritual majesty; a Florentine saw "that asshole who went to college with your brother."
Italians judged popes as specific men with dirty laundry, family grudges, and factional allegiances — which is why cities that were hereditarily Guelph (pro-papal) sometimes ended up fighting wars against the sitting pope when he happened to be from a Ghibelline family.

The corruption was structural and self-reinforcing. As the Church accumulated donated wealth across generations, the incentive for ambitious families to capture it through bribery and nepotism grew. Palmer reads Machiavelli's personal letters haggling over the correct bribe to buy a priesthood for his brother Totto — written as routine household correspondence — to show how completely normalized the practice was. Every generation saw popes get more secular and military than the last; Machiavelli explicitly predicted the institution would collapse under accumulated corruption unless reformed from within, as St. Francis had temporarily saved it two centuries earlier.

> *"This makes a stronger and stronger incentive for every ambitious family to send their second son into the Church."*

## [36:13] Why the common people demanded nepotism

When Pope Paul III appointed a competent outsider general instead of his own illegitimate son, there were riots. Palmer explains this is not irrational: in a world where a soldier's oath ran to his commander, not to the state, the only guarantee the papal armies wouldn't turn on Rome was putting the pope's own son in charge — someone who rose and fell with the pontiff. Nepotism was the trust mechanism that made institutions function.

Patronage also determined justice outcomes. Medieval law codes prescribed death for almost everything, but roughly 99 in 100 capital-eligible convictions ended in a fine because the defendant's patron intervened. This was considered correct: the trial was meant to replicate the soul's experience before divine judgment — terrifying, then mercifully pardoned — so patron intervention mirrored the intercession of a saint.
The system had a grimly consistent internal logic, and Palmer traces it from Giordano Bruno (burned because he had angered his patron, not because of his ideas) to Giovanni Pico della Mirandola (spared because Lorenzo de' Medici went through the Orsini network to Rome). Without a patron, even innocence was precarious.

> *"The norm is: you're accused of a severe crime, you're put on trial for your life, your patron intervenes, and you get a lighter sentence. This is how justice is supposed to work."*

## [47:57] Cesare Borgia brought terror to rulers and justice to the people

Borgia's conquests produced a paradox that startled contemporaries: he massacred ruling families and was adored by common people. Palmer's explanation is structural. Factional cities had lived for generations under justice that tracked who was in power, not the facts of the case. A carpenter whose family worked for the dominant faction faced minimal consequences for his son's drunken homicide; the same crime by the carpenter of the out-of-power faction could be a capital offense. When Borgia wiped out both factions and installed outside administrators with no local feuds to take sides in, neutral adjudication felt like a revelation.

Machiavelli also drew a hard line for why even a beneficent Borgia conquest of Florence would be catastrophic: under any arbitrary ruler, a citizen can be executed by a pointed finger in the street. Machiavelli called that condition slavery, regardless of how fair the tyrant might be in practice. Florence's "LIBERTAS" banner — flown by ordinary citizens defending an oligarchic Senate that excluded them — represented a genuine commitment to the existence of a process, however biased, over the absence of any process at all.

> *"As a result, to everyone's surprise, he moves into a city, he massacres the rulers, he implements an authoritarian regime, and he's incredibly popular and beloved by the people."*

## [57:55] Art as a proxy for war

Renaissance Florence could not afford to fight France militarily; it could afford to paint French royal symbols on its government buildings and commission beautiful gifts for the French king. Palmer frames this not as surplus expenditure but as substitution: the art budgets were military budgets redirected into a form of warfare Florence could win. Just as the Fulbright Program delivers a higher return per dollar than the defense budget, Florentine cultural patronage worked as strategic deterrence.

The period's orientation toward the past further supercharged the value of art. Where modernity assumes humanity advances into the future, Renaissance Europe pointed the other direction: the ideal was recapturing Rome. High-tech achievement meant successfully imitating a lost Roman technique. When a French diplomat arrived in Florence and saw the cathedral or the neoclassical buildings, he was not seeing quaint historical imitation — he was seeing something that approached what only Rome had achieved, and that France could not. That perception was itself a form of power.

> *"If we fought him, we would lose. But if we play the culture victory game, that's cheaper, and we can try to win."*

## [01:06:41] Florence, a city famous in hell

Dwarkesh raises the obvious puzzle: if everyone in Renaissance Italy was a Christian who genuinely believed in hell, why did they so constantly commit the sins Machiavelli describes? Palmer's answer has two parts. First, the Dante answer: Dante fills the *Inferno* with Florentines precisely because he wants his contemporaries to feel the discomfort of consequences they were ignoring.
His Paolo and Francesca passage — damning a love story everyone celebrated — was designed to be a shock to readers who thought romantic adultery was exempt from theological reckoning.

Second, pre-Reformation Christianity assumed everyone sinned constantly and focused on repentance cycles rather than purity maintenance. St. Julian the Hospitaller, patron saint of murderers, was omnipresent in Florentine iconography — his legend held that he killed his own parents, spent his life in pilgrimage to repent, and was saved. Dozens of icons of him meant dozens of Florentines who had killed someone and were working through it. The Calvinist and Puritan emphasis on spotlessness came later and was a genuine departure from how the medieval and early Renaissance church operated.

> *"He fills his hell with Florentines."*

## [01:15:57] The Prince was a job application to Machiavelli's torturers

After the Medici retook Florence in 1512 and then, in 1513, tortured and exiled Machiavelli on mistaken suspicion of conspiracy, everyone expected him to defect. He had contacts at every major court in Europe and the skills — military history, diplomatic networks, classical scholarship — that kings paid for. He chose instead to sit in a hamlet outside Florence writing *The Prince* as a secret appeal to the Medici to take him back. No other courts received it; he kept it proprietary, treating his political science the way Palmer says a nuclear scientist would treat classified weapons knowledge.

His other works — the *Discourses*, the history of Florence, the comedy *Mandragola* — circulated publicly to build his reputation. *The Prince* did not. Palmer compares it to historian friends who produce classified 100-page reports for Department of Defense committees: bespoke proprietary knowledge for an audience of five, whose existence may be whispered about but whose contents are guarded.
It also explains why the book was eventually published in 1532 without Machiavelli's input: surviving relatives wanted family fame, and the Medici wanted credit for a text dedicated to their house. Neither understood what its author had intended to keep contained.

> *"I'm going to stay, and I'm going to rot, and I'm going to write The Prince, which is my job application begging the new regime to bring me back and let me work for them and demonstrating my loyalty, and I'm going to send it to them and only them, them and my immediate friends."*

## [01:41:39] During the Renaissance, original ideas had to be couched in antiquity

The Renaissance's obsession with recovering ancient Rome created a peculiar incentive structure: original ideas were unfashionable; ideas presented as recovered ancient wisdom were prestigious. Palmer shows this goes far beyond homage. Giordano Bruno attributed to Aristotle claims that Aristotle explicitly contradicted. Annius of Viterbo forged ancient texts and staged fake archaeological digs to give his original historical theories the authority of antiquity. Marsilio Ficino, translating Plato, genuinely convinced himself that the wildly original cosmological and magical system he had assembled was secretly coded in the Platonic texts.

This explains why Machiavelli's other major work is called *Discourses on Livy* rather than, say, *A New Theory of Republican Governance*. A discourse on an ancient was a prestige format; an original political treatise was a niche curiosity. The 19th century misread the Renaissance as intellectually barren — "200 years of people being wrong about Plato" — because it expected original standalone treatises and found commentary after commentary. Palmer argues the original ideas are there, using the ancients as what she calls the trellis up which the rose climbs.

> *"Nobody wants original ideas. Original ideas are out of vogue. Original ideas are dead. All ideas need to be from the ancients."*

## [01:50:44] Why copyright began with the Inquisition

Machiavelli was one of the first authors to experience unauthorized printing. A local press printed one of his works without asking, riddled it with compositor typos, and his only recourse was to write letters to important people clarifying that the errors were not his. There was no legal framework at all.

The solution emerged from an unexpected direction: post-1515, the Inquisition required pre-publication approval for all texts to screen for heresy. In exchange for going through this process, the approved printer received a monopoly license — the Inquisition's record of permission served as proof that no one else could legally print the same book. The first copyright was a censorship certificate. England, observing this, copied the mechanism while eventually stripping out (or softening) the censorship half, producing the ancestor of modern copyright law.

The institutional logic held together: the Inquisition needed to please local rulers to get resources, so approving books dedicated to the duke and granting his favored printer exclusivity was a political investment. Everyone — inquisitors, printers, authors, and ruling families — had reasons to make the system work.

> *"So the very first version of copyright is the Inquisition."*

## [02:02:12] Machiavelli wasn't Machiavellian

The word "Machiavellian" came to mean scheming self-advancement — Shakespeare's Richard III invokes "the murderous Machiavel" as his role model. Palmer traces how the idea of Machiavelli separated from the actual man and became a useful thought-experiment figure: the cynical, probably atheistic politician who wants nothing but personal power. The same splitting happened to Hobbes (the Beast of Malmesbury) and Spinoza, whose actual writing is warm and theistic but whose excommunication from the Jewish community made people assume he must be the most radical heretic imaginable.
The real Machiavelli — who refused lucrative court positions across Europe, who kept his most important work secret to protect Florence from foreign exploitation, who chose to rot in an isolated hamlet over serving any cause that wasn't his country — is almost the opposite of "Machiavellian." His book is not about gaining power but about keeping power stable enough to protect people.

Palmer's closing point: the gap between Old Nick and Niccolò Machiavelli is itself a revealing fact about how societies use ideas, splitting thinkers into a character useful for one purpose and the actual work useful for another. Read *The Prince* knowing it was written by someone who would give up anything to serve Florence, and a very different text comes through.

> *"This is why it's so weirdly ironic to me that the reputation—the word "Machiavellian"—means "self-serving", when Machiavelli himself is one of the most selfless men I've ever read about in the history of the Earth."*

## Entities

- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast; interviews scholars on history, science, and technology.
- **Ada Palmer** (Person): Historian and science fiction novelist at the University of Chicago; specialist in Renaissance intellectual history and the history of censorship.
- **Niccolò Machiavelli** (Person): Florentine diplomat (1469–1527), author of *The Prince* and *Discourses on Livy*; wrote *The Prince* as a secret appeal to the Medici regime that had tortured and exiled him.
- **Cesare Borgia** (Person): Renaissance military commander known as "Valentino"; son of Pope Alexander VI, conquered central Italy and was Machiavelli's primary case study in effective (if brutal) statecraft.
- **The Prince** (Concept): Machiavelli's treatise on political power, written ~1513, kept proprietary during his lifetime and published posthumously in 1532; misread as a self-advancement manual rather than a guide to maintaining stable government.
- **Discourses on Livy** (Concept): Machiavelli's longer republican political theory, structured as commentary on the Roman historian Livy; his public bid for intellectual prestige in a culture that prized commentary on ancients over originality.
- **The Medici** (Organization): Ruling family of Florence, whose patronage networks and papal connections shaped both the political instability Machiavelli analyzed and the conditions under which he wrote and was exiled.
- **Florence** (Organization): Italian city-state and center of Renaissance banking, art, and humanist scholarship; Machiavelli's country, for which he subordinated his entire career.
- **Patronage System** (Concept): The multi-generational network of family obligations that served as the functional glue of Renaissance society, determining access to justice, employment, publication, and protection from the Inquisition.

#machiavelli #renaissance #political-philosophy

Sarah Paine - Why Putin and Xi can't escape geography

Naval War College historian Sarah Paine delivers a standalone lecture tracing two thousand years of geopolitical logic: continental empires (China, Russia) pursue security by expanding borders and crushing neighbors, while maritime powers (Athens, Britain, the US) pursue prosperity by trading across open seas. She argues this structural divide—rooted in the brute fact of geography—explains Putin's war on Ukraine, Xi's ambitions over Taiwan, and why the post-WWII rules-based order is the only arrangement that produces compounded growth rather than compounded ruin.

## [00:00] Setting the stage

Paine opens by framing the lecture's core question: why do some great powers keep grabbing territory while others keep opening trade routes? The answer comes down to one physical fact—whether it is feasible to defend yourself at sea. Maritime powers can; continental powers cannot. That single asymmetry generates two entirely different military traditions, two economic models, and two competing visions of world order.

She walks through American history as a warm-up: the US began life as a continental power (manifest destiny, the Mexican-American War, Alaska purchased when Russia needed cash), then pivoted toward a maritime identity after Alfred Thayer Mahan convinced strategists that naval trade, not westward land, was the real source of national power. Alongside Mahan, Paine introduces the three geopoliticians whose maps anchor the lecture: Halford Mackinder (the Eurasian heartland as the world's natural fortress, impervious to sea power), Nicholas Spykman (control the rimlands, and you influence the heartland), and their shared lesson that US security runs through sea lanes and alliances, not borders.

> *"Maritime powers are the exception and continental powers are the rule. Why? Because maritime powers, if need be, can defend themselves primarily at sea with their navies. Whereas a continental power simply cannot—think Ukraine, a navy is not going to save them from Russia."*

## [12:10] The continental powers

Paine works through the logic of the continental world starting with China—the original case—then Russia. Sun Tzu's *Art of War* contains no references to maritime warfare: it was written for a world where neighbors invade overland at any time and the only viable response is a mass army. Geography tells the rest: too much of China's land is vertical to feed its people, which makes controlling the arable lowlands an existential imperative. The Han expansion from the Yellow River Valley followed that logic for millennia, wiping out the Dzungars, subjugating Tibet, producing the ethnic patchwork Beijing still manages with military administrative overlays.

Russia's pattern is the same dynamic in reverse—a Moscow core expanding outward in concentric rings until it hit countries that fought back. The continental security playbook that emerges is ruthlessly coherent: no two-front wars, no great-power neighbors, take on threats sequentially, destabilize the rising ones, absorb the failing ones, maintain buffer zones in between.

Paine closes the section with the WWII body count that makes the paradigm's cost visible: Russia lost over 25 million dead (soldiers plus civilians); the United States lost 295,000. The ocean moat is not an abstraction—it is the difference between hundreds of thousands and tens of millions.

> *"In this world, you're faced with a binary choice: you either become Han or they will kill you. And genocide is what happens to the losers in continental warfare."*

## [29:12] The maritime alternative

Where continental empires carve the world into exclusive spheres, maritime powers treat the sea as a commons to be shared.
Paine traces the lineage from Athens through Rome ("Mediterranean" means the sea in the middle of the lands; "Zhongguo" means the kingdom among the kingdoms—one term centers the sea, the other the land), the Dutch Republic, and finally Britain. Hugo Grotius, a Dutchman watching his nation's trade pirated, wrote *Mare Liberum* to establish that the sea belongs to no one and therefore belongs to everyone—the founding document of international maritime law.

Britain refined the operating strategy over the Napoleonic Wars into six rules for "elephant hunting": keep the home economy growing, blockade enemy trade, fund the allied continental power facing the main front, find a peripheral theater where sea access beats land access, never attack the enemy's main force directly, and—only after the elephant has been bled—pile on with allies.

The key structural point: a navy that prevents invasion produces wealth invisibly. Britain compounded wealth for a century after Waterloo while its continental neighbors burned money funding standing armies and fighting each other. That invisible compounding, over generations, is the difference between North and South Korea.

> *"Trade is going to finance the navy. It's going to protect both British homeland and some of the trade. And then Britain is going to be compounding wealth while its neighbors are busy—constantly fighting with each other and destroying wealth in the process."*

## [42:00] How the Industrial Revolution changed everything

The Industrial Revolution flipped the source of power from land to commerce. When land determines wealth, conquest makes sense. Once wealth comes from industry and trade, territorial expansion is literally negative-sum: you destroy the asset while fighting for it. The Suez Canal is Paine's sharpest example—Egypt sank block ships in 1967 to deny Israel access, but the strategic result was that global shipping shifted to supertankers that go the long way around Africa at one-third the cost per ton.
Closing a chokepoint accelerated the maritime world's efficiency. Malcolm McLean's shipping container reduced cargo loading costs from nearly $6 per ton to under 20 cents, and the ISO then harmonized container dimensions across trucks, railways, and ships—producing plummeting transport costs and the trade explosion that lifted hundreds of millions out of poverty. Xi's Belt and Road Initiative, Paine notes dryly, crosses some of the world's most unstable territory, requires constant trans-shipment between incompatible rail gauges, and can never be rerouted—the exact opposite of maritime flexibility. China's own geographic trap is inescapable: shallow, island-cluttered seas that become kill zones in wartime mean its merchant fleet reaches global markets only in peacetime. > *"Once wealth is a function of commerce, industry, and trade, it isn't land anymore. And this upends the world. If you think about the world today, who's rich, who's poor—it's often the degree to which the country is industrialized."* ## [52:00] Why Putin wants to break the world The post-WWII institutional framework—UN, IMF, NATO, WTO, EU—was built by people who survived both the trenches of WWI and the Great Depression, then spent WWII watching their own children die. Their conclusion: hash out differences with diplomats and lawyers, because sending soldiers destroys more value than any conceivable prize is worth. That system held the peace in the industrialized world for 75 years, until Putin decided to break it. Putin's challenge is not irrational by continental logic: a rising Ukraine integrated into NATO is precisely the kind of strong, stable neighbor that, in the old paradigm, becomes an existential threat. His goal is to hollow out the alliance system and shatter international law so the world reverts to warring spheres of influence—a world where continental powers can once again play their traditional game without maritime rules they were never designed for. 
Paine's answer is that sanctions are "economic chemotherapy": they suppress growth by one or two percent per year, and compounded over generations, that gap is the difference between North and South Korea. The objective is never to eliminate the rogue state but to contain it at acceptable cost. The only exit that avoids nuclear escalation is the one the post-war generation built: diplomats, lawyers, and institutions. > *"The only win-win solution is to deploy the diplomats and lawyers to hash out these things in international forums—because if we're all going to send soldiers, we're going to get a third world war with nuclear follow-on effects, and we'll see whether humanity makes it."* ## Entities - **Sarah Paine** (Person): Military historian at the U.S. Naval War College; sole speaker in this lecture; author of a 2025 lecture series on continental vs. maritime powers. - **Alfred Thayer Mahan** (Person): 19th-century U.S. naval strategist; argued that maritime trade and sea power, not land conquest, determine national greatness; associated with the Naval War College. - **Halford Mackinder** (Person): British geographer; 1904 "pivot area" thesis posited that the Eurasian heartland, insulated from sea power, is the world's natural fortress. - **Nicholas Spykman** (Person): Dutch-American strategist; argued that controlling Eurasia's rimland determines global power; died 1943 while warning the US about Eurasian dominance. - **Hugo Grotius** (Person): Dutch jurist; founder of international maritime law; *Mare Liberum* (1609) established freedom of the seas as a universal right. - **Malcolm McLean** (Person): American trucking entrepreneur who invented the standardized shipping container, collapsing cargo loading costs and enabling the post-war trade explosion. 
- **Continental power** (Concept): A state that cannot defend itself primarily at sea; prioritizes territorial expansion, mass armies, buffer zones, and exclusive spheres of influence; exemplified by Russia and China. - **Maritime power** (Concept): A state that can defend itself primarily at sea; prioritizes trade, open sea commons, alliance-building, and compounding wealth; exemplified by Britain and the United States. - **Rules-based international order** (Concept): The post-WWII institutional system (UN, IMF, NATO, WTO, EU) that enforces sovereignty and free trade; the system Putin and Xi seek to dismantle. - **U.S. Naval War College** (Organization): Graduate school of the US Navy in Newport, Rhode Island; Paine spent 24 years there; home of Mahanian sea-power theory.
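
The compounding claim in the sanctions and maritime-wealth passages above can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration, not something from the lecture: the function name `relative_size`, the 3% baseline growth rate, and the 70-year horizon are all assumptions chosen for the example.

```python
def relative_size(base_growth: float, drag: float, years: int) -> float:
    """Ratio of an unsanctioned economy to a sanctioned one after `years`.

    Both economies start at the same size; the sanctioned one grows `drag`
    slower per year. All inputs are illustrative assumptions, not lecture data.
    """
    unsanctioned = (1.0 + base_growth) ** years
    sanctioned = (1.0 + base_growth - drag) ** years
    return unsanctioned / sanctioned


if __name__ == "__main__":
    # Compare a one-point and a two-point annual growth drag over 70 years.
    for drag in (0.01, 0.02):
        ratio = relative_size(base_growth=0.03, drag=drag, years=70)
        print(f"drag of {drag:.0%} per year over 70 years -> {ratio:.1f}x gap")
```

Under these assumed numbers, a two-point drag leaves the unsanctioned economy roughly four times larger after 70 years, which is the sense in which a small annual gap turns into a generational one.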

#geopolitics #grand-strategy #maritime-power

The more advanced AI gets, the smaller its share of the economy may become – Alex Imas and Phil Trammell


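Economists Alex Imas (Google DeepMind / University of Chicago) and Phil Trammell (Epoch / Stanford University) argue that the most counterintuitive consequence of full automation is not that capital captures everything. Rather, AI could shrink its own economic footprint: demand for fully automated goods saturates, while humans remain scarce in markets for relationships and experiences. The conversation starts from "what is scarce after AGI?" and moves through the politics of redistribution, the O-ring complementarities slowing automation today, why accumulation-minded AI agents would come to hold most future wealth, and the options open to developing countries shut out of the AI supply chain.

## [00:00] Will the capital share rise?

Dwarkesh opens with the core question: if AI can do everything humans do, where does the labor share of income go? Alex Imas first points out that economists trying to predict past industrial transitions have missed repeatedly. David Ricardo predicted mass unemployment from the Industrial Revolution; he was right about the direction of which occupations would disappear but completely wrong about the aggregate outcome. The prime-age employment rate in 2026 is higher than at almost any point since 2000. The lesson is that economists of structural change have consistently underestimated the new goods and occupations that appear when old costs collapse.

Imas introduces the concept of the "relational sector": goods and services where human presence itself is part of the value. Because humans are inherently finite, once automation saturates everything else, the relative scarcity and price of loops with a human in them rise. Phil Trammell reinforces this with a supply-chain accounting argument: trace the network-adjusted factor share of any good back to raw materials and the labor share turns out to be already surprisingly robust. If AI saturates every non-relational good at zero marginal cost, consumers quickly exhaust their demand for those goods and shift spending to what is still scarce. A ballerina's performance does not get cheaper when software becomes free.

> *"Humans are inherently scarce. So even as automation makes many things no longer scarce, scarcity persists in the things humans are involved in to some degree."*
> – Alex Imas

Trammell extends the argument to the capital share as well. Fully automate the supply chain of every good that involves no humans, saturate demand quickly, and the marginal utility of an additional unit of those goods approaches zero. As a result the capital share may actually shrink rather than expand. This is the episode's counterintuitive conclusion.

## [19:36] The messy middle scenario

Dwarkesh brings up Molly Kinder's "messy middle" argument: a world where AI does not cause catastrophe but the distributional squeeze drags on. Firms capture the productivity gains while workers face stagnant wages and government redistribution cannot keep up with the pace of change. The historical analogy is the telephone operator. The job was technically automatable in the 1960s, yet it took 20 years to actually automate because of institutional inertia. Workers were not laid off overnight; many were gradually absorbed into lower-paid work or underemployment.

Imas thinks a messy middle is plausible in the near term but will not be permanent, because the scale of AI-driven productivity gains makes the pie large enough to share. The political-economy problem is not resource scarcity but speed and coordination. Governments cannot tell AI-caused job losses from other ones, political constraints create friction, and even if the math eventually balances, the gap between displacement and redistribution can be long enough to do serious harm.

> *"Telephone operators were completely automated, but it took 20 years even though the technology existed. That is why it became a gradual, seeping change. A huge sector did not vanish in an instant."*
> – Alex Imas

## [25:57] How to tax and redistribute AI wealth

Imas sorts redistribution tools along two axes: complexity of implementation and time until the effect shows up. A negative income tax takes effect on the day it is enacted and guarantees a minimum income right away. Universal basic capital gives every citizen equity in AI-related companies, but it takes years before returns materialize. UBI sits in between. The issue is not only speed but political durability. Programs that depend on direct government transfers are at the mercy of whoever wins the next election, whereas widely distributed equity holdings are harder to expropriate because the assets are dispersed.

Trammell separates the funding question from the distribution question. How the money is raised (wealth tax, capital gains tax, land value tax, corporate tax) is analytically distinct from how it is returned (cash, equity, public services). A Georgist land value tax is often discussed, but he points out that it is insufficient as a revenue source at the scale AI-era redistribution requires, because the wealth AI creates is concentrated in software and compute rather than land. Phil suggests that using tax revenue to have citizens broadly acquire equity in AI companies could reconcile political stability with economic efficiency.

> *"Right now we have an asset, our labor, and it turns into income. If that goes away and we are left relying on elected politicians for our basic needs, that is a different story."*
> – Alex Imas

## [30:02] Why a demand collapse is unlikely

Dwarkesh pokes at the white-collar collapse narrative: does data showing AI-driven mass unemployment already exist? Imas cites Yale Budget Lab data and says that at most a weak signal is visible. Hiring of junior software engineers is slightly below trend, but demand for senior engineers is flat or even rising. There is no level shift in unemployment across white-collar work as a whole. O-ring complementarity (detailed in the next chapter) is one explanation, but there are behavioral reasons too: some firms adopt AI performatively to signal modernity, cutting headcount or maximizing token usage, sometimes actually hurting productivity. Taken as a whole, the demand question raises the issue of whether software follows the same elasticity rules as physical goods. You stop eating once you have eaten, but does demand for software stop? Imas and Dwarkesh argue that software may be elastic enough that demand keeps up as prices fall. The history of computing shows that cheap compute has always generated more demand rather than a demand collapse. The main risk is saturation in specific goods, not a problem for labor demand as a whole.

> *"There may be a bit of a signal that junior developers are getting hired less than before, but that is 'less than before,' not a level shift. Demand for senior software engineers is, if anything, increasing."*
> – Alex Imas

## [39:26] The difficulty of integrating human employees into a machine economy

The O-ring model, named after the Challenger disaster in which the failure of a single part destroyed everything, explains both why AI automation today is slower than expected and why future automation might structurally exclude humans. Today, even if 90% of a legal or accounting workflow can be automated, clients still want a human sign-off, because a failure at one point can invalidate the whole output. This reliability constraint keeps humans employed even when AI capability is high.

Phil Trammell flips the logic for the future. Once AI is advanced enough and production flows are built around machine labor alone, interactions happen at machine speed and in machine-native representations. The coordination cost of inserting a human then becomes the bottleneck. Even where humans hold a comparative advantage in a narrow area, coordination overhead and reliability mismatch make it cheaper to route around them. The O-ring cuts both ways.

> *"Beyond the argument that humans become more expensive or less capable, you get entire production flows built for AI labor: flows that communicate in neural representations and think thousands of times faster."*
> – Dwarkesh Patel

## [43:08] What if some humans (or AIs) are intrinsically oriented toward accumulating wealth?

The longest chapter covers the most speculative territory. Dwarkesh points out that evolution built particular preferences into humans: drives toward resource accumulation, status, and reproduction. Those now shape a 100-trillion-dollar world economy. Similar selection pressure will apply to AI agents: agents trained and deployed in ways that encourage accumulation outcompete those that are not, and survive. This does not require catastrophic goal misalignment; it is just the logic of selection applied to a new substrate.

Phil Trammell lays out the steady-state math. If even a small fraction of the population, human or AI, has a high elasticity of substitution between present and future consumption (those who do not saturate on consumption and keep seeking capital), then in the long run those agents own most of the wealth and determine what the economy produces. The capital share approaches 1.0 not because AIs are collectively greedy, but because preference heterogeneity and compounding concentrate assets in the most patient accumulators.

> *"In the long run, they end up with most of the wealth. And the capital share of the whole economy basically becomes the capital share of those people's spending. That becomes 1."*
> – Phil Trammell

The discussion then turns to discount rates and interest rates. If AI-driven growth is extremely fast, near-term consumption becomes cheap relative to far-future consumption, which in theory should lower the incentive to save and compress interest rates. But hyperbolic discounters and accumulation-minded agents may not respond to standard price signals in the usual way, and both guests acknowledge they are at the limit of what economic models can cleanly resolve.

## [61:28] What should developing countries do?

Imas points out that middle-income and developing countries are almost entirely absent from mainstream AI economics, and says part of the responsibility lies with himself and his field. Two scenarios bracket the problem. In the optimistic one, open-weight models diffuse quickly and give Nigeria or India a capability uplift at near-zero cost, much as mobile banking leapfrogged the absence of traditional banking infrastructure. In the pessimistic one, AI automates goods production inside developed countries and removes the manufacturing-export ladder that East Asian countries used as their foothold for industrialization. The key variable is how concentrated the benefits are. Alex draws an analogy to electricity: it was produced by natural monopolies, but the downstream gains did not concentrate in the utilities and instead spread widely to users. If AI follows a similar pattern, with commoditized access and competitive downstream industries, developing countries can be net beneficiaries. But if it follows the social media pattern, where a handful of platforms captured most of the value, the concentration of inequality compounds. Phil argues that developing-country governments should consider sovereign wealth funds that invest early in the AI supply chain as a hedge against the export-collapse scenario.

> *"There is a scenario where AI technology permeates Nigeria and the developing world and levels the playing field. You get a capability uplift. But there is also a scenario where they train no models, own no hardware, and are left behind entirely."*
> – Alex Imas

## Entities

- **Alex Imas** (Person): Director of AGI Economics at Google DeepMind and professor of economics at the University of Chicago. Studies behavioral economics and the macroeconomic effects of AI.
- **Phil Trammell** (Person): Head of economics at Epoch and a researcher at Stanford University. Studies the economics of transformative AI and, at the Global Priorities Institute, patient philanthropy.
- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast. Conducts long-form interviews at the intersection of science, technology, economics, and policy.
- **Relational sector** (Concept): Goods and services where human presence itself is the core of the value, such as therapy, artisanal crafts, and live music. Predicted to grow as a share of the economy as AI saturates substitutable output.
- **O-ring theory** (Concept): A model of production in which one unreliable component invalidates the entire output. Explains both the limits of AI automation today and why future machine-led production flows could structurally exclude human labor.
- **Capital share** (Concept): The fraction of national income that flows to owners of capital rather than workers. The counterintuitive thesis that full automation might shrink it is the core of the episode.
- **Universal basic capital** (Concept): A redistribution policy that gives citizens equity in productive assets, including AI companies, rather than cash. Argued to be more politically durable than UBI.
- **Epoch** (Organization): Research institute focused on AI timelines and macroeconomic forecasting. Phil Trammell heads its economics team.
- **Yale Budget Lab** (Organization): Research center publishing empirical data on AI's labor-market effects. Cited for reporting no level shift in white-collar unemployment as of mid-2026.
- **Land value tax / Georgist tax** (Concept): A tax on the value of unimproved land. Considered insufficient as the revenue source AI-era redistribution requires, because the wealth AI creates is concentrated in software and compute rather than land.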
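
The steady-state argument in the [43:08] chapter (a small group of patient accumulators ends up holding most of the wealth) can be illustrated with a toy compounding model. This is a minimal sketch under loud assumptions: it swaps the elasticity-of-substitution framing from the episode for two fixed reinvestment rates, and the function name `patient_wealth_share` and every default number are hypothetical, chosen only to make the mechanism visible.

```python
def patient_wealth_share(years: int,
                         patient_fraction: float = 0.01,
                         rate_of_return: float = 0.05,
                         patient_reinvest: float = 1.0,
                         others_reinvest: float = 0.5) -> float:
    """Share of total wealth held by the patient group after `years`.

    Toy model (all parameters are illustrative assumptions): both groups
    earn the same return, but the patient group reinvests all of it while
    everyone else consumes half of theirs.
    """
    patient = patient_fraction          # initial wealth of the patient group
    others = 1.0 - patient_fraction     # initial wealth of everyone else
    for _ in range(years):
        patient *= 1.0 + rate_of_return * patient_reinvest
        others *= 1.0 + rate_of_return * others_reinvest
    return patient / (patient + others)


if __name__ == "__main__":
    for horizon in (0, 100, 200, 400):
        print(horizon, round(patient_wealth_share(horizon), 4))
```

With these assumed defaults the patient 1% passes half of all wealth after roughly two centuries and keeps climbing toward 1. The point is only the direction of travel: preference heterogeneity plus compounding is enough, with no change in returns or capability.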

#agi-economics #labor-share #automation

Chip design from the bottom up – Reiner Pope


Reiner Pope, CEO of MatX and former Google Brain TPU architect, gives Dwarkesh Patel a blackboard-style lecture on chip design from first principles. Starting with AND and NOT gates, Reiner works up through register files, systolic arrays, clock synchronization, FPGAs, cache hierarchies, and finally the structural difference between a GPU and a TPU. The throughline is a single engineering tension: every compute unit is wasted if the chip spends its time moving data rather than multiplying numbers.

## [00:00] Building a multiply-accumulate from logic gates

Reiner starts at the bottom: AND, OR, and NOT gates, wired together as metal traces on silicon. The key operation AI chips want to run is matrix multiplication, and inside that the primitive is a multiply-accumulate: multiply two numbers, add the result into an accumulator. Reiner walks through how a full adder is assembled from a handful of XOR and AND gates, and how those cascade into a bit-serial multiplier and ultimately a floating-point MAC. The precision hierarchy matters here: accumulating low-precision multiplications requires higher-precision accumulators, which is why AI chips run 8-bit multiply but 32-bit accumulate.

> *"The main function that AI chips want to compute is the multiplication of matrices. Inside that, the fundamental primitive is a multiply-accumulate of pairs of numbers."*

## [16:20] Muxes and the cost of data movement

Before Tensor Cores, GPUs and CPUs used the same structure: a register file holding a few dozen values, feeding into an ALU, writing back to the register file. Reiner shows that a mux, a circuit that selects between multiple inputs, is the hardware tool that lets you address arbitrary registers, and that the cost of this generality is measured in area and energy. Every read from an eight-entry register file requires a mux tree of depth three; every write requires a decoder of the same size. The bottleneck for AI workloads isn't the multiply itself but the round-trip through that register file.

> *"We want to analyze the cost of the data movement from the register file to the ALU and back."*

## [25:59] How systolic arrays work

The key insight behind TPUs: instead of doing one multiply-accumulate at a time and writing back to registers, bake an entire matrix-vector loop into hardware. A systolic array is a grid of MAC units where each cell passes its partial sum to the right and its input operand downward, so data flows through without ever touching a register file. Reiner explains the two wins this buys: more compute per unit of data fetched, and the ability to keep operands resident inside the array for the full inner product instead of re-loading them. The trade-off is inflexibility: you can only efficiently run the exact loop shape the hardware was designed for.

> *"The idea of a systolic array is to go two levels of loops up and bake this entire loop out here into hardware."*

## [39:00] Clock cycles and pipeline registers

With 100 billion transistors on a chip, synchronization between parallel units is non-negotiable. Reiner explains the clock: every nanosecond or so, the chip pauses all computation for a synchronization pulse before the next operation. Clock frequency is set by the longest combinational path, the deepest chain of logic gates that a signal must traverse in one cycle. Pipeline registers chop that path into shorter stages, letting each shorter segment run at a higher frequency, at the cost of latency: a fully pipelined 32-stage multiplier produces one result per cycle but takes 32 cycles for any single multiplication.

> *"Every nanosecond or so, all circuitry in the chip will pause for a moment and synchronize. That is the clock cycle."*

## [51:40] FPGAs vs ASICs

An FPGA is a sea of programmable logic blocks: lookup tables and flip-flops that can be wired together in software. An ASIC is a chip taped out for one purpose. Conceptually they're the same: AND/OR gates in a fixed clock cycle. The economics diverge at first copy: an FPGA costs $10K to program; a first ASIC tape-out costs $30M. FPGAs make sense for workloads that change monthly and need deterministic latency at high speed with less care about energy or throughput. Jane Street uses them for high-frequency trading exactly because the clock cycle is deterministic: no cache misses, no branch prediction, no interrupts.

> *"The first FPGA costs you $10,000, whereas the first ASIC you make costs $30 million because it requires an entire tape-out."*

## [63:14] Cache vs scratchpad

CPUs are non-deterministic partly because of the L1/L2 cache: a small fast memory that speculatively stores data the processor thinks it will need next. Cache misses, when the prediction is wrong, stall execution for hundreds of cycles. AI accelerators replace the cache with a scratchpad: explicitly programmer-managed SRAM where the compiler decides exactly what lives there and when. Groq and TPUs both advertise deterministic latency because they use scratchpads instead of caches. The scratchpad is simpler and faster but shifts the burden to the compiler.

> *"Probably the most important source of non-determinism on a CPU is the CPU cache itself."*

## [67:16] Why CPU cores are much bigger than GPU cores

A modern CPU has maybe 100 cores, each taking up far more die area than one of a GPU's hundreds of SMs. The reason: CPU cores carry enormous out-of-order execution machinery (reorder buffers, branch predictors, speculative execution units), all aimed at keeping a single thread running fast on unpredictable workloads. A GPU SM strips most of that out. It runs many simple threads in lockstep (a warp), and when one thread stalls on a memory load, the hardware instantly switches to another warp at zero cost. The CPU pays silicon for per-thread speed; the GPU pays silicon for throughput across thousands of parallel threads.

> *"If there are so few cores, what are you spending all of the die on?"*

## [71:49] Brains vs chips

Dwarkesh pushes Reiner on the brain-versus-chip comparison. Two genuine differences: the brain has unstructured sparsity (any neuron can connect to any other), while hardware accelerators use structured sparsity (aligned blocks); and the brain's clock runs at tens of hertz versus gigahertz on silicon. Reiner notes that co-location of memory and compute, often cited as a brain advantage, is also present in modern AI chips: the weights sit in HBM right next to the matrix units. The energy constraint is the more interesting gap: the brain runs on 20 watts, chips on kilowatts, which may reflect fundamental differences in what the brain is optimized to do.

> *"This is exactly the co-location, in some sense, of the memory and compute."*

## [75:22] A GPU is just a bunch of tiny TPUs

At the top level, a TPU has a handful of large systolic arrays plus a vector unit. A GPU has hundreds of SMs, each of which contains a small matrix unit and a small vector unit, essentially a miniaturized TPU. The architectural difference is granularity: a TPU commits to a few large matrix operations; a GPU runs thousands of smaller ones in parallel. Inside each SM, Tensor Cores add a fixed-function matrix unit on top of the original scalar/vector pipeline, making modern GPUs a hybrid of the two paradigms. The "GPU is just tiny TPUs" framing collapses what seemed like fundamentally different architectures into a single continuum.

> *"You can think of scaling this thing down into a really tiny unit with a smaller matrix unit and a smaller vector unit, and that is sort of what an SM is."*

## Entities

- **Reiner Pope** (Person): CEO and co-founder of MatX; previously led TPU software and compiler work at Google Brain
- **Dwarkesh Patel** (Person): host of the Dwarkesh Podcast; angel investor in MatX
- **MatX** (Organization): AI chip startup building inference accelerators
- **Google / Google Brain** (Organization): where Reiner worked on TPU architecture before MatX
- **Jane Street** (Organization): high-frequency trading firm that relies on FPGAs for deterministic latency
- **Groq** (Organization): AI inference chip company that advertises deterministic latency via scratchpad architecture
- **Multiply-Accumulate (MAC)** (Concept): the fundamental operation of neural network inference: multiply two numbers, add into an accumulator
- **Systolic Array** (Concept): a grid of MACs that passes data between cells without touching a register file, enabling high compute-to-bandwidth ratios
- **FPGA** (Technology): Field-Programmable Gate Array; reprogrammable logic fabric used where workloads change frequently
- **ASIC** (Technology): Application-Specific Integrated Circuit; custom silicon optimized for one workload
- **TPU** (Technology): Google's Tensor Processing Unit, organized around a few large systolic arrays
- **SM / Streaming Multiprocessor** (Technology): the GPU core unit, containing scalar, vector, and matrix (Tensor Core) execution resources
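
The systolic-array description in the [25:59] section (partial sums moving right, input operands moving down, weights staying resident in the cells) can be sketched as a cycle-by-cycle simulation. What follows is a minimal illustration of a weight-stationary matrix-vector product, not a model of any real TPU: the function name `systolic_matvec`, the one-cycle-per-hop timing, and the skewed injection schedule are assumptions made for the example.

```python
def systolic_matvec(W, x):
    """Compute y = W @ x on a simulated weight-stationary systolic array.

    Cell (i, j) permanently holds W[i][j]. Each cycle it multiplies the
    operand arriving from above by its weight, adds the partial sum arriving
    from the left, then passes the operand down and the new sum right.
    """
    n, m = len(W), len(W[0])
    x_reg = [[0] * m for _ in range(n)]  # operand latched by each cell
    p_reg = [[0] * m for _ in range(n)]  # partial sum latched by each cell
    y = [0] * n
    for t in range(n + m - 1):           # cycles until the last sum exits
        new_x = [[0] * m for _ in range(n)]
        new_p = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if i == 0:
                    # Row 0 is fed from outside; x[j] enters at cycle j so it
                    # meets the partial sum that has travelled j hops right.
                    x_in = x[j] if t == j else 0
                else:
                    x_in = x_reg[i - 1][j]            # from the cell above
                p_in = p_reg[i][j - 1] if j > 0 else 0  # from the cell to the left
                new_p[i][j] = p_in + W[i][j] * x_in     # the multiply-accumulate
                new_x[i][j] = x_in
        x_reg, p_reg = new_x, new_p      # all cells latch on the clock edge
        for i in range(n):
            if t == i + m - 1:           # row i's finished sum leaves the array
                y[i] = p_reg[i][m - 1]
    return y


if __name__ == "__main__":
    print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))
```

Note that no value is ever written back to a register file: each operand is fetched once and reused by every row it passes through, which is the compute-per-byte win the lecture attributes to this layout. The price is that the loop shape is fixed by the grid.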

#chip-design #hardware #ai-accelerators

Building AlphaGo from scratch – Eric Jang

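Eric Jang used a sabbatical to reimplement AlphaGo with modern tools and published the process as a technical walkthrough of about two and a half hours. It is also an experiment that sheds light on how RL works, bringing out the fundamental limits of the naive policy-gradient methods built into LLM training and how MCTS sidesteps them. The conversation starts with the rules of Go, proceeds through MCTS, neural architecture, self-play training, and off-policy data, and closes with Jang's observations from running an automated AI research loop on his own project.

## [00:00] Go basics

Go was conquered not by solving it outright with brute-force search but by approximation. Jang's stated motivation for attempting the reimplementation was the puzzle of how a 10-layer network can "amortize" the cost of a game tree so large that exhaustively searching it would exceed the number of atoms in the universe. The opening covers rules such as territory control, the liberties of connected groups, and forbidden moves (ko), along with the Tromp-Taylor scoring convention, which resolves ambiguous positions algorithmically rather than by human agreement. The scoring difference matters because it feeds directly into how a computer evaluates positions. A human can glance at a surrounded group and accept its fate, but a computer needs explicit rules for counting contested intersections at the end of the game.

> *"When I saw AlphaGo's early breakthroughs around 2014, 2015, 2016, I was deeply impressed at how sophisticated AI systems could become and at what classes of computational complexity deep learning could take on."*

## [08:06] Monte Carlo tree search

361 legal moves, 300-move games, a search space larger than the number of atoms in the universe: instead of fully expanding that game tree, AlphaGo uses MCTS to choose interactively which branches to grow. The core data structure is a node per position, holding a visit count and a Q-value (a running average of the win rate over all rollouts passing through that node). The action-selection formula (PUCT) balances exploitation and exploration: a logarithmically growing bonus pushes the algorithm toward unvisited nodes and decays as simulations accumulate and Q stabilizes. Jang traces why this UCB-derived approach keeps regret bounded; why, given Go's deterministic nature, the probabilities in MCTS are an artifact of Monte Carlo averaging rather than true stochasticity; and how transposition-equivalent positions can be merged to prune the search tree.

> *"AlphaGo's core breakthrough was using neural nets to make this search problem tractable."*

## [31:53] The role of the neural networks

Two networks replace two expensive operations inside MCTS. The value network maps a position to a scalar win probability, removing the need to play the game out to the end. The policy network outputs a distribution over legal moves, concentrating the search tree on promising children and cutting off the long tail of irrelevant moves.

Jang tried both a ResNet and a Transformer in his reimplementation. In the small-data regime of a personal GPU setup, the ResNet beat the Transformer. Transformers need global attention to connect distant board features, but they also need more data to learn local invariances. KataGo's key architectural insight was to explicitly pool global features through the residual stack so that fights at opposite ends of the 19×19 board can influence each other without full attention.

> *"In the small-data regime, in my experience ResNets still beat Transformers and give you more bang for the buck on a low budget."*

## [01:00:22] Self-play

Self-play is where AlphaGo bootstraps from knowing nothing to superhuman strength. After every game, MCTS produces a move distribution that is sharper, more peaked, than the raw policy network prior, and that distribution becomes the training target for the policy head. Because the policy network is distilled toward the MCTS output, the next generation of games starts from a better prior and gets a larger improvement per search step.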
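
The distillation step described above (MCTS visit counts become the policy head's training target) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Jang's code: the helper names `visit_counts_to_target` and `cross_entropy`, the temperature parameter, and the example counts are all hypothetical, and the temperature-scaled normalization follows the commonly published AlphaGo Zero recipe.

```python
import math


def visit_counts_to_target(visit_counts, temperature=1.0):
    """Turn MCTS root visit counts into a policy training target.

    Assumed recipe: target(a) is proportional to N(a) ** (1 / temperature).
    Lower temperatures make the target more peaked.
    """
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]


def cross_entropy(target, predicted, eps=1e-12):
    """Loss the policy head would minimize against the MCTS target."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


if __name__ == "__main__":
    counts = [700, 200, 100]         # hypothetical visits for three moves
    prior = [1 / 3, 1 / 3, 1 / 3]    # hypothetical untrained policy output
    target = visit_counts_to_target(counts)
    print("target:", target)
    print("loss at prior:", cross_entropy(target, prior))
    print("loss at target:", cross_entropy(target, target))
```

The loss is lowest when the policy already predicts the search result, so the gradient vanishes only once the network matches MCTS. Every position searched produces such a target, regardless of who eventually wins the game.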
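Jang frames this as inference-time scaling that pays compounding dividends. Distilling 1,000 MCTS simulations into the policy network moves the starting point of the next training round forward. The second 1,000 steps then deliver a win rate that would take more than 2,000 steps without distillation. Crucially, every move of every game produces a training target, not just the winner's moves, which is why the variance of the learning signal is far lower than with naive policy-gradient methods.

> *"The beauty of AlphaGo training itself is that it can take the result of this final search process and tell the policy network: just predict from the start the conclusion MCTS had to work to reach."*

## [01:25:27] Alternative RL approaches

Jang sets up a careful thought experiment: what happens if you replace the MCTS objective with the naive policy-gradient method LLMs use, namely find the winner of the game and reinforce every move in that game? In a balanced league of 100 agents, when one side edges out a 51-to-49 win because of a single decisive mistake, the training data is overwhelmingly diluted by moves that carry no signal. That one informative move is buried among roughly 30,000 irrelevant ones. This credit-assignment problem is the fundamental reason advantage functions and baselines exist in RL. Subtracting a value baseline turns the raw return signal into an advantage, meaning how much better each action was than average, and sharply reduces gradient variance. Q-learning and TD methods approximate that advantage without full rollouts, which makes them important in domains where MCTS cannot be used.

> *"What this algorithm does is exhaustively search with MCTS for a better move for every action taken, and have the policy network predict that result from the start, so that every action gets improved."*

## [01:45:36] Why MCTS doesn't work for LLMs

The PUCT search formula assumes a bounded, discrete action space and a value function that generalizes across positions. Go satisfies both. LLM reasoning satisfies neither. The token vocabulary is so large that the same subsequence is almost never encountered again, and there is no position-level value function that can reliably judge whether a thought in progress is on track to solve the problem.

Jang notes that LLMs superficially show tree-search-like behavior (reconsidering, backtracking, hedging), but this arises from in-context behavior rather than explicit tree construction. He does not rule out forward search returning in some form, especially in domains like mathematics where intermediate states have a stricter logical structure. The fundamental bottleneck is the absence of a value function that is both reliable at the token level and query-efficient.

> *"With LLMs you almost never sample the same child node more than once. Language is so open-ended that, once there are multiple steps of thought, a discrete action set is not a good fit for LLMs."*

## [02:00:58] Off-policy learning

Dwarkesh raises a puzzle: every AI researcher is wary of off-policy learning, so why does AlphaGo Zero work fine with a replay buffer full of games generated by old policy versions? Jang resolves it through the lens of DAgger. What matters is not whether the data is strictly on-policy but whether the state distribution in the buffer covers the states the current policy actually visits, plus a reasonable neighborhood around them. The replay buffer works for AlphaGo because game states from recent checkpoints stay close to the current policy's distribution. The failure mode is labeling states far from the current policy and teaching the agent optimal actions for positions it never reaches, which is a real risk in robotics, where distribution shift is severe. The practical recipe that came out of systems like QT-Opt is to use off-policy data for reward shaping while keeping the policy gradient on-policy.

> *"What you want with an algorithm like this is data where most of the states are ones you are likely to visit, with some fraction of states inside a high-dimensional tube around the optimal trajectory."*

## [02:11:51] RL sample efficiency is worse than you thought

Dwarkesh lays out a two-dimensional inefficiency argument. The first dimension is the familiar one: policy-gradient RL needs a complete trajectory rollout before any learning signal arrives, so samples per FLOP plummet as agents take on longer-horizon tasks. The second is bits per sample. An LLM with a 100K-token vocabulary trying to discover "blue" by random sampling needs on the order of 100,000 rollouts just to see one success. A supervised cross-entropy loss, by contrast, tells the model at every step exactly how far its distribution was from "blue".

MCTS sidesteps both problems. It generates a training target at every move, and that target is always better than the current policy, rather than a binary win/loss signal spread thinly over thousands of tokens. As Jang observes, MCTS never leaves you with no signal at all unless the policy network has fully converged to the MCTS distribution.

> *"The situation where MCTS gives you no signal at all never happens unless the MCTS distribution exactly matches the policy network's prediction."*

## [02:22:05] The automated AI researcher

Jang ran most of his AlphaGo project through an automated LLM coding loop and reports from the ground on where automating AI research works and where it fails. On hyperparameter optimization, current models do work on par with a graduate student: diagnosing gradient-flow problems, rewriting data-loader augmentation, and squeezing out measurable perplexity improvements within a fixed budget. For running experiments and generating plots too, a brief skill description yields a complete experiment suite with analysis. What the models cannot do reliably is lateral thinking: recognizing that a research direction is structurally unpromising and jumping to a different angle before piling on more experiments. Jang hit this repeatedly. The model keeps digging in a dead-end direction and never questions whether the direction is right. His hypothesis is that this is a training-signal problem: building RL environments with a proper outer loop, as in Go, might eventually let models escape local research dead ends.

> *"I feel that the publicly available closed models today are not very good at choosing the next experiment within a track. They can't seem to step into the lateral thinking of 'wait, does this track even make sense?'"*

## Entities

- **Eric Jang** (Person): VP of AI at 1X Robotics and former senior research scientist at Google Brain/DeepMind Robotics. Reimplemented AlphaGo during his sabbatical.
- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast. Co-develops the bits/FLOP analysis of RL inefficiency during the interview.
- **AlphaGo / AlphaZero** (Software): DeepMind's Go-playing AI systems, combining MCTS with deep neural networks; the central technical subject of the episode.
- **KataGo** (Software): Open-source Go engine by David Wu (Jane Street). Cut compute by 40x compared with AlphaGo Zero. Jang's main reference implementation.
- **Monte Carlo tree search (MCTS)** (Concept): Iterative search algorithm that balances exploitation and exploration via UCB/PUCT. The episode's central analytical lens.
- **Credit-assignment problem** (Concept): The difficulty in RL of identifying which actions in a long trajectory produced a good outcome. Motivates advantage functions, baselines, and value networks.
- **DAgger** (Concept): The Dataset Aggregation algorithm. Explains why AlphaGo's replay buffer is acceptable as long as states in the buffer stay close to the current policy's distribution.
- **Andrej Karpathy** (Person): Cited for describing the sparse learning signal of policy-gradient RL as "sucking supervision through a straw."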
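
The PUCT selection rule from the [08:06] section can be written out concretely. The sketch below is an illustration under assumptions rather than the episode's code: the `Node` class, `select_child`, and the constant `c_puct=1.5` are hypothetical names and values, and the exploration term uses the square-root form published for AlphaGo Zero rather than the logarithmic UCB1-style bonus the summary mentions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # P(s, a) from the policy network
    visit_count: int = 0         # N(s, a)
    value_sum: float = 0.0       # sum of values backed up through this node
    children: dict = field(default_factory=dict)

    def q(self) -> float:
        """Mean backed-up value; 0 for a child that was never visited."""
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(parent: Node, c_puct: float = 1.5):
    """Return the (move, child) pair maximizing Q + U.

    U = c_puct * P * sqrt(total sibling visits) / (1 + N): large for
    high-prior, rarely visited moves, shrinking as a move gets explored.
    """
    total = sum(child.visit_count for child in parent.children.values())

    def score(child: Node) -> float:
        u = c_puct * child.prior * math.sqrt(total) / (1 + child.visit_count)
        return child.q() + u

    return max(parent.children.items(), key=lambda item: score(item[1]))


if __name__ == "__main__":
    root = Node(prior=1.0)
    root.children["a"] = Node(prior=0.6, visit_count=10, value_sum=5.0)
    root.children["b"] = Node(prior=0.4)  # never visited yet
    move, _ = select_child(root)
    print("selected:", move)
```

In this example the unvisited move wins the selection despite its lower prior, because its exploration bonus has not yet been divided down by visits. Once it has been tried many times with poor results, the Q term takes over and the search returns to the stronger move.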

#alphago #monte-carlo-tree-search #reinforcement-learning

AI won't replace mathematicians yet – Terence Tao

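Terence Tao discusses the changing role of AI in mathematics, arguing that AI will automate much routine work but, rather than fully replacing human mathematicians, will shift the focus of research to new frontiers. He stresses a future of human-AI collaboration and the unpredictability of AI's long-term impact on scientific discovery.

## [00:10] AI's current role in frontier mathematics

Terence Tao explains that AI is already doing "frontier mathematics" that humans cannot, but that it is a different kind of frontier from the one we are used to. He likens this to how calculators once took on, in a specialized way, tasks beyond human ability and expanded what mathematics could do.

> *In a sense, they are already doing superintelligent frontier mathematics that humans cannot do, but it is different from the frontier we are used to.*

## [00:52] AI is an automation tool, not a replacement

Tao predicts that within 10 years AI will take over much of the routine work mathematicians do today, letting humans concentrate on more complex and important problems. He points to historical transitions: computers automating the work once done by human "computers," and genetics continuing to develop at a new scale after genome sequencing was automated.

> *Within 10 years, much of what mathematicians do now ... AI will be able to do. But it will turn out that this was not the most important part of our work.*

## [02:46] The future of human-AI collaboration in mathematics

Dwarkesh Patel asks whether AI could solve a Millennium Prize Problem autonomously. Terence Tao believes "human + AI hybrids" will dominate mathematics for a long time to come, because current AI still lacks the ingredients needed to fully substitute for intellectual work and functions as a complementary tool.

> *I believe human + AI hybrids will dominate mathematics for a very long time.*

## [03:43] Unpredictable effects on scientific discovery

Tao acknowledges that while AI accelerates science and new discoveries, it could also hinder certain kinds of progress by "destroying serendipity." He concludes that AI's future impact on scientific discovery is extremely hard to predict.

> *AI could actually inhibit certain types of progress by destroying serendipity in some way.*

## Entities and concepts

- **Terence Tao (陶哲軒)** (Person): Guest; one of the leading mathematicians of our time.
- **Dwarkesh Patel** (Person): Host of the podcast.
- **AI** (Concept): Artificial intelligence; its role in mathematics and scientific discovery is discussed.
- **Mathematica / Wolfram Alpha** (Software): Computational tools mentioned as examples of automation in mathematics.
- **Millennium Prize Problems** (Concept): Seven mathematics problems selected by the Clay Mathematics Institute, each carrying a one-million-dollar prize; six remain unsolved.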

#ai #mathematics #terence-tao

Terence Tao – How the world's top mathematician uses AI

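Tao and Dwarkesh use Kepler's discovery of planetary motion as a lens on what AI is actually doing for science. Tao argues that hypothesis generation has become nearly free, so the bottleneck has moved to evaluation, peer review, and the test of time. Today's AI wins on breadth (trying every standard technique on every problem) and humans win on depth (building up partial progress), so hybrid setups will dominate mathematics for at least another decade.

## [00:00] Kepler was a high-temperature LLM

Tao recounts how Kepler arrived at the three laws of planetary motion. Kepler started from a wrong but beautiful theory, a model inscribing the Platonic solids between the planetary orbits, and abandoned it only after years of checking it against Tycho Brahe's naked-eye observational data, which he had stolen. Elliptical orbits, the equal-areas law, and the cube-square law came out of a decade of data analysis, and Newton's explanation came a century later. Dwarkesh's read: Kepler resembles a high-temperature LLM cycling through random relationships against a verifiable dataset. Tao agrees on the mechanism but disagrees about the bottleneck. Idea generation was already cheap: Kepler had no shortage of theories. What he needed was Brahe's data, an order of magnitude better than anything before, and the patience to discard ideas the data refuted.

> *But as you say, unless it comes with an equal amount of verification, it is just slop.*

## [11:44] How would you notice a new unifying concept in a mountain of AI slop?

Tao: if AI has pushed the cost of idea generation to nearly zero, then peer review and the test of time become the new constraint. Journals are already flooded with AI-generated submissions. The standing of any idea depends on how later science treats it (Copernicus was less accurate than Ptolemy until Kepler completed the picture), so it is hard to automate the evaluation from within that moment. Dwarkesh asks how science would find a Bell Labs-style unifying concept (Shannon's bit, the transformer) buried under millions of mediocre papers. Tao's answer points at a part humans may keep doing. Scientists do not just produce theories; they tell a story that makes other scientists willing to spend years pursuing them. Darwin's prose did work that Newton's Latin equations could not.

> *AI has pushed the cost of idea generation to nearly zero, in much the same way the internet pushed the cost of communication to nearly zero.*

## [26:10] The deductive overhang

Tao talks about untapped signal sitting in existing data. For centuries astronomy was the discipline of extracting maximum information from minimal data, which is also why quant hedge funds preferentially hire astronomy PhDs. One of his favorite examples: researchers measured how often scientists actually read the papers they cite by tracking which typos propagate through citation chains. He proposes applying the same sociology-of-science approach to AI progress itself: mining citation patterns, conference mentions, and other traces to detect whether a result actually constituted an advance, rather than slowly waiting for the test of time.

> *One lesson was that in many fields the deductive overhang may be far larger than people imagine.*

## [30:31] Selection bias in reports of AI discoveries

AI solved roughly 50 of the approximately 1,100 Erdős problems and then plateaued. Tao explains the selection effect. Those 50 had almost no literature: combining one obscure technique with one known result was enough, and AI tools are good at "trying every standard combination." If 80% of a problem yields to existing methods, AI can clear it. When a genuinely new technique is needed the tools stall, and the per-problem success rate in a systematic sweep is 1 to 2%. Tao's metaphor: AI tools are jumping robots released into mountainous terrain in the dark. They can clear low walls that humans cannot reach, but they cannot grab a handhold, stay there, and pull themselves up from partial progress. The bullish reading, that once AI reaches a certain level you can run a million copies in parallel on one problem, which no human community can do, is also a structural reason science needs a new paradigm that actually exploits breadth.

> *AI is better at breadth; humans, or at least human experts, are better at depth.*

## [46:43] AI makes papers richer and broader, but not deeper

On Tao's own working pattern: his papers now contain more code, more figures, and deeper literature surveys, because the cost of auxiliary work has dropped to roughly a fifth. The actual core, solving the hardest part of the problem, still happens with pen and paper. Since only the auxiliary tasks have changed, and the speed of answering the question he was working on has not, it is hard to say he has become "twice as productive." The difference between cleverness and intelligence lands in the same place. When two humans work on a math problem, each failed prototype becomes the next foothold. With current AI, a new session forgets what the previous session achieved. The cumulative pulling-up step is missing; what exists is pure trial and error and, eventually, absorption into the next training run.

> *It is making papers richer and broader, but not necessarily deeper.*

## [53:00] When AI solves a problem, can humans gain understanding from it?

Could AI prove the Riemann hypothesis in Lean and leave humans none the wiser? Tao is not worried. Lean has the property that proofs can be broken down to the atomic level: each lemma can be independently inspected, ablated, and tested. So even a 3,000-line generated proof becomes raw material. Other AIs can restructure it for elegance, other humans can extract the conceptual content, and the artifact is useful even if the original derivation is opaque. He predicts a whole profession of mathematicians whose job is to take apart giant Lean-generated proofs and find the ideas inside them: a kind of proof archaeology combining human judgment with AI ablation tools.

> *Far more will come out of the interplay of humans working with these tools.*

## [59:20] We need a semi-formal language for how scientists actually talk to each other

Dwarkesh asks what a semi-formal language would look like for mathematical strategy rather than mathematical proof. Tao traces the question through Gauss's prime number theorem, the first major statistical conjecture in mathematics, derived from raw data before any proof existed, and through the twin prime conjecture. Mathematicians believe the latter because the random model of the primes predicts it. Mathematics has both rigorous proofs and rigorous heuristics, but only the proof side has been formalized in a form Lean can verify. The reason the heuristic side has not been formalized: every RL-verifiable evaluator becomes a target for exploitation, and the subjective part, "this argument is convincing," does not yet admit a framework that resists being hacked. Tao wants ways to benchmark conjecture generation and strategy selection at scale, such as running small AIs in toy mathematical universes and observing what strategies emerge.

> *Science has a subjective side that we do not yet know how to incorporate AI into in any useful way.*

## [69:48] How Terry spends his time

On how Tao absorbs a new subfield: he positions himself as a fox in Berlin's sense, knowing a little about everything and becoming a hedgehog when needed. The driver is a completionist compulsion: if another mathematician can prove a result with a technique he does not know, he has to chase down what that technique was. (He quit video games for the same reason.) Collaboration with other mathematicians is the main vehicle, and writing things up on his blog is a memory aid he developed after repeatedly getting burned by forgetting an argument six months later. On the calendar, Tao deliberately leaves room for serendipity. He does not want to optimize his time so heavily that he can no longer attend meetings outside his comfort zone. A year at the Institute for Advanced Study confirmed the trap: two weeks of pure research were wonderful, but after that the inspiration ran out. The chance find on the next shelf over, the casual hallway conversation, and the meeting attended reluctantly were doing far more work than they appeared to.

> *Those incidental interactions may not look optimal, but they are actually really important.*

## [77:05] Human-AI hybrids will dominate math for a lot longer

When will it be just AI doing the math? Tao changes the frame: AI is already doing mathematics humans cannot, as calculators do, just on a different frontier. Probably within 10 years much of what graduate students do now (applying standard techniques, organizing the literature) will shift to AI, but the field will move up a level, as it did when computer algebra systems absorbed symbolic integration. Genomics did not end when sequencing got cheap; it scaled up to ecosystems. Mathematics will do the same. His advice to students entering mathematics now: assume change, but get your credentials the old-fashioned way, since for now there is still no substitute for learning mathematics by the traditional route. At the same time, be adaptable enough to use new modes of research when they appear, including ones that do not exist yet. One notable fact: with AI tools and Lean, a high-school student can contribute to real mathematical research today, which would have been impossible five years ago.

> *I believe human-plus-AI hybrids will dominate mathematics for a lot longer.*

## Entities

- **Terence Tao** (Person): Fields Medalist (2006), mathematician at UCLA. Regularly writes on his blog about AI's role in mathematical research.
- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast. Conducts long-form interviews on AI, science, and technology.
- **Johannes Kepler** (Person): Astronomer (1571-1630). Derived the three laws of planetary motion from Tycho Brahe's observations.
- **Tycho Brahe** (Person): Danish naked-eye astronomer who left decades of planetary observation data; the dataset Kepler needed.
- **Lean** (Software): Proof assistant that formalizes mathematical proofs so they can be verified, decomposed, and ablated.
- **Erdős problems** (Concept): Roughly 1,100 open problems posed by Paul Erdős. AI has solved about 50, mostly ones with almost no literature.
- **Deductive overhang** (Concept): The idea that existing data already contains a vast amount of underived knowledge. Astronomy is the model.
- **Riemann hypothesis** (Concept): Unsolved conjecture about the distribution of primes; the test case for whether an AI proof would advance human mathematical understanding.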

#ai-for-math #terence-tao #kepler
