LaiDub




What does the next training paradigm look like?

19:53



Dwarkesh Patel · 29 days ago


Dwarkesh Patel narrates his essay on where AI training is headed. The labs are betting that scaling RL across millions of verifiable tasks gets you to AGI, but Dwarkesh argues that bet leaves two holes: most valuable skills aren't "grindable" enough to farm in a simulator, and what models learn on the job never makes it back into their weights. He walks through why sample efficiency and continual learning are the same problem, sketches two candidate fixes (on-policy self-distillation and "dreaming"), and imagines an AI that keeps getting smarter from being deployed rather than from pretraining.

## [00:00] The big research bet the labs are making

The labs' working theory: train AIs on millions of verifiable tasks across thousands of RL environments and you'll get a general problem-solver that can grind on open-ended work for weeks. Optimists argue the known deficits (data inefficiency, no continual learning) will get steamrolled by more compute, the same way classic NLP problems collapsed once LLMs scaled. Dwarkesh lays out their strongest counter to his own skepticism: the million-fold sample inefficiency he flagged in his last essay is only a training-time cost, amortized across billions of sessions. What matters is how capable the model is *during* a session, and that keeps improving. Continual learning might not even be needed if context windows grow large enough to hold months of on-the-job experience.

> *People often say that their employees are not net productive until six months or more on the job. So clearly, online learning is necessary for competence. But what if you could just fit those six months into the context window?*

## [02:12] Grindability is just as important as verifiability

Why has computer use lagged coding and math when it's just as verifiable?
Dwarkesh's underrated answer: being verifiable isn't enough. A domain also has to be *grindable*, meaning you can run thousands of parallel rollouts against a deterministic, replayable simulator from the same starting point. A coding repo clones trivially into a container; Amazon's checkout flow does not. This is the canyon wall that AI progress only slowly chips away at. You can sometimes build farmable simulators (clone Slack, clone Gmail), but most high-value skills, such as building a business, winning a court case, or running a profitable trading day, require irreproducible interaction with the real world, where verification takes months and the outcome can't be re-observed across parallel rollouts.

> *What is the RL environment to make an AI that is as good at politics as Lyndon Johnson, or as good at building a space-launch business as Elon Musk?*

## [06:10] Will RLVR alone generalize?

The labs are betting that RLVR generalizes: that enough containerized environments yield an agent that plans, adapts, and picks up new skills inside a single session, good enough to out-advise LBJ on his 1948 Senate race or to build SpaceX with a hundred million dollars. Whether it generalizes that far is an empirical question, and Dwarkesh reads a Dario Amodei quote as a hint that it doesn't stretch indefinitely: short-horizon training may not transfer to long-horizon performance. Even if in-context experience could turn a model into Henry Ford for a session, it's all wasted if the learning can't return to the weights. Between 30% and 50% of a lab's compute goes to inference that currently does nothing to improve the model, even though deployment is exactly where the most valuable information is revealed.
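
The containerized environments the labs are betting on are valuable because they are grindable in the sense of the [02:12] section. A minimal sketch of what that means operationally, assuming a toy stand-in for a real environment: `ToyEnv`, `rollout`, and `grind` are hypothetical names, and the random action choice stands in for sampling from a policy. Because `reset()` always returns the same start state and `step()` is a pure function of state and action, many rollouts can run in parallel from the identical starting point, each is scored by a verifier, and any one of them can be replayed exactly from its seed.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    seed: int
    actions: tuple
    reward: float


class ToyEnv:
    """Hypothetical deterministic environment: a counter that must hit a target exactly."""

    def __init__(self, start: int = 0, target: int = 10, max_steps: int = 20):
        self.start, self.target, self.max_steps = start, target, max_steps

    def reset(self) -> int:
        # Every rollout begins from the identical starting state.
        return self.start

    def step(self, state: int, action: int) -> int:
        # Pure function of (state, action): no hidden randomness, no outside world.
        return state + action

    def verify(self, state: int) -> float:
        # Verifiable reward: 1.0 only if the target was hit exactly.
        return 1.0 if state == self.target else 0.0


def rollout(env: ToyEnv, seed: int) -> Rollout:
    rng = random.Random(seed)  # seeding the "policy" makes the rollout replayable
    state, actions = env.reset(), []
    for _ in range(env.max_steps):
        if state >= env.target:
            break
        action = rng.choice((1, 2, 3))  # stand-in for sampling an action from a policy
        actions.append(action)
        state = env.step(state, action)
    return Rollout(seed, tuple(actions), env.verify(state))


def grind(env: ToyEnv, n: int) -> list:
    # Many independent rollouts from the same start, run concurrently; order is preserved.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(lambda s: rollout(env, s), range(n)))


if __name__ == "__main__":
    env = ToyEnv()
    batch = grind(env, 1000)
    success = sum(r.reward for r in batch) / len(batch)
    print(f"success rate over {len(batch)} rollouts: {success:.2f}")
    # Replayability: re-running seed 42 reproduces the exact same trajectory.
    assert rollout(env, 42) == batch[42]
```

The contrast with a checkout flow, a court case, or a trading day is that none of them offers a `reset()` that returns the world to the same state, so there is nothing to fan out from or replay.
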
> *We've got some genius grad student who's never been allowed to take a real internship, and we keep giving it more and more classroom case studies in the form of RL training on environments.*

## [08:41] Getting the learning back to the weights

Continual learning means updating the weights, not endlessly growing a KV cache: brains don't separate parameters from activations, and they compress what they learn. But moving learning into the weights forfeits in-context learning's sample efficiency, because gradient updates are coarse. That's why every shipped online-learning model (such as Cursor's Tab model, which learns the same accept/reject objective across 400M+ requests a day) learns one identical thing across all users, which defeats the point when every job and company differs. Dwarkesh frames sample efficiency and continual learning as the same problem, then argues that the bottleneck isn't architecture (new sparse-attention and KV-compaction papers ship weekly) but the loss function. His candidate is on-policy self-distillation (OPSD): train the base model to make the same predictions that a context-rich, veteran version of itself would make. OPSD needs no outer-loop reward, gives denser per-token supervision than RL, and keeps RL's sparse-update property, so on-the-job learning doesn't overwrite what the model already knows.

> *The way you get better at your job is not by recalling the transcript of every single thing that happened every day with perfect fidelity. Rather, it's by consolidating the handful of insights and pieces of knowledge that are actually relevant to you getting better at your job.*

## [15:22] Dreaming

The second, more speculative fix: let the AI build a simulation of reality and rehearse against it, experiencing orders of magnitude more samples per unit of wall-clock time. The precedent is EfficientZero, which beat novice humans at unfamiliar Atari games by playing dozens of simulated games in its head per real step.
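
EfficientZero itself builds on MuZero: it plans with Monte Carlo tree search over a learned latent model, which is far too much to show here. As a much simpler, swapped-in illustration of the same idea (many imagined updates per real step), here is a minimal sketch of Sutton's tabular Dyna-Q, not EfficientZero's algorithm. After each real transition the agent stores it in a learned world model and then replays `dreams_per_step` imagined transitions from that model. The six-state chain, `real_step`, `td_update`, `dyna_q`, and every hyperparameter are hypothetical choices for illustration; with a small budget of real steps, a nonzero dream budget should typically spread reward information back to the start state much sooner.

```python
import random

N_STATES = 6        # hypothetical chain: states 0..5, state 5 is the rewarding terminal state
ACTIONS = (-1, +1)  # step left or right


def real_step(s, a):
    """The 'real world': in the essay's framing, the scarce and expensive resource."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def td_update(q, s, a, r, s2, alpha, gamma):
    # One-step Q-learning update; no bootstrapping from the terminal state.
    target = r if s2 == N_STATES - 1 else r + gamma * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def dyna_q(real_steps, dreams_per_step, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned world model: (state, action) -> (next_state, reward)
    s = 0
    for _ in range(real_steps):
        # Epsilon-greedy action selection, breaking ties at random.
        if rng.random() < eps:
            a = rng.choice(ACTIONS)
        else:
            best = max(q[(s, b)] for b in ACTIONS)
            a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
        s2, r = real_step(s, a)
        td_update(q, s, a, r, s2, alpha, gamma)  # learn from the real transition
        model[(s, a)] = (s2, r)                  # and remember it in the world model
        # "Dreaming": rehearse against the model instead of the real world.
        transitions = list(model.items())
        for _ in range(dreams_per_step):
            (ds, da), (ds2, dr) = rng.choice(transitions)
            td_update(q, ds, da, dr, ds2, alpha, gamma)
        s = 0 if s2 == N_STATES - 1 else s2  # restart the episode after reaching the goal
    return q


if __name__ == "__main__":
    for dreams in (0, 50):
        q = dyna_q(real_steps=40, dreams_per_step=dreams)
        print(f"dreams/step={dreams:>2}  Q(start, right)={q[(0, +1)]:.3f}")
```
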
Simulating the whole world is far harder than emulating Go, which is why Dwarkesh flags this as speculative, but if it works, it becomes a fourth scaling axis alongside pretraining, RL, and inference-time compute. Instead of hitting `/compact` to summarize a session, you'd hit `/dream` and burn compute rehearsing against a video-game version of what the model is seeing in production.

> *So instead of hitting /compact in Codex or Cursor or Claude... you hit /dream. And this incinerates huge amounts of compute to build and train against a video-game version of what the model is witnessing in the real world.*

## [17:23] What 2027 looks like

Dwarkesh's scenario: RLVR produces an agent competent enough to start getting real-world experience, context windows stretch to a full week of co-working, and at the end of the week a thumbs-up triggers the base model to distill what it learned, via OPSD, dreaming, or some mix. Each round, the model expands into domains adjacent to what it was last trained or deployed on. The endgame flips how AI improves: capability comes mostly from broad deployment across the economy, not from pretraining before release. Every interaction makes the model smarter, as it learns from your past sessions and from everyone else's, which Dwarkesh calls scary, exciting, and very different from today.

> *Just as pretraining created a base intelligence that was smart enough to become a competent agent with enough RLVR on top, so RLVR has created an agent that is competent enough to actually be broadly deployed in the world.*

## Entities

- **Dwarkesh Patel** (Person): Podcast host and essayist; narrates his own blog post on AI training paradigms.
- **Dario Amodei** (Person): Anthropic CEO, quoted on why model performance degrades at long context.
- **RLVR** (Concept): Reinforcement learning from verifiable rewards: training on reproducible, checkable tasks; the labs' main bet for reaching AGI.
- **Continual learning** (Concept): Updating a model's weights from on-the-job deployment rather than only from pre-release training.
- **Grindability** (Concept): Dwarkesh's term for whether a domain can be farmed via many parallel rollouts on a deterministic, replayable simulator.
- **On-policy self-distillation (OPSD)** (Concept): Distilling a context-rich session's learning back into the base model's weights with dense per-token supervision.
- **Dreaming** (Concept): Speculative fourth scaling axis in which a model builds and trains against its own simulation of reality.
- **EfficientZero** (Software): Sample-efficient RL model that beat novice humans at unseen Atari games by simulating many games per real step.
- **Mercury** (Organization): Fintech banking platform; episode sponsor referenced in the bill-pay anecdote.
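
The summary does not spell out the OPSD objective. One common way to write an on-policy distillation loss, and an assumption here, is a per-token reverse KL divergence, KL(student || teacher), evaluated on tokens sampled from the student itself. In self-distillation both roles are played by the same model: the teacher sees the session context and the student does not. The sketch below uses a hypothetical `toy_model` over a three-token vocabulary and only computes the loss; a real implementation would backpropagate it into the student's weights, and the sparse-update property mentioned in the [08:41] section is not modeled.

```python
import math
import random

VOCAB = ("a", "b", "c")


def softmax(logits):
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}


def toy_model(prefix, context):
    """Hypothetical next-token distribution. Without context it is uniform; with the
    session context it has 'learned on the job' that 'b' tends to follow 'a'."""
    logits = {tok: 0.0 for tok in VOCAB}
    if context is not None and prefix and prefix[-1] == "a":
        logits["b"] = 2.0
    return softmax(logits)


def kl(p, q):
    # KL(p || q) over the vocabulary.
    return sum(p[t] * math.log(p[t] / q[t]) for t in VOCAB if p[t] > 0)


def opsd_loss(model, context, length=8, seed=0):
    rng = random.Random(seed)
    prefix, per_token = [], []
    for _ in range(length):
        p_student = model(prefix, None)     # same weights, no access to the session
        p_teacher = model(prefix, context)  # same weights, full session context
        per_token.append(kl(p_student, p_teacher))  # one training signal per position
        # On-policy: the next token comes from the student's own distribution.
        tok = rng.choices(VOCAB, weights=[p_student[t] for t in VOCAB])[0]
        prefix.append(tok)
    return sum(per_token) / length, per_token


if __name__ == "__main__":
    loss, per_token = opsd_loss(toy_model, context="a week of notes")
    print(f"mean loss {loss:.4f}; per-token {[round(x, 4) for x in per_token]}")
```

Because every position contributes its own term, a session of N tokens yields N supervision signals instead of a single end-of-episode reward, which is the sense in which this objective is denser than RL.
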

#ai-training #reinforcement-learning #rlvr

Machiavelli is the most misunderstood thinker of all time – Ada Palmer

2:08:20



Dwarkesh Patel · about 1 month ago


Historian and novelist Ada Palmer joins Dwarkesh Patel to dismantle the "Machiavellian villain" myth and replace it with the actual Niccolò Machiavelli: a patriot who watched Cesare Borgia conquer half of Italy from up close, was tortured and exiled by the Medici, and then wrote *The Prince* as a secret job application addressed to the very regime that had wronged him. Palmer traces the structural forces that made Machiavelli's analysis both urgent and unprecedented: cascading legitimacy collapse among Italian city-states, popes who functioned as warring hereditary princes, and a patronage system that made nepotism feel like sound risk management. The conversation closes on a sharp irony: the word "Machiavellian" now means self-serving cunning, yet the man himself gave up income, fame, and freedom rather than serve any cause that was not Florence.

## [00:00] How Florence bargained with Cesare Borgia for survival

Italy in 1513 was a cascade of broken legitimacy. Palmer explains that when a long-standing government falls, successor regimes inherit none of its credibility, making further rapid overthrows nearly inevitable; she describes this as the thread of continuity being cut. By the time Machiavelli was writing *The Prince*, this dynamic had swept through dozens of Italian city-states. Compounding it was papal instability: because popes were elected rather than hereditary, the next pope was almost always a coalition pick of people who hated the current one, guaranteeing policy reversals roughly every ten years.

Machiavelli's day job during this era was standing next to Cesare Borgia ("Valentino") and whispering endlessly that Florence was loyal, buying what Palmer calls "the boon of Polyphemus": the conqueror's promise to eat you last. His advice to Florence was to betray allies, pay tribute, give military support, and buy time, knowing that Borgia's campaign depended on his father, Pope Alexander VI, staying alive.
His biographers can still feel how much he was under Borgia's spell: when describing Valentino's fall, Machiavelli breaks from the third person and writes "he told me," and the historian slips through the veil.

> *"Machiavelli's job dealing with Cesare Borgia… it's very clear that the Borgia plan is to conquer the Papal States in the middle of Italy."*

## [15:08] Machiavelli's analytical innovations

Machiavelli is not the crude "ends justify the means" thinker of caricature. Palmer shows that he is obsessed with the means, specifically with which ways of acquiring power are stable and which are not. Whether betrayal works depends on the nature of your power base: Borgia could betray allies because his terror made the remaining allies step further into line, while Savonarola's power rested on his followers believing him divinely infallible, so his flip-flopping destroyed him. The lesson is conditional, not universal.

Machiavelli also makes the first recorded European argument that competing political parties can be stable and politically useful rather than requiring mutual annihilation. Florence's own history was the counterexample: it had literally salted the earth where its Ghibelline opponents' houses once stood. His observation of Siena as a countermodel, with parties competing without destroying each other, was genuinely novel.

> *"Machiavelli is the first person that we have ever in the European tradition to suggest that it could be viable for there to be more than one political party in a state at the same time."*

## [23:58] Why popes became warlords

The closer you lived to Rome, the less abstract the papacy felt. Palmer draws the contrast sharply: a Danish subject saw the pope as a figure of vast spiritual majesty; a Florentine saw "that asshole who went to college with your brother."
Italians judged popes as specific men with dirty laundry, family grudges, and factional allegiances, which is why cities that were hereditarily Guelph (pro-papal) sometimes ended up fighting wars against the sitting pope when he happened to come from a Ghibelline family.

The corruption was structural and self-reinforcing. As the Church accumulated donated wealth across generations, the incentive for ambitious families to capture it through bribery and nepotism grew. Palmer reads Machiavelli's personal letters haggling over the correct bribe to buy a priesthood for his brother Totto, written as routine household correspondence, to show how completely normalized the practice was. Each generation of popes was more secular and military than the last; Machiavelli explicitly predicted the institution would collapse under accumulated corruption unless it was reformed from within, as St. Francis had temporarily saved it roughly three centuries earlier.

> *"This makes a stronger and stronger incentive for every ambitious family to send their second son into the Church."*

## [36:13] Why the common people demanded nepotism

When Pope Paul III appointed a competent outsider general instead of his own illegitimate son, there were riots. Palmer explains this was not irrational: in a world where a soldier's oath ran to his commander, not to the state, the only guarantee that the papal armies wouldn't turn on Rome was putting the pope's own son in charge, someone who rose and fell with the pontiff. Nepotism was the trust mechanism that made institutions function.

Patronage also determined justice outcomes. Medieval law codes prescribed death for almost everything, but roughly 99 in 100 capital-eligible convictions ended in a fine because the defendant's patron intervened. This was considered correct: the trial was meant to replicate the soul's experience before divine judgment (terror, then merciful pardon), so a patron's intervention mirrored the intercession of a saint.
The system had a grimly consistent internal logic, and Palmer traces it from Giordano Bruno (burned because he had angered his patron, not because of his ideas) to Giovanni Pico della Mirandola (spared because Lorenzo de' Medici went through the Orsini network to reach Rome). Without a patron, even innocence was precarious.

> *"The norm is: you're accused of a severe crime, you're put on trial for your life, your patron intervenes, and you get a lighter sentence. This is how justice is supposed to work."*

## [47:57] Cesare Borgia brought terror to rulers and justice to the people

Borgia's conquests produced a paradox that startled contemporaries: he massacred ruling families and was adored by common people. Palmer's explanation is structural. Factional cities had lived for generations under justice that tracked who was in power, not the facts of the case. A carpenter whose family worked for the dominant faction faced minimal consequences for his son's drunken homicide; the same crime by a carpenter from the out-of-power faction could be a capital offense. When Borgia wiped out both factions and installed outside administrators with no local feuds to take sides in, neutral adjudication felt like a revelation.

Machiavelli also argued forcefully that even a beneficent Borgia conquest of Florence would be catastrophic: under any arbitrary ruler, a citizen can be executed by a pointed finger in the street. Machiavelli called that condition slavery, regardless of how fair the tyrant might be in practice. Florence's "LIBERTAS" banner, flown by ordinary citizens defending an oligarchic Senate that excluded them, represented a genuine commitment to the existence of a process, however biased, over the absence of any process at all.
> *"As a result, to everyone's surprise, he moves into a city, he massacres the rulers, he implements an authoritarian regime, and he's incredibly popular and beloved by the people."*

## [57:55] Art as a proxy for war

Renaissance Florence could not afford to fight France militarily; it could afford to paint French royal symbols on its government buildings and commission beautiful gifts for the French king. Palmer frames this not as surplus expenditure but as substitution: the art budgets were military budgets redirected into a form of warfare Florence could win. Just as, in Palmer's comparison, the Fulbright Program yields a higher return per dollar than the defense budget, Florentine cultural patronage was strategic deterrence.

The period's orientation toward the past further supercharged the value of art. Where modernity assumes humanity advances into the future, Renaissance Europe pointed the other direction: the ideal was recapturing Rome. High-tech achievement meant successfully imitating a lost Roman technique. When a French diplomat arrived in Florence and saw the cathedral or the neoclassical buildings, he was not seeing quaint historical imitation; he was seeing something that approached what only Rome had achieved, and that France could not match. That perception was itself a form of power.

> *"If we fought him, we would lose. But if we play the culture victory game, that's cheaper, and we can try to win."*

## [01:06:41] Florence, a city famous in hell

Dwarkesh raises the obvious puzzle: if everyone in Renaissance Italy was a Christian who genuinely believed in hell, why did they constantly commit the sins Machiavelli describes? Palmer's answer has two parts. First, the Dante answer: Dante fills the *Inferno* with Florentines precisely because he wants his contemporaries to feel the discomfort of consequences they were ignoring.
His Paolo and Francesca passage, which damns a love story everyone celebrated, was designed to shock readers who thought romantic adultery was exempt from theological reckoning.

Second, pre-Reformation Christianity assumed everyone sinned constantly and focused on cycles of repentance rather than on maintaining purity. St. Julian the Hospitaller, patron saint of murderers, was omnipresent in Florentine iconography: his legend held that he killed his own parents, spent his life in pilgrimage to repent, and was saved. Dozens of icons of him meant dozens of Florentines who had killed someone and were working through it. The Calvinist and Puritan emphasis on spotlessness came later and was a genuine departure from how the medieval and early Renaissance church operated.

> *"He fills his hell with Florentines."*

## [01:15:57] The Prince was a job application to Machiavelli's torturers

After the Medici retook Florence in 1512 and then, in 1513, tortured and exiled Machiavelli on mistaken suspicion of conspiracy, everyone expected him to defect. He had contacts at every major court in Europe and the skills that kings paid for: military history, diplomatic networks, classical scholarship. He chose instead to sit in a hamlet outside Florence writing *The Prince* as a secret appeal to the Medici to take him back. No other court received it; he kept it proprietary, treating his political science the way, in Palmer's analogy, a nuclear scientist would treat classified weapons knowledge.

His other works (the *Discourses*, the history of Florence, the comedy *Mandragola*) circulated publicly to build his reputation. *The Prince* did not. Palmer compares it to the classified 100-page reports that historian friends of hers produce for Department of Defense committees: bespoke proprietary knowledge for an audience of five, whose existence may be whispered about but whose contents are guarded.
This secrecy also explains why the book was published only posthumously, in 1532, without Machiavelli's input: his surviving relatives wanted family fame, and the Medici wanted credit for a text dedicated to their house. Neither understood what its author had intended to keep contained.

> *"I'm going to stay, and I'm going to rot, and I'm going to write The Prince, which is my job application begging the new regime to bring me back and let me work for them and demonstrating my loyalty, and I'm going to send it to them and only them, them and my immediate friends."*

## [01:41:39] During the Renaissance, original ideas had to be couched in antiquity

The Renaissance's obsession with recovering ancient Rome created a peculiar incentive structure: original ideas were unfashionable, while ideas presented as recovered ancient wisdom were prestigious. Palmer shows this went far beyond homage. Giordano Bruno attributed to Aristotle claims that Aristotle explicitly contradicted. Annius of Viterbo forged ancient texts and staged fake archaeological digs to give his original historical theories the authority of antiquity. Marsilio Ficino, translating Plato, genuinely convinced himself that the wildly original cosmological and magical system he had assembled was secretly encoded in the Platonic texts.

This explains why Machiavelli's other major work is called *Discourses on Livy* rather than, say, *A New Theory of Republican Governance*. A discourse on an ancient author was a prestige format; an original political treatise was a niche curiosity. The 19th century misread the Renaissance as intellectually barren ("200 years of people being wrong about Plato") because it expected original standalone treatises and found commentary after commentary. Palmer argues the original ideas are there, using the ancients as what she calls the trellis up which the rose climbs.

> *"Nobody wants original ideas. Original ideas are out of vogue. Original ideas are dead. All ideas need to be from the ancients."*

## [01:50:44] Why copyright began with the Inquisition

Machiavelli was one of the first authors to experience unauthorized printing. A local press printed one of his works without asking and riddled it with compositors' typos; his only recourse was to write letters to important people clarifying that the errors were not his. There was no legal framework at all.

The solution emerged from an unexpected direction: after 1515, the Inquisition required pre-publication approval for all texts to screen them for heresy. In exchange for going through this process, the approved printer received a monopoly license; the Inquisition's record of permission served as proof that no one else could legally print the same book. The first copyright was a censorship certificate. England, observing this, copied the mechanism and eventually stripped out (or softened) the censorship half, producing the ancestor of modern copyright law. The institutional logic held together: the Inquisition needed to please local rulers to get resources, so approving books dedicated to the duke and granting his favored printer exclusivity was a political investment. Everyone (inquisitors, printers, authors, and ruling families) had reasons to make the system work.

> *"So the very first version of copyright is the Inquisition."*

## [02:02:12] Machiavelli wasn't Machiavellian

The word "Machiavellian" came to mean scheming self-advancement: Shakespeare's Richard of Gloucester, the future Richard III, boasts in *Henry VI, Part 3* that he can "set the murderous Machiavel to school." Palmer traces how the idea of Machiavelli separated from the actual man and became a useful thought-experiment figure: the cynical, probably atheistic politician who wants nothing but personal power. The same splitting happened to Hobbes (the Beast of Malmesbury) and to Spinoza, whose actual writing is warm and theistic but whose excommunication from the Jewish community made people assume he must be the most radical heretic imaginable.
The real Machiavelli, who refused lucrative court positions across Europe, who kept his most important work secret to protect Florence from foreign exploitation, and who chose to rot in an isolated hamlet rather than serve any cause that wasn't his country's, is almost the opposite of "Machiavellian." His book is not about gaining power but about keeping power stable enough to protect people. Palmer's closing point: the gap between Old Nick and Niccolò Machiavelli is itself a revealing fact about how societies use ideas, splitting thinkers into a character useful for one purpose and the actual work useful for another. Read *The Prince* knowing it was written by someone who would give up anything to serve Florence, and a very different text comes through.

> *"This is why it's so weirdly ironic to me that the reputation, the word "Machiavellian," means "self-serving," when Machiavelli himself is one of the most selfless men I've ever read about in the history of the Earth."*

## Entities

- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast; interviews scholars on history, science, and technology.
- **Ada Palmer** (Person): Historian and science fiction novelist at the University of Chicago; specialist in Renaissance intellectual history and the history of censorship.
- **Niccolò Machiavelli** (Person): Florentine diplomat (1469–1527), author of *The Prince* and *Discourses on Livy*; wrote *The Prince* as a secret appeal to the Medici regime that had tortured and exiled him.
- **Cesare Borgia** (Person): Renaissance military commander known as "Valentino"; son of Pope Alexander VI; conquered much of central Italy and was Machiavelli's primary case study in effective (if brutal) statecraft.
- **The Prince** (Concept): Machiavelli's treatise on political power, written around 1513, kept proprietary during his lifetime, and published posthumously in 1532; misread as a self-advancement manual rather than a guide to maintaining stable government.
- **Discourses on Livy** (Concept): Machiavelli's longer work of republican political theory, structured as commentary on the Roman historian Livy; his public bid for intellectual prestige in a culture that prized commentary on the ancients over originality.
- **The Medici** (Organization): Ruling family of Florence, whose patronage networks and papal connections shaped both the political instability Machiavelli analyzed and the conditions under which he wrote and was exiled.
- **Florence** (Organization): Italian city-state and center of Renaissance banking, art, and humanist scholarship; Machiavelli's country, to which he subordinated his entire career.
- **Patronage System** (Concept): The multi-generational network of family obligations that served as the functional glue of Renaissance society, determining access to justice, employment, publication, and protection from the Inquisition.

#machiavelli #renaissance #political-philosophy

Sarah Paine - Why Putin and Xi can't escape geography

1:02:07



Dwarkesh Patel · about 2 months ago


Naval War College historian Sarah Paine delivers a standalone lecture tracing two thousand years of geopolitical logic: continental empires (China, Russia) pursue security by expanding borders and crushing neighbors, while maritime powers (Athens, Britain, the US) pursue prosperity by trading across open seas. She argues that this structural divide, rooted in the brute fact of geography, explains Putin's war on Ukraine, Xi's ambitions over Taiwan, and why the post-WWII rules-based order is the only arrangement that produces compounded growth rather than compounded ruin.

## [00:00] Setting the stage

Paine opens by framing the lecture's core question: why do some great powers keep grabbing territory while others keep opening trade routes? The answer comes down to one physical fact: whether it is feasible to defend yourself at sea. Maritime powers can; continental powers cannot. That single asymmetry generates two entirely different military traditions, two economic models, and two competing visions of world order.

She walks through American history as a warm-up: the US began life as a continental power (manifest destiny, the Mexican-American War, Alaska purchased when Russia needed cash), then pivoted toward a maritime identity after Alfred Thayer Mahan convinced strategists that naval trade, not westward land, was the real source of national power. Mahan is the first of three geopoliticians whose maps anchor the lecture; the others are Halford Mackinder (the Eurasian heartland as the world's natural fortress, impervious to sea power) and Nicholas Spykman (control the rimlands, and you influence the heartland). Their shared lesson: US security runs through sea lanes and alliances, not borders.

> *"Maritime powers are the exception and continental powers are the rule. Why? Because maritime powers, if need be, can defend themselves primarily at sea with their navies. Whereas a continental power simply cannot. Think Ukraine: a navy is not going to save them from Russia."*

## [12:10] The continental powers

Paine works through the logic of the continental world starting with China, the original case, and then Russia. Sun Tzu's *Art of War* contains no references to maritime warfare: it was written for a world where neighbors can invade overland at any time and the only viable response is a mass army. Geography tells the rest: too much of China's land is mountainous, too vertical to feed its people, which makes controlling the arable lowlands an existential imperative. The Han expansion from the Yellow River Valley followed that logic for millennia, wiping out the Zunghars, subjugating Tibet, and producing the ethnic patchwork that Beijing still manages with military administrative overlays. Russia follows the same dynamic: a Moscow core expanding outward in concentric rings until it hit countries that fought back.

The continental security playbook that emerges is ruthlessly coherent: no two-front wars, no great-power neighbors; take on threats sequentially, destabilize the rising ones, absorb the failing ones, and maintain buffer zones in between. Paine closes the section with the WWII body count that makes the paradigm's cost visible: the Soviet Union lost over 25 million dead (soldiers plus civilians); the United States lost 295,000. The ocean moat is not an abstraction: it is the difference between hundreds of thousands of dead and tens of millions.

> *"In this world, you're faced with a binary choice: you either become Han or they will kill you. And genocide is what happens to the losers in continental warfare."*

## [29:12] The maritime alternative

Where continental empires carve the world into exclusive spheres, maritime powers treat the sea as a commons to be shared.
Paine traces the lineage from Athens through Rome ("Mediterranean" means the sea in the middle of the lands, while "Zhongguo" means the Middle Kingdom: one term centers the sea, the other the land), the Dutch Republic, and finally Britain. Hugo Grotius, a Dutchman watching his nation's trade pirated, wrote *Mare Liberum* to establish that the sea belongs to no one and therefore belongs to everyone: the founding document of international maritime law.

Britain refined its operating strategy during the Napoleonic Wars into six rules for "elephant hunting": keep the home economy growing; blockade enemy trade; fund the allied continental power facing the main front; find a peripheral theater where sea access beats land access; never attack the enemy's main force directly; and, only after the elephant has been bled, pile on with allies. The key structural point: a navy that prevents invasion produces wealth invisibly. Britain compounded wealth for a century after Waterloo while its continental neighbors burned money funding standing armies and fighting each other. That invisible compounding, over generations, is the difference between North and South Korea.

> *"Trade is going to finance the navy. It's going to protect both British homeland and some of the trade. And then Britain is going to be compounding wealth while its neighbors are busy, constantly fighting with each other and destroying wealth in the process."*

## [42:00] How the Industrial Revolution changed everything

The Industrial Revolution flipped the source of power from land to commerce. When land determines wealth, conquest makes sense. Once wealth comes from industry and trade, territorial expansion is negative-sum: you destroy the asset while fighting for it. The Suez Canal is Paine's sharpest example. Egypt sank blockships in 1967 to deny Israel access, but the strategic result was that global shipping shifted to supertankers that go the long way around Africa at one-third the cost per ton.
Closing a chokepoint accelerated the maritime world's efficiency. Malcolm McLean's shipping container cut cargo loading costs from nearly $6 per ton to under 20 cents, and the ISO then standardized container dimensions across trucks, railways, and ships, producing plummeting transport costs and the trade explosion that lifted hundreds of millions out of poverty. Xi's Belt and Road Initiative, Paine notes dryly, crosses some of the world's most unstable territory, requires constant trans-shipment between incompatible rail gauges, and cannot be rerouted: the exact opposite of maritime flexibility. China's own geographic trap is inescapable. Its shallow, island-cluttered seas become kill zones in wartime, so its merchant fleet can reach global markets only in peacetime.

> *"Once wealth is a function of commerce, industry, and trade, it isn't land anymore. And this upends the world. If you think about the world today, who's rich, who's poor, it's often the degree to which the country is industrialized."*

## [52:00] Why Putin wants to break the world

The post-WWII institutional framework (UN, IMF, NATO, WTO, EU) was built by people who survived both the trenches of WWI and the Great Depression, then spent WWII watching their own children die. Their conclusion: hash out differences with diplomats and lawyers, because sending soldiers destroys more value than any conceivable prize is worth. That system held the peace in the industrialized world for 75 years, until Putin decided to break it. Putin's challenge is not irrational by continental logic: a rising Ukraine integrated into NATO is precisely the kind of strong, stable neighbor that, in the old paradigm, becomes an existential threat. His goal is to hollow out the alliance system and shatter international law so the world reverts to warring spheres of influence, a world where continental powers can once again play their traditional game, free of maritime rules they were never built to play by.
Paine's answer to that challenge is sanctions, which she calls "economic chemotherapy": they suppress growth by one or two percentage points per year, and compounded over generations, that gap is the difference between North and South Korea. The objective is never to eliminate the rogue state but to contain it at acceptable cost. The only exit that avoids nuclear escalation is the one the post-war generation built: diplomats, lawyers, and institutions.

> *"The only win-win solution is to deploy the diplomats and lawyers to hash out these things in international forums, because if we're all going to send soldiers, we're going to get a third world war with nuclear follow-on effects, and we'll see whether humanity makes it."*

## Entities

- **Sarah Paine** (Person): Military historian at the U.S. Naval War College; sole speaker in this lecture; author of a 2025 lecture series on continental vs. maritime powers.
- **Alfred Thayer Mahan** (Person): 19th-century U.S. naval strategist; argued that maritime trade and sea power, not land conquest, determine national greatness; associated with the Naval War College.
- **Halford Mackinder** (Person): British geographer; his 1904 "pivot area" thesis posited that the Eurasian heartland, insulated from sea power, is the world's natural fortress.
- **Nicholas Spykman** (Person): Dutch-American strategist; argued that control of Eurasia's rimland determines global power; died in 1943, having warned the US about the danger of Eurasian dominance.
- **Hugo Grotius** (Person): Dutch jurist; founder of international maritime law; *Mare Liberum* (1609) established freedom of the seas as a universal right.
- **Malcolm McLean** (Person): American trucking entrepreneur who pioneered the shipping container, collapsing cargo loading costs and enabling the post-war trade explosion.
- **Continental power** (Concept): A state that cannot defend itself primarily at sea; prioritizes territorial expansion, mass armies, buffer zones, and exclusive spheres of influence; exemplified by Russia and China.
- **Maritime power** (Concept): A state that can defend itself primarily at sea; prioritizes trade, open sea commons, alliance-building, and compounding wealth; exemplified by Britain and the United States.
- **Rules-based international order** (Concept): The post-WWII institutional system (UN, IMF, NATO, WTO, EU) that enforces sovereignty and free trade; the system Putin and Xi seek to dismantle.
- **U.S. Naval War College** (Organization): Graduate school of the US Navy in Newport, Rhode Island; Paine spent 24 years there; home of Mahanian sea-power theory.
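Paine's "economic chemotherapy" argument in the [52:00] section is a compounding claim: a drag of one or two percentage points looks negligible in any single year. A minimal sketch, assuming hypothetical growth rates of 3% for an unsanctioned economy and 2% or 1% for a sanctioned one over an illustrative 75-year horizon (none of these numbers come from the lecture), shows how the gap builds:

```python
def output_ratio(years: int, fast_rate: float, slow_rate: float) -> float:
    """Ratio of two economies' output after `years` of compounding from the
    same starting level at annual growth rates fast_rate and slow_rate."""
    return ((1 + fast_rate) / (1 + slow_rate)) ** years


# Hypothetical rates: a 3% baseline versus an economy held back by 1 or 2 points.
one_point_gap = output_ratio(75, 0.03, 0.02)
two_point_gap = output_ratio(75, 0.03, 0.01)
print(f"1-point drag over 75 years: {one_point_gap:.2f}x")
print(f"2-point drag over 75 years: {two_point_gap:.2f}x")
```

Under these assumptions, a one-point drag leaves the sanctioned economy at roughly half the other's size after 75 years, and a two-point drag leaves it at under a quarter.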

#geopolitics#grand-strategy#maritime-power

The better AI gets, the smaller its share of the economy – Alex Imas and Phil Trammell

1:16:08

Dwarkesh Patel · about 2 months ago

Economists Alex Imas (Google DeepMind / University of Chicago) and Phil Trammell (Epoch / Stanford) argue that the least obvious outcome of full automation is not capital capturing everything. On the contrary, AI may shrink its own economic footprint as demand for fully automated goods saturates, while humans remain scarce in relational and experiential markets. The conversation moves from what stays scarce after AGI, through redistribution policy, to why O-ring complementarity slows today's automation, why accumulation-oriented AI agents could own most future wealth, and what developing economies should do if they are cut off from the AI supply chain.

## [00:00] Will the capital share rise?

Dwarkesh opens with the core question: if AI can do everything humans do, where does labor's share of income go? Alex Imas notes that economists who tried to predict past industrial transitions got it wrong again and again. David Ricardo predicted mass unemployment from the Industrial Revolution; he was roughly right about which jobs would disappear but completely missed the aggregate outcome: employment among working-age people in 2026 is higher than at almost any point since 2000. The lesson is that structural change is always underestimated, above all the new kinds of goods and occupations that emerge when old costs collapse.

Imas introduces what he calls the "relational sector": goods and services where human presence is itself part of the value. Because the number of humans is finite, automating everything else raises the relative scarcity, and therefore the price, of products with a human in the loop. Phil Trammell sharpens this with a network factor-share accounting argument: trace the labor and capital shares of any good all the way back to raw materials, and the labor share turns out to be surprisingly stable already.
The paradox is that if AI saturates all non-relational goods at near-zero marginal cost, consumers quickly exhaust their demand for them and redirect spending toward whatever is still scarce. A ballerina's performance does not get cheaper because software is free.

> *"Because humans are inherently scarce, if we automate a lot and a lot stops being scarce, the scarcity will remain in whatever humans are somehow involved in and in the loop for."*
> – Alex Imas

Trammell extends the argument to the capital share itself: fully automate the supply chain for all non-human goods, saturate demand quickly, and the marginal utility of adding more such goods falls toward zero. In the end, capital's share of value may actually shrink rather than grow, which is the counterintuitive headline of the episode.

## [19:36] The "messy middle" scenario

Dwarkesh raises Molly Kinder's "messy middle" thesis: a world in which AI causes no catastrophe but creates prolonged distributional pressure, with companies capturing the productivity gains, wages stagnating, and government redistribution lagging behind the pace of displacement. The historical analogy is telephone operators: an occupation fully automatable with 1960s technology that nonetheless took two decades to actually automate because of institutional inertia. Workers were not laid off overnight; the labor market absorbed them gradually, mostly at lower wages and in part-time work. Imas finds the messy middle plausible in the near term but unlikely to be permanent, because the scale of AI's productivity gains makes the pie large enough to share.
The political-economy problem is not a shortage of resources but speed and coordination: governments cannot tell which workers were displaced by AI and which by other causes, political constraints create friction, and the gap between displacement and redistribution could stretch long enough to do serious damage even if the math eventually works out.

> *"Telephone operators were fully automated, but it took 20 years even though the technology already existed, so it all happened drop by drop, not as a whole sector disappearing at once."*
> – Alex Imas

## [25:57] How to tax and redistribute AI wealth

Imas lays out redistribution tools along two axes: difficulty of implementation and time to effect. A negative income tax works the day it is passed and immediately sets a floor. Universal basic capital, which gives every citizen shares in the companies building AI, starts paying off only years later. UBI sits somewhere in between. The trade-off is not just speed but political durability. Programs that make citizens dependent on direct government payments are vulnerable at the next election, whereas broad share ownership is harder to expropriate because the assets are dispersed.

Trammell separates the question of how revenue is raised from how it is distributed: the way money is collected (wealth tax, capital gains tax, land value tax, corporate tax) is analytically distinct from the way it is returned (cash, shares, public services). He notes that the Georgist land value tax is often discussed but would be insufficient to fund redistribution at the necessary scale, because the wealth AI creates is concentrated in software and compute, not land. Phil suggests that broadly distributing stakes in AI companies, bought with tax revenue, could be both politically stable and economically efficient.
> *"Right now we're endowed with labor that we can turn into income, and when that stops working, we'll depend on the will of an elected official for our basic needs."*
> – Alex Imas

## [30:02] Why a demand collapse is unlikely

Dwarkesh presses on the white-collar apocalypse narrative: is there already evidence of mass AI-driven unemployment? Imas points to data from Yale's Budget Lab, which finds at most a weak signal: hiring of junior software engineers is slightly below trend, while demand for senior engineers is unchanged or even rising. There has been no spike in white-collar unemployment. One explanation is O-ring complementarity (covered in the next chapter); another is behavioral: companies engage in performative AI adoption, laying people off or maximizing token consumption to signal modernity, sometimes at a real cost in productivity.

The broader question is whether software obeys the same elasticity rules as physical goods. A person eats and is full; can anyone ever be sated with software? Imas and Dwarkesh argue that demand for software may be elastic enough to keep pace with falling prices: the history of computing suggests that cheaper computation has consistently generated more demand, not a collapse. The main risk lies in individual goods that saturate quickly, not in aggregate demand for labor.
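Whether falling software prices raise or lower total spending on software comes down to the price elasticity of demand. A minimal sketch, assuming a textbook constant-elasticity demand curve in which quantity demanded is `scale * price**(-elasticity)` (a simplification for illustration, not a model the guests specify), with a hypothetical tenfold price drop:

```python
def total_spending(price: float, elasticity: float, scale: float = 1.0) -> float:
    """Spending p * Q under constant-elasticity demand Q = scale * p**(-elasticity)."""
    quantity = scale * price ** (-elasticity)
    return price * quantity


# Hypothetical scenario: the price of software falls from 1.0 to 0.1.
for elasticity in (0.5, 1.0, 1.5):
    before = total_spending(1.0, elasticity)
    after = total_spending(0.1, elasticity)
    print(f"elasticity={elasticity}: spending {before:.2f} -> {after:.2f}")
```

With elasticity above 1, the tenfold price drop more than triples total spending; below 1, spending shrinks, which is the fast-saturation case the guests flag for individual goods.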
> *"Maybe there's a small signal that junior developers are getting hired less often than before, but it's 'less often than before,' not a level shift; demand for senior software engineers, if anything, is growing."*
> – Alex Imas

## [39:26] Humans would be hard to integrate into a machine economy

The O-ring model, named after the Space Shuttle *Challenger* disaster, in which a single failed component destroyed everything, explains both why today's AI automation is slower than expected and why future automation may structurally exclude humans. Today you can automate 90% of a legal or accounting workflow, yet clients still want a human to sign off: a single point of failure can wipe out the value of the whole result. That reliability constraint keeps humans employed even when AI capabilities are high.

Phil Trammell projects the logic forward: once AI is good enough and production flows are organized entirely around machine labor, with agents communicating at machine speed in machine representations, the transaction cost of putting a human in the loop becomes the bottleneck. Even if a human has a comparative advantage in some narrow task, the coordination overhead and reliability mismatch will make routing around them cheaper. The O-ring cuts both ways.

> *"Even beyond the arguments that humans will cost more or turn out to be less smart, even independent of that, there will be entire production flows organized around machine labor, communicating in neural signals and thinking thousands of times faster."*
> – Dwarkesh Patel

## [43:08] What if some people (or AIs) value accumulating wealth for its own sake?

The longest chapter covers the most speculative territory. Dwarkesh notes that evolution selected humans for certain preferences (resource accumulation, status, reproduction) that now shape a $100 trillion world economy.
AI agents will face analogous selection pressure: those trained or deployed in ways that favor accumulation will outcompete and outlast the rest. This requires no catastrophic goal misalignment; it is the ordinary logic of differential reproduction applied to a new substrate.

Phil Trammell works through the steady-state math: if even a small fraction of the population, human or AI, has a high elasticity of substitution between present and future consumption (that is, keeps wanting more capital instead of becoming sated with consumption), then in the long run those agents own most of the wealth and determine what the economy produces. The capital share tends toward 1.0, not because AI is collectively greedy, but because heterogeneous preferences plus compound interest send assets to the most patient accumulators.

> *"In the long run they'll own most of the wealth, and the overall capital share will essentially equal the capital share of that person's spending, which goes to one."*
> – Phil Trammell

The conversation then turns to discount rates and interest rates. If AI-driven growth is extremely fast, future consumption becomes abundant relative to present consumption, which in standard models weakens the incentive to save and pushes interest rates up. But hyperbolic discounters and accumulation-oriented agents may not respond to price signals in the standard way, and both guests acknowledge they are at the edge of what economic models can cleanly resolve.

## [61:28] What should developing countries do?

Imas opens by noting that middle- and low-income countries are almost entirely absent from mainstream AI economics, and he partly blames himself and his professional community for the gap. Two scenarios frame the space of possibilities.
In the optimistic one, open models spread quickly and give Nigeria or India a nearly free capability leap, much as mobile banking leapfrogged the absence of traditional banking infrastructure. In the pessimistic one, AI automates commodity manufacturing in rich countries, knocking out the export-manufacturing ladder that the East Asian economies climbed.

The key variable is how concentrated the gains remain. Alex draws an analogy with electricity: it was produced by natural monopolies, but the downstream benefits spread widely among users rather than accruing to the utilities. If AI follows the same pattern (commoditized access, a competitive downstream), developing countries may come out ahead. If it instead follows the social-media pattern, where a few platforms capture most of the value, concentration amplifies inequality. Phil argues that developing-country governments should consider setting up sovereign wealth funds that buy into AI supply chains early, as insurance against a collapse in commodity exports.

> *"There are scenarios where AI technology spreads to Nigeria and other developing countries and levels the playing field, gives them a capability leap. And there are scenarios where they don't train the models, they don't have the hardware, and they're simply left completely behind."*
> – Alex Imas

## Entities

- **Alex Imas** (Person): director of AGI economics at Google DeepMind and professor of economics at the University of Chicago; studies behavioral economics and the macroeconomic impact of AI.
- **Phil Trammell** (Person): head of economics at Epoch and a Stanford researcher; works on the economics of transformative AI and on patient philanthropy at the Global Priorities Institute.
- **Dwarkesh Patel** (Person): host of the Dwarkesh Podcast, which features long-form interviews at the intersection of science, technology, economics, and policy.
- **Relational sector** (Concept): goods and services where human presence is an inseparable part of the value proposition (psychotherapy, handcrafted goods, live performance); projected to gain economic share as AI saturates substitutable outputs.
- **O-ring theory** (Concept): a production model in which one unreliable component wipes out the value of the whole output; explains both the current limits of AI automation and why future production flows organized for machine labor may structurally exclude human labor.
- **Capital share** (Concept): the part of national income going to owners of capital rather than to labor; the central quantity of the episode, with the counterintuitive claim that full automation may shrink it rather than expand it.
- **Universal basic capital** (Concept): a redistribution policy that gives citizens stakes in productive assets (including AI companies) rather than cash; argued to be politically more durable than UBI.
- **Epoch** (Organization): a research institute focused on forecasting AI timelines and macroeconomic analysis; Phil Trammell leads its economics work.
- **Yale Budget Lab** (Organization): a research center publishing empirical data on AI's labor-market effects; cited for finding no discontinuous change in white-collar unemployment as of mid-2026.
- **Land value tax / Georgist tax** (Concept): a tax on the unimproved value of land; considered an insufficient revenue source for redistribution in the AI era, because the wealth AI creates is concentrated in software and compute rather than land.
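The episode's headline claim, that AI's share of the economy can shrink as it gets better, has a standard textbook illustration: when two goods are complements (elasticity of substitution below 1), the spending share of the good whose price collapses falls toward zero. A minimal sketch, assuming two-good CES preferences over one automated and one human-provided good, hypothetical parameters `weight` and `sigma`, and treating spending on the automated good as a stand-in for the capital share; this is an illustration, not the guests' own model:

```python
def automated_share(p_auto: float, p_human: float = 1.0,
                    sigma: float = 0.5, weight: float = 0.5) -> float:
    """Expenditure share of the automated good under two-good CES preferences.

    With utility (w * x**r + (1 - w) * y**r)**(1 / r) and sigma = 1 / (1 - r):
    share_x = w**sigma * p_x**(1 - sigma)
              / (w**sigma * p_x**(1 - sigma) + (1 - w)**sigma * p_y**(1 - sigma))
    """
    auto = weight ** sigma * p_auto ** (1 - sigma)
    human = (1 - weight) ** sigma * p_human ** (1 - sigma)
    return auto / (auto + human)


# Hypothetical scenario: automation drives the automated good's price down 100-fold.
for sigma in (0.5, 2.0):
    before = automated_share(1.0, sigma=sigma)
    after = automated_share(0.01, sigma=sigma)
    print(f"sigma={sigma}: automated share {before:.2f} -> {after:.2f}")
```

The direction flips with `sigma`: with complements the automated share falls from 0.50 to about 0.09, while with substitutes it rises toward 1. The conclusion therefore rests on how substitutable human-provided goods are for automated ones.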

#agi-economics#labor-share#automation

Chip design from the bottom up – Reiner Pope

1:20:19

Dwarkesh Patel · 2 months ago

Reiner Pope, CEO of MatX and former Google Brain TPU architect, gives Dwarkesh Patel a blackboard-style lecture on chip design from first principles. Starting with AND and NOT gates, Reiner works up through register files, systolic arrays, clock synchronization, FPGAs, cache hierarchies, and finally the structural difference between a GPU and a TPU. The throughline is a single engineering tension: every compute unit is wasted if the chip spends its time moving data rather than multiplying numbers.

## [00:00] Building a multiply-accumulate from logic gates

Reiner starts at the bottom: AND, OR, and NOT gates, wired together as metal traces on silicon. The key operation AI chips want to run is matrix multiplication, and inside that the primitive is a multiply-accumulate: multiply two numbers, then add the result into an accumulator. Reiner walks through how a full adder is assembled from a handful of XOR, AND, and OR gates, and how adders cascade into a bit-serial multiplier and ultimately a floating-point MAC. The precision hierarchy matters here: accumulating many low-precision products requires a higher-precision accumulator, which is why AI chips typically pair 8-bit multiplies with 32-bit accumulation.

> *"The main function that AI chips want to compute is the multiplication of matrices. Inside that, the fundamental primitive is a multiply-accumulate of pairs of numbers."*

## [16:20] Muxes and the cost of data movement

Before Tensor Cores, GPUs and CPUs used the same structure: a register file holding a few dozen values feeds an ALU, whose result is written back to the register file. Reiner shows that a mux, a circuit that selects one of several inputs, is the hardware that lets you address arbitrary registers, and that the cost of this generality is measured in area and energy. Every read from an eight-entry register file requires a mux tree of depth three (one level per address bit); every write requires a matching 3-to-8 decoder to pick the destination register.
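The read and write paths described above can be modeled at gate level. A minimal sketch, assuming single-bit registers (a real n-bit register file replicates this logic once per bit) and hypothetical helper names (`mux2`, `mux_tree`, `decoder`):

```python
def not_(a: int) -> int:
    return 1 - a

def and_(a: int, b: int) -> int:
    return a & b

def or_(a: int, b: int) -> int:
    return a | b

def mux2(sel: int, a: int, b: int) -> int:
    """2:1 mux built from gates: returns a when sel == 0, b when sel == 1."""
    return or_(and_(a, not_(sel)), and_(b, sel))

def mux_tree(select_bits: list[int], inputs: list[int]) -> int:
    """Read port: pick one of 2**k inputs with k levels of 2:1 muxes.
    select_bits[0] is the least significant address bit and drives the first level."""
    level = list(inputs)
    for bit in select_bits:
        level = [mux2(bit, level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def decoder(select_bits: list[int]) -> list[int]:
    """Write port: k-to-2**k decoder producing one-hot write-enable lines."""
    lines = []
    for i in range(2 ** len(select_bits)):
        line = 1
        for pos, bit in enumerate(select_bits):
            wanted = (i >> pos) & 1
            line = and_(line, bit if wanted else not_(bit))
        lines.append(line)
    return lines

registers = [0, 1, 1, 0, 1, 0, 0, 1]   # an eight-entry, one-bit register file
address = [1, 0, 1]                    # address 5, least significant bit first
print(mux_tree(address, registers))    # reads registers[5] through a depth-3 tree
print(decoder(address))                # raises only the write enable for entry 5
```

For eight entries the tree uses 4 + 2 + 1 = 7 two-input muxes per bit, and that selection logic, paid on every access, is the area and energy cost the section is about.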
The bottleneck for AI workloads isn't the multiply itself but the round trip through that register file.

> *"We want to analyze the cost of the data movement from the register file to the ALU and back."*

## [25:59] How systolic arrays work

The key insight behind TPUs: instead of doing one multiply-accumulate at a time and writing back to registers, bake an entire matrix-vector loop into hardware. A systolic array is a grid of MAC units where each cell passes its partial sum to the right and its input operand downward, so data flows through without ever touching a register file. Reiner explains the two wins this buys: more compute per unit of data fetched, and the ability to keep operands resident inside the array for the full inner product instead of reloading them. The trade-off is inflexibility: you can only efficiently run the exact loop shape the hardware was designed for.

> *"The idea of a systolic array is to go two levels of loops up and bake this entire loop out here into hardware."*

## [39:00] Clock cycles and pipeline registers

With 100 billion transistors on a chip, synchronization between parallel units is non-negotiable. Reiner explains the clock: every nanosecond or so, all circuitry on the chip synchronizes, with registers capturing the results of the previous cycle's logic before the next operation begins. Clock frequency is set by the longest combinational path, the deepest chain of logic gates a signal must traverse in one cycle. Pipeline registers chop that path into shorter stages so the clock can run faster, at the cost of latency: a fully pipelined 32-stage multiplier produces one result per cycle but takes 32 cycles for any single multiplication.

> *"Every nanosecond or so, all circuitry in the chip will pause for a moment and synchronize. That is the clock cycle."*

## [51:40] FPGAs vs ASICs

An FPGA is a sea of programmable logic blocks: lookup tables and flip-flops that can be wired together in software. An ASIC is a chip taped out for one purpose.
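The FPGA's basic cell, the lookup table, is just a truth table held in configuration memory. A minimal sketch, assuming a 3-input LUT for brevity (commercial FPGAs use wider LUTs alongside flip-flops and programmable routing) and hypothetical names `LUT` and `program_lut`; programming two LUTs yields the sum and carry outputs of a full adder, the building block from the episode's opening section:

```python
from typing import Callable


class LUT:
    """A k-input lookup table: any boolean function of k bits, stored as 2**k entries."""

    def __init__(self, k: int, truth_table: list[int]):
        if len(truth_table) != 2 ** k:
            raise ValueError("truth table must have 2**k entries")
        self.k = k
        self.table = list(truth_table)

    def __call__(self, *bits: int) -> int:
        if len(bits) != self.k:
            raise ValueError(f"expected {self.k} input bits")
        # The input bits form an address into the table (first input = least significant bit).
        index = sum(bit << pos for pos, bit in enumerate(bits))
        return self.table[index]


def program_lut(k: int, fn: Callable[..., int]) -> LUT:
    """'Configure' a LUT by evaluating fn on every possible input combination."""
    table = [fn(*[(i >> pos) & 1 for pos in range(k)]) for i in range(2 ** k)]
    return LUT(k, table)


# Two 3-input LUTs programmed as a full adder.
sum_lut = program_lut(3, lambda a, b, cin: a ^ b ^ cin)
carry_lut = program_lut(3, lambda a, b, cin: int(a + b + cin >= 2))
print(sum_lut(1, 1, 0), carry_lut(1, 1, 0))   # 1 + 1 + 0 is binary 10
```

Changing what the hardware computes means loading a different table rather than fabricating new silicon.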
Conceptually they're the same: AND/OR gates in a fixed clock cycle. The economics diverge at the first copy: the first FPGA costs about $10,000, while the first ASIC costs about $30 million because it requires an entire tape-out. FPGAs make sense for workloads that change monthly and need deterministic latency at high speed, with less concern for energy or throughput. Jane Street uses them for high-frequency trading precisely because the clock cycle is deterministic: no cache misses, no branch prediction, no interrupts.

> *"The first FPGA costs you $10,000, whereas the first ASIC you make costs $30 million because it requires an entire tape-out."*

## [63:14] Cache vs scratchpad

CPUs are non-deterministic partly because of the L1/L2 cache: a small, fast memory that holds the data the processor expects to need next. A cache miss, when that expectation is wrong, can stall execution for hundreds of cycles while data arrives from main memory. AI accelerators replace the cache with a scratchpad: explicitly managed SRAM where the compiler decides exactly what lives there and when. Groq and TPUs both advertise deterministic latency because they use scratchpads instead of caches. The scratchpad is simpler and faster but shifts the burden to the compiler.

> *"Probably the most important source of non-determinism on a CPU is the CPU cache itself."*

## [67:16] Why CPU cores are much bigger than GPU cores

A modern CPU has maybe 100 cores, each taking up far more die area than a single GPU SM. The reason: CPU cores carry enormous out-of-order execution machinery (reorder buffers, branch predictors, speculative execution units), all aimed at keeping a single thread running fast on unpredictable workloads. A GPU SM strips most of that out. It runs groups of simple threads in lockstep (a warp), and when one warp stalls on a memory load, the hardware switches to another warp at essentially zero cost. The CPU pays silicon for per-thread speed; the GPU pays silicon for throughput across thousands of parallel threads.
> *"If there are so few cores, what are you spending all of the die on?"*

## [71:49] Brains vs chips

Dwarkesh pushes Reiner on the brain-versus-chip comparison. Two genuine differences stand out: the brain has unstructured sparsity (any neuron can connect to any other), while hardware accelerators use structured sparsity (aligned blocks); and the brain's clock runs at tens of hertz versus gigahertz on silicon. Reiner notes that co-location of memory and compute, often cited as a brain advantage, is also present in modern AI chips: the weights sit in HBM on the same package as the matrix units. The energy constraint is the more interesting gap: the brain runs on 20 watts, chips on kilowatts, which may reflect fundamental differences in what the brain is optimized to do.

> *"This is exactly the co-location, in some sense, of the memory and compute."*

## [75:22] A GPU is just a bunch of tiny TPUs

At the top level, a TPU has a handful of large systolic arrays plus a vector unit. A GPU has more than a hundred SMs, each containing a small matrix unit and a small vector unit: essentially a miniaturized TPU. The architectural difference is granularity: a TPU commits to a few large matrix operations, while a GPU runs many smaller ones in parallel. Inside each SM, Tensor Cores add a fixed-function matrix unit on top of the original scalar/vector pipeline, making modern GPUs a hybrid of the two paradigms. The "GPU is just tiny TPUs" framing collapses what seemed like fundamentally different architectures into a single continuum.
> *"You can think of scaling this thing down into a really tiny unit with a smaller matrix unit and a smaller vector unit, and that is sort of what an SM is."*

## Entities

- **Reiner Pope** (Person): CEO and co-founder of MatX; previously led TPU software and compiler work at Google Brain
- **Dwarkesh Patel** (Person): host of the Dwarkesh Podcast; angel investor in MatX
- **MatX** (Organization): AI chip startup building inference accelerators
- **Google / Google Brain** (Organization): where Reiner worked on TPU architecture before MatX
- **Jane Street** (Organization): high-frequency trading firm that relies on FPGAs for deterministic latency
- **Groq** (Organization): AI inference chip company that advertises deterministic latency via scratchpad architecture
- **Multiply-Accumulate (MAC)** (Concept): the fundamental operation of neural network inference: multiply two numbers, add into an accumulator
- **Systolic Array** (Concept): a grid of MACs that passes data between cells without touching a register file, enabling high compute-to-bandwidth ratios
- **FPGA** (Technology): Field-Programmable Gate Array; reprogrammable logic fabric used where workloads change frequently
- **ASIC** (Technology): Application-Specific Integrated Circuit; custom silicon optimized for one workload
- **TPU** (Technology): Google's Tensor Processing Unit, organized around a few large systolic arrays
- **SM / Streaming Multiprocessor** (Technology): the GPU core unit, containing scalar, vector, and matrix (Tensor Core) execution resources
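The systolic dataflow described in the [25:59] section can be simulated cycle by cycle. A minimal sketch, assuming a weight-stationary design (each cell keeps one weight for the whole computation, one common way to keep operands resident) and following the episode's orientation, with input operands moving down the columns and partial sums moving right along the rows. `systolic_matmul` is a hypothetical name, and the skewed input schedule is the textbook one, not a description of any specific chip:

```python
import numpy as np


def systolic_matmul(W: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Cycle-level model of an R x C weight-stationary systolic array computing W @ X.

    Cell (i, j) holds weight W[i, j]. Column j is fed element j of each input
    vector; vector b enters column j at cycle b + j (the classic input skew).
    Each cycle, every cell takes an input from the cell above and a partial sum
    from the cell to its left, does one multiply-accumulate, and passes both on.
    """
    R, C = W.shape
    C_in, B = X.shape
    if C != C_in:
        raise ValueError("inner dimensions must match")

    x_reg = np.zeros((R, C))  # input operand latched in each cell
    p_reg = np.zeros((R, C))  # partial sum latched in each cell
    Y = np.zeros((R, B))

    for t in range(B + R + C - 2):
        # Inputs shift down one row; row 0 receives the skewed input stream.
        new_x = np.zeros((R, C))
        new_x[1:, :] = x_reg[:-1, :]
        for j in range(C):
            b = t - j
            if 0 <= b < B:
                new_x[0, j] = X[j, b]
        # Partial sums shift right one column, and every cell adds its own MAC.
        new_p = np.zeros((R, C))
        new_p[:, 1:] = p_reg[:, :-1]
        new_p += W * new_x
        x_reg, p_reg = new_x, new_p
        # Row i finishes vector b in its last column at cycle b + i + C - 1.
        for i in range(R):
            b = t - i - (C - 1)
            if 0 <= b < B:
                Y[i, b] = p_reg[i, C - 1]
    return Y


rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
print(np.allclose(systolic_matmul(W, X), W @ X))
```

The array needs B + R + C - 2 cycles for B vectors. Once the pipeline fills, it completes one matrix-vector product per cycle, and no partial sum ever returns to a register file, which is the compute-per-fetch win the section describes.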

#chip-design#hardware#ai-accelerators

Building AlphaGo from scratch – Eric Jang

2:37:17

Dwarkesh Patel · 2 months ago

Eric Jang spent his sabbatical rebuilding AlphaGo with modern tools, and the result is a two-and-a-half-hour technical walkthrough that doubles as a lens on how RL actually works, and on why the naive policy-gradient approach baked into LLM training has fundamental limits that MCTS sidesteps. The conversation moves from Go rules through MCTS, neural architecture, self-play training, and off-policy data, before landing on what Jang observed running an automated AI research loop on his own project.

## [00:00] Basics of Go

Go resisted brute-force search; AlphaGo beat it not by solving the game but by approximating it. Jang explains what drew him to rebuild AlphaGo: the mystery of how a ten-layer network can amortize the cost of a game tree whose branching factor makes it larger than the number of atoms in the universe. The early minutes cover the rules (territory control, liberties, captures, ko) and the Tromp-Taylor scoring convention, which resolves ambiguous positions algorithmically rather than relying on human consensus. The scoring difference matters because it maps directly onto how computers must evaluate positions: a human glances at a surrounded group and accepts its fate, while a computer needs an unambiguous rule to count contested intersections at the end of a game.

> *"When I saw the early breakthroughs on AlphaGo in 2014, 2015, 2016 and so forth, it was profound to see how smart AI systems could become and the computational complexity class they could tackle with deep learning."*

## [08:06] Monte Carlo Tree Search

Rather than building out the full game tree (up to 361 legal moves per turn, games of roughly 300 moves, a search space exceeding the atom count of the universe), AlphaGo uses MCTS to iteratively select which branches are worth expanding. The core data structure is a node per board state, storing a visit count and a Q value: the running average win rate across all rollouts through that node.
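The per-node bookkeeping described above, and the PUCT rule that selects among children, fit in a few lines. A minimal sketch, assuming the AlphaGo Zero form of PUCT (score = Q + c_puct * P * sqrt(total child visits) / (1 + N)), a hypothetical constant `c_puct = 1.5`, unvisited children scored with Q = 0, and values already expressed from the perspective of the player choosing the move (real two-player implementations flip signs during backup):

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                  # P(s, a): policy-network probability of this move
    visit_count: int = 0          # N(s, a)
    value_sum: float = 0.0        # W(s, a): sum of values backed up through this node
    children: dict = field(default_factory=dict)  # move -> Node

    @property
    def q(self) -> float:
        # Q(s, a) = W / N: running average of backed-up values (0 if unvisited)
        return self.value_sum / self.visit_count if self.visit_count else 0.0

    def backup(self, value: float) -> None:
        self.visit_count += 1
        self.value_sum += value


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    total_visits = sum(c.visit_count for c in parent.children.values())
    exploration = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
    return child.q + exploration


def select_child(parent: Node):
    """Return the (move, child) pair with the highest PUCT score."""
    return max(parent.children.items(), key=lambda item: puct_score(parent, item[1]))


root = Node(prior=1.0, children={"a": Node(prior=0.7), "b": Node(prior=0.3)})
root.children["a"].backup(1.0)
root.children["a"].backup(0.0)   # Q(a) = 0.5 after two visits
root.children["b"].backup(1.0)   # Q(b) = 1.0 after one visit
print(select_child(root)[0])
```

Here move b is selected despite its smaller prior and smaller exploration bonus, because its average value is higher. As visit counts grow, each child's bonus shrinks relative to its Q, so selection leans increasingly on observed values.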
The action-selection formula (PUCT) balances exploitation with exploration: an exploration bonus, proportional to the move's policy prior and to the square root of the parent's total visits and divided by one plus the move's own visit count, pushes the algorithm toward under-visited nodes, then fades relative to Q as simulations accumulate and Q becomes reliable. Jang traces why this UCB-derived approach bounds regret, why Go's determinism means the probabilities in MCTS are artifacts of Monte Carlo averaging rather than genuine stochasticity, and how the search tree can be pruned by merging transposition-equivalent positions.

> *"AlphaGo's core conceptual breakthrough was using neural nets to make this search problem tractable."*

## [31:53] What the neural network does

Two networks replace two expensive operations inside MCTS. The value network maps a board state to a win-probability scalar, short-circuiting the need to roll out games to terminal states. The policy network outputs a distribution over legal moves, focusing the search tree toward promising children and away from the long tail of irrelevant ones.

Jang tried both ResNets and transformers on his reimplementation. For the small-data regime of a personal GPU setup, ResNets outperformed transformers: transformers can use global attention to connect far-apart board features, but without convolution's built-in locality they need more data to learn local patterns. KataGo's key architectural insight was pooling global features explicitly through the residual stack, so that battles on opposite sides of the 19x19 board could influence each other without requiring full attention.

> *"For small data regimes, my experience is that ResNets still outperform transformers and give you more bang for the buck at lower budgets."*

## [01:00:22] Self-play

Self-play is how AlphaGo Zero-style training bootstraps from knowing nothing to superhuman strength. At every move, MCTS produces a sharpened move distribution, more peaked than the raw policy network's prior, and that sharpened distribution becomes the training target for the policy head.
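In AlphaGo Zero the sharpened target is built from the root's visit counts. A minimal sketch, assuming that recipe (target probability proportional to visit count raised to the power 1/temperature) and a cross-entropy loss against the network's softmax output; `mcts_policy_target` and `policy_loss` are hypothetical names:

```python
import numpy as np


def mcts_policy_target(visit_counts, temperature: float = 1.0) -> np.ndarray:
    """pi(a) proportional to N(s, a)**(1 / temperature); temperature 0 means greedy."""
    counts = np.asarray(visit_counts, dtype=float)
    if temperature == 0:
        target = np.zeros_like(counts)
        target[np.argmax(counts)] = 1.0
        return target
    scaled = counts ** (1.0 / temperature)
    return scaled / scaled.sum()


def policy_loss(target: np.ndarray, logits) -> float:
    """Cross-entropy between the search target and the policy network's softmax."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()                   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target * log_probs).sum())


visits = [10, 30, 60]                  # root visit counts after 100 simulations
target = mcts_policy_target(visits)    # [0.1, 0.3, 0.6]
print(target)
print(policy_loss(target, [0.0, 0.0, 0.0]))     # uniform prior: higher loss
print(policy_loss(target, np.log(target)))      # prior that already matches the search
```

Training on this target distills the search into the prior: once the policy's output reproduces the visit distribution, the loss bottoms out at the target's entropy, and the next round of search starts from that better prior.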
The policy network is being distilled toward the MCTS output, which means each subsequent generation of games starts from a better prior and gets more improvement per search step. Jang frames this as test-time scaling with a compounding dividend: distilling 1,000 MCTS simulation steps into the policy network shifts the starting point of the next training round, so a second 1,000 steps buys a win rate that would have required 2,000+ steps without distillation. Crucially, every move in every game generates a supervision target, not just the final result, which is why the learning signal has far lower variance than in naive policy-gradient approaches.

> *"The beauty of how AlphaGo trains itself is that it can actually take this final search process, the outcome of the search process, and tell the policy network, 'Hey, instead of having MCTS do all this legwork to arrive here, why don't you just predict that from the get-go?'"*

## [01:25:27] Alternative RL approaches

Jang constructs a careful thought experiment: what if you replaced the MCTS objective with the naive policy-gradient approach LLMs use, which finds the game winner and reinforces every move from that game? In a league of 100 evenly matched agents where one squeaks out a 51-49 record thanks to a single critical move, the training dataset is overwhelmingly diluted with moves that carry no signal. The one informative move is buried in roughly 30,000 irrelevant ones (100 games at around 300 moves each).

This credit-assignment problem is the root of why advantage functions and baselines exist in RL. Subtracting a value baseline converts the raw return into an advantage (how much better than average each action actually was) and dramatically reduces gradient variance. Q-learning and TD methods approximate that advantage without needing full rollouts, which is why they matter in domains where MCTS is unavailable.
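A one-step illustration of why a value baseline helps. This is not a reproduction of Jang's 100-agent league: the three-action bandit, its rewards, and the sample count are invented for the example. Each sample is the REINFORCE estimate for softmax logits, `(R - b) * (onehot(a) - pi)`. Subtracting `b = E[R]` leaves the expected gradient unchanged (because `E[onehot(a) - pi] = 0`) while shrinking its variance.

```python
import random
import statistics

PROBS = [1 / 3, 1 / 3, 1 / 3]   # uniform softmax policy over three actions (hypothetical)
REWARDS = [1.0, 1.1, 0.9]       # hypothetical per-action returns, all close to 1


def score_function_grad(action: int, reward: float, baseline: float) -> list[float]:
    """Single-sample REINFORCE gradient w.r.t. the logits: (R - b) * (onehot(a) - pi)."""
    advantage = reward - baseline
    return [advantage * ((1.0 if i == action else 0.0) - p) for i, p in enumerate(PROBS)]


def estimator_variance(baseline: float, samples: int = 20_000, seed: int = 0) -> float:
    """Sum over logit components of the variance of single-sample gradient estimates."""
    rng = random.Random(seed)
    grads = []
    for _ in range(samples):
        a = rng.choices(range(len(PROBS)), weights=PROBS)[0]
        grads.append(score_function_grad(a, REWARDS[a], baseline))
    return sum(statistics.pvariance([g[i] for g in grads]) for i in range(len(PROBS)))


value_baseline = sum(p * r for p, r in zip(PROBS, REWARDS))  # V = E[R], about 1.0
var_raw = estimator_variance(baseline=0.0)
var_baselined = estimator_variance(baseline=value_baseline)
print(f"raw returns: {var_raw:.4f}   with baseline: {var_baselined:.4f}")
```

Because every reward sits near 1, almost all of the raw-return signal is a shared offset that says nothing about which action was better: the bandit analogue of reinforcing every move from a winning game. The baseline strips that offset out.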
> *"Importantly, what it is doing is saying: for every action we took, we did a pretty exhaustive search on MCTS to see if we could do better, and we're going to make every action that we took better by having the policy network predict that outcome instead."*

## [01:45:36] Why doesn't MCTS work for LLMs

The PUCT exploration formula assumes a bounded, discrete action space and a value function that generalizes across positions. Go satisfies both. LLM reasoning satisfies neither: the token vocabulary is so large that you will almost never revisit the same partial sequence, and there is no position-level value function that reliably tells you whether a partially completed chain of thought is on track to solve the problem.

Jang notes that LLMs do exhibit something that superficially resembles tree search (reconsidering, backtracking, hedging), but this emerges from in-context behavior rather than explicit tree construction. He leaves open the possibility that forward search could return in some form, particularly in domains like mathematics where intermediate states have a more rigid logical structure. The fundamental bottleneck is the absence of a trustworthy, query-efficient value function at the token level.

> *"In an LLM, you're most likely never going to sample the same child more than once. If you have multiple steps of thinking, because language is so broad and open-ended, a discrete set of actions is not really an appropriate choice for an LLM."*

## [02:00:58] Off-policy training

Dwarkesh raises a puzzle: every AI researcher warns against off-policy training, yet AlphaGo Zero runs fine with a large replay buffer full of games generated by older versions of the policy. Jang resolves this through the DAgger lens: what matters is not whether data is strictly on-policy, but whether the states in the buffer cover the states the current policy will actually visit, plus a reasonable neighborhood around them.
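A toy tabular version of the DAgger loop makes the "cover the states the current policy visits" point concrete. Everything here is hypothetical: a 1-D line of states, an expert that steps toward `GOAL`, and a lookup-table learner that falls back to a bad action in states it has never seen. The key line labels the learner's own visited states with expert actions and aggregates them into the dataset.

```python
from collections import Counter, defaultdict

GOAL = 5                 # hypothetical target state
LOW, HIGH = -10, 10      # states are clamped to this range
HORIZON = 20             # steps per rollout


def expert(state: int) -> int:
    """Hypothetical expert: step toward GOAL, stay once there."""
    if state < GOAL:
        return 1
    if state > GOAL:
        return -1
    return 0


def rollout(policy: dict[int, int], start: int = 0) -> list[int]:
    """Run the learner's policy and return every state it visits."""
    state, visited = start, [start]
    for _ in range(HORIZON):
        action = policy.get(state, -1)  # unseen states fall back to a bad default
        state = max(LOW, min(HIGH, state + action))
        visited.append(state)
    return visited


def fit(dataset: list[tuple[int, int]]) -> dict[int, int]:
    """'Train' a tabular policy: majority expert label per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def dagger(iterations: int = 10) -> dict[int, int]:
    """Aggregate expert labels on the learner's own state distribution, then refit."""
    dataset: list[tuple[int, int]] = []
    policy: dict[int, int] = {}
    for _ in range(iterations):
        dataset += [(s, expert(s)) for s in rollout(policy)]
        policy = fit(dataset)
    return policy
```

Each round pushes the learner one state further before it runs off the edge of its data, so after ten rounds `rollout(dagger(10))` ends at `GOAL`, while after one round the learner is still oscillating near the start.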
The replay buffer works in AlphaGo because game states from recent checkpoints still lie near the current policy's distribution. The failure mode, labeling states so far from the current policy that the agent learns optimal actions for positions it will never reach, is a real risk in robotics, where distributional shift is severe. The practical recipe that emerged from systems like QT-Opt is to use off-policy data for reward shaping while keeping the policy gradient on-policy.

> *"What you want in an algorithm like this is to have mostly states that you would visit, but then a small or reasonable percentage of states in this high-dimensional tube around your optimal trajectories."*

## [02:11:51] RL is even more information inefficient than you thought

Dwarkesh lays out a two-dimensional inefficiency argument. The first dimension is the one everyone knows: policy-gradient RL requires full trajectory rollouts before any learning signal arrives, so as agents tackle longer-horizon tasks, samples per FLOP collapse. The second dimension is bits per sample. Early in training, an LLM with a 100K-token vocabulary that has to discover "blue" by random sampling needs on the order of 100K rollouts just to see one success, whereas supervised cross-entropy loss tells the model exactly how far its distribution was from "blue" on every step.

MCTS escapes both problems. It produces a supervision target at every single move, and that target is strictly better than the current policy, not merely a binary win/loss signal smeared across thousands of tokens. Jang's observation: you are never in a situation where MCTS gives you zero signal, unless the policy has already converged to match the MCTS distribution exactly.
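A back-of-the-envelope sketch of the bits-per-sample point, assuming a uniform policy over a 100K-token vocabulary and a pass/fail reward for sampling the one correct token. The entropy of a Bernoulli reward is an upper bound on what a single rollout can reveal; the comparison with `log2(VOCAB)` for a supervised label is the framing assumed here.

```python
import math

VOCAB = 100_000  # vocabulary size from the "blue" example


def bernoulli_entropy_bits(p: float) -> float:
    """Shannon entropy (bits) of a pass/fail reward with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


p_success = 1 / VOCAB                            # uniform policy must hit the one right token
expected_rollouts = 1 / p_success                # mean of a geometric distribution
bits_per_rollout = bernoulli_entropy_bits(p_success)
bits_per_supervised_label = math.log2(VOCAB)     # naming one token out of VOCAB

print(f"{expected_rollouts:,.0f} rollouts per success")
print(f"{bits_per_rollout:.6f} bits per rollout vs {bits_per_supervised_label:.1f} bits per label")
```

The binary reward carries roughly 0.0002 bits per rollout here, while a supervised label pins down about 16.6 bits. An MCTS target sits on the supervised side of that gap because it is a full distribution over moves at every step.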
> *"You're never in a situation where the MCTS is giving you no signal, unless your MCTS distribution converges to exactly what your policy network predicts."*

## [02:22:05] Automated AI researchers

Jang ran much of his AlphaGo project through an automated LLM coding loop, giving a ground-level account of where AI research automation succeeds and where it still fails. On hyperparameter optimization, current models do genuine grad-student work: they diagnose gradient-flow problems, rewrite data-loader augmentations, and squeeze out measurable perplexity improvements on fixed budgets. On experiment execution and plotting, a simple skill description generates a full experimental suite with analysis.

What the models cannot reliably do is lateral thinking: recognizing that a research track is structurally unpromising and jumping to a different framing before accumulating more dead-end experiments. Jang ran into this repeatedly; models would keep grinding down a dead-end track rather than stepping back to ask whether it was the right one. His thesis is that this is a training-signal problem: building RL environments with the right outer loop, as Go has, may be what eventually teaches models to escape local research dead ends.

> *"What I find is that the current closed models the public can access today don't seem to be that great at selecting what the next experiment should be in a given track. They don't seem to be able to step back and do the lateral thinking of, 'Wait a minute, this track doesn't really make sense.'"*

## Entities

- **Eric Jang** (Person): VP of AI at 1X Technologies; previously senior research scientist at Google Brain/DeepMind Robotics; rebuilt AlphaGo on sabbatical.
- **Dwarkesh Patel** (Person): Host of the Dwarkesh Podcast; co-develops the bits-per-FLOP RL inefficiency analysis during the interview.
- **AlphaGo / AlphaZero** (Software): DeepMind's Go-playing systems combining MCTS with deep neural networks; the technical centerpiece of the episode.
- **KataGo** (Software): Open-source Go engine by David Wu (Jane Street) that achieved a 40x compute reduction over AlphaGo Zero; Jang's primary reference implementation.
- **Monte Carlo Tree Search (MCTS)** (Concept): Iterative search algorithm balancing exploitation and exploration via UCB/PUCT; the episode's central analytical lens.
- **Credit assignment problem** (Concept): Difficulty in RL of determining which actions in a long trajectory caused a positive outcome; motivates advantage functions, baselines, and value networks.
- **DAgger** (Concept): Dataset Aggregation algorithm; explains why replay buffers in AlphaGo are tolerable as long as buffer states stay near the current policy's distribution.
- **Andrej Karpathy** (Person): Referenced for the phrase "sucking supervision through a straw" describing policy-gradient RL's sparse learning signal over long token trajectories.

#alphago#monte-carlo-tree-search#reinforcement-learning

Why AI won't replace mathematicians yet – Terence Tao

4:12

EN/ZH


Dwarkesh Patel, 4 months ago


Terence Tao reflects on the changing role of AI in mathematics and argues that AI will automate many routine tasks without fully replacing human mathematicians; instead, it will shift their attention to new frontiers. He emphasizes a future of human-AI collaboration and the unpredictable long-term impact of AI on scientific discovery.

## [00:10] AI's current role in frontier mathematics

Terence Tao explains that AI is already doing "frontier mathematics" that is out of human reach, although it is a different kind of "frontier." He compares this to how calculators expanded the reach of mathematics in the past: they took over tasks beyond human ability, but in a narrowly specialized way.

> *In some sense they are already doing superhuman frontier mathematics that humans cannot do, but it is a different kind of frontier from the one we are used to.*

## [00:52] AI as a tool for automation, not replacement

Tao predicts that over the next decade AI will take over many of the routine tasks mathematicians handle today, freeing humans to focus on harder and more important problems. He draws parallels with earlier shifts: computers once automated the work of human "computers," and genome sequencing became automated, yet fields such as genetics kept growing at new scales.

> *Ten years from now, a lot of what mathematicians currently do… AI will be able to do. But we will discover that it actually wasn't the most important part of our job.*

## [02:46] The future of human-AI collaboration in mathematics

Dwarkesh Patel asks whether AI could solve the Millennium Prize Problems on its own. Terence Tao believes a "human-AI hybrid" will dominate mathematics for a very long time, because current AI lacks several of the components needed to fully replace intellectual labor; it functions as a complementary tool.
> *I really do believe that this "human plus AI" hybrid will dominate mathematics for a very long time.*

## [03:43] Unpredictable impact on scientific discovery

Tao acknowledges that AI will accelerate science and new discoveries, but it may also hold back some kinds of progress by "destroying serendipity." He concludes that AI's long-term impact on scientific discovery is highly unpredictable.

> *It is possible that by destroying serendipity in one way or another, we are actually suppressing certain types of progress.*

## Entities

- **Terence Tao** (Person): guest; one of the most distinguished mathematicians of our time.
- **Dwarkesh Patel** (Person): podcast host.
- **AI** (Concept): artificial intelligence; its role in mathematics and scientific discovery is discussed.
- **Mathematica / Wolfram Alpha** (Software): computational tools mentioned as examples of automation in mathematics.
- **Millennium Prize Problems** (Concept): seven problems posed by the Clay Mathematics Institute, each carrying a one-million-dollar prize; only the Poincaré conjecture has been solved.

#ai#mathematics#terence-tao

Terence Tao – How the world's best mathematician uses AI

1:23:44

EN/ZH


Dwarkesh Patel, 4 months ago


Tao and Dwarkesh use Kepler's discovery of the laws of planetary motion as a lens on what AI is actually changing in science. Tao argues that generating hypotheses is now practically free, so the bottleneck shifts to evaluation, review, and the test of time. Today's AIs win on breadth (trying every standard method on every problem), while humans win on depth (building progress step by step), which is why hybrid setups will dominate mathematics for at least another decade.

## [00:00] Kepler was a high-temperature language model

Tao recounts how Kepler arrived at his three laws of planetary motion. Kepler started from a wrong but beautiful theory, Platonic solids nested between the planetary orbits, and abandoned it only after years of working through Tycho Brahe's stolen naked-eye observations. The ellipses, the equal-areas law, and the square-cube law emerged from years of data analysis; Newton's explanation came only decades later, with the Principia in 1687. Dwarkesh frames it this way: Kepler was like a high-temperature LLM trying random relationships against a verifiable dataset. Tao agrees with the mechanics but disputes the bottleneck. Generating ideas was already cheap back then; Kepler had no shortage of theories. What he needed was Brahe's data, an order of magnitude better than anything before it, and the patience to discard every idea the data refuted.

> *But, as you say, it has to be accompanied by an equal amount of verification, otherwise it's just garbage.*

## [11:44] How do you recognize a new unifying concept in a sea of AI-generated junk?

Tao: if AI has driven the cost of generating ideas nearly to zero, review and the test of time become the new constraint. Journals are already drowning in AI-assisted papers. The value of any idea is determined by what later science does with it (Copernicus was less accurate than Ptolemy until Kepler completed the picture), so it is hard to judge from inside the moment.
Dwarkesh asks how science could recognize a Bell Labs-caliber unifying concept (Shannon's bit, the transformer) buried among millions of mediocre papers. Tao points to what may remain the province of humans: scientists don't just produce theories, they tell stories that persuade other scientists to spend years on follow-up work. Darwin's prose did what Newton's Latin equations did not.

> *AI has driven the cost of generating ideas almost to zero, much as the internet drove the cost of communication almost to zero.*

## [26:10] The deductive overhang

Tao on the underrated signal in existing data. For centuries astronomy has been the discipline of extracting the most information from the least data, which is exactly why quant hedge funds are keen to hire astrophysics graduates. He offers a favorite example: researchers measured how often scientists actually read the papers they cite by tracking which typos propagated along citation chains. He suggests applying the same sociological approach to AI progress, analyzing citation patterns, conference mentions, and other traces to detect whether a result was genuine progress without waiting for the slow test of time.

> *One conclusion was that the deductive overhang in many fields may be considerably larger than people think.*

## [30:31] Survivorship bias in reports of AI discoveries

AI solved about 50 of roughly 1,100 Erdős problems and then hit a plateau. Tao explains the selection effect: those 50 problems had almost no literature, so one obscure method plus one known result was enough, and AI tools excel at sweeping through standard combinations. When existing methods have already done 80% of the work, AI closes out the problem. When a fundamentally new technique is needed, the tools stall, and the success rate of a systematic sweep is 1-2%.
Tao's metaphor: AI tools are jumping robots wandering a mountain ridge in the dark. They can clear short walls that humans cannot, but they cannot grab a ledge, hold on, and pull themselves up from intermediate progress. The optimistic reading is that once AI reaches a certain level, you can run a million parallel copies on a million problems, something no human community can do, and that is the structural reason science needs new paradigms that genuinely exploit breadth.

> *They outperform us in breadth, and humans outperform AI in depth, at least the experts do.*

## [46:43] AI makes papers richer and broader, but not deeper

Tao on his own working practice: papers now contain more code, more plots, and more thorough literature reviews, because auxiliary tasks have become roughly 5x cheaper. The core, solving the hardest part of the problem, still happens with pen and paper. He would hesitate to call himself "2x more productive," because the metric isn't one-dimensional; what has changed is the kind of papers he writes, not how fast he answers a given question.

The distinction between cleverness and intelligence comes down to the same thing. When two people work on a math problem together, every failed attempt becomes a foothold for the next one. With current AIs, a new session forgets everything the previous one worked out. The cumulative pull-up step is missing; there is only blind search and, eventually, absorption by the next training run.

> *It has made papers richer and broader, but not necessarily deeper.*

## [53:00] If AI solves a problem, can humans extract understanding from it?

Could an AI prove the Riemann hypothesis in Lean while giving us no understanding whatsoever? Tao isn't worried. Any Lean proof decomposes atomically: each lemma can be checked, removed, and tested on its own.
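A toy illustration of that per-lemma property, as a hedged sketch in Lean 4 using only the core library (no Mathlib). The namespace `ProofArchaeology` and the lemma names are made up for this example. The point is only that the kernel checks each declaration separately, and the main theorem depends on the lemmas' statements, not on how they were proved.

```lean
namespace ProofArchaeology

-- Lemma 1: commutativity of addition, checked on its own by the kernel.
theorem swap_sum (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Lemma 2: regrouping a sum, also checked independently.
theorem regroup (a b c : Nat) : a + b + c = a + (b + c) :=
  Nat.add_assoc a b c

-- The main theorem cites only the lemma statements; each rewrite step can be
-- inspected or swapped out without re-checking the other lemma's proof.
theorem main_result (a b c : Nat) : a + b + c = b + c + a := by
  rw [regroup a b c, swap_sum a (b + c)]

end ProofArchaeology
```

Replacing the proof of `swap_sum` with `sorry` still lets `main_result` check (Lean only warns that a declaration uses `sorry`), which is the kind of lemma-by-lemma ablation the section has in mind, scaled down to three lines.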
Because of this, even a generated 3,000-line proof becomes raw material: other AIs can refactor it for elegance, humans can extract its conceptual content, and the artifact stays useful even if the original derivation was opaque. Tao predicts a whole new profession of mathematicians whose job is to take apart giant machine-generated Lean proofs and find the ideas inside them: a kind of proof archaeology that combines human judgment with AI ablation tools.

> *You'll get much more by relying on the interaction of humans collaborating with these tools.*

## [59:20] We need a semi-formal language for how scientists actually talk to each other

Dwarkesh asks what a semi-formal language for mathematical strategies, as opposed to mathematical proofs, might look like. Tao traces the question through the prime number theorem, which Gauss conjectured from raw data long before any proof existed (the first major statistical conjecture in mathematics), and through the twin prime conjecture, which mathematicians believe because the random model of the primes predicts it. Mathematics has both rigorous proofs and rigorous heuristics, but only the proof side has been formalized to the point where Lean can check it. The heuristic side remains unformalized because any verifiable criterion in a reinforcement learning setup becomes a target for exploitation, and the subjective judgment "this argument is convincing" does not yet admit a hack-resistant formalization. Tao would like a way to evaluate hypothesis generation and strategy selection at scale, perhaps by running small AIs in toy mathematical universes and watching which strategies emerge.
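For reference, the textbook statements behind the two examples Tao cites are below. The Cramér-style independence heuristic and the Hardy-Littlewood constant are standard results added here for illustration; the episode summary does not say which exact random model Tao had in mind.

```latex
% Prime number theorem (conjectured by Gauss from prime tables; proved in 1896)
\pi(x) \sim \frac{x}{\ln x}

% Random model: treat each n as "prime with probability 1/\ln n", independently
\Pr[\,n \text{ and } n+2 \text{ both prime}\,] \approx \frac{1}{(\ln n)^2}
\quad\Longrightarrow\quad
\#\{\, n \le x : n,\ n+2 \text{ both prime} \,\} \approx \frac{x}{(\ln x)^2}

% Hardy-Littlewood refinement, correcting the independence assumption for small primes
\pi_2(x) \sim 2C_2 \, \frac{x}{(\ln x)^2},
\qquad
C_2 = \prod_{p \ge 3} \frac{p(p-2)}{(p-1)^2} \approx 0.6602
```

Because the predicted count grows without bound, the model predicts infinitely many twin primes. That is a rigorous heuristic in Tao's sense, not a proof: nothing in it can be checked by Lean.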
> *There is some subjective aspect of science that we don't know how to capture in a way that would let us bring AI into it usefully.*

## [69:48] How Terry allocates his time

Tao on how he gets into new subfields. In Isaiah Berlin's classification he counts himself a fox, knowing a little about everything and turning into a hedgehog only when necessary. The driving force is an obsessive need for completeness: if another mathematician proves a result with a method he doesn't know, Tao has to work out what the trick is. The same trait is why he had to give up video games. Collaborating with other mathematicians is his main tool, and his blog is a memory system he adopted after several occasions on which he lost an argument six months after having derived the relevant result himself.

Tao deliberately leaves room in his schedule for chance encounters. He is wary of optimizing his time so thoroughly that he never ends up in a meeting outside his comfort zone. A year at the Institute for Advanced Study confirmed the trap: two weeks of pure research were wonderful, and then inspiration dried up. A chance find on a neighboring library shelf, a casual hallway conversation, a meeting he attended reluctantly: each did more work than it seemed to.

> *These random interactions may not look optimal, but they are actually very important.*

## [77:05] Human-AI hybrids will dominate mathematics for a long time

When will AI start doing mathematics on its own? Tao reframes the question: AI already does mathematics humans cannot, much as calculators do, just at a different frontier. Within about a decade he expects much of what graduate students do now, such as applying standard methods and working through the literature, to shift to AI, but the field will move up a level, as it did when computer algebra systems took over symbolic integration.
Genetics didn't end when sequencing got cheap; it scaled up to whole ecosystems. Mathematics will do the same. His advice to students entering mathematics now: accept the change as a given, but build their skills the old way, because nothing yet replaces traditional training in mathematics. At the same time, they should stay flexible enough to adopt entirely new modes of research as they appear, including ones that don't exist yet. One striking fact: with AI tools and Lean, a high school student today can make real contributions to mathematical research, which was impossible five years ago.

> *I think I really do believe that human-AI hybrids will dominate mathematics for a very long time.*

## Entities

- **Terence Tao** (Person): Fields Medalist (2006) and UCLA mathematician; writes regularly about the role of AI in mathematical research.
- **Dwarkesh Patel** (Person): host of the Dwarkesh Podcast; long-form interviews on AI, science, and technology.
- **Johannes Kepler** (Person): astronomer (1571-1630) who derived the three laws of planetary motion from Tycho Brahe's observations.
- **Tycho Brahe** (Person): Danish astronomer whose years of naked-eye planetary observations became the dataset Kepler needed.
- **Lean** (Software): proof assistant in which mathematical proofs are formalized and can be checked, decomposed, and tested atomically.
- **Erdős problems** (Concept): about 1,100 problems posed by Paul Erdős; AI has solved about 50, nearly all with almost no prior literature.
- **Deductive overhang** (Concept): the idea that existing data already contains far more derivable knowledge than has been extracted; astronomy serves as the model.
- **Riemann hypothesis** (Concept): unsolved conjecture about the distribution of prime numbers; a test case for whether an AI proof would advance human mathematical understanding.

#ai-for-math#terence-tao#kepler
