Back to Podcasts Latent Space

Scaling Past Informal AI - Carina Hong, Axiom Math

But it's for the first time now I think verified AI is to [music] open up collaboration either it's human AI collaboration well before blueprinting that's human human collaboration and lyn was a grounding was a verification formal language and then human [music] AI collaboration like we're seeing now future AI agent agent agent like collaboration like I [music] think verified AI is for openness it's not for meeting the requirements of closed industries and I think just like I think verification should not be about 我认为，经验证的 AI 第一次真正打开了协作的大门，无论是人与 AI 的协作，还是更早阶段蓝图设计时的人与人之间的协作。Lean 是形式化验证的奠基语言，然后是我们现在看到的人与 AI 协作，未来是 AI 智能体之间的协作。我认为，经验证的 AI 的意义在于开放性，而不是满足封闭行业的需求。验证不应该是 oh I remember like you know there's this article like chatbots 哦，我记得有篇文章说，聊天机器人 It's mixed up of is math solution to hallucination. 把这些混在一起说：数学是不是解决幻觉问题的方案？ Verification to me is not about lousiness. 对我来说，验证不是关于修补低劣。 Verification to me is about scaling brilliance, compounding brilliance. 对我来说，验证是关于扩展卓越，是关于复利式地积累卓越。 It's like just kind of going back to the collaboration point. 这就又回到了我说的协作这个点。 It's about Rammenujin being a much stronger mathematician. 说的是拉马努金成为一个更强大的数学家。 He was already a really strong one, but verification helps him extend the brilliance like both kind of like scale up and [music] scale out. 他本来就已经非常强了，但验证帮助他将这种卓越延伸出去，既能纵向扩展，也能横向扩展。[音乐] Welcome to the Leen Space AR for science podcast. 欢迎来到 Latent Space AI for Science 播客。 I'm Brandon Anderson. 我是 Brandon Anderson。 I build uh RNA therapeutics at Atomic AI and I'm joined by R.J. Haneki uh the CTO of Mirror Omix uh working on spatial transcrytoics. 我在 Atomic AI 做 RNA 疗法研究，今天我身边是 RJ Honicky，他是 Miro Omix 的 CTO，专注于空间转录组学。 It's a pleasure to have Karina Hong from CEO and founder of Axiom Math. 我们很荣幸邀请到 Axiom Math 的 CEO 兼创始人 Carina Hong。 Um Axiom has made a splash [music] in several different areas. Axiom 在好几个领域都掀起了不小的波澜。[音乐] First they were they got a perfect score in the Putinome uh last uh December. 首先，他们在去年 12 月的普特南数学竞赛中获得了满分。 I think they also had the claim of the first AI to prove research conjectures using formal verification and um very exciting they just yesterday [music] announced quite a large uh series A. 我记得他们还宣称是首个通过形式化验证来证明研究猜想的 AI，而且就在昨天，他们宣布了相当大规模的 A 轮融资，非常令人振奋。[音乐] Um yeah, welcome to the show. 欢迎来到节目。 Thank you for having me. 谢谢你们邀请我。 You just raised $200 million, which as one of your colleagues said, this is like basically the entire like US math budget for math research each year. 你们刚刚完成了 2 亿美元的融资，正如你的一位同事说的，这相当于美国每年的整个数学研究预算。 Is that true? 真的吗？ Actually, 其实， according to his LinkedIn post, yeah. 根据他 LinkedIn 上的帖子，是的。 Wow. 哇。 Uh 250 million is our apparently the annual uh math budget. 2.5 亿美元好像是美国每年的数学预算。 We should spend more on math research. 我们应该在数学研究上投入更多钱。 [laughter] [笑声] Yeah, it's kind of sad, but 是啊，有点悲哀，但 Yeah, I know. 是啊，我知道。 But anyway, like you know, as a you know, as a nerd who loves math, that's like really cool. 不管怎样，作为一个热爱数学的书呆子，这真的很酷。 But I mean I'm just like that kind of blew my mind like what 不过我真的，这件事让我大开眼界，就是 [laughter] [笑声] I heard that like okay so like yeah how is it 200 200 million 我听说，好吧，就是，2 亿是怎么算出来的 I guess 1.6 billion valuation. 估值应该是 16 亿美元吧。 Yeah I don't know. 是的，我也说不准。 Yeah. 是啊。 Um well super super excited to be here. 非常非常开心来到这里。 Um also I think like you know this is a series A so it's very very interesting timely timely podcast. 而且，这是 A 轮，所以这期播客来得非常及时，时机很好。 Uh we are like a seven eight months old company so it definitely means a lot to us. 我们是一家只有七八个月历史的公司，这对我们来说真的意义重大。 It's a really cool milestone. 这是一个非常棒的里程碑。 Um we're currently about like 30 people now. 我们现在大概有 30 个人。 So kind of going into I think this amount of funding will like give us a fuel that we it needs to to to accelerate um the strong execution momentum that we have so far. 这笔资金会给我们提供我们需要的燃料，来加速我们迄今为止强劲的执行势头。 Um I think like people think of us like there are many kind of ways to think about people think of as us as a math startup. 大家对我们的看法有很多种，有人把我们看成一家数学公司。 So math startup lin startup the other obviously things that we do um that are formal verification. 数学公司、Lean 公司，以及我们做的形式化验证方面的事情。 We think verification is a really good best first market for math 我们认为，验证是数学领域非常好的首选市场。 and so I think this fundra is going to like let us explore some of the applied domains 这次融资将让我们去探索一些应用领域。 uh as my colleague CTO Shubo said in the the little launch video um of the series A we had is it it lets us broaden our dreams. 正如我们 CTO Shubo 在 A 轮发布的小视频里说的，这让我们能够拓展我们的梦想。 So yeah 就是这样。 but still like $200 million and I guess a 1.6 billion valuation. 但 2 亿美元，还有 16 亿的估值 How is there a market for that? 这个市场在哪里？ I mean I was like obviously you're not doing this just for the fun of proving things although I'm sure there's a lot of that but 你们显然不只是为了证明定理的乐趣而做这件事，虽然我相信这也有很大一部分 so let's let's bring us back to 2024 so when you know 01 recently models like just came out 我们回到 2024 年，o1 那批模型刚刚问世的时候 what is what was anthropic kind of like secretly working on back then it was coding Anthropic 当时在秘密做什么？是 coding。 and everyone knows they're working on coding like open AI meta ax everyone has full knowledge that anthropic was working on coding and they just like overlooked it they thought oh there are at B2B place 大家都知道他们在做 coding，OpenAI、Meta 所有人都清楚 Anthropic 在做 coding，但他们就是忽视了，以为 Anthropic 只是个 B to B 公司 they just want one vertical people think of coding as one vertical and now look at where we are today coding kind of like strong transfer learning from coding to reasoning to basically you know a monopoly in the in the future of reasoning and I think that's that's really really shocking the people who are working on coding 只做一个垂直领域。人们把 coding 当成一个垂直领域，但看看我们今天的情况：coding 带来了从推理到基本上垄断未来推理能力的强迁移学习。这让当时做 coding 的人都大感震惊， I think back then believe in something that we believe you know similarly with math and le now which is that if you have more structured and formal data it's going to be a lot more horizontal than the specific vertical we are tackling. 我认为他们当时相信的东西，和我们现在对数学和 Lean 的信念是一样的：如果你有更结构化、更正式的数据，它的水平迁移将远远超出我们正在攻克的特定垂直领域。 So you know if today we are doing you know math informal way like the standard train of thought data train a math model based on human preference then I would say well perhaps we are just a math startup right but you know while we are pursuing math we are also doing things that do have transfer learning to other to other domains um so I think that's kind of like the broader broader picture is that while the DNA of the company remains math and all of us are math nerds and this is a very strong culture statement. 如果今天我们用非正式的方式做数学，比如用标准的思维链数据、基于人类偏好来训练数学模型，那我可能会说我们只是一家数学公司。但是，在追求数学的同时，我们也在做确实能向其他领域迁移的事情。所以，更宏观的图景是：公司的 DNA 始终是数学，我们所有人都是数学痴，这是非常强烈的文化宣言。 Everyone has a great mission of having AI be a superhuman mathematician like we are seeing on punan on a batch of research conjectures. 每个人都怀有让 AI 成为超人类数学家的伟大使命，就像我们在普特南以及一批研究猜想上所看到的。 In fact, we have another batch coming. 事实上，我们还有下一批成果即将发布。 Um we're also thinking that this is going to be fundamental to verified reasoning and we kind of talk a little bit about verified AI. 我们也在思考，这将是经验证的推理的基础，这也是我们一直在谈的经验证的 AI。 I want to talk a little bit about verified AI next because I think you have another 我想接下来聊聊经验证的 AI，因为我觉得你还有 Yeah. 嗯。 Yeah. 嗯。 I have several things I want like uh so I want to hear about the verified AI. 我有好几个想问的，想听听经验证的 AI 这块。 I do want to dig in a little bit. 我确实想深挖一下。 So, do we know that you know Anthropic and OpenAI and everyone they're not doing formal verification and using that for their rollouts and whatever? 那么，我们知道 Anthropic、OpenAI 这些公司有没有在做形式化验证，并把它用于他们的产品发布？ I think I have a lot of like rumor mill that I probably shouldn't like put it on the record like I think you know like researchers talk they play card games. 我听说了不少小道消息，这些不太适合公开讲，研究员们会互相交流，打打牌聊聊天嘛。 Yeah. 对。 But there there really interesting reasons if they are or not doing it. 但无论他们做没做，背后都有非常有趣的原因。 I think that's like kind of the takeaway I have which is that if you're like at a frontier lab and the direction [clears throat] actually does change a lot for a lot of reasons beyond your control. 我的感受是，如果你在前沿实验室工作，方向其实会因为很多你控制不了的原因而频繁改变。 So I want to kind of like bring us back to the alpha proof moment right like alpha proof was such an amazing that really the 2024 um 28 out of 42 performance was the IMO moment for me. 所以我想把我们拉回到 AlphaProof 那个时刻，AlphaProof 真的很了不起，2024 年 42 题中的 28 题那个成绩，对我来说就是 IMO 时刻。 It was not gold in 2025 because across 2024 and 2025 AI models could solve all the problem that are not combinatorics. 2025 年没能拿金牌，因为从 2024 到 2025 年，AI 模型能解决所有非组合数学的题目。 The only difference is that you know if you get all the problems that are not combinotaurics you get 28 in 2024 and 35 in 2025 because there's only one combinatorics question in 2025. 区别在于，如果你把所有非组合数学的题都做对，2024 年得 28 分，2025 年得 35 分，因为 2025 年只有一道组合数学题。 um after alpha proof kind of like we didn't see a lot of the formal math uh you know results or kind of progress from Google deep mind and that's actually because of reasons that are not necessarily technical but if you're at a startup and you have very singular focus that is formal math and verified AI then um you know you get to work on really cool problem for a long time and you have like a lot a lot higher likelihood to get to where you want to be in terms of like progress and breakthrough unlock. AlphaProof 之后，我们没有看到太多来自 Google DeepMind 的正式数学结果或进展，但这其实并不完全是技术原因。如果你是一家初创公司，专注于形式化数学和经验证的 AI，你就能在一个很酷的问题上深耕很久，你在达到进展突破的概率也会高得多。 So yeah, just define that for us. 那帮我们定义一下这个概念。 Yeah, like a lot of people think about formal verification as an ancient you know subject. 很多人认为形式化验证是一门古老的学问。 Um it it existed like as long as you know way before like like deep learning and it existed in the time of rule-based computer science. 它的历史甚至比深度学习还久远，诞生于基于规则的计算机科学时代。 uh there's this really strong push of like formal verification around like since ever since 1980s uh really interesting historic anecdotes such as uh I think the Paris trade union demanded that the automatic switching of the subway system needs to be formally verified for safety purpose. 从 1980 年代起，形式化验证就有一股很强的推进力，有一些非常有趣的历史典故，比如巴黎的工会要求地铁自动换轨系统必须经过形式化验证，以保证安全。 So quite interesting trade union for for technology um and like I think around the time of challenger both before and after European space agency was using formal verification for the Arian spacecraft. 对技术的工会要求，挺有意思的。在挑战者号事故前后，欧洲航天局也在为阿丽亚娜飞船使用形式化验证。 Uh it's also interesting Boeing Airbus for verification and then more recent years right like I think like there's a lot of push about automated reasoning at AWS because they have a lot of enterprise customers that really requires things to be to be 100% you know verified and there's no edge cases mi uh like missed and like just general like testing doesn't satisfy the need. 波音、空客也在做验证。近些年，AWS 在自动推理方面有很大投入，因为他们有很多企业客户，要求系统百分之百经过验证、没有边缘案例遗漏，普通测试根本无法满足这种需求。 So a lot of people think about verification as something that's like annoying because it's like tax and compliance like it's making sure that we are good to go right and like that's really not the and and so we we talked about like verification 很多人把验证看成一件令人烦躁的事，像税务和合规一样，就是确保我们合规。但其实不是这样的，我们谈过验证 I think our competitor when they launched they talk about um formal verification pre-reasoning they talked about it uh in the time of hallucination and and maybe for them like formal verification is about the lousiness the the hallucination for us. 我认为我们的竞争对手在发布时，谈的是推理前的形式化验证，他们在幻觉盛行的时代谈这件事，对他们来说形式化验证可能是关于低劣性、关于幻觉的。 No, like for us verified AI is about the brilliance. 但对我们来说，经验证的 AI 是关于卓越的。 It's about scaling and compounding super intelligence. 是关于扩展和复利式积累超级智能的。 So this is quite a deep point and sometimes it takes a little bit of explanation. 这是一个很深刻的观点，有时需要一点解释。 So if you think about like you know the the place of brilliance for example ramen like he's a brilliant mathematician he was able to find a lot of like interesting formulas just by intuition before he know how to do proofs. 比如以拉马努金为例，他是一个才华横溢的数学家，能在真正学会如何证明之前，仅凭直觉发现大量有趣的公式。 So he went to Cambridge um you know work with Hardy and Littlewood and you know in the famous movie the man who knew infinity there's this like storyline of how hard it was for Hardy to force him to no longer rely on intuitions and do proofs. 他去了剑桥，与哈代和利特尔伍德合作，在著名的电影《知无涯者》里，有一条故事线讲述哈代如何费力地迫使他不再依赖直觉，而去写证明。 After he learned proof writing he came out as a much more powerful mathematician whose results um like intuitions turn into theorems and future generations of mathematicians build on that those theorems. 当他学会写证明之后，他成为了一个更强大的数学家，他的直觉变成了定理，后世的数学家在这些定理上继续构建。 So it is a way to kind of scale and compound uh the intelligence that we already have. 这就是一种扩展和复利式积累我们已有智识的方式。 Another example, mathematicians kind of have been writing code in English or their respective countries natural language for thousands of years. 另一个例子：数学家几千年来一直在用英语或各自国家的自然语言写代码。 And why do I call it writing code? 我为什么说是写代码？ Um because there's this sort of community standard of rigorous logical deduction. 因为这个社群有一套严格逻辑演绎的社区规范。 Everything has to be step by step correct. 每一步都必须是正确的。 Um otherwise you will get outcasted by your mass community. 否则你会被数学社区排斥。 uh like 呃，比如 the law well 就像法律一样， [laughter] [笑声] rules in the community. 这是社区的规则。 So, so it's interesting right because that is kind of human mathematician enforced right 所以这很有意思，因为这是由人类数学家来强制执行的，对吧 and so uh it's a peerreview process 这就是一个同行评审过程 peer review of a paper currently takes two years 一篇论文的同行评审目前需要两年时间 okay so but proof assistant and you know formal proof trackers like lean still found its place right 好，但是像 Lean 这样的证明辅助系统和正式证明检查器还是找到了它的位置，对吧