Dwarkesh Patel

Building AlphaGo from Scratch: Eric Jang

Today, I'm here with Eric Jang, who was most recently vice president of AI at 1X Technologies, and before that, senior research scientist at what is now Google DeepMind Robotics. You've been on sabbatical for the last few months. One of the things you've been doing is rebuilding, improving, and hacking on AlphaGo. Today, you're going to explain building AlphaGo from scratch and what it tells us about the future of AI research and development. Before we get to that, why is AlphaGo interesting? Why is this the project you decided to do on sabbatical rather than just hanging out at the beach?

I like making things, and AlphaGo and Go AI are among the things that really got me into the field. When I saw the early breakthroughs on AlphaGo in 2014, 2015, 2016 and so forth, it was profound to see how smart AI systems could become and the computational complexity class they could tackle with deep learning. This is a problem that has long been understood to be intractable for search, and yet it was solved through deep learning. That was quite mysterious to me, and I've always wanted to understand that phenomenon a little better. My training is in deep neural nets for robotics, where the decisions made by the neural networks are a bit more intuitive. But AlphaGo is a problem where the decisions are the result of a very, very deep search.
It's always been very mysterious to me how a ten-layer network can amortize the simulation of something so deep in the game tree. If you plot out how much compute it took to build various iterations of strong Go bots over the years, you can see that in 2020 there was an open-source project called KataGo by David Wu from Jane Street, which achieved a 40x reduction in the compute needed to train a really strong Go bot tabula rasa. I'm not certain if it's stronger than AlphaGo Zero, AlphaZero, or MuZero, but it's very strong, and this is what most Go practitioners today train against when they're playing an AI. Thanks to LLM coding, what took a whole team of research scientists at DeepMind and millions of dollars of research and compute can now be done for a few thousand dollars of rented compute.

We should first discuss how Go works. How does the game work?

Go is a very simple game that can be implemented quickly and easily on a computer. The objective is to put down black and white stones and try to occupy as much territory as possible. I might start by putting down a black stone. Black always goes first. Go ahead. The way you capture an opponent's stones is that for every intersection, if you can surround all four of its neighbors with your stones, then it's cut off from oxygen, if you will, and it's a dead stone.
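
The capture rule just described can be sketched in a few lines. This is a minimal illustration, not the code discussed in the interview: the board representation (a list of lists of `"."`, `"B"`, `"W"`) and the names `neighbors`, `group_and_liberties`, and `is_captured` are assumptions made for the example. It generalizes "all four neighbors" in two ways that hold in standard Go: stones on an edge or corner have fewer neighbors, and connected stones of one color live or die together as a chain.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"


def neighbors(board, r, c):
    """Orthogonal (up/down/left/right) neighbors that lie on the board."""
    n = len(board)
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(y, x) for y, x in candidates if 0 <= y < n and 0 <= x < n]


def group_and_liberties(board, r, c):
    """Flood-fill the chain of same-colored stones containing (r, c).

    Returns (stones in the chain, empty points adjacent to the chain).
    """
    color = board[r][c]
    assert color != EMPTY, "expected a stone at (r, c)"
    group, liberties = {(r, c)}, set()
    stack = [(r, c)]
    while stack:
        y, x = stack.pop()
        for ny, nx in neighbors(board, y, x):
            if board[ny][nx] == EMPTY:
                liberties.add((ny, nx))
            elif board[ny][nx] == color and (ny, nx) not in group:
                group.add((ny, nx))
                stack.append((ny, nx))
    return group, liberties


def is_captured(board, r, c):
    """A chain with no adjacent empty point is 'cut off from oxygen'."""
    _, liberties = group_and_liberties(board, r, c)
    return not liberties


# A white stone with black stones on all four sides.
board = [list(row) for row in [".B.", "BWB", ".B."]]
print(is_captured(board, 1, 1))
```

Only the orthogonal neighbors are checked; diagonal points play no role in whether a stone is surrounded.
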
Now I control these four stones as well as this empty intersection here. There are slight variations between Chinese, Japanese, and what are called Tromp-Taylor rules. Tromp-Taylor rules are designed to be completely unambiguous, so this is what all Go AIs train and resolve against. In typical Go, when humans play, you're actually not allowed to put this white stone down here. It would be instant suicide. In Tromp-Taylor, it's actually fine. You put it down, and it immediately resolves to death, so the outcome is the same. Let's start over and play a few stones, and then I'll explain some more. I'll just start there. I'm basically playing randomly here, but I'm trying to get around your stones and see if I can surround them. This move exposes one empty neighbor for your white stone. It's akin to a check in chess. If you don't respond immediately by putting one here, then I can immediately capture this.

I see. Because it's the diagonals that determine whether you're surrounded or not?

The cross-section, not the diagonals. This one is surrounded on three sides, so you're at threat of losing that stone if you don't play one immediately there. Now you can see that I'm starting to pressure you, because by putting a stone here, you're forced to put one here.

Otherwise, you would have this two-block to yourself.

Yes.
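
The two situations above, a suicide move that "immediately resolves to death" and a stone left with one empty neighbor, can be sketched as follows. This is an illustrative example under assumptions, not the implementation discussed in the interview: the helper names (`chain`, `remove_dead`, `play`, `in_atari`) and the board encoding are hypothetical. The move order follows the Tromp-Taylor rules (place the stone, clear opponent chains with no liberties, then clear the mover's own). Repetition rules are left out.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"


def neighbors(n, r, c):
    """Orthogonal neighbors of (r, c) on an n-by-n board."""
    points = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [(y, x) for y, x in points if 0 <= y < n and 0 <= x < n]


def chain(board, r, c):
    """Return (stones, liberties) for the chain containing (r, c)."""
    color, n = board[r][c], len(board)
    stones, libs, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        y, x = stack.pop()
        for p in neighbors(n, y, x):
            value = board[p[0]][p[1]]
            if value == EMPTY:
                libs.add(p)
            elif value == color and p not in stones:
                stones.add(p)
                stack.append(p)
    return stones, libs


def remove_dead(board, color):
    """Clear every chain of `color` with no liberties; return stones removed."""
    removed = 0
    for r in range(len(board)):
        for c in range(len(board)):
            if board[r][c] == color:
                stones, libs = chain(board, r, c)
                if not libs:
                    for y, x in stones:
                        board[y][x] = EMPTY
                    removed += len(stones)
    return removed


def play(board, r, c, color):
    """Place a stone, clear opponent chains first, then the mover's own.

    Returns (opponent stones captured, own stones lost to suicide).
    """
    assert board[r][c] == EMPTY, "point is occupied"
    opponent = WHITE if color == BLACK else BLACK
    board[r][c] = color
    captured = remove_dead(board, opponent)
    suicided = remove_dead(board, color)
    return captured, suicided


def in_atari(board, r, c):
    """True when the chain at (r, c) has exactly one liberty left."""
    return len(chain(board, r, c)[1]) == 1


# White plays into a point surrounded by black: the stone is removed at once.
board = [list(row) for row in [".B.", "B.B", ".B."]]
print(play(board, 1, 1, WHITE), board[1][1])
```

Clearing the opponent's chains before the mover's own matters: a stone placed with no liberties survives if it captures something and thereby gains a liberty.
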
And if you think through what happens if you were to respond here, you can probably search into the future and deduce what I'll do in response once you do that.

You have a lot of confidence in my abilities, but I'm guessing you'd put the black here.

That's right, and then I would capture all three of these stones.

So I should just assume that this little block is gone.

Yes. In Go, it's actually okay to let an opponent capture some stones if, for example, it lets you position yourself to capture more stones somewhere else on the board. This is what makes Go a beautiful game: you can lose the battle but win the war. As the board size increases, the complexity of these micro versus macro dynamics gets more interesting. Presumably you'd put one here. So now I would capture this entire group, and this would be mine. There's one more case I want to demonstrate, which I actually had a bug in my code for recently. Let's consider a formation like this, with other pieces on the board in play. Let's talk about how the game ends. In this territory, who controls these areas? Is it white or black?

White.

It's actually black, because I've surrounded this whole area. Assuming I have other black stones here, it's very hard for you to break this out of the control of these stones.

And when the final score is tallied, would these ones also count as being in...

Great question.
This is where different rule sets have different ways of scoring. We should talk about how you resolve scores between humans and how you resolve scores in computer Go, because there's some ambiguity in how humans evaluate this. Most humans would look at this board configuration and conclude that black has totally surrounded white, and white has no chance of life. We could play out more here, but at the end I would capture everything. However, if you have a way of breaking this formation and connecting white to something outside of it, then it can flip. This is where it's a little bit hard for a computer to decide these kinds of things.

How do humans do it?

It's worth thinking about how humans resolve this, because this will map later to how we think about the deep neural network. Humans basically say, "I think the game is done," and then you have to also say, "I think the game is done." Then we'll say, "I think these are my stones," and you have to agree. If you don't agree, we keep playing. Essentially, once two humans (their so-called value functions) agree on a consensus, then the Chinese rules resolve that. In Tromp-Taylor scoring, it's perfectly unambiguous, so it can be decided algorithmically by a computer. If you have this at the endgame, the way you score it is that you first count how many stones you control, and that's unambiguous.
Then you count how many empty intersections are not touched by your opponent's stones. These intersections would not count for either player, because all of these intersections are connected to both white stones and black stones. If this were like this, then white would get three points. This is a little odd because a human would know that white is actually losing these points. But Tromp-Taylor scoring would consider white to have all of these points as well as these points. So that is a very big difference in how computer Go scores things and how humans score things.

How does the game end?

The game ends when either a player chooses to resign or both players pass consecutively. Those are the rules.

Now help me crack this with AI.

Let's understand how AlphaGo actually works and how somebody in the audience might be able to implement it. Let's start with an intuition about the underlying search process used to make moves, and we'll layer on ideas from deep learning to make it much more efficient and tractable. Go is a game with just two players. We're going to draw a person here, and we're going to draw an AI here. Let's say this person is playing black, so they go first. They go here. Now the AI is going to make a move based on what it sees here. There's a question of how you encode these inputs into the AI.
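
As an aside on the Tromp-Taylor scoring described a few turns earlier, here is a minimal sketch of how it can be computed. The function name `tromp_taylor_score` and the board encoding are assumptions made for the example, and komi is not included. Each player scores one point per stone on the board, plus one point per empty intersection in any empty region that borders only that player's stones. Regions bordering both colors count for neither.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"


def tromp_taylor_score(board):
    """Area score: own stones plus empty regions that touch only one color."""
    n = len(board)
    score = {BLACK: 0, WHITE: 0}
    seen = set()  # empty points already assigned to a region
    for r in range(n):
        for c in range(n):
            value = board[r][c]
            if value != EMPTY:
                score[value] += 1
            elif (r, c) not in seen:
                # Flood-fill this empty region and note which colors border it.
                region, borders, stack = {(r, c)}, set(), [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if not (0 <= ny < n and 0 <= nx < n):
                            continue
                        other = board[ny][nx]
                        if other == EMPTY:
                            if (ny, nx) not in region:
                                region.add((ny, nx))
                                stack.append((ny, nx))
                        else:
                            borders.add(other)
                seen |= region
                if len(borders) == 1:
                    score[borders.pop()] += len(region)
    return score


# Black wall on the left, white wall on the right, shared column between them.
board = [list(row) for row in [".B.W."] * 5]
print(tromp_taylor_score(board))
```

In this position the middle column touches both colors, so it scores for neither side. No judgment about which stones are "alive" is involved, which is why the result can differ from what two human players would agree on.
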
Maybe you could use ones and zeros, but you want to represent black, white, and empty. You would need at least three different values.
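
One way to represent three values while still using only ones and zeros is a one-hot encoding with one plane per state. The sketch below is an assumption for illustration, not necessarily the encoding used in the project discussed here. The function name `encode_planes` and the plane order (black, white, empty) are hypothetical.

```python
EMPTY, BLACK, WHITE = ".", "B", "W"


def encode_planes(board):
    """Return three n-by-n planes of 0.0/1.0: black stones, white stones, empty."""
    def plane(value):
        return [[1.0 if cell == value else 0.0 for cell in row] for row in board]

    return [plane(BLACK), plane(WHITE), plane(EMPTY)]


board = [list(row) for row in ["B.", "W."]]
black, white, empty = encode_planes(board)
print(black)
print(white)
print(empty)
```

Every intersection has exactly one plane set to 1.0. A single plane holding three distinct numbers would be another option, at the cost of implying an ordering between black, empty, and white.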