Dwarkesh Patel

Building AlphaGo from Scratch – Eric Jang

Today, I'm here with Eric Jang, who was most recently vice president of AI at 1X Technologies, and before that, a senior research scientist at what is now Google DeepMind Robotics. You've been on sabbatical for the last few months. One of the things you've been doing is rebuilding, improving, and hacking on AlphaGo. Today, you're going to explain how to build AlphaGo from scratch and what it tells us about the future of AI research and development. Before we get to that, why is AlphaGo interesting? Why is this the project you decided to do on sabbatical rather than just hanging out at the beach?

I like making things, and AlphaGo and Go AI are among the things that really got me into the field. When I saw the early breakthroughs around AlphaGo in 2014, 2015, 2016 and so forth, it was profound to see how smart AI systems could become and what computational complexity class they could tackle with deep learning. This is a problem that had long been understood to be intractable for search, and yet it was solved through deep learning. That was quite mysterious to me, and I've always wanted to understand that phenomenon a little better. My training is in deep neural nets for robotics, where the decisions made by the neural networks are a bit more intuitive.
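Here is a rough number for "intractable for search." A common back-of-the-envelope estimate of game-tree size is b^d, where b is the branching factor and d is the game length. The sketch below assumes the widely cited approximations b ≈ 250, d ≈ 150 for Go and b ≈ 35, d ≈ 80 for chess. These figures are not from the conversation, and the function name is only illustrative.

```python
import math


def log10_tree_size(branching: float, depth: int) -> float:
    """Return log10(branching ** depth) without building the huge integer."""
    return depth * math.log10(branching)


# Assumed approximations (not from the conversation): Go b~250, d~150; chess b~35, d~80.
go_exp = log10_tree_size(250, 150)
chess_exp = log10_tree_size(35, 80)
print(f"Go    ~ 10^{go_exp:.0f}")
print(f"Chess ~ 10^{chess_exp:.0f}")
```

Working in log10 avoids materializing a roughly 360-digit number. Under these assumptions, exhaustively searching the Go tree is far out of reach. That is why a network that amortizes much of the search is so striking.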
But AlphaGo is a problem where the decisions are the result of a very, very deep search. It's always been very mysterious to me how a ten-layer network can amortize the simulation of something so deep in the game tree. If you plot out how much compute it took to build various iterations of strong Go bots over the years, you can see that in 2020 there was an open-source project called KataGo, by David Wu from Jane Street, which achieved a 40x reduction in the compute needed to train a really strong Go bot tabula rasa. I'm not certain whether it's stronger than AlphaGo Zero, AlphaZero, or MuZero, but it's very strong, and it's what most Go players today train against when they play an AI. Thanks to LLM coding, what took a whole team of research scientists at DeepMind and millions of dollars of research and compute can now be done for a few thousand dollars of rented compute.

We should first discuss how Go works. How does the game work?

Go is a very simple game that can be implemented quickly and easily on a computer. The objective is to put down black and white stones and try to occupy as much territory as possible. I might start by putting down a black stone. Black always goes first. Go ahead.
The way you capture an opponent's stones is that, for each intersection, if you can surround all four of its neighbors with your stones, then it's cut off from oxygen, if you will, and it's a dead stone. Now I control these four stones as well as this empty intersection here. There are slight variations between Chinese, Japanese, and what are called Tromp-Taylor rules. Tromp-Taylor rules are designed to be completely unambiguous, so they are what Go AIs generally train and resolve against. In typical Go, when humans play, you're actually not allowed to put this white stone down here. It would be instant suicide. In Tromp-Taylor, it's actually fine. You put it down, and it immediately resolves to death, so the outcome is the same. Let's start over and play a few stones, and then I'll explain some more. I'll just start there. I'm basically playing randomly here, but I'm trying to get around your stones and see if I can surround them. This move leaves your white stone with only one empty neighbor. It's akin to a check in chess. If you don't respond immediately by putting one here, then I can immediately capture this.

I see. Because it's the diagonals that determine whether you're surrounded or not.

The cross: up, down, left, and right, not the diagonals. This one is surrounded on three sides, so you're at risk of losing that stone if you don't play one there immediately.
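The capture rule can be sketched in a few lines of code. This is a minimal illustration, not Eric's implementation. It makes several assumptions:

- The board is a list of lists of single-character strings.
- Names like `play` and `group_and_liberties` are hypothetical.
- A connected group's empty orthogonal neighbors count as its liberties. For a lone stone, these are its up-to-four cross neighbors.
- Following the Tromp-Taylor convention described above, a suicidal group is removed rather than the move being rejected.

Ko and superko are not handled.

```python
from typing import List, Set, Tuple

EMPTY, BLACK, WHITE = ".", "X", "O"
Point = Tuple[int, int]
Board = List[List[str]]


def neighbors(p: Point, size: int) -> List[Point]:
    """Orthogonal neighbors only (up, down, left, right), never diagonals."""
    r, c = p
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(y, x) for y, x in cand if 0 <= y < size and 0 <= x < size]


def group_and_liberties(board: Board, p: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing p; return (its stones, its empty neighbors)."""
    color = board[p[0]][p[1]]
    stones, libs, stack = {p}, set(), [p]
    while stack:
        q = stack.pop()
        for n in neighbors(q, len(board)):
            v = board[n[0]][n[1]]
            if v == EMPTY:
                libs.add(n)
            elif v == color and n not in stones:
                stones.add(n)
                stack.append(n)
    return stones, libs


def remove(board: Board, stones: Set[Point]) -> None:
    """Clear the given points back to empty."""
    for r, c in stones:
        board[r][c] = EMPTY


def play(board: Board, p: Point, color: str) -> int:
    """Place a stone, capture opponent groups left without liberties, then
    remove the mover's own group if it has none (suicide, Tromp-Taylor style).
    Mutates board in place; returns the number of opponent stones captured."""
    if board[p[0]][p[1]] != EMPTY:
        raise ValueError("point is occupied")
    other = WHITE if color == BLACK else BLACK
    board[p[0]][p[1]] = color
    captured = 0
    for n in neighbors(p, len(board)):
        if board[n[0]][n[1]] == other:
            stones, libs = group_and_liberties(board, n)
            if not libs:
                remove(board, stones)
                captured += len(stones)
    # Checked only after captures: filling your own last liberty is fine if it captures.
    stones, libs = group_and_liberties(board, p)
    if not libs:
        remove(board, stones)
    return captured


# Usage: black fills the last liberty of a lone white stone.
board = [[EMPTY] * 5 for _ in range(5)]
for r, c in [(0, 1), (1, 0), (1, 2)]:
    board[r][c] = BLACK
board[1][1] = WHITE
print(play(board, (2, 1), BLACK))  # 1
```

The order of checks in `play` is the important design choice. Opponent groups are resolved first, so a move that looks like suicide but captures something survives. Only then is the mover's own group tested.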
Now you can see that I'm starting to pressure you, because by putting a stone here, I force you to put one here. Otherwise, you would have this two-block to yourself.

Yes.

And if you think through what happens if you were to respond here, you can probably search into the future and deduce what I'll do in response.

You have a lot of confidence in my abilities, but I'm guessing you'd put the black here.

That's right, and then I would capture all three of these stones.

So I should just assume that this little block is gone.

Yes. In Go, it's actually okay to let an opponent capture some stones if, for example, it puts you in position to capture more stones somewhere else on the board. This is what makes Go a beautiful game: you can lose the battle but win the war. As the board size increases, the interplay between these micro and macro dynamics gets more interesting. Presumably you'd put one here. So now I would capture this entire group, and this would be mine. There's one more case I want to demonstrate, one I actually had a bug in my code for recently. Let's consider a formation like this, with other pieces on the board in play. Let's talk about how the game ends. In this territory, who controls these areas? Is it white or black?

White.

It's actually black, because I've surrounded this whole area.
Assuming I have other black stones here, it's very hard for you to break this out of the control of these stones.

And when the final score is tallied, would these ones also count as being in...

Great question. This is where different rule sets have different ways of scoring. We should talk about how you resolve scores between humans and how you resolve scores in computer Go, because there's some ambiguity in how humans evaluate this. Most humans would look at this board configuration and conclude that black has totally surrounded white, and white has no chance of life. We could play out more here, but at the end I would capture everything. However, if you have a way of breaking this formation and connecting white to something outside of it, then it can flip. This is where it's a little bit hard for a computer to decide these kinds of things.

How do humans do it?

It's worth thinking about how humans resolve this, because it will map later to how we think about the deep neural network. Humans basically say, "I think the game is done," and then you have to also say, "I think the game is done." Then we'll say, "I think these are my stones," and you have to agree. If you don't agree, we keep playing. Essentially, once the two humans' so-called value functions reach a consensus, the Chinese rules resolve the score from there.
In Tromp-Taylor scoring, it's perfectly unambiguous, so it can be decided algorithmically by a computer. If you have this at the endgame, the way you score it is that you first count how many of your stones are on the board, and that's unambiguous. Then you count the empty intersections that are connected, through other empty intersections, only to your stones and never to your opponent's. These intersections would not count for either player, because all of these intersections are connected to both white stones and black stones. If this were like this, then white would get three points. This is a little odd, because a human would know that white is actually losing these points. But Tromp-Taylor scoring would credit white with all of these points as well as these points. That is a very big difference between how computer Go scores things and how humans score things.

How does the game end?

The game ends when either a player resigns or both players pass consecutively. Those are the rules.

Now help me crack this with AI.

Let's understand how AlphaGo actually works and how somebody in the audience might be able to implement it. Let's start with an intuition about the underlying search process used to make moves, and then we'll layer on ideas from deep learning to make it much more efficient and tractable. Go is a game with just two players. We're going to draw a person here, and we're going to draw an AI here.
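Tromp-Taylor area scoring and the end-of-game rule map fairly directly onto code. This is a minimal sketch under a few assumptions:

- The board is a square list of lists of single-character strings.
- No komi is applied.
- The move history is hypothetical, with `"pass"` and `"resign"` as special entries.

Empty regions are found by flood fill. A region counts for a player only if every stone bordering it is that player's color.

```python
from typing import List, Tuple

EMPTY, BLACK, WHITE = ".", "X", "O"


def tromp_taylor_score(board: List[List[str]]) -> Tuple[int, int]:
    """Return (black, white) area scores: stones on the board plus empty
    regions that border only that color. Komi is not included."""
    size = len(board)
    black = sum(row.count(BLACK) for row in board)
    white = sum(row.count(WHITE) for row in board)
    seen = set()
    for r in range(size):
        for c in range(size):
            if board[r][c] != EMPTY or (r, c) in seen:
                continue
            # Flood-fill one empty region, recording which colors it touches.
            region, border, stack = 0, set(), [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < size and 0 <= nx < size):
                        continue
                    v = board[ny][nx]
                    if v == EMPTY:
                        if (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                    else:
                        border.add(v)
            # A region touching both colors (or neither) counts for nobody.
            if border == {BLACK}:
                black += region
            elif border == {WHITE}:
                white += region
    return black, white


def game_over(history: List[str]) -> bool:
    """True after a resignation or two consecutive passes (hypothetical move log)."""
    if history and history[-1] == "resign":
        return True
    return len(history) >= 2 and history[-1] == "pass" and history[-2] == "pass"


board = [list(row) for row in ["X.O",
                               "X.O",
                               "X.O"]]
print(tromp_taylor_score(board))  # (3, 3): the middle column touches both colors
```

Dead stones are never removed by judgment here. As described above, a stone a human would call dead still counts for its owner unless it is actually captured during play.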
Let's say this person is playing black, so they go first. They go here. Now the AI is going to make a move based on what it sees here. There's a question of how you encode these inputs into the AI. Maybe you could use ones and zeros, but you want to represent black, white, and empty, so you would need at least three different values.
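One common way to give the network "three different values" is to one-hot encode the board as three binary planes rather than a single channel holding, say, -1/0/+1. The sketch assumes a board of single-character strings. It orders the planes relative to the player to move: own stones, opponent stones, then empty. That perspective choice and the name `encode_planes` are illustrative assumptions, not details from the conversation. Published AlphaGo variants used more input planes (for example, recent board history), which this sketch omits.

```python
from typing import List

EMPTY, BLACK, WHITE = ".", "X", "O"
Planes = List[List[List[int]]]


def encode_planes(board: List[List[str]], to_move: str) -> Planes:
    """One-hot encode the board as three binary planes, from the mover's view:
    plane 0 = mover's stones, plane 1 = opponent's stones, plane 2 = empty."""
    opponent = WHITE if to_move == BLACK else BLACK
    return [
        [[1 if cell == value else 0 for cell in row] for row in board]
        for value in (to_move, opponent, EMPTY)
    ]


board = [list(row) for row in ["X..",
                               ".O.",
                               "..."]]
planes = encode_planes(board, BLACK)
print(planes[0])  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Each intersection is 1 in exactly one plane. The network therefore does not have to learn around an arbitrary numeric ordering such as "white sits between black and empty."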