
Building AlphaGo from scratch – Eric Jang

Today, I'm here with Eric Jang, who was most recently vice president of AI at 1X Technologies and, before that, a senior research scientist at what is now Google DeepMind Robotics. You've been on sabbatical for the last few months. One of the things you've been doing is rebuilding, improving, and hacking on AlphaGo. Today, you're going to explain how to build AlphaGo from scratch and what that tells us about the future of AI research and development. Before we get to that, why is AlphaGo interesting? Why is this the project you decided to do on sabbatical rather than just hanging out at the beach?

I like making things, and AlphaGo, and Go AI generally, is one of the things that really got me into the field. When I saw the early breakthroughs that led to AlphaGo in 2014, 2015, 2016, and so forth, it was profound to see how smart AI systems could become and what computational complexity class they could tackle with deep learning.
This is a problem that was long understood to be intractable for search alone, and yet it was solved through deep learning. That was quite mysterious to me, and I've always wanted to understand that phenomenon a little better. My training is in deep neural nets for robotics, where the decisions made by the networks are a bit more intuitive. In AlphaGo, by contrast, the decisions are the result of a very, very deep search. It has always been mysterious to me how a ten-layer network can amortize the simulation of something so deep in the game tree.

If you plot how much compute it took to build successive strong Go bots over the years, you can see that in 2020 an open-source project called KataGo, by David Wu of Jane Street, achieved a 40x reduction in the compute needed to train a really strong Go bot tabula rasa.
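To put "intractable for search" into rough numbers, here is a back-of-the-envelope sketch. It uses the branching-factor and game-length estimates quoted in the original AlphaGo paper (about b = 250 legal moves per position and d = 150 moves per game for Go; about b = 35 and d = 80 for chess). These are order-of-magnitude assumptions, and b^d is only a crude proxy for the size of an exhaustive game tree.

```python
import math

def log10_tree_size(branching: float, depth: int) -> float:
    """Return log10(b ** d), a crude estimate of the leaves in an exhaustive game tree."""
    # Working in log space avoids building a ~360-digit integer for Go.
    return depth * math.log10(branching)

# Rough (b, d) estimates as quoted in the AlphaGo paper; order of magnitude only.
ESTIMATES = {"chess": (35, 80), "Go": (250, 150)}

for game, (b, d) in ESTIMATES.items():
    print(f"{game}: b^d is roughly 10^{log10_tree_size(b, d):.0f}")
```

The gap of more than 200 orders of magnitude between the two estimates is one way to see why exhaustive search is hopeless for Go.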
I'm not certain whether it's stronger than AlphaGo Zero, AlphaZero, or MuZero, but it's very strong, and it's what most Go players today train against when they play an AI. Thanks to LLM coding assistants, what took a whole team of research scientists at DeepMind and millions of dollars of research and compute can now be done for a few thousand dollars of rented compute.

We should first discuss how Go works. How does the game work?

Go is a very simple game that can be implemented quickly and easily on a computer. Players put down black and white stones, and the objective is to occupy as much territory as possible. I might start by putting down a black stone. Black always goes first. Go ahead.

The way you capture an opponent's stone is to occupy all four of its neighboring intersections (up, down, left, and right) with your stones; for a connected group of stones, you have to occupy every empty point adjacent to the group. Then it's cut off from oxygen, if you will, and it's dead. Now I control these four stones as well as this empty intersection here.
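The capture rule just described can be sketched in a few lines of Python. This is a minimal illustration, not Eric's code: the board is assumed to be a list of lists holding hypothetical constants `EMPTY`, `BLACK`, and `WHITE`, and a chain's "liberties" are its empty orthogonal neighbors. After a stone is placed, any adjacent opponent chain left with zero liberties is removed.

```python
from typing import List, Set, Tuple

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical encoding of an intersection
Point = Tuple[int, int]
Board = List[List[int]]

def neighbors(p: Point, size: int) -> List[Point]:
    """Orthogonal neighbors only: diagonal points never affect liberties."""
    r, c = p
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < size and 0 <= j < size]

def group_and_liberties(board: Board, p: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the same-colored chain containing p and collect its empty neighbors."""
    color = board[p[0]][p[1]]
    group, liberties, stack = {p}, set(), [p]
    while stack:
        for n in neighbors(stack.pop(), len(board)):
            v = board[n[0]][n[1]]
            if v == EMPTY:
                liberties.add(n)
            elif v == color and n not in group:
                group.add(n)
                stack.append(n)
    return group, liberties

def place_and_capture(board: Board, p: Point, color: int) -> int:
    """Place a stone, remove adjacent opponent chains with no liberties, return the number captured."""
    board[p[0]][p[1]] = color
    opponent = WHITE if color == BLACK else BLACK
    captured = 0
    for n in neighbors(p, len(board)):
        # Only opponent stones; this also skips a chain already cleared via another neighbor.
        if board[n[0]][n[1]] == opponent:
            group, liberties = group_and_liberties(board, n)
            if not liberties:
                for r, c in group:
                    board[r][c] = EMPTY
                captured += len(group)
    return captured

# A white stone with black on three sides; black fills the fourth side.
board = [[EMPTY] * 5 for _ in range(5)]
board[2][2] = WHITE
for r, c in [(1, 2), (3, 2), (2, 1)]:
    board[r][c] = BLACK
print(place_and_capture(board, (2, 3), BLACK))  # 1
```

This sketch only handles captures of the opponent's stones; it does not check whether the newly placed stone is itself left without liberties, which is where the rule sets differ.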
There are slight variations between the Chinese rules, the Japanese rules, and what are called Tromp-Taylor rules. Tromp-Taylor rules are designed to be completely unambiguous, so they're what Go AIs typically train and are scored against. In typical Go, when humans play, you're not allowed to put this white stone down here; it would be instant suicide. In Tromp-Taylor, it's actually fine. You put it down, it's immediately removed as dead, and the outcome is the same. Let's start over and play a few stones, and then I'll explain some more. I'll just start there. I'm basically playing randomly here, but I'm trying to get around your stones and see if I can surround them. This move leaves your white stone with only one empty neighbor. It's akin to a check in chess: if you don't respond immediately by putting one here, I can capture it right away.

I see. Because it's the diagonals that determine whether you're surrounded or not.
The cross pattern, not the diagonals. This one is surrounded on three sides, so you're at risk of losing that stone if you don't play one there immediately. Now you can see that I'm starting to pressure you, because by putting a stone here, I force you to put one here. Otherwise, you would have this two-stone block to yourself.

Yes.

And if you think through what would happen if you responded here, you can probably search into the future and deduce what I'll do in response.

You have a lot of confidence in my abilities, but I'm guessing you'd put the black here.

That's right, and then I would capture all three of these stones.

So I should just assume that this little block is gone.

Yes. In Go, it's actually okay to let an opponent capture some stones if, for example, that positions you to capture more stones somewhere else on the board. This is what makes Go a beautiful game: you can lose the battle but win the war. As the board size increases, these micro-versus-macro dynamics get more complex and more interesting.
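The Tromp-Taylor suicide rule and the check-like threat discussed above can also be sketched. Under Tromp-Taylor, a move places a stone, then clears opponent chains that have no liberties, then clears the mover's own chain if it has none, so a suicide move is legal but the stone simply disappears. A chain with exactly one liberty is what Go players call "in atari", the situation compared here to check. The helpers and the list-of-lists board with `EMPTY`/`BLACK`/`WHITE` constants are hypothetical, and the sketch omits the superko rule (no repeated board positions) that full Tromp-Taylor also specifies.

```python
from typing import List, Set, Tuple

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical encoding of an intersection
Point = Tuple[int, int]
Board = List[List[int]]

def neighbors(p: Point, size: int) -> List[Point]:
    r, c = p
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < size and 0 <= j < size]

def chain(board: Board, p: Point) -> Tuple[Set[Point], Set[Point]]:
    """Return the stones of the chain containing p and that chain's liberties."""
    color = board[p[0]][p[1]]
    stones, liberties, stack = {p}, set(), [p]
    while stack:
        for n in neighbors(stack.pop(), len(board)):
            v = board[n[0]][n[1]]
            if v == EMPTY:
                liberties.add(n)
            elif v == color and n not in stones:
                stones.add(n)
                stack.append(n)
    return stones, liberties

def clear_if_dead(board: Board, p: Point) -> None:
    """Empty the chain at p if it has no liberties."""
    stones, liberties = chain(board, p)
    if not liberties:
        for r, c in stones:
            board[r][c] = EMPTY

def play_tromp_taylor(board: Board, p: Point, color: int) -> None:
    """Place a stone, clear dead opponent chains, then clear the mover's own chain if dead (suicide)."""
    board[p[0]][p[1]] = color
    opponent = WHITE if color == BLACK else BLACK
    for n in neighbors(p, len(board)):
        if board[n[0]][n[1]] == opponent:
            clear_if_dead(board, n)
    clear_if_dead(board, p)  # runs after captures, so a capturing move is never suicide

def in_atari(board: Board, p: Point) -> bool:
    """True if the chain at p has exactly one liberty left, the Go analogue of being in check."""
    return len(chain(board, p)[1]) == 1

# Suicide: white plays into a corner already sealed by black.
board = [[EMPTY] * 3 for _ in range(3)]
board[0][1] = board[1][0] = BLACK
play_tromp_taylor(board, (0, 0), WHITE)
print(board[0][0] == EMPTY)  # True: the white stone was removed immediately

# Atari: a white stone surrounded on three sides.
board = [[EMPTY] * 5 for _ in range(5)]
board[2][2] = WHITE
for r, c in [(1, 2), (3, 2), (2, 1)]:
    board[r][c] = BLACK
print(in_atari(board, (2, 2)))  # True: only (2, 3) is still empty
```

Clearing the opponent first matters: a stone played into a spot with no empty neighbors is not suicide if it captures something, because the capture opens a liberty.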
Presumably you'd put one here. So now I would capture this entire group, and this would be mine. There's one more case I want to demonstrate, one that I actually had a bug in my code for recently. Let's consider a formation like this, with other pieces in play on the board. Let's talk about how the game ends. In this territory, who controls these areas? Is it white or black?

White.

It's actually black, because I've surrounded this whole area. Assuming I have other black stones here, it's very hard for you to break out of the control of these stones.

And when the final score is tallied, would these ones also count as being in...

Great question. This is where different rule sets have different ways of scoring. We should talk about how you resolve scores between humans and how you resolve scores in computer Go, because there's some ambiguity in how humans evaluate this.
Most humans would look at this board configuration and conclude that black has completely surrounded white and white has no chance of life. We could play out more here, but in the end I would capture everything. However, if you have a way of breaking this formation and connecting white to something outside it, the result can flip. This is where it's a little hard for a computer to decide these kinds of things.

How do humans do it?

It's worth thinking about how humans resolve this, because it will map later onto how we think about the deep neural network. Humans basically say, "I think the game is done," and then you also have to say, "I think the game is done." Then I'll say, "I think these are my stones," and you have to agree. If you don't agree, we keep playing. Essentially, once the two humans' so-called value functions reach a consensus, the Chinese rules resolve that. Tromp-Taylor scoring, by contrast, is perfectly unambiguous, so it can be decided algorithmically by a computer.
If you have this position at the end of the game, you score it by first counting how many stones you have on the board, which is unambiguous. Then you count the empty intersections that are not touched by your opponent's stones, meaning empty regions that border only your stones. These intersections would not count for either player, because all of them are connected to both white stones and black stones. If it looked like this instead, white would get three points. This is a little odd, because a human would know that white is actually going to lose these points. But Tromp-Taylor scoring would credit white with all of these points as well as these points. So that is a very big difference between how computer Go scores things and how humans score things.

How does the game end?

The game ends when either a player resigns or both players pass consecutively. Those are the rules. Now help me crack this with AI.
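Tromp-Taylor area scoring is mechanical enough to write down directly. The sketch below assumes a list-of-lists board with hypothetical `EMPTY`/`BLACK`/`WHITE` constants. It counts each player's stones, then flood-fills every empty region and credits it to a player only if it touches that player's stones alone; regions touching both colors count for nobody. It also includes the end-of-game check described above, with moves assumed to be recorded as coordinates or the hypothetical markers `PASS` and `RESIGN`. Komi, the compensation points normally given to white, is deliberately left out.

```python
from typing import List, Tuple

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical encoding of an intersection
PASS, RESIGN = "pass", "resign"  # hypothetical markers in a move history

def neighbors(p: Tuple[int, int], size: int) -> List[Tuple[int, int]]:
    r, c = p
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in cand if 0 <= i < size and 0 <= j < size]

def tromp_taylor_score(board: List[List[int]]) -> Tuple[int, int]:
    """Area score (black, white): stones on the board plus empty regions reaching only one color."""
    size = len(board)
    score = {BLACK: 0, WHITE: 0}
    seen = set()
    for r in range(size):
        for c in range(size):
            if board[r][c] != EMPTY:
                score[board[r][c]] += 1
                continue
            if (r, c) in seen:
                continue
            # Flood-fill this empty region and record which colors border it.
            region, borders, stack = {(r, c)}, set(), [(r, c)]
            while stack:
                for n in neighbors(stack.pop(), size):
                    v = board[n[0]][n[1]]
                    if v == EMPTY and n not in region:
                        region.add(n)
                        stack.append(n)
                    elif v != EMPTY:
                        borders.add(v)
            seen |= region
            if len(borders) == 1:  # touched by one color only: that color's territory
                score[borders.pop()] += len(region)
    return score[BLACK], score[WHITE]

def game_over(history: List[object]) -> bool:
    """The game ends on a resignation or after two consecutive passes."""
    return (bool(history) and history[-1] == RESIGN) or history[-2:] == [PASS, PASS]

board = [
    [EMPTY, BLACK, WHITE],
    [EMPTY, BLACK, WHITE],
    [EMPTY, BLACK, WHITE],
]
print(tromp_taylor_score(board))  # (6, 3): the empty left column counts for black
print(game_over([(2, 2), PASS, PASS]))  # True
```

Nothing here removes stones a human would call dead, which is exactly why Tromp-Taylor credits white with points that a human would consider already lost.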
Let's understand how AlphaGo actually works and how somebody in the audience might be able to implement it. Let's start with an intuition for the underlying search process used to choose moves, and then we'll layer on ideas from deep learning to make it much more efficient and tractable. Go is a game with just two players. We're going to draw a person here and an AI here. Let's say this person is playing black, so they go first. They play here. Now the AI is going to make a move based on what it sees here. There's a question of how you encode these inputs for the AI. Maybe you could use ones and zeros, but you need to represent black, white, and empty, so you would need at least three different values.
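One common way to give a network those three values, sketched here as an assumption rather than AlphaGo's exact input format (AlphaGo's real inputs include more features, such as recent move history), is to one-hot encode the board into three binary planes: the stones of the player to move, the opponent's stones, and the empty points. The board is assumed to be a list of lists of hypothetical `EMPTY`/`BLACK`/`WHITE` constants, and `encode_planes` is an illustrative name.

```python
from typing import List

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical encoding of an intersection
Plane = List[List[int]]

def encode_planes(board: List[List[int]], to_move: int) -> List[Plane]:
    """One-hot encode the board as [own stones, opponent stones, empty] from to_move's perspective."""
    opponent = WHITE if to_move == BLACK else BLACK
    order = (to_move, opponent, EMPTY)
    # Each plane has a 1 wherever the board holds that plane's value, else 0.
    return [[[int(v == value) for v in row] for row in board] for value in order]

# Black (the person) has played one stone; it is now white's turn (the AI).
board = [
    [EMPTY, EMPTY, EMPTY],
    [EMPTY, BLACK, EMPTY],
    [EMPTY, EMPTY, EMPTY],
]
own, opp, empty = encode_planes(board, to_move=WHITE)
print(opp)  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(sum(map(sum, empty)))  # 8
```

Ordering the planes relative to the player to move, instead of using fixed black and white planes, is a design choice that lets one network play either color; every intersection has exactly one plane set to 1.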