Simulating Humans at Scale: Simile's Joon Sung Park
I am somebody who is quite inspired by science fiction.
And when you read science fiction that covers societies that have progressed far enough in their technological maturity, you always see two pillars.
You have some version of AGI, and you have some version of simulations that really help guide society.
I do see an opportunity today to really take the first crack at building that simulation.
I would not have said that even 5 years ago, but it is a conviction we have built up over the years as we have gone deeper into this research.
Today we're delighted to have Joon, founder and CEO of Simile.
Simile is building an applied AI lab simulating human behavior and societies, and I'm very excited to have you here to discuss what you're building.
Same here.
Thank you for having me.
Okay, take me back to April 2023: Stanford, California, and specifically Smallville.
What was that?
So, Smallville was a project we were running at Stanford. We had made the observation that large language models now encode a lot of human behavior, embedded in their training data from the web, social media, and so forth, and that if you probe them at the right angle, you can actually get a lot of micro behaviors out of these models.
So, given a very specific demonstration or description of a situation, what would person X do?
And it would actually generate really interesting behaviors.
We found that so interesting, and we saw it as the ingredient we had been waiting for to create really complex agentic behaviors.
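The probing described here amounts to asking the model, for a given persona and situation, what that person would do next. Below is a minimal sketch of such a prompt builder. The `Persona` type, the `behavior_prompt` function, and the prompt wording are hypothetical illustrations, not Smallville's actual prompts, and the call to the model itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical persona record: a name plus a free-text description."""
    name: str
    description: str


def behavior_prompt(persona: Persona, situation: str) -> str:
    """Ask a language model what this persona would do in the given situation.

    The wording is an illustrative assumption, not Smallville's actual prompt.
    """
    return (
        f"{persona.description}\n\n"
        f"Situation: {situation}\n"
        f"Question: What would {persona.name} do next?\n"
        "Answer:"
    )


isabella = Persona(name="Isabella", description="Isabella owns a cafe in a small town.")
prompt = behavior_prompt(isabella, "It is the day before Valentine's Day.")
print(prompt)
```

Ending on an open `Answer:` slot leaves the model to fill in the behavior, which is the "what would person X do?" question in prompt form.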
So Smallville was an experiment where we asked: if we pushed this as far as possible, what would a society created by these agents look like?
So we created generative agents, which pair a generative AI model with memory, planning, and reflection, to create the lived experience of agents living in a small town.
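The memory component can be pictured as a store of timestamped observations from which the agent pulls the most useful ones before acting. The sketch below assumes a retrieval score that adds recency, importance, and relevance, the three signals the published Generative Agents paper uses. The equal weights, the `decay` value, the toy two-dimensional embeddings, and the `Memory`/`retrieve` names are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    created_hour: float       # simulation hour when the memory was stored
    importance: float         # assumed to be pre-scaled to [0, 1]
    embedding: List[float]    # toy vector; a real system would use an embedding model


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(memories: List[Memory], query: List[float], now_hour: float,
             k: int = 2, decay: float = 0.99) -> List[Memory]:
    """Return the k memories with the highest recency + importance + relevance.

    `decay` is a hypothetical per-hour recency factor; the three terms are
    weighted equally as a simplifying assumption.
    """
    def score(m: Memory) -> float:
        recency = decay ** (now_hour - m.created_hour)
        relevance = cosine(m.embedding, query)
        return recency + m.importance + relevance

    return sorted(memories, key=score, reverse=True)[:k]


memories = [
    Memory("Isabella is planning a Valentine's Day party at the cafe", 10, 0.8, [1.0, 0.0]),
    Memory("Isabella wiped down the counter", 11, 0.1, [0.0, 1.0]),
    Memory("A customer said they would come to the party", 12, 0.6, [0.9, 0.1]),
]
top = retrieve(memories, query=[1.0, 0.0], now_hour=12)
for m in top:
    print(m.text)
```

With a party-related query, the two party memories outrank the counter-wiping one. They are both more important and closer to the query, even though the counter memory is more recent than the first party memory.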
So Smallville was basically a game town with 25 agents living in it.
Individual agents had a persona description, but they would actually wake up in the morning, do their routines, go to work, and form relationships, sort of like people would, and you would see emergent phenomena like parties and so forth.
So that was the experiment we ran.
What was the most surprising thing to come out of the experiment?
One of the surprising things: the simulation itself is set on the day before Valentine's Day.
So you actually see one of the agents thinking, "Well, I run a cafe."
She's a cafe owner.
Her name's Isabella.
She thinks it would be great to throw a Valentine's Day party and invite a lot of friends and customers.
So on the day before Valentine's Day, you see her going around gathering materials for the party and telling her customers, "Hey, we're going to have this party.
Please come."
And on Valentine's Day, you see this emergent party actually form, with all these agents coming to the cafe.
Did anyone not get invited?
Well, some of the people did get the invitation, but they forgot.
That's one thing that did happen.
Some of the agents did not explicitly get invited, but one agent who got the invite, Klaus, decided to ask his crush out on a date.
So he actually brought his date.
And they had a party at this cafe.
Quite surreal.
So how did you end up building Smallville in the first place?
Were you studying human psychology and social behavior, or was this coming from the customer back, or from the technology out?
My particular team has been excited about simulations, and we saw the vision of simulation fairly early on.
My career as a researcher at Stanford really started back in 2020.
That was the year GPT-3 was about to come out.
It wasn't quite there yet, but it was just about to come out.
We started to get its first demos.
In my first year, we wrote a paper called "On the Opportunities and Risks of Foundation Models" with many Stanford researchers. It was led by one of my co-founders, Percy Liang, who is now the head of the Center for Research on Foundation Models at Stanford.
When we were writing it, the part I was really focused on was this: here's a new class of models, unlike anything we had seen, that can generalize in ways we didn't quite have before.
And I got to thinking: if we imagine the kinds of interactions we can create with these models, what would those be?
And many of my colleagues back then were surprised that these models could do classification or simple generation.
And that was really incredible to see, because these models weren't really taught to do that.
But what surprised me wasn't that these models could do that, because from an interaction perspective, we've known how to do this for a long time.
The interesting part was that these models can actually encode human behavior.
What does that mean if we push this as far as possible?
Part of the research tradition I come from is what we call social computing. Social computing, within human-computer interaction, is really about how we can build better technological platforms that enable social interaction and collaboration.
One of the most difficult challenges of building a social platform is not necessarily testing the UI/UX of the system; it's what happens when you have tens of people, millions of people, and down the line billions of people.
How do all these people come together to create emergent phenomena, both good and bad, and how can we design for that scale?
And so far, we haven't really had a tool that would let us test for that.
The only way we test it today is to field test it.
You release your prototype and see what happens.
And sometimes that comes at a real cost.
Obviously, it's costly in terms of human hours and the time it takes.
But beyond that, if you have a bad design, say a social media feed that is more likely to propagate negative emotions, that is obviously something we want to avoid.
But today, this only gets tested in the field.
So we wanted to see whether we could create a simulation that would let you test for this.
In 2022, a year before Generative Agents, we worked on a paper called Social Simulacra, which was really the precursor to the agents paper we ended up writing.
The core thesis was: imagine you're building a subreddit.
You're the designer of that subreddit.
You want to see what people might do in it, which is a surprisingly hard task even for practicing designers.
And we decided: hey, we have this model that seems unique.
Let's use this model to create simulations of the entire subreddit.
So you define the goal, you define the moderation strategies, and you populate it with thousands of personas. Back then we didn't call them agents; we called them personas.
This was basically the 2022 version of Moltbook, and it's quite interesting that the idea came back.
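The Social Simulacra setup described here (a community goal, moderation rules, and a large population of personas) can be sketched as plain data plus one prompt per persona. Everything below is hypothetical: the `CommunityDesign` fields, the seed persona descriptions, the rules, and the prompt wording are illustrative. Cycling through a few seeds stands in for the more varied persona generation a real system would do with the model.

```python
from dataclasses import dataclass
from itertools import cycle, islice
from typing import List


@dataclass
class CommunityDesign:
    goal: str
    rules: List[str]   # moderation strategies, written as plain-language rules


def populate(seed_personas: List[str], n: int) -> List[str]:
    """Fill n persona slots by cycling through a few seed descriptions.

    Hypothetical shortcut: a real system would have the model generate
    varied personas instead of repeating seeds.
    """
    return [f"Member #{i}: {desc}" for i, desc in enumerate(islice(cycle(seed_personas), n))]


def post_prompt(design: CommunityDesign, persona: str) -> str:
    """Build a prompt asking the model to write this persona's next post."""
    rules = "\n".join(f"- {rule}" for rule in design.rules)
    return (
        f"Community goal: {design.goal}\n"
        f"Rules:\n{rules}\n\n"
        f"{persona}\n"
        "Write this member's next post:"
    )


design = CommunityDesign(
    goal="Discuss places to sightsee in Pittsburgh",
    rules=["Be respectful", "No advertising"],
)
personas = populate(["a college student new to the city", "a retiree who loves museums"], n=1000)
prompts = [post_prompt(design, p) for p in personas]
print(len(prompts))
print(prompts[1])
```

Keeping the goal and rules in one place means a designer can change a moderation rule and regenerate every persona's prompt to see how the simulated community shifts.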
And when we saw the results, we got a lot of really important insights.
What are the good behaviors?
For example, we simulated a community whose entire purpose was for people to discuss places to sightsee in Pittsburgh.
And all of a sudden, you start to see these personas collaborate, saying, "Hey, XYZ places are amazing.
Do you want to go on a trip together?" And they would actually plan those trips live in the simulated subreddit.
So that's how we got excited.
So we saw the vision, the excitement, and the potential applications fairly early on.
But then the work we had to do was twofold. First, we had to demonstrate how to go beyond simple personas to create complex agents that can think over time, because we want to simulate the longitudinal aspect of our society. Second, we had to validate that these simulations are actually accurate in practice.
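One way to make "validating that these simulations are accurate" concrete is to compare the distribution of simulated answers with real answers to the same question. The sketch below uses total variation distance as an assumed metric (0 means identical distributions, 1 means no overlap). The answer data is made up for illustration, and the transcript does not say which metric Simile uses.

```python
from collections import Counter
from typing import Dict, List


def distribution(answers: List[str]) -> Dict[str, float]:
    """Convert a list of answers into answer -> share of respondents."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: c / total for answer, c in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the sum of absolute differences; 0 = identical, 1 = disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Made-up answers to one question from real people and from simulated agents.
human = ["agree"] * 50 + ["neutral"] * 30 + ["disagree"] * 20
simulated = ["agree"] * 60 + ["neutral"] * 20 + ["disagree"] * 20

tv = total_variation(distribution(human), distribution(simulated))
print(round(tv, 3))
```

With these made-up counts the distance is 0.1. The simulation over-represents "agree" by 10 points and under-represents "neutral" by the same amount.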
Was there a point in model evolution at which you felt, okay, we're there, the models are good enough for us to have a faithful representation of human society?
GPT-3, when it came out, was very janky, and Social Simulacra was built with GPT-3.
It had no instruction tuning.
It did not follow your instructions.
So just to get it to listen to you and do what you wanted, you had to use some weird prompting tricks and so forth.
But you could actually see the promise.
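A common trick for models without instruction tuning is to stop issuing commands. Instead, you write the beginning of a document the model would naturally continue, then cut the output at a separator. The sketch assumes a forum-thread format. The function names, the `---` separator, the usernames, and the sample completion are hypothetical, and the call to the model is omitted.

```python
from typing import List, Tuple


def completion_prompt(community: str, posts: List[Tuple[str, str]], next_author: str) -> str:
    """Frame the task as a forum page that a base model would keep writing.

    Hypothetical format: prior posts are separated by '---', and the page
    ends with an open slot for the next author's post.
    """
    lines = [f"r/{community}: recent posts", ""]
    for author, text in posts:
        lines.append(f"[{author}]: {text}")
        lines.append("---")
    lines.append(f"[{next_author}]:")
    return "\n".join(lines)


def first_post(completion: str, stop: str = "---") -> str:
    """Keep only the text written before the model starts inventing another post."""
    return completion.split(stop, 1)[0].strip()


prompt = completion_prompt(
    "pittsburgh_sightseeing",
    [("user_a", "Which neighborhoods are best to walk around?"),
     ("user_b", "Start downtown and cross one of the bridges.")],
    next_author="new_member",
)
print(prompt)

# A made-up raw continuation: the model answers, then starts another post.
raw = " Any tips for a first visit? I only have one day.\n---\n[user_c]: ..."
print(first_post(raw))
```

The separator does double duty. It shows the model where a post ends, and it gives us a place to truncate, since a base model will happily keep writing extra posts.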
The model had actually encoded a lot of human behavior, and you could see the trajectory. By the time of the Generative Agents paper, it wasn't quite magic, but we now had instruction tuning.
So we could build much more complex agents that can reason about their memory, which wasn't really possible when we did Social Simulacra. And since then, of course, the models have improved.
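Reasoning over memory can be illustrated with a reflection trigger. The agent accumulates the importance scores of new observations and, once the total crosses a threshold, asks the model to draw higher-level conclusions from them. This is a minimal sketch under assumed values (importance on a 1 to 10 scale, a threshold of 15) with hypothetical names. It returns the reflection prompt rather than calling a model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReflectiveAgent:
    name: str
    threshold: float = 15.0  # hypothetical trigger level for summed importance
    unreflected: List[Tuple[str, float]] = field(default_factory=list)

    def observe(self, text: str, importance: float) -> Optional[str]:
        """Store an observation; once enough importance piles up, return a reflection prompt."""
        self.unreflected.append((text, importance))
        if sum(score for _, score in self.unreflected) < self.threshold:
            return None
        statements = "\n".join(f"- {t}" for t, _ in self.unreflected)
        self.unreflected.clear()  # start accumulating toward the next reflection
        return (
            f"Statements about {self.name}'s recent experience:\n{statements}\n"
            "What high-level insights can you infer from these statements?"
        )


agent = ReflectiveAgent("Isabella")
events = [
    ("Isabella decided to host a Valentine's Day party at the cafe", 6),
    ("Isabella wiped down the counter", 1),
    ("A customer promised to come to the party", 9),
]
results = [agent.observe(text, score) for text, score in events]
print(results[2])
```

The first two observations sum to 7 and stay below the threshold. The third brings the total to 16, which triggers a reflection over all three and resets the buffer.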
So where we are today is that the models, at a foundational level, have reached a point where we can actually imagine building these kinds of applications.
Now, here's the part I think is quite interesting today. If you look at many of the large language model companies, whether it's OpenAI, Anthropic, or the many neolabs being formed, the north star for the models they are creating is something like: let's build superintelligent machines.
These machines are meant to be rational, and they're supposed to be really amazing at technical problems that have an objective answer.
So maybe that's not even the best simulation of real human society.
It turns out, yeah, people are irrational.
Yeah.
Right.
We have a lot of subjective values, preferences, and taste.
So you actually start to see a divergence between model size going up and the models' performance in predicting and simulating human behavior.
So with the current modeling paradigm, our ability to really simulate humans has sort of plateaued.
So we're at a good foundational starting point, but to make it really amazing,
we need the next frontier, one that is more geared toward actually modeling people's diversity.
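One simple, assumed way to make "modeling people's diversity" concrete is the Shannon entropy of the answers a population gives to the same question. A model that converges on one "rational" answer scores 0 bits, while a spread of human answers scores higher. The answer lists below are made-up examples, and the transcript does not specify how diversity is measured.

```python
import math
from collections import Counter
from typing import List


def entropy_bits(answers: List[str]) -> float:
    """Shannon entropy (in bits) of the answer distribution: higher means more varied."""
    total = len(answers)
    probs = [count / total for count in Counter(answers).values()]
    return sum(-p * math.log2(p) for p in probs)


# Made-up answers to the same question.
people = ["tea", "coffee", "coffee", "juice"]
single_answer_model = ["coffee"] * 4   # every simulated person gives the same answer

print(entropy_bits(people))               # 1.5
print(entropy_bits(single_answer_model))  # 0.0
```

Entropy only captures how spread out the answers are, not whether they match real people. In practice it would be paired with a fidelity check against observed answers.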
Very interesting.