We tested Anthropic's Fable 5 for a week
This is the infinite Library of Babel from the Borges story.
It contains every book in the universe, because books are just strings.
Look: I can even go into my bookmarks, click one of my articles, "After Automation," and it finds it in the library.
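One way to make "books are just strings" concrete is to treat every book as a fixed-length string over a small alphabet. You can then read the text as a number and derive a shelf address from that number, and the mapping works in both directions. The sketch below is hypothetical and is not the code Fable generated. The 29-symbol alphabet, the toy `BOOK_LEN` of 80 characters, and the function names `book_to_address` / `address_to_book` are all assumptions. Only the layout of 4 walls × 5 shelves × 32 volumes follows Borges's story.

```python
# Hypothetical sketch, not Fable's actual code: treat each fixed-length "book"
# as a base-29 number and read a shelf address off that number, reversibly.
# Assumptions: 29-symbol alphabet, toy BOOK_LEN, Borges-style hexagon layout.

ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."  # 26 letters, space, comma, period
BASE = len(ALPHABET)                        # 29
BOOK_LEN = 80                               # toy length; a Borges book has 1,312,000 characters
WALLS, SHELVES, VOLUMES = 4, 5, 32          # shelved walls, shelves per wall, books per shelf


def book_to_address(text: str) -> tuple[int, int, int, int]:
    """Map a book's text to (hexagon, wall, shelf, volume).

    Raises ValueError if the text is too long or uses a symbol outside ALPHABET.
    """
    text = text.lower()
    if len(text) > BOOK_LEN:
        raise ValueError("text is longer than one book")
    n = 0
    for ch in text.ljust(BOOK_LEN):         # pad with spaces, then read as base-29 digits
        n = n * BASE + ALPHABET.index(ch)   # str.index raises ValueError for unknown symbols
    n, volume = divmod(n, VOLUMES)          # peel off the mixed-radix address digits
    n, shelf = divmod(n, SHELVES)
    hexagon, wall = divmod(n, WALLS)
    return hexagon, wall, shelf, volume


def address_to_book(hexagon: int, wall: int, shelf: int, volume: int) -> str:
    """Inverse of book_to_address: rebuild the exact text stored at an address."""
    n = ((hexagon * WALLS + wall) * SHELVES + shelf) * VOLUMES + volume
    chars = []
    for _ in range(BOOK_LEN):
        n, digit = divmod(n, BASE)          # least significant digit first
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


if __name__ == "__main__":
    addr = book_to_address("After Automation")
    print(addr)                                   # huge hexagon number, then wall, shelf, volume
    print(repr(address_to_book(*addr).rstrip()))  # 'after automation'
```

The mapping is a bijection, so every possible string of `BOOK_LEN` characters has exactly one address. That is the sense in which the library "contains" every book. A real implementation would likely add an invertible scrambling step so that neighboring volumes don't hold nearly identical text.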
It's truly infinite. And look, I can go up the stairs.
I can look down.
I can look up.
This seems like it took a long time to make, right?
Wrong.
I made this entire thing with a single prompt in Fable 5, the new model from Anthropic.
Literally. Let me show you.
So, this is the prompt.
It's from about four days ago.
I got access to this model a little ahead of release.
"Read Jorge Luis Borges's The Library of Babel.
Then plan and execute, end to end, a browser-playable 3D game in which the player is dropped in..." blah blah blah.
"Loop until it's done."
I just wrote that and pressed Enter, and it went off, read the story, and then ran and ran and ran.
You can see it looping on itself and checking its own work.
After three or four hours, it was done.
We've got hexagonal galleries stacked endlessly.
We've got 20 shelves, five per side.
As you'd expect, it's faithful to the book.
It gets the mathematics right.
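Here is a back-of-the-envelope count that shows the scale of the "mathematics" involved. It uses the figures from Borges's story itself: 410 pages per book, 40 lines per page, about 80 characters per line, 25 orthographic symbols, and 32 books per shelf. Those numbers come from the story, not from the video or from Fable's code. The number of distinct books is 25 raised to the number of characters in one book. That number is far too large to print, so the sketch counts its decimal digits instead.

```python
import math

# Book dimensions as given in Borges's "The Library of Babel" (not taken from the video).
PAGES, LINES_PER_PAGE, CHARS_PER_LINE = 410, 40, 80
SYMBOLS = 25  # 22 letters plus space, comma, and period

# Hexagon layout: 4 shelved walls x 5 shelves = the 20 shelves mentioned above; 32 books per shelf.
WALLS, SHELVES_PER_WALL, BOOKS_PER_SHELF = 4, 5, 32

chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE
books_per_hexagon = WALLS * SHELVES_PER_WALL * BOOKS_PER_SHELF

# SYMBOLS ** chars_per_book distinct books: too large to print, so count its decimal digits.
digits = math.floor(chars_per_book * math.log10(SYMBOLS)) + 1

print(f"{chars_per_book:,} characters per book")  # 1,312,000 characters per book
print(f"{books_per_hexagon} books per hexagon")    # 640 books per hexagon
print(f"{SYMBOLS}^{chars_per_book:,} possible books, a number with {digits:,} digits")
# 25^1,312,000 possible books, a number with 1,834,098 digits
```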
It says, "The part I'm proudest of."
This is crazy.
It made this in one shot, running on its own for three or four hours.
Fable 5 launches today.
Here's your day-zero vibe check.
But first, remember: never make any major life decisions within 30 days of a meditation retreat, a psychedelic experience, or your first encounter with a frontier model.
Cheers.
So, before we get into it, you're probably wondering, "How is this video already out?"
My name is Dan Shipper, and I'm the co-founder and CEO of Every.
Every is the only subscription you need to stay at the edge of AI.
You can think of us as an AI lab for the future of work.
We spend all of our time testing new models and using them to do our work, from programming to writing, design, business building, and decision making.
We use them hands-on and tell you what works and what doesn't in real use cases.
And I'm incredibly excited to be doing this, because the first encounter with any new model can be wild, but Fable, a Mythos-class model, feels like a particularly big moment.
It's the most hyped model out there.
When it leaked a month and a half ago, Anthropic said it was too dangerous to release.
And now it's out.
I have a feeling that if you're like me, you're excited, but also a little scared.
Because we've been using this model for about a week, we can pull back the curtain a bit and show you what it's like to live with it.
It does change things, but hopefully this helps if you're feeling a little AI psychosis.
I'm sure that's going to be going around on X, YouTube, the news, and everywhere else.
This is a place to see, realistically, how this thing might fit into your work and your life.
So, let's get into it.
Okay, so Fable is a Mythos-class model.
Mythos is a model from Anthropic.
It's the largest model they make.
There's Haiku, Sonnet, Opus, and then Mythos.
As far as I can tell from talking to people at Anthropic, there's nothing special about it architecturally.
It's basically the same as their other models, just bigger and better.
To make it safe to release, they put fairly strict safeguards on it, so you can't use it for anything cyber-related or anything biology-related.
That's what makes Anthropic comfortable releasing it to the general public.
It's pretty expensive.
It's $10 per million input tokens and $50 per million output tokens, which is about twice the cost of Opus.
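The small cost estimator below makes the pricing concrete, using the per-million-token rates quoted above. The token counts are invented for illustration; the 20:1 input-to-output split is an assumption, not a measurement. The Opus figure simply halves Fable's rates, following the "about twice the cost" comparison rather than any published price list.

```python
# Hypothetical cost estimator using the per-million-token prices quoted in the video.
FABLE_INPUT_PER_M = 10.00   # USD per 1M input tokens
FABLE_OUTPUT_PER_M = 50.00  # USD per 1M output tokens


def run_cost(input_tokens: int, output_tokens: int,
             input_per_m: float = FABLE_INPUT_PER_M,
             output_per_m: float = FABLE_OUTPUT_PER_M) -> float:
    """Return the USD cost of a run, given token counts and per-million prices."""
    return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m


# Illustrative token counts for a long autonomous run (made up, not measured).
fable = run_cost(20_000_000, 1_000_000)
# "About twice the cost of Opus" implies Opus at roughly half these rates.
opus_estimate = run_cost(20_000_000, 1_000_000, FABLE_INPUT_PER_M / 2, FABLE_OUTPUT_PER_M / 2)

print(f"Fable: ${fable:,.2f}  Opus (est.): ${opus_estimate:,.2f}")
# Fable: $250.00  Opus (est.): $125.00
```

Keeping the prices as parameters lets you plug in your own token counts from a real run, or other models' rates, without touching the formula.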
So it's a lot, but it's genuinely the most powerful coding model I've ever used, by far.
To give you a sense, we have a senior engineer benchmark, which tests a model's ability to act like a human senior engineer.
We give it a vibe-coded, sloppy production codebase, a real one, and ask, "If you were going to rewrite this from first principles, how would you do it?"
Then we see how it does.
We score it out of 100.
The best score from any other model is 63 out of 100, from Opus 4.8, which came out about two weeks ago.
Right behind it is GPT-5.5, at 62 out of 100.
Fable scored 91 on this benchmark.
91 out of 100.
That's the same score as a human engineer, with just one prompt.
That's... it's crazy.
I knew this benchmark was going to get saturated, but I thought it would take another six months.
Look at this view, which breaks the results down by what each model is good at.
This is Opus 4.7.
The orange area is what it actually does, versus what it's supposed to do.
Pretty spiky, not that great.
With GPT-5.5, we're starting to fill out the hexagon a little.
And Fable: oh yeah, it just did it.
If I try to break down what it's really good at, because it's not good at everything:
I think it's fantastic at sustained autonomous execution.
For example, the way to work with this model is to give it a task and then leave.
Go do something else.
Let it run for three or four hours.
Set it up overnight.
It's amazing.
It just figures things out and does good work.
It has good taste.
It has good attention to detail.
There are lots of little details it handles well, which I'll show you.
That's really impressive, even with a prompt that isn't very well specified.
It has more judgment.
With previous Claude models, you'd say, "Do this thing," and they'd be like, "Oh my god, yes, I'm going to do it! I'm going to do it!"
And then: purple accents, purple accents.
It was a little try-hard, to be honest.
This model feels like it's going to go do it, but it'll think it through and think about how to do it well.
And if it doesn't think it can do it well, it might tell you, which is really helpful.
It's also incredibly good at using a lot of context: doing a bunch of research, digging into data, and surfacing things from the data that you wouldn't have known beforehand.
I'm going to go through a bunch of specific examples of exactly what this looks like and how it works.
But if I step back and think about programming in particular, what is this model?
It's like a warp drive.