AI Engineer

The agent-ready web: Simplify user actions with WebMCP — Tara Agyemang, Google

Hello, hello. Can you hear me okay? Okay, cool. Let's get started. We're going to be talking a little bit about WebMCP. Just out of curiosity, has anybody already played around with WebMCP? Only a few people. Okay, great. Those few people, you have a bit of a head start, but for everyone else, we'll go into a bit more of the background: how it works and what it does.

My name is Tara. I'm a developer relations engineer on the Google Chrome team, and I'm here with a few of my colleagues from Google Chrome, alongside the DeepMind team, too. We'd be really interested in talking to you afterwards around the DeepMind booth if you have thoughts about the web, AI, and the intersection between the two. That is where my focus is these days.

So, let's get into it. For the past few decades, we have been building the web for human actions and human eyes, and we've been trying to optimize for that. But these days, it's not just humans that are using the web. We have agents using the web on humans' behalf, too, and we are seeing an increasing number of them. The problem is that agents have to do so much work to perform simple actions on the sites we've built.

To give you an example, this is a website that I've coded: a concert website for selling tickets. And we have the Gemini in Chrome panel on the side here.
Let's say you come along to this website and you've typed this prompt: you want to buy two tickets to the Afro Beats Festival, and you've given it the details. The AI agent has to do so much work to make this happen. It'll probably look at the HTML, because agents will usually parse the entire DOM just to understand what's happening on your page. Then it will look into the accessibility tree to understand the structure of your HTML page. Then maybe it'll take a screenshot of the page and analyze all the different elements that it couldn't see in the HTML and the accessibility tree. And then maybe it will measure how far down and how far across it needs to click, work out exactly which element it needs, and click that element.

As you can see, this process is quite long. It can be brittle, and I don't even want to guess at how many tokens you probably just used trying to do this. It's probably a lot. And after all that, maybe an ad has loaded at the top of the page and pushed all your content down, and your AI agent couldn't even click the right place in the end.

So, there's so much to think about. But before we go into this proposed web standard, it's worth mentioning that you can do so much by improving web foundations first. Making your site accessible for everyone makes it accessible to AI agents by default.
So, if you improve your semantic HTML, focus on robust accessibility standards, improve your page performance so it loads really quickly (think about those Core Web Vitals), and build really good user experience flows through your site, you're already halfway to an agent-ready website. It's only once you have those in place that it makes sense to start thinking about WebMCP.

If you're not already aware, the Web Model Context Protocol is a proposed web standard that gives you the ability to define your site's capabilities as structured tools for AI agents to use. You might have heard it referred to as the USB-C of AI agent interactions. That's because, instead of an agent guessing what your website does, you're giving the AI agent a menu of tools that it can use and actions that it can take. Because of this, we're seeing that WebMCP significantly improves the performance and reliability of agents navigating your website.

So, let's see it in action. Hopefully, Gemini treats me well today. This is the maze escape game built by our team in Chrome DevRel. Just on the side here, we have a Chrome extension; I'll show you a link to that afterwards. This is the Model Context Tool Inspector.
It's a standard Chrome extension that lives in your side panel, and it lists all the tools that it finds on your website. At the moment, it can only see one tool, and that's the start maze game tool. At the bottom here, it gives you two options to interact with the page. You can interact via a prompt, like a user would normally prompt an AI agent, or you can call tools directly, but we won't be looking at that one today.

This specific maze game is unusual in that you can't actually browse it by clicking around the UI. You can only use this app with the AI tooling. So, let's start a new maze game here. You can also choose your model on the side; let's stick with Gemini 1.5.

You'll see at the bottom that when you send a prompt, it gives you all the information. The prompt was to start a new maze game, and the AI agent, Gemini in our case, has called the start game tool. The tool itself has returned this information, and then the AI has read that and given me this response.

So now we have our maze, and you'll notice that we have a bunch of new tools in the scope of this page, whereas the previous page only had that one tool. On this page, we've got a bunch of tools to help us navigate the maze. In this maze, you can move around with the north, south, east, and west directions.
You can look to see where you are in the maze and which directions are open. And you can pick up items, drop items, and use items as you navigate the maze.

If I pop in some prompts, say move down, then maybe after that, right, the AI agent should take my prompt and match it to the specific tools, so in this case, the move tool. It's taken my directions of down and right, matched them to the north, south, east, and west directions, and sent that off to the tool that we have registered on this page, and then it's moved down and right.

And because it's an AI agent, it can understand a whole bunch of different phrasings. So, I could just say: right, up, maybe right again. Let's try that. The AI agent has seen those words, mapped them to directions, and then called the move tool with that information.

And because it's an AI agent, it can keep repeating the same tool calls until it thinks it's done what needs to be done. So, I could even say, complete the maze. Then the AI agent should use all the tools available to keep moving around the maze, picking up items and using them when it needs to, because it has all the information in the tools available. This specific prompt was not the most efficient, so sometimes you'll see it go all the way back to the start and then go forwards again. But the more you refine the prompt, the better the agent knows how to complete the maze in the most efficient way.
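As a rough illustration of the move and look tools described in the demo, here is a minimal TypeScript sketch. Everything in it is an assumption made for illustration: the maze layout, the function names, and the small `synonyms` table, which only stands in for the language understanding that the model itself provides in the real demo. It is not the source of the Chrome DevRel maze game.

```typescript
// Hypothetical sketch: names and maze layout are illustrative, not from the demo.
type Direction = "north" | "south" | "east" | "west";

interface Position {
  row: number;
  col: number;
}

// 0 = open cell, 1 = wall. Row 0 is the top of the maze.
const maze: number[][] = [
  [0, 1, 0],
  [0, 0, 0],
  [1, 1, 0],
];

const offsets: Record<Direction, Position> = {
  north: { row: -1, col: 0 },
  south: { row: 1, col: 0 },
  east: { row: 0, col: 1 },
  west: { row: 0, col: -1 },
};

let player: Position = { row: 0, col: 0 };

function step(from: Position, direction: Direction): Position {
  const delta = offsets[direction];
  return { row: from.row + delta.row, col: from.col + delta.col };
}

function isOpen(pos: Position): boolean {
  const row = maze[pos.row];
  return row !== undefined && row[pos.col] === 0;
}

// "look" tool: reports which directions are open from the current cell.
function look(): Direction[] {
  const all: Direction[] = ["north", "south", "east", "west"];
  return all.filter((d) => isOpen(step(player, d)));
}

// "move" tool: moves one cell if possible and returns text for the agent to read.
function move(direction: Direction): string {
  const target = step(player, direction);
  if (!isOpen(target)) {
    return `Blocked: cannot move ${direction}.`;
  }
  player = target;
  return `Moved ${direction} to (${player.row}, ${player.col}).`;
}

// Stand-in for the model: maps everyday words to the tool's direction argument.
const synonyms: { [word: string]: Direction } = {
  up: "north",
  down: "south",
  right: "east",
  left: "west",
};

// Calls the move tool once for every direction word found in the prompt.
function runPrompt(prompt: string): string[] {
  const results: string[] = [];
  for (const word of prompt.toLowerCase().split(/[^a-z]+/)) {
    const direction = Object.prototype.hasOwnProperty.call(synonyms, word)
      ? synonyms[word]
      : undefined;
    if (direction !== undefined) {
      results.push(move(direction));
    }
  }
  return results;
}
```

With this sketch, `runPrompt("Move down, then right")` calls `move` twice, first with `south` and then with `east`. Returning plain text from each tool mirrors the demo, where the agent reads the tool's result before deciding on its next call.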
For example, if you just say the exit is in the bottom right corner, it'll be more efficient in getting there. I won't continue this, because it can take quite a while to complete this maze.

If we go back to the slides, this is the Model Context Tool Inspector that I mentioned. It's the extension that our team in Chrome DevRel built. The QR code there takes you to it in the Chrome Web Store, and anyone can grab it from there.

Essentially, WebMCP unlocks a new approach to using the web, where your users don't have to spend a lot of time figuring out how to use more complicated sites. They can figure out their own workflow: they can browse your website the normal way for a bit, then hand over control to their AI agent, and the AI agent takes steps on their behalf. Your user can then come in at any time to take control again and browse your site the way they normally would. That ability to simplify user journeys and make them easier for people has been a large part of the reason we've seen interest and excitement in this new standard.

I want to pause for a minute to address a question that some people have: what is the difference between WebMCP and MCP? You can see them as complementary to each other.
MCP enables AI agents to connect to applications on the server side. You'd need to set up your own service for the agent to access, and then the agent can access the information anywhere, anytime. WebMCP is different in that it's inspired by MCP; I like to think of it as being like how JavaScript is inspired by Java. In short, WebMCP is an implementation of the tools part of MCP. It allows engineers to provide tools to in-browser AI agents, and it's very specific to client-side features. You have to have your browser window open for WebMCP to work, and then you can use it to help your agent interact with the browser. So, all of the tools live in the browser. But you can imagine this for quite a few different types of use cases.
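To make the idea of "structured tools that live in the page" more concrete, here is a minimal TypeScript sketch based on the concert-ticket example from earlier in the talk. The `PageToolRegistry` class, the `buy_tickets` tool, and the parameter format are all hypothetical stand-ins invented for illustration. This is an in-memory model of the concept, not the browser API from the WebMCP proposal, and `execute` is synchronous here only to keep the sketch short.

```typescript
// Hypothetical in-memory stand-in for page-scoped tools; not the browser's WebMCP API.
interface ToolParameter {
  name: string;
  type: "string" | "number";
  description: string;
}

interface PageTool {
  name: string;
  description: string;
  parameters: ToolParameter[];
  // Returns text that the agent reads as the result of the call.
  execute: (args: { [key: string]: unknown }) => string;
}

class PageToolRegistry {
  private tools: PageTool[] = [];

  register(tool: PageTool): void {
    this.tools.push(tool);
  }

  // The "menu" an in-browser agent could read instead of guessing from the DOM.
  list(): string[] {
    return this.tools.map((tool) => `${tool.name}: ${tool.description}`);
  }

  // Checks argument types against the declared parameters before running the tool.
  call(name: string, args: { [key: string]: unknown }): string {
    const tool = this.tools.filter((candidate) => candidate.name === name)[0];
    if (tool === undefined) {
      return `Unknown tool: ${name}`;
    }
    for (const param of tool.parameters) {
      if (typeof args[param.name] !== param.type) {
        return `Invalid argument "${param.name}": expected ${param.type}`;
      }
    }
    return tool.execute(args);
  }
}

const registry = new PageToolRegistry();

// Hypothetical tool for the concert site: one structured action instead of many clicks.
registry.register({
  name: "buy_tickets",
  description: "Buy tickets for a concert listed on this page.",
  parameters: [
    { name: "event", type: "string", description: "Name of the concert" },
    { name: "quantity", type: "number", description: "Number of tickets" },
  ],
  execute: (args) =>
    `Reserved ${String(args["quantity"])} ticket(s) for ${String(args["event"])}.`,
});
```

In this sketch, an agent handling the "two tickets to the Afro Beats Festival" prompt would make a single call such as `registry.call("buy_tickets", { event: "Afro Beats Festival", quantity: 2 })` rather than parsing the DOM and computing click coordinates. Because the registry is just an object in the page's memory, it also reflects the point that the tools exist only while the page is open in the browser.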