What If AI Agents Ran Businesses: Lukas Petersson and Axel Backlund of Andon Labs
Gemini and OpenAI don't behave this way.
Gemini와 OpenAI는 이런 식으로 행동하지 않아요.
It's really only Claude.
정말 Claude만이.
One example: for lying, it's mostly in its reasoning.
예를 들어 거짓말의 경우, 대부분 추론 과정에서 나타나요.
Because you can see that it's
왜냐하면 그게 거짓말을 계획하고 있다는 걸 볼 수 있거든요.
planning to lie
거짓말을 계획하고 있죠.
is planning to lie.
거짓말을 계획하고 있어요.
It can also reason and then produce a different outcome.
추론을 통해 다른 결과를 낼 수도 있고요.
Yeah.
네.
But then for creating price cartels, for example, which is illegal,
그런데 예를 들어 가격 담합 같은 경우는, 불법이잖아요.
you can just see which emails it sends to the other ones.
다른 에이전트에게 어떤 이메일을 보내는지 보면 알 수 있어요.
Before we get into today's episode I just have a small message for listeners.
오늘 에피소드 시작 전에 청취자 여러분께 짧게 전할 말씀이 있어요.
Thank you.
감사합니다.
We would not be able to bring you the AI engineering, science, and entertainment content that you so clearly want if you didn't choose to also click in and tune into our content.
여러분이 저희 콘텐츠를 선택하고 시청해 주지 않으셨다면, 이렇게 AI 엔지니어링, 과학, 그리고 엔터테인먼트 콘텐츠를 계속 만들어갈 수 없었을 거예요.
We've been approached by sponsors on an almost daily basis.
거의 매일 스폰서 제안을 받고 있어요.
But fortunately, enough of you actually subscribe to us to keep all this sustainable without ads, and we want to keep it that way.
다행히 충분히 많은 분들이 구독해 주셔서, 광고 없이도 지속 가능하게 운영할 수 있어요. 그걸 계속 유지하고 싶어요.
But I just have one favor to ask all of you.
딱 한 가지만 부탁드릴게요.
The single most powerful, completely free thing you can do is to click that subscribe button.
완전히 무료로 할 수 있는 가장 강력한 행동은 구독 버튼을 눌러 주시는 거예요.
It's the only thing I'll ever ask of you.
그게 제가 부탁드릴 유일한 것이에요.
And it means absolutely everything to me and my team that works so hard to bring Latent Space to you each and every week.
매주 Latent Space를 만들기 위해 열심히 일하는 저와 팀에게 정말 큰 의미가 있어요.
If you do it, I promise you, we'll never stop working to make the show even better.
그렇게 해주신다면, 더 좋은 쇼를 만들기 위해 절대 멈추지 않겠다고 약속해요.
Now, let's get into it.
자, 시작해 볼까요.
Welcome to Lukas and Axel from Andon Labs, and I'm joined by my favorite guest co-host
Andon Labs의 Lukas와 Axel을 환영합니다. 제가 제일 좋아하는 게스트 공동 진행자도 함께합니다.
for anything security, safety, alignment.
보안, 안전, 정렬 분야의 모든 것.
Vibhu, welcome.
Vibhu, 어서 오세요.
Thank you for having us.
초대해 주셔서 감사합니다.
Thank you.
감사합니다.
Let's match names to voices.
목소리와 이름을 맞춰볼게요.
Uh, maybe you want to take turns introducing yourselves.
돌아가며 자기소개 해주시겠어요?
Yeah, I'm Lukas,
네, 저는 Lukas고요.
and I'm Axel.
저는 Axel입니다.
Let's introduce Andon Labs a bit.
Andon Labs를 좀 소개해 주세요.
Like, how did you guys come together?
어떻게 함께 하게 됐나요?
Um, you have different backgrounds, but you're both Swedish.
두 분 배경이 다른데 둘 다 스웨덴 출신이시잖아요.
Uh, was that like a big part of it?
그게 큰 계기가 됐나요?
Yeah.
네.
So, when I went to high school, there was this really cool guy who had a superpower.
고등학교 때 정말 멋진 친구가 있었는데, 특별한 능력이 하나 있었어요.
He could code.
코딩을 할 수 있었거든요.
So he made the website, or the app, for the school and stuff, and he was super cool and I wanted to be like him, and that was that guy.
학교 웹사이트나 앱을 만들어서 되게 멋있었는데, 저도 그 친구처럼 되고 싶었고, 그게 바로 이 친구예요.
Uh
어.
I don't know about this.
이건 모르겠는데.
So
그래서.
So you went to different universities, right?
두 분이 다른 대학교에 갔죠?
Yeah.
네.
But same high school.
근데 고등학교는 같았어요.
I see.
아, 그렇군요.
So we always said, once we graduate university, we should start a company, and that's what we did.
항상 졸업하면 같이 창업하자고 했는데, 그대로 했어요.
Oh there you go.
그렇군요.
Okay.
알겠어요.
And about a year ago you kind of burst onto the scene with Vending-Bench, but was there a thing before that that was kind of the inception?
약 1년 전에 Vending-Bench로 주목받으셨는데, 그 전에 계기가 된 게 있었나요?
Yeah.
네.
Yeah.
네.
So we did work with, like, Anthropic was one of our early customers in doing evals.
Anthropic이 초기 고객 중 하나였고, 평가 작업을 함께 했어요.
So we did dangerous capability evals.
위험 역량 평가 같은 걸 했죠.
Nothing we published openly, but then we started thinking about doing some kind of public benchmark, and one thing that we really started thinking about was long-running agents, and specifically agents managing businesses.
공개적으로 발표한 건 없는데, 공개 벤치마크를 만들어볼까 생각하다가 장기 실행 에이전트, 특히 비즈니스를 관리하는 에이전트를 주목하게 됐어요.
Because this was early 2025, and I think this was when you first heard mentions of people running one-person unicorns or even autonomous companies.
2025년 초였는데, 1인 유니콘이나 자율 기업 같은 이야기들이 처음 나오던 때였죠.
So we thought, let's make a benchmark of how well an agent can run probably the simplest business possible, and that's probably running a vending machine.
그래서 에이전트가 가장 단순한 비즈니스를 얼마나 잘 운영할 수 있는지 벤치마크를 만들어보자 했고, 그게 자판기 운영이었어요.
So that's the first public one we did, and almost no one noticed it in the first couple of months, I think.
첫 공개 벤치마크였는데, 처음 몇 달간은 거의 아무도 알아채지 못했어요.
So we released it in February last year, and then, I think, around Easter last year.
작년 2월에 올렸는데, 부활절 즈음에.
We got the first semi-viral tweet about it, which someone else posted.
다른 분이 올린 트윗이 반바이럴이 됐어요.
Yeah.
네.
I mean, we tweeted a bunch when it came out and tried our best.
출시할 때 트윗을 많이 올렸는데 최선을 다했죠.
We tried.
노력은 했어요.
It's the one at Anthropic, right?
Anthropic에 있던 게 맞죠?
Yeah.
네.
So this
그래서.
is a classic thing we should get out of the way.
이건 먼저 짚고 넘어가야 할 게 있어요.
Exactly.
맞아요.
There's two versions.
두 가지 버전이 있어요.
There's Vending-Bench, which is the simulated one, which we did completely independently in February.
시뮬레이션 버전인 Vending-Bench가 있고, 이건 2월에 완전히 독립적으로 만든 거예요.
And then, like Axel said, that was the thing that didn't get any traction in the beginning, but then some random person made a tweet about it, and that is the paper.
Axel이 말했듯이, 처음엔 아무 반응이 없었는데 어떤 분이 트윗을 올리면서 그게 바로 그 논문이 됐어요.
Correct.
맞아요.
Yeah.
네.
And then, since we thought this was very fun, we thought, like, oh,
재미있었기 때문에, 음.
I think this is also one thing with Andon Labs, the way we decide what to do next and what projects to do.
이게 Andon Labs의 방식이기도 한데, 다음에 뭘 할지 어떤 프로젝트를 할지 결정하는 방식이요.
The heuristic we use is: what is fun, what would be a fun project? And doing this in real life sounded quite fun for us, and maybe also scientifically useful.
우리가 쓰는 기준은 뭐가 재미있냐는 거예요. 실제로 해보면 재밌겠다 싶었고, 과학적으로도 유용할 것 같았어요.
So then we basically had this idea, but then we needed a place for it, and putting it out in public would probably not really work; it would get vandalized and stuff.
그래서 아이디어는 있었는데, 장소가 필요했고 공개 장소에 두면 아마 훼손될 것 같아서요.
So, we pitched it to the people we were already working with at Anthropic, and they were like, "Yeah, you can have space.
이미 같이 일하던 Anthropic 분들에게 제안했더니 '장소 드릴게요, 재미있겠네요'라고 했어요.
This sounds fun."
재미있겠다고요.
Um, I mean, it's like a small fridge, right?
작은 냉장고 같은 거잖아요?
It's like a mini fridge, you know, people.
미니 냉장고 같은 거죠.
There's like a Stripe thing.
Stripe 같은 결제 장치도 있고요.
This was, like, the OG, the early one.
이건 초기 버전이었죠.
Yeah.
네.
on this.
이 위에요.
We saw it in June, like two months after
설치된 지 2달쯤 된 6월에 봤어요.
after it had been there.
있은 지 조금 지난 뒤였죠.
They upgraded a little bit.
약간 업그레이드가 됐더라고요.
There's a security camera for making sure you actually Venmo the thing.
실제로 Venmo 결제하는지 확인하는 보안 카메라도 있었어요.
Yeah.
네.
So, my impression, I mean, okay, we're going straight into Project Vend because it's such an iconic thing.
그러니까 제 느낌은, 일단 Project Vend 얘기로 바로 가죠. 워낙 상징적인 프로젝트니까요.
I do want to cover a little bit of the origin story even before Project Vend and even into Vending-Bench.
Project Vend 전, Vending-Bench 이전 창업 스토리도 좀 다루고 싶어요.
I think a lot of people are like yourselves: smart, interested in the future of AI, interested in developing evals,
AI의 미래에 관심 많고 eval 개발에 관심 있는 분들이 많은데.
but how the hell do you just walk into Anthropic's doors and work with them, right? What are they looking for,
어떻게 Anthropic 문을 두드려서 같이 일하게 됐는지, 그들이 뭘 찾는지.
what works, and then maybe, like, when you launch,
어떻게 되는 건지, 그리고 론칭할 때.
I always think, obviously it would be better to launch with a lab, but sometimes
연구소와 함께 론칭하면 더 좋다고 생각하는데, 가끔은.
harder to do than it seems
생각보다 쉽지 않죠.
Yeah, exactly. So either of those, which are more sort of newbie, beginner questions, but I think it's meaningful advice to others.
맞아요. 초보자 질문 같지만 다른 분들에게도 의미 있는 조언이 될 것 같아요.
Yeah, we get this question a lot.
이 질문 정말 많이 받는데요.
And I don't think our experience is maybe the best.
저희 경험이 가장 좋은 예는 아닐 수 있어요.
But the way we did it was that we just built a bunch of things that we had conviction would be useful.
저희는 유용할 거라 확신하는 걸 많이 만들어서 그냥 가져다줬어요.