When you talk to Michael Abramov, CEO of Keymakr & Keylabs, you get the sense he’s been living a few years ahead of the rest of us. He doesn’t just talk about AI in terms of what it can do today – he’s already mapping the road where “today” is a distant speck in the rearview mirror.
We sat down for what was meant to be a quick Q&A about AI agents. I walked away with pages of notes, a new mental model for where AI is heading, and a clearer understanding of what’s hype and what’s actually happening.
From LLMs to True Agents
Abramov starts with a travel story. “If I ask a large language model to plan a trip from Toronto to Miami – four hours driving per day, scenic stops – it will give me an answer. It’s a static text response,” he says.
“But an agent? That’s different. First, it classifies the request as travel-related. Then it asks: what data do I need? Mapping routes, weather, fuel costs, hotels, restaurants. It breaks that into sub-tasks, sends them to specialized modules, and orchestrates the whole thing under one logic.”
It’s a simple but potent distinction:
- LLMs: Answer.
- Agents: Plan, coordinate, execute.
The next step – what excites and worries him – is when agents start acting on their own. Booking hotels. Making payments. Creating other agents to handle sub-problems. “It sounds futuristic, but we’re headed straight for it,” he says.
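The classify-decompose-orchestrate loop Abramov describes can be sketched in a few lines. This is a hypothetical illustration, not a real framework: the module names (`route_planner`, `weather`, `lodging`, and so on) stand in for whatever specialized tools an actual agent would call.

```python
def classify(request: str) -> str:
    """Naive keyword classifier: tag the request with a domain."""
    if any(word in request.lower() for word in ("trip", "drive", "travel")):
        return "travel"
    return "general"

def decompose(domain: str) -> list[str]:
    """Break a classified request into sub-tasks for specialized modules."""
    if domain == "travel":
        return ["route_planner", "weather", "fuel_costs", "lodging", "dining"]
    return ["llm_answer"]

def orchestrate(request: str) -> dict[str, str]:
    """Run each sub-task and merge the results under one plan."""
    results = {}
    for module in decompose(classify(request)):
        # In a real agent, each module would call a tool or an API here.
        results[module] = f"result of {module} for: {request}"
    return results

plan = orchestrate("Plan a trip from Toronto to Miami, four hours driving per day")
```

The key structural difference from a plain LLM call is visible in the return type: instead of one static text answer, you get a set of coordinated sub-results assembled under a single plan.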
Why Data Annotation Still Matters
You might think in this agentic future, data labeling becomes an afterthought. Abramov shakes his head. “Annotation is the bridge between general intelligence and specialized capability.”
He paints a picture: a healthcare agent that doesn’t just “know” medicine but understands how to navigate patient forms, medical imagery, and symptom descriptions because it was trained on meticulously labeled examples.

“In agent systems, it’s not just about labeling raw data. You’re teaching a model how to think inside a workflow,” he says. “That means labeling reasoning steps, tool outputs, even success/failure feedback.”
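What “labeling reasoning steps, tool outputs, even success/failure feedback” might look like in practice is a record that annotates an entire agent trajectory, not just the raw input. The schema and field names below are hypothetical, offered only to make the idea concrete:

```python
# One annotated step of an agent workflow: the request, the model's
# reasoning steps, the tool outputs, and the outcome all carry labels.
annotated_step = {
    "user_request": "Find the nearest in-network clinic for this patient",
    "reasoning_steps": [
        {"step": "classify request as healthcare/insurance", "label": "correct"},
        {"step": "extract patient ZIP code from intake form", "label": "correct"},
        {"step": "query provider directory", "label": "correct"},
    ],
    "tool_outputs": [
        {"tool": "provider_directory", "output": "3 clinics found", "label": "valid"},
    ],
    "outcome": "success",  # success/failure feedback for training
}
```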
The Context Window Problem Nobody Talks About
Abramov’s analogy is disarmingly simple: “Imagine you send me an email, but I can only read the first thousand characters. I’ll still reply – and you might never realize I missed half your message.”
That’s how LLM context limits work. If your input exceeds the limit, the rest is silently ignored. The model still returns a confident, coherent answer – sometimes completely detached from the part you really needed it to see.
“This is one of the most dangerous hidden risks,” Abramov warns. “People upload a huge document, get a beautiful analysis, and assume the AI read it all. It didn’t.”
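One practical defense is to estimate input size before sending, rather than letting the model silently drop the tail. The sketch below uses a crude four-characters-per-token heuristic and an assumed 8,000-token limit; real tokenizers and real model limits vary, so treat both numbers as placeholders.

```python
CONTEXT_LIMIT_TOKENS = 8000  # hypothetical model limit


def estimate_tokens(text: str) -> int:
    """Rough heuristic (~4 chars/token); a real tokenizer counts differently."""
    return len(text) // 4


def check_fits(document: str, limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Warn before sending instead of discovering the truncation never."""
    tokens = estimate_tokens(document)
    if tokens > limit:
        print(
            f"Warning: ~{tokens} tokens exceeds the {limit}-token window; "
            f"the model may never see the final ~{tokens - limit} tokens."
        )
        return False
    return True
```

A check like this turns the “beautiful analysis of a half-read document” failure mode into an explicit, visible error.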
AGI and the “Humanity’s Last Exam”
Talk turns to AGI benchmarks like Humanity’s Last Exam – 2,500 questions spanning everything from physics to chess. Current models score under 5%.
Abramov doesn’t romanticize it. “Passing the exam wouldn’t mean we have AGI. It would mean we trained a model to pass one test.”
For him, AGI means something much harder: reasoning across domains, making decisions, generalizing to new problems, and operating without narrow task boundaries. “We’re moving toward it, but we’re not there – and maybe that’s a good thing.”
Where AGI Will Come From
If it ever arrives, Abramov believes AGI will be agent-based, coordinating a “collective intelligence” of specialized models. It won’t be one giant model, but a meta-agent that partially evolves itself.
“No single human knows everything, and no single LLM will either,” he says. “The future is orchestration.”
Risks We’re Underestimating
He leans forward when we get to the risks. “The most dangerous agents aren’t the smartest – they’re the ones with too much access.”
A mediocre agent plugged into corporate infrastructure or payment systems can do far more damage than a “genius” model that’s kept in a sandbox. “We need permission frameworks and containment systems. Intelligence isn’t the only safety concern – scope of action matters just as much.”
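The “permission frameworks” Abramov calls for can start as something as simple as a deny-by-default allow-list around every agent action. The `ScopedAgent` class below is an illustrative sketch, not a real library:

```python
class ScopedAgent:
    """Wraps an agent with an explicit allow-list of actions (deny by default)."""

    def __init__(self, name: str, allowed_actions: set[str]):
        self.name = name
        self.allowed_actions = allowed_actions

    def act(self, action: str, payload: str) -> str:
        if action not in self.allowed_actions:
            raise PermissionError(f"{self.name} is not permitted to '{action}'")
        # A real system would dispatch to the actual tool here.
        return f"{self.name} executed {action}: {payload}"


# A sandboxed research agent can search and summarize, but cannot pay or deploy.
research_agent = ScopedAgent("research", {"search", "summarize"})
```

The point of the pattern is Abramov’s: containment is enforced by scope of action, not by how smart the model inside happens to be.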
Human-in-the-Loop Isn’t Dead
Yes, annotation is automating fast. Off-the-shelf models can now handle huge portions of routine labeling in minutes. But for complex, novel, or high-stakes tasks? Humans still hold the edge.
“It’s like mechanical watches,” Abramov says. “Smartwatches have all the features. But mechanical watches remain valuable because of their precision and craftsmanship. The same will be true for boutique annotation.”
Ethics: Beyond Marketing
For Abramov, “ethical AI” isn’t a slogan – it’s a discipline. That means transparency in data sourcing, better labeling practices, and sustainability as a core metric. “Until we have global standards, it’s on builders to set the bar higher than ‘responsible AI’ PR.”
The Conversation That Sticks
By the time we wrap, I realize the biggest takeaway isn’t any one prediction – it’s the mental model Abramov uses to navigate the space.
He doesn’t treat AI as a monolith to be made “smarter.” He treats it as a system of systems, where planning, orchestration, and human judgment are just as important as the horsepower of any one model.
And if he’s right, the companies that win won’t just build better models – they’ll build the ecosystems that let those models think, act, and adapt together.
