Your AI Agent Has the Same Red Flags as Your Ex
A field guide to the five ways agents misbehave in production, and how a trace turns each one from a mystery into a span you can see.
Read the field notes →
Product manager based in Sofia. I work on agent observability at Progress Software, and I'm the founder of Babuger, an AI SDR platform that runs email and LinkedIn outreach end-to-end.
Traces, evals, cost, latency and replay for agentic systems. Currently scoping the MVP and the early-access program.
AI sales agents that handle outbound email, LinkedIn and follow-up without a human orchestrating every thread.
Free generator for LLM-as-Judge evaluator prompts. Paste a trace, get a research-grounded prompt with a built-in stress test.
Free analyzer for MCP servers and tool-call schemas. Static + LLM-assisted findings cited to primary research, plus a portable test pack and an open-source CLI for live servers.
Most of what I do sits at the same edge: AI agents that look flawless in a demo and start drifting the moment they meet real users, real tools, and real money. I spend my days writing specs, scoping go-to-market, and shipping the tooling that lets teams tell whether their agents are actually behaving in production.
Long-form explainer on why agents fail in production and what a useful stack actually covers.
Read the explainer →
Working on something at the edge of agents and production? Send me a note.