Arthur Agent Toolkit
Product & Design Lead
Series B · Open Source · New York, NY · 2025–2026

Built the Arthur Agent Toolkit, an open-source toolkit for building, testing, and monitoring AI agents in production, from 0 to 1. It provides one workflow for the entire agent lifecycle: prompt versioning, evaluation, A/B testing, and live observability in a single tool.
Challenge
Agent development was a stitched-together mess. Teams were building prompts in one tool, testing in another, deploying through a third, and monitoring through none. There was no single workflow for shipping agents reliably, and most "agent frameworks" stopped at prototyping. The hard parts (evaluation, regression testing, and production monitoring) had no home.
Solution
Designed and shipped the Arthur Agent Toolkit as an open-source workflow that spans the full agent lifecycle. Built prompt versioning and A/B testing for safe iteration, pre-deployment evaluation harnesses to catch regressions, and end-to-end tracing instrumented with OpenTelemetry. Designed for teams that need to ship agents, not just prototype them.
Impact
- Took the Agent Toolkit from 0 to 1: open source from day one and built for production agent teams
- Championed AI-native workflow adoption across engineering, product, and GTM, tripling shipping velocity
- Unified the agent development lifecycle in a single tool: build, test, ship, and monitor without switching tools
- Shipped prompt versioning and A/B testing, making agent iteration as rigorous as model development
- Built end-to-end tracing on OpenTelemetry, giving full observability into every agent call, tool use, and decision
- Closed the loop with the Arthur Platform: agents tested in the Toolkit ship to production with the same evaluators
Results
- 3x increase in shipping velocity across engineering, product, and GTM after the AI-native workflow rollout
- Established Arthur as the category-defining workflow for production agent development
- Drove adoption among agent teams shipping into production environments
- Tied Arthur's OSS surface together: Engine, Toolkit, and Platform as one continuous developer journey