AN
← Back to projects
Open SourceEvaluation0→1Developer ToolsAI Governance

Arthur Engine

Product & Design Lead

Series B · Open Source · New York, NY · 2024–2026

Arthur AI hero

Built Arthur Engine from 0 to 1, a free, open-source toolkit for evaluating AI models. Designed to give every developer the same evaluation primitives that enterprises pay for, packaged for the open-source community and built to run anywhere.

Technology Partners

OpenAIAnthropicHugging FaceOpenTelemetry

Challenge

Real-time AI evaluation was locked behind enterprise contracts. Independent developers, researchers, and small teams had no way to run production-grade guardrails on their own models without rolling their own from scratch, and the open-source landscape was a patchwork of one-off scripts, not a coherent toolkit.

Solution

Defined the product, shaped the surface area, and shipped Arthur Engine as an open-source evaluation toolkit. Engine ships with built-in evaluators for hallucination, toxicity, PII, sensitive data, and prompt injection, usable on any model, in any environment, with a few lines of code. Built developer-first: clear docs, low setup cost, and a clean upgrade path to the enterprise platform.

Impact

  • Led the 0-to-1 launch of the open-source Evals Engine, cutting deployment time-to-value 90% for early adopters
  • First open-source real-time AI evaluation toolkit in the agentic infrastructure category
  • Designed the developer surface from scratch, installable in minutes, runs anywhere, framework-agnostic
  • Shipped a complete evaluator suite out of the box: hallucination, toxicity, PII, sensitive data, and prompt injection
  • Created a clean OSS-to-enterprise path, devs adopt Engine, teams graduate to the Arthur Platform

Results

  • 90% reduction in deployment time-to-value for early adopters
  • Arthur Engine reached thousands of developers within weeks of launch
  • Became Arthur's top of funnel, driving inbound from devs to enterprise pipeline
  • Established Arthur as the open-source standard for production AI evaluation

Design

Arthur AI screenshot 1
Arthur AI screenshot 2