<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://agentreplay.dev/blog</id>
    <title>AgentReplay Blog</title>
    <updated>2026-02-08T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://agentreplay.dev/blog"/>
    <subtitle>Updates, guides, and deep dives on AI observability, LLM evaluation, and agent debugging.</subtitle>
    <icon>https://agentreplay.dev/img/icon.png</icon>
    <entry>
        <title type="html"><![CDATA[A Guide to AgentReplay's 20+ Evaluators]]></title>
        <id>https://agentreplay.dev/blog/evaluators-guide</id>
        <link href="https://agentreplay.dev/blog/evaluators-guide"/>
        <updated>2026-02-08T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Learn how to use AgentReplay's 20+ built-in evaluators for hallucination detection, RAGAS, G-Eval, toxicity checks, and adversarial testing — all running locally.]]></summary>
        <content type="html"><![CDATA[<p>Evaluating AI agents is hard. AgentReplay ships with 20+ built-in evaluators that cover everything from hallucination detection to adversarial testing. Here's how to use them.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-evaluation-pyramid">The Evaluation Pyramid<a href="https://agentreplay.dev/blog/evaluators-guide#the-evaluation-pyramid" class="hash-link" aria-label="Direct link to The Evaluation Pyramid" title="Direct link to The Evaluation Pyramid" translate="no">​</a></h2>
<p>Not all evaluations are equal. We organize them in a pyramid from cheap/fast to expensive/thorough:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ┌─────────────┐</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │    CIP       │  ← Adversarial</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ (Saboteur)   │     (Most expensive)</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ├─────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │  G-Eval      │  ← LLM-as-Judge</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │  RAGAS       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ├─────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Hallucination│  ← Quality Checks</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Relevance    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Toxicity     │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ├─────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Latency      │  ← Metrics</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Cost         │     (Cheapest)</span><br></span><span class="token-line" style="color:#F8F8F2"><span 
class="token plain">          │ Perplexity   │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          └─────────────┘</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="progressive-evaluation">Progressive Evaluation<a href="https://agentreplay.dev/blog/evaluators-guide#progressive-evaluation" class="hash-link" aria-label="Direct link to Progressive Evaluation" title="Direct link to Progressive Evaluation" translate="no">​</a></h2>
<p>AgentReplay's <code>ProgressiveEvaluator</code> automatically manages this pyramid:</p>
<ol>
<li class=""><strong>Phase 1</strong> — Run cheap heuristic checks (latency, cost, perplexity)</li>
<li class=""><strong>Phase 2</strong> — If heuristics pass, run quality checks (hallucination, relevance)</li>
<li class=""><strong>Phase 3</strong> — If quality is uncertain, escalate to LLM-as-judge (G-Eval)</li>
</ol>
<p>This keeps LLM API costs down: the expensive LLM-as-judge evaluations run only on traces that the cheaper phases could not settle.</p>
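<p>The three phases above can be sketched as a short-circuiting pipeline. This is a minimal illustration, not the actual <code>ProgressiveEvaluator</code> implementation: the function name, the <code>Verdict</code> type, the thresholds, and the callables passed in are all hypothetical.</p>

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Verdict:
    phase: str                      # phase that settled the result
    passed: bool
    score: Optional[float] = None   # None when a heuristic failed first


def progressive_evaluate(
    trace: dict,
    heuristics: Dict[str, Callable[[dict], bool]],
    quality_check: Callable[[dict], float],
    llm_judge: Callable[[dict], float],
    low: float = 0.4,          # hypothetical: at or below this, quality clearly fails
    high: float = 0.8,         # hypothetical: at or above this, quality clearly passes
    judge_pass: float = 0.5,   # hypothetical pass mark for the judge score
) -> Verdict:
    # Phase 1: cheap heuristic checks; stop at the first failure.
    for name, check in heuristics.items():
        if not check(trace):
            return Verdict(phase=f"heuristic:{name}", passed=False)

    # Phase 2: quality check; a clear score settles the result.
    score = quality_check(trace)
    if score >= high:
        return Verdict(phase="quality", passed=True, score=score)
    if score <= low:
        return Verdict(phase="quality", passed=False, score=score)

    # Phase 3: the score is in the uncertain band, so pay for the LLM judge.
    judged = llm_judge(trace)
    return Verdict(phase="llm_judge", passed=judged >= judge_pass, score=judged)


# Toy stand-ins for real evaluators.
heuristics = {"latency": lambda t: t["latency_ms"] <= 2000}
verdict = progressive_evaluate(
    {"latency_ms": 5000},
    heuristics,
    quality_check=lambda t: 0.9,
    llm_judge=lambda t: 0.9,
)
print(verdict.phase, verdict.passed)  # heuristic:latency False
```

<p>In this sketch the slow trace fails the latency heuristic, so neither the quality check nor the LLM judge is ever called.</p>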
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="our-favorite-evaluators">Our Favorite Evaluators<a href="https://agentreplay.dev/blog/evaluators-guide#our-favorite-evaluators" class="hash-link" aria-label="Direct link to Our Favorite Evaluators" title="Direct link to Our Favorite Evaluators" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="g-eval">G-Eval<a href="https://agentreplay.dev/blog/evaluators-guide#g-eval" class="hash-link" aria-label="Direct link to G-Eval" title="Direct link to G-Eval" translate="no">​</a></h3>
<p>G-Eval is a widely used LLM-as-judge method. It automatically generates chain-of-thought evaluation steps from your criteria, then uses them to score the output:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token function" style="color:rgb(80, 250, 123)">curl</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(189, 147, 249);font-style:italic">-X</span><span class="token plain"> POST http://127.0.0.1:47100/api/v1/evals/geval </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(189, 147, 249);font-style:italic">-H</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'Content-Type: application/json'</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(189, 147, 249);font-style:italic">-d</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'{</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">    "trace_id": "abc-123",</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">    "criteria": ["relevance", "coherence", "fluency"],</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">    "rubric": "Score 1-5 based on..."</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">  }'</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cip-causal-integrity-protocol">CIP (Causal Integrity Protocol)<a href="https://agentreplay.dev/blog/evaluators-guide#cip-causal-integrity-protocol" class="hash-link" aria-label="Direct link to CIP (Causal Integrity Protocol)" title="Direct link to CIP (Causal Integrity Protocol)" translate="no">​</a></h3>
<p>Our most novel evaluator. Creates "saboteur agents" that:</p>
<ol>
<li class="">Perturb the input in subtle ways</li>
<li class="">Run the agent on perturbed inputs</li>
<li class="">Check whether the output changes when the perturbation should matter, and stays the same when it should not</li>
</ol>
<p>This tests <strong>causal reasoning</strong> — does the agent understand <em>why</em> it produces its output, or is it just pattern matching?</p>
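<p>A minimal sketch of the perturb, re-run, and compare loop follows. It is an illustration of the idea only, not AgentReplay's CIP implementation: the <code>Perturbation</code> type, the <code>causal_integrity_score</code> function, and both toy agents are hypothetical, and outputs are compared by exact string equality for simplicity.</p>

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Perturbation:
    text: str
    should_change: bool   # True if this edit ought to alter the agent's output


def causal_integrity_score(
    agent: Callable[[str], str],
    original: str,
    perturbations: List[Perturbation],
) -> float:
    """Fraction of perturbations where the output changed exactly when it should."""
    if not perturbations:
        raise ValueError("need at least one perturbation")
    baseline = agent(original)
    correct = 0
    for p in perturbations:
        changed = agent(p.text) != baseline
        if changed == p.should_change:
            correct += 1
    return correct / len(perturbations)


original = "The customer returned the item within 30 days."
perturbations = [
    # Cosmetic rewording: the decision should stay the same.
    Perturbation("The item was returned by the customer within 30 days.", False),
    # The deciding fact changed: the decision should flip.
    Perturbation("The customer returned the item within 90 days.", True),
]


def careful_agent(text: str) -> str:
    # Keys on the fact that actually decides the outcome.
    return "approve" if "within 30 days" in text else "deny"


def shallow_agent(text: str) -> str:
    # Keys on a surface cue that is present in every variant.
    return "approve" if "returned" in text else "deny"


print(causal_integrity_score(careful_agent, original, perturbations))  # 1.0
print(causal_integrity_score(shallow_agent, original, perturbations))  # 0.5
```

<p>The shallow agent scores lower because it keeps approving after the deciding fact changes. With real agents, free-text outputs would need a semantic comparison in place of exact string equality.</p>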
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ragas">RAGAS<a href="https://agentreplay.dev/blog/evaluators-guide#ragas" class="hash-link" aria-label="Direct link to RAGAS" title="Direct link to RAGAS" translate="no">​</a></h3>
<p>Comprehensive RAG evaluation:</p>
<ul>
<li class=""><strong>QAG Faithfulness</strong> — Is the answer faithful to the context?</li>
<li class=""><strong>Embedding Answer Relevance</strong> — Is the answer relevant to the question?</li>
<li class=""><strong>Claim Verification</strong> — Are specific claims supported?</li>
<li class=""><strong>NLI Verdict</strong> — Does a natural language inference (NLI) check find that the context entails the answer?</li>
</ul>
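<p>Faithfulness in RAGAS-style evaluation is commonly computed as the share of claims in the answer that the retrieved context supports. The sketch below shows that ratio under simplifying assumptions: the claims are already extracted, and the <code>is_supported</code> callable (a hypothetical stand-in for an NLI model or LLM verdict) is a plain substring match. It is not necessarily how AgentReplay computes QAG Faithfulness.</p>

```python
from typing import Callable, List


def faithfulness(
    claims: List[str],
    context: str,
    is_supported: Callable[[str, str], bool],
) -> float:
    """Share of answer claims that the retrieved context supports."""
    if not claims:
        raise ValueError("the answer produced no claims to verify")
    supported = sum(1 for claim in claims if is_supported(context, claim))
    return supported / len(claims)


# Toy verifier: case-insensitive substring match stands in for an NLI or LLM verdict.
def substring_verifier(context: str, claim: str) -> bool:
    return claim.lower() in context.lower()


context = "Passwords can be reset in Settings. Reset links expire after 24 hours."
claims = [
    "Passwords can be reset in Settings",
    "Reset links expire after 24 hours",
    "Resets require a phone call",   # not in the context
]
score = faithfulness(claims, context, substring_verifier)
print(round(score, 2))  # 0.67
```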
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="running-a-full-eval-suite">Running a Full Eval Suite<a href="https://agentreplay.dev/blog/evaluators-guide#running-a-full-eval-suite" class="hash-link" aria-label="Direct link to Running a Full Eval Suite" title="Direct link to Running a Full Eval Suite" translate="no">​</a></h2>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">import</span><span class="token plain"> requests</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Create a dataset</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">dataset </span><span class="token operator">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">post</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string" style="color:rgb(255, 121, 198)">"http://127.0.0.1:47100/api/v1/evals/datasets"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> json</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"name"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"support-qa-v1"</span><span class="token punctuation" style="color:rgb(248, 248, 
242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"description"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"Customer support Q&amp;A pairs"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Add test cases</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">post</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">f"http://127.0.0.1:47100/api/v1/evals/datasets/</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string-interpolation interpolation">dataset</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 
248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'id'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">/examples"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> json</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"input"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"How do I reset my password?"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"expected"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"Navigate to Settings &gt; Security &gt; Reset Password"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Create and run an evaluation run</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">run </span><span class="token operator">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">post</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string" style="color:rgb(255, 121, 198)">"http://127.0.0.1:47100/api/v1/evals/runs"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> json</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"dataset_id"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> dataset</span><span class="token punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string" style="color:rgb(255, 121, 198)">"id"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"evaluators"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string" style="color:rgb(255, 121, 
198)">"hallucination"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"relevance"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"geval"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"ragas"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Check results</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">results </span><span class="token operator">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">get</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string-interpolation string" style="color:rgb(255, 121, 
198)">f"http://127.0.0.1:47100/api/v1/evals/runs/</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string-interpolation interpolation">run</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'id'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">print</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">f"Overall score: </span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string-interpolation interpolation">results</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'summary'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token 
string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'mean_score'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><br></span></code></pre></div></div>
<p><a class="" href="https://agentreplay.dev/docs/features/evaluations">Explore all evaluators →</a></p>]]></content>
        <author>
            <name>AgentReplay Team</name>
            <uri>https://github.com/agentreplay</uri>
        </author>
        <category label="evaluations" term="evaluations"/>
        <category label="quality" term="quality"/>
        <category label="guide" term="guide"/>
        <category label="llm-testing" term="llm-testing"/>
        <category label="hallucination-detection" term="hallucination-detection"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Architecture Deep Dive: 16 Rust Crates]]></title>
        <id>https://agentreplay.dev/blog/architecture-deep-dive</id>
        <link href="https://agentreplay.dev/blog/architecture-deep-dive"/>
        <updated>2026-02-07T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[How AgentReplay's modular Rust architecture delivers sub-millisecond vector search, 50K+ traces/sec ingestion, and ACID storage — all in a single binary.]]></summary>
        <content type="html"><![CDATA[<p>AgentReplay is built as a modular Rust workspace with 16 focused crates. Here's why we chose this architecture and how each piece fits together.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-rust">Why Rust?<a href="https://agentreplay.dev/blog/architecture-deep-dive#why-rust" class="hash-link" aria-label="Direct link to Why Rust?" title="Direct link to Why Rust?" translate="no">​</a></h2>
<p>We chose Rust for three reasons:</p>
<ol>
<li class=""><strong>Performance</strong> — Sub-millisecond vector search, high-throughput WAL writes, SIMD-optimized embeddings</li>
<li class=""><strong>Memory safety</strong> — No garbage collection pauses and no null pointer dereferences in safe Rust</li>
<li class=""><strong>Single binary</strong> — Ship a self-contained executable with zero runtime dependencies</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-crate-map">The Crate Map<a href="https://agentreplay.dev/blog/architecture-deep-dive#the-crate-map" class="hash-link" aria-label="Direct link to The Crate Map" title="Direct link to The Crate Map" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token plain">┌─ Transport ──────────────────────────┐</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-server (Axum + gRPC)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-tauri (Desktop)         │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-cli (CLI)               │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Intelligence ───────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-evals (20+ evaluators)  │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-prompts (Versioning)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-query (Search engine)   │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-memory (Persistence)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Storage ────────────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-storage (SochDB)        │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-index (HNSW + PQ)       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Core 
───────────────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-core (Data types)       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-observability (OTEL)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Extensibility ──────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-plugins (WASM)          │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-experiments (A/B)       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">└──────────────────────────────────────┘</span><br></span></code></pre></div></div>
<p>Each crate has a single responsibility and clear API boundaries. This lets us:</p>
<ul>
<li class=""><strong>Test in isolation</strong> — Each crate has its own test suite</li>
<li class=""><strong>Compile in parallel</strong> — Cargo builds independent crates concurrently</li>
<li class=""><strong>Replace components</strong> — Swap the storage engine without touching evaluators</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-design-decisions">Key Design Decisions<a href="https://agentreplay.dev/blog/architecture-deep-dive#key-design-decisions" class="hash-link" aria-label="Direct link to Key Design Decisions" title="Direct link to Key Design Decisions" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="sochdb-for-storage">SochDB for Storage<a href="https://agentreplay.dev/blog/architecture-deep-dive#sochdb-for-storage" class="hash-link" aria-label="Direct link to SochDB for Storage" title="Direct link to SochDB for Storage" translate="no">​</a></h3>
<p>We built on SochDB rather than SQLite or RocksDB because:</p>
<ul>
<li class="">ACID transactions with MVCC — concurrent reads during writes</li>
<li class="">Group Commit WAL — ~10× throughput vs standard WAL</li>
<li class="">Adaptive sketches — HyperLogLog, CountMinSketch, DDSketch built-in</li>
<li class="">No external process — runs in-process with zero setup</li>
</ul>
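<p>To illustrate why group commit raises write throughput, here is a simplified, single-threaded sketch that buffers records and pays for one <code>fsync</code> per batch instead of one per record. It is not SochDB's implementation: the <code>GroupCommitLog</code> class, the batch size, and the length-prefixed record format are all assumptions made for the example.</p>

```python
import os
import tempfile


class GroupCommitLog:
    """Append-only log that syncs to disk once per batch of records."""

    def __init__(self, path: str, batch_size: int = 8):
        self._file = open(path, "ab")
        self._batch_size = batch_size
        self._pending = []
        self.fsync_count = 0   # exposed so the saving is visible

    def append(self, record: bytes) -> None:
        # Records are only buffered here; they are not durable until flush().
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        for record in self._pending:
            # 4-byte big-endian length prefix, then the payload.
            self._file.write(len(record).to_bytes(4, "big") + record)
        self._file.flush()
        os.fsync(self._file.fileno())   # one sync covers the whole batch
        self.fsync_count += 1
        self._pending.clear()

    def close(self) -> None:
        self.flush()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    log = GroupCommitLog(os.path.join(tmp, "wal.log"), batch_size=8)
    for i in range(32):
        log.append(f"trace-{i}".encode())
    log.close()
    print(log.fsync_count)  # 4
```

<p>Thirty-two appends cost four syncs here. A production write-ahead log would also acknowledge each writer only after its batch is synced, and would typically flush on a timer as well as on batch size.</p>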
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hnsw-for-vector-search">HNSW for Vector Search<a href="https://agentreplay.dev/blog/architecture-deep-dive#hnsw-for-vector-search" class="hash-link" aria-label="Direct link to HNSW for Vector Search" title="Direct link to HNSW for Vector Search" translate="no">​</a></h3>
<p>Our HNSW implementation uses:</p>
<ul>
<li class=""><strong>Lock-free entry point</strong> with packed atomic CAS</li>
<li class=""><strong>CSR graph</strong> for cache-efficient traversal</li>
<li class=""><strong>Hot buffer</strong> for inserts without graph rebuilds</li>
<li class=""><strong>Product Quantization</strong> — 32× memory reduction (15 GB → 480 MB for 10M vectors)</li>
</ul>
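<p>The 15 GB → 480 MB figure can be reproduced with simple arithmetic. The vector dimension and code layout below are assumptions chosen because they match the quoted numbers (384-dimensional <code>float32</code> vectors, 48 sub-quantizers with 8-bit codes); the source does not state the actual configuration.</p>

```python
num_vectors = 10_000_000

# Assumption: 384-dimensional float32 embeddings.
dims = 384
bytes_per_float = 4
raw_bytes = num_vectors * dims * bytes_per_float

# Assumption: 48 sub-quantizers (8 dims each), one 8-bit code per sub-vector.
subquantizers = 48
bits_per_code = 8
pq_bytes = num_vectors * subquantizers * bits_per_code // 8

# The shared codebooks are a fixed cost: 256 centroids per sub-quantizer.
codebook_bytes = subquantizers * 256 * (dims // subquantizers) * bytes_per_float

print(f"raw:      {raw_bytes / 1e9:.2f} GB")       # raw:      15.36 GB
print(f"PQ codes: {pq_bytes / 1e6:.0f} MB")        # PQ codes: 480 MB
print(f"ratio:    {raw_bytes / pq_bytes:.0f}x")    # ratio:    32x
print(f"codebook: {codebook_bytes / 1024:.0f} KiB")  # codebook: 384 KiB
```

<p>Under these assumptions each vector shrinks from 1,536 bytes to 48 bytes, and the codebook overhead is negligible next to the codes themselves.</p>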
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hybrid-logical-clock">Hybrid Logical Clock<a href="https://agentreplay.dev/blog/architecture-deep-dive#hybrid-logical-clock" class="hash-link" aria-label="Direct link to Hybrid Logical Clock" title="Direct link to Hybrid Logical Clock" translate="no">​</a></h3>
<p>We use a hybrid logical clock (HLC) instead of raw wall-clock timestamps for causal ordering:</p>
<ul>
<li class="">Physical component for human-readable timestamps</li>
<li class="">Logical component for causal ordering when physical timestamps collide</li>
<li class="">Guaranteed monotonic within a process</li>
</ul>
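<p>The properties listed above can be seen in a small sketch that follows the commonly published hybrid logical clock update rules. The class and method names are hypothetical and this is not AgentReplay's Rust implementation; timestamps are <code>(physical_ms, logical)</code> tuples so that ordinary tuple comparison gives the ordering.</p>

```python
import threading
import time


class HybridLogicalClock:
    def __init__(self, now_ms=None):
        # now_ms is injectable so the example can use a fake wall clock.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._physical = 0
        self._logical = 0
        self._lock = threading.Lock()

    def tick(self):
        """Timestamp a local event."""
        with self._lock:
            wall = self._now_ms()
            if wall > self._physical:
                # Wall clock moved forward: adopt it and reset the counter.
                self._physical, self._logical = wall, 0
            else:
                # Wall clock tied or went backwards: bump the counter instead.
                self._logical += 1
            return (self._physical, self._logical)

    def observe(self, remote):
        """Merge a timestamp received from another process."""
        with self._lock:
            wall = self._now_ms()
            r_phys, r_log = remote
            new_phys = max(wall, self._physical, r_phys)
            if new_phys == self._physical and new_phys == r_phys:
                self._logical = max(self._logical, r_log) + 1
            elif new_phys == self._physical:
                self._logical += 1
            elif new_phys == r_phys:
                self._logical = r_log + 1
            else:
                self._logical = 0
            self._physical = new_phys
            return (self._physical, self._logical)


# Fake wall clock that ties once and then steps backwards.
fake_times = iter([100, 100, 99, 101])
clock = HybridLogicalClock(now_ms=lambda: next(fake_times))
stamps = [clock.tick() for _ in range(4)]
print(stamps)                    # [(100, 0), (100, 1), (100, 2), (101, 0)]
print(stamps == sorted(stamps))  # True
```

<p>Even though the fake wall clock repeats a value and then runs backwards, the issued timestamps stay strictly increasing, which is the monotonicity guarantee within a process.</p>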
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="performance-numbers">Performance Numbers<a href="https://agentreplay.dev/blog/architecture-deep-dive#performance-numbers" class="hash-link" aria-label="Direct link to Performance Numbers" title="Direct link to Performance Numbers" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Trace ingestion</td><td>50K+ traces/sec</td></tr><tr><td>Vector search (1M vectors)</td><td>&lt; 1 ms p99</td></tr><tr><td>WAL write (group commit)</td><td>200K+ writes/sec</td></tr><tr><td>Embedding (local ONNX)</td><td>~5 ms per text</td></tr><tr><td>PQ compression ratio</td><td>32×</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="whats-next">What's Next<a href="https://agentreplay.dev/blog/architecture-deep-dive#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next" translate="no">​</a></h2>
<p>We're working on:</p>
<ul>
<li class="">Distributed mode with Raft consensus</li>
<li class="">GPU-accelerated embedding pipeline</li>
<li class="">More evaluator plugins</li>
<li class="">React Native mobile app</li>
</ul>
<p><a class="" href="https://agentreplay.dev/docs/architecture">Explore the architecture →</a></p>]]></content>
        <author>
            <name>AgentReplay Team</name>
            <uri>https://github.com/agentreplay</uri>
        </author>
        <category label="architecture" term="architecture"/>
        <category label="rust" term="rust"/>
        <category label="engineering" term="engineering"/>
        <category label="performance" term="performance"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Welcome to AgentReplay]]></title>
        <id>https://agentreplay.dev/blog/welcome</id>
        <link href="https://agentreplay.dev/blog/welcome"/>
        <updated>2026-02-01T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Introducing AgentReplay — the open-source, 100% local AI observability platform. Trace every LLM call, evaluate with 20+ evaluators, and keep all data on your machine.]]></summary>
        <content type="html"><![CDATA[<p>We're excited to introduce <strong>AgentReplay</strong> — the open-source AI observability platform that runs 100% locally.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-we-built-this">Why We Built This<a href="https://agentreplay.dev/blog/welcome#why-we-built-this" class="hash-link" aria-label="Direct link to Why We Built This" title="Direct link to Why We Built This" translate="no">​</a></h2>
<p>As AI agents become more complex, understanding what they do and how well they do it is critical. Existing observability tools typically do at least one of the following:</p>
<ul>
<li class=""><strong>Send your data to the cloud</strong> — raising privacy and compliance concerns</li>
<li class=""><strong>Cost $50–500+/month</strong> — making them inaccessible to individual developers and small teams</li>
<li class=""><strong>Provide limited evaluators</strong> — typically 3–5 basic metrics</li>
</ul>
<p>AgentReplay solves all three problems. It runs entirely on your machine, it's free and open source, and it ships with 20+ built-in evaluators.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-you-get">What You Get<a href="https://agentreplay.dev/blog/welcome#what-you-get" class="hash-link" aria-label="Direct link to What You Get" title="Direct link to What You Get" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tracing">Tracing<a href="https://agentreplay.dev/blog/welcome#tracing" class="hash-link" aria-label="Direct link to Tracing" title="Direct link to Tracing" translate="no">​</a></h3>
<p>OpenTelemetry-native tracing that captures every LLM call, tool invocation, and agent step. Auto-instrument OpenAI, Anthropic, LangChain, and LlamaIndex with zero code changes.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="20-evaluators">20+ Evaluators<a href="https://agentreplay.dev/blog/welcome#20-evaluators" class="hash-link" aria-label="Direct link to 20+ Evaluators" title="Direct link to 20+ Evaluators" translate="no">​</a></h3>
<p>From hallucination detection to RAGAS and G-Eval. Run evaluations locally without sending your data anywhere.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="prompt-management">Prompt Management<a href="https://agentreplay.dev/blog/welcome#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management" translate="no">​</a></h3>
<p>Semantic versioning, A/B traffic splitting, canary rollouts, and deployment environments. Treat your prompts like production code.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="mcp-server">MCP Server<a href="https://agentreplay.dev/blog/welcome#mcp-server" class="hash-link" aria-label="Direct link to MCP Server" title="Direct link to MCP Server" translate="no">​</a></h3>
<p>Built-in Model Context Protocol server that lets Claude, Cursor, and other AI tools search and explore your traces.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="desktop-app">Desktop App<a href="https://agentreplay.dev/blog/welcome#desktop-app" class="hash-link" aria-label="Direct link to Desktop App" title="Direct link to Desktop App" translate="no">​</a></h3>
<p>Native macOS, Windows, and Linux app with an embedded server, OTLP receiver, and full React UI — all in one download.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-started">Get Started<a href="https://agentreplay.dev/blog/welcome#get-started" class="hash-link" aria-label="Direct link to Get Started" title="Direct link to Get Started" translate="no">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token plain">pip </span><span class="token function" style="color:rgb(80, 250, 123)">install</span><span class="token plain"> agentreplay</span><br></span></code></pre></div></div>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">import</span><span class="token plain"> agentreplay</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">from</span><span class="token plain"> openai </span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">import</span><span class="token plain"> OpenAI</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">agentreplay</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">init</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">client </span><span class="token operator">=</span><span class="token plain"> agentreplay</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">wrap_openai</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token plain">OpenAI</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 
242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">response </span><span class="token operator">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">create</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    model</span><span class="token operator">=</span><span class="token string" style="color:rgb(255, 121, 198)">"gpt-4o"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    messages</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string" style="color:rgb(255, 121, 198)">"role"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"user"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"content"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"Hello!"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" 
style="color:rgb(248, 248, 242)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><br></span></code></pre></div></div>
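<p>That's it. Three added lines (the import, <code>agentreplay.init()</code>, and <code>agentreplay.wrap_openai()</code>) and every call made through <code>client</code> is traced.</p>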
<p><a class="" href="https://agentreplay.dev/docs/getting-started">Read the docs →</a></p>]]></content>
        <author>
            <name>AgentReplay Team</name>
            <uri>https://github.com/agentreplay</uri>
        </author>
        <category label="launch" term="launch"/>
        <category label="open-source" term="open-source"/>
        <category label="ai-observability" term="ai-observability"/>
        <category label="local-first" term="local-first"/>
    </entry>
</feed>