<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>AgentReplay Blog</title>
        <link>https://agentreplay.dev/blog</link>
        <description>Updates, guides, and deep dives on AI observability, LLM evaluation, and agent debugging.</description>
        <lastBuildDate>Sun, 08 Feb 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[A Guide to AgentReplay's 20+ Evaluators]]></title>
            <link>https://agentreplay.dev/blog/evaluators-guide</link>
            <guid>https://agentreplay.dev/blog/evaluators-guide</guid>
            <pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Learn how to use AgentReplay's 20+ built-in evaluators for hallucination detection, RAGAS, G-Eval, toxicity checks, and adversarial testing — all running locally.]]></description>
            <content:encoded><![CDATA[<p>Evaluating AI agents is hard. AgentReplay ships with 20+ built-in evaluators that cover everything from hallucination detection to adversarial testing. Here's how to use them.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-evaluation-pyramid">The Evaluation Pyramid<a href="https://agentreplay.dev/blog/evaluators-guide#the-evaluation-pyramid" class="hash-link" aria-label="Direct link to The Evaluation Pyramid" title="Direct link to The Evaluation Pyramid" translate="no">​</a></h2>
<p>Not all evaluations are equal. We organize them in a pyramid from cheap/fast to expensive/thorough:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ┌──────────────┐</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │    CIP       │  ← Adversarial</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ (Saboteur)   │     (Most expensive)</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ├──────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │  G-Eval      │  ← LLM-as-Judge</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │  RAGAS       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ├──────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Hallucination│  ← Quality Checks</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Relevance    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Toxicity     │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          ├──────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Latency      │  ← Metrics</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Cost         │     (Cheapest)</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          │ Perplexity   │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">          └──────────────┘</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="progressive-evaluation">Progressive Evaluation<a href="https://agentreplay.dev/blog/evaluators-guide#progressive-evaluation" class="hash-link" aria-label="Direct link to Progressive Evaluation" title="Direct link to Progressive Evaluation" translate="no">​</a></h2>
<p>AgentReplay's <code>ProgressiveEvaluator</code> automatically manages this pyramid:</p>
<ol>
<li class=""><strong>Phase 1</strong> — Run cheap heuristic checks (latency, cost, perplexity)</li>
<li class=""><strong>Phase 2</strong> — If heuristics pass, run quality checks (hallucination, relevance)</li>
<li class=""><strong>Phase 3</strong> — If quality is uncertain, escalate to LLM-as-judge (G-Eval)</li>
</ol>
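<p>Each phase runs only if the cheaper phase before it passes or is inconclusive, so expensive LLM-judge calls are reserved for the traces that actually need them. This keeps LLM API costs down.</p>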
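<p>The escalation logic can be sketched in a few lines of Python. This is a hypothetical illustration, not the real <code>ProgressiveEvaluator</code> API: the <code>ProgressiveSketch</code> class, the evaluator signature (a trace dict in, a score in [0, 1] out), the 0.7 pass threshold, and the 0.15 "uncertain" band are all assumptions chosen for the example.</p>

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical evaluator signature: a trace dict in, a score in [0, 1] out.
Evaluator = Callable[[dict], float]


@dataclass
class ProgressiveSketch:
    heuristics: list[Evaluator]   # Phase 1: cheap checks (latency, cost, ...)
    quality: list[Evaluator]      # Phase 2: hallucination, relevance, ...
    judge: Evaluator              # Phase 3: expensive LLM-as-judge
    pass_threshold: float = 0.7   # assumed cut-off for "pass"
    uncertain_band: float = 0.15  # quality scores this close to the cut-off escalate

    def evaluate(self, trace: dict) -> dict:
        # Phase 1: stop immediately if any cheap heuristic fails.
        h = min(f(trace) for f in self.heuristics)
        if self.pass_threshold > h:
            return {"phase": 1, "score": h, "passed": False}

        # Phase 2: quality checks run only for traces that passed the heuristics.
        q = sum(f(trace) for f in self.quality) / len(self.quality)
        if abs(q - self.pass_threshold) > self.uncertain_band:
            return {"phase": 2, "score": q, "passed": q >= self.pass_threshold}

        # Phase 3: the quality score is borderline, so pay for the LLM judge.
        j = self.judge(trace)
        return {"phase": 3, "score": j, "passed": j >= self.pass_threshold}


# Example with stand-in evaluators (all hypothetical).
sketch = ProgressiveSketch(
    heuristics=[lambda t: 1.0 if 2000 >= t["latency_ms"] else 0.0],  # latency budget
    quality=[lambda t: t["relevance"]],  # pretend relevance score
    judge=lambda t: 0.9,                 # pretend LLM-judge score
)
print(sketch.evaluate({"latency_ms": 5000, "relevance": 0.9})["phase"])   # 1: too slow
print(sketch.evaluate({"latency_ms": 500, "relevance": 0.95})["phase"])   # 2: clear pass
print(sketch.evaluate({"latency_ms": 500, "relevance": 0.75})["phase"])   # 3: borderline
```

<p>The main tuning knob is the uncertainty band: widening it sends more traces to the LLM judge (more cost, fewer borderline misjudgments), and narrowing it does the opposite.</p>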
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="our-favorite-evaluators">Our Favorite Evaluators<a href="https://agentreplay.dev/blog/evaluators-guide#our-favorite-evaluators" class="hash-link" aria-label="Direct link to Our Favorite Evaluators" title="Direct link to Our Favorite Evaluators" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="g-eval">G-Eval<a href="https://agentreplay.dev/blog/evaluators-guide#g-eval" class="hash-link" aria-label="Direct link to G-Eval" title="Direct link to G-Eval" translate="no">​</a></h3>
<p>A popular LLM-as-judge approach: it automatically generates chain-of-thought evaluation steps from your criteria, then scores the trace against them:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token function" style="color:rgb(80, 250, 123)">curl</span><span class="token plain"> </span><span class="token parameter variable" style="color:rgb(189, 147, 249);font-style:italic">-X</span><span class="token plain"> POST http://127.0.0.1:47100/api/v1/evals/geval </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(189, 147, 249);font-style:italic">-H</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"Content-Type: application/json"</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">\</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">  </span><span class="token parameter variable" style="color:rgb(189, 147, 249);font-style:italic">-d</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">'{</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">    "trace_id": "abc-123",</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">    "criteria": ["relevance", "coherence", "fluency"],</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">    "rubric": "Score 1-5 based on..."</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token string" style="color:rgb(255, 121, 198)">  }'</span><br></span></code></pre></div></div>
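<p>In the published G-Eval method, the final score is not simply the integer the judge writes out: it is the average of the candidate scores weighted by the probability the judge model assigns to each score token. A minimal sketch of that weighting, assuming you can obtain log-probabilities for the tokens "1" to "5" from your judge model. It shows the technique from the G-Eval paper, not necessarily how AgentReplay's <code>/evals/geval</code> endpoint computes its scores; <code>geval_score</code> and the example numbers are illustrative.</p>

```python
import math


def geval_score(score_logprobs: dict[int, float]) -> float:
    """Probability-weighted G-Eval score: the sum over candidate scores s of p(s) * s.

    `score_logprobs` maps each candidate score (e.g. 1..5) to the judge model's
    log-probability for that score token. Probabilities are renormalized over the
    candidates because the model may also put some mass on unrelated tokens.
    """
    probs = {s: math.exp(lp) for s, lp in score_logprobs.items()}
    total = sum(probs.values())
    return sum(s * p / total for s, p in probs.items())


# Example: the judge mostly favors 4, with some mass on 3 and 5.
example = {1: math.log(0.01), 2: math.log(0.04), 3: math.log(0.20),
           4: math.log(0.55), 5: math.log(0.20)}
print(round(geval_score(example), 2))  # 3.89
```

<p>Weighting by probability yields finer-grained scores with fewer ties than the raw integer output, which is the motivation the G-Eval authors give for it.</p>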
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="cip-causal-integrity-protocol">CIP (Causal Integrity Protocol)<a href="https://agentreplay.dev/blog/evaluators-guide#cip-causal-integrity-protocol" class="hash-link" aria-label="Direct link to CIP (Causal Integrity Protocol)" title="Direct link to CIP (Causal Integrity Protocol)" translate="no">​</a></h3>
<p>CIP is our most novel evaluator. It creates "saboteur agents" that:</p>
<ol>
<li class="">Perturb the input in subtle ways</li>
<li class="">Run the agent on the perturbed inputs</li>
<li class="">Check whether the output changes appropriately</li>
</ol>
<p>This tests <strong>causal reasoning</strong>: does the agent's output depend on the facts that should drive it, or is the agent just pattern matching on surface features?</p>
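<p>The saboteur loop can be illustrated with a toy agent. This is a sketch of generic perturbation testing under stated assumptions, not AgentReplay's CIP implementation: <code>toy_agent</code>, the two perturbation types (a causal edit that <em>should</em> change the answer and a paraphrase that <em>should not</em>), and the pass criterion are all hypothetical, and CIP may generate and judge perturbations quite differently.</p>

```python
from typing import Callable


# Hypothetical agent under test: answers a refund-eligibility question.
def toy_agent(prompt: str) -> str:
    return "eligible" if "within 30 days" in prompt else "not eligible"


# Each perturbation: (name, rewrite function, whether the answer SHOULD change).
Perturbation = tuple[str, Callable[[str], str], bool]

PERTURBATIONS: list[Perturbation] = [
    # Causal edit: flips the fact the answer depends on, so the output should change.
    ("flip_key_fact", lambda p: p.replace("within 30 days", "after 90 days"), True),
    # Irrelevant edit: rewording that should leave the answer unchanged.
    ("paraphrase", lambda p: p.replace("Customer bought", "The customer purchased"), False),
]


def causal_integrity(agent: Callable[[str], str], prompt: str) -> dict[str, bool]:
    """Run the agent on each perturbed prompt; True means it reacted appropriately."""
    baseline = agent(prompt)
    results = {}
    for name, perturb, should_change in PERTURBATIONS:
        changed = agent(perturb(prompt)) != baseline
        results[name] = changed == should_change
    return results


prompt = "Customer bought a laptop within 30 days and wants a refund. Eligible?"
print(causal_integrity(toy_agent, prompt))
# {'flip_key_fact': True, 'paraphrase': True}
```

<p>An agent that always answers "eligible" fails the first check (it ignores the fact that should matter), and an agent keyed on the phrase "Customer bought" fails the paraphrase check (it reacts to a change that shouldn't matter).</p>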
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ragas">RAGAS<a href="https://agentreplay.dev/blog/evaluators-guide#ragas" class="hash-link" aria-label="Direct link to RAGAS" title="Direct link to RAGAS" translate="no">​</a></h3>
<p>RAG-specific evaluators that check both faithfulness to the retrieved context and relevance to the question:</p>
<ul>
<li class=""><strong>QAG Faithfulness</strong> — Is the answer faithful to the context?</li>
<li class=""><strong>Embedding Answer Relevance</strong> — Is the answer relevant to the question?</li>
<li class=""><strong>Claim Verification</strong> — Are specific claims supported?</li>
<li class=""><strong>NLI Verdict</strong> — Natural language inference check</li>
</ul>
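<p>Faithfulness is conventionally the fraction of the answer's claims that the retrieved context supports. The sketch below computes that ratio. In practice an LLM extracts the claims and verifies each one (the QAG and NLI approaches listed above), so the sentence splitter and word-overlap <code>is_supported</code> check here are deliberately naive, hypothetical stand-ins that let the example run offline.</p>

```python
import re


def split_claims(answer: str) -> list[str]:
    """Naive claim extraction: one claim per sentence (an LLM does this in practice)."""
    return [s.strip() for s in re.split(r"[.!?]", answer) if s.strip()]


def is_supported(claim: str, context: str, min_overlap: float = 0.8) -> bool:
    """Stand-in verifier: supported if most of the claim's words appear in the context."""
    words = re.findall(r"\w+", claim.lower())
    ctx = set(re.findall(r"\w+", context.lower()))
    return bool(words) and sum(w in ctx for w in words) / len(words) >= min_overlap


def faithfulness(answer: str, context: str) -> float:
    """Supported claims / total claims (treated as 1.0 here when there are no claims)."""
    claims = split_claims(answer)
    if not claims:
        return 1.0
    return sum(is_supported(c, context) for c in claims) / len(claims)


context = "You can reset your password under Settings, then Security. Resets require email confirmation."
answer = "You can reset your password under Settings. Resets take effect after 24 hours."
print(faithfulness(answer, context))  # 0.5
```

<p>The first sentence of the answer is backed by the context and the second (the 24-hour delay) is not, so faithfulness is 0.5.</p>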
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="running-a-full-eval-suite">Running a Full Eval Suite<a href="https://agentreplay.dev/blog/evaluators-guide#running-a-full-eval-suite" class="hash-link" aria-label="Direct link to Running a Full Eval Suite" title="Direct link to Running a Full Eval Suite" translate="no">​</a></h2>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">import</span><span class="token plain"> requests</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Create a dataset</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">dataset </span><span class="token operator">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">post</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string" style="color:rgb(255, 121, 198)">"http://127.0.0.1:47100/api/v1/evals/datasets"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> json</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"name"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"support-qa-v1"</span><span class="token punctuation" style="color:rgb(248, 248, 
242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"description"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"Customer support Q&amp;A pairs"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Add test cases</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">post</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">f"http://127.0.0.1:47100/api/v1/evals/datasets/</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string-interpolation interpolation">dataset</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 
248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'id'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">/examples"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> json</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"input"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"How do I reset my password?"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"expected"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"Navigate to Settings &gt; Security &gt; Reset Password"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" 
style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Start an evaluation run on the dataset</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">run </span><span class="token operator">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">post</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string" style="color:rgb(255, 121, 198)">"http://127.0.0.1:47100/api/v1/evals/runs"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> json</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"dataset_id"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> dataset</span><span class="token punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string" style="color:rgb(255, 121, 198)">"id"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    </span><span class="token string" style="color:rgb(255, 121, 198)">"evaluators"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string" style="color:rgb(255, 121, 
198)">"hallucination"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"relevance"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"geval"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"ragas"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token comment" style="color:rgb(98, 114, 164)"># Check results</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">results </span><span class="token operator">=</span><span class="token plain"> requests</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">get</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string-interpolation string" style="color:rgb(255, 121, 
198)">f"http://127.0.0.1:47100/api/v1/evals/runs/</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string-interpolation interpolation">run</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'id'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">json</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">print</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">f"Overall score: </span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string-interpolation interpolation">results</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'summary'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token 
string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token string-interpolation interpolation string" style="color:rgb(255, 121, 198)">'mean_score'</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">]</span><span class="token string-interpolation interpolation punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token string-interpolation string" style="color:rgb(255, 121, 198)">"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><br></span></code></pre></div></div>
<p><a class="" href="https://agentreplay.dev/docs/features/evaluations">Explore all evaluators →</a></p>]]></content:encoded>
            <category>evaluations</category>
            <category>quality</category>
            <category>guide</category>
            <category>llm-testing</category>
            <category>hallucination-detection</category>
        </item>
        <item>
            <title><![CDATA[Architecture Deep Dive: 16 Rust Crates]]></title>
            <link>https://agentreplay.dev/blog/architecture-deep-dive</link>
            <guid>https://agentreplay.dev/blog/architecture-deep-dive</guid>
            <pubDate>Sat, 07 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[How AgentReplay's modular Rust architecture delivers sub-millisecond vector search, 50K+ traces/sec ingestion, and ACID storage — all in a single binary.]]></description>
            <content:encoded><![CDATA[<p>AgentReplay is built as a modular Rust workspace with 16 focused crates. Here's why we chose this architecture and how each piece fits together.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-rust">Why Rust?<a href="https://agentreplay.dev/blog/architecture-deep-dive#why-rust" class="hash-link" aria-label="Direct link to Why Rust?" title="Direct link to Why Rust?" translate="no">​</a></h2>
<p>We chose Rust for three reasons:</p>
<ol>
<li class=""><strong>Performance</strong> — Sub-millisecond vector search, high-throughput WAL writes, SIMD-optimized embeddings</li>
<li class=""><strong>Memory safety</strong> — Checked at compile time instead of by a garbage collector, so there are no GC pauses and no null pointer dereferences</li>
<li class=""><strong>Single binary</strong> — Ship a self-contained executable with zero runtime dependencies</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-crate-map">The Crate Map<a href="https://agentreplay.dev/blog/architecture-deep-dive#the-crate-map" class="hash-link" aria-label="Direct link to The Crate Map" title="Direct link to The Crate Map" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token plain">┌─ Transport ──────────────────────────┐</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-server (Axum + gRPC)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-tauri (Desktop)         │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-cli (CLI)               │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Intelligence ───────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-evals (20+ evaluators)  │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-prompts (Versioning)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-query (Search engine)   │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-memory (Persistence)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Storage ────────────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-storage (SochDB)        │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-index (HNSW + PQ)       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Core 
───────────────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-core (Data types)       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-observability (OTEL)    │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">├─ Extensibility ──────────────────────┤</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-plugins (WASM)          │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">│  agentreplay-experiments (A/B)       │</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">└──────────────────────────────────────┘</span><br></span></code></pre></div></div>
<p>The map groups the main crates by layer. Each crate has a single responsibility and a clear API boundary, which lets us:</p>
<ul>
<li class=""><strong>Test in isolation</strong> — Each crate has its own test suite</li>
<li class=""><strong>Compile in parallel</strong> — Cargo builds independent crates concurrently</li>
<li class=""><strong>Replace components</strong> — Swap the storage engine without touching evaluators</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="key-design-decisions">Key Design Decisions<a href="https://agentreplay.dev/blog/architecture-deep-dive#key-design-decisions" class="hash-link" aria-label="Direct link to Key Design Decisions" title="Direct link to Key Design Decisions" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="sochdb-for-storage">SochDB for Storage<a href="https://agentreplay.dev/blog/architecture-deep-dive#sochdb-for-storage" class="hash-link" aria-label="Direct link to SochDB for Storage" title="Direct link to SochDB for Storage" translate="no">​</a></h3>
<p>We built on SochDB rather than SQLite or RocksDB because:</p>
<ul>
<li class="">ACID transactions with MVCC — concurrent reads during writes</li>
<li class="">Group Commit WAL — ~10× write throughput compared with syncing the WAL on every commit</li>
<li class="">Adaptive sketches — HyperLogLog, CountMinSketch, DDSketch built-in</li>
<li class="">No external process — runs in-process with zero setup</li>
</ul>
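<p>Group commit gets its speedup by amortizing <code>fsync</code>: commits that arrive while one flush is in progress are written and synced together in the next flush, so many transactions share a single <code>fsync</code>. The Python sketch below shows the leader/follower pattern; the <code>GroupCommitWAL</code> class and its newline-delimited record format are hypothetical and are not SochDB's implementation.</p>

```python
import os
import threading


class GroupCommitWAL:
    """Toy write-ahead log in which concurrent commits share one fsync (group commit).

    Hypothetical illustration only: records are newline-terminated bytes, and error
    handling (retrying or poisoning the log after a failed fsync) is omitted.
    """

    def __init__(self, path: str):
        self._f = open(path, "ab")
        self._cond = threading.Condition()
        self._pending: list[bytes] = []  # records queued for the next flush
        self._next_seq = 0               # sequence number of the latest queued record
        self._durable_seq = 0            # every record up to this number is fsynced
        self._flushing = False           # True while a leader thread is doing I/O
        self.fsync_count = 0

    def commit(self, record: bytes) -> None:
        """Append one record and return only once it is durable on disk."""
        with self._cond:
            self._pending.append(record + b"\n")
            self._next_seq += 1
            my_seq = self._next_seq
            while my_seq > self._durable_seq:
                if self._flushing:
                    # Another thread is flushing; our record rides along in the next batch.
                    self._cond.wait()
                    continue
                # Become the leader: take every queued record as a single batch.
                batch, self._pending = self._pending, []
                batch_end = self._next_seq
                self._flushing = True
                self._cond.release()  # let other threads queue records during the I/O
                try:
                    self._f.write(b"".join(batch))
                    self._f.flush()
                    os.fsync(self._f.fileno())  # one fsync covers the whole batch
                finally:
                    self._cond.acquire()
                self.fsync_count += 1
                self._durable_seq = batch_end
                self._flushing = False
                self._cond.notify_all()  # wake followers whose records are now durable

    def close(self) -> None:
        self._f.close()
```

<p>With several threads calling <code>commit()</code> concurrently, <code>fsync_count</code> typically ends up well below the number of commits, because records that arrive during a flush are batched into the next one. The actual ratio depends on the disk and the level of concurrency.</p>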
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hnsw-for-vector-search">HNSW for Vector Search<a href="https://agentreplay.dev/blog/architecture-deep-dive#hnsw-for-vector-search" class="hash-link" aria-label="Direct link to HNSW for Vector Search" title="Direct link to HNSW for Vector Search" translate="no">​</a></h3>
<p>Our HNSW implementation uses:</p>
<ul>
<li class=""><strong>Lock-free entry point</strong> with packed atomic CAS</li>
<li class=""><strong>CSR (compressed sparse row) graph layout</strong> for cache-efficient traversal</li>
<li class=""><strong>Hot buffer</strong> for inserts without graph rebuilds</li>
<li class=""><strong>Product Quantization</strong> — 32× memory reduction (15 GB → 480 MB for 10M vectors)</li>
</ul>
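<p>The quoted figures are consistent with 384-dimensional float32 embeddings, which is an assumption since the dimension isn't given: 384 × 4 bytes = 1,536 bytes per vector, or about 15.4 GB for 10M vectors, while 48 one-byte PQ codes per vector (one per 8-dimensional sub-vector) take 48 bytes, a 32× reduction to 480 MB. The NumPy sketch below illustrates product quantization in general (a k-means codebook per sub-space plus uint8 codes); the function names and parameters are illustrative, not AgentReplay's index code.</p>

```python
import numpy as np


def train_pq(x: np.ndarray, m: int, k: int = 256, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Train one k-means codebook per sub-space; returns an array of shape (m, k, d // m)."""
    n, d = x.shape
    assert d % m == 0, "vector dimension must split evenly into m sub-vectors"
    assert 256 >= k, "codes are stored as uint8, so at most 256 centroids"
    ds = d // m
    sub = x.reshape(n, m, ds)
    rng = np.random.default_rng(seed)
    codebooks = np.empty((m, k, ds), dtype=np.float32)
    for j in range(m):
        data = sub[:, j, :]
        cent = data[rng.choice(n, size=k, replace=False)]  # init from random points
        for _ in range(iters):
            # Lloyd step: assign to the nearest centroid, then move centroids to the mean.
            assign = ((data[:, None, :] - cent[None, :, :]) ** 2).sum(-1).argmin(1)
            for c in range(k):
                members = data[assign == c]
                if len(members):  # keep the old centroid if a cluster empties out
                    cent[c] = members.mean(0)
        codebooks[j] = cent
    return codebooks


def encode(x: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Replace each sub-vector with the index of its nearest centroid (one byte each)."""
    m, _, ds = codebooks.shape
    sub = x.reshape(x.shape[0], m, ds)
    # (n, m, k) squared distances; fine for a demo, but batch this for large n.
    dists = ((sub[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(-1)
    return dists.argmin(-1).astype(np.uint8)


def decode(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Approximate reconstruction: concatenate the selected centroid of every sub-space."""
    return np.concatenate([codebooks[j][codes[:, j]] for j in range(codebooks.shape[0])], axis=1)


# Demo: 64-dim float32 vectors split into m=8 sub-vectors of 8 dims, so each
# 256-byte vector becomes 8 one-byte codes (the same 32x ratio as above).
rng = np.random.default_rng(1)
x = rng.standard_normal((1000, 64)).astype(np.float32)
codebooks = train_pq(x, m=8, k=16, iters=5)  # k=16 keeps the demo fast; 256 is typical
codes = encode(x, codebooks)
print(x.nbytes // codes.nbytes)  # 32
```

<p>At query time, PQ distances are usually computed from small per-sub-space lookup tables built for each query, so search can run over the compact codes rather than the full-precision vectors.</p>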
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hybrid-logical-clock">Hybrid Logical Clock<a href="https://agentreplay.dev/blog/architecture-deep-dive#hybrid-logical-clock" class="hash-link" aria-label="Direct link to Hybrid Logical Clock" title="Direct link to Hybrid Logical Clock" translate="no">​</a></h3>
<p>We use a hybrid logical clock (HLC) instead of plain wall-clock timestamps for causal ordering:</p>
<ul>
<li class="">Physical component for human-readable timestamps</li>
<li class="">Logical counter that orders events when physical timestamps tie or the wall clock steps backward</li>
<li class="">Guaranteed monotonic within a process</li>
</ul>
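<p>The bullets above correspond to the standard HLC update rules. A minimal sketch, assuming millisecond wall-clock time and an injectable clock function for testing; <code>HybridLogicalClock</code> and <code>HLCTimestamp</code> are hypothetical names, not AgentReplay's types.</p>

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True, order=True)
class HLCTimestamp:
    physical: int  # wall-clock milliseconds (the human-readable part)
    logical: int   # counter ordering events that share the same physical value


class HybridLogicalClock:
    def __init__(self, now_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000):
        self._now = now_ms
        self._l = 0  # largest physical time seen so far
        self._c = 0  # logical counter

    def now(self) -> HLCTimestamp:
        """Timestamp a local or send event."""
        prev = self._l
        self._l = max(prev, self._now())
        # Same physical value as last time (tie or clock went backward): bump the counter.
        self._c = self._c + 1 if self._l == prev else 0
        return HLCTimestamp(self._l, self._c)

    def update(self, remote: HLCTimestamp) -> HLCTimestamp:
        """Merge a timestamp received from another process (receive event)."""
        prev = self._l
        self._l = max(prev, remote.physical, self._now())
        if self._l == prev and self._l == remote.physical:
            self._c = max(self._c, remote.logical) + 1
        elif self._l == prev:
            self._c += 1
        elif self._l == remote.physical:
            self._c = remote.logical + 1
        else:
            self._c = 0
        return HLCTimestamp(self._l, self._c)


# Example: the wall clock is stuck at 1000 ms, yet timestamps still increase.
clock = HybridLogicalClock(now_ms=lambda: 1000)
print(clock.now(), clock.now())
# HLCTimestamp(physical=1000, logical=0) HLCTimestamp(physical=1000, logical=1)
```

<p>Because timestamps compare as (physical, logical) pairs, sorting spans by HLC respects causality across processes that exchange timestamps, while the physical part stays close to real time.</p>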
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="performance-numbers">Performance Numbers<a href="https://agentreplay.dev/blog/architecture-deep-dive#performance-numbers" class="hash-link" aria-label="Direct link to Performance Numbers" title="Direct link to Performance Numbers" translate="no">​</a></h2>
<table><thead><tr><th>Metric</th><th>Value</th></tr></thead><tbody><tr><td>Trace ingestion</td><td>50K+ traces/sec</td></tr><tr><td>Vector search (1M vectors)</td><td>&lt; 1ms p99</td></tr><tr><td>WAL write (group commit)</td><td>200K+ writes/sec</td></tr><tr><td>Embedding (local ONNX)</td><td>~5ms per text</td></tr><tr><td>PQ compression ratio</td><td>32×</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="whats-next">What's Next<a href="https://agentreplay.dev/blog/architecture-deep-dive#whats-next" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next" translate="no">​</a></h2>
<p>We're working on:</p>
<ul>
<li class="">Distributed mode with Raft consensus</li>
<li class="">GPU-accelerated embedding pipeline</li>
<li class="">More evaluator plugins</li>
<li class="">React Native mobile app</li>
</ul>
<p><a class="" href="https://agentreplay.dev/docs/architecture">Explore the architecture →</a></p>]]></content:encoded>
            <category>architecture</category>
            <category>rust</category>
            <category>engineering</category>
            <category>performance</category>
        </item>
        <item>
            <title><![CDATA[Welcome to AgentReplay]]></title>
            <link>https://agentreplay.dev/blog/welcome</link>
            <guid>https://agentreplay.dev/blog/welcome</guid>
            <pubDate>Sun, 01 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Introducing AgentReplay — the open-source, 100% local AI observability platform. Trace every LLM call, evaluate with 20+ evaluators, and keep all data on your machine.]]></description>
            <content:encoded><![CDATA[<p>We're excited to introduce <strong>AgentReplay</strong> — the open-source AI observability platform that runs 100% locally.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-we-built-this">Why We Built This<a href="https://agentreplay.dev/blog/welcome#why-we-built-this" class="hash-link" aria-label="Direct link to Why We Built This" title="Direct link to Why We Built This" translate="no">​</a></h2>
<p>As AI agents become more complex, understanding what they do and how well they do it is critical. Existing observability tools typically fall short in at least one of three ways:</p>
<ul>
<li class=""><strong>Send your data to the cloud</strong> — raising privacy and compliance concerns</li>
<li class=""><strong>Cost $50–500+/month</strong> — making them inaccessible to individual developers and small teams</li>
<li class=""><strong>Provide limited evaluators</strong> — typically 3–5 basic metrics</li>
</ul>
<p>AgentReplay solves all three problems. It runs entirely on your machine, it's free and open source, and it ships with 20+ built-in evaluators.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-you-get">What You Get<a href="https://agentreplay.dev/blog/welcome#what-you-get" class="hash-link" aria-label="Direct link to What You Get" title="Direct link to What You Get" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="tracing">Tracing<a href="https://agentreplay.dev/blog/welcome#tracing" class="hash-link" aria-label="Direct link to Tracing" title="Direct link to Tracing" translate="no">​</a></h3>
<p>OpenTelemetry-native tracing that captures every LLM call, tool invocation, and agent step. Auto-instrument OpenAI, Anthropic, LangChain, and LlamaIndex with zero code changes.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="20-evaluators">20+ Evaluators<a href="https://agentreplay.dev/blog/welcome#20-evaluators" class="hash-link" aria-label="Direct link to 20+ Evaluators" title="Direct link to 20+ Evaluators" translate="no">​</a></h3>
<p>From hallucination detection to RAGAS and G-Eval. Run evaluations locally without sending your data anywhere.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="prompt-management">Prompt Management<a href="https://agentreplay.dev/blog/welcome#prompt-management" class="hash-link" aria-label="Direct link to Prompt Management" title="Direct link to Prompt Management" translate="no">​</a></h3>
<p>Semantic versioning, A/B traffic splitting, canary rollouts, and deployment environments. Treat your prompts like production code.</p>
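<p>This post does not show AgentReplay's traffic-splitting API, so as a rough illustration of the idea behind A/B splits and canary rollouts, here is a minimal sketch of deterministic, hash-based assignment. The <code>assign_variant</code> function and the version labels are hypothetical and not part of the <code>agentreplay</code> package.</p>

```python
import hashlib


def assign_variant(user_id: str, splits: dict) -> str:
    """Deterministically map a user to one prompt version (hypothetical helper).

    `splits` maps a version label to its share of traffic; shares must sum to 1.0.
    Hashing the user ID keeps each user pinned to the same version across requests.
    """
    total = sum(splits.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"traffic shares must sum to 1.0, got {total}")
    # Turn the user ID into a stable, roughly uniform point in [0, 1].
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, share in splits.items():
        cumulative += share
        if cumulative > point:
            return version
    # Float rounding can leave `point` at or just above the final boundary.
    return list(splits)[-1]


# A canary rollout: send roughly 5% of users to the new prompt version.
canary = {"v1.2.0": 0.95, "v1.3.0": 0.05}
print(assign_variant("user-42", canary))
```

<p>Hashing instead of random sampling means a given user sees the same prompt version on every request, so their results are not split across variants. With the new version listed last, raising its share only moves users onto it, never back off it.</p>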
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="mcp-server">MCP Server<a href="https://agentreplay.dev/blog/welcome#mcp-server" class="hash-link" aria-label="Direct link to MCP Server" title="Direct link to MCP Server" translate="no">​</a></h3>
<p>Built-in Model Context Protocol server that lets Claude, Cursor, and other AI tools search and explore your traces.</p>
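<p>MCP is built on JSON-RPC 2.0: a client such as Claude or Cursor asks a server to run one of its tools by sending a <code>tools/call</code> request. The sketch below builds such a message with the standard library. The tool name <code>search_traces</code> and its arguments are hypothetical, since this post does not list the tools AgentReplay's server exposes.</p>

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,        # lets the client match the server's response
        "method": "tools/call",  # MCP method for invoking a server-side tool
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
print(tools_call_request(1, "search_traces", {"query": "timeout errors", "limit": 10}))
```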
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="desktop-app">Desktop App<a href="https://agentreplay.dev/blog/welcome#desktop-app" class="hash-link" aria-label="Direct link to Desktop App" title="Direct link to Desktop App" translate="no">​</a></h3>
<p>Native macOS, Windows, and Linux app with an embedded server, OTLP receiver, and full React UI — all in one download.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="get-started">Get Started<a href="https://agentreplay.dev/blog/welcome#get-started" class="hash-link" aria-label="Direct link to Get Started" title="Direct link to Get Started" translate="no">​</a></h2>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token plain">pip </span><span class="token function" style="color:rgb(80, 250, 123)">install</span><span class="token plain"> agentreplay</span><br></span></code></pre></div></div>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#F8F8F2;--prism-background-color:#282A36"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#F8F8F2;background-color:#282A36"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#F8F8F2"><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">import</span><span class="token plain"> agentreplay</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">from</span><span class="token plain"> openai </span><span class="token keyword" style="color:rgb(189, 147, 249);font-style:italic">import</span><span class="token plain"> OpenAI</span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">agentreplay</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">init</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">client </span><span class="token operator">=</span><span class="token plain"> agentreplay</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">wrap_openai</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token plain">OpenAI</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><span class="token punctuation" style="color:rgb(248, 248, 
242)">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">response </span><span class="token operator">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">chat</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">completions</span><span class="token punctuation" style="color:rgb(248, 248, 242)">.</span><span class="token plain">create</span><span class="token punctuation" style="color:rgb(248, 248, 242)">(</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    model</span><span class="token operator">=</span><span class="token string" style="color:rgb(255, 121, 198)">"gpt-4o"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain">    messages</span><span class="token operator">=</span><span class="token punctuation" style="color:rgb(248, 248, 242)">[</span><span class="token punctuation" style="color:rgb(248, 248, 242)">{</span><span class="token string" style="color:rgb(255, 121, 198)">"role"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"user"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">,</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"content"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">:</span><span class="token plain"> </span><span class="token string" style="color:rgb(255, 121, 198)">"Hello!"</span><span class="token punctuation" style="color:rgb(248, 248, 242)">}</span><span class="token punctuation" 
style="color:rgb(248, 248, 242)">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#F8F8F2"><span class="token plain"></span><span class="token punctuation" style="color:rgb(248, 248, 242)">)</span><br></span></code></pre></div></div>
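<p>That's it: three lines (the import, <code>agentreplay.init()</code>, and the <code>wrap_openai</code> call) and every request made through the wrapped client is traced.</p>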
<p><a class="" href="https://agentreplay.dev/docs/getting-started">Read the docs →</a></p>]]></content:encoded>
            <category>launch</category>
            <category>open-source</category>
            <category>ai-observability</category>
            <category>local-first</category>
        </item>
    </channel>
</rss>