Corey Bello • Applied AI & Solutions Engineer

Enterprise RAG Chatbot (Agentic RAG — In Development/Testing)

Window: 15 apps • 2,000+ docs • 4-week development

Baseline → After: Status: internal testing

Method: Eval harness • groundedness • citation accuracy • latency • guardrail compliance

Agentic, self-hosted RAG system that consumes the KBMS corpus (15 apps, 2,000+ docs) to deliver cited, policy-compliant answers with no-answer fallback, guardrails, and nightly evals.

Role: Architect • Team: 1 • Duration: 4 weeks

Python

embeddings + vector store

retrieval middleware

LLM templates

logging/CI

Problem

Analysts lacked a safe, reliable way to query 2,000+ internal docs. They needed answers that were cited, policy-compliant, and self-correcting.

Approach

Agentic loop: ingestion → enrichment → embeddings + vector DB → retrieval + optional rerank → synthesis → guardrails (must-cite, PII) → templated LLM answer. Backed by nightly evals (groundedness, citation accuracy, latency) and observability logs.
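
The retrieval and no-answer steps of the loop can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the bag-of-words embed function stands in for a real embedding model, and MIN_SCORE, NO_ANSWER, DOCS, and every function name are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical cutoff; the real system's threshold is not stated in the source.
MIN_SCORE = 0.2
NO_ANSWER = "I could not find a supported answer in the indexed documents."


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return the top-k (doc_id, score) pairs, best first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def answer(query: str, docs: dict[str, str]) -> dict:
    """Retrieve, then either build a cited answer or fall back to no-answer."""
    hits = [(d, s) for d, s in retrieve(query, docs) if s >= MIN_SCORE]
    if not hits:
        return {"text": NO_ANSWER, "citations": []}
    # Placeholder synthesis: a real system would call a templated LLM prompt here.
    top_id = hits[0][0]
    return {"text": f"{docs[top_id]} [{top_id}]", "citations": [d for d, _ in hits]}


# Made-up documents for illustration only.
DOCS = {
    "kb-001": "Password resets require manager approval within 24 hours.",
    "kb-002": "Expense reports are due on the fifth business day of each month.",
}
```

Calling answer("How do password resets work?", DOCS) cites kb-001, while an off-topic question such as "What is the cafeteria menu?" scores below the cutoff and returns the fallback text with no citations.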

Results

Produces cited, policy-compliant answers, with a measurable no-answer fallback and guardrail-based self-critique. The evaluation harness tracks groundedness, citation accuracy, and latency. Status: internal testing.
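
As a rough illustration of what such tracking could compute, the sketch below scores a batch of logged answers. The metric definitions here are simplified assumptions and are not Ragas's formulas; EvalRecord and all field and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalRecord:
    cited: list[str]          # doc ids the answer cited
    relevant: set[str]        # doc ids a reviewer marked as supporting (hypothetical labels)
    latency_ms: float
    answerable: bool = True   # False when the corpus holds no answer
    abstained: bool = False   # True when the system returned the no-answer fallback


def citation_accuracy(records: list[EvalRecord]) -> float:
    """Share of all emitted citations that point at a reviewer-approved doc."""
    pairs = [(c, r.relevant) for r in records for c in r.cited]
    if not pairs:
        return 0.0
    return sum(c in relevant for c, relevant in pairs) / len(pairs)


def abstention_accuracy(records: list[EvalRecord]) -> float:
    """Share of records where abstaining (or not) matched answerability."""
    return sum(r.abstained == (not r.answerable) for r in records) / len(records)


def p95_latency(records: list[EvalRecord]) -> float:
    """Nearest-rank 95th percentile of latency in milliseconds."""
    xs = sorted(r.latency_ms for r in records)
    rank = math.ceil(0.95 * len(xs))
    return xs[rank - 1]


# Made-up evaluation records for illustration only.
records = [
    EvalRecord(cited=["a", "b"], relevant={"a"}, latency_ms=120),
    EvalRecord(cited=["c"], relevant={"c"}, latency_ms=300),
    EvalRecord(cited=[], relevant=set(), latency_ms=80, answerable=False, abstained=True),
    EvalRecord(cited=["d"], relevant=set(), latency_ms=200, answerable=False, abstained=False),
]
```

Scoring abstention separately from citation accuracy keeps a system from looking good simply by refusing often, or by answering questions the corpus cannot support.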

How we measured →

Architecture

Vector DB for embeddings + metadata filters
Retrieval middleware + optional reranker
Guardrails for must-cite + PII/secret filtering
Policy-driven no-answer fallback
Evaluation harness (Ragas nightly metrics)
Observability: Langfuse/Phoenix + CI logging
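
The must-cite and PII/secret guardrails listed above might look something like the sketch below. It assumes citations are written as [doc-id], which the source does not confirm, and the three regex patterns are illustrative only; a production filter would need a vetted, much broader set. All names are hypothetical.

```python
import re

# Illustrative patterns only, not an exhaustive PII/secret filter.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}
CITATION = re.compile(r"\[([\w-]+)\]")  # assumed citation format: [doc-id]


def redact(text: str) -> str:
    """Replace each pattern match with a marker that cannot be mistaken for a citation."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}_REDACTED>", text)
    return text


def check_citations(text: str, allowed_ids: set[str]) -> tuple[bool, list[str]]:
    """ok requires at least one citation, and every citation must be a retrieved doc."""
    cited = CITATION.findall(text)
    ok = bool(cited) and all(c in allowed_ids for c in cited)
    return ok, cited


def apply_guardrails(draft: str, retrieved_ids: set[str],
                     fallback: str = "No supported answer found.") -> str:
    """Reject uncited or mis-cited drafts, then redact what remains."""
    ok, _ = check_citations(draft, retrieved_ids)
    if not ok:
        return fallback
    return redact(draft)
```

For example, apply_guardrails("Contact jane.doe@example.com for access [kb-014].", {"kb-014"}) keeps the citation and masks the address, while a draft with no citation, or one citing a document that was never retrieved, is replaced by the fallback.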

Key Challenges

Groundedness on edge cases
Near-duplicate handling
Chunk size vs recall
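
Two of these challenges lend themselves to a small sketch: near-duplicate filtering with word-shingle Jaccard similarity, and overlapping chunk windows as one way to trade chunk size against recall. The source does not say which techniques the project used, so both are assumptions, as are the function names and default values.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of n-word shingles, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Intersection over union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(chunks: list[str], threshold: float = 0.8) -> list[str]:
    """Keep the first chunk of each near-duplicate group (O(n^2), fine for a sketch)."""
    kept: list[tuple[str, set]] = []
    for chunk in chunks:
        sig = shingles(chunk)
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((chunk, sig))
    return [chunk for chunk, _ in kept]


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split into windows of `size` words, each sharing `overlap` words with the previous."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]
```

Smaller windows tend to make matches more precise but can split an answer across chunks; the overlap is there so a sentence that straddles a boundary still appears whole in at least one chunk. At larger corpus sizes the pairwise comparison in dedupe would likely need to be replaced by something like MinHash.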

Lessons Learned

Prioritize "no-answer" quality
Version prompts and track deltas
Log everything
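
One way to act on the last two lessons is to derive a version id from the prompt text itself and attach it to every logged request, so that a metric change can be traced to a prompt change. The sketch below uses only the standard library; the template text, field names, and function names are hypothetical and are not the project's actual prompts or log schema.

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("rag")

# Hypothetical template; the project's actual prompts are not shown in the source.
ANSWER_TEMPLATE = (
    "Answer using only the context below and cite sources as [doc-id].\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def prompt_version(template: str) -> str:
    """Short content hash, so any edit to the template yields a new version id."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def log_event(question: str, doc_ids: list[str], latency_ms: float, abstained: bool) -> dict:
    """Emit one JSON line per request and return the record for inspection."""
    record = {
        "ts": time.time(),
        "prompt_version": prompt_version(ANSWER_TEMPLATE),
        "question": question,
        "retrieved": list(doc_ids),
        "latency_ms": latency_ms,
        "abstained": abstained,
    }
    logger.info(json.dumps(record))
    return record
```

Grouping nightly eval results by prompt_version then gives the per-version deltas directly. If questions may contain sensitive data, they would need to be redacted before logging.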

Explore the Project

View Demo • Evidence Repo

Interested in similar results?

Let's discuss how we can implement AI automation for your team.

Book a 15-min Intro