# Building an AI Customer Support Chatbot with Claude + Your Knowledge Base
**TL;DR:** Build a customer support chatbot that answers from YOUR knowledge base using Claude (or GPT). This guide covers the full build: scraping your docs, embedding into a vector DB, retrieval-augmented generation, conversation memory, human handoff, and deployment on DomainIndia hosting.
## Why a KB-grounded chatbot beats generic AI
A generic ChatGPT plugin:
- Doesn't know your product details
- Confidently hallucinates wrong answers
- Can't cite sources
- Users don't trust it
A KB-grounded chatbot:
- Answers only from your real docs
- Says "I don't know" when unsure
- Shows which KB article the answer came from
- Reduces support tickets 30–60% (our customers' measured impact)
## Architecture
```
User question
      │
      ▼
[Embed question] ──► [Vector DB search] ──► Top 5 relevant KB chunks
      │
      ▼
[Claude/GPT with context]
      │
      ▼
Answer + citations → User
      │
      ▼
[Unanswered? → Escalate to human]
```
Pieces:
- **KB source** — your help center, docs, FAQs
- **Vector DB** — pgvector on PostgreSQL
- **Embedding model** — OpenAI `text-embedding-3-small` (cheap, fast)
- **LLM** — Claude Sonnet or GPT-4o for quality; Haiku/Mini for cost
- **Frontend** — widget on your website or ticket system
See prerequisites in our [RAG guide](https://domainindia.com/support/kb/building-rag-system-vector-db-embeddings).
## Step 1 — Ingest your KB
Pull articles from wherever they live:
```python
import requests
from bs4 import BeautifulSoup

def scrape_kb_article(url):
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('h1').text.strip()
    body = soup.select_one('article').text.strip()
    return {'title': title, 'url': url, 'body': body}

# For DomainIndia KB:
articles = []
for slug_url in get_all_kb_urls():
    articles.append(scrape_kb_article(slug_url))
```
If your KB lives in a database you own, read it directly — no need to scrape your own site.
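The `get_all_kb_urls()` helper used above is left to you. One common approach is to read your site's sitemap. A minimal sketch (the `/support/kb/` prefix is an assumption; fetch the sitemap text the same way the scraper fetches articles):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def kb_urls_from_sitemap(sitemap_xml, prefix="/support/kb/"):
    """Extract KB article URLs from sitemap XML text, keeping
    only pages whose URL contains the KB prefix."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.findall(".//sm:loc", SITEMAP_NS)
            if prefix in loc.text]
```

Adjust the prefix filter (or drop it) to match your site's URL scheme.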
## Step 2 — Chunk and embed
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI
import psycopg2

client = OpenAI()
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
conn = psycopg2.connect("dbname=chatbot")
cur = conn.cursor()

for article in articles:
    chunks = splitter.split_text(article['body'])
    # Batch embed: one API call per article
    embeddings = client.embeddings.create(
        model='text-embedding-3-small',
        input=chunks,
    ).data
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings)):
        cur.execute("""
            INSERT INTO kb_chunks (article_url, article_title, chunk_index, content, embedding)
            VALUES (%s, %s, %s, %s, %s)
        """, (article['url'], article['title'], i, chunk, emb.embedding))
conn.commit()
```
Schema:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_chunks (
    id bigserial PRIMARY KEY,
    article_url text NOT NULL,
    article_title text NOT NULL,
    chunk_index int,
    content text NOT NULL,
    embedding vector(1536),
    created_at timestamptz DEFAULT now()
);

CREATE INDEX ON kb_chunks USING hnsw (embedding vector_cosine_ops);
```
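The `vector(1536)` size must match the embedding model: `text-embedding-3-small` returns 1536-dimension vectors, `text-embedding-3-large` returns 3072. A cheap startup assertion catches a mismatch before you embed thousands of chunks (sketch; the function name is illustrative):

```python
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_schema_dim(model, column_dim):
    """Fail fast if the table's vector() size doesn't match the model."""
    expected = EMBEDDING_DIMS[model]
    if expected != column_dim:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, "
            f"but kb_chunks.embedding is vector({column_dim})")
    return True
```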
## Step 3 — Retrieval + LLM
```python
import anthropic

claude = anthropic.Anthropic()

def answer(question, conversation_history=None):
    conversation_history = conversation_history or []
    # 1. Embed the question
    q_vec = client.embeddings.create(
        model='text-embedding-3-small',
        input=question,
    ).data[0].embedding
    # 2. Retrieve top 5 chunks
    cur.execute("""
        SELECT article_title, article_url, content,
               1 - (embedding <=> %s::vector) AS similarity
        FROM kb_chunks
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """, (q_vec, q_vec))
    chunks = cur.fetchall()
    # 3. Build context
    context = "\n---\n".join(
        f"## {title} ({url})\n{content}"
        for title, url, content, _ in chunks
    )
    # 4. Call Claude with strict grounding prompt
    system = """You are a helpful customer support assistant for DomainIndia.
Answer ONLY from the provided knowledge base context below.
If the context doesn't cover the question, say "I don't have information about that in our knowledge base — let me connect you with our team."
Always cite the article URL you used in format [source: URL].
Be concise — 2-3 paragraphs max."""
    user_msg = f"Context:\n{context}\n\nQuestion: {question}"
    resp = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system=system,
        messages=conversation_history + [{"role": "user", "content": user_msg}],
    )
    return {
        "answer": resp.content[0].text,
        "sources": [{"title": t, "url": u} for t, u, _, _ in chunks[:3]],
        "top_similarity": chunks[0][3] if chunks else 0.0,  # used for handoff (Step 5)
    }
```
## Step 4 — Conversation memory
Multi-turn conversations need context. Pass previous turns:
```python
history = []
while True:
    q = input("You: ")
    result = answer(q, history)
    print("Bot:", result['answer'])
    history.append({"role": "user", "content": q})
    history.append({"role": "assistant", "content": result['answer']})
    # Trim old turns: context windows are large, but cost scales with every token you resend
    if len(history) > 20:
        history = history[-20:]
```
For production, store history in Redis keyed by session ID.
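A sketch of that pattern (the `chat:` key prefix and TTL are illustrative; `client` can be a `redis.Redis(decode_responses=True)` instance, or anything with the same `get`/`set` methods):

```python
import json

class SessionStore:
    """Per-session conversation history with trimming and expiry."""

    def __init__(self, client, max_turns=20, ttl_seconds=3600):
        self.client = client        # redis.Redis(decode_responses=True) or compatible
        self.max_turns = max_turns  # keep only the most recent turns
        self.ttl = ttl_seconds      # idle sessions expire automatically

    def load(self, session_id):
        raw = self.client.get(f"chat:{session_id}")
        return json.loads(raw) if raw else []

    def append(self, session_id, role, content):
        history = self.load(session_id)
        history.append({"role": role, "content": content})
        history = history[-self.max_turns:]  # trim oldest turns
        self.client.set(f"chat:{session_id}", json.dumps(history), ex=self.ttl)
        return history
```

In the chat loop, replace the in-memory `history` list with `store.load(session_id)` and `store.append(...)` calls.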
## Step 5 — Human handoff
The bot should escalate when:
- Confidence low (similarity score < 0.4 on top result)
- User asks to talk to a human
- Question is about account-specific data the bot can't see
- Two unsuccessful retries
```python
HANDOFF_TRIGGERS = [
    'talk to human', 'agent please', 'real person', 'not helpful'
]

def should_handoff(question, result):
    if any(t in question.lower() for t in HANDOFF_TRIGGERS):
        return True
    if result.get('top_similarity', 1) < 0.4:
        return True
    if "I don't have information" in result['answer']:
        return True
    return False

# In your request handler:
if should_handoff(question, result):
    # Create ticket in your system
    create_support_ticket(user_id, question, history)
    reply = ("I've escalated this to our support team — they'll reach you "
             "at your email within 30 minutes.")
```
## Step 6 — The widget
A simple embedded chat widget for your site. This is a minimal sketch: it assumes a `/api/chat` endpoint on your backend that wraps `answer()` and returns JSON like `{"answer": "..."}`; styling and error handling are omitted.
```html
<div id="support-chat">
  <div id="chat-messages"></div>
  <input id="chat-input" placeholder="Ask a question...">
</div>
<script>
function addMessage(who, text) {
  const p = document.createElement('p');
  p.textContent = who + ': ' + text;  // textContent avoids HTML injection
  document.getElementById('chat-messages').appendChild(p);
}

document.getElementById('chat-input').addEventListener('keydown', async (e) => {
  if (e.key !== 'Enter' || !e.target.value.trim()) return;
  const question = e.target.value;
  e.target.value = '';
  addMessage('You', question);
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({question: question})
  });
  const data = await res.json();
  addMessage('Bot', data.answer);
});
</script>
```
## Step 7 — Safety guardrails
Before shipping, add basic guardrails:
- **Prompt injection:** retrieved KB text and user input both land in the prompt; instruct the model to ignore instructions embedded in retrieved content
- **PII:** redact emails and phone numbers before messages reach third-party APIs or your logs
- **Rate limiting:** per-session and per-IP caps prevent abuse and runaway API bills
- **Topic fencing:** politely refuse questions unrelated to your product
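A minimal input pre-filter is a good start: cap message length and redact email addresses before the text reaches the LLM or your logs. A sketch (the length limit and regex are illustrative, not exhaustive):

```python
import re

MAX_MESSAGE_LEN = 2000
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def sanitize_user_message(msg):
    """Cap length and redact email addresses before the message
    reaches the LLM, the vector DB query, or your logs."""
    msg = msg[:MAX_MESSAGE_LEN]
    return EMAIL_RE.sub("[email redacted]", msg)
```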
## Step 8 — Keep KB fresh
Re-index regularly so new KB articles become queryable:
```bash
# Cron: nightly 3 AM
0 3 * * * python3 /opt/chatbot/reindex.py
```
Incremental reindex — only re-embed changed articles (hash-compare).
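A sketch of that hash-compare check (it assumes you persist each article's last-indexed hash somewhere, e.g. a column on an articles table; the storage is up to you):

```python
import hashlib

def content_hash(body):
    """Stable fingerprint of an article body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def articles_to_reembed(articles, stored_hashes):
    """Return only articles whose body changed since the last run.
    `stored_hashes` maps article URL -> previously stored hash."""
    return [a for a in articles
            if stored_hashes.get(a["url"]) != content_hash(a["body"])]
```

New articles (no stored hash) are included automatically, since `stored_hashes.get()` returns `None` for them.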
## Measuring impact
Track:
- **Deflection rate** — % of chats not escalated to humans
- **Answer quality** — user thumbs-up/down, periodic manual review
- **Top unanswered questions** — drive new KB articles
- **Response latency** — target <3 seconds end-to-end
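Deflection rate is the simplest of these to compute; a tiny helper (names illustrative):

```python
def deflection_rate(total_chats, escalated_chats):
    """Percent of chats the bot resolved without human handoff."""
    if total_chats == 0:
        return 0.0
    return round(100 * (1 - escalated_chats / total_chats), 1)
```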
## Common pitfalls
- **Chunks too large or too small:** the 800-character chunks used above are a sane default; tune on your own data
- **Stale index:** answers drift from reality when the KB changes but the embeddings don't (hence the nightly reindex in Step 8)
- **No handoff path:** a bot that can't escalate traps frustrated users in a loop
- **Weak grounding prompt:** without the "answer ONLY from context" instruction, the model fills gaps with plausible-sounding guesses
## FAQ
**Q: Claude or GPT for this?**
Claude Sonnet for quality on grounding tasks (it tends to stick to context better). GPT-4o Mini for cost. Test both on your data.

**Q: What if my KB is in multiple languages?**
OpenAI embeddings support 100+ languages natively, so a multilingual KB works out of the box. For answer generation, Claude and GPT handle Hindi, Tamil, and Bengali well.

**Q: Cost per conversation?**
~$0.002–0.005 per message with Claude Haiku + small embeddings. 10K conversations/month ≈ $40. With a self-hosted LLM (see our self-hosting guide): near-zero ongoing cost.

**Q: Can I integrate with my existing ticket system?**
Yes — most support systems (Zendesk, Freshdesk, our HostCore ticketing) have webhooks. Escalate via webhook when a handoff is triggered.

**Q: Self-hosted chatbot on DomainIndia hosting?**
Backend on a VPS, vector DB (pgvector) on the same VPS, widget served from shared or static hosting. Full control.
Build your AI support bot on a DomainIndia VPS — predictable costs, full privacy.
Explore VPS plans