

By DomainIndia Team · DomainIndia Engineering
6 min read · 24 Apr 2026
# Building an AI Customer Support Chatbot with Claude + Your Knowledge Base
TL;DR
Build a customer support chatbot that answers from YOUR knowledge base using Claude (or GPT). This guide covers the full build: scraping your docs, embedding into a vector DB, retrieval-augmented generation, conversation memory, human handoff, and deployment on DomainIndia hosting.
## Why a KB-grounded chatbot beats generic AI

A generic ChatGPT plugin:

- Doesn't know your product details
- Confidently hallucinates wrong answers
- Can't cite sources
- Users don't trust it

A KB-grounded chatbot:

- Answers only from your real docs
- Says "I don't know" when unsure
- Shows which KB article the answer came from
- Reduces support tickets 30–60% (our customers' measured impact)

## Architecture

```
User question
      │
      ▼
[Embed question] ──► [Vector DB search] ──► Top 5 relevant KB chunks
      │
      ▼
[Claude/GPT with context]
      │
      ▼
Answer + citations → User
      │
      [Unanswered? → Escalate to human]
```

The pieces:

- **KB source** — your help center, docs, FAQs
- **Vector DB** — pgvector on PostgreSQL
- **Embedding model** — OpenAI `text-embedding-3-small` (cheap, fast)
- **LLM** — Claude Sonnet or GPT-4o for quality; Haiku/Mini for cost
- **Frontend** — a widget on your website or ticket system

See prerequisites in our [RAG guide](https://domainindia.com/support/kb/building-rag-system-vector-db-embeddings).

## Step 1 — Ingest your KB

Pull articles from wherever they live:

```python
import requests
from bs4 import BeautifulSoup

def scrape_kb_article(url):
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('h1').text.strip()
    body = soup.select_one('article').text.strip()
    return {'title': title, 'url': url, 'body': body}

# For the DomainIndia KB:
articles = []
for slug_url in get_all_kb_urls():
    articles.append(scrape_kb_article(slug_url))
```

If your KB lives in a database you own, read it directly — no need to scrape your own site.
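`get_all_kb_urls()` above is left undefined. One way to implement it is to walk your sitemap; here is a stdlib-only sketch (the sitemap URL and the `/support/kb/` path filter are assumptions — adjust them for your site):

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Sitemap XML uses this namespace on every element
SITEMAP_NS = '{http://www.sitemaps.org/schemas/sitemap/0.9}'

def parse_sitemap(xml_text, path_filter='/support/kb/'):
    """Pull <loc> URLs out of sitemap XML, keeping only KB article paths."""
    root = ET.fromstring(xml_text)
    urls = (loc.text.strip() for loc in root.iter(f'{SITEMAP_NS}loc'))
    return [u for u in urls if path_filter in u]

def get_all_kb_urls(sitemap_url='https://www.domainindia.com/sitemap.xml'):
    """Fetch the sitemap and return the KB article URLs to scrape."""
    with urlopen(sitemap_url, timeout=10) as resp:
        return parse_sitemap(resp.read().decode())
```

If your help center exposes an articles API instead, use that — it's less brittle than scraping.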
## Step 2 — Chunk and embed

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI
from pgvector.psycopg2 import register_vector
import psycopg2

client = OpenAI()
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)

conn = psycopg2.connect("dbname=chatbot")
register_vector(conn)  # pgvector adapter so Python lists bind to vector columns
cur = conn.cursor()

for article in articles:
    chunks = splitter.split_text(article['body'])

    # Batch embed
    embeddings = client.embeddings.create(
        model='text-embedding-3-small',
        input=chunks,
    ).data

    for i, (chunk, emb) in enumerate(zip(chunks, embeddings)):
        cur.execute("""
            INSERT INTO kb_chunks
                (article_url, article_title, chunk_index, content, embedding)
            VALUES (%s, %s, %s, %s, %s)
        """, (article['url'], article['title'], i, chunk, emb.embedding))
    conn.commit()
```

Schema:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_chunks (
    id bigserial PRIMARY KEY,
    article_url text NOT NULL,
    article_title text NOT NULL,
    chunk_index int,
    content text NOT NULL,
    embedding vector(1536),
    created_at timestamptz DEFAULT now()
);

CREATE INDEX ON kb_chunks USING hnsw (embedding vector_cosine_ops);
```

## Step 3 — Retrieval + LLM

```python
import anthropic

claude = anthropic.Anthropic()

def answer(question, conversation_history=None):
    conversation_history = conversation_history or []

    # 1. Embed the question
    q_vec = client.embeddings.create(
        model='text-embedding-3-small',
        input=question,
    ).data[0].embedding

    # 2. Retrieve the top 5 chunks
    cur.execute("""
        SELECT article_title, article_url, content,
               1 - (embedding <=> %s::vector) AS similarity
        FROM kb_chunks
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """, (q_vec, q_vec))
    chunks = cur.fetchall()

    # 3. Build the context block
    context = "\n\n---\n\n".join(
        f"## {title} ({url})\n{content}"
        for title, url, content, _ in chunks
    )

    # 4. Call Claude with a strict grounding prompt
    system = """You are a helpful customer support assistant for DomainIndia.
Answer ONLY from the provided knowledge base context below.
If the context doesn't cover the question, say "I don't have information about that in our knowledge base — let me connect you with our team."
Always cite the article URL you used in the format [source: URL].
Be concise — 2-3 paragraphs max."""

    user_msg = f"Context:\n{context}\n\nQuestion: {question}"

    resp = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system=system,
        messages=conversation_history + [{"role": "user", "content": user_msg}],
    )

    return {
        "answer": resp.content[0].text,
        "sources": [{"title": t, "url": u} for t, u, _, _ in chunks[:3]],
        # Step 5's handoff logic reads this
        "top_similarity": chunks[0][3] if chunks else 0,
    }
```

## Step 4 — Conversation memory

Multi-turn conversations need context. Pass previous turns:

```python
history = []
while True:
    q = input("You: ")
    result = answer(q, history)
    print("Bot:", result['answer'])
    history.append({"role": "user", "content": q})
    history.append({"role": "assistant", "content": result['answer']})
    # Trim old turns — context windows are large, but cost scales with tokens
    if len(history) > 20:
        history = history[-20:]
```

For production, store history in Redis keyed by session ID.

## Step 5 — Human handoff

The bot should escalate when:

- Confidence is low (similarity score < 0.4 on the top result)
- The user asks to talk to a human
- The question is about account-specific data the bot can't see
- There have been two unsuccessful retries

```python
HANDOFF_TRIGGERS = [
    'talk to human', 'agent please', 'real person', 'not helpful'
]

def should_handoff(question, result):
    if any(t in question.lower() for t in HANDOFF_TRIGGERS):
        return True
    if result.get('top_similarity', 1) < 0.4:
        return True
    if "I don't have information" in result['answer']:
        return True
    return False

# Inside your request handler:
if should_handoff(question, result):
    # Create a ticket in your system
    create_support_ticket(user_id, question, history)
    return ("I've escalated this to our support team — they'll reach you "
            "at your email within 30 minutes.")
```

## Step 6 — The widget

A simple embedded chat widget for your site:

```html
<!-- Hypothetical minimal embed: an iframe pointing at a chat UI your backend serves at /chat-widget -->
<iframe src="/chat-widget" title="Support Assistant" style="position:fixed;bottom:16px;right:16px;width:360px;height:480px;border:0"></iframe>
```

## Step 7 — Safety guardrails

Cap message length, rate-limit per session, and keep the grounding instruction in the system prompt strict: that instruction is your main defense against prompt injection and off-topic use.

## Step 8 — Keep KB fresh

Re-index regularly so new KB articles become queryable:

```bash
# Cron: nightly at 3 AM
0 3 * * * python3 /opt/chatbot/reindex.py
```

Make the reindex incremental — only re-embed articles whose content hash has changed.

## Measuring impact

Track:

- **Deflection rate** — % of chats not escalated to humans
- **Answer quality** — user thumbs-up/down, periodic manual review
- **Top unanswered questions** — these drive new KB articles
- **Response latency** — target <3 seconds end-to-end

## Common pitfalls

The usual ones: chunks too large to retrieve precisely, an index that goes stale after KB edits, and no escalation path for questions the bot can't answer.

## FAQ
**Q: Claude or GPT for this?**

Claude Sonnet for quality on grounding tasks (tends to stick to context better). GPT-4o Mini for cost. Test both on your data.

**Q: What if my KB is in multiple languages?**

OpenAI embeddings support 100+ languages natively. Multilingual KB works out of the box. For answer generation, Claude/GPT handle Hindi, Tamil, Bengali well.

**Q: Cost per conversation?**

~$0.002–0.005 per message with Claude Haiku + small embeddings, so 10K conversations/month runs roughly $40. With a self-hosted LLM (see our self-hosting guide) there's no per-token API cost — you pay only for the server.
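To sanity-check those numbers, a back-of-envelope helper (the per-message rates and messages-per-conversation figure are assumptions within the ranges above):

```python
def monthly_llm_cost(conversations, msgs_per_conv=2,
                     llm_cost_per_msg=0.002, embed_cost_per_msg=0.0001):
    """Estimate monthly spend: one LLM call + one embedding per user message."""
    messages = conversations * msgs_per_conv
    return messages * (llm_cost_per_msg + embed_cost_per_msg)

# 10K conversations at the low end of the range lands near the ~$40 figure
print(f"${monthly_llm_cost(10_000):.2f}")
```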

**Q: Can I integrate with my existing ticket system?**

Yes — most support systems (Zendesk, Freshdesk, our HostCore ticketing) have webhooks. Escalate via webhook when handoff triggered.
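As a sketch of that escalation path (the webhook URL and payload field names are assumptions — match them to your ticket system's webhook schema):

```python
import json
from urllib import request

def build_escalation_payload(user_id, question, history):
    """Shape the handoff data the way a ticket webhook typically expects."""
    return {
        'requester_id': user_id,
        'subject': question[:80],          # truncate long questions for the subject
        'transcript': [f"{m['role']}: {m['content']}" for m in history],
        'source': 'ai-chatbot',
    }

def escalate_via_webhook(webhook_url, payload):
    """POST the payload as JSON to the ticket system's webhook endpoint."""
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'},
    )
    return request.urlopen(req, timeout=10)
```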

**Q: Self-hosted chatbot on DomainIndia hosting?**

Backend on VPS. Vector DB (pgvector) on same VPS. Widget served from shared or static hosting. Full control.
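On the VPS, the widget needs an HTTP endpoint to talk to. A minimal stdlib-only WSGI sketch (in production you'd likely use FastAPI or Flask behind nginx; `fake_answer` is a stand-in for the real `answer()` from Step 3):

```python
import json
from wsgiref.simple_server import make_server

def fake_answer(question, history):
    """Stand-in for the retrieval + LLM pipeline from Step 3."""
    return {'answer': f"(demo) You asked: {question}", 'sources': []}

def chat_app(environ, start_response):
    """POST /chat with {"question": ..., "history": [...]} returns JSON."""
    if environ['REQUEST_METHOD'] == 'POST' and environ['PATH_INFO'] == '/chat':
        size = int(environ.get('CONTENT_LENGTH') or 0)
        body = json.loads(environ['wsgi.input'].read(size) or b'{}')
        result = fake_answer(body.get('question', ''), body.get('history', []))
        payload = json.dumps(result).encode()
        start_response('200 OK', [('Content-Type', 'application/json')])
        return [payload]
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']

# To serve: make_server('0.0.0.0', 8000, chat_app).serve_forever()
```

Keep the API keys and the database on the VPS; the widget only ever sees this endpoint.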

Build your AI support bot on a DomainIndia VPS — predictable costs, full privacy. Explore VPS plans
