# Building an AI Customer Support Chatbot with Claude + Your Knowledge Base
**TL;DR:** Build a customer support chatbot that answers from YOUR knowledge base using Claude (or GPT). This guide covers the full build: scraping your docs, embedding into a vector DB, retrieval-augmented generation, conversation memory, human handoff, and deployment on DomainIndia hosting.
## Why a KB-grounded chatbot beats generic AI
A generic ChatGPT plugin:
- Doesn't know your product details
- Confidently hallucinates wrong answers
- Can't cite sources
- Users don't trust it
A KB-grounded chatbot:
- Answers only from your real docs
- Says "I don't know" when unsure
- Shows which KB article the answer came from
- Reduces support tickets 30–60% (our customers' measured impact)
## Architecture
```
User question
      │
      ▼
[Embed question] ──► [Vector DB search] ──► Top 5 relevant KB chunks
      │
      ▼
[Claude/GPT with context]
      │
      ▼
Answer + citations → User
      │
      ▼
[Unanswered? → Escalate to human]
```
Pieces:
- **KB source** — your help center, docs, FAQs
- **Vector DB** — pgvector on PostgreSQL
- **Embedding model** — OpenAI `text-embedding-3-small` (cheap, fast)
- **LLM** — Claude Sonnet or GPT-4o for quality; Haiku/Mini for cost
- **Frontend** — widget on your website or ticket system
See prerequisites in our [RAG guide](https://domainindia.com/support/kb/building-rag-system-vector-db-embeddings).
## Step 1 — Ingest your KB
Pull articles from wherever they live:
```python
import requests
from bs4 import BeautifulSoup

def scrape_kb_article(url):
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('h1').text.strip()
    body = soup.select_one('article').text.strip()
    return {'title': title, 'url': url, 'body': body}

# For DomainIndia KB:
articles = []
for slug_url in get_all_kb_urls():
    articles.append(scrape_kb_article(slug_url))
```
If your KB lives in a database you own, read it directly — no need to scrape your own site.
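The `get_all_kb_urls()` helper used above is left to you. One common approach is to read your site's sitemap. A minimal sketch (the `/support/kb/` prefix is an assumption; fetch the sitemap text the same way the scraper fetches articles):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def kb_urls_from_sitemap(sitemap_xml, prefix="/support/kb/"):
    """Extract KB article URLs from sitemap XML text, keeping
    only pages whose URL contains the KB prefix."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.findall(".//sm:loc", SITEMAP_NS)
            if prefix in loc.text]
```

Adjust the prefix filter (or drop it) to match your site's URL scheme.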
## Step 2 — Chunk and embed
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from openai import OpenAI
import psycopg2

client = OpenAI()
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
conn = psycopg2.connect("dbname=chatbot")
cur = conn.cursor()

for article in articles:
    chunks = splitter.split_text(article['body'])
    # Batch embed: one API call per article
    embeddings = client.embeddings.create(
        model='text-embedding-3-small',
        input=chunks,
    ).data
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings)):
        cur.execute("""
            INSERT INTO kb_chunks (article_url, article_title, chunk_index, content, embedding)
            VALUES (%s, %s, %s, %s, %s)
        """, (article['url'], article['title'], i, chunk, emb.embedding))
conn.commit()
```
Schema:
```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE kb_chunks (
    id bigserial PRIMARY KEY,
    article_url text NOT NULL,
    article_title text NOT NULL,
    chunk_index int,
    content text NOT NULL,
    embedding vector(1536),
    created_at timestamptz DEFAULT now()
);

CREATE INDEX ON kb_chunks USING hnsw (embedding vector_cosine_ops);
```
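The `vector(1536)` size must match the embedding model: `text-embedding-3-small` returns 1536-dimension vectors, `text-embedding-3-large` returns 3072. A cheap startup assertion catches a mismatch before you embed thousands of chunks (sketch; the function name is illustrative):

```python
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def check_schema_dim(model, column_dim):
    """Fail fast if the table's vector() size doesn't match the model."""
    expected = EMBEDDING_DIMS[model]
    if expected != column_dim:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, "
            f"but kb_chunks.embedding is vector({column_dim})")
    return True
```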
## Step 3 — Retrieval + LLM
```python
import anthropic

claude = anthropic.Anthropic()

def answer(question, conversation_history=None):
    conversation_history = conversation_history or []
    # 1. Embed the question
    q_vec = client.embeddings.create(
        model='text-embedding-3-small',
        input=question,
    ).data[0].embedding
    # 2. Retrieve top 5 chunks
    cur.execute("""
        SELECT article_title, article_url, content,
               1 - (embedding <=> %s::vector) AS similarity
        FROM kb_chunks
        ORDER BY embedding <=> %s::vector
        LIMIT 5
    """, (q_vec, q_vec))
    chunks = cur.fetchall()
    # 3. Build context
    context = "\n---\n".join(
        f"## {title} ({url})\n{content}"
        for title, url, content, _ in chunks
    )
    # 4. Call Claude with strict grounding prompt
    system = """You are a helpful customer support assistant for DomainIndia.
Answer ONLY from the provided knowledge base context below.
If the context doesn't cover the question, say "I don't have information about that in our knowledge base — let me connect you with our team."
Always cite the article URL you used in format [source: URL].
Be concise — 2-3 paragraphs max."""
    user_msg = f"Context:\n{context}\n\nQuestion: {question}"
    resp = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system=system,
        messages=conversation_history + [{"role": "user", "content": user_msg}],
    )
    return {
        "answer": resp.content[0].text,
        "sources": [{"title": t, "url": u} for t, u, _, _ in chunks[:3]],
        "top_similarity": chunks[0][3] if chunks else 0.0,  # used for handoff (Step 5)
    }
```
## Step 4 — Conversation memory
Multi-turn conversations need context. Pass previous turns:
```python
history = []
while True:
    q = input("You: ")
    result = answer(q, history)
    print("Bot:", result['answer'])
    history.append({"role": "user", "content": q})
    history.append({"role": "assistant", "content": result['answer']})
    # Trim old turns: context windows are large, but cost scales with every token you resend
    if len(history) > 20:
        history = history[-20:]
```
For production, store history in Redis keyed by session ID.
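A sketch of that pattern (the `chat:` key prefix and TTL are illustrative; `client` can be a `redis.Redis(decode_responses=True)` instance, or anything with the same `get`/`set` methods):

```python
import json

class SessionStore:
    """Per-session conversation history with trimming and expiry."""

    def __init__(self, client, max_turns=20, ttl_seconds=3600):
        self.client = client        # redis.Redis(decode_responses=True) or compatible
        self.max_turns = max_turns  # keep only the most recent turns
        self.ttl = ttl_seconds      # idle sessions expire automatically

    def load(self, session_id):
        raw = self.client.get(f"chat:{session_id}")
        return json.loads(raw) if raw else []

    def append(self, session_id, role, content):
        history = self.load(session_id)
        history.append({"role": role, "content": content})
        history = history[-self.max_turns:]  # trim oldest turns
        self.client.set(f"chat:{session_id}", json.dumps(history), ex=self.ttl)
        return history
```

In the chat loop, replace the in-memory `history` list with `store.load(session_id)` and `store.append(...)` calls.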
## Step 5 — Human handoff
The bot should escalate when:
- Confidence low (similarity score < 0.4 on top result)
- User asks to talk to a human
- Question is about account-specific data the bot can't see
- Two unsuccessful retries
```python
HANDOFF_TRIGGERS = [
    'talk to human', 'agent please', 'real person', 'not helpful'
]

def should_handoff(question, result):
    if any(t in question.lower() for t in HANDOFF_TRIGGERS):
        return True
    if result.get('top_similarity', 1) < 0.4:
        return True
    if "I don't have information" in result['answer']:
        return True
    return False

# In your request handler:
if should_handoff(question, result):
    # Create ticket in your system
    create_support_ticket(user_id, question, history)
    reply = ("I've escalated this to our support team — they'll reach you "
             "at your email within 30 minutes.")
```
## Step 6 — The widget
A simple embedded chat widget for your site. This is a minimal sketch: it assumes a `/api/chat` endpoint on your backend that wraps `answer()` and returns JSON like `{"answer": "..."}`; styling and error handling are omitted.
```html
<div id="support-chat">
  <div id="chat-messages"></div>
  <input id="chat-input" placeholder="Ask a question...">
</div>
<script>
function addMessage(who, text) {
  const p = document.createElement('p');
  p.textContent = who + ': ' + text;  // textContent avoids HTML injection
  document.getElementById('chat-messages').appendChild(p);
}

document.getElementById('chat-input').addEventListener('keydown', async (e) => {
  if (e.key !== 'Enter' || !e.target.value.trim()) return;
  const question = e.target.value;
  e.target.value = '';
  addMessage('You', question);
  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: {'Content-Type': 'application/json'},
    body: JSON.stringify({question: question})
  });
  const data = await res.json();
  addMessage('Bot', data.answer);
});
</script>
```
## Step 7 — Safety guardrails
Before shipping, add basic guardrails:
- **Prompt injection:** retrieved KB text and user input both land in the prompt; instruct the model to ignore instructions embedded in retrieved content
- **PII:** redact emails and phone numbers before messages reach third-party APIs or your logs
- **Rate limiting:** per-session and per-IP caps prevent abuse and runaway API bills
- **Topic fencing:** politely refuse questions unrelated to your product
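A minimal input pre-filter is a good start: cap message length and redact email addresses before the text reaches the LLM or your logs. A sketch (the length limit and regex are illustrative, not exhaustive):

```python
import re

MAX_MESSAGE_LEN = 2000
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def sanitize_user_message(msg):
    """Cap length and redact email addresses before the message
    reaches the LLM, the vector DB query, or your logs."""
    msg = msg[:MAX_MESSAGE_LEN]
    return EMAIL_RE.sub("[email redacted]", msg)
```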
## Step 8 — Keep KB fresh
Re-index regularly so new KB articles become queryable:
```bash
# Cron: nightly 3 AM
0 3 * * * python3 /opt/chatbot/reindex.py
```
Incremental reindex — only re-embed changed articles (hash-compare).
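A sketch of that hash-compare check (it assumes you persist each article's last-indexed hash somewhere, e.g. a column on an articles table; the storage is up to you):

```python
import hashlib

def content_hash(body):
    """Stable fingerprint of an article body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def articles_to_reembed(articles, stored_hashes):
    """Return only articles whose body changed since the last run.
    `stored_hashes` maps article URL -> previously stored hash."""
    return [a for a in articles
            if stored_hashes.get(a["url"]) != content_hash(a["body"])]
```

New articles (no stored hash) are included automatically, since `stored_hashes.get()` returns `None` for them.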
## Measuring impact
Track:
- **Deflection rate** — % of chats not escalated to humans
- **Answer quality** — user thumbs-up/down, periodic manual review
- **Top unanswered questions** — drive new KB articles
- **Response latency** — target <3 seconds end-to-end
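Deflection rate is the simplest of these to compute; a tiny helper (names illustrative):

```python
def deflection_rate(total_chats, escalated_chats):
    """Percent of chats the bot resolved without human handoff."""
    if total_chats == 0:
        return 0.0
    return round(100 * (1 - escalated_chats / total_chats), 1)
```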
## Common pitfalls
- **Chunks too large or too small:** the 800-character chunks used above are a sane default; tune on your own data
- **Stale index:** answers drift from reality when the KB changes but the embeddings don't (hence the nightly reindex in Step 8)
- **No handoff path:** a bot that can't escalate traps frustrated users in a loop
- **Weak grounding prompt:** without the "answer ONLY from context" instruction, the model fills gaps with plausible-sounding guesses
## FAQ
**Q: Claude or GPT for this?**
Claude Sonnet for quality on grounding tasks (it tends to stick to context better). GPT-4o Mini for cost. Test both on your data.

**Q: What if my KB is in multiple languages?**
OpenAI embeddings support 100+ languages natively, so a multilingual KB works out of the box. For answer generation, Claude and GPT handle Hindi, Tamil, and Bengali well.

**Q: Cost per conversation?**
~$0.002–0.005 per message with Claude Haiku + small embeddings. 10K conversations/month ≈ $40. With a self-hosted LLM (see our self-hosting guide): near-zero ongoing cost.

**Q: Can I integrate with my existing ticket system?**
Yes — most support systems (Zendesk, Freshdesk, our HostCore ticketing) have webhooks. Escalate via webhook when a handoff is triggered.

**Q: Self-hosted chatbot on DomainIndia hosting?**
Backend on a VPS, vector DB (pgvector) on the same VPS, widget served from shared or static hosting. Full control.
Build your AI support bot on a DomainIndia VPS — predictable costs, full privacy.
Explore VPS plans