How to Build a WhatsApp AI Triage Bot with n8n in 30 Minutes
Complete architecture guide: message buffering, deduplication, AI escalation, and human handoff
This guide shows you how to build a 24/7 WhatsApp AI bot that responds to inbound messages within seconds, qualifies leads in real time, and escalates urgent cases to your team, without hiring anyone. It's fully automatable with n8n, Evolution API or WhatsApp Cloud API, and Claude/OpenRouter.
The architecture we're covering is battle-tested on real clients. You'll learn the exact patterns that prevent double processing, handle concurrency, and keep your bot reliable at scale.
What You'll Build
A WhatsApp bot that:
- Responds 24/7 — No human needed until it's actually urgent
- Qualifies leads — Asks relevant questions based on your business (clinic, salon, e-commerce, etc.)
- Detects urgency — Identifies emergencies and escalates instantly
- Schedules appointments — Integrates with Google Calendar (Pro only)
- Never loses messages — Buffers messages to prevent race conditions and duplicate processing
- Hands off smoothly — Notifies your team with full conversation context when human input is needed
Prerequisites
Before you start, you'll need:
- n8n instance — Self-hosted (Docker, Cloudfy, or DigitalOcean) or cloud (n8n.cloud)
- WhatsApp API — Evolution API (fastest: QR code, 10 minutes) OR WhatsApp Cloud API (official, slower: 3-5 days for verification)
- LLM API key — OpenRouter or Anthropic API (Claude model). OpenRouter recommended (better routing, cheaper fallback)
- PostgreSQL database — For message buffering and state tracking (included with most n8n hosting)
- A WhatsApp Business number — Already active or ready to activate
Architecture Overview: Why Each Step Matters
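Here's the core flow:
Webhook → NORMALIZE → IS_DUPLICATE_CHECK → BUFFER_INSERT → WAIT 10s → SELECT ... FOR UPDATE SKIP LOCKED → BUFFER_AGGREGATE → AI AGENT → HANDOFF_CHECK → reply to the user, or notify the team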
Why this architecture?
- NORMALIZE: WhatsApp can send text, images, audio, and location. Extract just the meaningful data and detect media.
- IS_DUPLICATE_CHECK: The same webhook can be delivered twice for one message (for example, a retry after network jitter), which starts two workflow executions. Check recent message IDs so the customer doesn't get "Hello Hello!"
- BUFFER_INSERT → WAIT 10s → SELECT: This is the core innovation. When a customer sends 5 rapid messages ("Can you help?" "I have a toothache" "It's urgent" "Please call" "Thanks"), we DON'T fire the AI 5 times. We insert all 5, wait 10 seconds, then select and aggregate them as one conversation.
- FOR UPDATE SKIP LOCKED: PostgreSQL row-level locking. If two webhook executions run the SELECT at the same time, one locks and claims the rows; the other skips them (SKIP LOCKED) instead of waiting. This prevents duplicate processing.
- BUFFER_AGGREGATE: Combine the 5 messages into a single prompt context for the AI, preserving message order.
- AI AGENT: Claude running the triage logic (questions, scoring, handoff rules). Cheap with Haiku (roughly $0.005 per conversation).
- HANDOFF_CHECK: The AI may include a {handoff} tag in its output, and urgent keywords in the input force a handoff even if the AI missed them. This fallback catches AI mistakes.
This prevents the two most common bugs: double processing and missing urgent cases.
Step-by-Step Build
Step 1: Set Up Evolution API + Webhook
Create an Evolution API instance. This is the bridge between WhatsApp and n8n.
- In Evolution API (or your provider's UI), create a new instance. Name it something like "my-clinic-bot" and note its API key (instance token).
- In n8n, create a new workflow and add a Webhook trigger node set to POST. n8n generates a UUID path, so the production URL looks like:
https://<your-n8n-host>/webhook/5ca49874-447c-46fc-9e4a-3a2bc8f98afd
- Paste that URL into the instance's webhook settings in Evolution API and enable the message events (MESSAGES_UPSERT in Evolution API v2). Use the production URL (/webhook/...), not the test URL (/webhook-test/...), and activate the workflow.
- Scan the QR code with your WhatsApp Business number. Wait 10-30 seconds for it to connect.
{
  "type": "n8n-nodes-base.webhook",
  "typeVersion": 2,
  "position": [250, 100],
  "parameters": {
    "httpMethod": "POST",
    "path": "5ca49874-447c-46fc-9e4a-3a2bc8f98afd",
    "responseMode": "onReceived",
    "options": {}
  }
}
Step 2: Normalize Incoming Messages
WhatsApp sends different message types: text, image, audio, document, location. We need to extract just the content and media type.
// NORMALIZE (Code node, "Run Once for All Items")
// The Webhook node nests the POST payload under `body`.
// Field names below vary by Evolution API version: log one real payload and adjust.
const data = $input.first().json.body.data;
return [{
  json: {
    chat_id: data.chatId,
    message_id: data.id,
    // Your own outgoing messages can also fire the webhook; filter sender === 'bot' downstream
    sender: data.fromMe ? 'bot' : 'user',
    text: data.body || '',
    message_type: data.type || 'text',
    timestamp: new Date(data.timestamp * 1000).toISOString(),
    media_type: data.mediaType || null,
    media_url: data.media?.url || null
  }
}];
Now you have a clean, standardized message object for the rest of the workflow.
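If your Evolution API instance sends v2-style `messages.upsert` events, the fields are nested under `data.key` and `data.message` rather than sitting flat on `data`. The sketch below is a hypothetical pure-function variant for that shape; the paths are our assumption about the v2 payload, so log one real webhook body and adjust before relying on it.

```javascript
// Hypothetical helper: maps an Evolution API v2 "messages.upsert" payload onto
// the same fields NORMALIZE produces. The paths below are assumptions about the
// v2 payload shape; compare them with a real payload from your instance first.
function normalizeEvolutionV2(body) {
  const data = body.data || {};
  const key = data.key || {};
  const msg = data.message || {};

  // Text lives in different places depending on the message type
  const text =
    msg.conversation ||
    msg.extendedTextMessage?.text ||
    msg.imageMessage?.caption ||
    msg.videoMessage?.caption ||
    '';

  // Media messages carry their metadata in a type-specific object
  const media =
    msg.imageMessage || msg.audioMessage || msg.videoMessage || msg.documentMessage || null;

  // messageTimestamp is in seconds; fall back to "now" if it is missing
  const ts = Number(data.messageTimestamp);
  const timestamp = Number.isFinite(ts) ? new Date(ts * 1000) : new Date();

  return {
    chat_id: key.remoteJid,
    message_id: key.id,
    sender: key.fromMe ? 'bot' : 'user',
    text,
    message_type: data.messageType || 'conversation',
    timestamp: timestamp.toISOString(),
    media_type: media?.mimetype || null,
    media_url: media?.url || null,
  };
}

// Example payload (shape assumed, values made up)
const sample = {
  event: 'messages.upsert',
  data: {
    key: { remoteJid: '5511999999999@s.whatsapp.net', fromMe: false, id: 'ABC123' },
    message: { conversation: 'I have a toothache' },
    messageType: 'conversation',
    messageTimestamp: 1700000000,
  },
};

console.log(normalizeEvolutionV2(sample).text); // "I have a toothache"
```

Inside the NORMALIZE Code node you would call it on the webhook body, e.g. `return [{ json: normalizeEvolutionV2($input.first().json.body) }];`. Keeping it a pure function makes it easy to test against saved payloads.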
Step 3: Message Buffering (The Most Important Part)
This is where the magic happens. Instead of processing messages immediately, we queue them for a few seconds and batch-process them.
Why? When a user sends rapid messages ("My tooth hurts!" "It's swollen!" "Please help!"), you don't want the bot to respond 3 separate times. You want one context-aware response that saw all 3 messages.
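-- Create the buffer table (run once)
CREATE TABLE IF NOT EXISTS wa_msg_buffer (
  id SERIAL PRIMARY KEY,
  chat_id TEXT NOT NULL,
  message_id TEXT UNIQUE,
  content TEXT,
  message_type VARCHAR(50),
  inserted_at TIMESTAMPTZ DEFAULT NOW(),
  processed_at TIMESTAMPTZ
);
-- Speeds up the per-chat lookup of unprocessed messages
CREATE INDEX IF NOT EXISTS wa_msg_buffer_pending_idx
  ON wa_msg_buffer (chat_id, inserted_at)
  WHERE processed_at IS NULL;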
-- Insert the message
INSERT INTO wa_msg_buffer (chat_id, message_id, content, message_type)
VALUES ($1, $2, $3, $4)
ON CONFLICT (message_id) DO NOTHING;
The ON CONFLICT DO NOTHING ensures if the webhook fires twice with the same message_id, the second insert is silently ignored.
Add a simple Wait node set to 10 seconds. This gives time for rapid-fire messages to accumulate in the buffer.
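-- Atomically lock, claim, and fetch unprocessed messages for this chat
WITH locked_rows AS (
  SELECT id
  FROM wa_msg_buffer
  WHERE chat_id = $1
    AND processed_at IS NULL
  ORDER BY inserted_at ASC
  LIMIT 20
  FOR UPDATE SKIP LOCKED
)
UPDATE wa_msg_buffer AS b
SET processed_at = NOW()
FROM locked_rows
WHERE b.id = locked_rows.id
RETURNING b.id, b.chat_id, b.message_id, b.content, b.message_type, b.inserted_at;
FOR UPDATE SKIP LOCKED is the secret sauce, with one catch: row locks last only until the transaction commits. A plain SELECT ... FOR UPDATE run on its own commits, and releases its locks, as soon as the rows come back, so another execution could select the same rows a moment later. Setting processed_at in the same statement closes that gap. If two workflow executions run this query concurrently:
- Execution 1 locks the rows, marks them processed, and returns them
- Execution 2 skips the locked rows and returns 0 rows; after Execution 1 commits, those rows are no longer unprocessed anyway
This prevents duplicate processing without blocking. When the query returns 0 rows, n8n by default runs no downstream nodes, so the losing execution simply ends. The trade-off: a batch is marked processed before the AI replies, so route failures to an error branch that alerts you (or resets processed_at).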
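// BUFFER_AGGREGATE (Code node, "Run Once for All Items"):
// combine the claimed messages into one prompt context
const messages = $input.all()
  .map(item => item.json)
  // Sort by arrival time explicitly instead of relying on the query's row order
  .sort((a, b) => new Date(a.inserted_at) - new Date(b.inserted_at));
// Nothing to do if another execution already claimed this batch
if (messages.length === 0) return [];
const aggregated = messages
  // Media without a caption has no text; show its type instead
  .map(m => `[${new Date(m.inserted_at).toLocaleTimeString()}] ${m.content || `[${m.message_type}]`}`)
  .join('\n');
return [{
  json: {
    chat_id: messages[0].chat_id,
    aggregated_text: aggregated,
    message_count: messages.length,
    message_ids: messages.map(m => m.message_id),
    first_message_id: messages[0].message_id,
    last_message_id: messages[messages.length - 1].message_id
  }
}];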
-- Mark messages as processed (only needed if your fetch query did not
-- already set processed_at). $2 is an array of the batch's message IDs.
UPDATE wa_msg_buffer
SET processed_at = NOW()
WHERE chat_id = $1
  AND message_id = ANY($2::text[]);
Step 4: AI Agent with System Prompt
Now that you have aggregated messages, send them to Claude via OpenRouter. The node JSON below is simplified: the exact node type and parameter names depend on your n8n version (for example, an AI Agent node with an OpenRouter Chat Model attached).
{
"type": "openaiChat",
"typeVersion": 1,
"position": [750, 300],
"parameters": {
"model": "anthropic/claude-3.5-haiku",
"provider": "openai",
"prompt": "=See system prompt below=",
"text": "={{ $('BUFFER_AGGREGATE').first().json.aggregated_text }}",
"options": {
"maxTokens": 500,
"temperature": 0.7,
"topP": 0.9
}
}
}
The system prompt is the core of your bot's behavior. Here's a real template:
You are a helpful AI receptionist for a dental clinic.
**Your role:**
- Answer patient questions about procedures, hours, and policies
- Qualify incoming leads with 3 questions: name, issue, and urgency
- Escalate to human if patient is in pain or the issue is urgent
**Triage rules:**
1. If patient mentions pain, swelling, bleeding, infection, or emergency → {handoff}
2. If patient asks for appointment → collect details and {handoff}
3. If you're confident you answered the question → respond naturally
4. If unsure → ask 1 clarifying question, don't guess
**Guardrails:**
- NEVER promise immediate response times ("instantly", "right now")
- NEVER schedule appointments without collecting preferred date/time
- NEVER recommend specific medications or diagnoses
- If patient needs emergency care → say "This needs urgent attention. Please call 911 or go to the ER"
**Handoff signal:**
When you need a human, end your response with {handoff} tag.
Example:
"I've noted your symptoms. Let me connect you with Dr. Silva who can schedule you for an urgent exam. {handoff}"
**Tone:**
- Professional but warm
- Clear and concise (max 2 sentences per message)
- Use patient's name when known
Now respond to the patient's message:
The {handoff} tag is crucial — the next step will look for it.
Step 5: Handoff Detection + Fallback Logic
The AI outputs a response. If it includes {handoff}, route to your team. If it doesn't but the input has urgent keywords, force a handoff anyway (safety net).
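// HANDOFF_CHECK: decide whether to route this batch to a human
const agentJson = $('AI_AGENT').first().json;
// AI Agent nodes return `output`; basic LLM chain nodes return `text`
const aiOutput = agentJson.output ?? agentJson.text ?? '';
const userInput = $('BUFFER_AGGREGATE').first().json.aggregated_text;
// Check 1: AI included {handoff} tag
const hasHandoffTag = aiOutput.includes('{handoff}');
// Check 2: urgency keywords in the user's input (fallback if the AI missed it).
// The leading \b avoids matches inside words like "Spain". Bare "help" is left out:
// it appears in most first messages ("Can you help?") and would hand off nearly everything.
const urgencyKeywords = /\b(emergency|urgent|pain|swelling|swollen|bleeding|911|cannot wait|right now)/i;
const isUrgent = urgencyKeywords.test(userInput);
return [{
  json: {
    shouldHandoff: hasHandoffTag || isUrgent,
    reason: hasHandoffTag ? 'ai_tag' : isUrgent ? 'urgency_fallback' : 'none'
  }
}];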
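// Clean the response before sending it to the user
const agentJson = $('AI_AGENT').first().json;
const aiText = agentJson.output ?? agentJson.text ?? '';
const cleaned = aiText
  // Strip control tags: {handoff}, plus the legacy {urgente} tag if your prompt uses it
  .replace(/\{(handoff|urgente)\}/gi, '')
  .trim();
return [{ json: { response_text: cleaned } }];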
Step 6: Send Response or Escalate
Split into two branches: handoff (notify team) or normal response (send to user).
{
"type": "evolutionAPI",
"parameters": {
"operation": "send-text",
"instanceKey": "my-clinic-bot",
"remoteJid": "={{ $('NORMALIZE').first().json.chat_id }}",
"messageText": "={{ $json.response_text.replace(/\\\\n/g, '\\n') }}",
"options": {}
}
}
{
"type": "evolutionAPI",
"parameters": {
"operation": "send-text",
"instanceKey": "my-clinic-bot",
"remoteJid": "<YOUR_TEAM_NUMBER>@s.whatsapp.net",
"messageText": "=See template below="
}
}
🚨 URGENT HANDOFF
*Patient:* John Doe
*Issue:* Severe toothache, swelling
*Timestamp:* 2:34 PM
*Full Conversation:*
---
2:32 PM: Hi, I have a tooth problem
2:33 PM: It's been hurting since yesterday
2:34 PM: Can you help?
---
*Bot Note:* Patient reports pain + swelling. Needs immediate triage.
👉 Reply directly to continue the conversation.
This gives your team full context so they can respond intelligently.
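Both Step 6 branches boil down to building a text and POSTing it to Evolution API. Below is a hypothetical sketch of that: `buildHandoffMessage` fills the team-alert template using WhatsApp's single-asterisk `*bold*`, and `buildSendTextRequest` assumes Evolution API v2's `POST /message/sendText/{instance}` route with an `apikey` header and a `{ number, text }` body. Treat the route and body as assumptions to check against your Evolution version; the helper names are ours.

```javascript
// Hypothetical helpers for the two Step 6 branches.
// buildHandoffMessage renders the team alert with WhatsApp's single-asterisk bold.
function buildHandoffMessage({ patientName, issue, timestamp, conversation, botNote }) {
  return [
    '🚨 URGENT HANDOFF',
    `*Patient:* ${patientName || 'Unknown'}`,
    `*Issue:* ${issue}`,
    `*Timestamp:* ${timestamp}`,
    '*Full Conversation:*',
    '---',
    conversation,
    '---',
    `*Bot Note:* ${botNote}`,
    '👉 Reply directly to continue the conversation.',
  ].join('\n');
}

// Assumes Evolution API v2's REST route POST {baseUrl}/message/sendText/{instance},
// authenticated with an `apikey` header and a { number, text } JSON body.
// Verify the route and body against your Evolution version before use.
function buildSendTextRequest({ baseUrl, instance, apiKey, number, text }) {
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/message/sendText/${encodeURIComponent(instance)}`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json', apikey: apiKey },
    body: JSON.stringify({ number, text }),
  };
}

const alertText = buildHandoffMessage({
  patientName: 'John Doe',
  issue: 'Severe toothache, swelling',
  timestamp: '2:34 PM',
  conversation: "2:32 PM: Hi, I have a tooth problem\n2:33 PM: It's been hurting since yesterday",
  botNote: 'Patient reports pain + swelling. Needs immediate triage.',
});

const req = buildSendTextRequest({
  baseUrl: 'https://evolution.example.com/',
  instance: 'my-clinic-bot',
  apiKey: 'YOUR_API_KEY',
  number: '5511999999999', // your team's WhatsApp number (placeholder)
  text: alertText,
});

console.log(req.url); // "https://evolution.example.com/message/sendText/my-clinic-bot"
// To actually send (Node 18+): await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Building the request separately from sending it lets you log or unit-test the exact payload; in n8n, an HTTP Request node can send the same URL, headers, and body.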
Advanced Features
The foundation above is solid for simple triage. Here's what separates a basic bot from a production-grade one:
Follow-Up Automation (D+1 and D+3)
After handing off to a human, the conversation often goes quiet. If the patient doesn't respond within 24 hours, send a follow-up ("Just checking in..."), and again at 72 hours. This recovers conversations that would otherwise go cold.
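A hypothetical sketch of the D+1/D+3 rule as a pure function, assuming each chat record stores the patient's last message time and flags for follow-ups already sent (field names are illustrative). A scheduled workflow would run it per chat and send whichever nudge it returns.

```javascript
// Hypothetical follow-up rule for D+1 / D+3 nudges, run on a schedule
// (e.g. hourly). Assumes each chat row stores when the patient last wrote and
// which follow-ups were already sent; reset those flags when the patient replies.
const DAY_MS = 24 * 60 * 60 * 1000;

function nextFollowUp(chat, now = new Date()) {
  if (chat.sentD3) return null; // sequence finished
  const idleMs = now.getTime() - new Date(chat.lastPatientMessageAt).getTime();
  if (idleMs >= 3 * DAY_MS) return 'D+3';
  if (idleMs >= 1 * DAY_MS && !chat.sentD1) return 'D+1';
  return null;
}

const chat = { lastPatientMessageAt: '2024-01-10T12:00:00Z', sentD1: false, sentD3: false };
console.log(nextFollowUp(chat, new Date('2024-01-11T13:00:00Z'))); // D+1
```

Checking `sentD3` first ends the sequence cleanly, so a chat never gets a D+1 nudge after its D+3 one.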
Google Calendar Integration
When a patient says "I want to book an appointment for Monday at 3pm", the bot doesn't just write it down — it creates the Google Calendar event in real-time and sends back the booking confirmation. Zero manual data entry.
Anthropic API Fallback
If OpenRouter goes down (rare, but happens), automatically switch to Anthropic direct API. Your bot keeps working, customers never notice.
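The fallback pattern itself is small. This is a minimal sketch, not the ZapPro implementation: `withFallback` runs a primary call and switches to a secondary one if it throws. `callOpenRouter` and `callAnthropic` are hypothetical stubs standing in for the real API requests.

```javascript
// Minimal provider-fallback sketch: run the primary LLM call and, if it throws
// (timeout, 5xx, outage), run the secondary one instead. If both fail, the
// second error propagates so an error branch can send a safe message.
async function withFallback(primary, fallback, onFallback = () => {}) {
  try {
    return await primary();
  } catch (err) {
    onFallback(err); // e.g. log which provider failed
    return await fallback();
  }
}

// Hypothetical stubs standing in for real API calls
const callOpenRouter = async () => { throw new Error('503 from OpenRouter'); };
const callAnthropic = async () => 'Reply from Anthropic';

withFallback(callOpenRouter, callAnthropic, err => console.warn('Falling back:', err.message))
  .then(text => console.log(text)); // "Reply from Anthropic"
```

If both providers fail, the rejection reaches your error branch, which is where a safe "a human will respond shortly" message fits.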
Error Monitoring
Messages stuck in the buffer? Database connection down? A monitoring workflow checks every 30 minutes and alerts you on WhatsApp before customers complain.
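One check such a monitoring workflow might run, sketched under assumptions: feed it the unprocessed buffer rows and it reports chats whose oldest pending message has waited longer than a threshold. The 5-minute default is our guess (the buffer itself only waits 10 seconds), and the function name is hypothetical.

```javascript
// Hypothetical monitoring rule: rows come from a query such as
//   SELECT chat_id, inserted_at FROM wa_msg_buffer WHERE processed_at IS NULL
// and any chat whose oldest pending message is older than the threshold is reported.
function findStuckChats(rows, now = new Date(), thresholdMinutes = 5) {
  const cutoff = now.getTime() - thresholdMinutes * 60 * 1000;
  // Track the oldest pending message per chat
  const oldestByChat = new Map();
  for (const row of rows) {
    const t = new Date(row.inserted_at).getTime();
    const prev = oldestByChat.get(row.chat_id);
    if (prev === undefined || t < prev) oldestByChat.set(row.chat_id, t);
  }
  return [...oldestByChat.entries()]
    .filter(([, oldest]) => oldest < cutoff)
    .map(([chatId, oldest]) => ({
      chat_id: chatId,
      stuck_minutes: Math.floor((now.getTime() - oldest) / 60000),
    }));
}

const now = new Date('2024-01-10T12:00:00Z');
const stuck = findStuckChats([
  { chat_id: 'A', inserted_at: '2024-01-10T11:50:00Z' },
  { chat_id: 'A', inserted_at: '2024-01-10T11:58:00Z' },
  { chat_id: 'B', inserted_at: '2024-01-10T11:59:30Z' },
], now);
console.log(stuck); // [ { chat_id: 'A', stuck_minutes: 10 } ]
```

A non-empty result would then be formatted into a WhatsApp alert to your own number.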
These are all included in ZapPro Pro ($497) — pre-built, tested, and deployed in production.
Common Pitfalls (And How to Avoid Them)
1. No Buffer = Double Processing
If you skip the buffer and send every message straight to the AI, a network glitch causes the webhook to fire twice. Your user gets two bot responses. With a buffer + FOR UPDATE SKIP LOCKED, the second execution finds 0 unprocessed rows and exits gracefully.
2. AI Hallucinating Appointment Times
Haiku is fast but sometimes invents information. The system prompt must explicitly say: "NEVER schedule without confirming both date AND time with the patient." And: "If unsure, ask again."
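The prompt rule can be backed by a check in code. A hypothetical guard, assuming the booking step receives extracted `date` (YYYY-MM-DD) and `time` (HH:MM) fields: anything missing, malformed, or in the past is rejected, so the bot asks again instead of booking.

```javascript
// Hypothetical code-side guard for the "never book without date AND time" rule.
// Assumes the AI (or a parsing step) extracts { date: 'YYYY-MM-DD', time: 'HH:MM' }.
function validateAppointment({ date, time } = {}, now = new Date()) {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date || '')) return { ok: false, reason: 'missing_or_invalid_date' };
  if (!/^([01]\d|2[0-3]):[0-5]\d$/.test(time || '')) return { ok: false, reason: 'missing_or_invalid_time' };
  // No offset in the string, so this is parsed in the server's local time zone
  const slot = new Date(`${date}T${time}:00`);
  if (Number.isNaN(slot.getTime())) return { ok: false, reason: 'unparseable' };
  if (slot <= now) return { ok: false, reason: 'in_past' };
  return { ok: true, slot };
}

console.log(validateAppointment({ date: '2030-05-06' }).reason); // "missing_or_invalid_time"
```

Only an `ok: true` result should reach the calendar step; any other reason maps to a clarifying question.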
3. Webhook Path as Custom String
Using a custom path like `/my-webhook` instead of the generated UUID has caused silent 500 errors for us: n8n publishes the workflow, but the webhook path doesn't sync. Always use the auto-generated UUID. If you must change it, deactivate and reactivate the workflow to re-register the webhook handler.
4. Saving Without Publishing
In n8n, PATCH (save) and Publish are different. Saving updates the draft. Publishing updates the webhook handler. If you modify a workflow and only save, the old version still runs on incoming webhooks. Always: Save → Publish (via UI or API).
5. Human Mode Lock Forever
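You might use a `human_mode` flag to pause the bot while a human is responding. If the flag gets stuck (for example, the UPDATE that clears it fails), the bot stays silent forever. Add a TTL instead: store `human_mode_until TIMESTAMPTZ`, set it to `NOW() + INTERVAL '24 hours'` on takeover, and treat the chat as human-owned only while `human_mode_until > NOW()`. The lock then expires automatically after 24 hours.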
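A minimal sketch of the TTL approach, assuming a `human_mode_until` value is stored per chat: one helper computes the expiry to write on takeover, the other is the check to run before the AI step. Helper names are hypothetical.

```javascript
// Hypothetical TTL helpers for a per-chat human_mode_until value.
// The bot treats a chat as human-owned only while the timestamp is in the future,
// so a failed "release" UPDATE expires on its own.
const HUMAN_MODE_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour expiry

// Value to write to human_mode_until when a team member takes over
function humanModeUntil(takeoverAt = new Date()) {
  return new Date(takeoverAt.getTime() + HUMAN_MODE_TTL_MS);
}

// Check to run before the AI step; null/undefined means the bot is in charge
function isHumanMode(chat, now = new Date()) {
  return chat.human_mode_until != null && new Date(chat.human_mode_until) > now;
}

const until = humanModeUntil(new Date('2024-01-01T09:00:00Z'));
console.log(until.toISOString()); // "2024-01-02T09:00:00.000Z"
console.log(isHumanMode({ human_mode_until: until }, new Date('2024-01-02T10:00:00Z'))); // false
```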
6. Missing Urgency Fallback
Relying 100% on AI to detect urgency fails when the AI misses it. Always add a keyword-based fallback: if input contains "emergency|urgent|pain|bleeding", force handoff regardless of AI output.
Scaling Considerations
This architecture works for small teams (1-3 clinics, 100-500 messages/day). As you grow:
- Database connections: PostgreSQL pool size defaults to 5 simultaneous connections. At 5+ clients, increase to 10-20. Ask your hosting provider.
- Message buffer retention: Keep buffer data for 30 days (for debugging), then archive. Large buffer tables slow SELECT queries.
- n8n execution history: Disable automatic cleanup if you're on shared Cloudfy (it runs slowly there) and prune manually every week.
- API costs: At 100 clinics sending 5K messages/month each (500K total), Haiku costs ~$2.5K/month. Still <8% of median clinic revenue.
Ready to Launch?
This guide shows the architecture, but building 70+ nodes from scratch takes 6-8 hours. The buffer logic, error handling, notification formatting — it all matters.
ZapPro templates skip the build time. Import, customize the system prompt for your business, and run. All the patterns above are already implemented, tested, and documented.
Core includes: Webhook + buffer + AI triage + handoff. Ready to deploy in n8n. 30 days of free setup support via zapproai@gmail.com (deployment only).
Pro includes: Everything in Core + Google Calendar scheduling + D+1/D+3 follow-up automation + monitoring dashboard + Anthropic fallback.
FAQ
How long does it actually take to deploy?
If you're starting from scratch and building from this guide: 6-8 hours. If you use ZapPro Core: 30 minutes (import JSON + add your API credentials + test with a message).
What if I want to use WhatsApp Cloud API instead of Evolution?
The architecture is the same. Only the webhook payload format changes: Cloud API sends JSON with different field names, so map them onto the same output fields in the NORMALIZE step.
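As an illustration, here is a hypothetical NORMALIZE variant for Cloud API webhooks. It follows Meta's documented `entry[].changes[].value.messages[]` structure and maps it onto the same fields as the Evolution version, with one difference: Cloud API delivers a media ID (`media_id` here) instead of a URL, so downloading media takes an extra Graph API call.

```javascript
// Hypothetical NORMALIZE variant for WhatsApp Cloud API webhooks.
// Status callbacks (value.statuses) carry no messages and yield an empty array.
function normalizeCloudApi(body) {
  const out = [];
  for (const entry of body.entry || []) {
    for (const change of entry.changes || []) {
      for (const m of change.value?.messages || []) {
        // Media types (image, audio, document, ...) nest details under m[m.type]
        const media = m.type !== 'text' ? m[m.type] || null : null;
        out.push({
          chat_id: m.from, // sender's phone number, used as the recipient when replying
          message_id: m.id,
          sender: 'user', // messages[] in these webhooks are inbound
          text: m.text?.body || media?.caption || '',
          message_type: m.type,
          timestamp: new Date(Number(m.timestamp) * 1000).toISOString(),
          media_type: media?.mime_type || null,
          media_id: media?.id || null, // Cloud API sends a media ID, not a URL
        });
      }
    }
  }
  return out;
}

const payload = {
  object: 'whatsapp_business_account',
  entry: [{
    changes: [{
      field: 'messages',
      value: {
        messaging_product: 'whatsapp',
        messages: [{ from: '5511999999999', id: 'wamid.TEST', timestamp: '1700000000', type: 'text', text: { body: 'Hi, I have a tooth problem' } }],
      },
    }],
  }],
};

console.log(normalizeCloudApi(payload)[0].text); // "Hi, I have a tooth problem"
```

In n8n: `return normalizeCloudApi($input.first().json.body).map(json => ({ json }));`. Because one Cloud API webhook can batch several messages, the function returns an array.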
Can I add multiple languages?
Yes. Detect the user's language in NORMALIZE, then include a language parameter in the AI prompt. Claude handles 100+ languages.
What if the AI fails?
Add an error handler branch. If the AI node errors, fall back to a safe message ("I'm having trouble. A human will respond shortly. {handoff}"). This is included in ZapPro Pro as the Anthropic fallback.
Can I use this for sales, not just support?
Absolutely. Swap "triage questions" for "qualification questions". Same architecture, different system prompt. ZapPro includes a sales bot template (Converte track).
How much does this cost to run?
For a small clinic (500 messages/month):
- Evolution API: ~$0 (self-hosted)
- Claude Haiku (OpenRouter): ~$0.50
- n8n: $20-100/month (self-hosted or cloud)
- PostgreSQL: $0-50/month (included with most hosting)
- Total: ~$20-150/month (usually <8% of clinic revenue)
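To sanity-check the LLM line for your own volume, multiply tokens by price. A back-of-envelope sketch; every number in the example call is an illustrative placeholder, not current pricing. Because the buffer merges rapid messages, count AI calls (one per batch), which is lower than the raw message count.

```javascript
// Back-of-envelope LLM cost model. All inputs are placeholders:
// measure your own token counts and check current provider pricing.
function estimateMonthlyCost({ aiCallsPerMonth, inputTokensPerCall, outputTokensPerCall, inputUsdPerMTok, outputUsdPerMTok }) {
  const perCall =
    (inputTokensPerCall / 1e6) * inputUsdPerMTok +
    (outputTokensPerCall / 1e6) * outputUsdPerMTok;
  return { perCall, monthly: perCall * aiCallsPerMonth };
}

const est = estimateMonthlyCost({
  aiCallsPerMonth: 500,
  inputTokensPerCall: 1000, // system prompt + buffered messages (illustrative)
  outputTokensPerCall: 200, // short reply (illustrative)
  inputUsdPerMTok: 1,       // illustrative price, not a quote
  outputUsdPerMTok: 5,      // illustrative price, not a quote
});
console.log(est.monthly.toFixed(2)); // "1.00"
```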
Conclusion
A 24/7 WhatsApp bot isn't magic — it's pattern matching + intelligent buffering + a good system prompt. The architecture above prevents the two biggest failure modes: double processing and missed urgency.
If you want to build it yourself, follow the steps above. If you want to skip the 8 hours and launch today, ZapPro has you covered.
Either way, your customers get instant responses at 11pm. And your team gets to sleep.
Have questions? Email zapproai@gmail.com or join the n8n Community and search "WhatsApp buffer" — we're active there.