📖 15 min read Apr 22, 2026

How to Build a WhatsApp AI Triage Bot with n8n in 30 Minutes

Complete architecture guide: message buffering, deduplication, AI escalation, and human handoff

ZapPro Team Builders of n8n WhatsApp automation templates

Your clients are messaging at 11pm. Your team responds at 9am. That's 10 hours of silence — and lost revenue.

This guide shows you how to build a 24/7 WhatsApp AI bot that responds instantly to inbound messages, qualifies leads in real-time, and escalates urgent cases to your team — without hiring anyone. It's fully automatable with n8n, Evolution API or WhatsApp Cloud API, and Claude/OpenRouter.

The architecture we're covering is battle-tested on real clients. You'll learn the exact patterns that prevent double processing, handle concurrency, and keep your bot reliable at scale.

What You'll Build

A WhatsApp bot that:

Responds 24/7 — No human needed until it's actually urgent
Qualifies leads — Asks relevant questions based on your business (clinic, salon, e-commerce, etc.)
Detects urgency — Identifies emergencies and escalates instantly
Schedules appointments — Integrates with Google Calendar (Pro only)
Never loses messages — Buffers messages to prevent race conditions and duplicate processing
Hands off smoothly — Notifies your team with full conversation context when human input is needed

Prerequisites

Before you start, you'll need:

n8n instance — Self-hosted (Docker, Cloudfy, or DigitalOcean) or cloud (n8n.cloud)
WhatsApp API — Evolution API (fastest: QR code, 10 minutes) OR WhatsApp Cloud API (official, slower: 3-5 days for verification)
LLM API key — OpenRouter or Anthropic API (Claude model). OpenRouter recommended (better routing, cheaper fallback)
PostgreSQL database — For message buffering and state tracking (included with most n8n hosting)
A WhatsApp Business number — Already active or ready to activate

⚡ Evolution vs Cloud API? Evolution API is faster to set up (QR code login, 10 min) but isn't an official Meta product. WhatsApp Cloud API is official and more reliable at scale, but requires business verification (3-5 days) and typically means the number can no longer be used in the regular WhatsApp app. For this tutorial, we'll use Evolution API. The same architecture works with Cloud API; only the webhook details change.

Architecture Overview: Why Each Step Matters

Here's the core flow:

Inbound Message
  ↓
NORMALIZE (handle text, media, etc.)
  ↓
IS_DUPLICATE_CHECK (check recent memory)
  ↓
BUFFER_INSERT (write to queue)
  ↓
WAIT 10 SECONDS (collect rapid messages)
  ↓
BUFFER_SELECT (FOR UPDATE SKIP LOCKED) (get queued messages, lock row)
  ↓
BUFFER_AGGREGATE (combine into one context)
  ↓
BUFFER_DELETE (remove processed)
  ↓
AI AGENT (Claude/Haiku via OpenRouter)
  ↓
HANDOFF_CHECK ({handoff} tag + urgency keywords)
  ↓
If handoff: SEND_TEAM_NOTIFICATION (include full context)
Else: SEND_RESPONSE (reply to user)

Why this architecture?

NORMALIZE — WhatsApp can send text, images, audio, location. You need to extract just the meaningful data and detect media.
IS_DUPLICATE_CHECK — n8n workflows can trigger twice for the same webhook if there's network jitter. Check recent messages to avoid "Hello Hello!"
BUFFER_INSERT → WAIT 10s → SELECT — This is the core innovation. When a customer sends 5 rapid messages ("Can you help?" "I have a toothache" "It's urgent" "Please call" "Thanks"), we DON'T fire the AI 5 times. We insert all 5, wait a bit, then select and aggregate as one conversation.
FOR UPDATE SKIP LOCKED — PostgreSQL row-level locking. If two webhook executions hit SELECT at the same time, one locks the row. The other skips it (SKIP LOCKED) instead of waiting. Prevents duplicate processing.
BUFFER_AGGREGATE — Combine the 5 messages into a single prompt context for the AI, preserving message order.
AI AGENT — Claude running the triage logic (questions, scoring, handoff rules). Cheap with Haiku (about $0.01 per conversation).
HANDOFF_CHECK — The AI might include the {handoff} tag in its output, OR urgent keywords in the input override the AI (if the AI missed them). This fallback logic catches the AI's mistakes.

This prevents the two most common bugs: double processing and missing urgent cases.
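The IS_DUPLICATE_CHECK idea can be sketched in plain JavaScript as a small time-windowed set of message IDs. This is a minimal illustration rather than the n8n node itself: the helper name `createDuplicateChecker`, the 60-second window, and keeping state in process memory are all assumptions. In a real workflow you would more likely back this with the database or workflow static data.

```javascript
// Hypothetical helper: remembers message IDs for `windowMs` and reports repeats.
function createDuplicateChecker(windowMs = 60000) {
  const seen = new Map(); // message_id -> time first seen (ms)

  return function isDuplicate(messageId, now = Date.now()) {
    // Forget entries older than the window so the map stays small.
    for (const [id, ts] of seen) {
      if (now - ts > windowMs) seen.delete(id);
    }
    if (seen.has(messageId)) return true;
    seen.set(messageId, now);
    return false;
  };
}

const isDuplicate = createDuplicateChecker();
isDuplicate('msg-1', 0);    // first delivery: not a duplicate
isDuplicate('msg-1', 5000); // webhook retry inside the window: duplicate
```

The in-memory check is only a fast first filter; the UNIQUE constraint on message_id in the buffer table remains the durable safeguard.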

Step-by-Step Build

Step 1: Set Up Evolution API + Webhook

Create an Evolution API instance. This is the bridge between WhatsApp and n8n.

In Evolution API (or your provider's UI), create a new instance. Name it something like "my-clinic-bot".
Get the instance token from Evolution API and the webhook URL from n8n. In n8n, create a new workflow and add a Webhook trigger node. The path will be something like: /webhook/5ca49874-447c-46fc-9e4a-3a2bc8f98afd
Paste that webhook URL into Evolution API's instance settings.
Scan the QR code with your WhatsApp Business number. Wait 10-30 seconds for it to connect.

n8n Webhook Node Config

{
  "type": "webhook",
  "typeVersion": 2,
  "position": [250, 100],
  "parameters": {
    "path": "5ca49874-447c-46fc-9e4a-3a2bc8f98afd",
    "responseMode": "onReceived",
    "options": {}
  }
}

💡 Webhook path must be UUID format. Custom strings like "/my-webhook" cause persistent 500 errors in some n8n versions. Always use the auto-generated UUID from the Webhook node.

Step 2: Normalize Incoming Messages

WhatsApp sends different message types: text, image, audio, document, location. We need to extract just the content and media type.

NORMALIZE Node (Code Node)

// Extract key fields from the Evolution API webhook.
// Field names follow this guide's payload; check them against a real
// webhook body from your Evolution API version before relying on them.
const data = $input.first().json.data;

return [{
  json: {
    chat_id: data.chatId,
    message_id: data.id,
    sender: data.fromMe ? 'bot' : 'user',
    text: data.body || '',
    message_type: data.type || 'text',
    // The webhook timestamp is in Unix seconds; convert to an ISO 8601 string
    timestamp: new Date(data.timestamp * 1000).toISOString(),
    media_type: data.mediaType || null,
    media_url: data.media?.url || null
  }
}];

Now you have a clean, standardized message object for the rest of the workflow.

Step 3: Message Buffering (The Most Important Part)

This is where the magic happens. Instead of processing messages immediately, we queue them for a few seconds and batch-process them.

Why? When a user sends rapid messages ("My tooth hurts!" "It's swollen!" "Please help!"), you don't want the bot to respond 3 separate times. You want one context-aware response that saw all 3 messages.

Step 3a: BUFFER_INSERT (PostgreSQL)

-- Create the buffer table (run once)
CREATE TABLE IF NOT EXISTS wa_msg_buffer (
  id SERIAL PRIMARY KEY,
  chat_id TEXT NOT NULL,
  message_id TEXT UNIQUE,
  content TEXT,
  message_type VARCHAR(50),
  inserted_at TIMESTAMPTZ DEFAULT NOW(),
  processed_at TIMESTAMPTZ
);

-- Insert the message
INSERT INTO wa_msg_buffer (chat_id, message_id, content, message_type)
VALUES ($1, $2, $3, $4)
ON CONFLICT (message_id) DO NOTHING;

The ON CONFLICT DO NOTHING clause ensures that if the webhook fires twice with the same message_id, the second insert is silently ignored.

Step 3b: WAIT Node

Add a simple Wait node set to 10 seconds. This gives time for rapid-fire messages to accumulate in the buffer.

Step 3c: BUFFER_SELECT (PostgreSQL with Row Locking)

-- Lock and fetch unprocessed messages for this chat
WITH locked_rows AS (
  SELECT id, chat_id, message_id, content, message_type, inserted_at
  FROM wa_msg_buffer
  WHERE chat_id = $1
    AND processed_at IS NULL
  ORDER BY inserted_at ASC
  LIMIT 20
  FOR UPDATE SKIP LOCKED
)
SELECT * FROM locked_rows;

FOR UPDATE SKIP LOCKED is the secret sauce. If two workflow executions hit this query concurrently:

Execution 1 locks the rows and processes them
Execution 2 tries to lock the same rows, finds them locked, and skips them (returns 0 rows)

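This prevents duplicate processing without blocking. One caveat: a row lock lasts only until the transaction that took it ends. If your Postgres node runs each query in its own transaction, the lock is released as soon as this SELECT returns, so make sure the SELECT and the processed_at UPDATE run in the same transaction, or claim the rows in a single UPDATE ... RETURNING statement.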
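To make the skip-instead-of-wait behavior concrete, here is a toy in-memory model in JavaScript. It is not PostgreSQL and does not replace the query above: the `lockedBy` field and the `selectSkipLocked` function are invented for illustration, and locks in this model are never released, whereas real row locks end with the transaction.

```javascript
// Toy model: each row may be "locked" by one execution at a time.
// Rows are assumed to be already ordered by inserted_at.
function selectSkipLocked(rows, chatId, executionId, limit = 20) {
  const claimed = [];
  for (const row of rows) {
    if (claimed.length >= limit) break;
    if (row.chat_id !== chatId || row.processed_at !== null) continue;
    if (row.lockedBy !== null) continue; // SKIP LOCKED: do not wait, just skip
    row.lockedBy = executionId;          // FOR UPDATE: take the row lock
    claimed.push(row);
  }
  return claimed;
}

const buffer = [
  { id: 1, chat_id: 'chat-A', content: 'My tooth hurts!', processed_at: null, lockedBy: null },
  { id: 2, chat_id: 'chat-A', content: "It's swollen!", processed_at: null, lockedBy: null },
];

const first = selectSkipLocked(buffer, 'chat-A', 'exec-1');  // claims both rows
const second = selectSkipLocked(buffer, 'chat-A', 'exec-2'); // finds them locked, gets none
console.log(first.length, second.length); // prints: 2 0
```

The second execution returns immediately with an empty result, which is the property that lets the workflow exit gracefully instead of queueing behind the first one.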

Step 3d: BUFFER_AGGREGATE (Code Node)

// Combine multiple messages into one prompt context
const messages = $input.all();

// Another execution may already have claimed the rows; stop this branch if so
if (messages.length === 0) {
  return [];
}

const aggregated = messages.map(msg =>
  `[${new Date(msg.json.inserted_at).toLocaleTimeString()}] ${msg.json.content}`
).join('\n');

return [{
  json: {
    chat_id: messages[0].json.chat_id,
    aggregated_text: aggregated,
    message_count: messages.length,
    first_message_id: messages[0].json.message_id,
    last_message_id: messages[messages.length - 1].json.message_id
  }
}];

Step 3e: BUFFER_DELETE (PostgreSQL)

-- Mark messages as processed (rows are kept for debugging, not physically deleted)
UPDATE wa_msg_buffer
SET processed_at = NOW()
WHERE chat_id = $1
  AND message_id IN ($2, $3, $4, ...);  -- One placeholder per message ID returned by BUFFER_SELECT

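⚠️ Order matters! Do SELECT, then AGGREGATE, then DELETE. The aggregated text must exist in the workflow data before the rows are marked as processed. Once they are marked, no later execution will pick them up, so if the AI step fails afterwards, the context survives only in that execution's data.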
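The placeholder list in the UPDATE has to match the number of message IDs in the batch. One way to generate it is sketched below in JavaScript. The helper name `buildMarkProcessedQuery` and the `{ text, values }` return shape are assumptions for illustration; adapt the output to however your Postgres node or client accepts query parameters.

```javascript
// Hypothetical helper: builds the UPDATE with one placeholder per message ID.
function buildMarkProcessedQuery(chatId, messageIds) {
  if (messageIds.length === 0) {
    throw new Error('No message IDs to mark as processed');
  }
  // $1 is the chat_id, so the ID placeholders start at $2.
  const placeholders = messageIds.map((_, i) => `$${i + 2}`).join(', ');
  return {
    text:
      'UPDATE wa_msg_buffer SET processed_at = NOW() ' +
      `WHERE chat_id = $1 AND message_id IN (${placeholders})`,
    values: [chatId, ...messageIds],
  };
}

const query = buildMarkProcessedQuery('chat-A', ['m1', 'm2', 'm3']);
console.log(query.text);
```

Keeping the IDs as bound parameters, rather than concatenating them into the SQL string, avoids injection problems if a message ID ever contains unexpected characters.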

Step 4: AI Agent with System Prompt

Now that you have aggregated messages, send them to Claude via OpenRouter.

AI Agent Node Config (n8n)

{
  "type": "openaiChat",
  "typeVersion": 1,
  "position": [750, 300],
  "parameters": {
    "model": "anthropic/claude-3.5-haiku",
    "provider": "openai",
    "prompt": "=See system prompt below=",
    "text": "={{ $('BUFFER_AGGREGATE').first().json.aggregated_text }}",
    "options": {
      "maxTokens": 500,
      "temperature": 0.7,
      "topP": 0.9
    }
  }
}

The system prompt is the core of your bot's behavior. Here's a real template:

System Prompt Template

You are a helpful AI receptionist for a dental clinic.

**Your role:**
- Answer patient questions about procedures, hours, and policies
- Qualify incoming leads with 3 questions: name, issue, and urgency
- Escalate to human if patient is in pain or the issue is urgent

**Triage rules:**
1. If patient mentions pain, swelling, bleeding, infection, or emergency → {handoff}
2. If patient asks for appointment → collect details and {handoff}
3. If you're confident you answered the question → respond naturally
4. If unsure → ask 1 clarifying question, don't guess

**Guardrails:**
- NEVER promise immediate response times ("instantly", "right now")
- NEVER schedule appointments without collecting preferred date/time
- NEVER recommend specific medications or diagnoses
- If patient needs emergency care → say "This needs urgent attention. Please call 911 or go to the ER"

**Handoff signal:**
When you need a human, end your response with the {handoff} tag.
Example:
"I've noted your symptoms. Let me connect you with Dr. Silva who can schedule you for an urgent exam. {handoff}"

**Tone:**
- Professional but warm
- Clear and concise (max 2 sentences per message)
- Use patient's name when known

Now respond to the patient's message:

The {handoff} tag is crucial — the next step will look for it.
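If you call the model over HTTP instead of through a prebuilt node, the system prompt and the aggregated text travel as separate messages. The sketch below builds an OpenAI-style chat completions body, which is the format OpenRouter's chat endpoint accepts. The function name `buildChatRequest` is hypothetical, and the model ID is left as a parameter so you can pass whatever slug your account uses.

```javascript
// Builds an OpenAI-compatible chat request body.
// The option values mirror the node config in this step.
function buildChatRequest(model, systemPrompt, aggregatedText) {
  return {
    model,
    messages: [
      { role: 'system', content: systemPrompt }, // behavior and triage rules
      { role: 'user', content: aggregatedText }, // buffered patient messages
    ],
    max_tokens: 500,
    temperature: 0.7,
    top_p: 0.9,
  };
}

const body = buildChatRequest(
  'your-model-id', // placeholder: use the model ID from your own config
  'You are a helpful AI receptionist for a dental clinic.',
  '[2:32 PM] Hi, I have a tooth problem\n[2:33 PM] It has been hurting since yesterday'
);
console.log(JSON.stringify(body, null, 2));
```

Sending the rules as a system message rather than prepending them to the user text keeps patient input from being mistaken for instructions.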

🧠 Why Haiku? Claude 3.5 Haiku is cheap (roughly $0.001 per 1K input tokens at the time of writing; check current OpenRouter pricing), fast, and good enough for triage logic. You don't need Sonnet for routing questions. Save Sonnet for complex reasoning (contract analysis, strategy) or closing sales (the Converte track). Cost: ~$0.01 per conversation at typical message volumes.

Step 5: Handoff Detection + Fallback Logic

The AI outputs a response. If it includes {handoff}, route to your team. If it doesn't but the input has urgent keywords, force a handoff anyway (safety net).

HANDOFF_CHECK (Code Node, followed by an IF node on shouldHandoff)

// Decide whether this conversation should be handed off
const aiOutput = $('AI_AGENT').first().json.text || '';
const userInput = $('BUFFER_AGGREGATE').first().json.aggregated_text;

// Check 1: AI included {handoff} tag
const hasHandoffTag = aiOutput.includes('{handoff}');

// Check 2: User input has urgency keywords (fallback if AI missed it).
// Broad words such as "help" match many ordinary messages; tune this list.
const urgencyKeywords = /emergency|urgent|pain|swelling|bleeding|help|911|cannot wait|right now/i;
const isUrgent = urgencyKeywords.test(userInput);

// Only report a reason when a handoff is actually triggered
let reason = null;
if (hasHandoffTag) reason = 'ai_tag';
else if (isUrgent) reason = 'urgency_fallback';

return [{
  json: { shouldHandoff: hasHandoffTag || isUrgent, reason }
}];

Remove {handoff} tags from response

// Clean the response for sending to user
const aiText = $('AI_AGENT').first().json.text;
const cleaned = aiText
  .replace(/\{handoff\}/g, '')
  .replace(/\{urgente\}/g, '')
  .trim();

return { response_text: cleaned };

Step 6: Send Response or Escalate

Split into two branches: handoff (notify team) or normal response (send to user).

SEND_RESPONSE (Evolution API node)

{
  "type": "evolutionAPI",
  "parameters": {
    "operation": "send-text",
    "instanceKey": "my-clinic-bot",
    "remoteJid": "={{ $('NORMALIZE').first().json.chat_id }}",
    "messageText": "={{ $json.response_text.replace(/\\\\n/g, '\\n') }}",
    "options": {}
  }
}

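⚠️ Newline escaping bug: n8n's langchain agent sometimes outputs `\n` as a literal backslash-n instead of a real newline. The `.replace(...)` call in messageText turns each literal backslash-n back into a real newline. The backslashes appear doubled in the config above only because the expression is stored inside a JSON string; typed directly into the n8n expression editor, it is `.replace(/\\n/g, '\n')`. This is a known issue with double-serialization in some n8n versions.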
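The same fix as a standalone JavaScript function, written outside JSON so the backslashes are not doubled. The function name `fixEscapedNewlines` is made up for this sketch.

```javascript
// Converts literal backslash-n sequences (two characters) into real newlines.
function fixEscapedNewlines(text) {
  return text.replace(/\\n/g, '\n');
}

// This string contains a backslash followed by "n", not a newline.
const raw = 'Hello Ana.\\nWe open at 9am.';
const fixed = fixEscapedNewlines(raw);
console.log(fixed.split('\n').length); // prints 2
```

Text that already contains real newlines passes through unchanged, so the function is safe to apply to every response.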

SEND_TEAM_NOTIFICATION (Evolution API)

{
  "type": "evolutionAPI",
  "parameters": {
    "operation": "send-text",
    "instanceKey": "my-clinic-bot",
    "remoteJid": "5516994247541@s.whatsapp.net",
    "messageText": "=See template below="
  }
}

Notification Message Template

🚨 URGENT HANDOFF

**Patient:** John Doe
**Issue:** Severe toothache, swelling
**Timestamp:** 2:34 PM

**Full Conversation:**
---
2:32 PM: Hi, I have a tooth problem
2:33 PM: It's been hurting since yesterday
2:34 PM: Can you help?
---

**Bot Note:** Patient reports pain + swelling. Needs immediate triage.

👉 Reply directly to continue the conversation.

This gives your team full context so they can respond intelligently.
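A notification like this can be assembled from the buffered messages with a small formatter. The sketch below is one possible shape, not the template's actual implementation: `formatHandoffNotification` and its input fields (`patientName`, `issue`, `messages`, `botNote`) are hypothetical names.

```javascript
// Hypothetical formatter for the team notification.
// `messages` is a list of { time, text } in the order they were received.
function formatHandoffNotification({ patientName, issue, messages, botNote }) {
  const transcript = messages.map((m) => `${m.time}: ${m.text}`).join('\n');
  const lastTime = messages.length ? messages[messages.length - 1].time : 'unknown';

  return [
    '🚨 URGENT HANDOFF',
    '',
    `**Patient:** ${patientName || 'Unknown'}`,
    `**Issue:** ${issue}`,
    `**Timestamp:** ${lastTime}`,
    '',
    '**Full Conversation:**',
    '---',
    transcript,
    '---',
    '',
    `**Bot Note:** ${botNote}`,
    '',
    '👉 Reply directly to continue the conversation.',
  ].join('\n');
}

const notification = formatHandoffNotification({
  patientName: 'John Doe',
  issue: 'Severe toothache, swelling',
  messages: [
    { time: '2:32 PM', text: 'Hi, I have a tooth problem' },
    { time: '2:34 PM', text: 'Can you help?' },
  ],
  botNote: 'Patient reports pain + swelling. Needs immediate triage.',
});
console.log(notification);
```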

Advanced Features (ZapPro Pro)

The foundation above is solid for simple triage. Here's what separates a basic bot from a production-grade one:

Follow-Up Automation (D+1 and D+3)

After handing off to a human, the conversation often goes quiet. If the patient doesn't respond within 24 hours, send a follow-up ("Just checking in..."), and again at 72 hours. This recovers leads that would otherwise go cold.
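The timing rule can be expressed as a small pure function that a scheduled workflow calls for each open handoff. This is a sketch under assumptions: the function name `nextFollowUp`, the millisecond timestamps, and the `sent` list of already-delivered follow-ups are illustrative and not taken from the ZapPro templates.

```javascript
const HOUR_MS = 60 * 60 * 1000;

// Returns 'D+1', 'D+3', or null depending on how long the patient has been silent.
// `sent` lists follow-ups already delivered, so each one goes out at most once.
function nextFollowUp(handoffAt, lastPatientReplyAt, now, sent = []) {
  // Any reply after the handoff means the conversation is alive: no follow-up.
  if (lastPatientReplyAt !== null && lastPatientReplyAt > handoffAt) return null;

  const silentMs = now - handoffAt;
  if (silentMs >= 72 * HOUR_MS) return sent.includes('D+3') ? null : 'D+3';
  if (silentMs >= 24 * HOUR_MS) return sent.includes('D+1') ? null : 'D+1';
  return null;
}

// 25 hours of silence after the handoff, nothing sent yet.
console.log(nextFollowUp(0, null, 25 * HOUR_MS)); // prints D+1
```

Once the 72-hour mark has passed, the function never falls back to the D+1 message, so a patient who was missed on day one gets only the later follow-up.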

Google Calendar Integration

When a patient says "I want to book an appointment for Monday at 3pm", the bot doesn't just write it down — it creates the Google Calendar event in real-time and sends back the booking confirmation. Zero manual data entry.

Anthropic API Fallback

If OpenRouter goes down (rare, but happens), automatically switch to Anthropic direct API. Your bot keeps working, customers never notice.
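The fallback pattern itself is simple: try providers in order and return the first success. The sketch below shows the control flow only. `completeWithFallback` and the `{ name, call }` provider shape are hypothetical, and the two providers in the usage example are stubs rather than real API clients.

```javascript
// Tries each provider in order and returns the first successful reply.
// `providers` is a list of { name, call }, where call(prompt) returns a Promise.
async function completeWithFallback(providers, prompt) {
  const errors = [];
  for (const { name, call } of providers) {
    try {
      const text = await call(prompt);
      return { provider: name, text };
    } catch (err) {
      // Remember why this provider failed, then try the next one.
      errors.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}

// Usage with stub providers: the first one simulates an outage.
const stubProviders = [
  { name: 'openrouter', call: async () => { throw new Error('503 Service Unavailable'); } },
  { name: 'anthropic', call: async (prompt) => `reply to: ${prompt}` },
];

completeWithFallback(stubProviders, 'Hi')
  .then((result) => console.log(result.provider, result.text));
```

If every provider fails, the thrown error lists each failure, which is a natural place to trigger the safe "a human will respond shortly" message.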

Error Monitoring

Messages stuck in the buffer? Database connection down? A monitoring workflow checks every 30 minutes and alerts you on WhatsApp before customers complain.

These are all included in ZapPro Pro ($497) — pre-built, tested, and deployed in production.

Common Pitfalls (And How to Avoid Them)

1. No Buffer = Double Processing

If you skip the buffer and send every message straight to the AI, a network glitch causes the webhook to fire twice. Your user gets two bot responses. With a buffer + FOR UPDATE SKIP LOCKED, the second execution finds 0 unprocessed rows and exits gracefully.

2. AI Hallucinating Appointment Times

Haiku is fast but sometimes invents information. The system prompt must explicitly say: "NEVER schedule without confirming both date AND time with the patient." And: "If unsure, ask again."

3. Webhook Path as Custom String

Using `/my-webhook` instead of a UUID can cause silent 500 errors: n8n publishes the workflow, but the webhook path doesn't sync. Always use the auto-generated UUID. If you must change it, deactivate and reactivate the workflow to re-register the webhook handler.

4. Saving Without Publishing

In n8n, PATCH (save) and Publish are different. Saving updates the draft. Publishing updates the webhook handler. If you modify a workflow and only save, the old version still runs on incoming webhooks. Always: Save → Publish (via UI or API).

5. Human Mode Lock Forever

You might use a `human_mode` flag to pause the bot while a human is responding. If the flag gets stuck (UPDATE query fails), the bot stays locked forever. Add a TTL: human_mode_until TIMESTAMPTZ, and check `human_mode_until > NOW()` — expires automatically after 24 hours.
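The TTL check is easy to get backwards, so here it is as two small JavaScript functions. The names `startHumanMode` and `isHumanModeActive` are illustrative; the 24-hour constant comes from the paragraph above.

```javascript
const HUMAN_MODE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Returns the timestamp until which the bot should stay silent.
function startHumanMode(now = new Date()) {
  return new Date(now.getTime() + HUMAN_MODE_TTL_MS);
}

// The bot is paused only while the deadline is in the future.
// A missing or expired deadline means the bot may answer again.
function isHumanModeActive(humanModeUntil, now = new Date()) {
  return humanModeUntil !== null && humanModeUntil.getTime() > now.getTime();
}

const until = startHumanMode(new Date('2026-04-22T09:00:00Z'));
console.log(isHumanModeActive(until, new Date('2026-04-22T15:00:00Z'))); // prints true
console.log(isHumanModeActive(until, new Date('2026-04-23T09:00:01Z'))); // prints false
```

Because the state is a deadline rather than a boolean, a failed "unlock" UPDATE cannot leave the bot paused indefinitely.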

6. Missing Urgency Fallback

Relying 100% on AI to detect urgency fails when the AI misses it. Always add a keyword-based fallback: if input contains "emergency|urgent|pain|bleeding", force handoff regardless of AI output.

Scaling Considerations

This architecture works for small teams (1-3 clinics, 100-500 messages/day). As you grow:

Database connections: PostgreSQL pool size defaults to 5 simultaneous connections. At 5+ clients, increase to 10-20. Ask your hosting provider.
Message buffer retention: Keep buffer data for 30 days (for debugging), then archive. Large buffer tables slow SELECT queries.
n8n execution history: Disable automatic cleanup if you're on shared Cloudfy (it runs slow). Prune manually weekly.
API costs: At 100 clinics sending 5K messages/month each (500K total), Haiku costs ~$2.5K/month. Still <8% of median clinic revenue.

Ready to Launch?

This guide shows the architecture, but building 70+ nodes from scratch takes 6-8 hours. The buffer logic, error handling, notification formatting — it all matters.

ZapPro templates skip the build time. Import, customize the system prompt for your business, and run. All the patterns above are already implemented, tested, and documented.

ZapPro Core — $297 ZapPro Pro — $497

Core includes: Webhook + buffer + AI triage + handoff. Ready to deploy in n8n. 30 days of free setup support via zapproai@gmail.com (deployment only).

Pro includes: Everything in Core + Google Calendar scheduling + D+1/D+3 follow-up automation + monitoring dashboard + Anthropic fallback.

FAQ

How long does it actually take to deploy?

If you're starting from scratch and building from this guide: 6-8 hours. If you use ZapPro Core: 30 minutes (import JSON + add your API credentials + test with a message).

What if I want to use WhatsApp Cloud API instead of Evolution?

The architecture is the same. Only the webhook payload format changes. Cloud API sends JSON with different field names. We handle that in the NORMALIZE step.
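As a rough illustration, a Cloud API version of NORMALIZE could look like the sketch below. It assumes the standard Cloud API webhook layout, where inbound messages sit under `entry[0].changes[0].value.messages`. The function name `normalizeCloudApi` and the `MEDIA_TYPES` list are illustrative, and media URLs are left as null because Cloud API delivers media as an ID that must be resolved with a separate request.

```javascript
const MEDIA_TYPES = ['image', 'audio', 'video', 'document', 'sticker'];

// Maps a WhatsApp Cloud API webhook body to the shape NORMALIZE produces.
// Returns null for payloads without an inbound message (e.g. delivery statuses).
function normalizeCloudApi(body) {
  const message = body?.entry?.[0]?.changes?.[0]?.value?.messages?.[0];
  if (!message) return null;

  return {
    chat_id: message.from,
    message_id: message.id,
    sender: 'user', // "messages" entries are inbound
    text: message.text?.body || '',
    message_type: message.type || 'text',
    // Cloud API sends the timestamp as a string of Unix seconds.
    timestamp: new Date(Number(message.timestamp) * 1000).toISOString(),
    media_type: MEDIA_TYPES.includes(message.type) ? message.type : null,
    media_url: null, // resolve the media ID separately if you need the file
  };
}

const sample = {
  entry: [{ changes: [{ value: { messages: [
    { from: '15551234567', id: 'wamid.TEST', timestamp: '1603059201', type: 'text', text: { body: 'Hi' } },
  ] } }] }],
};
console.log(normalizeCloudApi(sample));
```

Everything downstream of NORMALIZE can stay unchanged as long as this function returns the same field names.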

Can I add multiple languages?

Yes. Detect the user's language in NORMALIZE, then include a language parameter in the AI prompt. Claude handles 100+ languages.

What if the AI fails?

Add an error handler branch. If the AI node errors, fall back to a safe message ("I'm having trouble. A human will respond shortly. {handoff}"). This is included in ZapPro Pro as the Anthropic fallback.

Can I use this for sales, not just support?

Absolutely. Swap "triage questions" for "qualification questions". Same architecture, different system prompt. ZapPro includes a sales bot template (Converte track).

How much does this cost to run?

For a small clinic (500 messages/month):

Evolution API: ~$0 (self-hosted)
Claude Haiku (OpenRouter): ~$2.50 (500 messages at the ~$0.005 per message implied by the scaling estimate above)
n8n: $20-100/month (self-hosted or cloud)
PostgreSQL: $0-50/month (included with most hosting)
Total: ~$20-150/month (usually <8% of clinic revenue)

Conclusion

A 24/7 WhatsApp bot isn't magic — it's pattern matching + intelligent buffering + a good system prompt. The architecture above prevents the two biggest failure modes: double processing and missed urgency.

If you want to build it yourself, follow the steps above. If you want to skip the 8 hours and launch today, ZapPro has you covered.

Either way, your customers get instant responses at 11pm. And your team gets to sleep.

Have questions? Email zapproai@gmail.com or join the n8n Community and search "WhatsApp buffer" — we're active there.