Voice Bot · Human-Like Avatar · Full Scope of Work

Meet Joy — the lifelike voice + face that runs the EatCookJoy app, 24/7.

A real-time avatar a client or chef can talk to inside the app — no typing, no forms. Joy registers new users, takes bookings, reads back menus, and confirms via WhatsApp — in English & Arabic, with under 500 ms end-to-end response and a face that lip-syncs to her own voice. Built on the same stack Deliveroo shipped with ElevenLabs for rider onboarding.

ElevenLabs · STT <150ms | D-ID V4 Avatar · 100 FPS | LiveKit · WebRTC | GPT-4o · Function Calling | n8n · Workflow Glue | UAE PDPL Compliant
● LIVE
Listening…
Joy · EatCookJoy Avatar
Bilingual EN / AR · 70+ languages · WebRTC live
"Hi, I'm Joy! Are you here to book a chef, or would you like to cook on our platform?"
<500 ms
End-to-end voice response · STT + LLM + TTS + lip-sync
86%
Successful-contact rate Deliveroo achieved with the same vendor
70+
Languages supported · live English ↔ Arabic switching
$200–$500
Total monthly operating cost at 1,000–5,000 sessions/mo
20 wk
Foundation → Avatar → Onboarding → Booking → Launch
Part 1 · Layman Language

What is this, and what will it do?

Today a new client or chef opens the EatCookJoy app and has to read instructions, fill in forms, pick from menus, and wait for confirmations. Many abandon partway. Joy replaces all of that with one conversation: they just talk, and the app does the rest.

For business stakeholders

The problem we're solving

Forms kill conversion. Chefs get confused on onboarding. Clients give up halfway through booking. Our support team spends hours answering the same questions every week.

Joy is a digital human on the screen that greets every user by name, asks the right questions, takes their answers by voice, and writes everything back to the database automatically, like a FaceTime call with a perfectly trained assistant who never sleeps.

  • Greets returning users by name — and remembers their previous chefs
  • Speaks natural English & Arabic — switches mid-sentence
  • Reads back menus, prices, confirmations — no typing required
  • Sends a WhatsApp confirmation the moment the booking is logged
  • Can call inactive users back to re-engage them — outbound voice
For engineers · same idea, technical

What it actually is

A chained STT → LLM → TTS pipeline wired into a real-time WebRTC room, with a D-ID V4 Expressive avatar published as a secondary AI participant — synchronised audio + video tracks rendered at 100 FPS.

The LLM uses function calling to translate intent ("book my usual chef Saturday") into structured JSON, which n8n intercepts and POSTs to the existing ops.eatcookjoy.com REST API. No new backend.

  • ElevenLabs Conversational AI — STT <150 ms, cloned brand voice
  • OpenAI GPT-4o — intent, function calling, tool use
  • D-ID V4 Expressive — real-time lip-sync, 100 FPS rendering
  • LiveKit WebRTC — open-source, scales to 1,000s concurrent rooms
  • n8n + Qdrant — workflow automation + vector RAG over the Playbook
Part 2 · Technical Architecture

The pipeline — STT → LLM → TTS → Lip-Sync

Three layers — Presentation (the avatar you see), Intelligence (the voice AI), Integration (your existing REST API). They talk over WebRTC, the same low-latency protocol video calls use. The result: under 500 ms from when a user stops speaking to when Joy starts replying.

  1. User speaks: mic captured in app via LiveKit React Native SDK (0 ms)
  2. Speech-to-Text: ElevenLabs Scribe v2 Realtime, Arabic + English (<150 ms)
  3. LLM brain: GPT-4o · intent + function call to backend (~200 ms)
  4. Text-to-Speech: ElevenLabs cloned "Joy" voice · warm UAE tone (~80 ms)
  5. Avatar lip-sync: D-ID V4 Expressive · 100 FPS WebRTC video (<100 ms)

Backend integration

When Joy detects the user wants to complete an action (register, book, cancel) the LLM emits a structured JSON tool call. n8n intercepts the call and routes it to the matching endpoint on ops.eatcookjoy.com. Same architecture ElevenLabs uses in its enterprise food-service deployments.

// LLM function-call output (intercepted by n8n)
{
  "intent": "book_session",
  "client_id": "u_18234",
  "chef_id": "chef_ahmad",
  "date": "2026-05-25T19:00:00+04:00",
  "guests": 6,
  "cuisine": "Mediterranean",
  "locale": "en-AE"
}
// → POST https://ops.eatcookjoy.com/api/sessions → WhatsApp confirmation
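The routing step described above can be sketched as a small lookup from intent to endpoint. This is only an illustration of what the n8n workflow would do, not the actual workflow: `book_session` and the three POST paths come from this document, while the `register_client` and `register_chef` intent names and the `routeToolCall` helper are hypothetical.

```typescript
// Sketch: turn an LLM tool call into the REST request n8n would forward.
// Only "book_session" and the POST paths appear in this document;
// the other intent names are hypothetical placeholders.
type Intent = "book_session" | "register_client" | "register_chef";

interface ToolCall {
  intent: Intent;
  [field: string]: unknown;
}

interface OutboundRequest {
  method: "POST";
  url: string;
  body: string;
}

const BASE_URL = "https://ops.eatcookjoy.com";

const ROUTES: Record<Intent, string> = {
  book_session: "/api/sessions",
  register_client: "/api/users",
  register_chef: "/api/chefs",
};

function routeToolCall(call: ToolCall): OutboundRequest {
  // The intent selects the endpoint; the remaining fields become the JSON body.
  const { intent, ...payload } = call;
  return {
    method: "POST",
    url: BASE_URL + ROUTES[intent],
    body: JSON.stringify(payload),
  };
}

const bookingRequest = routeToolCall({
  intent: "book_session",
  client_id: "u_18234",
  chef_id: "chef_ahmad",
  date: "2026-05-25T19:00:00+04:00",
  guests: 6,
  cuisine: "Mediterranean",
  locale: "en-AE",
});
console.log(bookingRequest.method, bookingRequest.url);
```

Keeping the routes in one table means a new intent is a one-line change, and an unknown intent fails at compile time rather than at call time.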
Part 3 · Technology Stack

Recommended for EatCookJoy UAE

Best-of-breed components, all production-proven. We stay vendor-thin where it matters (ElevenLabs handles voice end-to-end) and modular where it doesn't (n8n + Qdrant are self-hostable, no lock-in).

Layer | Recommended | Why this one
Voice Agent Platform | ElevenLabs Conversational AI | Industry-leading voice naturalness; 70+ languages; sub-150ms STT; verified Deliveroo deployment in food-service.
Real-Time Comms | LiveKit (WebRTC) | Open-source, scalable, every major SDK (iOS, Android, React Native, web). Self-host or LiveKit Cloud.
Avatar Provider | D-ID V4 Expressive · Beyond Presence (alt) | Real-time lip-sync, 100 FPS WebRTC streaming, native LiveKit plugin. Beyond Presence as fallback for hyper-realism.
LLM / Brain | OpenAI GPT-4o (primary) · Gemini Live (alt) | Best-in-class function calling and tool use; structured JSON output reliability.
Speech-to-Text | ElevenLabs Scribe v2 Realtime | Highest accuracy for accented English & Arabic; bundled with ElevenLabs Agents.
Text-to-Speech | ElevenLabs cloned voice | Custom-cloned warm "Joy" voice: UAE-appropriate, consistent across all sessions.
Workflow Automation | n8n (self-hosted) | No-code webhook triggers; visual logic; self-hostable for PDPL compliance.
Backend API | Existing ops.eatcookjoy.com REST | No backend rebuild. Voice writes through your existing endpoints on session/user creation.
Mobile SDK | LiveKit React Native SDK | Native iOS & Android voice + avatar integration via one component.
Knowledge Base / RAG | ElevenLabs KB + Qdrant Vector DB | Joy answers FAQs from the Playbook + menus + chef listings, kept fresh via webhook re-index.
Telephony (outbound) | ElevenLabs outbound + Twilio fallback | For re-engagement calls (like Deliveroo's inactive-rider campaign).
Messaging glue | WhatsApp Business API (360dialog / Twilio) | Booking confirmation, reminders, escalation when Joy cannot resolve.
Part 4 · User Flows

What Joy does for each user type

Three primary flows, mirroring the model Deliveroo proved with ElevenLabs (86% successful contact rate on rider onboarding). The voice writes structured JSON to your existing ops.eatcookjoy.com endpoints — no new backend.

🙋

Client registration

First-time visitor → live account in <90 seconds
  1. User opens app → Joy greets and asks: "Are you here to book a chef, or to cook on the platform?"
  2. User says "I want to book a chef"
  3. Joy collects: name → phone → email → location → cuisine → date
  4. LLM extracts structured JSON via function call
  5. n8n webhook fires → POST /api/users on ops.eatcookjoy.com
  6. Joy confirms verbally + WhatsApp confirmation sent
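Step 3 of this flow is essentially slot filling: Joy keeps asking until every required field has a value. A minimal sketch of that loop's core decision follows. The field order comes from the flow above; the `nextMissingField` helper and the sample values are made up for illustration.

```typescript
// Fields Joy collects, in the order listed in step 3 of the flow.
const REQUIRED_FIELDS = ["name", "phone", "email", "location", "cuisine", "date"] as const;
type Field = (typeof REQUIRED_FIELDS)[number];
type Slots = Partial<Record<Field, string>>;

// Returns the next field Joy should ask for, or null once every slot is filled.
function nextMissingField(slots: Slots): Field | null {
  for (let i = 0; i < REQUIRED_FIELDS.length; i++) {
    const field = REQUIRED_FIELDS[i];
    const value = slots[field];
    if (value === undefined || value.trim() === "") return field;
  }
  return null;
}

// Made-up sample: the user has given a name and phone number so far.
const collectedSoFar: Slots = { name: "Sara", phone: "+971500000000" };
console.log(nextMissingField(collectedSoFar)); // "email"
```

Once `nextMissingField` returns null, the collected slots can be handed to the function call in step 4.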
👨‍🍳

Chef onboarding

Voice + parallel photo upload
  1. Chef opens app → Joy: "Let's get you set up as a chef on EatCookJoy."
  2. Joy collects: full name → cuisine specialties → certifications → working hours → location
  3. Chef uploads photo through the app UI (parallel to voice flow)
  4. Structured payload written to ops.eatcookjoy.com/chefs
  5. Joy: "Fantastic! Your profile is live. Your first clients can book starting today."
  6. Admin notified → vetting workflow triggered
📅

Booking automation

Returning client · spoken booking · CRM memory
  1. Returning client: "Joy, book my usual chef for Saturday dinner, 6 people"
  2. LLM queries CRM via tool call → identifies preferred chef + last menu
  3. Joy: "That's Chef Ahmad. Saturday 7 PM, 6 guests, Mediterranean menu. Shall I confirm?"
  4. Client: "Yes"
  5. Booking logged via POST /api/sessions
  6. WhatsApp confirmation to client + chef
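The gap between steps 4 and 5 is where a guard belongs: the booking should only be posted after an explicit "yes" and a recent availability check. A sketch of such a guard is below. The `BookingDraft` shape, the `canCommit` helper, and the 60-second freshness window are all assumptions, not part of the existing API.

```typescript
interface BookingDraft {
  chefId: string;
  startIso: string;
  guests: number;
  availabilityCheckedAt: number | null; // epoch ms of the last availability tool call
  userConfirmed: boolean;               // true only after an explicit spoken "yes"
}

// Assumed freshness window for the availability check; not from the document.
const MAX_CHECK_AGE_MS = 60000;

// A draft may be posted only if the user confirmed AND availability was checked recently.
function canCommit(draft: BookingDraft, nowMs: number): boolean {
  if (!draft.userConfirmed) return false;
  if (draft.availabilityCheckedAt === null) return false;
  return nowMs - draft.availabilityCheckedAt <= MAX_CHECK_AGE_MS;
}

const draftBooking: BookingDraft = {
  chefId: "chef_ahmad",
  startIso: "2026-05-25T19:00:00+04:00",
  guests: 6,
  availabilityCheckedAt: 1000,
  userConfirmed: true,
};
console.log(canCommit(draftBooking, 5000));   // true: confirmed, checked 4 s ago
console.log(canCommit(draftBooking, 120000)); // false: availability check is stale
```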
Part 5 · Screenshots · Avatar In-App

What the user sees

Live in-app mockups of the three primary touchpoints — registration greeting, voice booking in flight, and the confirmation card. Note: live blink, lip-sync, listening pulse are animated here exactly as they will render in production.

9:41
5G●●●
EatCookJoy
Talk to Joy
Joy
● Live · Listening
Hi! I'm Joy. Are you here to book a chef, or would you like to cook on our platform? (Joy · just now)
Book a chef please. (You · 2s ago)
Perfect. What's your name? (Joy · just now)
🎤
Screen 1 · First Greeting: Joy lip-syncs the welcome and listens for the voice reply.
9:42
5G●●●
Voice Booking
EN · AR Available
Joy
● Booking in progress
Book Chef Ahmad for Saturday, 6 guests. (You · 4s ago)
Saturday 7 PM, Chef Ahmad, Mediterranean. Shall I confirm? (Joy · just now)
Yes please. (You · 1s ago)
✓ Booked. WhatsApp confirmation sent. (Joy · just now)
🎤
Screen 2 · Voice Booking: Returning client books their usual chef in 3 turns.
9:43
5G●●●
Booking Summary
Session #ECJ-2826
Read back by Joy
Saturday Dinner · 24 May
Chef: Ahmad
Cuisine: Mediterranean
Guests: 6
Start: 7:00 PM
Location: Dubai Marina
Price: AED 600
✓ WhatsApp confirmation sent to +971 55 ••• 2370 (Joy · just now)
Screen 3 · Summary & WhatsApp: Booking written to ops.eatcookjoy.com + WhatsApp sent.
Part 6 · Live Demos · Try It Now

Talk to the actual stack — vendor demos

We're not asking you to imagine. Every component below is live and you can talk to it right now from this page. Same vendors, same architecture, same latency profile we will use for Joy.

Part 7 · Cost · Monthly Tooling Fees

The monthly bill — in QuickBooks format

Every tool, every cost, line by line — exactly how your accountant or bookkeeper would see it in QuickBooks. Operational monthly fees, one-time development cost, totals. Assumes 1,000–5,000 voice sessions per month at production scale.

QuickBooks Online · Vendor Bill
Recurring subscription · category: AI / Software
Invoice · Monthly
Bill #ECJ-VOICE-2026-05
Due monthly · Cycle: 1st of month
Bill To
EatCookJoy UAE FZ-LLC
Office 1203 · Bay Square 13 · Business Bay · Dubai
TRN · 100-XXXXXX-00003
Cost Center
AI Operations · Voice Bot & Avatar
GL Code: 5410 · AI / SaaS Tools
Approver: Aziz Saif (BD · Gulf)
Service · Vendor | Notes | Qty | Rate (USD) | Amount (USD)
Recurring Monthly · AI Voice Stack
ElevenLabs · Conversational AI: Voice agent platform · Business plan base + per-minute usage | $99 base + $0.06 / min · 2,500 min budgeted | 1 | 249.00 | $249.00
D-ID · V4 Expressive Avatar API: Real-time lip-sync, 100 FPS · Studio + enterprise API | Studio $99 + API streaming credits | 1 | 108.00 | $108.00
LiveKit Cloud · WebRTC Rooms: Real-time transport · Production tier · audio + video | $99 base · bandwidth / concurrency included | 1 | 99.00 | $99.00
OpenAI · GPT-4o API: LLM brain · function calling · ~50M input + 12M output tokens / mo | $2.50/M input · $10/M output | 1 | 120.00 | $120.00
n8n Cloud · Workflow Automation: Webhook → backend bridge · Starter plan | Self-host option = $0 + server cost | 1 | 50.00 | $50.00
Qdrant Cloud · Vector DB (RAG): Knowledge base (Playbook + menus + FAQs) · 1 GB | Hobby tier scales up to $25 at production | 1 | 25.00 | $25.00
WhatsApp Business API · 360dialog: Booking confirmations + reminders · ~3,000 conversations / mo | $0.025 / conversation (UAE rate) | 3,000 | 0.025 | $75.00
Twilio · Outbound Telephony (optional): Outbound re-engagement calls · only if Phase 4 activated | Pay-as-you-go · usage-capped at $50/mo | 1 | 50.00 | $50.00
Recurring Monthly · Infra & Ops
AWS / Vercel · Hosting & Bandwidth: Edge functions for n8n + avatar worker container | ~3 GB egress · 2 vCPU container | 1 | 40.00 | $40.00
Monitoring · Datadog / Sentry: Live call logs, error rates, latency SLOs | Sentry Team tier | 1 | 29.00 | $29.00
Voice cloning · ElevenLabs Pro add-on: Custom "Joy" voice license · one cloned voice slot | Included in Business plan above | 1 | 0.00 | $0.00
One-Time · Development (Capitalised over 20 weeks)
Phase 1 · Foundation & Voice Engine: Weeks 1–4 · ElevenLabs agent · KB · API wiring · widget | One-time capex · amortise 24 mo | 1 | 7,500.00 | $7,500.00
Phase 2 · Avatar Integration: Weeks 5–8 · D-ID · LiveKit room · React Native hook | One-time capex | 1 | 8,500.00 | $8,500.00
Phase 3 · Client & Chef Onboarding: Weeks 9–12 · Voice registration · Arabic enable | One-time capex | 1 | 7,000.00 | $7,000.00
Phase 4 · Booking & Outbound Calls: Weeks 13–16 · Booking flow · admin dashboard | One-time capex | 1 | 8,000.00 | $8,000.00
Phase 5 · UAT, Optimisation & Launch: Weeks 17–20 · PDPL review · dialect tuning · training | One-time capex | 1 | 4,000.00 | $4,000.00
Bill notes / accountant memo
Vendor invoices charged in USD, booked at the day's CBUAE rate (≈ AED 3.67 / USD). ElevenLabs and OpenAI bill per usage — rates above assume ~2,500 voice minutes/month and ~62M LLM tokens/month at production load (1,000–5,000 sessions).

Tax treatment: Software-as-a-Service expenses · UAE Corporate Tax deductible · no VAT charged on imported digital services (reverse-charge mechanism).

One-time development ($35,000 across the five phases) is capitalised and amortised straight-line over 24 months: $35,000 / 24 ≈ $1,458 per month for two years.
Subtotal · Recurring Monthly: $845.00
Buffer (10% volume variance): $85.00
Net Monthly Operating: $930.00
+ Amortised Dev (24 mo): $1,458.00
Total Monthly · 24 mo blended: $2,388.00
AED equivalent (3.67 × $): AED 8,764
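The totals above can be re-derived from the line items with a few lines of arithmetic. The sketch below takes the monthly amounts and phase costs as they appear on the bill and assumes each derived figure is rounded to the nearest whole dollar, which is what the bill appears to do.

```typescript
// Monthly recurring amounts copied from the bill, in order (USD).
const recurringUsd: number[] = [249, 108, 99, 120, 50, 25, 75, 50, 40, 29, 0];
// One-time development, phases 1 to 5 (USD).
const developmentUsd: number[] = [7500, 8500, 7000, 8000, 4000];

const BUFFER_PCT = 10;
const AMORTISATION_MONTHS = 24;
const AED_PER_USD = 3.67;

function total(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0);
}

const subtotalUsd = total(recurringUsd);                                          // 845
const bufferUsd = Math.round((subtotalUsd * BUFFER_PCT) / 100);                   // 85
const netMonthlyUsd = subtotalUsd + bufferUsd;                                    // 930
const amortisedDevUsd = Math.round(total(developmentUsd) / AMORTISATION_MONTHS);  // 1458
const blendedMonthlyUsd = netMonthlyUsd + amortisedDevUsd;                        // 2388
const blendedMonthlyAed = Math.round(blendedMonthlyUsd * AED_PER_USD);            // 8764

console.log({ subtotalUsd, bufferUsd, netMonthlyUsd, amortisedDevUsd, blendedMonthlyUsd, blendedMonthlyAed });
```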
Lean monthly · post-launch
$200 – $500
Operating cost at 1,000–5,000 voice sessions/month only, ex-dev. Per SOW conclusion.
Full production monthly
$930
All optional add-ons on: monitoring, outbound calls, n8n cloud, hosting buffer.
One-time build
$15,000 – $40,000
Full 20-week build (5 phases). SOW recommends $35K, the sum of the five phase line items on the bill.
Part 8 · How to Use Joy · For the Client

Client — what you do, what Joy does

Three sentences from sign-up to dinner on the table. No forms, no scrolling, no typing — just talk.

🙋

You — the client

First-time booking · ~90 seconds

What you do

  • Open the EatCookJoy app on iOS or Android
  • Tap the microphone — Joy will greet you
  • Speak naturally: "I want to book a chef for Saturday dinner"
  • Answer Joy's questions: name, phone, location, cuisine, guests
  • Listen to the booking summary read aloud — say "yes" to confirm
  • Get a WhatsApp confirmation within seconds

What you do NOT do

  • Fill out any form
  • Type anything
  • Wait for a customer-service reply
  • Pick from drop-downs or scroll through menus
🤖

Joy — the avatar

Lives inside the app · 24/7 · EN + AR

What Joy does for you

  • Greets returning users by name the moment the app opens
  • Remembers your preferred chef, cuisine, last menu, allergens
  • Reads back menu suggestions in a warm UAE-tone voice
  • Checks chef availability live against the ops calendar
  • Confirms the booking with WhatsApp message + calendar invite
  • Handles changes: "Joy, move Saturday's booking to Sunday"
  • Answers FAQ from the Playbook: pricing, halal, allergens, refunds

What Joy can NOT do

  • Take payment without your explicit "yes" confirmation
  • Override a chef's confirmed schedule
  • Reveal another user's data — strictly per-account memory
Part 9 · How to Use Joy · For the Chef

Chef — onboarding by voice, schedule by voice

Joy turns the 12-step chef onboarding into a 5-minute conversation. Once you're live, you can check today's bookings, mark availability, and request payouts — all by voice.

👨‍🍳

Chef · onboarding

From sign-up to live profile in 5 min

Joy will ask you

  • Full name + nationality + spoken languages
  • Cuisine specialties (you can list multiple)
  • Certifications: food-handler card, allergen, halal-trained
  • Working hours and which days you're available
  • Your preferred areas across the UAE (Marina, JBR, Yas Island…)
  • One profile photo (upload via the camera button — runs in parallel)

What happens next

  • Your profile is written to ops.eatcookjoy.com/chefs
  • Admin gets notified for vetting
  • Once approved, Joy texts you: "You're live. First clients can book today."
📅

Chef · day-to-day

Voice schedule management

What you can say to Joy

  • "What are my bookings today?" — Joy reads them aloud
  • "Block Wednesday afternoon — I have a wedding"
  • "How much did I earn this week?" — Joy reads the payout summary
  • "Confirm the Saturday booking" — Joy confirms with the client
  • "I need to swap Friday with Chef Mariam" — Joy proposes the swap
  • "What's the client's allergen profile?" — Joy reads from CRM

WhatsApp + Voice combined

  • Confirmations land on WhatsApp instantly
  • Reminders 24 h before the session
  • Joy can call you if a client cancels last-minute
Part 10 · Admin · Owner · BD

Admin — the dashboard behind Joy

Joy doesn't run unsupervised. Every conversation is logged, every booking is auditable, and you can escalate any session to a human via WhatsApp at any time.

📊

What the admin sees

Web dashboard · ops.eatcookjoy.com/voice

Live monitoring

  • Active voice sessions — count + per-room transcript stream
  • STT confidence score per turn (flag low-confidence for review)
  • End-to-end latency — p50 / p95 / p99 (SLO <500 ms)
  • Booking conversion rate — voice vs traditional
  • Drop-off step — where users abandon the voice flow
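The p50 / p95 / p99 figures on the dashboard can be computed from raw per-turn latencies with the nearest-rank method. The sketch below uses that method as an assumption (the document does not say how the dashboard calculates percentiles), and the sample latencies are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

const SLO_MS = 500;

// Made-up end-to-end latencies for ten voice turns, in ms.
const turnLatenciesMs = [310, 420, 380, 450, 520, 400, 390, 610, 430, 470];

const p50 = percentile(turnLatenciesMs, 50); // 420
const p95 = percentile(turnLatenciesMs, 95); // 610
const sloBreached = p95 >= SLO_MS;           // true for this sample

console.log({ p50, p95, sloBreached });
```

With only ten samples, p95 and p99 both land on the slowest turn, so a real dashboard would want a much larger window before alerting.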

Action buttons

  • "Take over" — admin joins the LiveKit room as a human
  • "Escalate to WhatsApp" — Joy hands off + sends transcript
  • "Block session" — kills a misbehaving session immediately
  • "Replay transcript" — for post-mortem on edge cases
🛡

Compliance & control

UAE PDPL · audit trail

Privacy controls

  • Zero-data-retention option on ElevenLabs Enterprise ($1K/mo upgrade)
  • Voice recordings stored max 30 days · purged automatically
  • User can request a deletion via "Joy, forget me"
  • All transcripts encrypted at rest (AES-256) and in transit (TLS 1.3)
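The 30-day retention rule reduces to a date comparison that a scheduled job can run. A sketch under stated assumptions: the `Recording` shape and the `dueForPurge` helper are hypothetical, and the actual deletion call is left out because it depends on where the recordings are stored.

```typescript
const RETENTION_DAYS = 30;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface Recording {
  id: string;
  recordedAt: Date;
}

// Returns the ids of recordings older than the retention window at time `asOf`.
function dueForPurge(recordings: Recording[], asOf: Date): string[] {
  const cutoffMs = asOf.getTime() - RETENTION_DAYS * MS_PER_DAY;
  return recordings
    .filter((r) => r.recordedAt.getTime() < cutoffMs)
    .map((r) => r.id);
}

// Made-up sample data.
const storedRecordings: Recording[] = [
  { id: "rec_a", recordedAt: new Date("2026-04-20T10:00:00Z") }, // older than 30 days
  { id: "rec_b", recordedAt: new Date("2026-05-15T10:00:00Z") }, // still inside the window
];
console.log(dueForPurge(storedRecordings, new Date("2026-05-31T00:00:00Z"))); // ["rec_a"]
```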

Audit & reporting

  • Every booking has an immutable audit-log entry linked to the transcript ID
  • Monthly compliance report — PDPL-friendly format
  • Per-cost-line spend report exports to QuickBooks (this format)
  • NPS captured at end of each voice session ("Rate this 1–5")
Part 11 · Developer Spec · For the Vendor

Engineer-ready brief

Hand this section to your contracted vendor. SDK names, hooks, endpoints, function signatures, and the exact JSON contract Joy will use to talk to ops.eatcookjoy.com.

Mobile integration · React Native

LiveKit Agents SDK · D-ID AvatarSession plugin

React Native — useVoiceAssistant() hook

Drop one component into the app shell. It connects to a LiveKit room and renders the audio + video tracks published by the avatar worker.

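import React from 'react';
import { Text } from 'react-native';
import { useVoiceAssistant, LiveKitRoom, VideoTrack } from '@livekit/react-native';

// LIVEKIT_URL and token are supplied by the app (token minted server-side).
function JoyAvatar() {
  return (
    <LiveKitRoom serverUrl={LIVEKIT_URL} token={token} connect={true} audio={true}>
      <JoyStage />
    </LiveKitRoom>
  );
}

// useVoiceAssistant() reads the room context, so it must run inside <LiveKitRoom>.
function JoyStage() {
  const { videoTrack, state } = useVoiceAssistant();
  return (
    <>
      {/* avatar face; Joy's voice plays once the agent's audio track is subscribed */}
      {videoTrack && <VideoTrack trackRef={videoTrack} style={{ flex: 1 }} />}
      {/* listening / thinking / speaking */}
      <Text>{state}</Text>
    </>
  );
}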

Function-call schema · what the LLM emits

{
  "tool": "create_session",
  "args": {
    "client_id": "u_18234",
    "chef_id": "chef_ahmad",
    "start_iso": "2026-05-25T19:00:00+04:00",
    "guests": 6,
    "cuisine": "mediterranean",
    "halal": true,
    "allergens": ["dairy"],
    "location": { "area": "Dubai Marina" }
  }
}
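Because the LLM produces this payload, it is worth validating before anything is posted to the backend. The sketch below mirrors the field names in the schema above; the validation rules themselves (non-empty strings, positive integer guests, parseable timestamp) are assumptions about what the backend expects, and `validateCreateSessionArgs` is a hypothetical helper.

```typescript
interface CreateSessionArgs {
  client_id: string;
  chef_id: string;
  start_iso: string;
  guests: number;
  cuisine: string;
  halal: boolean;
  allergens: string[];
  location: { area: string };
}

// Returns a list of problems; an empty list means the args look well-formed.
function validateCreateSessionArgs(input: unknown): string[] {
  if (typeof input !== "object" || input === null) return ["args must be an object"];
  const a = input as Record<string, unknown>;
  const errors: string[] = [];

  const requiredStrings = ["client_id", "chef_id", "start_iso", "cuisine"];
  for (let i = 0; i < requiredStrings.length; i++) {
    const value = a[requiredStrings[i]];
    if (typeof value !== "string" || value === "") {
      errors.push(requiredStrings[i] + " must be a non-empty string");
    }
  }
  const start = a["start_iso"];
  if (typeof start === "string" && start !== "" && isNaN(Date.parse(start))) {
    errors.push("start_iso must be a parseable timestamp");
  }
  const guests = a["guests"];
  if (typeof guests !== "number" || guests % 1 !== 0 || guests < 1) {
    errors.push("guests must be a positive integer");
  }
  if (typeof a["halal"] !== "boolean") errors.push("halal must be a boolean");

  const allergens = a["allergens"];
  if (!Array.isArray(allergens) || allergens.some((x) => typeof x !== "string")) {
    errors.push("allergens must be an array of strings");
  }
  const loc = a["location"];
  if (typeof loc !== "object" || loc === null ||
      typeof (loc as Record<string, unknown>)["area"] !== "string") {
    errors.push("location.area must be a string");
  }
  return errors;
}

const sampleArgs: CreateSessionArgs = {
  client_id: "u_18234",
  chef_id: "chef_ahmad",
  start_iso: "2026-05-25T19:00:00+04:00",
  guests: 6,
  cuisine: "mediterranean",
  halal: true,
  allergens: ["dairy"],
  location: { area: "Dubai Marina" },
};
console.log(validateCreateSessionArgs(sampleArgs)); // []
```

When the list is non-empty, Joy can re-ask for the offending field instead of sending a broken booking downstream.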

Backend endpoints · ops.eatcookjoy.com

  • POST /api/users — register a new client (voice payload)
  • POST /api/chefs — register a new chef (voice + photo)
  • POST /api/sessions — create a booking (function call)
  • GET /api/users/:id/preferences — for personalised greeting
  • GET /api/chefs/availability?date=… — pre-flight before confirming
  • POST /api/whatsapp/send — confirmation trigger

Latency budget · per voice turn

  • STT (ElevenLabs Scribe v2 Realtime): ≤ 150 ms
  • LLM (GPT-4o, first token): ≤ 200 ms
  • TTS first chunk (ElevenLabs cloned voice): ≤ 80 ms
  • D-ID lip-sync render (WebRTC, 100 FPS): ≤ 100 ms
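  • Total p95 SLO: < 500 ms end-to-end (the four ceilings above add up to 530 ms, so meeting it requires stages to finish under their ceilings or to overlap via streaming)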
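A quick way to sanity-check the budget is to add up the per-stage ceilings and compare the result with the SLO. Treating the stages as strictly sequential is an assumption made here for simplicity; in a streaming pipeline they would normally overlap, so this is a worst-case figure.

```typescript
interface StageBudget {
  stage: string;
  ceilingMs: number;
}

// Per-stage ceilings from the latency budget above.
const latencyBudget: StageBudget[] = [
  { stage: "STT", ceilingMs: 150 },
  { stage: "LLM first token", ceilingMs: 200 },
  { stage: "TTS first chunk", ceilingMs: 80 },
  { stage: "Lip-sync render", ceilingMs: 100 },
];

const END_TO_END_SLO_MS = 500;

// Worst case if every stage hits its ceiling and nothing overlaps.
const worstCaseSerialMs = latencyBudget.reduce((sumMs, s) => sumMs + s.ceilingMs, 0); // 530
const headroomMs = END_TO_END_SLO_MS - worstCaseSerialMs;                              // -30

console.log({ worstCaseSerialMs, headroomMs });
```

A negative headroom means the SLO cannot be met if every stage runs at its ceiling back to back, so the vendor should confirm how much overlap the streaming pipeline actually gives.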

Repos to clone (vendor starting point)

Part 12 · Risks & Mitigations

What could go wrong, and how we handle it

Every voice deployment hits the same seven risks. We pre-bake mitigations into the SOW; nothing here is novel. It's the standard playbook used by Deliveroo, QuickEats, and the other ElevenLabs production references.

Risk | Impact | Mitigation
Avatar latency on slow mobile networks | Poor UX | D-ID WebRTC streaming at 100 FPS · automatic text-fallback if RTT > 800 ms · graceful degrade to voice-only.
Arabic dialect variation (Gulf vs Egyptian vs Levant) | Misunderstanding | Train the ElevenLabs agent on a curated Gulf-Arabic prompt set · auto-fallback to English when confidence < 0.7 · escalate to human after 2 retries.
Privacy / data security (voice recording) | Legal / regulatory | Comply with UAE PDPL · ElevenLabs Enterprise Zero-Data-Retention upgrade ($1K/mo) · 30-day max retention · explicit consent on first run.
User resistance to talking to a bot | Low adoption | Joy is opt-in · text chat always available · "talk to a human" button always visible · first-run video shows what Joy can do.
Cost overrun at high voice volume | Budget pressure | Cap voice minutes per user per day · auto-route to Retell AI ($0.07/min) above a daily volume threshold · monthly spend alerts in QuickBooks.
LLM hallucination on prices or availability | Booking errors | Joy never quotes prices or availability without a fresh tool call · function-calling is mandatory for any commitment · human-in-the-loop for refunds.
Avatar uncanny-valley reaction | UX comfort | Test 3 avatar styles in UAT (illustrated · semi-realistic · photo-realistic) · ship the option with highest NPS · let users toggle.
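The Arabic-dialect mitigation in the table is a small decision rule, sketched below. The 0.7 confidence threshold and the two-retry limit come from the table; the action names and the `arabicFallback` helper are hypothetical, and the sketch assumes "after 2 retries" means escalation happens once two retries have already been used.

```typescript
type FallbackAction = "continue_arabic" | "switch_to_english" | "escalate_to_human";

const MIN_CONFIDENCE = 0.7; // threshold from the risk table
const MAX_RETRIES = 2;      // retry limit from the risk table

// Decide what Joy does after an Arabic turn, given STT confidence and retries used so far.
function arabicFallback(confidence: number, retriesUsed: number): FallbackAction {
  if (confidence >= MIN_CONFIDENCE) return "continue_arabic";
  return retriesUsed >= MAX_RETRIES ? "escalate_to_human" : "switch_to_english";
}

console.log(arabicFallback(0.92, 0)); // "continue_arabic"
console.log(arabicFallback(0.55, 0)); // "switch_to_english"
console.log(arabicFallback(0.55, 2)); // "escalate_to_human"
```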
Part 13 · 20-Week Build Plan

From kickoff to launch — five phases

Phased to de-risk. Phase 1 ships a voice-only widget you can already test on eatcookjoy.com in 4 weeks. Each subsequent phase adds one layer of intelligence and automation. No big-bang launches.

Phase 1 · Weeks 1–4
Foundation & Voice Engine
Functional voice agent with EatCookJoy knowledge — no avatar yet. Embeddable widget live on eatcookjoy.com.
ElevenLabs agent configured · Knowledge base loaded · STT + LLM + TTS in EN + AR · n8n webhook → API · Embeddable widget
Phase 2 · Weeks 5–8
Avatar Integration
Add the real-time human-like face. "Joy" goes live with lip-sync, blinking, breathing idle animations.
"Joy" digital human commissioned · LiveKit room + D-ID AvatarSession · React Native useVoiceAssistant() hook · iOS + Android UAT
Phase 3 · Weeks 9–12
Client & Chef Onboarding Automation
Voice fully replaces forms. Joy registers clients and chefs, remembers returning users, switches to Arabic on cue.
Voice client registration · Voice chef onboarding · Session memory · Arabic activated · Error recovery
Phase 4 · Weeks 13–16
Booking Engine & Outbound Calls
Full booking by voice. Joy can also call inactive users back — same playbook as Deliveroo's rider re-engagement campaign.
Voice booking flow · Chef voice schedule · Outbound calls (Twilio) · Admin dashboard · WhatsApp escalation
Phase 5 · Weeks 17–20
Testing, Optimisation & Launch
UAT with real UAE chefs and clients. Latency tuned to <500 ms p95. PDPL review signed off. Staff trained.
UAT with UAE chefs · Latency <500 ms p95 · Arabic dialect tuning · PDPL compliance review · Monitoring dashboard · Staff training

Joy is the moat — conversation, not forms.

The technology exists today, is proven in food-service (Deliveroo · QuickEats), and ships in a phased 20-week program. Lean operating cost at 1,000–5,000 sessions a month is $200–$500 (about $930 with every optional add-on enabled). The automation is designed to cut manual support overhead and registration drop-off.
