Natural Language Voice Ordering

Quick-service restaurant (QSR) chains across America are replacing humans with kiosks. They’re discovering a problem they didn’t expect: kiosks are slower.


Draft · POS Kiosk · Phase 1

Version: 0.1

Date: April 2026

Owner: Product

Why Current Kiosks Fail

A skilled cashier interprets “cheeseburger, no onions, add bacon” in under five seconds. A kiosk makes the customer find Burgers → Cheeseburger → Customize → Toppings → uncheck Onions → Add-ons → Bacon. That’s a navigation problem masquerading as a technology solution.

Kiosk menus are organized around restaurant operations—item categories, modifier trees, upsell flows—not around how customers actually order. Customers speak in natural shorthand. They reference items by nickname, bundle customizations into a single phrase, and expect the system to resolve ambiguity the way a person would.

The result: kiosk transactions take longer than counter transactions for any order above baseline complexity. This undercuts the core value proposition of kiosk deployment (throughput, labor cost reduction) and degrades customer satisfaction.

Root Causes

  • Menu taxonomy reflects kitchen logic, not customer mental models
  • Modifier selection requires multiple taps per customization
  • No recovery path for customers who can’t find an item
  • Upsell prompts interrupt flow rather than integrate naturally
  • No channel for customers to express a complete order at once

02 — OPPORTUNITY

The Hypothesis

An optional natural language voice interface—layered on top of the existing touch UI—lets customers order the way they talk. The system handles disambiguation through guided dialogue, not menu navigation. Customers who prefer touch keep it. Customers who want speed get it.

  • ~40%: estimated reduction in order time for complex orders
  • 3–5×: more modifiers captured per voice order vs. touch
  • ↑ ATV: natural upsell via conversational prompt

These are directional targets pending pilot data. The primary success metric is order completion time, not revenue: if voice ordering isn’t faster, the feature has failed its core purpose.

03 — SCOPE

Goals & Non-Goals

✓ In Scope

  • Voice-initiated order entry at kiosk
  • NLP resolution of item names, nicknames, and combos
  • Guided dialogue for required modifiers (size, protein, etc.)
  • Inline upsell via conversational prompt
  • Graceful fallback to touch UI at any point
  • Order confirmation screen before payment
  • Accessibility mode (optional voice for customers with motor impairments)

✗ Out of Scope (v1)

  • Multi-language support beyond English
  • Loyalty program integration via voice
  • Payment via voice
  • Drive-thru speaker integration
  • Mobile app or web ordering
  • Custom wake word training per brand
  • Real-time inventory awareness

04 — USER STORIES

Core Use Cases

| As a… | I want to… | So that… | Priority |
| --- | --- | --- | --- |
| Customer | Say my full order at once | I don’t navigate menus at all | P1 |
| Customer | Use item nicknames (“large fry,” “double double”) | I order the way I actually talk | P1 |
| Customer | Be prompted only for missing required info | The system doesn’t ask what I already said | P1 |
| Customer | Switch to touch at any point | I’m not locked into voice if it’s not working | P1 |
| Customer | Hear a natural upsell prompt | I can add items without restarting | P2 |
| Operator | Configure item aliases and nicknames | Voice matches our brand language | P2 |
| Operator | Disable voice mode per kiosk | High-noise environments can opt out | P2 |
| Operator | Review voice order transcripts and error rates | I can identify failure patterns and improve | P3 |

05 — FUNCTIONAL REQUIREMENTS

Feature Specification

5.1 Activation

  • Voice mode is opt-in; initiated by tapping a microphone button on the home screen
  • System greets with a single short prompt: “What can I get for you?”
  • Wake-word activation (e.g., “Hey [brand]”) is a v2 consideration

5.2 Intent Resolution

  • NLP layer resolves spoken input to menu items using semantic matching (not keyword matching)
  • Handles item aliases, common nicknames, and partial matches
  • Handles bundle orders (“number 3” or combo names)
  • Confidence threshold: items below threshold surface a disambiguation prompt, not an error
  • Disambiguation prompt offers 2–3 options max, displayed on screen and spoken
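The confidence-threshold behavior above can be illustrated with a minimal sketch. The threshold value, the `Candidate` type, the action names, and the re-prompt on an empty candidate list are all assumptions made for illustration; the PRD does not specify them.

```python
from dataclasses import dataclass

# Hypothetical values: the PRD does not fix a threshold.
CONFIDENCE_THRESHOLD = 0.80
MAX_OPTIONS = 3  # "2-3 options max" from section 5.2


@dataclass(frozen=True)
class Candidate:
    item_id: str
    score: float  # match score in [0, 1] from the semantic matching layer


def resolve(candidates: list[Candidate]) -> dict:
    """Return a direct match or a disambiguation prompt, never an error."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if not ranked:
        # Assumption: with nothing to offer, ask the customer to repeat.
        return {"action": "reprompt"}
    if ranked[0].score >= CONFIDENCE_THRESHOLD:
        return {"action": "match", "item_id": ranked[0].item_id}
    # Below threshold: offer the top few options on screen and by voice.
    options = [c.item_id for c in ranked[:MAX_OPTIONS]]
    return {"action": "disambiguate", "options": options}
```

A real matching layer would likely also consider how close the top two scores are; this sketch only shows the threshold rule.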

5.3 Guided Modifier Collection

  • System tracks which required modifiers are missing after initial utterance
  • Asks for missing info in a single compound question when the modifiers are closely related: “What size, and would you like that with cheese?”
  • Never re-asks for information already provided
  • Optional modifiers are not requested unless customer signals interest
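As a rough sketch of the tracking described above, the system can compare the modifiers already heard against the item's required groups and ask only about the gap. The menu schema, item IDs, prompt wording, and the cap of two groups per question are hypothetical.

```python
from typing import Optional

# Hypothetical menu schema: required modifier groups per item.
REQUIRED_MODIFIERS: dict[str, list[str]] = {
    "cheeseburger": ["size"],
    "combo_3": ["size", "drink"],
}
# Assumption: at most two related groups are combined into one question.
MAX_GROUPS_PER_QUESTION = 2


def missing_modifiers(item_id: str, provided: dict[str, str]) -> list[str]:
    """List required groups the customer has not yet specified."""
    required = REQUIRED_MODIFIERS.get(item_id, [])
    return [group for group in required if group not in provided]


def next_prompt(item_id: str, provided: dict[str, str]) -> Optional[str]:
    """Build the next question, or None when nothing is missing."""
    missing = missing_modifiers(item_id, provided)
    if not missing:
        return None  # never re-ask for information already provided
    asked = missing[:MAX_GROUPS_PER_QUESTION]
    return f"What {' and '.join(asked)} would you like?"
```

For example, `next_prompt("combo_3", {"size": "large"})` asks only about the drink, because the size was already given.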

5.4 Upsell Integration

  • One upsell prompt per transaction, offered after order is complete but before confirmation
  • Prompt is contextual: based on order content, not rotation logic (“Want to add a drink to that?”)
  • Customer can accept or decline by voice or touch; no friction either way
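One way to read "contextual, not rotation logic" is a rule that looks at what the order is missing. The sketch below assumes hypothetical category tags and a fixed rule order; the actual upsell logic is not defined in this document.

```python
from typing import Optional

# Hypothetical category tags and prompt copy, checked in priority order.
UPSELL_RULES = [
    ("drink", "Want to add a drink to that?"),
    ("side", "Want to add a side to that?"),
]


def pick_upsell(order_categories: set[str], already_offered: bool) -> Optional[str]:
    """Pick at most one prompt per transaction, based on order content."""
    if already_offered:
        return None  # one upsell prompt per transaction
    for category, prompt in UPSELL_RULES:
        if category not in order_categories:
            return prompt
    return None  # order already covers every upsell category
```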

5.5 Confirmation & Handoff

  • System reads back complete order before confirmation
  • Customer confirms via voice (“Yes,” “That’s right”) or taps confirm on screen
  • Confirmed order passes to existing POS cart; payment flow is unchanged
  • Customer can request corrections before confirmation; system updates incrementally

5.6 Fallback & Error Handling

  • After two failed resolution attempts on any utterance, system surfaces touch UI for that item
  • Voice session remains active for remaining items
  • If microphone input drops or times out, system defaults to touch mode and saves current cart state
  • All failures are logged with session ID for analytics
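The fallback rules above amount to a small state machine. The following sketch is illustrative only: the `VoiceSession` class, its method names, and the returned action strings are assumptions, and the log is an in-memory list standing in for the analytics pipeline.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_FAILED_ATTEMPTS = 2  # from section 5.6


@dataclass
class VoiceSession:
    session_id: str
    cart: list[str] = field(default_factory=list)
    failures: int = 0  # failed attempts on the current utterance
    voice_active: bool = True
    log: list[dict] = field(default_factory=list)

    def on_resolution(self, item_id: Optional[str]) -> str:
        """Handle one resolution attempt; item_id is None on failure."""
        if item_id is not None:
            self.cart.append(item_id)
            self.failures = 0
            return "added"
        self.failures += 1
        self.log.append({"session_id": self.session_id,
                         "event": "resolution_failed",
                         "attempt": self.failures})
        if self.failures >= MAX_FAILED_ATTEMPTS:
            self.failures = 0
            return "touch_for_item"  # voice stays active for later items
        return "reprompt"

    def on_mic_timeout(self) -> str:
        """Microphone dropped or timed out: switch to touch, keep the cart."""
        self.voice_active = False
        self.log.append({"session_id": self.session_id,
                         "event": "mic_timeout"})
        return "touch_mode"
```

Note that the cart is never cleared on any failure path, which matches the "zero dead ends" principle.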

06 — UX PRINCIPLES

Design Constraints

The voice interface should feel faster than touch, not more clever. Every design decision is tested against one question: does this make ordering faster for the customer?

  • No menus in voice mode. The customer should never hear “say 1 for Burgers, say 2 for Sandwiches.”
  • One question at a time. Compound questions only when modifiers are closely related.
  • Visible progress. Screen shows cart building in real time as voice is processed.
  • Silence is not failure. Brief pauses don’t terminate the session; system waits for natural turn completion.
  • Zero dead ends. Any failure state has a graceful exit back to touch without losing cart.
  • Latency budget: 1.5s. Response from end of customer utterance to system reply must be under 1.5 seconds. Above 2s feels broken.

07 — TECHNICAL REQUIREMENTS

System Constraints

| Requirement | Specification |
| --- | --- |
| Speech-to-Text | On-device preferred for latency; cloud fallback acceptable. Must handle ambient restaurant noise (65–80 dB environments). |
| NLP Engine | LLM-backed intent + entity extraction, fine-tuned on menu corpus. Menu data synced from POS at publish time. |
| Latency | End-to-end voice response <1.5s P95. Touch-to-voice handoff <200ms. |
| Offline Mode | Core ordering must function if cloud NLP is unreachable; degrade to touch gracefully. |
| Audio Hardware | Directional microphone array required. Speaker for TTS response. Noise cancellation mandatory. |
| Privacy | Audio not persisted beyond session. Transcripts anonymized before logging. No biometric data captured. |
| Accessibility | WCAG 2.1 AA for visual elements. Voice modality as accessibility enhancement, not replacement. |
| POS Integration | Voice-built cart passes to existing cart/payment API unchanged. No new payment flows in v1. |

08 — SUCCESS METRICS

How We’ll Know It’s Working

Primary

  • Order completion time — voice vs. touch, controlled for order complexity. Target: voice ≤ touch at baseline; voice < touch for orders with 2+ modifiers.
  • Voice completion rate — % of voice sessions that result in a confirmed order without abandonment. Target: ≥75% in month 1, ≥85% by month 3.

Secondary

  • Intent resolution accuracy (% of utterances resolved on first attempt). Target: ≥88%.
  • Modifier capture rate — voice vs. touch. Hypothesis: voice captures more optional modifiers.
  • Average transaction value, voice vs. touch cohorts.
  • Customer satisfaction score delta (voice vs. touch session, via post-order prompt).

Kill Switch Criteria

  • Voice completion rate <60% at 30-day pilot review → pause and reassess
  • P95 latency >2.5s in production → revert to touch-only pending fix
  • Any privacy/data incident → immediate suspension pending audit
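The kill switch criteria can be checked mechanically once the metrics exist. This is a minimal sketch under stated assumptions: the function names and action strings are hypothetical, P95 uses the nearest-rank method (one of several percentile definitions), and the completion-rate check is assumed to run at the 30-day pilot review.

```python
def p95(samples: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def kill_switch_actions(completion_rate: float,
                        latencies_s: list[float],
                        privacy_incident: bool) -> list[str]:
    """Return the actions triggered by the kill switch criteria."""
    actions = []
    if completion_rate < 0.60:
        actions.append("pause_and_reassess")
    if p95(latencies_s) > 2.5:
        actions.append("revert_to_touch_only")
    if privacy_incident:
        actions.append("suspend_pending_audit")
    return actions
```

An empty list means no criterion fired; it does not by itself mean the phase gate targets were met.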

09 — PHASING

Rollout Plan

| Phase | Scope | Duration | Gate |
| --- | --- | --- | --- |
| Alpha | Internal testing; simulated ordering with staff. Hardware validation. | 4 weeks | Latency <1.5s P95; accuracy ≥85% on test corpus |
| Pilot | 3–5 live locations. Limited daypart (lunch peak). Voice opt-in only. | 6 weeks | Completion rate ≥75%; no P1 incidents |
| Limited GA | 25% of estate. Full daypart coverage. A/B test voice vs. touch default. | 8 weeks | Completion rate ≥82%; ATV neutral or positive |
| Full GA | Full rollout. Operator dashboard live. Multi-language scoping begins. | Ongoing | Sustained completion rate ≥85% |

10 — RISKS & MITIGATIONS

Known Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Ambient noise degrades ASR accuracy | High | High | Directional mic array; noise floor testing required in Alpha; location-level disable flag |
| Customers distrust or avoid voice mode | Medium | Medium | Touch always available; voice is opt-in; staff briefed on coaching customers |
| Menu alias coverage gaps cause frequent disambiguation | Medium | High | Operator alias configuration tool; auto-learn common failed utterances post-pilot |
| Latency exceeds budget on cloud NLP path | Medium | High | On-device NLP for common items; cloud for edge cases; hard 2s timeout with touch fallback |
| Accessibility misuse (voice replaces touch for customers who need touch) | Low | Medium | Touch always present; voice never replaces, only augments |
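The latency mitigation in the risk table (on-device NLP for common items, cloud for edge cases, a hard 2s timeout with touch fallback) could be sketched as follows. The resolver callables, the function name, and the returned labels are hypothetical; only the 2-second limit comes from this document.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Tuple

# Hypothetical resolver signature: utterance in, item ID or None out.
Resolver = Callable[[str], Optional[str]]


def resolve_with_fallback(utterance: str,
                          on_device: Resolver,
                          cloud: Resolver,
                          timeout_s: float = 2.0) -> Tuple[str, Optional[str]]:
    """Try on-device first, then cloud under a hard timeout, then touch."""
    local = on_device(utterance)
    if local is not None:
        return ("on_device", local)
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        result = pool.submit(cloud, utterance).result(timeout=timeout_s)
    except Exception:
        # Timeout or cloud error: treat both as "no answer in time".
        result = None
    finally:
        pool.shutdown(wait=False)  # stop waiting; do not block the kiosk UI
    if result is None:
        return ("touch_fallback", None)
    return ("cloud", result)
```

In this sketch a timed-out cloud call keeps running in its background thread; the kiosk simply stops waiting for it. A production version would probably cancel the request at the network layer.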

11 — OPEN QUESTIONS

Decisions Pending

  1. Wake word vs. button activation. Button is safer for v1, but wake word creates a more natural experience. Decision needed before Alpha hardware spec is locked.
  2. TTS voice selection. Brand-consistent synthetic voice vs. neutral. Requires brand/marketing sign-off.
  3. Transcript retention policy. Legal/privacy review needed. Current assumption: zero retention beyond session.
  4. Operator alias tooling. Build custom CMS or surface through existing menu management system? Engineering scoping required.
  5. A/B test design for pilot. Voice opt-in vs. voice-default? Need agreement with analytics team before pilot launch.
  6. Multi-language sequencing. Spanish is the obvious v2 target. Scoping timeline TBD post-GA data.

PRD — Natural Language Voice Ordering — v0.1 DRAFT April 2026 · Confidential