Every QSR chain in America is replacing humans with kiosks. They’re discovering a problem they didn’t expect: kiosks are slower.
Draft · POS Kiosk · Phase 1
Version: 0.1
Date: April 2026
Owner: Product
Why Current Kiosks Fail
A skilled cashier interprets “cheeseburger, no onions, add bacon” in under five seconds. A kiosk makes the customer find Burgers → Cheeseburger → Customize → Toppings → uncheck Onions → Add-ons → Bacon. That’s a navigation problem masquerading as a technology solution.
Kiosk menus are organized around restaurant operations—item categories, modifier trees, upsell flows—not around how customers actually order. Customers speak in natural shorthand. They reference items by nickname, bundle customizations into a single phrase, and expect the system to resolve ambiguity the way a person would.
The result: kiosk transactions take longer than counter transactions for any order above baseline complexity. This undercuts the core value proposition of kiosk deployment (throughput, labor cost reduction) and degrades customer satisfaction.
Root Causes
- Menu taxonomy reflects kitchen logic, not customer mental models
- Modifier selection requires multiple taps per customization
- No recovery path for customers who can’t find an item
- Upsell prompts interrupt flow rather than integrate naturally
- No channel for customers to express a complete order at once
02 — OPPORTUNITY
The Hypothesis
An optional natural language voice interface—layered on top of the existing touch UI—lets customers order the way they talk. The system handles disambiguation through guided dialogue, not menu navigation. Customers who prefer touch keep it. Customers who want speed get it.
- ~40%: estimated reduction in order time for complex orders
- 3–5×: more modifiers captured per voice order vs. touch
- ↑ ATV: natural upsell via conversational prompt
These are directional targets pending pilot data. The primary success metric is order completion time, not revenue: if voice ordering isn’t faster, the feature has failed its core purpose.
03 — SCOPE
Goals & Non-Goals
✓ In Scope
- Voice-initiated order entry at kiosk
- NLP resolution of item names, nicknames, and combos
- Guided dialogue for required modifiers (size, protein, etc.)
- Inline upsell via conversational prompt
- Graceful fallback to touch UI at any point
- Order confirmation screen before payment
- Accessibility mode (optional voice for customers with motor impairments)
✗ Out of Scope (v1)
- Multi-language support beyond English
- Loyalty program integration via voice
- Payment via voice
- Drive-thru speaker integration
- Mobile app or web ordering
- Custom wake word training per brand
- Real-time inventory awareness
04 — USER STORIES
Core Use Cases
| As a… | I want to… | So that… | Priority |
|---|---|---|---|
| Customer | Say my full order at once | I don’t navigate menus at all | P1 |
| Customer | Use item nicknames (“large fry,” “double double”) | I order the way I actually talk | P1 |
| Customer | Be prompted only for missing required info | The system doesn’t ask what I already said | P1 |
| Customer | Switch to touch at any point | I’m not locked into voice if it’s not working | P1 |
| Customer | Hear a natural upsell prompt | I can add items without restarting | P2 |
| Operator | Configure item aliases and nicknames | Voice matches our brand language | P2 |
| Operator | Disable voice mode per kiosk | High-noise environments can opt out | P2 |
| Operator | Review voice order transcripts and error rates | I can identify failure patterns and improve | P3 |
05 — FUNCTIONAL REQUIREMENTS
Feature Specification
5.1 Activation
- Voice mode is opt-in; initiated by tapping a microphone button on the home screen
- System greets with a single short prompt: “What can I get for you?”
- Wake-word activation (e.g., “Hey [brand]”) is a v2 consideration
5.2 Intent Resolution
- NLP layer resolves spoken input to menu items using semantic matching (not keyword matching)
- Handles item aliases, common nicknames, and partial matches
- Handles bundle orders (“number 3” or combo names)
- Confidence threshold: items below threshold surface a disambiguation prompt, not an error
- Disambiguation prompt offers 2–3 options max, displayed on screen and spoken
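
The threshold rule in 5.2 can be sketched as follows. This is a minimal illustration, not the intended implementation: the `Candidate` and `Resolution` types, the `resolve` function, and the 0.80 threshold are all assumptions, and the scores are presumed to come from whatever semantic matcher the NLP layer provides.

```python
from dataclasses import dataclass

# Hypothetical value: the PRD does not specify the confidence threshold.
CONFIDENCE_THRESHOLD = 0.80
# From 5.2: a disambiguation prompt offers 2-3 options at most.
MAX_OPTIONS = 3


@dataclass(frozen=True)
class Candidate:
    item_id: str   # illustrative menu item identifier
    score: float   # match confidence in [0, 1], produced by the NLP layer


@dataclass(frozen=True)
class Resolution:
    kind: str                  # "resolved" or "disambiguate"
    item_ids: tuple[str, ...]  # one item if resolved, else the options to offer


def resolve(candidates: list[Candidate]) -> Resolution:
    """Accept the top match if it clears the threshold, otherwise offer options."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    if ranked and ranked[0].score >= CONFIDENCE_THRESHOLD:
        return Resolution("resolved", (ranked[0].item_id,))
    # Below threshold: surface a short list of options rather than an error.
    # An empty list means nothing matched; the caller would treat that as a
    # failed resolution attempt (see 5.6).
    return Resolution("disambiguate", tuple(c.item_id for c in ranked[:MAX_OPTIONS]))
```

Keeping the decision separate from the matcher means the threshold can be tuned from pilot data without retraining or replacing the matching model.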
5.3 Guided Modifier Collection
- System tracks which required modifiers are missing after initial utterance
- Asks for missing info in a single compound question when the modifiers are closely related: “What size, and would you like that with cheese?”
- Never re-asks for information already provided
- Optional modifiers are not requested unless customer signals interest
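
One way to picture the slot tracking in 5.3 is sketched below. The `REQUIRED_MODIFIERS` table, the item IDs, and the prompt wording are hypothetical; a real deployment would read required modifiers from the menu data synced from the POS.

```python
from typing import Optional

# Hypothetical menu schema: required modifier slots per item, in asking order.
REQUIRED_MODIFIERS = {
    "cheeseburger": ["size"],
    "combo_3": ["size", "drink"],
}


def missing_modifiers(item_id: str, provided: dict[str, str]) -> list[str]:
    """Return required slots the customer has not filled yet."""
    required = REQUIRED_MODIFIERS.get(item_id, [])
    # Slots already present in `provided` are skipped, so they are never re-asked.
    return [slot for slot in required if slot not in provided]


def follow_up_prompt(missing: list[str]) -> Optional[str]:
    """Build one question covering all missing slots, or None if nothing is missing."""
    if not missing:
        return None
    if len(missing) == 1:
        return f"What {missing[0]} would you like?"
    return "What " + ", ".join(missing[:-1]) + f" and {missing[-1]} would you like?"
```

For example, if the customer says "a large number 3", `missing_modifiers("combo_3", {"size": "large"})` leaves only the drink to ask about. Optional modifiers never enter the table, so they are never prompted for.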
5.4 Upsell Integration
- One upsell prompt per transaction, offered after order is complete but before confirmation
- Prompt is contextual: based on order content, not rotation logic (“Want to add a drink to that?”)
- Customer can accept or decline by voice or touch; no friction either way
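
A rough sketch of the upsell rule in 5.4, assuming each cart item carries a category tag. The category names and the side prompt are illustrative; only the drink prompt comes from this document.

```python
from typing import Optional


def upsell_prompt(cart_categories: set[str], already_offered: bool) -> Optional[str]:
    """Pick at most one upsell prompt, based on what the cart is missing."""
    # One prompt per transaction, and nothing to offer on an empty cart.
    if already_offered or not cart_categories:
        return None
    # Contextual choice: suggest the first category the order lacks.
    if "drink" not in cart_categories:
        return "Want to add a drink to that?"
    if "side" not in cart_categories:
        return "Want to add a side to that?"  # hypothetical wording
    return None
```

The caller would invoke this once, after the order is complete and before the read-back, and set `already_offered` regardless of the customer's answer.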
5.5 Confirmation & Handoff
- System reads back complete order before confirmation
- Customer confirms via voice (“Yes,” “That’s right”) or taps confirm on screen
- Confirmed order passes to existing POS cart; payment flow is unchanged
- Customer can request corrections before confirmation; system updates incrementally
5.6 Fallback & Error Handling
- After two failed resolution attempts on any utterance, system surfaces touch UI for that item
- Voice session remains active for remaining items
- If microphone input drops or times out, system defaults to touch mode and saves current cart state
- All failures are logged with session ID for analytics
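
The fallback rules in 5.6 amount to a small state machine. The sketch below is one possible shape for it; the `VoiceSession` class, its method names, and the returned action strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_FAILED_ATTEMPTS = 2  # from 5.6: two failed attempts on an utterance


@dataclass
class VoiceSession:
    session_id: str
    cart: list[str] = field(default_factory=list)
    failed_attempts: int = 0
    mode: str = "voice"  # "voice" or "touch"
    failure_log: list[dict] = field(default_factory=list)

    def on_resolved(self, item_id: str) -> str:
        """An utterance resolved to an item: add it and reset the failure count."""
        self.cart.append(item_id)
        self.failed_attempts = 0
        return "continue_voice"

    def on_resolution_failed(self, utterance_id: str) -> str:
        """An utterance could not be resolved: retry, or hand this item to touch."""
        self.failed_attempts += 1
        self._log("resolution_failed", utterance_id)
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            self.failed_attempts = 0
            return "touch_for_item"  # voice stays active for remaining items
        return "retry_voice"

    def on_mic_timeout(self) -> str:
        """Microphone dropped or timed out: switch to touch, keeping the cart."""
        self._log("mic_timeout", None)
        self.mode = "touch"
        return "touch_mode"

    def _log(self, kind: str, utterance_id: Optional[str]) -> None:
        # Log an utterance ID, not the transcript text, in line with the
        # privacy requirement that transcripts are anonymized before logging.
        self.failure_log.append(
            {"session_id": self.session_id, "kind": kind, "utterance_id": utterance_id}
        )
```

Because the cart lives on the session object and is never cleared on failure, every exit path back to touch preserves what the customer has already ordered.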
06 — UX PRINCIPLES
Design Constraints
The voice interface should feel faster than touch, not more clever. Every design decision is tested against one question: does this make ordering faster for the customer?
- No menus in voice mode. The customer should never hear “say 1 for Burgers, say 2 for Sandwiches.”
- One question at a time. Compound questions only when modifiers are closely related.
- Visible progress. Screen shows cart building in real time as voice is processed.
- Silence is not failure. Brief pauses don’t terminate the session; system waits for natural turn completion.
- Zero dead ends. Any failure state has a graceful exit back to touch without losing cart.
- Latency budget: 1.5s. Response from end of customer utterance to system reply must be under 1.5 seconds. Above 2s feels broken.
07 — TECHNICAL REQUIREMENTS
System Constraints
| Requirement | Specification |
|---|---|
| Speech-to-Text | On-device preferred for latency; cloud fallback acceptable. Must handle ambient restaurant noise (65–80 dB environments). |
| NLP Engine | LLM-backed intent + entity extraction, fine-tuned on menu corpus. Menu data synced from POS at publish time. |
| Latency | End-to-end voice response <1.5s P95. Touch-to-voice handoff <200ms. |
| Offline Mode | Core ordering must function if cloud NLP is unreachable; degrade to touch gracefully. |
| Audio Hardware | Directional microphone array required. Speaker for TTS response. Noise cancellation mandatory. |
| Privacy | Audio not persisted beyond session. Transcripts anonymized before logging. No biometric data captured. |
| Accessibility | WCAG 2.1 AA for visual elements. Voice modality as accessibility enhancement, not replacement. |
| POS Integration | Voice-built cart passes to existing cart/payment API unchanged. No new payment flows in v1. |
08 — SUCCESS METRICS
How We’ll Know It’s Working
Primary
- Order completion time — voice vs. touch, controlled for order complexity. Target: voice ≤ touch at baseline; voice < touch for orders with 2+ modifiers.
- Voice completion rate — % of voice sessions that result in a confirmed order without abandonment. Target: ≥75% in month 1, ≥85% by month 3.
Secondary
- Intent resolution accuracy (% of utterances resolved on first attempt). Target: ≥88%.
- Modifier capture rate — voice vs. touch. Hypothesis: voice captures more optional modifiers.
- Average transaction value, voice vs. touch cohorts.
- Customer satisfaction score delta (voice vs. touch session, via post-order prompt).
Kill Switch Criteria
- Voice completion rate <60% at 30-day pilot review → pause and reassess
- P95 latency >2.5s in production → revert to touch-only pending fix
- Any privacy/data incident → immediate suspension pending audit
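
As an illustration only, the kill switch criteria could be checked with something like the following. The function names and action strings are hypothetical, and the nearest-rank percentile is one of several reasonable P95 definitions; the analytics team may prefer another.

```python
def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def kill_switch_actions(
    completion_rate: float,     # fraction of voice sessions ending in a confirmed order
    latencies_s: list[float],   # end-to-end voice response times, in seconds
    privacy_incident: bool,
) -> list[str]:
    """Return the actions triggered by the kill switch criteria, most severe first."""
    actions = []
    if privacy_incident:
        actions.append("suspend_pending_audit")
    if p95(latencies_s) > 2.5:
        actions.append("revert_to_touch_only")
    # The completion-rate criterion applies at the 30-day pilot review.
    if completion_rate < 0.60:
        actions.append("pause_and_reassess")
    return actions
```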
09 — PHASING
Rollout Plan
| Phase | Scope | Duration | Gate |
|---|---|---|---|
| Alpha | Internal testing; simulated ordering with staff. Hardware validation. | 4 weeks | Latency <1.5s P95; accuracy ≥85% on test corpus |
| Pilot | 3–5 live locations. Limited daypart (lunch peak). Voice opt-in only. | 6 weeks | Completion rate ≥75%; no P1 incidents |
| Limited GA | 25% of estate. Full daypart coverage. A/B test voice vs. touch default. | 8 weeks | Completion rate ≥82%; ATV neutral or positive |
| Full GA | Full rollout. Operator dashboard live. Multi-language scoping begins. | Ongoing | Sustained completion rate ≥85% |
10 — RISKS & MITIGATIONS
Known Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Ambient noise degrades ASR accuracy | High | High | Directional mic array; noise floor testing required in Alpha; location-level disable flag |
| Customers distrust or avoid voice mode | Medium | Medium | Touch always available; voice is opt-in; staff briefed on coaching customers |
| Menu alias coverage gaps cause frequent disambiguation | Medium | High | Operator alias configuration tool; auto-learn common failed utterances post-pilot |
| Latency exceeds budget on cloud NLP path | Medium | High | On-device NLP for common items; cloud for edge cases; hard 2s timeout with touch fallback |
| Accessibility misuse (voice replaces touch for customers who need touch) | Low | Medium | Touch always present; voice never replaces, only augments |
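
The latency mitigation in the table above (a hard 2s timeout with touch fallback) might look roughly like this. It is a sketch under stated assumptions: `cloud_call` stands in for whatever cloud NLP client is chosen, and the returned `("voice" | "touch", result)` tuple is an illustrative contract, not a defined API.

```python
import asyncio
from typing import Awaitable, Callable

HARD_TIMEOUT_S = 2.0  # hard timeout named in the latency risk mitigation


async def resolve_with_timeout(
    cloud_call: Callable[[str], Awaitable[str]],
    utterance: str,
    timeout_s: float = HARD_TIMEOUT_S,
) -> tuple[str, str]:
    """Return ("voice", result) if the call finishes in time, else ("touch", "")."""
    try:
        result = await asyncio.wait_for(cloud_call(utterance), timeout=timeout_s)
        return ("voice", result)
    except (asyncio.TimeoutError, ConnectionError):
        # Timed out or cloud unreachable: hand the customer to the touch UI.
        return ("touch", "")
```

`asyncio.wait_for` cancels the pending call when the timeout expires, so a slow cloud response cannot arrive later and alter the cart after the customer has moved to touch.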
11 — OPEN QUESTIONS
Decisions Pending
- Wake word vs. button activation. Button is safer for v1, but wake word creates a more natural experience. Decision needed before Alpha hardware spec is locked.
- TTS voice selection. Brand-consistent synthetic voice vs. neutral. Requires brand/marketing sign-off.
- Transcript retention policy. Legal/privacy review needed. Current assumption: zero retention beyond session.
- Operator alias tooling. Build custom CMS or surface through existing menu management system? Engineering scoping required.
- A/B test design for pilot. Voice opt-in vs. voice-default? Need agreement with analytics team before pilot launch.
- Multi-language sequencing. Spanish is the obvious v2 target. Scoping timeline TBD post-GA data.
PRD — Natural Language Voice Ordering — v0.1 DRAFT April 2026 · Confidential